MathALEA — Hypothesis testing

Author

Mathieu Ribatet

📝 Exercise 1

Here is the “Jogging” dataset (which we covered previously).

data <- data.frame("Before" = c(74, 86, 98, 102, 78, 84, 79, 70),
                   "After" = c(70, 85, 90, 110, 71, 80, 69, 74))
  1. Read the documentation of the t.test function.

  2. Does jogging reduce the heart rate \((\alpha = 5\%)\)?

    ## Code here
  3. In the software output, what do \(t = 1.263\), \(df = 7\) and the real number displayed below “mean difference” mean?
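As a hint for question 3, the quantities reported by the software can be recomputed by hand. The sketch below assumes that the intended test is a paired, one-sided t-test on the differences Before − After (this reading is an assumption, not stated in the exercise); the names `d`, `t_manual`, `df_manual` and `p_manual` are introduced here for illustration only.

```r
# Jogging data, copied from the exercise
data <- data.frame("Before" = c(74, 86, 98, 102, 78, 84, 79, 70),
                   "After" = c(70, 85, 90, 110, 71, 80, 69, 74))

# Paired design: work on the within-subject differences
d <- data$Before - data$After
n <- length(d)

# t statistic = mean difference / its standard error, with n - 1 degrees of freedom
t_manual <- mean(d) / (sd(d) / sqrt(n))
df_manual <- n - 1

# One-sided p-value for H1: mean(Before - After) > 0
p_manual <- pt(t_manual, df = df_manual, lower.tail = FALSE)

# Same test through t.test (assumed options: paired, one-sided)
res <- t.test(data$Before, data$After, paired = TRUE, alternative = "greater")
res
```

Comparing `t_manual` and `df_manual` with the printed output shows where each number comes from; the decision at \(\alpha = 5\%\) then follows from comparing the p-value with 0.05.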


📝 Exercise 2

We will be using the HairEyeColor dataset (already included within R), which is a 3-dimensional contingency table w.r.t. hair color, eye color and sex.

  1. We are going to marginalize w.r.t. the Sex variable, i.e., compute frequencies without taking that variable into account. This is easily done using the following piece of code:

    HairEye <- apply(HairEyeColor, c(1,2), sum)
    HairEye
           Eye
    Hair    Brown Blue Hazel Green
      Black    68   20    15     5
      Brown   119   84    54    29
      Red      26   17    14    14
      Blond     7   94    10    16

    Read the documentation of the chisq.test function. Are eye color and hair color dependent?

    ## Code here
  2. Does eye color depend on sex? What about hair color?

    ## Code here
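One possible way to approach both questions is sketched below. It uses `margin.table` instead of `apply` to collapse the 3-way table (an alternative chosen here, not the sheet's own code), and the object names `EyeSex`, `HairSex` and the `*_test` variables are made up for this illustration.

```r
# Collapse the Hair x Eye x Sex table over Sex (dimension 3)
HairEye <- margin.table(HairEyeColor, c(1, 2))

# Chi-squared test of independence between hair and eye color
hair_eye_test <- chisq.test(HairEye)
hair_eye_test

# Expected counts under independence; worth checking that none is too small
hair_eye_test$expected

# Two-way tables involving Sex: Eye x Sex and Hair x Sex
EyeSex <- margin.table(HairEyeColor, c(2, 3))
HairSex <- margin.table(HairEyeColor, c(1, 3))

eye_sex_test <- chisq.test(EyeSex)
hair_sex_test <- chisq.test(HairSex)
eye_sex_test
hair_sex_test
```

The degrees of freedom reported by each test are (rows − 1) × (columns − 1), which is a quick sanity check that the right margins were kept.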

📝 Exercise 3

We will analyze the data on the growth of chickens w.r.t. feed supplement (cf. exercise sheet).

  1. Create the data frame.

    ## Write code here
  2. Read the documentation of the oneway.test function and use it to assess whether the diet influences the growth.

    ## Write code here
  3. Check whether the output of the above function matches what you obtained in the written exercise. Conclude about your statistical abilities.

  4. With the help of the kruskal.test function (read its documentation first), perform the same hypothesis test using a nonparametric test.

    ## Write code here
  5. The data set we used was actually a subset of a larger one. We know that analysis of variance may handle more than 2 groups. Perform the same hypothesis testing on the chickwts data provided by R.

    ## Code here
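For the full chickwts data, the two tests could be called along the following lines. This is a sketch rather than a reference solution: `var.equal = TRUE` is set on the assumption that the written exercise used the classical equal-variance ANOVA, whereas `oneway.test` applies the Welch correction by default. The names `anova_fit` and `kw_fit` are illustrative.

```r
# chickwts ships with R: 71 chicks, weight (numeric) and feed (factor, 6 levels)
str(chickwts)

# Classical one-way ANOVA F-test (equal variances assumed across feeds)
anova_fit <- oneway.test(weight ~ feed, data = chickwts, var.equal = TRUE)
anova_fit

# Nonparametric counterpart: Kruskal-Wallis rank sum test
kw_fit <- kruskal.test(weight ~ feed, data = chickwts)
kw_fit
```

Dropping `var.equal = TRUE` gives the Welch version, whose denominator degrees of freedom are generally not an integer; comparing the two outputs is a useful check on the equal-variance assumption.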

📝 Exercise 4

We are going to carry out a power analysis of a hypothesis test using numerical simulations. To this end, we will generate \(K\) independent samples of size \(n\) from a \(\mathrm{Bernoulli}(p)\) distribution and test whether \(p = p_0\) against \(p \neq p_0\) with \(p_0 = 0.75\). We will then report the proportion of times \(H_0\) is rejected.

Fill in the following piece of code:

power <- function(nsim, p, n, p0 = 0.75, alpha = 0.05){
  ## This function estimates the power of a one sample z-test for proportions from simulation
  ##
  ## nsim: number of simulated samples (K above)
  ## p: proba. of success of the Bernoulli
  ## n: sample size
  ## p0: proba. of success under H0
  ## alpha: significance level of the test
  
  ## Fill in the "## ??? ##" parts
  
  ## Generate nsim experiments using the 'rbinom' function
  nsuccess <- ## ??? ##
      
  ## Computation of the observed test statistics
  Tobs <- ## ??? ##
    
  ## Computation of the associated p-values
  pval <- ## ??? ##
    
  ## Decision rule: Accept / Reject
  decisions <- ## ??? ##
  
  return(decisions)
}

## We run the code with some fixed values
nsim <- 9999
p <- 0.5
n <- 35

decisions <- power(nsim, p, n)
table(decisions)
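For readers who want to check their attempt, here is one possible completion of the template. It is only a sketch: the function is renamed `power_sketch` (a hypothetical name) so that it does not overwrite your own `power`, the p-value uses the usual normal approximation for the z-test, and the labels "Accept" / "Reject" are an assumed encoding of the decisions.

```r
power_sketch <- function(nsim, p, n, p0 = 0.75, alpha = 0.05) {
  ## Number of successes in each of the nsim simulated samples of size n
  nsuccess <- rbinom(nsim, size = n, prob = p)

  ## z statistic: sample proportion standardized under H0: p = p0
  phat <- nsuccess / n
  Tobs <- (phat - p0) / sqrt(p0 * (1 - p0) / n)

  ## Two-sided p-values from the standard normal approximation
  pval <- 2 * pnorm(-abs(Tobs))

  ## Reject H0 whenever the p-value falls below alpha
  decisions <- ifelse(pval < alpha, "Reject", "Accept")

  return(decisions)
}

set.seed(1)  # for reproducibility of the simulation
decisions <- power_sketch(nsim = 9999, p = 0.5, n = 35)
table(decisions)

## Estimated power = proportion of simulated samples leading to rejection
est_power <- mean(decisions == "Reject")
est_power
```

Because each sample of \(n\) Bernoulli trials only enters the test through its number of successes, a single call to `rbinom` with `size = n` is enough to simulate all `nsim` experiments at once.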
  1. Using the above function, plot the estimated power curve, i.e., \(p\mapsto 1 - \beta(p)\), with different sample sizes, e.g., \(n=20, 50, 100, 500\). Comment.

    ## Write your code here
  2. What happens when \(p = p_0\)? Why?

  3. In the power function, you have “reinvented the wheel” of the \(z\)-test for proportions. Of course, this test has already been implemented in R and is called prop.test. Read its documentation and use it to assess whether, in your cohort, the probability of having “light eyes”, i.e., blue or green, is greater when you have “non brown hair”.

    Remark: The data set is available from my webpage.
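Since the cohort data set is not reproduced here, the sketch below illustrates the mechanics of `prop.test` on the HairEyeColor table from Exercise 2 instead. This substitution is an assumption made purely for illustration, and the variable names (`light`, `total`, `non_brown`, `x`, `n`, `res`) are hypothetical.

```r
# Stand-in data: Hair x Eye counts from HairEyeColor, collapsed over Sex
HairEye <- margin.table(HairEyeColor, c(1, 2))

# Per hair color: number of people with light (blue or green) eyes, and total
light <- rowSums(HairEye[, c("Blue", "Green")])
total <- rowSums(HairEye)

# Group 1 = non brown hair, group 2 = brown hair
non_brown <- names(total) != "Brown"
x <- c(sum(light[non_brown]), light[["Brown"]])  # light-eyed counts
n <- c(sum(total[non_brown]), total[["Brown"]])  # group sizes

# H0: p1 = p2 against H1: p1 > p2 (light eyes more likely with non brown hair)
res <- prop.test(x, n, alternative = "greater")
res
```

With the real data, only the construction of `x` and `n` changes; the call to `prop.test` stays the same.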


📝 Exercise 5

Redo the (written) exercise we did on the weighing precision of scales. For this exercise you are on your own: you have to do your own research to find the appropriate test and function.