MathALEA — Hypothesis testing
📝 Exercise 1
Here is the “Jogging” dataset (which we covered previously):

```r
data <- data.frame("Before" = c(74, 86, 98, 102, 78, 84, 79, 70),
                   "After"  = c(70, 85, 90, 110, 71, 80, 69, 74))
```
Read the documentation of the t.test function.
Does jogging reduce the heart rate \((\alpha = 5\%)\)?
## Code here
In the software output, what is the meaning of \(t = 1.263\), of \(df = 7\), and of the real number displayed below “mean difference”?
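A hedged sketch of a possible approach (not necessarily the expected solution): assuming a paired design with the one-sided alternative “jogging reduces the heart rate”, i.e., Before tends to exceed After, one can call t.test as follows.

```r
# Sketch of a possible solution; assumptions: paired measurements and
# one-sided alternative "Before > After".
data <- data.frame(Before = c(74, 86, 98, 102, 78, 84, 79, 70),
                   After  = c(70, 85, 90, 110, 71, 80, 69, 74))
res <- t.test(data$Before, data$After, paired = TRUE,
              alternative = "greater")
res
# t = 1.263 and df = 7, as in the question; the number reported as
# "mean difference" is the observed mean of Before - After, here 2.75.
```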
📝 Exercise 2
We will be using the HairEyeColor dataset (already included within R), a 3-dimensional contingency table cross-classifying hair color, eye color and sex.
We are going to marginalize w.r.t. the Sex variable, i.e., compute frequencies without taking that variable into account. This is easily done using the following piece of code:

```r
HairEye <- apply(HairEyeColor, c(1, 2), sum)
HairEye
```

```
       Eye
Hair    Brown Blue Hazel Green
  Black    68   20    15     5
  Brown   119   84    54    29
  Red      26   17    14    14
  Blond     7   94    10    16
```
Read the documentation of the chisq.test function. Are the variables eye color and hair color dependent?
## Code here
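A minimal sketch, assuming the marginal table HairEye computed above is the one to test:

```r
# Sketch: chi-square test of independence between hair and eye color
# on the 4 x 4 table obtained by summing over Sex.
HairEye <- apply(HairEyeColor, c(1, 2), sum)
res <- chisq.test(HairEye)
res
# df = (4 - 1) * (4 - 1) = 9; the p-value is far below 0.05,
# so independence is rejected.
```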
Does eye color depend on sex? What about hair color?
## Code here
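One possible approach, sketched under the assumption that one marginalizes over the remaining variable exactly as was done for HairEye:

```r
# Sketch: build the Eye-by-Sex and Hair-by-Sex marginal tables,
# then test independence in each.
EyeSex  <- apply(HairEyeColor, c(2, 3), sum)  # 4 x 2 Eye-by-Sex table
HairSex <- apply(HairEyeColor, c(1, 3), sum)  # 4 x 2 Hair-by-Sex table
res_eye  <- chisq.test(EyeSex)
res_hair <- chisq.test(HairSex)
res_eye
res_hair
```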
📝 Exercise 3
We will analyze the data on the growth of chickens w.r.t. the feed supplement (cf. exercise sheet).
Create the data frame.
## Write code here
Read the documentation of the oneway.test function and use it to assess whether the diet influences the growth.
## Write code here
Check whether the output of the above function matches what you got in the written exercise. Conclude about your statistical abilities.
With the help of the kruskal.test function (read its documentation), perform the same hypothesis test using a non-parametric test.
## Write code here
The data set we used was actually a subset of a larger one. We know that analysis of variance can handle more than two groups. Perform the same hypothesis testing on the chickwts data provided by R.
## Code here
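A hedged sketch on the full chickwts data set (columns weight and feed), using the formula interface; both the ANOVA-type test and its non-parametric counterpart are shown:

```r
# Sketch: same tests as before, on the full chickwts data
# shipped with R (6 feed groups).
str(chickwts)
res_aov <- oneway.test(weight ~ feed, data = chickwts)  # ANOVA-type test
res_kw  <- kruskal.test(weight ~ feed, data = chickwts) # non-parametric
res_aov
res_kw
```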
📝 Exercise 4:
We are going to perform a power analysis of a hypothesis test from numerical simulations. To this aim, we will generate \(K\) independent samples of size \(n\) from a \(\mathrm{Bernoulli}(p)\) distribution and test \(H_0\colon p = p_0\) against \(H_1\colon p \neq p_0\) with \(p_0 = 0.75\). Next we will report the proportion of times we rejected \(H_0\).
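As a reminder (notation assumed from the course), the one-sample \(z\)-test for a proportion is based on the statistic
\[
T = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}, \qquad \hat{p} = \frac{X}{n}, \quad X \sim \mathrm{Binomial}(n, p),
\]
and \(H_0\) is rejected at level \(\alpha\) when \(|T| > z_{1-\alpha/2}\), i.e., when the p-value \(2\,\bigl(1 - \Phi(|t_{\mathrm{obs}}|)\bigr)\) falls below \(\alpha\).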
Fill in the following piece of code:
```r
power <- function(nsim, p, n, p0 = 0.75, alpha = 0.05){
  ## This function estimates the power of a one sample z-test
  ## for proportions from simulation
  ##
  ## nsim: number of simulated samples (K above)
  ## p: proba. of success of the Bernoulli
  ## n: sample size
  ## p0: proba. of success under H0
  ## alpha: test significance level
  ## Fill in the "## ??? ##" parts

  ## Generate nsim experiments using the 'rbinom' function
  nsuccess <- ## ??? ##

  ## Computation of the observed test statistics
  Tobs <- ## ??? ##

  ## Computation of the associated p-values
  pval <- ## ??? ##

  ## Decision rule: Accept / Reject
  decisions <- ## ??? ##

  return(decisions)
}
```

```r
## We run the code with some fixed values
nsim <- 9999
p <- 0.5
n <- 35

decisions <- power(nsim, p, n)
table(decisions)
```
Using the above function, plot the estimated power curve, i.e., \(p\mapsto 1 - \beta(p)\), with different sample sizes, e.g., \(n=20, 50, 100, 500\). Comment.
## Write your code here
What happens when \(p = p_0\)? Why?
In the power function, you have “reinvented the wheel” of the \(z\)-test for proportions. Of course, this function is already implemented in R and is called prop.test. Read its documentation and use it to assess whether, in your class, the probability of having “light eyes”, i.e., blue or green, is greater when you have “non brown hair”.
Remark: The data set is available from my webpage.
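A hedged sketch with HYPOTHETICAL counts (the real counts must come from the class data set on the webpage): x holds the numbers of light-eyed students and n the group sizes, for the non-brown-hair and brown-hair groups respectively.

```r
# HYPOTHETICAL numbers for illustration only -- replace them with the
# counts computed from the class data set.
x <- c(18, 9)   # light-eyed students: (non brown hair, brown hair)
n <- c(40, 35)  # group sizes:         (non brown hair, brown hair)
# With two groups, alternative = "greater" tests p1 > p2
res <- prop.test(x, n, alternative = "greater")
res
```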
📝 Exercise 5:
Redo the (written) exercise we did on the weighing precision of scales. For this exercise you are on your own: you have to do your own research to find the appropriate test and R function.