Focus is still on terminology
Continue to assume that the population standard deviation is known. We take a sample and find its mean, and we have a hypothesis about the population mean that we want to test.
Calculation goes from natural units to a percentile
Example: Mean longevity for people born in September is 77.8 years. Take a sample of people born in May and see if they have the same longevity or if it is shorter. Suppose that a sample of 1600 people born in May has a mean longevity of 77.5 years and the population standard deviation is known to be 4 years. Calculation:
Z = (x̄ - m)/[s/sqrt(n)] = (77.5 - 77.8)/[4/sqrt(1600)] = -3.0
What percentile? Normcdf(-100, -3.0) = .0013
Because .0013 is very small, there is a very low probability that you would get a sample mean of 77.5 or lower with 1600 people if the true mean were 77.8, so we reject the hypothesis that the true mean is 77.8 in favor of the alternative that the true mean is lower than that.
How low is low? Below .05
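The calculation above can be reproduced in a few lines of Python. This is only a sketch: it uses the standard library's `statistics.NormalDist` in place of the calculator's Normcdf, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

hypothesized_mean = 77.8   # mean longevity under the hypothesis being tested (years)
sample_mean = 77.5         # mean longevity in the May sample (years)
population_sd = 4.0        # population standard deviation, assumed known
n = 1600                   # sample size

# Standard error of the sample mean: 4 / sqrt(1600) = 0.1
standard_error = population_sd / sqrt(n)

# Convert from natural units (years) to a Z score
z = (sample_mean - hypothesized_mean) / standard_error

# Lower-tail area of the standard normal curve, i.e. the percentile as a fraction
p_value = NormalDist().cdf(z)

print(f"Z = {z:.2f}")
print(f"percentile (as a fraction) = {p_value:.4f}")
```

The lower-tail area comes out at roughly .0013, in line with the Normcdf result, and well below the .05 cutoff.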
General Procedure
Verdict | Truly Guilty | Truly Innocent |
---|---|---|
Convict | correct decision | Type I error |
Acquit | Type II error | correct decision |

Decision | Null Hypothesis False | Null Hypothesis True |
---|---|---|
Reject Null Hypothesis | correct decision | Type I error |
Fail to Reject Null Hypothesis | Type II error | correct decision |
Null hypothesis, H0. Always an equality, e.g. H0: m = 77.8
Alternative hypothesis, Ha. Always an inequality, e.g. Ha: m < 77.8
p-value. The percentile, expressed as a fraction, that we get when we do the calculation, e.g. .0013 (multiplying by 100 gives the 0.13th percentile)
significance level, a. The arbitrary hurdle set by the investigator, e.g. .05 or 5 percent. Technically, a is the probability that we give ourselves of making a Type I error.
If the p-value is below a, we reject H0 and accept Ha. Otherwise, we fail to reject H0. We never say that we accept H0, because we have not proven that it's true. We've only failed to find strong evidence that it's false.
Once again, a statement such as "we reject the null hypothesis at a 5 percent significance level" is a statement about our methods. We are saying that, when the null hypothesis is in fact true, calculations like ours based on a sample mean and sample size would incorrectly reject it no more than 5 percent of the time.
In classical statistics, the null hypothesis is either true or it isn't. You cannot say that there is a 0.13 percent probability that the null hypothesis is true. You can only make probability statements about your statistical methods.
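A small simulation can make the "statement about our methods" idea concrete. The sketch below assumes longevity is normally distributed and that the null hypothesis is true (true mean 77.8, standard deviation 4), then counts how often the test rejects anyway. The sample size of 100, the 2000 repetitions, and the random seed are illustrative choices, not part of the lecture example.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

random.seed(0)             # fixed seed so the run is repeatable

true_mean = 77.8           # the null hypothesis is TRUE in this simulation
population_sd = 4.0
n = 100                    # smaller than 1600 to keep the run fast (assumption)
alpha = 0.05
trials = 2000

rejections = 0
for _ in range(trials):
    # Draw one sample from a population where H0 really holds
    sample = [random.gauss(true_mean, population_sd) for _ in range(n)]
    z = (fmean(sample) - true_mean) / (population_sd / sqrt(n))
    p_value = NormalDist().cdf(z)      # lower-tail test, as in the example
    if p_value < alpha:
        rejections += 1                # a Type I error: H0 rejected although true

type_i_rate = rejections / trials
print(f"Share of samples that wrongly rejected H0: {type_i_rate:.3f}")
```

The share should land near .05. That figure describes how often the procedure errs across repeated samples; it says nothing about the probability that any particular null hypothesis is true.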
State the null hypothesis in words and in symbols.
The null hypothesis is that the mean longevity of people born in May is 77.8 years. The alternative hypothesis is that the mean longevity is lower than that. Let m = mean longevity of people born in May.
H0: m = 77.8
Ha: m < 77.8
State significance level that you chose: a = .05
State sample results: n = 1600, x̄ = 77.5
State p-value: p-value = .0013
Compare p-value to a and state conclusion: Because .0013 is less than .05, we reject the null hypothesis that the mean longevity of people born in May is 77.8 years and accept the alternative that the mean longevity is lower.
Important to remember: when p-value is less than a, we reject; when p-value is greater than a, we fail to reject
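The steps of the procedure can be collected into one small function. This is a sketch for the lower-tail case only (Ha: m < hypothesized value) with a known population standard deviation. The function name `lower_tail_z_test` and its parameters are hypothetical, introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(sample_mean, hypothesized_mean, population_sd, n, alpha=0.05):
    """Test H0: m = hypothesized_mean against Ha: m < hypothesized_mean."""
    # Z score of the sample mean under H0
    z = (sample_mean - hypothesized_mean) / (population_sd / sqrt(n))
    # p-value: lower-tail area of the standard normal curve
    p_value = NormalDist().cdf(z)
    # Reject only when the p-value is below the significance level
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    return z, p_value, decision

# The May longevity example: n = 1600, sample mean 77.5, H0: m = 77.8
z, p_value, decision = lower_tail_z_test(77.5, 77.8, 4.0, 1600)
print(f"Z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

Note that the function never returns "accept H0". When the p-value is not below a, the only available conclusion is "fail to reject H0".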