Chapter 9, Sampling Distributions, has one formula, plus some jargon. Chapter 10 is pretty much all jargon. Chapter 11 is practical.
Chapter 9 assumes we know μ and σ. If that were true, we would not need data! Chapter 10 assumes we know σ but do not know μ. Very unrealistic. In Chapter 11, we tackle the real-world case, where μ and σ are both unknown.
New formula:
Z = (x̄ - μ)/[σ/sqrt(n)], where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size
| parameter | statistic |
|---|---|
| property of entire population | estimate based on sample |
| μ | x̄ |
| σ | s |
| unknown to investigator | found by investigator |
| How many people meant to vote for Gore in Florida in 2000 | vote tabulations |
| How much hemoglobin is in your blood | hemoglobin in a sample |
Statistical estimates of true parameters will have errors. Unbiased means that on average they are right. If the expected value of x̄ is μ, then x̄ is an unbiased estimate of the true, unknown mean.
Sex surveys tend to produce biased estimates. For example, a magazine will report on heterosexual relationships and say that men have 5 times as many sex partners as women. But if an omniscient voyeur knew the true number of relationships, n, the number of males, m, and the number of females, f, the voyeur would know the average number of sex partners per male, n/m, and the average number of sex partners per female, n/f. Since m and f are about the same, one average cannot be five times the other. Either the estimate that males report is too high, the estimate that females report is too low, or both.
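The voyeur's argument is pure arithmetic: every heterosexual relationship is counted once for a man and once for a woman, so the two averages share the same numerator. A minimal sketch, using made-up population counts (1,000 men, 1,020 women, 3,000 relationships are hypothetical numbers chosen only for illustration):

```python
def average_partners(relationships: int, males: int, females: int) -> tuple[float, float]:
    """Return (average partners per male, average partners per female).

    Each heterosexual relationship involves exactly one man and one woman,
    so the same total is divided by each group's size.
    """
    return relationships / males, relationships / females


# Hypothetical counts, for illustration only.
male_avg, female_avg = average_partners(3000, 1000, 1020)
print(round(male_avg, 2), round(female_avg, 2))   # 3.0 2.94
# The ratio of the averages is just f/m, nowhere near 5 when m and f are close.
print(round(male_avg / female_avg, 2))            # 1.02
```

Whatever total you plug in, the ratio of the two averages is f/m, which is why a reported ratio of 5 points to biased self-reports.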
Exit polls in 2004
If you shoot at a target, even though you are unbiased, you may still miss. The amount by which you miss is sampling error.
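A quick simulation can show both ideas at once: the sample mean is unbiased (its average over many samples sits on the target), yet any single sample mean can miss. This sketch assumes a normal population with μ = 64 and σ = 3 and samples of size 9; those values are illustrative choices, not data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

MU, SIGMA, N = 64.0, 3.0, 9   # assumed population mean, SD, and sample size


def sample_mean(n: int) -> float:
    """Draw n observations from the assumed normal population and average them."""
    return statistics.fmean(random.gauss(MU, SIGMA) for _ in range(n))


means = [sample_mean(N) for _ in range(20_000)]

# Unbiased: the average of many sample means lands very close to MU ...
print(round(statistics.fmean(means), 2))
# ... but individual sample means still scatter around MU: sampling error.
print(round(min(means), 1), round(max(means), 1))
```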
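The central limit theorem states that if you use the sample mean, x̄, to estimate the unknown population mean, μ, then as the sample size gets large:
- the distribution of x̄ becomes approximately normal;
- the mean of x̄ is μ;
- the standard deviation of x̄ is σ/sqrt(n).
Note that the underlying population does not have to be normal for the central limit theorem to apply.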
The last point of the theorem states that the standard deviation of the sample mean is:
σ_x̄ = σ/sqrt(n)
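To see that the population need not be normal, here is a sketch that samples from a deliberately skewed population. The choice of an exponential distribution with rate 1 (so μ = σ = 1) and samples of size 36 is an assumption for illustration; the standard deviation of the simulated sample means should come out close to σ/sqrt(n) = 1/6.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the sketch is reproducible

# A skewed, non-normal population: exponential with rate 1,
# whose mean and standard deviation are both 1.
MU = SIGMA = 1.0
n = 36

means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
         for _ in range(20_000)]

print(round(statistics.fmean(means), 3))   # close to MU
print(round(statistics.stdev(means), 3))   # close to SIGMA / sqrt(n)
print(round(SIGMA / math.sqrt(n), 3))      # 0.167
```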
Remember the one formula from this chapter:
Z = (x̄ - μ)/[σ/sqrt(n)]
- So, if we take a sample of 9 girls from a population with a mean height of 64 inches and a standard deviation of 3 inches, what is the probability that the average in our sample will be less than 63 inches?
Z = (x̄ - μ)/[σ/sqrt(n)] = (63 - 64)/[3/sqrt(9)] = -1.0, and normcdf(-100, -1.0) = .16, or 16 percent.
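The same calculation for the 9-girl example, sketched in Python. Here `statistics.NormalDist().cdf` stands in for the calculator's normcdf(-100, z); using -100 as the lower bound on the calculator is just a practical stand-in for minus infinity. The `z_score` helper is a hypothetical name for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_score(xbar: float, mu: float, sigma: float, n: int) -> float:
    """Z = (xbar - mu) / (sigma / sqrt(n)) for a sample mean."""
    return (xbar - mu) / (sigma / sqrt(n))


z = z_score(63, 64, 3, 9)
p = NormalDist().cdf(z)    # standard normal area to the left of z
print(z, round(p, 2))      # -1.0 0.16
```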
- How large a sample would you need to make the probability of getting an average below 63 inches less than .05? Case of the missing sigma.
invnorm(.05) = -1.64
-1.64 = (63 - 64)/[3/sqrt(n)], so sqrt(n) = (1.64)(3) = 4.92 and n = 24.2. Since a sample size must be a whole number, round up: n = 25.
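Solving for n is just the Z formula rearranged: sqrt(n) = Z·σ/(cutoff - μ), then round up. A sketch, assuming the cutoff is below μ and the target probability is under .5 (so both Z and cutoff - μ are negative); `NormalDist().inv_cdf` plays the role of invnorm, and `min_sample_size` is a hypothetical helper name.

```python
from math import ceil, sqrt
from statistics import NormalDist


def min_sample_size(cutoff: float, mu: float, sigma: float, alpha: float) -> int:
    """Smallest n with P(xbar < cutoff) <= alpha, assuming cutoff < mu and alpha < .5."""
    z = NormalDist().inv_cdf(alpha)        # about -1.645 for alpha = .05 (invnorm)
    root_n = z * sigma / (cutoff - mu)     # solve z = (cutoff - mu) / (sigma / sqrt(n))
    return ceil(root_n ** 2)               # n must be a whole number, so round up


n = min_sample_size(63, 64, 3, 0.05)
print(n)                                                       # 25
print(round(NormalDist().cdf((63 - 64) / (3 / sqrt(n))), 3))   # just under .05
```

Checking n = 24 instead gives a probability slightly above .05, which is why the answer rounds up to 25 rather than down.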
true parameter is p; sample proportion is p̂
mean of p̂ is p
standard deviation of p̂ is the population standard deviation divided by the square root of the sample size, or sqrt[p(1-p)]/sqrt(n)
Note that with n binomial trials, the standard deviation of the number of successes is sqrt[np(1-p)], but the standard deviation of p̂ is sqrt[p(1-p)/n]. The difference between counting and taking an average is that you divide by the number of trials: p̂ = (number of successes)/n, so its standard deviation is sqrt[np(1-p)]/n = sqrt[p(1-p)/n].
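A small check of the count-versus-proportion relationship, using the same n = 64 and p = .8 as the free-throw example that follows. The function names are hypothetical, chosen for illustration.

```python
from math import sqrt


def sd_count(n: int, p: float) -> float:
    """Standard deviation of the number of successes in n trials."""
    return sqrt(n * p * (1 - p))


def sd_proportion(n: int, p: float) -> float:
    """Standard deviation of the sample proportion p-hat."""
    return sqrt(p * (1 - p) / n)


n, p = 64, 0.8
print(sd_count(n, p))        # SD of the count (about 3.2)
print(sd_count(n, p) / n)    # divide the count's SD by n ...
print(sd_proportion(n, p))   # ... and you get the SD of p-hat (about .05)
```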
Using the normal approximation: suppose that Kobe is an 80 percent free throw shooter. What is the probability that Kobe will hit no more than 70 percent of 64 free throws?
You could use binomcdf(64, 0.8, 44), since 70 percent of 64 is 44.8, so "no more than 70 percent" means at most 44 made shots.
Or, you can treat the distribution as approximately normal, with a mean of .8 and a standard deviation of sqrt[(.8)(1-.8)/64] = .05
Now calculate Z = (.70 - .80)/.05 = -2.0. Then normcdf(-100,-2) = .023, or a 2.3 percent probability that Kobe will hit less than 70 percent of 64 free throws.
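A sketch comparing the exact binomial answer with the normal approximation. The hand-rolled `binom_cdf` is a hypothetical helper meant to mirror the calculator's binomcdf(n, p, k); it sums binomial probabilities with `math.comb`. Since 70 percent of 64 is 44.8, the exact calculation counts at most 44 made shots.

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def binom_cdf(n: int, p: float, k: int) -> float:
    """P(X <= k) for X ~ Binomial(n, p); mirrors binomcdf(n, p, k)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


n, p, cutoff = 64, 0.8, 0.70
k = floor(cutoff * n)                  # 44 makes = "no more than 70 percent" of 64
exact = binom_cdf(n, p, k)

sd = sqrt(p * (1 - p) / n)             # .05
approx = NormalDist(p, sd).cdf(cutoff) # same as normcdf(-100, -2.0)

print(k, round(exact, 3), round(approx, 3))
```

The two answers should be in the same small neighborhood; the normal version is the one you can run backward to solve for n.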
The normal approximation is most useful when we need to solve for n, because binomcdf won't help with that.
The normal approximation works poorly for small n (less than 20) and for very high or very low p; we want both np and n(1-p) to be greater than 10.
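The rule of thumb above can be written as a one-line check. This is a sketch of that rule only (hypothetical function name); it is not a formal test of normality.

```python
def normal_approx_ok(n: int, p: float, threshold: float = 10) -> bool:
    """Rule of thumb: use the normal approximation only if np and n(1-p) both exceed 10."""
    return n * p > threshold and n * (1 - p) > threshold


print(normal_approx_ok(64, 0.8))   # True: np = 51.2, n(1-p) = 12.8
print(normal_approx_ok(20, 0.9))   # False: n(1-p) is only 2
```

For the Kobe example the approximation passes, but only barely on the n(1-p) side, which is what you would expect with p as high as .8.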
parameter vs. statistic
biased vs. unbiased
central limit theorem
using the formula
normal approximation to binomial