AP Statistics Audio Lectures

Sampling Distributions

by Arnold Kling

Chapter 9, Sampling Distributions, has one formula, plus some jargon. Chapter 10 is pretty much all jargon. Chapter 11 is practical.

Chapter 9 assumes we know μ and σ. If that were true, we would not need data! Chapter 10 assumes we know σ but do not know μ. Very unrealistic. In chapter 11, we tackle the real-world case, where μ and σ are unknown.

New formula:

Z = (x̄ - μ)/[σ/sqrt(n)], where x̄ is the sample mean, μ is the population mean, σ is the population standard deviation, and n = sample size

| parameter | statistic |
|---|---|
| property of entire population | estimate based on sample |
| μ | x̄ |
| σ | s |
| unknown to investigator | found by investigator |
| How many people meant to vote for Gore in Florida in 2000 | vote tabulations |
| How much hemoglobin is in your blood | hemoglobin in a sample |

Statistical estimates of true parameters will have errors. Unbiased means that on average they are right. If the expected value of x̄ equals μ, then x̄ is an unbiased estimate of the true, unknown mean.

Sex surveys tend to produce biased estimates. For example, a magazine will report on heterosexual relationships and say that men have 5 times as many sex partners as women. But if an omniscient voyeur knew the true number of relationships, n, the number of males, m, and the number of females, f, the voyeur would know the average number of sex partners per male, n/m, and the average number of sex partners per female, n/f. Since m and f are about the same, one average cannot be five times the other. Either the number that males report is too high, the number that females report is too low, or both.

Exit polls in 2004

If you shoot at a target, even though you are unbiased, you may still miss. The amount by which you miss is sampling error.

The central limit theorem states that if you use the sample mean, x̄, to estimate the unknown population mean, μ, then as the sample size gets large:

- the distribution of x̄ will be approximately normal
- the expected value of x̄ will equal μ
- the variance of x̄ will get smaller
- the standard deviation of x̄ will equal the population standard deviation divided by the square root of the sample size

Note that the underlying population does *not* have to be normal for the central limit theorem to apply.
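A quick simulation can make this concrete. The sketch below is only an illustration, not part of the lecture: it assumes an exponential population (mean 1, standard deviation 1, clearly not normal), and the function name `simulate_sample_means`, the sample size of 36, and the number of trials are all arbitrary choices.

```python
import random
import statistics

def simulate_sample_means(n, trials=20000, seed=0):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

means = simulate_sample_means(n=36)

# The average of the sample means should land near mu = 1.
print(round(statistics.fmean(means), 3))

# Their standard deviation should land near sigma/sqrt(n) = 1/6.
print(round(statistics.stdev(means), 3))
```

A histogram of `means` would look roughly bell-shaped even though the population itself is heavily skewed.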

The last point of the theorem states that:

σ_x̄ = σ/sqrt(n)

Remember the one formula from this chapter:

Z = (x̄ - μ)/[σ/sqrt(n)]

So, if we take a sample of 9 girls from a population with a mean height of 64 inches and a standard deviation of 3 inches, what is the probability that the average in our sample will be less than 63 inches?

Z = (x̄ - μ)/[σ/sqrt(n)] = (63 - 64)/[3/sqrt(9)] = -1.0

normcdf(-100, -1.0) = .16, or 16 percent.
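For anyone without the calculator handy, the height example can be checked with Python's standard library. This is a sketch only: `prob_sample_mean_below` is a made-up helper name, and `NormalDist().cdf` stands in for the calculator's normcdf with a very low left endpoint.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(x_bar, mu, sigma, n):
    """Return (z, P(sample mean < x_bar)) using the one formula:
    z = (x_bar - mu) / (sigma / sqrt(n))."""
    z = (x_bar - mu) / (sigma / sqrt(n))
    return z, NormalDist().cdf(z)  # area to the left of z under the standard normal

z, p = prob_sample_mean_below(x_bar=63, mu=64, sigma=3, n=9)
print(z, round(p, 4))  # -1.0 0.1587
```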

How large a sample would you need to make the probability of getting an average less than 63 inches less than .05? Case of the missing sigma.

invnorm(.05) = -1.64

-1.64 = (63 - 64)/[3/sqrt(n)]; sqrt(n) = (1.64)(3) = 4.92; n = 4.92^2 = 24.2, so round up to n = 25
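The same solve-for-n steps can be sketched in Python. The helper name `min_sample_size` is hypothetical, `inv_cdf` plays the role of invnorm, and the code assumes the cutoff is below the population mean and the tail probability is below .5, as in the example.

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(x_bar, mu, sigma, tail_prob):
    """Smallest n for which P(sample mean < x_bar) is below tail_prob.
    Assumes x_bar < mu and tail_prob < 0.5."""
    z = NormalDist().inv_cdf(tail_prob)   # about -1.645 for tail_prob = .05
    root_n = z * sigma / (x_bar - mu)     # rearranged from z = (x_bar - mu)/(sigma/sqrt(n))
    return ceil(root_n ** 2)              # round up to a whole number of observations

print(min_sample_size(x_bar=63, mu=64, sigma=3, tail_prob=0.05))  # 25
```

Rounding up matters here: with n = 24 the probability is still slightly above .05.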

true parameter is p; sample proportion is p^

mean of p^ is p

standard deviation of p^ is population standard deviation divided by the square root of the sample size, or sqrt[p(1-p)]/sqrt(n)

Note that with n binomial trials, the standard deviation of the number of successes is sqrt[np(1-p)], but the standard deviation of p^ is sqrt[p(1-p)/n]. The difference between counting and taking an average is that you divide by the number of trials.
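To see the count-versus-proportion difference in numbers, here is a small sketch. The function names are illustrative, and n = 64 with p = 0.8 is just a convenient choice of values.

```python
from math import sqrt

def sd_count(n, p):
    """Standard deviation of the number of successes in n binomial trials."""
    return sqrt(n * p * (1 - p))

def sd_proportion(n, p):
    """Standard deviation of the sample proportion p^ (the count divided by n)."""
    return sqrt(p * (1 - p) / n)

n, p = 64, 0.8
print(round(sd_count(n, p), 4))       # 3.2
print(round(sd_proportion(n, p), 4))  # 0.05, which is 3.2 / 64
```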

Using the normal approximation. Suppose that Kobe is an 80 percent free throw shooter. What is the probability that Kobe will hit no more than 70 percent in 64 free throws?

You could use binomcdf(64, 0.8, 44), because 70 percent of 64 is 44.8, so "no more than 70 percent" means at most 44 made shots.

Or, you can treat the distribution as approximately normal, with a mean of .8 and a standard deviation of sqrt[(.8)(1-.8)/64] = .05

Now calculate Z = (.70 - .80)/.05 = -2.0. Then normcdf(-100, -2) = .023, or a 2.3 percent probability that Kobe will hit no more than 70 percent of 64 free throws.
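The two routes for the free throw question can be compared side by side. This is a sketch under stated assumptions: `binom_cdf` is a hand-written stand-in for the calculator's binomcdf, and "no more than 70 percent" is read as at most 44 made shots out of 64.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n, p = 64, 0.8
k = int(0.70 * n)  # 70 percent of 64 is 44.8, so at most 44 made shots

exact = binom_cdf(n, p, k)

# Normal approximation: p^ has mean p and standard deviation sqrt(p(1-p)/n) = .05
approx = NormalDist(mu=p, sigma=sqrt(p * (1 - p) / n)).cdf(0.70)

print(k)                 # 44
print(round(approx, 3))  # 0.023
print(exact)             # exact binomial probability, for comparison
```

The two answers will not match exactly, since the normal curve is only an approximation to the binomial.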

Normal approximation most useful when we need to solve for n, because binomcdf won't help with that.

Normal approximation works poorly for small n (less than 20) and for very high or very low p--want np and n(1-p) to be greater than 10.
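The rule of thumb is easy to turn into a check. The function name `normal_approx_ok` and the default threshold of 10 are just a restatement of the rule above, not a standard library feature.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both np and n(1-p) should exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

print(normal_approx_ok(64, 0.8))  # True: np = 51.2 and n(1-p) = 12.8
print(normal_approx_ok(20, 0.8))  # False: n(1-p) = 4
```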

parameter vs. statistic

biased vs. unbiased

central limit theorem

using the formula

normal approximation to binomial