Stats Audio Lectures--sampling distributions

AP Statistics Audio Lectures
Sampling Distributions
by Arnold Kling


Chapter 9, Sampling Distributions, has one formula, plus some jargon. Chapter 10 is pretty much all jargon. Chapter 11 is practical.

Chapter 9 assumes we know μ and σ, the population mean and standard deviation. If that were true, we would not need data! Chapter 10 assumes we know σ but do not know μ. Very unrealistic. In Chapter 11, we tackle the real-world case, where μ and σ are unknown.

New formula:

Z = (X - μ)/[σ/sqrt(n)], where X is the sample mean, μ is the population mean, σ is the population standard deviation, and n is the sample size

parameter vs. statistic

parameter	statistic
property of entire population	estimate based on sample
μ (population mean)	X (sample mean)
σ (population standard deviation)	s (sample standard deviation)
unknown to investigator	found by investigator
How many people meant to vote for Gore in Florida in 2000	vote tabulations
How much hemoglobin is in your blood	hemoglobin in a sample

unbiased estimate

Statistical estimates of true parameters will have errors. Unbiased means that on average they are right. If the expected value of X equals the population mean μ, then X is an unbiased estimate of the true, unknown mean.
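One way to see what "on average they are right" means is to simulate it. The sketch below is an illustration only, not part of the lecture: the population values, the sample size of 5, and the helper name average_of_sample_means are all made up. It draws many samples with replacement and averages their sample means.

```python
import random
import statistics

def average_of_sample_means(population, sample_size, repetitions, seed=0):
    """Draw many samples (with replacement) and average their sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(repetitions)
    ]
    return statistics.fmean(means)

# Hypothetical, deliberately lopsided population (not from the lecture).
population = [1, 2, 2, 3, 3, 3, 10, 20]
true_mean = statistics.fmean(population)  # 5.5

average_estimate = average_of_sample_means(
    population, sample_size=5, repetitions=20000
)
print(f"true mean = {true_mean}, average of sample means = {average_estimate:.3f}")
```

Any single sample mean can be well off the mark, but the average over many samples should land close to the true mean. That is the sense in which the sample mean is unbiased.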

Sex surveys tend to produce biased estimates. For example, a magazine will report on heterosexual relationships and say that men have 5 times as many sex partners as women. But if an omniscient voyeur knew the true number of relationships, n, the number of males, m, and the number of females, f, the voyeur would know the average number of sex partners per male, n/m, and the average number of sex partners per female, n/f. Since m and f are about the same, one average cannot be five times the other. Either the estimate that males report is high, the estimate that females report is low, or both.
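The voyeur's arithmetic can be checked in a few lines. The counts below are invented purely for illustration. The point is that the ratio of the two averages depends only on the ratio of females to males, whatever the number of relationships is.

```python
def average_partners(relationships, males, females):
    """Return (average partners per male, average partners per female)."""
    return relationships / males, relationships / females

# Hypothetical counts, chosen only so that males and females are about equal.
male_avg, female_avg = average_partners(
    relationships=1_200_000, males=500_000, females=510_000
)

# (n/m) / (n/f) simplifies to f/m, so n drops out of the ratio entirely.
ratio = male_avg / female_avg
print(f"male average = {male_avg:.2f}, female average = {female_avg:.2f}")
print(f"ratio of averages = {ratio:.2f}")
```

With roughly equal numbers of males and females the ratio stays near 1, so a reported ratio of 5 has to come from misreporting.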

Exit polls in 2004

Central Limit Theorem

If you shoot at a target, even though you are unbiased, you may still miss. The amount by which you miss is sampling error.

The central limit theorem states that if you use the sample mean, X, to estimate the unknown population mean μ, then as the sample size gets large:

the distribution of X will be approximately normal
the expected value of X will equal μ
the variance of X will get smaller
the standard deviation of X will equal the population standard deviation divided by the square root of the sample size

Note that the underlying population does not have to be normal for the central limit theorem to apply.

The last point of the theorem states that:

σ_X = σ/sqrt(n), where σ_X is the standard deviation of the sample mean and σ is the population standard deviation
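A small simulation can make the last point concrete. This is a sketch under assumptions that are not in the lecture: the population is exponential with mean 1 and standard deviation 1 (clearly not normal), and the sample sizes 4, 16, and 64 and the helper name sd_of_sample_means are arbitrary choices.

```python
import math
import random
import statistics

def sd_of_sample_means(sample_size, repetitions=4000, seed=1):
    """Simulate many sample means and return their standard deviation."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        # expovariate(1.0) draws from an exponential population: mean 1, sd 1.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.pstdev(means)

population_sd = 1.0
results = {}
for n in (4, 16, 64):
    simulated = sd_of_sample_means(n)
    predicted = population_sd / math.sqrt(n)  # population sd / sqrt(n)
    results[n] = (simulated, predicted)
    print(f"n = {n:2d}: simulated sd = {simulated:.3f}, predicted = {predicted:.3f}")
```

Each time the sample size is multiplied by 4, the standard deviation of the sample mean should be cut roughly in half, even though the underlying population is skewed.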

Using the Formula

Remember the one formula from this chapter:

Z = (X - μ)/[σ/sqrt(n)]

So, if we take a sample of 9 girls from a population with a mean height of 64 inches and a standard deviation of 3 inches, what is the probability that the average in our sample will be less than 63 inches?

Z = (X - μ)/[σ/sqrt(n)] = (63 - 64)/[3/sqrt(9)] = -1.0. Then normcdf(-100, -1.0) = .16, or 16 percent.
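For readers without the calculator, the same calculation can be sketched in Python. NormalDist from the standard library's statistics module stands in for normcdf here. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 64   # inches
population_sd = 3      # inches
sample_size = 9
cutoff = 63            # we want P(sample mean < 63)

# Standard deviation of the sample mean: population sd / sqrt(n).
standard_error = population_sd / sqrt(sample_size)

# Z-score of the cutoff, using the chapter's one formula.
z = (cutoff - population_mean) / standard_error

# Area under the standard normal curve to the left of z.
probability = NormalDist().cdf(z)

print(f"Z = {z:.1f}, probability = {probability:.2f}")
```

NormalDist().cdf(z) gives the area from negative infinity up to z, which is what normcdf(-100, z) approximates on the calculator.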

How large a sample would you need to make the probability of getting an average less than 63 inches less than .05? Case of the missing sigma.

invnorm(.05) = -1.64

-1.64 = (63 - 64)/[3/sqrt(n)], so sqrt(n) = 4.92 and n = 24.2. Round up to n = 25.
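The same solve-for-n steps can be sketched in Python, with NormalDist().inv_cdf standing in for invnorm. The function name required_sample_size is made up for this example, and the sketch assumes the cutoff lies below the population mean and the tail probability is below .5.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_size(population_mean, population_sd, cutoff, tail_prob):
    """Smallest n with P(sample mean < cutoff) below tail_prob.

    Assumes cutoff < population_mean and tail_prob < 0.5.
    """
    z = NormalDist().inv_cdf(tail_prob)  # negative, about -1.645 for .05
    # Rearranging z = (cutoff - mean) / (sd / sqrt(n)) to solve for sqrt(n).
    root_n = z * population_sd / (cutoff - population_mean)
    return ceil(root_n ** 2)  # round up to a whole number of girls

n_needed = required_sample_size(64, 3, 63, 0.05)

def prob_below(n):
    """P(sample mean < 63) for a sample of size n from this population."""
    return NormalDist().cdf((63 - 64) / (3 / sqrt(n)))

print(f"n needed = {n_needed}")
print(f"P at n = {n_needed}: {prob_below(n_needed):.4f}")
print(f"P at n = {n_needed - 1}: {prob_below(n_needed - 1):.4f}")
```

Rounding up matters: one fewer girl in the sample leaves the probability just above .05.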

Normal Approximation to Binomial

true parameter is p; sample proportion is p^

mean of p^ is p

standard deviation of p^ is population standard deviation divided by the square root of the sample size, or sqrt[p(1-p)]/sqrt(n)

Note that with n binomial trials, the standard deviation of the number of successes is sqrt[np(1-p)], but the standard deviation of p^ is sqrt[p(1-p)/n]. The difference between counting successes and taking an average is that for the average you divide by the number of trials.
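A quick numerical check of the two formulas, using n = 64 and p = .8 as illustrative values. The helper names are made up.

```python
from math import sqrt

def sd_of_count(n, p):
    """Standard deviation of the number of successes in n trials."""
    return sqrt(n * p * (1 - p))

def sd_of_proportion(n, p):
    """Standard deviation of the sample proportion in n trials."""
    return sqrt(p * (1 - p) / n)

n, p = 64, 0.8
count_sd = sd_of_count(n, p)
proportion_sd = sd_of_proportion(n, p)

print(f"sd of count = {count_sd:.2f}")
print(f"sd of proportion = {proportion_sd:.3f}")
print(f"sd of count divided by n = {count_sd / n:.3f}")
```

Dividing the standard deviation of the count by n reproduces the standard deviation of the proportion.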

Using the normal approximation. Suppose that Kobe is an 80 percent free throw shooter. What is the probability that Kobe will hit no more than 70 percent in 64 free throws?

You could use binomcdf(64, 0.8, 44), because 70 percent of 64 is 44.8, so hitting no more than 70 percent means making 44 or fewer.

Or, you can treat the distribution as approximately normal, with a mean of .8 and a standard deviation of sqrt[(.8)(1-.8)/64] = .05

Now calculate Z = (.70 - .80)/.05 = -2.0. Then normcdf(-100, -2) = .023, or a 2.3 percent probability that Kobe will hit no more than 70 percent of 64 free throws.
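To compare the two routes side by side, here is a sketch in Python. The function binom_cdf is a hand-written stand-in for the calculator's binomcdf, not a library call, and the cutoff of 44 made shots comes from rounding 70 percent of 64 (44.8) down to a whole number.

```python
from math import comb, floor, sqrt
from statistics import NormalDist

def binom_cdf(n, p, k):
    """P(at most k successes in n independent trials with success chance p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 64, 0.8
max_makes = floor(0.70 * n)  # 70 percent of 64 is 44.8, so at most 44 makes

# Exact binomial probability of 44 or fewer makes.
exact = binom_cdf(n, p, max_makes)

# Normal approximation: p^ has mean p and sd sqrt(p(1-p)/n).
sd_p_hat = sqrt(p * (1 - p) / n)
approx = NormalDist(mu=p, sigma=sd_p_hat).cdf(0.70)

print(f"max makes = {max_makes}")
print(f"exact binomial = {exact:.4f}")
print(f"normal approximation = {approx:.4f}")
```

The two answers should be of similar size but not identical, since the normal curve is only an approximation to the binomial.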

Normal approximation most useful when we need to solve for n, because binomcdf won't help with that.

Normal approximation works poorly for small n (less than 20) and for very high or very low p--want np and n(1-p) to be greater than 10.
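The rule of thumb is easy to turn into a check. The function name normal_approx_ok is made up, and the threshold of 10 is the one stated above.

```python
def normal_approx_ok(n, p, threshold=10):
    """Rule of thumb: both np and n(1-p) should exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold

# Kobe example: np = 51.2 and n(1-p) = 12.8, so the approximation is reasonable.
kobe_ok = normal_approx_ok(64, 0.8)

# Same p with only 20 shots: n(1-p) = 4, too few expected misses.
small_n_ok = normal_approx_ok(20, 0.8)

# Large n but very low p: np = 5, too few expected successes.
low_p_ok = normal_approx_ok(500, 0.01)

print(kobe_ok, small_n_ok, low_p_ok)
```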

Summary

parameter vs. statistic

biased vs. unbiased

central limit theorem

using the formula

normal approximation to binomial