AP Statistics Lectures

by Arnold Kling

Sample Standard Deviation and Degrees of Freedom

Why is it that, to estimate the population variance, we divide the sum of squared deviations from the sample mean by n-1 rather than n? Equivalently, why do we take the average squared deviation from the sample mean, S(x_{i} - x̄)^{2}/n, and multiply it by a "correction factor" of n/(n-1)?

The answer is that you lose a degree of freedom when you use the sample to estimate the mean. One way to think of this is that if you tell me the sample mean and then tell me the deviation of every observation but one, I can tell you the value of the last observation without looking at it. That is because I know that the sum of the deviations from the mean must equal zero.
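This constraint is easy to verify numerically. The sketch below uses made-up data: the deviations from the sample mean sum to zero, so the mean plus the first n-1 deviations pin down the last observation.

```python
# Illustrative data (not from the lecture): given the sample mean and all
# deviations but one, the last observation is determined, because the
# deviations from the sample mean must sum to zero.
x = [4.0, 7.0, 5.0, 8.0]
n = len(x)
mean = sum(x) / n
deviations = [xi - mean for xi in x]

# The deviations from the sample mean sum to zero (up to rounding)...
assert abs(sum(deviations)) < 1e-12

# ...so the last observation is recoverable from the mean and the
# first n-1 deviations alone, without looking at it:
last = mean - sum(deviations[:-1])
assert last == x[-1]
```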

You would not lose a degree of freedom if you had a completely independent estimate of the mean. If you were given the true mean and calculated the deviations of your sample values of x from the true mean, then you could divide by n. If you were given an estimate of the mean from an independent sample, say ȳ, and calculated the deviations of your sample values of x from ȳ, then you could divide by n.

To see why you cannot divide by n when you use the sample mean, expand the first term of the sum of squared deviations. That is, expand (x_{1} - x̄)^{2}. You get

x_{1}^{2} - 2x_{1}x̄ + x̄^{2} = x_{1}^{2} - 2x_{1}S(x_{i}/n) + [S(x_{i}/n)]^{2}
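The expansion can be checked numerically. The sketch below uses made-up data: expanding (x_{1} - x̄)^{2} with x̄ = S(x_{i})/n reproduces the three terms shown.

```python
# Numeric check of the expansion (illustrative data, not from the lecture):
# (x_1 - xbar)^2 equals x_1^2 - 2*x_1*xbar + xbar^2, where xbar = sum(x)/n.
x = [3.0, 5.0, 10.0]
n = len(x)
xbar = sum(x) / n  # the sample mean, written S(x_i/n) in the text

left = (x[0] - xbar) ** 2
right = x[0] ** 2 - 2 * x[0] * xbar + xbar ** 2
assert abs(left - right) < 1e-12
```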

The x's in your sample are uncorrelated with one another, but x_{1} is certainly correlated with itself. As a result, when we take the expected value, the middle term will be -2x_{1}^{2}/n and the last term will be +x_{1}^{2}/n, so that netting out, we have

x_{1}^{2} - 2x_{1}^{2}/n + x_{1}^{2}/n = [(n-1)/n]x_{1}^{2}

We see the factor [(n-1)/n] emerging. This is the bias in the uncorrected sample variance. What it means is that the uncorrected sample variance, S(x_{i} - x̄)^{2}/n, has to be multiplied by [n/(n-1)] in order to produce an unbiased estimate of the true population variance, σ^{2}.
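The bias factor can be seen in a quick simulation. This is an illustrative sketch, not part of the lecture: it assumes normally distributed data with true mean 0 and true variance 1, and shows that the uncorrected estimator averages about (n-1)/n, while multiplying by n/(n-1) removes the bias.

```python
# Simulation sketch (assumptions: normal data, true mean 0, true variance 1;
# sample size and trial count are illustrative). The uncorrected estimator
# S(x_i - xbar)^2 / n averages about (n-1)/n of the true variance, and
# multiplying it by n/(n-1) recovers an unbiased estimate.
import random

random.seed(1)
n, trials = 5, 200_000
total_uncorrected = 0.0
for _ in range(trials):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(x) / n
    total_uncorrected += sum((xi - xbar) ** 2 for xi in x) / n

avg_uncorrected = total_uncorrected / trials
print(avg_uncorrected)                # close to (n-1)/n = 0.8
print(avg_uncorrected * n / (n - 1))  # close to the true variance, 1.0
```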