AP Statistics Lectures

by Arnold Kling

Normal Distribution

The normal distribution is almost the opposite of the uniform distribution. The uniform distribution is mathematically simple but occurs rarely in nature. The normal distribution is mathematically complex but occurs frequently in nature.

With the uniform distribution, the probability of X falling within an interval depends on the size of the interval but not on its location. If X is distributed uniformly between -1 and 1, then the probability of X falling between 0.8 and 1.0 is the same as the probability of X falling between -0.1 and 0.1.

With the normal distribution, intervals of the same width can have very different probabilities. Suppose that X is distributed normally with mean 0. Then the probability of X falling between -0.1 and 0.1 will be greater than the probability of X falling between 0.8 and 1.0.
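To make the comparison concrete, here is a minimal sketch in Python that computes both pairs of probabilities. The text above specifies only the mean of the normal variable, so the sketch assumes a standard deviation of 1. The function names are my own, chosen for illustration.

```python
import math

def uniform_prob(a, b, lo=-1.0, hi=1.0):
    """P(a <= X <= b) when X is uniform on [lo, hi]."""
    a, b = max(a, lo), min(b, hi)      # clip the interval to the support
    return max(b - a, 0.0) / (hi - lo)

def normal_cdf(x, m=0.0, s=1.0):
    """P(X <= x) when X is normal with mean m and standard deviation s."""
    return 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))

def normal_prob(a, b, m=0.0, s=1.0):
    """P(a <= X <= b) for a normal X (s = 1 is an assumption, not from the text)."""
    return normal_cdf(b, m, s) - normal_cdf(a, m, s)

for a, b in [(-0.1, 0.1), (0.8, 1.0)]:
    print(f"[{a}, {b}]  uniform: {uniform_prob(a, b):.4f}  "
          f"normal: {normal_prob(a, b):.4f}")
```

Under these assumptions the uniform probabilities are both 0.1, while the normal probabilities come out to roughly 0.08 for the interval around zero and roughly 0.05 for the interval from 0.8 to 1.0.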

At this point, I recommend reading the discussion of the normal distribution that can be found at David Lane's Hyperstat site.

The probability density function of a normal variable x with mean m and standard deviation s is

f(x) = (2πs^{2})^{-1/2} e^{-(x-m)^{2}/(2s^{2})}

To simplify things a bit, statisticians prefer to work with what is called the "standard" normal distribution, with mean 0 and standard deviation 1. In that case, the formula simplifies to

f(x) = (2π)^{-1/2} e^{-x^{2}/2}
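If it helps to see the two formulas as code, here is a minimal sketch. The function names are hypothetical. The sketch also adds up the area under the standard normal curve with a crude rectangle sum, to check that the total probability comes out close to 1.

```python
import math

def normal_pdf(x, m=0.0, s=1.0):
    """Density of a normal variable with mean m and standard deviation s."""
    coeff = (2.0 * math.pi * s ** 2) ** -0.5
    return coeff * math.exp(-((x - m) ** 2) / (2.0 * s ** 2))

def standard_normal_pdf(x):
    """The same density in the special case m = 0, s = 1."""
    return (2.0 * math.pi) ** -0.5 * math.exp(-(x ** 2) / 2.0)

# Rectangle-sum approximation of the area under the curve on [-8, 8].
# The density outside that range is negligibly small.
step = 0.001
area = sum(standard_normal_pdf(-8.0 + i * step) for i in range(16001)) * step

print(f"height of the curve at 0: {standard_normal_pdf(0.0):.4f}")
print(f"approximate total area:   {area:.6f}")
```

Notice that the density is a height, not a probability. Probabilities come from areas under the curve, which is why the total area has to equal 1.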

Any normal random variable X with mean m_{x} and standard deviation s_{x} can be converted to a standard normal by first subtracting m_{x} from all values of X and then dividing by s_{x}.
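Here is a short sketch of that conversion. The heights are made-up numbers for illustration, and the sketch treats this small list as if it were the whole population, so that m_x and s_x are simply the mean and standard deviation of the list.

```python
import statistics

def standardize(values, m, s):
    """Subtract the mean m from each value, then divide by the standard deviation s."""
    return [(x - m) / s for x in values]

# Hypothetical heights in centimeters (not real data).
heights = [150.0, 160.0, 165.0, 170.0, 175.0, 180.0, 190.0]

m_x = statistics.fmean(heights)    # mean of the list
s_x = statistics.pstdev(heights)   # standard deviation of the list
z = standardize(heights, m_x, s_x)

print("standardized values:", [round(v, 2) for v in z])
print("mean of z:", round(statistics.fmean(z), 6))
print("standard deviation of z:", round(statistics.pstdev(z), 6))
```

After the conversion the values have mean 0 and standard deviation 1, so any question about X can be restated as a question about the standard normal.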

The normal distribution is important for two reasons.

- Many variables that we observe are distributed normally.
Physical characteristics of plants and animals, such as height and weight, fit a normal distribution quite well. Stock returns over short periods are often modeled as approximately normal, although extreme moves occur more often than the normal distribution predicts. Many types of prediction errors and measurement errors tend to be distributed normally.

- The central limit theorem.
The central limit theorem shows that a particular type of measurement error will be approximately normally distributed when the sample is large enough. This type of measurement error is called sampling error.

Suppose that you want to know the average income of families living in the 20902 zip code. If you ask every family their income, you will get the right answer (let us assume that people respond accurately). However, what happens when you only ask a sample of, say, 100 families?

If you take a random sample, then your estimate of the average income in zip code 20902 will have an error, called sampling error. The error will be distributed approximately normally with a mean of zero and a standard error that is inversely proportional to the square root of the size of the sample. To cut the standard error in half, you need a sample four times as large.
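A small simulation can illustrate sampling error. The sketch below is only an illustration under stated assumptions: the population of 20,000 family incomes is invented (drawn from a lognormal distribution with parameters I picked arbitrarily), and the sample sizes of 100 and 400 are chosen so that one sample is four times as large as the other.

```python
import random
import statistics

rng = random.Random(20902)  # fixed seed so the run is repeatable

# Hypothetical population of family incomes (made-up values, not real data).
population = [rng.lognormvariate(11.0, 0.5) for _ in range(20_000)]
true_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

def sampling_errors(n, trials=2000):
    """Draw `trials` random samples of n families each and
    return (sample mean - true mean) for every sample."""
    return [statistics.fmean(rng.sample(population, n)) - true_mean
            for _ in range(trials)]

errors_100 = sampling_errors(100)
errors_400 = sampling_errors(400)

se_100 = statistics.pstdev(errors_100)
se_400 = statistics.pstdev(errors_400)

# Share of the n = 100 errors within one standard error of zero;
# for a normal distribution this share is about 68 percent.
within_one = sum(abs(e) <= se_100 for e in errors_100) / len(errors_100)

print(f"mean sampling error (n=100): {statistics.fmean(errors_100):.1f}")
print(f"standard error, n=100: {se_100:.1f}")
print(f"standard error, n=400: {se_400:.1f}")
print(f"ratio: {se_100 / se_400:.2f}")
print(f"share within one standard error: {within_one:.2f}")
```

If the sketch behaves as the theory suggests, the average error should be close to zero, the standard error for samples of 100 should be close to the population standard deviation divided by 10, and the ratio of the two standard errors should be close to 2, even though the incomes themselves are not normally distributed.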

The normal distribution is so important that we will spend a lot of time working with it. We will get to know it really well. It will be our friend.