Working with the Normal

AP Statistics Lectures

by Arnold Kling

Working with the Normal Distribution

Because data are so often approximately normally distributed, it is important to be able to work with the normal distribution. In practice, that means understanding how the mean and standard deviation relate to cumulative probability.

The Z transformation

For any random variable X, if we take the transformation Z = a + bX, we can find the mean and standard deviation of Z in terms of those of X. Writing m for the mean and s for the standard deviation,

m_Z = a + bm_X
s_Z = |b|s_X (which is simply bs_X when b is positive)

Suppose that we set b = 1/s_X and a = -m_X/s_X. Then, we have

Z = a + bX = -m_X/s_X + X/s_X = (X - m_X)/s_X

We call this particular choice of a and b the Z transformation. It is useful because the mean of Z is always 0 and the standard deviation of Z is always 1. (Substituting into the formulas above, m_Z = -m_X/s_X + m_X/s_X = 0 and s_Z = s_X/s_X = 1.) When a variable is distributed normally with mean 0 and standard deviation 1, we call it a standard normal variable. The Z transformation can be used to convert any normally distributed random variable to the standard normal distribution.
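
As a quick numerical check of that claim, the sketch below applies the Z transformation to a small data set and confirms that the result has mean 0 and standard deviation 1. It uses Python's standard statistics module. The five values of x are made up purely for illustration, and the population form of the standard deviation (pstdev) is an assumption of the sketch.

```python
from statistics import mean, pstdev

# Hypothetical data, invented only for this illustration.
x = [61.0, 63.5, 64.5, 66.0, 67.5]

m_x = mean(x)    # mean of X
s_x = pstdev(x)  # standard deviation of X (population form)

# Z transformation: Z = (X - m_X) / s_X
z = [(value - m_x) / s_x for value in x]

# Both results should match 0 and 1 up to floating-point rounding.
print(abs(mean(z)) < 1e-9)
print(abs(pstdev(z) - 1.0) < 1e-9)
```

The same check works for any data set whose values are not all identical, since the transformation does not depend on the particular numbers.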

In any statistical problem involving the normal distribution, we use the Z transformation to put the problem in the form of the standard normal distribution. We can then look up values of the standard normal distribution by using a table at the back of the book or using the calculator.
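
To see why restating a problem in standard normal form loses nothing, the sketch below computes the same probability two ways. It uses NormalDist from Python's standard library as a stand-in for the table or calculator. The mean of 10, standard deviation of 2, and cutoff of 13 are arbitrary values assumed for the example.

```python
from statistics import NormalDist

# Assumed parameters, chosen arbitrarily for illustration.
m_x, s_x = 10.0, 2.0
cutoff = 13.0

# Probability computed directly from the distribution of X.
p_direct = NormalDist(m_x, s_x).cdf(cutoff)

# Same probability after the Z transformation, using the standard normal.
z = (cutoff - m_x) / s_x
p_standard = NormalDist().cdf(z)

print(z)
print(round(p_direct, 4), round(p_standard, 4))
```

The two probabilities agree, which is what lets a single standard normal table serve every normal distribution.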

Three Intervals

The normal distribution is a continuous distribution. We have seen that with a continuous distribution we have to discuss the probability of a number falling within an interval. For example, if X is a normally distributed random variable, then the probability of any single exact value, such as P(X = 1.2), is zero, so it is not a useful quantity to ask about.

Here are some probabilities that we can evaluate for a normal distribution:

P(X <= 1.2)
P(X > 1.2)
P(-0.6 < X < 1.6)

To find P(X <= 1.2), we look at the interval from negative infinity to 1.2. To find P(X > 1.2), we can take 1 - P(X <= 1.2). To find P(-0.6 < X < 1.6), we take the probability of the interval from negative infinity to 1.6 and subtract the probability of the interval from negative infinity to -0.6.
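
The three kinds of interval can be sketched in a few lines of Python. The example evaluates the three probabilities listed above, under the added assumption that X is standard normal (the text does not give a mean or standard deviation for X). NormalDist and its cdf method come from Python's standard statistics module.

```python
from statistics import NormalDist

# Assumption: X is standard normal (mean 0, standard deviation 1).
X = NormalDist(mu=0.0, sigma=1.0)

# P(X <= 1.2): area from negative infinity up to 1.2.
p_at_most = X.cdf(1.2)

# P(X > 1.2): the complement of the area below 1.2.
p_greater = 1.0 - X.cdf(1.2)

# P(-0.6 < X < 1.6): area below 1.6 minus area below -0.6.
p_between = X.cdf(1.6) - X.cdf(-0.6)

print(round(p_at_most, 4), round(p_greater, 4), round(p_between, 4))
```

Every one of the three answers is built from areas that start at negative infinity, which is exactly what a cumulative table provides.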

The z table in the front of the textbook describes a standard normal distribution, with mean 0 and standard deviation of 1. It gives the probability of the interval from negative infinity to z for various values of z. For example, if z = -2.00, the probability of the interval from negative infinity to z is .0228, or 2.28 percent. Borrowing calculator terminology, we will say that normcdf of -2.00 is .0228. If z = -2.08, then the probability of the interval from negative infinity to z is .0188. That is, normcdf of -2.08 is .0188. (Why is this smaller than for z = -2.00?)

Your calculator can be used instead of a z table. If you punch in 2nd/DIST/2:normcdf(-100, -2.00), you should get the same number that we had in the z table. We are using -100 as a proxy for "negative infinity," because the chance that a normal random variable will have a value that is 100 standard deviations away from its mean is almost as small as the chance that it will have a value that is infinitely far from its mean.

Using the calculator, we can compute the probability of X falling between -0.6 and 1.6 in one step. We take normcdf(-0.6, 1.6). This would take two steps using the table at the back of the book. Using the table, we would have to take normcdf of 1.6 and subtract normcdf of -0.6.
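
For readers without the calculator at hand, the sketch below imitates its two-argument normcdf in Python. The function normcdf defined here is a hypothetical helper written for this illustration, not a library function. It is built on NormalDist from the standard statistics module and assumes a standard normal variable unless told otherwise.

```python
from statistics import NormalDist

def normcdf(lower, upper, mean=0.0, sd=1.0):
    """Probability that a normal variable falls between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

standard = NormalDist()

# -100 stands in for negative infinity; the area below it is negligible.
left_tail = normcdf(-100, -2.00)
table_value = standard.cdf(-2.00)

# One step with the helper versus two table lookups and a subtraction.
one_step = normcdf(-0.6, 1.6)
two_step = standard.cdf(1.6) - standard.cdf(-0.6)

print(round(left_tail, 4), round(table_value, 4))
print(round(one_step, 4), round(two_step, 4))
```

The first printed pair shows that using -100 in place of negative infinity does not change the answer at four decimal places. The second pair shows that the one-step and two-step routes give the same probability.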

Inverting the Normal Distribution

Suppose that a doctor tells me that my daughter is in the 10th percentile for height for a girl her age. If she stays in the 10th percentile, how tall will she be as an adult?

Height follows a normal distribution. Suppose that the mean height for an adult female is 64.5 inches and the standard deviation is 2.5 inches. What is the tenth percentile for height?

We can use a calculator or the table in the book to find the tenth percentile for a standard normal variable, with mean 0 and standard deviation of 1. But then we have to adjust for the actual mean of 64.5 inches and standard deviation of 2.5 inches. If we call the standard normal z and the actual normal x, we have

z = (x-m)/s

We solve this problem by taking the following steps.

1. Find the tenth percentile for a standard normal distribution.
Using Table A in the front of the textbook, we see a cumulative probability of .1003 at z = -1.28, which is the table entry closest to .1. This says that a standard normal variable (with mean 0 and standard deviation of 1) has a probability of about .1 of falling below -1.28. Borrowing terminology from the calculator, we say that invNorm of .1 is -1.28.

Alternatively, we can use the calculator to invert the normal distribution for a value of 0.1, or ten percent. Punching in 2nd/DIST/3:invNorm(0.1) should give a result of -1.28 or something close to it.
2. Figure out how many inches below the mean my daughter's height will fall.
Multiply the z value from the standard normal table (-1.28) by the standard deviation of female height, which is 2.5 inches. The result is -3.2 inches.
3. Use the difference from the mean to predict my daughter's height.
The mean is 64.5 inches, and the tenth percentile is 3.2 inches short of the mean. If my daughter is in the tenth percentile as an adult, then she will be 64.5 - 3.2 = 61.3 inches tall.
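
The same steps can be sketched in Python, using NormalDist from the standard statistics module in place of the table or calculator. Its inv_cdf method plays the role of invNorm. The variable names are chosen for this illustration only, and the mean and standard deviation are the ones given in the example.

```python
from statistics import NormalDist

mean_height = 64.5  # inches, from the example
sd_height = 2.5     # inches, from the example

# Tenth percentile of the standard normal (about -1.28).
z_10 = NormalDist().inv_cdf(0.10)

# Inches from the mean (about -3.2).
inches_from_mean = z_10 * sd_height

# Predicted adult height (about 61.3 inches).
predicted_height = mean_height + inches_from_mean

# Shortcut: invert the height distribution directly.
direct = NormalDist(mean_height, sd_height).inv_cdf(0.10)

print(round(z_10, 2), round(inches_from_mean, 1), round(predicted_height, 1))
print(round(direct, 1))
```

The shortcut on the last computed line folds the multiplication and addition into one call, but the three-step version makes it clearer where the mean and standard deviation enter.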

From Value to Percentile

In the example we just did, we started with a percentile and converted to a value. That is, knowing that my daughter is in the tenth percentile for height, we predicted the value of her height as an adult.

Another exercise that we often perform with the normal distribution is to go the other way--to convert from a value to a percentile. For example, if an adult woman is 72 inches tall, in what percentile for height is she? Here, we go through the steps in reverse.

First, we find the difference of the value of 72 from the mean.
Subtracting the mean of 64.5 from 72 gives 7.5 inches.
Next, we want to know how many standard deviations away from the mean that is.
Since the standard deviation for height is 2.5 inches, a difference of 7.5 inches represents three standard deviations.
Finally, we use the z table to find the percentile for three standard deviations.
Reading from the z table, the value for 3.0 is .9987. Therefore, a woman who is 72 inches tall is in the 99.87th percentile for height. She is taller than 99.87% of all women.

Can you use your calculator to find the probability of z falling between negative infinity and 3.0? Do you use normcdf() or invNorm()?
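
One way to check the table reading without a calculator is the short Python sketch below. It uses NormalDist from the standard statistics module, and its cdf method returns the probability from negative infinity up to a given z. The variable names are illustrative, and the height figures are the ones used above.

```python
from statistics import NormalDist

height = 72.0       # inches
mean_height = 64.5  # inches
sd_height = 2.5     # inches

# Step 1: difference from the mean, in inches.
difference = height - mean_height

# Step 2: the same difference measured in standard deviations.
z = difference / sd_height

# Step 3: cumulative probability below z for a standard normal.
percentile = NormalDist().cdf(z)

print(difference, z, round(percentile, 4))
```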

Natural Units, Standard Deviations, and Percentiles

Working with the normal distribution means translating back and forth among natural units, standard deviations, and percentiles.

The natural units are the units in which the problem is expressed. They might be inches, or dollars, or hours. For example, this year's rainfall has been 8.7 inches.
The standard deviations are the number of standard deviations away from the mean. For example, this year's rainfall is -1.2 standard deviations from the mean, or 1.2 standard deviations below the average annual rainfall.
Percentiles are the percent of the distribution that falls below the given value. If my daughter is in the 10th percentile for height, that means that 10 percent of women are shorter and 90 percent of women are taller.

Think of natural units as X and standard deviations as Z. To go from natural units to standard deviations, we use the Z transformation. That is,

Z = (X - m_X)/s_X

To go from standard deviations to natural units, we reverse the Z transformation. That is, we take

X = Zs_X + m_X

To go from standard deviations to percentiles, we take normcdf(-100, Z). To go from percentiles to standard deviations, we take invNorm(decimal percentile). The decimal percentile means the percentile expressed as a decimal. For example, the 25th percentile would be 0.25.

When you are given a problem involving the normal distribution, the first step is to figure out what you are given and what you are asked for. Then you can infer what steps you need to follow. For example, if you are given natural units and asked for percentiles, then you would use the Z transformation to get from natural units to standard deviations, and normcdf to get from standard deviations to percentiles.
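
The four conversions can be collected into a small set of Python helpers, sketched below. The function names are hypothetical and chosen for this illustration. NormalDist from the standard statistics module stands in for normcdf and invNorm, and the height figures from the earlier examples serve as test values.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def units_to_z(x, mean, sd):
    """Natural units to standard deviations: Z = (X - m_X)/s_X."""
    return (x - mean) / sd

def z_to_units(z, mean, sd):
    """Standard deviations to natural units: X = Z*s_X + m_X."""
    return z * sd + mean

def z_to_percentile(z):
    """Standard deviations to a decimal percentile."""
    return STANDARD_NORMAL.cdf(z)

def percentile_to_z(p):
    """Decimal percentile to standard deviations."""
    return STANDARD_NORMAL.inv_cdf(p)

# Natural units -> percentile: a woman who is 72 inches tall.
p_72 = z_to_percentile(units_to_z(72, 64.5, 2.5))

# Percentile -> natural units: the 10th percentile of adult female height.
h_10 = z_to_units(percentile_to_z(0.10), 64.5, 2.5)

print(round(p_72, 4), round(h_10, 1))
```

Chaining two helpers mirrors the advice above: decide what you are given and what you are asked for, then pass through standard deviations on the way.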