AP Statistics Lectures
by Arnold Kling
Random Variables

If we spin a dreidel, the four possible outcomes of the random process are {N,G,H,S}. If we are playing dreidel for toothpicks and there are 8 toothpicks in the center, then the value of G is 8, the value of H is 4, the value of N is 0 and the value of S is -1. When we quantify the values this way, we have a random variable. In this case, we have a random variable that can take on the values of -1, 0, 4, or 8.

In Monopoly, when you land on a utility and one of your opponents owns that utility (but not both utilities), you pay four times the number you rolled on the dice. What you pay is a random variable, depending on what the dice say.

Basically, a random variable is a number that is derived from a random process. For example, suppose that we measure the rainfall in July in different cities. One city might have 1.5 inches, another city might have 0.8 inches, and so on. The amount of rainfall is a random variable.

Give some other examples of random variables. Think of some examples that are discrete and some examples that are continuous.

For continuous random variables, we talk about the probability distribution function. For a continuous random variable, probability pertains to intervals of values, not to any single value.

One important distribution function is the uniform distribution function. If a random variable is uniformly distributed, that means that the probability of landing in a particular interval is equal to the width of that interval divided by the width of the entire range.

For example, consider a random variable that is uniformly distributed between 0 and 100. The entire distribution has a width of 100. The probability of landing between 0 and 10 is 10/100, or 0.1.
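As a quick illustration, here is a minimal Python sketch of this rule (the function name is just for illustration, not from any library):

    # Probability that a uniform random variable on [low, high]
    # lands in the interval [a, b]: interval width / total width.
    def uniform_interval_prob(a, b, low=0.0, high=100.0):
        return (b - a) / (high - low)

    print(uniform_interval_prob(0, 10))   # 0.1
    print(uniform_interval_prob(25, 75))  # 0.5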

Remarks on Random Variables

  1. You can think of the distribution of a random variable as being analogous to a histogram. In a histogram, you might show the percentage of your data that falls into each of several categories.

    For example, suppose you had data on family income. You might find that 20 percent of families have an income below $30K, 27 percent have an income between $30k and $40k, 21 percent have an income between $40k and $50k, and 32 percent have an income over $50k. A histogram would be a chart of that data with the income ranges on the X-axis and the percentages on the Y-axis.

    Similarly, a graph of a random variable shows the range of values of the random variable on the X-axis and the probabilities on the Y-axis. Just as the percentages in a histogram have to add to 100 percent, the probabilities in a graph of a random variable have to add to 1. (We say that the area under the curve of a probability density function has to equal one). Just as the percentages in a histogram have to be non-negative, the probabilities of the values of a random variable have to be non-negative.

  2. A probability distribution function (pdf) for a random variable X is an equation or set of equations that allows you to calculate probability based on the value of X. Think of a pdf as a formula for producing a histogram. For example, if X can take on the values 1, 2, 3, 4, 5 and each value is equally likely, then we can write the pdf of X as:

    f(X) = .2 for X = 1, 2, 3, 4, or 5

    The point to remember about a pdf is that the probabilities have to be nonnegative and sum to one. For a discrete distribution, it is straightforward to add all of the probabilities. For a continuous distribution, you have to take the "area under the curve." In practice, unless you know calculus, the only areas you can find are when the pdf is linear. See the Uniform Distribution.
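    As a quick check of these two requirements, here is a small Python sketch (the helper function is hypothetical, written just for this example):

        # A discrete pdf is valid if every probability is nonnegative
        # and the probabilities sum to one.
        def is_valid_pdf(probs, tol=1e-9):
            return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) < tol

        print(is_valid_pdf([0.2] * 5))         # True: f(X) = .2 for X = 1,...,5
        print(is_valid_pdf([0.5, 0.6, -0.1]))  # False: a negative probability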

Mean and Variance

The mean of a distribution is a measure of the average. Suppose that we had a spinner where the circle was broken into three unequal sections. The largest section is worth 5 points, one small section is worth 3 points, and the remaining small section is worth 12 points. The spinner has a probability of .6 of landing on 5, a probability of .2 of landing on 3, and a probability of .2 of landing on 12. If you were to spin the spinner a hundred times, what do you think your average score would be? To calculate the answer, you take the weighted average of the three numbers, where the weights are equal to the probabilities. See the table below.

X     P(X)   P(X)·X
3      .2     0.6
5      .6     3.0
12     .2     2.4
μX = X̄ = E(X) = 6.0

The Greek letter μ is pronounced "mew," and μX is pronounced "mew of X" or "mew sub X."
X̄ is pronounced "X bar."
E(X) is pronounced "the expected value of X."
μX, X̄, and E(X) are three ways of saying the same thing: the average value of X.

In our example, E(X) = 6.0, even though you could never get a 6 on the spinner. Again, you should think of the mean or average as the number you would get if you averaged a large number of spins. On your first spin, you might get a 12, and the average would start out at 12. As you take more spins, you get other numbers, and the average gradually tends toward 6. In mathematical terms, you could say that the average asymptotically approaches 6.
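If you want to see this convergence for yourself, here is a short Python simulation of the spinner (the exact running averages will vary from run to run):

    import random

    # Spin the spinner many times (values 3, 5, 12 with probabilities
    # .2, .6, .2) and print the running average as it settles toward 6.
    values, weights = [3, 5, 12], [0.2, 0.6, 0.2]
    total = 0.0
    for n in range(1, 100_001):
        total += random.choices(values, weights)[0]
        if n in (10, 100, 1_000, 100_000):
            print(n, total / n)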

Another property of the distribution of a random variable is its average dispersion around the mean. For example, if you spin a 12, then this is 6 points away from the mean. If you spin a 3, then this is 3 points away from the mean. You could spin the spinner 100 times and calculate the average dispersion around the mean. Mathematically, we calculate the average dispersion by taking the square of the differences from the mean and weighting the squared differences by the probabilities. This is shown in the table below.

X     P(X)   X − X̄   (X − X̄)²   P(X)·(X − X̄)²
3      .2     −3        9          1.8
5      .6     −1        1          0.6
12     .2      6       36          7.2
σ² = var(X) = E[(X − X̄)²] = 9.6

The Greek letter σ is pronounced "sigma." It is a lowercase sigma. The expression "var(X)" is pronounced "variance of X."

Suppose that the values of X were raised to 4, 6, and 13. What do you think would happen to the mean of X? What do you think would happen to the variance of X? Verify your guesses by setting up the table and doing the calculation.

See if you can come up with values of X that would raise the mean and raise the variance. See if you can come up with values of X that would raise the mean but lower the variance. Finally, suppose we leave the values of X the same. Can you come up with different values of P(X) that keep the same mean but lower the variance? Can you come up with values of P(X) that keep the same mean but raise the variance?
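Here is a short Python sketch you can use to check your answers to these exercises (mean_var is just an illustrative helper):

    # Mean and variance of a discrete random variable, computed as
    # probability-weighted sums.
    def mean_var(values, probs):
        m = sum(p * x for x, p in zip(values, probs))
        v = sum(p * (x - m) ** 2 for x, p in zip(values, probs))
        return m, v

    print(mean_var([3, 5, 12], [0.2, 0.6, 0.2]))  # (6.0, 9.6)
    print(mean_var([4, 6, 13], [0.2, 0.6, 0.2]))  # mean rises to 7.0; variance stays 9.6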

Next, consider a weighted average of random variables. In terms of a histogram, suppose that you had two zip codes with different average incomes. If you wanted to take the overall mean income of the population in both zip codes, you would have to weight the means by the different populations in the zip codes.

For example, suppose that there are 6000 families in one zip code, with a mean income of $50k. Suppose that there are 4000 families in another zip code, with a mean income of $40k. The overall mean income is equal to (1/10,000)(6000 × $50k + 4000 × $40k) = $46k.

Similarly, suppose that you take a weighted average of two random variables. Let W = aX + bY. Then the mean of W is equal to a times the mean of X plus b times the mean of Y.

The Expectation Operator

In general, the expected value of a random variable, written as E(X), is equal to the weighted average of the outcomes of the random variable, where the weights are based on the probabilities of those outcomes. We can talk about E(X), E(X²), and so forth.

If a is a constant, we can talk about E(X+a), E(X-a), E(aX), and so forth. If Y is another random variable, we can talk about E(X+Y), E(XY), etc.

Moments

E(X) is called the mean of X. Often, it is written as X̄, pronounced "X bar."

X̄ is also called the first moment of X. We define other moments as:

second moment: E[(X − X̄)²]
third moment: E[(X − X̄)³]
nth moment: E[(X − X̄)ⁿ]

The second moment, also called the variance of X, is a measure of the spread of the distribution of X. Synonyms for spread include variability, volatility, and uncertainty.

The third moment is a measure of skewness or asymmetry in the distribution of X. For example, suppose that we hold a raffle where we sell 100 tickets for $10 each, and we give a $500 prize to the winner. The average prize is $5. However, 99 people will get less than the average, and one person will get way more than the average. That asymmetry is called skewness.

Rules for the Expectation Operator

The expectation operator, E(X), takes the probability-weighted sum of the outcomes of a random variable. In the case where there are two outcomes {x₁, x₂}, E(X) = p₁x₁ + p₂x₂, where p₁ and p₂ are the respective probabilities of the two outcomes.

Here are some important rules for manipulating equations that involve E(X). In the following, assume that a and b are constants, and Y is another random variable.

E(a) = a
E(a + bX) = a + bE(X)
E(X + Y) = E(X) + E(Y)
E[(bX)²] = E(b²X²) = b²E(X²)

Using the example with two outcomes, verify these rules. Then, derive a rule for E[(a + bX)²].

What is nice is that the expectation operator, E(), is consistent with all of the usual rules of algebra. In particular,

E[(X + Y)²] = E(X² + Y² + 2XY) = E(X²) + E(Y²) + 2E(XY)
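One way to work the verification exercise above is numerically, with a short Python sketch (the outcomes, probabilities, and constants here are arbitrary choices):

    # Expectation of f(X) for a discrete random variable:
    # the probability-weighted sum of f over the outcomes.
    def E(f, xs, ps):
        return sum(p * f(x) for x, p in zip(xs, ps))

    xs, ps = [3.0, 12.0], [0.4, 0.6]
    a, b = 2.0, 5.0

    print(E(lambda x: a, xs, ps), "==", a)              # E(a) = a
    print(E(lambda x: a + b * x, xs, ps), "==",
          a + b * E(lambda x: x, xs, ps))               # E(a + bX) = a + bE(X)
    print(E(lambda x: (b * x) ** 2, xs, ps), "==",
          b ** 2 * E(lambda x: x ** 2, xs, ps))         # E[(bX)^2] = b^2 E(X^2)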

A Useful Identity

One way to understand the relationship between E(X²) and the variance of X is to write out the following identity.

E[(X − X̄)²] = E(X² − 2X̄X + X̄²)
= E(X²) − 2X̄E(X) + X̄²
= E(X²) − 2[E(X)]² + [E(X)]²
= E(X²) − [E(X)]²

(Here we use the fact that X̄ = E(X) is a constant, so it can be pulled outside the expectation.)

Note that the probability-weighted average of the squares, E(X²), is at least as large as the square of the probability-weighted average, [E(X)]², because the variance can never be negative.
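The identity is easy to verify on the spinner from earlier (values 3, 5, 12 with probabilities .2, .6, .2):

    # Check that var(X) = E(X^2) - [E(X)]^2 for the spinner.
    values, probs = [3, 5, 12], [0.2, 0.6, 0.2]
    EX = sum(p * x for x, p in zip(values, probs))       # 6.0
    EX2 = sum(p * x * x for x, p in zip(values, probs))  # 45.6
    print(EX2 - EX ** 2)                                 # 9.6, matching the variance table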

a + bX

Let Y be a linear transformation of X, namely, Y = a + bX. For example, if X is the weight of a package, and the delivery charge is $10 plus $2 per pound, then Y = 10 + 2X.

We have already seen that E(Y) = a + bE(X). What is E[(Y − Ȳ)²]?

From the useful identity, we have

E[(Y − Ȳ)²] = E(Y²) − [E(Y)]²

Substituting a+bX for Y and using the properties of the expectation operator, we have

= E[(a + bX)²] − [a + bE(X)]²

= a² + 2abE(X) + b²E(X²) − a² − 2abE(X) − b²[E(X)]²

= b²E(X²) − b²[E(X)]² = b²E[(X − X̄)²]

You don't have to remember the proof. What you do have to remember is that if Y = a + bX, then
μY = a + bμX
σ²Y = b²σ²X
σY = |b|σX

(The last line takes the square root of the variance; the absolute value matters only when b is negative.)
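As a sanity check, here is a small Python sketch that applies the package example (Y = 10 + 2X) to the spinner distribution, which serves here as a stand-in for package weights:

    # If X takes values 3, 5, 12 with probabilities .2, .6, .2
    # (mean 6, variance 9.6), then Y = 10 + 2X should have
    # mean 10 + 2*6 = 22 and variance 2**2 * 9.6 = 38.4.
    values, probs = [3, 5, 12], [0.2, 0.6, 0.2]
    ys = [10 + 2 * x for x in values]
    mY = sum(p * y for y, p in zip(ys, probs))
    vY = sum(p * (y - mY) ** 2 for y, p in zip(ys, probs))
    print(mY, vY)  # 22.0 38.4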

Two Random Variables

Often, we will take two random variables, X and Y, and add them to create a new random variable. We could give the new random variable its own name, Z, but often we just call it X+Y.

The properties of the expectation operator imply that:

μX+Y = E(X + Y) = E(X) + E(Y) = μX + μY
σ²X+Y = E[(X + Y − (X̄ + Ȳ))²]
= E[((X − X̄) + (Y − Ȳ))²]
= E[(X − X̄)²] + E[(Y − Ȳ)²] + 2E[(X − X̄)(Y − Ȳ)]
= σ²X + σ²Y + 2σXY

The term σXY is called the covariance of X and Y. We will return to it later in the course. For now, we note that in the case where X and Y are independent, the covariance is 0, and the equation reduces to:

σ²X+Y = σ²X + σ²Y (when X and Y are independent)

It follows that if we have n independent random variables, each with the same mean μX and variance σ²X, and we call the sum of these random variables V, then

iid equations:
μV = nμX
σ²V = nσ²X

These are called iid equations, because they refer to the sum of independent, identically distributed random variables. Verify that the iid equations are correct.

  1. Start with the random variable that can take on values of 3, 5, or 12 with probabilities .2, .6, and .2, respectively. We calculated its mean as 6 and its variance as 9.6.
  2. Next, consider the random variable V that is the sum of two X's. (Think of each X as being like one die, and V is the sum of dice.) According to the iid equations, what should be the mean and variance of V?
  3. In a table, show all possible values of V. Show their probabilities. Then calculate the mean and variance of V using the values and their probabilities. Verify that you get the same answer as when you use the iid equations. (A Python sketch that carries out this enumeration appears after this list.)
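Here is a minimal Python sketch of that enumeration (it brute-forces all nine ordered pairs of spins):

    from itertools import product

    # V = X1 + X2 for two independent spins of the spinner.
    # The iid equations predict mean 2*6 = 12 and variance 2*9.6 = 19.2.
    values, probs = [3, 5, 12], [0.2, 0.6, 0.2]
    dist = {}
    for (x1, p1), (x2, p2) in product(zip(values, probs), repeat=2):
        dist[x1 + x2] = dist.get(x1 + x2, 0.0) + p1 * p2

    mV = sum(p * v for v, p in dist.items())
    vV = sum(p * (v - mV) ** 2 for v, p in dist.items())
    print(sorted(dist.items()))  # the distribution of V
    print(mV, vV)                # 12.0 19.2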

Practice Questions

  1. Random variable X has a mean of 8 and a standard deviation of 4. Random variable Y has a mean of 5 and a standard deviation of 2. Let W = X - 2Y. Assuming that X and Y are independent, calculate the mean and standard deviation of W.

  2. The mean temperature in September was 20 degrees Celsius with a standard deviation of 4.5 degrees. What were the mean and standard deviation of the temperature in Fahrenheit? (Fahrenheit = (9/5) Celsius + 32)

  3. Which of the following is a valid probability density function when defined over the domain 0 ≤ X ≤ 2?

    f(x) = .5x
    f(x) = x-1
    f(x) = x

  4. Suppose that we have a roulette wheel with 2 green slots (zero and double-zero), 18 red slots, and 18 black slots. If you bet $1 on red and win, you are plus $1. If you bet $1 on red and lose, you are minus $1. What is the expected value of a $1 bet on red? What is the expected value of a $5 bet on red? What is the standard deviation of a $5 bet on red?

  5. On an old AP exam, there was a problem where you needed to find the expected cost of repairs for a computer the first year after you buy it. If the probability of needing no repairs is .7, the probability of needing a $100 repair is .2, and the probability of needing a $300 repair is .1, what is the expected cost of repair? What would be a fair price to pay for a warranty that offered free repairs in the first year?

  6. Suppose that I offer to make a bet with you. I will give you $30. Then, I will keep flipping a coin until it comes up tails. If n is the number of heads that I get before it comes up tails, you pay me $2ⁿ. The payoff is a random variable. Call it X. The distribution of X is:

    number of heads (n)   payoff ($2ⁿ)   probability
    0                     $1             1/2
    1                     $2             1/4
    2                     $4             1/8
    3                     $8             1/16
    n                     $2ⁿ            1/2ⁿ⁺¹

    As long as I get fewer than five heads in a row, you have to pay me less than the $30 I pay you. Does this look like an attractive game to you?

    Most people would say, "Yes." However, what happens when you take the expected value of what you will have to pay?

    E(X) = (1/2)($1) + (1/4)($2) + (1/8)($4) + ... = $1/2 + $1/2 + $1/2 + ... The sum contains infinitely many terms of $1/2, so the expected payment is infinite.

    You really should think twice before agreeing to play this game. This example is known as the St. Petersburg Paradox.
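    If you are curious, here is a short Python simulation of the game; because the expected payment is infinite, the sample average keeps drifting upward (erratically) as rare long runs of heads appear:

        import random

        # Flip until tails; with n heads, the payment is 2**n dollars.
        def payment():
            n = 0
            while random.random() < 0.5:  # heads with probability 1/2
                n += 1
            return 2 ** n

        for trials in (100, 10_000, 1_000_000):
            print(trials, sum(payment() for _ in range(trials)) / trials)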