AP Statistics Lectures
Introduction to Probability (chapter 6 in textbook)
by Arnold Kling

Introduction

Until relatively recent times, people did not think of probability as a branch of mathematics. Mathematics was supposed to be about certainty. Therefore, it did not seem to apply to phenomena such as tossing coins, rolling dice, or life expectancy.

Philosophically, people always had the option of refusing to believe in chance. Instead, they could divide phenomena into those that they can predict and those that they cannot predict. For a long time, people thought that it was futile to try to calculate the unpredictable. Even today, most people prefer to use rough intuition to deal with chance, rather than go to the effort of calculating exactly.

According to Peter L. Bernstein, in his book Against the Gods: The Remarkable Story of Risk, humans played games of chance for thousands of years without ever working out exact probabilities. For example, one popular game was to have a player roll a six-sided die ("die" is the singular of "dice") four times. If any roll is a six, the roller wins. Otherwise, the opponent wins. People thought that this was a reasonably fair game (the roller and the opponent seemed to win about equally often), but no one worked out the exact odds.

Around 1650, Blaise Pascal and Pierre de Fermat solved a problem related to gambling that had been posed 150 years earlier. From that point forward, the concepts of probability and statistics were given a mathematical treatment.

The classic problem that Pascal and Fermat solved was somewhat like the problem that would occur if a major sports championship, such as the World Series or the Stanley Cup, could not be completed.

Imagine that one team was leading three games to two when play had to be stopped. What is the fairest way to divide the prize money, given that ordinarily the team that wins four games gets the entire prize?

Pascal and Fermat took the view that the prize should be divided between the two teams in proportion to their probability of winning. Assume that for each subsequent game each team has a fifty-fifty chance of winning that game. The team that has already won three games will win the championship a higher proportion of the time. Pascal proceeded to show how to calculate that proportion precisely.
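To preview the reasoning (a sketch, using the fifty-fifty assumption above): the team ahead three games to two loses the championship only by losing both remaining games, which has probability (1/2)(1/2) = 1/4. So the leading team should receive 3/4 of the prize and the trailing team 1/4.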

You Are So Random

Our textbook begins its discussion of probability with a section on "randomness." The word "random" is an adjective. What does it modify?

When people ask me why I am teaching, I say that it is because my daughters kept telling me, "Dad, you are so random." I thought they meant that I should give a class in statistics.

However, a mathematician would not say that you can use the word "random" to describe a person. Nor can you use "random" to describe a thing. The term "random" can only be used to describe a process.

A process is random if repeated executions of the process can produce different outcomes. Examples would be rolling a die, spinning a dreidel, or shooting an arrow at a target.

The opposite of random is deterministic. If a process is deterministic, then the outcome of the process is always the same. Every morning, the sun rises in the East. Every time you drop a rock, it falls to the ground.

Whether or not a process is deterministic may depend on how you define the outcome of the process. For example, if you define the outcome of dropping a rock as "landing on the ground or staying in the air," then the process is deterministic--the rock always lands on the ground. On the other hand, if you define the outcome as "the number of centimeters away from my foot that the rock lands," the process is random.

A process can be random without all outcomes being equally likely. The Red Sox usually win when Pedro Martinez pitches. However, they do not always win. Therefore, the outcome is random.

QUESTION: After an election, there is sometimes a recount. Should the recount always have the same outcome? Is the recount process random or deterministic?

Sample Space

The set of all possible outcomes of a random process is called the sample space. We use set notation--{}--to describe the sample space. When we say that tossing one coin has a sample space of {H, T}, we are saying that the outcome could be either H (heads) or T (tails). Some examples of sample spaces are:

Random Process                                                                       Sample Space
Toss one coin                                                                        {H,T}
Toss two coins                                                                       {HH,HT,TH,TT}
The total number of heads when you toss two coins                                    {0,1,2}
Pull a sock from a drawer containing 5 red socks, 10 green socks, and 7 blue socks   {R,G,B}

The concept of a sample space seems to work well for processes where the outcomes are discrete, which means that the outcomes can be clearly defined and distinguished from one another. What about a situation where the outcomes are continuous?

For example, suppose that we throw a dart at a target, and we measure the distance that the dart lands from the center of the target. Even if we assume that the dart never lands more than 50 centimeters from the center, there are still an infinite number of possible distances. The dart could land at a distance of 2 centimeters, or 2.2 centimeters, or 2.22 centimeters, etc. This poses a problem for defining the sample space.

For the purpose of this course, we will deal with continuous outcomes by dividing the space into discrete units. Rather than defining an event as "the dart lands exactly 2.222 centimeters away from the center of the target," we might define an event as "the dart lands at a point greater than or equal to 2.20 centimeters and less than 2.30 centimeters away from the center." That is, the event is defined as an interval of outcomes.

By using an interval to specify an event, we can map a continuous process into a discrete process. Using intervals, we can describe the probability space as a finite list of possible outcomes.
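Here is a minimal sketch of that idea in Python (the distances are made up for illustration, and 1-centimeter intervals are used for simplicity): each continuous measurement is mapped to the interval that contains it, so the list of possible events becomes finite.

    # Map continuous dart distances (in cm) into 1-cm interval "events."
    distances_cm = [2.22, 13.5, 2.8, 47.1]  # hypothetical measurements

    def interval_containing(x, width=1):
        """Return the interval [k*width, (k+1)*width) that contains x."""
        k = int(x // width)
        return (k * width, (k + 1) * width)

    for d in distances_cm:
        lo, hi = interval_containing(d)
        print(f"{d} cm falls in the event [{lo}, {hi})")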

QUESTION: Suppose we were gathering data on the speed of cars on a particular road. Why would we use intervals to measure speed? What intervals might be appropriate?

To test your understanding of sample space, do problem 6.10 on p.322 of your textbook. Be sure to note which of your answers requires intervals.

Coins and Independence

This section looks at the mathematics of flipping coins, and the next section looks at the mathematics of rolling dice. These examples can help to illustrate the more general rules of probability that will be discussed subsequently.

Coins

What is the probability of flipping a coin four times in a row and having it land heads each time? One way to solve this problem is to set up the sample space as the set of all possible sequences of coin flips. For example, one possible sequence is (H,T,H,T), where you get heads followed by tails followed by heads followed by tails. Overall, there are sixteen possible sequences:

(H,H,H,H), (H,H,H,T), (H,H,T,H), (H,H,T,T), (H,T,H,H), (H,T,H,T), (H,T,T,H), (H,T,T,T), (T,H,H,H), (T,H,H,T), (T,H,T,H), (T,H,T,T), (T,T,H,H), (T,T,H,T), (T,T,T,H), (T,T,T,T)

Of these sixteen sequences, only the first sequence has four heads. Since each sequence would seem to have an equal probability, the logical inference is that the probability of getting four straight heads is 1/16.

What is the probability of getting exactly three heads in four coin flips? Now, the calculation is not so simple. It turns out that of the sixteen sequences, four of them have three heads. The general answer to this question was given by Pascal.
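A short Python sketch makes the counting concrete: enumerate all sixteen equally likely sequences and count the ones with exactly three heads.

    from itertools import product

    # Enumerate every sequence of four coin flips and count outcomes.
    sequences = list(product("HT", repeat=4))
    three_heads = [s for s in sequences if s.count("H") == 3]

    print(len(sequences))                      # 16
    print(len(three_heads))                    # 4
    print(len(three_heads) / len(sequences))   # 0.25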

Pascal's Triangle

1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1

Each row of the triangle is constructed by adding the adjacent numbers in the preceding row, and then putting a one on the far left and far right.

For example, the second row is 1 2 1. When we add 1 and 2, we get 3, which we put in between them on the third row. Outside the two 3's, we extend the row by putting a 1 to the left and to the right.
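The construction rule translates directly into a few lines of Python. This sketch follows the text's convention of starting with the row 1 1.

    # Build rows of Pascal's triangle by adding adjacent numbers in the
    # previous row and putting a 1 on each end.
    def next_row(row):
        middle = [row[i] + row[i + 1] for i in range(len(row) - 1)]
        return [1] + middle + [1]

    row = [1, 1]
    for _ in range(5):          # prints the five rows shown above
        print(row)
        row = next_row(row)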

Question: What will be the contents of the sixth row of the triangle?

How else might we compute the probability of getting four heads in a row? One approach is to think in terms of a sequence of flips, where as soon as you get tails, you stop.

  1. The probability of coming up heads on the first flip is 1/2. If you get tails on the first flip, you might as well stop, because you cannot possibly get four heads. So, half the time you stop, and half the time you keep going.
  2. Assuming we kept going, then we flip the second coin. Again, the probability of heads is 1/2. Again, we only keep going if it comes up heads. So half the time we keep going. Overall, the chance that we will keep going is 1/2 of 1/2, or 1/4.
  3. By now, 3/4 of the time we will have stopped, and 1/4 of the time we will have moved on to flip a third coin. Again, the probability of heads is 1/2. So, the probability that we will keep going is 1/2 of 1/4, or 1/8.
  4. Finally, we have the fourth coin flip. We only get to this point 1/8 of the time. Again, the probability of heads is 1/2. The probability of four heads is thus 1/2 of 1/8, or 1/16.

As a shortcut, we could say that the probability of getting heads on any one flip is 1/2. The probability of getting four heads in a row therefore is (1/2)(1/2)(1/2)(1/2), or (1/2)^4.

A general approach to analyzing coin flips is given by Pascal's triangle (shown above). The triangle is a shortcut way to describe the sample space for the number of heads and tails from a sequence of coin tosses. The first row says that with one coin, we can have either all heads (1) or all tails (1).

The second row says that if we toss two coins, we have one chance of getting all heads, two chances of getting one head and one tail, and one chance of getting all tails.

The third row says that if we toss three coins, we have one chance of getting all heads, three chances of getting two heads and one tail, three chances of getting one head and two tails, and one chance of getting three tails. Since the sum of the row is 8, the probability of getting two heads and one tail is 3/8.

Question: Use the triangle to find the probability of getting exactly three heads in four coin tosses. What about exactly three heads in six coin tosses?

Implicitly, we are relying on the assumption that each coin flip is independent. When coin flips are independent, it means that the probability of a coin coming up heads does not depend in any way on previous coin flips.

Independence and Superstitious Sportscasters

Many people do not understand the concept of independence. Sportscasters are notorious for this.

For example, suppose that a sequence of seven coin flips came up with five heads and two tails. What is the probability of getting tails on the next coin flip? A sportscaster would say that "the law of averages" says that we are more likely to get tails.

But there is no law of averages! The chance of getting tails on the next coin flip is 1/2.

What is true is that we expect that as the number of coin flips gets large, the proportion of heads will become closer to 50 percent. However, that is because going forward, we expect about half of the flips to be tails, not because we expect the coin to know that it has an excess of three heads and it needs to come up tails more often to make up for it.
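A quick simulation (a sketch using Python's random module) illustrates the point: the proportion of heads drifts toward 50 percent as the number of flips grows, even though every flip remains fifty-fifty and nothing "corrects" for an early excess of heads.

    import random

    # Track the running proportion of heads over many independent flips.
    random.seed(1)   # fixed seed so the sketch is reproducible
    heads = 0
    for n in range(1, 100_001):
        heads += random.random() < 0.5
        if n in (10, 100, 1_000, 10_000, 100_000):
            print(n, heads / n)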

Here is another common superstition in sports. Suppose that Kobe the coin-flipper gets heads five times in a row. A sportscaster might say, "Kobe is really hot. The coach should give the coin to Kobe to make the next flip, because he probably can get heads again."

It isn't just sportscasters who are superstitious. Many of the theories that people use to trade stocks have no statistical validity. What people do not realize (or cannot accept) is that the movement of a stock price tomorrow is independent of its movement today.

"Prices have no memory and yesterday has nothing to do with tomorrow. Every day starts out fifty-fifty."
--'Adam Smith', The Money Game

In probability there is no such thing as a "hot hand." We would say that the chance of Kobe getting heads on the next flip is exactly 1/2, regardless of what his first five flips happened to be.

In theory, sports could be different from coin flips. If your mechanics are different on different days, then your probability of success will be different on different days. Therefore, successive basketball shots or baseball at-bats might not be independent from one another.

In practice, statisticians who have looked at basketball and baseball have found that the "hot hand" is largely an illusion. The fact that a basketball player has made a large number of shots in a row does not increase the probability that he will make the next one. In baseball, the statistical probability of a player getting a hit when he is "cold" usually turns out to be the same as when he is "hot."

Statistically, the assumption that each at-bat or basketball shot is independent of previous events holds up fairly well. The "law of averages" and the "hot hand" are just superstitions.

Independence and Failure Models

We use a failure model when we look at how the probability that a system succeeds depends on the success or failure of independent individual events.

For example, suppose that we think of a motorboat as the system. The system fails only if every individual engine on the boat fails. If the boat has three engines, all three have to fail in order for the boat to fail.

We assume that each individual event is independent of the others. The first engine failing is one event. The second engine failing is another event. Independence means that the probability of both engines failing is equal to the product of the probability of the first engine failing times the probability of the second engine failing.

There are two common forms of failure models.

  1. In the single-failure model, one individual failure causes the entire system to fail. For example, in a single-elimination play-off tournament (such as the NFL play-offs), one loss means that you are out of the tournament.

  2. In the compound-failure model, every component must fail in order for the entire system to fail. For example, if the boat will operate as long as at least one engine is running, then every engine must fail in order for the boat to fail.

The single-failure and the compound-failure models are mirror images of one another. A single-failure model is the same as a compound-success model: if a single component failure causes the system to fail, then every component must succeed for the system to succeed. A compound-failure model is the same as a single-success model: if every component must fail to cause the system to fail, then only one component must succeed to cause the system to succeed.

Here is a mathematical description of the failure model:

P(s) = the probability that one component (one engine) is successful
P(f) = 1 - P(s) is the probability that one component fails.
P(S) = the probability that the system (the entire motorboat) is successful
P(F) = 1 - P(S) = the probability that the system fails.
n = the number of components (engines)

For the single-failure model, P(S) = P(s)^n.
For the compound-failure model, P(F) = P(f)^n.

For example, suppose that we have to win three play-off games to make it to the finals, and that our chance of winning any one game is 0.6, or 60 percent. This is a single-failure model, and the probability of making it to the finals, P(S), equals (0.6)^3 = 0.216.

On the other hand, suppose that we have a motorboat where every engine must fail in order for the boat to fail. This is a compound-failure model. Suppose that the probability that an engine will succeed, P(s) = 0.6, or 60 percent. That means that P(f) = 0.4, or 40 percent. With three engines, the probability that the boat will fail, P(F) = (0.4)^3 = 0.064. The probability that the boat will succeed, P(S) = 1 - P(F) = 0.936.
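The two formulas are short enough to check in Python; a minimal sketch using the numbers above:

    # Single-failure model: every component must succeed, so P(S) = P(s)^n.
    def single_failure_success(p_s, n):
        return p_s ** n

    # Compound-failure model: the system fails only if every component
    # fails, so P(F) = P(f)^n and P(S) = 1 - P(F).
    def compound_failure_success(p_s, n):
        return 1 - (1 - p_s) ** n

    print(round(single_failure_success(0.6, 3), 3))    # 0.216 (three play-off wins)
    print(round(compound_failure_success(0.6, 3), 3))  # 0.936 (three-engine boat)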

You can use the failure model to solve problems where the probability of a systemic success or failure depends on the probability of independent individual successes or failures. To use a failure model,

  • Determine from the context whether it is a single-failure or a compound-failure model.
  • Determine what information is given and what information is to be solved for. For example, sometimes you might be given P(f), the probability of an individual failure, and a desired value for P(S), the probability of systemic success. You might then be asked to solve for n, the number of components needed to achieve the desired value of P(S).
  • If necessary, use the facts that P(s) + P(f) = 1 and P(S) + P(F) = 1 to help solve for missing information.
  • Use either the single-failure formula or the compound-failure formula, depending on which fits the problem.

Some sample problems:

  1. The probability of getting an odd number less than 4 when you roll one die is 1/3. What is the probability of doing this four times in a row?
  2. Joanna is practicing her basketball shots. She usually makes 55 percent of her shots. How many shots will she have to try before she has at least a 95 percent chance of having made at least one basket?

Problem one is a single-failure model. Therefore, P(S) = P(s)^4 = (1/3)^4 = 1/81.

Problem two is a compound-failure model. We want P(S) to be 0.95, which means that P(F) = 0.05, or 5 percent. We know that P(f) = 1 - P(s) = 1 - 0.55 = 0.45, or 45 percent. P(F) = P(f)^n, and we are solving for n. It turns out that if n = 3, P(F) is above 0.05, but if n = 4, then P(F) is below 0.05. So n = 4.
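Problem two can also be solved with a short loop, or with logarithms; a sketch:

    import math

    p_f = 1 - 0.55    # probability Joanna misses any one shot
    target = 0.05     # we want P(F), the chance of missing them all, below this

    n = 1
    while p_f ** n >= target:
        n += 1
    print(n)   # 4

    # The same answer via logarithms: n must exceed log(0.05)/log(0.45), about 3.75.
    print(math.ceil(math.log(target) / math.log(p_f)))   # 4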

Dice

A single die has six sides, with each side having a different number of spots. The numbers go from one through six.

Earlier, we mentioned a game in which a player rolls a die four times. If the roller gets a 6 on any roll, the roller wins. Otherwise, the other person wins. What is the probability of winning this game?

The entire sample space for this problem is quite large. We could roll (1,1,1,1), (1,1,1,2), (1,1,1,3), and so forth. There are 6^4 = 1,296 possible combinations. No wonder people played this game for hundreds of years without calculating the odds!

Fortunately, we do not need to work with the entire sample space. First, we can reduce the sample space on each turn to simply throwing a six (call this 6) or not throwing a six (call this N, for something else). When we throw four dice, the possible results are:

(6,6,6,6), (6,6,6,N), (6,6,N,6), (6,6,N,N), ...(N,N,N,N)

Altogether, there are sixteen possibilities, just as there were with flipping four coins. In fact, we could have represented the outcomes as H and T rather than 6 and N.

However, not all of the sixteen outcomes are equally likely. On each roll of the die, the chances of "N" (something other than six) are 5/6.

Of the sixteen possibilities, the only one in which the roller loses is the last one, (N,N,N,N). Therefore, if we can just calculate the probability of this one sequence, we will have figured out the game.

If the probability of getting N on any roll of the die is 5/6, what is the probability of getting N four times in a row? Because the rolls are independent, once again we can multiply: (5/6)(5/6)(5/6)(5/6) = 0.482253. So the opponent wins about 48 percent of the time, and the roller wins about 52 percent of the time--a nearly fair game, just as players had sensed for centuries.
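A brute-force check in Python (a sketch): enumerate all 1,296 equally likely rolls and count how often at least one six appears.

    from itertools import product

    outcomes = list(product(range(1, 7), repeat=4))   # all 6**4 = 1296 rolls
    roller_wins = sum(1 for o in outcomes if 6 in o)

    print(len(outcomes))                  # 1296
    print(roller_wins / len(outcomes))    # 0.5177... the roller's slight edge
    print(1 - (5/6) ** 4)                 # the same number via the shortcut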

The Sum of Two Dice

Many games have you throw two dice and add the numbers together. We can calculate probabilities for these games.

QUESTION: What is the sample space for the random process of rolling two dice and taking the sum? Are there 36 possibilities in this sample space? What does the number 36 represent?

When you roll two dice and each die comes up with a one facing up, this is called "snake eyes." What is the probability of rolling "snake eyes"?

The probability of rolling "snake eyes" is 1/36. Here are a couple of ways to see this.

  1. The probability of rolling a one on a single die is 1/6. Multiplying this by the probability of rolling a one on a second die gives (1/6)(1/6) = 1/36.
  2. If you look at all of the possible combinations from rolling two dice, there are 36. Of those 36, only one of them is "snake eyes." Therefore, the probability of rolling "snake eyes" is 1/36.

Next, let us try calculating the probability of rolling a pair of dice and having the numbers sum to three. This probability is 2/36. Again, we can look at this in two ways.

  1. Think of rolling one die first. To have a chance at having a total of three on both dice, the first die must come up either 1 or 2. The probability of this is 2/6.

     Even if we roll a helpful number on the first die, the second die must come up exactly correct in order to give us a sum of three. For example, if we roll a 1 on the first die, then we must get a 2 on the second die. The chance of getting the exact number is 1/6.

     The probability of rolling a three is the probability of rolling a helpful number on the first die times the probability of rolling the exact correct number on the second die. In this case, it is (2/6)(1/6) = 2/36.

  2. Of all of the 36 possible combinations of two dice, two of them can add up to three. You can get a two on the first die and a one on the second die, or vice-versa. Therefore, the probability of rolling a three is 2/36.
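Here is a sketch of the brute-force check for the sum-of-three case:

    from itertools import product

    pairs = list(product(range(1, 7), repeat=2))    # all 36 ordered pairs
    favorable = [p for p in pairs if sum(p) == 3]   # (1,2) and (2,1)
    print(len(favorable), "/", len(pairs))          # 2 / 36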

QUESTION: Rolling two sixes is called "boxcars." What is the probability of rolling boxcars? Calculate this two ways, as we did above for snake eyes.

Calculate the probability of rolling two dice that sum to 4. That sum to 5. That sum to 6. That sum to 7. Use two methods, as we did above. Do we need to calculate the probabilities of rolling 8, 9, 10, and 11, or can we infer them from some other probabilities?

Here are some other problems involving dice.

  1. In a popular baseball simulation game, one player hits a home run if two dice sum to either three or ten. Another player hits a home run if two dice sum to six. Which player has the better chance of hitting a home run?

  2. In the game Risk, an "attacker" might roll one die against a "defender" who rolls one die. If the number on the defender's die comes up the same or higher than the number on the attacker's die, the defender wins. What is the probability that the attacker will win?

  3. In Risk, the "attacker" is allowed to roll two dice and pick the higher number to compare with the "defender's" one die. A tie still goes to the defender. Now, what is the probability that the "attacker" will win?
Probability Rules

Now, we are going to provide a mathematical treatment of probability. That means we are going to introduce formal definitions, axioms, and theorems. We illustrate these with examples from one coin toss and the sum of two dice.

Expression   Definition                   One Coin Toss   Sum of Two Dice
S            The sample space             {H,T}           {2,3,4,5,6,7,8,9,10,11,12}
P(A)         The probability of event A   P(H) = 1/2      P(4) = 3/36
Ac           The complement of event A    Hc = T          4c = {2,3,5,6,7,8,9,10,11,12}

Next, we introduce a few simple axioms.

Axiom        Explanation                                           Coin Example     Dice Example
P(S) = 1     One of the outcomes in the sample space must occur.   must be H or T   must be 2-12
P(A) >= 0    Probability is never less than 0.                     P(H) > 0         P(4) > 0

From these definitions and axioms, it follows that P(Ac) = 1 - P(A). That is, if A does not occur, then the complement of A must occur. If a coin flip is not heads, then it is tails. If a dice roll is not 4, then it must be 2, 3, or 5-12.

Compound Events

A compound event is an event that is derived from two other events. For example, if we roll two dice, then the event "getting a six on either the first or second die" is a compound event.

We could flip a coin and roll a die to get a compound event. The compound event might be (H,4), meaning that the coin came up heads and we rolled a 4 on the die.

There are two types of compound events:

  1. The union of two events A and B is the probability of A or B occurring. It is written as P(A or B).
  2. The intersection of two events A and B is the probability of A and B occurring. It is written as P(A and B).

Two events are said to be disjoint, or mutually exclusive, if and only if P(A and B) = 0. For example, if we roll one die and event A is getting a 2 and event B is getting a 3, then it is impossible for the event "A and B" to occur. The two events are mutually exclusive.

Two events are said to be independent if P(A and B) = P(A)P(B), provided that P(A) and P(B) are both nonzero. What that means is that the probability of one event occurring does not depend on whether or not the other event occurs. If A is getting heads on a coin flip and B is rolling a 4 on a die, then A and B are independent.

QUESTION: Consider a single roll of a die. Define event A as getting a number greater than 3. Define event B as getting an even number. Are A and B mutually exclusive? Are A and B independent?

There is a general rule for finding the union of two events.

P(A or B) = P(A) + P(B) - P(A and B)

A fairly typical exam question is to give you three out of the four probabilities in this equation and ask you to solve for the fourth. Note that if A and B are disjoint, then the last term is zero, and the probability of the union is just P(A) + P(B).
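For example (an illustration with made-up events): roll one die, let A be getting a 1 or 2, and let B be getting an even number. Then P(A) = 2/6, P(B) = 3/6, and P(A and B) = P(getting a 2) = 1/6, so P(A or B) = 2/6 + 3/6 - 1/6 = 4/6.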

For the compound event A and B, we can think of the sample space as consisting of four elements:

{(A and B), (Ac and B), (A and Bc), (Ac and Bc)}

A fairly typical exam question is to give you the probabilities for three out of the four of these elements and then ask you to solve for the fourth. Once you remember that the probabilities for all four have to sum to one--because this is the sample space--it becomes pretty simple.

Conditional Probability

Conditional probability is a way of describing the influence of one event on another. Because much of statistics is about making predictions, conditional probability is one of the most important concepts in this course.

The statement, "Bernie Williams is a better hitter with runners in scoring position," is a statement about conditional probability. It says that the likelihood of the event "Bernie Williams gets a hit" is positively related to the event "Bernie Williams comes to bat with runners in scoring position."

The statement, "The Democratic candidate is more likely to win if there is a higher voter turnout," is a statement about conditional probability. It says that the likelihood of the event "The Democratic candidate wins" is positively related to the event "Voter turnout is high."

Suppose that we are playing a game where we flip three coins with the object of having two or more flips come up heads. Before we start the game, the probability of winning is 4 out of 8, or 1/2.

After we flip one coin, our probability of winning the game changes. If the first coin comes up H, then our probability of winning the game is 3/4. We say that the conditional probability of the event "winning the game" given the event "the first coin comes up H" is 3/4. What is the conditional probability of winning the game given that the first coin comes up T?

In general, we define the conditional probability of event B given event A as

P(B|A) = P(A and B)/P(A)

That is, the conditional probability of event B given event A is equal to the probability of the compound event (A and B) divided by the probability of event A.
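A sketch checking the three-coin game against this formula, by enumerating the eight equally likely outcomes:

    from itertools import product

    outcomes = list(product("HT", repeat=3))   # 8 equally likely outcomes

    def wins(o):          # event B: two or more heads
        return o.count("H") >= 2

    def first_heads(o):   # event A: the first coin comes up H
        return o[0] == "H"

    p_a = sum(first_heads(o) for o in outcomes) / 8                     # 4/8
    p_a_and_b = sum(first_heads(o) and wins(o) for o in outcomes) / 8   # 3/8
    print(p_a_and_b / p_a)   # 0.75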

For example, consider the compound event, "Bernie Williams gets a hit with runners in scoring position." We might represent this in terms of A and B as follows:

Event     Description                                                        Probability
A         Bernie Williams comes up with a runner in scoring position         0.2
B         Bernie Williams gets a hit                                         0.3
A and B   Bernie Williams gets a hit with a runner in scoring position       0.08
P(B|A)    Conditional probability of Bernie Williams getting a hit with
          a runner in scoring position                                       0.4

Note that according to these data, P(B|A), the conditional probability of Bernie Williams getting a hit with a runner in scoring position, is higher than the unconditional probability of Bernie Williams getting a hit. Bernie Williams is a .400 hitter with a runner in scoring position, compared with a .300 hitter in general.

What is the probability of the event (A and Bc), of Bernie Williams coming up with a runner in scoring position and not getting a hit? We know that

P(A and B) + P(A and Bc) = P(A)

Therefore, P(A and Bc) = 0.2 - 0.08 = 0.12. Fill out the rest of the table below.

Event       Description                                                      Probability
A and Bc    Bernie Williams comes up with a runner in scoring position
            and does not get a hit                                           0.12
B and Ac    Bernie Williams comes up without a runner in scoring position
            and gets a hit                                                   ?
Ac and Bc   Bernie Williams comes up without a runner in scoring position
            and does not get a hit                                           ?
P(B|Ac)     Conditional probability of Bernie Williams getting a hit
            without a runner in scoring position                             ?

We know that Bernie Williams is a .400 hitter with runners in scoring position. What is his batting average without a runner in scoring position? If you had no idea whether or not a runner is in scoring position, what would be your best estimate of the probability of Bernie Williams getting a hit? How does knowing whether or not a runner is in scoring position improve the accuracy of your prediction?

Bayes' Theorem

We have just learned that conditional probability can be used to improve our prediction of events going forward. That is, knowing whether or not a runner is in scoring position helps us to predict more accurately whether or not Bernie Williams gets a hit.

Bayes' Theorem, published posthumously in the eighteenth century by Reverend Thomas Bayes, says that you can use conditional probability to make predictions in reverse! That is, if you know that Bernie Williams got a hit, you can "predict" the probability that he came up with a runner in scoring position.

Bayes' Theorem, sometimes called the Inverse Probability Law, is an example of what we call statistical inference. It is very powerful. In many situations, people make bad intuitive guesses about probabilities, when they could do much better if they understood Bayes' Theorem.

Recall that the definition of conditional probability is:

[1] P(B|A) = P(A and B)/P(A)

Bayes' Theorem is used to solve for the inverse conditional probability, P(A|B). By definition,

[2] P(A|B) = P(A and B)/P(B)

Solving [1] for P(A and B) and substituting into [2] gives Bayes' Theorem:

P(A|B) = [P(B|A)][P(A)]/P(B)
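The theorem is one line of Python; a minimal sketch, checked against the Bernie Williams numbers used below:

    def bayes(p_b_given_a, p_a, p_b):
        """P(A|B) = P(B|A) * P(A) / P(B)."""
        return p_b_given_a * p_a / p_b

    print(round(bayes(0.4, 0.2, 0.3), 3))   # 0.267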

We can use Bayes' Theorem to find the conditional probability of A given B from the conditional probability of B given A and the unconditional probabilities of events A and B.

For example, we said that Bernie Williams is a .400 hitter with a runner in scoring position. In other words, P(B|A) = 0.4. We also said that the unconditional probability of Bernie Williams coming up with a runner in scoring position is 0.2, and that the unconditional probability of Bernie Williams getting a hit is 0.3.

Therefore, if you are given the information that Bernie Williams got a hit, you can infer something about the probability that there was a runner in scoring position. Using Bayes' Theorem,

P(A|B) = [P(B|A)][P(A)]/P(B) = [0.4][0.2]/[0.3] = 0.267

What this says is that when we are given the information that Bernie Williams got a hit, we should estimate the probability that he came up with a runner in scoring position as 0.267, which is higher than the unconditional probability of 0.2 that he will come up with a runner in scoring position.

Although the derivation of Bayes' Theorem is straightforward, not everyone is comfortable with it. The difficult aspect to accept is that instead of using probability to predict the future, you are using it to make inferences about the past. People who think in terms of causality have trouble with this.

Everyone understands what it means to say, "Bernie Williams is batting with a runner in scoring position. He has a .400 chance of getting a hit." Can you interpret the statement, "Bernie Williams got a hit. Therefore, there is a .267 chance that there was a runner in scoring position"?

Here is a classic illustration of Bayes' Theorem. Suppose that you are given two drawers. You cannot see the contents of the drawers, but you are told that one drawer contains two gold coins and the other drawer contains one gold coin and one silver coin. If someone pulls a coin at random out of drawer A and it turns out to be gold, what is the probability that drawer A is the drawer with two gold coins?

Many people would say, "The chances are fifty-fifty that drawer A is the drawer with two gold coins." However, that is not the correct answer. Although there are many ways to get the correct answer, we will use Bayes' Theorem.

Event   Description                                                       Probability
A       Drawer A has two gold coins                                       0.5
B       Person chooses a gold coin out of the four coins                  0.75
B|A     Conditional probability of choosing a gold coin from drawer A
        if it has two gold coins                                          1.0

Using Bayes' Theorem, we have

P(A|B) = [P(B|A)][P(A)]/P(B) = [1.0][0.5]/[0.75] = 2/3

What other lines of reasoning can lead you to the correct answer that when someone picks a gold coin out of a drawer chosen at random the chances are two out of three that the drawer contains two gold coins?

Very few doctors understand that a symptom is meaningless if as many as 10 percent of healthy patients have that symptom and the disease is relatively rare. Too bad they do not know Bayes' Theorem.

Here is another illustration of Bayes' Theorem. Suppose that you are diagnosed with microscopic hematuria (blood in the urine that is only visible under a microscope). This symptom occurs in 10 percent of all people and in 100 percent of people with kidney cancer. You would like to know the probability that you have kidney cancer, which occurs in 0.0002 percent of all people. Remember that if we express a probability in percent, then we must multiply by 0.01 to get the probability as a fraction of one.

Event   Description                                                       Probability
A       Someone has kidney cancer                                         0.000002
B       Someone has microscopic hematuria                                 0.10
B|A     Conditional probability of having hematuria given kidney cancer   1.0

Using Bayes' Theorem, we have

P(A|B) = [P(B|A)][P(A)]/P(B) = [1.0][0.000002]/[0.1] = 0.00002

That is, you still have a very low probability of kidney cancer. The reason is that the symptom of microscopic hematuria is relatively common in the healthy population. If it were true that only one hundredth of one percent of all people had microscopic hematuria, then microscopic hematuria would be a much more powerful indicator of kidney cancer. What would be the probability of kidney cancer if this were the case?

Contingency Tables

We use a contingency table to represent the probabilities of two events, A and B, which may or may not be independent. For example, event A could be that Ronit does all of her homework and event B is that Ronit passes her first quiz. The contingency table might look like this:

Event        Ac    A     Row Sum
B            0.3   0.4   0.7
Bc           0.2   0.1   0.3
Column Sum   0.5   0.5   1.0

In the contingency table, an important square is the intersection of A and B. This is the probability of the event (A and B), which in this example is 0.4, or 40 percent. The upper-left corner gives the probability of event B occurring without event A, which in this example is 0.3, or 30 percent.

The lower-left corner gives the probability that neither A nor B occurs, which is 20 percent in this example. Finally, at the intersection of A and Bc we have event A occurring without event B occurring, which in this example is a 10 percent probability.

Overall, event A has a probability of 0.5, which is the column sum under the A column. Event B has a probability of 0.7 (70 percent), which is the sum of row B.

Visible Relationships

Some important relationships are visible in the contingency table. In particular:

  1. P(A) = P(A and B) + P(A and Bc).
  2. P(B) = P(A and B) + P(B and Ac).
  3. P(A) + P(Ac) = 1.
  4. P(B) + P(Bc) = 1.

Often, you are given information in a contingency table that is incomplete. You can use these relationships to fill in the rest of the information.

You can use the information in a contingency table to test for statistical independence. In particular, compare P(A and B) with P(A)P(B). In the example above, P(A and B) is 0.4, which is greater than P(A)P(B) = (0.5)(0.7) = 0.35, which means that A and B are positively related. In words, Ronit passing her quiz is positively related to her doing her homework.

When P(A and B) is less than P(A)P(B), then the two events are negatively related. When P(A and B) = P(A)P(B), the two events are statistically independent.

Conditional Probability

Conditional probability is an invisible component of a contingency table. It can easily be calculated from the table.

The conditional probability of B given A is written as P(B|A) = P(A and B)/P(A). In the example, P(B|A) = 0.4/0.5 = 0.8, or 80 percent.

The conditional probability of A given B is written as P(A|B) = P(A and B)/P(B). In the example, P(A|B) = 0.4/0.7 ≈ 0.57, or 57 percent.

Sometimes, you will be given three pieces of information: P(A), P(B), and P(B|A). You can use this information to fill out the entire contingency table, as the sketch after this list shows.

  1. Compute P(A and B) by multiplying P(B|A) times P(A).
  2. Compute P(Ac and B) by subtracting P(A and B) from P(B).
  3. Compute P(A and Bc) by subtracting P(A and B) from P(A).
  4. Compute P(Ac and Bc) by subtracting the sum of the probabilities calculated in the first three steps from 1.0.
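Here is a minimal sketch of that procedure in Python, using the numbers from the contingency table above (P(A) = 0.5, P(B) = 0.7, P(B|A) = 0.8):

    p_a, p_b, p_b_given_a = 0.5, 0.7, 0.8

    p_a_and_b = p_b_given_a * p_a                              # step 1: 0.4
    p_ac_and_b = p_b - p_a_and_b                               # step 2: 0.3
    p_a_and_bc = p_a - p_a_and_b                               # step 3: 0.1
    p_ac_and_bc = 1 - (p_a_and_b + p_ac_and_b + p_a_and_bc)    # step 4: 0.2

    cells = [p_a_and_b, p_ac_and_b, p_a_and_bc, p_ac_and_bc]
    print([round(c, 2) for c in cells])           # [0.4, 0.3, 0.1, 0.2]

    # Independence check: compare P(A and B) with P(A)*P(B).
    print(round(p_a_and_b, 2), "vs", p_a * p_b)   # 0.4 vs 0.35 -> positively related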

Practice Questions

Here are some sample test questions for Chapter 6 and Bayes' Theorem, with review topics in parentheses. On an actual test, I will not put anything in parentheses about the topic. You are supposed to figure that out.

  1. (sample space)

     If R is the measure of an earthquake on the Richter scale, then of the following sample spaces for R, which one(s) would be considered valid? For ones that you do not think are valid, briefly explain why not.

     a) {0 < R <= 2, 2 < R <= 6, 6 < R}
     b) {0 < R <= 4, 3 < R <= 6, 6 < R}
     c) {1,2,3,4,5,6,7,8,9}

  2. (multiplication rule for independent events)

     In the dice game craps, each throw consists of rolling two dice and taking their sum. You win on the first throw if you get a 7 or 11. Big Julie from Chicago brings his own dice to a game and wins on his first throw four times in a row. What is the probability of this happening with fair dice?

  3. (contingency tables)

     A magazine asked some women aged 55 or older whether they thought their husbands needed Rogaine or Viagra. 50 percent said that they thought that their husbands needed Rogaine. 20 percent said that they thought that their husbands needed Viagra. 5 percent thought that they needed both.

     Are the need for Rogaine and the need for Viagra independent, related positively, or related negatively? Justify your answer.

  4. (Pascal's triangle)

     The World Tiddlywinks Championship is a best-of-9 series, with the first team to win five games taking the trophy. Suppose that the finalists, MIT and Cal Tech, are evenly matched, so for each game MIT has a .5 probability of winning. What is the probability that after five games MIT will be up 3 games to 2?

Cumulative Review, sample questions covering chapter 6

  1. A teacher has three sections of a history course. In section A there are 18 students, and 15 of them passed. In section B there are 14 students, and 13 of them passed. In section C there are 19 students, and 17 of them passed. Given that a student did not pass, what is the probability that the student came from section A?

     If we choose one of the teacher's students at random, what is the probability that the student passed?

  2. Suppose we took a poll and asked 100 students to name their favorite TV program. The answers are:

     Age          Dawson   Felicity   Simpsons
     9th grade    15       6          9
     10th grade   12       14         7
     11th grade   12       13         12

     Given that a student picked the Simpsons, what is the probability that the student is in 9th or 10th grade?

  3. A discrete probability distribution has five possible events. The probabilities of events A, B, C, and D are .2, .3, .4, and .1, respectively. What is the probability of event E?

  4. Suppose that the probability of the Redskins having a winning season is .5 and the probability that both the Redskins and the Capitals will have a winning season is .3. If the two events are independent, what is the probability that the Capitals will have a winning season?

  5. A bank uses a 3-digit PIN for its ATM cards. How many possible PINs are there (using only the numbers 0-9, not letters)? The ATM lets the user try to enter a PIN 4 times before it confiscates the card. What is the probability that someone will be able to guess a PIN before the machine confiscates the card?

  6. Suppose that a test for ulcers will indicate an ulcer 98 percent of the time when an ulcer is present. The test also will indicate an ulcer 1 percent of the time when an ulcer is not present (this type of result is called a false positive). Suppose that 7 percent of people actually have ulcers.

     • What is the probability that a person selected at random will test positive for an ulcer?
     • What is the probability that a person who tests positive actually will have the disease?