Simulation

AP Statistics Lectures

by Arnold Kling


A statistical simulation is an attempt to estimate the properties of a process by using a random variable to represent that process.

For example, suppose that you did not know how to use Pascal's triangle or the binomial distribution. You are told that two evenly matched teams, A and B, are going to play a best-of-five championship series, and you want to estimate how often team A will win the championship in exactly four games.
This process can be simulated by using a series of coin flips. You could represent each game by a flip of a coin, with heads being a win for team A and tails a win for team B.

In this simulation, one "trial" consists of a series of coin flips until you have accumulated either three heads or three tails, representing the end of the championship series. For example, if you flip H, T, H, H, then the result of that trial is that team A wins in four games. This constitutes a success. On the other hand, the sequence H, H, H would constitute a failure, in that although team A wins, they do not win in exactly four games.

A simulation would consist of a large number of trials, say 100. If there were 35 successes in a hundred trials, then you would estimate from the simulation that the probability of team A winning in exactly four games is about 35 percent.
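A minimal Python sketch of this coin-flip simulation follows, with Python's random module standing in for the coin; the function names are illustrative, not from the lecture. It can be checked against the exact answer, which the binomial distribution gives as 3/16 = 0.1875: team A must win exactly two of the first three games (3 possible orders) and then win game four, and each such sequence has probability (1/2)^4.

```python
import random

def play_series(rng):
    """Play one best-of-five series by coin flips; return (winner, games played)."""
    wins_a = wins_b = 0
    games = 0
    while wins_a < 3 and wins_b < 3:
        games += 1
        if rng.random() < 0.5:  # heads: team A wins this game
            wins_a += 1
        else:                   # tails: team B wins this game
            wins_b += 1
    return ("A" if wins_a == 3 else "B"), games

def estimate_a_in_four(trials, seed=None):
    """Fraction of trials in which team A wins the series in exactly four games."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        winner, games = play_series(rng)
        if winner == "A" and games == 4:  # e.g. H, T, H, H is a success
            successes += 1
    return successes / trials

if __name__ == "__main__":
    for n in (100, 10_000, 100_000):
        print(n, "trials:", estimate_a_in_four(n, seed=1))
    print("exact: 3/16 =", 3 / 16)
```

With only 100 trials the estimate can easily miss 0.1875 by several percentage points (the standard error at 100 trials is roughly 0.04); the spread shrinks as the number of trials grows.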

In practice, statisticians use simulations in situations where the characteristics of the distribution cannot easily be worked out mathematically. Sometimes, there are too many variables interacting. This happens in financial modeling, where the value of a security may depend on movements in foreign exchange rates, interest rates, and the prices of different stocks.

Another situation that requires simulation is when there are clear reasons to expect departures from normality. If you are using data with a skewed distribution, and you want to get an idea of how the skewness affects confidence intervals, a simulation may be required.
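For instance, one might ask how often a nominal 95 percent t-interval for a mean actually captures the true mean when the data are skewed. The Python sketch below is one way to set that up, under assumptions chosen here for illustration: samples of size 10, an exponential distribution with mean 1 as the skewed case, and normal data with mean 1 as the comparison. The constant 2.262 is the t* value for 95 percent confidence with 9 degrees of freedom; the function names are hypothetical.

```python
import math
import random

N = 10           # sample size assumed for this illustration
T_STAR = 2.262   # t* for 95% confidence with N - 1 = 9 degrees of freedom

def t_interval(sample):
    """95% t-interval for the mean of a sample of size N."""
    xbar = sum(sample) / N
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (N - 1))
    margin = T_STAR * s / math.sqrt(N)
    return xbar - margin, xbar + margin

def coverage(draw, true_mean, reps=20_000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        low, high = t_interval([draw(rng) for _ in range(N)])
        if low <= true_mean <= high:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    # Symmetric case: normal data with mean 1 and standard deviation 1.
    print("normal:", coverage(lambda r: r.gauss(1.0, 1.0), 1.0))
    # Skewed case: exponential data with mean 1 (right-skewed).
    print("skewed:", coverage(lambda r: r.expovariate(1.0), 1.0))
```

Under these assumptions the normal case should come out close to 95 percent, while the skewed case typically captures the true mean noticeably less often, which is the kind of effect a simulation lets you measure.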

Getting a Simulation Right

To construct a simulation correctly, take the following steps.

1. Identify where randomness enters the problem. For example, in a championship series, the outcome of each game is random.
2. Decide how the randomness is best represented. This tells you how to "draw" random values for your simulation. The choices are:
- Normally distributed random variable. For example, a simulation of one-year rates of return on owning stocks might plausibly involve a normally distributed random variable. When you need to take draws of a normal random variable, you can use "randNorm" on your calculator.
- Discrete random variable that can take on n possible values. In the case of a championship game, n=2: either team A wins or team B wins. If the real-world process were drawing a card from a deck and stating the number on the card, then n=13. You can use the "randInt" function on the calculator to generate random integers. The function takes three parameters: the minimum integer, the maximum integer, and the number of integers you want drawn. For example, to simulate drawing five cards with replacement, use randInt(1, 13, 5).
- A continuous uniform random variable, meaning one that can take on any value between a minimum a and a maximum b. The calculator's "rand" function gives a continuous random number between 0 and 1; to get one between a and b, compute a + (b - a)*rand.
For a simulation, a continuous random variable can be converted to a discrete random variable, and vice versa. For example, my Web site used to run a monthly raffle, and the number of entrants varied each month. My formula for choosing a winner was to take a continuous random number between 0 and 1, multiply it by the number of entrants, and round to the nearest integer. For example, if there were 80 entrants and the random number were .2207286, then the winner would be entrant number 18 (80 times .2207286 is about 17.66, which rounds to 18).
3. Use a sequence of random numbers to determine the results of each "trial" in the simulation. For example, in simulating a best-of-five championship, each "trial" is finished when one team has won three games.
4. Summarize the results of the trials.
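Outside the calculator, the same kinds of draws can be sketched with Python's standard random module: random.gauss, random.randint (which includes both endpoints) and random.uniform play the roles of randNorm, randInt and rand. These are Python's functions, not the calculator's, and the numbers in the sketch are made up for illustration. One deliberate change: the hypothetical helper raffle_winner rounds up instead of rounding to the nearest integer. Rounding to the nearest integer can produce a nonexistent entrant 0 and gives entrant 80 only half the chance of the others; rounding up gives every entrant from 1 to 80 an equal chance, and for the .2207286 example it still picks entrant 18.

```python
import math
import random

rng = random.Random(2024)  # seeded so the draws are reproducible

# Normal draw, the analogue of randNorm. The mean 0.08 and standard
# deviation 0.20 are made-up values for a one-year stock return.
stock_return = rng.gauss(0.08, 0.20)

# Discrete draws, the analogue of randInt(1, 13, 5): five card values
# from 1 (ace) to 13 (king), drawn with replacement.
five_cards = [rng.randint(1, 13) for _ in range(5)]

# Continuous uniform draw between a and b, the analogue of a + (b - a)*rand.
a, b = 2.0, 5.0
u = rng.uniform(a, b)

def raffle_winner(u01, entrants):
    """Turn a uniform number in [0, 1) into an entrant number 1..entrants.

    Rounds up rather than to the nearest integer, so every entrant's
    interval of u01 values has the same width, 1/entrants.
    """
    return max(1, math.ceil(u01 * entrants))  # max() guards the case u01 == 0

print(stock_return, five_cards, u)
print(raffle_winner(0.2207286, 80))  # → 18
```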

Random Number Tables

On the AP test, you may be given a table of random numbers to use in a simulation. Although in theory you could be given a table of normal random numbers to do a problem involving a simulation of a normally distributed random variable, that is unlikely. It is more likely that you will see a table of random digits 0 through 9, each equally likely. A sequence of random numbers might look like this:

31339702887384379316349

Suppose we were trying to simulate the number that you get when you draw a card. We saw earlier that this can be done by using randInt to pick an integer between one and thirteen.

With the random number table, we need to use two digits at a time to make it possible for the numbers to cover the values of 1 through 13. We could let 01 represent an ace, 02 represent a deuce, ..., and 13 represent a king. What do we do with 00 and with numbers 14 through 99?

We could throw out 00 and numbers 14 through 99, but that would "waste" most of the numbers. Instead, since 13 times 7 is 91, we could give each card value seven numbers and use all numbers from 01 through 91, throwing out only 00 and 92 through 99. An ace would be represented by 01 through 07, a deuce by 08 through 14, ..., and a king by 85 through 91.

Suppose we were asked to pick three cards, replacing each card after we draw it, so that the probability of drawing any card is always 1 out of 52, and the probability of drawing any number is always 1 out of 13. In the random number sequence above, the first four two-digit numbers are 31, 33, 97, and 02. These represent the cards 5, 5, nothing, and ace, respectively (31 and 33 both fall in 29 through 35, the range for a five, and 97 is thrown out). So our three cards would be two fives and an ace.

The Replacement Issue

A common error in doing simulation problems is to fail to consider whether the problem should be represented using sampling with or without replacement. For example, suppose that we were asked to pick three cards from a deck without putting them back. In that case, it might be best to use only the two-digit numbers 01 through 52 to represent the cards. So 01-04 could be an ace, 05-08 a deuce, ..., 49-52 a king. In addition to throwing out 00 and 53 through 99, you also would throw out any number that you have already drawn. If you get 02 and later another 02, the first 02 counts as an ace but the second 02 counts as nothing. Note that an 02 followed by an 03 would count as two aces.
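The without-replacement rule can be sketched in Python by remembering which two-digit numbers have already been used; the function names are illustrative. On the sequence shown earlier, 31 and 33 give an eight and a nine, 97 is thrown out, and 02 gives an ace. The second call reproduces the 02, 02, 03 case just described.

```python
def two_digit_numbers(digits):
    """Split a string of random digits into consecutive two-digit numbers."""
    return [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]

def draw_without_replacement(digits, how_many):
    """Deal distinct cards using 01-04 = ace, 05-08 = deuce, ..., 49-52 = king.

    00, 53-99 and any number already drawn are thrown out.
    Returns (number, card value) pairs, where 1 = ace and 13 = king.
    """
    seen = set()
    hand = []
    for n in two_digit_numbers(digits):
        if len(hand) >= how_many:
            break
        if 1 <= n <= 52 and n not in seen:
            seen.add(n)
            hand.append((n, (n - 1) // 4 + 1))
    return hand

TABLE = "31339702887384379316349"
print(draw_without_replacement(TABLE, 3))     # → [(31, 8), (33, 9), (2, 1)]
print(draw_without_replacement("020203", 2))  # → [(2, 1), (3, 1)]
```

Tracking the two-digit numbers rather than the card values is what allows two different aces (02 and 03) while rejecting the same ace (02) twice.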

Exercises

Use the random number table, Table B, in the back of the book, to simulate the following:
A carton of eggs holds 12 eggs. In a shipment of 3 cartons of eggs (total of 36 eggs), each egg has a probability of .05 of being broken. A carton with at least one broken egg is a bad carton. What is the probability that a shipment of three cartons will contain no bad cartons? One bad carton? Two bad cartons? Three bad cartons?
1. Explain carefully how you will use the random number table to conduct this simulation, including what digits you will use and what each digit(s) will represent.
2. Conduct two trials where you show everything: copy down the random numbers, show how you grouped them, and show how they generated the results.
3. Conduct eight more trials, and summarize the results of all ten trials.
4. Use probability theory to calculate the probability of zero, one, two, or three bad cartons.
In order to confuse the opposition, Coach Kling decides to pick the starting lineup of the basketball team at random. Suppose that there are 12 players on the team, and the coach picks 5 starters. How often would Himmelfarb and Joel both be in the starting lineup?
Use the random number table to conduct 15 trials. Explain carefully how you conduct the simulation, and show explicitly how the random numbers correspond to results in the first two trials. Then summarize the results of the 15 trials.