AP Statistics Audio Lectures
χ² tests
by Arnold Kling

Chapter 13 is about χ² tests, spelled chi-square and pronounced ki-square. These tests apply to categorical data with either one or two variables, at least one of which can take on more than two values.

Categorical data is distinct from numerical data. If I measure the size of cars by weight, then I am using numerical data. Instead, if I classify cars as sub-compact, compact, mid-size, and other, then I am constructing categorical data. Other examples of categorical data:

• country in which you were born
• grade (A, B, C, D, or F) in history
• marital status (single, married, widowed, divorced)

Categorical data is arranged in tables. For each cell in a table, we can calculate the value

(O - E)²/E

where O is the observed quantity in the cell and E is the expected quantity in the cell. For some reason, I always want to think in terms of proportions, which we could write with lower case letters o and e. If n is the size of the sample in the observed data and e is the expected proportion in the sample, then E/n = e and O/n = o.

For example, suppose that a rental company expects the distribution of requests by size of car to be:

Expected

sub-compact    compact    mid-size    other
.22            .49        .12         .15

Next, suppose that a sample of 100 requests is taken, and the results turn out to be

Observed

sub-compact    compact    mid-size    other
28             44         18          10

Because I happened to pick a sample size of 100, the values for E in each cell can be obtained by multiplying the expected sample proportions by 100. So, to calculate the (O - E)²/E value for the first cell, we take (28 - 22)²/22 = 36/22 = 1.64.

If we compute this value for each cell, and take the sum, we have a statistic called X². In the example, this will be 6.81. If the sample size is sufficiently large, this statistic will approximately follow a χ² distribution with degrees of freedom equal to the number of categories minus one. (The intuition behind the degrees of freedom is that once you know the percentages in all but one category, the percentage in the final category is fixed.) In this example, there are 4 categories, so there are 3 degrees of freedom.
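The arithmetic above can be checked with a few lines of Python. This is only a sketch of the hand calculation, using the expected proportions and observed counts from the tables; the variable names are my own.

```python
# Goodness-of-fit statistic for the rental-car example.
categories = ["sub-compact", "compact", "mid-size", "other"]
expected_props = [0.22, 0.49, 0.12, 0.15]   # expected proportions from the text
observed = [28, 44, 18, 10]                 # observed counts, n = 100

n = sum(observed)
expected = [n * p for p in expected_props]  # expected counts E = n * e

# One (O - E)^2 / E term per cell
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(terms)
df = len(categories) - 1                    # number of categories minus one

for name, term in zip(categories, terms):
    print(f"{name:12s} {term:.2f}")
print(f"X^2 = {x2:.2f}, df = {df}")
```

The first term printed is the 1.64 worked out by hand, and the sum is the 6.81 quoted in the text.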

On p. 842 of the book, there is a table of χ² values. For 3 degrees of freedom, a value of 6.81 corresponds to a p-value somewhere between .10 and .05. This means that if the null hypothesis is that requests for rental cars follow the expected distribution, we can reject that hypothesis at the 10 percent level but not at the 5 percent level.

To calculate an exact p-value using the calculator, go to stats/distributions and find χ²cdf(6.81, 1000, 3), where 6.81 is the value of our statistic, 1000 is a very large number that stands in for the upper end of the tail (you could try 100 or 1000000 to see if it makes a difference), and 3 is the number of degrees of freedom.
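The same tail area can be sketched without a calculator. The code below is not the calculator's method; it relies on the fact that for exactly 3 degrees of freedom the area to the right of x under the χ² curve has a closed form, erfc(√(x/2)) + √(2x/π)·exp(-x/2). The function name chi2_upper_tail_df3 is made up for this illustration, and the formula should not be used for other degrees of freedom.

```python
import math

def chi2_upper_tail_df3(x):
    """Area to the right of x under the chi-square curve with 3 degrees of freedom.

    Closed form that holds only for df = 3 (an assumption of this sketch).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_upper_tail_df3(6.81)
print(f"p-value = {p_value:.3f}")

# Area between 6.81 and a "very large number", as on the calculator.
# The choice of upper limit barely matters once it is far out in the tail.
for upper in (100, 1000, 1000000):
    area = chi2_upper_tail_df3(6.81) - chi2_upper_tail_df3(upper)
    print(upper, f"{area:.6f}")
```

The result lands between .05 and .10, in line with the table lookup, and the three upper limits give the same area to six decimal places.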

Suppose that the observed sample proportions were the same in a sample size of 200. What would be the value of X²? What would be the p-value?

This test is sometimes called a Goodness of Fit test. It tests how well our original model (the expected proportions) fits our data (the observed cell counts).

Another use of the χ² test is to examine whether or not two categorical variables are independent. Recall that A and B are independent if P(A and B) = P(A)P(B). We use this relationship to develop the expected counts in a table of two variables, and then compare them with the observed counts.

For example, suppose that we want to test whether in our sample of 100 the choice of car size is independent of whether the rental is for a weekend or a weekday. Suppose that 20 percent of rentals are for the weekend, and 80 percent of rentals are for a weekday. Assuming that the size of car is independent of the day of rental, then if 28 percent of rentals are sub-compacts, 20 percent of these, or 5.6 percent of all rentals, should be sub-compacts for the weekend, and 80 percent of 28 percent, or 22.4 percent of all rentals, should be sub-compacts for a weekday.

Overall, the table of expected percentages is

Expected percentages (assuming independence)

rental day    sub-compact    compact    mid-size    other
weekend       5.6            8.8        3.6         2.0
weekday       22.4           35.2       14.4        8.0
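Each entry in this table is just P(day) times P(size), written as a percentage. A short sketch, assuming the size shares are the ones observed in the sample of 100 (28, 44, 18, and 10 percent) and using dictionary names of my own choosing:

```python
# Expected percentages under independence: P(day and size) = P(day) * P(size).
day_share = {"weekend": 0.20, "weekday": 0.80}
size_share = {"sub-compact": 0.28, "compact": 0.44, "mid-size": 0.18, "other": 0.10}

expected_pct = {
    day: {size: 100 * p_day * p_size for size, p_size in size_share.items()}
    for day, p_day in day_share.items()
}

for day, row in expected_pct.items():
    print(f"{day:8s}", [round(pct, 1) for pct in row.values()])
```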

However, suppose that the actual counts are

Actual counts

rental day    sub-compact    compact    mid-size    other
weekend       2              6          9           3
weekday       26             38         9           7

Now, we can calculate X² for the whole table. Because the sample size is 100, the expected percentages are also the expected counts. Adding up the (O - E)²/E terms from all 8 cells gives a number about 15.

The degrees of freedom for a table with r rows and c columns is equal to (r-1)(c-1), or in this case (2-1)(4-1) = 3. (Intuitively, the row and column percentages have to sum to one, so the degrees of freedom is the [number of row possibilities minus one] times the [number of column possibilities minus one].)

In this case, the p-value is very small (about .002). There is strong evidence that the size of car is not independent of the day of the week of the rental.
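The whole test of independence can be sketched in Python starting from the actual counts alone, since the expected counts come from the row and column totals. This is an illustration rather than a general-purpose routine: the p-value line uses a closed form that is valid only for 3 degrees of freedom, which happens to be the case here.

```python
import math

# Actual counts from the text: rows are rental day, columns are car size.
observed = [
    [2, 6, 9, 3],      # weekend
    [26, 38, 9, 7],    # weekday
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, E = (row total) * (column total) / n for each cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail area; this closed form holds only when df = 3.
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The expected counts computed here match the table of expected percentages because the sample size is 100.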

We can do the test on the calculator by entering the data in the table above (the actual counts, not the expected counts) as a matrix. Select matrix/edit/1:[A] and enter the number of rows (2) and the number of columns (4), followed by the data in the cells. Then, go to STAT/TESTS and select C:χ²-Test. Let A be the observed matrix, and B be the expected matrix (which the calculator will compute). Choose calculate, and you should get X² and a p-value.

Here is another example, involving sample proportions. Suppose that we do an experiment where we try different drug treatments on four equal-sized samples. The proportions of successes in the four samples turn out to be .48, .12, .24, and .16, respectively. For a sample size of 50 for each drug trial, are the differences in outcomes statistically significant?

If we note that in each sample the proportion of failures is one minus the proportion of successes, and take into account the fact that the sample size in each column is 50, then we can write the results in a matrix.

outcome    drug 1    drug 2    drug 3    drug 4
success    24        6         12        8
failure    26        44        38        42

If you put this matrix into a calculator and calculate the χ² statistic, you can see whether the differences in the results across drugs are significant. Note that if there were only two drug treatments, then a simple z-test would work.
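For readers without a calculator at hand, here is a rough Python sketch of the same computation. It builds the matrix from the success proportions and the sample size of 50, then applies the usual (O - E)²/E sum. The helper name chi_square_table is hypothetical, and the p-value formula is a closed form that holds only for 3 degrees of freedom.

```python
import math

def chi_square_table(observed):
    """Return (X^2, degrees of freedom) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            x2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return x2, df

success_props = [0.48, 0.12, 0.24, 0.16]   # from the text
sample_size = 50                           # per drug trial

successes = [round(p * sample_size) for p in success_props]
failures = [sample_size - s for s in successes]

x2, df = chi_square_table([successes, failures])

# Upper-tail area; this closed form holds only when df = 3.
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(successes, failures)
print(f"X^2 = {x2:.1f}, df = {df}, p-value = {p_value:.5f}")
```

With these counts the statistic comes out to 20.8 on 3 degrees of freedom, and the p-value is well below .001.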

Finally, a note about sample size. I said that the sample size has to be sufficiently large for χ² tests to be valid. The rule of thumb that our book suggests is that the expected count in every cell should be at least one, and no more than 20 percent of the cells should have an expected count of less than five.
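The rule of thumb is easy to turn into a quick check. The function below is a sketch with a name of my own (expected_counts_ok); it takes a flat list of expected counts and applies the two conditions exactly as stated above.

```python
def expected_counts_ok(expected_counts):
    """Rule of thumb from the text: every expected count is at least 1,
    and no more than 20 percent of cells have an expected count below 5."""
    all_at_least_one = min(expected_counts) >= 1
    share_below_five = sum(1 for e in expected_counts if e < 5) / len(expected_counts)
    return all_at_least_one and share_below_five <= 0.20

# Expected counts for the goodness-of-fit example (proportions times 100).
car_expected = [22, 49, 12, 15]

# Expected counts for the drug example: 12.5 successes and 37.5 failures per drug.
drug_expected = [12.5] * 4 + [37.5] * 4

print(expected_counts_ok(car_expected))
print(expected_counts_ok(drug_expected))
```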