χ^{2} Tests

Chapter 13 is about χ^{2} tests, spelled chi-square and pronounced ki-square. These tests apply to categorical data with either one or two variables, at least one of which can take on more than two values.

*Categorical data* is distinct from numerical data. If I measure the size of cars by weight, then I am using numerical data. Instead, if I classify cars as sub-compact, compact, mid-size, and other, then I am constructing categorical data. Other examples of categorical data:

- country in which you were born
- grade (A, B, C, D, or F) in history
- marital status (single, married, widowed, divorced)

Categorical data is arranged in tables. For each cell in a table, we can calculate the value

(O-E)^{2}/E

where O is the observed quantity in the cell and E is the expected quantity in the cell. For some reason, I always want to think in terms of proportions, which we could write with lower case letters o and e. If n is the size of the sample, then the observed proportion is o = O/n and the expected proportion is e = E/n.
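Written out for a single cell, the link between counts and proportions looks like this. It uses nothing beyond the substitutions O = no and E = ne:

```latex
\frac{(O-E)^{2}}{E}
  = \frac{(no-ne)^{2}}{ne}
  = \frac{n^{2}(o-e)^{2}}{ne}
  = n\,\frac{(o-e)^{2}}{e}
```

So for fixed observed and expected proportions, each cell's value grows in proportion to the sample size n.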

For example, suppose that a rental company expects the distribution of requests by size of car to be:

sub-compact | compact | mid-size | other |
---|---|---|---|
.22 | .49 | .12 | .15 |

Next, suppose that a sample of 100 requests is taken, and the results turn out to be

sub-compact | compact | mid-size | other |
---|---|---|---|
28 | 44 | 18 | 10 |

Because I happened to pick a sample size of 100, the values for E in each cell can be obtained by multiplying the expected proportions by 100. So, to calculate the (O-E)^{2}/E value for the first cell, we take (28-22)^{2}/22 = 36/22 = 1.64.

If we compute this value for each cell and take the sum, we have a statistic called X^{2}. In the example, this will be 6.81. If the sample size is sufficiently large, this statistic will approximately follow a χ^{2} distribution with degrees of freedom equal to the number of categories minus one. (The intuition behind the degrees of freedom is that once you know the percentages in all but one category, the percentage in the final category is fixed.) In this example, there are 4 categories, so there are 3 degrees of freedom.
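If you would rather check the arithmetic on a computer than by hand, here is a minimal Python sketch of the same calculation. The variable names are made up for illustration. It uses the expected proportions exactly as printed in the table above (note that they add up to 0.98 rather than 1).

```python
# Expected proportions and observed counts, copied from the two tables above.
categories = ["sub-compact", "compact", "mid-size", "other"]
expected_prop = [0.22, 0.49, 0.12, 0.15]   # as printed; these sum to 0.98
observed = [28, 44, 18, 10]

n = sum(observed)                           # sample size (100)
expected = [n * p for p in expected_prop]   # expected counts E

# One (O-E)^2/E term per cell, then the sum of the terms.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
x2 = sum(terms)
df = len(categories) - 1                    # number of categories minus one

for name, term in zip(categories, terms):
    print(f"{name:12s} {term:.2f}")
print(f"X^2 = {x2:.2f} with {df} degrees of freedom")
```

The first term printed is the 1.64 worked out by hand above, and the total is the 6.81 used in the rest of the example.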

On p. 842 of the book, there is a table of χ^{2} values. For 3 degrees of freedom, a value of 6.81 corresponds to a p-value somewhere between .10 and .05. This means that if the null hypothesis is that requests for rental cars are distributed as expected, we can reject that hypothesis at the 10 percent level but not at the 5 percent level.

To calculate an exact p-value using the calculator, go to stats/distributions and find χ^{2}cdf(6.81, 1000, 3), where 6.81 is the value of our statistic (the lower limit), 1000 is a very large number standing in for an infinite upper limit (you could try 100 or 1000000 to see if it makes a difference), and 3 is the number of degrees of freedom.
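The same p-value can be sketched in Python without a calculator. This is only an illustration: the helper name `chi2_sf_3df` is made up, and the shortcut formula inside it is a closed form that holds for 3 degrees of freedom only, so it is not a general-purpose routine.

```python
import math

def chi2_sf_3df(x):
    """Upper-tail area P(X >= x) for a chi-square variable with 3 degrees of freedom.

    Closed form for df = 3 only:
    erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

x2 = 6.81
p_value = chi2_sf_3df(x2)
print(f"p-value = {p_value:.4f}")

# The calculator adds up the area from 6.81 to a large upper limit instead of
# going all the way to infinity. The area beyond the upper limit is left out.
for upper in (100, 1000, 1000000):
    area = p_value - chi2_sf_3df(upper)
    print(f"upper limit {upper}: area = {area:.4f}")
```

The area beyond any of the three upper limits is far too small to show up at four decimal places, which is why the choice of "very large number" does not matter here.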

Suppose that the observed sample proportions were the same in a sample size of 200. What would be the value of X^{2}? What would be the p-value?

This test is sometimes called a Goodness of Fit test. It tests how well our original model (the expected proportions) fits our data (the observed cell counts).

Another use of the χ^{2} test is to examine whether or not two categorical variables are independent. Recall that A and B are independent if P(A and B) = P(A)P(B). We use this relationship to develop the expected counts in a table of two variables, and then compare them with the observed counts.

For example, suppose that we want to test whether, in our sample of 100, the choice of car size is independent of whether the rental is for a weekend or a weekday. Suppose that 20 percent of rentals are for the weekend, and 80 percent of rentals are for a weekday. Assuming that the size of car is independent of the day of rental, if 28 percent of rentals are sub-compacts, then 20 percent of these, or 5.6 percent of all rentals, should be sub-compacts for the weekend, and 80 percent of 28 percent, or 22.4 percent of all rentals, should be sub-compacts for a weekday.

Overall, the table of expected percentages (which, with a sample of 100, are also the expected counts) is

rental day | sub-compact | compact | mid-size | other |
---|---|---|---|---|
weekend | 5.6 | 8.8 | 3.6 | 2.0 |
weekday | 22.4 | 35.2 | 14.4 | 8.0 |
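Every entry in this table is just P(A)P(B) scaled up by the sample size. A short Python sketch of that multiplication, with illustrative variable names, is:

```python
car_sizes = ["sub-compact", "compact", "mid-size", "other"]
size_share = [0.28, 0.44, 0.18, 0.10]            # share of rentals by car size
day_share = {"weekend": 0.20, "weekday": 0.80}   # share of rentals by day
n = 100                                          # sample size

# Under independence, P(day and size) = P(day) * P(size).
# Multiplying by n turns each probability into an expected count.
expected = {
    day: [n * p_day * p_size for p_size in size_share]
    for day, p_day in day_share.items()
}

for day, row in expected.items():
    print(day, [round(value, 1) for value in row])
```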

However, suppose that the actual counts are

rental day | sub-compact | compact | mid-size | other |
---|---|---|---|---|
weekend | 2 | 6 | 9 | 3 |
weekday | 26 | 38 | 9 | 7 |

Now, we can calculate X^{2} for the whole table. Adding up the (O-E)^{2}/E terms from all 8 cells gives about 14.76, or roughly 15.

The degrees of freedom for a table with r rows and c columns is equal to (r-1)(c-1), or in this case (2-1)(4-1) = 3. (Intuitively, the row and column percentages have to sum to one, so the degrees of freedom is the [number of row possibilities minus one] times the [number of column possibilities minus one].)

In this case, the p-value is very small (about .002). There is strong evidence that the size of car is not independent of whether the rental is for a weekend or a weekday.
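Here is a minimal Python sketch of the whole independence test, starting from the actual counts only. The expected counts are rebuilt from the row and column totals, as described above. The variable names are illustrative, and the p-value line uses a closed form that is valid only when there are 3 degrees of freedom, which happens to be the case here.

```python
import math

# Actual counts from the table above.
observed = [
    [2, 6, 9, 3],      # weekend
    [26, 38, 9, 7],    # weekday
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (O-E)^2/E over all cells.
x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail chi-square area; this closed form holds only for df == 3.
assert df == 3
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.4f}")
```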

We can do the test on the calculator by entering the data in the table above (the actual counts, not the expected counts) as a matrix. Select matrix/edit/1:[A] and enter the number of rows (2) and the number of columns (4), followed by the data in the cells. Then, go to STAT/TESTS and select C:χ^{2}-Test. Let A be the observed matrix and B be the expected matrix (which the calculator will compute). Choose calculate, and you should get X^{2} and a p-value.

Here is another example, involving sample proportions. Suppose that we do an experiment where we try different drug treatments on four equal-sized samples. The results turn out to be that the proportion of successes in each sample is .48, .12, .24, and .16, respectively. For a sample size of 50 for each drug trial, are the differences in outcomes statistically significant?

If we note that in each sample the proportion of failures is one minus the proportion of successes, and take into account the fact that the sample size in each column is 50, then we can write the results in a matrix.

outcome | drug 1 | drug 2 | drug 3 | drug 4 |
---|---|---|---|---|
success | 24 | 6 | 12 | 8 |
failure | 26 | 44 | 38 | 42 |

If you put this matrix into a calculator and calculate the χ^{2} statistic, you can see whether the differences in the results across drugs are significant. Note that if there were only two drug treatments, then a simple z-test for the difference between two proportions would work.
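If no calculator is handy, the same steps can be sketched in Python: turn the proportions into counts, build the expected counts from the row and column totals, and add up the cell terms. The names are illustrative, and the p-value line again relies on a closed form that works only for 3 degrees of freedom.

```python
import math

n_per_drug = 50
success_rate = [0.48, 0.12, 0.24, 0.16]

# Convert proportions to counts; failures are whatever is left in each sample.
successes = [round(p * n_per_drug) for p in success_rate]
failures = [n_per_drug - s for s in successes]
observed = [successes, failures]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under "no difference between drugs".
expected = [[r * c / n for c in col_totals] for r in row_totals]

x2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Upper-tail chi-square area; this closed form holds only for df == 3.
assert df == 3
p_value = math.erfc(math.sqrt(x2 / 2)) + math.sqrt(2 * x2 / math.pi) * math.exp(-x2 / 2)

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.5f}")
```

With these counts the statistic comes out to 20.8 on 3 degrees of freedom, and the p-value is well below .001.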

Finally, a note about sample size. I said that the sample size has to be sufficiently large for χ^{2} tests to be valid. The rule of thumb that our book suggests is that the expected count in every cell should be at least one, and no more than 20 percent of the cells should have an expected count of less than five.
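The rule of thumb is easy to check mechanically. The sketch below uses a made-up helper name, `rule_of_thumb_ok`, and applies it to the expected counts of the two tables in these notes. The drug-trial expected counts of 12.5 and 37.5 are computed from that table's row and column totals.

```python
def rule_of_thumb_ok(expected_counts):
    """Return True if every expected count is at least 1 and no more than
    20 percent of the cells have an expected count below 5."""
    cells = [e for row in expected_counts for e in row]
    all_at_least_one = min(cells) >= 1
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20

# Expected counts for the rental-day table.
rental_expected = [
    [5.6, 8.8, 3.6, 2.0],
    [22.4, 35.2, 14.4, 8.0],
]

# Expected counts for the drug-trial table: 50*50/200 and 150*50/200.
drug_expected = [
    [12.5, 12.5, 12.5, 12.5],
    [37.5, 37.5, 37.5, 37.5],
]

rental_ok = rule_of_thumb_ok(rental_expected)
drug_ok = rule_of_thumb_ok(drug_expected)
print("rental-day table passes:", rental_ok)
print("drug-trial table passes:", drug_ok)
```

By this check the drug-trial table passes comfortably. The rental-day table falls slightly short, since 2 of its 8 expected counts (3.6 and 2.0) are below five, so its p-value is best read with some caution.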