AP Statistics Lectures
by Arnold Kling

Experimental Design

Statistics is intimately related to the scientific method. One way to show the steps in the scientific method is as follows:

State a hypothesis --> Come up with a way to test the hypothesis --> Design an experiment to carry out the test --> Collect data --> Analyze data --> State conclusions

The first step in becoming a statistician is to learn the "back end" of this process--analyzing data and stating conclusions. Taking the other steps as given, anyone who has completed a statistics course like this one should be able to handle data analysis and stating conclusions. All you need is a quick tutorial in whatever statistical software has been selected.

In the real world, as a statistician you spend most of your time worrying about the "front end" of the process. This requires expertise in the specific subject matter, such as economics, biology, ecology, or psychology. A student who knows statistics but lacks subject-area expertise is unlikely to be able to formulate an interesting testable hypothesis.

Experimental design is closer to the "front end" of the process. It requires subject-area expertise. However, it also requires some statistical know-how, and this section of the course deals with that topic. In the book, sections 5.1 and 5.2 deal with experimental design.

Confounding Variables and Controlled Experiments

We know from science that a good experiment is a controlled experiment. In a controlled experiment, the results are due to the factors that you are testing, not to extraneous or accidental factors.

For example, suppose that we want to measure the effect that going to college has on your future income, but we cannot measure your ability, motivation, and other factors. We talk about three types of variables.

| Type of Variable | Definition | Example |
| --- | --- | --- |
| Response Variable | The variable that we measure to assess the significance of the results | income of a sample of people |
| Explanatory Variables | Variables whose impact on the response variable we plan to measure | years of schooling completed for people in the sample |
| Lurking or Unobserved Variables | Variables that may affect the response variable but which we cannot measure | individual ability; individual motivation |

By definition, lurking variables affect the response variable. That by itself is not a problem. A problem exists when we cannot rule out correlation between the lurking variables and the explanatory variables. For example, people who are highly motivated may be more likely to go to college.

When unobserved variables and explanatory variables are likely to be correlated, economists say that the results are biased. In other fields, I gather that one says that the explanatory variables and the lurking variables are confounded.

For example, it could be that going to college has no effect on income. It could be that highly motivated people earn higher incomes, and highly motivated people also tend to go to college. If we only observe education and income without controlling for motivation, we would deceive ourselves into thinking that going to college raises your income.
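The college example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the income formula, the 80/20 college-attendance probabilities, and the function names are all hypothetical, chosen so that college has exactly zero true effect on income while motivation drives both college attendance and income.

```python
import random
import statistics


def simulate_confounding(n=20000, seed=0):
    """Return rows of (highly_motivated, went_to_college, income).

    Assumed model (hypothetical): income depends on motivation only,
    and college attendance depends only on whether motivation is high.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        motivation = rng.gauss(0, 1)
        highly_motivated = motivation > 0
        # Motivated people are more likely to go to college.
        college = rng.random() < (0.8 if highly_motivated else 0.2)
        # College does not appear here: its true effect is zero.
        income = 40000 + 10000 * motivation + rng.gauss(0, 5000)
        rows.append((highly_motivated, college, income))
    return rows


def mean_gap(rows):
    """Mean income of college-goers minus mean income of everyone else."""
    with_college = [income for _, college, income in rows if college]
    without_college = [income for _, college, income in rows if not college]
    return statistics.mean(with_college) - statistics.mean(without_college)


rows = simulate_confounding()

# Naive comparison: ignores motivation entirely.
naive_gap = mean_gap(rows)

# Controlled comparison: compare only within each motivation group.
gap_high = mean_gap([r for r in rows if r[0]])
gap_low = mean_gap([r for r in rows if not r[0]])

print(f"naive gap: {naive_gap:.0f}")
print(f"gap among highly motivated: {gap_high:.0f}")
print(f"gap among less motivated: {gap_low:.0f}")
```

In this setup the naive gap is large even though college does nothing, while the gaps within each motivation group are close to zero. Of course, in real data motivation is unobserved, which is exactly why the problem is hard.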

Most of the challenge in statistics-based research consists of designing ways to inoculate the study against the potential biases of unobserved or lurking variables. This requires a combination of subject-area expertise (so that you can anticipate lurking variables) and statistical know-how.

Sampling Design

Some types of sampling are prone to cause bias.

• Convenience sampling, which means sampling the people who are easiest to locate. For example, if a conservative web site takes a poll on the issue of tax cuts, it is unlikely to get a sample that represents the opinions of the entire U.S. population.

• Voluntary response sampling, which means soliciting responses to a survey and looking at the results of people who choose to respond. Non-respondents tend to be people who do not have very strong opinions on a topic. For example, a pre-election voluntary-response poll will tend to reflect the views of strong partisans, but not of independents.
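The voluntary-response problem can be sketched with a toy simulation. Everything here is an assumption made for illustration: opinions are scored from -2 to 2, and the response probabilities (5 percent for neutral people, 20 percent for mild opinions, 60 percent for strong opinions) are invented, as are the function names.

```python
import random


def voluntary_response_poll(n=10000, seed=1):
    """Return (population, respondents) as lists of opinion scores.

    Opinion scores run from -2 (strongly against) to 2 (strongly for).
    The response probabilities below are hypothetical.
    """
    rng = random.Random(seed)
    population = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(n)]
    # Assumed chance of responding, by strength of opinion.
    respond_prob = {0: 0.05, 1: 0.20, 2: 0.60}
    respondents = [x for x in population if rng.random() < respond_prob[abs(x)]]
    return population, respondents


def share_strong(opinions):
    """Fraction of people holding a strong opinion (score of -2 or 2)."""
    return sum(1 for x in opinions if abs(x) == 2) / len(opinions)


population, respondents = voluntary_response_poll()
print(f"strong opinions in population:  {share_strong(population):.2f}")
print(f"strong opinions in respondents: {share_strong(respondents):.2f}")
```

Under these assumptions, people with strong opinions make up a much larger share of the respondents than of the population, so the poll overstates how polarized the population is.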

Random sampling is used to try to minimize the introduction of bias. However, there are some potential sources of bias that remain even with a random sample.

• Nonresponse bias.

For a sample to be truly random, the response rate must be 100 percent, or you have to be able to show somehow that nonresponse is not correlated with other variables in your analysis. Nonresponse is a major issue in real-world statistics. One procedure for dealing with it is to make a special effort to re-survey a sample of non-respondents and to use that information to make inferences about the remaining non-respondents.

• Response bias.

This is the term that describes the bias due to people lying or telling the interviewer what they think the interviewer wants to hear.

• Wording effects.

Two polls attempting to measure the same thing can get different results depending on how questions are worded, or on what material preceded the question.

Sometimes, there is a concern that a random sample will not turn up enough people in a certain category. In this case, a stratified random sample is chosen.

For example, economists may want to study the distribution of wealth, and a simple random sample of people will not turn up very many millionaires. Suppose that the population under study has 400 people who are millionaires and 40,000 people who are non-millionaires. In a stratified sample, you might choose a random sample of 100 millionaires and another random sample of 100 non-millionaires.

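This is a legitimate random sample. However, to make inferences about the entire population, you need to reweight the results. Each sampled millionaire stands for 4 people in the population (400/100), while each sampled non-millionaire stands for 400 people (40,000/100). In other words, you weight the non-millionaires upward by a factor of 100 relative to the millionaires, to reflect the fact that they are far more prevalent in the population than in your sample.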
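The reweighting arithmetic can be sketched in a few lines. The population and sample sizes come from the example above; the wealth figures (3,000,000 for every sampled millionaire and 100,000 for every sampled non-millionaire) are placeholders invented purely to make the calculation concrete, and the function names are hypothetical.

```python
def design_weight(population_size, sample_size):
    """Number of people in the population that each sampled person stands for."""
    return population_size / sample_size


def weighted_mean(strata):
    """Estimate the population mean from a stratified sample.

    strata: list of (population_size, sample_values) pairs.
    """
    total = 0.0
    total_weight = 0.0
    for population_size, values in strata:
        w = design_weight(population_size, len(values))
        total += w * sum(values)
        total_weight += w * len(values)
    return total / total_weight


# Placeholder wealth figures, NOT real data.
millionaires = [3_000_000.0] * 100
others = [100_000.0] * 100

w_millionaires = design_weight(400, 100)   # each stands for 4 people
w_others = design_weight(40_000, 100)      # each stands for 400 people

estimate = weighted_mean([(400, millionaires), (40_000, others)])
naive = (sum(millionaires) + sum(others)) / 200  # ignores the stratification

print(f"relative weight: {w_others / w_millionaires:.0f}")
print(f"weighted estimate of mean wealth: {estimate:,.0f}")
print(f"unweighted sample mean: {naive:,.0f}")
```

The unweighted sample mean treats millionaires as half the population, so it badly overstates average wealth. The weighted estimate restores each group to its true share.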

Terminology for Studies

Here is some terminology from section 5.2 of your book. I never had to study it in economics, but it gets used in other fields. Make sure that you can give a definition and explain the significance of each of these terms.

• observational study
• experiment
• experimental units
• subjects
• response variable
• treatment
• factor/factor level
• observation
• control group

Suppose that you want to test the effectiveness of a drug. If you just give people the drug and count the proportion of people who got better, you do not know how well they would have done without the drug. So you give the drug to one sample and you give nothing to another sample--the control group.

Suppose that the drug's effectiveness will be evaluated subjectively. People who know they got the drug may be more likely to report success than people who know they got nothing. The solution for this is a "blind" experiment, in which some people are given a drug and others are given a placebo, and the subjects do not know which is which.

An even more extreme concern is that the people collecting the data might be biased. In that case, you do not even tell them who got the placebo and who got the drug until after they have collected the data. That is called a double-blind study.

For experiments, you want to randomly choose which experimental units receive which treatments. You choose a block design when you are aware of lurking variables and you do not have the ability to choose a large enough sample to make their effects random.

For example, suppose you were looking at the effectiveness of two different sunscreens on a sample of 20 people. In the absence of any lurking variables, you would randomly assign ten subjects to sunscreen A and ten to sunscreen B.

However, suppose that four people in the sample are really fair-skinned and burn easily. If your random procedure happens to assign three of them to the group using sunscreen A, then the results will be biased against sunscreen A. So you would "block" the sample: you would put the four fair-skinned people in one block, and randomly assign two each to sunscreen A and sunscreen B. The remaining sixteen people would be in the other block. Those sixteen also would be randomly divided between sunscreen A and sunscreen B.
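The blocking procedure for the sunscreen example might be sketched as follows. The subject labels and function names are hypothetical; the only thing taken from the example is the structure of four fair-skinned subjects, sixteen others, and two treatments.

```python
import random


def assign_within_block(block, treatments, rng):
    """Shuffle one block, then deal its units out evenly among the treatments."""
    shuffled = list(block)
    rng.shuffle(shuffled)
    return {
        unit: treatments[i % len(treatments)]
        for i, unit in enumerate(shuffled)
    }


def blocked_assignment(blocks, treatments=("A", "B"), seed=None):
    """Randomize separately inside each block and combine the results."""
    rng = random.Random(seed)
    assignment = {}
    for block in blocks:
        assignment.update(assign_within_block(block, treatments, rng))
    return assignment


# Hypothetical subject labels.
fair_skinned = [f"fair_{i}" for i in range(4)]
everyone_else = [f"other_{i}" for i in range(16)]

assignment = blocked_assignment([fair_skinned, everyone_else], seed=42)

fair_on_a = sum(1 for s in fair_skinned if assignment[s] == "A")
others_on_a = sum(1 for s in everyone_else if assignment[s] == "A")
print(f"fair-skinned on A: {fair_on_a}, others on A: {others_on_a}")
```

Whatever the seed, the fair-skinned subjects always split two and two, and the others eight and eight. Only the identity of who lands in each group is random.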

An extreme way to control for lurking variables is matched-pairs design. In the sunscreen example, suppose that we had ten sets of twins. We could take each set of twins, and within the set flip a coin to decide which gets sunscreen A and which gets sunscreen B. These ten matched pairs would produce tight control over lurking variables.
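The coin flip within each set of twins can be sketched like this. The twin labels and the function name are made up for illustration.

```python
import random


def matched_pairs_assignment(pairs, seed=None):
    """For each pair, flip a coin to decide who gets A and who gets B."""
    rng = random.Random(seed)
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:  # the coin flip
            assignment.append({"A": first, "B": second})
        else:
            assignment.append({"A": second, "B": first})
    return assignment


# Ten hypothetical sets of twins.
pairs = [(f"twin_{i}a", f"twin_{i}b") for i in range(10)]
assignment = matched_pairs_assignment(pairs, seed=7)

for pair in assignment:
    print(f"sunscreen A: {pair['A']}, sunscreen B: {pair['B']}")
```

Because every pair contributes one subject to each sunscreen, the analysis would then look at the difference in outcomes within each pair rather than at the two group averages.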

For more discussion of experimental design, see this chapter from Philip B. Stark's online statistics course.