Statisticiness

‘Scott Alexander’ writes,

But r = 0.23 means the percent of variance explained is 0.23^2 = ~5%. If some Social Darwinist organization were to announce that they had evidence that who your parents were only determined 5% of the variance in wealth, it would sound like such overblown strong evidence for their position that everyone would assume they were making it up.

His point is that some pundits have used a recent study to claim that inherited wealth is really, really important, even though numerically the study fails to show that.
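
The arithmetic in the quote is easy to check. Here is a minimal sketch (simulated data, nothing from the actual study): two variables correlated at 0.23 share only about five percent of their variance.

```python
import numpy as np

rng = np.random.default_rng(42)
n, r = 100_000, 0.23

# build two variables with correlation of roughly 0.23 (purely illustrative data)
parent = rng.normal(size=n)
child = r * parent + np.sqrt(1 - r**2) * rng.normal(size=n)

corr = np.corrcoef(parent, child)[0, 1]
print(round(corr, 3), round(corr**2, 3))   # roughly 0.23 and 0.05
```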

Suppose we define mathiness as people making misleading, ideologically loaded claims about what their theorems prove. It seems fair to suggest that statisticiness, making comparable claims about what one's statistical results show, is a similar problem.

The Art of Statistical Scamming in Experiments

John Bohannon writes,

Here’s a dirty little science secret: If you measure a large number of things about a small number of people, you are almost guaranteed to get a “statistically significant” result. Our study included 18 different measurements—weight, cholesterol, sodium, blood protein levels, sleep quality, well-being, etc.—from 15 people. (One subject was dropped.) That study design is a recipe for false positives.

Usually, I think of health studies as bad because they are non-experimental. But this is a way to scam experimental studies.
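
Bohannon's recipe is easy to reproduce. The sketch below is my own illustration, not his design: randomly split 15 subjects into two groups, measure 18 outcomes that are pure noise, and see how often at least one comparison comes out "statistically significant."

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_subjects, n_outcomes, n_trials = 15, 18, 2_000

false_positive_runs = 0
for _ in range(n_trials):
    group = rng.permutation(n_subjects) < n_subjects // 2    # arbitrary split, no real treatment
    outcomes = rng.normal(size=(n_subjects, n_outcomes))     # pure noise for every measurement
    pvals = [ttest_ind(outcomes[group, j], outcomes[~group, j]).pvalue
             for j in range(n_outcomes)]
    false_positive_runs += min(pvals) < 0.05

print(false_positive_runs / n_trials)   # roughly 0.6: most runs yield a "significant" finding
```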

What I’m Reading

The Aggregate Production Function and the Measurement of Technical Change: ‘Not Even Wrong’ by Jesus Felipe and John S.L. McCombie. It is a long technical book. Here is my attempt to summarize one of the main arguments.

Suppose I give you two observations, which might come from otherwise-similar economies or from the same economy at two different points in time:

1. Output per worker = 400, capital per worker = 100.

2. Output per worker = 410, capital per worker = 110.

Can you calculate the elasticity of output with respect to capital?

The answer is “yes” if we are measuring physical units. Bushels per worker. Tractors per worker.
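
In physical units the calculation is just a ratio of log changes. A minimal sketch with the numbers above:

```python
import numpy as np

y1, y2 = 400, 410     # output per worker
k1, k2 = 100, 110     # capital per worker

elasticity = (np.log(y2) - np.log(y1)) / (np.log(k2) - np.log(k1))
print(round(elasticity, 2))   # about 0.26: a 10% rise in capital per worker, a 2.5% rise in output
```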

But suppose that we are using national income accounting data. Then our measure of output is GDP. And our measure of capital is based on income not going to labor. Now, in addition to having well-known aggregation problems in computing output and a capital index, we have to assume implicitly that the marginal product of the 10 additional units of capital is the same as the average product of the first 100 units of capital. But that amounts to assuming that you knew the answer before you even had the second observation. You are only pretending to learn from the data.
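
Here is my own stylized illustration of the point, not an example taken from the book. If output is built up from the income identity Y = wL + rK, then the "elasticity" computed from two observations merely echoes capital's share of income, something a single observation already tells you (factor prices held constant for the sketch):

```python
import numpy as np

w, r = 1.0, 0.1                       # factor prices held constant (an assumption of this sketch)
L = np.array([100.0, 100.0, 100.0])   # labor held fixed for clarity
K = np.array([1000.0, 1100.0, 1250.0])

Y = w * L + r * K                     # output defined by the income identity, not by any technology
y, k = Y / L, K / L

capital_share = r * K / Y
elasticity = np.diff(np.log(y)) / np.diff(np.log(k))
print(capital_share.round(3))   # about [0.5, 0.524, 0.556]
print(elasticity.round(3))      # about [0.51, 0.54]: it just tracks the capital share
```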

This calls into question a whole lot of empirical research purporting to describe economic growth or cross-country productivity differentials.

Interpreting Roland Fryer

He is the latest winner of the John Bates Clark Medal. The announcement reads, in part

Roland Fryer in a series of highly-influential studies has examined the age profile and sources of the U.S. racial achievement gap as measured by standardized test scores for children from 8 months to seventeen years old. Fryer (with Steven Levitt) has shown the black-white test score gap is quite small in the first year of life, but black children fall behind quickly thereafter (“Testing for Racial Differences in Mental Ability among Young Children,” American Economic Review 2013). The racial test score gap is largely explained by racial differences in socioeconomic status at the start of schooling (“Understanding the Black-White Test Gap in the First Two Years of School,” Review of Economics and Statistics 2004), but observable family background and school variables cannot explain most of the growth of the racial test score gap after kindergarten. Fryer’s comprehensive chapter in the Handbook of Labor Economics (2011, “Racial Inequality in the 21st Century: The Declining Significance of Discrimination”) documents that racial differences in social and economic outcomes today are greatly reduced when one accounts for educational achievement gaps. He concludes that understanding the obstacles facing minority children in K12 schools is essential to addressing racial inequality. Fryer has taken up this challenge to study the efficacy of education policies to improve the academic achievement and economic outcomes of low-income and minority children.

His research has a lot of bearing on the Null Hypothesis (the hypothesis that educational interventions do not make a long-term difference in outcomes). Some of his papers contradict the Null Hypothesis, and some do not.

It certainly is intriguing that the racial test score gap is low in the first year of life and rapidly rises early in the school years but that “observable family background and school variables cannot explain most of the growth of the racial test score gap after kindergarten.” Some possibilities:

1. The Null Hypothesis is incorrect, but the school variables that make a difference are subtler than what we now find to be “observable.” Some of Fryer’s other studies might lend some support to this, but others would not.

2. The Null Hypothesis is correct because test score performance is dominated by non-school environmental factors.

3. The Null Hypothesis is correct because test score performance is dominated by genetic factors. Then the problem is to explain why the gap appears at age seven (say) but not at age one. The lack of any gap at age one might be due to tests not being able to discriminate ability as well at that age as at later ages. This would give rise to a measurement error problem, one which biases differences toward zero.
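
A small simulation of the measurement-error story in possibility 3, with made-up numbers: if the infant test is noisy, a real one-standard-deviation gap in underlying ability shows up as a much smaller gap in measured scores.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_gap = 100_000, 1.0            # a one-SD gap in underlying ability (illustrative)

def measured_gap(reliability):
    a = rng.normal(0.0, 1.0, n)                      # group A ability
    b = rng.normal(true_gap, 1.0, n)                 # group B ability
    noise_sd = np.sqrt(1 / reliability - 1)          # classical measurement error
    score_a = a + rng.normal(0, noise_sd, n)
    score_b = b + rng.normal(0, noise_sd, n)
    pooled_sd = np.sqrt((score_a.var() + score_b.var()) / 2)
    return (score_b.mean() - score_a.mean()) / pooled_sd

print(round(measured_gap(0.3), 2))   # noisy infant-style test: roughly 0.55 SD
print(round(measured_gap(0.9), 2))   # reliable school-age test: roughly 0.95 SD
```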

Incidentally, someone pointed me to the blogger Isegoria’s link to a journal article entitled Individual Differences in Executive Functions Are Almost Entirely Genetic in Origin. The article comes from 2008, and the finding of 99 percent heritability strikes me as ridiculous. My guess is that if the same person is measured for executive function by two different investigators, the correlation will not be anywhere close to 99 percent. I hereby invoke Merle Kling’s third iron law.
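
For what it is worth, here is a rough sketch of why low reliability caps heritability estimates (my own illustration with made-up numbers, not the paper's method): in a standard twin comparison, measurement noise scales the estimate down to roughly true heritability times test reliability.

```python
import numpy as np

rng = np.random.default_rng(2)
n, h2_true, reliability = 200_000, 0.6, 0.7     # made-up values for the illustration
noise_var = 1 / reliability - 1                  # measurement noise relative to true-score variance

def twin_correlation(genetic_corr):
    g1 = rng.normal(0, np.sqrt(h2_true), n)
    g2 = genetic_corr * g1 + np.sqrt(1 - genetic_corr**2) * rng.normal(0, np.sqrt(h2_true), n)
    def observe(g):
        environment = rng.normal(0, np.sqrt(1 - h2_true), n)
        measurement = rng.normal(0, np.sqrt(noise_var), n)
        return g + environment + measurement
    return np.corrcoef(observe(g1), observe(g2))[0, 1]

falconer_h2 = 2 * (twin_correlation(1.0) - twin_correlation(0.5))   # MZ minus DZ twin correlations
print(round(falconer_h2, 2))   # roughly 0.42 = 0.6 * 0.7, nowhere near 0.99
```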

Throw Peer Review Under the Bus?

From The Independent

Richard Smith, who edited the British Medical Journal for more than a decade, said there was no evidence that peer review was a good method of detecting errors and claimed that “most of what is published in journals is just plain wrong or nonsense”.

…Speaking at a Royal Society event earlier this week, he said an experiment conducted during his time at the BMJ, in which eight deliberate errors were included in a short paper sent to 300 reviewers, had exposed how easily the peer review process could fail.

Pointer from Jason Collins.

What might be better? Off the top of my head, I propose that:

1. No individual study should receive more than a page or two in a journal. Just explain the findings, interpret them, and put all of the methodological details and literature review on the author’s web page. Results from all such papers should be treated as “preliminary and unconfirmed.” Accept any study for publication, including studies with findings of “no significant effect.”

2. Longer articles should be survey articles that focus on studies that have been replicated and confirmed. The survey articles should also report on studies where attempted replication failed or the method was otherwise shown to be invalid.

3. Do not assign high status to researchers just because they get studies published. Instead, assign high status to researchers who attempt to replicate or otherwise confirm other studies and also to researchers whose work is cited favorably in survey articles.

Look at it and Think About it

That is John Cochrane’s advice on the unit root issue.

A unit root means a random walk component. A random walk will eventually pass any upper and lower limit. Look at it [the unemployment rate]. That’s as stationary a series as you’re going to find in economics. (“Look at it” and “think about it” are the Cochrane unit root tests.)

Yes, unemployment like other stationary ratios in macro (consumption/GDP, hours/day, etc.) have important and frequently overlooked low-frequency movements. But they are far from random walks, and they like unemployment have a very large transitory component at business cycle frequencies. When unemployment is above 8%, it is a good bet that it will decline over the next 5 years.
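
The "look at it" test is easy to mimic with simulated data. A minimal sketch (illustrative parameters, not an estimate of anything): a random walk drifts arbitrarily far from its starting point, while a stationary series with even fairly high persistence keeps returning to a band.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 400                                  # quarters, illustrative

shocks = rng.normal(0, 1, T)
random_walk = np.cumsum(shocks)          # unit root: shocks never die out

mean_reverting = np.zeros(T)             # stationary AR(1) with persistence 0.9 (assumed)
for t in range(1, T):
    mean_reverting[t] = 0.9 * mean_reverting[t - 1] + shocks[t]

print(round(random_walk.min(), 1), round(random_walk.max(), 1))       # wanders far from zero
print(round(mean_reverting.min(), 1), round(mean_reverting.max(), 1)) # stays in a band around zero
```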

Four Forces Watch: Poor Children Have Smaller Brains

The Washington Post did not put this story on page one.

New research that shows poor children have smaller brains than affluent children has deepened the national debate about ways to narrow the achievement gap.

Most of the story goes with the assumption that poverty causes smaller brain size. But amazingly enough, the story also includes this alternative interpretation:

But James Thompson, a psychologist at University College London, has a third theory.

“People who have less ability and marry people with less ability have children who, on balance, on average, have less ability,” he said. Thompson noted that there is a genetic component to intelligence that Noble and Sowell failed to consider.

“It makes my jaw drop that we’ve known for years intelligence is inheritable and scientists are beginning to track down exactly how it happens,” Thompson said. “The well-known genetic hypothesis has not even had a chance to enter the door in this discussion.”

The story also quotes Charles Murray.

“I would be astonished if children’s brain size were NOT correlated with parental income. How could it be otherwise?”

The politically correct presumption would be that the brain size of poor children can be increased by some government program. Can we verify that by comparing brain sizes of identical twins raised apart? Or by comparing brain sizes of children randomly chosen for pre-school programs with children from a control group?

Murray has more commentary here, including a historical scientific controversy over whether there even exists a relationship between brain size and intelligence among humans.

Genghis Khan on Structural Change in Finance

Stanley Fischer said,

To conclude, the U.S. financial system has changed a great deal over the past several decades. One of the most important changes has been the rapid growth of the nonbank sector. Many reforms have been adopted for both banks and nonbank financial institutions. But regulation is a cat and mouse game. Regulators need to respond to existing regulatory gaps and to keep pace with further changes. We hope we will succeed in doing so. But we know that we will never be able to identify in advance all the threats to stability that are out there, and that it is therefore all the more critical to maintain and strengthen the robustness of our financial institutions, and of the financial system as a whole.

Read the whole thing. Pointer from Mark Thoma.

Both theoretical and empirical work in macroeconomics tends to ignore structural changes of this kind. On the empirical side, think of econometrics. There are three classes of factors at work in macroeconomic data. One is short-term noise, such as a cash-for-clunkers program adding to auto sales one quarter and subtracting from them the next. Another is cyclical drivers, the sorts of things that your theory suggests as causal factors in macroeconomic fluctuations. Finally, there are structural changes, such as the changes in financial markets that Fischer is talking about.

I do not think that it is possible to sift through these three factors without using judgment. Just dumping the data into your econometrics software is an exercise in garbage-in, garbage-out.
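
A toy example of the garbage-in, garbage-out problem, entirely made up to illustrate the point: suppose the cyclical relationship is stable within each regime, but a structural change shrinks the coefficient partway through the sample. A regression that pools the whole sample without judgment describes neither regime.

```python
import numpy as np

rng = np.random.default_rng(4)
T = 200
x = rng.normal(0, 1, T)                          # a cyclical driver (illustrative)
noise = rng.normal(0, 0.5, T)

beta = np.where(np.arange(T) < 100, 2.0, 0.5)    # structural change halfway through the sample
y = beta * x + noise

pooled = np.polyfit(x, y, 1)[0]                  # dumping all the data into one regression
early = np.polyfit(x[:100], y[:100], 1)[0]
late = np.polyfit(x[100:], y[100:], 1)[0]
print(round(early, 2), round(late, 2), round(pooled, 2))   # about 2.0, 0.5, and a blend near 1.25
```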

Response to a Comment

Concerning Gregory Clark’s finding that multi-generational mobility is low, a commenter writes,

I still can’t believe things are quite as static as he makes them out to be, but I don’t know enough to dispute any of his specific findings. The model of human social behavior I carry around in my brain just doesn’t match the one he presents.

One thing we know is that there is high variance in outcomes across siblings. Back when people had many children, it may have been the case that if you were well off it was very likely that at least one of your grandchildren would be well off, but not so likely that every one of your grandchildren would be well off. With people having fewer children, either multigenerational mobility will go up or other forces (such as stronger assortative mating) will offset what otherwise would be an increase in random variation across generations.
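
The arithmetic behind that conjecture is simple. With made-up probabilities (nothing from Clark's data): if each grandchild independently has a 30 percent chance of ending up well off, a family with ten grandchildren almost surely keeps at least one well-off descendant, while a family with two often does not.

```python
# chance that at least one grandchild is well off, assuming each has an
# independent 30% chance (illustrative numbers, not from Clark's data)
p = 0.30
for n_grandchildren in (2, 4, 10):
    at_least_one = 1 - (1 - p) ** n_grandchildren
    print(n_grandchildren, round(at_least_one, 2))   # 2 -> 0.51, 4 -> 0.76, 10 -> 0.97
```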