Decentralized Data Collection

Virginia Postrel writes,

Premise reverses the usual do-gooder assumption about the Internet’s benefits for people in developing countries — that it supplies precious information from abroad. (People in Pakistan can take online courses from MIT!) Instead, it turns those ubiquitous phones into a way of bypassing distant bureaucrats to get systematic information, collected by people who understand the local territory, out of the shadows and into the world economy.

Read the whole thing, which describes an app that allows businesses and governments to undertake research about price trends and other economic phenomena in developing countries.

Machine Learning and Holdback Samples

Susan Athey writes,

One common feature of many ML methods is that they use cross-validation to select model complexity; that is, they repeatedly estimate a model on part of the data and then test it on another part, and they find the “complexity penalty term” that fits the data best in terms of mean-squared error of the prediction (the squared difference between the model prediction and the actual outcome).

Pointer from Tyler Cowen.
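Athey is describing, in effect, a grid search over the complexity penalty scored by out-of-fold prediction error. A minimal sketch of that procedure, using scikit-learn’s Lasso on invented data (the data, the penalty grid, and the fold count are my own illustration, not anything from her paper):

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                    # invented predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=500)  # only two of them actually matter

penalties = np.logspace(-3, 1, 20)                # candidate complexity penalties
cv_mse = []
for alpha in penalties:
    fold_mse = []
    folds = KFold(n_splits=5, shuffle=True, random_state=0)
    for train_idx, test_idx in folds.split(X):
        model = Lasso(alpha=alpha, max_iter=10000).fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_mse.append(np.mean((y[test_idx] - pred) ** 2))  # squared prediction error
    cv_mse.append(np.mean(fold_mse))

best_alpha = penalties[int(np.argmin(cv_mse))]    # the penalty that predicts best out of fold
print("selected penalty:", best_alpha)
```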

In the early 1980s, Ed Leamer caused quite a ruckus when he pointed out that nearly all econometricians at that time engaged in specification searches. The statistical validity of multiple regression is based on the assumption that you confront the data only once. Instead, economists would try dozens of specifications until they found one that satisfied their desires for high R-squared and support for their prior beliefs. Because the same data has been re-used over and over, there is a good chance that the process of specification searching leads to spurious relationships.

One possible check on the Leamer problem is to use a holdback sample. That is, you take some observations out of your sample while you do your specification searches on the rest of the data. Then when you are done searching and have your preferred specification, you try it out on the holdback sample. If it still works, then you are fine. If the preferred specification falls apart on the holdback sample, then it indicates that your specification searching produced a spurious relationship.

Machine learning sounds a bit like trying this process over and over again until you get a good fit with the holdback sample. If the holdback sample is a fixed set of data, then this would again lead you to find spurious relationships. Instead, if you randomly select a different holdback sample each time you try a new specification, then I think that the process might be more robust.

I don’t know how it is done in practice.
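For what it is worth, both variants are simple to code up. A minimal sketch on invented data, where the “specifications” are just different subsets of regressors (the data, the candidate list, and the 30 percent holdback are my own choices for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
data = rng.normal(size=(n, 10))            # ten invented regressors
y = 0.5 * data[:, 0] + rng.normal(size=n)  # only the first one actually matters

# Hypothetical "specifications": which columns to include in the regression.
specs = [[0], [0, 1], [0, 1, 2], list(range(5)), list(range(10))]

def mse(cols, X_fit, y_fit, X_eval, y_eval):
    """OLS fit on one part of the data, mean squared error on another part."""
    beta, *_ = np.linalg.lstsq(X_fit[:, cols], y_fit, rcond=None)
    return float(np.mean((y_eval - X_eval[:, cols] @ beta) ** 2))

# Variant 1: a single fixed holdback sample, consulted only after the search.
X_search, X_hold, y_search, y_hold = train_test_split(data, y, test_size=0.3, random_state=0)
in_sample_fit = [mse(cols, X_search, y_search, X_search, y_search) for cols in specs]
preferred = specs[int(np.argmin(in_sample_fit))]   # an in-sample search favors the biggest spec
print("holdback MSE of preferred spec:", mse(preferred, X_search, y_search, X_hold, y_hold))

# Variant 2: draw a fresh random holdback for every specification tried.
for i, cols in enumerate(specs):
    X_fit, X_eval, y_fit, y_eval = train_test_split(data, y, test_size=0.3, random_state=100 + i)
    print("spec", cols, "out-of-sample MSE:", round(mse(cols, X_fit, y_fit, X_eval, y_eval), 3))
```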

Significance Comparisons and Measurement Error

Leilan Shu and Sara Dada report,

We first use a simple linear regression model of average test score and average household income to first establish a positively correlated relationship. This relationship is further analyzed by differentiating for other community-based factors (race, household type, and educational attainment level) in three multiple variable regression models. For comparison and to evaluate any consistencies these variables may have, the regressions were run on data from both 2007 and 2014. In both cases, the final multiple regressions found that average household income was not statistically significant in impacting the average test scores of the counties studied, while household type and educational attainment level were statistically significant.

Pointer from Tyler Cowen. If this were credible, it would seem to suggest that “schooling inequality” is really ability inequality.

BUT…Whenever somebody says that “X1 does better than X2 at predicting Y,” watch out for the impact of measurement error. A variable that is measured with less error will drive out a variable that is measured with more error.

In this case, suppose that the variable that matters is “parents’ resources.” Income could measure that variable. Educational attainment could predict that variable. Income has many sources of measurement error–if nothing else, one year’s income could be high or low due to volatility. Educational attainment has fewer sources of measurement error. So even if parents’ resources is the true cause of children’s test scores, you could wind up with a zero coefficient on income, particularly if you include another regressor with lower measurement error.

And this is one of many reasons to prefer experimental data to regressions.
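To make the attenuation point concrete, here is a small simulation (entirely my own illustration; the variable names and noise levels are invented, and only “resources” matters by construction):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
resources = rng.normal(size=n)                          # unobserved "parents' resources"
income = resources + rng.normal(scale=1.0, size=n)      # noisy measure (one year's volatile income)
education = resources + rng.normal(scale=0.3, size=n)   # measured with much less error
score = resources + rng.normal(scale=0.5, size=n)       # child's test score depends only on resources

X = np.column_stack([np.ones(n), income, education])    # regress score on both proxies
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print("coefficient on income:   ", round(beta[1], 3))   # pushed toward zero
print("coefficient on education:", round(beta[2], 3))   # absorbs most of the effect
```

Neither proxy has any causal effect of its own here, yet the regression hands nearly all of the credit to the cleanly measured one.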

Chris Blattman on Experiments

In a must-read post, he describes a number of methodological problems with the interpretation of experiments in social science, but says

There’s no problem here if you think that a large number of slightly biased studies are worse than a smaller number of unbiased and more precise studies. But I’m not sure that’s true. My bet is that it’s false. Meanwhile, the momentum of technical advance is pushing us in the direction of fewer studies.

For me, the crux of the issue is this remark from Blattman.

It’s only a slight exaggeration to say that one randomized trial on the shores of Lake Victoria in Kenya led some of the best development economists to argue we need to deworm the world. I make the same mistake all the time.

The way I would put it is that there is no such thing as a study that is so methodologically pure that by itself it can serve as a reliable guide to policy. As I wrote in What Else Would be True?, the results of any study need to be thought about in the context of other knowledge.

Often, one encounters studies with conflicting results. You tend to focus on the methodological flaws only of the studies with results that you do not like. But remember Merle Kling’s third iron law of social science: the methodology is flawed. That law applies to every study, including experiments.

Great Minds and Hive Minds

Scott Alexander on Garett Jones’ book:

Hive Mind’s “central paradox” is why IQ has very little predictive power among individuals, but very high predictive power among nations. Jones’ answer is [long complicated theory of social cooperation]. Why not just “signal-to-noise ratio gets higher as sample size increases”?

Me:

Can we rule out statistical artifact? Put it this way. Suppose we choose 1000 people at random. Then we create 50 groups of them. Group 1 has the 20 lowest IQ scores. Group 2 has the next 20 lowest IQ scores, etc. Then we run a regression of group average income on group average IQ for this sample of 50 groups. My prediction is that the correlation would be much higher than you would get if you just took the original sample of 1000 and did a correlation of IQ and income. I think that this is because grouping the data filters out much of the noise. Perhaps the stronger correlation among national averages is just a result of using (crudely) grouped data.
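The grouping claim is easy to check with a quick simulation (my own sketch; the IQ-income relationship below is invented and deliberately noisy at the individual level):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
iq = rng.normal(100, 15, size=n)
income = 500 * iq + rng.normal(scale=40000, size=n)   # modest individual-level relationship

print("individual correlation:", round(np.corrcoef(iq, income)[0, 1], 2))

# Sort into 50 groups of 20 by IQ, then correlate the group means.
order = np.argsort(iq)
group_iq = iq[order].reshape(50, 20).mean(axis=1)
group_income = income[order].reshape(50, 20).mean(axis=1)
print("grouped correlation:   ", round(np.corrcoef(group_iq, group_income)[0, 1], 2))
```

With these invented numbers the grouped correlation comes out far higher than the individual one, simply because averaging within groups washes out the idiosyncratic noise in income.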

Questions for Garett Jones

Some questions after a quick reading of Hive Mind. The core issue is what Jones calls the paradox of IQ. That is, among individuals, the correlation between IQ and income is modest. However, among nations, the correlation between average IQ and average income is strong.

How does your high IQ raise my income? Think of four possible explanations for this paradox.

1. Statistical artifact.
2. Proximity effect–I earn more income by living close to people with high IQ’s.
3. Cultural effect–people with high IQ’s transmit good cultural traits to me.
4. Political effect–having people with high IQ in my jurisdiction leads to me enjoying better government.

Can we rule out statistical artifact? Put it this way. Suppose we choose 1000 people at random. Then we create 50 groups of them. Group 1 has the 20 lowest IQ scores. Group 2 has the next 20 lowest IQ scores, etc. Then we run a regression of group average income on group average IQ for this sample of 50 groups. My prediction is that the correlation would be much higher than you would get if you just took the original sample of 1000 and did a correlation of IQ and income. I think that this is because grouping the data filters out much of the noise. Perhaps the stronger correlation among national averages is just a result of using (crudely) grouped data.

Can we sort out between proximity effects, cultural effects, and political effects? Perhaps with a natural experiment involving people from different cultures moving to different jurisdictions, or people living close to one another but having different cultures?

The most parsimonious proximity effect could be capital per worker. Assume that people tend to invest close to home (Jones calls this the Feldstein-Horioka effect when it applies across countries). If high-IQ people invest more wisely, then I will have better capital to work with if I live close to them. Or if high-IQ people invest more (because, as Jones points out, they are more patient), then I will have more capital to work with if I live close to them. How well does capital per worker serve as a channel for transmitting someone else’s IQ to my income?

Another proximity effect would be strong complementarity in team production (what Jones, following Kremer, calls the O-Ring effect). If the value of my output depends on the value of others in a team, then I will be better off living close to people with high IQ’s.
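To see why that kind of complementarity makes neighbors’ skill matter, here is a toy version of the O-Ring production function, in which team output is the product of the members’ skill levels (the numbers are invented and this is only a sketch of the idea, not Jones’ or Kremer’s full model):

```python
import numpy as np

def o_ring_output(skills, scale=100.0):
    """Team output when every task must be done right: the product of skill levels."""
    return scale * np.prod(skills)

my_skill = 0.8                     # my probability of doing my task correctly
weak_team = [0.6, 0.6, 0.6]        # invented teammate skill levels
strong_team = [0.95, 0.95, 0.95]

print("output with weak teammates:  ", round(o_ring_output([my_skill] + weak_team), 1))
print("output with strong teammates:", round(o_ring_output([my_skill] + strong_team), 1))
```

My own skill is the same in both cases, yet team output, and with it whatever share of output I am paid, is several times higher alongside the stronger teammates.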

What happens when you divide the U.S. into fifty states and put each state into the database with other countries? My guess is that Mississippi will look really good on average income relative to average IQ when you compare it with Denmark. If so, is that because of higher capital per worker in Mississippi? A higher-trust culture? Or better overall governance than Denmark?

What Else Would be True?

Chris Dillow writes,

we should remember the Big Facts. For example, one of the Big Facts in finance is that active equity fund managers rarely beat the market for very long, at least after fees. This, as much as Campbell Harvey’s statistical work, reminds us to be wary of the hundreds of papers claiming to find factors that beat the market.

Pointer from Mark Thoma.

This is a good example of asking, “What else would be true?” When you are inclined to believe that a study shows X, consider all of the implications of X. In the example above, Dillow is suggesting that if a paper claims to have found a factor that allows one to earn above-market returns, we have to ask how to reconcile that claim with the fact that we do not observe active fund managers earning above-market returns.

Recall that I raised a similar question about the purported finding that in the United States worker earnings have gone nowhere as productivity increased. If that were true, labor would have become a bargain relative to its productivity, which should greatly increase the demand for labor. It should also greatly increase international competitiveness, turning us into an export powerhouse. Since I do not see either of those things taking place, and since many economists have pointed to flaws in the construction of the comparison of earnings and productivity, I think the purported finding is highly suspect.

In contrast, consider the view that assortative mating has increased and plays an important role in inequality. I have not seen anyone say, “IF that were true, then we would expect to observe Y, and Y has not happened.”

I think that this is the way to evaluate interpretive frameworks in economics. Consider many possible implications of an interpretive framework. Relative to those implications, do we observe anomalies? When you have several anomalies, you may choose to overlook them or to explain them away, but you should at least treat the anomalies as caution flags. If instead you keep finding other phenomena that are consistent with the interpretive framework, then that should make you more comfortable with using that framework.

Poor Replication in Economics

Andrew C. Chang and Phillip Li write,

we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable.

Pointer from Mark Thoma.

As an undergraduate at Swarthmore, I took Bernie Saffran’s econometrics course. The assignment was to find a paper, replicate the findings, and then try some alternative specifications. The paper I chose to replicate was a classic article by Marc Nerlove, using adaptive expectations. The data he used were from a Department of Agriculture publication. There was a copy of that publication at the University of Pennsylvania, so I went to their library and photocopied the relevant pages. I typed the data into the computer at Swarthmore, and got results that were nowhere close to Nerlove’s.

Research with Pre-commitment

Kimberly G. Noble writes,

In a study published this year in Nature Neuroscience, several co-authors and I found that family income is significantly correlated with children’s brain size — specifically, the surface area of the cerebral cortex, which is the outer layer of the brain that does most of the cognitive heavy lifting. Further, we found that increases in income were associated with the greatest increases in brain surface area among the poorest children.

…I am part of a team of social scientists and neuroscientists planning a large clinical trial in which 1,000 low-income mothers will be randomly assigned to receive either a large ($333) or small ($20) monthly income supplement for the first three years of their children’s lives. Periodic assessments of the children and their mothers will enable us to estimate the impact of these cash supplements on children’s cognitive, emotional and brain development, as well as the effect on family functioning.

…Our clinical trial is designed to provide strong evidence regarding whether and how poverty reduction promotes cognitive and brain development. This study, however, will take at least five years to complete — far too long for young children living in poverty today. We should not wait until then to push for policies that can help inoculate young children’s pliable brains against the ravages of poverty.

Pointer from Mark Thoma.

Her policy suggestions seem to me to be based quite a bit on emotion, and some of them do not even (to me) seem related to children’s brain development. This makes me concerned that perhaps she is so emotionally attached to her preferred policy solutions that she is pre-committed to finding that poverty causes a smaller surface area of the cerebral cortex, rather than finding that the correlation comes from income and brain characteristics being correlated between parents and children. This may be an example of a study where only a positive finding will be published; a null finding may never see the light of day.

I am glad that she wants to do a controlled experiment. If the results come out the way she would like, then it will be an important finding. I would be happy to volunteer to help audit the study.

Noah Smith on Natural Experiments

He writes,

With lab experiments you can retest and retest a hypothesis over a wide set of different conditions. This allows you to effectively test whole theories. Of course, at some point your ability to build ever bigger particle colliders will fail, so you can never verify that you have The Final Theory of Everything. But you can get a really good sense of whether a theory is reliable for any practical application.

Not so in econ. You have to take natural experiments as they come. You can test hypotheses locally, but you usually can’t test whole theories. There are exceptions, especially in micro, where for example you can test out auction theories over a huge range of auction situations. But in terms of policy-relevant theories, you’re usually stuck with only a small epsilon-sized ball of knowledge, and no one tells you how large epsilon is.

Pointer from Mark Thoma. Read the whole post.