Trying to raise the status of Edward Leamer

[Note: askblog had an existence prior to the virus crisis. I still schedule occasional posts like this one.]

In this article, I say that Edward Leamer deserves a Nobel Prize.

Edward Leamer deserves the Nobel Prize in Economic Sciences for launching the movement to examine critically the uses of statistical methods in empirical research. The movement has had repercussions that go beyond econometrics. It has affected medicine and epidemiology, where John P. A. Ioannidis has been a leading figure in pointing out methodological failures (Ioannidis 2005; 2016; Begley and Ioannidis 2015). It has impelled psychology and behavioral economics to confront what has become known as the ‘replication crisis’ (Camerer et al. 2018).

. . . After Leamer pointed out problems inherent in the multiple-regression approach and the inevitable specification searching that it involves, economists have turned to quasi-experimental methods.

The policy of Econ Journal Watch is not to allow the author to give acknowledgments to editorial staff, but Brendan Beare and Jason Briggeman did a great deal to improve the essay.

Genes and heritability: from the comments

At least two commenters pointed to an article that indicates that the use of genes to predict height has gotten more effective.

One of them wrote,

Furthermore, the DNA chips used in today’s genome-wide association studies contain a few million variants at most, so these studies cannot even in principle recover the full heritability which is strongly influenced by very rare variants.

This is the answer. Polygenic scores are, for now, based on SNPs. Whole-genome sequencing (WGS) recovers full heritability for height.

As the other commenter put it,

in a short amount of time, we’ve gone from “17%” as “most predictive”, to another study saying 40%, to a new one getting close to the heritability range.

Road to sociology watch

I have a hypothesis that the trend in mainstream economics is toward an increased focus on Gender, Race, Inequality, and Climate. So I went to the home page of the American Economic Review, where you can look at the tables of contents of past issues without being a member. I am a member, so I can see the actual articles, but for this purpose that was not necessary.

I checked out the “papers and proceedings” issue, which lists all of the sessions at the previous year’s annual convention that were selected for publication. Prior to 2018, this was always the May issue of the AER, but starting in 2018 this is broken out as a separate publication, called AEA Papers and Proceedings. Note that the annual meeting takes place right at the beginning of the year (it used to be right at the end of the preceding year). The sessions have to be planned well in advance, so for example it was difficult to arrange many sessions about the 2008 financial crisis in the 2009 meetings.

Looking at the titles of the sessions, I attempted to classify each one as Gender, Race, Inequality, Climate, or Other. This is somewhat subjective. Using my best judgment on the session titles for several recent years, here is what I came up with:

year G R I C O
2007 1 0 1 0 25
2008 1 0 1 0 23
2014 2 1 2 1 23
2015 1 1 4 1 25
2016 3 2 4 1 23
2017 3 0 4 1 26
2018 5 3 3 2 22
2019 5 1 6 0 21

Another approach is to do this at the level of papers, rather than sessions, using the JEL classification codes as assigned by the authors. For any paper that was plausibly in the GRIC category, I looked at the paper description that includes the JEL codes. Papers that include a JEL code of J16 or K38 go in the G category, those that include a code of J15, J70, J71, J78, or J79 go in the R category, and so on.
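For concreteness, here is a minimal sketch of that classification rule in Python. It includes only the G and R codes named above, and the example papers are hypothetical; it also simply takes the first matching code, whereas in practice I picked the category that best fit a paper whose codes fell in more than one group.

```python
# A sketch of the JEL-code rule described above. Only the G and R codes named in
# the text are included; the Inequality and Climate codes would be added the same
# way. The example papers are hypothetical.

JEL_TO_CATEGORY = {
    "J16": "G", "K38": "G",                                      # Gender
    "J15": "R", "J70": "R", "J71": "R", "J78": "R", "J79": "R",  # Race
}

def classify(jel_codes):
    """Return a GRIC category for a paper's JEL codes, or 'O' for Other."""
    for code in jel_codes:
        if code in JEL_TO_CATEGORY:
            return JEL_TO_CATEGORY[code]  # first match wins; see the note on double-counting below
    return "O"

print(classify(["J16", "D31"]))  # -> G
print(classify(["E52", "E58"]))  # -> O
```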

An interesting contrast is between the papers in the May 2012 AER (meetings organized by Chris Sims) and the May 2013 AER (meetings organized by Claudia Goldin). In the table below, T stands for the total number of papers.

type 2012 2013
G 1 13
R 2 5
I 2 8
C 2 4
T 105 110

Note that I avoided double-counting. If a paper had both a J16 code (G) and a J15 code (R), I tried to pick the one that best fit the paper.

With the liberal Goldin, there were 30 GRIC papers out of 110. With Sims, who I assume is conservative, there were 7 out of 105. My guess is that we won’t see any more AEA meetings organized by conservatives.

Other notes:

–From 2005 through 2012, the number of GRIC papers ranged from a low of 7 to a high of 19; from 2013 through 2019, it ranged from a low of 19 to a high of 41. The peak of 41 (36.9 percent of papers) was reached in 2018, at the meetings organized by Olivier Blanchard.

Suits vs. geeks in the virus crisis

1. Allison Schrager writes,

Among the unknowns about the virus: the true hospitalization and death rates; how infectious it is; how many asymptomatic patients are walking around; how it affects young people; how risk factors vary among different countries with different populations, pollution levels and urban densities. It seems certain the virus will overwhelm hospitals in some places, as it has in China and Italy. We also don’t know how long these extreme economic and social disruptions will last. Without reliable information, predictions are based on incomplete data and heroic assumptions.

…The way forward is testing as many people as possible—not only people with symptoms. Some carriers are asymptomatic. California is starting to test asymptomatic young people to learn more about transmission and infection rates. Testing everyone may not be feasible, but regularly testing a random sample of the population would be informative.

This is the analytical mindset, which is sorely needed. What I called the “suits vs. geeks divide” in 2008 is haunting us again. Ten days ago, the challenge was to get the suits to understand exponential growth. Hence, they were two weeks behind. Now, the challenge is to get the suits to make decisions based on rational calculations as opposed to fears or whoever shouts the loudest in their ears.

But much needs to change. Think about the “analytics revolution” in baseball. In the 1980s, the revolution started*, with Bill James and others questioning the value of the routinely-calculated statistics. Just as one example, data geeks discovered that a batter’s value was better measured by on-base percentage than batting average, even though the latter was prominently featured in the newspapers and the former was not. Soon, the geeks started longing for statistics that weren’t even being kept, and they started efforts to track and record the desired metrics.

(*In 1964, Earnshaw Cook wrote an analytical book, but he drew no followers, probably because personal computers had not yet been invented.)

Based on what we are seeing now, I think that epidemiology is ripe for an analytics revolution. To me as an outsider, the field relies too much on simulations using hypothetical parameters and not enough on identifying the data that would be useful in real time and making sure that such data get collected.

2. James Stock writes,

A key coronavirus unknown is the asymptomatic rate, the fraction of those infected who have either no symptoms or symptoms mild enough to be confused with a common cold and not reported. A high asymptomatic rate is decidedly good news: it would mean that the death rate is lower, that the hospital system is less likely to be overrun, and that we are closer to achieving herd immunity. From an economic point of view, a high asymptomatic rate means it is safe to relax restrictions relatively soon, and that hospitalizations can be kept within limits as economic activity resumes.

Conversely, a low asymptomatic rate would require trading off losing many lives against punishing economic losses.

Neither the asymptomatic rate nor the prevalence of the coronavirus can be estimated if tests are prioritized to the symptomatic or if the included asymptomatic are unrepresentative (think NBA players).

Instead, we need widespread randomized testing of the population.

It may seem counterintuitive that we should be rooting for a high number of people running around with the virus without symptoms. But that would mean, among other things, that their presence is not creating huge risks for the rest of the population. You want the ratio of mild cases to emergency-room cases to be high.

3. Larry Brilliant says,

We should be doing a stochastic process random probability sample of the country to find out where the hell the virus really is.

Note that he has a lot of anger against President Trump. I won’t push back at Mr. Brilliant (I’m not being sarcastic, that is his name), but I think his rhetoric is stronger than his case. See my post on anger.

4. Dan Yamin says,

But there is one country we can learn from: South Korea. South Korea has been coping with corona for a long time, more than most Western countries, and they lead in the number of tests per capita. Therefore, the official mortality rate there is 0.9 percent. But even in South Korea, not all the infected were tested – most have very mild symptoms.

The actual number of people who are sick with the virus in South Korea is at least double what’s being reported, so the chance of dying is at least twice as low, standing at about 0.45 percent – very far from the World Health Organization’s [global mortality] figure of 3.4 percent.

He is at least taking care not to take statistics at face value. But don’t be satisfied with trying to guess based on data that don’t measure what you want. Try to get the authorities to provide you with the numbers you need.

Computer models and the ADOO loop

When you ask a question of a computer model, it provides answers to several decimal places that can be off by several orders of magnitude. Give me a clear, logical back-of-the-envelope calculation grounded in real-world data over a model simulation any day.

Several people have sent me links to papers that use computer models purporting to simulate the economic consequences of alternative strategies for dealing with the virus. I don’t bother reading them. When I see that Jeffrey Shaman’s pronouncements about the rate of asymptomatic spreading are based on a simulation model, I assign them low confidence.

Once you build a model that is so complex that it can only be solved by a computer, you lose control over the way that errors in the data can propagate through the model. For me, it is important to look at data from a perspective of “How much can I trust this? What could make it misleadingly high? What could make it misleadingly low?” before you incorporate that data into a complex model with a lot of parameters.

I read that in the U.S. we have done 250,000 tests for the virus, and yet we have only 35,000 positive cases. But before we jump to any conclusions based on this, we ought to get an idea of how many of these tests are re-tests. If the average person who is tested is tested three times, then almost half of the people being tested are positive. I have no idea what the average number of tests per person actually is–it probably isn’t as high as three, but it isn’t as low as one, either.
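As a back-of-the-envelope check, here is a small sketch of how the assumed number of tests per person changes the implied positive rate. The 250,000 and 35,000 figures are the ones quoted above; the re-test multipliers are purely hypothetical.

```python
# Back-of-the-envelope: how the assumed number of tests per person changes the
# share of tested people who are positive. The multipliers are hypothetical.

total_tests = 250_000
positive_results = 35_000

for tests_per_person in (1.0, 1.5, 2.0, 3.0):
    people_tested = total_tests / tests_per_person
    share_positive = positive_results / people_tested
    print(f"{tests_per_person:.1f} tests per person -> "
          f"{share_positive:.0%} of people tested are positive")
```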

A lot of people are quoting lines from Gene Kranz in the movie Apollo 13. One of my favorites is when he warns against "guessin'." Computer models are just "guessin'" in my view. Making decisions based on models is approximately as bad as making them based on blind panic.

I am constantly calling for taking a random sample of the population, say 5000 people, and testing them on a repeated basis. I am quite willing to take some testing resources away from people walking in with symptoms. If people have symptoms and we don’t have the resources to test them, then isolate them as if they were infected, in a non-hospital setting. The decision about when to hospitalize them can be based on how their symptoms progress.
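As a rough indication of what a sample of that size could deliver, here is a sketch of the sampling error of a prevalence estimate from 5,000 randomly chosen people. The true prevalence levels are assumptions, and the calculation ignores test error and the repeated-testing design.

```python
# Rough precision of a prevalence estimate from a random sample of 5,000 people,
# at a few assumed true prevalence levels (binomial standard error only).

import math

n = 5_000
for prevalence in (0.001, 0.01, 0.05):
    se = math.sqrt(prevalence * (1 - prevalence) / n)
    print(f"true prevalence {prevalence:.1%}: 95% interval roughly +/- {1.96 * se:.2%}")
```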

We don’t yet have a proven drug treatment, so you don’t really help an infected person by testing them. Testing helps reassure the uninfected people that they don’t need to be totally isolated. That is a benefit, but not enough to justify putting all our resources into people with symptoms, leaving no resources for random testing.

The OODA loop says, "Observe, Orient, Decide, Act." Right now, our public policy seems to be running an ADOO loop–"act, decide, orient, observe." I find it frustrating.

Addressing the issue of asymptomatic spreading

From the WSJ.

“Certainly there is some degree of asymptomatic transmissibility,” Anthony Fauci, the director of the National Institute of Allergy and Infectious Diseases, said at a news conference Friday. “It’s still not quite clear exactly what that is. But when people focus on that, I think they take their eye off the real ball, which is the things you do will mitigate against getting infected, no matter whether you are near someone who is asymptomatic or not.”

I think Dr. Fauci has missed the point. It’s one thing for me as an individual to treat everyone around me as if they could be a spreader, and act accordingly. I don’t shut down the economy by washing my hands a lot and staying 6 feet away from people.

But when public officials treat everyone as a spreader and order people to shelter in place, that does shut down the economy. So I think it is important to make an informed decision about whether treating everyone as if they could be spreaders is wise. That is, it would help to be able to know the results of the experiment, or to be able to anticipate the results.

The article goes on to say,

Researchers have posted to the open-access site MedRxiv their own recent studies that used data from the outbreak that suggest people can be infectious sometimes days before they show symptoms of Covid-19. Some reports suggest some carriers never experience any.

But being asymptomatic only makes you dangerous if you can be a spreader. The story gives numbers from one research paper.

. . .early in China’s outbreak, 86% of infections went undetected. The paper also noted that because they were so numerous, stealth infections were the source for roughly 80% of known ones.

This isn’t quite the answer we need, though.

Let C be the event “come in contact with someone with the virus who is asymptomatic.”

Let I be the event “become knowingly infected with the virus.”

What the quoted paragraph gives is the claim that P(C|I)= 80/100. That says that of every 100 people knowingly infected, 80 got the infection from coming in contact with an asymptomatic carrier. What I want to know is P(I|C). Out of 100 people who come in contact with an asymptomatic carrier, how many will become knowingly infected? P(I|C) = P(C|I)*P(I)/P(C).

At first, I thought that there cannot be more asymptomatic carriers than there are people infected, so P(I) has to be greater than P(C). So if the report is correct, out of every 100 people who come into contact with an asymptomatic carrier, more than 80 will become infected. That would seem to justify a lockdown policy.

But remember the important modifier knowingly infected. If not everyone is tested, then certainly there can be more asymptomatic carriers than there are people knowingly infected. If there are 10 times more, then out of 100 people who come in contact with an asymptomatic carrier, only 8 will themselves become infected, and that might not be enough to justify crippling the economy by telling everyone to shelter in place.

So I still think we need harder data. And yet once again, I make a plea for random testing. Since we know P(I), if we also knew P(C), we could make an intelligent estimate of the key probability, P(I|C). That in turn would help inform public policy decisions that are of huge import.
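Here is the same Bayes'-rule arithmetic as a small sketch. P(C|I) = 0.8 is the figure from the quoted paper; the ratio P(C)/P(I) is exactly the unknown that random testing would pin down, so the values below are hypothetical.

```python
# Bayes' rule from the text: P(I|C) = P(C|I) * P(I) / P(C).
# P(C|I) = 0.8 is from the quoted paper; the ratio P(C)/P(I) is hypothetical.

p_c_given_i = 0.8  # share of known infections traced to an asymptomatic contact

for ratio in (1, 2, 5, 10):  # assumed values of P(C) / P(I)
    p_i_given_c = p_c_given_i / ratio
    print(f"P(C)/P(I) = {ratio:2d}  ->  P(I|C) = {p_i_given_c:.2f}")
```

A ratio of 1 corresponds to the 80-out-of-100 case above, and a ratio of 10 to the 8-out-of-100 case, which is why this unknown matters so much for lockdown policy.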

Price discrimination explains adjunct salaries?

Tyler Cowen writes,

My immediate reaction was “Given the crowding in the sector, and that they presumably earn non-pecuniary returns from the enjoyment of teaching, shouldn’t we be taxing them at a higher rate?”

He is referring to the low salaries for adjunct professors. A college that pays different salaries to full-time faculty and adjuncts is engaging in price discrimination (actually, wage discrimination). Just as a price discriminator tries to charge based on willingness to pay, the wage discriminator tries to pay according to willingness to work. Like it or not, low pay for adjuncts is an efficient outcome.

Basic econ: costs of production

I am starting to work on filling in/updating some of these college economics topics. Students land on the site when they want to get help with their college econ courses. But I plan to include some occasional “improvements” to mainstream thinking. I did not much need to amend mainstream thinking in my first topic, costs of production.

Other things equal, when fixed costs are high, there will be only a few firms. When fixed costs are low, there will tend to be many firms. When the Internet reduced the fixed cost of becoming a publisher, because you no longer need a printing press, the number of providers of written content skyrocketed.
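A minimal numerical illustration of that logic, with hypothetical cost figures: average cost per unit is the fixed cost spread over output plus a constant marginal cost, so when the fixed cost is large, only producers operating at a large scale get anywhere near marginal cost.

```python
# Average cost with fixed cost F and constant marginal cost c: AC(q) = F/q + c.
# The cost figures below are hypothetical.

def average_cost(fixed_cost, marginal_cost, quantity):
    return fixed_cost / quantity + marginal_cost

for fixed_cost in (1_000, 1_000_000):      # low- vs. high-fixed-cost technology
    for quantity in (100, 10_000):
        ac = average_cost(fixed_cost, marginal_cost=5.0, quantity=quantity)
        print(f"fixed cost {fixed_cost:>9,}  output {quantity:>6,}  average cost {ac:12,.2f}")
```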

Why are polygenic scores not better?

Start with what I said in my review of Robert Plomin’s Blueprint.

Plomin is excited by polygenic scores, a recent development in genetic studies. Researchers use large databases of DNA-sequenced individuals to identify combinations of hundreds of genes that correlate with traits.

The most predictive polygenic score so far is for height; it explains 17 percent of the variance in adult height… height at birth scarcely predicts adult height. The predictive power of the polygenic score is greater than that of any other predictor, even the height of the individual’s parents.

One can view this 17 percent figure either as encouraging or not. It represents progress over attempts to find one or two genes that predict height, an effort that is futile. But compared to the 80 percent heritability of height, it seems weak.

Plomin is optimistic that with larger sample sizes better polygenic scores will be found, but I am skeptical.

My question, to which I do not have the answer, is this: if height is 80 percent heritable, why do polygenic scores explain only 17 percent of the variance?

I do not know any biology. But as a statistician, here is how I would go about developing a polygenic score.

1. I would work with one gender at a time. Assume we have a sample of 100,000 adults of one gender, with measurements of height and DNA sequences. I would throw out the middle 80,000 and just work with the top and bottom deciles.

2. For every gene, sum up the total number in the top decile with that gene and the total number in the bottom decile with that gene, and see where the differences are the greatest. If 8500 in the top decile have a particular gene and 1200 in the bottom decile have the gene, that is a huge difference. 7500 and 7200 would be a small difference. Take the 100 largest differences and build a score that is a weighted average of the presence of those genes.

3. To try to improve the score, see whether adding the gene with the 101st largest difference improves predictive power. My guess is that it won’t.

4. Also to try to improve the score, see whether adding two-gene interactions helps the score. That is, does having gene 1 and gene 2 make a difference other than what you would expect from having each of those genes separately? My guess is that some of these two-gene interactions will prove significant, but not many.

It seems to me that one should be able to extract most of the heritability from the data by doing this. But perhaps this approach is not truly applicable.
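For what it is worth, here is a minimal sketch of steps 1 and 2 on synthetic data. Everything in it is an assumption for illustration: genes are coded as 0/1 "present" indicators, the trait is generated from 50 of them plus noise, the sample is smaller than the 100,000 in step 1 just to keep it quick, and the weights are simply +1 or -1 depending on which decile a gene favors. Real genotype data and real polygenic-score methods are more involved.

```python
# Sketch of the decile-comparison procedure (steps 1-2 above) on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_genes = 20_000, 2_000
genes = rng.integers(0, 2, size=(n_people, n_genes), dtype=np.int8)   # hypothetical 0/1 indicators
trait = genes[:, :50].sum(axis=1) + rng.normal(0, 3, size=n_people)   # toy "height" driven by 50 genes

# Step 1: keep only the top and bottom deciles of the trait.
low_cut, high_cut = np.quantile(trait, [0.10, 0.90])
top, bottom = genes[trait >= high_cut], genes[trait <= low_cut]

# Step 2: rank genes by the difference in how often they appear in the two deciles,
# then score everyone on the 100 largest differences, weighted +1/-1 by direction.
diff = top.sum(axis=0) - bottom.sum(axis=0)
chosen = np.argsort(np.abs(diff))[-100:]
score = genes[:, chosen] @ np.sign(diff[chosen])

print("correlation between score and trait:",
      round(float(np.corrcoef(score, trait)[0, 1]), 2))
```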

Another possibility is that heritability comes from factors other than DNA. Perhaps the reliance on twin studies to try to separate environmental factors from genetic factors is flawed, and the heritability of height comes in large part from environmental factors. Or perhaps DNA is not the only biological force affecting heritability, and we need to start looking for that other force.

Another possibility is that scientists are working with much smaller sample sizes. If you have a sample of one thousand, then the top decile just has one hundred cases in it, and that is not enough to pick out the important DNA differences.

As a related possibility, the effective sample sizes might be small, because of a lot of duplication. Suppose that the top decile in your sample had mostly Scandinavians, and the bottom decile had mostly Mexicans. Your score will be good at separating Scandinavians from Mexicans, but it will be of little use in predicting heights within a group of Russians or Greeks or Kenyans or Scots.

I am just throwing out wild guesses about why polygenic scores do not work very well. I probably misunderstand the problem. I wish that someone could explain it to me.

The self-quarantine decision: my thought process

Even though we have no symptoms and no reason to believe we have been infected, my wife and I are going to try to do everything reasonable to reduce outside contact for a while. Call it “social distancing” or self-quarantining.

This means giving up discretionary trips to the grocery store or other shopping. It means giving up going to dance sessions (that is a big sacrifice, as far as I am concerned). It means not having social meals with others. It means not going to visit our children and grandchildren (an even bigger sacrifice).

My thought process is this:

1. I would rather be in front of an exponential curve than behind it.

When I started my Internet business in April of 1994, most people had not heard of the World Wide Web, and many of those who had heard of it took a “wait and see” attitude about whether it would work out as a business environment. It only became clear that the Web was a business platform more than a year later. But by that time, it was harder to ride the curve.

A lot of people, including government leaders in most countries, are going with a “wait and see” approach before reacting to the virus. They are certainly not getting ahead of the curve. In a few weeks, the self-quarantine decision we are taking may be imposed on everyone. Meanwhile, we hope to reduce our chance of contracting the virus and becoming spreaders.

2. In an uncertain situation, I like to compare the upside and the downside. When the upside of doing something is high and the downside is low, go for it. When it’s the opposite, avoid it.

So think about the upside and the downside of going about our normal business instead of self-quarantining. The upside would be that for the next few weeks I get to dance more and spend more time with friends and family. The downside is that I contract the virus and spread it. I think that the downside, even though it is unlikely, is worse, especially becoming a spreader.

3. How long will we self-quarantine? Either we’ll get something like an “all-clear” signal in a few weeks, or, if my worst fears are correct, there will be government-imposed measures that are as strong or stronger than what we are taking.

4. If I were in government, I would, in addition to making an all-out effort to test people with pneumonia symptoms, be making a large effort to test a sample of asymptomatic people. And re-test people in that sample every few days. From a statistical perspective, random testing strikes me as necessary in order to get a reliable picture of the epidemic. I would not trust an “all-clear” signal that was not backed by evidence from random testing.

Note that this post is not about the current Administration, so please self-quarantine your political comments and take them elsewhere.

UPDATE: John Cochrane recommends an essay by Tomas Pueyo. The message is to respect the exponential curve.