Why Published Results Can be Unreliable

Mark Peplow reports on research by Neil Malhotra, who tracked research projects to compare those that were published with those that were not.

Of all the null studies, just 20% had appeared in a journal, and 65% had not even been written up. By contrast, roughly 60% of studies with strong results had been published. Many of the researchers contacted by Malhotra’s team said that they had not written up their null results because they thought that journals would not publish them, or that the findings were neither interesting nor important enough to warrant any further effort.

Pointer from Mark Thoma.

This is not shocking news. Can anyone find Malhotra’s paper?

Me on Greg Clark’s Latest

I write,

his findings argue against the need to create strong incentives to succeed. If some people are genetically oriented toward success, then they do not need lower tax rates to spur them on. Such people would be expected to succeed regardless. The ideal society implicit in Clark’s view is one in which the role of government is to ameliorate, rather than attempt to fix, the unequal distribution of incomes.

The book I am reviewing is The Son Also Rises, in which Clark argues that social status is highly heritable everywhere, in spite of many differences in institutional rules. I spend a lot of the review discussing the statistical basis for Clark's work.

Correlation, Signal, and Noise

As a public service, I am going to offer two propositions about correlation.

1. Where there is correlation, there is signal.

2. Where there is noise, correlation is understated.

The other night, I met with a large group of people to discuss Gregory Clark’s new book. Many people made comments that were uninformed regarding these two propositions.

For example, I gather that people who are strongly into political correctness are wont to say that “There is no reason to believe that IQ measures anything.” I think that is untrue.

Measured IQ is correlated with other variables, including education and income. Any variable that is reliably correlated with other variables must have some signal. It must be measuring something. It may not be measuring what it purports to measure. It may not have a causal relationship with the variables to which it is correlated. But to deny that it measures anything at all moves you deeply into science-denier territory.

Other comments suggest that people believe that if the correlation between parents and children on some variable is, say, 0.4, then this represents a ceiling on heritability. In fact, if measurement of the variable in question is subject to noise, then true heritability could be higher. For example, if IQ tests are inexact (which I assume they are), then the heritability of "true IQ" could be 0.6, even though the heritability of measured IQ is only 0.4. The opposite is not the case: random noise will not cause measured IQ to appear more correlated than it really is. The bias is only downward.
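To see proposition 2 in action, here is a minimal simulation sketch (assuming Python with NumPy; the correlation of 0.6 and the noise level are made-up values chosen to echo the IQ example above): a latent trait runs from parent to child with a true correlation of 0.6, each measurement adds independent noise, and the measured correlation comes out close to 0.4.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent "true" trait: parent and child correlated at 0.6 (hypothetical value).
true_r = 0.6
parent_true = rng.standard_normal(n)
child_true = true_r * parent_true + np.sqrt(1 - true_r**2) * rng.standard_normal(n)

# Each measurement (e.g., an IQ test) adds independent noise.
noise_sd = 0.7  # made-up measurement-error level
parent_meas = parent_true + noise_sd * rng.standard_normal(n)
child_meas = child_true + noise_sd * rng.standard_normal(n)

print("true correlation:    ", np.corrcoef(parent_true, child_true)[0, 1])
print("measured correlation:", np.corrcoef(parent_meas, child_meas)[0, 1])
# The measured correlation is attenuated toward zero by roughly a factor of
# 1 / (1 + noise_sd**2); adding noise never inflates it.
```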

I have written a review of Clark's book (it may appear next month), and in my view the main contribution of his multigenerational studies of social mobility is to give us a means for assessing the impact of noise on heritability estimates. The effect appears to be large, meaning that some characteristics are far more heritable than one-generation correlation studies suggest.

Velasquez-Manoff on Causal Density

From An Epidemic of Absence.

The scientific method that had proven so useful in defeating infectious disease was, by definition, reductionist in its approach. Germ theory was predicated on certain microbes causing certain diseases. Scientists invariably tried to isolate one product, reproduce one result consistently in experiments, and then, based on this research, create one drug. But we’d evolved surrounded by almost incomprehensible microbial diversity, not just one, or even ten species. And the immune system had an array of inputs for communication with microbes. What if we required multiple stimuli acting on these sensors simultaneously? How would any of the purified substances mentioned above mimic that experience? “The reductionist approach is going to fail in this arena,” says Anthony Horner, who’d used a melange of microbes in his experiment. “There are just too many things we’re exposed to.”

In an essay over ten years ago, I wrote,

E.D. Hirsch, Jr., writes, “If just one factor such as class size is being analyzed, then its relative contribution to student outcomes (which might be co-dependent on many other real-world factors) may not be revealed by even the most careful analysis…And if a whole host of factors are simultaneously evaluated as in ‘whole-school reform,’ it is not just difficult but, despite the claims made for regression analysis, impossible to determine relative causality with confidence.”

In the essay, my own example of a complex process that is not amenable to reductionist scientific method is economic development and growth. In that essay, I also provide a little game, like the children’s game “mastermind,” to illustrate the difficulty of applying reductionism in a complex, nonlinear world. Try playing it (it shows up better in Internet Explorer than in Google Chrome).

The phrase “causal density” is, of course, from James Manzi and his book, Uncontrolled.

The Case Against VARs

In a comment on this post, Noah Smith commended to me the work of George-Marios Angeletos of MIT. Unfortunately, Angeletos is fond of vector autoregressions (VARs), which I detest.

I got my start in macro working on structural macroeconometric models. I saw them close up, and I am keenly aware of the problems with them. Hence, I wrote Macroeconometrics: The Science of Hubris.

However, I will give the old-fashioned macroeconometricians credit for at least worrying about the details of the data they are using. If there are structural factors that are changing over time, such as trend productivity growth or labor force participation, the macroeconometrician will keep track of these trends. If there are special factors that change quarterly patterns, such as the “cash-for-clunkers” program that shifted automobile purchases around, the macroeconometrician will take these into account.

The VAR crowd cheerfully ignores all the details in macro data. The economist with a computer program that will churn out VARs is like a 25-year-old with a new immersion blender. He does not want to spend time cooking carefully-selected ingredients. He just wants to throw whatever is in the pantry into the blender to make a smoothie or soup. (Note that I am being unfair to people with immersion blenders. I am not being unfair to people who use VARs.)
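To make the "blender" point concrete, here is a hypothetical sketch of how little work the exercise takes (assuming Python with pandas and statsmodels, and fabricated data standing in for actual macro series): a handful of lines produce a reduced-form VAR and impulse responses, with no step at which the details of the underlying series have to be confronted.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)

# Stand-in data: three made-up quarterly series with a little persistence.
# A real exercise would dump GDP growth, inflation, and a policy rate here.
eps = rng.standard_normal((200, 3))
series = np.zeros((200, 3))
for t in range(1, 200):
    series[t] = 0.5 * series[t - 1] + eps[t]
data = pd.DataFrame(series, columns=["gdp_growth", "inflation", "fed_funds"])

model = VAR(data)
results = model.fit(maxlags=4, ic="aic")  # let an information criterion pick the lags
irf = results.irf(12)                     # impulse responses, 12 quarters out

print(results.summary())
# irf.plot() would draw the usual grid of impulse-response charts.
```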

The VAR appeared because economists became convinced that structural macroeconometric models are subject to the Lucas Critique, which says that as monetary policy attempts to manipulate demand, people will adjust their expectations. My reaction to this is

(a) The Lucas Critique is a minor theoretical curiosity. There are much worse problems with macroeconometrics in practice.

(b) How the heck does running a VAR exempt you from the Lucas Critique? A VAR is no less subject to breakdown than is a structural model.

The macroeconometric project that I first worked with is doomed to fail. Implicitly, you are trying to make 1988 Q1 identical to 2006 Q3 except for the one causal factor with which you are concerned. This cannot be done. There is too much Manzian causal density.

The VAR just takes this doomed macroeconometric project and cavalierly ignores details. It is not an improvement over the macroeconometrics that I learned in the 1970s. On the contrary, it is inferior. And if the big names in modern macro all use it, that does not say that there is something right about VAR. It says that there is something wrong with all the big names in modern macro. On this point, Robert Solow and I still agree.

Statistics vs. Calculus in High School

From a podcast with Russ Roberts and Erik Brynjolfsson (the guest):

Guest: My pet little thing, I just wanted to mention, is I'm not as much of a fan of calculus as I once was, and I'm on a little push in my high school to replace calculus with statistics. In terms of what I think is practical for most people, with the possible exception of Ph.D. economists: calculus is just not widely needed. But that's sort of a tangent.

Russ: Well, it's interesting. My wife is a math teacher, and she is teaching a class of seniors this year, split between calculus and statistics, for one of the levels of the school. And statistics is–I agree with you. Statistics is in many ways much more useful for most students than calculus. The problem is, to teach it well is extraordinarily difficult. It's very easy to teach a horrible statistics class where you spin back the definitions of mean and median. But you become dangerous because you think you know something about data when in fact it's kind of subtle.

Guest: Yeah. But you read newspapers saying–I just grimace because the journalists don't understand basic statistics, and I don't think the readers do either. And that's something that appears almost daily in our lives. I'd love it if we upped our education in that area. As data and data science become more important, it's going to be more important to do that.

Most of the discussion concerns the new book The Second Machine Age, or what I call “average is over and over.”

Phillips Curve Specifications and the Microfoundations Debate

Scott Sumner writes,

As you may know I view inflation as an almost worthless concept… In contrast Krugman discusses the original version of the Phillips curve…which used wage inflation instead of price inflation. Whereas price inflation is a useless concept, wage inflation is a highly useful concept.

Fine. But Krugman also draws attention to how the level of the unemployment rate affects the level of the wage inflation rate. This takes us back to the original, pre-1970 Phillips Curve, from Act I in my terminology. (Act I was the Forgotten Moderation, 1960-1969; Act II was the Great Stagflation, 1970-1985; Act III was the Great Moderation, 1986-2007; and Act IV is whatever you want to call what we are in now.) The Act I Phillips Curve says flat-out that (wage) inflation will be high when unemployment is low, and vice versa.

The Phillips Curve was revised in Act II, when the specification became that the rate of wage inflation increases when the unemployment rate is below the NAIRU and decreases when it is above the NAIRU. In other words, it relates the change in the rate of wage inflation to the unemployment rate. At the time, cognoscenti were saying that Friedman had moved the Phillips Curve one derivative.
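In symbols, one way to write the two specifications (my shorthand; wage inflation is $\pi^w_t$, the unemployment rate is $u_t$, and the NAIRU is $u^*$):

```latex
% Act I (level) specification: wage inflation is high when unemployment is low.
\[
\pi^w_t = \alpha - \beta\, u_t + \varepsilon_t, \qquad \beta > 0
\]
% Act II (accelerationist) specification: the change in wage inflation depends on
% the unemployment gap, rising when u_t < u^* and falling when u_t > u^*.
\[
\Delta \pi^w_t = -\gamma\,(u_t - u^*) + \eta_t, \qquad \gamma > 0
\]
```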

Some comments.

1. The Act I Phillips Curve works better over the 27-year period (Acts III and IV) that Krugman covers. Within the sample period, in 9 out of the 10 years when unemployment is near the bottom of its range (less than 5 percent), wage inflation is near the top of its range (3.5 percent or higher). In all three high-unemployment years, wage inflation is less than 2.5 percent.

2. Although the rate-of-change in wage inflation is also correlated with the unemployment rate, the relationship is not as impressive. In the late 1990s, we had the lowest unemployment rate, but wage inflation actually declined (admittedly by only a small amount). More troubling is the fact that the very high rate of unemployment in recent years produced a decline in wage inflation hardly larger than that of the much milder previous recessions.

3. The overall variation in wage inflation over the 27 years is remarkably low. It ranges from 1.5 percent to 4 percent. When there is this little variation to explain, the actual magnitude of the effect of variations in unemployment on inflation is going to be pretty small. See the post by Menzie Chinn. If you do not have any data points that include high inflation, then you cannot use the Phillips Curve to explain high inflation. Chinn argues that the relationship is nonlinear. I would say that we do not know that there exists a nonlinear relationship. What we know is that we observe a relationship that, if linear, has a shallow slope. The most we can say is that if there is a steep slope somewhere, then there is a nonlinear relationship.

4. If you had given a macroeconomist only the information that wage inflation varied between 1.5 percent and 4 percent, that macroeconomist would never have believed that such a time period included the worst unemployment performance since the Great Depression. In terms of wage inflation, the last five years look like a continuation of the Great Moderation.

Some larger points concerning market monetarism, paleo-Keynesianism, and the microfoundations debate:

5. Concerning Scott’s view of things, I have said this before: Arithmetically, nominal GDP growth equals real GDP growth plus growth in unit labor costs plus the change in the price markup. If you keep the price markup constant and hold productivity growth constant, then nominal GDP growth equals real GDP growth plus wage growth. So it is nearly an arithmetic certainty that when nominal GDP grows more slowly than wages, then real GDP declines. But to me, this says nothing about a causal relationship. You could just as easily say that a decline in real GDP causes nominal GDP to grow more slowly than wages. What you have are three endogenous variables.
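Spelled out, as a sketch in my own notation (writing nominal GDP as the price level times real GDP, and the price level as a markup over unit labor cost):

```latex
% Nominal GDP is P times Y, with the price level P equal to a markup mu times
% unit labor cost, ULC = W / (Y/L), i.e., the wage divided by productivity.
% In growth rates (lower-case g), the identity is:
\[
g_{PY} \;=\; g_Y + g_P \;=\; g_Y + g_\mu + g_W - g_{Y/L}
\]
% Holding the markup fixed (g_mu = 0) and setting productivity growth aside,
% this collapses to
\[
g_{PY} \;\approx\; g_Y + g_W ,
\]
% so when nominal GDP grows more slowly than wages, real GDP growth must fall.
% It is an identity among endogenous variables, not a statement about causation.
```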

Scott insists on treating nominal GDP growth as the exogenous variable controlled by the central bank. To me, that is too much of a stretch. I am not even sure that the central bank can control any of the important interest rates in the economy, much less the growth rate of nominal GDP. Yes, if they print gobs and gobs of money, then inflation will be high and variable, and so will nominal GDP growth. But otherwise, I am skeptical.

6. I view paleo-Keynesianism as being hostile to Act III macro. I share this hostility. However, right now, you have saltwater economists saying, “Freshwater economists reduce macroeconomics to a single representative agent with flexible prices solving stochastic calculus problems. Hah-hah. That is really STOOpid.”

The way I look at it, the Act III New Keynesians reduced macroeconomics to a single representative agent with sticky prices solving stochastic calculus problems. They should not be so proud of themselves.

Paul Krugman calls Act III macro a wrong turn. (Pointer from Mark Thoma.) I would not be so kind. I also would not be as kind as he is to the MIT macroeconomists who emerged in that era.

You cannot just blame Lucas and Prescott for turning macro into a useless exercise in mathematical…er…self-abuse. You have to blame Fischer and Blanchard, too. Personally, I blame them even more.

Having said all that, I do not share Krugman’s paleo-Keynesianism. Just because the Lucas critique was overblown does not mean that other critiques are not valid. I have developed other doubts about the Act I model, and these lead me to believe that PSST is at least as plausible a starting point for thinking about macro.

Macroeconomic Changes

From an article by Serena Ng and Jonathan H. Wright (gated):

There has been a secular increase in the share of services in consumption from an average of 50 percent before 1983 to 65 percent after 2007, at the expense of nondurables (from 35 percent to 22 percent). Labor share in the nonfarm sector has fallen, as has the share of manufacturing employment. The civilian labor force participation rate stands at 63.5 percent in 2013, much below the peak of 67.2 percent in 1999. This is in spite of the female participation rate rising from under 35 percent in 1945 to over 60 percent in 2001, as the male participation rate has been falling since 1945. The economy has experienced increased openness; international trade and financial linkages with the rest of the world have strengthened, with the volume of imports plus exports rising from 12 percent of GDP before 1983 to 27 percent post-2007. Meanwhile, not only have households’ and firms’ indebtedness increased, so has foreign indebtedness. For example, the household debt-to-asset ratio rose from under 0.75 in the 1950s to over 1.5 in 2000 and has since increased further. Net external assets relative to GDP have also risen from 0.82 in the 1970s to 2.4 when the sample is extended to 2007.

Keep in mind that the conceit of macroeconometrics is that each quarter is an independent observation, and that by controlling for a few variables one can make, say, 1982 Q1, equivalent to 2009 Q3, except for key policy drivers. If you cannot buy that (and of course you cannot), then I believe that you have reasons to be skeptical of any purported estimates of multipliers.

Hal Varian on Big Data

The self-recommending paper is here.

When confronted with a prediction problem of this sort an economist would think immediately of a linear or logistic regression. However, there may be better choices, particularly if a lot of data is available. These include nonlinear methods such as 1) neural nets, 2) support vector machines, 3) classification and regression trees, 4) random forests, and 5) penalized regression such as lasso, lars, and elastic nets.

In one of his examples, he redoes the Boston Fed study that showed that race was a factor in mortgage denials. Using the classification-tree method, he finds that a tree that omits race as a variable fits the data just as well as a tree that includes race, which implies that race was not an important factor.
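A sketch of that kind of comparison, not Varian's actual code and not the Boston Fed data (assuming Python with scikit-learn, and entirely synthetic variables): fit one tree with the sensitive variable and one without, then compare out-of-sample accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 5_000

# Synthetic stand-ins for the mortgage variables (entirely made up).
debt_ratio = rng.uniform(0.1, 0.6, n)
credit_score = rng.normal(680, 60, n)
race = rng.integers(0, 2, n)  # binary stand-in for the sensitive variable

# In this fabricated example the outcome depends only on the financial variables.
denied = (debt_ratio * 300 - (credit_score - 680) * 0.5
          + rng.normal(0, 20, n)) > 100

X_with = np.column_stack([debt_ratio, credit_score, race])
X_without = np.column_stack([debt_ratio, credit_score])

for name, X in [("with race", X_with), ("without race", X_without)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, denied, random_state=0)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
    print(name, "test accuracy:", round(tree.score(X_te, y_te), 3))
# Similar accuracy with and without the variable is the pattern Varian reads as
# evidence that it carries little additional predictive weight.
```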

Thanks to Mark Thoma for the pointer.

P(A|B) != P(B|A)

Timothy Taylor writes,

those in the top 1% are almost surely paying the top marginal tax rate of about 40% on the top dollar earned. But when all the income taxed at a lower marginal rate is included, together with exemptions, deductions, and credits, this group pays an average of 20.1% of their income in individual income tax.

…The top 1% pays 39% of all income taxes and 24.2% of all federal taxes.

Assume you are in the top 1 percent. For any particular dollar of your income, there is a 20.1 percent chance that it winds up with the government as individual income tax. However, for any particular income-tax dollar that winds up with the government (not necessarily yours), there is a 39 percent chance that it came from the income of someone in the top 1 percent.
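In Bayes' rule terms (symbols only; $s$ and $t$ below are placeholders, not figures from Taylor's post):

```latex
% Let A = "the dollar is collected as federal individual income tax"
% and B = "the dollar was earned by someone in the top 1%".
% The 20.1% figure is P(A|B); the 39% figure is P(B|A). Bayes' rule links them:
\[
P(B \mid A) \;=\; \frac{P(A \mid B)\,P(B)}{P(A)} \;=\; \frac{0.201 \times s}{t},
\]
% where s is the top 1%'s share of total income and t is the economy-wide average
% individual income tax rate (placeholders, not figures from Taylor's post).
% The two conditional probabilities would coincide only if P(A) = P(B).
```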