Russ Roberts on non-stagnation

He has a 7-minute video lesson and a companion essay.

What the snapshots show is that the rich today are richer than the rich of yesterday. If the rich people are the same people as yesterday, then one’s class determines one’s fate. But if they are not the same people, the snapshots tell you that the dispersion of income has increased. That may or may not bother you, but it doesn’t necessarily mean that there is a distinct group called “the rich” who are capturing all the gains while the rest of us tread water.

The mis-reading of snapshots is one of my pet peeves. A snapshot means looking at, say, the average income of someone in the 90th percentile in 1980 and comparing it with the average income of someone in the 90th percentile in 2010. The mis-reading of snapshots is to treat the two as if they were the same person.

If you follow actual people from 1980 to 2010, the average increase in income for people in the bottom 20 percent in 1980 is actually quite high. The thing is, many of those people no longer show up in the bottom 20 percent! Instead, the bottom 20 percent in 2010 is occupied by a new set of people, including young families, retired people no longer earning incomes, new immigrants, and people who have recently lost jobs. The snapshots can show stagnation at the bottom 20 percent, even though real people in the bottom 20 percent in 1980 did not stagnate.
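A toy sketch of the difference between reading snapshots and following people. Everything in it is an assumption made for illustration: a five-rung income ladder, one person per rung, everyone moving up one rung per period, the top earner dropping out, and a new entrant arriving at the bottom. It is not calibrated to any real data.

```python
# Hypothetical income ladder; rung 0 is the bottom 20 percent of a 5-person economy.
LADDER = [10, 20, 30, 40, 50]


def simulate(periods):
    """Return a list of snapshots, each mapping person_id -> rung."""
    people = {i: i for i in range(len(LADDER))}  # person i starts on rung i
    next_id = len(LADDER)
    history = [dict(people)]
    for _ in range(periods):
        moved = {}
        for person_id, rung in people.items():
            if rung + 1 < len(LADDER):
                moved[person_id] = rung + 1  # everyone climbs one rung
            # the person on the top rung leaves the sample (assumed exit)
        moved[next_id] = 0  # a new entrant (young, immigrant, etc.) starts at the bottom
        next_id += 1
        people = moved
        history.append(dict(people))
    return history


def bottom_income(snapshot):
    """Income of whoever occupies the bottom rung in a given snapshot."""
    return min(LADDER[rung] for rung in snapshot.values())


history = simulate(4)

# Snapshot view: income at the bottom rung, period by period.
snapshot_bottom = [bottom_income(s) for s in history]

# Panel view: follow person 0, who was on the bottom rung in the first period.
tracked_person = [LADDER[s[0]] for s in history]

print("bottom rung in each snapshot:", snapshot_bottom)
print("income of the original bottom person:", tracked_person)
```

In this toy world the bottom rung never moves, yet the person who started there ends up at the top. The snapshot shows stagnation only because a different person occupies the bottom rung each period.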

I would like to see a high-profile debate on what the data show about trends in income distribution. Otherwise, I fear that those of us with a powerful case against the conventional wisdom will be ignored.

Polygenic score for obesity. . .and?

Coverage of a recent study.

The adults with the highest risk scores weighed on average 13 kilograms more than those with the lowest scores, and they were 25 times as likely to be severely obese, or more than 45 kilograms overweight. “What’s striking is not just the weight,” says Sekar Kathiresan, a cardiologist and geneticist at Massachusetts General Hospital in Boston and the Broad Institute in Cambridge, Massachusetts, who led the study. “If you have a high risk score for obesity, you’re at high risk for heart attack, stroke, diabetes, hypertension, heart failure, and blood clots in the legs.”

And what else? The polygenic score is a result of a statistical fishing expedition. We do not know whether the genes in the score govern physical characteristics, such as metabolism and food preferences, or whether they affect psychological traits, such as conscientiousness. I would be willing to bet that a lot of it is the latter.

If my intuition is correct, then the “obesity score” would predict a lot of other behavioral traits as well: propensity for getting into financial difficulty, grades in school, etc.

Supposedly clever statistical analysis

Russ Roberts writes,

It would be tempting to say that this is just a working paper. Perhaps it will get no traction. But I doubt it. The Becker-Friedman Institute will spread it around — I only knew about the study because the Institute sent me an email. The media will be eager to repeat the finding because people have strong feelings about Uber and Lyft: “U of Chicago Study Finds Ridesharing Kills 1000 People Each Year.” Taxicab owners and their supporters will cite it.

The fact is that economists are almost always doing observational studies, not experiments. At the very least, economists should make more use of the Hill Criteria.

  • Strength (effect size): A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.
  • Consistency (reproducibility): Consistent findings observed by different persons in different places with different samples strengthens the likelihood of an effect.
  • Specificity: Causation is likely if there is a very specific population at a specific site and disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.
  • Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).
  • Biological [or economic] gradient: Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.
  • Plausibility: A plausible mechanism between cause and effect is helpful (but Hill noted that knowledge of the mechanism is limited by current knowledge).
  • Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect. However, Hill noted that “… lack of such [laboratory] evidence cannot nullify the epidemiological effect on associations”.
  • Experiment: “Occasionally it is possible to appeal to experimental evidence”.
  • Analogy: The effect of similar factors may be considered.
  • Reversibility (which some authors also consider): If the cause is removed, then the effect should disappear as well.

Many of Russ’ criticisms of the paper can be mapped back to some of these criteria.

A sex survey: what’s not to love?

The story is behind a WaPo paywall.

1. It is not a story about sexual frequency. It is about the incidence of people who have not had sex with a partner in the past year. Call these folks abstainers. Sorry, Tyler, but I disagree with Christopher Ingraham that it is amazing that there are more abstainers in the 18-30 age bracket than among fifty-somethings. My guess is that the proportion of married people is quite a bit higher among 50-somethings, and if you’re looking for abstainers, you are more likely to find them among people who are not married. To be blunt, the survey does not say that older folks are having more sex. It just says that fewer of them are abstaining for a year.

2. Robin Hanson also could not resist commenting.

it won’t at all do to point to effects that are constant in time, such as people not always telling the truth in polls, or men having lower standards for sex partners. It also won’t do to point to changes over this time period that effected [sic] all ages and genders similarly, such as obesity, porn, video games, social media, dating apps, and wariness re harassment claims. They might be part of an answer, but can’t explain all by themselves. To explain an unusual burst over the last decade, it is also problematic to point to factors (e.g., computing power) that changed over the last decade, but changed just as much over prior decades.

3. Here’s a way to simplify the data in one of the graphs on Robin’s post, which looks at people in the 18-30 age bracket. Suppose we had 100 heterosexual men and 100 heterosexual women. Ten years ago, there were 10 abstainers of each gender. Among the more recent cohort, there are 28 male abstainers and 18 female abstainers.

4. Here’s a way to think about this. Ten years ago, there were 10 female abstainers, each with a “partner” who abstained also. In the more recent cohort, the number of abstainer “partnerships” increased by 8. Some of that could be a decrease in the marriage rate, but how much could the marriage rate have fallen in the last decade?

5. Another interesting development is that there are now 10 male abstainers who don’t have a “partner.” To put it another way, there are now 82 women who did not abstain and only 72 men who did not abstain. (Of course, ten years ago, there were 90 non-abstainers of each gender, so definitely don’t think of this as women getting friskier.) Who did these extra ten women find? Older men? Men who already had non-abstained with someone else?

6. Robin writes,

it seems that. . .the latest age cohort has switched to a new sex culture wherein the less desirable half of young men are now seen as even less desirable by young women than previous cohorts would have seen them. And within this culture it is seen as more acceptable for young women to share the more desirable half of young men

I agree that this is likely the basic story, but I would not overstate it. It could be that we should be talking about the less desirable quarter of the male population. And the number of women who are ok with sharing desirable men may still be very small. My arithmetic exercise suggests that the proportion of women who are sharing (in the sense that they have a partner who in the past year has had additional partners) is 10 percent, and a lot of that may not be sharing by choice.
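The arithmetic in points 3 through 6 can be written out as a short sketch. The cohort size of 100 per gender and the abstainer counts come from the simplification above. The function and field names are made up for illustration, and the pairing of each female abstainer with a male "partner" is the same simplifying device used in point 4.

```python
COHORT_SIZE = 100  # hypothetical 100 heterosexual men and 100 heterosexual women

earlier = {"men": 10, "women": 10}  # abstainers ten years ago
recent = {"men": 28, "women": 18}   # abstainers in the more recent cohort


def summarize(abstainers, cohort_size=COHORT_SIZE):
    """Break a cohort into the quantities used in the text."""
    active_men = cohort_size - abstainers["men"]
    active_women = cohort_size - abstainers["women"]
    return {
        "active_men": active_men,
        "active_women": active_women,
        # abstainers who can be paired off one-to-one across genders
        "abstainer_partnerships": min(abstainers["men"], abstainers["women"]),
        # male abstainers left over after that pairing
        "unpaired_male_abstainers": abstainers["men"] - abstainers["women"],
        # active women in excess of active men
        "extra_active_women": active_women - active_men,
    }


before = summarize(earlier)
after = summarize(recent)

added_partnerships = after["abstainer_partnerships"] - before["abstainer_partnerships"]
sharing_share = after["extra_active_women"] / COHORT_SIZE

print(before)
print(after)
print("added abstainer partnerships:", added_partnerships)
print("share of women with a shared partner (by this count):", sharing_share)
```

The 10 percent figure is just the gap between active women and active men divided by the cohort size. It says nothing about who the extra partners are, which is why the questions in point 5 remain open.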

The null hypothesis for policy

Scott Alexander writes,

the same argument that disproves the importance of photolithography disproves the importance of anything else.

His post gives a number of examples where progress follows a straight line. This is sometimes used as an argument that no individual policy (or invention, as in the case of photolithography) matters. Alexander wonders whether we are deceiving ourselves into believing the null hypothesis for policy.

I think that in the case of inventions it can be difficult to discern an effect at the point in time when the invention occurs. The process of developing complementary inventions, adapting to the new technology, and achieving widespread adoption takes time. See the work of economic historian Paul David. As a result, even in a world of discrete innovations, the overall path of progress is smooth.

In the case of policy, I think that one must also allow for time lags. For example, changes in labor market incentives may not have large effects in the short run, but over time the culture can be affected.

But in general, I think that if one fails to see any historical break point in an outcome following the adoption of a policy, that justifies a presumption that the policy did not matter. I would suggest more careful analysis if that is possible. A clever researcher may be able to find a “natural experiment” that has more power against the null hypothesis. For example, Tyler Cowen posted about a study that found that a carbon tax had little effect on carbon dioxide emissions by comparing across regions. In principle, that study provides more persuasive evidence that the null hypothesis holds for the carbon tax.

Newer versions of the marshmallow test

James Andreoni and others write,

We find that time preferences evolve significantly as children age, with younger children displaying more impatience than older children. This is in line with related work that finds a similar association with age (Bettinger and Slonim, 2007; Angerer et al., 2015; Deckers et al., 2015; Sutter et al., 2015). We also find a strong association with race: black children are significantly more impatient than white or Hispanic children, even while controlling for socio-economic status, cognitive skills and executive function skills.

. . .We do not observe a correlation between preferences of parents and their children. We might have expected such a correlation due to genetics or social learning.

. . .The fact that our early interventions, which were quite broad, did not lead to durable changes in time preferences suggests that such preferences may be difficult to change with education programs for 3-5 year-olds.

. . .the experiment was conducted one-on-one with a trained experimenter and each decision was accompanied by physical containers holding the number of rewards that would be earned by the child for each alternative. The rewards were always candies

Thanks to a reader for forwarding the paper to me.

The rewards were candies, with more offered if the child would wait a day. Frankly, my inclination is to be skeptical of the whole study. You are telling me that interventions do not matter, parental inclinations do not matter, but race matters? I can come up with a clever story to explain such an outcome (perhaps the children of different races reacted differently to the race of the “trained experimenter”), but I would put most of my chips on “results fail to replicate.”

Scott Alexander on the Representative Agent model

He writes,

Suppose that one-third of patients have some gene that makes them respond to Prozac with an effect size of 1.0 (very large and impressive), and nobody else responds. In a randomized controlled trial of Prozac, the average effect size will show up as 0.33 (one-third of patients get effect size of 1, two-thirds get effect size of 0).

Economists instinctively fall back on the “representative agent” model, in which you average the population results of whatever study you do. So an economist would say that the effect size is 0.33. But the point is that there is not one parameter that represents the whole population. One needs to take into account differences.
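A minimal sketch of why the average hides the difference. The one-third/two-thirds split and the effect sizes come from Alexander's example. The second population, in which everyone responds a little, is a hypothetical comparison added here.

```python
def average_effect(groups):
    """groups: list of (population_share, effect_size) pairs whose shares sum to 1."""
    return sum(share * effect for share, effect in groups)


def responder_share(groups):
    """Fraction of the population with any effect at all."""
    return sum(share for share, effect in groups if effect > 0)


# Alexander's example: one-third respond strongly, two-thirds not at all.
heterogeneous = [(1 / 3, 1.0), (2 / 3, 0.0)]

# Hypothetical alternative: everyone responds with the same modest effect.
homogeneous = [(1.0, 1 / 3)]

for name, groups in [("heterogeneous", heterogeneous), ("homogeneous", homogeneous)]:
    print(name, round(average_effect(groups), 2), round(responder_share(groups), 2))
```

Both populations produce the same average effect, so a study that reports only the average cannot tell them apart, even though they call for very different treatment decisions.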

Where this bothers me the most is in the realm of expectations. Someone will take a survey of, say, consumer expectations for home price increases. The results will diverge across consumers. But the economist will report a single number for consumer expectations.

Robert Plomin talks his book

In the WSJ, Robert Plomin writes,

DNA is the major systematic influence making us who we are as individuals. Environmental influences are important too, but what look like systematic effects of the environment are often genetic effects in disguise: Parents respond to their children’s genetically driven traits, and children seek, modify and even create experiences correlated with their genetic propensities.

His book is Blueprint, which I just finished. His thesis:

DNA is the only thing that makes a substantial systematic difference, accounting for 50 percent of the variance in psychological traits. The rest comes down to chance environmental experiences that do not have long-term effects.

What he calls “chance environmental experiences” could be measurement error. Measurement error always holds down correlation. This raises the possibility that some traits that are measured with error are more heritable than they appear. For example, Gregory Clark found that social status is much more heritable across many generations than would be expected based on parent-child heritability estimates. I explained that this is likely due to error in measurement in social status, which lowers immediate-generation correlation more than multi-generation correlation.
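Here is a sketch of the measurement-error argument under assumptions that are mine, not Clark's or Plomin's. Suppose true status persists across generations with correlation b per generation, and observed status equals true status plus independent noise, with reliability r (the share of observed variance that is true status). Then the observed correlation across n generations is r times b to the power n, because the noise is paid for once while the persistence compounds. The values of b and r below are hypothetical.

```python
def observed_correlation(b, r, generations):
    """Observed correlation across n generations when true status persists
    at rate b per generation and the measure has reliability r (assumed model)."""
    return r * b ** generations


def naive_extrapolation(b, r, generations):
    """What you would predict by compounding the observed one-generation
    correlation, ignoring that part of the shortfall is measurement error."""
    return observed_correlation(b, r, 1) ** generations


b = 0.75  # hypothetical true persistence per generation
r = 0.5   # hypothetical reliability of the status measure

for n in (1, 2, 3, 4):
    actual = observed_correlation(b, r, n)
    naive = naive_extrapolation(b, r, n)
    print(n, round(actual, 4), round(naive, 4))
```

With these illustrative numbers the one-generation figures match by construction, but at every longer horizon the observed correlation exceeds the naive extrapolation. That is the pattern of status looking more persistent over many generations than parent-child estimates would suggest.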

Educational interventions are apparent environmental influences that wear off over time. You raise a test score but do not fundamentally alter ability. That is an element of what I call the Null Hypothesis, which Plomin strongly endorses, although of course he does not use that term. Related: Scott Alexander on pre-school.

This is one of the most important books of the year. Coincidentally, the NYT has an article on economists’ use of polygenic scores. Tyler and Alex both linked to it.

But you should know that I came away from Plomin’s book less than impressed with polygenic scoring. So much data mining. So little predictive value. Also, there is serious criticism of his view that environmental factors exhibit no systematic influence, but he does not confront it. I did a search inside the Kindle edition for “Flynn” and found no results.

Revisiting the Hidden Tribes poll

Several commenters did not like the poll, and a reader suggested that I try the Hidden Tribes quiz. Ugh! What a terrible survey instrument.

I would like to believe that there is a large portion of the population that is tired of hyper-partisanship. But if there is such a majority out there, this poll is not a credible way to find it.

I would trust a survey based on my three-axes model more than I would trust the Hidden Tribes report. If the general public is more centrist or nuanced, that would show up as a lot of people not consistently aligning with any one axis.

Will population growth rebound?

Jason Collins and Lionel Page write,

The United Nations produces forecasts of fertility and world population every two years. As part of these forecasts, they model fertility levels in post-demographic transition countries as tending toward a long-term mean, leading to forecasts of flat or declining population in these countries. We substitute this assumption of constant long-term fertility with a dynamic model, theoretically founded in evolutionary biology, with heritable fertility. Rather than stabilizing around a long-term level for post-demographic transition countries, fertility tends to increase as children from larger families represent a larger share of the population and partly share their parents’ trait of having more offspring. Our results suggest that world population will grow larger in the future than currently anticipated.

Collins is humble about the ability of any model to project fertility, given the importance of cultural evolution. I have not seen the paper, but I would like to know whether they tested their model against actual data in any way. For example, you could “backcast” the model and see how well it “predicts” population in, say, 1980 or 1950.