P(Bayesian) = ?

Scott Alexander writes,

I asked readers to estimate their probability that Judge Kavanaugh was guilty of sexually assaulting Dr. Ford. I got 2,350 responses (thank you, you are great). Here was the overall distribution of probabilities.

1. A classical statistician would have refused to answer this question. In classical statistics, he is either guilty or he is not. A probability statement is nonsense. For a Bayesian, it represents a “degree of belief” or something like that. Everyone who answered the poll (I did not even see it, so I did not answer) either is a Bayesian or consented to act like one.

2. A classical statistician could say something like, “If he is innocent, then the probability that all of the data would have come in as we observed it is low, therefore I believe he is guilty.” (A sketch contrasting this with the Bayesian approach appears after this list.)

3. For me, the most telling data is that he came out early and emphatically with his denial. This risked having someone corroborate the accusation, which would have irreparably ruined his career. If he did it, it was much safer to own it than to attempt to get away with lying about it. If he lied, chances are he would be caught–at some point, someone would corroborate her story. The fact that he took that risk, along with the fact that there was no corroboration, even from her friend, suggests to me that he is innocent.

4. But that could very well be motivated reasoning on my part, because I was in favor of his confirmation in the first place. By far the biggest determinant of whether you believe he is guilty is whether you wanted to see him confirmed before the accusation became public. See Alexander’s third chart, which shows that Republicans overwhelmingly place a high probability on his innocence and Democrats overwhelmingly place a high probability on his guilt. That is consistent with other polls, and we should find it quite significant, and also depressing.
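To make the contrast between points 1 and 2 concrete, here is a minimal sketch of Bayes’ rule in Python. The likelihoods and priors are entirely hypothetical; it is meant only to show how a posterior “degree of belief” combines a prior with the kind of likelihood statement a classical statistician would make, not to put numbers on this particular case.

```python
# A minimal sketch (hypothetical numbers, not a claim about any actual case)
# contrasting the classical likelihood statement in point 2 with a Bayesian
# posterior, which additionally requires a prior "degree of belief."

def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule: P(H | data) from a prior and two likelihoods."""
    p_data = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    return prior * p_data_given_h / p_data

# Classical-style statement: "if H is false, the observed data would be unlikely."
p_data_given_not_h = 0.05   # assumed for illustration
p_data_given_h = 0.60       # assumed for illustration

# The Bayesian answer moves with the prior, which is where respondents differ.
for prior in (0.1, 0.5, 0.9):
    print(prior, round(posterior(prior, p_data_given_h, p_data_given_not_h), 2))
```

Readers who start from different priors end up with different posteriors even if they agree on the likelihoods, which is one way to read the partisan split in the poll.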

Genes and cognitive ability

Nicholas W. Papageorge and Kevin Thom write,

we utilize a polygenic score (a weighted sum of individual genetic markers) constructed with the results from Okbay et al. (2016) to predict educational attainment. The markers most heavily weighted in this index are implicated in neuronal development and other biological processes that affect brain tissue. We interpret the polygenic score as a measure of one type of endowed ability.

Perhaps a newer version of the paper is here.

The paper finds that gene-environment interaction matters. But I think it is important that we now have a genetic score that can serve as a proxy for IQ. Also, this genetic score affects economic outcomes even when educational attainment is controlled for.
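The quote describes a polygenic score as a weighted sum of individual genetic markers. As a purely illustrative sketch (the marker IDs, weights, and genotype below are invented, not taken from Okbay et al. 2016), the arithmetic looks something like this:

```python
# Toy polygenic score: a weighted sum of allele counts at individual markers.
# The marker IDs and weights are hypothetical, for illustration only.
gwas_weights = {"rs0001": 0.02, "rs0002": -0.01, "rs0003": 0.03}

def polygenic_score(genotype):
    """genotype: dict mapping marker ID -> allele count (0, 1, or 2)."""
    return sum(w * genotype.get(marker, 0) for marker, w in gwas_weights.items())

person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}   # hypothetical genotype
print(polygenic_score(person))                      # 0.03
```

In the actual research the sum runs over a very large number of markers, with weights estimated from genome-wide association studies.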

By the way, Robert Plomin’s forthcoming book is on my radar. This review points out the obvious, which is that the book will not be well received.

And also, Tyler Cowen points to this paper, which says that it is liberals who attribute outcomes more to genetic factors.

I can only imagine genetic effects being powerful if you hold constant the cultural context. Suppose it were possible to create reliable polygenic scores for the Big Five personality traits, plus cognitive ability. I can imagine that these scores would be useful in predicting outcomes among a group of American teenagers. But if you were to take a random sample of teenagers around the world and use nothing but these scores to predict long-term outcomes, I cannot imagine that this would work. To carry the thought experiment even further, think in terms of plopping people with identical polygenic scores into different centuries.

Adult marshmallow-test winners do better

William H. Hampton, Nima Asadi, and Ingrid R. Olson write,

Participants engaged in a delay discounting task adapted from O’Brien et al. (2011). In the task, participants were asked to make choices between a smaller sum of money offered now versus a larger sum of money (always $1,000) offered at five different delays.

They then use this variable along with other variables to predict the person’s income.
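For readers unfamiliar with how such choices become a single variable, here is a generic sketch of fitting a hyperbolic discount rate (V = A / (1 + k·delay)) to hypothetical indifference points. This is a common recipe, not necessarily the exact procedure the authors adapted from O’Brien et al. (2011), and the numbers are invented.

```python
# Generic hyperbolic-discounting sketch: V = A / (1 + k * delay).
# Indifference points below are hypothetical illustrations.

DELAYED_AMOUNT = 1000.0

# At each delay (days), the immediate amount the participant valued the same
# as $1,000 later.
indifference = {7: 950.0, 30: 800.0, 90: 600.0, 180: 450.0, 365: 300.0}

def predicted_value(k, delay):
    return DELAYED_AMOUNT / (1.0 + k * delay)

def fit_k(points):
    """Grid-search the discount rate k that best matches the indifference points."""
    grid = [i / 10000.0 for i in range(1, 2001)]   # k from 0.0001 to 0.2
    sse = lambda k: sum((predicted_value(k, d) - v) ** 2 for d, v in points.items())
    return min(grid, key=sse)

print(fit_k(indifference))   # steeper discounters come out with larger k
```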

The results of each model were quite consistent, with occupation and education paramount in each case. On average, the next most important factors were zip code group and gender. While zip code group was highly associated with income, it is worth noting that our data do not adjudicate directionality. Logically, a person’s income is more likely a determinant of where they live than vice versa. Nonetheless, zip codes are a useful proxy for socioeconomic status, which is also related to income (Winkleby et al., 1992). As our zip codes were binned by average income, the association between zip code and income is not surprising, but does suggest that the individuals in our sample had incomes roughly representative of the incomes from their respective zip code group. Regarding gender, we found that males earned more money than females, a result consistent with a corpus of research on the gender wage gap (Nadler et al., 2016). The fifth most important variable was delay discounting, a factor closely related, but distinct from impulsivity. Although previous research had indicated that discounting was related to income (Green et al., 1996), it was unclear to what extent, relative to other factors, this variable mattered. Interestingly, delay discounting was more predictive than age, race, ethnicity, and height

Pointer from Tyler Cowen.

Oy. It would be nice to be able to cite their comment that “delay discounting was more predictive than age, race, ethnicity, and height.” But the flaws I perceive in the study are too serious to allow me to do that.

1. Most of the variables that they use to “predict” income are not plausibly exogenous to income. For that matter, it is possible that your level of income helps determine your willingness to delay receiving money, so even their key delay-discounting variable is plausibly endogenous.

2. When you compare the strength of different predictors (hardly ever a valid exercise), measurement error is everything. A variable that is measured unambiguously will do much better than a variable that is measured subject to errors, even if the latter variable has more influence in reality. So gender has the advantage of being unambiguous*, while self-reported ethnicity can be ambiguous. A toy simulation of this point appears after the footnote.

*all right, some people insist that gender is ambiguous, but I don’t think those people find their way to this blog.
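Here is that toy simulation of point 2, assuming made-up effect sizes: the variable with the larger true influence is measured with heavy error, and its estimated coefficient is attenuated until it looks weaker than a cleanly measured variable with a smaller true effect.

```python
import numpy as np

# Toy illustration of attenuation from measurement error. All numbers invented.
rng = np.random.default_rng(0)
n = 100_000

x1_true = rng.normal(size=n)              # larger true influence on income
x2 = rng.normal(size=n)                   # smaller influence, measured exactly
income = 2.0 * x1_true + 1.0 * x2 + rng.normal(size=n)

x1_measured = x1_true + rng.normal(scale=2.0, size=n)   # heavy measurement error

X = np.column_stack([np.ones(n), x1_measured, x2])
coefs, *_ = np.linalg.lstsq(X, income, rcond=None)
print(coefs[1:])   # coefficient on x1_measured is pulled toward zero, below x2's
```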

Truly experimental firms

From an interview with John List, on corporate social responsibility:

My initial inclination is: firm does a good thing; worker reciprocates to firm by working harder; and the world is a better place. Everyone’s better off. But what this suggests is that there’s something deeper on the psychological side, that it’s not just triggering this reciprocity from workers. C.S.R. is also triggering something deeper, which the researchers in this area call moral licensing.

Pointer from Tyler Cowen. Read the whole transcript, to see the research method. He actually starts firms and hires workers in order to do controlled experiments.

Estimating consumers’ surplus from information goods

Erik Brynjolfsson, Avi Gannamaneni, and Felix Eggers have a paper on the topic. From the abstract:

We explore the potential of massive online choice experiments to measure consumers’ willingness to accept compensation for losing access to various digital goods and thereby estimate the consumer surplus generated from these goods. We test the robustness of the approach and benchmark it against established methods, including incentive compatible choice experiments that require participants to give up Facebook for a certain period in exchange for compensation. The proposed choice experiments show convergent validity and are massively scalable. Our results indicate that digital goods have created large gains in well-being that are missed by conventional measures of GDP and productivity.

Pointer from Tyler Cowen.

Based on their PowerPoint, I gather that the method is something like this.

1. Ask a user of, say, Facebook how much they would need to be paid to give it up for a month.

2. If they say they would give it up for $25, tell them to do it.

3. If after a month they have not used it, give them $25.
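As a sketch of how such responses might be aggregated (the dollar amounts and user counts below are hypothetical, and this is not the authors’ code), the basic arithmetic is:

```python
import statistics

# Aggregate hypothetical willingness-to-accept (WTA) responses, in $/month,
# into summary measures of consumer surplus. Numbers are invented.
wta_per_month = [10, 25, 25, 40, 50, 75, 120, 200]

median_wta = statistics.median(wta_per_month)
mean_wta = statistics.mean(wta_per_month)
print("median WTA per user-month:", median_wta)
print("mean WTA per user-month:  ", mean_wta)

# Scaling by an assumed user base gives a rough aggregate surplus figure.
assumed_users = 200_000_000   # hypothetical
print("implied annual surplus:", mean_wta * 12 * assumed_users)
```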

The methods that they use are really interesting, but I have doubts about the approach. I think dollars are too abstract. I would like to see a lot of “give up X or give up Y” choices offered. The authors do some of this and apparently it confirms their findings.

The values that the authors get are really high. If the median Facebook user gets over $40 a month in value from it, then Facebook is leaving a fortune on the table by not having a subscription service. Yes, they have to be careful that charging a subscription price could drive some customers away, lowering the value of the service to other customers, but the “freemium” model could be used to address that. That is, let anyone join for free, but give more privileges to subscribers.

Finally, note that if I pay less for Google Maps and other digital services than I would be willing to pay, I also pay more for my smart phone, home Internet connection, and wireless service provider than I would if all I were getting were just plain phone service. In other words, some of the “consumers’ surplus” from digital goods goes to Verizon and Apple as revenue, not to consumers.

Inter-generational mean reversion

Tyler Cowen, among many others, is intrigued by a study by Raj Chetty and others showing downward mobility of black males.

My view, which I came to in the process of reading Gregory Clark’s study of long-term heritability of income, is that inter-generational income has a large heritable component and a large random component. Over several generations, the random component washes out. But for the difference across a single generation, the random component matters.

This model suggests that when someone’s income is far above (or below) the heritable component, it will revert to the mean: children of parents who have enjoyed a positive shock will tend to do worse than their parents, and children of parents who have suffered a negative shock will tend to do better.

If the shocks to income were normally distributed, then mean reversion would not produce any systematic pattern of children falling below parents or rising above them. So you would not expect the Chetty result in that case.

But what if the random component is not normally distributed? Suppose that what you observe in one generation are a few really large shocks on the up side, with a lot of smaller negative shocks on the down side. The next generation will then have some apparent big losers and a lot of apparent small winners. Depending on how you sort the data (Chetty appears to be looking at measures of income based on rank rather than absolute level), Chetty’s result could be an artifact of the random component. It might be that if he were to measure incomes three or four generations apart, the apparent downward mobility would disappear.
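Here is a toy simulation of that verbal model, with made-up parameters: income is a shared heritable component plus a skewed shock (a few large positive shocks, many small negative ones), redrawn each generation, and mobility is measured by rank, loosely in the spirit of rank-based measures.

```python
import numpy as np

# Toy model: income = heritable component + skewed random shock.
# Parent and child share the heritable component; the shock is redrawn.
# All parameters are invented for illustration.
rng = np.random.default_rng(1)
n = 200_000

heritable = rng.normal(size=n)

def skewed_shock(size):
    # Small probability of a large positive shock, otherwise a small negative
    # one; the mixture has mean roughly zero.
    big = rng.random(size) < 0.1
    return np.where(big, rng.normal(4.5, 1.0, size), rng.normal(-0.5, 0.3, size))

parent = heritable + skewed_shock(n)
child = heritable + skewed_shock(n)

parent_rank = parent.argsort().argsort() / n
child_rank = child.argsort().argsort() / n

top = parent_rank > 0.9   # parents who drew the big positive shocks cluster here
print("mean child rank for top-decile parents:", child_rank[top].mean())
print("share of those children ranked below their parents:",
      (child_rank[top] < parent_rank[top]).mean())
```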

Clarification: the null hypothesis

A reader asked for this.

The term “null hypothesis” comes from statistics. The word “null” means “no effect” and the null hypothesis is that an intervention has no effect on the outcome. If you were testing the effectiveness of a drug, the null hypothesis would be that the drug works no better than a placebo. If you do a study and you do find that the drug works better than the placebo, and this is not likely just an accident, then you reject the null hypothesis.
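A minimal sketch of that drug-versus-placebo example, with an assumed effect size and sample size chosen purely for illustration:

```python
import numpy as np
from scipy import stats

# Simulate a drug with an assumed true effect of 0.3 versus a placebo,
# then test the null hypothesis of "no effect." Numbers are illustrative.
rng = np.random.default_rng(0)
placebo = rng.normal(loc=0.0, scale=1.0, size=200)
drug = rng.normal(loc=0.3, scale=1.0, size=200)

t_stat, p_value = stats.ttest_ind(drug, placebo)
if p_value < 0.05:
    print(f"p = {p_value:.3f}: reject the null hypothesis of no effect")
else:
    print(f"p = {p_value:.3f}: fail to reject the null hypothesis")
```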

I apply the term “null hypothesis” in the context of education. My observation is that most of the time when an intervention in education is evaluated rigorously, it has no effect compared to a control group, or the effect wears out over time, or the effect cannot be duplicated in repeated experiments or at large scale.

Is personality psychology just a baloney sandwich?

I say no, although it is not like physics. We are talking about modest correlations, not strict laws.

What made the marshmallow test famous was the follow-up work which suggested that a child’s ability to defer gratification on the test helped predict future outcomes, such as SAT scores. These correlations with longer-term outcomes speak to the usefulness of the test in revealing some important trait.

Me vs. Nassim Taleb

As Tyler Cowen noted, Taleb takes on some of his reviewers. In a comment, I took on Taleb when he wrote

the variance within forecasters is smaller than that between forecasts and out of sample realizations.

He saw it as a sign of forecasters copying other forecasters. I do not think that this is necessary as an explanation. Unless you are adding noise to your forecast, your forecast should always have less variance than what you are trying to forecast. And it would not surprise me to see a range of forecasts show less variance than the range of subsequent outcomes. I wrote,

That is what you could expect. Suppose that the variable you are trying to forecast, Y, has a set of known determinants, X’s, and a set of random determinants, e’s. People should forecast conditional on the X’s, and the range of forecasts should be narrow. But the range of outcomes relative to forecasts depends on the e’s, and so the out of sample realizations could (and often should) have a wider variance
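A small simulation of the point in that comment, with arbitrary parameter values: if Y = bX + e and forecasters condition on X, the spread of forecasts reflects only the X part, while realized outcomes also carry the e’s.

```python
import numpy as np

# Forecasts conditional on known determinants X have less variance than the
# outcomes, which also include the random determinants e. Values are arbitrary.
rng = np.random.default_rng(0)
n = 100_000

b = 1.0
X = rng.normal(scale=1.0, size=n)      # known determinants
e = rng.normal(scale=2.0, size=n)      # random determinants
Y = b * X + e

forecasts = b * X                      # forecast = conditional mean given X

print("variance of forecasts:", forecasts.var())   # about 1
print("variance of outcomes: ", Y.var())           # about 5 = 1 + 4
```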

As usual, in your comments, please avoid making generalizations about either Taleb or me. Speak only to the specific issue that I raised.

Jason Collins on Grit

He writes,

I will say that Duckworth appears to be one of the most open recipients of criticism in academia that I have come across. She readily concedes good arguments, and appears caught between her knowledge of the limitations of the research and the need to write or speak in a strong enough manner to sell a book or make a TED talk.

. . .But Duckworth does not address the typical problem of studies in this domain – they all ignore biology. Do the students receive higher grades because their parents are more demanding, or because they are the genetic descendants of two demanding people? Are they world-class performers because their parents model a work ethic, or because they have inherited a work ethic? Are they consistent with their extracurricular activities because their parents consistently keep them at it, or because they are the type of people likely to be consistent?

He points out that “grit” is mostly conscientiousness. The case that conscientiousness matters is sound. I do think there are some studies that show that conscientiousness can be coached, but I am not confident that it is settled science.