Behavioral Non-Science

Slavisa Tasic and Zeljka Buturovic write,

While it is difficult to gauge the cost of various decision-making errors with any precision, it may be worth contrasting them against the costs of mistakes that clearly have nothing to do with cognitive biases: the cost of choosing a profession one ends up hating, the cost of not finding a suitable mate, the cost of having children too early in life or too late, the cost of moving to a place one ends up disliking, the cost of adopting a pet or sending children to a private school, and so on. These types of decisions–i.e., actual, important decisions in which errors are genuinely costly–are not typically studied in depth. . .Faced with difficulties in assessing the accuracy of the outcome of social judgments in the real world, the field [behavioral economics] has produced various norms of judgment against which to judge human performance, but only in highly artificial settings.

My view:

1. Economics is non-experimental. Instead, we work with interpretive frameworks that cannot be falsified empirically. This means that economic models do not have the epistemic status of models in the physical sciences, which can be falsified through experiments. All of our interpretive frameworks have some degree of plausibility but also are challenged by real-world anomalies. Economists can differ in their willingness to tolerate anomalies in their preferred interpretive frameworks.

2. Behavioral economics is experimental, but the experiments test people making minor decisions in peculiar, isolated settings.

3. Therefore, I go back to (1).

Contrarian Betting/Forecasting

Bryan Caplan writes,

I doggedly take the outside view. When long-run trends say X, and the “latest news” says Y, I go with X. When Democrats won big in 2008, I saw good luck, not a new political regime. That’s why, in 2009, I bet my former co-blogger Arnold Kling that the Republicans would regain control of one branch of the federal government by 2017. I won in 2010.

I sometimes think that there are two major types of investors. Momentum investors say that “the trend is your friend.” Contrarian investors say, “if something cannot go on forever, it will stop.” The former investors look at short-term developments. The latter investors look at long-term averages. If the long-term average in U.S. politics is that no party dominates for extended periods, then I was making a momentum bet, which is not what a good economist should do.

Read Bryan’s post, which talks about superforecasters looking first to general statistics (what proportion of households own pets) and then to specific factors (what about this family suggests pet ownership?). Momentum investors probably are more likely to look at specific factors.
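To make the "general statistics first, specific factors second" habit concrete, here is a minimal sketch of my own, not something taken from Bryan's post. It treats the general statistic as a base rate and the specific factors as a likelihood ratio, then combines them with Bayes' rule in odds form. The function name and both numbers are hypothetical, chosen only for illustration.

```python
def update_from_base_rate(base_rate: float, likelihood_ratio: float) -> float:
    """Adjust an outside-view base rate by case-specific evidence.

    base_rate: share of the reference class with the trait (assumed input).
    likelihood_ratio: how much more likely the specific evidence is if the
        trait is present than if it is absent (assumed input).
    """
    if not 0.0 < base_rate < 1.0:
        raise ValueError("base_rate must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = base_rate / (1.0 - base_rate)        # outside view
    posterior_odds = prior_odds * likelihood_ratio    # inside-view adjustment
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical inputs: half of households in the reference class own pets,
# and the family-specific evidence is three times as likely among pet owners.
forecast = update_from_base_rate(base_rate=0.5, likelihood_ratio=3.0)
print(f"{forecast:.2f}")  # prints 0.75
```

A forecaster who skips the first step is in effect assuming a base rate without looking it up.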

Cinnabonomics

I review the latest book by George Akerlof and Robert Shiller. I generally admire both authors. Ordinarily, if I do not like a book, then I do not write a long review. However, because both authors are Nobel Laureates, and because the book has received some positive press, I made an exception and let go with both barrels.

The authors do not deny that markets often work. However, if phishing equilibrium is limited to specific types of products, then the authors do not say so, nor do they give any criteria or characteristics to look for in order to predict in which markets phishing equilibria will be most prevalent.

But you have to read the whole review to get the flavor.

Recall that Alex Tabarrok did not much care for the book, either.

The Third C is Conscientiousness

Jane Dokko, Geng Li, and Jessica Hayes write,

Broadly speaking, our results point to a quantitatively large and significant role for credit scores in the formation and dissolution of committed relationships. Three sets of empirical results support this conclusion: First, credit scores are positively correlated with the likelihood of forming a committed relationship and its subsequent stability. Second, partners positively sort into committed relationships along the credit score dimension even after controlling for other similarities between the partners. Third, a positive correlation notwithstanding, within-couple differences in credit scores are apparent at the start of relationships. Notably, the initial match quality in credit scores is highly predictive of subsequent separations even when controlling for other factors, such as couples’ use of credit and the occurrence of financial distress.

These results lead us to hypothesize that credit scores, in addition to measuring an individual’s creditworthiness regarding the repayment of debt obligations, reveal information about an important relationship skill. We argue that one such skill could be an individual’s general trustworthiness and commitment to non-debt obligations. To make this argument, we turn to survey-based measures of trustworthiness to show that the average credit score of a geographic area (typically a county) is highly correlated with the same area’s average level of trustworthiness. We also find that when individuals have a long exposure to greater trustworthiness, as measured by surveys, they tend to have higher credit scores even years after they leave those areas. Similar to how credit scores predict the formation and dissolution of committed relationships, we find that survey-based measures of trustworthiness also have predictive power for these outcomes. Interestingly, such statistical relevance diminishes when the couples’ credit score levels are controlled for, underscoring the overlapping between credit scores and survey-based measures of trustworthiness.

Pointer from Tyler Cowen.

In mortgage underwriting, they used to talk about the three C’s: collateral (the house, particularly the borrower’s equity in it), capacity (the borrower’s income relative to mortgage payments and other debt obligations), and credit history.

In fact, I think that the third C should be called conscientiousness, one of the Big Five personality traits. The authors of the paper instead use the term trustworthiness. That this trait should matter for relationship stability, and that it is well measured by credit scores, should surprise no one.

I worry that pursuit of this line of inquiry, like research on the role of IQ, will not be good for the career of a young researcher.

Life Expectancy and Income

Timothy Taylor writes,

the reasons for this growing gap in life expectancy by income are not altogether clear. Some explanations clearly aren’t supported by facts. For example, although overall levels of tobacco use are down, the decline seems to have happened in much the same way across income levels, and thus can’t explain the life expectancy factors. Obesity levels are up over time, but they seem to be up more among those with higher incomes, so that pattern doesn’t explain a growing gap in life expectancy by income, either. One hypothesis recognizes that there is a correlation between education and health, and also between education and income, so perhaps factors related to education and health have become more important over time. For example, perhaps those with higher incomes are better at managing chronic diseases like high blood pressure or diabetes. But again, this is an open question. Other possible explanations are looking at how the nature of jobs and job stress may have changed over time for jobs of different income levels, or whether greater inequality in a society may create stresses that affect health.

Before you comment, note carefully the methods used to assess life expectancy.

My own view is that the distribution of conscientiousness has become more unequal over time, and this has implications for both income and life expectancy.

For a view that conscientiousness is the endogenous variable (rather than exogenous, as I think of it), see Elliot Berkman. Pointer from Mark Thoma. Thoma has another post on a piece saying that educational inequality has widened. Again, my diagnosis is the same: the distribution of ability has widened.

Study Not Needed

Ray Fisman and Daniel Markovits write,

We measured attitudes toward equality by asking hundreds of Americans to distribute a pot of money between themselves and an anonymous other person. Our subjects weren’t making hypothetical choices in responding to the survey—their decisions affected how much real money they would get when the experiment ended.

Pointer from Tyler Cowen. I added the emphasis on “themselves.” That is very different from what redistribution means in political terms. There, it means redistributing other people’s money.

The authors seem to suggest that we should be surprised that rich progressives are reluctant to redistribute their own money. I do not think we needed an experiment to show this. We already know from their behavior that rich liberals are averse to redistributing their own money. I believe that surveys have shown that it is instead conservatives and people lower down the income ladder who give larger shares of their income to charity.

Political support for redistribution is costless, especially compared with actually giving away some of your wealth.

Gender and Culture

A commenter writes,

I would posit that the phenomena they’re describing–gossiping, passive-aggressive, back-stabbing, shaming, and appeals to 3rd parties (hey-la, hey-la, my boyfriend’s back)–represents the worst kind of feminine social striving. If an honor culture full of duels, blood-feuds, and vendettas is the ugly, barbarous version of male social interactions, this ‘cry victim and try to sick a mob on my social rivals,’ ala the UVA rape hoax, is the female equivalent. Perhaps this was inevitable, given the increasing sex ratios on college campuses, but it’d be nice if level-headed adults would recognize this behavior for what it is and avoid indulging it rather than actively egging it on, as many campus administrators seem to do.

I think that this is an under-explored topic, because people are afraid to touch it. But I do believe that the political culture changes when women can vote and that the college culture changes when women are in the majority. It would be surprising if all of the changes were for the better, just as it would be surprising if all of the changes were for the worse.

I found in business that I felt uncomfortable in meetings that were nearly all male or nearly all female. The dynamics of both types of meetings bothered me, in ways that I found difficult to articulate. I want neither Randle McMurphy nor Nurse Ratched.

I believe that men and women tend to differ on the Big 5 personality characteristic known as “agreeableness.” For (low-agreeable) men, disagreement can be exciting and competitive (“wanna bet?”). (High-agreeable) women prefer not to have disagreement. Perhaps I am uncomfortable with the meetings that are dominated by males, because they strike me as overly confident and aggressive. Yet I am uncomfortable with the meetings dominated by females, because I feel that I cannot freely express a dissident point of view.

Timothy Taylor on Nudging and Public Choice

He writes,

think about elected officials and regulators in the spirit of behavioral economics: they often lack self-control; have a difficult time evaluating complex situations; tend to stick with rules-of-thumb and default options rather than accept the cognitive and organizational costs of re-evaluating their positions; do not evaluate costs and benefits in a consistent way across different contexts; are not good at evaluating risks accurately, instead often respond to limited information and hype; and are overly averse to the risk of taking responsibility for decisions that might turn out poorly. This perspective must have widespread implications for decisions involving the complexities of the tax code or government budgets, policies affecting the workforce and the environment, openness to new sources of domestic and foreign competition, and foreign policy as well.

He is riffing off a paper by W. Kip Viscusi and Ted Gayer.

Paging Jason Collins

Alex Tabarrok writes,

in the environment in which the mind evolved we often needed to accurately throw things but rarely needed to accurately drop things from moving objects. As a result, we developed excellent heuristics for throwing but not for dropping.

Yesterday, as I was taking a long bike ride, I thought to myself, my physique is much better suited to riding a bicycle than to running. Doesn’t that suggest that my prehistoric ancestors rode bicycles?

Anyway, I wonder if there is too much confirmation bias at work when we tell stories in which evolution explains what we are suited to doing and what we are not suited to doing.

[Update: Commenter Handle’s remarks, picking up on the bicycle example, are wise and worth reading.]

Null Hypothesis Watch

“Scott Alexander” writes,

When they caught up with these kids at age 25, the intervention group was found to have an odds ratio of around 0.6 to 0.7 of having developed various psychiatric disorders the study was testing for, including antisocial personality disorder, ADHD, depression, or anxiety. They had odds ratios around 0.7 of developing drug and alcohol abuse problems by various measures. They reported less risky sexual behavior, less domestic abuse, and fewer violent crimes. All of this was significant at the p < 0.05 level, and some of it was significant at much higher levels like p = 0.001 or below. Subgroup analysis found the data were very similar when you restricted the analysis to various subgroups like boys, girls, whites, blacks, highest-risk, lowest-risk, and by study site (it was a multi-site study)

This was a randomized, controlled study of a group of many interventions. “Scott” goes on to point out a number of caveats. The group of interventions was expensive. A lot of other indicators, including employment rates, did not improve. We do not know whether the results came from one or two of the interventions, or from the combination of all of them.

Still, it looks as though something managed to defeat the null hypothesis. As a controlled trial, it gets over the hurdle of confusing correlation with causality. As a study of long-term outcomes, it gets over the hurdle of fade-out. The results are numerically significant, not just statistically significant. The only remaining hurdle is replicability. My guess is, given the complexity of all those interventions, that the replicability hurdle will be a challenge.
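How large an odds ratio of 0.6 to 0.7 is in absolute terms depends on how common the outcome is in the control group. As a rough sketch, the function below converts a control-group risk and an odds ratio into the implied risk for the intervention group. The baseline risks are hypothetical and are not taken from the study. Only the two odds ratios come from the quoted passage.

```python
def risk_under_treatment(baseline_risk: float, odds_ratio: float) -> float:
    """Implied treated-group risk, given a control-group risk and an odds ratio."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    treated_odds = baseline_odds * odds_ratio
    return treated_odds / (1.0 + treated_odds)


# Hypothetical control-group risks, for illustration only.
for baseline in (0.05, 0.20, 0.40):
    # Odds ratios at the two ends of the range quoted above.
    for odds_ratio in (0.6, 0.7):
        treated = risk_under_treatment(baseline, odds_ratio)
        print(
            f"baseline {baseline:.0%}, odds ratio {odds_ratio}: "
            f"treated {treated:.1%}, absolute drop {baseline - treated:.1%}"
        )
```

When the outcome is rare, the odds ratio is close to the ratio of the two risks. When the outcome is common, the same odds ratio implies a larger absolute drop but a smaller proportional one.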