As Tyler Cowen noted, Taleb takes on some of his reviewers. In a comment, I took on Taleb when he wrote
the variance within forecasters is smaller than that between forecasts and out of sample realizations.
He saw it as a sign of forecasters copying other forecasters. I do not think that explanation is necessary. Unless you are adding noise to your forecast, your forecast should always have less variance than the thing you are trying to forecast. And it would not surprise me to see a range of forecasts show less variance than the range of subsequent outcomes. I wrote,
That is what you could expect. Suppose that the variable you are trying to forecast, Y, has a set of known determinants, X’s, and a set of random determinants, e’s. People should forecast conditional on the X’s, and the range of forecasts should be narrow. But the range of outcomes relative to forecasts depends on the e’s, and so the out of sample realizations could (and often should) have a wider variance.
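To make that concrete, here is a minimal simulation sketch. It is my own illustration, not anything from the original exchange; the coefficients, noise scales, and number of forecasters are arbitrary assumptions. Each forecaster reports the conditional mean given X, so the spread across forecasters is small, while realizations scatter around the forecasts by the full size of the e’s.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years = 10_000
n_forecasters = 50

x = rng.normal(0.0, 1.0, n_years)   # known determinants (the X's)
e = rng.normal(0.0, 2.0, n_years)   # random determinants (the e's)
y = 3.0 + 1.5 * x + e               # realized outcomes

# Everyone conditions on the same X's; forecasts differ only by small idiosyncratic noise.
forecasts = 3.0 + 1.5 * x[:, None] + rng.normal(0.0, 0.2, (n_years, n_forecasters))

var_across_forecasters = forecasts.var(axis=1).mean()            # spread of forecasts in a given period
var_of_forecast_errors = ((y[:, None] - forecasts) ** 2).mean()  # spread of realizations around forecasts

print(f"variance across forecasters: {var_across_forecasters:.2f}")   # ~0.04
print(f"variance of forecast errors: {var_of_forecast_errors:.2f}")   # ~4.04
```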
As usual, in your comments, please avoid making generalizations about either Taleb or me. Speak only to the specific issue that I raised.
It may seem weird to say, but comparing forecasts to out of sample realizations is bad.
The problem is that people are loose with their statistical terms when they are making forecasts. Many people who forecast don’t have much statistical training, and they just say things like “the S&P 500 is going to end the year at 3,000,” or something like that. Consider a more statistical approach instead. Rather than just giving a number for the S&P 500 at the end of the year, I would give the distribution of the S&P 500 at the end of the year. For someone less sophisticated, like a journalist, I could summarize and say my forecast for the expected value of the S&P 500 at the end of the year is 3,000.

If you then collect these values from many forecasters, what could I say about their distribution? It is a distribution of means. It is not a distribution of realizations. There is no reason why this distribution should have anything like the variance of realizations. The only way the variance of the forecasts of the S&P 500 at the end of the year would equal the variance of out-of-sample realizations would be if people reported forecasts of realizations rather than forecasts of means. That would mean some percentage of the forecasting population would be saying, “the S&P 500 is going to drop 10%.” The reason that doesn’t happen is that in most years they would be wrong, and there is career risk.

Moreover, should we really be making investment decisions based on forecasts of realizations of distributions at a particular point in time? That’s not what the theory says to do. It says to trade off the expected mean against the expected variance based on your risk aversion.
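A quick sketch of that distinction, with made-up numbers (the 3,000 level and both spreads are assumptions chosen only for illustration): if forecasters report the means of their predictive distributions, the dispersion of reported forecasts is far tighter than the dispersion of realizations; it would only match if they reported forecasts of realizations, i.e., draws from those distributions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_forecasters = 200
true_mean, true_sd = 3000.0, 450.0   # hypothetical year-end S&P 500 distribution

# Each forecaster reports the mean of a predictive distribution; the estimates differ only slightly.
reported_means = true_mean + rng.normal(0.0, 50.0, n_forecasters)
# For contrast: forecasters reporting forecasts of realizations, i.e. draws from the distribution.
reported_draws = rng.normal(true_mean, true_sd, n_forecasters)
realizations = rng.normal(true_mean, true_sd, n_forecasters)

print(f"sd of reported means: {reported_means.std():.0f}")   # ~50, far below 450
print(f"sd of reported draws: {reported_draws.std():.0f}")   # ~450
print(f"sd of realizations:   {realizations.std():.0f}")     # ~450
```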
Taleb’s full quote was
“Background: in The Black Swan I show a statistical illustration of such monoculture with forecasters without skin-in-the-game cluster on a wrong answers, which is nonrandom: the variance within forecasters is smaller than that between forecasts and out of sample realizations.”
The statement that the forecasts cluster on wrong answers is an important (crucial) qualifier. The issue with the forecasts is not the excessive consistency; it is the consistency plus the lack of accuracy. You should only consistently get clusters around incorrect answers if everyone has the same biases (say, all working from the same faulty data, all working with the same faulty models, or just outright copying each other).
This addresses John Hall’s complaint above as well (again, assuming the measurement of ‘wrong answers’ is robust): even if people are reporting a mean forecast rather than a realization, you should still be able to test whether the typical mean forecast is better than random. If it is not, there is no reason for the clustering other than shared bias.
The question is whether the variable you are trying to forecast is itself the variance. In that case, the random determinants should be included, but it can be hard to say anything constructive when variances are large, impossible when they are infinite, and impossible to measure an infinite variance in finite time.
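As a rough illustration of that last point (the heavy-tailed distribution and the sample sizes here are my own choices, not anything from the comment), the sample variance of an infinite-variance distribution never settles down, no matter how much data you collect:

```python
import numpy as np

rng = np.random.default_rng(2)
draws = rng.pareto(1.5, 10_000_000)   # tail index 1.5 < 2, so the true variance is infinite

for n in (10**3, 10**4, 10**5, 10**6, 10**7):
    print(f"n = {n:>10,}: sample variance = {draws[:n].var():,.1f}")
# The estimate keeps jumping with each new extreme draw instead of converging.
```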
I spent decades working with Wall Street and bank forecasts.
Long ago I decided that most forecasters take the consensus, make a modest modification, and call it original thinking.
Forecasters face the same issue as portfolio managers. If you do, or forecast, what everyone else is doing and are wrong, there is probably little cost to you. But if you go out on a limb, make a non-consensus forecast, and are wrong, you are going to lose clients. So the risk-reward for making a non-consensus forecast is highly asymmetric. I suspect this is the real reason behind your observation.
Let’s say you asked a bunch of experts to predict the sum of a roll of two six-sided dice. If forced to put their chips on one number, they will all say “7”. Zero variance. Meanwhile the actual sums would have the usual variance and distribution, and 5/6 of the time the experts will be “wrong”, all in exactly the same way. 1/3 of the time they will be off by a lot (by 3 or more), again all in the same way. That doesn’t mean the models are even wrong; this happens in the truly ideal case where their models are all equally perfect.
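A quick simulation confirms those numbers (this is just a check of the dice arithmetic, not a claim about any real forecasters):

```python
import numpy as np

rng = np.random.default_rng(3)
rolls = rng.integers(1, 7, size=(1_000_000, 2)).sum(axis=1)   # sums of two fair dice

print("variance of outcomes: ", rolls.var())                      # ~5.83; the forecasts have zero
print("P(miss 7):            ", (rolls != 7).mean())              # ~5/6
print("P(off by 3 or more):  ", (np.abs(rolls - 7) >= 3).mean())  # ~1/3
```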
That’s true, but what if they all say “11” instead? Low variance, but being off target implies that something is going systematically wrong.
Ok, but here’s the question: how does one tell the difference between “Everybody guessed 11 (not rationally justifiable), but it came up 7,” and “Everybody guessed 7 (the best one can do), but it came up 11”?
There can be a perfectly correct “intellectual monoculture” that nevertheless fails to have a perfect forecasting track record, because perfect forecasting is impossible.
Look at their results over multiple years, and you would be able to demonstrate from the distribution of outcomes whether the forecasters did significantly better than chance in their predictions.
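Sticking with the dice example above, here is a rough sketch of what repeated observation buys you (entirely my own illustration): over many rolls, a consensus of “7” and a consensus of “11” are easy to tell apart by average error, even though either one can miss badly on any single roll.

```python
import numpy as np

rng = np.random.default_rng(4)
rolls = rng.integers(1, 7, size=(1_000, 2)).sum(axis=1)   # 1,000 "years" of outcomes

mae_consensus_7 = np.abs(rolls - 7).mean()
mae_consensus_11 = np.abs(rolls - 11).mean()
print("mean abs error, consensus of 7: ", mae_consensus_7)    # ~1.94
print("mean abs error, consensus of 11:", mae_consensus_11)   # ~4.06
# Over repeated outcomes, the "7" consensus looks like the best anyone can do,
# while the "11" consensus reveals a shared, systematic bias.
```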
The uncertainties are different when each forecaster holds inside information.
GDP revisions contain a feedback loop: each revision generates agreement on pricing, which closes prices down the line, causing further revision. But this is the natural quantization process. It is how we do stock and flow, a way to set delivery and depreciation cycles. Essentially, parties agree with Nassim so that the Kling version is correct, ex post.
I agree that the range of forecasts should always be narrower than the range of realizations. See my link.
That’s not exactly what you’re saying here, but close, and relevant.
Basketball players in the economy shave points based on the forecast.
We have the observer interacting with the observed, so we have a bound on trade-book uncertainty. The forecasts and bets are constantly converging to the uncertainty level, and the forecasting error is tending toward a uniform random variable. Both ideas are right, one ex ante, one ex post.
Phil, in your example, let the NBA owners trade players as the season progresses. But your idea of luck is quite correct; it may be 15% of the variation in the official published NBA stats, the tradebook. Owners will not trade unless they see insider information with a gain greater than 15%.
How does the tradebook evolve? Into tiers: at any given time the teams are prioritized in groups. Within each group the probability of winning is luck, a uniform draw with a luck-sized chance of winning. Between the tiers the separation is in units of luck. Luck is the ‘Planck’s constant’ of your trading market, the finest resolution at which one can observe anything.
If everybody is using the same (or very similar) historical data, why wouldn’t they get roughly the same forecast? What would cause the forecasts to diverge significantly?
Taleb’s argument is that they come to the same incorrect conclusions; why would they all be making the same mistakes with the same data?
You are correct as a statistical matter. Taleb’s problem here is that he’s trying to make a correct point with a bad argument. He knows that forecasters groupthink because he has personally observed them behaving this way, and he is correct that they do this. But he has an audience that has not necessarily personally observed this behavior, and so he’s trying to demonstrate it with statistical shorthand rather than develop a real model.
The main problem for a forecaster (I’ve been one in a different context) is reputation management. Unlike, say, a political pundit, a forecaster finds that the reputational benefit of being alone and correct is much smaller than the harm of being alone and incorrect. My guess is that this ultimately derives from loss aversion, but it is the reality a forecaster faces.
The second problem for a forecaster is a kind of cognitive dissonance analogous to that of an active-strategy mutual fund manager who believes in the EMH. In order to have your job in the first place you have to believe your analysis is better than that of Average Joe, but you also know that your advantages over the competition should be marginal rather than massive. So, you come up with an independent analysis, and your initial conclusion is substantially different from consensus. Here enters the dissonance: on the one hand you believe in your analysis; on the other, you know that consensus shouldn’t be that wrong, and that there must be some probability that consensus is correct and you are wrong. At this point, the analyst checks all his work, and then starts talking to other forecasters (because he does of course personally know many of them) to try to figure out what is causing the difference in opinion, both out of curiosity and because it identifies the key assumption(s) on which he really needs confidence to diverge from consensus.
The result is that every forecaster has both a private, unhedged prediction and a public prediction, which is a weighted average of his private prediction and the consensus, with the degree of difference reflecting the forecaster’s confidence in his view on the assumptions he has identified as causing his prediction to differ. Only the latter can be observed in any way that can be subject to direct statistical analysis. Sometimes this tendency gets so ridiculous that a false public consensus emerges simultaneously with a “whisper” consensus: a substantial number of forecasters all have similar reasons for differing from the supposed consensus and can’t figure out who actually is forming or believing the public consensus, but for the above reasons they still hedge their public forecasts in its direction while letting it be known that the “real consensus” is different.
Taleb is 100% right that this happens, but he is using a bad statistic to show it, because there is a classic econometric problem of private information that the econometrician supposes exists but cannot observe. To prove his point he’d need a pretty sophisticated econometric model, which of course has no guarantee of being an accurate model of reality while also being neither over- nor under-identified.
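For what it’s worth, here is a toy version of the weighted-average mechanism described above. Every parameter is an assumption chosen purely for illustration, but it shows how published forecasts end up clustering far more tightly than the private views behind them.

```python
import numpy as np

rng = np.random.default_rng(5)
n_forecasters = 100
consensus = 3000.0
private_views = consensus + rng.normal(0.0, 200.0, n_forecasters)   # unhedged private predictions

w = 0.25                                             # weight each forecaster puts on the private view
public_forecasts = w * private_views + (1 - w) * consensus

print("sd of private views:   ", private_views.std())     # ~200
print("sd of public forecasts:", public_forecasts.std())  # ~50
```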