Not in general, but in this post, where he writes,
I’d like to flip it around and say: If we see something statistically significant (in a non-preregistered study), we can’t say much, because garden of forking paths. But if a comparison is not statistically significant, we’ve learned that the noise is too large to distinguish any signal, and that can be important.
Pointer from Mark Thoma. My thoughts:
1. Just as an aside, economists are sometimes (often?) guilty of treating absence of evidence as evidence of absence. For example, if you fail to reject the efficient markets hypothesis, can you treat that as evidence in favor of the EMH? Many have. Similarly, when Bob Hall could not reject the random walk model of consumer spending, he said that this was evidence in favor of rational expectations and consumption smoothing.
2. I think that a simpler way to make Gelman’s point would be to say that passing a statistical significance test is a necessary but not a sufficient condition for declaring the evidence to be persuasive. In particular, one must also address the “selection bias” problem, which is that results that pass significance tests are more likely to be written up and published than results that fail to do so.
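To make the selection-bias concern concrete, here is a minimal simulation sketch (the study count, sample sizes, and threshold are all invented for illustration): generate many studies of a true null effect, "publish" only those that clear p < 0.05, and look at what the published record says.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_arm = 10_000, 30   # hypothetical: 10,000 small two-arm studies
true_effect = 0.0                   # the true treatment effect is zero

published = []
for _ in range(n_studies):
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    control = rng.normal(0.0, 1.0, n_per_arm)
    t, p = stats.ttest_ind(treated, control)
    if p < 0.05:                    # only "significant" results get written up
        published.append(treated.mean() - control.mean())

# Roughly 5% of these null studies clear the bar, and every one of
# them reports a sizable effect, even though the truth is zero.
print(f"published {len(published)} of {n_studies} studies")
print(f"mean |published effect| = {np.mean(np.abs(published)):.2f}")
```

The published literature in this toy world consists entirely of false positives, each with an impressive-looking effect size, which is exactly why passing a significance test cannot by itself make the evidence persuasive.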
On point 1, the problem is that the researchers are treating their own theory as the null hypothesis. They fail to reject it, so they accept it. Sounds like the framing itself is wrong.
On 2, I think you may be simplifying his point too much. It’s not just “necessary but not sufficient” and publication bias he’s concerned with, but also the researcher’s degrees of freedom in what to test and how to test it (p-value fishing).
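The degrees-of-freedom point can be simulated as well. A minimal sketch, again with made-up numbers: give a researcher twenty independent outcomes to test on pure-noise data, and count how often at least one comparison comes out "significant."

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_outcomes, n = 2_000, 20, 50  # hypothetical sizes

fished = 0
for _ in range(n_experiments):
    # Twenty null outcomes per experiment: no real effect anywhere.
    pvals = [stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue
             for _ in range(n_outcomes)]
    if min(pvals) < 0.05:           # report the best-looking comparison
        fished += 1

# With 20 forks, roughly 1 - 0.95**20, or about 64%, of null
# experiments yield at least one publishable "finding".
print(f"{fished / n_experiments:.0%} of null experiments found something")
```

This is the garden of forking paths in miniature: no single test is abused, yet the freedom to choose which test to report makes a "significant" result nearly uninformative.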
I think we tend to call the problem you're referring to "publication bias," so as to avoid confusing it with selection bias in the sense of non-random sampling.
Data alone, however well massaged into standard deviations, biases, and other so-called sufficient statistics and their combinations, cannot be persuasive and cannot prove or reject anything. Using probabilities (in the Keynesian way, that is, as degrees of confidence), all we can say is that the data set D supports this or that model with this or that probability.
When can we declare this support "persuasive"? We can't say until we know the purpose of the model. And once we introduce the purpose of the model, we can't talk about statistical significance at all.
Five patients get well, one patient dies faster than expected after taking some experimental medication. Definitely not a statistically significant result. Yet I can construct a number of morally compelling situations in which the "5-to-1 clinical trial" result is persuasive enough, for all practical purposes, for the next patient.
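In the degrees-of-confidence spirit the commenter describes, one can put a number on that intuition. A minimal sketch, assuming a uniform Beta(1,1) prior and a hypothetical 50% recovery rate without the medication (both are my assumptions, not from the original):

```python
from scipy import stats

successes, failures = 5, 1          # the hypothetical 5-to-1 trial
baseline = 0.5                      # assumed recovery rate without the drug

# Frequentist check: 5 of 6 is not significant against a 50% baseline.
p_value = stats.binomtest(successes, successes + failures, baseline).pvalue
print(f"two-sided p-value: {p_value:.3f}")       # ~0.22, not significant

# Bayesian check: with a uniform prior, the posterior is Beta(6, 2),
# and most of its mass sits above the baseline.
posterior = stats.beta(1 + successes, 1 + failures)
print(f"P(drug beats baseline) = {1 - posterior.cdf(baseline):.3f}")  # ~0.94
```

The same six patients fail a significance test yet leave roughly 94% posterior confidence that the drug beats the assumed baseline, which is the gap between "statistically significant" and "persuasive for the next patient."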
I agree. For some reason Gelman is very focused on the problem of a lot of tiny psych experiments with possibly spurious "significant" findings for this or that priming effect, or whatever. I agree with his skepticism, but these are mostly "effects" that nobody cares about very much anyway (an exception being "stereotype threat," which I understand has been hard to replicate).
But I think that for the most important questions people actually care about for policy reasons, the problem you cite is worse. A good example is the Oregon Medicaid Experiment, where even Obamacare opponent Casey Mulligan couldn't stand how the lack of (statistical) "significance" was being misinterpreted.
http://economix.blogs.nytimes.com/2013/06/26/the-perils-of-significant-misunderstandings-in-evaluating-medicaid/?_r=0
Isn’t that study a good example of Gelman’s point? People expected a significant result, but the study couldn’t tell signal from noise. So either there is no effect, or the treatment variable (money, presumably) needs to be increased to tease out statistical significance. If they had detected a very small but significant impact, we might even know less than we do with the “non-result.”
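One way to read "the noise is too large to distinguish any signal" is through a power calculation: given a study's sample size, what is the smallest effect it could reliably have detected? A minimal sketch with invented numbers (not the Oregon study's actual design):

```python
from statsmodels.stats.power import tt_ind_solve_power

# Hypothetical two-arm study: 1,000 people per arm, 5% test, 80% power.
mde = tt_ind_solve_power(effect_size=None, nobs1=1_000, alpha=0.05,
                         power=0.8, ratio=1.0, alternative='two-sided')
print(f"minimum detectable effect: {mde:.3f} standard deviations")

# Any true effect smaller than this would usually come out
# "non-significant," so the non-result bounds how big the effect
# could plausibly be rather than showing there is none.
```

That bound is the useful content of a null result, and it is why a barely significant estimate from the same design could indeed be less informative than the "non-result."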