1. Daniel J. Morgan and others write,
These findings suggest that many practitioners are unaccustomed to using probability in diagnosis and clinical practice. Widespread overestimates of the probability of disease likely contribute to overdiagnosis and overuse.
Pointer from Tyler Cowen.
In 2008, Ian Ayres in Super Crunchers also reported that doctors tend to do poorly in basic probability. When I taught AP statistics in high school, I always used an example of a bad experience inflicted on me by a Harvard-trained physician who did not know Bayes’ Theorem.
2. From a 2014 paper by Ralph Pelligra and others:
The Back to Sleep Campaign was initiated in 1994 to implement the American Academy of Pediatrics’ (AAP) recommendation that infants be placed in the nonprone sleeping position to reduce the risk of the Sudden Infant Death Syndrome (SIDS). This paper offers a challenge to the Back to Sleep Campaign (BTSC) from two perspectives: (1) the questionable validity of SIDS mortality and risk statistics, and (2) the BTSC as human experimentation rather than as confirmed preventive therapy.
The principal argument that initiated the BTSC and that continues to justify its existence is the observed parallel declines in the number of infants placed in the prone sleeping position and the number of reported SIDS deaths. We are compelled to challenge both the implied causal relationship between these observations and the SIDS mortality statistics themselves.
I thank Russ Roberts for the pointer. This specific issue has been one of my pet peeves since 2016. See this post, for example. I think that back-sleeping is a terrible idea, and I never trusted the alleged evidence for it. Doctors do not understand the problem of inferring causation from non-experimental data, etc.
UPDATE: A commenter points to this Emily Oster post, reporting on a more careful study that supports back-sleeping.
My primary care doctor is very frustrated with cardiologists because he says almost all of them overtreat. Treatments are typically very expensive and not without risk. He recommended a cardiologist more than an hour away who does not overtreat and is a specialist in ablation for cardiac arrhythmias.
This cardiologist did not want to do an ablation on me because I typically went years between episodes on an older medication that is rarely used now but from which I have gotten great results for 25 years. He told me that some places claim much better results than he gets, but he doesn’t trust their stats and their claims.
When my brother had an episode of this arrhythmia (atrial fibrillation) in a different state and his cardiologist told him the procedure had a 75% success rate, I had him ask his doctor exactly what counts as a “success.” Here is the answer: if an arrhythmia occurs in the first three months, that is not counted as a failure, because the procedure relies on scar tissue that takes that long to form. What about the other end? They count 12 months as a permanent success and don’t follow patients longer than that for purposes of computing the success rate.
So during the 10-year stretch I enjoyed between episodes on this generic medication, I could have had 9 “successful” ablations if they were timed exactly right.
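To make that accounting concrete, here is a minimal sketch (my own illustration of the rule as described above, not any clinic’s actual code) of how the 3-to-12-month window classifies outcomes:

```python
def counts_as_success(months_until_next_episode):
    """Apply the success rule described above: an episode within the
    first 3 months is excused (scar tissue is still forming), and
    follow-up stops at 12 months."""
    if months_until_next_episode < 3:
        return True   # too early to count against the procedure
    if months_until_next_episode > 12:
        return True   # beyond the follow-up window, never recorded
    return False      # only recurrences in months 3-12 count as failures

# A patient whose arrhythmia recurs every ~13 months is scored as a
# "success" after every ablation, even though the episodes never stop.
print(counts_as_success(13))  # True
print(counts_as_success(6))   # False
```

Under that rule, a recurrence rate just slow enough to clear the 12-month window produces a 100% reported success rate.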
You may be interested in Emily Oster’s recent piece for the opposite perspective on SIDS. She says new NBER research gives more credible evidence for the relationship between sleeping position and SIDS. The regression discontinuity graphs look pretty convincing to me. https://emilyoster.substack.com/p/back-sleeping-and-sids-new-research
Regression discontinuity should be outlawed. If they took away those lines, it would look like a continuous downward and narrowing trend that started before Dec 1991. They should also present the data separately, without the fitted lines.
More on regression discontinuity generally here:
https://statmodeling.stat.columbia.edu/2021/03/11/regression-discontinuity-analysis-is-often-a-disaster-so-what-should-you-do-instead-do-we-just-give-up-on-the-whole-natural-experiment-idea-heres-my-recommendation/
To be clear, back sleeping may or may not be a good idea, and I take this trend to be moderate evidence for back sleeping. I just don’t find that methodology particularly convincing, and if they lost the discontinuity, I think you’d see a single downward-sloping line through the full series that would fit both halves quite well. And what else was changing over that ~8-9 year period that could plausibly affect SIDS? It would be more convincing if we could see what percentage of nights babies slept on their stomachs versus their backs in each of those years. Or better yet, baby-level data on whether each infant slept on its back or stomach and what the outcome was.
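For what it’s worth, here is a toy illustration of that point in Python, using entirely made-up data (nothing below comes from the NBER paper): when a series has a smooth trend and no true break, the two-line fit an RD plot draws barely beats a single continuous line, even though the plotted lines can still show an eye-catching “jump.”

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up monthly series: a smooth downward trend with noise and
# no true break at the "campaign" date (t = 0).
t = np.arange(-48, 48)
y = 100.0 - 0.5 * t + rng.normal(0, 3, t.size)

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

# Model A: one continuous line through the full series.
ones = np.ones_like(t, dtype=float)
rss_line = rss(np.column_stack([ones, t]), y)

# Model B: separate intercept and slope on each side of the cutoff,
# the kind of fit an RD plot draws.
post = (t >= 0).astype(float)
rss_break = rss(np.column_stack([ones, t, post, post * t]), y)

print(f"RSS, single line: {rss_line:.1f}")
print(f"RSS, broken line: {rss_break:.1f}")  # only trivially smaller
```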
I’ve also read Gelman’s blog for years, and he’s more opposed to how RD is used in practice than to the method itself. He has a lot of advice for how to use it effectively and has even contributed to the literature. I haven’t read the paper (only Oster’s piece), so I don’t know whether it would live up to his standards. As with any other topic, I’d hope to see a convincing literature rather than a single paper, but I think the idea that we should throw this one out because it uses RD is over the top.
Regarding the discontinuity itself, I don’t agree with you. The change is quite clear to me.
Sure, most ordinary practicing doctors are not great at probability or statistical reasoning. Neither are lawyers, historians, or other non-quantitative professionals, even if they are otherwise very bright and highly educated.
However, one might expect *epidemiological researchers* – whether or not they are also practicing physicians – whose main job is to crunch statistics, to understand the techniques and difficulties involved in teasing out causal relationships, and to make scientifically well-founded recommendations about medical best practices and health policy. If we can’t trust *those* people either, then …
I’m on the quantitative side of things by personal inclination, and I ended up learning quite a bit of advanced probability and statistics theory. There are people who say, “Every capable student should learn statistics.” This once sounded reasonable to me. Now it sounds like, “Every toddler should be given a sippy cup of wine cooler and a loaded revolver with the safety off and then set loose in the shopping mall.”
In each of my first three prob/stat/econometrics courses, the professors treated the subject as if they were teaching skiing. First you learn some basics, then try a bunny slope, but eventually any capable person will get the hang of it and even be able to handle the hard trails adeptly. Yes, it can be dangerous, and some people make mistakes that cause pain and embarrassment, but that can usually be avoided with practice and humility.
The more I learned about the pitfalls, the more I realized that the slopes were all covered with hidden land mines, and that all but the best professional experts would routinely foul up or get away with mischief. Asking anybody but people of very high caliber to do sophisticated statistical analysis is like asking an ordinary bike messenger to deliver your package when he doesn’t know it’s packed full of nitroglycerine.
https://www.theatlantic.com/magazine/archive/2010/11/lies-damned-lies-and-medical-science/308269/
Notably, this was obvious to me before reading this article, because I was in the habit of reading medical study abstracts in New Scientist. They don’t replicate. The do-eat-eggs & don’t-eat-eggs thing is the norm, not the exception.
I am a statistician and concur with this. To take but one example, most methods taught in introductory stat courses assume independence of observations (the “bunny hill” you referred to), but many analyses involve observations which are correlated. Failure to account for that has led directly to some spectacularly bad predictions, such as those involving home mortgage bundles in the 2000s and election results more recently.
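As a rough sketch of that failure mode (all numbers invented for illustration), compare the tail risk of a bundle of loans under an independence assumption versus a simple one-factor correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
n_loans, n_sims = 100, 50_000
p = 0.05  # marginal default probability for every loan in both models

# Independent model: each loan defaults on its own coin flip.
indep = rng.random((n_sims, n_loans)) < p

# Correlated model: a common "bad year" factor, same 5% marginal rate
# (0.10 * 0.32 + 0.90 * 0.02 = 0.05).
bad_year = rng.random(n_sims) < 0.10
p_loan = np.where(bad_year, 0.32, 0.02)
corr = rng.random((n_sims, n_loans)) < p_loan[:, None]

# Tail risk: chance that at least 20% of the bundle defaults at once.
for name, defaults in [("independent", indep), ("correlated", corr)]:
    print(name, (defaults.sum(axis=1) >= 20).mean())
```

The independence model says a year in which 20% of the bundle defaults is essentially impossible; the correlated model says it happens about one year in ten. Same marginal rates, wildly different tail risk.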
Back to the first paper Arnold linked to, that phenomenon – that when you use an imperfect test to screen for a condition with a low background prevalence, false positives tend to outnumber true positives – is both common and important. Thus, we try to train doctors to recognize it, and after being shown a few examples, they seem to get it. In light of that, the results from that paper are discouraging. In defense of doctors, it’s difficult to become very well-trained in topics such as medicine, probability, and research methods in one lifetime. But it’s clear that better numeracy is desirable in this and other fields.
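To make the screening arithmetic concrete, here is a minimal worked example (the prevalence, sensitivity, and specificity figures are invented for illustration, not taken from the Morgan et al. paper):

```python
# Bayes' theorem applied to screening for a low-prevalence condition.
prevalence = 0.01    # 1% of those screened actually have the disease
sensitivity = 0.90   # P(test positive | disease)
specificity = 0.95   # P(test negative | no disease)

true_pos = prevalence * sensitivity               # 0.009
false_pos = (1 - prevalence) * (1 - specificity)  # 0.0495

ppv = true_pos / (true_pos + false_pos)
print(f"P(disease | positive test) = {ppv:.1%}")  # about 15%
```

Even with a fairly accurate test, false positives outnumber true positives more than five to one, so a positive result still leaves the disease unlikely.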