Tyler Cowen asks why the numbers imply spread rates and death rates that are so difficult to reconcile across regions and countries.
People are feeding their elegant dashboards, nifty charts, and fancy computer models with worthless numbers. Nobody seems to want to listen to me on that. But it would not surprise me to find that all of the heterogeneity that cannot be explained by demographics and differences in treatment quality is simply an artifact of the way that numbers are collected.
Only fools claim to know precisely the true spread rates or the true death rates. We don’t even have decent ballpark estimates.
If we were to obtain data that were good enough to infer true spread rates and death rates, and these rates turn out to differ greatly across regions, then I would speculate on a combination of two factors. First, different variants of the virus, which spread and kill at different rates. Second, a highly skewed spreading phenomenon. That is, instead of every infected person proceeding to infect exactly 2.2 other people, you have a few infected persons infecting dozens of others, and most infected people infecting no one else. Put those two factors together, and you will get heterogeneity. But I emphasize that this is purely speculative. Don’t take this idea and run with it. Stop guessing. Get some facts first.
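To see why skewed spreading alone can produce heterogeneity, here is a minimal sketch, not a calibrated model. It compares a branching process in which every case infects roughly 2.2 others with one in which the same average of 2.2 comes from a few superspreaders and many dead ends; the dispersion values, seed counts, and number of generations are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def outbreak_size(mean_r, dispersion, generations=4, seeds=5):
    """Total cases after a few generations of a simple branching process.

    Secondary cases per infection follow a gamma-Poisson (negative binomial)
    distribution with the given mean; a small dispersion value means a few
    cases infect dozens of people while most infect no one.
    """
    infected, total = seeds, seeds
    for _ in range(generations):
        if infected == 0:
            break
        rates = rng.gamma(shape=dispersion, scale=mean_r / dispersion, size=infected)
        infected = int(rng.poisson(rates).sum())
        total += infected
    return total

# 20 hypothetical regions, each seeded with 5 identical index cases
even   = [outbreak_size(2.2, dispersion=50.0) for _ in range(20)]  # nearly uniform spreading
skewed = [outbreak_size(2.2, dispersion=0.1)  for _ in range(20)]  # superspreader-driven

print("nearly uniform spreading: outbreaks range %d to %d cases" % (min(even), max(even)))
print("skewed spreading:         outbreaks range %d to %d cases" % (min(skewed), max(skewed)))
```

With the same average spread rate, the near-uniform regions end up looking much alike, while the skewed regions range from outbreaks that fizzle out entirely to outbreaks far larger than the average would suggest.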
I wish someone at the CDC would take the idea of obtaining scientific data and run with it, rather than guessing using the numbers that are being collected. In a scientific study, the investigator chooses who gets tested for the virus and when the tests are conducted. The study uses the same type of test kit on every subject, preferably one with low rates of false positives and false negatives. Tests are conducted by carefully trained workers who follow standardized procedures. Before testing a large sample of people, we administer two tests to 100 people and count the number of times the two tests disagree. If that count is large, then we need to figure out how many tests to run on one person to get a reliable result.
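As a rough sketch of that arithmetic, with error rates that are assumptions for illustration and not properties of any real test kit: a single-test false-negative rate of 15 percent and a false-positive rate of 2 percent would already produce several disagreements per 100 people tested twice, and repeating the test and taking a majority vote shrinks the chance of mis-classifying an infected person.

```python
from math import comb

# Assumed single-test error rates, for illustration only; real kits vary.
FALSE_NEG = 0.15   # P(negative result | infected)
FALSE_POS = 0.02   # P(positive result | not infected)

def expected_disagreements(prevalence, n_people=100):
    """Expected number of people (out of n_people) whose two tests disagree."""
    infected = 2 * FALSE_NEG * (1 - FALSE_NEG)
    healthy = 2 * FALSE_POS * (1 - FALSE_POS)
    return n_people * (prevalence * infected + (1 - prevalence) * healthy)

def majority_error(per_test_error, n_tests):
    """P(the majority of n_tests independent tests is wrong)."""
    return sum(comb(n_tests, k) * per_test_error**k * (1 - per_test_error)**(n_tests - k)
               for k in range(n_tests // 2 + 1, n_tests + 1))

print("expected disagreements per 100 people at 10%% prevalence: %.1f"
      % expected_disagreements(0.10))
for n in (1, 3, 5):
    print("chance of mis-classifying an infected person, best of %d tests: %.3f"
          % (n, majority_error(FALSE_NEG, n)))
```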
Of the many problems with numbers as collected and reported, consider the issue of time lag. Suppose that two regions each test 1000 infected people on day 1. Region A reads and records the results a few hours later. Region B reads and records the results a week later. Suppose that the one-week spread rate is 100 percent per week, and each region then tests 1000 new infected people. Suppose that the death rate is 1 percent, and death occurs near the end of the week.
After day 8, each region has 2000 cases and 10 deaths. But Region A, which reads the results quickly, will report that cases are doubling weekly and that the death rate is 10/2000, or 0.5 percent. Region B, which reads the results slowly, will still report 1000 cases, with a death rate of 1.0 percent.
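The same arithmetic, written out; the figures are the ones in the example above, not estimates of anything real.

```python
# Both regions: 1000 infected tested on day 1, 1000 more on day 8,
# a true weekly spread rate of 100 percent, and a true death rate of
# 1 percent, with deaths occurring near the end of the week.
deaths_by_day_8 = 10  # 1 percent of the first cohort of 1000

# Region A records results within hours, so both cohorts are on the books.
cases_a = 2000
print("Region A: %d cases, death rate %.1f%%" % (cases_a, 100 * deaths_by_day_8 / cases_a))

# Region B records results a week late, so only the day-1 cohort is on the books.
cases_b = 1000
print("Region B: %d cases, death rate %.1f%%" % (cases_b, 100 * deaths_by_day_8 / cases_b))
```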
Another problem is that there is very large variation in the ratio of tests to infected people, not only across regions but over time within a region. As you ramp up testing, you increase the reported spread rate and lower the reported death rate.
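A stylized example of that distortion, where the infection counts and testing fractions are assumptions picked only to make the point: suppose true infections grow 50 percent in a week while the share of infections that get tested triples. The reported case count then grows 4.5-fold, and the reported death rate falls, even though the true spread rate and true death rate never changed.

```python
# Assumed numbers, purely to illustrate the effect of ramping up testing.
true_infections = [10000, 15000]   # week 1, week 2: true growth is 1.5x
tested_fraction = [0.10, 0.30]     # testing capacity triples
true_death_rate = 0.01

reported_cases = [int(n * f) for n, f in zip(true_infections, tested_fraction)]
deaths = [int(n * true_death_rate) for n in true_infections]  # each cohort's eventual deaths

print("true growth:      %.1fx" % (true_infections[1] / true_infections[0]))
print("reported growth:  %.1fx" % (reported_cases[1] / reported_cases[0]))
print("reported death rate, week 1: %.1f%%" % (100 * deaths[0] / reported_cases[0]))
print("reported death rate, week 2: %.1f%%" % (100 * deaths[1] / reported_cases[1]))
print("true death rate:             %.1f%%" % (100 * true_death_rate))
```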
Almost all health agencies have chosen not to monitor this crisis scientifically. I wish I could change that.