Working backwards from deaths to cases

[Corrected at 12:40 PM: there was a bad arithmetic error before. Thanks to David Henderson for spotting it.]
Because we have not done random-sample testing, we have no idea of the true number of cases in the United States. Reported numbers are worthless. Below, when I refer to cases I mean the (unknown) true total number of people who have ever been infected, not the reported case numbers.

Suppose that we try to work backward. Suppose that we assume a lag of n days from the time of infection to the time of death. Then the number of cases as of n days ago is equal to the number of deaths as of today, divided by the true (unknown) case fatality rate.

The more cases (meaning actual cases, not reported cases) that it took to generate all of the deaths as of today, the happier we should be. A higher number means that there are more people who have already been infected, so we are closer to the peak of the curve.

For example, as of yesterday, there were 8314 deaths. If we assume a true case fatality rate of 20 per 1000, that means that as of n days ago there were 50 times 8314, or 415,700 cases. If both cases and deaths have doubled every three days (3DDRR = 2.0), then the number of cases today is 2^(n/3) times 415,700. If n is 9, then that means we had about 3.3 million cases as of yesterday. Is that a lower bound? Or is the cfr higher than 20 per 1000 (higher than 2 percent)?

If we raise n to 15 and keep the cfr at 20 in 1000, then the true number of cases as of yesterday was about 13 million. If we raise n to 15 but assume a cfr of only 2 in 1000, then the true number of cases as of yesterday was about 130 million. Is that the upper bound? If we are anywhere near that, we are close to a peak.
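
The arithmetic in these scenarios can be checked with a short script. This is only a sketch of the back-of-the-envelope calculation described above, not an epidemiological model; the function name `cases_today` and its parameter names are made up for illustration, and it assumes a single fixed lag and a constant three-day doubling time.

```python
def cases_today(deaths_today, cfr_per_1000, lag_days, doubling_days=3.0):
    """Implied true cases as of today, working backward from deaths.

    Assumes (as in the post) one fixed lag from infection to death and
    a constant doubling time for both cases and deaths.
    """
    # Cases as of lag_days ago = deaths as of today / case fatality rate.
    cases_then = deaths_today * 1000 / cfr_per_1000
    # Grow that number forward over the lag, doubling every doubling_days.
    return cases_then * 2 ** (lag_days / doubling_days)


# The three scenarios from the post, using 8314 deaths.
for cfr, n in [(20, 9), (20, 15), (2, 15)]:
    total = cases_today(8314, cfr, n)
    print(f"cfr = {cfr} per 1000, n = {n}: {total:,.0f} cases")
```

Changing `lag_days` or `doubling_days` shows how sensitive the result is: every extra three days of lag doubles the implied number of cases.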

18 thoughts on “Working backwards from deaths to cases”

    • I recommend editing the post.

      If n = 9, cases = 3,325,600
      If n = 15, cases = 13,302,400
      “If we raise n to 15 but assume a cfr of only 2 in 1000”, cases = 133,024,000, herd immunity this weekend.

  1. Might n=15 still be too low? ISTR that besides the incubation period there is often a long lag between initial onset of symptoms and the kinds of intensification that typically kill people– and if someone is treated in the ICU and then dies anyway, that may cause a further delay to what in retrospect was the inevitable.

    • 21 days seems like a reasonable estimate for a median lag between initial infection and death. It seems like it takes most people 4-7 days to show symptoms, and an additional 10-20 days for death to occur.

      • And if you assume a lag of 21 days from infection to death……we’re either on the verge of herd immunity or some of these assumptions are a bit naive (including my own).

        For example, right now, NYC has about 2500 recorded deaths. With a CFR of 2%, n=21 gives you 16,000,000 cases. n = 15 gives you 4,192,000 and n = 9 gives you about 1 million infected.

        16 million infected is obviously not correct since that’s more people than live in the city. 1 – 4 million I can buy, but 9-15 days from infection to death doesn’t correspond to what I’ve read about how the disease progresses.

        My guess is that the best way to capture how the disease moves is by using some sort of mixture model to capture sub-population dynamics.

  2. David has already mentioned it: all your calculations are an order of magnitude too low.

  3. I don’t think you can use deaths, though- it is just too noisy from region to region. Where things go really wrong is when the disease penetrates the hospitals and the nursing/assisted-living facilities. I noticed a story a couple of days ago describing France’s sudden jump from around 500 deaths/day to over 1000 deaths/day, and it said that France started to account for nursing home deaths at that point in time. Now, that is kind of shocking- why wasn’t this done from the start in France?

    In any case, maybe you assume that every other country either did account for such deaths from the start, or they all have been ignoring such deaths. In the US, I think the states have been counting all COVID-19 deaths as well as they can- see the Kirkland, Washington case, for example.

    What I would really like to see in the New York data (and all the states, too) is a breakdown of the living arrangements of the dead (the living, too, would be nice data to have). That this isn’t being done is almost criminal at this point.

  4. In short, I would like to see a breakdown like this of the infected and dead:

    (1) Infected by household member
    (2) Infected in nursing home
    (3) Infected in hospital/hospice
    (4) Infected by medical personnel
    (5) Infected by work colleague
    (6) Unknown

  5. The Institute for Health Metrics and Evaluation (IHME) reports that the number of deaths in the USA doubled every three days predicts that the number of deaths in the USA doubled every three days in the second half of March—but predicts that it will take five days for the number of deaths to double from today, and the doubling time will steadily increase during lockdown:
    https://covid19.healthdata.org/projections

    On the other hand, should the number of new cases at the peak greatly exceed hospital and ICU capacity, then the true case fatality rate probably would increase temporarily.

  6. What is this modeling supposed to accomplish? When you perform arithmetic with uncertain numbers, the error bars rapidly grow to the point that your calculations are meaningless. We don’t know, and we lack any practical way to find out. Deal with it.

    Colorado did some random testing a week ago. They had zero positives of ~650 tested, although it wouldn’t surprise me if one or two of the test subjects were infected in the time between sample collection and reporting of results. The best estimates I’ve heard are around 1 nonreported (asymptomatic or mild) case per reported case (based on contact tracing), which tends to argue against the “huge numbers of asymptomatic cases” idea.

    But it doesn’t change your optimum behavior. Stay at home; wash your hands. We live until we die, and prediction is a mug’s game.

    • Actually, the number of asymptomatic cases does change the optimum behavior. If it’s high, then we shouldn’t be staying at home. Washing hands, and even maintaining 6-ft separation are fine, because they are cheap. Staying home, however, is a multi-trillion dollar question.

  7. Upon proofreading my comment after posting it, I am correcting the grammar, which I had botched through hasty cut-and-paste — Sorry about that:

    The Institute for Health Metrics and Evaluation (IHME) reports that the number of deaths in the USA doubled every three days in the second half of March—but predicts that it will take five days for the number of deaths to double from today, and that the doubling time will steadily increase during lockdown:
    https://covid19.healthdata.org/projections

    On the other hand, should the number of new cases at the peak greatly exceed hospital and ICU capacity, then the true case fatality rate probably would increase temporarily.

  8. I think those bounds are pretty safe. I don’t think we’re anywhere near 130M cumulative infections yet. Personally, I think we’re closer to your lower bound.

    Probably best to use CFRs and n’s from countries that come close to random sampling, like S Korea. Their CFR (imperfectly measured because of timing issues) is currently at about 1.7% (which is up from early estimates below 1%… so maybe something has changed with their testing?). Not sure about their n. Certainly best to fact check me on which countries come closest to representative samples. Maybe Germany, whose CFR stands at about 1%.
