What is the true margin of error?

Alex Tabarrok writes,

The logic of random sampling implies that you only need a small sample to learn a lot about a big population, and if the population is much bigger you only need a slightly larger sample. For example, you only need a slightly larger random sample to learn about the Chinese population than about the US population. When the sample is biased, however, then not only do you need a much larger sample, you need it to be large relative to the total population.

I am curious what Tabarrok means in the first sentence by “need a slightly larger sample.” I thought that with random sampling, the margin of error for a sample of 1,000 is the same whether you are sampling from a population of 10 million or 50 million.

But the issue at hand is how a small bias in a sample can affect the margin of error. We frequently see election results that are outside the stated margin of error of exit polls. As I recall, in 2004 conspiracy theorists who believed the polls claimed that there was cheating in the counting of actual votes. But what is more likely is that polling fails to obtain a true random sample. This greatly magnifies the margin of error.

In real-world statistical work, obtaining unbiased samples is very difficult. That means that the true margin of error is often much higher than what gets reported.
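
To make the first point concrete, here is a minimal sketch in Python (assuming a 95% confidence level, a worst-case proportion of 0.5, and the standard finite population correction) of the margin of error for a sample of 1,000 drawn from two different population sizes:

    import math

    def margin_of_error(n, N, p=0.5, z=1.96):
        # Margin of error for a simple random sample of size n drawn
        # without replacement from a population of size N, using the
        # finite population correction (FPC).
        fpc = math.sqrt((N - n) / (N - 1))
        return z * math.sqrt(p * (1 - p) / n) * fpc

    for N in (10_000_000, 50_000_000):
        print(f"N = {N:>11,}: MoE = {margin_of_error(1000, N):.4%}")

    # N =  10,000,000: MoE = 3.0989%
    # N =  50,000,000: MoE = 3.0990%

The two margins agree to three decimal places, which is the sense in which population size barely matters once it dwarfs the sample.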

19 thoughts on “What is the true margin of error?”

  1. If the distribution is normal (Gaussian), then you are correct. If the distribution is otherwise, then the required sample size may depend on characteristics of the distribution and on population size. For example, to distinguish a normal from a lognormal distribution, the sample size needed depends on both the (geometric) standard deviation and the population size; see Table 1 of https://www.ncbi.nlm.nih.gov/pubmed/10429732. Many distributions of economic interest are lognormal, for example the income distribution. In general, as the population becomes larger (e.g. US vs. China), its size becomes irrelevant and the characteristics of the distribution are what matter.

  2. need a slightly larger sample

    He was probably thinking of what develops an equal confidence level on a continuum trait.

    Discrete traits, such as favorite Kool-Aid flavor, are not particularly dependent on sample size after certain criteria are met.

    Continuum traits, such as height and weight, will show a wider distribution in a larger population, so a larger sample size is needed to capture the same deviation.

  3. Well, this is probably the main reason why 538 aggregates the polls, and it makes the polls look better in the long run. It solves two main problems: by combining 5 polls you turn 600–1,000 data points into ~4,000 data points, and it also blends out the individual pollster biases, because no pollster gets a perfect random sample. What was the main miss of the 2016 Presidential election? The working classes were not polled enough and they underestimated the working class support of Trump. (It should be noted that all three poll pundits, Wasserman, Nate Silver, and Nate Cohn, saw this bias before the 2016 election.) A quick sketch of the aggregation arithmetic appears after the list below.

    Three points to remember:
    1) In reality the polls were closer in 2016 (a predicted 3.1 HRC lead vs. an actual 2.2 lead) than in 2012 (a predicted 1.1 Obama lead vs. an actual 3.1 lead).
    2) The biggest issue with polls taken close to the election is that pollsters herd their data. This was obvious in 2016, when they assumed the national lead meant HRC had a lead in Wisconsin.
    3) The reason people believe or trust polls is that there is nothing better. Remember how many people believed Trump’s lead in the Republican Primary was not real; the polls were right in the long run. (And note HRC was at 53.3 in the RCP poll average on January 4, 2016, and she got 55% of the primary vote.)
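
    As promised above, a rough sketch of the aggregation arithmetic (this assumes the five polls are independent and unbiased, which, as noted, they are not quite):

        import math

        def moe(n, p=0.5, z=1.96):
            # Nominal 95% margin of error for a simple random sample of size n.
            return z * math.sqrt(p * (1 - p) / n)

        print(f"single poll, n =   800: +/- {moe(800):.2%}")   # +/- 3.46%
        print(f"aggregate,   n = 4,000: +/- {moe(4000):.2%}")  # +/- 1.55%

    Pooling shrinks the random-sampling error by more than half, but any bias shared across pollsters is untouched by averaging.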

    • The working classes were not polled enough and they underestimated the working class support of Trump.

      I generally agree with this statement except that it gives the impression that the issue is with the design of the poll, rather than with the inherent bias built into the polling process, which has changed over time.

      In an age where attention is the main scarcity, the population that can’t be reached or refuses to answer (dark matter in the pollster universe) introduces structural bias that can undermine carefully acquired results if the dark matter is not evenly distributed.

      • From what I remember about 2012 & 2016, pollsters were becoming increasingly aware that working class voters were harder to include in the sample. They worked hard with the data they had, and each poll fills in the demographics it underpolled.

        The weird reality of 2012 and 2016 is that pollsters underweighted working class voters in both elections, with different results each time. What happened?

        In 2012, pollsters assumed the minority working class would return to pre-2008 turnout levels, but African-Americans (and, at somewhat lower rates, Hispanic-Americans) came out hard for Obama. The opposite happened in 2016 with Trump: the white working class voted at higher rates than it had in decades, while minority working class turnout dropped below 2008 levels.

        • What happened? The assumptions underlying the dark matter skew estimates dominate the polling process. Forgive me for beating this metaphor like a rented mule.

          • Pollsters have a hard time reaching working class voters, and if those voters are under-represented in the sample data, you use past data to adjust the polls. Nate Cohn did a big series on this for the 2018 Midterms, and his polls were the closest to the number of House D wins. Of course this has issues, as any election may change which party wins a demographic, such as Trump with the WWC. (Or Obama keeping the African-American vote and gaining Hispanic-Americans in 2012.)

            The issue with polls is that they can be wrong, as in the 2016 election, or off, as in 2012, but name something that is better than reading and understanding polls before an election. And note there has not been a Presidential election since 1976 in which the polls got the winner wrong, although Bush/Gore was essentially tied in 2000. (And if you review the poll data, Reagan was expected to win in 1980.)

  4. I thought that with random sampling, the margin of error for a sample of 1,000 is the same whether you are sampling from a population of 10 million or 50 million.

    No, the equation includes the population size. The margin of error just increases very slowly with it.

    Trivial proof by counter-example: if the population is 1,000 and you sample all 1,000, the margin of error is zero. So the margin of error obviously changes with population size.

    • If the sample is a large portion of the population, it’s not a statistical sample.

      • Why not? The point of sampling is that the sample can be very small compared to the population size, and yet accurate. But the sample size doesn’t have to be small. It’s just a lot easier to use small samples, and the gain in accuracy isn’t that high.

        Go back to the basic equations. It’s just math.

        • Actually, I’m sorry, you’re correct. The population sizes are large enough that they don’t really change the error.
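
          For the record, a quick sketch of the numbers behind this exchange (assuming n = 1,000, a 95% confidence level, p = 0.5, and the standard finite population correction):

            import math

            def moe_fpc(n, N, p=0.5, z=1.96):
                # Margin of error with the finite population correction:
                # zero when the sample is the whole population, approaching
                # the infinite-population value as N grows.
                return z * math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

            for N in (1_000, 2_000, 10_000, 100_000, 10_000_000):
                print(f"N = {N:>10,}: MoE = {moe_fpc(1000, N):.3%}")

            # N =      1,000: MoE = 0.000%
            # N =      2,000: MoE = 2.192%
            # N =     10,000: MoE = 2.940%
            # N =    100,000: MoE = 3.084%
            # N = 10,000,000: MoE = 3.099%

          It is exactly zero at N = 1,000 and nearly flat once N reaches the tens of thousands, which is consistent with both halves of this exchange.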

  5. I can’t help thinking of a brouhaha from several years ago. The statisticians in the Census Bureau said that the final 2020 Census results shouldn’t just count the people who answered but should include estimated numbers to fill in the people who were missed. This was rejected by the Administration, and the story was played as racist Republicans wanting to disappear minorities.

    Perhaps. But this comment thread suggests that no one can really predict who is undercounted when.

    • …Census results shouldn’t just be the people who answered but estimated numbers to fill in the people who were missed

      I recall hearing this for each census over the last forty years. My take is that Republicans are seen as more conscientious and “mainstream”, and are thus counted more fully, while Democrats have a constituency hiding under bridges.

      Interesting to compare this to the above polling discussion where Trump voters are undercounted.

  6. Do the same sampling problems appear outside of sociological samples? My impression is that financial auditors don’t worry too much about their samples of transactions being biased, though it is common for them to identify and individually test unusual transactions and to stratify populations before selecting a sample.

  7. I think there is a troubling implication for statistics-using scholarship in general.

    What the study points out is that it only takes the tiniest amount of difference (for instance, in response rates) between two sampled groups to generate large potential errors in results from any practically feasible sample size.

    One moral of the story is that one ought to be very leery of reported tight margins of error, especially in the context of poor track record and reason to suspect those differences in sampled populations.

    But the more worrying moral of the story is that there is a terrific opportunity for mischief: an unscrupulous researcher can generate big shifts in the results by claiming to have some ‘reliable’ estimate of that slight difference between sampled groups and then ‘correcting’ for it. Or, if not shift the average exactly, then sneak an outcome from outside the former margin of error into the window of real possibility.

    So, especially if one is dealing with a subject in which low-probability scenarios would be catastrophic and thus the question is whether it’s likely enough to be worth worrying about (and maybe even transforming the whole economy), the ability to sneak in such a scenario would be very tempting for a researcher looking to get predictable amounts of public attention.

    And this would be tough to detect and police. It seems completely benign for a researcher to say they are only ‘correcting’ for a teeny, weeny ‘known’ difference between sampled groups. But as this study shows, it can have major impacts, and one could predict yet another round of ‘replication crisis’ (i.e. Fake Results) papers coming out.
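
    To see how little it takes, here is a toy calculation in Python (the numbers are entirely hypothetical: a population split exactly 50/50 in which one side is just 10% less likely to respond):

        import math

        # Hypothetical population split exactly 50/50 between A and B,
        # where B supporters are 10% less likely to answer the pollster.
        share_a, share_b = 0.5, 0.5
        resp_a, resp_b = 1.0, 0.9   # relative response rates

        # Expected share of A among those who actually respond:
        poll_a = share_a * resp_a / (share_a * resp_a + share_b * resp_b)
        bias = poll_a - share_a

        n = 1000
        nominal_moe = 1.96 * math.sqrt(0.25 / n)

        print(f"poll shows A at {poll_a:.1%} (truth: 50.0%)")
        print(f"bias = {bias:+.2%}, nominal MoE = +/- {nominal_moe:.2%}")
        # poll shows A at 52.6% (truth: 50.0%)
        # bias = +2.63%, nominal MoE = +/- 3.10%

    A 10% differential in willingness to respond produces a bias nearly as large as the entire reported margin of error, and no increase in sample size shrinks it. It also shows how much leverage sits in the ‘correction’: whoever picks the assumed response-rate gap effectively picks the headline number.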

  8. re: margin of error

    I found a website with a calculator that computes the sample size needed for a 5% margin of error. I punched in a population size of 1,000 and got a sample size of 278.

    I then doubled the population size to 2,000. The sample size increased to 323.

    I then put in a population size of 100,000 and got a sample size of 383.

    So the sample size doesn’t change much and doesn’t scale with population, which is in line with Alex Tabarrok’s comment. A link to the calculator is below. I used the top calculator on that page:

    https://www.calculator.net/sample-size-calculator.html
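
    Those numbers can be reproduced with the standard sample-size formula plus the finite population correction; here is a sketch (assuming, as the calculator appears to, a 95% confidence level and p = 0.5):

        import math

        def sample_size(N, e=0.05, p=0.5, z=1.96):
            # Infinite-population sample size, then the finite
            # population correction, rounded up.
            n0 = z**2 * p * (1 - p) / e**2   # ~384.16
            return math.ceil(n0 / (1 + (n0 - 1) / N))

        for N in (1_000, 2_000, 100_000, 330_000_000, 1_400_000_000):
            print(f"N = {N:>13,}: n = {sample_size(N)}")

        # N =         1,000: n = 278
        # N =         2,000: n = 323
        # N =       100,000: n = 383
        # N =   330,000,000: n = 385
        # N = 1,400,000,000: n = 385

    The last two rows anticipate the US-vs-China comparison further down the thread: once the population is in the hundreds of millions, the required sample size is effectively constant.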

    • But if you take a sample of 300 out of a population of only 1000, you cannot take an unbiased sample unless you sample with replacement. By sampling with replacement I mean that after you pick someone, you throw them back into the pool so that they could be picked again. That is what makes the whole idea of taking a large sample out of a small population quite problematic. I think in terms of large population size relative to sample size.

      • Also note that the Wikipedia article on sample size determination does not include population size in its calculation.

      • Upon further reflection on your comment in the article, I now see the point you were making.

        Using the same calculator I linked to, I entered a population value of 330 million for the US population and the resulting required sample size for a 5% margin of error is 385. Using a population of 1.4 billion for China and a 5% margin of error gives the same sample size of 385. Which is the same point you were making but stated slightly differently.

        I stand corrected.
