Wading into Probability and Race

I am still only partly through Nicholas Wade’s A Troublesome Inheritance, and I am dealing with what I think is a Bayes’ Theorem issue.

For Wade, a race (as in European or African) is a cluster of genes that go together in a sense that is probabilistic rather than absolute. You can identify a person’s race with high probability, based on DNA analysis. Apparently, the cluster of genes involved is large, with more than 100 different alleles. (I may not have this right. I never took a bio course.)

Suppose that there are 100 alleles of interest, each of which can be “heads” or “tails” (did I mention that I never took bio?). In Europeans, each one has a 55 percent chance of being heads. In Africans, each one has a 50 percent chance of being heads. If you observe a person in whom only 40 of the alleles are heads, you can be very confident that this person is an African. Assuming the two populations are the same size, there will be about 15 or 16 times as many Africans as Europeans with 40 or fewer “heads.”
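
For anyone who wants to check the coin-flip arithmetic, here is a minimal sketch in Python. It assumes the 100 alleles are independent, that the two populations are the same size, and that the 55 and 50 percent figures from the example hold exactly. The helper name `tail_prob` is made up for illustration.

```python
from math import comb

def tail_prob(n, p, k_max):
    """Return P(X <= k_max) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_max + 1))

N_ALLELES = 100   # number of "coins" in the hypothetical
CUTOFF = 40       # we look at people with 40 or fewer heads

# Chance of seeing 40 or fewer heads in each (hypothetical) population.
p_african = tail_prob(N_ALLELES, 0.50, CUTOFF)
p_european = tail_prob(N_ALLELES, 0.55, CUTOFF)

# With equal-sized populations, this ratio is how many Africans there are
# per European among people with 40 or fewer heads.
ratio = p_african / p_european

print(f"P(<= {CUTOFF} heads | African)  = {p_african:.5f}")
print(f"P(<= {CUTOFF} heads | European) = {p_european:.5f}")
print(f"Africans per European in that tail: {ratio:.1f}")
```

Changing the cutoff or the two heads rates shows how quickly the ratio grows as the observed count moves further from the European average.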

But once you have confidently identified an African, and you want to predict whether the African has heads or tails on a particular allele, it is still a 50-50 guess. Or, I suppose you could say that in this particular instance, knowing that 40 of the alleles are heads, it is a 40-60 guess.

My point is that you can have a very high probability that someone is an African, conditional on a bunch of genetic characteristics. But at the same time, you can have a not-so-high probability that someone has a particular genetic characteristic, given that someone is an African. (For some characteristics, notably dark skin color, the probability that the African has that characteristic is very high. But the same need not be true for other characteristics.)
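
The two conditional probabilities in this argument can be put side by side in a few lines of Python. This is only a sketch of the hypothetical: it assumes independent alleles, the 50 and 55 percent heads rates, and a 50-50 prior between the two groups (the prior is my assumption, not something from the book). The function names are illustrative.

```python
from math import comb

def binom_pmf(n, p, k):
    """Return P(X == k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def posterior_african(k_heads, n=100, p_afr=0.50, p_eur=0.55, prior_afr=0.5):
    """Bayes' theorem: P(African | exactly k_heads heads out of n alleles)."""
    like_afr = binom_pmf(n, p_afr, k_heads)   # P(k heads | African)
    like_eur = binom_pmf(n, p_eur, k_heads)   # P(k heads | European)
    numerator = like_afr * prior_afr
    return numerator / (numerator + like_eur * (1 - prior_afr))

# Probability of group membership given the genetic profile: high.
p_group_given_genes = posterior_african(40)

# Probability of heads on any one allele given the group: a coin flip,
# straight from the setup of the hypothetical.
p_allele_given_group = 0.50

print(f"P(African | 40 heads of 100) = {p_group_given_genes:.3f}")
print(f"P(heads on one allele | African) = {p_allele_given_group:.2f}")
```

The first number depends on the assumed prior; the second does not, which is part of why the two can sit so far apart.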

The really loaded issues in race have to do with the probability of having a characteristic given that you belong to a race. So far, Wade has not told me anything that indicates that we know much about this. Instead, he tells me we know a sort of inverse probability, which is the probability of belonging to a race given that you have a particular set of genetic characteristics. If you know Bayes’ theorem, you know that one conditional probability need not be close to its inverse. I think that the hypothetical result given above should make one wary that there is a Bayesian sort of problem lurking in this book. Bear in mind that I am only part of the way through.

UPDATE: Wade apparently is out as NYT science writer. The reasons have not been disclosed, but I doubt that Bayes’ Theorem was a factor.

UPDATE 2: Commenters point out that Wade has been a former science editor at the NYT for quite some time, so perhaps the story of him being out is a misinterpretation.

11 thoughts on “Wading into Probability and Race”

  1. “Wade apparently is out as NYT science writer. The reasons have not been disclosed, but I doubt that Bayes’ Theorem was a factor”.

    I also saw that appendage yesterday, in the book excerpt that Wade wrote for time.com. It says that he is a former “editor” at the NYT, not a former journalist. So it could be that he will continue on as a writer of articles only, which is not an uncommon move for editors to make. So that he is “out as NYT science writer” hasn’t really been proven yet.

  2. Pure puritanism. This is what “science” means to the modern day Puritan. If you fire everyone who commits blasphemy, then the consensus that remains “proves” your doctrine. Your lying eyes (and factor analysis) are wrong because “Scientists” say so.

  3. A very similar argument could be made when discussing the genetic differences between humanity as a whole and a very close relative, either one of the extinct species in the genus Homo, or perhaps a close primate relative.

    Of course it’s not so simple as 100 coins.

    Consider the very few genes that code for blue eyes or blond or red hair. If you have these, you are extremely likely to have a Northern European origin – an ethnic determination even more granular than race, and based on many fewer genes than 100. One could say the same for East Asians and the few genes that generate the distinctive epicanthic fold. One could say the same for the few genes which would give an individual enough melanin for extremely dark skin, for which one could be very confident that the individual was no European or East Asian.

    Wade doesn’t adequately cover the progress made by geneticists in developing statistical, factor-analysis style techniques to solve this problem of heritage identification (not just for humans, but for all living things), and the remarkably accurate ability to separate out very close ethnic groups on the basis of a few key allele-cluster distinctions. Look at Razib Khan’s page and GNXP to see the graphs of the results in which one can easily see distinctive genetic splits between, for example, Han Chinese, Koreans, and Japanese.

    Indeed, one of the problems with trying to identify one’s ‘race’ is that the races are such broad and ancient categories that one requires multiple genes to accommodate every possibility of diversity within the race. So, blue eyes signals that one is most likely Northern European. But because Wade is considering the super-group of ‘Caucasians’, the gene for eye color is not enough, and one requires a much larger statistical cluster to determine category.

    So, let’s say there are ‘races’ X, Y, and Z. And now let’s say each of your 100 coins has two sides: Heads – occurs only in race X, Tails – shared by all races, but at different frequencies.

    If anyone has any coin that comes up Heads, they belong to X. But because lots of X’s also have Tails, you need all one hundred coins to be sure. But if you were to separate out the ‘race’ of X into ‘expressions’ X1, X2, … X100, then to determine each ‘expression’ group, you only need one gene, AND you know that the members of that group are of race X.

    • Your statements are a good starting point for someone like the top author who is just starting out. People who have no grasp of basic vocabulary words like haplotype ought to learn something before wading into the discussion.

  4. The clustering algorithms generally use SNPs (single nucleotide polymorphisms) and there can be many thousands on a single gene. The algorithms can cluster without knowing what the genes and SNPs actually do, and generally one does not know what genes were used.

  5. Yes. This is the central issue. Ancestry is now relatively easy to determine with modern genomic techniques. But complex traits like temperament or height are determined by so many genes that we can only show correlation at the population level. Except for simple traits like eye color or lactose tolerance, we don’t understand causation at the biological pathway level. This is why progressives view Wade’s book as pseudoscience dressing up racism, yet another rehash of The Bell Curve. See, for example, this Slate takedown of the book:
    http://www.slate.com/articles/health_and_science/science/2014/05/troublesome_inheritance_critique_nicholas_wade_s_dated_assumptions_about.html

    Which led to this twitter exchange where Matthew Yglesias first asks why someone would read the book, then responds when questioned saying that “my mind is pretty closed about scientific racism, yes”
    https://twitter.com/mattyglesias/status/465312436955123712

  6. Wade’s book says that genomic analysis can classify people into races. Races are scientific facts, not social constructs. Genomic analysis can even classify people further into smaller geographic clusters, e.g., Chinese, Japanese, and Koreans.

    It is ironic that the first that most educated Americans heard of this was from one of America’s most prominent African-American progressives. Henry Louis Gates has hosted a number of PBS series that claim to tell what a person’s ancestry is by analyzing their genome. The analysis wouldn’t work unless different groups had differences that could be objectively determined.

  7. Dr. Kling,
    I’ve been puzzling with this subject matter for a little while, starting with about the same knowledge base as you: genes, alleles, and so on. From my reading, I have come to understand the following:

    Throughout the genome, there are locations where a single nucleotide (A, C, G, or T) may, in at least 1% or so of the population, take a value different from the otherwise universal value, say an A where a C almost always occurs. These deviations from normal are called single nucleotide polymorphisms, or SNPs (“snips”) for short. There are hundreds of thousands of such locations in the genome. Because about 85% of the human genome consists of “junk DNA” that doesn’t code for anything, and because not all SNPs in coding portions of the DNA result in changes in the amino acids coded for, it would be wrong to call SNPs alleles, although alleles often arise from SNPs.

    It is now possible to check, say, 500,000 SNPs in the genome of a single individual. If those SNPs say that you are, say, 50% Han Chinese and 50% Caucasian, then you are, regardless of what your parents told you. These facts are why people can now send away tissue samples and get in return detailed information about their ancestors – not merely gross information such as continental race (African, East Asian, or Caucasian) but information about places and times where one’s ancestors were definitely present (because that’s where and when a well-known mutation occurred).

    Once you know your ancestral background (whether continental race or some finer subdivision), you may have considerable information about the distributions of inherent traits of which your particular traits are samples. If your ancestral group has an average IQ of 105 (like the East Asians) and yours is 120, then your IQ is about 1 standard deviation above the mean for people “like you” genetically. Applied to individuals, information about group means can be pretty weak soup, of course: Group differences can be much less than individual-to-individual differences within populations. However, when the outcomes achieved by groups as a whole are considered, small differences in the mean value of a trait – say aggression – can be very powerful, even though this difference might be swamped by the variation within groups. This is the basis for Mr. Wade’s claims about the effects of the evolution of personality traits after the races separated. Also, while Mr. Wade speaks softly when he says it, some intergroup differences are quite large.

    I hope this is helpful.

    Ken
