Polygenic scores

Charles Murray is bullish on them.

I think the application of genomic data to social science questions is roughly where aviation was in 1908. The world’s best plane, the Wright Flyer, was little more than a toy. Yet within a decade, thousands of acrobatically maneuverable aircraft were flying high and fast over the battlefields of Europe.

I will read his latest book, but I have already staked out a more skeptical position.

Plomin is optimistic that with larger sample sizes better polygenic scores will be found, but I am skeptical. Unless there are unexplored areas in the existing data sets, such as non-linearities or interaction effects, my guess is that there are diminishing returns to enlarging the sample size.

That refers to Robert Plomin and his book Blueprint, not to be confused with another recent book of the same title by a different author.

22 thoughts on “Polygenic scores”

  1. Although only an amateur gemologist, I must confess to being thoroughly intimidated by all writing on this topic. Steve Sailer’s glowing review says that the Murray book does a good job of covering the vast stretches of the literature for the interested reader, so it might be useful to read. Nevertheless, I too remain skeptical of the claims made on behalf of polygenic scores, given the general rot in research described this morning by Archer: https://www.jamesgmartin.center/2020/01/the-intellectual-and-moral-decline-in-academic-research/

  2. My guess is that prediction of particular phenotypes from genomic data will converge in accuracy to our estimates of the heritability of those features. That is, when we guess that some feature is 50% genetically determined by comparing kids to parents, using twin studies and so forth, we’ll eventually be able to use DNA code to predict an individual’s expression of that feature with similar accuracy.

    I think the real test of this will be height, which is easily measured, commonly measured, and, in well-nourished modernity, understood to be highly heritable and genetically determined, even if that determination is surprisingly complicated.

    With IQ, as far as I understand from casually looking at just a few of these studies, the trouble is they aren’t really collecting good IQ scores from participants in the study. Maybe there are some out there, and lots of people have high-stakes standardized test scores of some kind, but many of the studies aren’t using those either. The tests that evaluate teachers and schools without affecting kids’ grades are by definition low-stakes and thus not as reliable.

    Mostly they seem to be using proxies like ‘educational attainment’ as a stand-in for human capital, which is proxied further into simple ‘years in school’ or ‘highest degree reported’. That is not completely crazy, but it is still very, very rough and noisily correlated, and I suspect it is making it hard for researchers to come up with impressive results fast (a toy illustration of this attenuation follows below).

    The trouble is that if you focus on people with top-1% high-stakes test scores (and of similar ethnic background, at that, to avoid that major genetic confounder), you lower the sample size below what you probably need to get very clear and impressive results.
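
    On the noisy-proxy point: a toy attenuation example, with made-up numbers, of why a predictor looks weaker against a proxy than against the thing itself.

    ```python
    # A predictor that correlates ~0.5 with true IQ correlates noticeably less
    # with a noisy proxy of IQ such as years of schooling (here, r ~ 0.6 with IQ).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    iq = rng.normal(size=n)
    predictor = 0.5 * iq + np.sqrt(0.75) * rng.normal(size=n)  # r ~ 0.5 with IQ
    schooling = 0.6 * iq + 0.8 * rng.normal(size=n)            # proxy, r ~ 0.6 with IQ

    print(round(np.corrcoef(predictor, iq)[0, 1], 2))          # ~0.5
    print(round(np.corrcoef(predictor, schooling)[0, 1], 2))   # ~0.3, i.e. 0.5 * 0.6
    ```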

    • the trouble is they aren’t really collecting good IQ scores from participants in the study

      That in a nutshell is the tragedy of the Nurture-Only movement led by Gould and Lewontin; IQ testing became taboo if not outright banned.

    • My guess is that prediction of particular phenotypes from genomic data will converge in accuracy to our estimates of the heritability of those features. That is, when we guess that some feature is 50% genetically determined by comparing kids to parents, using twin studies and so forth, we’ll eventually be able to use DNA code to predict an individual’s expression of that feature with similar accuracy.

      I think there is an implicit assumption that genes will be randomly distributed, so that as your sample size gets larger you will observe sufficient numbers of all possible variations. But what if that isn’t the case? Suppose there are five genes; call them A, B, C, D, and E. A makes people tall. B makes people tall. C has no effect on height. D and E each make people short. There are five populations:

      1. Has none of the five genes: normal height.
      2. Has A and C: tall.
      3. Has B and C: tall.
      4. Has A and E: normal height.
      5. Has B and E: normal height.

      The statistical model is going to want to point to C as a gene that promotes height. It’s true that in this example the “fooled” statistical model works to predict height well enough, but my intuition is that this does not hold in general. That is, I suspect that you can only explain all of the heritability in a characteristic with a correct model, and a “fooled” model will not do so.
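
      A minimal numerical sketch of that intuition (the gene/population setup is the hypothetical one above; the additive model and the minimum-norm least-squares fit are illustrative assumptions of mine):

      ```python
      # Five hypothetical populations, genes coded 0/1 in the order A, B, C, D, E;
      # "tall" = 1, "normal height" = 0, matching the list above.
      import numpy as np

      X = np.array([
          [0, 0, 0, 0, 0],  # 1. none of the five genes -> normal
          [1, 0, 1, 0, 0],  # 2. A and C                -> tall
          [0, 1, 1, 0, 0],  # 3. B and C                -> tall
          [1, 0, 0, 0, 1],  # 4. A and E                -> normal
          [0, 1, 0, 0, 1],  # 5. B and E                -> normal
      ], dtype=float)
      y = np.array([0, 1, 1, 0, 0], dtype=float)

      # The design is rank-deficient (the data cannot tell A, B, C, and E apart),
      # so lstsq returns the minimum-norm least-squares coefficients.
      coef, *_ = np.linalg.lstsq(X, y, rcond=None)
      for gene, b in zip("ABCDE", coef):
          print(gene, round(b, 2))
      # Roughly: A 0.25, B 0.25, C 0.75, D 0.0, E -0.25. The model leans hardest
      # on C, which is causally inert, because C co-occurs with A or B only in
      # the tall populations; yet within these five populations the predictions
      # are exact, which is the "fooled but still predictive" point above.
      ```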

  3. In his WSJ article, Charles Murray says:

    social science has never been about causal pathways… it has been about explaining enough variation to achieve predictive validity.

    This is also the guiding principle of the applied sciences like medicine and engineering. Murray uses a famous engineering innovation as a reference:

    I think the application of genomic data to social science questions is roughly where aviation was in 1908. The world’s best plane, the Wright Flyer, was little more than a toy.

    The Wright Flyer was not a toy; it was a gigantic engineering leap. All innovations in aviation afterwards are incremental improvements by comparison. The Wright brothers depended completely on empirical data and field testing. They started with existing airfoil lift tables from Otto Lilienthal (derived empirically). Their first glider tests at Kitty Hawk (the hilly, windy location was another empirical innovation for these residents of Dayton, Ohio) failed miserably because Lilienthal’s tables were wrong, which led to another giant Wright brothers breakthrough, the miniature wind tunnel. They then solved the three-axis control problem and applied their painstaking model/wind-tunnel technique to design the propeller. Adding the low-power engine built by their shop mechanic, Charlie Taylor, was anticlimactic.

    Murray rightly appreciates empirical data that can “achieve predictive validity” but misses the boat (or plane, in this case) on toys vs. innovation. Like Kling, I’m skeptical of Murray’s bullishness on polygenic scores. The key assumption he makes is that the underlying variables are independent, rather than exhibiting the “complex interaction” between genes and the environment that he and Herrnstein are famous for describing. That assumption may prove to be true, partially true, or completely untrue, but I’d bet on something very close to completely untrue. I think we will find few if any cases where a collection of genes acts independently of environmental factors.

    • > The Wright Flyer was not a toy, it was a gigantic engineering leap.

      I realize I may be quibbling, but I think “toy” is an accurate and non-derogatory description of the Wright Flyer. It was a toy in the same sense that early PCs (like the Atari) or a catapult made from popsicle sticks are toys: they aren’t practically useful for much other than the intrinsic satisfaction derived from their use, but they make for an easy demonstration of an idea by reducing it down to its essence. In other words, a “toy” is just a “proof of concept”.

      >Like Kling, I’m skeptical of Murray’s bullishness on polygenic scores. The key assumption he makes is that the underlying variables are independent, rather than exhibiting the “complex interaction” between genes and the environment that he and Herrnstein are famous for describing. That assumption may prove to be true, partially true, or completely untrue, but I’d bet on something very close to completely untrue. I think we will find few if any cases where a collection of genes acts independently of environmental factors.

      I’m rather surprised at the skepticism. I feel like one of the themes of this blog is the perpetual inadequacy of present social sciences to make valid predictions on the basis of environmental factors. It seems quite possible that much of this gap in predictive power could be closed by looking at polygenic scores.

      (Also, I assume that polygenic analysis will be done in addition to the present environmental analysis, not as a replacement for it.)

      • I think we need to distinguish between technical innovation and product-market fit. The Wright Flyer represents a technical innovation similar to Watson and Crick’s discovery of the double-helix structure of DNA. There was a race to build the first heavier-than-air flying machine but, unlike DNA, all the other competitors were on dead-end paths. One interesting question is why it took humanity so long to get to the form factor of the bicycle. I’m not sure how long heavier-than-air flight would have remained elusive if it were not for the Wright brothers, but I’m quite sure that someone else would have come up with the structure of DNA within a few months or years of Watson and Crick. Early personal computers were toy computers compared to mainframes, and the early Intel/Motorola CPUs were functional jokes compared to IBM CPUs, but the integrated circuit (IC) was an incredible technical innovation relative to IBM’s circuit boards.

        It is not a general skepticism that underlies my assessment of polygenic scores, it is an informed estimate of how independent the nature and nurture components are. There is an upper limit to how independent the genetic variables are from the environmental variables. Height, which under normal conditions is purely genetic, has a 17% polygenic component that is independent. 20% seems like a reasonable upper limit for most attributes and I’d be shocked if behavioral attributes had more than a tiny fraction of this upper limit. I also agree with Mark Z’s fine assessment of the statistical challenges that he outlines below.

        This is a matter of predicting where research resources should go. Polygenic scoring is a very cool technology/technique, but I think there is more potential in traditional social science studies with a shift in focus toward measured IQ and peer environments within a timeline of human growth/development milestones. New/advanced tech isn’t always the best tech. I’ve been advocating for natural-gas-fired plants for power generation over nuclear power generation; same idea.

  4. Re: “blueprint”

    Kevin Mitchell’s Innate makes the point that “blueprint” is a terrible metaphor for what genes do. Genes do not code for a picture, even a 3-dimensional one. They do not code for a product. Rather, they code for a process, a process subject to a good deal of randomness. Sometimes the randomness cancels out and you get a canonical human. Sometimes it explodes. One identical twin may be a little weird while the other develops full-blown schizophrenia.

    • Kevin Mitchell’s Innate makes the point that “blueprint” is a terrible metaphor for what genes do.


      I think blueprints in a product factory are an almost perfect analogy, with three clarifications:

      1. DNA is a collection of blueprints (plural) not a single blueprint
      2. Each blueprint describes a core machine component, not an end-product
      3. Product machines self-assemble from these core components in an emergent fashion

      A cell is similar to a product factory where DNA is like a wall of blueprints for a set of core machine components. Each blueprint contains a small subsection with simple instructions on when to pull the required raw material from inventory; this is the only software in the system. The component room and its wall of blueprints has a small footprint compared to the inventory and product-machine sections of the factory. When a factory splits into two, it not only copies the wall of blueprints but also copies the state embedded in the inventory levels that triggers both blueprint production and emergent machine production. Blueprint production is deterministic, but factory production is emergent.

      • I feel fairly sure Mitchell would say it’s not a collection of smaller “blueprints”. It’s a very long and complicated sequence of processes, with many, many processes happening in parallel.

        • I have not read Innate. I fully realize that a non-specialist claiming to understand the genetic basis of life better than a world-renowned expert is arrogant, and if Mitchell fundamentally disagrees with my model then I fully embrace the label of contrarian crank who needs to learn humility, but I’m sticking to my guns until I understand why my model is technically wrong.

          The core part of the model is standard genetics. The emergent part is from my head, but, as The Who song says, “every idea in my head, someone else has said” probably applies, and I’m just too lazy/apathetic to look for prior art.

          The standard genetics part is that each blueprint is an Operon, a cluster of one or more genes that starts with a Promoter area indicating the starting point for transcription and an Operator that defines a handful of “procedures” (in the software sense) that regulate transcription (e.g., Repression or Attenuation). This builds upon, and is only slightly more complicated than, the standard one-gene==one-enzyme model. Enzymes are the “machine components” in my analogy, and humans have fewer than 20K of them. Since each blueprint/Operon can describe one or more enzymes, we have fewer than 20K components with which to build the complexity of all the specialized human cells that make up the body/mind. DNA also encodes RNA, so it’s possible that my model misses an aspect involving RNA that is procedural in nature.

          This model so far covers the deterministic aspects of genetics. The sequence of Operons is only important for replication. Each blueprint is transcribed independently and in parallel with the other blueprints that make up DNA.

          My expertise, if any, is in computing. I understand data structures, procedures, and state machines very well. None of the standard genetic explanations are satisfactory for me. It was only when I thought of the molecules that make up the cytoplasm as a state-machine that interacts with DNA through the Operon’s Operators that it clicked for me. I may be the only one and/or this model may represent a malfunctioning mind (an extreme preference as Bryan Caplan would say) but there you have it.
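
          To make the state-machine picture concrete, here is a toy sketch of how I mean it (the Operon/Operator/Repressor terms are the standard textbook ones, but the simplified threshold logic, the dict-as-cytoplasm, and the lac operon example are illustrative assumptions, not anything from Mitchell or a real simulator):

          ```python
          # Toy model: the cytoplasm is a state machine (a dict of molecule levels)
          # that interacts with DNA through each Operon's Operator.
          from dataclasses import dataclass

          @dataclass
          class Operon:
              name: str
              genes: list      # the enzymes ("machine components") this blueprint encodes
              repressor: str   # molecule that binds the Operator and blocks transcription
              inducer: str     # molecule that disables the repressor

              def transcribe(self, cytoplasm: dict) -> list:
                  """Read the blueprint only if the Operator is free, given the cytoplasm state."""
                  repressed = cytoplasm.get(self.repressor, 0) > 0
                  induced = cytoplasm.get(self.inducer, 0) > 0
                  if repressed and not induced:
                      return []            # Operator bound: no transcription
                  return list(self.genes)  # Promoter read: enzymes produced

          lac = Operon("lac", ["lacZ", "lacY", "lacA"], repressor="LacI", inducer="lactose")
          print(lac.transcribe({"LacI": 1}))                # [] -- repressed
          print(lac.transcribe({"LacI": 1, "lactose": 5}))  # ['lacZ', 'lacY', 'lacA']
          ```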

        • You understand coding better than I understand genetics so I can’t really comment. Mitchell’s book is interesting and well-written. I recommend it.

          I was really impressed by Jamie A. Davies’ Life Unfolding: How the Human Body Creates Itself. It is not so much about genetics as about the actual process of going from a fertilized egg to a body of a trillion specialized cells–which may be closer to what you are talking about.

          • I’ve ordered “Life Unfolding” and “Synthetic Biology: A Very Short Introduction”. Thanks for the recommendation.

  5. I think you’re underestimating the sheer extent of the statistical power problem in genomics. Obviously, returns usually start to diminish right away once you discover a few big mutations. But what’s remarkable, not just for complex heritable traits but also for somatic mutations in cancer, is just how much remains to be explained even after all the big ‘driver’ variants are known. It’s a fact that has disappointed cancer researchers, who expected that mutations in a fairly small set of genes would ultimately explain everything.

    So, I think there’s definitely lots of causality among the millions of individually less dramatic variants. And it’s not just that each variant itself is rare that makes its significance hard to discover; the effect size is also small (that is, a single variant may have a very small effect on the likelihood of carcinogenesis or on a trait like intelligence), so we may need extremely large sample sizes to isolate the effect.

    Perhaps I’m misunderstanding your critique, but it seems like you’re saying you question whether there’s much more causality to find that hasn’t already been found. I think there definitely is, though the sample sizes needed may in some cases be impossibly high. I think the main weakness of polygenic scores is that I suspect they’re prone to overdetermination, especially when machine learning type methods are used to develop them. It’s not fully appreciated by many how much ‘irrelevant’ structure there is in biological data that can confound pattern recognition.

    • Right. Larger sample sizes usually have diminishing returns, except in special cases where there are very many small effects adding up, and you need huge sample sizes to identify and isolate all the small-effect factors and make their influence statistically significant. In those cases, larger sample sizes have increasing returns: we might expect a revision of an inferred polygenic prediction model to go from 20 “+1” variants to those plus 200 more “+0.1” variants. The revised model is twice as powerful as the original, but impossible to figure out without a much higher n than was necessary to find the original 20.
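
      A rough sketch of the sample-size arithmetic behind that (a standard two-sample power approximation; the alpha, power, and effect-size values are just illustrative):

      ```python
      # Approximate per-group n needed to detect a standardized effect d with a
      # two-sample test at alpha = 0.05 (two-sided) and 80% power. The point is
      # the 1/d^2 scaling: a tenfold smaller effect needs roughly 100x the sample.
      from scipy.stats import norm

      def n_per_group(d, alpha=0.05, power=0.80):
          z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
          return 2 * (z / d) ** 2

      for d in (1.0, 0.1, 0.01):
          print(f"effect size {d}: ~{n_per_group(d):,.0f} per group")
      # Prints roughly 16, 1,570, and 157,000 per group.
      ```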

  6. I found this part of the Murray Op-Ed unconvincing:

    <>

    It’s not obvious to me that, based on our current R&D behavior, either finding would substantially “energize” any searches. Is that uncertainty about the precise contribution of genes vs. environment to IQ what is blocking progress now? Aren’t there people working on both of Murray’s scenarios?

    • Sorry, here is the snip from the WSJ piece:

      Suppose that, a few years from now, it has been solidly established that adolescents from disadvantaged backgrounds have IQ scores that average 10 points lower than their genetic potential would have led us to expect. Confident new knowledge of that kind will energize the search for effective childhood interventions in ways we can scarcely imagine.

      Suppose instead it is found that the adolescent IQ scores of children from disadvantaged backgrounds are about the same as we’d expect from their polygenic scores. That will provide an incentive to foster human flourishing for people with lesser abilities—an issue that has been criminally ignored in our era’s insistence that all the children can be above average.

    • Aren’t there people working on both of Murray’s scenarios?

      I’d argue that no one has been working on either version. The environmental focus continues to be on “nurture” rather than on peer influence, etc. Two to three generations’ worth of IQ data has not been collected, which means we are starting from scratch when it comes to determining “genetic potential”. We are working backwards from outcomes to estimate IQ.

  7. I think the genome is intrinsically non-linear. Two genes acting in concert may very well have consequences different from each one individually. The effects are not simple linear combinations.
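
    A tiny numerical illustration of that point (the two-variant XOR-style trait and the additive fit are made-up toy data, not anything from a real study):

    ```python
    # A pure gene-gene interaction that an additive model cannot see: the trait
    # appears only when exactly one of two variants is present.
    import numpy as np

    A = np.array([0, 1, 0, 1])
    B = np.array([0, 0, 1, 1])
    y = (A ^ B).astype(float)                  # 0, 1, 1, 0

    # Additive model: intercept + effect(A) + effect(B)
    X_add = np.column_stack([np.ones(4), A, B])
    coef_add, *_ = np.linalg.lstsq(X_add, y, rcond=None)
    print(coef_add)                            # ~[0.5, 0, 0]: each gene's effect looks like zero

    # With the A*B interaction term included, the trait is explained exactly
    X_int = np.column_stack([np.ones(4), A, B, A * B])
    coef_int, *_ = np.linalg.lstsq(X_int, y, rcond=None)
    print(np.round(X_int @ coef_int))          # [0. 1. 1. 0.]
    ```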
