The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Quantitative Genetics


Interesting piece in The Wall Street Journal, which could have been cribbed from David Epstein’s The Sports Gene (a very good book, I might add): NBA Basketball Runs in the Family (if you go to Google News and search for the title it should come up and you can get a free copy):

According to a Wall Street Journal analysis of biographical data on every NBA player, 48.8% are related to current or former elite athletes—defined as anyone who has played a sport professionally, in the NCAA or at national-team level. While other leagues feature notable dynasties—the Manning’s of the NFL or the Griffey’s in baseball—only about 17.5% of NFL players and 14.5% of MLB players are related to other elite athletes, based on a similar study.

The connectedness in the NBA likely comes down to the importance of height in elite basketball. The average NBA player is about 6-feet, 6-inches tall, which is 11 inches taller than the average American male, according to Census data.

As the piece indicates, what you are seeing is not the 10,000 hour rule as a secret passed down within families. If you are not very tall it is unlikely that 10,000 hours of practice will result in you becoming a professional athlete in the NBA. The article emphasizes that the enrichment of those with relatives who had played in the NBA is far greater than in the NFL or MLB, but please note that the average person’s odds of entering any professional sport are infinitesimal. Well, not quite, but the odds are low.

The piece in The Wall Street Journal is valuable for the added data, but there are a few conceptual aspects which I’m not satisfied with. Researchers have known for decades that most of the variation in height in non-malnourished societies is due to variation in genetics. 80 percent heritability is conservative. This can lead to some confused intuitions though. The correlation between siblings is high, but not that high, in the range of ~0.50. That translates to an average difference in height of nearly two inches.
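To make the step from a sibling correlation to an average height gap concrete, here is a minimal sketch. It assumes height is normally distributed, that both siblings are of the same sex, and that the population standard deviation is somewhere between 2.5 and 3 inches; those numbers and the function name are my assumptions for illustration, not figures from the article.

```python
import math

def expected_abs_sibling_difference(sd_inches, sibling_r):
    """Mean absolute difference between two siblings for a normally distributed trait.

    Assumes both siblings are drawn from the same normal distribution with
    standard deviation `sd_inches` and that their values correlate at `sibling_r`.
    """
    # The difference of two correlated normal variables has variance 2 * sd^2 * (1 - r).
    sd_of_difference = sd_inches * math.sqrt(2.0 * (1.0 - sibling_r))
    # For a zero-mean normal variable D, E|D| = sd(D) * sqrt(2 / pi).
    return sd_of_difference * math.sqrt(2.0 / math.pi)

# Assumed population standard deviations for adult height, in inches.
for sd in (2.5, 3.0):
    print(sd, round(expected_abs_sibling_difference(sd, 0.5), 2))
```

Under these assumptions the expected gap comes out at roughly 2.0 to 2.4 inches, which is in line with the "nearly two inches" above. A correlation of 1.0 would drive the gap to zero, and a correlation of 0 would give the gap between two random people.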

In other words, parental or sibling success in the NBA is not destiny. On the contrary. Nearly half of current players may have had relatives who played in the NBA, but most of the people who have relatives who played in the NBA did not themselves play in the NBA. But obviously having such relatives is incredibly predictive of much higher than normal odds (orders of magnitude greater!) of becoming a professional.

Why? As noted in the article, NBA players need to have an intersection of traits which are very deviated from the norm. The range restriction on height, with “very short” players being mildly above the human male median, shrinks the pool of potential candidates a lot. Fourteen years ago James F. Crow wrote Unequal by Nature: A geneticist’s perspective on human differences. Crow observes “that whenever a society singles out individuals who are outstanding or unusual in any way, the statistical contrast between means and extremes comes to the fore.” As it happens being a professional basketball player is not just about height; one needs to also be athletic, and exhibit a modicum of agility and skill. At the collegiate level there are many relatively tall players, but most of them do not have the skill level of an NBA player. The best-of-the-best have often been NBA players who combine great height with high skill levels (e.g., LeBron James, Magic Johnson, and Kevin Garnett; Michael Jordan, a few inches shorter than James, had greater skill, but he is close to the NBA median).

The article also highlights the fact that individual humans often want to attribute their own success to their hard work, or choices their parents made. Many of the players interviewed did not deny the importance of their size and athletic endowments, but emphasized the importance of learned work ethic and competition with family members of similar skill levels and physique. This illustrates two other aspects of quantitative genetics: gene-environment interaction and gene-environment correlation. Obviously these are real phenomena. But are they really relevant for an NBA player?

David Robinson grew up in a middle class family (his father was an engineer). He scored 1320 on the pre-recentered SAT (that puts his IQ well above two standard deviations) and majored in mathematics at the Naval Academy. Robinson’s non-basketball activities were, and are, copious (and not in a Dennis Rodman fashion).

He was not initially very good at basketball in secondary school, but underwent a massive growth spurt in his late teens. Eventually he became a standout basketball player at the collegiate level, and went on to a storied career (after serving some time in the navy). My point with recounting this is that even someone like David Robinson, who had many alternative paths, talents, and opportunities, and evinced no burning desire to become a basketball player at all costs, became a professional. Why? Because his raw talent was clear, and the reality is that becoming a professional basketball player is highly lucrative. The average NBA player earns millions. Even a washout player can earn millions in one year.

So we are at this point moving from the domain of quantitative genetics, to economics. Incentives matter. Millions of young people delude themselves into thinking they have a chance. The reality is that even someone like Jeff Hornacek, perhaps a mascot for those who argue that work ethic can match talent, is not physically typical (he’s 6’4). And, let’s be honest, work ethic matters a lot, but it too is heritable (mediated through conscientiousness). Wheels within wheels….

• Category: Science • Tags: Basketball, Quantitative Genetics 

This year at ASHG one of the most fascinating talks was Po-Ru Loh’s, where he reviewed the BOLT-REML method. It’s introduced in the paper, Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis. As you likely know many diseases such as schizophrenia manifest as complex traits; that is, they’re basically quantitative in their genetic architecture. Lots of alleles in the population, at varied frequencies (e.g., an allele might be low frequency and large effect, or higher frequency and smaller effect). In the abstract they state that “We also observe significant enrichment of heritability in GC-rich regions and in higher-frequency SNPs for both schizophrenia and GERA diseases.” In other words, they’re getting toward the holy grail of these sorts of studies, actually fixing upon likely loci which explain the variation.

But the genesis of these methods goes back to the late 2000s, when some statistical geneticists began to synthesize the power of genomics with classical quantitative genetic frameworks and insights. Another paper which sums up this tradition is Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index. That is, the authors have confirmed the classical heritability estimates for height, which rested on inferences such as twin studies, with genomic methods. Many geneticists operating just outside this field are totally unaware of the power, precision, and rapidity of advance of this set of techniques. If you are one of them, I suggest you read A Commentary on ‘Common SNPs Explain a Large Proportion of the Heritability for Human Height’ by Yang et al. (2010) (ungated). Here is the final paragraph:

Why have we encountered so much apparent misunderstanding of the methods and results in the human genetics community? The core of our method is heavily steeped in the tradition of prediction of random effects and the estimation of variance due to random (latent) effects. While estimation and partitioning of variance has a long history in human genetics, in particular in twin research, the prediction of random effects is alien to many human geneticists and, surprisingly, also to statisticians (Robinson, 1991). Another reason could be the simultaneous use of population genetics and quantitative genetics concepts and theory in our paper, since these are usually applied in different applications, e.g., gene mapping or estimation of heritability. All concepts and methods that we used are extensively described in the textbooks by Falconer and Mackay (1996; chapters 1, 3, 4, 7–10) and Lynch and Walsh (1998; chapters 4, 7, 26, 27).

Please, if you read anything on this blog, read this.

• Category: Science • Tags: Genomics, Quantitative Genetics 


The above figure is from a paper in Proceedings B which shows a Dutch data set from right after World War II. Controlling for several variables, taller men and average height women have maximal fertility. The authors contrast this with results from the United States, where it seems that shorter women and average height men have maximal fertility. This is kind of a big deal. The reasons why the Dutch, who were the shortest Europeans two centuries ago, are the tallest nation in the world today, have been a matter of public discussion for over ten years (see this article in The New Yorker).

In the 19th century American whites were far taller than Europeans. European elites who toured the United States were reputedly shocked by the fact that American yeoman farmers were no shorter than them, as was the norm among the peasant classes in their lands of origin. This lack of size differential due to surplus of land in the early American republic was often compared with the relative social egalitarianism of the United States, along with its broader democratic ethos.

Obviously things have changed since the 19th century. The Malthusian conditions which ground down Dutch peasants in the 18th century no longer applied in the 20th century, and definitely not in the 21st century. Modern agricultural techniques mean that Northern Europeans are no longer nutritionally constrained. Not only that, but one could argue that today Northern European societies are more egalitarian than the United States. Naturally there has been a focus on the environmental factors which might have shaped this difference in the distribution of heights between Northern Europeans and American whites of Northern European heritage.

But there are some biological issues which are likely relevant. First, average human size, including height, actually peaked in the wake of the Last Glacial Maximum, ~20,000 years ago. Some of the later decline is likely due to the nutritional changes enforced by the Neolithic Revolution, but the decrease in size predates that. Likely standard dynamics common to mammals, such as Bergmann’s rule, have also affected humans. The recent increases in height across the developed world have still not produced a population as imposing as that of late Pleistocene humans. Second, over the past few years plenty of genomic work has argued for selection on height in Europe, explaining why there are small but persistent differences between populations in the north and south. Ancient DNA analysis has now broadly confirmed this result, as populations diverged in size due to local ecological pressures.

These new results suggest that selection is driving change in the frequencies of alleles which affect height even today among the Dutch. The methods were pre-genomic. Basically they tracked fertility of individuals along with a bunch of variables, including height. There was no need to go into genomic details because there is a wide body of research which indicates that 80-90% of the variation in height in developed societies is controlled by variation in genes. In other words, height is a highly heritable trait. As per the breeder’s equation all you need to change a trait value for a highly heritable trait is selection:

Selection × Heritability = Response to selection

If there is selection but no heritability, then there is no response. If there is heritability but not selection, then there is no response. In this case the heritability is well known, and now they have shown selection in the Dutch population as an implication of differential fertility that tracks this heritable variation.
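A tiny numerical sketch of the breeder’s equation may help. The heritability and selection differential below are made-up illustrative values, not estimates from the Dutch paper, and the function name is my own. The selection differential here means the gap between the trait mean of the parents, weighted by how many children they have, and the mean of the whole population.

```python
def response_to_selection(heritability, selection_differential):
    """Breeder's equation: expected shift in the offspring mean per generation."""
    return heritability * selection_differential

# Hypothetical inputs: heritability of 0.8, parents averaging 0.5 cm above the mean.
print(response_to_selection(0.8, 0.5))   # selection and heritability: the mean shifts
print(response_to_selection(0.0, 0.5))   # selection without heritability: no response
print(response_to_selection(0.8, 0.0))   # heritability without selection: no response
```

With these hypothetical inputs the expected shift is 0.4 cm per generation, and either zero case returns no response at all.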

This framework is true for quantitative traits more generally. I wouldn’t be surprised to see that there is a fair amount of evolution going on in modern human populations, which large and robust data sets might be able to capture in the near future.

Citation: Does natural selection favour taller stature among the tallest people on earth? Gert Stulp, Louise Barrett, Felix C. Tropf, Melinda Mills, Proc. R. Soc. B: 2015 282 20150211; DOI: 10.1098/rspb.2015.0211. Published 8 April 2015

• Category: Science • Tags: Quantitative Genetics 


IBD plays a big part in my understanding of inheritance. I don’t mean inflammatory bowel disease. Nor do I mean isolation by distance. I’m talking identity by descent. Assuming your parents are “unrelated” then you are identical by descent with your sibling across some portion of your genome. You inherit identical segments from your parents, though due to recombination they will usually be non-identical at least across some part of the chromosome. Because of the law of segregation you should overlap 25% with your full sibling on the copy of the genes inherited from your mother and father (double that, and you get 50%). But this is an expected value. As it happens many siblings are not exactly 50% (e.g., I know of full siblings who share 40% of their genomes identical by descent from their parents). In the pre-genomic age this detail about variation was elided because usually you couldn’t precisely estimate the identity by descent. Rather, you just assume that you share 1/2 your genome with your full sibling, 1/4 with a half sibling or aunt/uncle or grandparent, 1/8 with your first cousin, and so forth.
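To see why realized sharing scatters around the expected 50%, here is a rough simulation sketch. Everything in it is a simplifying assumption of mine: the chromosome lengths are illustrative round numbers rather than a reference genetic map, crossovers are placed as a simple Poisson process with no interference and no sex differences, and only the 22 autosomes are modeled.

```python
import random
import statistics

# Rough, illustrative autosome lengths in Morgans (assumed values, not a reference map).
CHROMOSOME_MORGANS = [2.8, 2.6, 2.2, 2.1, 2.0, 1.9, 1.9, 1.7, 1.7, 1.8, 1.6,
                      1.7, 1.3, 1.2, 1.3, 1.3, 1.3, 1.2, 1.1, 1.1, 0.6, 0.7]

def crossover_positions(length, rng):
    """Crossover points on one chromosome, drawn as a Poisson process (rate 1 per Morgan)."""
    positions = []
    position = rng.expovariate(1.0)
    while position < length:
        positions.append(position)
        position += rng.expovariate(1.0)
    return positions

def shared_length_one_parent(length, rng):
    """Length over which two siblings inherit the same copy from one parent."""
    # Each gamete starts on a random one of the parent's two chromosome copies,
    # so the siblings match at the start of the chromosome half the time.
    same_copy = rng.random() < 0.5
    # Every crossover in either gamete flips whether the siblings match.
    breakpoints = sorted(crossover_positions(length, rng) + crossover_positions(length, rng))
    shared, previous = 0.0, 0.0
    for point in breakpoints:
        if same_copy:
            shared += point - previous
        same_copy = not same_copy
        previous = point
    if same_copy:
        shared += length - previous
    return shared

def sibling_ibd_fraction(rng):
    """Fraction of the two inherited genome copies that a sibling pair shares IBD."""
    total = 2.0 * sum(CHROMOSOME_MORGANS)
    shared = sum(shared_length_one_parent(length, rng)
                 for length in CHROMOSOME_MORGANS
                 for _parent in range(2))
    return shared / total

rng = random.Random(0)
fractions = [sibling_ibd_fraction(rng) for _ in range(2000)]
print("mean:", round(statistics.mean(fractions), 3))
print("sd:  ", round(statistics.stdev(fractions), 3))
print("min: ", round(min(fractions), 3), "max:", round(max(fractions), 3))
```

The mean should sit close to 0.5, while individual pairs land several points above or below it, which is the pattern described above. Because the map and the crossover model are so crude, the simulated spread is only indicative.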

Genomics has changed that. I can tell you for example that my son is ~20% identical by descent with one of his grandfathers. And, more surprisingly, he’s 18.9% identical by descent with one of his great-aunts! If expectation held his great-aunt should be 1/2 as related to him as his grandfather, but expectation did not hold. The figure above is from a review, Relatedness in the post-genomic era: is it still useful?:

Relatedness is a fundamental concept in genetics but is surprisingly hard to define in a rigorous yet useful way. Traditional relatedness coefficients specify expected genome sharing between individuals in pedigrees, but actual genome sharing can differ considerably from these expected values, which in any case vary according to the pedigree that happens to be available. Nowadays, we can measure genome sharing directly from genome-wide single-nucleotide polymorphism (SNP) data; however, there are many such measures in current use, and we lack good criteria for choosing among them. Here, we review SNP-based measures of relatedness and criteria for comparing them. We discuss how useful pedigree-based concepts remain today and highlight opportunities for further advances in quantitative genetics, with a focus on heritability estimation and phenotype prediction.

If you have academic access, you should read it. If you don’t, they seem to be proposing that we move beyond the confusing concept of identity by descent, and just think in terms of a coalescent framework. It does strike me that classical IBD-thinking is a historical contingency of genetics’ emergence in part in an age where pedigrees were very prevalent tools in interrogating patterns of inheritance. All for the good. But for non-geneticists I would suggest that these new methods which are able to pinpoint with fine precision patterns of genetic variation across pedigrees will allow us to explore in much more detail the nature of the heritability of many quantitative traits.

• Category: Science • Tags: Quantitative Genetics 

Sir Francis Galton

Modern evolutionary genetics owes its origins to a series of intellectual debates around the turn of the 20th century. Much of this is outlined in Will Provine’s The Origins of Theoretical Population Genetics, though a biography of Francis Galton will do just as well. In short what happened is that during this period there were conflicts between the heirs of Charles Darwin as to the nature of inheritance (an issue Darwin left muddled from what I can tell). On the one side you had a young coterie around William Bateson, the champion of Gregor Mendel’s ideas about discrete and particulate inheritance via the abstraction of genes. Arrayed against them were the acolytes of Charles Darwin’s cousin Francis Galton, led by the mathematician Karl Pearson, and the biologist Walter Weldon. This school of “biometricians” focused on continuous characteristics and Darwinian gradualism, and are arguably the forerunners of quantitative genetics. There is some irony in their espousal of a “Galtonian” view, because Galton was himself not without sympathy for a discrete model of inheritance!

William Bateson

In the end science and truth won out. Young scholars trained in the biometric tradition repeatedly defected to the Mendelian camp (e.g. Charles Davenport). Eventually, R. A. Fisher, one of the founders of modern statistics and evolutionary biology, merged both traditions in his seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance. The intuition for why Mendelism does not undermine classical Darwinian theory is simple (granted, some of the original Mendelians did seem to believe that it was a violation!). Many discrete genes of moderate to small effect upon a trait can produce a continuous distribution via the central limit theorem. In fact classical genetic methods often had difficulty perceiving traits with more than half a dozen significant loci as anything but quantitative and continuous (consider pigmentation, which we know through genomic methods to vary across populations mostly due to half a dozen segregating genes or so).
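The central limit intuition is easy to check with a toy simulation. This sketch assumes the simplest possible model, which is mine and not Fisher’s full treatment: every locus has two alleles at equal frequency, each “plus” allele adds one unit to the trait, loci are independent, and there is no environmental noise.

```python
import random
import statistics

def simulate_trait(n_loci, n_individuals, allele_frequency=0.5, seed=0):
    """Trait value per individual = count of 'plus' alleles across all loci."""
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        # Two allele copies per locus, each a 'plus' allele with the given frequency.
        plus_alleles = sum(rng.random() < allele_frequency for _ in range(2 * n_loci))
        traits.append(plus_alleles)
    return traits

for n_loci in (1, 6, 100):
    values = simulate_trait(n_loci, 5000)
    mean, sd = statistics.mean(values), statistics.pstdev(values)
    within_one_sd = sum(abs(v - mean) <= sd for v in values) / len(values)
    print(n_loci, "loci:", len(set(values)), "distinct trait values,",
          round(within_one_sd, 2), "within one SD")
```

With one locus there are only three classes, the textbook Mendelian picture. With a hundred loci there are dozens of closely spaced classes and the distribution takes on the familiar bell shape. Add a little environmental noise and the classes blur into a continuum.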

Notice here I have not said a word about DNA. That is because 40 years before the understanding that DNA was the substrate of genetic inheritance scientists had a good grasp of the nature of inheritance through Mendelian processes. The gene is fundamentally an abstract unit, an analytic element subject to manipulation which allows us to intelligibly trace and predict patterns of variation across the generations. It so happens that the gene is instantiated in a material sense through sequences of the biomolecule DNA. This is very important. Because we know the material basis of modern genetics it is a much more fundamental science than economics (economics remains mired in its “biometric age!”).

The “post-genomic era” is predicated on industrial scale analysis of the material basis of genetics in the form of DNA sequence and structure. But we shouldn’t confuse DNA, concrete bases, with classical Mendelism. A focus on the material and concrete is not limited to genetics. In the mid-2000s there was a fad for cognitive neuroscience fMRI studies, which were perceived to be more scientific and convincing than classical cognitive scientific understandings of “how the mind works.” In the wake of the recession of fMRI “science” due to serious methodological problems we’re left to fall back on less sexy psychological abstractions, which may not be as simply reduced to material comprehension, but which have the redeeming quality of being informative nonetheless.

This brings me to the recent paper on SNPs associated with education in a massive cohort, GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment. You should also read the accompanying FAQ. The bottom line is that the authors have convincingly identified three SNPs to explain 0.02% of the variation in educational attainment across their massive data set. Pooling all of the SNPs with some association they get ~2% of the variation explained. This is not particularly surprising. A few years back one of the authors on this paper wrote Most Reported Genetic Associations with General Intelligence Are Probably False Positives. Those with longer memories in human genetics warned me of this issue in the early 2000s. More statistically savvy friends began to warn me in 2007. At that point I began to caution people who assumed that genomics would reveal the variants which are responsible for normal variation on intelligence, because it seemed likely that we might have to wait a lot longer than I had anticipated. As suggested in the paper above previous work strongly implied that the genetic architecture of intelligence is one where the variation on the trait in the normal range is controlled by innumerable alleles of small effect segregating in the population. Otherwise classical genetic techniques may have been able to detect the number of loci with more surety. If you read Genetics of Human Populations you will note that using classical crossing techniques and pedigrees geneticists did in fact converge upon approximately the right number of loci segregating to explain the variation between European and African pigmentation 60 years ago!

Some of my friends have been arguing that the small effect sizes here validate the position that intelligence variation is mostly a function of environment. This is a complicated issue, and first I want to constrain the discussion to developed Western nations. Ironically, intelligence is arguably most heritable among the most privileged. By heritable I mean the component of variation of the trait controlled by genes. When you remove environmental variation (i.e. deprivation) you are left with genetic variation. Within families there is a great deal of I.Q. difference across siblings. The correlation is about 0.5. Not bad, but not that high. Of course some of you may think that I’m going to talk about twin studies now. Not at all! Classical techniques have been to a great extent validated by genomics, contrary to what science journalists who seem to enjoy engaging in malpractice (like Brian Palmer of Slate) seem to think, but it is by looking at unrelated individuals that some of the most persuasive evidence for the heritability of intelligence has been established. It is no coincidence that one of the major authors of the above study also is an author on the previous link. There is no contradiction in acknowledging difficulties of assessing the concrete material loci of a trait’s variation even if one can confidently infer that association. There was genetics before DNA. And there is heritability even without specific SNPs.

Additionally, I want to add one caveat about the “environmental” component of variation. For technical reasons this environmental component may actually include relatively fixed biological variables. Gene-gene interactions, or developmental stochasticity come to mind. Though these are difficult or impossible to predict from parent to offspring correlations they are not as simple as removing lead from the environment of deprived children. My own suspicion is that the large variation in intelligence across full siblings tells us a lot about the difficult to control and channel nature of “environmental” variation.

Finally, I want to point out that even small effect loci are not trivial. The authors mention this in their FAQ, but I want to be more clear, Small genetic effects do not preclude drug development:

Consider a trait like, say, cholesterol levels. Massive genome-wide association studies have been performed on this trait, identifying a large number of loci of small effect. One of these loci is HMGCR, coding for HMG-CoA reductase, an important molecule in cholesterol synthesis. The allele identified increases cholesterol levels by 0.1 standard deviations, meaning a genetic test would have essentially no ability to predict cholesterol levels. By the logic of the Newsweek piece, any drug targeted at HMGCR would have no chance of becoming a blockbuster.

Any doctor knows where I’m going with this: one of the best-selling groups of drugs in the world currently are statins, which inhibit the activity of (the gene product of) HMGCR. Of course, statins have already been invented, so this is something of a cherry-picked example, but my guess is that there are tens of additional examples like this waiting to be discovered in the wealth of genome-wide association study data. Figuring out which GWAS hits are promising drug targets will take time, effort, and a good deal of luck; in my opinion, this is the major lesson from Decode (which is not all that surprising a lesson)–drug development is really hard
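The claim that a 0.1 standard deviation allele has almost no predictive value can be checked with the standard formula for the variance explained by a single additive locus. This is a sketch under assumptions of my own: Hardy-Weinberg proportions, a purely additive effect measured in trait standard deviations per allele copy, and allele frequencies that are hypothetical since the quoted passage does not give one.

```python
def variance_explained(allele_frequency, effect_in_sd):
    """Fraction of trait variance explained by one additive biallelic locus.

    Assumes Hardy-Weinberg proportions; `effect_in_sd` is the per-allele effect
    in units of the trait's standard deviation.
    """
    genotype_variance = 2.0 * allele_frequency * (1.0 - allele_frequency)
    return genotype_variance * effect_in_sd ** 2

# Hypothetical allele frequencies; 0.5 gives the largest possible value.
for p in (0.1, 0.3, 0.5):
    print(p, f"{variance_explained(p, 0.1):.2%}")
```

Even in the most favorable case, an allele frequency of one half, a 0.1 standard deviation effect accounts for only about half a percent of the variance. That is useless for prediction, and yet it says nothing about how large an effect a drug hitting the same gene product can have.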

Addendum: Most of my friends, who have undergraduate backgrounds in biology, and have taken at least some quantitative genetics, seem to guess the heritability of I.Q. to be 0.0 to 0.20. This is just way too low. But is it even important to know this? I happen to think an accurate picture of genetic inheritance is probably useful when assessing prospects of mates….

Citation: Rietveld, Cornelius A., et al. “GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment.” Science (New York, NY) (2013).


In part, genes. Luke Jostins reported this from a conference last year, so not too surprising. Evidence of widespread selection on standing variation in Europe at height-associated SNPs. Let me jump to the summary:

In summary, we have provided an empirical example of widespread weak selection on standing variation. We observed genetic differences using multiple populations from across Europe, thereby showing that the adult height differences across populations of European descent are not due entirely to environmental differences but rather are, at least partly, genetic differences arising from selection. Height differences across populations of non-European ancestries may also be genetic in origin, but potential nongenetic factors, such as differences in timing of secular trends, mean that this inference would need to be directly tested with genetic data in additional populations. By aggregating evidence of directionally consistent intra-European frequency differences over many individual height-increasing alleles, none of which has a clear signal of selection on its own, we observed a combined signature of widespread weak selection. However, we were not able to determine whether this differential weak selection (either positive or negative) favored increased height in Northern Europe, decreased height in Southern Europe or both. One possibility is that sexual selection or assortative mating (sexual selection for partners in similar height percentiles) fueled the selective process. It is also possible that selection is not acting on height per se but on a phenotype closely correlated with height or a combination of phenotypes that includes height.

Two points of note. First, simulations suggested that the genetic architecture is unlikely to be due to drift alone. In other words, natural selection. Selection on quantitative traits isn’t magic, there’s a whole agricultural industry based around this phenomenon. For the purposes of understanding human evolution the key is that we are now moving beyond looking for traits which emerged due to novel mutations (e.g., lactase persistence), and now trying to understand how selection and drift may work on standing variation. For example, humans have become smaller in overall size, and also in cranial capacity, over the past 10,000 years. Second, they validated their findings using a sibling cohort. This is something I always look for when people make inter-population inferences. A number of population wide correlations don’t pan out when you are looking within families. This matters in trying to understand causation.

• Category: Science • Tags: Height, Quantitative Genetics 

According to the reader survey 88 percent said they understood what heritability was. But only 34 percent understood the concept of additive genetic variance. For the purposes of this weblog it highlights that most people don’t understand heritability in the technical sense, but rather heritability in the colloquial sense. The former is the definition of heritability which I use on this weblog; the latter is a synonym for inheritance, biological and cultural. Almost everyone who understands the technical definition of heritability will know what heritability in the ‘narrow sense’ is, often just informally termed heritability itself. It is the proportion of phenotypic variability that can be attributed to additive genetic variation. Those who understand both additive genetic variance and heritability in the survey were 32 percent of readers. If you understand heritability in the technical manner you have to understand additive genetic variance. This sets the floor for the number who truly understand the concept in the way I use it on this weblog (I suspect some people who were exceedingly modest, and who basically understand the concept for ‘government purposes’, put themselves in the ‘maybe’ category). After nearly 10 years of blogging (the first year or so of which I myself wasn’t totally clear on the issue!) that’s actually a pretty impressive proportion. You take what you can get.
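For readers who prefer symbols, the distinction can be written down compactly. The notation below is the usual textbook convention rather than anything specific to this weblog, and it assumes no covariance or interaction between genes and environment: V_P is total phenotypic variance, V_A additive genetic variance, V_D dominance variance, V_I epistatic (gene-gene interaction) variance, and V_E environmental variance.

```latex
V_P = V_A + V_D + V_I + V_E
\qquad
h^2 = \frac{V_A}{V_P}
\qquad
H^2 = \frac{V_A + V_D + V_I}{V_P}
```

Here h^2 is heritability in the narrow sense, the quantity discussed above, and H^2 is heritability in the broad sense. Since V_D and V_I cannot be negative, h^2 can never exceed H^2.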

• Category: Science • Tags: Heritability, Quantitative Genetics 

That’s the question a commenter poses, albeit with skepticism. First, the background here. New England was a peculiar society for various demographic reasons. In the early 17th century there was a mass migration of Puritan Protestants from England to the colonies which later became New England because of their religious dissent from the manner in which the Stuart kings were changing the nature of the British Protestant church.* Famously, these colonies were themselves not aiming to allow for the flourishing of religious pluralism, with the exception of Rhode Island. New England maintained established state churches longer than other regions of the nation, down into the early decades of the 19th century.

Between 1630 and 1640 about 20,000 English arrived on the northeastern fringe of British settlement in North America. With the rise of co-religionists to power in the mid-17th century a minority of these emigres engaged in reverse-migration. After the mid-17th century migration by and large ceased. Unlike the Southern colonies these settlements did not have the same opportunities for frontiersmen across a broad and ecologically diverse hinterland, and their cultural mores were decidedly more constrained than those of the cosmopolitan Middle Atlantic. The growth in population in New England from the low tens of thousands to close to 1 million in the late 18th century was one of endogenous natural increase from the founding stock.

This high fertility regime persisted down into the middle of the 19th century, as the core New England region hit its Malthusian limit, and flooded over into upstate New York, to the irritation of the older Dutch population in that region. Eventually even New York was not enough, and New England swept out across much of the Old Northwest. The last became the “Yankee Empire,” founded by Yankees, but later demographically supplemented and superseded in its western reaches by immigrants from northwest Europe who shared many of the same biases toward order and moral probity which were the hallmarks of Yankees in the early Republic.

While the Yankees were waxing in numbers, and arguably cultural influence, the first decades of the American Republic also saw the waning of New England power and influence in relation to the South in the domain of politics. This led even to the aborted movement to secede from the union by the New England states in the first decade of the 19th century. By the time of Andrew Jackson an ascendant Democratic configuration which aligned Southern uplanders and lowlanders with elements of the Middle Atlantic resistant to Yankee cultural pretension and demographic expansion would coalesce and dominate American politics down to the Civil War. It is illustrative that one of the prominent Northern figures in this alliance, President Martin Van Buren, was of Dutch New York background.

But this is a case where demographics was ultimate destiny. Not only were the Yankees fecund, but immigrants such as the German liberals fleeing the failures of the tumult of 1848 (e.g., Carl Schurz) were aligned with their anti-slavery enthusiasms (though they often took umbrage at the anti-alcohol stance of the Puritan moralists of the age, familiarizing the nation with beer in the 1840s). The Southern political ascendancy was simply not tenable in the face of Northern demographic robustness, fueled by both fertility and immigration. Because of overreach on the part of the Southern elite the segments of the Northern coalition which were opposed to the Yankees eventually fractured (Martin Van Buren allowed himself to be the candidate for the anti-slavery Free Soil party at one point). Though there remained Northern Democrats down to the Civil War, often drawn from the “butternuts” whose ultimate origins were in the Border South, that period saw the shift in national politics from Democrat to Republican dominance (at least up to the New Deal). Curiously, the coalition was an inversion of the earlier coalition, with Yankees now being integral constituents in a broader Northern and Midwestern movement, and Southerners being marginalized as the odd-men-out.

I review all this ethno-history because I think that to a great extent it is part of the “Dark Matter” of American political and social dynamics. Americans are known as “Yankees” to the rest of the world, and yet the reality is that the Yankee was one specific and very distinctive folkway on the American scene. But, that folkway has been very influential, often in a cryptic fashion.

Both Barack H. Obama and George W. Bush are not culturally identified as Yankees in a narrow sense. Obama is a self-identified black American who has adopted Chicago’s South Side as his community. The South Side is home to black culture which descends from those who arrived at the terminus of their own Great Migration from the American South. George W. Bush fancies himself a West Texan and a cowboy. He was governor of Texas, and makes his residence in Dallas, while much of his young adulthood was spent in Midland. But the reality is that both of these men have Yankee antecedents. This is clear in Bush’s case. His father is a quintessential Connecticut Yankee. Bush is the product of Andover Academy, Yale, and Harvard (by and large thanks to family connections). Barack H. Obama is a different case entirely. His racial identity as a black American is salient, but he grew up in one of the far flung outposts of the Yankee Empire, Hawaii. But perhaps more curiously, many of his mother’s ancestors were clearly Yankees. Obama has a great-grandfather named Ralph Waldo Emerson Dunham.

Within and outside of the United States there is often a stereotype that white Americans are an amorphous whole, a uniform herrenvolk who oppressed the black minority. This ideology was actually to some extent at the heart of the dominance of the early Democratic party before the rise of the Republicans fractured the coalition along sectional lines. In many Northern states one saw populist Democrats replacing property qualifications for voting, which were race-blind, with universal white male suffrage. But white Americans, and Anglo-Americans of British stock at that, were not one. That was clear by the 1850s at the latest. And they exhibit a substantial amount of cultural variation which remains relevant today.

New England in particular stands out over the long historical scale. In many ways, of all the colonies of Great Britain it was the most peculiar in its relationship to the metropole. Unlike Australia or Canada it was not an open frontier, rich with natural resources which could absorb the demographic surplus of Britain. Unlike India it was not a possible source of rents from teeming culturally alien subjects. Unlike the South in the mid-19th century there was no complementary trade relationship. In economic terms New England was a potential and incipient rival to Old England. In cultural and social terms it may have aped Old England, but its “low church” Protestant orientation made it a throwback, and out of step with a metropole which was becoming more comfortable with the English Magisterial Reformation (which eventually led to the emergence of Anglo-Catholicism in the 19th century). Like modern day Japan, and England of its day, New England had to generate wealth from its human capital, its own ingenuity. This resulted in an inevitable conflict with the mother country, whose niche it was attempting to occupy (albeit, with exceptions, such as the early 19th century, before the rise of robust indigenous industry, and the reliance on trade). Today the American republic has pushed England aside as the center of the Anglosphere. And despite the romantic allure of the frontier and the surfeit of natural resources, it is ultimately defined by the spirit of Yankee ingenuity (rivaled by the cowboy, whose violent individualist ethos seems straight out of the Scots-Irish folklore of the South, transposed to the West).

What does this have to do with genetics? Let’s go back to the initial colonial period. As I’ve noted before: the Yankee colonies of New England engaged in selective immigration policies. Not only did they draw Puritan dissenters, but they were biased toward nuclear family units of middling background. By “middling,” that probably refers at least toward the upper quarter of English society of the period. They were literate, with at least some value-added skills. This is in contrast with the Irish Catholic migration of the 19th century, which emptied out Ireland of its tenant peasants (attempts to turn these Irish into yeoman farmers in the Midwest failed, with fiascoes such as the consumption of their seed corn and cattle over harsh Minnesota winters).

So the question is this: could “middle class” values be heritable? Yes, to some extent they are. Almost all behavioral tendencies are heritable to some extent. Adoption studies are clear on that. But, is one generation of selection sufficient to result in a long term shift? First, let’s dismiss the possibility of random genetic drift and therefore a bottleneck. The one generation shift in allele frequencies due to drift is inversely proportional to effective population size. If you assume that effective population size is ~5,000, then the inverse of that is 0.0002. So you’d expect the allele frequency at any given locus to shift by only a tiny fraction. So we have to look to selection.
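To put a rough number on the drift argument, here is a minimal sketch. It assumes the textbook Wright-Fisher model, which the post does not spell out: under that model the variance of the one-generation change in allele frequency is p(1 − p)/(2Ne). The function name `drift_sd` is just an illustrative label.

```python
import math

def drift_sd(p: float, ne: float) -> float:
    """Standard deviation of the one-generation change in allele frequency.

    Assumes a Wright-Fisher population of effective size `ne` (diploid),
    where the variance of the change is p * (1 - p) / (2 * ne).
    """
    return math.sqrt(p * (1.0 - p) / (2.0 * ne))

ne = 5000  # effective population size assumed in the text
print("1/Ne =", 1.0 / ne)
for p in (0.1, 0.5):
    print(f"starting frequency {p}: typical one-generation shift ~ {drift_sd(p, ne):.4f}")
```

Even at intermediate frequencies the typical shift is a fraction of a percentage point, which is why drift over a single generation can be set aside here.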

Let’s do some quick “back of the envelope” calculations. We’ll use IQ as a proxy for a whole host of traits because the numbers will at least be concrete, though the underlying logic of a quantitative continuous trait remains the same. First, the assumptions:

– Truncation selection on the trait which lops off the bottom 75 percent of the class distribution

– A correlation between the trait and genetic variation, so that you lop off the bottom 50 percent of the IQ distribution

– A heritability of IQ of 0.50

The top 50 percent of the IQ distribution has a median/mean IQ of ~110. Assuming 0.50 heritability implies half way regression back to the mean. Therefore, this model predicts that one generation of selection would entail a median IQ of 105 in the second generation, about 1/3 of a standard deviation above the norm in England.
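The arithmetic above is the breeder’s equation (response = heritability × selection differential) applied to truncation selection. A minimal sketch, assuming a normal IQ distribution with mean 100 and standard deviation 15 (the standard deviation is an assumption, the post only implies it); `truncation_response` is an illustrative name:

```python
from statistics import NormalDist

def truncation_response(mean, sd, top_fraction, h2):
    """Mean of the selected parents and expected mean of their offspring.

    Keeps the top `top_fraction` of a normal distribution and applies the
    breeder's equation: response = h2 * selection differential.
    """
    std = NormalDist()                         # standard normal
    cutoff = std.inv_cdf(1.0 - top_fraction)   # truncation point in SD units
    intensity = std.pdf(cutoff) / top_fraction # mean of the kept tail, in SD units
    selected_mean = mean + sd * intensity
    offspring_mean = mean + h2 * (selected_mean - mean)
    return selected_mean, offspring_mean

selected, offspring = truncation_response(mean=100, sd=15, top_fraction=0.5, h2=0.5)
print(f"selected parents: {selected:.1f}, expected offspring: {offspring:.1f}")
```

With an exact normal curve the mean of the top half comes out a little above the rounded ~110 used in the text (110 is closer to the median of the top half), so the offspring value lands near 106 rather than 105. The conclusion, roughly a third of a standard deviation, is unchanged.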

Is this plausible, and could it result in the differences we see across American white ethnic groups? It is possible, but there are reasons to be skeptical. I think my guess of the top 25 percent of the class distribution is defensible from all I’ve read. But the correlation of this with IQ is probably going to be lower in the pre-modern era than today, where you have meritocratic institutions which channel people of different aptitudes. Second, the heritability of IQ was probably lower back then than now, because of wide environmental variance. Please note, I don’t dismiss the genetic explanation out of hand. Rather, this is a case where there are so many uncertainties that I’m not inclined to say much more than that it is possible, and that we may have an answer in the coming decades with widespread genomic sequencing.

But there’s another option, which on the face of it is easier to take in because so many of the parameters are well known and have been thoroughly examined. And that’s cultural selection. While we have to guess at the IQ distributions of the early Puritans, we know about the distribution of their cultural tendencies. They were almost all Calvinists, disproportionately literate. Because of its flexible nature culture can generate enormous inter-group differences in phenotypic variation. The genetic difference between New England and Virginia may have been small, but the cultural difference was wide (e.g., Yankee thrift vs. Cavalier generosity). Yankees who relocated to the South would assimilate Southern values, and the reverse (there is some suggestion that South Carolinian John C. Calhoun’s Unitarianism may have been influenced by his time at Yale, though overall it was obviously acceptable to the Deist inclined Southern elite of the period).

Before New England human societies had an expectation that there would be a literate segment, and an illiterate one. By and large the substantial majority would be illiterate. In the Bronze Age world the scribal castes had almost a magic power by virtue of their mastery of the abstruse cuneiform and hieroglyph scripts. The rise of the alphabet (outside of East Asia) made literacy more accessible, but it seems likely that the majority of ancient populations, even in literary capitals such as Athens, were functionally illiterate. A small minority was sufficient for the production, dissemination, and propagation of literary works. Many ancient books were written with the ultimate understanding that their wider “reading” was going to occur in public forums where crowds gathered to listen to a reader. The printing press changed this with the possibility for at least nominal ownership of books by those with marginal surplus, the middle class. By limiting migration to these elements with the means to buy books, as well as an emphasis on reading the Bible common to scriptural Protestants, you had a society where the majority could be readers in the public forum.

What were the positive cultural feedback loops generated? And what sort of cultural dampeners may have allowed for the new stable cultural equilibrium to persist down the centuries? These are open questions, but they need to be explored. I’ll leave you with a map of public school expenditures in 2003. In the 1840s and 1850s one of the more notable aspects of the opening of the Western frontier was the huge difference between states settled by Yankees, such as Michigan, and those settled by Southerners, such as Arkansas. Both states were settled contemporaneously, but while Michigan had numerous grammar schools, Arkansas had hardly any….

* British Protestantism has shifted several times from a more “Catholic” to “Radical Protestant” direction. Its peak in officially sanctioned Radical Protestantism was probably during the reign of Edward VI, decades before the Stuart kings (the exception being the republic)

• Category: History, Science • Tags: Blog, Culture, Genetics, Quantitative Genetics 

The Pith: Even traits where most of the variation you see around you is controlled by genes still exhibit a lot of variation within families. That’s why there are siblings of very different heights or intellectual aptitudes.

In a post below I played fast and loose with the term correlation and caused some confusion. Correlation obviously has a precise statistical meaning, but it also has a colloquial connotation. Additionally, I regularly talk about heritability. Heritability is in short the proportion of phenotypic variance which can be explained by genetic variance. In other words, if heritability is ~1 almost all the variation in the trait is due to variation in genes, while if heritability is ~0 almost none of it is. Correlation and heritability of traits across generations are obviously related, but they’re not the same.

This post is to clarify a few of these confusions, and sharpen some intuitions. Or perhaps more accurately, banish them.


The plot above shows the relationship between heights of fathers and heights of sons in standard deviation units (yes, I removed some of the values!). You see that the slope is ~0.45, and that’s the correlation. At this point you probably know that heritability of height is on the order of 0.8-0.9. So why is the correlation so low? A simple biological reason is that you don’t know the value of the mothers. If the parents are not strongly correlated (assortative mating) obviously the values of the sons are going to diverge from that of the father. That being said, you probably notice that the correlation here is about 1/2 that of the heritability you know has been confirmed in the literature. That’s no coincidence. One way to estimate heritability is to take the slope of the plot of offspring vs. parents, and multiply that by 2. Therefore, the correlation (which equals the slope) is 1/2 × h², where h² represents heritability.

Correlation (parent to offspring) = 1/2 × h²

1/2 turns out to be the coefficient of relatedness of a parent to offspring. I’ll spare you the algebra, but suffice it to say that this is not a coincidence. Where r = coefficient of relatedness, the correlation between sets of relatives on a trait value is predicted to be:

Correlation (relative to relative) = r × h²

Where r is simply the coefficient of relatedness across the pair of relatives. Here are some values:

r | Relationship
0.5 (½) | parent-offspring
0.25 (¼) | grandparent-grandchild
1 | identical twins; clones
0.5 (½) | full siblings
0.25 (¼) | half siblings
0.125 (⅛) | first cousins
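As a quick illustration of the r × h² rule, the sketch below tabulates the predicted correlations for the relationships listed above. It assumes a purely additive trait with no shared environment, dominance, or assortative mating, and uses h² = 0.9 as an example value for height; the names `RELATEDNESS` and `expected_correlation` are illustrative.

```python
# Coefficients of relatedness taken from the table in the post.
RELATEDNESS = {
    "parent-offspring": 0.5,
    "grandparent-grandchild": 0.25,
    "identical twins": 1.0,
    "full siblings": 0.5,
    "half siblings": 0.25,
    "first cousins": 0.125,
}

def expected_correlation(r: float, h2: float) -> float:
    """Predicted phenotypic correlation between relatives: r * h2.

    Simplifying assumption: only additive genetic effects are shared.
    """
    return r * h2

h2 = 0.9  # example heritability for height
for relationship, r in RELATEDNESS.items():
    print(f"{relationship:25s} r = {r:<6} expected correlation = {expected_correlation(r, h2):.3f}")
```

In real data identical twins and full siblings usually correlate somewhat more than this simple rule predicts, because they also share environments and non-additive effects.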

Here’s the kicker: the correlation coefficient of the midparent value and the offspring value does not equal the slope of the line of best fit. This is why I had second thoughts about using the term “correlation” so freely, and then switching to heritability. The formula is:

Correlation (midparent to offspring) = 1/√2 × h²

So the correlation of midparent to offspring is 0.71 × heritability.
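The difference between slope and correlation is easy to check by simulation. This is only a sketch under simplifying assumptions that go beyond the post: a purely additive trait, random mating, phenotypic variance scaled to 1, and segregation variance fixed at half the additive variance. All function names are illustrative.

```python
import math
import random

def simulate(h2=0.8, n=50_000, seed=1):
    """Simulate father, midparent and offspring phenotypes in SD units."""
    rng = random.Random(seed)
    sd_a = math.sqrt(h2)          # additive genetic SD
    sd_e = math.sqrt(1.0 - h2)    # environmental SD
    sd_seg = math.sqrt(h2 / 2.0)  # Mendelian segregation SD within families
    fathers, midparents, offspring = [], [], []
    for _ in range(n):
        a_f, a_m = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_f = a_f + rng.gauss(0, sd_e)
        p_m = a_m + rng.gauss(0, sd_e)
        a_o = (a_f + a_m) / 2.0 + rng.gauss(0, sd_seg)
        fathers.append(p_f)
        midparents.append((p_f + p_m) / 2.0)
        offspring.append(a_o + rng.gauss(0, sd_e))
    return fathers, midparents, offspring

def slope_and_corr(x, y):
    """Least-squares slope of y on x, and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

fathers, midparents, offspring = simulate()
slope_f, corr_f = slope_and_corr(fathers, offspring)
slope_m, corr_m = slope_and_corr(midparents, offspring)
print(f"single parent: slope {slope_f:.2f}, correlation {corr_f:.2f}")
print(f"midparent:     slope {slope_m:.2f}, correlation {corr_m:.2f}")
```

With a single parent the slope and the correlation coincide at about h²/2. With the midparent the slope estimates h² itself, while the correlation is lower at about h²/√2, because averaging two parents halves the variance of the predictor.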

Why is this something you might want to know? I think people are sometimes confused about how an extremely heritable trait, like height, where you’re given heritability values of 0.90, still yields families with such a wide range of heights. Well, recall that the coefficient of relatedness among siblings is 1/2. So their correlation is going to be the same as with parents. Therefore, the magnitude will be half that of the heritability. A correlation of 0.45 is not small, but neither is it extremely tight. The histogram below illustrates this with the above data set. The values are simply the real difference between fathers and sons:

• Category: Science • Tags: Correlation, Height, Quantitative Genetics 

In earlier discussions I’ve been skeptical of the idea of “designer babies” for many traits which we may find of interest in terms of selection. For example, intelligence and height. Why? Because variation on these traits seems highly polygenic and widely distributed across the genome. Unlike cystic fibrosis (Mendelian recessive) or blue eye color (quasi-Mendelian recessive) you can’t just focus on one genomic region and then make a prediction about phenotype with a high degree of certainty. Rather, you need to know thousands and thousands of genetic variants, and we just don’t know them.

But I just realized one way that genomics might make it a little easier even without this specific information.

The method relies on the phenotypic correlation between relatives. Even before genomics, and genetics, biometricians could generate rough & ready predictions about phenotypic values based on parental values. The extent of the predictive power depends upon the heritability of the trait. A trait like height is ~80-90% heritable. That means that ~80-90% of the variation in the population of the trait is due to genes. The expected value of your height is strongly conditional upon the heights of your parents.

That’s all common sense. What does this have to do with genomics? Simple. You are 50% identical by descent with each parent. That means half your gene copies come from your mother and half from your father. You can’t change that unless you’re a clone. But, because of the law of segregation and recombination you are not necessarily 25% identical by descent with each grandparent! The expectation is that your coefficient of relatedness is 25%, but there is variation around this. For each chromosome, a given parent contributes either their own paternal or maternal homolog. There’s a 50% chance that you’re going to inherit one or the other, independently for each chromosome. You have 22 autosomal chromosome pairs (non-sex chromosomes), so there’s a strong chance that you won’t be equally balanced between the two grandparents on a given side (e.g., you have more genes identical by descent from your paternal grandfather than paternal grandmother).* Second, recombination is also going to generate new combinations. In the generation we’re concerned about this will work against the dynamic we’re relying on, by swapping segments across homologous chromosomes from the parents’ mother or father.

The ultimate logic here is to select for zygotes or gametes which are biased toward the grandparents with phenotypic values which you are interested in. To give a concrete example, if you have a parent who is moderately tall, whose own father was very tall, while the mother was somewhat short, and you want the tallest possible child, you’ll want to select zygotes with the most gene content identical by descent with the tall grandparent. The point isn’t to pick specific genetic variants, you don’t need to know that. All you know is that the tall grandfather probably had genes which resulted in a predisposition toward being tall. So just make sure that the grandchild has as much of that grandparent “in them.”

I still don’t know if this is going to be cost effective in the near term. But I began to think of it because in the near future I’ll be checking the genotype of a child who has a full pedigree of 1,000,000 SNPs of their parents and grandparents.

* Modeling it as a binomial, about 1 in 6 cases will have the expected 11 chromosomes from a focal grandparent. The standard deviation is more than 2 chromosomes. You need to have about 100 zygotes to expect to get any individuals who are 5 chromosomal units away from the expected value (i.e., the individual is 10-15% instead of 25% one grandparent, or 35-40%). Obviously you need more to be assured of getting zygotes of that value. And I neglected recombination, which would work against this, by swapping genomic regions….
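The binomial model in this footnote can be worked through directly. The sketch below treats each of the 22 autosomes passed on by one parent as a fair coin flip between that parent’s own father and mother, and, like the footnote, ignores recombination. Function names are illustrative.

```python
from math import comb, sqrt

N_AUTOSOMES = 22  # chromosomes transmitted by one parent, recombination ignored

def prob_exactly(k: int, n: int = N_AUTOSOMES) -> float:
    """P(exactly k of n chromosomes trace to the focal grandparent)."""
    return comb(n, k) / 2 ** n

def prob_at_least_away(d: int, n: int = N_AUTOSOMES) -> float:
    """P(count differs from the expected n/2 by at least d, in either direction)."""
    centre = n // 2
    return sum(prob_exactly(k, n) for k in range(n + 1) if abs(k - centre) >= d)

p_expected = prob_exactly(11)
sd = sqrt(N_AUTOSOMES * 0.25)
p_far = prob_at_least_away(5)
print(f"P(exactly 11 of 22) = {p_expected:.3f}")
print(f"standard deviation  = {sd:.2f} chromosomes")
print(f"P(5 or more away)   = {p_far:.3f}")
```

Real chromosomes differ greatly in length and recombine every generation, so this is only a first-pass picture of how much spread in grandparental contribution a set of zygotes would show.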


In response to comments and queries below I’ve been poking around for more experimental material on quantitative genetics, and in particular the breeder’s equation. That’s how I stumbled upon this very interesting and informative obituary of D. S. Falconer in Genetics. It reviews not only the biographical details of Falconer’s life, but much of his science. It’s free to all now, so I highly recommend it! (as well as Introduction to Quantitative Genetics, which is quite pricey right now, but just keep watching, I recall getting a relatively cheap copy of the 1996 edition) Curiously, quantitative genetics is rather unknown to the general public in comparison to the biophysical sexiness of molecular genetics, but in most ways it’s the much better complement to the “folk genetics” which often crops up in our day to day life (e.g., “why is so-and-so’s son so short when so-and-so is so tall”). DNA illuminates the discontinuities of Mendelian inheritance, often in the gloomy realm of disease, but quantitative genetics sheds light on the continuities and variations we see across the generations.

• Category: Science • Tags: Quantitative Genetics 

In the comments below a reader asks about the empirical difference in heights between siblings. I went looking…and I have to say that the data isn’t that easy to find; people are more interested in the deeper inferences one can make from the resemblances than the descriptive first-order data itself. But here’s one source I found:

Average difference | Identical twins | Identical twins raised apart | Full siblings
Height, inches | 0.67 | 0.71 | 1.8
Weight, pounds | 4.2 | 9.9 | 10.4
IQ | 5.9 | 8.2 | 9.8

These data indicate that IQ and height variation among sibling cohorts is on the order of ~2/3rd to 3/4th of the variation that one can find within the general population (my estimate of standard deviation of 2.5 inches for height below is about right, if a slight underestimate according to the latest data). But I also found a paper with more detailed statistics.

The aim of the paper was to find outliers from expectation. In other words, which siblings diverged a lot from what you’d expect in terms of normal variation within the cohort? In the process they do report some statistics on inter-sibling variation. The correlation of height between siblings after correcting for age and sex is 0.43. This is what I’ve seen in the literature. Next, the standard deviation is 6.7 centimeters. This is about 2.6 inches. The average phenotypic difference between siblings was about 7.2 centimeters (D). Therefore, to a first approximation the recapitulation of population-wide variation in a continuous quantitative trait within sibling cohorts seems to hold. Though I’d be curious if readers can provide better and more diverse sources.
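One way to relate a standard deviation and a correlation to an average difference is sketched below. It assumes the pair of values is bivariate normal with equal variances, which is an assumption for illustration and not something taken from the paper; in that case the difference is normal with variance 2σ²(1 − ρ) and the mean absolute difference is 2σ√((1 − ρ)/π). The function name is illustrative.

```python
import math

def expected_abs_difference(sd: float, corr: float = 0.0) -> float:
    """Mean absolute difference between two values with the same SD.

    Assumes bivariate normality; corr=0 gives two unrelated individuals.
    """
    return 2.0 * sd * math.sqrt((1.0 - corr) / math.pi)

siblings_cm = expected_abs_difference(6.7, corr=0.43)
unrelated_cm = expected_abs_difference(6.7)
print(f"siblings (corr 0.43): {siblings_cm:.1f} cm")
print(f"unrelated pair:       {unrelated_cm:.1f} cm")
print(f"ratio:                {siblings_cm / unrelated_cm:.2f}")
```

Under these assumptions the sibling figure comes out somewhat below the 7.2 centimeters quoted above, so the paper’s D may be defined differently or may include sex and age differences; the comparison is only meant to show the scale involved.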

• Category: Science • Tags: Height, Quantitative Genetics 

Kobe Bryant is an exceptional professional basketball player. His father was a “journeyman”. Similarly, Barry Bonds and Ken Griffey Jr. both surpassed their fathers as baseball players. Both of Archie Manning’s sons are superior quarterbacks in relation to their father. This is not entirely surprising. Though there is a correlation between parent and offspring in their traits, that correlation is imperfect.

Note though that I put journeyman in quotes above because any success at the professional level in major league athletics indicates an extremely high level of talent and focus. Kobe Bryant’s father was among the top 500 best basketball players of his age. His son is among the top 10. This is a large realized difference in professional athletics, but across the whole distribution of people playing basketball at any given time it is not so great of a difference.

What is more curious is how this relates to the reality of regression toward the mean. This is a very general statistical concept, but for our purposes we’re curious about its application in quantitative genetics. People often misunderstand the idea from what I can tell, and treat it as if there is an orthogenetic-like tendency of generations to regress back toward some idealized value.

Going back to the basketball example: Michael Jordan, the greatest basketball player in the history of the professional game, has two sons who are modest talents at best. The probability that either will make it to a professional league seems low, a reality acknowledged by one of them. In fact, from what I recall both received special attention and consideration because they were Michael Jordan’s sons. It is still noteworthy of course that both had the talent to make it onto a roster of a Division I NCAA team. This is not typical for any young man walking off the street. But the range in realized talent here is notable. Similarly, Joe Montana’s son has been bouncing around college football teams to find a roster spot. Again, it suggests a very high level of talent to be able to plausibly join a roster of a Division I football team. But for every Kobe Bryant there are many, many, Nate Montanas. There have been enough generations of professional athletes in the United States to illustrate regression toward the mean.

So how does it work? A few years ago a friend told me that the best way to think about it was a bivariate distribution, where the two random variables are additive genetic variation and environmental variation. Clearer? For many, probably not. To make it concrete, let’s go back to the old standby: the quantitative genetics of height.

For height in developed societies we know that ~80% of the variation of the trait in the population can be explained by variation of genes in the population. That is, the heritability of the trait is 0.80. This means that the correspondence between parents and offspring on this trait is rather high. Having tall or short parents is a decent predictor of having tall or short offspring. But the heritability is imperfect. There is a random “environmental” component of variation. I put environmental in quotations because that really just means it’s a random noise effect which we can’t capture in the additive or dominance components (this sort of thing may be why homosexual orientation in individuals is mostly biologically rooted, even if its population-wide heritability is modest). It could be biological, such as developmental stochasticity, or gene-gene interactions. The point is that this is the component which adds an element of randomness to our ability to predict the outcomes of offspring from parents. It is the darkening of the mirror of our perceptions.

Going back to height, the plot to the left shows an idealized normal distribution of height for males. I set the mean as 70 inches, or 5 feet 10 inches. The standard deviation is 2.5 inches, which is a measure of dispersion: roughly the typical deviation of an individual from the mean (two randomly sampled males would differ by a bit more than that on average, about 2.8 inches). Obviously the height of a male is dependent upon the height of a father, but the mother matters as well (perhaps more due to maternal effects!). Here we have to note that there’s clearly a sex difference in height. How do you handle this problem? Actually, that’s easy. Just convert the heights of the parents to sex-controlled standard deviation units. For example, if you are 5 feet and 7.5 inches as a male you are 1 standard deviation unit below the mean. If you are a female at the same height you are 1.4 standard deviation units above the mean (assuming female mean height of 5 feet and 4 inches, and standard deviation of 2.5 inches). If height were ~100% heritable you’d just average the two parental values in standard deviation units to get the expectation of the offspring in standard deviation units. In this case, the offspring should be 0.2 standard deviation units above the mean.

But height is not ~100% heritable. There is an environmental component of variation which isn’t accounted for by the parental genotypic values (at least the ones with effects of interest to us, the additive components). If height is ~80% heritable then you’d expect the offspring to regress 1/5th of the way back to the population mean. For the example above, the expectation of the offspring would be 0.16 standard deviation units, not 0.20.
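The worked example can be written out as a small calculation. The means and standard deviations are the ones used in the text; the function names are illustrative.

```python
def to_sd_units(height_in: float, mean_in: float, sd_in: float) -> float:
    """Convert a height in inches to sex-specific standard deviation units."""
    return (height_in - mean_in) / sd_in

def expected_offspring(father_z: float, mother_z: float, h2: float) -> float:
    """Expected offspring value in SD units: h2 times the midparent value."""
    midparent = (father_z + mother_z) / 2.0
    return h2 * midparent

# A father and a mother who are both 5 feet 7.5 inches (67.5 inches).
father_z = to_sd_units(67.5, mean_in=70.0, sd_in=2.5)  # male mean from the text
mother_z = to_sd_units(67.5, mean_in=64.0, sd_in=2.5)  # female mean from the text

print(f"father: {father_z:+.2f} SD, mother: {mother_z:+.2f} SD")
print(f"expected offspring at h2 = 1.0: {expected_offspring(father_z, mother_z, 1.0):+.2f} SD")
print(f"expected offspring at h2 = 0.8: {expected_offspring(father_z, mother_z, 0.8):+.2f} SD")
```

To turn the result back into inches, multiply by the standard deviation and add the mean for the offspring’s sex.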

Let’s make this more concrete. Imagine you sampled a large number of couples whose midparent phenotypic value is 0.20 standard deviation units above the mean in height. This means that if you convert the father and mother into standard deviation units, their average is 0.20. So one pair could be 0.20 and 0.20, and another could be of someone 2.0 and -1.6 standard deviation units. What’s the expected distribution of male offspring height?

The relevant points:

1) The midparent value naturally is constrained to have no variance (though as I indicate above since it’s an average the selected parents may have a wide variance)

2) The male offspring are somewhat above the average population in distribution of height

3) It remains a distribution. The expected value of the offspring is a specific value, but environmental and genetic variation remains to produce a range of outcomes (e.g., Mendelian segregation and recombination)

4) There has been some regression back to the population mean

I only displayed the males. There are obviously going to be females among the offspring generation. What would the outcome be if you mated the females with the males? Recall that the female heights would exhibit the same mean, 0.16 units above the original population mean. This is where many people get confused (frankly, those whose intelligence is somewhat closer to the mean!). They presume that a subsequent generation of mating would result in further regression back to the mean. No! Rather, the expected value of the offspring would be 0.16 units. Why?

Because through the process of selection you’ve created a new genetic population. The selection process is imperfect in ascertaining the exact causal underpinning of the trait value of a given individual. In other words, because height is imperfectly heritable some of the tall individuals you select are going to be tall for environmental reasons, and will not pass that trait to their offspring. But height is ~80% heritable, which means that the filtering process of genes by using phenotype is going to be rather good, and the genetic makeup of the subsequent population will be somewhat deviated from the original parental population. In other words, the reference population to which individuals “regress” has now changed. The environmental variation remains, but the additive genetic component around which the regression is anchored is now no longer the same.
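A small simulation makes the point that regression happens once and then stops. This is a sketch under simplifying assumptions that are not in the post: a purely additive trait in SD units, no distinction between the sexes, random mating within each generation, a fixed segregation variance of h²/2, and selection of everyone more than 1 SD above the mean in the first generation only.

```python
import math
import random

H2 = 0.8  # heritability used in the text

def founders(rng, n):
    """Generation 0: (additive value, phenotype) pairs with total variance 1."""
    pop = []
    for _ in range(n):
        a = rng.gauss(0.0, math.sqrt(H2))
        pop.append((a, a + rng.gauss(0.0, math.sqrt(1.0 - H2))))
    return pop

def breed(rng, parents, n):
    """Random mating among `parents`: child additive value is the midparent
    additive value plus segregation noise, with a fresh environmental deviation."""
    children = []
    for _ in range(n):
        (a_f, _), (a_m, _) = rng.choice(parents), rng.choice(parents)
        a = (a_f + a_m) / 2.0 + rng.gauss(0.0, math.sqrt(H2 / 2.0))
        children.append((a, a + rng.gauss(0.0, math.sqrt(1.0 - H2))))
    return children

def mean_phenotype(pop):
    return sum(p for _, p in pop) / len(pop)

rng = random.Random(42)
gen0 = founders(rng, 40_000)
selected = [ind for ind in gen0 if ind[1] > 1.0]  # select on phenotype only
gen1 = breed(rng, selected, 20_000)               # offspring of selected parents
gen2 = breed(rng, gen1, 20_000)                   # no further selection

m_sel, m1, m2 = mean_phenotype(selected), mean_phenotype(gen1), mean_phenotype(gen2)
print(f"selected parents: {m_sel:.2f}  gen 1: {m1:.2f}  gen 2: {m2:.2f}")
```

The first generation falls back to about 80% of the selected parents’ mean, and the second generation stays where the first one is, because the environmental luck has already been filtered out and what remains is the shifted additive genetic mean.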

This is why I state that regression toward the mean is not magical in a biological sense. There is no population with fixed traits to which selected individuals naturally regress or revert. Rather, populations are useful abstractions in making sense of the statistical correlations we see around us. The process of selection is informed by population-wide trends, so we need to bracket a set of individuals as a population. But what we really care about are the genetic variables which underpin the variation across the population. And those variables can change rather easily through selection. Obviously regression toward the mean would exhibit the magical reversion-toward-ideal-type property that some imagine if the variables were static and unchanging. But if this was the nature of things, then evolution by natural selection would never occur!

Therefore, in quantitative genetics regression toward the mean is a useful dynamic, a heuristic which allows us to make general predictions. But we shouldn’t forget that it’s really driven by biological processes. Many of the confusions which I see people engage in when talking about the dynamic seem to be rooted in the fact that individuals forget the biology, and adhere to the principle as if it is an unthinking mantra.

And that is why there is a flip side: even though the offspring of exceptional individuals are likely to regress back toward the mean, they are also much more likely to be even more exceptional than the parents than any random individual off the street! Let’s go back to height to make it concrete. Kobe Bryant is 6 feet 6 inches tall. His father is 6 feet 9 inches. I don’t know his mother’s height, but her brother was a basketball player whose height is 6 feet 2 inches. Let’s use him as a proxy for her (they’re siblings, so not totally inappropriate), and convert everyone to standard deviation units.

Kobe’s father: 4.4 units above mean

Kobe: 3.2 units above mean

Kobe’s mother: 1.6 units above the mean

Using the values above, the expected value for the offspring of Kobe’s father & mother is a child 2.4 units above the mean (the mid-parent value of 3.0 units, discounted by the ~80% heritability of height). Kobe is somewhat above the expected value (assuming that Kobe’s mother is a taller than average woman, which seems likely from photographs). But here’s the important point: his odds of being this height are much higher with the parents he has than with any random parents. Using a perfect normal distribution (this is somewhat distorted by “fat-tailing”) the odds of an individual being Kobe’s height are around 1 in 1,500. But with his parents the odds that he’d be his height are closer to 1 out of 5. In other words, Kobe’s parentage increased the odds of his being 6 feet 6 inches by a factor of 300! The odds were still against him, but the die was loaded in his direction in a relative sense. By analogy, in the near future we’ll see many more children of professional athletes become professional athletes both due to nature and nurture. But we’ll continue to see that most of the children of professional athletes will not have the requisite talent to become professional athletes.
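The arithmetic behind these odds can be sketched in a few lines of Python. This is a rough reconstruction, not necessarily the exact calculation used above. It assumes a male mean of 70 inches and a standard deviation of 2.5 inches (values inferred from the unit figures quoted for Kobe and his father), a heritability of 0.8, and offspring scatter of one standard deviation unit around the expected value. The function and variable names are purely illustrative.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Assumed parameters (inferred from the quoted unit values, not stated directly)
MEAN_IN, SD_IN = 70.0, 2.5   # male height mean and SD, in inches
H2 = 0.8                     # heritability of height

def to_units(height_in):
    """Convert a height in inches to standard deviation units."""
    return (height_in - MEAN_IN) / SD_IN

father = to_units(81)   # 6 feet 9 inches
mother = to_units(74)   # proxied by her brother, 6 feet 2 inches
kobe = to_units(78)     # 6 feet 6 inches

# Expected offspring value: mid-parent value shrunk toward the mean by H2
expected = H2 * (father + mother) / 2

# Odds of reaching Kobe's height from a random draw vs. from these parents
p_random = upper_tail(kobe)
p_given_parents = upper_tail(kobe - expected)  # assumes offspring SD of 1 unit

print(f"expected offspring value: {expected:.1f} units")
print(f"random individual: about 1 in {1 / p_random:.0f}")
print(f"with these parents: about 1 in {1 / p_given_parents:.1f}")
print(f"relative increase: about {p_given_parents / p_random:.0f}x")
```

Under these assumptions the figures land close to the ones quoted: roughly 1 in 1,500 for a random individual, roughly 1 in 5 given the parents, and a ratio of about 300.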

Image Credit: Wikipedia


I have discussed the reality that many areas of psychology are susceptible enough to false positives that the ideological preferences of the researchers come to the fore. CBC Radio contacted me after that post, and I asked them to consider that in 1960 psychologists discussed the behavior of homosexuality as if it were a pathology. Is homosexuality no longer a pathology, or have we as a society changed our definitions? In any given discipline, when confronted with the specter of false positives which happen to meet statistical significance, there is the natural tendency to align the outcome so that it is socially and professionally optimized. That is, the results support your own ideological preferences, and they reinforce your own career aspirations. Publishing preferred positive results furthers both these ends, even if at the end of the day many researchers may understand on a deep level the likelihood that a specific set of published results is not robust.

This issue is not endemic to social sciences alone. I have already noted this issue in medical sciences, where there is a lot of money at stake. But it crops up in more theoretical biology as well. In the early 20th century Charles Davenport’s research which suggested the inferiority of hybrids between human races was in keeping with the ideological preferences of the era. In our age Armand Leroi extols the beauty of hybrids, who have masked their genetic load through heterozygosity (a nation like Britain, which once had a public norm against ‘mongrelization,’ now promotes racial intermarriage in the dominant media!). There are a priori biological rationales for both positions, hybrid breakdown and vigor (for humans, from what I have heard and seen, there seems to be very little evidence overall for either once you control for the deleterious consequences of inbreeding). In 1900 and in 2000 there are very different and opposing social preferences on this issue (as opposed to individual preferences). The empirical distribution of outcomes will vary in any given set of cases, so researchers are incentivized to seek the results which align well with social expectations (here’s an example of heightened fatality due to mixing genetic backgrounds; it seems the exception rather than the rule).

Thinking about all this made me reread James F. Crow’s Unequal by nature: a geneticist’s perspective on human differences. Crow is arguably the most eminent living population geneticist (see my interview from 2006). Born in 1916, he has seen much come and go. For those of us who wonder how anyone could accept ideas which seem shocking or unbelievable today, I suspect Crow could give an answer. He was there. In any case, on an editorial note I think the essay should have been titled “Different by nature.” Inequality tends to connote a rank order of superiority or inferiority, though in the context of the essay the title is obviously accurate. Here is the most important section:

Two populations may have a large overlap and differ only slightly in their means. Still, the most outstanding individuals will tend to come from the population with the higher mean. The implication, I think, is clear: whenever an institution or society singles out individuals who are exceptional or outstanding in some way, racial differences will become more apparent. That fact may be uncomfortable, but there is no way around it.

The fact that racial differences exist does not, of course, explain their origin. The cause of the observed differences may be genetic. But it may also be environmental, the result of diet, or family structure, or schooling, or any number of other possible biological and social factors.

My conclusion, to repeat, is that whenever a society singles out individuals who are outstanding or unusual in any way, the statistical contrast between means and extremes comes to the fore. I think that recognizing this can eventually only help politicians and social policymakers.

You can, and should, read the whole thing. Let’s make it concrete. Imagine the following trait with two distributions (i.e., two populations):

– Mean = 100 and 105 (average value)
– Standard deviation = 15 (measure of dispersion)
– Let’s assume a normal distribution

Let’s plot the two distributions:

Observe the close overlap between the two distributions. Most of the variance occurs within, rather than between, the two populations. Now let’s impose a cut-off of ~130 on the curves:

Now the similarity between the two curves is not as striking. As you move to the tails of the distribution they begin to diverge. In other words, the average of the two populations is pretty much interchangeable, but the values at the tails differ. Now let’s move the cut-off to 145:

The difference is now even more stark. Let’s compare the ratios of the area under the curve for the two populations as defined by the cut-offs:

Value at 100 = 1.26 (any given individual in the blue population is 1.26 times more likely to be above 100 than in the red population)
Value at 130 = 1.83
Value at 145 = 2.8
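These ratios can be checked with a short Python sketch. It assumes exactly normal distributions with the means (100 and 105) and standard deviation (15) given above, and takes the cut-offs to be exactly 100, 130 and 145. The helper names are illustrative only.

```python
import math

def upper_tail(x, mean, sd):
    """P(X > x) for X ~ Normal(mean, sd)."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

SD = 15.0
MEAN_LOW, MEAN_HIGH = 100.0, 105.0

def tail_ratio(cutoff):
    """Ratio of the fractions above `cutoff` (higher-mean over lower-mean)."""
    return upper_tail(cutoff, MEAN_HIGH, SD) / upper_tail(cutoff, MEAN_LOW, SD)

for cutoff in (100, 130, 145):
    print(f"cut-off {cutoff}: ratio = {tail_ratio(cutoff):.2f}")
```

Under an exact normal model the three ratios come out to roughly 1.26, 2.1 and 2.8. Where a figure quoted above differs, an approximate cut-off or a slip in transcription is the likeliest reason. The qualitative point is unchanged: the ratio grows as the cut-off moves further into the tail.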

A major caveat: quantitative traits are only approximately normally distributed, and there tends to be a “fat tail” dynamic, where deviation from the normal increases as one moves away from the mean. Concretely, this means that the ratios at the tails are probably not quite as extreme, as there are more individuals in all populations at the tails than you’d expect.

What does this entail concretely? As Crow noted above, if you sample from the tails of the distribution then very modest differences between groups become rather salient. Consider long distance running. To be successful in international competitions one presumably has to be many, many standard deviations above the norm. One can’t be a 1 out of 100, or 1 out of 1,000. Rather, presumably one should be 1 out of hundreds of thousands, at a minimum. This would be the fastest ~100,000 or so people in the world (out of 7 billion). With this in mind, we should not be surprised a priori at the success of the Kalenjin people of Kenya in this domain. They may have both the biological and social preconditions which allow their distribution of talent to be moderately above that of the human norm. Even a marginal shift can make a huge difference at the tails. 1 out of 100,000 is 4.26 deviation units above the mean. Increasing the mean of a population by half a standard deviation unit (e.g., if 100 is the mean and 15 is the standard deviation, then the population with the higher mean would be at 107.5) results in a disproportion in ratio of above 8:1 at 4.26 units (as measured in the first population). This is modest, about 1 order of magnitude, but consider possible gene-environment correlations and synergies that might ensue when you have a critical mass of very fast individuals. This could amplify the effect of a difference in distributions on a single variate (more importantly, I suspect, consider that virtuosity in many domains requires an intersection of aptitudes many units deviated from the norm across many traits).
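The two numbers in this paragraph (the 4.26 unit threshold and the roughly 8:1 ratio) can be reproduced with Python’s standard library. The sketch assumes perfectly normal distributions with equal variance, which the fat-tail caveat above suggests is only an approximation. The constant names are illustrative.

```python
import math
from statistics import NormalDist

def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

RARITY = 1e-5   # 1 out of 100,000
SHIFT = 0.5     # mean of the second population, in SD units of the first

# Threshold (in SD units) exceeded by 1 in 100,000 of the first population
z_cut = NormalDist().inv_cdf(1 - RARITY)

# How much more common that level is in the population with the higher mean
ratio = upper_tail(z_cut - SHIFT) / upper_tail(z_cut)

print(f"threshold: {z_cut:.2f} SD units")
print(f"ratio at threshold: {ratio:.1f} to 1")
```

Changing `SHIFT` or `RARITY` shows how quickly the disproportion grows as the threshold moves further out or the means move further apart.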

In the early 2000s James F. Crow was responding to the Human Genome Project. As has been thoroughly covered elsewhere human genomics has probably underwhelmed in terms of outcomes 10 years out. But it is often the case that with new technologies we overestimate the short-term change which they will effect and underestimate their long-term consequences. I believe with the rise of mass genomics, a radical increase in population coverage and full genome sequencing, we may finally start to adduce the underpinnings of quantitative traits. We already have indirect methods, but I believe that by 2020 we will have direct means at our disposal. We’ll have a good sense how deeply humans are commensurable on a population genetic level. I doubt it will change much in our values, but it may entail some rhetorical adjustments.


The Pith: When it comes to the final outcome of a largely biologically specified trait like human height it looks as if it isn’t just the genes your parents give you that matter. Rather, the relationship of their genes also counts. The more dissimilar they are genetically, the taller you are likely to be (all things equal).

Dienekes points me to an interesting new paper in the American Journal of Physical Anthropology, Isolation by distance between spouses and its effect on children’s growth in height. The results are rather straightforward: the greater the distance between the origin of one’s parents, the taller one is likely to be, especially in the case of males. These findings were robust even after controlling for confounds such as socioeconomic status. Their explanation? Heterosis, whether through heterozygote advantage or the masking of recessive deleterious alleles.

The paper is short and sweet, but first one has to keep in mind the long history of this sort of research in the murky domain of human quantitative genetics. This is not a straightforward molecular genetic paper where there’s a laser-like focus on one locus, and the mechanistic issues are clear and distinct. We are talking about a quantitative continuous trait, height, and how it varies within the population. We are also using geographical distance as a proxy for genetic distance. Finally, when it comes to the parameters affecting these quantitative traits there are a host of confounds, some of which are addressed in this paper. In other words, there’s no simple solution to the fact that nature can be quite the tangle, more so in some cases than others.

Because of the necessity for subtlety in this sort of statistical genetic work one must always be careful about taking results at face value. From what I can gather the history of topics such as heterosis in human genetics is always fraught with normative import. The founder of Cold Spring Harbor Laboratory, Charles Davenport, studied the outcomes of individuals who were a product of varied matings in relation to genetic distance in the early 1920s. This was summed up in his book Race Crossing in Jamaica:

A quantitative study of 3 groups of agricultural Jamaican adults: Blacks, Whites, and hybrids between them; also of several hundred children at all developmental stages. The studies are morphological, physiological, psychological, developmental and eugenical. The variability of each race and sex in respect to each bodily dimension and many basis vary just as morphological traits do. In some sensory tests the Blacks are superior to Whites; in some intellectual tests the reverse is found. A portion of the hybrids are mentally inferior to the Blacks. The negro child has, apparently, from birth on, different physical proportions than the white child.

Because of the fears of miscegenation in the early 20th century scholars had a strong bias toward finding the data to confirm the assumption that admixture between divergent human kinds resulted in a breakdown and depression in trait value in relation to both parental lineages. Today this is not so. Rather, I would argue that the bias is now in the opposite direction, at least in the West. My friend Armand Leroi wrote Meet the world’s most perfect mutant seven years ago. Who is the most perfect human according to Armand? She is Saira Mohan, a model of Indian, Irish and French ancestry. Armand concludes:

If deleterious mutations rob us of it, they should do so with particular efficacy if we marry our relatives. Most novel mutations are at least partly recessive, and inbreeding should accentuate their negative effects. Many weird genetic disorders come from Pakistan and Saudi Arabia, where there is a strong tradition of first-cousin marriage.

Conversely, people of mixed ancestry should show the benefits of concealing recessive mutations. And this, I suspect, is the true meaning of Saira Mohan: half Punjabi, quarter Irish, quarter French and altogether delightful. She, too, is a mutant – but a little less so than most of us.

Thandie Newton masking recessive alleles

This is entirely in keeping with the dominant ethos of the global elite, which aims for a panmixia of genes in concert with an alignment of a particular set of cosmopolitan post-materialist memes. But, as I pointed out to Armand there are also cases where crosses between genetic backgrounds may have deleterious consequences. For example, a European specific allele in African Americans may have a negative fitness interaction with the predominant African genetic background of this population. I am not implying here that science is fiction, a construction of our biases and preconceptions. But the dominant cultural narrative framework does put pressure upon how we interpret science, and all the more so in domains which require a level of statistical subtlety and personal candor.

Of course now that we can see exactly how individuals are mutant at the level of the genome Armand’s supposition can actually be tested. That is, we can see how many deleterious recessive alleles are in fact masked in people of hybrid origin. That at least may plug one of the fuzzy spots in our picture of how genetic backgrounds interact in humans.

I prefaced the review of a paper on marital distance and height with some history of science and a reflection of how contemporary values influence the generation and interpretation of knowledge because there’s a lot of confusing material in the literature on correlations between genetic distance and trait value. There is the result that marriages between 3rd cousins seem the most fertile in Iceland. Is this because of a balance between genetic incompatibilities and expression of recessive diseases? Or perhaps the answer lies in social dynamics, insofar as people who come from related lineages are more likely to weather difficult times in their relationship? It’s one study from Iceland. But of course the minority who vociferously argue against racial amalgamation and admixture on moral/normative grounds will focus upon this specific positive empirical finding in the literature. Now, Iceland is ideal for many human genetic studies because it has excellent records and is culturally homogeneous. But at the end of the day Iceland is still Iceland.

And today Poland is still Poland. I say that because this study tracks thousands of Polish youth over the years. Here’s the abstract:

Heterosis is thought to be an important contributor to human growth and development. Marital distance (distance between parental birthplaces) is commonly considered as a factor favoring the occurrence of heterosis and can be used as a proximate measure of its level. The aim of this study is to assess the net effect of expected heterosis resulting from marital migration on the height of offspring, controlling for midparental height and socioeconomic status (SES). Height measurements on 2,675 boys and 2,603 girls ages 6 to 18 years from Ostrowiec Świętokrzyski, Poland were analyzed along with sociodemographic data from their parents. Midparental height was calculated as the average of the reported heights of the parents. Analyses revealed that marital distance, midparental height, and SES had a significant effect on height in boys and girls. The net effect of marital distance was much more marked in boys than girls, whereas other factors showed comparable effects. Marital distance appears to be an independent and important factor influencing the height of offspring. According to the “isolation by distance” hypothesis, greater distance between parental birthplaces may increase heterozygosity, potentially promoting heterosis. We propose that these conditions may result in reduced metabolic costs of growth among the heterozygous individuals.

As you may know, height is substantially heritable. That means that ~80-90% of the variation in the trait within the population in developed nations is due to variation in genes. This has some validity even within families. Tall parents tend to give rise to tall offspring, though there is variation around the expectation. In other words, siblings differ in height, in part because of environmental factors, but also in part because siblings differ in their genetic endowments from their parents. So naively one can model this like so:

Height ~ Genetic endowment + Environmental contingencies

The genetic endowment is a function of the mid-parent value in standard deviation units. That means you average the standard deviations of the parents from the sex-controlled mean. Let’s give a concrete example. Imagine a male who is 5’8, and a female who is 5’7. The standard deviation for height is ~3 inches, with the American male mean being 5’10 and the female mean being 5’4. That means that the male is 2/3 of a standard deviation below the mean, and the female is 1 standard deviation above the mean. The expectation for their offspring then will be 1/6 of a standard deviation above the mean (about 5’10.5 for males, 5’4.5 for females). But because of the variation in the nature of genetics and environment, there’s actually going to be a standard deviation of ~3 inches for the offspring (e.g., ~70% chance that the male will be between 5’7.5 and 6’1.5). There is also the reality that because environmental factors aren’t heritable the offspring should regress somewhat back to the population mean all things equal, though in the case of height not too much because it is so genetically influenced.

A few years ago I played this game with libertarian pundits Megan McArdle and Peter Suderman, who announced their engagement. Megan and Peter are both 6’2. I estimated that the expected value is that any son of theirs would be 6 feet 3.6 inches, and any daughter 5 feet 9.6 inches. How can it be that their sons should be taller than either of them? Remember that Megan is much taller than Peter in standard deviation units in relation to her sex.
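The Megan and Peter estimate can be reproduced with a small Python sketch. The means (5’10 and 5’4) and the 3 inch standard deviation are the ones given above. The regression factor is not spelled out in the text, but multiplying the mid-parent deviation by a heritability of 0.8 reproduces the quoted figures, so that is what the sketch assumes. The function names are illustrative.

```python
MALE_MEAN_IN = 70.0     # 5'10"
FEMALE_MEAN_IN = 64.0   # 5'4"
SD_IN = 3.0             # standard deviation of height, in inches
H2 = 0.8                # assumed heritability (regression toward the mean)

def expected_offspring_heights(father_in, mother_in):
    """Return expected (son, daughter) heights in inches."""
    father_z = (father_in - MALE_MEAN_IN) / SD_IN
    mother_z = (mother_in - FEMALE_MEAN_IN) / SD_IN
    # Mid-parent value in SD units, shrunk toward the mean by H2
    offspring_z = H2 * (father_z + mother_z) / 2
    son = MALE_MEAN_IN + offspring_z * SD_IN
    daughter = FEMALE_MEAN_IN + offspring_z * SD_IN
    return son, daughter

def feet_inches(total_in):
    """Format a height in inches as feet and inches."""
    feet, inches = divmod(total_in, 12)
    return f"{int(feet)} ft {inches:.1f} in"

# Both parents are 6'2" (74 inches)
son, daughter = expected_offspring_heights(74, 74)
print("expected son:", feet_inches(son))
print("expected daughter:", feet_inches(daughter))
```

The sons come out taller than either parent because the mother’s 74 inches is more than three standard deviations above the female mean, which pulls the mid-parent value well above the father’s own deviation.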

Now how would expectation be altered if Megan McArdle and Peter Suderman were full-siblings? (they are not full-siblings, this is a thought experiment!) At this point even if you had never taken college genetics you might be wondering whether it makes sense to calculate an expectation for the height of the offspring of two full-siblings. You know very well that there are much more serious genetic issues at hand. Going back to the relation above, you might update it like so:

Height ~ Genetic endowment + Environmental contingencies – Incest decrement

Even stipulating viability of the offspring, any child of full-siblings would exhibit all the problems that Armand alludes to above. It seems likely that whatever potential their parents might impart to their offspring, the combination of their genotypes would be highly deleterious, because near kin carry the same recessives. The paper above posits the inverse effect, where outbreeding results in greater outcomes than are to be expected based on the mid-parent trait value. In this telling, height is a proxy for health and development. This seems biologically plausible in the case of humans. Individuals who marry those genetically dissimilar impart gains of fitness to their offspring by virtue of elevated heterozygosity. So now we create a new relation:

Height ~ Genetic endowment + Environmental contingencies + Magnitude of outbreeding

In pre-modern societies individuals tended to marry those close to them geographically. Even if cousin marriage was not normally practiced, over time clusters of villages would form networks of de facto consanguinity. In the 19th and especially 20th century much of this in the extreme cases abated in Europe because of better transport. L. L. Cavalli-Sforza documented this in Consanguinity, Inbreeding, and Genetic Drift in Italy. Modern roads resulted in a radical drop in inbreeding in mountainous regions of the country. Some researchers have argued that this shift resulted in an increased level of height, intelligence, and health, among European populations.

With that, here’s a nice map from

Going back to the paper, after controlling for socioeconomic status they found that:

1) The increased marital distance predicts taller height than expected, especially in boys.

2) This effect is most noticeable in boys who already have parents who are relatively tall.

3) Finally, greater marital distance seems to be correlated with greater height in the parents!

The last is actually a possible reason why there’s no need to appeal to heterosis at all. This might simply be a function of assortative mating of tall individuals who are more mobile. In the paper the authors go on at length about sexual selection, greater mobility of individuals who are taller, etc. But whatever the reason, this shows exactly the care which must be taken with these sorts of results. It is known for example that taller individuals seem to have higher I.Q.s, leading some to assert that the genes which control height and I.Q. variance must be the same (some of them almost certainly are if there are many loci of small effect). But it turns out that this height-I.Q. correlation disappears within families (tall siblings are no smarter than short siblings), implying that the correlation might be a function of assortative mating.

As for why there may be a sex difference, the authors suggest that heterosis may manifest at different points in the developmental arc of children. Females mature somewhat faster than males. This may be so, the sexes differ and such. But my own preference is that the original results merit a deeper and expanded examination before we posit an evolutionary story (that’s not possible in a scientific paper which needs a discussion, but I’m proposing an ideal world of knowledge generation and refinement!). The empirics need to be firmed up before we scaffold it in theory. Poland is Poland, and if you troll through enough data sets there’ll be millions of correlations which are publishable. And yet we are living in the age of information, so we had better get going in sieving through it. At the end of the paper the authors go in a direction which I think might yield some interesting finds in the future:

One possible limitation of our study and explanation of the results may come from the fact that we used geographical distance between parental birthplaces as the only approximate measure of offspring heterozygosity. Further studies should focus on more direct examination of individuals’ allele diversity and its influence on physiological processes. Of particular interest would be investigation of a possible relationship between the level of basal metabolic rate and individual’s heterozygosity both in general term as well as heterozygosity of specific locus. Such suggestion seems to be supported by previous studies which indicate that the variation in energy expenditure at rest is determined by substantial genetic component (Bouchard et al., 1989; Bouchard and Tremblay, 1990) and heterogeneity of gene loci (Jacobson et al., 2006; Loos et al., 2007). More studies in this regard may be crucial for a better and profound understanding of the Homo sapiens metabolism and energy budget.

Because of the advances in genomics, as well as the proliferation of social science data sets (thanks to corporations and government) I hope that we can begin breaking out of the habit of being led about by the nose by our norms in more areas of human genetics than just the study of Mendelian diseases! That’s a hope. I’m not saying I’d bet money on it.

Citation: Sławomir Kozieł, Dariusz P. Danel, & Monika Zaręba (2011). Isolation by distance between spouses and its effect on children’s growth in height American journal of physical anthropology : 10.1002/ajpa.21482

Image Credit: Caroline Bonarde Ucci.


Aka Pygmies

The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.

In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many genes. This may seem a simple formulation, but to try and understand how thousands of genes may act in concert to modulate variation on a trait is often a more difficult task to grok (yes, you can appeal to the central limit theorem, but that means little to most intuitively). This is probably why heritability is such a knotty issue in terms of public understanding of science, as it concerns the component of variation in quantitative continuous traits which is dispersed across the genome. These are the traits where there is no “gene for X.” Additionally, quantitative traits are likely to have a substantial environmental component of variation, confounding a simple genotype to phenotype mapping. Arguably the classic quantitative trait is height. It is clear and distinct (there aren’t arguments about the validity of measurement as occurs in psychometrics), and it is substantially heritable. In Western societies with a surfeit of nutrition height is ~80-90% heritable. What this means is that ~80-90% of the variance of the trait value within the population is due to variance of the genes within the population.
Concretely, there will be a very strong correspondence between the heights of offspring and the average height of the two parents (controlled for sex, so you’re thinking standard deviation units, not absolute units). And yet height is at the heart of the question of the “missing heritability” in genetics. By this, I mean the fact that so few genes have been associated with variation in height, despite the reality that who your parents are is the predominant determinant of height in developed societies.

The issue gets even more thorny when you talk about variation across societies. This is a simple and yet complex issue. On the one hand we know that over time people across the world have gotten taller as nutrition has gotten better. What is less well known is that human populations had been shrinking from the Last Glacial Maximum ~20,000 years ago until the past few centuries. Why? One can posit many reasons, both genetic and environmental, but it does point us to the reality that the story of height is not monotonic. That is, it doesn’t go in one direction, and has no simple one size fits all answer.

But that’s just the dimension of time. How about space? The question of whether different populations have different final genetic potentials for height is a disputed one. And yet it seems plausible that at the extremes there are genuine differences in the gene frequencies across populations which will speak to their different distributions in trait values. This is particularly interesting in the case of populations characterized by very low median adult heights, often termed “pygmies.” Of particular note are the Pygmies of Central Africa, who exist in a state of cultural symbiosis with their Bantu and Nilotic neighbors, adopting their languages, but remaining distinct.

These populations have very low median heights, but they are clearly not dwarfs (they are proportionate). Thankfully at least the population genetics of the Pygmies of Africa are now relatively well understood. It seems that the Western and Eastern Pygmy populations are very distinct clusters, with a common ancestry perhaps on the order of tens of thousands of years in the past. And not surprisingly the genetic distance between the Pygmy groups and their non-Pygmy neighbors is very large. The Western Pygmies tend to show more evidence of admixture with their Bantu neighbors than the Eastern ones (I suspect this is due to the longer residence of Bantus in this region). But for me the hardest issue to grapple with is the reality that the Pygmies of Central Africa seem to be genetically closer to the Khoisan people of Southern Africa than their Bantu or Nilotic neighbors! I believe this is evidence of an ancient hunter-gatherer continuum within Africa which has been marginalized and overlain by the recent expansion of Bantu farmers and Nilotic pastoralists.

In any case, what does all this have to do with the genetics of height? A new paper in the American Journal of Physical Anthropology synthesizes the inferences generated from population genetics with the basic logical assumptions of quantitative genetics to adduce that the difference between Pygmies and non-Pygmies in height is actually likely to be due to heritable differences. Indirect evidence for the genetic determination of short stature in African Pygmies:

Central African Pygmy populations are known to be the shortest human populations worldwide. Many evolutionary hypotheses have been proposed to explain this short stature: adaptation to food limitations, climate, forest density, or high mortality rates. However, such hypotheses are difficult to test given the lack of long-term surveys and demographic data. Whether the short stature observed nowadays in African Pygmy populations as compared to their Non-Pygmy neighbors is determined by genetic factors remains widely unknown. Here, we study a uniquely large new anthropometrical dataset comprising more than 1,000 individuals from 10 Central African Pygmy and neighboring Non-Pygmy populations, categorized as such based on cultural criteria rather than height. We show that climate, or forest density may not play a major role in the difference in adult stature between existing Pygmies and Non-Pygmies, without ruling out the hypothesis that such factors played an important evolutionary role in the past. Furthermore, we analyzed the relationship between stature and neutral genetic variation in a subset of 213 individuals and found that the Pygmy individuals’ stature was significantly positively correlated with levels of genetic similarity with the Non-Pygmy gene-pool for both men and women. Overall, we show that a Pygmy individual exhibiting a high level of genetic admixture with the neighboring Non-Pygmies is likely to be taller. These results show for the first time that the major morphological difference in stature found between Central African Pygmy and Non-Pygmy populations is likely determined by genetic factors.

First, is there a plausible physiological reason for the difference in adult height between Pygmies and non-Pygmies? The authors review the relevant evidence:

Endocrinologists have described the physiological determination of the African Pygmies’ short stature: serum levels of Insulin-Like Growth Factor 1 (IGF1) and of Growth Hormone Binding Protein (GHBP) are abnormally low, whereas the levels of Growth Hormone (GH) and IGF2 do not differ from Non-Pygmy controls…In this context, Merimee…proposed that the short stature of African Pygmies could be attributed to the absence of a growth spurt during puberty and that the genetic factor(s) implicated in the Pygmy stature were to be found in the GH-IGF1 axis…A recent gene-expression study further showed a slight (1.8-fold) under-expression of GH and a more dramatic (8-fold) under-expression of the GH receptor in adult African Pygmies, which was not found in Non-Pygmy Bantu speakers…However, the only genetic study focusing specifically on Pygmies’ stature, failed to find allele frequency differences in the promoter region of the gene encoding IGF1 between two African Pygmy populations and Non-Pygmy controls…In this context, whether the Pygmy populations’ short stature is solely due to environmental pressures experienced by individuals during growth (i.e., phenotypic plasticity), or to a complex genetic mechanism, remains to be demonstrated.

I believe that IGF can be found in meat and milk, so one could imagine plausible dietary reasons for this difference. As for comparing across populations the genes which are known to impact height within populations, there simply aren’t that many known genes which could account for the large between-population differences. Not to mention that many of the current studies have used European populations, and so would likely have an ascertainment bias which might miss a lot of the variance which is common within African populations.

The basic method in this paper is not too difficult to understand:

1) Use STRUCTURE, a program which assigns different ancestral quanta to individuals.

2) And compare the variation in a particular Pygmy-modal quantum across the population with variation in height.

If there are many genetic variants of small effect within the Pygmy genome which are resulting in their relatively low adult median height then dollops of Pygmy genome through admixture will reduce the height of non-Pygmies and dollops of non-Pygmy admixture in Pygmies will increase their height. The presumption is that if there are strong environmental impacts on height due to social differences then the disjunction between genetic identity and anthropological identity will be informative. For example, if Pygmies are put under particular stress or deprived specific nutritional intake because of their communal identity as marginalized Pygmies then different admixture levels with non-Pygmies should not matter much (and vice versa).

There are a lot of statistics in this paper aimed at achieving significance (p-value < 0.05). And I really don’t understand the point of disaggregating males and females, for example. Just convert them to standard deviation units from the sex median! But in any case the major correlation is well illustrated by the two panels below. Pygmies are in red and non-Pygmies are in blue:

The y-axis is straightforward, height. You can see the Pygmies in their sample are shorter, on average. The x-axis is an ancestral component inferred from STRUCTURE which is generally found in non-Pygmies. You can see that as expected non-Pygmies have more of this than the Pygmies, but the descriptive statistic of a correlation between the non-Pygmy ancestry and height in Pygmies is evident even in this plot. Conversely, the Pygmy ancestry is correlated with lower adult height in non-Pygmies.

As a single result this particular finding isn’t too earth-shaking. If there was one population which was short due to genetic factors, I suspect that one would have to bet on the Pygmies of Central Africa. And as noted in the paper Pygmoid morphology is found among other hunter-gatherer tropical populations. This may not be a human ancestral type, but it is a type which has emerged repeatedly in our history, whether due to genetic or environmental factors. The big picture is that this same general procedure can be used to explore the differences in genetic dispositions across groups for many quantitative traits. With the coming era of cheap genotyping and sequencing I’m sure it will be done. An intrepid researcher has plenty of admixed populations in the New World to select from. There are in Brazil people who are socially identified and self-identify as white who have less European ancestry than those who are socially identified and self-identify as non-white. To compare the social and genetic valences of African and European ancestral contributions for medical and psychological quantitative traits these sorts of populations will be of great future interest.

Link credit: Dienekes

Citation: Becker NS, Verdu P, Froment A, Le Bomin S, Pagezy H, Bahuchet S, & Heyer E (2011). Indirect evidence for the genetic determination of short stature in African Pygmies. American journal of physical anthropology PMID: 21541921


The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.

Humans are diploid organisms. We have two copies of each gene, one inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability. To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can reasonably assume then that on the locus which controls the blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” allele. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that on one of those slots the individual has only the possibility of having a brown eye allele. Their probability of variation then is operative only on the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote. The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation.
Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expected value, as one would predict, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.

We know now that the human genome consists of ~3 billion base pairs of A, G, C, and T. In the oldest classical evolutionary genetic models each of these base pairs can be conceived to be inherited independently from the other. In other words, evolution is a game of independent probabilities. But this idealization is not the concrete reality. To the left is a visualization of a human male karyotype, the set of 23 chromosomal pairs which the human genome (excluding the mtDNA) manifests as. Because the ~3 billion aforementioned base pairs have a physical position within these chromosomes the reality is that some are inherited together. That is, their inheritance patterns are associated due to their physical linkage. The karyotype you see is clearly diploid. Each chromosome is divided into two symmetrical homologs, one inherited from each parent (except 23, the sex chromosomes). The chromosomal numbers also correspond roughly to a rank order of size. To give you a sense of the gap, chromosome 1 has 250,000,000 bases and 4,200 genes, while chromosome 22 has 50,000,000 bases and 1,100 genes (the Y chromosome has a paltry 450 genes, as opposed to the 1,800 on the X).

In the toy example above the eye color locus is on a chromosome. Specifically, chromosome 15. Each individual will inherit one copy of 15 from their parents. But, there is no guarantee that each sibling will inherit the same copy from the generation of the grandparents. Let’s illustrate this schematically. Below you see the four combinations possible in relation to the chromosomes inherited by an individual’s parents from their own parents. So “paternal” and “maternal” here is in reference from the parental generation, so there are two of each. The ones inherited from the parental mother I’ve italicized.

Possible outcomes of combinations from grandparents
                    | Mother: Paternal  | Mother: Maternal
Father: Paternal    | Paternal Paternal | Paternal Maternal
Father: Maternal    | Maternal Paternal | Maternal Maternal

The outcomes are as follows:

Top-left cell: paternal grandfather’s chromosome + maternal grandfather’s chromosome
Top-right cell: paternal grandfather’s chromosome + maternal grandmother’s chromosome
Bottom-left cell: paternal grandmother’s chromosome + maternal grandfather’s chromosome
Bottom-right cell: paternal grandmother’s chromosome + maternal grandmother’s chromosome

As an example, if on chromosome 15 two siblings were characterized by the top-left cell, we might say that they were 100% “identical-by-descent” (IBD). This just means that their genes came down from the exact same ancestors. On the other hand, if one sibling was characterized by the top-left cell, and another the bottom-right, then they would be 0% IBD! In other words, in theory with this model siblings could be 0% IBD on the autosomal chromosomes if they kept inheriting different homologs from their grandparents, chromosome by chromosome (This would not be possible for chromosome 23. Males by necessity inherit the same Y from their father. While two females must share the same X from their father).
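The no-recombination model above is small enough to enumerate exhaustively. What follows is a minimal sketch in Python, not anything drawn from the paper: the labels `PGF`, `PGM`, `MGF`, `MGM` and the function names are made up for illustration, and the only assumption is that each homolog is passed on with probability 1/2, independently for each sibling.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels: the father passes on the homolog he got from the
# paternal grandfather (PGF) or paternal grandmother (PGM); the mother passes
# on the one from the maternal grandfather (MGF) or maternal grandmother (MGM).
# Without recombination these are the four cells of the table above.
CELLS = list(product(["PGF", "PGM"], ["MGF", "MGM"]))


def ibd_fraction(sib_a, sib_b):
    """Fraction of the two chromosome copies that two siblings share IBD."""
    shared = sum(1 for a, b in zip(sib_a, sib_b) if a == b)
    return Fraction(shared, 2)


def sibling_ibd_distribution():
    """Probability of each IBD fraction over the 16 equally likely sibling pairs."""
    dist = {}
    for sib_a, sib_b in product(CELLS, repeat=2):
        f = ibd_fraction(sib_a, sib_b)
        dist[f] = dist.get(f, Fraction(0)) + Fraction(1, 16)
    return dist


dist = sibling_ibd_distribution()
mean = sum(f * p for f, p in dist.items())
variance = sum((f - mean) ** 2 * p for f, p in dist.items())

for f in sorted(dist):
    print(f"IBD {float(f):.1f}: probability {dist[f]}")
print("mean:", mean, "variance:", variance)
```

Under these assumptions a pair of siblings is 0% IBD on a given chromosome a quarter of the time, 50% half of the time, and 100% a quarter of the time, so the expectation is 1/2 even though no single outcome is guaranteed.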

If you have a background in biology, you know this is wrong, because there’s more to the story. Recombination means that in fact you don’t invariably inherit intact copies of your grandparents’ chromosomes. Rather, during meiosis, an individual’s chromosomes often “mix & match” their strands so that new mosaics are formed. So instead of inheriting homologous chromosomes which resemble exactly those carried by their grandparents, individuals often have chromosomes which are a mosaic of maternal and paternal due to the two meiosis events which intervened (one during the formation of the gametes which led to one’s parents, and another during the formation of the gametes of their parents’). If you are still confused, the following 3 minute instructional video may help. The narration has information, so if you can’t listen, the blue = paternal chromosomal segments, and the red = maternal chromosomal segments. Focus especially on recombination, about half way through the video.

This process works against the conditional dependence of inheritance of variants due to physical linkage on the same chromosomal regions. In other words, though it is still theoretically possible for siblings to be very different, as they could be with no recombination, realistically recombination breaks apart many of the associations and reduces the realized variance. In the figure above the low-bound outliers in terms of genetic relatedness across sibling pairs are about midway between the coefficient of relatedness of half-siblings (0.25) and full-siblings (0.50), at ~0.35 or so (the high bounds are ~0.65).

At any given locus the variance of IBD for siblings is 1/8. Since the expectation is ~0.50, you can infer from this that on a specific gene there’s a lot of deviation across a cohort of siblings. This makes sense when you consider that siblings differ a great deal on single gene Mendelian traits. But what about the whole genome? Because now you have many more “draws” the “law of large numbers” tends to reduce the variance. The figure to the right shows the standard deviation of IBD by chromosome. Remember that expectation is ~0.50. Observe that longer chromosomes have lower deviations. This is due to the variation of rates of recombination across the genome. We’ve come a long way from an abstract Mendelian model, to the point where one can integrate an understanding of differences of rates of recombination across regions of the genome into the model. The total genome standard deviation of IBD turns out to be 0.036, which is close to older theoretical models which predicted ~0.04. This means that if you randomly drew two full-siblings and compared the extent of total genome IBD, they would typically differ from 0.50 by about 0.036. Assuming a normal distribution that means that ~68% of sibling pairs would fall within the interval 0.464 to 0.536 coefficient of relatedness. About 95% would fall within two standard deviations, 0.428 to 0.572. About 99.7% would fall within three standard deviations, 0.392 to 0.608.
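The interval arithmetic is easy to check directly. The sketch below assumes, as the paragraph does, that genome-wide sibling IBD is normally distributed with mean 0.50 and standard deviation 0.036; the constant and function names are made up for illustration.

```python
from statistics import NormalDist

MEAN_IBD = 0.50   # expected coefficient of relatedness for full siblings
SD_IBD = 0.036    # genome-wide standard deviation quoted in the text


def interval(k):
    """Bounds of the interval k standard deviations either side of the mean."""
    return (MEAN_IBD - k * SD_IBD, MEAN_IBD + k * SD_IBD)


def coverage(k):
    """Share of sibling pairs expected inside that interval under normality."""
    dist = NormalDist(MEAN_IBD, SD_IBD)
    lo, hi = interval(k)
    return dist.cdf(hi) - dist.cdf(lo)


for k in (1, 2, 3):
    lo, hi = interval(k)
    print(f"{k} SD: {lo:.3f} to {hi:.3f}, coverage {coverage(k):.1%}")
```

The same `NormalDist` object can be asked how unusual any particular sibling pair is, for instance via `NormalDist(MEAN_IBD, SD_IBD).cdf(x)` for an observed sharing value `x`, with the caveat that real IBD distributions need not be exactly normal.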

The paper from which I’m drawing the figures and statistics is Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. The citations, as well as follow-up papers are very interesting. It shows how modern genomics is literally swallowing whole the insights of classical quantitative genetics. Nature is one, and abstractions ultimately map onto the concrete. I’d long thought I should review this paper and its insights, as comparisons across siblings are likely going to be a future avenue of understanding the genetic basis of many traits. But I have a more personal reason for looking into this issue.

This week many of my family members came “online” to the 23andMe system. To review:

RF = Father
RM = Mother
RS1 = Sibling 1 (female)
RS2 = Sibling 2 (male)

Later to come will be RS3, another male. But his data has not loaded….

23andMe has many features related to disease risk and ancestry information. The former was not of great interest to me, as my family is large enough that I had a good sense of what we were at risk for. 23andMe told me that I was at more risk for various ailments which are common across my extended pedigree. It also told me I was at more risk for ailments which are not known in my family. And, it told me I was at less risk for ailments common across my extended pedigree. Finally, it told me I was at less risk for ailments not common across my pedigree. You get the picture. For most people there isn’t much value-add here. I haven’t even touched the issue of “odds ratios”.

In regards to ancestry, I have received some value. I suspect I’m near the end of the line in this area, unless I get into some serious DIY genetics. My involvement in the Harappa Ancestry Project is more about understanding regional patterns of variation than that of my own family.

So we’re at the next stage: looking at patterns in my own family. The screenshot you see above is from the ‘family inheritance’, and shows the IBD between RS2 and RF chromosome by chromosome. My male sibling and my father. As you can see they are “half-identical” across the whole genome, as they should be. Of each gene my father contributes one copy on the autosome. There’s no variance here. The total 2.86 GB value is also what you’d expect, there are ~3 billion base pairs, and you’re excluding the X and Y, as well as “no calls.” I can tell you that I exhibit the exact same relationship to my father as my brother. In contrast, my sister has more segments shared. That’s because she has an X chromosome from my father. The relationship to our mother is also as expected. We’re all equally related to our parents, once you account for sex differences on chromosome 23.

Below are the screenshots from family inheritance comparing the three siblings in terms of our genomes. Remember that half-identical (light blue) has half the weight of full-identical (dark blue).

[Screenshots: family inheritance comparisons among the three siblings]

Here’s the top-line. I share about the same length of segments that are half-identical to both RS1 and RS2, 2.26 and 2.27 GB. But, while I have 0.60 full-identical with RS1, I have 0.86 full-identical with RS2. And here’s the even more surprising part: RS1 and RS2 have much less in common than I do with either of them. 2.09 GB half-identical, and 0.5 full-identical.

But that’s not all. 23andMe has a “relative finder” feature. Its main goal is to find relatives you don’t know about. I don’t have any non-close relatives so far, in contrast to most others from what I have heard. It may be that most of the Bangladeshis in the database are from my own immediate family! (Though there are some Indian Bengalis, I’ve found only one other Bangladeshi in the database to “share” genes with.) You can, though, include your own family in the mix. You get two different values, % of DNA shared, and # of shared segments. The former basically seems to be a proxy for IBD. I have a person of European American ancestry on my account, and they have many “relatives” matched with whom they share 0.1-1% of their genome. One individual who asked for a contact did turn out to be a very distant cousin (his surname was the same as that of a grandparent). In any case, the matrix above shows the results so far for my family. My parents are not related; they share no segments or DNA IBD. In contrast, we are all ~50% IBD with our parents (remember that father contributes no X chromosome to sons). But look at the sibling comparisons. In particular, RS1 & RS2 share only 42% of their DNA! This aligns with the earlier results. RS1 and I are a bit closer than expectation. RS2 and I are a bit more distinct. Interestingly, while RS2 and I have 49 segments in common, RS1 and RS2 have 55 in common. Why the discrepancy? Presumably RS1 and RS2 load up on the number of segments on smaller chromosomes. This seems clear in the images above.

Where does this leave us? We know intuitively that siblings differ, and cluster, in their traits. These data and methods illustrate how in the near future parents will be able to determine which siblings cluster on the total genome content level! As I have stated before, RS2 and I in particular resemble each other physically, far more than either of us resembles RS1. Could this relate to what we’ve found genomically? I believe so. Physical appearance is controlled by many different variants across many different genes, so the phenotype may be a good reflection of the character of the total genome. This can be generalized to other quantitative traits.

Finally, this has clear implications for our study of genetic inheritance within families. Classical genetic techniques had to assume that the coefficient of relatedness between siblings was 0.50. The deviation from this expectation would have introduced errors into estimates of heritability and possibly masked the understanding of the genetic architecture of a trait. But now we can correct for deviations from the 0.50 value, and so better understand the genetic basis of complex traits such as behavior.

Citation: Visscher, P., Medland, S., Ferreira, M., Morley, K., Zhu, G., Cornes, B., Montgomery, G., & Martin, N. (2006). Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings PLoS Genetics, 2 (3) DOI: 10.1371/journal.pgen.0020041


Two of the main avenues of research which I track rather closely in this space are genome-wide association studies (GWAS), which attempt to establish a connection between a trait/disease and particular genetic markers, and inquiries into the evolutionary parameters which shape the structure of variation within the human genome. Often with specific relation to a particular trait/disease. By evolutionary parameters I mean stochastic and deterministic forces; mutation, migration, random drift, and natural selection. These two angles are obviously connected. Both focus on phenomena which are proximate in relation to the broader evolutionary principle: the ultimate raison d’être, replication. Stochastic forces such as random genetic drift reflect the error of sampling of genes from generation to generation during the process of reproduction, while adaptation through natural selection is an outcome of the variation of reproductive fitness as a function of variation of heritable traits. Both of these forces have been implicated in diseases and traits which come under the purview of GWAS (and linkage mapping).

GWAS are regularly in the news because of their relevance in identifying the causal genetic factors for specific diseases. For example, schizophrenia. But they can be useful in a non-disease context as well. Human pigmentation is a character whose genetic architecture has been well elucidated thanks to a host of recent association studies. The common disease-common variant model has yielded spectacular results for pigmentation; it does seem a few common variants are responsible for most of the variation on this trait. But this has been the exception rather than the rule.

One reason for this disjunction between the promise of GWAS and the concrete tangible outcomes is that many traits/diseases of interest may be polygenic and quantitative. This implies that variation in phenotype is controlled by variation across many genes, and that the variation itself exhibits gradual continuity (a continuity which can be modeled as a normal distribution of values). The power of GWAS to detect correlated variation across genes and traits of small marginal effect is obviously limited. In contrast, it seems that about half a dozen genes can explain most of the between population variation in pigmentation. One SNP is able to account for 25-40% of the difference in shade between Europeans and Africans. This SNP is fixed in Europeans, nearly absent in Africans and East Asians, and segregating in both ancestral and derived variants in groups such as South Asians and African Americans. Traits such as schizophrenia and height are different: though they are substantially heritable, so that much of the variation at the population level of the trait is explainable by variation in genes, the effect size at any given locus may be small, or the variation may be accumulated through the sum of larger effect variants of low frequency. In other words, many common variants of small effect, or numerous distinctive rare variants of large effect. These nuances of genetic architecture are not irrelevant to the possible evolutionary arc of the traits in question. One model of the adaptation leading to the high frequency of a trait or disease is that a novel mutation rapidly “sweeps” to fixation, or nearly to fixation. In other words, it shifts from ~0% to ~100% frequency in the population of alleles at that locus, driven by positive selection.
This sort of rapid “hard sweep” would also result in “hitchhiking” of associated variants in the genomic regions adjacent to the originally favored mutant, producing regions of high linkage disequilibrium in the genome and haplotype blocks of associated alleles across loci. Such a model does seem possible in the case of some of the variants which are responsible for diversity of pigmentation. But this neat dovetailing between the strong association of a few variants with trait variance, and signatures of positive selection being driven by adaptation, is not so easy to come by in many instances.

There are other evolutionary possibilities in terms of what could drive a high frequency of particular alleles. Population bottlenecks and inbreeding can crank up the frequency of a variant simply through chance. This may be the origin of many traits and diseases expressed recessively or in quasi-Mendelian form which run in specific populations. Let’s set such stochastic possibilities to the side for now. The well of natural selection is not quite tapped out simply by models of positive selection drawing upon singular new mutations. Another model is that of “soft sweeps” operating upon standing genetic variation. Consider for example a trait which has a heritability of 0.50. 50% of the variance in trait value can be explained by variance in genes. Selection correlated with trait value can rapidly change the distribution of the trait within the population, as modeled by the breeder’s equation. But no new mutations are necessary in this model; rather, the frequencies of extant alleles change over time. In fact, as the proportions shift novel combinations of alleles which were once too rare to be found together in the same individual will emerge, and so offer up the possibility that the mean trait value in generation t + n may be outside of the range of trait values at t = 0.
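The breeder’s equation mentioned above is R = h²S: the response to selection per generation equals the heritability times the selection differential (the gap between the mean of the selected parents and the population mean). A minimal sketch follows. The function names and example numbers are purely illustrative, and holding h² and S constant across generations is a simplifying assumption that will not hold indefinitely in real populations.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation: R = h2 * S."""
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("narrow-sense heritability must lie between 0 and 1")
    return h2 * selection_differential


def project_mean(mean0, h2, selection_differential, generations):
    """Mean trait value per generation, assuming h2 and S stay constant."""
    means = [mean0]
    for _ in range(generations):
        means.append(means[-1] + response_to_selection(h2, selection_differential))
    return means


# Illustrative numbers only: heritability 0.50, parents averaging 2 units
# above the population mean, followed for 3 generations.
print(project_mean(100.0, 0.5, 2.0, 3))
```

With a heritability of 0.50 only half of the parental advantage shows up in the offspring mean each generation, but the shifts accumulate without any new mutation entering the model.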

Over time such selection on a quantitative trait theoretically exhausts its own fuel, genetic variation. But quite often this is not practically operative, because such traits are subject to a background level of novel mutation and balancing selection. Stabilizing selection around a median phenotype, as well as frequency dependence and shifting environmental pressures, may produce a circumstance where adaptation never moves beyond the transient flux toward a new equilibrium. The element of the eternal race is at the heart of the Red Queen’s Hypothesis, where pathogen and host engage in an evolutionary war, and host immune responses are subject to negative frequency dependence. As the frequency of an allele rises, its relative fitness declines. As its frequency declines, its fitness rises.

Naturally such complex evolutionary models, subject to contingency and less powerful in their generality, only become appealing when simple hard sweep models no longer suffice. But it seems highly plausible that the genetic architecture of some traits, those which seem plagued by ‘missing heritability,’ is going to necessitate somewhat more baroque evolutionary models to explain their ultimate emergence & persistence. A new paper in PLoS Genetics tackles this complexity by looking at the patterns of variation of SNPs implicated in GWAS in the HGDP data set. Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations? First, the abstract:

Genome-wide association studies (GWAS) have identified more than 2,000 trait-SNP associations, and the number continues to increase. GWAS have focused on traits with potential consequences for human fitness, including many immunological, metabolic, cardiovascular, and behavioral phenotypes. Given the polygenic nature of complex traits, selection may exert its influence on them by altering allele frequencies at many associated loci, a possibility which has yet to be explored empirically. Here we use 38 different measures of allele frequency variation and 8 iHS scores to characterize over 1,300 GWAS SNPs in 53 globally distributed human populations. We apply these same techniques to evaluate SNPs grouped by trait association. We find that groups of SNPs associated with pigmentation, blood pressure, infectious disease, and autoimmune disease traits exhibit unusual allele frequency patterns and elevated iHS scores in certain geographical locations. We also find that GWAS SNPs have generally elevated scores for measures of allele frequency variation and for iHS in Eurasia and East Asia. Overall, we believe that our results provide evidence for selection on several complex traits that has caused changes in allele frequencies and/or elevated iHS scores at a number of associated loci. Since GWAS SNPs collectively exhibit elevated allele frequency measures and iHS scores, selection on complex traits may be quite widespread. Our findings are most consistent with this selection being either positive or negative, although the relative contributions of the two are difficult to discern. Our results also suggest that trait-SNP associations identified in Eurasian samples may not be present in Africa, Oceania, and the Americas, possibly due to differences in linkage disequilibrium patterns. This observation suggests that non-Eurasian and non-East Asian sample populations should be included in future GWAS

And now the author summary:

Natural selection exerts its influence by changing allele frequencies at genomic polymorphisms. Alleles associated with harmful traits decrease in frequency while those associated with beneficial traits become more common. In a simple case, selection acts on a trait controlled by a single polymorphism; a large change in allele frequency at this polymorphism can eliminate a deleterious phenotype from a population or fix a beneficial one. However, many phenotypes, including diseases like Type 2 Diabetes, Crohn’s disease, and prostate cancer, and physiological traits like height, weight, and hair color, are controlled by multiple genomic loci. Selection may act on such traits by influencing allele frequencies at a single associated polymorphism or by altering allele frequencies at many associated polymorphisms. To search for cases of the latter, we assembled groups of genomic polymorphisms sharing a common trait association and examined their allele frequencies across 53 globally distributed populations looking for commonalities in allelic behavior across geographical space. We find that variants associated with blood pressure tend to correlate with latitude, while those associated with HIV/AIDS progression correlate well with longitude. We also find evidence that selection may be acting worldwide to increase the frequencies of alleles that elevate autoimmune disease risk.

This is a paper where jumping to the methods might be useful. Though I’m sure that the authors did not intend it, sometimes it felt as if you were following the marble being manipulated by the carnival tender. Since I was not familiar with some of the terms for the statistics, a simple allusion to the methods without elaborating in detail did not suffice. In any case, the key here is that they focused on the set of SNPs which have been associated with trait variance in GWAS, and compared those to the total SNPs found in the HGDP data set of 53 populations. Note that not all SNPs in GWAS were in the HGDP SNP panel. But for the general questions being asked the intersection of SNPs sufficed. Additionally, they generated a further subset of SNPs which were highly likely to be associated with trait variance. These were SNPs where other SNPs of related function were within 1 MB, or, SNPs which were found in more than one GWAS.

There were four primary statistics within the paper: Delta, Fst, LLC, and iHS. Fst and iHS are familiar. Fst measures the extent of between-population variance across a set of populations. High Fst means a great deal of population structure, while Fst ~ 0 means basically no population structure. iHS is a test to detect the probability of natural selection based on patterns of linkage disequilibrium in the genome. Basically the important thing for the purposes of this paper is that iHS tends to be good at detecting alleles at moderate frequencies still presumably going through sweeps. This is in contrast to the older EHH test, which only detects sweeps which are nearly complete. If the authors are focusing on polygenic traits and soft sweeps the likelihood of that showing up on EHH is low since that is predicated on hard, nearly complete, sweeps. LLC measures the correlation between the frequency of a trait-associated variant and latitude or longitude. Presumably this would be useful for smoking out those traits driven by ecological pressures (an obvious example in a general sense is the consistent change in area-to-volume ratio across taxa as organisms proceed from warmer to colder climes). Finally, Delta measures the allele frequency difference between two populations, or pooled groups of populations. The sign of Delta is simply a function of whether the allele frequency in question is higher in the first or second population in the comparison.
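To make two of these statistics concrete, here is a minimal sketch of Delta and Fst for a single biallelic SNP. It uses the textbook heterozygosity-based definition of Fst and assumes equally weighted populations; the function names and the example frequencies are mine, and the paper's own estimators may well differ in the details.

```python
def delta(freq_first, freq_second):
    """Allele frequency difference; positive if the allele is more common in the first group."""
    return freq_first - freq_second


def fst(allele_freqs):
    """Textbook Fst = (H_T - H_S) / H_T for one biallelic SNP.

    allele_freqs: one allele frequency per population.
    Assumption: every population gets equal weight (no sample-size correction).
    """
    n = len(allele_freqs)
    mean_freq = sum(allele_freqs) / n
    # Expected heterozygosity if all populations were pooled into one.
    h_total = 2 * mean_freq * (1 - mean_freq)
    # Average expected heterozygosity within the populations.
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / n
    if h_total == 0:
        return 0.0  # the SNP is monomorphic everywhere, so there is no structure to measure
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
no_structure = fst([0.5, 0.5])      # identical frequencies
fixed_difference = fst([0.0, 1.0])  # allele absent in one population, fixed in the other
moderate = fst([0.2, 0.8])
```

Identical frequencies give an Fst of zero and a fixed difference gives an Fst of one, which matches the verbal description above.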

In doing their comparisons the authors did not simply compare across all 53 populations in a pairwise fashion. Rather, they often pooled continental or regional groups. To the left is a slice of table 1. It shows the populations used to generate the Delta values, and how they were pooled. The HGDP populations are broken down by region in a rather straightforward manner. But also note that some of the comparisons are between populations within regions, and those with different lifestyles. I assume that the comparisons highlighted within the paper were performed with the aim of squeezing maximal informative juice in such an exploratory endeavor. There are no obligate hunter-gatherers within the Eurasian populations in the HGDP data set to my knowledge, so a comparison between agriculturalists and hunter-gatherers would not be possible. There is such a comparison available in the African data set. The authors generated p-values by comparing the GWAS SNPs to random SNPs within the HGDP data set. In particular, they were looking for signatures of distinctiveness among the HGDP data set.
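The general idea of scoring a GWAS SNP against random SNPs can be sketched as an empirical p-value. This is only an illustration of the technique: the function name and the scores are hypothetical, and I am not claiming this is the authors' exact procedure (they may, for instance, have matched the random SNPs on allele frequency).

```python
def empirical_p_value(observed_score, background_scores):
    """Share of background scores at least as extreme as the observed one.

    The +1 in numerator and denominator is a common convention that keeps
    the estimate from ever being exactly zero.
    """
    at_least_as_extreme = sum(1 for score in background_scores if score >= observed_score)
    return (at_least_as_extreme + 1) / (len(background_scores) + 1)


# Hypothetical Fst-like scores for four random SNPs and one GWAS SNP.
random_snp_scores = [0.1, 0.2, 0.3, 0.95]
p_value = empirical_p_value(0.9, random_snp_scores)
```

With a realistic background of many thousands of random SNPs, a small value would say that the GWAS SNP is unusual relative to the rest of the panel.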

Such distinctiveness is expected. The set of SNPs associated with diseases and traits of note are not likely to be a representative subset of the SNPs across the whole genome. Remember that a neutral model of molecular evolution means that we should expect most genetic variation within the genome is going to be due to stochastic forces. Panel A of figure 1 shows that in fact the SNPs derived from GWAS did exhibit a different pattern from the total set of SNPs in the HGDP panel. Observe that the distribution of minor allele frequency (MAF) is somewhat skewed toward higher values for the GWAS SNPs. If the logic of GWAS is geared toward “common variants” which will be frequent enough within the population to generate an effect which is powerful enough to be picked up by the studies given their sample sizes, the bias toward more common variants (higher MAF) is understandable.

To the left are some SNPs and traits which had low p-values (i.e., they deviated from expectation beyond what you’d expect from random noise). Not very surprisingly they found that pigmentation related SNPs tended to show up strongly in all the measures of population differentiation and variation. rs28777 is found in SLC45A2, a locus which differentiates Europeans from non-Europeans. rs1834640 is in SLC24A5, which differentiates Europeans + Middle Easterners + Central/South Asians from other populations. rs12913832 is a “blue eye” related variant. That is, it’s one of the markers associated with blue vs. non-blue eye color differences in Europeans.

Seeing that pigmentation has been one of the few traits which has been well elucidated by the current techniques, it should be expected that more subtle and thorough methods aimed at detecting genetic variation across and within populations should stumble upon those markers first. The authors note that “SNPs and study groups associated with pigmentation and immunological traits made up a majority of those that reached significance in our analysis.” There has long been a tendency toward finding signatures of selection around pigmentation and disease related loci.

One pattern which was also evident in terms of geography in the patterns of low p-values was the tendency for Eurasian groups to be enriched. This is illustrated in figure 2. Most of the SNPs from the GWAS studies were derived from study populations which were European. Because of this there is probably a bias in the set of SNPs being evaluated toward those which are particularly informative for Europeans and related populations. Additionally, it may also be that Eurasians were subject to different selective pressures as they left the ancestral African environment ~150-50,000 years B.P. In any case, for purposes of medical analysis the authors did find that using SNPs from East Asian populations produced somewhat different results than using those from European populations. Though some studies have shown a broad applicability of SNPs across populations, there are no doubt many variants in non-European populations which have simply not been detected because GWAS studies are not particularly focused on non-European populations. Consider:

… However, our results indicate that SNPs associated with pigmentation in GWAS display unusual allele frequency patterns almost exclusively in Europe, the Middle East, and Central Asia. This suggests to us that there may be SNPs, perhaps in or near genes other than SLC45A2, IRF4, TYR, SLC24A4, HERC2, MC1R, and ASIP, which are associated with pigmentation in non-Eurasian populations, but which have yet to be identified by GWAS. GWAS for pigmentation traits carried out using non-European subjects are needed to explore this possibility further.

There are two major other classes of trait/disease which were found to vary systematically across the HGDP populations:

– High blood pressure associated variants seemed to decrease with latitude

– Infectious and autoimmune disease SNPs had elevated scores. Specifically, there were some HIV related SNPs associated with Europeans which seem to confer resistance

The first set of traits would naturally come out of GWAS derived SNPs, since so much medical research goes into identifying risk and treating high blood pressure and other circulatory ailments. A consistent pattern where geography and not ancestry predict variation is an excellent tell for exogenous selective pressures. The physical nature of the earth is such that as mammals spread away from the equators their physiques will be reshaped by different sets of ecological parameters. Siberian populations have developed adaptations to cold stress, and there seem to be consistent cross-taxa shifts in body form to maximize or minimize heat radiation among mammals.

In the second case you have resistance to disease cropping up again, as well as pleiotropy, whereby genetic changes can have multiple downstream consequences. Often this is temporally simultaneous; consider the tame silver foxes. But sometimes you have a change in the past which has a subsequent consequence later in time due to different selective pressures. It is not that surprising that immunological responses can be multi-purpose, so even though Europeans did not develop resistance to HIV as a general selective pressure, similar pressures seem to have resulted in responses with general utility and now a specific use in relation to HIV. Selection can often be a blunt instrument, interposing itself into a network of interactions with multiple consequences, reshaping many traits simultaneously in the process of maximizing local fitness. This is most clear when you have a trait such as sickle-cell disease, which emerges only because the fitness benefit of heterozygosity is so great. But no doubt when it comes to many traits the byproducts are more subtle, or may seem cryptic to us. We still do not know why EDAR was driven to higher frequency in East Asians (less body odor and thick straight hair seem implausible targets for selection).

And just as natural selection can be blunt and rude in its impact on the covariance of genes and traits, so its relaxation may remove a suffocating vice. Consider the possibilities with blood pressure: perhaps the reason that northern Eurasians have lower blood pressure is that selection for other correlated traits associated with higher values were relaxed, allowing for fitness to be maximized in this particular dimension. Similarly, African Americans have a lower frequency of the sickle-cell disease than their ~80% West African ancestry would entail, because without the pressure of endemic malaria selection for the heterozygote was removed, allowing for the purging of the allele from the gene pool.

Nevertheless, the authors do conclude:

Despite our broad-based approach, we found only a few examples of what may be a polygenic response to a single selective pressure. We did use stringent significance criteria which might mean that additional examples can be found among the study groups that did not quite meet our threshold of significance. It may also be that there is something about “GWAS” traits and their underlying genetics that served to undermine our approach.

They have several suggestions for why this didn’t pan out:

– The GWAS variants aren’t the primary source of the variation. It could be copy number variants, rare large effect variants (“synthetic”)

– Epistasis. Gene-gene interaction, which would mask or confound linear associations between variants and traits

– Low impact of selection on GWAS SNPs, or, balancing or negative selection

They finish:

In summary, we have examined 1,336 trait-associated SNPs in the 53 CEPH-HGDP populations looking for individual SNPs and groups of SNPs with unusual allele frequency patterns and elevated iHS scores. We identified 13 different traits with an associated SNP or study group that produced a significantly elevated score for at least one delta, Fst, LLC, or iHS measure, a small percentage of the total number of traits analyzed. We believe that the limited number of positive results could be due to our stringent significance criteria or to features of the genetic architecture of the traits themselves. Specifically, the roles of rare variants, epistasis, and pleiotropy in human complex traits are, although areas of active inquiry, still generally not well understood. Our measures may also not be optimal for detecting all types of selection acting on GWAS traits. It has been speculated that variants underlying complex traits will be influenced primarily by negative or balancing selection, which may not produce extreme values for our measures, particularly if these forces are relatively uniform across populations or are acting on many regions in the genome.

If selective pressures on polygenic traits are so common perhaps genomicists are going to be thumbing through Introduction to Quantitative Genetics. These are traits and evolutionary processes which lack clear distinction. In many ways modeling positive selection and hard sweeps resembles the economics of equilibriums. When it comes to continuous and quantitative traits subject to the effect of many genes a different way of thinking has to come to the fore. The transient no longer becomes a punctuation between the stasis, but the thing in and of itself. There are for example HLA genes in humans which are found in chimpanzees, because the nature of the eternal race between host and pathogen means that all the old tricks are preserved, at least at low frequencies. Human variation in intelligence, height, and all sorts of other liabilities and characteristics, may have always been with us, being buffeted continuously by a swarm of selective pressures. The question is, can our crude statistical methods ever get a grip on this diffuse but all-powerful net?

Citation: Casto AM, & Feldman MW (2011). Genome-Wide Association Study SNPs in the Human Genome Diversity Project Populations: Does Selection Affect Unlinked SNPs with Shared Trait Associations? PLoS Genetics : 10.1371/journal.pgen.1001266


In a nation of ~1 billion, even one where a large minority are positively malnourished, you’d expect some really tall people. So not that surprising: NBA Awaits Satnam From India, So Big and Athletic at 14:

In a country of 1.3 billion people, 7-foot, 250-pound Satnam Singh Bhamar has become a beacon for basketball hope.

At age 14.

That potential starts with his size, which is incredible itself. At age 14, he is expected to grow for another couple of years. For now, he wears a size-22 basketball shoe. His hands swallow the ball. His father, Balbir Singh Bhamara, is 7-2. His grandmother on his father’s side is 6-9.

Punjab is one of India’s more prosperous states. Interestingly this kid’s paternal grandmother is as tall in standard deviation units as her son or grandson. In Western developed societies height is 80-90% heritable. That means that there’s very little expected regression back to the population mean for any given child. The article doesn’t mention the mother’s height though. If she is of more normal size then Satnam is either a fluke, or, there are dominant large effect rare alleles being passed down by the father, perhaps from the paternal grandmother.
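The regression logic here can be sketched with the standard quantitative genetic expectation that a child's deviation from the population mean is the narrow-sense heritability times the midparent deviation. Everything is in standard deviation units (which takes care of the sex difference), and the specific z-scores below are hypothetical numbers of mine, not measurements of this family.

```python
def expected_offspring_z(father_z, mother_z, heritability):
    """Expected child height in SD units, given the parents' heights in SD units.

    Assumes a purely additive trait, so the expected deviation of the child
    is heritability * midparent deviation.
    """
    midparent_z = (father_z + mother_z) / 2
    return heritability * midparent_z


h2 = 0.85  # midpoint of the 80-90% range quoted above

# Hypothetical: both parents 6 SD above the mean.
both_tall = expected_offspring_z(6.0, 6.0, h2)

# Hypothetical: father 6 SD above the mean, mother exactly average.
one_tall = expected_offspring_z(6.0, 0.0, h2)
```

When both parents are extreme the child is expected to stay close to them, but with an average mother the expectation drops to less than half of the father's deviation. That is why a child as tall as his father would point to either luck or something beyond simple additive inheritance.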

• Category: Science • Tags: Genetics, Height, Quantitative Genetics 
In the early 20th century there was a rather strange (in hindsight) debate between two groups of biological scientists attempting to understand the basis of inheritance and its relationship to evolutionary processes. The two factions were the biometricians and Mendelians. As indicated by their appellation the Mendelians were partisans of the model of inheritance formulated by Gregor Mendel. Like Mendel many of these individuals were experimentalists, with a rough & ready qualitative understanding of biological processes. William Bateson was arguably the model’s most vociferous promoter. Set against the Mendelians were more mathematically minded thinkers who viewed themselves as the true inheritors of the mantle of Charles Darwin. Though the grand old patron of the biometricians was Francis Galton, the greatest expositor of the school was Karl Pearson.* Pearson, along with the zoologist W. F. R. Weldon, defended Charles Darwin’s conception of evolution by natural selection during the darkest days of what Peter J. Bowler terms “The Eclipse of Darwinism”.**

One aspect of Darwin’s theory as laid out in The Origin of Species was gradual change through the operation of natural selection upon extant genetic variation. There was a major problem with the model which Darwin proposed: he could offer no plausible engine in regards to mode of inheritance. Like many of his peers Charles Darwin implicitly assumed a blending model of inheritance, so that the offspring would be an analog constructed about the mean of the parental values. But as any old school boy knows the act of blending diminishes variation! This, along with other concerns, resulted in a general tendency in the late 19th century to accept the brilliance of the idea of evolution as descent with modification, but dismiss the motive engine which Charles Darwin proposed, gradual adaptation via natural selection upon heritable variation.

Mendel’s theory of inheritance rescued Darwinism from the problem of gradual diminution of natural selection’s raw material through the process of sexual reproduction. Yet due to personal and professional rivalries many did not see in Mendelism the salvation of evolutionary theory. Pearson and the biometricians scoffed at Bateson and company’s innumeracy. They also argued that the qualitative distinctions in trait value generated by Mendel’s model could not account for the wide range of continuous traits which were the bread & butter of biometrics, and therefore natural selection itself. Some of the Mendelians also engaged in their own flights of fancy, seeing in large effect mutations which they were generating in the laboratory an opening for the possibility of saltation, and rendering Darwinian gradualism absolutely moot.

There were great passions on both sides. The details are impeccably recounted in Will Provine’s The Origins of Theoretical Population Genetics. Early on in the great debates the statistician G. U. Yule showed how Mendelism could be reconciled with biometrics. But his arguments seem to have fallen on deaf ears. Over time the controversy abated as biometricians gave way to the Mendelians through a process of attrition. Weldon’s death in 1906 was arguably the clearest turning point, but it took a young mathematician to finish the game and fuse Mendelism and biometrics together and lay the seeds for a hybrid theoretical evolutionary genetics.

That young mathematician was R. A. Fisher. Fisher’s magnum opus is The Genetical Theory of Natural Selection, and his debates with the American physiologist and geneticist Sewall Wright laid the groundwork for much of evolutionary biology in the 20th century. Along with J. B. S. Haldane they formed the three-legged population genetic stool upon which the Modern Neo-Darwinian Synthesis would come to rest. Not only was R. A. Fisher a giant within the field of evolutionary biology, but he was also one of the founders of modern statistics. But those accomplishments were of the future; first he had to reconcile Mendelism with the evolutionary biology which came down from Charles Darwin. He did so with such finality that the last embers of the debate were finally doused, and the proponents of Mendelism no longer needed to be doubters of Darwin, and the devotees of Darwin no longer needed to see in the new genetics a threat to their own theory.

One of the major issues at work in the earlier controversies was one of methodological and cognitive incomprehension. William Bateson was a well known mathematical incompetent, and he could not follow the arguments of the biometricians because of their quantitative character. But no matter, he viewed it all as sophistry meant to obscure, not illuminate, and his knowledge of concrete variation in form and the patterns of inheritance suggested that Mendelism was correct. The coterie around Karl Pearson may have slowly been withering, but the powerful tools which the biometricians had pioneered were just waiting to be integrated into a Mendelian framework by the right person. By 1911 R. A. Fisher believed he had done so, though he did not write the paper until 1916, and it was published only in 1918. Titled The Correlation Between Relatives on the Supposition of Mendelian Inheritance, it was dense, and often cryptic in the details. But the title itself is a pointer as to its aim, correlation being a statistical concept pioneered by Francis Galton, and the supposition of Mendelian inheritance being the model he wished to reconcile with classical Darwinism in the biometric tradition. And in this project Fisher had a backer with an unimpeachable pedigree: a son of Charles Darwin himself, Leonard Darwin.

You can find this seminal paper online, at the R. A. Fisher digital archive. Here is the penultimate paragraph:

In general, the hypothesis of cumulative Mendelian factors seems to fit the facts very accurately. The only marked discrepancy from existing published work lies in the correlation for first cousins. Snow, owing apparently to an error, would make this as high as an avuncular correlation; in our opinion it should differ by little from that of the great-grandparent. The values found by Miss Elderton are certainly extremely high, but until we have a record of complete cousinships measured accurately and without selection, it will not be possible to obtain satisfactory numerical evidence on this question. As with cousins, so we may hope that more extensive measurements will gradually lead to values for the other relationship correlations with smaller standard errors. Especially would more accurate determinations of the fraternal correlation make our conclusions more exact.

I have to admit at the best of times that R. A. Fisher can be a difficult prose stylist to follow. One might wish to add from a contemporary vantage point that his language has a quaint and dated feel which compounds the confusion, but the historical record is clear that contemporaries had great difficulty in teasing apart distinct elements in his argument. Much of this was due to the mathematical aspect of his thinking; most biologists were simply not equipped to follow it (as late as the 1950s biologists at Oxford were dismissing Fisher’s work as that of a misguided mathematician according to W. D. Hamilton). In the text of this paper there are the classic jumps and mysterious connections between equations along the chain of derivation which characterize much of mathematics. The problem was particularly acute with Fisher because his thoughts were rather deep and fundamental, and he could hold a great deal of complexity in his mind. Finally, there are extensive tables and computations of correlations of pedigrees from that period drawn from biometric research which seem extraneous to us today, especially if you have Mathematica handy.

But the logic behind The Correlation Between Relatives on the Supposition of Mendelian Inheritance is rather simple: in the patterns of correlations between relatives, and the nature of variance in trait value across those relatives, one could perceive the nature of Mendelian inheritance. It was Mendelian inheritance which could explain most easily the patterns of variation across continuous traits as they were passed down from parent to offspring, and as they manifested across a pedigree. Early on in the paper Fisher observes that a measured correlation between father and son in stature is 0.5. From this one can explain 1/4 of the variance in the height across the set of possible sons. This biological relationship is just a specific instance of the coefficient of determination, how much of the variance in a value, Y (sons’ heights), you can predict from the variance in X (fathers’ heights). Correcting for sex one can do the same for mothers and their sons (and inversely, fathers and their daughters).*** So combining the correlations of the parents to their offspring you can explain about half of the variance in the offspring height in this example (the correlation is higher in contemporary populations, probably because of much better nutrition in the lower orders). But you need not constrain yourself to parent-child correlations. Fisher shows that correlations across many sorts of relationships (e.g., grandparent-grandchild, sibling-sibling, uncle-niece/nephew) have predictive value, though the correlation will be a function of genetic distance.
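The arithmetic in this paragraph is just the coefficient of determination, and a small sketch makes it explicit. The two-predictor formula below is the standard one for multiple regression; the function names and the 0.2 spousal correlation are illustrative assumptions of mine, not values from Fisher's paper.

```python
def variance_explained_one_parent(parent_child_r):
    """Coefficient of determination for a single predictor: r squared."""
    return parent_child_r ** 2


def variance_explained_two_parents(father_r, mother_r, spousal_r=0.0):
    """R squared from two predictors that may themselves be correlated.

    spousal_r is the correlation between the parents' heights. Zero means
    random mating with respect to height, in which case the two squared
    correlations simply add.
    """
    numerator = father_r ** 2 + mother_r ** 2 - 2 * father_r * mother_r * spousal_r
    return numerator / (1 - spousal_r ** 2)


one_parent = variance_explained_one_parent(0.5)              # a quarter of the variance
two_parents = variance_explained_two_parents(0.5, 0.5)       # half, under random mating
assortative = variance_explained_two_parents(0.5, 0.5, 0.2)  # hypothetical spousal correlation
```

Note that the "about half" figure relies on the parents' heights being uncorrelated. If tall people tend to marry tall people, the two parents carry overlapping information and jointly explain somewhat less than the sum of their separate contributions.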

What does correlation, a statistical value, have to do with Mendelism? Remember, Fisher argues that it is Mendelism which can explain in the details patterns of correlations on continuous traits. There were peculiarities in the data which biometricians explained with abstruse and ornate models which do not bear repeating, so implausible was the chain of conjectures. It turns out that Mendelism is not only the correct explanation for inheritance, but it is elegant and parsimonious when set next to the alternatives proposed which had equivalent explanatory power. A simple blending model could not explain the complexity of life’s variation, so more complex blending models emerged. But it turned out that a simple Mendelian model explained complexity just as well, and so the epicycles of the biometricians came crashing down. Mendelism was for evolutionary biology what the Copernican model was for planetary astronomy.

To a specific case where Mendelism is handy: in the data Fisher noted that the height of a sibling can explain 54% of the variance of height of other siblings, while the height of parents can explain only 40% of that of their offspring. Why the discrepancy? It is noted in the paper that the difference between identical twins is marginal, and other workers had suggested that the impact of environment could not explain the whole residual (what remains after the genetic component). Though later researchers observe that Fisher’s assumptions here were too strong (or at least the state of the data on human inheritance at the time misled him) the big picture is that siblings have a component of genetic correlation which they share with each other which they do not share with their parents, and that is the fraction accounted for by dominance. When dominance is included in the equation heritability is referred to as the “broad sense,” while when dominance is removed it is termed “narrow sense.”

A concept such as dominance can of course be easily explained by Mendelism, at least formally (the physiological basis of dominance was later a point of contention between Fisher and Sewall Wright). Most of you have seen a Punnett square, whereby heterozygous parents will produce offspring in ratios where 50% are heterozygous, and 25% one homozygote and 25% another. But consider a scenario where one parent is a heterozygote, and the other a homozygote for the dominant trait. Both parents will express the same trait value, as will their offspring. But, there will be a decoupling of the correlation between trait-value and genotype here, as the offspring will be genotypically variant. Parent-offspring correlations along the regression line become distorted by a dominance parameter, and so reduce correlations. In contrast, full siblings share the same dominance effects because they share the same parents and can potentially receive the same identical by descent alleles twice. Consider a rare recessively expressed allele, one for cystic fibrosis. As it is rare in a population in almost all cases where the offspring are homozygotes for the disease causing allele, both parents will be heterozygotes. They will not express the disease because of its recessive character. But 25% of their offspring may because of the nature of Mendelian inheritance. So there’s a major possible disjunction between trait values from the parental to offspring cohorts. On the other hand, each sibling has a 25% chance of expressing the disease, and so the correlation is much higher than that with the parents (who do not express disease). In other words siblings can resemble each other much more than they may resemble either parent! This makes intuitive sense when you consider the inheritance constraints and features of Mendelism in diploid sexual species. But obviously a simple blending model can account for this. What it cannot account for is the persistence of variation. It is through the segregation of independent Mendelian alleles, and their discrete and independent reassortment, that one can see how variation would not only persist from generation to generation, but manifest within families as alleles across loci shake out in different combinations. A simple model of inheritance can then explain two specific phenomena which are very different from each other.
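The Punnett square logic, and the claim that dominance makes siblings more alike than parent and child, can be checked by brute-force enumeration. This is a sketch under simple assumptions of my own choosing: one locus, a fully recessive trait scored 1 for affected and 0 otherwise, random mating, and Hardy-Weinberg genotype frequencies. The function names and allele frequencies are illustrative and not taken from Fisher's paper.

```python
from fractions import Fraction
from itertools import product


def cross(parent_one, parent_two):
    """Punnett square: offspring genotype probabilities for two diploid parents."""
    offspring = {}
    for allele_one, allele_two in product(parent_one, parent_two):
        # Sorting puts "A" before "a", so "aA" and "Aa" are counted together.
        genotype = "".join(sorted(allele_one + allele_two))
        offspring[genotype] = offspring.get(genotype, Fraction(0)) + Fraction(1, 4)
    return offspring


def recessive_trait_correlations(q):
    """Parent-offspring and full-sibling correlations for a recessive trait.

    q is the frequency of the recessive allele "a". Only "aa" is affected.
    """
    p = 1 - q
    genotype_freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    mean = q * q               # fraction of the population affected
    variance = mean * (1 - mean)
    parent_child = Fraction(0)  # P(a given parent and a child are both affected)
    sib_sib = Fraction(0)       # P(two siblings are both affected)
    for (g1, f1), (g2, f2) in product(genotype_freqs.items(), repeat=2):
        chance_child_affected = cross(g1, g2).get("aa", Fraction(0))
        parent_affected = 1 if g1 == "aa" else 0
        parent_child += f1 * f2 * parent_affected * chance_child_affected
        sib_sib += f1 * f2 * chance_child_affected ** 2
    return ((parent_child - mean ** 2) / variance,
            (sib_sib - mean ** 2) / variance)


het_by_het = cross("Aa", "Aa")   # the classic 1:2:1 ratio
het_by_dom = cross("Aa", "AA")   # no affected offspring, but two genotypes
common_po, common_sib = recessive_trait_correlations(Fraction(1, 2))
rare_po, rare_sib = recessive_trait_correlations(Fraction(1, 50))
```

For a common recessive allele the gap between the two correlations is modest, but for a rare one the parent-offspring correlation nearly vanishes while the sibling correlation stays near one quarter. That is the cystic fibrosis intuition in numerical form.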

There is much in Fisher’s paper which prefigures later work, and much which is rooted in somewhat shaky pedigrees and biometric research of his day. The take home is that Fisher starts from an a priori Mendelian model, and shows how it could cascade down the chain of inferences and produce the continuous quantitative characteristics we see all around us. From the Hardy-Weinberg principle he drills down through the inexorable layers of logic to generate the formalisms which we associate with heritability, thick with variance terms. The Correlation Between Relatives on the Supposition of Mendelian Inheritance was a marriage between what was biometrics and Mendelism which eventually gave rise to population genetics, and forced the truce between the seeds of that domain and what became quantitative genetics.

As I said, the paper itself is dense, often opaque, and characterized by a prose style that lends itself to exegesis. But I find that it is often useful to see the deep logics behind evolution and genetics laid bare. Some of the issues which we grapple with today in the “post-genomic era” have their intellectual roots in this period, and in Fisher’s work, which showed that quantitative continuous traits and discrete Mendelian characters were one and the same. The “missing heritability” hinges on the fact that classical statistical techniques tell us that Mendelian inheritance is responsible for the variation of many traits, but modern statistical biology which has recourse to the latest sequencing technology has still not been able to crack that particular nut with satisfaction. Perhaps decades from now biologists will look at the “missing heritability” debate and laugh at the blindness of current researchers, when the answer was right under their noses. Alas, I suspect that we live in the age of Big Science, and a lone genius is unlikely to solve the riddle on his lonesome.

Citation: Fisher, R. A. (1918). On the correlation between relatives on the supposition of Mendelian inheritance. Transactions of the Royal Society of Edinburgh

Suggested Reading: The Origins of Theoretical Population Genetics, R.A. Fisher: The Life of a Scientist, and The Genetical Theory of Natural Selection.

* Though I will spare you the details, it may be that the Galtonians were by and large more Galtonian than Galton himself! It seems that Francis Galton was partial to William Bateson’s Mendelian model.

** To be fair, I believe the phrase was originally coined by Julian Huxley.

*** Just use standard deviation units.

Image Credit: Wikimedia

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"