The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog


Update: First, people coming to this weblog for the first time should know that I moderate comments. So if you leave an obnoxious one it’s basically like an email to me (no one will see it). Second, the correlation between height and intelligence is not that high. This association is probably not going to be intuitively visible to anyone, but rather only shows up in large data sets. So please stop offering yourself as a counter-example of the trend (also, the key is to look within families, because the signal here is going to be swamped by other factors when you compare across populations). Third, a friend has sent me another paper which does confirm that even within sibling cohorts there does seem to be a correlation between height and I.Q. The problem is that it is a very small one, so you need large data sets with a lot of power to see it.
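To put a rough number on "large data sets," here is a minimal sketch of the standard Fisher z approximation for how many people you need to detect a correlation of a given size. The function name, the 5% two-sided significance level, and the 80% power target are my own illustrative assumptions, not anything taken from the papers discussed here.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a true correlation r.

    Uses the Fisher z approximation: n = ((z_alpha + z_beta) / atanh(r))^2 + 3.
    alpha is two-sided; both defaults are conventional choices, assumed here.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


if __name__ == "__main__":
    for r in (0.20, 0.10, 0.05):
        print(r, n_for_correlation(r))
```

Under these assumptions a correlation of 0.20 needs a sample on the order of a couple hundred people, while halving the correlation roughly quadruples the requirement. That is why a small within-family effect only shows up in big sibling data sets.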

End Update

One moderately interesting social science finding is that there is a positive correlation between height and measured intelligence (e.g., on an I.Q. test). Setting aside the possibility that I.Q. test designs are culturally biased against shorter people, one wonders why this is so. Height is a highly heritable trait where most of the variation within the population is due to variation at numerous genes. In other words, there isn’t a “tall” or “short” gene, but thousands and thousands of variants which shape the variation of the trait across the population. When I say it is highly heritable, I mean to imply that most of the variation in height in developed societies is due to genes (80-90%). As it happens, intelligence is somewhat similar in its genetic architecture, heritable due to small effects across many genes. In general estimates for the heritability of intelligence tend to be somewhat lower, on the order of ~50% rather than 80-90%.

It is due to the highly polygenic nature that both of these traits have been posited as candidates for a “good genes” model of sexual selection. Presumably individuals with a higher mutational load will have lower intelligence and be shorter, all things equal, because these traits have extensive genome-wide coverage and are big targets. Geoffrey Miller’s The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature, was predicated on this logic. If the mutational load argument holds then the reduced I.Q. of shorter individuals may simply be due to the same cause: “bad genes.”

Another scenario is that assortative mating between tall and intelligent people has generated a correlation between alleles which tend toward this end of the trait distribution. The phenomenon is simple enough to describe; height and intelligence are both attractive, and even if they are not due to the same genetic loci the pairing of tall and smart results in the correlation between the traits. My own assumption is that something like this, perhaps with a mutational effect at the bottom of the distribution (due to large effect deleterious alleles knocking people down in height and intelligence), generates most of the correlation. Part of this is due to my reading of The g Factor:

It is now well established that both height and weight are correlated with IQ. When age is controlled, the correlations in different studies range mostly between 0.10 and 0.30, and the average about 0.20. Studies based on siblings find no significant within-family correlation, and gifted children (who are taller than their age mates in the general population) are not taller than their non-gifted siblings.

Whenever people posit a pleiotropic relationship between traits I am always curious about the possibility that the traits may be correlated (or not) in siblings. Population structure of some sort can produce correlations, but patterns within families are often more informative of the genuine genetic basis of these correlations.

A new paper in PLOS GENETICS tackles this with more sophisticated techniques. They conclude:

Traits that are attractive to the opposite sex are often positively correlated when scaled such that scores increase with attractiveness, and this correlation typically has a genetic component. Such traits can be genetically correlated due to genes that affect both traits (“pleiotropy”) and/or because assortative mating causes statistical correlations to develop between selected alleles across the traits (“gametic phase disequilibrium”). In this study, we modeled the covariation between monozygotic and dizygotic twins, their siblings, and their parents (total N = 7,905) to elucidate the nature of the correlation between two potentially sexually selected traits in humans: height and IQ. Unlike previous designs used to investigate the nature of the height–IQ correlation, the present design accounts for the effects of assortative mating and provides much less biased estimates of additive genetic, non-additive genetic, and shared environmental influences. Both traits were highly heritable, although there was greater evidence for non-additive genetic effects in males. After accounting for assortative mating, the correlation between height and IQ was found to be almost entirely genetic in nature. Model fits indicate that both pleiotropy and assortative mating contribute significantly and about equally to this genetic correlation.

Pleiotropy here means that the same gene is impacting different traits (height and I.Q.). The additive genetic correlation between height and I.Q. was 0.08 and 0.17 in males and females respectively. These are small correlations obviously, but it’s what we’d expect.
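To see how assortative mating alone can manufacture a correlation with no pleiotropy at all, here is a toy simulation. Everything in it is an assumption made for illustration: the two traits start out independent, each child's trait is the midparent value plus random segregation noise, and mating is an extreme form of cross-trait assortment (fathers ranked by height are paired with mothers ranked by I.Q.). This is not the twin-family model used in the paper, only a sketch of the gametic phase disequilibrium idea.

```python
import random


def pearson(pairs):
    """Pearson correlation between the two coordinates of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5


def simulate(n_pairs=20000, seed=0):
    """Return (parental, offspring) height-IQ correlation after one generation."""
    rng = random.Random(seed)
    # Parents: (height, iq) scores drawn independently, so no shared causes.
    fathers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_pairs)]
    mothers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_pairs)]
    before = pearson(fathers + mothers)

    # Hypothetical extreme assortment: tallest father with smartest mother, etc.
    fathers.sort(key=lambda p: p[0])
    mothers.sort(key=lambda p: p[1])

    noise_sd = 0.5 ** 0.5  # segregation noise keeps trait variance near 1
    children = []
    for (fh, fq), (mh, mq) in zip(fathers, mothers):
        height = (fh + mh) / 2 + rng.gauss(0, noise_sd)
        iq = (fq + mq) / 2 + rng.gauss(0, noise_sd)
        children.append((height, iq))
    return before, pearson(children)


if __name__ == "__main__":
    before, after = simulate()
    print(f"parents: {before:.3f}  children: {after:.3f}")
```

In this toy setup the parents show essentially no height-I.Q. correlation, while the children show a clearly positive one, even though no simulated factor affects both traits. Real assortment is far weaker, so the real effect per generation would be much smaller.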

After the statistical modeling using a twin design there’s a lot of talk about sexual selection and the long arc of evolutionary genetics (e.g., additive genetic variation being exhausted by selection). This is what you have to do in discussions, but I’m not sure it really adds any value. How strong is sexual selection for intelligence and height? The data show that taller men have more sexual partners, but the problem here is that taller men have taller daughters, and these daughters are not necessarily so reproductively fit. Whenever you are talking about sexual selection you need to take into account the antagonism that might entail because of the differential value of a trait between the sexes (e.g., masculine men may have masculine daughters). As for I.Q., I’m not sure about the long term distribution of fitness for this trait. I have a suspicion that the “sweet spot” for mating is to be only somewhat smarter than the average, but not so clever as to be obnoxious.

In the end I’d really like to see a massive number of siblings compared. I think that’s doable with this data set, but I didn’t see it in the paper (tell me if I’ve missed something). At some point we’ll have accurate high coverage whole genomes for many pairs, and we can ascertain whether it’s mutational load and pleiotropy more directly when it comes to correlations like this. Since pedophiles tend to be shorter and less intelligent I’m willing to accept deep biological connections across many traits. But I feel that the whole area is somewhat of a muddle right now. And talking a lot about sexual selection strikes me as excessive hand waving.

Citation: Keller MC, Garver-Apgar CE, Wright MJ, Martin NG, Corley RP, et al. (2013) The Genetic Correlation between Height and IQ: Shared Genes or Assortative Mating? PLoS Genet 9(4): e1003451. doi:10.1371/journal.pgen.1003451

• Category: Science • Tags: Anthropology, Genetics, Genomics, Height, Intelligence, Select 

TreeMix on 100,000 SNPs, migration parameter = 5

When I read Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture in PLoS Genetics last week, one of my thoughts was “where is the tree”? Thankfully all the data is online, so I simply ran TreeMix on it. After a number of runs I now understand perhaps why there is no figure emphasizing a tree. There just isn’t that much informative yield from what I can tell, though the basic inference from the paper is recapitulated. You can see the results in the figure above, from one of my TreeMix runs. Overall, what this paper reinforces is that there are sharp genetic distinctions across ethno-religious boundaries within the modern Middle East which confound attempts to use geography to predict variation.

Umayyad Caliphate

There are two scales at which one can bin the dynamics which might account for this. First, the Middle East has long been, and is still now, a region characterized by consanguineous marriages, which likely increase genetic distance over short spatial scales. This may be of deep historical standing. Recall that in the Hebrew Bible there are many instances of cousin marriage, foremost being the marriage of Jacob* to Rachel and Leah, his maternal cousins (Jacob being the ancestor of the tribes of Israel). And yet a major feature of contemporary Middle Eastern society is parallel cousin marriage, the marriage of the offspring of brothers. This cultural pattern has been associated with Islam in particular, which has codified and expanded many practices which were present among the Semitic peoples of the Arabian peninsula.

Second, there is the issue of the rise of Islam, and the crystallization of broad confessional communities which exhibit endogamy. By this, I mean that the process of Islamicization was accompanied by a level of marital segregation, the importance of which has been moderately debated. The primary fissure is the argument over the extent of intermarriage between newcomers from the Arabian peninsula and local populations, as opposed to conversion of local elites, and ethnic identity shift. Not only did the rise of Islam generate a novel range of endogamous communities, non-Muslim populations across what became the Arab world were sealed off from the demographic currents which arose due to the massive scale of the early Muslim empires. Recall that the hegemony of the Umayyads in the early 8th century extended from the Atlantic ocean to Central Asia!

A major human consequence of the Pax Islamica was the influx into the core Muslim lands of slaves of various sorts. This precipitated the Zanj rebellion. But that catastrophic incident did not stop the trade in humans in the Islamic world, which persisted down to the 20th century. Importantly, one must not forget that the slaves in the Muslim world were a multiracial cohort (though for various reasons black Africans and their descendants were particularly stigmatized as slaves, and remain so): Turks and Indians from the east, Caucasians from the north, and Europeans from the west, as well as Africans from the south. The key factor behind all of these is that the slaves were from non-Muslim lands, and originally not Muslim. And, due to social and economic considerations, dhimmis, whether that be Christians of various sects, Jews, Sabians, Zoroastrians, or other assorted obscure ethno-religious minorities (e.g., Mandaeans and Yazidis), were excluded from the mainstream of this cosmopolitan world of trade and migration.

What the authors of the above paper confirmed is that Lebanese Christians, Muslims, and Druze are genetically very distinct. Geography does not predict their clustering. Much of this is almost certainly straightforward admixture. The TreeMix plot shows the influence of African ancestry in the Muslim Middle Eastern groups. Using other data I have suggested that many of the non-Muslim populations of the Fertile Crescent reflect the extant variation at the time of the Arab conquest. Not only were these populations not subject to admixture with the victorious Arabs, but they were not impacted by gene flow from diverse populations which migrated or were brought into the center of the Muslim world.

One aspect of the PLoS Genetics paper is that they used ADMIXTURE to explore the relationship between two ancestral clusters, termed “Levantine” and “Middle Eastern,” with a northern and southern geographic focus in the Near East. These are familiar to those who read genome bloggers. The Levantine component is probably the ubiquitous “West Asian” element (as implied by its closer association with European ancestral clusters), while the Middle Eastern is often labeled “South Arabian.” In ADMIXTURE runs the Middle Eastern/South Arabian element is often the Eurasian component present in Ethiopians. A major issue to consider is whether the Middle Eastern/South Arabian element is itself a compound, as “Indian/South Asian” ones have been found to be. This would explain its genetic distance from other West Eurasian elements. It could be that it is simply a synthesis between indigenous populations and incoming farmers from the north. This hybrid population then underwent its own demographic expansion at some point later in history, even admixing with one of its “parents” to the north with the rise of Islam.

Rather than more elegant and powerful methods what is needed here is more data. The ethno-religious minorities of the Middle East are fascinating treasure troves of cultural diversity, and genetic variety. In particular it would be of interest to explore differences between Iranian Zoroastrians and other Persians, as well as Copts vs. Muslim Egyptians. Because of genetic closeness it is not always easy to assess Arabian contribution to Middle Eastern populations, but at some point we’ll have a better sense, and my prediction would be that it is minimal in Iran, but non-trivial in Egypt, explaining why the latter underwent language shift where the former did not.

Citation: Haber M, Gauguier D, Youhanna S, Patterson N, Moorjani P, et al. (2013) Genome-Wide Diversity in the Levant Reveals Recent Structuring by Culture. PLoS Genet 9(2): e1003316. doi:10.1371/journal.pgen.1003316

* Jacob’s parents were also related, with Isaac’s wife being the daughter of a cousin.

• Category: History, Science • Tags: Anthropology, Genomics, Select 

There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, as well as the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).

The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.

One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.

And yet one thing that jumps out at me is that there is no East Asian EDAR in European populations, even in Russians. I am a bit confused by this result, because of the possibility of Siberian-affiliated population admixture with Europeans within the last 10,000 years, as adduced by several researchers (this is not an obscure result, it manifests in TreeMix repeatedly). The second figure shows the inferred region from which the East Asian EDAR haplotype expanded over the past 30,000 years. The authors utilized millions of forward simulations with a host of parameters to model the expansion of EDAR, so that it fit the distribution pattern that is realized (see the supplements here for the parameters). To make a long story short they infer that there was one mutation on the order of ~30,000 years before the present, and that it swept up in frequency driven by selection coefficients on the order of ~0.10 (a 10% increase in relative fitness, which is incredibly powerful!). This is on the extreme end of selective sweeps, and likely of the same class as the haplotype blocks which characterize SLC24A5 and LCT (the block is shorter, though that makes sense because of the deeper time depth). Again, I am perplexed why such an ancient allele, which is found in Amerindians, or Munda populations, is absent in Europeans who have putative East Eurasian admixture. The whole does not cohere for me. There is a weak point in one or more of my assumptions.
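To get a feel for how powerful a selection coefficient of ~0.10 is, here is a minimal deterministic sketch. It assumes the simplest textbook model of genic selection in an effectively infinite population, where the favored allele's odds are multiplied by (1 + s) every generation. The starting frequency, the target frequency, and the function name are hypothetical choices of mine. This is emphatically not the spatially explicit forward simulation the authors ran, and it ignores drift, dominance, and geography.

```python
def generations_to_reach(p0, target, s):
    """Generations for a favored allele to rise from frequency p0 to target.

    Deterministic genic selection: carriers have relative fitness 1 + s,
    so each generation p' = p(1 + s) / (1 + p*s).
    """
    p = p0
    generations = 0
    while p < target:
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


if __name__ == "__main__":
    # Hypothetical scenario: start at 0.1% and ask how long until 90%.
    for s in (0.10, 0.01):
        print(s, generations_to_reach(0.001, 0.9, s))
```

With s = 0.10 the allele gets from rare to common in under a hundred generations in this idealized setting, which is a couple of thousand years if you assume roughly 25 years per generation. Drop s to 0.01 and the same journey takes roughly ten times as long.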

Then there’s the section on the mouse model. To me this aspect was ingenious, though I’m not particularly able to assess it on its technicalities. The earlier usage of mouse models to test the effects of mutations on EDAR was in the context of coarse copy number changes which resulted in massive dosage changes of protein. The phenotypic outcomes were rather extreme in that case. Here they used a “knockin” model where they recreated the specific EDAR point mutation. Instead of extreme phenotypes they found that the mice were much more normal in their range of traits, though the hair form shifts were well aligned with what occurred in humans. Additionally there were some changes in the number of eccrine glands, with a larger number in the derived East Asian EDAR carriers (with additive effect). Finally they noticed that there were differences in mammary gland pad area and branching. None of this is that surprising, EDAR is a significant regulatory gene which shapes the peripheries and exterior of an organism.

To double check the human relevance of what they found in the mouse model they performed a genome-wide association in a large cohort of Han Chinese. The correlations of particular traits were in the directions that they expected; those individuals with East Asian EDAR variants had thicker hair, shovel-shaped incisors, and a greater density of eccrine glands. It is perhaps important to note that the frequency of the derived variant is so high in Han populations that they didn’t have enough homozygote ancestral genotypes to perform statistics, so their comparisons involved heterozygotes with the derived mutant and also a copy of the ancestral state. This is like SLC24A5 in Europeans, where it is difficult to find individuals of European heritage who have double copies of the non-European modal variant.

Let’s review all the awesome things they did in this study. First, they dug deeply into the evolutionary genomics of the region around EDAR, concluding that this haplotype was driven up in frequency from one ancestral variant ~30,000 years ago in a hard selective sweep. And a sweep of notable strength in terms of selection coefficient. This may be one of the largest effect targets of natural selection in the genome of non-Africans over the past 50,000 years. Second, they used a humanized mouse model to explore the range of phenotypes correlated with this mutational change in East Asians. So you have a strong selection coefficient on a locus, and a range of traits associated with changes on that locus. Third, they confirmed the correlation between the traits and the mutation in humans, despite there being prior research in this area (i.e., they reproduced). This is all great science, and shows the power of collaboration between the groups.

Much of the elegance and power of the paper applies to the discussion section as well, but to be frank this is where things start falling apart for me. You can get a sense of it in The New York Times piece, East Asian Physical Traits Linked to 35,000-Year-Old Mutation. The headline here points to a legitimately important inference from this line of research, many salient physical characteristics of the human races seem to be due to strong selection events at a few loci. In addition to EDAR I’m thinking of the pigmentation loci, such as SLC24A5. I wouldn’t be surprised if there was something similar for the epicanthic fold. If it is visible, and defines between populations differences, it is generally not genomically trivial. There’s usually a story underneath that difference.

In the broad scale of human natural history the problem that arises for me is that we have traits, we have genes under selection, but we have very weak stories to explain the mechanism and context of natural selection. Here there is a strong contrast with the loci around lactase persistence and malaria resistance. In those situations the causal mechanism for the selection seems relatively clear. Critics of evolutionary psychology are wont to accuse the field of ‘Just So’ storytelling, but the same problem crops up in the more intellectually insulated domain of evolutionary genomics (in part because the field is very new, and also mathematically and computationally abstruse). To illustrate what I’m talking about I’m going to quote from the discussion of the above paper:

A high density of eccrine glands is a key hominin adaptation that enables efficient evapo-transpiration during vigorous activities such as long-distance walking and running (Carrier et al., 1984; Bramble and Lieberman, 2004). An increased density of eccrine glands in 370A carriers might have been advantageous for East Asian hunter-gatherers during warm and humid seasons, which hinder evapo-transpiration.

Geological records indicate that China was relatively warm and humid between 40,000 and 32,000 years ago, but between 32,000 and 15,000 years ago the climate became cooler and drier before warming again at the onset of the Holocene (Wang et al., 2001; Yuan et al., 2004). Throughout this time period, however, China may have remained relatively humid due to varying contribution from summer and winter monsoons.

High humidity, especially in the summers, may have provided a seasonally selective advantage for individuals better able to functionally activate more eccrine glands and thus sweat more effectively (Kuno, 1956). To explore this hypothesis, greater precision on when and where the allele was under selection—perhaps using ancient DNA sources—in conjunction with more detailed archaeological and climatic data are needed.

A climate adaptation is always a good bet. The problem I have with this hypothesis is that modern day gradients in the distribution of this allele are exactly the reverse of what one might expect in terms of adaptation to heat and humidity. Additionally, is there no cost to this adaptation? After the initial sweep upward, the populations where the derived EDAR mutant is found in high frequencies went through the incredible cold of the Last Glacial Maximum, and groups like the Yakuts are known to have cold adaptations today. Not only that, but the Amerindians from the arctic to the tropics all exhibit a cold adapted body morphology, the historical consequence of the long sojourn in Beringia.

Granted, the authors are not so simplistic, and the somewhat disjointed discussion alludes to the fact that EDAR has numerous phenotypic effects, and it may be subject to diverse positive selection pressures. This seems plausible on the surface, but this complexity of mechanism seems ill-fitted to the fact that the signal of selection around this locus is so clean and crisp. It seems that this is not going to be an easy story to unpack, and there’s a good deal of implicit acknowledgement of that fact in this paper. But tacked right at the end of the main text is this whopper:

It is worth noting that largely invisible structural changes resulting from the 370A allele that might confer functional advantage, such as increased eccrine gland number, are directly linked to visually obvious traits such as hair phenotypes and breast size. This creates conditions in which biases in mate preference could rapidly evolve and reinforce more direct competitive advantages. Consequently, the cumulative selective force acting over time on diverse traits caused by a single pleiotropic mutation could have driven the rise and spread of 370A.

A simple takeaway is that the initial climatic adaptation may have given way to a cultural/sexual selective adaptation, whereby there was a preference for “good hair” as exemplified by pre-Western East Asian canons (black and lustrous), as well as a bias toward small breasts. This aspect gets picked up in The New York Times piece of course. I’ll quote again:

But Joshua Akey, a geneticist at the University of Washington in Seattle, said he thought the more likely cause of the gene’s spread among East Asians was sexual selection. Thick hair and small breasts are visible sexual signals which, if preferred by men, could quickly become more common as the carriers had more children. The genes underlying conspicuous traits, like blue eyes and blond hair in Europeans, have very strong signals of selection, Dr. Akey said, and the sexually visible effects of EDAR are likely to have been stronger drivers of natural selection than sweat glands.

The passage here is ambiguous because the author of the article, Nick Wade, doesn’t use quotes, and I don’t know what is Akey and what is Wade’s gloss on Akey. For example, for theoretical reasons of reproductive skew (a few men can have many children) in general sexual selection is considered to be driven most often by female preference for male phenotypes. I assume Akey knows this, so I suspect that that section is Wade’s gloss (albeit, a reasonable one given the proposition of preference for smaller breasts). The main question on my mind is how seriously prominent population geneticists such as Joshua Akey actually take sexual selection to be as a force driving variation and selection in human populations. It seems that quite often sexual selection is presented as a deus ex machina. A phenomenon which can rescue our confusion as to the origins of a particular suite of traits. But our assessment of the likelihood of sexual selection presumably has to be premised on prior expectations informed by a balance of different forces one can gauge from the literature, and here my knowledge of the current sexual selection literature is weak. Perhaps my skepticism is premised on my ignorance, and the population geneticists who proffer up this explanation are more informed as to the state of the literature.

All this brings me back to the farcical title. When this paper first made news last week I was having dinner with a friend of Japanese heritage (who spent his elementary school years in Japan). I asked him point blank, “Do you like small breasts?” His initial response was “WTF!?! Razib,” but as a mouse geneticist he understood the thrust of my question after I outlined the above results to him. From personal communication with many East Asian American males I am not convinced that there is an overwhelmingly strong preference for small breasts within this subset of the population. But the key here is American. These are individuals immersed in American culture. The norms no doubt differ in East Asia. The typical visual representation of celebrity East Asian females that we see in the American media depicts individuals who are slimmer and more understated in their secondary sexual characteristics than is the norm among Western female celebrities (e.g., Gong Li, the new crop of Korean pop stars, even taking into account the plastic surgery of the latter). Part of this is no doubt the reality that the normal range of variation across the population differs, and part of it may be the nature of aesthetic preferences.

But the possibility of deep rooted psychological reasons driving sexual selection (to my knowledge there was no culture which spanned South China and Siberia) brings us back to old ideas about the Pleistocene mind. And, it brings us back to evolutionary psychology, a field which is the whipping boy of both skeptics of the utility of evolutionary science in understanding human nature, and rigorous practitioners of evolutionary biology. And yet here it is not the evolutionary psychologists, but rock-ribbed statistical geneticists who I often see being quoted in the media invoking sexual selection. But do we know it is sexual selection, or is it just our best guess? Because more often than not best guesses are wrong (though best guesses are much more likely to be right than worst guesses!).

Evolutionary genomics has come a long way in the past 10 years. We know, for example, the genetic architecture and some aspects of the natural history of many traits. But, there are still shortcomings. Lactase persistence is the exception to the rule. Even a phenotype as straightforward as human pigmentation has no undisputed answer as to why it has been the repeated target of selection across Eurasia over the past 40,000 years. Oftentimes the right answer is simply that we just don’t know.



My daughter has four grandparents. Genetically she is a little over 25 percent her paternal grandfather and maternal grandmother, and a little under 25 percent her maternal grandfather and paternal grandmother.* Why? Because she is 50 percent genetically identical by descent with her mother and likewise with her father. This is all rather straightforward. But what about culturally?
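The reason the split is not exactly 25 percent per grandparent is that recombination chops each parent's two grandparental chromosome copies into a small number of large chunks, so the half a child receives is rarely an even mix. Here is a toy simulation of that. The assumptions are mine and deliberately crude: 22 autosomes of equal size, a hypothetical genetic length of 1.6 Morgans each, and crossovers falling as a Poisson process with no interference.

```python
import random


def grandparent_share(rng, n_chrom=22, morgans=1.6):
    """Fraction of a child's autosomal genome from one grandparent via one parent.

    Each transmitted chromosome is a mosaic of the parent's two homologs.
    Crossovers fall as a Poisson process (rate = assumed map length in Morgans)
    along a chromosome scaled to the interval [0, 1].
    """
    from_grandfather = 0.0
    for _ in range(n_chrom):
        # Draw crossover positions along this chromosome.
        breakpoints = []
        x = rng.expovariate(morgans)
        while x < 1.0:
            breakpoints.append(x)
            x += rng.expovariate(morgans)
        on_grandfather = rng.random() < 0.5  # which homolog the gamete starts on
        previous = 0.0
        for position in breakpoints + [1.0]:
            if on_grandfather:
                from_grandfather += position - previous
            on_grandfather = not on_grandfather  # switch homolog at each crossover
            previous = position
    # The parent contributes half the child's genome, hence the factor 0.5.
    return 0.5 * from_grandfather / n_chrom


if __name__ == "__main__":
    rng = random.Random(1)
    shares = [grandparent_share(rng) for _ in range(2000)]
    print(min(shares), sum(shares) / len(shares), max(shares))
```

Across many simulated children the average share is 25 percent, but individual children scatter a few points above or below it. Whatever one grandparent gains, their spouse loses, since the two shares through a given parent always sum to 50 percent.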

With biological heredity we can speak of genes, the substrate by which inheritance occurs. With culture, memes have been far less fruitful as anything more than an illustration, as opposed to the basis of a formal system of logic and analysis. Nevertheless, we can describe with relative clarity many aspects of culture as a trait or phenotype. And this is important. Recall that evolutionary process was characterized by Charles Darwin despite his lacking a satisfying theory of inheritance.

One of the more fascinating aspects of surveying human phenotypic variation is that one can contrast the dynamics governing traits which are genetically controlled, at least in part, with the dynamics governing those which are entirely “memetic” in character. Variation in skin color, for example, is mostly genetically controlled. In other words, skin color is a heritable trait in a genetic sense. In contrast the language one speaks is a function of milieu. One’s hair form, blood type, and nose shape are matters contingent upon one’s biological parents in a necessary and determinative sense. Language, religion, and culinary preferences are accidents contingent upon one’s parents’ preferences.

But it doesn’t end here. In sexual organisms genetic inheritance is symmetric (the autosomal genome has equal contributions from both parents), and exclusively vertical (parents to offspring). In contrast cultural inheritance can be asymmetric (i.e., one inherits by and large the culture of one parent) and horizontal (one inherits the culture of one’s peers). In The Nurture Assumption Judith Rich Harris relates the story of cultural continuity in elite British boarding schools. For generations norms and folkways were transmitted from older students to younger ones, with no parental input. This regular and systematic inter-quasi-generational horizontal transmission illustrates a flexibility of cultural transmission which has few parallels in biological genetics. One reason that the logic of biological genetics is powerful is that the system is straitjacketed by its own constraints, reducing the space of inferences and narrowing one’s extrapolations. Often complexity breeds intractability (see: economics). This is why a formal and systematic study of cultural evolutionary process analogous to that in biology has been a quixotic quest (promoted periodically by individuals of note such as E. O. Wilson and L. L. Cavalli-Sforza, and pushed forward by Peter Richerson and Robert Boyd and their students for several decades).

And yet all this is the broader purview of a paper in Proceedings of the Royal Society B. It is not online as of yet, so I will point you to the report in Nature, Genes mix faster than stories. Here is the top line result:

If folk tales simply spread by diffusion, like ink blots in paper, one would expect to see smooth gradients in these variations as a function of distance. Instead, researchers found that language differences between cultures create significant barriers to that diffusion.

These barriers are stronger than those for the exchange of genes — a message that might be crudely expressed as: “I’ll sleep with you, but I prefer my stories to yours.”

The irony here is that despite the powerful flexibility of cultural transmission, quite often it is cultural variation which exhibits sharp inter-group differences. Both common sense and population genetic theory support this finding. Without inquiring further into the matter I will assert, and be willing to take a $100 bet, that the genetic distance between the Flemish and Walloons of Belgium is smaller than that between the Walloons and Catalans. The language of the Walloons is clearly more closely related to other Romance dialects than that of their Flemish neighbors (go to Google Translate and listen to various Germanic and Romance languages with the same phrase, and it is obvious). But it does not follow that this cultural resemblance must entail a genetic resemblance.

As far as population genetics goes, gene flow is a very powerful force in equilibrating allele frequencies. Only 1 migrant per generation is needed between two populations to prevent them from diverging. Even a 1 percent admixture between two populations will quickly equilibrate allele frequency differences, especially considering that on most loci those differences are not of the disjoint character (frequency 0 vs. 1). Continuous gene flow defined by isolation by distance is a constant homogenizing force across adjacent populations.
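To put rough numbers on the homogenizing effect of migration, here is a minimal sketch under textbook assumptions that are mine, not taken from any particular data set: Wright's infinite-island approximation for equilibrium Fst, and a deterministic two-population model with symmetric migration and no drift or selection. The function names are hypothetical.

```python
def fst_island_equilibrium(migrants_per_generation):
    """Wright's island-model approximation: Fst ~ 1 / (4*N*m + 1),
    where N*m is the number of migrants per generation."""
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


def frequency_gap(initial_gap, m, generations):
    """Allele-frequency difference between two populations that swap a
    fraction m of their gene pools each generation (no drift, no selection).
    Each generation the gap shrinks by a factor of (1 - 2m)."""
    return initial_gap * (1.0 - 2.0 * m) ** generations


if __name__ == "__main__":
    # One migrant per generation keeps equilibrium Fst at 0.2 in this model.
    print(fst_island_equilibrium(1))
    # A 1 percent exchange rate applied to an initial gap of 0.5.
    for t in (0, 25, 50, 100):
        print(t, round(frequency_gap(0.5, 0.01, t), 4))
```

Real populations also experience drift, so the second function should be read as the expected trend rather than a prediction for any one locus.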

But the genetic homogenization on a genome-wide inter-population scale mediated by migration does not necessarily hold for culture. It may in some cases, but by and large it does not. This is most easy to illustrate with language, and that is why I focus on that example. The case of the “rape of the Sabine women” by the early Romans is a legendary illustration of the distinction between cultural and genetic inheritance. The Romans assimilated many groups early in their history. In fact, the elite patrician gens Claudia even had paternal Sabine ancestry. But no matter the biological nature of their genealogy the Latin Roman cultural matrix persisted, and propagated. The children of the Sabine women were culturally Roman, not a hybrid between the Sabine and the Roman.

One can illustrate this reality with other cultural characters. Modern Mexicans are a genetically hybrid population between Europeans and Amerindians. But their religion is a European sect (even if their Roman Catholicism has an indigenous flavor, no one would confuse it with the Aztec or Maya religion). Their language is also a European language (even if there are indigenous loan words, regional Mexican Spanish is intelligible with Castilian). But, their cuisine arguably has a predominantly Amerindian basis, albeit inflected with Iberian influences.

The focus on regional, ethnic, and national constructs here is not coincidental. Cultural variation as noted above exhibits high levels of inter-group variation. When comparing the genes of the Yoruba and Tuscans, most of the variation is within each group. But when comparing the language of the Yoruba and Tuscans, most of the variation is across the two groups. The organismic analogy for groups or cohorts of individuals applies much more appropriately to cultural entities than it does to biological genetic abstractions (e.g., the Body of Christ). The origin of the term shibboleth illustrates the functional relevance of this reality of inter-group variation: even though culture is highly plastic across generations and populations, it is not always facultative in the lives of individuals. The way you speak marks your origins and your class. It constrains your norms, and shapes with whom you identify.
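The within-group versus between-group contrast can be made concrete with a toy Fst calculation. This is only a sketch: the frequencies below are made up for illustration, not measured values for the Yoruba or Tuscans, and the function name is hypothetical. A "language-like" trait is treated as a variant that is fixed in one group and absent in the other.

```python
def fst_two_groups(p1, p2):
    """Fst = (Ht - Hs) / Ht for one biallelic variant in two equally
    weighted groups, where p1 and p2 are the variant's frequencies."""
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                      # heterozygosity of the pooled sample
    if ht == 0:
        return 0.0  # variant absent or fixed everywhere: nothing to partition
    return (ht - hs) / ht


if __name__ == "__main__":
    # Hypothetical "typical gene": modest frequency difference between groups.
    print(fst_two_groups(0.4, 0.6))   # about 0.04, so ~96% of variation is within groups
    # Hypothetical "language-like" trait: everyone in a group shares the same state.
    print(fst_two_groups(0.0, 1.0))   # 1.0, all variation is between groups
```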

And with that, back to my daughter. She will speak English, and she will be irreligious. Her norms and views will not be atypical for the average American. She will eat bacon (OK, she has), and when of age, drink beer. In all ways culturally that matter she resembles her maternal grandparents, and not her paternal grandparents. There was never a great question about this. In choosing to bring up their children in an American milieu my parents risked severing us from the culture in which they were embedded, and which nurtured them. So it is, and so it will always be. The dreams of generations past may die, but their genes live on.

* Her whole recent pedigree has been genotyped, so these proportions are known with precision.


Most people in South Asia speak one of two varieties of language, Indo-Aryan and Dravidian. These two are not particularly closely related. Indo-Aryan is an Indo-European language, as is evident in the plethora of obvious cognates with other Indo-European dialects. I have a minimal fluency in Bengali, the easternmost of the Indo-European languages, and quite a bit more fluency with English, one of the westernmost, and it was evident to me rather early on (e.g., grass vs. gash, man vs. manush, nose vs. nak). In contrast, to me Dravidian languages are peculiar because the accent and cadence are clearly South Asian, but they are utterly impenetrable (though there are many loan words into Indo-Aryan from Dravidian).

But in this post I’m going to explore the genetic relationships of the people who speak a subgroup of Austro-Asiatic languages indigenous to India, that of the Munda. The traditional question has always been whether the Austro-Asiatic languages are from India, or, whether they are from Southeast Asia. More precisely, did the Munda culture come to India, or is the Munda culture a relic of the original Austro-Asiatic domain in eastern India?

As background I believe it is important that readers understand that the territory between Vietnam and that of the Munda was likely dominated by Austro-Asiatic dialects ~2,000 years ago. Both the Burmese and Thai arrived in the historic period from southern China, and overthrew Mon or Khmer cultures which flourished in lowland Southeast Asia. In the case of both the Burmese and Thai it was a situation where the newcomers imposed their language upon the indigenous population, but by and large adopted most elements of high culture from the natives (e.g., Theravada Buddhism). The monarchies of Thailand and Burma drew directly from the Indic-inflected polities of the Khmer and Mon.

The recent extensive distribution and variety of Austro-Asiatic languages in Southeast Asia is suggestive of the likelihood that they derive from this area, but it is not a definitive point in that model’s favor. But there are now other genetic lines of inquiry. A few years ago a paper came out which reported that the Y chromosomal lineages of the Munda people which connect them to Southeast Asia are much more diverse in Southeast Asia. This matters because population expansions and migrations tend to homogenize lineages through greater genetic drift, with the “source” population more likely to maintain diversity. Additionally, there was also evidence of a genetic variant in EDAR which has the hallmark of recent increase in frequency across eastern Asia. This seems to peg the Munda arrival to the Holocene, not the Pleistocene. Finally, there is the pattern of male lineages exhibiting some concordance with Southeast Asia, but female lineages being entirely indigenous. This is a classic expectation from a model of migration where there was a strong bias toward males because of the mobility of these groups, which lacked women and children.

I decided to further explore the question using the Estonian Biocenter data sets, as well as the HGDP and HapMap. For those of you who are curious about the technical details, I LD pruned the Estonian Biocenter marker set from ~600,000 down to ~130,000. I also put the samples through --geno 0.01 and --mind 0.80 in Plink to get high quality individuals and good coverage on markers. To be explicitly clear, I renamed and combined some of the populations in the original data set (e.g., Chamars = UP_Dalits). I ran a preliminary MDS to make sure that the data wasn’t strange, and it checked out.

So to do the analysis I ran TreeMix. I used Chinese Americans as the root outgroup population, and wanted 5 migrations, and also tried to correct for any remaining LD by looking across a window of 1,000 SNPs. You can view my first plot below.

The primary thing I would focus on is the gene flow from Cambodians to Munda. This is exactly what one might expect if the Munda were intrusive to South Asia. More interestingly, observe that there is no gene flow into Burmese from the South Asian groups, even though they are in much closer proximity to South Asia! This is probably picking up something deep in history then. The fact that the Munda diverge early from other South Asian groups is also in keeping with Admixture or Structure bar plot results: the South Asian ancestry of the Munda is relatively unadmixed.

Next I wanted to focus more on the eastern population flows. So I removed a lot of the western groups which overwhelmed my gene flow edges.

In this scenario again there is a gene flow parameter from the rough region of the Cambodian node. Perhaps more curious now there is a powerful gene flow parameter into the Burmese from the same locus. This is totally intelligible in light of the fact that the modern Burmese are genetically a hybrid population between Tibeto-Burman and Mon (Austro-Asiatic).

I’m certainly not ready to assert that the “case is closed.” But it seems that we need to shift our probabilities again toward the intrusive hypothesis.

Image credit: Wikipedia


Over at Scientific American Christie Wilcox has a post up with the provocative title, People With Brown Eyes Appear More Trustworthy, But That’s Not The Whole Story, which reports on a new PLoS ONE paper, Trustworthy-Looking Face Meets Brown Eyes. Like Christie I would enjoy illustrating this post with my own trustworthy and youthful brown eyed visage, but I worry that my mien is a bit on the sly side! In any case, what of the paper? Wilcox reviews the salient points of the results. In short, the issue here is that brown eyed men seem to have more ‘trustworthy faces’ than blue eyed men. When the eyes were digitally manipulated it turned out that color had no influence on perception. Rather, it was the correlation between eye color and facial proportion which was driving the initial association. Christie finishes:

Given the importance of trust in human interactions, from friendships to business partnerships or even romance, these findings pose some interesting evolutionary questions. Why would certain face shapes seem more dangerous? Why would blue-eyed face shapes persist, even when they are not deemed as trustworthy? Are our behaviors linked to our bodies in ways we have yet to understand? There are no easy answers. Face shape and other morphological traits are partially based in genetics, but also partially to environmental factors like hormone levels in the womb during development. In seeking to understand how we perceive trust, we can learn more about the interplay between physiology and behavior as well as our own evolutionary history.

These findings do pose evolutionary questions, and I am interested in the correlations of behavior and eye color, and I have been so in the past. But, I have many qualms about the reliability of this literature now. When I read the post initially my eyes immediately sought out the plot you see above. Observe the intervals. Such intervals would not concern me in a simpler design, or in a model where the hypothesis already had prior support, but this is a peculiar and potentially counter-intuitive result. Additionally, if you peruse the methods section of the paper notice the attempts to control for demographic confounds in the linear regression model. There’s nothing wrong with this, and due to the nature of the sample size (< 100) there was no chance that they’d get a perfectly ideal study population. But these sorts of statistical techniques are exactly the flavor of powerful tools which have been so abused in psychology and biomedical science, consciously and unconsciously. You can squeeze a correlation out of a rock.
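To get a feel for how wide intervals are at this scale, here is a minimal sketch using the standard Fisher z approximation for a simple Pearson correlation. It is not a reanalysis of the paper, which used a regression model; the values r = 0.2 and n = 80 are hypothetical numbers chosen only to sit in the "fewer than 100" range, and the function name is my own.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation r
    estimated from n independent observations (Fisher z transform)."""
    z = math.atanh(r)                # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    for n in (80, 500, 5000):
        low, high = correlation_interval(0.2, n)
        print(n, round(low, 3), round(high, 3))
```

With the hypothetical n = 80 the interval for a modest correlation straddles zero, while the same estimate from thousands of observations does not, which is the sense in which a small effect needs a large data set.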

This is an area where Jim Manzi would say we’re confronted with ‘high causal density.’ There is a literature which suggests that there are behavioral differences between blue eyed and brown eyed children. Unfortunately when Jason Malloy looked to see if there were differences in the huge NLSY data set he couldn’t find any. This doesn’t mean that there aren’t differences between individuals that differ by this phenotype, but the difference might be subtle, and one needs to tease apart various confounds. It reminds me somewhat of the confused literature on sexual attraction and MHC. There may be something there, but the papers often present contradictory results, or add a complexifying layer (e.g., you are attracted to individuals who smell different from your opposite sex parent, but not too different).

To cut to the chase on this specific paper and results, would I bet money that this will pan out? No. I think the results are probably not robust. Do I think that there are going to be biobehavioral differences between individuals with blue eyes vs. individuals with brown eyes? Here, much more cautiously, as my confidence is low, I think there actually will be found to be some phenomena of interest and difference. What one needs to do in this case I think is look at sibling pairs. Because as it happens due to the genetic architecture of eye color inheritance in Europeans you have a huge potential sample space of siblings with different eye colors, who share much genetically, and a common home environment.

Which brings me to genetics and evolution. Though I might nitpick with the methods and results of the paper which Christie reviewed above, I think they’re defensible, as far as it goes. But some of the discussion really leaves me scratching my head:

Therefore, we tentatively suggest that a combination of sex linkage and sexual selection is the most probable explanation for the reported covariance between brown eyes and trustworthy-looking faces. Also, the blue-eyed phenotype is now abundant in Northern Europe and hence should have some kind of adaptive advantage, most likely one favored by sexual selection…that compensates for the loss of perceived trustworthiness. The trade-off between a preference for colorful and visible physical features and the advantage of a trustworthy-looking face might have contributed to the high variability of European eye and hair color.

Consider this sentence: the dry earwax phenotype is now abundant in Eastern Asia and hence it should have some kind of adaptive advantage. Just because a trait is abundant does not mean that it is selectively advantageous. Rather, pleiotropy means that traits without advantage may spread, just as hitchhiking during a selective sweep can result in the spread of alleles which are not the direct targets of selection. Though the authors allude to the genetic literature (and it is cited), they do not explore it in much detail. This is a shame, because the genetics of blue eyes have been well explored in the past 5 years. I’d hazard to assert that we now understand it.

Heterozygote (my daughter)

The inheritance pattern of blue and brown eyes was one of the classic illustrations of the recessive expression of phenotypes in Mendelian genetics. In other words, two blue eyed parents could only give rise to blue eyed children, while brown eyed parents could potentially give rise to both eye colors, because the brown eyed phenotype was inclusive of homozygotes and heterozygotes (brown being dominant to blue). This is informative, but it is too simple a description of the way inheritance works in the real world. About 75% of the blue vs. non-blue eye color variation in Europeans seems to be due to a locus which spans the genes OCA2 and HERC2. There are assorted modifier genes, but to a first approximation it is this locus which classic Mendelian models were detecting in terms of segregation within the population. But ~75% is not 100%, and there are more eye colors than blue and brown. In other words, eye color inheritance is complicated, but not too complicated.
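The classic single-locus model described here is easy to enumerate. The sketch below assumes exactly that textbook simplification (one locus, brown allele "B" fully dominant over blue allele "b"), which as noted is only a first approximation to real eye color inheritance. The function name and allele labels are my own.

```python
from collections import Counter
from itertools import product


def offspring_eye_colors(parent1, parent2):
    """Expected eye-color proportions among offspring under the simple
    one-locus model. Each parent is a two-letter genotype such as 'Bb'."""
    counts = Counter()
    # Every combination of one allele from each parent is equally likely.
    for allele1, allele2 in product(parent1, parent2):
        is_blue = allele1 == "b" and allele2 == "b"  # blue only in bb homozygotes
        counts["blue" if is_blue else "brown"] += 1
    total = sum(counts.values())
    return {color: n / total for color, n in counts.items()}


if __name__ == "__main__":
    print(offspring_eye_colors("bb", "bb"))  # two blue-eyed parents
    print(offspring_eye_colors("Bb", "Bb"))  # two brown-eyed heterozygotes
    print(offspring_eye_colors("BB", "bb"))  # brown homozygote x blue
```

Under this model two blue-eyed parents produce only blue-eyed children, and two heterozygous brown-eyed parents produce blue-eyed children a quarter of the time.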

Perhaps a more important point is that the OCA2-HERC2 region is not limited to iris pigmentation in its effect. This was originally a region where an albinism mutation was localized. There is evidence that it impacts skin color in Europeans and Asians. And, these variants in this region do seem to be targets of natural selection. One immediate thing that jumps out at you for the European variants is that they are characterized by a long block of the genome which is co-inherited together, a hallmark of recent natural selection. Second, it is important to note that the block is long for another reason. And that reason is why I’m skeptical that this region was selected initially for blue eye color. Though the original recessive expression single gene model is too simple, it is correct that most individuals with blue eyes tend to be homozygotes. This means that in the initial stage of the allele’s increase in frequency trajectory it will rise in proportion very slowly, because most variants will be in heterozygotes, and so would not be favored by natural selection. Rather than a very long homogeneous block you’d expect a narrower region, because recombination will have mixed & matched the region during the early phase. This may all sound abstruse, but evolutionary hypotheses are most persuasive when they rest on a solid genetic basis. We have a good understanding of the genetics of eye color, and a more modest one of its evolutionary history. We should leverage that.
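The slow early rise of a recessive favored allele can be sketched with the standard deterministic one-locus selection recursion. Everything numeric here is an assumption for illustration: a starting frequency of 1%, a selection coefficient of 0.05, and no drift (which in reality matters a great deal for rare alleles). The function name and parameters are hypothetical.

```python
def generations_to_reach(target, q0=0.01, s=0.05, h=0.0, max_generations=100_000):
    """Generations for a favored allele to go from frequency q0 to target.

    Genotype fitnesses are 1 (no copies), 1 + h*s (one copy), 1 + s (two copies).
    h = 0 means the advantage is fully recessive; h = 0.5 means it is additive.
    Returns None if the target is not reached within max_generations."""
    q = q0
    for generation in range(max_generations):
        if q >= target:
            return generation
        p = 1.0 - q
        w_het = 1.0 + h * s
        w_hom = 1.0 + s
        mean_fitness = p * p + 2 * p * q * w_het + q * q * w_hom
        # Frequency of the favored allele after one round of selection.
        q = (q * q * w_hom + p * q * w_het) / mean_fitness
    return None


if __name__ == "__main__":
    print("recessive:", generations_to_reach(0.5, h=0.0))
    print("additive: ", generations_to_reach(0.5, h=0.5))
```

In this toy setup the recessive case takes many times longer to reach 50% than the additive case, and those extra generations are extra opportunities for recombination to shorten the block around the variant.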

It was my intent over the course of this post to back into a domain with lower ‘causal density.’ The genetics of eye color is not really simple, but it is intuitively tractable. In contrast the story outlined in the PLoS ONE paper strikes me as problematic because though the results are statistically significant in some specific conditions, the overall story is complicated, and requires some unpacking. A more general issue which goes at the heart of the problem of constructing plausible evolutionary stories for the origin of phenotypes is there are many, many, phenotypes. Probing for correlations across any pair of phenotypes, most of the time you won’t find one (at least to statistical significance). But the process will eventually yield correlations. Some will give you much insight, but many will be spurious.
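The arithmetic behind "the process will eventually yield correlations" is short. The sketch below assumes that none of the phenotypes are truly related and that the pairwise tests are independent, which is a simplification; the threshold of 0.05 and the count of 20 phenotypes are illustrative choices, and the function name is my own.

```python
from math import comb


def spurious_hits(n_phenotypes, alpha=0.05):
    """Number of phenotype pairs, expected count of false positives, and the
    chance of at least one, if every pair is tested at threshold alpha and no
    real associations exist (tests assumed independent)."""
    pairs = comb(n_phenotypes, 2)
    expected_false_positives = pairs * alpha
    chance_of_at_least_one = 1.0 - (1.0 - alpha) ** pairs
    return pairs, expected_false_positives, chance_of_at_least_one


if __name__ == "__main__":
    pairs, expected, chance = spurious_hits(20)
    print(pairs, expected, round(chance, 5))
```

With only 20 unrelated phenotypes there are 190 pairs to probe, so under these assumptions one would expect roughly nine or ten "significant" correlations from noise alone.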

So what is a future avenue of exploration of this topic? I’m interested in genetics, so you know where I would go. Look at the sibling pairs, and see if the correlation with face shape and eye color holds. But more importantly the genetics of facial morphology are finally starting to be elucidated. It turns out that the trait is highly polygenic, with each locus predicting only a small proportion of trait variance. To me this poses an immediate problem in attempting to posit a genetic correlation with eye color, since that trait has a genetic architecture where most of the variance is localized around one region of the genome. But the difference in face shape here may be much more subtle, and so not picked up in the GWAS analyses which have recently come out.

What I’m hoping for in the future are simple explanations of very large data sets. Here we got a somewhat complex explanation for a not so large data set.



A week ago Keith Kloor had a post up, What Science, Environmentalism and the GOP Have in Common, where he bemoaned the lack of representation of non-whites in these categories. As a matter of fact I think Keith is wrong about science. Even constraining the data set to American citizens and permanent residents people of Asian ancestry are well represented in many areas of science. But not all sciences are created equal. In 2011 there were 158 doctorates which were awarded within the category of ‘evolutionary biology’ for American citizens or permanent residents. Of these 135 were non-Hispanic white, and 5 were Asian. In ‘neuroscience’ the respective figures were 742, 535, and 96. In ‘zoology’ 55, 49, and 0. In ‘bioinformatics’ they were 80, 51, and 17. Finally, in ‘ecology’ the breakdown was 330, 300, and 11. If you are involved in academic biology I’m rather sure that these numbers won’t surprise you too much, even if you’d never thought about it. You can even infer these by walking through the posters at ASHG 2012, and seeing how the demographics of the crowds shift.
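For readers who want the proportions rather than the raw counts, here is a small sketch that converts the figures quoted above into Asian shares per field. The only inputs are the numbers in the paragraph; the dictionary and function names are my own.

```python
# (total, non-Hispanic white, Asian) doctorates awarded in 2011 to
# American citizens or permanent residents, as quoted in the text.
DOCTORATES_2011 = {
    "evolutionary biology": (158, 135, 5),
    "neuroscience": (742, 535, 96),
    "zoology": (55, 49, 0),
    "bioinformatics": (80, 51, 17),
    "ecology": (330, 300, 11),
}


def asian_share(field):
    """Fraction of a field's doctorates that went to Asian recipients."""
    total, _white, asian = DOCTORATES_2011[field]
    return asian / total


if __name__ == "__main__":
    for field in DOCTORATES_2011:
        print(f"{field:22s} {asian_share(field):6.1%}")
```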

We can look at this issue another way. In 2010 US News & World Report listed the top 10 ecology & evolution graduate programs. I went to the faculty websites after typing the university and ‘ecology,’ and then ‘neuroscience.’ Looking at names, and sometimes head shots, I classified everyone as ‘Asian’ (as defined by the US Census) and ‘Not Asian.’ You can find the data here. Please note that the left columns are ecology faculty, and the right are neuroscience.

The raw results are:

University & Department     Asian   Not Asian   % Asian
Berkeley – Ecology              0          46      0.0%
Berkeley – Neuroscience         4          40     10.0%
Harvard – Ecology               3          48      6.3%
Harvard – Neuroscience         21         127     16.5%
Davis – Ecology                 8         117      6.8%
Davis – Neuroscience           12          73     16.4%
Chicago – Ecology               3          22     13.6%
Chicago – Neuroscience         11          65     16.9%
Stanford – Ecology              2          17     11.8%
Stanford – Neuroscience        19          74     25.7%
Cornell – Ecology               1          31      3.2%
Cornell – Neuroscience          3          39      7.7%
UTexas – Ecology                3          43      7.0%
UTexas – Neuroscience           7          63     11.1%
Yale – Ecology                  0          23      0.0%
Yale – Neuroscience            13          83     15.7%
Princeton – Ecology             0          15      0.0%
Princeton – Neuroscience        2          17     11.8%
Arizona – Ecology               0          54      0.0%
Arizona – Neuroscience          0          20      0.0%


And here are charts of % and counts:

Does this matter? In American society, especially from the center to the left of the social-cultural spectrum, there is a premium on diversity. Usually this means specifically cases of racial and gender diversity (again, as I have contended before the nod to class diversity is almost always perfunctory, and there is only marginal concern about ideological diversity). As a rule within these parameters the question about diversity is usually ‘why not,’ as proportions out of sync with the population immediately prompt questions as to why this might be. My own personal position is at variance with this. Rather, my attitude is more ‘so what?’ I generally don’t care about these things personally. Unlike most my default assumption isn’t that all groups will have the same aptitudes and preferences, and so it is difficult to assess the scope and nature of the idealized demographic mix sans discrimination. In the sciences what is of importance to me is not ‘who,’ but ‘what’? That is, what is being discovered.

The question in regards to Asian Americans with American biological science is of personal interest to me. My own passions lean strongly to evolutionary biology. Any curiosity about genomics and bioinformatics is prompted by population and evolutionary genetic questions. Frankly, this means that I spend a great deal of time around white people, because for whatever reason evolutionary biology is far more white than many other areas of life science. In contrast, if I stumble into a molecular biology or neuroscience seminar the audiences are by nature far more diverse, with diversity being due to the large contingent of people of Asian ancestral background.

I don’t know if this matters in any deep way. I suspect if Asian Americans were as well represented in human evolutionary genomics as they are in cancer research there might be some stronger and earlier focus on questions of ascertainment bias due to early Eurocentric data sets. But this would be only a shift on the margins; it isn’t as if evolutionary biologists aren’t aware of the issue at all. More importantly I wanted to highlight this difference across fields because I think it illustrates the proximate power of preferences and expectations, rather than discrimination or lack of outreach. To give an example of what I mean, my father, who has a doctorate in physical chemistry, once quipped to me that ‘it would be nice if you studied neuroscience, then I could just tell people you study the brain.’ Though conveniently for him since my major area of concern is genetics that is something that he can tell his friends which is intelligible, though questions always get back to me about ‘genetic engineering’ and ‘gene therapy,’ suggesting that people assume my topics must be biomedical. For whatever reason most of the young Asian Americans who enter university and study biology of some sort do not tend to gravitate into areas like ecology or evolution. An Asian American acquaintance who is an ecologist has even joked to me that sometimes his friends refer to him as a ‘twinkie’ on account of his disciplinary focus. I do not believe that the lack of representation of Asian Americans within ecology or evolution has to do with discrimination, nor do I think that biomedical science has less implicit bias against people of Asian heritage. To be succinct, many Asian American youth who pursue graduate school in science may already elicit raised eyebrows because they did not pursue medical school. Going off to study the phylogeny of starfish, or some such thing, would frankly result in even more bewilderment and disappointment.

In this case it seems clear that the problem is not discrimination or bias (though that exists, I don’t think it varies that much across fields), but a cultural preconception as to what science merits one’s professional energies. Evolutionary biologists could go into Korean American churches to argue for the value of their discipline, but even assuming individuals in their audience did not hold Creationist beliefs (many would), it would be a hard sell to convince them that abstract and theoretical evolutionary questions are more worthy of attention than projects with a more practical biomedical focus. This isn’t going to convince people who start out with the null hypothesis that variation in discriminatory atmosphere explains variation in representation in fields by race and ethnicity, but I hope it makes people reconsider different hypotheses.

Addendum: Also, bemoaning the lack of ‘minorities’ in science often seems a case of the ‘How Asians became white’ phenomenon.


The above image, and the one to the left, are screenshots from my father’s 23andMe profile. Interestingly, his mtDNA haplogroup is not particularly common among ethnic Bengalis, who are more than ~80% on a branch of M. This reality is clear in the map above which illustrates the Central Asian distribution of my father’s mtDNA lineage. In contrast, his whole genome is predominantly South Asian, as is evident in the estimate that 23andMe provided via their ancestry composition feature, which utilizes the broader genome. The key takeaway here is that the mtDNA is informative, but it should not be considered to be representative, or anything like the last word on one’s ancestry in this day and age.

As a matter of historical record mtDNA looms large in human population genetics and phylogeography for understandable reasons. Mitochondrial DNA is present in many more copies per cell than nuclear DNA, and so was the lowest hanging fruit in the pre-PCR era. Additionally, because mtDNA lineages do not recombine they are well suited to a coalescent framework, where an idealized inverted treelike phylogeny converges upon a common ancestor. Finally, mtDNA was presumed to be neutral, so reflective of demographic events unperturbed by adaptation, and characterized by a high mutation rate, yielding a great amount of variation with which to differentiate the branches of the human family tree.

Many of these assumptions are now disputable. But that’s not the point of this post. In the age of dense 1 million marker SNP-chips why are we still focusing on the history of one particular genetic region? In a word: myth. Eve, the primal woman. The “mother of us all,” who even makes cameos in science fiction finales!

In 1987 a paper was published which found that Africans harbored the greatest proportion of mtDNA variation among human populations. Additionally, these lineages coalesced back to a common ancestor on the order of 150,000 years ago. Since mtDNA is present in humans, there was a human alive 150,000 years ago who carried this ancestral lineage, from which all modern lineages derive. Mitochondrial DNA is passed from mothers to their offspring, so this individual must have been a woman. In the press she was labeled Eve, for obvious reasons. The scientific publicity resulted in a rather strange popular reaction, culminating in a Newsweek cover where Adam and Eve are depicted as naked extras from Eddie Murphy’s Coming to America film.

The problem is that people routinely believe that mtDNA Eve was the only ancestress of all modern humans from the period in which she lived. Why they believe this is common sense, and requires no great consideration. The reality is that the story being told by science is the story of mtDNA, with inferences about the populations which serve as hosts for mtDNA being incidental. These inferences need to be made cautiously and with care. It is basic logic that a phylogeny will coalesce back to a common ancestor at some point. Genetic lineages over time go extinct, and so most mtDNA lineages from the time of Eve went extinct. There were many women who were alive during the same time as Eve, who contributed at least as much, perhaps more, to the genetic character of modern humans today. All we can say definitively is that their mtDNA lineage is no longer present. As mtDNA is passed from mother to daughter (males obviously have mtDNA, but we are dead ends, and pass it to no one), all one needs for a woman’s mtDNA lineage to go extinct is for her to have only sons. Though she leaves no imprint on the mtDNA phylogeny, obviously her sons may contribute genes to future generations.
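The inevitability of a single surviving maternal lineage is easy to see in a toy simulation. This is a sketch under strong assumptions that are mine: a constant population of 50 women, non-overlapping generations, and each daughter choosing her mother uniformly at random (a Wright-Fisher style model). It tracks only mtDNA labels, so a founder whose label disappears may still be an ancestor through her sons, which the simulation does not follow.

```python
import random


def generations_until_one_lineage(n_women=50, seed=1, max_generations=100_000):
    """Simulate maternal lineages until only one founding label survives.

    Returns (generations elapsed, label of the surviving founder), or None
    if more than one lineage remains after max_generations."""
    rng = random.Random(seed)
    # Each founding woman starts with her own distinct mtDNA label.
    lineages = list(range(n_women))
    for generation in range(1, max_generations + 1):
        # Every woman in the next generation inherits the label of a
        # randomly chosen mother from the current generation.
        lineages = [rng.choice(lineages) for _ in range(n_women)]
        if len(set(lineages)) == 1:
            return generation, lineages[0]
    return None


if __name__ == "__main__":
    result = generations_until_one_lineage()
    if result is not None:
        generations, founder = result
        print(f"All mtDNA traces to founder {founder} after {generations} generations")
```

Whichever founder "wins" in a given run is an accident of sampling, not a sign that she was the only woman alive or the only one with descendants.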

Prior to ancient DNA and the proliferation of dense SNP data sets scholars were a bit too ambitious about what they believed they could infer from mtDNA and Y lineages (e.g., The Real Eve: Modern Man’s Journey Out of Africa). We are in a different time now; inferences made about the past rest on more than one leg. But the legend of Eve of the mtDNA persists, not because of its compelling scientific nature, but because this is a case where science piggy-backs upon prior conceptual furniture. This yields storytelling power, but a story which is based on a thin basis of fact becomes just another tall tale.

All this is on my mind because one of the scientists involved with Britain’s DNA, Jim Wilson, has penned a response to Vincent Plagnol’s Exaggerations and errors in the promotion of genetic ancestry testing (see here for more on this controversy). Overall I don’t find Wilson’s rebuttal too persuasive. It is well written, but it has the air of sophistry and lawyerly precision. I have appreciated Wilson’s science before, so I am not casting aspersions at his professional competence. Rather, some of the more enthusiastic and uninformed spokespersons for his firm have placed him in a delicate and indefensible situation, and he is gamely attempting to salvage the best of a bad hand. Importantly, he does not reassure me in the least that his firm did not use Britain’s atrocious libel laws as a threat to mute forceful criticism of their business model on scientific grounds. A more general issue here is that Wilson is in a situation where he must not damage the prospects of his firm, all the while maintaining his integrity as a scientist. From what I have seen once science becomes a business one must abandon the pretense of being a scientist first and foremost, no matter how profitable that aura of objectivity may be. The nature of marketing is such that the necessary caution and qualification essential for science becomes a major liability in the process of communicating. It’s about selling, not convincing.

Going back to Eve, Wilson marshals a very strange argument:

“The claim that Adam and Eve really existed, as you suggest, refers to the most recent common ancestors of the mtDNA and non-recombining part of the Y chromosome. I don’t agree that there is nothing special about these individuals: there must have been a reason why mitochondrial Eve was on the front cover of Time magazine in the late 80s!….”

A minor quibble, but I suspect he means the Newsweek cover. More seriously, this line of argumentation is bizarre on scientific grounds. Rather, it is a tack which is more rational when aiming toward a general audience which might purchase a kit which they believe might tell them of their relationship to “Eve.”

In the wake of the discussion at Genomes Unzipped I participated in further exchanges with Graham Coop and Aylwyn Scally on Twitter, and decided to spend 20 minutes this afternoon asking people what they thought about mitochondrial Eve. By “people,” I mean individuals who are pursuing graduate educations in fields such as genetics and forensics. My cursory “field research” left me very alarmed. Naturally these were individuals who did not make elementary mistakes in regards to the concept, but there was great confusion. I can only wonder what’s going through the minds of the public.

Analogies, allusions, and equivalences are useful when they leverage categories and concepts which we are solidly rooted in, and transpose them upon a foreign cognitive landscape. By pointing to similarities of structure and relation one can understand more fully the novel ground which one is exploring. Saying that the president of India is analogous to the queen of England is an informative analogy. These are both positions where the individual is a largely ceremonial head of state. In contrast, the president of the United States and the queen of England are very different figures, because the American executive is not ceremonial at all. This is not a useful analogy, even though superficially it sees no lexical shift.

Who was Eve? A plain reading is that she is the ancestor of all humans, and more importantly, the singular ancestress of all humans back to the dawn of time. This is a concept which the public grasps intuitively. Who is mtDNA Eve? A woman who flourished 150,000 years ago, who happened to carry the mtDNA lineage which would drift to fixation in the ancestors of modern humans. I think this is a very different thing indeed. For purposes of poetry and marketing the utilization of the name Eve is justifiable. But on scientific grounds all it does is confuse, obfuscate, and mislead.

The fiasco that Vincent Plagnol stumbled upon is just a symptom of a broader problem. Scientists need to engage in massive conceptual clean up, as catchy phrases such as “mitochondrial Eve” and “Y Adam” permeated the culture over the past generation, and mislead many sincere and engaged seekers of truth. This is of the essence because personal genomics, and the scientific understanding of genealogy, are now moving out of the ghetto of hobbyists, enthusiasts, and researchers. Though I doubt this industry will be massive, it will be ubiquitous, and a seamless part of our information portfolio. If people still have ideas like mitochondrial Eve in their head it is likely to cloud their perception of the utility of the tools at hand, and their broader significance.


If you live in the States, much of what you hear about Europe’s relationship to its ethno-religious minorities concerns the problems with Muslims. This is probably an Americo-centric perspective shaped by 9/11, since many of the hijackers turned out to have spent time in Germany. Additionally, terrorist actions in both London and Madrid highlight the persistence of these problems over the years. These sorts of shocking events put a sharp focus on the geopolitical cross-hairs in which Europe finds itself during the second age of mass migration. Though this time it is a destination, and not a source.

But having been to Europe recently it was notable that in several regions the day-to-day tension when it came to ethnicity often focused on Gypsies (I use the older term because the ethnonym “Roma” which has become politically correct in the USA includes only a subset of Europe’s Gypsy population, even if the greater number). Many regions of Europe now have two distinct populations of Gypsies, a long resident local group, as well as Roma from the eastern nations of the EU. Though the relationships between these traditionally nomadic peoples and indigenous populations have never been without tension, it is clear that something close to a modus vivendi has been achieved in many European nations between the majority and their small native Gypsy populations. The influx of the Balkan Roma adds a new variable. But the political fuss for me simply rekindled a curiosity as to the genetic origins of the Gypsies. Culturally their South Asian provenance couldn’t be clearer; they speak an Indo-Aryan language. Their term for themselves in many parts of Europe comes from the Indo-Aryan word for “black,” as they are darker than the natives of the lands in which they have settled, and in fact often look visibly South Asian. This seemed especially true of Balkan Roma. On the other hand the Kale of Finland looked to be brunette Europeans.

The problem with the genetics of the Gypsy people of Europe is that until recently studies have focused on uniparental lineages. Though this has confirmed their South Asian origins, looking at maternal or paternal direct descent alone leaves something to be desired in terms of assessing ancestry, and these two markers (mtDNA and Y) are subject to more drift as they are haploid (half as many copies). But a new paper in The American Journal of Physical Anthropology has some results using 16 autosomal STRs (a group of highly variant markers). A Genetic Historical Sketch of European Gypsies: The Perspective From Autosomal Markers:

In this study, 123 unrelated Portuguese Gypsies were analyzed for 15 highly polymorphic autosomal short tandem repeats (STRs). Average gene diversity across the 15 markers was 76.7%, which is lower than that observed in the non-Gypsy Portuguese population. Subsets of STRs were used to perform comparisons with other Gypsy and corresponding host populations. Interestingly, diversity reduction in Gypsy groups compared to their non-Gypsy surrounding populations apparently varied according to an East-West gradient, which parallels their dispersion in Europe as well as a decrease in complexity of their internal structure. Analysis of genetic distances revealed that the average level of genetic differentiation between Gypsy groups was much larger than that observed between the corresponding non-Gypsy populations. The high rate of heterogeneity among Gypsies can be explained by strong genetic drift and limited intergroup gene flow. However, when genetic relationships were addressed through principal component analysis, all Gypsy populations clustered together and was clearly distinguished from other populations, a pattern that suggests their common origin. Concerning the putative ancestral genetic component, admixture analysis did not reveal strong Indian ancestry in the current Gypsy gene pools, in contrast to the high admixture estimates for either Europeans or Western Asians.

This isn’t a 500,000 SNP-chip analysis, so everything needs to be taken with a grain of salt. But, 16 markers is a lot more than the two you usually have to deal with when assessing the genetics of the Gypsy populations of Europe, so it’s certainly an improvement when making inferences. One figure and table are really worth looking at in this paper.


The first plot shows the variance partitioned into two dimensions as a function of the 13 STRs. The table shows bootstrapped admixture estimates and standard deviations. They had a 3-population model with West Asians, but it didn’t look to me like they were getting sensible results with that, so I excised that portion (with only 600 pixels the table would have been very hard to read with the nonsensical estimates in). I think the last model where they aggregate West Asians with Europeans makes the most sense. I assume the major issue here is that with 16 STRs which aren’t necessarily filtered for ancestral informativeness within these populations you’re going to get weird results on the margins.

These results confirm the finding from previous Y and mtDNA results that Europe’s Gypsy populations are genetically fragmented, and seem to have gone through bottlenecks. In this paper they also seem to have found a pattern of decreased genetic variance from east to west for the Gypsy groups, which makes sense in light of a historical model of serial bottlenecks as they traversed Europe. Any reasonable model of the genetic heritage of the Gypsy people of the world posits that they’re a compound to various extents of populations distributed along a continuum between South Asia and Western Europe, and yet here you see in a 2-dimensional plot that they don’t look like a linear combination of South Asians and Europeans. Why? Because their unique genetic history has resulted in their “random walk” into patterns of allelic variance distinct from the ancestral groups.

But a second genetic dynamic with these populations seems to be admixture. With 16 STRs, and obvious sensitivity depending on the populations you survey, one should be careful about overweighting the findings from this paper. And yet plausibly it does show a pattern of decreased South Asian admixture the further you go from the Balkans. Not only does this stand to reason a priori, but empirically it’s generally agreed that the Gypsy groups of the north and west of Europe look less South Asian in appearance than those of the Balkans.

A final consideration here is that the Indian populations which they used as a reference for South Asians are not representative of the ancestral Indian groups from which Gypsies derived. The Indo-Aryan language of the Gypsies seems to share the most features with the languages of northwest India, Punjabi and Hindi. But the samples which had the appropriate STRs for comparison were Central and South Indian. Overall I don’t think that’s that much of a consideration, but something to remember.

A bigger take home point is the disjunction between cultural and biological modes of inheritance and persistence. The language of the Gypsies retains in its broad outlines the character of an Indo-Aryan tongue. That is why the South Asian origins of Gypsies could be ascertained by Indian sailors in Britain who overheard, and broadly understood, what Gypsies were saying. Romanipen, the spirit of Gypsy culture which transcends difference of religion and nationality, seems to be clearly traceable to some South Asian antecedents (e.g., the emphasis on avoidance of contamination of food by outsiders).

And yet despite the cultural distinctiveness the various Gypsy populations have become genetically less South Asian. That makes sense, as it seems likely that they left India ~1000 years ago, or 40 generations. They’ve been in the Balkans for about 600 years, or 24 generations. Let’s assume unrealistically that the Roma were 100% South Asian when they arrived in the Byzantine lands (there are related groups in the Middle East, so it seems certain they picked up Middle Eastern ancestry along the way, but no matter). 99% endogamy per generation would imply that they’d be 79% South Asian today. 95% endogamy would result in them being 29% South Asian. 90% endogamy would mean that they’d be 8% South Asian. Reality is more complex. In the early periods, when social norms had not hardened and Roma were less numerous, the endogamy rates were probably far lower, especially as the Gypsy bands mixed with other destitute groups in the Balkans. The evidence of lots of structure across the Gypsy groups points to endogamy drilling down to a lower level of organization than just the ethnic group, which would be consistent with tendencies within South Asian culture more broadly.
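The percentages above follow from simple compounding. As a minimal sketch, assume that in each generation the share of the gene pool equal to the endogamy rate is retained and the rest is replaced by non-South Asian ancestry; the function name is hypothetical, and the 24 generations and 100% starting point are the figures used in the paragraph.

```python
def remaining_ancestry(endogamy_rate, generations, initial=1.0):
    """Ancestry share left after compounding a constant per-generation endogamy rate."""
    return initial * endogamy_rate ** generations


# 24 generations is roughly 600 years in the Balkans, as assumed in the text.
for rate in (0.99, 0.95, 0.90):
    share = remaining_ancestry(rate, 24)
    print(f"{rate:.0%} endogamy over 24 generations: {share:.0%} South Asian")
```

This reproduces the 79%, 29%, and 8% figures. A lower starting share, or an endogamy rate that changes over time, would slot into the same calculation.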

More generally it seems that the Roma and their relatives can’t just be understood as a simple linear combination of Europeans, Middle Easterners, and South Asians, genetically or culturally. Their unique history has reshaped them, and their persistence and demographic expansion in the face of ostracism and persecution are clear evidence as to the functional success of their social-cultural traditions.

Citation: Gusmão A, Valente C, Gomes V, Alves C, Amorim A, Prata MJ, & Gusmão L (2010). A genetic historical sketch of European Gypsies: The perspective from autosomal markers. American journal of physical anthropology, 141 (4), 507-14 PMID: 19918999


Interesting post by Gretchen Reynolds reviewing the evidence on exercise and intelligence. The title is “Phys Ed: Can Exercise Make Kids Smarter?”, so this is definitely seen as something which is “actionable” in a public policy sense, especially in light of the increases in obesity among young people. Intuitively I think most people are going to agree with this in the United States. In fact, when you’re down with the flu or some other illness you are generally less productive (most of the films I’ve watched over the past three years have been when I’m ill since I can’t focus on difficult material), so there’s probably going to be a natural connection made between greater cognitive function with greater health.

First, Reynolds points to a study which shows that:

1) The most fit children are more intelligent than the least fit as adduced from psychometric tests

2) The most fit children ‘had significantly larger basal ganglia, a key part of the brain that aids in maintaining attention and “executive control,” or the ability to coordinate actions and thoughts crisply.’ The researchers controlled for socioeconomic status and body mass index.

A second study indicated that the fit children had better working memory and greater hippocampal volume. Finally, an earlier study using data from Swedish conscripts showed that even among identical twins the fitter ones were more intelligent. Note that the primary author was the same on the first two studies. Before commenting further how about looking at some tables and/or figures from the papers?

The first image has two tables from the first paper, the second two images are from the second paper, and finally, the last is from the last paper.

[Gallery: tables and figures from the three papers]

As most of you know just because papers make it through peer review doesn’t imply that they’re going to stand the test of time. Over the years I’ve also gotten more and more skeptical of neuroimaging results, primarily because there’s now psychological evidence that images of brains add to the credibility of research in a very irrational fashion. To really understand the first two studies you probably have to be a cognitive neuroscientist, in particular, one with some background in psychometrics. The last study is more straightforward as you’re comparing dizygotic and monozygotic twins, and seeing the correlations between traits as a function of genetic relatedness. The latter are genetically identical, in theory if not totally in practice, so one presumes that the differences may be environmental.

Perhaps, but it depends on what you label “environment.” We may be seeing differences which derive from random events in the fetal environment, or during early stages of development. Aspects of fitness are often correlated. If athletic and intellectual prowess are both embedded in numerous genetic and physiological pathways, which seems likely, then variations due to stochastic aspects of development may affect both trait clusters in the same fashion.

In other words I’d say to make a strong case for the efficacy of exercise and aerobic health as a driver of higher intelligence we should wait for more research. On the other hand there are plenty of data on the value of aerobic health more generally, and the downsides of obesity, so there are other grounds on which to move forward. I suspect if these sorts of studies get into the Zeitgeist you’ll have pretty dumb books published soon with titles like “How 1 hour of exercise a day can give you 10 I.Q. points! (as shown by studies!)”.

Note: A quick lit search yields papers like this, so I’m not totally clear that there are robust long term cognitive benefits to exercise, though in some cases there seems to be.

Image Credit: Michael Schmalenstroer


Sexual selection is, for lack of a better term, a sexy concept. Charles Darwin elaborated on the specific phenomenon of sexual selection in The Descent of Man, and Selection in Relation to Sex. In The Third Chimpanzee Jared Diamond endorsed Darwin’s thesis that sexual selection could explain the origin of human races, as each isolated population extended their own particular aesthetic preferences. More recently the evolutionary psychologist Geoffrey Miller put forward an entertaining, if speculative, battery of arguments in The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature. It’s clearly the stuff of science that can sell.

Sexual selection itself comes in a variety of flavors. Perhaps the most counterintuitive one on first blush is the idea that many traits, such as antlers, are positively costly and exist only to signal robust health which can incur the cost without debility. The idea was outlined by Amotz Zahavi in The Handicap Principle in the 1970s. Initially dismissed by Richard Dawkins in the original edition of The Selfish Gene, Zahavi’s ideas have come into modest mainstream acceptance, and the second edition of Dawkins’ seminal work reflects a revised appraisal. This is really a subset of a “good genes” model of sexual selection, whereby females select from a range of males which would exhibit variance in mutational load. A more capricious and erratic form of sexual selection is “runaway,” which like genetic drift needs no rhyme or reason. Rather, arbitrary initial preferences can become coupled with heritable preference in a positive feedback loop which drives the mean phenotypic value of a population off the previous median, until natural selection enforces a countervailing pressure once the trait starts to become excessively maladaptive (e.g., imagine selection for longer and longer tail feathers until the ability of a bird to fly is inhibited).

But notwithstanding the inevitable press which the theory gets, and its centrality to several popular science books, the main action in the area of sexual selection is in the academic literature (contrast this with the aquatic ape hypothesis). Many of the verbal outlines of sexual selection are highly stylized, as economists might say. We are treated to images of stags with massive antlers facing off, elephant seals strutting their stuff, and beautifully plumaged birds gathering for a lek. Set next to this is a body of mathematically oriented models, short on color, long on Greek symbols. But these formal models are valuable. Obviously there is a wide range of variation across species in terms of how sexual selection plays out (if it does so at all within a given species, sexual or asexual). The sexual dimorphism of elephant seals is not the norm against which all species are judged. To explore the variables which produce this pattern of difference one must analyze them in an algebraic fashion, where each can be manipulated in isolation so as to properly characterize its impact. So with that, a paper from The American Naturalist which purports to show how assortative mating could emerge in a sexual selective framework, Make love not war: when should less competitive males choose low-quality but defendable females?:

Male choosiness for mates is an underexplored mechanism of sexual selection. A few theoretical studies suggest that males may exhibit—but only under rare circumstances—a reversed male mate choice (RMMC; i.e., highly competitive males focus on the most fecund females, while the low‐quality males exclusively pair with less fecund mates to avoid being outcompeted by stronger rivals). Here we propose a new model to explore RMMC by relaxing some of the restrictive assumptions of the previous models and by considering an extended range of factors known to alter the strength of sexual selection (males’ investment in reproduction, difference of quality between females, operational sex ratio). Unexpectedly, we found that males exhibited a reversed mate choice under a wide range of circumstances. RMMC mostly occurs when the female encounter rate is high and males devote much of their time to breeding. This condition‐dependent strategy occurs even if there is no risk of injury during the male‐male contest or when the difference in quality between females is small. RMMC should thus be a widespread yet underestimated component of sexual selection and should largely contribute to the assortative pairing patterns observed in numerous taxa.

The title is accessible and charming, but the paper is dense with mathematical formulae and computational esoterica. It screams “trust me with my parameters!” But reality is a complex and manifold thing, and it may be that to model it one must go beyond elegant simplicity. As noted in the above abstract sexual selection models are often spare. That’s the beauty of a model: you remove all you can from the reconstruction of reality until you start losing the aspects of reality which you’re trying to understand and predict. I am not totally familiar with the sexual selection literature, so the first table is helpful insofar as it gives a sense of the scope of previous models which this paper is an extension of, and to some extent rejoinder to.


The main parameters to focus on in this study are the quality of the males and females, the competition between males, and the cost of mating. All the parameters checked off for the current study relate to these broad classes; density for example would increase competition, as would shifting the sex ratio. This being a model of the “mating game” rather than all the phenomena which might occur in the life history of individuals in a species, it is constrained in a somewhat peculiar manner. Males have a specific finite lifetime, and can enter into a serial set of relationships. These relationships are naturally of finite length, each taking up a particular fraction of the lifetime of a given male, though that fraction may vary within the model. Additionally, males have to engage in “pre-copulatory guarding” before gaining a reproductive payoff. Basically, the male cannot mate for a period of time after pairing up with a female. During this guarding period the male may have to fend off suitors, so there is a risk that the investment is all for naught. This is the dimension where the quality of both male and female come into play. For example, low quality males are not good defenders, and high quality females will attract a lot of attention. There are also factors such as predation risk while seeking a partner, which one must do if one loses one’s current partner to a superior male, or if one is initially unpaired and is deciding whether to reject or accept the offers of pairing up with a female.

Frankly, the model outlined in the paper is convoluted, and it probably says something that they have to nest a lot of the details into the supplements. Table 2 has all the parameters of interest.


As you can see some of the parameters have a few discrete values. Some of these are obviously continuous variables in reality, but for the purposes of modeling you have to simplify, especially if you’re going to do something computationally intensive. They ran the “game” of interactions over several different variations of the parameters, and noted how males varied in their evolutionarily stable strategy. Below are three figures which illustrate the response topographies of males of high and low quality to females of high and low quality, with number of interactions on the y-axis (the axis projecting “away” from your viewpoint perspective), and “rejection index” on the z-axis (vertical). High quality males are in the top panels, low quality males in the bottom panels, high quality females in the left panels, and finally, low quality females in the right panels. Each figure has a different parameter varied on the x-axis, as per the labels.

[Gallery: three figures from the paper]

The rejection index is such that below 0 denotes acceptance and above rejection. In the first figure the variable is the time invested in each reproductive event, ranging from 1% to 50% of the male’s lifetime. In this situation high quality males accept high quality females, and reject low quality females, invariably. But low quality males are more accepting of low quality females as the time invested increases, and tend to reject high quality females. Why? High quality females would likely attract attention from high quality males, against whom the low quality males could not compete successfully. In the mating game pairing up with a high quality female would be a low payoff action, as the probability of keeping such a female and reproducing is low. The logic is inverted for low quality females, who would attract less attention from other males. Granted, these females are less fecund, but low fecundity is better than no fecundity from the perspective of the low quality male.
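The payoff logic in the paragraph above can be put in back-of-the-envelope form. This is only a toy sketch and not the paper's model: the expected payoff is taken as fecundity times the chance of holding on to the female through the guarding period, and every number and name here is a hypothetical chosen for illustration.

```python
def expected_payoff(fecundity, p_keep):
    """Toy expected offspring: fecundity times the chance of keeping the female."""
    return fecundity * p_keep


# Hypothetical values for a low quality male (assumptions, not from the paper):
# a high quality female is twice as fecund but is usually lost to a stronger rival.
with_high_quality_female = expected_payoff(fecundity=10, p_keep=0.1)
with_low_quality_female = expected_payoff(fecundity=5, p_keep=0.6)

print(f"Guarding a high quality female: {with_high_quality_female:.1f}")
print(f"Guarding a low quality female:  {with_low_quality_female:.1f}")
```

With these made-up numbers the low quality female is the better bet for a low quality male, which is the "low fecundity is better than no fecundity" point. The paper's actual model adds time costs, encounter rates, and the sex ratio on top of this.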

The second figure varies fecundity ratio between the high and low quality females, from 5% to 100%. In the latter case there’s no difference in fecundity between the two classes, and that explains panel B, where the high quality males drop sharply into acceptance territory for low quality females as the x-axis verges to 100%. For low quality males the picture is different, as they begin to reject much more quickly once the ratio difference starts to converge. Observe however the effect of the y-axis, number of female interactions assuming one is not guarding a mate. As the number of these interactions increases the rejection threshold keeps dropping as low quality males become less and less inclined to guard high quality females. This has to be because the greater the number of interactions which freelance males have, presumably the greater the number of competitive interactions whereby these males may “steal” a female from a male who is guarding one.

Finally, the last set of figures focuses on “operational sex ratio,” OSR. The OSR ranges from 0.2, female-biased, to 2.4, male-biased. When there is a deficit of females high quality males will begin to accept pairings with low quality females, as is clear in panel B of the third figure. This makes rational sense in an environment of “scarcity.” The behavior of low quality males is more peculiar. In a situation of extreme female surplus their behavior converges upon that of high quality males: they reject low quality females, and accept high quality ones. As the sex ratio verges toward 1 the low quality males begin to reject high quality females and accept low quality ones. It seems that balanced mating ratios result in optimal trait matching, at least in terms of genetic quality, in the context of male competition for females (i.e., low quality males may prefer high quality females, but that is not an optimal decision because the likelihood of a payoff is low). But as the sex ratio verges toward a male surplus there are no good options for low quality males; the high quality females will reject them, because there are high quality males galore for them to select from, and the low quality females are now acceptable to high quality males, who will win them in the competition with low quality males.

Much of this is common sense. The mapping between formal quantitative model and verbal description is rather good. We know intuitively that in a context of male surplus it is the low quality males who will be shafted, and that low quality females will become valuable. You can offer up anecdote from engineering universities, or the army, or cite historical examples such as frontier societies with male-biased sex ratios. In modern day Punjab men import wives from poorer regions of eastern South Asia because of a sex-ratio imbalance. But here is where numbers are of the essence, as quantitative models show you how shifting the variates shifts the response. There has been some concern in relation to “bare branches”, men who can not marry in Asia, and its possible impact on societal stability. But one must keep in mind the exact proportion of bare branches within a society when predicting instability due to manic competition for women. Formal models can give us a better guide as to thresholds which should concern us.

Ultimately papers like this need to be validated by experiment and observation. But they’re useful toolkits, sharpeners of thought and conceptualization. It’s hard to test, verify, and refute, if you don’t pose the question and make a prediction in a clear and distinct manner.

Citation: Venner S, Bernstein C, Dray S, & Bel-Venner MC (2010). Make love not war: when should less competitive males choose low-quality but defendable females? The American naturalist, 175 (6), 650-61 PMID: 20415532

Image Credit: BS Thurner Hof, Kristin Dos Santos


Across the ~3 billion or so base pairs in the human genome there’s a fair amount of variation. That variation can be partitioned into different classes, somewhat artificial constructions of human categorization systems, but nevertheless mapping on to real demographic or life history events of particular importance. Some of the variation is specific to populations, while some of it is specific to a set of populations, and, there is also variation which we find only within families. Presumably when whole genome sequencing and analysis becomes the norm such distinctions will still have utility, but we should be able to tunnel down to whatever level of analysis we wish. But until that day comes we’re going to have to rely on population sets which are deeply sequenced and can serve as a reasonable representation of a subset of human variation. I mention some of these populations regularly on this weblog, the HGDP, HapMap and POPRES being three prominent data sets with a diverse range. These groups cover only a small subset of human populations, and of those populations only a small proportion of the genomes of individuals (albeit, the component which is likely to vary within the population). A new paper in Nature takes a close look at the expansion of the HapMap to a new set of populations. Since it’s out of the HapMap consortium the list of authors themselves gives us a large set of individuals who might be of population genetic interest! (though not a representative set of human population variation; where are the Papuan employees of the Broad Institute?) Some of the data coming out of the next stage of the HapMap has been found in several papers already (often in the supplements), but this looks to be an overview and taste of what’s to come (the paper was submitted last fall). Integrating common and rare genetic variation in diverse human populations:

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called ‘HapMap 3’, includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.

Since the supplements are free to all I recommend you download them if you don’t have academic access. The main difference is that they’re not as pithy in the supplements, and the graphics are lower quality. The populations (the four original HapMap populations are listed first):

Centre d’Etude du Polymorphisme Humain collected in Utah, USA, with ancestry from northern and western Europe (CEU)
Han Chinese in Beijing, China (CHB)
Japanese in Tokyo, Japan (JPT)
Yoruba in Ibadan, Nigeria (YRI)

African ancestry in the southwestern USA (ASW)
Chinese in metropolitan Denver, Colorado, USA (CHD)
Gujarati Indians in Houston, Texas, USA (GIH)
Luhya in Webuye, Kenya (LWK)
Maasai in Kinyawa, Kenya (MKK)
Mexican ancestry in Los Angeles, California, USA (MXL)
Tuscans in Italy (Toscani in Italia, TSI)

So memorize some of those abbreviations! One particular difference across these populations is that some are parent-offspring trios, and some are not. So the CEU sample are trios, while the TSI are not. This obviously matters since you’re going to have clusters of relatedness within the CEU sample that you wouldn’t have within TSI. There are analytic upsides and downsides to having trios or not having trios, but, for a major purpose of this sort of data set, covering world wide human variation, you probably would want unrelated individuals within a population. These are the samples with trios: CEU, ASW, MXL, MKK, and YRI.

To get the SNPs and CNPs they merged the results from Affymetrix and Illumina chips, and came out with ~1.5 million variants across ~1,000 individuals. In terms of exploring big picture questions which are on a coarse scale this is pretty good, though I’m not sure that it’s that much better than the HGDP, which has so many populations (though about half the number of SNPs). Rather, one of the primary issues focused on in this paper is finding enough of the rarer variants, which may not have shown up in the initial panel because of its narrow population coverage, so as to perform imputation for purposes of statistical analysis in GWAS. So, for example, they compare the CEU vs. the CEU+TSI in imputing to a British study group. Here’s what they found (MAF = minor allele frequency):

For common SNPs (MAF ≥5%), the larger HapMap 3 reference panel made only a slight difference to the already excellent performance (mean r2 increased from 0.946 to 0.961). However, as expected there was greater improvement for rare (MAF <0.5%) and low-frequency SNPs (MAF = 0.5–5%). Their combined mean r2 increased from 0.60 to 0.76, driven by a large subset of rare SNPs (41%) and low-frequency SNPs (25%) where r2 increased by at least 0.1, yielding mean r2 improvement for these subsets of 0.62 and 0.49 respectively…

So the older HapMap data set was fine with more common variants, but a larger sample set really gave some returns with less common variants. This makes intuitive sense. What is interesting to me is that the CEU sample of Utah Whites is presumably genetically close to a group of British whites born in 1958, and yet adding a Tuscan sample was still useful. To get a sense of how the power of this sort of imputation drops off between populations (the greater the genetic distance, the fewer rare variants are shared), they imputed in a pairwise fashion, or compared a population to its putative admixture sources. So African Americans, who have a substantial proportion of European admixture with West African primary ancestry, are best modeled once you combine CEU+YRI with appropriate weights. This is especially true for rare alleles. For common SNPs, r2 was 83% when African Americans were imputed from Yoruba alone, and 86.5% when they were imputed from Yoruba & Utah Whites. For rare SNPs, it was 45.5% vs. 71.7%! Models which added the other HapMap 3 populations were actually less effective at imputation. East Eurasians have different genetic variants which simply confuse the picture.
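For readers who want to see what those r2 numbers are measuring, here is a minimal sketch in Python. It assumes that imputation accuracy is scored as the squared Pearson correlation between true genotypes (0, 1 or 2 copies of an allele) and imputed dosages, averaged within the MAF bins quoted above. The function names and the toy data are my own inventions for illustration; this is not the consortium's pipeline.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either vector is constant
    return (sxy * sxy) / (sxx * syy)


def maf_bin(true_genotypes):
    """Bin a SNP by minor allele frequency, using the thresholds quoted above."""
    freq = sum(true_genotypes) / (2 * len(true_genotypes))
    maf = min(freq, 1 - freq)
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


def mean_r2_by_bin(snps):
    """snps: list of (true_genotypes, imputed_dosages) pairs, one per SNP."""
    totals = {}
    for true, imputed in snps:
        r2 = pearson_r2(true, imputed)
        if r2 != r2:  # NaN check: skip SNPs where r2 is undefined
            continue
        label = maf_bin(true)
        running_sum, count = totals.get(label, (0.0, 0))
        totals[label] = (running_sum + r2, count + 1)
    return {label: s / k for label, (s, k) in totals.items()}


# Toy data (made up): 200 individuals, one common SNP and one rare SNP.
common_true = [0, 1, 2, 1] * 50
common_imputed = [0.1, 0.9, 1.8, 1.2] * 50
rare_true = [1] + [0] * 199
rare_imputed = [0.4, 0.3] + [0.0] * 198

print(mean_r2_by_bin([(common_true, common_imputed), (rare_true, rare_imputed)]))
```

In the toy data a single misplaced dosage is enough to drag down r2 for the rare SNP, because only one true carrier anchors the correlation. That is the same sensitivity which makes a bigger reference panel matter more for rare variants.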

It is intuitively obvious why rare alleles show up as you increase sample size. But why are rare alleles more distinctive across populations? If they’re common alleles they’re likely to have been around a long time, and so may be ancestral variants, or have had time to spread via gene flow. In contrast, rare alleles may be new, and so more distinctive across populations. Similarly, there are alleles which surely are passed down through families.
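The sample size point can be made concrete with a back-of-the-envelope calculation. As a rough sketch, assume each of the 2n chromosomes in a sample of n diploid individuals is an independent draw from the population (no relatedness, no structure). Then an allele at frequency p is seen at least once with probability 1 - (1 - p)^(2n). The function name and the frequencies below are mine, chosen only for illustration.

```python
def detection_probability(freq, n_individuals):
    """Chance an allele at population frequency `freq` shows up at least once
    among the 2n chromosomes of n diploid individuals (independent draws assumed)."""
    return 1.0 - (1.0 - freq) ** (2 * n_individuals)


# Compare a common, a low-frequency and a rare allele at several sample sizes.
for n in (30, 100, 1000):
    row = [round(detection_probability(p, n), 3) for p in (0.2, 0.01, 0.001)]
    print(n, row)
```

Common alleles are all but guaranteed to turn up in even a small panel, while the detection probability for rare alleles keeps climbing as individuals are added.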

Figure 3 shows the impact of sample size on SNPs discovered:


Note the two groups of curves: African vs. non-African. This paper confirms the findings that Africans have more genetic diversity than other populations, while East Asians have less (presumably if Amerindians were in the sample they would round out the bottom). From the text:

As judged by this measure, informativeness varied greatly for different population pairs. Consistent with the observation that non-African diversity is largely a subset of African diversity…African samples provided a more complete discovery resource for variant sites in non-African samples than the converse…Focusing only on low-frequency variants in the original sample of 30 individuals (one or two copies, corresponding to allele frequencies of 3.3% or less), even African samples were highly incomplete for diversity outside of Africa, with informativeness ratios dropping to 40–60% in LWK and YRI…In general, for low-frequency variants only closely related populations did an adequate job of capturing variation…probably reflecting the recent origins of low-frequency variants. Two populations, LWK and GIH, stand out as being poorly captured by any of our other populations, the result of admixture with an ancestral population not closely related to any in our regional sequencing data….

So again, African genetic diversity can inform on other populations, but with low frequency allelic variants even Africans don’t have enough to account for non-African groups. As a historical matter much of that might be due to the fact that the non-African variants have emerged more recently since the out of Africa event. Figure 2a shows the pairwise relationships between and within the populations measured by low frequency SNPs. More precisely, they took 30 random individuals from a population, and compared them to 30 random individuals from within the same population (without overlap), as well as 30 random individuals from other populations. The black bar is the same population comparison, while the colored bars represent across population comparisons. The higher the bar the better the across sample concordance; SNPs in one sample set map on well to those in the other sample set. First, observe the minimal difference between CEU & TSI. Europeans are relatively genetically homogeneous, and as far back as History and Geography of Human Genes it was evident that there was relatively minimal within continental variance. Next in line in relation to a CEU reference is GIH, the Gujaratis. This makes sense from all the other studies we know. South Asians are closer to West Eurasians than any other populations. Similarly, YRI are closest in correspondence with LWK, the Bantu sample from Kenya. But though the rank order of population relatedness is roughly similar to what you’d find in Fst, the authors note that the pairwise comparisons are not symmetrical. GIH was informative for 71% of TSI low frequency SNPs, but TSI was only informative for 55% of GIH. Why? GIH is more diverse, but it is also probable that the Gujaratis are a compound of a European-like and non-European population, so what you’re seeing is overlap across the European fractions. Since the Tuscans lack the non-European fraction the Gujaratis will have alleles which aren’t found within them.
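To see why this kind of comparison need not be symmetrical, here is a toy sketch. I am assuming a simplified stand-in for the paper's measure: of the sites where a target sample carries only one or two copies of a variant, what fraction are also variable in a reference sample? The function name and the allele counts are made up, and the paper's exact definition of informativeness may differ.

```python
def informativeness(target_counts, reference_counts, max_copies=2):
    """Fraction of the target sample's low-frequency variant sites
    (1..max_copies copies) that are also variable in the reference sample."""
    low_freq_sites = [i for i, c in enumerate(target_counts) if 1 <= c <= max_copies]
    if not low_freq_sites:
        return float("nan")
    shared = sum(1 for i in low_freq_sites if reference_counts[i] > 0)
    return shared / len(low_freq_sites)


# One or two copies among 30 diploid individuals (60 chromosomes):
n_individuals = 30
print("max frequency of a 'low-frequency' site:", 2 / (2 * n_individuals))

# Hypothetical variant-allele counts at eight sites in two samples of 30.
pop_a = [1, 2, 0, 1, 2, 0, 1, 10]
pop_b = [3, 0, 1, 1, 0, 2, 0, 12]

print("B informative for A:", informativeness(pop_a, pop_b))
print("A informative for B:", informativeness(pop_b, pop_a))
```

The two directions use different denominators (each sample's own low-frequency sites), so nothing forces them to agree. A sample carrying an extra ancestry component simply has more low-frequency sites that its partner cannot match.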

Speaking of the Gujaratis, there are some interesting results in the supplements which I want to highlight. They illustrate again the importance of context in PCA charts. They’re representations of reality, but only as good as your ability to interpret them and the inputs you’re giving them. Below are a set of images from the supplements, and you can skim them quickly. I’ve labelled them by population and context. Note how the populations shift positions based on the population set of variation you plug into the analysis. These are all the two largest components of variance.

[Image gallery: PCA plots from the HapMap 3 supplements]

Notice how Gujaratis and Mexican Americans overlap on the world wide PCA plot. Why? Because their gene frequencies are a linear combination of East and West Eurasian genetic variance, to a first approximation. I’ve indicated before that the overlap disappears when you look at other components of variation. But as the second image shows, you don’t have to do that. Use only Mexican Americans, Europeans, and Gujaratis, and you see that Mexican Americans have a component of variance which is different from the other two. That’s because the non-European ancestry of Gujaratis is very different from that of the Mexican Americans, though both cluster together when set next to Europeans, East Asians, and Africans. Remember that in the world wide set PC 1 is African vs. non-African, so removing Africans immediately frees up a dimension for the plot. The last figure shows Mexican Americans with Chinese and Europeans, and again, you see that there’s variation which isn’t simply a linear combination of Chinese and Europeans, Amerindians have their own uniqueness not found in either. In contrast, African Americans are a rather straightforward combination of West Africans and Europeans. Thankfully for African American genetics their parental populations were in the original HapMap. For Gujaratis and Mexican Americans you have only half the picture in the original HapMap, and you’d have to use the imperfect substitute of East Asians (very imperfect for Gujaratis, and somewhat so for Mexican Americans).
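The "linear combination" logic can be illustrated with a toy example. This is not a real PCA, just a sketch of the one property that matters here: projecting onto an axis is a linear operation, so an admixed population lands at the weighted average of its sources on any axis you pick. All the allele frequencies, the 50/50 mixing weights and the source labels (`asi` for a South Asian-specific source, `ame` for an Amerindian-like one) are invented for illustration.

```python
def mix(a, b, w):
    """Allele frequencies of a population with fraction w from a, 1 - w from b."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]


def diff(a, b):
    """Axis pointing from population b towards population a."""
    return [x - y for x, y in zip(a, b)]


def project(freqs, axis):
    """Coordinate of a frequency vector along an axis (dot product)."""
    return sum(f * a for f, a in zip(freqs, axis))


# Made-up frequencies at six loci for four hypothetical source populations.
eur = [0.9, 0.9, 0.5, 0.5, 0.5, 0.5]
eas = [0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
asi = [0.3, 0.3, 0.9, 0.9, 0.5, 0.5]  # distinct only at loci 2-3
ame = [0.3, 0.3, 0.5, 0.5, 0.9, 0.9]  # distinct only at loci 4-5

gih_like = mix(eur, asi, 0.5)
mxl_like = mix(eur, ame, 0.5)

world_axis = diff(eur, eas)  # what dominates when East and West Eurasians drive the plot
local_axis = diff(ame, asi)  # variation visible once the second sources can differ

for name, pop in (("eur", eur), ("gih_like", gih_like), ("mxl_like", mxl_like)):
    print(name, round(project(pop, world_axis), 3), round(project(pop, local_axis), 3))
```

On the first axis the two admixed groups sit on top of each other, while on the second they fall on opposite sides of the Europeans. Which axes a real PCA hands you depends on which populations you feed it, which is the point about context.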

One final issue on phylogenetic relationships: the strange pattern among Gujaratis which I perceived among other South Asians as well is still evident. In the plot with Mexican Americans + Europeans + Gujaratis, the Gujaratis seem a linear combination of European + something else. What Reich et al. would term “Ancestral North Indian” + “Ancestral South Indian.” But the Gujarati + European plot shows that in the second component of variation there’s a difference between two clusters of Gujaratis. There’s something going on with the Gujarati group which is a touch closer to Europeans on the largest component of variance, because on the second dimension they’re deviated from the other Gujarati cluster and Europeans. This is similar in quality to the pattern with the South Asian data set with an orthogonal component of variation to the European-South Indian axis. The orthogonal component is striking among those which are between the Europeans and South Indians. The CEU + GIH + CHB plot doesn’t indicate to us that it’s East Asian either.

Of course the paper wasn’t just about validating the power of expanding the data set for medical genetics and clarifying phylogenetic relationships. There are several subsections, but I thought I’d jump to the end where they allude to detecting natural selection. This seems preliminary at least. They didn’t really go that much further for populations in the original HapMap, but found some interesting stuff for the new groups. To the left is a table from the supplements (I reedited it a bit) which shows loci which popped out of the CMS test for natural selection for Tuscans, Maasai, and Luhya (the latter two are Nilotic and Bantu, respectively, from Kenya). I present the results for readers with an interest in particular loci who might see something in the list that does, or doesn’t, make sense to them. It seems that this part of the paper is primarily about showing that the new populations have some utility in fleshing out evolutionary phenomena which may have been missing in the original analyses of the HapMap because of constrained population coverage. Comparing Tuscans to CEU, and the Maasai to Luhya, should tell us something about the evolution of lactase persistence. These pairs consist of populations which are rather close to each other in terms of ancestry (especially the European groups), but local ecological and cultural conditions have no doubt applied different selection pressures (the majority of Tuscans seem to lack the lactase persistence allele common in northern Europe last I checked).

Finally, from the conclusion:

With improvements in sequencing technology, low-frequency variation is becoming increasingly accessible. This greater resolution will no doubt expand our ability to identify genes and variants associated with disease and other human traits. This study integrates CNPs and lower-frequency SNPs with common SNPs in a more diverse set of human populations than was previously available. The results underscore the need to characterize population-genetic parameters in each population, and for each stratum of allele frequency, as it is not possible to extrapolate from past experience with common alleles. As expected, lower-frequency variation is less shared across populations, even closely related ones, highlighting the importance of sampling widely to achieve a comprehensive understanding of human variation.

Intrepid readers can poke around the data themselves at the HapMap website.

Citation: The International HapMap 3 Consortium (2010). Integrating common and rare genetic variation in diverse human populations. Nature. DOI: 10.1038/nature09298

Razib Khan