Razib Khan
The Pith: You’re Asian. Yes, you!

The conclusion of an important paper by Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:

In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:

1. The arrival of Neolithic farmers probably from the Middle East.

2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.

3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).

4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.

Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.


The paper, Ancient Admixture in Human History, is in Genetics. Reading through it I can see why it wasn’t published in Nature or Science: methods are of the essence. The authors review five population genetic statistics of phylogenetic and evolutionary genetic import, before moving on to the novel results. These statistics, which measure the possibility of admixture, the extent of admixture, and the date of admixture, are often presented, but nested into supplements, in previous papers by the same group. On the one hand this removes from view the engines which are driving the science. On the other hand I have always appreciated that a benefit of this injustice to the methods which make insight possible is that those without academic access can actually bite into the meat of the researchers’ mode of thought.

I did read through the methods. Twice. I’ve encountered all the statistics before, and I’ve read how they were generated, but I’ll be honest and admit that I haven’t internalized them. That has to end now, because the authors have finally released a software package which implements the statistics, ADMIXTOOLS. I plan to use it in the near future, and it is generally best if you understand the underlying mechanisms of a software package if you are at the bleeding edge of analytics. I will review the technical points in more detail in future posts, more for my own edification than yours. But for the moment I’ll be a bit more cursory. Four of the tests use comparisons of allele frequencies along explicit phylogenetic trees. That’s so general as to be uninformative as a description, but I think it’s accurate to the best of my knowledge. At bottom the tests are seeing if a model fits the data (as opposed to TreeMix, which finds the best model out of a range to fit the data). The last method, rolloff, infers the timing of an admixture event based upon the decay of linkage disequilibrium. In short, admixture between two very distinct populations has the concrete result of producing striking genomic correlations. Over time these correlations dissipate due to recombination. The magnitude of dissipation can allow one to gauge the time in the past when the original admixture occurred.
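
To make the rolloff logic concrete, here is a minimal sketch in Python, not the ADMIXTOOLS implementation. It assumes the admixture correlation between markers falls off as A0 * exp(-n * d), where d is genetic distance in Morgans and n is the number of generations since admixture, and it recovers n from a noiseless synthetic curve with a log-linear fit. The function name, the curve, and the 29 years per generation are all my assumptions for illustration.

```python
import math

def fit_decay_rate(distances_morgans, ld_values):
    """Least-squares slope of log(LD) against genetic distance.

    Under the assumed model LD(d) = A0 * exp(-n * d), the slope is -n,
    so the function returns n, the generations since admixture.
    """
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope

# Synthetic, noiseless decay curve (hypothetical values, not real data).
true_generations = 140
distances = [0.001 * k for k in range(1, 51)]   # 0.1 cM to 5 cM, in Morgans
curve = [0.05 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_decay_rate(distances, curve)
years_per_generation = 29                        # assumed conversion factor
estimated_years = estimated_generations * years_per_generation
```

With real data the curve is noisy and the fit needs standard errors, which is where the large uncertainty on the dates comes from.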

Let’s look at some results. To the left is a section of a table which illustrates the most significant 3-population test scores in the HGDP. The authors checked all the various combinations, and these came out at the top as likely admixtures (i.e., the two sources produce particular patterns in the target). Please remember that these triads should not be taken literally. The Uygur are not descended from Japanese and Italians. Rather, they are descended from populations with genetic affinities to these two sources. Precisely, the Uygurs are descended from Northeast Asian Turks, who assimilated an Indo-European speaking substratum. Most of the results are rather obvious and explicable. Several Middle Eastern populations are known to have Sub-Saharan African admixture, and this shows up in the results. Others may be more confusing because of the obscurity of the populations, but the Burusho clearly have ancient East Asian ancestry on clustering algorithms, so their presence is not surprising to me. Similarly, the Russians in the HGDP data set have an ‘eastern’ affinity (or at least some do), either due to Finno-Ugric or Turkic ancestry (Tatars regularly assimilated into a Russian ethnic identity as the Tsars expanded their domains).
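
For those who want to see why a target that is a mixture of two sources stands out, here is a rough sketch of the idea behind the 3-population test. It computes a naive f3(target; A, B) as the average over SNPs of (c - a) * (c - b), where a, b, and c are allele frequencies in the two sources and the target. The frequencies and the mixing proportion are invented for illustration, and the sketch leaves out the finite-sample corrections and significance testing that a real analysis needs.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean over SNPs of (c - a) * (c - b)."""
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)

# Hypothetical allele frequencies at five SNPs in two source populations.
freq_a = [0.10, 0.80, 0.35, 0.60, 0.05]
freq_b = [0.70, 0.20, 0.90, 0.15, 0.55]

# A target built as a 40/60 mixture of the two sources (assumed proportion).
alpha = 0.4
mixed = [alpha * a + (1 - alpha) * b for a, b in zip(freq_a, freq_b)]

f3_mixed_target = f3(mixed, freq_a, freq_b)    # negative: admixture signal
f3_source_target = f3(freq_a, mixed, freq_b)   # positive: no such signal
```

In this idealized case the statistic for the mixed target works out to -alpha * (1 - alpha) times the mean squared frequency difference between the sources, which is why the most divergent source pairs give the strongest signal.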

Some of the other results are more confusing, but one can still find a historical explanation. I have seen evidence that some of the Cambodian samples may have old Indian admixture, though it is not entirely clear to me. But that could explain why there is a signature of West Eurasian admixture into this population (though one wonders why the donor was not Baloch or Pathan). The Xibo and Tu are Northeast Asian groups, on the border between China proper and the great Eurasian interior. West Eurasian admixture into these groups is not unexpected. West Eurasians are historically attested among the mercenaries and soldiers who arrived on the North China plain after the collapse of the Han dynasty, down to the Alans who served under Kublai Khan. Some Mongolian and Turkic peoples have individuals who are attested as having characteristics more typical of Europeans (e.g., red hair), so it is likely that this admixture was relatively old and widespread, well before the era of the Pax Mongolica.

There is a minor dissonant note in these results above. The authors used rolloff and inferred an admixture date of ~800 years before the present for the Uygur. This is far lower than earlier estimates, which were >2,000 years before the present. First, I have to say that I was mildly skeptical of the higher value reported earlier. From what little I know the roiling of Turco-Mongol peoples which reordered the Inner Asian landscape did not really establish itself beyond the Chinese fringe at this time. Recall that Central Asia was the domain of the Iranians from prehistory down to the Islamic age (the full transition of Central Asia from Persianate to Turkic has not completed itself to this date, though it has progressed over the centuries since 1000 A.D.). Is it credible that the Turkic hordes were shut on the other side of the Pamirs for ~1,000 years? Perhaps. But it should warrant skepticism, and openness to the lower values proffered here. The technical reason that the authors consider is that STRUCTURE based inferences may overestimate admixture when reference populations are not appropriate. And yet the authors still concede that 800 years is simply difficult to credit when one consults the historical literature. Strangely though it does align with the date of the Mongol ascendancy, during which time the Uygurs served as civil servants in the barbarian empire (Mongol script derives from the old Uygur script). I managed to dig up a cave painting of Uygurs from this period. There is surely artistic license, but they look rather East Asian to me, as opposed to the hybrid Eurasian appearance modal among modern Uygurs. I won’t touch upon the rather fraught and complex ethnology and ethnogenesis of modern Uygurs, and their relationship to Russian and Chinese ethnographers, but suffice it to say that one needs to be careful about excessive reliance on the literality of historical documents in this area, because of semantic confusions.

So let’s move to the main course: what’s going on in Europe? Before putting the spotlight on the macro picture, let’s highlight one secondary aspect: the authors detect evidence of massive gene flow into Spain from Northern Europe ~4,000 years before the present. I’ll let them speak here:

We hypothesize that we are seeing here a genetic signal of the ‘Bell-Beaker culture’ (HARRISON, 1980). Initial cultural flow of the Bell-Beakers appears to have been from South to North, but the full story may be complex. Indeed one hypothesis is that after an initial expansion from Iberia there was a reverse flow back to Iberia (CZEBRESZUK, 2003); this ‘reflux’ model is broadly concordant with our genetic results, and if this is the correct explanation it suggests that this reverse flow may have been accompanied by substantial population movement.

Two things to hammer home here. First, pots move with people. That’s the inference being drawn from the results. It’s not pots-not-people, it’s people-and-pots. Second, the idea of reversals in the direction of gene flow is intriguing and, I think, needs to be taken more seriously. It seems the most plausible candidates here are the people who later became the Celtiberians. Celts have been associated with the Bell Beakers before.

But the bigger shock is that Europeans, and especially Northern Europeans, seem to have a substantial Northeast Asian component. From the nature of the prose I feel that the authors were definitely taken aback. They basically say so in so many words. In the process of resolving their confusion they skinned the cat every which way. And it does look to me that Northern Europeans are truly descended in part from a population which has affinities to the “First Americans.” I say this specifically because the Siberian samples they tested actually gave a weaker result than the South American Amerindians on the 3-population test.

So what’s the proportion of ancestry? Using the Siberian population they came up with an interval of 5-18 percent in Northern Europeans. The authors used the Sardinians as their “pure” European reference, and admit that it is likely that their admixture estimate is lower than the real value due to this fact. Inference is inference, so do you trust this result? As it happens the authors also checked Ötzi the Iceman, and found that like the modern Sardinians he had very little Northeast Asian ancestry. Ötzi is dated to ~5,000 years before the present. Using rolloff the authors estimate an admixture date of ~4,000 years before the present, with an error of nearly 1,000. Additionally, using a different data set they came up with an admixture date of ~2,000 years before the present. The latter is obviously wrong (they explain why this could happen in the text). But Ötzi seems to put a boundary on how early it could have been, at least in Southern Europe.

As of publication the authors did not have time to include a reference to this interesting nugget from the abstracts of ASHG 2012:

The complete genome of the 5,300 year old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has recently been published and yielded new insights into his origin and relationship to modern European populations. A key finding of this study has been an apparent recent common ancestry with individuals from Southern Europe, in particular Sardinians…We used unpublished data from whole genome sequencing of 452 Sardinian individuals, together with publicly available data from Complete Genomics and the 1000 Genomes project, to confirm that the Iceman is most closely related to contemporary Sardinians. An analysis of these data together with ancient DNA data from a recently published study on Neolithic farmers and hunter-gatherers from Sweden shows the Iceman most closely related to the farmer individual, but not the hunter-gatherers, with the Sardinians again being the contemporary Europeans with the highest affinity. Strikingly, an analysis including novel ancient DNA data from an early Iron Age individual from Bulgaria also shows the strongest affinity of this individual with modern-day Sardinians. Our results show that the Tyrolean Iceman was not a recent migrant from Sardinia, but rather that among contemporary Europeans, Sardinians represent the population most closely related to populations present in the Southern Alpine region around 5000 years ago. The genetic affinity of ancient DNA samples from distant parts of Europe with Sardinians also suggests that this genetic signature was much more widespread across Europe during the Bronze Age.

I’m betting that this Bulgarian sample won’t exhibit Northeast Asian ancestry, though who knows?

There is a definite geographic pattern within Europe to the strength of the signature of admixture. Northern European populations have the greatest, Southern European populations less, and islanders like Cypriots hardly any. Recall that Sardinians seem to be the best reference, so the ~0 floor may just be a statistical artifact of the measuring stick we have. All that being said, what went on <5,000 years before the present to reorder the European landscape?

The answer may sound crazy, but I think the most probable explanation (even if it is unlikely) is something to do with the Indo-Europeans. We know that Indo-European languages were spoken in Greece by ~1500 BC at the latest. One thing that is clear from less advanced clustering algorithms is that Basques and Finns are somewhat distinctive in relation to their neighbors. Though they are not genetically that different, they still lack some “interesting” elements. The results to the left are from Dienekes, though I’ve replicated them. You can see a similar difference between French and French Basques. The Basques seem to lack something which has affinities with West Asia. These results, and hints elsewhere, imply that the Basques may not be descended from hunter-gatherers, but from the first European farmers. So who came after them?

Though it strikes me as a bizarre conjecture, I can’t help but imagine the rapid expansion of Indo-European populations into Europe, pushing into the peninsulas of the south. These people may have been a newly formed cosmopolitan mix of West Asians, Northern European Mesolithics, and Northeast Asians. I am at a loss to hazard a guess as to who the First American-like Northeast Asians were, though perhaps they were a western offshoot of the Kets? These people were then absorbed into a melange of tribes who themselves emerged from a synthesis between immigrant West Asian farmers and Northern Europeans. In shorthand: perhaps the Indo-Europeans were mongrels! This is not an entirely crazy proposition if you look at the historical record. Conquest populations often synthesized and absorbed those whom they conquered. Sometimes they even became the conquered in deep cultural ways (e.g., the Bulgars).

To ward off accusations of glib and facile speculations, I well understand that much of what I suggest above is likely wrong. But bizarre results are going to elicit unhinged hypotheses. And I shouldn’t overplay how strange these results are, I think they are going to stand the test of time. The authors are top notch, and Dr. Joseph Pickrell found the same pattern (a connection between Europeans and Native Americans) with TreeMix! If we sit back and reflect on phenotype it shouldn’t be entirely surprising. Some Scandinavians have always struck me as having a generalized Eurasian cast to their features. Obviously this tendency is stronger among the Sami and Finns, but you can see it in Swedes and others. This is far less evident to me among Southern European peoples. I doubt one would ever confuse a Sardinian for a Eurasian, and I never had that feeling when I spent some time in Italy a few years back (in contrast, some Finns did look Asiatic to me).

Finally, this paper highlights the reality that population genetics has little to do with Plato. A population within a species is simply not clear and distinct in a sense which would satisfy an Idealist. The authors of the above paper nod to this, illustrating how their tests for admixture are confounded and confused by constant gene flow via isolation-by-distance dynamics. These results indicate that Northern Europeans are on the order of 10% Northeast Asian. Does this mean that Northern Europeans are 10% non-white? Well, it turns out that white people were always 10% non-white! We just didn’t know. Is my daughter (who is 50% Northern European) now majority non-white? Oh wait, I’m South Asian. That means I’m ~50% white! Is my friend who is 25% Japanese now more than 25% Northeast Asian? Words and concepts fail us on the boundary of unfamiliarity, in time and space. Populations and genealogies don’t brook our categorizations. On a deep level we are all admixtures, and partitionings of ancestry along phylogenetic trees are useful and comprehensible fictions. These techniques put flesh upon the bones of archaeology and smoke out the outlines of history. But we always need to be aware that history is not made by humans; rather, we are excavating it, and then giving it appropriate glosses in our museums. And yet it is.

Related: Dienekes has much to say (obviously).

Image credit: Wikipedia, Wikipedia, and Wikipedia.

Cite: 10.1534/genetics.112.145037

(Republished from Discover/GNXP by permission of author or representative)
Back when this sort of thing was cutting edge mtDNA haplogroup J was a pretty big deal. This was the haplogroup often associated with the demic diffusion of Middle Eastern farmers into Europe. This was the “Jasmine” clade in Seven Daughters of Eve. A new paper in PLoS ONE makes an audacious claim: that J is not a lineage which underwent recent demographic expansion, but rather one which has been subject to a specific set of evolutionary dynamics which have skewed the interpretations due to a false “molecular clock” assumption. By this assumption, I mean that mtDNA, which is passed down in an unbroken chain from mother to daughter, is by and large neutral to forces like natural selection and subject to a constant mutational rate which can serve as a calibration clock to the last common ancestor between two different lineages. Additionally, mtDNA has a high mutational rate, so it accumulates lots of variation to sample, and, it is copious, so easy to extract. What’s not to like?
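
To see how much rides on the clock assumption, here is a back-of-the-envelope sketch, not the paper’s method. It assumes a strict clock, so that two lineages separated by k differences over L sites, each mutating at a rate of mu per site per year, last shared an ancestor about k / (2 * mu * L) years ago. The mutation rate and the number of differences below are made-up illustrative values; only the mtDNA length of 16,569 base pairs is a real figure.

```python
MTDNA_LENGTH = 16569  # base pairs in the human mitochondrial genome

def coalescence_age(differences, rate_per_site_per_year, sites=MTDNA_LENGTH):
    """Years to the common ancestor of two lineages under a strict clock.

    Both lineages accumulate mutations independently, hence the factor 2.
    """
    return differences / (2 * rate_per_site_per_year * sites)

assumed_rate = 1.7e-8        # hypothetical rate, for illustration only
observed_differences = 20    # hypothetical count between two sequences

age_constant_clock = coalescence_age(observed_differences, assumed_rate)

# If the lineage really accumulated mutations at half the calibrated rate,
# the same number of differences implies twice the age.
age_if_rate_halved = coalescence_age(observed_differences, assumed_rate / 2)
```

The point is only that the inferred age scales inversely with the rate, so a lineage which under-accumulates mutations will look younger than it is if you apply the standard calibration.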

First, the paper, Mutation Rate Switch inside Eurasian Mitochondrial Haplogroups: Impact of Selection and Consequences for Dating Settlement in Europe:

R-lineage mitochondrial DNA represents over 90% of the European population and is significantly present all around the planet (North Africa, Asia, Oceania, and America). This lineage played a major role in migration “out of Africa” and colonization in Europe. In order to determine an accurate dating of the R lineage and its sublineages, we analyzed 1173 individuals and complete mtDNA sequences from Mitomap. This analysis revealed a new coalescence age for R at 54.500 years, as well as several limitations of standard dating methods, likely to lead to false interpretations. These findings highlight the association of a striking under-accumulation of synonymous mutations, an over-accumulation of non-synonymous mutations, and the phenotypic effect on haplogroup J. Consequently, haplogroup J is apparently not a Neolithic group but an older haplogroup (Paleolithic) that was subjected to an underestimated selective force. These findings also indicated an under-accumulation of synonymous and non-synonymous mutations localized on coding and non-coding (HVS1) sequences for haplogroup R0, which contains the major haplogroups H and V. These new dates are likely to impact the present colonization model for Europe and confirm the late glacial resettlement scenario.

John Hawks has written at length of the possible distortions that selection might produce in our understanding of the history of mtDNA lineages, and therefore our understanding of the history of the population groups which these genealogies are used as proxies for. So I won’t review that much. I find the dynamics that they’re detecting possible, even plausible. But I don’t see why the authors, having introduced skepticism, start to conjure up positive visions of what is the true nature of the demographics which underpin these mtDNA phylogenies, now that they’ve “corrected” for variation in the power of the molecular clock to let us look through the glass clearly.

Readers with more fluency in the mtDNA literature can probably pick it apart. At the end of the day I’m always wondering what do the subfossils tell us? In other words, ancient DNA. Inferences from contemporary populations have been a total hash at a finer grain than that of continents, so you probably shouldn’t rest on that leg alone.

Finally, I thought this paper was of interest because it’s an inversion of R1b1b2. That’s a Y chromosomal haplogroup which was once presumed to be Paleolithic but now seems likely to be Neolithic. These authors are claiming that a mtDNA haplogroup which was once presumed to be Neolithic is actually Paleolithic. All this I think indicates that we should be modulating outward our error bars whenever we make assertions based on uniparental data with any time depth and below a very coarse level of spatial granularity.

The image above is adapted from the 2010 paper A Predominantly Neolithic Origin for European Paternal Lineages, and it shows the frequencies of Y chromosomal haplogroup R1b1b2 across Europe. As you can see, as you approach the Atlantic the frequency converges upon ~100%. Interestingly the fraction of R1b1b2 is highest among populations such as the Basque and the Welsh. This was taken by some researchers in the late 1990s and early 2000s as evidence that the Welsh adopted a Celtic language, prior to which they spoke a dialect distantly related to Basque. Additionally, the assumption was that the Basques were the ur-Europeans, descendants of the Paleolithic populations of the continent both biologically and culturally, so that the peculiar aspects of the Basque language were attributed by some to its ancient Stone Age origins.

As indicated by the title the above paper overturned such assumptions, and rather implied that the origin of the R1b1b2 haplogroup was in the Near East, and associated with the expansion of Middle Eastern farmers from the eastern Mediterranean toward western Europe ~10,000 years ago. Instead of the high frequency of R1b1b2 being a confident peg for the dominance of Paleolithic rootedness of contemporary Europeans, as well as the spread of farming mostly through cultural diffusion, now it had become a linchpin for the case that Europe had seen one, and perhaps more than one, demographic revolution over the past 10,000 years.

This is made very evident in the results from ancient DNA, which are hard to superimpose upon a simplistic model of a two way admixture between a Paleolithic substrate and a Neolithic overlay. Rather, it may be that there were multiple pulses into a European cul-de-sac since the rise of agriculture from different starting points. We need to be careful of overly broad pronouncements at this point, because as they say this is a “developing” area. But, I want to go back to the western European fringe for a moment.

As I stated above the Basques were long used as a Paleolithic “reference” by historical geneticists. That is, the deviation of a population from the Basques would be a good measure of how much admixture there had been from post-Paleolithic sources. Connections between Iberian populations and those of western and northern Europe were used to trace expansions out of the ecological refuges of modern humans during the Last Glacial Maximum ~20,000 years ago. Just goes to show how reliant we are on axioms which are squishier than we’d like to think.

Last fall I posted a result from Dodecad on the difference between French and French Basques (both from the HGDP). I’ve replicated this myself a few times now too:

The striking aspect is that the Basque are less cosmopolitan than the other French. This is evident in most of the runs of the HGDP Basque; they just have a “simpler” genetic heritage than other Western Europeans. Today Dienekes posted some results from the IBS Spanish data set in the 1000 Genomes. He suggests there are clearly a few Spanish Basques in there (I’ve highlighted them):

Recall that the Basques were exempt from inspection for “cleanliness of blood”, because they were presumed to lack Jewish or Moorish ancestry by virtue of being Basque. It seems that the Spanish IBS sample, like the Behar et al. Spaniards and Portuguese, does have some Moorish genetic imprint. This is not too surprising. The Moriscos might have been expelled in the early 17th century, but not before the majority had converted to Christianity over the centuries (in fact, some of the most virulent anti-Morisco partisans had Moorish ancestry themselves, and were particularly tainted by association with the remaining culturally unassimilated crypto-Muslims). All that being said, I suspect that the “West Asian” ancestry amongst the majority of the Spaniards is not due mostly to the Arab period (when the majority of the settlers probably were Berbers or Arabicized Berbers), but to population impacts prior to that. By the time of the Roman conquest much of Spain was Celtiberian. I have low confidence in this assertion, but I am coming to believe that the Indo-Europeans brought a mix of East European and West Asian ancestry (or at least those two distinct strands which tend to shake out of ADMIXTURE in a broad array of European samples) to western Europe.

On a related note, Wave-of-Advance Models of the Diffusion of the Y Chromosome Haplogroup R1b1b2 in Europe:

Whether or not the spread of agriculture in Europe was accompanied by movements of people is a long-standing question in archeology and anthropology, which has been frequently addressed with the help of population genetic data. Estimates on dates of expansion and geographic origins obtained from genetic data are however sensitive to the calibration of mutation rates and to the mathematical models used to perform inference. For instance, recent data on the Y chromosome haplogroup R1b1b2 (M269) have either suggested a Neolithic origin for European paternal lineages or a more ancient Paleolithic origin depending on the calibration of Y-STR mutation rates. Here we examine the date of expansion and the geographic origin of hgR1b1b2 considering two current estimates of mutation rates in a total of fourteen realistic wave-of-advance models. We report that a range expansion dating to the Paleolithic is unlikely to explain the observed geographical distribution of microsatellite diversity, and that whether the data is informative with respect to the spread of agriculture in Europe depends on the mutation rate assumption in a critical way.

Really I’m waiting for more ancient DNA. These sorts of studies are starting to feel like rewarming cold pizza. Edible, but suboptimal. Next, Phylogeography of a Land Snail Suggests Trans-Mediterranean Neolithic Transport:

Fragmented distribution ranges of species with little active dispersal capacity raise the question about their place of origin and the processes and timing of either range fragmentation or dispersal. The peculiar distribution of the land snail Tudorella sulcata s. str. in Southern France, Sardinia and Algeria is such a challenging case.

Statistical phylogeographic analyses with mitochondrial COI and nuclear hsp70 haplotypes were used to answer the questions of the species’ origin, sequence and timing of dispersal. The origin of the species was on Sardinia. Starting from there, a first expansion to Algeria and then to France took place. Abiotic and zoochorous dispersal could be excluded by considering the species’ life style, leaving only anthropogenic translocation as parsimonious explanation. The geographic expansion could be dated to approximately 8,000 years before present with a 95% confidence interval of 10,000 to 3,000 years before present.

This period coincides with the Neolithic expansion in the Western Mediterranean, suggesting a role of these settlers as vectors. Our findings thus propose that non-domesticated animals and plants may give hints on the direction and timing of early human expansion routes.

So basically the snail hitched a ride from Sardinia to Algeria to France. I don’t think this is that surprising. First, it seems pretty obvious that a lot of the cultural expansion in the prehistoric period did not consist of the fission of villages along a continuous wave of advance, but involved leap-frogging to suitable nuclei from which the populations expanded. Imagine a rising flood where the lowest zones are inundated first, and then the higher peaks. Additionally, we shouldn’t presume that these expansion events were without conflict and institutional support. Consider that the expansion of farming across much of southern European Russia and Ukraine could only occur after the state had pacified, expelled, or assimilated, the mobile Turkic populations which were wont to extract unsustainable rents out of isolated and vulnerable peasant populations.

Finally, what’s up with the strong north-south differentiation across the Mediterranean basin, peaking in the west? It’s as if there were two waves of demographic and cultural advance which laid the groundwork, and later perturbations haven’t disrupted that bedrock. It suggests to me the critical importance of lateral coastal transport in connecting cultural colonies, as opposed to more long distance jumps across the open sea. The latter were probably important for the transport of luxury goods and the exchange of memes, but not so much for the exchange of genes.

Synthetic map

In the age of 500,000 SNP studies of genetic variation across dozens of populations obviously we’re a bit beyond lists of ABO blood frequencies. There’s no real way that a conventional human is going to be able to discern patterns of correlated allele frequency variations which point to between population genetic differences on this scale of marker density. So you rely on techniques which extract the general patterns out of the data, and present them to you in a human-comprehensible format. But there’s an unfortunate tendency for humans to imbue the products of technique with a particular authority which they should not always have. The History and Geography of Human Genes is arguably the most important historical genetics work of the past generation. It has surely influenced many within the field of genetics, and because of its voluminous elegant visual displays of genetic data it is also a primary source for those outside of genetics to make sense of phylogenetic relations between human populations. And yet one aspect of this great work which never caught on was the utilization of “synthetic maps” to visualize components of genetic variation between populations. This may have been fortuitous: a few years ago a paper was published, Interpreting principal components analyses of spatial population genetic variation, which suggested that the gradients you see on the map above may be artifacts:

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.’s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
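
A toy illustration of the artifact, which is my own construction rather than anything from the paper: take 50 locations along a line, assume the genetic covariance between two locations decays exponentially with the distance between them (no migration event anywhere), center the matrix as PCA does, and pull out the top two components by power iteration. The number of locations, the decay scale, and the helper names are all assumptions.

```python
import math

def centered_covariance(n, scale):
    """Double-centered covariance exp(-|i - j| / scale) among n locations."""
    cov = [[math.exp(-abs(i - j) / scale) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in cov]
    grand_mean = sum(row_mean) / n
    return [[cov[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def top_component(matrix, exclude=(), iterations=500):
    """Leading eigenvector by power iteration, orthogonal to `exclude`."""
    n = len(matrix)
    v = [math.sin(1.0 + 1.7 * i) for i in range(n)]   # arbitrary start vector
    for _ in range(iterations):
        for u in exclude:                             # deflate earlier PCs
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def sign_changes(v):
    """Number of times consecutive entries switch sign."""
    signs = [x > 0 for x in v]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

K = centered_covariance(n=50, scale=20.0)
pc1 = top_component(K)
pc2 = top_component(K, exclude=[pc1])
```

In this sketch PC1 crosses zero once along the line and PC2 crosses twice: a gradient and a wave, conjured out of nothing but smooth decay of similarity with distance.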

A paper earlier this year took the earlier work further and used a series of simulations to show how the nature of the gradients varied. In light of recent preoccupations the results are of interest. Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture:

In a series of highly influential publications, Cavalli-Sforza and colleagues used principal component (PC) analysis to produce maps depicting how human genetic diversity varies across geographic space. Within Europe, the first axis of variation (PC1) was interpreted as evidence for the demic diffusion model of agriculture, in which farmers expanded from the Near East ∼10,000 years ago and replaced the resident hunter-gatherer populations with little or no interbreeding. These interpretations of the PC maps have been recently questioned as the original results can be reproduced under models of spatially covarying allele frequencies without any expansion. Here, we study PC maps for data simulated under models of range expansion and admixture. Our simulations include a spatially realistic model of Neolithic farmer expansion and assume various levels of interbreeding between farmer and resident hunter-gatherer populations. An important result is that under a broad range of conditions, the gradients in PC1 maps are oriented along a direction perpendicular to the axis of the expansion, rather than along the same axis as the expansion. We propose that this surprising pattern is an outcome of the “allele surfing” phenomenon, which creates sectors of high allele-frequency differentiation that align perpendicular to the direction of the expansion.

The first figure shows the general framework with which they performed the simulations:


You have a lattice which consists of demes, population units, all across Europe. They modulated parameters such as population growth (r), carrying capacity (C), and migration (m). Additionally, they had various scenarios of expansion from the southwest or southeast, as well as two expansions one after another to mimic the repopulation of Europe after the Ice Age by Paleolithic groups, and their later replacement by Neolithic groups. They modulated admixture and introgression of genes from the Paleolithic group into the Neolithic one so that you had the full range of outcomes, from final Europeans who were mostly Neolithic to final Europeans who were mostly Paleolithic.
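For a feel of how r, C, and m interact, here is a rough one-dimensional sketch in Python of a deme lattice with logistic growth and nearest-neighbor migration. It is not the simulator used in the paper, whose lattice is two-dimensional and spatially realistic. The function name, the default parameter values, and the half-capacity definition of the "front" are all assumptions made for illustration.

```python
import numpy as np


def expand_along_lattice(n_demes=60, r=0.5, capacity=100.0, m=0.2, generations=80):
    """Track a wave of advance along a 1D chain of demes (toy model)."""
    density = np.zeros(n_demes)
    density[0] = capacity  # assumed starting point: one full source deme
    front = []
    for _ in range(generations):
        # Logistic growth inside each deme, governed by r and C.
        density = density + r * density * (1.0 - density / capacity)
        # Migration: each deme sends a fraction m out, half to each neighbor.
        out = m * density
        moved = np.zeros_like(density)
        moved[1:] += out[:-1] / 2
        moved[:-1] += out[1:] / 2
        # Migrants that would step off either end of the lattice stay home.
        moved[0] += out[0] / 2
        moved[-1] += out[-1] / 2
        density = density - out + moved
        # Front = farthest deme that has reached half of carrying capacity.
        occupied = np.nonzero(density >= capacity / 2)[0]
        front.append(int(occupied.max()) if occupied.size else -1)
    return density, front


density, front = expand_along_lattice()
print("front position every 10 generations:", front[::10])
```

Raising r or m speeds the front up, while C sets the density behind it. The paper's scenarios layer a second expansion and admixture between residents and migrants on top of this kind of dynamic.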

Below are some of the figures which show the results:

[nggallery id=25]

As you can see, the strange thing is that in some models the synthetic map gradient is rotated 90 degrees from the axis of demographic expansion! In this telling the famous synthetic map usually read as a Neolithic expansion out of the southeast might instead be showing an expansion from Iberia. Perhaps a radiation from a post-Ice Age southern refuge?

One explanation might be “allele surfing” on the demographic “wave of advance.” Basically, as a population expands very rapidly, stochastic forces such as random genetic drift and bottlenecks can produce diversification along the edge of the population wave front. The reason for this is that these rapidly expanding populations explode out of serial bottlenecks and demographic expansions, which will produce genetic distinctiveness among the many differentiated demes bubbling along the edge of expansion. Alleles which may have been at low frequency in the ancestral population can “fix” in descendant populations on the edge of the demographic wave of advance. This is the explanation, more or less, that one group gave last year for the very high frequencies of R1b1b2 in Western Europe. With this, they overturned the classic assumption that R1b1b2 was a Paleolithic marker, and suggested it was a Neolithic one.
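The serial-bottleneck part of that story is easy to sketch in Python. This is a deliberately stripped-down illustration, not the paper's spatial model. It assumes a single chain of demes where each new deme at the front is founded by a handful of diploid individuals drawn from the previous one, with no growth, migration, or selection. The function name and the default values are assumptions.

```python
import numpy as np


def simulate_serial_founders(n_loci=2000, start_freq=0.1, founders=5,
                             n_steps=30, seed=1):
    """Allele frequencies at the wave front after repeated founder events."""
    rng = np.random.default_rng(seed)
    gene_copies = 2 * founders  # diploid founders carry two copies per locus
    freqs = np.empty((n_steps + 1, n_loci))
    freqs[0] = start_freq  # every locus starts rare in the source population
    for step in range(1, n_steps + 1):
        # Each new front deme samples its founding gene copies from the last one.
        counts = rng.binomial(gene_copies, freqs[step - 1])
        freqs[step] = counts / gene_copies
    return freqs


freqs = simulate_serial_founders()
front = freqs[-1]
print("mean frequency at the front:", round(float(front.mean()), 3))
print("share of loci fixed at the front:", round(float(np.mean(front == 1.0)), 3))
print("share of loci lost at the front:", round(float(np.mean(front == 0.0)), 3))
```

The average frequency across loci barely moves, but individual loci drift to loss or fixation, so an allele that was rare in the source can end up at 100 percent at the front purely by chance. That is the flavor of what surfing does, minus the geography.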

Here’s their conclusion from the paper:

A previous study showed that the original patterns observed in PCA might not reflect any expansion events (Novembre and Stephens 2008). Here, we find that under very general conditions, the pattern of molecular diversity produced by an expansion may be different than what was expected in the literature. In particular, we find conditions where an expansion of Neolithic farmers from the southeast produces a greatest axis of differentiation running from the southwest to the northeast. This surprising result is seemingly due to allele surfing leading to sectors that create differentiation perpendicular to the expansion axis. Although a lot of our results can be explained by the surfing phenomenon, some interesting questions remain open. For example, the phase transition observed for relatively small admixture rates between Paleolithic resident and Neolithic migrant populations occurs at a value that is dependent on our simulation settings, and further investigations would be needed to better characterize this critical value as a function of all the model parameters. Another unsolved question is to know why the patterns generally observed in PC2 maps for our simulation settings sometimes arise in PC1 maps instead. These unexplained examples remind us that PCA is summarizing patterns of variation in the sample due to multiple factors (ancestral expansions and admixture, ongoing limited migration, habitat boundary effects, and the spatial distribution of samples). In complex models such as our expansion models with admixture in Europe, it may be difficult to tease apart what processes give rise to any particular PCA pattern. Our study emphasizes that PC (and AM) should be viewed as tools for exploring the data but that the reverse process of interpreting PC and AM maps in terms of past routes of migration remains a complicated exercise. 
Additional analyses—with more explicit demographic models—are more than ever essential to discriminate between multiple explanations available for the patterns observed in PC and AM maps. We speculate that methods exploiting the signature of alleles that have undergone surfing may be a powerful approach to study range expansions.

What’s the big picture here? In the textbook Human Evolutionary Genetics it is asserted that synthetic maps never became very popular compared to PCA itself. I think this is correct. But, the original synthetic maps have become prominent for many outside of genetics. They figure in Peter Bellwood’s First Farmers, and are taken as a given by many pre-historians, such as Colin Renfrew. And yet a reliance on these sorts of tools must not be blind to the reality that the more layers of abstraction you put between your perception and comprehension of concrete reality, the more likely you are to be led astray by quirks and biases of method.

In this case I do think first-order intuition would tell us that synthetic maps which display PCs would be showing gradients as a function of demographic pulses. And yet the intuition may not be right, and with the overturning of old orthodoxies in the past generation of inferences from the variation patterns in modern populations, we should be very cautious.

Citation: Olivier François, Mathias Currat, Nicolas Ray, Eunjung Han, Laurent Excoffier, & John Novembre (2010). Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture. Mol Biol Evol

(Republished from Discover/GNXP by permission of author or representative)