The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
European Genetics


In the comments below someone asked if the model Bryan Sykes outlined in Seven Daughters of Eve (and later Saxons, Vikings, and Celts), that modern Britons descend predominantly from the Paleolithic stock which repopulated the island in the wake of the end of the last Ice Age (or fled Doggerland), is still tenable. I don’t think so. First, the tripartite origin of modern Northern Europeans probably puts more of an emphasis on migration in the Western regions of the continent. Yes, in many groups the ancestry which derives from the small populations which followed as the glaciers retreated is overall predominant (that is, somewhat more than 50%). But that is distinct from the idea that the proportion of ancestry from hunter-gatherers in any given area, such as Britain, is from indigenous hunter-gatherers long resident. What I’m getting at is that socio-cultural groups, such as “Early European Farmers” (EEF) and the Yamna, which contributed a great deal of ancestry to modern people, are themselves in origin compounds of disparate elements. Because of the seeming homogeneity of European hunter-gatherers, likely due to a Pleistocene bottleneck and then a rapid range expansion from small founder groups, earlier methods of aligning mtDNA and Y haplogroups may have misled because of the lack of power to distinguish between extremely close lineages (European hunter-gatherers are almost all mtDNA group U and predominantly Y group I). Therefore the predominant Paleolithic ancestry across Northern Europe may actually be a function of a few discrete pulse admixture events. Demographically successful groups then carried this ancestry where they went, possibly replacing natives in totality.

I grant that this is speculative and not certain. For example, one assumption I’m making is that the density of hunter-gatherers was rather low across Europe. But clearly there were marine environments where they seem to have been thicker on the ground, particular zones where agriculturalists seem to simply stop their advance abruptly. The ultimate answer will probably be through ancestry deconvolution methods. Basically, looking at the distribution of lengths of distinct ancestral elements, and seeing which model the empirical patterns fit. If I’m correct, then the distribution of lengths for hunter-gatherer ancestry in Northern Europe will be narrower than if you had a scenario of continuous regional expansion. It’s certain that someone is working on this.
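To make the tract-length argument concrete, here is a minimal Python sketch. It rests on an illustrative simplification that is not taken from any paper discussed here: ancestry tract lengths after a single pulse are treated as roughly exponential, with a rate equal to the number of generations since the pulse, and a drawn-out process is treated as a blend of pulses at different times. The function names and generation counts are hypothetical.

```python
import random
import statistics


def pulse_tracts(generations, n, rng):
    """Tract lengths (in Morgans) after one admixture pulse.

    Assumption: lengths are exponential with rate = generations since the pulse.
    """
    return [rng.expovariate(generations) for _ in range(n)]


def continuous_tracts(g_min, g_max, n, rng):
    """Tract lengths when admixture times are spread evenly over g_min..g_max."""
    return [rng.expovariate(rng.uniform(g_min, g_max)) for _ in range(n)]


def coefficient_of_variation(lengths):
    """Scale-free width of a distribution: standard deviation over mean."""
    return statistics.pstdev(lengths) / statistics.fmean(lengths)


rng = random.Random(42)
# Hypothetical scenarios: one pulse 200 generations ago versus
# admixture spread between 50 and 350 generations ago.
cv_pulse = coefficient_of_variation(pulse_tracts(200, 50_000, rng))
cv_spread = coefficient_of_variation(continuous_tracts(50, 350, 50_000, rng))

print(f"pulse CV:      {cv_pulse:.2f}")
print(f"continuous CV: {cv_spread:.2f}")
```

Under these assumptions the single-pulse distribution is the narrower of the two relative to its mean, which is the kind of contrast a real deconvolution analysis would try to detect against much noisier data.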

Genetically the two scenarios don’t make that much of a difference, because European hunter-gatherers were probably a very homogeneous bunch (though this might be generally true for Eurasian hominins, as Neandertals and the Denisovan sample also exhibit low genetic diversity in comparison to modern populations). But anthropologically it is critical, because it fleshes out the processes of potential cultural change and turnover in the transition between societies and modes of production.

• Category: Science • Tags: European Genetics, Genetics 
And the men of Europe shall be as he was!

Were Scandinavians the original people of Europe? Such a headline is very suggestive of a press release gone wrong. But no, you just need to see what Eske Willerslev actually said to see the source of the headline. It was his lab which published the recent paper in Science, Genomic structure in Europeans dating back at least 36,200 years. Willerslev states:

”Genetically, he is European and is more closely related to current Europeans than any other people in the world. And that means that some of the earliest people in Europe were actually our forefathers,” Willerslev told the science website ”He is actually more closely related to Danes, Swedes, Finns and Russians than he is to the French, Spanish and Germans, so one could argue we are more originally European.”

I don’t know if the initial exchange was in English, or translated from Danish. And, journalists have been known to make “mistakes” in their quotations from scientists. But, taken at face value I’d have to say that this quote, and many of the inferences being made from this paper, strike me as “not even wrong.” Using the whole genome of this ancient man the authors generated fascinating results, but I am not quite so sure about the confidence presented in their interpretations. Some sentences jump out at me as anachronistic and incongruous. Consider:

Altogether, these results suggest that contemporary Siberian populations from the Yenisei basin derive part of their gene pool from a Eurasian HG population that shares ancestry with K14, but is more closely related to Scandinavian MHGs than to either MA1 or western European MHGs, indicating gene flow between their ancestors and Scandinavian Europe after K14 but prior to the Mesolithic.

Europe during the last glaciation

It illustrates both the startling results which demand explanation, and the head-scratching assertions which are strewn about like ticking time bombs, due to the incongruity of their implications. Kostenki 14, K14, lived ~37,000 years ago. For most of the period between then and now Scandinavia, and much of Northern Europe, was glaciated and uninhabited. Much of the Yenisei basin was also subject to glaciation. Therefore you can eliminate the possibility of gene flow from the geographic region of Scandinavian Europe for much of this period, because no humans lived in Scandinavian Europe, and likely none in much of the Yenisei basin either. This is not a trivial point because the authors make assertions about the nature of migration, or lack thereof, in prehistory, so they need to be very clear and precise on these issues in regards to geographic provenance. The results and statistical genetic methods are complex enough as it is. Quotes like the one above make it very unclear whether the term Scandinavian is a semantic shorthand (likely, since glaciation is evident in a figure in their paper) or a literal description, as the lay public and even non-close scientific readers are likely to infer. The major topline finding that is hard to dispute is that nearly 40,000 years ago on the plains of modern Ukraine an individual lived whose genetic makeup exhibited strong affinities to modern Europeans, in particular, Northern Europeans. This is not a trivial result because it adds more evidence to the model that West Eurasians and East Eurasians diverged before 40,000 years ago (earlier statistical genetic models utilizing computational techniques arrived at dates closer to ~20,000 years ago). Recently the genome of a man who lived ~45,000 years ago further east in Siberia was analyzed, and found to exhibit genetic affinities with both East and West Eurasians.
This implies that the differentiation of West and East Eurasians occurred in the interval between ~35 to 50 thousand years ago, aligning well with certain archaeological and paleontological findings. In addition, the tract length of Neandertal ancestry was longer in this individual than in moderns, just as it was in the Siberian genome, as one would expect. The admixture date was inferred to be on the order of ~50,000 years ago, again, in good alignment with expectations. The issues that I have rather are about the nature of the emergence of anatomically modern humans across Eurasia inferred from these results (and further). First, I can’t speak to the archaeology and ancient DNA analysis. I assume they had paleontologists look at the dating and what not, and it checked out. The dates and descriptions look plausible from what I know, but then I don’t know that much. Additionally, the ancient DNA looks good. Willerslev’s team is top notch, probably the only group within spitting distance of Svante Paabo’s in this area. With 2.4x coverage on the whole genome that’s good enough for reasonable genotype calling with ANGSD, and to compare with the HGDP data set and what not (~500,000 markers merged). This doesn’t mean that the archaeology and ancient DNA quality have no issues, but they aren’t obvious.


Citation: Seguin-Orlando, Andaine, et al. “Genomic structure in Europeans dating back at least 36,200 years.” Science (2014): aaa0114.

To the right you see an edited version of a figure from the paper. The barplot is something you’ll recognize as an admixture analysis. You can see that the European hunter-gatherer frequencies from 5,000 to 10,000 years before the present exhibit a very strong modal cluster affiliation, while the Neolithic farmers (to the right) are mixed. To the left you see the two ancient samples of K14 and MA1 (Mal’ta). They are hard to tease apart, though you see that K14 has a lot of the components altogether, like many modern Europeans. The map with the heated circles shows genetic affinities from the f3 test, which basically takes a phylogenetic tree, with an outgroup, Mbuti pygmies, and the K14 individual, and another population, X. The outcome of the statistic can indicate the affinity and closeness between K14 and X (basically, the two populations posited to be a clade). The authors use several of these sorts of statistics, but the basic idea is the same, where allele frequencies across topologies should reflect shared (or lack thereof) drift history conditional on the topology being correct or not. In some cases the topology is not totally correct because of gene flow across the clades, so you infer admixture. It does seem rather evident that the K14 individual shares a lot of drift with Northern Europeans. Or more specifically Northern Europeans descend in large part from a population which also contributed to K14. The admixture plot confirms this.
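As a rough illustration of the outgroup form of the f3 statistic described above, here is a small Python sketch. It uses the plain textbook definition (the average over SNPs of the product of allele-frequency differences from the outgroup) and skips the finite-sample corrections and block-jackknife standard errors a real analysis would need. All allele frequencies below are invented toy values, not data from the paper.

```python
def f3(freq_c, freq_a, freq_b):
    """Naive f3(C; A, B): mean over SNPs of (c - a) * (c - b).

    With C as an outgroup, a larger value means A and B share more drift
    after splitting from C.
    """
    products = [(c - a) * (c - b) for c, a, b in zip(freq_c, freq_a, freq_b)]
    return sum(products) / len(products)


# Toy allele frequencies at four SNPs (hypothetical values).
outgroup = [0.50, 0.50, 0.50, 0.50]   # stands in for Mbuti
ancient = [0.70, 0.30, 0.60, 0.40]    # stands in for K14
close_pop = [0.75, 0.25, 0.60, 0.45]  # drifted in the same direction
far_pop = [0.50, 0.55, 0.45, 0.50]    # little drift shared with the ancient sample

shared_close = f3(outgroup, ancient, close_pop)
shared_far = f3(outgroup, ancient, far_pop)

print(f"f3(outgroup; ancient, close_pop) = {shared_close:.5f}")
print(f"f3(outgroup; ancient, far_pop)   = {shared_far:.5f}")
```

Ranking many populations X by this value is, in spirit, what a heat map of affinities summarizes: the higher the score, the more drift X shares with the ancient sample.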



All good so far. My main results qualm though can be found in the supplemental (which is free to anyone without access to the paper itself). In Science Willerslev tells Ann Gibbons that “What is surprising is this guy represents one of the earliest Europeans, but at the same time he basically contains all the genetic components that you find in contemporary Europeans—at 37,000 years ago.” The admixture analysis sort of confirms this, though admixture without a lot of scaffold in terms of what we know can be kind of confusing and like reading tea leaves. That’s why PCA is often useful since it summarizes the variation in a more straightforward manner, plotting out the largest independent components of variance within the data. Take a look at where K14 shakes out. I’ve placed a red pointer toward the K14 sample. In the first plot with all non-Africans K14 is smack in the middle of the Central Asian cluster, clearly shifted toward East Asians. All the Europeans aside from Uralic and Turkic populations from Russia are a tight cluster, off to the right of the plot. In the second plot it is constrained more straightforwardly to a sample of Europeans and Middle Eastern populations. Previously PC1 had separated East Asians and Europeans, while PC2 had separated Oceanians and Amerindians. Now PC1 seems to separate ancient European hunter-gatherers from Middle Easterners, and PC2 Early European Farmers (EEF) from Mal’ta and the Turkic groups with residual East Asian ancestry. Since K14 has the “Ancestral North Eurasian” affinity, it is shifted down toward Mal’ta, but since it is mixed it is not particularly close to the European populations. I point this out to suggest that summaries of the form that “someone just like modern Europeans existed 37,000 years ago” are probably not helpful or illuminating, but that’s what we’re seeing in the press, and not just from the press itself. K14 was an ancient human being.
Attempting to understand him as a combination of modern genetic variation is going to have shortcomings. This is obvious. Using “ancestral allele frequencies” derived from model-based admixture analysis to figure out his affinities is useful, but has limits, because those allele frequencies are informative of modern populations. Similarly, PCA is projecting him upon modern population genetic variation. Useful, but again, one must be careful. What’s the big takeaway? Let me quote the last paragraph of the Science paper:

Our results further suggest that the early stages of the western Eurasian lineage were already complex (see also Fig. 2). Besides its core affinities with subsequent European groups, K14 also shares alleles with European Neolithic farmers and contemporary people from the Middle East/Caucasus, which are not found in MA1 and western European MHGs, indicating genetic exchange between K14 and a Basal Eurasian Lineage (which eventually contributed to Neolithic groups) after the ancestors of MA1 and subsequent European MHGs had diverged. This implies that early AMH populations became structured early in their history, but already in the UP contained the major genetic components found in Europeans today. As such our findings show the existence of a meta-population structure in Europe from the Upper Paleolithic onwards, remnants of which are still found today, despite migrations to and from Europe since the UP. The early UP contribution is greater among northern than southern Europeans, in agreement with the southeast to west and north gene flow cline resulting from the expansion of Neolithic farmers 9-6 ka cal BP (20, 45). However, descendants of the early UP population represented by K14 likely also contributed genes to western Siberian groups living around the mouth of the Yenisei River. Therefore, our findings support the view that these Uralic-speaking populations represent an ancient admixture between European and East Asian lineages. The recently proposed Holocene gene flow from East Asians into northern Europeans (21) can, in our view, be equally well explained by population structure of the hunter-gatherer meta-population within Europe. As such our results paint an increasingly complex picture of colonization history of Europe from the UP to today.
Instead of inferring a few discrete migration events from Asia into Europe, we now see evidence that humans in Western Eurasia formed a large meta-population with gene flow in multiple directions occurring repeatedly and perhaps continuously.



The authors also present a modified schematic of Lazaridis et al. There are two aspects of the conclusion: 1) Many of the assertions are totally uncontroversial (e.g., “it’s complicated”). 2) Many of them seem to be challenging the model posited by publications coming out of David Reich’s lab, where they partner with Svante Paabo’s group for the ancient DNA work. Let me quote Willerslev and Reich from Gibbons’ piece in Science:

“There was a really large meta-population that probably stretched all the way from the Middle East into Europe and into Eurasia,” Willerslev says. These people interbred at the edges of their separate populations, keeping the entire complex network interconnected—and so giving the ancient Kostenki man genes from three different groups. “In principle, you just have sex with your neighbor and they have it with their next neighbor—you don’t need to have these armies of people moving around to spread the genes.” [Willerslev] … Other researchers say that this new genome is important because “it is the first paper to document some degree of continuity among the first people to get to Europe and the people living there today,” says population geneticist David Reich of Harvard University, one of the authors on the triple migration model. It also is “a striking finding that the Kostenki 14 genome already has the three major European components present that we detect in modern Europeans,” says Johannes Krause of the University of Tübingen in Germany. But even if the man from Kostenki in Russia had all these elements 36,000 years ago, that doesn’t mean that other Europeans did, Reich says. His team’s DNA data and models suggest that Europeans in the west and north did not pick up DNA from the steppes until much later…. [Reich]

I’ve read the Seguin-Orlando et al. paper (and supplements) several times to try and be fair. What I don’t understand is why they can’t acknowledge the possibility that K14 did not leave modern descendants, and was part of an early population which did not end up flourishing. That is consistent with all their results after all, and consistent with ancient DNA which seems to show a lot less admixture in Mesolithic groups than K14. The fact that Willerslev talks about meta-populations makes it even more confusing, since one of the aspects of meta-population dynamics is the likelihood of repeated population extinctions and re-colonizations. This seems entirely plausible across the European and Northern Eurasian plain during the last Ice Age, as humans retrenched and expanded multiple times. It’s been a while so I checked Wikipedia, and here’s a representative sentence reflecting what I recall learning: “Kritzer & Sale have argued against strict application of the metapopulation definitional criteria that extinction risks to local populations must be non-negligible.” So there’s an argument about whether extinctions are necessary or not. But it shows how critical they’ve been historically to modeling metapopulations (I think it makes the math easier). As for the idea that gene flows occurred through diffusion, obviously there’s a lot of that. But the punctuated turnover of mtDNA lineages in ancient DNA transects and the ability to infer admixture events from pulse fusion events seem to suggest a great deal of ancient demography could not be modeled in such a fashion. This paper focuses on Northern Eurasia, but in South Asia, Southeast Asia, and Africa, we see clear instances where genetically distinct populations fused rapidly due to demographic expansion enabled by cultural change. Perhaps Northern Eurasia is different, but cases in other parts of the world should scaffold our expectations. Overall I’m left somewhat more confused and interested.
Addendum: One thing I want to emphasize. Willerslev seems to be implying that all the variation of West/North Eurasians in its basic algebraic constituents was present about ~10,000 years after the differentiation of non-Africans. Is it really plausible that the last 35,000 years, a time frame 3.5 times as long, is a coda to that? Perhaps. But color me a little skeptical. I suspect that we’ll get more clarity when we stop thinking of prehistoric populations simply as repositories of the history of extant populations. To see what I’m getting at, there are ~10,000 years separating K14 from East Asian populations alive at that time. There are ~35,000 years separating modern West/North Eurasians and East Asians from their putative ancestral populations. Even if you double the value for K14 vs. Paleolithic East Asians because there are two paths of drift, that’s still 20,000 years. The peculiarity of these ancient remains is always clear when you visualize them on TreeMix; they’re often very long branches awkwardly slotted into contemporary trees.

• Category: Science • Tags: European Genetics 

Don’t forget the deep structure in Italy!
Credit: Rita Molnar

Standard apologies that I have not had the marginal time to blog much, but I thought it was important that I at least note that Dr. Peter Ralph and Dr. Graham Coop’s paper on identity-by-descent segments and European populations and history is out in its final form in PLoS Biology, The Geography of Recent Genetic Ancestry across Europe. I’ve been familiar with the outlines of these results for about a year now, and to be frank I am still digesting them. The media hype will come and go, with true but to some extent trivial headlines that “all Europeans are related,” but the consequences of these sorts of genetic inquiries into the relatedness of populations are going to be long lasting. At least they should be.

But before I go on about that, if you find the paper itself a bit daunting (though the main body of the text strikes me as eminently readable for a piece of statistical genetics), see Carl Zimmer’s condensation. With this sort of result there is liable to be confusion, so note that Graham Coop has been posting comments on Carl’s blog (and elsewhere, and you can always send him a note on Twitter). Additionally he has a very readable FAQ out. Dr. Coop told me on Twitter that there would even be updates tomorrow as well! In particular one aspect of the paper which I noticed is that most relatively short but detectable segments (~10 cM) shared between any two individuals in many nationalities are not going to be evidence of recent genealogical affinities, but of deeper historical processes.

As for my earlier allusion about this paper: every historian of the Roman Empire interested in demographic and social questions needs to read this sort of work. The reason is the specific result from Italy, which seems to exhibit a lot of deep local population structure. This is in contrast to other European nations, which are relatively homogenized, to the point of being international in the case of Slavic peoples. Despite decades of genetic work on Italians (thanks to L. L. Cavalli-Sforza) this is the first work which highlights this particularity in relation to other Europeans. That is because as Ralph & Coop note other measures of genetic differentiation (e.g., PCA utilizing thick density SNP-chips) tend to pick up deeper time historical and prehistorical events. In contrast Ralph & Coop are focusing upon segments of the genome inherited as a unit from a common ancestor, whose detectable integrity decays rapidly over the generations via recombination. Though this technique of focusing on inherited segments is powerful, it also has a shallow time depth.
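The shallow time depth follows from simple arithmetic, sketched below in Python. The sketch leans on a standard approximation rather than on the authors’ actual inference machinery: two people who share an ancestor g generations back are separated by 2g meioses, so a segment they co-inherit from that ancestor is taken to have an exponentially distributed length with mean 100/(2g) cM. The function names are hypothetical.

```python
import math


def expected_ibd_length_cm(generations):
    """Mean length (cM) of a segment shared via an ancestor g generations back.

    Assumption: 2 * g meioses separate the two carriers, and breakpoints
    fall as a Poisson process at a rate of one per Morgan per meiosis.
    """
    return 100.0 / (2 * generations)


def prob_longer_than(length_cm, generations):
    """Chance that such a segment, if inherited at all, exceeds length_cm."""
    return math.exp(-length_cm * 2 * generations / 100.0)


for g in (5, 10, 50, 100):
    mean_cm = expected_ibd_length_cm(g)
    p_long = prob_longer_than(10.0, g)
    print(f"g={g:>3}: mean {mean_cm:5.2f} cM, P(length > 10 cM) = {p_long:.2e}")
```

Any single segment from a distant ancestor is very unlikely to be long. The sketch says nothing about how many shared ancestors a pair has at each depth, and that count grows quickly going back in time, which is part of why the paper needs a fuller statistical treatment.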

I shall quote the authors from their discussion on Italy:

In addition to the very few genetic common ancestors that Italians share both with each other and with other Europeans, we have seen significant modern substructure within Italy (i.e., Figure 2) that predates most of this common ancestry, and estimate that most of the common ancestry shared between Italy and other populations is older than about 2,300 years (Figure S16). Also recall that most populations show no substructure with regards to the number of blocks shared with Italians, implying that the common ancestors other populations share with Italy predate divisions within these other populations. This suggests significant old substructure and large population sizes within Italy, strong enough that different groups within Italy share as little recent common ancestry as other distinct, modern-day countries, substructure that was not homogenized during the migration period. These patterns could also reflect in part geographic isolation within Italy as well as a long history of settlement of Italy from diverse sources.

The latter idea is the classic one that the native Italian people were replaced by migrants during the Roman period, especially from the Eastern Mediterranean. Epigraphic and textual evidence as to the proliferation of Greek names in places such as Rome are proffered to support this case. I am skeptical of these data because slaves and the urban proletariat often had low fertility in antiquity, and cities may have been population sinks anyhow. Rather, I suspect that the primary eastern influence on the genetics of modern Italians comes from the era of Greek colonization of Magna Graecia, because despite the urban focus of their civilization the Hellenes did engage in agriculture.

Rather, I lean toward the proposition that Italy was sui generis in continental Europe after the fall of Rome in that despite its regress it maintained local regional identities due to high population densities. The widespread coalescence of genealogies across vast swaths of the other post-Roman domains (Iberia, France, and Britain) may reflect the shattering of the societies and demographic collapse and localized disturbances. The true test of this hypothesis is when these methods expand out to other regions of the world, especially the southern Mediterranean. Egypt and the Levant should exhibit a more Italian pattern, because there was no deep rupture with antiquity in these areas.

There is much more to say about this paper. But I feel that this result from Italy is the sore thumb that sticks out and warrants our attention. Ralph & Coop suggest that collaboration with anthropologists and historians is needed. True indeed.

Citation: Ralph P, Coop G (2013) The Geography of Recent Genetic Ancestry across Europe. PLoS Biol 11(5): e1001555. doi:10.1371/journal.pbio.1001555

(Republished from Discover/GNXP by permission of author or representative)

The Pith: You’re Asian. Yes, you!

The conclusion of an important paper by Nick Patterson, Priya Moorjani, Yontao Luo, Swapan Mallick, Nadin Rohland, Yiping Zhan, Teri Genschoreck, Teresa Webster, and David Reich:

In particular, we have presented evidence suggesting that the genetic history of Europe from around 5000 B.C. includes:

1. The arrival of Neolithic farmers probably from the Middle East.

2. Nearly complete replacement of the indigenous Mesolithic southern European populations by Neolithic migrants, and admixture between the Neolithic farmers and the indigenous Europeans in the north.

3. Substantial population movement into Spain occurring around the same time as the archaeologically attested Bell-Beaker phenomenon (HARRISON, 1980).

4. Subsequent mating between peoples of neighboring regions, resulting in isolation-by-distance (LAO et al., 2008; NOVEMBRE et al., 2008). This tended to smooth out population structure that existed 4,000 years ago.

Further, the populations of Sardinia and the Basque country today have been substantially less influenced by these events.


It’s in Genetics, Ancient Admixture in Human History. Reading through it I can see why it wasn’t published in Nature or Science: methods are of the essence. The authors review five population genetic statistics of phylogenetic and evolutionary genetic import, before moving onto the novel results. These statistics, which measure the possibility of admixture, the extent of admixture, and the date of admixture, are often presented, but nested into supplements, in previous papers by the same group. On the one hand this removes from view the engines which are driving the science. On the other hand I have always appreciated that a benefit of this injustice to the methods which make insight possible is that those without academic access can actually bite into the meat of the researcher’s mode of thought.

I did read through the methods. Twice. I’ve encountered all the statistics before, and I’ve read how they were generated, but I’ll be honest and admit that I haven’t internalized them. That has to end now, because the authors have finally released a software package which implements the statistics, ADMIXTOOLS. I plan to use it in the near future, and it is generally best if you understand the underlying mechanisms of a software package if you are at the bleeding edge of analytics. I will review the technical points in more detail in future posts, more for my own edification than yours. But for the moment I’ll be a bit more cursory. Four of the tests use comparisons of allele frequencies along explicit phylogenetic trees. That’s so general as to be uninformative as a description, but I think it’s accurate to the best of my knowledge. In the basics the tests are seeing if a model fits the data (as opposed to TreeMix, which finds the best model out of a range to fit the data). The last method, rolloff, infers the timing of an admixture event based upon the decay of linkage disequilibrium. In short, admixture between two very distinct populations has the concrete result of producing striking genomic correlations. Over time these correlations dissipate due to recombination. The magnitude of dissipation can allow one to gauge the time in the past when the original admixture occurred.
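The dating logic can be sketched in a few lines of Python. This is not the ADMIXTOOLS implementation, only an illustration of the principle under a simple assumption: the admixture-induced correlation between markers a genetic distance d (in Morgans) apart decays as exp(-n * d), where n is the number of generations since a single admixture event. The function name, the synthetic data, and the 29-year generation time are all assumptions made for the example.

```python
import math


def fit_admixture_generations(distances_morgans, correlations):
    """Estimate generations since admixture from an LD decay curve.

    Fits log(correlation) = intercept - n * distance by ordinary least
    squares and returns n. Assumes all correlations are positive.
    """
    logs = [math.log(c) for c in correlations]
    count = len(distances_morgans)
    mean_x = sum(distances_morgans) / count
    mean_y = sum(logs) / count
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(distances_morgans, logs))
    sxx = sum((x - mean_x) ** 2 for x in distances_morgans)
    return -sxy / sxx


# Synthetic, noise-free decay curve for a hypothetical event 30 generations ago.
true_generations = 30
distances = [0.005 * k for k in range(1, 21)]   # 0.5 cM to 10 cM
curve = [0.2 * math.exp(-true_generations * d) for d in distances]

estimate = fit_admixture_generations(distances, curve)
years_per_generation = 29                        # assumed conversion factor
print(f"estimated generations: {estimate:.1f}")
print(f"approximate years ago: {estimate * years_per_generation:.0f}")
```

Real decay curves are noisy and can reflect several admixture pulses at once, so a single exponential fit like this one is at best a summary of the average date.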

Let’s look at some results. To the left is a section of a table which illustrates the most significant 3-population test scores in the HGDP. The authors checked all the various combinations, and these came out at the top as likely admixtures (i.e., the two sources produce particular patterns in the target). Please remember that these triads should not be taken literally. The Uygur are not descended from Japanese and Italians. Rather, they are descended from populations with genetic affinities to these two sources. Precisely, the Uygurs are descended from Northeast Asian Turks, who assimilated an Indo-European speaking substratum. Most of the results are rather obvious and explicable. Several Middle Eastern populations are known to have Sub-Saharan African admixture, and this shows up in the results. Others may be more confusing because of the obscurity of the populations, but the Burusho clearly have ancient East Asian ancestry on clustering algorithms, so their presence is not surprising to me. Similarly, the Russians in the HGDP data set have an ‘eastern’ affinity (or at least some do), either due to Finno-Ugric or Turkic ancestry (Tatars regularly assimilated into a Russian ethnic identity as the Tsars expanded their domains).
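For intuition about why the 3-population test flags admixture, here is a toy Python sketch. It uses the naive form of the statistic, f3(C; A, B) as the mean over SNPs of (c - a)(c - b), without the sample-size corrections and significance testing a real analysis needs. The allele frequencies and the 50/50 mixing proportion are invented for illustration.

```python
def f3(freq_target, freq_a, freq_b):
    """Naive f3(target; A, B): mean over SNPs of (t - a) * (t - b)."""
    products = [(t - a) * (t - b)
                for t, a, b in zip(freq_target, freq_a, freq_b)]
    return sum(products) / len(products)


def mix(freq_a, freq_b, proportion_a):
    """Allele frequencies of a population drawing proportion_a of ancestry from A."""
    return [proportion_a * a + (1 - proportion_a) * b
            for a, b in zip(freq_a, freq_b)]


# Two strongly differentiated hypothetical sources and their 50/50 mixture.
source_a = [0.9, 0.1, 0.8, 0.2]
source_b = [0.1, 0.9, 0.2, 0.8]
admixed = mix(source_a, source_b, 0.5)

f3_admixed_target = f3(admixed, source_a, source_b)
f3_source_target = f3(source_a, source_b, admixed)

print(f"f3(admixed; A, B) = {f3_admixed_target:.3f}")
print(f"f3(A; B, admixed) = {f3_source_target:.3f}")
```

The mixed population’s frequencies sit between those of the two sources at every SNP, so each product is negative. A clearly negative f3 is the signature of admixture, while an unadmixed population placed in the target slot gives a positive value. The sources only need to be related to the true ancestral populations, which is why the triads in the table should not be read literally.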

Some of the other results are more confusing, but one can still find a historical explanation. I have seen evidence that some of the Cambodian samples may have old Indian admixture, though it is not entirely clear to me. But that could explain why there is a signature of West Eurasian admixture into this population (though one wonders why the donor was not Baloch or Pathan). The Xibo and Tu are Northeast Asian groups, on the border between China proper and the great Eurasian interior. West Eurasian admixture into these groups is not unexpected. West Eurasians are historically attested among the mercenaries and soldiers who arrived on the North China plain after the collapse of the Han dynasty, down to the Alans who served under Kublai Khan. Some Mongolian and Turkic peoples have individuals who are attested as having characteristics more typical of Europeans (e.g., red hair), so it is likely that this admixture was relatively old and widespread, well before the era of the Pax Mongolica.

There is a minor dissonant note in these results above. The authors used rolloff and inferred an admixture date of ~800 years before the present. This is far lower than earlier estimates, which were >2,000 years before the present. First, I have to say that I was mildly skeptical of the higher value reported earlier. From what little I know the roiling of Turco-Mongol peoples which reordered the Inner Asian landscape did not really establish itself beyond the Chinese fringe at this time. Recall that Central Asia was the domain of the Iranians from prehistory down to the Islamic age (the full transition of Central Asia from Persianate to Turkic has not completed itself to this date, though it has progressed over the centuries since 1000 A.D.). Is it credible that the Turkic hordes were shut on the other side of the Pamirs for ~1,000 years? Perhaps. But it should warrant skepticism, and openness to the lower values proffered here. The technical reason that the authors consider is that STRUCTURE based inferences may overestimate admixture when reference populations are not appropriate. And yet the authors still concede that 800 years is simply difficult to credit when one consults the historical literature. Strangely though it does align with the date of the Mongol ascendancy, during which time the Uygurs served as civil servants in the barbarian empire (Mongol script derives from the old Uygur script). I managed to dig up a cave painting of Uygurs from this period. There is surely artistic license, but they look rather East Asian to me, as opposed to the hybrid Eurasian appearance modal among modern Uygurs. I won’t touch upon the rather fraught and complex ethnology and ethnogenesis of modern Uygurs, and their relationship to Russian and Chinese ethnographers, but suffice it to say that one needs to be careful about excessive reliance on the literality of historical documents in this area, because of semantic confusions.

So let’s move to the main course: what’s going on in Europe? Before putting the spotlight on the macro picture, let’s highlight one secondary aspect: the authors detect evidence of massive gene flow into Spain from Northern Europe ~4,000 years before the present. I’ll let them speak here:

We hypothesize that we are seeing here a genetic signal of the ‘Bell-Beaker culture’ (HARRISON, 1980). Initial cultural flow of the Bell-Beakers appears to have been from South to North, but the full story may be complex. Indeed one hypothesis is that after an initial expansion from Iberia there was a reverse flow back to Iberia (CZEBRESZUK, 2003); this ‘reflux’ model is broadly concordant with our genetic results, and if this is the correct explanation it suggests that this reverse flow may have been accompanied by substantial population movement.

Two things to hammer home here. First, pots move with people. That’s the inference being drawn from the results. It’s not pots-not-people, it’s people-and-pots. Second, the idea of reversals in the direction of gene flow is intriguing, and, I think, needs to be taken more seriously. It seems the most plausible candidates here are the people who later became the Celtiberians. Celts have been associated with the Bell Beakers before.

But the bigger shock is that Europeans, and especially Northern Europeans, seem to have a substantial Northeast Asian component. From the nature of the prose I feel that the authors were definitely taken aback. They basically say so in so many words. In the process of resolving their confusion they skinned the cat every which way. And it does look to me that Northern Europeans are truly descended in part from a population which has affinities to the “First Americans.” I say this specifically because the Siberian samples they tested actually gave a weaker result than the South American Amerindians on the 3-population test.
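For readers unfamiliar with the 3-population test mentioned above, here is a minimal sketch of the idea. It assumes the naive form of the statistic, the average over SNPs of (c - a)(c - b) for a target population C and two candidate sources A and B, and it leaves out the finite-sample correction and significance testing that real software applies. The function name and the allele frequencies are invented for illustration and are not from the paper.

```python
def f3(target, source_a, source_b):
    """Naive f3(target; A, B): mean over SNPs of (c - a) * (c - b)."""
    if not (len(target) == len(source_a) == len(source_b)):
        raise ValueError("frequency lists must have the same length")
    products = [(c - a) * (c - b)
                for c, a, b in zip(target, source_a, source_b)]
    return sum(products) / len(products)


# Hypothetical allele frequencies at five SNPs (illustrative only).
pop_a = [0.10, 0.80, 0.30, 0.90, 0.20]
pop_b = [0.70, 0.20, 0.90, 0.10, 0.60]

# A target built as a 50/50 mix of A and B sits between them at every SNP.
mixed = [0.5 * a + 0.5 * b for a, b in zip(pop_a, pop_b)]

f3_mixed = f3(mixed, pop_a, pop_b)    # negative: consistent with admixture
f3_unmixed = f3(pop_a, mixed, pop_b)  # positive: no admixture signal here

print(f"f3(mixed; A, B) = {f3_mixed:.4f}")
print(f"f3(A; mixed, B) = {f3_unmixed:.4f}")
```

A clearly negative value is the signal of interest; a positive value does not by itself rule out admixture, since drift in the target population after mixing can mask it.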

So what’s the proportion of ancestry? Using the Siberian population they came up with an interval of 5-18 percent in Northern Europeans. The authors used the Sardinians as their “pure” European reference, and admit that it is likely that their admixture estimate is lower than the real value due to this fact. Inference is inference; do you trust this result? As it happens the authors also checked Ötzi the Iceman, and found that like the modern Sardinians he had very little Northeast Asian ancestry. Ötzi is dated to ~5,000 years before the present. Using rolloff the authors estimate an admixture date of ~4,000 years before the present, with an error of nearly 1,000. Additionally, using a different data set they came up with an admixture date of ~2,000 years before the present. The latter is obviously wrong (they explain why this could happen in the text). But Ötzi seems to put a boundary on how early it could have been, at least in Southern Europe.
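To give a feel for where a date like this comes from, here is a rough sketch of the logic behind an LD-decay estimate. It assumes that admixture-induced linkage disequilibrium decays as exp(-n * d), with d the genetic distance in Morgans and n the number of generations since admixture, and it fits n by least squares on the log scale. The 29-year generation time, the function name, and the noise-free synthetic curve are all assumptions for illustration; this is not the authors’ pipeline.

```python
import math


def fit_generations(distances_morgans, weighted_ld):
    """Fit n in LD(d) = A * exp(-n * d) by linear regression on log(LD)."""
    logs = [math.log(v) for v in weighted_ld]
    count = len(distances_morgans)
    mean_d = sum(distances_morgans) / count
    mean_l = sum(logs) / count
    cov = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    var = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -cov / var  # the slope of log(LD) against d is -n


YEARS_PER_GENERATION = 29  # assumed generation time, not taken from the post

# Synthetic, noise-free decay curve for a pulse 140 generations ago.
true_generations = 140
distances = [0.001 * k for k in range(1, 51)]  # 0.1 to 5 centimorgans
ld_curve = [0.05 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_generations(distances, ld_curve)
estimated_years = estimated_generations * YEARS_PER_GENERATION
print(f"{estimated_generations:.1f} generations, "
      f"about {estimated_years:.0f} years")
```

With real data the curve is noisy and the fit carries a standard error, which is where an uncertainty of the order of 1,000 years would come from.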

As of publication the authors did not have time to include a reference to this interesting nugget from the abstracts of ASHG 2012:

The complete genome of the 5,300 year old mummy of the Tyrolean Iceman, found in 1991 on a glacier near the border of Italy and Austria, has recently been published and yielded new insights into his origin and relationship to modern European populations. A key finding of this study has been an apparent recent common ancestry with individuals from Southern Europe, in particular Sardinians…We used unpublished data from whole genome sequencing of 452 Sardinian individuals, together with publicly available data from Complete Genomics and the 1000 Genomes project, to confirm that the Iceman is most closely related to contemporary Sardinians. An analysis of these data together with ancient DNA data from a recently published study on Neolithic farmers and hunter-gatherers from Sweden shows the Iceman most closely related to the farmer individual, but not the hunter-gatherers, with the Sardinians again being the contemporary Europeans with the highest affinity. Strikingly, an analysis including novel ancient DNA data from an early Iron Age individual from Bulgaria also shows the strongest affinity of this individual with modern-day Sardinians. Our results show that the Tyrolean Iceman was not a recent migrant from Sardinia, but rather that among contemporary Europeans, Sardinians represent the population most closely related to populations present in the Southern Alpine region around 5000 years ago. The genetic affinity of ancient DNA samples from distant parts of Europe with Sardinians also suggests that this genetic signature was much more widespread across Europe during the Bronze Age.

I’m betting that this Bulgarian sample won’t exhibit Northeast Asian ancestry, though who knows?

There is a definite geographic pattern within Europe to the strength of the signature of admixture. Northern European populations have the greatest, Southern European populations less, and islanders like Cypriots hardly any. Recall that Sardinians seem to be the best reference, so the ~0 floor may just be a statistical artifact of the measuring stick we have. All that being said, what went on <5,000 years before the present to reorder the European landscape?

The answer may sound crazy, but I think the most probable explanation (even if it is unlikely) is something to do with the Indo-Europeans. We know that Indo-European languages were spoken in Greece by ~1500 BC at the latest. One thing that is clear from less advanced clustering algorithms is that Basques and Finns are somewhat distinctive in relation to their neighbors. Though they are not genetically that different, they still lack some “interesting” elements. The results to the left are from Dienekes, though I’ve replicated them. You can see a similar difference between French and French Basques. The Basques seem to lack something which has affinities with West Asia. These results, and hints elsewhere, imply that the Basques may be descended not from hunter-gatherers, but from the first European farmers. So who came after them?

Though it strikes me as a bizarre conjecture, I can’t help but imagine the rapid expansion of Indo-European populations into Europe, pushing into the peninsulas of the south. These people may have been a newly formed cosmopolitan mix of West Asians, Northern European Mesolithics, and Northeast Asians. I am at a loss to hazard a guess as to who the First American-like Northeast Asians were, though perhaps they were a western offshoot of the Kets? These people were then absorbed into a melange of tribes who themselves emerged from a synthesis between immigrant West Asian farmers and Northern Europeans. In shorthand: perhaps the Indo-Europeans were mongrels! This is not an entirely crazy proposition if you look at the historical record. Conquest populations often synthesized and absorbed those whom they conquered. Sometimes they even became the conquered in deep cultural ways (e.g., the Bulgars).

To ward off accusations of glib and facile speculations, I well understand that much of what I suggest above is likely wrong. But bizarre results are going to elicit unhinged hypotheses. And I shouldn’t overplay how strange these results are, I think they are going to stand the test of time. The authors are top notch, and Dr. Joseph Pickrell found the same pattern (a connection between Europeans and Native Americans) with TreeMix! If we sit back and reflect on phenotype it shouldn’t be entirely surprising. Some Scandinavians have always struck me as having a generalized Eurasian cast to their features. Obviously this tendency is stronger among the Sami and Finns, but you can see it in Swedes and others. This is far less evident to me among Southern European peoples. I doubt one would ever confuse a Sardinian for a Eurasian, and I never had that feeling when I spent some time in Italy a few years back (in contrast, some Finns did look Asiatic to me).

Finally, this paper highlights the reality that population genetics has little to do with Plato. A population within a species is simply not clear and distinct in a sense which would satisfy an Idealist. The authors of the above paper nod to this, illustrating how their tests for admixture are confounded and confused by constant gene flow via isolation-by-distance dynamics. These results indicate that Northern Europeans are on the order of 10% Northeast Asian. Does this mean that Northern Europeans are 10% non-white? Well, it turns out that white people were always 10% non-white! We just didn’t know. Is my daughter (who is 50% Northern European) now majority non-white? Oh wait, I’m South Asian. That means I’m ~50% white! Is my friend who is 25% Japanese now more than 25% Northeast Asian? Words and concepts fail us on the boundary of unfamiliarity, in time and space. Populations and genealogies don’t brook our categorizations. On a deep level we are all admixtures, and partitions of ancestry along phylogenetic trees are useful and comprehensible fictions. These techniques put flesh upon the bones of archaeology and smoke out the outlines of history. But we always need to be aware that that history is not made by humans; rather, we are excavating it, and then giving it appropriate glosses in our museums. And yet it is.

Related: Dienekes has much to say (obviously).

Image credit: Wikipedia, Wikipedia, and Wikipedia.

Cite: 10.1534/genetics.112.145037

(Republished from Discover/GNXP by permission of author or representative)

The Pith: Over the past 10,000 years a small coterie of farming populations expanded rapidly and replaced hunter-gatherer groups which were once dominant across the landscape. So, the vast majority of the ancestry of modern Europeans can be traced back to farming cultures of the eastern Mediterranean which swept over the west of Eurasia between 10 and 5 thousand years before the present.

Dienekes Pontikos points me to a new paper in PNAS which uses a coalescent model of 400+ mitochondrial DNA lineages to infer the pattern of expansions of populations over the past ~40,000 years. Remember that mtDNA is passed just through the maternal lineage. That means it is not subject to the confounding dynamic of recombination, allowing for easier modeling as a phylogenetic tree. Unlike the autosomal genome there’s no reticulation. Additionally, mtDNA tends to be highly mutable, and many regions have been presumed to be selectively neutral. So they are the perfect molecular clock. The straightforward drawback is that the history of one’s foremothers may not be a good representative of the history of one’s total lineage. Additionally the haploid nature of mtDNA means that genetic drift is far more powerful in buffeting gene frequencies and introducing stochastic fluctuations, which eventually obscure past mutational signals through myriad mutations. Finally, there are serious concerns as to the neutrality of mtDNA…though the authors claim to address that in the methods. I should also add that there is less controversy and more surety as to the calibration of mutational rates of mtDNA than of the Y chromosomal lineages of males. They’re good for determining temporal patterns of demographic change, and not just tree structures.
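To make the point about drift concrete, here is a small worked sketch. It uses the textbook result that, ignoring mutation, expected heterozygosity shrinks by a factor of (1 - 1/k) per generation, where k is the number of gene copies being sampled. It also assumes an idealized population of N breeding adults with an even sex ratio, so autosomes have 2N copies while mtDNA has only N/2 (one per female). The numbers are made up to show the contrast.

```python
def expected_heterozygosity(h0, gene_copies, generations):
    """Expected heterozygosity after drift alone: h0 * (1 - 1/k) ** t."""
    return h0 * (1.0 - 1.0 / gene_copies) ** generations


N = 1_000          # hypothetical number of breeding adults
GENERATIONS = 500
H0 = 0.5           # starting heterozygosity, arbitrary

autosomal = expected_heterozygosity(H0, 2 * N, GENERATIONS)  # diploid, both sexes
mtdna = expected_heterozygosity(H0, N / 2, GENERATIONS)      # haploid, females only

print(f"autosomal: {autosomal:.3f}")
print(f"mtDNA:     {mtdna:.3f}")
```

Under these assumptions the mtDNA pool holds a quarter as many gene copies, so it loses variation much faster, which is the sense in which drift is "far more powerful" for a haploid, uniparentally inherited marker.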

Here’s the abstract, Rapid, global demographic expansions after the origins of agriculture:

The invention of agriculture is widely assumed to have driven recent human population growth. However, direct genetic evidence for population growth after independent agricultural origins has been elusive. We estimated population sizes through time from a set of globally distributed whole mitochondrial genomes, after separating lineages associated with agricultural populations from those associated with hunter-gatherers. The coalescent-based analysis revealed strong evidence for distinct demographic expansions in Europe, southeastern Asia, and sub-Saharan Africa within the past 10,000 y. Estimates of the timing of population growth based on genetic data correspond neatly to dates for the initial origins of agriculture derived from archaeological evidence. Comparisons of rates of population growth through time reveal that the invention of agriculture facilitated a fivefold increase in population growth relative to more ancient expansions of hunter-gatherers.

As Dienekes notes until recently the orthodoxy was that the genetic variation of modern populations was well explained by the genetic variation of Paleolithic groups after the Last Glacial Maximum ~20,000 years B.P. In this line of thought agriculture spread often by cultural diffusion, and the first local adopters in a region would then enter into a phase of demographic expansion. Bryan Sykes’ Seven Daughters of Eve and Stephen Oppenheimer’s The Real Eve are expositions of this point of view, which really was the historical genetic mainstream. This also dovetailed with the anthropological bias of “pots-not-people,” whereby cultural forms moved through transmission and not migration. There were some dissenters, such as Peter Bellwood, but by and large the genetic evidence at least was robust enough that they could be dismissed.

So what happened? Several things. First, the sample sets of mtDNA and Y chromosomes kept getting larger. There was deeper sequencing of informative regions. Thick SNP-chip autosomal studies came to the fore, with different conclusions. Finally, ancient DNA extraction allowed scientists to compare the real lineages of hunter-gatherers in ancient Europe vs. what they had presumed were hunter-gatherer descendant lines in modern Europeans. The strong disjunction often found was indicative of a major failing in the prior assumptions of the theorists of the early 2000s: that they could infer confidently past events from the palimpsest of modern genetic variation. They couldn’t. We know that because they seem to have been wrong.

Let’s give India as an example of “what went wrong.” Here’s a paper from 2005, Most of the extant mtDNA boundaries in South and Southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans:

Since the initial peopling of South and West Asia by anatomically modern humans, when this region may well have provided the initial settlers who colonized much of the rest of Eurasia, the gene flow in and out of India of the maternally transmitted mtDNA has been surprisingly limited. Specifically, our analysis of the mtDNA haplogroups, which are shared between Indian and Iranian populations and exhibit coalescence ages corresponding to around the early Upper Paleolithic, indicates that they are present in India largely as Indian-specific sub-lineages. In contrast, other ancient Indian-specific variants of M and R are very rare outside the sub-continent.

The Upper Paleolithic is pre-Holocene. I generally accepted this, until the studies came out from the SNP-chips which had hundreds of thousands of autosomal markers. To be short about it Indians just seemed too close to West Eurasians if the mtDNA results were correct, and, representative. In fact, if Reconstructing Indian History is correct, about half the South Asian genome in aggregate is very close to that of West Eurasians, to the point where it seems likely to have a common ancestry in the Holocene. The mistaken inference from mtDNA may be due in part to sex-biased gene flow. That is, the South Asian exogenous genome was strongly biased toward male migration, while the deep time mtDNA substrate has tended to persist underneath all these successive layers.
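The arithmetic of sex-biased gene flow can be sketched in a few lines. This assumes a single generation of mixing in which a given share of fathers and a given share of mothers are migrants; each child gets half of its autosomes from each parent, its mtDNA from its mother, and (for sons) its Y chromosome from its father. The 90% and 10% figures are hypothetical, picked only to show how roughly half of autosomal ancestry can be exogenous while mtDNA stays mostly indigenous.

```python
def migrant_ancestry(migrant_share_fathers, migrant_share_mothers):
    """Expected migrant ancestry per marker type after one generation of mixing."""
    return {
        "autosomal": 0.5 * migrant_share_fathers + 0.5 * migrant_share_mothers,
        "mtDNA": migrant_share_mothers,   # inherited only through mothers
        "Y": migrant_share_fathers,       # inherited only through fathers
    }


# Hypothetical male-biased migration: 90% of fathers, 10% of mothers.
result = migrant_ancestry(0.9, 0.1)
for marker, share in result.items():
    print(f"{marker:9s} {share:.0%} migrant")
```

Real histories involve many generations and repeated pulses, so this is only the one-step intuition.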

Moving to the paper in question, they use a “Bayesian skyline” method to reconstruct past demographic history. Specifically, the history of the direct maternal lineage. We wouldn’t really pay attention if they didn’t have interesting results. And they do indeed.

The table is rather straightforward. They partitioned the samples they had into putative hunter-gatherer and Neolithic lineages. Notice the difference. For some of these cases we have very robust non-genetic evidence of expansion. This is true especially for the African and Southeast Asian Holocene cases. Their methods here predict exactly what we already know. So the key value add is that the methods are predicting something which is more in dispute: the demographic history of contemporary European mtDNA lineages. The concordance of the archaeological evidence of the Neolithic transition in Europe and the inferred demographic expansion of European Neolithic mtDNA lineages is striking.

The plot to the left is the curve of demographic expansion predicted from their method for Neolithic and Paleolithic lineages in Europe. The y-axis is log-scaled, so it naturally understates the explosive growth of Neolithic lineages. It comports well with what we know of how agricultural societies tend to expand and stabilize over time. During a phase of “land surplus” they enter into rapid demographic expansion, forcing the frontier of settlement out. Once the land is “filled up” we enter into the classic Malthusian “stationary state,” where the grinding misery of the peasantry becomes the lot of most. In contrast hunter-gatherer lineages didn’t experience such an explosive shift. Though pre-modern hunter-gatherer landscapes were more diversified than what we experience today, because they had access to the rich “bottom lands” and seashores now monopolized by agriculturalists, the carrying capacity of the land was generally lower for their lifestyle, and waxed and waned more gradually with shifts in ecology.
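The "expand, then stabilize" pattern described above is the shape of a logistic growth curve, and a quick sketch shows why a log-scaled axis understates the explosive phase. The starting size, carrying capacity, and growth rate below are hypothetical values chosen only to produce a visible S-curve; they are not estimates from the paper.

```python
import math


def logistic_size(t, n0, capacity, rate):
    """Population size at time t under logistic growth."""
    return capacity / (1.0 + (capacity - n0) / n0 * math.exp(-rate * t))


# Hypothetical parameters: 1,000 founders, capacity of 1,000,000, 1% per year.
N0, CAPACITY, RATE = 1_000, 1_000_000, 0.01

times = list(range(0, 2001, 250))
sizes = [logistic_size(t, N0, CAPACITY, RATE) for t in times]

for t, n in zip(times, sizes):
    print(f"year {t:4d}: N = {n:10.0f}   log10(N) = {math.log10(n):.2f}")
```

On the raw scale the population grows a thousandfold; on the log10 scale that is a rise of only three units, which is why the curve looks tamer than the demography it describes.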

The authors also did some neat geo-visualization, if I do say so (and I’m jealous!). The two panels illustrate the spread of agriculture as inferred from archaeology, and the rate of population growth calculated from the joint information of the time of onset of a farming lifestyle in a region and the point on the “growth curve” for the Middle Eastern lineages at that time. So above you see the spread of agriculture from the eastern Mediterranean from 8000 BC to 2500 BC. Then, you see a geographical illustration of the S-shaped growth curve of the farmers. Their initial colonies experienced modest growth, but there was a transition zone in the middle of rapid expansion. Why? Perhaps there was a necessary critical mass, before the superiority of numbers began to wear down the hunter-gatherers. But this itself was a transient, as the farmer societies ran up against the limits of ecology along the northern European plain (or, perhaps just as likely, they encountered dense hunter-gatherer societies which were able to temporarily withstand their aggressive expansion on the European maritime fringe). I suspect that the models are more complex than a one-two punch, in either time or space. There were likely several pulses and distinct streams coming out of the Middle East which populated Europe.

They conclude that “Mesolithic ancestry makes up only a fraction of contemporary European genomes. U5a, U5b1, V, and 3H combined account for ≈15% of western Europeans mtDNA haplogroups.” Note that U5a and U5b are modal among the Finnic peoples of Europe. V seems widely distributed, and modal in northern Scandinavia and the western Mediterranean. I can’t seem to find easy information on 3H.

From the supplements here are the European haplogroups they selected:

We chose haplogroups associated with an origin in Near Eastern populations during the Holocene: T1, T2, J1a, K2a, and H4a. These haplogroups (T1, T2, J1a, and K) all appear to have Near Eastern founders that migrated to Europe after the Younger Dryas (2). After inspecting the haplogroup K network in Behar et al. (4), we chose the subgroup K2a, which appears to be present in the Near East (including non-Ashkenazi Jews) and European populations (but not North Africa). Haplogroup H4a is thought to have expanded throughout Europe during the Neolithic (5). However, the location of its origin is still not certain (6). Removing H4a from the Skyline analysis did not substantively change the timing of Holocene period expansion (results not shown). European haplogroups U5, V, and 3H are associated with an indigenous origin in Europe (2). Haplogroups U5a, U5b1, V, and 3H have all been attributed a TMRCA during the Last Glacial Period (2, 7–9)

Readers more well versed in the literature on mtDNA haplogroups can pick these details apart.

Where does this leave us? If this and other recent papers are correct, then the expansion of farming to Europe from the Middle East resembles the settlement of the New World far more than we may have thought! In some regions there was likely near total replacement of the substrate, perhaps like the United States. In others there was modest uptake of the indigenous substrate, as is the case in Argentina. Finally, there were regions where the indigenous hunter-gatherer substrate may have persisted to a far greater extent. I think this may be the case mostly in Baltic Europe, which combined both the possibility of relatively high hunter-gatherer carrying capacities because of marine resources and a climatic regime rather unsuitable to the initial Middle Eastern crops.

Citation: Gignoux CR, Henn BM, & Mountain JL (2011). Rapid, global demographic expansions after the origins of agriculture. Proceedings of the National Academy of Sciences of the United States of America, 108 (15), 6044-9 PMID: 21444824

(Republished from Discover/GNXP by permission of author or representative)

I decided to take the Dodecad ADMIXTURE results at K = 10, and redo some of the bar plots, as well as some scatter plots relating the different ancestral components by population. Don’t try to pick out fine-grained details, see what jumps out in a gestalt fashion. I removed most of the non-European populations to focus on Western Europeans, with a few outgroups for reference.

Here’s a table of the correlations (I bolded the ones I thought were interesting):

            W Asian  NW African  S European  NE Asian  SW Asian  E Asian  N European  W African  E African  S Asian
W Asian     *        -0.01       -0.18       0.04      0.81      0.59     -0.64       0.39       0.2        0.04
NW African  *        *           0.19        -0.16     0.23      -0.09    -0.19       0.26       0.67       -0.11
S European  *        *           *           -0.38     -0.03     -0.27    -0.42       -0.11      -0.02      -0.36
NE Asian    *        *           *           *         -0.06     0.5      0.26        -0.04      -0.1       -0.07
SW Asian    *        *           *           *         *         0.21     -0.62       0.74       0.59       -0.13
E Asian     *        *           *           *         *         *        -0.27       0.08       0          0.14
N European  *        *           *           *         *         *        *           -0.34      -0.28      -0.31
W African   *        *           *           *         *         *        *           *          0.86       -0.04
E African   *        *           *           *         *         *        *           *          *          -0.07
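For anyone who wants to reproduce this kind of table from their own ADMIXTURE output, here is a minimal sketch of the calculation: a Pearson correlation between two ancestral components, taken across populations. The population values below are invented placeholders, not the Dodecad numbers, and the function is hand-rolled rather than taken from any particular package.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical component fractions for five populations (illustrative only).
n_european = [0.70, 0.55, 0.40, 0.25, 0.10]
w_asian = [0.05, 0.10, 0.20, 0.30, 0.45]

r = pearson(n_european, w_asian)
print(f"N European vs W Asian: r = {r:.2f}")
```

One caution when reading such a table: the components for each population sum to 100%, so some negative correlations are built into the data regardless of history.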

(Republished from Discover/GNXP by permission of author or representative)

Synthetic map

In the age of 500,000 SNP studies of genetic variation across dozens of populations obviously we’re a bit beyond lists of ABO blood frequencies. There’s no real way that a conventional human is going to be able to discern patterns of correlated allele frequency variations which point to between population genetic differences on this scale of marker density. So you rely on techniques which extract the general patterns out of the data, and present them to you in a human-comprehensible format. But, there’s an unfortunate tendency for humans to imbue the products of technique with a particular authority which they should not always have. The History and Geography of Human Genes is arguably the most important historical genetics work of the past generation. It has surely influenced many within the field of genetics, and because of its voluminous elegant visual displays of genetic data it is also a primary source for those outside of genetics to make sense of phylogenetic relations between human populations. And yet one aspect of this great work which never caught on was the utilization of “synthetic maps” to visualize components of genetic variation between populations. This may have been fortuitous: a few years ago a paper was published, Interpreting principal components analyses of spatial population genetic variation, which suggested that the gradients you see on the map above may be artifacts:

Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.’s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
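The "sinusoidal mathematical artifacts" can be seen in a toy example with no migration events at all. The sketch below assumes 50 demes on a line whose genetic covariance simply decays exponentially with distance (a stylized isolation-by-distance model), then takes the eigenvectors of that covariance matrix, which is what PCA does. The deme count and decay scale are arbitrary choices, and this is a cartoon of the idea rather than the paper’s analysis.

```python
import numpy as np


def sign_changes(vector):
    """Count how many times consecutive entries of a vector flip sign."""
    signs = np.sign(vector)
    return int(np.sum(signs[:-1] * signs[1:] < 0))


N_DEMES = 50        # hypothetical number of demes along a line
DECAY_SCALE = 10.0  # hypothetical distance over which covariance decays

positions = np.arange(N_DEMES)
distance = np.abs(positions[:, None] - positions[None, :])
covariance = np.exp(-distance / DECAY_SCALE)

# eigh returns eigenvalues in ascending order; reorder from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(covariance)
order = np.argsort(eigenvalues)[::-1]
top_components = eigenvectors[:, order[:3]]

changes = [sign_changes(top_components[:, k]) for k in range(3)]
print("sign changes along the line for PC1..PC3:", changes)
```

The leading components come out as smooth waves of increasing frequency across the line (a hump, then a gradient, then a full wave), even though nothing in the setup resembles a migration.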

A paper earlier this year took the earlier work further and used a series of simulations to show how the nature of the gradients varied. In light of recent preoccupations the results are of interest. Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture:

In a series of highly influential publications, Cavalli-Sforza and colleagues used principal component (PC) analysis to produce maps depicting how human genetic diversity varies across geographic space. Within Europe, the first axis of variation (PC1) was interpreted as evidence for the demic diffusion model of agriculture, in which farmers expanded from the Near East ∼10,000 years ago and replaced the resident hunter-gatherer populations with little or no interbreeding. These interpretations of the PC maps have been recently questioned as the original results can be reproduced under models of spatially covarying allele frequencies without any expansion. Here, we study PC maps for data simulated under models of range expansion and admixture. Our simulations include a spatially realistic model of Neolithic farmer expansion and assume various levels of interbreeding between farmer and resident hunter-gatherer populations. An important result is that under a broad range of conditions, the gradients in PC1 maps are oriented along a direction perpendicular to the axis of the expansion, rather than along the same axis as the expansion. We propose that this surprising pattern is an outcome of the “allele surfing” phenomenon, which creates sectors of high allele-frequency differentiation that align perpendicular to the direction of the expansion.

The first figure shows the general framework with which they performed the simulations:


You have a lattice which consists of demes, population units, all across Europe. They modulated parameters such as population growth (r), carrying capacity (C), and migration (m). Additionally, they had various scenarios of expansion from the southwest or southeast, as well as two expansions one after another to mimic the re-population of Europe after the Ice Age by Paleolithic groups, and their later replacement by Neolithic groups. They modulated admixture and introgression of genes from the Paleolithic group to the Neolithics so that you had the full range where the final Europeans were mostly Neolithic or mostly Paleolithic.

Below are some of the figures which show the results:

[Figure gallery: simulation results]

As you can see the strange thing is that in some models the synthetic map gradient is rotated 90 degrees from the axis of demographic expansion! In this telling the famous synthetic map showing Neolithic expansion might be showing expansion from Iberia. Perhaps a radiation from a post-Ice Age southern refuge?

One explanation might be “allele surfing” on the demographic “wave of advance.” Basically as a population expands very rapidly stochastic forces such as random genetic drift and bottlenecks could produce diversification along the edge of the population wave front. The reason for this is that these rapidly expanding populations explode out of serial bottlenecks and demographic expansions, which will produce genetic distinctiveness among the many differentiated demes bubbling along the edge of expansion. Alleles which may have been at low frequency in the ancestral population can “fix” in descendant populations on the edge of the demographic wave of advance. This is the explanation, more or less, that one group gave last year for the very high frequencies of R1b1b2 in Western Europe. With this, they overturned the classic assumption that R1b1b2 was a Paleolithic marker, and suggested it was a Neolithic one.
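Here is a toy simulation of the surfing intuition. It assumes a chain of founder events in which each new deme on the wave front is started by a small number of diploid founders drawn at random from the previous deme, with no selection, no mutation, and no back-migration. The founder count, number of steps, starting frequency, and random seed are arbitrary, so this is a cartoon of the mechanism rather than the model in the paper.

```python
import random


def serial_founder_walk(start_freq, founders, steps, rng):
    """Allele frequency along a chain of founder events on the wave front."""
    freqs = [start_freq]
    copies = 2 * founders  # diploid founders carry two gene copies each
    for _ in range(steps):
        p = freqs[-1]
        # Binomial sampling of gene copies from the previous deme.
        count = sum(1 for _ in range(copies) if rng.random() < p)
        freqs.append(count / copies)
    return freqs


rng = random.Random(1)  # fixed seed so the run is repeatable
STEPS, FOUNDERS, START = 30, 10, 0.05

lineages = [serial_founder_walk(START, FOUNDERS, STEPS, rng)
            for _ in range(200)]
final = [walk[-1] for walk in lineages]

surfed = sum(1 for f in final if f > 0.5)
lost = sum(1 for f in final if f == 0.0)
print(f"rare allele lost in {lost} of 200 fronts, "
      f"above 50% in {surfed} of 200")
```

Most of the time the rare allele is simply lost, but on a minority of fronts it rides the wave to high frequency, which is all the surfing argument needs.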

Here’s their conclusion from the paper:

A previous study showed that the original patterns observed in PCA might not reflect any expansion events (Novembre and Stephens 2008). Here, we find that under very general conditions, the pattern of molecular diversity produced by an expansion may be different than what was expected in the literature. In particular, we find conditions where an expansion of Neolithic farmers from the southeast produces a greatest axis of differentiation running from the southwest to the northeast. This surprising result is seemingly due to allele surfing leading to sectors that create differentiation perpendicular to the expansion axis. Although a lot of our results can be explained by the surfing phenomenon, some interesting questions remain open. For example, the phase transition observed for relatively small admixture rates between Paleolithic resident and Neolithic migrant populations occurs at a value that is dependent on our simulation settings, and further investigations would be needed to better characterize this critical value as a function of all the model parameters. Another unsolved question is to know why the patterns generally observed in PC2 maps for our simulation settings sometimes arise in PC1 maps instead. These unexplained examples remind us that PCA is summarizing patterns of variation in the sample due to multiple factors (ancestral expansions and admixture, ongoing limited migration, habitat boundary effects, and the spatial distribution of samples). In complex models such as our expansion models with admixture in Europe, it may be difficult to tease apart what processes give rise to any particular PCA pattern. Our study emphasizes that PC (and AM) should be viewed as tools for exploring the data but that the reverse process of interpreting PC and AM maps in terms of past routes of migration remains a complicated exercise. 
Additional analyses—with more explicit demographic models—are more than ever essential to discriminate between multiple explanations available for the patterns observed in PC and AM maps. We speculate that methods exploiting the signature of alleles that have undergone surfing may be a powerful approach to study range expansions.

What’s the big picture here? In the textbook Human Evolutionary Genetics it is asserted that synthetic maps never became very popular compared to PCA itself. I think this is correct. But, the original synthetic maps have become prominent for many outside of genetics. They figure in Peter Bellwood’s First Farmers, and are taken as a given by many pre-historians, such as Colin Renfrew. And yet a reliance on these sorts of tools must not be blind to the reality that the more layers of abstraction you put between your perception and comprehension of concrete reality, the more likely you are to be led astray by quirks and biases of method.

In this case I do think first-order intuition would tell us that synthetic maps which display PCs would be showing gradients as a function of demographic pulses. And yet the intuition may not be right, and with the overturning of old orthodoxies in the past generation of inferences from the variation patterns in modern populations, we should be very cautious.

Citation: Olivier François, Mathias Currat, Nicolas Ray, Eunjung Han, Laurent Excoffier, & John Novembre (2010). Principal Component Analysis under Population Genetic Models of Range Expansion and Admixture. Mol Biol Evol

(Republished from Discover/GNXP by permission of author or representative)

One of the more popular posts on this weblog (going by StumbleUpon and search engine referrers) focuses on genetic variation in Europe as a function of geography. In some ways the results are common sense; populations closer to each other are more genetically related. Why not? Historically people have married their neighbors and so gene flow is often well modeled as isolation by distance. The scientific rationale for these studies is to smoke out population stratification in medical genetics research programs which attempt to find associations between genes and particular diseases. By population stratification I mean the fact that different populations will naturally have different gene frequencies, and if those populations exhibit different frequencies of the disease/trait under investigation then one may have to deal with spurious correlations. If, for example, your study population includes many people of African and European descent, presumably cautious researchers would immediately be aware of this problem and attempt to take it into account. But what about populations which are genetically closer, or whose genetic difference may not be so well manifest in physical characteristics which might clue you in to the issue of stratification? That’s why the sorts of results which might seem common sense in the aggregate are useful. One can ask questions as to the genetic closeness of Irish and English, or Irish and Spanish, in a rigorous sense. In the United States research programs which are constrained to white cases and controls may hide population stratification because of the ethnic diversity of the American population. A primary motivation for studies of Jewish genetics is the cluster of “Jewish diseases” which are common within that population. In our age it is fashionable to focus on what binds us together as a species, but genetic differences matter a great deal. Ask the parents of multiracial children who require bone marrow transplants.
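To make the stratification problem concrete, here is a minimal arithmetic sketch. All of the numbers are hypothetical and chosen only for illustration; none come from the paper. Within each made-up population the allele has no effect on disease at all, yet pooling the two produces an apparent association because both the allele frequency and the disease rate differ between them.

```python
# Hypothetical numbers for illustration only; none are taken from the paper.

def allele_counts(n_people, allele_freq, disease_rate):
    """Expected allele counts in cases and controls for one population
    in which the allele has NO effect on disease risk."""
    cases = n_people * disease_rate
    controls = n_people - cases
    # Two allele copies per person; same allele frequency in cases and controls.
    return {
        "case_alt": 2 * cases * allele_freq,
        "case_total": 2 * cases,
        "control_alt": 2 * controls * allele_freq,
        "control_total": 2 * controls,
    }

def pooled_frequencies(populations):
    """Allele frequency among cases and among controls after naive pooling."""
    totals = {"case_alt": 0.0, "case_total": 0.0,
              "control_alt": 0.0, "control_total": 0.0}
    for pop in populations:
        for key, value in allele_counts(*pop).items():
            totals[key] += value
    return (totals["case_alt"] / totals["case_total"],
            totals["control_alt"] / totals["control_total"])

# (n_people, allele frequency, disease rate) for two made-up populations.
pop_a = (1000, 0.8, 0.20)
pop_b = (1000, 0.2, 0.05)

case_freq, control_freq = pooled_frequencies([pop_a, pop_b])
print(f"pooled cases: {case_freq:.2f}, pooled controls: {control_freq:.2f}")
```

With these inputs the allele looks far more common in cases (about 0.68) than in controls (about 0.47), even though it does nothing inside either population. That gap is exactly the sort of spurious correlation that a stratification check is meant to catch.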

A new paper in Human Heredity examines a large sample of five European populations, and goes over the between-population allele frequency differences with a fine-tooth comb. Genetic Differences between Five European Populations:

We sought to examine the magnitude of the differences in SNP allele frequencies between five European populations (Scotland, Ireland, Sweden, Bulgaria and Portugal) and to identify the loci with the greatest differences…We found 40,593 SNPs which are genome-wide significantly…The largest differences clustered in gene ontology categories for immunity and pigmentation. Some of the top loci span genes that have already been reported as highly stratified: genes for hair color and pigmentation (HERC2, EXOC2, IRF4), the LCT gene, genes involved in NAD metabolism, and in immunity (HLA and the Toll-like receptor genes TLR10, TLR1, TLR6). However, several genes have not previously been reported as stratified within European populations, indicating that they might also have provided selective advantages: several zinc finger genes, two genes involved in glutathione synthesis or function, and most intriguingly, FOXP2, implicated in speech development. Conclusion: Our analysis demonstrates that many SNPs show genome-wide significant differences within European populations and the magnitude of the differences correlate with the geographical distance. At least some of these differences are due to the selective advantage of polymorphisms within these loci

They looked at ~350,000 SNPs across the five populations. The sample sizes were pretty large: 1,129 individuals from Bulgaria, 1,142 from Ireland, 656 from Scotland, 620 from Sweden, and 563 from Portugal. In the supplements they had a figure where they displayed the genetic variation on the two largest principal components for their sample and color-coded by region of origin. Next to this they transposed the PCA onto a map of Europe.


This confirms previous findings that the largest component of variation in Europe is north-south (at least evaluating to the west of a particular geographical cutoff), with a secondary east-west dimension. But the focus of the paper wasn’t really phylogenetic relationships between the populations as such, but the patterns of genetic differences across them. Table 1 shows the population to population differences in SNPs. Rescaled here means that the results were rescaled for sample size, which differed between populations, along with the value after a Bonferroni correction.
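For readers unfamiliar with the Bonferroni correction, the arithmetic is simple enough to sketch. The SNP count of roughly 350,000 comes from the post; the family-wise alpha of 0.05 is my assumption, not a value I can confirm from the paper, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests

def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that survives a Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]

# ~350,000 SNPs as described in the post; alpha = 0.05 is an assumed convention.
threshold = bonferroni_threshold(0.05, 350_000)
print(f"{threshold:.2e}")
```

Under those assumptions a SNP has to reach a p-value of about 1.4e-7 before its frequency difference counts as genome-wide significant, which is why large samples matter so much for this kind of comparison.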


The pairwise differences are what you’d expect from the PCA. Most of the between population difference is probably due to history; populations random walk into their own gene frequencies through isolation by distance. But there’s more to the story than that, as is clear in table 2.


As noted by the authors, genes in specific categories or classes are overrepresented among those with large between-population differences. In particular, they focus on genes related to immune function and pigmentation. The reason for variation in the former is relatively straightforward: research on patterns of natural selection in the human genome has long pinpointed loci implicated in immune function as having been particularly shaped by this evolutionary genetic parameter, no doubt because disease resistance has a major impact on reproductive fitness. Additionally, it seems likely that immune-related function is constantly being buffeted by selection because of the prominence of frequency-dependent dynamics. As for pigmentation, it has also shown up as a major target of natural selection in many of the more recent papers, and it’s a trait whose genetic architecture we have a reasonably good grasp of now. They also found that the NAD synthetase 1 gene was stratified. They note that this impacts metabolism and has been found to have a relationship to the disease pellagra. Loci related to diet also seem to be disproportionately affected by natural selection, and that stands to reason as the shift to agriculture was relatively recent and many populations may still be going through transients (e.g., gluten sensitivity). The densities and diets of European populations even today vary a great deal. Italy is about an order of magnitude more dense in population than Sweden, and this has likely been the case for many millennia due to differences in primary agricultural productivity. Finally, the authors observe that FOXP2 is also stratified. This is the famous “language gene,” which regularly makes press every few years. The short of it is that FOXP2 seems to be involved in complex vocalization, and has been subject to selection in tetrapod lineages where vocal ability is pronounced (birds, humans, etc.). They don’t make much of the variation in the paper, but it seemed warranted to note that the gene had popped up in their tests.

The authors freely admit that their findings are provisional:

Our paper focuses on the top 11 loci and suggests plausible mechanisms for most of them. However, the total number of genome-wide significant SNPs is 150,000 and the top hits clustered in several GO categories. We cannot judge which ones are due to the effects of selection or to other mechanisms. We present a full list of genes with the best and median p values for SNPs within them (separately for the full sample and for controls only), so that others can make use of this information in future studies…

Citation: Moskvina V, Smith M, Ivanov D, Blackwood D, Stclair D, Hultman C, Toncheva D, Gill M, Corvin A, O’Dushlaine C, Morris DW, Wray NR, Sullivan P, Pato C, Pato MT, Sklar P, Purcell S, Holmans P, O’Donovan MC, Owen MJ, & Kirov G (2010). Genetic Differences between Five European Populations. Human heredity, 70 (2), 141-149 PMID: 20616560

(Republished from Discover/GNXP by permission of author or representative)

A few years ago you started seeing the crest of studies which basically took several hundred individuals (or thousands) from a range of locations, and then extracted out the two largest components of genetic variation from the hundreds of thousands of variants. The clusters which fell out of the genetic data, with each point being an individual’s position, were transposed onto a geographical map. The figure to the left (from this paper) has been widely circulated. You don’t have to be a deep thinker to understand why things shake out this way; people are more closely related to those near than those far because gene flow ties populations together, and its power decreases as a function of distance.
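The "extract the largest components" step can be sketched in a few lines. This is a toy illustration under my own assumptions, not the pipeline any of these papers used: the genotype matrix is invented (six individuals, four SNPs, coded as 0/1/2 copies of an allele), and I use plain power iteration on the individual-by-individual Gram matrix to get only the first principal component. Real studies use dedicated software and hundreds of thousands of markers.

```python
import math

def center_columns(matrix):
    """Subtract each SNP's mean genotype so columns average to zero."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[value - mean for value, mean in zip(row, means)] for row in matrix]

def gram_matrix(rows):
    """Individual-by-individual matrix of dot products between genotype rows."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]

def top_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalize a start vector."""
    vector = [float(i + 1) for i in range(len(matrix))]  # arbitrary non-constant start
    for _ in range(iterations):
        product = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in product))
        vector = [x / norm for x in product]
    return vector

# Invented toy data: rows are individuals, columns are SNPs (0, 1 or 2 copies).
# The first three rows stand in for one location, the last three for another.
genotypes = [
    [0, 0, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 2, 0],
    [2, 2, 0, 1],
    [2, 1, 0, 1],
    [2, 2, 0, 2],
]

pc1 = top_eigenvector(gram_matrix(center_columns(genotypes)))
print([round(x, 2) for x in pc1])
```

In this toy case the first three individuals land on one side of zero on PC1 and the last three on the other. A second component could be obtained by deflating the matrix and iterating again, and plotting the two against each other gives the familiar scatter that gets laid over a map.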

Of course the world isn’t flat, and history perturbs regularities. Jews for example often don’t shake out where they “should” geographically, because of their historical mobility contingent upon random and often capricious geopolitical or social pressures. The Hazara of Afghanistan have their ethnogenesis in the melange of peoples who were thrown together after the Mongol conquest of Central Asia and Iran in the 13th century, and the subsequent collapse of the Ilkhan dynasty. Though the Hazara have mixed with their Persian, Tajik and Pashtun neighbors, they still retain a strong stamp of Mongolian ancestry which means that they are at some remove on the “genetic map” from their geographical neighbors.

So when interpreting these sorts of results you have two extreme dynamics operative. On the one hand you have an equilibrium state where gene flow is mediated through continuous but small flows of migration; women moving between villages, younger sons venturing out of the village in search of better opportunities. Then you have the random (or perhaps modeled as a Poisson distribution) “shocks” which are attributed to world-historical (or region-historical) events which leave an outsized and often perplexing stamp and distort the genetic map from the geographic one. Sometimes the two are not in balance. In much of the New World and Australasia the native populations were genetically replaced by settlers from the outside. Thousands of years of genetic variation accumulated and shaped by localized gene flow events were wiped clean off the map by the demographic tsunami.

Obviously that’s an extreme scenario. The macroscale does not always render the microscale irrelevant in such a fashion. A new short paper in The European Journal of Human Genetics gives us an example. Genes predict village of origin in rural Europe:

The genetic structure of human populations is important in population genetics, forensics and medicine. Using genome-wide scans and individuals with all four grandparents born in the same settlement, we here demonstrate remarkable geographical structure across 8–30 km in three different parts of rural Europe. After excluding close kin and inbreeding, village of origin could still be predicted correctly on the basis of genetic data for 89–100% of individuals.

Here’s the ubiquitous PC chart, except on the scale of villages:


As noted above they excluded close relatives, out to second cousins. They judge that the genetic time depth back to common ancestry is about 120 years. Remember that if their grandparents are from this village they obviously are going to be somewhat inbred, from the perspective of an American whose ancestors are from different nations. But for most of history the European case was the typical one, not the American one where people from different continents mingled.

Here’s part of the discussion which I think needs highlighting:

To explore how many markers are required to recover these fine scale patterns of structure, we ranked SNPs by FST among villages and repeated the PCA for the most differentiated subsets of 30 000, 10 000, 3000 and 300 SNPs in each population. In all three populations, 10 000 or more high FST SNPs recovered an essentially identical picture to that using the full data set, and even 3000 SNPs preserved considerable separation between the villages (not shown). Using only the most discriminating 300 SNPs, little structure could be observed between the two Croatian villages; however, in Scotland and Italy one of the three settlements included in each location remained completely differentiated from the other two (not shown). We note that these results are only indicative of the minimum number of SNPs required to separate these populations, as by necessity SNPs have been selected intrinsically on the basis of FST within the same data set, rather than extrinsically from other data.

The slightly lower differentiation of the Croatian villages is not surprising given the fact that they are physically the closest of those considered here, being 8 km apart, with only low hills separating them. In contrast, the settlements in the Scottish Isles and Italy are separated by 15–30 km of sea in the former case, and of 3000 m mountains in the latter, although there are deep connecting valleys.
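The ranking step described in the quoted passage can be sketched as follows. I don't know which FST estimator the authors used, so this assumes the simplest textbook form, (HT - HS) / HT with equally weighted villages, and the allele frequencies and SNP names are made up.

```python
def fst(freqs):
    """Wright's FST for one biallelic SNP from per-village allele frequencies,
    weighting every village equally (an assumption made for simplicity)."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)      # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                         # monomorphic SNP carries no signal
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total

def top_snps_by_fst(freq_table, k):
    """Names of the k SNPs with the largest FST among villages."""
    ranked = sorted(freq_table, key=lambda snp: fst(freq_table[snp]), reverse=True)
    return ranked[:k]

# Hypothetical allele frequencies in three villages.
village_freqs = {
    "snp_a": [0.10, 0.90, 0.50],
    "snp_b": [0.50, 0.50, 0.50],
    "snp_c": [0.40, 0.60, 0.50],
}
print(top_snps_by_fst(village_freqs, 2))
```

The caveat the authors raise applies to this sketch too: choosing the most differentiated SNPs from the same data you then test on flatters the result, so the marker counts are best read as rough lower bounds.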

First, we get a sense of the range of informative markers necessary to discern population structure well in much of the Old World. For continental races (e.g., Europeans vs. East Asians) you need on the order of 10-100 markers to distinguish them with a high degree of confidence (closer to the low bound than the high). It looks like in the case of village vs. village differences, it will be on the order of 100-1000 markers. I suspect in Iraq or the Caucasus you’ll need fewer than 300 markers, because genetic differentiation is higher over a shorter distance due to inbreeding, ethnic diversity, and geography (more the former in Iraq, more the latter in the Caucasus). In contrast, in regions where geography is conducive to transport and local norms enforce exogamy I wouldn’t be surprised if you need more like a thousand markers.

Second, observe the importance of topographical detail. I have observed before that Sardinia is a genetic outlier in Europe. That’s not because Sardinians interbred with native elves of that island. Rather, a water barrier serves as a major check on continuous gene flow mediated by banal contacts (e.g., going to the market and meeting a person from the neighboring village). Islands become worlds unto themselves. Though they are affected by the exogenous shocks, they are less subject to the continuous gene flow at the equilibrium because the water serves as a barrier. Similarly, mountains can produce genetic barriers as well, because they make travel rather difficult. In Consanguinity, Inbreeding, and Genetic Drift in Italy, L. L. Cavalli-Sforza documents in detail through Roman Catholic Church records what a big impact modern roads had on inbreeding coefficients, which plunged in the 19th century. Distortions of the genetic map tell us about variations in elevation in the third dimension on the geographic map!

The utility of this sort of data collection and analysis in the modern world is an empirical question. On the one hand many Europeans are relatively less inclined to move in comparison to Americans. And yet the breaking down of borders with the European Union and the likely need for a more productive economic sector on that continent because of changing demographics point to greater mobility, migration and mixing, which would make these sorts of studies of only near-term use. Of more interest to me are going to be fine-grained analyses of social groups. For example, the Indian caste system. Last fall in the Reich et al. paper the authors seemed to be indicating the likelihood of a lot of between-population variance among these groups. It doesn’t matter if a particular Bania sub-caste from Gujarat is scattered across the world, from Kenya to England to the United States. They may all still marry amongst a set of individuals who hail from the same original few villages.

Good times.

Citation: O’Dushlaine, C., McQuillan, R., Weale, M., Crouch, D., Johansson, Aulchenko, Y., Franklin, C., Polašek, O., Fuchsberger, C., Corvin, A., Hicks, A., Vitart, V., Hayward, C., Wild, S., Meitinger, T., van Duijn, C., Gyllensten, U., Wright, A., Campbell, H., Pramstaller, P., Rudan, I., & Wilson, J. (2010). Genes predict village of origin in rural Europe European Journal of Human Genetics DOI: 10.1038/ejhg.2010.92

(Republished from Discover/GNXP by permission of author or representative)

A few months ago I blogged a paper in PLoS Biology which suggested that a common Y chromosomal haplogroup, in fact the most common in Europe and at modal frequency along the Atlantic fringe, is not pre-Neolithic. Rather, their analysis of the data implied that the European variants were derived from an Anatolian variant. The implication was that a haplogroup which had previously been diagnostic of “Paleolithicness,” so to speak, of a particular population may in fact be an indication of the proportion of Neolithic Middle Eastern ancestry. The most interesting case was the Basques, who have a high frequency of this haplogroup, and are often conceived of as “ur-Europeans,” Paleolithic descendants of the Cro-Magnons in the most romantic tellings. I was somewhat primed to accept this finding because of confusing results from ancient DNA extraction which imply a lot of turnover in maternal lineages, the mtDNA. My logic was that if the mtDNA exhibited rupture, then the Y lineages should too, as demographic revolutions are more likely to occur among men.

But perhaps not. A new paper in PLoS ONE takes full aim at the paper I blogged above. It is in short a purported refutation of the main finding of the previous paper, and a reinstatement of what had been the orthodoxy (note the citations to previous papers). A Comparison of Y-Chromosome Variation in Sardinia and Anatolia Is More Consistent with Cultural Rather than Demic Diffusion of Agriculture:

Two alternative models have been proposed to explain the spread of agriculture in Europe during the Neolithic period. The demic diffusion model postulates the spreading of farmers from the Middle East along a Southeast to Northeast axis. Conversely, the cultural diffusion model assumes transmission of agricultural techniques without substantial movements of people. Support for the demic model derives largely from the observation of frequency gradients among some genetic variants, in particular haplogroups defined by single nucleotide polymorphisms (SNPs) in the Y-chromosome. A recent network analysis of the R-M269 Y chromosome lineage has purportedly corroborated Neolithic expansion from Anatolia, the site of diffusion of agriculture. However, the data are still controversial and the analyses so far performed are prone to a number of biases. In the present study we show that the addition of a single marker, DYSA7.2, dramatically changes the shape of the R-M269 network into a topology showing a clear Western-Eastern dichotomy not consistent with a radial diffusion of people from the Middle East. We have also assessed other Y-chromosome haplogroups proposed to be markers of the Neolithic diffusion of farmers and compared their intra-lineage variation—defined by short tandem repeats (STRs)—in Anatolia and in Sardinia, the only Western population where these lineages are present at appreciable frequencies and where there is substantial archaeological and genetic evidence of pre-Neolithic human occupation. The data indicate that Sardinia does not contain a subset of the variability present in Anatolia and that the shared variability between these populations is best explained by an earlier, pre-Neolithic dispersal of haplogroups from a common ancestral gene pool. Overall, these results are consistent with the cultural diffusion and do not support the demic model of agriculture diffusion.

Their main trump cards seem to be that they used a denser set of markers, and that they claim to have a more accurate molecular clock. Ergo, in the latter case they produce a better time to the last common ancestor, which is twice as deep as that of the paper they’re attempting to refute. Someone like Dienekes or Polish Genetics can tackle the controversies in scientific genealogy here (I know Dienekes has a lot of interest in mutational rates which go into the molecular clock for these coalescence times). Rather, I would suggest that usage of Sardinians concerns me for an obvious reason: they’re genetic outliers in Europe. A lot of this has to do with being an island. Islands build up uniqueness because they don’t engage in the normal low level gene flow between adjacent populations because they’re…well, islands. You would know about Sardinia’s position because they’re one of the populations in L. L. Cavalli-Sforza’s HGDP sample and they show up in History & Geography of Human Genes as on the margins of the PCA plots. But here’s a figure from a more recent paper using a much denser marker set, constrained to Southern European populations. I labelled some of the main ones so you’d get a sense of why I say Sardinians are outliers:

Over the two largest independent dimensions of genetic variation you can see a distribution from the southeast Mediterranean all the way to the northwest (in fact, the Basques are an Atlantic group). The Sardinians are out of the primary axis, and that’s why I say they’re an outlier. A few other European groups, like the Icelanders and Sami, exhibit this tendency. As I suggested above I think the fact that the Sardinians are on an isolated island relatively far from the European and African mainlands means that they’ll “random walk” in genetic variation space toward an outlier status naturally, just as the Icelanders have since the year 1000. So though I grant the authors their rationale for using the Sardinians as a reference against the Anatolian source population, the fact that we know that they’re peculiar in their variation in total genome content makes me wary of drawing too many inferences from their relationships to other groups where they are seen as representative of a larger set.
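The "random walk" intuition has a standard back-of-the-envelope form. Under an idealized Wright-Fisher model with no gene flow and constant size, the variance in an allele's frequency after t generations is p0(1 - p0)[1 - (1 - 1/(2N))^t]. The sketch below just evaluates that formula; the population sizes and the 40 generations are hypothetical values I picked for illustration, not estimates for Sardinia or Iceland.

```python
def drift_variance(p0, effective_size, generations):
    """Expected variance in allele frequency under pure Wright-Fisher drift
    (no migration, no selection, constant diploid effective size)."""
    retained = (1 - 1 / (2 * effective_size)) ** generations
    return p0 * (1 - p0) * (1 - retained)

# Hypothetical values: a small isolated island versus a large mainland
# population, each starting at frequency 0.5 and drifting for 40 generations.
small_island = drift_variance(0.5, 1_000, 40)
large_mainland = drift_variance(0.5, 100_000, 40)
print(f"island: {small_island:.5f}, mainland: {large_mainland:.7f}")
```

The smaller population accumulates roughly a hundred times more variance over the same span in this toy comparison, and without neighbors supplying gene flow there is nothing to pull it back toward the continental average.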

Citation: Morelli L, Contu D, Santoni F, Whalen MB, & Francalacci P (2010). A Comparison of Y-Chromosome Variation in Sardinia and Anatolia Is More Consistent with Cultural Rather than Demic Diffusion of Agriculture. PLoS ONE DOI: 10.1371/journal.pone.0010419

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, European Genetics, Genetics 

Over at my other blog I have a review up of a new paper in PLoS Biology. The authors argue that a particular Y haplogroup lineage, R1b1b2, which has often been assumed to be a marker of indigenous Paleolithic Europeans (i.e., those who were extant before the rise of agriculture and the spread of farmers), is actually a signature of Anatolians who brought agriculture. This probably isn’t too surprising for the genetic genealogy nuts among the readers. After I got a copy of this paper I poked around the internet and the general finding that R1b1b2 was very diverse in the eastern Mediterranean seems to have been well known among the genetic genealogy community (also see Anatole Klyosov’s paper and what he says about Basques specifically). And then in eastern Europe you have R1a1, which seems to have also undergone recent range expansion. Finally, there are the recent rumblings out of ancient DNA extraction which imply a lot of turnover of mtDNA lineages during the shift from hunter-gathering to agriculture.

I think this makes us reconsider the idea that most of the ancestors of contemporary citizens of the European Union who were alive 10,000 years ago were actually resident within the current borders of the European Union. But let’s put the details of that aside for a moment. Which group might be most representative of Paleolithic Europeans? If the paper above is correct, the Basques are not a good proxy for the ancient hunter-gatherers of Europe.

Let’s look at a map which illustrates the spread of agriculture. I’d always focused on the SE-NW cline, but if the U5 mtDNA haplogroup is a reasonable marker of ancient pre-agricultural Europeans, we need to look at the Finnic peoples of the northeast. This may explain why these populations also tend to be genetically distinct from other European groups; not because they’re an exotic admixture, but because they’re not. Anyway, simply speculation, I’m sure readers will have their opinions….

(Republished from by permission of author or representative)
• Category: Science • Tags: Archaeology, European Genetics, Finn Baiting 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"