The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog

*The past after the word*

If science is hard, history is harder. Harder in that the goal is to understand what happened in ages which are fading away like evanescent ghosts of our imagination. But we must be cautious. We are a great storytelling species, seduced by narrative. The sort of empirically informed and rigorous analysis which is the hallmark of modern scholarship is a special and distinctive thing, even if it is usually packaged in turgid and impenetrable prose. It is too pat to state that history was born fully formed with the work of Thucydides (or Sima Qian). In fact Thucydides’ pretensions at historical objectivity despite obvious perspective and bias lend credence to the assertions of those who make the case that the past is fiction (in this way Herodotus may actually have been more honest). The temptation is always great to paint an edifying myth which gives succor to national pride or flatters our contemporary self-image. The fact that modern nation-states in the technological age have vigorous debates about details as to the nature of periods of history in the recent past, when the people who lived during those times are still here to bear witness, is telling in terms of the magnitude of the task before us. Fraught questions must be answered with far fewer resources.

Much of history we see only vaguely, through chance and contingency, known through happenstance and the whims of our ancestors. In the West the documents which shed light upon antiquity come to us through tunnels of finite transmissions, a furious period of textual transcription in the last few centuries before 1000 A.D. The Carolingians, the Byzantines, and the Abbasids all sponsored the capital-intensive project of taking ancient texts and making copies for posterity. The vast majority of the works of antiquity we have today can be traced back to this period[1]. The biases and concerns of the elites who sponsored these projects were critical in determining the nature of the source material on which rests the understanding of the deeper past that we take for granted today. We know how little was copied because the extant material makes copious reference to a vast body of work on assorted topics which was circulating in the ancient world (and even many of the works we do have are only portions of multi-volume endeavours, such as that of Livy).

But what about pushing beyond what the text can tell us, and transitioning from history to prehistory? Here is where matters become opaque and conditional upon the nature of the texts (or lack thereof). This is clear when you observe that there are very early periods of human history when our knowledge of individual actors and daily life is actually greater than in later epochs, due to a regress of civilization or to changes in technology which militated against the preservation of texts[2]. The “Dark Ages” of Greece between the Mycenaeans and the Classical Greeks are purely the purview of archaeology (and even during the Mycenaean period most Linear B tablets were of a bureaucratic nature; I do not know of narrative literature such as we have for Egypt or Babylon). For the Classical Greeks the rupture was traumatic enough that their Mycenaean past became the subject of legends. The citadels of the Bronze Age warlords were viewed as “cyclopean” works, as if only giants could have created them. Similarly, the period in Britain between the end of central Roman rule and the Christianization of the Anglo-Saxons, about two centuries, is perceived only faintly because of the paucity of written records (this also explains why this period is often used as the setting for historical fantasy).

Yet when text is silent one still has material remains. Their collection and analysis are the domain of archaeology, a historical science. The fact that history as we understand it deals in the written word, and so limits its focus to the period when we have texts, is itself a historical coincidence. Ideally traditional history and archaeology should work in concert, not least because words have a way of deceiving and misleading. Most obviously, we have a major ascertainment bias in our understanding of the past when we listen only to the perspectives of those who can speak through words, because those who were literate or had access to literate professionals were a very small subset of the broader human experience. Archaeology has less of this bias, because all classes leave behind material evidence (though if one wants textual representations of a broader cross section of the Roman populace, the novel The Golden Ass is a good place to start). An excellent illustration of this for me, as readers know, is the extended argument in the book The Fall of Rome, which marshals material evidence to buttress the position that the decline and fall of the unitary Roman state in the 5th century coincided with a genuine degradation of what we might term civilization. Revisionists looking purely at textual materials have long argued that the classical view was misleading and, to reduce their argument to its essence, suggest that classical civilization evolved and transformed, channeling its energies into different activities (e.g., the rise of Christian theology as a successor to the classical liberal arts; see Peter Brown’s The Rise of Western Christendom). But what material remains tell us is that there was indeed an economic and demographic collapse, despite the apologia one can make as to the reshaping of high culture in texts.
One may choose to weight these facts, or not, but the facts nevertheless remain, no matter how many glosses one wishes to put upon them. The Rome of 600 may have had many more Christian theologians than the Rome of 400 (which was then a mainly non-Christian city), but the Rome of 400 probably had a population on the order of 10-20 times greater.

In a world without text, which is almost all of human history, the material remains are all that we have to grasp. Though we can attempt to glean the minds of people long gone from paintings and scratches in stone, the reality is that what they hunted with, what they ate with, and the dwellings in which they lived are going to give us concrete information where leaps of imagination are unnecessary. Moving beyond the text can allow us to truly illuminate the vast dark oceans of human history with more than our dreams, from the dawn of our species down to even recent periods when literacy was the privilege of the few and the experiences of the many were dead to us. Despite this, the paintings have only a few colors on the palette, because archaeology is filled with enormous gaps in perception. Pots not cloth. Caves not tents.

Which brings us to biology, and specifically genetics, as it turns out that DNA is one of the material remains that one can extract from archaeological field sites. It’s a robust macromolecule, and today researchers believe it may be feasible to draw some information from remains as old as 1 to 2 million years, though that’s a best-case scenario. When it comes to questions of demographic change, genetic insights are key, and they present data in a way that allows for more rigorous analysis. As in previous posts, I must give a nod here to L. L. Cavalli-Sforza and The History and Geography of Human Genes. Cavalli-Sforza’s magnum opus reopened the book on attempting to understand history through demographics. It was the first page, and the first chapter. Before World War II there was a cottage industry which attempted to do what Cavalli-Sforza achieved in the late 20th century. But these endeavors were hobbled by two problems. First, they were not scientific, often relying upon intuition derived from their authors’ erudition (they were not hypothetico-deductive, though that’s overrated if you have lots of data). Second, the reliance upon intuition meant that many of the conclusions dovetailed rather neatly with the ideological preferences of the day: National Socialism most horrifically, but much more widely there was a shoddy, nationalism-inflected prehistory. Scientific romance without the genocide (see Pat Shipman’s The Evolution of Racism). After World War II archaeologists reversed course and decoupled cultural evolution and change from demographic variation. Works such as The Races of Europe became anachronistic, when decades before they would have been mainstream, and there was a strong bias toward a null hypothesis that pots, that is cultural traditions, migrate, but people do not.


Into this intellectual climate stepped Cavalli-Sforza and his students, triggering a minefield of academic explosions (see The Human Genome Diversity Project: An Ethnography of Scientific Practice). Molecular anthropology in its earliest incarnations focused on deep time. In particular, there was a recalibration of the time depth of the origins of apes and humans, where the molecular biologists clashed with paleontologists and came out the victors (see The Monkey Puzzle for a history of these controversies). Then there was the “Out of Africa” debate (see The African Exodus). Though these were somewhat fractious and personalized arguments, the emotions around the implications of these contests of ideas were often limited to scholars (though the scholars themselves may not have felt the fallout was limited; apparently at Stanford in the late 1990s a cultural anthropologist gave a presentation where he juxtaposed a photo of Cavalli-Sforza with one of Josef Mengele). What Cavalli-Sforza did was bring genetic science to bear on more contemporary phenomena, answering questions which come to the cusp of the present and tackling issues of relevance to living people on the scale of nations and peoples. Over many decades his lab collected enough information from hundreds of genetic loci to arrive at the sum total of inferences which were eventually presented in The History and Geography of Human Genes.

Let’s take a step back here. Cavalli-Sforza and his colleagues had access to hundreds of markers at best. Note that only ~2% of the human genome codes for proteins, but there are 3 billion base positions in total. Today anyone who wants to pay can get millions of positions through SNP-chip services. My son has billions of positions, because he’s been whole-genome sequenced. For phylogenetic purposes you don’t need billions, millions, or even thousands of markers, depending on the nature of the questions you have in mind. But it puts in perspective how far we’ve come in literally 20 years. Even 5 years.
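To put those scales in rough proportion, here is the arithmetic, taking 500 as a hypothetical stand-in for “hundreds of markers” and 10^6 as a stand-in for the “millions of positions” on a SNP chip (both round numbers are assumptions for illustration):

```latex
\begin{aligned}
\text{protein-coding bases} &\approx 0.02 \times 3\times10^{9} = 6\times10^{7} \\
\frac{5\times10^{2}\ \text{markers}}{3\times10^{9}\ \text{positions}} &\approx 1.7\times10^{-7} \\
\frac{10^{6}\ \text{SNPs}}{3\times10^{9}\ \text{positions}} &\approx 3.3\times10^{-4}
\end{aligned}
```

Even a SNP chip samples well under a thousandth of the genome, while a whole-genome sequence covers essentially all of it.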

As is the nature of science, there was much that Cavalli-Sforza got wrong in The History and Geography of Human Genes. But there was much that he got right, because the results were so clear and strong on particular points of contention. In short, very broad patterns on the continental level jumped out when analyzing even hundreds of neutral (that is, not subject to natural selection) markers. For example, the data confirm a gradient of genetic diversity which implies human origins from an African locus, as well as the relative homogeneity of Europe (aside from Finns, European populations have surprisingly low between-population pairwise genetic distances in most cases). But more subtle, counterintuitive relationships were often not robust (e.g., North and South Chinese do not bifurcate in the manner that he reported in the 1990s). And, most critically for the purposes of this post, inferring past demography from current phylogeographic patterns had serious limitations.
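Comparisons like these rest on between-population genetic distances computed from allele frequencies. As a minimal sketch of one such measure, here is Nei’s G_ST (an F_ST-style statistic) over biallelic loci. The function name and the equal weighting of populations are assumptions for illustration; this is not a reconstruction of the specific distance measures used in the book.

```python
import numpy as np


def nei_gst(freqs) -> float:
    """Nei's G_ST across biallelic loci, computed as a ratio of sums over loci.

    freqs: array-like of shape (n_populations, n_loci) giving, for each
    population, the frequency of one allele at each locus. Populations are
    weighted equally (an assumption of this sketch).
    """
    f = np.asarray(freqs, dtype=float)
    p_bar = f.mean(axis=0)                      # pooled allele frequency per locus
    h_t = 2 * p_bar * (1 - p_bar)               # expected heterozygosity of the pooled population
    h_s = (2 * f * (1 - f)).mean(axis=0)        # mean expected heterozygosity within populations
    total = h_t.sum()
    if total == 0:                              # every locus fixed for the same allele everywhere
        return 0.0
    return float((h_t - h_s).sum() / total)
```

Populations with identical allele frequencies score 0 and populations fixed for different alleles score 1; “surprisingly low” distances among European populations correspond to values near the bottom of that range.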

*The present as a window into the past*

The basic idea behind historical population genetics (archaeogenetics), which was pioneered by Cavalli-Sforza at the HPGL at Stanford, was to look at patterns of diversity and relatedness among modern populations, intersect them with what was and is known about history and geography, and then allow those intersections to peel back the palimpsests of human history (see his The Great Human Diasporas). Though Cavalli-Sforza focused initially on autosomal markers scattered through the genome, in the period between 1995 and 2005 there was a great deal of work using uniparental data, the markers on the Y chromosome and the mtDNA. The mtDNA is passed through women only, is copious in quantity at the cellular level, and has a highly mutable region of utility for molecular phylogenetics. The Y chromosome presented some technical difficulties in comparison to mtDNA, but with the emergence of better extraction techniques, as well as a focus on highly mutable microsatellite regions, it came to be set next to mtDNA as a critical tool in the forensic reconstruction of human population history. In addition, both had the virtue of being nonrecombining, so that the generation of a phylogenetic tree was not an artificiality but a reflection of the nature of the transmission of these two regions of the genome (congenial to a coalescent framework as well).
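Because mtDNA and the Y chromosome do not recombine, each carries a single genealogy, and the coalescent gives its expected depth. A minimal sketch, assuming a constant-size haploid population with effective size `n_eff` (for mtDNA, roughly the female effective size) and the standard Kingman approximation; the function names are illustrative. With k lineages, each of the k(k-1)/2 pairs coalesces at rate 1/n_eff per generation, so the expected time to the most recent common ancestor of n samples is 2·n_eff·(1 - 1/n).

```python
import random


def simulate_tmrca(n_samples: int, n_eff: float, rng: random.Random) -> float:
    """Draw the time (in generations) to the most recent common ancestor of
    n_samples lineages at a nonrecombining haploid locus (Kingman coalescent)."""
    t = 0.0
    k = n_samples
    while k > 1:
        # each of the k*(k-1)/2 pairs coalesces at rate 1/n_eff per generation
        rate = k * (k - 1) / 2 / n_eff
        t += rng.expovariate(rate)   # waiting time until the next merger
        k -= 1                       # two lineages merge into one
    return t


def expected_tmrca(n_samples: int, n_eff: float) -> float:
    """Closed form: sum over k = 2..n of 2*n_eff/(k*(k-1)) = 2*n_eff*(1 - 1/n)."""
    return 2 * n_eff * (1 - 1 / n_samples)


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [simulate_tmrca(10, 1000, rng) for _ in range(20000)]
    print(sum(draws) / len(draws), expected_tmrca(10, 1000))
```

On average the final merger of the last two lineages alone takes n_eff generations, about half the total depth of the tree.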

In the end this line of research often resulted in a transposition of a phylogenetic tree upon a world map, outlining patterns of human migration. It also aligned well with another line of research which explicitly modeled the expansions of humans out of Africa as a “serial founder bottleneck” process. That is, each population which left Africa progressively branched out in a unidirectional manner, resulting in reduced genetic diversity as one progressed out of Africa.
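The serial founder idea can be made concrete with a toy simulation. The assumptions here are mine and deliberately crude: biallelic loci all starting at frequency 0.5, a chain of colonies each founded by a fixed number of diploid individuals drawn from the previous one, no migration or mutation, and drift only at the founding step. Under binomial sampling of 2N gene copies, expected heterozygosity falls by a factor of 1 - 1/(2N) per founding event. This is a caricature, not the simulations in Ramachandran et al.

```python
import numpy as np


def serial_founder_heterozygosity(n_colonies: int, n_founders: int,
                                  n_loci: int = 10_000, p0: float = 0.5,
                                  seed: int = 0) -> list[float]:
    """Mean expected heterozygosity, 2p(1-p), along a chain of colonies.

    Entry 0 is the source population; entry i is the i-th colony in a
    unidirectional chain, each founded by n_founders diploid individuals
    drawn from the previous colony.
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_loci, p0)
    het = [float(np.mean(2 * p * (1 - p)))]
    for _ in range(n_colonies):
        copies = 2 * n_founders                      # gene copies carried by the founders
        p = rng.binomial(copies, p) / copies         # founder allele frequencies
        het.append(float(np.mean(2 * p * (1 - p))))
    return het


def expected_heterozygosity(h0: float, n_founders: int, step: int) -> float:
    """Analytic expectation: H_step = H_0 * (1 - 1/(2N))**step."""
    return h0 * (1 - 1 / (2 * n_founders)) ** step
```

Plotting the returned values against colony index gives the steady loss of diversity with distance from the source that the serial founder model predicts.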

Ramachandran, Sohini, et al. "Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa." Proceedings of the National Academy of Sciences of the United States of America 102.44 (2005): 15942-15947.


In its broadest strokes this model is not without validity. It does seem that most of the ancestry of modern humans can be traced to a population which flourished around or in Africa ~50-100 thousand years ago. Much of the inter-continental racial variation that we see in extant populations does nicely fit onto a bifurcating tree-like model (e.g., non-Africans branch off from Africans, West Eurasians and East Eurasians diverge, Amerindians branch off from East Eurasians). The problem, though, is that the branches themselves turn out to be brambles which turn back in on themselves, and in some cases twist with other branches, creating lineages with very diverged ancestral roots. The yield of the earliest efforts by Cavalli-Sforza and his heirs was on a very coarse continental grain, where the effects of the dynamics were so striking that they would exhibit themselves across most neutral markers without much difficulty. But when the questions were narrower, and the temporal and spatial scope more constrained, the earlier methods were not perceptive enough to smoke out the real dynamics.

Li, Jun Z., et al. "Worldwide human relationships inferred from genome-wide patterns of variation." Science 319.5866 (2008): 1100-1104.


By the middle years of the 2000s researchers had gone back to a focus on recombining autosomal markers. But now they had a whole human genome to compare against, as well as SNP-chips which quickly yielded large troves of data with little effort. In 2008 a paper was published which took the original HGDP data set collected by Cavalli-Sforza and his colleagues and used the new technologies to make deeper inferences. First, instead of hundreds of markers there were 650,000 SNPs. Second, powerful new analytic and computational resources allowed tree-based and PCA visualizations of genetic relationships to be complemented with model-based understandings of genetic variation and population structure. By “model-based,” I mean that the algorithm posits particular parameters (e.g., “3 ancestral populations”) and operates upon the data (e.g., “650,000 SNPs in 1000 individuals”) to generate results which are the best representation of the fit of the data to the model. This is different from PCA, which has fewer assumptions and represents genetic variation geometrically (each axis represents an independent dimension of variation within the data). Model-based clustering is very clear and aesthetically appealing. It gives precise results. But the model itself is not necessarily right.
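To make the PCA side concrete, here is a minimal sketch on simulated data: two populations whose allele frequencies drift apart from a shared ancestral set (the frequencies, sample sizes, and drift magnitude are all hypothetical), genotypes coded as 0/1/2 copies of an allele, each SNP centred and scaled by its binomial standard deviation (a common normalization in genotype PCA), and components taken from a singular value decomposition. No model of ancestral populations is imposed; the axes simply describe the directions of greatest variance.

```python
import numpy as np


def simulate_two_populations(n_per_pop: int, n_snps: int, drift_sd: float, seed: int = 0):
    """Hypothetical data: genotypes (0/1/2) for two populations whose allele
    frequencies are perturbed independently around shared ancestral values."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, n_snps)
    blocks = []
    for _ in range(2):
        freqs = np.clip(ancestral + rng.normal(0, drift_sd, n_snps), 0.01, 0.99)
        blocks.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
    return np.vstack(blocks).astype(float)


def genotype_pca(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Return sample coordinates on the top principal components."""
    p = genotypes.mean(axis=0) / 2                     # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                           # drop monomorphic SNPs
    x = (genotypes[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(x, full_matrices=False)    # x = U S V^T
    return u[:, :n_components] * s[:n_components]     # principal component scores
```

Because the axes are driven by whatever samples are supplied, changing the mix of individuals (say, many more from one population) can change which directions come out on top.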

Anyone who uses these methods understands their limitations. If you use PCA to project variation of the data set, then the composition of the data you input is going to influence the largest principal components. Therefore, if you are asking questions on a broader spatial scale you should be careful about the possibility that you are overloading the sample set of interest with particular populations. More data in this case might result in less insight. Similar issues crop up with model-based clustering if you don’t appropriately weight the populations. Another major problem is that the models impose constraints which might produce false inferences (false in that they do not accurately reflect demographic history). Most simply, you might ask for many more population divisions than is realistic given the demographic and genetic history of the data. Consider a data set of Irish from Cork and Nigerians from a small village. PCA would no doubt show you two very tight and distinct clusters. With a model-based framework you could look for divisions and structure beyond K = 2 (two ancestral populations). The method is devised in such a way that you would get results. But they wouldn’t be very informative, and they’d be forced. They wouldn’t be robust. The model would be a poor fit to reality.
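For the model-based side, here is a minimal sketch of the kind of likelihood such programs maximize, cut down to a supervised special case: source allele frequencies are taken as known and only one individual’s ancestry proportions q are estimated, by expectation-maximization over which source each gene copy came from. It follows the binomial admixture likelihood used by ADMIXTURE-style methods, but it is an illustration, not any program’s actual code; the function names and simulated data are hypothetical.

```python
import numpy as np


def estimate_ancestry(genotypes, source_freqs, n_iter: int = 1000, tol: float = 1e-10) -> np.ndarray:
    """EM estimate of one individual's ancestry proportions q (length K).

    genotypes: length-L array of 0/1/2 counts of the reference allele.
    source_freqs: K x L array of reference-allele frequencies in each source,
    held fixed (values strictly between 0 and 1).
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.asarray(source_freqs, dtype=float)
    k = f.shape[0]
    q = np.full(k, 1.0 / k)                            # start from equal proportions
    for _ in range(n_iter):
        # E-step: posterior source of a reference / alternative allele copy at each locus
        ref = q[:, None] * f
        alt = q[:, None] * (1 - f)
        ref /= ref.sum(axis=0, keepdims=True)
        alt /= alt.sum(axis=0, keepdims=True)
        # M-step: expected share of the 2L gene copies assigned to each source
        q_new = (ref * g + alt * (2 - g)).sum(axis=1) / (2 * g.size)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


def simulate_admixed_individual(source_freqs, q, seed: int = 0) -> np.ndarray:
    """Hypothetical genotypes: each of the two gene copies per locus comes from
    a source chosen with probabilities q, then from that source's frequency."""
    rng = np.random.default_rng(seed)
    f = np.asarray(source_freqs, dtype=float)
    n_loci = f.shape[1]
    source = rng.choice(len(q), size=(2, n_loci), p=q)   # source of each gene copy
    freq = f[source, np.arange(n_loci)]                  # allele frequency in that source
    return (rng.random((2, n_loci)) < freq).sum(axis=0)
```

Whatever K you supply, the estimate comes back as K non-negative proportions summing to one, so the output always looks like an answer; whether those K sources correspond to real ancestral populations is a question the procedure itself cannot settle.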

*From model to reality*

Obviously no model captures all elements of reality. But when the model deviates so much from reality that you get a false sense of what is true, then that model is not nearly as useful. Being wrong is a definite bug. Besides model-based admixture analysis, which posits a finite number of ancestral populations which come together to produce the genetic variation in the data set, note that the 2008 paper also had a tree representation of genetic variation. These two together give real and substantive results that can be useful. But they mislead to the point of falsity in many specific cases.


Reich, David, et al. “Reconstructing Indian population history.” Nature 461.7263 (2009): 489-494.

This can be illustrated by the instance of South Asians, who are about 20% of the world’s population. A 2009 paper, Reconstructing Indian Population History, used both higher-density autosomal marker sets and new analytic frameworks to come to some specific conclusions which resolve many confusions about the nature of the genetic history of the peoples of the Indian subcontinent. So what did we know before? If you go back to the old physical anthropologists, they observed that many South Asian groups had an affinity to the peoples of West Eurasia (Europeans and West Asians). This varied as a function of geography and caste. In other words, there was a cline to the northwest, as well as up and down the caste system. You can see it in a PCA, where Indian groups vary in distance from Europeans, while Europeans form a very tight cluster. It also shows up in admixture-based analyses. There is usually a K value where a South Asian modal cluster emerges, and it is near fixation in South Indian non-Brahmins, declining in frequency as one moves toward Pakistan, or, in North India, up the caste hierarchy (the remainder consists of West Asian and European clusters, except in Bengalis, who have East Asian admixture). In The History and Geography of Human Genes South Asians form an outgroup to Europeans and Middle Eastern populations using older distance measures.

So far all good. One can imagine then a cline of genetic variation, with South Asians at one end and West Eurasians at the other. On a PCA between East Asians and Europeans, South Asians usually fall in the middle, but closer to Europeans. But there have long been major problems with this model when you drill down into the details. First, the mtDNA and Y chromosomes of South Asians give very different results. The former classes them as distinct from West Eurasians, with distant affinities to East Eurasians. The latter, on the other hand, are quite a bit more like those of West Eurasians. Second, South Asians exhibit a lot of variation as a function of both geography and class in terms of their relatedness to world populations. If South Asians were deeply rooted in the subcontinent, as the migration maps above would imply, then we’re talking about massive barriers to gene flow which have persisted for tens of thousands of years. An alternative explanation is that South Asians are the product of recent admixture between two very different groups, which is often the case when there is a lot of inter-individual variation in ancestral components and PCA position within a putative population group (e.g., African Americans). Finally, tests of natural selection geared toward detecting very recent sweeps have indicated a commonality between South Asians and Europeans and Middle Easterners on the haplotype of SLC24A5, which implies either extreme connectedness or recent admixture and migration (on the margin these two models are going to be hard to distinguish, since connections are mediated through migration).
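The link between recent admixture and high inter-individual variation can be illustrated with a toy simulation. The assumptions are mine and deliberately crude: a single admixture pulse, random mating in a closed population of constant size (parents drawn with replacement), and unlinked loci, whereas real chromosomes are linked, which slows the decay. Each gene copy is labelled by source, and the sketch tracks the variance across individuals of the share of copies from source A. That variance starts near m(1 - m), roughly halves each generation as parental ancestries are averaged, and then levels off at a small floor set by the number of loci.

```python
import numpy as np


def ancestry_variance_after_pulse(m: float, n_ind: int, n_loci: int,
                                  n_gen: int, seed: int = 0) -> list[float]:
    """Variance across individuals of source-A ancestry, generation by generation.

    Generation 0: a fraction m of individuals come entirely from source A,
    the rest entirely from source B. Labels are 1 for source A, 0 for B.
    """
    rng = np.random.default_rng(seed)
    from_a = (rng.random(n_ind) < m).astype(np.int8)
    # labels[individual, locus, gene copy]
    labels = np.broadcast_to(from_a[:, None, None], (n_ind, n_loci, 2)).copy()
    loci = np.arange(n_loci)[None, :]
    variances = [float(labels.mean(axis=(1, 2)).var())]
    for _ in range(n_gen):
        # random mating; the same individual may be drawn as both parents (toy simplification)
        mothers = rng.integers(0, n_ind, n_ind)
        fathers = rng.integers(0, n_ind, n_ind)
        # at each unlinked locus a parent transmits one of its two copies at random
        egg = labels[mothers[:, None], loci, rng.integers(0, 2, (n_ind, n_loci))]
        sperm = labels[fathers[:, None], loci, rng.integers(0, 2, (n_ind, n_loci))]
        labels = np.stack([egg, sperm], axis=2)
        variances.append(float(labels.mean(axis=(1, 2)).var()))
    return variances
```

A population sampled a few generations after the pulse shows wide scatter in individual ancestry; under random mating that scatter fades quickly, so persistent variation calls for either recency or structured mating.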

I will sidestep the technical issues at this point, and just offer up that the work on South Asians has presaged much of what we’ve learned over the past decade when it comes to the genesis of modern population structure. The puzzles about South Asian genetic variation are resolved when you admit a model where a West Eurasian population mixed with a local indigenous group with distant affinities with other East Eurasians (see Genetic Evidence for Recent Population Mixture in India). The high level of between population variance within South Asia is due to the recent nature of the admixture event and the high genetic distance between the source populations. This may actually be the story of much of the world over the last 10,000 years. Instead of a regular branching process, imagine branches that periodically fuse back together, in a reticulated pattern. Another way to conceive of it is that the last 10,000 years have been a story of the destruction of population structure accrued over the past 100,000 years. A survey of this field can be found in the review Toward a new history and geography of human genes informed by ancient DNA.

*Inference made concrete, ancient DNA*

Up until now we have been talking about increasing the power of analysis of genetic variation in extant populations. Processes like bottlenecks and positive selection leave footprints in the genomes of modern peoples. But these methods of inference have limits. To a great extent they require population dynamics simple enough for them to be useful in painting a portrait of the past. Researchers had to assume that the past was simple, or the methods that they had wouldn’t be able to tell them as much as they claimed. The complexity of the demographic palimpsest could never be allowed to outrun the ability of the genetic methods to peel it back, so there was a ceiling on the number of layers that could be imposed upon the model.

Ancient DNA was a game changer, because it did not come with these limitations. Instead of just inferring the past from the present, the past could now be inferred from the past! That is, a temporal transect could be generated which explicitly explored the trajectory of genetic variation across time and space. As if to recapitulate history, the earliest work was with mtDNA, just as it had been with “mtDNA Eve” in the 1980s. The sequence target here is small and mtDNA is copious. The immediate upshot, though, is that massive discontinuities were detected. Populations replaced each other repeatedly in many regions. Pulse admixture events inferred with novel methods from extant populations could now be understood as the natural result of migration and population change over the past ~50,000 years. Thanks to the work of researchers such as Svante Pääbo and Eske Willerslev, the number of ancient DNA samples we have for humans has grown to such an extent over the past 5 years that a bright light is shining into what had been a dark cavern of prehistory.

*European man, made and unveiled*

Because of both the concentration of researchers in Europe and suitable preservation conditions in Northern Eurasia, ancient DNA has totally changed how we understand the genetic history of this continent most especially. Two new papers have expanded the sample set to 170 individuals; many major questions have now been answered, and new questions have been triggered by perplexing results. A few years ago I was talking to Spencer Wells about the age that we are privileged to live in. Spencer is a history and genetics buff (he was one of Richard Lewontin’s last grad students). So naturally, as genetic science has emerged to shed light on history, we’ve tracked its developments very closely, Spencer professionally, since he’s a genetic anthropologist. Many questions which in the past would have been unanswerable are now answerable. Truth is coming at us so fast that it is hard even to respond to all of it (if you wait too long to publish, everything might have changed).

Carl Zimmer’s piece in The New York Times, DNA Deciphers the Roots of Modern Europeans, is accurate as to the current state of the accelerating research in this area. This is the equivalent of having a Rosetta Stone. The ancients are now coming back to life. They speak! Everything has changed. In Nature Ewen Callaway quotes a scientist stating in plain language, “Christ, what does this mean?” I’ll try to flesh out further what it means, but the papers themselves do a good job. These are first steps, but they’re very big steps. There’s only so much more to go, and truth will be at hand.

First, the two papers: Massive migration from the steppe was a source for Indo-European languages in Europe, and Population genomics of Bronze Age Eurasia. As the title might suggest, the latter paper has coverage of populations outside of Europe, while the former focuses on Europe. The sample sizes are 69 and 101 respectively. The first paper uses a methodology which yields many SNPs, while the latter relied upon whole-genome sequencing (variation is variation, so this is really a minor detail for the results, though it matters a lot for the working scientists who are generating the data). Both agree broadly on the major results. Additionally, there is a third work, a preprint, Eight thousand years of natural selection in Europe, which has results in line with the second paper above (it has a section on selection as well as phylogenomics).

*European genetic structure is younger than the pyramids*


The old debate over whether Europeans are descended from farmers or hunter-gatherers was always somewhat incoherent. All humans are descended from hunter-gatherers. Rather, the issue was whether modern Europeans descend primarily from people who were resident within the continent of Europe at the end of the Pleistocene, or from peoples who developed agriculture in the Middle East ~10,000 years ago. That is, did farming spread through cultural diffusion or migration? Plants or people? The answer is not straightforward, but the results are not controversial today.

First, migration seems to have been the dominant dynamic which defined the spread of farming, especially early on. The first farmers who arrived in Europe were genetically very different from the hunter-gatherers of Europe’s north and west. Some of their ancestry had been isolated by long distances for tens of thousands of years before contact. The people of the Iberian Peninsula today have less in common genetically with the hunter-gatherers who were present in the region when the farmers arrived than do modern Northern Europeans, who harbor a greater fraction of ancestry which derives from the Pleistocene people. The main qualifier I’d put on this, though, is that the farmers themselves seem to have picked up European hunter-gatherer admixture on their way out of the Middle East. The fraction is on the order of ~50%. The other component has been termed “Basal Eurasian,” because this element is an outgroup to all other Eurasians, including the European hunter-gatherers. That is, the Basal Eurasians are an outgroup to a clade that includes such diverse populations as Andaman Islanders, Australian Aborigines, Japanese, and European hunter-gatherers.

Lazaridis, Iosif, et al. "Ancient human genomes suggest three ancestral populations for present-day Europeans." Nature 513.7518 (2014): 409-413.


The figure to the left is from the paper Ancient human genomes suggest three ancestral populations for present-day Europeans. WHG = “Western (European) Hunter-Gatherers.” EEF = “Early European Farmers.” You can see that EEF is a compound. I don’t think there’s much clarity right now about where the EEF got its WHG-like ancestry. It could have been structure in the Middle East. Or it could have been in Southeast Europe. In the supplements of Haak et al. they test a Hungarian sample, and it does seem that the EEF individuals are closer to it than to the Western European hunter-gatherer samples. So there might have been structure in the ancestral European population, but the confidence here is low. And from what I can tell Basal Eurasian is still something of a mystery, almost occupying the role of “Planet X” before the discovery of Neptune. To make the patterns make sense they have to exist, but not much is known about them in detail. And of course there seems to be a huge lacuna right now in terms of exploring the population genetics of the Middle East in the fashion that has occurred in Northern Eurasia (my understanding is that Carlos Bustamante was an important person in getting Latin American populations into the 1000 Genomes Project; it is unfortunate that there wasn’t someone else to advocate for including a Middle Eastern group, since this is such an important part of the world for human history).

With all that said, if one assumes that the West Eurasian admixture in EEF was from European hunter-gatherers, then it is clear that most of the ancestry of modern Europeans can date to the Pleistocene (i.e., EEF + Yamnaya likely means more than half the ancestry is WHG-like if you look back 10,000 years). But this proportion obscures the fact that massive migrations and population turnovers have occurred, so that a simple model of expansion out of Ice Age refuges no longer holds. Cavalli-Sforza has long argued that pure proportions of ancestry are less important than the dynamic, as population-growth-driven “waves of advance” will over time dilute the initial genetic signal anyway (though the final proportion of non-WHG-like ancestry is actually higher in much of Europe than Cavalli-Sforza conceded in the early 2000s). Whether or not the ancestry of modern Europeans derives predominantly from European hunter-gatherers, the idea of dominant local continuity in a given region has been thoroughly refuted. The hunter-gatherer ancestry in the British Isles, for example, may be mostly from admixture into agricultural groups far to the south and east during the initial waves of advance, not from the people who initially recolonized Northern Europe in the early Holocene.
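The point about where hunter-gatherer ancestry gets picked up can be illustrated with a deliberately simple serial-admixture model. It is an assumption of this sketch, not the reaction-diffusion “wave of advance” models themselves: an advancing farming population moves through a chain of regions and, in each, replaces a constant fraction `uptake` (a hypothetical parameter) of its ancestry with local hunter-gatherers. Locals absorbed early are diluted by every later step, yet together they can outweigh the contribution of the final region.

```python
def hunter_gatherer_contributions(n_regions: int, uptake: float) -> list[float]:
    """Share of the final frontier population's ancestry contributed by the
    hunter-gatherers of each region (index 0 = first region entered).

    In every region the farmers absorb locals so that a fraction `uptake`
    of their ancestry becomes local; earlier contributions are diluted by
    a factor (1 - uptake) at each later step.
    """
    return [uptake * (1 - uptake) ** (n_regions - 1 - i) for i in range(n_regions)]


def farmer_ancestry(n_regions: int, uptake: float) -> float:
    """Ancestry remaining from the original farmers: (1 - uptake)**n_regions."""
    return (1 - uptake) ** n_regions


if __name__ == "__main__":
    contrib = hunter_gatherer_contributions(10, 0.1)
    print("last region:", contrib[-1])
    print("all earlier regions:", sum(contrib[:-1]))
    print("original farmers:", farmer_ancestry(10, 0.1))
```

With 10 regions and an uptake of 0.1 (both arbitrary), the final region supplies 0.1 of the ancestry while the earlier regions together supply about 0.55, and about 0.35 remains from the original farmers.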

The second demographic turnover event highlighted by the papers cited so far came from the east: the migration from the steppes. This event had a disproportionate, even dominant, impact across much of Northern Europe. Culturally it is often rooted in the Yamnaya complex, which gave rise to various disparate and wide-ranging “daughter” societies. David Anthony’s The Horse, the Wheel, and Language surveys the archaeological terrain thoroughly. If you are interested in this topic and haven’t read it, do read it. In this work Anthony outlines the spread of Indo-European languages via the expansion of a mobile pastoralist elite. He was involved in the retrieval of some of the samples in these studies, and from what I understand he was personally surprised that the genetic data imply not just elite migration, but a folk wandering. Not just a band of brothers, but whole peoples on the move.

Haak, Wolfgang, et al. "Massive migration from the steppe was a source for Indo-European languages in Europe." Nature (2015).

Focusing on the genetics, these people seem themselves to be a compound of disparate elements. First, roughly half of their ancestry derives from a population which Haak et al. term “Eastern Hunter-Gatherers” (EHG). The other half derives from a population with affinities to those of the Near East, but different from that of the EEF. There is some disagreement between the two papers in Nature as to the details, but Allentoft et al. admit that they did not have EHG samples, which may have affected their ability to detect admixture. Allentoft et al. also diverge from Haak et al. in the emphasis they place on the ancestral component among the Yamnaya which some term “Ancient North Eurasian” (ANE), based on the location of the most ancient individual of this line (see Upper Paleolithic Siberian genome reveals dual ancestry of Native Americans). What does seem clear is that this element is deeply diverged from other West Eurasian populations, on the order of ~20 to 30 thousand years. And the ANE contributes about half the ancestry of the EHG (the rest is WHG-like). The descendants of the Yamnaya people brought this component all throughout Europe, with the exception of the Sardinians and Sicilians, likely isolated because of their island position in the Mediterranean (Sicilians have later Near Eastern admixture as well). But this is not limited to Europeans, as a substantial proportion of Native American and West and South Asian ancestry (at least among the Kalash) also exhibits connections to this component. Allentoft et al., like Haak et al., point out that there was likely structure in this broader group. That is, the ANE themselves were diversified, with the ancestors of the element in Native Americans and Europeans different from that which contributed to the Siberian component.
In fact I have talked to researchers who believe that the term “Ancient North Eurasian” is misleading, as there is little clarity on the distribution of this group (the highest inferred fractions in Eurasia are in the North Caucasus). It is feasible that the Kalash have a different ANE source than Europeans.

A key issue to note, and one that confuses some people, is that the ancestry of groups such as the Yamnaya exhibited commonalities with other groups across Eurasia. Therefore, if one population replaced a similar one, the change in admixture components inferred by model-based programs may not be as extreme as you would think. To make concrete what I’m getting at, the population exchange between Greece and Turkey in the 1920s was far more impactful demographically than simple before-and-after admixture estimates would suggest (since genetically the two groups were very similar). The figure from Haak et al. does not use admixture components that break out naturally, but their inferred demographic mixes, taking into account the genetic character of the putative ancestral populations. The blue component refers to WHG, but WHG-like ancestry is also in both the green (Yamnaya) and orange (EEF) elements (this is why I’m saying it is likely that most modern Europeans are >50% WHG-like).
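The arithmetic behind that last point is just a weighted sum: each labeled component contributes its weight times the WHG-like share hidden inside it. The sketch below is my own illustration, not a method from Haak et al. The modern component weights and the WHG-like share within EEF are hypothetical placeholders, not published estimates. The Yamnaya share of 0.25 simply follows the rough halves described above (EHG about half of Yamnaya ancestry, WHG-like about half of EHG).

```python
def whg_like_fraction(weights, whg_like_share):
    """Total WHG-like ancestry implied by a mix of ancestral components.

    weights: component -> fraction of a modern genome (must sum to 1).
    whg_like_share: component -> fraction of that component which is itself WHG-like.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"component weights sum to {total}, not 1")
    return sum(w * whg_like_share[c] for c, w in weights.items())


# Hypothetical placeholder values for illustration only, NOT published estimates.
weights = {"WHG": 0.2, "EEF": 0.4, "Yamnaya": 0.4}
whg_like_share = {
    "WHG": 1.0,
    "EEF": 0.6,       # assumption: unknown, depends on where EEF got its WHG-like ancestry
    "Yamnaya": 0.25,  # ~half EHG x ~half WHG-like within EHG, per the rough figures above
}
print(round(whg_like_fraction(weights, whg_like_share), 2))  # 0.54
```

In this toy example the explicitly "blue" WHG bar is only 0.2, yet the total WHG-like ancestry is above half, because most of it is carried inside the other two components.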

One temporal pattern that Haak et al. emphasize in particular, but which seems clear in Allentoft et al. as well, is that non-Yamnaya ancestry slowly begins to rise again by the Bronze Age. Why? I will address that below. But Allentoft et al. have broader Eurasian samples, including likely Indo-European populations in the trans-Ural and trans-Altai regions. In both of these areas the successor cultures had EEF-like ancestry. That is, they were like the Corded Ware population, and unlike the parent Yamnaya group. This strongly implies a back-migration of this complex eastward from Eastern Europe, as far as western China, during the Bronze Age.

In The New York Times piece David Anthony states two things which puzzle me as an interested lay person without his expertise. First, he seems to think that the amalgamation of the Yamnaya and EEF-descended populations was not a warlike process. Specifically he says “It wasn’t Attila the Hun coming in and killing everybody.” This is a useful image, but let’s be honest and note that the Huns were not primary producers, and did not aim simply to increase pasturage by killing settled peoples, as Genghis Khan had wanted to do (see The End of Empire: Attila the Hun & The Fall of Rome). Rather, they conquered and subordinated other barbarian groups, and extorted tribute from the East Roman Empire. The demographic impact of the Huns came not directly from them, but from the fact that they and their successors (in particular the Avars) facilitated the migration of other groups, first the Goths, and later the Slavs. By the time of Attila barbarian leaders were well aware that the conquered were vital as economic producers, whose capture and subjugation would allow them to engage in status competitions of conspicuous consumption. I do not believe that this was quite the case in the Copper and Bronze Ages beyond the limes of the civilized world, which was then a small archipelago of literacy in a sea of barbarism. Both of the above papers indicate massive demographic disruption across Europe. Though war as we understand it is not necessarily inevitable for our species, between the rise of agriculture and the modern period it seems to have been very common. It is not a coincidence that the Scandinavian Corded Ware culture is also called the Battle-Axe culture. Yes, many archaeologists believe that the axes were primarily status symbols. I’m willing to bet many archaeologists are wrong. It’s been known to happen.

The second issue which Anthony brings up is the connectedness of the various post-Yamnaya cultures, in particular that of the earliest Indo-Europeans on the fringes of western China, 4,000 miles from their likely point of origin. The genetic characteristics of these eastern groups are also such that it is likely that there was gene flow from Europe, mediated by a common steppe culture. Anthony states that “I myself have a hard time wrapping my head around explanations for that”. This totally confuses me, because as a professional archaeologist he must know that widespread gene flow and cultural ties across the vast swath of the Eurasian heartland are not surprising at all! To Carl Zimmer I pointed out the example of the Göktürk Empire of the mid-6th century A.D., which expanded rapidly from the core Altai zone, and prefigured the later distribution of the Turkic peoples, from the Nile to the fringes of the Arctic Sea. Language and lifestyle mediate relationships and demographic contact. The peripatetic character of steppe peoples is well known and attested in the historical and semi-historical record. Groups such as the Huns, Avars, and Alans had inchoate origins in the heart of Eurasia, and moved back and forth along lines of cultural affinity as needed. Alans were serving under the Mongols in China in the 13th century, but 800 years earlier they had accompanied the Vandals to North Africa, and maintained a separate identity there until the conquest by Justinian. It seems entirely plausible that this pattern of hyper-mobility arose with agro-pastoralism along the whole range of continuous ecological appropriateness, only ending with the rise of the gunpowder empires and the crushing of the Oirats by the Manchus (with the tacit approval of Russia).

*Northern European archetypical physical characteristics are younger than the pyramids*

Spencer Wells, a new look in the world

Phylogenomics is still tangled and complicated, even with all these new results. I’ve only scratched the surface above. You really need to read the papers, and their supplements, to even get a sense of what’s going on (yes, ideally you’ll know what an f3 statistic is!). But the population genomics which gives us a sense of the character of natural selection and phenotype over time is much clearer. The suite of traits which we associate with white Europeans is quite possibly very recent, as late as post-Bronze Age. White supremacist scholars of the early 20th century who posited that ancient Egypt (in fact, all civilizations) was founded by blonde Nordic people turn out likely to be wrong, because these civilizations probably predate the existence of blonde Nordic people, both in their genetic structure and in their physical type (at least in any number).

The genetic architecture of pigmentation is something geneticists know a fair amount about, because genome-wide association has been very fruitful in this area. Unlike traits such as height, there is a large amount of between-population variation in pigmentation. And that variation is due in large part to a few genes of large effect. At SLC24A5 there is a SNP which accounts for around 1/3 of the melanin index difference between Europeans and Africans, using an admixed African American population to test the effect. As I have observed before, SLC24A5 in its derived form is as close to fixed as you can get in Europeans. In the 1000 Genomes data set of thousands of individuals I found only a few European heterozygotes carrying the ancestral copy. In the Middle East this allele is also near fixation, though not quite. As you can see from the figure I adapted from Allentoft et al., among South Asians the derived allele is also at high frequency. My whole family is homozygous for the “European” variant. There is some suggestive evidence that this haplotype derives from the Middle East. It was only at low frequency among European hunter-gatherers[3]. But by the Bronze Age it had gone to fixation in Europe, as well as on the Eurasian steppe.

Of more interest to me is the trajectory of SLC45A2. The derived allele is nearly fixed in modern European populations, though not to the same extent as at SLC24A5. In Iberian and Sardinian populations the ancestral type is in the range of ~10%. During the Bronze Age in Europe the derived allele was only at ~50% frequency, which is in the range of modern Middle Eastern populations. It was at even lower frequency on the steppe, from which the putative Indo-Europeans migrated.

Finally, in this panel for pigmentation they included a major SNP in OCA2-HERC2 region. This locus is famous for being involved in blue-brown eye color variation, explaining 75% of the variance, and also exhibiting the third longest haplotype in the European genome. Naively projecting from these SNPs one could credibly argue that the ancient hunter-gatherers of Europe at the beginning of the Holocene were dark-skinned and blue-eyed! The Bronze Age European samples, which in this case are biased toward Northern Europeans, had a range of genetic variation equivalent to modern Southern Europeans. The people of the steppe did not seem to have blue eyes at all.


These results align perfectly with those in Mathieson et al. One thing to observe is that the Paleolithic samples, which have a much deeper time depth, are “ancestral” at all these positions. Even if the sample size is small (N = 4), they’re from diverse times and places. Does that mean that they were much darker than even the Holocene hunter-gatherers of Europe? As some have pointed out, we can’t just extrapolate in a straight line from the genetic architecture of today to the past. Remember that Neanderthals exhibited pigmentation polymorphism, but of a different sort. A deeper functional analysis may reveal that Paleolithic Europeans had alleles which also resulted in lighter skin, but different ones from those segregating as polymorphisms today. I have already stated that I doubt much of modern European ancestry derives from before the Last Glacial Maximum. Predicting phenotype from modern genetic variation may give these sorts of results because ancient populations arrived at the same trait value via a different set of polymorphisms. Genotype-phenotype maps derived from modern populations may be a poor predictor of the relationship 30,000 years ago. Why would one think that selection upon variation in pigmentation began at the cusp of the Holocene?

But, I do think we can predict with more confidence the nature of phenotypes for populations which are genetically much closer to modern ones. Bronze Age Europeans fit that bill. And, I know something personally about what the appearance of individuals during this period might have been based on genetic architecture: both my children exhibit a genotype profile on pigmentation loci similar to many Bronze Age Europeans. That is, they’re fixed for the derived variant of SLC24A5, and are heterozygotes at SLC45A2 and OCA2-HERC2 (my son, but not my daughter, is a heterozygote at KITLG; it does seem to make a difference in hair color). In terms of just their complexion they could pass as indigenous Southern Europeans, but definitely not Northern European.

*Culture leads genes by the leash*

Another major finding of Mathieson et al. and Allentoft et al. is that the derived allele found across West Eurasians that allows them to digest lactose sugar as adults has been sweeping up in frequency over the last 4,000 years. This allele spans a diverse array of populations, from Basques to South Asians. With pigmentation it seems that we need to consider jointly the impact of ancestry and selection (in South Asia derived SLC24A5 frequencies are definitely a function of both selection and descent). But with LCT it seems likely that selection is paramount. The predominant genetic character of Eurasia was established by the Bronze Age, but the frequency of the lactase persistence allele was then still far lower. Tests of natural selection which focus on patterns of haplotype variation have long detected a huge signal at LCT, so this is not surprising.
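To get a feel for how selection alone can push an allele from rare to common within about 4,000 years, here is a minimal deterministic sketch. It assumes genic selection with a constant advantage s and no drift, a 25-year generation time, and a starting frequency of 5%; all of these values are hypothetical and are not estimates from Mathieson et al. or Allentoft et al.

```python
def selection_trajectory(p0, s, generations):
    """Allele frequency under constant genic selection, ignoring drift.

    Each generation the favored allele has relative fitness (1 + s):
        p' = p * (1 + s) / (1 + p * s)
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


GENERATION_YEARS = 25                    # assumption
generations = 4000 // GENERATION_YEARS   # 160 generations
# Hypothetical starting frequency and selection coefficient.
traj = selection_trajectory(p0=0.05, s=0.05, generations=generations)
for year in (0, 1000, 2000, 3000, 4000):
    print(year, round(traj[year // GENERATION_YEARS], 3))
```

Under these placeholder values the allele climbs from 5% to above 99% in 160 generations. The point is only that a modest but sustained advantage is enough for a sweep on this time scale; the actual strength of selection at LCT is an empirical question this toy model does not answer.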

Intriguingly, Allentoft et al. indicate that though the Bronze Age steppe populations had low frequencies of the derived allele, they did seem to have a higher frequency than contemporary populations. This suggests that the origin of this haplotype, which spans the whole range of Indo-European speaking populations, and also extends into Finnic groups and the Basques, may still be attributed to the Yamnaya complex. In The 10,000 Year Explosion Greg Cochran proposed the hypothesis that the favored mutation at LCT enabled the spread of Indo-European pastoralists. These results are not strong support for that direct causal relationship; rather, it strikes me that the ascendancy of the pastoralists drove the selection pressures for the allele in question. Biology did not drive culture; culture drove biology. The milk-drinking Celts and Germans encountered by Julius Caesar 2,000 years ago may still have been in the middle stages of adaptation to the agro-pastoralist lifestyle slowly being perfected by their ancestors.

*As the white man is, so shall we all be*

A new look as well

It is a running joke of mine on Twitter that the genetics of white people is one of those fertile areas of research that seems never to end. Is it a surprise that the ancient DNA field has first elucidated the nature of this obscure foggy continent, before the rich histories of the untold billions of others? It’s funny, and yet these stories, true tales, do, I think, tell us a great deal about how modern human populations came to be in the last 10,000 years. The lessons of Europe can be generalized. We don’t have the rich stock of ancient DNA from China, the Middle East, or India. At least not enough to do population genomics, which requires larger sample sizes than a few. But, climate permitting, we may. And when that happens I am confident that very similar stories will be told. Using extant genetics we can already infer that modern populations in South Asia are a novel configuration of genotypes and phenotypes. The same is true in Southeast Asia, the Americas, and probably Africa. Probably the same in East Asia. Perhaps in Oceania. Even without admixture humans evolve and change in situ, but with admixture the variation increases, and the parameter space of adaptation becomes richer and more flexible.

In Isaac Asimov’s later Foundation books he touched upon the existence of racial diversity in the future (from what I recall his earlier works from the pulp era were whites-only galaxies). At one point Hari Seldon encounters someone whose physical appearance seems to be East Asian, and they discuss the strangeness of people with East Asian ancestry being termed “Easterners” and those with European appearance being “Westerners.” With a loss of memory of the ancient distribution of these populations on the home planet only the shadow of a semantic recollection exists as a ghost in the galaxy-spanning Empire based out of Trantor. But of course tens of thousands of years in the future, even barring genetic and mechanical modification, it is unlikely that modern racial types will persist in any way we would recognize them.

But these results coming out of ancient DNA are telling us that what is likely to be true for the far future was also true for the recent past. White Europeans are a new type. But so are brown South Asians. Ethiopians have a recent ethnogenesis, as do most North African groups. The Bantu expansion has reshaped the face of Africa on the edge of the historical horizon. And so forth. In the big picture Young Earth Creationists are wrong, but in the specifics the idea that the sons of Noah populated the world ~5,000 years ago is not looking as crazy as it once did! Human genetic variation across Eurasia today may be mostly clinal, but in the recent past it was not. Rather, it was characterized by sharp discontinuities, and by isolated local populations whose ancestry diverged from that of their neighbors.

*And culture made man in its image*

About ten years ago it was common in paleoanthropology to assume that human beings emerged almost fully formed ~50,000 years ago, and wiped out all the others in a genocidal wave of advance. Richard Klein advanced this model in The Dawn of Human Culture. Klein’s thesis was that some stochastic event, a mutation, resulted in the punctuated emergence of a new species, our own. This singular genetic event allowed for the emergence of fully formed linguistic faculties in our lineage, which allowed for the development of cultural flexibility, which in turn made the other human lineages evolutionary dead ends. It was a simple and elegant story. It appealed to the principle of parsimony. The reality of “archaic” admixture was a difficulty for Klein’s model, evidenced by the fact that he voiced his skepticism of genetic claims of admixture in The New York Times after most others had moved on. For Klein a biological change explained the rise and success of our species, not a cultural one.

At the time I found the thesis compelling. We were after all a very special species. Modern Homo made it to Oceania and the New World. Something must have happened. Something big. What else could explain our rapid expansion and marginalization of other lineages? I’m a biologist, and so biology is an appealing causal mechanism.


*The luck of the English facing the ocean*

At about the same time the evidence for Neanderthal admixture came out, Luke Jostins posted results which showed that other human lineages were also undergoing encephalization, before their trajectory was cut short. That is, their brains were getting bigger before they went extinct. To me this suggested that the broader Homo lineage was undergoing a process of nearly inevitable change due to a series of evolutionary events very deep in our history, perhaps ancestral on the order of millions of years. Along with the evidence for admixture it made me reconsider my priors. Perhaps some Homo lineage was going to expand outward and do what we did, and perhaps it wasn’t inevitable that it was going to be us. Perhaps the Neanderthal Parallax scenario is not as fantastical as we might think?

Consider the case of Europe around 1600. In England and northern Germany (or what was to become northern Germany) you have two Protestant and genetically similar populations. But by 1850 it looked as if England was going to overtake Germany demographically in a broader genetic sense. James Belich’s Replenishing the Earth reviews the history of this period, when England spearheaded a demographic revolution far out of proportion to what one might have predicted in the year 1000. But by 2000 Germany, or rather Germans, had caught up somewhat. How? Millions of Germans migrated to the United States, starting in very large numbers in the mid-19th century, and were “picked up” by the demographic revolution which was the United States. The point is that contingencies of history, cultural and social, rather than biology, explain the trajectory of the gene pool over time. Much of the human past, and the sharp fluctuations in gene frequencies, might be driven by the long and forceful arm of culture.

In the treatment above I note that the EEF farmers who by and large replaced the indigenous hunter-gatherer groups in what is now Southern Europe were themselves a compound. The hunter-gatherer ancestry within the EEF was far more successful than that of those they replaced, but the only reason that this was so was geographic coincidence. The WHG-like groups absorbed into the EEF were positioned further east, and so closer to the initial locus of expansion of Neolithic farmers. Similarly, the Neanderthal admixture into modern populations was almost certainly localized to particular groups. This is not to say that there are no biological differences between human populations which may explain a wide range of phenomena. Anyone looking at the skull of a Neanderthal and that of a modern human knows there are. There are also likely bio-behavioral differences between extant populations. Gene-culture coevolution is a real process, even if the details need to be worked out. But the interplay between biology and culture is complex, and in many cases cultural changes are driving the biological change, and then fixing differences which are advantageous to the “winners” (lactase persistence seems to be a perfect case of this). But just as in the individual case, we must also remember that winning is often in part a function of being lucky. Natural selection, generally thought of as a deterministic process, is also to some extent stochastic[4].

*From genetic islands to a roiling sea of humans*

One of the most shocking things for many of the geneticists working in the area of ancient DNA, and encountering the variation of the past, is the high level of population structure. That is, you have groups co-resident for many generations who nevertheless exhibit genetic distances of intercontinental scale. But as I stated above, David Reich himself found the same results for India. And in Africa you have populations that have lived in long symbiosis, such as the pygmy groups of the Congo and their agricultural neighbors, who are genetically very different, and have been for tens of thousands of years. Allentoft et al. dryly observe that “These results are indicative of significant temporal shifts in the gene pools and also reveal that the ancient groups of Eurasia were genetically more structured than contemporary populations.”
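“Genetic distance” in statements like this is often summarized with F_ST. As a hedged illustration, not the specific statistic or data behind the quoted results, here is Hudson’s F_ST estimator computed from per-SNP allele frequencies in two sampled populations and combined across SNPs as a ratio of averages. The frequencies and sample sizes below are made up.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson's F_ST for two populations, combined across SNPs as a ratio of averages.

    p1, p2: sample allele frequencies per SNP.
    n1, n2: number of sampled chromosomes in each population (> 1).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference, corrected for finite sample size.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: chance that two chromosomes, one from each population, differ.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float(num.sum() / den.sum())


# Made-up frequencies at four SNPs, 100 chromosomes sampled per population.
similar = hudson_fst([0.20, 0.50, 0.70, 0.40], [0.25, 0.45, 0.65, 0.40], 100, 100)
divergent = hudson_fst([0.10, 0.90, 0.80, 0.30], [0.85, 0.15, 0.20, 0.75], 100, 100)
print(similar, divergent)
```

With these made-up inputs the first pair comes out near zero (here slightly negative, which the sample-size correction permits) and the second far higher. Summing numerators and denominators separately, rather than averaging per-SNP ratios, keeps SNPs with low diversity from dominating the estimate.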

About 10 years ago I read Nicholas Dirks’ Castes of Mind. Dirks is an eminent scholar who is now the chancellor of UC Berkeley. He emphasizes the power of European categories and systematization in creating the modern caste system. I don’t want to reduce his argument to a caricature. Obviously caste predates European colonialism, and Dirks would admit this. But in Castes of Mind it is hard to shake the feeling that he believes that the British imposition of formalization made it what we truly understand it to be today: that caste has to be understood as a contemporary and early modern phenomenon, rather than an ancient one that was a structural feature of South Asian society.

The genetic evidence is clear now, and it paints a very different landscape. Many of the caste, even jati, boundaries we see today are thousands of years old. Endogamy long predates the British. It may predate the Aryans! Rather than the British, or the Aryans, inventing caste, this form of ethnic segregation may date to the initial admixture event, to be reinvented and modified with each new population which arrives and imposes its hegemony on the subcontinent. In The New York Times David Reich states “You have groups which are as genetically distinct as Europeans and East Asians. And they’re living side by side for thousands of years.” He then goes on to say “There’s a breakdown of these cultural barriers, and they mix,” alluding to the rise in WHG ancestry in farmer samples over time. Of course it is interesting to remember that Reich’s own work on India has highlighted exactly how persistent caste has been, and how it maintains genetic variation in a localized region that is often nearly intercontinental in magnitude.

We can never know if 6,000 years ago the LBK people, the first farming culture of Northern Europe, imposed a caste-like system of segregation when they encountered the indigenous hunter-gatherers. Nor can we say with total confidence whether their relationship exhibited a symbiosis analogous to that between the Bantu agriculturalists and the pygmies of the Congo (though do note that in these scenarios the Bantu communities are of higher status, and individual pygmies often have a semi-slave status). But we need to look to what cultural evolutionary models and empirical results can tell us to make sense of these patterns. Ancient DNA can tell us very concretely the details of changes in allele frequencies. We can somewhat confidently reconstruct the faces and complexions of our ancestors. The questions population genomicists ask and answer in relation to animal models are relatively cleanly addressed by these data sets, assuming the sample sizes are large enough. But humans are the cultural animal par excellence, and that is the critical new variable which will require a new set of scholars to come together and create a truly multidisciplinary understanding of the human past, present, and perhaps future. Powerful genomic techniques which produce results with implications for the study of human history need to leverage the full array of scholars who study human historical science.

1 – The three-fold copying is an important matter, because the different cultures had different preferences and goals. The Arab effort for example focused mostly on the philosophical production of the ancients. Without the Byzantines we would have far less of the humanistic production of Classical Greece, in particular the theatrical tradition.

2 – Much of what is known about the diplomatic history of the Bronze Age Near East has been preserved in cuneiform tablets. Though unwieldy, this form of writing on clay tablets is obviously more robust and less dependent upon copying than parchment and papyrus which came later.

3 – I would be curious to know if it is the same haplotype as is currently common in Eurasia.

4 – New mutations will usually go extinct, even if they are favored, in the initial generations. It is only when the frequency becomes high enough due to chance that selection will inevitably drive its frequency up, perhaps to fixation.
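The point of footnote 4, that a favored new mutation is usually lost anyway, can be checked with a small Wright-Fisher simulation. This is a sketch under simplifying assumptions of my own: a haploid population of constant size N, genic selection with advantage s, and a single new copy at the start; the values N = 1,000 and s = 0.02 are arbitrary. Classical theory (Haldane’s approximation) puts the fixation probability of such a mutation at roughly 2s, here about 4%, so most runs should end in loss despite the advantage.

```python
import numpy as np


def fixation_fraction(N=1000, s=0.02, trials=2000, seed=0):
    """Fraction of runs in which a single new favored allele reaches fixation.

    Haploid Wright-Fisher model: each generation, N offspring are drawn
    binomially from the parental frequency reweighted by relative fitness 1 + s.
    """
    rng = np.random.default_rng(seed)
    fixed = 0
    for _ in range(trials):
        count = 1  # one new mutant copy
        while 0 < count < N:
            p = count / N
            p_sel = p * (1 + s) / (1 + p * s)  # deterministic effect of selection
            count = rng.binomial(N, p_sel)     # random sampling of offspring (drift)
        fixed += int(count == N)
    return fixed / trials


print(fixation_fraction())  # Haldane's approximation predicts about 2s = 0.04
```

Setting s = 0.0 gives the neutral case, where the expected fixation fraction is 1/N; the favored allele does far better than that, yet still fails in the large majority of runs.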

“What it begins to suggest is that we’re looking at a ‘Lord of the Rings’-type world – that there were many hominid populations,” says Mark Thomas, an evolutionary geneticist at University College London who was at the meeting but was not involved in the work.

- Mark Thomas, as reported by Nature

This is in reference to the ancient DNA meeting where David Reich reported that the Denisovans, an exotic archaic population which contributed ~5-10 percent of the ancestry of Papuans, was itself a synthesis of Neandertals and a mysterious group currently unknown. This is not surprising, as the broad outlines of these results were presented at ASHG 2012, though no doubt they’re moving closer to publication. But for this post I want to shift the focus to a different time and place, after the ancient admixture with archaic lineages, and to the reticulation present within our own.

But first we need to backtrack a bit. Let’s think about what we knew in the early 2000s. If you want a refresher, you might check out Spencer Wells’ The Journey of Man or Stephen Oppenheimer’s Out of Eden, which focused on Y and mtDNA lineages respectively. These books were capstones to the era of uniparental phylogeographic analysis of the spread and diversification of anatomically modern African hominids ~50-100,000 years ago. Rather than looking at the whole genome (the technology was not there yet) these researchers focused on pieces of DNA passed down via direct maternal or paternal lineages, and reconstructed clean phylogenetic trees using a coalescent framework. Broadly speaking these trees were concordant, and told us that our lineage, all extant humans, derived from a small African population which flourished ~100,000 years ago. These insights suffused the work of human evolutionary thinkers in other disciplines (see The Dawn of Human Culture). H. sapiens sapiens, veni, vidi, vici.

After that initial “Out of Africa” migration a series of bottlenecks and founder events led to the expansion of our lineage, as it replaced all predecessors. By the Last Glacial Maximum, ~20-25,000 years ago, the rough outlines of human genetic variation were established (with the exception of the expansion into the New World). We now know that this picture is, on the most charitable interpretation, very incomplete, and on the least charitable, highly misleading.

Reticulation. Graphs. Admixture. These words all point to the reality that rather than being the culmination of deep-rooted regional populations which date back to the depths of the Pleistocene, most modern humans are recombinations of ancient lineages. On the grandest scale this is illustrated by the evidence of ‘archaic’ ancestry in modern humans. But even more pervasively we see evidence of widespread admixture between distinct lineages in the formation of major world populations which we think of as archetypes. This is true for Amerindians, South Asians, and Europeans. It is also the case for Ethiopians and Australian populations. A major problem crops up when we talk about extinct ancient populations which were the founding constituent elements of modern ones: it doesn’t make sense to use modern referents when they are simply recombinations of what they are describing. But language and history being what they are, we can’t change the awkwardness of talking about “Ancient North Eurasians,” anodyne and somewhat incoherent at the same time (Eurasia is a modern construct with contemporary historical salience).

Into the mix comes another ancient DNA paper which reconstructs the genome of a boy who lived in Siberia, near Lake Baikal, somewhat over 20,000 years ago. It’s titled Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Here’s the topline finding: a substantial minority of the ancestry of modern Native Americans derives from a North Eurasian population which has closer affinities to West Eurasians than East Eurasians. And this is an old admixture event. In the paper itself they observe that all “First American” populations seem to exhibit the same admixture distance to the Siberian genome. These results are also broadly consistent with the admixture of this population in Western Eurasia, especially northeast Europe. As among Amerindian populations, it seems that this element is a substantial minority across Europe as a whole, and perhaps at parity in some populations, such as Finns.

To the left you see the geographical affinities of the MA-1 Siberian sample. It is shifted toward West Eurasians in the PCA. But on the map with circles representing populations, the evidence of admixture between Amerindians and MA-1 is clear in the shading. The statistic used, f3, looks for complex population history between an outgroup (X) and a putative clade. From this test it is evident that Amerindians had some admixture related to MA-1. Because of the dating of the Siberian remains it does not seem likely that admixture was from Amerindians to West Eurasian and related populations. Rather, the reverse seems more plausible. You can also see from the map the close affinities of MA-1 with particular European and Central Asian populations. This is intriguing, and requires further follow up. Though MA-1 and its kin were closer to West Eurasians than East Eurasians, it still seems likely that there was an early divergence between the populations of north-northeast Eurasia and those of the southwest. Eventually they came back together in various proportions to produce modern Europeans, but it seems likely that during the Pleistocene these two groups went their own way.
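To make the f3 idea a bit more concrete, here is a minimal Python sketch of the statistic in its standard three-population form, f3(C; A, B), computed as the average over SNPs of (c - a)(c - b), where c, a and b are allele frequencies. This is an illustration under simplifying assumptions, not the paper’s pipeline: the allele frequencies are toy numbers invented for the example, and the finite-sample bias correction and block-jackknife standard errors that real tools apply are left out.

```python
def f3(c_freqs, a_freqs, b_freqs):
    """Return f3(C; A, B), the mean over SNPs of (c - a) * (c - b).

    Each argument is a list of allele frequencies, one entry per SNP, in the
    same SNP order. This is the uncorrected form: real tools also remove a
    finite-sample bias term and estimate standard errors by block jackknife.
    """
    if not (len(c_freqs) == len(a_freqs) == len(b_freqs)) or not c_freqs:
        raise ValueError("need equal-length, non-empty frequency lists")
    total = 0.0
    for c, a, b in zip(c_freqs, a_freqs, b_freqs):
        total += (c - a) * (c - b)
    return total / len(c_freqs)


# Toy frequencies at four SNPs (illustrative only, not data from the paper).
pop_a = [0.1, 0.8, 0.4, 0.9]
pop_b = [0.7, 0.2, 0.6, 0.1]
# A population formed by an even mix of A and B, with no drift afterwards.
admixed = [(a + b) / 2 for a, b in zip(pop_a, pop_b)]

print(f3(admixed, pop_a, pop_b))  # negative: the signature of admixture
print(f3(pop_a, pop_a, pop_b))    # 0.0: C identical to one source
```

A clearly negative value is the classic admixture signal. If C is instead set to a distant outgroup (the “outgroup f3” form common in ancient DNA work), the same formula becomes a measure of the drift that A and B share.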

There are hints of this in the TreeMix plot to the right. Note how drifted MA-1 is in relation to other West Eurasians (the branch is long). I suspect some of this is due to the fact that this individual lived nearly 1,000 generations in the past. Not only is it difficult to name ancient populations after modern ones, I suspect that some of the variation in the ancient populations has been lost, and so they seem exotic and difficult to fit into a broader phylogenetic framework (though the researchers did have hundreds of thousands of SNPs to work with). And yet MA-1 can be fitted into the broader framework of populations which went north or west after leaving Africa because of the mtDNA and Y chromosome results. Both of these indicate that MA-1 was basal to West Eurasians, with haplogroup U for mtDNA, and R for the Y lineage.
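The “nearly 1,000 generations” figure and the long drift branch can both be given a rough scale. The sketch below assumes ~24,000 years, 25-year generations, and a constant-size Wright-Fisher population. It uses the textbook expectation for accumulated drift, F = 1 - (1 - 1/(2N))^t, which is also the expected fractional loss of heterozygosity. The effective sizes are hypothetical, and this is not how TreeMix itself estimates branch lengths.

```python
def generations_elapsed(years, generation_time=25.0):
    """Convert calendar years to generations (25-year generations assumed)."""
    return years / generation_time


def drift_since_split(effective_size, generations):
    """Expected accumulated drift, F = 1 - (1 - 1/(2N))**t, for a
    Wright-Fisher population of constant diploid effective size N.

    F is also the expected fractional loss of heterozygosity, so it doubles
    as a rough measure of how much variation has been lost.
    """
    return 1.0 - (1.0 - 1.0 / (2.0 * effective_size)) ** generations


t = generations_elapsed(24_000)
print(t)  # 960.0
for n_e in (1_000, 10_000):  # hypothetical effective sizes
    print(n_e, round(drift_since_split(n_e, t), 3))
```

The point of the comparison is qualitative: over the same span, a small isolated population accumulates several times more drift than a large one. That is one plausible way an ancient sample ends up on a long branch.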

To really understand what’s going on here is going to take a while. A later subfossil, dated to ~15,000 years before the present, yielded some genetic material, and exhibited continuity with MA-1. Since MA-1’s strongest modern affinities are with West Eurasians and Native Americans, rather than with the East Eurasian-affiliated populations found in the region today, this suggests that Siberia may have had massive population replacement relatively recently. We know this was likely the case elsewhere. Reading Jean Manco’s Ancestral Journeys, one possible scenario is that Pleistocene Europeans were MA-1 like, but were replaced by Middle Eastern farmers in the early Neolithic. Later eruptions from Central Asia then brought mixed populations (Indo-Europeans?) with substantial MA-1 affinities to the center of European history.

Finally, one must make a note of phenotype. The authors looked at 124 pigmentation related SNPs (see supplemental). The conclusion seems to be that MA-1 was not highly de-pigmented, unlike most modern Northern Europeans. This stands to reason, as substantial ancestry from a highly de-pigmented population would presumably have left pigmentation variation in Amerindians which does not seem to be present. The authors do suggest, though, that coarse morphological variation among early First Americans (e.g., Kennewick Man) might be due to this population, which had West Eurasian affinities.

Where does this leave us? More questions of course. Though I’m confident the befuddlement will clear up in a few years….

Citation: doi:10.1038/nature12736

Addendum: Please read the supplements. They’re rich enough that you don’t need to read the letter if you don’t have access. Also, can we now finally bury the debate about when east and west Eurasians diverged? Obviously it can’t have been that recent if a >20,000 year old individual had closer affinity to western populations.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Archaeology 
Credit: Eric Hunt

I do love me some sprouts! Greens, bitters, strong flavors of all sorts. I’ve always been like this. Some of this is surely environment. My family comes from a part of South Asia known for its love of bracing and bold sensation. But perhaps I was born this way? There’s a fair amount of evidence that taste has a substantial genetic component. This does not mean genes determine what one tastes, but it certainly opens the door for passive gene-environment correlations. If you do not find a flavor offensive, you are much more likely to explore its depths, and cultivate your palate.


Dost thou dare?
Credit: W.A. Djatmiko

And of course I’m not the only one with a deep interest in such questions. With the marginal income available to us many Americans have become “foodies,” searching for flavor bursts and novelties which their ancestors might never have been able to comprehend. More deeply in a philosophical sense the question of qualia reemerges if there is a predictable degree of inter-subjectivity in taste perception (OK, qualia is always there, though scientific sorts tend to view it as intractable in a fundamental sense).

But there’s heritability, and then there’s genes. We know that perception in some ways is heritable, but what is perhaps more interesting is if you can peg a specific genomic location to it. Then the evolutionary story becomes all the richer. And so it is with the gene TAS2R16, where a nonsynonymous mutation at position 516 seems to result in heightened sensitivity to bitter tastes. More specifically, it’s rs846664: the derived T allele is fixed outside of Africa, while the ancestral G allele still segregates at appreciable frequencies within African populations. A new paper in Molecular Biology and Evolution puts this locus under a microscope, though it does not come up with any clear conclusions. Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa presents some interesting findings. First, let’s look at the distribution of the variation in their sample populations at the SNP of most particular interest:

Region               Population          G (ancestral) allele frequency at 516
Outside of Africa    Non-Africans        0.000
Ethiopia             Semitic             0.059
Tanzania             Sandawe             0.083
Ethiopia             Omotic              0.093
Ethiopia             Cushitic            0.095
Tanzania             Iraqw               0.111
West Central Africa  Fulani              0.114
Kenya                Niger-Kordofanian   0.133
Ethiopia             Nilo-Saharan        0.156
Kenya                Afroasiatic         0.162
West Central Africa  Niger-Kordofanian   0.214
Kenya                Nilo-Saharan        0.225
Kenya                Luo                 0.250
Central Africa       Niger-Kordofanian   0.329
Tanzania             Hadza               0.333
Central Africa       Bulala              0.361
Central Africa       Nilo-Saharan        0.367
West Central Africa  Afroasiatic         0.462
West Central Africa  Nilo-Saharan        0.500

As you can see, T is fixed outside of Africa, while the frequency of G varies considerably across African populations. Previous work implied this, though coverage within Africa was not good. One thing to observe, though, is that the frequency of T within Africa can not be explained by recent Eurasian admixture. The frequency is way too high for that to be the sole explanation. Among the Hadza, for example, G is at ~33%, so T is at ~67%. If the Hadza had originally carried only G, and all of their T had come from Eurasians fixed for it, then about two-thirds of their ancestry would have to be of Eurasian provenance, and there is no evidence of anything of the sort (the Hadza being one of the three major groups of African hunter-gatherers, along with the Bushmen and Pygmies).
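One way to check the admixture argument is to ask how much Eurasian ancestry each population in the table would need if Eurasian gene flow were the only source of T in Africa. The sketch below solves the simple two-way mixing equation p_T = m * 1.0 + (1 - m) * 0.0 for the admixture fraction m, using the G frequencies from the table above. It assumes, purely for argument’s sake, that the pre-admixture African source carried no T at all and that Eurasians are fixed for T. It also ignores drift and selection after admixture.

```python
# Frequency of the ancestral G allele at position 516, from the table above
# (Africa only; it is 0.000 outside of Africa).
G_FREQ = {
    "Ethiopia Semitic": 0.059,
    "Tanzania Sandawe": 0.083,
    "Ethiopia Omotic": 0.093,
    "Ethiopia Cushitic": 0.095,
    "Tanzania Iraqw": 0.111,
    "West Central Africa Fulani": 0.114,
    "Kenya Niger-Kordofanian": 0.133,
    "Ethiopia Nilo-Saharan": 0.156,
    "Kenya Afroasiatic": 0.162,
    "West Central Africa Niger-Kordofanian": 0.214,
    "Kenya Nilo-Saharan": 0.225,
    "Kenya Luo": 0.250,
    "Central Africa Niger-Kordofanian": 0.329,
    "Tanzania Hadza": 0.333,
    "Central Africa Bulala": 0.361,
    "Central Africa Nilo-Saharan": 0.367,
    "West Central Africa Afroasiatic": 0.462,
    "West Central Africa Nilo-Saharan": 0.500,
}


def implied_eurasian_fraction(g_freq, t_in_african_source=0.0, t_in_eurasians=1.0):
    """Solve p_T = m * t_in_eurasians + (1 - m) * t_in_african_source for m.

    The default assumes the pre-admixture African source carried no T at all,
    which is the only way Eurasian admixture could be the sole explanation.
    """
    t_freq = 1.0 - g_freq
    return (t_freq - t_in_african_source) / (t_in_eurasians - t_in_african_source)


for population, g in sorted(G_FREQ.items(), key=lambda item: item[1]):
    print(f"{population:40s} {implied_eurasian_fraction(g):.2f}")
```

Under that assumption every population in the table would have to be at least half Eurasian in ancestry. That is the sense in which the frequencies are too high for admixture to be the whole story.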

Within the paper the authors resequenced ~1,000 base pairs of an exonic region of this gene (the stuff that codes for amino acids) across diverse African populations. What they discovered is that of the SNPs segregating, 516 in particular was critical in effecting phenotypic change. Not only did individuals with the T variant exhibit notably stronger bitter sensitivity, but in vitro expression with a reporter was elevated. Because they had such a densely sampled genomic region they could perform various nucleotide based tests to detect natural selection, and attempt coalescent models to infer genealogical history.
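Why a single-base change at position 516 can alter the protein comes down to codon arithmetic. The sketch below assumes that 516 is counted from the first base of the coding sequence. Under that assumption, position 516 is the third base of codon 172. The reference codon AAG used here is an assumption for illustration, chosen to match the K172N label commonly attached to this variant in the literature; it is not taken from the post or the paper. Only a small slice of the standard genetic code is included.

```python
# A tiny slice of the standard genetic code; enough for this example.
CODONS = {"AAA": "Lys", "AAG": "Lys", "AAC": "Asn", "AAT": "Asn"}


def codon_of(cds_position):
    """Map a 1-based coding-sequence position to (codon number, position in codon)."""
    index = cds_position - 1
    return index // 3 + 1, index % 3 + 1


def substitute(codon, position_in_codon, new_base):
    """Return the codon with one base replaced (position is 1-based)."""
    i = position_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, offset = codon_of(516)
print(codon_number, offset)  # 172 3

# Assumption: the reference codon is AAG (Lys), matching the K172N label
# commonly used for this variant; not taken from the post or the paper.
ancestral_codon = "AAG"
derived_codon = substitute(ancestral_codon, offset, "T")
print(ancestral_codon, CODONS[ancestral_codon], "->", derived_codon, CODONS[derived_codon])
```

A third-position change is often synonymous, but here G to T swaps a lysine codon for an asparagine codon. That is what makes the change nonsynonymous.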

I’m going to spare you some of the gory details at this point. Here’s what they found. First, it does look like the region is under natural selection in many African populations, in particular the derived haplotype with T at 516 at the center. But this result is not reproduced across all tests. The coalescent simulations make clear why: the mutation is an old variant with deep roots in the hominin lineage. In other words this variation pre-dates H. sapiens. It looks like the T allele has rapidly increased in frequency relatively recently, though more on the order of ~50,000 years ago than ~10,000.* Basically around the time of the “Out of Africa” event. Additionally, there’s a tell-tale sign that this locus is subject to selection within Africa: the genetic differences across populations at TAS2R16 far exceed the genome-wide values (the Fst at this locus is in the top 1% of loci genome-wide within Africa). Finally, one should note that the G allele haplotypes seem to be much more strongly constrained, as if they’re under purifying selection. This means that the switch to T is not all gain.
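For a feel of what “genetic differences across populations” means numerically, here is a minimal sketch of Nei’s G_ST, one common Fst-type estimator, for a single biallelic SNP. It is computed from the African G frequencies in the table above, with populations weighted equally. The paper may well use a different estimator. Placing a locus in the top 1% also requires the same statistic computed across the genome, which is not reproduced here.

```python
def nei_gst(freqs):
    """Nei's G_ST for one biallelic SNP, weighting populations equally.

    H_S is the mean within-population expected heterozygosity and H_T is the
    expected heterozygosity of the pooled (averaged) allele frequency.
    """
    if not freqs:
        raise ValueError("need at least one population")
    k = len(freqs)
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    p_bar = sum(freqs) / k
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    return 0.0 if h_t == 0.0 else (h_t - h_s) / h_t


# G allele frequencies for the African populations in the table above.
african_g = [0.059, 0.083, 0.093, 0.095, 0.111, 0.114, 0.133, 0.156, 0.162,
             0.214, 0.225, 0.250, 0.329, 0.333, 0.361, 0.367, 0.462, 0.500]

print(round(nei_gst(african_g), 3))
```

Identical frequencies everywhere give 0. Populations fixed for different alleles give 1.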

At this point you may be ready for a story about how some African populations, like Eurasians, underwent a lifestyle change, and diet changes resulted in a shift in sensory perception. That does not seem to be the story. Rather, the authors did not seem to be able to agree upon a neat explanation for what is driving these recent sweeps up from ancient standing genetic variation. They do observe that the variation tends to cluster geographically, more so than the genome-wide results would imply. There’s likely some adaptation going on; they simply don’t know what. In the introduction and elsewhere you can see that variation at TAS2R16 does correlate with other traits. That is not too surprising given the relative ubiquity of pleiotropy: one gene with many effects.

Stepping outside of the implications of this specific result, let’s think about what might be a takeaway: something as essential as taste perception might be a side effect of other evolutionary processes. In other words, we don’t know what the phenotypic target of selection is in this case, but we do have a good handle on one of the major side effects, which is sensory perception. How one tastes seems like a big deal.** And there have been many theories propounded that variation in bitter sensitivity is due to adaptation to poisonous plants and such, but really no one knew, and that was just the most plausible of the low hanging fruit. With these results from Africa, where there is more variation in the trait and genes, and good geographic coverage, that seems to be an implausible model to adhere to (one would think the hunter-gatherer Hadza would exhibit the most sensitivity, no?). Many of the traits and tendencies which we humans see as fundamental, essential, and of great import may actually be side effects of powerful evolutionary forces hammering at the genetic-correlation matrices which define the hidden network of co-dependencies within the genome. So there, I said it. Life is an accident. Enjoy it.

Citation: Campbell, Michael C., et al. “Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa.” Molecular Biology and Evolution (2013): mst211.

* If it was closer to ~10,000 I think haplotype based tests would come back with something, but they do not.

** Some Epicureans might be accused of reducing the good to taste!

• Category: Science • Tags: Anthropology, Evolution, Evolutionary Genetics, Taste 
Distribution of SLC24A5 variation at SNP rs1426654. Credit, HGDP Browser

Nina Davuluri, Miss America 2014, Credit: Andy Jones

One of the secondary issues which cropped up with Nina Davuluri winning Miss America is that it seems implausible that someone with her complexion would be able to win any Indian beauty contest. A quick skim of Google Images for “Miss India” will make clear the reality that I’m alluding to. The Indian beauty ideal, especially for females, is skewed to the lighter end of the complexion distribution of native South Asians. Nina Davuluri herself is not particularly dark skinned if you compare her to the average South Asian; in fact she is likely near the median. But it would be surprising to see a woman who looks like her held up as conventionally beautiful in the mainstream Indian media. When I’ve pointed this peculiar aspect out to Indians* some of them will submit that there are dark skinned female celebrities, but when I look up the actresses in question they are invariably not very dark skinned, though they may be by comparison to what is the norm in that industry. But whatever the cultural reality is, the fraught relationship of color variation to aesthetic variation prompts us to ask: why are South Asians so diverse in their complexions in the first place? A new paper in PLoS Genetics, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, explores this genetic question in depth.

Much of the low hanging fruit in this area was picked years ago. A few large effect genetic variants which are known to be polymorphic across many populations in Western Eurasia segregate within South Asian populations. What this means in plainer language is that a few genes which cause major changes in phenotype are floating around in alternative flavors even within families among people of Indian subcontinental origin. Ergo, you can see huge differences between full siblings in complexion (African Americans, as an admixed population, are analogous). While loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.

So what’s the angle on this paper, you may ask? Two things. The first is that it has excellent coverage of South Asian populations. This matters because to understand variation in complexion you should probably look at populations which vary a great deal. Much of the previous work has focused on populations at the extremes of the human distribution, Africans and Europeans. There are obvious limitations to this approach: if you are looking at variant traits, then focusing on populations where the full range of variation is expressed can be useful. Second, this paper digs deeply into the subtle evolutionary and phylogenomic questions which are posed by the diversification of human pigmentation. It is often said that race is only skin deep, as if to dismiss the importance of human biological variation. But skin is a rather big deal. It’s our biggest organ, and the pigmentation loci do seem to be rather peculiar.

You probably know that on the order of ~20% of genetic variation is partitioned between continental populations (races). But this is not the case at all genes, and pigmentation genes tend to be particularly notable exceptions to the rule. In late 2005 a paper was published which arguably ushered in the era of modern pigmentation genomics, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. The authors found that one nonsynonymous mutation was responsible for on the order of 25 to 33% of the variation in skin color difference between Africans and Europeans. And the allele frequency was nearly disjoint across the two populations, and between Europeans and East Asians. When comparing Europeans to Africans and East Asians almost all the variation was partitioned across the populations, with very little within them. The derived SNP, which differs from the ancestral state, is found at ~100% frequency in Europeans, and ~0% in Africans and East Asians. It is often stated (you can Google it!) that this variant is the second most ancestrally informative allele in the human genome in relation to Europeans vs. Africans.
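To see why a SNP with frequencies of ~100% and ~0% is so ancestrally informative, compare it with the ~20% genome-wide figure. The sketch below computes two simple single-SNP measures: the absolute allele frequency difference (often called delta) and a Hudson-style Fst, 1 - H_within / H_between, in its population-level form without sample-size correction. The frequencies 0.99 and 0.02 are hypothetical stand-ins for “~100%” and “~0%”.

```python
def allele_frequency_difference(p1, p2):
    """Absolute allele frequency difference (delta), a simple informativeness score."""
    return abs(p1 - p2)


def hudson_fst(p1, p2):
    """Hudson-style F_ST for one biallelic SNP in two populations.

    F_ST = 1 - H_within / H_between, where H_within is the mean probability
    that two alleles drawn from the same population differ, and H_between is
    the probability that one allele from each population differs.
    """
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    h_between = p1 * (1 - p2) + p2 * (1 - p1)
    if h_between == 0:
        return 0.0  # both populations fixed for the same allele
    return 1 - h_within / h_between


# Hypothetical derived-allele frequencies echoing "~100%" vs "~0%".
europe, west_africa = 0.99, 0.02
print(round(allele_frequency_difference(europe, west_africa), 2))  # 0.97
print(round(hudson_fst(europe, west_africa), 2))
```

At a locus like this nearly all of the variation sits between the two populations rather than within them, the opposite of the genome-wide pattern.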

SLC24A5 was just the beginning. SLC45A2, TYR, OCA2, and KITLG are just some of the numerous alphabet soup of loci which have come to be understood to affect normal human variation in pigmentation. Despite the relatively large roll call of pigmentation genes one can safely say that between any two reasonably distinct geographic populations ~90 percent of the between population variation in the trait is going to be due to ~10 genes. Often the effects follow a power law distribution as well: the first few genes of large effect account for over 50% of the variance, while subsequent loci are progressively less important.
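The “first few genes explain most of the variance” pattern is easy to illustrate under an additive model. In that model a biallelic locus with allele frequency p and allelic effect a contributes 2p(1 - p)a^2 to the variance. The effect sizes below are hypothetical: ten loci at p = 0.5 whose effects shrink by 30% per rank. They are not estimates for any real pigmentation gene.

```python
def additive_variance(p, effect):
    """Additive genetic variance from one biallelic locus: 2p(1 - p)a^2."""
    return 2.0 * p * (1.0 - p) * effect ** 2


def cumulative_variance_shares(variances):
    """Cumulative fraction of total variance, largest loci first."""
    total = sum(variances)
    shares, running = [], 0.0
    for v in sorted(variances, reverse=True):
        running += v
        shares.append(running / total)
    return shares


# Hypothetical: ten loci at p = 0.5 whose effects shrink by 30% per rank.
effects = [0.7 ** rank for rank in range(10)]
variances = [additive_variance(0.5, a) for a in effects]
shares = cumulative_variance_shares(variances)
for rank, share in enumerate(shares, start=1):
    print(rank, round(share, 2))
```

Because variance scales with the square of the effect, even a gentle decline in effect size concentrates the variance in the top one or two loci.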

So how does this work to push the overall results forward?

- With their population coverage the authors confirm that SLC24A5 seems to be polymorphic in all Indo-European and Dravidian speaking populations in the subcontinent. The frequency of the derived variant ranges from ~90% in the Northwest, and ~80% in Brahmin populations all over the subcontinent, to ~10-20% in some tribal groups.

- Though there is a north-south gradient, it is modest, with a correlation of ~0.25. There is a much stronger correlation with longitude, but I’m rather sure that this is an artifact of their low sampling of Indo-European populations in the eastern Gangetic plain. As hinted in the paper, the correlation with longitude has to do with the fact that Tibetan and Burman populations in these fringe regions tend to lack the West Eurasian allele.

- Using haplotype based tests of natural selection the authors infer that the frequency of this allele has been driven up by positive selection in northern, but not southern, India. It could be that the authors lack power to detect selection in the south because of the lower frequency of the derived allele. And, I did wonder if selection in the north was simply an echo of what occurred in West Eurasia. But if you look at the frequency of the A allele in the north, most of the populations seem to have a higher frequency of the derived variant than their inferred fraction of “Ancestral North Indian” ancestry.
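The post does not say which haplotype-based test the authors ran. Most such tests, including iHS, are built on extended haplotype homozygosity (EHH), introduced by Sabeti and colleagues in 2002. EHH is the probability that two randomly chosen chromosomes carrying a given core allele are identical from the core out to some distance. A recent sweep leaves the selected allele on long, shared haplotypes, so its EHH decays slowly. Here is a minimal sketch on hypothetical 0/1 haplotypes.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, distance):
    """Extended haplotype homozygosity at `distance` SNPs right of the core.

    `haplotypes` are strings of 0/1 alleles that all carry the same allele at
    `core_index`. EHH is the probability that two of them, drawn at random,
    are identical from the core out to the given distance.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    segments = Counter(h[core_index:core_index + distance + 1] for h in haplotypes)
    return sum(comb(count, 2) for count in segments.values()) / comb(n, 2)


# Hypothetical haplotypes, core SNP at index 0 (1 = derived, 0 = ancestral).
derived = ["1000", "1000", "1000", "1001"]
ancestral = ["0101", "0011", "0110", "0000"]

for d in range(1, 4):
    print(d, round(ehh(derived, 0, d), 2), round(ehh(ancestral, 0, d), 2))
```

The derived-allele haplotypes stay identical much further from the core than the ancestral ones. Real tests integrate this decay across many SNPs and compare it against genome-wide expectations.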

What’s perhaps more interesting is the bigger picture of human evolutionary dynamics and phylogenetics that these results illuminate. Resequencing the region around SLC24A5, these researchers confirmed that the derived variant does look identical by descent in all populations across Western Eurasia and into South Asia. What this means is that this mutation arose in someone at some point around the Last Glacial Maximum, after West Eurasians separated from East Eurasians. The authors give some numbers using some standard phylogenetic techniques, but admit that it is ancient DNA that will give true clarity on the deeper questions. When I see something written like that my hunch, and hope, is that more papers are coming soon.

When I first read The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, I thought that it was essential to read Ancient DNA Links Native Americans With Europe and Efficient moment-based inference of admixture parameters and sources of gene flow alongside it. The reason goes back to the plot at the top of this post: notice that Native Americans do not carry the West Eurasian variant of SLC24A5. What the find of the ~24,000-year-old Siberian boy, and his ancient DNA, suggests is that there was a population with affinities closer to West Eurasians than East Eurasians that contributed to the ancestry of Native Americans. The lack of the European variant of SLC24A5 in Native Americans suggests to me that the sweep had not begun, or that the European variant was disfavored. What the other paper reports is that on the order of 20-40% of the ancestry of Europeans may be derived from an ancient North Eurasian population, unrelated to West Eurasians (or at least not closely related). It is likely that this population has something to do with the Siberian boy. Since Europeans are fixed for the derived variant of SLC24A5, despite carrying a large fraction of ancestry from a population which apparently lacked it, that implies to me that the sweep must have occurred after 24,000 years ago.
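The inference that the sweep came later rests on simple mixing arithmetic. The sketch below computes the allele frequency right after a two-way admixture with no selection or drift. It assumes, as the Native American data suggest, that the ancient North Eurasian source lacked the derived SLC24A5 allele. It also assumes the other source was already fixed for it, the most generous case for a purely neutral story. The 20-40% range is the one quoted in the text.

```python
def mixed_frequency(admixture_fraction, freq_in_admixing_source, freq_in_other_source):
    """Allele frequency right after two-way admixture, with no selection or drift."""
    m = admixture_fraction
    return m * freq_in_admixing_source + (1 - m) * freq_in_other_source


# Assumptions: the ancient North Eurasian source lacked the derived SLC24A5
# allele (0.0), and the other source was already fixed for it (1.0).
for m in (0.2, 0.3, 0.4):
    print(m, round(mixed_frequency(m, 0.0, 1.0), 2))
```

Even in that generous case neutral mixing caps the frequency at 60-80%. For Europeans to be at ~100% today, the allele has to have risen after the admixture.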

At this point I have to admit that I believe we need to be careful calling this a “European variant.” Just because it is nearly fixed in Europe does not imply that the variant arose in Europe. If you look at the frequency of the derived variant you see it is rather high in the northern Middle East. Looking at some of the populations in the Middle Eastern panel, the ancestral variant might all be explained by admixture in historical time from Africa. If the sweep began during the last Ice Age, then most of Europe would have been uninhabited. The modern distribution is informative, but it surely does not tell the whole story.

Where we are is that SLC24A5, and pigmentation as a whole, is coming to be fully characterized genomically. We don’t know the whole story of why light skin was selected so strongly. And we don’t quite know where the selection began, or when it began. But through gradually filling in pieces of the puzzle we may come to grips with this adaptively significant trait in the near future.

Citation: Basu Mallick C, Iliescu FM, Möls M, Hill S, Tamang R, et al. (2013) The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. PLoS Genet 9(11): e1003912. doi:10.1371/journal.pgen.1003912

* From my personal experience American born Indians often do not share the same prejudices and biases, partly because subtle shades of brown which are relevant in the Indian context seem ludicrous in the United States.

• Category: Science • Tags: Anthropology, Genetics, Genomics, Pigmentation 

Likely an individual with derived allele on KITL locus (Credit: David Shankbone)

An individual polymorphic on the KITL locus? (Credit: David Shankbone)

Pigmentation is one of the few complex traits in the post-genomic era which has been amenable to nearly total characterization. The reason for this is clear in hindsight. As far back as the 1950s (see The Genetics of Human Populations) there were inferences made using human pedigrees which suggested that normal human variation in this trait was controlled by fewer than ten genes of large effect. In other words, it was a polygenic character, but not highly so. This means that the alleles which control the variation are going to have reasonably large effects, and be well within the power of statistical genetic techniques to capture.
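Why “fewer than ten genes of large effect” puts a trait within statistical reach can be seen from a standard approximation for a single-SNP test of a quantitative trait. If a SNP explains a fraction r2 of trait variance, the test’s non-centrality is about n * r2 / (1 - r2). That gives the sample size needed for a given significance level and power. The sketch below is hedged: it assumes an additive model and the normal approximation, and uses the conventional genome-wide threshold of 5e-8. The 1950s inferences themselves came from pedigrees, not from tests like this.

```python
import math
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate sample size for a single-SNP test of a quantitative trait.

    Uses the normal approximation in which the test's non-centrality is
    n * r2 / (1 - r2), with r2 the fraction of trait variance the SNP explains.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    r2 = variance_explained
    return math.ceil((z_alpha + z_power) ** 2 * (1 - r2) / r2)


# Hypothetical: a trait split over ~10 loci (about 10% each) versus ~1,000
# loci (about 0.1% each), at the conventional genome-wide threshold.
for r2 in (0.10, 0.001):
    print(r2, required_sample_size(r2))
```

A locus explaining ~10% of the variance needs only a few hundred individuals. One explaining 0.1% needs tens of thousands, which is the gap between pigmentation and more highly polygenic traits.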

I should be careful about being flip on this issue. As recently as the mid aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known with reasonable clarity, as far as these things go. The 50,000 foot perspective is this: we lost our fur millions of years ago and developed dark skin, and many of us lost our pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).

A new paper in Cell sheds some further light on the fine-grained details which might be the outcome of this process. Being a Cell paper there is a lot of neat molecular technique to elucidate the mechanistic pathways. But I will gloss over that, because it is neither my forte nor my focus. In summary, the paper shows that p53, a relatively well known tumor suppressor gene, seems to interact with a response element around the KITLG locus (the p53 gene product is a transcription factor, and binds in many regions). This locus is well known in part because it has been implicated in pigment variation in humans and fish. So KITLG is one of the generalized pigmentation pathways which spans metazoans. There are derived variants in both Europeans and East Asians which are correlated with lighter skin, though there is polymorphism in both cases (it has not swept to fixation).

The wages of adaptation? (Credit: Hoggarazzi Photography)

But this is a Cell paper, so there has to be a more concrete and practical angle than just evolution. And there is. It turns out that a single nucleotide polymorphism in the p53 response element results in a tendency toward upregulation of KITLG and male germ line proliferation. The latter matters when it comes to tumorigenesis, and in particular testicular cancer. This form of cancer is one where there doesn’t seem to be a somatic cell mutation of p53 itself. Additionally, the authors observe that testicular cancer manifests at a 4-5 fold greater rate in people of European descent than in African Americans. And, presumably the upregulation of KITLG is somehow related to increased melanin production. The authors posit that because of lighter skin in Europeans due to selection at other loci there has been a balancing effect at KITLG (increased tanning response). There is evidence of selection at this locus (a long haplotype and increased homozygosity), so this is not an unreasonable conjecture, though the high frequency of loss of function alleles suggests that the model is likely complex.

I don’t know if this particular story is correct in its details (though I am intrigued that variation in KITLG is associated with cancer in other organisms). But it illustrates one of the possible consequences of rapid evolutionary change due to human migration out of Africa: deleterious side effects because of pleiotropy. In other words, as you tinker with the genomic architecture of a population you are going to have to accept tradeoffs as you are optimizing one aspect of function. Genes don’t have just one consequence, but are embedded in myriad pathways. Over time evolutionary theory predicts a slow re-balancing, as modifier genes arise to mask the deleterious side effects. But until then, we will bear the burdens of adaptation as best as we can.

Citation: Zeron-Medina, Jorge, et al. “A Polymorphic p53 Response Element in KIT Ligand Influences Cancer Risk and Has Undergone Natural Selection.” Cell 155.2 (2013): 410-422.



Credit: Aviok

“Think not that I am come to send peace on earth: I came not to send peace, but a sword.” -Matthew 10:34

“There were giants in the earth in those days…when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.” -Genesis 6:4

Seven years ago I wrote a short post, Why patriarchy?, which attempted to present a concise explanation for the ubiquity of what we might term patriarchy in complex societies (i.e., not “small-scale societies”). Broadly speaking my conjecture is that the social and political dominance of (proportionally) small groups of males over the past several thousand years is an example of “evoked culture”. The higher population densities in agricultural societies produced a relative surfeit of accessible marginal surplus, which could be given over to supporting non-peasant classes who specialized in trade, religion, and war, all of which were connected. This new economic and cultural context served to trigger a reorganization of the typical distribution of power relations in human societies, because of the responses of the basic cognitive architecture of our species inherited from Paleolithic humans. Agon, or intra-specific competition, has always been part of the game of human socialization. The scaling up and channeling of this instinct in bands of males totally transformed human societies (another dynamic is the elaboration of cooperative structures, though this often manifests as agonistic competition between coalitions of humans).

To get a sense of what I mean when I say transforming, consider this section of an article in The Wall Street Journal which profiles the wife of one of the perpetrators of the 2012 New Delhi gang rape:

Some people blame the December gang rape and similar attacks in part on a collision of traditional social expectations—commonplace in rural areas—and the modernity of India’s cities, where rural migrant workers encounter the values of urbanites living by a different set of rules. During the brutal Delhi assault, for instance, the attackers accosted the woman and the young man she was with, asking why they were out together in the evening, the young man told the court.

Speaking about the events of that night, Ms. Devi says she doesn’t understand how a woman could be out for the evening with a man who wasn’t her husband.

The normalcy of this sort of ‘mate guarding’ is taken for granted in many ‘traditional’ societies. You see it reflected in the 1995 film First Knight, where King Arthur tries Lancelot and Guinevere for treason based on a kiss (dishonor to the realm). I won’t go into excessive psychoanalysis, but will end by saying that the emergence of radical inequality and stratification with complex societies transformed instincts shaped in small-scale bands where petty conflicts were no doubt the norm. To my knowledge the literature from small-scale societies tends to imply a relatively more relaxed, even modern, attitude toward sexuality than one can see in the world of the Eurasian Ecumene.

At this point you might be curious as to the point of reviewing this conjecture. Perhaps I will bring to the fore historical and archaeological evidence which might support this model? No. Rather, I contend that the evidence of this radical reshaping of human power structures, which led to the emergence of patriarchy as we understand it, is reflected in the phylogenetic history of our species. Two papers illustrate the differing patterns which one sees in the maternal lineage, mtDNA, and the paternal lineage, Y chromosomes.

First, Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers:

Demographic change of human populations is one of the central questions for delving into the past of human beings. To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades. The relative divergence dates can be estimated much more precisely using molecular clock. We found that all the Paleolithic divergences were binary; however, three strong star-like Neolithic expansions at ~6 kya (thousand years ago) (assuming a constant substitution rate of 1e-9/bp/year) indicates that ~40% of modern Chinese are patrilineal descendants of only three super-grandfathers at that time. This observation suggests that the main patrilineal expansion in China occurred in the Neolithic Era and might be related to the development of agriculture.

Second, Analysis of mitochondrial genome diversity identifies new and ancient maternal lineages in Cambodian aborigines:

Cambodia harbours a variety of aboriginal (and presumably ancient) populations that have largely been ignored in studies of genetic diversity. Here we investigate the matrilineal gene pool of 1,054 Cambodians from 14 geographic populations. Using mitochondrial whole-genome sequencing, we identify eight new mitochondrial DNA haplogroups, all of which are either newly defined basal haplogroups or basal sub-branches. Most of the new basal haplogroups have very old coalescence ages, ranging from ~55,000 to ~68,000 years, suggesting that present-day Cambodian aborigines still carry ancient genetic polymorphisms in their maternal lineages, and most of the common Cambodian haplogroups probably originated locally before expanding to the surrounding areas during prehistory. Moreover, we observe a relatively close relationship between Cambodians and populations from the Indian subcontinent, supporting the earliest costal route of migration of modern humans from Africa into mainland Southeast Asia by way of the Indian subcontinent some 60,000 years ago.

The scientific methods here are straightforward, or at least tried and tested. The main gains here are in terms of raw numbers and sequencing. Basically this is the extension of phylogeographic work which goes back 20 years, but on steroids. As such one should be cautious. The old phylogeography literature has turned out to be wrong on many of the details. But that’s OK, there’s still gold there, you just have to look.

The broad scale implication of the paper on Chinese Y chromosomal diversity is obvious. Like the Genghis Khan modal haplotype these are lineages which exhibit a ‘star-like phylogeny.’ They explode out of a common ancestor in short order, with few mutational steps. This explosion is simply a reflection of very rapid population growth. The skewed distribution of Y lineages here (i.e., three lineages representing ~40% of the population) indicates to me a pattern where elite males tend to be much more fit in reproductive terms than the average male. Rapid population growth may have been correlated with a high rate of extinction of Y lineages due to “elite turnover.”


Citation: Zhang, Xiaoming, et al. “Analysis of mitochondrial genome diversity identifies new and ancient maternal lineages in Cambodian aborigines.” Nature Communications 4 (2013).

The second paper looks at mtDNA, the maternal line. There are some specific results which are interesting. In line with Joe Pickrell’s TreeMix results it does look like Cambodians and Indians share deep ancestry dating to the Paleolithic. The PCA to the left shows the relationships among populations based on their haplogroup frequencies, and one clear finding is that Cambodians tend to cluster with Indians, and not Northeast Asians. This result is not surprising. As I’ve noted before, on mtDNA lineages South Asians are closer to East Eurasians than they are to West Eurasians. The result for the Y chromosomes is inverted, while autosomes are somewhere in the middle. In addition the results above show that South Chinese Han mtDNA tend to occupy the same part of the plot as the Dai, who are related to the Thai people of Southeast Asia. In contrast the few North Chinese Han tend to cluster with Tibetans and Altaics. Could Sinicization have been male mediated? There’s been circumstantial ethnographic evidence which points to this (e.g., some Cantonese marriage practices may reflect assimilation of Dai women).
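The PCA described above summarizes each population by its haplogroup frequencies. Here is a minimal sketch of that kind of analysis with NumPy. It assumes a simple PCA computed by SVD on column-centered frequencies (the paper's exact preprocessing is not described here), and the frequencies are entirely hypothetical.

```python
import numpy as np


def pca(freqs, n_components=2):
    """PCA of a populations x haplogroups frequency matrix via SVD.

    Each haplogroup column is mean-centered first. Returns the population
    scores on the leading components and the share of variance each explains.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                           # center each haplogroup column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # population coordinates
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Hypothetical frequencies: four populations (rows), three haplogroups (columns).
freqs = [
    [0.60, 0.30, 0.10],
    [0.55, 0.35, 0.10],
    [0.10, 0.20, 0.70],
    [0.15, 0.15, 0.70],
]
scores, ratio = pca(freqs)
print(scores.round(3))
print(ratio.round(3))
```

The first two populations land on one side of the first component and the last two on the other, which is the kind of clustering the plot is read for.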

The big picture result to me is that it illustrates the discordance between migration patterns of males and females over the past 10,000 years due to the rise of agriculture and its offspring, patriarchy. I hold that there was no hunter-gatherer Genghis Khan. Such a reproductively prolific male, worthy of an elephant seal, is only feasible with the cultural and technological accoutrements of civilization. ~20,000 years ago Temujin may have had to be satisfied with being the big man in a small clan. Thanks to various ideological and military advancements by the year 1200 AD you saw the rise to power of a man who could realistically assert that he was a ‘world conqueror.’



Of course I do not believe that the world before agriculture was static. On the contrary, the Chinese Y chromosomal paper reports an inferred pattern of lineage extinction which is regular and consistent. But civilization escalated the magnitude of genocide, and in particular androcide of the losers in the games of power. The relative continuity of mtDNA across vast swaths of southern Eurasia is a testament to the fact that the lineages of the ‘first women’ still persist among the settled agricultural peoples, whose genomes have been reshaped by untold sequences of conquests and assimilations. While female mediated gene flow can be imagined to be constant, continuous, and localized, I believe that male mediated gene flow has a more punctuated pattern. It explodes due to cultural and social innovations, such as the horse or Islam, and long standing Y chromosomal variation which has emerged since the last wave of conquerors is wiped away in a single fell swoop. Obviously this has an effect on the total genome, and I suspect that in some cases repeated male mediated expansions have resulted in striking discordances between the autosomal and mtDNA lineages. You see this in Argentina, where Native American mtDNA seems to persist to a higher degree than autosomal ancestry because of the male skew of European migration. And it looks to be the case in Cambodia, where non-Northeast Asian autosomal ancestry seems to be present at a lower fraction than the equivalent mtDNA.
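The discordance between uniparental and autosomal markers described above follows from simple bookkeeping. Here is a minimal Python sketch. It assumes a single pulse of admixture with equal numbers of male and female founders and no later migration, selection, or drift, and the 80% / 10% split in the example is hypothetical. When migrant fathers outnumber migrant mothers, the migrants' expected ancestry orders as Y > autosomal > mtDNA. The local population's ancestry shows the reverse ordering, mtDNA > autosomal > Y, as in the Argentine example.

```python
def expected_ancestry(male_frac, female_frac):
    """Expected ancestry from a migrant source after one pulse of
    sex-biased admixture, by genetic system.

    male_frac:   fraction of founding fathers from the migrant source
    female_frac: fraction of founding mothers from the migrant source
    Assumes equal numbers of male and female founders and random mating
    afterwards with no further migration (expected values only).
    """
    autosomal = (male_frac + female_frac) / 2   # half from each parent
    # X copies, averaged over all copies: with a 1:1 sex ratio, 2 of every 3
    # are carried by females (one from each parent) and 1 by males (from the mother).
    x_linked = (male_frac + 2 * female_frac) / 3
    return {"Y": male_frac, "mtDNA": female_frac,
            "autosomal": autosomal, "X": x_linked}


migrant = expected_ancestry(0.8, 0.1)           # hypothetical male-biased migration
local = {k: 1 - v for k, v in migrant.items()}  # two-source model: the remainder is local
print(migrant)
print(local)
```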

With the rise of ubiquitous genomic typing and sequencing the geographical coverage will be fine-grained enough that the broad patterns, and the specific details, become clear. Then we will finally be able to understand if the societies fueled by grain truly ushered in the age of the domination of the many by the few. How easily does a scythe become a sword?

(Republished from Discover/GNXP by permission of author or representative)

And he will be a wild man; his hand will be against every man, and every man’s hand against him; and he shall dwell in the presence of all his brethren.

- Genesis 16:12

By now you may have seen or read about two important papers which just came out in Science, 2000 Years of Parallel Societies in Stone Age Central Europe, and Ancient DNA Reveals Key Stages in the Formation of Central European Mitochondrial Genetic Diversity. The details have been extensively explored elsewhere. If you don’t have academic access I highly recommend the supplement of the second paper. It’s also very illuminating if you don’t have a good grasp of the nuts and bolts of archaeology (I do not). I can’t, for example, confirm whether the merging strategies of different archaeological cultures were appropriate or not, because I’m not totally clear in my own head about the nature of these distinct archaeological ‘cultures’ (quotations due to the fact that archaeologists infer culture from material remains, and so they may not be cultures in the sense we understand culture). But the overall finding is clear: in ancient Europe thousands of years ago there were multiple demographic replacements and amalgamations. The post-World War II thesis in archaeology, that one could not infer demographic change from material remains (because the latter can diffuse purely through memetic means), seems to be false. The correspondence is surprisingly tight.

In its broad outlines this was clear before these papers emerged. There is very little I would change from my post The last days of Grendel. This confusing welter of societies in prehistoric Europe is hard for us to conceive of (or reconstruct with any plausibility) today, and as one of the authors of the broader mtDNA paper observes, you could not infer this pattern of replacements from modern patterns of variation. Phylogeography, which infers the past from present distributions of variation, clearly has limitations, because it is constrained by the necessity to adhere to parsimony in the absence of a dense enough data set. In the early aughts the argument was between scholars who argued for a dominant role for demographic movement in transmitting the farming lifestyle from the Neolithic societies of the Middle East to Europe (L. L. Cavalli-Sforza et al.), and those who pushed forward the thesis of cultural diffusion (Sykes et al.). These are obviously stylized extreme positions, but they capture the essence of the dispute over how cultures transform and expand. Scholars looked at present day European and Middle Eastern populations and compared their genetic relatedness, usually with male and female lineages (Y and mtDNA).

There was a major problem with this model: the ancient DNA we have is telling us that present population genetic distributions are poorly correlated with past population genetic distributions. And not only are the ancient populations of Europe rather well mixed and overturned, like a well tilled field, but it seems entirely likely that those of the Middle East are too. Therefore the methodology was bound to mislead from the get-go; the premise of a few major population movements was false. But there was, I believe, another major lacuna in our understanding: prehistoric people were not entirely atomized. Whether one believed in the central role of demographic movements or cultural transmission, both theses seemed to posit that prehistoric human populations were mobilizing and interacting mostly on a small scale. Diffusing. This seems likely to be wrong. Or at least it misses enough of the picture that it gives a false impression.

To understand what I’m getting at, consider the American migration west in the 19th century. There were multiple forces at work. First, there was real demographic pressure in many parts of the United States. New England, for example, was literally at capacity. It simply had no more land on which subsistence agriculture could support a larger population; it was at the Malthusian limit. There were three primary responses: a transition up the “value chain” toward industry, made possible by the natural endowments of water power available in the region; a decreased total fertility rate (related to the first); and finally, a mass migration west, first to upstate New York, but then across the Great Lakes and out even to the Pacific. To a great extent these shifts can be modeled as individual (or family/firm) dynamics. People are responding rationally to changing incentives. But this misses “higher level” structural shifts.

As we all are now well aware, the United States government entered into a massive program of ethnic cleansing and pacification of the native populations of the western territories, making migration a viable option. It acquired the western seaboard states through victory in war (California) or diplomatic bluster and coordinated demographic assault (Oregon and Washington). These events are linked to macroscale cultural dynamics, encapsulated in a slogan such as Manifest Destiny. Increasing the geographic scale of the model, cultural and demographic changes in Europe itself also made themselves felt in the United States (i.e., European migration to places such as the Midwest was an important contributor to settlement of the nation, and this migration was often due to social and political dynamics in the source nations). The reality of these macroscale dynamics means that demographic shifts often occurred in pulses, in a discontinuous fashion.


Because prehistory is defined by the lack of writing from which we can draw detailed narratives, we will always be in the dark as to the specific macroscale dynamics which resulted in the cultural and genetic shifts we infer (barring the development of time machines). But we can at least construct a correct framework and get a true flavor of the context in which humans interacted in the past. As I have stated elsewhere, I believe that once autosomal and Y chromosomal results come online (mtDNA is more copious and so easier to extract) we will see that many of the discontinuities and shifts are actually attenuated in the female lineage. What I mean here is that the picture from these papers may actually be less radical than the real shifts truly were. In India the source populations for admixture were distinct enough that it seems clear that admixture was male-mediated: West Eurasian Y chromosomal lineages are better represented fractionally than West Eurasian autosomal ancestry, which is better represented than West Eurasian mtDNA. The whole zone from West Asia out toward Atlantic Europe was more of a continuum, so solid inferences there will have to wait on the ancient DNA.

Finally, one last big picture aspect which I think is important to note is that the genetic distances between ancient populations across small spatial scales were very large. I suspect that with the rise of agriculture, and imperial states, we have seen a massive process of genetic re-equilibration across vast swaths of Eurasia in particular. Though I think we mislead ourselves if we view prehistory purely as an affair of small scale bands with vague higher order structure, it is still the fact that the scale was smaller than what came later. That leads me to conclude that population genetic diversity as a function of distance in the far past was likely greater than it has been across most of recorded history. So inferences about the character of human genetic diversity derived from contemporary variation are misleading.* The large differences between Bushmen populations may be highly representative of what was the norm in the past.

* To be clear, Fst between continental populations may be the same. But Fst over small scales may have been larger.
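To make the footnote's distinction concrete, here is a minimal Python sketch of Fst computed from expected heterozygosities at a single biallelic locus. It uses the simple (H_T - H_S) / H_T definition with equal-sized subpopulations, not the sample-size-corrected estimators used in practice, and the allele frequencies are made up.

```python
from statistics import mean


def fst(subpop_freqs):
    """Fst at one biallelic locus as (H_T - H_S) / H_T.

    subpop_freqs: allele frequency p in each subpopulation (equal weights).
    H_S is the mean expected heterozygosity within subpopulations;
    H_T is the expected heterozygosity at the pooled frequency.
    """
    h_s = mean(2 * p * (1 - p) for p in subpop_freqs)
    p_bar = mean(subpop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


print(fst([0.5, 0.5]))  # 0.0: identical subpopulations
print(fst([0.0, 1.0]))  # 1.0: fixed for different alleles
print(fst([0.3, 0.7]))  # intermediate differentiation
```

Fst between two neighboring bands could be as large as that between two continents if the bands had drifted apart in isolation.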

• Category: Science • Tags: Anthropology, Archaeogenetics, Archaeology 

Update: Just watch this movie.

No time to write about it now, but check Science Magazine this afternoon (in a few hours from this posting) for a major paper on ancient mtDNA, and the striking correlation between changes in lineage frequencies and cultures that they discovered. Turns out that when you peel back the palimpsest it is much more complicated and surprising than we’d have thought. National Geographic, which funded the project, already has a post out on it:

What they found was that the shift in the frequency of DNA lineages closely matched the changes and appearances of new Central European cultures across time. In other words, the people who lived in Central Europe 7,000 years ago had different DNA lineages than those that lived there 5,000 years ago, and again different to those that lived 3,500 years ago. Central Europe was a dynamic place during the Bronze age, and the genetic composition of the people that lived there demonstrates that there was nothing static about European prehistory.

Genographic Project Director and National Geographic Explorer-in-Residence, Spencer Wells expounds: “spanning a period from the dawn of farming during the Neolithic period through to the Bronze Age, the [genetic] data from the archaeological remains reveals successive waves of migration and population replacement- genetic ‘revolutions’ that combined to create the genetic patterns we see today.”

I hope this doesn’t lead to a new simplicity to replace the old one of no migration.

• Category: Science • Tags: Anthropology 

Sports Illustrated writer David Epstein has a new book out, The Sports Gene: Inside the Science of Extraordinary Athletic Performance. The title strikes me as coarse and reductive, but I am aware that authors do not always have control over such things. I’ve corresponded with Epstein a bit over the past year, and he’s sent me some passages relating to human evolutionary genetics and paleoanthropology to make sure they don’t sound crazy. I haven’t had time to read the book, but judging from the interview I listened to on NPR it’s data rich and theory subtle. Though the title seems to imply that athleticism is a single gene trait where most of the variation in the population is due to genetic variation, Epstein denies this and instead presents the reality that athleticism is a complex trait with many dimensions, subject to numerous genetic and environmental variables, and interactions across those variables. That would make for a less sexy subtitle, but it would have had the attribute of being correct.

Epstein’s survey of the research touches on sensitive topics bound to be sensationalized (e.g., The Urgency—and the Challenge—of Connecting Sports, Race, and Genetics). But it seems likely that there are going to be plenty of “gee whiz” facts in the book judging from the interview. For example, he reports that 17 percent of men over the height of seven feet (2.13 meters) between the ages of 20 and 40 in the United States are playing in the NBA! Obviously there is no gene which is guaranteed to make you an NBA star, but having the allelic profile which predisposes you to being seven feet tall obviously helps. It also illustrates the ridiculous lengths to which the “10,000 hour rule” has been taken in popular culture. Practice matters, and talent matters. At extremely high levels of performance one often needs the focus to engage in repetitive tasks over and over. But one also likely needs a preternatural complement of genes. Most of the children of NBA players do not become professional basketball players, but their odds are far higher than those of other children. Epstein outlines these sorts of facts in a breezy and concise manner in the interview, as well as dismissing the infantile disorder of genetic determinism which results in the purchasing of DNA kits which will tell you if your child is an athlete or not.

And yet despite the complexity one of the things that I take away from David Epstein’s description of his book is that there is a massive and robust scientific literature on what makes a great athlete. This seems reasonable because professional athletics is a profitable enterprise, and where there is money there is scientific inquiry. But it helps to reiterate the message now and then.

Addendum: A good place to mention James F. Crow’s 2002 essay Unequal by nature: a geneticist’s perspective on human differences

• Category: Science • Tags: Anthropology, Human Genetics, Sports 

Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly. “Inference of population structure using multilocus genotype data.” Genetics 155.2 (2000): 945-959.

Before there was Structure there was just structure. By this, I mean that population substructure has always been. The question is how we as humans shall characterize and visualize it in a manner which imparts some measure of wisdom and enlightenment. A simple fashion in which we can assess population substructure is to visualize the genetic distances across individuals or populations on a two dimensional plot. Another way which is quite popular is to represent the distance on a neighbor joining tree, as on the left. As you can see this is not always satisfying: dense trees with too many tips are often almost impossible to interpret beyond the most trivial inferences (though there is an aesthetic beauty in their feathery topology!). And where graphical representations such as neighbor-joining trees and MDS plots remove too much relevant information, cluttered Fst matrices have the opposite problem. All the distance data is there in its glorious specific detail, but there’s very little Gestalt comprehension.
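The two dimensional plot mentioned above can be produced with classical (Torgerson) multidimensional scaling, which turns a distance matrix into coordinates. Here is a minimal NumPy sketch. The four-population distance matrix is hypothetical, and because Fst values are not true Euclidean distances, the embedding is only an approximation.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS.

    dist: symmetric n x n matrix of pairwise distances.
    Returns n x n_components coordinates whose Euclidean distances
    approximate the input distances.
    """
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0, None)      # zero out small negative eigenvalues
    return eigvecs[:, order] * np.sqrt(vals)


# Hypothetical pairwise distances (e.g., Fst) among four populations.
D = [
    [0.00, 0.02, 0.10, 0.11],
    [0.02, 0.00, 0.09, 0.10],
    [0.10, 0.09, 0.00, 0.03],
    [0.11, 0.10, 0.03, 0.00],
]
coords = classical_mds(D)
print(coords.round(3))
```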

Rosenberg, Noah A., et al. “Genetic structure of human populations.” Science 298.5602 (2002): 2381-2385.

Into this confusing world stepped the Structure bar plot. When I say “Structure bar plot,” in 2013 I really mean the host of model-based clustering packages. Because it is faster I prefer Admixture. But Admixture is really just a twist on the basic rules of the game which Structure set. What you see to the right is one of the beautiful bar plots which have made their appearance regularly on this blog over the past half a decade or more. I’ve repeated what they do, and don’t mean, ad nauseam, though it doesn’t hurt to repeat oneself. What you see is how individuals from a range of human populations shake out at K = 6. More verbosely, assume that your pool of individuals can be thought of as an admixture to various proportions of six ancestral populations. Each line is an individual; each color represents one of the K ancestral populations (for K = 6, populations 1 through 6), and the share of a line shaded in a given color is that individual’s estimated ancestry from that population.
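Each bar is an individual's vector of ancestry proportions. Here is a rough sketch of the model behind such bars: the log-likelihood that ADMIXTURE maximizes. Structure's admixture model is a closely related Bayesian formulation that samples rather than maximizes. In this model each individual's expected allele frequency at a locus is a Q-weighted mix of the K cluster frequencies. The optimization step is omitted, the tiny matrices are hypothetical, and the function name is illustrative rather than part of any package.

```python
import math


def admixture_loglik(G, Q, P):
    """Log-likelihood of genotypes under the admixture model
    (up to an additive binomial constant).

    G[i][l]: count (0, 1, or 2) of the reference allele, individual i, locus l
    Q[i][k]: ancestry fraction of individual i from cluster k (rows sum to 1)
    P[k][l]: reference allele frequency at locus l in cluster k
    Assumes every mixed frequency lies strictly between 0 and 1.
    """
    ll = 0.0
    for i, row in enumerate(G):
        for l, g in enumerate(row):
            # Individual-specific allele frequency: Q-weighted mix of cluster frequencies.
            pi = sum(Q[i][k] * P[k][l] for k in range(len(P)))
            ll += g * math.log(pi) + (2 - g) * math.log(1 - pi)
    return ll


# Hypothetical K = 2 example: one unadmixed individual, one 50/50 individual.
G = [[2, 0], [1, 1]]
Q = [[1.0, 0.0], [0.5, 0.5]]
P = [[0.9, 0.1], [0.1, 0.9]]
print(admixture_loglik(G, Q, P))
```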

This is when I should remind you that this does not mean that these individuals are actually combinations of six ancestral populations. When you think about it, that is common sense. Just because someone generates a bar plot with a given K, that does not mean that that bar plot makes any sense. I could set K = 666, for example. The results would be totally without value (evil even!), but, they would be results, because if you put garbage in, the algorithm will produce something (garbage). This is why I say that population structure is concrete and ineffable. We know that it is the outcome of real history which we can grasp intuitively. But how we generate a map of that structure for our visual delectation and quantitative precision is far more dicey and slippery.

To truly understand what’s going on it might be useful to review the original paper which presented Structure, Inference of Population Structure Using Multilocus Genotype Data. Though there are follow-ups, the guts of the package are laid out in this initial publication. Basically you have some data, multilocus genotypes. Since Structure debuted in 2000, this was before the era of hundreds-of-thousands-loci-SNP-chip data. Today the term multilocus sounds almost quaint. In 2000 the classical autosomal era was fading out, but people did still use RFLPs and what not. It is a testament to the robustness of the framework of Structure that it transitioned smoothly to the era of massive data sets. Roughly, the three major ingredients of Structure are the empirical genotype data, formal assumptions about population dynamics, and powerful computational techniques to map between the first two elements. In the language of the paper you have X, the genotypes of the individuals, Z, the (unknown) populations of origin, and P, the (unknown) allele frequencies of the populations. They’re multi-dimensional vectors. That’s not as important here as the fact that you only observe X. The real grunt work of Structure is generating a vector, Q, which defines the contributions to each individual from the set of ancestral populations. This is done via an MCMC, which explores the space of probabilities, given the data, and the priors which are baked into the cake of the package. Though some people seem to treat the details of the MCMC as a black box, actually having some intuition about how it works is often useful when you want to shift from default settings (there are indeed people who run Structure who are not clear about what the burn-in is exactly). What’s going on ultimately is that in structured populations the pooled genotypes are not in Hardy-Weinberg Equilibrium (or linkage equilibrium). Structure is attempting to find a partition into populations which are each in HWE and linkage equilibrium.
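The heterozygote deficit that Structure exploits, known as the Wahlund effect, is easy to see numerically. Pool two subpopulations that are each in HWE, and the combined sample shows fewer heterozygotes than HWE predicts from the pooled allele frequency. Here is a minimal Python sketch with made-up allele frequencies. It illustrates the signal only, not Structure's MCMC.

```python
def pooled_heterozygosity(freqs, weights):
    """Observed vs HWE-expected heterozygosity when subpopulations, each in
    Hardy-Weinberg equilibrium internally, are pooled into one sample.

    freqs:   allele frequency p in each subpopulation
    weights: share of the pooled sample from each subpopulation (sums to 1)
    Returns (observed_het, expected_het_at_pooled_p).
    """
    observed = sum(w * 2 * p * (1 - p) for p, w in zip(freqs, weights))
    p_pool = sum(w * p for p, w in zip(freqs, weights))
    expected = 2 * p_pool * (1 - p_pool)
    return observed, expected


obs, expected_het = pooled_heterozygosity([0.1, 0.9], [0.5, 0.5])
print(obs, expected_het)  # far fewer heterozygotes observed than HWE predicts
```

Splitting the pooled sample back into its two subpopulations removes the deficit. That is the sense in which Structure looks for groupings that restore HWE.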

This brings us to the question of how we make sense of the results and which K to select. If you run Structure you are probably iterating over many K values, and repeating the runs multiple times for each. Because replicate runs will vary, you will likely have to merge their outputs using a separate algorithm. But in any case, each run generates a likelihood (derived from the probability of the data given the K value). The most intuitive way to “pick” an appropriate K is to simply look for where the likelihood begins to plateau. This means that the algorithm can’t squeeze more informative juice out of going up the K values.* This may seem dry and tedious, but it brings home exactly why you should not view any given K as natural or real in a deep sense. The selection of a K has less to do with reality, and more with instrumentality. If, for example, your aim is to detect African ancestry in a worldwide population pool, then a low K will suffice, even if a higher K gives a better model fit (higher K values often take longer in the MCMC). In contrast, if you want to discern much finer population clusters then it is prudent to go up to the most informative K, no matter how long that might take.
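The plateau heuristic can be made explicit. Here is a minimal Python sketch, assuming you have log-likelihoods from several replicate runs per K. The `min_gain` threshold is a hypothetical analyst's choice rather than a standard value, and the numbers in the example are made up.

```python
from statistics import mean


def plateau_k(loglik_by_k, min_gain):
    """Smallest K after which the mean log-likelihood improves by less
    than `min_gain` per step.

    loglik_by_k: dict mapping K -> list of log-likelihoods from replicate runs
    min_gain:    hypothetical threshold chosen by the analyst
    """
    ks = sorted(loglik_by_k)
    means = {k: mean(loglik_by_k[k]) for k in ks}
    for prev, nxt in zip(ks, ks[1:]):
        if means[nxt] - means[prev] < min_gain:
            return prev
    return ks[-1]  # never plateaued within the tested range


runs = {1: [-1000, -1001], 2: [-900, -902], 3: [-880, -881],
        4: [-878, -879], 5: [-877, -879]}
print(plateau_k(runs, min_gain=5))
```

A stricter threshold pushes the chosen K higher. This is another way of seeing that the "right" K depends on what you want the plot to do.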

Today model-based clustering packages such as Structure, frappe, and Admixture are part of the background furniture of the population genetic toolkit. There are now newer methods on the block. A package like TreeMix uses allele frequencies to extend the stale population tree into a graph with migration edges. Other frameworks do not treat loci as independent, one after another, but assimilate patterns across loci, generating ancestry tracts within individual genomes. Though some historical information can be inferred from Structure, it is often an ad hoc process which resembles reading tea leaves. Linkage disequilibrium methods have the advantage that they explicitly explore historical processes in the genome. But with all that said, the Structure bar plot revolution of the aughts wrought a massive change, and what was once wondrous has become banal.
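As a rough illustration of the linkage disequilibrium approach, here is a minimal Python sketch of dating a single admixture pulse. It assumes admixture LD decays as A * exp(-t * d) with genetic distance d in Morgans, where t is the number of generations since the pulse. It then fits t by ordinary least squares on the log scale. Dedicated tools fit weighted LD curves and typically include a background term and noise handling. The curve here is synthetic and noise-free.

```python
import math


def fit_admixture_time(distances_morgans, ld_values):
    """Estimate generations since a single admixture pulse, assuming
    LD(d) = A * exp(-t * d). Fits log(LD) = log(A) - t * d by ordinary
    least squares and returns t."""
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope  # generations


# Synthetic, noise-free curve generated with t = 50 generations (hypothetical).
d = [0.005 * i for i in range(1, 21)]        # 0.5 cM to 10 cM
ld = [0.02 * math.exp(-50 * x) for x in d]
print(round(fit_admixture_time(d, ld), 6))   # 50.0
```

Converting generations to calendar years requires an assumed generation time, which adds its own uncertainty.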

* The ad hoc Delta K statistic is very popular too. It divides the second-order rate of change of the likelihood with respect to K by the standard deviation of the likelihood across replicate runs.
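The Delta K statistic in the footnote comes from Evanno et al. (2005). Here is a minimal Python sketch, assuming consecutive integer K values with at least two replicate runs each. It computes the second difference from the mean log-likelihood across replicates, which is one common way of doing it. The replicate values in the example are hypothetical.

```python
from statistics import mean, stdev


def evanno_delta_k(loglik_by_k):
    """Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)) for each interior K.

    loglik_by_k: dict mapping consecutive integer K -> list of replicate
    log-likelihoods (at least two per K; identical replicates would make
    the standard deviation zero and raise ZeroDivisionError).
    """
    ks = sorted(loglik_by_k)
    means = {k: mean(v) for k, v in loglik_by_k.items()}
    delta = {}
    for k in ks[1:-1]:
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / stdev(loglik_by_k[k])
    return delta


runs = {1: [-100, -100], 2: [-50, -52], 3: [-45, -46], 4: [-44, -45]}
delta = evanno_delta_k(runs)
print(delta, max(delta, key=delta.get))
```

Delta K peaks where the likelihood curve bends most sharply. Like the plateau, it flags a useful K rather than a "true" one.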


My friend Zack Ajmal has been running the Harappa Ancestry Project for several years now. This is a non-institutional complement to the genomic research which occurs in the academy. His motivation was in large part to fill in the gaps of population coverage within South Asia which one sees in the academic literature. Much of this is due to politics, as the government of India has traditionally been reluctant to allow sample collection (ergo, the HGDP data uses Pakistanis as their South Asian reference, while the HapMap collected DNA from Indian Americans in Houston). Of course this sort of project is not without its own blind spots. Zack must rely on public data sets to get a better picture of groups like tribal populations and Dalits, because they are so underrepresented in the Diaspora from which he draws many of the project participants.

Once Zack has a participant’s genotype, one of the primary things he does is add it to his broader data set (which includes many public samples) and analyze it with the Admixture model-based clustering package. What Admixture does is take a specific number of populations (e.g., K = 12) and generate fractional ancestry assignments for each individual. So, for example, individual A might be assigned 40% population 1 and 60% population 2 for K = 2. Individual B might be 45% population 1 and 55% population 2. These are not necessarily ‘real’ populations. Rather, the populations and their proportions are there to allow you to discern patterns of relationships across individuals.

Since Zack has put his results online, I thought it would be useful to review what patterns have emerged over the past two years, as his sample sizes for some regions are now moderately large. Though he has K=16 populations, not all of them will concern us, because South Asians do not tend to exhibit many of the components. I will focus on seven: S Indian, Baloch, Caucasian, NE Euro, SE Asian, Siberian and NE Asian. These are not real populations, but the labels tell you in which region each component is modal. So, for example, the “S Indian” component peaks in southern India, the “Baloch” among the Baloch people of southeastern Iran and southwest Pakistan, and the “NE Euro” among the eastern Baltic peoples. The last three are Asian components, running the latitude from south to north to center. They only concern the first population of interest, Bengalis. I will combine these last three together as “Asian.”

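Below is a table, mostly of individuals from Zack’s results (though there are some aggregate results from public data sets, marked with their source and sample size N). Values are ancestry percentages. The combined Asian column is given only for the Bengali rows; the remaining rows list S Indian, Baloch, Caucasian, and NE Euro only. Comments below.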

Ethnicity SIndian Baloch Caucasian NEEuro Asian
Bengali 53% 28% 2% 5% 8%
Bengali Baidya 45% 30% 3% 5% 12%
Bengali Baidya 45% 27% 3% 6% 12%
Bengali Brahmin 45% 35% 2% 11% 4%
Bengali Brahmin 44% 35% 5% 11% 4%
Bengali Brahmin 43% 35% 4% 10% 4%
Bengali Brahmin 42% 32% 4% 8% 6%
Bengali Brahmin 41% 33% 7% 8% 5%
Bengali Brahmin 40% 33% 4% 10% 4%
Bengali Brahmin 40% 30% 6% 10% 7%
Bengali Muslim 50% 25% 1% 5% 15%
Bengali Muslim 49% 28% 3% 4% 15%
Bengali Muslim 45% 27% 4% 4% 17%
Bengali Muslim 45% 26% 2% 2% 16%
Bengali Muslim 45% 24% 1% 3% 19%
Bengali Muslim 43% 25% 3% 2% 18%
Bengali Muslim 48% 27% 0% 5% 15%
Tamil Brahmin 48% 37% 6% 5%
Tamil Brahmin 48% 37% 3% 5%
Tamil Brahmin 48% 35% 5% 6%
Tamil Brahmin 47% 38% 6% 4%
Tamil Brahmin 47% 40% 3% 5%
Tamil Brahmin 46% 40% 3% 6%
Tamil Brahmin Iyengar 50% 35% 2% 8%
Tamil Brahmin Iyengar 47% 38% 6% 4%
Tamil Brahmin Iyengar 47% 35% 6% 6%
Tamil Brahmin Iyer 48% 38% 4% 5%
Tamil Brahmin Iyer 48% 38% 2% 5%
Tamil Brahmin Iyer 47% 37% 2% 5%
Tamil Brahmin Iyer 47% 37% 6% 8%
Tamil Brahmin Iyer 43% 35% 6% 5%
Tamil Muslim 58% 28% 3% 2%
Tamil Nadar 62% 30% 0% 0%
Tamil Nadar 59% 32% 3% 0%
Tamil Nadar 55% 30% 3% 0%
Tamil Vellalar 50% 35% 6% 1%
Tamil Vellalar 51% 32% 5% 0%
Tamil Vellalar (Sri Lankan) 60% 32% 5% 0%
Tamil Vellalar (Sri Lankan) 60% 33% 0% 0%
Tamil Vellalar (Sri Lankan) 56% 36% 0% 0%
Tamil Vishwakarma 70% 23% 0% 0%
Tamil Vishwakarma 66% 25% 4% 0%
Andhra Pradesh 60% 34% 2% 0%
Andhra Pradesh 54% 36% 2% 3%
Andhra Pradesh (Hyderabad) 56% 29% 5% 0%
Andhra Pradesh (Hyderabad) 47% 35% 8% 4%
Andhra Pradesh Gouda 61% 30% 2% 1%
Andhra Pradesh Kamma 51% 33% 7% 0%
Andhra Pradesh Kapu 62% 30% 2% 1%
Andhra Pradesh Naidu 51% 32% 4% 2%
Andhra Pradesh Reddy 57% 37% 1% 0%
Andhra Pradesh Reddy 54% 38% 3% 0%
Andhra Pradesh Reddy 51% 35% 4% 0%
Andhra Pradesh Reddy 50% 36% 2% 1%
Andhra Pradesh Telugu Brahmin 45% 33% 6% 4%
AP Brahmin (Xing, N = 25) 49% 36% 3% 6%
AP Naidu (Reich, N = 4) 61% 31% 1% 1%
Kannada Devanga 60% 31% 3% 1%
Karnataka Catholic Christian 56% 37% 3% 0%
Karnataka Lingayat 55% 34% 4% 0%
Karnataka 54% 36% 2% 0%
Karnataka Brahmin 51% 35% 3% 5%
Karnataka Iyengar 49% 36% 5% 5%
Karnataka Iyengar 48% 39% 3% 5%
Karnataka Iyengar 48% 37% 3% 7%
Karnataka Brahmin 47% 38% 4% 6%
Karnataka Konkani Brahmin 47% 37% 2% 6%
Karnataka Konkani Brahmin 46% 33% 6% 7%
Karnataka Konkani Brahmin 44% 34% 6% 5%
Kerala 47% 33% 7% 2%
Kerala Brahmin 43% 39% 4% 6%
Kerala Christian 53% 35% 4% 0%
Kerala Christian 50% 35% 8% 1%
Kerala Christian 45% 33% 7% 3%
Kerala Muslim Rawther 53% 35% 2% 1%
Kerala Muslim Rawther 51% 28% 4% 3%
Kerala Nair 48% 40% 4% 0%
Kerala Nair 47% 38% 5% 5%
Kerala Syrian Christian 50% 37% 6% 0%
Kerala Syrian Christian 50% 35% 9% 1%
Kerala Syrian Christian 46% 33% 5% 4%
Kerala Syrian Christian 44% 33% 6% 4%
Pathan (HGDP, N = 23) 23% 42% 16% 11%
Kalash (HGDP, N = 23) 22% 43% 18% 11%
Burusho (HGDP, N = 25) 23% 41% 12% 10%
Brahui (HGDP, N = 25) 12% 58% 12% 2%
Sindhi (HGDP, N = 24) 29% 46% 10% 6%
Kashmiri Pandit (Reich, N = 5) 32% 39% 12% 9%
Punjabi 43% 36% 5% 9%
Punjabi 39% 39% 9% 7%
Punjabi 34% 43% 7% 7%
Punjabi 34% 40% 12% 8%
Punjabi 33% 44% 5% 10%
Punjabi 31% 41% 14% 8%
Punjabi 29% 36% 11% 11%
Punjabi Arain (Xing, N = 25) 31% 44% 10% 7%
Punjabi Brahmin 35% 40% 8% 11%
Punjabi Brahmin 33% 41% 13% 10%
Punjabi Chamar 40% 33% 9% 6%
Punjabi Jatt 28% 39% 11% 10%
Punjabi Jatt 30% 44% 6% 14%
Punjabi Jatt 28% 42% 8% 13%
Punjabi Jatt 28% 46% 7% 13%
Punjabi Jatt 28% 40% 10% 15%
Punjabi Jatt 27% 44% 10% 13%
Punjabi Jatt 27% 35% 16% 11%
Punjabi Jatt Muslim 30% 39% 13% 8%
Punjabi Khatri 30% 42% 12% 12%
Punjabi Lahori Muslim 31% 44% 11% 8%
Punjabi Pahari Rajput 34% 43% 11% 7%
Punjabi Pakistan 28% 36% 16% 7%
Punjabi Ramgarhia 35% 43% 5% 9%
Haryana Jat 25% 33% 12% 17%
Haryana Jat 25% 33% 12% 17%
Haryana Jatt 28% 38% 5% 20%
Haryana Jatt 26% 39% 10% 17%
Rajasthan Marwari Jain 47% 34% 5% 6%
Rajasthani Agarwal 51% 37% 6% 1%
Rajasthani Brahmin 32% 38% 9% 15%
Rajasthani Marwari 48% 34% 6% 2%
Rajasthani Rajput 45% 38% 5% 9%
UP 40% 28% 10% 8%
UP Brahmin 41% 37% 7% 11%
UP Brahmin 40% 37% 7% 11%
UP Brahmin 37% 38% 2% 14%
UP Kayastha 47% 38% 5% 3%
UP Muslim 33% 33% 10% 9%
UP Muslim 28% 35% 12% 11%
UP Muslim Pathan 48% 36% 7% 4%
UP Muslim Syed 33% 31% 13% 7%
UP Syed 36% 37% 7% 8%
UP/Haryana Agarwal 52% 35% 6% 2%
UP/Haryana Jatt 28% 42% 7% 18%
UP/Madhya Pradesh 51% 27% 1% 7%
UP/Punjabi 40% 33% 7% 10%
UP/Punjabi Khatri 27% 43% 10% 11%
Bihari Baniya 47% 31% 5% 5%
Bihari Brahmin 39% 38% 5% 11%
Bihari Kayastha 53% 33% 1% 7%
Bihari Muslim 48% 28% 5% 8%
Bihari Muslim 42% 34% 9% 6%
Bihari Muslim 41% 36% 7% 8%
Bihari Muslim 42% 32% 7% 9%
Bihari Syed 42% 35% 4% 9%
Gujarati (HapMap, N = 63, Patel) 54% 42% 0% 1%
Gujarati (HapMap, N = 34, Non-Patel) 44% 39% 5% 7%

A recent paper suggested that a single pulse of admixture between South and East Asians occurred ~500 A.D. in the environs of what is today Bangladesh. Traditional accounts of the arrival of Brahmins in Bengal suggest a period around and after 1000 A.D. (Bengal was one of the last redoubts of institutional Buddhism in northern India, so it presumably would have had less need for the services of Brahmins). The results are easy to align with these two facts. All the Bengali non-Brahmins (Baidya are a non-Brahmin high caste in West Bengal) have substantial East Asian ancestry. The Bengali Brahmins have far less of it. Additionally, their “NE Euro” component is about double that of non-Brahmins. There is still room for the Bengali Brahmins being a synthetic community with some admixture (their East Asian fraction is still notably higher than elsewhere in South Asia), but the traditional narrative seems to explain the broad outline of these results.

When you look at South Indians from the four Dravidian states, four facts strike me as notable:

- There is a distinct difference between Brahmins and non-Brahmins (most of the non-Brahmins Zack has in the Harappa data set are upper caste, though the public data sets have Dalits and tribal populations)

- There is very little difference between South Indian Brahmins by region and sect (e.g., Iyengar vs. Iyer are Tamil Brahmins divided by theological differences).

- South Indian Brahmins are genetically distinct from North Indian Brahmins. They seem to have about one half the proportion of the “NE Euro” component as North Indian Brahmins (e.g., compare to Bengali Brahmins).

- South Indian non-Brahmin upper castes have very little of the “NE Euro” component, which is found at low but consistent fractions among non-Brahmins in the Gangetic plain (and at much higher fractions as one moves toward the Punjab).

I do not know the origin of the Pancha-Dravida group of Brahmins, but they look to be endogamous, derived from a single source, and probably admixed with the local substrate early on. This would explain their uniformity and their lower fraction of “NE Euro” relative to North Indian Brahmins. The results above also suggest that the Syrian Christians derive from converts from the Nair community, or related communities. This should not be surprising.

Finally, let’s move to North India, the zone stretching between the Punjab in the northwest and Bihar in the east. Though in much of this region Brahmins have higher “NE Euro” fractions, this relationship seems to break down as you go northwest. The Jatt community in particular seems to have the highest fraction in the subcontinent. There are inchoate theories that the Jatts originated in Central Asia. I had dismissed them, but now think they need a second look. The reasoning is simple. The Jatts of the eastern Punjab have a higher fraction of “NE Euro” than populations to their northwest (Pathans, Kalash, etc.), and than Brahmin groups in their area (e.g., Pandits) who are theoretically higher in caste status. This violation of both trends implies something not easily explained by straightforward social and geographic processes. The connection between ancestry and caste status also seems to break down somewhat in the northwest, where there is wide variation in ancestral components.

Someone with more knowledge of South Asian ethnography should weigh in. But until then I invite readers of South Asian heritage to submit their results to Zack.

(Republished from Discover/GNXP by permission of author or representative)

It is well known that Alexander the Great invaded the Indus river valley. Coincidentally, the mountains overlooking this region are home to isolated tribal populations whose physical appearance is at variance with that of other South Asians. In particular, they are much lighter skinned, and often blond or blue eyed. Naturally this led to 19th and early 20th century speculation that they were lost white races, perhaps descended from Macedonian soldiers of Alexander. This was partly the basis of the Rudyard Kipling story The Man Who Would Be King. Over time some of these people have themselves put forward this idea. In the case of a group such as the Kalash of Pakistan, this conjecture is supported by the exotic nature of their religion, which seems to be Indo-European and similar to Vedic Hinduism, with minimal influence from Islam.

Kalash girl, Credit: Dave Watts

The major problem with this set of theses is that they are wrong. The reason I bring up this tired old idea is that many people, including Wikipedia apparently, do not know that it is wrong. I’ve had correspondents sincerely bring up this model, and I’ve seen scholars present it offhand during talks. There are many questions in historical genetics which remain mysterious, or contentious. This is not one of them. Genotypes at hundreds of thousands of SNPs for the Kalash and Burusho have been released to the public. If you want to know how these populations stack up genetically, analyze them yourself. I know that they aren’t related to Macedonians because I have plenty of European population data sets, and plenty of South Asian ones. The peoples of the hills of Pakistan are clearly part of the continuum of the latter, albeit shifted toward Iranian peoples.

Those seeking further proof, and unable to analyze the data themselves for any reason, can check out my posts on the topic:

- The Kalash in perspective

- Kalash on the human tree

Addendum: It would be nice if someone corrected the appropriate Wikipedia entries.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Human Genetics, Human Genomics, Kalash 

Credit: Ryan Somma

Like many people, I’ve been following the tale of the Hobbits of Flores, H. floresiensis, with some interest since 2004. And like most people, I have no personal expertise or skill relevant to evaluating whether this putative hominin actually is a new species (as opposed to a pathological modern human). So how are we to evaluate a new PLOS ONE article which comes down on the side that it is a new species? First, my very vague impression is that over the past ~10 years the new-species camp has been gaining ground on the pathological-modern-human camp. But setting all that aside, perhaps the critical issue for me is that the likely reality of archaic human admixture into our own lineage means that the world is far stranger than we had thought in 2004. For various anatomical and paleoanthropological reasons H. floresiensis was implausible. But was it as implausible as the idea that the genome of a Siberian hominin would reveal admixture into modern Papuans?

Addendum: New York Times on the paper in PLOS ONE.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Hobbit, Human Evolution 

Related to Muhammad?
Credit: Ian Beatty

Last year a paper came out in AJHG which reported that Ethiopian populations seem to be a compound of West Eurasians and Sub-Saharan Africans. This result itself is not too surprising, for a host of reasons. First, Ethiopians and other populations of the Horn of Africa are physically intermediate between West Eurasians and Sub-Saharan Africans. 20th century physical anthropologists sometimes placed them in the “Caucasoid” racial classification for this reason. Second, the languages of the Horn of Africa have Afro-Asiatic affinities. The Cushitic languages (e.g. Somali) have deep connections with more familiar tongues such as Arabic, but the Semitic Ethiopian languages (e.g. Amharic) are much closer in historical distance. Third, there has been a fair amount of previous genetic analysis of these populations, and their synthetic character was obvious from it (e.g. mtDNA and Y results suggest a diverse array of haplogroups). What the AJHG paper reported was that the Eurasian ancestors of the Ethiopians admixed with the presumably Sub-Saharan indigenes ~3,000 years ago in a single pulse event, and that their closest modern relatives in West Asia today are Levantines. To put a mild gloss on it, the dating (which uses patterns of decay in genetic correlations between markers across the length of the genome) is controversial. This is not just clinal variation.

As I have noted, if this dating of the admixture is correct then modern Ethiopians as a coherent biocultural entity post-date Egyptian civilization by thousands of years. During the reign of Hatshepsut, ~1500 BC, a trade delegation was sent to the land of Punt (probably Somaliland). The Egyptians’ depictions of the people of Punt were very strange, insofar as they did not look Sub-Saharan African, and the queen of Punt seemed to exhibit steatopygia, a trait more common among the Khoisan. I posited speculatively that during this period of ancient Egyptian civilization East Africa was in ferment with regard to its population mix. The Bantu peoples who dominate the landscape of Sub-Saharan Africa east and south of Cameroon only began to expand in earnest after 1000 BC (reaching southern Africa about 1,500 years ago). It seems plausible that the range of Khoisan-like peoples was much further north and east than is the case today. Additionally, there are likely to have been other populations, currently uncharacterized, present on the landscape (it may be that the Khoisan loom large only because their distribution was such that relic populations survive to this day to be studied). The Tishkoff lab, for example, has a paper in preparation on the presumed Sub-Saharan African populations present in the Horn of Africa when West Eurasians arrived (the Sub-Saharan component of highland Ethiopians does not seem to be Bantu-like).

I bring all this up again because Dienekes highlights an abstract by Joe Pickrell at next week’s SMBE 2013:

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved temporarily or permanently into the region. The influence of these interactions on the genetic structure of current populations remains unclear. Here, using patterns of linkage disequilibrium, we show that there are at least two admixture events in the genetic history of southern African hunter-gatherers and pastoralists: one involving populations related to Niger-Congo-speaking African populations, and one which introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We estimate that at least a few percent of ancestry in the Khoisan is derived from this latter admixture event, which occurred on average 1,200-1,800 years ago. We show that a similar signal of west Eurasian ancestry is present throughout eastern Africa; in particular, we also find evidence for two admixture events in the genetic history of several Kenyan, Tanzanian, and Somali populations, the earliest of which involved populations related to southern Europeans and which we date to approximately 2700 – 3300 years ago. We thus suggest that west Eurasian ancestry entered southern Africa indirectly through eastern Africa. These results demonstrate how large-scale genomic datasets can inform complex models of population movements, and highlight the genomic impact of largely uncharacterized back-to-Africa migrations in human history.

The Khoisan here are not specified, so I took the liberty of putting an image of a Bushman at the head of the post. But it seems more plausible that they would be Khoi, who received cattle culture from non-Bantu populations at some point in the distant past (they already had cattle when the Bantu arrived). There is already evidence that the enigmatic Sandawe people of Tanzania have old Eurasian admixture, so this would not be particularly surprising. The Sandawe language has affinities with those of the Khoisan (clicks), though the broader language family as a coherent entity is still controversial (some of the Bushman languages themselves may not really have a close affinity, aside from broad distinctive similarities such as clicks). The whole question of the ethnogenesis of the Sandawe people is likely to remain clouded until we get population data sets with denser geographic coverage.

Jan Jonker Afrikaner, leader of the Oorlam people

As for how Eurasian ancestry might have entered the Khoisan, that process is easy to imagine, because more recent European and Asian ancestry has entered these populations over the last five centuries in ways which history has recorded. Some of this has been through organized ethnic amalgamation as one people becomes assimilated into another (just as the Alans assimilated into the Vandals in Spain after their defeat by the Romans). Consider the case of the Oorlam people, who are now part of the Nama Khoi tribe in Namibia.

In the period between the founding of the Cape Colony in the 17th century and the rise of modern South Africa in the early 20th century, a large number of multiracial individuals of Northern European, Asian (Indian and Indonesian), and native (Khoisan, and later Bantu) origins arose from various contacts (relationships between European men and slaves, relationships between African and Asian slaves, etc.). The best known of the resultant peoples are the Cape Coloureds. Though Afrikaans in language, and Dutch Reformed in religion, this population has a diverse racial heritage, and was relegated to second class status by white Afrikaners. Less famous, but still well known, are the Griqua, frontier folk who created their own political units, much as their white Afrikaner cousins did further east in the 19th century, to escape colonial rule as well as racial oppression due to their ‘bastard’ status. Finally you have groups like the Oorlam, who, like the Griqua, attained some superiority over native populations via their European cultural heritage and connections, but further into the bush in Namibia, beyond the farthest frontiers of the Dutch republics and British South Africa. But they were eventually submerged into the Khoi Nama after a series of defeats imposed upon them by a Bantu tribe. What you see here is a shift in the balance between European and native cultural traditions as a function of distance from the Cape Colony (some Griqua men were known to have disappeared into the desert and assimilated into their “mother’s kin,” probably explaining how some Bushmen have European and Asian ancestry).

It seems entirely possible that this sort of dynamic played out in antiquity in Africa (and other places). In some cases, where demographic preponderance and environmental conditions were amenable, the amalgamated populations maintained a large degree of affinity with Eurasians, rather than going “native.” This is certainly the case in the Horn of Africa, where Ethiopian polities intervened in Arabian affairs, and became part of the Oriental Orthodox community of polities (which includes the Copts of Egypt, Armenians, and Syrian Orthodoxy). To a great extent Ethiopia was more a frontier of the West Eurasian oikoumene than part of Sub-Saharan Africa. In contrast you have a situation such as that of the Sandawe, whose vague Cushitic affinities have long been suspected, but who had become hunter-gatherers like their Hadza neighbors, and adopted broad elements of the language and culture of the Khoisan. Finally, you have the likelihood that the Khoi peoples retained only extremely useful cultural knowledge such as animal husbandry (and animals) from their Eurasian forebears, whose ancestry in any case likely arrived via Sandawe-like cattle herders who were already hybridized.*

In the big picture what these results tell us is that the story of human prehistory is complex and multi-layered, and peeling it apart genetically is going to leave us with more questions than answers. Ten years ago a simple story of Out of Africa ~50,000 years ago and subsequent fission among non-Africans (e.g. Europeans separating from Asians, Amerindians separating from Asians) was a robust stylized model we could live with and accept with a clean conscience. Today we are confronted with a more inscrutable world, with archaic and Holocene admixture littering the scene. A substantial proportion of the world’s population (e.g. India) seems to be the byproduct of admixture between very distinct populations which merged less than 10,000 years ago. Much of Sub-Saharan Africa has been totally remodeled culturally, and likely genetically, over the past 3,000 years. Now there is a fair amount of evidence that eastern Africa has been subject to “back migration” over the past 5,000 years (though there have long been uniparental lineages which suggest this). A simple story of humans leaving an African Eden is no longer viable, because Africa wasn’t Eden for everyone, and modern Africans themselves bear the stamp of Eurasian migratory events, as well as extensive internal folk wanderings.

Finally, I would say that the most genetically valuable people to study might be the Mbuti Pygmies of eastern Congo. I suspect they’re the least touched of all Africans by both the Bantu expansion and the Eurasian back migrations. At least I hope so.

* The Lemba of Zimbabwe are probably one of the clues to what occurred in Southern Africa over the past 2,000 years.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Human Genetics, Human Genomics 

Credit: Sci Transl Med 3 July 2013: Vol. 5, Issue 192, p. 192ra86, Sci. Transl. Med. DOI: 10.1126/scitranslmed.3006338

Right before I went to sleep, a reader emailed me a link to a Nick Wade piece in The New York Times, Gene Sleuths Find How Some Naturally Resist Cholera. It’s about new research in Science Translational Medicine, Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta. The authors use the “composite of multiple signals” (CMS) test to identify regions of the genome subject to natural selection (it looks for long haplotypes, high-frequency derived alleles, and alleles with large frequency differences between populations). The results aren’t too surprising: I was born in Bangladesh, and I can attest that it’s a germaphobe’s nightmare. Rather, it is a secondary and very minor aspect of the paper which frankly draws my ire. First, let’s quote Wade’s treatment:

As a necessary preliminary to testing for natural selection, the researchers looked at the racial composition of the Bengali population and found that they are an Indian population with a 9 percent admixture of East Asian genes, probably Chinese. The admixture occurred almost exactly 52 generations ago, according to statistical calculation, or around A.D. 500, assuming 29 years per generation. The Gupta empire in India was in decline at this time, but it is unclear whether the intermarriage with East Asians took place through trade or conquest. “We can now go back to the historians and see what happened then,” Dr. Karlsson said.
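The calendar conversion in the quoted passage is simple arithmetic, but it is worth seeing how sensitive it is to the assumed generation time. A minimal sketch, assuming a reference year of 2013 (the paper's publication year; the date the samples actually represent may differ) and the paper's 29 years per generation; the function name is hypothetical:

```python
def generations_to_year(generations: float, years_per_generation: float = 29.0,
                        sampling_year: int = 2013) -> float:
    """Approximate calendar year (A.D.) of an admixture event.

    Assumes a hypothetical reference year of 2013 and, by default,
    the 29-year generation time used in the paper.
    """
    return sampling_year - generations * years_per_generation

point = generations_to_year(52)
older = generations_to_year(54)    # 52 + 2 generations: older bound
younger = generations_to_year(50)  # 52 - 2 generations: younger bound
print(f"~{point:.0f} A.D. (range {older:.0f}-{younger:.0f} A.D.)")

# Same 52 generations under a shorter, 25-year generation time.
print(f"~{generations_to_year(52, years_per_generation=25):.0f} A.D. at 25 years/generation")
```

With these assumptions the ±2 generation interval spans roughly 447-563 A.D., while switching to a 25-year generation time moves the point estimate to ~713 A.D. The generation time matters at least as much as the statistical error.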

But sometimes science gets garbled in transmission. What do they say in the paper? Again, the relevant section:

We estimate that the admixture between Indian and East Asian populations occurred 52 ± 2 generations ago (generation = 29 years)…or around 500 AD, based on the exponential decline of linkage disequilibrium (LD) with distance analyzed using ROLLOFF…This remarkably close-fitted age estimate roughly corresponds to the collapse of the Indian Gupta Empire, the rise of the Chinese Tang dynasty, and the brief unification of Bengal under a single ruler (590 AD to 625 AD). Although alternative histories, such as continuous admixture or multiple admixture events, are possible, the single-event model shows excellent fit to our data, and we found no statistical support for very ancient flow…Using the maximum likelihood–based ancestry estimation software ADMIXTURE (32), we found 9.3 ± 2.6% East Asian ancestry in the BEB….
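The dating rests on the idea that admixture LD between markers decays roughly as exp(-n·d), where d is the genetic distance in Morgans and n the number of generations since a single admixture pulse. The sketch below is a deliberately simplified, hypothetical illustration of that idea, not ROLLOFF itself (which weights LD by allele-frequency differences between reference populations and fits the curve differently). It simulates a noisy decay curve and recovers n with a log-linear least-squares fit; the function names, distance grid, amplitude, and noise level are all assumptions.

```python
import math
import random

def simulate_ld_decay(n_generations, amplitude=0.01, sigma=0.05, seed=1):
    """Hypothetical admixture-LD curve: amplitude * exp(-n * d) with
    multiplicative log-normal noise, at distances d in Morgans."""
    rng = random.Random(seed)
    distances = [0.005 + i * 0.002 for i in range(100)]  # 0.5 cM to ~20 cM
    ld = [amplitude * math.exp(-n_generations * d) * math.exp(rng.gauss(0.0, sigma))
          for d in distances]
    return distances, ld

def estimate_generations(distances, ld):
    """Fit log(LD) = log(A) - n * d by ordinary least squares; return n."""
    ys = [math.log(v) for v in ld]
    mean_d = sum(distances) / len(distances)
    mean_y = sum(ys) / len(ys)
    cov = sum((d - mean_d) * (y - mean_y) for d, y in zip(distances, ys))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var  # the slope is -n

distances, ld = simulate_ld_decay(52)
n_hat = estimate_generations(distances, ld)
print(f"estimated admixture age: {n_hat:.1f} generations")
```

Because d is measured in Morgans, the fitted decay rate comes out directly in generations, which then converts to years using an assumed generation time. Continuous or multiple pulses of admixture would produce a mixture of exponentials that a single-rate fit like this one would blur together.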

If you read the rest of the paper you can see where Wade would get the idea that the admixture was “probably Chinese,” but I don’t think it jumps out from the text or the supplements. Perhaps he got this from the first author who is quoted; I have no idea. The population history component of this work is not essential to understanding the selection operating via disease, but the problem when this sort of confusion is allowed to stand is that it solidifies into conventional wisdom. I’ve seen this happen before, as throwaway lines have a way of persisting and spreading. It also makes geneticists seem superficial when they step outside their narrow domains of presumed knowledge.

In fact, the supplements make clear who these East Asians in the Bengali ancestral gene pool were. At K = 5 a Malay-like ancestral component emerges, and all of the East Asian ancestry of the Bengalis is assigned to it. The Han Chinese and Chinese in Singapore data sets are both split between a Japanese-like and a Malay-like component, with the Chinese in Singapore sample more Malay-like. This is entirely expected, as the Singapore Chinese population is disproportionately from southern Fujian, with a minority of Hakka (who are reputedly northern transplants to South China), as well as Straits Baba Chinese, who have maternal Malay ancestry. The Han Chinese data set is from the HapMap, with a Beijing sample which is northern biased. In sum, the East Asian ancestry of the modern Bengalis is almost certainly derived from a group with the closest affinities to those in Southeast Asia, not the Chinese.
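The underlying logic, that the best proxy for an unsampled source is the candidate population which best explains the admixed allele frequencies, can be illustrated with a toy model. This is a hypothetical least-squares sketch, not ADMIXTURE (which maximizes a likelihood over individual genotypes). All allele frequencies are simulated under assumed drift parameters: two candidate East Asian sources share an ancestor, and the true, unsampled source is deliberately made a close relative of candidate A.

```python
import random

def clip(p, lo=0.01, hi=0.99):
    """Keep simulated allele frequencies inside (0, 1)."""
    return min(hi, max(lo, p))

def simulate(n_snps=2000, alpha=0.09, seed=7):
    """Hypothetical frequencies: South Asian reference, candidates A and B,
    and an admixed population with fraction alpha from the true source."""
    rng = random.Random(seed)
    south, cand_a, cand_b, admixed = [], [], [], []
    for _ in range(n_snps):
        p0 = rng.uniform(0.1, 0.9)               # shared ancestral frequency
        p_s = clip(p0 + rng.gauss(0, 0.1))       # South Asian branch
        p_e = p0 + rng.gauss(0, 0.1)             # common East Asian ancestor
        p_a = clip(p_e + rng.gauss(0, 0.05))     # candidate A (Southeast Asian-like)
        p_b = clip(p_e + rng.gauss(0, 0.05))     # candidate B (northern Han-like)
        p_true = clip(p_a + rng.gauss(0, 0.01))  # unsampled true source, close to A
        south.append(p_s)
        cand_a.append(p_a)
        cand_b.append(p_b)
        admixed.append(alpha * p_true + (1 - alpha) * p_s)
    return south, cand_a, cand_b, admixed

def fit_source(south, source, admixed):
    """Least-squares fit of admixed = a*source + (1 - a)*south; return (a, RSS)."""
    xs = [s - r for s, r in zip(source, south)]
    ys = [m - r for m, r in zip(admixed, south)]
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    rss = sum((y - a * x) ** 2 for x, y in zip(xs, ys))
    return a, rss

south, cand_a, cand_b, admixed = simulate()
for name, cand in [("candidate A", cand_a), ("candidate B", cand_b)]:
    a, rss = fit_source(south, cand, admixed)
    print(f"{name}: alpha={a:.3f}, RSS={rss:.5f}")
```

In this setup candidate A recovers the simulated 9% fraction with a far smaller residual, while candidate B still yields a nonzero but worse-fitting estimate. A distant relative of the true source can therefore "absorb" the East Asian signal, which is why the choice of reference panel matters when labeling ancestry.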

I have long suspected this because I have genotypes for two unrelated Bengalis: my parents. I’ve talked about this before. Some preliminary investigation on my part, looking at the length of East Asian segments, did suggest an admixture event ~1,000 years ago, so I can believe the ROLLOFF results. But comparing them to various populations, it’s clear that the East Asian ancestry is Southeast Asian for both of them. Here’s an MDS plot I generated 15 minutes ago using 133,000 LD-pruned SNPs:

The Chamar are a low caste group from Uttar Pradesh (to the north and west of Bengal). Notice that my parents (“RazibFam”) are both shifted toward the Southeast Asian groups. Totally unsurprising. No matter how you analyze the results, this jumps out (the second dimension pulled out a few inbred individuals from another Southeast Asian group).

So assuming one admixture event (or a primary one), what’s a good story? Eastern Bengal has always been on the margins of, and beyond, Aryavarta, the cultural core of the North Indian plain. Magadha, the heart of the Gupta Empire in Bihar, was long a marchland. Like the region around Xian in northwest China, Magadha was often the locus of the classical pre-Islamic Indian macro-polities, despite (or perhaps because of?) its liminal relationship to the broader Indo-Aryan culture of Northern India, which was fixed upon the Upper Gangetic plain. As Magadha declined, and Bengal rose under dynasties such as the Pala later in the first millennium, the frontier of Indo-Aryan society rapidly expanded outward and eastward. But I doubt eastern Bengal was empty. It is quite possible there were slash-and-burn agriculturalists who had arrived from Southeast Asia at some earlier time. And it is these people who I believe were absorbed into Bengali society in toto.

(Republished from Discover/GNXP by permission of author or representative)

Encephalization of hominins
Credit: Luke Jostins

Over the past ~2 million years, up until ~100-200 thousand years before the present, the lineages leading up to modern humans exhibited gradually increasing cranial capacities. Why? The implication of this anatomical shift is that our brains were getting larger, and our crania had to expand to accommodate them. Larger brains are not a trivial matter. Human brains account for roughly 25% of our energetic expenditure. In other words, they’re expensive, so presumably they confer some major gain in fitness.

There have been many reasons posited for this increase in brain size. Please note though that this gradual increase predates the cultural creativity of the Upper Paleolithic, and the emergence of behaviorally modern humans ~50,000 years before the present. In fact there has been a mild reversal of encephalization since the Last Glacial Maximum ~20-25 thousand years before the present. So this is not as simple a conundrum as you might think.

About a generation ago the anthropologist Robin Dunbar came up with an answer which many have found persuasive. Using comparative data from other primates, as well as human ethnographies, he posits that increased social complexity, facilitated by language, placed greater cognitive demands on our lineage. The basic intuition is obvious. Keeping track of interactions across a dyad, two individuals, is not particularly demanding. But a “three body” social problem is not just incrementally more complex. As band sizes scale up to a dozen individuals, and groups of bands include hundreds (clans?), individuals have to keep track of a maddeningly complex network of relations. Dunbar’s surveys suggest that the real social network of humans (e.g., non-famous people with whose personal lives you have some familiarity) does not go much beyond 200 individuals. This is Dunbar’s number.
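To make the “three body” intuition concrete, one can simply count how many distinct pairs and triads a group contains. This is only a counting illustration under the assumption that every pair or triad is a relationship worth tracking; it is not Dunbar's own model, and the function name is hypothetical.

```python
from math import comb

def relationship_counts(n):
    """Number of distinct pairs and triads among n people."""
    return comb(n, 2), comb(n, 3)

for size in (2, 3, 12, 150):
    pairs, triads = relationship_counts(size)
    print(f"{size:>4} people: {pairs:>6} pairs, {triads:>7} triads")
```

Pairs grow roughly with the square of group size and triads roughly with its cube: a dozen people already yield 66 pairs and 220 triads, and 150 people yield 11,175 pairs and 551,300 triads.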

The neurological rationale for this ceiling is obvious: your brain can only keep so many people, and connections between those people, in long term memory. In theory the brain might become even larger, but the fact that the human female pelvis can widen only so much while maintaining proper locomotion likely serves as a check on cranial capacity. Dunbar’s number is then the outcome of a compromise between the fitness gains to individuals of being in large social groups (and navigating those groups well) and the anatomical limits of the female body.

Grooming, Gossip, and the Evolution of Language, the book outlining the thesis in long form, came out over 15 years ago. But it is only today that Robin Dunbar has become very hot, and very topical, as scholars go. The reason is the rise of ubiquitous social networking technologies. A few years ago a friend of mine posited that perhaps with Facebook we may now have ways to move beyond Dunbar’s number. I was skeptical. But then again, computers allow us to break out of our cognitive limits in many domains, so it’s not totally impossible (imagine doing an MCMC by hand!). But the results seem to be in, and it turns out that many people within Silicon Valley view Dunbar’s number either as a genuinely insurmountable ceiling or as a challenge which is not trivially tractable.

The details are outlined in a BusinessWeek piece, The Dunbar Number, From the Guru of Social Networks. Though the article reports on some scholars who dispute the value or utility of Dunbar’s number, I think it is clear that, despite this scholarly debate, the constraint of our social domains to fewer than 1,000 individuals is a robust finding. This is important, because it suggests that there are numerical units of particular relevance when it comes to scaling large social organizations, including nations of hundreds of millions of individuals. These Dunbar-number-sized units are probably the civil society “platoons” alluded to by social philosophers of previous centuries. They are the bricks upon which society is built.

But for me the more interesting question is the possibility of variation in social network size and topology by individual. If the expected size of an individual’s social network is 150, what’s the variance? Presumably individuals with severe autism approach zero, as they cannot form conventional relationships with other human beings. But what about individuals with very high functioning autism? And how about the other end? Are there those whose cognitive powers are allocated in such a way that they can push somewhat beyond Dunbar’s number? Those more familiar with the literature in this area are free to enlighten me….

I’ll leave you with a fascinating paragraph from the piece:

…Dunbar actually describes a scale of numbers, delimiting ever-widening circles of connection. The innermost is a group of three to five, our very closest friends. Then there is a circle of 12 to 15, those whose death would be devastating to us. (This is also, Dunbar points out, the size of a jury.) Then comes 50, “the typical overnight camp size among traditional hunter-gatherers like the Australian Aboriginals or the San Bushmen of southern Africa,” Dunbar writes in his book How Many Friends Does One Person Need? Beyond 150 there are further rings: Fifteen hundred, for example, is the average tribe size in hunter-gatherer societies, the number of people who speak the same language or dialect. These numbers, which Dunbar has teased out of surveys and ethnographies, grow by a factor of roughly three. Why, he isn’t sure.
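The “factor of roughly three” in the passage can be checked against the numbers it quotes. A minimal sketch, assuming the upper end of each quoted range (5 for “three to five”, 15 for “12 to 15”) plus the 50 and 150 rings; the 1,500 tribe figure is then used to ask how many such steps lie beyond 150:

```python
import math

# Upper ends of the inner circles quoted above (an assumption for ranges).
layers = [5, 15, 50, 150]

ratios = [outer / inner for inner, outer in zip(layers, layers[1:])]
# Geometric-mean step size between consecutive circles.
step = (layers[-1] / layers[0]) ** (1 / (len(layers) - 1))
# Number of such steps separating 150 from the tribe-sized ring of 1,500.
steps_to_tribe = math.log(1500 / 150) / math.log(step)

print("ratios:", [round(r, 2) for r in ratios])
print(f"mean step ~ {step:.2f}, steps from 150 to 1500 ~ {steps_to_tribe:.2f}")
```

The consecutive ratios are 3.0, 3.33, and 3.0, for a geometric mean of about 3.1, and 1,500 sits almost exactly two such steps beyond 150. That arithmetic alone would place an intermediate ring near 150 × 3.1 ≈ 470; the quoted passage does not name one, so this is an inference from the numbers, not a claim about Dunbar's data.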

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Dunbar's Number, Encephalization 

Several years ago I reviewed Christopher Beckwith’s magisterial Empires of the Silk Road: A History of Central Eurasia from the Bronze Age to the Present. In many ways Beckwith’s narrative is a refreshing inversion of the traditional form of macrohistory, whereby charter societies along the Eurasian littoral extend civilizing tendrils toward the heartland, and are met with periodic barbaric eruptions which they then have to assimilate. From what I can gather Beckwith is not a subjectivist. Rather, the inversion of perspective serves to flesh out neglected dynamics at work across history and near prehistory. For example, he highlights the reality that core polities of the Eurasian littoral often crystallized on the barbaric marches of established civilization via a process of synthesis between the two cultures. Zoroastrian religion emerged on the northern frontier in Khorasan rather than in the southwestern Iranian heartland of Fars. Han China’s predecessor, the Qin dynasty, arose from a marcher state in the northwest, and the same was true of an earlier ruling house, that of the Zhou. In India classical Hindu civilization first congealed in an elaborated form in Magadha, on the eastern frontiers of Aryavarta. In the West, Rome was fundamentally a barbaric and peculiar fringe polity, with only tenuous connections to Magna Graecia, and arguably more influenced by the enigmatic Etruscans.

The last of the World Conquerors?

The vigor of frontiers is such an established historical cliché that I have no great enthusiasm to revisit it in detail. Rather, following Beckwith, I believe we need to seriously revisit the proposition that the vast expanses of the Eurasian heartland beyond the civilized frontiers have served as more than just a source of militarized barbarians bent on exploitation. Yes, all that is true, but it seems likely that the cultural and racial melange at the intersection of internal Eurasian trade networks has fundamentally reshaped the contemporary landscape in ways we are only now beginning to understand. But first, our worldview has to acknowledge that not all peoples and lands have made contributions of equal weight to the shape of the world.

Elements of civilized society, from organized religion to bureaucracy, seem to have arisen in distinct and unique forms in three of the charter hearths of the Eurasian littoral. In the west was the cluster of societies which radiated out of the Levant and Mesopotamia. In the east the north China plain served as the locus of the proto-Han civilization. And in South Asia the northwestern region between the Indus and the Ganges gave rise to an influential cluster of societies. To illustrate my point: the culture of Java is unique, with deep indigenous roots, but its high civilization has clearly been shaped by both South Asia and, later, West Eurasia (in the form of Islam). In mainland Southeast Asia the people of Vietnam by and large look to the north, to China. Their rulers were self-styled Emperors, who administered a bureaucratic society. In contrast the societies to their west are more Indic, in that their symbolic currencies are rooted in South Asia (e.g. Theravada Buddhism and the Chakravartin).

Spread of the Indo-European languages?
Credit: Dbachman

But the barbarians of the heartland are not without accomplishment either. Though there are still debates as to the ultimate origins of the Indo-Europeans, I think it is hard to dispute that at least some of the expansion of this language family was mediated via the Eurasian heartland. Later in history the Turkic language family spread rapidly over the course of 500 years, moving from a group of dialects clustered on the trans-Siberian fringe to an international collection of tongues spanning China to Europe. Though united by language, the Turkic peoples are biologically variegated. Populations such as the Kirghiz and Yakut remain predominantly East Eurasian in character. Those such as the Rumelian Turks have only the mildest of tinctures, if any, of East Eurasian ancestry. Those groups occupying the middle ground include most Central Asians, such as Uzbeks. I suspect some of the same applies to Indo-Europeans. Genetically there may be little in common, but the telltale signs of genetic affinity will eventually be found.

Conquering multiculturalists!

In earlier ages the narrative of the rise of civilizations tended toward an explicit or implicit racial diffusionism. By this, I mean that in antiquity and the early medieval period potentates asserted lineages which went back to the ancient Greeks, Trojans or Hebrews. This established their legitimacy because the high civilization which Northern Europeans inherited had Mediterranean antecedents. In our more recent era more explicitly racialist narratives of Nordic hordes spilling out of the north have been bandied about. A working assumption in both these models is that the purity of the vigorous herrenvolk of yore degrades over time. Asabiyyah unwinds via natural processes.

Moderns have turned their backs on these narratives because they are legendary and unpalatable. Even believing Christians are unlikely to accept that the British royal family is of the lineage of King David. The heroes of Homer are simply not relevant to us due to the decline of classical education. As for theories of the Nordic superman, that sort of racial triumphalism went out of favor with the Nazis. In their place is an inchoate set of impulses, perhaps best articulated by the “pots, not people” framework in archaeology. Rather than a broad framework there is a vague sentiment of cultural egalitarianism which fits uncomfortably with the periodic rapid and explosive spread of “pots.”

But I am here to present a new model. One of mongrelization, hybridization, and synthetic vigor. The cultural elements of this model have been long present. The Ottoman Turks assimilated Armenian, Kurd, and Greek notables, so that the Sultans of the later years had little “Turkic” blood in them. But their language remained Turkic, and some aspects of their cultural mythology were grounded in their Central Asian origins. Today multiculturalism is often perceived to be an egalitarian ideology, but the Ottomans represent a more accurate historical instance. Though synthetic in origin they had a core self-identity which was domineering, expansive, and acquisitive. Those who did not assimilate to that self-identity were made to be subjects, with second class status.

Today genetics is telling us that these long term connections and diffusions across Eurasia are very old. Modern Europeans seem to have a non-trivial quantum of East Eurasian ancestry. Many East Eurasian groups likewise carry some West Eurasian ancestry. Modern Indians are clearly a hybrid between a West Eurasian and South Eurasian set of populations. And these are simply the more distant genetic affinities which have been scrambled. Today Dienekes posted a translation of a German research project which documents the ethnic complexity of the Eurasian heartland thousands of years ago. Multi-layered complexity in the heartland has very antique roots. In Empires of the Silk Road Christopher Beckwith emphasizes that the free men of the steppe formed bonds of brotherhood which cross-linked them across ethnicity and family (e.g. Jamukha and Temujin). Perhaps these ideological paradigms predicated upon fictive kinship are a natural response of peoples whose origins are synthetic, and who cannot fall back on implicit and traditional myths of identity.

The massive polities of the Eurasian littoral produced enough surplus to be worth stealing, and their own rulers stole it first. In ancient Egypt the pharaoh even claimed the whole land as his private property. This is what the brotherhood of the steppe craved, and this is what they often captured. How did they do this? As peoples of diverse origins brought together from the antipodes of Eurasia, perhaps their primary currency was ideological toolkits which allowed for greater coordination and organization. While the rulers of the littoral societies viewed their peasant masses as an extractive resource, men such as Temujin and Attila had to be entrepreneurial, always maximizing the productivity of their human capital and operating as a lean organizational machine. They were the investment bankers of their age, plundering the human capital of distant lands, and binding them together toward one selfish purpose.

Addendum: See The Geographic Pivot of History.

(Republished from Discover/GNXP by permission of author or representative)
• Category: History, Science • Tags: Anthropology 

Credit: Robert Payne

In my earlier post on Prince William’s mtDNA lineage, and its possible Indian provenance, I didn’t address the issue of genetic privacy in much detail. The discussion is relevant in this case because BritainsDNA inferred his lineage by looking at distant relatives. Assuming that the biological pedigree we have for William is correct, he must share the mtDNA of his relatives who descend in an unbroken line from a common female ancestor.

A concern about the breach of privacy emerged almost immediately. Though I have serious reservations about the sensationalism which BritainsDNA has engaged in, I think it is totally legitimate of them to infer William’s ancestry in the fashion they did. First, Prince William is a public person, and in direct line to the throne of the United Kingdom. Though some of the spin may be distasteful, remember that this is a person who is where he is because of his ancestry. Second, anyone who performs genealogical research is exposing the information of family members, often without their consent. If William’s mtDNA haplogroup were known to be pathogenic, then the case for withholding the information from the public would seem straightforward. As it is, all that was uncovered was relatively banal: that William may have a South Asian ancestress. There’s a lot of information about me that I’d rather others not know, but that’s not how the world works. In the grand scheme of things this just isn’t a big deal, and we should focus on the more concrete problem of public understanding of science, and on long-term issues of genetic privacy more generally.

Addendum: I am aware of concerns in regards to paternity. On the whole I think that in most situations this is information that is going to come out in any case, so it wouldn’t hurt for it to emerge earlier. Additionally, in the case of historical figures such as Thomas Jefferson, whose presumed lines of descent were tested, there were widely diverging views among the white descendants as to whether they should cooperate because of the possible moral implications. I suspect most would agree it is better to know this information, even though it implied that a line of putative black Jefferson descendants may have paternity misassignment in their lineage. Finally, these issues are obviously far diminished in the case of mtDNA, since maternity is essentially certain. Though one never knows whether someone was adopted and never told.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Genetics, Prince William 

MDS of all samples

Noah Rosenberg’s lab has put out the mother of all microsatellite papers, Population Structure in a Comprehensive Genomic Data Set on Human Microsatellite Variation. It seems to me that this is the culmination of all the work with microsatellite markers which has come out of his lab over the past decade, applying all sorts of fancy analytic techniques they’ve developed (for example, Procrustes transformation). The big thing to note is that the human sample size is nearly 6,000 individuals with over 600 loci. Because microsatellites mutate and diverge very fast (mutation rates of roughly 10⁻⁴ rather than 10⁻⁸ as with SNPs), 600 loci is more than sufficient to differentiate populations. Because of this rapid mutation I’m a little dubious about their attempt to explore human-chimp differences using a smaller set ascertained on humans, though that may be simply a proof of principle (if the markers evolve too fast they might not tell you anything informative about very deep divergences).
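As a rough illustration of why a few hundred microsatellites carry so much information, the sketch below compares expected per-locus heterozygosity for a microsatellite-like locus and a single SNP-like site. It assumes an effective population size of 10,000 and uses two textbook equilibrium formulas, the stepwise mutation model (Ohta and Kimura) and the infinite-alleles model; the parameter values are illustrative assumptions, not estimates from the paper.

```python
import math

def het_stepwise(n_e: float, mu: float) -> float:
    """Equilibrium expected heterozygosity under the stepwise mutation model,
    H = 1 - 1 / sqrt(1 + 8 * Ne * mu), a common rough model for microsatellites."""
    return 1.0 - 1.0 / math.sqrt(1.0 + 8.0 * n_e * mu)

def het_infinite_alleles(n_e: float, mu: float) -> float:
    """Expected heterozygosity under the infinite-alleles model, theta / (1 + theta)
    with theta = 4 * Ne * mu; for a single SNP site this is close to 4 * Ne * mu."""
    theta = 4.0 * n_e * mu
    return theta / (1.0 + theta)

N_E = 10_000  # assumed long-term effective population size, for illustration only
micro = het_stepwise(N_E, 1e-4)        # microsatellite-like mutation rate per generation
snp = het_infinite_alleles(N_E, 1e-8)  # SNP-like mutation rate per site per generation

print(f"microsatellite locus: H = {micro:.3f}")  # H = 0.667
print(f"single SNP site:      H = {snp:.5f}")    # H = 0.00040
```

Genotyping arrays ascertain SNPs that are already polymorphic, so the gap per typed SNP is much smaller than this per-site comparison suggests; the point is only that each fast-mutating locus is far more variable.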


Reading the paper it’s quite obvious that just merging the samples was a big feat. And it’s not just sample size; they had excellent population coverage (267 populations). As Dienekes observes, microsats are somewhat “retro,” but try to get this sort of population coverage with whole genomes, or even SNPs. You can get to N > 5,000, but with SNPs the overlapping markers start to drop off very quickly, to the point where they are far less informative than this number of microsats. Dienekes quite liked the tree to the left, and I’ve uploaded a rather large version of it for your enjoyment (just zoom in if your browser sizes it down).

But to some extent the tree above illustrates the limitations of this sort of analysis. This is less an analysis than a useful data set that you have to slice and dice, and explore at a finer grain. Pooling all the samples together makes the picture far less informative, even unintelligible. This is already evident in how they aggregated the large data set, as they had to prune very large subpopulations so they didn’t overwhelm the results. Even then problems obvious to those familiar with the data crop up, though they might not be so clear to those who are reading superficially. The Gujarati data set among the South Asians separates out from all other populations on a two-dimensional visualization. This often occurs because it looks like the Gujaratis were sampled from a very specific caste, which inflates the apparent affinity within this regional ethnicity. Similarly, pooling all the populations and representing them on a two-dimensional plot is more an aesthetic declaration than an informative visualization. You have to bracket out the populations to see value-added structure. Finally, even the coarse and general observations need to be interpreted with caution. Rosenberg’s lab has been illustrating the decay of genetic diversity with distance from Ethiopia for nearly a decade now. It’s a classic result which shows up in graduate level population genetics courses. But both anthropology and genetics tell us that Ethiopians are a compound population with Sub-Saharan African and Eurasian affinities. Most readers can be expected to know this, but I would not be surprised if some simply took the general plot at face value and applied the insight to all the populations, as if they really were all subject to a serial founder effect (my specific point is that Ethiopians are the product of a synthesis due to back migration, a reversal of the general migration out of Africa which the decline in genetic diversity illustrates).
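The serial founder effect, and why an admixed population breaks it, can be sketched in a few lines. This is a toy model under loudly simplifying assumptions (biallelic loci, each colony founded by 50 individuals drawn from its neighbor, no mutation or migration; all parameters and function names are hypothetical, not taken from the paper): heterozygosity declines at each founding step, while a population formed by mixing a near-source colony with a distant one ends up more diverse than its distant parent.

```python
import numpy as np

def serial_founder_chain(n_steps=20, n_founders=50, n_loci=5000, seed=0):
    """Allele frequencies along a chain of colonies. Each colony is founded by
    2 * n_founders gene copies sampled from the previous one (a one-generation
    bottleneck followed by instant regrowth; no mutation or migration)."""
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.1, 0.9, n_loci)  # assumed source allele frequencies
    freqs = [p]
    for _ in range(n_steps):
        p = rng.binomial(2 * n_founders, p) / (2 * n_founders)
        freqs.append(p)
    return freqs

def expected_het(p):
    """Mean expected heterozygosity, 2p(1 - p), across biallelic loci."""
    return float(np.mean(2 * p * (1 - p)))

freqs = serial_founder_chain()
het = [expected_het(p) for p in freqs]
for step in (0, 5, 10, 20):
    print(f"colony {step:2d}: H = {het[step]:.3f}")

# Hypothetical back-migration: 60% of ancestry from colony 1 (near the source) and
# 40% from colony 20 (far down the chain). Mixing allele frequencies raises H
# above that of the distant parent, so this group would sit off the decline line.
p_mix = 0.6 * freqs[1] + 0.4 * freqs[20]
print(f"admixed:   H = {expected_het(p_mix):.3f}")
```

Each founding step multiplies expected heterozygosity by about 1 - 1/(2 × 50), which is what produces a near-linear decline; the mixture result follows from 2p(1 - p) being concave in p.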

Overall I find this an interesting paper which sets the backdrop for understanding the canvas of human genetic variation. One last caution I would offer is that microsatellites are atypical regions of the genome which evolve rapidly in a neutral fashion. This makes them excellent for pinpointing population differences and inferring history from a limited marker set. But I think people should be cautious of specific novel results, and not hold them up as authoritative when we have high-density SNP data.

Note: They’ve released the data. If readers are curious about doing different things with these data than was shown in this paper, TreeMix can handle microsats. Also, props to them for releasing the paper under a Creative Commons license.

Citation: Pemberton, Trevor J., Michael DeGiorgio, and Noah A. Rosenberg. “Population structure in a comprehensive genomic data set on human microsatellite variation.” G3: Genes|Genomes|Genetics 3.5 (2013): 891-907.

(Republished from Discover/GNXP by permission of author or representative)

Christopher Columbus

A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy had to do with the fact that some individuals today claim to be Taino, and therefore, they were not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which reconstructs the recent demographic history of the Caribbean, both maritime and mainland, using genomics. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP) the focus is on the populations highlighted in the map above.

Much of the novel insight in the results begins with their observation of a distinct “Latino” population genetic cluster with strong affinities with Europe within the Caribbean populations. This is clearly visible in their ADMIXTURE analysis. What they did was pool various populations, and run a method which decomposes the ancestry of each individual as a combination of K ancestral populations. In cases where the pooled populations are clear and distinct the results will be clear and distinct. For example, if you had 50 Finns and 50 Nigerians and pooled them, and ran ADMIXTURE at K = 2, then with a non-trivial number of SNPs (10,000 is more than sufficient) the Finns and Nigerians will partition into two distinct ancestral populations under this sort of model-based clustering. But it always has to be remembered that though these methods map onto reality, and give us some sense of the variation within the data sets, the K’s themselves are artificial constructs. So, for example, the HGDP Maya population is known to have non-trivial European gene flow. If you use this sort of Maya population as your “Native American” reference, then you will underestimate Native ancestry in admixed groups because your reference Native population is already skewed toward Europeans (this is obviously a major problem when you don’t have the appropriate reference because it is extinct, such as with the Taino).
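To make the K = 2 example concrete, here is a minimal sketch of the likelihood model behind ADMIXTURE-style clustering. It is not the ADMIXTURE program, which fits this model with a faster optimizer; it is a plain expectation-maximization (EM) loop of the kind used by earlier tools such as frappe. The two simulated “Finn-like” and “Nigerian-like” populations, the assumed F_ST of 0.15, and all function names are hypothetical.

```python
import numpy as np

def simulate_two_pops(n_per_pop=50, n_snps=2000, fst=0.15, seed=1):
    """Genotypes (0/1/2 copies of allele 1) for two populations drifted from a
    shared ancestor under the Balding-Nichols model; fst is an assumed value."""
    rng = np.random.default_rng(seed)
    anc = rng.uniform(0.05, 0.95, n_snps)
    scale = (1 - fst) / fst
    blocks = []
    for _ in range(2):
        p = rng.beta(anc * scale, (1 - anc) * scale)  # population allele frequencies
        blocks.append(rng.binomial(2, p, size=(n_per_pop, n_snps)))
    return np.vstack(blocks)

def admixture_em(G, K=2, n_iter=300, seed=0):
    """EM for the admixture likelihood. Returns Q (n x K ancestry proportions)
    and P (SNPs x K ancestral frequencies of allele 1)."""
    rng = np.random.default_rng(seed)
    n, L = G.shape
    Q = rng.dirichlet(np.ones(K), size=n)
    P = rng.uniform(0.2, 0.8, size=(L, K))
    G3 = G[:, :, None].astype(float)
    for _ in range(n_iter):
        # E-step: share of each individual's allele-1 and allele-0 copies at each
        # SNP attributed to ancestral population k.
        w1 = Q[:, None, :] * P[None, :, :]
        w0 = Q[:, None, :] * (1 - P)[None, :, :]
        a = G3 * w1 / w1.sum(axis=2, keepdims=True)
        b = (2 - G3) * w0 / w0.sum(axis=2, keepdims=True)
        # M-step: re-estimate proportions and frequencies from expected counts.
        Q = (a + b).sum(axis=1) / (2 * L)
        P = a.sum(axis=0) / ((a + b).sum(axis=0) + 1e-12)
        P = np.clip(P, 1e-6, 1 - 1e-6)
    return Q, P

G = simulate_two_pops()
Q, P = admixture_em(G)
# Component labels are arbitrary (label switching), so compare the two groups.
print(f"mean Q[:, 0], first 50 individuals: {Q[:50, 0].mean():.3f}")
print(f"mean Q[:, 0], last 50 individuals:  {Q[50:, 0].mean():.3f}")
```

Because which component ends up labeled 0 is arbitrary, the check compares the two groups rather than assuming which component corresponds to which population.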

With those cautionary preliminaries out of the way, what’s going on in these results? As you can see many of the Caribbean populations are straightforward combinations of various continental ‘parent’ populations. This is clearly evident at K = 3, where green = Africa, red = European, and blue = Native (note that the Maya have a range of European ancestry just as I said). By looking at individual variation within populations you can already gain some insights as to the nature of the admixture. In Mexico there is a wide range of the European vs. Native fraction, though in this data set there are no “pure” individuals. Additionally, there are low, but relatively even, amounts of African ancestry across the population. Though African heritage is not a major element of modern Mexican national identity, people of African ancestry were a major part of the Spanish colonial enterprise (see Empire: How Spain Became a World Power, 1492-1763). In some areas, such as Veracruz, people of visibly African ancestry remain, but in much of Mexico these individuals intermarried and their physical characteristics were diluted to the point of not being visible.

The situation in the maritime Caribbean is somewhat more complex. In these contexts it was the Native, not African, ancestry which was subsumed and submerged. It is genomics which has ‘rediscovered’ this ancestry; many scholars had previously been skeptical that modern Puerto Ricans and Dominicans inherited a substantial share of Taino ancestry. In both Puerto Rico and the Dominican Republic there is a wide range in the proportions of African and European ancestry, with Cuba being the notable extreme case of this phenomenon. What’s going on with Cuba in particular is that there were late waves of migration from Spain, so some modern white Cubans are much less affected by admixture than other Caribbeans (remember that Cuba was part of Spain until 1898). In Haiti the situation is reversed: the revolutions of the late 18th and early 19th centuries had a racial tinge, and whites were expelled (leaving a small mulatto class).

But it is at K = 8 where things really get interesting. The black component is a European Iberian-like element which is distinctive of Latino populations (including the Maya). As you can see on this PCA the Latino element is related to the Iberian populations, as they took the European segments from the Caribbean populations and used them to flesh out the distribution in ancestry. There are several ways to interpret this. Dienekes suggested this might simply be a function of the source Iberian populations hundreds of years ago being somewhat different from the contemporary ones. For example, contemporary Spaniards would obviously have been more subject to gene flow from other Europeans after 1600 than their New World cousins. Another possibility is that there was extreme sampling from a particular region of Spain, and that this has now broken out as its own cluster. For example, I know that a disproportionate number of migrants were from Andalucia and Extremadura. But the pattern here doesn’t suggest that possibility to me (I would think the black dots should be shifted further south if they were from those two regions).

Rather, the interpretation they seem to favor is that this element has drifted away from the ancestral populations due to a bottleneck. This is not ethnographically implausible; the early years of the Spanish colonial experiment were characterized by de facto polygyny. Many adventurers lived lives not unlike those of the white grandees of the East India Company in the late 18th century. Some have argued that this period of ubiquitous common-law polygyny has influenced the fact that illegitimate births have traditionally been very common in Latin America. One reason the authors favor the bottleneck model is that the genetic distance between the Latino element and the Iberian one is rather high. This is common in situations where drift due to a bottleneck has shifted allele frequencies particularly rapidly. Not only that, but the tendency is strongest in maritime Latin America, many of whose islands received relatively fewer subsequent migrants than the large and expansive mainland viceroyalties.
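The link between de facto polygyny and a bottleneck can be put in numbers with two standard population-genetics formulas: Wright’s effective population size for unequal numbers of breeding males and females, and the expected accumulation of drift at a constant effective size. The founding numbers below are hypothetical, chosen only to show the direction of the effect; this is not the model the authors fit.

```python
def effective_size(n_males: int, n_females: int) -> float:
    """Wright's effective population size with unequal numbers of breeding
    males and females: Ne = 4 * Nm * Nf / (Nm + Nf)."""
    return 4 * n_males * n_females / (n_males + n_females)

def drift_f(ne: float, generations: int) -> float:
    """Expected drift coefficient after t generations at constant Ne:
    F = 1 - (1 - 1 / (2 * Ne)) ** t. Larger F means larger expected
    allele frequency divergence from the source population."""
    return 1 - (1 - 1 / (2 * ne)) ** generations

# Hypothetical founding population: 500 breeding women in both scenarios, but
# under de facto polygyny only 50 men father the next generation.
monogamous = effective_size(500, 500)  # 1000.0
polygynous = effective_size(50, 500)   # about 181.8
for label, ne in (("monogamous", monogamous), ("polygynous", polygynous)):
    print(f"{label}: Ne = {ne:.1f}, F after 10 generations = {drift_f(ne, 10):.4f}")
```

With the same number of women, concentrating paternity in a tenth as many men cuts the effective size by more than a factor of five and raises the expected drift correspondingly, the kind of process that could push a founding population’s allele frequencies away from its Iberian source.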

23andMe ancestry decomposition for a friend who is 1/4 Asian

Another way the authors explored the demographic history was to look at the length distribution of the tracts of ancestries. How this works is simple. A first-generation hybrid will have unbroken lengths of ancestry from each parent, but in subsequent generations fragmentation will occur as recombination breaks apart long blocks identical by descent. You can see this in the figure to the left, where my friend who has one Asian grandparent has blocks of alternating European and Asian ancestry because of meiotic recombination events. The longer the time since admixture, the smaller the blocks will become, as recombination slices apart long blocks and recombines ancestral components. By looking at the distribution and mix of lengths the authors can construct demographic histories of the populations. In short it looks like much of the European ancestry came in one short, quick pulse, rather early on in settlement. This is in keeping with the high reproductive output attested for European males thanks to polygyny during this period.
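The logic of tract-length dating can be checked with a small forward simulation. This sketch assumes a single admixture pulse with half the founding chromosomes from ancestry “A,” a 2-Morgan chromosome, Poisson-distributed crossovers with no interference, and a simplified model in which each new chromosome is a recombinant of two randomly chosen chromosomes from the previous generation. All function names and parameters are hypothetical, and the authors’ analysis is considerably more elaborate. Under a single pulse T generations ago, tracts of an ancestry that contributed a fraction m are roughly exponential with mean 1/(T(1 - m)) Morgans, so the mean tract length gives a crude estimate of T.

```python
import math
import numpy as np

def slice_chrom(chrom, lo, hi):
    """Segments of `chrom` that overlap [lo, hi), clipped to start at lo or later.
    A chromosome is a list of (start, label) pairs sorted by start, first start 0."""
    out = []
    for i, (start, lab) in enumerate(chrom):
        end = chrom[i + 1][0] if i + 1 < len(chrom) else math.inf
        if end > lo and start < hi:
            out.append((max(start, lo), lab))
    return out

def recombine(a, b, length, rng):
    """One meiosis: Poisson(length) crossovers at uniform positions (length in
    Morgans), alternating between parents a and b from a random starting parent."""
    cuts = np.sort(rng.uniform(0.0, length, rng.poisson(length)))
    bounds = [0.0, *cuts, length]
    parents = (a, b) if rng.random() < 0.5 else (b, a)
    child = []
    for j in range(len(bounds) - 1):
        for start, lab in slice_chrom(parents[j % 2], bounds[j], bounds[j + 1]):
            if not child or child[-1][1] != lab:  # merge runs of the same ancestry
                child.append((start, lab))
    return child

def simulate_pulse(generations, m=0.5, n_hap=500, length=2.0, seed=0):
    """Single pulse: a fraction m of founding chromosomes are ancestry 'A', the rest
    'B'; each later chromosome recombines two random chromosomes of the prior generation."""
    rng = np.random.default_rng(seed)
    n_a = int(m * n_hap)
    pop = [[(0.0, "A")] for _ in range(n_a)] + [[(0.0, "B")] for _ in range(n_hap - n_a)]
    for _ in range(generations):
        pop = [recombine(pop[rng.integers(n_hap)], pop[rng.integers(n_hap)], length, rng)
               for _ in range(n_hap)]
    return pop

def mean_tract_length(pop, label, length=2.0):
    """Mean length (Morgans) of contiguous tracts carrying `label`."""
    lens = []
    for chrom in pop:
        for i, (start, lab) in enumerate(chrom):
            end = chrom[i + 1][0] if i + 1 < len(chrom) else length
            if lab == label:
                lens.append(end - start)
    return float(np.mean(lens))

M = 0.5  # assumed ancestry-A fraction at the pulse
results = {t: mean_tract_length(simulate_pulse(t, m=M), "A") for t in (5, 20)}
for t, mean_len in results.items():
    # Crude single-pulse estimate; chromosome-end truncation biases it upward.
    t_hat = 1.0 / (mean_len * (1.0 - M))
    print(f"T = {t:2d}: mean A tract = {mean_len:.3f} M, estimated T = {t_hat:.1f}")
```

Older admixture yields shorter tracts, which is why a pulse of long tracts points to recent input and short tracts to early input.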

The same method was performed for the African ancestry, and the authors discovered an intriguing result. It seems that in the early years most of the Caribbean black slaves were derived from the western tip of Sub-Saharan Africa, from the Senegal river down to modern Ghana. Later on the longer tracts show affinities with populations further east, from the Bight of Benin toward the Equator. I don’t know the history of slavery well enough to confirm or deny the reality of this finding, but it illustrates the power of genomics combined with wide sampling strategies. More relevantly I suspect genomics’ role will be to assign magnitudes to known dynamics.

Finally, the authors also inferred diverse relationships for the Native admixture in the Caribbean populations. They confirmed some evidence of south-to-north migration into Central and Caribbean America, and also specific ethno-linguistic associations between now de facto extinct Caribbean populations and those of mainland South America. Some of these results have long been suggested, but lack of historical documentation makes inferences shadowy. Genomics cannot resolve these debates, but it can shed light upon them.

Overall this is an interesting study because I think it is a test run at the sort of historical-demographic questions that genomics will be used for. There has long been a ‘genetics as a tool’ school of thought among many ecologists and phylogeneticists, and now a ‘genomics as a tool’ school will sit right alongside it in many more diverse fields. Caribbean and Latin American populations are the low hanging fruit, because the Spanish and Portuguese colonial experiments are reasonably well attested, and the source populations are very distinct (so it is easy to pick signal out of the noise). But there are other historical questions of the same period which are also of interest. In Albion’s Seed David Hackett Fischer describes four Anglo-American folkways which contributed to the culture of this nation. Of these, ~20,000 Puritans arrived between 1620-1640 and became the ancestors of ~700,000 by 1970. Though 20,000 is not quite a bottleneck (in fact, they arrived from different sectors of England), I am curious if these individuals, a segment of “Old Americans,” can still be discerned in the genomic data. This is just one of many possible questions which will be within reach of an answer in the near future….

Citation: arXiv:1306.0558
(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"