The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Human Genomics



David Reich’s talk at SMBE 2014 has come and gone, and from the reports on Twitter it seems to have been a synthesis of the results in their bioRxiv preprint from last fall, Ancient human genomes suggest three ancestral populations for present-day Europeans, and the ancient DNA samples from Samara in Russia. The major takeaway is that modern Europeans are, genetically, by and large an admixture between three very distinct population groups, which fused together only during the Holocene (the last ~10,000 years). A stylized variant of the model is represented in the figure I’ve taken from the bioRxiv preprint.

But a question that’s nagged me is how literally to take the proposition that some of these nodes are genuinely distinct populations separated by barriers to gene flow, as opposed to being part of a broader continuum of genetic variation. For example, populations separated by water barriers such as those of Sahul almost certainly exhibited enough attenuated gene flow so that drift could work to shift their allele frequencies away from the populations of Sundaland. On the other hand, it seems reasonable to me that genetic variation on the broad plain from western Europe to the Urals in northern Europe may have been mostly clinal, with each population exchanging genes with the next, from the Atlantic to the fringes of Siberia. Some have argued that the Paleo-Siberian population which has been termed “Ancient North Eurasian” (ANE) is only part of a cline across Eurasia which extends out toward the European hunter-gatherers (to be clear, I’m skeptical of this because the genetic distance seems too great, but who knows how rapidly genetic distance increased as a function of distance in the Pleistocene?). Alternatively, one might posit regions of extremely low population density during the Pleistocene due to inclement conditions, so that various ancestral groups may have been isolated enough to drift apart through more conventional genetic isolation (for example, it seems to me that the ancestors of groups such as the Han Chinese have been isolated from western Eurasians for ~40,000 years, unless you count relatively recent fusions such as the Uygurs and the peoples of Turkestan more broadly).

And yet some discussions I’ve had recently (on Twitter) have made me clarify my thoughts and admit that for some purposes it really doesn’t matter whether ANE was part of a genetic continuum or not in relation to European hunter-gatherers. The reason is that I believe that the human past was characterized by many powerful demographic sweeps which we are beginning to comprehend due to the power of ancient DNA. If the expansions occur from specific narrow geographic zones, and overwhelm a huge adjacent area, then whether the genetic variation is characterized by clines or not is irrelevant, as it will look like a discontinuous replacement in regions far from the core point of origination.

This brings me to a major update in my own personal views on these sorts of dynamics. I recently read Richerson, Boyd, and Henrich’s Gene-culture coevolution in the age of genomics. It’s a good overview of the intersection of the fields of cultural evolution and genomics, but too often it struck me that the authors were keen on ascertaining how genomics could illuminate problems in cultural evolution, without considering the converse. That is, what can our understanding of cultural evolutionary processes tell us as to what patterns of genomic variation we should see around us? Modern human genomics has a surfeit of data, and population genetic theoretical machinery of yore is being drafted to hammer away at the massive rich empirical seams, but in the domain of paleodemography a model of culture is probably more informative in allowing us to gain an expectation of the distribution of dynamics. More concretely, cultural and economic factors are clearly critical in understanding why a few nations of western Europe* entered into massive settlement of the New World after 1492, and others did not. Obviously we’ll never have historical records from 50,000 years in the past, but a better understanding of the processes of cultural evolution might allow us to judge whether rapid archaeological transitions signal demographic shifts, or not. And these then might serve as an interpretative framework for genomic results.

* I specify western Europe because the genetic distances here are small, and the major settler nations, the British and Iberians, are not particularly clustered together.

• Category: Science • Tags: Ancient DNA, Europeans, Genomics, Human Genomics 

Citation: Towards a new history and geography of human genes informed by ancient DNA, Joseph Pickrell, David Reich, doi: 10.1101/003517

There were giants in the earth in those days; and also after that, when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.

– Genesis 6:4

Joe Pickrell and David Reich have put up a preprint at bioRxiv, Towards a new history and geography of human genes informed by ancient DNA. Since it’s a preprint at bioRxiv you can 1) read it for free 2) comment on it. It is a magisterial review of “where we are,” though close readers of this weblog may not find much that is new in their survey of the empirical results which are coming out of human population genomics and ancient DNA analysis. In regards to this let me highlight two sentences. First, it is now clear that long-range migration, admixture and population replacement have been the rule rather than the exception in human history. Second, the serial founder effect model is no longer a reasonable null hypothesis for modeling the ancient spread of anatomically modern humans around the globe. For the second I’m thinking in particular of Sohini Ramachandran’s 2005 paper, Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa, though the model is older than that obviously, as is made clear in the acknowledgments. For the massive ground that the paper covers when it comes to the latest findings it is highly concise, and I commend it to anyone wishing to dive into this exploding literature. Pickrell & Reich show how the analysis of dense marker data sets with more powerful techniques has allowed for the teasing apart of the interlaced layers of the historical genetic palimpsest. But the complement to this has been the development of the field of paleogenomics, which allows for the explicit analysis of ancient genomes. Another section of the preprint touches upon the technological changes which are allowing for more and more DNA analysis of ancient samples.
In particular they point out that rather than focusing on sequencing very rare pristine remains the near future may be in looking at known SNPs on a larger number of samples, because the technical challenges for such typing are far lower.

Credit: Maulucioni, Haplogroup R

The preprint is focused on the genomic aspects of this research because the authors are statistical geneticists, but it does not hesitate in offering up a host of historical and archaeological hypotheses which might be tested in the next few years. Also, they do not take a definitive position on the role of long distance migration and punctuated admixture events, as opposed to more continuous gene flow (though the methods which analyze contemporary populations seem to be better at detecting the former). So I will hazard a general model. It seems that the root of what is driving these demographic changes is cultural change. And cultural changes over the past ~30,000 years have been very fast and punctuated, and have accelerated. To give an example, the cultural chasm between an Egyptian in 500 AD as opposed to one in 500 BC would be far greater than that between two that lived in 500 BC and 1500 BC. Whether or not the word “revolution” is necessary for cultural adaptations such as the acquisition of agriculture, it seems clear that these were shifts in lifestyle which radically changed the local human demographics, as some populations entered into a phase of rapid population expansion in a condition of land surplus (e.g., farmers can extract many more calories per unit of land than hunter-gatherers, so the first farmers invariably encounter massive land surplus and operate at the higher boundary of productivity). Basically Peter Bellwood’s model in First Farmers captures many of the broad features of what occurred in the Holocene to produce the ubiquitous admixture we see in the map at the top of this post (the methods pick up the strongest signals, and so usually underestimate admixture). Small groups of individuals acquired cultural adaptations which resulted in winner-take-all scenarios of demographic expansion until a new equilibrium was attained, repeatedly over the past 20,000, and especially 10,000, years. These are the primary layers in the palimpsests that geneticists are teasing apart. Additionally, I will add the proviso that I suspect these long distance leapfrogs often became strongly male-biased in the genetic signal. It would be totally unsurprising to me if haplogroup R has its origins in the North Eurasian population which has left a legacy in Native Americans and Europeans.

Archaeologists and historians are going to be reluctant to shift from a dominant position which is skeptical of migrationism. Part of this is due to an ideological bias which emerged after World War 2. It is also simply the fact that the statistical methods employed by the newest batch of researchers are abstruse and difficult for outsiders to decrypt (though I find the methods in Ancient Admixture in Human History comprehensible after a close reading, so it’s not impossible). But archaeologists and historians are essential in constructing plausible models which can explain the genetic patterns we see around us. The motive engines for these changes are cultural phenomena, and cultural researchers are the ones who can shed the most light on the possibilities.

• Category: Science • Tags: Ancient DNA, Human Genomics 
Father of Lies

More often than not the discipline of history seems to swing between the true and trivial (or perhaps more precisely, picayune), and grand narratives which emphasize a nearly fictionalized story. In some ways this is not entirely a problem. When teaching young children the history of the United States a punctilious adherence to fact is essential, but one cannot deny that the selection of topics can sway and shade the direction of the lessons learned. But far too often this ideological element of the historical narrative determines the central focus, rather than floating along the margins. With erudite command of detail historical scholars can, if they so choose, engage in a game of ideological sophistry, cultural flattery, and underhanded polemic. Both Howard Zinn and David Barton were and are players at this game. But there are still those who engage in the Sisyphean task of perceiving the world as it is, not as we would wish it to be, through the dark glass. Such a colossal enterprise, to ascertain the objective character of an exceedingly complex phenomenon, requires every tool at hand. Historians have traditionally been hunters of musty texts in neglected libraries, but they have on many an occasion received auxiliary data from scholars working in more material domains, such as archaeologists and engineers. Today you must add geneticists to the growing brigade of scholars attempting to excavate the past.

It is known

In truth the power of genetics is most evident and necessary in areas where history is silent, before written records can build a narrative skeleton in which we can play. Using both modern and ancient DNA samples the geneticists, working with archaeologists, can still make vague inferences where before there was only darkness. But illumination can be had even in time periods when historical records are quite good. Though the public understands evolution to transpire over eons, basic population genetic processes occur over a matter of generations, and so can give us fresh insight into dynamics which played out quite recently in time. A new paper in PLoS GENETICS, Reconstructing the Population Genetic History of the Caribbean, does just this. Obviously we already have a history of the Caribbean. As every schoolboy knows it began in 1492, and proceeded across the centuries as a palimpsest of European colonial powers, and later independent nations, rose and fell. But history is more than just wars, international congresses, and once-in-a-generation discoveries. It is the ebbing and flowing of peoples themselves in their aggregate masses. Conventional textual narratives and coarse archaeological inferences can get us rather far. See Charles C. Mann’s magisterial 1493 for an example. But historical population genetics goes a step further, as it attempts to infer demography through patterns of variation in genes, the most elemental instrumental variable for tracing demographic patterns one might imagine.

What the above paper does is reiterate, emphasize, and clarify particular population genetic demographic events which have been suspected. First, the Amerindian populations were not static creatures in equilibrium with nature, but dynamic. There is clear evidence in these results that some groups migrated from South America to Central America, and especially the Caribbean. This is not unreasonable a priori, but far too often our stylized models presume the Amerindian population as a homogeneous, uniform, almost ahistorical substrate upon which European agency and African tragedy can unfold. But on the contrary, the peoples of the New World had their own history, oral as it may be. As you can see the Maya, one of the most iconic of Amerindian peoples, seem to exhibit some southern affinities, perhaps the result of an ancient “back migration.” If the Old World is any guide there may have been many forward and back migrations.

This ancient legacy is evident in the admixed populations, the Mestizos, Zambos, and Mulattos of the Greater Caribbean region. Looking in particular at the Puerto Rican and Dominican populations you see low, but significant, levels of admixture from specific native groups. On the one hand you may not be surprised, but it must be stated that before the genetic evidence there was much skepticism as to whether any Amerindian genetic heritage persisted in the populations of the Caribbean. A particular style of cultural/humanistic scholar intuited that perhaps an emphasis on indigenous ancestry was a mechanism for people of some African ancestry to deflect attention away from this aspect of their heritage because of the fraught history of slavery. Though the internal logic here seems reasonable, the empirical evidence makes it clear that the legacy of the Amerindians does persist in these islands, among these peoples. Their motive may have been unpalatable, but their argument was right.

Who were these Amerindian people? And how did they become integrated with the synthetic populations which came to dominate these islands? This is where textual history and genetics operate in a complementary fashion. Both history and ethnography document mass population collapse in concert with an androcide of the Amerindians. By this, I mean that Amerindian males were largely removed from the picture, while European males took Amerindian women as concubines and engaged in de facto polygyny in the New World. Hernan Cortes, conqueror of the Aztecs, illustrates this phenomenon, as he had an illegitimate son, Martin Cortes, with his native translator, and later on a legitimate son, another Martin Cortes, with a Spanish noblewoman. This pattern of sexual liberty and license was common in the early years, and has been extensively documented by historians. It is a reason many anthropologists give for the relatively low rates of legitimacy in much of Latin America. And of course what applied to Amerindian females also applied to African females. What the genetics makes clear is that this asymmetric pattern of cultural power relations was demographically very significant. Populations with a near total lack of Amerindian and African Y chromosomal lineages, passed from father to son, may still have high levels of non-European mtDNA, passed from mother to daughter. In this study they also looked at the X chromosome, which spends 2/3 of its time in females, and did find an enrichment of Amerindian ancestry there as well.
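To make the X chromosome point concrete, here is a minimal sketch of the arithmetic. It is not the paper's method. It assumes a simple long-run approximation in which autosomes weight the founding mothers and fathers equally, while the X chromosome weights mothers 2/3 and fathers 1/3. The function name and the example fractions are hypothetical, chosen only for illustration.

```python
def expected_ancestry(female_fraction, male_fraction):
    """Expected Amerindian ancestry on autosomes and on the X chromosome.

    female_fraction: share of founding mothers who were Amerindian (assumed input)
    male_fraction:   share of founding fathers who were Amerindian (assumed input)
    """
    # Autosomes are inherited equally through mothers and fathers.
    autosomal = 0.5 * female_fraction + 0.5 * male_fraction
    # The X chromosome spends 2/3 of its time in females.
    x_linked = (2.0 / 3.0) * female_fraction + (1.0 / 3.0) * male_fraction
    return autosomal, x_linked


# Hypothetical case: 40% of founding mothers Amerindian, no Amerindian fathers.
autosomal, x_linked = expected_ancestry(0.4, 0.0)
print(f"autosomal: {autosomal:.3f}, X-linked: {x_linked:.3f}")
```

Whenever the Amerindian contribution comes mostly through females, the X-linked figure exceeds the autosomal one, which is the direction of the enrichment described above.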

But they didn’t just focus on the nature of admixture today, they inferred its history. The technique is rooted in basic concepts in genetics. When chromosomes from the two parents come together in a child, each is still distinct, and identical in ancestry to the segments found in the parent it came from. But genetic recombination in the next generation shuffles the segments, so that parental elements become mixed together on the same chromosome. When parents are from different geographic populations you see alternating segments, or “ancestry tracts.” For example, a chromosome might carry alternating regions of European, Amerindian, and African ancestry. Because there are only 20-30 recombination events per generation per individual the distribution of the length of these tracts is a function of the length of time since admixture. The early years after admixture will be characterized by long blocks of ancestry from one population, alternated with another. As time passes the segments will get smaller, and alternate much more rapidly. What the authors found was that indeed Amerindian segments exhibited the latter pattern, while European and African segments were more diverse in their distribution. The distinction was strongest in the Caribbean populations, but was evident elsewhere. The explanation is the one above. The early years of Iberian settlement were characterized by de facto polygyny and decimation of male Amerindians through enslavement (though there was population collapse more generally due to disease). Amerindian ancestry came in one singular pulse, and slowly dissipated and distributed itself through the population.
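The relationship between tract length and time since admixture can be illustrated with a toy simulation. This is a sketch under strong simplifying assumptions, not the method used in the paper: a single pulse of admixture, ancestry breakpoints falling along the chromosome as a Poisson process at a rate of one per Morgan per generation, and each segment between breakpoints drawing its ancestry independently. The function name, the 2-Morgan chromosome and the 10% minority proportion are all hypothetical choices.

```python
import random
from statistics import mean


def simulate_tract_lengths(generations, proportion, chrom_morgans=2.0,
                           n_chroms=2000, seed=1):
    """Return lengths (in Morgans) of tracts of the minority ancestry."""
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_chroms):
        pos = 0.0
        tract_start = 0.0
        in_target = rng.random() < proportion
        while True:
            # Distance to the next breakpoint: rate = generations per Morgan.
            pos += rng.expovariate(generations)
            if pos >= chrom_morgans:
                break  # tract runs off the chromosome end; skip it
            next_in_target = rng.random() < proportion
            if next_in_target != in_target:
                if in_target:
                    lengths.append(pos - tract_start)
                tract_start = pos
                in_target = next_in_target
    return lengths


for g in (5, 20):
    tracts = simulate_tract_lengths(generations=g, proportion=0.1)
    print(f"{g} generations: mean tract length {mean(tracts):.3f} Morgans")
```

Under these assumptions the mean tract length is roughly 1 / (generations × (1 − proportion)) Morgans, so an old single pulse leaves many short tracts, while recent or ongoing gene flow leaves a mix that includes long ones.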

Finally, the results here also yield the finding that Latin American European ancestry seems to have diverged from its parent source. A detailed exploration of the technical issues can be found at the Haldane’s Sieve weblog, but I will say I am convinced that the authors have made a good, if not definitive, case for the proposition that the Latin American ancestral component is one which has diverged significantly. Again, the reason was listed above: de facto polygyny. This drives down the effective population size, increases the drift, and skews the allele frequency distribution rapidly away from the source population. If this is a true result it shows us the possibilities for how new populations can arise through fission and rapid expansion. In particular, they may be male mediated. For this period, from 1500-1900, we have extensive documentation to corroborate the broad inferences made. But not so for many regions deep into the past. What these sorts of papers illustrate is the fine-grained power of genetics in shedding light on topics and issues which might otherwise have remained off limits. In particular genetics taps into some of the most primal activities of humankind, those that lead to procreation.
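Why polygyny drives down the effective population size can be seen from the standard textbook formula for unequal numbers of breeding males and females, Ne = 4·Nm·Nf / (Nm + Nf). The sketch below simply evaluates that formula. The head counts are hypothetical and are not estimates from the paper.

```python
def effective_size(breeding_males, breeding_females):
    """Effective population size with unequal numbers of breeding males and females."""
    return 4.0 * breeding_males * breeding_females / (breeding_males + breeding_females)


# Hypothetical populations of 100 breeding adults each.
print(effective_size(50, 50))  # equal sex ratio
print(effective_size(10, 90))  # few males fathering most children
```

With the same census of 100 breeders, concentrating paternity in 10 males cuts the effective size to roughly a third of the balanced case, and drift speeds up accordingly.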

Citation: Moreno-Estrada A, Gravel S, Zakharia F, McCauley JL, Byrnes JK, et al. (2013) Reconstructing the Population Genetic History of the Caribbean. PLoS Genet 9(11): e1003925. doi:10.1371/journal.pgen.1003925


Credit: Dbachmann

Update: Turns out “Maria” is also an ethnic Roma.

There was a recent case in Ireland of a young Roma girl who was blonde haired and blue eyed being removed from her home, on the suspicion that she was not in fact the biological child of the presumed parents (who, like most Roma, are reportedly of dark complexion, hair, and eyes). I even saw a report that a hospital was consulted on the probability of such an outcome, and they said it would be “extremely unusual”. It turns out that DNA tests confirmed that this girl was the biological child of the putative parents. And of course all this has to be understood in light of the case of “Maria” in Greece; a little blonde girl who turned out not to be the biological child of the two Roma who claimed her as their daughter (it looks like there was welfare fraud in that case).

My initial response to the Irish case was that the consultant should be fired, because in an admixed population like the Roma it shouldn’t be that unusual to have offspring who deviate a great deal from the parental phenotype. This prompted some interesting reactions. First, there were those who seem blissfully ignorant of the fact that the Roma are an admixed population. That’s easy enough to resolve, as there have been scientific papers published on this issue using genome-wide data. Second, there are claims that a very small fraction of Roma have blonde hair and blue eyes (on the order of less than 1%). The latter may be a defensible claim, though not indisputably so.

Before we move on I have to clarify that there is a distinction between “Roma” and “Romani.” The latter refers broadly to the populations across Europe which were referred to as “Gypsy,” while the former denotes a set of populations with a center of distribution in Southeast Europe, in particular in the Balkans. In much of Northern and Western Europe there are now two populations of Romani with very distinct histories (and genetics): the Roma who have recently arrived from Southeast Europe, and the various non-Roma groups who have a very long history in their nations of residence (e.g., Finnish Kale).

In terms of various traits we know a fair amount about the genetics of pigmentation in humans. Though the fine grained individual predictive models are coarse, most of the genes which have large effects on population-scale differences are now well characterized. This allows me to produce a model which is reasonably plausible to give you an intuition for why brown-skinned populations can generate a wide range of outcomes in realized phenotype.

Imagine five loci rank-ordered in effect size, gene 1, gene 2, gene 3, gene 4, and gene 5. Each gene comes in two flavors, two alleles. One is a “dark” allele (produces dark pigmentation) and another is a “light” allele. From these you can have a distribution of complexion which is referred to as a “melanin index” (it’s dependent on reflectance). Imagine that the two alleles at each gene contribute melanin index values to the aggregate like so (dark allele first, light allele second):

Gene 1 = 30, 2
Gene 2 = 15, 1
Gene 3 = 10, 1
Gene 4 = 5, 2
Gene 5 = 5, 0


What you see above are potential genotypes (all implicitly heterozygous), with their phenotypic values being the sum of the two. One allele at gene 1 contributes 30 melanin units, and the other 2. And so on. If you take the “dark” alleles and assume they’re all homozygous (so doubling them), you get a maximal potential value of 130, and a minimal one of 12 if you make the “light” ones homozygous. But of course in most cases you’ll get a combination. What would be the outcome for a given set of frequencies? Since I’m lazy I ran a simulation. I set the frequencies of the dark allele for each gene like so:

Gene 1 = 60%
Gene 2 = 45%
Gene 3 = 35%
Gene 4 = 46%
Gene 5 = 50%

Then I generated 10,000 multilocus genotypes, and added a “noise” parameter so that the trait wasn’t totally determined by the genes. This is why the phenotypic value can be higher (and lower, though that bound can go no further than ~0) than what genotype would predict. Here’s the distribution:


The mean value is 73. The 25th percentile is 55. Assuming random mating, about 1 out of 670 individuals should have an exclusively “light” genotype across all five genes (1 out of 26 gametes carries only “light” alleles). The point is that in a polygenic character if you have polymorphism on the genotypic level you’re likely to have it on the phenotypic level.
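For readers who want to reproduce something like this, here is a minimal sketch of such a simulation. It uses the allele values and dark-allele frequencies given above and assumes random mating at each gene. The original noise parameter is not specified, so the Gaussian noise with a standard deviation of 10 units, the random seed and the function names are assumptions for illustration only.

```python
import random
from statistics import mean

# (dark allele value, light allele value) per gene, from the text above
ALLELE_VALUES = [(30, 2), (15, 1), (10, 1), (5, 2), (5, 0)]
# dark-allele frequency per gene, from the text above
DARK_FREQS = [0.60, 0.45, 0.35, 0.46, 0.50]


def simulate(n=10_000, noise_sd=10.0, seed=42):
    """Draw n diploid genotypes and return (phenotypes, count with no dark allele)."""
    rng = random.Random(seed)
    phenotypes = []
    all_light = 0
    for _ in range(n):
        total = 0.0
        dark_count = 0
        for (dark, light), freq in zip(ALLELE_VALUES, DARK_FREQS):
            for _copy in range(2):  # two alleles per gene
                if rng.random() < freq:
                    total += dark
                    dark_count += 1
                else:
                    total += light
        if dark_count == 0:
            all_light += 1
        # Assumed noise model: Gaussian, with the phenotype floored at 0.
        phenotypes.append(max(0.0, total + rng.gauss(0.0, noise_sd)))
    return phenotypes, all_light


def expected_mean():
    """Expected genetic value, ignoring noise."""
    return sum(2 * (f * d + (1 - f) * l)
               for (d, l), f in zip(ALLELE_VALUES, DARK_FREQS))


def prob_all_light():
    """Probability that a diploid individual carries no dark allele at any gene."""
    prob = 1.0
    for freq in DARK_FREQS:
        prob *= (1 - freq) ** 2
    return prob


phenotypes, all_light = simulate()
print(f"simulated mean: {mean(phenotypes):.1f}, expected: {expected_mean():.2f}")
print(f"all-light genotypes: {all_light} of {len(phenotypes)}")
print(f"analytic chance of an all-light genotype: 1 in {1 / prob_all_light():.0f}")
```

The analytic helpers make the assumptions explicit: the chance that a single gamete carries only light alleles is the product of the light-allele frequencies, and the chance for a diploid individual is that product squared.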

The second major question is whether this is even plausible for Roma. Yes. They’re very admixed. Two recent papers make the case definitively, Reconstructing Roma History from Genome-Wide Data and Reconstructing the Population History of European Romani from Genome-wide Data. These papers used tens of thousands to hundreds of thousands of markers. You can see in the bar plot to the left that the Roma have much higher European-like ancestry proportions than other Indians. It is likely their parental population is Punjabi-like, so it seems that they’re ~50% non-Indian in admixture. The second paper offers up a wider population set for comparison, and it suggests that the Roma did not experience much gene flow with Middle Eastern groups (there are still Roma-related populations in the Middle East, the Dom). Rather, their primary phase of admixture occurred ~1,000 years ago in the Balkans.

Reconstructing the Population History of European Romani from Genome-wide Data has a wide range of Romani populations, and it seems evident that the Western and Northern Romani have more European admixture than the Balkan Roma. It turns out that the Welsh Romani seem totally Europeanized in their genome. That is, they’re basically now a Northern European population, perhaps with some residual South Asian ancestry. Because these Romani originally spoke an Indo-Aryan language it seems that they are genuine Romani in a cultural sense. The Welsh Romani have simply undergone enough gene flow with the surrounding population over the past hundreds of years to lose their genetic distinctiveness.

You can see a broader population-wide comparison in this bar plot. European populations are at the top, and below them are the Romani groups. The South Asian admixture is again evident, but observe the paucity of both of the Middle Eastern components (you can label them “Northern/Caucasian” or “Southern/Arabian” for convenience; they show up repeatedly in Admixture analyses). The authors of the second paper linked above make much of this, but I would be cautious. I would have preferred that they run Admixture in supervised mode, or perhaps used a formal test of admixture (e.g., D-statistic). But it is strongly suggestive of the possibility that the Roma sojourn in the Middle East was rather short, and that the true ethnogenesis of the group occurred in the Balkans primarily. And, as I said earlier, the European genetic character of Welsh Romani is pretty obvious in this plot (they cluster with Europeans in the PCA as well).

But, despite the Romani history of admixture in Europe, some of them are genetically very isolated now, and have been for hundreds of years. This seems to be the case for the Roma, who have had surprisingly little admixture since the initial settlement. There’s widespread evidence of inbreeding and founder effect across the Romani populations as well, making them both admixed and very distinct. You see long runs of homozygosity, and the clustering bar plots tend to “break out” the Roma rather early on in the steps up the number of populations, similar to what you see in groups such as the Kalash. I believe one of the problems with adducing phylogenetic relationships of the Romani with Y and mtDNA markers was simply that bottleneck effects are more powerful for uniparental lines, and they were buffeted more by the small population size. In sum, when it comes to Roma genetic variation there are a few things to keep in mind:

1) South Asian source

2) Admixture with Southeastern Europeans

3) Long period of relative genetic continuity and isolation after the initial phase

4) Genetic homogeneity within the groups. That is, they’re well admixed across most individuals

5) Considerable genetic distinctiveness owing to a high rate of drift in a small effective population
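The last point can be illustrated with the classic expectation for how heterozygosity decays under drift in an idealized population: H_t = H_0 · (1 − 1/(2·Ne))^t. The sketch below only evaluates that textbook formula. The effective sizes and the 40-generation span are hypothetical and are not estimates for any Romani group.

```python
def expected_heterozygosity(h0, effective_size, generations):
    """Expected heterozygosity after drift in an idealized population of constant size."""
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Hypothetical comparison over 40 generations, starting from H0 = 0.30.
for ne in (50, 500, 5000):
    h = expected_heterozygosity(0.30, ne, 40)
    print(f"Ne = {ne}: expected heterozygosity {h:.3f}")
```

The smaller the effective population, the faster variation is lost and the faster allele frequencies wander away from those of the source population, which is why small isolated groups can end up both admixed and highly distinctive.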

• Category: Science • Tags: Human Genomics, Roma, Romani 

An individual polymorphic on the KITL locus? (Credit: David Shankbone)

Pigmentation is one of the few complex traits in the post-genomic era which has been amenable to nearly total characterization. The reason for this is clear in hindsight. As far back as the 1950s (see The Genetics of Human Populations) there were inferences made using human pedigrees which suggested that normal human variation on this trait was controlled by fewer than ten genes of large effect. In other words, it was a polygenic character, but not highly so. This means that the alleles which control the variation are going to have reasonably large effects, and be well within the power of statistical genetic techniques to capture.

I should be careful about being flip on this issue. As recently as the mid aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known with reasonable clarity as far as these things go. The 50,000 foot perspective is this: we lost our fur millions of years ago, and developed dark skin, and many of us lost our pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).

A new paper in Cell sheds some further light on the fine-grained details which might be the outcome of this process. Being a Cell paper there is a lot of neat molecular technique to elucidate the mechanistic pathways. But I will gloss over that, because it is neither my forte nor my focus. A summary of the paper is that it shows that p53, a relatively well known tumor suppressor gene, seems to have an interaction with a response element (the gene product binds in many regions, it is a transcription factor) around the KITLG locus. This locus is well known in part because it has been implicated in pigment variation in humans and fish. So KITLG is part of one of the generalized pigmentation pathways which spans metazoans. There are derived variants in both Europeans and East Asians which are correlated with lighter skin, though there is polymorphism in both cases (it has not swept to fixation).

The wages of adaptation? (Credit: Hoggarazzi Photography)

But this is a Cell paper, so there has to be a more concrete and practical angle than just evolution. And there is. It turns out that a single nucleotide polymorphism in the p53 response element results in a tendency toward upregulation of KITLG and male germ line proliferation. The latter matters when it comes to tumorigenesis, and in particular testicular cancer. This form of cancer is one where there doesn’t seem to be a somatic cell mutation of p53 itself. Additionally, the authors observe that testicular cancer manifests at a 4-5 fold greater rate in people of European descent than in African Americans. And, presumably, the upregulation of KITLG is somehow related to increased melanin production. The authors posit that because of lighter skin in Europeans due to selection at other loci there has been a balancing effect at KITLG (increased tanning response). There is evidence of selection at this locus (a long haplotype and increased homozygosity), so this is not an unreasonable conjecture, though the high frequency of loss of function alleles suggests that the model is likely complex.

I don’t know if this particular story is correct in its details (though I am intrigued that variation in KITLG is associated with cancer in other organisms). But it illustrates one of the possible consequences of rapid evolutionary change due to human migration out of Africa: deleterious side effects because of pleiotropy. In other words, as you tinker with the genomic architecture of a population you are going to have to accept tradeoffs as you are optimizing one aspect of function. Genes don’t have just one consequence, but are embedded in myriad pathways. Over time evolutionary theory predicts a slow re-balancing, as modifier genes arise to mask the deleterious side effects. But until then, we will bear the burdens of adaptation as best as we can.

Citation: Zeron-Medina, Jorge, et al. “A Polymorphic p53 Response Element in KIT Ligand Influences Cancer Risk and Has Undergone Natural Selection.” Cell 155.2 (2013): 410-422.


Credit: Aviok

“Think not that I am come to send peace on earth: I came not to send peace, but a sword.” -Matthew 10:34

“There were giants in the earth in those days…when the sons of God came in unto the daughters of men, and they bare children to them, the same became mighty men which were of old, men of renown.” -Genesis 6:4

Seven years ago I wrote a short post, Why patriarchy?, which attempted to present a concise explanation for the ubiquity of what we might term patriarchy in complex societies (i.e., not “small-scale societies”). Broadly speaking my conjecture is that social and political dominance of small groups of males (proportionally) over the past several thousand years is an example of “evoked culture”. The higher population densities in agricultural societies produced a relative surfeit of accessible marginal surplus, which could be given over to supporting non-peasant classes who specialized in trade, religion, and war, all of which were connected. This new economic and cultural context served to trigger a reorganization of the typical distribution of power relations of human societies because of the responses of the basic cognitive architecture of our species inherited from Paleolithic humans. Agon, or intra-specific competition, has always been part of the game of human socialization. The scaling up and channeling of this instinct in bands of males totally transformed human societies (another dynamic is elaboration of cooperative structures, though this often manifests as agonistic competition between coalitions of humans).

To get a sense of what I mean when I say transforming, consider this section of an article in The Wall Street Journal which profiles the wife of one of the men accused in the 2012 New Delhi gang rape:

Some people blame the December gang rape and similar attacks in part on a collision of traditional social expectations—commonplace in rural areas—and the modernity of India’s cities, where rural migrant workers encounter the values of urbanites living by a different set of rules. During the brutal Delhi assault, for instance, the attackers accosted the woman and the young man she was with, asking why they were out together in the evening, the young man told the court.

Speaking about the events of that night, Ms. Devi says she doesn’t understand how a woman could be out for the evening with a man who wasn’t her husband.

The normalcy of this sort of ‘mate guarding’ is taken for granted in many ‘traditional’ societies. You see it reflected in the 1995 film First Knight, where King Arthur tries Lancelot and Guinevere for treason based on a kiss (dishonor to the realm). I won’t go into excessive psychoanalysis, but end by saying that the emergence of radical inequality and stratification with complex societies transformed instincts shaped in small-scale bands where petty conflicts were no doubt the norm. To my knowledge the literature from small-scale societies tends to imply a relatively more relaxed, even modern, attitude toward sexuality than one can see in the world of the Eurasian Ecumene.

At this point you might be curious as to the point of reviewing this conjecture. Perhaps I will bring to the fore historical and archaeological evidence which might support this model? No. Rather, I contend that the evidence of this radical reshaping of human power structures, which led to the emergence of patriarchy as we understand it, is reflected in the phylogenetic history of our species. Two papers illustrate the differing patterns which one sees in the maternal lineage, mtDNA, and the paternal lineage, Y chromosomes.

First, Y Chromosomes of 40% Chinese Are Descendants of Three Neolithic Super-grandfathers:

Demographic change of human populations is one of the central questions for delving into the past of human beings. To identify major population expansions related to male lineages, we sequenced 78 East Asian Y chromosomes at 3.9 Mbp of the non-recombining region (NRY), discovered >4,000 new SNPs, and identified many new clades. The relative divergence dates can be estimated much more precisely using molecular clock. We found that all the Paleolithic divergences were binary; however, three strong star-like Neolithic expansions at ~6 kya (thousand years ago) (assuming a constant substitution rate of 1e-9/bp/year) indicates that ~40% of modern Chinese are patrilineal descendants of only three super-grandfathers at that time. This observation suggests that the main patrilineal expansion in China occurred in the Neolithic Era and might be related to the development of agriculture.

Second, Analysis of mitochondrial genome diversity identifies new and ancient maternal lineages in Cambodian aborigines:

Cambodia harbours a variety of aboriginal (and presumably ancient) populations that have largely been ignored in studies of genetic diversity. Here we investigate the matrilineal gene pool of 1,054 Cambodians from 14 geographic populations. Using mitochondrial whole-genome sequencing, we identify eight new mitochondrial DNA haplogroups, all of which are either newly defined basal haplogroups or basal sub-branches. Most of the new basal haplogroups have very old coalescence ages, ranging from ~55,000 to ~68,000 years, suggesting that present-day Cambodian aborigines still carry ancient genetic polymorphisms in their maternal lineages, and most of the common Cambodian haplogroups probably originated locally before expanding to the surrounding areas during prehistory. Moreover, we observe a relatively close relationship between Cambodians and populations from the Indian subcontinent, supporting the earliest costal route of migration of modern humans from Africa into mainland Southeast Asia by way of the Indian subcontinent some 60,000 years ago.

The scientific methods here are straightforward, or at least tried and tested. The main gains here are in terms of raw numbers and sequencing. Basically this is the extension of phylogeographic work which goes back 20 years, but on steroids. As such one should be cautious. The old phylogeography literature has turned out to be wrong on many of the details. But that’s OK, there’s still gold there, you just have to look.

The broad scale implication of the paper on Chinese Y chromosomal diversity is obvious. Like the Genghis Khan modal haplotype these are lineages which exhibit a ‘star-like phylogeny.’ They explode out of a common ancestor in short order, with few mutational steps. This explosion is simply a reflection of very rapid population growth. The skewed distribution of Y lineages here (i.e., three lineages representing nearly half the population) indicates to me a pattern where elite males tend to be much more fit in reproductive terms than the average male. Rapid population growth may have been correlated with a high rate of extinction of Y lineages due to “elite turnover.”
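
To make the molecular-clock logic concrete, here is a minimal sketch of the arithmetic, using only the figures quoted in the abstract (3.9 Mbp of sequence, a rate of 1e-9 per bp per year, an expansion at ~6 kya). The function names are my own, and a real dating analysis would work from a full phylogeny with uncertainty rather than a single point estimate.

```python
# Back-of-envelope molecular clock, using the numbers quoted in the abstract.
# Function names are illustrative; this is not the authors' pipeline.

def expected_mutations(rate_per_bp_per_year, sequence_bp, years):
    """Expected number of new mutations along one lineage."""
    return rate_per_bp_per_year * sequence_bp * years


def years_from_mutations(mutations, rate_per_bp_per_year, sequence_bp):
    """Invert the clock: the time implied by a given mutation count."""
    return mutations / (rate_per_bp_per_year * sequence_bp)


RATE = 1e-9             # substitutions per bp per year (assumed constant, as in the abstract)
SEQUENCE_BP = 3.9e6     # sequenced non-recombining Y region
EXPANSION_YEARS = 6000  # approximate age of the star-like expansions

per_lineage = expected_mutations(RATE, SEQUENCE_BP, EXPANSION_YEARS)
print(f"Expected mutations per lineage since the expansion: {per_lineage:.1f}")
```

On these assumptions each lineage picks up only about 23 mutations across the sequenced region in 6,000 years, against a couple of hundred over 60,000 years, which is why a recent burst of branching looks like a star rather than a deep bifurcating tree.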


Citation: Zhang, Xiaoming, et al. “Analysis of mitochondrial genome diversity identifies new and ancient maternal lineages in Cambodian aborigines.” Nature Communications 4 (2013).

The second paper looks at mtDNA, the maternal line. There are some specific results which are interesting. In line with Joe Pickrell’s TreeMix results it does look like Cambodians and Indians share deep ancestry dating to the Paleolithic. The PCA to the left shows the relationships of populations in relation to their haplogroups, and one clear finding is that Cambodians tend to cluster with Indians, and not Northeast Asians. This result is not surprising. As I’ve noted before, on mtDNA lineages South Asians are closer to East Eurasians than they are to West Eurasians. The result for the Y chromosomes is inverted, while autosomes are somewhere in the middle. In addition the results above show that South Chinese Han mtDNA tend to occupy the same part of the plot as the Dai, who are related to the Thai people of Southeast Asia. In contrast the few North Chinese Han tend to cluster with Tibetans and Altaics. Could Sinicization have been male mediated? There’s been circumstantial ethnographic evidence which points to this (e.g., some Cantonese marriage practices may reflect assimilation of Dai women).

The big picture result to me is that it illustrates the discordance between migration patterns of males and females over the past 10,000 years due to the rise of agriculture and its offspring, patriarchy. I hold that there was no hunter-gatherer Genghis Khan. Such a reproductively prolific male, worthy of an elephant seal, is only feasible with the cultural and technological accoutrements of civilization. ~20,000 years ago Temujin may have had to be satisfied with being the big man in a small clan. Thanks to various ideological and military advancements by the year 1200 AD you saw the rise to power of a man who could realistically assert that he was a ‘world conqueror.’


Credit: Brocken Inaglory

Of course I do not believe that the world before agriculture was static. On the contrary the Chinese Y chromosomal paper reports an inferred pattern of lineage extinction which is regular and consistent. But civilization escalated the magnitude of genocide, and in particular androcide of the losers in the games of power. The relative continuity of mtDNA across vast swaths of southern Eurasia is a testament to the fact that the lineages of the ‘first women’ still persist down among the settled agricultural peoples, whose genomes have been reshaped by untold sequences of conquests and assimilations. While female mediated gene flow can be imagined to be constant, continuous, and localized, I believe that male mediated gene flow has a more punctuated pattern. It explodes due to cultural and social innovations, such as the horse or Islam, and long standing Y chromosomal variation which has emerged since the last wave of conquerors is wiped away in a single fell swoop. Obviously this has an effect on the total genome, and I suspect that in some cases repeated male mediated expansions have resulted in striking discordances between the autosomal and mtDNA lineages. You see this in Argentina, where Native American mtDNA seems to persist to a higher degree than autosomal ancestry because of male skew of European migration. And it looks to be the case in Cambodia, where non-North East Asian autosomal ancestry seems to be present at a lower fraction than the equivalent mtDNA.

With the rise of ubiquitous genomic typing and sequencing the geographical coverage will be fine grained enough that the broad patterns, and specific details, will become clear. Then we will finally be able to understand if the societies fueled by grain truly ushered in the age of the domination of the many by the few. How easily does a scythe become a sword?

Have no fear

There has been a lot of attention to Erika Check Hayden’s piece Ethics: Taboo genetics, at least judging by people commenting on my Facebook feed. In some ways this is not an incredibly empirically grounded argument, because the biological basis of complex traits is going to be rather difficult to untangle on a gene-by-gene basis. In other words, this isn’t a clear and present “concern.” The heritability of many behavioral traits has long been known. This is not revolutionary, though for cultural reasons many well educated people are totally surprised when confronted with data that many traits, such as intelligence and personality, have robust heritabilities* (the proportion of trait variation explained by variation in genes across the population). The literature reviewed in The Nurture Assumption makes clear that a surprising proportion of the contribution any parents make to their offspring is through their genetic composition, and not their modeled example. You wouldn’t know this if you read someone like Brian Palmer of Slate, who seems to be getting paid to reaffirm the biases of the current age among the smart set (pretty much every single one of his pieces that touch upon genetics is larded with phrases which could have been written by a software program designed to soothe the concerns of the cultural Zeitgeist). But the new genomics is confirming the broad outlines of the findings from behavior genetics. There’s nothing really to see there. The bigger issue of any interest is normative; the values we hold dear as a culture.

For example:

Chabris says that the work can actually contribute to greater social mobility — for instance, by helping to identify preschoolers who could be helped by more intensive early childhood education. “The fact that people in the past interpreted the results in a certain way doesn’t mean that it shouldn’t be studied,” he says. But not everyone buys that potential misuses of the information can be divorced from gathering it. Anthropologist Anne Buchanan at Pennsylvania State University in University Park wrote on the blog The Mermaid’s Tale that rather than being purely academic and detached, such studies are “dangerously immoral”.

Of course John Horgan reiterates his call for race and IQ research to be banned. To some extent this reminds me of Patricia Churchland’s account of being verbally attacked by an anthropologist in an elevator as a “reductionist.” These are matters of morality, and reflect quasi-religious sensibilities. The science is secondary.

But there’s a major problem when you have norms and facts operating at cross-purposes: the facts are ultimately always there, invariant, and true. Banning research is totally a short-term step, because it isn’t as if the United States, with its particular set of values, has a monopoly on research. Patricia Churchland’s work which reduces human consciousness to a totally natural process would not get funded in Saudi Arabia, or by the Vatican, but that’s irrelevant because it will get funded in the Western world. Similarly, the cultural Left taboos which are very strong in Western academia are far weaker in Asia. Assuming that economic development proceeds apace, someone will do the research, and it will be published. If the facts of the world are as you’d always assumed, you have nothing really to fear.

* I think human psychology is complicated enough though that on some level people do understand the importance of genes. Look at who they choose to reproduce with.

Layers and layers….

There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, where you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates in regards to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*

But ultimately these 10,000-foot debates are more a matter of philosophy than science. At least until the scientific questions are stripped of their controversy and an equilibrium consensus emerges. That will only occur through an accumulation of publications whose results are robust to time, and subtle enough to convince dissenters. This is why Enard et al.’s preprint, Genome wide signals of pervasive positive selection in human evolution, attracted my notice. With the emergence of genomics it has been humans first in line to be analyzed, as the best data is often found from this species, so no surprise there. Rather, what is so notable about this paper in light of the past 10 years of back and forth exploration of this topic?**

By taking a deeper and more subtle look at patterns of the variation in the human genome this group has inferred that adaptation through classic positive selection has been a pervasive feature of the human genome over the past ~100,000 years. This is not a trivial inference, because there has been a great deal of controversy as to the population genetic statistics which have been used to infer selection over the past 10 years with the arrival of genome-wide data sets (in particular, a tendency toward false positives). In fact, one group has posited that a more prominent selective force within the genome has been “background selection,” which refers to constraint upon genetic variation due to purification of numerous deleterious mutations and neighboring linked sites.

The sum totality of Enard et al. may seem abstruse, and even opaque, in terms of the method. But each element is actually rather simple and clear. The major gist is that many tests for selection within the genome focus on the differences between nonsynonymous and synonymous mutational variants. The former refer to base positions in the genome which result in a change in the amino acid state, while the latter are those (see the third positions) where different bases may still produce the same amino acid. The ratio between substitutions, replacements across lineages for particular base states, at these positions is a rough measure of adaptation driven by selection on the molecular level. Changes at synonymous positions are far less constrained by negative selection, while positive selection due to an increased fitness via new phenotypes is presumed to have occurred only via nonsynonymous changes. What Enard et al. point out is that the human genome is heterogeneous in the distribution of characteristics, and focusing on these sorts of pairwise differences in classes without accounting for other confounding variables may obscure the dynamics one is attempting to measure. In particular, they argue that evidence of positive selective sweeps is masked by the fact that background selection tends to be stronger in regions where synonymous mutational substitutions are more likely (i.e., they are more functionally constrained, so nonsynonymous variants will be disfavored). This results in elevated neutral diversity around regions of nonsynonymous substitutions vis-a-vis strongly constrained regions with synonymous substitutions. Once correcting for the power of background selection the authors find evidence for sweeps of novel adaptive variants across the human genome, which had previously been hidden.
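
For readers who want the synonymous/nonsynonymous distinction in concrete form, here is a minimal sketch that classifies a single-codon change using the standard genetic code. The function name and the two example changes are mine, not anything from Enard et al., and real tests of selection count substitutions across whole alignments and correct for many things this toy ignores.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid changes.

    Illustrative helper only; raises KeyError for anything that is not
    a three-letter DNA codon.
    """
    before = CODON_TABLE[codon_before.upper()]
    after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if before == after else "nonsynonymous"


# A third-position change that leaves glutamate (E) unchanged.
print("GAA -> GAG:", classify_change("GAA", "GAG"))
# A second-position change from glutamate (E) to valine (V).
print("GAG -> GTG:", classify_change("GAG", "GTG"))
```

Tallying these two labels over the substitutions that separate lineages gives the kind of ratio described above; the point Enard et al. make is that such a tally is confounded unless you also account for how strongly each region is constrained.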

There are two interesting empirical findings from the 1000 Genomes data set. First, the authors find that positive selection tends to operate upon regulatory elements rather than coding sequence changes. You are probably aware that this is a major area of debate currently within the field of molecular evolutionary biology. Second, there seems to be less evidence for positive selection in Sub-Saharan Africans, or, less background selection in this population. My own hunch is that it is the former, that the demographic pulse across Eurasia, and to the New World and Australasia, naturally resulted in local adaptations as environmental conditions shifted. Though it may be that the African pathogenic environment is particularly well adapted to hominin immune systems, and so imposes a stronger cost upon novel mutations than is the case for non-Africans. So I do not dismiss the second idea out of hand.

Where this debate about the power of selection will end is anyone’s guess. Nor do I care. Rather, what’s important is getting a finer-grained map of the dynamics at work so that we may perceive reality with greater clarity. One must be cautious about extrapolating from humans (e.g., the authors point out that Drosophila genomes are richer in coding sequence proportionally). But the human results which emerge because of the coming swell of genomic data will be a useful outline for the possibilities in other organisms.

Citation: Genome wide signals of pervasive positive selection in human evolution

* The cartoon qualification is due to the fact that I am aware that selection is stochastic as well.

** Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS Biology 4.3 (2006): e72; Sabeti, Pardis C., et al. “Detecting recent positive selection in the human genome from haplotype structure.” Nature 419.6909 (2002): 832-837; Wang, Eric T., et al. “Global landscape of recent inferred Darwinian selection for Homo sapiens.” Proceedings of the National Academy of Sciences 103.1 (2006): 135-140; Williamson, Scott H., et al. “Localizing recent adaptive evolution in the human genome.” PLoS Genetics 3.6 (2007): e90; Hawks, John, et al. “Recent acceleration of human adaptive evolution.” Proceedings of the National Academy of Sciences 104.52 (2007): 20753-20758; Pickrell, Joseph K., et al. “Signals of recent positive selection in a worldwide sample of human populations.” Genome Research 19.5 (2009): 826-837; Hernandez, Ryan D., et al. “Classic selective sweeps were rare in recent human evolution.” Science 331.6019 (2011): 920-924.

Citation: Genetic Evidence for Recent Population Mixture in India
Moorjani et al.

The Pith: In India 5,000 years ago there were the hunter-gatherers. Then came the Dravidian farmers. Finally came the Indo-Aryan cattle herders.

There is a new paper out of the Reich lab, Genetic Evidence for Recent Population Mixture in India, which follows up on their seminal 2009 work, Reconstructing Indian Population History. I don’t have time right now to do justice to it, but as noted this morning in the press, it is “carefully and cautiously crafted.” Since I am not associated with the study, I do not have to be cautious and careful, so I will be frank in terms of what I think these results imply (note that confidence in many assertions below is modest). Though less crazy in a bald-faced sense than another recent result which came out of the Reich lab, this paper is arguably more explosive because of its historical and social valence in the Indian subcontinent. There has been a trend over the past few years of scholars in the humanities engaging in deconstruction and intellectual archaeology which overturns old historical orthodoxies and understandings, and leaves the historiography of a particular topic of study in a chaotic mess. From where I stand the Reich lab and its confederates are doing the same, but instead of attacking the past with cunning verbal sophistry (I’m looking at you postcolonial “theorists”), they are taking a sledge-hammer of statistical genetics and ripping apart paradigms woven together by innumerable threads. I am not sure that they even understand the depths of the havoc they’re going to unleash, but all the argumentation in the world will not stand up to science in the end, we know that.

Since the paper is not open access, let me give you the abstract first:

Most Indian groups descend from a mixture of two genetically divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners, Caucasians, and Europeans; and Ancestral South Indians (ASI) not closely related to groups outside the subcontinent. The date of mixture is unknown but has implications for understanding Indian history. We report genome-wide data from 73 groups from the Indian subcontinent and analyze linkage disequilibrium to estimate ANI-ASI mixture dates ranging from about 1,900 to 4,200 years ago. In a subset of groups, 100% of the mixture is consistent with having occurred during this period. These results show that India experienced a demographic transformation several thousand years ago, from a region in which major population mixture was common to one in which mixture even between closely related groups became rare because of a shift to endogamy.
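
The abstract does not spell out how linkage disequilibrium yields a date, so here is a minimal sketch of the general idea, not the authors' actual method or software. The assumptions are mine: admixture-induced LD between markers a genetic distance d (in Morgans) apart decays roughly as exp(-n*d) after n generations of random mating, the data below are synthetic, and the 29-year generation time is only an illustrative value.

```python
import math

def fit_generations(distances_morgans, ld_values):
    """Estimate generations since admixture from an LD decay curve.

    Fits ln(LD) = ln(A) - n * d by ordinary least squares and returns n.
    Illustrative only: assumes a single pulse of admixture and LD values
    that are all positive.
    """
    xs = list(distances_morgans)
    ys = [math.log(value) for value in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Synthetic, noiseless curve for a hypothetical population that mixed
# 100 generations ago, sampled at 0.1 cM steps out to 5 cM.
TRUE_GENERATIONS = 100
distances = [0.001 * i for i in range(1, 51)]
ld_curve = [0.2 * math.exp(-TRUE_GENERATIONS * d) for d in distances]

YEARS_PER_GENERATION = 29  # illustrative assumption, not taken from the abstract

estimated_generations = fit_generations(distances, ld_curve)
estimated_years = estimated_generations * YEARS_PER_GENERATION
print(f"Estimated generations since admixture: {estimated_generations:.1f}")
print(f"Estimated years since admixture: {estimated_years:.0f}")
```

With noiseless synthetic input the fit simply recovers the 100 generations that went in, about 2,900 years at the assumed generation time. Real data are noisy, and a group with more than one pulse of admixture will not follow a single exponential.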

Young Stalin

I want to highlight one aspect which is not in the abstract: the closest population to the “Ancestral North Indians”, those who contributed the West Eurasian component to modern Indian ancestry, seem to be Georgians and other Caucasians. Since Reconstructing Indian Population History many have suspected this. I want to highlight in particular two genome bloggers, Dienekes and Zack Ajmal, who’ve prefigured that particular result. But wait, there’s more! The figure which I posted at the top illustrates that it looks like Indo-European speakers were subject to two waves of admixture, while Dravidian speakers were subject to one!

The authors were cautious indeed in not engaging in excessive speculation. The term “Indo-Aryan” only shows up in the notes, not in the body of the main paper. But the historical and philological literature is referenced:

The dates we report have significant implications for Indian history in the sense that they document a period of demographic and cultural change in which mixture between highly differentiated populations became pervasive before it eventually became uncommon. The period of around 1,900–4,200 years BP was a time of profound change in India, characterized by the deurbanization of the Indus civilization, increasing population density in the central and downstream portions of the Gangetic system, shifts in burial practices, and the likely first appearance of Indo-European languages and Vedic religion in the subcontinent. The shift from widespread mixture to strict endogamy that we document is mirrored in ancient Indian texts. [notes removed -Razib]

How does this “deconstruct” the contemporary scholarship? Here’s an Amazon summary of a book which I read years ago, Castes of Mind: Colonialism and the Making of Modern India:

When thinking of India, it is hard not to think of caste. In academic and common parlance alike, caste has become a central symbol for India, marking it as fundamentally different from other places while expressing its essence. Nicholas Dirks argues that caste is, in fact, neither an unchanged survival of ancient India nor a single system that reflects a core cultural value. Rather than a basic expression of Indian tradition, caste is a modern phenomenon–the product of a concrete historical encounter between India and British colonial rule. Dirks does not contend that caste was invented by the British. But under British domination caste did become a single term capable of naming and above all subsuming India’s diverse forms of social identity and organization.

The argument is not totally fallacious, as some castes are almost certainly recent constructions and interpretations, with fictive origin narratives. But the deep genetic structure of Indian castes, which goes back ~4,000 years in some cases, falsifies a strong form of the constructivist narrative. The case of the Vysya is highlighted in the paper as a population with deep origins in Indian history. Interestingly they seem to be a caste which has changed its own status within the hierarchy over the past few hundred years. Where the postcolonial theorists were right is that caste identity as a group in relation to other castes was somewhat flexible (e.g., Jats and Marathas in the past, Nadars today). Where they seem to have been wrong is the implicit idea that many castes were an ad hoc crystallization of individuals only bound together by common interests relatively recently in time, and in reaction to colonial pressures. Rather, it seems that the colonial experience simply rearranged pieces of the puzzle which had deep indigenous roots.

Indra, slayer of Dasas? Credit: Gnanapiti

Stepping back in time from the early modern to the ancient, the implications of this research seem straightforward, if explosive. One common theme in contemporary Western treatments of the Vedic period is to interpret narratives of ethnic conflict coded in racialized terms as metaphor. So references to markers of ethnic differences may be tropes in Vedic culture, rather than concrete pointers to ancient socio-political dynamics. The description of the enemies of the Aryans as dark skinned and snub-nosed is not a racial observation in this reading, but analogous to the stylized conflicts between the Norse gods and their less aesthetically pleasing enemies, the Frost Giants. The mien of the Frost Giants was reflective of their symbolic role in the Norse cosmogony.


What these results imply is that there was admixture between very distinct populations in the period between 0 and 2000 B.C. By distinct, I mean to imply that the last common ancestors of the “Ancestral North Indians” and “Ancestral South Indians” probably date to ~50,000 years ago. The population in the Reich data set with the lowest fraction of ANI is the Paniya (~20%). Among those with higher fractions of ANI (70%) are the Kashmiri Pandits. It does not take an Orientalist with colonial motives to infer that the ancient Vedic passages which are straightforwardly interpreted in physical anthropological terms may actually refer to ethnic conflicts in concrete terms, and not symbolic ones.

Finally, the authors note that uniparental lineages (mtDNA and Y) seem to imply that the last common ancestors of the ANI with other sampled West Eurasian groups date to ~10,000 years before the present. This leads them to suggest that the ANI may not have come from afar necessarily. That is, the “Georgian” element is a signal of a population which perhaps diverged ~10,000 years ago, during the early period of agriculture in West Asia, and occupied the marginal fringes of South Asia, as in sites such as Mehrgarh in Balochistan. A plausible framework then is that expansion of institutional complexity resulted in an expansion of the agriculture complex ~3,000 B.C., and subsequent admixture with the indigenous hunter-gatherer substrate to the east and south during this period. One of the components that Zack Ajmal finds through ADMIXTURE analysis in South Asia, with higher fractions in higher castes even in non-Brahmins in South India, he terms “Baloch,” because it is modal in that population. This fraction is also high in the Dravidian speaking Brahui people, who coexist with the Baloch. It seems plausible to me that this widespread Baloch fraction is reflective of the initial ANI-ASI admixture event. In contrast, the Baloch and Brahui have very little of the “NE Euro” fraction, which is found at low frequencies in Indo-European speakers, and especially higher castes east and south of Punjab, as well as South Indian Brahmins. I believe that this component is correlated with the second, smaller wave of admixture, which brought the Indo-European speaking Indo-Aryans to much of the subcontinent. The Dasas described in the Vedas are not ASI, but hybrid populations. The collapse of the Indus Valley civilization was an explosive event for the rest of the subcontinent, as Moorjani et al. report that all indigenous Indian populations have ANI-ASI admixture (with the exceptions of Tibeto-Burman groups).

Overall I’d say that the authors of this paper covered their bases. Though I wish them well in avoiding getting caught up in ideologically tinged debates. Their papers routinely result in at least one email to me per week, ranging from confusion to frothing-at-the-mouth.

Related: The Gift of the Gopi.

Citation: Moorjani et al., Genetic Evidence for Recent Population Mixture in India, The American Journal of Human Genetics (2013),


For various reasons the idea of mitochondrial Eve and Y chromosomal Adam captures the public imagination. This frustrates many people, including me. I’ve gotten into the fatigue stage on this topic, but some sort of counter-attack is necessary against malignant memes. Even geneticists who don’t usually work with populations can get confused by the implications of mtDNA and Y chromosomal phylogenies. Melissa Wilson Sayres, who works on Y chromosomes, has a useful post (promised first of two) at Panda’s Thumb, Y and mtDNA are not Adam and Eve: Part 1. If you have friends/acquaintances who are confused by this issue, it might be a good place to start.

Much of the discussion around this topic was triggered by the recent paper in Science, Sequencing Y Chromosomes Resolves Discrepancy in Time to Common Ancestor of Males Versus Females. As Graham Coop observed on Twitter the idea of a “discrepancy” is not clear, insofar as it would not be that surprising if the last common ancestor of the extant Y chromosomal lineages existed at a different time than the last common ancestor of the mtDNA lineages. Expected coalescence is contingent upon various population genetic parameters such as effective population size, but expectations are also subject to variation in realized outcomes. And, as Sayres observes the references to the Adam & Eve analogy were present within the paper, fueling the fire. Finally, the reference to “dogma” tagged onto the end struck me as a touch too cute.

• Category: Science • Tags: Adam, Eve, Human Genetics, Human Genomics, MTDNA, Y Chromosome 

The inimitable Joe Pickrell has dropped his Khoisan-are-part-Italian preprint onto arXiv, Ancient west Eurasian ancestry in southern and eastern Africa. I’m being glib in my characterization of the paper’s core conclusion, but there’s a reason for such a flip response: the inferences that he seems to draw from the genetic data strike me as verging on crazy. But that’s OK, what genetics is telling us is that history was a whole lot crazier than we had imagined.

Let’s back up for a moment here. For several decades now geneticists have assumed that the Bushmen of the Kalahari, the Khoisan-qua-Khoisan, Africa’s last hunter-gatherers who retain their ancestral language along with the Hadza, are the ur-humans. The basal lineage that first diverged from the rest of mankind at the cusp of the Out of Africa event. This is evident in Y chromosomal and mtDNA phylogenies, where the Bushmen and their kin harbor variants which coalesce deeply in time with those of others. And, a few years ago another group revealed the likelihood that Bushmen also are products of an admixture event in the last ~50,000 years with a distinct hominin lineage which diverged ~1 million years before the present from the main line which led up to anatomically modern humanity. Now Pickrell et al. present us with a twist which is perhaps even more astringent than a lime: in their genomes the Bushmen and their Khoisan kin, the Khoe herders, reflect an ancient admixture event with East Africans, who themselves were the outcomes of hybridizations between West Eurasians and indigenous African populations. More relevantly for my concise summation of the conclusion, the West Eurasian component does not necessarily reflect modern Middle Eastern populations, so much as Southern Europeans!

How did they infer such bizarre results? Magic? No. Basically the authors looked at patterns of linkage disequilibrium. Got it? Probably not. If you are curious, confused, and intent upon understanding the thrust of their methods in your bones, you probably need to read Loh et al. Barring that, trust in the great hive-mind that is the Reich lab, or attempt to swallow my trite condensation.

If you consider a short to medium length sequence of the genome, there are genetic variants, alleles, segregating across that sequence. The frequencies of these alleles vary across populations. And, there are on occasion correlations of allelic combinations, seen together across a single sequence more often than would be likely if the alleles across the loci assorted at random. A concrete example would be a population which is the product of a recent admixture event between Africans and Europeans. Recombination would take many generations to break apart all the associations between alleles which are diagnostic and distinctive of African and European ancestry, so long blocks of ancestry tracts could be inferred simply by phasing the genome on the individual level (i.e., you know the sequence of each homolog inherited from each parent, instead of just genotype values). There would be linkage disequilibrium within the population because particular variants would be associated with others across loci due to recent distinct ancestry at the genomic level. If you noticed that SNP 1 had an African allele, then SNP 2 located nearby in the locus is also more likely to have an African allele than expectation, until the point that linkage equilibrium is attained.
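For readers who want the idea of "more associated than expectation" in concrete form, here is a minimal sketch of the textbook two-locus statistics, D and r², computed from phased haplotypes. The toy samples and the function name `ld_stats` are my own illustrations, not anything taken from Pickrell et al. or Loh et al.

```python
from collections import Counter


def ld_stats(haplotypes):
    """Return (D, r_squared) for two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) pairs,
    one pair per phased chromosome, with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP 2
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # excess over random assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Toy data (hypothetical): allele 1 at both SNPs marks the same source
# population. In the "recent" sample the source alleles still travel together.
recent_admixture = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10
# After enough recombination the four haplotypes occur at random-assortment
# frequencies and the association is gone.
equilibrium = [(1, 1)] * 25 + [(0, 0)] * 25 + [(1, 0)] * 25 + [(0, 1)] * 25

d_recent, r2_recent = ld_stats(recent_admixture)
d_eq, r2_eq = ld_stats(equilibrium)
```

Both toy samples have identical allele frequencies at each SNP; only the way alleles are paired on chromosomes differs, which is exactly what linkage disequilibrium measures.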

As I noted above, these associations are broken apart over time in a regular fashion by genetic recombination. Therefore, the decay in linkage disequilibrium across the genome can allow you to infer time since a putative admixture event. This works at various time depths. African Americans have long range LD because the admixture was relatively recent. To date older admixture events one must be more cunning, as the LD decays and becomes exceedingly faint as recombination hacks apart previous distinctive associations as two genetic backgrounds merge. But what about multiple admixture events and the consequent linkage disequilibrium patterns? What the authors did in the above paper was to test the fit of the data to a composite of LD curves in scenarios where it seems likely that there were two possible admixture events. And, they found multiple populations which did fit this model.
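The dating logic can be sketched in a few lines. Under the standard single-pulse approximation, admixture LD between two sites separated by genetic distance d (in Morgans) decays roughly as exp(-n·d), where n is the number of generations since admixture. The code below is a toy illustration under that assumption, with noiseless simulated curves, made-up amplitudes, and hypothetical function names. It is not the procedure used in the paper, which relies on weighted LD statistics and proper error estimation.

```python
import math


def admixture_ld_curve(distances, generations, amplitude=1.0):
    """Expected admixture LD at each genetic distance (Morgans), one pulse."""
    return [amplitude * math.exp(-generations * d) for d in distances]


def two_pulse_curve(distances, gen_a, gen_b, amp_a=1.0, amp_b=1.0):
    """Composite curve: the sum of two single-pulse decay curves."""
    return [amp_a * math.exp(-gen_a * d) + amp_b * math.exp(-gen_b * d)
            for d in distances]


def fit_generations(distances, ld_values):
    """Least-squares line through (d, log LD); the slope is -generations."""
    ys = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    return -sxy / sxx


# Genetic distances from 0.5 cM to 20 cM, expressed in Morgans.
distances = [0.005 * k for k in range(1, 41)]

recent = admixture_ld_curve(distances, generations=6)    # long-range LD persists
old = admixture_ld_curve(distances, generations=100)     # LD is faint at range

est_recent = fit_generations(distances, recent)
est_old = fit_generations(distances, old)

# A population with two pulses (50 and 100 generations ago, hypothetical values)
# fitted with a single exponential yields a date between the two real events,
# which is why a composite curve has to be tested explicitly.
mixed = two_pulse_curve(distances, gen_a=50, gen_b=100)
est_mixed = fit_generations(distances, mixed)

YEARS_PER_GENERATION = 30  # assumption for illustration only
est_old_years = est_old * YEARS_PER_GENERATION
```

The point of the last case is that forcing one decay rate onto a two-event history gives a single, misleading date, so distinguishing the scenarios requires comparing the fit of one-pulse and two-pulse curves.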

Dispensing with the technicalities, here are the results of admixture events as inferred from the LD decay curves:

The most parsimonious model that Pickrell et al. propose is as simple as it is crazy.

1) An ancient initial admixture event in the environs of the Horn of Africa between a proto-West Eurasian population and a proto-Sudanic population

2) A second admixture event which occurs when a population derived downstream from event 1 encounters the ancestors of the Khoisan

Pickrell et al. infer a ~3,000 year old admixture event between West Eurasians and Africans for the Semitic populations of the Ethiopian plateau in keeping with Pagani et al.’s only marginally less crazy results. Then you have step 2, with an admixture between proto-Bushmen/proto-Khoe and the hybrid East Africans ~1,500 years ago. Let us accept these genetic results on the face of it. What they bring home to me is the power of culture. Though vastly diminished today, groups such as the Khoe Nama managed to preserve their integrity and independence down to the period of European colonialism (only being truly decimated in Namibia in the early 20th century by the Germans). A wave of Bantu farmers overwhelmed most of southern Africa, but select groups of Khoisan managed to maintain zones of habitation where they persisted with their unique cultural traditions and perpetuated their language. Some of this surely was ecology, as the vast Karoo region is not particularly amenable to the Bantu cultural toolkit. But, I also suspect that institutional and economic (e.g. cattle culture) influences that the East Africans had upon the Khoe, and perhaps even indirectly the Bushmen, also made these populations more robust to the Bantu expansion than otherwise would have been the case.

Being a preprint on arXiv, the paper of which I speak here is free to you, and copiously explained in loving detail in the supplements in terms of method and madness. I am not particularly enthusiastic about having long discussions about how these results are crazy and can not be right. They are crazy. But I know enough about the methodology here to understand the logic, and accept that the authors are grasping at something very strange and true, even if their particular interpretation and specific results may be disputable. Let me quote the paper at this point:

The hypothesis that west Eurasian ancestry entered eastern Africa through Arabia must be reconciled with the observation that the best modern proxies for this ancestry are often found in southern Europe rather than the Middle East (Supplementary Table 4). This observation can be interpreted in the context of ancient DNA work in Europe, which has shown that, approximately 5,000 years ago, people genetically closely related to modern southern Europeans were present as far north as Scandinavia [Keller et al., 2012; Skoglund et al., 2012]. We thus find it plausible that the people living in the Middle East today are not representative of the people who were living the Middle East 3,000 years ago. Indeed, even in historical times, there have been extensive population movements from and to the Middle East [Davies, 1997; Kennedy, 2008].

Think on that. If Pickrell et al. are right do you think that the Middle East is particularly special in this regard? I will say that it comes to mind that the high consanguinity may result in strange outcomes if one is not careful with the sampling strategy (I’m thinking of the Samaritans I see in their data), though I doubt that this is an incautious group. But I do think it is plausible that some European populations are better proxies for the ancient Levantines than the modern Levantines because the latter have been washed over by multiple demographic waves (though I want to see more comparisons with Christian Arab* samples).

A second bombshell dropped by Pickrell et al.:

We note that we have interpreted admixture signals in terms of large-scale movements of people. An alternative frame for interpreting these results might instead propose an isolation-by-distance model in which populations primarily remain in a single location but individuals choose mates from within some relatively small radius. In principle, this sort of model could introduce west Eurasian ancestry into southern Africa via a “diffusion-like” process. Two observations argue against this possibility. First, the gene flow we observe is asymmetric: while some eastern African populations have up to 50% west Eurasian ancestry, levels of sub-Saharan African ancestry in the Middle East and Europe are considerably lower than this (maximum of 15% [Moorjani et al., 2011]) and do not appear to consist of ancestry related to the Khoisan. Second, the signal of west Eurasian ancestry is present in southern Africa but absent from central Africa, despite the fact that central Africa is geographically closer to the putative source of the ancestry. These geographically-specific and asymmetric dispersal patterns are most parsimoniously explained by migration from west Eurasia into eastern Africa, and then from eastern to southern Africa.

Isolation-by-distance is alluded to implicitly when we speak of human genetic variation as clinal. And it’s not totally lacking in utility as a null model. But I think we need to add another layer of complexity upon this parsimonious elegance of human clans eternally exchanging mates in monotonous step-wise fashion. Multiple populations over the past 10,000 years (and likely earlier!) were rocked by massive demographic turmoil, as foreigners from afar amalgamated themselves upon the local substrate, and abolished the old to bring forth something new. The author of this post is himself a product of such an event. The genetic story of mankind is not just one of continuous and diffuse gene flow gradually over a landscape of small-scale societies. No, this placid background condition was periodically perturbed by an explosion of translocating peoples, likely triggered by a technological or cultural revolution of some sort. The genetic impact in many cases is too great to be anything but a folk wandering.

Unlike isolation-by-distance these patterns do not flow linearly across space, but exhibit discordant lashing patterns through ecologically fertile terrain. Rather than a mist gliding across the plains, imagine a flash flood scouring a ravine. A more gentle analogy would be that these are demographic ripples, which expand outward, temporarily distorting the calm surface of isolation-by-distance dynamics, and eventually fading back into the background and becoming the new normal. But once the ripple has faded how do we know that it once was? That is a difficult thing indeed, and these results indicate the problems inherent. It may be that the echoes of the ripple that Pickrell et al. detect issue from a source which no longer exists. Are the scions of the first farmers of the ancient Levant hidden away in the valleys of Tuscany and the plains of Tanzania? A crazy proposition also, but not necessarily a false one.

Citation: arXiv:1307.8014v1 [q-bio.PE]

* I know some Christian Arabs do not want to be called Arabs.


It is well known that Alexander the Great invaded the Indus river valley. Coincidentally in the mountains shadowing this region are isolated groups of tribal populations whose physical appearance is at variance with South Asians. In particular, they are much lighter skinned, and often blonde or blue eyed. Naturally this led to 19th and early 20th century speculation that they were lost white races, perhaps descended from some of the Macedonian soldiers of Alexander. This was partly the basis of the Rudyard Kipling story The Man Who Would Be King. Naturally over time some of these people themselves have forwarded this idea. In the case of a group such as the Kalash of Pakistan this conjecture is supported by the exotic nature of their religion, which seems to be Indo-European, and similar to Vedic Hinduism, with minimal influence from Islam.

Kalash girl, Credit: Dave Watts

The major problem with this set of theses is that they are wrong. And the reason I bring up this tired old idea is that many people, including Wikipedia apparently, do not know that this is wrong. I’ve had correspondents sincerely bringing up this model, and, I’ve seen it presented by scholars offhand during talks. There are many historical genetic issues which remain mysterious, or tendentious. This is not one of them. There are hundreds of thousands of SNPs of the Kalash and Burusho distributed to the public. If you want to know how these populations stack up genetically, analyze them yourself. I know that they aren’t related to Macedonians because I have plenty of European population data sets, and I have plenty of South Asian ones. The peoples of the hills of Pakistan are clearly part of the continuum of the latter, albeit shifted toward Iranian peoples.

Those seeking further proof, and unable to analyze the data themselves for any reason, can check out my posts on the topic:

The Kalash in perspective

Kalash on the human tree

Addendum: It would be nice if someone corrected the appropriate Wikipedia entries.

• Category: Science • Tags: Anthropology, Human Genetics, Human Genomics, Kalash 

Malaysian “Negritos,” presumably the indigenous people of the Malay peninsula

A few days ago Dienekes pointed to a paper which reports on the presence of anatomically modern humans in China 80-100,000 years before the present. I say “anatomically modern” because there is a presumable distinction between populations which resemble moderns in their gross morphology, which first emerged in southern and eastern Africa 100 to 200 thousand years ago (and were dominant all across the world after 40,000 years before the present), and “behaviorally modern” societies, which exhibit the protean symbolic cultural expression that is the hallmark of humanity. The paper reporting on such old specimens is not particularly revolutionary. Rather, it’s part of a growing corpus which contributes to a “counter-narrative” to the dominant model, whereby behaviorally modern humans swept across Eurasia (and Australia) ~50,000 years B.P. after the “Out of Africa” event. Obviously the problem here is that if there were anatomically modern humans in China tens of thousands of years before this expansion, were they replaced? Or is the chronology wrong? (e.g. the mutation rate controversy, though please note that the dominant model has many physical anthropologists who support it as well). On Twitter I pointed out to Aylwyn Scally that we do have evidence of substantial population replacement across East and Southeast Asia.

The recent human genetics results out of China, Japan, and Southeast Asia, suggest to me that these populations are simply too close genetically to have roots prior to the Holocene (i.e., before ~10,000 years before the present). But there’s another indication that there was relatively recent population expansion and replacement: physically very distinctive “Negrito” populations are still found in the most remote areas of Southeast Asia, in Malaysia and the Philippines. The Reich group also reported a few years ago that these Negrito populations themselves exhibit population substructure, with the Philippine group having deep ancient affinities to the likely first settlers of Sahul, and the Malaysians being closer to Andaman Islanders. And just because the Negritos of Malaysia are reputed to be the aboriginal people, one can not discount the possibility that indeed they are not, but replaced even earlier populations (this is an implication of the Reich group’s results if I read it right). If we update our prior as to the likelihood of demographic displacements, then a ‘solution’ to the paradox of shallow convergence of populations in time in relation to the archaeological record may simply be that older populations did not contribute much to present lineages. This does not mean a zero contribution. Recall that the Neandertal admixture results almost became definitive only with dense marker sets, and an ancient reference sequence. Diverged H. sapiens sapiens groups will not be quite as diverged, so they may have left a legacy, but at such a low level that current data sets and techniques do not have the power to detect them.

A Palimpsest

But we don’t need to focus on prehistory. The recent history and semi-history of Southeast Asia is complex, and filled with cultural and demographic events which suggest great changes in the distribution of populations. The best outline I have read of this is Victor Lieberman’s peculiarly titled Strange Parallels, which outlines the rise of mainland Southeast Asian polities between 800 and 1800. The most important transformation of the past 1,200 years in mainland Southeast Asia has been the rise of the Dai/Thai peoples, and the recession of the Mon-Khmer groups. The dominant language of Thailand and Laos, as well as the highlands of eastern Burma, originally derives in historical time from a South Chinese set of ethnic groups, the Dai. As the Han Chinese pushed southward over the first millennium A.D. the indigenous populations either assimilated, or reacted by organizing their own polities (e.g., Nan-Chao). Ultimately the resistance was futile, and the Chinese conquest of their homelands helped precipitate a mass out-migration of Dai, into the lands of the Khmer. What is today Thailand was once part of greater Cambodia. To the east in Vietnam a somewhat less dramatic phenomenon occurred, as the Vietnamese (Kinh) pushed south along the coast, eventually absorbing or assimilating the Khmer of the lower Mekong. Further south, in maritime Southeast Asia, the situation is more confused. My own belief is that it is likely that in these regions before the arrival of the Austronesians Austro-Asiatic languages were dominant. But by the time we have written records Austronesian dialects were universal, with only the Negritos of interior Malaysia retaining Austro-Asiatic as their first language.

One way we can further explore this issue is through genetics. For example, here are some results from the Harappa data set. I’ve posted only the most relevant ancestral components, and pruned the populations.*

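| Ethnicity | “Indian” | “SE Asian” | “NE Asian” | “Siberian” |
|---|---|---|---|---|
| Iban | 5% | 87% | 0% | 4% |
| Malay-Singapore | 11% | 72% | 6% | 3% |
| Cambodian | 11% | 71% | 10% | 3% |
| Dai-China | 0% | 69% | 30% | 0% |
| Vietnamese | 2% | 62% | 35% | 0% |
| Thai | 14% | 61% | 12% | 3% |
| Lahu | 3% | 55% | 37% | 3% |
| Miao | 0% | 37% | 61% | 1% |
| Han-Singapore | 0% | 36% | 63% | 0% |
| Han-South | 0% | 33% | 66% | 0% |
| Burman | 17% | 28% | 42% | 6% |
| Han-Beijing | 0% | 19% | 76% | 3% |
| Santhal | 72% | 17% | 0% | 1% |
| Naxi | 4% | 15% | 73% | 7% |
| Japanese | 1% | 11% | 74% | 11% |
| Mongola | 0% | 7% | 62% | 23% |
| Bengali | 47% | 6% | 5% | 2% |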


Some quick comments. First, note that though there are differences among the Han Chinese, the gap is much larger between indigenous South China ethnicities (Dai) and the South Chinese, than between the latter and the North Chinese. Second, of all the indigenous Southeast Asian groups the Burman samples stand out. Why? Unlike their eastern neighbors the Burman population speaks a Tibeto-Burman language. The prefix “Tibeto” suggests affinities with peoples from the north and west fringes of China, and that is often part of the origin legends of these people. Those legends seem correct. Not only are there cultural affinities, but these results suggest that the exogenous Burmans contributed substantially to the demographic makeup of the populace. One difficult aspect of Southeast Asian genetics is that there seem to be two South Asian affiliated components. It is likely that the Cambodians are reflecting a very ancient admixture event. For the Burmans some of this is likely the case as well, but some of the admixture is almost certainly recent (e.g. the original file shows that some Burmans have “Baloch,” which is a tell for more recent South Asian gene flow). Separating these two may not be easy, but it is necessary if one is to get a good grasp of the impact of historical South Asian migration. And in both Burma and Malaysia the issue is complicated by both ancient and medieval migrations, and more recent colonial era settlement from India. Finally, with the Iban, an indigenous Austronesian group, you see that they are the “most Southeast Asian” of the populations listed above. It seems plausible here that this is partly a function of isolation and lack of cosmopolitanism; the Malays have had both Indian and Chinese admixture.
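The first comment can be checked with a crude back-of-the-envelope calculation: treat each population's four listed percentages as a vector and compare straight-line distances. This is only a rough sketch. Euclidean distance on ADMIXTURE-style proportions is my own simplification and not a proper genetic distance such as Fst, and the four components shown do not sum to 100% because the other components were pruned from the table.

```python
import math

# Percentages copied from the table above, in the order:
# ("Indian", "SE Asian", "NE Asian", "Siberian")
ancestry = {
    "Dai-China": (0, 69, 30, 0),
    "Han-South": (0, 33, 66, 0),
    "Han-Beijing": (0, 19, 76, 3),
}


def component_distance(pop_a, pop_b):
    """Euclidean distance between two populations' component vectors."""
    a, b = ancestry[pop_a], ancestry[pop_b]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dai_vs_south_han = component_distance("Dai-China", "Han-South")
south_vs_north_han = component_distance("Han-South", "Han-Beijing")
ratio = dai_vs_south_han / south_vs_north_han
```

On these numbers the Dai to South Han gap comes out at roughly three times the South Han to Beijing Han gap, which is the pattern described in the text.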

This is just scratching the surface of the last 4 to 5 thousand years. How plausible is it that we’ll have a neat story about the settlement of Southeast Asia >5,000 years before the present, back to 50,000 years? Looking at extant genetic variation it will be difficult, and population coverage, marker density, and methodological precision, all need to be maximized. At this point I am not surprised we are confused and unable to tell a neat story.

* Filtered for at least 5% “Southeast Asian,” N >= 5, and removed uninteresting or duplicate populations


Related to Muhammad?
Credit: Ian Beatty

Last year a paper came out in AJHG which reported that Ethiopian populations seem to be a compound of West Eurasians and Sub-Saharan Africans. This result itself is not too surprising for a host of reasons. First, Ethiopians and other populations of the Horn of Africa are physically equidistant between West Eurasians and Sub-Saharan Africans. 20th century physical anthropologists sometimes placed them in the “Caucasoid” racial classification for this reason. Second, the languages of the Horn of Africa have Afro-Asiatic affinities. The Cushitic languages (e.g. Somali) have deep connections with more familiar tongues such as Arabic, but Semitic Ethiopian languages (e.g. Amharic) are much closer in historical distance. Third, there has been a fair amount of previous genetic analysis of these populations, and their synthetic character was obvious from those (e.g. mtDNA and Y results suggest a diverse array of haplogroups). What the AJHG paper reported was that the Eurasian ancestors of the Ethiopians admixed with the presumably Sub-Saharan indigenes ~3,000 years ago in a single pulse event, and, their closest modern relations in West Asia today are Levantines. To put a mild gloss on it the dating is controversial (using patterns of decayed genetic correlations of markers across the length of the genome). This is not just clinal variation.

As I have noted if this dating of the admixture is correct then modern Ethiopians as a coherent biocultural entity post-date Egyptian civilization by thousands of years. During the reign of Hatshepsut, ~1500 BC, there was a trade delegation sent to the land of Punt (probably Somaliland). The depictions of the people of Punt by the Egyptians were very strange, insofar as they did not look Sub-Saharan African, and, the queen of Puntland seemed to exhibit steatopygia more common among the Khoisan people. I posited speculatively that during this period of ancient Egyptian civilization East Africa was in ferment in regards to its population mix. The Bantu people who dominate the landscape of Sub-Saharan Africa east and south of Cameroon only began to expand in earnest after 1000 BC (reaching southern Africa about 1,500 years ago). It seems plausible that the range of Khoisan-like peoples was much further north and east than is the case today. Additionally, there are likely to have been other populations, currently uncharacterized, present on the landscape (it may be that the Khoisan loom large only because their distribution was such that relic populations survive to this day to be studied). The Tishkoff lab for example has a paper in preparation on the presumed Sub-Saharan African populations present in the Horn of Africa when West Eurasians arrived (the Sub-Saharan component of highland Ethiopians does not seem to be Bantu-like).

I bring all this up again because Dienekes highlights an abstract by Joe Pickrell at next week’s SMBE 2013:

The history of southern Africa involved interactions between indigenous hunter-gatherers and a range of populations that moved temporarily or permanently into the region. The influence of these interactions on the genetic structure of current populations remains unclear. Here, using patterns of linkage disequilibrium, we show that there are at least two admixture events in the genetic history of southern African hunter-gatherers and pastoralists: one involving populations related to Niger-Congo-speaking African populations, and one which introduced ancestry most closely related to west Eurasian (European or Middle Eastern) populations. We estimate that at least a few percent of ancestry in the Khoisan is derived from this latter admixture event, which occurred on average 1,200-1,800 years ago. We show that a similar signal of west Eurasian ancestry is present throughout eastern Africa; in particular, we also find evidence for two admixture events in the genetic history of several Kenyan, Tanzanian, and Somali populations, the earliest of which involved populations related to southern Europeans and which we date to approximately 2700 – 3300 years ago. We thus suggest that west Eurasian ancestry entered southern Africa indirectly through eastern Africa. These results demonstrate how large-scale genomic datasets can inform complex models of population movements, and highlight the genomic impact of largely uncharacterized back-to-Africa migrations in human history.

The Khoisan here are not specified, so I took the liberty of putting an image of a Bushman at the head of the post. But it seems more plausible that they would be Khoi, who received cattle-culture from non-Bantu populations (they had them when the Bantu arrived) at some point in the distant past. There is already evidence that the enigmatic Sandawe people of Tanzania have old Eurasian admixture, so this would not be particularly surprising. The Sandawe language has affinities with that of the Khoisan (clicks), though the broader language family as a coherent entity is still controversial (some of the Bushmen languages themselves may not really have a close affinity, aside from broad distinctive similarities such as clicks). The whole question of the ethnogenesis of the Sandawe people seems clouded until we get denser population data sets in terms of geographic coverage.

Jan Jonker Afrikaner, leader of the Oorlam people

As for how Eurasian ancestry might have entered into the Khoisan, that is a process which is easy to imagine, because more recent European and Asian ancestry has entered into these populations over the last five centuries in ways which have been recorded by history. Some of this has been through organized ethnic amalgamation as one people become assimilated into another (just as the Alans assimilated into the Vandals in Spain after their defeat by the Romans). Consider the case of the Oorlam people, who are now part of the Nama Khoi tribe in Namibia.

In the period between the rise of modern South Africa in the early 20th century and the initial founding of the Cape Colony in the 17th century a large number of multiracial individuals of Northern European, Asian (Indian and Indonesian), and native (Khoisan, and later Bantu) origins arose from various contacts (relationships between European men and slaves, relationships between African and Asian slaves, etc.). The most well known of the resultant people are the Cape Coloureds. Though Afrikaner in language, and Dutch Reformed in religion, this population has a diverse racial heritage, and was relegated to second class status by white Afrikaners. Less famous, but still well known, are the Griqua, frontiers folk who created their own political units similar to what their white Afrikaner cousins achieved further east in the 19th century to escape colonial rule as well as racial oppression due to their ‘bastard’ status. Finally you have groups like the Oorlam, who like the Griqua attained some superiority over native populations via their European cultural heritage & connections further into the bush in Namibia, beyond the farthest frontiers of Dutch republics and British South Africa. But they were eventually submerged into the Khoi Nama after a series of defeats imposed upon them by a Bantu tribe. What you see here is a shift in the balance between European and native cultural traditions being salient as a function of distance from the Cape Colony (some Griqua men were known to have disappeared into the desert and assimilated into their “mother’s kin,” probably explaining how some Bushmen have European and Asian ancestry).

It seems entirely possible that this sort of dynamic may have played out in antiquity in Africa (and other places). In some cases where demographic preponderance and environmental conditions were amenable the amalgamated populations maintained a large degree of affinity with Eurasians, rather than going “native.” This is certainly the case in the Horn of Africa, where Ethiopian polities intervened in Arabian affairs, and became part of the Oriental Orthodox community of polities (which includes the Copts of Egypt, Armenians, and Syrian Orthodoxy). To a great extent Ethiopia was more a frontier of the West Eurasian oikoumene than part of Sub-Saharan Africa. In contrast you have a situation such as that of the Sandawe, whose vague Cushitic affinities have long been suspected, but had become hunter-gatherers like their Hadza neighbors, and adopted broad elements of the language and culture of the Khoisan. Finally, you have the likelihood that the Khoi peoples only retained extremely useful cultural knowledge such as animal husbandry (and animals) from their Eurasian forebears, who in any case had ancestry likely mediated through Sandawe-like cattle herders of already hybridized nature.*

In the big picture what these results tell us is that the story of human prehistory is complex and multi-layered, and peeling it apart genetically is going to leave us with more questions than answers. Ten years ago a simple story of Out of Africa ~50,000 years ago and subsequent fission between non-Africans (e.g. Europeans separating from Asians, Amerindians separating from Asians) was a robust stylized model we could live with and accept with a clean conscience. Today we are confronted with a more inscrutable world, with archaic and Holocene admixture littering the scene. A substantial proportion of the world’s population (e.g. India) seems to be the byproduct of admixture between very distinct populations which merged less than 10,000 years ago. Much of Sub-Saharan Africa has been totally remodeled culturally and likely genetically over the past 3,000 years. Now there is a fair amount of evidence that eastern Africa has been subject to “back migration” over the past 5,000 years (though there have long been uniparental lineages which suggest this). A simple story of humans leaving an African Eden is no longer viable, because Africa wasn’t Eden for everyone, and modern Africans themselves have felt the stamp of Eurasian migratory events, as well as extensive internal folk wanderings.

Finally, I would say that perhaps the most genetically valuable people to study might be the Mbuti Pygmies of eastern Congo. I suspect they’re the least touched by both the Bantu expansion and the Eurasian back migrations of all the Africans. At least I hope.

* The Lemba of Zimbabwe are probably one of the clues to what occurred in Southern Africa over the past 2,000 years.

• Category: Science • Tags: Anthropology, Human Genetics, Human Genomics 

Credit: Sci Transl Med 3 July 2013: Vol. 5, Issue 192, p. 192ra86, Sci. Transl. Med. DOI: 10.1126/scitranslmed.3006338

Right before I was to sleep a reader sent me an email which pointed to a Nick Wade piece in The New York Times, Gene Sleuths Find How Some Naturally Resist Cholera. It’s about new research in Science Translational Medicine, Natural Selection in a Bangladeshi Population from the Cholera-Endemic Ganges River Delta. The authors use the “composite of multiple signals” (CMS) test to ascertain regions of the genome subject to natural selection (look for long haplotypes, high frequency derived alleles, and alleles with high cross population frequency differences). The results aren’t too surprising; I was born in Bangladesh, and I can attest to the fact that it’s a germaphobe’s nightmare. Rather, it is a secondary and very minor aspect of the paper which frankly draws my ire. First let’s quote Wade’s treatment:

As a necessary preliminary to testing for natural selection, the researchers looked at the racial composition of the Bengali population and found that they are an Indian population with a 9 percent admixture of East Asian genes, probably Chinese. The admixture occurred almost exactly 52 generations ago, according to statistical calculation, or around A.D. 500, assuming 29 years per generation. The Gupta empire in India was in decline at this time, but it is unclear whether the intermarriage with East Asians took place through trade or conquest. “We can now go back to the historians and see what happened then,” Dr. Karlsson said.

But sometimes science gets garbled in transmission. What do they say in the paper? Again, the relevant section:

We estimate that the admixture between Indian and East Asian populations occurred 52 ± 2 generations ago (generation = 29 years)…or around 500 AD, based on the exponential decline of linkage disequilibrium (LD) with distance analyzed using ROLLOFF…This remarkably close-fitted age estimate roughly corresponds to the collapse of the Indian Gupta Empire, the rise of the Chinese Tang dynasty, and the brief unification of Bengal under a single ruler (590 AD to 625 AD). Although alternative histories, such as continuous admixture or multiple admixture events, are possible, the single-event model shows excellent fit to our data, and we found no statistical support for very ancient flow…Using the maximum likelihood–based ancestry estimation software ADMIXTURE (32), we found 9.3 ± 2.6% East Asian ancestry in the BEB….
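The date arithmetic in that passage is easy to check. Below is a minimal sketch, not the ROLLOFF implementation: the generation counts and the 29-year generation time are quoted from the paper, while the sampling year of 2013 and the function names are assumptions made here for illustration. The second function encodes the simple single-pulse expectation that admixture LD decays as exp(-n * d), for n generations and genetic distance d in Morgans.

```python
import math

def admixture_year(generations, years_per_generation=29, sampling_year=2013):
    """Calendar year (AD) implied by an admixture age given in generations.

    sampling_year=2013 is an assumption (the paper's publication year).
    """
    return sampling_year - generations * years_per_generation

def expected_ld_fraction(generations, distance_morgans):
    """Fraction of admixture LD remaining at a given genetic distance under
    a single-pulse model: exp(-n * d). A simplification for illustration."""
    return math.exp(-generations * distance_morgans)

central = admixture_year(52)   # point estimate from the paper
earliest = admixture_year(54)  # 52 + 2 generations
latest = admixture_year(50)    # 52 - 2 generations
print(central, earliest, latest)  # 505 447 563

# LD remaining between markers 1 cM (0.01 Morgan) apart after 52 generations
print(round(expected_ld_fraction(52, 0.01), 3))
```

The ± 2 generations only moves the date by about 60 years in either direction. The assumed generation time matters more: a 25-year generation would put the same 52 generations in the early 700s.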

If you read the rest of the paper you can see where Wade would get the idea that the admixture was “probably Chinese,” but I don’t think it jumps out from the text or the supplements. Perhaps he got this from the first author who is quoted; I have no idea. The population history component of this work is not essential to the understanding of the selection operating via disease, but the problem when this sort of confusion is allowed to stand is that it will become solidified conventional wisdom. I’ve seen this happen before, as throwaway lines have a way of persisting and spreading. Additionally, it also makes geneticists seem superficial when they step outside of their narrow domains of presumed knowledge.

In fact the reality of who these East Asians are in the Bengali ancestral gene pool is clear in the supplements. At K = 5 you see that a Malay-like ancestral component emerges, and all of the East Asian ancestry of the Bengalis is assigned to this. You also have the Han Chinese and Chinese in Singapore data sets split between a Japanese-like and Malay-like component. The Chinese in Singapore are more Malay-like. This is entirely expected, as the Singapore Chinese population is disproportionately from the southwest region of Fujian, with a minority of Hakka (who are reputedly northern transplants to South China), as well as Straits Baba Chinese, who have maternal Malay ancestry. The Han Chinese data set is from the HapMap, with a Beijing sample which is northern biased. In sum, the East Asian ancestry of the modern Bengalis is almost certainly derived from a group with the closest affinities to those in Southeast Asia, not the Chinese.

I have long suspected this because I have genotypes for two unrelated Bengalis, my parents. I’ve talked about this before. Some preliminary investigation on my part looking at the East Asian segment length did suggest an admixture event ~1,000 years ago, so I can believe the ROLLOFF results. But comparing them to various populations, it’s clear that the East Asian ancestry is Southeast Asian for both of them. Here’s an MDS plot I generated 15 minutes ago using 133,000 LD-pruned SNPs:

The Chamar are a low caste group from Uttar Pradesh (to the north and west of Bengal). You notice that my parents (“RazibFam”) are both shifted toward the Southeast Asian groups. Totally unsurprising. No matter how you analyze the results this jumps out (the second dimension pulled out a few inbred individuals from another Southeast Asian group).

So assuming one admixture event (or a primary one), what’s a good story? Eastern Bengal has always been on the margins of and beyond Aryavarta, the cultural core of the North Indian plain. Magadha, the heart of the Gupta Empire in Bihar, was long a marchland. Like the region around Xian in northwest China Magadha was often the locus of the classical pre-Islamic Indian macro-polities, despite (or perhaps because of?) its liminal relationship to the broader Indo-Aryan culture of Northern India, which was fixed upon the Upper Gangetic plain. As Magadha collapsed, and Bengal rose under dynasties such as the Pala later in the first millennium the frontier of Indo-Aryan society rapidly expanded outward and eastward. But I doubt eastern Bengal was empty. It is quite possible there were slash & burn agriculturalists who had arrived at some previous time from Southeast Asia. And it is these who I believe were absorbed into Bengali society in toto.


Illustration of runs of homozygosity for affected and unaffected siblings
Credit: Intellectual Disability Is Associated with Increased Runs of Homozygosity in Simplex Autism

It is generally understood that inbreeding has some negative biological consequences for complex animals. Recessive diseases are the most straightforward. The rarer a recessive disease is, the higher the fraction of sufferers of that disease who will be products of pairings between relatives (the reason for this is straightforward, as extremely rare alleles which express in a deleterious fashion in homozygotes will be unlikely to come together in unrelated individuals). But when it comes to traits associated with inbred individuals recessive diseases are not what comes to mind for most; the boy from the film Deliverance is usually the more gripping image (contrary to what some of the actors claimed the young boy did not have any condition).
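The point about rare recessive alleles can be made concrete with the textbook formula for homozygosity under inbreeding, q^2 + F * q * (1 - q), where q is the allele frequency and F the inbreeding coefficient (1/16 for the offspring of first cousins). The sketch below is illustrative only: the allele frequencies and the 1% rate of cousin marriage are hypothetical values chosen here, not figures from any study.

```python
def homozygote_probability(q, f):
    """Chance an individual is homozygous for an allele of frequency q,
    given inbreeding coefficient f (f = 0 means outbred)."""
    return q * q + f * q * (1.0 - q)

def relative_risk(q, f=1 / 16):
    """Risk to inbred offspring relative to outbred offspring.
    The default f = 1/16 corresponds to first-cousin parents."""
    return homozygote_probability(q, f) / homozygote_probability(q, 0.0)

def fraction_of_cases_from_cousins(q, cousin_rate, f=1 / 16):
    """Share of all affected individuals whose parents are first cousins.
    cousin_rate is the (hypothetical) fraction of cousin marriages."""
    inbred = cousin_rate * homozygote_probability(q, f)
    outbred = (1.0 - cousin_rate) * homozygote_probability(q, 0.0)
    return inbred / (inbred + outbred)

for q in (0.01, 0.001):  # hypothetical disease allele frequencies
    print(q, relative_risk(q), fraction_of_cases_from_cousins(q, 0.01))
```

With q = 0.01 the offspring of first cousins are about 7 times as likely to be affected as outbred offspring; with q = 0.001 the ratio is over 60, and the share of cases coming from the small minority of cousin marriages rises accordingly.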

Some are curious about the consequences of inbreeding for a trait such as intelligence. The scientific literature here is somewhat muddled. But it seems likely that all things equal if two people of average intelligence pair up and are first cousins the I.Q. of their offspring will be expected to be 0-5 points lower than would otherwise be the case. By this, I mean that the studies you can find in the literature suggest when correcting for other variables that the inbreeding depression on the phenotypic level is greater than 0 (there is an effect) but less than 5 (it is not that large, less than 1/3 of a standard deviation of the trait value). Presumably for higher levels of inbreeding the consequences are going to be more dire.

But what about genetic homogeneity that’s not due to inbreeding? Recall that the recent Ralph and Coop paper showed empirically that there were many networks of genetic relatedness between people who one might think are absolutely unrelated. Anyone who uses 23andMe has plenty of evidence of this, as “relatives” begin to pop up who match genetic segments with you. If you have one line of descent from an individual far in the past you are often going to have another. This means that segments of DNA from the same individual may come “back together” and form a homozygous block. How this occurs for inbred individuals is simple. If your parents are first cousins they share one pair of grandparents, and each of these grandparents has two short lines of descent down to you. But this same dynamic applies in diluted form to those who are much further back in your genealogy. You may be entirely outbred in a pedigree sense, but still have runs of homozygosity due to chance.

A new paper in AJHG compares levels of runs of homozygosity in a data set of unaffected parents, unaffected offspring, and affected offspring. In particular the authors had a data set of thousands of families who participated in autism research. The affected siblings were diagnosed with autism, while the parents and unaffected siblings did not exhibit the condition. In addition to autism there was a range in intelligence of the affected siblings. This experimental design is useful because you are comparing siblings who share a great deal genetically, but are phenotypically different. You have to correct for fewer confounds because their genetic backgrounds overlap, and their environments are highly correlated.

Of course, though siblings are similar genetically, they are not duplicates. The expected relatedness of siblings is 50%, but there is variation in this value, and obviously there are genetic differences in the unshared balance. In this study the authors focused on runs of homozygosity (ROH), which are likely due to the same ancestors showing up at multiple points across an individual’s lineage at some distant time. Their minimum threshold for serious ROH blocks was relatively short at 2,500 kb (the expected time back to the common ancestor responsible for an ROH block of 2,500 kb is ~1,000 years). The topline finding is that very low I.Q. affected siblings (<70, what would be termed “mentally retarded” in the past) had 1.32 times more ROH of > 2,500 kb (p = 0.03). They did not find a statistically significant difference for I.Q. >70. The authors did a range of manipulations and slicing and dicing of the data. I am not particularly interested in those hoary details.

Rather, I’m heartened that high density SNP chips are now being applied to these sorts of massive family based studies, and the biological differences between siblings can be more properly assessed. There is a great deal of randomness across siblings in terms of their genetic inheritance. Two of my siblings only share 41% of their genes identical by descent (as opposed to the expected 50%). Not only that, I know that frequency of ROH also varies randomly across siblings; it does across mine. If the number of de novo point mutations of significant effect is on the order of 30 or so per individual then again variation across sibling cohorts is liable to be significant and of note.
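How much spread around 50% should one expect between full siblings? The sketch below is a rough simulation under stated assumptions, not a calibrated model: it treats the genome as 22 chromosomes of equal length (1.6 Morgans each, a hypothetical simplification), assumes crossovers without interference, and ignores sex differences in recombination. For each parent, the indicator "both siblings received the same grandparental copy" flips along the chromosome at a rate of 2 per Morgan.

```python
import random
import statistics

def parental_sharing(length_morgans, rng):
    """Fraction of one chromosome, as transmitted by one parent,
    on which two siblings inherited the same grandparental copy."""
    position = 0.0
    same = rng.random() < 0.5  # state at the chromosome start
    shared = 0.0
    while position < length_morgans:
        # distance to the next crossover in either sibling's gamete
        step = min(rng.expovariate(2.0), length_morgans - position)
        if same:
            shared += step
        position += step
        same = not same
    return shared / length_morgans

def sibling_ibd_fraction(chromosomes, rng):
    """Average of paternal and maternal sharing across the genome."""
    total = sum(chromosomes)
    shared = 0.0
    for length in chromosomes:
        for _parent in range(2):  # the two parents transmit independently
            shared += parental_sharing(length, rng) * length
    return shared / (2.0 * total)

rng = random.Random(1)
chromosomes = [1.6] * 22  # hypothetical equal-length chromosomes
pairs = [sibling_ibd_fraction(chromosomes, rng) for _ in range(2000)]
print(round(statistics.mean(pairs), 3), round(statistics.stdev(pairs), 3))
```

Under these assumptions the standard deviation comes out at a few percentage points, so a sibling pair sharing 41% is unusual but not outlandish.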

Issues such as inbreeding depression or the phenotypic consequences of homozygosity were until recently theoretical matters, or explored in organisms such as Drosophila. That age is coming to an end. High coverage whole genome sequencing is going to allow for precise and powerful comparisons across sibling cohorts, and as whole nations go “all in” the swell of data is going to be awesome. I suppose people will find out things that they may be uncomfortable with, but one ultimately has to face up to the truth in the end.

Addendum: On a converse note, here’s a case where you seem to see outbreeding depression. On the genome-wide scale I’d be willing to bet this is less of a problem than inbreeding. And no, I am not convinced by the fact that there seems to be higher fertility in more closely related individuals in Iceland.


Razib’s daughter’s ancestry composition

An F1, r = 0.5 to Razib

Genome-wide associations are rather simple in their methodological philosophy. You take cases (affected) and controls (unaffected) of the same genetic background (i.e. ethnically homogeneous) and look for alleles which diverge greatly between the two pooled populations. Visually the risk alleles, which exhibit higher odds ratios, are represented via Manhattan plots. But please note the clause: ethnically homogeneous study populations. In practice this means white Europeans, and to a lesser extent East Asians and African Americans (the last because the biomedical industrial complex in the United States performs many GWAS, and the USA is a diverse nation). Looking within ethnic groups eliminates many false positives one might obtain due to population stratification. Basically, alleles which differ between groups because of their history may produce associations when the groups themselves differ in the propensity of the trait of interest (e.g. hypertension in blacks vs. whites).
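The stratification problem is easy to show with made-up numbers. In the sketch below (all allele counts are hypothetical, invented for illustration) the allele has no association with disease within either population, yet pooling the two populations produces a strong spurious signal, because the populations differ in both allele frequency and disease prevalence.

```python
def allelic_odds_ratio(case_risk, case_other, control_risk, control_other):
    """Odds ratio for carrying the risk allele, cases versus controls,
    computed from allele counts in a 2x2 table."""
    return (case_risk * control_other) / (case_other * control_risk)

# Hypothetical population 1: risk allele frequency 0.2, disease common.
pop1 = dict(case_risk=160, case_other=640, control_risk=40, control_other=160)
# Hypothetical population 2: risk allele frequency 0.6, disease rare.
pop2 = dict(case_risk=120, case_other=80, control_risk=480, control_other=320)

or_pop1 = allelic_odds_ratio(**pop1)
or_pop2 = allelic_odds_ratio(**pop2)

# Naive analysis: pool both populations into a single table.
pooled = {key: pop1[key] + pop2[key] for key in pop1}
or_pooled = allelic_odds_ratio(**pooled)

print(or_pop1, or_pop2, round(or_pooled, 3))
```

Within each population the odds ratio is exactly 1, while the pooled table gives an odds ratio well below 1: the allele looks "protective" purely because it is more common in the population with less disease.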

But this raises the question: how generalizable are GWAS, and therefore portable across ethnicities? This is not a trivial question for someone like me, as South Asians tend to be understudied for natural reasons (there aren’t that many of us in the West, and funding for this sort of thing is not viable in Third World nations where most South Asians live). Not only are South Asians understudied, but we tend to have large genetic distances within the putative population, so I’m not even sure that GWAS from the HapMap Gujarati samples would be applicable to me (the genetic distance between South Asian ethnic groups is actually greater than between Europeans and some West Asians). And then there is the question of people of mixed heritage. Is there really a possibility in the near future of GWAS of various F1 combinations, let alone backcrosses like Reiko Aylesworth?

Fortunately, from where I stand it seems that most GWAS being reported today are portable across ethnicities, so we don’t have to go reinventing every wheel. Some of the evidence is plain to see in a new PLoS Genetics paper, High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants. Here is the abstract:

Describing and identifying the genetic variants that increase risk for complex diseases remains a central focus of human genetics and is fundamental for the emergent field of personalized medicine. Over the last six years, GWAS have revolutionized the field, discovering hundreds of disease loci. However, with only a handful of exceptions, the causal variants that generate the associations unveiled by GWAS have not been identified, and their frequency and degree of sharing across populations remains unknown. Here, we present a comprehensive comparison of GWAS results designed to try to understand the nature of causal variants. By examining the results of GWAS for 28 diseases that have been performed with peoples of European, East Asian, and African ancestries, we conclude that a large fraction of associations are caused by common causal variants that should map relatively close to the associated markers. Our results indicate that many of the disease risk variants discovered by GWAS are shared across Eurasians.

I want to stipulate that my own views on this matter do not hinge on just this paper. Nor do I believe that there is no regional heterogeneity in the genomic architecture of disease risk alleles. Rather, as a prior I now would contend that when looking at the odds ratios for a relatively large effect allele in Europeans, for Eurasians at least one shouldn’t be excessively skeptical of transferring the inference toward other populations. In the paper the authors report that when accounting for differences in statistical power (European studies tend to have much larger sample sizes, and so can catch more variants) there is a decent replicability of GWAS. Additionally, there is the possibility that some non-replications are due to the fact that the GWAS are focusing on marker SNPs, rather than causal SNPs, and the marker associations are not portable across populations even if the causal ones are. Remember, often current GWAS studies utilizing SNP-chips are focusing on a genomic region, more than a particular SNP as such. This is why you may get strong GWAS signals in noncoding regions.

Of course there are going to be rare variants which are less portable, and as genomics scales up in population sample size and deep whole genome analyses we’re going to be plumbing private alleles. But until then there’ll be a mountain of common variants of diverse effect sizes, and that information needn’t be discarded when one considers populations outside of the study’s purview. When viewing odds ratios in 23andMe there’s always the caveat that “results X for Europeans.” This is to be expected for a business. And in terms of medical actions one still needs to be cautious. But to the question of how seriously to take GWAS performed in Europeans if you are not European? If you are non-African, I’d say moderately seriously. If you are an African, I’d probably still say somewhat seriously.

Citation: Marigorta, Urko M., and Arcadi Navarro. “High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants.” PLoS Genetics 9.6 (2013): e1003566.


MDS of all samples

Noah Rosenberg’s lab has put out the mother of all microsatellite papers, Population Structure in a Comprehensive Genomic Data Set on Human Microsatellite Variation. It seems to me that this is the culmination of all the work with microsatellite markers which has come out of his lab over the past decade, applying all sorts of fancy analytic techniques they’ve developed (for example, Procrustes transformation). The big thing to note is that the human sample size is nearly 6,000 individuals with over 600 loci. Because microsatellites mutate and diverge very fast (mutation rates of ~10^-4 rather than ~10^-8 as with SNPs) 600 loci is more than sufficient to differentiate populations. Because of this rapid mutation I’m a little dubious about their attempt to explore human-chimp differences using a smaller set ascertained on humans, though that may be simply a proof of principle (if the markers evolve too fast they might not tell you much that is informative about very deep divergences).


Reading the paper it’s quite obvious that just merging the samples was a big feat. And it’s not just sample size, they had excellent population coverage (267). As Dienekes observes microsats are somewhat “retro”, but try and get this sort of population coverage with whole genomes, or even SNPs. You can get to N>5,000, but with SNPs the overlapping markers start to drop off very quickly, to the point where they are far less informative than this number of microsats. Dienekes quite liked the tree to the left, and I’ve uploaded a rather large version of it for your enjoyment (just zoom in if your browser sizes it down).

But to some extent the tree above illustrates the limitations of this sort of analysis. Rather than an analysis, this is really more a useful data set that you have to slice and dice, and explore on a finer grain. Pooling all the samples together makes it far less informative and unintelligible. This is already obvious in their aggregation to create the large data set, as they had to prune very large subpopulations so they didn’t overwhelm the results. Even then problems obvious to those familiar with the data crop up, though they might not be so clear to those who are reading superficially. The Gujarati data set among the South Asians separated out on a two dimensional visualization from all other populations. This is something that often occurs because it looks like Gujaratis are sampled from a very specific caste, which increases the perceived affinity of this regional ethnicity. Similarly, pooling all the populations and representing them on a two dimensional plot is more an aesthetic declaration than an informative visualization. You have to bracket out the populations to see value-added structure. Finally, even the coarse and general observations need to be integrated with caution. Rosenberg’s lab has been illustrating the decay of genetic diversity from Ethiopia for nearly a decade now. It’s a classic result which shows up in graduate level population genetics courses. But both the anthropology and genetics tell us that Ethiopians are a compound population with Sub-Saharan African and Eurasian affinities. Most readers can be expected to know this, but I would not be surprised if some simply took the general plot at face value and applied the insight to all the populations, as if they really were subject to a serial founder effect (my specific point is that Ethiopians are the product of a synthesis due to back migration, a reversal of the general migration out of Africa being illustrated with the decline in genetic diversity).

Overall I find this an interesting paper which sets the backdrop for understanding the canvas of human genetic variation. The only last caution I would offer is that microsatellites are atypical regions of the genome which evolve rapidly in a neutral fashion. This makes them excellent for pinpointing population differences and inferring history from a limited marker set. But I think people should be cautious of specific novel results, and not hold them up as that authoritative when we have high density SNP data.

Note: They’ve released the data. If readers are curious about doing different things with these data than was shown in this paper, Treemix can handle microsats. Also, props to them for releasing this under a Creative Commons license.

Citation: Pemberton, Trevor J., Michael DeGiorgio, and Noah A. Rosenberg. “Population structure in a comprehensive genomic data set on human microsatellite variation.” G3: Genes| Genomes| Genetics 3.5 (2013): 891-907.


Sir Francis Galton

Modern evolutionary genetics owes its origins to a series of intellectual debates around the turn of the 20th century. Much of this is outlined in Will Provine’s The Origins of Theoretical Population Genetics, though a biography of Francis Galton will do just as well. In short what happened is that during this period there were conflicts between the heirs of Charles Darwin as to the nature of inheritance (an issue Darwin left muddled from what I can tell). On the one side you had a young coterie around William Bateson, the champion of Gregor Mendel’s ideas about discrete and particulate inheritance via the abstraction of genes. Arrayed against them were the acolytes of Charles Darwin’s cousin Francis Galton, led by the mathematician Karl Pearson, and the biologist Walter Weldon. This school of “biometricians” focused on continuous characteristics and Darwinian gradualism, and are arguably the forerunners of quantitative genetics. There is some irony in their espousal of a “Galtonian” view, because Galton was himself not without sympathy for a discrete model of inheritance!

William Bateson

In the end science and truth won out. Young scholars trained in the biometric tradition repeatedly defected to the Mendelian camp (e.g. Charles Davenport). Eventually, R. A. Fisher, one of the founders of modern statistics and evolutionary biology, merged both traditions in his seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance. The intuition for why Mendelism does not undermine classical Darwinian theory is simple (granted, some of the original Mendelians did seem to believe that it was a violation!). Many discrete genes of moderate to small effect upon a trait can produce a continuous distribution via the central limit theorem. In fact classical genetic methods often had difficulty perceiving traits with more than half a dozen significant loci as anything but quantitative and continuous (consider pigmentation, which we know through genomic methods to vary across populations mostly due to half a dozen segregating genes or so).
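The central limit theorem argument can be seen in a few lines of simulation. This is a toy sketch under simple assumptions chosen here for illustration: every locus is biallelic with a "+" allele at frequency 0.5, effects are equal and purely additive, and there is no environmental noise. The trait value is just the count of "+" alleles an individual carries.

```python
import random
import statistics

def simulate_trait(n_individuals, n_loci, allele_freq, rng):
    """Additive trait: number of '+' alleles across n_loci diploid loci."""
    values = []
    for _ in range(n_individuals):
        count = sum(1 for _ in range(2 * n_loci) if rng.random() < allele_freq)
        values.append(count)
    return values

rng = random.Random(0)
one_locus = simulate_trait(5000, 1, 0.5, rng)     # classic Mendelian trait
many_loci = simulate_trait(5000, 100, 0.5, rng)   # polygenic trait

print(sorted(set(one_locus)))          # only the classes 0, 1 and 2
print(len(set(many_loci)))             # dozens of distinct trait values
print(round(statistics.mean(many_loci), 1),
      round(statistics.stdev(many_loci), 1))
```

One locus gives three discrete classes, as in Mendel's peas. One hundred loci give a bell-shaped spread around 100 with a standard deviation near 7, which to a biometrician measuring phenotypes would look continuous.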

Notice here I have not said a word about DNA. That is because 40 years before the understanding that DNA was the substrate of genetic inheritance scientists had a good grasp of the nature of inheritance through Mendelian processes. The gene is fundamentally an abstract unit, an analytic element subject to manipulation which allows us to intelligibly trace and predict patterns of variation across the generations. It so happens that the gene is instantiated in a material sense through sequences of the biomolecule DNA. This is very important. Because we know the material basis of modern genetics it is a much more fundamental science than economics (economics remains mired in its “biometric age!”).

The “post-genomic era” is predicated on industrial scale analysis of the material basis of genetics in the form of DNA sequence and structure. But we shouldn’t confuse DNA, concrete bases, with classical Mendelism. A focus on the material and concrete is not limited to genetics. In the mid-2000s there was a fad for cognitive neuroscience fMRI studies, which were perceived to be more scientific and convincing than classical cognitive scientific understandings of “how the mind works.” In the wake of the recession of fMRI “science” due to serious methodological problems we’re left to fall back on less sexy psychological abstractions, which may not be as simply reduced to material comprehension, but which have the redeeming quality of being informative nonetheless.

This brings me to the recent paper on SNPs associated with education in a massive cohort, GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment. You should also read the accompanying FAQ. The bottom line is that the authors have convincingly identified three SNPs to explain 0.02% of the variation in educational attainment across their massive data set. Pooling all of the SNPs with some association they get ~2% of the variation explained. This is not particularly surprising. A few years back one of the authors on this paper wrote Most Reported Genetic Associations with General Intelligence Are Probably False Positives. Those with longer memories in human genetics warned me of this issue in the early 2000s. More statistically savvy friends began to warn me in 2007. At that point I began to caution people who assumed that genomics would reveal the variants which are responsible for normal variation on intelligence, because it seemed likely that we might have to wait a lot longer than I had anticipated. As suggested in the paper above previous work strongly implied that the genetic architecture of intelligence is one where the variation on the trait in the normal range is controlled by innumerable alleles of small effect segregating in the population. Otherwise classical genetic techniques may have been able to detect the number of loci with more surety. If you read Genetics of Human Populations you will note that using classical crossing techniques and pedigrees geneticists did in fact converge upon approximately the right number of loci segregating to explain the variation between European and African pigmentation 60 years ago!

Some of my friends have been arguing that the small effect sizes here validate the position that intelligence variation is mostly a function of environment. This is a complicated issue, and first I want to constrain the discussion to developed Western nations. It is an ironic aspect that arguably intelligence is most heritable among the most privileged. By heritable I mean the component of variation of the trait controlled by genes. When you remove environmental variation (i.e. deprivation) you are left with genetic variation. Within families there is a great deal of I.Q. difference across siblings. The correlation is about 0.5. Not bad, but not that high. Of course some of you may think that I’m going to talk about twin studies now. Not at all! Though contrary to what science journalists who seem to enjoy engaging in malpractice like Brian Palmer of Slate seem to think classical techniques have been to a great extent validated by genomics, it is by looking at unrelated individuals that some of the most persuasive evidence for the heritability of intelligence has been established. It is no coincidence that one of the major authors of the above study also is an author on the previous link. There is no contradiction in acknowledging difficulties of assessing the concrete material loci of a trait’s variation even if one can confidently infer that association. There was genetics before DNA. And there is heritability even without specific SNPs.

Additionally, I want to add one caveat about the “environmental” component of variation. For technical reasons this environmental component may actually include relatively fixed biological variables. Gene-gene interactions, or developmental stochasticity come to mind. Though these are difficult or impossible to predict from parent to offspring correlations they are not as simple as removing lead from the environment of deprived children. My own suspicion is that the large variation in intelligence across full siblings tells us a lot about the difficult to control and channel nature of “environmental” variation.

Finally, I want to point out that even small effect loci are not trivial. The authors mention this in their FAQ, but I want to be more explicit. From Small genetic effects do not preclude drug development:

Consider a trait like, say, cholesterol levels. Massive genome-wide association studies have been performed on this trait, identifying a large number of loci of small effect. One of these loci is HMGCR, coding for HMG-CoA reductase, an important molecule in cholesterol synthesis. The allele identified increases cholesterol levels by 0.1 standard deviations, meaning a genetic test would have essentially no ability to predict cholesterol levels. By the logic of the Newsweek piece, any drug targeted at HMGCR would have no chance of becoming a blockbuster.

Any doctor knows where I’m going with this: one of the best-selling groups of drugs in the world currently are statins, which inhibit the activity of (the gene product of) HMGCR. Of course, statins have already been invented, so this is something of a cherry-picked example, but my guess is that there are tens of additional examples like this waiting to be discovered in the wealth of genome-wide association study data. Figuring out which GWAS hits are promising drug targets will take time, effort, and a good deal of luck; in my opinion, this is the major lesson from Decode (which is not all that surprising a lesson)–drug development is really hard
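To see why a 0.1 standard deviation effect has "essentially no ability to predict," one can compute the share of trait variance a single SNP explains. The sketch below uses the standard additive result 2p(1 - p)β², assuming Hardy-Weinberg proportions and reading the quoted 0.1 SD as a per-allele effect; the allele frequencies are hypothetical, since the quoted passage does not give one.

```python
def variance_explained(beta_sd, allele_freq):
    """Share of trait variance explained by one additive biallelic SNP.

    beta_sd: per-allele effect in trait standard deviations.
    Assumes Hardy-Weinberg proportions and a purely additive effect.
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2

for freq in (0.05, 0.2, 0.5):  # hypothetical allele frequencies
    print(freq, variance_explained(0.1, freq))
```

Even in the most favorable case, an allele frequency of 0.5, the SNP accounts for only 0.5% of the variance. That is useless for prediction, yet it says nothing about how large an effect a drug acting on the same gene product might have.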

Addendum: Most of my friends, who have undergraduate backgrounds in biology, and have taken at least some quantitative genetics, seem to guess the heritability of I.Q. to be 0.0 to 0.20. This is just way too low. But is it even important to know this? I happen to think an accurate picture of genetic inheritance is probably useful when assessing prospects of mates….

Citation: Rietveld, Cornelius A., et al. “GWAS of 126,559 Individuals Identifies Genetic Variants Associated with Educational Attainment.” Science (New York, NY) (2013).

Christopher Columbus

A few years ago there was a minor controversy when some evolutionary genomicists reported that they had reconstructed the genome of the extinct Taino people of Puerto Rico by reassembling fragments preserved in contemporary populations long since admixed. The controversy had to do with the fact that some individuals today claim to be Taino, and therefore, they were not an extinct population. Though that controversy eventually blew over, the methods lived on, and continue to be used. Now some of the same people who brought you that have come out with work which reconstructs the recent demographic history of the Caribbean, both maritime and mainland, using genomics. Even better, it’s totally open access because it’s up on arXiv, Reconstructing the Population Genetic History of the Caribbean (please see the comments at Haldane’s Sieve as well, kicked off by little old me). Though the authors pooled a variety of data sets (e.g., HapMap, POPRES, HGDP), the focus is on the populations highlighted in the map above.

Much of the novel insight in the results begins with their observation of a distinct “Latino” population genetic cluster with strong affinities with Europe within the Caribbean populations. This is clearly visible in their ADMIXTURE analysis. What they did was pool various populations, and run a method which decomposes the ancestry of each individual as a combination of K ancestral populations. In cases where the pooled populations are clear and distinct the results will be clear and distinct. For example, if you had 50 Finns and 50 Nigerians and pooled them, and ran ADMIXTURE at K = 2, then with a non-trivial number of SNPs (10,000 is more than sufficient) all the Finns and Nigerians will partition into two distinct ancestral populations under this sort of model-based clustering. But it always has to be remembered that though these methods map onto reality, and give us some sense of the variation within the data sets, the K’s themselves are artificial constructs. So, for example, the HGDP Maya population is known to have non-trivial European gene flow. If you use this sort of Maya population as your “Native American” reference, then you will underestimate Native ancestry in admixed groups because your reference Native population is already skewed toward Europeans (this is obviously a major problem when you don’t have the appropriate reference because it is extinct, such as with the Taino).
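The Finns-and-Nigerians thought experiment can be mimicked with a toy simulation. To be clear about what this is: a minimal sketch that swaps in plain 2-means clustering on simulated genotypes, not ADMIXTURE's likelihood model, and the two sets of allele frequencies are drawn independently at random (an assumption that makes the toy populations far more differentiated than any real pair). All function names are hypothetical.

```python
import random


def simulate_population(allele_freqs, n_individuals, rng):
    """Draw diploid genotypes (0, 1, or 2 copies of an allele) per SNP."""
    return [
        [(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
        for _ in range(n_individuals)
    ]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def two_means(genotypes, max_iter=20):
    """Plain 2-means clustering; returns a 0/1 label per individual."""
    first = genotypes[0]
    # Seed the second centroid with the individual farthest from the first.
    farthest = max(genotypes, key=lambda g: squared_distance(g, first))
    centroids = [list(first), list(farthest)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            0 if squared_distance(g, centroids[0]) <= squared_distance(g, centroids[1]) else 1
            for g in genotypes
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = [g for g, lab in zip(genotypes, labels) if lab == k]
            if members:
                centroids[k] = mean_vector(members)
    return labels


rng = random.Random(42)
n_snps = 1000
# Two independently drawn sets of allele frequencies (assumed, not real data).
freqs_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freqs_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
genotypes = simulate_population(freqs_a, 50, rng) + simulate_population(freqs_b, 50, rng)
labels = two_means(genotypes)
print("labels in first group:", set(labels[:50]))
print("labels in second group:", set(labels[50:]))
```

The point of the toy is only that when the source populations are this distinct, even a crude method recovers them. It says nothing about the harder case of admixed individuals, where the cluster labels are constructs of the model.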

With those cautionary preliminaries out of the way, what’s going on in these results? As you can see many of the Caribbean populations are straightforward combinations of various continental ‘parent’ populations. This is clearly evident in K = 3, where green = Africa, red = European, and blue = Native (note that the Maya have a range of European ancestry just as I said). By looking at individual variation within populations you can already gain some insights as to the nature of the admixture. In Mexico there is a wide range of the European vs. Native fraction, though in this data set there are no “pure” individuals. Additionally, there are low, but relatively even, amounts of African ancestry across the population. Though African consciousness is not a major element of modern Mexican national identity, people of African ancestry were a major part of the Spanish colonial enterprise (see Empire: How Spain Became a World Power, 1492-1763). In some areas, such as Veracruz, people of visibly African ancestry remain, but in much of Mexico these individuals intermarried and their physical characteristics were diluted to the point of not being visible.

The situation in the maritime Caribbean is somewhat more complex. In these contexts it was the Native, not African, ancestry which was subsumed and submerged. It is genomics which has ‘rediscovered’ this ancestry, to the extent that many scholars had previously been skeptical of the possibility that modern Puerto Ricans and Dominicans inherited a substantial share of Taino ancestry. In both Puerto Rico and the Dominican Republic the relevant issue is that there is a wide range in the proportions of African and European ancestry, with Cuba being the notable extreme case of this phenomenon. What’s going on with Cuba in particular is that there were late waves of migration from Spain, so some modern white Cubans are much less affected by admixture than other Caribbeans (remember that Cuba was part of Spain until 1898). In Haiti the situation is reversed: the revolutions of the late 18th and early 19th centuries had a racial tinge, and whites were expelled (leaving a small mulatto class).

But it is K = 8 where things really get interesting. The black component is a European Iberian-like element which is distinct to Latino populations (including Maya). As you can see on this PCA the Latino element is related to the Iberian populations, as they took the European segments from the Caribbean populations and used them to flesh out the distribution in ancestry. There are several ways to interpret this. Dienekes suggested this might simply be a function of the source Iberian populations hundreds of years ago being somewhat different from the contemporary ones. For example, obviously contemporary Spaniards would be more subject to gene flow with other Europeans after 1600 than their New World cousins. Another possibility is that there was extreme sampling from a particular region of Spain, and that has now broken out as its own cluster. For example, I know that a disproportionate number of migrants were from Andalucia and Extremadura. But the pattern here doesn’t suggest to me that possibility (the black dots should be more south-shifted I would think if they were from those two provinces).

Rather, the interpretation they seem to favor is that this element has drifted away from the ancestral populations due to a bottleneck. This is not ethnographically implausible; the early years of the Spanish colonial experiment were characterized by de facto polygyny. Many adventurers lived lives not unlike those of the white grandees of the East India Company in the late 18th century. Some have argued that this period of ubiquitous common law polygyny has influenced the fact that illegitimate births have traditionally been very common in Latin America. One reason the authors favor the bottleneck model is that the genetic distance between the Latino element and the Iberian one is rather high. This is common in situations where drift or a bottleneck has shifted allele frequencies particularly rapidly. Not only that, but the tendency is strongest in maritime Latin America, many of whose islands received relatively fewer subsequent migrants than the large and expansive mainland viceroyalties.
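Why a bottleneck inflates genetic distance so quickly can be seen from the textbook Wright-Fisher expectation. This is a back-of-the-envelope sketch under an idealized model (constant size, random mating, no later migration), not the authors' inference method. The effective sizes and the 25-year generation time are my own assumptions, and `expected_fst` is a hypothetical helper name.

```python
def expected_fst(effective_size, generations):
    """Expected drift-induced differentiation from the source population.

    Under an idealized Wright-Fisher model heterozygosity decays by a
    factor (1 - 1/(2N)) per generation, so the expected differentiation
    after t generations is 1 - (1 - 1/(2N))**t.
    """
    return 1.0 - (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Hypothetical effective sizes over 20 generations (roughly 500 years at an
# assumed 25 years per generation); none of these numbers come from the paper.
for n_e in (50, 500, 50_000):
    print(f"Ne = {n_e:>6}: expected Fst ~ {expected_fst(n_e, 20):.4f}")
```

With the same number of generations, a founding group in the tens diverges by orders of magnitude more than one in the tens of thousands, which is the qualitative pattern a bottleneck would leave.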

23andMe ancestry decomposition for friend who is 1/4 Asian

Another way the authors explored the demographic history was to look at the length distribution of the tracts of ancestries. How this works is simple. A first generation hybrid will have unbroken lengths of ancestry from each parent, but subsequent generations will start to have fragmentation occur as recombination breaks apart long blocks identical by descent. You can see this in the figure to the left, where my friend who has one Asian grandparent has blocks of alternating European and Asian ancestry because of meiotic recombination events. The more time has passed since admixture, the smaller the blocks will become, as recombination slices apart long blocks and recombines ancestral components. By looking at the distribution and mix of lengths the authors can construct demographic histories of the populations. In short it looks like much of the European ancestry came in one short quick pulse, rather early on in settlement. This is in keeping with the high reproductive output attested for European males thanks to polygyny during this period.
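The intuition that blocks shrink with time can be sketched with a toy model. It assumes a single admixture pulse and roughly one crossover per Morgan per meiosis, and it treats every breakpoint as a switch in ancestry, which overstates the fragmentation (a crossover between two segments of the same ancestry changes nothing). It is not the authors' tract-length method, and the function names and the 2-Morgan chromosome are hypothetical choices.

```python
import random


def mean_tract_length_cm(generations):
    """Approximate mean tract length in centimorgans.

    Breakpoints accumulate at about `generations` per Morgan, so the
    spacing between them averages 100 / generations centimorgans.
    """
    return 100.0 / generations


def simulate_tracts(generations, chromosome_morgans, rng):
    """Cut one chromosome at random breakpoints; return piece lengths in cM."""
    pieces, position = [], 0.0
    while True:
        # Distance to the next breakpoint, in Morgans (exponential spacing).
        gap = rng.expovariate(generations)
        if position + gap >= chromosome_morgans:
            pieces.append((chromosome_morgans - position) * 100.0)
            return pieces
        pieces.append(gap * 100.0)
        position += gap


rng = random.Random(0)
for g in (2, 10, 20):
    pieces = simulate_tracts(g, chromosome_morgans=2.0, rng=rng)
    print(f"{g:>2} generations: {len(pieces)} pieces, "
          f"expected mean ~ {mean_tract_length_cm(g):.0f} cM")
```

Reading the relationship backwards is what gives the method its dating power: a population dominated by short tracts of a given ancestry received that ancestry long ago, while long tracts point to recent arrivals.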

The same method was applied to the African ancestry, and the authors discovered an intriguing result. It seems that in the early years most of the Caribbean black slaves were derived from the western tip of Sub-Saharan Africa, from the Senegal river down to modern Ghana. The longer tracts, which reflect later admixture, show affinities with populations further east, from the Bight of Benin toward the Equator. I don’t know the history of slavery well enough to confirm or deny the reality of this finding, but it illustrates the power of genomics combined with wide sampling strategies. More relevantly I suspect genomics’ role will be to assign magnitudes to known dynamics.

Finally, the authors also inferred diverse relationships for the Native admixture in the Caribbean populations. They confirmed some evidence of south-to-north migration into Central and Caribbean America, and also specific ethno-linguistic associations between now de facto extinct Caribbean populations and those of mainland South America. Some of these results have long been suggested, but lack of historical documentation makes inferences shadowy. Genomics cannot resolve these debates, but it can shed light upon them.

Overall this is an interesting study because I think it is a test run at the sort of historical-demographic questions that genomics will be used for. There has long been a ‘genetics as a tool’ school of thought among many ecologists and phylogeneticists, and now you shall have a ‘genomics as a tool’ school to sit right alongside that in many more diverse fields. Caribbean and Latin American populations are the low hanging fruit, because the Spanish and Portuguese colonial experiments are reasonably well attested, and the source populations are very distinct (so it is easy to pick signal out of the noise). But there are other historical questions of the same period which are also of interest. In Albion’s Seed David Hackett Fischer describes four Anglo-American folkways which contributed to the culture of this nation. Of these, ~20,000 Puritans arrived between 1620-1640 and became the ancestors of ~700,000 by 1790. Though 20,000 is not quite a bottleneck (in fact, they arrived from different sectors of England), I am curious if these individuals, a segment of “Old Americans,” can still be discerned in the genomic data. This is just one of many possible questions which will be within reach of an answer in the near future….

Citation: arXiv:1306.0558
Razib Khan