The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Human Genetic Variation


mtDNA haplogroup G1a2

The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.

Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling. And finally, yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which is mostly outside of South Asia. In case you care, I asked my father her background, and like my patrilineage she was a “Khan,” though an unrelated one (“Khan” is just an honorific). I received these results before the total genome assessment, and so initially assumed this confirmed my hunch that my father had some unknown recent ancestry of “eastern” provenance. But it turns out my hunch is probably wrong. In fact, my parents have about the same “eastern” proportion, with my mother slightly more! My expectation was that perhaps my mother would be around 25-30% “Asian,” and my father above 50%. The reality turns out that my father is 38%, and my mother 40%.

Image credit: f_mafra

Below are the “Ancestry Paintings” generated by 23andMe for my family (so far). What you see are the 22 non-sex chromosomes, which have two copies each, and assignments to “Asian,” “European,” and “African” ancestry groups. The reference populations used to generate these assignments come from the HapMap: the northern European sample of white Americans from Utah, Chinese from Beijing, Japanese from Tokyo, and ethnic Yoruba from Nigeria. What the assignment to one of these classes denotes is that that region of the genome is closest to that category in identity. It does not imply that your recent ancestry is European or Asian (African is probably a different matter, but there are many complaints about the results for African Americans and East Africans in the 23andMe forums). This caveat is especially important for South Asians, because we generally find that we’re ~75% European and ~25% Asian. All that means is that though most of our genetic affinity is with Europeans, a smaller fraction seems to resemble Asians more. Via “gene sharing” on 23andMe I can see that the Asian fraction varies from ~35% in South India and Sri Lanka, to ~10% in Pakistan and Punjab. This is not because South Indians have more East Asian ancestry than Punjabis. Rather, to a great extent the South Asian genome can be decomposed into two ancestral elements, one with a distant, but closer, affinity to populations of eastern Eurasia, and one with a close affinity to populations of western Eurasia: what some have termed “Ancient South Indians” (ASI) and “Ancient North Indians” (ANI), respectively. ASI ancestry, which is probably just a touch under 50% in South Asians overall, seems to shake out then as somewhat more Asian than European.* The fraction of ASI increases as one moves south and east in South Asia (and as one moves down the caste status ladder).

[Image gallery: 23andMe Ancestry Paintings for my family]

First, I want to note that I’ll be using abbreviations for my family members now and then (this applies to future posts). My father will be RF, my mother will be RM, and my siblings will be RS, with a number to denote which sibling. So currently we have RS1. As you can see in a gestalt sense we resemble each other a great deal as a family. We’re about 40% Asian, and 60% European. The extent of fragmentation indicates that we’re not that recent of an admixture; otherwise, the Asian and European fragments would cluster on one strand or the other. Some have suggested that my mother does exhibit less fragmentation. A hypothesis for why this may be is that her maternal grandfather was reputedly from a family of Middle Eastern origin who had resettled in South Asia, first in Delhi, and later in southeast Bengal (specifically, the district of Noakhali). Since he presumably would hardly have had any Asian ancestry according to 23andMe’s algorithm the homologs inherited from him would be overwhelmingly European, with only one generation of recombination intervening.

To assess the plausibility of the various hypotheses which might explain the pattern of the results you need all the non-genomic information. Above is a map of British India. I’ve pointed to the region of Bengal from which my family comes. Of my great-grandparents 7 out of 8 were born in Comilla (which is actually a greater expanse to the southeast of Dhaka than the current Bangladeshi administrative division). 1 great-grandparent was born in Noakhali, which is just to the southeast of Comilla. 4 out of 8 great-grandparents were born within 5 miles of the town of Chandpur (RF’s grandparents). 3 out of the other 4 great-grandparents were born within 5 miles of the village of Homna (RM’s grandparents). These two locations are about 30 miles from each other as the crow flies, though transport between them would have been by water in an earlier era (Homna is on the Meghna river, which is actually a more substantial body of water than the Ganges by the time the latter reaches Bangladesh). This region is bounded on the west by the Padma river, which narrows at Chandpur to about 2 miles in width (average depth ~1,000 feet). To the east is the Indian state of Tripura. This is a relatively porous border, defined on the map, not imposed by geography. You can see that in some regions the Bangladesh-India border here in the east actually bisects rice paddies.

Tripuri children

Today Tripura state is majority ethnic Bengali due to mass migration of Hindus from what was East Bengal during the 20th century (and later East Pakistan, and now Bangladesh). But its indigenous people are the Tripuri, a tribe whose native language is clearly Tibeto-Burman, and whose physical type points to their connection with populations to the north and east. At the same time, ~90% of the Tripuri are Hindus, and during the period of Islamic rule in South Asia the rajahs of Tripura styled themselves defenders of Hindu civilization (just as the Tibeto-Burman Ahoms of Assam did). As such, linguistically and genetically the native people of Tripura exhibit a sharp contrast to the Indo-Aryan peoples of the Gangetic Plain, of whom the Bengalis are the easternmost representatives along with the Assamese. But they have also long been part of the South Asian cultural scene, and can no longer be viewed as purely intrusive (their oral history indicates that they arrived before the Muslims, for one).

Finally, in regards to the detailed backgrounds of my 8 great-grandparents, 2 were of the Khan class. 1 was from a family of Hindu Thakurs who were recently converted to Islam. Another was of the family name Sarkar. 1 was likely from a family of Middle Eastern transplants to South Asia, at least in part. The 4 remaining great-grandparents were Bengali Muslims, with no particular background information beyond that known by my parents.

I gave you all this because genetic variation is strongly conditioned upon geographical and cultural parameters. Water barriers seem to have been particularly efficacious in the pre-modern period in dividing people culturally and genetically (though ironically water was also a precondition for any bulk trade). Language is another major parameter of difference. And finally, there is religion. In the last section I would not be surprised if 300 years ago the majority of my ancestors in that generation were Hindus; there is some fluidity in this obviously. I provide the data on radius of place of birth because we know from European results that even villages exhibit genetic clustering. This is mitigated in my family because my father has a diverse background among his grandparents as far as community goes, while my mother has a grandparent who was from a different district, and to a great extent a different ethnic group in biological terms.

When I initially saw that I was ~40% Asian I was a little taken aback by the high proportion (remember, the average South Asian is about 25% Asian), but there were two parsimonious explanations: a) I had a lot of ASI, b) I had ancestry which did not seem South Asian as such, but was genuinely from East Asia. To ascertain whether it was the former I began proactively gene sharing with a wide range of South Asians on 23andMe. After dozens of individuals it became clear that I was outside of the normal interval of variation. I was more Asian than individuals from South India or Sri Lanka. Additionally, even these individuals tended to be genetically closest to Central/South Asians in the HGDP data set. I was closest to East Asians. Also, on the two dimensional PCA projected onto Central/South Asians I was definitely outside of the cluster of all the other South Asians. Finally, I did find someone who broke the magic 35% barrier of Asian…and that individual was a Bangladeshi, at 38%. And, like me, he was closer to East Asians on the basic “Global Similarity” match. He also carried a Y chromosomal lineage which was rare in South Asia and common among the Hmong. Finally, when Dienekes started his Dodecad Ancestry Project it was clear that ~15% of my ancestry clustered with an element which was not South Asian, but East Asian. If one removes this fraction, I would be about 70% European and 30% Asian, absolutely within the normal range for someone with ancestry from the east or south of the subcontinent.

If you’ve read up to this point, you may be wondering how it is that my father is 38% Asian and my mother is 40% Asian, and I’m 43% Asian. After all, shouldn’t I be an average between the two? Actually, on the PCA scatter plot I am (along with my sibling) exactly between my parents (you can’t see the offspring because the flags are just too large). So why the difference? First, remember that the PCA is projecting you onto a two dimensional axis where the x and y represent the two biggest components of variance in the data set. In other words, it’s yanking out the subset of genetic variance which really stands out in terms of between population difference. This is how an individual who is a first generation Eurasian can be so far from their parents on this plot, but still exhibit a great deal of identity by state in terms of total genome; there’s a lot of variation that the two dimensional plot does not capture (e.g., private variants to family lineages). The Ancestry Painting estimates are different; they’re looking across the whole genome and making assessments for each region as to its genetic affinity between the three reference populations. So to repeat, you have over 50 reference populations vs. 3, and, you have a small proportion of the total genetic variation, vs. the whole genome. Both methods are reporting real and valid results, but they’re somewhat different.

So there are two very simple methodological explanations for the discrepancy above which I can think of. I’m on V2, while my parents and sibling are on V3. I know this has made a difference in other measurements. Additionally, there’s clearly some “noise” within this algorithm, resulting in people with trace African or Asian ancestry which isn’t real, even if you take into account the kludgey nature of the reference populations. But let’s take the results at face value. With the ancestry painting, recall how the European and Asian components were chunky across the genome? Both of my parents received half their genomes from each of their parents. My own chromosomes are a mosaic of those of my grandparents. Some of the original linkage between genomic regions because of their physical location on the same strand has been broken apart by recombination in the two generations downstream from my grandparents: concretely, two instances of meiosis which produced sex cells. Therefore, some of the associations of alleles present in my grandparents have been transformed within me. But even without recombination, it is clear that one homologous chromosome could be more European or Asian than the total genome average. Because only one of these is passed to any given offspring, there is going to be variance from sibling to sibling. Genetics is not a pure blending process. That may be why I am 43% Asian while RS1 is 40% Asian. We’re both sampling from our parents’ genes, and there’s going to be variance in that process (on the chromosomal level you have 22 autosomal draws from each parent where each draw has two outcomes).
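The "22 autosomal draws from each parent" point can be made concrete with a small simulation. This is only a minimal sketch under loud assumptions: the per-homolog "Asian" fractions are invented (drawn around 38% and 40%, with an arbitrary spread), all chromosomes are treated as equal in length, and recombination is ignored entirely, exactly as in the "even without recombination" case above. The function names are hypothetical and have nothing to do with how 23andMe computes its paintings.

```python
import random
import statistics

N_AUTOSOMES = 22  # each parent passes on one homolog per autosome pair

def make_parent(mean_fraction, spread, rng):
    """Hypothetical parent: per-homolog 'Asian' fractions for 22 pairs.

    Each pair is (homolog_1, homolog_2); values are clipped to [0, 1].
    """
    def draw():
        return min(1.0, max(0.0, rng.gauss(mean_fraction, spread)))
    return [(draw(), draw()) for _ in range(N_AUTOSOMES)]

def parent_fraction(parent):
    """Genome-wide fraction for a parent: mean over all 44 homologs."""
    return statistics.mean(h for pair in parent for h in pair)

def make_child(father, mother, rng):
    """Child fraction: one randomly chosen homolog per pair from each parent."""
    inherited = [rng.choice(pair) for pair in father]
    inherited += [rng.choice(pair) for pair in mother]
    return statistics.mean(inherited)

rng = random.Random(0)  # fixed seed so the run is repeatable
father = make_parent(0.38, 0.15, rng)  # invented values, not real data
mother = make_parent(0.40, 0.15, rng)
midparent = (parent_fraction(father) + parent_fraction(mother)) / 2
children = [make_child(father, mother, rng) for _ in range(10_000)]

print(f"midparent value:      {midparent:.3f}")
print(f"mean over children:   {statistics.mean(children):.3f}")
print(f"std. dev. among sibs: {statistics.pstdev(children):.3f}")
print(f"range among sibs:     {min(children):.3f} to {max(children):.3f}")
```

On average the simulated children sit at the midparent value, but any single child can land above or below it. How far depends entirely on how different the two homologs in each pair are, which is the assumed `spread` here.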

An interesting implication of this is that the grandchildren of a multiracial couple will exhibit variance in their ancestral quanta from major racial groups. This is one reason why it is a fallacy to presume that intermarriage will result in the washing away of biological diversity. And processes such as assortative mating could even presumably extract out “pure” individuals from an originally admixed random-mating population.

With all that said, I now believe that with an N = 3 from eastern Bengal I am not an exception with recent Southeast Asian ancestry, but rather eastern Bengal is part of the gene frequency cline between South Asia and Southeast Asia, and as such has a substantial fraction of eastern ancestry. Zack has my parents’ data, so once the results come back from the first runs of HAP I believe that he will see the same pattern of substantial non-South Asian ancestry in them that Dienekes found in me. The cline here is still sharp. The average Bangladeshi is probably only 10-20% interchangeable with the average Burmese when it comes to the proportions produced by algorithms which infer ancestral quanta (remember that the Burmese probably have a small South Asian component too). In contrast, the average Bangladeshi is probably 80-90% interchangeable with a resident of Bihar (the closest match in total SNP comparison among those I’m sharing with in 23andMe is a Bihari, not the two other ethnic Bengalis). This is clearly a function of geography: the north-south ranges in Burma seal it off from South Asia. In contrast, there are open plains from northern Bangladesh to Bihar. In some ways Burma has more cultural affinity and connection with peninsular South Asia because of the ease of maritime travel. The prevalence of Theravada Buddhism in Burma is a testament to the association of the lower Irrawaddy region with Sri Lanka.

Back to Bangladesh. One aspect of the Indian subcontinent in terms of religious demography is that the heart of Indo-Islam, the Delhi area, never had a Muslim majority. Rather, Muslims were a majority along the northwestern and northeastern fringe (along with a few other districts, such as northern Kerala). The predominance of Islam on the northwest isn’t that surprising, as that region borders upon the Dar-al-Islam proper. But what about Bengal? In the late 19th century the British were apparently surprised that the united Bengal (which includes roughly the modern state of West Bengal in India, and Bangladesh) had a Muslim majority. Because of differential birth rates and conversion (this second includes sections of my family as I note above) about 2 out of 3 ethnic Bengalis alive today are Muslim, with the balance being Hindu. Bangladesh is estimated to be 90% Muslim, while West Bengal is 25% Muslim. Even today after generations of Hindu outmigration one pattern within Bangladesh is the relative concentration of Hindus to the west and north (also, Hindus in Bangladesh tend to be urban). The “buckle” of the “Koran belt” in Bangladesh is actually the district of Noakhali, on the southeast fringe of Bengal. My mother’s maternal grandfather, who came from a lineage of pirs who had originally settled in the Muslim heartland in Delhi, was from Noakhali. It is apparently said that in Noakhali even the Hindus know proper Islamic forms!**

An explanation for this pattern is that the religious influence and power of Hindu elites declined as a direct function of distance from the regions of West Bengal, which were closer to the core Aryavarta, and had traditionally been the locus of power of Hindu dynasties before the rise of Islam. Additionally, Bengal was the last region of the mainland subcontinent with a robust Buddhist society during the flowering of the Pala Empire around the year 1000. It is therefore suggested that many Bengali Muslims were converted directly from Buddhism, not Hinduism (there remains even today a small minority of ethnic Bengali Buddhists, who carry the surname “Barua”; this is in distinction to the descendants of Tibeto-Burman people who now speak Bengali, but retain a tribal identity and Theravada Buddhist religion). Also, it may be that eastern Bengal was populated mostly by animist tribes before the arrival of Muslims, and just as European colonial powers were more successful in Asia at spreading their religion among marginalized people (e.g., tribal peoples in northeast India and Southeast Asia are often Christian), so Islam found purchase among those outside of the Hindu caste system.

These models are broadly persuasive to me. But, I still am suspicious that there was such a strong disjunction in the depth of Hindu institutions in western vs. eastern Bengal; after all, the kings of Tripura to the east were Hindu when Islam was new in South Asia. If being tribal and marginal to the core Hindu civilization was one of the grounds for susceptibility to Islam it is peculiar that it is precisely many tribal people in modern Bangladesh who are not Muslim. Indeed, the Tibeto-Burman populations nearer to Indian groups in eastern South Asia are Hindu or Buddhist, not Muslim (those further in the hinterlands were not integrated into any South Asian religion, but converted to Christianity by Western missionaries within the last century).

Instead, I find the model espoused in The Rise of Islam and the Bengal Frontier, 1204-1760 broadly plausible as a complement, or even substitute, to the above hypotheses. Additionally, it has the utility of making sense of the genetic data which I have presented here so far. The author argues that eastern Bengal, most of Bangladesh, was very lightly populated before the conquest of Bengal by Muslims in the 13th century. During the modern era the western region of Bengal, in India, has tended to have issues with the moribund nature of many of the water courses. But one thousand years ago this region was more active in terms of sedimentation, while eastern Bengal was a wilderness. Over the centuries there has been a shift of large rivers to the east, opening up that area to cultivation because of improved transport. Additionally, the arrival of Muslims also resulted in the spread of new techniques of land clearing and settlement. The rough model is that eastern Bengal is in fact a relatively newly settled territory in terms of its current demographic density. As the clearance and settlement operations were performed by Muslim elites, many of the peasants who settled these lands were either Muslim, or more likely, adopted the religion of their landlords. Because of the virgin nature of the territory these original settlers entered into a phase of massive demographic expansion, to the point where eastern Bengal (Bangladesh) is today twice as populous as western Bengal (West Bengal). The key here is that there need not be a massive conversion of the enormous masses of marginal animist, Hindu or Buddhist peasants. Rather, all one needs is a modest number of converted Bengali peasants to enter into exponential population growth until the land is “filled.” (Interestingly, one sees similar patterns between descendant populations in both the USA and among Koreans. The religions in the “core” homelands are very different in constitution from those of the Diaspora.)

I find this persuasive for two major reasons. First, Peter Bellwood’s First Farmers documents the difficulties of populations which have not been engaged in intensive farming to switch to that modality. At least back to the Mughal period Bengal was a densely settled land from which one could extract massive rents simply due to aggregate productivity. Today a united Bengal would have a population of 240 million, making it the fourth most populous nation in the world, below the USA, and just above Indonesia. In hindsight I find it less likely that the peasants of eastern Bengal descend from tribal peoples who had been practicing extensive agriculture, but were introduced to new techniques, than that western populations already habituated to the grinding expectations of intensive farming colonized the “empty” lands (in fact, Bengali peasants migrate to Assam in part because of the perception of land surplus there, even though Assam has 30 million inhabitants). But this initial phase of colonization would entail relatively few peasants, and probably exhibit some male bias. Therefore, this can explain a substantial fraction of the eastern ancestry among Bangladeshis, as in the first generations the Bengali peasants did assimilate the native tribal peoples of the region, whether they be the Munda Santhals or Tibeto-Burman relatives of the Tripuri. With the massive numbers of ethnic Bengalis in comparison to Tibeto-Burman groups it seems one would need a great deal of gene flow in any model which posited that exchange between these two groups over long periods of time explains the high fractions that one finds of non-South Asian ancestry. In all of India there are only 10 million speakers of Tibeto-Burman languages, vs. the 240 million speakers of Bengali alone in the Indian subcontinent.

Where does this leave us? From what I gather you’ll probably not make it into the first round of results for HAP, but if you have 23andMe results and haven’t sent them to Zack, and want to learn more about the historical genetics of the Indian subcontinent, you can still get involved! With my parents Zack now has an N = 2 of Bengalis. It would be nice to get more. We still need samples from North-Central India. The number of Punjabis is in the 5-10 range, Tamils is around 5. Enough to make inferences, but certainly not robust enough to bet the house on. In the near future I’ll get results from my other siblings, and I’ve decided to “upgrade” to the V3 chip. Once that comes in I’ll “phase” some of the results, and probably start comparing myself to my siblings, etc.

* Native Americans, descendants of pre-Columbian Americans, have the inverted results from South Asians, mostly Asian with a European minority. This is not just due to recent European admixture. Rather, though Amerindians have affinities to East Asians, the two groups have been distinct for at least 10,000 years, and probably considerably longer.

** Also, some have stated that the people of Noakhali are sly and cunning, adept at following the letter of the law, but not the spirit. I only know this because when I was young one of my father’s friends, also from Bangladesh, complained that a mutual acquaintance from Noakhali who made much of his piety (he put his wife in purdah when she arrived from Bangladesh) requested that someone else purchase a pornographic magazine for him. His reasoning was that he did not want to be seen purchasing the magazine. It was a sin to purchase such an item for a good Muslim. Later my father and his friend (who was from northern Bangladesh for what it’s worth) commiserated that such was the way of the people of Noakhali, amongst whom you have to have your wits about you lest they exploit some angle for their own self-interest. The pious-porn-non-purchaser was notorious for being a non or late payer of rent when he was a lodger with other Bangladeshis, always emphasizing his religious piety as surety of final payment of the debt. He also eventually finagled a loophole in the immigration law of the time, obtaining a green card with relative ease and no necessity of sponsorship. The proper connotation of how people from Noakhali are is probably captured by the American English word slick.


A follow up to the post below, see John Hawks, Selection’s genome-wide effect on population differentiation and p-ter’s Natural selection and recombination. As I said, it’s a dense paper, and I didn’t touch on many issues.


If you are like me, and if you are reading this weblog there is a significant probability you are like me, you read L. L. Cavalli-Sforza’s History and Geography of Human Genes in the 1990s, and in the early aughts Spencer Wells’ The Journey of Man. Science has come very far in the last 10-15 years; even Cavalli-Sforza’s magnum opus pales in comparison to the tsunami of data and analysis which the “post-genomic era” has ushered in. Instead of a gene here and there, or even the mtDNA and Y chromosome, researchers are now looking at hundreds of thousands of genetic variants, SNPs, across genomes. We’re rapidly approaching the era of whole genome sequencing, even if we’re not quite there yet.

But what’s the purpose of advances in technique and computation? Though the long-term project is to understand human variation and genetic function so as to have biomedical utility, in the short-term there is an enormous wealth of more abstract population genetic insight which can be extracted. Because of the biomedical focus of contemporary genomics we take a somewhat anthropocentric view, which is fine by me as I am an unregenerate speciesist. The fish, fowl and crawling things of the earth can come later. And in any case, the beauty of the human focus of modern evolutionary genomics is that there are whole disciplines such as paleoanthropology which can serve as partners in interdisciplinary projects.

Humans are like any other organism, buffeted by conventional evolutionary genetic dynamics (drift, migration, natural selection) as well as processes which are more biophysically rooted such as recombination and mutation. Each of these processes leaves its tell-tale marks on the genome. Mutation replenishes variation which drift and selection often eliminate, the former by chance and the latter in the form of negative selection. Migration serves to homogenize across populations through gene flow, while diversifying within populations by introducing novel variants. Finally, recombination breaks up linear associations of genetic variants along a DNA sequence, and has been used to explain sex.

In regards to H. sapiens it seems that our recent evolutionary history is dominated by a few big events. Within the last 100,000 years we underwent an extremely rapid population expansion from a small founding group within Africa, and radiated adaptively across all continents except for Antarctica. We are then a relatively genetically homogeneous population, with much of the extant variation remaining within Africa, and the non-African groups getting progressively less diverse with distance from that continent. Basically a model whereby our species spread across the world via serial founder events. This simple model suffices in the broad sketch, but there is much more to the story. Over the past few years the older idea that current continental populations are the descendants of the first settlers, that is, the first modern humans who displaced the archaic populations which preceded them, has come to seem unlikely to be totally correct in all cases. It is likely wrong in Europe and to some extent India, no trivial exceptions.

There is much which can be said about details of demographic history in regards to the possibility of mass migrations, but today I want to focus on another dynamic: the effect of natural selection on the human genome. There are some researchers who are very skeptical of the efficacy of selection in shaping the patterns of variation we see, constraining it to a few loci such as that which confers lactase persistence or resistance to malaria. Others feel that selection’s power in shaping the genome is far more pervasive. Finally, there is a middle path, which emphasizes a diverse and complex portfolio.

A new paper explores the extent and nature of selection in human genomes through combining a rather old population genetic statistic with new expanded data sets and powerful statistical techniques, Human Population Differentiation Is Strongly Correlated with Local Recombination Rate:

Allele frequency differences across populations can provide valuable information both for studying population structure and for identifying loci that have been targets of natural selection. Here, we examine the relationship between recombination rate and population differentiation in humans by analyzing two uniformly-ascertained, whole-genome data sets. We find that population differentiation as assessed by inter-continental FST shows negative correlation with recombination rate, with FST reduced by 10% in the tenth of the genome with the highest recombination rate compared with the tenth of the genome with the lowest recombination rate (P ≪ 10^−12). This pattern cannot be explained by the mutagenic properties of recombination and instead must reflect the impact of selection in the last 100,000 years since human continental populations split. The correlation between recombination rate and FST has a qualitatively different relationship for FST between African and non-African populations and for FST between European and East Asian populations, suggesting varying levels or types of selection in different epochs of human history.

You know of FST, even if you don’t know what FST is. You have heard that 15% of the variation in human genes is between races, and 85% within races. That 15% is an FST of 0.15. In other words, FST is a population genetic statistic which partitions the variance in genes between and within populations. If you have two populations and both have allele frequencies of 0.50 for two alleles, A & B, at one locus, then the FST would naturally be 0 as there is no between population difference; you can swap individuals from either group interchangeably for purposes of comparison. In contrast, if the frequencies were disjoint so that all individuals in one population were of allele A and all individuals in the other of B, then naturally the FST would be 1, as all the variance is between populations, and all the information you need is found within population substructure. A perfect opportunity for profiling!
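The two limiting cases above are easy to check numerically. What follows is a minimal sketch using one common textbook definition, FST = (H_T − H_S) / H_T, where H is the expected heterozygosity 2p(1−p). It assumes a single biallelic locus and two populations of equal weight, and the function name is my own. Published estimators differ in detail, for example in how they correct for sample size, so this is not necessarily what the paper computed.

```python
def fst_two_populations(p1, p2):
    """FST at one biallelic locus for two equally weighted populations.

    p1, p2 are the frequencies of allele A in each population.
    FST = (H_T - H_S) / H_T, with H the expected heterozygosity 2p(1-p).
    """
    p_bar = (p1 + p2) / 2                      # pooled allele frequency
    h_total = 2 * p_bar * (1 - p_bar)          # H_T: heterozygosity of the pool
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # H_S: mean within
    if h_total == 0:
        # Same allele fixed everywhere: FST is undefined, report 0 by convention.
        return 0.0
    return (h_total - h_within) / h_total

print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(1.0, 0.0))  # fixed for different alleles
print(fst_two_populations(0.7, 0.3))  # an intermediate case
```

Identical frequencies give 0 and fixed differences give 1, matching the prose. Under this definition, frequencies of 0.7 versus 0.3 come out at about 0.16, which gives a feel for how large a frequency gap an FST in the neighborhood of 0.15 corresponds to.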

But this varies by gene and genomic region. As you know from above most variance is within races, not between them. But for the gene SLC24A5 almost all the variance is between Europeans and Africans, not within them. Similarly, almost all the variance on this gene is between Europeans and East Asians. Finally, there is almost no variance between East Asians and Africans on this gene. Why? Because it looks like this gene has recently increased in frequency in West Eurasia, to the point where a new mutation has replaced the ancestral variant, which is common in Africa and East Eurasia. Additionally, it is notable that this genetic variant seems to account for 30-40% of the skin tone difference between Africans and Europeans. The point is that total genome variation is not always a good indicator of the evolutionary history of a specific gene. This is a problem especially in the case of those which we might find of interest.

In the paper above they find that areas of high recombination are negatively correlated with FST on a global scale. In other words, genomic regions which recombine more often across DNA strands, and so shuffle genetic variation about and break apart linear associations, show lower FST values; that is, reduced between-population variation.

It’s rather clear in their first figure. Before we jump to that, let me note that they’re using the Perlegen data set, which has two dozen African Americans, Chinese and Europeans, respectively, and 1 million SNPs. Each panel has the FST values on the Y axis, and the recombination rate on the X axis.


It’s pretty clear what’s going on just through inspection. There’s an average decrease of 4% in FST for every 1 cM/Mb increase in recombination rate. The correlation estimates between FST and median recombination rate for each panel are:

A −0.962 (P = 8.9×10⁻⁶)
B −0.815 (P = 0.0041)
C −0.931 (P = 0.0001)
D −0.361 (P = 0.306)
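For readers who want to see how a correlation of this kind is computed, here is a rough sketch: sort SNPs by local recombination rate, split them into ten bins, and correlate each bin's median recombination rate with its average FST. Everything in it is an assumption for illustration. The data are simulated (a 4% decline per cM/Mb plus noise), the function name is mine, and taking the mean FST per bin is my simplification rather than the paper's exact procedure.

```python
import numpy as np


def binned_correlation(rec_rate, fst_values, n_bins=10):
    """Correlate per-bin median recombination rate with per-bin mean FST."""
    order = np.argsort(rec_rate)              # SNP indices sorted by recombination rate
    bins = np.array_split(order, n_bins)      # roughly equal-sized bins (deciles by default)
    med_rec = np.array([np.median(rec_rate[b]) for b in bins])
    mean_fst = np.array([np.mean(fst_values[b]) for b in bins])
    r = np.corrcoef(med_rec, mean_fst)[0, 1]  # Pearson correlation across bins
    return med_rec, mean_fst, r


# Toy data only: simulated values, NOT taken from the paper.
rng = np.random.default_rng(0)
rec = rng.gamma(shape=2.0, scale=0.6, size=20000)          # recombination rate in cM/Mb
fst_snp = 0.15 * (1.0 - 0.04 * rec) + rng.normal(0.0, 0.05, size=rec.size)

_, _, r_binned = binned_correlation(rec, fst_snp)
r_snp = np.corrcoef(rec, fst_snp)[0, 1]                    # same data, no binning
print(f"binned r = {r_binned:.3f}, per-SNP r = {r_snp:.3f}")
```

Binning averages away the per-SNP noise, which is why a modest per-SNP trend can show up as a very strong correlation across ten bins.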

The correlation estimates tell you something you can see visually: there’s a big difference in the relationships contingent upon which populations you’re using to calculate FST. In particular, a lot of the linear relationship between FST and recombination rate is actually due to the African vs. non-African difference. This is not a total surprise; Africans have a lot of genetic variation. In terms of genes one can think of non-Africans as simply a branch of Africans in many ways. Additionally, there’s been suggestive data for a decade now that when Africans left Africa they were subject to new selection pressures which seem common to Eurasians as a whole (though to be fair these signatures of selective sweeps in Eurasia may simply be false positives generated by population bottlenecks and the like).

Yes, I will work back to selection from genetic variation; the two are related, though the relation can be subtle. So why the negative correlation between FST and recombination? Consider an SNP, a single DNA base pair, which is subject to positive selection. It can increase rapidly in frequency so that it goes from ~0 to ~1 in the population. Fair enough, but SNPs do not exist in an abstract universe; they’re physically embedded in DNA, and so are flanked by many other bases. If an SNP is subject to powerful directional selection which drives its frequency upward, then adjacent bases also “hitchhike” along in frequency. In other words, powerful directional selection can reorder the variation of whole genomic regions, depending on how powerful the selection is. It can sweep away the noisy scatter of variants introduced over many generations by mutation and replace them with a long sequence of alleles from an ancestral genome which harbored the selected variant. Over time mutation can mask the homogenization as it replenishes variation, but there is also another dynamic which blocks the long march to genetic uniformity: recombination. Recombination can tear apart blocks of alleles as they sweep up in frequency, and the more recombination, the greater the counterforce to the homogenizing power of selection on the local genome, as the block is chopped up ever more finely.
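The tug of war between hitchhiking and recombination can be sketched with a textbook two-locus model. This is my own illustration, not anything from the paper: a deterministic haploid recursion in which a favored allele A arises on a single chromosome that also carries a neutral marker allele B, with assumed values for the selection coefficient (s), the starting frequency of A (p0), and the starting frequency of B on other chromosomes (b0).

```python
def final_marker_frequency(r, s=0.05, p0=1e-4, b0=0.1, generations=2000):
    """Frequency of neutral allele B after a sweep of linked allele A.

    r is the recombination fraction between the two loci. All parameter
    defaults are illustrative assumptions.
    """
    # Haplotype frequencies. A starts only on B-carrying chromosomes.
    x_AB = p0
    x_Ab = 0.0
    x_aB = (1.0 - p0) * b0
    x_ab = (1.0 - p0) * (1.0 - b0)
    for _ in range(generations):
        # Selection: haplotypes carrying A have relative fitness 1 + s.
        w_bar = (x_AB + x_Ab) * (1.0 + s) + (x_aB + x_ab)
        x_AB = x_AB * (1.0 + s) / w_bar
        x_Ab = x_Ab * (1.0 + s) / w_bar
        x_aB = x_aB / w_bar
        x_ab = x_ab / w_bar
        # Recombination erodes linkage disequilibrium D at rate r per generation.
        d = x_AB * x_ab - x_Ab * x_aB
        x_AB -= r * d
        x_Ab += r * d
        x_aB += r * d
        x_ab -= r * d
    return x_AB + x_aB


for r in (0.0, 0.001, 0.01, 0.5):
    print(f"r = {r}: final frequency of B = {final_marker_frequency(r):.3f}")
```

With no recombination the marker rides the sweep all the way to fixation; as r grows, the marker ends up closer and closer to its starting frequency of 0.1, which is the counterforce described above.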

Now consider the nature of selection in different populations. Let us stipulate that the light skin of East Asians and of Europeans are adaptations; we know that they are conferred by different SNPs. In other words, selection operated on different genetic variants to produce the adaptation (though by and large across the same set of genes). Therefore, FST on pigmentation genes is relatively high because of between-population difference, and these genes tend to be surrounded by regions of homogenization, as they seem to have swept up to high frequency rapidly and dragged along many nearby alleles. Holding the strength of natural selection equal, recombination would naturally tend to work against increased FST: by breaking apart the genomic blocks during their transit upward in frequency, it reduces the number of linked alleles being dragged along.

A similar effect occurs in the case of background selection, which is operative around deleterious mutations. When there are alleles which are negatively selected their neighbors are also affected; consider it a sort of population genetic “property value.” Because negative selection tends to reduce effective population size, on a large geographical scale it can result in increased FST (consider the genetic uniqueness of isolated populations which have gone through bottlenecks). Again, recombination can blunt the impact of a deleterious allele on its neighbors. The authors do note that the particular dynamics of positive and background selection differ (the latter is a gentler affair by far, a repetitive tap as opposed to a sledgehammer), but the genomic resolution of their analysis is such that they lack power to explore these differences.

I’ll let the authors describe the peculiarities of the European-Chinese panel:

…The weaker correlation for the FST between European and Chinese populations is driven by a dip in differentiation at very low recombination rate loci…which is not at all what is seen in the comparison of African and non-African populations…This curve shows a qualitatively non-monotonic pattern, which motivated us to perform a quadratic regression fitted within the bootstrapping framework. The regression is concave and includes very significant linear (P = 3.0×10⁻⁴) as well as quadratic (P = 1.8×10⁻⁵) terms. Conversely, quadratic regression gives a non-significant quadratic term for FST between African Americans and each of the other two populations and if anything is slightly convex. As expected, for single SNP analysis (without binning by recombination rate), linear regression is very significant for FST between African Americans and either non-African population (P≪10⁻¹²). For FST between Chinese and Europeans, however, linear regression is not significant (P = 0.81), while a quadratic regression is very significant (P≪10⁻¹²)….

Non-monotonic is just a way to say that the trend reverses direction. In other words, the linear model isn’t too good a fit for what’s going on with the variation between Europeans and Chinese, and how it relates to recombination rates. They offer two speculative possibilities for the “inverted U-shaped” nature of the relationship between FST and recombination in Europeans and Chinese. First, the smaller effective population sizes of non-African groups result in greater efficacy of background selection. As random genetic drift tends to increase the frequency of deleterious alleles, powerful negative selection is given the opportunity to work against that region of the genome. This results in more background selection as adjacent genomic regions are impacted. Because of differing population sizes the balance between positive and background selection is different for Africans and non-Africans. A second hypothesis is that gene flow between the two Eurasian groups allowed for selective sweeps to move from one group to the other. In other words, between-population variance can be reduced if a favored allele spreads across all populations from one original group (lactase persistence in much of northern Eurasia may be a case of just this).
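Here is a small sketch of why a straight line can miss an inverted U entirely while a quadratic fit picks it up. The data are made up for illustration (a symmetric hump plus noise, not the paper's values), and the fit is an ordinary least-squares polynomial via numpy rather than the bootstrapped regression the authors describe.

```python
import numpy as np

# Toy inverted-U relationship: simulated values, NOT taken from the paper.
rng = np.random.default_rng(1)
rec = np.linspace(0.0, 2.0, 50)                     # recombination rate in cM/Mb
fst_vals = 0.11 - 0.01 * (rec - 1.0) ** 2 + rng.normal(0.0, 0.001, rec.size)

# A linear summary sees almost nothing, because the rise and the fall cancel out.
r_linear = np.corrcoef(rec, fst_vals)[0, 1]

# np.polyfit returns coefficients from the highest degree down: a*x^2 + b*x + c.
a, b, c = np.polyfit(rec, fst_vals, 2)

print(f"linear correlation = {r_linear:.3f}")
print(f"quadratic coefficient a = {a:.4f}")
```

A negative quadratic coefficient means the fitted curve is concave, which is the inverted U; a positive one would be the convex shape the authors mention for the other comparisons.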

Intriguingly they found the linear relationship between FST and recombination to be stronger in genomic regions which are coding (i.e., they have genes which code for proteins), and in particular in the X chromosome. The second aligns with other recent work which indicates that the X chromosome may be subject to stronger selective pressures than the rest of the genome because of its peculiar expression pattern (males have only one copy of the gene, and females express only one copy per cell due to X chromosome inactivation).

Finally, they replicated their results using other data sets. I’ll just show the figure from HapMap3:

I obviously rotated so you could see the labels at higher resolution. Here are the populations:

WAF (“West African”) is a combined sample of YRI (Yoruba in Ibadan, Nigeria) and LWK (Luhya in Webuye, Kenya)

EAS (“East Asia”) is a combined sample of CHB (Han Chinese in Beijing, China), CHD (Chinese in Metropolitan Denver, CO, USA), and JPT (Japanese in Tokyo, Japan)

EUR (“Europe”) is a combined sample of CEU (ancestry from Northern and Western Europe) and TSI (Toscani in Italia)

GIH is a sample of Gujarati Indians in Houston, TX, USA

MKK is a sample of Maasai in Kinyawa, Kenya; and CHI (Chinese) is a combined sample of CHB and CHD.

Their analysis of these findings is cautious:

A striking result is that the relationship between FST and recombination rate is stronger for FST between pairs of closely-related populations, whether within or outside Africa: FST between a West African sample and Maasai (of mixed West African and East African ancestry…decreases by an average of 6% for every 1 cM/Mb…FST between Italians and individuals of North-Western European ancestry decreases by 10% for every cM/Mb …and FST between Japanese and individuals of Chinese ancestry decreases by 4%...In view of the large effective population size in recent human history since each of these pairs of populations have split, these observations support the possibility that the different patterns observed between different pairs of populations are due to natural selection operating more efficiently in the context of larger population sizes. We observed a weak convex relationship with recombination rate for FST between closely-related populations in a quadratic regression analysis …which is intriguingly opposite to what was observed between Europeans and Asians…On the other hand, these observations do not seem to support the possibility that the different patterns are due to selective sweeps being shared to different extent across different pairs of populations since the level of gene flow between HapMap 3 closely-related populations likely have had been higher than that between continents. These results, while interesting, should be viewed with caution due to the confounder of ascertainment bias. It will be possible to test these observations further by analyzing data from the 1000 Genomes Project, where whole-genome sequencing will generate data that is largely free of ascertainment bias for many of the HapMap 3 populations as well as additional populations

The reason that selection would be more powerful at large population sizes is that the noise of random genetic drift is less likely to interfere with its deterministic process. Additionally, one presumes there would be more extant genetic variation in large populations than small ones. But though these results are interesting, the authors don’t seem to put too much stock in them.
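One way to put numbers on the drift-versus-selection point is Kimura's diffusion approximation for the fixation probability of a new mutation. This is standard population genetics rather than anything computed in the paper, and the sketch assumes a single new copy in a diploid population of size N, an effective size equal to N, and additive selection with coefficient s; the function name and the choice of s = 0.001 are mine.

```python
import math


def fixation_probability(n, s):
    """Kimura's approximation for one new mutant copy (initial frequency 1/(2N)).

    Assumes effective population size equals census size n and additive selection.
    """
    if s == 0.0:
        return 1.0 / (2.0 * n)  # neutral case: pure drift
    # (1 - exp(-2s)) / (1 - exp(-4Ns)), written with expm1 for numerical accuracy.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n * s)


s = 0.001  # a weakly favored allele (illustrative value)
for n in (100, 1_000, 10_000, 100_000):
    advantage = fixation_probability(n, s) / fixation_probability(n, 0.0)
    print(f"N = {n}: fixation odds relative to a neutral allele = {advantage:.1f}x")
```

In the smallest population the favored allele fixes barely more often than a neutral one, because drift swamps a 0.1% advantage; in the largest, the same allele is hundreds of times more likely to fix than a neutral mutation.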

Let me finish with the authors’ conclusion:

In conclusion, we have shown that genome-wide human population differentiation in allele frequencies is significantly correlated with recombination rate on a megabase scale, demonstrating that natural selection has had a profound effect on allele frequency distributions averaged over the last hundred thousand years. While these results likely reflect the effects of hitchhiking and background selection, disentangling the strengths of these two forces will require extending the analyses presented in this paper. One important direction is to use genetic maps that have fine spatial resolution, which may shed light on the detailed distribution of selective coefficients that have shaped allele frequency differentiation. A second direction in which these results can be extended is to compare more populations of continentally diverse ancestry. This should facilitate an exploration of the relationship between recombination rate and population differentiation during different epochs of human evolution, and should allow a better understanding of how demographic history has shaped the impact of natural selection on patterns of human genetic variation.

Note: I left a lot out in this treatment. It’s Open Access so you can read the whole thing!

Citation: Keinan A, Reich D, 2010 Human Population Differentiation Is Strongly Correlated with Local Recombination Rate. PLoS Genet 6(3): e1000886. doi:10.1371/journal.pgen.1000886

Razib Khan