The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Indian genomics


The Pith: Afro-Indians are mostly African, with a substantial Indian minority ancestry. The latter is disproportionately female mediated. It also seems that this Indian ancestry is more northwest Indian, and that natural selection has been operating upon them outside of the African environment.

Along the western coast of South Asia, from Makran in southwest Pakistan down to the Konkan coast of southwest India, there are isolated communities of Afro-Indians. They are called Siddis or Habshi. Their African origin is clear in their physical appearance, as well as in aspects of their folk customs which tie them back to Sub-Saharan Africa. Nevertheless, they have assimilated many Indian cultural traits. They generally speak the local language, and practice Islam, Hinduism, or Roman Catholic Christianity (in that order of proportion).

How and why did the Siddis arrive in India? The earliest date for their arrival almost certainly must be bounded by the period when Indo-Islamic polities rose to prominence in the early second millennium. The cosmopolitan melange of the armies of the Muslim warlords included diverse groups of Africans, some of whom took power and established their own self-conscious Afro-Indian dynasties, set apart from the Turkish, Afghan, Persian, and Arab inflected statelets. Were these the sources of the modern Siddi communities? The oral history of the Siddi of the western coast of South Asia suggests not. In fact the geographical concentration of these Afro-Indian tribes along the Arabian Sea fringe is indicative of different historical actors: the Portuguese. In much of Asia, out to China, the role of Africans was very different from that in the New World. They were purchased for elite consumption, not production. They served at court, guarded the harem, etc. Lowland Asia had no need for imported labor, as there was human stock aplenty. Whereas in much of the New World black African slaves were critical cogs in the capitalist system of production, in Asia, as in the Arab world outside of a few areas such as southern Iraq, they were signals of luxurious consumption by the high and mighty (this was in vogue at European courts for a period as well).

Two new papers published yesterday in the American Journal of Human Genetics examine the genetics of the Siddi of India with an eye toward elucidating the details of their historical ethnogenesis. Though the papers overlap to a great extent, there are subtle differences which make them complementary. Shah et al. use a far thicker set of markers, while Narang et al. look at many more populations, but because they removed SNPs which don’t span all their populations, their marker set is much thinner. Let’s review the papers in turn.

Indian Siddis: African Descendants with Indian Admixture:

The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300–500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.

The major value-add of this paper is an estimate of the time of admixture with Indians. I’ll get to that, but let’s look at the phylogenetic relationships really quickly:

The PCA and admixture estimate are perfectly consistent. The Siddis are more African than not, but they are clearly admixed with the Indian populations. To obtain a more fine-grained understanding the authors also looked at uniparental lineages. Note the striking discordance between the maternal mtDNA and paternal Y ancestral estimates. And more curiously, note how much closer the autosomal estimate, a proxy for total ancestry, is to the paternal lineage quanta. I think there’s a rather good explanation for what’s going on: the transport of slaves from Africa was strongly male-biased. These African-born males assimilated into the native Afro-Indian community, which had a strong local Indian component in the early years via women who had married in. But once a significant Siddi community had developed it assimilated new arrivals, who were male, and who beefed up the African quanta of autosomal and Y chromosomal ancestry, but not the mtDNA. As in Argentina, the matriline of the Siddis is a shadow of the initial generations, when the boundaries between the Afro-Indians and locals were more permeable.
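The explanation above can be made concrete with a toy model. This is only an illustrative sketch, not anything from Shah et al.: the function name, the founder fractions, and the migration rate are all hypothetical. It assumes that in every generation a fixed share of fathers are newly arrived African men, while all mothers are locally born.

```python
def simulate_sex_biased_admixture(generations, male_migrant_rate,
                                  founder_y_african=0.5,
                                  founder_mt_african=0.2):
    """Track African ancestry in a toy community with male-only migration.

    All parameter values are hypothetical. Each generation a fraction
    `male_migrant_rate` of fathers are new, fully African arrivals;
    mothers always come from the existing community.
    """
    y = founder_y_african            # African share of Y lineages
    mt = founder_mt_african          # African share of mtDNA lineages
    autosomal = 0.5 * (y + mt)       # crude starting value for total ancestry
    history = [{"gen": 0, "autosomal": autosomal, "y": y, "mt": mt}]
    for gen in range(1, generations + 1):
        # Fathers: resident men mixed with new African arrivals.
        fathers_autosomal = (1 - male_migrant_rate) * autosomal + male_migrant_rate
        y = (1 - male_migrant_rate) * y + male_migrant_rate
        # Children get half their autosomes from each parent.
        autosomal = 0.5 * (fathers_autosomal + autosomal)
        # mtDNA comes only from mothers, and no migrants are women,
        # so the mtDNA share never changes.
        history.append({"gen": gen, "autosomal": autosomal, "y": y, "mt": mt})
    return history


history = simulate_sex_biased_admixture(generations=8, male_migrant_rate=0.1)
final = history[-1]
```

Under these assumptions the Y-chromosomal and autosomal African fractions both climb over time while the mtDNA fraction stays frozen at its founding value, which is the qualitative pattern described above.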

And that initial generation is likely to have been somewhat recent, as the authors estimate that the average date of admixture was ~8 generations before the present, with a standard error of 1 generation. This comes rather close to falsifying the proposition that the Siddis derive in the main from the first generations of Indo-Islamic arrivals. Rather, the Siddis seem more likely to date to the Indian Ocean trade in human beings which post-dates the arrival of the Portuguese, as suggested in their oral history. It is important to remember that Omani Arabs and others were also involved in this trade, but between the 16th and 18th centuries the Portuguese were uniquely placed to transport Africans from their East African strong-points to the fortifications on the west coast of India.

The manner in which they estimated this admixture event is rather straightforward. Geographically distinct populations have their own unique genetic variants. If you take two individuals from very distinct populations, each passes to their offspring a single strand out of the two they carry (granting recombination’s confounding of the two parental strands). That means that the offspring are going to have two homologous chromosomes which are reflective of very different ancestral histories. To give a concrete example, if someone had an Indian parent and an African parent, then each of their DNA strands would have a sequence of genetic variants strongly associated with the ancestry of the parent from which that strand was passed. That is why first generation mixed-race individuals have very high rates of heterozygosity and few runs of homozygosity; their paired strands are very unlikely to have recent common ancestry.

This also implies that in a first generation population of mixed-race individuals you’ll see a whole lot of linkage disequilibrium (LD). This means that markers x, y, and z, associated with population 1, are likely to be found on the same DNA strands, while markers a, b, and c, associated with population 2, are going to be found on other DNA strands. Therefore, you’ll get long haplotypes, sets of distinctive markers across genes, indicative of the distinct demographic histories of the two parent populations.
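A small sketch of why mixing alone creates LD, under the simplifying assumption that the two source populations each have no LD of their own between the two loci. The allele frequencies and the mixing fraction below are made-up illustrative values. The textbook result is that the mixed pool has D = m(1 − m)(pA1 − pA2)(pB1 − pB2), and the code checks that against a direct haplotype count.

```python
def admixture_ld(m, p_a1, p_b1, p_a2, p_b2):
    """LD coefficient D between loci A and B in a freshly mixed pool.

    m is the fraction of strands from population 1; p_a1 and p_b1 are
    allele frequencies in population 1, p_a2 and p_b2 in population 2.
    Assumes no LD inside either source population.
    """
    # Each strand comes whole from one source, so the frequency of strands
    # carrying both alleles is a weighted mix of the within-source products.
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    # Allele frequencies in the mixed pool.
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b


def admixture_ld_closed_form(m, p_a1, p_b1, p_a2, p_b2):
    """Closed-form expression for the same quantity."""
    return m * (1 - m) * (p_a1 - p_a2) * (p_b1 - p_b2)


# Hypothetical values: 70% of strands from population 1, strongly
# differentiated markers at both loci.
d_direct = admixture_ld(0.7, 0.9, 0.8, 0.1, 0.2)
d_closed = admixture_ld_closed_form(0.7, 0.9, 0.8, 0.1, 0.2)
```

The more the two parent populations differ in frequency at both markers, the larger D is; markers with identical frequencies in both sources contribute no admixture LD at all.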

But I stipulated the first generation, because over time LD will decay due to genetic recombination. The schematic to the left illustrates what’s going on. Recall that during meiosis the parental chromosomes segregate and assort, and haploid gametes are formed which transmit the single strands to the offspring. But this process is not always without incident. In particular, the parents’ distinctive strands can break and recombine to form a new haplotype on the strand level. For example, say your mother has one strand which is maternal and another that is paternal. Through recombination she may transmit to her offspring a strand which is 2/3 maternal and 1/3 paternal in reference to her own parents. Therefore, in the first generation the hybrids have a perfect association between ancestry across single strands, but recombination will break apart these associations. First generation Afro-Indians might transmit a strand which is 25% African and 75% Indian to their offspring. Over the generations this mixing and matching will break apart the associations generated through admixture. If one assumes that the rate of recombination is constant, then the extent of linkage disequilibrium and the length of haplotype blocks can give us a sense of time since admixture. This method is relatively powerful if the admixture was recent, as over the generations the extent of LD will asymptotically approach the baseline one might expect without an admixture event. In other words, there is precision toward events near in time, but relatively little toward ancient ones.
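The logic of dating from LD decay can be sketched in a few lines. This is a simplified illustration, not the authors’ actual pipeline: it assumes a single pulse of admixture and the standard approximation that admixture LD between two markers separated by a genetic distance d (in Morgans) shrinks by roughly exp(−n·d) after n generations. The function names and the synthetic data are hypothetical. The 25-year generation time is also an assumption, though it matches the abstract’s equation of eight generations with ∼200 years.

```python
import math


def ld_after(generations, distance_morgans, initial_ld=1.0):
    """Expected admixture LD after some generations of recombination."""
    return initial_ld * math.exp(-generations * distance_morgans)


def estimate_generations(distances_morgans, ld_values):
    """Fit ln(LD) against distance by least squares; the slope is -n."""
    logs = [math.log(v) for v in ld_values]
    count = len(distances_morgans)
    mean_d = sum(distances_morgans) / count
    mean_l = sum(logs) / count
    cov = sum((d - mean_d) * (l - mean_l)
              for d, l in zip(distances_morgans, logs))
    var = sum((d - mean_d) ** 2 for d in distances_morgans)
    return -cov / var


# Synthetic, noise-free decay curve for an 8-generation-old admixture event,
# sampled at 1 to 20 centimorgans.
distances = [0.01 * k for k in range(1, 21)]
observed = [ld_after(8, d) for d in distances]

generations = estimate_generations(distances, observed)
years = generations * 25  # assumed generation time of 25 years
```

The same formula shows why the method loses precision for old events: at 8 generations markers 10 centimorgans apart retain about 45% of their initial LD, while at 100 generations they retain essentially none, so the curve carries little information.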

As noted in the paper, the Uyghur population exhibits a signature of an admixture event ~2,000 years before the present, while the African American population exhibits admixture on the order of hundreds of years. One of the authors of Shah et al. is David Reich, who was coauthor on a paper which famously (to readers of this weblog!) posited that South Asians are an ancient admixture between “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI). This event is too ancient for LD methods to peg a date, at least the ones they use here. The Siddi resemble New World African populations in the date of their admixture event, but their sex bias is very different. In the New World the maternal lineages are overwhelmingly African, while the paternal lineages are more European (though some African groups have Amerindian paternal lineages). I think this tells us something about the peculiarities of the Siddi community in India. Interestingly, I think that they may resemble Ashkenazi Jews and Roma in this tendency, with the paternal lineage being more associated with their cultural and physically salient characteristics, and with exogenous admixture occurring through the female lineages.

Finally, in the analysis of the uniparental lineages they show that there seems to be a clear association between the Bantu people of Africa and the Siddi, and that the admixture events were unidirectional insofar as the nearby Indian groups don’t have African admixture. These samples were from Gujarat and Karnataka, and because the Siddi tend to be Muslim while their neighbors are likely to be Hindu, I think we should be careful not to generalize too much. An analysis of the HGDP shows non-trivial African admixture among some South Asian groups to the north and west. I would assume that this is a touch older, and dates back to West Asian groups which were somewhat admixed, but it makes sense that Pakistani Muslims are more likely to be able to assimilate another Muslim population, exotic though it may be. One of the Pakistanis I analyzed privately exhibited a clear African ancestral signal which they were not able to explain, so it may be a part of the genetic background of many South Asian Muslims, though not Hindus.

So what about the second paper? Narang et al. has a wider variation in populations in an intra-Indian sense, but a smaller number of markers. While Shah et al. used ~800,000 markers, the combined set of Narang et al. is ~20,000, and they pared it down in some cases to ~3,000 ancestrally informative markers. ~20,000 is sufficient for PCA from what I’ve seen, but for intra-continental differences it is on the bubble for analysis of admixture between putative ancestral populations (i.e., the bar plots produced by Structure, Admixture, frappe, etc.). Additionally, while Shah et al. used Siddi samples from Karnataka and Gujarat, Narang et al. focused on Gujarati Siddis only. The biggest result seems to confirm something hinted at in Shah et al.: the Indian admixture into the Siddis exhibits a regional bias. Shah et al. concluded that using an ASI-skewed Indian sample was less effective than using an ANI-skewed sample. Narang et al. confirms this, showing that the Gujarati Siddis exhibit an admixture cline more toward northwest Indian groups than not. Some of this may be European or Middle Eastern admixture, but I suspect that the best explanation is that as a predominantly Muslim population these Siddis had interactions disproportionately with individuals of Indo-Islamic background. In particular, a disproportionate number of these would have been transplants from northern and northwest India (today Pakistan) who relocated to central and southern India with the collapse of the original Delhi Sultanate. These would be the elites purchasing the Siddis in the first place more often than not (though some Hindu potentates also purchased or received gifts of black slaves, their international connections were more tenuous, and their polities were often more land than sea-based).

Because of the thinner marker set the authors couldn’t say much more about the admixture event except that it was recent. But there was this interesting bit about functionally relevant genes:

We also wanted to see whether there were some biological processes that were selectively enriched in the admixed populations from either of the ancestors. Considering the SNPs that have an FST value ≥0.1 between the two ancestral populations, we selected 3396 of the 18,534 SNPs for functional analysis. Of these, 1218 SNPs were filtered out because their frequencies in the OG population were within 5% of the expected frequency, which is the ancestry proportionate weighted average of the allele frequencies of the two ancestral populations. The remaining SNPs were classified into two groups of 1240 and 938 SNPs on the basis of their closeness, in terms of allele frequency, to the Indian and African ancestral populations, respectively. Analysis of gene classes in these groups revealed significant enrichment of cadherins, potassium channels, membrane proteins, and solute carriers as well as protein kinases from the group close to IE and kinases and immune-related genes from the group close to African ancestry. Further functional annotation clustering (FAC) revealed significant enrichment of processes related to axonogenesis and potassium transport in genes from the group for which the frequency of SNPs is close to that of the Indian ancestral population (Table 5). However, FAC did not reveal any specific enrichment of the processes contributed by the other group.
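The filtering step in the quoted passage can be sketched as follows. This is a guess at the logic rather than the authors’ code: the function name and example frequencies are hypothetical, the ancestry proportion of 0.6 is only a placeholder, and “within 5%” is read here as an absolute difference of 0.05 in allele frequency. The quoted counts are at least internally consistent, since 3396 − 1218 = 2178 = 1240 + 938.

```python
def classify_snp(freq_admixed, freq_indian, freq_african,
                 african_ancestry, tolerance=0.05):
    """Compare an observed allele frequency with the ancestry-weighted expectation.

    `african_ancestry` is the genome-wide African fraction of the admixed
    group (a placeholder value is used below). `tolerance` is the assumed
    absolute reading of "within 5%".
    """
    expected = (african_ancestry * freq_african
                + (1 - african_ancestry) * freq_indian)
    if abs(freq_admixed - expected) <= tolerance:
        return "as_expected"
    # Otherwise assign the SNP to whichever ancestral frequency is nearer.
    if abs(freq_admixed - freq_indian) < abs(freq_admixed - freq_african):
        return "closer_to_indian"
    return "closer_to_african"


# Hypothetical SNP with frequency 0.2 in the Indian source and 0.7 in the
# African source; the expectation at 60% African ancestry is 0.5.
labels = [
    classify_snp(0.50, 0.2, 0.7, african_ancestry=0.6),
    classify_snp(0.25, 0.2, 0.7, african_ancestry=0.6),
    classify_snp(0.68, 0.2, 0.7, african_ancestry=0.6),
]
```

SNPs in the first class behave as ancestry alone predicts; the other two classes are the ones the paper carried forward to the enrichment analysis.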

In other words, there’s a deviation from what you’d expect just from ancestry alone. Why? I suspect there was some sort of release of functional constraint due to the high pathogen load common in Africa in relation to South Asia (yes, South Asia has a low pathogen load compared to Africa!). It isn’t as if the climate is that different. Here are the categories of genes which seem to be overrepresented in the Siddi population in relation to the ancestral Indian component (in other words, the proportion of “Indian” ancestry is higher at these locations than expected):

Here’s the elaboration in the discussion:

…. However, we wanted to examine whether the OG have retained any enriched biological processes from either of the ancestors. Our search for functional enrichments was directed at the AIMs that were associated with genes and whose frequency in OG was close to either of the ancestral populations. We observed a significant enrichment of processes related to ion-channel activity and cadherin genes; the genotypic spectrum in these enriched processes was close to that of the IE ancestors (Figure 7). Selection in ion-channel genes among populations of African ancestry has been a long-term global enigma. However, the fact that the population resides in an extremely saline region of the country and has shown deviations in these genes was intriguing and made it compelling to speculate that this finding is biologically relevant. This is especially interesting in the light of the fact that a recent GWAS study of hypertension and blood pressure in African Americans implicated a similar family of genes related to ion channels, cadherins, and calmodulins.

IE here means “Indo-European.” Since the samples are from Gujarat, an Indo-European speaking region, one would expect this affinity, though recall that the Siddis are biased toward a more northern affinity than that. In any case, the implications of constraint and selection on these loci have long been discussed, and the Afro-Indian case serves as an interesting replication of the larger pattern.

Summary points:

1) The Siddis are relatively recent in time in their origin. Post-1500, and possibly early British.

2) Admixture with South Asians was more “female mediated.” That is, Indian ancestry tends toward a maternal origin, though not exclusively so.

3) The ancestry also seems somewhat biased toward north and western South Asian sources. Shah et al. had a Karnataka sample, which is in a Dravidian speaking region (albeit, with Indo-Aryan minority populations), and they still found that in that group a North Indian ancestral population was a better fit than a South Indian one. The main caveat is that this may be due to exogenous West Asian or European ancestry against a South Indian background.

4) There seems to be some evidence of changes in the selective constraints and pressures, which have had a genome-wide impact even in ~10 or so generations.

On a final note: if the numbers quoted here are correct then I believe that the majority of the African ancestral element within the boundaries of South Asia is distributed amongst South Asian Muslims. A generous estimate of the number of culturally identified Siddis seems to be ~250,000. If 0.25% of the genome of Pakistanis is African, which I think is plausible, then that would be ~400,000 Siddis! I suspect that Indian Muslims, even some Bangladeshis with Middle Eastern ancestry (such as my mother), also have a non-trivial African ancestral element due to the cosmopolitanism of the Dar-ul-Islam, and the ubiquity of black slaves as consumption signals and military shock troops amongst Islamic elites. As for how much is found in the Hindu population, that will be a good gauge I think not of the intermarriage of Africans with Hindus, but the assimilation of liminal Muslim groups, in particular sects considered heterodox by India’s Sunni rulers, into the Hindu caste system.
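The back-of-the-envelope figure above is easy to reproduce. The population number is my assumption (a rough figure for Pakistan around 2011), not something stated in the post, and the result counts fully African genome equivalents rather than people.

```python
# Assumption: Pakistan's population around 2011, rounded; not from the post.
pakistan_population = 170_000_000

# The 0.25% African ancestry suggested in the text.
african_fraction = 0.0025

# Number of fully African genomes that would carry the same total ancestry.
african_genome_equivalents = pakistan_population * african_fraction

# The generous estimate of culturally identified Siddis given in the text.
identified_siddis = 250_000
```

With these inputs the diffuse African ancestry among Pakistanis comes out somewhat larger than the estimated number of self-identified Siddis, which is the comparison the paragraph is making.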

Citation: Anish M. Shah, Rakesh Tamang, Priya Moorjani, Deepa Selvi Rani, Periyasamy Govindaraj, Gururaj Kulkarni, Tanmoy Bhattacharya, Mohammed S. Mustak, L.V.K.S. Bhaskar, Alla G. Reddy, Dharmendra Gadhvi, Pramod B. Gai, Gyaneshwer Chaubey, Nick Patterson, David Reich, Chris Tyler-Smith, Lalji Singh, & Kumarasamy Thangaraj (2011). Indian Siddis: African Descendants with Indian Admixture American Journal of Human Genetics : 10.1016/j.ajhg.2011.05.030

Citation: Ankita Narang, Pankaj Jha, Vimal Rawat, Arijit Mukhopadhayay, Debasis Dash, Indian Genome Variation Consortium, Analabha Basu, & Mitali Mukerji (2011). Recent Admixture in an Indian Population of African Ancestry American Journal of Human Genetics : 10.1016/j.ajhg.2011.06.004

Addendum: Am I the only one a touch weirded out by the face of the black person in the first figure? It isn’t as if illiterates are going to be reading the paper! Kind of funny though.


Two years ago Reconstructing Indian Genetic History reframed how we should view South Asian historical genomics. In short, Indians can be viewed as a hybrid between a West Eurasian group, “Ancestral North Indians” (ANI) and a very different group, “Ancestral South Indians” (ASI), which had distant connections to West and East Eurasians. At least to a first approximation. Last fall I posted on a new paper which surveyed the Austro-Asiatic speaking peoples of India, and concluded that they were exogenous to the subcontinent. This is an interesting point. Prehistoric treatments of South Asia often use linguistic terms to denote putative ancient populations. One model is that first it was the Munda, the most ancient Austro-Asiatics. Then the Dravidians. And finally the Indo-Aryans. These genetic data imply that the Munda arrived after the initial ANI-ASI synthesis. The Munda people of India can be thought of as ANI-ASI, with an overlay of East Eurasian ancestry.

Zack Ajmal’s K = 11 ADMIXTURE run has highlighted some further issues. He has a set of Austro-Asiatic samples, as well as a host of Indo-Aryan and Dravidian speaking populations. I now believe we can further clarify and refine our model of the peopling of India. Here it is:

1) ASI, circa ~10,000 years BP

2) ANI enters the subcontinent from the northwest, synthesis with ASI

3) The ancestors of the Munda enter from the northeast, synthesis with ANI + ASI in their region

4) A subsequent group of West Eurasians, related to the ANI, so I will term them AN2, enters from the northwest and overlays the ANI + ASI synthesis. In the northeast quadrant of the subcontinent this group marginalizes the Munda people, who are either assimilated or escape to more remote locations. I believe that AN2 is likely the Indo-Europeans, but it may be Dravidians as well

5) A second group of Austro-Asiatic peoples enters from the northeast, and synthesizes with the AN2 + ANI + ASI. In some regions they are absorbed (Assam), but in other regions they are culturally dominant (Meghalaya)

Below are two plots which illustrate where I’m coming from. The “S Asian” component from K = 11 above seems to overlap, but is not identical to, ANI. The “Onge” component plays a similar role with ASI. The “SW Asian” and “European” elements are pretty straightforward. They’re very closely related to the “S Asian” one, but they do separate from it. Their relationship to distant non-Indian groups as well as a gradient toward the northwest suggests to me a more recent arrival of this element.

Two patterns. For the Indo-European and Dravidian South Asian groups you see a vertical distribution which corresponds to populations which are a combination of ANI/ASI. But notice the perpendicular distribution of the Austro-Asiatic groups. The East Eurasian element to their ancestry means that they are not fully modeled by the two-way admixture. I believe that the “Onge” fraction, which tracks ASI, is overestimating ASI in the Austro-Asiatic groups, because this proportion just seems way too high in many Southeast Asian and Dai groups to be plausible to me as a perfect proxy for ASI in them. But in any case, note that the Austro-Asiatic groups seem to be mostly a mix of ANI/ASI like other South Asians. There is clearly one outlier population. I’ll get to them.

Below is a plot which shows the ratio of the sum of AN2 over the stabilized hybrid proportion.

We know from Reconstructing Indian Genetic History that South Indian tribals and Dalits have a fair amount of West Eurasian ANI. But, from the genome bloggers, and especially Zack’s further analyses, we can see that there is a further component of West Eurasian ancestry which is probably not ANI, but post-dates it. These components have affinities to Southwest Asia or Central Eurasia. They’re labeled “SW Asian” and “European” in Zack’s K = 11. Here’s the big thing you notice: this element increases southeast-northwest, and low caste to high caste. It’s almost absent among many Dravidian populations. It is very common in the northwest of the subcontinent.

Again, except for that one outlier, the Austro-Asiatic groups almost totally lack AN2, just like some Dravidian tribals. On the other hand, even the most AN2 groups in South Asia clearly have some ASI and ANI. But having ASI and ANI does not guarantee AN2. The East Eurasian component found in the Austro-Asiatics seems constrained to the northeast of the subcontinent by and large. Finally, we have the outlier Austro-Asiatic group.

These are the Khasi. They are not Munda, and seem to have closer relationships to other East Eurasian populations. They also have a small, but noticeable, AN2 component. What’s going on? I believe that the Khasi arrived in northeast India after those who brought AN2 had already marginalized the Munda. Some of the Khasi were probably assimilated into the post-Munda (Indo-European or Dravidian speaking) peasantry. But some of the Khasi maintained their identity in the highlands, where they also intermarried with the post-Munda population, which had AN2. In contrast the Munda who retained their cultural identity had withdrawn and disengaged.

Here’s a table for your perusal (remember that ASI is inferred):

| Group | Language | Status | S Asian | Onge | E Asian | SW Asian | Euro | Siberian | ASI |
|---|---|---|---|---|---|---|---|---|---|
| Paniya | Dravidian | Tribe | 47% | 45% | 4% | 0% | 0% | 1% | 67% |
| Santhal | Austro-Asiatic | Tribe | 40% | 45% | 13% | 0% | 0% | 0% | 67% |
| Bonda | Austro-Asiatic | Tribe | 27% | 44% | 27% | 0% | 0% | 0% | 66% |
| Ho | Austro-Asiatic | Tribe | 34% | 44% | 20% | 0% | 0% | 0% | 66% |
| Kharia | Austro-Asiatic | Tribe | 33% | 44% | 21% | 0% | 0% | 0% | 65% |
| Savara | Austro-Asiatic | Tribe | 33% | 44% | 21% | 0% | 0% | 0% | 65% |
| Mawasi | Austro-Asiatic | Tribe | 38% | 44% | 16% | 0% | 0% | 1% | 65% |
| Juang | Austro-Asiatic | Tribe | 26% | 43% | 28% | 0% | 0% | 0% | 65% |
| Asur | Austro-Asiatic | Tribe | 42% | 42% | 14% | 0% | 0% | 0% | 64% |
| Gadaba | Austro-Asiatic | Tribe | 29% | 42% | 24% | 0% | 0% | 0% | 63% |
| Mala | Dravidian | Dalit | 58% | 40% | 1% | 0% | 0% | 0% | 60% |
| Kurumba | Dravidian | Tribe | 54% | 39% | 2% | 2% | 1% | 0% | 60% |
| Sahariya | Indo-European | Dalit | 44% | 39% | 12% | 0% | 2% | 1% | 59% |
| Chenchu | Dravidian | Tribe | 53% | 39% | 3% | 0% | 2% | 1% | 59% |
| Madiga | Dravidian | Dalit | 57% | 38% | 0% | 0% | 1% | 1% | 58% |
| Bhil | Indo-European | Tribe | 56% | 37% | 0% | 1% | 3% | 1% | 57% |
| North Kannadi | Dravidian | | 57% | 37% | 1% | 1% | 2% | 0% | 56% |
| Satnami | Indo-European | L Caste | 49% | 36% | 8% | 1% | 3% | 0% | 56% |
| Sakilli | Dravidian | Dalit | 59% | 36% | 1% | 2% | 0% | 0% | 55% |
| Kamsali | Dravidian | L Caste | 59% | 35% | 1% | 2% | 0% | 0% | 54% |
| Vysya | Dravidian | Mid Caste | 62% | 34% | 0% | 2% | 0% | 0% | 53% |
| Hallaki | Dravidian | Tribe | 57% | 34% | 0% | 3% | 3% | 1% | 53% |
| Tharu | Indo-European | Tribe | 52% | 32% | 3% | 3% | 6% | 2% | 50% |
| Naidu | Dravidian | U Caste | 59% | 32% | 0% | 4% | 2% | 1% | 50% |
| Lodi | Indo-European | L Caste | 58% | 32% | 1% | 2% | 6% | 0% | 50% |
| Velama | Dravidian | U Caste | 60% | 29% | 0% | 7% | 2% | 0% | 46% |
| Srivastava | Indo-European | U Caste | 56% | 28% | 0% | 4% | 10% | 0% | 44% |
| Gujaratis a | Indo-European | | 64% | 26% | 0% | 3% | 6% | 0% | 42% |
| Meghawal | Indo-European | Dalit | 55% | 25% | 0% | 8% | 10% | 1% | 41% |
| Cochin jews | Dravidian | | 50% | 24% | 1% | 16% | 7% | 0% | 39% |
| Vaish | Indo-European | U Caste | 52% | 24% | 0% | 6% | 15% | 0% | 39% |
| Gujaratis b | Indo-European | | 56% | 22% | 0% | 7% | 13% | 0% | 36% |
| Khasi | Austro-Asiatic | Tribe | 21% | 21% | 48% | 0% | 3% | 5% | 36% |
| Bene Israel Jews | Indo-European | | 45% | 19% | 0% | 26% | 8% | 1% | 32% |
| Kashmiri pandit | Indo-European | U Caste | 51% | 18% | 0% | 12% | 15% | 2% | 31% |
| Cambodian | | | 4% | 17% | 75% | 1% | 1% | 0% | 30% |
| Singapore malay | | | 5% | 17% | 73% | 1% | 1% | 0% | 30% |
| Garo | Tibeto-Burman | Tribe | 8% | 17% | 65% | 0% | 0% | 9% | 29% |
| Sindhi | Indo-European | | 52% | 13% | 0% | 16% | 13% | 1% | 25% |
| Pathan | Indo-European | Tribe | 48% | 11% | 1% | 17% | 19% | 2% | 21% |
| Burusho | Isolate | Tribe | 47% | 10% | 6% | 12% | 18% | 5% | 21% |
| Lahu | Tibeto-Burman | | 0% | 10% | 86% | 0% | 0% | 3% | 20% |
| Dai | Tibeto-Burman | | 0% | 8% | 91% | 0% | 0% | 0% | 18% |
| Balochi | Indo-European | Tribe | 49% | 7% | 0% | 27% | 12% | 1% | 16% |
| Brahui | Dravidian | Tribe | 50% | 5% | 0% | 28% | 12% | 1% | 14% |
| Makrani | Indo-European | | 47% | 5% | 0% | 29% | 11% | 1% | 14% |
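For readers who want to redo the AN2 ratio from the table, here is a minimal sketch using five of its rows. The post does not spell out the exact formula, so treat the definitions as my assumptions: AN2 is taken as “SW Asian” plus “Euro”, and the “stabilized hybrid” as “S Asian” plus “Onge”. The function name is hypothetical.

```python
# (S Asian, Onge, E Asian, SW Asian, Euro, Siberian) in percent, copied
# from the table for a handful of groups.
components = {
    "Paniya": (47, 45, 4, 0, 0, 1),
    "Santhal": (40, 45, 13, 0, 0, 0),
    "Khasi": (21, 21, 48, 0, 3, 5),
    "Kashmiri pandit": (51, 18, 0, 12, 15, 2),
    "Makrani": (47, 5, 0, 29, 11, 1),
}


def an2_ratio(row):
    """Assumed definition: (SW Asian + Euro) / (S Asian + Onge)."""
    s_asian, onge, _e_asian, sw_asian, euro, _siberian = row
    return (sw_asian + euro) / (s_asian + onge)


ratios = {group: an2_ratio(row) for group, row in components.items()}

# Groups ordered from least to most AN2 relative to the older hybrid.
ranked = sorted(ratios, key=ratios.get)
```

Under these definitions the ratio is zero for the Paniya and Santhal, small but nonzero for the Khasi, and highest for the northwestern groups, in line with the southeast-to-northwest gradient described earlier.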

• Category: Science • Tags: Genetics, Genomics, Indian Genetics, Indian genomics 

Zack Ajmal now has over 50 participants in the Harappa Ancestry Project. This does not include the Pakistani populations in the HGDP, the HapMap Gujaratis, or the Indians from the SGVP. Nevertheless, all these samples still barely cover the vast heart of South Asia, the Indo-Gangetic plain. Here is the provenance of the submitted samples Zack has so far:

  • Punjab: 7
  • Iran: 7
  • Tamil: 6
  • Bengal: 5
  • Andhra Pradesh: 2
  • Bihar: 2
  • Karnataka: 2
  • Caribbean Indian: 2
  • Kashmir: 2
  • Uttar Pradesh: 2
  • Sri Lankan: 2
  • Kerala: 2
  • Iraqi Arab: 2
  • Anglo-Indian: 1
  • Roma: 1
  • Goa: 1
  • Rajasthan: 1
  • Baloch: 1
  • Unknown: 1
  • Egyptian/Iraqi Jew: 1
  • Maharashtra: 1

Again, note the underrepresentation of two of India’s most populous states, Uttar Pradesh, ~200 million, and Bihar, ~100 million. Nevertheless, there are already some interesting yields from the project. Below I’ve reedited Zack’s static images (though go to his website for something more dynamic) with the labels of individuals. I’ve highlighted myself and my parents with the red pointers.

To the left is a set of plots and tables which I’ve spliced together from Zack’s various posts. What you need to know is that this is at K = 12, and I’ve used the labels that Zack gave the various putative “ancestral populations” which emerged out of his ADMIXTURE runs. I’ve also displayed the participants in the Harappa Ancestry Project so far, with their ethnic labels. Finally, smack in the middle you see the Fst values, standardized by the smallest between-population difference. So the values in the boxes represent the genetic distances for the inferred ancestral populations in the row and column (I also rounded, since I didn’t want to give the impression of excessive precision). This last point is important: these are not between-population distance measures across real populations. Rather, they’re distance measures across the inferred allele frequencies of the populations which emerge out of the parameters you constrain ADMIXTURE to, as well as the genetic variation which you throw into the pot for the algorithm in the first place.
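The standardization described above amounts to dividing every pairwise Fst by the smallest one and rounding. A minimal sketch, with invented component names and invented Fst values (they are not Zack’s numbers), and a hypothetical function name:

```python
def standardize_fst(pairwise_fst):
    """Scale pairwise Fst values by the smallest one and round to integers.

    `pairwise_fst` maps a pair of component names to their Fst. After
    scaling, the closest pair has distance 1 and every other value reads
    as a multiple of that smallest distance.
    """
    smallest = min(pairwise_fst.values())
    return {pair: round(value / smallest)
            for pair, value in pairwise_fst.items()}


# Invented values for three hypothetical ancestral components.
raw = {
    ("A", "B"): 0.010,
    ("A", "C"): 0.052,
    ("B", "C"): 0.047,
}
scaled = standardize_fst(raw)
```

The rounding throws away precision on purpose, which fits the caveat that these are distances between inferred components rather than measured populations.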

In the broadest sense the first thing that jumps out at you is the high distance value between “Papuans” and everyone else. This is interesting. In fact, the genetic distance between Papuans and other ancestral populations is greater than the genetic distance between the putative African populations and other non-Africans, except Papuans. This goes to the point that you need to be very careful in making definitive inferences from these sorts of programs. Interestingly, the population to which the Papuans exhibit the least genetic distance is the “South Asians.” What does that mean? I think this has a straightforward explanation. I believe that the South Asian cluster is a hybridized compound, as suggested by Reconstructing Indian History, and that the populations of Oceania represent a relatively “pure” eastern expansion of long resident southern Asian groups which have generally been submerged by admixture with other groups intrusive to the region. This also explains the fact that Cambodians share some of this Papuan component with various South Asian populations. Finally, I wouldn’t make too much of this, but in some ADMIXTURE runs which I’ve done the genuine Papuan population in the HGDP data set breaks into two ancestral components, of which the southern Asian groups from Pakistan to Cambodia share only one. Remember that Oceania was settled initially by Melanesians and Australians ~40-50,000 years ago, and it looks like the people of Melanesia and indigenous Australians date to this initial period. So connections between southern Asians and Papuans are likely very old, and the two groups have been distinctive for a long time.

As for the South Asian individuals surveyed so far, there’s nothing that surprising. The South Asian element tends to increase as one goes south and east. This is what you’d expect. And the Pakistan/Caucasian component which spans much of western and central Asia is what connects the Iranian samples to the South Asian ones. The Iranians have very little of the South Asian component. This makes sense if the South Asian element is simply an outcome of an admixed population, and one of the ancestral groups from which this component derives, “Ancestral South Indians,” was generally not present to the west of Pakistan. The eastern Asian components are enriched among Bengalis, as you’d expect, but they’re found in different proportions among many individuals who hail from the northern fringe of South Asia more generally. It seems clear that the further west you go, the more likely the “eastern” element is going to be Turk, while the further east (and to some extent south) the more likely it is to be more southerly in provenance. Most of the other patterns are as you would expect. Finally, I’d like to point out that I suspect that Zack is the first one to post the ancestral fractions of someone from the Nadar caste using SNP-chip markers.

Here are all the details about participation.

Zack has been posting his data sources, as well as how he filtered and formatted them, all this week. I assume that the first wave of results will be online soon. As of yesterday, this is what he had (I know he got some more today):

– Punjab 7
– Bengal 1
– Bihar 1
– Tamil 5
– Karnataka 1
– Anglo-Indian 1
– Roma 1
– Iran 3

Whole swaths of north-central India are missing. I am hopeful that more people will join in after the first wave of results is put out there. But, from what I have discussed with Zack it looks plausible that the very first wave will have a richer set of results because of the necessity of preliminary steps. So there’s some benefit in getting in early. It’s really ridiculous to have literally 1 sample representing the 300 million people of Uttar Pradesh and Bihar. That’s 25% of South Asians represented by one person. I’ve gotten a commitment from one friend who was born in U.P. to give his data up once it comes in, but there have to be others out there. (The Bengali N should go up to 2 when I swap my parents in for me.)

The public data sources have Gujaratis, Tamils, Pakistanis (Punjabis, Pathans, Sindhis), and some South Indian groups (Tamil and Telugu). This leaves a blank spot on the North Indian plain.

Here’s the brief for the project again.

Quick review. In the 19th century, once the idea that humans were derived from non-human ancestral species was injected into the bloodstream of the intellectual classes, there was an immediate debate as to the location of the proto-human homeland; the Urheimat of us all. Charles Darwin favored Africa, but in many ways this ran against the cultural grain. The theory of evolution was birthed before the highest tide of the age of white supremacy and European hegemony, and Darwin’s model had to swim against the conviction that Africans were the most primitive of the colored races. After the waning of the ideological edifice of white supremacy, and the shock it received during and after World War II, the debates as to the origin of humanity still remained contentious and followed the same outlines (though without the charged normative inferences). But as the decades wore on many more researchers began to believe that Darwin was correct, and that the origin of humanity lay in the African continent. First, the deep origin of the human lineage in Africa was accepted, but eventually a more recent expansion out of Africa was argued for by one school. The turning point in these academic disputes was the popularization of the “mitochondrial Eve” theory of the 1980s.

What some paleontologists had long argued, that anatomically modern humans have their locus of origin in Africa, was supported now by research from genetics which indicated that Africans were the most basal clade of humans on a continental scale, so that non-Africans could be conceived of as a subset of Africans. From this originates the chestnut of wisdom that Africans have more genetic diversity than all other human populations combined. By the year 2000 one could say that the “Out of Africa” triumphalism had proceeded to the point where an almost exterminationist model had taken hold when it came to the relationships of anatomically modern H. sapiens, and other groups which had evolved outside of Africa over the past million or so years, such as the Neandertals. But the theoretical dichotomies were too coarse and absolute as it turns out. A division between multiregionalist phyletic gradualism, where H. sapiens evolved out of its hominin ancestors concurrently on a world wide scale, and a model of rapid expansion of one tribe in Africa to replace all others in totality, may have been warranted in the age of classical genetics and a morphometric analysis, but now we can look at the raw genomic material in a more fine-grained fashion. In fact, we can now look at the genomic patterns of variation among extinct hominins! Though there have long been hints that the expansion-and-replacement paradigm was too extreme from the genetic and morphological data, with the publication last spring in Science of a paper which made the claim for admixture between Neandertals and non-Africans in the range of 1-4% in all non-African groups based on a comparison of Neandertal and modern human genetic variation, one can dismiss absolutist expansion-and-replacement as self-evidently true orthodoxy. But one orthodoxy has not given way to another, and the shock to the old models presented by the data has not resulted in the coalescence of new robust paradigms.
We live in a time of scientific troubles, so to speak.

One of the more notable results in the Science paper from last spring was that all non-Africans had about the same admixture in relation to the Neandertal reference genome, ~1-4%. This means from the Orkneys to New Guinea. Because Neandertals were distributed only in the western half of Eurasia this implies that the admixture was an early event. By the time of modern human expansion across Eurasia, Australasia, and the New World, it had become equally distributed across the individuals within the population. Recall the contrast between African Americans and Uyghurs. Among the Uyghurs the ancestral quanta are equitably distributed from individual to individual, but among African Americans there remains substantial intra-population variance. The reason is that African Americans are quite new, an order of magnitude younger than the Uyghurs in a genetic sense, and admixture is still occurring into the African American population from the ancestral groups. The Uyghurs as we know them today genetically are probably ~1,000-2,000 years old (though their cultural origins are both more and less ancient, as a matter of linguistics in the former, and ethnic self-conception as a Muslim East Turkic group in the latter). The implication here is clear: there was a pause in the Out of Africa movement, where the proto-non-Africans mixed with a Neandertal group, possibly in the Middle East, and only began a massive demographic expansion after an unspecified sojourn. A paper from last spring makes this all explicit:

A more likely explanation for the OoA bottleneck is that Eurasia was populated by a larger population that had been relatively isolated from other modern human populations for tens of thousands of years prior to the expansion. The first fossil evidence for modern humans outside of Africa is in the Middle East at Skhul and Qafzeh between 80,000-100,000 years ago, which is at least 20,000 years prior to the Eurasian diaspora. If a population of modern humans remained in the Middle East until the expansion into Eurasia, there would have been sufficient time for genetic drift to reduce heterozygosity dramatically before the Eurasia expansion. This “Middle East isolation” hypothesis provides a robust explanation for the relative homogeneity of European and Asian populations relative to African populations (see Figures 3A-B) and is supported by a recent maximum likelihood estimate of 140,000 years ago for the time of Eurasian-West African population separation. Interestingly, a recent study of the Neandertal genome suggests that the non-African individuals, but not the Africans, contain similar amount of admixture (1-4%) with the Neandertals. The authors suggest that the admixture must have happened between the Neandertals with an ancestral non-African population before the Eurasian expansion. Given the fossil, archaeological, and genetic evidence, the Middle East isolation hypothesis warrants rigorous evaluation as whole-genome sequence data become available.

Now the same group has published a follow up paper in Genome Biology which fleshes out the Deep Time aspect of human evolutionary history by looking closely at the genetic variation of an under-sampled population: South Asians. You may have noticed that the HGDP populations include Pakistani groups as South Asian exemplars. That’s apparently because during the Permit Raj era in India the government was wary of cooperating with the HGDP consortium. But more recently the barriers have come down in India, and one can viably supplement the data sets with Indian Americans. So the GIH sample in HapMap3 consists of Gujaratis from Houston. At ~1.25 billion, or nearly 20% of the world’s population, South Asians are a critical portion of the “big picture” when it comes to world wide genetic variation.

Genetic diversity in India and the inference of Eurasian population expansion:

To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100 kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90-110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.

First, I want to put into the record that I think there are high enough uncertainties (evident in the confidence intervals in the paper itself) that we need to be careful about taking the divergence times from their results as values we’d bet the house on. Someone with a better knowledge of the fossils (e.g., John Hawks) or controversies about the mutational rates (e.g., Dienekes) can comment on the plausibilities of the dating. But, I think we can infer that there was a time lag closer to a 10,000 years order of magnitude than 1,000 years when it comes to the Middle Eastern sojourn of non-African humans.

The basic method here is that the research group zoomed in on a ~100 kb region of the genome, on chromosome 12, and surveyed their Indian populations, as well as the HapMap3 ones. This is important because the SNPs in the HapMap probably exhibit an ascertainment bias toward variants in European and other more widely surveyed groups. The fact that 30% of the SNPs in the South Indian groups seem not to be found among the HapMap populations confirms this hunch. Before digging into the details of the paper, let’s note that the South Indian groups are from the state of Andhra Pradesh: Brahmins, a lower caste group (Yadava), Dalits (Mala/Madiga), and a tribe (Irula). This is a case where even more thorough coverage is necessary. There is some suggestion that South Asian groups have a long history of endogamy and genetic peculiarities, which would limit the usefulness of extrapolations from this sample. Even within the HapMap Gujarati sample there seem to be two clusters when the PCA is used with reference to the European samples.

There are basically three portions of the paper:

– A survey of conventional population genetic statistics,

θ = 4Nₑμ (Nₑ = effective population size, μ = mutation rate)
π = nucleotide diversity
H = heterozygosity
D = Tajima’s D

– Measures of genetic distance between contemporary populations, Fst and PCA

– Finally, taking the genetic variance from the ~100 kb and plugging it into explicit models of human evolutionary history
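To make the statistics in the first bullet concrete, here is a minimal sketch of how π, an estimate of θ, and Tajima’s D can be computed from a small matrix of 0/1 haplotypes. This is not the paper’s pipeline. The function name `diversity_stats` and the toy data are my own assumptions, the θ estimate is Watterson’s (from the count of segregating sites), and the values are totals for the region rather than per-site figures.

```python
from math import sqrt


def diversity_stats(haplotypes):
    """Return (pi, theta_w, tajima_d) for a list of 0/1 haplotype rows.

    Assumptions (hypothetical, for illustration): biallelic sites, no
    missing data, and every row has the same number of sites.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Count of the '1' allele at each site.
    counts = [sum(row[j] for row in haplotypes) for j in range(n_sites)]
    segregating = [k for k in counts if 0 < k < n]
    s = len(segregating)

    # pi: mean number of differences over all pairs of haplotypes.
    n_pairs = n * (n - 1) / 2
    pi = sum(k * (n - k) for k in segregating) / n_pairs

    # Watterson's estimator of theta = 4 * Ne * mu for the region.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return pi, theta_w, float("nan")

    # Tajima's D: scaled difference between the two estimates of theta.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d


if __name__ == "__main__":
    toy = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]  # made-up haplotypes
    print(diversity_stats(toy))
```

Dividing π and the θ estimate by the length of the region gives per-site values. A strongly negative D (an excess of rare variants) is the classic signature of population expansion, which is why it matters for a group like the South Indian Brahmins.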

Table 1 (I reformatted) shows the genetic statistics by “continent.” Indian includes some Gujarati individuals. They sampled out of the HapMap populations to equalize the numbers.


Some of these results are striking. The general truism is that Africans are the most diverse population in the world, but some of the South Indian groups are very diverse indeed. Of particular interest though is that some Indian groups are not very diverse at all. What’s going on here? Here you have to look at the specifics of each group. It is likely that South Indian Brahmins are the result of a relatively recent population expansion, with some uptake of other genes through hypergamy. A paper from last year argued that all Indian populations can be modeled as a two-way admixture of different quantities from two ancestral groups, Ancient North Indians and Ancient South Indians. The heterozygosity values may be explained in such a fashion, though the relatively low values for Gujaratis and Andhra Pradesh Brahmins would still surprise. Frankly, I’m just mostly confused by the diversity statistics. Probably the substructure through endogamy and population bottlenecks is obscuring broader dynamics. We can, though, conclude that the idea that all non-Africans are uniformly homogeneous in comparison to Africans may not hold water. Figure 2 above illustrates this by plotting heterozygosity vs. distance from Africa.

Next, let’s move to genetic distance. There are two ways you can look at this: a summary statistic like Fst, which partitions between and within population variance, and PCA, which visualizes the largest dimensions of variation in the data set. So you have both below (reedited for reasons of space):


In general the results are expected, but there are weird details. For example, the Brahmins from Andhra Pradesh are on the margins, where you’d expect them to cluster with the Gujaratis. The Gujaratis are closer to the Chinese from Denver than Utah Whites? This is a provisional paper, so I’m almost wondering if there’s a typo or coding error here, as I don’t understand how the GIH can be so close to the Tuscans and Chinese from Denver, and much further from the Northern Europeans and Chinese from Beijing. The two European and Chinese samples are rather close in other analyses.

So let’s get to the real deal. The modified Out of Africa model where non-Africans take a “break” after they leave the mother continent:


I’ve mashed up the figures. The models were generated by looking at allele frequencies. They took the variants they found by sequencing the ~100 kb on chromosome 12, which was in a very gene-poor region so as to bias it toward neutrality, and plugged them into a few models in the ∂a∂i program. I’ll jump to the text here:

…the divergence time between African and the ancestral Eurasian population (88-112 kya, CIs: 63-150 kya) is much older than the divergence time among the Eurasian groups (27-39 kya, CI: 20-59 kya). The more recent divergence time and the low migration rate estimates among the current Eurasian populations support the “delayed expansion” hypothesis for the human colonization of Eurasia (Figure 5). Consistent with previous studies…these estimates indicate that a single Eurasian ancestral population remained separated from African populations for more than 40 thousand years prior to the population expansion throughout Eurasia and the divergence of individual Eurasian populations.

Take a good look at those confidence intervals. We know that some of those have to be false: the bones don’t lie. From what little I know a very young consensus date for the settlement of Australasia by modern humans is 40,000 years ago. That happens nicely to be their median, but the dispersion toward younger dates is probably not right, unless Aborigines are a separate population who are remnants of an earlier wave of migrants (or the current Aborigines replaced earlier waves). It is also hard to reconcile these dates for the diversification of non-African humanity with very old dates for Chinese fossils which exhibit some elements of modern morphology.

In the broad outlines I think we can accept that the model outlined in this paper may be correct. It would explain the uniform admixture of Neandertal in non-Africans, since they’d need time as a compact population before demographic expansion to integrate the Neandertal genes as part of their genetic background. But before the Neandertal genome came out there were plenty of papers which purported to show how there was no archaic admixture in modern humans, and plenty of papers which did claim there was evidence for such admixture. The point is that these computational models are sensitive to their inputs, and being models they simplify what really happened. In the discussion the authors repeatedly observe that migration between the various non-African demes doesn’t affect the outcome. That is fine, but there is modestly strong evidence that the Indian samples that they’re using are an admixed population of old. That would make me skeptical of claims about dating the separation of “Indians” when Indians are themselves possibly a compound between other groups.

Below is the model presented from Reconstructing Indian population history:


The teens of this century are going to be very exciting when it comes to reconstructing human evolutionary history. You’d be a fool to put bets on any horse at this time.

Addendum: I need a term for non-African humanity. So I’m making up one right now: Eurausicans. From Eurasians, Australasians, and Americans.

Citation: Jinchuan Xing, W Scott Watkins, Ya Hu, Chad D Huff, Aniko Sabo, Donna M Muzny, Michael J Bamshad, Richard A Gibbs, Lynn B Jorde, & Fuli Yu (2010). Genetic diversity in India and the inference of Eurasian population expansion. Genome Biology. DOI: 10.1186/gb-2010-11-11-r113

I have put up a few posts warning readers to be careful of confusing PCA plots with real genetic variation. PCA plots are just ways to capture variation in large data sets and extract out the independent dimensions. It’s great at detecting population substructure because the largest components of variation often track between population differences, which consist of sets of correlated allele frequencies. Remember that PCA plots usually are constructed from the two largest dimensions of variation, so they will be drawn from just these correlated allele frequency differences between populations which emerge from historical separation and evolutionary events. Observe that African Americans are distributed along an axis between Europeans and West Africans. Since we know that these are the two parental populations this makes total sense; the between population differences (e.g., SLC24A5 and Duffy) are the raw material from which independent dimensions can pop out. But on a finer scale one has to be cautious because the distribution of elements on the plot as a function of principal components is sensitive to the variation you input to generate the dimensions in the first place.
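The admixture axis described above can be reproduced with a toy simulation. The following is a rough sketch under stated assumptions, not a reconstruction of any real data set: two made-up parental populations with independent allele frequencies, nine made-up admixed individuals with ancestry fractions from 0.1 to 0.9, unlinked SNPs, and a bare-bones first principal component obtained by power iteration. The names `simulate`, `pc1_scores`, and `pearson` are hypothetical helpers.

```python
import random
from math import sqrt


def simulate(seed=0, n_snps=300, n_parent=20):
    """Simulate 0/1/2 genotypes for two parental groups and admixed people."""
    rng = random.Random(seed)
    p_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
    p_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

    def genotype(freqs):
        # Two independent allele draws per SNP (no linkage modeled).
        return [(rng.random() < p) + (rng.random() < p) for p in freqs]

    fractions = [i / 10 for i in range(1, 10)]  # share of ancestry from A
    pop_a = [genotype(p_a) for _ in range(n_parent)]
    pop_b = [genotype(p_b) for _ in range(n_parent)]
    mixed = [genotype([f * x + (1 - f) * y for x, y in zip(p_a, p_b)])
             for f in fractions]
    return pop_a, pop_b, mixed, fractions


def pc1_scores(genotypes, n_iter=100):
    """First principal component scores via power iteration on the Gram matrix."""
    n, m = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    gram = [[sum(x * y for x, y in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    u = [1.0 + 0.01 * i for i in range(n)]  # arbitrary non-constant start
    for _ in range(n_iter):
        v = [sum(g * x for g, x in zip(row, u)) for row in gram]
        norm = sqrt(sum(x * x for x in v))
        u = [x / norm for x in v]
    eigenvalue = sum(ui * sum(g * x for g, x in zip(row, u))
                     for ui, row in zip(u, gram))
    return [ui * sqrt(eigenvalue) for ui in u]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) *
                      sum((y - my) ** 2 for y in ys))


if __name__ == "__main__":
    pop_a, pop_b, mixed, fractions = simulate()
    scores = pc1_scores(pop_a + pop_b + mixed)
    print(pearson(fractions, scores[len(pop_a) + len(pop_b):]))
```

In this setup the admixed individuals land between the two parental clusters on the first component, ordered by their ancestry fraction. Re-running `pc1_scores` on a different subset of individuals recomputes the axes from scratch, which is the sensitivity to input that the paragraph above warns about.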

I can give you a concrete example: me. I showed you my 23andMe ancestry painting yesterday. I didn’t show you my position on the HGDP data set because I’ve shared genes with others and I don’t want to take the step of displaying other peoples’ genetic data, even if at a remove. But, I have reedited some “demo” screenshots and placed where I am on the plot to illustrate what I’m talking about above. The first shot is my position on the two-dimensional plot of first and second principal components of genetic variation from the HGDP data set.

No surprise that I’m in the Central/South Asian cluster. But what may surprise you is that I’m not in the South Asian cluster, I’m in the Central Asian cluster. In the Central Asian cluster are Uyghurs and Hazaras. These are two hybrid populations, a mixture of West and East Eurasian elements. The Uyghurs are likely the outcome of a process of admixture between the Iranian and Tocharian Indo-European populations of the cities of the Tarim basin, and later Turkic speaking settlers who arrived in the wake of the expansion and later collapse of the first Uyghur Empire (the historical connection between the current Uyghurs and ancient Uyghurs is tenuous at best, and complicated). The Hazaras are a more recent population, likely emerging as the product of intermarriages between Mongol soldiers who arrived in the 13th century, and indigenous women, Persians, Turks, and assorted Indo-Iranian groups between the Zagros and Khyber Pass. It is somewhat ironic that I’m on the edge of the Hazara cluster since they are almost certainly in part descended from Genghis Khan’s family, and my own surname is Khan. But I know that my Y chromosomal lineage is R1a1, very common across Central and Southern Eurasia, and not a Mongolian one at all.

Zoom! Now we’ve constrained the input data set to the Central/South Asian groups. First, look at the Kalash. They’re strange, which is no surprise, they’re an inbred mountain group in Pakistan who have not adopted Islam. The Pakistani Taliban looks to be ending them as we speak. I really would prefer that they were just thrown out of the data set for this zoom view, because on this fine grained scale I don’t think they add much at all. They’re just an example of what long term endogamy can do to your allele frequencies. The bigger picture is the axis between the populations of Pakistan, and those of Central Asia. Observe that I’ve changed position. Whereas when taking world wide genetic variation into account I clustered with Central Asians, now I’m 2/3 of the way to the South Asian cluster. I will tell you that I’ve shared “genes” with around 50 South Asians now, from various parts of the subcontinent, and in the 23andMe plot they overlay the South Asians nearly perfectly. I’ve put labels at the approximate ethno-linguistic position. I’m an outlier. 23andMe tells me that I’m 43% “East Asian.” The typical South Asian is in the 10-30% range. My first assumption was that I have a lot of ancient South Indian, which just shows up as East Asian in their algorithm. With this in mind I tried sharing with a lot of South and East Indians, and found out two interesting points. First, South Indians seem no higher than 30-35% East Asian. Bengalis on the other hand are more East Asian, with Bangladeshis more East Asian than West Bengalis. My sample size for Bengalis is small, so take that with caution. Second, the PCA plots put the South Indians firmly in the South Asian cluster, but the Bengalis trail out toward my own position. This indicates again that different methods are telling you slightly different things. The PCA is only a thin slice of variation, but it’s highly informative of between population differences.
A Bengali and a South Indian with the same “East Asian” fraction in the ancestry painting nevertheless have consistently different positions on the PCA, with Bengalis closer to the East Asians. Additionally, there’s an ethnic Persian in this zoom plot that I’m describing, and they are positioned near the Balochi. But on the world wide plot they’re on the margins of the European cluster. Another illustration that position of an element is sensitive to the input data because of how the dimensions are generated.

Blaine Bettinger, who inspired me to post this, told a story with his ancestry painting which was plausible. What can I say? First, I have less than 1% African ancestry. This could be noise. But, I do observe that the South Asians with Muslim names are enriched in the set of those who I’ve shared genes with and who have less than 1%, but not 0%, African ancestry. Just as Muslim South Asians have non-trivial West Asian ancestry, I suspect that many of us have Sub-Saharan African ancestry through the same dynamic. Sub-Saharan African soldiers were prominent across South Asia with the arrival of Muslims. Bengal even had a period of rule by Abyssinian rulers. But the bigger issue for me is the East Asian component. Here is a figure from a paper published 4 years ago:


The figure shows Fst values comparing Indian Americans with Europeans and East Asians. Fst measures between population differences in allele frequency, in this case the alleles being 207 indels. Take a look at the Bengalis. These are West Bengalis, who I believe have a lesser East Asian component, but even there the allele frequency difference to East Asians is near that of Europeans. The Assamese, who speak a language very close to Bengali, are similar. Assam was ruled by a Tibeto-Burman people for nearly 600 years. The Oriya speakers, from the southwest of Bengal, are more distant from East Asians. As one goes south and east, and west and north, the distance from East Asians increases. This shouldn’t be that surprising, but nice to confirm. The fact that the genetic distance increases as one goes south means that for northeast South Asia you need to complexify the model from a two-way admixture with “ancient North Indians” and “ancient South Indians.” Set next to these two is an East Asian element, which is also clear in the Indo-Aryan peoples of Nepal.
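For readers who want to see what an Fst number is made of, here is a minimal sketch for two populations at biallelic markers. It is a simple textbook form, (H_T - H_S) / H_T pooled across loci, with equal population weights and no sample-size correction. I am not claiming this is the estimator used in the indel paper; the function name `fst_two_pops` and the example frequencies are assumptions for illustration.

```python
def fst_two_pops(freqs_1, freqs_2):
    """Fst from per-locus allele frequencies in two populations.

    Uses the ratio of sums across loci: sum(H_T - H_S) / sum(H_T), where
    H_T is the expected heterozygosity of the pooled population and H_S
    is the mean expected heterozygosity within the two populations.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        p_bar = (p1 + p2) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # Loci fixed for the same allele in both groups carry no information.
    return numerator / denominator if denominator > 0 else 0.0


if __name__ == "__main__":
    # Made-up frequencies at a single marker in two populations.
    print(fst_two_pops([0.2], [0.8]))
```

Identical allele frequencies give 0, and populations fixed for different alleles give 1. Real comparisons between continental groups fall well below 1, so modest differences in Fst between, say, Bengalis and Oriya speakers versus East Asians can still be meaningful.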

Of course anyone who knows Bengalis won’t be totally surprised by an East Asian component to their ancestry. To the left are head shots of the two women who have dominated Bangladeshi politics for the past two decades, Khaleda Zia and Sheikh Hasina. They’re both Bengalis, but they do look different, and I know many people who look like one or the other (or a combination). My family is from one of the easternmost districts of Bengal, next to Tripura. In fact my late maternal grandmother lived in Tripura for some of her childhood (she was almost trampled to death by the Maharani of Tripura’s insane elephant as a young girl!). When I was a young child I once saw a black and white photo from my father’s college days, and I was curious who the Asiatic looking young man in the middle of the photograph was. Turns out it was my father! Sometimes our expectations affect how we perceive people. I have never perceived my father to have an Asian cast to his features as a more mature man, but others have told me that he does still exhibit them.

There is still the question of how Bengalis came to have this particular admixture. I think the most plausible scenario probably synthesizes conventional village-to-village intermarriage and isolation-by-distance, along with some component of migrationism. Tribes such as the Chakma have left Burma in historical time. The Chakma of Bangladesh now speak a dialect of Bengali, not their ancestral Sino-Tibetan tongue. I believe that a non-trivial portion of Bengalis have ancestors who were tribal people who shifted their religious identity to that of Hinduism or Islam (from Theravada Buddhism in the case of the Chakma, or animism in the case of the Garos before their Christianization). But eastern South Asia is adjacent to mainland Southeast Asia, and it stands to reason that continuous gene flow would over time also have introduced East Asian alleles into the Bengali gene pool.


Dienekes has a post up where he highlights the fact that the recent paper on South Asian metabolic diseases has a figure which elucidates population structure within the region. Accounting for structure is important for genome-wide associations since you might get spurious correlations if trait value/disease frequency is simply tracking cryptic population variation. Dienekes says:

The existence of two clusters is kind of obvious, while their interpretation is not as dots of the same color appear in both clusters: a placement of these individuals in a global context might have been useful here. Things are clearer at the top cluster which shows a clear gradient anchored by Punjabi Sikh and Hindu Tamils on either end.

Also of interest is the group of isolated Muslim/Christian individuals on the left which deviate strongly from the mainstream; these probably represent exogenous elements that don’t resemble the bulk of the Indian population.

The second issue is easily addressed. The Christian outliers both give English as their native language. That suggests to me that they’re Anglo-Indian, a community of mixed South Asian and European origin. South Asian Muslims are overwhelmingly of indigenous origin. But, a minority of the Muslim elite are West Asian, or have substantial West Asian ancestry, as is evident by the fact that they look white. Benazir Bhutto’s mother was of Kurdish and Persian ethnic background (her family was from Esfahan in Iran). I’ve reedited the religious & linguistic PC plots to fit onto the screen.


So what’s going on with the cluster which extends along the second principal component? The first component is probably just a European/West Asian-South Asian axis of variation. But I don’t understand where the variation for the second is coming from. Observe that the one South Indian group, Tamil speakers, are not represented in the secondary cluster. The plot reminded me of something I saw last fall.

Below is figure S4 from the supplements of Reconstructing Indian population history. I added some labels. The Indian cluster is tight when the genetic variation includes non-Indian groups. But, when you constrain the variation to Europeans and South Asians only, something strange happens:

The Gujarati sample is from Houston, and is from HapMap Phase 3. I have a suspicion that the secondary cluster among the Gujaratis here is of the same class of phenomenon as the secondary cluster in the first plot. The Anglo-Indians and West Asian Muslims serve as rough proxies for Europeans, and you have an expected European-South Asian axis. But you also have this strange orthogonal component. I had assumed that the plot from the Reich et al. paper was an anomaly, but I’m not so sure seeing the second paper.

• Category: Science • Tags: Genetics, Genomics, Indian Genetics, Indian genomics 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"