The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

Razib Khan • Gene Expression Blog • India Genetics


There seems to be a deep and ancient connection between the populations of Southeast and South Asia, most evident in the substrate of the Cambodians. In First Farmers the author relays an early report about a farming community in northern Vietnam where morphological and ancient DNA evidence both pointed to a stabilized coexistence between a classically East Asian majority population and another which he terms “Austro-Melanesian.” This latter group has been predominantly absorbed today, but seems to persist in isolated tribes such as the Senoi. But these are most certainly residual elements, near extinction, and it seems the dominant genetic heritage of major ethnicities such as the Khmer derives from agriculturalists who left southern China over 4,000 years ago. Only in eastern Indonesia does the Melanesian component of ancestry in Southeast Asia begin to increase to a non-trivial component, and this area is truly as much or more part of Oceania than maritime Southeast Asia.

Freida Pinto

Nguyễn Linh Nga

The Indian subcontinent has also been characterized by a synthesis between outsiders, who likely brought farming technologies, and the native inhabitants. These ancient populations had very distant connections to the ancestors of the hunter-gatherers of the Andaman Islands, and no doubt with the peoples of pre-agricultural Southeast Asia, and further on toward Oceania. This is not to say that the zone between the South China Sea and Indus was homogeneous. Rather, like Northeast and Northwest Eurasia, it was likely a region where peoples diversified from an original Pleistocene element which arrived ~50,000 years ago, and retained broad affinities through gene flow and common ancestry. But whereas the farmers in Southeast Asia came from the north, those in India came from the west. Additionally, it seems clear that the fraction of ‘indigenous’ ancestry is far higher in South Asia, on the order of ~50% across the subcontinent. The equivalent figure for Pleistocene hunter-gatherer ancestry among the Austronesian, Daic, Burman, and Austro-Asiatic populations of Southeast Asia is probably closer to ~10% (highest in the Austro-Asiatic, lowest among the Daic).

So I have decided to offer up a hypothesis: the agricultural toolkit which West Asian farmers brought to the northwest fringe of the Indian subcontinent was far more constrained in its ability to expand than the equivalent for the rice farmers from southern China. Though there is still debate, it seems that the dominant Indian cultivar of rice has an East Asian origin. Though wheat plays an important role in Pakistan and northwest India, rice is the staple crop for the preponderance of the South Asian population. Though I hold to the proposition that the Austro-Asiatic populations of South Asia are recently intrusive (i.e., they are not the primal inhabitants as some would argue), for geographic reasons it seems that the difficult north-south mountains separating South and Southeast Asia served as a check on east-to-west migration by farmers from that zone. Ultimately it was South Asian rice farmers, a hybrid population, that pushed south and east and absorbed the tribal hunter-gatherers who remained in their fastness (the current Indian tribes are not descendants of the original hunter-gatherers, but admixed populations at the margins of Sanskritic civilization; both genetics and their mode of production suggest this). The long pause in the northwest due to the limitations of their agricultural toolkit may explain the difference between South and Southeast Asia in the completeness of their demographic assimilation. Where the rice farmers from southern China swept across all of Southeast Asia rapidly in a single sweep, the West Asian farmers were halted for many generations at the limits of their ecological range, absorbing genes from the hunter-gatherers on their frontiers. The analogy here would be the Xhosa, Bantus at the edge of their range of expansion who have absorbed a great deal of genetic material (~25% of their ancestry) from Khoisan populations.

Once the proto-Indians of the northwest had accumulated enough cultural adaptations, their distinctive West Asian genetic signal may already have been substantially diluted by gene flow from the hunter-gatherers to the south and east. The subsequent expansion into the forest zones was likely a demographic disaster for the old natives, but the newcomers themselves were already partly cousins.

• Category: Science • Tags: India Genetics 
Citation: Mallick, Chandana Basu, et al. "The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent." PLoS genetics 9.11 (2013): e1003912.

Greg Cochran has a few posts up on the phylogeography of SLC24A5. A quick refresher: this gene has been under very strong selection across Western Eurasia, and seems to correlate with lighter skin. One of Cochran’s points is that though it is fixed (100%) for the new variant in Europe, it persists at very high frequencies in the Middle East. Much of the sub-100% value in the Middle East can probably be attributed to recent Sub-Saharan African admixture. A paper from last fall, Molecular Phylogeography of a Human Autosomal Skin Color Locus Under Natural Selection, has convinced me that the derived variant so common across the world today probably spread from the Middle East less than 10,000 years ago. In any case, Cochran asks:

As for those who assume that sexual selection must be driving that increase – show me the time machine. I don’t know if there was any such preference over the past three thousand years in Ethiopia and neither do you. This is used as an excuse to avoid looking at the biochemical details and trying to find out what’s actually happening. If I hear it again, I may have to call the elephants.

Next, someone should do the same for the Deccan plateau, which ought to be easy.

Prompted by another paper I had stumbled upon data to answer this question a few months back. The results below are from Polymorphisms of four pigmentation genes (SLC45A2, SLC24A5, MC1R and TYRP1) among eleven endogamous populations of India (except for the HapMap samples).

Population            State            Language family   Status    Derived SLC24A5 freq.   N
Konkanastha Brahmin   Maharashtra      Indo-European     Caste     0.9789                  71
Gujarati              Gujarat          Indo-European     Diverse   0.955                   100
Kanyakubja Brahmin    Madhya Pradesh   Indo-European     Caste     0.8846                  78
Sakaldwipi Brahmin    Jharkhand        Indo-European     Caste     0.7692                  65
Iyengar Brahmin       Tamil Nadu       Dravidian         Caste     0.7463                  66
Mahadev               Maharashtra      Indo-European     Tribe     0.7231                  65
Balmiki               Punjab           Indo-European     Caste     0.6694                  62
Kurumans              Tamil Nadu       Dravidian         Tribe     0.4104                  67
Gond                  Madhya Pradesh   Dravidian         Tribe     0.2333                  75
Riang                 Tripura          Tibeto-Burman     Tribe     0.1119                  67
Munda                 Jharkhand        Austro-Asiatic    Tribe     0.0956                  68
Tripuri               Tripura          Tibeto-Burman     Tribe     0.0923                  65

If you look at the latest research it seems pretty obvious that the proportion of derived SLC24A5 in all the italicized populations is higher than their “Ancestral North Indian” (ANI) ancestry, the West Eurasian element which arrived in the last 10,000 years and almost certainly brought the derived variant. On the North Indian plain in Uttar Pradesh the ANI proportion for high castes like Brahmins is 60%, while for Dalit low castes like Chamars it is 40%.
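To make that comparison concrete, here is a minimal sketch. It assumes, purely for illustration, that the derived allele was fixed in the ANI source and absent in the other ancestral source, so that without selection the expected derived frequency would simply equal the ANI fraction. The 60% figure is the Uttar Pradesh Brahmin value quoted above, applied to the Brahmin groups in the table only as a rough yardstick; the function name is hypothetical.

```python
# Observed derived SLC24A5 frequencies for the Brahmin groups in the table.
observed = {
    "Konkanastha Brahmin": 0.9789,
    "Kanyakubja Brahmin": 0.8846,
    "Sakaldwipi Brahmin": 0.7692,
    "Iyengar Brahmin": 0.7463,
}


def expected_derived_freq(ani_fraction, freq_in_ani=1.0, freq_in_other=0.0):
    """Ancestry-weighted allele frequency expected from admixture alone."""
    return ani_fraction * freq_in_ani + (1.0 - ani_fraction) * freq_in_other


ASSUMED_ANI = 0.60  # assumption: the UP Brahmin figure, used as a yardstick
expected = expected_derived_freq(ASSUMED_ANI)

# Positive values mean the allele is more common than ancestry alone predicts.
excess = {pop: freq - expected for pop, freq in observed.items()}
for pop, diff in excess.items():
    print(f"{pop}: observed minus expected = {diff:+.4f}")
```

Under these assumptions every listed group sits above the ancestry-only expectation, which is the pattern the paragraph describes. A different ANI fraction or source frequency would shift the size of the gap.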

• Category: Science • Tags: India Genetics, SLC24A5 

The Pith: Afro-Indians are mostly African, with a substantial Indian minority ancestry. The latter is disproportionately female mediated. It also seems that that ancestry is more northwest Indian, and that natural selection has been operating upon them outside of the African environment.

Along the western coast of South Asia, from Makran in southwest Pakistan, down to the Konkan coast of southwest India, there are isolated communities of Afro-Indians. They are called Siddis or Habshi. Their African origin is clear in their physical appearance, as well as aspects of their folk customs which tie them back to Sub-Saharan Africa. Nevertheless, they have assimilated to many Indian cultural traits. They generally speak the local language, and practice Islam, Hinduism, or Roman Catholic Christianity (in that order in proportion).

How and why did the Siddis arrive in India? The earliest date for their arrival almost certainly must be bounded by the period when Indo-Islamic polities rose to prominence in the early second millennium. The cosmopolitan melange of the armies of the Muslim warlords included diverse groups of Africans, some of whom took power, and established their own self-conscious Afro-Indian dynasties, set apart from the Turkish, Afghan, Persian, and Arab inflected statelets. Were these the sources of the modern Siddi communities? The oral history of the Siddi of the western coast of South Asia suggests not. In fact the geographical concentration of these Afro-Indian tribes along the Arabian Sea fringe is indicative of different historical actors: the Portuguese. In much of Asia, out to China, the role of Africans was very different from that in the New World. They were objects purchased for elite consumption, not production. They served at court, guarded the harem, etc. Lowland Asia had no need for imported labor, as there was human stock aplenty. Whereas in much of the New World black African slaves were critical cogs in the capitalist system of production, in Asia, as in the Arab world outside of a few areas such as southern Iraq, they were signals of luxurious consumption by the high and mighty (this was in vogue at European courts for a period as well).

Two new papers published yesterday in the American Journal of Human Genetics examine the genetics of the Siddi of India with an eye toward elucidating the details of their historical ethnogenesis. Though the papers overlap to a great extent, there are subtle differences which make them complementary. Shah et al. use a far thicker set of markers, while Narang et al. look at many more populations, but because they removed SNPs which don’t span all their populations the marker set is much thinner. Let’s review the papers in turn.

Indian Siddis: African Descendants with Indian Admixture:

The Siddis (Afro-Indians) are a tribal population whose members live in coastal Karnataka, Gujarat, and in some parts of Andhra Pradesh. Historical records indicate that the Portuguese brought the Siddis to India from Africa about 300–500 years ago; however, there is little information about their more precise ancestral origins. Here, we perform a genome-wide survey to understand the population history of the Siddis. Using hundreds of thousands of autosomal markers, we show that they have inherited ancestry from Africans, Indians, and possibly Europeans (Portuguese). Additionally, analyses of the uniparental (Y-chromosomal and mitochondrial DNA) markers indicate that the Siddis trace their ancestry to Bantu speakers from sub-Saharan Africa. We estimate that the admixture between the African ancestors of the Siddis and neighboring South Asian groups probably occurred in the past eight generations (∼200 years ago), consistent with historical records.

The major value-add of this paper is an estimate of the time of admixture with Indians. I’ll get to that, but let’s look at the phylogenetic relationships really quickly:

The PCA and admixture estimate are perfectly consistent. The Siddis are more African than not, but, they are clearly admixed with the Indian populations. To obtain more fine-grained understanding the authors also looked at uniparental lineages. Note the striking discordance between maternal mtDNA and paternal Y ancestral estimates. And more curiously, note the far closer value using the autosomal estimates, a proxy for total ancestry, and the paternal lineage quanta. I think there’s a rather good explanation for what’s going on: the transport of slaves from Africa was strongly male-biased. These African-born males assimilated into the native Afro-Indian community, which had a strong local Indian component in the early years via women who had married in. But once a significant Siddi community had developed it assimilated new arrivals, who were male, and beefed up the African quanta of autosomal and Y chromosomal ancestry, but not the mtDNA. Like Argentina the matriline of the Siddis is a shadow of the initial generations, when the boundaries between the Afro-Indians and locals were more permeable.
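The explanation above can be illustrated with a toy model. This is only a sketch under stated assumptions: the starting ancestry fractions, the 10% per-generation share of newly arrived African fathers, and the function name are all hypothetical, chosen to show the direction of the effect rather than to reproduce the paper's estimates.

```python
def simulate_male_biased_influx(generations, male_migrant_rate,
                                autosomal=0.5, y_chrom=0.7, mtdna=0.3):
    """Track African ancestry fractions under continued male-only immigration.

    Each generation a fraction `male_migrant_rate` of fathers are newly
    arrived African men; all mothers come from the existing community.
    """
    for _ in range(generations):
        # Average African ancestry among this generation's fathers.
        fathers_autosomal = male_migrant_rate + (1 - male_migrant_rate) * autosomal
        # Y chromosomes are inherited only from fathers.
        y_chrom = male_migrant_rate + (1 - male_migrant_rate) * y_chrom
        # Autosomes are the average of paternal and maternal contributions.
        autosomal = 0.5 * fathers_autosomal + 0.5 * autosomal
        # mtDNA is inherited only from mothers, so male migrants never change it.
    return autosomal, y_chrom, mtdna


auto, y, mt = simulate_male_biased_influx(generations=8, male_migrant_rate=0.10)
print(f"African ancestry: autosomal={auto:.2f}, Y={y:.2f}, mtDNA={mt:.2f}")
```

In this model the mtDNA fraction stays frozen at its founding value while both the autosomal and Y fractions climb, so the autosomal estimate ends up closer to the Y estimate than to the mtDNA one.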

And that initial generation is likely to have been somewhat recent, as the authors estimate that the average date of admixture was ~8 generations before the present, with a standard error of 1 generation. This comes rather close to falsifying the proposition that the Siddis derive in the main from the first generations of Indo-Islamic arrivals. Rather, the Siddis seem more likely to date to the Indian Ocean trade in human beings which post-dates the arrival of the Portuguese, as suggested in their oral history. It is important to remember that Omani Arabs and others were also involved in this trade, but between the 16th and 18th centuries the Portuguese were uniquely placed to transport Africans from their East African strong-points to the fortifications on the west coast of India.

The manner in which they estimated this admixture event is rather straightforward. Geographically distinct populations have their own unique genetic variants. If you take two individuals from very distinct populations, each passes on to their offspring a single strand out of the two they carry (granting recombination’s confounding of the two parental strands). That means that the offspring are going to have two homologous chromosomes which are reflective of very different ancestral histories. To give a concrete example, if someone had an Indian parent and an African parent, then each of their DNA strands would have a sequence of genetic variants strongly associated with the ancestry of the parent from which that strand was passed. That is why first generation mixed-race individuals have very high rates of heterozygosity and few runs of homozygosity; their paired strands are very unlikely to have recent common ancestry.

This also implies that in a first generation population of mixed-race individuals you’ll see a whole lot of linkage disequilibrium (LD). This means that markers x, y, and z, associated with population 1, are likely to be found on the same DNA strands, while markers a, b, and c, associated with population 2, are going to be found on other DNA strands. Therefore, you’ll get long haplotypes, sets of distinctive markers across genes, indicative of the shared demographic history of the two parent populations.

But I stipulated the first generation, because over time LD will decay due to genetic recombination. The schematic to the left illustrates what’s going on. Recall that during meiosis the parental chromosomes segregate and assort, and haploid gametes are formed which transmit the single strands to the offspring. But this process is not always without incident. In particular, the parents’ distinctive strands can break and recombine to form a new haplotype on the strand level. For example, say your mother has one strand which is maternal and another that is paternal. Through recombination she may transmit to her offspring a strand which is 2/3 maternal and 1/3 paternal in reference to her own parents, because the strands may recombine. Therefore, in the first generation the hybrids have a perfect association between ancestry across single strands, but recombination will break apart these associations. First generation Afro-Indians might transmit a strand which is 25% African and 75% Indian to their offspring. Over the generations this mixing and matching will break apart the associations generated through admixture. If one assumes that this rate of recombination is constant, then the extent of linkage disequilibrium and the length of haplotype blocks can give us a sense of time since admixture. This method is relatively powerful if the admixture was recent, as over the generations the extent of LD will asymptotically approach the baseline one might expect without an admixture event. In other words, there is precision toward events near in time, but relatively little to ancient ones.
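The logic of dating by LD decay can be sketched in a few lines. This is an idealized illustration, not necessarily the authors' actual procedure: it assumes a single pulse of admixture, that admixture LD between two markers a genetic distance d (in Morgans) apart decays as exp(-t·d) after t generations, and noise-free data. The function names are hypothetical, and the 25 years per generation is the conversion implied by the abstract's "eight generations (∼200 years ago)".

```python
import math


def expected_admixture_ld(distance_morgans, generations, initial_ld=1.0):
    """Idealized admixture LD remaining after `generations` of recombination."""
    return initial_ld * math.exp(-generations * distance_morgans)


def estimate_generations(distances, ld_values):
    """Fit log(LD) against distance by least squares; the slope is -t."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var


# Toy data: marker pairs 1 to 20 centimorgans apart, 8 generations of decay.
distances = [cm / 100 for cm in range(1, 21)]
ld_curve = [expected_admixture_ld(d, generations=8) for d in distances]
t_hat = estimate_generations(distances, ld_curve)

YEARS_PER_GENERATION = 25  # assumption, consistent with 8 generations ~ 200 years
print(f"estimated generations since admixture: {t_hat:.1f}")
print(f"approximate years: {t_hat * YEARS_PER_GENERATION:.0f}")
```

The same curve also shows why the method loses resolution for old events: with a large t the LD at all but the shortest distances is already near zero, so there is little signal left to fit.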

As noted in the paper, the Uyghur population exhibits a signature of an admixture event ~2,000 years before the present, while the African American population exhibits admixture on the order of hundreds of years. One of the authors of Shah et al. is David Reich, who was coauthor on a paper which famously (to readers of this weblog!) posited that South Asians are an ancient admixture between “Ancestral North Indians” (ANI) and “Ancestral South Indians” (ASI). This event is too ancient for LD methods to peg a date, at least the ones they use here. The Siddi resemble New World African populations in the date of their admixture event, but, their sex bias is very different. In the New World the maternal lineages are overwhelmingly African, while the paternal lineages are more European (though some African groups have Amerindian paternal lineages). I think this tells us something about the peculiarities of the Siddi community in India. Interestingly, I think that they may resemble Ashkenazi Jews and Roma in this tendency, with the paternal lineage being more associated with their cultural and physically salient characteristics, with exogenous admixture occurring through the female lineages.

Finally, in the analysis of the uniparental lineages they show that there seems to be a clear association between the Bantu people of Africa and the Siddi, and that the admixture events were unidirectional insofar as the nearby Indian groups don’t have African admixture. These samples were from Gujarat and Karnataka, and because the Siddi tend to be Muslim while their neighbors are likely to be Hindu, I think we should be careful not to generalize too much. An analysis of the HGDP shows non-trivial African admixture among some South Asian groups to the north and west. I would assume that this is a touch older, and dates back to West Asian groups which were somewhat admixed, but it makes sense that Pakistani Muslims are more likely to be able to assimilate another Muslim population, exotic though it may be. One of the Pakistanis I analyzed privately exhibited a clear African ancestral signal which they were not able to explain, so it may be a part of the genetic background of many South Asian Muslims, though not Hindus.

So what about the second paper? Narang et al. has a wider variation in populations in an intra-Indian sense, but a smaller number of markers. While Shah et al. used ~800,000 markers, the combined set of Narang et al. is ~20,000, and they pared it down in some cases to ~3,000 ancestrally informative markers. ~20,000 is sufficient for PCA from what I’ve seen, but for intra-continental differences it is on the bubble for analysis of admixture between putative ancestral populations (i.e., the bar plots produced by Structure, Admixture, frappe, etc.). Additionally, while Shah et al. used Siddi samples from Karnataka and Gujarat, Narang et al. focused on Gujarati Siddis only. The biggest result seems to confirm something hinted at in Shah et al.: the Indian admixture into the Siddis exhibits a regional bias. Shah et al. concluded that using an ASI-skewed Indian sample was less effective than using an ANI-skewed sample. Narang et al. confirms this, showing that the Gujarati Siddis exhibit an admixture cline more toward northwest Indian groups than not. Some of this may be European or Middle Eastern admixture, but I suspect that the best explanation is that as a predominantly Muslim population these Siddis had interactions disproportionately with individuals of Indo-Islamic background. In particular, a disproportionate number of transplants from northern and northwest India (today Pakistan) who relocated to central and southern India with the collapse of the original Delhi Sultanate. These would be the elites purchasing the Siddis in the first place more often than not (though some Hindu potentates also purchased or received gifts of black slaves, their international connections were more tenuous, and their polities were often more land than sea-based).

Because of the thinner marker set the authors couldn’t say much more about the admixture event except that it was recent. But, there was this interesting bit about functionally relevant genes:

We also wanted to see whether there were some biological processes that were selectively enriched in the admixed populations from either of the ancestors. Considering the SNPs that have an FST value ≥0.1 between the two ancestral populations, we selected 3396 of the 18,534 SNPs for functional analysis. Of these, 1218 SNPs were filtered out because their frequencies in the OG population were within 5% of the expected frequency, which is the ancestry proportionate weighted average of the allele frequencies of the two ancestral populations. The remaining SNPs were classified into two groups of 1240 and 938 SNPs on the basis of their closeness, in terms of allele frequency, to the Indian and African ancestral populations, respectively. Analysis of gene classes in these groups revealed significant enrichment of cadherins, potassium channels, membrane proteins, and solute carriers as well as protein kinases from the group close to IE and kinases and immune-related genes from the group close to African ancestry. Further functional annotation clustering (FAC) revealed significant enrichment of processes related to axonogenesis and potassium transport in genes from the group for which the frequency of SNPs is close to that of the Indian ancestral population (Table 5). However, FAC did not reveal any specific enrichment of the processes contributed by the other group.

In other words, there’s a deviation from what you’d expect just from ancestry alone. Why? I suspect there was some sort of release of functional constraint due to the high pathogen load common in Africa in relation to South Asia (yes, South Asia has a low pathogen load compared to Africa!). It isn’t as if the climate is that different. Here are the categories of genes which seem to be overrepresented in the Siddi population in relation to the ancestral Indian component (in other words, the proportion of “Indian” ancestry is higher at these locations than expectation):

Here’s the elaboration in the discussion:

…. However, we wanted to examine whether the OG have retained any enriched biological processes from either of the ancestors. Our search for functional enrichments was directed at the AIMs that were associated with genes and whose frequency in OG was close to either of the ancestral populations. We observed a significant enrichment of processes related to ion-channel activity and cadherin genes; the genotypic spectrum in these enriched processes was close to that of the IE ancestors (Figure 7). Selection in ion-channel genes among populations of African ancestry has been a long-term global enigma. However, the fact that the population resides in an extremely saline region of the country and has shown deviations in these genes was intriguing and made it compelling to speculate that this finding is biologically relevant. This is especially interesting in the light of the fact that a recent GWAS study of hypertension and blood pressure in African Americans implicated a similar family of genes related to ion channels, cadherins, and calmodulins.

IE here means “Indo-European.” Since the samples are from Gujarat, an Indo-European speaking region, one would expect this affinity, though recall that the Siddis are biased toward a more northern affinity than that. In any case, the implications of constraint and selection on these loci have long been discussed, and the Afro-Indian case serves as an interesting replication of the larger pattern.
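The filtering step quoted from Narang et al. above, comparing each SNP's observed frequency with the ancestry-weighted average of the two ancestral frequencies, can be sketched as follows. The 70% African ancestry fraction, the toy allele frequencies, and the function names are hypothetical, and reading "within 5%" as an absolute difference of 0.05 is an assumption about the paper's wording.

```python
def expected_admixed_freq(african_ancestry, p_african, p_indian):
    """Ancestry-proportionate weighted average of the two ancestral frequencies."""
    return african_ancestry * p_african + (1 - african_ancestry) * p_indian


def classify_snp(observed, african_ancestry, p_african, p_indian, tolerance=0.05):
    """Label a SNP as matching expectation or leaning toward one ancestor."""
    expected = expected_admixed_freq(african_ancestry, p_african, p_indian)
    if abs(observed - expected) <= tolerance:
        return "as expected"
    # Otherwise group the SNP with whichever ancestral frequency it is nearer.
    if abs(observed - p_indian) < abs(observed - p_african):
        return "closer to Indian"
    return "closer to African"


AFRICAN_ANCESTRY = 0.70  # hypothetical admixture proportion
# Toy SNPs: (observed frequency in the admixed group, African freq, Indian freq).
toy_snps = [(0.40, 0.20, 0.80), (0.70, 0.20, 0.80), (0.22, 0.20, 0.80)]
labels = [classify_snp(obs, AFRICAN_ANCESTRY, pa, pi) for obs, pa, pi in toy_snps]
print(labels)
```

SNPs labelled "closer to Indian" are the ones whose gene classes the paper then tests for functional enrichment.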

Summary points:

1 – The Siddis are relatively recent in time in their origin. Post-1500, and possibly early British.

2 – Admixture with South Asians was more “female mediated.” That is, Indian ancestry tends toward a maternal origin, though not exclusively so.

3 – The ancestry also seems somewhat biased toward north and western South Asian sources. Shah et al. had a Karnataka sample, which is in a Dravidian speaking region (albeit, with Indo-Aryan minority populations), and they still found that in that group a North Indian ancestral population was a better fit than a South Indian one. The main caveat is that this may be due to exogenous West Asian or European ancestry against a South Indian background.

4 – There seems some evidence of changes in the selective constraints and pressures, which have had a genome-wide impact even in ~10 or so generations.

On a final note: if the numbers quoted here are correct then I believe that the majority of the African ancestral element within the boundaries of South Asia is distributed amongst South Asian Muslims. A generous estimate of the number of culturally identified Siddis seems to be ~250,000. If 0.25% of the genome of Pakistanis is African, which I think is plausible, then that would be ~400,000 Siddis! I suspect that Indian Muslims, even some Bangladeshis with Middle Eastern ancestry (such as my mother), also have a non-trivial African ancestral element due to the cosmopolitanism of the Dar-ul-Islam, and the ubiquity of black slaves as consumption signals and military shock troops amongst Islamic elites. As for how much is found in the Hindu population, that will be a good gauge I think not of the intermarriage of Africans with Hindus, but the assimilation of liminal Muslim groups, in particular sects considered heterodox by India’s Sunni rulers, into the Hindu caste system.
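The back-of-the-envelope figure above can be checked with a short calculation. The population size is not stated in the text; the ~160 million used here is simply the value implied by the two quoted numbers, so treat it as an assumption. The function name is hypothetical.

```python
def full_ancestry_equivalents(population, ancestry_fraction):
    """Number of people of 100% ancestry carrying the same total contribution."""
    return population * ancestry_fraction


AFRICAN_FRACTION = 0.0025        # 0.25% of the genome, the guess in the text
QUOTED_EQUIVALENTS = 400_000     # "~400,000 Siddis" from the text
IDENTIFIED_SIDDIS = 250_000      # generous estimate quoted in the text

# Population size implied by the two quoted figures (an inference, not a source value).
implied_population = QUOTED_EQUIVALENTS / AFRICAN_FRACTION
equivalents = full_ancestry_equivalents(implied_population, AFRICAN_FRACTION)
ratio = equivalents / IDENTIFIED_SIDDIS

print(f"implied population: {implied_population:,.0f}")
print(f"Siddi equivalents: {equivalents:,.0f} ({ratio:.1f}x the identified Siddis)")
```

On these numbers the diffuse African ancestry would outweigh that of the culturally identified Siddi community, which is the point of the paragraph.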

Citation: Anish M. Shah, Rakesh Tamang, Priya Moorjani, Deepa Selvi Rani, Periyasamy Govindaraj, Gururaj Kulkarni, Tanmoy Bhattacharya, Mohammed S. Mustak, L.V.K.S. Bhaskar, Alla G. Reddy, Dharmendra Gadhvi, Pramod B. Gai, Gyaneshwer Chaubey, Nick Patterson, David Reich, Chris Tyler-Smith, Lalji Singh, & Kumarasamy Thangaraj (2011). Indian Siddis: African Descendants with Indian Admixture. American Journal of Human Genetics. DOI: 10.1016/j.ajhg.2011.05.030

Citation: Ankita Narang, Pankaj Jha, Vimal Rawat, Arijit Mukhopadhayay, Debasis Dash, Indian Genome Variation Consortium, Analabha Basu, & Mitali Mukerji (2011). Recent Admixture in an Indian Population of African Ancestry. American Journal of Human Genetics. DOI: 10.1016/j.ajhg.2011.06.004

Addendum: Am I the only one a touch weirded out by the face of the black person in the first figure? It isn’t as if illiterates are going to be reading the paper! Kind of funny though.


For the past few years I’ve been harping on and on that the patterns of contemporary genetic variation are probably only weakly tied to past patterns of genetic variation (though Henry Harpending warned me about this as far back as 2004). A major reason that scholars operated under the opposite presupposition is the axiom that most of the variation we see around us crystallized during the Last Glacial Maximum (~20 thousand years before the present).

This may be true in some cases, but I doubt it is true in most cases. I was pointed to a classic case of this problem just today. A reader alerted me to a short paper from this spring which attempts to ascertain the point of origin of M31a1, the dominant mtDNA haplogroup among the Onge tribe of the Andaman Islanders. This is an interesting issue because some researchers have plausibly proposed in the past that these indigenous people in the Andaman Islands represent the descendants of the first wave “Out of Africa,” who took the rapid “beachcomber” path. Understanding their genetics might then unlock the key to the “Out of Africa” event. Or so we thought. It looks like the human evolutionary past was a lot more complicated than we’d presumed.

The paper is in the Journal of Genetics and Genomics. Mitochondrial DNA evidence supports northeast Indian origin of the aboriginal Andamanese in the Late Paleolithic:

In view of the geographically closest location to Andaman archipelago, Myanmar was suggested to be the origin place of aboriginal Andamanese. However, for lacking any genetic information from this region, which has prevented to resolve the dispute on whether the aboriginal Andamanese were originated from mainland India or Myanmar. To solve this question and better understand the origin of the aboriginal Andamanese, we screened for haplogroups M31 (from which Andaman-specific lineage M31a1 branched off) and M32 among 846 mitochondrial DNAs (mtDNAs) sampled across Myanmar. As a result, two Myanmar individuals belonging to haplogroup M31 were identified, and completely sequencing the entire mtDNA genomes of both samples testified that the two M31 individuals observed in Myanmar were probably attributed to the recent gene flow from northeast India populations. Since no root lineages of haplogroup M31 or M32 were observed in Myanmar, it is unlikely that Myanmar may serve as the source place of the aboriginal Andamanese. To get further insight into the origin of this unique population, the detailed phylogenetic and phylogeographic analyses were performed by including additional 7 new entire mtDNA genomes and 113 M31 mtDNAs pinpointed from South Asian populations, and the results suggested that Andaman-specific M31a1 could in fact trace its origin to northeast India. Time estimation results further indicated that the Andaman archipelago was likely settled by modern humans from northeast India via the land-bridge which connected the Andaman archipelago and Myanmar around the Last Glacial Maximum (LGM), a scenario in well agreement with the evidence from linguistic and palaeoclimate studies.

Geologically, unless the Andaman Islanders’ ancestors were accomplished open ocean travelers, they almost certainly did arrive via Myanmar. The inference they’re making is based on the likely false axiom that mainland Southeast Asia has been genetically stable for the past 10 to 20 thousand years. It hasn’t been genetically stable over the past 1,000 years! The authors themselves offer up a good explanation for what’s going on here in the conclusion:

In summary, by extensively studying a large number of Myanmar samples, our results failed to find any root lineage of haplogroup M31 in Myanmar, therefore suggesting that aboriginal Andamanese were unlikely originated from Myanmar, the closest region to the Andaman archipelago in geographic. Nevertheless, we still cannot completely rule out the possibility that the matrilineal landscape in Myanmar had been largely shaped by the Neolithic immigrants from the neighboring regions, addressing this issue needs extensive studying on the Myanmar populations. Significantly, our further analyses strongly suggested that Andamanese-specific M31a1 finds its origin in northeast India. Therefore, it seems that the ancient people bearing M31a root type likely had peopled the Andaman archipelago via the land-bridge connecting the Andaman archipelago and southeast Asia continent around the LGM.

Bingo! At a minimum it seems likely that the Onge have been resident in the Andaman Islands for ~10 thousand years. Therefore I think we should be cautious about making too many inferences as to whether their ancestors were resident only in Myanmar, or spanned the South China Sea to the Indus, and so forth. But I think we can grant that they arrived via Myanmar, and were once resident in Myanmar. The disjunction between mtDNA lineages in their rather large sample strongly implies that Myanmar has seen major demographic reshaping since the ancestors of the Andaman Islanders parted ways with their mainland kin. This stands to reason. It is almost certain that Myanmar was dominated by populations speaking Austro-Asiatic languages at some point in the past. These were replaced by the ancestors of the Burmans, Karen, etc. And to some extent even these have been displaced by newcomers, such as the Shan. But the Austro-Asiatic people themselves probably came from further east. If, and it’s a big if, the kin of the Andaman Islanders were the population which immediately predated the Austro-Asiatic groups, then there have been two linguistic shifts, likely accompanied by major genetic turnover. In fact I suspect there were probably more transitions in the past. I doubt hunter-gatherer populations were quite as static as we sometimes seem to posit, at least in the past 40 thousand years.

Here’s a table of haplogroup frequencies:

And here is how the branches of M31 are related to each other:

The Onge branch is distinct, as you might expect from an isolated island population. Using the molecular clock models they came up with a series of coalescences back to the last common ancestor (represented by the star in the figure above). I’ll quote them:

Previous work has suggested the “recent settlement” of the Andaman archipelago about 24 ± 9 kilo-years ago (kya) (Barik et al., 2008). In view of the time estimation results based on the updated phylogeny tree of haplogroups M31, peopling the Andaman archipelago would have occurred after the differentiation of lineage M31a (19.82 ± 10.01 kya) and before the divergence of M31a1 (7.96 ± 3.91 kya) (Table 2). Intriguingly, a similar result was achieved by studying the whole nuclear genome, in which the Andaman aboriginals were suggested to be originated from the potential ancestral populations of South Asian sub-continent before the admixture of ASI-ANI on the mainland (Reich et al., 2009). Noticeably, the paleoclimate evidence and data of Global Ocean Associates prepared for the office of Naval Research have showed that the sea level of Southeast Asia was about 120 m lower than that of today before 17 kya, and most of the sea level of Andaman sea was above 100 m today, supporting the existence of the potential land-bridge connecting Andaman archipelago and southeast Asia continent before the Last Glacial Maximum (LGM) (22–18 kya) ([Voris, 2000] and [Clark et al., 2009]). Taking into account the interesting distribution patterns and time estimation results of different subclades within haplogroup M31, it is likely that the ancestors of aboriginal Andamanese had arrived at Andaman arhcipelago around the LGM through the land-bridge before it was submerged with the raising of the sea level after the peak of the LGM.

They’re right that their number is in rough alignment with the results from Reich et al. The Andaman Islanders diverged from “Ancestral South Indians” on the order of a few tens of thousands of years before the present. But I wonder about the value-add of their estimate when they have an interval of ~10 thousand years around their expectation. That being said, it seems clear that this mtDNA estimate at least pegs a lower boundary. As cultural anthropology would tell us, the Andaman Islanders diverged from mainland South Asians well before agriculture, and before the arrival of “Ancestral North Indians.”

On a final note, if the Andaman Islanders arrived ~20 thousand years before the present from the South Asian mainland they don’t tell us very much about the “Out of Africa” people. They’re not “living fossils,” and it was frankly probably somewhat stupid to think they would be. Until recently the “Out of Africa” event was pegged at ~50 thousand years, at its most recent. Even assuming this date the Andaman Islanders arrived in their present location closer to the present than the point at which their ancestors left Africa. But now there is more of a tendency to accept the possibility that the “Out of Africa” event wasn’t so cut & dried in any case, and may date as far back as ~100,000 years. If so we may simply have to acknowledge that fine-grained understanding of paleodemographics will always elude us if we can’t get our hands on a sample of ancient DNA. Even among pre-agricultural peoples there was probably too much population genetic turnover for the palimpsest to be teased apart with enough subtlety to read the tea leaves of the past.


The Pith: Honorable intent and punctilious adherence to proper form and method do not guarantee a set of results which flesh out a genuine phenomenon. Much of science is tragic.

Most of the time I point to and review papers on this weblog which excite me. But in the interests of “balance” and dampening the bias toward material I find interesting and salient I thought it would be interesting to look at a paper which I thought wasn’t too interesting. It’s in the Journal of Human Genetics, part of the Nature Publishing Group empire. Also, it is open access, so you can read it yourself and make your own individual judgments.

The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities:

India’s role in the dispersal of modern humans can be explored by investigating its oldest inhabitants: the tribal people. The Soliga people of the Biligiri Rangana Hills, a tribal community in Southern India, could be among the country’s first settlers. This forest-bound, Dravidian speaking group, lives isolated, practicing subsistence-level agriculture under primitive conditions. The aim of this study is to examine the phylogenetic relationships of the Soligas in relation to 29 worldwide, geographically targeted, reference populations. For this purpose, we employed a battery of 15 hypervariable autosomal short tandem repeat loci as markers. The Soliga tribe was found to be remarkably different from other Indian populations including other southern Dravidian-speaking tribes. In contrast, the Soliga people exhibited genetic affinity to two Australian aboriginal populations. This genetic similarity could be attributed to the ‘Out of Africa’ migratory wave(s) along the southern coast of India that eventually reached Australia. Alternatively, the observed genetic affinity may be explained by more recent migrations from the Indian subcontinent into Australia.

To be blunt about it I think the researchers here just randomly stumbled onto a weird result which happened to align with some plausible preconceptions. This happens all the time, and is responsible for the unfortunate confirmation bias which plagues science. Researchers know very well what the expected results are, and may unconsciously or consciously sift through their data for a set of facts which align well with their theoretical preconceptions. In this case it isn’t quite so bald, as there are no orthodoxies, but a set of alternative hypotheses which go back a century or so.

The back story is the idea of the Australoid race, first conceived of by Thomas H. Huxley. To the left is a map which illustrates the original divisions of mankind as inferred by Huxley from his catalog of human characters. I haven’t included the labels because they should be rather intuitive. Observe the similar shading of Australia and a portion of India. This is, as economists might say, a ‘stylized fact’: it captures the basic nugget of truth, but shouldn’t be taken as a strict concrete representation of reality. The fact is that it is obvious upon visual inspection that many South Asians, especially those termed adivasi, the “tribal” population which has customarily existed on the margins or outside of the Hindu caste system, bear some resemblance to Australian Aborigines. Additionally some anatomists adduced that there were similarities in the skeletal morphology and the like. I can’t evaluate that, but there’s a long tradition in biological anthropology which asserts that there is some connection between the peoples of Australia, and a substrate element in South Asia. Many South Asians I know can see this resemblance as well, so it isn’t as if this was “invented” by Thomas H. Huxley from his fertile mind.

More recently there has been the idea that the Out of Africa migration was characterized by a “southern wave” which skirted the coastlines of the Indian Ocean, and pushed all the way to Australia. The reason that this rapid maritime migration has been posited is that the residence of modern humans in Australia is of long standing, on the order of ~50,000 years. In a traditional genetic model of the emergence of modern humanity that left barely any time between the rise of modern humans in Africa and their arrival in Australia (in contrast, anatomically modern humans didn’t arrive in Europe until after 40,000 years before the present, and perhaps a bit later). Obviously any migration of humans from Africa to Australia would have had to touch base in India. Therefore genetic anthropologists went looking; in particular they focused on the mitochondrial and Y chromosomal lineages. Eventually they found what they were looking for. At low frequencies in India they detected possible connections to Australian haplogroups. In other words, the ancestors of Australian Aborigines who had no doubt touched down in India left some descendants in India.

The idea of a southern migration of neo-Africans ~50,000 years ago naturally allowed one to bridge Huxley’s model of an “Australoid race” derived from pre-cladistic taxonomy to the methods of modern genetics. And conveniently for the purposes of time depth the features of the Australoid race are more clearly represented amongst the tribal and low caste populations which are also presumed to have deeper roots in South Asia.

There are two major problems which jump out at me here though. The first is somewhat theoretical: how exactly does phenotypic continuity get maintained between populations which diverged ~50,000 years ago? According to the older model of modern human origins this isn’t really that much later than the last common divergence between all non-Africans, and perhaps even Africans. Did the Australian Aborigines and Indian tribal populations enter into a period of phenotypic stasis? There is a rejoinder here: the connections between Indian tribal populations and Australian Aborigines are far more recent. The arguments, theses, and data to support this conjecture are all laid out in the paper. The most extreme adherents have suggested that in fact a migration occurred to Australia within the last ~5,000 years, which brought the dingo, and that that migration is the common source population of Australian Aborigines and Indian tribes. Both the genetic and archaeological data which might support this model are tendentious. The discussion in the text of the paper doesn’t go into the contention and frank politicization which occurred in regards to these theories in Australia. And why should it? It’s a journal of human genetics, not one of the social construction of science. But it’s important to keep in mind.

But the big issue is that, as they note, surveys of hundreds of thousands of SNPs don’t really show a connection between Aborigines and South Asians which would support any strong affinity between the two groups. Projects such as the Harappa Ancestry Project have huge data sets of South Asians, including tribal Indians. At low K’s there is some affinity between Papuans and South Asians, but this tends to go away at higher K’s. I do think there is some continuity and relationship between Oceanians (Australian Aborigines & Melanesians) and the genetic substrate of South and Southeast Asia, but it is far too attenuated to substantiate the persistence of an Australoid race.

So what’s going on with the results in this paper? As I note in the title the methods are in my opinion kosher from what I can tell. But the conclusion just doesn’t seem credible. How to explain the failure of valid methods? First, they use 15 loci. Granted, these are hypervariable regions of the genome which should be ancestrally informative. But it’s still 15 markers! Very importantly the authors note in regards to the Australian Aborigine affiliated Indian tribe:

For example, they possess the lowest number of alleles (115) of all the reference worldwide populations examined…They also display the lowest average observed heterozygosity (0.75643)…The high degree of genetic homogeneity observed could also have been caused, in part, by their low status in the social hierarchy.

I think a plausible explanation for their genetic homogeneity is that like many Indian tribes they have low effective population sizes, and so lost most of their genetic variation because of drift. Take 15 markers, crank them through drift, and I don’t think it is implausible that you could random walk a population far away from its neighbors. Indian tribal populations in other analyses seem to exhibit a repeated pattern of strange results because of excessive inbreeding or some sort of population bottleneck in the recent past (think about how the Kalash of Pakistan often break out in their own genetic cluster).
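To put rough numbers on the drift argument, here is a minimal sketch using the textbook expectation that, under drift alone, heterozygosity decays by a factor of (1 - 1/(2N)) per generation in a population of effective size N. The effective sizes, starting heterozygosity, and number of generations below are hypothetical illustrations, not estimates for the Soliga, and the formula ignores mutation, which matters for fast-mutating STR loci.

```python
def drift_f(effective_size, generations):
    """Expected drift-induced divergence F = 1 - (1 - 1/(2N))^t (no mutation, no migration)."""
    return 1.0 - (1.0 - 1.0 / (2.0 * effective_size)) ** generations


def expected_heterozygosity(h0, effective_size, generations):
    """Expected heterozygosity remaining after drift alone: H_t = H_0 * (1 - F)."""
    return h0 * (1.0 - drift_f(effective_size, generations))


H0 = 0.80          # hypothetical starting heterozygosity
GENERATIONS = 100  # hypothetical time depth

# Hypothetical effective sizes: an isolated tribe versus a large regional population.
small_tribe = expected_heterozygosity(H0, effective_size=100, generations=GENERATIONS)
large_population = expected_heterozygosity(H0, effective_size=10_000, generations=GENERATIONS)

print(f"small tribe:      F = {drift_f(100, GENERATIONS):.3f}, H = {small_tribe:.3f}")
print(f"large population: F = {drift_f(10_000, GENERATIONS):.3f}, H = {large_population:.3f}")
```

Under these assumed values the small population drifts a long way from where it started while the large one barely moves, which is the sense in which a handful of markers can random walk a tribe away from its neighbors.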

This brings me back to my suspicion that this is just a false positive which bubbled up at the confluence of a preconceived model and the noise which is going to be an issue in any of these statistical genetic analyses. The authors know that Indian tribes should cluster with Australian Aborigines in some models. So when they see one of their several Indian tribal populations clustering with Aborigines on their 15 marker diagnostic, naturally this result is slotted into the prefab model. But as I have hinted before if you “mix & match” the populations in your data, modulate the marker thickness, and tweak parameters enough, you can “stumble” upon many explanatory models using these algorithms which infer genetic distance and ancestry. I suspect that other research teams using other tribal populations with other STRs may have stumbled onto weirder results, such as a cluster of Indian tribals with Sami or Greenlanders, which were just assumed to be ridiculous on the face of it. This particular result is obviously not ridiculous on the face of it, but I think looking at the full sweep of other genetic results we can discard it as being a good representation of the total genome affinity between these two populations. A reductio ad absurdum of this emphasis on a small marker set were the old attempts to construct races based on blood group distributions!

Finally, what about old Thomas H. Huxley and his Australoid race? I think that it’s probably convergent evolution. Humans come in a range of colors from pink to very dark brown. They don’t come in red or yellow or green. They’re tall or short. Their hair is curly or straight. And so on. In the finite set of possible variables you’re going to have many human populations which arrive at a convergence of traits, and so resemble each other despite lack of particularly recent common ancestry. The Ainu of Japan were once assumed to be a distant branch of the family of European peoples because of their lack of the distinctive characteristics of their Japanese neighbors. Even the early classical genetic markers disabused scientists of this possibility, and more recent genetic work seems to point to a broad affinity with other Siberian populations. Similarly, despite superficial similarities between Melanesians and Africans, the two groups are not particularly close (in fact, most genetic distance measures seem to place Melanesians as more distant from Africans than West Eurasian populations, probably due to greater long term isolation).

These sorts of complications are why I’m so obsessed with emphasizing caution about relying on a particular figure or paper as definitive on a given genetic question. In some domains results can be taken out of their proper context, but in the case of a statistical science there’s just a lot of randomness, and our pattern matching intuitions and culturally preconditioned expectations strongly predispose us to anchor onto confirming results. This is a major reason why I’m pretty dismissive and hostile to attempts to “win” arguments by dragging out a few citations. The unfortunate reality is that most results are either trivial or false, and with a search engine you can construct an argument with five supporting facts elementary school style within a few minutes.

This may “win” the argument, but you lose the war to “win” an understanding of reality.

Addendum: The undersampling of Australian Aborigine populations and South Asians in surveys of genetic variation softens the force of my critique here. It may be that the Soliga are a particularly distinctive Dravidian tribe, which preserves a very ancient element in South Asian genetic history. Honestly I kind of doubt it after seeing the rampant admixture results among all South Asians in the most recent waves of SNP-chip studies (including the amateurs who are genome blogging). A bigger issue for me is the undersampling of Australian Aborigines. There may be variation which we’re just not aware of. I doubt that that variation will be too surprising, but who knows?

Citation: Morlote DM, Gayden T, Arvind P, Babu A, & Herrera RJ (2011). The Soliga, an isolated tribe from Southern India: genetic diversity and phylogenetic affinities. Journal of human genetics, 56 (4), 258-69 PMID: 21307856

• Category: Science • Tags: Anthropology, Genetics, Genomics, India Genetics 

Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he’s been driving his computer to crunch up ADMIXTURE results ascending a ladder of K’s. Because it is the Harappa Ancestry Project Zack’s populations are overloaded a touch on South Asians. He managed to get a hold of the data set from Reconstructing Indian History. If you will recall this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, “Ancestral North Indian” (ANI) and “Ancestral South Indian” (ASI). ANI are a population which can be compared pretty easily to other West Eurasians. There are no “pure” groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago.

At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:

Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.

The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation of the latter. The % ASI is consistently higher than Onge. Why? The last common ancestors of Andaman Islanders and the ASI diverged on the order of tens of thousands of years ago. Dienekes observed ADMIXTURE needs good reference populations, and the Onge have been so long diverged from the last common ancestor with the mainland ASI populations that it’s not a perfect proxy for this ancient group. But it seems that the underestimate is systematically biased in the same direction, so that explains the good fit between the two trends.
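A small sketch of why a consistent underestimate is compatible with a near-perfect r-squared: if one variable is a fixed fraction of the other, the squared correlation is still 1. The percentages and the 0.7 scaling factor below are made up for illustration; they are not values from Zack’s spreadsheet, and his actual conversion equation is not reproduced here.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical ASI percentages (illustrative only).
asi = [20.0, 30.0, 40.0, 50.0, 60.0]

# Hypothetical proxy that is always too low by the same proportion.
onge = [0.7 * a for a in asi]

print(r_squared(asi, onge))  # very close to 1 despite the systematic bias
```

A bias that pushes every estimate in the same direction by the same proportion changes the slope of the relationship, not the tightness of the fit.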

Zack naturally generated a pairwise matrix of Fsts between these inferred ancestral populations. Remember, the value within Fst shows the proportion of the genetic variance in the two populations which can be partitioned across them, but not within them. So it’s a rough measure of genetic distance.
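For readers who want the arithmetic behind that description, here is a minimal sketch of the textbook definition for a single biallelic locus and two equally weighted populations: Fst = (H_T - H_S) / H_T, where H_T is the expected heterozygosity of the pooled population and H_S is the average within-population heterozygosity. This is an illustration of the concept, not necessarily the estimator ADMIXTURE uses, and the allele frequencies are hypothetical.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if h_total == 0.0:
        return 0.0  # both populations fixed for the same allele: no variance to partition
    return (h_total - h_within) / h_total


print(fst_two_populations(0.5, 0.5))  # identical frequencies: nothing is between populations
print(fst_two_populations(0.4, 0.6))  # modest difference
print(fst_two_populations(1.0, 0.0))  # fixed for different alleles: everything is between populations
```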

Here’s the matrix. I’ve renamed some populations:

| | S Asian | Andaman | E Asian | SW Asian | European | Siberian | W African | Papuan | Amerindian | Khoisan/Pygmy | E African |
|---|---|---|---|---|---|---|---|---|---|---|---|
| S Asian | 0 | 0.165 | 0.121 | 0.09 | 0.071 | 0.134 | 0.184 | 0.21 | 0.175 | 0.261 | 0.15 |
| Andaman | 0.165 | 0 | 0.122 | 0.161 | 0.152 | 0.144 | 0.224 | 0.209 | 0.207 | 0.304 | 0.195 |
| E Asian | 0.121 | 0.122 | 0 | 0.152 | 0.137 | 0.067 | 0.216 | 0.205 | 0.139 | 0.294 | 0.187 |
| SW Asian | 0.09 | 0.161 | 0.152 | 0 | 0.048 | 0.163 | 0.179 | 0.235 | 0.208 | 0.257 | 0.143 |
| European | 0.071 | 0.152 | 0.137 | 0.048 | 0 | 0.143 | 0.186 | 0.223 | 0.178 | 0.261 | 0.148 |
| Siberian | 0.134 | 0.144 | 0.067 | 0.163 | 0.143 | 0 | 0.232 | 0.228 | 0.141 | 0.311 | 0.203 |
| W African | 0.184 | 0.224 | 0.216 | 0.179 | 0.186 | 0.232 | 0 | 0.286 | 0.281 | 0.123 | 0.059 |
| Papuan | 0.21 | 0.209 | 0.205 | 0.235 | 0.223 | 0.228 | 0.286 | 0 | 0.29 | 0.367 | 0.26 |
| Amerindian | 0.175 | 0.207 | 0.139 | 0.208 | 0.178 | 0.141 | 0.281 | 0.29 | 0 | 0.364 | 0.252 |
| Khoisan/Pygmy | 0.261 | 0.304 | 0.294 | 0.257 | 0.261 | 0.311 | 0.123 | 0.367 | 0.364 | 0 | 0.133 |
| E African | 0.15 | 0.195 | 0.187 | 0.143 | 0.148 | 0.203 | 0.059 | 0.26 | 0.252 | 0.133 | 0 |

The South Asian population above is very different from the components you’ve seen before. It seems equivalent to ANI more than anything else. This is a good reminder that the labels we’re giving to these ancestral groups are mnemonics; they’re not to be taken literally and concretely. Personally I find Fst matrices hard to read, so I’ve generated a number of multidimensional scaling plots illustrating the relationships within the matrix. Clarity can be achieved by mixing & matching the populations, so that’s what I did. Also, I only display dimension 1 and dimension 2. Remember that dimension 1 is the one with more weight.
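The software behind those plots isn’t specified, so what follows is only a sketch of one standard variant, classical (Torgerson) multidimensional scaling, in plain Python. It assumes Fst can be treated as a distance, reads each pair from the lower triangle of the matrix above, and finds the top two dimensions by power iteration. All function and variable names are illustrative.

```python
import math

LABELS = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian",
          "W African", "Papuan", "Amerindian", "Khoisan/Pygmy", "E African"]

# Lower triangle of the Fst matrix: row i holds distances to LABELS[0..i-1].
LOWER = [
    [],
    [0.165],
    [0.121, 0.122],
    [0.09, 0.161, 0.152],
    [0.071, 0.152, 0.137, 0.048],
    [0.134, 0.144, 0.067, 0.163, 0.143],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232],
    [0.21, 0.209, 0.205, 0.235, 0.223, 0.228, 0.286],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.29],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364],
    [0.15, 0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.26, 0.252, 0.133],
]


def full_matrix(lower):
    """Expand a lower triangle into a full symmetric matrix with a zero diagonal."""
    n = len(lower)
    d = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(lower):
        for j, value in enumerate(row):
            d[i][j] = d[j][i] = value
    return d


def double_center(d):
    """Return B = -1/2 * J * D^2 * J, the matrix whose eigenvectors give the MDS axes."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
             for j in range(n)] for i in range(n)]


def top_eigenpair(b, iterations=1000):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix by power iteration."""
    n = len(b)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * bv[i] for i in range(n)), v


def classical_mds(d, dims=2):
    """Coordinates of each point on the first `dims` classical MDS dimensions."""
    b = double_center(d)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        lam, v = top_eigenpair(b)
        scale = math.sqrt(max(lam, 0.0))  # guard against a negative eigenvalue
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate so the next pass converges on the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


coords = classical_mds(full_matrix(LOWER))
for label, (dim1, dim2) in zip(LABELS, coords):
    print(f"{label:14s} {dim1:+.3f} {dim2:+.3f}")
```

The sign of each axis is arbitrary, so only relative positions matter. Dropping populations from LABELS and LOWER before running it is one way to mix and match subsets.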

Do not think of these as real concrete populations from which all modern populations emerged. These eleven populations are abstractions which fulfill the dictates of the algorithm. But, I do think that with that caveat in mind, there are suggestive patterns.

First, the “SW Asian” component isn’t that much closer to “W Africans” than the other West Eurasian groups. Yet we know in reality that Southwest Asian populations are closer to Africans. What’s going on? Southwest Asian populations have African admixture. And that admixture is recent enough that it shakes out rather easily. This is in contrast to the normal South Asian modal components, which are indicative of a greater time since admixture, which was thorough enough that it is not trivial to tease out the two ancestral groups from each other’s genetic background. Fission and fusion are normal parts of the history of any geographically expansive species. ADMIXTURE will capture the earlier parts of fusion. But after a long enough period of time that fusion becomes its own distinctive element.

There is the conventional east-west division you see in Eurasia on PCA, but you see evidence of the north-south secondary component on these plots too. The Andaman populations are closer to East Eurasians than West Eurasians, but, they also occupy their own position which highlights a north-south axis.

Finally, the S. Asian/ANI population seems somewhat closer to “Europeans” than “SW Asians.” That is interesting. But this is where you have to be very careful and remember that these “pure” ancestral components can themselves fractionate into constituent elements at higher K’s or when you constrain the data set appropriately (Africans and inbred groups tend to hog clusters in ADMIXTURE). If you’ve read all the genome bloggers you will be aware that “European” and “SW Asian” components themselves break apart upon closer inspection. The “SW Asian” component usually divides into a northern and southern branch. The northern branch is often positioned closer to the other “European” groups than it is to the southern branch in terms of genetic distance. Here is a selection of West Eurasian groups sorted by their “S Asian” proportion:

| Population | South Asian % |
|---|---|
| Iranians | 30% |
| Lezgins (Caucasian) | 29% |
| Georgians (Caucasian) | 26% |
| Adygei (Caucasian) | 24% |
| Armenians | 22% |
| Turks | 21% |
| Syrians | 19% |
| Druze | 18% |
| Lebanese | 17% |
| Samaritans | 16% |
| Palestinian | 15% |
| Cypriots | 14% |
| Saudis | 14% |
| Yemenis | 14% |
| Russian | 8% |
| Tuscans | 7% |
| Hungarians | 7% |
| Utah whites | 7% |
| Orcadian | 5% |
| British | 5% |
| French | 5% |
| Italian | 5% |
| Finnish | 4% |

Also observe that the distance between SW Asians and Europeans is smaller than between Europeans and S Asians. Crunching up the K’s, or limiting the data set to West Eurasian groups, would probably show more fine-grained relationships.


Whenever Zack Ajmal posts a new update to the Harappa Ancestry Project he appends some data to his ethnic database. This sends me to Wikipedia, because how many people are supposed to know what a “Muslim Rawther” means? Well, if you are a Muslim Rawther, and perhaps from Southern India, you would. But South Asian ethno-linguistic categories and hierarchies are notoriously Byzantine, and I have difficulty making sense of them. This isn’t too surprising in my case, as my family’s background is relatively mixed in the very recent past (e.g., Hindus and Muslims, and people of various caste backgrounds), so we’re not the sort who can go on at length about our pure ancestry and all that stuff. Unfortunately, Wikipedia isn’t always useful, because the people editing the entries on particular South Asian ethnic groups are often people from those ethnic groups, so you get a lot of extraneous information, and a particular slant on how awesome and high achieving the group is (also, sometimes there’s funny stuff about how notoriously good looking that particular caste is!). On occasion there are other sources which are informative. For example, Zack has several individuals from the Tamil Nadar caste. I know a little about this group because 1) I have a friend whose family is Nadar (he’s American, so saying he’s an American Nadar is pretty worthless), 2) The New York Times profiled the group last fall.

When Zack noted that a group termed Tamil Vishwakarma had submitted entries, I went to Wikipedia. That was the first time I’d heard of the group. This is what I found:

Viśvákarma is the term used in India for a caste of priests, engineers, architects, sculptors, temple builders and artists. The term is applied to five sub-castes; blacksmiths, carpenters, coppersmiths, goldsmiths and sculptors.They connect themselves as Pancha janas of vedic period [Rathakara, Karmakara, Thakshaka, Kumbhakara,and NishadaSthapathies] and worshiping various forms of Viswakarma, i.e., Twostar, Daksha prajapathy, Takshaka and Maya and Rhibhus etc.

Vishwakarma Brahmins are also called Rathakara Brahmins, and the Rathakara mentioned in the Rigveda (1.6.32) indicates high status and is associated with the placing of the holy sacrificial fire in the Yajna kunda…According to the Srautasutras, the Rathakara (Chariot-maker) is entitled to perform all the sacrifices….In many sacrifices, like the Rajasuya, the Rathakara played a role as recipient of the offerings (ratninaḥ)….

First, I don’t know what a lot of this means. For example, “many sacrifices, like the Rajasuya….” makes no impression on me, as I don’t know what Rajasuya is supposed to be. But the salient point here is that the Vishwakarma are making some assertion to a relationship with Brahmins. This, I can understand. Many non-Brahmin groups in South Asia want to associate with Brahmins, because Brahmins are high status and socially superior. I assume most of the time this is made up, how many fallen Brahmins can there be exactly? It’s kind of like claiming descent from Muhammad among Muslims, or being descended from a particular lecherous and promiscuous king among the poor of Europe.

But after months of the Harappa Ancestry Project you can shift your assessment of the probabilities based on the genetics alone. South Indian Brahmins are consistently genetically distinct from non-Brahmin South Indians. So how would I go about exploring the veracity of the Vishwakarma’s claims?

First, I am looking at K = 4. So the data set has four ancestral populations: South Asians, Europeans, East Asians, and Africans. These are hypothetical abstractions, so focus on the relative relationships across individuals and groups, not on the absolute quanta. I took Zack’s ADMIXTURE results, ethnic labels, and added a few categories myself. You can see the CSV here. Basically I took the ones with caste identification and partitioned them into Brahmin vs. non-Brahmin. Note that the non-Brahmin category includes groups of all caste ranks. It’s socially heterogeneous. I also added a geographical label. NW = Pakistan, and the northwestern third of India. NE = Bangladesh and the northeastern third of India (Bihar is in the northeast here). S is south, for the four Dravidian dominated states. And C, central, includes Maharashtra, Gujarat, etc.

First, let’s look at all the Brahmins and the two Vishwakarma. I sorted by South Asian ancestry.

The Vishwakarma are outliers among the Brahmins. You can see a discontinuity.

Sorting by South Asian, European, and then East Asian ancestry, here are the Vishwakarma’s neighbors:

They’re like other non-Brahmin South Indians. No discontinuity. I can’t attest to the spiritual Brahmanitude of the Vishwakarma, but I’d say that they’re probably asserting Brahmin associations to elevate their status vis-a-vis other castes.

Now let’s look at the all the Harappa samples. I will sort first by region, and then by South Asian ancestry.

A few notes. Jatts are originally the freehold peasant cultivators of Punjab I think. They think they’re pretty awesome! Sourastrians are transplants from Gujarat to Tamil Nadu in the South. They’ve maintained their Indo-Aryan dialect. The two Bengalis with a lot of East Asian ancestry are my parents. There is a pretty good fit here to a two parameter model of predicting South Asian ancestral quanta: geography & caste. Where this breaks down the most seems to be in the far northwest, where the Brahmins don’t seem to be that much less South Asian than non-Brahmins, and in fact, perhaps more South Asian than the peasant Jatts.

Finally, the origin of Caribbean Indians is generally presumed to be among the peasants of the northeastern half of the Indo-Gangetic plain, going by the historical sources and the persistence of Bhojpuri in Trinidad and Guyana. This looks about right.

• Category: Science • Tags: Genetics, Genomics, India Genetics, India genomics 

School girls in Hunza, Pakistan

A few days ago I observed that pseudonymous blogger Dienekes Pontikos seemed intent on throwing as much data and interpretation into the public domain via his Dodecad Ancestry Project as possible. What are the long term implications of this? I know that Dienekes has been cited in the academic literature, but it seems more plausible that this sort of project will simply distort the nature of academic investigation. Distort has negative connotations, but it need not be deleterious at all. Academic institutions have legal constraints on what data they can use and how they can use it (see why Genomes Unzipped started). Not so with Dienekes’ project. He began soliciting for data ~2 months ago, and Dodecad has already yielded a rich set of results (granted, it would not be possible without academically funded public domain software, such as ADMIXTURE). Even if researchers don’t cite his results (and no doubt some will), he’s reshaping the broader framework. In other words, he’s implicitly updating everyone’s priors. Sometimes it isn’t even a matter of new information, as much as putting a spotlight on information which was already there. Below is a slice of a bar plot from Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. It uses STRUCTURE with K = 7. To the right of the STRUCTURE slice are two plots of individual data on French and French Basque from the same HGDP data set using ADMIXTURE at K = 10 from Dodecad.


Repeated runs and higher K’s make it clear that the French Basque lack a “West Asian” aspect which other French, and Iberians as well, have. Some of this is clear in the paper I referenced above as well…the key is you have to look at the supplements at K = 6. Because the Basque are the only native non-Indo-European speakers in Western Europe, their origin and relationship to nearby populations has always been of interest (they also have the highest Rh- frequency of world populations). Granted, the French Basque are very similar genetically to the French as a whole. But, it is obviously highly informative that they lack an ancestral component in totality which seems to exist at low but consistent levels across Western European populations. The only other European population at K = 15 who lack the West Asian component in totality are Finns (the Lithuanians come very close).

This is all preamble to a discussion of a post Dienekes put up today, A solution to the problem of Indo-Aryan origins. Remember that Dienekes has been “playing” with ADMIXTURE for only a few months. To claim to have found a ‘solution’ to a problem as intellectually and politically intractable and explosive as this is rather bold. The crux of the matter is that at certain confluences of K’s and population sets Dienekes has discovered a distinctive signature of ancestry which seems to be modal on the north slope of the Caucasus, and spans India and Europe. He terms this “Dagestani,” due to the fact that among a population sample from this province in Russia this ancestral component is overwhelmingly dominant. The patterns of Dagestani admixture in Europe and India are curious and suggestive.

1 – In Europe the frequencies are low, but irregularly distributed (excepting around the North Caucasus). Scandinavians and British have appreciable fractions, Finns and Southern Europeans do not. Here’s Dienekes:

Interpreting this pattern is not easy, but it does seem that this component seems to have a V-like distribution, achieving its maximum in Caucasus and its environs, then undergoing a diminution, and achieving a secondary (lower) frequency mode in NW Europe.

The surprising appearance of the homonymous Dagestan component in India suggests a widespread presence of a common ancestry element. The West Asian element, by comparison seems to have a more normal Λ-like distribution around its center in Anatolia-Caucasus-Iran region. It does reach the Atlantic coast, but is lacking in Scandinavia and Finland, and also in India itself.

2 – South Indian Brahmins have appreciable fractions, but non-Brahmins in the same region do not. In contrast, those who come from Indo-Aryan speaking backgrounds do seem to have Dagestani ancestral components, irrespective of other aspects of ancestry. For example Pakistanis don’t have that much more Dagestani than South Indian Brahmins or Gujaratis. Also compare the relatively narrow window of Dagestani ancestry variance among Dodecad South Asians (I’m DOD075). DOD088 is from what I recall a Reddy from Andhra Pradesh, a non-Brahmin but non-low caste. It is interesting that they have a high proportion of “Pakistan,” but no Dagestani. I have ~10% Dagestani, but no Pakistani.

Below is K = 10 for a selection of populations. Dienekes has now included two non-Indo-European speaking Pakistani populations: the Brahui (Dravidian) and Burusho (linguistic isolate in the mountains of Pakistan):

Some general patterns are evident. The light blue is indicative of generic “Indian” ancestry. It is not found in appreciable proportions outside of subcontinental populations (or those of recent subcontinental origin). The same with the red, and light orange. For your reference the dark orange is a “Northern European” component, modal in Lithuania. The light and dark green are both East Asian components. The dark blue is a “West Asian” component modal in Georgia, and prominent across Europe, declining as a function of distance from the eastern shore of the Black Sea (this is surely the West Asian which distinguishes the French from the French Basque). I believe that the light purple dominant in the Brahui and the light red dominant in the Burusho probably form as a compound the aforementioned Pakistani component. The dark purple is the Dagestani.

First, a word on the Brahui. These are a group of tribes who reside in northern Balochistan in Pakistan. A small number are even to be found in Afghanistan. Historically they have had close relations with the Baloch, an Iranian speaking cluster of tribes who totally envelop the Brahui. The Brahui do speak a Dravidian language, of a family dominant in South India and found in isolated regions of Central and Eastern India. There are two broad models for the existence of a Dravidian language in Pakistan. The first is that the Brahui are remnants of more widely spoken Dravidian languages which date back to the Indus Valley civilization. The second is that the Brahui arrived during the medieval period from another region of South Asia where Dravidian languages were more common. Assuming either model, it has long been presumed that their involution by the Baloch has had a strong impact on the Brahui genetically; the two groups are very close. This is evident in Dienekes’ results as well. But observe that the Baloch are the group which seems more cosmopolitan in ancestry than the Brahui. If the Brahui were Dravidians from deep in India it seems that they would have a greater residual component of India-specific ancestry (light blue and orange). This is not so. In fact the Baloch have more of the Indian ancestral component than the Brahui. The Brahui component is found across Pakistan, and into India, albeit at lower proportions. Naturally, the Baloch have the second highest fraction. I believe these results should shift us toward the position that the Brahui are indigenous in relation to the Baloch, and that the Baloch ethnic identity emerged through the shift of a Brahui substrate, as evidenced by the greater cosmopolitanism of the Baloch. Additionally, Dienekes observes that the Brahui have a lower proportion of the Dagestani component than most other Pakistani groups, and several Indo-Aryan groups in India proper.

The Burusho are even more interesting than the Brahui. Unlike the Brahui the Burusho are very isolated in the mountainous fastness of Baltistan in northern Pakistan. Additionally, their language, Burushaski, is a linguistic isolate. Others of the class are Basque and Sumerian. In general it is assumed that linguistic isolates were once part of broader families of languages which have gone extinct. Burushaski probably persists in large part because of the geography which its speakers inhabit. Mountainous areas often preserve ethnic and linguistic diversity because the terrain allows for the persistence of local variety. I believe it is plausible that the Burusho have been far more isolated than the Brahui. This seems to show up in the ADMIXTURE plot: the Burusho have a greater proportion of their modal ancestral component than the Brahui. Additionally, the Burusho have an even smaller component of Dagestani than the Brahui.

Below is a chart Dienekes constructed ordered by proportion of Dagestani for his South Asian populations. Next to it I’ve placed a chart from a PCA which has some of the same population samples. Compare & contrast:


The PCA is looking at between population variation in totality. So naturally the Dagestani component isn’t going to be predictive of that. Rather, it speaks to the possibility which Dienekes is mooting: that the Dagestani component spread in the Indian subcontinent with the Indo-Aryans specifically, overlying the local resident substrate. In South India this meant that Brahmins brought this, mixing with the indigenous Dravidian population. In Pakistan the Indo-Aryans, and Iranians, were overlain on a substrate which were the ancestors of the Burusho and Brahui. The dominant signal of genetic relationship has to do with the substrate, not the Indo-Aryans. So that’s what’s going to show up on the PCA. In other PCA plots the model where South Indian Brahmins are a linear combination of a Pakistani-like population and a Dravidian population becomes clearer. But when you look at ancestry using something like ADMIXTURE you have the potential to tease apart different components, and so uncover relationships which may have been obscured when looking at aggregate variation.
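To make the "linear combination" idea concrete, here is a toy calculation. It is only a sketch under stated assumptions: the allele frequencies are invented, the two source populations are hypothetical stand-ins, and the least-squares fit is a simple illustration rather than what ADMIXTURE or a PCA actually computes. If a mixed population's allele frequency at each marker is alpha times source A plus (1 - alpha) times source B, then alpha can be recovered from the frequencies alone.

```python
def estimate_mixture_fraction(source_a, source_b, mixed):
    """Least-squares estimate of alpha in: mixed ~ alpha*source_a + (1-alpha)*source_b.

    Each argument is a list of allele frequencies, one entry per marker.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb, pm in zip(source_a, source_b, mixed):
        diff = pa - pb
        # Rearranged model: (pm - pb) = alpha * (pa - pb)
        numerator += (pm - pb) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("source populations have identical frequencies")
    return numerator / denominator


# Hypothetical allele frequencies at five markers (invented for illustration).
pakistani_like = [0.80, 0.10, 0.55, 0.30, 0.90]
dravidian_like = [0.20, 0.60, 0.35, 0.70, 0.40]

# Construct a mixed population that is exactly 40% source A and 60% source B.
true_alpha = 0.4
mixed = [true_alpha * a + (1 - true_alpha) * b
         for a, b in zip(pakistani_like, dravidian_like)]

alpha_hat = estimate_mixture_fraction(pakistani_like, dravidian_like, mixed)
print(round(alpha_hat, 3))
```

With real data the fit would be noisy, and a third component (such as the Dagestani one) would show up as a systematic residual that a two-source model cannot absorb.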

Dienekes’ model seems to posit three steps in rapid succession ~4,000 years ago. A background variable which must be mentioned is that one must account for the Mitanni, a dominant Syrian power circa 1500 BC where a non-Indo-European language was the lingua franca, and yet a definite Indo-Aryan element existed within the elite. Indo-Aryan specifically because the Indo-European element within the Mitanni was not Iranian, but specifically Indo-Aryan. An easy explanation for this is that the Indo-Aryan component of the Indo-Iranian branch of the Indo-European languages crystallized outside South Asia, and independently reached Syria and India. In Syria it went extinct, while in India it obviously did not. By Dienekes’ model the Mitanni would be rather closer to the urheimat of the Indo-Aryans.

An aspect of his model which I do not understand is why it has to be Indo-Aryan, instead of Indo-Iranian. The South Asian population in which the Dagestani component is modal, the Pathans, are Iranian, not Indo-Aryan. Additionally, this model seems to not speak in detail to the existence of the Dagestani element among Europeans. Here is a sorting of European populations (with Iranians included) by the Dagestani component:

Population Dagestan
Urkarah 93
Lezgins 47.9
Stalskoe 38.7
Adygei 16.4
Orcadian (Orkney) 12.6
Georgians 12.4
White_Utahns 11.2
Iranian 10.9
Scandinavian_D 10.2
Armenian_D 9.9
German_D 9.1
Turks 8.8
Armenians 8.4
French 7.9
Hungarians 7.5
Russian_D 6.3
Spanish_D 4.6
North_Italian 4.5
Spaniards 4.4
Romanian 4.1
Finnish_D 4.1
Russian 4
Greek_D 3.8
Portuguese_D 3.6
Tuscan 3.5
Tuscans 3.4
Lithuanians 2.9
S_Italian_Sicilian_D 2.8
Belorussian 2.5
Cypriots 2
Sardinian 1.5
French_Basque 0.7

There is here a strange pattern of rapid drop off from the Caucasus, and a bounce back very far away, on the margins of Germanic Northwestern Europe. This to me indicates some sort of leapfrog dynamic. A well known illustration of this would be the Ugric languages. The existence of Hungarian on what was Roman Pannonia is a function of the mobility and power of Magyar horsemen, and their cultural domination over the Romance and Slavic speaking peasantry (their genetic impact seems to have been slight). No one believes that Germanic languages are closely related to Indo-Aryan (rather, if there is structure in Indo-European beyond Indo-Iranian, Celtic, etc., it would place the Indo-Iranian languages with Slavic). So what’s going on? I think perhaps the Dagestani component is in part a reflection of the common Indo-European origin in that region. For whatever reason that signal is diminished in much of the rest of Europe. Perhaps Southern Europe was much more densely populated when the Indo-Europeans arrived. Additionally, it seems highly likely that in places like Sardinia, much of Spain, and Cyprus, Indo-European speech came through cultural diffusion (elite emulation) and not population movement. Or perhaps we’re seeing the vague shadows of population admixtures on the Pontic steppe, where distinct Germanic and Indo-Iranian confederations admixed with a common North Caucasian substrate.

Going back to India, let’s revisit the model of a two-way admixture between “Ancestral North Indians,” who were genetically similar to Europeans and West Asians, and “Ancestral South Indians,” who were closer to, but not very close to, East Eurasians. The ANI & ASI. The ASI were probably one of the ancient populations along the fringe of southern Eurasia, all of whom have been submerged by demographic movements from other parts of Eurasia over the past 10,000 years, excepting a few groups such as the Andaman Islanders and some Southeast Asian tribes. The model was admittedly a simplification. But taking that model as a given, and accepting that the Dagestani element is indeed Indo-Aryan, we can infer that the ANI were not Indo-European. It is notable that the South Indian Brahmins have elevated fractions of both the Brahui and Burusho modal components. This is probably indicative of admixture of the Indo-Aryan element in the Indus Valley, prior to their expansion to other parts of India. I assume one of the languages spoken was Dravidian, though if ancient Mesopotamia was linguistically polyglot at the dawn of history I would not be surprised if the much more geographically extensive Indus Valley civilization was as well.

Aishwarya Rai

The irony is that today when someone refers to a “Dravidian” physical type, they’re not talking about someone who looks like a Pakistani. They’re talking about someone who looks South Indian, where most Dravidian languages are spoken. But combining the inference from Dienekes’ model and the previous two-way admixture model, you reach the conclusion that lighter skin and more West Asian features among South Asians may be more due to Dravidian-speaking ancestors in the Indus Valley, not Indo-Aryans! It goes to show the wisdom of differentiating linguistic classes from biological ones when discussing historical population genetics. Unfortunately it is a wisdom most of us interested in these topics do not show, alas.

As I like to say, interesting times….

Note: If you leave a comment, please don’t be smarter-than-thou in your tone. I have stopped publishing those sorts of comments because the reality is that most of them have not been that smart or informed. At least by my estimation. If you actually are smarter than the average bear, and impress me with your erudition and analytic clarity, I’ll probably let your comment through no matter your attitude. But I wouldn’t bet on it if I were you, so show some class and humility. Most of us are muddling through.

Image Credit: Georges Biard, iStockPhoto


I mentioned a few days ago that a friend was trying to get together some data to analyze the genetic variation of South Asians. By a strange coincidence Dienekes just published a more detailed analysis of South Asians…and uncovered something very interesting, though not that surprising. Some technical preliminaries:

A note of caution: The reduced marker set (~30k) means that a lot of noise is added in the admixture estimates. In particular, many individuals are likely to get low-level admixture from population sources that can be attributed to noise. But, as we will see, the small marker set does not really affect either the power of the GALORE approach, or of ADMIXTURE to infer meaningful clusters.

In addition to the various online sources of public data Dienekes got about a dozen South Asians. I was one of those South Asians, DOD075. In many ways I’m a rather standard issue South Asian, similar to Gujaratis, except that I have a substantial ‘East Asian’ component. More concretely, between 1/6 and 1/7 of my ancestry seems to be of eastern origin, far higher than the norm among South Asians. The rest of my ancestry was mostly South Asian specific, with a minor, but significant ‘West Asian’ component common across northern India.

Rerunning with more data and different samples Dienekes came out with a different set of ancestral components. Of particular interest to me he broke down the East Asian between East Asian proper and Southeast Asian. Below are a selection of populations with ancestral components + me. I’ve also renamed a few components. North Kannadi = Dravidian and Irula = Indian tribal. Indian = Generic Indian. Looking at the Fst it seems that Indian endogamy and population bottlenecks have had an effect…look at the North Kannadi distance from everyone else.
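For readers unfamiliar with Fst, the intuition for why endogamy and bottlenecks inflate it can be shown with a small calculation. This is a minimal sketch, assuming the textbook definition Fst = (H_T - H_S) / H_T for two equally weighted populations at biallelic markers; the allele frequencies are invented, and this is not the estimator ADMIXTURE reports between its inferred components. A population that has drifted has allele frequencies pushed toward 0 or 1, which lowers within-population heterozygosity and raises Fst against everyone else.

```python
def fst_two_populations(freqs_1, freqs_2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations.

    Heterozygosities are summed across markers before taking the ratio.
    Each argument is a list of allele frequencies, one entry per marker.
    """
    h_s_total = 0.0  # mean within-population expected heterozygosity
    h_t_total = 0.0  # expected heterozygosity of the pooled population
    for p1, p2 in zip(freqs_1, freqs_2):
        h_s_total += (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
        p_bar = (p1 + p2) / 2
        h_t_total += 2 * p_bar * (1 - p_bar)
    if h_t_total == 0.0:
        return 0.0  # no variation at any marker
    return (h_t_total - h_s_total) / h_t_total


# Hypothetical allele frequencies at three markers (invented for illustration).
reference = [0.50, 0.40, 0.60]
close_relative = [0.50, 0.45, 0.55]   # barely diverged from the reference
drifted_isolate = [0.90, 0.05, 0.95]  # frequencies pushed toward 0 or 1 by drift

print(fst_two_populations(reference, close_relative))
print(fst_two_populations(reference, drifted_isolate))
```

The second value comes out far larger than the first, even though nothing in the toy data says the drifted group has a different deep origin. That is the caution to keep in mind when reading a large Fst for a small endogamous group.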


Remember that in the previous analysis I was very similar to a Gujarati, except with an East Asian element. My supposition that my ancestry has some connection to Burma seems to be supported by these results. Looking at my balanced ratio between East Asian and Southeast Asian, that is what one might expect from someone of a Burman ethnicity. I am not saying that I have recent Burman ancestry per se. Rather, Ahom, Mizo, Chakma, and a range of tribal populations from the liminal zone between South and Southeast Asia may suffice. The main other option is that I have a great deal of Munda ancestry. Not implausible in light of the likelihood that Munda brought rice agriculture to northeast South Asia, and pre-date Indo-Aryans, and possibly Dravidians, in Bengal. How would I distinguish these possibilities? I’ve ordered 23andMe kits for both my parents. The most likely candidate for recent Southeast Asian ancestry is my paternal grandfather. If the admixture event was recent, if I have a recent ancestor(s) of “hill tribe” origin, I would expect to see more linked regions of East/Southeast Asian origin than if the admixture was ancient (and so distributed more equitably across DNA strands due to recombination).
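The reasoning about recent versus ancient admixture can be put into rough numbers. This is a back-of-envelope sketch, not a proper inference method: it assumes a single ancestor contributes on average 1/2^k of the autosomal genome k generations back, and it uses the common approximation that the mean length of an admixed segment is about 100/g centimorgans after g generations, which ignores the admixture fraction, drift, and finite chromosome length. The function names and the 25-year generation time are illustrative assumptions.

```python
def expected_fraction_from_one_ancestor(generations_back):
    """Average autosomal share inherited from one ancestor k generations back."""
    return 0.5 ** generations_back


def approx_mean_tract_length_cm(generations_since_admixture):
    """Rough mean admixture tract length in centimorgans (assumed ~100/g)."""
    if generations_since_admixture < 1:
        raise ValueError("need at least one generation since admixture")
    return 100.0 / generations_since_admixture


# A single fully East/Southeast Asian ancestor at various depths.
for label, k in [("grandparent", 2), ("great-grandparent", 3)]:
    share = expected_fraction_from_one_ancestor(k)
    tract = approx_mean_tract_length_cm(k)
    print(f"{label}: ~{share:.1%} of ancestry, tracts ~{tract:.0f} cM on average")

# Ancient admixture, e.g. ~1,000 years ago at an assumed 25 years per generation.
ancient_generations = 1000 // 25
print(f"{ancient_generations} generations ago: tracts "
      f"~{approx_mean_tract_length_cm(ancient_generations):.1f} cM on average")
```

A share between 1/7 and 1/6 sits between what one great-grandparent and one grandparent would contribute, so several recent ancestors or an older, diffuse contribution are both possible. The tract lengths are what would separate those cases: long blocks point to recent admixture, short scattered ones to an old event.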

But the bigger point of Dienekes’ post is what he terms “Dagestani” ancestry across much of Eurasia. I’ll quote him:

The most exciting thing, however, is the fact that the origins of a part of the West Asian component of my previous analyses can be partially located: it is the purple component centered in Dagestan, i.e., among Northeast Caucasian speakers such as Lezgins, and the Dargins who inhabit Urkarah.

Readers of this blog may remember the surprising appearance of this Lezgin-specific component in the Balkans (but not Greeks) a few weeks ago. Now it has turned up as a substantial component in India as well.

Back then, I speculated that this component may derive from a prehistoric population that was spread in (but not limited to) the northern arc of the Black Sea from the Balkans to the Caucasus. Even in this analysis, you can see that both Romanians and Hungarians have some of it, and so do Lithuanians and Belorussians, while Tuscans (like the Greeks of my previous experiment) do not.

Hence, this component stretches from at least the Baltic to India, but is largely absent in southern Europe. I will go out on a limb and propose that this component is representative of a non-Indo-European component in the ancestors of the Indo-Iranians.

Paul Conroy observes that on this finer-grained analysis I don’t have any “West Asian” at all. What had previously been West Asian turns out to have been, in my case, a compound of Dagestani + European. I can’t say that I’m that surprised by this. Years ago I noticed that HGDP STRUCTURE analyses were always giving suggestive signs of a connection between West-Central Eurasia and South Asia.

Who were the Indo-Iranians? I lean toward the proposition that they do derive from the Andronovo culture of the Eurasian steppe. This would date the entrance and expansion of Indo-Aryans in northern India to 3-4,000 years ago. I also contend that the dominant element of ancestry among modern South Asians is not Indo-Aryan. Rather, it is an ancient stabilized hybrid of pre-agricultural societies in the Indus valley and Neolithic farmers who originated from what is today western Iran and eastern Anatolia. Therefore, I posit that the “Aryanization” of the Indian subcontinent is properly modeled as the same process which led to the emergence of an Anatolian and Rumelian Turkish identity: a small elite population which forces an identity shift among the majority.

Back to farming:

As I’ve remarked in the past, Eurasia can be broadly seen as the playground of three major groups of people: the Caucasoids of the West, the Mongoloids of the East, and a southern group of people which is most strongly represented in South Asia, but whose presence can be detected in Southeast Asia as well, although in the latter case it has been marginalized and/or absorbed by the arrival of Mongoloids.

This southern group of people has sometimes been called “Australoid” because of its perceived resemblance to Australo-Melanesians. Indeed, in my K=5 mega-analysis an affinity between Papuans/Melanesians and people of South and Southeast Asia is apparent. These “Australoids” are very old populations, probably stemming from the early Out-of-Africa coastal dispersal route, and we shouldn’t be tricked by their phenotypic similarity into thinking that different groups of them are particularly close genetically. Just as “black Africans” are not the same, neither are the “Australoids” and mixed-“Australoids” at the shores of the Indian Ocean.

It is probably the invention of agriculture that is responsible for their marginalization. In Africa, the Pygmies and Bushmen have been absorbed or pushed aside by the demographic Bantu juggernaut, with a few other language groups also hitching a ride on the agriculture/pastoralism economy. In West Eurasia, where agriculture was invented earliest, pre-agricultural populations left no traces. In East Eurasia, the agriculturalists could not expand to the far north where many relic populations exist, but they could (and did) move to the south where they assimilated or drove away pre-existing populations, leaving a few of them, like the Taiwanese Atayal, as partial remnants of the older population stratum.

The Irula are South Indian tribals, so they are the closest one can get to South Asian autochthons, and yet even they presumably have a large minor component of “Ancestral North Indian.” The tribal groups in Reconstructing Indian Population History all exhibited proportions on the order of ~40% ANI. It seems that agriculture “stalled” in the Indus valley and the highlands to the west for thousands of years in South Asia. During this period of stalling I believe that the farmers absorbed a great deal of genetic material from the indigenous hunter-gatherers, and so produced a “distinctive” Indian genetic profile. More West Eurasian than not, but with a very large dollop of the ancient substrate of southern Eurasia which had a distant, but closer, affinity with that of East Asia. Once social and cultural forces allowed for the rapid expansion of farmers there was a wave of advance from the Indus valley east and south. In the east the proto-Indians would have encountered Mundari speaking groups drifting who practiced rice agriculture, which they also adopted. In the south the proto-Indians would have encountered more hunter-gatherers. Many of the tribal people in India are today facultative hunter-gatherers, herders, and extensive farmers. I believe that these marginal proto-Indian groups assimilated hunter-gatherers more easily than would have otherwise been the case because some of the proto-Indians reverted to a hunter-gatherer lifestyle in the agriculturally unsuitable highlands of the Deccan and Chota Nagpur. The social boundaries in the uplands of South India were such that the line between hunter-gatherer and farmer was more fluid than elsewhere, explaining the former’s greater genetic impact through intermarriage and assimilation.

This sort of general dynamic probably applies to Indo-Europeans. There is no reason why the original Indo-European tribes could not have been compounds who picked up different ancestral components in their peregrinations. Compare the various Turkic people, Anatolian Turks, Chuvash, and Yakut. All of them have affinities with nearby peoples, despite having a common Turkic culture and genetic component. One notable trend in Europe is that while the French have a minor, but significant West Asian component, the Basque have none of it. Dienekes’ sample is small, but it looks as if Scandinavians have more of this than the Finns. This West Asian component may not have been the dominant one among the Indo-Europeans, but I suspect it was a significant one. If the original speakers of proto-Indo-European did not have it, they likely absorbed it early on, just as the West Asians absorbed a native South Asian element in the Indus valley.

Finally, as a general rule of thumb, I would now suggest that the primary way in which hunter-gatherer genes can persist is through an ecological stall on the part of farmers. During the stall gene flow naturally occurs, probably through exchange of females (coercive or not), or the integration of hunter-gatherer males into war-bands or as slaves. Over time the farmers on the frontier have changed genetically, so that when they start expanding rapidly due to a technological or cultural innovation, they share more with the hunter-gatherers whom they supersede than they otherwise would have.



The past ten years have obviously been very active in the area of human genomics, but in the domain of South Asian genetic relationships in a world wide context they have seen veritable revolutions and counter-revolutions. The final outlines are still to be determined. In the mid-1990s the conventional wisdom was that South Asians were a branch of a broader West Eurasian cluster of peoples, albeit more distant from the core Middle Eastern-North-African-European-Caucasian clade. The older physical anthropological literature would have asserted that South Asians were predominantly Caucasoid, but with an Australoid element admixed in at varying proportions as a function of geography and caste. To put it more concretely, and I think accurately, a large degree of South Asian physical variety can be defined along the spectrum between A. R. Rahman and Nawaz Sharif. The regional and caste truisms are only correlations. Subrahmanyan Chandrasekhar was a Tamil Brahmin, but experienced anti-black racism in the United States. I think that is reasonable in light of his appearance. This rough & ready mainstream understanding, supported by classical genetic markers, was overturned in the early years of the 21st century. One line of thought argued that South Asians were much more distinctive from the broader Western Eurasian cluster of peoples. Representative of this body of work is a paper like The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. These researchers tended to start with the female lineages, mtDNA, and then supplement that with Y lineages, the paternal descent. A separate line of evidence, generally drawn from Y chromosomal results, indicated that there were deep connections between the people of India and those of Central Eurasia, in particular via the R1a haplogroup. Additionally, one aspect of the first set of results which was very surprising was that it actually placed South Asians closer to East, not West, Eurasians.
But by the end of the aughts the uniparental studies had been supplemented by a range of results produced from SNP-chips, which looked at hundreds of thousands of genetic variants. These studies seemed to support the older view of South Asians being closer to West Eurasians than East Eurasians. Finally last year a paper came out which posited that almost all South Asian populations were actually an ancient stabilized hybrid between two groups, a European-like population, “Ancestral North Indians” (ANI), and another group which is no longer present in unadmixed form, “Ancestral South Indians” (ASI), of whom the Andaman Islanders are distant relatives. Though there was a slight bias toward ANI as a whole, the fraction of ASI increased as one went southeast, and down the caste ladder. The distinctive “South Asian” ancestral group in other words then may actually be conceived of as a compound of these two elements; an admixture of the native substrate against a European-like genetic background.

Strangely it sounds an awful lot like the older idea of a Caucasoid population with Australoid admixture. We know now that the connection between the tribal peoples of India, and the indigenous groups of South and Southeast Asia as a whole, to those of Australia and Melanesia, is tenuous at best. So the term “Australoid” is not really informative, and may even mislead. And in terms of historical linguistics I don’t think we’ve solved the problem by appealing to an “Aryan invasion.” The high fraction of ANI among South Indian tribal groups who are isolated from even Dravidian caste groups is a clue to the likelihood that the admixture event is very ancient, and probably precedes the arrival of the Aryans to the Indian subcontinent.

But there are more than two actors in this game. In Reconstructing Indian population history the authors acknowledge that their model is stylized, that reality is more complex. Additionally, they perceive in their data that some tribal groups from northeast India have an element which is outside of the purview of a two-way admixture event. They discarded this set from their broader analysis because this seemed to be a restricted phenomenon to these groups. A new paper in Molecular Biology and Evolution re-injects this third element into the picture. Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture:

The geographic origin and time of dispersal of Austroasiatic (AA) speakers, presently settled in South and Southeast Asia, remains disputed. Two rival hypotheses, both assuming a demic component to the language dispersal, have been proposed. The first of these places the origin of Austroasiatic speakers in Southeast Asia with a later dispersal to South Asia during the Neolithic, whereas the second hypothesis advocates pre-Neolithic origins and dispersal of this language family from South Asia. To test the two alternative models this study combines the analysis of uniparentally inherited markers with 610,000 common SNP loci from the nuclear genome. Indian AA speakers have high frequencies of Y chromosome haplogroup O2a; our results show that this haplogroup has significantly higher diversity and coalescent time (17-28 KYA) in Southeast Asia, strongly supporting the first of the two hypotheses. Nevertheless, the results of principal component and “structure-like” analyses on autosomal loci also show that the population history of AA speakers in India is more complex, being characterised by two ancestral components – one represented in the pattern of Y chromosomal and EDAR results, the other by mtDNA diversity and genomic structure. We propose that AA speakers in India today are derived from dispersal from Southeast Asia, followed by extensive sex-specific admixture with local Indian populations.

Some background is necessary here. South Asia is notoriously linguistically diverse, but, that diversity can be bracketed into several broad families. First, the Indo-European languages are represented by Indo-Aryan and Iranian dialects (and Germanic, if you include English). Second, the Dravidian languages are found across the subcontinent, from Brahui in Pakistan to Malto in Bangladesh. But they’re really the dominant languages in the southern cone of South Asia. That being said it seems likely that historically their distribution extended far into the north, with Brahui in western Pakistan being a relic of that period, as well as the fragmented tribal groups in Central India. There is also evidence down to historic periods of a Dravidian-speaking substrate in Maharashtra. And purely from a philological perspective it seems clear that many Indo-Aryan languages evolved within a Dravidian linguistic substrate.

Next, in the far north there are languages of Tibetan provenance and affinity. These are explicable in their origins and relationship. But in the northeast third of the Indian subcontinent there are two groups of Austro-Asiatic languages. The prefix “Austro” is indicative of the symbiotic relationship between historical linguistics and physical anthropology in the early 20th century (most famously illustrated in the transplantation of the social-linguistic term Aryan from a South Asian and Iranian context, to a racialized Northern European term). The map at the top of this post shows the distribution of the Austro-Asiatic languages, as well as their subdivisions. There is clearly an eastern and western wing to the group, but most scholars assume that this is an artifact of the historical eruption of the Burman and Thai peoples out of the southern fringes of the Chinese Empire and into mainland Southeast Asia.

Within India the Austro-Asiatic languages fall into two broad categories: the Munda and the Khasi. The Khasi inhabit the massif which separates Bengal and Assam. Their culture and society are at some variance from the norm in India (they are matrilocal, and animist or Christian). A close relationship to the people to the east is clear in both their language and their physical appearance. The Khasi, and other groups such as the Garo, are of the family of peoples and ethnicities which have arrived from the east and north relatively recently, making the transition from the world of Tibet and Burma to India. This is evident in the face of the Khasi child in the image to the left. Once passing out of their lands of origin these populations have assimilated to different degrees to the Indic domain. The Tripuri people for example retain a Tibeto-Burman language, but are adherents of Vaishnav Hinduism (my own family were once subjects of the Manikya dynasty). The Ahom of Assam were totally assimilated by the Indo-Aryan substrate. Like the Bulgars of Bulgaria their only influence was in the ethnonym that they contributed to their subjects. A quick survey of my own genetics, and those of other South Asians of eastern origin on 23andMe, clearly shows the influence of assimilated Tibeto-Burmans. One Bangladeshi Muslim individual clearly carries an East Asian Y chromosomal haplogroup.

The Munda are a somewhat different case. In older historical literature on South Asia there is some consideration that the Munda may be the earliest inhabitants of India, predating the Dravidians. Some readers of South Asian origin also point out that in the early Indo-Aryan language there may be more evidence of Munda, than Dravidian, influence. But the eastern connections of the Munda languages seem clear, albeit less explicable than those of the Khasi or the Tibeto-Burman peoples of the far northeast. If the Munda are the indigenous people then it stands to reason that the Mon-Khmer languages derive from South Asia. On the other hand the vast majority of the Austro-Asiatic languages exist in Southeast Asia, and the Munda themselves have been hypothesized as being the bearers of rice-culture from the east.

This is where genetics comes into play. There has already been evidence of an eastern influence in the genes of the Munda from other researchers, so what this paper does is look at that in detail, instead of discarding it as a minor effect which muddles the broader picture. I’ve reformatted figure 3 to show how the groups relate to each other. On the left is a PCA. Most of the variance is west-east, ~6%, while some of it is north-south, ~1%. On the right is a bar plot generated from ADMIXTURE. I’ve edited out many of the populations. Focus on the Austro-Asiatic groups from India.


In the PCA you see the SE-NW axis of ANI-ASI admixture which is the primary aspect of genetic variation within South Asia. Numerically Dravidian and Indo-Aryan groups along this axis are the vast majority of South Asians. But the Munda and other Austro-Asiatic groups are not trivial; there are strong suggestions that the eastern Indo-Aryan groups, Oriya, Bengali, and Assamese, are to some extent shaped by influence from the Austro-Asiatic elements. The closer connection of the Khasi to East Asian populations is clear on the PCA. But the fact that the South Indian samples are further along axis-Y than the Munda is indicative of admixture in the Munda population. Looking at the bar plot that’s clear. The dominant dark-green signature of South Indian ancestry is also predominant among the Munda, and found at non-trivial amounts among Iranian, Khasi, and Southeast Asian populations, but the Munda clearly have an eastern component which is not found in South Indians. This is probably the element which perturbs them on the PCA.

But this just tells us the relationships in terms of total genome content. It doesn’t necessarily tell us the historical sequence of admixture events or the direction of migration. In fact the evidence of Indian ancestry in Southeast Asia could be suggesting migration from South Asia to Southeast Asia (there is plenty of cultural evidence of transmission, though the presumption is that the demographic movements were marginal). They note in the paper that one phenomenon which could be obscuring and confusing our understanding is that much of gene flow occurs through isolation-by-distance (IBD), that is, village-to-village dynamics. In contrast to this you have folk wanderings, which result in a “leapfrog” aspect. The Hazara and Uyghur are both cases of leapfrogging, as their genetic makeup can’t be explained easily by IBD. So here the connections between the Munda and Southeast Asians, and the broader relationship between Southeast Asians and South Asians, could be IBD, or perhaps reflect deep ancient common ancestry. Perhaps the ASI group spanned the region from the Arabian Sea to the South China Sea, and were only later overlain by ANI and East Asian populations.
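To make the contrast between IBD and a leapfrog concrete, here is a minimal sketch in Python. It is not the paper's method: the function names, the migration rate, the number of demes, and all frequencies are hypothetical, chosen only for illustration. Under neighbour-to-neighbour gene flow from a fixed source, allele frequency can only decline with distance from that source, whereas a leapfrog leaves a distant group carrying an allele that the groups in between lack.

```python
def stepping_stone(freqs, m, generations):
    """Neighbour-to-neighbour gene flow along a line of demes (villages).

    Deme 0 is held constant as the source. Each generation every other
    deme replaces a fraction m of its gene pool with the average of its
    two neighbours. The last deme has only one real neighbour, so it
    uses its own frequency in place of the missing one.
    """
    p = list(freqs)
    for _ in range(generations):
        nxt = p[:]
        for i in range(1, len(p)):
            left = p[i - 1]
            right = p[i + 1] if i + 1 < len(p) else p[i]
            nxt[i] = (1 - m) * p[i] + m * (left + right) / 2
        p = nxt
    return p


def is_smooth_cline(freqs, tol=1e-12):
    """True if frequency never rises as we move away from the source."""
    return all(a >= b - tol for a, b in zip(freqs, freqs[1:]))


# Hypothetical IBD scenario: six demes, the allele starts only in the source.
cline = stepping_stone([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], m=0.1, generations=50)

# Hypothetical leapfrog: a distant deme carries the allele, the ones between do not.
leapfrog = [1.0, 0.0, 0.0, 0.3]

print("IBD cline:", [round(x, 3) for x in cline], is_smooth_cline(cline))
print("Leapfrog: ", leapfrog, is_smooth_cline(leapfrog))
```

The first pattern should be reported as a smooth cline and the second should not. Real data are noisier, and drift or selection can disturb a cline, so this is only a picture of the logic.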

To explore these questions the authors tunneled down to a more fine-grained scale, and looked at uniparental lineages as well as a gene at which recent selection seems to have operated upon East Asians in distinction to other groups, EDAR. Though uniparental lineages are only partially informative in terms of ancestry, they are very amenable to dating because of their haploid inheritance patterns. And the relationships between the branches of the termini can give us historical information.

The following figure shows the relationship and distribution of a particular Y chromosomal haplogroup which the Munda carry, and other South Asians tend not to, which connects them to the east:


The haplogroup is O2a (M95). The results from the Y chromosomal data are not clear, though they do seem to reject the model whereby Southeast Asian O2a lineages derive from Indian ones. But it does not seem as if you have a scenario where one founder lineage entered into South Asia from Southeast Asia; there are too many disparate branches of O2a found among Indians. Additionally, the coalescence time (back to last common ancestor) is deeper in Southeast Asia, but still deep in South Asia among the Munda. From this it seems that the origin of Austro-Asiatic languages in South Asia can be rejected, but the details of the emergence of Austro-Asiatic in South Asia cannot be clearly perceived as of yet. From what I can gather the authors themselves do not necessarily believe that their results in this domain are robust (insensitive to varying the model’s assumptions even marginally).

An interesting point though is that the mtDNA, the female lineage, does not seem to diverge from other South Asians much at all. I find it intriguing that this is the same pattern we see along the major NW-SE axis of variation. It seems that mtDNA lineages unite South Asians, while the Y lineages separate them (by caste and region). The generality has many exceptions, but it points to a peculiar sex-mediated admixture process from both the northwest and northeast. Men on the move have reshaped the genetics and culture of South Asia, but the mtDNA lineages still point to an ancient Eurasian group with distant but stronger affinities to the east than the west. The mtDNA are likely the purest distillation of ASI.

Finally, they look at frequencies of variants of EDAR among the South Asian groups. EDAR is in some ways diagnostic of East Asian ancestry; it seems that a variant which produces thick straight hair emerged relatively recently among East Asians. Here’s the result from the HGDP browser:


The G allele exhibits co-dominance, so the GA phenotype has intermediate hair-thickness between AA and GG. Haplotype structure based tests of natural selection have indicated that the derived G allele is recent. The map to the right shows the frequency of the derived G variant by population group. The bubble size is proportional to frequency, while the colors represent language groups. Again the Khasi and Tibeto-Burman groups are as you’d expect: they exhibit a relatively high frequency of the derived variant. The Hazara are a group which only came into being within the last 1,000 years through an admixture event. The Tharu seem to have their origins in Nepal’s transitional zone, and all the Nepali populations have significant admixture with Tibetan groups even if they themselves are not Tibetan in language and culture. The interesting result is the Munda. The Dravidian groups lack the derived EDAR variant, as do Indo-European groups without a plausible East Asian source of admixture. But within the Munda the derived variant is found at a frequency of ~5%. This is far lower than the 60% among the Tibeto-Burmans of the northeast, or the 40% among the Khasi, but it is significant. And this result allows the authors to reject the IBD model of connection for Austro-Asiatic groups, because the Munda harbor the variant which other South Asian groups in their environs do not. Gene flow predicated on linguistic affiliation at such a remove seems implausible, so the most parsimonious explanation is that the Munda languages arrived in India from Southeast Asia as part of a leapfrog folk wandering.
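As a rough aside on what a ~5% allele frequency means at the level of individuals, here is a small Python sketch. It assumes Hardy-Weinberg proportions (random mating, no selection, no substructure), which is an assumption of mine and not something the paper reports; the function name is hypothetical.

```python
def hardy_weinberg(g_freq):
    """Expected genotype proportions for a two-allele locus (G derived, A ancestral),
    assuming Hardy-Weinberg equilibrium."""
    a_freq = 1.0 - g_freq
    return {
        "GG": g_freq ** 2,          # two copies of the derived allele
        "GA": 2 * g_freq * a_freq,  # one copy: the intermediate phenotype
        "AA": a_freq ** 2,          # no copies
    }


# ~5% is the approximate Munda frequency quoted in the post.
munda = hardy_weinberg(0.05)
for genotype, share in munda.items():
    print(f"{genotype}: {share:.4f}")
```

Under these assumptions nearly all carriers at such a frequency would be GA heterozygotes, and GG homozygotes would be rare.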

But why the low frequency of the derived variant? Obviously the Munda have admixed with the local substrate, so dilution would be one explanation. Another could be that when the Munda left East Asia the frequency was lower. Additionally, whatever selective forces were driving the frequency up may have abated in South Asia, and it could be that there was selection against the derived variant! Whatever the truth of it, the existence of the derived EDAR variant among the Munda would be like finding the European LCT variant among an East Asian population: clear evidence of long distance gene flow and population movement.
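The dilution explanation can be put as back-of-the-envelope arithmetic. The Python sketch below assumes a simple two-way mix with no drift and no selection, and a local frequency of zero (the post notes that Dravidian groups lack the variant). The frequency in the actual eastern source population is unknown, so the Khasi (40%) and Tibeto-Burman (60%) figures quoted above are used purely as stand-ins; that substitution is my assumption, not the paper's, and the function name is hypothetical.

```python
def implied_eastern_fraction(observed, source, local=0.0):
    """Admixture fraction m solving: observed = m * source + (1 - m) * local."""
    if source == local:
        raise ValueError("source and local frequencies must differ")
    return (observed - local) / (source - local)


observed_munda = 0.05  # approximate derived-allele frequency quoted in the post

# Stand-in source frequencies; the true ancestral value is not known.
scenarios = {
    "Khasi-like source (40%)": 0.40,
    "Tibeto-Burman-like source (60%)": 0.60,
}

for label, source_freq in scenarios.items():
    m = implied_eastern_fraction(observed_munda, source_freq)
    print(f"{label}: implied eastern fraction of about {m:.1%}")
```

The lower the assumed source frequency, the larger the eastern contribution needed to reach ~5%, which is why a lower frequency at the time of departure and dilution are hard to tell apart from this one locus.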

So where does this lead us? First, let me observe that some of the authors on this paper are the same ones who argued for a predominantly indigenous origin for South Asians in the early 2000s based on mtDNA variation. In this paper they seem to be leaning against an indigenous origin for the Munda, or at least refuting the conjecture that the Munda are ur-Indians par excellence. I didn’t go into the details of the coalescence times because they’re rather a mess, but EDAR is probably a “tipping point” in arguing for a relatively recent exogenous origin for the Munda. The strong sex asymmetry in genetic variation is also suggestive; we have plenty of historical examples of genetic leapfrogs occurring through men-on-the-move. The asymmetry also seems to exist among the Khasi and other Tibeto-Burmans in India’s northeast (figure 2 of the paper).

The arguments about the history, culture, and genetics of South Asia have traditionally been disputed along the Aryan-Dravidian axis. I’m not interested in rehashing that aspect, but these data point us to another reality: on India’s northeast frontier there’s another component. As an ethnic Bengali myself I’ve always been somewhat aware of this. Some of my relatives and family acquaintances look much more like Garos than other South Asians. This component is even more evident on the face of Assamese and Nepali, whose languages are Indo-Aryan and religion is Hinduism, but whose appearance bespeaks a more variegated background. On some level South Asians from these regions are aware of their peculiarity, even if it isn’t spoken of much. I have read that in the wake of the victory of Japan over Russia in the early 20th century Bengali intellectuals expressed in public their pride at their Asiatic ancestry. With the rise of China in the 21st century I suspect more South Asians from Nepal, Bengal, and Assam, will rediscover that aspect of their background which links them to the east, and not the west. The genetics is just telling us what we already knew.

Citation: Gyaneshwer Chaubey, Mait Metspalu, Ying Choi, Reedik Mägi, Irene Gallego Romero, Pedro Soares, Mannis van Oven, Doron M. Behar, Siiri Rootsi, Georgi Hudjashov, Chandana Basu Mallick, Monika Karmin, Mari Nelis, Jüri Parik, Alla Goverdhana Reddy, Ene Metspalu, George van Driem, Yali Xue, Chris Tyler-Smith, Kumarasamy Thangaraj, Lalji Singh, Maido Remm, Martin B. Richards, Marta Mirazon Lahr, Manfred Kayser, Richard Villems, & Toomas Kivisild (2010). Population Genetic Structure in Indian Austroasiatic speakers: The Role of Landscape Barriers and Sex-specific Admixture Mol Biol Evol : 10.1093/molbev/msq288

Link acknowledgement: Dienekes Pontikos.

Addendum: This is more a speculative comment, so I will tack this on to the body of the main post. Here’s my current very tentative model for how South Asians came to be. At some point after the last Ice Age 10,000 years ago the ANI arrived, and hybridized with the ASI, who are descendants of the older original Out of Africa wave to South Asia. After this, but before the Aryans, the Munda arrived from the northeast, and pushed into lands inhabited by ANI-ASI groups. 4,000-3,000 years ago the Indo-Aryans arrive, and impose themselves as an elite on the ANI-ASI hybrid population, before being assimilated biologically and imparting their language to the Indian majority. I don’t know where Dravidian came from, but perhaps it was the language of the ANI (its existence in fragments all across the swath of the northern Indian subcontinent is suggestive, as well as possible connections to ancient Elamite, the language of Bronze Age southwest Iran). Eventually the Aryanized ANI-ASI marginalized the Munda in northeast India and drove them to the highlands. Finally, the Tibeto-Burmans arrived in the historical period.

Image Credit: Wikimedia Commons

Razib Khan