The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog


If you have not read my post “To the antipode of Asia” and are unfamiliar with the history, prehistory, and ethnography of mainland Southeast Asia, this might be a good time to do so. In this post I will focus on mainland Southeast Asia, and how it relates implicitly to India and China genetically, and what inferences we can make about demography and history. Though I will touch upon the Malay peninsula in the preliminary results, I have removed the Indonesian and Philippine samples from the data set entirely. This means that in this post I will not touch upon the spread of the Austronesians.

I present before you two tentative questions:

- What was the relationship of the spread of Indic culture to Indic genes in mainland Southeast Asia before 1000 A.D.?

- What was the relationship of the spread of Tai culture to Tai genes in mainland Southeast Asia after 1000 A.D.?

The two maps above show the distribution of Austro-Asiatic and Tai languages in mainland Southeast Asia. Observe that when you join the two together in a union they cover much of the eastern 2/3 of mainland Southeast Asia. The fragmented nature of Austro-Asiatic languages in the northern region, edging into the People’s Republic of China, implies to us immediately that it is likely that in the past there was a continuous zone of Austro-Asiatic speech in this region. From the histories and mythologies of the Tai people we know that this group migrated from the southern fringes of China around 1000 A.D. This is obvious when we note that there are still Tai people in southern China, and the expansion of the Tai across what is today Thailand is to some extent historically attested. Between 1000 and 1500 there was a wholesale ethnic reorganization of the Chao Phraya river basin. Was that a matter of demographic replacement, or cultural assimilation, or some of both?

Second, what was the impact of Indians upon mainland Southeast Asia? One of the easiest ways to ascertain Indian influence is script. Burmese, Thai and Cambodian scripts all derive from Grantha, an archaic Tamil script (non-Islamic scripts in island Southeast Asia, such as Javanese and Balinese, also derive from South Indian precursors). The Indian religious influences also are more southern than northern, manifesting in the southern forms of Shaivite Hinduism and Sri Lankan Theravada Buddhism.

There are three data sets which I looked at. I ran most of them from K = 2 to K = 12. This means that I threw all the individuals into a common pool and told the ADMIXTURE program to estimate their individual proportions of K number of populations. In this way we can get a general sense of the relationships of the populations. Remember that these aren’t necessarily real populations, and, the nature of the variation thrown into the pool impacts the nature of the inferred components greatly. I’m not reporting clear, distinctive, and objective entities extracted out of the data set. We’re looking at human intelligible interpretations of the patterns dependent upon the inputs and parameters. They’re telling us something real, but this isn’t like measuring the acceleration of a falling ball. It’s like describing the position of the ball in relation to a different set of reference objects. There’s a real ball with a specific position, but the descriptions are going to vary depending on what references you use (e.g., to the left of object A and below B, to the right of object C and above object D, etc.).
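To make the “K = 2 to K = 12” sweep concrete, here is a minimal sketch of how such a series of runs could be queued up. It assumes the genotypes sit in a PLINK .bed file (the name pan_asian.bed is hypothetical) and that ADMIXTURE is invoked as `admixture <file> <K>`; the optional `--cv` flag requests cross-validation error, which I am not claiming was used for the runs in this post.

```python
def admixture_commands(bed_path, k_min=2, k_max=12, cross_validate=False):
    """Return one ADMIXTURE command line (as an argument list) per value of K."""
    commands = []
    for k in range(k_min, k_max + 1):
        cmd = ["admixture"]
        if cross_validate:
            cmd.append("--cv")  # optional: report cross-validation error for this K
        cmd += [bed_path, str(k)]
        commands.append(cmd)
    return commands


# "pan_asian.bed" is a hypothetical file name used only for illustration.
for cmd in admixture_commands("pan_asian.bed"):
    print(" ".join(cmd))
```

Each argument list could be handed to `subprocess.run`. Every run yields a table with one row per individual and K columns of ancestry fractions, which is what the bar plots are drawn from.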

Here are the sets:

1) A “large” set which includes the mainland Pan-Asian populations, the white Americans from the HapMap, and some Malay peninsular groups.

2) A “medium” set which prunes most of the North Asian groups, Malaysian groups, and the white Americans. So it’s mostly mainland Southeast Asia, southern China, and India.

3) A “small” set, which removes many of the Southeast Asian populations, but keeps the Indian ones. I purposely overloaded this set with Indians to examine possibilities of Indian admixture in a few Southeast Asian groups.

Some notes. The Pan-Asian data set has ~56,000 markers. This is tolerable, but not optimal. It’s definitely good enough for European vs. Indian vs. East Asian vs. Negrito. But it’s less optimal for intra-regional variation. So take it with a grain of salt. But since I’m looking at Indian vs. East Asian, I’m mildly confident of that finding in relation to this data set. Second, the intersection of white Americans with the Pan-Asian set was ~30,000 markers. For Cambodians it was only ~22,000. There were ~100 white Americans, but only ~11 Cambodians. Be very cautious of the Cambodian results for this reason. Finally, remember that the ancestral components are abstractions, and can imply that stable and long admixed hybrid populations are their own distinct component, as well as isolates which are highly inbred.

There are three analyses and visualizations I will display below.

1) ADMIXTURE bar plots, which show the ancestral proportions of groups or individuals of a particular ancestral element.

2) Fst estimates across ancestral elements. This is a rough summary of genetic distance. I’ll also show you a two dimensional visualization on occasion, but remember that this removes some relationship information. The table is more accurate, though the visualization is easier to read.

3) Finally, I used EIGENSOFT to run some PCAs. This means that I took the pool of data and allowed the program to extract out the independent dimensions of variation. I ran it so that it pulled out the top 6 dimensions. The west-east dimension is always the largest by many multiples. Remember that the plots are not scaled.
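For readers who want to see what “extracting the independent dimensions of variation” amounts to, here is a toy sketch in plain Python. It is not EIGENSOFT, which does a great deal more; it only assumes the usual normalization (center each SNP and scale by the square root of p(1 - p)) and finds the single largest axis by power iteration. The genotypes are made up.

```python
def standardize(genotypes):
    """Center each SNP and scale it by sqrt(p * (1 - p)), p being its allele frequency.

    `genotypes` is a list of individuals, each a list of 0/1/2 allele counts.
    """
    n = len(genotypes)
    columns = []
    for j in range(len(genotypes[0])):
        col = [row[j] for row in genotypes]
        p = sum(col) / (2 * n)
        if p <= 0 or p >= 1:
            continue  # monomorphic SNPs carry no information
        scale = (p * (1 - p)) ** 0.5
        columns.append([(g - 2 * p) / scale for g in col])
    return [[c[i] for c in columns] for i in range(n)]


def first_pc(genotypes, iterations=200):
    """Leading axis of variation across individuals, found by power iteration."""
    rows = standardize(genotypes)
    n = len(rows)
    # Individual-by-individual similarity matrix.
    gram = [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]
    v = [1.0] + [0.0] * (n - 1)  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Toy data (made up): three individuals from each of two diverged groups.
toy = [
    [2, 2, 0, 1], [2, 1, 0, 1], [2, 2, 1, 0],   # group A
    [0, 0, 2, 1], [0, 1, 2, 1], [0, 0, 1, 2],   # group B
]
pc1 = first_pc(toy)
print([round(x, 2) for x in pc1])
```

In this toy case the two groups land on opposite sides of zero along the first axis, which is the same logic by which west and east Eurasians separate on dimension one in the real plots.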

I should also say that the K’s I’m showing are the highest I could reach before inbred subgroups within the reported populations started breaking out into their own components (this happened especially within the Indians).

Starting at the beginning, I have noticed in the Pan-Asian data set that some groups, particularly Mons and Malays, seem to show Indian admixture. My question: is this really Indian admixture, or perhaps recent European admixture? That’s why I had the large data set, with white Americans. Here are the results:

So it seems unlikely that the Mon and Malay admixture with a West Eurasian element is from Europeans. Rather, it is consistent with Indians. In fact, I’m pretty confident it isn’t West Asian either, as is a possibility in the case of the Malays, because that component tends to align with Europeans at this scale. Finally, I will tell you that the admixture in both Mon and Malays is relatively even. In other words, the group estimates aren’t being shifted by one or two highly admixed individuals, which would be a good tell as to recent intermarriage. Not unheard of. Mahathir Mohamad’s paternal grandfather was a Kerala Muslim.
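The “relatively even” point is easy to check once you have per-individual ancestry fractions. Here is a small sketch of the kind of summary I have in mind. The numbers are hypothetical, not values from the data set; the two groups are built to have the same mean so that only the distribution differs.

```python
from statistics import mean, median


def admixture_summary(proportions):
    """Summarize one ancestral component across the individuals of a group."""
    return {
        "mean": mean(proportions),
        "median": median(proportions),
        "max": max(proportions),
        # Share of the group's total carried by the single most admixed person.
        "top_share": max(proportions) / sum(proportions),
    }


# Hypothetical per-individual South Asian fractions, not values from the data set.
even_group = [0.08, 0.10, 0.09, 0.11, 0.10, 0.12]
skewed_group = [0.00, 0.01, 0.00, 0.55, 0.02, 0.02]

for name, group in [("even", even_group), ("skewed", skewed_group)]:
    print(name, admixture_summary(group))
```

When the median sits far below the mean and one person carries most of the total, the group average is being driven by recent intermarriage rather than by old, well-mixed ancestry.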

Now let’s look at the PCA. I’ll focus on dimensions 1, 2, and 3. Remember that these are the three largest dimensions of genetic variance rank ordered. Dimension one is by far the largest, by a factor of at least five usually in these plots. It’s the west vs. east Eurasian dimension.

I’ve highlighted the important bits. Two notes. First, I think you do see the suggestion that the Mon & Malay are shifted toward the Indians, not the Europeans. This is in perfect alignment with the ADMIXTURE result. Second, please note that the “Indian Singapore” population is heterogeneous. It is mostly Tamil, but there are clearly other Indians in the sample, and, some individuals who have Malay or Chinese ancestry.

Additionally, please note in the ADMIXTURE result above the similarity between the Tai and the Zhuang. The Zhuang are China’s second largest ethnic group, and reputedly the source population for the Tai migrations into mainland Southeast Asia. Before I move on, you should have some sense of the locations and ethno-linguistic affinities of some of the more obscure groups:

| Location | Group | Language group |
| --- | --- | --- |
| Northern Thailand | Htin | Austro-Asiatic |
| Northern Thailand | Lawa | Austro-Asiatic |
| Northern Thailand | Mon | Austro-Asiatic |
| Northern Thailand | Palong | Austro-Asiatic |
| Northern Thailand | Plang | Austro-Asiatic |
| Southern China | Wa | Austro-Asiatic |
| Northern Thailand | Yao | Hmong-Mien (Mien) |
| Southern China and Northern Thailand | Hmong | Hmong-Mien |
| Southern China | Zhuang | Tai |
| Northern Thailand | Karen | Tibeto-Burman |
Southern China Jinuo Tibeto-Burman

One aspect which isn’t listed here is the classification of some of these populations as “hill tribes” or not. The Mon and the H’tin are both Austro-Asiatic, but the former are in some ways analogous to the Greeks on mainland Southeast Asia, while the latter are a tribal isolate which has preserved its identity in the hills of northern Thailand. By Greeks, I mean that the Mon have been assimilated or dominated by the Bamar in Burma and the Tai in Thailand, but in both cases have imparted to these groups the essence of Southeast Asian Indic high culture. The Mon were at one point ascendant from the lower Irrawaddy in southern Burma to the lower Chao Phraya basin in Thailand, the terminus of which today is Bangkok. In contrast, groups like the H’tin and Lawa were presumably relatively insulated from Indic influence. The Hmong are relative newcomers to Southeast Asia, which explains their status as animists for example. Finally, you have groups like the Wa which are technically not even Southeast Asian, but are Austro-Asiatic. They should give us a sense of Austro-Asiatics without an Indic imprint.

Let’s move on to step two, the medium data set. I’m removing the white Americans, Malaysians, and North Asian groups. And now I’m including the Cambodians.

Again, the Mon have the Indian component. And so do the Cambodians. Remember that while everyone else has 56,000 SNPs, the Cambodians only have 22,000, so we need to be careful. Though you see this element in the HGDP runs as well. That is, an Indian affiliated component. It’s relatively evenly distributed among the Cambodians, so you can’t chalk it up to a few admixed individuals. Again, you see the similarity between the Zhuang and the Tai. The main difference is that the Tai seem to have admixed with various Southeast Asian groups. That’s to be expected. What surprised me though is that from these results it seems that the Tai expansion was demographically, not just linguistically, dominant. This is clear even in the Bangkok sample. More on this later.

Below are the genetic distances (Fst) between the inferred ancestral groups. The labels give the modal population, and then the language family:

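| | Jinuo_Burman | Htin_Austro | Tai | SouthAsian | Palong_Austro | Hmong |
| --- | --- | --- | --- | --- | --- | --- |
| Jinuo_Burman | 0 | 0.073 | 0.057 | 0.115 | 0.092 | 0.085 |
| Htin_Austro | 0.073 | 0 | 0.03 | 0.088 | 0.065 | 0.06 |
| Tai | 0.057 | 0.03 | 0 | 0.09 | 0.064 | 0.047 |
| SouthAsian | 0.115 | 0.088 | 0.09 | 0 | 0.117 | 0.117 |
| Palong_Austro | 0.092 | 0.065 | 0.064 | 0.117 | 0 | 0.09 |
| Hmong | 0.085 | 0.06 | 0.047 | 0.117 | 0.09 | 0 |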
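One quick way to read a distance table like this without flattening it into two dimensions is to ask, for each component, which other component is closest. The sketch below does that with the values copied from the table above; the helper name `nearest` is just for illustration.

```python
labels = ["Jinuo_Burman", "Htin_Austro", "Tai", "SouthAsian", "Palong_Austro", "Hmong"]

# Pairwise Fst values copied from the table, in the same row and column order.
fst = [
    [0.000, 0.073, 0.057, 0.115, 0.092, 0.085],
    [0.073, 0.000, 0.030, 0.088, 0.065, 0.060],
    [0.057, 0.030, 0.000, 0.090, 0.064, 0.047],
    [0.115, 0.088, 0.090, 0.000, 0.117, 0.117],
    [0.092, 0.065, 0.064, 0.117, 0.000, 0.090],
    [0.085, 0.060, 0.047, 0.117, 0.090, 0.000],
]


def nearest(index):
    """Label of the component with the smallest Fst to the one at `index`."""
    others = [j for j in range(len(labels)) if j != index]
    closest = min(others, key=lambda j: fst[index][j])
    return labels[closest]


for i, label in enumerate(labels):
    print(f"{label}: closest to {nearest(i)}")
```

The Tai component comes out as the nearest neighbor of most of the East Asian components, while the South Asian component is far from everything, which is what you would expect from the west-east split.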

Here are some visualizations:

And here’s the PCA:

In this plot you see both the Mon and Cambodians shifted toward the Indians, again. Also, note the Zhuang and the Tai mostly overlap rather well. The y-axis is defined it seems by Austro-Asiatic hill tribes, then the Tibeto-Burman groups, and a gap until you hit the Tai cluster, which eventually merges with the Hmong. There’s a reasonable language family affinity here, insofar as the Yao are between the Tai and the Hmong.

Finally, we move to the Indo-centric run. I’ve removed a lot of the Southeast Asian groups now. Some of the hill tribes are obviously relatively isolated, and so throw up their own clusters or diverge on PCA rather easily. That’s a function of genetic differences which build up if you are relatively insulated from gene flow. Because I removed so many populations I’m only left with three K’s before you get quasi-family clusters showing up as K’s. Also, I’m going to show you individual bar plots for Cambodians and Mon to illustrate that the Indian component isn’t just isolated instances of admixture:

The Fsts are straightforward in this case:

| | Austro-Asiatic | Tai | South Asian |
| --- | --- | --- | --- |
| Austro-Asiatic | 0 | 0.028 | 0.084 |
| Tai | 0.028 | 0 | 0.085 |
| South Asian | 0.084 | 0.085 | 0 |

It’s the PCA which is really interesting in this run. The first isn’t too exceptional:

OK, first, since this is an Indian focused set, you see that there’s more than the standard west-east dimension. You have several lower order dimensions which separate Indians! I had previously assumed that the Indian component which always shows up in the Cambodians in the HGDP was a function of deep ancient ancestry shared with the “Ancestral South Indians” of Reich et al. This ancient population may have had affinities with many groups out toward Southeast Asia, and so the residual cluster in Cambodians may have been part of the deep Ice Age ancestry of this group. These results convince me that this is not so straightforward an explanation. In this sample the group that has the highest ASI is the Bhils, a tribal population. In one of the plots you see that the Bhils form one end of the distribution, and Gujarat Vaishyas the other. It is clear that this is an Ancestral North Indian-Ancestral South Indian cline. The Mon and Cambodians don’t deviate much from the center, suggesting to me that they aren’t too skewed toward the ASI! Additionally, the “center” of the distribution is weighted toward caste South Indians. This then is a nice resolution, because it dovetails perfectly with the historical evidence for a South Indian specific influence on Southeast Asia in the early historic period.

This isn’t a slam dunk. There need to be estimates of the time since admixture. It should post-date the ANI-ASI admixture event, and be in the same range as the Uyghurs. Unfortunately with only 56,000 SNPs I’m not sure this estimate is possible, but I’ll look into it. Additionally, a deeper survey of Y and mtDNA lineages needs to be done in Southeast Asia. They may show sex-biased migration. I did look for the West Eurasian specific SLC24A5 variant, which goes no lower than ~50% in South India, but that’s not in the Pan-Asian SNP data set. It is in the HGDP, and none of the 11 Cambodians have it. This would lean toward the ASI hypothesis, but seeing as how the West Eurasian variant may only be at ~50%, and the Cambodians are less than 10% South Asian, it isn’t totally implausible that it wouldn’t show up in 22 gene copies (using realistic assumptions I get a ~50% probability that a West Eurasian copy of SLC24A5 wouldn’t be found in the Cambodians with N = 11).
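The back-of-the-envelope calculation behind that last number can be sketched as follows. The assumptions are mine and deliberately simple: each of the 22 gene copies is treated as an independent draw, the variant is taken to be absent from the non-South Asian ancestry, and the chance that any one copy carries it is the South Asian admixture fraction times the variant’s frequency in the source population. The function name is hypothetical.

```python
def prob_variant_absent(n_individuals, admixture_fraction, variant_freq_in_source):
    """Probability that no sampled gene copy carries the variant.

    Assumes independent gene copies and that the variant enters only
    through the admixed (South Asian) fraction of ancestry.
    """
    copies = 2 * n_individuals  # diploid: two gene copies per person
    per_copy = admixture_fraction * variant_freq_in_source
    return (1 - per_copy) ** copies


# Try a few South Asian admixture fractions below the ~10% ceiling.
for fraction in (0.05, 0.06, 0.10):
    print(fraction, round(prob_variant_absent(11, fraction, 0.5), 2))
```

Under these assumptions the probability of seeing no copy at all is roughly one half when the admixture fraction is about 6%, and still about one third at 10%, so the absence of the variant in 11 Cambodians is weak evidence either way.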

I’ve not devoted too much space to the Tai-Zhuang connection in this post, because it’s obvious in the plots. The Tai are obviously somewhat shifted toward Austro-Asiatic groups, but far less than I would have expected. In fact, taking the ADMIXTURE components too literally you might infer that there’s been more Tai admixture into the Mon and Khmer than the other way around! This might not be totally implausible when you consider that Thailand’s population is nearly five times that of Cambodia. But the standard model I’ve read suggests that Tai warrior bands conquered the Mon-Khmer indigenes, and absorbed much of their high culture. These results don’t cohere easily with that in terms of demographics.

I have a possible explanation for what occurred. Much of Thailand may not have been too populous until the past ~1,000 years, with lowland agriculture being driven by elite direction. The Tai may have brought superior agricultural techniques, and so entered into a phase of rapid population expansion into the lowland frontier, which had no parallel during the Mon and Khmer period of dominance. In other words, the Tai bands were small and initially outnumbered by the Mon and Khmer. But through favorable resource direction and priority allocation of newly arable land to co-ethnics the small Tai population might quickly have come to dominate the previous inhabitants. This is the model which is outlined in the Rise of Islam and the Bengal Frontier. In it the author basically argues that eastern Bengal was lightly populated until large scale Muslim elite driven projects to open up the agricultural frontier. The recruited peasants were either Muslim or converted to Islam, because the cultural landscape was relatively fluid and unsettled, in contrast to the more static peasant economy of western Bengal, which remained Hindu. The Islamicization of eastern Bengal in this model had less to do with the conversion of native tribes, and more to do with the rapid demographic expansion of Bengali peasant colonies which were enabled by agricultural projects, colonies which were Islamicized or were drawn from the minority Muslim peasantry of the western zone by Mughal elites intent on creating a region where the Hindu upper castes were marginalized. Similarly, the Tai expansion in Southeast Asia may have been into a de facto “empty” landscape. During the period when Mon and Khmer high culture was absorbed the Tai may have been the smaller element in terms of numbers. The current ratios are a function of later social and demographic processes.

(Republished from Discover/GNXP by permission of author or representative)

I am currently reading Victor Lieberman’s magisterial Strange Parallels: Volume 2, so I was very interested in a new paper from BMC Genetics, Genetic structure of the Mon-Khmer speaking groups and their affinity to the neighbouring Tai populations in Northern Thailand, pointed to by Dienekes today. Here are the results and conclusions:

A large fraction of genetic variation is observed within populations (about 80% and 90 % for mtDNA and the Y-chromosome, respectively). The genetic divergence between populations is much higher in Mon-Khmer than in Tai speaking groups, especially at the paternally inherited markers. The two major linguistic groups are genetically distinct, but only for a marginal fraction (1 to 2 %) of the total genetic variation. Genetic distances between populations correlate with their linguistic differences, whereas the geographic distance does not explain the genetic divergence pattern.

The Mon-Khmer speaking populations in northern Thailand exhibited the genetic divergence among each other and also when compared to Tai speaking peoples. The different drift effects and the post-marital residence patterns between the two linguistic groups are the explanation for a small but significant fraction of the genetic variation pattern within and between them.

There are many occasions when it has taken a synthetic scholar to point out to me the overall structure of a constellation of facts which I was conscious of prior. So it is with Lieberman’s work. I had known that the eruption of the Thai peoples into Southeast Asia occurred within the last 1,000 years, before which the peninsula was divided between Tibeto-Burman populations to the west and Austro-Asiatic languages to the east (the latter divided between the Khmer and Vietnamese). Additionally, it is presumed that the Tibeto-Burman languages themselves displaced Austro-Asiatic in the western zone (as evident by the persistence of Mon in modern Burma). What was noted in volume 1 of Strange Parallels though is that the three geographical regions engaged with and assimilated the Thai invasions differently. In the center the Thai succeeded in dominating the previous groups and imposing their identity upon the region. It is often asserted that modern Cambodia’s existence as an independent state is a function of the protection conferred upon it by the French from the expansive ambitions of the Empire of Siam. But in the east the Vietnamese state was barely impacted by the Thai folk wandering. As in China the Thai in Vietnam are marginalized “mountain tribes.” Finally, in the west, in the zone which became Burma, the Thai did not take over the cultural commanding heights. But neither were they absolutely marginalized as in the east. Rather, the Shan people became part of the Burmese landscape, integrated into the Theravada Buddhist culture, but also a significant secondary ethnos to the Burman majority (along with Karens, Mons, etc.).

What does this have to do with genetics? Possibly everything and nothing, and all answers in between.

The massive shift in ethno-linguistic identity in the center of mainland Southeast Asia, its lack in the east, and position at the equipoise in the west, should be excellent tests of propositions as to the nature of the spread of such ethno-linguistic identities. Is it pure construction, demographic replacement, or some quantitative combination of the two parameters? Unfortunately the BMC Genetics paper focuses only on Y chromosomes and mtDNA, the paternal and maternal lineage. These markers are informative, but I’d rather look at total genome content. The ethnic coverage in a small area of northern Thailand though is impressive. The open circles represent Mon-Khmer ethnic groups, the dark ones Thai. The Mon-Khmer are the presumed indigenes, while the Thai are intrusive. At least over the past 1,000 years.

Below I’ve reedited the Y and mtDNA multidimensional scaling plots. The Y is on the left, and mtDNA on the right. The clustering pattern shows relationships across the lineages. Again, the open markers represent Mon-Khmer groups, and the closed ones Thai.

Since the paper is open access I invite you to read their interpretations. All I’d say is that the clustering of male Thai lineages is very interesting, and is well explained by the model of groups of related men being intrusive to a region, and taking wives from the indigenes. In contrast the Mon-Khmer Y chromosomal lineages scatter about more, and that may be due to the fact they coalesce back to common ancestors far further back in history. The intrusion of the Thai into Southeast Asia may then be demographically characterized by a migration of male warbands. In regions where these warbands managed to topple the previous order, as in central mainland Southeast Asia, they may have then monopolized access to women and entered into a period of demographic expansion.

Luckily we do have some thick-marker autosomal data. To the left I’ve reedited a figure generated with the HUGO Pan-Asian data. The bar plot is at K = 14. I’ve excised many of the extraneous populations. The colors within the bar plot correspond to associations with broader language families. So red seems to be Austro-Asiatic, while blue is Thai. You can see in the figure that the Chinese Thai lack the red Mon-Khmer component. Interestingly the Hmong of upland Southeast Asia, who are culturally marginal to the dominant Theravada Buddhist culture of the lowlands, exhibit evidence of very sharp differentiation from the Thai and the Austro-Asiatic groups. They lack the affinity with island Southeast Asians, Malays, and Taiwanese Aborigines, which seems common amongst the South Chinese more broadly. The Karen of Thailand are probably the best proxy we have for the Tibeto-Burman people of Burma, who post-date the Austro-Asiatic, and predate the Thai. Going by these data it looks as if the Karen are very hard to differentiate from the Austro-Asiatic populations, though very distinctive from the Thai.

The Pan-Asian data set leaves a lot to be desired. There’s not much coverage of the east or west. I suspect that Southeast Asia is going to be somewhat complex, and extrapolating from the correlations between languages and genes in Thailand is going to get us only so far. But it’s a start. In Strange Parallels the author makes the case that mainland Southeast Asia can tell us a lot about generic Eurasian historical process. I hope, and suspect, that it can tell us something more general about the interplay between language and genes over time in other regions as well.

(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"