The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Southeast Asia


Lipson, Mark, et al. “Reconstructing Austronesian population history in Island Southeast Asia.” Nature communications 5 (2014).

There was a question below about the relationship of the Cambodians to the Vietnamese. Both these groups speak Austro-Asiatic languages. As can be inferred from the figure to the left, these languages are predominant in Indo-China. Traditionally, as noted in the comments, Cambodian is classed with the Mon language of Burma, under Mon-Khmer (as is faintly evident on the map, Mon languages seem to have been spoken in what is today central Thailand before their replacement by Thai). There are also Austro-Asiatic languages on the fringes of southern China, though again the map is very illuminating, as the pattern of fragmentation is often indicative of marginalization and language replacement. Finally, there are Austro-Asiatic languages spoken in India, among the Nicobarese, and among the indigenous people of central Malaysia, who are often termed Negritos. As I have explored in depth elsewhere, there is now strong suggestive evidence from the genetics that the Austro-Asiatic languages are intrusive to South Asia from Southeast Asia. As the Negritos of the Philippines speak the language of nearby Austronesian agriculturalists, just as the Pygmies of the Congo speak the languages of their agriculturalist neighbors, it seems likely that Malaysian Negritos received their language from agriculturalists.

Peter Bellwood fleshes out most of the details of how agriculture came to Southeast Asia in First Farmers (highly recommended, though it’s a little out of date now in some areas). At an archaeological site in northern Vietnam Bellwood describes burial grounds dating to 4,000 years in the past where two distinct groups are evident in the remains. One set of skeletons resembles modern East Asians morphologically, while the other element exhibits broad similarities to Near Oceanian peoples. He terms these “Austro-Melanesians.” Frankly, I think this is a confusing term. Though it seems likely that these groups are part of the broader range of populations which gave rise to modern Southeast Asian Negritos, like Papuans and Australian Aborigines they were in no way diminutive. So terming them “Proto-Negrito” would seem misleading. Therefore, I will term them “Ancestral Southeast Asians,” or ASA. The genetics points to the likelihood that a substantial minority of the ancestry of modern Southeast Asians derives from the ASA, in various quanta.

The best paper I know of in relation to the genetic history of Southeast Asia, maritime and mainland, is Reconstructing Austronesian population history in Island Southeast Asia. They used the PanAsian data set, which is somewhat thin on SNPs (<100,000), and also spotty in population coverage. The figure above shows one of the primary results. It seems that agriculture came to Southeast Asia in two major waves. First, with Austro-Asiatic peoples. And later, with Austronesians. The latter seem to have settled maritime Southeast Asia, where archaeological evidence of agriculture is thin to nonexistent before they arrived. But, as you can see from the figure many maritime Southeast Asian peoples also have signatures of Austro-Asiatic ancestry. The likely case then is that they picked this up en route, though there may also have been indigenous people in the islands when they arrived. But curiously, not in the east. There a Melanesian ancestral component is present, which has affinities to that contributed to modern Filipino ancestry from Negritos. The 2011 paper which posits two distinct elements before agriculture between mainland Southeast Asia and Papua would make sense of this pattern. The division probably followed Huxley’s Wallace Line.

As I said above, the PanAsian data set is spotty on population coverage. There are lots of obscure tribes, but not so much when it comes to the numerous people of mainland Southeast Asia. I have some data to probe these questions. Unfortunately not all of it is public, so I can’t release it (though some of it is from the 1000 Genomes, Estonian Biocentre, and HGDP, so you can find much of it elsewhere).

The data set has 150,000 SNPs, with ~0 missingness (I just removed anything that had missing calls). I labeled samples from countries without ethnic provenance by those nation names. Additionally, I already did some preliminary outlier removal (e.g., removing Filipinos with non-trivial European ancestry, etc.).
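The zero-missingness filter described above can be sketched in a few lines. This is only an illustration under assumptions: the genotype matrix layout (samples in rows, SNPs in columns, allele counts 0/1/2) and the `-1` missing code are hypothetical choices, not the format of the actual data set.

```python
import numpy as np

def drop_snps_with_missing_calls(genotypes, missing=-1):
    """Keep only SNP columns in which every sample has a call.

    genotypes: 2-D integer array, rows = samples, columns = SNPs,
               entries 0/1/2 (allele counts) or `missing` (assumed code).
    Returns the filtered matrix and a boolean mask of the kept SNPs.
    """
    genotypes = np.asarray(genotypes)
    # True for a column only if no sample carries the missing code there.
    complete = ~(genotypes == missing).any(axis=0)
    return genotypes[:, complete], complete

# Toy example: 3 samples x 4 SNPs; the second and fourth SNPs each
# have one missing call, so only the first and third survive.
toy = np.array([
    [0, 1, 2, -1],
    [1, -1, 2, 0],
    [2, 0, 1, 1],
])
filtered, kept = drop_snps_with_missing_calls(toy)
```

Dropping any SNP with even one missing call is the strictest possible threshold; it is affordable here only because enough markers remain afterward.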

Let me give you plots of PC 1 to 4 below. Click to enlarge.

The first plot shows Indians and Papuans away from a cluster of Southeast and Northeast Asians. There are a lot of Southern Chinese from the 1000 Genomes, as well as Koreans. The Burmese are the first out toward Indians. The cluster that is pushing itself toward Papuans is the Filipinos. This makes sense in light of what we know about Philippine Negritos. They are probably not descended from ASA, but rather a sister population, highly diverged, and with greater affinities to the peoples of Near Oceania. While the first plot shows PCs which separate both Southeast and Northeast Asians from Indians and Papuans, the second plot separates Southeast Asians among themselves. The north to south axis seems to align with a cline of Austro-Asiatics. The axis east to west runs toward Austronesians. Intriguingly there are three Indonesian samples which span the two axes, exactly in line with the results of the paper above. Vietnamese and Dai are pulled more toward the Cambodians. Toward the top of the plot are Koreans, while the very dense cluster includes Southern Chinese, as well as assorted Southern Chinese ethnic minorities. There are a few Malaysian samples in there. Unlike the Indonesians they are drawn much closer to the Southern Chinese cluster, but not quite in it. These may be Baba Chinese. I was surprised there weren’t more Overseas Chinese in these data. But there were some. It’s interesting that the Indians are close to the Chinese cluster, rather far from the Cambodians. I think that the Cambodian cline is probably indicative of ASA ancestry fraction.

To get more clarity a bigger plot with more labels:


The Lahu are a group that is in both southern China and Burma. You see the influence of geography, as they are right below the Burmese samples. Thailand is interesting, because one of the individuals is close to the Cambodians. The others sit atop the Dai and the Vietnamese. With only four individuals you can only say so much. But I think we’re seeing a cline or structure within Thailand. Some regions are linguistically Thai, but have barely any genetic footprint from the Tai migrations. In other regions the impact of the newcomers might have been overwhelming. Until we get more samples from indigenous Thai we can’t say much. For all I know the three individuals clustering with the Dai and Vietnamese are of partial Chinese heritage.

Let’s look at the first plot zoomed in with some more labels:


You see now that one of the Indonesians seems to have a lot of Papuan-like admixture. They may be from the east. Another is very like the Burmese. And a few are clearly Overseas Chinese, and a last is shifted toward the Indians. The Burmese in these samples really seem to be two groups. I bet the ones further away from the Dai are ethnic Burmans or Mon. The others may be Shan, the descendants of Tai, at least in part.

I’m going to run TreeMix now. I will remove the Overseas Chinese, and the Malaysians. I’ll also remove the Thailand samples, as I don’t know whether they’re Chinese or not (I could run ADMIXTURE, but I’m tired). I rooted it with Papuan and did 5 migrations. Plots below.
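For readers who want to try something similar, here is a rough sketch of how a TreeMix run with a Papuan root and 5 migration edges might be assembled. The input file name and output prefix are hypothetical placeholders, and the function only builds the argument list; it does not run anything. The flags used (`-i`, `-root`, `-m`, `-o`) are standard TreeMix options, but the exact invocation used for the plots here is not stated, so treat this as an assumption.

```python
def treemix_command(input_path, out_prefix, root="Papuan", migrations=5):
    """Return the argument list for one TreeMix run (nothing is executed)."""
    return [
        "treemix",
        "-i", input_path,        # gzipped allele-count table, one column per population
        "-root", root,           # population used to root the tree
        "-m", str(migrations),   # number of migration edges to fit
        "-o", out_prefix,        # prefix for the output files
    ]

# Hypothetical file names, for illustration only.
cmd = treemix_command("southeast_asia.treemix.gz", "SoutheastAsiaOut")
print(" ".join(cmd))
```

The list could be handed to `subprocess.run(cmd, check=True)` on a machine where TreeMix is installed.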

[Figures: TreeMix plots SoutheastAsiaOut.1 to SoutheastAsiaOut.10]

OK, the strangest thing you’ll notice is that Burmese are placed near the Koreans, but they seem to have admixture from a population with a lot of Indian affinities. The Shan show this pattern, but much more mildly. I think this actually answers even more fully what’s going on among East Bengalis. The Burmese are a compound population, of newcomers from the north, rather deep into China, and the long resident Austro-Asiatic population. Mons. It was probably a pretty singular pulse event to get picked up so well. These later went to eastern India, and mixed with the proto-Bengalis, who probably already had some Austro-Asiatic Munda ancestry. Additionally, the connection between the Burmese and Naxi minority in China is not surprising; they speak related languages. The Miao are near the Southern Chinese. They are also quite closely related to the Hmong, who are a southern branch. The gene flow between the Filipinos and Indonesians makes sense. These are the two Austronesian groups. Finally, notice that the Dai are closest to the Chinese, then the Vietnamese, and most distant the Cambodians.

Let’s make the TreeMix plots cleaner by removing some of the groups:

[Figures: TreeMix plots SoutheastAsiaTOut.1 to SoutheastAsiaTOut.5 and SoutheastAsiaTOut.7 to SoutheastAsiaTOut.10]


Alright, what’s the upshot of these plots?

1) We see that the Cambodians are a hybrid of a population like the Dai of South China, and something somewhat Indian-like, but not totally.

2) The Vietnamese have a very faint gene flow from the Cambodians. Some of the samples which were Vietnamese look to me like they were ethnic minorities. I left them in because that is part of Vietnam’s genetics. They may simply be assimilated minorities or Khmer.

3) The gene flow into the Burmese is between Indian-like and Cambodian-like. It’s hard to parse out the distinctions, but there’s probably recent gene flow into Burma (I removed a few outliers as it is). Additionally, the ASA ancestry in the Cambodians may be higher than in the Burmese, because the latter have had more dilutions. But remember that I think there’s Indian ancestry in the Cambodians too.

4) The Negrito ancestry among Filipinos is pretty obvious in the gene flow.

• Category: Science • Tags: Southeast Asia 

Reconstructing Austronesian population history in Island Southeast Asia, Mark Lipson, Po-Ru Loh, Nick Patterson, Priya Moorjani, Ying-Chin Ko, Mark Stoneking, Bonnie Berger, David Reich doi: 10.1101/005603

One of the strangest aspects of human history is the fact that periodically groups on the margins seem to rise to the fore and enter into a phase of rapid expansion into virgin territory. By “virgin” I don’t necessarily mean uninhabited, but rather virgin in relation to the mode of production which defines the expansionary group. A classic illustration of this is the rise of the Anglo-Saxon Diaspora between 1600 and 1900, as it settled territories inhabited by other populations at much lower population densities. The Bantu Expansion is another case in point. What you see in both cases is the migration of a population which has found a way to produce more calories per unit of land, and the weight of numbers resulted in the marginalization and/or absorption of the native populations, to varying degrees. In Anglo North America and Oceania the admixture of indigenous ancestry is relatively low, at least into European populations. In East and Southern Africa the admixture of non-Bantu populations is definitely somewhat higher.

Austronesian expansion

This dynamic has old roots in our lineage. It goes back at least to the rise of modern humanity on the fringe of Africa 50 to 100 thousand years ago, and its subsequent expansion across the world (with some assimilation of older hominin lineages). A more recent case is the Austronesian expansion out of Taiwan, which encompasses a longitudinal gradient from East Africa all the way to South America, and a latitudinal one from Hawaii to New Zealand. Even today I suspect people would be impressed by this, but it is all the more amazing when you observe that modern humans seem to have stabilized their range in Near Oceania for ~30,000 years. Unlike the “first farmers” of the Middle East the expansion of the Austronesians had less to do with a mode of production, than pioneering navigational skills and a lack of all sanity and rationality when it came to venturing across great expanses of water.

The question of why a small group of Southeast Asian people in Taiwan began to move in a manner which would trigger a world-wide cultural and demographic revolution is still an open one. But a second issue which can be explored is the nature of who these seafarers came into contact with. Of course most of the discussion has been around the uptake of Melanesian admixture in Near Oceania. A second question for me has always been the nature of the dominance of Austronesians in maritime Southeast Asia. Basically, Indonesia and Malaysia. The mainland of Southeast Asia was dominated by Austro-Asiatic peoples until the arrival of Tai, Miao, and Tibeto-Burman groups over the past few thousand years. Did Austro-Asiatics populate maritime Southeast Asia at one point? A preprint on bioRxiv aims to explore this question, Reconstructing Austronesian population history in Island Southeast Asia:

Austronesian languages are spread across half the globe, from Easter Island to Madagascar. Evidence from linguistics and archaeology indicates that the “Austronesian expansion,” which began 4-5 thousand years ago, likely had roots in Taiwan, but the ancestry of present-day Austronesian-speaking populations remains controversial. Here, focusing primarily on Island Southeast Asia, we analyze genome-wide data from 56 populations using new methods for tracing ancestral gene flow. We show that all sampled Austronesian groups harbor ancestry that is more closely related to aboriginal Taiwanese than to any present-day mainland population. Surprisingly, western Island Southeast Asian populations have also inherited ancestry from a source nested within the variation of present-day populations speaking Austro-Asiatic languages, which have historically been nearly exclusive to the mainland. Thus, either there was once a substantial Austro-Asiatic presence in Island Southeast Asia, or Austronesian speakers migrated to and through the mainland, admixing there before continuing to western Indonesia.

In the discussion the authors clearly come down on the side that Austronesian and Austro-Asiatic admixture occurred prior to the settlement of maritime Southeast Asia. Though their marker set couldn’t be used to infer the timing of the admixture event, the relative evenness of admixture in western Southeast Asia and the archaeological evidence seem to point to the idea that Austro-Asiatic speakers did not push past peninsular Malaysia (where there are Austro-Asiatic speakers in the interior among the Negrito populations). This has always struck me as strange, because obviously the island of Java has been amenable to widespread rice farming, and Indonesia today is as populous as all of mainland Southeast Asia. But it seems that the spread of populations over water can be highly contingent, and not inevitable. For example, barbarian incursions into mainland Italy often stopped at the straits which separated the continent from Sicily, and the Vandal adoption of seafaring has always been somewhat mysterious. Though there has been gene flow across Gibraltar, it does seem that the existence of a water barrier has resulted in a major genetic discontinuity. And yet tens of thousands of years ago in prehistory the ancestors of the Australian and Melanesian peoples crossed from Sundaland to Sahul.

• Category: Science • Tags: Austro-Asiatic, Austronesians, Southeast Asia 

Malaysian “Negritos,” presumably the indigenous people of the Malay peninsula

A few days ago Dienekes pointed to a paper which reports on the presence of anatomically modern humans in China 80-100,000 years before the present. I say “anatomically modern” because there is a presumable distinction between populations which resemble moderns in their gross morphology, which first emerged in southern and eastern Africa 100 to 200 thousand years ago (and were dominant all across the world after 40,000 years before the present), and “behaviorally modern” societies, which exhibit the protean symbolic cultural expression that is the hallmark of humanity. The paper reporting on such old specimens is not particularly revolutionary. Rather, it’s part of a growing corpus which contributes to a “counter-narrative” to the dominant model, whereby behaviorally modern humans swept across Eurasia (and Australia) ~50,000 years B.P. after the “Out of Africa” event. Obviously the problem here is that if there were anatomically modern humans in China tens of thousands of years before this expansion, were they replaced? Or is the chronology wrong? (e.g., the mutation rate controversy, though please note that the dominant model has many physical anthropologists who support it as well). On Twitter I pointed out to Aylwyn Scally that we do have evidence of substantial population replacement across East and Southeast Asia.

The recent human genetics results out of China, Japan, and Southeast Asia suggest to me that these populations are simply too close genetically to have roots prior to the Holocene (i.e., before ~10,000 years before the present). But there’s another indication that there was relatively recent population expansion and replacement: physically very distinctive “Negrito” populations are still found in the most remote areas of Southeast Asia, in Malaysia and the Philippines. The Reich group also reported a few years ago that these Negrito populations themselves exhibit population substructure, with the Philippine group having deep ancient affinities to the likely first settlers of Sahul, and the Malaysians being closer to Andaman Islanders. And just because the Negritos of Malaysia are reputed to be the aboriginal people, one cannot discount the possibility that they are not, but instead replaced even earlier populations (this is an implication of the Reich group’s results if I read it right). If we update our prior as to the likelihood of demographic displacements, then a ‘solution’ to the paradox of shallow convergence of populations in time in relation to the archaeological record may simply be that older populations did not contribute much to present lineages. This does not mean a zero contribution. Recall that the Neandertal admixture results became almost definitive only with dense marker sets, and an ancient reference sequence. Diverged H. sapiens sapiens groups will not be quite as diverged, so they may have left a legacy, but at such a low level that current data sets and techniques do not have the power to detect them.

A Palimpsest

But we don’t need to focus on prehistory. The recent history and semi-history of Southeast Asia is complex, and filled with cultural and demographic events which suggest great changes in the distribution of populations. The best outline I have read of this is Victor Lieberman’s peculiarly titled Strange Parallels, which outlines the rise of mainland Southeast Asian polities between 800 and 1800. The most important transformation of the past 1,200 years in mainland Southeast Asia has been the rise of the Dai/Thai peoples, and the recession of the Mon-Khmer groups. The dominant language of Thailand and Laos, as well as the highlands of eastern Burma, originally derives in historical time from a South Chinese set of ethnic groups, the Dai. As the Han Chinese pushed southward over the first millennium A.D. the indigenous populations either assimilated, or reacted by organizing their own polities (e.g., Nan-Chao). Ultimately the resistance was futile, and the Chinese conquest of their homelands helped precipitate a mass out-migration of Dai, into the lands of the Khmer. What is today Thailand was once part of greater Cambodia. To the east in Vietnam a somewhat less dramatic phenomenon occurred, as the Vietnamese (Kinh) pushed south along the coast, eventually absorbing or assimilating the Khmer of the lower Mekong. Further south, in maritime Southeast Asia, the situation is more confused. My own belief is that it is likely that in these regions before the arrival of the Austronesians Austro-Asiatic languages were dominant. But by the time we have written records Austronesian dialects were universal, with only the Negritos of interior Malaysia retaining Austro-Asiatic as their first language.

One way we can further explore this issue is through genetics. For example, here are some results from the Harappa data set. I’ve posted only the most relevant ancestral components, and pruned the populations.*

Ethnicity         “Indian”   “SE Asian”   “NE Asian”   “Siberian”
Iban              5%         87%          0%           4%
Malay-Singapore   11%        72%          6%           3%
Cambodian         11%        71%          10%          3%
Dai-China         0%         69%          30%          0%
Vietnamese        2%         62%          35%          0%
Thai              14%        61%          12%          3%
Lahu              3%         55%          37%          3%
Miao              0%         37%          61%          1%
Han-Singapore     0%         36%          63%          0%
Han-South         0%         33%          66%          0%
Burman            17%        28%          42%          6%
Han-Beijing       0%         19%          76%          3%
Santhal           72%        17%          0%           1%
Naxi              4%         15%          73%          7%
Japanese          1%         11%          74%          11%
Mongola           0%         7%           62%          23%
Bengali           47%        6%           5%           2%


Some quick comments. First, note that though there are differences among the Han Chinese, the gap is much larger between indigenous South China ethnicities (Dai) and the South Chinese than between the latter and the North Chinese. Second, of all the indigenous Southeast Asian groups the Burman samples stand out. Why? Unlike their eastern neighbors the Burman population speaks a Tibeto-Burman language. The prefix “Tibeto” suggests affinities with peoples from the north and west fringes of China, and that is often part of the origin legends of these people. Those legends seem correct. Not only are there cultural affinities, but these results suggest that the exogenous Burmans contributed substantially to the demographic makeup of the populace. One difficult aspect of Southeast Asian genetics is that there seem to be two South Asian affiliated components. It is likely that the Cambodians are reflecting a very ancient admixture event. For the Burmans some of this is likely the case as well, but some of the admixture is almost certainly recent (e.g., the original file shows that some Burmans have “Baloch,” which is a tell for more recent South Asian gene flow). Separating these two may not be easy, but it is necessary if one is to get a good grasp of the impact of historical South Asian migration. And in both Burma and Malaysia the issue is complicated by both ancient and medieval migrations, and more recent colonial era settlement from India. Finally, with the Iban, an indigenous Austronesian group, you see that they are the “most Southeast Asian” of the populations listed above. It seems plausible here that this is partly a function of isolation and lack of cosmopolitanism; the Malays have had both Indian and Chinese admixture.

This is just scratching the surface of the last 4 to 5 thousand years. How plausible is it that we’ll have a neat story about the settlement of Southeast Asia >5,000 years before the present, back to 50,000 years? Looking at extant genetic variation it will be difficult, and population coverage, marker density, and methodological precision all need to be maximized. At this point I am not surprised we are confused and unable to tell a neat story.

* Filtered for at least 5% “Southeast Asian,” N >= 5, and removed uninteresting or duplicate populations


Dienekes has touched upon it in detail, so I don’t have much to add. Except for two points:

1) The ancestry cline here is not due to isolation-by-distance, but the expansion of the Austronesian population rather precipitously ~4,000 years ago. As Dienekes observed this was rather clear by non-genetic means; this is just icing on the cake.

2) There is evidence of an Austro-Asiatic substrate across maritime Southeast Asia. For whatever reason it seems that Austro-Asiatic speaking agriculturalists ceased their push east at the Wallace Line.


If you have not read my post “To the antipode of Asia”, this might be a good time to do so if you are unfamiliar with the history, prehistory, and ethnography of mainland Southeast Asia. In this post I will focus on mainland Southeast Asia, and how it relates implicitly to India and China genetically, and what inferences we can make about demography and history. Though I will touch upon the Malay peninsula in the preliminary results, I have removed the Indonesian and Philippine samples from the data set in totality. This means that in this post I will not touch upon spread of the Austronesians.

I present before you two tentative questions:

– What was the relationship of the spread of Indic culture to Indic genes in mainland Southeast Asia before 1000 A.D.?

– What was the relationship of the spread of Tai culture to Tai genes in mainland Southeast Asia after 1000 A.D.?

The two maps above show the distribution of Austro-Asiatic and Tai languages in mainland Southeast Asia. Observe that when you join the two together in a union they cover much of the eastern 2/3 of mainland Southeast Asia. The fragmented nature of Austro-Asiatic languages in the northern region, edging into the People’s Republic of China, implies to us immediately that it is likely that in the past there was a continuous zone of Austro-Asiatic speech in this region. From the histories and mythologies of the Tai people we know that this group migrated from the southern fringes of China around 1000 A.D. This is obvious when we note that there are still Tai people in southern China, and the expansion of the Tai across what is today Thailand is to some extent historically attested. Between 1000 and 1500 there was a wholesale ethnic reorganization of the Chao Phraya river basin. Was that a matter of demographic replacement, or cultural assimilation, or some of both?

Second, what was the impact of Indians upon mainland Southeast Asia? One of the easiest ways to ascertain Indian influence is script. Burmese, Thai and Cambodian scripts all derive from Grantha, an archaic Tamil script (non-Islamic scripts in island Southeast Asia, such as Javanese and Balinese, are also derived from South Indian precursors). The Indian religious influences also are more southern than northern, manifesting in the southern forms of Shaivite Hinduism and Sri Lankan Theravada Buddhism.

There are three data sets which I looked at. I ran most of them from K = 2 to K = 12. This means that I threw all the individuals into a common pool and told the ADMIXTURE program to estimate their individual proportions of K number of populations. In this way we can get a general sense of the relationships of the populations. Remember that these aren’t necessarily real populations, and, the nature of the variation thrown into the pool impacts the nature of the inferred components greatly. I’m not reporting clear, distinctive, and objective entities extracted out of the data set. We’re looking at human intelligible interpretations of the patterns dependent upon the inputs and parameters. They’re telling us something real, but this isn’t like measuring the acceleration of a falling ball. It’s like describing the position of the ball in relation to a different set of reference objects. There’s a real ball with a specific position, but the descriptions are going to vary depending on what references you use (e.g., to the left of object A and below B, to the right of object C and above object D, etc.).
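To make the K = 2 to K = 12 sweep concrete, here is a small sketch that builds one ADMIXTURE command per value of K. The input file name `pool.bed` is a hypothetical placeholder and nothing is executed; the sketch relies only on ADMIXTURE’s basic `admixture <input.bed> <K>` usage and on its convention of writing ancestry fractions to `<basename>.<K>.Q` in the working directory.

```python
import os

def admixture_commands(bed_path, k_min=2, k_max=12):
    """One argument list per K, from k_min to k_max inclusive (not executed)."""
    return [["admixture", bed_path, str(k)] for k in range(k_min, k_max + 1)]

def q_file_name(bed_path, k):
    """Name of the ancestry-fraction file ADMIXTURE writes for a given K."""
    base = os.path.splitext(os.path.basename(bed_path))[0]
    return f"{base}.{k}.Q"

# Hypothetical input file, for illustration only.
cmds = admixture_commands("pool.bed")
for cmd in cmds:
    print(" ".join(cmd))
```

Each `.Q` file holds one row per individual and K columns that sum to 1, which is what the bar plots summarize.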

Here are the sets:

1) A “large” set which includes the mainland Pan-Asian populations, the white Americans from the HapMap, and some Malay peninsular groups.

2) A “medium” set which prunes most of the North Asian groups, Malaysian groups, and the white Americans. So it’s mostly mainland Southeast Asia, southern China, and India.

3) A “small” set, which removes many of the Southeast Asian populations, but keeps the Indian ones. I purposely overloaded this set with Indians to examine possibilities of Indian admixture in a few Southeast Asian groups.

Some notes. The Pan-Asian data set has ~56,000 markers. This is tolerable, but not optimal. It’s definitely good enough for European vs. Indian vs. East Asian vs. Negrito. But it is less than optimal for intra-regional variation. So take it with a grain of salt. But since I’m looking at Indian vs. East Asian, I’m mildly confident of that finding in relation to this data set. Second, the intersection of white Americans with the Pan-Asian set was ~30,000 markers. For Cambodians it was only ~22,000. There were ~100 white Americans, but only ~11 Cambodians. Be very cautious of the Cambodian results for this reason. Finally, remember that the ancestral components are abstractions, and can imply that stable and long admixed hybrid populations are their own distinct component, as well as isolates which are highly inbred.

There are three analyses and visualizations I will display below.

1) ADMIXTURE bar plots, which show the ancestral proportions of groups or individuals of a particular ancestral element.

2) Fst estimates across ancestral elements. This is a rough summary of genetic distance. I’ll also show you a two dimensional visualization on occasion, but remember that this removes some relationship information. The table is more accurate, though the visualization is easier to read.

3) Finally, I used EIGENSOFT to run some PCAs. This means that I took the pool of data and allowed the program to extract out the independent dimensions of variation. I ran it so that it pulled out the top 6 dimensions. The west-east dimension is always the largest by many multiples. Remember that the plots are not scaled.
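As a rough illustration of what an Fst number in item 2 measures, here is a sketch of one common two-population estimator (Hudson’s, combined across SNPs as a ratio of sums). This is a swapped-in, generic estimator for sampled populations, not the calculation ADMIXTURE performs between inferred ancestral components; the allele frequencies and sample sizes below are made up for the example.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson-style Fst between two populations, as a ratio of sums over SNPs.

    p1, p2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled chromosomes (2 x individuals) per population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Squared frequency difference, corrected for finite sample size.
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Probability that one allele drawn from each population differs.
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

# Made-up frequencies: fixed differences give the maximum value,
# identical frequencies give a value at or slightly below zero.
fst_fixed = hudson_fst([1.0, 1.0], [0.0, 0.0], n1=20, n2=20)
fst_same = hudson_fst([0.5, 0.3], [0.5, 0.3], n1=100, n2=100)
```

Values near 0 mean the two groups are nearly indistinguishable at these markers, and larger values mean more divergence, which is how the distance tables are meant to be read.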

I should also say that the K’s I’m showing are the highest values reached before inbred subgroups within the reported populations started breaking out into their own components (this happened especially within the Indians).

Starting at the beginning, I have noticed in the Pan-Asian data set that some groups, particularly Mons and Malays, seem to show Indian admixture. My question: is this really Indian admixture, or perhaps recent European admixture? That’s why I had the large data set, with white Americans. Here are the results:

So it seems unlikely that the Mon and Malay admixture with a West Eurasian element is from Europeans. Rather, it is consistent with Indians. In fact, I’m pretty confident it isn’t West Asian either, as is a possibility in the case of the Malays, because that component tends to align with Europeans at this scale. Finally, I will tell you that the admixture in both Mon and Malays is relatively even. In other words, the group estimates aren’t being shifted by one or two highly admixed individuals, which would be a good tell as to recent intermarriage. Not unheard of. Mahathir Mohamad’s paternal grandfather was a Kerala Muslim.

Now let’s look at the PCA. I’ll focus on dimensions 1, 2, and 3. Remember that these are the three largest dimensions of genetic variance rank ordered. Dimension one is by far the largest, by a factor of at least five usually in these plots. It’s the west vs. east Eurasian dimension.
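For intuition about what the dimensions are, here is a bare-bones PCA on a genotype matrix using plain NumPy. It is a sketch under assumptions, not what EIGENSOFT does internally: genotypes are assumed to be coded as 0/1/2 allele counts with no missing data, and each SNP is centered and scaled by its binomial standard deviation before a singular value decomposition. The toy matrix is invented.

```python
import numpy as np

def genotype_pca(genotypes, n_components=6):
    """Return sample coordinates and variance fractions for the top PCs.

    genotypes: 2-D array, rows = samples, columns = SNPs, entries 0/1/2.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0            # per-SNP allele frequency
    keep = (p > 0) & (p < 1)            # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # center and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    k = min(n_components, len(s))
    coords = u[:, :k] * s[:k]           # sample positions along each PC
    explained = s ** 2 / np.sum(s ** 2) # fraction of variance per PC
    return coords, explained[:k]

# Invented toy data: samples 0-1 and samples 2-3 form two groups.
toy = np.array([
    [0, 0, 2, 2],
    [0, 0, 2, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 1],
])
coords, explained = genotype_pca(toy, n_components=2)
```

In the toy data the first dimension separates the two groups and accounts for most of the variance, which mirrors how dimension one dominates in the plots here.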

I’ve highlighted the important bits. Two notes. First, I think you do see the suggestion that the Mon & Malay are shifted toward the Indians, not the Europeans. This is in perfect alignment with the ADMIXTURE result. Second, please note that the “Indian Singapore” population is heterogeneous. It is mostly Tamil, but there are clearly other Indians in the sample, and some individuals who have Malay or Chinese ancestry.

Additionally, please note in the ADMIXTURE result above the similarity between the Tai and the Zhuang. The Zhuang are China’s second largest ethnic group, and reputedly the source population for the Tai migrations into mainland Southeast Asia. Before I move on, you should have some sense of the locations and ethno-linguistic affinities of some of the more obscure groups:

Location                              Group    Language group
Northern Thailand                     Htin     Austro-Asiatic
Northern Thailand                     Lawa     Austro-Asiatic
Northern Thailand                     Mon      Austro-Asiatic
Northern Thailand                     Palong   Austro-Asiatic
Northern Thailand                     Plang    Austro-Asiatic
Southern China                        Wa       Austro-Asiatic
Northern Thailand                     Yao      Hmong-Mien (Mien)
Southern China and Northern Thailand  Hmong    Hmong-Mien
Southern China                        Zhuang   Tai
Northern Thailand                     Karen    Tibeto-Burman
Southern China                        Jinuo    Tibeto-Burman

One aspect which isn’t listed here is the classification of some of these populations as “hill tribes” or not. The Mon and the Htin are both Austro-Asiatic, but the former are in some ways analogous to the Greeks on mainland Southeast Asia, while the latter are a tribal isolate which has preserved its identity in the hills of northern Thailand. By Greeks, I mean that the Mon have been assimilated or dominated by the Bamar in Burma and the Tai in Thailand, but in both cases have imparted to these groups the essence of Southeast Asian Indic high culture. The Mon were at one point ascendant from the lower Irrawaddy in southern Burma to the lower Chao Phraya basin in Thailand, the terminus of which today is Bangkok. In contrast, groups like the Htin and Lawa were presumably relatively insulated from Indic influence. The Hmong are relative newcomers to Southeast Asia, which explains their status as animists for example. Finally, you have groups like the Wa which are technically not even Southeast Asian, but are Austro-Asiatic. They should give us a sense of Austro-Asiatics without an Indic imprint.

Let’s move on to step two, the medium data set. I’m removing the white Americans, Malaysians, and North Asian groups. And now I’m including the Cambodians.

Again, the Mon have the Indian component. And so do the Cambodians. Remember that while everyone else has 56,000 SNPs, the Cambodians only have 22,000, so we need to be careful. Though you see this element in the HGDP runs as well. That is, an Indian affiliated component. It’s relatively evenly distributed among the Cambodians, so you can’t chalk it up to a few admixed individuals. Again, you see the similarity between the Zhuang and the Tai. The main difference is that the Tai seem to have admixed with various Southeast Asian groups. That’s to be expected. What surprised me though is that from these results it seems that the Tai expansion was demographically, not just linguistically, dominant. This is clear even in the Bangkok sample. More on this later.

Below are the genetic distances between the inferred ancestral groups. The labels give the modal population, and then the language family:

               Jinuo_Burman  Htin_Austro  Tai    SouthAsian  Palong_Austro  Hmong
Jinuo_Burman   0             0.073        0.057  0.115       0.092          0.085
Htin_Austro    0.073         0            0.03   0.088       0.065          0.06
Tai            0.057         0.03         0      0.09        0.064          0.047
SouthAsian     0.115         0.088        0.09   0           0.117          0.117
Palong_Austro  0.092         0.065        0.064  0.117       0              0.09
Hmong          0.085         0.06         0.047  0.117       0.09           0
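To make concrete why a two-dimensional picture of a table like this loses information, here is a minimal sketch using classical multidimensional scaling (MDS) on the Fst values above. MDS is swapped in as a generic technique and is not necessarily how the plots in this post were made. Treating Fst as if it were an ordinary distance is itself an assumption, and the function name `classical_mds` is hypothetical.

```python
import numpy as np

labels = ["Jinuo_Burman", "Htin_Austro", "Tai", "SouthAsian", "Palong_Austro", "Hmong"]
fst = np.array([
    [0.000, 0.073, 0.057, 0.115, 0.092, 0.085],
    [0.073, 0.000, 0.030, 0.088, 0.065, 0.060],
    [0.057, 0.030, 0.000, 0.090, 0.064, 0.047],
    [0.115, 0.088, 0.090, 0.000, 0.117, 0.117],
    [0.092, 0.065, 0.064, 0.117, 0.000, 0.090],
    [0.085, 0.060, 0.047, 0.117, 0.090, 0.000],
])

def classical_mds(dist, dims=2):
    """Embed points in `dims` dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    # Double-centre the squared distances to get an inner-product matrix.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (dist ** 2) @ centring
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the largest eigenvalues; clip negatives that arise when the
    # input is not an exact Euclidean distance matrix.
    order = np.argsort(eigvals)[::-1][:dims]
    top_vals = np.clip(eigvals[order], 0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

coords = classical_mds(fst)

# Distances implied by the 2-D picture, compared with the table itself.
approx = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
worst = np.abs(approx - fst).max()
print(f"largest gap between plotted and tabulated distance: {worst:.3f}")
```

Whatever `worst` comes out as, any nonzero gap is the "relationship information" that a flat plot throws away, which is why the table remains the more accurate reference.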

Here are some visualizations:

And here’s the PCA:

In this plot you see both the Mon and Cambodians shifted toward the Indians, again. Also, note the Zhuang and the Tai mostly overlap rather well. The y-axis is defined it seems by Austro-Asiatic hill tribes, then the Tibeto-Burman groups, and a gap until you hit the Tai cluster, which eventually merges with the Hmong. There’s a reasonable language family affinity here, insofar as the Yao are between the Tai and the Hmong.

Finally, we move to the Indo-centric run. I’ve removed a lot of the Southeast Asian groups now. Some of the hill tribes are obviously relatively isolated, and so throw up their own clusters or diverge on PCA rather easily. That’s a function of genetic differences which build up if you are relatively insulated from gene flow. Because I removed so many populations I’m only left with three K’s before you get quasi-family clusters showing up as K’s. Also, I’m going to show you individual bar plots for Cambodians and Mon to illustrate that the Indian component isn’t just isolated instances of admixture:

The Fsts are straightforward in this case:

                Austro-Asiatic  Tai    South Asian
Austro-Asiatic  0               0.028  0.084
Tai             0.028           0      0.085
South Asian     0.084           0.085  0

It’s the PCA which is really interesting in this run. The first isn’t too exceptional:

OK, first, since this is an Indian focused set, you see that there’s more than the standard west-east dimension. You have several lower order dimensions which separate Indians! I had previously assumed that the Indian component which always shows up in the Cambodians in the HGDP was a function of deep ancient ancestry with the “Ancestral South Indians” of Reich et al. This ancient population may have had affinities with many groups out toward Southeast Asia, and so the residual cluster in Cambodians may have been part of the deep Ice Age ancestry of this group. These results convince me that this is not so straightforward an explanation. In this sample the group that has the highest ASI are the Bhils, a tribal population. In one of the plots you see that the Bhils form one end of the distribution, and Gujarat Vaishyas the other. It is clear that this is an Ancestral North Indian-Ancestral South Indian cline. The Mon and Cambodians don’t deviate much from the center, suggesting to me that they aren’t too skewed toward the ASI! Additionally, the “center” of the distribution is weighted toward caste South Indians. This then is a nice resolution, because it dovetails perfectly with the historical evidence for a South Indian specific influence on Southeast Asia in the early historic period.

This isn’t a slam dunk. There need to be estimates of the time since admixture. It should post-date the ANI-ASI admixture event, and be in the same range as the Uyghurs. Unfortunately with only 56,000 SNPs I’m not sure this estimate is possible, but I’ll look into it. Additionally, a deeper survey of Y and mtDNA lineages needs to be done in Southeast Asia. They may show sex-biased migration. I did look for the West Eurasian specific SLC24A5 variant, which goes no lower than ~50% in South India, but that’s not in the Pan-Asian SNP data set. It is in the HGDP, and none of the 11 Cambodians have it. This would lean toward the ASI hypothesis, but seeing as how the West Eurasian variant may only be at ~50%, and the Cambodians are less than 10% South Asian, it isn’t totally implausible that it wouldn’t show up in 22 gene copies (using realistic assumptions I get a ~50% probability that a West Eurasian copy of SLC24A5 wouldn’t be found in the Cambodians with N = 11).
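The back-of-the-envelope probability in that parenthesis can be sketched as follows. The exact assumptions behind the ~50% figure aren’t spelled out above, so this is one plausible reconstruction rather than the original calculation: it assumes the variant enters only through South Asian ancestry, that its expected frequency is the admixture fraction times the source frequency, and that the 22 gene copies are independent draws. The function name `prob_allele_unseen` is hypothetical.

```python
def prob_allele_unseen(admixture_fraction, source_allele_freq, n_individuals):
    """Chance that a variant is absent from every gene copy in the sample.

    Assumes the variant arrives only via the admixing source, and that each
    of the 2 * n_individuals gene copies is an independent draw.
    """
    expected_freq = admixture_fraction * source_allele_freq
    gene_copies = 2 * n_individuals
    return (1.0 - expected_freq) ** gene_copies

# Source frequency of ~50% and N = 11 come from the text; the admixture
# fractions below are illustrative values under the stated "less than 10%".
for fraction in (0.05, 0.06, 0.10):
    p = prob_allele_unseen(fraction, 0.5, 11)
    print(f"{fraction:.0%} South Asian ancestry: P(no copy in 22) = {p:.2f}")
```

Under these assumptions a full 10% South Asian ancestry gives roughly a one-in-three chance of seeing no copies, while fractions around 5-6% push the chance to about one half, so a ~50% figure is what you would expect if the true admixture sits comfortably below the 10% ceiling.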

I’ve not devoted too much space to the Tai-Zhuang connection in this post, because it’s obvious in the plots. The Tai are obviously somewhat shifted toward Austro-Asiatic groups, but far less than I would have expected. In fact, taking the ADMIXTURE components too literally you might infer that there’s been more Tai admixture into the Mon and Khmer than the other way around! This might not be totally implausible when you consider that Thailand’s population is nearly five times that of Cambodia. But the standard model I’ve read suggests that Tai warrior bands conquered the Mon-Khmer indigenes, and absorbed much of their high culture. These results don’t cohere easily with that in terms of demographics.

I have a possible explanation for what occurred. Much of Thailand may not have been too populous until the past ~1,000 years, with lowland agriculture being driven by elite direction. The Tai may have brought superior agricultural techniques, and so entered into a phase of rapid population expansion into the lowland frontier, which had no parallel during the Mon and Khmer period of dominance. In other words, the Tai bands were small and initially outnumbered by the Mon and Khmer. But through favorable resource direction and priority allocation of newly arable land to co-ethnics the small Tai population might quickly have come to dominate the previous inhabitants. This is the model which is outlined in the Rise of Islam and the Bengal Frontier. In it the author basically argues that eastern Bengal was lightly populated until large scale Muslim elite driven projects to open up the agricultural frontier. The recruited peasants were either Muslim or converted to Islam, because the cultural landscape was relatively fluid and unsettled, in contrast to the more static peasant economy of western Bengal, which remained Hindu. The Islamicization of eastern Bengal in this model had less to do with the conversion of native tribes, and more to do with the rapid demographic expansion of Bengali peasant colonies which were enabled by agricultural projects, colonies which were Islamicized or were drawn from the minority Muslim peasantry of the western zone by Mughal elites intent on creating a region where the Hindu upper castes were marginalized. Similarly, the Tai expansion in Southeast Asia may have been into a de facto “empty” landscape. During the period when Mon and Khmer high culture was absorbed the Tai may have been the smaller element in terms of numbers. The current ratios are a function of later social and demographic processes.

Negrito, Philippines. Credit: Ken Ilio

In the post below I mentioned that the Malaysian and Philippine Negritos seem to be two very distinct populations. This was something I wanted to explore in more detail, so I naturally decided to poke around the Pan-Asian SNP data set. The aims are made somewhat more difficult by the fact that there are only ~56,000 markers in the data set (as opposed to ~600,000 in the HGDP and more than 1 million in the HapMap). Additionally, the intersection with other data sets is small. For example, only ~20,000 SNPs overlap with the HGDP. With all that in mind I hazarded that something is better than nothing. Relatives and HapMap populations were removed from the data set (thanks Zack). Additionally, I beefed up the South Asian populations with the Gujaratis from the HapMap, which had an intersection of ~32,000 SNPs. After a few test runs I decided to remove the Mlabri. They always shook out very early as a separate population from many others nearby, and their genetic distances were very high. This tribe numbers only in the hundreds, and I wouldn’t be surprised if they’ve been subjected to a lot of population bottlenecks, resulting in some very distinctive allele frequencies.

But before I move to the results, let’s back up for a moment. Who are the “Negritos”? As suggested by the term, Negrito refers to a range of populations which are characterized by small size and African-like features (very dark skin and frizzy hair). In general their distribution is limited to Southeast Asia (there are suggestions that a Negrito population may only recently have gone extinct in Australia’s rainforests, but that’s speculative. On a more antique scale there are records which may be interpreted to suggest the existence of Negritos in Taiwan as late as 1900, and in southern China within the past 1,000 years). So you can bracket their distribution from the Andaman Islands to the Philippines, with isolated groups in the Malay peninsula. Negritos are presumed to be the original inhabitants of Southeast Asia before the arrival of rice farmers from the north. Like the Pygmies of Africa most of the Negritos speak languages which are known in other populations. Those of the Philippines speak Austronesian dialects. Interestingly those of Malaysia speak an Austro-Asiatic language, and so have affinities with many groups to their north linguistically, despite being surrounded by Austronesian speakers. Only the Andaman Islanders have a distinctive language, which makes sense seeing as how they have been relatively isolated from mainland Asian influences.

I ran ADMIXTURE from K = 4 to K = 12. K = 8 seemed the most informative to me (at higher K’s the major dynamic is that the Philippine Negritos start fragmenting into many distinct clusters). I’ve made a few cosmetic changes. With this East and Southeast Asia heavy data set there’s almost no difference between all the various Indian groups, so I amalgamated them together. I also did the same for related populations geographically adjacent which exhibited no genetic difference (e.g., Central and East Javanese).

The distinctiveness of the two Negrito groups is rather obvious. But what makes it even more obvious are the Fst values between the inferred clusters.

                     NE Asian  P Negrito  Hmong  Melanesian  S Asian  M Negrito  Austro-Asiatic  Austronesian
NE Asian             0         0.101      0.046  0.099       0.095    0.098      0.07            0.045
Philippines Negrito  0.101     0          0.11   0.105       0.113    0.124      0.108           0.096
Hmong                0.046     0.11       0      0.108       0.107    0.099      0.072           0.058
Melanesian           0.099     0.105      0.108  0           0.099    0.113      0.104           0.1
S Asian              0.095     0.113      0.107  0.099       0        0.104      0.103           0.108
Malaysian Negrito    0.098     0.124      0.099  0.113       0.104    0          0.086           0.098
Austro-Asiatic       0.07      0.108      0.072  0.104       0.103    0.086      0               0.062
Austronesian         0.045     0.096      0.058  0.1         0.108    0.098      0.062           0
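A matrix of this size is easy to interrogate directly rather than by eye. The short sketch below transcribes the table above and asks two questions of it: which pair of clusters is farthest apart, and which other cluster is nearest to each one. The variable names are my own, and nothing here goes beyond the tabulated values.

```python
import numpy as np

labels = ["NE Asian", "Philippines Negrito", "Hmong", "Melanesian",
          "S Asian", "Malaysian Negrito", "Austro-Asiatic", "Austronesian"]
fst = np.array([
    [0.000, 0.101, 0.046, 0.099, 0.095, 0.098, 0.070, 0.045],
    [0.101, 0.000, 0.110, 0.105, 0.113, 0.124, 0.108, 0.096],
    [0.046, 0.110, 0.000, 0.108, 0.107, 0.099, 0.072, 0.058],
    [0.099, 0.105, 0.108, 0.000, 0.099, 0.113, 0.104, 0.100],
    [0.095, 0.113, 0.107, 0.099, 0.000, 0.104, 0.103, 0.108],
    [0.098, 0.124, 0.099, 0.113, 0.104, 0.000, 0.086, 0.098],
    [0.070, 0.108, 0.072, 0.104, 0.103, 0.086, 0.000, 0.062],
    [0.045, 0.096, 0.058, 0.100, 0.108, 0.098, 0.062, 0.000],
])

# Largest entry in the matrix and the pair of clusters it belongs to.
i, j = np.unravel_index(np.argmax(fst), fst.shape)
print(f"largest Fst: {labels[i]} vs {labels[j]} = {fst[i, j]}")

# Nearest other cluster for each row (the zero diagonal is masked out).
masked = fst + np.diag(np.full(len(labels), np.inf))
nearest = {labels[r]: labels[int(np.argmin(masked[r]))] for r in range(len(labels))}
for name in ("Philippines Negrito", "Malaysian Negrito"):
    print(f"{name}: closest cluster is {nearest[name]}")
```

On these numbers the maximum is the 0.124 between the two Negrito clusters, and their nearest neighbours are the Austronesian and Austro-Asiatic clusters respectively.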

What’s clearly evident here is that the largest genetic distance across any two inferred populations is between the Malaysian and Philippine Negrito clusters! Let’s visualize the relationships a bit:

Observe that the Philippine Negrito cluster tends to have an affinity for Austronesians and the Malaysian Negrito one has one for the Austro-Asiatics. You can tell from the different tribes that there’s varied admixture with Austronesians in the former case, but what about the Malaysian Negritos (who are of the Jehai and Kensiu affiliation)? Here’s the individual bar plot:

There’s a substantial Austro-Asiatic admixture in this population. We have seen before that ADMIXTURE’s inferred population clusters are only a rough guide to the real populations which existed in the past. The reality is that the history of the human race has been characterized by repeated fissions and fusions of populations, so all ADMIXTURE clusters are going to be composites anyway depending on the chronology which you are using. The affinity of the Malaysian and Philippine Negrito clusters to the ones geographically close to them suggests to me that they’re telling us about ancient admixture events which have been recombined to the point that a new composite population naturally falls out of the algorithm. It may be that after Taiwan the Philippines was the first major landfall of the original Austronesians, so they may have had a substantial impact on the Negritos very early on. This is already clear in most of the Negrito tribes even with ADMIXTURE. In the case of the Malaysian Negritos the admixture is less extreme, but the fact that they speak an Austro-Asiatic language points to hybridization.

I also ran some principal component analyses. Basically each dimension represents an independent axis of variation in the genetic data set. The first component of variation is clearly the west-east one in Eurasia. Note that it’s 5 times larger than the second dimension. I’ve highlighted the two Negrito populations. Observe that in general they don’t manifest a particularly close relationship:

So what’s the moral of this story? Don’t judge a book by its cover! Up until the waters rose with the end of the last Ice Age ~12,000 years ago the southeast region of Eurasia, and out toward Oceania, were far less fragmented. Biogeographically you have Sundaland where Malaysia and western Indonesia are today. To the east there was Sahul, which combined Australia and New Guinea. Expanding out to India and southeast to Australia physical anthropologists in the 20th century posited the Australoid race. Using ADMIXTURE and PCA you can see shadows of an Ice Age Southeast Eurasian race which extended from India to Australia. Shadows because outside of Australia and Melanesia it is either a thin submerged layer, or it persists as residual tribes such as the Ati of the Philippines or the Semang of Malaysia.

Malaysian Negritos ~1900

But these populations had their own population structure and distinctions. My bet would be that the Malaysian Negritos are closer to the Onge of the Andaman Islands, and that these two groups emerged out of a western branch of Sundalanders. The Philippine Negritos are from an eastern branch, and their closer affinity to Melanesians may be due to longstanding gene flow across the two populations (another option is that very old Austronesian admixture in both has shifted their non-Austronesian components closer together in allele frequency). Overall, I suspect that the past ~10,000 years have been radically different from what came before insofar as population replacement and expansion has occurred on a scale never before seen due to the demographic impact of agriculture. The remaining Negritos may be the tip of the iceberg in terms of the genetic diversity which disappeared as the hunters gave way to the farmers.

Image credit: Maximilian Dörrbecker

The Pith: the genetic relationships between bacteria in our stomach can tell us a lot about the relationships between various groups of people. Additionally, the distribution of different strains of bacteria may have significant public health implications.

The above image is from a paper which was pushed online yesterday in PLoS ONE: Evolutionary History of Helicobacter pylori Sequences Reflect Past Human Migrations in Southeast Asia. It’s a paper which caught my attention for several reasons. First, I’ve exhibited some curiosity about the history and prehistory of Southeast Asia of late. Elucidating this region’s historical dynamics may bear upon more general questions of human evolutionary and cultural process. Second, H. pylori is a fascinating organism whose connection to specific human populations is tight enough that it can shed light on past interactions of different groups. In short, just like humans H. pylori exhibits regional specificity and local history. But additionally, H. pylori is also subject to natural selection after introduction into a new population, and so can serve as a window upon cultural contacts which might otherwise leave a light demographic footprint. In other words, the spread of H. pylori across human populations may be compared to the spread of Buddhism. This religion came to China and Japan with some Buddhists of South and Central Asian origin, but by and large its spread was memetic rather than through natural increase of a Buddhist population.

First, let’s hit the abstract:

The human population history in Southeast Asia was shaped by numerous migrations and population expansions. Their reconstruction based on archaeological, linguistic or human genetic data is often hampered by the limited number of informative polymorphisms in classical human genetic markers, such as the hypervariable regions of the mitochondrial DNA. Here, we analyse housekeeping gene sequences of the human stomach bacterium Helicobacter pylori from various countries in Southeast Asia and we provide evidence that H. pylori accompanied at least three ancient human migrations into this area: i) a migration from India introducing hpEurope bacteria into Thailand, Cambodia and Malaysia; ii) a migration of the ancestors of Austro-Asiatic speaking people into Vietnam and Cambodia carrying hspEAsia bacteria; and iii) a migration of the ancestors of the Thai people from Southern China into Thailand carrying H. pylori of population hpAsia2. Moreover, the H. pylori sequences reflect iv) the migrations of Chinese to Thailand and Malaysia within the last 200 years spreading hspEasia strains, and v) migrations of Indians to Malaysia within the last 200 years distributing both hpAsia2 and hpEurope bacteria. The distribution of the bacterial populations seems to strongly influence the incidence of gastric cancer as countries with predominantly hspEAsia isolates exhibit a high incidence of gastric cancer while the incidence is low in countries with a high proportion of hpAsia2 or hpEurope strains. In the future, the host range expansion of hpEurope strains among Asian populations, combined with human motility, may have a significant impact on gastric cancer incidence in Asia.

H. pylori can be separated into very distinctive lineages of geographically limited scope, despite some horizontal gene flow. One clade seems generally restricted to western Eurasia, another to eastern Eurasia, and there are some Africa specific lineages as well. But within these particular clades one can drill down to a finer grain. For example, there are Indian lineages within the broader west Eurasian family of strains. As mutation over time results in the build up of distinctive variants in localized populations, a simple assessment of mutational steps between lineages can allow one to infer a tree of descent from a common ancestor.
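As a rough illustration of how counting mutational steps can yield a tree, here is a minimal sketch that measures pairwise differences between aligned sequences and then repeatedly merges the closest groups (UPGMA, chosen here only because it is the simplest distance-based method, and not necessarily what the paper’s authors used). The sequences are invented toy strings, not real H. pylori data, and the function names are hypothetical.

```python
from itertools import combinations

def mutational_steps(a, b):
    """Count sites at which two aligned sequences differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def upgma(sequences):
    """Build a nested-tuple tree by repeatedly merging the closest clusters."""
    # Each cluster is (subtree, names of the strains it contains).
    clusters = [(name, (name,)) for name in sequences]

    def average_distance(c1, c2):
        pairs = [(m1, m2) for m1 in c1[1] for m2 in c2[1]]
        total = sum(mutational_steps(sequences[m1], sequences[m2]) for m1, m2 in pairs)
        return total / len(pairs)

    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters = [c for c in clusters if c is not c1 and c is not c2]
        clusters.append(((c1[0], c2[0]), c1[1] + c2[1]))
    return clusters[0][0]

# Hypothetical toy alignment: four made-up strains, ten sites each.
toy = {
    "strain_A": "ACGTACGTAC",
    "strain_B": "ACGTACGTAA",
    "strain_C": "ACGAACGTTA",
    "strain_D": "TCGAACCTTA",
}
tree = upgma(toy)
print(tree)
```

With this toy input, strains A and B (one step apart) pair off first, C and D (two steps apart) pair off next, and the two pairs join last. Real analyses have to contend with recombination and unequal rates, which this sketch ignores.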

Let’s tack for a moment to some history without microbial goodness. To some extent Southeast Asia can be considered part of “Greater India,” more or less. This is most evident in Thailand and Cambodia, two nations which are cultural heirs to the Khmer civilization which produced Angkor Wat. The religious and artistic sensibilities of both these modern societies are deeply imprinted by South Asian norms through that precursor polity. The Theravada Buddhism of these societies still has a vital connection to South Asia (especially Sri Lanka) and is more obviously Indian in its sensibility than for example the Zen sect of Japan (which derives from Chinese Chan). In Vietnam there remains a small group of Malay Cham Saivite Hindus, the remnants of the Champa Empire.

The affinities in maritime Southeast Asia are a bit clouded because of the interposition of Islam between moderns and the Dharmic past. Only the Balinese remain as a vital living heir to the Indian-influenced polities of early Indonesia, Srivijaya and Majapahit. Despite this notional reality the Indian influence remains discernible even among Muslim Indonesians, in particular in East Java, where shadow puppet shows of the Ramayana remain popular. Like Angkor Wat, Borobudur in Java is a testament to the monumental Indian past. But even the avowed Islamic flavor of modern maritime Southeast Asia may have some Indian connection, insofar as there is the possibility that South Asian Muslims were critical players in the eastern Indian Ocean trade network which slowly Islamicized over the course of the second millennium.

We are then presented with the question: if the Indian influence in Southeast Asia was so strong in the past, where are the genes of Indians? The authors note that mitochondrial DNA analyses, the maternal lineage, show no South Asian specific lineages in appreciable frequencies among native populations. A fixation on mtDNA seemed rather strange to me for two reasons. First, with the PanAsian SNP data set there’s some autosomal data. Second, there are strong reasons to suppose that Indian migrants would be male. The myths and sketchy historical references of this period don’t seem to envisage mass folk migrations, where Indian men bring their women and children and recreate their homelands. Rather, often these men are portrayed as religious specialists or military leaders of genius. The authors note that there is evidence of Indian artisans in Thailand ~2,000 years ago. This is eminently plausible, there are references to towns of Indian merchants in Sumeria ~4,000 years ago! But again, there is no reason that these artisans necessarily brought their wives. Rather, if they were purchased for their skills they may simply have been the human property which was the object of capitalist transactions between two autocrats.

The nature of cultural transfer, and the relatively high fidelity of that transfer, implies to me that some Indians did migrate to Southeast Asia. But they were few, and their genetic impact was minimal. Rather, what we see is the power of memes to operate very differently from genes. The Indian memes rapidly swallowed up the cultural commanding heights, and became normative from Java to northern Thailand (northern Vietnam is the exception to this rule, as it was influenced by China).

H. pylori shares many of the same tendencies as memes, despite its more concrete biological character. As bacteria it can spread rapidly within a population, and decouple itself from the endogenous natural increase of its original hosts. That spread can be driven by natural selection which means that it isn’t a good representation of the ancestry of its hosts. But even natural selection can’t erase the inferences one can make about original contacts between distinct groups.

In this paper the authors present evidence from the nature of H. pylori in Southeast Asia that there was tangible physical contact between Indians and Southeast Asians in the antique past. More precisely, below is a figure which shows the nature of relationships of west Eurasian H. pylori lineages in India and Southeast Asia, with European and other west Eurasian samples as a control.

What you see here is that Indian H. pylori is basal to the Southeast Asian branches, though within the same clade against the European lineages. This tells you that there’s an affinity between Indian and Southeast Asian lineages under consideration here, but that that affinity is diminished by a period of separation. This matters because some regions of Southeast Asia, such as Malaysia, have a large Indian population which arrived in the past few centuries. The fact that there is a distinct Southeast Asia specific lineage suggests that there has been a long period of separation between the two populations, and one can’t attribute the frequency of the west Eurasian Indian H. pylori simply to recent contacts. At least in most of Southeast Asia. It turns out that in the Philippines the west Eurasian H. pylori cluster with Spanish populations. This has to be the outcome of hundreds of years of colonialism.

There’s also this fascinating historical and geographical tidbit:

A study on the distribution of H. pylori virulence factor cagA among Vietnamese identified 84% of the strains harbouring the type II of the cag-right motif…which is characteristic for East Asian strains (hpEastAsia), ranging from 76% in Ho Chi Minh city in South Vietnam to 93% in Hanoi in North Vietnam. However, there was a remarkable difference in the frequency of cag-right motif of type I which is predominant in European (hpEurope) strains. While the type I motif was absent from North Vietnam, it was found in 8/49 (16%) of the samples from Ho Chi Minh city near the Mekong delta. Interestingly, prior to annexation by the Vietnamese in the 17th century, this city was an important Khmer sea port known as Prey Nokor…Thus, hpEurope strains also seem to be frequent among Vietnamese in the Mekong delta, and thus the Annamite mountain range that originates in the Tibetan and Yunnan regions of southwest China and forms Vietnam’s border with Laos and Cambodia seem to have shaped an effective natural barrier for the containment of Indian influence in the Mekong basin, explaining the low prevalence of hpEurope strains elsewhere in Vietnam.

The geographic contours of the nation-state of Vietnam as we understand it today are a relatively new phenomenon. The Vietnamese people, the Kinh, are an ancient nation. But for most of the past ~2,000 years what we know as Vietnam was divided between the Kinh in the north, and the Khmers and later Austronesian Chams in the center and south. Unlike the other peoples of Southeast Asia the Kinh looked to the north, to China, as their cultural model. While India’s influence in Southeast Asia (excepting the Chola adventures) has been through “soft power,” the Chinese have periodically ruled Vietnam directly, and otherwise placed it into the category of tributary state.

There needn’t be any geographical determinism here. Projection of cultural or military power declines in proportion to distance. In relation to culture that projection does not decline linearly, but often exhibits a sharp break. The Vietnamese did not move the Annamite range south when they defeated Champa and began to swallow the eastern flank of the Khmer kingdom. Rather, they shifted populations and cultural identities of populations, and therefore the civilizational boundaries. The line which separated Indic and Sinic moved south with the spread of the Kinh and the retreat of the Khmer. This did not eliminate in totality the Indic influence. Hindu Cham remain in Vietnam, while forms of Theravada Buddhism have some purchase in the Mekong delta, unlike in the rest of the country where Chinese derived Mahayana reigns supreme. And so it is that Indic H. pylori also remains as a residual in the southern regions of Vietnam, evidence of the trade and cultural networks which bound it to Greater India at some point in the past.

Next let’s look at the distribution of East Asia specific H. pylori:

The figure is hard to read, but here’s the short of it: there are Amerindian, Taiwan-Oceanian, Chinese, and Southeast Asia specific lineages. More specifically the authors attempt to infer the origin of one particular Southeast Asia specific lineage which exhibits some overlap in southern China. This is because they believe that it can trace the migration of the Austro-Asiatics, likely the first agriculturalists in Southeast Asia. The H. pylori strain in question spans southern China to Malaysia. The geographic zone encompasses regions now inhabited by Thai or Malay speakers, but it seems likely that at one point the whole zone was dominated by Austro-Asiatics. The clincher would be to see if Munda from northeast India carry this same H. pylori strain. In fact an analysis of the phylogenetic tree of strains of H. pylori found in Austro-Asiatic populations or their descendants might be able to move the needle on whether they’re exogenous to India or not (the “older” lineages should be basal).

So far I’ve been focused on issues of phylogeny. How populations of humans and bacteria relate to each other. But there are functional and adaptive implications and dynamics at work. In terms of adaptation it seems that some strains of H. pylori are simply more fit than others in some environments. The Spanish presence in the Philippines was very light demographically over the centuries of their colonial rule. There was considerable residential segregation of the Spanish away from the natives, and the Chinese, who outnumbered the Spaniards often by two orders of magnitude. And yet you have a situation where H. pylori of Spanish provenance seems to be dominant. Why? The authors report that there’s a fair amount of evidence that European H. pylori strains are generalists who outcompete the specialist East Asian and Amerindian lineages. I think one can’t ignore the reality that the “European” strains are endemic to a huge swath of western Eurasia, from Europe to India. Because of their large population sizes these lineages probably have more diversity than the other populations, and so can adapt to a wide range of conditions.

A functional and public health concern is that East Asia H. pylori may be the cause of the much higher stomach cancer rates in that region of the world. You probably know that H. pylori is a critical player in ulcers, so its impact in this region shouldn’t be a surprise. Prior to reading this paper I’d heard that East Asian stomach cancer rates were due to condiments used. This goes to show the difficulty of much of medical science which relies on correlations and rough guesses about causality.

Obviously I’m interested in what markers such as the distribution of pathogens which are reliant on humans can tell us about history. But over the long term the complex interplay between these pathogens, disease risk, and other phenotypic characteristics, is where the real action is going to be.

Citation: Breurec S, Guillard B, Hem S, Brisse S, Dieye FB, et al. (2011). Evolutionary History of Helicobacter pylori Sequences Reflect Past Human Migrations in Southeast Asia. PLoS ONE. DOI: 10.1371/journal.pone.0022058

Image credit: Mark Alexander


Markers show populations sampled by HUGO Pan-Asian SNP Consortium

The Pith: Southeast Asia was settled by a series of distinct peoples. The pattern of settlement can be discerned in part by examination of patterns of genetic variation. It seems likely that Austro-Asiatic populations were dominant across the western half of Indonesia before the arrival of Austronesians.

About a year and a half ago I reviewed a paper in Science which did a first pass through some of the findings suggested by the HUGO Pan-Asian SNP Consortium data set, which pooled a wide range of Asian populations. You can see the locations on the map above (alas, the labels are too small to read the codes). The important issue in relation to this data set is that it has a thick coverage of Southeast Asia, which is not well represented in the HGDP. Unfortunately there are only ~50,000 markers, which is not optimal for really fine-grained intra-regional analysis in my opinion. But better than nothing, and definitely sufficient for coarser scale analysis.

A few things have changed since I first reviewed this paper. First, I pulled down a copy of the Pan-Asian SNP data set. I’m going to play with it myself soon. Second, after reading Strange Parallels, volumes 1 and 2, I know a lot more about Southeast Asian history. Finally, the possibility of archaic admixture amongst Near Oceanians makes the genetics of the regions which were once Sundaland and Sahul of particular interest.

Before we hit the genetics, let’s review a little of the ethnography of Southeast Asia, as this may allow us to tease apart the meaning of some of the results. The largest ethno-linguistic group in Southeast Asia is that of Austronesians. An interesting point in relation to Austronesians is that they aren’t limited to Southeast Asia. As you can see the Austronesians range from off the coast of South America (Easter Island) to southeast Africa (Madagascar). Though there’s debate about this issue it seems to me that the most likely current point of departure of the Austronesian migration is Taiwan. Though today Taiwan is predominantly Han Chinese, that is an artifact of relatively recent migration. The indigenous population is clearly Austronesian.

A second language family which is somewhat expansive, though Southeast Asia focused, is Austro-Asiatic. There is a great deal of internal structure to this ethno-linguistic group, in that there is a well known coherent Mon-Khmer cluster, which includes some ethnic minorities in Burma and Thailand, as well as Cambodians. Additionally you have Vietnamese in the east and some tribal groups in northeast India. There has long been debate about whether these Indian tribes, the Munda, are the original Indians, to be supplanted later by Dravidian and Indo-Aryan speakers, or intrusive to the subcontinent. I believe that the most recent genetic data points to intrusion from the east into South Asia. Austro-Asiatic was likely less fragmented in mainland Southeast Asia before the historical period. Both the dominant ethnic groups in Burma and Thailand are intrusive and absorbed Mon-Khmer populations, the latter dynamic being historically attested.

Finally there are the ethno-linguistic clusters of Burma and Thailand (and Laos). The former nation is dominated by the Bamar, a Sino-Tibetan population with origins in South China ~1,500 years ago. In Burma the Mon substrate persists, while the Shan people of Thai affinity reign supreme across the northeastern fringe of the nation. In Thailand and Laos the Mon-Khmer substrate has been marginalized to isolated residual groups. But it is notable that in both these polities the Mon-Khmer populations set the tone for the civilizational orientation of the conquering ethnicities. The Thai abandoned Chinese influenced Mahayana Buddhism for the Indian influenced Theravada Buddhism of the conquered populace. Despite the notional ethnic chasm between the Thai and the Khmer of Cambodia, the broad cultural similarities due to their common roots in the society of the Khmer Empire are clear.

With the ethnographic context in place, let’s look at the two primary figures which we get from the paper. The first figure shows a phylogenetic tree of the relationships of the populations in their database, color-coded by ethnolinguistic group. Next to that tree there’s a STRUCTURE plot at K = 14, which means 14 ancestral populations. They’ve colored the bar components to match the ethno-linguistic classes (e.g., red = Austro-Asiatic, an Austro-Asiatic modal component). The second figure shows two PCA panels. PC 1 is the largest component of genetic variance in the data set, and PC 2 the second largest. I’ve added a label for the Papuan populations.
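For readers who haven’t worked with STRUCTURE-style bar plots, here is a minimal sketch of how to read one. Every individual is a row of K fractions which sum to 1, and a population’s “modal” component is simply the one with the highest average. The numbers, population names, and color labels below are made up for illustration (and I use K = 3 rather than 14); none of this comes from the paper’s data.

```python
# Hypothetical ancestry fractions for illustration only (K = 3 rather than 14).
# Each individual is a row of K fractions that sum to 1, like one bar in the plot.
individuals = {
    "pop_A": [[0.80, 0.15, 0.05], [0.70, 0.20, 0.10]],
    "pop_B": [[0.10, 0.85, 0.05], [0.20, 0.70, 0.10]],
}
component_names = ["red", "blue", "green"]  # made-up color labels


def mean_fractions(rows):
    """Average each ancestral component across the individuals of one population."""
    k = len(rows[0])
    return [sum(row[c] for row in rows) / len(rows) for c in range(k)]


def modal_component(rows):
    """Return the label of the component with the highest mean fraction."""
    means = mean_fractions(rows)
    return component_names[means.index(max(means))]


# Map each hypothetical population to its modal component.
summary = {pop: modal_component(rows) for pop, rows in individuals.items()}
```

In this toy case pop_A is modal for “red” and pop_B for “blue,” which is the sense in which a color can be described as the modal component of a language family.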

Going back to the chronology above, we know that the Thai came last. The Sino-Tibetans came before them. The issue I wonder about is the relationship of the Austronesians and Austro-Asiatic groups. Interestingly the Austronesian proportions are high not only in island Southeast Asia, but also among many South Chinese groups. In contrast, among the Mon-Khmer hill tribes of Thailand, who are presumably representative of groups which were present before the Thai migrations, the Austronesian component is absent. And it is notable to me that not only does Austro-Asiatic exhibit fragmentation in relation to Thai and Sino-Tibetan, but it does so to some extent with relation to Austronesian! The indigenous folk of central Malaysia seem to speak an Austro-Asiatic language. Finally, the Austro-Asiatic component rises in frequency on the southern fringes of island Southeast Asia, in densely populated Java.

Because of the thicker textual record for mainland Southeast Asia we know that the Austro-Asiatic groups predate the Thai and Sino-Tibetan ones. I believe that the Austro-Asiatic element also predates Austronesian in Southeast Asia. That is, I believe that an Austro-Asiatic substrate existed before the arrival of Austronesians from the zone between the Philippines and Taiwan. The Negritos of inner Malaysia, who are genetically and physically distinctive, speak Austro-Asiatic languages. This should not be surprising, as it seems that hunter-gatherer groups often switch to the language of resident agriculturalists. Because of their isolation some of these groups have persisted in speaking the languages of the “first farmers” of Malaysia, even after those pioneers were absorbed by newcomers.

The PCA shows clearly that the Austronesians are the genetically most varied of these Southeast Asian groups. Why? I believe it is because they are late arrivals who have admixed in sequence with whoever was resident in their target zones. In the east of island Southeast Asia the admixture occurred with a Melanesian population. Both the STRUCTURE plot and the PCA show evidence of this sort of two-way admixture. The STRUCTURE is straightforward, but note the linear distribution of the Austronesians in relation to outgroups in the first panel, and implicitly on the second.
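Why should two-way admixture show up as a line? A rough simulation makes the intuition concrete. This is a sketch under stated assumptions, not the paper’s analysis: the allele frequencies are random, the two “source” populations are hypothetical, and instead of a full PCA I project each individual onto the axis running between the two sources, which is roughly what the leading principal component picks up when those two sources account for most of the variance.

```python
import random


def simulate_genotype(freqs, rng):
    """Draw one diploid genotype (0, 1 or 2 copies of an allele) per SNP."""
    return [(rng.random() < p) + (rng.random() < p) for p in freqs]


def position_on_cline(genotype, freq_a, freq_b):
    """Project a genotype onto the axis from source B (0.0) to source A (1.0)."""
    axis = [2 * (pa - pb) for pa, pb in zip(freq_a, freq_b)]
    offset = [g - 2 * pb for g, pb in zip(genotype, freq_b)]
    return sum(o * a for o, a in zip(offset, axis)) / sum(a * a for a in axis)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n_snps = 5000            # arbitrary choice for the illustration
# Hypothetical allele frequencies in two unadmixed source populations.
freq_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freq_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]

positions = {}
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Assumption: an individual from a well-mixed population with a share
    # alpha of ancestry from A draws alleles at the blended frequency.
    mixed = [alpha * pa + (1 - alpha) * pb for pa, pb in zip(freq_a, freq_b)]
    genotype = simulate_genotype(mixed, rng)
    positions[alpha] = position_on_cline(genotype, freq_a, freq_b)
```

Each simulated individual lands close to its own admixture fraction along the axis, so a set of populations with varying proportions of the same two ancestries strings out in a line between the sources.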

Why is the Austro-Asiatic fraction higher in Java than in the zones to the north? Java is today the most densely populated region of Indonesia because of its fertility. I hypothesize that the spread of the Austronesians was facilitated by a more effective form of agriculture which could squeeze more productivity out of marginal land. Relative to Java, the Malay peninsula, Borneo, and Sumatra are agriculturally marginal. The density of the Austro-Asiatics was greatest in Java, while they were very thin in the regions to the north. It seems likely that the Austronesians engaged in a series of “leap-frogs” to islands and maritime fringes which were not cultivated by the Austro-Asiatic populations. Some Indonesian groups, such as the Mentawai who live on the island of the same name off the western coast of Sumatra, cluster with the Taiwanese, as if they transplanted their society in totality.

One thing that needs to be mentioned when talking about the genetics and prehistory of Southeast Asia is the “Negritos.” As indicated by their name these are a small people with African-like features. As is clear from the charts above these people are not particularly genetically close to Africans. The Philippine Negritos seem to have some relationship to the Melanesians. Interestingly they speak an Austronesian language, again following the trend where marginalized indigenes seem to pick up the language of their farming neighbors. The Negritos of Malaysia are somewhat different, but note that one of the populations exhibits Austro-Asiatic, but not Austronesian, admixture. This comports with my supposition that the Austro-Asiatic populations were the first to marginalize these tribes before themselves being assimilated by the Austronesians.

Someone with a better ethnographic understanding of Southeast Asia than I could probably decode the results above with greater power. But at this point I think we’ve got a chronology like so:

1) First you have hunter-gatherer populations of broad Melanesian affinities in Southeast Asia.

2) Then Austro-Asiatic populations move south from the fringes of southern China. Some push west to India, while others leap-frog south to zones suitable for agriculture such as Java.

3) Then Austronesian populations sweep south along water routes, and marginalize the Austro-Asiatics in island Southeast Asia, though not on the mainland.

4) The Bamar arrive from southern China over 1,000 years ago, and marginalize the Austro-Asiatics in Burma.

5) The Thai arrive from southern China less than 1,000 years ago, take over the central zone of mainland Southeast Asia, and make inroads to the west in Burma.

I will hazard to guess that the Malagasy of Madagascar are Austronesians who have very little of the Austro-Asiatic element in their ancestry. I believe this is so because they were part of the leap-frog dynamic where societies were transplanted from suitable point to point by water (the Malagasy language seems to be a branch of dialects of southern Borneo!).

So far I’ve been talking about the north to south movement. And yet the paper observes a south to north gradient in genetic diversity, which implies to the authors migration from south to north (the northern East Asian groups being a subset of the southern). But the past may have been more complex than we give it credit for. It is entirely possible that modern humans arrived in northeast Asia via a southern route, retreated south during the glaciation, and expanded north, with some groups pushing back south again. As it is, looking at how distantly the Melanesians relate to East Eurasians I think the most plausible model is that there wasn’t a relatively recent expansion from Southeast Asia. Rather, the ancestors of most East Eurasians survived in refugia in China, and a sequence of agriculturally driven expansions have reshaped Southeast Asia more recently. These populations admixed with the indigenous substrate, more or less. This would have resulted in an uptake of genetic diversity. Finally, the massive expansion of Han from the Yellow river basin may have caused the extinction of many lineages across China within the past ~3,000 years.
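The point about admixture and diversity can be checked with a little arithmetic. The sketch below uses expected heterozygosity, 2p(1-p) averaged across loci, as the diversity measure. The allele frequencies and the 70/30 mixing proportion are invented for illustration, and the function names are my own.

```python
def expected_heterozygosity(freqs):
    """Mean of 2p(1-p) across loci, a standard per-population diversity summary."""
    return sum(2 * p * (1 - p) for p in freqs) / len(freqs)


def admix(freq_a, freq_b, share_a):
    """Allele frequencies after mixing, with a fraction share_a drawn from A."""
    return [share_a * pa + (1 - share_a) * pb for pa, pb in zip(freq_a, freq_b)]


# Hypothetical allele frequencies at five SNPs in two source populations.
migrants = [0.10, 0.90, 0.30, 0.60, 0.05]
indigenes = [0.70, 0.20, 0.80, 0.10, 0.50]

h_migrants = expected_heterozygosity(migrants)
h_indigenes = expected_heterozygosity(indigenes)
# Assumed scenario: 70% migrant ancestry, 30% indigenous ancestry.
h_mixed = expected_heterozygosity(admix(migrants, indigenes, 0.7))
```

In this made-up case the mixed population is more diverse than either source. More generally, because 2p(1-p) is concave, the mixture is never less diverse than the weighted average of its sources, so a diversity gradient by itself can’t distinguish “source of the expansion” from “zone where migrants absorbed a differentiated substrate.”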

Citation: HUGO Pan-Asian SNP Consortium, Abdulla, M., Ahmed, I., Assawamakin, A., Bhak, J., Brahmachari, S., Calacal, G., Chaurasia, A., Chen, C., Chen, J., Chen, Y., Chu, J., Cutiongco-de la Paz, E., De Ungria, M., Delfin, F., Edo, J., Fuchareon, S., Ghang, H., Gojobori, T., Han, J., Ho, S., Hoh, B., Huang, W., Inoko, H., Jha, P., Jinam, T., Jin, L., Jung, J., Kangwanpong, D., Kampuansai, J., Kennedy, G., Khurana, P., Kim, H., Kim, K., Kim, S., Kim, W., Kimm, K., Kimura, R., Koike, T., Kulawonganunchai, S., Kumar, V., Lai, P., Lee, J., Lee, S., Liu, E., Majumder, P., Mandapati, K., Marzuki, S., Mitchell, W., Mukerji, M., Naritomi, K., Ngamphiw, C., Niikawa, N., Nishida, N., Oh, B., Oh, S., Ohashi, J., Oka, A., Ong, R., Padilla, C., Palittapongarnpim, P., Perdigon, H., Phipps, M., Png, E., Sakaki, Y., Salvador, J., Sandraling, Y., Scaria, V., Seielstad, M., Sidek, M., Sinha, A., Srikummool, M., Sudoyo, H., Sugano, S., Suryadi, H., Suzuki, Y., Tabbada, K., Tan, A., Tokunaga, K., Tongsima, S., Villamor, L., Wang, E., Wang, Y., Wang, H., Wu, J., Xiao, H., Xu, S., Yang, J., Shugart, Y., Yoo, H., Yuan, W., Zhao, G., & Zilfalil, B. (2009). Mapping Human Genetic Diversity in Asia. Science, 326 (5959), 1541-1545. DOI: 10.1126/science.1177074


I am currently reading Victor Lieberman’s magisterial Strange Parallels: Volume 2, so I was very interested in a new paper from BMC Genetics, Genetic structure of the Mon-Khmer speaking groups and their affinity to the neighbouring Tai populations in Northern Thailand, pointed to by Dienekes today. Here are the results and conclusions:

A large fraction of genetic variation is observed within populations (about 80% and 90 % for mtDNA and the Y-chromosome, respectively). The genetic divergence between populations is much higher in Mon-Khmer than in Tai speaking groups, especially at the paternally inherited markers. The two major linguistic groups are genetically distinct, but only for a marginal fraction (1 to 2 %) of the total genetic variation. Genetic distances between populations correlate with their linguistic differences, whereas the geographic distance does not explain the genetic divergence pattern.

The Mon-Khmer speaking populations in northern Thailand exhibited the genetic divergence among each other and also when compared to Tai speaking peoples. The different drift effects and the post-marital residence patterns between the two linguistic groups are the explanation for a small but significant fraction of the genetic variation pattern within and between them.
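To get a feel for what it means for groups to be “genetically distinct, but only for a marginal fraction” of the variation, here is a back-of-the-envelope sketch. It computes a simple Fst-style statistic, (H_T - H_S) / H_T, from allele frequencies. To be clear about the assumptions: this is a simplified analogue and not the method used in the paper, the frequencies are invented, and the populations are weighted equally.

```python
def heterozygosity(p):
    """Expected heterozygosity at one biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def fst(pop_freqs):
    """Share of variation lying among populations, (H_T - H_S) / H_T.

    pop_freqs is a list of populations, each a list of allele frequencies at
    the same loci. Populations are weighted equally (a simplifying assumption).
    """
    n_pops = len(pop_freqs)
    n_loci = len(pop_freqs[0])
    h_s = 0.0  # mean within-population heterozygosity, summed over loci
    h_t = 0.0  # heterozygosity of the pooled population, summed over loci
    for locus in range(n_loci):
        ps = [pop[locus] for pop in pop_freqs]
        h_s += sum(heterozygosity(p) for p in ps) / n_pops
        h_t += heterozygosity(sum(ps) / n_pops)
    return (h_t - h_s) / h_t


# Two hypothetical groups whose allele frequency differs by 20 points at one locus.
two_groups = [[0.4], [0.6]]
among = fst(two_groups)
within = 1 - among
```

Even with a 40% versus 60% frequency difference, only about 4% of the variation is among groups and about 96% is within them, so a between-group share of 1 to 2% is not as trivial as it might sound.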

There are many occasions when it has taken a synthetic scholar to point out to me the overall structure of a constellation of facts which I was already conscious of. So it is with Lieberman’s work. I had known that the eruption of the Thai peoples into Southeast Asia occurred within the last 1,000 years, before which the peninsula was divided between Tibeto-Burman populations to the west and Austro-Asiatic populations to the east (the latter divided between the Khmer and Vietnamese). Additionally, it is presumed that the Tibeto-Burman languages themselves displaced Austro-Asiatic in the western zone (as evidenced by the persistence of Mon in modern Burma). What was noted in volume 1 of Strange Parallels though is that the three geographical regions engaged with and assimilated the Thai invasions differently. In the center the Thai succeeded in dominating the previous groups and imposing their identity upon the region. It is often asserted that modern Cambodia’s existence as an independent state is a function of the protection conferred upon it by the French from the expansive ambitions of the Empire of Siam. But in the east the Vietnamese state was barely impacted by the Thai folk wandering. As in China the Thai in Vietnam are marginalized “mountain tribes.” Finally, in the west, in the zone which became Burma, the Thai did not take over the cultural commanding heights. But neither were they absolutely marginalized as in the east. Rather, the Shan people became part of the Burmese landscape, integrated into the Theravada Buddhist culture, but also a significant secondary ethnos to the Burman majority (along with Karens, Mons, etc.).

What does this have to do with genetics? Possibly everything and nothing, and all answers in between.

The massive shift in ethno-linguistic identity in the center of mainland Southeast Asia, its lack in the east, and the intermediate outcome in the west, should be excellent tests of propositions as to the nature of the spread of such ethno-linguistic identities. Is it pure construction, demographic replacement, or some quantitative combination of the two parameters? Unfortunately the BMC Genetics paper focuses only on Y chromosomes and mtDNA, the paternal and maternal lineages. These markers are informative, but I’d rather look at total genome content. The ethnic coverage in a small area of northern Thailand though is impressive. The open circles represent Mon-Khmer ethnic groups, the dark ones Thai. The Mon-Khmer are the presumed indigenes, while the Thai are intrusive, at least over the past 1,000 years.

Below I’ve reedited the Y and mtDNA multidimensional scaling plots. The Y is on the left, and mtDNA on the right. The clustering pattern shows relationships across the lineages. Again, the open markers represent Mon-Khmer groups, and the closed ones Thai.
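If multidimensional scaling is unfamiliar: it takes a table of pairwise distances between populations and finds coordinates whose straight-line distances reproduce that table as well as possible. Below is a minimal sketch of the classical (metric) variant in plain Python, using power iteration in place of a proper eigen-solver. I don’t know which variant or software the authors used, and the four-population distance matrix is invented so that the answer can be checked.

```python
import math
import random


def classical_mds(dist, k=2, iters=500, seed=0):
    """Embed n items in k dimensions from a symmetric n x n distance matrix.

    Assumes the distances are close to Euclidean, so that the leading
    eigenvalues of the double-centered matrix are positive.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double-center the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for axis in range(k):
        # Power iteration finds the leading eigenvector of the current B.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Hypothetical distances among four populations (corners of a 4 x 3 rectangle).
dist = [
    [0.0, 4.0, 3.0, 5.0],
    [4.0, 0.0, 5.0, 3.0],
    [3.0, 5.0, 0.0, 4.0],
    [5.0, 3.0, 4.0, 0.0],
]
coords = classical_mds(dist, k=2)
```

With real genetic distances the fit is never exact, so tight clusters on the plot are more trustworthy than fine differences in position. Note also that the orientation and sign of the axes are arbitrary.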

Since the paper is open access I invite you to read their interpretations. All I’d say is that the clustering of male Thai lineages is very interesting, and is well explained by the model of groups of related men being intrusive to a region, and taking wives from the indigenes. In contrast the Mon-Khmer Y chromosomal lineages scatter about more, and that may be due to the fact they coalesce back to common ancestors far further back in history. The intrusion of the Thai into Southeast Asia may then be demographically characterized by a migration of male warbands. In regions where these warbands managed to topple the previous order, as in central mainland Southeast Asia, they may have then monopolized access to women and entered into a period of demographic expansion.

Luckily we do have some thick-marker autosomal data. To the left I’ve reedited a figure generated with the HUGO Pan-Asian data. The bar plot is at K = 14. I’ve excised many of the extraneous populations. The colors within the bar plot correspond to associations with broader language families. So red seems to be Austro-Asiatic, while blue is Thai. You can see in the figure that the Chinese Thai lack the red Mon-Khmer component. Interestingly the Hmong of upland Southeast Asia, who are culturally marginal to the dominant Theravada Buddhist culture of the lowlands, exhibit evidence of very sharp differentiation from the Thai and the Austro-Asiatic groups. They lack the affinity with island Southeast Asians, Malays, and Taiwanese Aborigines, which seems common amongst the South Chinese more broadly. The Karen of Thailand are probably the best proxy we have for the Tibeto-Burman people of Burma, who post-date the Austro-Asiatic, and predate the Thai. Going by these data it looks as if the Karen are very hard to differentiate from the Austro-Asiatic populations, though very distinct from the Thai.

The Pan-Asian data set leaves a lot to be desired. There’s not much coverage of the east or west. I suspect that Southeast Asia is going to be somewhat complex, and extrapolating from the correlations between languages and genes in Thailand is going to get us only so far. But it’s a start. In Strange Parallels the author makes the case that mainland Southeast Asia can tell us a lot about generic Eurasian historical process. I hope, and suspect, that it can tell us something more general about the interplay between language and genes over time in other regions as well.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"