There was a question below about the relationship of the Cambodians to the Vietnamese. Both these groups speak Austro-Asiatic languages, which, as can be inferred from the figure to the left, are predominant in Indo-China. Traditionally, as noted in the comments, Cambodian is classed with the Mon language of Burma, under Mon-Khmer (as is faintly evident on the map, Mon languages seem to have been spoken in what is today central Thailand before their replacement by Thai). There are also Austro-Asiatic languages on the fringes of southern China, and here again the map is very illuminating, as the pattern of fragmentation is often indicative of marginalization and language replacement. Finally, there are Austro-Asiatic languages spoken in India, among the Nicobarese, and among the indigenous people of central Malaysia, who are often termed Negritos. As I have explored in depth elsewhere, there is now strong suggestive evidence from the genetics that the Austro-Asiatic languages are intrusive to South Asia from Southeast Asia. Since the Negritos of the Philippines speak the languages of nearby Austronesian agriculturalists, just as the Pygmies of the Congo speak the languages of their agriculturalist neighbors, it seems likely that the Malaysian Negritos also received their language from agriculturalists.
Peter Bellwood fleshes out most of the details of how agriculture came to Southeast Asia in First Farmers (highly recommended, though it’s a little out of date now in some areas). Bellwood describes an archaeological site in northern Vietnam with burial grounds dating to 4,000 years in the past, where two distinct groups are evident in the remains. One set of skeletons resembles modern East Asians morphologically, while the other exhibits broad similarities to Near Oceanian peoples. He terms these “Austro-Melanesians.” Frankly, I think this is a confusing term. Though it seems likely that these groups are part of the broader range of populations which gave rise to modern Southeast Asian Negritos, like Papuans and Australian Aborigines they were in no way diminutive. So terming them “Proto-Negrito” would seem misleading. Therefore, I will term them “Ancestral Southeast Asians,” or ASA. The genetics points to the likelihood that a substantial minority of the ancestry of modern Southeast Asians derives from the ASA, in various quanta.
The best paper I know of in relation to the genetic history of Southeast Asia, maritime and mainland, is Reconstructing Austronesian population history in Island Southeast Asia. They used the PanAsian data set, which is somewhat thin on SNPs (<100,000), and also spotty in population coverage. The figure above shows one of the primary results. It seems that agriculture came to Southeast Asia in two major waves: first with Austro-Asiatic peoples, and later with Austronesians. The latter seem to have settled maritime Southeast Asia, where archaeological evidence of agriculture is thin to nonexistent before they arrived. But, as you can see from the figure, many maritime Southeast Asian peoples also have signatures of Austro-Asiatic ancestry. The likely case then is that the Austronesians picked this up en route, though there may also have been indigenous Austro-Asiatic people in the islands when they arrived. But curiously, not in the east. There a Melanesian ancestral component is present, which has affinities to that contributed to modern Filipino ancestry from Negritos. The 2011 paper which posits two distinct elements before agriculture between mainland Southeast Asia and Papua would make sense of this pattern. The division probably followed Huxley’s Wallace Line.
As I said above, the PanAsian data set is spotty on population coverage. There are lots of obscure tribes, but not so much when it comes to the numerous peoples of mainland Southeast Asia. I have some data to probe these questions. Unfortunately not all of it is public, so I can’t release it (though some of it is from the 1000 Genomes, Estonian Biocentre, and HGDP, so you can find much of it elsewhere).
The data set has 150,000 SNPs, with ~0 missingness (I just removed anything that had missing calls). I labeled samples from countries without ethnic provenance by those nation names. Additionally, I already did some preliminary outlier removal (e.g., removing Filipinos with non-trivial European ancestry, etc.).
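The "remove anything that had missing calls" step can be sketched roughly as follows. This is only an illustration and not the pipeline actually used here: it assumes genotypes are held in a samples-by-SNPs array of 0/1/2 allele counts, with -1 as a hypothetical code for a missing call.

```python
import numpy as np

def drop_snps_with_missing_calls(genotypes, missing_code=-1):
    """Keep only the SNP columns in which every sample has a call.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts.
    missing_code: value marking a no-call (an assumption of this sketch).
    Returns the filtered array and a boolean mask of the SNPs kept.
    """
    genotypes = np.asarray(genotypes)
    # A SNP is complete if no sample carries the missing code at that column.
    complete = ~(genotypes == missing_code).any(axis=0)
    return genotypes[:, complete], complete

# Toy data: 3 samples, 4 SNPs; SNPs 1 and 3 each have one missing call.
toy = np.array([
    [0,  1, 2, -1],
    [1,  1, 0,  2],
    [2, -1, 0,  1],
])
filtered, kept = drop_snps_with_missing_calls(toy)
```

Dropping every SNP with any missing call is the strictest possible filter; it only works because plenty of markers survive, and it avoids having to impute anything before the PCA.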
The first plot shows Indians and Papuans away from a cluster of Southeast and Northeast Asians. There are a lot of Southern Chinese from the 1000 Genomes, as well as Koreans. The Burmese are the first out toward Indians. The cluster that is pushing itself toward Papuans is the Filipinos. This makes sense in light of what we know about Philippine Negritos. They are probably not descended from ASA, but rather from a sister population, highly diverged, and with greater affinities to the peoples of Near Oceania. While the first plot shows PCs which separate both Southeast and Northeast Asians from Indians and Papuans, the second plot separates Southeast Asians among themselves. The north to south axis seems to align with a cline of Austro-Asiatics. The axis east to west runs toward Austronesians. Intriguingly there are three Indonesian samples which span the two axes, exactly in line with the results of the paper above. Vietnamese and Dai are pulled more toward the Cambodians. Toward the top of the plot are Koreans, while the very dense cluster includes Southern Chinese, as well as assorted Southern Chinese ethnic minorities. There are a few Malaysian samples in there. Unlike the Indonesians they are drawn much closer to the Southern Chinese cluster, but not quite in it. These may be Baba Chinese. I was surprised there weren’t more Overseas Chinese in these data. But there were some. It’s interesting that the Indians are close to the Chinese cluster, rather far from the Cambodians. I think that the Cambodian cline is probably indicative of ASA ancestry fraction.
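For readers who want to see how PC plots of this kind come out of a genotype matrix, here is a minimal sketch. It is not the software used for the plots in this post. It assumes a complete samples-by-SNPs matrix of 0/1/2 allele counts, standardizes each SNP by its allele frequency (a common convention in population genetics), and takes the leading components from an SVD. The two toy "populations" are simulated with a fixed seed and are purely hypothetical.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """Project samples onto the top principal components of a genotype matrix.

    genotypes: (n_samples, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns an (n_samples, n_components) array of PC scores.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # per-SNP allele frequency
    polymorphic = (freq > 0) & (freq < 1)    # monomorphic SNPs carry no signal
    g, freq = g[:, polymorphic], freq[polymorphic]
    # Center each SNP and scale by its expected binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Simulated toy data: two groups of 20 samples with unrelated allele frequencies.
rng = np.random.default_rng(0)
n_per_group, n_snps = 20, 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = rng.uniform(0.1, 0.9, n_snps)
group_a = rng.binomial(2, freq_a, size=(n_per_group, n_snps))
group_b = rng.binomial(2, freq_b, size=(n_per_group, n_snps))
pcs = genotype_pca(np.vstack([group_a, group_b]))
pc1_a, pc1_b = pcs[:n_per_group, 0], pcs[n_per_group:, 0]
```

In this toy case the first PC simply separates the two simulated groups. With real data the leading PCs pick up whichever contrasts explain the most variance, which is why Indians and Papuans dominate the first plot and the structure within Southeast Asia only shows up in the later components.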
To get more clarity, here is a bigger plot with more labels:
The Lahu are a group that is in both southern China and Burma. You see the influence of geography, as they are right below the Burmese samples. Thailand is interesting, because one of the individuals is close to the Cambodians. The others sit atop the Dai and the Vietnamese. With only four individuals you can only say so much. But I think we’re seeing a cline or structure within Thailand. Some regions are linguistically Thai, but have barely any genetic footprint from the Tai migrations. In other regions the impact of the newcomers might have been overwhelming. Until we get more samples from indigenous Thai we can’t say much. For all I know the three individuals clustering with the Dai and Vietnamese are of partial Chinese heritage.
Let’s look at the first plot zoomed in with some more labels:
You see now that one of the Indonesians seems to have a lot of Papuan-like admixture. They may be from the east. Another is very like the Burmese. A few are clearly Overseas Chinese, and the last is shifted toward the Indians. The Burmese in these samples really seem to be two groups. I bet the ones further away from the Dai are ethnic Burmans or Mon. The others may be Shan, the descendants of Tai, at least in part.
I’m going to run TreeMix now. I will remove the Overseas Chinese, and the Malaysians. I’ll also remove the Thailand samples, as I don’t know whether they’re Chinese or not (I could run ADMIXTURE, but I’m tired). I rooted it with Papuan and did 5 migrations. Plots below.
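For anyone who wants to try something similar, a run with those settings (rooted on the Papuans, five migration edges) could be assembled roughly as below. The file names are hypothetical placeholders, and the root label is an assumption: it has to match the population name used in the header of the TreeMix input file. The sketch only builds and prints the command and does not execute TreeMix.

```python
import shlex

def treemix_command(input_path, out_stem, root="Papuan", migrations=5):
    """Build the argument list for a TreeMix run.

    input_path: gzipped TreeMix allele-count file (hypothetical name below).
    out_stem:   prefix for the output files.
    root:       population label to root the tree on (assumed label).
    migrations: number of migration edges to fit.
    """
    return [
        "treemix",
        "-i", input_path,
        "-root", root,
        "-m", str(migrations),
        "-o", out_stem,
    ]

cmd = treemix_command("sea_freqs.treemix.gz", "sea_m5")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would launch the run, assuming the `treemix` binary is on the PATH.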
OK, the strangest thing you’ll notice is that the Burmese are placed near the Koreans, but they seem to have admixture from a population with a lot of Indian affinities. The Shan show this pattern, but much more mildly. I think this actually answers even more fully what’s going on among East Bengalis. The Burmese are a compound population, of newcomers from the north, rather deep into China, and the long-resident Austro-Asiatic population, the Mons. It was probably a pretty singular pulse event to get picked up so well. These later went to eastern India, and mixed with the proto-Bengalis, who probably already had some Austro-Asiatic Munda ancestry. Additionally, the connection between the Burmese and the Naxi minority in China is not surprising; they speak related languages. The Miao are near the Southern Chinese. They are also quite closely related to the Hmong, who are a southern branch. The gene flow between the Filipinos and Indonesians makes sense. These are the two Austronesian groups. Finally, notice that the Dai are closest to the Chinese, followed by the Vietnamese, with the Cambodians the most distant.
Let’s make the TreeMix plots cleaner by removing some of the groups:
Alright, what’s the upshot of these plots?
1) We see that the Cambodians are a hybrid of a population like the Dai of South China, and something somewhat Indian-like, but not totally.
2) The Vietnamese have a very faint gene flow from the Cambodians. Some of the samples which were Vietnamese look to me like they were ethnic minorities. I left them in because that is part of Vietnam’s genetics. They may simply be assimilated minorities or Khmer.
3) The gene flow into the Burmese is between Indian-like and Cambodian-like. It’s hard to parse out the distinctions, but there’s probably recent gene flow into Burma (I removed a few outliers as it is). Additionally, the ASA ancestry in the Cambodians may be higher than in the Burmese, because the latter have had more dilutions. But remember that I think there’s Indian ancestry in the Cambodians too.
4) The Negrito ancestry among Filipinos is pretty obvious in the gene flow.