I was having a discussion on Twitter with Jessica Chong about the nature of Chinese genetic variation. There’s been a fair amount of work on it. But I have the 1000 Genomes data, in addition to others, and wanted to place them in their proper context myself. First, I did a preliminary PCA, and it was clear that the 1000 Genomes Northern Chinese (CHB) sample included a lot of Southern Chinese, and that the Southern Chinese (CHS) formed two distinct clusters (CHB was collected at a university). Looking up the provenance of these samples, it turns out that CHS were collected in Hunan and Fujian. So these probably correspond to the two clusters I found in the data.
In History and Geography of Human Genes, L. L. Cavalli-Sforza reported that Southern Chinese formed a clade with Southeast Asians, while Northern Chinese formed one with Northeast Asians. Genome-wide results don’t seem to support this inference. The Han do exhibit north-south structure. But they’re not that diverse for a population of more than one billion individuals (Fst between Han groups is lower than intra-European Fst). As observed in whole genome sequence analyses, the Han Chinese have undergone massive demographic expansion over the past 5,000 years.
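For readers who want to see what an Fst comparison like this involves, here is a minimal sketch of Hudson's Fst estimator, computed as a ratio of sums across SNPs. It is not the calculation behind the numbers above. The function name, the allele frequencies, and the sample sizes are all made up for illustration.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson's Fst between two populations, as a ratio of sums over SNPs.

    freqs1, freqs2: per-SNP frequencies of the same allele in each population.
    n1, n2: number of sampled chromosomes in each population (assumed
            constant across SNPs to keep the sketch short).
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, minus a small-sample correction
        # for the sampling noise within each population.
        numerator += (
            (p1 - p2) ** 2
            - p1 * (1 - p1) / (n1 - 1)
            - p2 * (1 - p2) / (n2 - 1)
        )
        # Probability that two alleles, one from each population, differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        raise ValueError("no SNPs that vary between or within the populations")
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical frequencies at five SNPs; NOT real CHB/CHS values.
    north = [0.10, 0.45, 0.80, 0.30, 0.55]
    south = [0.12, 0.40, 0.78, 0.35, 0.50]
    print(hudson_fst(north, south, n1=200, n2=200))
```

Because of the small-sample correction, the estimate can come out slightly negative when two groups are nearly identical. That is expected behavior for this estimator and simply means no detectable differentiation.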
I decided to run TreeMix to explore this issue further. I was prompted by the observation that North and South Chinese often show gene flow from northern and southern East Asian ethnic groups. I pushed the data set’s number of migrations to 10. This is high, and I wouldn’t normally do this, but I wanted to see if there was any consistent gene flow to Han Chinese, even if it wasn’t one of the major edges. The results are below in the plots.
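As a rough illustration of what stepping the number of migration edges up to 10 looks like in practice, here is a small sketch that builds one TreeMix command per migration count. The `-i`, `-o`, `-m`, and `-k` options are TreeMix's input, output prefix, migration count, and SNP block size flags. The file names and the block size of 500 are placeholders, and these are not the actual commands used for the plots.

```python
import shlex


def treemix_commands(input_path, out_prefix, max_migrations=10, block_size=500):
    """Build one TreeMix command line per migration count, 0..max_migrations."""
    commands = []
    for m in range(max_migrations + 1):
        commands.append([
            "treemix",
            "-i", input_path,               # gzipped allele-count file
            "-o", f"{out_prefix}.m{m}",     # separate output prefix per run
            "-m", str(m),                   # number of migration edges to fit
            "-k", str(block_size),          # SNPs per block (placeholder value)
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical input file and output prefix.
    for cmd in treemix_commands("east_asia.treemix.gz", "east_asia"):
        print(shlex.join(cmd))
```

Running every value from 0 to 10, rather than only 10, makes it easier to see which edges appear consistently as more migrations are allowed and which ones only show up at the high end.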
This is what I can say:
1) The North Chinese have a faint migration edge from nonspecific northern Asians. Probably this is a composite signal of the past few thousand years. Or, it’s an old signal of the absorption of groups from antiquity such as the Rong and Di.
2) The Southern Chinese do have closer affinities to Southeast Asian groups and ethnic minorities in the south. The group I labeled “South_China2” is more Southeast Asian in affinity than “South_China.” These are probably Hunanese and Fujianese respectively. I drew these conclusions from the fact that the “South_China” group is often near a node close to the She minority, which is present in Fujian. In contrast, the “South_China2” cluster is often near the Tujia group, which is present in Hunan.
3) Though the North and South Chinese groups are placed on different branches of the graph in these trees, note the strong migration edge, especially into the Fujian cluster. They’re genetically not that far apart. Observe that on the PCA the southern groups sit between Southeast Asians proper and Northern Chinese.
4) The Yakut are donors to lots of groups in North China. I’m pretty sure that this is a signal of the Turkic expansions, which the Yakut have affinities to because they’re Turkic.
5) Many of the native ethnic groups of China proper don’t seem to be that different than Han Chinese. In fact, they resemble Han in their own region. This might be gene flow, or, it might just be that the Han for whatever reason were the demographic winners over the last 4,000 years in China proper and marginalized the other groups.