One out of five people in the world today is of the Han ethnicity, colloquially known as Chinese. Like the West, China has a long history, and its development can be traced, more or less, over the past 3,000 years. Because of its history of a system of taxation coordinated from the center, we also know about aspects of its demographic expansion as a social, cultural, and biological entity from the North China plain south toward the edges of Southeast Asia (e.g., between the Tang and Song there was a shift in taxation from the northern provinces to the southern ones because of demographics). The Retreat of the Elephants: An Environmental History of China documents the movement out of the north, and the eventual shift of the center of Chinese civilization to an equipoise between the subtropical, rice-consuming south, threading arable sections around rugged panoramas, and the old north, where a continental temperate climate characterized fields of millet and wheat and an open landscape. These environmentally contingent models of economic and agricultural production have even been used to infer broader social-cultural patterns which characterize Chinese civilization, as in a recent paper in Science, Large-Scale Psychological Differences Within China Explained by Rice Versus Wheat Agriculture.
But when you focus on the genetic origins and distribution of Chinese populations, the answers are a bit different from the cultural history. In The History and Geography of Human Genes, L. L. Cavalli-Sforza reported that North and South Chinese were genetically very distinct, with the northern populations being closer to northern Northeast Asians, and the southern ones closer to Southeast Asians, than they were to each other. He was wrong. Genome-wide analyses make it clear that Chinese populations exhibit relatively little intra-ethnic variation, though the southern groups are closer to Southeast Asians, in particular Tai and Vietnamese, and the northern Chinese are similar to Koreans and other Northeast Asians.
To get a sense of this, I plotted some East Asian HGDP groups with the 1000 Genomes Chinese on Pcaso. You can manipulate and examine the PCs yourself. What you see is that Southern Chinese are very distinct from the HGDP samples from northern China. The individuals from Beijing span the whole range of Han variation, probably because Beijing is a cosmopolitan city. Across PC 1 the South Chinese are clearly positioned between the North Chinese and the Tai and Vietnamese. From this, can we conclude that the South Chinese emerged from an admixture event between migrants from the north and indigenous peoples? Not necessarily. Or at least there may be more to the story than a PCA can tell us.
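To make the caveat concrete, here is a minimal sketch in plain Python of how a group with intermediate allele frequencies lands between two others on PC 1. It is not the Pcaso analysis: the genotype rows (0/1/2 allele counts), the group labels, and the helper names are all made up for illustration, and the toy data says nothing about how the intermediate group came to be intermediate.

```python
import math


def center_columns(genotypes):
    """Subtract each SNP's mean genotype so PCA captures variation, not the average."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200):
    """Return each sample's coordinate on PC 1 via power iteration (sign is arbitrary)."""
    x = center_columns(genotypes)
    n = len(x)
    # Sample-by-sample Gram matrix of the centered genotypes.
    gram = [[sum(a * b for a, b in zip(x[i], x[k])) for k in range(n)]
            for i in range(n)]
    v = [float(i + 1) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        w = [sum(gram[i][k] * v[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return [0.0] * n  # no variation at all in the data
        v = [c / norm for c in w]
    eigenvalue = sum(v[i] * sum(gram[i][k] * v[k] for k in range(n))
                     for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [c * scale for c in v]


def group_means_on_pc1(groups):
    """Average PC 1 coordinate per group; `groups` maps label -> list of genotype rows."""
    labels = [label for label, rows in groups.items() for _ in rows]
    rows = [row for group_rows in groups.values() for row in group_rows]
    scores = first_pc_scores(rows)
    return {
        label: sum(s for s, l in zip(scores, labels) if l == label) / labels.count(label)
        for label in groups
    }


# Hypothetical toy genotypes: two reference groups and one intermediate group.
TOY_GROUPS = {
    "group_a": [[0, 0, 1, 0, 0, 0], [0, 1, 0, 0, 0, 0]],
    "group_b": [[2, 2, 2, 1, 2, 2], [2, 2, 1, 2, 2, 2]],
    "intermediate": [[1, 1, 1, 1, 0, 1], [1, 1, 1, 0, 1, 1]],
}

if __name__ == "__main__":
    for label, mean_score in group_means_on_pc1(TOY_GROUPS).items():
        print(label, round(mean_score, 3))
```

The intermediate group falls between the two reference groups on PC 1 simply because its allele frequencies are intermediate. A recent admixture event would produce that pattern, but so would an old cline of gradually changing frequencies, which is why the PCA alone does not settle the question.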
I ran TreeMix 10 times, and the graph to the left is pretty representative (I rooted with Cambodia and removed some of the groups you can see in the PCA plot). You can view all the other plots in Dropbox. These graphs do seem to suggest that the South Chinese population has received substantial admixture from an indigenous Southeast Asian population. What I’m curious about, though, is the relationship of central Chinese ethnic minorities like the She people to the Han majority. On the PC plots the She and Southern Chinese are basically in the same position. But not so in TreeMix, where the long branch out toward the She tip indicates some sort of bottleneck or lower effective population size. In addition, the Southern Chinese sit near the She on the tree, but the gene flow TreeMix infers comes from a Tai or Vietnamese group. Why?
One model which we can’t necessarily reject at this point without further investigation is that, like the Hui, the ethnic minorities across China resemble nearby Han because of gene flow from the Han. Another model is that the Han absorbed in totality indigenous groups very different from the ones which were, and are, resident in the rugged hinterlands, and are today national minorities. A third possibility is that the North Chinese themselves are complex mixes due to the intrusion of Turkic groups between the Han and Sui-Tang, and later back-migration from Central China as the empire expanded relative to the barbarian groups.
Finally, the genetic homogeneity of the Han and many of their national minorities (the Fst values are invariably small) suggests to me that all of them underwent agricultural expansion during the Holocene, but that there was a second stage in which the proto-Han marginalized the other groups and became so numerically preponderant. This would explain the recent coalescence of ancestries across many of these populations, and the weak genetic differentiation between the Han and the minorities.
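For readers who want to see what a "small" Fst means in practice, here is a minimal sketch in plain Python. It uses the textbook heterozygosity-based definition, Fst = (Ht - Hs) / Ht, summed across SNPs for two equally weighted populations. This is an assumption on my part about a convenient estimator, not necessarily the one behind any published Han values, and the allele frequencies and population labels below are invented purely for illustration.

```python
def fst(freqs_a, freqs_b):
    """Two-population Fst from per-SNP allele frequencies (ratio of sums across SNPs).

    Ht is the expected heterozygosity of the pooled population, Hs the average
    within-population heterozygosity; both populations get equal weight.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same number of SNPs")
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        h_total = 2 * p_bar * (1 - p_bar)
        h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2
        numerator += h_total - h_within
        denominator += h_total
    # If every SNP is monomorphic in the pooled sample, there is nothing to partition.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical allele frequencies at five SNPs (not real data).
pop_close_1 = [0.30, 0.52, 0.71, 0.15, 0.44]
pop_close_2 = [0.33, 0.49, 0.68, 0.18, 0.47]
pop_distant = [0.70, 0.10, 0.25, 0.60, 0.90]

if __name__ == "__main__":
    print("closely related pair:", fst(pop_close_1, pop_close_2))
    print("distant pair:", fst(pop_close_1, pop_distant))
```

Two populations whose allele frequencies differ by only a few percent at each SNP give an Fst close to zero, while populations with very different frequencies give a much larger value. The statistic summarizes differentiation only; it does not by itself distinguish recent shared ancestry from ongoing gene flow.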