I am wont to say that the genomics of human pigmentation are solved. Arguably this has been one of the major successes of the early GWAS era. In 2005 the postscript to Mutants: On Genetic Variety and the Human Body alluded to the fact that the genetic architecture of pigmentation in humans was relatively mysterious. A year and a half later reviews such as A golden age of human pigmentation genetics were being published. What happened?
First, and foremost, the genetic architecture of human pigmentation variation is characterized by the reality that most of the variation is due to a handful of loci. In other words, skin color is not monogenic Mendelian, but neither is it highly polygenic in the same fashion as height or IQ, where variation is distributed across so many loci that alleles have a nearly infinitesimal effect size. The small sample sizes and simple methodologies of aughts-era genomics were sufficient to capture the relatively large effect variants segregating in many populations. A second major aspect to pigmentation genomics is that the pathways seem strikingly conserved across vertebrates. That means that pelage color research could inform human genetics, and vice versa. Some of the most interesting confirmations of the power of loss of function mutations in humans occurred by inducing a similar change in zebrafish! One inference that I think one might take away from this is that ancient human populations likely exhibited variation due to polymorphism around the same set of loci as modern humans.
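To make the contrast between "a handful of loci" and "nearly infinitesimal" concrete, here is a minimal sketch using the standard expression for the additive variance contributed by a biallelic locus, 2p(1-p)a², where p is the allele frequency and a the additive effect. The two architectures below are hypothetical toys (the frequencies, effect sizes, and locus counts are made up for illustration, not estimates for pigmentation or height).

```python
def locus_variance(p, a):
    """Additive variance contributed by one biallelic locus: 2p(1-p)a^2."""
    return 2.0 * p * (1.0 - p) * a ** 2


def top_k_share(loci, k):
    """Fraction of total additive variance explained by the k largest loci."""
    variances = sorted((locus_variance(p, a) for p, a in loci), reverse=True)
    return sum(variances[:k]) / sum(variances)


# Hypothetical architectures: each locus is (allele frequency, additive effect).
# "Oligogenic": five large-effect loci on top of many small-effect ones.
oligogenic = [(0.5, 1.0)] * 5 + [(0.5, 0.05)] * 100
# "Polygenic": a thousand loci of equal, small effect.
polygenic = [(0.5, 0.05)] * 1000

print("top-5 share, oligogenic toy:", top_k_share(oligogenic, 5))
print("top-5 share, polygenic toy:", top_k_share(polygenic, 5))
```

In the first toy the five largest loci carry nearly all of the variance, which is the situation where small samples suffice to find them; in the second, any five loci carry a trivial share.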
But, and it’s a big but, though the set of loci responsible for pigmentation variation across human populations is familiar, finite, and well characterized, the particular mutations responsible within a given locus vary quite a bit. Because derived mutations which result in reduced pigmentation are mostly loss of function, all you need to do is “break” the functionality in some manner. Therefore, you might target a regulatory element, or the exonic sequence itself, but the possibilities are rather numerous. Heather Norton’s publication from 2007, Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, is still rather relevant. For various reasons the pigmentation of Europeans has been well elucidated. That means that to a great extent the variation in West and South Eurasians more generally (and North Africans) is well understood, because most of the same variants seem to be at play. The big lacuna, as pointed out by Norton et al., concerns East Asians. This is a population which is light-skinned, but lacking in the typical set of European “light” alleles.
The title of the post is “white-skinned”, and not “white”, because the conventional understanding is that East Asians are not white. That term is reserved in world-wide usage for people of European descent (or to a lesser extent related peoples, such as Turks) for historical and cultural reasons.
But this is a recent development. From what I understand, historically the peoples of Northeast Asia did refer to themselves as white in contrast to the browner people of Southeast Asia (in an analogous fashion, the people of West Asia as far east as Afghanistan consider themselves white, in contrast to the black people of South Asia). Additionally, when Europeans first encountered Northeast Asians in large numbers in the 16th century they observed that physically the people of nations such as Japan and Korea were white in color. Only with total domination of the globe by Europeans in the 19th century did the identification of white and European become such that Northeast Asians were classed among the “colored” peoples (the appellation “yellow” was taken up by early 20th century East Asian intellectuals). But both quantitative empirical evidence and simple visual inspection can remind us that many Northeast Asians are as light in complexion as many Europeans, albeit never as pale as many Northern Europeans.
A new paper in Molecular Biology and Evolution, A genetic mechanism for convergent skin lightening during recent human evolution, takes a major step toward pinpointing what is going on in a functional sense in relation to East Asians. In fact they’re doing what occurred ten years ago for Europeans. First, they’re finding the variant through GWAS, and second, they are confirming through molecular methods and animal models that the variant of interest is actually the causal mechanism. And they are also attempting to establish a temporal narrative by adducing signatures of selection.
The major finding is that variation at a particular SNP in OCA2 is responsible for differences in pigmentation across many groups in eastern Eurasia. You should remember OCA2, since the region that spans it and HERC2 accounts for the pattern of blue and brown eye variation in Europeans. The SNP, rs1800414, is in the ancestral state in Europe and Africa, but derived in Northeast Asia. The results from the left are from the HGDP browser. The only thing is that I can’t find the SNP on the browser. So I looked for that particular SNP in my own HGDP data sets, and couldn’t find it. The SNP is in ALFRED, and you can see that the results are somewhat different. The HGDP results (which for whatever reason I can’t replicate) show that the derived allele is modal in Northeast Asia, and that it is present in the New World. In contrast, the ALFRED map shows that the derived allele is modal among more southerly groups (including indigenous non-Han groups in South China), and absent in the New World. The 1000 Genomes has fewer populations, but large sample sizes. The allele frequency in Japan in the 1000 Genomes matches ALFRED more than the HGDP results.
All that being said, the general stylized facts are in alignment. The derived allele is common on the eastern coastal region of Eurasia, and nearly absent in Africa, Europe, and West and South Asia. But a curious aspect to me is that in the 1000 Genomes data the allele is nearly as absent in the Bangladeshi samples as it is in other South Asians. In contrast, the derived variant of EDAR, which is diagnostic of East Asian or Amerindian ancestry, is present at 5% frequency in Bangladeshis, about what you would expect assuming the attested levels of gene flow from an East Asian population. While the authors in the above study found that the effect of the allele is additive, it is curious that in the 1000 Genomes there is no variation across Japanese, North and South Chinese, and Vietnamese. The implication is that the average between-group differences across these populations have to be due to variation at other loci. The indigenous Dai people in fact had the highest frequency of the derived allele in the 1000 Genomes.
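The "about what you would expect" reasoning can be sketched as a simple neutral admixture expectation: the frequency in the admixed population is the ancestry-weighted average of the frequencies in the sources. The function name and the source frequencies below are my own placeholders, not measured values; the ~10% East Asian ancestry fraction for Bangladeshis is the rough figure I use later in this post, and the recipient frequency is assumed to be zero for an allele absent in other South Asians.

```python
def expected_admixed_freq(m, p_source, p_recipient=0.0):
    """Neutral expectation for an allele frequency after a single admixture pulse.

    m           -- fraction of ancestry from the source population
    p_source    -- allele frequency in the source (e.g. a Dai-like group)
    p_recipient -- allele frequency in the recipient before admixture
    """
    return m * p_source + (1.0 - m) * p_recipient


# Assumed ancestry fraction (~10% East Asian ancestry in Bangladeshis).
m = 0.10

# Hypothetical source frequencies, for illustration only.
for p_source in (0.3, 0.5, 0.6, 0.9):
    expected = expected_admixed_freq(m, p_source)
    print(f"source freq {p_source:.1f} -> expected admixed freq {expected:.3f}")
```

Under these assumptions, an allele at intermediate frequency in the source should still show up at a few percent in the admixed population, so a frequency near zero is what needs explaining.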
A final issue that is important to note is that the phylogenetic framework the authors are using is probably wrong. The major value-add of this paper is that they added several Austro-Asiatic populations to the data set, and compared individuals phenotypically between the Austro-Asiatic groups and the Han Chinese. Because the supplemental information isn’t online I don’t know which Austro-Asiatic groups they included in China, but there aren’t too many, so one can guess. The main problem though is that they presume these Austro-Asiatic groups are basal to the Han. This probably isn’t true. Rather, there was probably a migration of early rice farmers from what is today China proper southward, which resulted in the spread of the Austro-Asiatic languages to Southeast Asia and further west toward India. Vietnamese and Cambodian are two widely spoken Austro-Asiatic languages. Bringing together all the genomic evidence, it seems that a substantial minority of the ancestry of these Austro-Asiatic people is from the descendants of hunter-gatherers who were resident in Southeast Asia during the Pleistocene, but the majority of their ancestry derives from farmers who pushed south.
These details matter because the authors estimated how deep the selection sweeps around this locus must be in terms of time. Using two methods they arrive at a figure between 10 and 15 thousand years (one method is closer to 10, another to 15). That implies that selection began before the Holocene. The interpretation the authors put on these results is that the northern East Asian groups experienced selection as they migrated up from Southeast Asia during the Pleistocene, with the Austro-Asiatic groups being basal and reflecting the ancestral state. The problem, as I suggest above, is that the Austro-Asiatic populations are a compound of groups genuinely basal to the Northeast Asians (their minority ancestry), and a population to which other Northeast Asians further north may be basal!
One thing Eight thousand years of natural selection in Europe tells us, using ancient DNA, is that a history of admixture is important to understanding the specific dynamics of selection. Though the haplotype based methods were roughly correct, they did not exhibit the granularity necessary to make fine-grained inferences, and did not totally predict what the empirical ancient DNA is telling us about allele frequencies across time. For example, earlier attempts to date the selection sweep which resulted in high frequencies of the derived SLC45A2 allele in Europe arrived at a figure a bit north of ~10,000 years. But it seems that a great deal of selection on this locus has occurred within the last 5,000 years.
And on a final note, I would point out that the intermediate frequencies of the derived allele in much of East Asia are suggestive to me that the genuine target of selection here is not skin color, but a dominant trait. The fact that the derived allele is nearly absent in Bangladeshis indicates that either the sweep up in frequency is very recent, so that not all East Asian populations experienced it, or, more likely to my mind, there is constraining selection on the trait which is the genuine target of interest in other genetic backgrounds. To decrypt what I’m saying, the derived allele is probably useful in East Asia, but entails some cost. South Asians may already have another allele which gains the same function, and so the cost resulted in the derived allele being purged in Bangladeshis (who are ~10% derived from a group very similar to the Dai).
As should be clear, this paper has some confusions. But it’s a taste of things to come. There are many Chinese who are interested in the genomics of their region, and ancient DNA should begin to unveil the past in the next few years.