The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog

I am wont to say that the genomics of human pigmentation is solved. Arguably this has been one of the major successes of the early GWAS era. In 2005 the postscript to Mutants: On Genetic Variety and the Human Body alluded to the fact that the genetic architecture of pigmentation in humans was relatively mysterious. A year and a half later reviews such as A golden age of human pigmentation genetics were being published. What happened?

First, and foremost, the genetic architecture of human pigmentation variation is characterized by the reality that most of the variation is due to a handful of loci. In other words, skin color is not monogenic Mendelian, but neither is it highly polygenic in the same fashion as height or IQ, where variation is distributed across so many loci that alleles have nearly infinitesimal effect sizes. The small sample sizes and simple methodologies of aughts-era genomics were sufficient to capture the relatively large-effect variants segregating in many populations. A second major aspect of pigmentation genomics is that the pathways seem strikingly conserved across vertebrates. That means that pelage color research could inform human genetics, and vice versa. Some of the most interesting confirmations of the effect of loss of function mutations found in humans came from inducing a similar change in zebrafish! One inference that I think one might take away from this is that ancient human populations likely exhibited variation due to polymorphism around the same set of loci as modern humans.

But, and there’s a big but: though the set of loci responsible for pigmentation variation across human populations is familiar, finite, and well characterized, the particular mutations responsible within a given locus vary quite a bit. Because derived mutations which result in reduced pigmentation are mostly loss of function, all you need to do is “break” the functionality in some manner. Therefore, you might target a regulatory element, or the exonic sequence itself, and the possibilities are rather numerous. Heather Norton’s publication from 2007, Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, is still rather relevant. For various reasons the pigmentation of Europeans has been well elucidated. That means that to a great extent the variation in West and South Eurasians more generally (and North Africans) is well understood, because most of the same variants seem to be at play. The big lacuna, as pointed out by Norton et al., concerns East Asians. This is a population which is light-skinned, but lacks the typical set of European “light” alleles.

The title of the post is “white-skinned”, and not “white”, because the conventional understanding is that East Asians are not white. That term is reserved in world-wide usage for people of European descent (or to a lesser extent related peoples, such as Turks) for historical and cultural reasons.

But this is a recent development. From what I understand, historically the peoples of Northeast Asia did refer to themselves as white in contrast to the browner people of Southeast Asia (in an analogous fashion, the people of West Asia as far east as Afghanistan consider themselves white, in contrast to the black people of South Asia). Additionally, when Europeans first encountered Northeast Asians in large numbers in the 16th century, they observed that physically the people of nations such as Japan and Korea were white in color. Only with the total domination of the globe by Europeans in the 19th century did the identification of white with European become so strong that Northeast Asians were classed among the “colored” peoples (the appellation “yellow” was taken up by early 20th century East Asian intellectuals). But both quantitative empirical evidence and simple visual inspection can remind us that many Northeast Asians are as light in complexion as many Europeans, albeit never as pale as many Northern Europeans.

A new paper in Molecular Biology and Evolution, A genetic mechanism for convergent skin lightening during recent human evolution, takes a major step toward pinpointing what is going on in a functional sense in relation to East Asians. In fact, they’re doing what was done ten years ago for Europeans. First, they find the variant through GWAS, and second, they confirm through molecular methods and animal models that the variant of interest is actually the causal mechanism. They are also attempting to establish a temporal narrative by adducing signatures of selection.

The major finding is that variation at a particular SNP in OCA2 is responsible for differences in pigmentation across many groups in eastern Eurasia. You should remember OCA2, since the region that spans it and HERC2 accounts for the pattern of blue and brown eye variation in Europeans. The SNP, rs1800414, is in the ancestral state in Europe and Africa, but derived in Northeast Asia. The results on the left are from the HGDP browser. The only thing is that I can’t find the SNP on the browser. So I looked for that particular SNP in my own HGDP data sets, and couldn’t find it. The SNP is in ALFRED, and you can see that the results are somewhat different. The HGDP results (which for whatever reason I can’t replicate) show that the derived allele is modal in Northeast Asia, and that it is present in the New World. In contrast, the ALFRED map shows that the derived allele is modal among more southerly groups (including indigenous non-Han groups in South China), and absent in the New World. The 1000 Genomes has fewer populations, but large sample sizes. The allele frequency in Japan in the 1000 Genomes matches ALFRED more than the HGDP results.

All that being said, the general stylized facts are in alignment. The derived allele is common on the eastern coastal region of Eurasia, and nearly absent in Africa, Europe, and West and South Asia. But a curious aspect to me is that in the 1000 Genomes data the allele is nearly as absent in the Bangladeshi samples as it is in other South Asians. In contrast, the derived variant of EDAR, which is diagnostic of East Asian or Amerindian ancestry, is present at 5% frequency in Bangladeshis, about what you would expect given the attested levels of gene flow from an East Asian population. While the authors of the above study found that the effect of the allele is additive, it is curious that in the 1000 Genomes there is essentially no variation in its frequency across Japanese, North and South Chinese, and Vietnamese. The implication is that the average between-group differences in pigmentation across these populations have to be due to variation at other loci. The indigenous Dai people in fact had the highest frequency of the derived allele in the 1000 Genomes.

A final issue that is important to note is that the phylogenetic framework the authors are using is probably wrong. The major value-add of this paper is that they include several Austro-Asiatic populations in the data set, and compare individuals phenotypically within the Austro-Asiatic groups and among the Han Chinese. Because the supplemental information isn’t online I don’t know which Austro-Asiatic groups in China they included, but there aren’t too many, so one can guess. The main problem, though, is that they presume these Austro-Asiatic groups are basal to the Han. This probably isn’t true. Rather, there was probably a migration of early rice farmers southward from what is today China proper, which resulted in the spread of the Austro-Asiatic languages to Southeast Asia and further west toward India. Vietnamese and Cambodian are two widely spoken Austro-Asiatic languages. Bringing together all the genomic evidence, it seems that a substantial minority of the ancestry of these Austro-Asiatic peoples derives from the descendants of hunter-gatherers who were resident in Southeast Asia during the Pleistocene, but the majority of their ancestry derives from farmers who pushed south.

These details matter because the authors estimated how old the selective sweeps around this locus must be. Using two methods they arrive at a figure between 10 and 15 thousand years (one method is closer to 10, the other to 15). That implies that selection began before the Holocene. The interpretation the authors put on these results is that the northern East Asian groups experienced selection as they migrated up from Southeast Asia during the Pleistocene, with the Austro-Asiatic groups being basal and reflecting the ancestral state. The problem, as I suggest above, is that the Austro-Asiatic populations are a compound of groups genuinely basal to the Northeast Asians (their minority ancestry), and a population to which other Northeast Asians further north may be basal!

One thing that Eight thousand years of natural selection in Europe tells us, using ancient DNA, is that a history of admixture is important to understanding the specific dynamics of selection. Though the haplotype-based methods were roughly correct, they did not have the granularity necessary to make fine-grained inferences, and did not fully predict what the empirical ancient DNA is telling us about allele frequencies across time. For example, earlier attempts to date the selective sweep which resulted in high frequencies of the derived SLC45A2 allele in Europe arrived at a figure a bit north of ~10,000 years. But it seems that a great deal of selection on this locus has occurred within the last 5,000 years.

And on a final note, I would point out that the intermediate frequencies of the derived allele in much of East Asia suggest to me that the genuine target of selection here is not skin color, but some dominant trait. The fact that the derived allele is nearly absent in Bangladeshis indicates either that the sweep up in frequency is very recent, so that not all East Asian populations experienced it, or, more likely to my mind, that in other genetic backgrounds there is constraining selection on the trait which is the genuine target. To decrypt what I’m saying: the derived allele is probably useful in East Asia, but entails some cost. South Asians may already have another allele which provides the same function, and so the cost resulted in purifying selection against the derived allele in Bangladeshis (who are ~10% derived from a group very similar to the Dai).
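
The argument from near-absence can be made a little more concrete. A minimal sketch, assuming neutral admixture and no drift since it happened: the expected derived-allele frequency in an admixed group is the admixture fraction times the source frequency, and a binomial tail probability says how surprising a near-zero count would be. The ~10% Dai-like fraction comes from the text, and 172 chromosomes assumes the same 86 Bengali individuals counted at SLC24A5 in the 1000 Genomes; the 40% source frequency and the observed count of 1 are hypothetical placeholders, not figures from the paper or the 1000 Genomes. The function names are illustrative.

```python
from math import comb


def expected_admixed_freq(m, p_source, p_host=0.0):
    """Expected allele frequency after neutral admixture.

    m is the fraction of ancestry from the source population; p_host is the
    frequency in the recipient population before admixture.
    """
    return m * p_source + (1 - m) * p_host


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# ~10% Dai-like ancestry is from the text; 172 chromosomes assumes 86 Bengali
# samples. p_source and observed are HYPOTHETICAL placeholders.
m, p_source, n_chrom, observed = 0.10, 0.40, 172, 1

p_exp = expected_admixed_freq(m, p_source)
print(f"expected derived frequency: {p_exp:.3f}")
print(f"expected derived copies in sample: {p_exp * n_chrom:.1f}")
print(f"P(<= {observed} copies | neutral admixture): "
      f"{binom_cdf(observed, n_chrom, p_exp):.2e}")
```

Because drift since admixture and uncertainty in the source frequency are ignored, a small tail probability here overstates the case; it only shows why near-absence is hard to square with neutral admixture alone.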

As should be clear, this paper has some confusions. But it’s a taste of things to come. There are many Chinese who are interested in the genomics of their region, and ancient DNA should begin to unveil the past in the next few years.

• Category: Science • Tags: Genomics, Pigmentation 

I got curious about pigmentation about ten years ago when reading the coda to Armand Leroi’s Mutants: On Genetic Variety and the Human Body, where he observes, curiously, that after all these decades geneticists still didn’t understand very well the basis of normal variation in skin color. I read that in the summer of 2005, so Armand had probably written it in 2004 (he can correct me if he has time; he occasionally comments here). Depending on how you view it, it was a fortunate or unfortunate time to write something like this. Over the past ten years geneticists have solved the basis of normal variation in human pigmentation. In fact, most of the major work was completed between 2005 and 2007. In December of 2005 Science published SLC24A5, a Putative Cation Exchanger, Affects Pigmentation in Zebrafish and Humans. The authors reported that rs1426654 was nearly disjoint in distribution between Africans and Europeans, and that it explained on the order of 1/3 of the variance in pigmentation between the two populations (European populations are fixed for the A allele, Africans for the G allele).

There are several facts just within that statement that illustrate why pigmentation genomics has been such a success in comparison to other domains tackled by the new methods. First, pigmentation pathways seem to be somewhat constrained across animals, so model organisms can give us a lot of insight and clues. A lot of the pigmentation genes, such as KITLG, TYR, and SLC24A5, actually increase or decrease melanin production and alter tissue-specific expression across vertebrates just as they do in humans. Second, the fact that I just named genes off the top of my head highlights that there are a few conserved loci which explain most of the variance and crop up in study after study. This is in contrast to height, where the variance is distributed across thousands of genes, and the only one I can name off the top of my head is HMGA2. And it explains a princely ~0.3% of the variance of the trait.

This wasn’t entirely a surprise. I happen to have had a copy of The Genetics of Human Populations. In it, L. L. Cavalli-Sforza reported on a classical pedigree analysis, dating to the 1950s, of individuals in Britain with varying levels of African ancestry. In genetic jargon, the study focused on the variance in trait values between parentals, F1 individuals, and “back-cross” individuals (as well as a few F2 individuals, from what I recall). The research concluded that pigmentation was probably controlled by on the order of 10 genes or so. In particular, the authors suggested that the trait was unlikely to be highly polygenic, which for the designs of that period really meant more than a dozen loci or so, beyond which they lacked the power to resolve the number of independent effects with any precision (i.e., they wouldn’t be able to distinguish between a trait where 25 loci explain 90% of the variance, and a trait where 500 loci explain 90% of the variance). Third, pigmentation loci exhibit a relatively high pairwise Fst. That is, most of the variation at many of these loci is partitioned between populations, rather than within them. Obviously that is convenient when you are trying to detect associations between genes and phenotypes which are partitioned on an inter-continental scale.
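
The post does not say which estimator the 1950s study used, so what follows only illustrates how such designs turn variances into a locus count. The classical Castle-Wright estimator divides the squared difference between parental means by eight times the segregation variance. This minimal sketch assumes equal-effect, purely additive loci and the same environmental variance in every cross, and uses a single backcross, whose excess variance over the F1 is half the F2 segregation variance under those assumptions. The function name and inputs are hypothetical.

```python
def castle_wright_loci(mean_p1, mean_p2, var_f1, var_backcross):
    """Castle-Wright estimate of the number of segregating loci.

    Assumes n additive loci of equal effect and identical environmental
    variance in every cross. Under those assumptions a backcross to one
    parent carries half the F2 segregation variance, so
        sigma_s^2 = 2 * (var_backcross - var_f1)
        n_e       = (mean_p1 - mean_p2)^2 / (8 * sigma_s^2)
    """
    seg_var = 2.0 * (var_backcross - var_f1)
    if seg_var <= 0:
        raise ValueError("backcross variance must exceed F1 variance")
    return (mean_p1 - mean_p2) ** 2 / (8.0 * seg_var)


# Hypothetical inputs, not data from the study: parental means of 50 and 30
# melanin-index units, F1 variance 4.0, backcross variance 9.0.
print(castle_wright_loci(50.0, 30.0, 4.0, 9.0))  # 5.0
```

Unequal effect sizes, dominance, and linkage all push this estimate below the true number of loci, so it behaves as a lower bound. That is one reason designs like this could say “more than a handful” but could not tell 25 loci from 500.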

The illustration with SLC24A5 is pretty straightforward: the derived allele is at 100% frequency in Europeans, while the ancestral allele is at over 99% frequency in unadmixed Sub-Saharan Africans. In the 1000 Genomes, the frequency of the derived A allele in the Utah white American sample is 100% (out of 99 individuals). In the 91 British individuals it is 100%. In the Tuscan set of 107, there are 213 A alleles and 1 G allele. In the 107 Spanish individuals, the A allele is at 100%. In contrast, for the Yoruba Nigerian data set, there are 3 A alleles to 213 G variants. For the Esan of Nigeria, it is 5 A to 193 G. For the Chinese samples from Beijing, 6 A alleles to 200 G. At this point you might think that the A variant at this SNP position is diagnostic of European ancestry, but it is not. I, for example, am homozygous for the A variant, as are both of my parents. In the 1000 Genomes data there are 25 Bengalis who are AA, 42 who are AG, and 19 who are GG. In the Sri Lankan Tamil population A is at 49% frequency.
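
The counts quoted above are enough to compute allele frequencies, check the Bengali genotype counts against Hardy-Weinberg expectations, and put a number on the “high pairwise Fst” point. A minimal sketch using the simple Wright-style estimator Fst = (H_T - H_S) / H_T with equal population weights and no sample-size correction (so not the Weir-Cockerham estimator most papers report); the function names are illustrative.

```python
def allele_freq(n_a, n_g):
    """Frequency of the A allele from allele counts."""
    return n_a / (n_a + n_g)


def hwe_expected(n_aa, n_ag, n_gg):
    """A-allele frequency and expected AA/AG/GG counts under Hardy-Weinberg."""
    n = n_aa + n_ag + n_gg
    p = (2 * n_aa + n_ag) / (2 * n)
    q = 1 - p
    return p, (n * p * p, n * 2 * p * q, n * q * q)


def wright_fst(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP, equal population weights.

    Undefined (division by zero) if the SNP is monomorphic across all groups.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# A-allele frequencies from the 1000 Genomes counts quoted in the text.
freqs = {
    "British": 1.0,
    "Tuscan": allele_freq(213, 1),
    "Yoruba": allele_freq(3, 213),
    "Esan": allele_freq(5, 193),
    "Beijing Chinese": allele_freq(6, 200),
}
for pop, p in freqs.items():
    print(f"{pop}: A = {p:.3f}")

p_bengali, expected = hwe_expected(25, 42, 19)
print(f"Bengali: A = {p_bengali:.3f}, expected AA/AG/GG =",
      [round(x, 1) for x in expected])
print(f"Fst, British vs Yoruba: {wright_fst([freqs['British'], freqs['Yoruba']]):.2f}")
```

For the Bengali sample the expected counts come out at roughly 24.6, 42.8, and 18.6, very close to the observed 25, 42, and 19. For this single SNP the British-Yoruba Fst comes out above 0.9, far above genome-wide averages between continental populations.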

The figure to my left is from Heather Norton’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, and it uses neighbor-joining trees to represent genetic distances at particular loci then known (2007) to be implicated in inter-continental variation in pigmentation. The abbreviations are pretty self-evident: WA = West African, NA = Native American, EA = East Asian, IM = Island Melanesian, SA = South Asian, and EU = European. What you see is that pigmentation genes are not particularly representative of overall phylogeny. That is, whole-genome relationships, whereby all non-Africans form one clade set against Africans, are not reflected here. Looking at these patterns, you would have inferred that Europeans were the outgroup. And the population with the lowest genetic distance from West Africans is Island Melanesians. What’s going on here is that Island Melanesians and West Africans have similar skin color phenotypes, and that is being reflected in these genes. Roughly, Melanesians and West Africans exhibit a fair amount of functional constraint around pigmentation genes. They haven’t changed much. In contrast, East Asians and Europeans are actually not too different in their pigmentation on a world-wide scale, but that is not reflected in these trees. Why? As is made clear in the title of Norton et al.’s paper, East Asians and Europeans arrived at their phenotypes via different mutational paths. I say different mutational paths because there is a broad overlap in genes, but the alleles are often different (different SNPs or regulatory elements within the gene).
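
For readers who have not met neighbor-joining before, here is a minimal sketch of the standard Saitou-Nei procedure: repeatedly join the pair of nodes that minimizes the Q-criterion, then recompute distances from the new node to the rest. It is not Norton et al.’s code, and the four-taxon distance matrix is a toy built from a known tree so the output can be checked; it is not data from the paper, and the taxon names are placeholders.

```python
def neighbor_joining(labels, dist):
    """Neighbor-joining on a symmetric distance matrix; returns a Newick string.

    labels: taxon names; dist: list of lists with dist[i][j] = distance.
    Branch lengths are printed to three decimals. The result is unrooted;
    the final edge is split in half only so Newick can express it.
    """
    n = len(labels)
    d = {labels[i]: {labels[j]: float(dist[i][j]) for j in range(n) if j != i}
         for i in range(n)}
    nodes = list(labels)
    while len(nodes) > 2:
        m = len(nodes)
        # Net divergence of each node from all other current nodes.
        r = {i: sum(d[i][k] for k in nodes if k != i) for i in nodes}
        # Pick the pair minimizing Q(i, j) = (m - 2) d(i, j) - r(i) - r(j).
        best = None
        for a in range(m):
            for b in range(a + 1, m):
                i, j = nodes[a], nodes[b]
                q = (m - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node u.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (m - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:.3f},{b}:{half:.3f});"


# Toy matrix derived from the tree ((A:1,B:2):3,(C:1,D:1)); placeholder taxa.
labels = ["A", "B", "C", "D"]
dist = [[0, 3, 5, 5],
        [3, 0, 6, 6],
        [5, 6, 0, 2],
        [5, 6, 2, 0]]
tree = neighbor_joining(labels, dist)
print(tree)
```

The two pairs and their branch lengths are recovered exactly because the toy distances fit a tree perfectly. With real per-locus distances NJ can return negative branch lengths, and a tree built from a handful of pigmentation loci need not match the genome-wide tree, which is the point of the figure.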

One of the questions that I often get is how to translate genetic variation into realized shifts in trait value in individuals, as opposed to simply the proportion of variation explained within the population. Luckily, geneticists who study pigmentation have a quantitative unit, the “melanin index” (MI), which exploits the fact that individuals with darker skin exhibit less reflectance. But there are two problems with giving a simple answer to these sorts of questions. First, a substitution of an allele may have an average effect, but that effect may not be realized for various reasons (e.g., epistasis). And there are still individual differences between people with the exact same genotype. Second, that effect manifests within a population, and different populations have different mixes of alleles.
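
The post does not give the formula behind the melanin index. A minimal sketch, assuming the definition commonly used with narrow-band skin reflectometers, MI = 100 × log10(1/R), where R is the fraction of red light (around 650 nm) reflected by the skin; instruments differ in detail, and the function names are illustrative.

```python
import math


def melanin_index(reflectance):
    """Melanin index from red-light reflectance R (a fraction in (0, 1]).

    Assumes the common reflectometry definition MI = 100 * log10(1 / R);
    instruments differ in the exact wavelength used.
    """
    if not 0 < reflectance <= 1:
        raise ValueError("reflectance must be a fraction in (0, 1]")
    return 100 * math.log10(1 / reflectance)


def reflectance_from_mi(mi):
    """Inverse of melanin_index."""
    return 10 ** (-mi / 100)


# Darker skin reflects less red light, so MI goes up as reflectance goes down.
for r in (0.50, 0.40, 0.30):
    print(f"R = {r:.2f} -> MI = {melanin_index(r):.1f}")

# How much reflectance does a 10-unit MI shift correspond to?
print(f"10 MI units = reflectance x {reflectance_from_mi(10):.3f}")
```

Under this definition a reflectance of 0.5 gives an MI of about 30, in the range quoted below for Northern Europeans, and each 10-unit increase in MI means roughly a fifth less red light reflected.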

The table to the left is adapted from The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. I think we can agree that the results here fit our intuitions. These are averages. Some of the populations in this list, such as the South Asian ones, as well as African Americans, exhibit a lot of variance within population. We now know why: they have a lot of segregating variants. Even within families you can see variation across siblings of quite an extreme nature. The subtle difference between Europeans and East Asians comports with my experience too. The American white population is mostly Northern European, so this is probably a bit on the low side in MI for a typical European population. A paper on Cuban pigmentation genetics gives a median MI of 34 for self-identified whites. The ancestry in this set is 86% European, 7% African, and 7% Native American. Therefore the average Iberian is probably somewhat lighter complected, but not by much. Notice how much darker Bougainville Islanders are than African Americans. Though the latter may be “black” in figurative terms, Bougainville Islanders are black in literal terms. Along with some Sudanic peoples they are among the darkest skinned in the world. In these data Tamil Brahmins are at 41. These are people whose surnames are often, but not always, Iyer. The stereotype, and my personal experience, is that the modal Tamil Brahmin is light to medium brown. Some are rather dark, while a few may have complexions that veer toward brunette white. To be honest, in my personal experience I have not met any Tamil Brahmins whose skin is white, though it has not been uncommon for me to meet individuals with such fair skin from Northwest India, in particular Punjabis and Kashmiris (the best way to judge for me is meeting people in real life, as I’ve heard that Indian celebrities are often made up in a way that lightens them somewhat).

The supplements of the paper have allele frequencies of SLC24A5 for various castes. Kashmiri Pandits are at >95% frequency for the A allele. Other Brahmins are at ~80%, irrespective of whether they are in the North or South. Punjabis, irrespective of caste, are at ~95%. Middle castes in South India, like the Reddy and Naidu, are at ~60 to 65%. Chamars, a Dalit caste in North India, clock in at 68%, while the Toda people of the Nilgiri plateau in the far south of India have a derived allele frequency of 86%. The low caste individuals in Bihar are at 78%. At the other end of the distribution some of the Austro-Asiatic tribes have very low frequencies. The Juang people, for example, are at 7%. Part of this may just be recent East Asian admixture. But that can’t explain all of it; these groups are mostly composed of the same component elements as other South Asians, albeit at fractions skewed toward the Ancestral South Indians (ASI). I don’t see any geographic pattern that suggests why selection would happen in certain regions and not in others. It is suggestive that the Kashmiris and Toda both live at high elevations, but so do the Austro-Asiatic groups. I’ll get back to this paper when we talk about selection, but I’ll set it aside for now.

Back to the question: what are the effects on MI of substituting particular alleles at given genes? The paper on Cuban admixture and pigmentation genetics, and another using Cape Verde as the population of interest, are particularly useful, because these two data sets have a wide range in ancestral quanta (these are not the only papers with these sorts of results, but this post isn’t a literature review!). The figure to the right is from the second paper, and shows the effect size in standardized units of variants which were statistically significant in their study. Pretty much every study tends to come to the conclusion that SLC24A5 is the biggest effect locus in the genome for this trait if the data set includes substantial West Eurasian ancestry. The main qualification I’d put on that is that East Asians have been understudied for this trait, so the European derived alleles are much better understood. Be that as it may, each substitution of the SLC24A5 derived allele, A, reduces MI by ~5 units. That is, it’s additive to a first approximation. Some studies do show a mild dominance effect…but of the A allele. That is, light is dominant to dark (e.g., in the Cape Verde study GG is further away from GA than AA is). It’s actually a consistent result. This is curious, because many people believe that dark skin is dominant to light skin. Thanks to genetics we know in a quantitative sense that that’s not true. In fact, perhaps the reverse is true for SLC24A5 and KITLG (concretely, heterozygous individuals will be lighter than you would expect going by the mid-parent mean).
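
The dominance claim can be phrased with the usual quantitative-genetics decomposition of genotype means into an additive effect a and a dominance deviation d. A minimal sketch with hypothetical genotype means: a 10-unit gap between homozygotes (consistent with ~5 MI units per allele) and a heterozygote shifted toward the light homozygote. These numbers are not from any study, and the function name is illustrative.

```python
def dominance_summary(mean_gg, mean_ag, mean_aa):
    """Additive effect, dominance deviation, and degree of dominance.

    Inputs are mean melanin index (MI) per genotype, where higher MI is
    darker. a is half the homozygote difference (positive when GG is darker);
    d is how far the heterozygote sits from the homozygote midpoint.
    k = d / a: 0 is purely additive, -1 means the A (light) allele is fully
    dominant, +1 means the G (dark) allele is fully dominant.
    """
    midpoint = (mean_gg + mean_aa) / 2
    a = (mean_gg - mean_aa) / 2
    d = mean_ag - midpoint
    k = d / a if a != 0 else float("nan")
    return a, d, k


# Hypothetical genotype means chosen to mimic the pattern described, where
# GG sits further from GA than AA does; not data from any study.
a, d, k = dominance_summary(mean_gg=50.0, mean_ag=43.0, mean_aa=40.0)
print(f"a = {a}, d = {d}, k = {k:.2f}")
if k < 0:
    print("heterozygotes are lighter than the midpoint: light is partially dominant")
```

A negative k is exactly the “heterozygotes lighter than the mid-parent mean” pattern; a value near zero would mean the locus is simply additive.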

But in a qualitative sense it is true, because many people simply “bin” complexion into white and non-white, with the latter encompassing a range all the way from pale olive-brown to black. Really the perception is a function of human culture, and ideas of contagion. I don’t like to make invidious accusations of racism often (I don’t think they’re warranted most of the time), but the perception that dark skin is dominant over white skin seems pretty easily explained by hypodescent within a framework of white racial superiority and exclusivity. Most people who have this impression are not racist at all, but, as per the cliche, they’ve internalized some perspectives about the recessive nature of whiteness which derive from a model whereby racial purity is essential and necessary for white identity. And, as I like to say, revealed preferences are telling. The majority of whites rapturously reading Ta-Nehisi Coates’s Between the World and Me have mostly white friends, live in mostly white neighborhoods, and date mostly white people. Yes, some of this is happenstance, but a sequence of events which consistently falls in one direction indicates preferences at variance with avowals of racial neutrality (Seinfeld and Girls operate in core white social worlds in a riotously diverse megalopolis where whites are a minority; believe it or not you can be friends mostly with people who are not of your race and exhibit good mental health, just ask me about my experience).

With that sociological tangent out of the way, what does this mean? What if I were GG, instead of AA, at SLC24A5? You would expect I’d be about 10 MI units darker. Instead of being an average-complected South Asian, neither dark nor fair, I’d be a dark-skinned one. As the above statistics suggest, it is very rare to find someone of unadmixed European background who carries a G allele at this SNP. But some do exist in the above data, so what would they look like? Let’s take a Northern European, with an MI of ~30. With one G allele the predicted value is about the same as for a “white Cuban.” In other words, they would be swarthy, notably so in Northern Europe. How about two G alleles, so that they are homozygous for the ancestral allele? You don’t really see Europeans with this genotype at all today. Assuming all other loci are the same (e.g., probably carrying the derived variant at SLC45A2), it looks as if you’d expect this Northern European, substituted at that SNP, to be about the same complexion as many Northern Indians today. Though some Northern Indians can pass as white, they are not common. Most are visibly brown in some sense.

But wait, there’s more! SLC45A2 does not have as strong an effect as SLC24A5, but it’s still significant. In the Cuban study a substitution at its major SNP of interest has an effect of ~3 units. If the genotypes at both these loci were homozygous ancestral in a Northern European, then the expected MI would be > 45. That’s around where the Senoi of Malaysia are. Definitely brown, a touch on the darker side. Then there are other loci: TYR, TYRP1, ASIP, KITLG, and APBA2. Few enough that I can name them, but enough that touching on each would be repetitious and boring. SLC24A5 and SLC45A2 seem relevant to pigmentation anytime you have a West Eurasian population in the mix. The other loci are hit and miss. But one thing that comes out of the studies in admixed populations is that a significant residual of this variation has still not been accounted for. In the Cape Verde study 44% of the variance seems to be due to “genomic ancestry,” that is, African vs. European. The implication here is that the loci we’re catching are at the large-effect end of a long-tailed distribution of effects, and there are smaller-effect loci still segregating which we haven’t picked up. In European populations, where a lot of this work began, only a few large-effect loci may be segregating, with the others fixed, and so not variable. This doesn’t change the big picture about the genomic architecture. But it’s more like half a dozen loci explaining half the heritable variation, as opposed to 90%. At least in that study (it seems that the population you are studying matters for the final summary statistic).
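
The back-of-the-envelope predictions in the last two paragraphs are just an additive model. A minimal sketch, assuming strict additivity, the rough per-allele effects quoted above (which come from different studies and populations), and a Northern European baseline MI of ~30 for someone homozygous for the derived allele at both loci; the function name is illustrative.

```python
# Per-allele effects in MI units, as quoted in the post (SLC24A5 ~5, SLC45A2 ~3).
# They come from different studies and populations, so treat them as rough.
EFFECTS = {"SLC24A5": 5.0, "SLC45A2": 3.0}


def predicted_mi(baseline_mi, ancestral_copies, effects=EFFECTS):
    """Additive prediction: add each locus's per-allele effect once for every
    ancestral (darkening) allele carried, relative to a derived-homozygote
    baseline. Ignores epistasis, environment, and all other loci."""
    total = baseline_mi
    for gene, copies in ancestral_copies.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{gene}: copies must be 0, 1 or 2")
        total += effects[gene] * copies
    return total


base = 30.0  # rough MI for a Northern European, as in the text
print(predicted_mi(base, {"SLC24A5": 1}))                # 35.0
print(predicted_mi(base, {"SLC24A5": 2}))                # 40.0
print(predicted_mi(base, {"SLC24A5": 2, "SLC45A2": 2}))  # 46.0
```

The three lines reproduce the figures in the text: one G allele at SLC24A5 lands near the median for self-identified white Cubans, two G alleles land about 10 units darker, and adding two ancestral SLC45A2 alleles pushes the prediction past 45.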

I left OCA2 and HERC2 out of the above list for a reason. Looking at them alone gives me a reason to post this beautiful figure of eye color distributions on a two-dimensional axis. As most of you may know, SNPs in the OCA2 and HERC2 region of the genome account for most of the blue vs. brown eye color variation in Europeans. Eye color varies less in human populations, and fewer genes likely affect this variation. In the Cape Verde sample the proportion of variation in skin color explained by African vs. European ancestry was 44% (the r-squared). For eye color? A mere 8% (note that they used an RGB quantification scale, rather than binning phenotypes). The correlation between skin color and eye color in this data set was 0.38, so 14% of the variation in eye color could be explained by variance in skin color (0.38² ≈ 0.14).

The combination of brown skin and light eyes in women such as Vanessa Williams, the first black Miss America, is totally understandable. All black Americans with roots in this country have ancestry that goes back to the 18th century at the latest, and all of them have white American ancestry (I’ve looked at a lot of black American genotypes; they’re mostly African, but all have some European ancestry, and I literally mean all). So the derived variants around OCA2 and HERC2 are segregating in African Americans at frequencies weighted by European ancestry, ~20% × 75%, or about 0.15, which implies that a few percent of African Americans should have light eyes. While skin color seems mostly additive, eye color does seem to exhibit a recessive expression pattern for the lighter variants. Therefore you need to square q, the frequency of the recessive allele in the Hardy-Weinberg equation (0.15² ≈ 0.02).
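
The eye color arithmetic can be written out in a few lines. A minimal sketch, assuming random mating with respect to these alleles (so Hardy-Weinberg proportions hold), a fully recessive light-eye allele, and an essentially zero frequency of that allele in the African ancestral component; the function names are illustrative.

```python
def admixed_allele_freq(ancestry_fraction, source_freq, other_freq=0.0):
    """Allele frequency in an admixed population, weighted by ancestry."""
    return ancestry_fraction * source_freq + (1 - ancestry_fraction) * other_freq


def recessive_phenotype_fraction(q):
    """Fraction expressing a recessive trait under Hardy-Weinberg: q^2."""
    return q * q


# ~20% and ~75% are the rough figures from the text; the light-eye allele is
# assumed to be essentially absent from the African ancestral component.
q = admixed_allele_freq(0.20, 0.75)
print(f"q = {q:.2f}")
print(f"expected light-eyed fraction: {recessive_phenotype_fraction(q):.1%}")
print(f"carriers (2pq): {2 * q * (1 - q):.1%}")
```

The contrast between q² and 2pq is the point: an allele at about 15% frequency leaves only around 2% of individuals homozygous, while roughly a quarter carry a single copy and, under a strictly recessive model, show no effect.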

But are the variants that result in blue eyes relevant only for eye color? Might they not explain skin color as well? That depends. The Cape Verde study did not find any of the blue vs. brown eye color SNPs to correlate with skin color when one controlled for genomic ancestry and the state of a nearby pigmentation gene. In contrast, the Cuba study did find that an OCA2 marker had an effect on skin color, a little over 1 MI unit. This is obviously a smaller effect than that of SLC24A5, but it is still an effect. As I indicated above, if you follow this literature you notice that a few genes have major effects no matter how you mix and match the data set and population coverage. Others are spottier, and may not reach statistical significance, depending on your mix of populations. It is important not to treat any one study as dispositive of a particular thesis.

What about hair color? While blue eyes are the majority state in much of Northern Europe, blonde hair in adults is rarer. This makes sense when you notice that the derived allele at one of the major pigmentation genes associated with blonde hair, KITLG, has a frequency of only about 15% in much of Northern Europe. That means that only a few percent of individuals are homozygotes (0.15² ≈ 2%). The above image of mice is from A molecular basis for classic blond hair color in Europeans. The individual in the middle is a heterozygote. The authors claim that they can see a subtle effect. I suppose it’s there if you squint (my son is a heterozygote, and I will report his hair is lighter than his sister’s, who is homozygous for the ancestral variant). The individual to the right in the figure is a pale homozygote for the derived allele. This locus also shows up in cats and horses in generating tissue-specific depigmentation, though in humans it has also been implicated in skin color and testicular cancer (yes, you read that right!).

But the scientific story about pigmentation isn’t simply one of GWAS after GWAS. There’s a huge evolutionary story here involving classic population genetic parameters, in particular natural selection. Many of these alleles have been implicated in selective sweep events. That is, the allele has increased in frequency very rapidly, often very recently. One major tell is that there are long haplotype blocks around these alleles. This means that there are sequences of variants closely associated with each other, which is suggestive of the fact that they’re co-inherited together as a unit in a region of the genome where the frequency is increasing faster than recombination can break apart the association. The region around OCA2 and HERC2 is Europeans is the third longest haplotype in the Northern European genome. SLC24A5 is a long haplotype that has very little variation in it from which one can infer structure. The paper above, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, the authors sequence the region around that locus to smoke out variation. There just isn’t that much time for the derived allele for to have accrued mutations. They conclude that the SNP in SLC24A5 responsible for lighter skin derives from a common mutation across all the populations in which it is prevalent. That is, the SNP spread through migration or selection from one individual, rather than the extant variation of a population, so that there were several genetic backgrounds from which selection could. A paper from 2013, Molecular phylogeography of a human autosomal skin color locus under natural selection, attempts to look at the haplotype patterns with a bigger population coverage but lower marker density. 
It concludes that “The distributions of C11 and its parental haplotypes make it most likely that these two last steps occurred between the Middle East and the Indian subcontinent.” In other words, the SNP took off from a launching pad in West Asia. If you look at their evidence, though, it is modest at best: they don’t have many variants from which to generate haplotypes, especially in a genetic region which lacks diversity.
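The "long haplotype" signal can be made quantitative. One standard statistic is extended haplotype homozygosity (EHH): among chromosomes carrying a given core allele, the probability that two drawn at random are identical at every site out to some distance. After a recent sweep, EHH around the favored allele decays slowly with distance because recombination has not yet broken the haplotype apart. This is a toy sketch; the haplotypes are made-up 0/1 strings, not data from either paper.

```python
from collections import Counter
from math import comb


def ehh(haplotypes: list, core: int, allele: str, end: int) -> float:
    """Extended haplotype homozygosity from site `core` out to site `end` (inclusive).

    Only haplotypes carrying `allele` at the core site are used. The result is the
    probability that two of them, drawn without replacement, match at every site
    between `core` and `end`.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    lo, hi = sorted((core, end))
    # Group carriers by their sequence over the extended window
    counts = Counter(h[lo : hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy phased haplotypes (hypothetical); site 2 is the core SNP, "1" the derived allele.
HAPLOTYPES = [
    "0010000",
    "0010000",
    "0010000",
    "0010001",
    "1100110",
    "0101010",
    "1001100",
    "0100011",
]

if __name__ == "__main__":
    for end in range(3, 7):
        derived = ehh(HAPLOTYPES, 2, "1", end)
        ancestral = ehh(HAPLOTYPES, 2, "0", end)
        print(f"sites 2..{end}: EHH derived={derived:.2f} ancestral={ancestral:.2f}")
```

In the toy data the derived haplotypes stay identical for several sites while the ancestral ones diverge almost immediately, the pattern a sweep leaves behind.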

All this talk about the past has been about inference. In the South Asian paper they use Bayesian methods to infer that the derived allele at SLC24A5 arose on a genetic background which coalesces 20-30 thousand years ago, with enormous confidence intervals on the order of tens of thousands of years. That doesn’t tell you much more than you already knew, as the distribution of the derived variant strongly suggests it arose after East and West Eurasians diverged. Haplotype-based methods suggest that the sweep in frequency occurred only in the last 5-10 thousand years.

So what does the ancient DNA tell us? The figure to the left is from Eight thousand years of natural selection in Europe. You can see that there is a transect in time of alleles in Northern Europe. Blue is the variant in SLC24A5, green is SLC45A2, and red is OCA2. The changes in allele frequencies over time are pretty similar to what you’d expect for a positive selective sweep, which is what the genomics is telling us occurred. The sweep of SLC24A5 is to fixation. This makes sense for an additive trait where selection prefers the homozygote state to the heterozygote state. SLC45A2 is close to fixation, though not as total as SLC24A5. Its trajectory has been more gentle, indicating a lower selection coefficient, at least across its arc up toward fixation. For OCA2 the pattern looks like one of demographic decline, as the allele was fixed in European hunter-gatherers. And yet at some point the frequency began to increase again. As this region of the genome has a long haplotype, that’s suggestive of selection, and not just demographic change. Since blue eyes are recessive, one major issue for any selective model that hinges on this trait is how selection would be effective at lower frequencies. E.g., if the allele is at 20% frequency, then only 4% of the population is homozygous and expresses the favored trait.
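The contrast between an additive allele and a recessive one shows up in the textbook one-locus selection recursion. In this sketch the derived homozygote has fitness 1 + s, the heterozygote 1 + hs, and the ancestral homozygote 1; h = 0.5 is additive and h = 0 fully recessive. The values of s and the starting frequency are illustrative assumptions, not estimates from Eight thousand years of natural selection in Europe, and drift is ignored.

```python
def next_freq(p: float, s: float, h: float) -> float:
    """Derived-allele frequency after one generation of selection (no drift).

    Genotype fitnesses: derived homozygote 1 + s, heterozygote 1 + h*s,
    ancestral homozygote 1. h = 0.5 is additive; h = 0 is fully recessive.
    """
    q = 1.0 - p
    w_dd = 1.0 + s
    w_da = 1.0 + h * s
    mean_w = p * p * w_dd + 2.0 * p * q * w_da + q * q
    return (p * p * w_dd + p * q * w_da) / mean_w


def generations_to(p0: float, target: float, s: float, h: float, max_gen: int = 1_000_000) -> int:
    """Generations needed for the derived allele to rise from p0 to at least target."""
    p, t = p0, 0
    while p < target:
        if t >= max_gen:
            raise RuntimeError("target frequency not reached")
        p = next_freq(p, s, h)
        t += 1
    return t


if __name__ == "__main__":
    # Illustrative values only: start at 1%, s = 0.05, stop at 50%.
    for label, h in (("additive", 0.5), ("recessive", 0.0)):
        print(f"{label:>9}: {generations_to(0.01, 0.5, s=0.05, h=h)} generations")
```

With these settings the recessive allele needs about an order of magnitude more generations to reach 50%, because while it is rare almost every copy sits in a heterozygote that selection cannot see.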

Of course there is Population Genomics in Bronze Age Eurasia, which has a much larger number of SNPs. But unfortunately, as they went with a whole genome methodology, they didn’t target the most important functional markers, but caught a lot of tag SNPs which are associated with the major ones. You can find the list for the populations in the supplements, though it covers many genes besides those involved in pigmentation. I took the table and filtered it for pigmentation SNPs, and also added the ones from the above paper. There is one overlap, at OCA2. As most of the SNPs are not super critical, I just pared them down to the really informative ones. You can access the full spreadsheet here.

Allele frequencies. Modern: Africa, N_Eur, S_Asia, S_Eur. Population Genomics in Bronze Age Eurasia: Asia, Eur, Steppe, HG, Neo. Eight thousand years of natural selection in Europe: SHG, WHG, EN, BA, Yam. A dash marks a value not available for that SNP.
SNP gene | Africa N_Eur S_Asia S_Eur | Asia Eur Steppe HG Neo | SHG WHG EN BA Yam
rs12821256 KITLG | 0.00 0.17 0.03 0.05 | 0.13 0.07 0.33 0.00 0.10 | - - - - -
rs1805005 MC1R | 0.00 0.08 0.01 0.20 | 0.00 0.05 0.00 0.00 0.00 | - - - - -
rs1805007 MC1R | 0.00 0.10 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 | - - - - -
rs1805008 MC1R | 0.00 0.07 0.00 0.03 | 0.00 0.03 0.00 0.00 0.00 | - - - - -
rs1805009 MC1R | 0.00 0.00 0.00 0.00 | 0.00 0.00 0.00 0.00 0.00 | - - - - -
rs2228479 MC1R | 0.00 0.07 0.09 0.10 | 0.00 0.13 0.20 0.00 0.00 | - - - - -
rs885479 MC1R | 0.00 0.12 0.08 0.03 | 0.09 0.00 0.00 0.00 0.00 | - - - - -
rs12913832 OCA2 | 0.01 0.85 0.08 0.30 | 0.40 0.41 0.00 1.00 0.56 | 1 1 0.5 0.5 0.1
rs2470102 SLC24A5 | 0.05 1.00 0.73 1.00 | 0.94 0.95 1.00 0.33 0.88 | - - - - -
rs28777 SLC45A2 | 0.12 0.98 0.23 0.95 | 0.50 0.61 0.33 0.43 0.56 | - - - - -
rs35395 SLC45A2 | 0.16 0.98 0.23 0.95 | 0.78 0.56 0.00 0.20 0.33 | - - - - -
rs1426654 SLC24A5 | 0.00 1.00 0.69 1.00 | - - - - - | 0.65 0.18 0.9 1 1
rs16891982 SLC45A2 | 0.00 0.98 0.06 0.90 | - - - - - | 0.65 0.00 0.2 0.75 0.4

I didn’t mention MC1R much above because it doesn’t explain much variance. It’s well known for two things. First, there’s a huge body of research from the era of classical mouse genetics on this locus because of its importance in fur coloration, and coat color across mammals in general. Second, knockouts at this locus seem to be a necessary, but not sufficient, condition for being red-haired, or a ginger. The decreased production of eumelanin combined with constitutive production of pheomelanin results in a reddish tinge. Most people have pheomelanin, but it’s masked by eumelanin. When I’ve bleached my hair there are two stages. First, the eumelanin gets stripped out, and my hair is left reddish/copper colored. Then a second bleaching removes the pheomelanin.

Before the “golden age” of pigmentation genetics, basically between December of 2005 and the end of 2007, there was a lot of exploration of MC1R because that’s where the light was. Here’s a paper from 2000, Evidence for Variable Selective Pressures at MC1R:

It is widely assumed that genes that influence variation in skin and hair pigmentation are under selection. To date, the melanocortin 1 receptor (MC1R) is the only gene identified that explains substantial phenotypic variance in human pigmentation. Here we investigate MC1R polymorphism in several populations, for evidence of selection. We conclude that MC1R is under strong functional constraint in Africa, where any diversion from eumelanin production (black pigmentation) appears to be evolutionarily deleterious. Although many of the MC1R amino acid variants observed in non-African populations do affect MC1R function and contribute to high levels of MC1R diversity in Europeans, we found no evidence, in either the magnitude or the patterns of diversity, for its enhancement by selection; rather, our analyses show that levels of MC1R polymorphism simply reflect neutral expectations under relaxation of strong functional constraint outside Africa.

The basic model here is that MC1R started losing function due to relaxation of constraint, and variation started to become dominated by neutral processes. It turns out that Neanderthals too had variation around MC1R. Further investigation suggests that modern Europeans don’t carry the Neanderthal variant. More recent evidence suggests that some haplotypes did introgress from Neanderthals at this locus, though perhaps into East Asians far more than Europeans.

So look at the MC1R SNPs in the table above. The Neolithic and HG samples are all fixed for the ancestral variant. That is one reason it seems implausible that the diversity of MC1R in Europe today is due to long-term drift in situ: it didn’t exist on the continent before the arrival of people from the steppe.

Second, rs12821256, in KITLG, associated with blonde hair in Europeans, is also not present in the ancient hunter-gatherers. But it is present in the Neolithic farmers, as well as in the people coming from the steppe. In fact the steppe samples have a higher fraction than any modern population (in the 1000 Genomes data the frequency is ~20% in the British and Finnish samples). Remember, KITLG has been implicated in skin depigmentation in several studies, though the effect size is more modest than SLC24A5’s.

For the two solute carrier genes the trends are what we already knew. The frequency for SLC24A5 is high in the steppe, in fact fixed, and high among the Neolithic farmers. It is low in Western European hunter-gatherers, and segregating at modest frequencies among the Scandinavian hunter-gatherers. The work above suggests that the genetic background around rs1426654, which is a nonsynonymous change, dates to the Upper Paleolithic. But both ancient DNA and haplotype-based selection methods suggest that in places like Europe and India the frequency of this allele and its flanking sequence have been rising rapidly over the past ~10,000 years. The fact that some European hunter-gatherers had the derived variant of rs1426654 seems to confirm the idea that this mutation arose during the Ice Age, and was widely distributed. But we can’t really adduce where the particular variant came from until we get good haplotype data from these ancient samples. Let me quote from Molecular Phylogeography of a Human Autosomal Skin Color Locus Under Natural Selection:

With sufficiently strong positive selection for C11, it is possible that this haplotype could have originated anywhere within its current range and spread via local migration. However, selection acting in concert with major population migrations would have facilitated a much more rapid dispersal. Archeological, mitochondrial, and Y-chromosomal data suggest involvement of multiple dispersals in shaping the current populations of Europe and the Middle East (Soares et al. 2010). Because A111T is far from fixation in most Indian samples (Table S1), the high diversity of B-region haplotypes associated with C11 in the GIH sample may be the result of prolonged recombination rather than early arrival of A111T. In fact, the decrease in frequency of A111T to the east of Pakistan suggests that C11 originated farther to the west and after the initial genetic split between western and eastern Eurasians. On this basis, we hold the view that an origin of C11 in the Middle East, broadly defined, is most likely.

Where does this leave us? First, we understand the genetic architecture of normal variation in pigmentation in humans to a good degree. Depending on how much residual variance remains in smaller-effect QTLs, there are publications to come which will probably yield a few more genes, but the remaining variance may simply be distributed across many small-effect loci. Second, the frequency of alleles at many pigmentation genes seems to have changed due to natural selection. In South Asia and Ethiopia the methods have been able to detect genomic signatures of positive selection at SLC24A5. It can’t be ancestry alone; just look at table S5 for South Asia. The range across populations is huge, even if you exclude those with enriched East Asian ancestry.

Third, we don’t really know why this selection occurred across these pigmentation genes. This is going to sound strange, of course. There are many theories out there. Readers regularly ask me what I think about Peter Frost’s thesis. My standard response is that I’m skeptical, but who knows? Peter has asserted that the selection he speaks of began in a very narrowly delimited area in northeastern Europe. In the next few years we will have ancient DNA and be able to test some of his predictions. A more widely accepted thesis is promoted by Nina Jablonski in Skin: A Natural History. In her model, at lower latitudes selection constrains variation due to high UV, while at higher latitudes that constraint is relaxed and there is selection for vitamin D synthesis. The story is neat, but selection for SLC24A5 also occurs at lower latitudes, and at higher elevations within those latitudes.

The map to the left makes clear that the Sudan has some of the highest radiation levels in the world. It is reasonable then that people in this area would have darker skin than anywhere else. But Ethiopia’s radiation levels are not that much lower. And yet we know that there hasn’t been strong selection against the light skin alleles presumably derived from West Eurasian migrants. Rather, the reverse has occurred! None of the parsimonious models seem to explain very well the complexity on offer here.

Then, as Graham Coop observed in response to an Ewen Callaway piece in Nature where the latter inferred that European hunter-gatherers must have been dark skinned and blue eyed because of what genetics implies, we don’t really know the genetic architectures of pigmentation of ancient individuals. The reason is simple: we have genotype data, but not phenotype data. East Asians and Western Europeans converge upon lighter complexions via diverse genetic mechanisms, so why couldn’t ancient European hunter-gatherers be the same? This is a fair point. And, if true, then selection on pigmentation loci couldn’t, by definition, target pigmentation, since there wouldn’t be much heritable phenotypic variation to select upon.

But in response to the idea that we should be phenotype-agnostic: pigmentation is one of the best-characterized traits in mammals in regard to its genetics. The parameter space of possibilities is not infinite. The same genes, and sometimes the same mutations, recur across different populations. The reason some Melanesians have blonde hair is a mutation in TYRP1. Again, this is a locus implicated in pigmentation variation across many populations, and in other mammalian lineages. If we had high-quality whole genome sequences we could actually look for functional mutations across a set of pigmentation loci. If ancient European hunter-gatherers were functionally constrained around the pigmentation genes, or subject to neutral dynamics, that would be informative. A better characterization of all the diverse modern populations will probably give us better expectations of the size of the parameter space of genetic variation and how it maps onto phenotypic variation.

I’ve been giving a lot of thought to this topic for a while. And I have to say that in terms of the evolutionary origin of this trait and its variation, I’m left befuddled. After talking to researchers who are on the cutting edge in this area I’m pretty sure they are confused, too. That’s not dispiriting; that’s the state of science before discoveries push the edge of knowledge further. But, I’d also appreciate it if in response to this very long post readers don’t go Google Pundit on me and start throwing down a list of publications which resolve all these problems. I’m moderately familiar with this literature, and have probably internalized studies which go in both directions. In response to a post into which I put more effort over the last day than I probably should have, I expect the comments to be not-annoying. Or else (I assume you know what’s in that conditional!).

• Category: Science • Tags: Genomics, Pigmentation 

Citation: Wilde et al.

Credit: Igor Kruglenko

A new paper in PNAS, Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 years, uses ancient DNA to examine the possibility of very recent natural selection in Europeans. In particular, it focuses on eastern Europeans, in a region roughly coterminous with Ukraine, ~6000 to ~4000 years ago. The sample seems somewhat biased toward the low end of the age range if you look at the supplemental tables. In the paper itself (which is open access) I don’t see a map to get a sense of the distribution of the sites from which the DNA was extracted. So I took the supplemental table, used the latitude and longitude information as well as the samples from each site, and produced a density map with a bubble plot overlaid on it with specific locations (size of bubble proportional to number of samples at site). As with the earlier ancient DNA from a few European hunter-gatherers, one must keep in mind the limits of using so few samples to infer about so many. Though the number here is far larger (N > 20 or > 40 depending on the SNP), the set of markers examined was much smaller: a few pigmentation loci and mtDNA. Nevertheless this is not a trivial geographic range, nor is the time frame, from the Early Eneolithic down to the Bronze Age.
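The bubble-plot step amounts to collapsing per-sample rows into one point per site. A minimal sketch, assuming the supplemental table can be reduced to (sample ID, latitude, longitude) rows; that layout and the rows in the example are placeholders, not the paper’s data. matplotlib’s `scatter` takes marker area through its `s` argument, so the returned areas could be passed straight to it; the plotting call is left out here.

```python
from collections import Counter


def site_bubbles(samples, area_per_sample=20.0, ndigits=4):
    """Collapse per-sample rows into one bubble per site.

    samples: iterable of (sample_id, latitude, longitude) tuples (assumed layout).
    Coordinates are rounded to `ndigits` so tiny formatting differences do not
    split one site into two. Returns (latitude, longitude, n_samples, bubble_area)
    tuples, with bubble area proportional to the number of samples at the site.
    """
    counts = Counter(
        (round(lat, ndigits), round(lon, ndigits)) for _sample_id, lat, lon in samples
    )
    return [
        (lat, lon, n, area_per_sample * n) for (lat, lon), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    # Placeholder rows only; substitute the real supplemental-table values.
    rows = [("s1", 47.0, 35.0), ("s2", 47.0, 35.0), ("s3", 48.5, 33.2)]
    for lat, lon, n, area in site_bubbles(rows):
        print(lat, lon, n, area)
```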

Figure S1

The clearest illustration of the topline result is found in the supplements (obviously, I prefer figures to tables). What you see here is that there is a large difference in allele frequencies between ancient samples and modern ones from the equivalent geographic region at specific markers diagnostic for variation in pigmentation in modern Europeans. HERC2 is well known for being one of the two loci which span a long haplotype strongly correlated with blue eyes in Europeans. SLC45A2 and TYR are part of the standard suite of pigmentation genes which show up as variable across Eurasia. I am confused as to why they did not focus on SLC24A5, a locus which is nearly totally fixed in modern Europeans for the A allele, but may not have been so in hunter-gatherers. But in any case the result is rather clear: the ancient populations sampled here are statistically differentiated from modern populations in the same region, and seem to have been darker complected in comparison. The natural inference then is that powerful sweeps of natural selection increased the frequencies of lightening alleles in Europeans within the last ~4,000-6,000 years. This is not a crazy proposition; tests for recent natural selection in Europeans are often enriched around pigmentation loci, which are genomically atypical (long homogeneous blocks are common). What this study does is intersect inferences from modern variation with the distribution of variants in an ancient population presumed to be ancestral.
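How "powerful" a sweep has to be can be estimated by inverting a simple selection model. Under multiplicative per-allele fitness (genotypes 1, 1 + s, (1 + s)²), random mating and no drift, the odds p/(1 − p) grow by exactly a factor of 1 + s each generation. This sketch uses that textbook model, not the paper’s own inference; the input frequencies and the 25-year generation time are hypothetical.

```python
def selection_coefficient(p0: float, pt: float, generations: int) -> float:
    """Per-allele selection coefficient implied by a change in allele frequency.

    Assumes multiplicative (genic) fitnesses 1, 1 + s, (1 + s)**2, random mating and
    no drift; under that model the odds p / (1 - p) grow by exactly (1 + s) per generation.
    """
    if not (0.0 < p0 < 1.0 and 0.0 < pt < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    odds_ratio = (pt / (1.0 - pt)) / (p0 / (1.0 - p0))
    return odds_ratio ** (1.0 / generations) - 1.0


def project(p0: float, s: float, generations: int) -> float:
    """Forward version of the same model: allele frequency after `generations`."""
    odds = p0 / (1.0 - p0) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical inputs: 30% -> 80% over 5,000 years at an assumed 25 years per generation.
    gens = 5000 // 25
    s = selection_coefficient(0.30, 0.80, gens)
    print(f"{gens} generations, s = {s:.4f}")
```

These made-up inputs give s of about 0.011 per allele copy per generation. Swapping in observed ancient and modern frequencies would give the corresponding estimate under this model, which ignores drift and any population replacement.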

The problem of course is whether these are truly ancestral. But recall I stated earlier that they had mtDNA. This is copious, and so rather easy (comparatively!) to get from ancient DNA. Comparing their samples with modern ones from the region, they find there isn’t great discontinuity. Using a model of genetic drift they support the scenario of continuity, and that the Fst of ~0.005 is what you would expect for a set of populations separated by ~4,000-6,000 years. To put this in perspective, this is about the Fst using autosomal SNPs between Russians and French, or Palestinians and Greeks. Considering the time depth separating these putative populations, I think even without their coalescent simulation models I can accept continuity of mtDNA intuitively. Of course the key is to not forget this is mtDNA, only the maternal lineage. If you looked at the mtDNA of modern South Asians you’d see it is mostly not West Eurasian. But if you looked at their Y chromosomes they’d be mostly West Eurasian. The autosomal DNA gives a half & half picture. The issue of sex-mediated gene flow is even more stark in the case of Latin America.
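For intuition about why a small Fst can fit continuity, a standard pure-drift result is that the standardized variance in allele frequency grows as F_t = 1 − (1 − 1/N)^t for a haploid marker such as mtDNA (with 1/(2N) per generation for an autosomal diploid one). This is a simplification I am substituting for the paper’s coalescent simulations, not their method; N (the effective number of females, for mtDNA) and the 25-year generation time are assumptions.

```python
def drift_f(n_eff: float, generations: int, ploidy: int = 1) -> float:
    """Expected standardized variance of allele-frequency change under pure drift.

    ploidy=1 treats a haploid, maternally inherited marker such as mtDNA, with n_eff
    the effective number of females; ploidy=2 gives the autosomal 1 - (1 - 1/(2N))**t.
    """
    copies = ploidy * n_eff
    return 1.0 - (1.0 - 1.0 / copies) ** generations


def implied_n_eff(f: float, generations: int, ploidy: int = 1) -> float:
    """Invert drift_f: the effective size at which drift alone produces f in t generations."""
    per_generation = 1.0 - (1.0 - f) ** (1.0 / generations)
    return 1.0 / (ploidy * per_generation)


if __name__ == "__main__":
    # Assumed: ~5,000 years at 25 years per generation; F ~ 0.005 as quoted above.
    gens = 5000 // 25
    print(f"implied effective female size: {implied_n_eff(0.005, gens):,.0f}")
```

Run in reverse, an F of 0.005 over 200 generations corresponds to an effective female population of roughly 40,000 under this model; smaller populations would drift apart faster.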

A model like this is made more plausible by the fact that many of these individuals were of the Yamna culture, Kurgans. The thesis forwarded by some scholars is that it is these Kurgans, a patriarchal nomadic society, who brought Indo-European languages to central and western Europe ~5,000 years ago (their eastern cousins becoming Tocharians and Indo-Iranians, their southern ones Hittites and perhaps Armenians). Probably the best recent outline of this thesis is by David Anthony in The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. I found it so engrossing that I finished it in one sitting in 2008. If these data are correct, the Kurgans did not look like blonde Aryan Übermensch; rather, they became that (though to be fair, in this case we are talking about them becoming Slavs, whom the Nazis labelled Untermensch). But one of the general assumptions about Kurgans is that they were groups of mobile males. In that case one wouldn’t be surprised if their mtDNA tended to reflect subject peoples, while the whole genome was more mixed and cosmopolitan, reflecting their migrations.

So the crux then is whether to trust this mtDNA evidence as representative of the whole genome. If I simply had the mtDNA, along with the information about provenance in terms of time and place, I’d probably accept the argument for continuity. But the phenotypic markers are so different that either there’s been population replacement or we’ve had a lot of in situ selection. Replacement seems like the more boring hypothesis, especially in light of the fact that many of the sites sampled were not in classically Slav zones of habitation, but were occupied by Iranian or Uralic peoples, or more recently Turks. Though the researchers are using contemporary East Slavs to compare to the ancient samples, across many of these sites Slavs only become dominant in the area with the rollback of the Ottomans in the 18th century.

Ultimately I’m very unsure that the assumption of genetic continuity in this case will hold, but let’s simply take it as a given for now. Then what? You have lots of selection. The question naturally moves to why. What drove the selection? In the discussion the authors go over many of the hypotheses rather thoroughly. Roughly they fall into two classes, the ecological/environmental and the social/sexual. The former generally has to do with a combination of a switch to agriculture and the need to synthesize vitamin D due to the shift away from fish in the far north. The latter focuses on sexual selection favoring particular markers due to heightened paternity certainty. In particular, the sexual selection hypothesis would seem able to explain the rise of HERC2, which is associated with light eyes, as that may be a favored trait. The immediate rejoinder is provided in the text: many of the pigmentation loci have pleiotropic effects. In other words, they tune overall pigmentation of skin, hair, and eyes, though perhaps to different extents. So if the selection was environmental and acting on skin, it would not be totally surprising if hair and eyes changed as a side effect. Of course, as suggested in the comments here, one need not posit one singular selection event as opposed to a sequential composite. Perhaps it was both environmental and sexual selection?

This is yet another area where I’ll throw my hands up in the air. If selection is the answer, and not population replacement, then it’s very strong. It seems that these loci were subject to sweeps in the same range of power as that around LCT, for lactase persistence, the Tibetan high altitude adaptations, as well as the various malaria resistance alleles (which have different selective dynamics, some of them balancing). One can actually still detect differential fitness at high altitudes based on phenotype, and the same with malaria, at least before modern medicine. The problem I have is that I’m just not aware of studies on the extent of differential fitness in human populations due to sexual selection. In theory sexual selection is very powerful, especially in contexts of hyper-polygyny, but to have it be realized in humans would require very particular social structures. The environmental selection arguments by their nature tend to be simpler, and therefore more attractive. But we’ve reached a point where there’s a lot of confusing stuff coming out of ancient DNA, and we need to go back to first principles and reexamine everything. This includes sexual selection, as more than simply a deus ex machina to throw out there when we don’t have a better model on hand. That necessitates a serious examination of patterns of variance in reproductive output by phenotype, and plugging these back into models of selective sweeps.

Citation: Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 years

Note: Yulia Tymoshenko has very dark eyes. So I assume she’s not a natural blonde.

A phenotypic takeaway from the ancient European DNA preprint has been that the hunter-gatherer from Luxembourg was light eyed and darker skinned (and dark haired), while the early farmer from Germany was lighter skinned and darker eyed (and dark haired). In yesterday’s post a reader pointed out that I misinterpreted the genotypes of these two individuals at a very important pigmentation locus: I thought that both were homozygous for the allele conferring lighter skin, which is at very high frequency in modern Europeans. I was wrong. At this locus, SLC45A2, the allele correlated with darker complexion and eyes is present at ~3% in Europeans (as opposed to ~100% in East Asia and Sub-Saharan Africa). In a European American sample the genotypes are as follows:

Genotype Count
CC 15
CG 321
GG 3964

Both the hunter-gatherer and the farmer are CC. Combined with the hunter-gatherer being GG (which is nearly absent in modern Europeans) at SLC24A5, it does seem that the authors of the preprint were right: the hunter-gatherer had darker skin. The twist is that the region of the genome, OCA2/HERC2, that seems to explain most of the blue vs. non-blue eye color difference in Europe is homozygous for the blue variant in the hunter-gatherer. If I had just the pigmentation loci, I would think that the hunter-gatherer in this study was from a population mixed between Europeans and non-Europeans, for example, inhabitants of Cape Verde.
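Turning genotype counts into an allele frequency is simple counting: each CC carries two C alleles and each CG one. From the European American counts above alone, the C allele comes out at about 4.1%, and the Hardy-Weinberg expectation for CC is about 7 people against 15 observed. A short sketch of that arithmetic, assuming only the counts listed above:

```python
def allele_freq_and_hwe(n_minor_hom: int, n_het: int, n_major_hom: int):
    """Minor-allele frequency and Hardy-Weinberg expected genotype counts.

    Returns (q, (expected minor homozygotes, expected heterozygotes, expected major homozygotes)).
    """
    n = n_minor_hom + n_het + n_major_hom
    q = (2 * n_minor_hom + n_het) / (2 * n)  # each homozygote carries two copies
    p = 1.0 - q
    expected = (n * q * q, n * 2.0 * p * q, n * p * p)
    return q, expected


if __name__ == "__main__":
    # European American genotype counts at the SLC45A2 SNP listed above (C is the minor allele)
    q, (e_cc, e_cg, e_gg) = allele_freq_and_hwe(15, 321, 3964)
    print(f"C allele frequency: {q:.3%}")
    print(f"expected CC {e_cc:.1f}, CG {e_cg:.1f}, GG {e_gg:.1f} (observed 15, 321, 3964)")
```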

The hunter-gatherer individual sequenced from Loschbour rock shelter may be an anomaly. But you wouldn’t find this sort of individual among modern Europeans without recent non-European ancestry, even as an anomaly. So there’s something to explain.

• Category: Science • Tags: Pigmentation 

One of the most interesting results in the preprint on ancient European genetics (or more accurately, the ethnogenesis of Europeans in a genetic sense) is that the ~8,000 year old hunter-gatherer sample from Luxembourg had a GG genotype at the SLC24A5 locus. Actually, “interesting” isn’t the right word; shock, and frankly a little skepticism, is more precise. The reason for my reflexive incredulity is that the GG genotype is very much the minor variant in Western Eurasia, and extremely rare among unadmixed Europeans. Europeans have such a high fraction of the A allele that some population genetic statistics to test for selection at this locus are not viable, because there’s not enough variation segregating in that region. The A allele is also present outside of Europe, as the major variant in South Asians, albeit at a lower fraction, verging on ~50% or less in some South Indian groups. Yet, given the genomic features* of the region in which it is embedded, it is not entirely implausible that this allele only swept to fixation over the past 8,000 years in Europe.

I want to make more concrete why this result is a pretty big deal. If you look at the 1000 Genomes data you have results for British, Finnish, Tuscan, and Spanish individuals, as well as a well characterized sample of white Utahans of Northwest European heritage. There is also a less well characterized pooled data set of “European Americans.” Here are the genotype counts by population:

Population AA AG GG
Utah white 85 0 0
British 89 0 0
Finnish 91 2 0
Tuscan 97 1 0
Spanish 14 0 0
European American 4256 40 1

Yesterday on Twitter I suggested that I’d want at least 10,000 individuals of unadmixed Northern European ancestry before I might take a bet that I’d find someone with a GG genotype. I don’t think I was exaggerating. The sample size might be one, but the fact that the individual was homozygous for GG implies to me that the G allele was present at a far higher fraction in Northern Europe 8,000 years ago than today. In contrast, the LBK farmer was AA at SLC24A5. Why this matters functionally is that no matter how you look at it, when comparing Europeans and dark skinned populations (e.g., Africans, South Indians, and Australasians) this locus is the one that explains the highest proportion of the variation in pigmentation of any gene. Comparing simply people of African ancestry and Europeans, the variation at this gene accounts for on the order of ~1/3 of the difference.** I myself have the “European” AA genotype, with most of my other large effect loci carrying the “dark” correlated alleles. The pigmentation difference between a Sub-Saharan African and myself is probably accounted for by this locus alone. But a twist on this story is that the hunter-gatherer also exhibited the genotype associated with blue eyes in Europeans. In contrast, the farmer’s genotype was the one not correlated with blue eyes. At another locus which is not quite fixed for a derived light encoding variant, but very close in Europeans (and found in much lower proportions in other West Eurasians), SLC45A2, it looks as if both the hunter-gatherer and the farmer carry the modal European form.
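The 10,000 figure can be checked with a back-of-the-envelope calculation. Pooling the unadmixed 1000 Genomes rows in the table above (Utah, British, Finnish, Tuscan, Spanish) gives 3 G alleles out of 758, and under Hardy-Weinberg a GG homozygote occurs with probability q². The sketch assumes random mating and treats sampled people as independent draws, which real sampling will not exactly satisfy.

```python
import math


def prob_at_least_one_homozygote(q: float, n: int) -> float:
    """Chance that n unrelated people include at least one minor-allele homozygote (HWE assumed)."""
    # 1 - (1 - q^2)^n, computed with log1p/expm1 to stay accurate for tiny q
    return -math.expm1(n * math.log1p(-q * q))


def sample_size_for(prob: float, q: float) -> int:
    """Smallest n with at least `prob` chance of sampling one or more homozygotes."""
    return math.ceil(math.log(1.0 - prob) / math.log1p(-q * q))


# Pooled from the unadmixed rows of the table: Finnish 2 AG + Tuscan 1 AG = 3 G alleles
# among 85 + 89 + 93 + 98 + 14 = 379 people, i.e. 758 allele copies.
Q_G = 3 / 758

if __name__ == "__main__":
    print(f"P(>=1 GG among 10,000): {prob_at_least_one_homozygote(Q_G, 10_000):.3f}")
    print(f"n for even odds: {sample_size_for(0.5, Q_G):,}")
```

Under these assumptions 10,000 people give only about a 14.5% chance of turning up a single GG, and roughly 44,000 are needed for even odds, so the 10,000 figure is, if anything, conservative.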

Rather than squeezing too much more out of a few samples, I want to posit that these results increase the plausibility that the suite of genetic variants across many loci which are often diagnostic of the complexion of Northern Europeans is a function of a combination of admixture and then selection within the resultant Northwest European lineages. It seems plausible that independent selection events were occurring across these groups, and that with admixture more novel variants were present in the combined population, which allowed for a skew even further along the phenotypic continuum, toward the physiological limit (at least for non-albinos). Though it looks like the majority of the ancestry of Northern Europeans, especially populations around the coastal East Baltic region, derives from hunter-gatherer groups indigenous to the continent (i.e., pre-Holocene), if they were not fixed for the derived variant on SLC24A5 it seems implausible that these ur-Europeans were defined by the rosy complexions which are archetypical for Northern Europeans. This is part of the broader picture whereby the phenotypically salient population clusters we see around us today, as if they are Platonic ideals of underlying racial forms, may themselves be phenomena distinctive to the Holocene.

* A large correlated block of markers which seem to have risen in frequency recently and rapidly within the population.

** Northeast Asians have their own distinct mutations which confer light skin.

• Category: Science • Tags: Pigmentation, Population Structure, Race 
Distribution of SLC24A5 variation at SNP rs1426654. Credit: HGDP Browser

Nina Davuluri, Miss America 2014, Credit: Andy Jones

One of the secondary issues which cropped up with Nina Davuluri winning Miss America is that it seems implausible that someone with her complexion would be able to win any Indian beauty contest. A quick skim of a Google Images search for “Miss India” will make clear the reality that I’m alluding to. The Indian beauty ideal, especially for females, is skewed to the lighter end of the complexion distribution of native South Asians. Nina Davuluri herself is not particularly dark skinned if you compare her to the average South Asian; in fact she is likely at the median. But it would be surprising to see a woman who looks like her held up as conventionally beautiful in the mainstream Indian media. When I’ve pointed this peculiar aspect out to Indians,* some of them will submit that there are dark skinned female celebrities, but when I look up the actresses in question they are invariably not very dark skinned, though perhaps by comparison to what is the norm in that industry they may be. But whatever the cultural reality is, the fraught relationship of color variation to aesthetic variation prompts us to ask: why are South Asians so diverse in their complexions in the first place? A new paper in PLoS Genetics, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, explores this genetic question in depth.

Much of the low hanging fruit in this area was picked years ago. A few large effect genetic variants which are known to be polymorphic across many populations in Western Eurasia segregate within South Asian populations. What this means in plainer language is that a few genes which cause major changes in phenotype are floating around in alternative flavors even within families among people of Indian subcontinental origin. Ergo, you can see huge differences between full siblings in complexion (African Americans, as an admixed population, are analogous). While loss of pigmentation in eastern and western Eurasia seems to be a case of convergent evolution (different mutations in overlapping sets of genes), the H. sapiens sapiens ancestral condition of darker skin is well conserved from Melanesia to Africa.
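The sibling point follows from Mendelian segregation. As a hypothetical illustration, suppose both parents are heterozygous at k unlinked loci with equal additive effects; each child’s count of light alleles is then binomial with 2k trials and probability 1/2, and full siblings are independent draws from that distribution. The number of loci and the equal-effects assumption are simplifications, not estimates for any real family.

```python
from fractions import Fraction
from math import comb


def light_allele_distribution(k: int) -> list:
    """Exact distribution of 'light' allele counts in a child of two parents who are
    heterozygous at k unlinked loci (each parent transmits either allele with p = 1/2)."""
    n = 2 * k  # two transmitted alleles per locus, one from each parent
    return [Fraction(comb(n, i), 2**n) for i in range(n + 1)]


def prob_sibs_differ_by_at_least(k: int, d: int) -> Fraction:
    """Probability that two full siblings differ by at least d light alleles."""
    dist = light_allele_distribution(k)
    return sum(
        (px * py for x, px in enumerate(dist) for y, py in enumerate(dist) if abs(x - y) >= d),
        Fraction(0),
    )


if __name__ == "__main__":
    # Hypothetical: three segregating large-effect loci, both parents heterozygous at each
    print(prob_sibs_differ_by_at_least(3, 2))
```

With k = 3, two full siblings differ by two or more light alleles with probability 397/1024, a little under 40%, even before any smaller-effect loci or environment come into play.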

So what’s the angle on this paper, you may ask? Two things. The first is that it has excellent coverage of South Asian populations. This matters because to understand variation in complexion you should probably look at populations which vary a great deal. Much of the previous work has focused on populations at the extremes of the human distribution, Africans and Europeans. There are obvious limitations to this approach: if you are studying a variable trait, then focusing on populations where its full range of variation is expressed can be useful. Second, this paper digs deeply into the subtle evolutionary and phylogenomic questions posed by the diversification of human pigmentation. It is often said that race is only skin deep, as if to dismiss the importance of human biological variation. But skin is a rather big deal. It’s our biggest organ, and the pigmentation loci do seem to be rather peculiar.

You probably know that on the order of ~20% of genetic variation is partitioned between continental populations (races). But this is not the case at all genes, and pigmentation genes tend to be particularly notable exceptions to the rule. In late 2005 a paper was published which arguably ushered in the era of modern pigmentation genomics, SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. The authors found that one nonsynonymous mutation was responsible for on the order of 25 to 38% of the difference in skin color between Africans and Europeans. And the allele frequencies were nearly disjoint across the two populations, and between Europeans and East Asians. When comparing Europeans to Africans and East Asians almost all the variation was partitioned across the populations, with very little within them. The derived SNP, which differs from the ancestral state, is found at ~100% frequency in Europeans, and ~0% in Africans and East Asians. It is often stated (you can Google it!) that this variant is the second most ancestry-informative allele in the human genome for distinguishing Europeans from Africans.
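To make “almost all the variation was partitioned across the populations” concrete, here is a minimal sketch of Nei's G_ST, one common F_ST-style measure, for a single biallelic SNP in two equally weighted populations. It is an illustration only: the inputs are the rounded ~100% vs ~0% figures quoted above plus a hypothetical, unremarkable SNP, not estimates from the paper, and it ignores sample-size corrections.

```python
def gst(p1, p2):
    """Nei's G_ST for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in each population (0..1).
    Returns the share of total expected heterozygosity that lies
    between the populations rather than within them.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                        # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population
    if h_total == 0:
        return 0.0  # monomorphic overall: nothing to partition
    return (h_total - h_within) / h_total


# Rounded SLC24A5-like contrast from the text: ~100% vs ~0%.
print(gst(1.0, 0.0))   # 1.0: all variation lies between the populations
# A hypothetical, unremarkable SNP: 60% vs 40%.
print(gst(0.6, 0.4))
```

The hypothetical 60% vs 40% SNP comes out at about 0.04, which is why a locus like SLC24A5 stands out so sharply against the genome-wide average.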

SLC24A5 was just the beginning. SLC45A2, TYR, OCA2, and KITLG are just some of the alphabet soup of loci which have come to be understood to affect normal human variation in pigmentation. Despite the relatively large roll call of pigmentation genes, one can safely say that between any two reasonably distinct geographic populations ~90 percent of the between-population variation in the trait is going to be due to ~10 genes. Often there is a power-law distribution as well: the first few genes of large effect account for over 50% of the variance, while subsequent loci are progressively less important.
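The “~10 genes, ~90 percent” pattern can be illustrated with a toy model. The sketch below assumes, hypothetically, that 50 loci contribute and that each successive locus explains 80% as much variance as the one before. That geometric decay is a simple stand-in for the power-law-like pattern described, and neither number comes from any paper.

```python
def cumulative_shares(n_loci, ratio):
    """Cumulative share of variance explained by the top k loci, k = 1..n_loci,
    when each successive locus explains `ratio` times as much as the one before."""
    raw = [ratio ** i for i in range(n_loci)]   # locus i explains ratio**i units
    total = sum(raw)
    shares, running = [], 0.0
    for r in raw:
        running += r
        shares.append(running / total)
    return shares


# Hypothetical: 50 contributing loci, each explaining 80% as much as the previous one.
shares = cumulative_shares(50, 0.8)
print(f"top 4 loci:  {shares[3]:.2f}")   # top 4 loci:  0.59
print(f"top 10 loci: {shares[9]:.2f}")   # top 10 loci: 0.89
```

Under these made-up parameters the first four loci already explain over half the variance and the first ten nearly 90%, while the remaining 40 loci share the last ~11%.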

So how does this work to push the overall results forward?

– With their population coverage the authors confirm that SLC24A5 seems to be polymorphic in all Indo-European and Dravidian speaking populations in the subcontinent. The frequency of the derived variant ranges from ~90% in the Northwest, and ~80% in Brahmin populations all over the subcontinent, to ~10-20% in some tribal groups.

– Though there is a north-south gradient, it is modest, with a correlation of ~0.25. There is a much stronger correlation with longitude, but I’m rather sure that this is an artifact of their sparse sampling of Indo-European populations in the eastern Gangetic plain. As hinted in the paper, the correlation with longitude has to do with the fact that Tibetan and Burman populations in these fringe regions tend to lack the West Eurasian allele.

– Using haplotype-based tests of natural selection the authors infer that the frequency of this allele has been driven up by positive selection in north India, but not in the south. It could be that the authors lack power to detect selection in the south because of the lower frequency of the derived allele. And I did wonder if selection in the north was simply an echo of what occurred in West Eurasia. But if you look at the frequency of the A allele in the north, most of the populations seem to have a higher frequency of the derived variant than their inferred fraction of “Ancestral North Indian” ancestry.
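The logic of that last comparison can be made explicit. If the derived allele entered a population only through admixture, its expected frequency is the ANI fraction times its frequency in the ANI source plus the ASI fraction times its frequency in the ASI source. Assuming, generously, that the ANI source was fixed for the allele and the ASI source lacked it, the expected frequency simply equals the ANI fraction. The sketch below uses hypothetical numbers, not values from the paper, and ignores drift.

```python
def expected_admixed_frequency(ani_fraction, freq_in_ani=1.0, freq_in_asi=0.0):
    """Allele frequency expected from admixture alone (no selection, no drift)."""
    if not 0.0 <= ani_fraction <= 1.0:
        raise ValueError("ani_fraction must be between 0 and 1")
    return ani_fraction * freq_in_ani + (1 - ani_fraction) * freq_in_asi


# Hypothetical northern population: 60% inferred ANI ancestry, derived allele at 80%.
ani, observed = 0.6, 0.8
expected = expected_admixed_frequency(ani)   # upper bound: ANI source fixed, ASI lacking it
print(f"expected <= {expected:.2f}, observed {observed:.2f}, excess {observed - expected:+.2f}")
```

An observed frequency above even this upper bound is hard to explain with admixture alone, which is why it points toward selection within India rather than a mere echo of West Eurasian history.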

What’s perhaps more interesting is the bigger picture of human evolutionary dynamics and phylogenetics that these results illuminate. Resequencing the region around SLC24A5, these researchers confirmed that the derived variant does look to be identical by descent in all populations across Western Eurasia and into South Asia. What this means is that this mutation arose in someone at some point around the Last Glacial Maximum, after West Eurasians separated from East Eurasians. The authors give some numbers using some standard phylogenetic techniques, but admit that it is ancient DNA that will give true clarity on the deeper questions. When I see something written like that my hunch, and hope, is that more papers are coming soon.

When I first read The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, I thought that it was essential to also read Ancient DNA Links Native Americans With Europe and Efficient moment-based inference of admixture parameters and sources of gene flow. The reason goes back to the plot which I generated at the top of this post: notice that Native Americans do not carry the West Eurasian variant of SLC24A5. What the find of the ~24,000-year-old Siberian boy, and his ancient DNA, suggests is that there was a population with affinities closer to West Eurasians than East Eurasians that contributed to the ancestry of Native Americans. The lack of the European variant of SLC24A5 in Native Americans suggests to me that the sweep had not begun, or that the European variant was disfavored. What the other paper reports is that on the order of 20-40% of the ancestry of Europeans may be derived from an ancient North Eurasian population, unrelated to West Eurasians (or at least not closely related). It is likely that this population has something to do with the Siberian boy. Since Europeans are fixed for the derived variant of SLC24A5, that implies to me that the sweep must have occurred after 24,000 years ago.

At this point I have to admit that we need to be careful calling this a “European variant.” Just because it is nearly fixed in Europe does not imply that the variant arose in Europe. If you look at the frequency of the derived variant you see it is rather high in the northern Middle East. Looking at some of the populations in the Middle Eastern panel, the presence of the ancestral variant might be entirely explained by admixture from Africa in historical time. If the sweep began during the last Ice Age, then most of Europe would have been uninhabited. The modern distribution is informative, but it surely does not tell the whole story.

Where we are is that SLC24A5, and pigmentation as a whole, is coming close to being fully characterized genomically. We don’t know the whole story of why light skin was selected so strongly. And we don’t quite know where the selection began, or when it began. But by gradually filling in pieces of the puzzle we may come to grips with this adaptively significant trait in the near future.

Citation: Basu Mallick C, Iliescu FM, Möls M, Hill S, Tamang R, et al. (2013) The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. PLoS Genet 9(11): e1003912. doi:10.1371/journal.pgen.1003912

* From my personal experience American born Indians often do not share the same prejudices and biases, partly because subtle shades of brown which are relevant in the Indian context seem ludicrous in the United States.

• Category: Science • Tags: Anthropology, Genetics, Genomics, Pigmentation 

Likely an individual with derived allele on KITL locus (Credit: David Shankbone)

An individual polymorphic on the KITL locus? (Credit: David Shankbone)

Pigmentation is one of the few complex traits in the post-genomic era which has been amenable to nearly total characterization. The reason for this is clear in hindsight. As far back as the 1950s (see The Genetics of Human Populations) there were inferences made using human pedigrees which suggested that normal human variation in this trait was controlled by fewer than ten genes of large effect. In other words, it was a polygenic character, but not highly so. This means that the alleles which control the variation are going to have reasonably large effects, well within the power of statistical genetic techniques to detect.
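The text does not say exactly which method those mid-century analyses used. Purely as an illustration of how one can count genes from phenotypes alone, here is the classic Castle-Wright estimator, which infers an “effective number of loci” from the gap between two parental means and the extra variance released by segregation after crossing. Applying it to humans requires further assumptions (for example, treating an admixed population as the analogue of an experimental cross), and all the numbers below are hypothetical.

```python
def castle_wright(mean_p1, mean_p2, var_f2, var_f1):
    """Castle-Wright estimate of the effective number of loci.

    n_e = (mean_p1 - mean_p2)**2 / (8 * segregation_variance),
    where segregation_variance = var_f2 - var_f1 is the genetic variance
    released by segregation. Assumes additive loci of equal effect, all
    'increasing' alleles in one parental line, and no linkage. Unequal
    effects, linkage, or alleles spread across both parents tend to make
    it an underestimate of the true number of loci.
    """
    seg_var = var_f2 - var_f1
    if seg_var <= 0:
        raise ValueError("F2 variance must exceed F1 variance")
    return (mean_p1 - mean_p2) ** 2 / (8 * seg_var)


# Hypothetical pigmentation scores (arbitrary units), not real data.
print(castle_wright(mean_p1=50.0, mean_p2=10.0, var_f2=60.0, var_f1=10.0))  # 4.0
```

The intuition matches the paragraph above: if a large difference between parental groups produces only modest segregation variance, a handful of large-effect genes is implied; many tiny-effect genes would release much less variance per unit of parental difference.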

I should be careful about being flip on this issue. As recently as the mid-aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known with reasonable clarity, as far as these things go. The 50,000-foot perspective is this: we lost our fur millions of years ago and developed dark skin, and many of us lost that pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).

A new paper in Cell sheds some further light on the fine-grained details which might be the outcome of this process. Being a Cell paper, there is a lot of neat molecular technique to elucidate the mechanistic pathways. But I will gloss over that, because it is neither my forte nor my focus. In summary, the paper shows that p53, a relatively well known tumor suppressor gene, seems to interact with a response element around the KITLG locus (the gene product is a transcription factor, and it binds in many regions). This locus is well known in part because it has been implicated in pigment variation in humans and fish. So KITLG is one of the generalized pigmentation pathways which spans vertebrates. There are derived variants in both Europeans and East Asians which are correlated with lighter skin, though there is polymorphism in both cases (the variant has not swept to fixation).

The wages of adaptation? (Credit: Hoggarazzi Photography)


But this is a Cell paper, so there has to be a more concrete and practical angle than just evolution. And there is. It turns out that a single nucleotide polymorphism in the p53 response element results in a tendency toward upregulation of KITLG and male germ line proliferation. The latter matters when it comes to tumorigenesis, and in particular testicular cancer. This is a form of cancer in which p53 itself does not seem to be somatically mutated. Additionally, the authors observe that testicular cancer manifests at a 4-5 fold greater rate in people of European descent than in African Americans. And, presumably, the upregulation of KITLG is somehow related to increased melanin production. The authors posit that because of lighter skin in Europeans, due to selection at other loci, there has been a balancing effect at KITLG (increased tanning response). There is evidence of selection at this locus (a long haplotype and increased homozygosity), so this is not an unreasonable conjecture, though the high frequency of loss of function alleles suggests that the model is likely complex.
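“A long haplotype and increased homozygosity” is usually quantified with extended haplotype homozygosity (EHH): among chromosomes carrying the same core allele, the probability that two chosen at random are identical over the whole stretch from the core SNP out to a given distance. A recently swept allele keeps EHH high over long distances because recombination has not yet had time to break its haplotype up. This is a minimal sketch of the statistic itself; the haplotypes are hypothetical, and it is not a claim about exactly which tests the Cell paper ran.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """Extended haplotype homozygosity between SNP index `core` and `end`.

    haplotypes: equal-length 0/1 strings, all carrying the same core allele.
    Returns the probability that two randomly chosen haplotypes are
    identical at every SNP from core to end (inclusive).
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, end))
    segments = Counter(h[lo:hi + 1] for h in haplotypes)
    identical_pairs = sum(comb(c, 2) for c in segments.values())
    return identical_pairs / comb(len(haplotypes), 2)


# Hypothetical haplotypes carrying the same core allele at index 2.
haps = ["00000", "00000", "00001", "01000"]
for end in range(5):
    print(end, ehh(haps, core=2, end=end))
```

EHH starts at 1 at the core and decays as you move away; comparing how slowly it decays for the derived versus the ancestral allele is the basis of haplotype-based selection scans.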

I don’t know if this particular story is correct in its details (though I am intrigued that variation in KITLG is associated with cancer in other organisms). But it illustrates one of the possible consequences of rapid evolutionary change due to human migration out of Africa: deleterious side effects because of pleiotropy. In other words, as you tinker with the genomic architecture of a population you are going to have to accept tradeoffs as you are optimizing one aspect of function. Genes don’t have just one consequence, but are embedded in myriad pathways. Over time evolutionary theory predicts a slow re-balancing, as modifier genes arise to mask the deleterious side effects. But until then, we will bear the burdens of adaptation as best as we can.

Citation: Zeron-Medina, Jorge, et al. “A Polymorphic p53 Response Element in KIT Ligand Influences Cancer Risk and Has Undergone Natural Selection.” Cell 155.2 (2013): 410-422.


SLC45A2 rs16891982 frequency, Norton, Heather L., et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians.” Molecular biology and evolution 24.3 (2007): 710-722.


The above figure is from Norton et al.’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. It shows that rs16891982 at the SLC45A2 locus exhibits strong differentiation between Europe and the rest of the world. This is in contrast to SLC24A5, where the well known allele which differentiates Africans/East Asians from Europeans is found at very high frequencies across Western Eurasia (both my parents are homozygotes for the “European” variant; in fact SLC24A5’s derived variant is found at frequencies on the order of ~50% in eastern and southern India). The ancestral allele at SLC24A5 is very difficult to find in Europeans; the derived variant is that close to fixation. In contrast, SLC45A2’s minor allele is segregating at appreciable frequencies in places like southern Spain, and the derived allele is not fixed even in Northern Europe.

I won’t review the literature on the genomics and evolution of human pigmentation at this point. Rather, I’ll just note two things. First, it seems most of the inter-population variation is controlled by a handful of genes. It’s a polygenic trait, but only just. Second, a fair amount of evidence has emerged that some of the lightening derived variants have increased in frequency only very recently (e.g., on the order of ~10,000 years). Pigmentation is then a peculiar trait where the genetic underpinnings can give historical phylogenetic information because of the varied dates of differentiation and selective sweeps.

Below I’ve collated results from several studies on frequencies of SLC45A2. I invite readers to peruse them. I will say two things. First, the frequency of the “European” variant in ~140 northern Ethiopians is 0%. This is peculiar for a population which may be on the order of ~50% West Eurasian. Second, the frequency of the SLC45A2 derived variant in South Asians coincidentally tracks the “NE Euro” percentage in Zack Ajmal’s results.


Country/Region | Group/Place | N | Frequency of light allele

Source: A Decreasing Gradient of 374F Allele Frequencies in the Skin Pigmentation Gene SLC45A2, from the North of West Europe to North Africa
Denmark | Copenhagen | 51 | 0.98
England | London | 56 | 0.955
Belgium | Brussels | 53 | 0.934
France | Lille | 64 | 0.945
France | Rheims | 98 | 0.893
France | Rennes | 52 | 0.971
France | Marseilles | 312 | 0.888
France | Perpignan | 101 | 0.827
France | Corsica | 328 | 0.878
Germany | Mulheim | 59 | 0.975
Switzerland | Basel | 51 | 0.96
Italy | Genoa | 97 | 0.85
Italy | Roma | 64 | 0.898
Italy | Napoli | 128 | 0.859
Italy | Sicily | 39 | 0.833
Italy | Sardinia | 100 | 0.805
Spain | Barcelona | 59 | 0.856
Spain | Sevilla | 71 | 0.725
Portugal | North | 79 | 0.829
Portugal | South | 59 | 0.78

Source: Near Fixation of 374l Allele Frequencies of the Skin Pigmentation Gene SLC45A2 in Africa
Algeria | Algiers | 141 | 0.7
Morocco | Tangier | 123 | 0.69
Morocco | Rabat | 102 | 0.68
Morocco | Berbers from Morocco | 75 | 0.57
Libya | Tripoli | 38 | 0.58
Egypt | Alexandria | 162 | 0.65
Egypt | Assouan | 66 | 0.14
Egypt | South | 46 | 0.2
Mauritania | Moors | 65 | 0.41
Senegal | Wolof | 209 | 0
Senegal | Serrere | 92 | 0
Senegal | Mandingue | 51 | 0
Senegal | Diola | 42 | 0
Senegal | Balant | 21 | 0
Senegal | Peuls | 71 | 0.1
Senegal | Toucouleur | 70 | 0.03
Senegal | Soninké | 69 | 0.03
Ethiopia | Addis Ababa | 104 | 0
Ethiopia | Falashas | 38 | 0
Democratic Republic of Congo | | 188 | 0

Source: Distribution of the F374 Allele of the SLC45A2 (MATP) Gene and Founder-Haplotype Analysis
Munich | German | 93 | 0.962
West Germany | Turk | 200 | 0.615
New Delhi | Indian | 51 | 0.147
Dhaka | Bangladeshi | 118 | 0.059
Ulaan Baator | Khalha | 173 | 0.113
Dashbalbar | Buryat | 143 | 0.115
Shenyang | Han | 89 | 0.028
Wuxi | Han | 119 | 0
Huizhou | Han | 111 | 0.005
Tottori | Japanese | 103 | 0
Okinawa | Japanese | 87 | 0
Surabaya | Indonesian | 105 | 0.005
 | White S African | 54 | 0.89
 | Ghanaian | 50 | 0
 | New Guinean | 52 | 0
 | Japanese | 49 | 0

Source: Polymorphisms of four pigmentation genes (SLC45A2, SLC24A5, MC1R and TYRP1) among eleven endogamous populations of India
Jharkhand | Munda | 68 | 0.03
Madhya Pradesh | Kanyabuja Brahmin | 78 | 0.11
Madhya Pradesh | Gond | 75 | 0.02
Maharashtra | Konkanastha Brahmin | 71 | 0.06
Maharashtra | Mahadev Koli | 65 | 0.06
Tamil Nadu | Iyengar Brahmin | 66 | 0.07
Tamil Nadu | Kurumans | 67 | 0.07
Tripura | Tripuri | 65 | 0
Tripura | Riang | 67 | 0.01
• Category: Science • Tags: Genomics, Human Genetics, Pigmentation 

Image credit: Muntuwandi

One of the pitfalls of talking about genetics, especially human genetics, is that the public wants a specific gene for a specific trait. Ergo, the “God gene” or the “language gene.” In some cases science has been able to pull a rabbit out of the hat and offer up a gene for a trait. But in most of those instances these are going to be single-gene recessive diseases. Not exactly what the doctor ordered. In other cases the association seems trivial. For example, wet or dry earwax?* What people are truly interested in is the genetic basis of complex traits, such as intelligence, personality, and height. Unfortunately complex traits often have a complex genetic basis. A trait such as height, which is highly heritable (i.e., most of the variation in the population is due to variation in genes), turns out to be subject to the control of innumerable genes, each of which has a small impact on the value of the final trait. Then there is the possibility that the heritability is partly tied up in interaction effects across genes.

All of this might compel you to wonder why anyone would even tackle the morass that is complex trait genetics. The simple answer is: why not? The more concrete answer is that, unlike social scientists, geneticists have the gene as an abstract unit from which to construct their theoretical models. It may be a daunting task, but unpacking the causal components of complex traits in a genetic sense is at least more tractable than other intellectual endeavors. Memes are fine as a metaphor, but they haven’t been nearly as useful as genes in constructing a science which generates non-obvious inferences.

For most complex traits of any great interest it is not feasible for someone to list off the genes which control most of the normal variation in that trait. Pigmentation is an exception to this. While the continuous variation in height or intelligence seems to be distributed across many, many genes (on the order of hundreds or thousands), most of the variation in pigmentation seems to be concentrated in a few genes of large effect. One way to think about it is as an exponential decay function where each successive gene explains less and less of the variation in the trait within your population(s) of interest. So, SLC24A5 is probably the largest-effect locus, with a nearly disjoint allele frequency difference between Europeans and Africans (and East Asians). A paper from the mid-aughts reports that a substitution at this locus in Europeans (who carry the derived variant) explains “between 25 and 38% of the European-African difference in skin melanin index.” A few years later another paper reported that a variant within the KITLG region was responsible for ~20% of the variation in European-African pigmentation difference. Assuming independent effects, these two genes may then account for nearly half of the average difference in trait value across the two populations. There are other loci which crop up in the literature repeatedly: TYR, ASIP, SLC45A2, OCA2, and HERC2, for example.

The point in listing off these genes is to emphasize that pigmentation has been one of the major success stories in human genomics over the past decade. This really is the golden age of inquiry into this field. As I like to recount, in 2003 Armand Leroi wrote in the epilogue to his book Mutants that we didn’t even have a good grasp of the genetic basis of the normal variation in skin color in humans. This assertion was totally out of date within five years. Such a radical change is what you want science to be on its best days, when you’re not slamming your head against a problem which presents no clear and obvious solution. Though many of the genes above were discovered by analyzing differences between Europeans and Africans, a similar set seems to be segregating within South Asians. Expanding the sample coverage to more diverse populations does yield more loci, and in many cases you see different mutations within the same gene producing a change in pigmentation across divergent populations (e.g., at OCA2 there are European and East Asian derived variants). All this taken together seems to imply that changes in pigmentation occurred repeatedly across human populations over the last 10-20,000 years, though targeting the same relatively small space of genes which can modulate melanin pathways.

As you might have guessed I have been keeping track of this literature rather closely for a while. The reason is twofold. First, normal human phenotypic variation is interesting to me, and the genetics of pigmentation has been a relative success story. Second, the polygenic but large-effect genetic character of pigmentation was predicted by the mid-20th century using an analysis of phenotypes and pedigrees in mixed-race populations (see Genetics of Human Populations). All of it has been coming together so well that I’ve started ignoring the literature in this area. Pigmentation genetics was becoming more and more the purview of forensic specialists and the like.

But a new paper in PLOS Genetics makes me reconsider whether the game is quite over yet, Genetic Architecture of Skin and Eye Color in an African-European Admixed Population:

Differences in skin and eye color are some of the most obvious traits that underlie human diversity, yet most of our knowledge regarding the genetic basis for these traits is based on the limited range of variation represented by individuals of European ancestry. We have studied a unique population in Cape Verde, an archipelago located off the West African coast, in which extensive mixing between individuals of Portuguese and West African ancestry has given rise to a broad range of phenotypes and ancestral genome proportions. Our results help to explain how genes work together to control the full range of pigmentary phenotypic diversity, provide new insight into the evolution of these traits, and provide a model for understanding other types of quantitative variation in admixed populations.

That’s the author summary, not the abstract. The primary result is that the authors find that the genes of large effect have proportionately much smaller effects than in earlier studies, and that ancestry can explain a substantial proportion of the variation even after the specific loci are accounted for. Their study population is in Cape Verde, an archipelago off the coast of West Africa where most of the population is of West African and Portuguese ancestry. This is not an unimportant detail.

Much of their analysis of Cape Verde as a mixed population uses the Yoruba and CEU HapMap data sets. CEU consists of whites from Utah who are of British and other assorted Northern European ancestry. The Yoruba are probably a reasonable representative of the African ancestors of this population (though the HGDP Mandenka would probably have been somewhat better). But I don’t understand why they didn’t use the HapMap Tuscan population, since the parental European population for individuals from Cape Verde is Portuguese (Southern European), not Northern European. When speaking of pigmentation genetics it is important to note that European populations vary a great deal. Previous studies focused on a population which was about 20% Northern European and 80% West African (African Americans). This study focuses on a population which is 40% Southern European and 60% West African. It seems entirely reasonable that the admixture source populations would have a strong impact on the nature of the genetic effects.

The image (source) to the left illustrates variation in the frequency of alleles of SLC45A2. A derived variant associated with lighter skin is prevalent across Europe at frequencies on the order of ~90%. But whereas in Northern Europe the frequencies range from 90 to 100 percent, in Portugal the variant is present at frequencies of ~80 percent. Alleles associated with light eyes in Europeans are also found at far lower frequencies in the southwest and southeast of the continent.
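A back-of-the-envelope sketch of why the European source population matters, using SLC45A2 frequencies from the table earlier on this page. Assumptions: the Danish sample stands in for Northern Europeans, the Ghanaian sample stands in for the Yoruba, and each light allele shifts pigmentation additively by one hypothetical “effect unit”. For a diploid additive locus the expected gap between two population means is 2 × effect × (difference in allele frequency).

```python
# Light-allele (374F) frequencies at SLC45A2, copied from the table earlier on this page.
FREQ = {
    "Denmark (Copenhagen)": 0.98,
    "Portugal (North)": 0.829,
    "Portugal (South)": 0.78,
    "Ghanaian": 0.0,
}


def mean_trait_gap(p_a, p_b, effect=1.0):
    """Expected difference in mean trait value between two populations due to
    one additive, diploid locus where each light allele adds `effect` units."""
    return 2 * effect * (p_a - p_b)


west_african = FREQ["Ghanaian"]
for pop in ("Denmark (Copenhagen)", "Portugal (North)", "Portugal (South)"):
    print(f"{pop} vs Ghanaian: {mean_trait_gap(FREQ[pop], west_african):.2f} effect units")
```

Under these assumptions the same locus contributes about a quarter more to the mean gap for the Danish sample than for southern Portugal (1.96 vs 1.56 units), so it would look larger in a Northern European/West African admixed sample than in Cape Verde.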

Overall the biggest result out of this paper is found in the abstract: “We identify four major loci…for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%)” is average genomic ancestry. The implication, which they lay out, is that in this admixed population the genetic architecture is such that within that 44% there may be smaller-effect genes diffused throughout the genome, strongly correlated with ancestry (i.e., European ancestral segments carry more “light” alleles, African segments the “dark” ones). This is not entirely unreasonable. If pigmentation loci are targets of selection (their results suggest that this is so) then one might see change at large-effect loci first, and then gradual convergence to the adaptive peak via small-effect loci. But I also believe that the fact that the European source population is on the darker side is having an effect. The allele frequency differences between Swedes and Yoruba would be larger than those between Portuguese and Yoruba (though to be sure the Portuguese and Swedes would still be far closer to each other).
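Why can genome-wide ancestry soak up so much variance? A minimal sketch: if many unmapped small-effect loci differ in frequency between the two source populations, their expected combined contribution for a person with European ancestry fraction m is a straight-line function of m, so a regression on ancestry captures it even though no single locus is detected. The 200 loci, their effect size, and their frequencies are all hypothetical, and the sketch shows expected values only, not the individual scatter around the line.

```python
def expected_polygenic_score(m, effects, freq_eur, freq_afr):
    """Expected summed contribution of many additive loci for a person whose
    genome is a fraction m European (0..1), assuming each allele copy sits on
    a European segment with probability m and follows that ancestry's frequency."""
    total = 0.0
    for a, pe, pa in zip(effects, freq_eur, freq_afr):
        p = m * pe + (1 - m) * pa     # expected local allele frequency
        total += 2 * a * p            # diploid, additive
    return total


# Hypothetical: 200 small-effect loci, each 0.01 units, at 60% vs 40% frequency.
n = 200
effects, f_eur, f_afr = [0.01] * n, [0.6] * n, [0.4] * n
for m in (0.0, 0.4, 1.0):
    print(m, round(expected_polygenic_score(m, effects, f_eur, f_afr), 3))
```

The slope equals the sum of 2 × effect × frequency difference over all loci (0.8 units here), which is exactly the signal a “genomic ancestry” term in a regression would absorb.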

Rather than reject previous models, these results refine them, and remind scientists that they need to see how robust their general inferences are. It seems plausible that in the next few years more scholars will explore the genetics of pigmentation in diverse populations, and therefore gain a more nuanced understanding of the genetic architecture of this trait.

Citation: Beleza S, Johnson NA, Candille SI, Absher DM, Coram MA, et al. (2013) Genetic Architecture of Skin and Eye Color in an African-European Admixed Population. PLoS Genet 9(3): e1003372. doi:10.1371/journal.pgen.1003372

* To be fair, these trivial associations are often side effects of other genetic changes which are presumably adaptive in some fashion.

• Category: Science • Tags: Genetics, Genomics, Pigmentation 

The Pith: the evolution of lighter skin is complex, and seems to have occurred in stages. The current European phenotype may date to the end of the last Ice Age.

A new paper in Molecular Biology and Evolution, The timing of pigmentation lightening in Europeans, is rather interesting. It’s important because skin pigmentation has been one of the major successes of the first age of human genomics. In 2002 we really didn’t know the nature of normal human variation in skin color in terms of specific genes (basically, we knew about MC1R). This is what Armand Leroi observed in Mutants in 2005, wondering about our ignorance of such a salient trait. Within a few years, though, Leroi’s contention was out of date (in fact, it became out of date while Mutants was going to press). Today we do know the genetic architecture of pigmentation. This is why GEDmatch can predict that my daughter’s eyes will be light brown from just her SNPs (they are currently hazel). This genomic yield was facilitated by the fact that pigmentation seems to be a trait where most human variation is controlled by half a dozen genes. In contrast, height and I.Q. are controlled by innumerable genes.

But first, a major gripe. In the discussion they write: “Our estimates additionally show that the onset of selective sweeps at SLC24A5, SLC45A2, and TYRP1, the three genes in which the geographic distribution of the polymorphisms is primarily restricted to European populations.” This is just not literally true. SLC24A5 in its derived skin-lightening state is found outside of Europe. As the map from the HGDP browser to the left indicates, the derived “European” variant is at very high frequency in Middle Easterners; if you subtract Sub-Saharan admixture it is almost fixed. It is also found at high frequencies in South Asians. The HGDP samples are Pakistani, but the derived variant is present at a frequency of 95% in the HapMap Gujaratis! My parents are also homozygotes for the derived “European” variant. I’m rather sure there are more copies of the derived “European” allele among non-Europeans: South Asians, Middle Easterners, and North Americans. The problem here is semantic, I think. The authors were really talking about West Eurasians in a generic sense, but because their data utilized Europeans, East Asians, and Africans, they felt like they had to speak about Europeans specifically. Additionally, during the Last Glacial Maximum much of Europe was not inhabited, or only very sparsely so. That suggests to me that much of the evolution of “European pigmentation” may have taken place outside of geographical Europe proper.

As for the paper, the results are pretty simple and striking. And speaking of striking, I’ll just paste this figure illustrating a neighbor-joining network of haplotypes at four skin pigmentation loci first to orient you. The yellow bubbles are derived lineages (in this case, they are often associated with SNPs correlated with lighter skin), while the black are ancestral ones.

What you see in the first two panels is that derived lineages are tightly clustered. SLC24A5 in particular looks to have almost a “star phylogeny,” so that you are seeing signatures of rapid expansion of this haplotype. SLC45A2, in contrast, is dispersed across the network. The authors posit that there may have been a recombination event which resulted in the jumping of the derived lineage onto the background of the ancestral one. Finally, with KITLG you see a pattern where numerous derived lineages are widely dispersed, albeit differentiated from the ancestral branch.

How did they do this? For the purposes of this blog post what I will say is that they first focused on a SNP, a single nucleotide polymorphism, associated with lightening of the skin. This need not be the causal mutation, but such SNPs are generally strongly associated with the trait, and so can serve as useful markers. Second, around these focal SNPs they assembled a set of microsatellites with which they could perform phylogenetic tests. Microsatellites mutate fast, and accumulate variation. The main issue is that they mutate so fast you lose resolution at deeper time depths.
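Why fast-mutating microsatellites carry timing information can be sketched with the simplest dating idea. This is not the authors’ method (they compared their data against simulations of explicit models); it is a textbook-style illustration under a strict stepwise mutation model, using the modal repeat count as a stand-in for the founder allele, a hypothetical mutation rate of 5 × 10^-4 per generation, and an assumed 25 years per generation.

```python
from statistics import mode


def asd_age_generations(repeat_counts, mu):
    """Rough age (generations) of a lineage from one microsatellite.

    Under a strict stepwise mutation model (each mutation adds or removes
    one repeat with equal probability), the expected squared distance of a
    chromosome's repeat count from its founder's grows by mu per generation.
    The modal repeat count stands in for the unknown founder allele.
    """
    founder = mode(repeat_counts)
    asd = sum((x - founder) ** 2 for x in repeat_counts) / len(repeat_counts)
    return asd / mu


# Hypothetical repeat counts on chromosomes carrying the derived (light) SNP allele.
counts = [20, 20, 20, 20, 21, 19, 20, 20]
mu = 5e-4            # assumed mutation rate per generation
gens = asd_age_generations(counts, mu)
print(f"~{gens:.0f} generations, ~{gens * 25:,.0f} years at 25 years/generation")
```

Little accumulated variation around the focal SNP implies a young lineage. The same logic shows the limitation noted above: once repeat lengths have drifted far from the founder, and real loci constrain how far they can go, the squared distance stops growing linearly and older dates blur together.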

With the combination of SNP and microsatellite data the authors tested their empirical patterns against explicit models from which they generated simulations. Basically the goal here was to test for neutrality. In other words, you have a set of outcomes you’d expect based on neutral dynamics (i.e., just drift changing the frequencies), and you see how the “real world” results fit in. If the empirical data are not well explained by the neutral model, perhaps it was selection? Looking at patterns of variation around these loci you can also get a sense of the strength of the selection and time since the last common ancestor. Here’s a table with the outcomes:

Just so you know, a selection coefficient of 0.01 is respectable, and 0.10 is massive. In the case of SLC24A5 in particular it looks like there was a lot of selection, and recently. A few years ago a conference presentation implied that the selective sweep around SLC24A5 began ~6,000 years ago. To my knowledge a paper never came out of this, and from what I’ve heard that is in part because that very low number is probably not right, and you may have to push it back some. These results look to be in roughly the right range from what I’ve heard. Others have found similar ages for SLC24A5 and SLC45A2 sweeps. But take a look at the confidence intervals. This is a case where I would really like to play around with their data and the model assumptions, and see how robust they are.
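To give a feel for why 0.01 is respectable and 0.10 is massive, here is the standard deterministic approximation for how long a sweep takes. It assumes genic selection with dp/dt = s·p·(1 - p), whose solution is logistic, so the log-odds of the allele rise linearly at rate s. The start and end frequencies and the 25-year generation time are assumptions for illustration, not values from the paper.

```python
import math


def sweep_generations(s, p_start, p_end):
    """Generations for an allele to rise from p_start to p_end under
    deterministic genic selection, dp/dt = s * p * (1 - p)."""
    def logit(p):
        return math.log(p / (1 - p))
    return (logit(p_end) - logit(p_start)) / s


for s in (0.01, 0.10):
    g = sweep_generations(s, 0.001, 0.999)
    print(f"s = {s:.2f}: ~{g:,.0f} generations (~{g * 25:,.0f} years at 25 years/generation)")
```

In this approximation a tenfold larger coefficient makes the sweep tenfold faster: roughly 1,400 versus 140 generations to go from 0.1% to 99.9%. Real sweeps also pass through a stochastic early phase at low frequency, which this ignores.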

More intuitively obvious, though, are the patterns of KITLG in terms of geography, as well as the haplotype phylogenetic tree. The authors basically conclude that KITLG is a variant which precedes the differentiation between Europeans and East Asians, while the other genes have sweeps which postdated the divergence. The latter makes sense in light of the differentiation in skin pigmentation architecture between western and eastern Eurasians. Repeatedly the authors basically admit that this is a complicated issue, so I wouldn’t take these results as final. It does concern me that they assume a demographic model which is a tree without reticulation. My own question in regards to the ~25,000 year values for the divergence of west and east Eurasians is the extent to which admixture and gene flow are pulling the node forward in time. Second, the authors focused on a few representative populations in Europe, East Asia, and Africa. But there’s a whole world out there. It isn’t as if evolution occurred in isolation at these antipodes, and everyone else is a linear combination of subsequent admixture. In fact, I have to wonder if the estimates here are for populations which are intrusive to Europe, rather than indigenous. One might speculate that newcomers assimilated old lightening variants from the European Ice Age hunter-gatherers. But the haplotype structure militates against this. You should see more diverse derived haplotypes if they were drawn from the reservoir of ancient variants extant in Ice Age Europe.

So what’s the explanation from the authors? One proposal they make is that human evolution is accelerating due to more genetic variation because of larger effective population sizes. I assume they make this argument because it doesn’t look like the more recently selected variants emerged from standing variation, the diversity already present at the time of the sweep. Rather, the sweeps are triggered by new mutations which emerged recently (ergo, fewer “steps” away mutationally in the network for all the derived variants).

Ultimately there’s a lot to think about here. But I do wonder how ancient DNA is going to update and revise things. As I’ve said over and over again I’m a lot more skeptical of inferences and simulations after the dozens of phylogenetic model papers I read in the 2000s which “proved” no admixture between archaics and modern humans.

Image credit: Rita Molnar


Michelle points me to this article in The Los Angeles Times, The Colors of the Family:

I was holding my 1-year-old, ambling about downtown with some friends. White friends. She must have thought my boy belonged to one of them.

There’s a simple explanation: I’m black but my son, Ashe, is white. At least he looks it.

But things are more complicated than that.

I’m actually half black and half white. It should come as no surprise, though, that even as sophisticated as we’ve become about people of mixed parentage, I’m pigeonholed as black. If someone asks and I don’t have time to go deeper, that’s what I call myself.

Ashe is mixed too. His mother, my wife, Vanashree, is half white and half South Asian, with roots in India. She has olive skin, and Ashe is slightly lighter than she is.

This surprised us. When Ashe was born, one of the first things I said to Vanashree was, “Honey, he’s so light!” We chuckled, poking fun at our assumptions.*

Let’s get the sociological aspect out of the way. Is this really that surprising? Folk biology has always had the concept of a “throwback,” which really distills the reality of Mendelian inheritance (as opposed to simple blending processes). In societies such as Brazil or India, where there is a fair amount of segregation of polymorphisms which control skin color, it isn’t that unheard of for a child to be darker or lighter in tone than both parents. And more frankly, this is not unknown within the African American community, where there is a range of skin tone due to ~20% European admixture. I suspect many African Americans would not have these “assumptions,” because of an intuitive understanding of the unpredictable nature of the inheritance of this trait.

Second, the author of the piece is half black and half white in social terms, but there is no chance he is 50 percent African in ancestry. Barack H. Obama is 50 percent African in ancestry, but African Americans almost always have some admixture. I’ve analyzed ~150 African Americans in terms of their ancestry, and they always have some European ancestry. In fact the few Africans in my data set jump out because they lack this component. In other words, the author’s child is somewhat more than 50 percent European in ancestry.
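The arithmetic behind that last sentence is just an average of parental proportions, generation by generation. This sketch assumes the author's African American parent carries the ~20% European admixture mentioned above, and it treats South Asian ancestry as a single non-European category for simplicity. It gives expected values only; realized proportions in any one person scatter around them.

```python
def child_ancestry(parent_a, parent_b):
    """Expected ancestry proportions of a child: the mean of the two parents' proportions."""
    keys = set(parent_a) | set(parent_b)
    return {k: (parent_a.get(k, 0.0) + parent_b.get(k, 0.0)) / 2 for k in keys}

# Assumed ancestry compositions (illustrative, following the post's figures)
african_american = {"African": 0.8, "European": 0.2}  # ~20% European admixture
white = {"European": 1.0}
south_asian = {"South Asian": 1.0}  # simplification: one category

father = child_ancestry(african_american, white)  # socially "half black, half white"
mother = child_ancestry(white, south_asian)       # half white, half South Asian
son = child_ancestry(father, mother)
print(son)
```

With these inputs the son comes out at roughly 55% European, 25% South Asian, and 20% African, i.e. "somewhat more than 50 percent European."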

Finally, what’s the science behind this? This isn’t that hard to understand, because the genetic architecture of pigmentation has been well elucidated. Only a few genes control most of the variation across populations (the differences we see between Africans and Europeans, or South Asians and East Asians). Because we know the parents’ ancestry we can make a few educated guesses. The gene with the largest effect upon pigmentation in a given individual is probably SLC24A5. The father is likely a heterozygote at the SNP in question, with a “light” European copy and a “dark” African one. The mother is most likely, though not inevitably, a homozygote for the “light” copy; its frequency is well north of 50 percent in South Asians (I’m a homozygote, as are both my parents). So the child has an even chance of being a heterozygote or a “light” homozygote. That’s some of the answer right there. Because the child does not have blue eyes we know that he is unlikely to be homozygous for the combination of markers which is correlated with blue eyes (probably due to a regulatory element on the HERC2 locus). This combination is also associated with lighter complexion and hair color. But there is another locus which I think would be especially important: SLC45A2. There is a “light” variant here which is highly localized to Europeans. Its frequency is 95 percent in Northern Europe and 15 percent in Northern India (85 percent in Northern Italy, 65 percent in Turkey, etc.). It is not found in East Asia or Africa, except in cases of clear admixture with Europeans. Europeans who are homozygous for the “dark” variant tend to be olive skinned (this genotype is relatively rare, though not unheard of in Southern Europe, as per the frequencies above). Both parents in this case would almost certainly be heterozygotes. This means that their son had a 25 percent chance of exhibiting the Northern European genotype.
That is a straightforward explanation for why he might be lighter than either parent. Of course there are a few other genes of some importance, but I suspect that SLC45A2 is where most of the work is done in this case because of the backgrounds of the parents (i.e., I’m pretty sure they’re heterozygotes).
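The two-locus reasoning above can be checked with a Punnett-square calculation. The parental genotypes encoded below are the post's educated guesses (father heterozygous at both loci; mother light homozygous at SLC24A5 and heterozygous at SLC45A2), not measured data, and the two loci are treated as assorting independently. "L" marks the light allele and "D" the dark one.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Mendelian segregation at one locus: each parent transmits one of its two
    alleles with probability 1/2. Returns {genotype: probability}."""
    probs = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "LD" and "DL" are the same genotype
        probs[genotype] = probs.get(genotype, 0) + Fraction(1, 4)
    return probs

# Assumed parental genotypes, following the post's reasoning
slc24a5 = offspring_genotypes("LD", "LL")  # father heterozygous, mother light homozygous
slc45a2 = offspring_genotypes("LD", "LD")  # both parents heterozygous

# Independent assortment: multiply the single-locus probabilities
p_light_both = slc24a5["LL"] * slc45a2["LL"]
print(slc24a5, slc45a2, p_light_both)
```

Under these assumptions the son is light homozygous at SLC24A5 with probability 1/2, at SLC45A2 with probability 1/4, and at both with probability 1/8.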

I understand that the point of the article was not the genomics of pigmentation. But to talk about social matters it sometimes pays to get the science nailed down. Like it or not this is a time in the United States where people of mixed ancestry are going to be more common. I rarely get the “Where are you from?” question anymore (because I’m not black or white), but I wonder if the “What are you?” (asked of mixed-race individuals) is going to persist a little longer.

* I think lurking within the subtext of the article is the salience of African ancestry, and the idea that it is particularly potent. The author’s wife’s background is mentioned almost in passing, before moving back to the main attraction of the child of an African American no longer appearing visibly African American. Many of the ideas of white nationalist thinkers such as Madison Grant may no longer be in vogue, but their idea that African ancestry was particularly powerful in swamping out all other ancestry remains an unspoken assumption in American society.


Dienekes and Maju recently pointed to a paper, Contrasting signals of positive selection in genes involved in human skin color variation from tests based on SNP scans and resequencing, in Investigative Genetics. Skin color is an interesting trait because it’s one of the big “wins” in human genomics over the past 10 years. To a great extent we now know with reasonable certainty the genetic architecture and the loci responsible for most of the between-population variation in pigmentation in humans. This is in sharp contrast to the situation in the year 2000. Yet this result was foreseeable decades ago. Here’s what I said 5 years ago:

About two months ago I posted an entry where I sketched out an extremely simple model for skin color assuming there were 6 loci and two alleles (on and off). There was a reference in the comments to “5 loci” for skin color as a quantitative trait. From what I can gather that assumption derives from a paper published in 1981 by Russ Lande, which is online. In reality that paper simply draws upon older work from 1964, and its primary focus is on estimating the number of loci in crosses between heterogenous populations (using inbred lines was the way pioneered by Sewall Wright). But, it turns out that Cavalli-Sforza and Bodmer discuss that older work in Genetics of Human Populations, which I have a copy of. Today genomics is exploring the details of the loci which control for skin color, but we have a long way to go, so I’m going to reproduce some of the data and conclusions from Bodmer & Cavalli-Sforza’s work so that it will be online….

I’m laughing at the “we have a long way to go” part. Long way in this case probably meant a few years, as I don’t think there’s been that much substantive change since about 2008 in human pigmentation genetics. All the low hanging fruit has been picked. It looks like that across any two distinct inter-continental populations you’ll be able to apportion most of the variance to less than half a dozen loci. Geneticists were able to infer this decades ago based on pedigree analysis, which was only possible because of the fact that these were large effect quantitative trait loci in the first place (i.e., most of the variation was due to only a few genes). * If the trait had been extremely polygenic they’d only have been able to say with any plausibility or precision that the number of genes responsible was very large.
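The pedigree-era inference can be illustrated with the Castle-Wright estimator, the classic formula that, to my understanding, Lande's 1981 paper generalizes: the squared difference between the parental population means, divided by eight times the segregation variance (F2 variance minus F1 variance), estimates a minimum number of loci. It only recovers the true count under idealized assumptions (equal additive effects, unlinked loci, parents fixed for alternative alleles). This sketch uses hypothetical parameters and no environmental noise, and checks that a simulated six-locus cross gives back roughly 6.

```python
import random
import statistics

def castle_wright(mean_p1, mean_p2, var_f1, var_f2):
    """Castle-Wright minimum number of loci: D^2 / (8 * segregation variance)."""
    seg_var = var_f2 - var_f1
    return (mean_p1 - mean_p2) ** 2 / (8 * seg_var)

def f2_phenotype(n_loci, effect, rng):
    # F1 parents are heterozygous everywhere, so each of the 2 * n_loci allele
    # copies in an F2 child is "dark" with probability 1/2 (unlinked loci).
    dark_copies = sum(rng.random() < 0.5 for _ in range(2 * n_loci))
    return effect * dark_copies

def simulate_estimate(n_loci=6, effect=1.0, n_f2=20_000, seed=1):
    rng = random.Random(seed)
    mean_light = 0.0                 # parental population fixed for light alleles
    mean_dark = 2 * effect * n_loci  # parental population fixed for dark alleles
    var_f1 = 0.0                     # all F1 identical; no environmental noise here
    f2 = [f2_phenotype(n_loci, effect, rng) for _ in range(n_f2)]
    return castle_wright(mean_light, mean_dark, var_f1, statistics.pvariance(f2))

print(simulate_estimate())
```

Analytically, six loci give D = 12 and a segregation variance of 6 x 1/2 = 3, so 144 / 24 = 6; the simulation lands close to that.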


But it’s one thing to ascertain the genetic architecture of the trait, and another to make reasonable characterizations about its natural history. To make a long story short, haplotype-based tests, which look for correlations of markers across regions of the genome, tend to suggest that many of the pigmentation loci have been subjected to recent bouts of natural selection. More interestingly, the candidate genes which seem likely to account for light skin in East and West Eurasians seem to be somewhat different, implying that the change in allele frequencies postdates the separation of these two populations. A few years ago there were waves made when there was a report that the gene which seems to be responsible for a great deal of the de-pigmentation in West Eurasians, SLC24A5, only began to sweep up to higher frequencies within the last ~6,000 years. But I heard through the grapevine that this may be a substantial underestimate, and you might be looking at a sweep which began more than ~10,000 years ago.**
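For readers who want to see what a haplotype-based test actually measures, here is a minimal sketch of extended haplotype homozygosity (EHH), the quantity behind long-range haplotype tests: among chromosomes carrying a given core allele, the probability that two drawn at random are identical from the core out to a given marker. A recent sweep drags one long haplotype up in frequency, so EHH around the selected allele decays slowly with distance. The toy haplotypes are hypothetical, and real implementations add phasing, physical or genetic distances, and normalization against the genome-wide background.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, end_index):
    """Extended haplotype homozygosity between core_index and end_index (inclusive),
    computed among haplotypes carrying core_allele at core_index."""
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")  # undefined with fewer than two carriers
    lo, hi = sorted((core_index, end_index))
    # Group carriers by their allele string over the interval [lo, hi]
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two randomly drawn carriers are identical over the interval
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

# Hypothetical phased haplotypes; the core SNP is position 0
haplotypes = [
    "1000000", "1000000", "1000000", "1000001",  # derived "1": one long shared haplotype
    "0101010", "0010110", "0110001", "0001101",  # ancestral "0": diverse backgrounds
]

for end in range(1, 7):
    print(end, ehh(haplotypes, 0, "1", end), ehh(haplotypes, 0, "0", end))
```

Here EHH for the derived allele stays at 1.0 out to position 5, while EHH for the ancestral allele has already collapsed to 0 by position 2.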

The results in the paper above throw some cold water on positive results for natural selection at the pigmentation loci. Why does this matter? Because a priori there are obvious reasons why there might be natural selection at these genes. In contrast, many results have to be accompanied by after the fact suppositions as to the functional rationale for adaptation. The question becomes: if you can’t trust the results to be consistent on a trait where the adaptive rationale and genetic architecture are clear, when can you trust these tests? I think the qualifying kicker in the paper above comes in the discussion:

The fifth, and perhaps most likely, reason for discrepancies between LRH [long range haplotype] and sequence-based tests we observed here may be the different underlying assumptions of the evolutionary models used (that is, instantaneous selective sweep versus incomplete selective sweeps) in the definition of each statistic, and the evolutionary timescale over which each type of test can recover departures from neutrality…In that case, our results might indicate an extremely recent selection in the pigmentation genes, which would be recovered by haplotype-based but not sequence-based tests.

In other words, the authors themselves believe it is entirely possible that the likely reason you don’t see a concordance between the results in these sets of tests is that they exhibit differing sensitivities to different adaptive dynamics. This is one reason haplotype-based tests became popular in the first place, as they could fix upon processes which something like Tajima’s D might miss. So at this point I think we can still say with some certainty that natural selection seems highly likely at these genes, even if they don’t jump out on all the tests.
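For contrast with the haplotype view, here is a minimal sketch of Tajima's D, a classic sequence-based statistic. It compares average pairwise diversity with the number of segregating sites, using the normalizing constants from Tajima (1989); an excess of rare variants, as expected after a completed sweep, pushes it negative, while variants at intermediate frequency push it positive. The toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(sequences):
    """Tajima's D for aligned sequences of equal length (one string per chromosome)."""
    n = len(sequences)
    length = len(sequences[0])
    # Number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # Average number of pairwise differences (pi)
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Normalizing constants
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))

star_like = ["0000", "1000", "0100", "0010", "0001"]  # only singleton variants
intermediate = ["0000", "0000", "1111", "1111"]       # variants at 50% frequency
print(tajimas_d(star_like), tajimas_d(intermediate))
```

A sweep that is still in progress may leave little rare-variant excess yet, which is one way the two families of tests can disagree.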

COMMENTS NOTE: Any comment which misrepresents the material in this post will result in banning without warning. So you should probably stick to direct quotes in lieu of reformulations of what you perceive to be my intent in your own words. For example, if you start a sentence with “so what you’re trying to say….”, you’re probably going to get banned. I said what I tried or wanted to say in the post. Period.

* There are few enough SNPs that I can, and have, constructed a distribution of phenotypic outcomes of my soon-to-arrive child based on the variation present in the parents, who have both been genotyped.

** I am homozygous for the “European” allele at this locus, as are my parents. I suspect that this variant arrived in the Indian subcontinent via the “Ancestral North Indians.”


I was pointed today to a piece in the BBC titled What makes a mixed race twin white or black?. The British media seems to revisit this topic repeatedly. There are perhaps three reasons I can offer for this. First, it tends toward sensationalism. Even though the BBC is relatively staid, when it comes to science it converges upon the tabloids. Second, because the number of non-whites in Britain is relatively small, there is a higher proportion of intermarriages between minorities and the white majority (from the perspective of minorities). This is especially true of people of Afro-Caribbean ancestry. So a larger fraction of minorities are recently mixed in Britain than in the USA. Finally, the United States has a more complex attitude toward race relations than the United Kingdom, because the former has traditionally had a large non-white minority while the latter has only had one since the years after World War II. I suspect that “black-white twins” stories would seem in bad taste on this side of the pond, and bring up certain memories best forgotten.

Now, there are fallacies, confusions, and misleading shadings in the BBC piece. I’ll hit those first before reviewing what’s going on here when fraternal twins exhibit totally different complexions.

It starts out somewhat ludicrously: “Her son Leo has black skin and her daughter Hope, has white skin.” This is false in a precise sense. Leo clearly has medium to light brown skin (there are photos in the piece). What’s going on here is that Leo has some African ancestry, and because of the rule of hypodescent all people of African ancestry with a shade of brown skin, from nearly black to light brown, are termed “black skinned.” This is not a trivial semantic elision. If Leo truly had black skin, very dark brown, then there’d be a lot of explaining to do, because the genetics would be somewhat mystifying. More on that later.


She was adopted when she was four years old, and her birth mother is Afro-Caribbean and her British birth father was white. Her DNA tests revealed that, genetically, she was exactly 50% African and 50% European.

This is very unusual, and the results suggested that Shirley’s mother had pure African roots, and that her ancestors must have moved from Africa to the Caribbean quite recently.

Not necessarily. Mixed-race people, especially those with recent admixture, don’t have their different ancestral components distributed equally across their genome. It may be that in the process of sampling chromosomes from this individual’s Afro-Caribbean mother she received almost none of the European quantum, perhaps localized to a few chromosomal segments. This “noise” in the process explains why I seem to carry an elevated proportion of East Asian ancestry in relation to both of my parents. I simply received genetic copies sampled from the more “East Asian” regions of my parents’ genomes.
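The sampling noise described here can be sketched with a toy model. Assume a hypothetical mother whose small European component sits on just four chromosomal blocks, each on only one of her two homologs, and let a child inherit one homolog per block at random. Treating blocks as independent ignores linkage along a chromosome, so this is an illustration rather than a realistic recombination model. The chance that the child inherits none of the European blocks is then 0.5^4 = 1/16.

```python
import random

def transmitted_european_fraction(mother_blocks, rng):
    """Each block is a (homolog_1, homolog_2) pair of flags, True = European ancestry.
    The child inherits one homolog per block, chosen at random (blocks independent)."""
    picks = [pair[rng.randrange(2)] for pair in mother_blocks]
    return sum(picks) / len(picks)

def simulate_zero_european(mother_blocks, trials=20_000, seed=7):
    """Fraction of simulated children who receive no European blocks at all."""
    rng = random.Random(seed)
    zeros = sum(transmitted_european_fraction(mother_blocks, rng) == 0 for _ in range(trials))
    return zeros / trials

# Hypothetical mother: 40 blocks, 4 carrying European ancestry on one homolog
mother = [(True, False)] * 4 + [(False, False)] * 36
print(simulate_zero_european(mother), 0.5 ** 4)
```

On average such a child receives 5% European ancestry from this mother, yet about one child in sixteen receives none.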


“Our skin colour is determined by a number of gene variants – at least 20 variants, I would say, probably quite a few more than that,” says Dr Wilson.

This is complicated, but I’d say that the good doctor is misleading the audience. Skin color seems to be a quantitative trait where you can explain the vast majority of between-population variation with only a few genes, at most six. When it comes to European-African differences, variants at two loci, SLC24A5 and KITLG, can account for well over half of the difference. It is true that there are many, many genes that affect skin color, but there is a definite distribution where the vast majority of genes tweak the trait only on the margins. In other words, there may be 20 variants (there are more), but for good predictive power at the inter-population level you’re good to go with 4 or 5.

I specify the inter-population level because within populations the gene set which allows you to predict variation may be slightly different, and you have to take into account sex differences. For hormonal reasons males seem somewhat darker than females in human populations. Additionally, people are palest in their youth and become darker as they age. Finally, some of the genes which explain differences between populations are invariant within a population. Therefore the genes of lower effect size move up the stack. So when it comes to European-African variation, the largest effect gene, SLC24A5, won’t explain anything within either of these two populations. That’s because they are fixed for alternative variants (the light vs. dark conferring variants). So the gene with the second-largest effect may move up to first place when you evaluate at a finer grain (but if that gene is also nearly fixed within the population, it might drop far down as well).
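The reshuffling of ranks follows from the additive variance a single biallelic locus contributes within a population under Hardy-Weinberg proportions, 2pq times a^2, where p and q are the allele frequencies and a is the per-allele effect. A locus fixed within a population (p = 1) contributes nothing there, however large its effect between populations. The loci, effects, and frequencies below are hypothetical; locus_A merely plays the role of a gene like SLC24A5.

```python
def additive_variance(effect, freq):
    """Within-population additive variance from one biallelic locus: 2 * p * q * a^2."""
    return 2 * freq * (1 - freq) * effect ** 2

# Hypothetical loci: (per-allele effect, frequency of the dark allele in one population)
loci = {
    "locus_A": (3.0, 1.0),   # largest effect between populations, but fixed here
    "locus_B": (1.5, 0.5),   # moderate effect, intermediate frequency
    "locus_C": (1.0, 0.98),  # smaller effect, nearly fixed
}

by_effect = sorted(loci, key=lambda name: loci[name][0], reverse=True)
by_within_variance = sorted(loci, key=lambda name: additive_variance(*loci[name]), reverse=True)
print(by_effect, by_within_variance)
```

Ranked by effect size the order is A, B, C; ranked by variance explained within this population it becomes B, C, A.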

Now let’s move on to the common idea that darkness dominates over lightness:

As in a painter’s palette, in the skin the presence of pigment dominates the absence of pigment, so the fact that Hope is white is very unusual.

This is hypodescent popping up again. Though in the West we live in an anti-racist age, at least notionally, it is interesting how concepts and models from a white supremacist era remain operative, at least implicitly. The idea that whites are recessive to non-whites makes total sense if you code anyone with visible non-white ancestry as non-white, even if they are genetically more white than not. The rationale for this model was the idea that there is a reversion to the more “primitive” type. So a cross between a black and a white produced a black, and a cross between a Nordic and a Mediterranean produced a Mediterranean. Inferiority taints the purity of superiority.

Less ideologically, if you classify skin complexion into white and non-white in a dichotomous fashion then you logically consign the non-white trait to dominance. For example, if nearly, but not quite, white skin is “dark,” then you make it very difficult for someone with a substantial number of pigment conferring alleles to produce a child with very light skin.

Finally, now that we have elucidated the genetic architecture of pigmentation to a great extent we can make assessments of dominance and recessiveness in a locus-by-locus manner. If you measure skin complexion darkness via reflectance you can turn it from a dichotomous or discrete trait into a continuous one. So individuals can have a “melanin index,” a numerical value corresponding to their position on a scale of lightness and darkness. Contrary to the expectations above, it turns out that at the two largest effect genes explaining the difference between Africans and Europeans the light alleles are more dominant than the dark alleles! In other words, if the two alleles had equal effects you’d expect heterozygotes to fall midway between the two homozygotes. As it is, heterozygote values tend toward higher reflectance (light) rather than dark. I would caution that terms like “dominant” and “recessive” can be highly subjective and dependent on how you code the trait, the nature of the population you sample from in a polygenic character, or even the scale of values. So in this case you notice that switching from a dichotomous code of white vs. non-white to a continuous value corresponding to reflectance flips the model from the light trait being recessive to the dark, to the dark being recessive to the light (albeit only mildly).
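The coding dependence is easy to demonstrate with the standard dominance ratio d/a, where a is half the difference between the two homozygote means and d is the heterozygote's deviation from their midpoint. The melanin index values below are hypothetical, chosen only to mimic the pattern described (heterozygote shifted toward the light homozygote), and the 35-point cutoff for the dichotomous coding is likewise an assumption.

```python
def dominance_ratio(light_hom, het, dark_hom):
    """d/a on a scale where larger values are darker.
    0 = additive, negative = light allele (partially) dominant,
    positive = dark allele (partially) dominant, +/-1 = complete dominance."""
    a = (dark_hom - light_hom) / 2
    midpoint = (dark_hom + light_hom) / 2
    d = het - midpoint
    return d / a

# Hypothetical mean melanin index for each genotype (continuous coding)
continuous = dominance_ratio(light_hom=30.0, het=40.0, dark_hom=60.0)

def dichotomize(melanin_index, threshold=35.0):
    """Code a phenotype as 1 ("non-white") or 0 ("white") using a hypothetical cutoff."""
    return 1 if melanin_index > threshold else 0

# The same three genotypes, coded white vs. non-white
dichotomous = dominance_ratio(dichotomize(30.0), dichotomize(40.0), dichotomize(60.0))
print(continuous, dichotomous)
```

The same genotype means give d/a = -1/3 (light partially dominant) on the continuous scale and d/a = +1 (dark completely dominant) once the trait is coded white vs. non-white.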

Because pigmentation is controlled by only a few genes, the genotypes at these loci are poor proxies for total genome content. In plainer language, mixed-race siblings won’t deviate too much in their ancestral quantum, but they can deviate a great deal in their physical appearance. In fact, because of the poor correlation the slightly “blacker” twin in total ancestry may actually look more like a white person, and vice versa.

Now let’s go back to first principles. We’ll make some simplifying assumptions to illustrate what’s going on easily. Take 6 genes which control skin color. Assume equal effect. Each gene comes in two variants. Light and dark. Two copies of light result in a value of 0, while two copies of dark result in 2. A copy of each results in 1. In other words, the alleles are additive across a locus. Also assume that the genes are independent. They’re not linked. So the value at each gene is independent of the other genes. Finally, assume that the genes’ implied values summed together result in a total pigmentation phenotype outcome. So they’re additive across loci.

To make it even simpler, let’s assume that the parents are F1 African-European hybrids. That means that each of them had one European parent and one African parent. So both share the same ancestry of recent vintage. As it happens Africans and Europeans are very different on pigmentation genes, so we can assume that these parents carry one light copy and one dark copy at each of the six genes. This means you’d expect them to be brown.

Since they are brown, wouldn’t their children be brown? No. Not necessarily. As per Mendel’s Laws each parent contributes one gene copy at each locus. So for the 6 loci above each parent contributes one copy of each pigmentation gene. What does that mean concretely? I already simplified things to produce an elegant outcome: at any given gene the F2 offspring could be light homozygous, dark homozygous, or carry one copy of each, like their parents. To illustrate what I’m talking about, SLC24A5 is disjoint in frequency across Africans and Europeans. All Europeans have one variant, and all Africans have another. So the offspring of a marriage between an African and a European will be heterozygous at that locus. If they marry another person of similar background, light and dark homozygous genotypes will resegregate out at fractions of 25% each, with half the offspring being heterozygous as in the parental condition. In other words, there is a 25% probability of an F2 child of F1 hybrids being “white” at this locus. There are 6 loci. Assuming independent probabilities, you multiply out 0.25^6 and get 1 out of ~4,000 (1/4,096) that the child will be white like their white grandparents.

I ran this as a binomial 10,000 times, and here’s the distribution I came up with:

The white and black offspring don’t show up because the number of outcomes is so rare in this model, but as you can see the median outcome is brown, like the parents. But the tails are significant. In other words, don’t be surprised if there’s a lot of variation among the siblings. But why should you be? If you know of people from populations where pigmentation alleles are segregating in polymorphic frequencies, such as Latin Americans and South Asians, you are aware that different siblings can look strikingly different when it comes to complexion. Though I guess that’s a new insight for the British….
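The binomial simulation described above is easy to reproduce. This sketch assumes the same toy model: six unlinked, equal-effect, additive loci, and F1 x F1 parents heterozygous everywhere, so each of a child's twelve allele copies is independently "dark" with probability 1/2. Scores run from 0 (all light, like the white grandparents) to 12 (all dark, like the African grandparents). The seed and function names are arbitrary.

```python
import random
from collections import Counter
from math import comb

N_LOCI = 6  # toy model: six equal-effect, additive, unlinked loci

def f2_score(rng):
    """Pigmentation score of one F2 child: the number of 'dark' allele copies (0-12).
    Each of the 2 * N_LOCI copies comes from a heterozygous F1 parent,
    so it is dark with probability 1/2, independently."""
    return sum(rng.random() < 0.5 for _ in range(2 * N_LOCI))

def simulate(n_children=10_000, seed=42):
    rng = random.Random(seed)
    return Counter(f2_score(rng) for _ in range(n_children))

def exact_prob(score):
    """Binomial(12, 1/2) probability of a given score."""
    return comb(2 * N_LOCI, score) / 2 ** (2 * N_LOCI)

counts = simulate()
for score in range(2 * N_LOCI + 1):
    # simulated count vs. count expected from the exact binomial
    print(score, counts.get(score, 0), round(exact_prob(score) * 10_000, 1))
```

The exact probability of a score of 0 is 1/4,096, the same figure as the 0.25^6 calculation, and the mode sits at 6, the parents' own brown.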


Sort of and possible. I’ve been talking about this for years, and Greg Cochran points me to an abstract at the human genetics conference referenced earlier. Novel coding variation at TYRP1 explains a large proportion of variance in the hair colour of Solomon Islanders:

The Solomon archipelago comprises over 1,000 islands located east of Papua New Guinea and has a population noted for wide variation in hair pigmentation. 1200 samples were collected from 16 centres and hair colour measured in donors by spectrophotometer. Analysis of 589,241 single nucleotide polymorphisms across a subset of 42 dark haired and 43 blond haired individuals revealed a signal for pigmentation driven by 27 markers on 9p23 at the TYRP1 gene (rs13289810…). There were no systematic differences in ancestry between dark and blond haired participants indicating that this variation is unlikely to be due to recent introgression from other populations. Sequencing of TYRP1 showed complete conservation of this locus bar nucleotide 5,888(NG_011750), which was homozygous C in dark haired individuals and T in blonds. The resulting CGC->TGC missense mutation changes the 93rd amino acid in exon 2 from an Arginine to a Cysteine. Genotyping of TYRP1(93C/T) in all samples and analysis showed that in a recessive model including sex, age and local geography, there was a -1.67(-1.76, -1.50) standard deviation difference in hair colour by genotype groups (p=3.5e-106) equating to ~40% variance in this trait. Genotyping in the Human Genome Diversity Panel showed TYRP1(93C/T) to be essentially private to the Solomon Islanders…In humans, complete loss of function for Tyrp1 is known to cause rufous albinism. This is one of the only examples of a genomewide association study implicating causal variation directly, of a common local variant of functional effect being absent in other human populations and is one of the largest phenotypic effects attributable to a common polymorphism. Reasons for the maintenance of this variation are unclear, however this finding prompts the notion that we may find other large (disease causing) effect variants that are population specific and that our results are a call to arms to expand medical genomics to underrepresented populations.

Australian Aboriginals are not present in the HGDP panel, so there is no clarity on blondism in those populations, or amongst other indigenous groups in Southeast Asia and Oceania. If these are deep ancient variants then this may span all these populations. If not, then you see independent occurrences of a phenotype which is otherwise only present in Europeans and European-derived/admixed populations. Why? One hypothesis I’ve thrown out is that the expansion of agricultural populations may have erased a great deal of past human phenotypic diversity, due to the demographic growth of small initial founding groups ~5-10,000 years ago.

The question mark in the title by the way is that just because we characterize the genomic architecture of a trait, we don’t understand why it is distributed in the way it is. Perhaps small populations resulted in more genetic drift in Oceania than elsewhere? Or there is selection on the TYRP1 locus, and this trait is a side effect?



John Hawks illustrates what can be gained at the intersection of old data and analysis and new knowledge, Quote: Boyd on New World pigmentation clines:

I’m using some statistics out of William Boyd’s 1956 printing of Genetics and the Races of Man[1]. It gives a good accounting of blood group data known more than fifty years ago, which I’m using to illustrate my intro lectures. Meanwhile, there are some interesting passages, from the standpoint of today’s knowledge of the human genome and its variation.

On skin pigmentation — this is the earliest statement I’ve run across of the argument that the New World pigmentation cline is shallower than the Old World cline because of the relative recency of occupation….

Looking at what was said about pigmentation generations ago is of interest because it’s a trait which in many ways we have pegged. See Molecular genetics of human pigmentation diversity. Why humans vary in pigmentation in a deep ultimate sense is still an issue of some contention, but how they do so, and when the differences came about, are questions which are now modestly well understood. We know most of the genetic variants which produce between population variation. We also know that East and West Eurasians seem to have been subject to independent depigmentation events. We also know that some of the depigmentation was relatively recent, probably after the Last Glacial Maximum, and possibly as late as the advent of agriculture.

As for the New World cline, which is clearly shallower than that of the Old World, the chart below from Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms is useful:

What you’re seeing here are patterns of relationships by population when it comes to the select subset of genes which we know are implicated in between-population variation in pigmentation. The peoples of Melanesia are arguably the darkest skinned peoples outside of Africa (and perhaps India), and interestingly they are closer to Africans than any other non-African population. But in total genome content they’re more distant from Africans than other non-African populations, excluding the peoples of the New World.

This disjunction between phylogenetic relationships when looking at broad swaths of the genome, as opposed to constraining the analysis to the half a dozen or so genes which specifically encode between population differences on a specific trait, is indicative of selection. In this case, probably functional constraint on the genetic architecture. From the reading I’ve done on skin pigmentation genetics there is an ancestral “consensus sequence” on these genes which result in dark complexions. In contrast, as has been extensively documented over the last few years there are different ways to be light skinned. In fact, the Neandertals which have been sequenced at those loci of interest also turn out to have a different genetic variant than modern humans.

How to explain this? I think here we can go back to our first course in genetics in undergrad: it is easier to lose function than gain function. The best current estimate is that on the order of one million years ago our species lost its fur, and developed dark skin. And it doesn’t look like we’ve reinvented the wheel since that time. All of the peoples termed “black” across the world, from India, to Australasia, to Africa, are dark because of that ancestral genetic innovation. In contrast, deleterious mutations which “break” the function of the genes which gave some of us an ebony complexion occur relatively frequently, and seem to have resulted in lighter skinned groups in more northerly climes. It turns out that some of the pigmentation genes which are implicated in between population variance in complexion were actually originally discovered because of their role in albinism.

So how does this relate to the New World? I think the difficulty in gaining function once it has been lost explains why the people of Peru or the Amazon are not as dark skinned as those of Africa, Melanesia, or South Asia. They haven’t had enough time to regain function which they lost as H. sapiens traversed northern Eurasia. So there you have it. A nice little illustration of how the genetics taught to 18 year olds can be leveraged by the insights of modern genomics and biological anthropology! In the end, nature is one.

Image Credit: Dennis O’Neil


Recently I was looking for images of the alpine biomes of the New Guinea highlands* and stumbled onto an intriguing, though not entirely surprising, set of photographs of individuals from Papua New Guinea. They were noteworthy because they manifested the conventional Melanesian physical type, but their hair had a blonde cast to it. For example, here is a charming blonde boy. The photographer has several other striking portraits of Melanesians with lighter hair at his website. In regards to the peculiar hair color of these people he says: “When you ask the people why there are so many blonde people on the islands, they answer 3 things: they have white ancestors, they receive too much sun, or they do not eat enough vitamins! – Langania village, New Ireland, Papua New Guinea.” There is more discussion in the comments about this issue, some claiming that it is likely the sea water and sun which are bleaching the hair naturally. If you look around you will see references to bleaching of hair among some of these people as a cultural trait, though the references tend not to be concrete (many clearly assume they’re bleaching their hair, rather than reporting bleaching). Since in some cases the blonde is at the tips, from what I can tell, I certainly don’t reject the explanation that bleaching is a cultural practice among these peoples, albeit for children and women only.

But the peculiar hair color of these populations is noted in the scientific literature as if it is a biological characteristic of these groups, not a cultural artifact. From Molecular genetic evidence for the human settlement of the Pacific: analysis of mitochondrial DNA, Y chromosome and HLA markers: “The Tolais of New Britain are phenotypically ‘Melanesian’, with fairly dark skin and frizzy hair, sometimes almost blonde as in some highland Papuan groups.” Enter Tolai ‘New Britain’ into Google Images and the first few pages have several instances of blonde children, including this cute triplet.

Before we go any further, I want to express my skepticism at the idea that this is European admixture. The loci associated with higher odds of having blonde hair in Europeans, OCA2, KITLG, etc., also result in light skin, and secondarily blue eyes. In other words, in Europeans blonde hair is to a large extent one effect of generalized depigmentation. There is no magic “blonde gene” which operates independently of the variants which produce lighter skin or lighter eyes. Though the outcome is not deterministic, the probabilities make it so that someone who has naturally blonde hair is very unlikely to have dark brown skin, at least in any genetic architecture we’re familiar with in Europeans (e.g., African Americans with light eyes and/or hair also tend to be light skinned).
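Why shared loci make “blonde but dark brown skinned” so improbable can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions: ten hypothetical, equally weighted loci that each lighten hair and skin additively, plus independent non-genetic noise for each trait. None of the numbers are real effect sizes for OCA2, KITLG or any other gene; the point is only that pleiotropy alone produces a strong hair/skin correlation.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def simulate(n=10_000, n_loci=10, freq=0.5, noise_sd=1.0, seed=1):
    """Hypothetical toy model: every locus lightens BOTH hair and skin additively."""
    rng = random.Random(seed)
    hair, skin = [], []
    for _ in range(n):
        # Number of "light" alleles carried across n_loci diploid loci (0 to 2 * n_loci).
        dose = sum((rng.random() < freq) + (rng.random() < freq) for _ in range(n_loci))
        # Higher score = lighter; each trait gets its own independent non-genetic noise.
        hair.append(dose + rng.gauss(0, noise_sd))
        skin.append(dose + rng.gauss(0, noise_sd))
    return hair, skin

hair, skin = simulate()
r = pearson(hair, skin)

# "Blonde" = lightest 10% of hair scores; "dark" = darkest 10% of skin scores.
blonde_cut = sorted(hair)[int(0.9 * len(hair))]
dark_cut = sorted(skin)[int(0.1 * len(skin))]
blonde = [i for i, h in enumerate(hair) if h >= blonde_cut]
p_dark_given_blonde = sum(skin[i] <= dark_cut for i in blonde) / len(blonde)
print(f"r = {r:.2f}, P(dark skin | blonde hair) = {p_dark_given_blonde:.4f} (base rate ~0.10)")
```

In this toy setup the chance that a “blonde” individual also falls in the darkest skin decile is far below the 10% base rate, which is the intuition behind treating blonde hair on dark skin as a red flag for a different architecture.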

But if you want more than my logic above, here’s a STRUCTURE plot from The Genetic Structure of Pacific Islanders:


I re-edited the plot for clarity. Remember that K = the number of putative ancestral populations, so you’re looking for population substructure and inferred admixture. I’ve compared the Oceanian groups to the French from the HGDP sample. The Polynesians in the sample have clear European admixture, but the Melanesians generally do not. The aforementioned Tolai are one of the groups analyzed in this paper, and contrary to one of their own explanations for their high frequency of blondness, they do not seem to have any European ancestry.
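To make “inferred admixture” a bit more concrete, here is a minimal sketch of how one individual’s ancestry proportions can be estimated when the allele frequencies of K = 2 source populations are treated as known. This is an assumed simplification for illustration, an expectation-maximization (EM) fit in the spirit of a supervised admixture analysis; STRUCTURE itself is a Bayesian MCMC method that also estimates the allele frequencies. The source populations, frequencies and the 80/20 mixture below are simulated, not taken from the paper.

```python
import random

def estimate_admixture(genotypes, freqs, n_iter=100):
    """EM estimate of admixture proportions q for ONE individual.

    genotypes: counts (0, 1 or 2) of the reference allele at each locus.
    freqs: freqs[k][l] = reference-allele frequency at locus l in source population k,
           assumed known here.
    """
    K = len(freqs)
    q = [1.0 / K] * K
    n_copies = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = [0.0] * K
        for l, g in enumerate(genotypes):
            # E-step: posterior probability that a reference / alternative copy came from pop k.
            ref = [q[k] * freqs[k][l] for k in range(K)]
            alt = [q[k] * (1.0 - freqs[k][l]) for k in range(K)]
            ref_tot, alt_tot = sum(ref), sum(alt)
            for k in range(K):
                if g > 0:
                    expected[k] += g * ref[k] / ref_tot
                if g < 2:
                    expected[k] += (2 - g) * alt[k] / alt_tot
        # M-step: new proportions = expected share of allele copies from each population.
        q = [e / n_copies for e in expected]
    return q

rng = random.Random(42)
n_loci = 2000
# Hypothetical source populations with randomly drawn allele frequencies.
freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_loci)] for _ in range(2)]
true_q = [0.8, 0.2]
genotypes = []
for l in range(n_loci):
    g = 0
    for _ in range(2):  # each allele copy independently picks a source population
        k = 0 if rng.random() < true_q[0] else 1
        g += rng.random() < freqs[k][l]
    genotypes.append(g)

q_hat = estimate_admixture(genotypes, freqs)
print([round(x, 3) for x in q_hat])
```

With a couple of thousand informative loci the estimate lands close to the simulated 80/20 split, which is the kind of per-individual bar a STRUCTURE plot draws.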

What about bleaching? I will be interested to hear what readers have seen, but to my limited knowledge dark skinned populations in other oceanic environments do not seem to have such bleached hair. But, relatively simple forms of hair bleaching do exist which would be possible for a less affluent population to practice as a rite of some sort, or perhaps for simple aesthetic reasons. I put a modest probability on this being the full explanation for this phenotype, and a high probability for it being some of the explanation.

So let’s move to the most novel explanation: that the populations of Oceania have an independent genetic architecture for the emergence of lighter hair color. For me the biggest factor weighing in this hypothesis’ favor is that, to my knowledge, there are only two population groups in the world not of West Eurasian origin which have an appreciable frequency of lighter hair: the indigenous peoples of Melanesia and those of the Australian desert (the trait seems to be relatively common among the children of the Warlpiri people, for example). As we noted last week, these two populations form a natural phylogenetic clade, so it would be highly coincidental if both independently exhibited the unique phenotype of relatively dark general pigmentation but light hair. Additionally, as in Europeans, lighter hair color seems to be concentrated among children and women in both of these groups, aligning with what we know about the correlations between pigmentation and hormones (males and adults are darker).

One obvious model for the blondeness of central desert Australian Aboriginals is European admixture. But the same problems emerge as in the case of the Melanesians: of presumed European traits only the blonde hair expresses, which is a highly peculiar phenomenon. Additionally, we have a relatively recent report from a scientific perspective on the genetics of this trait among these populations, Joseph Birdsell’s Microevolutionary Patterns in Aboriginal Australia: A Gradient Analysis of Clines. The book is from 1993, and no doubt most of the research was done earlier, so the techniques and analyses may seem a bit crude to us. Birdsell observed that the inheritance pattern of blonde hair among the desert Aboriginals exhibited “incomplete dominance.” He recorded that the frequency of the trait was rather high within these tribes, at least for children and women. Additionally, he observes that people with an eastern Aboriginal parent and European parent usually had brown hair of various shades. But among individuals who had one blonde (at least as a child) desert Aboriginal parent and a European parent the offspring tended to be disproportionately blonde, even if the European parent was a brunette! Finally, he observed that aside from head hair, only the body hair of the forearm was blonde. The rest was dark in these Aboriginals.
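Birdsell’s inheritance observations can be framed with a single hypothetical locus. The sketch below assumes one allele, called “B” here purely for illustration (no such gene has been identified), and scores offspring hair lightness as 0 for bb, h for Bb and 1 for BB, so 0 < h < 1 corresponds to incomplete dominance. It enumerates Mendelian offspring genotypes and their expected score.

```python
from collections import Counter
from itertools import product

def offspring_genotypes(parent1, parent2):
    """Mendelian segregation at one locus: each parental allele is transmitted with probability 1/2."""
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def expected_lightness(parent1, parent2, h):
    """Mean hair-lightness score of offspring for the hypothetical B allele: bb = 0, Bb = h, BB = 1."""
    score = {"bb": 0.0, "Bb": h, "BB": 1.0}
    return sum(p * score[g] for g, p in offspring_genotypes(parent1, parent2).items())

print(offspring_genotypes("BB", "bb"))     # all offspring heterozygous
print(offspring_genotypes("Bb", "Bb"))     # classic 1:2:1 ratio
print(expected_lightness("BB", "bb", 0.5))
```

In this toy framing, a blonde parent homozygous for B crossed with a parent lacking it yields uniformly heterozygous children, so how blonde they look depends entirely on h; Birdsell’s report that such offspring were disproportionately blonde would point toward a larger h.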

From what I can tell Birdsell’s monograph is the only recent scientific exploration of this particular topic of blondism among the peoples of Oceania. Many physical anthropologists record the observation of non-black hair among these peoples, but for most their interest did not go beyond cataloging the fact, or it was an incidental result in a bigger project. There’s still a lot about human variation we don’t know. In regards to human pigmentation most of the puzzle has been completed. This is one piece which remains to be found.

Addendum: Some work on the pigmentation genetics of Melanesian populations has been done. In the genetic architecture of loci implicated in pigmentation variation, they resemble Africans more than any other non-African group does. To my mind that basically eliminates the European admixture model as an explanation for the light hair, and increases the probability of bleaching and/or a different and unknown locus.

Note: Blondism among North African, Middle Eastern, Central and South Asian populations is, I believe, either simply part of the natural continuum of West Eurasians or the result of admixture from Europeans or other blonder groups. I believe that this is even the source of blondism among groups like the Hmong, who have a legend of migration from deeper in Asia, where they may have mixed with West Eurasian populations on the fringes of China proper.

Related: Blondism in Melanesia.

* The highest peak in New Guinea is ~14,000 feet above sea level, and in the higher reaches of the uplands it snows periodically.

Image Credit: Wikimedia


Thank god for animals and their resemblance to humans to elucidate general patterns and relationships. Missense Mutation in Exon 2 of SLC36A1 Responsible for Champagne Dilution in Horses:

The purpose of this study was to uncover the molecular basis for the champagne hair color dilution phenotype in horses. Here, we report a DNA base substitution in the second exon of the horse gene SLC36A1 that changes an amino acid in the transmembrane domain of the protein from threonine to arginine. The phenotypic effect of this base change is a diminution of hair and skin color intensity for both red and black pigment in horses, and the resulting dilution has become known as champagne. This is the first genetic variant reported for SLC36A1 and the first evidence for its effect on eye, skin, and hair pigmentation. So far, no other phenotypic effects have been attributed to this gene. This discovery of the base substitution provides a molecular test for horse breeders to test their animals for the Champagne gene (CH).

Is horse color a big deal in terms of value? I wonder why there are so many horse pigmentation papers as opposed to papers on cheaper multi-colored animals like dogs or cats.

Related: White horses and blonde humans: a genetic connection? KITLG makes you whiter.

• Category: Science • Tags: Genetics, Pigmentation 

A Genome-Wide Association Study Identifies Novel Alleles Associated with Hair Color and Skin Pigmentation:

It has been a longstanding hypothesis that human pigmentation is tightly regulated by genetic variation. However, very few genes have been identified that contain common genetic variants associated with human pigmentation. We scanned the genome for genetic variants associated with natural hair color and other pigmentary characteristics in a multi-stage study of more than 10,000 men and women of European ancestry from the United States and Australia. We identified IRF4 and SLC24A4 as loci highly associated with hair color, along with three other regions encompassing known pigmentation genes. Further work is needed to identify the causal variants at these loci. Improved understanding of the genetic determinants of human pigmentation may help identify the molecular mechanisms of pigmentation-associated conditions such as the tanning response and skin cancers.

….Taken together, these four regions explain approximately 21.9% of the residual variation in hair color (black-blond) after adjusting for the top four principal components of genetic variation. (Conversely, after adjusting for these four regions, the top four principal components of genetic variation explain 2.6% of the residual variation in hair color.)….

There are four regions because areas around HERC2/OCA2 and MATP showed signals. MATP is also known as AIM1 and SLC45A2, so this makes three solute carrier genes implicated in pigmentation (the other is SLC24A5, obviously), though only SLC24A4 and SLC24A5 belong to the potassium-dependent sodium/calcium exchanger family. They adjusted for the principal components of genetic variation so as not to be confounded by population stratification (i.e., there was some ethnic variation among their whites, so you don’t have a random mating population).
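The logic of “adjusting for the top principal components” can be sketched on simulated data. Everything below is hypothetical: two hidden subpopulations whose background SNP frequencies drift apart, a trait that differs between them for non-genetic reasons, and a test SNP that is differentiated but has no effect. The helper `partial_r2` is an illustrative name, computing the share of residual trait variance explained by the SNP after regressing both on the covariates (the Frisch-Waugh-Lovell route), loosely analogous to the paper’s “residual variation” framing.

```python
import numpy as np

def partial_r2(x, y, covariates=None):
    """Share of residual variance in y explained by x after regressing both on covariates."""
    n = len(y)
    C = np.ones((n, 1)) if covariates is None else np.column_stack([np.ones(n), covariates])

    def residualize(v):
        beta, *_ = np.linalg.lstsq(C, v, rcond=None)
        return v - C @ beta

    rx, ry = residualize(x.astype(float)), residualize(y.astype(float))
    return float(np.corrcoef(rx, ry)[0, 1] ** 2)

rng = np.random.default_rng(0)
n, m = 1000, 300
pop = rng.integers(0, 2, size=n)  # hidden subpopulation label (0 or 1)
# Hypothetical background SNP frequencies that drift apart between the two subpopulations.
f0 = rng.uniform(0.1, 0.9, size=m)
f1 = np.clip(f0 + rng.normal(0.0, 0.15, size=m), 0.05, 0.95)
G = rng.binomial(2, np.where(pop[:, None] == 0, f0, f1))  # n x m genotype matrix

# Test SNP: strongly differentiated between subpopulations but with NO effect on the trait.
test_snp = rng.binomial(2, np.where(pop == 0, 0.2, 0.7))
y = 1.5 * pop + rng.normal(0.0, 1.0, size=n)  # trait differs by subpopulation only

# Top four principal components of the standardized background genotypes.
Z = (G - G.mean(axis=0)) / G.std(axis=0)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pcs = U[:, :4] * S[:4]

naive = partial_r2(test_snp, y)
adjusted = partial_r2(test_snp, y, pcs)
print(f"naive R^2 = {naive:.3f}, after 4 PCs = {adjusted:.3f}")
```

Without adjustment the neutral SNP appears to “explain” a noticeable share of the trait; once the PCs soak up the subpopulation signal, that spurious association largely disappears.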

It’s in PLOS; you can read the whole thing, etc.

Related: Why white people are so colorful!. Sandy also comments.


How Skin Color Is Determined:

In 2005 researchers identified a gene called SLC24A5 as a key determinant of skin color. Rebecca Ginger and colleagues now confirm that the protein product of this gene (NCKX5) is an ion exchanger; it exchanges sodium for calcium across a membrane, regulated by potassium. But unlike other NCKX proteins, they found that NCKX5 is not present on the cell surface, but internally in a compartment known as the trans-Golgi network. This compartment is where new proteins and vesicles are processed, modified and sorted.

When the researchers knocked out NCKX5 in melanocytes (the skin cells that manufacture the melanin pigment), melanin production decreased dramatically. They also demonstrated that changing the ancestral amino acid (alanine) at position 111 to the European form associated with lighter skintone (threonine) reduced NCKX5’s exchanger activity.

Related: SLC24A5 and skin color.


I’ve been blogging the HERC2/OCA2 story a fair amount. It seems this genomic region is the locus of main effect for variation of eye color in Europeans, in particular blue vs. non-blue eyes. But I also pointed out that this locus has been connected to variation in skin color, and while that variation is additive in effect, the variation in eye color exhibits strong dominance/recessive dynamics. My inference here is that it is more plausible that selection occurred on skin color, while eye color was a tissue-specific expression pattern which emerged as a byproduct. Peter Frost has an objection to this:

The correlation between eye color and skin color may simply be an artefact of geographic origin. Europeans vary clinally for both eye color and skin color along a north-south and west-east gradient, so if the pool of subjects is geographically heterogeneous you will almost certainly get a correlation between eye and skin color. But this doesn’t prove a cause and effect relationship.

Fair enough. Spurious associations driven by cryptic population substructure are one of the main reasons Structure was developed. I responded to Peter here, here and here. The short of it is that I don’t know of any analysis within an admixed population like African Americans which would settle the matter, but there are plenty of other points which suggest that we should look at the skin color trait (and, to be fair, if substructure exists at the level of British Isles origin samples we really need Structure!).
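Frost’s point is easy to reproduce in a toy model. The sketch below assumes two hypothetical regions and two unlinked, non-pleiotropic loci (one for eye color, one for skin color) whose light-allele frequencies both happen to be higher in the “north”. Within a region the loci are drawn independently, yet pooling the regions manufactures a correlation. The frequencies are invented for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sample_region(freq_eye, freq_skin, n, rng):
    """Genotype n people at two UNLINKED, non-pleiotropic hypothetical loci."""
    eye, skin = [], []
    for _ in range(n):
        # Light-allele counts (0, 1 or 2), drawn independently for each locus.
        eye.append(sum(rng.random() < freq_eye for _ in range(2)))
        skin.append(sum(rng.random() < freq_skin for _ in range(2)))
    return eye, skin

rng = random.Random(7)
north_eye, north_skin = sample_region(0.8, 0.8, 5000, rng)
south_eye, south_skin = sample_region(0.2, 0.3, 5000, rng)

r_within = pearson(north_eye, north_skin)
r_pooled = pearson(north_eye + south_eye, north_skin + south_skin)
print(f"within one region: r = {r_within:.3f}; pooled sample: r = {r_pooled:.3f}")
```

The near-zero within-region correlation versus the clearly positive pooled one is exactly the artefact that methods like Structure, or analyses within a single admixed population, are meant to rule out.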

But there was something that has been bothering me: eye color difference exhibits a lot of dominance/recessive dynamics in expression. The skin color data here does not, and aside from KITLG (which is dominant for light skin) all the other loci seem additive and independent (the reports of epistatic effects here and there don’t seem to be reproduced very often). One of the main reasons that I am favoring a skin color model as the phenotype driving selection is that if it is additive it is exposed to selection immediately at low frequencies. In contrast, recessive traits at low frequencies have the problem that most copies of the allele which increases fitness are still in heterozygotes, which mask them from selection. It came to my mind that the different assumptions about dominance would matter in terms of long term evolutionary dynamics, and in how that would be realized in the results of tests for selection. So I found this paper, Directional Positive Selection on an Allele of Arbitrary Dominance. It says:

…fixation of a beneficial allele leaves a signature in patterns of genetic variation at linked neutral sites. If this signature is well characterized, it can be used to identify recent adaptations from polymorphism data. To date, most models developed to characterize the effects of positive directional selection (termed “selective sweep”) have assumed that the favored allele is codominant. In other words, if the fitnesses of the three genotypes are given by 1, 1 + sh, and 1 + s (where s is the selection coefficient), then h = 1/2….
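For readers who want the arithmetic behind the dominance argument, the standard one-locus recursion for the fitnesses quoted above (1, 1 + sh, 1 + s), with p the frequency of the favored allele and q = 1 − p, gives the following. The notation for mean fitness and the per-generation change is the usual textbook one, not the paper’s.

```latex
\begin{aligned}
\bar{w} &= 1 + s\left(p^{2} + 2hpq\right) \\
\Delta p &= \frac{s\,pq\left[h + p\,(1-2h)\right]}{\bar{w}} \\
\Delta p &\approx s\,h\,p \quad (p \ll 1,\ h > 0), \qquad \Delta p \approx s\,p^{2} \quad (p \ll 1,\ h = 0)
\end{aligned}
```

So when the allele is rare an additive allele (h = 1/2) grows at roughly sp/2 per generation, while a fully recessive one grows at roughly sp²; at p = 0.01 the recessive allele moves about 50 times more slowly, which is the “masked in heterozygotes” problem in numbers.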

For skin color, h would be 1/2 for HERC2/OCA2, since one copy has half the effect on the trait value of two. Assuming selection proportional to the character value, two copies would be better than one copy, which would be better than no copies. In contrast, for eye color h would be between 0 and 1/2, and probably closer to 0 because of the predominant recessivity in expression of blue eyes. That means the fitness of those with one blue-eye copy would be much closer to that of those with no blue-eye copies than to that of those with two; nearly all the benefit would go to the recessive homozygotes. On to the results:

…when h is small, most of the sojourn time is when the allele is at low frequency in the population. During this phase, the allele will have the opportunity to recombine onto other backgrounds. In other words, the favored allele will tend to increase in frequency on multiple backgrounds, preserving more of the diversity that existed when it first arose. In contrast, for dominant alleles, most of the sojourn time is spent at higher frequency, when there is less opportunity for the favored allele to recombine onto other backgrounds. This results in a wider signature of a fixation event for larger h-values.

…presents the two statistics as a function of distance from the selected site for different h-values. As can be seen, both reach 0 faster for smaller h. For example, for these parameters, the means of these statistics 18 kb from the selected site are ~0 when h = 0.1, but they are still negative 40 kb away for h = 0.9. This finding suggests that, all else being equal, it will be more difficult to detect a selective sweep if the beneficial allele was recessive.

…This difference produces distinct genealogies and hence distinct patterns of polymorphism after the fixation of a beneficial mutation. In particular, our simulations show that the fixation of dominant alleles influences a larger genomic region, suggesting that this type of favorable substitution may be easiest to detect from polymorphism data.
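The sojourn-time claim in the quoted passages can be checked with a deterministic version of the same fitness model. This is a minimal sketch assuming an infinite population (no drift), s = 0.01, and a start at p = 0.001; it only tracks how many generations the allele spends below and above 50% frequency, not the sweep signature itself, which requires coalescent simulations with recombination like those in the paper.

```python
def sojourn(h, s=0.01, p0=0.001, p_end=0.999):
    """Deterministic one-locus recursion with genotype fitnesses 1, 1 + s*h, 1 + s.

    Returns (total generations from p0 to p_end, fraction of them spent with p < 0.5).
    """
    p, gens, gens_low = p0, 0, 0
    while p < p_end:
        if p < 0.5:
            gens_low += 1
        q = 1.0 - p
        w_bar = 1.0 + s * (p * p + 2.0 * h * p * q)
        p += s * p * q * (h + p * (1.0 - 2.0 * h)) / w_bar
        gens += 1
    return gens, gens_low / gens

for h in (0.1, 0.5, 0.9):
    total, frac_low = sojourn(h)
    print(f"h = {h}: {total} generations, {frac_low:.0%} of them with p < 0.5")
```

A mostly recessive allele (h = 0.1) spends most of its trip at low frequency, where recombination can move it onto other backgrounds, while a mostly dominant one (h = 0.9) spends most of it at high frequency, consistent with the paper’s argument for why dominant sweeps leave wider footprints.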

Why the bolded parts? From A Map of Recent Positive Selection in the Human Genome:

Some of the strongest signals of recent selection appear in various types of genes related to morphology. For example, four genes involved in skin pigmentation show clear evidence of selection in Europeans (OCA2, MYO5A, DTNBP1, TYRP1). All four genes are associated with Mendelian disorders that cause lighter pigmentation or albinism, and all are in different genomic locations, indicating the action of separate selective events. One of these genes, OCA2, is associated with the third longest haplotype on a high frequency SNP anywhere in the genome for Europeans….

I don’t know if my chain of inferences here is valid, and the paper I originally referenced makes clear that it is important to frame these sorts of assumptions within their statistical context; just because something is less likely does not mean it is impossible. I’ve sent out emails about OCA2 and skin color, and will report back, but at this point I suspect that the final proof in the pudding will have to be admixture analysis in a group like African Americans. But I think the above makes it more likely that whatever was going on 10,000 years ago did not express as a recessive phenotype.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"