The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

From reading The Essential Talmud about ten years ago, I vaguely recall the author stating that it was common for working-class men to devote each day to one page of a tractate from the commentaries on the oral law of the Jewish religion. As I am not religious, and look dimly on excessive orthopraxy, it struck me as a depressing thought.

But I am not entirely different. I often will relax at some point in the day and open up a random page of a population genetics textbook. Just as those Jewish men attempted to gain insight into the divine intent for how they should live their life, so with population genetics I am attempting to refine the theory which allows me to interpret the world around me.

It would probably help anyone who reads many of my posts as well, as it develops particular habits of mind. Though I often recommend Principles of Population Genetics, Elements of Evolutionary Genetics is also excellent. So in the future I’ll try to write up short insights which are pretty banal to most population geneticists, but which might be interesting to a motivated public, if my modest readership can be considered the “public.”

Page 100 has a section, “Selection in inbreeding populations.” The most important formal relationship on this page is:

Δq ≈ qs[h(1 − f) + f]

q = minor allele frequency on a biallelic locus, that is, the remainder from 1 – p

h = dominance coefficient, so that h = 0 means q is totally recessive and h = 0.5 means that the locus is additive with regard to allelic effect.

f = inbreeding coefficient, a basic measure of two alleles at the same locus sharing recent common ancestry (and therefore, rendering the genotype likely homozygous). From 0 to 1, with 1 meaning totally inbred and homozygous.

s = selection coefficient against the population mean fitness. Usually the value is near zero, though not exactly zero. A positive selection coefficient of 0.01 is considered very favorable for a new mutant.

What you see here is that in an instance where q is entirely recessive, inbreeding increases the selection on the locus. In a normal population with lots of random mating homozygous recessive genotypes are rare. When f ≈ 0 the change in the frequency of q is just a function of the selection coefficient and the dominance. As inbreeding increases, the importance of alleles (or lack thereof) in heterozygote genotypes decreases. For recessive traits inbreeding is another way to expose the novel alleles to selection.
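To make the relationship concrete, here is a minimal Python sketch of the formula above. The parameter values (q = 0.01, s = 0.01) are illustrative assumptions of mine, not numbers from the textbook, and the function name is hypothetical.

```python
def delta_q(q, s, h, f):
    """Approximate per-generation change in the frequency of a rare allele.

    Implements delta_q ~ q * s * [h * (1 - f) + f], the rare-allele
    approximation quoted in the text.
    """
    return q * s * (h * (1.0 - f) + f)


# Illustrative values only: a rare allele with a 1% selection coefficient.
q, s = 0.01, 0.01

for h, label in ((0.0, "recessive"), (0.5, "additive")):
    for f in (0.0, 0.25, 0.5, 1.0):
        print(f"{label:9s} f = {f:.2f}  delta_q = {delta_q(q, s, h, f):.2e}")
```

For the fully recessive case the approximation returns zero at f = 0, because it ignores terms of order q² (the rare homozygotes produced by random mating). As f rises toward 1 the recessive and additive cases converge on the same value, qs, which is the sense in which inbreeding exposes recessive alleles to selection.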

This is one reason that unscrupulous breeders of animals sometimes utilize very close relatives in programs to change traits. The problem is that inbreeding has an effect across the whole genome, even if you are interested in particular loci. And that effect on the whole genome is often very bad, as lots of deleterious alleles with recessive expression are present in populations which are normally outbred. Of course in plants this also results in purging of genetic load, as alleles get flushed out of the system. Unfortunately for mammals, and complex metazoans in general, this doesn’t seem to work too well for our lineage. If it did work well, the zoological veterinarians I’ve talked to would be a lot more hopeful about what they’re trying to do by mating near relations in the hopes that they can get a large enough population to maintain a viable breeding program.

• Category: Science • Tags: Inbreeding, Selection 
Credit: Graham Crumb

If population genetics is the “study of the distributions and changes of allele frequency in a population,” then the understanding of the maintenance of variation (or lack thereof) is one of the major topics of focus. In the first half of the 20th century when there was a lot more theory than data there were arguments about whether polymorphism (in this era they’re talking about classical markers) was maintained through balancing selection or whether it was just a transient phenomenon, and that at any given moment you’re just getting a snapshot of alleles sweeping up to fixation, or being purged out of the gene pool. In the second half of the 20th century it was all about neutral theory, and its discontents. Then the post-genomic era showed up, and geneticists had access to a lot of data and computational power to analyze it. Rather than relying on older molecular tests which were geared toward detecting inter-specific selection events population geneticists began scouring haplotype structure.

But even now there’s a lot of mystery. First, you might be able to infer that selection is highly likely in a given region, but you may have no clue what that region does functionally (in some cases the region may not even be genic, in which case it has to be a mysterious regulatory element). There are some good case studies where the mystery has cleared. Lactase persistence. The ways you can fight malaria. But over the past day I’ve been having to admit that it sure looks like the regions of the genome around pigmentation function are the targets of selection. But we don’t really know what selection is selecting for. And this is actually a set of selection events that I can imagine some day reaching a resolution into their probable cause. But we’re far from that.

A few years ago Eimear Kenny and company solved the mystery of why some Melanesian populations had very dark skins but blonde hair. I blogged about it, but didn’t read the paper too closely. Looking at the publication date, May 2012, I realize I was busy studying for some really big end of first year exams at that time, so that explains my lack of attention. In any case they found that a mutation, rs13289810 in TYRP1, results in blonde hair when it’s homozygous. They didn’t find strong evidence for recent selection. That is, there wasn’t a long haplotype block indicating a sweep in the past 10,000 years. The allele frequency difference across populations as well as long range linkage disequilibrium were suggestive of past selection.

This was in the Solomon Islands. Today I decided to see if there was any follow-up on this work. Well, Heather Norton’s group published a paper, Distribution of an allele associated with blond hair color across Northern Island Melanesia. It’s on a different set of islands, but the same results pretty much hold. The allele has a recessive effect on hair color, not much on skin color (there was a small effect in the original paper, so it seems it’s not wholly tissue specific in expression). But I just kept staring at this map and the frequencies. Look at the derived proportions…they don’t get above 0.50. But in most of the populations they’re around in appreciable proportions. I had a hard time not thinking that there was balancing selection going on here. That this was something old that was persisting, but not fixing.

I asked Carlos Bustamante, and he got back to me on Twitter:

I also had an exchange with the first author, and she pointed out in the supplements that the frequencies in the Solomons were quite curious too:

Region         Genotype counts       Frequency of 93C
Central        126     80     22     0.27
Choiseul        17      2      0     0.05
Guadalcanal     33     33     13     0.37
Isabel          23     17      7     0.33
Makira          13     11      3     0.31
Malaita         98    185     92     0.49
                40     11      0     0.11
Temotu          13     11      3     0.31
Western         40     22      2     0.20
Total          405    374    142     0.36
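The frequencies in the table can be recomputed from the genotype counts. The sketch below assumes the three count columns are, in order, homozygous ancestral, heterozygous, and homozygous 93C; that ordering is my inference (it reproduces the listed frequencies), not something stated in the table. The Hardy-Weinberg helper is my own addition for comparison, not an analysis from either paper.

```python
def allele_freq(n_anc_hom, n_het, n_der_hom):
    """Derived-allele frequency from diploid genotype counts."""
    total_alleles = 2 * (n_anc_hom + n_het + n_der_hom)
    return (n_het + 2 * n_der_hom) / total_alleles


def hw_expected_het(n_anc_hom, n_het, n_der_hom):
    """Heterozygote count expected under Hardy-Weinberg proportions."""
    n = n_anc_hom + n_het + n_der_hom
    q = allele_freq(n_anc_hom, n_het, n_der_hom)
    return 2 * q * (1 - q) * n


# Rows copied from the table; column order is an assumption (see lead-in).
counts = {
    "Central": (126, 80, 22),
    "Malaita": (98, 185, 92),
    "Total": (405, 374, 142),
}

for region, c in counts.items():
    print(f"{region:8s} freq(93C) = {allele_freq(*c):.2f}  "
          f"observed het = {c[1]}  HW-expected het = {hw_expected_het(*c):.1f}")
```

Comparing observed and expected heterozygote counts is only a crude first look; a simple excess or deficit of heterozygotes in one generation is neither necessary nor sufficient evidence for any particular form of balancing selection.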


When they looked in the HGDP data set it’s ancestral everywhere else. The derived variant isn’t floating around at low frequencies. One might naively think that it’s overdominance, but I suspect we’re looking at some negative frequency dependent selection. In the 2014 paper by Norton et al. it’s pretty clear that this is distributed across rather disparate populations. It is unlikely in my opinion to be purely due to population structure, as diverse islands have been sampled. It looks to be an old variant that’s persisted, so it dates to the Pleistocene settlement of Near Oceania. Blond hair is also found in Australia, though we don’t know the genetic basis there.

Ten years ago I would have been super excited to know the genetic basis of an interesting trait like this. But now I’m left with why? Why? We’ll be grappling with a lot of why’s in the next few decades.

• Category: Science • Tags: Blondism, Selection 


Very important paper in PLOS BIOLOGY just out, Natural Selection Constrains Neutral Diversity across A Wide Range of Species. Important enough that the journal commissioned this article: Lewontin’s Paradox Resolved? In Larger Populations, Stronger Selection Erases More Diversity. The paradox is pretty straightforward. Assuming the neutral theory of molecular evolution you’d expect that you’d have more genetic diversity in species with larger population sizes, because the larger the population size the longer it would take for mutations to transition from novelty to fixation. More formally the time until fixation of a neutral polymorphism is ~4Ne generations, with Ne being the effective population size. In small populations mutations will emerge and fix rather quickly due to the generation to generation volatility of drift being so powerful, therefore keeping down the total diversity. In large populations mutations will take a long time to traverse the frequency range from 0 to 100% because of the weakness of inter-generational random drift. The paradox was a big deal because for the past 30 years or so the neutral (or nearly neutral) theory has been the implicit null model, and I’d argue broadly supported as such, albeit with strong dissents.
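The ~4Ne scaling is easy to check by brute force. Below is a small Wright-Fisher simulation, a sketch under idealized assumptions (constant size, random mating, so Ne equals the census N), with tiny population sizes chosen by me purely so that it runs in about a second. The function names are hypothetical.

```python
import random


def fixation_time(n_diploid, rng):
    """Follow one new neutral mutation under Wright-Fisher drift.

    Returns the number of generations until fixation, or None if the
    mutation is lost (which is what happens to most new mutations).
    """
    copies_total = 2 * n_diploid
    copies = 1  # a single new mutant copy
    generations = 0
    while 0 < copies < copies_total:
        p = copies / copies_total
        # Binomial sampling of the gene copies in the next generation.
        copies = sum(1 for _ in range(copies_total) if rng.random() < p)
        generations += 1
    return generations if copies == copies_total else None


def mean_fixation_time(n_diploid, n_fixations, seed=1):
    """Average time to fixation over runs in which the mutation fixed."""
    rng = random.Random(seed)
    times = []
    while len(times) < n_fixations:
        t = fixation_time(n_diploid, rng)
        if t is not None:
            times.append(t)
    return sum(times) / len(times)


for n in (10, 20):
    t_bar = mean_fixation_time(n, n_fixations=100)
    print(f"N = {n}: mean fixation time = {t_bar:.1f} generations (4N = {4 * n})")
```

With only 100 fixation events per population size the averages are noisy, but doubling N should roughly double the mean time to fixation, which is the intuition behind expecting more standing diversity in larger populations.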

The “controversies” that occurred from the 1970s onward about the role of selection and its enemies are somewhat notorious. Some of the figures are well known to the public. Richard Dawkins and Stephen Jay Gould both had cameos because of their differing views about the pervasiveness of adaptation in evolutionary process more generally. But the geneticists at the heart of the major disagreements are more obscure to the general public, though in the early 1990s the Sacramento Bee reported on the beef between John Gillespie and Motoo Kimura (Gillespie was based out of UC Davis, near Sacramento). From what I can tell, and who I know, it strikes me that genomics has now somewhat mitigated the role of rhetoric in the debate, and at the same time fostered an abating of the extremism of some of the anti-selectionists. Leibniz’s stance of “let us calculate” has now become more important than a turn of phrase or evocative metaphor. With data there is less of a role for posturing. Additionally, the fact is that many researchers did not follow mathematical theoretical proofs very closely or with genuine comprehension, so empirical results are really what is changing the terms of the debate. The Drosophila world has long been a redoubt for selectionism, but now you see papers such as Genome-wide signals of positive selection in human evolution, which argue for the importance of that population genetic parameter even for small effective population size organisms such as humans.

What the authors did in the above paper was leverage the fact that with genome-wide data they could test the theoretical propositions empirically. In particular, they looked at regions with reduced recombination,* which should therefore be subject more strongly to selection (whether selective sweeps, which allow for the hitchhiking of regions around the target of selection and generate long haplotypes, or background selection, which constrains genomic variation due to negative pressures against mutation). As the figure above shows there is a correlation between the power of selection on the genome and inferred effective population size. I say inferred because they had to use species range and size as proxies. Obviously this isn’t perfect, but I suspect that the utilization of these proxy variables only diminishes the correlation. The authors admit that there is a lot of work to be done, but this is just the first step. Perhaps the results will change somewhat with a different selection of organisms (N = 40), but I’m moderately skeptical. Probably the most important line in the paper is “it seems clear that, in most cases, BGS [background selection] is a more appropriate null model for tests of natural selection than strict neutrality.”

* Recombination shuffles the association of variants across the genome, and so separates their destiny, whether good (positive selection) or bad (negative selection).

• Category: Science • Tags: Evolutionary Genetics, Genetic Draft, Selection 

Is ADSL the locus of human genius?

God knows I would sleep more if it weren’t for bioRxiv. A new single author preprint debuts a new method, 3P-CLR, which extends XP-CLR, as a method to detect natural selection. The key is that it uses an explicit three-population tree to pick up selection events after the most recent, and second most recent, divergence events. So in the tree of ((Eurasians, Africans), Archaic Humans), this method can pick up perturbations which suggest selection after the emergence of a coherent anatomically modern population, but before it differentiated into its gorgeous mosaic.

In any case, the most recent version of the preprint, Testing for ancient selection using cross-population allele frequency differentiation:

A powerful way to detect selection in a population is by modeling local allele frequency changes in a particular region of the genome under scenarios of selection and neutrality, and finding which model is most compatible with the data. Chen et al. (2010) developed a composite likelihood method called XP-CLR that uses an outgroup population to detect departures from neutrality which could be compatible with hard or soft sweeps, at linked sites near a beneficial allele. However, this method is most sensitive to recent selection and may miss selective events that happened a long time ago. To overcome this, we developed an extension of XP-CLR that jointly models the behavior of a selected allele in a three-population tree. Our method – called 3P-CLR – outperforms XP-CLR when testing for selection that occurred before two populations split from each other, and can distinguish between those events and events that occurred specifically in each of the populations after the split. We applied our new test to population genomic data from the 1000 Genomes Project, to search for selective sweeps that occurred before the split of Africans and Eurasians, but after their split from Neanderthals, and that could have presumably led to the fixation of modern-human-specific phenotypes. We also searched for sweep events that occurred in East Asians, Europeans and the ancestors of both populations, after their split from Africans.

The software will be posted on the author’s github when the manuscript is accepted somewhere.

A minor note is that the data set used was from the 1000 Genomes. The Sub-Saharan Africans then are not from the hunter-gatherer populations, the Khoisan and the Pygmy, who seem to have the largest reservoir of genetic variation. The figure above is from a major signal of selection which is specific to modern humans, but excluded from the Neandertal populations. That is, fixed in us for a derived mutation, fixed in our cousins for the ancestral type (ancestral as judged by reference to the chimpanzee outgroup). My main curiosity is to push the three-population model so that it is ((Khoisan, non-Khoisan), Archaic Humans). I know from ASHG that there are now a fair number of good quality whole genomes from African hunter-gatherers, so no doubt people are looking for these signatures.

The holy grail here for some geneticists (e.g., Svante Pääbo) is to find that gene or genes which changed in us to make us sui generis. I no longer believe that this will ever be found. Assuming tens of millions of polymorphisms floating around in the genome no doubt candidate genes will emerge, just like FOXP2 did all those years ago. But I no longer believe that there is a necessary or sufficient genetic variant for our humanity. It’s a quantitative trait, and many of the hominin lineages were actually stumbling in the same direction.

On a more optimistic note, those of us who work on non-human genomes will also have data sets to rival those who are savants of humanics in the near future, so these methods are generally useful.

Citation: Testing for ancient selection using cross-population allele frequency differentiation, Fernando Racimo, bioRxiv doi:

• Category: Science • Tags: 3P-CLR, Human Evolution, Selection 


David Reich’s lab has a new preprint out, Eight thousand years of natural selection in Europe, which serves as a complement to Massive migration from the steppe is a source for Indo-European languages in Europe. Where the previous work has focused on the relationships of ancient and modern populations, this research puts the spotlight on patterns of natural selection which have shaped ancient and modern populations. The method utilizes the explicit model which is supported by the previous work, that Europeans are best approximated as a three population admixture of a group represented by the hunter-gatherers of Western Europe, the first farmers who brought agriculture to Europe, and the peoples of Central Eurasia who likely brought the Indo-European languages to Europe. In the parlance of these sets of papers, WHG, EFF, and Yamnaya. Basically they have allele frequencies of these ancestral groups, thanks to ancient DNA techniques, and the frequencies in modern populations. By comparing the frequencies one can then infer if the deviations from expectation are large enough to satisfy the conditions you’d expect for a locus subject to a selective sweep of some sort which is changing proportions rapidly as a function of a given selection coefficient.
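The logic of “deviation from expectation” can be sketched in a few lines of Python. This is only the point expectation under neutral admixture, not the preprint’s actual statistical test, and every number below (ancestral frequencies, admixture proportions, observed frequency) is a hypothetical value I made up for illustration.

```python
def expected_freq(ancestral_freqs, proportions):
    """Allele frequency expected in an admixed population under neutrality.

    This is simply the ancestry-weighted average of the source frequencies.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("admixture proportions must sum to 1")
    return sum(f * a for f, a in zip(ancestral_freqs, proportions))


# Hypothetical inputs, in the order (WHG, EFF, Yamnaya).
ancestral_freqs = (0.0, 0.1, 0.0)   # made-up derived-allele frequencies
proportions = (0.2, 0.5, 0.3)       # made-up ancestry fractions
observed_modern = 0.6               # made-up present-day frequency

expected = expected_freq(ancestral_freqs, proportions)
print(f"expected = {expected:.2f}, observed = {observed_modern:.2f}, "
      f"excess = {observed_modern - expected:.2f}")
```

A large gap between observed and expected frequencies is what flags a locus as a candidate; deciding whether a gap is “large enough” requires a proper model of sampling error and drift, which this sketch leaves out.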

First, it is very obvious that lactase persistence in Europe has been under strong directional selection over the past 4,000 years. Even in the Bronze Age Central European samples did not exhibit frequencies of the derived variant common across Western and South-Central Eurasia on the LCT locus which is associated with persistence today. A quick survey of the 1000 Genomes data shows that this variant has wide variation in modern European populations which are phylogenetically close. The frequency in the Spanish data set is ~50 percent, but in the Tuscan Italian samples it is ~10 percent for the derived variant. In Denmark and Sweden the derived allele frequency goes up to ~75 percent (the phenotypic expression is dominant, so that means ~95 percent lactase persistence), though in the Finnish sample it is closer to the frequency of the Spanish data set. In South Asia the 1000 Genomes data as well as earlier work shows that frequencies are 25 percent or more in Northwest India, in the Punjab, where dairy culture is most pervasive. It drops as a function of distance from this zone, to 5 percent in Southern and Eastern South Asia. The haplotype network around this particular mutation implies that it probably originated in Central Eurasia, so the varied frequencies across the Old World are suggestive of both migration and selection. Intriguingly, the lactase persistence allele is not present at appreciable frequencies in the Yamnaya. It begins to appear in cultures such as the Corded Ware and Bell Beaker, though at far lower frequencies than is presently the case in this region.
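The step from allele frequency to the fraction of lactase persistent people is just Hardy-Weinberg arithmetic for a dominant trait. The sketch below assumes Hardy-Weinberg proportions, which is my simplification; the allele frequencies are the approximate ones quoted in the paragraph above.

```python
def dominant_phenotype_freq(p):
    """Fraction of individuals carrying at least one copy of a dominant allele.

    Assumes Hardy-Weinberg proportions: everyone except the (1 - p)^2
    homozygotes for the other allele shows the trait.
    """
    return 1.0 - (1.0 - p) ** 2


# Approximate derived-allele frequencies quoted in the text.
for label, p in (("Denmark/Sweden", 0.75), ("Spanish", 0.50), ("Tuscan", 0.10)):
    print(f"{label:15s} allele freq = {p:.2f}  "
          f"persistent fraction = {dominant_phenotype_freq(p):.2f}")
```

At an allele frequency of 0.75 the persistent fraction comes out near 0.94, in line with the rough figure given above, while at 0.10 fewer than one in five people would be persistent.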

But the story of lactase persistence is not entirely surprising. Its late evolutionary trajectory in relation to the rise of cattle culture and complex societies in Eurasia points to the reality that evolutionary change in the biological dimension requires a powerful cultural scaffold. That existed in the form of agro-pastoralism in Eurasia. Similar forces are at play across regions of Africa, where signatures of selection are even more evident in groups dependent upon cattle, likely because of the recency of the emergence of the trait, caught in mid-sweep.

A new face in the world?

There are a few other signatures evident in these data. Three of them have to do with pigmentation, SLC24A5, SLC45A2, and HERC2. Ewen Callaway reported on the peculiarity last year that Paleolithic European hunter-gatherers may have had dark skin and light eyes. The reasoning here is that a large fraction of the complexion difference between Europeans and Africans is attributable to a derived mutation on SLC24A5, which is nearly fixed in modern Europeans. And yet ancient European hunter-gatherers on the whole were not fixed at this locus, and Western European hunter-gatherers exhibited the ancestral variants. To get a sense of how peculiar this is, the vast majority of the alleles in much of the Middle East are in the derived state, as are about half the alleles in South Asia (I am a homozygote for the derived allele for what it’s worth, and my skin is still notably brown, though obviously not extremely dark). The best available data suggests that the mutant allele emerged recently in the Middle East, and it has expanded out from that point of origin.

SLC45A2 is different in that its distribution is far more constrained to within Europe, though it is found at appreciable frequencies in the Middle East, and at lower frequencies in South Asia. The same for HERC2, though I was surprised to see that the “European” variant associated with blue eye color is actually found at a 0.10 proportion in the 1000 Genomes data in Bangladesh (I am a homozygote for the ancestral variant), the same fraction as the Punjabi sample.*

The results here seem to suggest that all these loci are under selection. The two SLC genes are under positive selection, though SLC24A5 probably got its first boost from EFF with the arrival of agriculture, and was subsequently fixed even when that group fused with the hunter-gatherers who lacked it. Curiously HERC2 is under some negative selection. Remember that all the hunter-gatherers seem to carry the derived variant, so the frequency could only go down. But in Southern Europe it is likely being driven down in frequency, while in Northern Europe it has been maintained, or rebounded.

Of course one of the major issues we have when evaluating pigmentation loci and their relationship to selection is it’s not always clear if the target of selection is the trait of pigmentation, or something else which the locus modulates, and pigmentation just happens to be a salient side effect. There are many theories about why populations have become depigmented, but none of them are truly well supported in my opinion. Another question is whether we know the genetic architecture of pigmentation well enough to actually infer that these ancient populations are easily predicted in their trait character by modern models which map genotype to phenotype. In other words, were Paleolithic Europeans light skinned because of different alleles? The genetic architecture of skin color is relatively well understood in extant populations. Though it is possible, it so happens that modern Northern Europeans, and to a lesser extent Southern Europeans, harbor a substantial portion of European ancestry which is rooted in the Paleolithic. Studies in admixed African American populations, which are about 20 percent European, indicate that the primary variants which determine complexion are the ones extant in modern populations, though it may be that there isn’t power to detect the ones from WHG, etc. Of course it could be that the lightening alleles of the Paleolithic Europeans were subject to negative selection, excepting the HERC2/OCA2 locus. But that’s not a particularly parsimonious solution from where I stand (by the way, if selection is targeting something other than pigmentation it is strange that pigmentation associated loci emerge in clusters as positive hits for selection tests).

A secondary issue in relation to pigmentation is that the Yamnaya population does not seem to have been particularly fair of hair or azure of eye. The frequency of the derived HERC2 SNP is in the range of North Indian populations, while the SLC45A2 SNP is in the same frequency range as Middle Eastern groups. One might suggest that the Yamnaya are not representative of the population which was intrusive to Europe, but note that the frequencies of the alleles in question during the Late Neolithic and Bronze Age are intermediate between it and modern groups. These results imply in situ evolution within Europe over the Holocene, and down into historical times, toward the phenotype which we ascribe uniquely to Europeans. This is strange especially in light of the fact that a later eastern branch of Indo-Europeans seems to have been quite light. I don’t think we can make final inferences, but to me it is starting to look like the “Proto-Indo-European” complex of peoples was highly cosmopolitan and heterogeneous. Should we expect anything else? As the Mongols expanded in all directions their divergent tendrils were embedded in different ethnic substrates (e.g., Tatars, Khitai and Jurchen in China, Kipchak Turks in Russia, etc.).

The other major locus that showed up was one related to fatty acid metabolism, FADS1. Many tests for selection in humans and domestic animals show changes in the ability to process nutritive inputs. It seems an eminently plausible candidate phenotype to target for selection since the relationship to fitness is straightforward. Using polygenic score methods they also find that there was selection for shorter stature in early Neolithic populations in places like Spain. I think in the future one area of investigation is going to be in the domain of biological adaptations on the margin of farming populations which are put into a Malthusian pressure cooker. Humans, on average, were getting smaller until recently in comparison to their average stature during the Last Glacial Maximum. The Yamnaya people, in contrast to the Neolithic Iberians, seem to have been rather tall. Perhaps it had something to do with the nature of agro-pastoralism? (though do note that without lactase persistence they’d miss out on about 1/3 of the calories in the form of lactose sugar, though not the protein and fat)

But there’s a twist which I haven’t gotten to, and that’s the one in regards to the hunter-gatherers from the Scandinavian region. Unlike the WHG samples you can see that they exhibit mixed frequencies of derived and ancestral alleles at the SLC loci. That’s peculiar, since geographically they are more distant from the core region from which EFF issued. We do know that their ancestry is somewhat exotic, as the paper on Indo-European migrations pointed out that they seem to carry the same ancestral component which the Indo-Europeans brought to most of Europe, that of the Ancestral North Eurasians (albeit at far lower fractions than the EHG group which was a partial precursor of the Yamnaya population).

The past is complex and doesn’t fit into a solid narrative. And yet the weirdest aspect of the Scandinavian samples is that they carry the East Asian/Native American variant of EDAR at appreciable frequencies! The figure to the right illustrates this. In blue you have the focal SNP (dark is homozygote, light is heterozygote, dark circle means only one allele was retrieved). In the Chinese from Beijing population (CHB) the derived variant is at high frequency. In the sample of Northwest Europeans from Utah (CEU) it is not present. You can confirm these findings in the 1000 Genomes and elsewhere. In Europe, EDAR of the East Asian form seems to be found only in Finland and associated populations. Using ALDER the authors conclude that admixture occurred on the order of 1 to 2 thousand years before the present, from an East Asian-like group (in the Indo-European paper they found this source best matched the Nganasans of North Central Siberia). An interesting fact which also comes out of this finding is that the haplotype that the derived SNP arose against is relatively common in Northern Europe. The arrows in the figure point to individuals who carry the ancestral SNP, but exhibit the same haplotype which is dominant in East Asia (and also among the Scandinavian hunter-gatherers with the derived variant). The authors state that “The statistic f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge Andaman Islanders) is significantly negative (Z=-3.9) implying gene flow between the ancestors of Scandinavian hunter-gatherers and Han so this shared haplotype is likely the result of ancient gene flow between groups ancestral to these two populations.” Though in earlier work on these data sets they left open the possibility of gene flow between Eastern and Western Eurasia during the Paleolithic as a way to explain some results, it was not offered as an explanation for the Scandinavian hunter-gatherers.

I do not know what to think of the fact that the haplotype that the derived East Asian SNP arose in is common in Northern Europe (though without the derived SNP, which is likely only present in a few populations due to recent Siberian admixture). Could it be that ancient gene flow from Western Eurasian Paleolithic people occurred into East Asian populations, and that then this haplotype accrued the mutation which later swept to near fixation? If that is the case I’m curious about haplotype networks, as Northern Europeans should be more diverse when it comes to the haplotype in question.
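For readers who haven’t met the f4 statistic quoted above, here is a toy Python version. It uses the usual definition, the average over SNPs of (a − b)(c − d) for allele frequencies in populations A, B, C, D. The frequencies are invented by me to show the sign convention; real analyses use hundreds of thousands of SNPs and a block jackknife to get the Z-score, neither of which is shown here.

```python
def f4(a, b, c, d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d).

    Each argument is a sequence of allele frequencies, one entry per SNP,
    with all four sequences covering the same SNPs in the same order.
    """
    if not (len(a) == len(b) == len(c) == len(d)) or len(a) == 0:
        raise ValueError("need four equal-length, non-empty frequency lists")
    total = sum((ai - bi) * (ci - di) for ai, bi, ci, di in zip(a, b, c, d))
    return total / len(a)


# Invented allele frequencies at three SNPs (illustration only).
yoruba = [0.1, 0.5, 0.9]
shg = [0.3, 0.4, 0.7]    # stand-in for Scandinavian hunter-gatherers
han = [0.4, 0.3, 0.6]
onge = [0.2, 0.5, 0.8]

print(f"f4(Yoruba, SHG; Han, Onge) = {f4(yoruba, shg, han, onge):.4f}")
```

In this toy data the second population tracks the third wherever the third differs from the fourth, so the products are negative and f4 comes out below zero. That is the pattern the authors read as shared drift between the Scandinavian hunter-gatherers and Han.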

In the near future we’ll probably have better and more numerous whole genome sequences of ancient samples. Some of the confusions engendered by this work will be cleared up, as better data renders paradox crisply coherent. The preprint is free to anyone, and I invite readers to dig deeply into it. Though the results yielded only a few positive signals of selection, they’re subtle and complex in their implications. I certainly haven’t thought through everything….

* The fraction of blue eyes is MUCH higher among Punjabis than Bengalis in my experience. It goes to the point that blue eye color is likely expressed most readily against the genetic background found in Europeans, where there are other depigmenting alleles near fixation.

• Category: Science • Tags: Europeans, Population Genetics, Selection 
Credit: CISC

Early last year an ancient genomics paper came out with the title Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. The point here is that light pigmentation associated alleles common in Europeans seem to be relatively new derived mutations from the ancestral state, associated with Africans. An Ewen Callaway write up highlighted the fact that one of the inferences made from these genomes is that these hunter-gatherers had light eyes (blue) and dark(er) skin. At the time I pointed out to Callaway on Twitter that we need to be careful here, as ancient Europeans may have had different variants, and these traits are not monogenic but exhibit dependencies on multiple loci. In light of my post below Graham Coop suggested a similar issue, that there could have been convergence. In other words, just because modern Europeans have particular derived alleles which confer a particular trait, it does not entail that ancient peoples who lived in Europe had to have the same alleles to confer the same phenotype. Alicia Martin observed that OCA2 is a locus where fast evolution occurred in both East Asians and West Eurasians (especially Europeans), but at different SNPs. In other words, the same gene is modified, but the mutational event is distinct.

Pigmentation in humans seems to be a trait we have a pretty good grasp of. Because most of the genetic variation between populations seems to be localized at relatively large effect loci GWAS has been good at picking up the signals. Tests of selection which look at haplotype structure also detect these loci because many of them seem to have swept up in frequency relatively recently. This is consonant with what ancient DNA is telling us, as a substantial proportion of modern European ancestry does derive from peoples who have been resident at high latitudes for tens of thousands of years, but new variants, possibly from the Middle East or elsewhere, have increased in frequency within these admixed populations (in South Asia the same pattern is evident, as the Ancestral North Indians likely introduced West Eurasian variants into the hybrid daughter populations).

But let’s think through some of the implications of the alternative scenarios. One model is implicitly the dominant one, that the modern skin lightening alleles which are derived in contemporary populations are due to new pressures for de-pigmentation. Though some de-pigmentation likely occurred early on, perhaps even in Neandertals, the full suite is recent. Another model is that there were other variants segregating in the older populations, and that new populations brought new variants which swept to fixation. My question is simple: if the indigenous populations of Europe were already relatively light skinned, why did the new alleles rise in frequency so rapidly?

Let’s unpack what I’m getting at. OCA2 and SLC24A5 are two loci implicated in de-pigmentation in Europeans. The regions around the selective events are highly homogenized so that there’s a long haplotype around them. This means that the causal variant was targeted by such strong selection that the flanking regions of the genome were swept upward in frequency faster than recombination could break apart the association. SLC24A5 in particular seems to have been under very strong selection, to the point where almost all variation has been purged from European populations at this locus. In India SLC24A5 is also at a higher frequency than might be predicted by simple contribution of ANI ancestry. The issue that I’m getting at, assuming that modern continental populations such as Europeans are admixed, is why these skin lightening alleles swept up in frequency so rapidly, and in the case of SLC24A5 nearly to fixation. It’s framed by the analysis presented by this paper, Parallel Adaptation: One or Many Waves of Advance of an Advantageous Allele?:

Models for detecting the effect of adaptation on population genomic diversity are often predicated on a single newly arisen mutation sweeping rapidly to fixation. However, a population can also adapt to a new environment by multiple mutations of similar phenotypic effect that arise in parallel, at the same locus or different loci. These mutations can each quickly reach intermediate frequency, preventing any single one from rapidly sweeping to fixation globally, leading to a “soft” sweep in the population. Here we study various models of parallel mutation in a continuous, geographically spread population adapting to a global selection pressure. The slow geographic spread of a selected allele due to limited dispersal can allow other selected alleles to arise and start to spread elsewhere in the species range. When these different selected alleles meet, their spread can slow dramatically and so initially form a geographic patchwork, a random tessellation, which could be mistaken for a signal of local adaptation. This spatial tessellation will dissipate over time due to mixing by migration, leaving a set of partial sweeps within the global population. We show that the spatial tessellation initially formed by mutational types is closely connected to Poisson process models of crystallization, which we extend. We find that the probability of parallel mutation and the spatial scale on which parallel mutation occurs are captured by a single compound parameter, a characteristic length, which reflects the expected distance a spreading allele travels before it encounters a different spreading allele. This characteristic length depends on the mutation rate, the dispersal parameter, the effective local density of individuals, and to a much lesser extent the strength of selection. While our knowledge of these parameters is poor, we argue that even in widely dispersing species, such parallel geographic sweeps may be surprisingly common. 
Thus, we predict that as more data become available, many more examples of intraspecies parallel adaptation will be uncovered.

Basically, if the ancient North Eurasian populations had lighter skin due to their own alleles, why are the new light skin alleles sweeping up in frequency so strongly after admixture? (for Europeans, I’m thinking SLC45A2 and SLC24A5 in particular). Perhaps the selective sweeps were not driven by light skin at all? Or, perhaps the ancient North Eurasians didn’t have their own variants.

Addendum: The 2007 Neandertal red hair paper offers up a possible solution toward phenotype reconstruction: test the ancient genetic variants in cell lines to check for expression.

• Category: Science • Tags: Selection 

Simon Gravel has a new preprint up on bioRxiv, When is selection effective? It’s a preprint, so it has to be thought of as a work-in-progress. From my perspective it’s interesting because it combines analytic methods with simulation in an attempt to sharpen intuitions about the power of selection to modulate genetic load. Issues relating to load matter because there have been empirical results and arguments about the differences between human populations due to findings from genomics over the past 10 years (e.g., Europeans have higher load than Africans because of lower long term effective population size). More generally I believe that the interplay of selection and drift across natural history is relevant for conservation genetics.

These results seem to imply that, using realistic models of human demography over the past ~100,000 years, the differences in load should be relatively minor. Interestingly the power of selection on recessive alleles of large deleterious effect actually becomes stronger in bottlenecked populations, presumably because of exposure of homozygotes. This is obvious in hindsight. In contrast weakly deleterious alleles are more efficiently purged in the larger effective population size of Africans.

The main thing I took away from the preprint is the emphasis on the long term population history and its impact on genetic load in a given generation. It strikes me that this is why simulation methods are so persuasive, as the combined effects are indeed subtle.

Citation: When is selection effective?, Simon Gravel,

• Category: Science • Tags: Selection 

Over at The New York Review of Books H. Allen Orr has put up a reaction to A Troublesome Inheritance. It’s very similar to Jerry Coyne’s take: the part about science (e.g., population structure being non-trivial) is deemed acceptable, but speculations in the second half of the book are not as appreciated. This is not surprising, and seems typical for working population geneticists (though do note that R. A. Fisher’s The Genetical Theory of Natural Selection has quite a bit of sociological speculation in the second half). But I have to say that I disagree with Orr when he says that “it seems hard to maintain that educated people deny that DNA sequences differ subtly among continents.” Jerry Coyne has a follow-up post where he praises Orr’s review, but adds:

This is what I also claimed, and of course got slammed by the race-denialists who are motivated largely by politics. To a biologist, races are simply genetically differentiated populations, and human populations are genetically differentiated. Although it’s a subjective exercise to say how many races there are, human genetic differentiation seems to cluster largely by continent, as you’d expect if that differentiation evolved in allopatry.

You can see what Coyne is talking about in the comments of his blog, for example:

“The idea that human populations are genetically identical, and “races” are purely social constructs, reflecting nothing about genetic differences, is simply wrong.” [quoting Coyne -Razib]

This is completely false. I don’t see how any analysis of genetic differences will produce a ‘black’ race that combines Africans, Sinhalese and Australian Aborigines, nor that will justify the ‘one drop’ rule.

Can you believe that Jerry Coyne actually has to respond to this sort of thing? The whole point of formal means of clustering populations is to avoid these a priori social constructions. Gross phenotypes like skin color seem only moderately informative of population history. But a great number of educated people talk about human variation with the taxonomic sophistication of an 11-year-old from the Jim Crow South.

In any case, I want to highlight a second area where Orr has a mild slap at Wade, and that’s on the science. Others have touched on this, but a key issue is that Nicholas Wade is covering a beat which is changing by the month, so obviously a lot of the science has now been superseded. Orr states:

Wade’s survey of human population genomics is lively and generally serviceable. It is not, however, without error. He exaggerates, for example, the percentage of the human genome that shows evidence of recent natural selection. The correct figure from the study he cites is 8 percent, not 14, and even this lower figure is soft and open to some alternative explanation…

What’s the truth here? This is a very live area of science. Last summer a preprint was posted on arXiv, Genome wide signals of pervasive positive selection in human evolution. The title makes the conclusion pretty clear. It’s now been published in Genome Research. The authors argue that background selection is confounded with regions of positive selection, in such a manner that the latter is obscured. I blogged it when it came out, if you want to dig deeper. At this point all the controversies about selectionism vs. neutralism really are irrelevant, as there’s enough data to go around that you can actually concretely test hypotheses. In regards to humans my own position is now leaning more toward greater, rather than less, selection, despite the small effective size of our species. That’s because I suspect that we’ve only scratched the surface when it comes to ‘soft selection’ on standing variation….

• Category: Race/Ethnicity, Science • Tags: Genomics, Selection 

Layers and layers….

There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, where you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates in regards to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*

But ultimately these 10,000-foot debates are more a matter of philosophy than science. At least until the scientific questions are stripped of their controversy and an equilibrium consensus emerges. That will only occur through an accumulation of publications whose results are robust to time, and subtle enough to convince dissenters. This is why Enard et al.’s preprint, Genome wide signals of pervasive positive selection in human evolution, attracted my notice. With the emergence of genomics it has been humans first in line to be analyzed, as the best data is often found from this species, so no surprise there. Rather, what is so notable about this paper in light of the past 10 years of back and forth exploration of this topic?**

By taking a deeper and more subtle look at patterns of the variation in the human genome this group has inferred that adaptation through classic positive selection has been a pervasive feature of the human genome over the past ~100,000 years. This is not a trivial inference, because there has been a great deal of controversy as to the population genetic statistics which have been used to infer selection over the past 10 years with the arrival of genome-wide data sets (in particular, a tendency toward false positives). In fact, one group has posited that a more prominent selective force within the genome has been “background selection,” which refers to constraint upon genetic variation due to purification of numerous deleterious mutations and neighboring linked sites.

The sum totality of Enard et al. may seem abstruse, and even opaque, in terms of the method. But each element is actually rather simple and clear. The major gist is that many tests for selection within the genome focus on the differences between nonsynonymous and synonymous mutational variants. The former refer to base positions in the genome which result in a change in the amino acid state, while the latter are those (see the third positions) where different bases may still produce the same amino acid. The ratio between substitutions, replacements across lineages for particular base states, at these positions is a rough measure of adaptation driven by selection on the molecular level. Changes at synonymous positions are far less constrained by negative selection, while positive selection due to an increased fitness via new phenotypes is presumed to have occurred only via nonsynonymous changes. What Enard et al. point out is that the human genome is heterogeneous in the distribution of characteristics, and focusing on these sorts of pairwise differences in classes without accounting for other confounding variables may obscure the dynamics one is attempting to measure. In particular, they argue that evidence of positive selective sweeps is masked by the fact that background selection tends to be stronger in regions where synonymous mutational substitutions are more likely (i.e., they are more functionally constrained, so nonsynonymous variants will be disfavored). This results in elevated neutral diversity around regions of nonsynonymous substitutions vis-a-vis strongly constrained regions with synonymous substitutions. Once they correct for the power of background selection, the authors find evidence for sweeps of novel adaptive variants across the human genome, which had previously been hidden.
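To make the synonymous vs. nonsynonymous distinction concrete, here is a minimal sketch in Python. It is not the method of Enard et al.; it only labels a codon change by whether the encoded amino acid changes, using the standard genetic code, and tallies the two classes. The function names and the three example substitutions are my own inventions for illustration. The raw count ratio is not a properly normalized dN/dS, which would also need the number of synonymous and nonsynonymous sites.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in NCBI order: codons run TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(ancestral, derived):
    """Label a codon change by whether the encoded amino acid changes."""
    same = CODON_TABLE[ancestral.upper()] == CODON_TABLE[derived.upper()]
    return "synonymous" if same else "nonsynonymous"


def count_classes(changes):
    """Tally (nonsynonymous, synonymous) counts over (ancestral, derived) codon pairs."""
    labels = [classify_substitution(a, d) for a, d in changes]
    return labels.count("nonsynonymous"), labels.count("synonymous")


# Hypothetical substitutions, invented purely for illustration.
example = [("GAA", "GAG"), ("GAG", "GTG"), ("CTT", "CTC")]
n_nonsyn, n_syn = count_classes(example)
print("nonsynonymous:", n_nonsyn, "synonymous:", n_syn)
```

Note that both synonymous examples differ only at the third codon position, which is where most of the redundancy in the code sits.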

There are two interesting empirical findings from the 1000 Genomes data set. First, the authors find that positive selection tends to operate upon regulatory elements rather than coding sequence changes. You are probably aware that this is a major area of debate currently within the field of molecular evolutionary biology. Second, there seems to be less evidence for positive selection in Sub-Saharan Africans, or, less background selection in this population. My own hunch is that it is the former, that the demographic pulse across Eurasia, and to the New World and Australasia, naturally resulted in local adaptations as environmental conditions shifted. Though it may be that the African pathogenic environment is particularly well adapted to hominin immune systems, and so imposes a stronger cost upon novel mutations than is the case for non-Africans. So I do not dismiss the second idea out of hand.

Where this debate about the power of selection will end is anyone’s guess. Nor do I care. Rather, what’s important is getting a finer-grained map of the dynamics at work so that we may perceive reality with greater clarity. One must be cautious about extrapolating from humans (e.g., the authors point out that Drosophila genomes are richer in coding sequence proportionally). But the human results which emerge because of the coming swell of genomic data will be a useful outline for the possibilities in other organisms.

Citation: Genome wide signals of pervasive positive selection in human evolution

* The cartoon qualification is due to the fact that I am aware that selection is stochastic as well.

** Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS biology 4.3 (2006): e72., Sabeti, Pardis C., et al. “Detecting recent positive selection in the human genome from haplotype structure.” Nature 419.6909 (2002): 832-837., Wang, Eric T., et al. “Global landscape of recent inferred Darwinian selection for Homo sapiens.” Proceedings of the National Academy of Sciences of the United States of America 103.1 (2006): 135-140., Williamson, Scott H., et al. “Localizing recent adaptive evolution in the human genome.” PLoS genetics 3.6 (2007): e90., Hawks, John, et al. “Recent acceleration of human adaptive evolution.” Proceedings of the National Academy of Sciences 104.52 (2007): 20753-20758., Pickrell, Joseph K., et al. “Signals of recent positive selection in a worldwide sample of human populations.” Genome research 19.5 (2009): 826-837., Hernandez, Ryan D., et al. “Classic selective sweeps were rare in recent human evolution.” Science 331.6019 (2011): 920-924.

(Republished from Discover/GNXP by permission of author or representative)

The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.

As you may know, when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:

Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.

The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you cannot ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of the evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
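To put a number on how likely a new favored mutation is to be lost, here is a minimal sketch using Kimura’s classic diffusion approximation for the fixation probability. This is a textbook result, not something taken from the paper. It assumes additive (genic) selection with heterozygote advantage s, an effective population size equal to the census size N, and s ≥ 0. The function name and the parameter values are hypothetical.

```python
import math


def fixation_probability(n, s, p0=None):
    """Kimura's approximation: (1 - exp(-4*N*s*p0)) / (1 - exp(-4*N*s)).

    n  : diploid population size (assumed equal to the effective size)
    s  : selective advantage of the heterozygote (additive model), s >= 0
    p0 : starting frequency; defaults to a single new copy, 1/(2N)
    """
    if p0 is None:
        p0 = 1.0 / (2 * n)
    if s == 0:
        return p0  # neutral case: fixation probability equals starting frequency
    return math.expm1(-4 * n * s * p0) / math.expm1(-4 * n * s)


# Illustrative values only: N = 10,000 with modest (1%) and strong (10%) selection.
for s in (0.0, 0.01, 0.10):
    print(s, fixation_probability(10_000, s))
```

For a single new copy in a large population this comes out close to 2s, so even a mutation with a 1% advantage is lost by chance roughly 98% of the time.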

Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):

- The y-axis represents the frequency of the selected allele(s) at the initial start of the selection phase

- The x-axis represents a population scaled selection coefficient: α = 4 Ns. Recall that N is the population size, and s is the standard selection coefficient, which measures the relative fitness difference between an individual/gene against the population median. A selection coefficient of 0.10 (10% increased fitness) is strong. One of 0.01 (1%) is modest.

What the results above, derived from simulations using particular parameters relevant to population genetic models and the output statistics (e.g., iHS, EHH, Tajima’s D), show you is that it is easier to differentiate forms of selection when:

- For standing variation the selected variants are present at a higher initial frequency when selection initiates. This is not relevant for de novo mutation, where the frequency is very low by definition. Remember that the latter case is actually a subset of the former. If the standing variation model has a parameter which varies in frequency, as the proportion converges upon 1/(2 N) you just get the de novo scenario.

- The stronger the selection event, the greater the power to detect and correctly assign selection for standing variation. This is rather straightforward at first blush. The main exception seems to be in panel e, where increased strength of selection decreases the ability to differentiate the models when the adaptive phase begins at a low initial allele frequency. I assume here you have a situation where it is difficult to distinguish the two models, as de novo and standing variation are converging. Note that it is easier to assign a hard sweep from a de novo mutation when the final frequency (or the frequency you are attempting to detect) is lower. Why? Probably because as the mutation fixes you are removing much of the variant genomic information you need to infer the trajectory of the selected variant (this is true for iHS).

All this may seem abstract. But what you need to do to make some sense of this is to visualize the trajectory of the evolutionary dynamics in temporal and concrete terms. For example, a de novo mutation which drives adaptation will rapidly expand in the population over time. Because of this phenomenon there will be a hitchhiking event where the flanking regions of the favored allele also rise in frequency. This generates an extended region of homogeneity in the genome, in direct proportion to the frequency of the haplotype. This block of homogeneity eventually decays as genetic recombination breaks apart the physical association of the markers which were found together on the original mutant by chance. This is why the power to detect these events declines over time; the perturbation decays, and the genome reverts to equilibrium. In contrast selection on standing variation is more complex, and therefore more difficult to detect, as it does not produce a clear and distinct signal as often. You may have numerous alleles dispersed across wide regions of the genome amenable to being driven up in frequency by adaptive pressures. This generates a mass action shift in variants, but does not entail the production of wide and distinctive homogeneous blocks across the genome. Rather, you have a larger number of alleles subject to less intensive individual selection. Though some of the same consequences are entailed as in the de novo mutation case, the magnitude will be sharply attenuated in any given region of the genome.
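The decay of a hitchhiking block can be sketched with the standard textbook expectation that the association between two loci shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between them. This assumes random mating and ignores ongoing selection and drift, so treat it as a rough intuition pump rather than anything from the paper. The function names and the 1% recombination fraction are hypothetical.

```python
import math


def association_remaining(r, generations):
    """Fraction of the initial association left after t generations: (1 - r)**t."""
    return (1.0 - r) ** generations


def half_life(r):
    """Generations until half of the initial association has decayed."""
    return math.log(0.5) / math.log(1.0 - r)


# Illustrative value only: a marker at recombination fraction 0.01 from the selected site.
r = 0.01
for t in (10, 100, 1000):
    print(t, association_remaining(r, t))
print("half-life in generations:", half_life(r))
```

Markers close to the selected site (small r) stay associated with it far longer than distant ones, which is why the homogeneous block narrows around the causal variant as the sweep ages.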

Though the conceptual & methodological issues here are of interest in and of themselves (e.g., can you trust the Approximate Bayesian Computation framework to generate simulations which give useful results?), there are also some analyses of real human genes. These are not revolutionary, they’re loci which have been analyzed before. But methods need to be judged against reality at some point, and this is an attempt. The table below shows their results.

Some of these genes should be familiar to you. If not, see the function column. I do want to mention that EDAR has been implicated in hair thickness in East Asians. The most amusing aspect of this gene is that it can turn mice into Asians, at least in their hair form. Obviously they focused on single populations. They note in the methods that more populations would introduce demographic complexities into their simulations, and it seems likely that they were already pushing the realistic boundaries of computations which you might want to run routinely in a laboratory. But, this simplification might explain some ambiguity with ADH1B, which has been found in West Asia as well (forgoing the straightforward model in all likelihood of one single sweep in East Asia). An important issue then may be the population sensitivity of these methods. One could imagine that selection at a gene is easy to discern in population A, but not population B. One population may shift to a different phenotype through standing variation, while another was subject to a hard sweep from de novo mutation. The devil here is in the details. There may not be one narrative to rule them all.

The most important result from this paper was its exploration of the reasonable parameter space over which one can make robust inferences about the specific variety of selection which is operative (or lack thereof). In the near future computational power and a surfeit of empirical data sets will make it so that there will be great temptation to generate reams of results in a blind fashion utilizing off the shelf techniques. But techniques without subtlety and human judgment can lead to confusion and falsity. It is useful to know the scenarios where one would expect large numbers of false positives or low statistical power, a priori. That way you may save yourself a great deal of time after the fact.

As for soft vs. hard sweeps, this isn’t simply a question of interest and relevance to population geneticists and genomicists. The nature of adaptation is a question of deep importance across evolutionary biology. The balance between these two phenomena is important in characterizing the mode and tempo of evolution. It may be that in fact the ratio varies as a function of the tree of life, so that evolution may operate with slightly different rules contingent upon taxon.

Citation: Peter BM, Huerta-Sanchez E, Nielsen R (2012) Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. PLoS Genet 8(10): e1003011. doi:10.1371/journal.pgen.1003011

(Republished from Discover/GNXP by permission of author or representative)

Long time readers will be familiar with the large literature in behavior genetics/genomics and dopamine receptor genes. So with that, I point you to a paper exploring the patterns of variation and their relationship to possible natural selection, No Evidence for Strong Recent Positive Selection Favoring the 7 Repeat Allele of VNTR in the DRD4 Gene:

The human dopamine receptor D4 (DRD4) gene contains a 48-bp variable number of tandem repeat (VNTR) in exon 3, encoding the third intracellular loop of this dopamine receptor. The DRD4 7R allele, which seems to have a single origin, is commonly observed in various human populations and the nucleotide diversity of the DRD4 7R haplotype at the DRD4 locus is reduced compared to the most common DRD4 4R haplotype. Based on these observations, previous studies have hypothesized that positive selection has acted on the DRD4 7R allele. However, the degrees of linkage disequilibrium (LD) of the DRD4 7R allele with single nucleotide polymorphisms (SNPs) outside the DRD4 locus have not been evaluated. In this study, to re-examine the possibility of recent positive selection favoring the DRD4 7R allele, we genotyped HapMap subjects for DRD4 VNTR, and conducted several neutrality tests including long range haplotype test and iHS test based on the extended haplotype homozygosity. Our results indicated that LD of the DRD4 7R allele was not extended compared to SNP alleles with the similar frequency. Thus, we conclude that the DRD4 7R allele has not been subjected to strong recent positive selection.

In that vein, I also stumbled upon this paper recently, Contrasting signals of positive selection in genes involved in human skin-color variation from tests based on SNP scans and resequencing:

Applying all commonly used neutrality-test statistics for allele frequency distribution to the newly generated sequence data provided conflicting results regarding evidence for positive selection. Previous haplotype-based findings could not be clearly confirmed. Although some tests were marginally significant for some populations and genes, none of them were significant after multiple-testing correction. Combined P values for each gene-population pair did not improve these results. Application of Approximate Bayesian Computation Markov chain Monte Carlo based to these sequence data using a simple forward simulator revealed broad posterior distributions of the selective parameters for all four genes, providing no support for positive selection. However, when we applied this approach to published sequence data on SLC45A2, another human pigmentation candidate gene, we could readily confirm evidence for positive selection, as previously detected with sequence-based and some haplotype-based tests.

Please note that they didn’t check for selection at SLC24A5. This probably would have yielded some evidence of selection.

Both papers are open access, so I invite readers to take a look for themselves.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Genetics, Genomics, Selection 

The Pith: What makes rice nice in one varietal may not make it nice in another. Genetically that is….

Rice is edible and has high yields thanks to evolution. Specifically, the artificial selection processes which led to domestication. The “genetically modified organisms” of yore! The details of this process have long been of interest to agricultural scientists because of possible implications for the production of the major crop which feeds the world. And just as many of Charles Darwin’s original insights derived from his detailed knowledge of breeding of domesticates in Victorian England, so evolutionary biologists can learn something about the general process through the repeated instantiations which occurred during domestication during the Neolithic era.

A new paper in PLoS ONE puts the spotlight on the domestication of rice, and specifically the connection between particular traits which are the hallmark of domestication and regions of the genome on chromosome 3. These are obviously two different domains, the study and analysis of the variety of traits across rice strains, and the patterns in the genome of an organism. But they are nicely spanned by classical genetic techniques such as linkage mapping, which can adduce regions of the genome of possible interest in controlling variation in the phenotype. In this paper the authors used the guidelines of the older techniques to fix upon regions which might warrant further investigation, and then applied the new genomic techniques. Today we can gain a more detailed sequence level picture of the genetic substrate which was only perceived at a remove in the past through abstractions such as the ‘genetic map.’ Levels and Patterns of Nucleotide Variation in Domestication QTL Regions on Rice Chromosome 3 Suggest Lineage-Specific Selection:

Oryza sativa or Asian cultivated rice is one of the major cereal grass species domesticated for human food use during the Neolithic. Domestication of this species from the wild grass Oryza rufipogon was accompanied by changes in several traits, including seed shattering, percent seed set, tillering, grain weight, and flowering time. Quantitative trait locus (QTL) mapping has identified three genomic regions in chromosome 3 that appear to be associated with these traits. We would like to study whether these regions show signatures of selection and whether the same genetic basis underlies the domestication of different rice varieties. Fragments of 88 genes spanning these three genomic regions were sequenced from multiple accessions of two major varietal groups in O. sativa, indica and tropical japonica, as well as the ancestral wild rice species O. rufipogon. In tropical japonica, the levels of nucleotide variation in these three QTL regions are significantly lower compared to genome-wide levels, and coalescent simulations based on a complex demographic model of rice domestication indicate that these patterns are consistent with selection. In contrast, there is no significant reduction in nucleotide diversity in the homologous regions in indica rice. These results suggest that there are differences in the genetic and selective basis for domestication between these two Asian rice varietal groups.

Here’s what seems relevant for the two domestic varieties from Wikipedia:

Oryza sativa contains two major subspecies: the sticky, short grained japonica or sinica variety, and the non-sticky, long-grained indica variety. Japonica are usually cultivated in dry fields, in temperate East Asia, upland areas of Southeast Asia and high elevations in South Asia, while indica are mainly lowland rices, grown mostly submerged, throughout tropical Asia….

There’s long been debate about the exact phylogenetic relationship between these two strains of domestic rice. More on that later. In regards to domestication there are three categories we need to focus on in terms of adaptation: 1) traits which are common to all domestic cereals and tend to crop up almost immediately, 2) traits which are extensions and improvements upon the initial domestic prototype, 3) traits which are regional diversifications, often adaptations to climate. Consider an analogy to horses. The original domestic horse was rather small, and was only fit for drawing chariots. Eventually the breeds became larger, and suitable for cavalry. Finally, there was a diversification by task (e.g., workhorses vs. race horses) and to some extent climate.

As noted above previous classical genetic techniques had narrowed down the genetic regions responsible for various domesticate traits when comparing japonica to the wild rufipogon. Since domestication usually entails a process of selection the authors naturally presumed that they might be able to detect signatures of selection within the genome. What are the genomic tells of selection?

There are many, just as there are different types of selection. In this case what we know suggests that due to #1 there’s going to be an initial bout of adaptation and a rapid shift from wild diversity to fixed traits suitable for a crop which is going to be controlled by humans. Just as the riotous diversity of the wild varieties becomes constrained to monocultures, so the diversity of the wild type often gets swept away by a few genetic variants which are responsible for the favored traits. So what they might see in the domestic varieties is a sharp reduction of variation around the quantitative trait loci (QTLs) reported earlier, because those QTLs have presumably been the target of selection. In other words, a selective sweep.

That’s what they found. At least in one lineage.

Left to right you have indica, japonica, and rufipogon. Front to back in each chart you see the three QTLs, and the distribution of nucleotide diversities by genetic fragments within these QTLs. The extremely skewed distribution of the domestic varieties in relation to the wild type rufipogon is rather obvious. Additionally, you see a stronger skew in japonica in relation to indica. The skew in the domestic strains is toward a greater proportion of the fragments having very low nucleotide diversity.

What could cause this? You need a further piece of information here. The domestic varieties have long regions of the genome characterized by linkage disequilibrium (actually, japonica is so homogeneous that you barely have enough variation to calculate LD!). So particular genetic variants are associated with each other, resulting in long runs of similar sequences, haplotypes. It’s as if a chunk of some ancient chromosome just “blew up” and took over that segment of the genome in japonica.

Natural selection could do this. Imagine that an ancestral rufipogon has a genetic variant which confers a domestic trait. It would be selected. Even if crossed with other strains with other domestic characteristics, its particular QTL would generally be transmitted down to the descendants. But not only would the specific genetic variant which conferred the favored trait be passed on; many of the flanking genomic regions carrying other variants would be transmitted too! This explains the extremely low genetic diversity in japonica: if a particular ancestral haplotype sweeps up in frequency, then what were polymorphisms in the wild type become monomorphic in the domesticate.
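To make the hitchhiking logic concrete, here is a minimal Python sketch of nucleotide diversity (the average number of pairwise differences per site) before and after a sweep. The sequences are invented for illustration and are not data from the paper; the function name is my own.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Average pairwise differences per site (pi) across aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)

# Hypothetical wild sample: four distinct haplotypes around a QTL.
wild = ["ACGTACGTAC", "ACGAACGTTC", "TCGTACCTAC", "ACGTGCGTAA"]

# Hypothetical post-sweep sample: the second wild haplotype has nearly
# taken over, dragging its flanking variants along with it.
swept = ["ACGAACGTTC", "ACGAACGTTC", "ACGAACGTTC", "ACGAACGTTA"]

pi_wild = nucleotide_diversity(wild)
pi_swept = nucleotide_diversity(swept)
print(pi_wild, pi_swept)
```

In this toy case the swept sample retains only a small fraction of the diversity of the wild sample, which is the qualitative pattern reported for japonica in these QTL regions.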

Another explanation though could be that demographic history produced these results. Random genetic drift due to small populations, whether via bottleneck or systematic inbreeding/selfing, can also drive up the frequency of alleles favored by lady-luck and render extinct all others. To check for this the authors constructed a model where japonica and indica went through bottlenecks enforced by the domestication (note that strong selection can drive down population size as well). Even with this model the diversity in japonica in these QTLs remained far too low (though indica’s skew did not reach statistical significance).
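The drift alternative can be sketched with the textbook expectation that heterozygosity decays by a factor of 1 - 1/(2N) per generation in an idealized, randomly mating diploid population of size N. This is only an illustration of why a bottleneck alone lowers diversity. It is not the authors' coalescent model, the population sizes are made up, and rice is largely selfing, so the random-mating assumption is a simplification.

```python
def expected_heterozygosity(h0, pop_sizes):
    """Expected heterozygosity after drift through the given sequence of
    per-generation diploid population sizes:
    H_t = H_0 * product over generations of (1 - 1/(2N))."""
    h = h0
    for n in pop_sizes:
        h *= 1.0 - 1.0 / (2.0 * n)
    return h

# Hypothetical demographies (numbers chosen for illustration only).
constant = [1000] * 100
bottleneck = [1000] * 40 + [10] * 20 + [1000] * 40

h_constant = expected_heterozygosity(0.3, constant)
h_bottleneck = expected_heterozygosity(0.3, bottleneck)
print(h_constant, h_bottleneck)
```

The point of a demographic null model is that the bottleneck history already predicts some loss of diversity everywhere in the genome; selection is inferred only where the observed loss is larger than that baseline.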

Since both of the domestic strains exhibit traits of domestication the lack of a selective event in indica at these QTLs does not allow us to infer that there are no genes which were selected for these traits in the past in indica. On the contrary, there certainly were and are such genes. But where are they? The authors moot the possibility that selection exists at the loci under consideration, but was simply missed because the selection was by a different dynamic which might not be picked up by their test. For various reasons they are skeptical of this on its own merits, but I think the bigger issue is that the original linkage mapping was performed with japonica vs. wild type strains, so naturally if the two domestic subspecies differed in their genetic architectures then the QTLs of interest of indica would not be discovered simultaneously.

Something which I’m rather perplexed by is how this comports or aligns with the finding by many of the same researchers that the two domestic varietals derive from the same ancestral population which was domesticated from East Asian wild rice. It could be that the history of domestication is more serial than we know, and that the QTLs common to both japonica and indica have been rendered irrelevant by new adaptations subsequent to their separation. Or, one or the other may have experienced introgression at that locus and so diverged after domestication. Interestingly, in figure 7 of the paper they show phylogenetic trees which illustrate the relationships of the alleles associated with each strain. These indicate that indica is not monophyletic in these regions, while japonica is. This means that the japonica variants share a common ancestor, from which all are descended. In contrast, indica variants do not. Such a pattern is consistent with the story of strong positive selection upon a single variant at some time in the past for japonica. From what I can tell they may actually have sent the PLoS ONE paper to the reviewers before the PNAS paper which I reviewed earlier. Because these two papers were published so close to each other they don’t cite each other, though in some ways the first paper in PNAS would have fleshed out the natural history of domestic rice somewhat. As it is, they kind of leave us hanging in relation to indica.

Why does all of this matter? Yes, agricultural genetics is important for agriculture. But let’s get back to people. There is a hypothesis that man is a ‘self-domesticated’ organism. Whatever quibbles I have with artificial terms like domestication I do think that there may be broad analogies to be drawn between our own species and the organisms associated with us.

Citation: Xianfa Xie, Jeanmaire Molina, Ryan Hernandez, Andy Reynolds, Adam R. Boyko, Carlos D. Bustamante, & Michael D. Purugganan (2011). Levels and Patterns of Nucleotide Variation in Domestication QTL Regions on Rice Chromosome 3 Suggest Lineage-Specific Selection. PLoS ONE: 10.1371/journal.pone.0020670

Image Credit: IRRI Images

(Republished from Discover/GNXP by permission of author or representative)
Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how it might reshape their evolutionary trajectory. These sorts of issues are well known in their generality. They have implications from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I’m talking latitudes here. The further you get from the equator the more the climate varies over the seasons, the lower the mean temperature, and the less the aggregate radiation the biosphere catches. Allen’s rule and Bergmann’s rule are two observational trends which biologists have long observed in relation to many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, there tends to be an increase in mean mass as one moves away from the equator.

But these rules are just general observations. What process underlies these observations? The likely culprit would be natural selection of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated in further detail. A new paper in PLoS Genetics attempts to do this more rigorously and deeply than has been done before for one particular worldwide mammalian species, H. sapiens sapiens. We have spanned the latitudes and longitudes, and so we’re a perfect test case for an exploration of the broader microevolutionary forces which shape variation.

The paper is Adaptations to Climate-Mediated Selective Pressures in Humans. Its technical guts can be intimidating, but its initial questions and final answers are less daunting. So let’s jump straight to the last paragraph of the discussion:

The results of this genome scan not only increase our understanding of the genetic landscape of adaptation across the human genome, but they may also have a more practical value. For example, they can be used to select candidate genes for common disease risk and to generate specific testable hypotheses regarding the functions of specific genes and variants. While the results of genome-wide scans for association with diseases and other traits are accumulating at a rapid pace, interpretation of these results is often ambiguous because the power to detect all common variants that are important in the etiology of the phenotype is incomplete. This is especially true in the case of complex traits, where variants at many loci may contribute to the phenotype, each with a small effect. By combining the evidence from GWAS with evidence of selection, it may be possible to separate true causative regions from the background noise inherent in genome-wide screens for association. To facilitate this, all of our empirical rank statistics are publically available. Moreover, results of selection scans that detect evidence for spatially-varying selection may be especially relevant to diseases that show substantial differences in prevalence across ethnic groups (e.g., sodium-sensitive hypertension, type 2 diabetes, prostate cancer, osteoporosis). In the future, this approach could be extended by including additional populations and aspects of the environment to gain a more complete understanding of how natural selection has shaped variation across the genome in worldwide populations. Furthermore, whereas we relied on linkage disequilibrium between (potentially un-genotyped) adaptive variants and genotyped SNPs, whole genome re-sequencing data should give a more complete picture of the variation that underlies adaptation.

How’d they infer this? First, they had a pretty wide coverage of populations from across the world. They pooled the HGDP and HapMap, as well as a few other populations of interest: Ethiopians, some Siberian groups, and Australian Aboriginals. I do wish that the Aboriginal data set was public, but it doesn’t seem to be! The Ethiopians are I assume the ones you can find in Behar et al. The authors had a null model which was predicated on the fact that variation in the frequencies of given genetic morphs, single nucleotide polymorphisms, should be best predicted by population history and relationships. That is, two populations will differ on a given locus in proportion to their genetic divergence, due to random forces such as genetic drift. Perturbations from this null model are possible targets of natural selection, which reshapes regions of the genome in a deterministic manner aiming at particular ends. Two 21st century classic examples of this phenomenon seem to be skin pigmentation and lactase persistence. Different populations with the same phenotype, in particular, light skin and the ability to digest lactose sugar as an adult, exhibit divergent genetic architectures.

They naturally looked to see how these deviations tracked environmental parameters you see above. Keep in mind that they did take into account correlations between these variables. Additionally, correlation does not equal causation, so there could be other variables which are correlated with the ones which they explored which might be responsible for the systematic perturbations.

Their method yielded a Bayes factor (BF) which measures the deviation from the null model for a given SNP. To judge off the bat whether these SNPs are plausibly the targets of adaptation you want to check to see if they’re enriched for certain classes of SNPs. They found that the SNPs which rejected the null model, where population history and demographics predict genetic variation, tended to be much more likely to be genic or nonsynonymous. Genic means that the base pair is embedded in a gene, as opposed to much of the genome which isn’t translated into proteins. A nonsynonymous SNP is one where the base change alters the amino acid sequence of the protein coded. Normally these sorts of changes are selected against because you don’t want to change the protein function, but when a population is adapting to a new environment this is obviously not so.
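The enrichment check can be sketched in a few lines of Python. This is a rough illustration of the idea (compare the share of a SNP class in the high-BF tail to its share overall), not the authors' actual statistic; the function name, the tail cutoff, and the toy SNP records are all assumptions of mine.

```python
def tail_enrichment(snps, category, tail_fraction=0.05):
    """Ratio of a category's share among the top-BF SNPs to its share overall.
    A value above 1 means the category is over-represented in the tail."""
    ranked = sorted(snps, key=lambda s: s["bf"], reverse=True)
    n_tail = max(1, int(len(ranked) * tail_fraction))
    tail = ranked[:n_tail]
    frac_tail = sum(s["category"] == category for s in tail) / n_tail
    frac_all = sum(s["category"] == category for s in ranked) / len(ranked)
    if frac_all == 0:
        raise ValueError("category is absent from the SNP set")
    return frac_tail / frac_all

# Toy data: ten SNPs, Bayes factors 10 down to 1 (illustrative only).
categories = ["nonsynonymous", "nonsynonymous", "nongenic", "nongenic",
              "nongenic", "nonsynonymous", "nongenic", "nongenic",
              "nongenic", "nongenic"]
snps = [{"bf": 10 - i, "category": c} for i, c in enumerate(categories)]

ratio = tail_enrichment(snps, "nonsynonymous", tail_fraction=0.2)
print(ratio)
```

Here the top fifth of SNPs by BF is entirely nonsynonymous while only three in ten SNPs overall are, so the ratio comes out well above 1. In practice one would also want a significance test, which this sketch leaves out.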

There are a host of results in the paper, but one pattern which seemed of interest was that different sets of SNPs can be selected in different population pools. Below are two panels which show the SNPs with significant BF, and how they vary as a function of the climatic variable depending upon the populations which are sampled. To the left you see the cluster which varied in western Eurasia, while to the right you see those which varied in eastern Eurasia. In a broad sense the target of selection was the same, but the specific SNPs which were pulled out of the set of potential targets still exhibit stochasticity:

Natural selection is deterministic in the broadest scale, but in its instantiations it can exhibit a great deal of randomness. Same phenotype. Different genotype. Similarly, the heat death of the universe may be determined, but there’s a lot of contingency of epiphenomenal detail between now and then. Modulating the range of populations analyzed often shifted the value of the statistic for a given SNP. Remember, averaging over the aggregate can remove important local information. That being said, the Venn Diagram below shows that there was a disproportionate tendency for the signals detected to be worldwide. This indicates that the wheel isn’t reinvented as much as we might think. I wonder if it points to the limitations baked into the human genome in terms of the plasticity and flexibility of all its various pathways. There’s a structural engineer vetoing the elegant fancies of the architect?

The leftmost panel highlights the West Eurasian signals and the middle panel the East Eurasians.

As noted above these sorts of studies have both evolutionary and biomedical relevance. Perhaps the most intriguing result, albeit expected from other areas of research, is the role of antagonistic pleiotropy in many diseases. Concretely, it may be that a change in a particular location may increase reproductive fitness in a novel environment at the cost of later morbidity in life. The authors suggest that pathogenic resistance and inflammatory response may have the side effect of increasing susceptibility to a range of diseases of old age. Why is this important? I think that the authors are implying in part that a plausible evolutionary mechanism of adaptation should change our prior expectation that a given genome wide association is a false positive. At least I think that. If a SNP was the target of natural selection and shows up on GWAS, keep an eye on it! All the better if you have a good functional understanding of what’s going on there.

But more long-term, it might change our perception of the basal risk for classes of morbidity as they vary by population. Human populations have had different evolutionary histories. Their disease risks then might vary a great deal. Between population differences may be a lot less paradoxical than we think….

Citation: Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, & Gebremedhin A (2011). Adaptations to Climate-Mediated Selective Pressures in Humans. PLoS Genetics: 10.1371/journal.pgen.1001375

(Republished from Discover/GNXP by permission of author or representative)

Last month in Nature Reviews Genetics there was a paper, Measuring selection in contemporary human populations, which reviewed data from various surveys in an attempt to adduce the current trajectory of human evolution. The review didn’t find anything revolutionary, but it was interesting to see where we’re at. If you read this weblog you probably accept a priori that it’s highly unlikely that evolution “has stopped” because infant mortality has declined sharply across developed, and developing, nations. Evolution understood as change in gene frequencies will continue because there will be sample variance in the proportions of given alleles from generation to generation. But more interestingly adaptive evolution driven by change in mean values of heritable phenotypes through natural selection will also continue, assuming:

1) There is variance in reproductive fitness

2) That that variance is correlated with a phenotype

3) That those phenotypes are at all heritable. In other words, phenotypic variation tracks genotypic variation
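These three conditions map onto the standard quantitative genetic shorthand, written here in my own notation rather than the review's. With w as relative fitness, z as the trait value, h² as the narrow-sense heritability, S as the selection differential, and R as the response to selection per generation:

```latex
S = \operatorname{Cov}(w, z), \qquad R = h^{2} S
```

Condition 1 says w varies at all, condition 2 says the covariance between w and z is not zero, and condition 3 says h² is greater than zero. If any one of the three fails then R is zero, and the trait mean does not shift through selection.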

Obviously there is variance in reproductive fitness. Additionally, most people have the intuition that particular traits are correlated with fecundity, whether it be social-cultural identities, or personality characteristics. The main issue is probably #3. It is a robust finding for example that in developed societies the religious tend to have more children than the irreligious. If there is an innate predisposition to religiosity, and there is some research which suggests modest heritability, then all things being equal the population would presumably be shifting toward greater innate predisposition toward religion as time passes. I do believe religiosity is heritable to some extent. More precisely I think there are particular psychological traits which make supernatural claims more plausible for some than others, and, those traits themselves are partially determined by biology. But obviously even if we think that religious inclination is partially heritable in a biological sense, it is also heritable in the familial sense of values passed from one generation to the next, and in a broader cultural context of norms imposed from on high. In other words, when it comes to these sorts of phenotypic analyses we shouldn’t get too carried away with clean genetic logics. In Shall the Religious Inherit the Earth? Eric Kaufmann notes that it is in the most secular nations that the fertility gap between the religious and irreligious is greatest, and therefore selection for religiosity would be strongest in nations such as Sweden, not Saudi Arabia. But as a practical matter biologically driven shifts in trait value in this case pale in comparison to the effect of strong cultural norms for religiosity.

Below are two of the topline tables which show the traits which are currently subject to natural selection. A + sign indicates that there is natural selection for higher values of the trait, and a – sign the inverse. An s indicates stabilizing selection, which tells you that median values have higher fitnesses than the extremes. The number of stars is proportional to statistical significance.



Some of this is not surprising. The age of the onset of menarche has been dropping in much of the world. I suspect this is mostly due to better nutrition, but a consequence of this shift is earlier fertility for some females. The authors are nervous about the robust correlation of higher fertility with lower intelligence, but notice that the pattern for wealth and income is different and more complicated. The key is to look at education. Whether you believe intelligence exists or not in any substantive concrete sense, those who are more intelligent are more likely to have had more education, and there’s a rather common sense reason why investing in more schooling would reduce your fertility: you simply forgo some of your peak reproductive years, especially if you’re female. The higher you go up the educational ladder the stronger the anti-natalist cultural and practical pressures become (the latter is a heavier burden for females because of their biological centrality in child-bearing, but both males and females are subject to the former). As with religion even if the differences have no biological implication because you believe the correlations are spurious or reject the existence of the trait one presumes that parents and subcultures pass on values to offspring. If higher education has anti-natalist correlations we shouldn’t be surprised if subsequent generations turn away from higher education. Their parents were the ones who were more likely to avoid it.

We live in interesting times.

(Republished from Discover/GNXP by permission of author or representative)

One of the major issues which has loomed at the heart of biology since The Origin of Species is why species exist, as well as how species come about. Why isn’t there a perfect replicator which performs all the conversion of energy and matter into biomass on this planet? If there is a God the tree of life almost seems to be a testament to his riotous aesthetic sense, with numerous branches which lead to convergences, and an inordinate fascination with variants on the basic morph of beetles. From the outside the outcomes of evolutionary biology look a patent mess, a sprawling expanse of experiments and misfires.

A similar issue has vexed biologists in relation to sex. Why is it that the vast majority of complex organisms take upon themselves the costs of sex? The existence of a non-offspring bearing form within a species reduces the potential natural increase by a factor of two before the game has even begun. Not only that, but the existence of two sexes who must seek each other out expends crucial energy in a Malthusian world (selfing hermaphrodites obviously don’t have this problem, but for highly complex organisms they aren’t so common). Why bother? (I mean in an ultimate, not proximate, sense)

It seems likely that part of the answer to both these questions on the grand scale is that the perfect is the enemy of long term survival. Sexual reproduction confers upon a lineage a genetic variability which may reduce fitness by shifting populations away from the adaptive peak in the short term, but the fitness landscape itself is a constant bubbling flux, and perfectly engineered asexual lineages may all too often fall off the cliff of what was once their mountain top. The only inevitability seems to be that the times change. Similarly, the natural history of life on earth tells us that all greatness comes to an end, and extinction is the lot of life. The universe is an unpredictable place and the mighty invariably fall, as the branches of life’s tree are always pruned by the gardeners red in tooth and claw. But it is one thing to describe reality in broad verbal brushes. How about a more rigorous empirical and theoretical understanding of how organisms and the genetic material through which they gain immortality play out in the universe? A new paper which uses plant models explores the costs and benefits of admixture between lineages, and how those two dynamics operate in a heterogeneous and homogeneous world. Population admixture, biological invasions and the balance between local adaptation and inbreeding depression:

When previously isolated populations meet and mix, the resulting admixed population can benefit from several genetic advantages, including increased genetic variation, the creation of novel genotypes and the masking of deleterious mutations. These admixture benefits are thought to play an important role in biological invasions. In contrast, populations in their native range often remain differentiated and frequently suffer from inbreeding depression owing to isolation. While the advantages of admixture are evident for introduced populations that experienced recent bottlenecks or that face novel selection pressures, it is less obvious why native range populations do not similarly benefit from admixture. Here we argue that a temporary loss of local adaptation in recent invaders fundamentally alters the fitness consequences of admixture. In native populations, selection against dilution of the locally adapted gene pool inhibits unconstrained admixture and reinforces population isolation, with some level of inbreeding depression as an expected consequence. We show that admixture is selected against despite significant inbreeding depression because the benefits of local adaptation are greater than the cost of inbreeding. In contrast, introduced populations that have not yet established a pattern of local adaptation can freely reap the benefits of admixture. There can be strong selection for admixture because it instantly lifts the inbreeding depression that had built up in isolated parental populations. Recent work in Silene suggests that reduced inbreeding depression associated with post-introduction admixture may contribute to enhanced fitness of invasive populations. We hypothesize that in locally adapted populations, the benefits of local adaptation are balanced against an inbreeding cost that could develop in part owing to the isolating effect of local adaptation itself. The inbreeding cost can be revealed in admixing populations during recent invasions.

First, plants are good models to explore evolutionary genetics. They’re not as constrained as say mammals, or the typical tetrapod, when it comes to barriers to gene flow between distinct taxa. Hybridization is common, and plants can also self-fertilize as well as cross-fertilize, allowing researchers to push the genetic pool in different directions (“selfing” obviously reduces the effective population size and is an extreme form of inbreeding, so it’s a good way to purge genetic variation really quickly). In a perfect abstract world of evolution one might imagine Richard Dawkins’ vehicles and replicators as fluid entities which float along a turbid sea of evolutionary genetic parameters: drift, migration, mutation and selection. But reality is constrained to a DNA substrate, which has its own parameters such as recombination, modulators such as epigenetics, and numerous ways to express variation through gene regulation. It’s complicated, and stripping the issues down to their pith is easier said than done.

But the broader dynamic being examined here is the generalist-specialist trade-off, which I think is relevant to the two issues I introduced earlier in this post. Specialists are optimized for their own position in the adaptive landscape, but have difficulties when it is perturbed. Generalists always have less than maximum fitness in any one landscape, but higher average fitness across landscapes because they can adapt to changes. Specialization is local adaptation of particular lineages, while in the generalist case you can have invasive species in novel environments. They’re obviously facing an adaptive landscape which is at some remove from what any of the introduced genotypes were “optimized” for, so hybridization produces something new for something new.

In the first figure of the paper you see F3 wild barley descended from two parental lineages, ME and AQ. The left panels show seed output as a function of heterozygosity, and the right panels as a function of ME genome content. Remember that in subsequent generations the descendants of hybrids will vary quite a bit in genetics and phenotype as the original alleles re-segregate.


The takeaway is that in novel environments genetic variation seems to result in increased fitness. Why? One concept which one has to introduce is heterosis, whereby crosses between homogeneous lineages produce fitter offspring. One reason this may be is that there is overdominance, where heterozygotes have greater fitness than the homozygotes. This is the case with the sickle-cell allele and malaria. Another reason may be that in the original parental lineages there was a higher fraction of alleles which were deleterious in homozygote genotypes. In plain English, inbreeding resulted in genetic drift which cranked up the proportion of alleles implicated in recessively expressed negative phenotypes. The authors argue though that in the native context local adaptation is strong enough to be a barrier against too much gene flow between the parental wild barley lineages, so the deleterious alleles are less likely to be masked. Only in a novel environment, when the benefit of local adaptation was removed from the equation, could the negative consequences of inbreeding come to the fore in the total calculus.
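The masking argument can be put in numbers with a one-locus Python sketch. It uses the standard result that, with inbreeding coefficient f and allele frequency q, the frequency of the homozygote is q² + f·q(1 - q). I assume a fully recessive deleterious allele with selection coefficient s; the parameter values below are invented for illustration and do not come from the barley or campion data.

```python
def homozygote_frequency(q, f):
    """Frequency of the homozygote for an allele at frequency q
    in a population with inbreeding coefficient f."""
    return q * q + f * q * (1.0 - q)

def mean_fitness(q, f, s):
    """Mean fitness at one locus when the allele is fully recessive and
    homozygotes pay a fitness cost s (heterozygotes are unaffected)."""
    return 1.0 - s * homozygote_frequency(q, f)

# Hypothetical numbers: deleterious allele at 10%, homozygote cost of 50%.
q, s = 0.1, 0.5
w_outbred = mean_fitness(q, 0.0, s)  # random mating, allele mostly masked
w_inbred = mean_fitness(q, 1.0, s)   # complete inbreeding, allele exposed
print(w_outbred, w_inbred)
```

With these numbers the fully inbred population loses ten times as much fitness at this locus as the outbred one. Admixture between isolated inbred lineages pushes f back toward zero, which is the instant benefit the authors describe.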

Figure 2 shows the results of experiments which examine the fitness of white campion, a European species which has been introduced in North America. In the left panel are crosses between native European lineages, with distance between parental lineages on the x-axis. In the right panel you have the same experiment, but with North American variants, which are products of introductions from various regions of Europe. The plants were grown in a “common garden,” to show how all the genotypes performed when environment was controlled.


As you can see moderate levels of hybridization entailed a benefit in the European variants, but not the North American variants. Hybridization between variants which were too distant did produce outbreeding depression in the European case, suggesting perhaps that disruption of co-adapted gene complexes resulted in a greater fitness cost than the masking of deleterious alleles due to inbreeding. One can make the inference from these data that the introduced white campion lineages are already hybridized, the barriers to crossing being removed by a disruption of the adaptive landscapes which each native lineage was optimized for.

Here are the authors from the discussion talking about invasions of exotic species:

Provided that multiple introductions from different source populations have occurred, the benefits of admixture become freely available to introduced populations that do not yet show a pattern of local adaptation. Because the benefits are potentially large, admixture may play an important role during early invasions. Native populations often show evidence of inbreeding depression…and one instant reward of admixture in the introduced range is the release of this genetic burden. Such heterosis effects can contribute significantly to the establishment and early success of invasive species…When tested together in a common garden experiment, invaders can show enhanced fitness-related traits compared with populations from their native range…If there is evidence of admixture, the effects of heterosis might be a default explanation for such observations, perhaps providing a null expectation against which other explanations (such as trait evolution) need to be tested.

What have plants to do with life as a whole? I assume much. Plants differ in the details, but compared to other complex multicellular organisms in regards to evolutionary genetics they’re quite liberated. By this, I mean that their modes of reproduction and promiscuity in hybridization make them more of an ideal “frictionless” test case of evolutionary biology and the power of the classical parameters. Perhaps given enough time natural selection would produce the ideal replicator to rule them all, to drive all others to extinction. But that day is not this day. And that day may never come because the universe is far too protean and erratic. Life is varied, on the phenotypic and genotypic level, and the exogenous processes of climate and geology continue to warp and reshape the adaptive landscape. And more subtly, but just as critically, life is always in an endless race with itself, as pathogens co-evolve with their hosts, and predators figure out how to outfox their prey. Life warps its own adaptive landscapes, and the innovation of one branch may lead to extinction of others as well as the proliferation of new branches.

More prosaically and anthropocentrically what does this say about us? Humans are an expansive species, and over the past 500 years different lineages have been hybridizing promiscuously. New genotypes have arisen in altered landscapes, and our pathogens are also riding the high tide of globalization onward and upward. We are ourselves a “natural experiment.”

Image Credit: Olivia Munn by Gage Skidmore

Link hat tip: Dienekes.

Citation: Verhoeven KJ, Macel M, Wolfe LM, & Biere A (2010). Population admixture, biological invasions and the balance between local adaptation and inbreeding depression. Proceedings. Biological sciences / The Royal Society PMID: 20685700

(Republished from Discover/GNXP by permission of author or representative)

How we perceive nature and describe its shape are a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But, it is also the case that there is some modification in relation to the African parent in the offspring, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words our perception of the outcomes of inheritance is contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).

Sometimes though the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But, it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation on the gene have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant gene though is balanced by the fact that mutant homozygotes have decreased fitness.
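To make the balancing act concrete, here is a minimal sketch of the standard one-locus selection recursion under heterozygote advantage. The two selection coefficients are made-up illustrative values, not estimates for sickle-cell in any real population, and the function names are hypothetical.

```python
def next_freq(q, s_normal, s_mutant):
    """One generation of selection under random mating.

    Relative fitnesses (assumed for illustration):
      normal homozygote  = 1 - s_normal  (e.g. vulnerable to malaria)
      heterozygote       = 1
      mutant homozygote  = 1 - s_mutant  (e.g. sickle-cell disease)
    q is the frequency of the mutant allele.
    """
    p = 1.0 - q
    mean_fitness = p * p * (1 - s_normal) + 2 * p * q + q * q * (1 - s_mutant)
    return (p * q + q * q * (1 - s_mutant)) / mean_fitness


def equilibrium(s_normal, s_mutant):
    """Stable mutant allele frequency when the heterozygote is fittest."""
    return s_normal / (s_normal + s_mutant)


s_normal, s_mutant = 0.1, 0.8  # hypothetical selection coefficients
q = 0.001                      # the mutant allele starts out rare
for _ in range(500):
    q = next_freq(q, s_normal, s_mutant)

print(round(q, 4), round(equilibrium(s_normal, s_mutant), 4))
```

With these numbers the allele climbs from rarity and then stalls at roughly 11%, even though mutant homozygotes pay a heavy cost. Selection does not drive the allele to fixation or to loss; it parks it where the two costs balance.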

We can then construct a narrative of the long term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population, mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases in frequency any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. This means that if a change at a locus increases aggregate fitness, it may nevertheless destabilize long established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge no one knows of major negative consequences of carrying the alleles which confer lactase persistence (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has only come to the fore in the past century in all likelihood). But in other cases the outcomes may not be as serious as that of sickle-cell anemia, but may rise above the level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to meet a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happened to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome; in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide, that did not imply that MYH9 was the target of natural selection per se, opening the likely possibility for other causal associations. A convenience in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they ascertained the between population differences they then looked at differences in allele frequencies in cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:


WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, associated alleles across the locus of the APOL1 gene. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomic regions whose information is eventually translated into proteins. In other words, these are non-neutral functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which are found to differ between cases and controls they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, the probability of FSGS or H-ESKD in this case (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis (horizontal from your perspective) displays the negative log of the p-value for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the less likely it is that a signal this strong would arise by chance alone. This produces huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).
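If the negative-log scale is unfamiliar, a few lines of arithmetic show why tiny p-values turn into tall spikes. Only the 10^-35 figure comes from the paper as quoted above; the other two p-values are arbitrary reference points I picked for comparison.

```python
import math


def neg_log10(p_value):
    """Height of a marker on the association plot: -log10 of its p-value."""
    return -math.log10(p_value)


# 0.05 and 1e-8 are arbitrary comparison points; 1e-35 is the order of
# magnitude mentioned for the strongest signals.
for p in (0.05, 1e-8, 1e-35):
    print(p, round(neg_log10(p), 2))
```

Every additional unit on this axis is another factor of ten in the p-value, so a marker at 35 is not "a bit stronger" than one at 8. It is 27 orders of magnitude less compatible with chance.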

You can see that it is in APOL1 that the biggest signals reside. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1. E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, this is showing you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; in other words, it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. The MYH9 signal is conditional on its correlation with APOL1. This was the correlation which showed up in the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.
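The logic of "controlling for G1" can be illustrated without any regression machinery. The sketch below is a toy stratified analysis, not the conditional logistic regression the authors ran. It builds a table of expected proportions in which disease depends on G1 alone, while E-1 merely travels with G1. Only the 0.89 figure comes from the post; every other parameter is an assumption chosen for illustration.

```python
def odds_ratio(table):
    """table maps (marker_present, is_case) -> proportion."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])


# Hypothetical parameters. Only 0.89 is taken from the text.
P_G1 = 0.30                            # share of haplotypes carrying G1 (assumed)
P_E1_GIVEN_G1 = {1: 0.89, 0: 0.30}     # P(E-1 | G1 status); 0.30 is assumed
P_CASE_GIVEN_G1 = {1: 0.50, 0: 0.20}   # disease depends on G1 only (assumed)

# Joint proportions over (G1, E-1, case), with E-1 and disease independent
# of each other once G1 status is known.
joint = {}
for g1 in (0, 1):
    p_g = P_G1 if g1 else 1 - P_G1
    for e1 in (0, 1):
        p_e = P_E1_GIVEN_G1[g1] if e1 else 1 - P_E1_GIVEN_G1[g1]
        for case in (0, 1):
            p_c = P_CASE_GIVEN_G1[g1] if case else 1 - P_CASE_GIVEN_G1[g1]
            joint[(g1, e1, case)] = p_g * p_e * p_c

# Crude association: ignore G1 and look only at E-1 versus disease.
crude = {(e1, case): joint[(0, e1, case)] + joint[(1, e1, case)]
         for e1 in (0, 1) for case in (0, 1)}
crude_or = odds_ratio(crude)

# Stratified association: E-1 versus disease within each G1 class.
stratified_or = {
    g1: odds_ratio({(e1, case): joint[(g1, e1, case)]
                    for e1 in (0, 1) for case in (0, 1)})
    for g1 in (0, 1)
}

print(round(crude_or, 2), {g: round(v, 2) for g, v in stratified_or.items()})
```

Ignoring G1, E-1 looks like a risk marker with an odds ratio around 2. Within each G1 class its odds ratio collapses to 1, which is the toy analogue of the MYH9 signal flattening once G1 and G2 are in the model.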

It is important to remember though that these markers are segregating within a human population where individuals have three potential genotypes. Ancestral homozygote, homozygote for the mutants, and heterozygote. They found that a recessive model of expression of disease is most appropriate in the case of these risk alleles. That is, most of the increased risk is accounted for by the change from one risk allele, the heterozygote state, to two risk alleles, the homozygote state. One risk allele increased odds of renal failure by a factor of 1.26, but two by a factor of 7.3. The odds ratio of two risk alleles compared to a base rate of one risk allele was 5.8. They report that the results for FSGS were broadly similar. This matters because the frequency of the trait/disease in a random mating population is conditional on the homozygotes if it has a recessive expression pattern. G1 was present in 40% of the Yoruba HapMap data set, but in neither of the two Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in none of the Eurasian groups. Assuming Hardy-Weinberg equilibrium the Yoruba should have 16% of the population at sharply elevated risk for FSGS and H-ESKD because they’d be homozygotes for the G1 allele.
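The arithmetic in that paragraph is easy to check. This sketch treats the 40% figure as an allele frequency, as the paragraph itself does, and uses the odds ratios quoted above; the function name is my own.

```python
def hardy_weinberg(q):
    """Expected genotype proportions for a risk allele at frequency q."""
    p = 1 - q
    return {"zero_copies": p * p, "one_copy": 2 * p * q, "two_copies": q * q}


g1_freq = 0.40  # G1 frequency in the Yoruba sample, read as an allele frequency
genotypes = hardy_weinberg(g1_freq)
for label, share in genotypes.items():
    print(label, round(share, 2))

# Odds ratios quoted above, both relative to carrying no risk allele.
or_one_copy, or_two_copies = 1.26, 7.3
two_vs_one = or_two_copies / or_one_copy
print(round(two_vs_one, 1))
```

Two copies come out at 16% of the population and one copy at 48%. Under a recessive model it is the 16% that carries most of the excess risk, while the much larger heterozygote class is barely affected. Dividing the two odds ratios recovers the 5.8 reported for two risk alleles versus one.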

Once they established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency, driven by natural selection in response to new environmental stresses. The region has already broadly been surveyed by linkage disequilibrium based tests, which basically look for regions of long haplotypes, homogenized zones of the genome where many individuals have the variation removed because one gene rose so rapidly in frequency that huge adjacent sections hitchhiked up in frequency. Presumably this may have happened with the MYH9 haplotype correlated with the traits under consideration here; G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.

So next the authors turned to tried & tested techniques and focused on the risk markers which they had discovered earlier in their research, G1 and G2. Specifically, EHH, which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population), iHS, which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6), as well as ΔiHH, which I am less familiar with but is reputedly similar to iHS but uses absolute haplotype length as opposed to relative haplotype length. Figure 2 shows the results of these tests:


The resolution isn’t the best, but G1 and G2 seem to be outliers on all three tests to detect natural selection by using patterns of linkage disequilibrium. The first panel is EHH, the second and third show iHS and ΔiHH respectively, with the position of the markers being outliers among the distribution of values for the genome within the Yoruba. This is not proof of adaptation, but it changes our weights of possibilities. Additionally, they note that Europeans exhibit no such patterns on these markers. Visually the position of the markers in the latter two panels would be closer to the mode of the distribution in Europeans.
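For readers who want a feel for what these haplotype tests measure, here is a toy version of extended haplotype homozygosity (EHH) as it is usually defined: among chromosomes carrying a given core allele, the chance that two picked at random are identical across the whole window out to some distance. The haplotypes below are invented strings of 0s and 1s, and the function is a bare-bones sketch, not the pipeline used in the paper.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core_index, core_allele, distance):
    """EHH for chromosomes carrying core_allele at core_index.

    The window runs `distance` markers to either side of the core.
    Returns the share of carrier pairs that are identical over the window.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    lo = max(0, core_index - distance)
    hi = min(len(carriers[0]), core_index + distance + 1)
    groups = Counter(h[lo:hi] for h in carriers)
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)


# Invented data: 7 markers per chromosome, core marker at index 3.
# Carriers of the derived allele ("1") are nearly identical to each other,
# as expected after a recent rapid rise in frequency.
toy = [
    "0101100", "0101100", "0101100", "0101101",   # derived at the core
    "0100100", "1000010", "0110001", "1100110",   # ancestral at the core
]

for d in range(4):
    print(d, ehh(toy, 3, "1", d), ehh(toy, 3, "0", d))
```

In this toy set the derived carriers stay identical out to two markers on each side and only then start to diverge, while the ancestral carriers fall apart almost immediately. A slow decay of homozygosity around the derived allele is the signature the tests are looking for; iHS integrates that decay over distance and compares the derived and ancestral backgrounds.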

To review, first they confirmed a causal relationship between a particular set of markers, haplotypes, and the traits of interest. Second, they confirmed that said markers seem to bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes seems confirmed, are unlikely to be targets of positive natural selection. To get a better sense of that we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials which correlate with genetic variation. Increased probability of kidney disease seems preferable to this. In terms of the molecular genetics it looks like a factor, serum resistance-associated protein (SRA), produced by T. brucei rhodesiense binds to a specific location of Apol1, and that mutations at G1 and G2 change exactly that location within the protein. So these mutants may block the ability of T. brucei rhodesiense to turn off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum produced by individuals carrying the G1 and G2 haplotypes against the three subspecies of Trypanosoma brucei: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into west Africa as well).

- All 75 samples lysed T. brucei brucei

- None lysed T. brucei gambiense

- 46 samples lysed SRA-positive T. brucei rhodesiense; all 46 samples were from G1 or G2 carrying individuals

- The potency of G2 seemed higher than G1 against SRA-positive samples of T. brucei rhodesiense, though not SRA-negative samples, where G1 seemed as potent

- Recombinants of Apol1 which had only one of the two SNPs of the G1 haplotype were less effective against T. brucei rhodesiense than those which had both (G1 haplotype)

- Recombinants with G1 and G2 were not more effective against T. brucei rhodesiense than those with G2 alone

- Recombinants with G1 alone were more potent against SRA-negative T. brucei rhodesiense than those with G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysing of T. brucei rhodesiense. G1 did not block SRA binding to Apol1, but was still sufficient to lyse T. brucei rhodesiense, though it was far less potent against SRA-positive T. brucei rhodesiense than G2

It seems that the G1 and G2 haplotypes utilize different mechanisms to enable the lysing of invasive pathogens, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of plasma serum produced by G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. And introduction of such plasma into the bloodstreams of individuals who do not have resistance may then be highly efficacious as a preventative treatment against sleeping sickness. They do note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second to last paragraph where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table one has the top line results. Focus on the first two rows, they’re “G1” from the earlier study (that is, the two SNPs which combine to form the G1 haplotype).


Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data on this sample are drawn calculates the average African, European and Native American ancestry in the two groups as follows (I did some rounding to keep the values round):

African American – 85%, 10%, 5%
Hispanic American – 30%, 55%, 15%

Not surprisingly the Hispanic American sample here is mostly Puerto Rican and Dominican, explaining the greater African than Native American ancestry. Nevertheless, it is a sufficiently different genetic background to test the effects of the same marker against different genes. They confirmed the association of the markers of large effect in African Americans within the Hispanic cohort. The risk allele frequency in the African American control group is 21% vs. 37% in the cases. For Hispanic Americans the figures are 6% and 23% for the same categories.
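Those frequencies can be turned into a rough allele-level odds ratio. This is a back-of-the-envelope calculation from the rounded percentages above, not the statistic the paper itself reports, and the function name is my own.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the risk allele in cases relative to controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


african_american = allelic_odds_ratio(0.37, 0.21)
hispanic_american = allelic_odds_ratio(0.23, 0.06)
print(round(african_american, 2), round(hispanic_american, 2))
```

The crude ratios come out at about 2.2 for African Americans and about 4.7 for Hispanic Americans. The direction is the same in both groups, which is the point of testing the marker on a different ancestral background; the magnitudes should not be over-read given the rounding and the small control frequency in the Hispanic sample.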

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic). But the majority of their ancestry remains similar to that of other Sub-Saharan Africans. As a point of contrast the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation, and concomitant frigidity. The mean monthly low in Addis Ababa is around 10 degrees Celsius (50 Fahrenheit for Americans) and the mean high 20-25 (high 60s to mid 70s for Americans). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps the temperatures relatively moderate. Different environments result in different selection pressures, and Ethiopia has a very unique environment within Africa. The tsetse fly which serves as a vector for trypanosomes does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which defines the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data which went into generating this map, which smooths over huge discontinuities).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new, otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of trypanosome which is immune to these genetic adaptations is endemic to west Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen where man is always one step behind. In contrast, the subspecies which is effectively defused by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are extant at lower proportions. Perhaps there are different mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans Science : 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human genetics PMID: 20635188

(Republished from Discover/GNXP by permission of author or representative)

Mutations are, as you know, a double-edged sword. On the one hand mutations are the stuff of evolution; neutral changes on the molecular or phenotypic level are the result of mutations, as are changes which enhance fitness and so are driven to fixation by positive selection. On the other hand mutations also tend to cause problems. In fact, mutations which are deleterious far outnumber those which are positive. It is much easier to break complex systems which are near a fitness optimum than it is to improve upon them through random chance. In fact a Fisherian geometric analogy of the effect of genes on fitness implies that once a genetic configuration nears an optimum, mutations of larger effect have a tendency to decrease fitness. Sometimes environments and selection pressures change radically, and large effect mutations may become needful. But despite their short term necessity these mutations still cause major problems because they disrupt many phenotypes due to pleiotropy.

But much of the playing out of evolutionary dynamics is not so dramatic. Instead of very costly mutations for good or ill, most mutations may be of only minimal negative effect, especially if they are masked because of recessive expression patterns. That is, only when two copies of the mutation are present does all hell break loose. And yet even mutations which exhibit recessive expression tend to generate some drag on the fitness of heterozygotes. And if you sum small values together you can obtain a larger value. This gentle rain of small negative effect mutations can be balanced by natural selection, which does not smile upon less fit individuals who have a higher mutational load. Presumably those with “good genes,” fewer deleterious mutations, will have more offspring than those with “bad genes.” Because mutations accrue from one generation to the next, and there is sampling variance of deleterious alleles, a certain set of offspring will always be gifted with fewer deleterious mutations than their siblings. This is a genetics of chance. And so the mutation-selection balance is maintained over time, the latter rising to the fore if the former comes to greater prominence.
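The balance described above can be made concrete with a minimal deterministic sketch. It assumes a single biallelic locus with genotype fitnesses 1, 1 - hs and 1 - s, one-way mutation at rate u per generation, and random mating. The function names and parameter values are illustrative only and are not taken from any paper discussed here. Under these assumptions the deleterious allele should settle close to the textbook approximation u/(hs) when it is partially recessive.

```python
def next_generation(q, s, h, u):
    """One generation of selection followed by mutation at a biallelic locus.

    q: frequency of the deleterious allele
    s: selection coefficient against the mutant homozygote
    h: dominance coefficient (h = 0 means fully recessive)
    u: mutation rate from wild type to mutant (assumed one-way)
    """
    p = 1.0 - q
    # Mean fitness with genotype fitnesses 1, 1 - h*s, 1 - s
    mean_fitness = p * p + 2 * p * q * (1 - h * s) + q * q * (1 - s)
    # Frequency of the mutant allele after selection
    q_sel = (p * q * (1 - h * s) + q * q * (1 - s)) / mean_fitness
    # Mutation converts a fraction u of the remaining wild-type alleles
    return q_sel + u * (1 - q_sel)


def equilibrium_frequency(s, h, u, q0=0.01, generations=200_000):
    """Iterate the recursion long enough for it to settle."""
    q = q0
    for _ in range(generations):
        q = next_generation(q, s, h, u)
    return q


if __name__ == "__main__":
    s, h, u = 0.1, 0.2, 1e-5  # illustrative values, not estimates
    print("simulated equilibrium:", equilibrium_frequency(s, h, u))
    print("u / (h s) approximation:", u / (h * s))
```

The point of the sketch is only that a weak heterozygous cost (h s) is enough to hold a partially recessive allele at a low but nonzero frequency.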

The above has been a set of logical inferences from premises. Evolution is about the logic of life’s process, but as a natural science its beauty is that it is testable through empirical means. A short report in Science explores mutational load and fitness, and connects it with the ever popular topic of sexual selection, Additive Genetic Breeding Values Correlate with the Load of Partially Deleterious Mutations:

The mutation-selection–balance model predicts most additive genetic variation to arise from numerous mildly deleterious mutations of small effect. Correspondingly, “good genes” models of sexual selection and recent models for the evolution of sex are built on the assumption that mutational loads and breeding values for fitness-related traits are correlated. In support of this concept, inbreeding depression was negatively genetically correlated with breeding values for traits under natural and sexual selection in the weevil Callosobruchus maculatus. The correlations were stronger in males and strongest for condition. These results confirm the role of existing, partially recessive mutations in maintaining additive genetic variation in outbred populations, reveal the nature of good genes under sexual selection, and show how sexual selection can offset the cost of sex.

Additive genetic variance just refers to the variation of genes which affect the phenotype by independent and usually small effects which sum together to produce the range of variation of the trait. Imagine for example that the range of variation in height within the population was 10 inches, and that there were 10 genes which varied, and that each gene exhibited co-dominance. One could construct a model where every gene pair could add 0, 0.5 or 1 inch to the height independently, so that the maximum height is reached by adding 10 inches to the baseline, 1 inch per locus, and the minimum height by adding no inches to the baseline when each locus is homozygous for null alleles.
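The toy height model is simple enough to write down directly. The sketch below assumes a hypothetical baseline of 60 inches and, for the distribution at the end, an allele frequency of 0.5 at every locus with random mating; neither number comes from real data.

```python
import math

BASELINE_INCHES = 60.0    # hypothetical baseline height
N_LOCI = 10               # ten co-dominant loci, as in the example
EFFECT_PER_ALLELE = 0.5   # each "tall" allele adds half an inch


def height(genotype):
    """genotype: tall-allele counts (0, 1 or 2), one entry per locus."""
    return BASELINE_INCHES + EFFECT_PER_ALLELE * sum(genotype)


shortest = height((0,) * N_LOCI)   # homozygous for null alleles everywhere
tallest = height((2,) * N_LOCI)    # homozygous for tall alleles everywhere


def height_distribution(allele_freq=0.5):
    """Probability of each height, assuming independent loci."""
    n_alleles = 2 * N_LOCI
    return {
        BASELINE_INCHES + EFFECT_PER_ALLELE * k:
            math.comb(n_alleles, k)
            * allele_freq**k * (1 - allele_freq)**(n_alleles - k)
        for k in range(n_alleles + 1)
    }


if __name__ == "__main__":
    print("range:", shortest, "to", tallest)
    for h, prob in height_distribution().items():
        print(f"{h:5.1f} in  {prob:.4f}")
```

Because the effects simply sum, most individuals land near the middle of the range and the extremes are rare, which is the bell-shaped pattern one expects of an additive trait.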

Mutations can be conceived of in the same manner, with each mutation being a new variant which changes trait value. Even if most of the impact of a mutation is masked there is a small effect in the heterozygote state, and this may serve as a fitness drag. The range in mutational load can then naturally be analogized to additive genetic variance, in this case the trait under consideration ultimately being fitness, mediated through life history and morphological phenotypes.

In this report they focused primarily on the weevil’s ability to obtain resources and transform those resources into size, which correlates with greater sexual access for males and fecundity for females (ergo, greater fitness). They bred various outbred and inbred lineages across families of these weevils, because these sorts of crosses gauge the impact of masked deleterious alleles, which will manifest in homozygote state more often between related pairs who share mutations than unrelated ones. They found a correlation of -0.24 between inbreeding and breeding value; in other words the more inbred the pair the fewer offspring. The impact of these recessively expressed alleles is mitigated in heterozygous individuals, but because of the non-trivial impact the number of these alleles within an individual will determine its fitness all things equal.
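Why inbred crosses expose masked alleles follows from the standard expectation for genotype frequencies under inbreeding. The sketch below uses q for the frequency of a recessive allele and f for the inbreeding coefficient; the allele frequency of 0.01 is an arbitrary illustration, and the f values are the textbook ones for offspring of unrelated parents, first cousins and full siblings.

```python
def homozygote_frequency(q, f):
    """Expected frequency of the recessive homozygote.

    q: frequency of the recessive allele
    f: inbreeding coefficient (0 = outbred, 1 = completely inbred)
    """
    return q * q * (1 - f) + q * f


if __name__ == "__main__":
    q = 0.01  # illustrative allele frequency
    for label, f in [("unrelated parents", 0.0),
                     ("first cousins", 1 / 16),
                     ("full siblings", 1 / 4)]:
        print(f"{label:18s} f = {f:.4f}  "
              f"homozygotes = {homozygote_frequency(q, f):.6f}")
```

For a rare allele the f term dominates, so even modest inbreeding multiplies the number of individuals in whom the recessive effect is unmasked.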

Interestingly, when background variables were controlled males tended to show the greatest fitness drag due to inbreeding depression. This would comport with models of sexual selection where males justify their expense (because they can not bear offspring) within the population by serving as the perishable dumping grounds of bad genes. In particular in a polygynous population a few healthy males with good genes could give rise to most of the next generation, thereby providing the balance of selection to the background mutational rate.

Of course mating patterns vary between taxa. The more reproductive skew there is, in particular for males, the more recourse selection has every generation to dump deleterious alleles via selection. In contrast monogamous populations will have less power to expunge mutations in this fashion because there is more genetic equality across males, the bad will reproduce along with the good, more or less. Therefore a breeding experiment of weevils may have more limited insight than these authors may wish to admit. Geoffrey Miller’s The Mating Mind attempted to take the insights of sexual selection and develop a model of human evolutionary history, but it does not seem that this theory has swept all before it. Only time will tell, but until then more breeding experiments can’t help but clarify where theory goes wrong or right.

Citation: Tomkins, J., Penrose, M., Greeff, J., & LeBas, N. (2010). Additive Genetic Breeding Values Correlate with the Load of Partially Deleterious Mutations Science, 328 (5980), 892-894 DOI: 10.1126/science.1188013

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Biology, Genetics, Good Genes, Selection, Sex 

You probably are aware that different populations have different tolerances for high altitudes. Himalayan sherpas aren’t useful just because they have skills derived from their culture, they’re actually rather well adapted to high altitudes because of their biology. Additionally, different groups seem to have adapted to higher altitudes independently, exhibiting convergent evolution. But in terms of physiological function they aren’t all created equal, at least in relation to the solutions they’ve arrived at to make functioning at high altitudes bearable. In particular, it seems that the adaptations of the peoples of Tibet are superior to those of the peoples of the Andes. Superior in that the Andean solution is more brute force than the Tibetan one, producing greater side effects, such as lower birth weight in infants (and so higher mortality and lower fitness).

The Andean region today is dominated by indigenous people, and Spanish is not the lingua franca of the highlands as it is everywhere else in the former colonial domains of Spain in the New World. This is largely a function of biology; as in the lowlands of South America the Andean peoples were decimated by disease upon first contact (plague was spreading across the Inca Empire when Pizarro arrived with his soldiers). But unlike the lowland societies the Andeans had nature on their side: people of mixed or European ancestry are less well adapted to high altitudes and women without tolerance of the environment still have higher miscarriage rates.

So despite the suboptimal nature of the Andean adaptations vis-a-vis the Tibetan ones, they are certainly better than nothing, and in a relative sense have been very conducive to higher reproductive fitness. And yet why might the Andeans have kludgier adaptations than Tibetans? One variable to consider is time. The probability is that the New World was populated by humans only for the past ~10,000-15,000 years or so, with an outside chance of ~20,000 years (if you trust a particular interpretation of the genetic data, which you probably shouldn’t). By contrast, modern humans have had a presence in the center of Eurasia for ~30,000 years. Generally when populations are exposed to a new selective regime the initial adaptations are drastic and exhibit major functional downsides, but they’re much better than the status quo (remember, fitness is relative). Over time genetic modifications mask the deleterious byproducts of the genetic change which emerged initially to deal with the new environment. In other words, selection perfects design over time in a classic Fisherian sense as the genetic architecture converges upon the fitness optimum.*

Another parameter may be the variation available within the population, as the power of selection is proportional to the amount of genetic variation, all things equal. The peoples of the New World tend to be genetically somewhat homogeneous, probably due to the fact that they went through a bottleneck across Beringia, and that they’re already sampled from the terminus of the Old World. A physical anthropologist once told me that the tribes of the Amazon still resemble Siberians in their build. It may be that it takes a homogeneous population with little extant variation a long time indeed to shift trait value toward a local ecological optimum (tropical Amerindians are leaner and less stocky than closely related northern populations, just not particularly in relation to other tropical populations). In contrast, populations in the center of Eurasia have access to a great deal of genetic variation because they’re in proximity to many distinctive groups (the Uyghurs for example are a recent hybrid population with European, South Asian and East Asian ancestry).
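The claim that the power of selection scales with available variation can be illustrated with a one-locus sketch. It assumes an additive favored allele with genotype fitnesses 1 + s, 1 + s/2 and 1 under random mating; the value s = 0.05 and the starting frequencies are arbitrary. In this model the per-generation change works out to (s/2) q (1 - q) / (1 + s q), so it tracks the q(1 - q) term, which is the allelic variance at the locus.

```python
def delta_q(q, s):
    """Per-generation change in the frequency of an additive favored allele.

    Genotype fitnesses are assumed to be 1 + s, 1 + s/2 and 1.
    """
    p = 1.0 - q
    mean_fitness = q * q * (1 + s) + 2 * p * q * (1 + s / 2) + p * p
    q_next = (q * q * (1 + s) + p * q * (1 + s / 2)) / mean_fitness
    return q_next - q


if __name__ == "__main__":
    s = 0.05  # illustrative selection coefficient
    for q in (0.001, 0.01, 0.1, 0.5):
        print(f"q = {q:<6} variance term q(1-q) = {q * (1 - q):.4f}  "
              f"delta q = {delta_q(q, s):.6f}")
```

A population that starts with the favored variant at very low frequency, or lacks it entirely until a new mutation arrives, responds far more slowly than one where the variant is already common.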

So that’s the theoretical backdrop for the differences in adaptations. Shifting to how the adaptations play out concretely, some aspects of the physiology of Tibetan tolerance of high altitudes are mysterious, but one curious trait is that they actually have lower levels of hemoglobin than one would expect. Andean groups have elevated hemoglobin levels, which is the expected “brute force” response. Interestingly it seems that evolution given less time or stabilizing at a physiologically less optimal equilibrium is more comprehensible to humans! Nature is often more creative than us. In contrast the Tibetan adaptations are more subtle, though interestingly their elevated nitric oxide levels may facilitate better blood flow. Though the inheritance patterns of the trait had been observed, the genetic mechanism underpinning it had not been elucidated. Now a new paper in Science identifies some candidate genes for the various physiological quirks of Tibetans by comparing them with their neighbors, and looking at the phenotype in different genotypes within the Tibetan population. Genetic Evidence for High-Altitude Adaptation in Tibet:

Tibetans have lived at very high altitudes for thousands of years, and they have a distinctive suite of physiological traits that enable them to tolerate environmental hypoxia. These phenotypes are clearly the result of adaptation to this environment, but their genetic basis remains unknown. We report genome-wide scans that reveal positive selection in several regions that contain genes whose products are likely involved in high-altitude adaptation. Positively selected haplotypes of EGLN1 and PPARA were significantly associated with the decreased hemoglobin phenotype that is unique to this highland population. Identification of these genes provides support for previously hypothesized mechanisms of high-altitude adaptation and illuminates the complexity of hypoxia response pathways in humans.

Here’s what they did. First, Tibetans are adapted to higher altitudes, Chinese and Japanese are not. The three groups are relatively close genetically in terms of ancestry, so the key is to look for signatures of positive selection in regions of the genome which have been identified as possible candidates in terms of functional significance in relation to pathways which may modulate the traits of interest. After finding potential regions of the genome possibly under selection in Tibetans but not the lowland groups, they fixed upon variants which are at moderate frequencies in Tibetans and noted how the genes track changes in the trait.

This figure from the supplements shows how the populations are related genetically:


In a worldwide context the three groups are pretty close, but they also don’t overlap. The main issue I would have with this presentation is that the Chinese data is from the HapMap, and they’re from Beijing. This has then a northeast Chinese genetic skew (I know that people who live in Beijing may come from elsewhere, but recent work which examines Chinese phylogeography indicates that the Beijing sample is not geographically diversified), while ethnic Tibetans overlap a great deal with Han populations in the west of China proper. In other words, I wouldn’t be surprised if the separation between Han and Tibetan was far less if you took the Chinese samples from Sichuan or Gansu, where Han and Tibetans have lived near each other for thousands of years.

But these issues of phylogenetic difference apart, we know for a fact that lowland groups do not have the adaptations which are distinctive to the Tibetans. To look for genetic differences they focused on 247 loci, some from the HIF pathway, which is important for oxygen homeostasis, as well as genes from Gene Ontology categories which might be relevant to altitude adaptations. Table 1 has the breakdown by category.

Across these regions of the genome they performed two haplotype based tests which detect natural selection, EHH and iHS. Both of these tests basically find regions of the genome which have reduced variation because of a selective sweep, whereby selection at a specific region of the genome has the effect of dragging along large neutral segments adjacent to the original copy of the favored variant. EHH is geared toward detection of sweeps which have nearly reached fixation, in other words the derived variant has nearly replaced the ancestral after a bout of natural selection. iHS is better at picking up sweeps which have not resulted in the fixation of the derived variant. The paper A Map of Recent Positive Selection in the Human Genome outlines the differences between EHH and iHS in more detail. They looked at the three populations and wanted to find regions of the genome where Tibetans, but not the other two groups, were subject to natural selection as defined by positive signatures with EHH and iHS. They scanned over 200 kb windows of the genome, and found that 10 of their candidate genes were in regions where Tibetans came up positive for EHH and iHS, but the other groups did not. Since these tests do produce false positives they ran the same procedure on 240 random candidate genes (7 genes were in regions where Chinese and Japanese came up positive, so these were removed from the set of candidates), and came up with average EHH and iHS positive hits of ~2.7 and ~1.4 genes after one million resamplings (specifically, these are genes where Tibetans were positive, the other groups negative). Their candidate genes focused on altitude related physiological pathways yielded 6 for EHH and 5 for iHS (one gene came up positive for both tests, so 10 total). This indicates to them these are not false positives, something made more plausible by the fact that we know that Tibetans are biologically adapted to higher altitudes and we have an expectation that these genes are more likely than random expectation to have a relationship to altitude adaptations.
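To get a rough feel for how far the observed counts sit from the resampled averages, here is a back-of-the-envelope sketch. It is not the authors’ procedure: they used an empirical distribution from one million resamplings, whereas this assumes, purely for illustration, that the number of hits in a random gene set is approximately Poisson with the reported mean. The function name is my own.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    The Poisson form is an assumption made for illustration only.
    """
    below = sum(math.exp(-expected) * expected**k / math.factorial(k)
                for k in range(observed))
    return 1.0 - below


if __name__ == "__main__":
    # Counts and resampled means as reported in the paragraph above
    for test, observed, expected in [("EHH", 6, 2.7), ("iHS", 5, 1.4)]:
        print(f"{test}: observed {observed}, null mean {expected}, "
              f"tail probability ~ {poisson_tail(observed, expected):.3f}")
```

Under that assumption the tail probabilities come out on the order of 0.06 for EHH and 0.01 for iHS, which suggests the enrichment is real but not overwhelming on the numbers alone. That is why the prior biological expectation about these pathways carries part of the argument.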

Finally, they decided to look at two genes with allelic variants which exist at moderate frequencies in Tibetans, EGLN1 and PPARA. The procedure is simple: you have three genotypes, and you see whether the phenotype differs by genotype across the 31 individuals. In this case you want to look at hemoglobin concentration, where those who are well adapted have lower concentrations. Figure 3 is rather striking:


Even with the small sample sizes the genotypic effect jumps out at you. This isn’t too surprising, since previous work has shown that these traits are highly heritable, and that they vary within the Tibetan population. There’s apparently a sex difference in terms of hemoglobin levels, so they did a regression analysis, and it illustrates how strong the genetic effect from these alleles is:


My main question: why do Tibetans still have variation on these genes after all this time? Shouldn’t they be well adapted to high altitudes by now? A prosaic answer may be that the Tibetans have mixed with other populations recently, and so have added heterozygosity through admixture. But there are several loci here which are fixed in Tibetans, and not the HapMap Chinese and Japanese. For admixture to be a good explanation one presumes that the groups with which the Tibetans mixed would have been fixed for those genes as well, but not the ones at moderate frequencies. This may be true, but it seems more likely that admixture alone can not explain this pattern. As the Andean example suggests adaptation to high altitudes is not easy or simple. Until better options arrive on the scene, kludges will suffice. It may be that the Tibetans are still going through the sieve of selection, and will continue to do so for the near future. Or, there may be balancing dynamics on the genes which exhibit heterozygosity, so that fixation is prevented.

No matter what the truth turns out to be, this is surely just the beginning. A deeper investigation of the genetic architecture of Andeans and Ethiopians, both of which have their own independent adaptations, will no doubt tell us more. Finally, I wonder if these high altitude adaptations have fitness costs which we’re not cognizant of, but which Tibetans living in India may have some sense of.

Citation: Tatum S. Simonson, Yingzhong Yang, Chad D. Huff, Haixia Yun, Ga Qin, David J. Witherspoon, Zhenzhong Bai, Felipe R. Lorenzo, Jinchuan Xing, Lynn B. Jorde, Josef T. Prchal, & RiLi Ge (2010). Genetic Evidence for High-Altitude Adaptation in Tibet Science : 10.1126/science.1189406

* Additionally, it may be that archaic hominin groups were resident in the Himalaya for nearly one million years. Neandertal admixture evidence in Eurasians should change our priors when evaluating the possibility for adaptive introgression on locally beneficial alleles.

Image Credit: Wikimedia Commons

(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"