The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Selective Sweep

The Pith: Evolution is a sloppy artist. Upon the focal zone of creative energy, adaptation can sculpt with precision, but on the margins of the genetic landscape frightening phenomena may erupt through inattention. In other words, there are often downsides to adaptation.

A few weeks ago I reviewed a paper which suggests that Crohn’s disease may be a side effect of a selective sweep. The sweep itself was possibly driven by adaptation to nutrient deficiencies incurred by European farmers switching to a grain-based diet. The reason for this is a contingent genomic reality: the positively selected genetic variant was flanked by a Crohn’s disease risk allele. The fitness gain from the former happened to be greater than the fitness cost of the latter, so the frequencies of both the fit and the unfit variants increased together. You can’t always have one without the other.

But that’s just focusing on one gene, though the authors did indicate that this may be a genome-wide feature. A new paper in PLoS Genetics argues that that is the case, at least to some extent. Evidence for Hitchhiking of Deleterious Mutations within the Human Genome:

Deleterious mutations reduce fitness within natural populations and must be continually removed by natural selection. However, some deleterious mutations reach unexpectedly high frequencies. There are a number of mechanisms by which this could occur, including changes in genetic or environmental constraints. Here, we investigate the hypothesis that some deleterious mutations have hitchhiked to high frequency due to linkage to sites that have been under positive selection. Using a collated set of regions likely to have been influenced by positive selection, we find that the number of deleterious polymorphisms in hitchhiking and non-hitchhiking regions is similar, but that the ratio of deleterious to neutral polymorphism is higher in hitchhiking compared to non-hitchhiking regions. Both computer simulations and empirical data indicate that while hitchhiking eliminates many deleterious mutations, some are increased in frequency. The distribution of human disease-associated mutations is also altered in hitchhiking compared to non-hitchhiking regions. Together, our results provide evidence that hitchhiking has influenced the frequency of linked deleterious mutations in humans, implying that the evolutionary dynamics of advantageous and deleterious mutations may often depend on one another.

To understand what’s going on here, recall that the genome consists of a sequence of base pairs. Some of these base pairs code for genes, and some do not. Within coding sequence there are bases which can be changed without altering the final protein product, and so are “synonymous,” and bases whose change does alter the final protein product, and so are “nonsynonymous.” By and large synonymous changes are selectively “neutral”; in other words, they have neither a positive nor a negative impact on fitness (though some sites once assumed to be functionally silent turn out to have some selective relevance). Nonsynonymous changes can be neutral as well, but they may also have functional consequences which are negative or positive. More often than not the consequences are negative, and such nonsynonymous mutations are going to be “purified” from the genome by negative selection. Imagine, if you will, that the genome is always bubbling with new mutations. A substantial proportion of these are deleterious, but they are quickly nipped in the bud by purifying selection, which is constantly pruning and constraining critical regions of the genome. In a few cases, though, a mutation may be positively selected due to its adaptive value. Instead of being purified, this variant will increase in frequency rapidly, generating a selective sweep which reorganizes flanking regions of the genome. Neutral variants may “hitchhike” along with their selectively favored neighbors, and because of this correlated, population-wide increase in the frequency of linked variants, the swept region of the genome will exhibit a high degree of homogenization. And framing this constant pattern of purifying selection and welter of selective sweeps more broadly, you always have neutral dynamics, as innocuous mutants rise to prominence or fade into the background via the chance winds of fortune.
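The synonymous/nonsynonymous distinction can be made concrete with the genetic code itself. A minimal sketch, assuming the standard nuclear code (NCBI translation table 1) and a single codon compared before and after a mutation; the function names are illustrative, not taken from any library:

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated
# with the bases T, C, A, G at each of the three positions, in that order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon: str) -> str:
    """Return the one-letter amino acid for a DNA codon ('*' = stop)."""
    return CODON_TABLE[codon.upper()]


def classify_substitution(ref: str, alt: str) -> str:
    """Label a change from codon `ref` to codon `alt` by its protein effect."""
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("expected two codons of length 3")
    if translate_codon(ref) == translate_codon(alt):
        return "synonymous"      # same amino acid, protein unchanged
    return "nonsynonymous"       # amino acid (or stop) changes


print(classify_substitution("GAA", "GAG"))  # glutamate -> glutamate
print(classify_substitution("GAA", "GAC"))  # glutamate -> aspartate
```

Third-position changes like the first example are often synonymous because of the redundancy of the code, which is why such sites are commonly used as a neutral baseline.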

In other words, the genome is a complex and interlocking system with its own logic, which constrains and shapes evolutionary processes. To a first approximation it may be useful to view it as a “black box,” a substrate which exists only to mediate the action of natural selection and random genetic drift over the generations. But to properly characterize the fine-grained texture of biological variation as it actually is, we need to grapple with the concrete reality of the genome and the broad features which characterize its landscape. Some of these parameters are exogenous to the DNA sequence: population genetic abstractions. Random genetic drift is affected by population size and structure. Natural selection is highly context dependent, varying across both space and time. But other parameters are endogenous: the number of chromosomes, whether the organism is diploid, the variation in the rate of recombination. These are all features of the genome itself.

The ratio at the heart of this paper, the result that deleterious alleles may gain more from hitchhiking on positive selective sweeps than neutral alleles do, is at the intersection of exogenous and endogenous parameters. Neutral dynamics can be affected by changes in population size, while the phenomenon of hitchhiking occurs due to the biophysical structure of the genome. The question which looms large here is this: why are some deleterious alleles present at a high frequency? One of the answers may be that they are parasites upon the favor shown to their adaptively beneficial neighbors. Normally the frequency distribution of deleterious alleles is skewed toward the low end, presumably because purifying selection is constantly tamping down upward fluctuations in frequency. But with a selective sweep the gentle constraints which keep these deleterious variants in check are pushed aside by the hammer-blow of adaptive processes, which exist not to finely tune the genetic architecture for the long run, but to succeed in the short run, no matter the cost. What was once a rare loss-of-function variant may become a far less rare one due to a chance association with a selectively favored allele. In contrast, neutral variants exhibit a much wider range of frequencies, from fixed, to moderate, to low frequency.
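Why deleterious alleles normally sit at low frequencies can be illustrated with a small Wright-Fisher simulation. This is a sketch under stated assumptions, not the simulation Chun and Fay ran: a haploid population of fixed size, each new mutation starting as a single copy, and arbitrary illustrative parameters (n = 1000, s = -0.05 for the deleterious case). It estimates how often a new neutral versus a new deleterious mutation ever drifts up to 10% frequency:

```python
import numpy as np


def fraction_reaching(s, n=1000, threshold=0.1, replicates=5000,
                      generations=500, seed=1):
    """Fraction of new mutations (starting frequency 1/n) that ever reach
    `threshold` within `generations`.

    Haploid Wright-Fisher model: deterministic selection with relative
    fitness 1 + s for the mutant, then binomial sampling of n offspring.
    s = 0 is neutral; s < 0 is deleterious (purifying selection).
    """
    rng = np.random.default_rng(seed)
    p = np.full(replicates, 1.0 / n)
    reached = np.zeros(replicates, dtype=bool)
    for _ in range(generations):
        p_sel = p * (1 + s) / (1 + p * s)   # selection
        p = rng.binomial(n, p_sel) / n      # drift (random sampling)
        reached |= p >= threshold
    return reached.mean()


neutral = fraction_reaching(0.0)
deleterious = fraction_reaching(-0.05)
print(f"neutral: {neutral:.4f}  deleterious: {deleterious:.4f}")
```

With these settings the neutral fraction should come out clearly higher: purifying selection pulls most deleterious copies back down before drift can carry them far, which is the low-frequency skew a sweep can override.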

But the final conclusions of the paper are tentative and cautious. There are obviously many other population biological processes which are responsible for the preservation of deleterious alleles; balancing selection of various kinds comes to mind. Rather than finding the one answer, the aim of the authors here seems to be to sketch out one important possible piece of the puzzle.

Citation: Chun S, Fay JC (2011) Evidence for Hitchhiking of Deleterious Mutations within the Human Genome. PLoS Genet 7(8): e1002240. doi:10.1371/journal.pgen.1002240

The Pith: Natural selection is a quick & dirty operator. When subject to novel environments it can react rapidly, bringing both the good and the bad. The key toward successful adaptation is not perfection, but being better than the alternatives. This may mean that many contemporary diseases are side effects of past evolutionary genetic compromises.

The above is a figure from a recent paper which just came out in Molecular Biology and Evolution, Crohn’s disease and genetic hitchhiking at IBD5. You have probably heard of Crohn’s disease before; hundreds of thousands of Americans are afflicted with it. It’s an inflammatory bowel ailment, and it can be debilitating even to very young people. Its prevalence also varies quite a bit by population. Why? It could be something in the environment (e.g., a different diet), genetic predisposition, or some combination. What the figure above purports to illustrate is the correlation between Crohn’s disease and the expansion of the agricultural lifestyle.

But don’t get overexcited, Paleos! There are many moving parts to this story, and I need to back up to the beginning. The tens of thousands of genes which you inherited from your parents are embedded within the genome and arranged along chromosomes, one after the other. For the purposes of conceptualizing evolutionary dynamics, such as natural selection or random genetic drift, focusing on a single gene is useful. It has the power to illustrate some basic and elementary principles. But sometimes you need to take a more synoptic view, and look at genes in their broader context. In this post I’ll set aside molecular or statistical epistasis, gene-gene interaction. Rather, let’s just consider the static landscape of the genome, where genes are physical, concrete entities embedded in a particular spatial relationship to other genes, upstream or downstream along the chromosome. These physical or statistical associations of genes can form a de facto supergene through linkage, and their variants combine to form haplotypes, sequences of markers across small stretches of the genome. But recall that these associations are counterbalanced by genetic recombination, which breaks up physical sequences by exchanging segments between the homologous chromosomes inherited from each parent.
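Both ideas here, recombination shuffling linked segments and linkage weakening with distance, can be sketched in a few lines. This assumes a single crossover between two equal-length haplotypes, and uses the Haldane map function, which converts map distance into a recombination fraction under the assumption that crossovers occur independently (no interference). The function names are hypothetical:

```python
import math


def haldane_r(d_morgans: float) -> float:
    """Haldane map function: recombination fraction for a map distance of
    d Morgans, assuming no crossover interference."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def single_crossover(hap1: str, hap2: str, point: int) -> tuple[str, str]:
    """Exchange the segments of two homologous haplotypes after `point`,
    returning both recombinant products."""
    if len(hap1) != len(hap2):
        raise ValueError("homologous haplotypes must be the same length")
    return hap1[:point] + hap2[point:], hap2[:point] + hap1[point:]


print(single_crossover("AAAAAA", "bbbbbb", 2))  # ('AAbbbb', 'bbAAAA')
for cm in (1, 10, 50, 200):
    print(f"{cm:>3} cM -> r = {haldane_r(cm / 100):.3f}")
```

The recombination fraction approaches but never exceeds 0.5, the value for unlinked loci, which is why close neighbors of a selected variant travel together far more reliably than distant ones.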

The big picture that the above highlights is the fact that evolutionary dynamics operate not just on the gene, but also upon the local genetic neighborhood. Therefore, when we talk about selection upon a gene, we need to recall that this has consequences for that gene’s neighbors. Let’s use a concrete and real example. Northern Europeans tend to have very long haplotypes around the LCT gene, which encodes the enzyme lactase. Functionally, this haplotype has embedded within it a variant which allows for continued production of lactase in adulthood, and therefore the ability to extract the calories in milk’s lactose sugar beyond childhood. The molecular genetic details of how this happens do not concern us. Instead, let’s consider why LCT is characterized by a very long haplotype.

This is what we think happened. Between 5 and 10 thousand years before the present there lived an individual who carried a dominant genetic mutation which allowed for the persistent production of lactase into adulthood. Only one copy of the lactase persistent allele is needed for lactose tolerance. That’s why populations such as Denmark’s, where the persistent allele is present at a frequency of 80-90%, have nearly universal tolerance. Under Hardy-Weinberg equilibrium the recessive trait, lactase non-persistence, would be expressed at frequencies of only 1-4% (the square of the 10-20% frequency of the minor allele). Going back to the individual with the mutant copy, if one considers a scenario where lactase persistence would be highly beneficial (this is not hard to imagine), then the frequency of that mutant would rise rapidly. It would “sweep” through the population. As it has a dominant mode of expression, on average half of the children of the original mutant would carry the allele and express the trait, while half would not. Over the generations that one original copy could replicate rapidly within a population due to positive selection and intermarriage.
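The Hardy-Weinberg arithmetic for Denmark can be checked directly. A sketch assuming random mating and a fully dominant persistence allele, as described above; the function and key names are illustrative:

```python
def lactase_phenotypes(p_persistent: float) -> dict:
    """Hardy-Weinberg genotype frequencies for a dominant persistence allele.

    p_persistent is the frequency of the (dominant) lactase persistence
    allele. Non-persistence is recessive, so it is expressed only by
    homozygotes for the other allele, at frequency q**2 with q = 1 - p.
    """
    q = 1.0 - p_persistent
    return {
        "homozygous_persistent": p_persistent ** 2,
        "heterozygous": 2 * p_persistent * q,   # also lactose tolerant
        "non_persistent": q ** 2,               # the recessive phenotype
    }


for p in (0.8, 0.9):
    freqs = lactase_phenotypes(p)
    print(p, {k: round(v, 3) for k, v in freqs.items()})
```

At allele frequencies of 0.8 and 0.9 the non-persistent class comes to 4% and 1%, the range quoted above.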

But it’s not just the functionally relevant genetic variant which would proliferate. The lactase persistent allele would be embedded within the context of a host of other genetic variants along the stretch of DNA where it was located. As the lactase persistent allele rose rapidly in frequency in a selective sweep, its neighbors would hitchhike along. The extent of the hitchhiking would be conditional upon distance from the positively selected variant and the speed of the sweep, which itself would presumably depend upon the strength of selection. All of this together explains the very long haplotype around LCT in Northern Europeans: 5 to 10 thousand years ago a relatively large genomic segment of an individual who carried a lactase persistent allele was driven up in frequency very rapidly because of adaptation to new conditions. Not only did that individual’s functionally relevant variant, the target of selection, sweep nearly to fixation in some Northern European populations, but many adjacent variants also rose in frequency, to a degree that declined with distance from the focal variant. In other words, natural selection in this case was about one specific functional unit at the LCT locus, but as a side effect it also reorganized a whole swath of the total population genome structure of Northern Europeans.
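The dependence of hitchhiking on distance and on the speed of the sweep can be illustrated with a deterministic two-locus model. This is a textbook-style sketch, not a model of LCT itself: it assumes a haploid population with no drift, a favored allele A that arises at 0.1% on a chromosome carrying a neutral allele B (itself at 10%), and arbitrary values for the selection coefficient s and the recombination fraction r between the two loci:

```python
def neutral_hitchhiking(s, r, p0=0.001, q0=0.1, generations=3000):
    """Final frequency of a neutral allele B after a sweep at a linked locus.

    Deterministic haploid two-locus model. The favored allele A (fitness
    1 + s) starts at frequency p0, entirely on chromosomes carrying B, whose
    starting frequency is q0. Each generation applies selection, then
    recombination at rate r. Haplotype order: AB, Ab, aB, ab.
    """
    x = [p0, 0.0, q0 - p0, 1.0 - q0]
    w = [1 + s, 1 + s, 1.0, 1.0]
    for _ in range(generations):
        mean_w = sum(xi * wi for xi, wi in zip(x, w))
        x = [xi * wi / mean_w for xi, wi in zip(x, w)]   # selection
        d = x[0] * x[3] - x[1] * x[2]                    # linkage disequilibrium
        x = [x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d]
    return x[0] + x[2]                                   # frequency of B


for r in (0.0, 0.0005, 0.002, 0.01, 0.5):
    print(f"r = {r:<6}  strong sweep (s=0.1): {neutral_hitchhiking(0.1, r):.3f}"
          f"  weak sweep (s=0.01): {neutral_hitchhiking(0.01, r):.3f}")
```

Tighter linkage (smaller r) and a faster sweep (larger s) both leave the neutral neighbor at a higher final frequency, which is the pattern that produces a long haplotype around a recently selected variant.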

What does that have to do with Crohn’s disease and agriculture? Crohn’s disease may be a modification of the LCT story in a genomic sense, and the trigger of that modification may have been agriculture. Before I go any further, let me post the paper’s abstract:

IBD5 (inflammatory bowel disease 5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn’s disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn’s disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency, and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European…populations…To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1868 Crohn’s disease cases and 5550 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease…In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three gene…as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn’s disease patients. We observed a 72% increase in gene expression of IRF1 among Crohn’s disease patients (p=0.0006) and no significant difference in expression of OCTN1….

It’s all a mouthful, so let’s review. IBD5 is a 250 kilobase haplotype implicated in Crohn’s disease: a long segment of associated markers which also correlate with the illness. This does not imply that the whole segment is causally connected with Crohn’s disease. But there are two genes which have been pegged as likely candidates, IRF1 and IL5. Finally, there’s another gene, OCTN1, which is statistically associated with Crohn’s disease but lacks a biologically plausible connection. Rather, it seems to have a role in absorption of the antioxidant ergothioneine, with the 503F allele of OCTN1 resulting in a gain of function for this process. Interestingly, the authors observe that OCTN1 is centrally located on the haplotype. In other words, you can think of the genome upstream and downstream of OCTN1, extending out across the haplotype, as two wings or fringes of this gene.

The IBD5 haplotype is the broader landscape. IRF1, IL5, and OCTN1 are general features embedded within this landscape. 503F is a specific feature, in that it is a flavor of OCTN1. Crohn’s disease is another phenomenon which has an association with this genomic landscape, but is of a different class or category. It is correlated in particular with IBD5 haplotypes carrying the 503F allele. The main aim of this paper is to tease apart all these multitudinous associations. What the authors found is that Crohn’s disease is not associated with 503F haplotypes on which recombination breakpoints fall between OCTN1 and IRF1 or IL5, while 503F haplotypes with no recombination between the three genes are strongly associated with the disease. The former are instances where genetic recombination has broken apart the association which couples 503F with the putative risk variants of those two genes. In this case, then, the architecture of the genomic landscape has obscured the more specific causal chain which leads to an increased risk for Crohn’s disease.
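The logic of that comparison, asking whether 503F haplotypes are enriched in cases only when they still carry the IRF1/IL5 segment, can be illustrated with a stratified odds ratio. This is a generic sketch, not the authors’ haplotype association test; the function names and any stratum labels are hypothetical:

```python
def odds_ratio(case_with, case_without, control_with, control_without):
    """Odds ratio for carrying a haplotype in cases versus controls.

    Applies the Haldane-Anscombe correction (add 0.5 to every cell) when
    any cell is zero, so the ratio stays finite.
    """
    cells = [case_with, case_without, control_with, control_without]
    if any(c == 0 for c in cells):
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return (a * d) / (b * c)


def stratified_odds_ratios(strata):
    """Compute one odds ratio per haplotype class.

    `strata` maps a label (for example a hypothetical "503F, recombinant"
    versus "503F, non-recombinant") to a 4-tuple of counts in the order
    accepted by odds_ratio().
    """
    return {label: odds_ratio(*counts) for label, counts in strata.items()}
```

An odds ratio near 1 in the recombinant class, alongside an elevated ratio in the non-recombinant class, is the signature that points away from 503F and toward its neighbors. A real analysis would also need confidence intervals and significance tests, which this sketch omits.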

So what happened? The authors posit that the 503F allele was selectively favored at some point in the past, and flanking it were the Crohn’s disease risk-elevating variants of IRF1 and IL5. All things being equal, it is best not to have a risk for this disease, but all things are not equal. If there was a strong enough selective pressure on the target, 503F, then the downside that it came as a “total package” with some deleterious alleles would be irrelevant. Over a long enough evolutionary time the deleterious alleles would be purified through negative selection, because recombination does break apart associations, but there’s a lot of reality which consists of being between beginnings and ends.
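The “total package” argument can be illustrated by adding a deleterious passenger to a simple two-locus model. This sketch assumes a haploid population, no drift, multiplicative fitness, and arbitrary illustrative values (a benefit s = 0.05 for the favored allele, a cost c = 0.005 for the linked deleterious allele, recombination fraction r = 0.001); the function name is hypothetical and none of these numbers come from the paper:

```python
def deleterious_hitchhiking(s, c, r, p0=0.001, q0=0.01, generations=6000):
    """Trajectory of a deleterious allele D that starts on the same
    chromosome as a newly arising favored allele A.

    Deterministic haploid two-locus model with multiplicative fitness:
    A multiplies fitness by (1 + s), D by (1 - c). Each generation applies
    selection, then recombination at rate r. Haplotype order: AD, Ad, aD, ad.
    """
    x = [p0, 0.0, q0 - p0, 1.0 - q0]
    w = [(1 + s) * (1 - c), 1 + s, 1 - c, 1.0]
    trajectory = []
    for _ in range(generations):
        mean_w = sum(xi * wi for xi, wi in zip(x, w))
        x = [xi * wi / mean_w for xi, wi in zip(x, w)]   # selection
        d = x[0] * x[3] - x[1] * x[2]                    # linkage disequilibrium
        x = [x[0] - r * d, x[1] + r * d, x[2] + r * d, x[3] - r * d]
        trajectory.append(x[0] + x[2])                   # frequency of D
    return trajectory


traj = deleterious_hitchhiking(s=0.05, c=0.005, r=0.001)
peak = max(traj)
print(f"start 0.010, peak {peak:.3f} at generation {traj.index(peak)}, "
      f"final {traj[-1]:.4f}")
```

With these settings the deleterious allele climbs far above its starting frequency during the sweep, then declines over thousands of generations as recombination separates it from the favored allele and purifying selection acts on it. If c exceeds s, it never rises at all.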

To infer that 503F was the target of natural selection the authors used iHS (the integrated haplotype score), a haplotype-based test for detecting this phenomenon. This test tends to detect selective sweeps in midstream, or those which do not go to fixation because of balancing dynamics. One implication of this is that the allele which was the target of selection will tend to have modest frequencies at best, and that is so. From the supplements, here is a list of populations with the percentage of the selected allele (some populations appear more than once because they were sampled in different data sets):

Population   503F alleles (N)   503L alleles (N)   % 503F
Sardinian 40 16 71%
Tuscan 9 7 56%
Turku 11 9 55%
Basque 23 23 50%
Adygei 15 17 47%
Orcadian 15 17 47%
Italian 12 16 43%
Utah 40 56 42%
French 24 34 41%
Kuopio 8 12 40%
Tuscan 23 35 40%
Pole 7 13 35%
Druze 27 67 29%
Russian 13 35 27%
Uygur 5 15 25%
Terekli-Mektab (Daghestani) 13 43 23%
Makrani 11 39 22%
Balochi 10 40 20%
Mozabite 12 48 20%
Palestinian 19 83 19%
Kalash 8 42 16%
Pathan 8 42 16%
Kubachi (Daghestani) 7 39 15%
Brahmin Niyogi 4 26 13%
Brahmin 5 33 13%
Hazara 6 42 13%
Burusho 6 44 12%
Brahmin Vydika 5 41 11%
Sindhi 5 43 10%
Bedouin 10 88 10%
Brahui 5 45 10%
BantuSouthAfrica 1 15 6%
Yakut 3 47 6%
Xibo 1 17 6%
Daur 1 19 5%
Lahu 1 19 5%
Tu 1 19 5%
Yi 1 19 5%
Cambodian 1 21 5%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mandenka 1 47 2%
Khonda Dora 1 51 2%
Irula 1 59 2%
BiakaPygmy 1 69 1%
!Kung (San) 0 22 0%
Alur 0 16 0%
BantuKenya 0 22 0%
Biaka Pygmy 0 10 0%
Cambodian 0 10 0%
Chinese 0 16 0%
Dai 0 20 0%
Han 0 70 0%
Han-NChina 0 18 0%
Hema 0 42 0%
Hezhen 0 20 0%
Japanese 0 62 0%
Japanese 0 38 0%
Khmer Cambodian 0 8 0%
Malasian 0 12 0%
MbutiPygmy 0 30 0%
Melanesian 0 44 0%
Miao 0 18 0%
Mongola 0 20 0%
Nande 0 36 0%
Naxi 0 20 0%
Oroqen 0 20 0%
Papuan 0 34 0%
Pedi (northern Sotho) 0 22 0%
San 0 14 0%
She 0 20 0%
Sotho 0 10 0%
Southern Chinese 0 8 0%
Taiwan 0 6 0%
Tsonga 0 12 0%
Tswana 0 14 0%
Tujia 0 20 0%
Vietnamese 0 18 0%
Xhosa 0 4 0%
Yoruba 0 50 0%
Zulu (Nguni) 0 18 0%
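iHS is built from extended haplotype homozygosity (EHH): the probability that two chromosomes carrying the same core allele are identical over a stretch of sequence extending away from it. A recent, incomplete sweep leaves the favored allele on unusually long shared haplotypes, so its EHH decays slowly. The sketch below computes EHH and its one-sided integral; the published statistic integrates EHH on both sides of the core SNP for the ancestral and derived alleles, takes the log of their ratio, and standardizes it within allele-frequency bins, none of which is reproduced here. Haplotypes are assumed to be pre-filtered 0/1 strings and the function names are illustrative:

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, stop):
    """Extended haplotype homozygosity between marker `core` and `stop`.

    `haplotypes` are equal-length allele strings (e.g. '0'/'1') already
    restricted to chromosomes carrying the core allele of interest. EHH is
    the probability that two of them, drawn at random without replacement,
    are identical over the whole stretch from core to stop.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, stop))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


def ihh(haplotypes, positions, core, cutoff=0.05):
    """Integrate EHH (trapezoid rule over marker positions) rightward from
    `core` until EHH drops below `cutoff`."""
    total, prev = 0.0, ehh(haplotypes, core, core)
    for i in range(core + 1, len(positions)):
        cur = ehh(haplotypes, core, i)
        if cur < cutoff:
            break
        total += (positions[i] - positions[i - 1]) * (prev + cur) / 2
        prev = cur
    return total
```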

From these data the authors infer that the 503F allele was selected for its enhanced transport of ergothioneine, which is lacking in many of the plant foodstuffs which became prominent with the Neolithic Revolution. In other words, Crohn’s disease is a byproduct of an adaptation to nutrient deficiencies brought on by agricultural monocultures. The main problem this thesis seems to have is that many Middle Eastern populations which have long been agricultural don’t have a high frequency of the 503F allele. This doesn’t mean that the selective model proposed here is impossible, but it does indicate that if this was a genuine adaptation, then Middle Eastern populations must have their own distinctive variants.
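The percentages in the table are simple sample allele frequencies: 503F alleles over all alleles sampled. A sketch that recomputes a few of them from the counts in the table and, assuming Hardy-Weinberg proportions, converts each into the expected share of people carrying at least one 503F copy. Several of the samples are small, so these are rough figures:

```python
def allele_frequency(n_derived: int, n_other: int) -> float:
    """Sample allele frequency from allele counts (not individuals)."""
    total = n_derived + n_other
    if total == 0:
        raise ValueError("no alleles sampled")
    return n_derived / total


# 503F and 503L allele counts copied from the table above
samples = {
    "Sardinian": (40, 16),
    "Utah": (40, 56),
    "Druze": (27, 67),
    "Palestinian": (19, 83),
    "Bedouin": (10, 88),
}
for pop, (n_f, n_l) in samples.items():
    p = allele_frequency(n_f, n_l)
    # Under Hardy-Weinberg, carriers of at least one 503F copy = 1 - (1 - p)^2
    print(f"{pop:<12} 503F = {p:.0%}   expected carriers = {1 - (1 - p) ** 2:.0%}")
```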

I think this is a great paper, though I’m not confident about the conclusion. Agriculture was obviously one of the major selective pressures on the human genome. Even if some of the preliminary tests of natural selection from the mid-2000s don’t hold up because they tend to confuse genuine targets of natural selection with spurious positives, I’m rather confident that genes which are associated in some way with agriculture are going to be enriched in terms of functional constraint and adaptive sculpting.

Citation: Chad D. Huff, David Witherspoon, Yuhua Zhang, Chandler Gatenbee, Lee A. Denson, Subra Kugathasan, Hakon Hakonarson, April Whiting, Chad Davis, Wilfred Wu, Jinchuan Xing, W. Scott Watkins, Mike Bamshad, Jonathan P. Bradfield, Kazima Bulayeva, Tatum S. Simonson, Lynn B. Jorde, and Stephen L. Guthery Crohn’s disease and genetic hitchhiking at IBD5, Mol Biol Evol, doi:10.1093/molbev/msr151.

The Pith: What makes rice nice in one varietal may not make it nice in another. Genetically, that is.

Rice is edible and has high yields thanks to evolution; specifically, the artificial selection processes which led to domestication. The “genetically modified organisms” of yore! The details of this process have long been of interest to agricultural scientists because of possible implications for the production of the major crop which feeds the world. And just as many of Charles Darwin’s original insights derived from his detailed knowledge of the breeding of domesticates in Victorian England, so evolutionary biologists can learn something about the general process from the repeated instances of domestication during the Neolithic era.

A new paper in PLoS ONE puts the spotlight on the domestication of rice, and specifically on the connection between particular traits which are the hallmark of domestication and regions of the genome on chromosome 3. These are obviously two different domains: the study and analysis of the variety of traits across rice strains, and the patterns in the genome of an organism. But they are nicely spanned by classical genetic techniques such as linkage mapping, which can pinpoint regions of the genome of possible interest in controlling variation in the phenotype. In this paper the authors used the older techniques as a guide to fix upon regions which might warrant further investigation, and then applied the new genomic techniques. Today we can gain a more detailed, sequence-level picture of the genetic substrate, which in the past was only perceived at a remove through abstractions such as the ‘genetic map.’ Levels and Patterns of Nucleotide Variation in Domestication QTL Regions on Rice Chromosome 3 Suggest Lineage-Specific Selection:

Oryza sativa or Asian cultivated rice is one of the major cereal grass species domesticated for human food use during the Neolithic. Domestication of this species from the wild grass Oryza rufipogon was accompanied by changes in several traits, including seed shattering, percent seed set, tillering, grain weight, and flowering time. Quantitative trait locus (QTL) mapping has identified three genomic regions in chromosome 3 that appear to be associated with these traits. We would like to study whether these regions show signatures of selection and whether the same genetic basis underlies the domestication of different rice varieties. Fragments of 88 genes spanning these three genomic regions were sequenced from multiple accessions of two major varietal groups in O. sativa (indica and tropical japonica) as well as the ancestral wild rice species O. rufipogon. In tropical japonica, the levels of nucleotide variation in these three QTL regions are significantly lower compared to genome-wide levels, and coalescent simulations based on a complex demographic model of rice domestication indicate that these patterns are consistent with selection. In contrast, there is no significant reduction in nucleotide diversity in the homologous regions in indica rice. These results suggest that there are differences in the genetic and selective basis for domestication between these two Asian rice varietal groups.

Here’s what Wikipedia says about the two domestic varieties:

Oryza sativa contains two major subspecies: the sticky, short grained japonica or sinica variety, and the non-sticky, long-grained indica variety. Japonica are usually cultivated in dry fields, in temperate East Asia, upland areas of Southeast Asia and high elevations in South Asia, while indica are mainly lowland rices, grown mostly submerged, throughout tropical Asia….

There’s long been debate about the exact phylogenetic relationship between these two strains of domestic rice. More on that later. With regard to domestication, there are three categories of adaptation we need to focus on: 1) traits which are common to all domestic cereals and tend to crop up almost immediately, 2) traits which are extensions and improvements upon the initial domestic prototype, 3) traits which are regional diversifications, often adaptations to climate. Consider an analogy to horses. The original domestic horse was rather small, and was only fit for drawing chariots. Eventually the breeds became larger, and suitable for cavalry. Finally, there was a diversification by task (e.g., workhorses vs. race horses) and to some extent by climate.

As noted above, classical genetic techniques had previously narrowed down the genetic regions responsible for various domestication traits when comparing japonica to the wild rufipogon. Since domestication usually entails a process of selection, the authors naturally presumed that they might be able to detect signatures of selection within the genome. What are the genomic tells of selection?

There are many, just as there are different types of selection. In this case what we know suggests that, because of category 1, there’s going to be an initial bout of adaptation and a rapid shift from wild diversity to fixed traits suitable for a crop which is going to be controlled by humans. Just as the riotous diversity of the wild varieties becomes constrained to monocultures, so the genetic diversity of the wild type often gets swept away by a few genetic variants which are responsible for the favored traits. So what they might see in the domestic varieties is a sharp reduction of variation around the quantitative trait loci (QTLs) reported earlier, because those QTLs have presumably been the target of selection. In other words, a selective sweep.

That’s what they found. At least in one lineage.

Left to right you have indica, japonica, and rufipogon. Front to back in each chart you see the three QTLs, and the distribution of nucleotide diversities by genetic fragments within these QTLs. The extremely skewed distribution of the domestic varieties in relation to the wild type rufipogon is rather obvious. Additionally, you see a stronger skew in japonica in relation to indica. The skew in the domestic strains is toward a greater proportion of the fragments having very low nucleotide diversity.
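Nucleotide diversity (π) is the quantity being compared across fragments here. A minimal sketch of the simple estimator, the average number of pairwise differences per site among aligned sequences, assuming no gaps or missing data (real pipelines must handle both, and the paper may rely on additional estimators); the function name is illustrative:

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Per-site nucleotide diversity (pi): the average number of differences
    between every pair of aligned sequences, divided by alignment length.
    Assumes equal-length sequences with no gaps or missing data."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


print(nucleotide_diversity(["ACGTACGT", "ACGTACGT", "ACGAACGT"]))
```

A fragment in which every sampled sequence is identical scores zero, which is where a swept region piles up in a distribution like the one described above.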

What could cause this? You need a further piece of information here. The domestic varieties have long regions of the genome characterized by linkage disequilibrium (actually, japonica is so homogeneous that you barely have enough variation to calculate LD!). So particular genetic variants are associated with each other, resulting in long runs of similar sequences, haplotypes. It’s as if a chunk of some ancient chromosome just “blew up” and took over that segment of the genome in japonica.
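For two sites, linkage disequilibrium can be computed from haplotype frequencies as D = p_AB - p_A·p_B, with r² = D² / (p_A(1 - p_A)·p_B(1 - p_B)). The sketch below, with an illustrative function name, also shows why LD is hard to measure in material as homogeneous as japonica: if either site is monomorphic the denominator is zero and r² is undefined.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r^2) between two biallelic sites, or None if undefined.

    `haplotypes` are 2-character strings, one allele per site. The alleles
    on the first haplotype are used as the reference alleles A and B.
    """
    n = len(haplotypes)
    first = haplotypes[0]
    p_a = sum(h[0] == first[0] for h in haplotypes) / n
    p_b = sum(h[1] == first[1] for h in haplotypes) / n
    p_ab = sum(h[0] == first[0] and h[1] == first[1] for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None          # a monomorphic site: no variation, no LD measure
    d = p_ab - p_a * p_b
    return d, d * d / denom


print(linkage_disequilibrium(["AB", "ab", "AB", "ab"]))  # complete association
print(linkage_disequilibrium(["AB", "AB", "Ab"]))        # first site monomorphic
```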

Natural selection could do this. Imagine that an ancestral rufipogon has a genetic variant which confers a domestic trait. It would be selected. Even if it were crossed with other strains carrying other domestic characteristics, its particular QTL would generally be transmitted down to the descendants. But not only would the specific genetic variant which conferred the favored trait be passed on; many of the flanking genomic regions carrying other variants would also be transmitted! This explains the extremely low genetic diversity in japonica: if there’s a sweep up in frequency of a particular ancestral haplotype, then what were polymorphisms in the wild type become monomorphic in the domesticate.

Another explanation though could be that demographic history produced these results. Random genetic drift due to small populations, whether via bottleneck or systematic inbreeding/selfing, can also drive up the frequency of alleles favored by lady-luck and render extinct all others. To check for this the authors constructed a model where japonica and indica went through bottlenecks enforced by the domestication (note that strong selection can drive down population size as well). Even with this model the diversity in japonica in these QTLs remained far too low (though indica’s skew did not reach statistical significance).
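The point of the demographic check is that drift alone also erodes diversity. In a simple model with no mutation or selection, each generation with N diploid individuals multiplies expected heterozygosity by (1 - 1/(2N)). A sketch with arbitrary population sizes, comparing a constant population with one that passes through a bottleneck; it ignores self-fertilization, which in a largely selfing crop would shrink the effective size further:

```python
def expected_heterozygosity(h0, sizes):
    """Expected heterozygosity after drift through a series of generations.

    `sizes` lists the number of diploid individuals in each generation; each
    generation multiplies heterozygosity by (1 - 1/(2N)). No mutation,
    selection, or selfing is modeled.
    """
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h


constant = [10_000] * 1_000
bottleneck = [10_000] * 450 + [50] * 100 + [10_000] * 450
print(expected_heterozygosity(0.01, constant))
print(expected_heterozygosity(0.01, bottleneck))
```

A bottleneck of this kind lowers diversity across the whole genome, whereas a sweep lowers it locally around the selected site. That contrast is why the authors compare the QTL regions against genome-wide levels and against a demographic model, rather than against zero.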

Since both of the domestic strains exhibit traits of domestication, the lack of a selective event in indica at these QTLs does not allow us to infer that no genes were selected for these traits in indica in the past. On the contrary, there certainly were and are such genes. But where are they? The authors moot the possibility that selection did act at the loci under consideration, but was simply missed because it followed a different dynamic which their test might not pick up. For various reasons they are skeptical of this on its own merits, but I think the bigger issue is that the original linkage mapping was performed with japonica vs. wild type strains, so naturally if the two domestic subspecies differed in their genetic architectures, the QTLs relevant to indica would not have been discovered at the same time.

What I’m rather perplexed by is how this comports with the finding by many of the same researchers that the two domestic varietals derive from the same ancestral population, which was domesticated from East Asian wild rice. It could be that the history of domestication is more serial than we know, and that the QTLs common to both japonica and indica have been rendered irrelevant by new adaptations subsequent to their separation. Or, one or the other may have experienced introgression at that locus and so diverged after domestication. Interestingly, figure 7 of the paper shows phylogenetic trees which illustrate the relationships of the alleles associated with each strain. It indicates that indica is not monophyletic for these regions, while japonica is. This means that the japonica variants share a common ancestor from which all are descended. In contrast, the indica variants do not. Such a pattern is consistent with the story of strong positive selection upon a single variant at some time in the past for japonica. From what I can tell they may actually have sent the PLoS ONE paper to the reviewers before the PNAS paper which I reviewed earlier. Because these two papers were published so close to each other they don’t cite each other, though in some ways the PNAS paper would have fleshed out the natural history of domestic rice somewhat. As it is, they kind of leave us hanging in relation to indica.

Why does all of this matter? Yes, agricultural genetics is important for agriculture. But let’s get back to people. There is a hypothesis that man is a ‘self-domesticated’ organism. Whatever quibbles I have with artificial terms like domestication, I do think that there may be broad analogies to be drawn between our own species and the organisms associated with us.

Citation: Xianfa Xie, Jeanmaire Molina, Ryan Hernandez, Andy Reynolds, Adam R. Boyko, Carlos D. Bustamante, & Michael D. Purugganan (2011). Levels and Patterns of Nucleotide Variation in Domestication QTL Regions on Rice Chromosome 3 Suggest Lineage-Specific Selection PLoS ONE : 10.1371/journal.pone.0020670

Image Credit: IRRI Images


Credit: Karl Magnacca

The Pith: In this post I review some findings on patterns of natural selection within the genome of the Drosophila fruit fly. I relate them to very similar findings, though in the opposite direction, in human genomics. Different forms of natural selection and their impact on the structure of the genome are also spotlighted in the course of the review. In particular, I explore how specific methods to detect adaptation at the genomic level may be biased by the assumptions of classical evolutionary genetic models. Finally, I try to place these details in the broader framework of how best to understand evolutionary process in the “big picture.”

A few days ago I titled a post “The evolution of man is no cartoon”. I titled it so because, as the methods become more refined and our data sets more robust, previously held models of how humans evolved, and of evolution’s impact on our genomes, are being revised. Evolutionary genetics at its most elegantly spare can be reduced to several general parameters: drift, selection, migration, etc. Exogenous phenomena such as fluctuations in census size, or environmental variation, have a straightforward relationship to these parameters. But, to some extent the broadest truths are nearly trivial. Getting down to brass tacks, what are these general assertions telling us? We don’t know yet. We’re in a time of transitions, though not troubles. Going back to cartoons, starting around 1970 there was a series of debates which hinged on the role of deterministic adaptive forces versus random neutral ones in evolutionary process. You have probably heard terms like “adaptationist,” “ultra-Darwinian,” and “evolution by jerks” thrown around. All great fun, and certainly ripe “hooks” to draw the public in, but ultimately that phase in the scientific discourse seems to have been beside the point: a transient between the age of Theory, when there were too few empirics, and the present age of Data, when there is too little theory. Biology is a very contingent discipline, and it may be that questions about the power of selection or the relevance of neutral forces will loom large or small depending upon the particular tip of the tree of life to which the question is addressed. Evolution may not be a unitary oracle, but rather a cacophony from which we have to construct a harmonious symphony for our own mental sanity. Nature is one, and the joints which we carve out of nature’s wholeness are for our own benefit.

The age of molecular evolution, ushered in by the work on allozymes in the 1960s, was just a preface to the age of genomics. If Stephen Jay Gould and Richard Dawkins were in their prime today, I wonder if the complexities of the issues at hand would be too much even for their verbal fluency when it came to formulating a concise quip with which to skewer an intellectual antagonist. Complexity does not provide fodder for honest quips and barbs. You’re just as liable to wound your own side through clumsy rhetoric in the thicket of the data, which fires in all directions.

In any case, on this weblog I may focus on human genomics, but obviously there are other organisms in the cosmos. Because scientific funding favors biomedical applications, humans have now come to the fore, but there is still utility in surveying the full taxonomic landscape. As it happens, a paper in PLoS Genetics, which I noticed last week, is a perfect complement to the recent work on human selective sweeps. Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans:

In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence—in particular, conclusions about the rate and strength of beneficial substitutions—remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ~13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.
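The central summary in the abstract above, average neutral diversity as a function of distance from a class of substitutions, can be illustrated with a minimal Python sketch. It covers only the binned averaging that produces such a curve, not the authors' coalescent-based likelihood model, and it assumes per-site diversity values and substitution positions are already in hand; the function name `diversity_trough`, its inputs, and the toy data are hypothetical.

```python
import math


def diversity_trough(pi, subs, max_dist, bin_size):
    """Mean per-site diversity in distance bins around substitution positions.

    pi       -- per-site neutral diversity, indexed by position (hypothetical input)
    subs     -- positions of one class of substitution, e.g. amino acid changes
    max_dist -- largest distance (in sites) considered on either side
    bin_size -- width of each distance bin, in sites
    Bin k averages sites at distance d with k*bin_size <= d < (k+1)*bin_size.
    """
    n_bins = max_dist // bin_size
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for site in subs:
        lo, hi = max(0, site - max_dist + 1), min(len(pi), site + max_dist)
        for pos in range(lo, hi):
            d = abs(pos - site)
            k = d // bin_size
            if d == 0 or k >= n_bins:
                continue  # skip the substituted site itself and any overflow
            sums[k] += pi[pos]
            counts[k] += 1
    return [t / c if c else math.nan for t, c in zip(sums, counts)]


# Toy data: diversity is depressed within 100 sites of position 500.
pi = [0.01 * min(1.0, abs(i - 500) / 100) for i in range(1000)]
curve = diversity_trough(pi, subs=[500], max_dist=200, bin_size=20)
print([round(x, 4) for x in curve])
```

Running the same function on synonymous substitution positions gives the control curve; a trough that appears only around amino acid substitutions is the signature the abstract describes.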

Figure 1C shows the top-line result. As you can see, there’s a “trough” in diversity around non-synonymous substitutions. Non-synonymous simply means that a base pair substitution at that position within the codon changes the amino acid encoded. In contrast, a synonymous change does not. A substitution is not just a mutant variant, though. It is rather a population-level shift from one allele to another. Neutral theory posited that most substitutions were driven not by natural selection but by random walk processes. Ergo, most evolutionary change was not adaptive. A simple way to check the power of selection against this background of stochastic variation is to measure the ratio of substitution rates at non-synonymous versus synonymous bases. But this sort of thing is more appropriate when comparing closely related species. The paper on selective sweeps in humans obviously wasn’t doing that; the authors were looking within one species. Instead they looked at the reduction of variation across regions which may have been targets of natural selection. The reduction occurs because when one particular allele becomes the target of strong positive selection it pulls along adjacent linked regions in a “hitchhiking” process. Recombination works against this, so the linkage disequilibrium which spikes in the wake of selection decays over time.
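The synonymous versus non-synonymous distinction can be made concrete with a short Python sketch using the standard genetic code. The helper names are hypothetical, and the counts are raw tallies, not a dN/dS ratio: a proper ratio also divides each count by the number of synonymous and non-synonymous sites (as in the Nei-Gojobori method) and handles codons that differ at more than one position, which this sketch simply skips. It assumes uppercase or lowercase A/C/G/T input only.

```python
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order, "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon, position, new_base):
    """Label a single-base change at `position` (0-2) of a codon."""
    codon = codon.upper()
    mutant = codon[:position] + new_base.upper() + codon[position + 1:]
    if CODON_TABLE[codon] == CODON_TABLE[mutant]:
        return "synonymous"
    return "non-synonymous"


def count_differences(seq_a, seq_b):
    """Crude tallies of synonymous vs non-synonymous differences between two
    aligned, equal-length coding sequences; codons differing at more than
    one position are ignored."""
    counts = {"synonymous": 0, "non-synonymous": 0}
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        a, b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        diffs = [j for j in range(3) if a[j] != b[j]]
        if len(diffs) == 1:
            counts[classify_change(a, diffs[0], b[diffs[0]])] += 1
    return counts


print(classify_change("GCT", 2, "C"))  # Ala -> Ala: synonymous
print(classify_change("GCT", 0, "A"))  # Ala -> Thr: non-synonymous
print(count_differences("ATGGCTAAA", "ATGGCCAGA"))
```

Third codon positions are where most synonymous changes fall, which is why they are often treated as a rough neutral baseline.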

But these conceptions are predicated on a simple model of the emergence of variants, and of the way selection does, or doesn’t, target these variants. One imagines a new mutant which arises against the ancestral genetic background. In a single-gene model the probability of fixation of a neutral mutant, that is, of it going to ~100% frequency and becoming a substitution in the population, is 1/N (or 1/(2N) for diploids). In plain English, the fixation probability for a neutral mutant is inversely proportional to the effective population size. In contrast, the probability of fixation of a mutant which is selectively favored is roughly proportional to its selection coefficient, which measures its proportional fitness advantage. The fixation of neutral variants is a random walk, and the time until fixation is directly proportional to population size. In contrast, selectively favored variants can sweep to fixation rather quickly. Being very conservative, one can infer that the spread of lactose tolerance in Northern Europeans due to a mutation at the LCT locus took ~7,000 years, or a little less than 300 generations. Because of this rapidity recombination has far less leisure with which to “chop” apart the physical associations of variants on the mutant’s ancestral genetic background. No wonder the LCT locus has one of the longest “haplotype blocks” in the European genome, a long run of associated markers.
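To put numbers on the contrast, here is a minimal Python sketch assuming a single diploid population of constant size N (treated as the effective size) and additive fitness effects, with s as the heterozygote's advantage (fitnesses 1, 1+s, 1+2s). It uses Kimura's diffusion approximation for fixation probability, which reduces to 1/(2N) when s = 0 and to roughly 2s (Haldane's approximation) when selection is strong relative to drift, plus a deterministic recursion for how long a sweep takes. The function names and parameter values are illustrative, not estimates for any real locus.

```python
import math


def fixation_probability(N, s):
    """Kimura's approximation for one new copy of an additive allele in a
    diploid population, assuming effective size equals census size N.
    s is the heterozygote's fitness advantage (homozygote gets 2s)."""
    p = 1.0 / (2 * N)  # initial frequency of a single new mutant copy
    if s == 0:
        return p  # neutral case: 1/(2N)
    # (1 - exp(-4Nsp)) / (1 - exp(-4Ns)), written with expm1 for precision
    return math.expm1(-4 * N * s * p) / math.expm1(-4 * N * s)


def sweep_generations(N, s):
    """Generations for a deterministic additive sweep (fitnesses 1, 1+s, 1+2s)
    to go from 1/(2N) to 1 - 1/(2N); drift is ignored, s must be > 0."""
    p, target, t = 1.0 / (2 * N), 1.0 - 1.0 / (2 * N), 0
    while p < target:
        q = 1.0 - p
        w_bar = p * p * (1 + 2 * s) + 2 * p * q * (1 + s) + q * q
        p = (p * p * (1 + 2 * s) + p * q * (1 + s)) / w_bar
        t += 1
    return t


N, s = 10_000, 0.01  # illustrative values only
print(fixation_probability(N, 0))  # neutral: 1/(2N)
print(fixation_probability(N, s))  # close to Haldane's 2s
print(sweep_generations(N, s))     # on the order of (2/s) ln(2N) generations
print(4 * N)                       # mean neutral fixation time, ~4N generations
```

With these illustrative numbers the favored allele is a few hundred times more likely to fix than a neutral one, and the deterministic sweep takes on the order of two thousand generations, versus roughly 40,000 for a neutral allele that happens to fix.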

But let’s modify our mental model a bit. Imagine that a genetic variant has been floating around at a low frequency for a long time. There may be many copies of the mutant, associated with different genetic variants due to the impact of recombination. We can, for example, imagine a recessively deleterious allele which persists at low frequencies because selection against it is inefficient (most copies are found in heterozygous individuals with normal fitness). Many variants have multiple effects. Imagine that this allele also has a dominant phenotypic effect which goes from being neutral to being very selectively favored. Now you have a situation where the genomic region will be dragged upward in frequency during adaptation, but there will be many genetic backgrounds carrying the allele, not just one. Concretely, if the selective event occurred only a few generations after the original mutation arose, the impact on the local genome in terms of homogenization would be much stronger than if the event occurred dozens of generations later, as by then the original genetic background would have been recombined and so lost its distinctive coherency.
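The erosion of the original background can be put in numbers with the textbook decay of linkage disequilibrium under random mating, D_t = D_0(1 - r)^t. The Python sketch below is a minimal illustration assuming no drift, selection, or mutation; the function name and the marker roughly 1 cM away (r = 0.01) are hypothetical.

```python
def ld_after(d0, r, t):
    """Linkage disequilibrium left after t generations of random mating,
    D_t = D_0 * (1 - r)**t, for two sites with recombination fraction r.
    Ignores drift, selection and mutation."""
    return d0 * (1 - r) ** t


# Hypothetical marker ~1 cM (r = 0.01) from a newly arisen variant that starts
# in complete association with it (D_0 scaled to 1).
for generations in (5, 50, 500):
    print(generations, round(ld_after(1.0, 0.01, generations), 3))
```

The older the variant when selection starts acting on it, the less of its original haplotype is left to be dragged upward, which is why sweeps from standing variation leave a fainter footprint than the classic case.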

This is a form of natural selection from “standing variation”: old mutants floating around in the background noise, rather than new mutants. In the paper above the authors find a fair number of conventional selective sweeps, but they suggest that the higher estimates of the proportion of the genome under natural selection found by some researchers in Drosophila may be due to the fact that some methods catch the whole basket of selection, while others focus on more tractable “cartoon” models.

Among the selection which can be modeled as a classic selective sweep, the authors also found a “power law” effect: a combination of a few hits of powerful selection and more numerous bouts of weak selection. This is not totally unexpected according to theory. Some of the human traits which have been amenable to genome-wide association, such as pigmentation, probably fall under this category. Most of the trait variance is due to a few genes of large effect, but a larger number of loci account for the remaining minority of the variance. The same no doubt can hold across evolutionary time with the dynamics of natural selection.
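The architecture described above, where a few large-effect loci account for most of the variance, can be illustrated with the standard additive variance of a biallelic locus, 2pq a^2. This Python sketch assumes Hardy-Weinberg proportions, purely additive effects, and no linkage disequilibrium between loci; the allele frequencies and effect sizes are invented for illustration and are not estimates for pigmentation or any other real trait.

```python
def additive_variance(p, a):
    """Additive genetic variance from one biallelic locus under Hardy-Weinberg
    proportions: 2pq * a^2, with a the per-allele effect."""
    return 2 * p * (1 - p) * a ** 2


# Hypothetical architecture: two large-effect loci plus twenty small-effect
# loci, each given as (allele frequency, per-allele effect).
loci = [(0.5, 1.0), (0.3, 0.8)] + [(0.5, 0.1)] * 20

variances = sorted((additive_variance(p, a) for p, a in loci), reverse=True)
share_top_two = sum(variances[:2]) / sum(variances)
print(f"{share_top_two:.0%} of additive variance from the two largest loci")
```

Because effect size enters squared, a handful of large-effect loci can dominate the variance even when small-effect loci are far more numerous.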

But we also shouldn’t get lost in the genomic trees and lose sight of the forest. Not only are evolutionary processes subject to molecular-scale parameters such as recombination and mutation rates, but they are also impacted by organism- and population-scale parameters. One presumes that fruit flies are subject to different pressures and have had a different history from human beings, just as both have from philopatric amphibians. Humans have an enormous census size, and we’ve undergone a massive change in lifestyle over the last 10,000 years. But as land-bound mammals we may exhibit more population substructure than some species, for example birds with a wide range. Additionally, because of a low long-term effective population size we have only so much genic variation to work with. Such a welter of details frustrates attempts at elegance, but the details need to be kept in mind.
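The gap between an enormous census size and a low long-term effective population size can be illustrated with the standard approximation that, across generations, effective size behaves like the harmonic mean of the per-generation sizes, so small sizes dominate. The size trajectory in this Python sketch is hypothetical, and the sketch ignores the other factors (sex ratio, variance in offspring number, substructure) that also depress Ne.

```python
from statistics import harmonic_mean, mean

# Hypothetical trajectory: 999 generations at 10,000, then one generation
# after an explosive census expansion to 7 billion.
sizes = [10_000] * 999 + [7_000_000_000]

print(f"arithmetic mean size: {mean(sizes):,.0f}")
print(f"long-term Ne (harmonic mean): {harmonic_mean(sizes):,.0f}")
```

Even a 700,000-fold census expansion in the final generation barely moves the harmonic mean, while it dominates the arithmetic mean, which is one way to see how a species can be numerous yet carry limited genetic variation.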

The authors conclude:

In summary, our findings establish a distinctive, genome-wide signature of adaptation in D. simulans, suggesting that many amino acid substitutions are beneficial and are driven by two classes of selective effects. Enabled by a richer summary of diversity patterns that avoids an a priori choice of scale, these conclusions offer a coherent interpretation of the results of previous inferences. It will now be interesting to see whether similar findings emerge in other Drosophila species, which vary in their recombination rates, effective population sizes, and ecology.

I wouldn’t limit this just to Drosophila. Because the different fruit fly species have different distributions, natural histories, as well as common ancestral traits and genes, they’re an excellent laboratory of evolution. But eventually we’ll start sweeping our gazes across all the multitudinous branches of the tree of life. Soon.

Citation: Sattath S, Elyashiv E, Kolodny O, Rinott Y, & Sella G (2011). Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans PLoS Genetics : 10.1371/journal.pgen.100130

I was semi-offline for much of last week, so I only heard in passing about the “Science paper” on which Molly Przeworski is an author. Now that I’ve finally read it front to back, it seems rather a complement to other papers, addressed to both man and beast. The major “value add” seems to be the extra juice they squeezed out of the data because they looked at full genome sequences, instead of just genotypes. As I occasionally note, the chips are marvels of technology, but the markers which they are geared to detect are tuned to the polymorphisms of Europeans.

Classic Selective Sweeps Were Rare in Recent Human Evolution:

Efforts to identify the genetic basis of human adaptations from polymorphism data have sought footprints of “classic selective sweeps” (in which a beneficial mutation arises and rapidly fixes in the population). Yet it remains unknown whether this form of natural selection was common in our evolution. We examined the evidence for classic sweeps in resequencing data from 179 human genomes. As expected under a recurrent-sweep model, we found that diversity levels decrease near exons and conserved noncoding regions. In contrast to expectation, however, the trough in diversity around human-specific amino acid substitutions is no more pronounced than around synonymous substitutions. Moreover, relative to the genome background, amino acid and putative regulatory sites are not significantly enriched in alleles that are highly differentiated between populations. These findings indicate that classic sweeps were not a dominant mode of human adaptation over the past ~250,000 years.

Figure 2 shows the top-line result. There are certain mutations which are “non-synonymous,” in that they change the amino acid encoded by the codon. Others are “synonymous,” insofar as changing the base pair leaves the encoded amino acid unchanged and so has no direct effect on the protein. Since natural selection “sees” function, the expectation is that it would impact the two types of substitutions differently. More specifically, synonymous bases should be relatively “neutral” in terms of their rate of change vis-a-vis non-synonymous bases, which may be affected by both positive and negative selective forces.

A “classic sweep” is a very easy dynamic to imagine. A single mutation arises which is strongly favored and so is driven to “fixation,” ~100%, within the population rather rapidly by positive directional selection. Since the mutation is embedded in the broader genome, natural selection will also “catch” other variants associated with the mutant of interest, to a degree that declines with their distance from it and with the local rate of recombination. Selective sweeps then produce regions of relative homogenization, as a whole block of the ancestral genomic background around the favored mutant is dragged upward in frequency. The interesting point in this paper is that the authors show relatively little difference in the pattern between functionally significant and non-significant sites. As the classic sweep models are predicated upon strong positive selection operating upon a favored variant, something seems off.

What does this mean? Selective sweeps are a tractable dynamic. If they’re not so ubiquitous, then human evolutionary genetics becomes a rather more complex game, with different varieties of natural selection operative. By analogy, perhaps this is similar to the unfortunate reality that the “common disease-common variant” model seems to be only marginally fruitful.

Now, it does turn out that some traits do seem to have been driven by conventional sweeps: pigmentation, infectious disease resistance, and lactase persistence. No surprise that these are also traits whose genetic architectures have been relatively well elucidated. Finally, I find this passage intriguing:

…Thus, to dissect the genetic basis of human adaptations and assess what fraction of the genome was affected by positive selection, we need new tests to detect other modes of selection, such as comparisons between closely related populations that have adapted to drastically different environments….

I have a candidate dyad in mind: Papuans and Australian Aborigines. They separated as distinct populations within the past 10,000 to 20,000 years, and have diverged greatly in their mode of existence with the spread of horticulture in the highlands of New Guinea.

Citation: Hernandez RD, Kelley JL, Elyashiv E, Melton SC, Auton A, McVean G, 1000 Genomes Project, Sella G, & Przeworski M (2011). Classic selective sweeps were rare in recent human evolution. Science (New York, N.Y.), 331 (6019), 920-4 PMID: 21330547

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"