The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog / Soft Sweeps


In the image to the left you see three human males. You can generate three pairings of these individuals. When comparing these pairs which would you presume are more closely related than the other pairs? Now let me give you some more information. The rightmost image is of the president of Tanzania. The middle image is of the president of Taiwan (Republic of China). And finally, the leftmost image is of the prime minister of Papua New Guinea. With this information you should now know with certainty that the prime minister of Papua New Guinea and the president of Taiwan are much more closely related than either is to the president of Tanzania. But some of you may not have guessed that initially. Why? I suspect that physical inspection may have misled you. One of the most salient visible human characteristics is the complexion of our largest organ, the skin. Its prominence naturally leads many to mistakenly infer relationships where they do not exist.

This was certainly an issue when European explorers encountered the peoples of Melanesia. An older term for Melanesians is “Oceanic Negro,” and some sources suggest that the Spaniards who named the island New Guinea did so with an eye to the old Guinea on the coast of West Africa. To the left is an unrooted tree which illustrates the relationships between Papuans, Bantu from Kenya, and Han Chinese. Since the font is small I’ve underlined the focal populations in red. Africans are always the “outgroup” to any two non-African populations. This is a robust pattern whenever you look at averaged total genome phylogenies. In other words, when you don’t privilege particular genes in a phylogeny humanity can be divided into African and non-African branches.

But, if you look at pigmentation genes you get a different picture altogether. As it happens, not only is variation in skin color a trait of great social importance, but it turns out to be one of the few phenotypes whose genetic architecture has been well elucidated by genomics. There are about half a dozen genes responsible for most of the between-population variation in complexion. Populations far from the equator seem to have developed parallel means toward lighter skins, while those near the equator are more likely to exhibit similarities. In other words, the phylogeny of these specific genes is out of sync with the average phylogeny of the genes in these populations. It is the latter which is a good reflection of demographic history, not the former.

A new paper in PLoS Genetics looks at these sorts of parallel trends more broadly, not just focusing on one trait. In particular, the authors explore the possibilities of natural selection operating upon standing genetic variation across divergent lineages. This means that there need not be a novel mutation which is driven up toward fixation rapidly by positive selection in a “hard sweep.” Rather, as populations diversify they may be subject to selection pressures which take their extant genetic variation and shift the mean of the quantitative trait in a particular direction, altering the balance of underlying allele frequencies rather than substituting novel genetic variants at one or two loci. These are “soft sweeps.”

First, the author summary, Parallel Adaptive Divergence among Geographically Diverse Human Populations:

Identifying regions of the human genome that differ among populations because of natural selection is both essential for understanding evolutionary history and a powerful method for finding functionally important variants that contribute to phenotypic diversity and disease. Adaptive events on timescales corresponding to the human diaspora may often manifest as relatively small changes in allele frequencies at numerous loci that are difficult to distinguish from stochastic changes due to genetic drift, rather than the more dramatic selective sweeps described by classic models of natural selection. In order to test whether a substantial proportion of interpopulation genetic differences are indeed adaptive, we identify loci that have undergone moderate allele frequency changes in multiple independent human lineages, and we test whether these parallel divergence events are more frequent than expected by chance. We report a significant excess of polymorphisms showing parallel divergence, especially within genes, a pattern that is best explained by geographically varying natural selection. Our results indicate that local adaptation in humans has occurred by subtle, repeated changes at particular genes that are likely to be associated with important morphological and physiological differences among human populations.

The statistics in this paper can be a bit daunting, but the basic logic is simple. The HGDP data set has a lot of SNP information on ~50 populations. These populations also exhibit variation in their phylogenetic relationships. We know, for example, that Amerindian populations are closer to East Asians than they are to Europeans. The authors pruned their population set down to very genetically distinctive groups: those which don’t have too much admixture and are in ecologically unambiguous regions (so discard the Uyghur). For example, Europeans and East Asians in temperate climes, Pygmies and Papuans in the tropics. Comparing two pairs which were phylogenetically unrelated but ecologically distinctive in a similar manner, they found broad evidence of parallel shifts in underlying allele frequency on a range of SNPs.

Remember, these are polymorphisms found in all populations. So natural selection is perturbing the frequencies around an average. Additionally, they focused on alleles with intermediate global frequency, so that one presumes there’s enough genetic variance for selection to be effective. Theoretically and through simulation the authors understand that, by chance alone, a certain number of SNPs would be correlated in the manner which would imply parallel positive selection, and so possible convergence of trait values. But the authors found that for several comparisons across groups there was an excess of detected SNPs over that chance expectation. And, the distribution across regions of the genome for these detected SNPs is very suggestive. There was an excess of SNPs in coding regions of the genome. And, there was an even greater excess on base pairs where a change in state would result in a change in the protein! In other words, regions of the genome implicated in genuine function show more hints of convergence across unrelated lineages.
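To make the "excess over chance" logic concrete, here is a minimal sketch in Python. It is not the authors' actual statistic. The function names, the 0.3 frequency threshold, and the toy allele frequencies are all my own assumptions for illustration. Each SNP gets four frequencies (two temperate/tropical pairs from unrelated lineages), and the count of SNPs shifting the same way in both pairs is compared with the count expected if the two pairs diverged independently.

```python
def divergence_direction(freq_1, freq_2, min_diff=0.3):
    """Return +1 or -1 if the two populations differ by at least min_diff, else 0."""
    diff = freq_1 - freq_2
    if diff >= min_diff:
        return 1
    if diff <= -min_diff:
        return -1
    return 0


def parallel_excess(snps, min_diff=0.3):
    """snps: list of (temperate_a, tropical_a, temperate_b, tropical_b) frequencies.

    Returns (observed, expected): the observed number of SNPs diverging in the
    same direction in both pairs, and the number expected if the direction in
    pair A were independent of the direction in pair B.
    """
    n = len(snps)
    dirs_a = [divergence_direction(s[0], s[1], min_diff) for s in snps]
    dirs_b = [divergence_direction(s[2], s[3], min_diff) for s in snps]
    observed = sum(1 for a, b in zip(dirs_a, dirs_b) if a != 0 and a == b)
    expected = sum(dirs_a.count(d) * dirs_b.count(d) for d in (1, -1)) / n
    return observed, expected


# Hypothetical toy data, not taken from the HGDP.
toy_snps = [
    (0.9, 0.4, 0.8, 0.3),    # up in both pairs: parallel
    (0.2, 0.7, 0.1, 0.6),    # down in both pairs: parallel
    (0.5, 0.5, 0.6, 0.5),    # no real divergence in either pair
    (0.8, 0.3, 0.5, 0.5),    # diverges in pair A only
    (0.4, 0.5, 0.2, 0.7),    # diverges in pair B only
    (0.5, 0.45, 0.5, 0.55),  # no real divergence in either pair
]

observed, expected = parallel_excess(toy_snps)
print(f"observed parallel SNPs: {observed}, expected under independence: {expected:.2f}")
```

With real data you would want a proper null that accounts for drift and shared ancestry, which is what the paper's simulations are for. The sketch only shows the shape of the comparison.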

They also found particular patterns in the genes which were enriched for parallel selection:

Genes overlapping parallel divergent SNPs were modestly enriched for diverse functional categories associated with various cell types including neurons, lymphocytes, cancer, and epithelium…Among the most extreme parallel divergent genes (observed at a threshold of 0.5%) were the skin keratinization gene ABCA12…SH2B1, which controls serum leptin levels and body weight…GRM5, a glutamate receptor associated with schizophrenia…and with pigmentation via the closely linked TYR…ATP2A2, which causes a neuropsychiatric/keratinization disorder…F13A1, a coagulation factor linked to numerous cardiovascular diseases and to Alzheimer’s…and IFIH1, associated with antiviral defense, type 1 diabetes, and psoriasis...The pleiotropic nature of many of these genes suggests that selection on one trait may have affected the evolution of other traits.

On the last part: there’s a “correlation matrix” between genetic variance and trait variance. If you slam the genome with natural selection there will usually be a correlated response on a host of traits unrelated to the target of selection because of the complex contingent nature of biological pathways. Modulating gene X to shift the value of trait 1 to increase local fitness can have large consequences for trait 2, trait 3, trait 4, and so forth.

Of course most of the SNPs detected are not targets of selection. Remember, one assumes that simply due to random chance some SNPs will exhibit patterns which spuriously match those of regions which are the targets of parallel selection pressures. Rather, the importance of this paper is that it is another step toward fleshing out the broader general theory of how adaptation and demographic events interplay across the arc of human history. It was always understood that convergent evolution is a force which must have shaped humans as they diversified and radiated across the world, but the genetic details were often left unspecified for various reasons. By filling in those details we may be able to stumble upon some very interesting general insights about the parameters which frame evolutionary process.

For example:

South Americans may carry alleles adapted to temperate climates due to their ancestral migration across Beringia, and they may have lacked adequate time and/or genetic variation to completely re-adapt to a tropical environment. One SNP that fits this hypothesis lies in DDB1, which protects the skin from solar UV exposure…and is one of the strongest examples of this parallel divergence pattern, with one allele fixed in South America, over 90% in Europe and East Asia, and less than 40% in Africa and Oceania….

A biological anthropologist once told me that South American Indians look like Siberians in their bodily proportions in relation to other tropical people. Which makes sense since they are probably the descendants of Siberians! As for skin color, this is an interesting trait insofar as it looks like our species evolved very dark skin relatively early in our history, at the point when we lost our fur. Tropical populations usually exhibit a modicum of functional constraint. They don’t deviate too far from the ancestral type. In contrast, temperate zone populations often “lose” the function on these pigment producing genes, though differently. To “break” a gene is far easier than to put it back together, and I suspect that’s what you’re seeing with Amerindians in the tropics. A small ancestral population which traversed Beringia only carried a non-functional copy, which probably accumulated many mutations. Once they reentered the tropical zone they “needed” function again, but that would take too many independent steps for 10-15,000 years to suffice.

As noted by the authors this was definitely a first pass. With thicker sequence level data, and better population coverage, presumably one could explore more fine-grained questions. But at least there are results which confirm what one always assumed in theory. Sometimes it is just good to do a check, because actually you never really know….

Citation: Tennessen JA, & Akey JM (2011). Parallel Adaptive Divergence among Geographically Diverse Human Populations PLoS Genetics

(Republished from Discover/GNXP by permission of author or representative)
 

Natural selection happens. It was hypothesized in copious detail by Charles Darwin, and has been confirmed in the laboratory, through observation, and also by inference via the methods of modern genomics. But science is more than broad brushes. We need to drill down to a more fine-grained level to understand the dynamics with precision and detail, and so generate novel inferences which may then be tested. For example, there are various flavors of natural selection: stabilizing selection, negative selection, and positive directional selection. In the first case natural selection buffets the phenotype about an ideal mean, in the second case deleterious phenotypes and their associated alleles are purged from the genome, and finally, natural selection can also drive a novel trait toward greater prominence, and concomitantly the allelic variants which are associated with the fitter phenotype.

The last case is of particular interest to many because it is often through positive natural selection that evolution as descent with modification occurs. Over time trait values and the nature of traits themselves shift such that a lineage changes its character beyond recognition. This phyletic gradualism and the scale independence of evolutionary process have been challenged, in particular from the domain of developmental biology (albeit not by all, or even most, developmental biologists). But ultimately no one doubts that a classical understanding of evolution as change in allele frequency, often driven by natural selection, is part of the larger puzzle of how the tree of life came to be.

One of the phenomena associated with positive directional evolution is the selective sweep. How a selective sweep occurs, and its consequences, are rather straightforward. A genome consists of a sequence of base pairs (e.g., we have 3 billion base pairs). If a new mutation emerges at a particular base pair, a novel single nucleotide polymorphism (SNP), and that allelic variant is ~10% fitter than the ancestral variant, natural selection could drive up its frequency (the conditionality is due to the fact that in all likelihood it would still go extinct because of the power of stochastic forces when a mutant is at low frequency). So the variant could in theory shift from ~0% (1 out of N, N being the number of individuals in a population, 2N if diploid, and so forth) to ~100%. This would be the fixation of the novel variant, driven by selective dynamics. So what’s the sweep aspect? The sweep in this case refers to the effect of the very rapid rise in frequency of the SNP in question on the adjacent genomic region. What is termed a genetic hitchhiking dynamic results if the sweep occurs rapidly, so that nearby regions of the genome also move to fixation along with the favored SNP. But in a diploid organism with sexual reproduction genetic recombination persistently breaks apart associations across the physical genome. Therefore the span of the sequence of genetic markers near a favored SNP which form a haplotype is dependent on the rate of recombination as well as the rate of the rise in frequency of the allele, which is contingent on the strength of selection. A powerful selective sweep has the effect of homogenizing wide regions of the genome flanking the favored mutant; in other words the sweep “cleans” the gene pool of variation as one very long haplotype replaces many shorter haplotypes.
As an example, in the genomes of Northern Europeans the locus LCT is characterized by a very long haplotype, which itself seems to correlate well with the trait of lactase persistence. The implication here is that the lactase persistence conferring variant arose relatively recently, and was swept up to near fixation by positive directional natural selection.
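The parenthetical point above, that even a ~10% fitter mutant will most likely go extinct, is easy to check with a toy simulation. What follows is a minimal sketch under my own assumptions: a haploid Wright-Fisher population of constant size, a single new mutant copy, and made-up function names and parameter values. It is not drawn from either paper. The textbook rule of thumb is that such a mutant fixes with probability of roughly 2s when s is small, so about one time in five or six here.

```python
import random


def simulate_new_mutation(pop_size, s, rng):
    """Follow one new mutant copy in a haploid Wright-Fisher population.

    Returns True if the mutant fixes, False if it is lost.
    """
    count = 1
    while 0 < count < pop_size:
        p = count / pop_size
        # Selection reweights the mutant's share of the next generation.
        p_selected = p * (1 + s) / (1 + p * s)
        # Drift: each of the pop_size offspring draws its allele at random.
        count = sum(1 for _ in range(pop_size) if rng.random() < p_selected)
    return count == pop_size


def fixation_fraction(pop_size=200, s=0.1, trials=400, seed=1):
    """Fraction of independent new mutants that sweep to fixation."""
    rng = random.Random(seed)
    fixed = sum(simulate_new_mutation(pop_size, s, rng) for _ in range(trials))
    return fixed / trials


print(f"fraction of new mutants that fixed: {fixation_fraction():.3f}")
```

Most runs end within a handful of generations with the mutant gone. The rare survivors are the ones that leave the long-haplotype signature behind.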

That’s the broad theory. But as you know, evolution and its subcomponents are more than “just a theory,” they’re a set of models which are amenable to testing, whether through observation, or via controlled laboratory experiments. A new letter to Nature elaborates how exactly selective sweeps play out in Drosophila melanogaster, a classic “model organism.” Interestingly, this is a case of experimental evolution, something we are more familiar with from Richard Lenski’s E. coli work. Genome-wide analysis of a long-term evolution experiment with Drosophila:

Experimental evolution systems allow the genomic study of adaptation, and so far this has been done primarily in asexual systems with small genomes, such as bacteria and yeast…Here we present whole-genome resequencing data from Drosophila melanogaster populations that have experienced over 600 generations of laboratory selection for accelerated development. Flies in these selected populations develop from egg to adult ~20% faster than flies of ancestral control populations, and have evolved a number of other correlated phenotypes. On the basis of 688,520 intermediate-frequency, high-quality single nucleotide polymorphisms, we identify several dozen genomic regions that show strong allele frequency differentiation between a pooled sample of five replicate populations selected for accelerated development and pooled controls. On the basis of resequencing data from a single replicate population with accelerated development, as well as single nucleotide polymorphism data from individual flies from each replicate population, we infer little allele frequency differentiation between replicate populations within a selection treatment. Signatures of selection are qualitatively different than what has been observed in asexual species; in our sexual populations, adaptation is not associated with ‘classic’ sweeps whereby newly arising, unconditionally advantageous mutations become fixed. More parsimonious explanations include ‘incomplete’ sweep models, in which mutations have not had enough time to fix, and ‘soft’ sweep models, in which selection acts on pre-existing, common genetic variants. We conclude that, at least for life history characters such as development time, unconditionally advantageous alleles rarely arise, are associated with small net fitness gains or cannot fix because selection coefficients change over time

Critical to understanding what’s going on here is the distinction they make between ‘classic’ ‘hard sweeps’ and ‘soft sweeps.’ Hard sweeps follow the spare description I outlined above:

1) A new mutant arises in the genetic background

2) Selection favors the mutant

3) The mutant rises in frequency and sweeps to fixation, 0% → 100%, replacing the ancestral variants

In contrast, for a soft sweep:

1) Selection favors a set of minor polymorphisms already segregating in the gene pool

2) These polymorphisms rise in frequency

3) But they may not sweep to fixation

In the first case the signature of natural selection will be clear, distinct, and indubitable. A novel haplotype which has replaced the ancestral variants and produced a wide region of genetic homogeneity as all other allele states are expunged by the sweep will have resulted. That isn’t what they saw at the genomic level.

But first, what did they do? The flies used in this experiment derive from a 30-year-old lineage, and they selected them for 600 generations in the case of the treatments which were being driven to new phenotype values. 600 generations for humans would be about 15,000 years assuming 25 years per generation. If a trait is heritable, and you select offspring deviated away from the mean, over time you will see a shift in the trait value. This is classic quantitative genetics, and that’s what they saw. They had five lineages which exhibited accelerated development (ACO), and five which were controls which exhibited the ancestral phenotypes (CO). “Eclosion” refers to the fly’s emergence from the pupa. The lineages which were subject to selection had very different life histories from the control groups. The cluster of traits here shouldn’t be too surprising, we know from other taxa that short-lived fast-developing species tend to be smaller and metabolically more under-the-gun than the inverse.
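The "classic quantitative genetics" here is usually summarized by the breeder's equation: the per-generation response R equals the heritability h² times the selection differential S. A minimal sketch, with the caveat that the heritability and the selection differential below are hypothetical numbers I picked for illustration, not values from the experiment:

```python
def response_to_selection(heritability, selection_differential):
    """Breeder's equation: R = h^2 * S, the expected shift in the trait mean per generation."""
    return heritability * selection_differential


# Hypothetical values: narrow-sense heritability of 0.3, and selected parents
# that develop 10 hours faster than the population mean.
h2 = 0.3
S = -10.0
R = response_to_selection(h2, S)
print(f"expected shift in mean development time per generation: {R:.1f} hours")

# The human-scale comparison from the text: 600 generations at 25 years each.
generations = 600
years_per_human_generation = 25
print(f"{generations} generations is about {generations * years_per_human_generation:,} human years")
```

Over hundreds of generations heritability and the selection differential won't stay constant, so this is only good for a sense of the per-generation logic.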

But the really interesting aspects of this study are not the phenotypes. Who hasn’t seen weird things among the Drosophila? That’s one of the reasons they were chosen as model organisms in the first place! Rather, they explored the patterns of genomic variation within and across the lineages, and integrated the results into a broader theoretical framework of how evolutionary processes occur, and their implications for the genome-wide structure one should see. Below I’ve stitched together figure 2 & 3, which illustrate particular patterns of genomic variation.


The left figure shows differences in allele frequencies between the ACO and CO pooled lineages. The spikes indicate large differences, with the dotted line representing the threshold where there’s a 0.1% random chance of such a between-population frequency difference. The vertical axis is log-scaled. The grey line at the bottom indicates the differences between one particular ACO lineage and the pooled ACO sample. In the right panel you see heterozygosities, with blue denoting the CO lineages, and red the selected ACO lineages which have shortened life histories. The grey again is a particular ACO lineage. Each vertical panel corresponds to a chromosomal arm of the Drosophila melanogaster genome.

First, note the widespread distribution of allele frequency differences between ACO and CO. Additionally, there’s little difference between the specific ACO lineage and the pooled sample. Despite their independent histories they seem to exhibit the same allelic configuration. Second, note that the heterozygosities in the ACO pooled sample are lower than in the CO ancestral phenotype lineages. Why? Remember that selective sweeps should expunge genomic variation. But the sweeps do not seem to have gone to fixation, otherwise we’d see many more inverted peaks converging to heterozygosity of ~0, as the selected variant replaces all others in the population.
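Why would a completed sweep show up as a dip to ~0? For a site with two alleles, expected heterozygosity is 2p(1 − p), where p is the frequency of one allele. The snippet below is just that formula, assuming a biallelic SNP and random mating. The frequencies are illustrative, not read off the figure.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic site with allele frequency p."""
    return 2 * p * (1 - p)


# Illustrative frequencies: an intermediate-frequency SNP, a partial sweep, a completed sweep.
for label, p in [("intermediate", 0.5), ("partial sweep", 0.8), ("fixed", 1.0)]:
    print(f"{label:>14}: p = {p:.1f}, heterozygosity = {expected_heterozygosity(p):.2f}")
```

A partial sweep lowers heterozygosity without erasing it, which is the pattern of shallow dips described above.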

What’s going on in the regions which exhibit differences between the controls and selected lineages? They looked at the ~650 non-synonymous SNPs on ~500 genes which were most differentiated between ACO and CO (L10FET score > 4) and found the following categories of genes enriched: imaginal disc development, smoothened signalling pathway, larval development, wing disc development, larval development (sensu Amphibia), metamorphosis, organ morphogenesis, imaginal disc morphogenesis, organ development and regionalization. Life history is complex. Combine the wide class of genes with the dispersed genomic impact of selection as evident in figures 2 and 3, and you get a good sense of the sort of consequences on the substrate level which quantitative genetic evolutionary dynamics have. Also of interest, they found that the X chromosome seemed enriched for signatures of selection and evolution. Why? They note that this chromosome would be more subject to selection for recessive or partially recessive expressing SNPs.
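On the FET score threshold mentioned above: I read it as the negative log10 of a Fisher's exact test p-value comparing allele counts between the two pooled samples, so a score above 4 means p < 0.0001. That reading, the function names, and the toy counts below are my assumptions. Here is a minimal standard-library sketch of how such a score could be computed for one SNP.

```python
import math


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = math.comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(p, 1.0)


def l10fet(a, b, c, d):
    """-log10 of the two-sided Fisher's exact test p-value."""
    return -math.log10(fisher_exact_two_sided(a, b, c, d))


# Toy read counts (hypothetical): rows are ACO and CO pools, columns are the two alleles.
print(f"strongly differentiated SNP: {l10fet(10, 0, 0, 10):.2f}")
print(f"undifferentiated SNP:        {l10fet(5, 5, 5, 5):.2f}")
```

With real pooled resequencing data the counts would be much larger and you'd want to think about coverage, but the score itself is this simple.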

Clearly this study did not find the clean hard sweeps which theory may have predicted. Rather, the researchers found a lot of partially completed sweeps distributed all across the genome. Sound familiar? Before we move on to broader considerations, here are their explanations:

- The sweeps are hard, but haven’t reached fixation. So the selection coefficients have to be rather small for them to still be in transit

- Selection is operating on “standing variation.” That is, the genetic variation extant naturally within a given population, and which may be operated upon by natural selection to change the population trait value mean through classical breeding techniques

- And finally, selection coefficients (the greater fitness of positively selected variants against the population mean) may not be static parameters, but change over time as a function of allele frequency. This shouldn’t be that surprising. Frequency dependence and epistasis can impact on linear assumptions within a statistical genetic model. The authors refer to deleterious alleles or antagonistic pleiotropy as possible genetic level forces which also prevent fixation
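To get a feel for how small "rather small" has to be, here is a deterministic sketch of an allele's trajectory under constant selection, ignoring drift. The haploid model, the starting frequency of 10% (a stand-in for standing variation), and the selection coefficients are my own assumptions, not the paper's estimates.

```python
def frequency_after(p0, s, generations):
    """Deterministic haploid selection: iterate p' = p(1 + s) / (1 + p s)."""
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)
    return p


# Hypothetical scenario: a variant already at 10% when selection begins.
for s in (0.002, 0.005, 0.05):
    p_final = frequency_after(0.1, s, 600)
    print(f"s = {s:<5}: frequency after 600 generations = {p_final:.3f}")
```

Under these assumptions anything with s of a few percent would be effectively fixed well before generation 600, so sweeps still in transit imply coefficients well under 1%, or coefficients that shrink as the allele rises.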

I personally lean against the first option, because it seems like we see a similar pattern in human evolutionary genomics, lots of partial sweeps and incomplete fixation. How much time does a brother need? In the long run we’re dead, and heat death swallows the universe. In the short run evolutionary pressures are always shifting. Fix now, or forget it say I! The wide distribution of allelic differences as well as moderate heterozygosities seems to be an indication that a quantitative trait, life history, is being modified through mass action on genetic variation. Interestingly, there’s also the parallel to humans insofar as the X chromosome seems to have more signatures of selection and variation in this evolutionary experiment. Next question: who’s working on experimental evolution of 600 generations in mice?

Citation: Burke, Molly K., Dunham, Joseph P., Shahrestani, Parvin, Thornton, Kevin R., Rose, Michael R., & Long, Anthony D. (2010). Genome-wide analysis of a long-term evolution experiment with Drosophila Nature : 10.1038/nature09352

Image Credit: Karl Magnacca

(Republished from Discover/GNXP by permission of author or representative)
 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"