The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

Is there a difference between admixture and introgression? I think there is. Or have always assumed there is. But of late I’m wondering if a distinction is widely accepted, and what sort of distinctions people make. That is, in some cases it seems clear that admixture and introgression are used interchangeably. I’ve seen this in scientific papers, and often just do a mental substitution. But in other cases I’m wondering if people are using the terms in a different sense than I am. Probably the latter is more worrisome.

The figure to the left was generated by Admixture, a software package which takes population genetics assumptions (models) and data, and shows you the best fit of the data to a particular model. In this case the bar plot shows you the admixture of a given individual when you posit them to be a combination of K ancestral populations. The individuals are clustered by population, so you see population-wide profiles. The details of the model, and whether the model accurately captures reality (i.e., were there actually K populations at any time in the past?), are less important for this post than the fact that Admixture is reflecting admixture on a genome-wide scale between two or more populations. The input data are represented by hundreds of thousands of single nucleotide polymorphisms distributed across the whole genome. The question of interest is whether a population can be represented as a pulse mixing event between two hypothetical groups, which were at some point phylogenetically distinct.
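To make "a combination of K ancestral populations" concrete at a single SNP, here is a minimal sketch. It is not the Admixture program's algorithm or interface; it only illustrates the mixture idea, in which an individual's expected allele frequency is the ancestry-weighted average of the frequencies in the K hypothetical source populations. The function name and all numbers are made up for illustration.

```python
def expected_allele_frequency(ancestry_fractions, population_frequencies):
    """Expected allele frequency for one individual at one SNP.

    ancestry_fractions: the individual's ancestry proportions across the K
        hypothetical ancestral populations (must sum to 1).
    population_frequencies: frequency of the allele in each of those K
        populations, in the same order.
    """
    if len(ancestry_fractions) != len(population_frequencies):
        raise ValueError("need one allele frequency per ancestral population")
    if abs(sum(ancestry_fractions) - 1.0) > 1e-9:
        raise ValueError("ancestry fractions must sum to 1")
    # Weighted average: each population contributes in proportion to ancestry.
    return sum(q * p for q, p in zip(ancestry_fractions, population_frequencies))


# Hypothetical individual: 75% ancestry from population A, 25% from population B.
# The allele is at 10% in A and 90% in B (invented numbers).
print(expected_allele_frequency([0.75, 0.25], [0.10, 0.90]))
```

A model-fitting program works in the other direction: given genotypes at hundreds of thousands of SNPs, it searches for the ancestry fractions and population frequencies that best explain the data.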

Introgression in contrast focuses on the question of genetic variants which are penetrating one population from another, and becoming common in the target population. A classical method of generating introgression in plant genetics was to engage in extensive backcrosses of mixed lineages with a trait of interest against a parental population. If one continued to select for a particular trait among the progeny one could introgress the trait and allele into a daughter population which was almost identical to one of the parent populations on a genome-wide scale, but identical to the other at one gene of interest. The practical reason for this is obvious. Imagine you have a variety of cold adapted rice which is susceptible to a particular type of fungal infection. Then, you have a heat adapted rice which is resistant to the fungal infection. All you want is fungal infection resistance, maintaining all the other characteristics that keep the cold adapted rice optimal for its climate. So you cross the two, and continue to cross progeny against cold adapted rice while selecting for the resistance phenotype. Eventually you’ll get the allele you want introgressed while maintaining the genetic background you want. In contrast, if you just allowed for admixture between the two lineages, you might get a population which was in between on a whole host of phenotypes, which would make it suboptimal for any climatic regime.
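The backcrossing logic can be put in numbers. As a rough sketch, assume the donor (heat adapted) parent contributes half of the F1 genome and each further backcross to the recurrent (cold adapted) parent halves the expected donor share. This ignores the donor segment that selection holds in place around the resistance allele, so the function below (a hypothetical helper, not from any breeding software) describes only the unselected genomic background.

```python
def expected_donor_fraction(backcross_generations):
    """Expected genome-wide donor fraction after n backcrosses.

    n = 0 is the F1 hybrid (half donor). Each backcross to the recurrent
    parent halves the expected donor contribution. Linkage drag around the
    selected locus is ignored in this sketch.
    """
    if backcross_generations < 0:
        raise ValueError("number of backcross generations cannot be negative")
    return 0.5 ** (backcross_generations + 1)


# Expected donor background from the F1 through six backcross generations.
for n in range(7):
    print(n, expected_donor_fraction(n))
```

After six backcrosses the expected donor background is under 1%, which is the sense in which the daughter population is "almost identical" to the recurrent parent while still carrying the selected allele.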

An example from human population genomics can be found in the paper Altitude adaptation in Tibetans caused by introgression of Denisovan-like DNA. What occurred here is that a very common variant in Tibetans implicated in altitude tolerance and adaptation seems to be phylogenetically closer to those you find in the Denisovan hominins than in other human populations. This, despite the fact that Denisovan ancestry is nearly nonexistent in Tibetans (the latest work suggests admixture in East and Southeast Asia on the order of 0.1 to 0.5%, with the highest fractions being among certain Southeast Asian and South Asian groups).

The network plot to the right illustrates the issue. On a genome-wide admixture plot Tibetans look like any East Asian population. They seem to be a mix of farmers related to the Han to the east and indigenous groups long resident at these high altitudes. But on the region around EPAS1 their genetic variation matches not modern humans, but the Denisovan hominin, which diverged ~500,000 years ago from the population that gave rise to 90 to 99% of the ancestry of our own lineage.

So what happened? We know that there were low levels of hybridization between very diverged human lineages in the past. Because of genetic incompatibilities it seems that in fact there was some selection against distinctive alleles from archaic lineages in our own genome. That is, the percentage of Neanderthal ancestry on the genomic level is probably lower than you’d get from doing a genealogical analysis of all lines of ancestry back to 100,000 years ago, because there has been selection against Neanderthal variants in the dominant human genetic background. But not in all cases. In a minority of instances the Neanderthal and Denisovan variants were not less fit, nor were they neutral, but rather, they were favored!

So, imagine a scenario where in the initial generation admixture between a large human population and a small Neanderthal population leads to admixture on the order of ~5% in the descendants. Over the generations due to selection against Neanderthal alleles the genomic ancestry from this group converges upon ~2.5%. But, on a subset of loci the Neanderthal alleles will have increased in frequency, and in some cases introgressed to high levels. This could be due to randomness; in a genome with billions of base pairs and tens of millions of nucleotide polymorphisms some alleles will drift up to higher frequencies randomly. But it is in the set of high frequency alleles from Neanderthals that you might find variants that have become common due to adaptive introgression. See this paper in AJHG, Introgression of Neandertal- and Denisovan-like Haplotypes Contributes to Adaptive Variation in Human Toll-like Receptors. Immunological variation is always an excellent candidate because genetic diversity at these loci is highly favored, and long resident populations often have local adaptations.

Because my focus is generally on microevolutionary process, the sort of thing population geneticists are interested in, I’ve really not been talking about species-level dynamics (though the hominins are arguably distinct species). Much of the work on admixture and introgression is done by biologists focused on inter-specific differences, but the general framework holds I believe (in fact, questions of admixture and introgression are more clear and distinct across diverged lineages). In plants in particular hybridization and introgression are common in wild and domestic lineages.

I’m not putting this post up as definitive. When I read papers where there is talk about “introgression of ancestry” it is clear that today people are merging and bleeding the definitions. I actually checked for definitions of introgression and admixture in Principles of Population Genetics and Elements of Evolutionary Genetics. There wasn’t anything, because debate on this issue isn’t/wasn’t very live in these fields. At this point I’m really curious what other biologists think. I still find the distinction important, and more critically, useful. If one doesn’t, I’d like to hear opinions. If one has different definitions, I’d like to hear opinions.

 
• Category: Science • Tags: Adaptation, Genetics, Introgression 

The human genome is littered with many genes from diverged lineages. That is, any given human has segments from lineages which are deeply diverged from the dominant demographic element in our ancestry, which derives from an African population which flourished ~200,000 years ago, and among non-Africans a population derived from Northeast Africa ~50,000 years ago. The Neanderthal ancestry of non-Africans, which is in the range of ~2 percent, diverged from the main stem of our heritage on the order of ~500,000 years ago. A similar time span divides us from the Denisovan ancestry in Oceanians, and to a lesser extent island Southeast Asians, and even less among East and South Asians. Due to the lack of ancient genomes such definitive inferences are more difficult to make for African populations, but there are suggestive clues that diverged lineages also contributed to the Khoe-San and Pygmy, and therefore other African, peoples.

Though the initial Neanderthal admixture results tended to focus on their implications for ancestry, and not function, a recent spate of work has suggested that archaic admixture in modern lineages may have been adaptive. But at some point one needs to go beyond genome-wide assessments, and look at specific genes. In that vein, there is a preprint out of Rasmus Nielsen’s group, Archaic adaptive introgression in TBX15/WARS2. This relates to their paper about adaptation of Greenland Inuit to their environment. What they report here is that Greenlanders, and Eurasians more broadly as well, exhibit evidence of carrying an introgressed haplotype which derives from Denisovans or a Denisovan-related population.* The map above shows distributions of an allele which is strongly associated with the introgressed haplotype. You can see that it is absent in Sub-Saharan Africa, fixed in Greenland, present in high frequencies in the New World (near fixation in Amerindians), at moderate frequencies in East Asia, and lower frequencies elsewhere in Eurasia. The haplotype harbors two genes, TBX15 and WARS2. What do these genes do? Lots of things:

The TBX15/WARS2 region is highly pleiotropic: it has been found to be associated with a variety of traits. These include the differentiation of adipose tissue, body fat distribution…facial morphology…stature…ear morphology…hair pigmentation…and skeletal development…Interestingly, for several of body fat distribution studies, the introgressed SNPs have significant genome-wide associations. The Denisovan alleles tend to increase waist circumference and waist-hip ratio, after correcting for BMI.

They went to great lengths to ascertain whether this was an introgressed haplotype, and where it came from in relation to the genomes we have on hand (Neanderthals, modern humans, and Denisovans). Broadly they are convincing that it is introgressed, and not deep structure from Africa. And, they make a good case that the population from which this haplotype derived is closer to Denisovans than to Neanderthals. The summary is really in the haplotype network above. To the bottom right you see a cluster of common human haplotypes. Then you see the Neanderthal haplotype, and then further along the Denisovan haplotype. Finally, you see a cluster of introgressed haplotypes. First, remember it’s not established that the donor population was Denisovan. Second, there have been derived mutations since the allele moved between populations.

The functional reason why this swept to fixation in Greenlanders seems obvious. It’s related to the nature of fat deposition, GWAS correlate it with effects on BMI and body shape, and it is jointly at high frequency in Amerindians and the populations of the Arctic. In all likelihood the sieve was in Beringia, where climatic conditions were extreme, and all sorts of adaptations were necessitated. And the authors state that plainly. In relation to mechanism there are suggestions that there are regulatory and epigenetic dynamics at play. The expression differences seem clear, in terms of how the Denisovan variant affects the magnitude of the genetic consequence. But the epigenetic aspect is very confusing to me. Part of it seems to be that the authors themselves are trying to make sense of the results (jump to “DMR”), but I wish they would at least expand, because there is some lack of clarity as to the details (I had a friend whose research is in epigenetics read that section and they found it a bit unclear, so they didn’t want to evaluate whether it made sense, so I felt better as to my confusion).

With all that put out there, let’s get to the crux of the issue that we can agree on. Using simulations the authors did establish that this is unlikely to have increased in frequency in all these populations simply due to drift. That is, chance. The alternative then is that selection is increasing the frequency of this haplotype, and assorted functional alleles. In the case of Amerindians you see that it is close to fixation. There is a recessive aspect to the nature of methylation patterns, which are associated with gene expression, so that may give us a clue why this variant is fixed in Greenland, and nearly so in unadmixed Amerindians. If the expression of a favored trait is recessive, then it makes sense for selection to drive the final step from ~90% (where nearly 20% of the population would still lack two copies and so have a disfavored morph) to ~99% (where only ~2% would).
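The ~20% and ~2% figures follow from simple genotype arithmetic. A minimal sketch, assuming random mating (Hardy-Weinberg proportions) and a strictly recessive favored trait, so that only individuals with two copies of the allele express it; the function name is just for illustration.

```python
def fraction_with_disfavored_morph(allele_frequency):
    """Fraction of individuals lacking two copies of a recessive favored allele.

    Assumes Hardy-Weinberg proportions: homozygotes for the favored allele
    occur at frequency p**2, so everyone else (1 - p**2) shows the
    disfavored morph.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    return 1.0 - allele_frequency ** 2


for p in (0.90, 0.99):
    print(p, round(fraction_with_disfavored_morph(p), 4))
```

At an allele frequency of 0.90 the disfavored fraction is 1 - 0.81 = 0.19, and at 0.99 it is 1 - 0.9801 = 0.0199, so there is still meaningful selection pressure in that last stretch toward fixation.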

But what’s going on in Old World populations? To my knowledge there is no evidence of Denisovan admixture in western Eurasian populations, but you can find these “Denisovan” alleles in them. The simplest explanation is that the haplotype is derived from another archaic population, within the same clade as Denisovans with Neanderthals as an outgroup, which was resident further west. In fact, look at the structure of the introgressed haplotypes. The Amerindians have the most diverged branch, while the western and southern Eurasian groups are represented within the haplotype closest to Denisovans. To me this is suggestive of an early admixture event closer to the point-of-departure from Africa. As modern humans moved east the serial bottleneck effect occurred with this introgressed haplotype.

A second possibility is that this allele may be from Denisovans, and that it is so favored that even if small levels of eastern Eurasian admixture don’t manifest themselves in total genome-wide admixture estimates a few copies were sufficient for this to become common outside that zone. This gets to the title of the post: one can posit a multi-regional system of selected variants sweeping across interconnected demes, which transcend the fact that migration between the demes is too low to make significant contributions to total genome content. This may explain the presence of East Asian EDAR in the Motala samples, for example.

Finally, I think one has to consider the high probability that the target of selection on this locus has varied over time and space. The introgression of this archaic allele into non-Africans was an ancient event, but it does not seem to have fixed in any populations outside of the Beringian Diaspora. Why? It may be that there are balancing effects going on, perhaps frequency dependence, or even overdominance (I tend to discount the last in most cases, but the fractions in East Asia are so close to 50%). Along the East Asian littoral the frequency is in an intermediate range, while in western and southern Eurasia it is present at lower frequencies, though lower in South Asia than in some of the European groups. This is intriguing because when it comes to alleles which are not subject to selection South Asians share more ancestry with East Eurasians, and you generally see a pattern where they occupy a position in between (e.g., this is the case when it comes to Neanderthal admixture, with East Asians having the most, and Europeans the least, with South Asians between).

There’s a lot that’s going to be researched and published between now and 2025. These authors have established that they found an introgressed variant, but it’s too widely distributed to have the neat solution that EPAS1 in Tibetans has.

* My father and brother carry one copy of the introgressed haplotype.

 
• Category: Science • Tags: Adaptation, Genomics, Introgression 

A Tree of Life

Evolutionary processes which play out across the tree of life are subject to distinct dynamics which can shape and influence the structure and characteristics of individuals, populations, and whole ecosystems. For example, imagine the phylogeny and population genetic characteristics of organisms which are endemic to the islands of Hawaii. Because the Hawaiian islands are an isolated archipelago the expectation is that lineages native to the region are going to be less shaped by the parameter of migration, or gene flow between distinct populations, than might otherwise be the case. Additionally, presumably there was a “founding” event of these endemic Hawaiian lineages at some distant point in the past, so another expectation is that most of the populations would exhibit evidence of having gone through a genetic bottleneck, where the power of random drift was sharply increased for several generations. The various characteristics, or states, which we see in the present in an individual, population, or set of populations, are the outcome of a long historical process, a sequence of precise events. To understand evolution properly it behooves us to attempt to infer the nature and magnitude of these distinct dynamic parameters which have shaped the tree of life.

Credit: Verisimilus

For many the image of evolutionary processes brings to mind something on a macro scale. Perhaps that of the changing nature of protean life on earth writ large, depicted on a broad canvas such as in David Attenborough’s majestic documentaries over millions of years and across geological scales. But one can also reduce the phenomenon to a finer-grain on a concrete level, as in specific DNA molecules. Or, transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.

Credit: Johnuniq

Obviously evolutionary processes are not just fundamentally the flux of genetic elements, but genes are crucial to the phenomena in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it can not be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being (Creationists depict evolution as a religion-like cult, which imparts the general flavor of some of the meta-science and philosophy which serves as intellectual subtext).

R. A. Fisher

But shifting from such near-metaphysical generalities to more in-the-trenches science as it is done, we are faced today with the swell of sequence data due to the genomic revolution. What does this matter for our understanding of evolution? Many of the original arguments of evolutionary geneticists such as R. A. Fisher and Sewall Wright were predicated on inferences from the inheritance patterns of a few genes which were easily identifiable by their phenotypic markers. But a more likely frame for the dispute was one where the inferences were purely theoretical, deduction with a minimal level of empirical messiness intervening. In contrast today we live in an age where someone may pity you if you don’t have a very well assembled genome of your organism (on the order of billions of base pairs for mammals), and so have to make do with SNP marker data of a few thousand per individual!

These new data, first and foremost from humans due to the funding priorities of biomedical science, have stimulated a renaissance of method development to take advantage of the richness of the genetic variation now being uncovered. Consider PSMC, which allows one to make demographic inferences of population history from one genome by surveying patterns of heterozygosity within a single individual. Last week I reviewed a preprint which illustrated the power of extensive data analysis in shading and refining previous results which seemed straightforward on the face of it. The reformulation yielded the possibility of natural selection as being a pervasive parameter in human evolution over the past ~100,000 years. The authors compared variation at different categories of bases (synonymous vs. nonsynonymous) across the genome to reinforce both old intuitions and extract novel insights.

Citation: Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS biology 4.3 (2006): e72.

Looking at differences between synonymous vs. nonsynonymous substitutions is a tried & tested technique with a fine pedigree, but more recently haplotype based methods to detect natural selection have been all the rage, due to the emergence of dense genome-wide marker sets. These allow for the inference of correlated patterns of markers across adjacent genomic segments. This trend toward haplotype methods naturally triggered their antithesis, and the resulting synthesis to some extent can be seen in two papers, both Grossman et al., A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection, and Identifying recent adaptations in large-scale genomic data. These are improvements upon earlier work in the aughts, a reassessment which had already started to occur in the literature after the excesses of genomic methods in their detection of ubiquitous selection in human populations. More specifically, the newer techniques focused on recent selective events which leave long blocks of the genome within populations homogenized. As causal markers rapidly increase in frequency due to positive selection, they drag along flanking regions in sweep events. For many generations after the initial selection event these flanking regions will show linkage disequilibrium, as recombination only slowly breaks apart the associations across loci. But a key drawback with these methods is that selection is not the only dynamic which results in long haplotypes and linkage disequilibrium. More specifically demographic stochasticity, colloquially the vicissitudes of population history, can also generate long homogeneous blocks of markers. The initial candidate regions yielded by a statistic like iHS were saturated by the effects of population specific history.
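How slowly recombination erodes these associations can be sketched with the textbook two-locus result: under random mating, the linkage disequilibrium coefficient D shrinks by a factor of (1 - r) each generation, where r is the recombination rate between the two loci. The helper names and the example rates below are assumptions chosen for illustration, and the sketch ignores drift, selection, and population structure.

```python
import math


def ld_after_generations(initial_d, recombination_rate, generations):
    """Expected linkage disequilibrium D after t generations of random mating.

    Uses the standard deterministic decay D_t = D_0 * (1 - r)**t.
    """
    return initial_d * (1.0 - recombination_rate) ** generations


def generations_to_halve_ld(recombination_rate):
    """Number of generations for D to fall to half its starting value."""
    if not 0.0 < recombination_rate <= 0.5:
        raise ValueError("recombination rate must be in (0, 0.5]")
    return math.log(0.5) / math.log(1.0 - recombination_rate)


# Tightly linked markers keep their association far longer than loose ones.
for r in (0.5, 0.01, 0.0001):
    print(r, generations_to_halve_ld(r))
```

Unlinked loci (r = 0.5) lose half their association in a single generation, while markers separated by a recombination rate of 0.0001 take thousands of generations, which is why a recent sweep leaves a long block behind it.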

CMS, debuted in Grossman et al. 2010, is an attempt to correct for this bug, while retaining the power of haplotype based methods. Natural selection within the genome leaves more evidence behind in regards to its operation than just long haplotype blocks and linkage disequilibrium. Selected alleles often exhibit greater between population difference than the average region of the genome (i.e., higher Fst). Additionally, a new derived allele segregating within one population at a high frequency is often a telltale marker of recent adaptation, as a de novo mutation in a specific locale turns out to be beneficial. By combining tests which survey patterns of variation across loci (i.e., haplotype based methods) with those within loci and across populations (Fst based methods), CMS zeros in on a few precise narrow candidates by cross-checking with multiple tools. False positive hits aside, another major problem with relying upon a single coarse test is that it often highlights a large region as a target of natural selection. This does not necessarily allow for simple follow up when you have dozens of genes and millions of bases which are potential candidates.
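For readers who want to see what "higher Fst" means at a single biallelic site, here is a minimal sketch using the simple heterozygosity-based definition, Fst = (H_T - H_S) / H_T, with the populations weighted equally. This is not the estimator used inside CMS, which is not described here; the function name and allele frequencies are hypothetical.

```python
def fst_from_frequencies(allele_frequencies):
    """Heterozygosity-based Fst at one biallelic site.

    allele_frequencies: frequency of the same allele in each population.
    H_S is the mean expected heterozygosity within populations, and H_T is
    the expected heterozygosity of the pooled (equally weighted) population.
    """
    if not allele_frequencies:
        raise ValueError("need at least one population")
    n = len(allele_frequencies)
    h_s = sum(2.0 * p * (1.0 - p) for p in allele_frequencies) / n
    p_bar = sum(allele_frequencies) / n
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # site is monomorphic everywhere; no differentiation to measure
    return (h_t - h_s) / h_t


# Hypothetical sites: one nearly undifferentiated, one nearly disjoint.
print(fst_from_frequencies([0.40, 0.60]))
print(fst_from_frequencies([0.02, 0.98]))
```

A site where two populations carry almost entirely different alleles scores close to 1, while typical sites with similar frequencies score close to 0, so outliers on this measure are natural candidates for local selection.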

The second paper, Grossman et al. 2013, is less a map of genome-wide variation, than a scan of genome-wide variation with an intent to select choice targets for more detailed analysis. To no one’s surprise for human data sets loci implicated in salient physical characteristics such as height and pigmentation, metabolism, and immune response, are high on the list of candidates. No matter the genuine issue of false positives it does seem that recent human evolution (and frankly, evolution more generally) has a fixation on these traits, no pun intended. I do wonder sometimes if this is just a feature of the fact that we humans notice exterior phenotypes, as well as disease related markers (e.g., metabolic and immune illnesses). One of the major concerns in the second paper is that a selection signature without a phenotype is often without utility, but perhaps the phenotypes are lacking in utility because humans are blind in terms of what traits are of interest. I am still skeptical of explanations for what exactly the target of selection around the EDAR locus in East Asians is.

Two alleles of SLC24A5, citation: Norton, Heather L., et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians.” Molecular biology and evolution 24.3 (2007): 710-722.

One of the more intriguing results from CMS in Grossman et al. 2013 is that a locus with the strongest association with resistance to leprosy also contains SLC24A5. This locus has an allele within it that is almost disjoint in frequency between Europeans and Sub-Saharan Africans. By this, I mean that almost all Africans carry one base, while nearly all Europeans carry the other. The allele found in Europeans is dominant in West Asia, and present at frequencies as high as ~50% as far south and east as Sri Lanka. It is a gene which is famously correlated with lighter skin in humans and zebrafish. And yet there remains the mystery that it is present at very high frequencies rather far south, and it is certainly not a necessary condition for light skin. East Asians are nearly fixed for the ancestral variant which is common in Sub-Saharan Africa. A possible explanation is that these sorts of salient phenotypic loci have been reshaped due to very strong bouts of selection targeting particular diseases in the recent past. If this is correct, the phenotypic characteristics which we find salient in human beings may simply be pleiotropic side effects of selective sweeps anchored around disease resistance.

I am not proposing here that genomics can solve and explain evolution. The heirs of G. G. Simpson may have something to say about that. Rather, I am suggesting that the genetic piece of the puzzle will not be lacking in data to any extent within our lifetimes. My hunch is that many evolutionary genetic questions will be soluble when we have thousands of complete genomes of high quality on thousands of organisms. There is no likely windfall of fossils in the near future, so paleontology will have to continue to operate in a relatively data constrained environment. For those who work in the domain of evolutionary genetics and genomics the onus is on human ingenuity, and analytic skill and savvy. Thinking hard and deep about difficult problems, rather than putting in long hours on the bench to glean more data.

(Republished from Discover/GNXP by permission of author or representative)
 

Layers and layers….

There is the fact of evolution. And then there is the long-standing debate of how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, and goes as far back as the 19th century, where you had arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) square off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputations in the wake of the fusion of evolutionary biology and genetics in the early 20th century. They range from the Wright-Fisher controversies of the early years of evolutionary genetics, to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon-view of the implication of the debates in regards to the power of selection as opposed to stochastic contingency can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*


But ultimately these 10,000-foot debates are more a matter of philosophy than science. At least until the scientific questions are stripped of their controversy and an equilibrium consensus emerges. That will only occur through an accumulation of publications whose results are robust to time, and subtle enough to convince dissenters. This is why Enard et al.’s preprint, Genome wide signals of pervasive positive selection in human evolution, attracted my notice. With the emergence of genomics it has been humans first in line to be analyzed, as the best data is often found from this species, so no surprise there. Rather, what is so notable about this paper in light of the past 10 years of back and forth exploration of this topic?**

By taking a deeper and more subtle look at patterns of the variation in the human genome this group has inferred that adaptation through classic positive selection has been a pervasive feature of the human genome over the past ~100,000 years. This is not a trivial inference, because there has been a great deal of controversy as to the population genetic statistics which have been used to infer selection over the past 10 years with the arrival of genome-wide data sets (in particular, a tendency toward false positives). In fact, one group has posited that a more prominent selective force within the genome has been “background selection,” which refers to constraint upon genetic variation due to purification of numerous deleterious mutations and neighboring linked sites.

The sum totality of Enard et al. may seem abstruse, and even opaque, in terms of the method. But each element is actually rather simple and clear. The major gist is that many tests for selection within the genome focus on the differences between nonsynonymous and synonymous mutational variants. The former refer to base positions in the genome which result in a change in the amino acid state, while the latter are those (see the third positions) where different bases may still produce the same amino acid. The ratio between substitutions, replacements across lineages for particular base states, at these positions is a rough measure of adaptation driven by selection on the molecular level. Changes at synonymous positions are far less constrained by negative selection, while positive selection due to an increased fitness via new phenotypes is presumed to have occurred only via nonsynonymous changes. What Enard et al. point out is that the human genome is heterogeneous in the distribution of characteristics, and focusing on these sorts of pairwise differences in classes without accounting for other confounding variables may obscure the dynamics one is attempting to measure. In particular, they argue that evidence of positive selective sweeps is masked by the fact that background selection tends to be stronger in regions where synonymous mutational substitutions are more likely (i.e., they are more functionally constrained, so nonsynonymous variants will be disfavored). This results in elevated neutral diversity around regions of nonsynonymous substitutions vis-a-vis strongly constrained regions with synonymous substitutions. Once they correct for the power of background selection the authors find evidence for sweeps of novel adaptive variants across the human genome, which had previously been hidden.
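The synonymous versus nonsynonymous distinction is easy to show in code. The sketch below classifies codon differences between two aligned coding sequences using the standard genetic code. It is a crude count for illustration only: it is not the method of Enard et al., and real substitution-ratio estimators also normalize by the number of synonymous and nonsynonymous sites and correct for multiple changes at the same position. The function name and example sequences are made up.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Count differing codons that keep vs. change the encoded amino acid.

    Both sequences must be aligned, in frame, and the same length.
    Returns (synonymous, nonsynonymous).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# GCT -> GCC keeps alanine (a third-position change); AAA -> AGA swaps lysine for arginine.
print(count_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

The third-position change leaves the protein untouched, which is why such sites are treated as the less constrained baseline against which amino acid changing substitutions are compared.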

There are two interesting empirical findings from the 1000 Genomes data set. First, the authors find that positive selection tends to operate upon regulatory elements rather than coding sequence changes. You are probably aware that this is a major area of debate currently within the field of molecular evolutionary biology. Second, there seems to be less evidence for positive selection in Sub-Saharan Africans, or, less background selection in this population. My own hunch is that it is the former, that the demographic pulse across Eurasia, and to the New World and Australasia, naturally resulted in local adaptations as environmental conditions shifted. Though it may be that the African pathogenic environment is particularly well adapted to hominin immune systems, and so imposes a stronger cost upon novel mutations than is the case for non-Africans. So I do not dismiss the second idea out of hand.

Where this debate about the power of selection will end is anyone’s guess. Nor do I care. Rather, what’s important is getting a finer-grained map of the dynamics at work so that we may perceive reality with greater clarity. One must be cautious about extrapolating from humans (e.g., the authors point out that Drosophila genomes are richer in coding sequence proportionally). But the human results which emerge because of the coming swell of genomic data will be a useful outline for the possibilities in other organisms.

Citation: Genome wide signals of pervasive positive selection in human evolution

* The cartoon qualification is due to the fact that I am aware that selection is stochastic as well.

** Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS biology 4.3 (2006): e72., Sabeti, Pardis C., et al. “Detecting recent positive selection in the human genome from haplotype structure.” Nature 419.6909 (2002): 832-837., Wang, Eric T., et al. “Global landscape of recent inferred Darwinian selection for Homo sapiens.” Proceedings of the National Academy of Sciences of the United States of America 103.1 (2006): 135-140., Williamson, Scott H., et al. “Localizing recent adaptive evolution in the human genome.” PLoS genetics 3.6 (2007): e90., Hawks, John, et al. “Recent acceleration of human adaptive evolution.” Proceedings of the National Academy of Sciences 104.52 (2007): 20753-20758., Pickrell, Joseph K., et al. “Signals of recent positive selection in a worldwide sample of human populations.” Genome research 19.5 (2009): 826-837., Hernandez, Ryan D., et al. “Classic selective sweeps were rare in recent human evolution.” Science 331.6019 (2011): 920-924.

(Republished from Discover/GNXP by permission of author or representative)
 

Credit: Albozagros


The genetics and history of Tibet are fascinating to many. To be honest the primary reason here is elevation. The Tibetan plateau has served as a fortress for populations who have adapted biologically and culturally to the extreme conditions. Naturally this means that there has been a fair amount of population genetics on Tibetans, as hypoxia is a side effect of high altitude living which dramatically impacts fitness. I have discussed papers on this topic before. And I will probably talk more about it in the future, considering rumblings at ASHG 2012.

But to understand the character of the effect of natural selection on a population it is often very important to keep in mind the phylogenetic context. By this, I mean that evolutionary processes occur over history, and those historical events shape the course of subsequent phenomena. Concretely, to understand how the Tibetans came to be adapted to high altitudes one must understand who they are related to, and what their long term history is. There is a paper in Molecular Biology and Evolution which attempts to do just that, Genetic evidence of Paleolithic colonization and Neolithic expansion of modern humans on the Tibetan Plateau:

Tibetans live on the highest plateau in the world, their current population size is nearly 5 million, and most of them live at an altitude exceeding 3,500 meters. Therefore, the Tibetan Plateau is a remarkable area for cultural and biological studies of human population history. However, the chronological profile of the Tibetan Plateau’s colonization remains an unsolved question of human prehistory. To reconstruct the prehistoric colonization and demographic history of modern humans on the Tibetan Plateau, we systematically sampled 6,109 Tibetan individuals from 41 geographic populations across the entire region of the Tibetan Plateau and analyzed the phylogeographic patterns of both paternal (n = 2,354) and maternal (n = 6,109) lineages as well as genome-wide SNP markers (n = 50) in Tibetan populations. We found that there have been two distinct, major prehistoric migrations of modern humans into the Tibetan Plateau. The first migration was marked by ancient Tibetan genetic signatures dated to around 30,000 years ago, indicating that the initial peopling of the Tibetan Plateau by modern humans occurred during the Upper Paleolithic rather than Neolithic. We also found evidences for relatively young (only 7-10 thousand years old) shared Y chromosome and mitochondrial DNA haplotypes between Tibetans and Han Chinese, suggesting a second wave of migration during the early Neolithic. Collectively, the genetic data indicate that Tibetans have been adapted to a high altitude environment since initial colonization of the Tibetan Plateau in the early Upper Paleolithic, before the Last Glacial Maximum, followed by a rapid population expansion that coincided with the establishment of farming and yak pastoralism on the Plateau in the early Neolithic.

The two major salient points I think need emphasis are:

1) Massive sample sizes for mtDNA and, to a lesser extent, Y chromosomal lineages

2) Tibetans are a compound of agriculturalists who arrived on the plateau ~10,000 years ago, and hunter-gatherers who date back to the Paleolithic

Citation: Cai, Xiaoyun, et al. “Human migration through bottlenecks from Southeast Asia into East Asia during Last Glacial Maximum revealed by Y chromosomes.” PloS one 6.8 (2011): e24282.

There are many issues with this paper that bother me. The broadest interpretation of their thesis is one I find credible, but in the details I’m left skeptical, confused, and more curious than when I began. Also, I need to add that I talked to the people who presented a poster on this paper at ASHG 2012, though I do not know if they were the authors. They seemed nice, but also not necessarily totally focused on the questions they were exploring, as opposed to obtaining huge sample sizes and applying standard methods to them. Speaking of which, the first thing that jumps out is that their sample is skewed toward what is today Tibet proper, the autonomous region. But Tibetan people have historically lived as far as Sichuan. Only 50% of ethnic Tibetans live in the autonomous region, but well over 90% of their samples are from this area. In terms of exploring adaptation to altitude this is fine, but if you are going to do phylogeography you need better geographical coverage I would think.

But that’s only a minor aside. The bulk of the paper consists of a laundry list of Y and mtDNA haplogroups, and coalescence times. Some of the results are very persuasive to me. There are some Y lineages which exhibit a “star shaped” phylogeny, which usually connotes a recent rapid population expansion. Using other methods the authors have inferred that there was indeed an expansion of population after the introduction of agriculture >10,000 years ago. There is no great reason on prior grounds to be skeptical of this finding. Nevertheless, drilling down produces great confusions, and I am not sure that the coalescence times and phylogenies actually mean what the authors assume they mean.

For example, here is a standard sort of analysis presented in this paper:

We identified a molecular signature of recent population expansion during the early Neolithic time in both paternal (Y-chromosomal D3a-P47 and O3a3c1-M117) and maternal (M9a1a and M9a1b1) lineages (10-7 kya) (table 1). The detailed analysis of haplotype sharing and time of divergence between Tibetans and Han Chinese suggests that the Neolithic population expansion on the Plateau was likely caused by the dispersal of the earliest Neolithic Han Chinese agriculturalists originating about 10 kya in what is now northwestern China….

O3a3c1-M117 is present at frequencies of nearly 30%, and is connected with the Chinese as you can see above. This dovetails with other recent research which implies relatively recent common ancestors between Tibetans and Chinese. This result can be reconciled to the presence of Paleolithic roots via the fact that admixed populations will give you average results between the two extremes. The problem I have is that I am skeptical that Han Chinese existed 10,000 years ago, just as I am skeptical that Greeks existed 10,000 years ago.


A quick literature search yields the fact that M117 is modal in particular non-Han ethnic groups resident in southern China and northern Southeast Asia. I am not here proposing that the Hmong introduced M117 to the Tibetans. Rather, I am suggesting that we best be careful in assuming that we know the ethnic distribution of genetic haplogroups 6,500 years before there were any written records from a given region! To me the fact that there is a putative Sino-Tibetan group of languages is strongly indicative of diversification >10,000 years, not the existence of a Han ethnicity ~10,000 years ago. The historical records are clear that ~3,000 years ago the Yangzi river, now the informal dividing line between North China and South China, was the boundary of the zone where Han were demographically dominant. And even then there were clearly pockets of “barbarian” people on the North China plain itself! It simply does not stand up to the test of basic plausibility that the agricultural expansions ~10,000 years B.P. were Han as we would understand Han. The demographic and cultural dominance of the Han in Northeast Asia is a phenomenon of the last 3,000 years, perhaps 4,000 most generously (South China became Sinicized to some extent after the fall of the Later Han Dynasty ~200 AD, and especially the Tang period ~600-950 AD).

Much of the argumentation is creaky because of these anachronistic assumptions and the casual inferences of contemporary haplogroup frequencies back toward ancient geographical demographic distributions. Ancient DNA has highlighted the danger of this in Europe, and that should update our priors as to the robustness of this sort of analysis. For example, the authors are curious as to the lack of structure of Y chromosomal lineages, combined with the fact of their deep coalescence times across Tibet. Why is this an issue? Because if these Y chromosomal lineages are Paleolithic, then the deep divergences across the branches should also correspond to geographic differences. But they don’t. To me the simplest explanation is that the last 10,000 years have seen a great deal of population movement, and sharply differentiated populations were brought together as agriculture opened up the Tibetan plateau. This presents a problem though with inferring ancient geographic connections from present distributions, since it opens up the possibility of migration, and radical genetic-demographic turnover.

Overall I would say that this paper is interesting and useful, but you should read it closely and not take the authors’ inferences too much to heart. Those inferences are grounded in assumptions which may be built on false foundations.

Addendum: Also, a “gap” on a PCA plot does not necessarily mean long term isolation, as they say in the text. It might simply be a function of inadequate sampling. See above. There are many unsupported assertions such as that. But, I would like to add that the authors found a large number of “exotic” haplogroups in Lhasa itself, which aligns with what we know about the cultural history of Tibet. Tibetan Buddhism actually is influenced more by extinct variants of South Asian (particularly, Bengali) Buddhism, rather than Chinese Buddhism. Though the demographic pump along the Himalayan border seems to go from the highlands to the lowlands, there were exceptions. And these exceptions tended to be found in Lhasa.


(Republished from Discover/GNXP by permission of author or representative)
 

The Pith: Natural selection comes in different flavors in its genetic constituents. Some of those constituents are more elusive than others. That makes “reading the label” a non-trivial activity.

As you may know when you look at patterns of variation in the genome of a given organism you can make various inferences from the nature of these patterns. But the power of those inferences is conditional on the details of the real demographic and evolutionary histories, as well as the assumptions made about the models which one is testing. When delving into the domain of population genomics some of the concepts and models may seem abstruse, but the reality is that such details are the stuff of which evolution is built. A new paper in PLoS Genetics may seem excessively esoteric and theoretical, but it speaks to very important processes which shape the evolutionary trajectory of a given population. The paper is titled Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. Here’s the author summary:

Considerable effort has been devoted to detecting genes that are under natural selection, and hundreds of such genes have been identified in previous studies. Here, we present a method for extending these studies by inferring parameters, such as selection coefficients and the time when a selected variant arose. Of particular interest is the question whether the selective pressure was already present when the selected variant was first introduced into a population. In this case, the variant would be selected right after it originated in the population, a process we call selection from a de novo mutation. We contrast this with selection from standing variation, where the selected variant predates the selective pressure. We present a method to distinguish these two scenarios, test its accuracy, and apply it to seven human genes. We find three genes, ADH1B, EDAR, and LCT, that were presumably selected from a de novo mutation and two other genes, ASPM and PSCA, which we infer to be under selection from standing variation.

The dynamic which they refer to seems to be a reframing of the conundrum of detecting hard sweeps vs. soft sweeps. In the former case you have a new mutation, so its frequency is ~1/(2N). It is quickly subject to natural selection (though stochastic processes dominate at low frequencies, so probability of extinction is high), and adaptation drives the allele to fixation (or nearly to fixation). In the latter scenario you have a great deal of extant genetic variation, present in numerous different allelic variants. A novel selection pressure reshapes the frequency landscape, but you cannot ascribe the genetic shift to only one allele. It is no surprise that the former is easier to model and detect than the latter. Much of the evolutionary genomics of the 2000s focused on hard sweeps from de novo mutations because they were low hanging fruit. The methods had reasonable power to detect them (as well as many false positives!). But of late many are suspecting that hard sweeps are not the full story, and that much of evolutionary genetic process may be characterized by a combination of hard sweeps, soft sweeps (from standing variation), various forms of negative selection, not to mention the plethora of possibilities which abound in the domain of balancing selection.
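To put rough numbers on "probability of extinction is high," here is a small sketch using Kimura's classic diffusion approximation for the fixation probability of a single new copy. This is a standard textbook result that I am swapping in for illustration, not something taken from the paper. It assumes genic selection and an effective size equal to the census size, and the population size of 10,000 is a hypothetical value.

```python
import math


def initial_frequency(n_diploid):
    """Frequency of one new mutant copy among 2N chromosomes."""
    return 1.0 / (2 * n_diploid)


def fixation_probability(n_diploid, s):
    """Kimura's diffusion approximation for one new copy (assumes Ne = N).

    P_fix = (1 - exp(-2s)) / (1 - exp(-4Ns)); reduces to 1/(2N) when s = 0.
    """
    if s == 0:
        return initial_frequency(n_diploid)
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * n_diploid * s))


N = 10_000  # hypothetical diploid population size
for s in (0.0, 0.01, 0.10):
    p_fix = fixation_probability(N, s)
    print(f"s = {s:.2f}: start at {initial_frequency(N):.5f}, "
          f"P(fix) ~ {p_fix:.4f}, P(lost) ~ {1 - p_fix:.4f}")
```

Under these assumptions even a strongly favored new mutation is lost far more often than it fixes, which is one reason the de novo case leaves such a distinctive mark on the rare occasions when it does go through.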

Many of the details of the paper may seem overly technical and opaque (and to be fair, I will say here that the figures are somewhat difficult to decrypt, though the subject is not one that lends itself to general clarity), but the major finding is straightforward, and illustrated in figure 4 (I’ve added labels):

- The y-axis represents the frequency of the selected allele(s) at the initial start of the selection phase

- The x-axis represents a population-scaled selection coefficient: α = 4 Ns. Recall that N is the population size, and s is the standard selection coefficient, which measures the relative fitness difference of an individual/gene relative to the population median. A selection coefficient of 0.10 (10% increased fitness) is strong. One of 0.01 (1%) is modest.
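To get a feel for the scale of the x-axis, here is a minimal sketch that applies the α = 4 Ns definition given above to the two example selection coefficients. The population size of 10,000 is a hypothetical value chosen only for illustration, and the function name is my own.

```python
def scaled_selection(n, s):
    """Population-scaled selection coefficient, alpha = 4 * N * s."""
    return 4 * n * s


N = 10_000  # hypothetical population size, for illustration only
for s in (0.10, 0.01):
    print(f"s = {s:.2f} -> alpha = {scaled_selection(N, s):.0f}")
```

The same s translates into a very different α depending on N, which is why simulation results are usually reported on the scaled axis rather than in terms of raw fitness differences.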

What the results above, derived from simulations using particular parameters relevant to population genetic models and the output statistics (e.g., iHS, EHH, Tajima’s D), show you is that it is easier to differentiate forms of selection when:

- For standing variation the selected variants are present at a higher initial frequency when selection initiates. This is not relevant for de novo mutation, where the frequency is very low by definition. Remember that the latter case is actually a subset of the former. If the standing variation model has a parameter which varies in frequency, as the proportion converges upon 1/(2 N) you just get the de novo scenario.

- The stronger the selection event, the greater the power to detect and correctly assign selection for standing variation. This is rather straightforward at first blush. The main exception seems to be in panel e, where increased strength of selection decreases the ability to differentiate the models when the adaptive phase begins at a low initial allele frequency. I assume here you have a situation where it is difficult to distinguish the two models, as de novo and standing variation are converging. Note that it is easier to assign a hard sweep from a de novo mutation when the final frequency (or the frequency you are attempting to detect) is lower. Why? Probably because as the mutation fixes you are removing much of the variant genomic information you need to infer the trajectory of the selected variant (this is true for iHS).

All this may seem abstract. But what you need to do to make some sense of this is to visualize the trajectory of the evolutionary dynamics in temporal and concrete terms. For example, a de novo mutation which drives adaptation will rapidly expand in the population over time. Because of this phenomenon there will be a hitchhiking event where the flanking regions of the favored allele also rise in frequency. This generates an extended region of homogeneity in the genome, in direct proportion to the frequency of the haplotype. This block of homogeneity eventually decays as genetic recombination breaks apart the physical association of the markers which were found together on the original mutant by chance. This is why the power to detect these events declines over time; the perturbation decays, and the genome reverts to equilibrium. In contrast selection on standing variation is more complex, and therefore more difficult to detect, as it does not produce a clear and distinct signal as often. You may have numerous alleles dispersed across wide regions of the genome amenable to being driven up in frequency by adaptive pressures. This generates a mass action shift in variants, but does not entail the production of wide and distinctive homogeneous blocks across the genome. Rather, you have a larger number of alleles subject to less intensive individual selection. Though some of the same consequences are entailed as in the de novo mutation case, the magnitude will be sharply attenuated in any given region of the genome.
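The decay of the hitchhiking signal can be sketched with a deliberately simple model. If a flanking marker recombines away from the selected site with probability r per generation, the chance that a given lineage still carries the original association after t generations is (1 - r)^t. This ignores the dynamics of the sweep itself and everything else the paper's simulations handle, and the values of r below are hypothetical.

```python
def linkage_retained(r, generations):
    """Probability that no recombination has separated two sites after t generations.

    r is the per-generation recombination fraction between the selected site
    and a flanking marker (a hypothetical input in this sketch).
    """
    if not 0.0 <= r <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    return (1.0 - r) ** generations


# Hypothetical markers at increasing distance from the selected site.
for r in (0.0001, 0.001, 0.01):
    row = ", ".join(
        f"t={t}: {linkage_retained(r, t):.3f}" for t in (0, 100, 1000)
    )
    print(f"r = {r}: {row}")
```

Markers close to the selected site keep the association for a long time, while distant ones lose it within a few hundred generations. That is the intuition behind both the shrinking haplotype block and the loss of detection power for older sweeps.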

Though the conceptual & methodological issues here are of interest in and of themselves (e.g., can you trust the Approximate Bayesian Computation framework to generate simulations which give useful results?), there are also some analyses of real human genes. These are not revolutionary, they’re loci which have been analyzed before. But methods need to be judged against reality at some point, and this is an attempt. The table below shows their results.

Some of these genes should be familiar to you. If not, see the function column. I do want to mention that EDAR has been implicated in hair thickness in East Asians. The most amusing aspect of this gene is that it can turn mice into Asians, at least in their hair form. Obviously they focused on single populations. They note in the methods that more populations would introduce demographic complexities into their simulations, and it seems likely that they were already pushing the realistic boundaries of computations which you might want to run routinely in a laboratory. But, this simplification might explain some ambiguity with ADH1B, which has been found in West Asia as well (forgoing the straightforward model in all likelihood of one single sweep in East Asia). An important issue then may be the population sensitivity of these methods. One could imagine that selection at a gene is easy to discern in population A, but not population B. One population may shift to a different phenotype through standing variation, while another was subject to a hard sweep from de novo mutation. The devil here is in the details. There may not be one narrative to rule them all.

The most important result from this paper was its exploration of the reasonable parameter space over which one can make robust inferences about the specific variety of selection which is operative (or lack thereof). In the near future computational power and a surfeit of empirical data sets will make it so that there will be great temptation to generate reams of results in a blind fashion utilizing off the shelf techniques. But techniques without subtlety and human judgment can lead to confusion and falsity. It is useful to know the scenarios where one would expect large numbers of false positives or low statistical power, a priori. That way you may save yourself a great deal of time after the fact.

As for soft vs. hard sweeps: this isn’t simply a question of interest and relevance to population geneticists and genomicists. The nature of adaptation is a question of deep importance across evolutionary biology. The balance between these two phenomena is important in characterizing the mode and tempo of evolution. It may be that in fact the ratio varies as a function of the tree of life, so that evolution may operate with slightly different rules contingent upon taxon.

Citation: Peter BM, Huerta-Sanchez E, Nielsen R (2012) Distinguishing between Selective Sweeps from Standing Variation and from a De Novo Mutation. PLoS Genet 8(10): e1003011. doi:10.1371/journal.pgen.1003011

(Republished from Discover/GNXP by permission of author or representative)
 

I have blogged about the genetics of altitude adaptation before. There seem to be three populations in the world which have been subject to very strong natural selection, resulting in physiological differences, in response to the human tendency toward hypoxia. Two of them are relatively well known, the Tibetans and the indigenous people of the Andes. But the highlanders of Ethiopia have been less well studied, nor have they received as much attention. But the capital of Ethiopia, Addis Ababa, is nearly 8,000 feet above sea level!

Another interesting aspect to this phenomenon is that it looks like the three populations respond to adaptive pressures differently. Their physiological response varies. And the more recent work in genomics implies that though there are similarities between the Asian and American populations, there are also differences. This illustrates the evolutionary principle of convergence, where different populations approach the same phenotypic optimum, though by somewhat different means. To my knowledge there has not been as much investigation of the African example. Until now. A new provisional paper in Genome Biology is out, Genetic adaptation to high altitude in the Ethiopian highlands:

We highlight several candidate genes for involvement in high-altitude adaptation in Ethiopia, including CBARA1, VAV3, ARNT2 and THRB. Although most of these genes have not been identified in previous studies of high-altitude Tibetan or Andean population samples, two of these genes (THRB and ARNT2) play a role in the HIF-1 pathway, a pathway implicated in previous work reported in Tibetan and Andean studies. These combined results suggest that adaptation to high altitude arose independently due to convergent evolution in high-altitude Amhara populations in Ethiopia.

The main shortcoming of this paper for me is that it does not highlight the evolutionary history of this adaptation. In the paper the authors compared the Amhara (a highland population) to nearby lowland populations. But they did not explore the nature of the population structure and how it might have influenced the arc of adaptation. Are these very ancient adaptations? Or new ones? It seems that hominins have been resident in Ethiopia for millions of years. If this is so presumably there have been adaptations to higher elevations from time immemorial. But what if these adaptations are new?

More pointedly the Ethiopians can be modeled as a compound of an Arabian population with an indigenous East African one. If this is a genuine recent admixture event, then one might be able to ascertain via haplotype structure whether the adaptive variants derive from ancient African genetic variation, or whether they’re novel mutations. It seems that this paper is a good first step, but there’s a lot more to see here….

Citation: Genome Biology, doi:10.1186/gb-2012-13-1-r1

Image credit: Wikipedia

(Republished from Discover/GNXP by permission of author or representative)
 

In reading The cultural niche: Why social learning is essential for human adaptation in PNAS I couldn’t help but think back to a conversation I had with a few old friends in Evanston in 2003. They were graduate students in mathematics at Northwestern, and at one point one of them expressed some serious frustration at the fact that so many of the science and business students in his introductory calculus courses simply wanted to “learn” a disparate set of techniques, rather than understand calculus. The reality of course is that the vast majority of people who ever encounter calculus aim to learn it for reasons of utility, not so that they can grok the fundamental theorem of calculus. With the proliferation of tools such as Mathematica and powerful portable calculators fewer and fewer people are getting their hands dirty with calculus in an analytic sense, and more often see it as simply a “requirement” which they have to pass.

Calculus, and mathematics generally, is a clean and crisp human invention. In the late 17th century Isaac Newton and Gottfried Leibniz originated calculus as we understand it. Later thinkers extended their work. But for the vast majority of humans who have ever learned calculus it is simply a “black box” set of techniques which work rather magically. They did not contribute anything new to the body of knowledge which they drew upon. Mathematics is part of our cultural patrimony, we implicitly stand upon the shoulders of giants without apology. Such is to be human.


In The cultural niche Robert Boyd, Peter J. Richerson, and Joseph Henrich sketch out the broad thesis about the nature of human culture which is explored in more depth in works such as Not by Genes Alone and The Origin and Evolution of Cultures. Though the subject which they tackle is vast, the powers of precise description and crisp inference can sometimes lead to surprising conclusions:

In the last 60,000 y humans have expanded across the globe and now occupy a wider range than any other terrestrial species. Our ability to successfully adapt to such a diverse range of habitats is often explained in terms of our cognitive ability. Humans have relatively bigger brains and more computing power than other animals, and this allows us to figure out how to live in a wide range of environments. Here we argue that humans may be smarter than other creatures, but none of us is nearly smart enough to acquire all of the information necessary to survive in any single habitat. In even the simplest foraging societies, people depend on a vast array of tools, detailed bodies of local knowledge, and complex social arrangements and often do not understand why these tools, beliefs, and behaviors are adaptive. We owe our success to our uniquely developed ability to learn from others. This capacity enables humans to gradually accumulate information across generations and develop well-adapted tools, beliefs, and practices that are too complex for any single individual to invent during their lifetime.

The authors use examples of foraging societies which have “lost” knowledge through population crashes to illustrate the collective nature of human knowledge. It is reputedly an African proverb that “when an old man dies a library burns.” This was certainly the case when a particular group of Greenland Inuit experienced a population collapse which impacted their older cohorts to the point where they all expired before passing on their knowledge and skills. The community forgot the techniques of hunting caribou or making kayaks! The younger individuals who survived understood that these were possibilities, but none of them had the suite of skills necessary to replicate the abilities of past generations.

Let’s use a more contemporary example. Imagine you had 200 business students who had completed a term of differential calculus. Now give them a year to infer what they would have learned in integral calculus. I’m not sure even with a knowledge of differential calculus that a random set of 200 business students would be able to derive much of integral calculus. Part of the issue here is that often students who must take mathematics, but are not of a mathematical bent themselves, have no “big picture” grasp, but master a set of discrete techniques. They solve problems of a specific form, but are not able to improvise anew from first principles, because they’re rarely asked to do such things.

As humans we always take for granted an enormous store of cultural knowledge, which we absorb both implicitly and explicitly. We are adapted to be cultural creatures. This is why the authors posit the “cultural niche” rather than “cognitive niche” hypothesis in terms of the transmission of sets of ideas. The cognitive niche hypothesis emphasizes the individual competencies of humans. We have relatively advanced general intelligence aptitudes, and we are master imitators. Therefore, once an innovation occurs, instead of reinventing the wheel, humans replicate. This is far cheaper than the act of invention. A sequential and synergistic set of imitations can then lead to a ratchet effect of cultural evolution, as beneficial memes sweep through populations.

But there is a problem with this thesis: imitation can be viewed as a “free rider” strategy. Why think for yourself when you can let others do the heavy lifting for you! Don’t worry about the large menu, just have what “he’s having.” The problem is that this cheap and effective strategy is liable to spread, and over time more and more imitators anchor upon a few keystone innovators. These hard working de facto altruists though eventually become not keystones, but weak links. If one of them is gifted not with prudence and intellect, but arrogance and blindness, then a whole population can find itself hurtling over the cliff. Imitation is the root of irrational herds and chaotic mass social behavior.

What’s the solution to this? Naturally it is not a fixation upon a given strategy, but a facultative flexibility in learning from others. We don’t just imitate anyone, we imitate prestigious and successful individuals. Ergo, endorsements by sporting figures of seemingly unrelated products. And the nature of the environment impacts how liable we are to imitate or innovate. In a world subject to stasis individual innovation has costs and few upsides. Best to simply do as “the ancestors did.” Collective cultural memory plays a critical role in passing down “best practices.” But sometimes this can become maladaptive when circumstances change. European peasants resisted the attempts by their rulers to promote the cultivation of the potato for centuries because of its resemblance to nightshade. There was a deep-seated custom, which resulted in generations of suspicion which had to be overcome. Today the potato is a “customary” and “traditional” crop in many of these societies. Conservatism had its costs, as another food crop may have buffered famines in France and Russia.

An implication of this broader dynamic might be that environments which change more will have less pressure to enforce imitative conformity. I suspect that the protean social and technological milieu of the developed world does fit this description. “Doing your own thing” makes good sense when traditions and customs can never take hold because the background conditions of your environment are always in flux. Instead of vertical transmission of collective memory over time you see horizontal sweeps of fads and fashions across sets of peers, with each set of norms being overturned in rapid succession by that of the next cohort.

But let’s go back to the beginning: what does culture have to do with human evolution? The figure to the left was generated by Luke Jostins using hominin data sets. By this, I mean individuals who are not anatomically or behaviorally modern humans before ~200,000 years before the present at the outer limit. The story told above leans heavily on the “standard model” of recent human evolution, whereby modern humans arose ~50,000 years ago in Africa, and swept like wildfire around the world. The reason that modern humans conquered all before them is laid out explicitly in the paper: we are an incredibly flexible cultural creature. And because of the demands of culture you see a rapid encephalization of cranial capacities over the last “500,000” years. From Luke’s data it looks like actually it was more the last ~250,000 years, before which there was a more gentle ascent upwards. In any case, we may also be living through a revision of the old model of recent human origins. The details are yet to be written, but it looks like the story is going to be a little more complex and multi-layered than one East African tribe of African Eve exploding outward from its ancestral territory.

But the way I see it that only makes the idea of human adaptation to a cultural environment more plausible. Instead of a singular mutation ~50,000 years ago conferring the ability to speak, as Richard Klein would have it, it may have been a co-evolutionary process where the brain and culture operated in tandem, ratcheting toward modernity step by step. Still, one would have to revise the thesis that this is the hallmark of a sui generis behaviorally modern human lineage. Neandertals, for example, seem to have been subject to the same long term dynamics of encephalization….

Citation: Robert Boyd, Peter J. Richerson, & Joseph Henrich (2011). The cultural niche: Why social learning is essential for human adaptation. PNAS. DOI: 10.1073/pnas.1100290108

(Republished from Discover/GNXP by permission of author or representative)
 

In the image to the left you see three human males. You can generate three pairings of these individuals. When comparing these pairs which would you presume are more closely related than the other pairs? Now let me give you some more information. The rightmost image is of the president of Tanzania. The middle image is of the president of Taiwan (Republic of China). And finally, the leftmost image is of the prime minister of Papua New Guinea. With this information you should now know with certainty that the prime minister of Papua New Guinea and the president of Taiwan are much more closely related than either is to the president of Tanzania. But some of you may not have guessed that initially. Why? I suspect that physical inspection may have misled you. One of the most salient visible human characteristics is the complexion of our largest organ, the skin. Its prominence naturally leads many to mistakenly infer relationships where they do not exist.

This was certainly an issue when European explorers encountered the peoples of Melanesia. An older term for Melanesians is “Oceanic Negro,” and some sources suggest that the Spaniards who named the island New Guinea did so with an eye to the old Guinea on the coast of West Africa. To the left is an unrooted tree which illustrates the relationships between Papuans, Bantu from Kenya, and Han Chinese. Since the font is small I’ve underlined the focal populations in red. Africans are always the “outgroup” to any two non-African populations. This is a robust pattern whenever you look at averaged total genome phylogenies. In other words, when you don’t privilege particular genes in a phylogeny humanity can be divided into African and non-African branches.

But, if you look at pigmentation genes you get a different picture altogether. As it happens, not only is variation in skin color a trait of great social importance, but it turns out to be one of the few phenotypes whose genetic architecture has been well elucidated by genomics. There are about half a dozen genes responsible for most of the between population variation in complexion. Populations far from the equator seem to have developed parallel means toward lighter skins, while those near the equator are more likely to exhibit similarities. In other words, the phylogeny of these specific genes is out of sync with the average phylogeny of the genes in these populations. It is the latter which is a good reflection of demographic history, not the former.

A new paper in PLoS Genetics looks at these sorts of parallel trends more broadly, not just focusing on one trait. In particular, the authors explore the possibilities of natural selection operating upon standing genetic variation across divergent lineages. This means that there need not be a novel mutation which is driven up toward fixation rapidly by positive selection in a “hard sweep.” Rather, as populations diversify they may be subject to selection pressures which take their extant genetic variation and shift the mean of the quantitative trait in a particular direction, altering the balance of underlying allele frequencies rather than substituting novel genetic variants at one or two loci. These are “soft sweeps.”

First, the author summary, Parallel Adaptive Divergence among Geographically Diverse Human Populations:

Identifying regions of the human genome that differ among populations because of natural selection is both essential for understanding evolutionary history and a powerful method for finding functionally important variants that contribute to phenotypic diversity and disease. Adaptive events on timescales corresponding to the human diaspora may often manifest as relatively small changes in allele frequencies at numerous loci that are difficult to distinguish from stochastic changes due to genetic drift, rather than the more dramatic selective sweeps described by classic models of natural selection. In order to test whether a substantial proportion of interpopulation genetic differences are indeed adaptive, we identify loci that have undergone moderate allele frequency changes in multiple independent human lineages, and we test whether these parallel divergence events are more frequent than expected by chance. We report a significant excess of polymorphisms showing parallel divergence, especially within genes, a pattern that is best explained by geographically varying natural selection. Our results indicate that local adaptation in humans has occurred by subtle, repeated changes at particular genes that are likely to be associated with important morphological and physiological differences among human populations.

The statistics in this paper can be a bit daunting, but the basic logic is simple. The HGDP data set has a lot of SNP information on ~50 populations. These populations also exhibit variation in their phylogenetic relationships. We know, for example, that Amerindian populations are closer to East Asians than they are to Europeans. The authors pruned their population set down to very genetically distinctive groups: those which don’t have too much admixture and are in ecologically unambiguous regions (so discard the Uyghur). For example, Europeans and East Asians in temperate climes, Pygmies and Papuans in the tropics. Comparing two pairs which were phylogenetically unrelated but ecologically distinctive in a similar manner, they found broad evidence of parallel shifts in underlying allele frequency on a range of SNPs.

Remember, these are polymorphisms found in all populations. So natural selection is perturbing the frequencies around an average. Additionally, they focused on alleles with intermediate global frequency, so that one presumes there’s enough genetic variance for selection to be effective. Theoretically and through simulation the authors understand that a certain number of SNPs would be correlated in the manner which would imply parallel positive selection, and so possible convergence of trait values. But the authors found that for several comparisons across groups there was an excess of detected SNPs. And, the distribution across regions of the genomes for these detected SNPs is very suggestive. There was an excess of SNPs in coding regions of the genome. And, there was an even greater excess on base pairs where a change in state would result in a change in the protein! In other words, regions of the genome implicated in genuine function show more hints of convergence across unrelated lineages.
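To make the “excess in coding regions” logic concrete, here is a minimal sketch of a one-sided hypergeometric enrichment test. It is not the procedure used in the paper, which rests on the authors’ own simulations and statistics; the function name and every count below are made up for illustration.

```python
from math import comb

def enrichment_p_value(total_snps, coding_snps, candidate_snps, coding_candidates):
    """P(at least `coding_candidates` coding SNPs among the candidates)
    if the candidates were drawn at random from all SNPs (hypergeometric tail)."""
    upper = min(candidate_snps, coding_snps)
    favorable = sum(
        comb(coding_snps, k) * comb(total_snps - coding_snps, candidate_snps - k)
        for k in range(coding_candidates, upper + 1)
    )
    return favorable / comb(total_snps, candidate_snps)

# Hypothetical counts: 10,000 SNPs, 500 of them coding (5%),
# 200 parallel-divergence candidates, 20 of which are coding (10%).
p = enrichment_p_value(10_000, 500, 200, 20)
print(f"chance of >= 20 coding SNPs among 200 random picks: {p:.4g}")
```

A small p-value here would only say that coding SNPs are over-represented among the candidates relative to random draws. It would not identify which individual SNPs were the targets of selection.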

They also found particular patterns in the genes which were enriched for parallel selection:

Genes overlapping parallel divergent SNPs were modestly enriched for diverse functional categories associated with various cell types including neurons, lymphocytes, cancer, and epithelium…Among the most extreme parallel divergent genes (observed at a threshold of 0.5%) were the skin keratinization gene ABCA12…SH2B1, which controls serum leptin levels and body weight…GRM5, a glutamate receptor associated with schizophrenia…and with pigmentation via the closely linked TYR…ATP2A2, which causes a neuropsychiatric/keratinization disorder…F13A1, a coagulation factor linked to numerous cardiovascular diseases and to Alzheimer’s…and IFIH1, associated with antiviral defense, type 1 diabetes, and psoriasis...The pleiotropic nature of many of these genes suggests that selection on one trait may have affected the evolution of other traits.

On the last part: there’s a “correlation matrix” between genetic variance and trait variance. If you slam the genome with natural selection there will usually be a correlated response on a host of traits unrelated to the target of selection because of the complex contingent nature of biological pathways. Modulating gene X to shift the value of trait 1 to increase local fitness can have large consequences for trait 2, trait 3, trait 4, and so forth.
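The correlated response can be illustrated with the multivariate breeder’s equation from quantitative genetics (Lande’s Δz = Gβ), where G is the additive genetic variance-covariance matrix and β is the vector of selection gradients. This is a standard textbook relationship rather than anything taken from the paper, and the function name and numbers below are invented for the sketch.

```python
def correlated_response(g_matrix, beta):
    """Per-generation change in trait means: delta_z = G @ beta."""
    return [sum(g * b for g, b in zip(row, beta)) for row in g_matrix]

# Hypothetical two-trait example: the traits share a genetic covariance of 0.5,
# and selection acts directly on trait 1 only.
G = [[1.0, 0.5],
     [0.5, 1.0]]
beta = [0.2, 0.0]

delta_z = correlated_response(G, beta)
print(delta_z)  # trait 2 shifts even though its own selection gradient is zero
```

The off-diagonal entries of G are what carry selection on one trait over to the others. Set them to zero and trait 2 stays put.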

Of course most of the SNPs detected are not targets of selection. Remember, one assumes that simply due to random chance some SNPs will exhibit patterns which spuriously match those of regions which are the targets of parallel selection pressures. Rather, the importance of this paper is that it is another step to fleshing out the broader general theory of how adaptation and demographic events interplay across the arc of human history. It was always understood that convergent evolution is a force which must have shaped humans as they diversified and radiated across the world, but the genetic details were often left unspecified for various reasons. By filling in those details we may be able to stumble upon some very interesting general insights about the parameters which frame evolutionary process.

For example:

South Americans may carry alleles adapted to temperate climates due to their ancestral migration across Beringia, and they may have lacked adequate time and/or genetic variation to completely re-adapt to a tropical environment. One SNP that fits this hypothesis lies in DDB1, which protects the skin from solar UV exposure…and is one of the strongest examples of this parallel divergence pattern, with one allele fixed in South America, over 90% in Europe and East Asia, and less than 40% in African and Oceania….

A biological anthropologist once told me that South American Indians look like Siberians in their bodily proportions in relation to other tropical people. Which makes sense since they are probably the descendants of Siberians! As for skin color, this is an interesting trait insofar as it looks like our species evolved very dark skin relatively early in our history, at the point when we lost our fur. Tropical populations usually exhibit a modicum of functional constraint. They don’t deviate too far from the ancestral type. In contrast, temperate zone populations often “lose” the function on these pigment producing genes, though differently. To “break” a gene is far easier than to put it back together, and I suspect that’s what you’re seeing with Amerindians in the tropics. A small ancestral population which traversed Beringia only carried a non-functional copy, which probably accumulated many mutations. Once they reentered the tropical zone they “needed” function again, but that would take too many independent steps for 10-15,000 years to suffice.

As noted by the authors this was definitely a first pass. With thicker sequence level data, and better population coverage, presumably one could explore more fine-grained questions. But at least there are results which confirm what one always assumed in theory. Sometimes it is just good to do a check, because actually you never really know….

Citation: Tennessen JA, & Akey JM (2011). Parallel Adaptive Divergence among Geographically Diverse Human Populations PLoS Genetics

(Republished from Discover/GNXP by permission of author or representative)
 

Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how it might reshape their evolutionary trajectory. These sorts of issues are well known in their generality. They have implications from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I’m talking latitudes here. The further you get from the equator the more the climate varies over the season, and the lower the mean temperature, and, the less the aggregate radiation the biosphere catches. Allen’s rule and Bergmann’s rule are two observational trends which biologists have long observed in relation to many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, there tends to be an increase in mean mass as one moves away from the equator.

But these rules are just general observations. What process underlies these observations? The likely culprit would be natural selection of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated in further detail. A new paper in PLoS Genetics attempts to do this more rigorously and deeply than has been done before for one particular world wide mammalian species, H. sapiens sapiens. We have spanned the latitudes and longitudes, and so we’re a perfect test case for an exploration of the broader microevolutionary forces which shape variation.

The paper is Adaptations to Climate-Mediated Selective Pressures in Humans. Its technical guts can be intimidating, but its initial questions and final answers are less daunting. So let’s jump straight to the last paragraph of the discussion:

The results of this genome scan not only increase our understanding of the genetic landscape of adaptation across the human genome, but they may also have a more practical value. For example, they can be used to select candidate genes for common disease risk and to generate specific testable hypotheses regarding the functions of specific genes and variants. While the results of genome-wide scans for association with diseases and other traits are accumulating at a rapid pace, interpretation of these results is often ambiguous because the power to detect all common variants that are important in the etiology of the phenotype is incomplete. This is especially true in the case of complex traits, where variants at many loci may contribute to the phenotype, each with a small effect. By combining the evidence from GWAS with evidence of selection, it may be possible to separate true causative regions from the background noise inherent in genome-wide screens for association. To facilitate this, all of our empirical rank statistics are publically available. Moreover, results of selection scans that detect evidence for spatially-varying selection may be especially relevant to diseases that show substantial differences in prevalence across ethnic groups (e.g., sodium-sensitive hypertension, type 2 diabetes, prostate cancer, osteoporosis). In the future, this approach could be extended by including additional populations and aspects of the environment to gain a more complete understanding of how natural selection has shaped variation across the genome in worldwide populations. Furthermore, whereas we relied on linkage disequilibrium between (potentially un-genotyped) adaptive variants and genotyped SNPs, whole genome re-sequencing data should give a more complete picture of the variation that underlies adaptation.

How’d they infer this? First, they had a pretty wide coverage of populations from across the world. They pooled the HGDP and HapMap, as well as a few other populations of interest, Ethiopians, some Siberian groups, and Australian Aboriginals. I do wish that the Aboriginal data set was public, but it doesn’t seem to be! The Ethiopians are I assume the ones you can find in Behar et al. The authors had a null model which was predicated on the fact that variation in the frequencies of given genetic morphs, single nucleotide polymorphisms, should be best predicted by population history and relationships. That is, two populations will differ on a given locus in proportion to their genetic divergence, due to random forces such as genetic drift. Perturbations from this null model are possible targets of natural selection, which reshapes regions of the genome in a deterministic manner aiming at particular ends. Two 21st century classic examples of this phenomenon seem to be skin pigmentation and lactase persistence. Different populations with the same phenotype, in particular, light skin and the ability to digest lactose sugar as an adult, exhibit divergent genetic architectures.

They naturally looked to see how these deviations tracked environmental parameters you see above. Keep in mind that they did take into account correlations between these variables. Additionally, correlation does not equal causation, so there could be other variables which are correlated with the ones which they explored which might be responsible for the systematic perturbations.

Their method yielded a Bayes factor (BF) which measures the deviation from the null model for a given SNP. To judge off the bat whether these SNPs are plausibly the targets of adaptation you want to check to see if they’re enriched for certain classes of SNPs. They found that the SNPs which rejected the null model, where population history and demographics predicts genetic variation, tended to be much more likely to be genic or nonsynonymous. This means that the base pair is embedded in a coding gene, as opposed to much of the genome which isn’t translated into proteins. A nonsynonymous base pair is one at a location which changes the protein coded. Normally these sorts of changes are selected against because you don’t want to change the protein function, but when a population is adapting to a new environment this is obviously not so.
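The synonymous versus nonsynonymous distinction is easy to check mechanically. The sketch below builds the standard genetic code and classifies a single-base change within a codon. The function name is hypothetical, and the example codons are illustrative rather than drawn from the paper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_change(codon_before, codon_after):
    """Label a codon change by whether the encoded amino acid differs."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    return "synonymous" if before == after else "nonsynonymous"

print(classify_change("GAA", "GAG"))  # both codons encode glutamate (E)
print(classify_change("GAG", "GTG"))  # glutamate (E) becomes valine (V)
```

Third-position changes such as GAA to GAG are often silent, which is why the excess at nonsynonymous sites is the more telling signal.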

There are a host of results in the paper, but one pattern which seemed of interest was that different sets of SNPs can be selected in different population pools. Below are two panels which show the SNPs with significant BF, and how they vary as a function of the climatic variable depending upon the populations which are sampled. To the left you see the cluster which varied in western Eurasia, while to the right you see those which varied in eastern Eurasia. In a broad sense the target of selection was the same, but the specific SNPs which were pulled out of the set of potential targets still exhibit stochasticity:

Natural selection is deterministic in the broadest scale, but in its instantiations it can exhibit a great deal of randomness. Same phenotype. Different genotype. Similarly, the heat death of the universe may be determined, but there’s a lot of contingency of epiphenomenal detail between now and then. Modulating the range of populations analyzed often shifted the value of the statistic for a given SNP. Remember, averaging over the aggregate can remove important local information. That being said, the Venn Diagram below shows that there was a disproportionate tendency for the signals detected to be world wide. This indicates that the wheel isn’t reinvented as much as we might think. I wonder if it points to the limitations baked into the human genome in terms of the plasticity and flexibility of all its various pathways. There’s a structural engineer vetoing the elegant fancies of the architect?

The leftmost panel highlights the West Eurasian signals and the middle panel the East Eurasians.

As noted above these sorts of studies have both evolutionary and biomedical relevance. Perhaps the most intriguing result, albeit expected from other areas of research, is the role of antagonistic pleiotropy in many diseases. Concretely, it may be that a change in a particular location may increase reproductive fitness in a novel environment at the cost of later morbidity in life. The authors suggest that pathogenic resistance and inflammatory response may have the side effect of increasing susceptibility to a range of diseases of old age. Why is this important? I think that the authors are implying in part that a plausible evolutionary mechanism of adaptation should change our prior expectation that a given genome wide association is a false positive. At least I think that. If a SNP was the target of natural selection and shows up on GWAS, keep an eye on it! All the better if you have a good functional understanding of what’s going on there.

But more long-term, it might change our perception of the basal risk for classes of morbidity as they vary by population. Human populations have had different evolutionary histories. Their disease risks then might vary a great deal. Between population differences may be a lot less paradoxical than we think….

Citation: Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, & Gebremedhin A (2011). Adaptations to Climate-Mediated Selective Pressures in Humans. PLoS Genetics. DOI: 10.1371/journal.pgen.1001375

(Republished from Discover/GNXP by permission of author or representative)
 


Credit: Karl Magnacca

The Pith: In this post I review some findings of patterns of natural selection within the Drosophila fruit fly genome. I relate them to very similar findings, though in the opposite direction, in human genomics. Different forms of natural selection and their impact on the structure of the genome are also spotlighted in the course of the review. In particular, I explore how specific methods to detect adaptation on the genomic level may be biased by assumptions of classical evolutionary genetic models. Finally, I try to place these details in the broader framework of how best to understand evolutionary process in the “big picture.”

A few days ago I titled a post “The evolution of man is no cartoon”. The reason I titled it such is that as the methods become more refined and our data sets more robust it seems that previously held models of how humans evolved, and evolution’s impact on our genomes, are being refined. Evolutionary genetics at its most elegantly spare can be reduced down to several general parameters. Drift, selection, migration, etc. Exogenous phenomena such as the flux in census size, or environmental variation, have a straightforward relationship to these parameters. But, to some extent the broadest truths are nearly trivial. Down to the brass tacks what are these general assertions telling us? We don’t know yet. We’re in a time of transitions, though not troubles.

Going back to cartoons, starting around 1970 there were a series of debates which hinged around the role of deterministic adaptive forces and random neutral ones in the domain of evolutionary process. You have probably heard terms like “adaptationist,” “ultra-Darwinian,” and “evolution by jerks” thrown around. All great fun, and certainly ripe “hooks” to draw the public in, but ultimately that phase in the scientific discourse seems to have been beside the point. A transient between the age of Theory when there was too little of the empirics, and now the age of Data, when there is too little theory. Biology is a very contingent discipline, and it may be that questions of the power of selection or the relevance of neutral forces will loom large or small dependent upon the particular tip of the tree of life to which the question is being addressed. Evolution may not be a unitary oracle, but rather a cacophony from which we have to construct a harmonious symphony for our own mental sanity. Nature is one, and the joints which we carve out of nature’s wholeness are for our own benefit.

The age of molecular evolution, ushered in by the work on allozymes in the 1960s, was just a preface to the age of genomics. If Stephen Jay Gould and Richard Dawkins were in their prime today I wonder if the complexities of the issues on hand would be too much even for their verbal fluency in terms of formulating a concise quip with which to skewer one’s intellectual antagonists. Complexity does not make fodder for honest quips and barbs. You’re just as liable to inflict a wound upon your own side through clumsiness of rhetoric in the thicket of the data, which fires in all directions.

In any case, on this weblog I may focus on human genomics, but obviously there are other organisms in the cosmos. Because of the nature of scientific funding for reasons of biomedical application humans have now come to the fore, but there is still utility in surveying the full taxonomic landscape. As it happens a paper in PLoS Genetics, which I noticed last week, is a perfect complement to the recent work on human selective sweeps. Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans:

In Drosophila, multiple lines of evidence converge in suggesting that beneficial substitutions to the genome may be common. All suffer from confounding factors, however, such that the interpretation of the evidence—in particular, conclusions about the rate and strength of beneficial substitutions—remains tentative. Here, we use genome-wide polymorphism data in D. simulans and sequenced genomes of its close relatives to construct a readily interpretable characterization of the effects of positive selection: the shape of average neutral diversity around amino acid substitutions. As expected under recurrent selective sweeps, we find a trough in diversity levels around amino acid but not around synonymous substitutions, a distinctive pattern that is not expected under alternative models. This characterization is richer than previous approaches, which relied on limited summaries of the data (e.g., the slope of a scatter plot), and relates to underlying selection parameters in a straightforward way, allowing us to make more reliable inferences about the prevalence and strength of adaptation. Specifically, we develop a coalescent-based model for the shape of the entire curve and use it to infer adaptive parameters by maximum likelihood. Our inference suggests that ~13% of amino acid substitutions cause selective sweeps. Interestingly, it reveals two classes of beneficial fixations: a minority (approximately 3%) that appears to have had large selective effects and accounts for most of the reduction in diversity, and the remaining 10%, which seem to have had very weak selective effects. These estimates therefore help to reconcile the apparent conflict among previously published estimates of the strength of selection. More generally, our findings provide unequivocal evidence for strongly beneficial substitutions in Drosophila and illustrate how the rapidly accumulating genome-wide data can be leveraged to address enduring questions about the genetic basis of adaptation.

Figure 1 C shows the top line. As you can see, there’s a “trough” around non-synonymous substitutions. Non-synonymous simply means that a base pair substitution at that position within the codon changes the amino acid encoded. In contrast, a synonymous change does not. A substitution is not just a mutant variant though. It is rather an assessment of a population level shift from one allele to another. Neutral theory posited that most substitutions were not driven by natural selection, but rather random walk processes. Ergo, most evolutionary change was not adaptive. A simple way to check the power of selection against this background of stochastic variation is to measure the ratio of substitution between non-synonymous and synonymous bases. But this sort of thing is more appropriate when comparing closely related species. In the paper on selective sweeps in humans obviously that’s not going on, they were looking within one species. Instead the authors looked at reduction of variation across regions which may have been targets of natural selection. The reduction occurs because when one particular allele becomes the target of strong positive selection it pulls along adjacent linked regions in a “hitchhiking” process. Recombination works against this, resulting in decay over time of linkage disequilibrium which spikes in the wake of selection.

But these conceptions are predicated on a simple model of the emergence of variants, and the way selection does, or doesn’t, target these variants. One imagines a new mutant which arises against the ancestral genetic background. In a single-gene model the probability of fixation of a neutral mutant, that is, going to ~100% and substitution in the population, is 1/N (or 1/(2N) for diploids). In plain English the fixation probability for a mutant is inversely proportional to the effective population size. In contrast, the probability of fixation of a mutant which is selectively favored is proportional to its selection coefficient, which simply measures its fitness as a ratio to that of the population mean. The fixation of neutral variants is a random walk, and the time until fixation is directly proportional to population size. In contrast, selectively favored variants can sweep to fixation rather quickly. Being very conservative one can infer that the fixation of lactose tolerance in Northern Europeans due to a mutation on the LCT gene took ~7,000 years, or a little less than 300 generations. Because of this rapidity recombination has far less leisure with which to “chop” apart the physical associations of variants on the ancestral mutant genetic background. No wonder the LCT locus has one of the longest “haplotype blocks” in the European genome; a sequence of associated markers.
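These fixation probabilities can be made concrete with Kimura’s diffusion approximation for a single new mutant copy in a diploid population. The sketch assumes an idealized randomly mating population whose effective size equals its census size N, additive selection with coefficient s (intended for s ≥ 0), and a 25-year human generation time. The last is an assumption chosen only because it reproduces the “a little less than 300 generations” figure; the population size is hypothetical.

```python
import math

def fixation_probability(s, pop_size):
    """Kimura's approximation for one new copy among 2N gene copies."""
    p0 = 1.0 / (2 * pop_size)       # starting frequency of the new mutant
    if s == 0:
        return p0                   # neutral limit: 1/(2N)
    # (1 - e^{-4 N s p0}) / (1 - e^{-4 N s}), written with expm1 for accuracy
    return math.expm1(-4 * pop_size * s * p0) / math.expm1(-4 * pop_size * s)

def mean_neutral_fixation_time(pop_size):
    """Expected generations until fixation for a neutral mutant that does fix (~4N)."""
    return 4 * pop_size

N = 10_000                            # hypothetical effective population size
print(fixation_probability(0.0, N))   # neutral: 1/(2N)
print(fixation_probability(0.01, N))  # favored: close to 2s
print(mean_neutral_fixation_time(N))  # neutral fixation is slow
print(7_000 / 25)                     # years / assumed generation time
```

Under these assumptions a 1% advantage raises the chance of fixation several hundredfold over the neutral case, and a sweep on the order of a few hundred generations is far shorter than the ~4N generations a neutral variant would need.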

But let’s modify our mental model a bit. Imagine that a genetic variant has been floating around at a low frequency for a long time. There may be many copies of the mutant, associated with different genetic variants due to the impact of recombination. We can for example imagine a recessively deleterious allele which persists in low frequencies because of the lack of efficacy of selection (most alleles are found in heterozygote individuals with normal fitness). Many variants have multiple effects. Imagine that this allele has a dominant phenotypic effect which goes from being neutral to being very selectively favored. Now you have a situation where the genomic region will be dragged upward in frequency during adaptation, but there will be many regions, not just one. Concretely, if the selective event occurred only a few generations after the original mutant the impact on the local genome would be much stronger in terms of generating homogenization than if the event occurred dozens of generations after the original mutant, as the original genetic background would have been recombined and so lost its distinctive coherency.

This is a form of natural selection from “standing variation.” Old mutants floating around in the background noise, rather than new mutants. In the paper above the authors find a fair amount of conventional selective sweeps, but, they suggest that the higher ratios of the proportion of the genome under natural selection found by some researchers in Drosophila may be due to the fact that some methods catch the whole basket of selection, while others focus on more tractable “cartoon” models.

Of the selection which can be modeled as a classic selective sweep the authors also found a “power law” effect. There was a combination of a few hits of powerful selection, and more numerous bouts of weak selection. This is not totally unexpected according to theory. Some of the human traits which have been amenable to genome-wide association, such as pigmentation, probably fall under this category. Most of the trait variance is due to a few genes of large effect, but there are a larger number of loci which account for the minority balance of variance. The same no doubt can hold across evolutionary time with the dynamics of natural selection.

But we also shouldn’t get lost in the genomic trees and lose sight of the forest. Not only are evolutionary processes subject to molecular scale parameters such as recombination and mutation rates, but they are also impacted by organism and population scale parameters. One presumes that fruit flies are subject to different pressures and have had a different history from human beings, just as both have from philopatric amphibians. Humans have an enormous census size, and we’ve undergone a massive change in lifestyle over the last 10,000 years. But as land bound mammals we may exhibit more population substructure than some species, for example birds with a wide range. Additionally, because of a low long term effective population we have only so much genic variation to work with. Such a welter of details distorts attempts at elegance, but they need to be kept in mind.

The authors conclude:

In summary, our findings establish a distinctive, genome-wide signature of adaptation in D. simulans, suggesting that many amino acid substitutions are beneficial and are driven by two classes of selective effects. Enabled by a richer summary of diversity patterns that avoids an a priori choice of scale, these conclusions offer a coherent interpretation of the results of previous inferences. It will now be interesting to see whether similar findings emerge in other Drosophila species, which vary in their recombination rates, effective population sizes, and ecology.

I wouldn’t limit this just to Drosophila. Because the different fruit fly species have different distributions, natural histories, as well as common ancestral traits and genes, they’re an excellent laboratory of evolution. But eventually we’ll start sweeping our gazes across all the multitudinous branches of the tree of life. Soon.

Citation: Sattath S, Elyashiv E, Kolodny O, Rinott Y, & Sella G (2011). Pervasive Adaptive Protein Evolution Apparent in Diversity Patterns around Amino Acid Substitutions in Drosophila simulans PLoS Genetics : 10.1371/journal.pgen.100130

(Republished from Discover/GNXP by permission of author or representative)
 

President William Howard Taft

It is the best of times, it is the worst of times. On the one hand the medical consequences of human genomics have been underwhelming. This is important because this is the ultimate reason that much of the basic research is funded. And yet we’ve learned so much. The genetic architecture of skin color has been elucidated, and we’ve seen a clarification of patterns of natural selection in the human genome. The finding last spring of Neandertal admixture in modern human populations is perhaps the most awesome pure science finding of late, coming close to resolving a decades old debate in anthropology. This doesn’t cure cancer, but it does connect the dots about the human past, and that’s not trivial. We are a species haunted by our memories, so we might as well get them right!

But all hope is not lost. Research continues. And one area which general surveys of genomic variation have usually shown to be a target of natural selection, and which also has clear and immediate biomedical relevance, is that of metabolism. How we eat, and how we process and integrate the food we eat, is of obvious fitness relevance in the evolutionary and medical senses. It turns out that there is even variation in our saliva which is probably due to natural selection. The combination of diversity in human cuisine and susceptibility to the diseases of modern life indicate possibilities as to the relationship between past selection pressures and contemporary patterns of genetic variation. Of course one has to tread softly in this area; there are the inevitable confounds of environment, as well as the unfortunate probability of any given locus being of small effect size in its influence on any given trait.

A new paper in Genome Research reports a SNP which seems to have been subject to natural selection in Eurasians within the last 10,000 years. This variant is located within an exon on a gene, GIP, which produces peptides critical in the regulation of various metabolic pathways, in particular insulin response. A possible biomedical relevance to risk susceptibility is then explored subsequent to the evolutionary genomic preliminaries. Adaptive selection of an incretin gene in Eurasian populations:

Diversities in human physiology have been partially shaped by adaptation to natural environments and changing cultures. Recent genomic analyses have revealed single nucleotide polymorphisms (SNPs) that are associated with adaptations in immune responses, obvious changes in human body forms, or adaptations to extreme climates in select human populations. Here, we report that the human GIP locus was differentially selected among human populations based on the analysis of a nonsynonymous SNP (rs2291725). Comparative and functional analyses showed that the human GIP gene encodes a cryptic glucose-dependent insulinotropic polypeptide (GIP) isoform (GIP55S or GIP55G) that encompasses the SNP and is resistant to serum degradation relative to the known mature GIP peptide. Importantly, we found that GIP55G, which is encoded by the derived allele, exhibits a higher bioactivity compared with GIP55S, which is derived from the ancestral allele. Haplotype structure analysis suggests that the derived allele at rs2291725 arose to dominance in East Asians ∼8100 yr ago due to positive selection. The combined results suggested that rs2291725 represents a functional mutation and may contribute to the population genetics observation. Given that GIP signaling plays a critical role in homeostasis regulation at both the enteroinsular and enteroadipocyte axes, our study highlights the importance of understanding adaptations in energy-balance regulation in the face of the emerging diabetes and obesity epidemics.

This is a paper with several moving parts.

-There is genomics (the broad sweep of the genome)

-Genetics (a focus on a few genes and their consequences)

-Biochemistry

-And some allusion to epidemiology, as befits a paper which comes out of a medical department

The first observation is that rs2291725 differs a great deal across populations. As I said, it’s a SNP on an exon in GIP. Not only that, it’s nonsynonymous, which means that it’s in a position to change the structure and therefore function of the biochemical which the sequence is ultimately coding for. The T allele is the ancestral variant, while the C allele is the derived one. That means that C arose as a mutation against the background of T. There is a figure which shows the geographical distribution of the variance on this SNP from the HGDP data set in the paper, but I think the HGDP browser produces a crisper display, so here it is:

[Figure: rs2291725 allele frequencies across HGDP populations, from the HGDP browser]

As you can see the ancestral allele is dominant in Africa. In several populations it is fixed. In contrast among non-African populations there’s quite a bit of variation. In East Asia the derived variant is at a high frequency, though not fixed. In West Eurasia and North Africa the two variants are at rough balance, more or less. Finally, in the New World the derived variant is found in appreciable proportions, but the ancestral variant of the SNP is found at much higher proportions than in other non-African populations. Seeing as how Amerindians derive from a branch of East Eurasians, common descent from an ancestor with the derived allele can not explain the frequency discrepancy. Interestingly the HGDP Melanesians have amongst the highest frequencies of the derived allele in the data set.

In any case, most of the analysis was not done with the HGDP sample, but with the first two phases of the HapMap. The marker density is richer in this sample, and obviously it is easier to compare a few populations than dozens. So the primary populations of comparison in this study were the Chinese + Japanese (ASN), Utah Whites (CEU), and Yoruba from Nigeria (YRI). It was immediately noticeable in pairwise comparisons between HapMap populations that the SNP of interest in GIP was exceptional in its between-population difference when set against other nonsynonymous SNPs. The chart below shows the SNP in red, with the full distribution curve of Fst (proportion of between-population difference) illustrated by the bars in blue. rs2291725 is in the top 0.5% of Fst difference between ASN and YRI.

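[Figure: distribution of pairwise Fst for nonsynonymous SNPs in the HapMap populations (blue bars), with rs2291725 marked in red]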
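For readers who want a feel for what an Fst value means at a single SNP, here is a minimal sketch of Wright's Fst for one biallelic marker and two equally weighted populations, computed from expected heterozygosities. This is a textbook formulation and not necessarily the estimator used in the paper, and the allele frequencies in the loop are made-up illustrations, not the HapMap values.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled sample
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # the SNP is monomorphic in both populations
    return (h_total - h_within) / h_total


# Hypothetical frequency pairs, from no differentiation to complete differentiation.
for p1, p2 in ((0.5, 0.5), (0.3, 0.6), (0.1, 0.8), (0.0, 1.0)):
    print(f"p1={p1} p2={p2} Fst={fst_two_populations(p1, p2):.3f}")
```

Identical frequencies give an Fst of 0, and alternately fixed alleles give an Fst of 1. Most SNPs between continental groups sit much closer to the low end of that range.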

The expected Fst between continental races is on the order of ~0.15. The ASN vs. YRI difference is far greater than that, and even more exceptional when you note the skew of the distribution. As it happens there’s HapMap3 data on this SNP as well. It doesn’t add much value to the HGDP, but does confirm the general findings:

[Figure: rs2291725 allele frequencies in the HapMap3 populations]

Population descriptors:
ASW (A): African ancestry in Southwest USA
CEU (C): Utah residents with Northern and Western European ancestry from the CEPH collection
CHB (H): Han Chinese in Beijing, China
CHD (D): Chinese in Metropolitan Denver, Colorado
GIH (G): Gujarati Indians in Houston, Texas
JPT (J): Japanese in Tokyo, Japan
LWK (L): Luhya in Webuye, Kenya
MEX (M): Mexican ancestry in Los Angeles, California
MKK (K): Maasai in Kinyawa, Kenya
TSI (T): Tuscan in Italy
YRI (Y): Yoruban in Ibadan, Nigeria

Now that they’ve established between population variation at the SNP, what about the structure around the SNP? Remember, the SNP is one base pair. T in the ancestral state, C in the derived. The patterns of variation flanking the SNP in GIP can tell us a lot. What they found was this:

- Africans have several different haplotypes around the T allele. A haplotype is just a set of correlated markers

- The C allele in East Asians seems to be embedded within one haplotype, or set of markers

- There was a lot of linkage disequilibrium around the C allele in East Asians

In East Asians both EHH and iHS were consistent with, if not necessarily suggestive of, selection. A plausible scenario is that the C allele was subject to a powerful bout of natural selection recently, and the allele rose so rapidly in frequency that a selective sweep dragged along the flanking regions of the genome. This would homogenize the variance in that genic region within the population in question (East Asians), as the numerous other haplotypes would decline in proportion. To show the relationships of the various haplotypes within the three HapMap populations being analyzed here they produced an unrooted tree. Observe that the haplotype in which the derived variant is embedded has only Asians and Europeans, and is on a separate branch by itself:

[Figure: unrooted tree of the haplotypes around rs2291725 in the three HapMap populations]

I noted above that just because there is a lot of linkage disequilibrium and haplotype block structure in this region of the genome, it doesn’t necessarily mean that it was a target of natural selection. There may have been stochastic phenomena which produced these results, and so our inference would be a false positive. To check for this they ran several models and simulations which varied demographic parameters under neutral (non-selective) conditions, and for the Asian sample the iHS scores were generally not as low as those for the SNP of interest. This does not “prove” that demography can not explain these results, but it does shift the probability more toward natural selection than before.

The circumstantial evidence presented above is that the derived allele rose to frequency relatively recently (in general LD decays rapidly over time, so these tests detect more recent selective or demographic events). They ran a simulation under neutral parameters, and for the frequency of the derived haplotype it would take 100,000-500,000 years for the various populations to reach the values which we see (starting from the initial mutant gene copy). The latter figure is outside the bounds of modern humanity, while the former probably pre-dates the “Out of Africa” event. It is implausible that so much haplotype structure could be preserved over that much time, because recombination over the generations breaks apart associations between markers. Using the recombination rates, which would slowly degrade long haplotypes in the genome, the authors inferred that the C allele and its haplotype began to rise in frequency on the order of 12,000-2,000 years before the present.

Why would an allele rise to frequency within the past 10,000 years? The authors gave the game away in the abstract: humans shifted to different modes of primary production after the rise of agriculture. This is where the role of GIP in producing peptides which have a role in regulating our biochemistry is relevant. GIP is one of a class of hormones found in the intestine called incretins:

Incretins are a group of gastrointestinal hormones that cause an increase in the amount of insulin released from the beta cells of the islets of Langerhans after eating, even before blood glucose levels become elevated. They also slow the rate of absorption of nutrients into the blood stream by reducing gastric emptying and may directly reduce food intake. As expected, they also inhibit glucagon release from the alpha cells of the Islets of Langerhans….

Increased insulin reduces blood sugar. Diabetes is a malfunction of the insulin release mechanism, and so blood sugar begins to rise as individuals don’t take up their glucose. Glucagon has the opposite effect, increasing blood sugar. But just because there is a change in a nonsynonymous position in an exonic region of a gene of relevance to the pathway, it doesn’t mean that that necessarily impacts the pathway which is illustrated to the left. And for natural selection to have any traction it needs to have an impact on some sort of concrete biological process (unless we’re talking intra-genomic competition of some sort).

It turns out that rs2291725 is actually just outside the primary coding region for the GIP peptide. For it to be a functional variant there needs to be more to the story. As it turns out, there are other less common variants which are modified by changes at this SNP, GIP55S and GIP55G. The first is produced by the ancestral T allele, and the second by the derived C allele. GIP55S and GIP55G are also found in the intestine, though they only constitute a few percent of the total GIP.

But here’s where it gets really interesting: GIP55G exhibits more bioactivity over the long term. In other words it seems to be more potent than the generic GIP or GIP55S, the ancestral variant. They’ve gone from supposition based on the functional significance of the broader gene, to a connection between the T→C transition over the last 10,000 years and a measurable difference in peptide activity. As it turns out it may be that those with GIP55G would have a stronger insulin response, and so reduce blood sugar faster, than those without.

It doesn’t take a genius to figure out where they’re going with this. The relationship between insulin response and carbohydrates in our day and age is fraught. But we already suspect that carbs have reshaped the human genome through copy number variation in the amylase gene. It is interesting though that the derived variant has not fixed. That is, it hasn’t replaced the ancestral variant. This may be due to dominance, so that one copy is almost as efficacious as two, or it may be due to balancing selection of some sort, which the authors suggest in the text. At this point it’s time to jump to the discussion and let the authors speak for themselves. They start out well:

Based on the gene age estimation and biochemical analyses, our study revealed a functional mutation that is associated with the selection of the GIP locus in East Asian populations ~8100 yr ago and the presence of a cryptic GIP isoform. Specifically, we showed that the inventory of human GIP peptides has recently diverged and that individuals could express three different combinations of GIP isoforms (GIP, GIP55S, and GIP55G) with distinct bioactivity profiles. Future study of how this phenotypic variation affects glucose and lipid homeostasis in response to different diets and of which physiological variations in humans can be attributed to prior gene–environmental interactions at the GIP locus is crucial to a better understanding of human adaptations in energy-balance regulation.

As I observed above many of the researchers have a biomedical background, and the NIH is funding this. The evolutionary anthropological findings, cautious as they are, are fascinating and of deep interest. But I don’t think this is going to go anywhere:

It was hypothesized by Neel almost 50 yr ago that mismatches between prior physiological adaptations and contemporary environments can lead to health risks because the ancestral variants that have been selected for the organism’s fitness or reproductive success may not be optimal for the individual’s health in the new environment…In support of this thrifty genotype hypothesis, a number of genes in humans and house mice have been implied to have coevolved with the emergence of agricultural societies…and a rapid shift in diets is associated with the detrimental effects on human survival in a number of human populations…Conceptually, the serum-resistant GIP55G carried by the GIP103C haplotype may have been beneficial for individuals who have unconstrained access to the food supply in many agricultural societies by preventing severe hyperglycemia. As selection pressure changed in these societies, the ancient GIP103T haplotype could have become a liability and conferred a loss of fitness in the new environment. In addition, we speculate that the selection of GIP in East Asians may contribute to the heterogeneity in the risk of diabetes among major ethnic groups at the present time….

Do you believe that the Han Chinese have had a surfeit of food compared to Africans over the past 10,000 years? Or compared to Europeans? Indians have had more food than Africans? The populations of the New World are in a food-poor environment? This doesn’t make any sense as an evolutionary explanation because the stable state for most of human history has been one of Malthusianism. A few people had a lot of food, ergo, the association of wealth with corpulence. Additionally, one can imagine that societies transitioning between modes of production would have a period when land would be in surplus and there was a lot of food. But for most of history life was grinding. This is simply an unbelievable story. Additionally, this SNP can’t explain most of the variation in diabetes. South Asians have the highest rates in the world, but they have appreciable proportions of the derived variant. I am of the CC (derived-derived) genotype myself (I just checked on 23andMe), and I have a family risk of diabetes, so I know to ignore the relevance of these findings for myself when it comes to personal risk assessment.

There is probably not going to be one gene that explains diabetes, or obesity, etc. We already knew that, but there is a strange kabuki theater which goes on whereby research groups pretend as to the high significance of one locus, because how is it going to look to a granting agency that you’re out to explain ~1% of the variance in a trait for trivial predictive value? And yet usually they’re honest enough in the discussions to suggest that one finding needs to be integrated into a broader picture…as in the hundreds of other genes of interest!?!?!

This paper is fascinating as a work of human evolutionary history. They don’t have a good story, but they have results which need to be integrated into the bigger framework. But the paper is also a story of the culture of science today, driven by biomedical relevances which are often simply phantoms.

Citation: Chang CL, Cai JJ, Lo C, Amigo J, Park JI, & Hsu SY (2010). Adaptive selection of an incretin gene in Eurasian populations. Genome research PMID: 20978139

 

When I was in college I would sometimes have late night conversations with the guys in my dorm, and the discussion would random-walk in very strange directions. During one of these quasi-salons a friend whose parents were from Korea expressed some surprise and disgust at the idea of wet earwax. It turns out he had not been aware of the fact that the majority of the people in the world have wet, sticky, earwax. I’d stumbled onto that datum in the course of my reading, and had to explain to most of the discussants that East Asians generally have dry earwax, while convincing my Korean American friend that wet earwax was not something that was totally abnormal. Earwax isn’t something we explore in polite conversation, so it makes sense that most people would be ignorant of the fact that there was inter-population variation on this phenotype.

But it doesn’t end there. Over the past five years the genetics of earwax has come back into the spotlight, because of its variation and what it can tell us about the history and evolution of humans since the Out of Africa event. Not only that, it seems the variation in earwax has some other phenotypic correlates. The SNPs in and around ABCC11 are a set where East Asians in particular show signs of being different from other world populations. The variants which are nearly fixed in East Asia around this locus are nearly disjoint in frequency with those in Africa. Here are the frequencies of the alleles of rs17822931 on ABCC11 from ALFRED:
[Figure: allele frequencies of rs17822931 across world populations, from ALFRED]


The expression of the dry earwax phenotype is contingent on an AA genotype; it has recessive expression. So in a population where the allele frequency of A is ~0.50, the dry earwax phenotype would have a ~0.25 frequency. In a population where the A allele has a ~0.20 frequency, the dry earwax phenotype would be at ~0.04 frequency. Among people of European descent the dry earwax phenotype is present at proportions of less than ~5%. Because of recessive expression a larger minority of Japanese and Chinese should manifest wet earwax, though interestingly the ALFRED database indicates that Koreans are fixed for the A allele. In Africa conversely the G allele seems to be fixed.
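The arithmetic here is just Hardy-Weinberg: with random mating, the frequency of the recessive AA genotype is the square of the A allele frequency. A minimal sketch follows, assuming random mating and no selection within the generation. The allele frequencies 0.50 and 0.20 are the ones used above, while 0.90 is a made-up value standing in for a population where A is common but not fixed.

```python
def dry_earwax_frequency(q: float) -> float:
    """Expected frequency of the recessive AA (dry earwax) phenotype
    under Hardy-Weinberg proportions, given A allele frequency q."""
    return q * q


def expected_heterozygosity(q: float) -> float:
    """Expected frequency of GA heterozygotes, 2q(1 - q)."""
    return 2 * q * (1 - q)


for q in (0.20, 0.50, 0.90):
    dry = dry_earwax_frequency(q)
    het = expected_heterozygosity(q)
    print(f"A freq={q:.2f}  dry={dry:.2f}  wet={1 - dry:.2f}  heterozygotes={het:.2f}")
```

Note how even at an A frequency of 0.90 roughly a fifth of the population would still be expected to have wet earwax, which is the point about a larger wet-earwax minority where A is not fixed.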

So the question is: why? A new paper in Molecular Biology and Evolution argues that the allele frequency differences are a function of positive directional selection since humans left Africa ~100,000 years ago. The impact of natural selection on an ABCC11 SNP determining earwax type:

A nonsynonymous single nucleotide polymorphism (SNP), rs17822931-G/A (538G>A; Gly180Arg), in the ABCC11 gene determines human earwax type (i.e., wet or dry) and is one of most differentiated nonsynonymous SNPs between East Asian and African populations. A recent genome-wide scan for positive selection revealed that a genomic region spanning ABCC11, LONP2, and SIAH1 genes has been subjected to a selective sweep in East Asians. Considering the potential functional significance as well as the population differentiation of SNPs located in that region, rs17822931 is the most plausible candidate polymorphism to have undergone geographically restricted positive selection. In this study, we estimated the selection intensity or selection coefficient of rs17822931-A in East Asians by analyzing two microsatellite loci flanking rs17822931 in the African (HapMap-YRI) and East Asian (HapMap-JPT and HapMap-CHB) populations. Assuming a recessive selection model, a coalescent-based simulation approach suggested that the selection coefficient of rs17822931-A had been approximately 0.01 in the East Asian population, and a simulation experiment using a pseudo-sampling variable revealed that the mutation of rs17822931-A occurred 2006 generations (95% credible interval, 1023 to 3901 generations) ago. In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate. Our results provide a striking example of how local adaptation has played a significant role in the diversification of human traits.

The region around ABCC11 has come under scrutiny with the emergence of tests of natural selection predicated on inspecting patterns of linkage disequilibrium (LD). LD basically measures how far the association of genetic variants within the genome is shifted away from expectation. A selective sweep tends to generate a lot of LD around the target of natural selection because as the allele in question rises in frequency its neighbors also hitchhike along. The hitchhiking process means that within a population you may see regions of the genome which exhibit long sequences of correlated single-nucleotide polymorphisms (SNPs), or haplotypes. An initial selective event will presumably generate a very long homogenized block, which over time will break apart through recombination and mutation, as variation is injected back into the genome. The extent and decay of LD then can help us gauge the time and strength of selection events.
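To make "shifted away from expectation" concrete, here is a minimal sketch of the standard pairwise LD statistics D and D' for two biallelic loci, plus the textbook expectation that D shrinks by a factor of (1 - r) each generation under recombination rate r. The haplotype frequencies and the recombination rate are illustrative assumptions, not values from any of the papers discussed.

```python
def linkage_disequilibrium(p_ab: float, p_a: float, p_b: float):
    """Return (D, D') for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # observed minus expected under independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0  # signed; take abs() for |D'|
    return d, d_prime


def decayed_d(d0: float, r: float, generations: int) -> float:
    """Expected D after some generations of recombination at rate r."""
    return d0 * (1 - r) ** generations


# Hypothetical case: A and B always travel together, as right after a sweep.
d, d_prime = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
print(f"D={d:.3f} D'={d_prime:.2f}")
print(f"D after 100 generations at r=0.01: {decayed_d(d, 0.01, 100):.4f}")
```

A D' of 0 corresponds to linkage equilibrium, and a D' of 1 means at least one of the four possible haplotypes is missing entirely.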

But LD can emerge via other processes besides natural selection. Imagine for example that a population of Africans and Europeans mix in a given generation. Europeans and Africans have different genetic makeups, on average, so the initial generations will have more LD than expectation because recombination will only slowly break apart the physical connection between genomic regions from European and African ancestors. The decay of LD then can give one a sense of the time since admixture as well as selection. Not only that, stochastic demographic events and processes are also important and may drive the emergence of LD. Consider a bottleneck where the frequency of a particular haplotype is driven up by random genetic drift alone. The details of these alternative scenarios are explored in the 2009 paper The role of geography in human adaptation.

All this is preamble to the fact that there’s a lot of LD around ABCC11. Here’s a visualization from the HapMap populations:

[Figure: linkage disequilibrium (D’) around ABCC11 in the HapMap populations]

From left to right you have Chinese & Japanese, Utah whites, and the Yoruba from Nigeria. An absolute value of D’ ~0 means that there’s linkage equilibrium; the default or null state where there are no atypical excessive correlations of alleles across the genome. The axes here are pairwise combinations of SNPs around ABCC11, with a focus around rs17822931, a nonsynonymous SNP which seems to be the likely functional source of the variance in earwax and other phenotypes. In terms of LD rank order the results are not surprising: across the genome East Asians tend to exhibit more LD than Europeans, and Europeans exhibit more LD than the Yoruba. Part of this is probably a function of population history; a serial bottleneck model Out of Africa would posit that drift and other stochastic forces would have a stronger impact on the genomes of East Asians than Europeans. But this seems like it can’t be the whole picture here; note the variance in allele frequency in the New World as well as in Oceania. Some of the Amerindian populations seem to have a higher frequency of the ancestral G allele on rs17822931. The figure above is easier to understand; the Y-axis is showing you the extent of heterozygosity at a given location. GA is heterozygous, GG is homozygous. Africans again tend to exhibit more heterozygosity than non-Africans, but note the sharply diminished heterozygosity for the East Asian sample around rs17822931 in ABCC11. Remember that heterozygosity tends not to go above 0.50 in a random mating population in a diallelic model (though in selective breeding it may go above 0.50 for F1 generations).

The major findings of this paper beyond what was known before seem to be a) an explicit model of how East Asians could have arrived at a high frequency of the AA genotype at rs17822931, and, b) the correlation between climate and the frequency of A. I’ll get to the second point in a bit, but what about the first? Using the nature of variation in two microsatellites flanking the SNP of interest in East Asians, and assuming a recessive selection model, the authors posit that the A allele began to rise in frequency ~50,000 years ago, and that the selection coefficient was ~1% per generation. This is a significant value for the selection parameter, and the timing is possible in light of the separation of non-Africans into a western and eastern group around that period.

But honestly I’m pretty skeptical of this. The confidence intervals don’t inspire confidence, and from what little I know selection for recessive traits should exhibit less linkage disequilibrium. At low frequencies there is very little effect of natural selection on the allele because it is mostly “masked” in heterozygotes, and therefore there will be a long period before its proportion begins to rise more rapidly. During this time recombination will have time to chop up the haplotypes around the SNP, reducing the length of the statistically associated haplotype block. Also, the authors themselves don’t seem to believe that the phenotype of earwax itself was the target of selection, so its recessive expression pattern should be less important from where I stand.
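The "masking" argument can be illustrated with a minimal deterministic sketch of the standard one-locus selection recursion, comparing a fully recessive favored allele with an additive one. This ignores drift entirely and is not the coalescent simulation the authors ran. The selection coefficient of 0.01 echoes the paper's estimate, but the starting frequency of 0.01 and the target of 0.5 are arbitrary assumptions chosen for illustration.

```python
def generations_to_reach(q0: float, target: float, s: float, h: float,
                         max_generations: int = 1_000_000) -> int:
    """Generations for a favored allele to climb from q0 to target.

    Fitnesses: favored homozygote 1 + s, heterozygote 1 + h*s, other homozygote 1.
    h = 0 is fully recessive, h = 0.5 is additive. Deterministic, no drift.
    """
    q = q0
    for generation in range(max_generations):
        if q >= target:
            return generation
        w_hom, w_het = 1 + s, 1 + h * s
        mean_fitness = q * q * w_hom + 2 * q * (1 - q) * w_het + (1 - q) ** 2
        q = q * (q * w_hom + (1 - q) * w_het) / mean_fitness
    raise RuntimeError("target frequency not reached")


recessive = generations_to_reach(q0=0.01, target=0.5, s=0.01, h=0.0)
additive = generations_to_reach(q0=0.01, target=0.5, s=0.01, h=0.5)
print(f"recessive: {recessive} generations, additive: {additive} generations")
```

Under these assumptions the recessive allele spends many times longer at low frequency than the additive one, which is exactly the window in which recombination gets to erode the surrounding haplotype.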

The idea that the genes around ABCC11 might have something to do with adaptation to cold is suggestive, but almost every East Asian trait of distinction has been hypothesized to have something to do with cold at some point by physical anthropologists. You’d figure that the Cantonese lived in igloos going by all the myriad adaptations to frigid conditions which they exhibit. The reality is that much of China, Korea and Japan are subtropical today. In any case the last figure shows the correlation across several lineages. Earlier they found that by comparing variation around this region in humans with other primates that Africans seem to be subject to purifying selection. This means that there’s constraint so that neutral forces don’t change the frequencies of functionally significant regions. It is well known that on average Africans are more diverse than non-Africans, probably because the latter are a sampling of the former, but, on a small minority of genes the reverse is true. This is likely due to the relaxation of functional constraint as humans left the ancestral African environment. And this is clearly true for rs17822931; most non-African populations exhibit some heterozygosity. East Asians here are an exception, not the rule, in having derived allele frequencies nearly fixed. The regression lines in this last figure are all statistically significant. It is of interest that there are particularly strong correlations between latitude and frequency of the derived A allele among Europeans and Native Americans. In contrast the relationship within Asian populations is weaker. Only 17% of the allele frequency variance can be explained by latitude variance among the Asian ALFRED sample.

But we shouldn’t allow the hypothesis to rise and fall just on this evidence. After all there have likely been substantial movements of populations within the last 10,000 years. Perhaps especially in East Asia, where the expansion of the Han south may have triggered the movement of both the Thai and Vietnamese people out of South China and into mainland Southeast Asia. The best evidence of adaptation would be among admixed populations; presumably those at higher latitudes would have higher frequencies of the AA genotype than those at lower latitudes. Instead of categorizing the populations into three coarse classes, a more sophisticated treatment using ancestral quanta derived from STRUCTURE or ADMIXTURE as independent variables would probably be informative. Remember, adaptation should show evidence of decoupling ancestry from phenotype.

Finally, I have to point to this section of the discussion:

What is the cause of the selective advantage of rs17822931-A? Although the physiological function of earwax is poorly understood (Matsunaga 1962), dry earwax itself is unlikely to have provided a substantial advantage. The rs17822931-GG and GA genotypes (wet earwax) are also strongly associated with axillary osmidrosis, suggesting that the ABCC11 protein has an excretory function in the axillary apocrine gland (Nakano et al. 2009)…,

I really didn’t know what this meant. So I looked it up. Here’s what I found, A strong association of axillary osmidrosis with the wet earwax type determined by genotyping of the ABCC11 gene:

Apocrine and/or eccrine glands in the human body cause odor, especially from the axillary and pubic apocrine glands. As in other mammals, the odor may have a pheromone-like effect on the opposite sex. Although the odor does not affect health, axillary osmidrosis (AO) is a condition in which an individual feels uncomfortable with their axillary odor, regardless of its strength, and may visit a hospital. Surgery to remove the axillary gland may be performed on demand. AO is likely an oligogenic trait with rs17822931 accounting for most of the phenotypic variation and other unidentified functional variants accounting for the remainder. However, no definite diagnostic criteria or objective measuring methods have been developed to characterize the odor, and whether an individual suffers from AO depends mainly on their assessment and/or on examiner’s judgment. Human body odor may result from the breakdown of precursors into a pungent odorant by skin bacteria….

Perhaps the paper should have been titled “why barbarians smell bad”? In any case, an idea for a book title on Korean genetics: “the least smelly race.”*

Citation: Ohashi J, Naka I, & Tsuchiya N (2010). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular biology and evolution PMID: 20937735

* I’m referencing The Cleanest Race.

(Republished from Discover/GNXP by permission of author or representative)
 

Last month in Nature Reviews Genetics there was a paper, Measuring selection in contemporary human populations, which reviewed data from various surveys in an attempt to adduce the current trajectory of human evolution. The review didn’t find anything revolutionary, but it was interesting to see where we’re at. If you read this weblog you probably accept a priori that it’s highly unlikely that evolution “has stopped” because infant mortality has declined sharply across developed, and developing, nations. Evolution understood as change in gene frequencies will continue because there will be sample variance in the proportions of given alleles from generation to generation. But more interestingly adaptive evolution driven by change in mean values of heritable phenotypes through natural selection will also continue, assuming:

1) There is variance in reproductive fitness

2) That that variance is correlated with a phenotype

3) That those phenotypes are at all heritable. In other words, phenotypic variation tracks genotypic variation
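The earlier point about sample variance in allele proportions can be sketched with a toy simulation in which there is no selection at all. The Wright-Fisher sampling scheme, the population size, and the starting frequency below are assumptions for illustration, not anything from the review.

```python
import random


def drift_variance(p, n_diploid):
    """Binomial sampling variance of allele frequency after one generation."""
    return p * (1.0 - p) / (2 * n_diploid)


def wright_fisher_step(p, n_diploid, rng):
    """Draw 2N gene copies from the parental pool; return the new frequency."""
    n_copies = 2 * n_diploid
    sampled = sum(1 for _ in range(n_copies) if rng.random() < p)
    return sampled / n_copies


rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.5                 # hypothetical starting frequency
trajectory = [p]
for _ in range(20):
    p = wright_fisher_step(p, n_diploid=100, rng=rng)
    trajectory.append(p)
print(trajectory)
```

The frequency wanders from generation to generation even though every individual has the same expected fitness, which is all that "evolution as change in gene frequencies" requires.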

Obviously there is variance in reproductive fitness. Additionally, most people have the intuition that particular traits are correlated with fecundity, whether it be social-cultural identities, or personality characteristics. The main issue is probably #3. It is a robust finding for example that in developed societies the religious tend to have more children than the irreligious. If there is an innate predisposition to religiosity, and there is some research which suggests modest heritability, then all things being equal the population would presumably be shifting toward greater innate predisposition toward religion as time passes. I do believe religiosity is heritable to some extent. More precisely I think there are particular psychological traits which make supernatural claims more plausible for some than others, and those traits themselves are partially determined by biology. But obviously even if we think that religious inclination is partially heritable in a biological sense, it is also heritable in the familial sense of values passed from one generation to the next, and in a broader cultural context of norms imposed from on high. In other words, when it comes to these sorts of phenotypic analyses we shouldn’t get too carried away with clean genetic logics. In Shall the Religious Inherit the Earth? Eric Kaufmann notes that it is in the most secular nations that the fertility gap between the religious and irreligious is greatest, and therefore selection for religiosity would be strongest in nations such as Sweden, not Saudi Arabia. But as a practical matter biologically driven shifts in trait value in this case pale in comparison to the effect of strong cultural norms for religiosity.

Below are two of the topline tables which show the traits which are currently subject to natural selection. A + sign indicates that there is natural selection for higher values of the trait, and a – sign the inverse. An s indicates stabilizing selection, which tells you that median values have higher fitnesses than the extremes. The number of stars is proportional to statistical significance.


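[Table image not reproduced: first topline table of traits under selection]

[Table image not reproduced: second topline table of traits under selection]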

Some of this is not surprising. The age of the onset of menarche has been dropping in much of the world. I suspect this is mostly due to better nutrition, but a consequence of this shift is earlier fertility for some females. The authors are nervous about the robust correlation of higher fertility with lower intelligence, but notice that the pattern for wealth and income is different and more complicated. The key is to look at education. Whether you believe intelligence exists or not in any substantive concrete sense, those who are more intelligent are more likely to have had more education, and there’s a rather common sense reason why investing in more schooling would reduce your fertility: you simply forgo some of your peak reproductive years, especially if you’re female. The higher you go up the educational ladder the stronger the anti-natalist cultural and practical pressures become (the latter is a heavier burden for females because of their biological centrality in child-bearing, but both males and females are subject to the former). As with religion even if the differences have no biological implication because you believe the correlations are spurious or reject the existence of the trait one presumes that parents and subcultures pass on values to offspring. If higher education has anti-natalist correlations we shouldn’t be surprised if subsequent generations turn away from higher education. Their parents were the ones who were more likely to avoid it.

We live in interesting times.

(Republished from Discover/GNXP by permission of author or representative)
 

Evolution means many things to many people. On the one hand some scholars focus on time scales of “billions and billions,” and can ruminate upon the radical variation in body plans across the tree of life. Others put the spotlight on the change in gene frequencies on the scale of years, of Ph.D. programs. While one group must glean insight from the fossil remains of trilobites and ammonites, others toil away in dimly lit laboratories breeding nematodes and fruit flies, generations upon generations. More recently a new domain of study has been focusing specifically on the arc of animal development as a window onto the process of evolution. And so forth. Evolution has long been dissected by an army of many specialized parts.

And yet the core truth which binds science is that nature is one. No matter the disciplinary lens which we put on at any given moment we’re plumbing the same depths on some fundamental level. But what are the abstract structures of those depths? Can we project a tentative map of the fundamentals before we go exploring through observation and experiment? That’s the role of theoreticians. Charles Darwin, R. A. Fisher, and Sewall Wright. Evolution is a phenomenon which is on a deep level an abstraction, though through objectification we speak of it as if it were as concrete as the frills of the Triceratops. As an abstraction it is open to mathematical formalization. Models of evolution may purport to tell us how change over time occurs in specific instances, but the ultimate aim is to capture the maximum level of generality possible.

Though the original mathematical theoreticians of evolution, in particular R. A. Fisher and Sewall Wright, were critical in the formation of the Modern Neo-Darwinian Synthesis, their formal frameworks were not without critics from within the mainstream. Ernst W. Mayr famously rejected “beanbag genetics,” the view propounded specifically by R. A. Fisher and J. B. S. Haldane in England that a model of evolution could be constructed from singular genetic elements operating independently upon traits. Mayr, as an ecologist and naturalist, believed that this framework lacked the essential integrative or holistic aspect of biology as it manifested in the real world. Selection after all operated proximately on the fitness of the whole organism. We’ve come a long way since those debates. One of the problems with the earlier disputes is that they were not sufficiently informed by the empirical evidence because of the primitive nature of experimental and observational evolutionary biology. Molecular biology changed that, and now the rise of genomics has also become a game changer. Genomics gets at the concrete embodiment of evolutionary change at its root, the structure and variation of the genomes of organisms.

A new paper in PNAS is a nice “mash-up” of the old and the new, Genomic patterns of pleiotropy and the evolution of complexity:

Pleiotropy refers to the phenomenon of a single mutation or gene affecting multiple distinct phenotypic traits and has broad implications in many areas of biology. Due to its central importance, pleiotropy has also been extensively modeled, albeit with virtually no empirical basis. Analyzing phenotypes of large numbers of yeast, nematode, and mouse mutants, we here describe the genomic patterns of pleiotropy. We show that the fraction of traits altered appreciably by the deletion of a gene is minute for most genes and the gene–trait relationship is highly modular. The standardized size of the phenotypic effect of a gene on a trait is approximately normally distributed with variable SDs for different genes, which gives rise to the surprising observation of a larger per-trait effect for genes affecting more traits. This scaling property counteracts the pleiotropy-associated reduction in adaptation rate (i.e., the “cost of complexity”) in a nonlinear fashion, resulting in the highest adaptation rate for organisms of intermediate complexity rather than low complexity. Intriguingly, the observed scaling exponent falls in a narrow range that maximizes the optimal complexity. Together, the genome-wide observations of overall low pleiotropy, high modularity, and larger per-trait effects from genes of higher pleiotropy necessitate major revisions of theoretical models of pleiotropy and suggest that pleiotropy has not only allowed but also promoted the evolution of complexity.

The basic thrust of this paper is to test older theoretical models of evolutionary genetics, and their relationship to and dependence on pleiotropy, against new genomic data sets. In The Genetical Theory of Natural Selection R. A. Fisher proposed a model whereby all mutations affect every trait, and the effect size of the mutations exhibited a uniform distribution. Following in Fisher’s wake the evolutionary geneticist H. Allen Orr published a paper ten years ago, Adaptation and the cost of complexity, which argued that “…the rate of adaptation declines at least as fast as n⁻¹, where n is the number of independent characters or dimensions comprising an organism.” This is the “cost of complexity,” which lies at the heart of this paper in PNAS.

To explore these questions empirically the authors looked at five data sets:

- yeast morphological pleiotropy, based on the measures of 279 morphological traits in haploid wild-type cells and 4,718 haploid mutant strains that each lack a different nonessential gene (this also yielded quantitative measures)

- yeast environmental pleiotropy, based on the growth rates of the same collection of yeast mutants relative to the wild type in 22 different environments

- yeast physiological pleiotropy, based on 120 literature-curated physiological functions of genes recorded in the Comprehensive Yeast Genome Database (CYGD)

- nematode pleiotropy, based on the phenotypes of 44 early embryogenesis traits in C. elegans treated with genome-wide RNA-mediated interference

- mouse pleiotropy, based on the phenotypes of 308 morphological and physiological traits in gene-knockout mice recorded in Mouse Genome Informatics (MGI)

The first figure shows the results of the survey. You see in each data set the mean and median number of traits affected by mutations in a given gene, as well as the distribution of effects. Two conclusions are immediately evident: 1) most genes have a relationship only to a small number of traits, 2) very few genes have a relationship to many traits. You also see that the percentage of genes impacted by pleiotropy is rather small. This seems to immediately take off the table simplifying assumptions of a mutant variant producing changes across the full range of traits in a complex organism. Additionally the effects do not seem to exhibit a uniform distribution; rather, they’re skewed toward genes which are minimally or trivially pleiotropic. From the text:

Our genome-wide results echo recent small-scale observations from fish and mouse quantitative trait locus (QTL) studies…and an inference from protein sequence evolution…and reveal a general pattern of low pleiotropy in eukaryotes, which is in sharp contrast to some commonly used theoretical models…that assume universal pleiotropy (i.e., every gene affects every trait)

So if the theoretical models are wrong, what’s right? In this paper the authors argue that it seems as if pleiotropy has a modular structure. That is, mutations tend to have impacts across sets of correlated traits, not across a random distribution of traits. This is important when we consider the fitness implications of mutations, for if the impacts were not modular but randomly distributed, the resulting genetic correlations would more likely serve as dampeners on directional change in trait value.

Figure 2 shows the high degree of modularity in their data sets:

[Figure 2 image not reproduced]

Now that we’ve established that mutations tend to have clustered effects, what about their distribution? Fisher’s original model postulated a uniform distribution. The first data set, the morphological characteristics of baker’s yeast, had quantitative metrics. Using the results from 279 morphological traits they rejected the assumption of a uniform distribution. In fact the distribution was closer to normal, with a central tendency and a variance about the mode. Second, they found that standard deviations of effect sizes varied quite a bit as well. Many statistical models assume invariant standard deviations, so it is not surprising that that was the initial assumption, but I doubt many will be that surprised that the assumption turns out not to be valid. The question is: does this matter?

Yes. Within the parameter space being explored one can calculate distances which we can use to measure the effect of mutations. Panels C to F show the distances as a function of pleiotropic effect. The left panels are Euclidean distances while the right panels are Manhattan distances. The first two panels show the outcomes from the parameter values generated from their data sets. The second two panels use randomly generated effect sizes assuming a normal distribution. The last two panels use randomly generated effect sizes, and, assume a constant standard deviation (as opposed to the empirical distribution of standard deviations which varied).
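For readers who want the two distance measures spelled out, here is a minimal sketch. The effect vector is made up for illustration; each entry stands for the standardized effect of one gene deletion on one trait.

```python
import math


def euclidean_distance(effects):
    """Straight-line length of the effect vector: sqrt of the sum of squares."""
    return math.sqrt(sum(e * e for e in effects))


def manhattan_distance(effects):
    """City-block length of the effect vector: sum of absolute effects."""
    return sum(abs(e) for e in effects)


# Hypothetical gene affecting two traits, one upward and one downward.
toy_effects = [3.0, -4.0]
print(euclidean_distance(toy_effects))  # 5.0
print(manhattan_distance(toy_effects))  # 7.0
```

Both measures collapse a gene’s effects across many traits into a single total effect size, which is what gets plotted against the degree of pleiotropy.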

To connect these empirical results back to the theoretical models: there are particular scaling parameters, the values of which the earlier models assumed, but which can now be calculated from the real data sets. It turns out that the empirical scaling parameter values differ rather significantly from the assumed parameter values, and this changes the inferences one generates from the theoretical models. The empirically calculated value is b = 0.612, where b is an exponent on the right hand side of the equation which generates the distances within the parameter space. From the text: “the invariant total effect model…assumes a constant total effect size (b = 0), whereas the Euclidian superposition model…assumes a constant effect size per affected trait (b = 0.5).” Instead of looking at the number value, note what each value means verbally. What they found in the empirical data was that the effect size per affected trait was not constant. In this paper the authors found larger per-trait effects for genes affecting more traits, and this seems to be a function of the fact that b > 0.5, together with a normal distribution of effect sizes and variance in the standard deviation of effect sizes.
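To see what the three values of b mean, here is a small sketch. It assumes the scaling takes the power-law form T = a·n^b for the Euclidean total effect T of a gene affecting n traits; the constant a = 1 and the function names are illustrative assumptions. If every affected trait had the same effect size c, the Euclidean total would be c·√n, which is why b = 0.5 corresponds to a constant per-trait effect.

```python
import math


def total_effect(n_traits, b, a=1.0):
    """Assumed power-law scaling of total effect size: T = a * n**b."""
    return a * n_traits ** b


def rms_per_trait_effect(n_traits, b, a=1.0):
    """Root-mean-square effect per affected trait implied by a Euclidean total."""
    return total_effect(n_traits, b, a) / math.sqrt(n_traits)


for b in (0.0, 0.5, 0.612):
    per_trait = [round(rms_per_trait_effect(n, b), 3) for n in (1, 10, 100)]
    print(b, per_trait)
```

With b = 0 the per-trait effect shrinks as more traits are affected, with b = 0.5 it stays flat, and with b = 0.612 it grows, which matches the verbal description of the empirical result.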

This all leads us back to the big picture question: is there a cost of complexity? Substituting the real parameters back into the theoretical framework originated by Fisher, and extended by H. Allen Orr and others, they find that the cost of complexity disappears. Mutations do not affect all traits, so more complex organisms are not disproportionately impacted by pleiotropic mutations. Not only that, the modularity of pleiotropy likely decreases the risk of opposing fitness implications due to a mutation, since similar traits are more likely to be similarly affected in fitness. These insights are summarized in the last figure:

[Figure image not reproduced]

The one to really focus on is panel A. As you can see there is a sweet spot in complexity when it comes to the rate of adaptation. Contra earlier models there isn’t a monotonic decrease in the rate of adaptation as a function of complexity, but rather an increase up to an equipoise, before a subsequent decrease. At least within the empirically validated range of the scaling exponent. This is important because we see complex organisms all around us. When theory is at variance with the observational reality we are left to wonder what the utility of theory is (here’s looking at you, economists!). By plugging empirical results back into the theory we now have a richer and more robust model. I will let the authors finish:

First, the generally low pleiotropy means that even mutations in organisms as complex as mammals do not normally affect many traits simultaneously. Second, high modularity reduces the probability that a random mutation is deleterious, because the mutation is likely to affect a set of related traits in the same direction rather than a set of unrelated traits in random directions…These two properties substantially lower the effective complexity of an organism. Third, the greater per-trait effect size for more pleiotropic mutations (i.e., b > 0.5) causes a greater probability of fixation and a larger amount of fitness gain when a beneficial mutation occurs in a more complex organism than in a less complex organism. These effects, counteracting lower frequencies of beneficial mutations in more complex organisms…result in intermediate levels of effective complexity having the highest rate of adaptation. Together, they explain why complex organisms could have evolved despite the cost of complexity. Because organisms of intermediate levels of effective complexity have greater adaptation rates than organisms of low levels of effective complexity due to the scaling property of pleiotropy, pleiotropy may have promoted the evolution of complexity. Whether the intriguing finding that the empirically observed scaling exponent b falls in a narrow range that offers the maximal optimal complexity is the result of natural selection for evolvability or a by-product of other evolutionary processes…requires further exploration.

Citation: Wang Z, Liao BY, & Zhang J (2010). Genomic patterns of pleiotropy and the evolution of complexity. Proceedings of the National Academy of Sciences of the United States of America PMID: 20876104

Image credit: Moussa Direct Ltd., http://evolutionarysystemsbiology.org

(Republished from Discover/GNXP by permission of author or representative)
 

Last spring two very thorough papers came out which surveyed the genetic landscape of the Jewish people (my posts, Genetics & the Jews it’s still complicated, Genetics & the Jews). The novelty of the results was due to the fact that the research groups actually looked across the very diverse populations of the Diaspora, from Morocco, Eastern Europe, and Ethiopia to Iran. They constructed a broader framework in which we can understand how these populations came to be, and how they relate to each other. Additionally, they allow us to have more perspective as to the generalizability of medical genetics findings in the area of “Jewish diseases,” which for various reasons usually are actually findings for Ashkenazi Jews (the overwhelming majority of Jews outside of Israel, but only about half of Israeli Jews).

Whereas the two aforementioned papers were deep explorations of the genetic history of the Jewish people, and allowed for a systematic understanding of their current relationships, a new paper in PNAS takes a slightly different tack. First, it zooms in on Ashkenazi Jews: the Jews whose ancestors are from the broad swath of Central Europe, and who later expanded into Poland-Lithuania and Russia. These are the descendants of Litvaks, Galicians, and assimilated Jewish minorities such as the German Jews. Second, though constrained to a narrower population set, the researchers put more of an emphasis on the evolutionary parameter of natural selection. Like any population Jews have been impacted by drift, selection, migration (and its variant admixture), and mutation. Teasing apart these disparate parameters may aid in understanding the origin of Jewish diseases.

The paper is open access, so you don’t have to take my interpretation as the last word. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population:

The Ashkenazi Jewish (AJ) population has long been viewed as a genetic isolate, yet it is still unclear how population bottlenecks, admixture, or positive selection contribute to its genetic structure. Here we analyzed a large AJ cohort and found higher linkage disequilibrium (LD) and identity-by-descent relative to Europeans, as expected for an isolate. However, paradoxically we also found higher genetic diversity, a sign of an older or more admixed population but not of a long-term isolate. Recent reports have reaffirmed that the AJ population has a common Middle Eastern origin with other Jewish Diaspora populations, but also suggest that the AJ population, compared with other Jews, has had the most European admixture. Our analysis indeed revealed higher European admixture than predicted from previous Y-chromosome analyses. Moreover, we also show that admixture directly correlates with high LD, suggesting that admixture has increased both genetic diversity and LD in the AJ population. Additionally, we applied extended haplotype tests to determine whether positive selection can account for the level of AJ-prevalent diseases. We identified genomic regions under selection that account for lactose and alcohol tolerance, and although we found evidence for positive selection at some AJ-prevalent disease loci, the higher incidence of the majority of these diseases is likely the result of genetic drift following a bottleneck. Thus, the AJ population shows evidence of past founding events; however, admixture and selection have also strongly influenced its current genetic makeup.

The sample size of Ashkenazi Jews was ~400, and they looked at ~700,000 SNPs. As I said, how Jews relate to other populations really isn’t at the core of this paper as it was in the earlier ones from the spring, but there were the PCA plots (sorry Mike), a frappe bar plot, and a phylogenetic tree derived from the Fst statistic. Again, remember that PCA is showing you the largest independent components of genetic variation within the data. The bar plot posits a set of ancestral populations of which individuals are composites. And finally, Fst measures the between-population component of genetic variation. The larger the Fst across two populations the bigger the genetic distance.
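As a rough illustration of what Fst captures, here is a single-locus sketch for two populations. It is a textbook-style simplification under stated assumptions (one biallelic locus, equal weighting of the two populations, expected heterozygosity 2p(1 − p)); it is not the estimator used in the paper, and the allele frequencies are made up.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_total - H_within) / H_total for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = expected_heterozygosity(p_bar)
    if h_total == 0.0:
        return 0.0  # locus is fixed for the same allele everywhere
    h_within = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies: similar populations vs. divergent ones.
print(fst_two_populations(0.6, 0.4))
print(fst_two_populations(0.9, 0.1))
```

The more the two allele frequencies differ, the larger the share of variation that lies between populations rather than within them.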


Using the Druze & Palestinians as the ancestral Middle Eastern reference the authors estimated that the European admixture into Ashkenazi Jews is on the order of 30-55%. This is in the same ballpark as the previous studies, so no great surprise. As I stated in earlier posts the authors can spin the same results in very different ways. From what I can tell these authors are inclined to emphasize the strong possibility that in terms of genetic distance Ashkenazi Jews are somewhat closer to Europeans than they are to Levantine Arabs. Of course these sorts of assertions need to be handled with care. The genetic distance between Ashkenazi Jews and Tuscans is less than half that between Ashkenazi Jews and Russians, while the Jewish-Russian value is about 50% larger than the Jewish-Palestinian one. Remember that there’s a fair amount of circumstantial evidence that Tuscans may themselves be a relatively recent hybrid population between indigenous residents of the Italian peninsula and Near Easterners.

One thing that this paper does do is rebut any strong assertion that Ashkenazi Jews are a genetically homogeneous population which went through a powerful bottleneck. Basically, this is the idea that Jewish diseases are just an outcome of the operational inbreeding that occurs when genetic variation is expunged from a population through low effective population size. The clincher seems to be the comparison of heterozygosity of Ashkenazi Jews and gentile Europeans. The former are actually somewhat more heterozygous than the latter. There’s been a bit of evidence from previous research that the long term effective population size of Ashkenazi Jews was not necessarily very small, so this isn’t a total surprise. Remember that heterozygosity simply means the fraction of individuals heterozygous at a locus.

One way a population can become more heterozygous is, naturally, admixture. Remember that populations differ across many genes. As an example, there’s a pigmentation gene, SLC24A5, where all Europeans are at one state, and all West Africans in another. Naturally African Americans exhibit much more heterozygosity on this locus than the ancestral populations. The Ashkenazi Jewish case is less extreme because the two parental populations are genetically closer, but the principle still holds.
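The SLC24A5 logic can be put into a few lines. This sketch assumes random mating within the admixed population, so that expected heterozygosity is 2p(1 − p); the 20% admixture fraction is purely hypothetical, and the source frequencies of 1.0 and 0.0 just encode "all at one state" versus "all at another."

```python
def expected_heterozygosity(p):
    """Expected fraction of heterozygotes at a biallelic locus under random mating."""
    return 2.0 * p * (1.0 - p)


def admixed_frequency(p_source_1, p_source_2, fraction_from_1):
    """Allele frequency in a population drawing a given fraction from source 1."""
    return fraction_from_1 * p_source_1 + (1.0 - fraction_from_1) * p_source_2


# Source populations fixed for different alleles; hypothetical 20% admixture.
p_admixed = admixed_frequency(1.0, 0.0, fraction_from_1=0.2)
print(expected_heterozygosity(1.0))        # 0.0 in the first source
print(expected_heterozygosity(0.0))        # 0.0 in the second source
print(expected_heterozygosity(p_admixed))  # about 0.32 in the admixed group
```

When the two source frequencies are closer together, the same calculation yields a much smaller gain in heterozygosity, which is why the Ashkenazi case is less extreme.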

A consequence of recent admixture between genetically different populations is high levels of linkage disequilibrium, non-random associations of alleles at different loci across the genome. Why? There are many genes where two populations may be very different. Offspring inherit half their genome from one parent, and half from the other, and the parents pass along to their offspring particular associations of alleles. There may be a set of European distinctive alleles on a chromosome, and an African distinctive set of alleles, so that in a hybrid individual the alleles are strongly correlated across loci. These associations are broken down over time by recombination. The regularity of this process can serve as a clock with which to measure the period since admixture. African Americans were used to calibrate the time since admixture for the Uyghur people of western China, who are mixed from West and East Eurasian populations. The authors did not do this in this paper, I assume because the ancestral populations were genetically rather close in comparison to the two above examples, so there’d be less linkage disequilibrium to break down in the first place.
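The clock idea can be sketched with the standard two-locus result that linkage disequilibrium D decays by a factor of (1 − r) per generation under random mating. Everything here is a simplified assumption (a single pulse of admixture, no LD within the source populations, a known recombination fraction r), and the numbers are hypothetical.

```python
import math


def initial_admixture_ld(m, delta_a, delta_b):
    """D right after mixing: m * (1 - m) * delta_a * delta_b.

    m is the fraction from source 1; delta_a and delta_b are the allele
    frequency differences between the sources at loci A and B.
    """
    return m * (1.0 - m) * delta_a * delta_b


def ld_after(d0, r, generations):
    """LD remaining after some generations of recombination at rate r."""
    return d0 * (1.0 - r) ** generations


def generations_since_admixture(d0, d_now, r):
    """Invert the decay curve to estimate the time since the admixture pulse."""
    return math.log(d_now / d0) / math.log(1.0 - r)


d0 = initial_admixture_ld(m=0.5, delta_a=1.0, delta_b=1.0)
d_now = ld_after(d0, r=0.01, generations=50)
print(generations_since_admixture(d0, d_now, r=0.01))  # recovers about 50
```

Note that the starting LD scales with the product of the frequency differences, so closely related source populations generate little LD to begin with, in line with the guess about why the authors skipped this analysis.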

In the Ashkenazi Jewish population they found more linkage disequilibrium than in Europeans as well as longer haplotypes. This could be the result of a population bottleneck where drift could drive up the frequency of blocks of the genome, but as they note in the paper that should probably reduce heterozygosity. The natural inference then is that admixture between distinct populations can explain both data points.

But let’s cut to the chase. What genes exhibit signatures of natural selection in Ashkenazi Jews? More precisely, what distinctive regions of the genome exhibit signatures of natural selection? They used the standard haplotype-based methods. Basically you’re looking for regions of the genome where there are long blocks of correlated alleles, signs of a selective sweep due to a favored variant which dragged along flanking genomic regions as it rose rapidly in frequency, more rapidly than recombination could break apart the associations. Because recombination does break up associations over time, you need the selective sweeps to be relatively recent to detect them with these methods. Since the Jewish people, and Ashkenazi Jews more particularly, are relatively recent historically, timing shouldn’t be an issue for Jewish specific sweeps. But another factor is that the two primary tests they used, EHH and iHS, are not good at picking up sweeps which are just starting. EHH is geared toward sweeps which are almost complete, so the frequency of the selected allele is near 100%. iHS is better at mid-range values. Using a combination of these two techniques they found six genes which are implicated in diseases characteristic of Ashkenazi Jews and which have the hallmarks of natural selection. Natural selection is self-evident, so what seems to have been going on here is that the disease was simply a side effect or byproduct of adaptation.
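To give a feel for what a haplotype-based statistic measures, here is a deliberately simplified sketch of extended haplotype homozygosity: the probability that two randomly chosen chromosomes carrying the core allele are identical over the first few flanking sites. The toy haplotypes are made up, and real implementations (extension in both directions, phasing, the integration and standardization behind iHS) are not shown.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, n_sites):
    """Share of chromosome pairs that are identical across the first n_sites.

    haplotypes: equal-length strings, each starting at the core site and
    all carrying the same core allele.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(h[:n_sites] for h in haplotypes)
    identical_pairs = sum(comb(c, 2) for c in counts.values())
    return identical_pairs / comb(n, 2)


# Hypothetical chromosomes; position 0 is the core site.
toy_haplotypes = ["AAG", "AAG", "AAT", "ACT"]
for n_sites in (1, 2, 3):
    print(n_sites, ehh(toy_haplotypes, n_sites))
```

The value starts at 1 at the core and falls with distance as recombination and mutation split the haplotypes; after a recent sweep it falls unusually slowly.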

The strongest signal they found was in ALDH2. The strongest signal in Europeans, LCT, was not found in Ashkenazi Jews. But is LCT a strong signal in Europeans? Many Southern European populations have low frequencies of the derived LCT allele, indicating that they haven’t been subject to strong selection for lactase persistence. These are the same populations genetically close to the Ashkenazi Jews. The authors suggest that the Jewish-European admixture occurred before the sweep of the derived LCT allele, but it seems more plausible that the Ashkenazim simply admixed with a European population, such as Italians, which does not exhibit much lactase persistence. As for ALDH2, the association between genetic variation on this locus and alcoholism is well known, and has been used to explain the low Jewish rates of the disease. In this case, the authors posit that protection from alcoholism is a positive side effect of natural selection:

The mechanism driving selection of the ALDH2 locus is unknown, but a plausible target of selection also within this selected region is the TRAFD1/FLN29 gene, which is a negative regulator of the innate immune system, important for controlling the response to bacterial and viral infection (49). TRAFD1/FLN29 may have conferred a selective advantage in the immune response to a pathogen, perhaps near the time that the Jews returned to Israel from their Babylonian captivity. Despite the unclear selective mechanism, this remains a remarkable example of a putatively selected region accounting for a known population phenotype.

Many of the other loci naturally did not show signatures of natural selection. But this sort of work is exploratory, and there are limits to the power of their techniques. As it is, it seems that we’re very far along on understanding the phylogenetic tree of the Jewish people, and we’re finally getting a grip on the exogenous parameters which might prune the branches.

Citation: Steven M. Bray, Jennifer G. Mulle, Anne F. Dodd, Ann E. Pulver, Stephen Wooding, & Stephen T. Warren (2010). Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population PNAS : 10.1073/pnas.1004381107

Related: John Hawks, New data on Ashkenazi population history.

Image Credit: Wikimedia

(Republished from Discover/GNXP by permission of author or representative)
 

One of the major issues which has loomed at the heart of biology since The Origin of Species is why species exist, as well as how species come about. Why isn’t there a perfect replicator which performs all the conversion of energy and matter into biomass on this planet? If there is a God the tree of life almost seems to be a testament to his riotous aesthetic sense, with numerous branches which lead to convergences, and an inordinate fascination with variants on the basic morph of beetles. From the outside the outcomes of evolutionary biology look like a patent mess, a sprawling expanse of experiments and misfires.

A similar issue has vexed biologists in relation to sex. Why is it that the vast majority of complex organisms take upon themselves the costs of sex? The existence of a non-offspring bearing form within a species reduces the potential natural increase by a factor of two before the game has even begun. Not only that, but the existence of two sexes who must seek each other out expends crucial energy in a Malthusian world (selfing hermaphrodites obviously don’t have this problem, but for highly complex organisms they aren’t so common). Why bother? (I mean in an ultimate, not proximate, sense)

It seems likely that part of the answer to both these questions on the grand scale is that the perfect is the enemy of long term survival. Sexual reproduction confers upon a lineage a genetic variability which may reduce fitness by shifting populations away from the adaptive peak in the short term, but the fitness landscape itself is a constant bubbling flux, and perfectly engineered asexual lineages may all too often fall off the cliff of what was once their mountain top. The only inevitability seems to be that the times change. Similarly, the natural history of life on earth tells us that all greatness comes to an end, and extinction is the lot of life. The universe is an unpredictable place and the mighty invariably fall, as the branches of life’s tree are always pruned by the gardeners red in tooth and claw.

But it is one thing to describe reality in broad verbal brushes. How about a more rigorous empirical and theoretical understanding of how organisms and the genetic material through which they gain immortality play out in the universe? A new paper which uses plant models explores the costs and benefits of admixture between lineages, and how those two dynamics operate in a heterogeneous and homogeneous world. Population admixture, biological invasions and the balance between local adaptation and inbreeding depression:

When previously isolated populations meet and mix, the resulting admixed population can benefit from several genetic advantages, including increased genetic variation, the creation of novel genotypes and the masking of deleterious mutations. These admixture benefits are thought to play an important role in biological invasions. In contrast, populations in their native range often remain differentiated and frequently suffer from inbreeding depression owing to isolation. While the advantages of admixture are evident for introduced populations that experienced recent bottlenecks or that face novel selection pressures, it is less obvious why native range populations do not similarly benefit from admixture. Here we argue that a temporary loss of local adaptation in recent invaders fundamentally alters the fitness consequences of admixture. In native populations, selection against dilution of the locally adapted gene pool inhibits unconstrained admixture and reinforces population isolation, with some level of inbreeding depression as an expected consequence. We show that admixture is selected against despite significant inbreeding depression because the benefits of local adaptation are greater than the cost of inbreeding. In contrast, introduced populations that have not yet established a pattern of local adaptation can freely reap the benefits of admixture. There can be strong selection for admixture because it instantly lifts the inbreeding depression that had built up in isolated parental populations. Recent work in Silene suggests that reduced inbreeding depression associated with post-introduction admixture may contribute to enhanced fitness of invasive populations. We hypothesize that in locally adapted populations, the benefits of local adaptation are balanced against an inbreeding cost that could develop in part owing to the isolating effect of local adaptation itself. The inbreeding cost can be revealed in admixing populations during recent invasions.

First, plants are good models to explore evolutionary genetics. They’re not as constrained as say mammals, or the typical tetrapod, when it comes to barriers to gene flow between distinct taxa. Hybridization is common, and plants can also self-fertilize as well as cross-fertilize, allowing researchers to push the genetic pool in different directions (“selfing” obviously reduces the effective population and is an extreme form of inbreeding, so it’s a good way to purge genetic variation really quickly). In a perfect abstract world of evolution one might imagine Richard Dawkins’ vehicles and replicators as fluid entities which float along a turbid sea of evolutionary genetic parameters, drift, migration, mutation and selection. But reality is constrained to a DNA substrate, which has its own parameters such as recombination, modulators such as epigenetics, and numerous ways to express variation through gene regulation. It’s complicated, and stripping the issues down to their pith is easier said than done.

But the broader dynamic being examined here is the generalist-specialist trade-off, which I think is relevant to the two issues I introduced earlier in this post. Specialists are optimized for their own position in the adaptive landscape, but have difficulties when it is perturbed. Generalists always have less than maximum fitness in all landscapes, but higher average fitness across them because they can adapt to changes. Specialization is local adaptation of particular lineages, while in the generalist case you can have invasive species in novel environments. They’re obviously facing an adaptive landscape which is at some remove from what any of the introduced genotypes were “optimized” for, so hybridization produces something new for something new.

In the first figure of the paper you see F3 wild barley descended from two parental lineages, ME and AQ. The left panels show seed output as a function of heterozygosity, and the right panels as a function of ME genome content. Remember that in subsequent generations the descendants of hybrids will vary quite a bit in genetics and phenotype as the original alleles re-segregate.

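[Figure 1 of the paper: seed output of F3 wild barley as a function of heterozygosity (left panels) and ME genome content (right panels)]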

The takeaway is that in novel environments genetic variation seems to result in increased fitness. Why? One concept which one has to introduce is heterosis, whereby crosses between homogeneous lineages produce more fit offspring. One reason this may be is that there is overdominance, where heterozygotes have greater fitness than the homozygotes. This is the case with sickle-cell disease and malaria. Another reason may be that in the original parental lineages there was a higher fraction of alleles which were deleterious in homozygote genotypes. In plain English, inbreeding resulted in genetic drift which cranked up the proportion of alleles implicated in recessively expressed negative phenotypes. The authors argue though that in the native context local adaptation is strong enough to be a barrier against too much gene flow between the parental wild barley lineages, so the deleterious alleles are less likely to be masked. Only in a novel environment when that benefit was removed from the equation could the negative consequences of inbreeding come to the fore in the total calculus.

Figure 2 shows the results of experiments which examine the fitness of white campion, a European species which has been introduced in North America. In the left panel are crosses between native European lineages, with distance between parental lineages on the x-axis. In the right panel you have the same experiment, but with North American variants, which are products of introductions from various regions of Europe. The plants were grown in a “common garden,” to show how all the genotypes performed when environment was controlled.

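[Figure 2 of the paper: fitness of white campion crosses against distance between parental lineages, for European (left panel) and North American (right panel) variants]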

As you can see moderate levels of hybridization entailed a benefit in the European variants, but not the North American variants. Hybridization between variants which were too distant did produce outbreeding depression in the European case, suggesting perhaps that disruption of co-adapted gene complexes resulted in a greater fitness cost than the masking of deleterious alleles due to inbreeding. One can make the inference from these data that the introduced white campion lineages are already hybridized, the barriers to crossing being removed by a disruption of the adaptive landscapes which each native lineage was optimized for.

Here are the authors from the discussion talking about invasions of exotic species:

Provided that multiple introductions from different source populations have occurred, the benefits of admixture become freely available to introduced populations that do not yet show a pattern of local adaptation. Because the benefits are potentially large, admixture may play an important role during early invasions. Native populations often show evidence of inbreeding depression…and one instant reward of admixture in the introduced range is the release of this genetic burden. Such heterosis effects can contribute significantly to the establishment and early success of invasive species…When tested together in a common garden experiment, invaders can show enhanced fitness-related traits compared with populations from their native range…If there is evidence of admixture, the effects of heterosis might be a default explanation for such observations, perhaps providing a null expectation against which other explanations (such as trait evolution) need to be tested.

What have plants to do with life as a whole? I assume much. Plants differ in the details, but compared to other complex multicellular organisms in regards to evolutionary genetics they’re quite liberated. By this, I mean that their modes of reproduction and promiscuity in hybridization make them more of an ideal “frictionless” test case of evolutionary biology and the power of the classical parameters. Perhaps given enough time natural selection would produce the ideal replicator to rule them all, to drive all others to extinction. But that day is not this day. And that day may never come because the universe is far too protean and erratic. Life is varied, on the phenotypic and genotypic level, and the exogenous processes of climate and geology continue to warp and reshape the adaptive landscape. And more subtly, but just as critically, life is always in an endless race with itself, as pathogens co-evolve with their hosts, and predators figure out how to outfox their prey. Life warps its own adaptive landscapes, and the innovation of one branch may lead to extinction of others as well as the proliferation of new branches.

More prosaically and anthropocentrically what does this say about us? Humans are an expansive species, and over the past 500 years different lineages have been hybridizing promiscuously. New genotypes have arisen in altered landscapes, and our pathogens are also riding the high tide of globalization onward and upward. We are ourselves a “natural experiment.”

Image Credit: Olivia Munn by Gage Skidmore

Link hat tip: Dienekes.

Citation: Verhoeven KJ, Macel M, Wolfe LM, & Biere A (2010). Population admixture, biological invasions and the balance between local adaptation and inbreeding depression. Proceedings. Biological sciences / The Royal Society PMID: 20685700

(Republished from Discover/GNXP by permission of author or representative)
 

How we perceive nature and describe its shape are a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But, it is also the case that there is some modification in relation to the African parent in the offspring, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words our perception of the outcomes of inheritance is contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).


Sometimes though the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But, it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation on the gene have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant gene though is balanced by the fact that mutant homozygotes have decreased fitness.
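The balance described above can be sketched with the textbook one-locus model of heterozygote advantage. The selection coefficients here are illustrative assumptions, not measured values for sickle-cell: normal homozygotes have fitness 1 - s, heterozygotes 1, and mutant homozygotes 1 - t. Under that model the mutant allele settles at frequency s / (s + t) rather than sweeping to fixation or being lost.

```python
def next_generation(q, s, t):
    """One generation of selection on a mutant allele at frequency q.

    Genotype fitnesses: normal homozygote 1 - s, heterozygote 1,
    mutant homozygote 1 - t. Random mating is assumed.
    """
    p = 1.0 - q
    w_normal, w_het, w_mutant = 1.0 - s, 1.0, 1.0 - t
    mean_fitness = p * p * w_normal + 2 * p * q * w_het + q * q * w_mutant
    # Mutant alleles come from mutant homozygotes and half of heterozygotes.
    return (q * q * w_mutant + p * q * w_het) / mean_fitness


# Illustrative coefficients (assumed): a modest malaria cost for normal
# homozygotes, a severe disease cost for mutant homozygotes.
s, t = 0.1, 0.8

q = 0.01  # the mutant starts out rare
for _ in range(1000):
    q = next_generation(q, s, t)

predicted = s / (s + t)
print(f"simulated equilibrium: {q:.4f}, predicted s/(s+t): {predicted:.4f}")
```

The point of the sketch is that the equilibrium depends only on the ratio of the two costs. The harsher the disease in mutant homozygotes relative to the stress on normal homozygotes, the lower the frequency at which the mutant stalls.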

We can then construct a narrative of the long term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population, mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases in frequency any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. This means that if a change at a locus increases aggregate fitness, it may nevertheless destabilize long established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge no one knows major negative consequences of carrying the alleles which confer lactase persistence (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has only come to the fore in the past century in all likelihood). But in other cases the outcomes may not be as serious as that of sickle-cell anemia, but may rise above the level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to meet a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but, it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happened to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome, in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide that did not imply that MYH9 was the target of natural selection per se, opening the likely possibility for other causal associations. A convenience in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they ascertained the between population differences they then looked at differences in allele frequencies in cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:

[Table 1 of the paper: frequencies of the WT, G1 and G2 variants in cases and controls]

WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, associated alleles across the locus of the APOL1 gene. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomic regions whose information is eventually translated into proteins. In other words, these are non-neutral functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which are found to differ between cases and controls they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, the probability of FSGS or H-ESKD in this case (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis displays (horizontal from your perspective) negative-log of p-values for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the more probable that the signal is real and not random. This produces huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).
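For readers unused to the negative-log scale, here is a small sketch of the conversion. The helper name `neg_log10` is hypothetical, and the p-values are chosen for illustration, with the last one matching the order of magnitude quoted above.

```python
import math


def neg_log10(p_value):
    """Height of a marker on a -log10(p) association plot."""
    return -math.log10(p_value)


# Illustrative p-values: a nominal 0.05, a strong signal, and one on the
# order of magnitude mentioned in the text.
p_values = [0.05, 1e-8, 1e-35]
heights = [neg_log10(p) for p in p_values]

for p, h in zip(p_values, heights):
    print(f"p = {p:g} plots at height {h:.1f}")
```

Each unit on the axis is a tenfold drop in the p-value, which is why a handful of markers can tower over everything else on the chart.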

You can see that it is in APOL1 that the biggest signals reside. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1. E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, this is showing you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; in other words, it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. MYH9 is conditional on its correlation with APOL1. This was the correlation which showed up on the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.
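The logic of "controlling for" a variant can be sketched with simple stratification, which stands in here for the logistic regression the authors actually ran. All counts below are hypothetical, built so that disease risk depends only on a causal variant while a nearby marker merely travels with it, much as E-1 travels with G1.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio from a 2x2 table of carrier status by disease status."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


# Hypothetical counts, keyed by (causal variant present?, nearby marker present?).
# Each value is (cases, controls). Disease risk depends only on the causal
# variant (50% vs 10%), but the marker rides along with it: 90% of causal
# carriers also carry the marker, against 20% of non-carriers.
counts = {
    (True, True): (450, 450),
    (True, False): (50, 50),
    (False, True): (20, 180),
    (False, False): (80, 720),
}

# Naive analysis: ignore the causal variant and pool everyone.
marker_cases = counts[(True, True)][0] + counts[(False, True)][0]
marker_controls = counts[(True, True)][1] + counts[(False, True)][1]
no_marker_cases = counts[(True, False)][0] + counts[(False, False)][0]
no_marker_controls = counts[(True, False)][1] + counts[(False, False)][1]
pooled_or = odds_ratio(marker_cases, marker_controls, no_marker_cases, no_marker_controls)

# Conditional analysis: compare marker carriers and non-carriers within
# each causal-variant stratum.
stratum_ors = {
    causal: odds_ratio(*counts[(causal, True)], *counts[(causal, False)])
    for causal in (True, False)
}

print(f"pooled OR for marker: {pooled_or:.2f}")
print(f"OR within causal carriers: {stratum_ors[True]:.2f}")
print(f"OR within non-carriers: {stratum_ors[False]:.2f}")
```

Pooled, the marker looks like a strong risk factor. Within each stratum of the causal variant its odds ratio is exactly 1, the same flattening that the last panel shows once G1 and G2 are in the model.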

It is important to remember though that these markers are segregating within a human population where individuals have three potential genotypes. Ancestral homozygote, homozygote for the mutants, and heterozygote. They found that a recessive model of expression of disease is most appropriate in the case of these risk alleles. That is, most of the increased risk is accounted for by the change from one risk allele, the heterozygote state, to two risk alleles, the homozygote state. One risk allele increased odds of renal failure by a factor of 1.26, but two by a factor of 7.3. The odds ratio of two risk alleles compared to a base rate of one risk allele was 5.8. They report that the results for FSGS were broadly similar. This matters because the frequency of the trait/disease in a random mating population is conditional on the homozygotes if it has a recessive expression pattern. G1 was present in 40% of the Yoruba HapMap data set, but in none of the two Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in none of the Eurasian groups. Assuming Hardy-Weinberg equilibrium the Yoruba should have 16% of the population at sharply elevated risk for FSGS and H-ESKD because they’d be homozygotes for the G1 allele.
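The arithmetic in the paragraph above can be checked in a few lines. This sketch treats the 40% figure as the G1 allele frequency, which is the assumption the 16% estimate implies, and takes the two odds ratios directly from the text; the function name is hypothetical.

```python
def hardy_weinberg(q):
    """Expected genotype frequencies for a risk allele at frequency q."""
    p = 1.0 - q
    return {"no_copies": p * p, "one_copy": 2 * p * q, "two_copies": q * q}


# Assumption: 40% is the G1 allele frequency among the Yoruba samples.
g1_freq = 0.40
genotypes = hardy_weinberg(g1_freq)

# Odds ratios quoted in the text, each relative to carrying no risk allele.
or_one_copy = 1.26
or_two_copies = 7.3
or_two_vs_one = or_two_copies / or_one_copy

for name, freq in genotypes.items():
    print(f"{name}: {freq:.2f}")
print(f"two copies vs one copy: {or_two_vs_one:.1f}")
```

Under these assumptions nearly half the population carries a single copy at only slightly elevated odds, while the 16% with two copies carry most of the burden, which is what a recessive model predicts.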

Once they established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency driven by natural selection to new environmental stresses. The region has already broadly been surveyed by linkage disequilibrium based tests, which basically look for regions of long haplotypes, homogenized zones of the genome where many individuals have the variation removed because one gene rose so rapidly in frequency that huge adjacent sections hitchhiked up in frequency. Presumably this may have happened with the MYH9 haplotype correlated with the traits under consideration here; G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.

So next the authors turned to tried & tested techniques and focused on the risk markers which they had discovered earlier in their research, G1 and G2. Specifically, EHH, which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population), iHS, which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6), as well as ΔiHH, which I am less familiar with but is reputedly similar to iHS but uses absolute haplotype length as opposed to relative haplotype length. Figure 2 shows the results of these tests:

[Figure 2 of the paper: EHH, iHS and ΔiHH results for G1 and G2 in the Yoruba]

The resolution isn’t the best, but G1 and G2 seem to be outliers on all three tests to detect natural selection by using patterns of linkage disequilibrium. The first panel is EHH, the second and third show iHS and ΔiHH respectively, with the position of the markers being outliers among the distribution of values for the genome within the Yoruba. This is not proof of adaptation, but it changes our weights of possibilities. Additionally, they note that Europeans exhibit no such patterns on these markers. Visually the position of the markers in the latter two panels would be closer to the mode of the distribution in Europeans.

To review, first they confirmed a causal relationship between a particular set of markers, haplotypes, and the traits of interest. Second, they confirmed that said markers seem to bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes seems confirmed, are unlikely to be targets of positive natural selection. To get a better sense of that we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials which correlate with genetic variation. Increased probability of kidney disease seems preferable to this. In terms of the molecular genetics it looks like a factor, serum resistance-associated protein (SRA), produced by T. brucei rhodesiense binds to a specific location of Apol1, and that mutations at G1 and G2 change exactly that location within the protein. So these mutants may block the ability of T. brucei rhodesiense to turn off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum produced by individuals carrying the G1 and G2 haplotypes against the three subspecies of Trypanosoma: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into west Africa as well).

- All 75 samples lysed T. brucei brucei

- None lysed T. brucei gambiense

- 46 samples lysed SRA-positive T. brucei rhodesiense; all 46 samples were from G1 or G2 carrying individuals

- The potency of G2 seemed higher than G1 against SRA-positive samples of T. brucei rhodesiense, though not SRA-negative samples, where G1 seemed as potent

- Recombinants of Apol1 which had only one of the two SNPs of the G1 haplotype were less effective against T. brucei rhodesiense than those which had both (G1 haplotype)

- Recombinants with G1 and G2 were not more effective against T. brucei rhodesiense than those with G2 alone

- Recombinants with G1 alone were more potent against SRA-negative T. brucei rhodesiense than those with G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysing of T. brucei rhodesiense. G1 did not block SRA binding to Apol1, but was still sufficient to lyse T. brucei rhodesiense, though it was far less potent against SRA-positive T. brucei rhodesiense than G2

It seems that the G1 and G2 haplotypes utilize different mechanisms to enable the lysing of invasive pathogens, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of plasma serum produced by G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. And introduction of such plasma into the bloodstreams of individuals who do not have resistance may then be highly efficacious as a preventative treatment against sleeping sickness. They do note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second to last paragraph where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table 1 has the top-line results. Focus on the first two rows; they’re “G1” from the earlier study (that is, the two SNPs which combine to form the G1 haplotype).

[Table 1]

Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data on this sample are drawn gives the average African, European and Native American ancestry of the two groups as follows (I rounded the values):

African American – 85%, 10%, 5%
Hispanic American – 30%, 55%, 15%

Not surprisingly the Hispanic American sample here is mostly Puerto Rican and Dominican, explaining the greater African than Native American ancestry. Nevertheless, it is a sufficiently different genetic background to test the effects of the same markers in a different genomic context. They confirmed within the Hispanic cohort the large-effect association found in African Americans. The risk allele frequency in the African American control group is 21% vs. 37% in the cases. For Hispanic Americans the figures are 6% and 23% for the same categories.
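One rough way to read those frequencies is as an allelic odds ratio. The sketch below is back-of-the-envelope arithmetic on the rounded frequencies quoted above, not the odds ratios the paper itself reports; the function name is made up for illustration.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Odds of carrying the risk allele in cases divided by the odds in controls.

    Works on allele frequencies only, so it ignores genotype structure,
    covariates, and sampling error (all of which a real analysis would model).
    """
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls

# Rounded risk allele frequencies quoted in the post (cases, controls).
or_african_american = allelic_odds_ratio(0.37, 0.21)
or_hispanic_american = allelic_odds_ratio(0.23, 0.06)

print(f"African American allelic OR:  {or_african_american:.2f}")
print(f"Hispanic American allelic OR: {or_hispanic_american:.2f}")
```

On these rounded inputs the ratio comes out higher in the Hispanic American sample, but the low control frequency there (6%) means small changes in the rounding move the estimate a lot.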

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic). But the majority of their ancestry remains similar to that of other Sub-Saharan Africans. As a point of contrast the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation, and concomitant frigidity. The mean monthly low in Addis Ababa is around 10 (50 for Americans) degrees and mean high 20-25 (high 60s to mid 70s for Americans). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps the temperatures relatively moderate. Different environments result in different selection pressures, and Ethiopia has a unique environment within Africa. The tsetse fly which serves as a vector for trypanosomes does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which defines the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data which went into generating this map, which smooths over huge discontinuities).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new, otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of Trypanosome which is immune to these genetic adaptations is endemic to west Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen where man is always one step behind. In contrast, the subspecies which is effectively defused by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are extant at lower proportions. Perhaps there are different mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans Science : 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human genetics PMID: 20635188

(Republished from Discover/GNXP by permission of author or representative)
 

I said yesterday I would say a bit more about the new paper on rapid recent high altitude adaptation among the Tibetans when I’d read the paper. Well, I’ve read it now. Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude:

Residents of the Tibetan Plateau show heritable adaptations to extreme altitude. We sequenced 50 exomes of ethnic Tibetans, encompassing coding sequences of 92% of human genes, with an average coverage of 18x per individual. Genes showing population-specific allele frequency changes, which represent strong candidates for altitude adaptation, were identified. The strongest signal of natural selection came from endothelial Per-Arnt-Sim (PAS) domain protein 1 (EPAS1), a transcription factor involved in response to hypoxia. One single-nucleotide polymorphism (SNP) at EPAS1 shows a 78% frequency difference between Tibetan and Han samples, representing the fastest allele frequency change observed at any human gene to date. This SNP’s association with erythrocyte abundance supports the role of EPAS1 in adaptation to hypoxia. Thus, a population genomic survey has revealed a functionally important locus in genetic adaptation to high altitude.

The exome is just the protein-coding part of the genome, so they’re focusing ostensibly on functionally relevant single nucleotide polymorphisms (SNPs). About a month and a half ago a similar paper on Tibetan high altitude adaptations was published in Science (I posted on that too), but their methodology was somewhat different. That group was looking at a set of candidate genes which they assumed might have been under selection and so have functional significance in explaining Tibetan vs. non-Tibetan phenotypes at high altitudes. This second paper takes a more bottom-up approach, scanning the genomes of Tibetans and Han Chinese, and trying to spotlight regions which exhibit a great deal of between-population variance, far greater than one might presume from the total genome genetic distances.

As to that last point…the timing of this has been causing a major problem for archaeologists. The supplements lay out the details a bit more than the press reports, so below is figure 2:


[Figure 2]

It looks like to get a better sense of the model you’ll have to read the cited paper, and I’m not sure that that will satisfy the archaeologists. They did use a large number of neutral markers though, so I’m not too worried about biases in their data set. Some have been confused about the population numbers, but effective population size can be counterintuitive, especially over the long term (low values are given much more weight than high values). The small Han value becomes less confusing when you consider a massive demographic expansion from a small founder group, as well as persistent long-term biases in reproductive value within the population (e.g., some males in a given generation are way more fecund than others through polygyny). A higher N for Tibetans may be explained by a more stable population where reproductive value is more equitable across diverse subsets and across individuals. In other words, an effective population size is a statistic which is bundling together a lot of evolutionary history, and is not a simple measure of perceived census sizes (the Tibetans may also be something of a melange of a diverse set of ancient groups which took refuge in the highlands, while the Han are the descendants of early adopters of agriculture which expanded demographically; so they’re opposite ends of the demographic tunnel).
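The point about low values carrying more weight follows from a standard textbook approximation: when population size fluctuates across generations, the long-term effective size is roughly the harmonic mean of the per-generation sizes. The sketch below illustrates only that approximation, with invented numbers; it is not the demographic model fitted in the paper.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-generation sizes: one founder-sized generation followed
# by a large expansion. These values are invented for illustration.
sizes = [100, 100_000, 100_000, 100_000]

arithmetic = mean(sizes)
long_term_ne = harmonic_mean(sizes)

print(f"arithmetic mean size: {arithmetic:,.0f}")
print(f"harmonic mean (approx. long-term Ne): {long_term_ne:,.0f}")
```

A single small generation drags the harmonic mean down to a few hundred even though three of the four generations number 100,000, which is why a population with an enormous census size today can still have a small effective size.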

The time of divergence of a little under 3,000 years is important for the rest of the paper, so I suppose other workers had better replicate their findings in the future. Figure 1 is rather striking, so let’s jump to it:

[Figure 1]

This chart is simply showing frequencies of SNPs in Tibetans and Han. The two are obviously correlated, as is evident from the diagonal. Shading indicates the density of SNPs at a given position. Look to the bottom right, and you see the gene around which much of the paper hinges, EPAS1. It’s an enormous outlier, with SNPs where Tibetans and Han differ a great deal. This is important in regards to looking for genes which may drive adaptation to higher altitudes; if you don’t have different genes then you don’t have different traits. If the Tibetans and Han diverged ~3,000 years ago, then those adaptations may be recent and would have emerged through rapid allele frequency changes (though they observe that the variant may have been drawn from standing variation). The researchers didn’t go looking for EPAS1 as such; rather, it came looking for them. What does it do? From the text:

EPAS1 is also known as hypoxia-inducible factor 2α (HIF-2α). The HIF family of transcription factors consist of two subunits, with three alternate α subunits (HIF-1α, HIF-2α/EPAS1, HIF-3α) that dimerize with a β subunit encoded by ARNT or ARNT2. HIF-1α and EPAS1 each act on a unique set of regulatory targets…and the narrower expression profile of EPAS1 includes adult and fetal lung, placenta, and vascular endothelial cells…A protein-stabilizing mutation in EPAS1 is associated with erythrocytosis…suggesting a link between EPAS1 and the regulation of red blood cell production.

Next, they dig into the functional significance of EPAS1 variants, in the literature, and in their current sample:

Associations between SNPs at EPAS1 and athletic performance have been demonstrated…Our data set contains a different set of SNPs, and we conducted association testing on the SNP with the most extreme frequency difference, located just upstream of the sixth exon. Alleles at this SNP tested for association with blood-related phenotypes showed no relationship with oxygen saturation. However, significant associations were discovered for erythrocyte count (F test P = 0.00141) and for hemoglobin concentration (F test P = 0.00131), with significant or marginally significant P values for both traits when each village was tested separately (table S5). Comparison of the EPAS1 SNP to genotype data from 48 unlinked SNPs confirmed that its P value is a strong outlier (5) (fig. S4).

The allele at high frequency in the Tibetan sample was associated with lower erythrocyte quantities and correspondingly lower hemoglobin levels…Because elevated erythrocyte production is a common response to hypoxic stress, it may be that carriers of the “Tibetan” allele of EPAS1 are able to maintain sufficient oxygenation of tissues at high altitude without the need for increased erythrocyte levels. Thus, the hematological differences observed here may not represent the phenotypic target of selection and could instead reflect a side effect of EPAS1-mediated adaptation to hypoxic conditions. Although the precise physiological mechanism remains to be discovered, our results suggest that the allele targeted by selection is likely to confer a functionally relevant adaptation to the hypoxic environment of high altitude.

There are random anomalies in nature, but it seems too perfect that this is the outlier in allele frequencies across two populations which differ in adaptations which relate to many of the traits above.

OK, so they found an outlier SNP. The gene seems to have a reasonable probability of being involved in functional pathways relevant to altitude adaptation. But so far we’ve been focusing on the Tibetan-Han difference. If the two populations separated about 3,000 years ago one assumes that genes with SNPs with huge Fst values, where most of the variation can be partitioned between the groups, not within them, are good candidates for having been driven by selection. But it would be nice to compare with an outgroup. So they compared the Tibetans and Han with the Danes, an outgroup which separated from the East Asian cluster about one order of magnitude further back in time (~30,000 years). Next they generated a “population branch statistic” (PBS) from the Fst data (see the supplements). Basically you’re getting a value which describes allele frequency differences normalized to the expected genetic distance as known from population history. I’ve extracted Panel B from figure 2. T = Tibetans, H = Han, and D = Danes. The smaller tree represents genome average PBS values. It’s what you’d expect: the Danes are the outgroup. Over time genetic difference builds up because of separation between the groups. The Han and Tibetans are very close, as you’d expect from genetically similar populations. But look at the larger tree: the Tibetans are the outgroup by a mile! The Danes and Han differ far less from each other on EPAS1 than they do from the Tibetans. This seems like a clear deviation from the level of allele frequency difference one might be able to generate by neutral random walk processes.
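The population branch statistic can be sketched in a few lines. The transformation used here, a branch length T = -log(1 - Fst) for each pair of populations and then a combination of the three pairwise lengths, follows the definition given in the paper. The Fst values themselves are invented purely for illustration and are not the paper's numbers.

```python
import math

def branch_length(fst):
    """Log-transform a pairwise Fst into an additive divergence measure."""
    return -math.log(1.0 - fst)

def pbs(fst_ab, fst_ac, fst_bc):
    """Branch length leading to population A in the three-population tree (A, B, C)."""
    return (branch_length(fst_ab) + branch_length(fst_ac) - branch_length(fst_bc)) / 2.0

# Hypothetical pairwise Fst values (T = Tibetan, H = Han, D = Danish).
genome_wide = {"TH": 0.03, "TD": 0.12, "HD": 0.11}   # typical locus
outlier     = {"TH": 0.60, "TD": 0.60, "HD": 0.10}   # EPAS1-like locus

for label, f in (("genome-wide", genome_wide), ("outlier", outlier)):
    pbs_t = pbs(f["TH"], f["TD"], f["HD"])   # Tibetan branch
    pbs_h = pbs(f["TH"], f["HD"], f["TD"])   # Han branch
    pbs_d = pbs(f["TD"], f["HD"], f["TH"])   # Danish branch
    print(f"{label}: T={pbs_t:.3f}  H={pbs_h:.3f}  D={pbs_d:.3f}")
```

With the genome-wide style inputs the Danish branch is the longest, as in the smaller tree. With the outlier style inputs nearly all of the divergence falls on the Tibetan branch, which is the pattern the larger tree shows for EPAS1.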

EPAS1 isn’t the only gene which they found, but it was the most significant, and illustrates the nature of the methodological orientation of this group. Sift through the genome and look for something which is totally unexpected, and put a focus on the peculiar diamond in the rough and see what it can tell you. They conclude with the big picture:

Of the genes identified here, only EGLN1 was mentioned in a recent SNP variation study in Andean highlanders (24). This result is consistent with the physiological differences observed between Tibetan and Andean populations…suggesting that these populations have taken largely distinct evolutionary paths in altitude adaptation.

Several loci previously studied in Himalayan populations showed no signs of selection in our data set…whereas EPAS1 has not been a focus of previous altitude research. Although EPAS1 may play an important role in the oxygen regulation pathway, this gene was identified on the basis of a noncandidate population genomic survey for natural selection, illustrating the utility of evolutionary inference in revealing functionally important loci.

Given our estimate that Han and Tibetans diverged 2750 years ago and experienced subsequent migration, it appears that our focal SNP at EPAS1 may have experienced a faster rate of frequency change than even the lactase persistence allele in northern Europe, which rose in frequency over the course of about 7500 years…EPAS1 may therefore represent the strongest instance of natural selection documented in a human population, and variation at this gene appears to have had important consequences for human survival and/or reproduction in the Tibetan region.

Natural selection is somewhat stochastic; it can take different tacks to the same process because it doesn’t have infinite power in its search algorithm. Given enough time and gene flow no doubt adaptations would homogenize and converge upon a perfect optimum, but given enough time the universe will devolve into heat death. Evolution has to operate extemporaneously for eternity because the conditions are ever changing. Second, the big headline-grabbing assertion about EPAS1 being the strongest instance of natural selection needs to be tempered by the fact that the conclusion was generated assuming the validity of the inferences of a particular model, and models can be wrong. It does seem like the evolutionary change is likely to be recent; I doubt they’d be off by an order of magnitude. But for lactase persistence we’ve extracted genetic material from ancient remains. The conclusion then is much more concrete in that case. Until we get remains from ancient Tibetans and can infer their allele frequencies, there will be some asymmetry in the confidence with which we can make a claim as to when the selection event began.
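To see how much the "strongest selection" claim leans on the divergence date, here is a rough sketch under a deliberately simple model: deterministic selection in which the odds of the favored allele are multiplied by (1 + s) every generation. Everything numeric is an assumption made for illustration: the start and end frequencies (9% and 87%, chosen only to match the 78% difference quoted in the abstract), the 25-year generation time, and the model itself, which ignores drift, dominance and migration. It is not the paper's method.

```python
import math

def selection_coefficient(p_start, p_end, generations):
    """Per-generation s such that odds(p_end) = odds(p_start) * (1 + s) ** generations."""
    log_odds_change = math.log(p_end / (1 - p_end)) - math.log(p_start / (1 - p_start))
    return math.exp(log_odds_change / generations) - 1.0

YEARS_PER_GENERATION = 25          # assumption
P_START, P_END = 0.09, 0.87        # assumed frequencies, 78 points apart

# Same frequency change, two different time spans:
# the paper's 2,750-year estimate vs. a 7,500-year span like lactase persistence.
s_fast = selection_coefficient(P_START, P_END, 2750 / YEARS_PER_GENERATION)
s_slow = selection_coefficient(P_START, P_END, 7500 / YEARS_PER_GENERATION)

print(f"s if the change took 2,750 years: {s_fast:.3f}")
print(f"s if the change took 7,500 years: {s_slow:.3f}")
```

The implied coefficient scales almost inversely with the time allowed, so an error in the estimated divergence date feeds straight into the estimated strength of selection.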

Citation: Yi, X., Liang, Y., Huerta-Sanchez, E., Jin, X., Cuo, Z., Pool, J., Xu, X., Jiang, H., Vinckenbosch, N., Korneliussen, T., Zheng, H., Liu, T., He, W., Li, K., Luo, R., Nie, X., Wu, H., Zhao, M., Cao, H., Zou, J., Shan, Y., Li, S., Yang, Q., Asan, ., Ni, P., Tian, G., Xu, J., Liu, X., Jiang, T., Wu, R., Zhou, G., Tang, M., Qin, J., Wang, T., Feng, S., Li, G., Huasang, ., Luosang, J., Wang, W., Chen, F., Wang, Y., Zheng, X., Li, Z., Bianba, Z., Yang, G., Wang, X., Tang, S., Gao, G., Chen, Y., Luo, Z., Gusang, L., Cao, Z., Zhang, Q., Ouyang, W., Ren, X., Liang, H., Zheng, H., Huang, Y., Li, J., Bolund, L., Kristiansen, K., Li, Y., Zhang, Y., Zhang, X., Li, R., Li, S., Yang, H., Nielsen, R., Wang, J., & Wang, J. (2010). Sequencing of 50 Human Exomes Reveals Adaptation to High Altitude Science, 329 (5987), 75-78 DOI: 10.1126/science.1190371

(Republished from Discover/GNXP by permission of author or representative)
 

You probably are aware that different populations have different tolerances for high altitudes. Himalayan sherpas aren’t useful just because they have skills derived from their culture; they’re actually rather well adapted to high altitudes because of their biology. Additionally, different groups seem to have adapted to higher altitudes independently, exhibiting convergent evolution. But in terms of physiological function they aren’t all created equal, at least in relation to the solutions they’ve arrived at to make functioning at high altitudes bearable. In particular, it seems that the adaptations of the peoples of Tibet are superior to those of the peoples of the Andes. Superior in that the Andean solution is more brute force than the Tibetan one, producing greater side effects, such as lower birth weight in infants (and so higher mortality and lower fitness).

The Andean region today is dominated by indigenous people, and Spanish is not the lingua franca of the highlands as it is everywhere else in the former colonial domains of Spain in the New World. This is largely a function of biology; as in the lowlands of South America the Andean peoples were decimated by disease upon first contact (plague was spreading across the Inca Empire when Pizarro arrived with his soldiers). But unlike the lowland societies the Andeans had nature on their side: people of mixed or European ancestry are less well adapted to high altitudes and women without tolerance of the environment still have higher miscarriage rates.

So despite the suboptimal nature of the Andean adaptations vis-a-vis the Tibetan ones, they are certainly better than nothing, and in a relative sense have been very conducive to higher reproductive fitness. And yet why might the Andeans have kludgier adaptations than Tibetans? One variable to consider is time. The probability is that the New World was populated by humans only for the past ~10,000-15,000 years or so, with an outside chance of ~20,000 years (if you trust a particular interpretation of the genetic data, which you probably shouldn’t). By contrast, modern humans have had a presence in the center of Eurasia for ~30,000 years. Generally when populations are exposed to a new selective regime the initial adaptations are drastic and exhibit major functional downsides, but they’re much better than the status quo (remember, fitness is relative). Over time genetic modifications mask the deleterious byproducts of the genetic change which emerged initially to deal with the new environment. In other words, selection perfects design over time in a classic Fisherian sense as the genetic architecture converges upon the fitness optimum.*

Another parameter may be the variation available within the population, as the power of selection is proportional to the amount of genetic variation, all things being equal. The peoples of the New World tend to be genetically somewhat homogeneous, probably due to the fact that they went through a bottleneck across Beringia, and that they’re already sampled from the terminus of the Old World. A physical anthropologist once told me that the tribes of the Amazon still resemble Siberians in their build. It may be that it takes a homogeneous population with little extant variation a long time indeed to shift trait value toward a local ecological optimum (tropical Amerindians are leaner and less stocky than closely related northern populations, just not particularly in relation to other tropical populations). In contrast, populations in the center of Eurasia have access to a great deal of genetic variation because they’re in proximity to many distinctive groups (the Uyghurs for example are a recent hybrid population with European, South Asian and East Asian ancestry).

So that’s the theoretical backdrop for the differences in adaptations. Shifting to how the adaptations play out concretely, some aspects of the physiology of Tibetan tolerance of high altitudes are mysterious, but one curious trait is that they actually have lower levels of hemoglobin than one would expect. Andean groups have elevated hemoglobin levels, which is the expected “brute force” response. Interestingly it seems that evolution given less time or stabilizing at a physiologically less optimal equilibrium is more comprehensible to humans! Nature is often more creative than us. In contrast the Tibetan adaptations are more subtle, though interestingly their elevated nitric oxide levels may facilitate better blood flow. Though the inheritance patterns of the trait had been observed, the genetic mechanism underpinning it had not been elucidated. Now a new paper in Science identifies some candidate genes for the various physiological quirks of Tibetans by comparing them with their neighbors, and looking at the phenotype in different genotypes within the Tibetan population. Genetic Evidence for High-Altitude Adaptation in Tibet:

Tibetans have lived at very high altitudes for thousands of years, and they have a distinctive suite of physiological traits that enable them to tolerate environmental hypoxia. These phenotypes are clearly the result of adaptation to this environment, but their genetic basis remains unknown. We report genome-wide scans that reveal positive selection in several regions that contain genes whose products are likely involved in high-altitude adaptation. Positively selected haplotypes of EGLN1 and PPARA were significantly associated with the decreased hemoglobin phenotype that is unique to this highland population. Identification of these genes provides support for previously hypothesized mechanisms of high-altitude adaptation and illuminates the complexity of hypoxia response pathways in humans.

Here’s what they did. First, Tibetans are adapted to higher altitudes, Chinese and Japanese are not. The three groups are relatively close genetically in terms of ancestry, so the key is to look for signatures of positive selection in regions of the genome which have been identified as possible candidates in terms of functional significance in relation to pathways which may modulate the traits of interest. After finding potential regions of the genome possibly under selection in Tibetans but not the lowland groups, they fixed upon variants which are at moderate frequencies in Tibetans and noted how the genes track changes in the trait.

This figure from the supplements shows how the populations are related genetically:

[Supplementary figure]

In a worldwide context the three groups are pretty close, but they also don’t overlap. The main issue I would have with this presentation is that the Chinese data is from the HapMap, and they’re from Beijing. This has then a northeast Chinese genetic skew (I know that people who live in Beijing may come from elsewhere, but recent work which examines Chinese phylogeography indicates that the Beijing sample is not geographically diversified), while ethnic Tibetans overlap a great deal with Han populations in the west of China proper. In other words, I wouldn’t be surprised if the separation between Han and Tibetan was far less if you took the Chinese samples from Sichuan or Gansu, where Han and Tibetans have lived near each other for thousands of years.

But these issues of phylogenetic difference apart, we know for a fact that lowland groups do not have the adaptations which are distinctive to the Tibetans. To look for genetic differences they focused on 247 loci, some from the HIF pathway, which is important for oxygen homeostasis, as well as genes from Gene Ontology categories which might be relevant to altitude adaptations. Table 1 has the breakdown by category.

Across these regions of the genome they performed two haplotype-based tests which detect natural selection, EHH and iHS. Both of these tests basically find regions of the genome which have reduced variation because of a selective sweep, whereby selection at a specific site has the effect of dragging along large neutral segments adjacent to the original copy of the favored variant. EHH is geared toward detection of sweeps which have nearly reached fixation; in other words, the derived variant has nearly replaced the ancestral one after a bout of natural selection. iHS is better at picking up sweeps which have not resulted in the fixation of the derived variant. The paper A Map of Recent Positive Selection in the Human Genome outlines the differences between EHH and iHS in more detail.

They looked at the three populations and wanted to find regions of the genome where Tibetans, but not the other two groups, were subject to natural selection, as defined by positive signatures with EHH and iHS. They scanned the genome in 200 kb windows, and found that 10 of their candidate genes were in regions where Tibetans came up positive for EHH or iHS, but the other groups did not. Since these tests do produce false positives, they ran the same procedure on 240 random candidate genes (7 genes were in regions where Chinese and Japanese came up positive, so these were removed from the set of candidates), and came up with average EHH and iHS positive hits of ~2.7 and ~1.4 genes after one million resamplings (specifically, these are genes where Tibetans were positive, the other groups negative). Their candidate genes focused on altitude-related physiological pathways yielded 6 for EHH and 5 for iHS (one gene came up positive for both tests, so 10 total). This indicates to them that these are not false positives, something made more plausible by the fact that we know that Tibetans are biologically adapted to higher altitudes, and so we expect these genes to be more likely than a random set to have a relationship to altitude adaptations.
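To make the logic of that resampling comparison concrete, here is a minimal sketch in Python. It is not the authors' pipeline. It assumes a hypothetical genome of 20,000 genes of which 225 sit in Tibetan-specific positive windows, numbers I picked only so that a random set of 240 genes averages about 2.7 hits, in line with the EHH figure quoted above. The function names are also my own, and I use far fewer than one million resamplings to keep it quick.

```python
import random

def resample_null_hits(n_genes, n_flagged, set_size, n_resamples, seed=0):
    """Count flagged genes in random gene sets of the same size as the candidate set."""
    rng = random.Random(seed)
    # Hypothetical: genes 0..n_flagged-1 are the ones lying in positive windows.
    flagged = set(range(n_flagged))
    counts = []
    for _ in range(n_resamples):
        draw = rng.sample(range(n_genes), set_size)  # sampling without replacement
        counts.append(sum(1 for g in draw if g in flagged))
    return counts

def empirical_p(counts, observed):
    """Share of resamples with at least `observed` hits (with a +1 correction)."""
    at_least = sum(1 for c in counts if c >= observed)
    return (at_least + 1) / (len(counts) + 1)

# Assumed inputs: 20,000 genes and 225 flagged are illustrative, not from the paper.
# The set size of 240 and the observed count of 6 EHH hits come from the text above.
null_counts = resample_null_hits(n_genes=20000, n_flagged=225,
                                 set_size=240, n_resamples=5000)
mean_null = sum(null_counts) / len(null_counts)
p_value = empirical_p(null_counts, observed=6)
print(f"mean hits in random sets: {mean_null:.2f}")
print(f"empirical p-value for 6 hits: {p_value:.3f}")
```

The point is just the comparison: the observed count in the candidate set is judged against the distribution of counts you get from random sets of the same size, rather than against zero.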

Finally, they decided to look at two genes with allelic variants which exist at moderate frequencies in Tibetans, EGLN1 and PPARA. The procedure is simple: you have three genotypes, and you see whether the phenotype differs by genotype across the 31 individuals. In this case the phenotype is hemoglobin concentration, where those who are well adapted have lower concentrations. Figure 3 is rather striking:

[Figure 3: hemoglobin concentration by genotype at EGLN1 and PPARA]

Even with the small sample sizes the genotypic effect jumps out at you. This isn’t too surprising, since previous work has shown that these traits are highly heritable, and that they vary within the Tibetan population. There’s apparently a sex difference in terms of hemoglobin levels, so they did a regression analysis, and it illustrates how strong the genetic effect from these alleles is:

[Image: regression analysis of hemoglobin concentration on genotype, accounting for sex]
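For readers who want to see what a regression of this kind looks like, here is a rough sketch in Python. Everything in it is assumed for illustration: the data are simulated, the effect sizes (1 g/dL lower hemoglobin per copy of the putatively adaptive allele, 1.5 g/dL higher in males) are invented, and the exact model the authors fit is not spelled out in this post. The only thing taken from the text is the sample size of 31 and the idea of regressing hemoglobin on genotype with sex as a covariate.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data for 31 individuals (all values hypothetical).
rng = random.Random(1)
rows, hb = [], []
for i in range(31):
    copies = i % 3         # 0, 1 or 2 copies of the putatively adaptive allele
    male = (i // 3) % 2    # 1 = male, 0 = female
    rows.append([1.0, float(copies), float(male)])  # intercept, genotype, sex
    hb.append(18.0 - 1.0 * copies + 1.5 * male + rng.gauss(0.0, 0.5))

intercept, per_allele, sex_effect = ols(rows, hb)
print(f"intercept: {intercept:.2f} g/dL")
print(f"change per allele copy: {per_allele:.2f} g/dL")
print(f"male vs. female difference: {sex_effect:.2f} g/dL")
```

Coding genotype as 0, 1 or 2 copies assumes an additive effect. With three genotype classes you could instead fit each class separately, which is closer to eyeballing the three groups in a plot.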

My main question: why do Tibetans still have variation on these genes after all this time? Shouldn’t they be well adapted to high altitudes by now? A prosaic answer may be that the Tibetans have mixed with other populations recently, and so have added heterozygosity through admixture. But there are several loci here which are fixed in Tibetans, and not in the HapMap Chinese and Japanese. For admixture to be a good explanation, one presumes that the groups with which the Tibetans mixed would have been fixed for those genes as well, but not for the ones at moderate frequencies. This may be true, but it seems more likely that admixture alone cannot explain this pattern. As the Andean example suggests, adaptation to high altitudes is not easy or simple. Until better options arrive on the scene, kludges will suffice. It may be that the Tibetans are still going through the sieve of selection, and will continue to do so for the near future. Or, there may be balancing dynamics on the genes which exhibit heterozygosity, so that fixation is prevented.

No matter what the truth turns out to be, this is surely just the beginning. A deeper investigation of the genetic architecture of Andeans and Ethiopians, both of which have their own independent adaptations, will no doubt tell us more. Finally, I wonder if these high altitude adaptations have fitness costs which we’re not cognizant of, but which Tibetans living in India may have some sense of.

Citation: Tatum S. Simonson, Yingzhong Yang, Chad D. Huff, Haixia Yun, Ga Qin, David J. Witherspoon, Zhenzhong Bai, Felipe R. Lorenzo, Jinchuan Xing, Lynn B. Jorde, Josef T. Prchal, & RiLi Ge (2010). Genetic Evidence for High-Altitude Adaptation in Tibet. Science. DOI: 10.1126/science.1189406

* Additionally, it may be that archaic hominin groups were resident in the Himalaya for nearly one million years. Neandertal admixture evidence in Eurasians should change our priors when evaluating the possibility for adaptive introgression on locally beneficial alleles.

Image Credit: Wikimedia Commons

(Republished from Discover/GNXP by permission of author or representative)
 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"