The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Natural Selection


A Tree of Life

Evolutionary processes which play out across the tree of life are subject to distinct dynamics which can shape the structure and characteristics of individuals, populations, and whole ecosystems. For example, imagine the phylogeny and population genetic characteristics of organisms which are endemic to the Hawaiian Islands. Because the Hawaiian Islands are an isolated archipelago, the expectation is that lineages native to the region will be less shaped by the parameter of migration, or gene flow between distinct populations, than might otherwise be the case. Additionally, there was presumably a “founding” event for these endemic Hawaiian lineages at some distant point in the past, so another expectation is that most of the populations would exhibit evidence of having gone through a genetic bottleneck, where the power of random drift was sharply increased for several generations. The various characteristics, or states, which we see in the present in an individual, population, or set of populations, are the outcome of a long historical process, a sequence of precise events. To understand evolution properly it behooves us to attempt to infer the nature and magnitude of these distinct dynamic parameters which have shaped the tree of life.
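The founding-bottleneck intuition can be made concrete with a minimal sketch. It assumes an idealized diploid Wright-Fisher population at a single neutral locus, with no mutation or migration (the low-migration case the Hawaii example suggests). Under those assumptions expected heterozygosity shrinks by a factor of (1 − 1/(2N)) each generation, so a few generations at a tiny size erode far more diversity than many generations at a large one. The population histories below are hypothetical.

```python
def expected_heterozygosity(h0, sizes):
    """Expected heterozygosity after drift through a sequence of generations.

    Assumes an idealized diploid Wright-Fisher population at a neutral locus,
    with no mutation or migration: H_t = H_0 * prod(1 - 1 / (2 * N_t)).
    """
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h


# Hypothetical histories: 25 generations at N = 10,000, versus the same span
# with a 5-generation founding bottleneck of N = 10 in the middle.
stable = [10_000] * 25
bottlenecked = [10_000] * 10 + [10] * 5 + [10_000] * 10

print(round(expected_heterozygosity(0.5, stable), 4))
print(round(expected_heterozygosity(0.5, bottlenecked), 4))
```

Under these assumptions the bottlenecked history keeps only about three quarters of its starting heterozygosity, while the stable one keeps nearly all of it, even though the bottleneck lasts just five of the twenty-five generations.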

Credit: Verisimilus

For many the image of evolutionary processes brings to mind something on a macro scale: perhaps the protean changes of life on earth writ large, over millions of years and across geological scales, depicted on a broad canvas as in David Attenborough’s majestic documentaries. But one can also reduce the phenomenon to a finer grain at a concrete level, as in specific DNA molecules. Or one can transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.

Credit: Johnuniq

Evolutionary processes are obviously not fundamentally just the flux of genetic elements, but genes are crucial to the phenomena in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it cannot be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being. (Creationists depict evolution as a religion-like cult, a depiction which captures the general flavor of some of the meta-science and philosophy that serves as intellectual subtext.)

R. A. Fisher

But shifting from such near-metaphysical generalities to science as it is done in the trenches, we are faced today with a swell of sequence data due to the genomic revolution. Why does this matter for our understanding of evolution? Many of the original arguments of evolutionary geneticists such as R. A. Fisher and Sewall Wright were predicated on inferences from the inheritance patterns of a few genes which were easily identifiable by their phenotypic markers. But a more accurate frame for their disputes is one where the inferences were purely theoretical: deduction with a minimal level of empirical messiness intervening. In contrast, today we live in an age where someone may pity you if you don’t have a well-assembled genome of your organism (on the order of billions of base pairs for mammals), and so have to make do with SNP marker data on only a few thousand markers per individual!

These new data, first and foremost from humans due to the funding priorities of biomedical science, have stimulated a renaissance of method development to take advantage of the richness of the genetic variation now being uncovered. Consider PSMC, which allows one to make inferences about population history from a single genome by surveying patterns of heterozygosity within one individual. Last week I reviewed a preprint which illustrated the power of extensive data analysis in shading and refining previous results which seemed straightforward on the face of it. The reformulation raised the possibility that natural selection has been a pervasive parameter in human evolution over the past ~100,000 years. The authors compared variation at different categories of bases (synonymous vs. nonsynonymous) across the genome to both reinforce old intuitions and extract novel insights.
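One classic way of contrasting synonymous and nonsynonymous variation is the McDonald–Kreitman framework, sketched minimally below. It illustrates the general logic only; I am not asserting it is what the preprint did. The counts are hypothetical: Pn and Ps are nonsynonymous and synonymous polymorphisms within a species, Dn and Ds the corresponding fixed differences from an outgroup. If all nonsynonymous changes were neutral, Pn/Ps should match Dn/Ds; an excess of nonsynonymous divergence is read as a signature of positive selection.

```python
def neutrality_index(pn, ps, dn, ds):
    """(Pn/Ps) / (Dn/Ds); values below 1 point to an excess of nonsynonymous divergence."""
    return (pn / ps) / (dn / ds)


def alpha(pn, ps, dn, ds):
    """Estimated fraction of nonsynonymous substitutions attributed to positive selection."""
    return 1 - (ds * pn) / (dn * ps)


# Hypothetical counts for a single gene.
counts = dict(pn=5, ps=20, dn=10, ds=20)
print(neutrality_index(**counts), alpha(**counts))
```

With these made-up counts both functions return 0.5, i.e., about half of the nonsynonymous substitutions would be attributed to positive selection.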

Citation: Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS Biology 4.3 (2006): e72.

Looking at differences between synonymous and nonsynonymous substitutions is a tried-and-tested technique with a fine pedigree, but more recently haplotype-based methods to detect natural selection have been all the rage, due to the emergence of dense genome-wide marker sets. These allow for the inference of correlated patterns of markers across adjacent genomic segments. This trend toward haplotype methods naturally triggered its antithesis, and the resulting synthesis can to some extent be seen in two papers, both by Grossman et al.: A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection, and Identifying recent adaptations in large-scale genomic data. These improve upon earlier work from the aughts, continuing a reassessment which had already begun in the literature after the excesses of earlier genomic scans, which detected ubiquitous selection in human populations. The haplotype techniques focus on recent selective events which leave long blocks of the genome homogenized within populations. As causal markers rapidly increase in frequency due to positive selection, they drag along flanking regions in sweep events. For many generations after the initial selection event these flanking regions will exhibit linkage disequilibrium, as recombination only slowly breaks apart the associations across loci. But a key drawback of these methods is that selection is not the only dynamic which results in long haplotypes and linkage disequilibrium. In particular, demographic stochasticity, colloquially the vicissitudes of population history, can also generate long homogeneous blocks of markers. The initial candidate regions yielded by a statistic like iHS were saturated by the effects of population-specific history.
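The point that recombination only slowly erodes the associations left by a sweep can be made concrete. In a simple two-locus model with random mating and no other forces, the disequilibrium coefficient D shrinks by a factor of (1 − r) per generation, where r is the recombination fraction between the loci. The sketch below assumes phased haplotypes coded 0/1 and hypothetical values of r; it computes D from a sample and follows its decay.

```python
import math


def linkage_disequilibrium(haplotypes):
    """D = p_AB - p_A * p_B for (allele_at_locus_1, allele_at_locus_2) pairs coded 0/1."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    return p_ab - p_a * p_b


def decayed_ld(d0, r, generations):
    """D after t generations of random mating: D_t = D_0 * (1 - r) ** t."""
    return d0 * (1 - r) ** generations


def ld_half_life(r):
    """Generations until D falls to half its starting value."""
    return math.log(0.5) / math.log(1 - r)


# A hypothetical post-sweep sample in which the two loci are perfectly associated.
haps = [(1, 1), (1, 1), (0, 0), (0, 0)]
d0 = linkage_disequilibrium(haps)  # 0.25 for this sample
print(decayed_ld(d0, 0.01, 100), ld_half_life(0.01))
```

For loci about one centimorgan apart (r ≈ 0.01) the half-life comes out near 69 generations, which is why a block of elevated LD around a recent sweep can persist long enough to be detected.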

CMS, which debuted in Grossman et al. 2010, is an attempt to correct for this bug while retaining the power of haplotype-based methods. Natural selection leaves more evidence of its operation within the genome than just long haplotype blocks and linkage disequilibrium. Selected alleles often exhibit greater between-population differences than the average region of the genome (i.e., higher Fst). Additionally, a new derived allele segregating at high frequency within one population is often a telltale marker of recent adaptation, where a de novo mutation in a specific locale turned out to be beneficial. By combining tests which survey patterns of variation across loci (i.e., haplotype-based methods) with those within loci and across populations (Fst-based methods), CMS zeroes in on a few precise, narrow candidates by cross-checking with multiple tools. False positives aside, another major problem with relying upon a single coarse test is that such tests often highlight a large region as a target of natural selection. That does not make for simple follow-up when dozens of genes and millions of bases are potential candidates.
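The cross-checking idea can be illustrated with a deliberately simple toy. This is not Grossman et al.’s actual CMS statistic, which is computed differently; it only flags SNPs that land in the extreme upper tail of every test at once. The three score lists are hypothetical stand-ins for a haplotype statistic, Fst, and derived allele frequency, with higher values treated as more extreme.

```python
def empirical_tail_pvalues(scores):
    """For each SNP, the fraction of SNPs scoring at least as high (an empirical p-value)."""
    n = len(scores)
    return [sum(1 for other in scores if other >= s) / n for s in scores]


def tail_candidates(scores, alpha=0.05):
    """Indices of SNPs in the upper alpha tail of a single test."""
    return [i for i, p in enumerate(empirical_tail_pvalues(scores)) if p <= alpha]


def composite_candidates(tests, alpha=0.05):
    """Indices of SNPs in the upper alpha tail of every test simultaneously."""
    hits = None
    for scores in tests.values():
        current = set(tail_candidates(scores, alpha))
        hits = current if hits is None else hits & current
    return [] if hits is None else sorted(hits)


# Hypothetical scores for 40 SNPs: SNP 17 is extreme on all three tests,
# while SNPs 3 and 25 are extreme on only some of them.
n = 40
haplotype = [0.1 * (i % 7) for i in range(n)]
fst = [0.01 * (i % 5) for i in range(n)]
derived_freq = [0.02 * (i % 9) for i in range(n)]
haplotype[17], haplotype[3] = 5.0, 4.0
fst[17], fst[25] = 0.9, 0.8
derived_freq[17], derived_freq[3] = 0.95, 0.9

print(tail_candidates(haplotype))
print(composite_candidates({"haplotype": haplotype, "fst": fst, "derived_freq": derived_freq}))
```

The haplotype test alone flags SNPs 3 and 17; requiring agreement across all three tests leaves only SNP 17.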

The second paper, Grossman et al. 2013, is less a map of genome-wide variation than a scan of it, with the intent of selecting choice targets for more detailed analysis. To no one’s surprise, in human data sets loci implicated in salient physical characteristics such as height and pigmentation, metabolism, and immune response are high on the list of candidates. Even allowing for the genuine issue of false positives, it does seem that recent human evolution (and frankly, evolution more generally) has a fixation on these traits, no pun intended. I do wonder sometimes if this is just an artifact of the fact that we humans notice exterior phenotypes, as well as disease-related markers (e.g., metabolic and immune illnesses). One of the major concerns in the second paper is that a selection signature without a phenotype is often of little use, but perhaps the phenotypes seem to be lacking because humans are blind as to which traits are of interest. I remain skeptical of the explanations offered for what exactly the target of selection around the EDAR locus in East Asians is.

Two alleles of SLC24A5, citation: Norton, Heather L., et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians.” Molecular Biology and Evolution 24.3 (2007): 710-722.

One of the more intriguing results from CMS in Grossman et al. 2013 is that a locus with the strongest association with resistance to leprosy also contains SLC24A5. This locus has an allele whose frequency is almost disjoint between Europeans and Sub-Saharan Africans. By this, I mean that almost all Africans carry one base, while nearly all Europeans carry the other. The allele found in Europeans is predominant in West Asia, and present at frequencies as high as ~50% as far south and east as Sri Lanka. It is a gene famously associated with lighter pigmentation in humans and zebrafish. And yet there remains the mystery that the European allele is present at very high frequencies rather far south, and it is certainly not a necessary condition for light skin: East Asians are nearly fixed for the ancestral variant which is common in Sub-Saharan Africa. A possible explanation is that these sorts of salient phenotypic loci have been reshaped by very strong bouts of selection targeting particular diseases in the recent past. If this is correct, the phenotypic characteristics which we find salient in human beings may simply be pleiotropic side effects of selective sweeps anchored around disease resistance.
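Allele frequencies that are almost disjoint between two populations translate into an Fst near its maximum of 1. Below is a minimal sketch for one biallelic SNP, using the simple two-population form (H_T − H_S)/H_T with equal population weights. It is a textbook estimator without sample-size corrections, not necessarily what Grossman et al. computed, and all frequencies are hypothetical rather than real SLC24A5 values. The second function flags SNPs in the extreme upper tail of a genome-wide distribution, which is how scans turn per-SNP Fst into candidates.

```python
def fst(p1, p2):
    """Two-population Fst = (H_T - H_S) / H_T for one biallelic SNP, equal population weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity of the pooled populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def upper_tail(values, fraction=0.01):
    """Indices of values at or above the cutoff marking the top `fraction` of the distribution."""
    k = max(1, int(len(values) * fraction))
    cutoff = sorted(values, reverse=True)[k - 1]
    return [i for i, v in enumerate(values) if v >= cutoff]


print(fst(0.5, 0.5), fst(1.0, 0.0), round(fst(0.9, 0.1), 3))

# Hypothetical allele frequencies in two populations for 100 SNPs;
# SNP 57 is made strongly differentiated, the rest differ only slightly.
pop_1 = [0.30 + 0.004 * i for i in range(100)]
pop_2 = [f + 0.02 for f in pop_1]
pop_1[57], pop_2[57] = 0.95, 0.05
print(upper_tail([fst(a, b) for a, b in zip(pop_1, pop_2)]))
```

Identical frequencies give 0, fully disjoint frequencies give 1, and a 0.9 versus 0.1 split already gives 0.64.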

I am not proposing here that genomics can solve and explain evolution. The heirs of G. G. Simpson may have something to say about that. Rather, I am suggesting that the genetic piece of the puzzle will not lack for data within our lifetimes. My hunch is that many evolutionary genetic questions will be soluble when we have thousands of complete, high-quality genomes for thousands of organisms. There is no likely windfall of fossils in the near future, so paleontology will have to continue to operate in a relatively data-constrained environment. For those who work in the domain of evolutionary genetics and genomics the onus is on human ingenuity, analytic skill, and savvy: thinking hard and deep about difficult problems, rather than putting in long hours on the bench to glean more data.

(Republished from Discover/GNXP by permission of author or representative)

Frank analytic clarity?

Sexual selection is a big deal. A few years ago Geoffrey Miller wrote The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature, which seemed to herald a renaissance in public awareness of this evolutionary phenomenon, a renaissance triggered in part by debates over Amotz Zahavi’s Handicap Principle, which dates to the 1970s. Of course Charles Darwin discussed the process in the 19th century, and it has always been part of the arsenal of the evolutionary biologist (I first encountered it in Jared Diamond’s The Third Chimpanzee, where he lent some credence to Darwin’s supposition that human racial differences may be a consequence of sexual selection). But this bump in recognition for sexual selection seems to be accompanied by its co-option as a deus ex machina for all sorts of unexplained phenomena. And yet, as they say, that which explains everything explains nothing.

To get a better sense of the current scientific literature I consulted A Guide to Sexual Selection Theory in the Annual Review of Ecology, Evolution, and Systematics. The image above is from an actual box in this review! Normally technical boxes illuminate with an air of superior authority (e.g., “it therefore follows from eq. 1…”), but it seems to me that the admission that a parameter can be represented by the verbal assertion that it’s complicated tells us something about the state of sexual selection theory. In short: its formal basis is baroque because the dynamic itself is not amenable to easy decomposition.

Not just for the peacocks
Credit: George Biard

First, for those who are unfamiliar with the topic, sexual selection theory comes in several flavors. As the term implies, sexual selection emerges from differential fitness due to the preferences of individuals for various favored traits in their mates. I will admit beforehand that my personal preference is that sexual selection not be so artificially detached from natural selection more broadly, but the discussion is usually framed in terms of such strong distinctions. So I won’t make too much of a fuss about that.

Perhaps the most obvious area of difference is that there are forms of sexual selection with no strong exogenous fitness implication. By this, I mean that the favored trait has no great adaptive value proportional to its selective value (note: the trait may not necessarily be totally neutral initially; one could imagine non-sexual preferences which triggered subsequent sexual dynamics). This is at the heart of the Fisherian runaway process. The basic principle here is that if a preferred trait and the preference for that trait become correlated, then the two will amplify each other’s fitness and rapidly sweep up in frequency within the population. A simple illustration will suffice. Imagine that within a bird population a subset of females prefers longer beaks. There is normal variation within the population for beak length, which suggests that the fitness of shorter- and longer-beaked individuals is not so different. If a subset of females prefers longer beaks, then males with longer beaks will have higher fitness, because they have reproductive access to all the females, while those with shorter beaks only have access to those females who do not exhibit a preference. In the next generation there will be a correlation between longer beaks (from the fathers) and preference for longer beaks (from the mothers). Because of the correlation there is now also selection for the preference as a byproduct of selection for the longer beaks! This means that selection for longer beaks is greater, and therefore selection for the preference is greater, and so forth.
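The beak example can be followed numerically with a toy deterministic model. The assumptions are mine, not the review’s: haploid individuals with a trait locus (long vs. short beak) and a preference locus, choosy females who weight long-beaked males by a factor a, other females mating at random, no viability effect of the trait, equal numbers of each sex, and a recombination fraction r between the two loci. The parameter values are hypothetical.

```python
from itertools import product

# Haplotypes are (trait, preference); 1 = long beak / prefers long beaks.
GENOTYPES = list(product((0, 1), repeat=2))


def next_generation(freqs, a, r):
    """One generation of mate choice and reproduction in a toy haploid two-locus model.

    freqs maps (trait, preference) haplotypes to frequencies. Females carrying the
    preference allele weight long-beaked males by a factor `a`; other females mate
    at random. The trait has no viability effect here, and r is the recombination
    fraction between the trait and preference loci.
    """
    male_trait = {t: freqs[(t, 0)] + freqs[(t, 1)] for t in (0, 1)}

    def weight(pref, trait):
        return a if (pref == 1 and trait == 1) else 1.0

    offspring = {g: 0.0 for g in GENOTYPES}
    for (tf, pf), xf in freqs.items():
        norm = sum(male_trait[t] * weight(pf, t) for t in (0, 1))
        for (tm, pm), xm in freqs.items():
            mating = xf * xm * weight(pf, tm) / norm
            # Parental haplotypes pass on intact with probability 1 - r,
            # recombinant haplotypes with probability r.
            offspring[(tf, pf)] += mating * (1 - r) / 2
            offspring[(tm, pm)] += mating * (1 - r) / 2
            offspring[(tf, pm)] += mating * r / 2
            offspring[(tm, pf)] += mating * r / 2
    return offspring


def summarize(freqs):
    """Return (trait frequency, preference frequency, linkage disequilibrium D)."""
    t = freqs[(1, 0)] + freqs[(1, 1)]
    p = freqs[(0, 1)] + freqs[(1, 1)]
    return t, p, freqs[(1, 1)] - t * p


freqs = {g: 0.25 for g in GENOTYPES}  # start with no trait-preference correlation
for generation in range(1, 6):
    freqs = next_generation(freqs, a=2.0, r=0.5)
    print(generation, [round(v, 4) for v in summarize(freqs)])
```

Starting from no association, the first generation leaves the preference frequency unchanged but makes D positive, because choosy mothers disproportionately pair with long-beaked fathers. From the second generation on, the preference allele rises even though it has no direct effect on fitness, riding along with the favored trait.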

Credit: Doug Janson.

This dynamic is a byproduct of the structural factors inherent in sexual reproduction, in particular dimorphism between the sexes and the importance of selection in mate choice. The Fisherian process is rapid, it is arbitrary, and it is likely subject to oscillations as it is kept in check by other evolutionary forces. In the example above, continuous selection for long beaks would obviously have some deleterious consequences as natural selection began to take its toll. At that point, no matter how “sexy” long-beaked sons were, it would all be for naught if they couldn’t even survive. This sort of sexual selection predicts a constant bubbling of morphological diversity over space and time.

Another sexual selection framework where fitness is a consequence of indirect forces is sensory bias. Again, an example will suffice. Imagine birds which are frugivores. In this situation there will be a natural preference for bright and vibrant colors, because those are the colors of the main food item, fruit. Females may naturally prefer individuals with the same vibrant colors as their primary food item (this may even be selectively beneficial, as it indicates a strong preference for high-quality food). As with the Fisherian process above, this can obviously come at a cost. Bright fruit want to be eaten. Bright animals do not.

Credit: Pavel Riha

This highlights again the fact that sexually selected traits may not be beneficial in the conventionally adaptive sense. They may even be a detriment to fitness! This observation is also central to the Handicap Principle, though that principle turns the logic on its head at the end of the game. Its counter-intuitive thesis is that costly signals in fact indicate that an organism is extremely fit. The underlying reason is that costly signals are by their nature honest. Massive antlers, for example, take a great deal of biological energy to produce and maintain, and they may also make one more vulnerable to predators. Only the most superior individuals could incur such costs! The relationship here to Thorstein Veblen’s idea of “conspicuous consumption” is so obvious that I won’t bother to elaborate on it. Crazy as it may sound, from what I can tell the Handicap Principle has now come to be accepted by many biologists (Richard Dawkins, for example, has done an about-face on the theory).
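A toy cost-benefit model shows why a costly signal can stay honest. The functional forms are assumptions chosen for illustration, not Zahavi’s own formulation: the mating benefit of a signal grows linearly with its size, while its cost is quadratic and cheaper for individuals of higher underlying quality q. Under those assumptions the best signal is proportional to quality (analytically s* = b·q), and a low-quality individual that copies a high-quality signal ends up worse off than if it signaled honestly.

```python
def payoff(signal, quality, benefit=1.0):
    """Hypothetical net payoff: linear mating benefit minus a quality-dependent quadratic cost."""
    return benefit * signal - signal ** 2 / (2 * quality)


def best_signal(quality, benefit=1.0, step=0.001, max_signal=10.0):
    """Grid search for the payoff-maximizing signal (analytically benefit * quality)."""
    candidates = [i * step for i in range(int(round(max_signal / step)) + 1)]
    return max(candidates, key=lambda s: payoff(s, quality, benefit))


for q in (0.5, 1.0, 2.0):
    print(q, round(best_signal(q), 3))

# A low-quality individual (q = 0.5) mimicking a high-quality signal (s = 2.0)
# does worse than at its own optimum.
print(payoff(2.0, 0.5), payoff(best_signal(0.5), 0.5))
```

The design choice that does the work is making cost depend on quality; if the cost were the same for everyone, every individual would choose the same signal and it would carry no information.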

The Handicap Principle is arguably a “good genes” model of sexual selection. Unlike Fisherian runaway or sensory bias, the preference is rooted in the genuine fitness of the individual as evaluated by external metrics (at least in the indirect sense of genetic health). Theories of beauty in evolutionary psychology are often implicitly predicated on this model, where high symmetry and extreme secondary sexual characteristics suggest few deleterious mutations interfering with the idealized development of the individual. The explanations for why larger size in males and larger breasts and buttocks in females might signal fitness are so intuitive compared to something like Fisherian runaway that many people also find direct-benefit models more plausible. That is, not only do these traits signal good genes, but they confer immediate benefits for survival and function.

But plausibility does not lead us toward the truth in all cases. Sexual selection models explicated in verbal terms often tend toward circularity and confusion. A concrete thought experiment could run like so. You have a population where females prefer attractive males (e.g., males with more vibrant plumage). But the fitness of the females (in particular, the survivorship of their offspring) also depends upon their mates provisioning supplementary resources. One can easily imagine a scenario where promiscuous attractive males and monogamous less attractive males converge upon the same equilibrium fitness because of heterogeneity in female mate choice. Some females may opt for “cads,” who stray and invest little in their offspring, even though those offspring are of high genetic quality. Other females may opt for “dads,” males who have lower genetic quality but remain more invested in their smaller number of offspring. These offspring may have higher survivorship because of the added investment. Verbal elaborations of sexual selection never seem to give a “final answer,” because there is always an “on the other hand.”

And this is why I wanted to review the available literature. Unfortunately I gained little extra clarity, as the formalism above implies. The authors suggest there are four primary avenues by which sexual selection is explored: population genetics, quantitative genetics, invasion approaches, and individual-based simulations. I am not particularly familiar with ‘invasion approaches,’ though in its broad outlines it seems similar to the quantitative genetic method. The population genetic methods are powerful because they start from first principles and explicitly model parameters such as linkage. But there are limits to the analytic tractability of complex phenomena such as sexual selection in population genetic models; for example, multilocus approaches tend to be difficult. The quantitative genetic methods make the standard assumption of normally distributed traits, and are gene-blind (they look at the phenotype). They seem a nice complement to the population genetic methods, and are often useful in more practical field research. Finally, the simulation approach suffers from a lack of computational power to explore the whole parameter space.
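As an example of the gene-blind quantitative genetic style, the breeder’s equation predicts the response to selection, R = h²S, from phenotypes alone, where h² is narrow-sense heritability and S is the selection differential. The sketch below assumes a normally distributed trait under truncation selection (for instance, a hypothetical scenario in which only males above a beak-length threshold get to mate); the trait values and heritability are made up.

```python
from statistics import NormalDist


def truncation_response(mean, sd, threshold, h2):
    """Selection differential S and response R = h2 * S for truncation selection.

    Assumes a normally distributed trait in which only individuals above
    `threshold` reproduce; the selected mean is that of the truncated normal.
    """
    z = (threshold - mean) / sd
    std_normal = NormalDist()
    selected_mean = mean + sd * std_normal.pdf(z) / (1 - std_normal.cdf(z))
    s = selected_mean - mean
    return s, h2 * s


# Hypothetical beak lengths: mean 10 mm, sd 1 mm, threshold 11 mm, h2 = 0.4.
s, r = truncation_response(10.0, 1.0, 11.0, 0.4)
print(round(s, 3), round(r, 3))
```

If selection acts on only one sex, as in the beak example, the differential averaged over both parents is about half of S, so the predicted response is halved as well.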

In relation to the simulation approach, last year a phylogeneticist told me that 15 years ago researchers assumed they would never be able to operationalize maximum likelihood models in their lifetimes. Of course today ML-based packages are the ‘fast’ strategies relative to the more heavy-duty Bayesian frameworks in phylogenetics. I point this out because I have faith that simulation may be the ultimate way to go for understanding sexual selection over the long run, supplemented by the other methods as scaffolds to reduce the parameter space. We may not be able to explore the whole space of possibilities, but that is the nature of science.

My primary concern with the formal models as outlined in the review is that many of them assume weak selection. This is a feature of many population genetic models (e.g., see W. D. Hamilton’s original work on inclusive fitness), but from the perspective of evolutionary genomics some of the most fascinating possibilities for sexual selection involve strong selection. For example, many researchers appeal to sexual selection to explain the pigmentation complex of European populations, but more and more evidence suggests that these loci have been subject to relatively strong selection. Is this plausible for sexual selection? Do we even know how strongly sexual selection might operate? Fisherian runaway is an obvious candidate, but this process is so rapid, and so protean, that it seems unlikely.
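To get a feel for what “strong” selection means in practice, here is a minimal deterministic sketch. It assumes genic selection with relative advantage s and ignores drift: it iterates p′ = p(1 + s)/(1 + ps) from a single new copy, 1/(2N), until the allele reaches 1 − 1/(2N), and compares the count with the approximation 2 ln(2N)/s that follows from the log-odds rising by ln(1 + s) ≈ s each generation. The values of s and N are hypothetical.

```python
import math


def generations_to_sweep(s, n):
    """Generations for a favored allele to go from 1/(2N) to 1 - 1/(2N).

    Deterministic genic selection, p' = p(1 + s) / (1 + p*s); drift is ignored.
    """
    p, end = 1 / (2 * n), 1 - 1 / (2 * n)
    t = 0
    while p < end:
        p = p * (1 + s) / (1 + p * s)
        t += 1
    return t


def approx_sweep_time(s, n):
    """Logistic approximation: log-odds of p rise by ln(1 + s), roughly s, per generation."""
    return 2 * math.log(2 * n) / s


for s in (0.001, 0.01, 0.1):
    print(s, generations_to_sweep(s, 10_000), round(approx_sweep_time(s, 10_000)))
```

Sweep duration scales roughly as 1/s, so under these assumptions a sweep at s = 0.1 finishes in about a tenth of the time needed at s = 0.01.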

A major long-term problem with sexual selection theories is that they seem to imply oscillatory dynamics, when equilibria are easier to digest (and traditionally many classical models are oriented toward solving for equilibria). This is why models of positive natural selection are so straightforward: they have a beginning and an end. This does not seem to be the case for more realistic sexual selection models. Rather than giving a specific answer to a given biological question, sexual selection theory may be more useful as a way to explain the constant background flux of evolutionary process. At this point I am not convinced that it is robust enough to give us good “rough and ready” rules of thumb which we can apply as a sieve upon the welter of evolutionary genomic results.

But progress is being made, and in concert with fields like game theory and computer science I suspect that the future is going to be bright.

(Republished from Discover/GNXP by permission of author or representative)

Representatives of Szechuan and Shandong cuisine

The Pith: The Han Chinese are genetically diverse, due to the geographic scale of their range, hybridization with other populations, and possibly local adaptation.

In the USA we often speak of “Chinese food.” This is rather peculiar because there isn’t any generic “Chinese cuisine.” Rather, there are regional cuisines, which share a broad family similarity. Similarly, American “Mexican food” and “Indian food” also have no true equivalent in Mexico or India (naturally the novel American culinary concoctions often exhibit biases in the regions from which they sample due to our preferences and connections; non-vegetarian Punjabi elements dominate over Udupi, while much authentic Mexican American food has a bias toward the northern states of that nation). But to a first approximation there is some sense in speaking of a general class of cuisine which exhibits a lot of internal structure and variation, so long as one understands that there is an important finer grain of categorization.

Much the same applies to genetic categorizations. Consider two of the populations in the original HapMap, the Yoruba from Nigeria, and the Chinese from Beijing. There are ~30 million Yoruba, but over 1 billion Han Chinese! Even granting that the Yoruba seem excellent representatives of Sub-Saharan African genetic variation (not Bantu, but not far from the Bantu), there are still more Han Chinese than Sub-Saharan Africans (including the African Diaspora). So it’s nice that over the past few years there’s been a deep dive into Han genetics. A new paper in the European Journal of Human Genetics, Natural positive selection and north–south genetic diversity in East Asia, focuses on the north-south difference among Han Chinese, using groups flanking them to their north and south as references.

First, let’s back up for a moment. Who are the Han? Where did they come from? The details aren’t simple, insofar as there wasn’t a “Han Homesteading Act” which pushed the frontiers of Chinese culture and civilization to a limit demarcated by a national boundary line. But overall the shift in Chinese society over the past ~3,000 years has been outward, from a northern focus to the south. 2,000 years ago China proper, the zone where dominant Han ethnic habitation overlapped with Chinese political hegemony, consisted primarily of the Yellow River plain. Though the Han Dynasty extended its empire south toward Vietnam, the landscape beyond the Yangtze was still predominantly non-Han outside of a few locales. During the Han Dynasty even the Yangtze River basin was still somewhat liminal. This changed between the years 0 and 1000. The collapse of the Han Dynasty in the 3rd century led to what are sometimes termed the Chinese Dark Ages. During this period of political fragmentation much of northern China was dominated by barbarian dynasties, and Han political elites controlled the commanding heights only in the south. With the rise of the Tang in the 7th century, the shift toward the Yangtze River basin which had occurred in the interregnum solidified. Economically, demographically, and to some extent culturally, what during the Han Dynasty would have been defined as a zone of barbarian habitation, or marginal Han civilization, had become the center of gravity of the Sinic world by 1000. By this period the domains of the Han began to push far south of the Yangtze, and some of the most preeminent intellectuals came out of relatively isolated southern provinces such as Fujian, on the coast between the Yangtze and Pearl River deltas. In the next 1,000 years the Han spread through many sections of southern China which were previously redoubts of aboriginal peoples. Yunnan, for example, likely did not become majority Han until the past few centuries.

This poses a question: was this expansion of the Han a biological process, or a cultural one? It seems likely it was some of both. There are even customs particular to some Chinese dialect groups, such as the Cantonese, which may have a pre-Han origin. This amalgamation, combined with the geographic diversity of China, makes a perfect laboratory for evolutionary processes. In Plagues and Peoples William H. McNeill notes that demographic expansion by Han peasants (as opposed to military or bureaucratic outposts) into much of southern China during the early Imperial period was limited by disease. One presumes that transforming the landscape would have some mitigating effect on the power of pestilence, but admixture and selection may also have allowed the biologically inoculated Han to occupy areas which were previously no-go.

Here’s the abstract of the paper:

Recent reports have identified a north–south cline in genetic variation in East and South-East Asia, but these studies have not formally explored the basis of these clinical differences. Understanding the origins of these variations may provide valuable insights in tracking down the functional variants in genomic regions identified by genetic association studies. Here we investigate the genetic basis of these differences with genome-wide data from the HapMap, the Human Genome Diversity Project and the Singapore Genome Variation Project. We implemented four bioinformatic measures to discover genomic regions that are considerably differentiated either between two Han Chinese populations in the north and south of China, or across 22 populations in East and South-East Asia. These measures prioritized genomic stretches with: (i) regional differences in the allelic spectrum for SNPs common to the two Han Chinese populations; (ii) differential evidence of positive selection between the two populations as quantified by integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH); (iii) significant correlation between allele frequencies and geographical latitudes of the 22 populations. We also explored the extent of linkage disequilibrium variations in these regions, which is important in combining genetic association studies from North and South Chinese. Two of the regions that emerged are found in HLA class I and II, suggesting that the HLA imputation panel from the HapMap may not be directly applicable to every Chinese sample. This has important implications to autoimmune studies that plan to impute the classical HLA alleles to fine map the SNP association signals.

The authors do not focus much on phylogenetic relationships and the historical inferences one can make from them. For example, they don’t posit any complex migration scenario to explain the pattern of genetic substructure in China today. Instead the spotlight is on differences in allele frequencies which seem outside of the normal expectation, and so might have been targets of selection. To frame that appropriately in a phylogenetic context they pooled a wide range of data sets (HGDP, HapMap, SGVP) and generated a PCA which illustrates the relationships of East Asian populations on a two-dimensional plot. The figure is rather hard to make out because of similarities in color coding, but the basic result is shown to the left. You see a north-south axis within China, and some separation from groups to the north and south. Interestingly, some Chinese ethnic minorities fall within the range of variation of the Han. There are many reasons this could be. They might already have been nested within the original Han range of variation before the demographic expansion of the latter. There could have been extensive gene flow between the Han and minorities, in particular in the direction of the latter if the Han were far more numerous. And of course many Han dialect groups could simply be culturally assimilated minorities if you go back far enough. A combination of these, with various weights in different contexts, is certainly the best approximation to what occurred. Pure replacement and pure cultural diffusion both seem untenable as robust explanations. Additionally, the best check for the relationship between Han and minorities is to look for differences within the same province. So Han from Yunnan should be cross-referenced with ethnic minorities from the same locale, instead of Han from Guangdong serving as proxies for “South Chinese.” I suspect that the gap between the Dai and the southern Chinese is partially an artifact of undersampling Han from those particular isolated regions of China where they live cheek-by-jowl with the Dai.
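The PCA step can be sketched in plain Python. This is a minimal illustration, not the authors’ pipeline: each SNP is centered on 2p and scaled by sqrt(2p(1 − p)) (one common convention), and the leading component is found by power iteration on the individual-by-individual matrix X·Xᵀ. The genotype matrix, coded as 0/1/2 allele counts, is hypothetical.

```python
import math


def standardize(genotypes):
    """Center and scale a genotype matrix (individuals x SNPs, coded 0/1/2 allele counts).

    Each SNP is centered on 2p and divided by sqrt(2p(1 - p)); monomorphic SNPs
    are dropped because they carry no information.
    """
    columns = []
    for j in range(len(genotypes[0])):
        col = [ind[j] for ind in genotypes]
        p = sum(col) / (2 * len(col))
        sd = math.sqrt(2 * p * (1 - p))
        if sd > 0:
            columns.append([(g - 2 * p) / sd for g in col])
    return [list(row) for row in zip(*columns)]


def first_pc_scores(matrix, iterations=200):
    """Scores on the leading principal component via power iteration on X X^T."""
    n = len(matrix)
    grm = [[sum(a * b for a, b in zip(matrix[i], matrix[j])) for j in range(n)]
           for i in range(n)]
    # Non-uniform start: once SNPs are centered, the all-ones vector lies in
    # the null space of X X^T and would collapse to zero.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(grm[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical genotypes for two populations of four individuals at six SNPs.
pop_north = [[2, 2, 1, 0, 1, 2], [2, 1, 2, 0, 0, 2], [1, 2, 2, 1, 0, 1], [2, 2, 1, 0, 1, 2]]
pop_south = [[0, 0, 1, 2, 1, 0], [0, 1, 0, 2, 2, 0], [1, 0, 0, 1, 2, 1], [0, 0, 1, 2, 1, 0]]
scores = first_pc_scores(standardize(pop_north + pop_south))
print([round(s, 3) for s in scores])
```

The four northern individuals receive scores of one sign and the four southern individuals the opposite sign; which sign goes with which group is arbitrary. Per-SNP loadings, like the “SNP loadings” column in the table, come from projecting each standardized SNP onto these scores.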

But the rationale for this paper was to shine a light on the effects of natural selection on the Han genome and possible adaptations, not the systematics of East Asian human populations. As noted in the abstract they used several methods to get at this issue. They looked to see the correlation between allele frequencies and latitude. The logic presumably being that latitude is correlated with climate and other geographical parameters which serve as environmental selection pressures. All things equal northern climes for example will have fewer pathogens and parasites. Consider the value of a frost season in killing many surface soil organisms. Second they also looked at differences in Fst between Han of the north and Han of the south. Fst is a measure of between population genetic differences. As it converges upon zero there’s basically no difference between the populations in question, while a value of 1.0 would indicate that all the variation is partitioned across the two groups so that you could use a marker to perfectly distinguish membership in a population for an individual. The authors had an average difference between north and south Han in mind, and looked for genomic regions where the differences were far greater than expectation. They also looked at the contribution of a given SNP to the variation you saw illustrated in the PCA. Big contributions to the inter-population variation obviously indicate differences across populations. Finally, they also looked at haplotype structure as a signature of natural selection. While Fst focuses on specific points in the genome, haplotype structure elucidates patterns across genes, sequences of markers. Natural selection tends to homogenize genomic regions temporarily as a particular variant rises in frequency and drags along its neighbors in a selective sweep hitchhike. 
The two haplotype-based methods they used, iHS and XP-EHH, have different powers to detect selective events: iHS is better at catching sweeps in midstream, where the selected allele has not yet reached fixation, while XP-EHH picks up nearly completed sweeps. The two methods complement each other and rely on similar logic. Again, as with Fst, the authors focused on regions of the genome in the tails of the distribution expected given the genome-wide genetic distances between each pair of populations.
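To make these scans concrete, here is a minimal Python sketch, not the authors' pipeline. It computes a two-population F_ST from allele frequencies (the simple (H_T - H_S) / H_T form, assumed here for illustration), applies an empirical-tail filter like the 0.5% and 0.1% cutoffs in the table below, and computes extended haplotype homozygosity (EHH), the quantity that both iHS and XP-EHH are built on. The function names, SNP IDs, frequencies and haplotypes are all hypothetical.

```python
import math
from collections import Counter


def fst_two_pops(p1: float, p2: float) -> float:
    """F_ST for one biallelic SNP in two equally weighted populations, as (H_T - H_S) / H_T."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity if the two populations were pooled
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def top_tail(scores: dict, fraction: float) -> list:
    """IDs in the upper `fraction` of the empirical score distribution (e.g. 0.005 for 0.5%)."""
    k = max(1, math.ceil(fraction * len(scores)))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def ehh(haplotypes: list, core: int, allele: str, end: int) -> float:
    """Extended haplotype homozygosity: probability that two random haplotypes carrying
    `allele` at position `core` are identical over the whole stretch from `core` to `end`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = sorted((core, end))
    groups = Counter(h[lo:hi + 1] for h in carriers)  # identical stretches share a key
    return sum(math.comb(k, 2) for k in groups.values()) / math.comb(n, 2)


# Hypothetical north/south allele frequencies for three made-up SNPs.
freqs = {"snp_a": (0.30, 0.32), "snp_b": (0.10, 0.80), "snp_c": (0.50, 0.45)}
fst = {snp: fst_two_pops(*pq) for snp, pq in freqs.items()}
print(top_tail(fst, 0.005))  # -> ['snp_b']

# Toy haplotypes as 0/1 strings; the core SNP is at index 1.
haps = ["0110", "0110", "0111", "1010", "1000"]
print(ehh(haps, core=1, allele="1", end=3))  # -> 0.3333333333333333
```

As you extend `end` away from the core, EHH decays as recombination and mutation break up the haplotype; a sweep shows up as unusually slow decay around one allele compared with the other.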

What did they find? Here’s a table which shows you some genes:

| MAF-latitude correlation (SNP) | FST (CHB vs. CHS) | XP-EHH | iHS (CHB) | iHS (CHS) | SNP loadings | Genes |
|---|---|---|---|---|---|---|
| 2.1 × 10⁻⁵ (rs6901084) | 0.50% | 0.5% (positive) | 0.01% | 0.01% | 0.10% | HLA-DRB1, HLA-DQA1-2, HLA-DOB, PSMB9, BRD2, TAP2, PSMB8, TAP1, HLA-DMB, HLA-DMA, HLA-DOA |
| 2.0 × 10⁻⁴ (rs4489283) | No evidence | 0.5% (positive) | 0.50% | 0.50% | 0.10% | NRG1 |
| 6.6 × 10⁻⁵ (rs2370969) | No evidence | 0.1% (negative) | 0.50% | 0.10% | 0.10% | WDR48, GORASP1, TTC21A, AXUD1, CMYA1, CX3CR1, CCR8, SLC25A38, LAMR1, MOBP |
| 9.3 × 10⁻⁴ (rs6762261) | No evidence | No evidence | 0.10% | 0.50% | 0.50% | EPHB1 |
| 9.5 × 10⁻⁴ (rs986148) | No evidence | 0.1% (positive) | 0.10% | - | - | NA |

The first thing that jumps out at me is HLA. These genes are involved in immune response and are extremely polymorphic. If you're going to see regional differences correlated with ecology, this is where you'd look. The expansion of the Han into southern China was probably accompanied by changes in the immunological portfolio that was the norm among the peasantry. It isn't in this table, but other genes found at the intersection of tests are LPP and ADH. The former has been implicated in celiac disease, while the latter is an alcohol dehydrogenase locus. When it comes to natural selection, disease matters a lot, but so does digestion. I don't have a good explanation for the patterns here, but there are differences in cuisine within China: rice is dominant in the center and south, while wheat and millet dominate the north. It would be interesting to know whether there are also variations in alcohol production and consumption. China is in many ways comparable to Europe, where there are north-south differences both in ADH and in cultural norms around the amount and kind of alcohol consumed. Finally you have something like NRG1, which seems to be a locus of neurological function. It doesn't differ between the two Han groups, but seems to have been the target of natural selection within the overall population. Perhaps the social norms of Han Chinese culture and society reshaped the personality profiles of the population?

Going back to the analogy with cuisine: like food, the components and elements of genetic variation are shaped by different forces. Modern Italian cuisine, for example, depends on basic elements which were common in Italy 2,000 years ago (e.g., olive oil), but it has changed a great deal with the Columbian Exchange (e.g., tomatoes). Descent shapes future culinary options by fixing some constraints and preferences (traditional Jewish food is light on shellfish!). But over time new variants can arise and alter the original base. Additionally, there are local adaptations. The Cajuns are descended from Acadians, from the maritime provinces of Canada. Obviously spicy crayfish concoctions were not part of their original culinary portfolio, but they had to make do with the options available in their new ecology. There's a strong correlation between warmer climes and spice, probably having to do with the antibacterial properties of many of these non-nutritious additives. (From what I know, South Indian and South Chinese cuisines are both much spicier than North Indian and North Chinese fare.) Within any broad family of cuisines one must acknowledge both unity and diversity, and the same applies within a cultural-genetic macro-region on the scale of China.

Image credit: Rolf Muller

(Republished from Discover/GNXP by permission of author or representative)

Back when this sort of thing was cutting edge, mtDNA haplogroup J was a pretty big deal. This was the haplogroup often associated with the demic diffusion of Middle Eastern farmers into Europe, the “Jasmine” clade in Seven Daughters of Eve. A new paper in PLoS ONE makes an audacious claim: that J is not a lineage which underwent recent demographic expansion, but rather one which has been subject to a specific set of evolutionary dynamics that skewed earlier interpretations through a false “molecular clock” assumption. By this assumption, I mean that mtDNA, which is passed down in an unbroken chain from mother to daughter, is by and large neutral to forces like natural selection and subject to a constant mutation rate, which can serve as a clock calibrating the time to the last common ancestor of two different lineages. Additionally, mtDNA has a high mutation rate, so it accumulates lots of variation to sample, and it is present in many copies per cell, so it is easy to extract. What's not to like?
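The molecular-clock logic the paper challenges can be sketched with the widely used rho statistic: average the number of mutations separating each sampled sequence from the clade's root haplotype, then multiply by an assumed number of years per mutation. This is a minimal illustration, not the authors' procedure; the mutation counts and the years-per-mutation value are hypothetical. The second call shows the worry in miniature: if some mutations failed to accumulate (the kind of under-accumulation the paper reports), the count is too low and the clade looks younger than it is.

```python
def rho_age(mutations_to_root: list, years_per_mutation: float) -> float:
    """Clade age under a strict clock: mean root-to-tip mutation count (rho) times years per mutation."""
    rho = sum(mutations_to_root) / len(mutations_to_root)
    return rho * years_per_mutation


# Hypothetical root-to-tip mutation counts for four sequences in one clade.
counts = [3, 4, 5, 4]
YEARS_PER_MUTATION = 3000.0  # assumed clock rate, for illustration only

print(rho_age(counts, YEARS_PER_MUTATION))  # -> 12000.0

# One "missing" mutation per lineage under the same clock makes the clade look younger.
print(rho_age([c - 1 for c in counts], YEARS_PER_MUTATION))  # -> 9000.0
```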

First, the paper, Mutation Rate Switch inside Eurasian Mitochondrial Haplogroups: Impact of Selection and Consequences for Dating Settlement in Europe:

R-lineage mitochondrial DNA represents over 90% of the European population and is significantly present all around the planet (North Africa, Asia, Oceania, and America). This lineage played a major role in migration “out of Africa” and colonization in Europe. In order to determine an accurate dating of the R lineage and its sublineages, we analyzed 1173 individuals and complete mtDNA sequences from Mitomap. This analysis revealed a new coalescence age for R at 54.500 years, as well as several limitations of standard dating methods, likely to lead to false interpretations. These findings highlight the association of a striking under-accumulation of synonymous mutations, an over-accumulation of non-synonymous mutations, and the phenotypic effect on haplogroup J. Consequently, haplogroup J is apparently not a Neolithic group but an older haplogroup (Paleolithic) that was subjected to an underestimated selective force. These findings also indicated an under-accumulation of synonymous and non-synonymous mutations localized on coding and non-coding (HVS1) sequences for haplogroup R0, which contains the major haplogroups H and V. These new dates are likely to impact the present colonization model for Europe and confirm the late glacial resettlement scenario.

John Hawks has written at length about the possible distortions that selection might produce in our understanding of the history of mtDNA lineages, and therefore of the population groups for which these genealogies serve as proxies. So I won't review that here. I find the dynamics they're detecting possible, even plausible. But I don't see why the authors, having introduced skepticism, then conjure up positive visions of the true demographics underpinning these mtDNA phylogenies, as if their “correction” for variation in the molecular clock now lets us look through the glass clearly.

Readers with more fluency in the mtDNA literature can probably pick it apart. At the end of the day I always wonder: what do the subfossils tell us? In other words, what does ancient DNA say? Inferences from contemporary populations have been a total hash at any grain finer than the continental, so you probably shouldn't rest on that leg alone.

Finally, I thought this paper was of interest because it inverts the story of R1b1b2, a Y chromosomal haplogroup which was once presumed to be Paleolithic but now seems likely to be Neolithic. These authors are claiming that an mtDNA haplogroup which was once presumed to be Neolithic is actually Paleolithic. All this, I think, indicates that we should widen our error bars whenever we make assertions based on uniparental data with any time depth, at anything finer than a very coarse spatial scale.

(Republished from Discover/GNXP by permission of author or representative)

In the image to the left you see three human males. You can generate three pairings of these individuals. Which pair would you presume is the most closely related? Now let me give you some more information. The rightmost image is of the president of Tanzania. The middle image is of the president of Taiwan (Republic of China). And finally, the leftmost image is of the prime minister of Papua New Guinea. With this information you should now know with certainty that the prime minister of Papua New Guinea and the president of Taiwan are much more closely related than either is to the president of Tanzania. But some of you may not have guessed that initially. Why? I suspect that physical inspection may have misled you. One of the most salient visible human characteristics is the complexion of our largest organ, the skin. Its prominence naturally leads many to mistakenly infer relationships where they do not exist.

This was certainly an issue when European explorers encountered the peoples of Melanesia. An older term for Melanesians is “Oceanic Negro,” and some sources suggest that the Spaniards who named the island New Guinea did so with an eye to the old Guinea on the coast of West Africa. To the left is an unrooted tree which illustrates the relationships between Papuans, Bantu from Kenya, and Han Chinese. Since the font is small I've underlined the focal populations in red. Africans are always the “outgroup” to any two non-African populations. This is a robust pattern whenever you look at phylogenies averaged across the total genome. In other words, when you don't privilege particular genes in a phylogeny, humanity can be divided into African and non-African branches. But if you look at pigmentation genes you get a different picture altogether. As it happens, not only is variation in skin color a trait of great social importance, but it is also one of the few phenotypes whose genetic architecture has been well elucidated by genomics. About half a dozen genes are responsible for most of the between-population variation in complexion. Populations far from the equator seem to have developed independent, parallel routes to lighter skin, while populations near the equator are more likely to resemble each other at these loci. In other words, the phylogeny of these specific genes is out of sync with the average phylogeny of the genes in these populations. It is the latter, not the former, which is a good reflection of demographic history.

A new paper in PLoS Genetics looks at these sorts of parallel trends more broadly, not just focusing on one trait. In particular, the authors explore the possibilities of natural selection operating upon standing genetic variation across divergent lineages. This means that there need not be a novel mutation which is driven up toward fixation rapidly by positive selection in a “hard sweep.” Rather, as populations diversify they may be subject to selection pressures which take their extant genetic variation and shift the mean of the quantitative trait in a particular direction, altering the balance of underlying allele frequencies rather than substituting novel genetic variants at one or two loci. These are “soft sweeps.”
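The contrast between a hard sweep from a new mutation and selection on standing variation can be illustrated with a deterministic recursion for genic selection, p' = p(1 + s) / (1 + ps). This is a simplified sketch that ignores drift (which dominates while a new mutation is rare) and treats the locus in isolation; the population size, selection coefficient and time span are assumptions chosen for illustration, not values from the paper.

```python
def trajectory(p0: float, s: float, generations: int) -> list:
    """Allele frequency path under genic selection (allele fitness 1 + s vs. 1), no drift."""
    path = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + p * s)  # mean fitness is 1 + p*s
        path.append(p)
    return path


N = 10_000  # assumed diploid population size
new_mutation = trajectory(1 / (2 * N), s=0.01, generations=200)  # a single new copy
standing = trajectory(0.30, s=0.01, generations=200)  # allele already common

print(f"new mutation after 200 generations: {new_mutation[-1]:.4f}")
print(f"standing variant after 200 generations: {standing[-1]:.2f}")
```

Under this model the odds p / (1 - p) grow by a factor of 1 + s every generation regardless of the starting point, so the same pressure that barely moves a new mutation in 200 generations turns an already common allele into the majority allele.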

First, the author summary, Parallel Adaptive Divergence among Geographically Diverse Human Populations:

Identifying regions of the human genome that differ among populations because of natural selection is both essential for understanding evolutionary history and a powerful method for finding functionally important variants that contribute to phenotypic diversity and disease. Adaptive events on timescales corresponding to the human diaspora may often manifest as relatively small changes in allele frequencies at numerous loci that are difficult to distinguish from stochastic changes due to genetic drift, rather than the more dramatic selective sweeps described by classic models of natural selection. In order to test whether a substantial proportion of interpopulation genetic differences are indeed adaptive, we identify loci that have undergone moderate allele frequency changes in multiple independent human lineages, and we test whether these parallel divergence events are more frequent than expected by chance. We report a significant excess of polymorphisms showing parallel divergence, especially within genes, a pattern that is best explained by geographically varying natural selection. Our results indicate that local adaptation in humans has occurred by subtle, repeated changes at particular genes that are likely to be associated with important morphological and physiological differences among human populations.

The statistics in this paper can be a bit daunting, but the basic logic is simple. The HGDP data set has a lot of SNP information on ~50 populations. These populations also vary in their phylogenetic relationships; we know, for example, that Amerindian populations are closer to East Asians than they are to Europeans. The authors pruned their population set down to very genetically distinctive groups: those without too much admixture and located in ecologically unambiguous regions (so the Uyghur were discarded). For example, Europeans and East Asians in temperate climes, Pygmies and Papuans in the tropics. Comparing two pairs of populations which were phylogenetically independent of each other but ecologically contrasting in a similar manner, they found broad evidence of parallel shifts in underlying allele frequency across a range of SNPs.

Remember, these are polymorphisms found in all populations, so natural selection is perturbing the frequencies around an average. Additionally, they focused on alleles with intermediate global frequency, so one presumes there's enough genetic variance for selection to be effective. From theory and simulation the authors knew that a certain number of SNPs would show, purely by chance, the correlated pattern that implies parallel positive selection (and so possible convergence of trait values). But for several comparisons across groups they found an excess of such SNPs. And the distribution of these detected SNPs across regions of the genome is very suggestive. There was an excess of SNPs in coding regions of the genome, and an even greater excess at base pairs where a change in state would result in a change in the protein! In other words, regions of the genome implicated in genuine function show more hints of convergence across unrelated lineages.
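The core comparison can be sketched as a count of SNPs whose allele-frequency differences move in the same direction, beyond some threshold, in two independent population pairs, compared against a null built by shuffling one pair's differences across SNPs. This is a hypothetical simplification, not the authors' statistic: their null came from theory and simulation, and a naive SNP-level shuffle ignores linkage between nearby SNPs. The threshold, function names and toy data are assumptions.

```python
import random


def parallel_count(d1: list, d2: list, threshold: float) -> int:
    """SNPs diverging by at least `threshold` in the same direction in both population pairs."""
    return sum(1 for a, b in zip(d1, d2)
               if abs(a) >= threshold and abs(b) >= threshold and a * b > 0)


def permutation_test(d1: list, d2: list, threshold: float, n_perm: int = 1000, seed: int = 0):
    """Observed parallel count and a one-sided permutation p-value."""
    rng = random.Random(seed)
    observed = parallel_count(d1, d2, threshold)
    shuffled = list(d2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any SNP-by-SNP link between the two pairs
        if parallel_count(d1, shuffled, threshold) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: frequency differences (e.g. temperate minus tropical) for 200 SNPs in two
# independent pairs; the first 20 SNPs are built to shift in parallel.
d1 = [0.3] * 20 + [0.0] * 180
d2 = [0.3] * 20 + [0.0] * 180
print(permutation_test(d1, d2, threshold=0.2))
```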

They also found particular patterns in the genes which were enriched for parallel selection:

Genes overlapping parallel divergent SNPs were modestly enriched for diverse functional categories associated with various cell types including neurons, lymphocytes, cancer, and epithelium…Among the most extreme parallel divergent genes (observed at a threshold of 0.5%) were the skin keratinization gene ABCA12…SH2B1, which controls serum leptin levels and body weight…GRM5, a glutamate receptor associated with schizophrenia…and with pigmentation via the closely linked TYR…ATP2A2, which causes a neuropsychiatric/keratinization disorder…F13A1, a coagulation factor linked to numerous cardiovascular diseases and to Alzheimer’s…and IFIH1, associated with antiviral defense, type 1 diabetes, and psoriasis...The pleiotropic nature of many of these genes suggests that selection on one trait may have affected the evolution of other traits.

On the last part: there's a “correlation matrix” linking genetic variance and trait variance. If you slam the genome with natural selection there will usually be correlated responses in a host of traits unrelated to the target of selection, because of the complex, contingent nature of biological pathways. Modulating gene X to shift the value of trait 1 and increase local fitness can have large consequences for trait 2, trait 3, trait 4, and so forth.
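The “correlation matrix” intuition has a standard quantitative form, Lande's multivariate breeder's equation, Δz̄ = Gβ: the change in each trait mean equals the additive genetic variance-covariance matrix G times the vector β of selection gradients. Below is a minimal sketch with a hypothetical three-trait G. Selection acts directly only on trait 1, yet traits 2 and 3 still shift through their genetic covariances with it.

```python
def correlated_response(G: list, beta: list) -> list:
    """Per-generation change in trait means: delta_z = G @ beta (multivariate breeder's equation)."""
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


# Hypothetical additive genetic variance-covariance matrix for traits 1, 2, 3.
G = [
    [1.0, 0.5, -0.3],
    [0.5, 1.0, 0.0],
    [-0.3, 0.0, 1.0],
]
beta = [0.2, 0.0, 0.0]  # direct selection on trait 1 only

for trait, dz in enumerate(correlated_response(G, beta), start=1):
    print(f"trait {trait}: {dz:+.2f}")
```

Trait 2 moves up because it covaries positively with trait 1, and trait 3 moves down because its covariance is negative, even though neither is under direct selection here.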

Of course most of the SNPs detected are not targets of selection. Remember, one assumes that simply due to random chance some SNPs will exhibit patterns which spuriously match those of regions which are the targets of parallel selection pressures. The importance of this paper, rather, is that it is another step toward fleshing out the broader general theory of how adaptation and demographic events interplay across the arc of human history. It was always understood that convergent evolution is a force which must have shaped humans as they diversified and radiated across the world, but the genetic details were often left unspecified for various reasons. By filling in those details we may be able to stumble upon some very interesting general insights about the parameters which frame evolutionary process.

For example:

South Americans may carry alleles adapted to temperate climates due to their ancestral migration across Beringia, and they may have lacked adequate time and/or genetic variation to completely re-adapt to a tropical environment. One SNP that fits this hypothesis lies in DDB1, which protects the skin from solar UV exposure…and is one of the strongest examples of this parallel divergence pattern, with one allele fixed in South America, over 90% in Europe and East Asia, and less than 40% in African and Oceania….

A biological anthropologist once told me that South American Indians look like Siberians in their bodily proportions compared with other tropical peoples. Which makes sense, since they are probably the descendants of Siberians! As for skin color, this is an interesting trait insofar as it looks like our species evolved very dark skin relatively early in its history, at the point when we lost our fur. Tropical populations usually exhibit a modicum of functional constraint; they don't deviate too far from the ancestral type. In contrast, temperate zone populations often “lose” the function of these pigment-producing genes, though in different ways. To “break” a gene is far easier than to put it back together, and I suspect that's what you're seeing with Amerindians in the tropics. A small ancestral population which traversed Beringia carried only a non-functional copy, which probably accumulated many mutations. Once they reentered the tropical zone they “needed” function again, but restoring it would take too many independent steps for 10,000 to 15,000 years to suffice.

As noted by the authors, this was definitely a first pass. With denser sequence-level data and better population coverage, presumably one could explore more fine-grained questions. But at least there are results which confirm what one always assumed in theory. Sometimes it is just good to do a check, because you never really know….

Citation: Tennessen JA, & Akey JM (2011). Parallel Adaptive Divergence among Geographically Diverse Human Populations. PLoS Genetics

(Republished from Discover/GNXP by permission of author or representative)
Last week I reviewed ideas about the effect of “exogenous shocks” to an ecosystem of creatures, and how they might reshape its evolutionary trajectory. These sorts of issues are well known in their generality, with implications ranging from the broadest macroscale systematics to microevolutionary process. The shocks point to changes over time which have a general effect, but what about exogenous parameters which shift spatially and regularly? I'm talking latitudes here. The further you get from the equator, the more the climate varies over the seasons, the lower the mean temperature, and the less aggregate radiation the biosphere catches. Allen's rule and Bergmann's rule are two patterns which biologists have long observed in many organisms. The equatorial variants are slimmer in their physique, while the polar ones are stockier. Additionally, mean body mass tends to increase as one moves away from the equator.

But these rules are just general observations. What process underlies them? The likely culprit would be natural selection, of course. But the specific manner in which this process shakes out, on both the organismic and genetic level, still needs to be elucidated in further detail. A new paper in PLoS Genetics attempts to do this more rigorously and deeply than has been done before for one particular worldwide mammalian species, H. sapiens sapiens. We have spanned the latitudes and longitudes, and so we're a perfect test case for an exploration of the broader microevolutionary forces which shape variation.

The paper is Adaptations to Climate-Mediated Selective Pressures in Humans. Its technical guts can be intimidating, but its initial questions and final answers are less daunting. So let’s jump straight to the last paragraph of the discussion:

The results of this genome scan not only increase our understanding of the genetic landscape of adaptation across the human genome, but they may also have a more practical value. For example, they can be used to select candidate genes for common disease risk and to generate specific testable hypotheses regarding the functions of specific genes and variants. While the results of genome-wide scans for association with diseases and other traits are accumulating at a rapid pace, interpretation of these results is often ambiguous because the power to detect all common variants that are important in the etiology of the phenotype is incomplete. This is especially true in the case of complex traits, where variants at many loci may contribute to the phenotype, each with a small effect. By combining the evidence from GWAS with evidence of selection, it may be possible to separate true causative regions from the background noise inherent in genome-wide screens for association. To facilitate this, all of our empirical rank statistics are publically available. Moreover, results of selection scans that detect evidence for spatially-varying selection may be especially relevant to diseases that show substantial differences in prevalence across ethnic groups (e.g., sodium-sensitive hypertension, type 2 diabetes, prostate cancer, osteoporosis). In the future, this approach could be extended by including additional populations and aspects of the environment to gain a more complete understanding of how natural selection has shaped variation across the genome in worldwide populations. Furthermore, whereas we relied on linkage disequilibrium between (potentially un-genotyped) adaptive variants and genotyped SNPs, whole genome re-sequencing data should give a more complete picture of the variation that underlies adaptation.

How'd they infer this? First, they had pretty wide coverage of populations from across the world. They pooled the HGDP and HapMap, as well as a few other populations of interest: Ethiopians, some Siberian groups, and Australian Aboriginals. I do wish that the Aboriginal data set was public, but it doesn't seem to be! The Ethiopians are, I assume, the ones you can find in Behar et al. The authors had a null model predicated on the idea that variation in the frequencies of given genetic variants, single nucleotide polymorphisms, should be best predicted by population history and relationships. That is, two populations will differ at a given locus in proportion to their genetic divergence, due to random forces such as genetic drift. Deviations from this null model are possible targets of natural selection, which reshapes regions of the genome in a deterministic manner toward particular ends. Two classic 21st-century examples of this phenomenon seem to be skin pigmentation and lactase persistence: different populations with the same phenotype (light skin, or the ability to digest lactose as an adult) exhibit divergent genetic architectures.

They naturally looked to see how these deviations tracked the environmental parameters you see above. Keep in mind that they did take into account correlations between these variables. Additionally, correlation does not equal causation, so other variables correlated with the ones they explored might be responsible for the systematic perturbations.
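A naive version of this kind of environmental scan (and of the latitude test in the Han paper above) is just a correlation between population allele frequencies and an environmental variable. The sketch below computes a plain Pearson correlation; it deliberately omits what makes the published approach useful, a null model that accounts for shared population history, as described above. The latitudes and frequencies are hypothetical.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical absolute latitudes and allele frequencies for six populations.
latitude = [5, 15, 25, 35, 45, 55]
allele_freq = [0.10, 0.15, 0.30, 0.35, 0.55, 0.60]
print(f"r = {pearson(latitude, allele_freq):.2f}")
```

A high r here says nothing by itself about selection: neighboring populations share ancestry as well as climate, which is exactly the confound the null model is meant to absorb.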

Their method yielded a Bayes factor (BF) which measures the deviation from the null model for a given SNP. To judge off the bat whether these SNPs are plausibly targets of adaptation, you want to check whether they're enriched for certain classes of SNPs. They found that the SNPs which rejected the null model (in which population history and demographics predict genetic variation) were much more likely to be genic or nonsynonymous. A genic SNP lies within a gene, rather than in the intergenic stretches that make up much of the genome. A nonsynonymous SNP is one at a position where a change alters the protein coded. Normally these sorts of changes are selected against because they tend to disrupt protein function, but when a population is adapting to a new environment a changed protein can be favored.
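The enrichment check is simple arithmetic: compare the share of a SNP class (genic, or nonsynonymous) among the top-ranked SNPs with its share genome-wide. Below is a minimal sketch with hypothetical counts. A real analysis would also attach a significance test (for example Fisher's exact test) and control for confounders such as SNP density near genes, which this omits.

```python
def enrichment(top_class: int, top_total: int, bg_class: int, bg_total: int) -> float:
    """Ratio of a SNP class's share among top-ranked SNPs to its genome-wide share."""
    return (top_class / top_total) / (bg_class / bg_total)


# Hypothetical counts: 30 nonsynonymous SNPs among the 1,000 top-ranked SNPs,
# versus 10,000 nonsynonymous SNPs among 600,000 genotyped SNPs overall.
ratio = enrichment(30, 1_000, 10_000, 600_000)
print(f"{ratio:.1f}x enrichment")  # -> 1.8x enrichment
```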

There are a host of results in the paper, but one pattern of interest was that different sets of SNPs can be selected in different population pools. Below are two panels which show the SNPs with significant BF, and how they vary as a function of the climatic variable depending upon the populations sampled. To the left you see the cluster which varied in western Eurasia, while to the right you see those which varied in eastern Eurasia. In a broad sense the target of selection was the same, but the specific SNPs pulled out of the set of potential targets still exhibit stochasticity:

Natural selection is deterministic at the broadest scale, but in its instantiations it can exhibit a great deal of randomness. Same phenotype, different genotype. Similarly, the heat death of the universe may be determined, but there's a lot of contingent, epiphenomenal detail between now and then. Changing the range of populations analyzed often shifted the value of the statistic for a given SNP. Remember, averaging over the aggregate can remove important local information. That being said, the Venn diagram below shows that there was a disproportionate tendency for the signals detected to be worldwide. This indicates that the wheel isn't reinvented as much as we might think. I wonder if it points to limitations baked into the human genome in terms of the plasticity and flexibility of its various pathways. Is there a structural engineer vetoing the elegant fancies of the architect?

The leftmost panel highlights the West Eurasian signals and the middle panel the East Eurasians.

As noted above, these sorts of studies have both evolutionary and biomedical relevance. Perhaps the most intriguing result, albeit expected from other areas of research, is the role of antagonistic pleiotropy in many diseases. Concretely, a change at a particular locus may increase reproductive fitness in a novel environment at the cost of morbidity later in life. The authors suggest that pathogen resistance and inflammatory response may have the side effect of increasing susceptibility to a range of diseases of old age. Why is this important? I think the authors are implying, in part, that a plausible evolutionary mechanism of adaptation should lower our prior expectation that a given genome-wide association is a false positive. At least that's my reading. If a SNP was the target of natural selection and shows up in a GWAS, keep an eye on it! All the better if you have a good functional understanding of what's going on there.
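The reasoning about prior expectations can be written as Bayes' rule in odds form: posterior odds = prior odds × Bayes factor. In the sketch below, the prior is a hypothetical probability that a given GWAS hit is real, and the Bayes factor stands for how strongly independent evidence of selection at that SNP is assumed to favor a real effect. Both numbers are illustrative; this is not a calculation from either paper.

```python
def posterior_probability(prior: float, bayes_factor: float) -> float:
    """Update a prior probability by a Bayes factor, using the odds form of Bayes' rule."""
    prior_odds = prior / (1 - prior)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1 + post_odds)


prior = 0.01  # hypothetical: 1% of hits at this threshold assumed to be real
for bf in (1, 10, 100):
    print(bf, round(posterior_probability(prior, bf), 3))
```

With no extra evidence (a Bayes factor of 1) the posterior equals the prior; supporting evidence of selection multiplies the odds rather than adding a fixed amount, so it matters most when the prior is low.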

In the longer term, it might change our perception of the baseline risk for classes of morbidity as they vary by population. Human populations have had different evolutionary histories, so their disease risks might vary a great deal. Between-population differences may be a lot less paradoxical than we think….

Citation: Hancock AM, Witonsky DB, Alkorta-Aranburu G, Beall CM, & Gebremedhin A (2011). Adaptations to Climate-Mediated Selective Pressures in Humans. PLoS Genetics. doi: 10.1371/journal.pgen.1001375

(Republished from Discover/GNXP by permission of author or representative)

When I was in college I would sometimes have late-night conversations with the guys in my dorm, and the discussion would random-walk in very strange directions. During one of these quasi-salons a friend whose parents were from Korea expressed some surprise and disgust at the idea of wet earwax. It turns out he had not been aware that the majority of people in the world have wet, sticky earwax. I'd stumbled onto that datum in the course of my reading, and had to explain to most of the discussants that East Asians generally have dry earwax, while convincing my Korean American friend that wet earwax was not abnormal at all. Earwax isn't something we explore in polite conversation, so it makes sense that most people would be ignorant of inter-population variation in this phenotype.

But it doesn't end there. Over the past five years the genetics of earwax has come back into the spotlight because of its variation and what it can tell us about the history and evolution of humans since the Out of Africa event. Not only that, variation in earwax seems to have other phenotypic correlates. The SNPs in and around ABCC11 are a set where East Asians in particular show signs of being different from other world populations: the variants which are nearly fixed in East Asia at this locus are nearly absent in Africa. Here are the frequencies of the alleles of rs17822931 on ABCC11 from ALFRED:
The expression of the dry earwax phenotype is contingent on an AA genotype; the trait is recessive. So in a population where the frequency of the A allele is ~0.50, the dry earwax phenotype would have a frequency of ~0.25. In a population where the A allele has a frequency of ~0.20, the dry earwax phenotype would be at ~0.04. Among people of European descent the dry earwax phenotype is present at proportions of less than ~5%. Because of recessive expression, a larger minority of Japanese and Chinese should manifest wet earwax, though interestingly the ALFRED database indicates that Koreans are fixed for the A allele. In Africa, conversely, the G allele seems to be fixed.
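The arithmetic above is the Hardy-Weinberg expectation for a recessive trait: if q is the frequency of the A allele, the AA (dry earwax) phenotype appears at frequency q². Run in reverse, an observed phenotype frequency implies an allele frequency. A minimal sketch, assuming random mating:

```python
from math import sqrt


def recessive_phenotype_freq(q: float) -> float:
    """Expected frequency of the AA phenotype under Hardy-Weinberg proportions."""
    return q ** 2


def allele_freq_from_phenotype(phenotype_freq: float) -> float:
    """Invert q**2: the A allele frequency implied by an observed AA phenotype frequency."""
    return sqrt(phenotype_freq)


print(recessive_phenotype_freq(0.50))  # -> 0.25
print(round(recessive_phenotype_freq(0.20), 2))  # -> 0.04
# Dry earwax below ~5% in Europeans implies an A allele frequency below about 0.22.
print(round(allele_freq_from_phenotype(0.05), 2))  # -> 0.22
```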

So the question is: why? A new paper in Molecular Biology and Evolution argues that the allele frequency differences are a function of positive directional selection since humans left Africa ~100,000 years ago. The paper is The impact of natural selection on an ABCC11 SNP determining earwax type:

A nonsynonymous single nucleotide polymorphism (SNP), rs17822931-G/A (538G>A; Gly180Arg), in the ABCC11 gene determines human earwax type (i.e., wet or dry) and is one of most differentiated nonsynonymous SNPs between East Asian and African populations. A recent genome-wide scan for positive selection revealed that a genomic region spanning ABCC11, LONP2, and SIAH1 genes has been subjected to a selective sweep in East Asians. Considering the potential functional significance as well as the population differentiation of SNPs located in that region, rs17822931 is the most plausible candidate polymorphism to have undergone geographically restricted positive selection. In this study, we estimated the selection intensity or selection coefficient of rs17822931-A in East Asians by analyzing two microsatellite loci flanking rs17822931 in the African (HapMap-YRI) and East Asian (HapMap-JPT and HapMap-CHB) populations. Assuming a recessive selection model, a coalescent-based simulation approach suggested that the selection coefficient of rs17822931-A had been approximately 0.01 in the East Asian population, and a simulation experiment using a pseudo-sampling variable revealed that the mutation of rs17822931-A occurred 2006 generations (95% credible interval, 1023 to 3901 generations) ago. In addition, we show that absolute latitude is significantly associated with the allele frequency of rs17822931-A in Asian, Native American, and European populations, implying that the selective advantage of rs17822931-A is related to an adaptation to a cold climate. Our results provide a striking example of how local adaptation has played a significant role in the diversification of human traits.
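Two pieces of the abstract can be unpacked with simple arithmetic. First, the deterministic recursion for a favored recessive allele (AA fitness 1 + s, GA and GG fitness 1) shows why such alleles rise very slowly while rare: the change per generation scales with q², because the advantage is only expressed in homozygotes. Second, converting generations to years needs an assumed generation time; 25 years is used here purely as an assumption. This sketch ignores drift, which the paper's coalescent simulations do account for, and the starting frequencies are hypothetical.

```python
def recessive_selection(q0: float, s: float, generations: int) -> list:
    """Frequency of a favored recessive allele under random mating, no drift.
    Fitnesses: AA = 1 + s, GA = GG = 1, so q' = (q + s*q**2) / (1 + s*q**2)."""
    path = [q0]
    q = q0
    for _ in range(generations):
        q = (q + s * q * q) / (1 + s * q * q)
        path.append(q)
    return path


S = 0.01  # the abstract's estimated selection coefficient
rare = recessive_selection(0.01, S, 1000)    # hypothetical rare starting frequency
common = recessive_selection(0.30, S, 1000)  # hypothetical common starting frequency
print(f"from 0.01: {rare[-1]:.4f}; from 0.30: {common[-1]:.2f}")

GENERATION_YEARS = 25  # assumption, not from the paper
for gens in (1023, 2006, 3901):  # the abstract's estimate and 95% credible interval
    print(gens, "generations ~", gens * GENERATION_YEARS, "years")
```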

The region around ABCC11 has come under scrutiny with the emergence of tests of natural selection predicated on inspecting patterns of linkage disequilibrium (LD). LD basically measures how far the association of genetic variants within the genome departs from what would be expected if they were inherited independently. A selective sweep tends to generate a lot of LD around the target of natural selection because, as the allele in question rises in frequency, its neighbors hitchhike along. The hitchhiking process means that within a population you may see regions of the genome which exhibit long sequences of correlated single-nucleotide polymorphisms (SNPs), haplotypes. An initial selective event will presumably generate a very long homogenized block, which over time will break apart through recombination and mutation, as variation is injected back into the genome. The extent and decay of LD can then help us gauge the time and strength of selection events.
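To make “departure from expectation” concrete, here is a minimal Python sketch of the standard two-locus summaries, D, D’ and r², computed from one haplotype frequency and the two allele frequencies. The function name and inputs are illustrative; real analyses first estimate haplotype frequencies from genotype data:

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and allele B at locus 2
    p_a, p_b: frequencies of A and B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # departure from what independence predicts
    # D' rescales D by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2

# Independent loci: the AB haplotype is exactly as common as p_a * p_b predicts
print(ld_stats(0.25, 0.5, 0.5))  # (0.0, 0.0, 0.0)
# Complete association, as when a sweep drags a neighboring allele along
print(ld_stats(0.5, 0.5, 0.5))   # (0.25, 1.0, 1.0)
```

D’ here is signed; plots such as the HapMap figure usually show its absolute value.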

But LD can emerge via other processes besides natural selection. Imagine, for example, that Africans and Europeans mix in a given generation. Europeans and Africans have different genetic makeups on average, so the initial generations will have more LD than expected, because recombination only slowly breaks apart the physical connections between genomic regions inherited from European and African ancestors. The decay of LD can then give one a sense of the time since admixture, as well as of the time since selection. Not only that, stochastic demographic events and processes are also important and may drive the emergence of LD. Consider a bottleneck where the frequency of a particular haplotype is driven up by random genetic drift alone. The details of these alternative scenarios are explored in the 2009 paper The role of geography in human adaptation.
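The admixture case has a simple closed form. If a fraction m of a new population comes from source 1 and the rest from source 2, and each source is itself in linkage equilibrium, the initial LD between two loci is m(1 - m) times the product of the allele-frequency differences at the two loci. A short Python sketch, with hypothetical names and frequencies, that computes it both directly and via the formula:

```python
def admixture_ld(m, pa1, pa2, pb1, pb2):
    """LD (D) between loci A and B right after two populations mix.

    m: fraction of the mixed population drawn from source 1
    pa1, pa2: frequency of allele A in sources 1 and 2 (likewise pb1, pb2 at locus B)
    Each source population is assumed to be in linkage equilibrium.
    """
    p_ab = m * pa1 * pb1 + (1 - m) * pa2 * pb2  # AB haplotypes contributed by each source
    p_a = m * pa1 + (1 - m) * pa2
    p_b = m * pb1 + (1 - m) * pb2
    return p_ab - p_a * p_b

def admixture_ld_closed_form(m, pa1, pa2, pb1, pb2):
    # Algebraically identical: m(1 - m) * (frequency difference at A) * (difference at B)
    return m * (1 - m) * (pa1 - pa2) * (pb1 - pb2)

args = (0.5, 0.9, 0.1, 0.8, 0.3)  # hypothetical 50/50 mix of two differentiated sources
print(admixture_ld(*args), admixture_ld_closed_form(*args))
```

The formula makes one point explicit: loci whose frequencies barely differ between the sources contribute almost no admixture LD.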

All this is preamble to the fact that there’s a lot of LD around ABCC11. Here’s a visualization from the HapMap populations:


From left to right you have Chinese & Japanese, Utah whites, and the Yoruba from Nigeria. An absolute value of D’ ~0 means that there’s linkage equilibrium, the default or null state where there are no atypical excessive correlations of alleles across the genome. The axes here are pairwise combinations of SNPs around ABCC11, with a focus around rs17822931, a nonsynonymous SNP which seems to be the likely functional source of the variance in earwax and other phenotypes. In terms of LD rank order the results are not surprising: across the genome East Asians tend to exhibit more LD than Europeans, and Europeans exhibit more LD than the Yoruba. Part of this is probably a function of population history; a serial bottleneck model Out of Africa would posit that drift and other stochastic forces had a stronger impact on the genomes of East Asians than on those of Europeans. But this seems like it can’t be the whole picture here; note the variance in allele frequency in the New World as well as in Oceania. Some of the Amerindian populations seem to have a higher frequency of the ancestral G allele on rs17822931. The figure above is easier to understand: the Y-axis shows the extent of heterozygosity at a given location. GA is heterozygous, while GG and AA are homozygous. Africans again tend to exhibit more heterozygosity than non-Africans, but note the sharply diminished heterozygosity for the East Asian sample around rs17822931 in ABCC11. Remember that heterozygosity tends not to go above 0.50 in a random mating population in a diallelic model (though with selective breeding it may go above 0.50 in F1 generations).
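The 0.50 ceiling follows from expected heterozygosity under random mating, 2p(1 - p), which peaks when both alleles are at 50%. A minimal Python check over a grid of allele frequencies (illustrative only; observed heterozygosity in a sample that departs from random mating can differ):

```python
def expected_heterozygosity(p):
    """Expected fraction of heterozygotes (e.g. GA) at a biallelic locus under random mating."""
    return 2 * p * (1 - p)

# Scan allele frequencies from 0 to 1 in steps of 0.01
grid = [i / 100 for i in range(101)]
best = max(grid, key=expected_heterozygosity)
print(best, expected_heterozygosity(best))  # 0.5 0.5
```

The same curve explains the East Asian dip: a population near fixation for A necessarily sits near the low end of it.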

The major findings of this paper beyond what was known before seem to be a) an explicit model of how East Asians could have arrived at a high frequency of the AA genotype at rs17822931, and b) the correlation between climate and the frequency of A. I’ll get to the second point in a bit, but what about the first? Using the nature of variation in two microsatellites flanking the SNP of interest in East Asians, and assuming a recessive selection model, the authors posit that the A allele began to rise in frequency ~50,000 years ago, and that the selection coefficient was ~1% per generation. This is a significant value for the selection parameter, and the timing is possible in light of the separation of non-Africans into western and eastern groups around that period.
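The “~50,000 years” figure is a conversion from the abstract’s estimate of 2006 generations (95% credible interval 1023 to 3901). A sketch of that arithmetic, assuming a 25-year generation time; the abstract does not state a generation time, so treat that value as an assumption:

```python
GENERATION_YEARS = 25  # assumed generation time, not a value given in the abstract

def generations_to_years(generations, years_per_generation=GENERATION_YEARS):
    return generations * years_per_generation

point = generations_to_years(2006)
low, high = generations_to_years(1023), generations_to_years(3901)
print(point, low, high)  # 50150 25575 97525
```

With a 30-year generation the point estimate moves to about 60,000 years, so the date is only as firm as the generation-time assumption.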

But honestly I’m pretty skeptical of this. The confidence intervals don’t inspire confidence, and from what little I know selection for recessive traits should produce less linkage disequilibrium. At low frequencies natural selection has very little effect on the allele because it is mostly “masked” in heterozygotes, and therefore there will be a long period before its proportion begins to rise more rapidly. During this time recombination will have time to chop up the haplotypes around the SNP, reducing the length of the statistically associated haplotype block. Also, the authors themselves don’t seem to believe that the earwax phenotype itself was the target of selection, so its recessive expression pattern should be less important from where I stand.
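The masking argument can be seen in a deterministic recursion. Assuming fitnesses of 1 for GG and GA and 1 + s for AA, the A frequency p changes by s·p²(1 - p)/(1 + s·p²) per generation, which is tiny when p is small. This Python sketch ignores drift, which dominates for very rare alleles, and it is not the authors’ coalescent simulation; it only compares how long the climb takes at low versus intermediate frequency with s = 0.01:

```python
def next_freq_recessive(p, s):
    """One generation of selection when only the AA homozygote is favored.

    Fitnesses: GG = 1, GA = 1, AA = 1 + s. p is the frequency of A.
    """
    return p * (1 + s * p) / (1 + s * p * p)

def generations_between(p_start, p_end, s):
    """Count generations for A to climb from p_start to p_end (deterministic, no drift)."""
    p, gens = p_start, 0
    while p < p_end:
        p = next_freq_recessive(p, s)
        gens += 1
    return gens

s = 0.01  # the order of magnitude estimated in the paper
slow = generations_between(0.01, 0.10, s)  # rare allele, mostly hidden in heterozygotes
fast = generations_between(0.10, 0.50, s)  # common enough for AA homozygotes to be frequent
print(slow, fast)
```

Going from 1% to 10% takes several times longer than going from 10% to 50%, which is the long window in which recombination can erode the haplotype around the selected SNP.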

The idea that the genes around ABCC11 might have something to do with adaptation to cold is suggestive, but almost every distinctive East Asian trait has at some point been hypothesized by physical anthropologists to have something to do with cold. You’d figure that the Cantonese lived in igloos going by all the myriad adaptations to frigid conditions which they exhibit. The reality is that much of China, Korea and Japan are subtropical today. In any case the last figure shows the correlation across several lineages. Earlier, by comparing variation around this region in humans with that in other primates, they found that Africans seem to be subject to purifying selection. This means that functional constraint prevents neutral forces from changing the frequencies of functionally significant variants. It is well known that on average Africans are more diverse than non-Africans, probably because the latter are a sampling of the former, but on a small minority of genes the reverse is true. This is likely due to the relaxation of functional constraint as humans left the ancestral African environment. And this is clearly true for rs17822931; most non-African populations exhibit some heterozygosity. East Asians here are an exception, not the rule, in having derived allele frequencies near fixation. The regression lines in this last figure are all statistically significant. It is interesting that there are particularly strong correlations between latitude and the frequency of the derived A allele among Europeans and Native Americans. In contrast the relationship within Asian populations is weaker: latitude explains only 17% of the variance in allele frequency among the Asian ALFRED sample.
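“Explains 17% of the variance” refers to the R² of a straight-line fit of allele frequency on latitude. A minimal Python sketch of that statistic; the latitude and frequency values below are made up for illustration and are not the ALFRED data:

```python
def r_squared(x, y):
    """Share of the variance in y explained by a straight-line fit on x (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Hypothetical toy values: absolute latitude and A-allele frequency for five populations
latitude = [5, 15, 25, 35, 45]
freq_a = [0.60, 0.85, 0.70, 0.95, 0.80]
print(round(r_squared(latitude, freq_a), 2))
```

An R² of 0.17 would mean that most of the spread in frequency among the Asian populations has to be accounted for by something other than latitude.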

But we shouldn’t allow the hypothesis to rise and fall just on this evidence. After all, there have likely been substantial movements of populations within the last 10,000 years, perhaps especially in East Asia, where the expansion of the Han south may have triggered the movement of both the Thai and Vietnamese peoples out of South China and into mainland Southeast Asia. The best evidence of adaptation would be among admixed populations; presumably those at higher latitudes would have higher frequencies of the AA genotype than those at lower latitudes. Rather than sorting the populations into three coarse classes, a more sophisticated treatment using ancestry fractions (“ancestral quanta”) derived from STRUCTURE or ADMIXTURE as independent variables would probably be more informative. Remember, adaptation should show evidence of decoupling ancestry from phenotype.
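The proposed analysis amounts to a regression with ancestry as a covariate: if latitude still predicts allele frequency once ancestry is in the model, ancestry and phenotype have been partly decoupled. A minimal sketch with NumPy ordinary least squares, using hypothetical populations constructed so the answer is known in advance; a real analysis would need many populations and a model better suited to proportions:

```python
import numpy as np

def fit_latitude_and_ancestry(latitude, ancestry, allele_freq):
    """Ordinary least squares of allele frequency on latitude plus an ancestry fraction.

    Returns (intercept, latitude_coef, ancestry_coef).
    """
    X = np.column_stack([np.ones(len(latitude)), latitude, ancestry])
    coef, *_ = np.linalg.lstsq(X, np.asarray(allele_freq, dtype=float), rcond=None)
    return tuple(coef)

# Hypothetical admixed populations (made-up numbers for illustration only)
latitude = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0])
ancestry = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.8])  # e.g. an ADMIXTURE-style ancestry fraction
freq = 0.05 + 0.01 * latitude + 0.2 * ancestry  # built so latitude matters net of ancestry

intercept, b_lat, b_anc = fit_latitude_and_ancestry(latitude, ancestry, freq)
print(round(b_lat, 3), round(b_anc, 3))
```

A latitude coefficient that stays clearly nonzero with ancestry held fixed is the pattern selection would predict; one that vanishes points back to population history.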

Finally, I have to point to this section of the discussion:

What is the cause of the selective advantage of rs17822931-A? Although the physiological function of earwax is poorly understood (Matsunaga 1962), dry earwax itself is unlikely to have provided a substantial advantage. The rs17822931-GG and GA genotypes (wet earwax) are also strongly associated with axillary osmidrosis, suggesting that the ABCC11 protein has an excretory function in the axillary apocrine gland (Nakano et al. 2009)…,

I really didn’t know what this meant. So I looked it up. Here’s what I found: A strong association of axillary osmidrosis with the wet earwax type determined by genotyping of the ABCC11 gene:

Apocrine and/or eccrine glands in the human body cause odor, especially from the axillary and pubic apocrine glands. As in other mammals, the odor may have a pheromone-like effect on the opposite sex. Although the odor does not affect health, axillary osmidrosis (AO) is a condition in which an individual feels uncomfortable with their axillary odor, regardless of its strength, and may visit a hospital. Surgery to remove the axillary gland may be performed on demand. AO is likely an oligogenic trait with rs17822931 accounting for most of the phenotypic variation and other unidentified functional variants accounting for the remainder. However, no definite diagnostic criteria or objective measuring methods have been developed to characterize the odor, and whether an individual suffers from AO depends mainly on their assessment and/or on examiner’s judgment. Human body odor may result from the breakdown of precursors into a pungent odorant by skin bacteria….

Perhaps the paper should have been titled “why barbarians smell bad”? In any case, an idea for a book title on Korean genetics: “the least smelly race.”*

Citation: Ohashi J, Naka I, & Tsuchiya N (2010). The impact of natural selection on an ABCC11 SNP determining earwax type. Molecular Biology and Evolution. PMID: 20937735

* I’m referencing The Cleanest Race.

(Republished from Discover/GNXP by permission of author or representative)

Last spring two very thorough papers came out which surveyed the genetic landscape of the Jewish people (my posts: Genetics & the Jews, it’s still complicated; Genetics & the Jews). The novelty of the results was due to the fact that the research groups actually looked across the very diverse populations of the Diaspora, from Morocco and Eastern Europe to Ethiopia and Iran. They constructed a broader framework in which we can understand how these populations came to be, and how they relate to each other. Additionally, the papers give us more perspective on the generalizability of medical genetics findings in the area of “Jewish diseases,” which for various reasons are usually actually findings for Ashkenazi Jews (the overwhelming majority of Jews outside of Israel, but only about half of Israeli Jews).

Whereas the two aforementioned papers were deep explorations of the genetic history of the Jewish people, and allowed for a systematic understanding of their current relationships, a new paper in PNAS takes a slightly different tack. First, it zooms in on Ashkenazi Jews: the Jews whose ancestors are from the broad swath of Central Europe and who later expanded into Poland-Lithuania and Russia, the descendants of Litvaks, Galicians, and assimilated Jewish minorities such as the German Jews. Second, though constrained to a narrower population set, the researchers put more emphasis on the evolutionary parameter of natural selection. Like any population, Jews have been impacted by drift, selection, migration (and its variant, admixture), and mutation. Teasing apart these disparate parameters may aid in understanding the origin of Jewish diseases. The paper is open access, so you don’t have to take my interpretation as the last word. Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population:

The Ashkenazi Jewish (AJ) population has long been viewed as a genetic isolate, yet it is still unclear how population bottlenecks, admixture, or positive selection contribute to its genetic structure. Here we analyzed a large AJ cohort and found higher linkage disequilibrium (LD) and identity-by-descent relative to Europeans, as expected for an isolate. However, paradoxically we also found higher genetic diversity, a sign of an older or more admixed population but not of a long-term isolate. Recent reports have reaffirmed that the AJ population has a common Middle Eastern origin with other Jewish Diaspora populations, but also suggest that the AJ population, compared with other Jews, has had the most European admixture. Our analysis indeed revealed higher European admixture than predicted from previous Y-chromosome analyses. Moreover, we also show that admixture directly correlates with high LD, suggesting that admixture has increased both genetic diversity and LD in the AJ population. Additionally, we applied extended haplotype tests to determine whether positive selection can account for the level of AJ-prevalent diseases. We identified genomic regions under selection that account for lactose and alcohol tolerance, and although we found evidence for positive selection at some AJ-prevalent disease loci, the higher incidence of the majority of these diseases is likely the result of genetic drift following a bottleneck. Thus, the AJ population shows evidence of past founding events; however, admixture and selection have also strongly influenced its current genetic makeup.

The sample size of Ashkenazi Jews was ~400, and they looked at ~700,000 SNPs. As I said, how Jews relate to other populations really isn’t at the core of this paper as it was in the earlier ones from the spring, but there were the PCA plots (sorry Mike), a frappe bar plot, and a phylogenetic tree derived from the Fst statistic. Again, remember that PCA shows you the largest independent components of genetic variation within the data. The bar plot models each individual as a composite of a set of ancestral populations. And finally, Fst measures the between-population component of genetic variation: the larger the Fst between two populations, the greater the genetic distance.
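For readers who want to see what Fst summarizes, here is a minimal Python sketch of the classic heterozygosity-based form, (H_T - H_S)/H_T, for two populations. This simple version does not correct for sample size and is not necessarily the estimator the authors used:

```python
def fst(p1, p2):
    """Two-population Fst = (H_T - H_S) / H_T, combined over loci as a ratio of sums.

    p1, p2: allele frequencies at the same loci in population 1 and population 2.
    """
    hs_total = ht_total = 0.0
    for a, b in zip(p1, p2):
        h_s = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
        p_bar = (a + b) / 2
        h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity if the two were one pooled population
        hs_total += h_s
        ht_total += h_t
    return (ht_total - hs_total) / ht_total if ht_total > 0 else 0.0

print(fst([0.3, 0.7], [0.3, 0.7]))  # identical populations: no between-population component
print(fst([1.0], [0.0]))            # fixed difference: all variation is between populations
print(round(fst([0.6], [0.4]), 3))  # modest frequency difference: small Fst
```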


Using the Druze & Palestinians as the ancestral Middle Eastern reference, the authors estimated that the European admixture into Ashkenazi Jews is on the order of 30-55%. This is in the same ballpark as the previous studies, so no great surprise. As I stated in earlier posts, authors can spin the same results in very different ways. From what I can tell these authors are inclined to emphasize the strong possibility that in terms of genetic distance Ashkenazi Jews are somewhat closer to Europeans than they are to Levantine Arabs. Of course these sorts of assertions need to be handled with care. The genetic distance between Ashkenazi Jews and Tuscans is less than half that between Ashkenazi Jews and Russians, while the Jewish-Russian value is about 50% larger than the Jewish-Palestinian one. Remember that there’s a fair amount of circumstantial evidence that Tuscans may themselves be a relatively recent hybrid population between indigenous residents of the Italian peninsula and Near Easterners.
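One simple way to turn allele frequencies into an admixture estimate is to treat the admixed population as a weighted average of two sources and solve for the weight by least squares across loci. The sketch below is a generic estimator with hypothetical frequencies, not the method used in the paper:

```python
def admixture_fraction(p_admixed, p_source_a, p_source_b):
    """Least-squares m in  p_admixed ~ m * p_source_a + (1 - m) * p_source_b  across loci."""
    num = sum((x - b) * (a - b) for x, a, b in zip(p_admixed, p_source_a, p_source_b))
    den = sum((a - b) ** 2 for a, b in zip(p_source_a, p_source_b))
    return num / den

# Hypothetical allele frequencies (not from the paper): a European-like source,
# a Levantine-like source, and a population built as a 40/60 mixture of them.
europe = [0.70, 0.20, 0.55, 0.90]
levant = [0.30, 0.50, 0.15, 0.60]
mixed = [0.4 * e + 0.6 * l for e, l in zip(europe, levant)]
print(round(admixture_fraction(mixed, europe, levant), 3))  # 0.4
```

The estimate depends directly on which populations stand in for the sources, which is one reason published admixture figures come as ranges.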

One thing that this paper does do is rebut any strong assertion that Ashkenazi Jews are a genetically homogeneous population which went through a powerful bottleneck, that is, the idea that Jewish diseases are just an outcome of the operational inbreeding that occurs when genetic variation is expunged from a population with a low effective population size. The clincher seems to be the comparison of heterozygosity between Ashkenazi Jews and gentile Europeans: the former are actually somewhat more heterozygous than the latter. There’s been a bit of evidence from previous research that the long-term effective population size of Ashkenazi Jews was not necessarily very small, so this isn’t a total surprise. Remember that heterozygosity simply means the fraction of individuals heterozygous at a locus.

One natural way to increase heterozygosity is admixture. Remember that populations differ across many genes. As an example, there’s a pigmentation gene, SLC24A5, where all Europeans are at one state, and all West Africans at another. Naturally African Americans exhibit much more heterozygosity at this locus than the ancestral populations. The Ashkenazi Jewish case is less extreme because the two parental populations are genetically closer, but the principle still holds.
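A minimal Python sketch of why admixture raises heterozygosity: at a locus with a fixed difference, each source population has zero heterozygosity, but a randomly mating mixture does not. The 20% mixing fraction is an arbitrary illustrative value:

```python
def expected_het(p):
    """Expected heterozygosity 2p(1 - p) under random mating."""
    return 2 * p * (1 - p)

def admixed_het(m, p1, p2):
    """Expected heterozygosity in a randomly mating population formed from two sources.

    m: share of ancestry from source 1; p1, p2: allele frequencies in the two sources.
    """
    return expected_het(m * p1 + (1 - m) * p2)

# A locus with a fixed difference, like the SLC24A5 example: each parent population has H = 0
print(expected_het(1.0), expected_het(0.0))  # 0.0 0.0
print(round(admixed_het(0.2, 1.0, 0.0), 2))  # 0.32
```

When the two sources differ only modestly in frequency, the gain is correspondingly smaller, which is the less extreme Ashkenazi case.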

A consequence of recent admixture between genetically different populations is a high level of linkage disequilibrium, non-random associations of alleles at different loci across the genome. Why? There are many genes where two populations may be very different. Offspring inherit half their genome from one parent and half from the other, and the parents pass along particular associations of alleles to their offspring. One chromosome may carry a set of distinctively European alleles and its partner a set of distinctively African alleles, so that among hybrid individuals the alleles are strongly correlated across loci. These associations are broken down over time by recombination. The regularity of this process can serve as a clock with which to measure the period since admixture. African Americans were used to calibrate the time since admixture for the Uyghur people of western China, who are mixed from West and East Eurasian populations. The authors did not do this in this paper, I assume because the ancestral populations were genetically rather close in comparison to the two examples above, so there’d be less linkage disequilibrium to break down in the first place.
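The clock works because admixture LD between two loci shrinks by a factor of (1 - c) per generation, where c is their recombination fraction. A single-pair Python sketch of the forward decay and its inversion; real methods fit the decay across many marker pairs at different genetic distances, and the numbers here are illustrative only. Note the dependence on the initial LD, D_0, which is small when the parental populations are similar:

```python
import math

def ld_after(d0, c, t):
    """Admixture LD remaining after t generations for loci with recombination fraction c."""
    return d0 * (1 - c) ** t

def generations_since_admixture(d0, dt, c):
    """Invert D_t = D_0 (1 - c)^t to solve for t."""
    return math.log(dt / d0) / math.log(1 - c)

# Illustrative numbers: D_0 = 0.1, loci roughly 1 cM apart (c ~ 0.01), 50 generations
dt = ld_after(0.1, 0.01, 50)
print(round(generations_since_admixture(0.1, dt, 0.01)))  # 50
```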

In the Ashkenazi Jewish population they found more linkage disequilibrium than in Europeans, as well as longer haplotypes. This could be the result of a population bottleneck, where drift could drive up the frequency of blocks of the genome, but, as they note in the paper, a bottleneck should probably also reduce heterozygosity. The natural inference then is that admixture between distinct populations can explain both data points.

But let’s cut to the chase. What genes exhibit signatures of natural selection in Ashkenazi Jews? More precisely, what distinctive regions of the genome exhibit signatures of natural selection? They used the standard haplotype-based methods. Basically you’re looking for regions of the genome where there are long blocks of correlated alleles, signs of a selective sweep in which a favored variant dragged along flanking genomic regions as it rose in frequency more rapidly than recombination could break apart the associations. Because recombination does break up associations over time, the selective sweeps need to be relatively recent to be detected with these methods. Since the Jewish people, and Ashkenazi Jews more particularly, are relatively recent historically, timing shouldn’t be an issue for Jewish-specific sweeps. But another factor is that the two primary tests they used, EHH and iHS, are not good at picking up sweeps which are just starting. EHH is geared toward sweeps which are almost complete, where the frequency of the selected allele is near 100%. iHS is better at mid-range values. Using a combination of these two techniques they found that six genes which are implicated in diseases characteristic of Ashkenazi Jews have the hallmarks of natural selection. Natural selection is self-evident, so what seems to have been going on here is that the disease was simply a side effect or byproduct of adaptation.
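As a rough illustration of the EHH idea, the sketch below computes, for chromosomes carrying a given core allele, the probability that two of them drawn at random are identical from the core out to a given distance. It looks in one direction only, uses hypothetical toy haplotypes, and omits the standardization and integration steps that iHS adds:

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, core_allele, distance):
    """Extended haplotype homozygosity, extending in one direction only.

    Probability that two chromosomes drawn at random from those carrying `core_allele`
    at `core_index` are identical from the core out to `core_index + distance`.
    Haplotypes are strings of '0'/'1', one character per SNP.
    """
    carriers = [h for h in haplotypes if h[core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two carrier chromosomes")
    segments = Counter(h[core_index:core_index + distance + 1] for h in carriers)
    return sum(comb(k, 2) for k in segments.values()) / comb(n, 2)

# Hypothetical toy haplotypes; SNP 0 is the core, "1" plays the role of a swept allele
haps = ["1000", "1000", "1000", "1011", "0110", "0101", "0011"]
for d in range(4):
    print(d, ehh(haps, 0, "1", d), round(ehh(haps, 0, "0", d), 2))
```

In the toy data EHH stays high around the “1” allele while it collapses quickly around “0”, the contrast these tests look for.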

The strongest signal they found was in ALDH2. The strongest signal in Europeans, LCT, was not found in Ashkenazi Jews. But is LCT a strong signal in Europeans? Many Southern European populations have low frequencies of the derived LCT allele, indicating that they haven’t been subject to strong selection for lactase persistence. These are the same populations genetically close to the Ashkenazi Jews. The authors suggest that the Jewish-European admixture occurred before the sweep of the derived LCT allele, but it seems more plausible that the Ashkenazim simply admixed with a European population, such as Italians, which does not exhibit much lactase persistence. As for ALDH2, the association between genetic variation at this locus and alcoholism is well known, and has been used to explain the low Jewish rates of the disease. In this case, the authors posit that protection from alcoholism is a positive side effect of natural selection:

The mechanism driving selection of the ALDH2 locus is unknown, but a plausible target of selection also within this selected region is the TRAFD1/FLN29 gene, which is a negative regulator of the innate immune system, important for controlling the response to bacterial and viral infection (49). TRAFD1/FLN29 may have conferred a selective advantage in the immune response to a pathogen, perhaps near the time that the Jews returned to Israel from their Babylonian captivity. Despite the unclear selective mechanism, this remains a remarkable example of a putatively selected region accounting for a known population phenotype.

Many of the other loci naturally did not show signatures of natural selection. But this sort of work is exploratory, and there are limits to the power of their techniques. As it is, it seems that we’re very far along on understanding the phylogenetic tree of the Jewish people, and we’re finally getting a grip on the exogenous parameters which might prune the branches.

Citation: Steven M. Bray, Jennifer G. Mulle, Anne F. Dodd, Ann E. Pulver, Stephen Wooding, & Stephen T. Warren (2010). Signatures of founder effects, admixture, and selection in the Ashkenazi Jewish population. PNAS. DOI: 10.1073/pnas.1004381107

Related: John Hawks, New data on Ashkenazi population history.

Image Credit: Wikimedia


How we perceive nature and describe its shape is a matter of values and preferences. Nature does not take notice of our distinctions; they exist only as instruments which aid in our comprehension. I’ve brought this up in relation to issues such as categorization of recessive vs. dominant traits. The offspring of people of Sub-Saharan African and non-African ancestry where the non-African parent has straight or wavy hair tend to have very curly hair. Therefore, one may say that the tightly curled hair form is dominant to straight or wavy hair. But it is also the case that the offspring show some modification relative to the African parent, so the dominance is not complete. When examining the morphology of the follicle, which determines the extent of the hair’s curl, the offspring may in fact exhibit some differences from both parents. In other words, our perception of the outcomes of inheritance is contingent to some extent on our categorization of the traits as well as our specific focus along the developmental pathway.

Or consider the division between “traits” and “diseases.” The quotations are necessary. Lactose intolerance is probably one of the best cases to illustrate the gnarly normative obstructions which warp our perceptions. As a point of fact lactose intolerance is the ancestral human state, and numerically predominant. It is the “wild type.” Lactose tolerance is a relatively recent adaptation, found among a variety of West Eurasian and African populations. A more politically correct term, lactase persistence, probably better encapsulates the evolutionary history of the trait, which has shifted from the class of disease to that of genetic trait when we evaluate the bigger picture (obviously diseases are simply “bad” traits).

Sometimes, though, the issues are more cut & dried. No one would doubt that sickle-cell anemia is a disease. It has a major fitness impact in a colloquial sense, as well as evolutionarily. It kills you, and it kills your potential genetic lineage. But it is also a byproduct of adaptation to endemic malaria. Sickle-cell disease is one of the classical illustrations of heterozygote advantage, whereby those who carry one copy of the mutation have increased fitness vis-a-vis those who carry two normal copies of the gene. The increase in frequency of the mutant allele, though, is balanced by the fact that mutant homozygotes have decreased fitness.
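Heterozygote advantage has a textbook equilibrium. With fitnesses 1 - s for the normal homozygote, 1 for the carrier and 1 - t for the sickle homozygote, the sickle allele settles at s/(s + t), and mean fitness at that balance sits s·t/(s + t) below the carrier’s. A Python sketch with hypothetical values of s and t, not estimates for any real population:

```python
def next_sickle_freq(q, s, t):
    """One generation of selection on the sickle allele S (frequency q).

    Fitnesses: AA = 1 - s (malaria-susceptible), AS = 1 (protected), SS = 1 - t (sickle-cell disease).
    """
    p = 1 - q
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness this generation
    return (p * q + q * q * (1 - t)) / w_bar

s, t = 0.10, 0.80  # hypothetical selection coefficients, chosen for illustration
q = 0.01           # start with the mutant allele rare
for _ in range(2000):
    q = next_sickle_freq(q, s, t)

q_star = s / (s + t)    # analytic equilibrium frequency of S
load = s * t / (s + t)  # shortfall of mean fitness (1 - w_bar) at equilibrium
print(round(q, 4), round(q_star, 4), round(load, 4))
```

The simulated frequency converges on the analytic value, and the nonzero load is the standing fitness cost of protecting heterozygotes this way.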

We can then construct a narrative of the long-term evolutionary dynamics from this initial condition. When a new exogenous stress hits a population, mean fitness drops immediately (take a look at the biographies of the Popes, and observe how many died of malaria in the Dark Ages when that disease was new to Italy). Natural selection quickly increases the frequency of any alleles which confer protection against the exogenous stress. But, baked into the cake of how genetics in complex organisms usually works, one allele may often have multiple downstream consequences. This is pleiotropy. It means that even if a change at a locus increases aggregate fitness, it may nevertheless destabilize long-established biochemical pathways. In the short term evolution simply takes the net fitness impact into account. Over the long term one assumes that “better solutions” will emerge which do not have so high a fitness drag, perhaps through the evolution of modifier genes which mask the deleterious outcomes of the initial mutant. This sort of ad hoc trial and error and “duct-taping” of kludges is part and parcel of how adaptation works in situations where shocks out of equilibrium states are common.

In many cases the byproducts of a genetic change may be benign. To my knowledge, no major negative consequences of carrying the alleles which confer lactase persistence are known (excepting some studies indicating higher obesity, but this seems a marginal fitness impact which has in all likelihood only come to the fore in the past century). In other cases the outcomes may not be as serious as sickle-cell anemia, but may still rise to a level of significance where one must note the existence of a disease which is a secondary consequence of adaptation to a new challenge.

Yesterday I pointed to a paper which illustrates just this phenomenon, Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans:

African-Americans have higher rates of kidney disease than European-Americans. Here, we show that in African-Americans, focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD) are associated with two independent sequence variants in the APOL1 gene on chromosome 22 {FSGS odds ratio = 10.5 [95% confidence interval (CI) 6.0 to 18.4]; H-ESKD odds ratio = 7.3 (95% CI 5.6 to 9.5)}. The two APOL1 variants are common in African chromosomes but absent from European chromosomes, and both reside within haplotypes that harbor signatures of positive selection. Apolipoprotein L-1 (ApoL1) is a serum factor that lyses trypanosomes. In vitro assays revealed that only the kidney disease-associated ApoL1 variants lysed Trypanosoma brucei rhodesiense. We speculate that evolution of a critical survival factor in Africa may have contributed to the high rates of renal disease in African-Americans.

In its implementation the paper has a lot of moving parts, but the outcome is straightforward. If you haven’t, you might read Genomes Unzipped and its post How to read a genome-wide association study. This is a case where the original association studies were not reporting false results, but it seems that one had to take a further step to really understand the likely molecular genetic and evolutionary underpinnings of what was going on. These results suggest that the original signals of association for variants within the MYH9 gene were actually signals from within APOL1, which happens to be next to MYH9. The region around MYH9 had already shown up in tests to detect natural selection through patterns of linkage disequilibrium (non-random associations of alleles at different loci within the genome; in this case the relevant consideration is adjacent loci across continuous regions of the genome which come together to form haplotype blocks). Since the footprint of natural selection on the genome is often wide, that did not imply that MYH9 was the target of natural selection per se, leaving open the likely possibility of other causal associations. This was convenient in light of the difficulty of establishing a plausible functional relationship between renal failure and MYH9.

To explore the possibility of nearby functional candidates the researchers focused on a number of alleles within this genomic region which exhibited maximal European-African frequency differences in the 1000 Genomes Project. Once they had ascertained the between-population differences, they looked at differences in allele frequencies between cases and controls within the African American population for the two diseases in question (those with the trait/disease vs. those without). Table 1 has the top line raw results:


WT = “Wild Type,” the ancestral allelic variant found in most populations. G1 and G2 are two haplotypes, sets of associated alleles across the APOL1 locus. G1 consists of the two derived non-synonymous coding variants rs73885319 (S342G) and rs60910145 (I384M) within an exonic region of APOL1. Non-synonymous simply means that a change at that base pair alters the amino acid coded, and exons are the genomic regions whose information is eventually translated into proteins. In other words, these are non-neutral, functionally significant genomic regions which do something. G2 is a 6 base pair deletion, rs71785313, close to G1 in APOL1.

To more formally model the relationship between the alleles which were found to differ between cases and controls, they performed a logistic regression. The alleles serve as independent variables which can predict the probable outcome of the dependent variable, in this case the probability of FSGS or H-ESKD (renal failure). Figure 1 to the left has a summary of some of the results of the regression in graphical form for FSGS. I’ve rotated it so it can fit on the screen. Basically the strong signals are to the right of the chart (from your perspective). The y-axis (horizontal from your perspective) displays the negative log of the p-value for a signal at a particular marker, which is defined by the x-axis (vertical for you). The labels show the particular gene at that genomic position. The smaller the p-value, the less likely it is that the signal is due to chance. This produces huge spikes in the negative-log values (in the body of the paper they present p-values on the order of 10^-35).

You can see that the biggest signals reside in APOL1. The first panel, A, throws all the SNPs into the mix. On MYH9 they highlight a few SNPs which combine to form the E-1 haplotype, which is strongly associated with cases (this is where the association between disease and genetic variants on MYH9 is coming from). This haplotype is found in conjunction with G1 and G2 on APOL1: E-1 is present in 89% of haplotypes carrying G1 and in 76% of haplotypes carrying G2. A classic illustration of likely correlation but not causation. The second panel controls for the effect of G1. In other words, it shows you the variation in the dependent variable that remains after you take the largest independent variable, G1, into account. The G2 haplotype is the largest effect independent variable after G1 is taken into account; it explains most of the residual variation in FSGS probability. Finally, the last panel controls for both G1 and G2. As you can see there aren’t any major signals left; the distribution is relatively flat. Logically, once you account for the variables which produce change in an outcome you shouldn’t see any impact of other variables. And that’s what happens here. They also performed controls where MYH9 was held constant, and that does not eliminate the signals in APOL1. The MYH9 signal is conditional on its correlation with APOL1; this was the correlation which showed up in the original association studies. The exact same pattern of signals within the logistic regression model was replicated for H-ESKD. G1 had the strongest signal, then G2. The markers within MYH9 were not significant once one controlled for the variants in G1 and G2.
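The conditioning logic can be reproduced on simulated data. In the Python sketch below, disease risk depends only on a hypothetical causal variant, while a nearby marker merely travels with it. Fit alone, the marker looks associated; fit alongside the causal variant, its coefficient collapses toward zero. The simulation, allele frequencies, and effect sizes are assumptions for illustration, and the model is far simpler than the one in the paper:

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson (IRLS). X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))  # predicted probability of disease
        w = mu * (1.0 - mu)                      # IRLS weights
        grad = X.T @ (y - mu)
        hess = X.T @ (X * w[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 50000
# Hypothetical causal variant (G1-like) on each of an individual's two chromosomes
causal = rng.random((n, 2)) < 0.4
# Hypothetical linked marker (E-1-like): usually rides with the causal variant, no effect of its own
marker = np.where(causal, rng.random((n, 2)) < 0.9, rng.random((n, 2)) < 0.1)
g = causal.sum(axis=1)  # copies of the causal variant (0, 1, 2)
e = marker.sum(axis=1)  # copies of the marker (0, 1, 2)
# Disease risk depends on the causal variant only
p_disease = 1.0 / (1.0 + np.exp(-(-2.0 + 1.0 * g)))
y = (rng.random(n) < p_disease).astype(float)

ones = np.ones(n)
marginal = fit_logistic(np.column_stack([ones, e]), y)
joint = fit_logistic(np.column_stack([ones, g, e]), y)
print("marker alone:", marginal[1].round(2))
print("marker given causal variant:", joint[2].round(2), "causal:", joint[1].round(2))
```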

It is important to remember, though, that these markers are segregating within a human population where individuals have three potential genotypes: ancestral homozygote, heterozygote, and homozygote for the derived (risk) allele. They found that a recessive model of disease expression is most appropriate for these risk alleles. That is, most of the increased risk comes from the change from one risk allele (the heterozygote state) to two risk alleles (the homozygote state). One risk allele increased the odds of renal failure by a factor of 1.26, but two increased them by a factor of 7.3; the odds ratio for two risk alleles compared to one was 5.8. They report that the results for FSGS were broadly similar. This matters because, if a disease has a recessive expression pattern, its frequency in a random-mating population depends on the frequency of homozygotes. G1 had a frequency of 40% in the HapMap Yoruba sample, but was absent from both Eurasian groups, Europeans and East Asians. G2 was found in three Yoruba, but in neither Eurasian group. Assuming Hardy-Weinberg equilibrium, 16% (0.4 squared) of the Yoruba should be G1 homozygotes and therefore at sharply elevated risk for FSGS and H-ESKD.
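The Hardy-Weinberg arithmetic and the odds-ratio comparison above are easy to check. A minimal sketch, assuming Hardy-Weinberg proportions and assuming that both quoted odds ratios share the same reference group (zero risk alleles); the function name is illustrative:

```python
def hardy_weinberg(q):
    """Genotype frequencies at a biallelic locus with risk-allele frequency q."""
    p = 1.0 - q
    return {
        "0 risk alleles": p * p,
        "1 risk allele": 2 * p * q,
        "2 risk alleles": q * q,
    }

# G1 frequency in the HapMap Yoruba sample, as quoted in the text
freqs = hardy_weinberg(0.40)
for genotype, f in freqs.items():
    print(f"{genotype}: {f:.2f}")

# Quoted odds ratios, assumed to be relative to zero risk alleles
or_one, or_two = 1.26, 7.3
# Dividing them gives the odds ratio of two risk alleles versus one
print(round(or_two / or_one, 1))
```

Under a recessive model only the "2 risk alleles" class carries the large excess risk, which is why the 16% homozygote figure, rather than the 64% of people carrying at least one copy, is the relevant one.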

Once they had established which markers seem to be implicated in this phenotypic variation, they wanted to focus on how the frequencies of those markers came to be. Specifically, G1 and G2 seem to be derived haplotypes which arose out of the ancestral background. In plain English, 20,000 years ago Africans should have looked like all non-Africans genomically, at least on the functionally relevant segments, but within the last 10,000 years it looks like new variants rose in frequency, driven by natural selection in response to new environmental stresses. The region has already been broadly surveyed by linkage-disequilibrium-based tests, which look for long haplotypes: homogenized zones of the genome where variation has been stripped out across many individuals because one variant rose so rapidly in frequency that large adjacent sections hitchhiked up along with it. Presumably this is what happened with the MYH9 haplotype correlated with the traits under consideration here: G1 and G2 dragged up the E-1 haplotype as a secondary consequence of their own rise to prominence among some Sub-Saharan African populations.
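How quickly can a new variant climb to tens of percent? A minimal deterministic sketch of single-locus selection (no drift, no migration) gives a feel for it. The selection coefficient s, the dominance h, the starting frequency, and the 25-year generation time are all assumptions chosen for illustration, not estimates from either paper.

```python
def generations_to_reach(q0, target, s, h, max_gens=200_000):
    """Deterministic single-locus selection on a derived allele.

    Relative fitnesses (hypothetical): ancestral homozygote 1,
    heterozygote 1 + h*s, derived homozygote 1 + s.
    Returns the first generation at which the derived allele frequency
    reaches `target`, or None if it never does within max_gens.
    """
    q = q0
    w_het, w_hom = 1.0 + h * s, 1.0 + s
    for gen in range(1, max_gens + 1):
        p = 1.0 - q
        w_bar = p * p + 2 * p * q * w_het + q * q * w_hom  # mean fitness
        q = (q * q * w_hom + p * q * w_het) / w_bar         # standard recursion
        if q >= target:
            return gen
    return None

GENERATION_YEARS = 25  # assumption; 10,000 years is then about 400 generations

additive = generations_to_reach(0.001, 0.40, s=0.05, h=0.5)
recessive = generations_to_reach(0.001, 0.40, s=0.05, h=0.0)
print(additive, additive * GENERATION_YEARS)
print(recessive, recessive * GENERATION_YEARS)
```

With these illustrative values the additive case (h = 0.5) goes from 0.1% to 40% within roughly 400 generations, while a fully recessive advantage (h = 0) starting from the same low frequency needs on the order of 20,000 generations, because a rare recessive allele is almost never exposed to selection in homozygous form.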

So next the authors turned to tried-and-tested techniques, focusing on the risk markers they had discovered earlier in their research, G1 and G2. Specifically: extended haplotype homozygosity (EHH), which is best at detecting selection where sweeps have nearly completed (e.g., the derived variant is at frequency 0.95 within the population); the integrated haplotype score (iHS), which is best at detecting sweeps which have not completed (e.g., the derived variant is at frequency 0.6); and ΔiHH, which I am less familiar with but which is reputedly similar to iHS, except that it uses absolute rather than relative haplotype length. Figure 2 shows the results of these tests:


The resolution isn’t the best, but G1 and G2 appear to be outliers on all three linkage-disequilibrium-based tests for natural selection. The first panel is EHH; the second and third show iHS and ΔiHH respectively, with the markers falling in the tails of the genome-wide distribution of values in the Yoruba. This is not proof of adaptation, but it shifts the balance of probabilities. Additionally, they note that Europeans exhibit no such patterns at these markers; in the latter two panels the European values would sit closer to the mode of the distribution.
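To make these haplotype-based statistics less of a black box, here is a minimal sketch of EHH and of an unstandardized iHS computed as ln(iHH_ancestral / iHH_derived), where iHH is EHH integrated over distance on both sides of the core site. The toy haplotypes and positions are hypothetical, the 0.05 EHH cutoff follows common practice, and real implementations also standardize iHS within derived-allele frequency bins, handle missing data, and may integrate over genetic map distance rather than physical distance.

```python
import math
from collections import Counter

def ehh(haplotypes, core, other):
    """EHH between sites `core` and `other` (indices into haplotype strings).

    Probability that two haplotypes drawn at random without replacement
    are identical at every site from core to other inclusive.
    """
    lo, hi = min(core, other), max(core, other)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    n = len(haplotypes)
    return sum(k * (k - 1) / 2 for k in counts.values()) / (n * (n - 1) / 2)

def ihh(haplotypes, core, positions, cutoff=0.05):
    """Integrate EHH over distance on both sides of the core (trapezoid rule),
    stopping on each side once EHH drops below `cutoff` or the data end."""
    total = 0.0
    for step in (-1, 1):
        prev_ehh, idx = 1.0, core  # EHH is 1 at the core by construction
        while 0 <= idx + step < len(positions):
            nxt = idx + step
            cur = ehh(haplotypes, core, nxt)
            total += abs(positions[nxt] - positions[idx]) * (prev_ehh + cur) / 2
            if cur < cutoff:
                break
            prev_ehh, idx = cur, nxt
    return total

def unstandardized_ihs(haplotypes, core, positions, ancestral="0", derived="1"):
    """ln(iHH_ancestral / iHH_derived); negative when derived haplotypes are long."""
    anc = [h for h in haplotypes if h[core] == ancestral]
    der = [h for h in haplotypes if h[core] == derived]
    return math.log(ihh(anc, core, positions) / ihh(der, core, positions))

# Hypothetical toy data: five sites, core site at index 2
positions = [0.0, 1.0, 2.0, 3.0, 4.0]  # e.g. kb
haps = ["00100"] * 4 + ["10001", "01010", "00011", "11000"]
print(unstandardized_ihs(haps, 2, positions))
```

In the toy data the derived haplotypes are identical across the whole window, so their iHH is larger than the ancestral one and the score comes out negative, the direction that flags a derived allele sitting on unusually long haplotypes.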

To review, first they established a robust association, one that survives conditioning on MYH9, between a particular set of markers (the G1 and G2 haplotypes) and the traits of interest. Second, they showed that those markers bear the hallmarks of genomic regions subject to natural selection. We know that focal segmental glomerulosclerosis (FSGS) and hypertension-attributed end-stage kidney disease (H-ESKD), the traits whose relationship to the G1 and G2 haplotypes now seems confirmed, are unlikely to be targets of positive natural selection themselves. To get a better sense of what might be, we need to look at Apol1, the protein product of APOL1, and what it does. At this point I’ll quote the paper:

ApoL1 is the trypanolytic factor of human serum that confers resistance to the Trypanosoma brucei brucei (T. brucei brucei) parasite…T. brucei brucei has evolved into two additional subspecies, Trypanosoma brucei rhodesiense and Trypanosoma brucei gambiense, which have both acquired the ability to infect humans…T. brucei rhodesiense is predominantly found in Eastern and Southeastern Africa, while T. brucei gambiense is typically found in Western Africa, though some overlap exists…Since these parasites exist only in sub-Saharan Africa, we hypothesized that the APOL1 gene may have undergone natural selective pressure to counteract these trypanosoma adaptations. As an initial test of this hypothesis, we performed in vitro assays to compare the trypanolytic potential of the variant, disease-associated forms of ApoL1 proteins with that of the “wild-type” form of ApoL1 protein that is not associated with renal disease.

We’re talking about sleeping sickness. Here’s a description:

It starts with a headache, joint pains and fever. It is the kind you would expect to get over quickly. But after a while, things get worse. You fall asleep most of the time, are confused and get intense pains and convulsions.

If you do not get treatment, your body begins to waste away. Eventually, you slip into coma and die. This is human African trypanosomiasis, better known as sleeping sickness. If untreated, it kills 100% of its victims in a very short time.

Cheery. I think we have a plausible reason for natural selection to kick into overdrive! Or more specifically, we have a plausible external selection pressure which will drive fitness differentials that correlate with genetic variation. An increased probability of kidney disease seems preferable to this. At the molecular level, it looks like the serum resistance-associated protein (SRA) produced by T. brucei rhodesiense binds to a specific region of Apol1, and that the G1 and G2 mutations alter exactly that region. So these mutants may block the ability of T. brucei rhodesiense to switch off the body’s defenses against trypanosomes.

To test this they examined the in vitro lytic potential of serum from individuals carrying the G1 and G2 haplotypes against three subspecies of Trypanosoma brucei: T. brucei brucei, which normal Apol1 can lyse, and T. brucei rhodesiense and T. brucei gambiense, which can infect humans (endemic to eastern and western Africa respectively, though the former extends into West Africa as well).

- All 75 serum samples lysed T. brucei brucei

- None lysed T. brucei gambiense

- 46 samples lysed SRA-positive T. brucei rhodesiense, and all 46 came from individuals carrying G1 or G2

- G2 seemed more potent than G1 against SRA-positive T. brucei rhodesiense, but not against SRA-negative T. brucei rhodesiense, where G1 seemed as potent

- Recombinant Apol1 carrying only one of the two SNPs of the G1 haplotype was less effective against T. brucei rhodesiense than recombinant Apol1 carrying both (the full G1 haplotype)

- Recombinant Apol1 carrying both G1 and G2 was no more effective against T. brucei rhodesiense than recombinant Apol1 carrying G2 alone

- Recombinant Apol1 carrying G1 alone was more potent against SRA-negative T. brucei rhodesiense than recombinant Apol1 carrying G2 alone

- G2 was necessary and sufficient to block SRA binding to Apol1 and allow lysis of T. brucei rhodesiense. G1 did not block SRA binding to Apol1 but was still sufficient to lyse T. brucei rhodesiense, though it was far less potent than G2 against SRA-positive T. brucei rhodesiense

It seems that the G1 and G2 haplotypes use different mechanisms to enable the lysis of invading parasites, and so prevent the development of sleeping sickness. Their means differ, but the ends are the same. The authors note that even minimal amounts of serum from G2 individuals seem potent enough to block the binding of SRA to Apol1 and so enable lysis. Introducing such serum into the bloodstream of individuals who lack resistance might then be a highly effective preventive treatment against sleeping sickness. They note that they did not explore in detail the mechanism by which the G1 and G2 variants result in susceptibility to kidney failure, but that’s presumably for the future.

Finally, the second-to-last paragraph, where they bring it all together:

It will be interesting to determine the distribution of these mutations throughout sub-Saharan Africa. In present-day Africa, T. brucei rhodesiense is found in the Eastern part of the continent, while we noted high frequency of the trypanolytic variants and the signal of positive selection in a West African population. Changes in trypanosome biology and distribution and/or human migration may explain this discrepancy, or resistance to T. brucei rhodesiense could have favored the spreading of T. brucei gambiense in West Africa. Alternatively, ApoL1 variants may provide immunity to a broader array of pathogens beyond just T. brucei rhodesiense, as a recent report linking ApoL1 with anti-Leishmania activity may suggest…Thus, resistance to T. brucei rhodesiense may not be the only factor causing these variants to be selected.

This is a very long review already. But, while I have your attention, I think I need to point to another paper on the same topic which has a slightly different twist. I won’t dig into the details with the same thoroughness as above, but rather I’ll highlight the value-add of this group’s contribution. It’s an Open Access paper, unlike the one above, so you can review it in depth yourself. Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene:

MYH9 has been proposed as a major genetic risk locus for a spectrum of nondiabetic end stage kidney disease (ESKD). We use recently released sequences from the 1000 Genomes Project to identify two western African-specific missense mutations (S342G and I384M) in the neighboring APOL1 gene, and demonstrate that these are more strongly associated with ESKD than previously reported MYH9 variants. The APOL1 gene product, apolipoprotein L-1, has been studied for its roles in trypanosomal lysis, autophagic cell death, lipid metabolism, as well as vascular and other biological activities. We also show that the distribution of these newly identified APOL1 risk variants in African populations is consistent with the pattern of African ancestry ESKD risk previously attributed to MYH9. Mapping by admixture linkage disequilibrium (MALD) localized an interval on chromosome 22, in a region that includes the MYH9 gene, which was shown to contain African ancestry risk variants associated with certain forms of ESKD…MYH9 encodes nonmuscle myosin heavy chain IIa, a major cytoskeletal nanomotor protein expressed in many cell types, including podocyte cells of the renal glomerulus. Moreover, 39 different coding region mutations in MYH9 have been identified in patients with a group of rare syndromes, collectively termed the Giant Platelet Syndromes, with clear autosomal dominant inheritance, and various clinical manifestations, sometimes also including glomerular pathology and chronic kidney disease…Accordingly, MYH9 was further explored in these studies as the leading candidate gene responsible for the MALD signal. 
Dense mapping of MYH9 identified individual single nucleotide polymorphisms (SNPs) and sets of such SNPs grouped as haplotypes that were found to be highly associated with a large and important group of ESKD risk phenotypes, which as a consequence were designated as MYH9-associated nephropathies…These included HIV-associated nephropathy (HIVAN), primary nonmonogenic forms of focal segmental glomerulosclerosis, and hypertension affiliated chronic kidney disease not attributed to other etiologies…The MYH9 SNP and haplotype associations observed with these forms of ESKD yielded the largest odds ratios (OR) reported to date for the association of common variants with common disease risk…Two specific MYH9 variants (rs5750250 of S-haplotype and rs11912763 of F-haplotype) were designated as most strongly predictive on the basis of Receiver Operating Characteristic analysis…These MYH9 association studies were then also extended to earlier stage and related kidney disease phenotypes and to population groups with varying degrees of recent African ancestry admixture…and led to the expectation of finding a functional African ancestry causative variant within MYH9. However, despite intensive efforts including re-sequencing of the MYH9 gene no suggested functional mutation has been identified…This led us to re-examine the interval surrounding MYH9 and to the detection of novel missense mutations with predicted functional effects in the neighboring APOL1 gene, which are significantly more associated with ESKD than all previously reported SNPs in MYH9.

Table 1 has the top-line results. Focus on the first two rows: these are the two SNPs which combine to form the “G1” haplotype in the earlier study.


Here’s a difference between the previous paper and this one: the table above uses cases and controls from African Americans and Hispanic Americans. The original paper from which the genomic data for this sample are drawn gives the average African, European, and Native American ancestry of the two groups as follows (rounded):

African American – 85% African, 10% European, 5% Native American
Hispanic American – 30% African, 55% European, 15% Native American

The Hispanic American sample here is mostly Puerto Rican and Dominican, which explains its greater African than Native American ancestry. Nevertheless, it provides a sufficiently different genetic background against which to test the effects of the same markers. They confirmed, in the Hispanic cohort, the association of the large-effect markers found in African Americans. The risk allele frequency is 21% in the African American controls vs. 37% in the cases; for Hispanic Americans the figures are 6% and 23%.
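Two quick checks on these figures can be sketched in a few lines. The first turns the quoted case and control frequencies into allelic odds ratios; since the inputs are the rounded percentages above, the results need not match the paper's own table exactly. The second assumes, as a simplification, that the risk allele is absent from the European and Native American source populations, so that control frequency scales with African ancestry; the function name is illustrative.

```python
def allelic_odds_ratio(freq_cases, freq_controls):
    """Allelic odds ratio from risk-allele frequencies in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls

print(round(allelic_odds_ratio(0.37, 0.21), 2))  # African Americans
print(round(allelic_odds_ratio(0.23, 0.06), 2))  # Hispanic Americans

# Assumption: risk allele absent in European and Native American ancestry,
# so control frequency ~ (African ancestry fraction) * (African source frequency)
african_source_freq = 0.21 / 0.85           # implied by African American controls
expected_hispanic = 0.30 * african_source_freq
print(round(expected_hispanic, 3))
```

This gives allelic odds ratios of roughly 2.2 and 4.7, and an expected Hispanic American control frequency of about 7%, reasonably close to the 6% observed, which is what you would expect if the risk allele entered that population through its African ancestry.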

OK, now to the most interesting point in this short paper:

HIVAN has been considered as the most prominent of the nondiabetic forms of kidney disease within what has been termed the MYH9-associated nephropathies…We have reported absence of HIVAN in HIV infected Ethiopians, and attributed this to host genomic factors (Behar et al. 2006). Therefore, we examined the allele frequencies of the APOL1 missense mutations in a sample set of 676 individuals from 12 African populations, including 304 individuals from four Ethiopian populations…We coupled this with the corresponding distributions for the African ancestry leading MYH9 S-1 and F-1 risk alleles. A pattern of reduced frequency of the APOL1 missense mutations and also of the MYH9 risk variants was noted in northeastern African in contrast to most central, western, and southern African populations examined…Especially striking was the complete absence of the APOL1 missense mutations in Ethiopia. This combination of the reported lack of HIVAN and observed absence of the APOL1 missense mutations is consistent with APOL1 being the functionally relevant gene for HIVAN risk and likely the other forms of kidney disease previously associated with MYH9.

Bingo. The previous paper focused on African Americans (along with the HapMap Yoruba). But the pattern of variation within Africa is interesting as well. Ethiopians are not quite like other Africans, having a great deal of admixture with populations from Arabia (many of the languages of highland Ethiopia are Semitic), but the majority of their ancestry remains similar to that of other Sub-Saharan Africans. In contrast, the ecology of Ethiopia differs a great deal from the rest of Sub-Saharan Africa because of its elevation and the cooler climate that comes with it. In Addis Ababa the mean monthly low is around 10 °C (50 °F) and the mean high 20-25 °C (high 60s to mid 70s °F). There isn’t much variation from month to month because of the low latitude, but the high elevation keeps temperatures moderate. Different environments result in different selection pressures, and Ethiopia’s environment is unusual within Africa. The tsetse fly, which serves as the vector for trypanosomes, does not seem to be present in the Ethiopian highlands. The map above shows the distribution within Africa of one of the markers which define the G1 haplotype in the previous paper. Note that the modal frequency is in the west of Africa, and the frequency drops off to the east (though the geographic coverage leaves a bit to be desired if you look at the raw data behind the map, which has huge discontinuities that the map’s smoothing glosses over).

One of the points I want to reemphasize from the tests of natural selection in the first paper is that these genetic adaptations are likely to be new; otherwise recombination would have broken up the long haplotypes and reduced linkage disequilibrium. New as in the last 10,000 years. It is interesting that a particular subspecies of trypanosome which is immune to these genetic adaptations is endemic to West Africa. We may be seeing evolution in action here, or at least the arms race between man and pathogen, where man is always one step behind. In contrast, the subspecies which is effectively neutralized by the genetic adaptations reviewed here is present in higher numbers precisely in the regions where the resistance mutations are found at lower frequencies. Perhaps there are different resistance mutations in these regions of Africa, not yet properly identified. Or perhaps we’re seeing humans in this region at an earlier stage of the dance, so to speak.
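The argument that long haplotypes imply a recent origin rests on simple arithmetic: each generation, a crossover between two sites occurs with probability r, so the chance that a haplotype stays intact decays geometrically. A minimal sketch, assuming r = 0.01 (roughly 1 cM) and a 25-year generation time, and treating every crossover as breaking the haplotype, which overstates the decay once the haplotype is common (a crossover between two copies of the same haplotype changes nothing):

```python
def fraction_intact(r, generations):
    """Probability that no crossover has occurred between two sites
    separated by recombination fraction r over the given generations."""
    return (1.0 - r) ** generations

GENERATION_YEARS = 25  # assumption
R = 0.01               # assumption: roughly 1 cM between the sites

for years in (2_500, 10_000, 50_000):
    t = years // GENERATION_YEARS
    print(years, t, round(fraction_intact(R, t), 3))
```

Under these assumptions about a third of haplotypes remain unbroken across 1 cM after 2,500 years, but fewer than 2% after 10,000 years, so very long, little-broken haplotypes point to a young allele.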

Citation: Giulio Genovese, David J. Friedman, Michael D. Ross, Laurence Lecordier, Pierrick Uzureau, Barry I. Freedman, Donald W. Bowden, Carl D. Langefeld, Taras K. Oleksyk, Andrea Uscinski Knob, Andrea J. Bernhardy, Pamela J. Hicks, George W. Nelson, Benoit Vanhollebeke, Cheryl A. Winkler, Jeffrey B. Kopp, Etienne Pays, & Martin R. Pollak (2010). Association of Trypanolytic ApoL1 Variants with Kidney Disease in African-Americans. Science. DOI: 10.1126/science.1193032

Citation: Tzur S, Rosset S, Shemer R, Yudkovsky G, Selig S, Tarekegn A, Bekele E, Bradman N, Wasser WG, Behar DM, & Skorecki K (2010). Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene. Human Genetics. PMID: 20635188

(Republished from Discover/GNXP by permission of author or representative)

A follow-up to the post below: see John Hawks, “Selection’s genome-wide effect on population differentiation,” and p-ter’s “Natural selection and recombination.” As I said, it’s a dense paper, and I didn’t touch on many issues.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"