The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Evolutionary Genetics

It’s Darwin Day. I’m a little ambivalent about the sort of cultishness that sometimes accrues to Charles Darwin, though it is probably a phenomenon that only makes sense in light of the culture war started by evolution-rejectionists. But there is reason to be optimistic on this front; according to the GSS (General Social Survey), young people tend to be more accepting of evolutionary theory. And I was intrigued by these Pew data which indicate that a substantial minority of religious people accept a naturalistic evolutionary model! (Muslims have a high fraction both of this element and of Creationists, indicating that they are a particularly heterogeneous group.)

I’m gratified that ~40 percent of my readers have read The Origin of Species. But I’m a bit concerned that only ~8 percent have read The Genetical Theory of Natural Selection (though that is slightly more than have read The Structure of Evolutionary Theory). Darwin is important, but for the stuff that is more the bread & butter of this weblog, I think R. A. Fisher is probably more relevant, that is, for an analytical and formal understanding of evolutionary process. With that in mind, I would recommend that if you can afford it, you get a hardcover The Genetical Theory of Natural Selection with the foreword by J. H. Bennett. Otherwise, read the original version online. Focus on the first half, and understand that Fisher was not God, so it’s how he conceived of a problem, not his particular solution, that’s useful (for example, it seems he was wrong on dominance).

• Category: Science • Tags: Evolutionary Genetics 

Very important paper in PLOS Biology just out, Natural Selection Constrains Neutral Diversity across A Wide Range of Species. Important enough that the journal commissioned this article: Lewontin’s Paradox Resolved? In Larger Populations, Stronger Selection Erases More Diversity. The paradox is pretty straightforward. Assuming the neutral theory of molecular evolution, you’d expect more genetic diversity in species with larger population sizes, because the larger the population, the longer it takes for mutations to transition from novelty to fixation. More formally, the expected time until fixation of a new neutral mutation, given that it fixes, is $\sim 4N_e$ generations, with $N_e$ being the effective population size. In small populations mutations emerge and fix rather quickly because the generation-to-generation volatility of drift is so powerful, which keeps total diversity down. In large populations mutations take a long time to traverse the frequency range from 0 to 100% because inter-generational random drift is weak. Yet observed genetic diversity varies far less across species than their population sizes do, and that is the paradox. It was a big deal because for the past 30 years or so the neutral (or nearly neutral) theory has been the implicit null model, and I’d argue broadly supported as such, albeit with strong dissents.
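To make the neutral expectation concrete, here is a minimal Python sketch, assuming a diploid population at mutation-drift equilibrium under the infinite-sites model. It computes expected per-site diversity, θ = 4N_eμ, and Kimura and Ohta’s mean time to fixation for a neutral allele that does eventually fix. The function names and the mutation rate of 1e-8 per site per generation are illustrative assumptions, not values from the paper.

```python
import math


def expected_diversity(n_e: float, mu: float) -> float:
    """Neutral per-site diversity, theta = 4 * Ne * mu (diploid, infinite sites)."""
    return 4 * n_e * mu


def conditional_fixation_time(n_e: float, p: float) -> float:
    """Mean generations for a neutral allele at starting frequency p to fix,
    given that it does fix (Kimura and Ohta's formula)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    # log1p keeps precision when p is as small as 1 / (2 Ne)
    return -4 * n_e * (1 - p) * math.log1p(-p) / p


if __name__ == "__main__":
    mu = 1e-8  # illustrative mutation rate per site per generation (assumption)
    for n_e in (1e4, 1e5, 1e6):
        p_new = 1 / (2 * n_e)  # frequency of a brand-new mutation
        print(f"Ne={n_e:.0e}  theta={expected_diversity(n_e, mu):.0e}  "
              f"fixation time ~{conditional_fixation_time(n_e, p_new):,.0f} generations")
```

Under strict neutrality diversity scales linearly with N_e, so a hundredfold range in N_e should produce a hundredfold range in diversity; real species do not show anything like that spread.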

The “controversies” that occurred from the 1970s onward about the role of selection and its enemies are somewhat notorious. Some of the figures are well known to the public. Richard Dawkins and Stephen Jay Gould both had cameos because of their differing views about the pervasiveness of adaptation in evolutionary process more generally. But the geneticists at the heart of the major disagreements are more obscure to the general public, though in the early 1990s the Sacramento Bee reported on the beef between John Gillespie and Motoo Kimura (Gillespie was based out of UC Davis, near Sacramento). From what I can tell, and from the people I know, it strikes me that genomics has now somewhat diminished the role of rhetoric in the debate, and at the same time fostered an abating of the extremism of some of the anti-selectionists. Leibniz’s stance of “let us calculate” has now become more important than a turn of phrase or an evocative metaphor. With data there is less of a role for posturing. Additionally, the fact is that many researchers did not follow mathematical theoretical proofs very closely or with genuine comprehension, so empirical results are really what is changing the terms of the debate. The Drosophila world has long been a redoubt for selectionism, but now you see papers such as Genome-wide signals of positive selection in human evolution, which argue for the importance of positive selection even in organisms with small effective population sizes, such as humans.

What the authors did in the above paper was leverage the fact that with genome-wide data they could test the theoretical propositions empirically. In particular, they looked at regions with reduced recombination,* which should be more strongly affected by selection at linked sites (whether selective sweeps, which allow for the hitchhiking of regions around the target of selection and generate long haplotypes, or background selection, which constrains genomic variation through selection against deleterious mutations). As the figure above shows, there is a correlation between the power of selection on the genome and inferred effective population size. I say inferred because they had to use species range and body size as proxies. Obviously this isn’t perfect, but I suspect that the noise in these proxy variables only weakens the correlation. The authors admit that there is a lot of work to be done, and that this is just a first step. Perhaps the results will change somewhat with a different selection of organisms (N = 40), but I’m moderately skeptical. Probably the most important line in the paper is “it seems clear that, in most cases, BGS [background selection] is a more appropriate null model for tests of natural selection than strict neutrality.”
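Background selection’s dependence on recombination can be sketched with the classic approximation (after Hudson and Kaplan 1995) summed over selected sites: B ≈ exp(−Σ u·t / (t + r(1 − t))²), where B is diversity relative to the neutral expectation, u the deleterious mutation rate at a selected site, t the heterozygous selection coefficient, and r the recombination fraction to that site. This is a hedged illustration, not the model fitted in the paper; the parameter values below are hypothetical, and only selected sites on one side of the neutral site are counted.

```python
import math


def background_selection_b(u: float, t: float, recomb_fractions: list[float]) -> float:
    """Approximate B = pi / pi_0 at a neutral site linked to sites under
    purifying selection.

    u: deleterious mutation rate per selected site (or element) per generation
    t: heterozygous selection coefficient against those mutations (s * h)
    recomb_fractions: recombination fraction between the neutral site and
        each selected site
    """
    exponent = sum(u * t / (t + r * (1 - t)) ** 2 for r in recomb_fractions)
    return math.exp(-exponent)


if __name__ == "__main__":
    u, t = 1e-5, 1e-3              # hypothetical values (assumptions)
    distances_kb = range(1, 201)   # 200 selected elements, one per kb
    for label, r_per_kb in (("low recombination", 1e-6), ("high recombination", 1e-5)):
        b = background_selection_b(u, t, [r_per_kb * d for d in distances_kb])
        print(f"{label}: B = {b:.2f}")
```

With these made-up numbers the low-recombination scenario retains markedly less diversity than the high-recombination one, which is the qualitative pattern the authors exploit.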

* Recombination shuffles the association of variants across the genome, and so separates their destiny, whether good (positive selection) or bad (negative selection).

• Category: Science • Tags: Evolutionary Genetics, Genetic Draft, Selection 

In the early 1970s the eminent evolutionary geneticist Richard C. Lewontin wrote that population genetics “was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining.” By this, Lewontin meant that when R. A. Fisher, Sewall Wright and J. B. S. Haldane established the theoretical foundations of the field in the 1930s, the techniques for uncovering the variation in natural populations needed to test their suppositions were rather thin (naturally, this resulted in many controversies, see The Origins of Theoretical Population Genetics). Geneticists were using classical methods, utilizing salient phenotypes as proxies for underlying genetic markers, and tracing patterns of co-inheritance between novel mutants and traits with known locations on the genetic map. Researchers were not even clear at that point about the biochemical structure of the particle of Mendelian inheritance, what we term DNA. That understanding arrived in the 1950s. But even in the early 1970s, when the above was written, we’re not talking about DNA sequencing. Rather, this is the allozyme era, which Lewontin helped usher in with a paper in 1966. He expresses the excitement of the times later in the passage:

Quite suddenly the situation has changed. The mother-lode has been tapped and facts in profusion have been poured into the hoppers of this theory machine. And from the other end has issued–nothing. It is not that the machine does not work, for a great clashing of gears is clearly audible, if not deafening, but it somehow cannot transform into a finished product the great volume of raw material that has been provided.

Despite the pessimism expressed above, the emergence of molecular evolution stimulated the debates around neutral theory. Over a generation ago evolutionary geneticists were grappling with a swell of data that confronted theoretical frameworks constructed in the early 20th century. Today we live in the “post-genomic” era, and now think in terms of whole genomes. The details may differ, but many of Lewontin’s observations in the 1970s still hold true, as novel results meet the paradigms of old. Last month in PNAS Brian Charlesworth published a paper which brought this to mind, Causes of natural variation in fitness: Evidence from studies of Drosophila populations. You may know Charlesworth as the coauthor of Elements of Evolutionary Genetics, an encyclopedia of a text which I highly recommend to all. In the paper, which is both a review for those of us not steeped in Drosophila genetics and a distillation of derivations to be found in the supplements, Charlesworth notes a discrepancy between the typical selection coefficients inferred for deleterious alleles from population genomics and those inferred from quantitative genetics. Population genomics is a new field; it involves sequencing many markers (often whole genomes) accurately across a reasonable number of individuals. Quantitative genetics is a more classical framework, using statistical methods to interpret variation in traits within laboratory populations.

The fruit fly has a storied role in Mendelian genetics. To a great extent the study of the fruit fly is the early history of Mendelian genetics (see Lords of the Fly: Drosophila Genetics and the Experimental Life). It is therefore natural that a large body of research exists in this area, and one can’t accept novel results obtained through new methods such as genomics at face value without some degree of skepticism. Charlesworth notes that the extremely small fitness effects inferred via genomic methods reflect a bias toward single nucleotide variants (SNVs), that is, point mutations. In contrast, it seems likely that the larger-effect mutations implied by quantitative genetic studies, which are rather rare and so missed at population genomic sample sizes, are due to transposable elements (TEs) inserting themselves across the genome and presumably disrupting function. In line with older theoretical models, most of the variation in fitness is due to a small number of mutations. Presumably as genomic methods get better (e.g., longer reads to catch repeat elements, and larger sample sizes) they will converge upon the results of the older, established quantitative genetic methods. There are two other interesting results in this paper. First, much of the variation is due to balancing selection. For theoretical reasons balancing selection cannot be pervasive across the genome (too much fitness variation would result in huge death rates per generation), but of the variation that does exist within the population, much is maintained by balancing selection according to Charlesworth. Second, population genomic methods seem to be better at capturing the distribution of fitness effects in humans, because of our smaller effective population size. You can read the paper for the technical reason why, but the key here is to remember that one has to be careful about extrapolating from model organisms. The models are imperfect, and we should never let our generalizations outrun our evidence.
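The load argument behind the claim that balancing selection cannot be pervasive can be sketched with the textbook overdominance model. At one locus with genotype fitnesses 1 − s, 1, and 1 − t, the population settles at p = t/(s + t) and pays a segregational load of st/(s + t). The Python below assumes, for illustration only, independent loci that combine multiplicatively and hypothetical values of s and t; it is not Charlesworth’s analysis.

```python
def overdominant_load(s: float, t: float) -> float:
    """Segregational load at one overdominant locus with genotype fitnesses
    AA = 1 - s, Aa = 1, aa = 1 - t, at the stable equilibrium p(A) = t / (s + t)."""
    p = t / (s + t)
    q = 1 - p
    mean_fitness = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)
    return 1 - mean_fitness


def multilocus_fitness(load_per_locus: float, n_loci: int) -> float:
    """Mean fitness relative to an all-heterozygous genotype, assuming loci act
    independently and multiplicatively (an assumption)."""
    return (1 - load_per_locus) ** n_loci


if __name__ == "__main__":
    load = overdominant_load(0.02, 0.02)  # hypothetical selection coefficients
    print(f"load per locus = {load:.4f}")
    for n in (10, 100, 1000):
        print(f"{n} loci: relative mean fitness = {multilocus_fitness(load, n):.2e}")
```

Even a 1% load per locus leaves a population with a thousand such loci at a tiny fraction of the maximum fitness, which is the death-rate problem in numerical form.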

As genomics becomes pervasive in population genetics this sort of analysis will be more common. Rather than “genome-of-the-week” papers we’ll move to actually grappling with what the sequence data are telling us about the lineage in question, and what we can generalize from the results about evolution writ large. Some organisms have a long history of scientific study, so population genomics will supplement and complement it. In other cases, though, organisms do not have such a rich literature and scientific culture, and the pitfalls highlighted here should alert us to the deficiencies of genomic methods.

Citation: Charlesworth, Brian. “Causes of natural variation in fitness: Evidence from studies of Drosophila populations.” Proceedings of the National Academy of Sciences (2015): 201423275.

Quanta Magazine has a piece up audaciously titled Evolution’s Random Paths Lead to One Place. It’s basically a review of the research published in the paper Global epistasis makes adaptation predictable despite sequence-level stochasticity. There’s a lot packed into the title. Here’s the important bit from Quanta:

Many biologists argue that it would not, that chance mutations early in the evolutionary journey of a species will profoundly influence its fate. “If you replay the tape of life, you might have one initial mutation that takes you in a totally different direction,” Desai said, paraphrasing an idea first put forth by the biologist Stephen Jay Gould in the 1980s.

The findings also suggest a disconnect between evolution at the genetic level and at the level of the whole organism. Genetic mutations occur mostly at random, yet the sum of these aimless changes somehow creates a predictable pattern. The distinction could prove valuable, as much genetics research has focused on the impact of mutations in individual genes. For example, researchers often ask how a single mutation might affect a microbe’s tolerance for toxins, or a human’s risk for a disease. But if Desai’s findings hold true in other organisms, they could suggest that it’s equally important to examine how large numbers of individual genetic changes work in concert over time.

There’s been a vogue of late for attacking the utility of mouse genetics for medical research. Perhaps studying flies, yeast, and bacteria to understand evolution is also misguided? Interesting research in any case.
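The idea of global epistasis, that a mutation’s benefit shrinks as the background it lands on becomes fitter, can be illustrated with a toy model. This Python sketch is an assumption-laden caricature, not the model from the paper: each step adds a mutation at a random hypothetical site, with an effect equal to (a − b·fitness) times random noise. The parameters a and b are arbitrary.

```python
import random


def evolve(start_fitness: float, steps: int, seed: int,
           a: float = 0.1, b: float = 0.5, n_sites: int = 10_000):
    """Toy diminishing-returns model of adaptation (hypothetical parameters).

    Each step fixes one mutation at a randomly chosen site; its fitness effect
    is (a - b * fitness) times a random factor, so the expected gain shrinks
    linearly as the background gets fitter.
    """
    rng = random.Random(seed)
    fitness = start_fitness
    sites_hit = []
    for _ in range(steps):
        sites_hit.append(rng.randrange(n_sites))  # sequence-level randomness
        noise = rng.uniform(0.5, 1.5)             # effect-size randomness
        fitness += (a - b * fitness) * noise      # global (diminishing-returns) epistasis
    return fitness, sites_hit


if __name__ == "__main__":
    for seed, start in ((1, 0.0), (2, 0.1), (3, 0.05)):
        final, sites = evolve(start, steps=30, seed=seed)
        print(f"start={start:.2f} final={final:.4f} first sites={sites[:3]}")
```

Different seeds hit different sites and draw different effect sizes, yet every lineage converges on a/b = 0.2, the fitness at which the expected gain vanishes: stochastic at the sequence level, predictable at the level of fitness.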

• Category: Science • Tags: Evolutionary Genetics 

Credit: Eric Hunt

I do love me some sprouts! Greens, bitters, strong flavors of all sorts. I’ve always been like this. Some of this is surely environment. My family comes from a part of South Asia known for its love of bracing and bold sensation. But perhaps I was born this way? There’s a fair amount of evidence that taste has a substantial genetic component. This does not mean genes determine what one tastes, but it certainly opens the door for passive gene-environment correlations. If you do not find a flavor offensive, you are much more likely to explore its depths and cultivate your palate.


Dost thou dare?
Credit: W.A. Djatmiko

And of course I’m not the only one with a deep interest in such questions. With the disposable income available to us, many Americans have become “foodies,” searching for flavor bursts and novelties which their ancestors might never have been able to comprehend. More deeply, in a philosophical sense, the question of qualia reemerges if there is a predictable degree of inter-subjectivity in taste perception (OK, qualia are always there, though scientific sorts tend to view them as intractable in a fundamental sense).

But there’s heritability, and then there’s genes. We know that perception in some ways is heritable, but what is perhaps more interesting is if you can peg a specific genomic location to it. Then the evolutionary story becomes all the richer. And so it is with the locus TAS2R16, where a nonsynonymous mutation at position 516 seems to result in heightened sensitivity to bitter tastes. More specifically, it’s rs846664, and the derived T allele is fixed outside of Africa, while the ancestral G allele still segregates at appreciable frequencies within African populations. A new paper in Molecular Biology and Evolution puts this locus under a microscope, though it does not come to any clear conclusions. Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa presents some interesting findings. First, let’s look at the distribution of the variation in their sample populations at the SNP of most particular interest:

Region | Population | Frequency of ancestral 516G allele
Outside of Africa | Non-Africans | 0.000
Ethiopia | Semitic | 0.059
Tanzania | Sandawe | 0.083
Ethiopia | Omotic | 0.093
Ethiopia | Cushitic | 0.095
Tanzania | Iraqw | 0.111
West Central Africa | Fulani | 0.114
Kenya | Niger-Kordofanian | 0.133
Ethiopia | Nilo-Saharan | 0.156
Kenya | Afroasiatic | 0.162
West Central Africa | Niger-Kordofanian | 0.214
Kenya | Nilo-Saharan | 0.225
Kenya | Luo | 0.250
Central Africa | Niger-Kordofanian | 0.329
Tanzania | Hadza | 0.333
Central Africa | Bulala | 0.361
Central Africa | Nilo-Saharan | 0.367
West Central Africa | Afroasiatic | 0.462
West Central Africa | Nilo-Saharan | 0.500

As you can see, T is fixed outside of Africa, while the G allele persists at varying frequencies across many African populations. Previous work implied this, though coverage within Africa was not good. One thing to observe, though, is that the frequency of T within Africa cannot be explained by recent Eurasian admixture. The frequency is far too high for that to be the sole explanation, and in any case there is no evidence that ~67% of the Hadza’s ancestry is of Eurasian provenance (the Hadza being one of the three major groups of African hunter-gatherers, along with the Bushmen and Pygmies).

Within the paper the authors resequenced ~1,000 base pairs in an exonic region of this gene (the stuff that codes for amino acids) across diverse African populations. What they discovered is that, of the segregating SNPs, the one at position 516 was the key to phenotypic change. Not only did individuals with the T variant exhibit notably stronger bitter sensitivity, but in vitro expression with a reporter was elevated. Because they had dense sequence data across the region, they could perform various nucleotide-based tests to detect natural selection and attempt coalescent modeling to infer its genealogical history.

I’m going to spare you some of the gory details at this point. Here’s what they found. First, it does look like the region is under natural selection in many African populations, in particular, the derived haplotype with T at 516 at the center. But this result is not reproduced across all tests. The coalescent simulations make clear why: the mutation is an old variant with deep roots in the hominin lineage. In other words this variation pre-dates H. sapiens. It looks like the T allele has rapidly increased in frequency relatively recently, though more on the order of ~50,000 years, rather than ~10,000.* Basically around the time of the “Out of Africa” event. Additionally, there’s a tell-tale sign that this is being subject to selection within Africa: the genetic differences across populations at TAS2R16 far exceed the genome-wide values (the Fst at this locus is in the top 1% of loci within the African genome). Finally, one should note that the G allele haplotypes seem to be much more strongly constrained, as if they’re under purifying selection. This means that the switch to T is not all gain.
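The F_ST comparison above can be made concrete with the 516G frequencies from the table. This Python sketch uses a simple Nei-style estimator, F_ST = (H_T − H_S)/H_T, weighting populations equally and ignoring sample sizes; that is an assumption, and it is not necessarily the estimator the authors used. A single-locus value only means something when compared against the genome-wide distribution, as the paper does.

```python
def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity 2pq at a biallelic locus."""
    return 2 * p * (1 - p)


def fst(freqs: list[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for a biallelic locus, weighting
    populations equally and ignoring sample-size corrections."""
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    # 516G frequencies for the African samples in the table
    african_g = [0.059, 0.083, 0.093, 0.095, 0.111, 0.114, 0.133, 0.156, 0.162,
                 0.214, 0.225, 0.250, 0.329, 0.333, 0.361, 0.367, 0.462, 0.500]
    print(f"F_ST among African samples at rs846664: {fst(african_g):.3f}")
    print(f"Non-Africans vs Hadza: {fst([0.0, 0.333]):.3f}")
```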

At this point you may be ready for a story about how some African populations, like Eurasians, underwent a lifestyle change, and diet changes resulted in a shift in sensory perception. That does not seem to be the story. Rather, the authors could not settle upon a neat explanation for what is driving these recent sweeps up from ancient standing genetic variation. They do observe that the variation tends to cluster geographically, more so than the genome-wide results would imply. There’s likely some adaptation going on; they simply don’t know what. In the introduction and elsewhere you can see that variation at TAS2R16 does correlate with other traits. Not too surprising given the relative ubiquity of pleiotropy: one gene with many effects.

Stepping outside of the implications of this specific result, let’s think about what might be a takeaway: something as essential as taste perception might be a side effect of other aspects of evolutionary processes. In other words, we don’t know what the phenotypic target of selection is in this case, but we do have a good handle on one of its major side effects, which is sensory perception. How one tastes seems like a big deal.** And there have been many theories propounded that variation in bitter sensitivity is due to adaptation to poisonous plants and such, but really no one knew, and that was just the most plausible low-hanging fruit. With these results from Africa, where there is more variation in the trait and genes, and good geographic coverage, that seems to be an implausible model to adhere to (one would think the hunter-gatherer Hadza would exhibit the most sensitivity, no?). Many of the traits and tendencies which we humans see as fundamental, essential, and of great import may actually be side effects of powerful evolutionary forces hammering at the genetic-correlation matrices which define the hidden network of co-dependencies within the genome. So there, I said it. Life is an accident. Enjoy it.

Citation: Campbell, Michael C., et al. “Origin and Differential Selection of Allelic Variation at TAS2R16 Associated with Salicin Bitter Taste Sensitivity in Africa.” Molecular Biology and Evolution (2013): mst211.

* If it were closer to ~10,000 years, I think haplotype-based tests would come back with something, but they do not.

** Some Epicureans might be accused of reducing the good to taste!
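The first footnote’s reasoning rests on recombination eroding the haplotype around a selected allele: along a single lineage, the distance to the first crossover on one side is roughly exponential with mean 1/(g·c) for g generations and a per-base-pair recombination rate c. The Python sketch below assumes c = 1e-8 per bp per generation and 25 years per generation; both are rough assumptions, not figures from the paper.

```python
def expected_tract_length_bp(generations: float, rec_rate_per_bp: float = 1e-8) -> float:
    """Expected distance (bp) from a selected site to the first recombination
    breakpoint on one side of a single lineage. Crossovers accumulate as a
    Poisson process with rate generations * rec_rate_per_bp per bp, so that
    distance is exponential with mean 1 / (generations * rec_rate_per_bp)."""
    return 1.0 / (generations * rec_rate_per_bp)


if __name__ == "__main__":
    years_per_generation = 25  # assumption
    for years in (10_000, 50_000):
        g = years / years_per_generation
        print(f"{years:,} years (~{g:.0f} generations): "
              f"~{expected_tract_length_bp(g) / 1000:.0f} kb per side")
```

Five times as many generations leaves haplotypes one fifth as long, which is why a rise in frequency ~50,000 years ago is much harder for haplotype-based tests to see than one ~10,000 years ago.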

• Category: Science • Tags: Anthropology, Evolution, Evolutionary Genetics, Taste 

The trait of lactase persistence (lactose tolerance) is probably one of the better textbook examples of natural selection in human populations. The reasons for this are probably two-fold. There is a very strong signature of selection within a specific gene known to associate with the trait in question in many populations. And there is a very compelling historical narrative which explains rather neatly how this particular functional change could have undergone such strong selection within the past ~5,000 years across these populations. But the elucidation of the origin and spread of this genetic adaptation is also interesting because it looks as if it was not a singular event. Populations as disparate as Arabians, Danes, and Masai seem to carry different alleles around the locus of interest which confer the ability to digest milk. This illustrates the fact that when selection pressures have a viable target, there is a rapid response on the genomic level. At some point during the maturation of a mammal, the regulatory pathway which produces the lactase enzyme shuts down. Yet within numerous human populations this gradual shutdown process has been short-circuited.
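How fast can a favored allele sweep? Here is a minimal deterministic sketch in Python, assuming simple genic selection in which each generation multiplies the allele’s odds by (1 + s). It ignores dominance, drift, and demography, all of which matter for lactase persistence, and the selective advantages and the 25-year generation time are hypothetical.

```python
import math


def generations_to_reach(p0: float, p_target: float, s: float) -> int:
    """Generations for an allele to rise from p0 to p_target under simple
    deterministic genic selection: p' = p(1 + s) / (1 + s p)."""
    if not (0 < p0 < p_target < 1 and s > 0):
        raise ValueError("need 0 < p0 < p_target < 1 and s > 0")
    p, gens = p0, 0
    while p < p_target:
        p = p * (1 + s) / (1 + s * p)
        gens += 1
    return gens


def closed_form(p0: float, p_target: float, s: float) -> float:
    """Same quantity from the odds p / (1 - p), which grow by (1 + s) per generation."""
    def odds(p: float) -> float:
        return p / (1 - p)
    return math.log(odds(p_target) / odds(p0)) / math.log(1 + s)


if __name__ == "__main__":
    for s in (0.01, 0.05, 0.10):  # hypothetical selective advantages
        g = generations_to_reach(0.01, 0.5, s)
        print(f"s={s:.2f}: {g} generations (~{g * 25:,} years at 25 years/generation); "
              f"closed form {closed_form(0.01, 0.5, s):.1f}")
```

Under these assumptions, going from 1% to 50% inside a ~5,000-year window requires an advantage of at least a few percent per generation; the loop and the closed form agree.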

The variety of response in relation to this adaptation was brought home to me as I read Diversity of Lactase Persistence Alleles in Ethiopia – Signature of a Soft Selective Sweep, in the latest issue of The American Journal of Human Genetics:

The persistent expression of lactase into adulthood in humans is a recent genetic adaptation that allows the consumption of milk from other mammals after weaning. In Europe, a single allele (−13910∗T, rs4988235) in an upstream region that acts as an enhancer to the expression of the lactase gene LCT is responsible for lactase persistence and appears to have been under strong directional selection in the last 5,000 years, evidenced by the widespread occurrence of this allele on an extended haplotype. In Africa and the Middle East, the situation is more complicated and at least three other alleles (−13907∗G, rs41525747; −13915∗G, rs41380347; −14010∗C, rs145946881) in the same LCT enhancer region can cause continued lactase expression. Here we examine the LCT enhancer sequence in a large lactose-tolerance-tested Ethiopian cohort of more than 350 individuals. We show that a further SNP, −14009T>G (ss 820486563), is significantly associated with lactose-digester status, and in vitro functional tests confirm that the −14009∗G allele also increases expression of an LCT promoter construct. The derived alleles in the LCT enhancer region are spread through several ethnic groups, and we report a greater genetic diversity in lactose digesters than in nondigesters. By examining flanking markers to control for the effects of mutation and demography, we further describe, from empirical evidence, the signature of a soft selective sweep.

For my taste the paper was written rather confusingly. Importantly, they did not even consider the results of Pagani et al. (in the same journal!) from last year in their analysis. The big-picture result is that whereas in Eurasia and East Africa it looks as if lactase persistence spread through populations via “hard” selective sweeps, in Ethiopia it may have spread through “soft” sweeps. The former are cases where a single new mutant confers a beneficial phenotype. In the absence of allelic competitors this variant sweeps up in frequency extremely rapidly, and the flanking regions of the genome that hitchhike along with it form a long haplotype block. In Europeans this has resulted in a strongly homogenized region of the genome around LCT.

The situation in Ethiopia is a touch paradoxical in light of the above model. Instead of one allele, it looks as if several are segregating. And, the lactase persistence haplotypes exhibit more, not less, genetic diversity than the non-persistent variants. As noted in the article it may be that there are strong selective constraints against lactase persistence. Apparently there is a long non-persistent haplotype in Horn of Africa populations, explaining the reduced diversity of this subset of the sample. Whereas in a hard sweep a single mutation can rise in frequency against disfavored ancestral variants, in this situation you have a soft sweep where alternative variants with similar fitness values are presumably increasing in frequency.
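The hard/soft contrast comes down to how many distinct haplotypes carry the favored variant or variants. This Python sketch computes Nei’s haplotype diversity, 1 − Σf², for hypothetical samples of persistence-carrying haplotypes; the labels and counts are invented, and real analyses, such as the paper’s, also use flanking markers to control for mutation and demography.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's haplotype diversity, 1 - sum(f_i^2), without small-sample correction."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return 1 - sum((c / n) ** 2 for c in counts.values())


if __name__ == "__main__":
    # Hypothetical samples of haplotypes carrying a lactase-persistence allele.
    hard_sweep = ["H1"] * 20                                         # one origin
    soft_sweep = ["H1"] * 5 + ["H2"] * 5 + ["H3"] * 5 + ["H4"] * 5   # four origins
    print("hard sweep:", haplotype_diversity(hard_sweep))
    print("soft sweep:", haplotype_diversity(soft_sweep))
```

A single-origin sweep leaves the favored class with no haplotype diversity at all, while several independent origins leave it diverse, which is the direction of the Ethiopian result.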

But all this needs to be considered in light of Pagani et al., which indicates very recent admixture in Ethiopia. The discussion above seems to suggest in situ selective events within the Horn of Africa, but it is possible that the sweeps began among the Eurasian ancestors of the Ethiopians (perhaps some admixture mapping would be useful?). Ultimately this is going to be a complicated story. It doesn’t take away from the bigger picture that lactase persistence is an excellent model for natural selection, but the sketch has more details to be filled in, though this paper leaves me unsure about their specific character.

Citation: Comas, Iñaki, et al. “Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans.” Nature Genetics (2013).

The two phylogenies above represent Mycobacterium tuberculosis, on the left, and human mitochondrial DNA (inherited through the maternal line) on the right. The figure was pulled from the paper Out-of-Africa migration and Neolithic coexpansion of Mycobacterium tuberculosis with modern humans, which just came out and has naturally been making a splash. As the title implies, the paper concludes that humans and tuberculosis have been each other’s “partners,” after a fashion, for the whole existence of modern humanity. The main method here is somewhat brute force and straightforward: by sequencing 259 tuberculosis strains from all across the world, they managed to make relatively robust phylogeographic inferences. Throwing data at a question usually resolves something. The correspondence between human and pathogen strains is qualitatively uncanny, and there is plenty of statistical footwork within the body of the text to confirm it more rigorously.

Rather, what’s more interesting is the correlation in population growth of tuberculosis and humans as in the Bayesian skyline plots to the left (additionally, coalescence times for the pathogens seem to imply diversification at the same time as anatomically modern humans). This isn’t surprising at all obviously; more humans means more hosts. Rather, what’s a touch novel is the confirmation that tuberculosis is a Paleolithic disease. There are many illnesses which seem to be endemic to dense agricultural societies. They need the density which agrarian culture provides to incubate and propagate (not to mention the catalytic effect of coexistence with domestic animals). In contrast, pathogens which have deeper roots with our species can perpetuate themselves at the low population densities characteristic of hunter-gatherers via long incubation and low virulence. Apparently tuberculosis manifests both characteristics. It can incubate and remain latent, biding its time until it gets nasty.

I am of a mind to reflect how the “two-step” evolutionary genomic relationship of tuberculosis to humans may resemble that of dogs, who seem to have undergone some sort of domestication with hunter-gatherers, before changing again during the Neolithic. These sorts of results force me to reconsider the old maxim, “maybe it’s agriculture?” (paraphrasing L. L. Cavalli-Sforza). Perhaps agriculture accelerated many basic elements of human existence, rather than originating them.

• Category: Science • Tags: Evolutionary Genetics, Tuberculosis 

A Tree of Life

Evolutionary processes which play out across the tree of life are subject to distinct dynamics which can shape and influence the structure and characteristics of individuals, populations, and whole ecosystems. For example, imagine the phylogeny and population genetic characteristics of organisms which are endemic to the islands of Hawaii. Because the Hawaiian islands are an isolated archipelago the expectation is that lineages native to the region are going to be less shaped by the parameter of migration, or gene flow between distinct populations, than might otherwise be the case. Additionally, presumably there was a “founding” event of these endemic Hawaiian lineages at some distant point in the past, so another expectation is that most of the populations would exhibit evidence of having gone through a genetic bottleneck, where the power of random drift was sharply increased for several generations. The various characteristics, or states, which we see in the present in an individual, population, or set of populations, are the outcome of a long historical process, a sequence of precise events. To understand evolution properly it behooves us to attempt to infer the nature and magnitude of these distinct dynamic parameters which have shaped the tree of life.
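The founder-event expectation can be quantified with two textbook results: drift erodes expected heterozygosity by a factor of 1 − 1/(2N) each generation, and the long-term effective size across fluctuating generations is the harmonic mean of the per-generation sizes. The Python sketch below uses a hypothetical founding history, not data from any Hawaiian lineage, and ignores mutation and migration.

```python
def heterozygosity_after(h0: float, sizes: list[int]) -> float:
    """Expected heterozygosity after drift through generations with effective
    sizes `sizes`, ignoring mutation: H_t = H_0 * prod(1 - 1 / (2 N_i))."""
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h


def harmonic_mean_ne(sizes: list[int]) -> float:
    """Long-term effective size across fluctuating generations (harmonic mean)."""
    return len(sizes) / sum(1 / n for n in sizes)


if __name__ == "__main__":
    # Hypothetical founding: 5 generations at N = 10, then 95 generations at N = 10,000.
    history = [10] * 5 + [10_000] * 95
    print(f"Fraction of heterozygosity retained: {heterozygosity_after(1.0, history):.3f}")
    print(f"Harmonic-mean Ne: {harmonic_mean_ne(history):.0f} "
          f"(arithmetic mean {sum(history) / len(history):.0f})")
```

A few generations at a small size dominate both numbers, which is why a brief bottleneck can leave a lasting mark on a lineage.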

Credit: Verisimilus

For many the image of evolutionary processes brings to mind something on a macro scale: perhaps the changing nature of protean life on earth writ large, unfolding over millions of years and across geological scales, depicted on a broad canvas as in David Attenborough’s majestic documentaries. But one can also reduce the phenomenon to a finer grain on a concrete level, as in specific DNA molecules. Or one can transform it into a more abstract rendering manipulable by algebra, such as trajectories of allele frequencies over generations. Both of these reductions emphasize the genetic aspect of natural history.
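The algebraic rendering of allele-frequency trajectories can be made concrete with the neutral Wright-Fisher model, in which each generation’s 2N gene copies are drawn binomially from the previous generation’s allele frequency. A minimal pure-Python sketch follows; the population size, starting frequency, and seeds are arbitrary choices.

```python
import random


def wright_fisher(p0: float, n_diploid: int, generations: int, seed: int = 0) -> list[float]:
    """Neutral Wright-Fisher allele-frequency trajectory: each generation draws
    2N gene copies binomially from the previous generation's frequency."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    p = p0
    trajectory = [p]
    for _ in range(generations):
        copies = sum(rng.random() < p for _ in range(two_n))  # a Binomial(2N, p) draw
        p = copies / two_n
        trajectory.append(p)
    return trajectory


if __name__ == "__main__":
    for seed in range(3):
        traj = wright_fisher(0.5, n_diploid=50, generations=200, seed=seed)
        print(f"replicate {seed}: final frequency {traj[-1]:.2f}")
```

Frequencies of 0 and 1 are absorbing, so every replicate eventually loses or fixes the allele; which outcome occurs, and when, is pure chance.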

Credit: Johnuniq

Obviously evolutionary processes are not just fundamentally the flux of genetic elements, but genes are crucial to the phenomena in a biological sense. It therefore stands to reason that if we look at patterns of variation within the genome we will be able to infer in some deep fashion the manner in which life on earth has evolved, and conclude something more general about the nature of biological evolution. These are not trivial affairs; it is not surprising that philosophy-of-biology is often caricatured as philosophy-of-evolution. One might dispute the characterization, but it can not be denied that some would contend that evolutionary processes in some way allow us to understand the nature of Being, rather than just how we came into being (Creationists depict evolution as a religion-like cult, which imparts the general flavor of some of the meta-science and philosophy which serves as intellectual subtext).

R. A. Fisher

But shifting from such near-metaphysical generalities to in-the-trenches science as it is actually done, we are faced today with a swell of sequence data due to the genomic revolution. What does this mean for our understanding of evolution? Many of the original arguments of evolutionary geneticists such as R. A. Fisher and Sewall Wright were predicated on inferences from the inheritance patterns of a few genes that were easily identifiable by their phenotypic markers. More often, though, the disputes were framed in purely theoretical terms, deduction with a minimal level of empirical messiness intervening. In contrast, today we live in an age where someone may pity you if you don’t have a well-assembled genome of your organism (on the order of billions of base pairs for mammals), and so have to make do with SNP data for a few thousand markers per individual!

These new data, first and foremost from humans due to the funding priorities of biomedical science, have stimulated a renaissance of method development to take advantage of the richness of the genetic variation now being uncovered. Consider PSMC, which allows one to infer demographic history from a single genome by surveying patterns of heterozygosity within one individual. Last week I reviewed a preprint which illustrated the power of extensive data analysis to shade and refine previous results that seemed straightforward on their face. The reformulation raised the possibility that natural selection has been a pervasive parameter in human evolution over the past ~100,000 years. The authors compared variation at different categories of bases (synonymous vs. nonsynonymous) across the genome, both reinforcing old intuitions and extracting novel insights.
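PSMC (Li and Durbin’s pairwise sequentially Markovian coalescent) works from one diploid genome, and the raw signal it exploits is where along the genome the two copies differ. The sketch below only illustrates that preprocessing idea, collapsing per-site genotypes into fixed windows marked as heterozygous or not; it does not fit the coalescent hidden Markov model itself. The window size, the 'K'/'T'/'N' symbols, and the missing-data rule are simplifying assumptions loosely modeled on PSMC’s binned input, not its exact format.

```python
from typing import Optional, Sequence, Tuple

# One diploid genotype per site, e.g. ("A", "G"); None marks an uncalled site.
Genotype = Optional[Tuple[str, str]]


def bin_heterozygosity(genotypes: Sequence[Genotype], bin_size: int = 100) -> str:
    """Collapse per-site genotypes into one symbol per window:
    'K' if any called site is heterozygous, 'T' if all called sites are
    homozygous, 'N' if no site in the window was called."""
    symbols = []
    for start in range(0, len(genotypes), bin_size):
        window = genotypes[start:start + bin_size]
        called = [g for g in window if g is not None]
        if not called:
            symbols.append("N")
        elif any(a != b for a, b in called):
            symbols.append("K")
        else:
            symbols.append("T")
    return "".join(symbols)


# Invented example: 300 sites, one heterozygous site in the second window,
# and a final window with no calls at all.
sites = [("A", "A")] * 150 + [("A", "G")] + [("C", "C")] * 49 + [None] * 100
binned = bin_heterozygosity(sites)
het_fraction = binned.count("K") / max(1, len(binned) - binned.count("N"))
print(binned, het_fraction)
```

Across a real genome, long runs of 'T' point to segments whose two copies share a recent common ancestor, while dense 'K' windows point to deep coalescence; PSMC turns the spacing of these runs into a history of effective population size.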

Citation: Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS biology 4.3 (2006): e72.

Looking at differences between synonymous and nonsynonymous substitutions is a tried and tested technique with a fine pedigree, but more recently haplotype-based methods to detect natural selection have been all the rage, due to the emergence of dense genome-wide marker sets. These allow for the inference of correlated patterns of markers across adjacent genomic segments. This trend toward haplotype methods naturally triggered its antithesis, and the resulting synthesis can to some extent be seen in two papers, both Grossman et al.: A Composite of Multiple Signals Distinguishes Causal Variants in Regions of Positive Selection, and Identifying recent adaptations in large-scale genomic data. These improve upon earlier work from the aughts, continuing a reassessment that had already started in the literature after genomic methods overreached in detecting ubiquitous selection in human populations. More specifically, the newer techniques focused on recent selective events, which leave long blocks of the genome homogenized within populations. As causal markers rapidly increase in frequency due to positive selection, they drag along flanking regions in sweep events. For many generations after the initial selection event these flanking regions will exhibit linkage disequilibrium, as recombination only slowly breaks apart the associations across loci. But a key drawback of these methods is that selection is not the only dynamic that results in long haplotypes and linkage disequilibrium. Demographic stochasticity, colloquially the vicissitudes of population history, can also generate long homogeneous blocks of markers. The initial candidate regions yielded by a statistic like iHS were saturated by the effects of population-specific history.
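Haplotype statistics such as iHS are built on extended haplotype homozygosity (EHH): among chromosomes carrying a given allele at a core SNP, the probability that two randomly chosen ones are identical over the whole stretch out to some distance. A recent sweep keeps that probability high far from the core. The sketch below computes EHH for 0/1-encoded haplotypes; the toy haplotypes are invented, and a full iHS would go on to integrate EHH over genetic distance for both alleles and standardize the log ratio within frequency bins, which is omitted here.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end):
    """EHH between the core SNP index and `end` (on either side of the core):
    the probability that two haplotypes drawn without replacement are
    identical at every site in that interval."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    lo, hi = sorted((core, end))
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Invented haplotypes, 7 sites each, core SNP at index 3 (derived allele = "1").
haplotypes = [
    "0001000", "0001000", "0001001", "1001000",  # carry the derived allele
    "0100010", "1010100", "0110001", "1100110",  # carry the ancestral allele
]
core = 3
derived = [h for h in haplotypes if h[core] == "1"]
ancestral = [h for h in haplotypes if h[core] == "0"]

for end in range(core, len(haplotypes[0])):
    print(end, ehh(derived, core, end), ehh(ancestral, core, end))
```

In this toy data the derived-allele haplotypes remain half-homozygous out to the last site while the ancestral ones have already fully diverged: the kind of asymmetry a sweep can produce, but which, as noted above, demographic history can also generate.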

CMS, introduced in Grossman et al. 2010, is an attempt to correct for this flaw while retaining the power of haplotype-based methods. Natural selection leaves more evidence of its operation within the genome than just long haplotype blocks and linkage disequilibrium. Selected alleles often exhibit greater between-population differences than the average region of the genome (i.e., higher FST). Additionally, a new derived allele segregating at high frequency within one population is often a telltale marker of recent adaptation, suggesting a de novo mutation in a specific locale that turned out to be beneficial. By combining tests that survey patterns of variation across loci (i.e., haplotype-based methods) with those that look within loci and across populations (FST-based methods), CMS zeroes in on a few narrow candidates by cross-checking with multiple tools. False positives aside, another major problem with relying upon a single coarse test is that it often highlights a large region as the target of natural selection. That does not allow for simple follow-up when dozens of genes and millions of bases are potential candidates.
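The FST component can be illustrated with the simplest textbook version for one biallelic locus, (H_T - H_S)/H_T, where H_T is the expected heterozygosity of the pooled populations and H_S is the average expected heterozygosity within them. This is a sketch with equal population weights and no sample-size correction, not the estimator used in CMS; the frequencies in the example are hypothetical.

```python
def fst(freqs):
    """Nei-style F_ST for one biallelic locus from subpopulation allele
    frequencies, with equal weights and no sample-size correction."""
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)          # pooled expected heterozygosity
    if h_total == 0:
        return 0.0                              # monomorphic overall
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)
    return (h_total - h_within) / h_total


print(fst([0.5, 0.5]))    # identical populations
print(fst([0.4, 0.6]))    # mild differentiation
print(fst([0.02, 0.98]))  # nearly disjoint frequencies (hypothetical)
```

Frequencies close to 0 in one population and close to 1 in another drive FST toward 1, which is why strongly differentiated loci stand out against the genomic background.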

The second paper, Grossman et al. 2013, is less a map of genome-wide variation than a scan of it, intended to select choice targets for more detailed analysis. To no one’s surprise, in human data sets loci implicated in salient physical characteristics such as height and pigmentation, metabolism, and immune response are high on the list of candidates. Even allowing for the genuine issue of false positives, it does seem that recent human evolution (and frankly, evolution more generally) has a fixation on these traits, no pun intended. I do sometimes wonder if this is just a feature of the fact that we humans notice exterior phenotypes, as well as disease-related markers (e.g., metabolic and immune illnesses). One of the major concerns in the second paper is that a selection signature without a phenotype is often without utility, but perhaps the phenotypes we look for lack utility because humans are blind to which traits are actually of interest. I remain skeptical of the explanations for what exactly the target of selection around the EDAR locus in East Asians is.

Two alleles of SLC24A5, citation: Norton, Heather L., et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians.” Molecular biology and evolution 24.3 (2007): 710-722.

One of the more intriguing results from CMS in Grossman et al. 2013 is that a locus with the strongest association with resistance to leprosy also contains SLC24A5. This locus has an allele within it whose frequency is almost disjoint between Europeans and Sub-Saharan Africans. By this, I mean that almost all Africans carry one base, while nearly all Europeans carry the other. The allele found in Europeans is predominant in West Asia, and present at frequencies as high as ~50% as far south and east as Sri Lanka. The gene is famously associated with lighter skin in humans and zebrafish. And yet there remains the mystery that the allele is present at very high frequencies rather far south, and it is certainly not a necessary condition for light skin: East Asians are nearly fixed for the ancestral variant which is common in Sub-Saharan Africa. A possible explanation is that these sorts of salient phenotypic loci have been reshaped by very strong bouts of selection targeting particular diseases in the recent past. If this is correct, the phenotypic characteristics which we find salient in human beings may simply be pleiotropic side effects of selective sweeps anchored around disease resistance.

I am not proposing here that genomics can solve and explain evolution. The heirs of G. G. Simpson may have something to say about that. Rather, I am suggesting that the genetic piece of the puzzle will not lack for data within our lifetimes. My hunch is that many evolutionary genetic questions will be soluble when we have thousands of high-quality complete genomes for thousands of organisms. There is no likely windfall of fossils in the near future, so paleontology will have to continue to operate in a relatively data-constrained environment. For those who work in evolutionary genetics and genomics, the onus is on human ingenuity, analytic skill, and savvy: thinking hard and deep about difficult problems, rather than putting in long hours at the bench to glean more data.


Layers and layers….

There is the fact of evolution. And then there is the long-standing debate over how it proceeds. The former is a settled question with little intellectual juice left. The latter is the focus of evolutionary genetics, and evolutionary biology more broadly. The debate is an old one, going back to the 19th century, when arch-selectionists such as Alfred Russel Wallace (see A Reason For Everything) squared off against pretty much the whole of the scholarly world (e.g., Thomas Henry Huxley, “Darwin’s Bulldog,” was less than convinced of the power of natural selection as the driving force of evolutionary change). This old disagreement planted the seeds for much more vociferous disputes in the wake of the fusion of evolutionary biology and genetics in the early 20th century. These range from the Wright-Fisher controversies of the early years of evolutionary genetics to the neutralist vs. selectionist debate of the 1970s (which left bad feelings in some cases). A cartoon view of what these debates imply about the power of selection, as opposed to stochastic contingency, can be found in the works of Stephen Jay Gould (see The Structure of Evolutionary Theory) and Richard Dawkins (see The Ancestor’s Tale): does evolution result in an infinitely creative assortment due to chance events, or does it drive toward a finite set of idealized forms which populate the possible parameter space?*

But ultimately these 10,000-foot debates are more a matter of philosophy than science, at least until the scientific questions are stripped of their controversy and an equilibrium consensus emerges. That will only occur through an accumulation of publications whose results are robust to time, and subtle enough to convince dissenters. This is why Enard et al.’s preprint, Genome wide signals of pervasive positive selection in human evolution, attracted my notice. With the emergence of genomics, humans have been first in line to be analyzed, as the best data are often from our species, so no surprise there. So what is so notable about this paper in light of the past 10 years of back-and-forth exploration of this topic?**

By taking a deeper and more subtle look at patterns of variation in the human genome, this group has inferred that adaptation through classic positive selection has been a pervasive feature of the human genome over the past ~100,000 years. This is not a trivial inference, because there has been a great deal of controversy over the population genetic statistics used to infer selection in the past 10 years, since the arrival of genome-wide data sets (in particular, a tendency toward false positives). In fact, one group has posited that a more prominent selective force within the genome has been “background selection,” which refers to the reduction of genetic variation caused by the purging of numerous deleterious mutations, a process that also removes variation at neighboring linked sites.

The sum totality of Enard et al. may seem abstruse, and even opaque, in terms of method. But each element is actually rather simple and clear. The major gist is that many tests for selection within the genome focus on the differences between nonsynonymous and synonymous variants. The former refer to base positions in the genome where a change alters the amino acid, while the latter are those (typically third codon positions) where different bases still produce the same amino acid. The ratio of substitutions (fixed replacements of one base state by another across lineages) at these two classes of positions is a rough measure of adaptation driven by selection at the molecular level. Changes at synonymous positions are far less constrained by negative selection, while positive selection due to increased fitness via new phenotypes is presumed to have occurred only via nonsynonymous changes. What Enard et al. point out is that the human genome is heterogeneous in the distribution of these characteristics, and focusing on these sorts of pairwise differences between classes without accounting for other confounding variables may obscure the dynamics one is attempting to measure. In particular, they argue that evidence of positive selective sweeps is masked by the fact that background selection tends to be stronger in regions where synonymous substitutions are more likely (i.e., they are more functionally constrained, so nonsynonymous variants will be disfavored). This results in elevated neutral diversity around regions of nonsynonymous substitutions vis-a-vis strongly constrained regions with synonymous substitutions. Once they correct for the power of background selection, the authors find evidence for sweeps of novel adaptive variants across the human genome, which had previously been hidden.
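The synonymous/nonsynonymous distinction is mechanical enough to sketch directly. The code below translates codons with the standard genetic code, classifies a single-base change, and counts synonymous sites per codon in the spirit of Nei and Gojobori’s method (simplified here: changes to stop codons are counted as nonsynonymous rather than excluded). It is a sketch of the basic bookkeeping behind dN/dS-style comparisons, not Enard et al.’s method, which additionally controls for background selection.

```python
# Standard genetic code, indexed by bases in TCAG order: 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    """Return the one-letter amino acid ('*' for stop) for a DNA codon."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def classify_change(codon, position, new_base):
    """Label a single-base change at `position` (0, 1 or 2) of `codon`."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    same = translate(codon) == translate(mutant)
    return "synonymous" if same else "nonsynonymous"


def synonymous_sites(codon):
    """Sum over the three positions of the fraction of possible point
    changes that are synonymous (stop changes count as nonsynonymous)."""
    codon = codon.upper()
    total = 0.0
    for pos in range(3):
        alternatives = [b for b in BASES if b != codon[pos]]
        syn = sum(classify_change(codon, pos, b) == "synonymous" for b in alternatives)
        total += syn / 3
    return total


print(translate("ATG"), classify_change("CTT", 2, "C"), classify_change("GAT", 2, "A"))
print(synonymous_sites("CTG"))  # leucine: 1/3 at the first position + 1 at the third
```

Summing synonymous sites over a gene (and 3 minus that for nonsynonymous sites) gives the denominators for the per-site substitution rates whose ratio is the familiar dN/dS.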

There are two interesting empirical findings from the 1000 Genomes data set. First, the authors find that positive selection tends to operate upon regulatory elements rather than through changes to coding sequence. You are probably aware that this is currently a major area of debate within molecular evolutionary biology. Second, there seems to be less evidence for positive selection in Sub-Saharan Africans, or, alternatively, less background selection in this population. My own hunch is that it is the former: the demographic pulse across Eurasia, and on to the New World and Australasia, naturally resulted in local adaptations as environmental conditions shifted. But it may also be that the African pathogenic environment is particularly well adapted to hominin immune systems, and so imposes a stronger cost upon novel mutations than is the case for non-Africans. So I do not dismiss the second idea out of hand.

Where this debate about the power of selection will end is anyone’s guess. Nor do I care. Rather, what’s important is getting a finer-grained map of the dynamics at work so that we may perceive reality with greater clarity. One must be cautious about extrapolating from humans (e.g., the authors point out that Drosophila genomes are richer in coding sequence proportionally). But the human results which emerge because of the coming swell of genomic data will be a useful outline for the possibilities in other organisms.

Citation: Genome wide signals of pervasive positive selection in human evolution

* The cartoon qualification is due to the fact that I am aware that selection is stochastic as well.

** Voight, Benjamin F., et al. “A map of recent positive selection in the human genome.” PLoS biology 4.3 (2006): e72., Sabeti, Pardis C., et al. “Detecting recent positive selection in the human genome from haplotype structure.” Nature 419.6909 (2002): 832-837., Wang, Eric T., et al. “Global landscape of recent inferred Darwinian selection for Homo sapiens.” Proceedings of the National Academy of Sciences of the United States of America 103.1 (2006): 135-140., Williamson, Scott H., et al. “Localizing recent adaptive evolution in the human genome.” PLoS genetics 3.6 (2007): e90., Hawks, John, et al. “Recent acceleration of human adaptive evolution.” Proceedings of the National Academy of Sciences 104.52 (2007): 20753-20758., Pickrell, Joseph K., et al. “Signals of recent positive selection in a worldwide sample of human populations.” Genome research 19.5 (2009): 826-837., Hernandez, Ryan D., et al. “Classic selective sweeps were rare in recent human evolution.” Science 331.6019 (2011): 920-924.


The Biometrician

I have alluded over the years to the early 20th century conflict between Mendelians, who were proto-geneticists, and the biometricians, who were classical Darwinians. As if in a Hegelian dialectic, this clash of egos eventually led to the synthesis which became population genetics. The historical process is outlined beautifully in Will Provine’s The Origins of Theoretical Population Genetics, but the synthesis first bore fruit in R. A. Fisher’s 1918 paper The Correlation between Relatives on the Supposition of Mendelian Inheritance. One might argue that though this publication ended the explicit debate, the division continued, not because of fundamental differences, but because of pragmatic ones. Classical population geneticists focused on single- or two-locus models to develop their intuitions about the trajectory of evolutionary processes. Quantitative geneticists refined their statistical techniques of inference on continuous characters whose heritability was confirmed, but whose specific causal genetic elements remained mysterious.

The geneticist

It could be no different in the pre-genomic era. Without “big data” and “big metal” (i.e., computation), the rich but messy empirical work of quantitative genetics and the elegant, analytic landscapes of population genetics were separated by a methodological chasm. Polygenic models built from the bottom up were simply not practicable for evolutionary geneticists. And without genomics, ascertaining causal loci for highly polygenic traits was unlikely, and not necessary, for quantitative geneticists. But that is changing, a century on from Fisher’s seminal paper, which fused the two fields at the level of their theoretical axioms.

And this is not a trivial matter, because adaptive evolution occurs upon continuous characters affected by polygenic standing variation. Subtle heritable differences in quantitative traits almost certainly have a genetic basis, but when that variation is distributed across hundreds or thousands of loci, the reality must remain abstract. One can make educated assertions about the broad flow of evolutionary process, but cannot get at the nuts-and-bolts details of how it proceeds. And these quantitative characters are of some note. Diseases such as type 2 diabetes and schizophrenia seem to have a heritable component, but their possible evolutionary origins are murky at best.
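Fisher’s resolution, in spirit, was that many Mendelian loci of small additive effect sum to a smooth, continuous trait. The minimal simulation below illustrates that idea under assumed conditions: invented allele frequencies and effect sizes, unlinked loci in Hardy-Weinberg proportions, no dominance and no environmental noise, with NumPy assumed to be available. The trait variance should match the additive expectation, the sum over loci of 2p(1 - p)a².

```python
import numpy as np

rng = np.random.default_rng(1)
n_loci, n_individuals = 1_000, 5_000

# Hypothetical genetic architecture: many loci, each of small additive effect.
p = rng.uniform(0.05, 0.95, size=n_loci)        # allele frequencies
effects = rng.normal(0.0, 0.05, size=n_loci)    # per-allele additive effects

# Genotypes are allele counts (0, 1 or 2) drawn independently at each locus,
# i.e. Hardy-Weinberg proportions and no linkage disequilibrium.
genotypes = rng.binomial(2, p, size=(n_individuals, n_loci))
trait = genotypes @ effects                     # purely additive genetic value

per_locus_va = 2 * p * (1 - p) * effects ** 2
expected_va = per_locus_va.sum()
observed_va = trait.var()
largest_share = per_locus_va.max() / expected_va
print(f"additive variance: expected {expected_va:.3f}, observed {observed_va:.3f}")
print(f"largest single-locus share of the variance: {largest_share:.2%}")
```

Because no single locus contributes more than a sliver of the variance, the individual causal variants are nearly invisible in the trait distribution even though, in this toy model, the variance is entirely genetic.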

With the availability of large data sets, theorists are now rousing themselves, and attempting to close the book that Fisher opened. Over at Haldane’s Sieve they have posted a preprint with an intriguing title, The Population Genetic Signature of Polygenic Local Adaptation. The claims seem characterized by a grand modesty:

Adaptation in response to selection on polygenic phenotypes occurs via subtle allele frequency shifts at many loci. Current population genomic techniques are not well posed to identify such signals. In the past decade, detailed knowledge about the specific loci underlying polygenic traits has begun to emerge from genome-wide association studies (GWAS). Here we combine this knowledge from GWAS with robust population genetic modeling to identify traits that have undergone local adaptation. Using GWAS data, we estimate the mean additive genetic value for a given phenotype across many populations as simple weighted sums of allele frequencies. We model the expected differentiation of GWAS loci among populations under neutrality to develop simple tests of selection across an arbitrary number of populations with arbitrary population structure. To find support for the role of specific environmental variables in local adaptation we test for correlations with the estimated genetic values. We also develop a general test of local adaptation to identify overdispersion of the estimated genetic values among populations. This test is a natural generalization of QST/FST comparisons based on GWAS predictions. Finally we lay out a framework to identify the individual populations or groups of populations that contribute to the signal of overdispersion. These tests have considerably greater power than their single locus equivalents due to the fact that they look for positive covariance between like effect alleles. We apply our tests to the human genome diversity panel dataset using GWAS data for six different traits. This analysis uncovers a number of putative signals of local adaptation, and we discuss the biological interpretation and caveats of these results.
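The “simple weighted sums of allele frequencies” in the abstract can be sketched in a few lines. Assuming diploid individuals and purely additive effects, the expected genetic value of population m is 2 times the sum over GWAS loci of each effect size times the effect-allele frequency in that population. The frequencies and effects below are made up, and the sketch stops well short of the paper’s tests, which compare these values against a neutral model that accounts for population structure.

```python
import numpy as np


def genetic_values(freqs, effects):
    """Estimated mean additive genetic value for each population.

    freqs[m, l]: frequency in population m of the effect allele at GWAS locus l
    (the allele whose effect is reported); effects[l]: its additive effect.
    Assumes diploids, so each value is 2 * sum_l effects[l] * freqs[m, l].
    """
    freqs = np.asarray(freqs, dtype=float)
    effects = np.asarray(effects, dtype=float)
    return 2.0 * freqs @ effects


# Invented numbers: two populations, three GWAS loci.
freqs = [[0.2, 0.5, 0.9],
         [0.3, 0.5, 0.1]]
effects = [0.1, -0.2, 0.05]
z = genetic_values(freqs, effects)
print(z, z - z.mean())  # raw values, and values centered on the across-population mean
```

Because each value pools many loci, a coordinated shift of small effects in the same direction moves it noticeably even when no single locus looks unusual, which is the source of the extra power the abstract describes.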

How? The mathematics will likely be a touch gnarly for most readers of this weblog, but the brave should just go to the preprint. What I will say is that the methods outlined within the paper attempt to account for the diverse, multivalent forces that polygenic traits are subject to. Dispersed weak selection is naturally subtle, and easily masked and confounded. What one must do is compare the patterns within the genome against the neutral expectations that one might predict from phylogeny and geography. Easy enough to write, but totally unfeasible in the pre-computer age. The main empirical result I will offer is that they find little evidence for selection on loci implicated in type 2 diabetes. This is not dispositive, but it does lend credence to the view that the idea of a ‘thrifty gene’ is rather fanciful.

Ultimately the task of model building is tedious, and it will be iterative. But the early years of the 21st century have seen the same sort of theoretical revival and reformation which occurred in the early 20th. Only good things can come….

Citation: The Population Genetic Signature of Polygenic Local Adaptation.

• Category: Science • Tags: Evolutionary Genetics 

Figure 1: “We show the frequency of all identified mutations through 1,000 generations in 6 of the 40 sequenced populations. Non-synonymous mutations are solid lines with solid circles, and synonymous and intergenic mutations are dotted lines with open circles and squares, respectively. Populations in the left and right columns were evolved at small (10⁵) and large (10⁶) population sizes, respectively. We observe qualitatively similar patterns in the other populations.”

Credit: Adapted from Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations.


Credit: André Karwath

Evolutionary genetics as a field emerged in the early 20th century. There were some upsides to this. R. A. Fisher was alive, so there were some incredibly brilliant theoretical minds who could focus upon the project of formalizing evolutionary process and fusing it with Mendelian genetics. And frankly, there are situations where data-free theorizing is best, because that sort of theorizing is at least blind to what the solutions should be. But there were also many downsides to this early flowering of theoretical evolutionary biology. The fact that biologists were not clear about the biomolecular substrate of inheritance, DNA, was not a hindrance for most of the high-level abstraction. But to trace patterns of transmission of characters, and implicitly genotypes, within populations, researchers relied upon classical phenotypic markers. This meant that theoretical speculation advanced rapidly into confusing and tendentious terrain, while the empirical data sets needed to test the questions at issue were simply not sufficient to resolve the debates. The emergence of molecular markers in the 1960s, and the maturation of genomics in the 2000s, have revolutionized the empirical domain of evolutionary genetics. To use a rough analogy, the large data sets of the present offer up raw material for the machinery of theory to sift, process, and refine.

A new paper in Nature is a perfect illustration of this, Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations:

The dynamics of adaptation determine which mutations fix in a population, and hence how reproducible evolution will be….Here we use whole-genome whole-population sequencing to examine the dynamics of genome sequence evolution at high temporal resolution in 40 replicate Saccharomyces cerevisiae populations growing in rich medium for 1,000 generations. We find pervasive genetic hitchhiking: multiple mutations arise and move synchronously through the population as mutational ‘cohorts’. Multiple clonal cohorts are often present simultaneously, competing with each other in the same population. Our results show that patterns of sequence evolution are driven by a balance between these chance effects of hitchhiking and interference, which increase stochastic variation in evolutionary outcomes, and the deterministic action of selection on individual mutations, which favours parallel evolutionary solutions in replicate populations.

The specific question here falls under the general set of queries relating to the trajectory of mutations within populations. A stylized model may be that within large populations a favored mutation emerges periodically: not at a uniform rate, but as events following a Poisson distribution. This basically means that it is a rare event, and that the variance in the number of occurrences equals the mean. A perfect “spherical cow” model might be one where a sequence of favored mutations emerges, and each rapidly sweeps up to fixation (frequency 0→1.0), one after the other as independent events. Conveniently, these independent events can be analyzed with simpler models than a more cluttered space of various favored mutations crowding each other out.
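The Poisson property is easy to check numerically: its mean and its variance are both equal to the rate parameter. In the sketch below the rate is a hypothetical value, and 480 is used only because it is the number of samples mentioned for the yeast study; the output is the expected number of samples carrying k new mutations under the idealized model.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events given a mean of lam events."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 1.5          # hypothetical mean number of new mutations per sample
n_samples = 480    # sample count from the study; the rate above is invented

ks = range(40)     # the tail beyond k = 39 is negligible for lam = 1.5
probs = [poisson_pmf(k, lam) for k in ks]
mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))
expected = [n_samples * poisson_pmf(k, lam) for k in range(6)]

print(f"mean {mean:.3f}, variance {variance:.3f}")
print("expected samples with k = 0..5 mutations:", [round(x, 1) for x in expected])
```

These expected counts are the baseline against which an excess of empty samples and of crowded samples shows up as departure from the idealized model.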

There are some issues with this model on the face of it. First, multiple mutations may emerge at the same time. There’s nothing in nature that prevents this, even if it is theoretically inconvenient. Second, these mutations are embedded in the physical genome, which is arranged sequentially. The favored variant is flanked by a large region of sequence with which it is “linked.” Therefore, as it rises in frequency, it is going to bring along other variants in its sweep through the hitchhiking process. This second phenomenon illustrates the stochasticity of selection itself. Recall that random genetic drift changes allele frequencies due to conventional sampling processes, with greater variation from generation to generation in small populations. But even notionally deterministic forces such as selection, which favor particular alleles in a biased manner, are going to have random effects, because there is no rhyme or reason to which variants happen to sit in their flanking regions.
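That selection itself has a random component can also be shown with a small haploid Wright-Fisher simulation: a single new mutant with relative fitness 1 + s usually still goes extinct, because each generation is a random draw. This sketch assumes NumPy, and the population size, s, and replicate count are illustrative choices; it applies a deterministic selection step followed by binomial sampling, and estimates how often the mutant fixes. For small s the classic approximation for the fixation probability is roughly 2s, far from certainty.

```python
import numpy as np


def fixation_probability(n, s, replicates, seed=0):
    """Fraction of replicate haploid Wright-Fisher populations of size n in
    which a single new mutant with relative fitness 1 + s reaches fixation."""
    rng = np.random.default_rng(seed)
    fixed = 0
    for _ in range(replicates):
        count = 1
        while 0 < count < n:
            p = count / n
            p_selected = p * (1 + s) / (1 + p * s)  # deterministic selection step
            count = rng.binomial(n, p_selected)     # random sampling (drift)
        if count == n:
            fixed += 1
    return fixed / replicates


u = fixation_probability(n=500, s=0.1, replicates=2000)
print(f"estimated fixation probability {u:.3f} (rough 2s approximation: 0.2)")
```

Even with a 10% advantage, most such mutants are lost while still rare, so which beneficial mutations (and which linked passengers) end up fixed is partly a matter of chance.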

A major step forward in this paper is that the authors combined the large population sizes available in a model organism like S. cerevisiae with whole-genome analysis. The latter allows them to pick up favored mutational variants, and also to annotate them and examine their patterns. Basically, they looked at 40 haploid yeast lineages over 1,000 generations (haploid yeast can recombine) where mutations at frequency 0.10 were already known, and performed 100-fold coverage whole-genome sequencing. There were a total of 480 data points, or 12 per population, so samples were around 80 generations apart. Their extensive time coverage allowed for further correction of false positives (sequencing errors), above and beyond the 100-fold coverage. The design seems relatively straightforward and elegant to me, though I would have preferred a bigger range of population sizes, as they compared 14 large (N = 10⁶) populations to 26 small (N = 10⁵) ones. It seems likely that stochastic effects might be more discernible at smaller N than what they looked at, but that’s surely for follow-up papers.

As the curve above illustrates, they found an excess of sampled time points with either zero mutations or more mutations than expected (as defined by a Poisson distribution). In an idealized model like the one I outlined above, you’d have periodic novel mutations, with occasional clusters, tailing off rapidly as you increase the number segregating. These mutations would quickly sweep through the population to fixation. What the results here illustrate is that the real dynamic process seems more dispersed: more samples with no mutations, and more with many mutations, than would be expected. This seems to be underpinned by the fact that in multiple cases combined mutations drive the same sweep. Not only does this increase the probability of fixation, but it also interferes with other mutations sweeping up. Increasing the population size seems to increase the multiple-mutation scenario…but it also results in more interference, so strangely these mutations are less likely to fix in the population! (This is where I would like a larger range in N’s to test how robust this prediction is.)
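An excess of both empty and crowded samples is overdispersion relative to a Poisson process, and the usual quick diagnostic is the variance-to-mean ratio (index of dispersion), which is about 1 for Poisson counts. In the sketch below the “clumped” counts are invented to mimic bursts of cohort mutations; they are not the paper’s data. NumPy is assumed.

```python
import numpy as np


def dispersion_index(counts):
    """Sample variance divided by mean: about 1 for Poisson counts,
    well above 1 when events are clumped."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()


rng = np.random.default_rng(2)
poisson_counts = rng.poisson(1.5, size=480)                         # idealized arrivals
clumped_counts = [0, 0, 0, 0, 0, 6, 0, 0, 5, 1, 0, 0, 7, 0, 0, 1]  # invented

print(round(dispersion_index(poisson_counts), 2))
print(round(dispersion_index(clumped_counts), 2))
```

For a formal test, (n - 1) times the index is approximately chi-square distributed with n - 1 degrees of freedom under the Poisson null.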

There are many empirical results in this work that don’t fall into elegant verbal models which starkly present “A Grand Unified Theory of Evolutionary Genetics.” So I thought I would sidestep quickly into the distribution of favored mutations as illustrated in table 2 of the paper. What you see here are variants found in genes where more mutations were observed than expected by chance, which likely biases the set toward adaptively favored variants. Nonsynonymous mutations are those which change the amino acid. They are more numerous than silent (synonymous) mutations, or those outside of genes (intergenic). You see here the weird pattern of less fixation within large populations, due to interference because there are more favored mutations. Not shown in this table, but 24 genes were “hit” by mutations ≥2 times, across multiple populations. These genes repeatedly targeted by selection had 141 mutations, but only one was synonymous. Of the rest there was an enrichment for frameshift or nonsense mutations, as opposed to missense. The latter alter one amino acid at a time, and many only modify protein function, rather than radically change or abolish it (as is likely in the first two cases). In other words, “driver genes,” which seem very likely to be subject to selection through multiple mutations over and over across populations, tend to have drastic mutations. In addition, depending on functional category (e.g., mating vs. cell wall assembly) there were different proportions of missense vs. nonsense/frameshift mutations.

I have a hard time summarizing this sort of research in a few sentences. In any case, the review of the results here is cursory at best. This is ultimately only the beginning of a huge area of evolutionary genetics that utilizes the power of genomics to test decades-old theories about general patterns. But I wonder if what will be uncovered is that old and stale arguments about stylized verbal models are without deep substance. For years public intellectuals such as Richard Dawkins and Stephen Jay Gould argued about the role of determinism and contingency in evolutionary biology. What results such as the above are telling us is that all this commentary was irrelevant, because both chance and determinism have complex and interleaved roles in evolutionary process. The exact nature of this dance is to be empirically determined, though there is thankfully a robust theoretical scaffold already in place.

Citation: Pervasive genetic hitchhiking and clonal interference in forty evolving yeast populations


Credit: Gross, Liza. “Who Needs Sex (or Males) Anyway?.” PLoS Biology 5.4 (2007): e99.

Bdelloid rotifers get a fair amount of attention because they seem to be a lineage of obligately asexual metazoans (Richard Dawkins discusses them extensively in The Ancestor’s Tale). The fact that they don’t have sex isn’t that big of a deal. Bacteria do not have sex, and they’re quite successful. Rather, the issue is that they don’t have sex, they are complex, and they are successful as a lineage. These do not usually go together. One of the posited explanations is that complex organisms are subject to phenomena such as Muller’s Ratchet, where they begin accumulating a load of deleterious alleles. Sex, with genetic recombination, is a way to evade this process, by mixing and matching alleles. By producing some offspring with more than the expected payload of deleterious alleles, the lineage can slough off unfavorable mutations which might otherwise fix. Without the ability to offload bad mutations, they build up over time, and eventually, one presumes, the lineage would become unviable. This is just one of the myriad reasons biologists give for the long-term lack of success of parthenogenetic metazoan lineages. Sex is ubiquitous among metazoans despite its two-fold cost. That is an overwhelming fact against which stands the example of the bdelloids.
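Muller’s ratchet can be sketched directly. In an asexual population every offspring inherits its parent’s full mutation load plus any new mutations, so once the least-loaded class is lost by chance it cannot be rebuilt without recombination or back mutation. The simulation below assumes a haploid Wright-Fisher population, multiplicative fitness (1 - s)^k for k deleterious mutations, Poisson new mutations with mean u per birth, no back mutation, and NumPy; all parameter values are illustrative. Under mutation-selection balance the least-loaded class is expected to number about N·e^(-u/s), so with these settings it is tiny and the ratchet clicks readily.

```python
import numpy as np


def mullers_ratchet(n=200, u=0.5, s=0.05, generations=500, seed=0):
    """Asexual haploid Wright-Fisher population with multiplicative fitness
    (1 - s)^k and Poisson(u) new deleterious mutations per birth.
    Returns the minimum mutation load in the population each generation."""
    rng = np.random.default_rng(seed)
    loads = np.zeros(n, dtype=int)
    history = []
    for _ in range(generations):
        fitness = (1.0 - s) ** loads
        # Parents are chosen in proportion to fitness (selection plus drift).
        parents = rng.choice(n, size=n, p=fitness / fitness.sum())
        # Offspring inherit the whole parental load; there is no recombination.
        loads = loads[parents] + rng.poisson(u, size=n)
        history.append(int(loads.min()))
    return history


history = mullers_ratchet()
print(history[::100], history[-1])
```

Each increase in the minimum is one click of the ratchet. Because offspring can never carry fewer mutations than their parents here, the minimum can only stay level or rise, which is the irreversibility that recombination (or, per the paper discussed next, other mechanisms) would have to break.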

A new letter in Nature, Genomic evidence for ameiotic evolution in the bdelloid rotifer Adineta vaga, proposes to answer the enigma of bdelloid rotifers by an analysis of their genomes. From what I can gather, the authors found that the genome of this bdelloid rotifer is a total mess. The arrangement of individual gene copies is so chaotic that the proper pairing of homologous chromosomes necessary for meiosis, the conventional precursor to sexual reproduction, cannot occur. In other words, there seems to be a structural barrier to sex. This goes to show that evolution is not just a chemistry-blind fitness optimization process. The very structure of the genetic substrate seems to militate against the reemergence of sexuality, closing off possible paths toward adaptation and fitness optimization. The adaptive landscape is most certainly not flat.

So how do these rotifers persist, given what we know about asexual multicellular lineages (that they are not long for this world)? The two primary answers put forth in this paper seem to be gene conversion and horizontal gene transfer (e.g., from bacteria via viruses). In gene conversion, deleterious alleles are presumably simply reverted to the wild-type state. Meanwhile, horizontal gene transfer likely results in a persistent infusion of new variation into a species which is otherwise a proliferation of clonal lineages.

These two phenomena seem almost banal and obvious ways in which rotifers might avoid the pitfalls of their structurally enforced asexuality. This prompts me to ask why more organisms don’t exhibit this profile. Many lineages have parthenogenetic branches, but few of them become as wildly successful as bdelloid rotifers. It would be curious if other sexual metazoans were found with the genomic features of bdelloids. Perhaps there are lineages which manage to enter into meiosis despite the hurdles which bdelloids have circumvented. Of course, for that we’ll need many more “genome papers.”

• Category: Science • Tags: Evolution, Evolutionary Genetics 
Central Dogma

One of the elementary aspects of understanding genetics on a biophysical scale is characterizing the set of processes which span the chasm between the raw sequence information of base pairs (e.g., AGCGGTCGCAAG…) and the assorted macromolecules which are woven together to create tissues, and enable the physiological processes, which result in the organism. This suite of phenomena is encapsulated most succinctly in the often maligned Central Dogma of Molecular Biology. In short, the information of the DNA sequence is transcribed into RNA and translated into proteins, though for greater accuracy and precision one must always add caveats for phenomena such as splicing. The range of processes is so baroque that molecular genetics has become a massive enterprise, to a great extent superseding classical Mendelian genetics.

One critical structural detail from an evolutionary perspective is that the amino acids which are the building blocks of proteins are generally encoded by multiple nucleotide triplets, or codons. For example, the amino acid glycine is “four-fold degenerate”: GGA, GGG, GGC, and GGU all encode it (in RNA, uracil, U, substitutes for the thymine, T, of DNA). Notice that the variation is confined to the third position of the codon. Altering the first or second position would change the amino acid, and possibly perturb the function of the final protein (or, in some cases, introduce a premature stop codon and halt translation altogether). Third-position changes at such sites are synonymous substitutions because they don’t change the functional import of the sequence, as opposed to nonsynonymous changes (which may abolish or alter function). In an evolutionary context one may presume that synonymous substitutions are “silent.” Because natural selection operates upon heritable variation in phenotype, and synonymous substitutions presumably do not change phenotype, it is often assumed that evolutionary change at these bases is selectively neutral. In contrast, nonsynonymous changes may be deleterious or beneficial (far more likely the former than the latter, because breaking contingent complexity is easier than creating new contingent complexity). Therefore the ratio of nonsynonymous to synonymous change per site (dN/dS) across lineages has been a common measure of possible selection on a gene.
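A small Python sketch of the degeneracy described above, using the standard genetic code (NCBI translation table 1) written in RNA letters to match the GGU notation. The table layout and helper names are my own illustrative choices. It checks that every third-position change in a GG- codon still yields glycine, while a first-position change (GGA to AGA) yields arginine, and it lists the eight codon families that are four-fold degenerate.

```python
BASES = "UCAG"
# Standard genetic code (NCBI translation table 1). Amino acids are listed with the
# first, second, and third codon positions each cycling through U, C, A, G;
# '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def is_synonymous(codon_a, codon_b):
    """True if two codons encode the same amino acid (or are both stops)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


def fourfold_degenerate_prefixes():
    """First-two-base prefixes for which any third base gives the same amino acid."""
    prefixes = {codon[:2] for codon in CODON_TABLE}
    return sorted(p for p in prefixes
                  if len({CODON_TABLE[p + b] for b in BASES}) == 1)


if __name__ == "__main__":
    print([CODON_TABLE["GG" + b] for b in BASES])  # glycine for all four
    print(is_synonymous("GGA", "GGU"))             # third-position change
    print(is_synonymous("GGA", "AGA"))             # first-position change: Gly vs Arg
    print(fourfold_degenerate_prefixes())
```

Restricting attention to third positions in these eight families is one simple way to isolate putatively neutral sites; formal dN/dS estimators go further and normalize by the number of synonymous and nonsynonymous sites.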

At this point I have sketched out, in the most superficial sense, a set of propositions which span the concrete physical realm of the biochemical mechanics of DNA to the abstract formal evolutionary genetic models which outline the trajectory of allele frequencies over time and space. But propositions are always embedded in axioms, and those axioms may not always be literally true. For example, some codons which are notionally equivalent in terms of their amino acid output are favored due to biases derived from the differing efficiencies of the cell’s translational machinery. After a fashion this too is natural selection, but it does not manifest via the fitness of individual organisms at some stage of life history in a straightforward fashion. Then there are cases where synonymous mutations change a regulatory pathway in a significant manner. And so on. Despite all these deviations from the ideal, presumably most researchers accepted that the utility of a neutral framework for synonymous mutations justified the prior assumption that they were not subject to selection.

A new paper in PLoS GENETICS, Strong Purifying Selection at Synonymous Sites in D. melanogaster, takes aim at the robustness of this axiom by highlighting the likelihood that many synonymous positions in Drosophila are subject to strong purifying selection. That is, a putatively silent mutation produces significant functional differences which result in a major decrease in the fitness of the organism, removing the mutant alleles from the pool of polymorphisms. Note the key qualifier here: the selection is strong. Dynamics such as codon bias and regulatory differences mean that many would acknowledge weak, gentle purifying selection even on synonymous sites. These authors contend something rather more radical.

Figure 1C

To be frank, the paper is rather abstruse and dense in its prose, though impressive in its disciplinary breadth, ranging from statistical genetics to developmental biology. But the core result can be boiled down to raw counts of SNPs. In particular, they used introns, which like synonymous sites are putatively neutral because they are spliced out of the mature RNA transcript that is translated into protein, as a reference against which to check their sites of interest. Though it is subtle, you can observe in the panel at the top of this post that there seems to be some deviation from neutrality at the 4D (four-fold degenerate) sites. It is clearer in the second panel above. The synonymous sites seem to have less genetic variation than they should. This is a tell for purifying selection, which continuously removes low-frequency deleterious mutations from the population. But why is this strong selection? The issue highlighted by the authors is that the data sets from previous research were simply not dense and rich enough to distinguish between strong and weak purifying selection, as on a coarser scale of analysis the effects would be rather similar. In contrast, here the authors used more than 100 Drosophila lines and assembled nearly 1 million 4D SNPs. With such deep sampling of the population they were able to probe even small differences, as strong selection would be discernibly more effective in flushing out very low-frequency alleles (consider that in smaller samples rare variants are simply likely to be missed).
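The sampling point in the last parenthetical can be quantified with a back-of-the-envelope calculation. This sketch assumes each sampled line contributes one independently drawn chromosome and that the variant sits at a hypothetical population frequency of 1 percent; it is not the authors’ inference method, which fits models to the whole frequency spectrum.

```python
def detection_probability(freq, n_sampled):
    """Probability that a variant at population frequency `freq` appears at least
    once among `n_sampled` independently drawn chromosomes (binomial sampling)."""
    return 1.0 - (1.0 - freq) ** n_sampled


if __name__ == "__main__":
    # hypothetical 1 percent variant, increasing sample sizes
    for n in (10, 50, 100, 200):
        print(n, round(detection_probability(0.01, n), 3))
```

With ten chromosomes such a variant is usually missed (under 10 percent detection), while with around a hundred it is seen most of the time, which is why rare-variant signatures distinguishing strong from weak purifying selection only become visible in large samples.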

Since this is a paper in PLoS GENETICS it is free for all to read, so I will spare you the gory details of how they corrected for biases in GC content, possible selective sweeps distorting the signal in flanking regions, etc. They were able to use resampling techniques to confirm the robustness of their inferences, though the slicing of the data into numerous categories does concern me a bit. Additionally, there is mention of utilizing “parsimony,” which is somewhat concerning, in particular because the authors even concede that this may produce false conclusions. But the big-picture result is rather impressive even if the details have a daunting number of moving parts. I should mention as well that they explored the possible role that codon bias might have in generating this pattern, and it does not seem a likely explanation (in particular because purifying selection seems to affect both optimal and non-optimal codons). And there were some rather strange results too, such as their finding that purifying selection was weaker on the X chromosome than on the autosomes (contrary to my expectation, and I think theirs).

The “back end” of the paper is different in that it analyzes the functional and developmental aspects of the genomic regions of interest (4D sites). They report, for example, that purifying selection is operating on conserved sites across Drosophila species. Not surprising. But there also seems to be a significant amount of substitution across lineages at sites which are subject to purifying selection within lineages. This hints at gain-of-function mutations which distinguish Drosophila species. Finally, there are also broad patterns in the temporal distribution of gene expression as they relate to strongly conserved 4D sites. As I am not well versed in developmental biology I will leave that to others, though the results seem suggestive, if opaque to me.

One paper does not overthrow 40 years of molecular evolution. And even if some of the primary assumptions and results validating neutral theory are wrong, that does not negate the utility of neutrality as a null hypothesis. But if synonymous sites, taken as a benchmark for neutrality, have been subject to strong purifying selection all the while, then our understanding of the balance of forces shaping the evolutionary genetic history of Drosophila may be quite wrong. The qualifier about Drosophila is, I think, warranted, because from what I recall, earlier results reported ubiquitous selection in this model organism, and that may not hold for all taxa. The authors make the case for the generality of their results, and they may be right, but I think one should be more cautious about such claims. What this does tell us is that modern genomics and the scaling up of data are not just revealing nature on a finer scale, but may actually be smoking out structure and patterns which have long been hiding in plain sight.

Citation: Lawrie DS, Messer PW, Hershberg R, Petrov DA (2013) Strong Purifying Selection at Synonymous Sites in D. melanogaster. PLoS Genet 9(5): e1003527. doi:10.1371/journal.pgen.1003527

Credit: Nature (2013) doi:10.1038/nature12228

Every now and then I’m asked about the ‘aquatic ape hypothesis’. My standard response is that there’s nothing to see, and everyone should just move on. But the hypothesis crossed my mind again while I was reading a new (open access) paper in Nature, Great ape genetic diversity and population history. The reason is this section of the legend of figure 1: “The Sanaga River forms a natural boundary between Nigeria–Cameroon and central chimpanzee populations whereas the Congo River separates the bonobo population from the central and eastern chimpanzees.” I knew of the latter division. The former was novel to me. In fact I’d never even heard of the Sanaga River prior to this paper. Though the Congo is clearly a significant geological and hydrological entity, I’m not quite so sure about the Sanaga. The division between the chimpanzees of Nigeria-Cameroon and those of the western Congo region may have an overdetermined number of causes. Nevertheless, taking these riverine features as given parameters in generating allopatric speciation and subspecies-level differences, I am struck by the contrast between ourselves and our cousins. In particular, the phylogeny above seems to imply that bonobos and common chimpanzees diverged on the order of ~2 million years ago, while the Nigeria-Cameroon population separated from the western Congo population ~500,000 years before the present (depending on the method of inference you rely on, though the qualitative insight here is preserved even if you switch them around). Though it took H. sapiens sapiens to break out of the world island of Afro-Eurasia, even our erectine cousins pushed on toward the southeastern extremities of Eurasia over 1 million years ago. It seems, then, that our savanna ape lineage is characterized by wanderlust and a lack of fear of water.

Overall this paper is important because it has reasonable coverage both in terms of sequencing depth (more than 10x, that is, reading the same position of the genome more than 10 times) and populations (sampling a wide array of great ape subspecies, N = 79). In the future this sort of analysis will probably be ho-hum: slicing apart a rich data set with a Swiss army knife of statistical genomic techniques (see the supplementary PDF). If you are interested in the complex population history of chimpanzees this paper is for you, as it explores changes in population size, admixture, and gene flow. For me there are two big results which are of particular importance.
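Coverage figures like “more than 10x” are averages. Under the classic Lander-Waterman assumption that reads land uniformly at random, the depth at any one position is roughly Poisson distributed. The Python sketch below, with hypothetical read counts and genome size, computes mean depth and the fraction of positions reaching a given depth; real data deviate from this idealization because of GC bias and repeats.

```python
import math


def mean_coverage(n_reads, read_length, genome_length):
    """Average sequencing depth: total sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length


def fraction_at_least(k, depth):
    """Poisson (Lander-Waterman) approximation: fraction of positions read >= k
    times when reads fall uniformly at random with mean depth `depth`."""
    below_k = sum(math.exp(-depth) * depth ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


if __name__ == "__main__":
    # hypothetical run: one million 100-base reads on a 10-megabase genome
    depth = mean_coverage(1_000_000, 100, 10_000_000)
    print("mean depth:", depth)
    print("covered at least once:", round(fraction_at_least(1, depth), 5))
    print("covered at least 10 times:", round(fraction_at_least(10, depth), 3))
```

At a mean of 10x nearly every position is read at least once, but only about half are read 10 or more times, which is one reason higher average coverage matters for calling heterozygous sites confidently.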

First, they did not find that loss-of-function substitutions were enriched in the human lineage. This matters because geneticists are still poking around for the “gene(s) which made us human,” or, more charitably, the evolutionary processes specific and distinctive to our species which resulted in our uniqueness. I don’t deny we’re a unique species, but I suspect this quest will continue for a while longer. That’s because it may be that there is nothing singularly unique about our genetics or our evolutionary origins. In fact I wouldn’t be surprised if researchers conclude that there were general trends which applied to all of Homo over the past 2 million years, rather than only to the subset of Homo who are the ancestors of all living humans today (since other lineages went extinct).

Second, notice how varied the range of genetic diversity is across the other Hominidae. This has been moderately well known for a while, but the chestnut that humans are particularly genetically homogeneous as great apes go suffers from the problem of aggregating different populations and taking the average. The range of heterozygosity in chimpanzees is huge. Some human populations are actually more genetically diverse than several of the other ape populations. Additionally, notice the rather deep population divergences of these ape subspecies. The bonobo-chimpanzee split occurred as Homo came into being, but among the common chimpanzee subspecies the lineages began to diverge at about the same time as archaic Homo populations did. In plain English, it seems that population structure with the same time depth as that separating Neandertals from African humans is present among common chimpanzees.

In the near future this sort of analysis will branch out to other mammalian taxa. But as usual, apes first, and others later.

• Category: Science • Tags: Evolution, Evolutionary Genetics 

Credit: Dan Reeves

The Y chromosome is strange. It’s gene poor and loaded with repeats. That’s one reason mtDNA phylogenetic and phylogeographic analysis preceded that of the Y chromosome by about 10-15 years (the other major reason in the pre-PCR age is that mtDNA is present in many copies per cell). While the hypervariable region of mtDNA is an excellent molecular clock because of its high mutation rate (though at a deep enough time depth this causes problems, as bases start to turn over repeatedly), in the pre-next-generation sequencing era hunting around the Y chromosome for SNPs was tedious (a significant portion of Spencer Wells’ Journey of Man focused on the nitty gritty of extraction and preparation).
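The problem of bases turning over is the problem of multiple hits: once a site has changed twice, or changed and reverted, the observed difference undercounts the true number of substitutions. The simplest correction is the Jukes-Cantor distance, sketched below in Python. It assumes equal base frequencies and a single rate for all sites, which is a poor fit for the mtDNA hypervariable region, so treat it as an illustration of saturation rather than a model one would apply to real mtDNA data.

```python
import math


def jukes_cantor(p_observed):
    """Jukes-Cantor estimate of substitutions per site from the observed
    proportion of differing sites, correcting for multiple hits."""
    if not 0.0 <= p_observed < 0.75:
        raise ValueError("observed divergence must lie in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p_observed / 3.0)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.3, 0.5, 0.7):
        print(p, round(jukes_cantor(p), 3))
```

As observed divergence approaches 0.75 the corrected distance grows without bound, which is the saturation that limits a fast clock at deep time depths.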

Despite all this, one of the weirder stories of the past decade in relation to the Y chromosome is the peculiar theory promoted by Oxford geneticist Bryan Sykes and outlined in his book Adam’s Curse: A Future without Men. As I observed above, the Y chromosome has a tendency to fill up with genetic garbage (since it does not recombine, deleterious mutations tend to accumulate). There are a few important functional regions (e.g., SRY), but there’s also a reason that sex-linked diseases occur: in most cases males have to rely on the X chromosome to pick up the slack for the Y. Extrapolating this genetic decay, Sykes posited that human males would disappear within ~10 million years as the process worked out its inevitable logic. Needless to say, most scientists were skeptical. Extrapolating without checking whether the projections pass the sniff test is a fool’s errand. And in any case there’s no Law of Nature that sex determination has to be via the Y chromosome. Birds and reptiles have males despite different sex determination systems.

Rather than pass judgment on the Y, it is more appropriate to treat it as an area of investigation. Because it is enriched with repeats (hard to sequence) and poor in interesting genes (so why bother), to some extent it has received short shrift outside of its role as a molecular marker via microsatellites (with exceptions for those interested in specific genes such as SRY and ZFY). A new paper in Genome Research attempts to catalog and survey the evolutionary history of Y chromosomes in a select number of mammalian taxa. Comparative analysis of mammalian Y chromosomes illuminates ancestral structure and lineage-specific evolution:

Although more than thirty mammalian genomes have been sequenced to draft quality, very few of these include the Y chromosome. This has limited our understanding of the evolutionary dynamics of gene persistence and loss, our ability to identify conserved regulatory elements, as well our knowledge of the extent to which different types of selection act to maintain genes within this unique genomic environment. Here we present the first MSY (male-specific region of the Y chromosome) sequences from two carnivores, the domestic dog and cat. By combining these with other available MSY data, our multi-ordinal comparison allows for the first accounting of levels of selection constraining the evolution of eutherian Y chromosomes. Despite gene gain and loss across the phylogeny, we show the eutherian ancestor retained a core set of 15 MSY genes, most being constrained by negative selection for nearly 100 million years (My). The X-degenerate and ampliconic gene classes are partitioned into distinct chromosomal domains in most mammals, but were radically restructured on the human lineage. We identified multiple conserved non-coding elements that potentially regulate eutherian MSY genes. The acquisition of novel ampliconic gene families was accompanied by signatures of positive selection, and has differentially impacted the degeneration and expansion of MSY gene repertoires in different species.

It is not surprising that genes related to testis function seem to be constrained or under selection in these organisms. The raison d’être of the Y chromosome is in some ways the testes. Rather, what’s not so surprising, but striking, are the radical divergences in the genomic architecture of the Y chromosome across these mammalian lineages. A figure in the paper illustrates this well.

Looking at these results, rough as they are (I have to wonder if the greater apparent structural complexity of the human Y chromosome is due to the greater resources allocated to sequencing it compared with other organisms), it seems clear that the reason Sykes was wrong is that evolution does not proceed in a linear fashion on the Y chromosome. There have been very different trajectories across these lineages. Some of this may be due to lineage-specific selective pressures (e.g., r vs. K selected species and their relation to sperm competition). Others may simply be stochastic. The Y chromosome is subject to greater drift (it has a smaller effective population size) than the rest of the genome. Whatever the details, there’s a broader story to tell.
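The claim that the Y is subject to more drift follows from counting gene copies. Under idealized Wright-Fisher assumptions with an even sex ratio, each male carries one Y while every individual carries two copies of each autosome, so the Y has about a quarter of the autosomal effective number of copies. The Python sketch below uses hypothetical population sizes and assumes Poisson variance in reproductive success; in real populations high variance in male mating success would shrink the Y’s effective size further.

```python
def effective_gene_copies(n_males, n_females):
    """Effective number of gene copies under idealized Wright-Fisher assumptions.

    Autosomes: 2 * Ne, with Ne = 4 * Nm * Nf / (Nm + Nf) (Wright's sex-ratio formula).
    Y chromosome: one copy per male, i.e. Nm copies (assumes Poisson male success).
    """
    ne = 4 * n_males * n_females / (n_males + n_females)
    return {"autosome": 2 * ne, "Y": float(n_males)}


def diversity_after(generations, n_copies, initial=1.0):
    """Expected gene diversity under drift alone: H_t = H_0 * (1 - 1/n) ** t."""
    return initial * (1.0 - 1.0 / n_copies) ** generations


if __name__ == "__main__":
    copies = effective_gene_copies(500, 500)  # hypothetical population
    print(copies)
    for locus, n in copies.items():
        print(locus, round(diversity_after(1000, n), 3))
```

With 500 males and 500 females the Y carries 500 effective copies versus 2,000 for an autosome, so drift erodes Y-linked diversity about four times faster.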

• Category: Science • Tags: Evolution, Evolutionary Genetics, Y Chromosome 

Frank analytic clarity?

Sexual selection is a big deal. A few years ago Geoffrey Miller wrote The Mating Mind: How Sexual Choice Shaped the Evolution of Human Nature, which seemed to herald a renaissance of public awareness of this evolutionary phenomenon, triggered in part by debates over Amotz Zahavi’s Handicap Principle in the 1970s. Of course Charles Darwin discussed the process in the 19th century, and it has always been part of the arsenal of the evolutionary biologist (I first encountered it in Jared Diamond’s The Third Chimpanzee, where he lent some credence to Darwin’s supposition that human racial differences may be a consequence of sexual selection). But this bump in recognition for sexual selection seems to have been accompanied by its co-option as a deus ex machina for all sorts of unexplained phenomena. And yet, as they say, that which explains everything explains nothing.

To get a better sense of the current scientific literature I consulted A Guide to Sexual Selection Theory in the Annual Review of Ecology, Evolution, and Systematics. The image above is from an actual box in this review! Normally technical boxes illuminate with an air of superior authority (e.g., “it therefore follows from eq. 1…”), but it seems to me that the admission that a parameter can be represented by the verbal assertion that it’s complicated tells us something about the state of sexual selection theory. In short: its formal basis is baroque because the dynamic itself is not amenable to easy decomposition.

Not just for the peacocks
Credit: George Biard

First, for those who are unfamiliar with the topic, sexual selection theory comes in several flavors. As the term implies, sexual selection emerges from differential fitness due to the preferences of individuals for various favored traits. I will admit beforehand that my personal preference is that sexual selection not be so artificially detached from natural selection more broadly, but discussions of the topic usually draw such strong distinctions, so I won’t make too much of a fuss about that.

Perhaps the most obvious area of difference is that there are forms of sexual selection where there is no strong exogenous fitness implication. By this, I mean that the adaptive value of the favored trait is not proportional to its selective value (note: the trait may not necessarily be totally neutral initially; one could imagine non-sexual preferences which triggered subsequent sexual dynamics). This is at the heart of the Fisherian runaway process. The basic principle is that if there is a correlation between a trait which is preferred and the preference for that trait, then the two will amplify each other’s fitness and rapidly sweep up in frequency within the population. A simple illustration will suffice. Imagine that within a bird population a subset of females prefers longer beaks. There is normal variation within the population for beak length, which implies that the fitness of shorter- and longer-beaked individuals is not so different. If a subset of females prefers longer beaks, then males with longer beaks will have higher fitness, because they have reproductive access to all the females, while those with shorter beaks only have access to those females who do not exhibit a preference. In the next generation there will be a correlation between longer beaks (from the fathers) and the preference for longer beaks (from the mothers). Because of the correlation, there is now also selection for the preference as a byproduct of selection for longer beaks! This means that selection for longer beaks is greater, and therefore selection for the preference is greater, and so forth.
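The beak example can be written as a haploid two-locus model: one locus for beak length expressed in males, one for female preference. The Python sketch below illustrates that logic; it is not a model taken from the review, and the parameter values (choosy females weight long-beaked males three-fold, long-beaked males pay a 10 percent survival cost, free recombination) are hypothetical. Starting with no association between the alleles, one generation of choosy mating creates positive linkage disequilibrium between the long-beak and preference alleles, which is the correlation described above.

```python
from itertools import product

HAPLOTYPES = list(product((0, 1), repeat=2))  # (beak allele t, preference allele p)


def next_generation(freqs, a, s, r=0.5):
    """One generation of a haploid two-locus model of female choice (illustrative).

    freqs maps (t, p) -> haplotype frequency. t=1 is the long-beak allele, expressed
    only in males; females carrying p=1 weight long-beaked males by a factor a.
    Long-beaked males survive to mate with probability 1 - s. r is the
    recombination fraction between the two loci.
    """
    # viability selection acts on males only
    males = {h: f * (1 - s * h[0]) for h, f in freqs.items()}
    total = sum(males.values())
    males = {h: f / total for h, f in males.items()}

    offspring = dict.fromkeys(HAPLOTYPES, 0.0)
    for mom in HAPLOTYPES:
        # choosy females (p=1) up-weight long-beaked males; others mate at random
        weights = {dad: males[dad] * (a if mom[1] == 1 and dad[0] == 1 else 1.0)
                   for dad in HAPLOTYPES}
        norm = sum(weights.values())
        for dad in HAPLOTYPES:
            pair = freqs[mom] * weights[dad] / norm
            # parental haplotypes with prob (1 - r)/2 each, recombinants r/2 each
            for child, prob in ((mom, (1 - r) / 2), (dad, (1 - r) / 2),
                                ((mom[0], dad[1]), r / 2), ((dad[0], mom[1]), r / 2)):
                offspring[child] += pair * prob
    return offspring


def summarize(freqs):
    """Return (long-beak frequency, preference frequency, linkage disequilibrium D)."""
    t = freqs[(1, 0)] + freqs[(1, 1)]
    p = freqs[(0, 1)] + freqs[(1, 1)]
    d = freqs[(1, 1)] * freqs[(0, 0)] - freqs[(1, 0)] * freqs[(0, 1)]
    return t, p, d


if __name__ == "__main__":
    freqs = dict.fromkeys(HAPLOTYPES, 0.25)  # t = p = 0.5, no initial association
    for gen in range(1, 6):
        freqs = next_generation(freqs, a=3.0, s=0.1)
        print(gen, ["%.4f" % x for x in summarize(freqs)])
```

In the first generation the long-beak allele rises while the preference allele stays at its starting frequency; from the second generation the preference begins to rise too, carried along by its new association with the favored trait.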

Credit: Doug Janson.

This dynamic is a byproduct of the structural factors inherent in sexual reproduction: in particular, dimorphism between the sexes and the importance of selection in mate choice. The Fisherian process is rapid, it is arbitrary, and it is likely subject to oscillations as it is kept in check by other evolutionary forces. In the example above, continuous selection for long beaks would obviously have some deleterious consequences as natural selection began to take its toll. At that point, no matter how “sexy” long-beaked sons were, it would all be for naught if they weren’t even viable. This sort of sexual selection predicts a constant bubbling up of morphological diversity over space and time.

Another sexual selection framework where fitness is a consequence of indirect forces is sensory bias. Again, an example will suffice. Imagine birds which are frugivores. In this situation there will be a natural preference for bright and vibrant colors, because those are the colors of the main food item, fruit. Females may naturally prefer individuals with the same vibrant colors as their primary food item (this may even be selectively beneficial, as it indicates a strong preference for high-quality food). As with the Fisherian process above, this can obviously come at a cost. Bright fruit want to be eaten. Bright animals do not.

Credit: Pavel Riha

This highlights again the fact that, over and over, sexually selected traits may not be beneficial in the conventional adaptive sense. They may even be a detriment to fitness! This is also an observation of the Handicap Principle, though it turns the logic on its head at the end of the game. Its counter-intuitive thesis is that costly signals in fact indicate that an organism is extremely fit. The underlying reason is that costly signals are by their nature honest. Massive antlers, for example, take a great deal of biological energy to produce and maintain, and they may also make one more vulnerable to predators. Only the most superior individuals could incur such costs! The relationship here to Thorstein Veblen’s idea of “conspicuous consumption” is so obvious that I won’t bother to elaborate on it. Crazy as it may sound, from what I can tell the Handicap Principle has now come to be accepted by many biologists (Richard Dawkins, for example, has done an about-face on the theory).

The Handicap Principle is arguably a “good genes” model of sexual selection. Unlike Fisherian runaway or sensory bias, the preference is rooted in the genuine fitness of the individual as evaluated by external metrics (at least in the indirect sense of genetic health). Theories of beauty in evolutionary psychology are often implicitly predicated on this model, where high symmetry and extreme secondary sexual characteristics suggest few deleterious mutations interfering with the idealized development of the individual. The explanations for why larger size in males and larger breasts and buttocks in females might signal fitness are also so obvious in comparison to something like Fisherian runaway that many people find direct benefit models more plausible as well. That is, not only do these traits signal good genes, but they confer immediate benefits for survival and function.

But plausibility does not lead us toward the truth in all cases. Sexual selection models explicated in verbal terms often tend toward circularity and confusion. A concrete thought experiment could run like so. You have a population where females prefer attractive males (e.g., males with more vibrant plumage). But the fitness of the females (in particular, the survivorship of their offspring) also depends upon their mates provisioning supplementary resources. One can easily imagine a scenario where promiscuous attractive males and monogamous less attractive males converge upon the same equilibrium fitness because of heterogeneity in female mate choice. Some females may opt for “cads,” who stray and invest little in their offspring, even though those offspring are of high genetic quality. Other females may opt for “dads,” males who have lower genetic quality but remain more invested in their smaller number of offspring. These offspring may have higher survivorship because of the added investment. Verbal elaborations of sexual selection never seem to give a “final answer,” because there is always an “on the other hand.”

And this is why I wanted to review the available literature. Unfortunately I gained little extra clarity, as the formalism above implies. The authors suggest there are four primary avenues by which sexual selection is explored: population genetics, quantitative genetics, invasion approaches, and individual-based simulations. I am not particularly familiar with ‘invasion approaches,’ though in its broad outlines it seems similar to the quantitative genetic method. The population genetic methods are powerful because they start from first principles and explicitly model parameters such as linkage. But there are limits to the analytic tractability of complex phenomena such as sexual selection in population genetic models; for example, multilocus approaches tend to be difficult. The quantitative genetic methods make the standard assumption of normal distributions for traits, and are gene blind (they look at the phenotype). They seem a nice complement to the population genetic methods, and are often useful in more practical field research. Finally, the simulation approach suffers from a lack of computational power to explore the whole parameter space.
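For readers unfamiliar with the quantitative genetic approach, its workhorse is Lande’s multivariate breeder’s equation, Δz̄ = Gβ, where G is the additive genetic variance-covariance matrix and β the vector of selection gradients. The Python sketch below uses hypothetical numbers for a male ornament and a female preference and is meant only to show how the framework operates. Even with no direct selection on the preference, it responds through its genetic covariance with the ornament, which is the quantitative genetic version of the runaway correlation.

```python
def multivariate_response(G, beta):
    """Lande's multivariate breeder's equation: change in mean phenotype = G @ beta.

    G:    additive genetic (co)variance matrix, given as a list of rows
    beta: selection gradients, one per trait
    """
    return [sum(g * b for g, b in zip(row, beta)) for row in G]


if __name__ == "__main__":
    # hypothetical values: trait 0 = male ornament, trait 1 = female preference
    G = [[1.0, 0.3],
         [0.3, 1.0]]
    beta = [0.2, 0.0]  # direct selection on the ornament only
    print(multivariate_response(G, beta))  # the preference shifts via the covariance
```

With a single trait this reduces to the familiar R = h²S. Note that the method is gene blind: nothing in G says which loci are involved.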

In relation to the simulation approach, last year a phylogeneticist told me that 15 years ago researchers assumed they could never operationalize maximum likelihood models in their lifetimes. Of course today ML-based packages are the ‘fast’ strategies relative to the more heavy-duty Bayesian frameworks in phylogenetics. I point this out because I have faith that simulation may be the ultimate way to understand sexual selection over the long run, supplemented by the other methods as scaffolds to reduce the parameter space. We may not be able to explore the whole space of possibilities, but that is the nature of science.

My primary concern with the formal models outlined in the review is that many of them assume weak selection. This is a feature of many population genetic models (e.g., see W. D. Hamilton’s original work on inclusive fitness), but from the perspective of evolutionary genomics some of the most fascinating possibilities for sexual selection involve strong selection. For example, many researchers appeal to sexual selection to explain the pigmentation complex of European populations, but more and more evidence suggests that these loci have been subject to relatively strong selection. Is this plausible for sexual selection? Do we even know how strong sexual selection might be? Fisherian runaway is an obvious candidate, but this process is so rapid, and so protean, that it seems unlikely.
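To put rough numbers on “weak” versus “strong,” here is a deterministic Python sketch for a haploid allele with relative fitness 1 + s, ignoring drift and dominance. The log-odds of the allele’s frequency grow by log(1 + s) each generation, so the sweep time has a closed form; the iterated recursion is included as a check. The starting and ending frequencies are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a frequency p."""
    return math.log(p / (1.0 - p))


def generations_closed_form(p0, p_target, s):
    """Generations for a favored haploid allele (relative fitness 1 + s) to go
    from p0 to p_target, ignoring drift: log-odds rise by log(1 + s) per generation."""
    return (logit(p_target) - logit(p0)) / math.log1p(s)


def generations_iterated(p0, p_target, s):
    """Same quantity by iterating p' = p (1 + s) / (1 + s p) until p >= p_target."""
    p, t = p0, 0
    while p < p_target:
        p = p * (1.0 + s) / (1.0 + s * p)
        t += 1
    return t


if __name__ == "__main__":
    for s in (0.01, 0.1):
        print(s, round(generations_closed_form(0.01, 0.99, s), 1),
              generations_iterated(0.01, 0.99, s))
```

Going from 1 to 99 percent takes roughly 97 generations at s = 0.1 but over 900 at s = 0.01, so the difference between weak and strong selection is an order of magnitude in timescale.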

A major long-term problem with sexual selection theories is that they seem to imply oscillatory dynamics, when equilibria are easier to digest (and traditionally many classical models are oriented toward solving for equilibria). This is why models of positive natural selection are so straightforward: they have a beginning and an end. This does not seem to be the case for more realistic sexual selection models. Rather than giving a specific answer to a given biological question, sexual selection theory may be more useful as a way to explain the constant background flux of evolution. At this point I am not convinced that it is robust enough to give us good “rough and ready” rules of thumb which we can apply as a sieve upon the welter of evolutionary genomic results.

But progress is being made, and in concert with fields like game theory and computer science I suspect that the future is going to be bright.

Credit: PartnerHund

It’s an exciting time for those interested in the evolutionary genomics of the dog. In 2010 a big SNP-array paper came out, Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication. Today we’re going whole genome, which is important because many of the SNP-arrays are ascertained on domestic dogs (i.e., they are designed to pick up dog variation, and so may distort our perception of the variation in wolves). Recently I talked about an analysis of the evolutionary genomics of the dog, The genomics of selection in dogs and the parallel evolution between dogs and humans. The main result of interest from that group was to push the divergence of the dog and wolf lineages further back in time, to ~30,000 years, in line with some archaeological and mtDNA finds. I did not find their arguments for an East Asian origin of the dog convincing. Now a new preprint on arXiv, Genome Sequencing Highlights Genes Under Selection and the Dynamic Early History of Dogs, pushes this even further.
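
The ascertainment problem with dog-designed SNP arrays can be illustrated with a toy simulation. Everything here is a hypothetical assumption: wolf allele frequencies are uniform, a dog "bottleneck" leaves half of sites at the wolf frequency and drives the rest to loss or fixation, and the "chip" consists of sites seen as polymorphic in a 10-chromosome dog discovery panel. Comparing the wolf/dog heterozygosity ratio over all sites with the ratio over chip sites shows how dog-ascertained markers hide variation that exists only in wolves.

```python
import random


def het(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2 * p * (1 - p)


def ascertainment_demo(n_sites: int = 20_000, panel: int = 10, seed: int = 1) -> tuple[float, float]:
    """Return the wolf/dog heterozygosity ratio over all sites and over 'chip' sites."""
    rng = random.Random(seed)
    all_sites, chip_sites = [], []
    for _ in range(n_sites):
        wolf = rng.random()  # hypothetical wolf allele frequency
        # Toy bottleneck: half of dog sites keep the wolf frequency,
        # the rest drift to loss (0.0) or fixation (1.0).
        dog = wolf if rng.random() < 0.5 else float(rng.random() < wolf)
        all_sites.append((wolf, dog))
        # "Chip" = sites seen as polymorphic in a small dog discovery panel.
        copies = sum(rng.random() < dog for _ in range(panel))
        if 0 < copies < panel:
            chip_sites.append((wolf, dog))

    def ratio(sites: list[tuple[float, float]]) -> float:
        return sum(het(w) for w, _ in sites) / sum(het(d) for _, d in sites)

    return ratio(all_sites), ratio(chip_sites)


if __name__ == "__main__":
    print(ascertainment_demo())
```

In this toy setup wolves carry about twice the dogs' heterozygosity genome-wide, but on the chip the two look identical, because the only sites that make it onto the array are those where dogs retained the ancestral variation.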

First, after reading the paper I recommend the comments at Haldane’s Sieve, where the authors engage in some back and forth. Second, one of the authors put the supplementary information on Dropbox, so you can get that too. I highly recommend it in particular, because it has detailed methods, code, and a concise but useful explanation of concepts such as D-statistics. Overall the paper breaks down into two broad themes: phylogenetics, and the analysis of adaptation and selection. The authors sequenced an Israeli, a Croatian, and a Chinese wolf, as well as a Basenji, a Boxer, and a Dingo, all at high coverage. For the primary analysis the sample sizes were N = 1, but that is not a major issue, as the repeated coverage gave them extremely accurate and precise estimates of the polymorphism within these individuals.
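
Since the supplement leans on D-statistics, a bare-bones version of the ABBA-BABA count may help. This is a simplified sketch, not the authors' code: it takes one haploid sequence per population, treats the outgroup allele as ancestral, arranges the populations as (((P1, P2), P3), outgroup), and computes D = (ABBA - BABA) / (ABBA + BABA). A D far from zero suggests P3 shares more derived alleles with one of P1 or P2 than the tree alone predicts, i.e. possible gene flow. Real analyses work from diploid data and assess significance with a block jackknife, which is omitted here.

```python
def d_statistic(p1: str, p2: str, p3: str, outgroup: str) -> float:
    """Patterson's D from one haploid sequence per population.

    Sequences are aligned strings of single-character alleles. The outgroup
    allele is taken as ancestral; any other allele counts as derived.
    ABBA: P2 and P3 derived, P1 ancestral. BABA: P1 and P3 derived, P2 ancestral.
    """
    abba = baba = 0
    for a1, a2, a3, o in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, o}) > 2:
            continue  # skip sites that are not biallelic
        d1, d2, d3 = a1 != o, a2 != o, a3 != o
        if d3 and d2 and not d1:
            abba += 1
        elif d3 and d1 and not d2:
            baba += 1
    if abba + baba == 0:
        return 0.0  # no informative sites; D is undefined, reported as 0 here
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Made-up sequences: three ABBA sites and one BABA site.
    print(d_statistic("AAABAA", "BBBAAA", "BBBBBB", "AAAAAA"))
```

For the dog question one could, for instance, let P1 and P2 be two dogs, P3 a wolf, and the outgroup a more distant canid; the sequences above are purely illustrative.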

A major demographic takeaway from this paper is that it’s complicated. First, the authors inferred that domestic dogs went through a population bottleneck in the past, a reduction of one to two orders of magnitude. Second, wolves also went through a population bottleneck, albeit a milder one. On the first read-through I was surprised by the second finding, but a canine geneticist told me that this wolf bottleneck had long been known. The genomics confirmed prior expectations rather than smoking out novel inferences.
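
To get a feel for what a bottleneck of one to two orders of magnitude does to diversity, the standard neutral drift recursion H_t = H_0 ∏(1 - 1/(2N_i)) can be applied generation by generation. The population sizes and bottleneck duration below are hypothetical, and mutation and gene flow are ignored. The harmonic mean shows why a short bottleneck dominates the long-run effective size.

```python
def heterozygosity_after(h0: float, sizes: list[int]) -> float:
    """Expected heterozygosity after drift through generations with effective
    sizes `sizes`: H_t = H_0 * prod(1 - 1/(2 * N_i)). No mutation or migration."""
    h = h0
    for n in sizes:
        h *= 1 - 1 / (2 * n)
    return h


def harmonic_mean_ne(sizes: list[int]) -> float:
    """Long-run effective size over a fluctuating history (harmonic mean)."""
    return len(sizes) / sum(1 / n for n in sizes)


if __name__ == "__main__":
    baseline = 10_000  # hypothetical ancestral effective size
    for fold in (10, 100):
        history = [baseline // fold] * 50 + [baseline] * 450  # 50-generation bottleneck
        print(fold, round(heterozygosity_after(1.0, history), 3), round(harmonic_mean_ne(history)))
```

In this made-up history a 100-fold bottleneck lasting 50 generations drags the effective size of the whole 500 generations below a thousand, even though the population spent 90% of that time at 10,000.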

Perhaps a more surprising finding is that the ancestor of dogs (and yes, there was one ancestral population, not many) derives from an extinct wolf lineage. This inference rests on the fact that the three wolves, sampled from putative regions of dog domestication, are all equally genetically distant from the dogs. Previous work had suggested that dogs may have derived from Near Eastern wolves, while other researchers argued for an East Asian origin. The results here support the proposition that these earlier proposals misinterpreted genuine gene flow between local populations of wolves and dogs. The authors detected gene flow between West Eurasian wolves and the western dogs (Basenji and Boxer), and between East Eurasian wolves and the eastern dog (the Dingo).

On a minor note, these results also confirm a pre-agricultural origin for the dog, with a divergence of ~11-16 thousand years B.P. across the 95% confidence interval. This is somewhat at odds with the Chinese group’s estimate, but the discrepancy may just be an artifact of a different mutation rate parameter. The take-home either way is that dogs pre-date agriculture.
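
The sensitivity of the date to the mutation rate follows from the simplest molecular-clock relation: if two lineages differ at a fraction d of sites and each accumulates mutations at μ per site per generation, the time back to their common ancestor is roughly d/(2μ) generations, which is then multiplied by a generation time. The numbers below are purely hypothetical, and this ignores ancestral polymorphism, so it dates sequence coalescence rather than the population split itself. The point is only that the estimate is inversely proportional to μ.

```python
def divergence_years(d: float, mu: float, generation_years: float) -> float:
    """Rough time to common ancestor in years: T = d / (2 * mu) generations.

    d: per-site sequence divergence between the two lineages
    mu: mutation rate per site per generation
    generation_years: assumed generation time in years
    Ignores ancestral polymorphism and rate variation.
    """
    return d / (2 * mu) * generation_years


if __name__ == "__main__":
    d = 1e-4  # hypothetical divergence
    g = 3.0   # hypothetical generation time (years)
    for mu in (1e-8, 2e-8):
        print(mu, round(divergence_years(d, mu, g)))
```

Doubling the assumed rate halves the date, from 15,000 to 7,500 years in this made-up example, which is why two groups using different rates can report different dates from similar data.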

But that doesn’t mean agriculture is irrelevant. As far as adaptation goes there’s a lot here, though I’m not sure that this paper has anything revolutionary in that dimension. First, they confirm that, just as in humans, there is variation among canids in amylase gene copy number conditional on lifestyle. Dingos and Alaskan huskies have very few copies, while ancient West Asian dogs have many. The authors also find variation for this trait in wolves, implying that amylase polymorphism is part of standing genetic variation.

I will leave it to you to survey the veritable alphabet soup of genes which natural selection has buffeted over the course of dog evolution. I’m more curious about variation within dogs at this point, as there should be heritable variation there too.

Cite: arXiv:1305.7390v2 [q-bio.GN]

Since the last post on genomic tools was a bit parochial, I figure it’s acceptable to put up this notice for the Bay Area Population Genomics meeting on June 8th. Registration closes on June 3rd (that is, Monday). Here’s the announcement:

Hello Everyone,

We are excited to be hosting the 8th meeting of the Bay Area Population Genomics group at UCSF Mission Bay on June 8th! Thanks to support from and the Institute for Quantitative Biosciences (QB3 @ UCSF), this conference will include breakfast and lunch. In addition, we will also have a reception during the poster session, so we highly encourage you to preview your work at BAPG before heading out to summer conferences.

Please register at, and sign up to give a talk or poster. Registration is again free, but required by June 3rd.

There is paid parking in the lot/garage at the corner of 4th and 16th streets, and we have a limited number of parking passes for people that sign up to present and/or make a strong effort to carpool (please email me for details).

We are very much looking forward to seeing you at UCSF in a few weeks!



Cite: Wang, Guo-dong, et al. “The genomics of selection in dogs and the parallel evolution between dogs and humans.” Nature Communications 4 (2013): 1860.

To the left is a figure which illustrates the phylogenetic inferences from a new paper in Nature Communications, The genomics of selection in dogs and the parallel evolution between dogs and humans (see Carl Zimmer’s coverage in The New York Times). Why is this paper important? The first thing that jumped out at me is that because they’re using whole genomes (~10X coverage) of a selection of dogs and wolves, the results aren’t as subject to the bias that comes from genotyping wolves on “chips” of polymorphisms discovered in dogs (see: Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication). The second aspect is that the coalescence of the dog and wolf lineages is pushed further back in time than in earlier genetic work, by a factor of three. A standard model for the origin of dogs is that they arose in the Middle East ~10,000-15,000 years ago, possibly as part of the broad shift of lifestyles which culminated in the Neolithic Revolution.
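
Part of why ~10X whole genomes help is that a genuine heterozygous site can actually be observed as heterozygous. A minimal sketch, under simplifying assumptions (Poisson read depth with the stated mean, each read drawn 50/50 from the two chromosomes, no sequencing or mapping error), gives the chance that each allele shows up in at least one read: summing Pois(n; λ)(1 - 2·0.5^n) over depths n ≥ 1 yields 1 + e^(-λ) - 2e^(-λ/2).

```python
import math


def prob_het_detected(mean_depth: float) -> float:
    """P(both alleles of a true heterozygote are seen at least once).

    Assumes Poisson depth with mean `mean_depth`, reads sampled 50/50 from
    the two haplotypes, and error-free reads.
    """
    return 1 + math.exp(-mean_depth) - 2 * math.exp(-mean_depth / 2)


def prob_het_detected_sum(mean_depth: float, max_depth: int = 100) -> float:
    """Same quantity by direct summation over read depths 1..max_depth."""
    total = 0.0
    for n in range(1, max_depth + 1):
        pmf = math.exp(-mean_depth) * mean_depth ** n / math.factorial(n)
        total += pmf * (1 - 2 * 0.5 ** n)  # at depth n, both alleles appear
    return total


if __name__ == "__main__":
    for depth in (5, 10, 30):
        print(depth, round(prob_het_detected(depth), 4))
```

At a mean depth of 10 this comes to about 98.7%, so roughly one true heterozygote in 75 would be missed outright even under these idealized assumptions; a caller that demands several supporting reads per allele would lose more.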

This model is now in serious question. There have long been claims of domestic canid fossils (identified as such on morphological grounds) older than the ~15,000-year-old specimens discovered in the Middle East, and this year saw the publication of ancient mtDNA results from ~30,000 years before the present which imply that putative domestic and wolf lineages had separated by at least that date. Over the past few years, in light of these results, I have wondered about the specific nature of the emergence of both modern humans and modern dogs, and their co-evolutionary trajectory over the Pleistocene and into the Holocene.

So the preponderance of data (genomic and archaeological) inclines me toward accepting the general shape and >15,000 year B.P. date for the divergence of dog and wolf lineages outlined by the authors. But there is a lot more in the phylogenetics of the paper which I am not willing to accept as obvious and clear. In particular, the authors support a Chinese/Southeast Asian origin for the dog, rather than a Middle Eastern one. This position is backed up by the fact that the Southeast Asian dog lineages do seem quite genetically diverse, and basal to other dogs (i.e., they diverge first within the clade of domestic dogs). Additionally, the authors note in the paper that the PCA, which visualizes genetic distance, suggests that the East Asian lineages are somewhat shifted toward the wolf. Model-based clustering also implies that East Asian lineages are “more wolf.”

The reason I don’t buy this conjecture is that, as the paper itself concedes, modern distributions and relationships don’t always map onto ancient distributions and relationships. We’ve already gotten into trouble making this assumption for human populations of similar time depth to the new putative period of dog domestication. Ancient DNA has uncovered a great deal of discordance between the past and present, and I don’t expect dogs to be any different. The authors have whole genomes of a dozen animals. When the data set is expanded to hundreds with reasonable geographic coverage, let’s talk. They attempt to model some gene flow, but I suspect that gene flow is a major problem when talking about regions of origin for a group of organisms whose divergence from the ancestral outgroup is not well understood.

Human-directed breeding. Credit: Galabwebdesign.

But a bigger point, which has less to do with the zone of origination of the dog, is the mode of the origination “event.” In the paper the authors present a stark model of the classic origination event for dogs, where Ice Age hunter-gatherers adopt some puppies, and this population exhibits a sharp and punctuated divergence from the main line of the wolves. These genetic data don’t indicate that at all. Rather, the “bottleneck” was very mild, if you could call it a bottleneck at all (see: Vulcans through the eye of the bottleneck). Certainly some inbred modern lineages have gone through bottlenecks, but this was long after the initial separation of dog and wolf. Instead, the authors put forward an alternative hypothesis where dogs co-existed with early man, descending from a subset of wolves who were happy to scavenge on the margins of human settlements. There are variations and flavors of this sort of argument, but you can bracket them as the “self domestication” model. I think our explicit differentiation between forms of selection here is wrongheaded: the primary issue isn’t whether dogs were self-domesticated or human-domesticated, but the rate of adaptation and the demographic history. It may be that the best way to think about dogs and humans isn’t that the latter domesticated the former, but that both changed together as their lifestyles and interactions changed. With the rise of agriculture and increased specialization of human lifestyles there occurred a concomitant diversification of dogs.

And that is where I think the second part of the paper, focusing on parallel adaptations on the genomic level, is really interesting.

In short, it seems that genes involved in neurological function, metabolism, and cancer are enriched for signals of selection in domestic dogs. This is not surprising. Dogs exhibit great life history differences from wolves (they breed more, and are not pair bonded), and famously may be able to read human faces despite being less intelligent than wolves. And of course dogs have to eat what we eat, at least to some extent.
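
A claim that a functional category is "enriched" among selection candidates usually comes down to asking whether the overlap is larger than chance. The paper's exact procedure is not described here, so the following is a generic sketch assuming a one-sided hypergeometric test: out of G genes, K belong to a category, n are flagged by a selection scan, and k of those fall in the category. The counts in the example are invented.

```python
from math import comb


def enrichment_p_value(total_genes: int, in_category: int, flagged: int, overlap: int) -> float:
    """One-sided hypergeometric P(X >= overlap).

    X counts category genes among `flagged` genes drawn without replacement
    from `total_genes`, of which `in_category` belong to the category.
    """
    upper = min(in_category, flagged)
    hits = sum(
        comb(in_category, i) * comb(total_genes - in_category, flagged - i)
        for i in range(overlap, upper + 1)
    )
    return hits / comb(total_genes, flagged)


if __name__ == "__main__":
    # Invented numbers: 20,000 genes, 500 in a category, 300 flagged, 20 overlapping.
    print(enrichment_p_value(20_000, 500, 300, 20))
```

Here about 7.5 overlapping genes would be expected by chance (300 × 500 / 20,000), so an overlap of 20 gives a small p-value; a genome scan that tests many categories would also need a multiple-testing correction.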

To understand this functional aspect of the evolutionary history of dogs though one does have to nail the phylogenetics down. So there will no doubt be more coming down the pipeline in this domain, and within the next few years the natural history of man’s best friend will be of deep interest. As ancient DNA has revolutionized the understanding of the human past, I suspect there will be attempts to analyze samples from dogs as well (though I assume that the data sets will always be thinner because scholars have always been preoccupied with human remains).

Credit: Campbell, Catarina D., and Evan E. Eichler. “Properties and rates of germline mutations in humans.” Trends in Genetics (2013).

What a great age we live in. Until recently, critical parameters in population genetics such as mutation rates had to be inferred and assumed, even though they served as bases for much more complex inferences. Now with humans (and humans are only the beginning!) much of what was inferred is being assessed in a more direct fashion. Catarina Campbell and Evan Eichler have a review in Trends in Genetics which surveys the field as it stands now, Properties and rates of germline mutations in humans. Notice that there’s a rough convergence, using pedigree analysis, on a mutation rate in the low 10^-8 range (per base pair per generation). Additionally, it does seem that a disproportionate number of novel mutations come through the paternal lineage via sperm. This should increase our moderate worry about older fathers (something reiterated in the piece, with caveats). Finally, the authors suggest these results are a floor for the mutation rate, in part due to the long-term conflict with the inferred ‘evolutionary rates,’ which are higher. This matters because, to infer the last common ancestors between lineages, the value of the mutation rate is obviously critical.
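
A quick sanity check on what a rate in the low 10^-8 range means per child: multiply the per-site, per-generation rate by the number of sites transmitted, i.e. both parental haploid genomes. The 1.2 × 10^-8 rate and 3.2 Gb haploid genome size below are illustrative assumptions, and the paternal share is a free, hypothetical parameter rather than a figure from the review.

```python
def expected_de_novo(mu: float, haploid_bp: float = 3.2e9) -> float:
    """Expected new point mutations per diploid offspring: 2 * L * mu."""
    return 2 * haploid_bp * mu


def split_by_parent(total: float, paternal_fraction: float) -> tuple[float, float]:
    """Partition an expected count given a hypothetical paternal fraction."""
    return total * paternal_fraction, total * (1 - paternal_fraction)


if __name__ == "__main__":
    total = expected_de_novo(1.2e-8)  # illustrative rate
    print(round(total, 1))  # 76.8
    print(split_by_parent(total, 0.75))  # hypothetical 3:1 paternal bias
```

Since the expected count scales linearly with μ, any revision of the rate moves the per-generation count, and every clock-based date, by the same factor.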

To me the obvious ‘killer app’ which derives from the understanding of mutations is the analysis of pedigrees in terms of the accretion of de novo mutations. With precise and accurate coverage of a whole pedigree you could theoretically perform pre-implantation screening of a set of embryos and select exactly those you adduce to have received the lowest fraction of accrued mutations from the generation of the grandparents down. This isn’t rocket science, just simple comparison and counting. Spontaneous abortion rates on the order of ~50% set a floor on how many mutations viable offspring can carry (most aneuploidies are aborted), but it seems like we may be able to set the floor a bit higher.
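
Since this really is comparison and counting, here is what the counting might look like in a minimal form. Everything is hypothetical: genotypes are toy dictionaries mapping a position to an unphased pair of alleles, calls are assumed error-free, and `parental_novel` stands for mutations already identified in the parents by comparison with the grandparents. Each embryo's score is its own novel alleles (absent from both parents) plus the parental new mutations it inherited.

```python
Genotype = tuple[str, str]
Calls = dict[int, Genotype]  # hypothetical: position -> unphased diploid genotype


def novel_alleles(child: Calls, mother: Calls, father: Calls) -> dict[int, str]:
    """Alleles in the child found in neither parent (candidate de novo mutations).
    Positions not genotyped in both parents are skipped."""
    novel: dict[int, str] = {}
    for pos, genotype in child.items():
        if pos not in mother or pos not in father:
            continue
        parental = set(mother[pos]) | set(father[pos])
        for allele in genotype:
            if allele not in parental:
                novel[pos] = allele
    return novel


def mutation_load(embryo: Calls, mother: Calls, father: Calls, parental_novel: dict[int, str]) -> int:
    """Embryo's own new mutations plus the parents' new mutations it inherited."""
    own = novel_alleles(embryo, mother, father)
    inherited = sum(1 for pos, allele in parental_novel.items() if allele in embryo.get(pos, ()))
    return len(own) + inherited


def rank_embryos(embryos: dict[str, Calls], mother: Calls, father: Calls,
                 parental_novel: dict[int, str]) -> list[str]:
    """Embryo IDs ordered from lowest to highest mutation load (ties broken by ID)."""
    return sorted(embryos, key=lambda name: (mutation_load(embryos[name], mother, father, parental_novel), name))


if __name__ == "__main__":
    mother = {1: ("A", "A"), 2: ("C", "C"), 3: ("G", "T")}
    father = {1: ("A", "G"), 2: ("C", "C"), 3: ("G", "G")}
    parental_novel = {3: "T"}  # hypothetical: mother's T arose de novo
    embryos = {
        "e1": {1: ("A", "G"), 2: ("C", "T"), 3: ("G", "G")},
        "e2": {1: ("A", "A"), 2: ("C", "C"), 3: ("T", "G")},
        "e3": {1: ("A", "A"), 2: ("C", "C"), 3: ("G", "G")},
    }
    print(rank_embryos(embryos, mother, father, parental_novel))
```

In practice the hard part is not the counting but telling true de novo events from sequencing and calling errors, which this sketch ignores.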

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"