The Unz Review • An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
SMBE 2014

The Society For Molecular Biology & Evolution is having its meeting in Puerto Rico right now. Here’s the website. The Twitter hashtag seems to be #SMBE14. With stuff that’s going on in my life no way I’d be able to make it, though I plan on going to ASHG 2014 in San Diego for what it’s worth. Here are some abstracts that I think are interesting (I assume a substantial fraction are going to make it to high profile pubs).

A deep learning approach to ancestral inference

Sara Sheehan, Yun S. Song
UC Berkeley, Berkeley, CA, USA

The coalescent is a powerful tool for developing likelihood-based methods for population genetic inference. However, for many coalescent models, the full likelihood is unknown or computationally infeasible. Further, even if the likelihood is known, sufficient statistics for parameters of interest may not exist. To circumvent these issues, Approximate Bayesian Computation (ABC) reduces the information present in genomic sequences to a set of summary statistics. Although this approach has proved useful for various population genetic applications, one problem with ABC is the common “curse of dimensionality”, caused by a large number of correlated summary statistics. This is typically dealt with by applying dimensionality reduction techniques or pruning statistics based on intuition. However, simple reductions cannot always learn subtle relationships between the data and the parameters. Expert pruning of statistics is often justified, but can eliminate valuable information, especially when trying to infer many parameters.

Here we present a new ancestral inference method that leverages the power of a deep learning framework. Inspired by neural networks, deep architectures use several layers of hidden nodes to learn a rich class of functions from the input (summary statistics) to the output (parameters of interest). In contrast to ABC, deep learning requires no rejection step, does not rely on a prior for parameter estimation, and is robust to the addition of uninformative statistics. Our approach begins with several initialization steps that learn the correlations between the inputs for each layer. The entire deep network is then fine-tuned to create a sophisticated function from the inputs to the outputs.

In this work we apply our method to jointly estimate various population-genetic parameters. For illustration, we consider the problem of jointly estimating the population-scaled mutation rate and the effective population sizes of a bottleneck. We verify the accuracy of our method on simulated data, demonstrating a marked improvement in mutation rate accuracy over traditional estimators. Since our deep learning approach is roughly an order of magnitude faster than typical coalescent HMMs, applying it to the entire genome is much more feasible. We also apply our method to investigating variation in mutation rate and selection across the genome. Our method can also be applied to other combinations of parameters, including more complex demographies and recombination rates. Eventually we hope to combine the domain knowledge of likelihood methods with machine learning to create a general framework for ancestral inference problems.
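For readers who have not met ABC, the rejection step that this abstract contrasts with deep learning can be sketched in a few lines. This is a toy illustration, not the authors' code: it assumes a uniform prior on the population-scaled mutation rate, a single summary statistic (the number of segregating sites S), and the standard neutral coalescent for a sample of n sequences. The function names and the numbers in the usage lines are hypothetical.

```python
import random

def simulate_segregating_sites(theta, n, rng):
    """Draw S for n sequences under the standard neutral coalescent."""
    total_length = 0.0
    for k in range(n, 1, -1):
        # While k lineages remain, the waiting time to the next
        # coalescence is exponential with rate k(k-1)/2.
        total_length += k * rng.expovariate(k * (k - 1) / 2.0)
    mean = theta / 2.0 * total_length
    # Poisson draw: count unit-rate exponential arrivals in [0, mean).
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def abc_rejection(observed_s, n, prior_max, tolerance, n_draws, seed=0):
    """Keep prior draws of theta whose simulated S is close to the observed S."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(0.0, prior_max)  # assumed uniform prior
        simulated = simulate_segregating_sites(theta, n, rng)
        if abs(simulated - observed_s) <= tolerance:
            accepted.append(theta)
    return accepted

# Hypothetical data: 20 segregating sites in a sample of 10 sequences.
posterior = abc_rejection(observed_s=20, n=10, prior_max=30.0,
                          tolerance=2, n_draws=5000)
print(len(posterior), sum(posterior) / len(posterior))
```

With one statistic the rejection step is cheap. The curse of dimensionality mentioned above appears when many correlated statistics must all land within tolerance at once, so that almost every draw is rejected.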

A model-based approach for identifying signatures of ancient balancing selection in genetic data

Michael DeGiorgio1, Kirk Lohmueller2, Rasmus Nielsen3
1Pennsylvania State University, University Park, PA, USA, 2University of California, Los Angeles, Los Angeles, CA, USA, 3University of California, Berkeley, Berkeley, CA, USA

While much effort has focused on detecting positive and negative directional selection in the human genome, relatively little work has been devoted to balancing selection. This lack of attention is likely due to the paucity of sophisticated methods for identifying sites under balancing selection. We designed the first set of likelihood-based methods that explicitly model the genealogical process under balancing selection using a coalescent framework. Simulation results show that our methods for detecting balancing selection vastly outperform previous approaches based on summary statistics and are robust to demography. We apply the new methods to whole-genome sequencing data from humans, and find a number of previously-identified loci with strong evidence of balancing selection, including various HLA genes. Additionally, we find evidence for many novel candidates, the strongest of which is FANK1, an imprinted gene that suppresses apoptosis, is expressed during meiosis in males, and displays marginal signs of segregation distortion. We hypothesize that balancing selection acts on this locus to stabilize the segregation distortion and negative fitness effects of the distorter allele. Not only are our methods for identifying signatures of balancing selection the most powerful developed to date, but they can also be applied to any organism with polymorphism data and an outgroup sequence. As such, we expect that our methods will be widely used by the genomics community to uncover the potentially numerous genomic regions that are under balancing selection in many non-human species.

Comparative Genomics Across Modern Bird Species

Guojie Zhang1 ,2
1China National Genebank at BGI-Shenzhen, Shenzhen, China, 2Centre for Social Evolution, Department of Biology, University of Copenhagen, Copenhagen, Denmark

Our avian genomics consortium finished the genomes of at least one species per Neoavian order as well as those of several reptile outgroups, a dataset that we have used for both resolving the order level Neoavian tree of life and for performing comparative genomic analyses for traits and genome evolution across an entire vertebrate class. Neoaves represent 95% of all living bird species; thus this project is to date the most comprehensive genomics study of its kind of a vertebrate class that we are aware of. The total dataset is comprised of 48 avian and six non-avian reptile species, of which 42 of the avian are unpublished genomes. I will present here the initiation of this project and the comprehensive comparative genomic analyses. We found the smaller size of avian genomes was largely the result of massive erosion of repetitive elements, large segmental deletions, as well as gene loss following the split from other reptiles. Avian genomes show a remarkably high degree of evolutionary stasis from chromosomes, gene synteny, down to single nucleotides. We identified many protein-coding genes that are evolving non-neutrally, as well as non-coding RNA and regulatory regions that are highly conserved. We detected convergent evolutionary sequences in both coding regions and non-coding regulatory regions within vocal learning birds. We investigated and highlighted candidate genes that underlie traits relevant to the diversity of avian ecologies.

Sex, Genes and Sequence

Jennifer A. Marshall Graves2
1La Trobe University, Melbourne, Australia, 2University of Canberra, Canberra, Australia

What can we discover about how sex is determined using new sequencing technologies? In mammals, X and Y sex chromosomes are recognised cytologically. We could derive and annotate sequence for the X along with autosomes, a task made easier by the complete conservation of the X among placental mammals. The degenerate Y presented a challenge because it is full of repetitive sequence and poorly conserved because of differential gene attrition and retroposon insertion. Positional cloning yielded the male-dominant SRY gene that triggers male development in mammals. Birds present a similar story in reverse: the Z chromosome, common to both sexes and highly conserved between birds, could be sequenced and annotated; the female-specific W presented a challenge that has only recently been met. The sex determining gene DMRT1 on the Z chromosome was identified by analogy with other DM-domain bearing genes, and shown to work by differential dosage.
Reptiles and fish, however, display a huge variety of sex determining mechanisms, including XY systems (male heterogamety) and ZW systems (female heterogamety), some highly differentiated like the human XY and chicken ZW, and some cytologically homomorphic. Many reptiles lack sex chromosomes, determining sex via incubation temperature, and several species do both: chromosomal sex determination at moderate temperature and a sex-switch override at extremes. These systems may prove to be the most informative, yielding novel sex determining genes (or, more remarkably, the same old genes in new contexts) and providing insights into how they interact with the environment via epigenetic pathways.

Genotyping of 390,000 SNPs in more than forty 3,000-9,000 year old humans from the ancient Russian steppe

David Reich1 ,2, Nadin Rohland1 ,2, Swapan Mallick1 ,2, Iosif Lazaridis1, Eadaoin Harney1, Susanne Nordenfelt1, Qiaomei Fu3, Matthias Meyer3, Dorcas Brown4, David Anthony4, Nick Patterson2
1Harvard Medical School, Boston, MA, USA, 2Broad Institute of Harvard and MIT, Cambridge, MA, USA, 3Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 4Hartwick College, Oneonta, NY, USA

A central challenge in ancient DNA research is that for many bones that contain genuine DNA, the great majority of molecules in sequencing libraries are microbial. Thus, it has been impractical to carry out whole genome analyses of substantial numbers of ancient individuals. We report a strategy for in-solution capture of ancient DNA from approximately 390,000 single nucleotide polymorphism (SNP) targets, adapting a method of Fu et al. (PNAS 2013), who enriched a 40,000 year old DNA sample for the entire chromosome 21. Of the SNP targets, the vast majority overlap the Affymetrix Human Origins array, allowing us to compare the ancient samples to a database of more than 2,700 present-day humans from 250 groups.

We applied the SNP capture as well as mitochondrial genome enrichment to a series of 65 bones dating to between 3,000-9,000 years ago from the Samara district of Russia in the far east of Europe, a region that has been suggested to be part of the Proto-Indo-European homeland. We successfully extracted nuclear data from 10-90% of targeted SNPs for more than 40 of the samples, and for all of these samples also obtained complete mitochondrial genomes. We report three key findings:

  • Samples from the Samara region possess Ancient North Eurasian (ANE) admixture related to a recently published 24,000 year old Upper Paleolithic Siberian genome. This contrasts with both European agriculturalists and with European hunter-gatherers from Luxembourg and Iberia who had little such ancestry (Lazaridis et al. 2013). This suggests that European steppe groups may have been implicated in the dispersal of ANE ancestry across Europe where it is currently pervasive.
  • The mtDNA composition of the steppe population is primarily West Eurasian, in contrast with northwest Russian samples of this period (Der Sarkissian et al. PLoS Genetics 2013) where an East Eurasian presence is evident.
  • Samara experienced major population turnovers over time: early samples (>6000 years) belong primarily to mtDNA haplogroups U4 and U5, typical of European hunter-gatherers, but later ones include haplogroups W, H, T, I, K, J.

We report modeling analyses showing how the steppe samples may relate to ancient and present-day DNA samples from the rest of Europe, the Caucasus, and South Asia, thereby clarifying the relationship of steppe groups to the genetic, archaeological and linguistic transformations of the late Neolithic and Bronze ages.

Statistical Inference of Archaic Introgression In Central African Pygmies

PingHsun Hsieh1, Jeffrey Wall2, Joseph Lachance3, Sarah Tishkoff3, Ryan Gutenkunst1, Michael Hammer1
1University of Arizona, Tucson, AZ, USA, 2University of California, San Francisco, CA, USA, 3University of Pennsylvania, Philadelphia, PA, USA

Recent evidence from ancient DNA studies suggests that genetic material introgressed from archaic forms of Homo, such as Neanderthals and Denisovans, into the ancestors of contemporary non-African populations. These findings also imply that hybridization may have given rise to some of the adaptive novelties in anatomically modern human (AMH) populations as they expanded from Africa into various ecological niches in Eurasia. Within Africa, fossil evidence suggests that AMH and a variety of archaic forms coexisted for much of the last 200,000 years. Here we present preliminary results leveraging high quality whole-genome data (>60X coverage) for three contemporary sub-Saharan African populations (Biaka, Baka, and Yoruba) from Central and West Africa to test for archaic admixture. With the current lack of African ancient DNA, especially in Central Africa due to its rainforest environment, our statistical inference approach provides an alternative means to understand the complex evolutionary dynamics among groups of the genus Homo.

To identify candidate introgressive loci, we scan the genomes of 16 individuals and calculate S*, a summary statistic that was specifically designed by one of us (JDW) to detect archaic admixture. The significance of each candidate is assessed through extensive whole-genome level simulations using demographic parameters estimated by ∂a∂i to obtain a parametric distribution of S* values under the null hypothesis of no archaic introgression. As a complementary approach, top candidates are also examined by an approximate-likelihood computation method. The admixture time for each individual introgressive variant is inferred by estimating the decay of the genetic length of the diverged haplotype as a function of its underlying recombination rate. A neutrality test that controls for demography is performed for each candidate to test the hypothesis that introgressive variants rose to high frequency due to positive directional selection. The present study represents one of the most comprehensive genomic surveys to date for evidence of archaic introgression to anatomically modern humans in Africa.

The complete genome sequence of a 45,000-year-old modern human from Eurasia

Qiaomei Fu1 ,2, Bence Viola1 ,3, Heng Li5 ,6, Priya Moorjani6, Flora Jay4, Aximu Ayinuer-Petri1, Susan Keates8, Yaroslav V. Kuzmin7, Montgomery Slatkin4, David Reich5 ,6, Janet Kelso1, Svante Pääbo1
1Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2Key Laboratory of Vertebrate Evolution and Human Origins of Chinese Academy of Sciences, Beijing, China, 3Department of Human Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 4Department of Integrative Biology, University of California, Berkeley, USA, 5Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA, 6Department of Genetics, Harvard Medical School, Boston, USA, 7Institute of Geology & Mineralogy, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia, 8University Village, Columbia, USA


We have sequenced to high coverage the genome of a femur recently discovered near Ust-Ishim in Siberia. The bone was directly carbon-dated to 45,000 years before present. Analyses of the relationship of the Ust-Ishim individual to present-day humans show that he is closely related to the ancestral population shared between present-day Europeans and present-day Asians. The over-all amount of genomic admixture from Neandertals is similar to that in present-day non-Africans and there is no evidence for admixture from Denisovans. However, the size of the genomic segments of Neandertal ancestry in the Ust-Ishim individual is substantially larger than in present-day individuals. From the size distribution of these segments we estimated that this individual lived about 200-400 generations after the admixture with Neandertals occurred. The age of this genome allows us to directly assess the mutation rate in the different compartments of the human genome. These results will be presented and discussed.
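The dating argument in this abstract rests on recombination chopping introgressed segments into shorter pieces every generation. A back-of-the-envelope sketch follows, assuming a single admixture pulse and the common approximation that segment lengths are exponentially distributed with mean 1/t Morgans after t generations (this ignores the admixture proportion and any detection bias). The function name and the segment lengths are made up for illustration; they are not values from the study.

```python
def generations_since_admixture(segment_lengths_cm):
    """Estimate t from the mean introgressed segment length (in cM).

    Assumes mean length = 1/t Morgans, i.e. 100/t centimorgans.
    """
    mean_cm = sum(segment_lengths_cm) / len(segment_lengths_cm)
    return 100.0 / mean_cm

# Hypothetical segment lengths in centimorgans.
lengths = [0.2, 0.5, 0.3, 0.4, 0.35]
print(round(generations_since_admixture(lengths), 1))  # → 285.7
```

Under the same approximation, shorter average segments in present-day genomes imply more generations of recombination since the admixture, which is why the longer segments in an ancient individual place him closer to the event.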

Detection of polygenic selection at different evolutionary time scales

Josephine Daub1 ,2, Isabelle Dupanloup1 ,2, Marc Robinson-Rechavi2 ,3, Laurent Excoffier1 ,2
1University of Berne, Berne, Switzerland, 2Swiss Institute of Bioinformatics, Lausanne, Switzerland, 3University of Lausanne, Lausanne, Switzerland

Most approaches aiming at finding genes involved in adaptive events have focused on the detection of outlier loci, which resulted in the discovery of individually 'significant' genes with strong effects. However, a collection of small effect mutations could have a large effect on a given biological pathway that includes many genes, and such a polygenic mode of adaptation has not been systematically investigated in humans or other mammals. We therefore propose to look for evidence of polygenic selection by detecting signals of adaptation at the pathway or gene set level instead of analyzing single independent genes. Using a gene-set enrichment test, we identify genome-wide signals of recent adaptation among human populations as well as more ancient signals of adaptation in the human lineage and in primates.
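The idea of testing a pathway rather than single genes can be illustrated with a generic permutation test: sum a per-gene selection score over the gene set and ask how often a random set of the same size scores at least as high. This is a minimal sketch of that general idea and is not claimed to be the test used in the abstract; the function name, gene identifiers, and scores are hypothetical.

```python
import random

def gene_set_enrichment_pvalue(scores, gene_set, n_permutations=10000, seed=0):
    """One-sided permutation p-value for the summed score of a gene set.

    scores   : dict mapping gene name -> per-gene selection score
    gene_set : list of gene names belonging to the pathway under test
    """
    rng = random.Random(seed)
    genes = sorted(scores)
    observed = sum(scores[g] for g in gene_set)
    hits = 0
    for _ in range(n_permutations):
        # Random gene set of the same size, drawn without replacement.
        sample = rng.sample(genes, len(gene_set))
        if sum(scores[g] for g in sample) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical scores: 100 genes, none individually extreme by design.
scores = {f"gene{i}": float(i) for i in range(100)}
pathway = [f"gene{i}" for i in range(90, 100)]
print(gene_set_enrichment_pvalue(scores, pathway, n_permutations=2000))
```

A real analysis would also need to handle genes shared between pathways and differences in gene length or SNP density, which this sketch ignores.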

A global landscape of protein adaptation to viruses in mammals

David Enard, Dmitri Petrov
Stanford, Stanford, CA, USA


The evolutionary arms race between pathogens and their hosts is expected to result in high rates of adaptation in host genes where resistance alleles arise. In mammals, only a few striking cases of pervasive adaptation to viruses have been studied at the protein divergence level. Such cases include the highly specialized antiviral factors ZAP and PKR, both of which are involved in viral RNA degradation. Viruses, however, have been shown to interact with many more host proteins, many of which play no specific role in immunity and have very diverse functions in mammalian hosts. It is currently unknown whether such non-specialized virus-interacting proteins also take part in the arms race against viruses, and whether they exhibit higher rates of adaptation as a result.

We have manually curated a large body of virology literature and found that around 1,000 mammalian proteins out of 9,800 mammalian orthologs have been shown to interact with viruses by using high confidence, low-throughput methods. For each of the 9,800 orthologs, we used codeml branch-site test on high quality alignments to estimate the number of adaptive amino acid changes present on each one of the 44 branches of a tree comprising 24 mammalian species with high-coverage genomes.

By using a gene set enrichment approach we could find a very substantial excess of adaptation in virus-interacting proteins across the whole mammalian tree. We could also show that this excess is robust to confounding factors such as Biased Gene Conversion or hypermutable CpGs, and that such mutational biases lead us to underestimate the excess of adaptation driven by viruses. Known cases such as ZAP or PKR therefore represent the tip of a much larger iceberg, with many proteins in very diverse functions being involved in the arms race against viruses. Adaptation to viruses in fact explains a surprisingly large part of positive selection events in virus-interacting proteins. Thanks to the high number of virus interactions we compiled and the large number of mammalian lineages included in our analysis, we could finally characterize the network and functional landscape of adaptation to viruses in mammals. Proteins involved in primary biosynthetic processes (transcription, RNA maturation, etc.) do not adapt in response to viruses, whereas other functions critical to the viral life cycle such as endocytosis or apoptosis concentrate a high number of adaptive events driven by viruses.

Conflations of short IBD blocks can bias inferred length of IBD

Charleston Chiang1, Peter Ralph2, John Novembre3
1University of California, Los Angeles, Los Angeles, CA, USA, 2University of Southern California, Los Angeles, CA, USA, 3University of Chicago, Chicago, IL, USA

Blocks of identity-by-descent (IBD) play an important role in many modern genetic applications, including long-range phasing, imputation, genetic mapping, detection of natural selection, and demographic inferences. One commonly used definition of IBD blocks is that they are contiguous segments of the genome inherited from a recent shared common ancestor without intervening recombination. With a number of available programs, long IBD blocks (> 1 cM) can be efficiently detected using high-density SNP array data of a population sample. However, all programs detect IBD based on contiguous segments of identity-by-state (IBS). As such, detected IBD blocks could often be due to the conflation of smaller IBD blocks inherited from different common ancestors. Here, we show through theory and simulation that the conflation of small IBD blocks can occur with appreciable frequencies and can lead to errors in estimating the length distribution of IBD blocks, thereby affecting downstream inferences. Specifically, we used coalescent simulations where we know the precise genealogy of the sample and found that, under a realistic demographic model of human history, >35% of the detected IBD segments of 1 cM or longer are composed of at least two subsegments. In particular, 11% of the detectable segments consist of at least 1 other subsegment >25% of the total length, and this effect was more pronounced for detectable segments between 1 and 2 cM long, compared to segments > 2 cM long. To demonstrate that the conflation can lead to practical problems, we investigated the impact on a novel estimator of the de novo mutation rate using IBD blocks. We observed accurate estimates of the input mutation rate when true IBD blocks are used, but overestimates of the mutation rate by ~15-fold using inferred IBD blocks. When the effect of conflation on the estimated age of the block was modeled, the mutation rate estimate improved greatly.
Our results suggest that identifying IBD blocks based on extended IBS can inflate the length of IBD blocks, and in this case results in an inflated estimate of the de novo mutation rate, unless properly accounted for. This effect should be carefully considered as methods to detect shorter IBD blocks using sequencing data are being developed.
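The conflation effect described in this abstract is easy to picture with a toy example. The sketch below assumes a detector that only sees runs of IBS, so true IBD blocks that abut (or sit within a small gap of each other) are reported as a single block. The function name, the gap parameter, and the coordinates are hypothetical and do not correspond to any particular IBD program.

```python
def conflate_blocks(true_blocks, max_gap_cm=0.0):
    """Merge (start_cm, end_cm) blocks that an IBS-based detector could not tell apart."""
    merged = []
    for start, end in sorted(true_blocks):
        if merged and start - merged[-1][1] <= max_gap_cm:
            # Adjacent blocks from different ancestors look like one IBS run.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Three true blocks, none of which reaches 1 cM on its own.
true_blocks = [(0.0, 0.6), (0.6, 1.3), (5.0, 5.4)]
detected = conflate_blocks(true_blocks)
print(detected)  # → [(0.0, 1.3), (5.0, 5.4)]
```

Here a 1.3 cM "block" is reported although no single true block exceeds 0.7 cM. Each component is older than a genuine 1.3 cM block would typically be, so it carries more mutations than expected, which is one way to see how the mutation rate ends up overestimated.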

Demographic inference from whole genome data with particle filters

Sha (Joe) Zhu, Gerton Lunter
Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK

Besides encoding an organism’s biology, genomes contain information about a species’ demographic history. The availability of reliable DNA sequencing at accessible cost has resulted in the development of new methods to infer historical demography from whole-genome data, most notably the pairwise sequentially Markovian coalescent method, a hidden Markov model (HMM) based approach which is able to resolve details of past population size changes over time from a single diploid individual. However, these and other HMM-based methods tend not to scale well to multiple samples, because the underlying model, the coalescent with recombination, is inherently very high dimensional.

Here we present a new approach to this problem, using a sequential Monte Carlo or “particle filter” approach. Instead of approximating by discretization into a finite number of states, we represent the posterior sample of genealogies by a large number of “particles” representing particular genealogies, which we sequentially update to reflect additional data. This involves no discretization, and has the property that the accuracy of the approximation scales as the square root of the number of particles, independently of the dimensionality of the model. Second, we infer parameters using a novel “tracking” online expectation-maximization (EM) algorithm, which has superior performance compared to traditional EM procedures.

Since we can process multiple samples simultaneously, under a good approximation of the full coalescent model, we are able to infer migration events as well as population size changes. In addition, by being simulation-based, particle filters can in principle be used for inference under more complex models of evolution, including selection. In this talk I will show the method’s performance on simulated data, and show initial analyses of chimpanzee demographic history and migration patterns.
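For readers unfamiliar with particle filters, the propagate, weight, and resample cycle behind the method can be sketched on a much simpler model than the coalescent with recombination. The example below is a generic bootstrap particle filter for a one-dimensional random walk observed with Gaussian noise. It is not the authors' algorithm: here a particle is a single number rather than a genealogy, and the model, parameter values, and observations are all assumptions chosen for illustration.

```python
import math
import random

def bootstrap_particle_filter(observations, n_particles=500,
                              process_sd=0.5, obs_sd=1.0, seed=0):
    """Return the filtered posterior mean of the hidden state after each observation."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # 1. Propagate each particle through the (assumed) random-walk model.
        particles = [x + rng.gauss(0.0, process_sd) for x in particles]
        # 2. Weight particles by the Gaussian likelihood of the new observation.
        log_w = [-0.5 * ((y - x) / obs_sd) ** 2 for x in particles]
        top = max(log_w)  # subtract the maximum to avoid underflow
        weights = [math.exp(lw - top) for lw in log_w]
        total = sum(weights)
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # 3. Resample in proportion to weight so that unlikely particles die out.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates

# Hypothetical observations of a slowly drifting hidden state.
print(bootstrap_particle_filter([0.2, 0.5, 1.1, 1.4, 2.0]))
```

No state-space grid appears anywhere in this loop, which is the point made in the abstract: the cost is set by the number of particles rather than by a discretization of the model.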

Detecting ghost admixture in human populations using a projection analysis

Melinda Yang, Montgomery Slatkin
UC Berkeley, Berkeley, CA, USA

Large genomic data sets have been collected for many modern human populations, allowing greater inference of past demographic history between and within these populations. However, the ability to detect admixture from an unsampled population (ghost admixture) is still weak. Inferences of ghost admixture have been made for various human populations, especially in Africa, but current methods often have little power to estimate the low amounts of admixture believed to be in the population. The method developed here, that of projecting a test genome on a reference population, is highly sensitive to low amounts of migration and admixture. It extracts relevant data from the frequency spectrum by asking whether an individual has a derived allele at a site relative to the derived allele frequency of a reference population. Admixture from an unsampled population into the reference population results in a characteristic difference from the case with no admixture. Using an array of test individuals from the A-panel and reference populations from the 1000 Genome Project and the Human Genome Diversity Panel, our method distinguishes ghost admixture into various human populations.
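One way to read the projection described in this abstract is as a ratio computed per frequency class: among sites where the reference population has derived allele frequency x, how often does the test genome carry the derived allele, relative to x itself? The sketch below implements that reading. It is my interpretation of the abstract, not the authors' code, and the function name, argument layout, and toy data are hypothetical.

```python
from collections import defaultdict

def projection(ref_derived_counts, ref_sample_size, test_genotypes):
    """Return {derived allele count in reference: w}, where w is the test genome's
    observed derived-allele fraction divided by the reference frequency x.

    ref_derived_counts : derived allele count per site in the reference sample
    ref_sample_size    : number of chromosomes sampled in the reference
    test_genotypes     : derived allele count (0, 1 or 2) per site in the test genome
    """
    derived_sum = defaultdict(float)
    n_sites = defaultdict(int)
    for count, genotype in zip(ref_derived_counts, test_genotypes):
        if 0 < count < ref_sample_size:  # keep sites segregating in the reference
            derived_sum[count] += genotype / 2.0
            n_sites[count] += 1
    result = {}
    for count in sorted(n_sites):
        x = count / ref_sample_size
        result[count] = (derived_sum[count] / n_sites[count]) / x
    return result

# Hypothetical toy data: four sites, reference sample of 10 chromosomes.
print(projection([2, 2, 5, 5], 10, [0, 1, 1, 1]))
```

Under this reading, a test genome drawn from the reference population itself should give values near 1 in every frequency class, so systematic departures from 1 are what carry information about divergence or admixture.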

Expansion load and the evolutionary dynamics of a species range

Stephan Peischl, Laurent Excoffier
University of Bern, Bern, Switzerland

We study the effect of expansion load, i.e., the accumulation of deleterious mutations during range expansions, on the evolutionary dynamics of species ranges. Using a mixture of individual-based simulations and analytical approximations we find that expansion load can severely limit the speed at which a species expands its range. If recombination is strong, we find that mean fitness at the wave front approaches an equilibrium value at which the effects of newly established mutations on fitness cancel each other out. In the absence of recombination the dynamics are more complex and beneficial mutations from the core of the range can invade the front of the expansion, which results in a quasi-periodic expansion process. This leads to an interesting phenomenon: although the rate of adaptation is generally higher in recombining organisms, mean fitness at the wave front may be higher in the absence of recombination because individuals from the core have a higher chance to invade the front, replace individuals with high mutation load, and colonize new habitats. Our findings have important consequences for the evolutionary dynamics of species ranges, the role of recombination during range expansions, and the evolution of mechanisms that modify recombination (e.g., inversions) or mating (e.g., modifiers that affect the rate of selfing).

Using whole genomes to resolve the tree of life of birds and evolution of complex traits

Erich D. Jarvis
Duke University Medical Center and Howard Hughes Medical Institute, Durham, NC, USA

One of the goals of the G10K project is to provide a resource of whole genomes that can be used by investigators for resolving difficult questions in biology. Here I will present some results of an avian phylogenomics and comparative genomics consortium, that included the G10K group, on the generation and use of genomes of 48 bird species to resolve the early branches in the tree of life of birds and identify genes involved in complex convergent traits. For the first time we were able to obtain a highly resolvable tree for birds. The non-coding portions of the genome yielded a more reliable phylogeny than the coding portions; the latter showed a massive amount of protein coding sequence convergence in genes that had high base composition variance across species and correlated with life history traits. We hypothesize that this protein coding sequence convergence is due to GC biased gene conversion in species with smaller body mass and shorter generation times. The more resolved view of avian phylogeny showed that water adaptations, predatory behavior, and vocal learning each evolved at least twice in different major lineages of Neoaves birds. Waterbirds showed convergent changes in keratin genes that make up feathers, whereas vocal learning birds (songbirds, parrots, and hummingbirds) showed convergent changes in genes involved in brain function. Vocal learning is a rare trait critical for spoken language in humans, and as such we also found that all vocal learning bird lineages and humans had shared convergent changes in expression of 40 to 60 genes involved in neural connectivity and motor behavior in song learning brain areas in birds and speech brain areas in humans. These findings demonstrate the power of genome-scale biology to address some of the most challenging questions in biology.

The gene novelty and gene enhancement game: adapting to life

Agostinho Antunes1 ,2
1CIIMAR/CIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal, 2Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal


The completion of the human genome sequencing in 2003 opened a new perspective into the importance of whole genome sequencing projects, and currently multiple species are having their genomes completely sequenced, from simple organisms, such as bacteria, to more complex taxa, such as mammals. This voluminous sequencing data generated across multiple organisms also provides the framework to better understand the genetic makeup of such species and related ones, allowing us to explore the genetic changes underlying the evolution of diverse phenotypic traits. Here, recent results from our group retrieved from comparative evolutionary genomic analyses of selected vertebrate species will be considered to exemplify how gene novelty and gene enhancement by positive selection might have been decisive in the success of adaptive radiations into diverse habitats and lifestyles.

Molecular evolution before the ancestors of the bacterial and archaeal domains

Johann Peter Gogarten1, Gregory Fournier2
1University of Connecticut, Storrs, Connecticut, USA, 2Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

Reconstruction of ancestral ribosomal protein sequences reveals an amino acid compositional bias for the bacterial branch that appears to reflect an echo from the gradual assembly of the genetic code. While the deep branches in many phylogenies derived from molecular data are occupied by extreme thermophiles, this echo from the assembly of the genetic code is not compatible with a hyperthermophilic Last Universal Common Ancestor. The compositional analysis of ancestral sequences, the late heavy bombardment (LHB) hypothesis, and consideration of tree shape suggest that the bottleneck and selection for hyperthermophily at the base of the archaeal and bacterial domains were due to the LHB, and that life is older than the LHB.

This conclusion is in agreement with molecular clock models applied to the evolution of Archaea, constrained by time-calibrating horizontal gene transfer events, which predict that the archaeal ancestor existed at or near the time of the LHB, with a most likely time estimate of ~3.9 Ga.


Sequence reconstruction for enzymes that diverged before the organismal LUCA confirms some amino acids as late additions to the genetic code, but also suggests that for most amino acids other tRNA charging mechanisms existed before the currently known aminoacyl tRNA synthetases diversified.

The Proteobacteria: Large Genome Flows and the Evolution of Photosynthesis

James Lake1, Jun Zhao2, Brooke Sarna1, HyunMin Koo3, Janet Sinsheimer1
1University of California, Los Angeles, Los Angeles, California, USA, 2Peking University, Beijing, China, 3University of Alabama, Birmingham, Birmingham, Alabama, USA


The most speciose prokaryotic phylum known is the Proteobacteria. It contains diverse free living-, pathogenic-, photosynthetic- and symbiotic species. Due to its large number of species it might be expected to provide strong phylogenetic support for a proteobacterial tree of life. But surprisingly sequence-based tree analyses are unable to resolve its topology. Here using ring analyses we reconstruct its tangled evolutionary history.

We find that the evolution of the Alpha-, Beta-, Gamma- and Delta-Proteobacteria cannot be represented by trees. But its evolution can be uniquely explained by a combination of divergences and convergences, or Rings. We identify and map the origins of major gene flows within the Proteobacteria and reconstruct the resulting rings (P < 10⁻¹¹). Specifically, we obtain strong evidence that the Proteobacterial Rings are the result of endosymbioses. We map the flows of photosynthesis, sulfur metabolisms, and aerobic and anaerobic life styles throughout the Proteobacteria. With one exception – a single large horizontal transfer of 647 genes from the Alphaproteobacteria to the Betaproteobacteria – we demonstrate that these flows are due to endosymbioses and not due to horizontal/lateral gene transfers (P < 5.2×10⁻¹⁰). Analyses based on gene presences and absences within genomes and on protein families strongly support a unique ring topology. These rings explain major evolutionary events within the Proteobacteria that trees cannot. These include the phylogenetic distributions of photosynthesis, aerobic life styles, and sulfur-based metabolisms.

Size matters: pre-adaptation in metabolic evolution

Martin Lercher1, Jonathan Fritzemeier1, Balazs Szappanos2, Balazs Papp2, Csaba Pal2
1Heinrich Heine University, Düsseldorf, Germany, 2Biological Research Center of the Hungarian Academy of Sciences, Szeged, Hungary

It has been proposed that the adaptation to one environment may facilitate later adaptations to further environments, a process termed pre-adaptation. We quantify the role of pre-adaptation by simulating evolution of bacterial metabolic networks across a network of environments, assessing viability through Flux-Balance Analysis. We find that E. coli K12 can adapt to most environments through the addition of 2-3 ‘beneficial’ reactions, with very few environments requiring more than 6 additional reactions. Adaptations are not independent: in many cases, the number of additional reactions can be reduced by first adapting to an ‘intermediate’ environment, confirming an important role of metabolic pre-adaptation for generalist bacteria such as E. coli. Our predictions of co-acquired beneficial gene pairs are confirmed by comparative genomics data from relatives of E. coli K12.

In contrast to the generalist E. coli, the reduced endosymbiont Buchnera aphidicola typically needs 80+ additional reactions to adapt to new metabolic environments. This difference between generalist and specialist metabolic networks confirms the general conclusions of the toolbox model of evolution, which posits that larger metabolic networks require fewer additional reactions (fewer new ‘tools’) in order to metabolize novel substrates. Thus, more complex metabolic systems are not reduced in their ability to evolve due to higher pleiotropic constraints (as originally proposed by R.A. Fisher), but instead adapt faster due to an already well-filled ‘toolbox’.

The evolution of strategic phenotypic heterogeneity

Rafael Pena-Miller1,5, Markus Arnoldini3,4, Martin Ackermann3,4, Robert Beardmore2
1Department of Zoology, University of Oxford, Oxford, UK, 2Biosciences, University of Exeter, Exeter, UK, 3Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland, 4Department of Environmental Microbiology, Eawag, Zurich, Switzerland, 5Center for Genomic Sciences, UNAM, Cuernavaca, Mexico

Bacterial communities need different strategies to survive in unpredictable environments. Sensing the surroundings and regulating their metabolic machinery accordingly in order to maximise fitness is a pervasive mechanism in microbial survival, but this strategy is not optimal if the energetic cost of expressing the sensing apparatus is high relative to the benefit of having it. An alternative strategy is known as bet-hedging, where a genetically clonal population is composed of multiple sub-populations, each expressing a different phenotype.

Although the potential benefits of non-genetic individuality and variable phenotypes between cells are clear, the underlying gene regulatory mechanisms that support it are not. Is phenotypic bistability a noise-driven stochastic process? Or does the cell dynamically, indeed almost deterministically (but naturally subject to stochastic forcing), regulate the observed phenotype? In this talk we will show, using mathematical models and in-silico evolution, that there are different gene regulatory networks that yield complex dynamics which can be evolutionary optimal at a population-level. The predicted single-cell profiles of gene expression will then be compared with the temporal patterns of gene expression of a virulence factor of the human pathogen Salmonella enterica serovar Typhimurium observed experimentally using microscopy and microfluidic devices.
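The argument for bet-hedging above can be illustrated with a toy calculation. This is only a minimal sketch, not the authors' model: the two phenotypes, the two equally likely environments, and all fitness values are hypothetical numbers chosen for illustration. It compares the expected log growth rate (the long-run growth measure in a fluctuating environment) of a pure population against a clonal population that splits itself between two phenotypes.

```python
import math

def long_run_growth(frac_a, p_env0, fitness):
    """Expected log growth rate per generation for a clonal population in which
    a fraction frac_a of cells express phenotype A and the rest phenotype B.
    p_env0 is the probability that a generation experiences environment 0."""
    rate = 0.0
    for env, p_env in ((0, p_env0), (1, 1.0 - p_env0)):
        # Mean fitness of the mixed population in this environment
        w = frac_a * fitness["A"][env] + (1.0 - frac_a) * fitness["B"][env]
        rate += p_env * math.log(w)
    return rate

# Hypothetical fitness values (environment 0, environment 1):
# A thrives in environment 0 and nearly dies in environment 1; B is the reverse.
FITNESS = {"A": (2.0, 0.1), "B": (0.1, 2.0)}

pure_a = long_run_growth(1.0, 0.5, FITNESS)  # everyone expresses A
pure_b = long_run_growth(0.0, 0.5, FITNESS)  # everyone expresses B
mixed = long_run_growth(0.5, 0.5, FITNESS)   # half A, half B (bet-hedging)

print(f"pure A: {pure_a:.3f}, pure B: {pure_b:.3f}, mixed: {mixed:.3f}")
```

With these made-up numbers both pure strategies have a negative long-run growth rate while the mixed population has a positive one, which is the basic intuition behind bet-hedging. The sketch says nothing about the gene regulatory mechanisms the abstract is concerned with.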

Number of genes controlling a quantitative trait in a hybrid zone

Jacob Schack Vestergaard1, Evan Twomey2, Kyle Summers2, Rasmus Larsen1, Rasmus Nielsen3
1Technical University of Denmark, Copenhagen, Denmark, 2East Carolina University, Greenville, NC, USA, 3University of California Berkeley, Berkeley, CA, USA

Ranitomeya imitator is a dendrobatid frog with a visually striking aposematic color pattern that forms a mimetic complex with several other species. The color pattern is highly polymorphic and tends to mimic the pattern observed in other dendrobatid frogs in the local area. Where different mimetic morphs come into contact, a gradient is formed in which frogs in the contact zone appear to be phenotypic intermediates of the pure mimetic morphs. R. imitator is an example of a species of great interest to evolutionary biologists, but with few genetic tools available due to its large genome size (9 Gb) and difficulties with captive breeding. To remedy this problem we have developed a number of methods that can be applied to systems such as R. imitator and that can help elucidate the genetic basis of the phenotype without access to sophisticated genetic tools. We first develop a new method for unsupervised quantification of phenotypes that avoids the biases inherent in subjective choices of phenotypic descriptors. We apply it to images of R. imitator and show that the position of a population in the transect can largely be predicted from phenotypes. We also develop new likelihood-based methods for determining the number of genes affecting a phenotype along a transect or a hybrid zone. The methods take advantage of genetic estimates of admixture proportions. We show that a model with a single gene of major effect has stronger statistical support than models involving several genes of major effect for the R. imitator transect. Finally, we use a related statistical model to show that several apparently distinct phenotypic traits are likely controlled by the same gene.

On the distributions of lengths of shared genome: population size, bottlenecks and mutational load

Angeles de Cara1, Frederic Austerlitz1, Luis Alberto García Cortés2
1Muséum National d’Histoire Naturelle, Paris, France, 2INIA, Madrid, Spain

The study of haplotypes, their linkage disequilibrium structure and diversity patterns has only become feasible with the advent of high density genotype and now whole genome sequence data. This has reopened the theoretical questions introduced by Fisher in 1954 about the distribution of junctions and of lengths of segments of identity, which so far, had been mostly explored theoretically. The power of such dense data sets to infer evolutionary processes remains to be explored, as it is unclear how to distinguish demographic processes from selective processes. Here we present simulation results for the distribution of lengths of segments of identity under various scenarios: constant population size, bottlenecks and populations which have accumulated deleterious mutations. These distributions not only provide insight into the history of the populations, but, furthermore, they appear as a useful tool to measure inbreeding and coancestry in the populations for which genealogies are not available. Therefore, our results are relevant not only in the context of disentangling the processes that the populations have undergone, but can also provide helpful insights on how the populations may respond to future changes in their environment.

De Novo Sequencing of Three Anole Lizards and Comparative Genomic Analysis of a Neotropical Adaptive Radiation

Marc Tollis1, Elizabeth Hutchins1, Walter Eckalbar1, Michael Crusoe1, Catherine May1, Jessica Stapley2, Elise Kulik1, Matt Huentelman3, Rebecca Fisher1,4, Kenro Kusumi1
1School of Life Sciences, Arizona State University, Tempe, AZ, USA, 2Smithsonian Tropical Research Institute, Balboa, Ancon, Panama, 3Translational Genomics Research Institute, Phoenix, AZ, USA, 4University of Arizona College of Medicine, Phoenix, AZ, USA

The repeated evolution of morphological adaptations to specific ecological niches makes Anolis lizards a spectacular example of adaptive radiation in terrestrial vertebrates, and an ideal model for comparative genomics. The complete genome of the green anole (A. carolinensis) has already provided insights to the evolution of genomic and phenotypic variation in vertebrates. A multi-species comparison within the Anolis genus would increase the power of studies seeking to understand the genomic bases of species diversification. We carried out de novo whole genome sequencing and assembly of three species, the grass anole (A. auratus), the bridled anole (A. frenatus), and the slender anole (A. apletophallus) from Panama. Analysis of the abundance and diversity of transposable elements within these genomes has revealed repetitive landscapes typical of non-mammalian vertebrates, yet variation between Anolis species is greater than what is observed across most mammals. This may have provided a genomic environment amenable to key adaptations during the Anolis radiation. Using well-defined models such as mouse and chicken, we identified orthologous genes integral to myogenesis and limb development, and are beginning to catalogue interspecific variation in protein-coding genes and cis-regulatory motifs related to the evolution of morphological diversity in this genus. Functional anatomical and histological studies are being performed to quantify the tail and hindlimb muscle groups of these species compared to A. carolinensis. Our ultimate goal is to identify the divergent alleles associated with ecological speciation, thus bridging the genotype-phenotype gap.

An informational constraint on genetic fidelity and its implications for early life

Steven Massey
University of Puerto Rico, San Juan, Puerto Rico


A ‘proteomic constraint’ on genetic fidelity has been proposed; it is proportional to the size of the proteome (the total number of codons in a genome) and approximates the amount of sequence-specific information in a genome. The larger the proteome, the higher the mutational load, which is proposed to lead to a greater selective pressure to evolve or maintain genetic fidelity. The concept was originally introduced to explain the malleability of the genetic code in some genomes; it was suggested that Crick’s Frozen Accident could be ‘thawed’ when proteome size and the proteomic constraint are reduced, as deviations from the genetic code would be more likely to be tolerated. This explains their frequency in mitochondria and intracellular bacteria. The first link between information content and mutation rates was made by Eigen’s quasispecies hypothesis, which proposes that mutation rates limit the information content of a genome. In direct contrast, we propose that proteome size exerts a constraint on mutation rates and other forms of error. This leads to a number of predictions that may be tested using comparative genomics approaches. Firstly, it is observed that mutation rates are inversely proportional to proteome size in a range of different genomes, as predicted from first principles. Secondly, the idea of a proteomic constraint implies that DNA repair should be more complex and efficient in larger proteomes. Consistent with this prediction, the number and diversity of DNA repair genes is found to be greater in larger bacterial, archaeal and viral genomes. These insights allow us to propose that error rates in early lifeforms should be high given smaller proteome sizes, and also allow a prediction to be made as to the mutation rate in the LUCA, given that ancestral genome reconstruction allows us to estimate the size of the LUCA proteome.
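One way to picture the first prediction: if the per-codon mutation rate u is inversely proportional to proteome size P, then u = k / P for some constant k, and a plot of log u against log P should have a slope of -1. The sketch below checks that slope with an ordinary least-squares fit. It is an illustration only; the constant K and the proteome sizes are hypothetical, the rates are generated from the assumed relationship rather than measured, and this is not the analysis performed in the abstract.

```python
import math

def loglog_slope(sizes, rates):
    """Least-squares slope of log10(rate) regressed on log10(size)."""
    xs = [math.log10(s) for s in sizes]
    ys = [math.log10(r) for r in rates]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical constant: expected mutations per proteome per replication
K = 0.003
# Hypothetical proteome sizes, in codons
sizes = [1e4, 1e5, 1e6, 1e7]
# Synthetic per-codon rates generated from the assumed relation u = K / P
rates = [K / s for s in sizes]

slope = loglog_slope(sizes, rates)
print(f"log-log slope: {slope:.3f}")
```

With real data the slope would be estimated with error, and a value close to -1 would be consistent with inverse proportionality.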

Bayesian Tree Inference on Whole-Genome Datasets is Possible!

Andre Aberer1, Kassian Kobert1, Alexandros Stamatakis1
1Heidelberg Institute for Theoretical Studies, gGmbH, Heidelberg, Germany, 2Karlsruhe Institute of Technology, Karlsruhe, Germany

Bayesian phylogenetic inference is a central and well-established technique in evolutionary biology. While many alignments do not consist of more than a dozen genes, larger datasets rapidly emerge: several phylogenomic datasets comprising more than 1,000 genes have been published. First whole-transcriptome and whole-genome alignments are currently being assembled.

To alleviate the apparent analytical bottleneck, we introduce ExaBayes, a software package engineered for conducting state-of-the-art Bayesian phylogenetic inferences on datasets of almost arbitrary size, constrained only by the amount of computational resources available. ExaBayes implements a set of proposals similar to that of MrBayes and is on par with it in terms of convergence performance. In ExaBayes, the tree inference process is parallelized at three distinct levels. Our approach for analyzing highly partitioned (>1,000) datasets accelerates parallel inferences by more than a factor of 20 for linked branch lengths and about two orders of magnitude when branch lengths are not linked across partitions. Two orthogonal memory saving techniques can be employed to reduce the excessive RAM requirements of large datasets by more than a factor of two, thus allowing small computer clusters to carry out ambitious large-scale inferences. ExaBayes uses the highly optimized likelihood function implementation of RAxML. The sequential version of ExaBayes is easy to install and use, while installation and deployment of the parallel version does not require more effort than comparable parallel Bayesian inference tools.

We demonstrate how we used ExaBayes on a simulated alignment comprising 200 taxa and 10,000,000 characters divided into 100 partitions using 8,192 cpu-cores on the Munich x86 supercomputer SuperMUC.

New features in Molecular Evolutionary Genetics Analysis (MEGA) software

Sudhir Kumar1, Koichiro Tamura2
1Arizona State University, Tempe, AZ, USA, 2Tokyo Metropolitan University, Tokyo, Japan

The Molecular Evolutionary Genetics Analysis (MEGA) software is developed for comparative analyses of DNA and protein sequences that are aimed at inferring the molecular evolutionary patterns of genes, genomes, and species over time. MEGA contains facilities for building sequence alignments, inferring phylogenetic histories, and conducting diverse molecular evolutionary analyses. These facilities are delivered through an extensive graphical user interface (GUI) with many visual tools for exploration of data and analysis results. In this presentation, I will provide an overview of many new additions to MEGA that are aimed at expanding the repertoire of molecular evolutionary analyses and making it easy to conduct all these analyses at high throughput. New facilities include a new Timetree Wizard for estimating divergence times for all branching points in large phylogenies (timetrees) and a suite of methods to forecast the deleteriousness of non-synonymous single nucleotide variants in human proteins. These new methods and all previous functionalities in MEGA are now available in a new command line version (MEGA-CC) that is optimized for iterative and integrated pipeline analyses. The new version also contains enhanced algorithms to search for the optimal trees under evolutionary criteria and implements more advanced memory management to better use 64-bit memory systems. Both GUI and command-line versions of MEGA can be downloaded from free of charge.

Holly M. Bik
UC Davis Genome Center, University of California, Davis, USA

Using environmental sequencing approaches, we now have the ability to deeply characterize biodiversity and biogeographic patterns in understudied, uncultured microbial taxa (investigations of bacteria, archaea, and microscopic eukaryotes using 454/Illumina sequencing platforms). However, the sheer volume of data produced from these new technologies requires fundamentally different approaches and new paradigms for effective data analysis. Scientific visualization represents an innovative method towards tackling the current bioinformatics bottleneck; in addition to giving researchers a unique approach for exploring large datasets, it stands to empower biologists with the ability to conduct powerful analyses without requiring a deep level of computational knowledge. Here we present Phinch, an interactive, browser-based visualization framework that can be used to explore and analyze biological patterns in high-throughput environmental datasets. Leveraging a close collaboration between UC Davis and Pitch Interactive (a data visualization studio in Berkeley, CA), this project takes advantage of standard file formats from computational pipelines in order to bridge the gap between biological software (e.g. QIIME) and existing data visualization capabilities (harnessing the flexibility and scalability of WebGL and HTML5).

Comparative Genomics of Rotifera

David Mark Welch1,3, Bette Hecox-Lea1,2
1Marine Biological Laboratory, Woods Hole, MA, USA, 2Northeastern University, Boston, MA, USA, 3Brown University, Providence, RI, USA

Rotifers, microinvertebrates made up of ~1000 cells, are major components of aquatic environments around the world and compose one of the largest non-arthropod taxa. The phylum likely occupies a basal position within Protostomia and is comprised of groups that each employ one of the major modes of animal reproduction: obligate sexuality (seisonid rotifers), facultative sexuality (monogonont rotifers), and obligate asexuality (bdelloid rotifers). In support of the Global Invertebrate Genome Alliance we report on our initial comparative analysis of rotifer genomes and transcriptomes, including the published genome of the bdelloid Adineta vaga, partial genomes of the monogononts Brachionus calyciflorus and B. manjavacas and the seisonid Seison nebaliae; and the transcriptomes of S. nebaliae, multiple isolates of Brachionus, and six species of bdelloids representing most of the taxonomic breadth of that class (Adineta vaga, A. ricciae, Habrotrocha rosa, Macrotrachela quadricornifera, Philodina rapida, and P. roseola). Phylogenomics has already been used to elucidate the relationship of rotifer groups. We describe the presence of genes involved in DNA damage repair and considered “vertebrate specific” due to their absence in standard Ecdysozoan models such as Drosophila and C. elegans, and suggest that the standard account of the evolution of these genes must be reconsidered. Many “meiosis specific” genes absent in the asexual bdelloids are also absent in facultatively sexual monogonont rotifers, indicating that caution is warranted in interpreting the content of bdelloid genomes. Horizontal gene transfer, well documented in bdelloids, is also present in other rotifers, though it is not as abundant.
However, the proliferation of genes involved in DNA damage repair and prevention does appear to be bdelloid-specific, consistent with the adaptation of bdelloids to desiccation-prone environments and the possibility that these genes play a novel role in the successful maintenance of bdelloid genomes in the absence of sexual recombination.

Functional divergence of Drosophila and mammalian duplicate genes

Raquel Assis1, Doris Bachtrog2
1Pennsylvania State University, University Park, PA, USA, 2University of California, Berkeley, Berkeley, CA, USA

Gene duplication is thought to play a key role in the evolution of phenotypic innovation. We recently developed a phylogenetic approach to identify duplicate genes with novel functions by quantifying differences between gene expression profiles of duplicates in closely related species. Application of our method to spatial gene expression profiles in Drosophila revealed that most young duplicates possess new functions, and that acquisition of new functions occurs rapidly. Conversely, in mammals, most young duplicates retain their ancestral functions, and functional divergence occurs slowly. This contrast is reminiscent of the faster rate of protein sequence evolution observed in Drosophila relative to mammals, which can be attributed to a greater efficiency of natural selection in Drosophila. Thus, our findings suggest that strong positive selection drives the acquisition of new functions in both Drosophila and mammalian duplicate genes.

The evolution of gene expression during speciation

Severin Uebbing, Hans Ellegren
Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden

We recently sequenced and assembled the genome of the collared flycatcher (Ficedula albicollis). By resequencing population samples from both collared flycatchers and its sister species, the pied flycatcher (F. hypoleuca), we found a heterogeneous landscape of genome differentiation between species. Given these striking large-scale patterns on the sequence level, differences in ecological traits such as feather coloration or behavioural traits might ultimately be caused by differences in gene expression patterns. Here we investigated what role gene expression divergence plays in the speciation process of this model system.

We sequenced RNA samples from a total of 20 individual birds from both species and sexes, which provided us with deep insight into many different aspects of expression divergence. A first finding was that differentially expressed genes were enriched within species differentiation peaks. Having sequenced samples from nine different tissues also allows us to compare contributions of different tissues to expression divergence. For example, expression differences were on average about ten times smaller in brain than in skin tissue.

We further show which factors are associated with expression variance and expression level divergence. By screening for genes with large between-species divergence and small within-species expression variance, we could identify genes with potential importance for speciation or local adaptation. Genomic sequence analyses allow the identification of gene regulatory motifs and their association with expression divergence. Putting all these pieces together provides the basis for a much deeper understanding of the processes at the interplay of gene expression divergence and speciation.

Contrasting influences of natural selection across multiple lepidopteran species

Matthew Aardema, Peter Andolfatto
Princeton University, Princeton, NJ, USA


Examination of genome-wide divergence patterns across broad suites of taxa has revealed large variance in the extent to which positive selection has shaped patterns of genetic diversity between species. For example, in a variety of Drosophila, half or more of the amino acid substitutions observed between species appear to be the result of positive selection. In contrast, estimates for humans suggest that only 10 – 20% of non-synonymous changes across the genome are the result of adaptive processes. The most commonly proposed explanation for this large difference is that the effective population size (Ne) of these species also varies significantly. The effective population size of an organism strongly influences how its genome will evolve, with the fate of new mutations (either advantageous or deleterious) being contingent on the size of the population they arise in. What influence this has on broader patterns of divergence, however, is still unclear. A number of other factors have also been invoked to explain observed variation in the degree to which selective forces have shaped genomes. These include historical demographic changes, population structure and/or variation in the degree of epistasis that occurs across the genome, influencing selective constraint. To date, few studies comparing closely related taxa of differing Ne have been carried out. Therefore, to address the question of how a species’ population size influences divergence patterns between it and other species, we here examine the relationship between Ne and signatures of selection using transcriptome data from four species of butterfly from two distinct families with large variations in Ne. We compare these species across two levels (within and between family), using a variety of methods to quantify the amount of amino acid substitution driven by positive selection (α). We also examine the distribution of fitness effects in each of the four populations.
With these comparisons, we show that the effective population size of a species strongly correlates with measures of adaptive evolution. This work lends strong support to the idea that the efficacy of natural selection to facilitate new adaptations is greatest in larger populations, and that smaller population sizes may constrain evolutionary change in a species. This work has implications for a variety of disciplines from phylogenetics to conservation biology.
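The abstract does not say which estimators of α were used. As a point of reference, one widely used estimator derived from the McDonald-Kreitman test is α = 1 − (Ds·Pn)/(Dn·Ps), where Dn and Ds are non-synonymous and synonymous substitutions between species and Pn and Ps are non-synonymous and synonymous polymorphisms within a species. The sketch below implements only that formula; the function name and the counts are hypothetical, and the authors may well have used other methods.

```python
def alpha_mk(dn, ds, pn, ps):
    """McDonald-Kreitman-based estimate of the proportion of amino acid
    substitutions driven by positive selection:
        alpha = 1 - (Ds * Pn) / (Dn * Ps)
    dn, ds: non-synonymous / synonymous substitutions between species
    pn, ps: non-synonymous / synonymous polymorphisms within a species"""
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    return 1.0 - (ds * pn) / (dn * ps)

# Hypothetical counts, for illustration only
alpha = alpha_mk(dn=100, ds=200, pn=50, ps=200)
print(alpha)  # 0.5
```

This simple estimator is known to be biased downward by slightly deleterious polymorphisms, which is one reason studies often combine several methods, as the abstract describes.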

Divergence with gene flow across the speciation continuum in Heliconius erato

Brian Counterman1, Megan Supple2, Riccardo Papa3, W. Owen McMillan2
1Mississippi State University, Mississippi State, MS, USA, 2Smithsonian Tropical Research Institute, Washington DC, USA, 3University of Puerto Rico, Rio Piedras, Puerto Rico


A key to understanding the origins of species is determining the evolutionary processes that drive the patterns of genomic divergence during speciation. Using whole genome sequencing of Heliconius butterflies, we examine patterns of divergence between parapatric and allopatric taxon pairs with varying levels of reproductive isolation. We examine these patterns around a locus responsible for a major phenotypic switch in wing color pattern that is under divergent selection and drives reproductive isolation through assortative mating. As predicted, we find that genomic divergence increases with the degree of reproductive isolation. Between the incipient species H. erato and H. himera, we find unexpectedly high levels of divergence across the genome given that reproductive isolation is incomplete and hybridization is common. This divergence between the incipient species is substantially higher than the differences between parapatric hybridizing races and between allopatric races, indicating that selection on color pattern and a lack of recent gene flow cannot explain the high levels of divergence between the incipient species. Our results suggest selection on multiple loci drives genome-wide divergence to accumulate early during speciation with gene flow.

Introgression and recombination shaping the evolution and population differentiation of an adaptive supergene in Heliconius butterflies

Annabel Whibley1,2, Mathieu Chouteau1,2, Lise Frézal1,2, Florence Prunier1,2, Mathieu Joron1,2
1CNRS Institute of Systematics, Evolution and Biodiversity, UMR 7205, Paris, France, 2Muséum National d’Histoire Naturelle, Paris, France

Butterfly mimicry is characterised by powerful selection shaping extraordinary wing pattern convergence between distantly related lineages. The Amazonian butterfly Heliconius numata maintains polymorphic mimicry across its entire range, every population harbouring multiple wing pattern forms mimicking, with high fidelity, each of the local mimicry groups. Polymorphism is controlled by a cluster of coadapted loci (a supergene) locked together by a complex of adjacent, polymorphic inversions. Structural polymorphism maintains differentiated haplotypes (supergene alleles) which differ in gene order and in nucleotide variation. We studied the origins and build-up of a co-adapted gene cluster using population genomics scans and whole-genome resequencing of populations of H. numata and closely related species across the entire range, ranging from the Andean foothills of Peru, Ecuador and Colombia to French Guiana and Atlantic Brazil. Contrasted patterns of shared polymorphisms within the supergene and genome-wide revealed that the supergene architecture in H. numata was initiated by a first inversion which shares a common origin with a non-sister species and was possibly transferred through ancient introgression. More recent inversions followed within the H. numata lineage, associated with a vigorous diversification of alleles controlling multiple distinct wing-pattern forms. The recombination landscape accorded perfectly with the precise location of inversion breakpoints and genotype-phenotype associations pinpointed sites associated with wing pattern regulation within the first inversion, revealing which genes underlie co-adaptations within the cluster. Finally, genome-wide demographic approaches revealed shallow geographic differentiation among H. numata populations, in contrast to some other species of the clade. 
Our results suggest how an initial inversion kick-started the formation of a complex genomic architecture associated with adaptive diversification under balancing selection, with far-reaching and unusual consequences in terms of demographics, population differentiation, and cladogenesis.

Evolution of protein spatial structures

Nick Grishin1,2
1UT Southwestern, Dallas, USA, 2HHMI, Dallas, USA


From the early days of protein structural biology, researchers have been puzzled by the resistance of protein spatial structures to evolutionary changes. This amazing structural robustness, combined with the limited number of available 3D structures, has led to a view that the abstract protein structure space is discrete, can be divided into a number of folds, and that protein evolution mostly proceeds within the framework of the same fold. Today, when the majority of protein structural patterns have been experimentally determined, we see that it may be more realistic to view protein space as a continuum of structures, with “folds” being transformed into each other, and thus evolutionary bridges between structurally different proteins should exist. Many examples of proteins with statistically significant sequence similarity but substantial structural differences have been documented. On one hand, the emergence of a new paradigm that protein structures are evolutionarily plastic and changeable has important applications for protein design and opens new frontiers in engineering of proteins that possess desired functional properties. On the other hand, the existence of proteins with similar sequences but different structures hinders homology-modeling methods, and our ability to detect such cases from sequence is crucial. An overview of mechanisms for fold changes in evolution will be presented and the best known examples of structural changes will be discussed. The most common molecular events in sequence evolution are point mutations, insertions/deletions and non-homologous recombination. Effects of these events on protein spatial structure will be shown. The problem of structural analogy versus homology will be touched upon.

Worldwide linguistic and genetic variation

Nicole Creanza1, Merritt Ruhlen1, Trevor Pemberton2, Noah Rosenberg1, Marcus Feldman1, Sohini Ramachandran3
1Stanford University, Stanford, CA, USA, 2University of Manitoba, Winnipeg, MB, Canada, 3Brown University, Providence, RI, USA

Inferences about human evolutionary history from large-scale genetic analyses are often interpreted in terms of other data types, such as the geographic locations or language classifications of sampled populations. However, previous studies have not fully integrated genetic, geographic, and linguistic data on a global scale. By synthesizing genetic data, geographic locations, and detailed linguistic information from globally distributed populations, we observed that genes and languages show broadly parallel patterns of decreasing complexity with distance from Africa; in linguistic data, however, there are numerous outliers from this trend. The serial reduction in genetic diversity with repeated founder events does not provide a mechanistic explanation that can be readily applied to languages, so we explore the geographical parallels in these data types and the evidence for both vertical and horizontal transmission of linguistic sounds.

Demography and the age of rare variants

Iain Mathieson1,2, Gil McVean1
1University of Oxford, Oxford, UK, 2Harvard University, Boston, USA


Rare variants are a rich and informative source of information about recent human demography. Patterns of rare variation across populations are informative about the structure and relationships between them, but by estimating the age of the variants, we can put this information into an historical context, and infer the dates of demographic events.


We develop an estimator that uses local haplotype information to estimate the age of rare variants across a range of frequencies and apply it to the 1000 Genomes Project Phase 1 dataset. This reveals enormous variation in the age of variants both across populations and across frequencies. The median age of variants present in the UK varies from 90 generations for variants at frequency 0.1% to 325 for variants at frequency 0.25%. Other populations harbor much older variants, with variants present in Luhya having median ages of 211 and 922 generations at these frequencies. Variants shared between populations are older than those within populations, though this effect is reduced as the frequency increases, since more common variants tend to predate population splits.


Different demographic scenarios leave distinctive patterns in the distributions of rare variants. We use this fact to infer possible histories for the populations involved, by finding patterns of splits, mixtures, and migrations that generate age distributions matching the observed patterns. A major advantage of our approach is that it is robust to uncertainty in the mutation rate, and thus provides an independent check on other estimators.

Balancing selection and maintenance of variation as a natural consequence of adaptation in diploids

Dmitri Petrov, Diamantis Sellis, Sandeep Venkataram, Zoe Assaf, Jamie Blundell, Philipp Messer
Stanford University, Stanford, CA, USA


Adaptation in diploid organisms is often modeled by focusing on marginal fitness of individual alleles as if they were evolving in a haploid system with twice the number of alleles. However, diploidy in sexual organisms (with or without recombination) imposes substantial new conditions for the invasion and maintenance of non-neutral alleles related to the dominance of their fitness effects. Specifically, recessive adaptive alleles are unlikely to participate in the adaptive process given that they cannot invade the population when rare, while recessive deleterious alleles can both persist in the population for long times and interfere with invasion of nearby adaptive alleles. Here I will discuss our modeling and some empirical results suggesting that adaptation in diploids is (i) likely to generally involve overdominant alleles and maintenance of genetic variation by balancing selection throughout the adaptive walk, (ii) generate less forward- and more backward-predictable adaptive paths compared to adaptation in haploids, and (iii) induce ephemeral balancing selection and “staggered sweeps” generated by the invasion of linked combinations of codominant/adaptive and recessive/deleterious alleles.

Confronting theory with data: The role of ploidy in evolution

Sarah Otto1, Aleeza Gerstein2, Jasmine Ono1
1University of British Columbia, Vancouver, BC, Canada, 2University of Minnesota, Minneapolis, MN, USA

The genetics of adaptation depend on the genomic context in which mutations arise. Ploidy level is one of the most important determinants of this context. The mutation rate per cell rises with ploidy level, but so too does the masking of new beneficial mutations. Theory for the impacts of ploidy on the rate of adaptation has identified the key properties that determine whether organisms with haploid, diploid, or polyploid genomes will evolve faster. Many of these properties, such as the dominance of beneficial mutations, remain poorly measured. In this talk, I summarize theory and experiments with yeast aimed at elucidating the impact of ploidy level on rates of adaptation.

Staggered sweeps: The obstruction of adaptation in diploids by recessive, strongly deleterious alleles

Zoe June Assaf, Jamie Blundell, Dmitri Petrov
Stanford University, Stanford, CA, USA

Hitchhiking was first seriously considered by evolutionary biologists in 1966 when William G. Hill and Alan Robertson described how genetic linkage between alleles can interfere with selective sweeps. In years since there have been many papers exploring the interplay between an advantageous mutation and its linked neutral, deleterious, or advantageous neighbors. This analysis was however largely limited to the study of codominant alleles and thus necessarily focused on the hitchhiking of weak deleterious alleles with stronger advantageous ones.
Here we show that strongly deleterious recessive mutations can indeed hitchhike with weaker codominant advantageous mutations. The frequency trajectory of the adaptive mutation in this case is dramatically altered and results in what we have termed a ‘staggered sweep’. It is named for its three-phased trajectory: (1) initially the two linked mutations have a selective advantage while rare and so increase in frequency together, then (2) at higher frequencies the recessive hitchhiker is exposed to selection and can cause balancing selection via heterozygote advantage (the ‘staggered’ phase), and (3) finally if the adaptive mutation recombines onto a new haplotype then it can finish its sweep to fixation.
We have characterized the dynamics of staggered sweeps through both analytics and forward simulations. We show that strongly deleterious mutations can both significantly decrease the probability of adaptive fixation events and prolong the time of selective sweeps. This effect extends for large genomic distances around the recessive deleterious allele, especially in small populations and for weaker adaptive events. In addition the signature of selection of successful staggered sweeps is sharply distorted compared to classical hard sweeps, and is similar to that of soft sweeps that appear to be located off-center from the true adaptive site. Our results show that adaptation in diploids will be especially slowed in small populations and for cases of weak adaptation, and suggest that studies of experimental evolution in obligately sexual diploids will need significant population sizes to escape the confounding effects of recessive deleterious variation.

Genomic evidence for ameiotic evolution and adaptation without sex in a bdelloid rotifer lineage.

Karine Van Doninck1, Jean-François Flot2, Boris Hespeels1, Olivier Jaillon3
1University of Namur, Namur, Belgium, 2Max Planck Institute for Dynamics and Self-Organization, Göttingen, Germany, 3CEA – Genoscope, Evry, France


Loss of sexual reproduction is considered an evolutionary dead end for metazoans, but bdelloid rotifers challenge this view as they appear to have persisted asexually for millions of years. However, current evidence does not exclude that they may engage in sex on rare, cryptic occasions. We found that the genome structure of a bdelloid rotifer, Adineta vaga, is incompatible with conventional meiosis. At gene scale, the genome of A. vaga is tetraploid and comprises both anciently duplicated segments and less divergent allelic regions. However, in contrast to sexual species, the allelic regions are rearranged and sometimes even found on the same chromosome. Such structure does not allow meiotic pairing; instead, we find abundant evidence of gene conversion, which may limit the accumulation of deleterious mutations in the absence of meiosis. Gene conversion may occur during mitotic recombination repair of broken DNA following cycles of desiccation and rehydration experienced by bdelloids in their temporary habitats.

Gene families involved in resistance to oxidation, carbohydrate metabolism and defence against transposons are significantly expanded in the genome of A. vaga, which may explain why transposable elements cover only 3% of the assembled sequence. Furthermore, 8% of the genes are likely to be of non-metazoan origin and were probably acquired horizontally. This apparent convergence between bdelloids and prokaryotes sheds new light on the evolutionary significance of sex.

Detecting local adaptation on standing variation

Jeremy Berg, Graham Coop
Dept. of Evolution and Ecology, UC Davis, USA

The hard sweep model, where a newly arisen beneficial mutation sweeps from its introduction to fixation, has dominated our understanding of the population genetics of adaptation and the genome-wide effect of linked selection. However, it is unlikely that the majority of adaptation occurs in this manner; for example, a selected allele could be present on a number of haplotypes because of multiple mutations during the sweep or because the allele arose from neutral standing variation (so-called soft sweeps). I describe how to use comparisons across populations to learn about selection on standing variation.

The accumulation of deleterious mutations during range expansions

Stephan Peischl1, Mark Kirkpatrick2, Laurent Excoffier1
1University of Bern, Bern, Switzerland, 2UT at Austin, Austin, TX, USA


We investigate the effect of spatial range expansions on the evolution of fitness when beneficial and deleterious mutations co-segregate. Using a mixture of individual-based simulations and analytical approximations, we find that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load. Reduced fitness due to the expansion load is not restricted to the wave front but occurs over a large proportion of newly colonized habitats. The expansion load can persist and represent a major fraction of the total mutation load thousands of generations after the expansion. The phenomenon of expansion load may explain growing evidence that populations that have recently expanded, including humans, show an excess of deleterious mutations. To test the predictions of our model, we analyze patterns of neutral and non-neutral genetic diversity in humans and find an excellent fit between theory and data.

Anatomy of an ongoing soft selective sweep in SE Asian malaria parasites

Tim Anderson1, Shalini Nair1, Marina McDew-White1, Fatma Bilgiç1, Ian Cheeseman1, Standwell Nkhoma1, Rose McGready2, Elizabeth Ashley3, Aung Pyae Phyo2, François Nosten2,3
1Texas Biomedical Research Institute, San Antonio, TX, USA, 2Shoklo Malaria Research Unit, Mahidol-Oxford Tropical Medicine Research Unit, Faculty of Tropical Medicine, Mahidol University, Mae Sot, Thailand, 3Centre for Tropical Medicine, Nuffield Department of Medicine, University of Oxford, Oxford, UK

Soft selective sweeps, in which multiple selected alleles spread through a population, are widely discussed in the adaptation literature but are difficult to detect using population genomic data, and few empirical examples are available. We describe an ongoing selection event driven by artemisinin treatment in malaria parasites. Artemisinin has been used on the Thailand-Burma border since 1990, but parasites showing resistance to this drug increased in frequency over the past decade. A major gene determining resistance has recently been identified. We sequenced the gene involved (Kelch gene, chr 13) and genotyped 40 flanking SNPs in >1500 parasites sampled between 2001 and 2013. We first observed mutations within the Kelch gene in 2003, and these rose to 75% frequency by 2013, driven by a selection advantage of 7-14% relative to wildtype alleles. Remarkably, we observed that 27 independent alleles, each bearing a different mutation within this gene, are spreading concurrently through this parasite population. Alleles characterized by the C580Y mutation appear to be replacing those with mutations at other positions. This dramatic selective event allows us to examine in detail the dynamics of competing alleles as they transit through the population, to determine the impact on flanking variation, and to test the power of different statistical approaches to detect soft selective sweeps. The rapid spread of one allele also raises the intriguing possibility that previously described hard sweeps may have originally involved multiple competing resistance alleles of which only one eventually survived.

Soft selective sweeps in complex demographic scenarios

Ben Wilson, Philipp Messer, Dmitri Petrov
Stanford University, Stanford, California, USA

The model of the selective sweep is a hallmark of population genetics theory. For adaptation from de novo mutation, we typically imagine a beneficial mutation arising in a single individual and sweeping to high frequency. When selective sweeps arise from mutations originating in multiple individuals within a population they are referred to as ‘soft’ selective sweeps, in contrast to the case of a ‘hard’ selective sweep where all individuals possessing the adaptive substitution are identical by descent. The probability of observing soft sweeps in populations of constant size is known to be largely determined by the population-scaled mutation rate, Θ. In populations that fluctuate in size, what determines the probability of observing a soft sweep is not known. We develop a generalized framework for calculating the probability of observing soft sweeps in populations with complex demographic scenarios. We demonstrate the existence of weak selection and strong selection limits that allow complex scenarios to be understood through a comparison of the timescale of the selective sweep and the timescale of demographic changes. Our results indicate that the signatures of adaptation left by selective sweeps may not be trivially understood without a detailed knowledge of the recent population demographic history and the selection strength.
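
To make the constant-size case concrete, here is a minimal Python sketch. It is not taken from the abstract. It uses the Ewens-sampling-formula approximation from the soft-sweep literature (Pennings and Hermisson, 2006), in which the chance that a sample of n adaptive lineages traces back to more than one mutational origin is 1 − ∏ i/(i + Θ) for i = 1..n−1. The function name and the example values of Θ are illustrative assumptions, and how Θ is scaled (for instance 2Nμ versus 4Nμ) depends on the model convention.

```python
def soft_sweep_probability(theta: float, n: int) -> float:
    """Approximate probability that a sample of n adaptive alleles
    descends from more than one independent mutation, for a
    constant-size population with population-scaled mutation rate theta.

    Assumption: Ewens-sampling-formula approximation; not the
    generalized framework described in the abstract.
    """
    prob_single_origin = 1.0
    for i in range(1, n):
        # Probability that lineage i+1 shares an origin with the first i.
        prob_single_origin *= i / (i + theta)
    return 1.0 - prob_single_origin


if __name__ == "__main__":
    # Hypothetical theta values, chosen only to show the trend.
    for theta in (0.01, 0.1, 1.0):
        print(theta, round(soft_sweep_probability(theta, n=20), 3))
```

Under this approximation soft sweeps are rare when Θ is well below 1 and become common as Θ approaches 1. That is the sense in which Θ "largely determines" the outcome at constant population size.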

Widespread heterozygote advantage in diploids

Diamantis Sellis, Daniel Kvitek, Barbara Dunn, Katja Schwartz, Gavin Sherlock, Dmitri Petrov
Stanford University, Stanford, CA, USA

Balancing selection due to heterozygote advantage was once considered to be the main reason for the observed variation in nature and the cause of heterosis. However, well documented cases of heterozygote advantage are few and it is now considered to be a rare exception in adaptive evolution. Previously, we proposed a model for adapting diploids which predicted that heterozygote advantage is a common outcome of adaptation in diploids. To test this hypothesis we explore the fitness relationship of homozygote and heterozygote gene deletions from the yeast deletion collection and find that heterozygote advantage is indeed common in multiple environmental conditions. We also experimentally evolve diploid yeast clones in continuous cultures under glucose limitation and isolate and sequence adaptive clones. Using classical genetics we construct homozygotes for the evolved mutations and perform direct fitness comparisons. We again find that the first mutations that repeatedly appear independently in the adapting populations are more fit as heterozygotes. These findings indicate that heterozygote advantage is indeed a natural outcome of adaptation in diploids.
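
For readers who want the textbook picture behind "heterozygote advantage", the sketch below iterates the standard one-locus, two-allele selection recursion. It is not the authors' model. It assumes relative fitnesses of 1 − s for AA, 1 for Aa and 1 − t for aa, for which the stable equilibrium frequency of A is t / (s + t). The values of s and t are hypothetical.

```python
def next_freq(p: float, s: float, t: float) -> float:
    """One generation of deterministic selection on allele A.
    Fitnesses (assumed): AA = 1 - s, Aa = 1, aa = 1 - t."""
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return (p * p * w_AA + p * q * w_Aa) / mean_w


def equilibrium_freq(s: float, t: float) -> float:
    """Analytical equilibrium frequency of A under overdominance."""
    return t / (s + t)


def iterate(p0: float, s: float, t: float, generations: int) -> float:
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


if __name__ == "__main__":
    s, t = 0.1, 0.3  # hypothetical selection coefficients
    print(iterate(0.01, s, t, 2000), equilibrium_freq(s, t))
```

Starting from a rare allele, the recursion converges to the analytical equilibrium instead of fixation. This is why overdominant adaptive mutations maintain variation in a population.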

Recruitment of ancient genes into endometrial expression underlies the evolution of mammalian pregnancy

Vincent Lynch
The University of Chicago, Chicago, IL, USA



A long-standing challenge in biology is explaining how evolutionarily novel characters originate; however, mechanistic explanations for the origin of novelties such as pregnancy in mammals are almost completely unknown. The evolution of pregnancy is an excellent system in which to study the origin of novelties because extant mammals preserve major stages in the transition from egg-laying to live-birth. To identify genes that underlie the evolution of mammalian pregnancy we sequenced the uterine transcriptome from eight pregnant Eutherian mammals (human, Rhesus monkey, mouse, dog, cow, horse, pig, and armadillo), a pregnant marsupial (short-tailed opossum), a pregnant Monotreme (platypus), and two sauropsid out-groups (lizard and chicken) using high-throughput Illumina sequencing (mRNA-Seq) and inferred which genes evolved and lost uterine expression using parsimony. We found that hundreds of genes evolved uterine expression during the origins of pregnancy in mammals; key innovations include the recruitment of genes that suppress estrogen signaling (HAND2, DKK1, PTCH1 and PTCH2), mediate blastocyst/trophoblast attachment (SELL and VCAM1) and invasion (integrins and regulators of RHOA/RAC1 GTPase signaling) into the endometrium, and that establish maternal-fetal communication and maternal immunotolerance of the fetal allograft (IL15, CSF1, PD-1 ligands). Our results indicate that the recruitment of ancient genes into an evolutionarily novel tissue and the co-option of an antigen presenting cell (APC)-like identity by endometrial stromal cells to allow for local regulation of immune cells in the uterus played an essential role in the origins of pregnancy.
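
The abstract does not say which parsimony procedure was used. As a rough illustration of the idea, the following sketch applies Fitch parsimony to a binary "expressed in uterus" trait on a small rooted tree. The tree shape, the reduced species set and the expression states are hypothetical placeholders, not data from the study.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch parsimony on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple of subtrees
    states: dict mapping leaf name -> 0 (not expressed) or 1 (expressed)
    Returns (candidate states at this node, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one gain or loss.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Hypothetical topology and expression calls for a single gene.
    tree = ("chicken", ("platypus", ("opossum", ("human", "mouse"))))
    expressed = {"chicken": 0, "platypus": 0, "opossum": 1, "human": 1, "mouse": 1}
    root_states, changes = fitch(tree, expressed)
    print(root_states, changes)
```

With these placeholder calls the most parsimonious history has an unexpressed root and a single gain of expression. This is the kind of event the abstract describes as a gene "evolving" uterine expression.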

Integrative Genomic Studies of Evolution and Adaptation in Africa

Sarah Tishkoff
Departments of Genetics and Biology, University of Pennsylvania, Philadelphia, PA, USA


Africa is thought to be the ancestral homeland of all modern human populations. It is also a region of tremendous cultural, linguistic, climatic, and genetic diversity. Despite the important role that African populations have played in human history, they remain one of the most underrepresented groups in human genomics studies. A comprehensive knowledge of patterns of variation in African genomes is critical for a deeper understanding of human genomic diversity, the identification of functionally important genetic variation, the genetic basis of adaptation to diverse environments and diets, and the origins of modern humans. Furthermore, a deeper understanding of African genomic variation will provide the necessary foundation for powerful and efficient genome-wide association and systems biology studies to identify coding and regulatory variants that play a role in phenotypic variation including disease susceptibility. We have used whole genome SNP genotyping and high coverage sequencing analyses to characterize patterns of genomic variation, ancestry, and local adaptation across ethnically and geographically diverse African populations. We have identified candidate loci that play a role in adaptation to infectious disease, diet and high altitude, as well as the short stature trait in African Pygmies. Additionally, our studies shed light on human evolutionary history and African population history.

Genome architecture matters for very rapid evolution in threespine stickleback

Julian Catchen1, Susan Bassham1, Emily Lescak2, Frank von Hippel2, William Cresko1
1University of Oregon, Eugene, OR, USA, 2University of Alaska Anchorage, Anchorage, AK, USA

How does variation in genome architecture influence very rapid adaptation in the wild? We are using deep sequencing population genomics to address this question using threespine stickleback fish. As the last ice age ended and the glaciers retreated from coastal regions of Alaska, marine stickleback independently colonized newly-created freshwater lakes and evolved rapidly and in parallel in numerous phenotypes such as lateral plates, pelvic structure and head morphology. Laboratory mapping studies have located genomic regions associated with some of the phenotypic variation, and genome scans of wild fish have identified numerous regions that consistently differ between freshwater and marine stickleback. However, it is still unclear how the complement of genetic variation for the freshwater phenotype is so rapidly assembled. One hypothesis is that given enough time hundreds or thousands of freshwater alleles segregating at low frequency in the marine population will be assembled in the freshwater populations. However, is this mechanism plausible if reassembly occurred much more rapidly, say in 50 years? We present a study of stickleback populations that colonized newly formed freshwater habitats on Middleton Island off the coast of Alaska following a massive earthquake in 1964. Our data support an alternative hypothesis: that a significant component of the stickleback genome is segregating in large blocks, that at least some of these blocks are associated with structural variation, and that this genomic architecture may facilitate reassembly of the freshwater phenotype in just a few decades.

Evolution of Genome Structure in the Vertebrate Lineage

Jeramiah Smith
University of Kentucky, Lexington, KY, USA

The sequencing and assembly of genomes from diverse vertebrate taxa is shedding new light on the deep ancestry and evolution of vertebrate genomes. Data from species including lamprey (Petromyzon marinus), spotted gar (Lepisosteus oculatus), coelacanth (Latimeria chalumnae) and amphibians (Ambystoma mexicanum and Xenopus tropicalis) reveal that karyotypic evolution occurred at a relatively slow rate over much of vertebrate ancestry, with more rapid rates being characteristic of mammals and teleost fish lineages. A striking example of this evolutionary conservatism lies in the observation that many chicken microchromosomes are the direct homologs of entire chromosomes in other vertebrate lineages.

These broadly informed comparative analyses also provide new detail regarding the effect of duplication over vertebrate ancestry. A recent assembly of the spotted gar genome reveals that a whole genome duplication in the teleost lineage (known as the teleost specific duplication – TSD) was predated by a large number of fusion events, but is not generally associated with increased rates of rearrangement following duplication. Large scale mapping and genome assembly data from lamprey reveal an alternative to the classical hypothesis that the ancestral vertebrate lineage experienced two rounds of whole genome duplication (known as the 2R hypothesis), supporting a more conservative scenario involving a single whole genome duplication and a small number of defined segmental duplication events.

Data from lamprey (and other species) also reveal that the relative evolutionary stasis of vertebrate genomes is often contrasted by dramatic developmental alterations of genome structure (i.e. programmed genome rearrangements) that result in genomic differentiation of somatic and germline cell lineages. Altogether, these recent sequencing studies are providing a clearer picture of the last 600 million years of vertebrate evolution, refining the focus of future comparative studies.

Genome scans from multiple populations with complex genetic structure: detecting local adaptation and parallel evolution at different scales.

Matthieu Foll
School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

Detecting genes involved in local adaptation is challenging and of fundamental importance in evolutionary, quantitative, and medical genetics. Genome scans performed in a large number of populations from different origins and environments are becoming increasingly common, and one would ideally like to analyze all these data simultaneously to find markers under selection. However, unaccounted population history or population sub-structure may lead to an excess of false positives in meta-analyses. Therefore data sets are often split into a series of pairwise analyses. This procedure leads to a global loss of power as compared to a large single analysis, and it requires multiple tests corrections taking into account non-independence between analyses.

In order to alleviate these problems, we introduce a new hierarchical Bayesian method to detect markers under selection from genome scans. We identify selection as a deviation from a neutral model allowing for populations with complex histories clustered in different large geographic regions. Our method is very versatile as it can distinguish between local adaptation, adaptation between regions, as well as convergent evolution between regions.

We show the power of our method by re-analyzing a large SNP genome scan performed in low- and high-altitude populations from America and Asia. In addition to confirming previous candidate loci, the simultaneous analysis of these two geographic areas identifies several new genomic regions involved in adaptation to altitude. Unlike previous studies, we also show that convergent evolution is common in these populations, and we identify SNPs, genes, biological processes and pathways involved in high altitude adaptation in both Andeans and Tibetans.

Selection and demography: where are we now?

Rasmus Nielsen1,5, Michael DeGiorgio3, Thorfinn Korneliussen5, Christian Huber2, Ines Hellmann2, Anna Ferrer-Admetlla4, Mason Liang1
1UC Berkeley, Berkeley, CA, USA, 2University of Vienna, Vienna, Austria, 3Penn State University, University Park, Pennsylvania, USA, 4EPFL, Lausanne, Switzerland, 5Natural History Museum of Denmark, Copenhagen, Denmark


Identifying regions of a genome that may have been targeted by natural selection is one of the fundamental challenges in population genomic analyses. In most cases, selection cannot be directly linked to a phenotype for which differences in fitness can actually be measured. It is notoriously difficult to reliably measure differences in fitness in natural populations, and even in cases where fitness can be measured, the fitness effects may be so small, or may have affected the population in the past, so that a direct connection between selection inferred at the genetic level and fitness differences at the population level cannot be established. For this reason, results from studies aimed at detecting selection at the DNA sequence level can almost never be verified using direct observations. Consequently, the robustness of the methods for detecting selection, to violations of model assumptions, has been a topic of substantial research focus. While no method is fully robust to all model violations, some methods are more robust than others. In this talk I will review recent progress on developing robust methods and compare the robustness of different methods for inferring selection.


Co-varying genotypic changes associated with selective mortality at sea in Atlantic salmon (Salmo salar)

Louis Bernatchez
University Laval, Quebec, QC, Canada

Over the last 30 years, wild populations of Atlantic salmon have declined worldwide and increased mortality at sea is predicted to be one of the major contributing factors to this decline. Examining the potential changes occurring in the genome-wide composition of populations during this migration has the potential to tease apart some of the factors influencing marine mortality. We genotyped 5568 SNPs in populations representing two distinct regional genetic groups and across two cohorts to test for differential allelic and genotypic frequencies between juveniles migrating to sea and adults returning to reproduce. We contrasted the outcomes of a single-locus FST-based genome scan method with a new multi-locus framework to test for genetically-based differential mortality at sea. Numerous outliers were identified by the single-locus analysis but no evidence for parallel temporally repeated selection was found. In contrast, the multi-locus approach detected repeated patterns of selection for a multi-locus group of 34 SNPs in one of the two populations but not the other, suggesting different causes of mortality among populations. These results support the hypothesis that selection mainly causes small changes in allele frequencies among many co-varying loci rather than a small number of changes in loci with large effects.
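
As a reminder of what a single-locus FST scan measures, here is a minimal sketch of Wright's FST for one biallelic SNP sampled in two groups, for example outgoing juveniles and returning adults. The abstract does not state which estimator was used. This version assumes equal group sizes and ignores sampling error, and the allele frequencies are hypothetical.

```python
def fst_two_groups(p1: float, p2: float) -> float:
    """Wright's FST = (H_T - H_S) / H_T for one biallelic locus,
    given the frequency of the same allele in two equally weighted groups."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)            # pooled expected heterozygosity
    if h_total == 0.0:
        return 0.0                                   # locus is monomorphic overall
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    # Hypothetical frequencies: a modest shift between life stages.
    print(fst_two_groups(0.40, 0.60))
```

A frequency shift of this size gives a small FST at any one SNP. This is consistent with the abstract's point that selection spread over many co-varying loci can be hard to see one locus at a time.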

Ancient DNA reveals the complex genetic history of the New World Arctic

Maanasa Raghavan1, Pontus Skoglund2, Michael DeGiorgio6, Anders Albrechtsen4, Ida Moltke5, Helena Malmström2, M. Thomas P. Gilbert1, Mattias Jakobsson2, Rasmus Nielsen3, Eske Willerslev1
1Centre for GeoGenetics, Natural History Museum of Denmark, Copenhagen, Denmark, 2Uppsala University, Uppsala, Sweden, 3University of California – Berkeley, Berkeley, California, USA, 4University of Copenhagen, Copenhagen, Denmark, 5University of Chicago, Chicago, Illinois, USA, 6Pennsylvania State University, University Park, Pennsylvania, USA


The New World Arctic (North America and Greenland) was first occupied by modern humans around 5,000 years ago. The PaleoEskimos constituted the first two cultures to have peopled the region: the Pre-Dorset or Saqqaq culture (ca. 3000-800 BC) and the Dorset culture (ca. 800 BC-1300 AD). The NeoEskimos (Thule culture), who are considered to be ancestral to modern-day Inuit, were the latest migrants into the New World Arctic and spread eastwards from northern Alaska around 1000 AD. However, despite decades of archaeological research having established when the cultural transitions occurred, there is no consensus on how these people were related to one another and whether one or several gene pools were represented in these different Arctic traditions. We present results from an ongoing study comprising the largest genomic dataset generated thus far on ancient human samples from sites in Siberia, Alaska, Canada and Greenland. Our research contributes new perspectives to the debate of cultural versus genetic replacement in the New World Arctic and also evaluates the extent to which the PaleoEskimos and the NeoEskimos have shaped the genetic structure of modern populations in the region.

Population history of South America: ancient DNA study of extinct people from Tierra del Fuego

Zuzana Faltyskova1, Hannes Schroeder2,3, Carles Lalueza4, Yolanda Espinoza4, Elena Gigli4, Oscar Ramirez4, Alfredo Prieto5,6, Susana Morano5, David Caramelli7, Elena Pilli7, Alessandra Modi7, Giorgio Manzi7, Alessandro Pietrelli8, Ermanno Rizzi8, Aurelio Marangoni9, Guido Barbujani10, Silvia Ghirotto10, Toomas Kivisild1, Maru Mormina1,11
1Division of Biological Anthropology, University of Cambridge, Cambridge, UK, 2Centre for GeoGenetics, University of Copenhagen, Copenhagen, Denmark, 3Faculty of Archaeology, Leiden University, Leiden, The Netherlands, 4Institute of Evolutionary Biology, Pompeu Fabra University, Barcelona, Spain, 5Institute of Patagonia, University of Magallanes, Punta Arenas, Chile, 6Autonomous University of Barcelona, Barcelona, Spain, 7Department of Biology, University of Florence, Florence, Italy, 8ITB CNR Institute for Biomedical Technologies, National Research Council, Milan, Italy, 9Department of Environmental Biology, University of Rome La Sapienza, Rome, Italy, 10Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy, 11Department of Archaeology, University of Winchester, Winchester, UK

The details of the early human settlement of the Americas such as the dispersal time, number of migrations, and migration routes remain subject to debate. With many Native populations now extinct, the Pre-Columbian genetic make-up has been partly lost or blurred by recent admixture. The present study examines the mitochondrial genetic diversity of extinct Fuegian populations in order to illuminate the population history of South America.

The Fuegians lived on the islands of Tierra del Fuego in the Southern Cone of South America in isolation from other Native Americans until their extinction at the beginning of the 20th century, likely maintaining their original genetic signature without recent admixture. Based on the Fuegian robust cranial morphology, a few controversial studies have suggested that Fuegians might be descendants of a putative earlier migration wave preceding the arrival of the other Native Americans.

Using target enrichment and next-generation sequencing, we obtained complete mitochondrial genomes from skeletal remains of 37 Fuegians and 19 individuals from adjacent Patagonia. Comparing them to published sequences of other Native Americans, we estimated the divergence times and past population dynamics in the Southern Cone and we assessed the question of population continuity in the region. The coalescent ages of deep Fuegian-specific clades suggest early human settlement in Tierra del Fuego, probably associated with the initial peopling of the continent. The early arrival of Fuegians to the Southern Cone is consistent with the generally accepted scenario of rapid coastal dispersal throughout the Americas, which is further supported by the presence of Monte Verde, the oldest known South American pre-Clovis archaeological site, in Chilean Patagonia. In this presentation, alternative views on Fuegian origins and their genetic affinities with other Native Americans are considered in the context of the evolutionary history of South American populations.

Tracing the genetic ancestry of enslaved Africans using ancient DNA

Hannes Schroeder1,2, María C. Ávila-Arcos1,4, Pontus Skoglund3, Meredith Carpenter4, Anna Sapfo Malaspinas1, Marcela Sandoval-Velasco1, Jose Víctor Moreno-Mayar1, Morten Rasmussen1,4, Jay B. Haviser2, Ludovic Orlando1, Antonio Salas5, Carlos Bustamante4, Mattias Jakobsson3, M. Thomas P Gilbert1
1Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark, 2Faculty of Archaeology, Leiden University, Leiden, The Netherlands, 3Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, Sweden, 4Center for Computational, Evolutionary and Human Genomics, Stanford, California, USA, 5Instituto de Ciencias Forenses ‘Luís Concheiro’, Universidade de Santiago de Compostela, Santiago de Compostela, Spain

Between the 16th and 19th centuries, over 12 million Africans were kidnapped in Africa and transported to the Americas as a result of the transatlantic slave trade. The captives were taken from various parts of mainly West and West Central Africa but their precise origins often remained unknown or were deliberately obscured. In this study, we sequenced enriched DNA libraries from 17th century remains of three enslaved Africans, who had died on the Caribbean island of Saint Martin, in an attempt to trace their ancestral origins in Africa. Our results show that the three captives, who had been buried together, are genetically related to different populations in Africa, including Bantu and non-Bantu speakers. This suggests that they might have originated from different parts of Africa and reflects upon the nature of the transatlantic slave trade and its role in shaping the population history of the Americas.

A framework to infer fitness landscapes from experimental evolution data

François Blanquart, Thomas Bataillon
Bioinformatics Research Centre – University of Aarhus, Aarhus, Denmark

The process of adaptation results from natural selection acting on existing genetic variation. Many fundamental features of this process, such as its speed or repeatability, depend on the rate at which beneficial mutations appear, on population size and, perhaps more importantly, on the fitness landscape. The fitness landscape defines the mapping between genotypes and fitness, and thus encapsulates several important properties such as the distribution of fitness effects of new mutations, the magnitude and sign of epistasis among mutations, or the level of ruggedness. These properties of fitness landscapes can now be determined empirically in microbial species thanks to the rise of experimental evolution and increasingly sophisticated molecular techniques. Yet, no generic method has been proposed to link more directly the growing amount of experimental data to theoretical models of fitness landscapes.

Here we develop a flexible statistical framework based on Approximate Bayesian Computation to infer the parameters of a fitness landscape model using experimental evolution data. We focus on experiments where a set of mutations (constructed, or evolving naturally or in experiments) is identified, and where the fitness of several genotypes carrying combinations of these mutations is measured. We develop a statistical method that fits this type of data to a broad class of phenotypic fitness landscape models. More precisely, we use a generalized version of Fisher’s geometric model, whereby fitness is determined by stabilizing selection in a multivariate phenotypic space, and mutations cause Gaussian deviations in this phenotypic space. In a first step, we test the power of the framework using Monte Carlo simulations, and show that a reasonable amount of data (fitness of 20 – 40 genotypes carrying 4-5 different mutations) allows accurate inference of the parameters of the landscape. In a second step, we use the framework to infer the properties of the landscape in several existing datasets. This analysis reveals that the experimental data accumulated so far in various species (virus, fungi and bacteria) is largely compatible with Fisher’s model. Thus, Fisherian fitness landscapes appear as a flexible and general tool to quantify and predict the dynamics of adaptation.
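
To make the landscape model concrete, here is a minimal Python sketch of a plain (not generalized) Fisher's geometric model: each mutation is a Gaussian displacement in phenotype space, and fitness decays with squared distance to the optimum. The number of traits, the mutation scale `sigma`, the selection `strength`, and all function names are illustrative assumptions, not the authors' parameterization. The sketch covers only the forward model that an ABC scheme would simulate repeatedly, not the inference itself.

```python
import itertools
import math
import random


def fitness(phenotype, optimum=None, strength=1.0):
    """Gaussian stabilizing selection around an optimum (default: the origin)."""
    if optimum is None:
        optimum = [0.0] * len(phenotype)
    d2 = sum((z - o) ** 2 for z, o in zip(phenotype, optimum))
    return math.exp(-0.5 * strength * d2)


def genotype_fitnesses(ancestor, mutations, strength=1.0):
    """Fitness of every genotype carrying a subset of the mutations.

    Mutation effects add up on the phenotype; epistasis for fitness
    emerges from the curvature of the fitness function.
    """
    out = {}
    for genotype in itertools.product((0, 1), repeat=len(mutations)):
        z = list(ancestor)
        for present, effect in zip(genotype, mutations):
            if present:
                z = [a + b for a, b in zip(z, effect)]
        out[genotype] = fitness(z, strength=strength)
    return out


# Hypothetical example: 4 random mutations in a 3-trait phenotype space.
rng = random.Random(1)
n_traits, sigma = 3, 0.3
ancestor = [1.0, 0.0, 0.0]
mutations = [[rng.gauss(0.0, sigma) for _ in range(n_traits)] for _ in range(4)]
landscape = genotype_fitnesses(ancestor, mutations)
```

With 4 mutations the dictionary holds all 16 genotypes, which is the scale of data (fitness of a few dozen genotypes) the abstract describes as sufficient for inference.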

Tracking hundreds of thousands of lineages in an evolving population allows determination of the beneficial mutation rate and elucidation of the distribution of their fitness effects

Sasha Levy1, Jamie Blundell2, Sandeep Venkataram2, Dmitri Petrov2, Daniel Fisher2, Gavin Sherlock2
1Stony Brook University, Stony Brook, New York, USA, 2Stanford University, Stanford, California, USA

Current experimental evolution studies identify some beneficial mutations but lack the resolution to place these mutations in a broader context of population dynamics. Contributing to this disconnect is a limited understanding of the rate at which beneficial mutations accumulate and the distribution of their selection coefficients, two population parameters that control the extent of clonal interference and thereby the impact of any given beneficial mutation on the population. We have developed a method to track hundreds of thousands of lineages during experimental evolution studies. By tracking the relative frequencies of lineages over time, we observe the granular dynamics of an evolving population and measure, at high resolution, its population parameters. We find a high rate of beneficial mutations and a roughly exponential distribution of selection coefficients. Knowledge of the population size, beneficial mutation rate, and the distribution of beneficial selection coefficients is sufficient to predict the initial population dynamics, including a mass extinction of lineages that lack a beneficial mutation at ~100 generations. These results suggest that, at most naturally occurring population sizes, population dynamics are dominated by clonal interference rather than clonal sweeps. Using this system, we are also able to isolate independent lineages for sequencing, and more fully characterize the spectrum of beneficial mutations, and map those mutations back to the fitnesses derived from our lineage tracking, to determine how mutation of different pathways modifies fitness (see abstract #0620 by Venkataram).
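
As a rough illustration of how a selection coefficient can be read off a lineage's frequency trajectory, the Python sketch below fits the slope of log frequency against generation. It assumes the lineage is rare and that the population's mean fitness stays constant over the window, so frequency grows roughly as exp(s·t). This is a textbook simplification with hypothetical names and simulated numbers, not the authors' estimation procedure.

```python
import math


def estimate_selection_coefficient(generations, frequencies):
    """Least-squares slope of ln(frequency) against generation."""
    logs = [math.log(f) for f in frequencies]
    n = len(generations)
    mean_t = sum(generations) / n
    mean_y = sum(logs) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(generations, logs))
    sxx = sum((t - mean_t) ** 2 for t in generations)
    return sxy / sxx


# Simulated trajectory of one rare lineage with a true advantage of 5% per generation.
gens = [0, 8, 16, 24, 32]
freqs = [1e-6 * math.exp(0.05 * t) for t in gens]
s_hat = estimate_selection_coefficient(gens, freqs)
```

In real data the mean fitness rises as adaptive lineages expand, so the apparent slope of any one lineage flattens over time; that is the clonal interference effect the abstract emphasizes.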

Widespread domestication of degraded prophages by their bacterial hosts

Louis-Marie Bobay1,3, Marie Touchon1,2, Eduardo Rocha1,2
1Institut Pasteur, Paris, France, 2CNRS, Paris, France, 3UPMC, Paris, France


Integrated phages (prophages) are major contributors to the diversity of bacterial gene repertoires. Bacterial molecular systems involved in secretion, defense, warfare and gene transfer have resulted from the domestication of prophages. How the fast turnover of prophages translates into domestication processes remains unclear. We used comparative genomics to study the evolution of prophages within the bacterial genome. We identified over 300 vertically inherited prophages within enterobacterial genomes. Some of these elements are very old and a few might even be shared by Escherichia coli and Salmonella enterica. The size distribution of prophage elements is bimodal, suggestive of rapid prophage inactivation followed by much slower gene degradation. Accordingly, we observed a pervasive pattern of strong purifying selection acting on almost all vertically inherited prophages. Importantly, purifying selection is observed not only on accessory regions, but also in core phage genes, such as structural genes and lysis genes. This suggests strong selection by the bacterial host for phage-associated functions. Several of these conserved prophages show gene repertoires compatible with described functions of prophage-derived elements such as killer particles, gene transfer agents or satellite prophages, but few seem to be definitely stabilized in the host genomes. We suggest that bacteria often transiently domesticate their prophages after the initial accumulation of inactivating mutations. These domestications are frequently short-lived because they are replaced by more recent elements. This puts the bacterial genome in a state of continuous flux of acquisition and loss of phage-derived adaptive genes.

Generation-Proxy Selection Mapping in Cattle and Maize

Jared Decker1, Jeffrey Ross-Ibarra2, Robert Schnabel1, Justin Gerke3, Michael McMullen1, Jeremy Taylor1
1University of Missouri, Columbia, MO, USA, 2University of California-Davis, Davis, CA, USA, 3DuPont Pioneer, Des Moines, Iowa, USA

We recently developed a method, Generation-Proxy Selection Mapping (GPSM, pronounced “gypsum”), to identify loci responding to ongoing selection on complex or quantitative phenotypes while accounting for kinship and population structure. GPSM is designed to identify “soft sweeps” or polygenic selection. In cattle, we fit birth date as the continuous dependent variable and generation-proxy in a mixed-model genome-wide association analysis (which we originally called Birth Date Selection Mapping). In maize, we fit breeding cycle (discrete) or donation date (continuous) as the generation-proxy dependent variable again in a polygenic model. Loci at which drift is the primary force influencing allele frequency do not experience large and directionally consistent changes in allele frequency and are not strongly predictive of a generation-proxy, such as birth date or cycle. However, loci under directional selection (natural or artificial) have relatively large, consistent changes in allele frequency over time and are strongly predictive of generation-proxies. The use of a genomic relationship matrix in the mixed-model analysis explicitly accounts for relationships between samples, and deconvolutes changes in allele frequency due to selective forces and demographic changes. We previously analyzed 3,570 Angus animals born over a 50 year period and identified selected regions putatively involved in immune function, growth, and reproductive traits. In this work we report the analysis of 3,664 Angus, 927 Hereford, 3,192 Holstein, 2,256 Limousin, and 1,149 Simmental with imputed genotypes for 434,206 SNPs. We again identified selected genomic regions harboring genes with immune function, but also identify selection at loci controlling coat patterns and the absence of horns. Further, we compare our results to those generated by use of the iHS statistic. Finally, we also analyzed maize data sets where we fit breeding cycle or donation date as the dependent variable. 
This method can be applied to natural populations, especially those with short generation intervals provided generation-proxy data are available. GPSM is an effective method for identifying genomic regions harboring functional mutations which are responding to selection on Mendelian or complex phenotypes.
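
The core idea, that alleles under directional selection predict when an individual was born, can be sketched in Python with an ordinary single-SNP regression of the generation proxy on allele dosage. This deliberately omits the genomic relationship matrix and the mixed model that GPSM relies on to account for kinship and drift, so it is a toy under that stated simplification. The function name and the tiny data set are hypothetical.

```python
def proxy_association(dosages, proxy):
    """Slope and r^2 from regressing a generation proxy on allele dosage (0, 1, 2)."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(proxy) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in proxy)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, proxy))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0  # monomorphic SNP or constant proxy: no signal
    return sxy / sxx, sxy ** 2 / (sxx * syy)


# Hypothetical animals ordered by birth year.
birth_year = [1960, 1970, 1980, 1990, 2000, 2010]
selected_snp = [0, 0, 1, 1, 2, 2]   # allele rising steadily over time
drifting_snp = [1, 0, 2, 1, 0, 2]   # no consistent trend

slope_sel, r2_sel = proxy_association(selected_snp, birth_year)
slope_drift, r2_drift = proxy_association(drifting_snp, birth_year)
```

The steadily rising allele explains most of the variance in birth year, while the trendless one explains very little, which mirrors the contrast the abstract draws between selected and drifting loci.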

Patterns of genetic diversity in Latin America: insights from human population genomics

Andres Moreno Estrada
Stanford University, Stanford, CA, USA

Indigenous populations from the American continent have been largely underrepresented in large-scale genomic studies, yet they are bearers of a unique history from one of the regions of the world where proportionally more novel variation remains to be discovered. I will discuss recent efforts to characterize the genetic profile of Native Americans throughout the continent and focus on regional approaches aimed at resolving finer scale population structure patterns in Mexico, South America, and the Caribbean. By generating genome-wide SNP data for hundreds of individuals from both indigenous and recently admixed cosmopolitan populations and developing novel methodologies to investigate ancestry patterns at the sub-continental level, we reconstruct the pre- and post-colonial history of each region. We trace back the origin of ancestral components of admixed Latin Americans to their closest source among Native American, European, and African populations. Our work demonstrates that dense population genomic data coupled with novel methods of admixture deconvolution afford the possibility of reconstructing human population genetic history with far greater resolution than previously thought.

The Brazilian EPIGEN Initiative: admixture, history and epidemiology at high resolution

Eduardo Tarazona-Santos1, Mauricio Barreto2, Bernardo Horta3, Maria Fernanda Lima-Costa4, Andrea Horimoto5, Nubia Esteban5, Fernanda Kehdy1, Wagner Magalhaes1, Maira Rodrigues1, Mateus Gouveia1, Moara Machado1, Rennan Moreira1, Jose Sanches5, Hadassa Santos5, Fernanda Soares1, Alexandre Pereira1
1Universidade Federal de Minas Gerais, Minas Gerais, Brazil, 2Universidade Federal da Bahia, Bahia, Brazil, 3Universidade Federal de Pelotas, Rio Grande do Sul, Brazil, 4Fundação Oswaldo Cruz, Centro de Pesquisas René Rachou, Minas Gerais, Brazil, 5Instituto do Coração, Universidade de São Paulo, São Paulo, Brazil

As part of the largest Latin-American genomic initiative, we studied three Brazilian longitudinal population cohorts: Salvador-Bahia (n=1309), Bambui (n=1442) and Pelotas (n=3736) from Northeast, Southeast and Southern Brazil respectively. We genotyped the 6487 individuals on the Illumina Omni2.5M array and 265 individuals on the Illumina Omni5.0M array, and sequenced 30 genomes (average coverage: 42X). While Amerindian ancestry was low (5-7% at population level, with no individual with > 30% of this ancestry), the three populations showed individuals with all possible combinations of African and European ancestry. At population level, African ancestry ranged from 14-15% in Pelotas and Bambui to 51% in Salvador. Our unprecedented high-resolution analysis of population structure of Brazilians in a worldwide context shows that European and African contributions differ from those of African-Americans at subcontinental level, due to a Mediterranean component and to the African origins of immigrants, which in Brazil include Mozambique and Angola (not included in current genomic initiatives). Our large and highly admixed dataset allows inferences based on the distribution of local chromosome ancestry. We are currently inferring the dynamics of the admixture process in different parts of Brazil, as well as the time and mode of arrival of clinically relevant mutations. Through genome sequencing we identified between 3.6 M and 4.4 M autosomal SNPs per individual, and the high levels of admixture of Brazilians allowed us to identify ~1.6 M new autosomal SNPs. We are separating the African, European and Native American constituents of the 30 genomes to analyze the distributions of different classes of variants as a function of their ancestry. The EPIGEN Initiative is also performing several GWAS and admixture mapping studies on different complex traits, including longitudinal data. Funding: Brazilian Ministry of Health/FINEP.

Patterns of human mutation rate revealed by sequencing of 250 parent-child trios.

Laurent Francioli1, Paz Polak2, Amnon Koren3, Genome of Netherlands Consortium4, Paul de Bakker1, Shamil Sunyaev2
1University of Utrecht Medical Center, Utrecht, The Netherlands, 2Genetics Division, Department of Medicine, Brigham & Women’s Hospital, Harvard Medical School, Boston MA, USA, 3Department of Genetics, Harvard Medical School, Boston MA, USA, 4Genome of Netherlands Consortium, The Netherlands


Incessant mutations fuel evolution and create genetic variation. Molecular mechanisms responsible for spontaneously occurring mutations include low replication fidelity and improper repair of lesions arising from DNA damage. Understanding human mutation rates and patterns is important for deciphering mechanisms of DNA replication and repair, yet its importance extends far beyond. Assumptions concerning local mutation rate are critical for the vast arsenal of tools of evolutionary biology, including tools for analyzing relationships between species and between populations, and tools for detecting natural selection. In human genetics, methods based on recurrent de novo mutations have been recently proposed for mapping genes underlying complex diseases.

Our current knowledge of spontaneous mutagenesis derives from three sources, all of them indirect. Experiments in model organisms utilizing reporter assays have provided most of the information on the fundamental biological mechanisms of mutation. However, these experiments are biased towards properties of specific reporters and features of non-human organisms. Comparative genomics studies, and studies of population genetic variation, aim to provide a full characterization of human mutation rates and properties. However, since they infer properties of mutagenesis indirectly, they are biased by other factors influencing population genetic variation and species divergence, such as natural selection.

We present the analysis of more than 11,000 de novo mutations identified by complete genome sequencing of 250 parent-offspring trios from the Netherlands. The large number of genomes enabled us to study genomic variation in mutation rates and patterns in unprecedented detail, and to revisit current knowledge based on comparative genomics, reporter system experiments and disease mutations. We designed a new strategy to call de novo mutations from trio sequencing data. This strategy was supported by extensive validation experiments. To control for biases due to sequencing technology and the bioinformatics pipeline, we simulated de novo mutations uniformly at the level of individual sequencing reads, and applied the same processing pipeline to the simulated dataset of mutations. We then compared the genomic distribution of high confidence de novo mutations to the simulated baseline, revealing biological properties of human germline mutations.
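
The abstract does not describe the calling strategy itself, so the following Python sketch shows only the simplest logical core of any trio-based screen: flag sites where the child carries an allele that neither parent has. Real pipelines work from genotype likelihoods, read depth and strand information rather than hard genotype calls; the function name and the example sites are hypothetical.

```python
def is_candidate_de_novo(child, mother, father):
    """True if the child carries an allele seen in neither parent."""
    parental_alleles = set(mother) | set(father)
    return any(allele not in parental_alleles for allele in child)


# Hypothetical hard genotype calls: site -> (child, mother, father).
trios = {
    ("chr1", 1000): (("A", "G"), ("A", "A"), ("A", "A")),  # new G in the child
    ("chr1", 2000): (("C", "T"), ("C", "T"), ("C", "C")),  # T inherited from the mother
    ("chr2", 500): (("G", "G"), ("G", "G"), ("G", "G")),   # no variation
}

candidates = [
    site for site, (child, mother, father) in trios.items()
    if is_candidate_de_novo(child, mother, father)
]
```

A filter this naive would be swamped by genotyping errors in the parents, which is presumably why the authors pair their caller with validation experiments and a read-level simulated baseline.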

We analyzed context-dependency of mutation rate, and strand dependency in transcribed regions. These observations generally confirmed results of comparative genomics analysis. However, the analysis of regional variation and dependency of mutation rate on replication timing and local recombination rate revealed several highly surprising patterns. We also analyzed properties and abundance of clustered mutations.

Robust detection of hard and soft selective sweeps using haplotype statistics

Nandita Garud1, Philipp Messer1, Erkan Buzbas2, Dmitri Petrov1
1Stanford University, Stanford, CA, USA, 2University of Idaho, Moscow, ID, USA

Selective sweeps often leave distinct, elevated haplotype homozygosity signatures in population genomic data and are thus the focus of many scans for adaptive events. However, demography has long been a confounding variable in scans for selective events. We recently developed a new haplotype-based method that has substantial power to detect both hard and soft sweeps and to differentiate them from each other, unlike most existing methods which are designed to detect only hard sweeps. Hard sweeps arise from one mutation rising in frequency, whereas soft sweeps arise from multiple mutations rising in frequency. We now assess the susceptibility of our method to a wide number of demographic events such as admixture, population bottlenecks, and population substructure.
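
To illustrate why haplotype homozygosity can separate the two kinds of sweep, the Python sketch below computes plain haplotype homozygosity in a window (the sum of squared haplotype frequencies) and a variant that pools the two most common haplotypes before squaring. The abstract does not spell out the statistic used, so the pooled variant and the names `h1` and `h12` are assumptions made for illustration, and the haplotype labels are invented.

```python
from collections import Counter


def haplotype_homozygosity(haplotypes):
    """Return (h1, h12) for the haplotypes observed in one genomic window.

    h1  : sum of squared haplotype frequencies.
    h12 : same, but with the two most common haplotypes pooled into one class.
    """
    n = len(haplotypes)
    freqs = sorted((c / n for c in Counter(haplotypes).values()), reverse=True)
    h1 = sum(p * p for p in freqs)
    if len(freqs) >= 2:
        h12 = (freqs[0] + freqs[1]) ** 2 + sum(p * p for p in freqs[2:])
    else:
        h12 = h1
    return h1, h12


# One haplotype swept to high frequency (hard-sweep-like window).
hard = ["A"] * 8 + ["B", "C"]
# Two haplotypes rose together (soft-sweep-like window).
soft = ["A"] * 4 + ["B"] * 4 + ["C", "D"]

hard_h1, hard_h12 = haplotype_homozygosity(hard)
soft_h1, soft_h12 = haplotype_homozygosity(soft)
```

In the soft-sweep-like window the plain statistic is low because frequency is split across two haplotypes, while pooling the top two recovers a strong signal; in the hard-sweep-like window both versions are high.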

We applied our method to 145 inbred Drosophila strains from North Carolina (from the DGRP data set) and found compelling evidence for pervasive soft selective sweeps, in addition to recovering three known cases of soft sweeps that had previously been discovered empirically. To control for any confounding demographic effects, we performed extensive simulations under several realistic admixture and bottleneck models published in the literature for North American Drosophila to calculate a 1-per-genome false discovery rate. We found our top 50 most extreme cases of adaptation are robust under these various models. We also repeated our scan in multiple data sets of the same North Carolina population and recovered several of the same top candidates in these different samples, including the three known cases of soft sweeps. Finally, using an approximate Bayesian computation method, we performed our test of ‘softness’ for each of the identified sweeps under both a constant Ne=10^6 and an admixture demographic model and found that our results are conservative for the purposes of inferring softness under the more computationally efficient and simpler constant Ne=10^6 model.

Our results show that we can apply our method to a wide number of organisms even without the exact knowledge of the underlying demography. Soft sweeps should generally be common in all populations with large census sizes, and thus might be abundant in many other organisms as well, including plants, marine invertebrates, insects, microorganisms, and even modern humans when considering very recent evolution in the population as a whole. Our method provides a robust framework for the detection of the many hard and soft sweeps present in nature.

Private haplotypes can reveal local adaptation

Agnès Sjöstrand1,2, Per Sjödin2, Mattias Jakobsson2
1EBC, Uppsala University, Uppsala, Sweden, 2UMR7206, MNHN, Paris, France, 3TIMC-IMAG, Grenoble, France

Genome-wide scans for regions that demonstrate deviating patterns of genetic variation have become common approaches for finding genes targeted by selection. Several genomic patterns have been utilized for this purpose, including deviations in haplotype homozygosity, frequency spectra and genetic differentiation between populations. We describe a novel approach based on the Maximum Frequency of Private Haplotypes – MFPH – to search for signals of recent population-specific selection. The MFPH statistic is straightforward to compute for phased SNP- and sequence-data.
Using both simulated and empirical data, we show that MFPH can be a powerful statistic to detect recent population-specific selection, that it performs at the same level as other commonly used summary statistics (e.g. FST, iHS and XP-EHH), and that MFPH in some cases captures signals of selection that are missed by other statistics. This is the case for DOCK3 and CISH, genes shown to be involved in height in Pygmy groups: in the Maasaï, the regions where these genes are located show a strong MFPH signal although none of the other statistics we investigated revealed any significant signal of selection.
From the analysis of both simulated and publicly available data, we show that MFPH represents an additional summary statistic that can provide further insight concerning population-specific adaptation.
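
Taking the name at face value, one way to compute such a statistic for a single window is sketched below in Python: treat a haplotype as private if it occurs in the focal population and in none of the comparison populations, then report the highest frequency any private haplotype reaches in the focal population. The exact definition, windowing and normalization used by the authors are not given in the abstract, so this reading is an assumption, as are the function name and the example haplotypes.

```python
from collections import Counter


def mfph(focal, others):
    """Maximum frequency, within `focal`, of a haplotype absent from all `others`."""
    seen_elsewhere = set()
    for population in others:
        seen_elsewhere.update(population)
    private_counts = Counter(h for h in focal if h not in seen_elsewhere)
    if not private_counts:
        return 0.0
    return max(private_counts.values()) / len(focal)


# Hypothetical phased haplotypes (as strings of alleles) in one window.
focal_pop = ["0101"] * 6 + ["0011"] * 3 + ["1111"]
other_pops = [["0011", "0000"], ["0000", "1100"]]

score = mfph(focal_pop, other_pops)
```

A high value means a single haplotype found nowhere else has become common in the focal population, the pattern expected after recent population-specific selection.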

The role of ecological niche adaptation in the evolution of the olfactory receptor subgenome in mammals.

Graham Hughes1, Desmond Higgins1, Sara Hayden1,3, William Murphy4, Mary O’Connell2, Emma Teeling1
1University College Dublin, Dublin, Ireland, 2Dublin City University, Dublin, Ireland, 3University of Washington, Seattle, USA, 4Texas A&M University, College Station, USA


Olfaction, the ability to detect odour molecules, is an important method of sensory perception in mammals. It is used to locate food, avoid predators and as a means of social signalling. Mammalian smell is governed by olfactory receptors (ORs), G protein-coupled receptors coded for by the largest multi-gene family. OR genes contain no introns, are up to 1kb in length and account for up to 6% of mammalian protein coding genes. ORs are split into two Classes. Class I dominates fish OR repertoires while Class II dominates terrestrial animals. These are further divided into 4 and 9 subfamilies respectively. The OR gene repertoire evolves under a gene birth-and-death model, where genes are ‘born’ through duplication and ‘die’ through pseudogenisation. Various paradigm shifts in mammalian evolution have given rise to adaptations to new ecological niches. Examples of such niches include aquatic, terrestrial and volant habitats, and frugivorous, carnivorous and insectivorous diets. A link between adaptation to niche and loss of OR function has already been established. To further explore the effects of environmental adaptations on OR genes, we have generated novel data from a variety of mammalian species using next-generation sequencing technologies (n=17), standard gene cloning (n=20), and coupled these new data with published whole genomes (n=60).


In total we explored the OR subgenome in over 90 different mammalian taxa consisting of 77,030 OR genes. We elucidated whether species inhabiting different environmental niches (terrestrial, aquatic and volant) show unique gain or loss of function in the OR repertoire, and whether there is evidence of selection for specific and similar ORs in each habitat. We further investigate the link between OR repertoire and dietary niche using gene duplication and pseudogenisation analyses to identify species-specific repertoire evolution, focusing on frugivores, insectivores, herbivores and carnivores. We analyse these data in light of the visual acuity of each species, as well as alternative methods of sensory perception such as echolocation. Finally we analyse species-specific duplications to investigate if sociality or nocturnality play any significant role in OR repertoire evolution. Using all these data, we establish the role of olfaction in driving adaptation to unique ecological niches during the evolution of a large number of mammalian species.

The new advances of epigenomics enlightening adaptation processes in wild and domesticated plant species

Catarina F. Lira-Medeiros1, Amy Litt1
1The New York Botanical Garden, New York, USA, 2Diretoria de Pesquisa, Instituto de Pesquisas Jardim Botânico do Rio de Janeiro, Rio de Janeiro, Brazil

The methylation of cytosines is an epigenetic phenomenon linked to various processes such as genome stability and regulation of gene expression. Due to the sessile nature of plant species, adaptation processes occur in response to alterations in habitat and also through artificial processes such as domestication. Understanding cytosine methylation alterations in different plant tissues and in related species is crucial for understanding the link between epigenomics and plant adaptation. Studies of genome-wide methylation patterns in model species are expanding through the use of large-scale sequencing; however, studies in wild species lag behind due to the lack of reference genome sequences. MSAP (Methylation-Sensitive Amplification Polymorphism), which does not depend on previous knowledge of DNA sequence, is often used to evaluate genome-wide methylation in wild species; however, it does not allow identification of specific loci that are methylated. Domestication is an interesting process of artificial selection to obtain economically desirable phenotypes in plants. The domestication of tomato (Solanum lycopersicum) from its wild relative (S. pimpinellifolium) was mostly focused on fruit size, shape and taste. We used whole-genome bisulfite sequencing to identify genes that are differentially methylated in the two species and that might therefore be linked to the domestication process through alteration of gene expression. These advances in the study of the domestication process could be applied to many other cultivated and crop species and can help to increase our understanding of cytosine methylation alteration and long-term adaptation of plants to the environment.

Deciphering the Sequence and Evolution of Gorilla Y Chromosome Using a Combination of Short and Long Read Technologies

Marta Tomaszkiewicz1, Samarth Rangavittal1, Monika Michalovova1, Oliver Ryder2, Malcolm Ferguson-Smith3, Anton Nekrutenko1, Rayan Chikhi1, Paul Medvedev1, Kateryna Makova1
1Center for Medical Genomics, Penn State University, University Park, PA, USA, 2San Diego Zoological Society, San Diego, CA, USA, 3University of Cambridge, Cambridge, UK


Mammalian Y chromosomes have always intrigued evolutionary biologists because of their role in sex determination and male fertility, and because of their rapid genetic degradation. Yet, Y chromosome studies have been hampered by the unusual structure of the Y. Indeed, the abundance of repeats, palindromes, and heterochromatic regions has precluded sequencing Y chromosomes in many species. To date, only a handful of mammalian Y chromosomes have been sequenced, and this has been achieved by an extraordinarily laborious and expensive effort including BAC sequencing and assembly. Here we propose a fast and cost-effective alternative for Y chromosome sequencing which embraces new sequencing technologies and assembly algorithms, and can be applied to any species of interest. Using this method, we determined the sequence of the gorilla Y chromosome, which allowed us to address a myriad of long-standing evolutionary questions.

Flow-sorted gorilla Y chromosome DNA was used to construct paired-end and mate-paired libraries sequenced with Illumina technology at high depth. As flow-sorted Y material might contain some autosomal DNA, in the first analysis step, we isolated Y-chromosomal reads by employing a k-mer based algorithm relying on sequence depth differences between the Y and autosomes (in the flow-sorted material, the Y-chromosomal reads are expected to be significantly enriched). Next, we assembled Y-chromosome reads with the ALLPATHS-LG assembler. This resulted in 1,886 scaffolds with a total length of 27 Mb (scaffold N50 of 39 kb) effectively covering most of the euchromatic portion of the gorilla Y chromosome.
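
For readers unfamiliar with the scaffold N50 quoted above: it is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of the standard calculation follows; the example lengths are invented and are not taken from the gorilla assembly.

```python
def n50(lengths):
    """Length of the scaffold at which the running total first reaches half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffolds given")


# Invented scaffold lengths in kb (total 290 kb).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Because the statistic weights long scaffolds heavily, merging or extending a few large scaffolds raises N50 even when the scaffold count changes only modestly.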

We also sequenced flow-sorted gorilla Y-chromosome DNA with Pacific Biosciences technology and used the resulting long reads to improve the Illumina assembly. This resulted in 8 Mb of additional sequence, an increase of scaffold N50 to 47 kb and a reduction in the number of scaffolds by 15%. In a separate experiment, we sequenced the gorilla testis transcriptome. By removing transcriptome reads which aligned to the female genome, we isolated Y-chromosome reads that were used to further improve the Y-chromosome assembly. Several computational and wet-lab techniques were used to validate the gorilla Y-chromosome assembly.

The gorilla Y-chromosome sequence generated and assembled here for the first time has enabled an evolutionary comparison of the Y among human, chimpanzee and gorilla at both genic and non-coding levels, taking differences in the mating patterns of these great apes into account. Moreover, the available gorilla Y chromosome sequence will serve as an invaluable resource for designing genetic markers to trace male dispersal in endangered gorilla populations.

Genomics and the molecular basis of hybrid incompatibilities.

Nitin Phadnis1, Emily Baker3, Jacob Kitzman2, Kimberly Frizzel1, Emily Hsieh3, Jay Shendure2, Harmit Malik3,4
1University of Utah, Salt Lake City, UT, USA, 2University of Washington, Seattle, WA, USA, 3Fred Hutchinson Cancer Research Center, Seattle, WA, USA, 4Howard Hughes Medical Institute, USA



Speciation – the process of one species splitting into two – involves the evolution of reproductive isolating barriers such as the sterility or inviability of hybrids between previously interbreeding populations. An indispensable step in understanding the molecular basis of speciation involves the identification of hybrid incompatibility genes that underlie hybrid sterility or inviability. Despite decades of intense efforts, we still know only a few such genes and even less about the molecular basis of hybrid incompatibilities. Perhaps the best-studied cases of hybrid incompatibility genes are those that affect hybrids between D. melanogaster and D. simulans. In crosses between D. melanogaster females and D. simulans males, the resulting hybrid F1 females are sterile, while the hybrid F1 males are inviable. H.J. Muller’s classic X-ray experiments revealed that an interaction between the D. melanogaster X chromosome and dominant alleles from the D. simulans second and third chromosomes is necessary for the hybrid male inviability. Genetic studies over the past two decades have identified two of these hybrid incompatibility genes: Hybrid male rescue (Hmr) on the D. melanogaster X chromosome and Lethal hybrid rescue (Lhr) on the D. simulans second chromosome. Hmr and Lhr, however, are not sufficient to cause hybrid inviability; at least one additional gene is required for this hybrid inviability. Identifying this missing hybrid incompatibility gene, however, remains a challenging problem due to the sterility of hybrid intermediates, the absence of genetic tools such as deletions or balancers in D. simulans, and the lack of new hybrid rescue mutations. Here, we devised a new genomics-based approach to identify hybrid incompatibility genes that affect F1 hybrids and applied this method to successfully identify this missing hybrid incompatibility gene from D. simulans. The identity and function of this gene provide important insights into the molecular basis of hybrid inviability and open the door to detailed functional studies in this classic hybridization. Finally, our approach is readily applicable to many model and non-model systems and may accelerate the identification of hybrid incompatibility genes in other species.

What changes matter? A genomic approach to human evolution

Nicolas Rohner1, Michael Zody2, David Reich1, Steven McCarroll1, Daniel Lieberman3, Clifford Tabin1
1Harvard Medical School, Boston, USA, 2Broad Institute of MIT and Harvard, Cambridge, USA, 3Harvard University, Cambridge, USA

We humans and our closest relatives the chimpanzees differ in only 1-2% of our genomes. Despite this genetic similarity we differ in many anatomical and behavioral traits. Upright walking and larger brains are just two prominent examples amongst many others that allowed us to adapt to new environments. Although full genome sequences are now available for humans, chimpanzees and other primates, surprisingly little is known about the genetic basis underlying these traits. One reason is that even within a 1-2% difference lie many genetic changes potentially driving human evolution. Because open-reading-frames of genes tend to be very similar between great apes, it has been argued that the majority of significant evolutionary changes are cis-regulatory mutations. To identify regulatory changes specific to the human lineage we undertook a whole genome approach by aligning human, chimpanzee, macaque, and mouse genomes and focusing on conserved non-coding regions. We identified 298 human-specific deletions potentially removing cis-regulatory elements. We used a mouse transgenic approach to test if the deletions affect enhancer activity. Indeed out of 12 tested elements, 4 showed tissue-specific expression at diverse developmental stages. We focused on two human-specific deletions for further study. The first removes an enhancer element near the gene OSR2, and its expression argues for a role in human palate, cranial base and jaw development. The second deletion removes a regulatory element in the gene ACVR2A. Its expression pattern and the phenotype of a full knockout of ACVR2A in mouse point to its role in the human-specific shortening of digits 2-5 and the smaller size of upper incisors in humans. We are currently mimicking the human situation by removing the corresponding piece in each of two different mouse models to test the ability to generate human-like phenotypes.

Genomic signatures of selection for behavior in the silver fox (Vulpes vulpes)

Anna Kukekova1, Jennifer Johnson1, Shiping Liu2, Yury Herbeck3, Anastasiya Kharlamova3, Rimma Gulevich3, Anastasiya Vladimirova3, Halie Rando1, Jessica Hekman1, Feng Shaohong2, Xueyan Xiang2, Lyudmila Trut3, Guojie Zhang2
1University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA, 2China National Genebank, BGI–Shenzhen, Shenzhen, 518083, China, 3Institute of Cytology and Genetics of the Russian Academy of Sciences, Novosibirsk 630090, Russia


The fox strains (Vulpes vulpes) developed by long-term selection for behavior offer an opportunity to study genetic regulation of complex social behaviors. Selection of conventionally farm-bred foxes, separately, for tame and for aggressive behavior, has yielded two strains with markedly different, genetically determined, behavioral phenotypes. Tame-strain foxes are eager to establish human contact, paralleling the sociability of friendly dogs. Foxes from the aggressive strain avoid contact and demonstrate an aggressive response to humans. To identify genomic targets of selection for behavior, the genomes of ten foxes from each of three populations (tame, aggressive, and conventionally farm-bred) were sequenced using Illumina paired-end technology. ~25x sequence coverage was obtained for each of the three populations, yielding in total ~75x genome coverage of the fox. The fox reads were aligned against the scaffolds of the newly available draft of the fox genome. 8.5 million SNPs were identified using UnifiedGenotyper from the GATK package. To identify regions of increased homozygosity in the fox populations, pooled heterozygosity (Hp) was calculated separately for each of the three populations using a sliding window approach. The 100 Kb windows were moved along the genome in 50 Kb steps; only windows containing 20 or more SNPs were considered. The number of 100 Kb windows with Hp ≤ 0.1 was 1494 in the tame, 431 in the aggressive and 59 in the conventionally farm-bred population. To distinguish between selective sweeps associated with selection for behavior and regions of random fixation, regions of increased homozygosity were compared to QTL intervals identified by mapping behavioral traits in fox cross-bred pedigrees. The selective sweep data allowed us to refine QTL intervals and pinpoint positional candidate genes implicated in behavioral differences between the fox strains.
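The sliding-window scan described above can be sketched as follows. The abstract does not give its Hp formula, so this sketch assumes the commonly used pooled heterozygosity definition, Hp = 2 * sum(n_major) * sum(n_minor) / (sum(n_major) + sum(n_minor))^2, where the sums run over the SNPs in a window. The function names, the input layout and the toy data are illustrative assumptions. Only the window size, step and minimum SNP count come from the text.

```python
def pooled_heterozygosity(counts):
    """counts: list of (major_allele_reads, minor_allele_reads), one per SNP."""
    n_major = sum(major for major, _ in counts)
    n_minor = sum(minor for _, minor in counts)
    total = n_major + n_minor
    if total == 0:
        raise ValueError("window has no allele counts")
    return 2 * n_major * n_minor / total ** 2


def sliding_hp(snps, chrom_length, window=100_000, step=50_000, min_snps=20):
    """snps: list of (position, major_allele_reads, minor_allele_reads).

    Returns (window_start, Hp) for every window holding at least min_snps SNPs.
    """
    results = []
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        in_window = [(maj, mnr) for pos, maj, mnr in snps
                     if start <= pos < start + window]
        if len(in_window) >= min_snps:
            results.append((start, pooled_heterozygosity(in_window)))
    return results


# Hypothetical toy scaffold: 25 SNPs spaced 4 kb apart, each with 18 major
# and 2 minor allele reads. These are not values from the study.
toy_snps = [(i * 4_000, 18, 2) for i in range(25)]
toy_windows = sliding_hp(toy_snps, chrom_length=200_000)
print(toy_windows)
```

Windows with Hp at or below a cutoff (0.1 in the abstract) would then be flagged as candidate regions of increased homozygosity.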

Accumulation of Slightly Deleterious Mutations as a Hallmark of Domestication

Austin Hughes
University of South Carolina, Columbia SC, USA


The hypothesis that domestication led to a relaxation of purifying selection was tested by comparative analysis of mitochondrial (mt) genes from dog, pig, chicken, and silkworm. The three vertebrate species showed mt genome phylogenies in which domestic and wild isolates were intermingled, whereas the domestic silkworm (Bombyx mori) formed a distinct cluster nested within its closest wild relative (B. mandarina). In spite of these differences in phylogenetic pattern, significantly greater proportions of nonsynonymous SNPs than of synonymous SNPs were unique to the domestic populations of all four species. Likewise, in all four species, significantly greater proportions of RNA-encoding SNPs than of synonymous SNPs were unique to the domestic populations. Thus, domestic populations were characterized by an excess of unique polymorphisms in two categories generally subject to purifying selection: nonsynonymous sites and RNA-encoding sites. Many of these unique polymorphisms thus seem likely to be slightly deleterious; the latter hypothesis was supported by the generally lower gene diversities of polymorphisms unique to domestic populations in comparison to those of polymorphisms shared by domestic and wild populations.


Transcriptome responses to selection for tame/aggressive behaviors in silver foxes (Vulpes vulpes)

Xu Wang1, Lenore Pipes1, Gregory Acland2, Lyudmila Trut3, Yury Herbeck3, Anastasiya Kharlamova3, Rimma Gulevich3, Anastasiya Vladimirova3, Jennifer Johnson4, Anna Kukekova4, Andrew Clark1
1Cornell University, Ithaca, NY, USA, 2Baker Institute for Animal Health, Cornell University, Ithaca, NY, USA, 3Institute of Cytology and Genetics of the Russian Academy of Sciences, Novosibirsk, Russia, 4University of Illinois at Urbana-Champaign, Urbana, IL, USA

Domestication leads to a spectrum of striking behavioral changes whose genetic basis remains largely unknown. Silver foxes (Vulpes vulpes) have been selectively bred for tame and aggressive behaviors over 50 years at the Institute of Cytology and Genetics in Novosibirsk, Russia. In order to further understand the genetic basis and molecular mechanisms underlying tame and aggressive behavioral phenotypes segregating in selected strains of the silver fox, we used Illumina RNA-seq to quantify genome-wide gene expression levels in two selected brain tissues (right prefrontal cortex and right amygdala) from 12 aggressive and 12 tame individuals. Since there is no currently available fox genome sequence, we performed a de novo assembly of the 1.4 billion RNA-seq reads and characterized the fox transcriptome. To aid in annotation and remove the transposable elements, we repeat-masked and blasted the fox transcripts to the dog genome/transcriptome. Then we aligned the RNA-seq reads from each sample to the cleaned fox transcriptome, producing high-quality read-count data on the 48 samples for ~15,000 annotated genes in the fox transcriptome. Our preliminary analysis reveals that there are ~600 genes at a 5% FDR that show differential gene expression between the tame and aggressive foxes. Analysis of gene ontology (GO) and of pathway enrichment highlighted several key pathways known to be critical to neurological processing, including the dopaminergic and NMDA receptor pathways. The data relate in interesting ways to neurological and pharmacological effects that are actively being studied to understand human aggression. In addition to the expression analysis, we also performed de novo SNP calling in combined RNA-seq alignments and identified >27,000 high quality exonic SNPs. We compared the allele frequencies between tame and aggressive populations and found significant changes at 350 SNP loci under Bonferroni correction, not all of which are due to pure genetic drift. These candidates include a non-synonymous change in a gene within a “tameness” QTL previously mapped using F2s of tame and aggressive foxes. These changes in expression level and allele frequency might be the direct response to the artificial selection and will help us understand the genetic basis of the mammalian domestication process.
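For readers unfamiliar with the Bonferroni correction mentioned above, a minimal sketch follows. It assumes one p-value per SNP has already been computed by some allele-frequency test that the abstract does not specify. The function name and the toy p-values are illustrative assumptions.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return indices of tests passing a Bonferroni-corrected threshold."""
    # Dividing alpha by the number of tests bounds the family-wise error rate.
    threshold = alpha / len(p_values)
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for four SNPs, so the threshold is 0.05 / 4 = 0.0125.
toy_p_values = [1e-7, 0.01, 2e-6, 0.5]
print(bonferroni_significant(toy_p_values))
```

With roughly 27,000 SNPs tested at alpha = 0.05, the same rule would give a per-SNP threshold of about 1.9e-6.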

The developmental basis of mutation in mammals

Ni Huang, Don Conrad
Washington University School of Medicine, St. Louis, USA


Recent years have shown rapid progress in our understanding of the physiological and molecular determinants of germline mutation across many metazoans. By comparison, our understanding of somatic mutation processes is rudimentary, largely limited by the technical difficulty of detecting extremely rare mutations in large pools of cells. Nonetheless, small glimpses from sources of data as diverse as Drosophila melanogaster mutation screening experiments and studies of aneuploidy in Homo sapiens embryos suggest that, remarkably, somatic mutation begins immediately after fertilization in many metazoans, and these genomes may in fact be more unstable during early cell divisions than during mid-life.

Here we describe an early attempt to study somatic mutation processes in non-diseased tissues from mammals, using RNA sequencing data from Homo sapiens and Mus musculus. We have developed a model-based method for identifying sites of somatic DNA mutation in RNA-seq data, and distinguishing these from sites of RNA editing and allele-specific expression, biological processes that can mimic the signature of somatic mutation. We have applied this method to RNA-seq data from 1,600 human tissues generated by the NIH Genotype-Tissue Expression Project (GTEx), as well as a companion dataset that we have generated from analogous tissues in mouse. Our preliminary results indicate that RNA editing creates high-level mosaicism within the adult transcriptome at ten times as many nucleotide sites as somatic DNA mutation does, and that the extent of RNA editing within a tissue can be strongly predicted by a small number of molecular phenotypes. Our results will allow us to comment on the distribution of developmentally-acquired somatic mutations across tissues, across individuals, and across species, and provide a starting point for discussions on the challenges that we face as we attempt to reconstruct developmental processes at an organismal level in mammals.

Copy-number changes in experimental evolution: rates, fitness effects and adaptive significance

Vaishali Katju, James Farslow, Lucille Packard, Ulfar Bergthorsson
University of New Mexico, Albuquerque, NM 87131, USA

Gene copy-number differences due to gene duplications and deletions are rampant in natural populations and play a crucial role in the evolution of genome complexity. The rate at which new gene copies appear in populations greatly influences their evolutionary dynamics and standing gene copy-number variation in populations. The duplication rate may therefore have profound effects on the role of adaptation in the evolution of duplicated genes with important consequences for the evolutionary potential of species.

In this talk, I will discuss three long-term experimental evolution experiments in Caenorhabditis elegans that we have utilized to investigate fundamental properties of the gene duplication process. First, we conducted oligonucleotide array comparative genome hybridization (oaCGH) on C. elegans mutation accumulation (MA) lines subjected repeatedly to single-worm bottlenecks each generation to provide the first direct estimate of the spontaneous genome-wide rate of duplication in a multicellular eukaryote. The gene duplication rate in C. elegans is quite high and exceeds the spontaneous rate of point mutation per nucleotide site in this species by two orders of magnitude. Second, I discuss new oaCGH results of low-fitness experimental lines subjected to adaptive recovery via population expansion to investigate whether copy-number variants (CNVs) constitute a common mechanism of adaptive genetic change during compensatory evolution. Lastly, long-term spontaneous MA lines maintained at three varying effective population sizes for >400 generations were used to investigate whether CNVs accumulate differentially under varying intensities of natural selection and provide some insights into their average fitness effects.

An atlas of human and mouse genomic imprinting reveals evolutionary causes and consequences

Tomas Babak1, Brian DeVeale2, Yiqi Zhou1, Hunter Fraser1
1Stanford University, Stanford, USA, 2UCSF, San Francisco, USA

Genomic imprinting is an epigenetic mechanism that restricts gene expression to either the maternal or paternal copy of ~150 genes in mammals. It is essential for development and impacts a variety of physiological and cognitive processes that have served as focal points for theories explaining its evolutionary origin and consequences. However, our understanding of the forces shaping imprinting has been limited, in large part because its extent across tissues and development is mostly unknown. To investigate this, we generated an atlas of imprinting in 33 mouse and 45 human developmental stages and adult tissues. Contrary to claims of widespread tissue-specific imprinting, most genes had robust parent-of-origin expression from early development through adulthood. We discovered many maternally and paternally expressed genes with highly similar patterns of imprinting across cell types, which may reflect evolutionary signatures of parental conflict. Furthermore, strongly imprinted genes were highly divergent in their expression between the two species. In addition to facilitating tests of evolutionary theory regarding selection for genomic imprinting and its consequences, our approach demonstrates a general framework for imprinting discovery in any species, even when inbred or related individuals are not available.

Should evolutionary geneticists worry about higher-order epistasis?

Daniel Weinreich1, Yinghong Lan1, Christopher Wylie1, Robert Heckendorn2
1Brown University, Providence, Rhode Island, USA, 2University of Idaho, Moscow, Idaho, USA

Epistasis may be regarded as one’s surprise at the phenotypic effect of some set of mutations, given what is known about their effects in isolation. Quite obviously, epistasis can have profound evolutionary consequences; we might say that it also surprises natural selection, by changing the fitness effect of a mutation as additional mutations appear in the same genome. While evolutionary genetics has long considered epistatic interactions between pairs of mutations, what of the possibility that such pairwise epistasis itself varies with genetic background? Do such higher-order interactions exist, and if so, what are their evolutionary implications?

Following the groundbreaking work of Malcolm et al (Nature 1990 345:86), we and many others have constructed and characterized the fitness (or close proxy) for all combinations of small sets of mutations of interest. This work is motivated directly or indirectly by an appreciation that such combinatorially complete data provide a most accurate picture of the epistatic interactions among these mutations. In this talk we generalize classical formulae for selection coefficients and pairwise epistasis to allow quantification of epistatic interactions of arbitrary order in such data. We find that non-negligible interactions of all orders are common in many cases. Next we explore the structure of this epistasis. For example, do mutations with large selection coefficients tend to contribute to large epistatic components? We conclude by asking (though not fully answering) the question of the evolutionary consequences of such higher-order epistasis, and suggest that an anthropic principle may account for patterns of epistasis observed in nature.
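One common way to extend selection coefficients and pairwise epistasis to arbitrary order is an inclusion-exclusion sum over all subsets of the mutations, measured on the wild-type background with log fitness. The sketch below illustrates that formulation only. It may not match the formulae presented in the talk, and the function name, data layout and toy landscape are illustrative assumptions.

```python
from itertools import combinations


def epistasis_term(log_fitness, subset):
    """Interaction of order len(subset) among the mutations in `subset`.

    log_fitness maps a frozenset of mutation names to the log fitness of the
    genotype carrying exactly those mutations (frozenset() is the wild type).
    Order 1 gives a selection coefficient, order 2 gives pairwise epistasis.
    """
    subset = frozenset(subset)
    k = len(subset)
    total = 0.0
    for size in range(k + 1):
        sign = (-1) ** (k - size)
        for combo in combinations(sorted(subset), size):
            total += sign * log_fitness[frozenset(combo)]
    return total


# Hypothetical combinatorially complete landscape for three mutations:
# additive effects a=1, b=2, c=4, plus 0.5 for the a-b pair and 0.25 for a-b-c.
toy_landscape = {
    frozenset(): 0.0,
    frozenset("a"): 1.0,
    frozenset("b"): 2.0,
    frozenset("c"): 4.0,
    frozenset("ab"): 3.5,
    frozenset("ac"): 5.0,
    frozenset("bc"): 6.0,
    frozenset("abc"): 7.75,
}
for mutations in ("a", "ab", "ac", "abc"):
    print(mutations, epistasis_term(toy_landscape, mutations))
```

In the toy landscape the third-order term recovers the 0.25 that was added only to the triple mutant, which is the kind of signal pairwise analysis alone would miss.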

Epistasis everywhere: a systematic large-scale survey of the epistatic landscape within yeast Hsp90

Claudia Bank1,2, Ryan Hietpas3, Daniel N.A. Bolon3, Jeffrey D. Jensen1
1School of Life Sciences, École Polytechnique Fédérale de Lausanne, and Swiss Institute of Bioinformatics, Lausanne, Switzerland, 2Simons Institute for the Theory of Computing, UC Berkeley, Berkeley, California, USA, 3Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts, USA




A single mutation occurring within a protein-coding sequence can have a variety of consequences on the molecular level, such as changes in the protein’s folding stability or its binding affinity – potentially resulting in positive, or, more likely, detrimental effects on the organism’s fitness. If a second mutation occurs on the background carrying the first, we may expect (1) a higher probability of a negative effect, if the first mutation showed only a small or no effect (for example, because a stability threshold is crossed), or (2) a lower overall effect size if the first mutation had a large effect (corresponding to a “saturation” of effect sizes). Both of these hypotheses result in the expectation of frequent epistasis between mutations within a protein-coding region. Here, we present a large-scale analysis of the epistatic landscape within a 9-amino-acid region of Hsp90 in S. cerevisiae that allows us to test the above hypotheses. Using engineered combinations of up to 6 mutations, screened by means of a systematic high-throughput NGS approach, we gather unique information on the fitness landscape along thousands of mutational walks. We demonstrate that epistasis is ubiquitous, and that diminishing-returns epistasis occurs persistently when beneficial mutations are combined. In particular, we relate the pattern of epistasis to predictions from Fisher’s geometric model, and discuss implications for the role of epistasis in molecular evolution.

Parallel trajectories of genetic and linguistic admixture in Cape Verdean Kriolu speakers.

Paul Verdu1, Ethan Jewett2, Trevor Pemberton3, Noah Rosenberg2, Marlyse Baptista4
1CNRS/MNHN/Univ. Paris Diderot/Sorbonne Paris Cite, Paris, France, 2Stanford University, Department of Biology, Stanford, CA, USA, 3University of Manitoba, Department of Biochemistry and Medical Genetics, Winnipeg, MB, Canada, 4University of Michigan, Departments of Linguistics & Afroamerican and African Studies, Ann Arbor, MI, USA

Starting in the 15th Century, European colonization of Africa and the Atlantic Slave Trade brought together populations of European and African origin on the islands of Cape Verde, giving rise to an admixed population. The ways in which the different waves of migration and major sociohistorical events such as the abolition of slavery influenced the admixture process, and their impacts on the resulting genetic and cultural diversity in this population, remain largely unknown. To study the cultural and demographic history of the Cape Verdean population, we investigated patterns of genetic and linguistic diversity among 44 unrelated Cape Verdean individuals. Genetic data consisted of genotypes at ~2.5 million genome-wide SNPs, and linguistic data consisted of spontaneous speech in Cape Verdean Creole (Kriolu) provided by each subject. We found that individual speech patterns across Cape Verdean Kriolu speakers were significantly correlated with pairwise levels of allele-sharing dissimilarities, as well as with the birthplaces of individuals and their parents. Individual levels of African genetic admixture were significantly positively correlated with the number of words of putative African origin used by each individual. These results suggest that genetic and linguistic admixture followed parallel evolutionary trajectories in the Cape Verdean archipelago, and they provide a basis for combining genetic and linguistic information to reconstruct the complex admixture processes that have shaped the cultural and biological diversity of Cape Verde. To our knowledge, this work is the first joint analysis of genetic and cultural variation within a single population of individuals sharing a common, mutually intelligible language.
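The pairwise allele-sharing dissimilarity used above can be sketched under a common definition: at each SNP, two individuals share 0, 1 or 2 alleles identical by state, and the dissimilarity is one minus half that count, averaged over SNPs genotyped in both. The genotype coding (0/1/2 copies of one allele, None for missing), the function name and the toy genotypes are assumptions for illustration. The abstract does not state which variant of the statistic was used.

```python
def allele_sharing_dissimilarity(genotypes_a, genotypes_b):
    """Mean per-SNP dissimilarity between two individuals.

    Genotypes are coded as 0, 1 or 2 copies of a chosen allele. SNPs missing
    (None) in either individual are skipped.
    """
    per_snp = [abs(a - b) / 2
               for a, b in zip(genotypes_a, genotypes_b)
               if a is not None and b is not None]
    if not per_snp:
        raise ValueError("no SNPs genotyped in both individuals")
    return sum(per_snp) / len(per_snp)


# Hypothetical genotypes at five SNPs for two individuals.
person_1 = [0, 1, 2, 2, None]
person_2 = [0, 2, 0, 2, 1]
print(allele_sharing_dissimilarity(person_1, person_2))
```

Computing this for every pair of individuals yields a genetic distance matrix that can be compared with a matrix of pairwise linguistic distances.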

Genome-wide analysis of Oceanian ancestry

Ana T. Duggan1, David Reich2,3, Mark Stoneking1
1Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany, 2Harvard Medical School, Boston, USA, 3Broad Institute, Boston, USA



The history of Oceania, as inferred from archaeological, linguistic and genetic evidence, points to two major human expansions through the region. It seems that the first human settlers arrived in New Guinea and Australia, then joined as the continent of Sahul, more than 40 thousand years ago and spread to the Bismarck Archipelago and other nearby islands but did not spread widely through the Solomon Islands. Present day populations believed to be descendants of these initial settlers speak very diverse languages of apparently great time depth (referred to collectively as Papuan), practice patrilocality and tend to have darker skin pigmentation. The second wave of human expansion arrived with the Austronesians approximately 3.5 thousand years ago and touched almost all of Near Oceania before spreading further into the Pacific and settling Remote Oceania. The Austronesians brought with them a single proto-language which has diversified into a group of closely related languages, possessed a distinctive pottery style, and were likely matrilocal, and their descendants have a more Asian phenotype. MtDNA and Y-chromosome data indicated that Papuan-speaking and Austronesian-speaking populations did admix extensively in Near Oceania and that the mixture appears to have been sex-biased. Maternal ancestry of putative Asian origin is high, even within Papuan-speaking populations, and yet Remote Oceanian populations show high levels of Y-chromosomes of Near Oceanian origin. While some studies of genome-wide short tandem repeats or polymorphisms have been conducted, they have been restricted to populations from New Guinea and Polynesia who likely represent population extremes. Here we analyse genome-wide SNP data, collected on the Affymetrix Human Origins array, from approximately 300 samples from 40 populations across Southeast Asia and Near and Remote Oceania. We are using these data to attempt to elucidate the genetic structure of Papuan-speaking and Austronesian-speaking groups, including the time and extent of admixture between them, to better understand the dynamics of population contact which led to the distinctive pattern of uniparental inheritance but also maintained two very different language groups and cultures within Oceania.

Cultural transmission of reproductive success: a strong evolutionary force that shapes genetic diversity.

Evelyne Heyer1, Jean-Tristan Brandenburg1,2, Michela Leonardi1, Patricia Balaresque3, Bruno Toupance1, Tatyana Hegay4, Almaz Aldashev5, Frederic Austerlitz1
1CNRS/MNHN/P7 UMR7206, Paris, France, 2INRA/CNRS UMR 0320/UMR 8120, Moulon, France, 3CNRS/Univ Toulouse UMR5288, Toulouse, France, 4Academy of Science, Tashkent, Uzbekistan, 5Academy of Science, Bishkek, Kyrgyzstan

One of the specificities of our species, as acknowledged for a long time by anthropologists, is to live in an extremely wide range of social organizations defined mainly by alliance rules, matrimonial systems, residence rules and descent rules. The hint that social organization should be taken into account when studying genetic diversity came mainly from comparisons between mitochondrial DNA (mtDNA) and Y-chromosome genetic diversity. Initially, it was proposed that sex-specific behaviours, and particularly differences in migration rates between men and women due to residence rules, may explain differences in Y-chromosome diversity versus mtDNA diversity. More recently it has been shown that the differences in diversity and differentiation levels between the different genetic systems (X, Y, mtDNA and autosomes) could not be explained only by differences between male and female migration rates, but also by differences between male and female effective population sizes.
We hypothesized that the mechanism by which such reduction in effective population size can be reached is cultural transmission of reproductive success (CTRS). Building on our previous theoretical work that showed that CTRS can profoundly reduce effective population size, and on a method that we have designed to detect such transmission from current DNA sequence polymorphism datasets, we tested formally the extent to which CTRS reduces genetic diversity in Central Asia, where we have previously demonstrated the occurrence of sex-specific reduction in effective population size: male effective size is much smaller than its female counterpart.
We used mtDNA and Y-chromosome genetic data to infer male and female transmission of reproductive success in 19 Turkic and Indo-Iranian populations from Central Asia known for their contrasting social organisations. Both societies are patrilocal and mildly polygynous, but Turkic populations have a patrilineal descent, while Indo-Iranian populations have a cognatic descent.
Our results show that patrilinearity impacts genetic diversity through cultural transmission of reproductive success. This clearly demonstrates the impact of social organization on human biological evolution. Moreover, notwithstanding the fact that our genetic approach clearly shows that there is a strongly male-biased transmission of reproductive success in patrilineal societies, it also formally demonstrates that cultural transmission of reproductive success could be a major evolutionary force. Indeed, it reduces within-population genetic diversity and increases among-population differentiation, the two key components for the evolution of cooperation.

Convergent genome evolution in marine mammals

Gregg Thomas1, Andrew Foote2, Matthew Hahn1
1Indiana University, Indiana, USA, 2Uppsala University, Uppsala County, Sweden

Marine mammals share striking phenotypic and behavioral convergence as a result of adaptation to their common environment, despite the independent origin of these traits. Here we investigate convergent evolution at the molecular level among four marine species that span three mammalian orders: bottlenose dolphin and killer whale (Cetacea), walrus (Carnivora), and manatee (Sirenia). Using the newly sequenced genomes of these species, we tested for an excess of convergent amino acid substitutions. Interestingly, when comparing the number of convergent substitutions among the marine species to similar convergent substitutions among control groups of land mammals, we identify more convergence among the land animals. However, we do identify 15 genes exhibiting both convergent amino acid substitutions and significant evidence for positive selection in all three marine taxa, indicating that these genes may have played a role in adaptation to the marine environment. Additionally, we find more than 100 proteins that show characteristics of adaptive convergent evolution between pairs of marine mammal lineages. We also find evidence of convergence throughout the regulatory regions of these genomes, including in the region upstream of PITX1, a gene associated with hind-limb formation. These findings support the idea that nature takes advantage of multiple genes and multiple types of molecular changes to evolve complex convergent phenotypes.

Population resequencing reveals the molecular basis of recurrent evolution in the domestic pigeon

Michael Shapiro, Eric Domyan, Zev Kronenberg, Michael Campbell, Mark Yandell
University of Utah, Salt Lake City, UT, USA

Selective breeding has generated spectacular phenotypic diversity in over 350 different modern breeds of domestic pigeon (Columba livia). Some of the traits that vary among breeds also vary among natural species of birds, making the domestic pigeon an attractive model to understand the molecular underpinnings of avian diversity. In the pigeon, some of these traits appear repeatedly in breeds that are otherwise morphologically or genetically dissimilar. To explore the genetic, genomic, and developmental origins of phenotypic diversity among pigeons, we generated a catalog of genetic variation by resequencing multiple breeds with different combinations of derived traits. We detected genotype-phenotype associations for several traits, and found an intriguing spectrum of both coding and regulatory changes. Importantly, these association studies provide targets for deeper exploration of mechanisms underlying phenotypic diversity, including mechanistic explanations of allelic dominance and epistasis among loci.

• Category: Science 
  1. This David Reich et al paper is going to be massive!

    “Genotyping of 390,000 SNPs in more than forty 3,000-9,000 year old humans from the ancient Russian steppe”

    On the WestHunt blog a few months ago, I guessed that the most likely culture to have spawned the Indo-Europeans was the Khvalynsk Culture, which is just South of Samara, from where the samples were taken. (5,000-4,500 BC)

    I guess if Reich chose to sample from Samara, then maybe he predicts that the slightly earlier Samara Culture is the source of the Indo-Europeans. (5500–4800 BC)

    The reason I prefer the Khvalynsk Culture more is that its territory included the North Caucasus…

  2. j.p. mallory can stop searching….
