The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Population Genetics


Yesterday I read a paper which utilized Daphnia as a model to explore a very important theoretical question: the relationship between effective population size and genetic load (they’re inversely correlated). The theoretical aspect I am aware of, but I don’t know much about Daphnia. The paper is titled Genetic load, inbreeding depression and hybrid vigor covary with population size: an empirical evaluation of theoretical predictions, and it’s in Evolution. It’s not open access, and I can’t find a preprint around, for which I apologize (you could pester the first author on ResearchGate or something). But one reason I’m interested is that they assert this:

Our results are in clear support of theoretical models based on recurrent mutation to unconditionally deleterious alleles on the effects of population size on inbreeding depression, hybrid vigor, and genetic load. This study is the first to find such clear and unequivocal evidence for all of the predicted effects.

This makes me think of Richard Lewontin’s assertion back in the 1970s that theoretical population genetics was basically a machine designed to operate upon inputs which weren’t available (data). I don’t know this literature well, but it’s shocking that these ideas have only been robustly tested now! Or perhaps these results are false positives of some sort, as the paper does note that it’s the first to find clear and unequivocal evidence for all of the predictions.

The basic issue is that in small populations genetic drift has the potential to overwhelm the power of selection in purging deleterious alleles. How deleterious an allele is varies. Some alleles have very strong negative selection coefficients. For example, those with dominant lethal effects are going to be purged immediately for obvious reasons (if it’s dominant, it’s always expressed, and if it’s lethal, it isn’t passed on). The situation differs for those with recessive expression patterns. Even if it is lethal in homozygous form, an allele can persist at low frequency if the population is random mating, as the vast majority of copies will be in heterozygotes whose fitness is not impaired. But if the selection coefficient is low enough then even dominantly expressed alleles may not be purged. The variance in allele frequencies due to sampling is inversely proportional to the population size, so as that variance converges upon the selection coefficient in terms of magnitude, the efficacy of natural selection diminishes. This is at the heart of the nearly neutral theory, which suggests that a lot of variation is due to the input of very weakly deleterious alleles which can’t be purged in population sizes where drift is above a particular threshold.
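
To put rough numbers on the drift-versus-selection argument above, here is a minimal sketch. It assumes Kimura's diffusion approximation for the fixation probability of a single new mutation with additive effects, in an idealized diploid population whose effective size equals its census size. The function names and the example values of s and N are illustrative assumptions, not anything taken from the paper.

```python
import math

def drift_variance(p, n):
    """One-generation sampling variance of an allele at frequency p
    in a diploid population of n individuals (2n gene copies)."""
    return p * (1.0 - p) / (2.0 * n)

def fixation_probability(s, n):
    """Kimura's approximation for a new mutation (starting frequency 1/(2n))
    with heterozygote fitness 1+s and homozygote fitness 1+2s.
    Assumes effective size equals census size n."""
    if s == 0.0:
        return 1.0 / (2.0 * n)  # neutral expectation
    exponent = -4.0 * n * s
    if exponent > 700.0:
        # math.exp would overflow; the probability is effectively zero
        return 0.0
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(exponent))

# A weakly deleterious allele (s = -0.001, an illustrative value)
s = -0.001
for n in (100, 1000, 10000):
    neutral = fixation_probability(0.0, n)
    ratio = fixation_probability(s, n) / neutral
    print(f"N={n:>6}  drift variance at p=0.5: {drift_variance(0.5, n):.2e}  "
          f"fixation prob relative to neutral: {ratio:.3g}")
```

Under these assumptions the same allele fixes at close to the neutral rate when N = 100 and essentially never when N = 10,000, which is the nearly neutral intuition in miniature.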

Presumably, in large populations there will be many low frequency variants of weak deleterious effect and recessive expression. In contrast, in small populations the power of drift is such that even rather deleterious alleles can be fixed against the gradient of selection. At cross-purposes with this is the idea that because inbreeding populations tend to “expose” alleles which express recessively to selection they can “purge” the genetic load which drags on fitness. For example, with dog breeds there is some evidence that the effect of inbreeding is conditional upon breed level variation, as some of the load may have been purged.

Apparently Daphnia are a species which exhibit a wide gradient of variation in genetic diversity (heterozygosity in this case), allowing one to test various hypotheses by crossing lineages sampled from wild populations in the laboratory. Their molecular assay of diversity was ~30 microsatellite loci. What they found is that, in line with theoretical prediction, those sampled from large populations had lots of segregating deleterious alleles, which manifested in a strong inbreeding effect when individuals were purposely crossed with those genetically similar. In contrast, those from small populations did not exhibit so much inbreeding effect, indicating that a lot of the deleterious alleles were already fixed and so exposed. These individuals from small populations also exhibited lower fitness than those from large populations, reflecting in all likelihood their genetic load. Crossing individuals from different small populations resulted in immediate hybrid vigor, as the fixed variants differed across lineages.
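
The three patterns described above (inbreeding depression in large populations, load without inbreeding depression in small ones, and hybrid vigor between small ones) can be illustrated with a toy calculation. This is only a sketch under loud assumptions: a textbook one-locus model with a deleterious allele at frequency q, selection coefficient s, dominance h, and inbreeding coefficient F, with fitness multiplied across two hypothetical loci for the hybrid case. None of the numbers come from the Daphnia paper.

```python
def mean_fitness(q, s, h, f):
    """Mean fitness at one locus with deleterious allele frequency q,
    selection coefficient s, dominance h, and inbreeding coefficient f."""
    p = 1.0 - q
    het = 2.0 * p * q * (1.0 - f)   # heterozygote frequency
    hom = q * q + f * p * q         # deleterious homozygote frequency
    return 1.0 - het * h * s - hom * s

def inbreeding_depression(q, s, h, f):
    """Proportional fitness loss of inbred relative to outbred offspring."""
    return 1.0 - mean_fitness(q, s, h, f) / mean_fitness(q, s, h, 0.0)

# Illustrative values only: a fully recessive allele, one generation of selfing-like inbreeding
s, h, f = 0.5, 0.0, 0.25

large_pop = inbreeding_depression(0.05, s, h, f)  # allele rare and segregating
small_pop = inbreeding_depression(1.0, s, h, f)   # allele fixed by drift
load_small = 1.0 - mean_fitness(1.0, s, h, 0.0)   # fitness cost of the fixed allele

# Two small populations fixed for deleterious alleles at different loci:
# their F1 hybrids are heterozygous at both, so recessive effects are masked.
hybrid_fitness = (1.0 - h * s) ** 2

print(f"inbreeding depression, large population: {large_pop:.4f}")
print(f"inbreeding depression, small population: {small_pop:.4f}")
print(f"genetic load in small population:        {load_small:.2f}")
print(f"F1 hybrid fitness between small pops:    {hybrid_fitness:.2f}")
```

In this toy model the segregating allele produces measurable inbreeding depression, the fixed allele produces none but carries a constant load, and the hybrid recovers full fitness.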

There are a lot more details in the paper. If you have academic access, read the whole thing. If not, there’s always #icanhazpdf. I’m more interested in general conclusions. Two preprints just came out which addressed the reality that Neanderthals seem to have had a small effective population size. Meanwhile, the issue is very real and live in conservation genetics, and even in the understanding of mammalian lineages more broadly, many of which have gone through bottlenecks even human intervention aside. But how much can we generalize from the Daphnia, which has a small genome (<10% of the size of the human genome, which is around average for mammals), but ~1/3 more genes? I’d wager a lot. But I’m really going to be interested when there are whole-genome analyses of this sort of study done in Daphnia, and we can look at the site frequency spectrum, instead of just inferring from the fitness.

Finally, I do want to emphasize here that a lot of the problems relating to inbreeding seem to be due to segregation load of partially recessive low frequency variants. This is an important foundational insight that allows us to properly conceptualize what’s happening in small populations, or in lineages that have gone through a bottleneck, and why that’s a problem.

• Category: Science • Tags: Inbreeding, Population Genetics 

Brian Charlesworth, co-author of the magisterial Elements of Evolutionary Genetics, won the Thomas Hunt Morgan Medal from the Genetics Society of America this year. An open access copy of his speech, What Use Population Genetics?, is now online at Genetics. In the speech he makes the case to a broader audience, which includes people working in molecular and developmental areas far removed from population genetics, as to why his field is important and critical to the broader scholarly enterprise.

First, he argues that without a good intuition for population genetic dynamics, one cannot model evolutionary processes very well. Of course that intuition only comes over time absorbing population genetics and gnawing on problem sets. But you have to put the work in to talk cogently about evolutionary biology in its broadest scope. Charlesworth suggests that those who don’t know population genetics “run the risk of making mistakes such as asserting that rapid evolutionary change is most likely to occur in small founder populations.” The issue here is that selection is powerful in very large populations. Not so much in smaller ones. I’ve personally encountered this confusion many times from biologists who are not population geneticists. But I do also want to admit that genetic drift can cause rapid allele frequency changes, so even here I would say that some people might quibble a bit with Charlesworth on the specific details (I am not one to dispute this particular assertion, for the record; I know what he meant).

Second, he addresses the nature of transposable elements (TE) in the genomes of organisms, and why they are so common, and where they are so common, as well as the role of PRDM9 in recombination. Pervasive features of the genome may, or may not, have adaptive origins. That means evolutionary genetics has to step into the fray and address the long term dynamics. Intersecting the frameworks of evolutionary genetics, and the structural constraints of molecular genetics, Charlesworth illustrates how population genetics sheds light on the biophysical character of genomic features, as well as the distribution of those features. If evolutionary biology is the science of why, population genetics is the how. Molecular genetics may be thought of in this schema as the is.

Finally, though Charlesworth alludes to it in passing only at the end of his speech, I think it is critical to remember that the post-genomic era is upon us, and it is incumbent upon us to think in population terms. The style of analysis which is common in population genetics lends itself easily to big data analyses. I recall a conversation with a young researcher last year at ASHG where he told me he was moving from population to medical genetics. And yet when his most recent publication came out I had to observe that it was fundamentally a work of medical population genomics. You can take the geneticist out of the population, but you can’t take the population out of the geneticist.

• Category: Science • Tags: Population Genetics 


David Reich’s lab has a new preprint out, Eight thousand years of natural selection in Europe, which serves as a complement to Massive migration from the steppe is a source for Indo-European languages in Europe. Where the previous work focused on the relationships of ancient and modern populations, this research puts the spotlight on patterns of natural selection which have shaped ancient and modern populations. The method utilizes the explicit model which is supported by the previous work, that Europeans are best approximated as a three population admixture of a group represented by the hunter-gatherers of Western Europe, the first farmers who brought agriculture to Europe, and the peoples of Central Eurasia who likely brought the Indo-European languages to Europe. In the parlance of these sets of papers, WHG, EEF, and Yamnaya. Basically they have allele frequencies of these ancestral groups, thanks to ancient DNA techniques, and the frequencies in modern populations. By comparing the frequencies one can then infer if the deviations from expectation are large enough to satisfy the conditions you’d expect for a locus subject to a selective sweep of some sort which is changing proportions rapidly as a function of a given selection coefficient.
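
The logic of comparing modern frequencies against an admixture expectation can be sketched in a few lines. To be clear, this is not the test used in the preprint; it is a simplified stand-in that assumes known mixture proportions, treats the ancestral frequencies as exact, ignores drift, and uses a plain binomial z-score. All numbers and names below are hypothetical.

```python
import math

def expected_frequency(proportions, ancestral_freqs):
    """Allele frequency expected in the modern population if it were a
    simple mixture of the ancestral groups with no selection or drift."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("mixture proportions must sum to 1")
    return sum(a * p for a, p in zip(proportions, ancestral_freqs))

def deviation_z(observed_freq, n_chromosomes, expected):
    """Binomial z-score of the observed modern frequency against the
    mixture expectation (ignores uncertainty in the ancient estimates)."""
    se = math.sqrt(expected * (1.0 - expected) / n_chromosomes)
    return (observed_freq - expected) / se

# Hypothetical inputs: mixture proportions for three ancestral groups
# and the derived allele frequency in each ancient sample.
proportions = (0.2, 0.5, 0.3)
ancient_freqs = (0.0, 0.0, 0.05)

expected = expected_frequency(proportions, ancient_freqs)
z = deviation_z(observed_freq=0.70, n_chromosomes=200, expected=expected)
print(f"expected frequency under pure admixture: {expected:.3f}")
print(f"z-score of the modern frequency: {z:.1f}")
```

A locus whose modern frequency sits far outside what the mixture predicts, as in this made-up example, is a candidate for selection; a real analysis also has to account for sampling error in the ancient genomes and for genome-wide drift.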

First, it is very obvious that lactase persistence in Europe has been under strong directional selection over the past 4,000 years. Even in the Bronze Age, Central European samples did not exhibit the frequencies of the derived variant on the LCT locus, common across Western and South-Central Eurasia, which is associated with persistence today. A quick survey of the 1000 Genomes data shows that this variant has wide variation in modern European populations which are phylogenetically close. The frequency in the Spanish data set is ~50 percent, but in the Tuscan Italian samples it is ~10 percent for the derived variant. In Denmark and Sweden the derived allele frequency goes up to ~75 percent (the phenotypic expression is dominant, so that means ~95 percent lactase persistence), though in the Finnish sample it is closer to the frequency of the Spanish data set. In South Asia the 1000 Genomes data as well as earlier work shows that frequencies are 25 percent or more in Northwest India, in the Punjab, where dairy culture is most pervasive. It drops as a function of distance from this zone, to 5 percent in Southern and Eastern South Asia. The haplotype network around this particular mutation implies that it probably originated in Central Eurasia, so the varied frequencies across the Old World are suggestive of both migration and selection. Intriguingly, the lactase persistence allele is not present at appreciable frequencies in the Yamnaya. It begins to appear in cultures such as the Corded Ware and Bell Beaker, though at far lower frequencies than is presently the case in this region.
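
The step from allele frequency to phenotype frequency in the paragraph above follows from Hardy-Weinberg proportions: with a dominant trait, everyone except homozygotes for the ancestral allele shows it. A minimal sketch, assuming random mating and full dominance, using the approximate allele frequencies quoted above (the function name is my own):

```python
def dominant_phenotype_frequency(allele_freq):
    """Fraction of individuals carrying at least one copy of a dominant
    allele, assuming Hardy-Weinberg proportions."""
    return 1.0 - (1.0 - allele_freq) ** 2

# Approximate derived-allele frequencies quoted in the text
for label, p in (("Tuscan", 0.10), ("Spanish", 0.50), ("Denmark/Sweden", 0.75)):
    print(f"{label:>15}: allele {p:.2f} -> persistence {dominant_phenotype_frequency(p):.3f}")
```

For an allele frequency of 0.75 this gives about 0.94, in line with the rough ~95 percent figure.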

But the story of lactase persistence is not entirely surprising. Its late evolutionary trajectory in relation to the rise of cattle culture and complex societies in Eurasia points to the reality that evolutionary change in the biological dimension requires a powerful cultural scaffold. That existed in the form of agro-pastoralism in Eurasia. Similar forces are at play across regions of Africa, where signatures of selection are even more evident in groups dependent upon cattle, likely because of the recency of the emergence of the trait, caught in mid-sweep.

A new face in the world?

There are a few other signatures evident in these data. Three of them have to do with pigmentation: SLC24A5, SLC45A2, and HERC2. Ewen Callaway reported last year on the peculiarity that Paleolithic European hunter-gatherers may have had dark skin and light eyes. The reasoning here is that a large fraction of the complexion difference between Europeans and Africans is attributable to a derived mutation on SLC24A5, which is nearly fixed in modern Europeans. And yet ancient European hunter-gatherers on the whole were not fixed at this locus, and Western European hunter-gatherers exhibited the ancestral variants. To get a sense of how peculiar this is, the vast majority of the alleles in much of the Middle East are in the derived state, as are about half the alleles in South Asia (I am a homozygote for the derived allele for what it’s worth, and my skin is still notably brown, though obviously not extremely dark). The best available data suggests that the mutant allele emerged recently in the Middle East, and it has expanded out from that point of origin.

SLC45A2 is different in that its distribution is far more constrained to within Europe, though it is found at appreciable frequencies in the Middle East, and at lower frequencies in South Asia. The same for HERC2, though I was surprised to see that the “European” variant associated with blue eye color is actually found at a 0.10 proportion in the 1000 Genomes data in Bangladesh (I am a homozygote for the ancestral variant), the same fraction as the Punjabi sample.*

The results here seem to suggest that all these loci are under selection. The two SLC genes are under positive selection, though SLC24A5 probably got its first boost from EEF with the arrival of agriculture, and was subsequently fixed even when that group fused with the hunter-gatherers who lacked it. Curiously HERC2 is under some negative selection. Remember that all the hunter-gatherers seem to carry the derived variant, so the frequency could only go down. But in Southern Europe it is likely being driven down in frequency, while in Northern Europe it has been maintained, or rebounded.

Of course one of the major issues we have when evaluating pigmentation loci and their relationship to selection is that it’s not always clear if the target of selection is the trait of pigmentation, or something else which the locus modulates, with pigmentation just happening to be a salient side effect. There are many theories about why populations have become depigmented, but none of them are truly well supported in my opinion. Another question is whether we know the genetic architecture of pigmentation well enough to actually infer that these ancient populations are easily predicted in their trait character by modern models which map genotype to phenotype. In other words, were Paleolithic Europeans light skinned because of different alleles? The genetic architecture of skin color is relatively well understood in extant populations. Though it is possible, it so happens that modern Northern Europeans, and to a lesser extent Southern Europeans, harbor a substantial portion of European ancestry which is rooted in the Paleolithic. Studies in admixed African American populations, which are ~20 percent European, indicate that the primary variants which determine complexion are the ones extant in modern populations, though it may be that there isn’t power to detect the ones from WHG, etc. Of course it could be that the lightening alleles of the Paleolithic Europeans were subject to negative selection, excepting the HERC2/OCA2 locus. But that’s not a particularly parsimonious solution from where I stand (by the way, if selection is targeting something other than pigmentation it is strange that pigmentation associated loci emerge in clusters as positive hits for selection tests).

A secondary issue in relation to pigmentation is that the Yamnaya population does not seem to have been particularly fair of hair or azure of eye. The frequency of the derived HERC2 SNP is in the range of North Indian populations, while the SLC45A2 SNP is in the same frequency range as Middle Eastern groups. One might suggest that the Yamnaya are not representative of the population which was intrusive to Europe, but note that the frequencies of the alleles in question during the Late Neolithic and Bronze Age are intermediate between it and modern groups. These results imply in situ evolution within Europe over the Holocene, and down into historical times, toward the phenotype which we ascribe uniquely to Europeans. This is strange especially in light of the fact that a later eastern branch of Indo-Europeans seems to have been quite light. I don’t think we can make final inferences, but to me it is starting to look like the “Proto-Indo-European” complex of peoples was highly cosmopolitan and heterogeneous. Should we expect anything else? As the Mongols expanded in all directions their divergent tendrils were embedded in different ethnic substrates (e.g., Tatars, Khitai and Jurchen in China, Kipchak Turks in Russia, etc.).

The other major locus that showed up was one related to fatty acid metabolism, FADS1. Many tests for selection in humans and domestic animals show changes in the ability to process nutritive inputs. It seems an eminently plausible candidate phenotype to target for selection since the relationship to fitness is straightforward. Using polygenic score methods they also find that there was selection for shorter stature in early Neolithic populations in places like Spain. I think in the future one area of investigation is going to be in the domain of biological adaptations on the margin of farming populations which are put into a Malthusian pressure cooker. Humans, on average, were getting smaller until recently in comparison to their average stature during the Last Glacial Maximum. The Yamnaya people, in contrast to the Neolithic Iberians, seem to have been rather tall. Perhaps it had something to do with the nature of agro-pastoralism? (though do note that without lactase persistence they’d miss out on about 1/3 of the calories in the form of lactose sugar, though not the protein and fat)

But there’s a twist which I haven’t gotten to, and that’s the one in regards to the hunter-gatherers from the Scandinavian region. Unlike the WHG samples you can see that they exhibit mixed frequencies of derived and ancestral alleles at the SLC loci. That’s peculiar, since geographically they are more distant from the core region from which EEF issued. We do know that their ancestry is somewhat exotic, as the paper on Indo-European migrations pointed out that they seem to carry the same ancestral component which the Indo-Europeans brought to most of Europe, that of the Ancestral North Eurasians (albeit at far lower fractions than the EHG group which was a partial precursor of the Yamnaya population).

The past is complex and doesn’t fit into a solid narrative. And yet the weirdest aspect of the Scandinavian samples is that they carry the East Asian/Native American variant of EDAR at appreciable frequencies! The figure to the right illustrates this. In blue you have the focal SNP (dark is homozygote, light is heterozygote, dark circle means only one allele was retrieved). In the Chinese from Beijing population (CHB) the derived variant is at high frequency. In the sample of Northwest Europeans from Utah (CEU) it is not present. You can confirm these findings in the 1000 Genomes and elsewhere. In Europe, EDAR of the East Asian form seems to be found only in Finland and associated populations. Using ALDER the authors conclude that admixture occurred on the order of 1 to 2 thousand years before the present, from an East Asian-like group (in the Indo-European paper they found this source best matched the Nganasans of North Central Siberia). An interesting fact which also comes out of this finding is that the haplotype that the derived SNP arose against is relatively common in Northern Europe. The arrows in the figure point to individuals who carry the ancestral SNP, but exhibit the same haplotype which is dominant in East Asia (and also among the Scandinavian hunter-gatherers with the derived variant). The authors state that “The statistic f4(Yoruba, Scandinavian hunter-gatherers, Han, Onge Andaman Islanders) is significantly negative (Z=-3.9) implying gene flow between the ancestors of Scandinavian hunter-gatherers and Han so this shared haplotype is likely the result of ancient gene flow between groups ancestral to these two populations.” Though in earlier work on these data sets they left open the possibility of gene flow between Eastern and Western Eurasia during the Paleolithic as a way to explain some results, it was not offered as an explanation for the Scandinavian hunter-gatherers.

I do not know what to think of the fact that the haplotype that the derived East Asian SNP arose in is common in Northern Europe (though without the derived SNP, which is likely only present in a few populations due to recent Siberian admixture). Could it be that ancient gene flow from Western Eurasian Paleolithic people occurred into East Asian populations, and that then this haplotype accrued the mutation which later swept to near fixation? If that is the case I’m curious about haplotype networks, as Northern Europeans should be more diverse when it comes to the haplotype in question.
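
For readers unfamiliar with the f4 statistic quoted above, here is a minimal sketch of how the point estimate is computed: it is the average over SNPs of the product of two allele frequency differences. The frequencies below are made up purely to show the sign convention, and the Z-score in the preprint would come from a block jackknife across the genome, which this sketch does not attempt.

```python
def f4(pop_a, pop_b, pop_c, pop_d):
    """Point estimate of f4(A, B; C, D): the mean over SNPs of
    (a - b) * (c - d), where each argument is a list of allele
    frequencies at the same SNPs in one population."""
    n = len(pop_a)
    if not (len(pop_b) == len(pop_c) == len(pop_d) == n) or n == 0:
        raise ValueError("all populations need the same non-empty set of SNPs")
    return sum((a - b) * (c - d)
               for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)) / n

# Hypothetical allele frequencies at four SNPs (not real data)
yoruba = [0.1, 0.5, 0.3, 0.8]
shg    = [0.4, 0.5, 0.6, 0.5]   # Scandinavian hunter-gatherers
han    = [0.5, 0.2, 0.7, 0.4]
onge   = [0.2, 0.2, 0.4, 0.6]

value = f4(yoruba, shg, han, onge)
print(f"f4(Yoruba, SHG; Han, Onge) = {value:.3f}")
```

A negative value means that where the second population differs from the first, it tends to differ in the same direction that the third differs from the fourth, which is why a significantly negative f4 here is read as excess allele sharing between the Scandinavian hunter-gatherers and Han.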

In the near future we’ll probably have better and more numerous whole genome sequences of ancient samples. Some of the confusions engendered by this work will be cleared up, as better data renders paradox crisply coherent. The preprint is free to anyone, and I invite readers to dig deeply into it. Though the results yielded only a few positive signals of selection, they’re subtle and complex in their implications. I certainly haven’t thought through everything….

* The fraction of blue eyes is MUCH higher among Punjabis than Bengalis in my experience. It goes to the point that blue eyes are likely only expressed against the genetic background found in Europeans, where there are other depigmenting alleles near fixation.

• Category: Science • Tags: Europeans, Population Genetics, Selection 


I do like to suggest that the genetic and archaeological record support the conjecture of Conan the Barbarian in terms of what our male ancestors thought was “good in life.” Basically, to conquer your enemies and seize their women, which is a distillation of a disputed quote from Genghis Khan. Conan may be fiction, but Genghis Khan is not. As it happens there is a fair amount of circumstantial evidence that the genetic legacy of Genghis Khan is enormous. Not only did Khan father many sons, but so did their sons, and so forth. Tens of millions of men around the world are direct paternal descendants of Genghis Khan and his family.

This is known. But now more is known, thanks to a new paper out of Genome Research, A recent bottleneck of Y chromosome diversity coincides with a global change in culture. The upside of this paper is that it uses whole genome sequence of Y chromosomes to generate phylogenetic inferences. This is important because the Y chromosome has very little genetic variation relative to much of the rest of the genome. The downside is that because techniques were utilized to perform whole genome sequencing of the Y, the sample size, at 299, is not as large as we’ve gotten used to for analyses of uniparental lineages. That will change in the future, as there are many thousands of whole genome sequences of the Y in databases around the world, though perhaps not enough computational power allocated by funding agencies to crunch through them in the fashion on display in the paper (they didn’t use the whole sequence for a lot of the analysis, but ~35,000 SNPs).

So what are the major findings of the paper? Using a Bayesian Skyline Plot (BSP) it is rather clear that 4-8 thousand years ago there was a sharp drop in male effective population sizes across many world populations. It is also clear that the female effective population did not experience the same drastic contraction. The supplements have individual figures, and many of the events of history and archaeology can be easily mapped onto these population size changes. For example, the later reduction of African population sizes probably is due to the later adoption of agriculture in that continent, and timed with the Bantu expansion. In the New World the data seem to show late and persistent reduction in effective population size. The Columbian Exchange and massive population contraction subsequent to that is probably being picked up by this result. Intriguingly, two events are detected in the European data, where the sample size is relatively large. The first major drop seems to coincide with the arrival of the “First Farmers” (e.g., LBK culture) in Northern Europe. In the Middle East (orange) you see collapse, and then a rapid ascent very early. This comports well with the early history of agriculture here. But in the European samples there is a rapid ascent, and then a leveling off ~3,000 years ago or so. This could be the arrival of Indo-European cultures to Europe. If the sample sizes for other regions were as large and representative as Northern Europe such subtle details might also have emerged there with the BSP method (to be clear, I suspect the crash in effective size in Europe is due to haplogroup I, while the delayed expansion is due to R1a and R1b arriving a few thousand years later).

Also of interest is the deep structure of the different clades. Those of you steeped in Y chromosomal haplogroups can extract more from the figure to the top left, but it shows the relationships of the primary groups as well as their recent expansion. The affinity of the Q and R clades to me indicates that those who argue that these are somehow related to the “Ancestral North Eurasians” are correct. Similarly, the position of I and J in the same clade points to their common descent from ancient West Eurasian Pleistocene groups. The I lineage is most exclusively associated with European hunter-gatherers, while J is traditionally associated with groups of farmers expanding out of the Middle East in all directions (note that one branch of J is found in the Middle East, Central Asia, South Asia, and Europe). I agree with Dienekes that the branch of E that corresponds to the lineages which span Sub-Saharan Africa and Western Eurasia indicates a back migration to Africa, probably in the Pleistocene. I do wonder as well whether they have some association with the mysterious “Basal Eurasians.”

An important part of the paper that they emphasize is that ~50,000 years before the present there was a profusion of haplogroups associated with the ones which are today common across Eurasia, and Y chromosomal Ne was ~100. This seems to agree with the rapid expansion of non-Africans in the wake of the “Out of Africa” event, though the authors note they don’t have enough power to reject a model of a separate “Southern Route” migration, which might be detected with autosomal data. This is a good caution on the limitations of Y and mtDNA data; archaic admixture was rejected by these two loci because the non-African hominin lineages went extinct (mtDNA and Y have higher turnover rates than the recombining autosomal regions). Additionally there were some major lacunae in the sampling. For example, among the African populations it doesn’t seem like some of the hunter-gatherer groups, the Khoisan or eastern Pygmy, were included in the data set. The map also shows that Northeast Asia (China, Japan and Korea) and Oceania were not extensively sampled. But these are minor issues in the broader picture of the insights from the population coverage that they did have.

The most important implications of these sorts of results have to do with the nature of the change of human social organization and behavior over the course of the existence of modern humans. The authors of the above paper seem to understand this, as there is extensive focus on the topic within the paper:

An increase in male migration rate might reduce the male Ne but is unlikely to cause a brief drastic reduction in Ne as observed in our empirical data…However, in models with competition among demes, an increased level of variance in expected offspring number among demes can drastically decrease the Ne (Whitlock and Barton 1997). The effect may be male-specific, for example, if competition is through a male-driven conquest. A historical example might be the Mongol expansions (Zerjal et al. 2003). Innovations in transportation technology (e.g., the invention of the wheel, horse and camel domestication, and open water sailing) might have contributed to this pattern. Likely, the effect we observe is due to a combination of culturally driven increased male variance in offspring number within demes and an increased male-specific variance among demes, perhaps enhanced by increased sex-biased migration patterns (Destro-Bisol et al. 2004; Skoglund et al. 2014) and male-specific cultural inheritance of fitness.

To restate what’s being said here:

1) During the Holocene we saw the rise of powerful patrilineages which engaged in winner-take-all inter-group competition.

2) Within the “winning” patrilineages there may have been winner-take-all dynamics, or at least high reproductive variance.
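
To see why high reproductive variance among males shrinks the Y chromosomal effective population size so sharply, consider this minimal sketch. It assumes a standard textbook approximation for the effective size of a haploid, uniparentally inherited locus, Ne = (N·k − 1) / (k − 1 + V/k), where k and V are the mean and variance of the number of sons per male. This is not the method of the paper (which infers Ne from sequence data), and the census size and offspring distributions are invented for illustration.

```python
def offspring_moments(distribution):
    """Mean and variance of sons per male, given a list of
    (fraction_of_males, number_of_sons) pairs whose fractions sum to 1."""
    mean = sum(frac * k for frac, k in distribution)
    var = sum(frac * (k - mean) ** 2 for frac, k in distribution)
    return mean, var

def male_effective_size(n_males, mean_sons, var_sons):
    """Approximate effective size for a Y-like haploid locus, given the
    mean and variance in the number of sons per male (textbook formula,
    used here as an assumption)."""
    return (n_males * mean_sons - 1.0) / (mean_sons - 1.0 + var_sons / mean_sons)

n_males = 10000  # hypothetical census number of adult males

# Baseline: roughly Poisson reproduction (mean 1, variance 1)
ne_poisson = male_effective_size(n_males, 1.0, 1.0)

# Winner-take-all: 5% of males father 20 sons each, the rest father none
mean_k, var_k = offspring_moments([(0.05, 20), (0.95, 0)])
ne_winner = male_effective_size(n_males, mean_k, var_k)

print(f"Poisson-like reproduction: Ne ~ {ne_poisson:.0f}")
print(f"winner-take-all:           Ne ~ {ne_winner:.0f} (variance in sons = {var_k:.1f})")
```

With the same number of living men, concentrating paternity in a twentieth of them cuts the effective size by roughly a factor of nineteen in this toy example, while the female effective size would be unaffected.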

When it comes to farmers and nomads against each other I do think a model of inter-demic competition is pretty realistic. But when it comes to farmers and nomads against hunter-gatherers I don’t think one can term it competition. The latter in most circumstances would be quickly overwhelmed by the farmers and nomads; eliminated, excluded, or at least assimilated (there are exceptions in areas where the hunter-gatherer density was high and they were sedentary). And as concerns the complex societies of farmers and nomads, even within them the rise of inequality and stratification meant that subordinate or secondary males and their lineages were marginalized, leaving few descendants.

Men are on average 15-20 percent bigger than women. Men are also stronger than women. But the sexual dimorphism is far less than one can find among gorillas. This suggests that intra-sex competition among males was attenuated, or at least it was not in the physical domain. Though I am not of the camp which believes that war as we understand it must necessarily be a feature of Holocene agricultural societies, it seems likely that the pressure cooker of high population densities resulted in a radical increase in the scale of inter-group atrocity. One way to react to this change would have been to grow larger physically, but there are limitations to how fast biological evolution can resculpt the human physique. Not only that, but larger humans presumably require more nutritional inputs, and the agricultural revolution in Malthusian conditions did not enable that on a mass scale. So humans did what they do best: innovate culturally.

The cultural innovations came as package deals. A central role for patriarchal lineages which tended to apply force to maintain social order, as well as take on the position as the tip of the spear in inter-group competition, eventually resulted in power accruing to those groups almost exclusively. The importance of patrilineages naturally resulted in an increased importance of paternity certainty, and therefore social mores which emphasized female chastity. These powerful lineages fixed upon a solution which gorillas had long ago arrived at: treat females as chattel and defend them as one would property.

The “men in groups” were evoked by particular social-cultural conditions of agricultural society which they themselves did not necessarily trigger in any way. But once you had a small benefit to the emergence of a caste of men in groups, groups which developed this caste benefited. Within these groups eventually the caste took over the identity of the group, and made its own interests conterminous with the interests of the group. The Athenian polis was democratic, but only for free males who were born of Athenians. In other words, the most radical experiment in democracy in the ancient world was also still relatively exclusionary and delimited in the nature of political power and representation (also, recall that the power of freeborn males of lower economic status in Athens has been connected to their importance in the navy as oarsmen).

Speaking as someone with broadly liberal sympathies, economic and social forces over the past few centuries have resulted in an unwinding of the cultural innovations of the past 10,000 years which have put a straitjacket on the forces of human liberty. This great unwinding to some extent can be understood as the shattering of the great patriarchal monopolies of old, reflected in the great families and lineages which spanned the world, and democratic representation first for all men and then women. In the West the period between 1800 and 1970 saw massive gains in income to unskilled workers, reversing the tendency toward winner-take-all dynamics which arose with the Neolithic.

That being said, the post-Industrial and post-materialist world, in full flower in places like North Europe, is not exactly like the Paleolithic. Some of the innovations of the post-Neolithic world, such as organized religion, are probably here to stay in a world of social complexity and density. The great devolution of power from the elite male lineages is one specific aspect where I believe the modern age more resembles the Paleolithic. More liberal sexual ethics is another dimension where the modern world is more like that of hunter-gatherers. But the autonomous individual, an island unto himself, is a fiction. Hunter-gatherers were, and are, social creatures. No doubt they were bound by taboos and rules, just as modern hunter-gatherers are. The vision of egalitarianism promoted by many in the modern West is a reaction against the social controls of the post-Neolithic world, but those social controls themselves are rooted in human cognitive impulses. Competition did not come fully formed in the world of grain, and the impulse toward violence and domination was present in man long before the scythe was re-purposed toward bloodier ends.

• Category: Science • Tags: Phylogenetics, Population Genetics, Y Chromosome 
Distribution of rs17822931 from HGDP

Yoshiura, Koh-ichiro, et al. "A SNP in the ABCC11 gene is the determinant of human earwax type." Nature genetics 38.3 (2006): 324-330.

I’ve talked about rs17822931 in ABCC11 several times. The reasons are manifold. First, on many traits of interest it exhibits variation across populations in a simple Mendelian (recessive expression) manner. Second, there are suggestive variations in distribution. Third, the traits are kind of interesting without being biomedical. In other words, it’s a cool illustration of pleiotropy and human genetic variation that isn’t going to depress you. If you check out the SNPedia page you note that it is associated with variation in earwax type (wet vs. dry), body odor, and colostrum secretion. This is not the full list, and I’m moderately confident that biologists haven’t hit on all the major phenotypes that this affects variation in.

Until recently I’ve really only been interested in the population genetics of the trait. But talking with a few friends who were molecular biologists I realized I should follow up and dig deeper, and what I found was very interesting. Specifically, as it relates to body odor, which, like it or not, is a phenotype of significance in the modern world. The trait happens to segregate within my family. My son is a TT genotype, because his parents are heterozygotes. That means he will exhibit less body odor as an adult. How much less?

In The Journal of Dermatological Science I found Functional characterisation of a SNP in the ABCC11 allele—Effects on axillary skin metabolism, odour generation and associated behaviours. Obviously this is not a journal I read often, but some of the tables are fascinating. The subjects were a few hundred Filipinos. This is a population where the allele of interest segregates at intermediate frequencies. So there are many individuals with dry earwax as well as wet earwax, and all the associated traits.

Here are some tables I extracted*:

Mean malodour scores
Genotype    5 hours    24 hours
TT          2.59       2.6
CT          3.26       3.4
CC          3.21       3.5

Deodorant use (proportion of each genotype)
                    TT      CT      CC
Uses deodorant      0.5     0.86    0.97
Does not use        0.5     0.14    0.03

I have no idea how subjective malodour scales work, but the moral is pretty straightforward. Those with the TT genotype saturate at a much lower point. This manifests in daily behavior. There is a fair amount of Japanese data showing that people who go to the doctor for body odor issues are much more likely to have wet earwax. This data from the Philippines illustrates that individuals with the derived genotype, TT, must be conscious enough of their lack of body odor to forgo deodorant purchases, even though I assume it is normative in the American-influenced culture of the Philippines.

But most interesting to me are the chemical differences of the sweat of the different genotypes. They note that there were differences in Nα-3-methyl-3-hydroxy-hexanoylglutamine (HMHA-Gln), Nα-3-methyl-2-hexenoyl-glutamine (3M2H-Gln), and 3-methyl-3-sulfanyl-hexanol-cysteine-glycine between the genotypes. I don’t know much about these chemicals, except that they are “malodour conjugate precursors”. Not surprisingly there’s some difference in the microbial flora of the individuals as a function of genotype.

There have been attempts to understand the selection processes which may have shaped the regional distribution of this trait, but I’m not entirely convinced by what I’ve seen. Especially when the authors presume that earwax phenotype is in some ways causal (or at least that it can give insight into causality, if that makes sense), when it may just be a developmental side effect. A consideration is that some models assume a recessive expression of the trait, which is true for body odor and earwax. But we don’t know that selection, if it occurred, was on these traits. Because of pleiotropy, traits due to variation at a given gene may exhibit different levels of dominance, from full dominance, to additivity, to recessive expression. The target of selection may exhibit a different dominance coefficient than many of the side effect phenotypes (to give you a concrete example, the locus responsible for blue vs. non-blue eye color in Europeans exhibits some recessivity, but it is also responsible for variation in skin color where it is additive).

A 2009 paper using the HGDP data set found evidence of selection on ABCC11 using XP-EHH but not iHS. In other words, extended haplotype differences across populations, but not within them, which often imply sweeps near fixation between populations, rather than ongoing ones within them. To get a better sense of the distribution of the allele I decided to query the SNP in the 1000 Genomes Browser. I invite you to look at the data yourself. The sample sizes start to get pretty large in some of these populations. It is interesting that in West African populations the ancestral variant is nearly fixed, or totally so. The cases where it is not so can pretty easily be hypothesized as due to recent (last 10,000 years) Eurasian admixture. In Europe the frequency of the derived variant is low, on the order of ~10%, but in the Finnish sample it peaks at ~25%. This aligns with patterns in the HGDP data set. African populations tend to be fixed for the ancestral variant, C, while European populations have a low frequency of the derived variant, T, with a cline toward the northeast from the southwest (i.e., peaks in the Russians, lowest fraction in Sardinians). But, Middle Eastern samples in the HGDP data set have European proportions of T as well, though the Mozabites in North Africa do not. The South Asian samples in the HGDP have higher levels of the derived variant than Europeans, intermediate between that group and East Asians. But the 1000 Genomes data results in a thickening of the plot (and, with large sample sizes!). The Bangladeshis are at even a higher fraction than the Pakistani populations. The genotype counts are like so: 12 CC, 54 CT, TT. When I saw this I assumed it was the East Asian admixture, on the order of 10-20%, which might account for the enrichment of T in relation to Pakistani groups. But that is not correct. Here are the counts for Indian Telugus: 20 CC, 49 CT, and 33 TT. And Sri Lankan Tamils: 23 CC, 49 CT and 30 TT.
Many hypotheses about the derived variant involve adaptations to cold climates in Northeast Asia. This may still be the case in Northeast Asia, but what you see here is a NW to SE cline of ancestral to derived variant of ABCC11 in South Asia. The Punjabis and Gujaratis have higher fractions of the ancestral variant, as you’d expect from the HGDP data.** (the fraction in the Bangladeshi sample might be elevated by East Asian admixture)
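For readers who want to turn the genotype counts above into allele frequencies, here is a minimal sketch. It uses only the two South Indian samples whose counts are given in full in the text. The function names are my own, and the Hardy-Weinberg expectation is included purely as a sanity check, not as anything reported by the 1000 Genomes Browser.

```python
def derived_allele_frequency(n_cc: int, n_ct: int, n_tt: int) -> float:
    """Frequency of the derived T allele from diploid genotype counts."""
    total_alleles = 2 * (n_cc + n_ct + n_tt)
    if total_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each heterozygote carries one T, each TT homozygote carries two.
    return (n_ct + 2 * n_tt) / total_alleles


def hardy_weinberg_expected(n_cc: int, n_ct: int, n_tt: int):
    """Expected (CC, CT, TT) counts under Hardy-Weinberg proportions."""
    n = n_cc + n_ct + n_tt
    q = derived_allele_frequency(n_cc, n_ct, n_tt)
    p = 1.0 - q
    return (n * p * p, 2 * n * p * q, n * q * q)


# Genotype counts (CC, CT, TT) as quoted in the text.
samples = {
    "Indian Telugu": (20, 49, 33),
    "Sri Lankan Tamil": (23, 49, 30),
}

for name, counts in samples.items():
    freq_t = derived_allele_frequency(*counts)
    exp_cc, exp_ct, exp_tt = hardy_weinberg_expected(*counts)
    print(f"{name}: T frequency = {freq_t:.3f}")
    print(f"  observed CC/CT/TT = {counts}")
    print(f"  expected CC/CT/TT = ({exp_cc:.1f}, {exp_ct:.1f}, {exp_tt:.1f})")
```

In both samples the derived allele comes out at a little over one half, which is the basis for saying these South Indian groups sit well above the European range.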

The results from East Asian samples in the 1000 Genomes are also illuminating. With sample sizes of around 200 each the Dai minority (related to the Tai people culturally as their antecedents) has a frequency of 56% for T, the Han from Beijing have 97%, the Han from South China are at 86%, the Japanese 88%, and the Vietnamese from the southern region of the country 64%. First, my intuition is that this seems a strange pattern for an allele which was selected on a recessive trait. Rather, it looks more likely for selection on a dominant trait, where the equilibrium frequency remains below 100% because of recessive expression of the unfavored state. Second, the fraction for the Dai seems rather high for the ancestral state. This particular population is sampled from the Mekong region of southern China, as far south as you can go in the nation. This sort of cline correlated with latitude goes a long way to explaining why the thesis often emerged that this variation is somehow related to climate (there is something of a north-south cline in Japan as well).
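The intuition about dominance can be checked with a textbook deterministic model. The sketch below is my own illustration, not an analysis from any of the papers: it assumes an infinite, randomly mating population, constant viability selection, and a hypothetical selection coefficient of 1%. It counts how many generations the T allele needs to climb from 50% to 99% when the favored phenotype is recessive versus dominant.

```python
def next_frequency(q: float, w_cc: float, w_ct: float, w_tt: float) -> float:
    """One generation of viability selection on the T allele frequency q."""
    p = 1.0 - q
    mean_fitness = p * p * w_cc + 2 * p * q * w_ct + q * q * w_tt
    return q * (q * w_tt + p * w_ct) / mean_fitness


def generations_to_reach(q_start, q_target, w_cc, w_ct, w_tt,
                         max_generations=1_000_000):
    """Generations until q >= q_target, or None if never reached."""
    q = q_start
    for generation in range(max_generations + 1):
        if q >= q_target:
            return generation
        q = next_frequency(q, w_cc, w_ct, w_tt)
    return None


s = 0.01  # hypothetical selective advantage of the favored phenotype

# Favored phenotype expressed only in TT homozygotes (recessive).
recessive = generations_to_reach(0.5, 0.99, 1.0, 1.0, 1.0 + s)
# Favored phenotype expressed in CT and TT (dominant).
dominant = generations_to_reach(0.5, 0.99, 1.0, 1.0 + s, 1.0 + s)

print(f"favored allele recessive: {recessive} generations from 50% to 99%")
print(f"favored allele dominant:  {dominant} generations from 50% to 99%")
```

Under these assumptions the recessive case finishes the last stretch toward fixation far faster, because at high frequency most copies of T sit in TT homozygotes where selection can see them. In the dominant case the remaining C alleles hide in heterozygotes, so the frequency of T stalls short of 100% for a long time.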

Where does this leave us? I honestly don’t think we can make a general conclusion about the nature of selection around this variation. To me it looks as if it was functionally constrained in Africa. Some African populations have the derived variant, but those that do can be explained via recent Eurasian admixture pretty easily (e.g., the LWK sample are Kenyan Bantus who have mixed with Nilotic peoples, who do have Eurasian ancestry. The same for the samples from Gambia or Senegal in relation to Eurasian-mixed Fula). But once you leave Africa it looks as if the constraint was removed, and lots of populations have low frequencies of the derived nonsynonymous mutation. The 2006 paper which focused in on the SNP of interest had Oceanian samples, and the derived variant fraction is too high to simply be a matter of Austronesian admixture. Could it be some form of balancing selection outside of Africa? Who knows. It might be neutral in some areas, under positive selection in others, balanced in a few locations, and under constraint in Africa.

But despite the evolutionary enigma of this locus, the phenotypic correlations keep building up. It’s a classical genetics illustration because of its Mendelian character. In terms of morphology I should emphasize that the body odor related information probably relates to the apocrine glands, which are localized in the armpits and genitals, and also are precursors to mammary secretion glands. Someone who understands these sorts of pathways and how they influence development could probably say much more. I’m sure at some point we’ll be able to answer the big evolutionary questions about this locus, and how it relates to human biological variation, but that will probably necessitate a better catalog of its phenotypic consequences.

Addendum: If you have a 23andMe account, here is the link that will show you your genotype (and anyone else on your account): (be logged in ahead of time).

* I flipped the strand, so converted A to T and G to C.

** To be fair, there was some evidence from Tamils in earlier studies, but two South Indian populations in the 1000 Genomes with high sample sizes nail it.


J. B. S. Haldane

Fitness is an easy concept to talk about, but in practice it can be quite slippery. This would seem to contradict John Maynard Smith’s contention that biologists have it easy in comparison to social scientists in the context of game theory, because the bookkeeping is easier since fitness is an obvious currency. In any case, until recently outside of laboratory conditions fitness and its evolutionary genetic converse, load, have been of theoretical rather than empirical interest. But with genomics, and the ability to detect deleterious alleles to a high degree of precision, these old issues have become live anew.

In 2008 a paper came out which reported that Europeans had more genetic load than Africans, Proportionally more deleterious genetic variation in European than in African populations. At the time I recall Greg Cochran was somewhat skeptical on grounds of biomedicine, and some rather unrealistic demographic assumptions (an unrealistically long bottleneck). The basic finding was simple: because of the “Out of Africa” event Europeans (and presumably all non-Africans) would exhibit a higher load of deleterious alleles because of the reduced power of selection in relation to drift. Over the past seven years that simple result has come under critique, and the first author of the 2008 paper now has a review out which resolves the conflicting results, The distribution of deleterious genetic variation in human populations (the link is to the preprint, which has been around for a while). The short of it seems to be that the distribution of frequencies of deleterious alleles may differ across populations as a function of demographic history, with the bottleneck and rapid population growth resulting in an excess of rare alleles in non-Africans, while the larger population makes selection more efficacious. The theory itself in the paper is less interesting to me than the conclusion. Here he states:

Future work should include examining empirical patterns of deleterious mutations in other human populations that have differing populations histories, such as different amounts of recent population growth. Studies with large samples of individuals will be particularly helpful as they will be informative regarding how deleterious mutations have behaved during recent times….

Genomics is powerful. For the sort of subtle evolutionary patterns which researchers are trying to sniff out it strikes me that good quality whole genomes in larger numbers across more populations are probably necessary before we can make robust generalizations about humans, let alone other species. Caution is definitely important because the first wave of SNP-chip studies seems to have produced a set of results which were interpreted in light of theory, without understanding that the empirical results were only a sliver of reality constrained by the methods at hand.
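The phrase “reduced power of selection in relation to drift” from the discussion of the 2008 paper can be illustrated with Kimura’s classic diffusion approximation for the fixation probability of a new mutation. This is a minimal sketch under loud assumptions that are mine, not the review’s: additive (genic) selection with heterozygote effect s, census size equal to effective size unless stated otherwise, and hypothetical parameter values.

```python
import math


def fixation_probability(s: float, n_e: float, n_census: float = None) -> float:
    """Kimura's diffusion approximation for a single new mutation.

    s is the heterozygote selection coefficient (negative if deleterious),
    assuming additive effects. The starting frequency is 1 / (2 * N_census).
    """
    if n_census is None:
        n_census = n_e  # assumption: census size equals effective size
    p0 = 1.0 / (2.0 * n_census)
    if s == 0:
        return p0  # a neutral mutation fixes with probability p0
    scaled = 4.0 * n_e * s
    # u = (1 - exp(-scaled * p0)) / (1 - exp(-scaled)), written with expm1
    # for accuracy when the exponents are small.
    return math.expm1(-scaled * p0) / math.expm1(-scaled)


s = -1e-4  # hypothetical weakly deleterious mutation
for n_e in (1_000, 10_000):  # hypothetical bottlenecked vs. large population
    u = fixation_probability(s, n_e)
    neutral = fixation_probability(0.0, n_e)
    print(f"Ne = {n_e:>6}: fixation probability relative to neutral = {u / neutral:.3f}")
```

With these made-up values the same mutation behaves as nearly neutral in the smaller population and is purged much more effectively in the larger one, which is the qualitative logic behind expecting a different burden of deleterious variation after a bottleneck. Very large values of |4 Ne s| would overflow the exponential, so the sketch is only meant for the weak-selection regime.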

• Category: Science • Tags: Deleterious mutations, Population Genetics 

James C. Chatters 2002 book

By now you may have read the breaking news in The Seattle Times that Eske Willerslev’s group is going to publish genetic results on Kennewick Man. This “scoop” was obtained through the Freedom of Information Act, which makes sense since Kennewick Man has been embroiled in political controversy since the discovery of the remains by James Chatters in the 1990s. The issue is that morphologically the remains were not typical of contemporary Native Americans, which might cause some doubt as to the legitimacy of the social-political rights of the indigenous people of the region today. The social-political aspects have been beaten to death, and I am not particularly interested in that area. Rather, the science is more fascinating, if somewhat less surprising in light of the results that are going to come out in the near future.

The most famous reconstruction of Kennewick Man is strange because it resembles British actor Patrick Stewart. Humans use phenotypes, morphology, to ascertain genetic relatedness when DNA is not available. In the 1990s DNA was not available. The inference by many researchers who had access to the remains was that Kennewick Man was different because his morphology may have resembled a person of European heritage. The controversy turned into such a circus that somehow Steve McNallen, arguably America’s foremost Northern European neo-pagan expositor, made claims on the remains on the same grounds as Native American people! Later scholars suggest that perhaps Kennewick Man was not so much European, as not typical of contemporary Native Americans (e.g., perhaps he was part of an early migration of basal East Eurasians related to the Jomon of Japan).

If the Seattle Times report is correct, and I believe it is, Kennewick Man is part of the ancestral population to modern Native Americans. This should put to bed most of the political debate, since the results are likely to mollify many Native activists. But, there are still details to be fleshed out. A 2012 publication suggests that there was a secondary migration out of Eurasia, which resulted in the Na-Dene group which is common in the northern and western portions of North America. In contrast, Kennewick Man is likely to belong to the first ur-North Americans, who arrived as a relatively small population from Beringia ~15,000 years ago. This is the overwhelming majority of indigenous ancestry, and south of the Rio Grande basically the totality.*

Due for an update!

The context here is important. One insight of modern ancient DNA is that there has been a great deal of population turnover over the past ~10,000 years, as well as admixture between disparate lineages. When Kennewick Man died ~9,000 years ago Europeans as we understand them did not exist genetically. All across Eurasia, Africa, and Oceania, the Holocene brought radical demographic turnover (with some exceptions such as the Andaman Islands and the deserts of southwest Africa). The New World was somewhat different, as I implied above. There were some demographic disruptions, but south of the Rio Grande, and across the eastern half of North America, the populations descend from a relatively homogeneous founder stock which arrived at the end of the Pleistocene. The fact that many remains seem “atypical” for the morphology of Native Americans is strong evidence of in situ evolution.**

Years ago a physical anthropologist told me that when you look at Amazonian natives they “looked” like Siberians. Yes, they had changed and adapted, but only somewhat. It illustrated to me the powerful constraint of limited genetic variation upon populations. Similarly, though there is variation in pigmentation among native populations in the New World, it is far less than you see in the Old World. Why? Perhaps it is a function of different selective pressures (or the lack thereof). Or, perhaps the variation wasn’t there for selection in the first place? The history of the Old World has jumbled all our easy narratives. The New World may actually be a godsend because of the simple elegance of its demographic history.

* From my Twitter exchanges with Pontus Skoglund I believe there is some population structure in the founding “First American” group, though not a great deal.

** Admixture is an issue, but that can be obviated by genetic testing, as well as looking at early modern remains.


A friend of mine is beginning grad school and has settled upon a lab. The core research within the laboratory is population genomics, and they now need to get up to speed in the area. Taking a class is certainly the start. You can read Haldane’s Sieve to keep up on the literature, which is a necessity if you are doing genomics work, as texts get out of date quickly. Additionally, Graham Coop, Joe Felsenstein and Kent Holsinger have excellent online notes. The upside to this is that they are free. The downside is sometimes you are away from a computer screen. Often a soft intro recommended by many is John Gillespie’s Population Genetics: A Concise Guide, which nicely has a Kindle edition. But if you are going to do graduate level work, I think it is best to just go whole hog. The Gillespie book is appropriate for a quick course or for the undergraduate level, but you really need something as a reference at some point. And for that nothing beats Daniel Hartl and Andrew Clark’s Principles of Population Genetics. There are other texts out there in this area. For example, I have Philip Hedrick’s Genetics of Populations, and Alan Templeton’s Population Genetics and Microevolutionary Theory. For various reasons I would still pick Hartl & Clark if I had to pick.

I also think it’s important to know quantitative genetics, and for that Trudy Mackay and Douglas Falconer’s Introduction to Quantitative Genetics is the best bet in the business that I know of. It’s an excellent complement to Principles of Population Genetics because it starts with pop gen foundations. Derek Roff’s Evolutionary Quantitative Genetics and Michael Lynch and Bruce Walsh’s Genetics and Analysis of Quantitative Traits are probably too specialized for the beginner, and frankly even many steeped in the field haven’t read those books.

There are plenty of other books out there which might suffice in some fashion. In my previous post I mentioned Elements of Evolutionary Genetics. The old John Maynard Smith classic Evolutionary Genetics is also excellent. But if you are working in genomics and want a book less focused on classical methods and geared toward contemporary best practices, then Rasmus Nielsen and Monty Slatkin’s An Introduction to Population Genetics: Theory and Applications is pretty good. It’s a short book, and because it’s in its first edition there are many errors in it. From what I recall it was developed out of notes from a course taught at Berkeley, and it outlines the sort of methods you see in the papers which are being published today, utilizing coalescent theory and site frequency spectra. It might be a reasonable quickstart, though I’m not sure it is developed well enough to be a reference (for what it’s worth, I have a copy of it too, and it is being used in graduate level courses here at UC Davis).

• Category: Science • Tags: Population Genetics, Population genomics 

In the early 1970s the eminent evolutionary geneticist Richard C. Lewontin wrote that population genetics “was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining.” By this, Lewontin meant that in the 1930s when R. A. Fisher, Sewall Wright and J. B. S. Haldane established the theoretical foundations of the field, the techniques to discover the variation in populations to test their suppositions were rather thin (naturally, this resulted in many controversies, see The Origins of Theoretical Population Genetics). Geneticists were using classical methods, utilizing salient phenotypes which were proxies for underlying genetic markers, and tracing patterns of co-inheritance of traits with known locations in the genetic map with novel mutants. Researchers were not even clear at that point as to the underlying biochemical structure of the particle of Mendelian inheritance, what we term DNA. That arrived onto the scene in the 1950s. But in the early 1970s when the above was written we’re not talking about DNA sequencing. Rather, this is the allozyme era, which Lewontin helped usher in with a paper in 1966. He expresses the excitement of the times later in the passage:

Quite suddenly the situation has changed. The mother-lode has been tapped and facts in profusion have been poured into the hoppers of this theory machine. And from the other end has issued–nothing. It is not that the machine does not work, for a great clashing of gears is clearly audible, if not deafening, but it somehow cannot transform into a finished product the great volume of raw material that has been provided.

Despite the pessimism expressed above the emergence of molecular evolution stimulated the debates around neutral theory. Over a generation ago evolutionary geneticists were grappling with the swell of data which was confronting theoretical frameworks constructed in the early 20th century. Today we live in the “post-genomic” era, and now think in terms of whole genomes. The details may differ, but many of Lewontin’s observations in the 1970s still hold true, as novel results meet the paradigms of old. Last month in PNAS Brian Charlesworth published a paper which brought this to mind, Causes of natural variation in fitness: Evidence from studies of Drosophila populations. You may know Charlesworth as the coauthor of Elements of Evolutionary Genetics, an encyclopedia of a text which I highly recommend to all. In the paper, which is both a review for those of us not steeped in Drosophila genetics, and a distillation of derivations to be found in the supplements, Charlesworth notes that there is a contradiction between the typical selection coefficients inferred for deleterious alleles from population genomics and those from quantitative genetics. Population genomics is a new field, and involves sequencing many markers (often whole genomes) to good accuracy across a reasonable number of individuals. Quantitative genetics is a more classical framework utilizing statistical methods which interpret variation in traits within laboratory populations.

The fruit fly has a storied role in Mendelian genetics. To a great extent the study of the fruit fly is the early history of Mendelian genetics (see Lords of the Fly: Drosophila Genetics and the Experimental Life). Therefore it is natural that a large body of research exists in this area, and one can’t accept novel results obtained through new methods such as genomics at face value without some degree of skepticism. Charlesworth notes that the extremely small fitness effects of the mutations discovered via genomic methods are biased toward single nucleotide variants (SNVs); point mutations. In contrast it seems likely that the larger effect mutations implied by quantitative genetic studies, which are rather rare, and so missed in population genomic sample sizes, are due to transposable elements (TEs) interspersing themselves across the genome, and presumably disrupting function. In line with older theoretical models, most of the variation in fitness is due to a small number of mutations. Presumably as genomic methods get better (e.g., longer reads to catch repeat elements and larger sample sizes) they will converge upon the older established quantitative genetic methods. One other interesting result in this paper is that much of the variation is due to balancing selection. For theoretical reasons balancing selection can not be pervasive across the genome (too much fitness variation would result in huge death rates per generation), but, of the variation within the population much of it is maintained by balancing selection according to Charlesworth. Another interesting dynamic is that the population genomic methods seem to be better at capturing the distribution of fitness effects in humans, because of our smaller effective population size. You can read the paper for the technical reason why, but the key here is to remember that one has to be careful about extrapolating from model organisms. The models are imperfect, and we need to take care never to outrun our ability to generalize.

As genomics becomes pervasive in population genetics this sort of analysis will be more common. Rather than “genome-of-the-week” papers we’ll move to actually trying to grapple with what the sequence data is telling us specifically about the lineage in question, and, what we can generalize from the results about evolution writ large. Some organisms have a long history of scientific study, so population genomics will supplement and complement. In other cases though organisms do not have such a rich literature and scientific culture, and the pitfalls that are highlighted here might alert us to the deficiencies in genomic methods.

Citation: Charlesworth, Brian. “Causes of natural variation in fitness: Evidence from studies of Drosophila populations.” Proceedings of the National Academy of Sciences (2015): 201423275.

Citation: Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations

When it comes to human evolutionary genetics there are two broad areas of interest for me. On the one hand there are classic questions of functional biology and population genetics. Variation of traits and how that variation was selected for over time and space. Then there are the issues of demography, phylogeography, and phylogenetics. This is the domain under which “historical population genetics” tends to fall. Between 1995 and 2005 there was a significant period when the focus was on reconstructing phylogenetic trees inferred from uniparental maternal (mtDNA) and paternal (Y chromosomal) lineages. Using a coalescent framework these non-recombining regions generated intuitively appealing and computationally tractable trees, which illustrated relationships across history. These were often superimposed upon geographical maps to reconstruct patterns of the past. Since 2005 the emergence of dense SNP chips, where individuals could be typed on hundreds of thousands of markers, ushered in a new era and uniparental studies faded somewhat into the backdrop (and today we are moving into whole genome analyses). But sometimes the uniparental research is still useful, in particular since there is already a huge databank of samples and studies which one can leverage. A new paper in The European Journal of Human Genetics does just that, Y-chromosome descent clusters and male differential reproductive success: young lineage expansions dominate Asian pastoral nomadic populations. The figure at the top of this post is a summary of the primary results, which show how extremely common Y chromosomal haplogroups in their data set can be correlated with particular historical events. The authors used a data set of over 5,000 males across a huge range of Eurasian populations. Surveying the genetic variation it is clear that the haplogroup counts exhibited an exponential distribution. Many of the genotypes were found in only a few individuals, but a few were found in many individuals.

The authors refer to these lineages as “Descent Clusters” (DCs) rather than haplogroups. You can see what the DCs are in the table at the top. DC2 is the familiar haplogroup R1a1a, of which I am a member. DC1 is the “Genghis Khan” haplogroup. Because they’re using fast mutating microsatellites the coalescence estimates have wide intervals. But I am nearly 100% sure that R1a1a coalesces to a period more recent than 10,000 years ago. The reason is that I saw some posters using whole genome sequences from the Y chromosome at ASHG. These should be more precise estimates because of the enormous marker set of more slowly mutating SNPs, and they too arrived at a relatively recent period for the last common ancestor of these common male lineages. In fact, if I recall correctly the divergence between R1b and R1a dates to ~10,000 years before the present in these studies, so R1a must have a much more recent coalescence. The TMRCA for the R1a1a expansion is suspiciously close to the dates in the most recent paper to come out of the Reich lab on the emergence of South Asians from an admixture between an indigenous group and West Eurasians, Genetic Evidence for Recent Population Mixture in India. But even in this paper there is evidence of distinct inputs of Y chromosomes from the west into South Asia, so I suspect it too supports the proposition that the admixture between West Eurasian and indigenous groups involved separate and diverse West Eurasian groups, and not just one group (i.e., the Indo-Aryans may have been the last of several West Eurasian groups who arrived in rapid succession over the period between 3000 and 1000 BC). These results also seem to support the conjecture that the ancestors of “Austro-Asiatics” ranged far and wide.


R1a1a resplendent

In the ultimate evaluation I am less interested in the specific stories than in the general one. Is this pattern of “super-male” lineages new? The “Altaic” DCs clearly are associated with the Turks and Mongols, and emerged in the light of history. R1a1a and its cousins are older, and live in the shadowy zone of archaeology on the precipice of history. But is this pattern primal to our lineage? My own conjecture is that on the whole this pattern was prefigured in the ancient past whenever founder events occurred. For example, in the expansion into Oceania and the New World. But what is different about the world after the Neolithic is that periodically the tree of patrilineages was “pruned”, as one branch would rise to rule them all for a moment. There would be an elimination of numerous ancient lineages as a new shining star would dominate the firmament. But the echoes of that moment reverberate down the millennia, as one can see in the haplogroups which are prevalent across vast swaths of Eurasia, and at a frequency far out of proportion to the norm. Like a thunderbolt, demographic revolutions explode onto the human cultural landscape, and reshape the future topology of lineages on a regular basis.

Nice review, Advances and limits of using population genetics to understand local adaptation. In particular the focus here is on the insights one can derive from new genomic methods (e.g., think of SFS analysis). But they end on a cautious, perhaps even down, note:

Many of the important questions in local adaptation being pursued with population genetics approaches begin – rather than end – with identifying loci responsible for variation. It is therefore important to realize that a full accounting of local adaptation at the molecular level goes beyond having high-quality data to analyze and statistical methods to identify causative genes. The crux of the challenge is that most ecologically important traits responsible for local adaptation are quantitative, and identifying all of the genes responsible for variation in quantitative traits is likely not possible. Even the cumulative explanatory power of individual loci identified in human genotype–phenotype association studies, which often involve tens of thousands of individuals, is generally only a small percentage of the phenotypic variation….

But I want to note that they cite a paper from 2012, and the work in capturing the fraction of phenotypic variation from genomic variation for height has gotten much better since then. So perhaps that is a reason for optimism. Though, as they note, humans are a best case scenario since the sample sizes are enormous. Just a nice reminder of the limitations of the ‘post-genomic’ era.

• Category: Science • Tags: Population Genetics 
Carrion Crow

Haeckel’s “tree of life”

Being the way we are we humans attempt to comprehend the world in a manner which is intuitively graspable. Obviously some ideas are derived from environmental inputs. If you learn a little math and start talking about a multi-dimensional universe beyond the three spatial ones which we can grasp, then obviously you’re seeing the power of higher order abstraction detached from lived experience. But science is usually not so rarefied in relation to our lived reality. Our intuitions about the world often interface with our broader theories, many of which clearly shape scientific models, even if in the end these models extend far beyond the limits of our Gestalt cognition. How we grasp the whole of the universe has an effect on how we break nature apart at its joints.

The evolutionary ideas which were ascendant in the Victorian age, crowned by Charles Darwin’s theory of the origin of species via natural selection, illustrate both of these realities. On the one hand evolutionary ideas are as old as the Greeks, and likely older in that the Ionians made formal and abstract many folk theories which were likely floating about in the world of antiquity. But there were those then, and now, who had difficulty comprehending the evolutionary nature of speciation, and the morphological change which results in phyletic gradualism (e.g., for Creationists “macroevolution” is always the problem). The likely psychological root of skepticism of speciation is that humans seem to have innate ideas as to the nature of kinds and categories. Plato’s speculations about eternal forms leverage deep intuitions that we have about the world around us which can be discerned even in infants that there are essences, an order and plan. What evolutionary biologists term “population thinking” is not natural, and continuity is often rendered in a discrete fashion when it comes to everyday terminology. A concept such as species has the dual benefit of both being intuitive and aligning with our natural prejudices about the world, and also being useful in the everyday practice of science. But the fact is species are not a real phenomenon, such as the acceleration of a ball in space, but a useful shorthand which brackets a range of concepts.

My attitude toward the term “species” is strongly informed by the instrumental views which are interleaved throughout H. Allen Orr and Jerry Coyne’s book from the mid-aughts, Speciation. That is not to say that the book is perfect, at least from the perspective of some plant biologists. But that’s why I emphasize an instrumental view of species: what might be a useful classification for a plant biologist may not be a useful one for a zoologist, let alone a bacterial geneticist. Species as a concept only exists to delineate and clarify our thinking, unless you have a religious model which presupposes ideal kinds brought about by the hand of a designer. Scientific taxonomy is only a rough and approximate mapping of the reality of natural history and evolutionary genetics, which it purports to collapse informatively. And with all the problems with the species concept, recall that it is the “most real” of the taxonomic categories which we use (e.g., the biological species concept is moderately coherent).

Naturally this does not mean that there are no differences between the populations we term species, simply that we shouldn’t lose sight of the fact that the way we describe nature is often shorthand which obscures as well as illuminates. The debate about species concepts can be informative and interesting, but it has its limits. I do not hold to the position that there is “one definition to rule them all.” Which brings me to a new paper in Science on crows, The genomic landscape underlying phenotypic integrity in the face of gene flow in crows:

The importance, extent, and mode of interspecific gene flow for the evolution of species has long been debated. Characterization of genomic differentiation in a classic example of hybridization between all-black carrion crows and gray-coated hooded crows identified genome-wide introgression extending far beyond the morphological hybrid zone. Gene expression divergence was concentrated in pigmentation genes expressed in gray versus black feather follicles. Only a small number of narrow genomic islands exhibited resistance to gene flow. One prominent genomic region (<2 megabases) harbored 81 of all 82 fixed differences (of 8.4 million single-nucleotide polymorphisms in total) linking genes involved in pigmentation and in visual perception—a genomic signal reflecting color-mediated prezygotic isolation. Thus, localized genomic selection can cause marked heterogeneity in introgression landscapes while maintaining phenotypic divergence.

Citation: Poelstra, J. W., et al. “The genomic landscape underlying phenotypic integrity in the face of gene flow in crows.” Science 344.6190 (2014): 1410-1414.

You may wonder how a paper on the population genomics of crows relates to the broader philosophical issues I was alluding to earlier. Simple: as science advances it sheds light on the true and fine-grained shape of the world around us, rather than our coarse preconceptions. We look through the glass darkly to infer our innate ideas. Modern taxonomy has its origins in Carl Linnaeus’ system, and the status of carrion vs. hooded crow in terms of whether they are species or subspecies has a history which goes back at least to this period. This paper in Science seems to have “solved” the issue in substance, if not style. By substance I mean that the authors have extracted enough genetic information that all the blank spots in our discussion are filled in to my satisfaction. On the whole genome level one can’t differentiate the two crow species/subspecies as clear and distinct entities. German carrion crows are genetically closer to Polish hooded crows in terms of total genome content. But when it comes to a few specific regions of the genome which affect diagnostic physical characteristics, namely plumage pigmentation, as well as variation in behaviour, the two groups are in fact quite distinct. To obtain these sorts of results the science had to be top notch. Or at least 2014, not 1814. They sequenced a male hooded crow to greater than 100x coverage to generate a reference sequence, which is very high. Then they sequenced 60 carrion and hooded crows to greater than 10x coverage, which is reasonable for population genomic work, especially if you can align it to the reference.

Citation: Poelstra, J. W., et al. “The genomic landscape underlying phenotypic integrity in the face of gene flow in crows.” Science 344.6190 (2014): 1410-1414.

The basic major result is illustrated in the figure to the right. What you see is that overall the genetic divergence between German carrion crows and Spanish carrion crows, the latter being the putative source population, is rather large comparatively (Spanish vs. German vs. Swedish vs. Polish). In contrast there is minimal genetic divergence between German carrion crows and Polish hooded crows, as one might predict from geography. But there are exceptional regions of the genome, as is clear when you look at the emphasized spikes in Fst. In other words, continuous gene flow has homogenized between-population differences, as you’d expect from basic theory (between two demes, on the order of one migrant per generation, Nm >= 1, is sufficient to prevent substantial divergence), but selection pressures on very salient traits have resulted in a sharper distinction along a few genomic regions. The interesting point here, though, is that this isn’t due to any ecological distinction. For example, when it comes to pigmentation some human populations (e.g., Africans and Melanesians) resemble each other despite huge whole genome differences (Melanesians are just another branch of “Out of Africa” humanity). But one can posit a clear ecological rationale for why this might be. Not so for carrion and hooded crows. Intuitively it seems obvious that Germany shares more ecologically with Poland than it does with Spain. So what’s going on? The authors provide a likely answer: “A key feature that distinguishes the crow system is the apparent lack of ecological selection on the maintenance of separate phenotypes. Instead, the data presented here are consistent with the idea that assortative mating and sexual selection can exclusively cause phenotypic and genotypic differentiation.” Instead of a speciation gene, these may be “speciation genomic regions” (yes, it has less of a ring to it, I admit).
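
The “basic theory” invoked above can be made concrete with Wright’s island model approximation, Fst ≈ 1/(4Nm + 1), where Nm is the number of migrants exchanged per generation. What follows is only a rough sketch: it assumes an idealized island model at migration-drift equilibrium for neutral autosomal loci, which is not the model fitted in the crow paper, and the function name is my own invention.

```python
def equilibrium_fst(migrants_per_generation: float) -> float:
    """Expected neutral Fst under Wright's island model: 1 / (4*N*m + 1)."""
    if migrants_per_generation < 0:
        raise ValueError("number of migrants per generation must be non-negative")
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


if __name__ == "__main__":
    # Illustrative values of Nm only; none are estimated from the crow data.
    for nm in (0.1, 1, 10, 100):
        print(f"Nm = {nm:>5}: expected Fst ~ {equilibrium_fst(nm):.3f}")
```

With one migrant per generation the expected value is 0.2, and it falls off quickly as gene flow increases. That is the sense in which modest gene flow keeps the genome-wide background homogeneous, while a handful of strongly selected regions can still stand out as spikes.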

So where does this leave us in terms of species concept? Well, your mileage may vary. In the accompanying commentary by Peter de Knijff there is some bashing the bar code of life idea of systematically identifying species differences using DNA. I don’t think there’s a problem with the bar code of life as long as one understands that one shouldn’t confuse the measure with what one is measuring. The concept species is not like the speed of light, it is freighted with assumptions, and means different things to different people. If one understands that ahead of time then a consistent language or measuring stick can still be highly useful, if not ultimately informative in a deep ontological sense (i.e., atoms/quarks are fundamental to material objects in a way that species are not in regards to variation among living organisms).

This specific result is also not entirely surprising, though it is nice to see it worked out in a specific case. The connection between physical appearance and species distinctions is an old and intuitive one, despite the importance of genealogical concepts when it comes to our intuitive essentialism. And this applies to lower taxonomic levels as well: as far back as Charles Darwin, sexual selection was posited as the reason for racial differences in appearance for humans (Jared Diamond promoted this view in The Third Chimpanzee). Back in 2003 Henry Harpending brought to my mind the idea that human differences in phenotypes can persist across populations despite overall genomic similarities. To me this reinforces that genomics has come not to bring peace to old truths, but a sword of empirical reality to old preconceptions. Rather than dithering as to the “best” term to describe genetic variation and evolutionary process, we can actually go about describing it in close to its entirety, and let the chips fall where they may. Compute and quantify. The rest is commentary.

• Category: Science • Tags: Genomics, Population Genetics, Speciation 

Citation: Wilde et al.

Credit: Igor Kruglenko

A new paper in PNAS, Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 years, uses ancient DNA to examine the possibility of very recent natural selection in Europeans. In particular, it focuses on eastern Europeans, and roughly a region coterminous with Ukraine ~6000 to ~4000 years ago. The sample seems somewhat biased toward the low end of the age range if you look at the supplemental tables. In the paper itself (which is open access) I don’t see a map to get a sense of the distribution of the sites from which the DNA was extracted. So I took the supplemental table and used the latitude and longitude information, as well as the samples from each site, and produced a density map with a bubble plot overlain upon it with specific locations (size of bubble proportional to number of samples at site). As with the earlier ancient DNA from a few European hunter-gatherers, one must keep in mind the limitations of sampling so few to infer about so many. Though the number here is far larger (N >20 or >40 depending on the SNP), the set of markers examined was much smaller, a few pigmentation loci and mtDNA. Nevertheless this is not a trivial geographic example, nor is the time frame, from the Early Eneolithic down to the Bronze Age.

Figure S1

The clearest illustration of the topline result is found in the supplements (I prefer figures to tables obviously). What you see here is that there is a large difference in allele frequencies between ancient samples and modern ones from the equivalent geographic region at specific markers diagnostic for variation in pigmentation in modern Europeans. HERC2 is well known for being one of the two loci which span a long haplotype strongly correlated with blue eyes in Europeans. SLC45A2 and TYR are part of the standard suite of pigmentation genes which show up as variable across Eurasia. I am confused as to why they did not focus on SLC24A5, a locus which is nearly totally fixed in modern Europeans for the A allele, but may not have been so in hunter-gatherers. But in any case the result is rather clear: the ancient populations sampled here are statistically differentiated from modern populations in the same region, and seem to have been more darkly complected in comparison. The natural inference then is that powerful sweeps of natural selection increased the frequencies of lightening alleles in Europeans within the last ~4,000-6,000 years. This is not a crazy proposition; tests for recent natural selection in Europeans are often enriched around pigmentation loci, which are genomically atypical (long homogeneous blocks are common). What this study does is intersect inferences from modern variation with the distribution of variants in an ancient population presumed to be ancestral.
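
To get a feel for what a sweep on this time scale implies, one can back out a rough selection coefficient from a change in allele frequency. The sketch below assumes a deterministic haploid (genic) selection model with no drift, a generation time of 25 years, and purely hypothetical start and end frequencies. None of these numbers are taken from the paper, which relies on its own simulation framework, and the function names are mine.

```python
def selection_coefficient(p_start: float, p_end: float, generations: float) -> float:
    """Per-generation s under haploid selection, where the odds p/(1-p)
    are multiplied by (1 + s) every generation."""
    if not (0.0 < p_start < 1.0 and 0.0 < p_end < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    if generations <= 0:
        raise ValueError("generations must be positive")
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return (odds_end / odds_start) ** (1.0 / generations) - 1.0


def frequency_after(p_start: float, s: float, generations: float) -> float:
    """Forward prediction under the same model (inverse of the function above)."""
    odds = p_start / (1.0 - p_start) * (1.0 + s) ** generations
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical example: an allele rising from 20% to 70% in 5,000 years.
    gens = 5000 / 25  # assumed 25-year generation time
    s = selection_coefficient(0.20, 0.70, gens)
    print(f"implied s per generation: {s:.4f}")
```

Under these made-up inputs the implied coefficient is on the order of one percent per generation. A serious estimate would have to account for drift, dominance, and the sampling uncertainty in the ancient frequencies.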

The problem of course is whether these are truly ancestral. But recall I stated earlier that they had mtDNA. This is copious, and so rather easy (comparatively!) to get from ancient DNA. Comparing their samples with modern ones from the region they find there isn’t great discontinuity. Using a model of genetic drift they support the scenario of continuity, and find that the Fst of ~0.005 is what you would expect for a set of populations ~4,000-6,000 years in the past. To put this in perspective this is about the Fst using autosomal SNPs between Russians and French, or Palestinians and Greeks. Considering the time depth separating these putative populations I think even without their coalescent simulation models I can accept continuity of mtDNA intuitively. Of course the key is to not forget this is mtDNA, only the maternal lineage. If you looked at the mtDNA of modern South Asians you’d see they’re mostly not West Eurasian. But if you looked at their Y chromosomes they’d be mostly West Eurasian. The autosomal DNA gives a half & half picture. The issue of sex mediated gene flow is made even more stark in the case of Latin America.
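
The expectation that drift alone yields a small Fst over this time span can be sketched with the textbook result for a haploid, uniparentally inherited locus: after t generations in a population of constant effective female size N_f, the expected differentiation from the ancestral population is roughly 1 - (1 - 1/N_f)^t. The code below illustrates this under strong assumptions (constant size, no migration, no mutation, 25-year generations). The effective sizes are hypothetical, and this is not the coalescent simulation used in the paper.

```python
def drift_fst_haploid(effective_females: float, generations: float) -> float:
    """Expected Fst from pure drift at a haploid locus such as mtDNA."""
    if effective_females < 1 or generations < 0:
        raise ValueError("need effective_females >= 1 and generations >= 0")
    return 1.0 - (1.0 - 1.0 / effective_females) ** generations


if __name__ == "__main__":
    gens = 5000 / 25  # ~5,000 years at an assumed 25 years per generation
    for n_f in (1_000, 10_000, 40_000, 100_000):
        print(f"N_f = {n_f:>7}: expected Fst ~ {drift_fst_haploid(n_f, gens):.4f}")
```

Under this simple model an Fst near 0.005 over roughly 200 generations corresponds to an effective female size in the tens of thousands, so a value of that magnitude is at least not obviously at odds with continuity.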

A model like this is made more plausible by the fact that many of these individuals were of the Yamna culture, Kurgans. The thesis forwarded by some scholars is that it is these Kurgans, a patriarchal nomadic society, who brought Indo-European languages to central and western Europe ~5,000 years ago (their eastern cousins becoming Tocharians and Indo-Iranians, their southern ones Hittites and perhaps Armenians). Probably the best recent outline of this thesis is by David Anthony in The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. I found it so engrossing that I finished it in one sitting in 2008. If these data are correct the Kurgans did not look like blonde Aryan Übermensch; rather, they became that (though to be fair, in this case we are talking about them becoming Slavs, whom the Nazis labelled Untermensch). But one of the general assumptions about Kurgans is that they were groups of mobile males. In that case one wouldn’t be surprised if their mtDNA tended to reflect subject peoples, while the whole genome was more mixed and cosmopolitan, reflecting their migrations.

So the crux then is whether to trust this mtDNA evidence as representative of the whole genome. If I simply had the mtDNA, along with the information about provenance in terms of time and place, I’d probably accept the argument for continuity. But the phenotypic markers are so different that either there’s been population replacement or we’ve had a lot of in situ selection. Replacement seems like the more boring hypothesis, especially in light of the fact that many of the sites sampled were not in classically Slav zones of habitation, but were occupied by Iranian or Uralic peoples, or more recently Turks. Though the researchers are using contemporary East Slavs to compare to the ancient samples, across many of these sites Slavs only became dominant in the area with the rollback of the Ottomans in the 18th century.

Ultimately I’m very unsure that the assumption of genetic continuity in this case will hold, so let’s simply take that as a given for now. Then what? You have lots of selection. The question naturally moves to why. What drove the selection? In the discussion the authors go over many of the hypotheses rather thoroughly. Roughly they fall into two classes, the ecological/environmental and the social/sexual. The former generally has to do with a combination of a switch to agriculture and the need to synthesize vitamin D due to the shift away from fish in the far north. The latter focuses on sexual selection, and favoring particular markers due to heightened paternity certainty. In particular the sexual selection hypothesis would seem to be able to explain the rise of HERC2, which is associated with light eyes, as that may be a favored trait. The immediate rejoinder is provided in the text: many of the pigmentation loci have pleiotropic effects. In other words, they tune overall pigmentation of skin, hair, and eyes, though perhaps to different extents. So if the selection was environmental and acted on skin, it would not be totally surprising if hair and eyes changed as a side effect. Of course, as suggested in the comments here, one need not posit that there was one singular selection event, as opposed to a sequential composite. Perhaps it was both environmental and sexual selection?

This again is another area where I’ll throw my hands up in the air. If selection is the answer, and not population replacement, then it’s very strong. It seems that these loci were subject to sweeps in the same range of power as that around LCT, for lactase persistence, the Tibetan high altitude adaptations, as well as the various malaria resistance alleles (which have different selective dynamics, some of them balancing). One can actually still detect differential fitness at high altitudes based on phenotype, and the same with malaria, at least before modern medicine. The problem I have is that I’m just not aware of studies on the extent of differential fitness in human populations due to sexual selection. In theory sexual selection is very powerful, especially in contexts of hyper-polygyny, but for it to be realized in humans would require very particular social structures. The environmental selection arguments by their nature tend to be simpler, and therefore more attractive. But we’ve reached a point where there’s a lot of confusing stuff coming out of ancient DNA, and we need to go back to first principles, and reexamine everything. This includes sexual selection, as more than simply a deus ex machina to throw out there when we don’t have a better model on hand. That necessitates a serious examination of patterns of variance in reproductive output by phenotype, and plugging these back into models of selective sweeps.

Citation: Direct evidence for positive selection of skin, hair, and eye pigmentation in Europeans during the last 5,000 years

Note: Yulia Tymoshenko has very dark eyes. So I assume she’s not a natural blonde.


There’s been a lot of talk on Twitter and the blogs about PLOS’ new data sharing policy. I don’t have much that is deep to say, except that I’m for it. I do think from what I can tell that there is a cultural element to the reaction, pro or con. People in genomics seem to be responding along the lines of “yes, of course.” On the other hand those in other fields have less positive reactions.

You can go elsewhere to hear “both sides.” I am confident that this will be the future, and the naysayers will have to deal. One of the major reasons that formalized data release is good is that in a field like genomics there is more data than people to analyze the data. By this, I mean that you can ask many different questions of data, but you may only be interested in a subset of those questions. Other people in your lab might have different questions, but ultimately you’re probably leaving avenues on the table because you don’t have the time or inclination. To give you a funny example, a few years ago I stumbled on the fact that Dan MacArthur probably has recent (>200 years) South Asian ancestry. As an academic genomicist Dan could have dug up this fact himself, but he has grants and papers to write, not to mention a non-scientific life. So it was left to me to stumble upon the fact. On the margin it’s not that useful to Dan, but it’s something. You never know what’s going to happen when you release data, because you can’t read the minds of others. And that sort of surprise is a good thing.

One of the greatest intellectual philanthropists in recent years has been Mait Metspalu. He has plenty of publications to his name, but he’s also generously released the data and assembled it in convenient form. This allows for easy reanalysis. A few days ago I noticed that he had put up a few more European populations, including understudied groups like Greeks. With the recent flare-up in Ukraine I thought I would process some of the new data. I pruned the data set down to 230,000 high quality SNPs, and focused on a large and a small data set, of 500 and 340 individuals respectively.


– As suggested by Dienekes modern Greeks seem to have been impacted more by northern gene flow (Slavs) than the inhabitants of Magna Graecia (Southern Italy and Sicily)

– There’s not much difference between Poles, Ukrainians, and Russians (though there are Russian samples from traditionally Finnic regions which are more diverse)

– Not much difference between Romanians, Bulgarians, and Hungarians

– The Northern European clusters separate reasonably well: Slavic, Finnic, and Germanic

I’ll leave it to readers to make further comments.

Tools used: Plink 1.9, ADMIXTURE and TreeMix.

Methods: First two plots are MDS representations of pairwise genetic differences between individuals. I used kerneling to lasso around the centroids of specific populations. The middle two are from TreeMix, and I asked for 5 migrations, rooting with outgroups, and allowed to reorder globally. Finally, the last is just ADMIXTURE. Ran at K = 6. You see the mean for each population.
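
The “pairwise genetic differences” that feed the MDS plots can be illustrated with a toy distance calculation. This sketch assumes genotypes coded as counts of one allele (0, 1, or 2) with None for missing calls, and it uses the mean absolute difference scaled to the range 0 to 1 as the distance. The sample names and genotypes are invented, and this is a stand-in for, not a reproduction of, what Plink computes internally.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Genotypes = List[Optional[int]]  # allele counts per SNP; None = missing call


def allele_sharing_distance(a: Genotypes, b: Genotypes) -> float:
    """Mean |g_a - g_b| / 2 over SNPs where both individuals are called."""
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same SNPs")
    diffs = [abs(x - y) / 2.0 for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNPs called in both individuals")
    return sum(diffs) / len(diffs)


def pairwise_distances(samples: Dict[str, Genotypes]) -> Dict[Tuple[str, str], float]:
    """Distance for every unordered pair of individuals."""
    return {
        (i, j): allele_sharing_distance(samples[i], samples[j])
        for i, j in combinations(sorted(samples), 2)
    }


if __name__ == "__main__":
    toy = {  # invented genotypes at five SNPs
        "ind_A": [0, 1, 2, 2, None],
        "ind_B": [0, 1, 2, 1, 0],
        "ind_C": [2, 1, 0, 0, 2],
    }
    for pair, dist in pairwise_distances(toy).items():
        print(pair, round(dist, 3))
```

A matrix of such distances is the kind of input a classical MDS routine takes; with a few hundred individuals and a couple of hundred thousand SNPs one would of course use the dedicated tools rather than pure Python.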

• Category: Science • Tags: Population Genetics 

Population genetics is a moderately technical field (at least at the shallower end of the pool; there are some subfields which veer into applied math), and I am finding it difficult to distill it all down in a very simple fashion to readers who are asking serious questions. To gain a full measure of many of the posts on this website it helps to understand the basics of population genetics. There’s really no short cut, just like you have to do some study if you want to talk about quantum mechanics in any serious fashion. If you are happy reading on a screen, then there are many free resources on the web. I would recommend Graham Coop’s population genetics notes, the classic ones hosted at UConn, and Joe Felsenstein’s Theoretical Evolutionary Genetics, for a start. But if you need an old-fashioned book that you can hold in your hands, there are a finite number of choices. To me the closest to a “gold standard” is probably the “Hartl & Clark” text, Principles of Population Genetics. As is the norm among most technical texts it is expensive, though worth it. But notice when you do a search that there are used copies of the earlier editions which are affordable. The updates in the newest edition due to genomic technologies and such are not necessarily worth an extra $80 in my opinion if you just want basic population genetics. If you can understand Principles of Population Genetics, then you can understand population genetics to a level sufficient to master all of my posts. And you will know more about the field than the vast majority of professional biologists.

If you want more ecologically relevant illustrations of population genetic questions, you might enjoy Philip Hedrick’s Genetics of Populations. I recall there were some issues relating to spatial and temporal variation in structured populations which this book handled in great depth. But beyond that there’s really not much difference in terms of substance between this and Hartl & Clark, from what I can recall. I generally find it a somewhat less elegant work stylistically. On the other hand if human examples are more to your taste, Alan Templeton has a textbook out, Population Genetics and Microevolutionary Theory. Like the Hedrick text it doesn’t pack as dense a punch in my opinion as Hartl & Clark. Also, this is the first edition of the textbook, and I can imagine that it will get better in future editions as Templeton gains a better sense of his audience.

The above are comprehensive surveys. Charlesworth & Charlesworth have written a text which is more like an encyclopedia, Elements of Evolutionary Genetics. This is not a compact work at all, and even I find it daunting. Sometimes it feels like this work is basically a “core dump,” but if you want to look up a specific issue in a textbook, then Elements of Evolutionary Genetics will probably cover it. At the other end of the spectrum in terms of comprehensiveness is the classic Gillespie book, Population Genetics: A Concise Guide. This is more an undergraduate level work, and hammers home the most elementary of population genetic principles and fundamentals. It isn’t going to bring you up to speed on how genomics has transformed the whole field over the past 10 years, though if you are new to the discipline then that’s probably not the priority in any case.

In contrast, Rasmus Nielsen and Monty Slatkin’s new textbook, An Introduction to Population Genetics: Theory and Applications, is up-to-date on the latest genomics and computational methods. Because of the authors’ research focus the illustrations also are biased toward humans. This is definitely going to show you how “population genetics is done” in 2014. The focus on the site frequency spectrum makes sense only in light of genomic data. It is a slim text, and the main downside is that it’s a first edition which seems to suffer from light editing. There are many typos and other such errors, which presumably will be cleaned up in future editions (or else there won’t be future editions!).

There are other books out there, such as Matthew Hamilton’s Population Genetics, which I can’t comment on because I don’t own them. Also, Falconer & Mackay’s Introduction to Quantitative Genetics is a classic which is complementary to all the works above (it begins with population genetic fundamentals). In no way am I saying you have to buy all these books, or any of these books. The key is that you actually learn a little population genetics, and phylogenetics while you’re at it, if you want to comment intelligently on some of the technical nuances which come up on this blog.

• Category: Science • Tags: Population Genetics 
Ancient DNA figure

Citation: Ancient human genomes suggest three ancestral populations for present-day Europeans

Purple text included by author of this post

At some point you have no doubt encountered trees of the sort you see to the left. They are incredibly useful visualizations of historical relationships between lineages, that is, breeding populations. The metaphor of the tree of life was co-opted almost immediately by evolutionary science in the 19th century. On the order of tens of millions to billions of years the idea of diverging and bifurcating lineages is accurate to a great extent in terms of depicting the dynamics of natural history. But even on this scale the tree masks facts which are not of trivial importance. Horizontal gene transfer means that even very sharply delineated branches of the tree of life may share commonalities across wide regions of the genome. The smaller the value which defines the last common ancestors of two putative lineages, the muddier the image reflected through the lens of the tree becomes. And yet the tree visual metaphor persists when comparing populations which are rather close genetically in an evolutionary sense because of its plain utility. Trees are thick in L. L. Cavalli-Sforza’s History and Geography of Human Genes, which paints the broad and rich landscape of the populations of our own species, which have diverged only over the past few tens of thousands of years.

This is not to ignore the self-evident fact that tips of the branches can eventually converge. Geneticists have long acknowledged, and leveraged, recent admixture between populations long separated by time and space. No one denies that African Americans coalesced out of the relations of black slaves and white settlers. Or that the population genetic landscape of Latin America cannot be understood without taking into account the varied quanta of African, European, and Amerindian ancestry which define particular locales. The reality of admixture in these cases was attested to historically, is visible in a straightforward phenotypic sense, and can be detected using a small number of classical markers.

What has changed over the past 10 years, and in particular the past 5 years, has been the analytic fruit born of high density marker sets. By this, I mean rather than the hundreds of markers which L. L. Cavalli-Sforza and colleagues had access to, modern statistical geneticists can extract patterns out of hundreds of thousands of markers, and often whole genomes. This allows researchers to detect more subtle or distant events which have been erased slowly by the effects of time. To my mind the seminal paper which heralded a paradigm shift was 2009’s Reconstructing Indian History. In this publication the authors concluded that South Asians, ~20% of the world’s population, are themselves a synthetic population, derived from two primary ancestral groups. One group, “Ancestral North Indians” (ANI), has close affinities with West Eurasians (Europeans, Middle Easterners, etc.). Another group, “Ancestral South Indians” (ASI), has distant affinities with East Eurasians. In fact nearly all Indian subcontinental populations (there are exceptions) can be modeled as a two-way admixture, with various proportions of these two ancestral populations (also see Genetic evidence for recent population mixture in India for an update). The big take home was that the admixture had been thorough and deep enough so that standard clustering techniques (e.g., PCA) could not allow one to infer that South Asians were a synthetic group, ~2,000 to ~4,000 years post-dating an amalgamation event. One major stumbling block was that no close proxy existed for ASI, which was totally absorbed into what became South Asians. But the authors made use of the fact that Andaman Islanders were sufficient substitutes for the purposes of inferring the dynamics of the admixture (they diverged ~20,000 to ~30,000 years before the present from ASI). Using these methods the same group also came to similar conclusions about Amerindians in Reconstructing Native American population history.
Another research group concluded the same for populations in the Horn of Africa, Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool. And even stranger results can be found deeper in Africa, Ancient west Eurasian ancestry in southern and eastern Africa.

More recently there has been the finding that an ancient Siberian boy seems to be representative of a population related to West Eurasians which contributed a substantial proportion of the ancestry of the first settlers of the New World. These results were prefigured by intriguing hints in the genome-wide studies as well as uniparental lineages. The power in this case is that internal nodes in the tree of life which were once only inferred from descendants, can now be examined directly with ancient DNA. There are limitations to time and locale. DNA degrades exponentially, and even in the best of cases it seems that the edge of preservation will be on the order of 100,000 to 1 million years. Additionally, cold and dry climates are naturally going to be highly enriched for samples, because tropical wet climates are amenable to rapid degradation of biomolecules of any sort.

To me a major implication is that over the next ten years the natural history of Pleistocene metazoans of some size and numbers across the Palearctic shall be illuminated to a much greater degree than we could have imagined. First in line will be humans and dogs, and later this will expand to assorted other lineages, such as bison and elk. And it is the human part of the jigsaw which is at the heart of a recent preprint posted on bioRxiv, Ancient human genomes suggest three ancestral populations for present-day Europeans. Since it is a preprint I won’t repeat much that you can read for yourself. I want to emphasize though that you really should read the supplements if you want more than spare conclusions. As the title states the authors conclude that overall you require at minimum three ancestral populations over a post “Out-of-Africa” time scale to model the dynamics of the emergence of Europeans. Though there were hints of this utilizing results from extant populations, the presence of ancient DNA truly pushed the ability to draw conclusions over the edge. That is because it seems that few of these ancient populations exist in “pure” form. One of the major shortcomings of drawing conclusions from distributions of populations in the present about the past is that interactions and admixtures were far more thoroughgoing than researchers had imagined.

The figure (modified) at the top of this post lays out the findings. In the preprint the authors arrive at the simplest model which can explain the most data. They acknowledge freely that there are likely modifications and elaborations on the edges and margins, and that the data might be explained by more complex models, but the key outcome is that they have rejected more parsimonious models which were once ascendant in regards to the ethnogenesis of Europeans. Ten years ago (see Seven Daughters of Eve) some researchers were presenting a cartoon model of hunter vs. farmer, as if these were two distinct options for the origins of all Europeans. But it turns out even the more nuanced and realistic models which posit varied degrees of genetic and cultural assimilation and interaction were false. What seems clear from these data, and the ancient DNA, is that a substantial minority fraction of the ancestry of Europeans derives from a third population of northern Eurasian provenance.

Going back to the lack of parsimony, to the left you see a model of diversification outside of Africa that many had in mind until recently. In this framework a small population of northeast Africans left that continent 50 to 100 thousand years ago, and populated the rest of the world. One group moved east, and gave rise to the populations of eastern Eurasia, as well as Australasia and the New World. Another branch moved north and west, and gave rise to Europeans and Middle Easterners. The rest of population history might be modeled then as admixtures and rearrangements of this original diversification. In this scenario South Asians are an admixture of West Eurasians and an extinct branch of East Eurasians, explaining their affinities to both great branches of humanity. The divergent nature of Australians might simply be an artifact of their long term isolation in Oceania, rather early on in the diversification of East Eurasians. This model was already difficult to square with genetic data, but it could be shoehorned. Or at least I thought it could, because I did so myself.

The simplest form of the new model complexifies the topology considerably. Now there is an early branch off of a Eurasian population prior to the diversification of West and East Eurasians, and within the western clade there is a separation between a North Eurasian group, and West Eurasians proper. Putting the focus on Europeans, they may be thought of as a complex admixture between Basal Eurasians, West Eurasians, and North Eurasians. The Basal Eurasian component is mediated by “Early European Farmers,” EEF, who seem to be a hybrid between this group and West Eurasian hunter-gatherers. The North Eurasian component seems to be both ancient and recent. Ancient because some Swedish hunter-gatherers had it (though the Central European one lacked it), and recent because the EEF populations which evidence ancient Near Eastern ancestry lacked it, suggesting that it was not as widely present across western Eurasia as it is now. In fact, it is present in high fractions across many Middle Eastern populations, especially in the Caucasus. Though the authors studiously avoid speculating, it is clearly intriguing to them that the North Eurasian component is so widespread, and that it likely expanded relatively recently. As with the Denisovans, the presence of North Eurasian DNA from the far north may simply be a function of biased preservation.

How the authors inferred the existence of Basal Eurasians is rather convoluted, and outlined in the supplements. In many ways this is the only simple model which fulfills all the conditions of their data. The key finding is that the European hunter-gatherers, both Central and Northern, were equally closer genetically to all East Eurasians than EEF were. This sort of symmetrical relatedness implies that it is not admixture, but reflects an ancient, but more recent than the outgroup, bifurcation in the phylogenetic tree. The EEF distance from East Eurasians is a function of the earlier divergence of their Basal Eurasian ancestry. The nature of the Basal Eurasians is left somewhat opaque. One can posit many scenarios of ancient population structure in the Near East, or migrations back and forth between this region and Africa. More data, and especially ancient DNA from the Near East, would clarify the model (unfortunately modern Near Eastern populations are highly admixed).

Citation:  Mallick et al.


Though I have focused on phylogenetics, the authors had enough marker density to draw some functional conclusions. In particular they found that the Central European hunter-gatherer had some of the distinctive pigmentation mutations common to Europeans (and to a lesser extent other West Eurasians), such as at the OCA2-HERC2 ‘blue eye’ locus, as well as SLC45A2. But what was shocking to me is that the hunter-gatherer was homozygous for the ancestral state at SLC24A5. To most of you that might not mean anything, but SLC24A5 is almost always homozygous in the derived state in modern Europeans. The HapMap data set has 329 alleles at this SNP for Europeans, whites of Northwest European heritage and Tuscans from Italy. There is only one copy of the ancestral allele in the whole data set. Assuming that the result is not a genotyping error of some sort, a homozygote at this locus implies to me that the evidence for a strong selective event in this region (it has a long haplotype) within the last ~10,000 years is correct. The widespread distribution outside of Europe of the derived variant of SLC24A5 means we may not be looking at an originally ‘European’ allele, even if it is fixed in Europeans today. No doubt there will be much more in terms of our understanding of functional and population genetics through the window which ancient DNA allows us to view the past.

There are so many details, and so little time. Because it is a preprint you really should read the whole thing (several times). You are part of the revision process in some sense. But I think the general finding that the past is much more complex than we’d imagined will stand the test of time. On some level everyone understood that the trees illustrating genetic relationships on species which exhibit evidence of extensive gene flow were stylized representations which elided a great deal. But in the case of humans thanks to ancient DNA we see just how much that representation masked. Admixture events were collapsed back into the tree to such an extent that it may have been grossly simplified, and our understanding of past demographic events was sorely lacking in realism. We know this about humans across Northern Eurasia because they’ve been extensively studied, and we have ancient DNA. Unfortunately due to climate we may never have ancient DNA from the tropics, or from many organisms due to the constraints of preservation (e.g., fish?). But I think that we need to update our null hypotheses. This may mean we give up some cherished models which explain things in a neat fashion, but obscure complexity which is truer to reality is preferable to elegant models which lead us to falsity. Perhaps we should finally end our love affair with the beautiful tree, and admit the virtues of a rambling graph.

Citation: Ancient human genomes suggest three ancestral populations for present-day Europeans.

Addendum: I’ve seen references in internet discussions to affinities in Admixture plots of MA1 (the Siberian boy). Please remember that because we only have one MA1 individual that individual will be forced to be a combination of populations generated from the groups where we have many individuals. So some of the strange and intriguing results are just nonsense, as the algorithm is trying to find the best fit to confusing conditions.

• Category: Science • Tags: Ancient DNA, Population Genetics 

Jack Kerouac, credit: Tom Palumbo

The Pith: Higher Mendelian disease rates among French Canadians may be due to their demographic history.

As I have noted before, demographic bottlenecks with extremely strong effects on the character of population genetic variation need to be very radical in their nature to be of any significance. The population pinhole has to be on the order of hundreds, rather than thousands, of individuals. But that does not preclude more modest bottlenecks generating subtle shifts in the genetic site frequency spectrum. Strong bottlenecks may be needed to drive wholesale extinction of once common alleles (or the fixation of those at moderate frequencies), but mild bottlenecks may nevertheless perturb the allele frequency distribution. In particular, the number of alleles which are present at very low frequencies can be strongly impacted by demographic variation and natural selection. This is the logical rationale which serves as the basis for nucleotide sequence based tests for detecting natural selection, such as Tajima’s D. An excess of low frequency variants suggests a bottleneck and subsequent population expansion, or positive and/or purifying selection. In contrast, balanced polymorphism frequencies point to a shrinking population or balancing selection.
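For readers who want to see how a test like Tajima’s D turns a site frequency spectrum into a single number, here is a minimal sketch in Python. The function name, the dictionary input format, and the two toy spectra are my own assumptions for illustration; they are not taken from any paper discussed here. The statistic compares the mean number of pairwise differences with Watterson’s estimator, which depends only on the count of segregating sites.

```python
import math


def tajimas_d(sfs, n):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs maps a derived-allele count i (1 <= i <= n - 1) to the number of
    segregating sites at which the derived allele appears i times.
    n is the number of sampled chromosomes.
    """
    seg_sites = sum(sfs.values())
    if seg_sites == 0:
        raise ValueError("need at least one segregating site")

    # Mean number of pairwise differences (pi).
    pairs = n * (n - 1) / 2
    pi = sum(count * i * (n - i) for i, count in sfs.items()) / pairs

    # Watterson's estimator uses only the number of segregating sites.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Constants for the variance of (pi - theta_w), following Tajima (1989).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


# Hypothetical spectra: 10 sampled chromosomes, 20 segregating sites each.
mostly_rare = {1: 20}      # every variant is a singleton
mostly_balanced = {5: 20}  # every variant sits at 50% frequency

print(tajimas_d(mostly_rare, 10))      # negative: excess of rare variants
print(tajimas_d(mostly_balanced, 10))  # positive: excess of intermediate variants
```

The sign is what matters for the intuition: a spectrum skewed toward rare variants gives a negative value, while one dominated by intermediate frequencies gives a positive value.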

Citation: Casals F, Hodgkinson A, Hussin J, Idaghdour Y, Bruat V, et al. (2013) Whole-Exome Sequencing Reveals a Rapid Change in the Frequency of Rare Functional Variants in a Founding Population of Humans. PLoS Genet 9(9): e1003815. doi:10.1371/journal.pgen.1003815

These basic ideas have been around for decades, but it is with powerful genomic technologies that they are truly giving us actionable insights. A new paper in PLOS Genetics lays it out simply enough, Whole-Exome Sequencing Reveals a Rapid Change in the Frequency of Rare Functional Variants in a Founding Population of Humans. To the left you see a site frequency distribution for French and French Canadian populations. What is clear is that for the derived allele (i.e., mutations from the ancestral state) distribution in exonic regions of the genome French Canadians are much more skewed toward the low frequency portion of the spectrum than French proper. This skew is more noticeable for deleterious mutations, such as nonsense and missense mutations (nonsense mutations usually produce nonfunctional protein, while missense mutations may alter the nature of the protein in some specific detail through amino acid substitution).

The focus here is on exons, ~1% of the genome, because these are the regions that are translated into the final protein product, and the authors seem particularly interested in the functional consequences of the site frequency spectrum of the French Canadians. This makes sense, because the French Canadian population has long been known to have a somewhat high burden of recessive diseases. Why? As noted in the paper the French Canadian ancestry derives overwhelmingly from a founding population of fewer than 10,000. Not only that, but this expanding population exhibits geographic substructure, with demographic expansion being particularly powerful along the edge of the pale of Quebecois settlement. This results in increased genetic drift on the edge, as a smaller portion of the population contributes to descendants pushing over the frontier. The key is to note how striking it is that a few hundred years of demographic separation can result in the emergence of ‘private alleles.’ To a great extent this is intuitively obvious, as private alleles emerge de novo in families, and many French Canadian families have had many generations separated from the ancestral homeland to accumulate distinctive markers specific to their lineage.

Over the long term whole-genome sequencing at many “x” coverage (so that on average the same base can be found in 10 or 20 or 30 reads, reducing possible false positives) is going to be ubiquitous, and we’ll get a sense of the distribution of genetic load within and across families. One major demographic-historical dynamic highlighted in this paper is that serial bottleneck events in human history (e.g., the “Out of Africa” migration) may endow different populations with different site frequency spectra, and so imply diverse genetic disease loads. Seeing as how genomic work tends to be focused on populations of European descent we haven’t truly explored these sorts of inter-population possibilities in great depth, but they’re in the offing. I suspect for example that Indian subcastes will tend to have many private alleles due to bottlenecks and recent expansions. And, in the short-term this may also redound to the benefit of those who argue for the benefits of genetic diversity through random mating across populations.

Citation: Casals F, Hodgkinson A, Hussin J, Idaghdour Y, Bruat V, et al. (2013) Whole-Exome Sequencing Reveals a Rapid Change in the Frequency of Rare Functional Variants in a Founding Population of Humans. PLoS Genet 9(9): e1003815. doi:10.1371/journal.pgen.1003815

• Category: Science • Tags: Population Genetics 

Sewall Wright
Credit: University of Wisconsin, Madison

You have probably heard or read that most genetic variation is within races, not between races. This assertion has led, in my opinion, to unwarranted inferences. Often bracketed under “Lewontin’s Fallacy”, the basic intuition is that if most variation is within races, then races as a taxonomic unit are without utility or substantive basis. This is disputable. In plain English, though most genetic variation may be within races (i.e., not diagnostic of racial identity), the variation across races is quite systematic because that variation reflects deep population history. In this way of thinking population or racial substructure are simply reflections of the tips of the tree which has been shaped by history.

But these discussions are ultimately predicated upon a statistic, Fst. Fst is generally considered one of the fixation indices pioneered by the American evolutionary geneticist Sewall Wright. What Wright’s Fst aims to capture is the relative amount of genetic variance which is due to population substructure. In regards to human races out of the total genetic variation ~15% of it can be inferred simply by looking at population substructure (Fst ~0.15), with the balance not being due to population structure. But this is an average value. At rs1426654 in SLC24A5 when comparing Europeans and Africans almost all of the variation is between the populations, because the allele frequencies are disjoint. But what if I told you that Wright’s Fst is quite a bit woollier than you might think?
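To make the single-locus intuition concrete, here is a small sketch of the textbook heterozygosity form of the fixation index, (H_T - H_S) / H_T, for one biallelic SNP in two populations. The equal weighting of the two populations and the allele frequencies used are assumptions for illustration, not measured values for rs1426654 or any other SNP.

```python
def wright_fst(p1, p2):
    """Fst at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        # The locus is monomorphic overall, so there is no variation to partition.
        return 0.0
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical frequencies, chosen only to show the two extremes.
print(wright_fst(0.01, 0.99))  # nearly disjoint frequencies: close to 1
print(wright_fst(0.50, 0.50))  # identical frequencies: 0
```

A genome-wide figure like ~0.15 is a summary across many loci that individually range from near 0 to near 1, which is part of why the details of how you summarize matter.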

Citation: Patterson, Nick, Alkes L. Price, and David Reich. “Population structure and eigenanalysis.” PLoS genetics 2.12 (2006): e190.

The issue here is that measuring genetic distance is not like measuring acceleration or length. Acceleration is a clearly defined phenomenon with a first order relationship to material entities, while length is a physical property of concrete objects. What population genetics is attempting to do is formalize and render abstract phenomena whose ultimate basis is not constrained by human preconceptions or easily amenable to intuitions, and may be nested within other abstruse constructions. In most cases what “genetic distance” really is is a way for humans to be able to conceptualize easily patterns of variation which are the outcome of complex historical processes. Often the interest of population geneticists is not in taxonomy as such, but the historical events which can be inferred by the classifications.

Wright’s Fst is useful because it gives you a number. And, due to its age it is also easy to compute using single marker data, as was prevalent before the molecular revolution of the 1960s. Today I much prefer visualizations of genetic relationships such as can be found in principal component analysis, or the ubiquitous bar plots of explicit population model clustering (e.g., Admixture or Structure). But if you are submitting a paper for peer review you may still be asked to provide Fst, meaning that this is still a relevant statistic.*

This is why a new preprint in Genome Research is very important for scientists working in this area, Estimating and interpreting Fst: the impact of rare variants. I had a short conversation with Gaurav Bhatia, the first author, at ASHG 2012, so I was waiting for this preprint to come out. In the text the authors provide explicit guidelines for ‘best practices’ on using and computing Fst. This is needed. I myself have shied away from using Fst much because I have seen that different methods give different results. Yes, qualitatively coherent, but it is not reassuring as Fst purports to be a precise quantity.

The problem seems to be that Fst emerged in an earlier pre-genomic era, and with genome-wide dense SNP data biases, distortions, and inconsistencies across different Fst frameworks are starting to emerge. As an empirical result the authors point out that a recent paper has claimed that Fst < 0.10 for human populations using 1000 Genomes data. This is lower than the values inferred from HapMap3. Why? One possibility is that the 1000 Genomes data is enriched for rare variants, which are likely to have emerged after the divergence of the populations from a common ancestor. This is problematic because many variants of Fst are predicated on a divergence from a common ancestor, and so should be evaluating shared variation (the authors observe that highly heterozygous alleles with a bias toward private alleles can paradoxically result in very low Fst). Because of the importance of taking into account shared and diverged population history the authors recommend ascertaining the SNPs in an outgroup, if possible (if not, then make the ascertainment strategy explicit and sample different genomic regions to get a sense of possible biases or distortions).

Additionally, there are problems with unequal sample sizes, as well as with computing a separate Fst value for each SNP and then taking the average of the results. They term the latter “average of ratios” (the ratio between the variance components), and conclude that this will underestimate the Fst, and that that is what occurred in the 1000 Genomes paper above. Rather, they recommend taking the ratio of the average variances across the SNPs, which is less biased. This is where the pre-genomic origins of Fst show, as this would not be an issue in an age of few markers. But with the copious data from the 1000 Genomes these distortions can be amplified and result in genuine confusions about the biological history of a population.

Finally, they make explicit recommendations as to the form of Fst to use:

Hudson estimator > Weir and Cockerham > Nei
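To see why the order of averaging matters, here is a rough sketch of the Hudson estimator computed both ways. The per-SNP numerator and denominator follow the form given in the Bhatia et al. paper as I understand it; the function names, the treatment of sample sizes as counts of sampled chromosomes, and the toy SNP values are my assumptions for illustration only.

```python
def hudson_components(p1, n1, p2, n2):
    """Per-SNP numerator and denominator of the Hudson Fst estimator.

    p1, p2 are sample allele frequencies; n1, n2 are sample sizes
    (treated here as numbers of sampled chromosomes, an assumption).
    """
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def fst_ratio_of_averages(snps):
    """Sum numerators and denominators across SNPs, then divide once."""
    comps = [hudson_components(*snp) for snp in snps]
    return sum(num for num, _ in comps) / sum(den for _, den in comps)


def fst_average_of_ratios(snps):
    """Divide per SNP, then average the per-SNP values."""
    comps = [hudson_components(*snp) for snp in snps]
    ratios = [num / den for num, den in comps if den > 0]
    return sum(ratios) / len(ratios)


# Hypothetical data as (p1, n1, p2, n2): one common, differentiated SNP
# plus three rare variants private to the first population.
toy_snps = [
    (0.20, 100, 0.60, 100),
    (0.01, 100, 0.00, 100),
    (0.01, 100, 0.00, 100),
    (0.01, 100, 0.00, 100),
]

print(fst_ratio_of_averages(toy_snps))
print(fst_average_of_ratios(toy_snps))
```

In this toy set the rare private variants each contribute a per-SNP value near zero, so averaging the ratios drags the estimate far below the ratio of averages. That is the qualitative pattern the authors warn about, though real data will of course be messier than four invented SNPs.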

All this goes to show that even in established science it is important to check your premises. Too often Fst is simply a black-box, one of the elements which you have to check off. But it is a tool which should be used with subtle understanding.

Addendum: Alkes Price’s software page has some great resources. And there’s a new version of Eigensoft! I know what I’m going to do this weekend….

Citation: Genome Research, Estimating and Interpreting FST: the Impact of Rare Variants, Gaurav Bhatia, Nick Patterson, Sriram Sankararaman, Alkes L. Price. doi:10.1101/gr.154831.113
* Fst is still useful in many cases as part of a broader population genetic toolkit.

• Category: Science • Tags: Fst, Population Genetics 

Textbooks are often very expensive. For example the most recent edition of Principles of Population Genetics will run you $50-$100. But it has come to my attention that the third edition of this textbook is potentially much cheaper, with some copies in the <$10 range! Population genetics isn’t quite like math, where 19th century works are still interesting to the non-specialist (population genetics was ‘invented’ in the first decades of the 20th century!). But a lot of the older material is totally relevant and on point, so these previous editions are not just simply historical curiosities. Much of the third and fourth editions of this particular work overlap anyhow, the big difference obviously being genomic techniques and such. But for that you should probably read papers, many of which are thankfully being put out as preprints now.

While I’m at it, I notice there are some affordable copies ($10-$20) of Genetics of Human Populations. This is a very old work from decades ago. But it is an encyclopedic treatment of the post-World War II human genetic literature, much of which has been forgotten or faded, but perhaps should not be discarded so lightly….

Update: I realize on second thought that it was remiss of me to not point out that Graham Coop and Joe Felsenstein both have free population genetics resources which readers might find highly useful. Also see these notes from U Conn.

• Category: Science • Tags: Population Genetics 

The Black Death

I noticed during Peter Ralph and Graham Coop’s Ask Me Anything about their new paper, The Geography of Recent Genetic Ancestry across Europe, someone brought up the effects of plague. Recall that ~1/3 of Europe’s population died during the Black Death. And population size reductions on the order of ~50% due to epidemics are not unknown in human history. Surely this would have a major genetic effect? Well, in fact it would have a genetic effect due to possible adaptations to disease (see CCR5). But there would be little overall impact on genetic diversity, at least in the short term. That is because for bottlenecks to produce major change in the genetic character of a population they have to be rather extreme in magnitude.

This issue came to mind for me in 2009 when I watched Star Trek. If you haven’t watched the J. J. Abrams reboot, and are a spoilerphobe, read no more! Now, with that out of the way you may recall that during this film the Vulcans suffered a genocidal attack. Out of billions of Vulcans only ~10,000 survived. Here’s some commentary on the possible consequences, New Star Trek Movie: A Vulcan Holocaust?:

Yes, there is a remnant of ten thousand Vulcans left. At the end of the movie, we are told that they have found a new planet to settle on. Still, we must ask: If we are now in a new timeline and all we have left are a few thousand survivors, will the Vulcans have any political influence at all? Or will they just become a relic on a museum planet? Spock even refers to his people as an endangered species.

It would seem the Vulcans will have no other choice but to accept “converts” if they want to survive, because 10,000 is not really a very big gene pool in the long haul. The Amish, who do not accept converts or newcomers, have become very inbred and are now facing problems with genetic diseases. European Jews, who lived in isolated communities for many centuries, also carry certain genetic diseases. However, the recent influx of Jews by Choice is bringing new DNA patterns into the community, so that Jews have fewer such problems than the Amish.

3.5% growth per year

First things first. Vulcans would have no problem reestablishing their population on a virgin planet. It’s simply the power of exponential growth. The nation of East Timor has a growth rate of 3.5% per year (total fertility rate ~6 per woman). This is not an outlandish value. The Puritans of New England maintained higher fertility for several generations. The key here is that humans (or humanoids) are like any organism when faced with a Malthusian surfeit: they breed. Though Vulcans live longer than humans, and have some life history quirks, I’m rather confident that Vulcans could reproduce at least as fast as humans. The reality is that they’re superior to humankind in almost every way possible (their lack of emotions is a testament to culture, not biology). Some quick computations tell me that it would take 400 years for Vulcans to get back to a population of 10 billion. Since some Vulcans can live longer than two centuries, this seems like a rather short window of time.
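The “quick computations” here are just compound growth. A minimal sketch, assuming a constant 3.5% annual growth rate with no resource limits (the function name is mine):

```python
import math


def years_to_reach(start, target, annual_rate):
    """Years of constant compound growth needed to go from start to target."""
    return math.log(target / start) / math.log(1 + annual_rate)


# 10,000 survivors growing at 3.5% per year toward 10 billion.
print(years_to_reach(10_000, 10_000_000_000, 0.035))
```

This comes out at a little over 400 years, in line with the figure above. A six order of magnitude increase sounds enormous, but it is only about twenty doublings, and at 3.5% per year a doubling takes roughly two decades.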

But what about the second clause? Vulcan genetic diversity. Vulcans are logical, so I’m rather confident that they would have sampled diverse populations when evacuating. And to my knowledge I am not aware of an ethnic skew of Vulcans who were resident across the Federation. So with concerns of representativeness addressed, what would such a crash in population entail?

First you need to become familiar with the concept of an effective population, Ne. Consider that in any given generation some individuals shall breed and some shall not. Though the count of population may be x, the count of those who contribute to the next generation is invariably (x – those who do not breed). And it is this inter-generational transfer which is relevant to population genetics. Also, for the purposes of genetics deep history matters a great deal. Bottlenecks have an inordinate impact on the long term effective population. Intuitively, consider the case of a large population which goes through an extreme bottleneck, and then expands again. The average census size over that time might be rather substantial. But for genetic purposes the lineages are likely to coalesce back to a few common ancestors at the bottleneck. The impact of the pre-bottleneck period is attenuated, because much of the population was simply not genetically sampled. It may as well have not existed!

To make it concrete, below is a toy example. Imagine an island with 10,000 individuals which undergoes population crashes. You see the results below.


The total number of individuals over the 30 generations across the three scenarios is about the same. But the long term effective population in the scenario where the size dropped to 10 is 30 times smaller than the case where the size was reduced to 10% of the prior value.
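One standard way to get a long term effective population from fluctuating sizes is the harmonic mean across generations, which is dominated by the smallest values. Here is a sketch under stated assumptions: a single one-generation crash in the middle of 30 generations, a baseline of 10,000, and the census size in each generation standing in for that generation’s effective size. These scenarios are my own and need not match the original figure.

```python
def long_term_ne(sizes):
    """Harmonic mean of per-generation population sizes."""
    return len(sizes) / sum(1 / n for n in sizes)


def crash_scenario(bottleneck, generations=30, baseline=10_000):
    """Constant baseline size with one generation at the bottleneck size."""
    sizes = [baseline] * generations
    sizes[generations // 2] = bottleneck
    return sizes


# Hypothetical crashes to 10%, 1%, and 0.1% of the baseline.
for bottleneck in (1_000, 100, 10):
    sizes = crash_scenario(bottleneck)
    arithmetic_mean = sum(sizes) / len(sizes)
    print(bottleneck, round(arithmetic_mean), round(long_term_ne(sizes)))
```

The arithmetic mean barely moves between these scenarios, since 29 of the 30 generations are identical, while the harmonic mean collapses as the single bad generation gets worse.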

But what does this do to genetics? There are complicated ways to model this, because populations may be in mutation/drift/selection equilibrium, with the bottleneck being a temporary perturbation. But one way to think about the issue is that a bottleneck can drop heterozygosity by about a factor of 1-1/(2Ne). As Ne → ∞ there is no change. But 1-1/(2Ne), where Ne is 1,000 to 10,000 (assuming that Ne is smaller than the census size of 10,000), is not implying a great change in heterozygosity. Of course many rare alleles, or alleles private to families, will be lost. But so long as the Vulcan population was reasonably representative (not inbred), then I think they don’t have much to fret about in terms of genetic health.
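To put numbers on the 1-1/(2Ne) factor, here is a small sketch that applies it once per generation. It assumes drift alone, with no new mutation or selection, and the Ne values are simply illustrative.

```python
def heterozygosity_retained(ne, generations=1):
    """Fraction of heterozygosity expected to remain after drift alone.

    Each generation multiplies heterozygosity by 1 - 1/(2 * ne).
    """
    return (1 - 1 / (2 * ne)) ** generations


# Illustrative effective sizes, one generation and ten generations each.
for ne in (10, 100, 1_000, 10_000):
    print(ne, heterozygosity_retained(ne), heterozygosity_retained(ne, 10))
```

With Ne in the thousands the per-generation loss is a small fraction of a percent, so a brief bottleneck of that size followed by rapid growth leaves overall heterozygosity nearly untouched. It takes an Ne in the tens, or many generations at small size, to make a real dent.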

The purpose of this post was not to answer a question of deep interest to Trekkies. Rather, it was to encourage people to establish some intuitions about these sorts of demographic processes and their effect upon genetics.


Hartl, Daniel L., and Andrew G. Clark. Principles of population genetics. Vol. 116. Sunderland: Sinauer associates, 1997.

Nei, Masatoshi, Takeo Maruyama, and Ranajit Chakraborty. “The bottleneck effect and genetic variability in populations.” Evolution (1975): 1-10.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"