The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog / Human Evolutionary Genomics


Likely an individual with derived allele on KITL locus (Credit: David Shankbone)

An individual polymorphic on the KITL locus? (Credit: David Shankbone)

Pigmentation is one of the few complex traits in the post-genomic era which has been amenable to nearly total characterization. The reason for this is clear in hindsight. As far back as the 1950s (see The Genetics of Human Populations) there were inferences made using human pedigrees which suggested that normal human variation on this trait was controlled by fewer than ten genes of large effect. In other words, it was a polygenic character, but not highly so. This means that the alleles which control the variation are going to have reasonably large effects, and so be well within the power of statistical genetic techniques to detect.

I should be careful about being flip on this issue. As recently as the mid aughts (see Mutants) the details of this trait were not entirely understood. Today the nature of inheritance in various populations is well understood, and a substantial proportion of the evolutionary history is also known with reasonable clarity, as far as these things go. The 50,000 foot perspective is this: we lost our fur millions of years ago and developed dark skin, and many of us lost our pigmentation after we left Africa ~50,000 years ago (in fact, it seems likely that hominins in the northern latitudes were always diverse in their pigmentation).


A new paper in Cell sheds some further light on the fine-grained details which might be the outcome of this process. Being a Cell paper there is a lot of neat molecular technique to elucidate the mechanistic pathways. But I will gloss over that, because it is neither my forte nor my focus. A summary of the paper is that it shows that p53, a relatively well known tumor suppressor gene, seems to interact with a response element around the KITLG locus (the gene product is a transcription factor, and it binds in many regions). This locus is well known in part because it has been implicated in pigment variation in humans and fish. So KITLG is part of one of the generalized pigmentation pathways which span metazoans. There are derived variants in both Europeans and East Asians which are correlated with lighter skin, though there is polymorphism in both cases (the variants have not swept to fixation).

The wages of adaptation? (Credit: Hoggarazzi Photography)


But this is a Cell paper, so there has to be a more concrete and practical angle than just evolution. And there is. It turns out that a single nucleotide polymorphism in the p53 response element results in a tendency toward upregulation of KITLG and male germ line proliferation. The latter matters when it comes to tumorigenesis, and in particular testicular cancer. This form of cancer is one where there doesn’t seem to be a somatic cell mutation of p53 itself. Additionally, the authors observe that testicular cancer manifests at a 4-5 fold greater rate in people of European descent than in African Americans. And, presumably the upregulation of KITLG is somehow related to increased melanin production. The authors posit that because of lighter skin in Europeans due to selection at other loci there has been a balancing effect at KITLG (increased tanning response). There is evidence of selection at this locus (a long haplotype and increased homozygosity), so this is not an unreasonable conjecture, though the high frequency of loss of function alleles suggests that the model is likely complex.

I don’t know if this particular story is correct in its details (though I am intrigued that variation in KITLG is associated with cancer in other organisms). But it illustrates one of the possible consequences of rapid evolutionary change due to human migration out of Africa: deleterious side effects because of pleiotropy. In other words, as you tinker with the genomic architecture of a population you are going to have to accept tradeoffs as you are optimizing one aspect of function. Genes don’t have just one consequence, but are embedded in myriad pathways. Over time evolutionary theory predicts a slow re-balancing, as modifier genes arise to mask the deleterious side effects. But until then, we will bear the burdens of adaptation as best as we can.

Citation: Zeron-Medina, Jorge, et al. “A Polymorphic p53 Response Element in KIT Ligand Influences Cancer Risk and Has Undergone Natural Selection.” Cell 155.2 (2013): 410-422.

(Republished from Discover/GNXP by permission of author or representative)
 

Citation: Corona, Erik, et al. “Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration.” PLoS Genetics 9.5 (2013): e1003447.

The above figure is from a paper in PLoS GENETICS, Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration. The authors synthesize two diverse domains of human genomics. First, there are biomedically focused genome-wide association studies and their like which attempt to identify risk alleles for particular diseases. In some cases these risk alleles are very penetrant, in that a particular state predicts with high likelihood a disease phenotype. But in most cases the yield is elevated or decreased risks for highly complex traits such as type 2 diabetes. Second, there is the domain of evolutionary genomics which attempts to reconstruct a phylogenetic and population genetic history so as to frame contemporary patterns of variation in their proper context. How this might be important or of interest is obvious in the case of malaria resistance genes. Alleles conferring resistance have arisen in multiple populations due to parallel environmental pressures. Phylogenetic relationships between these populations should inform your predictions as to the likely similarities of the mutations between the populations. Meanwhile, population genetic theory can give you clues as to the likelihood of multiple adaptations.

The goal here is to increase understanding of the nature of the emergence of disease, and perhaps target individual risk more effectively. Above in the figure you see two interesting patterns: risk for type 2 diabetes alleles as a function of descent, and risk as a function of de novo mutation or independent selective event. The phylogenetic tree represents real relationships as inferred from the >600,000 SNPs in the HGDP data set. The risk alleles were culled from the literature, and risks were computed for individuals and populations. The real population risks were then compared to a model of the risks which might occur in a scenario with this particular phylogenetic tree and the normal process of random genetic drift (see methods for the gory details!). What you see are phylogenetic relationships (African populations shifted toward higher risk) and independent events (Pima Indians shifted toward higher risk) where there is a higher risk of diabetes (red shifted).

There are all sorts of shortcomings to this analysis. The authors are limited by the risk alleles in their study, which is certainly far less than thorough or exhaustive. Additionally, their population coverage was thin in some regions, resulting in reduced ability to even squeeze power from their model in particular cases. But one thing that jumps out at you is that the patterns here inferred from risk alleles in a highly polygenic disease like type 2 diabetes don’t even track what you see in the real world. Many South Asian groups have very high risks of type 2 diabetes. It just so happens that these groups are not in the HGDP sample. There is actually a rather informative critique from two epidemiologists in the comments of this paper. They make many points that came to mind in the specifics. But they ended in a fashion which raised my eyebrows:

Finally, the need to avoid stigmatizing populations based on genetic risk has been much discussed. It is not difficult to imagine a media announcement based on this publication – “Genetic risk of diabetes found in African populations”. Similar claims were made for intelligence not very long ago. Not all speculation is neutral.

As it happens I come from a population with very high risk for metabolic disease. I have no idea if I’m stigmatized by this fact, but I am very glad that medical professionals are becoming aware of differential risks, and moving beyond coarse one-size-fits-all understandings of human health. The BMI values developed for European Americans are probably rather inappropriate for South Asians because of the way we distribute fat (in short, we need to be thinner to exhibit the same risk profile, all things equal). Again, I have no idea if this is stigmatizing, but it is real.

So despite all the real concerns I have with the methodology in the paper above, I believe that these sorts of analyses are essential parts of the broader answer. We now live in the age of the antibiotic revolution and an understanding of germ theory. Those were the big returns on investment for public health. For the short term, gains in human well being and life expectancy are going to be on the margin, through increments. Despite all the skepticism I have about initial attempts to work out the relationship between population history and disease, one must begin somewhere.

Citation: Corona, Erik, et al. “Analysis of the Genetic Basis of Disease in the Context of Worldwide Human Relationships and Migration.” PLoS Genetics 9.5 (2013): e1003447

(Republished from Discover/GNXP by permission of author or representative)
 

There’s an excellent paper up at Cell right now, Modeling Recent Human Evolution in Mice by Expression of a Selected EDAR Variant. It synthesizes genomics, computational modeling, as well as the effective execution of mouse models to explore non-pathological phenotypic variation in humans. It was likely due to the last element that this paper, which pushes the boundary on human evolutionary genomics, found its way to Cell (and the “impact factor” of course).

The focus here is on EDAR, a locus you may have heard of before. By fiddling with the EDAR locus researchers had earlier created “Asian mice.” More specifically, mice which exhibit a set of phenotypes which are known to distinguish East Asians from other populations, specifically around hair form and skin gland development. More generally EDAR is implicated in development of ectodermal tissues. That’s a very broad purview, so it isn’t surprising that modifying this locus results in a host of phenotypic changes. The figure above illustrates the modern distribution of the mutation which is found in East Asians in HGDP populations.

One thing to note is that the derived East Asian form of EDAR is found in Amerindian populations which certainly diverged from East Asians > 10,000 years before the present (more likely 15-20,000 years before the present). The two populations in West Eurasia where you find the derived East Asian EDAR variant are Hazaras and Uyghurs, both likely the products of recent admixture between East and West Eurasian populations. In Melanesia the EDAR frequency is correlated with Austronesian admixture. Not on the map, but also known, is that the Munda (Austro-Asiatic) tribal populations of South Asia also have low, but non-trivial, frequencies of East Asian EDAR. In this they are exceptional among South Asian groups without recent East Asian admixture. This lends credence to the idea that the Munda are descendants in part of Austro-Asiatic peoples intrusive from Southeast Asia, where most Austro-Asiatic languages are present.

And yet one thing that jumps out at me is that there is no East Asian EDAR in European populations, even in Russians. I am a bit confused by this result, because of the possibility of Siberian-affiliated population admixture with Europeans within the last 10,000 years, as adduced by several researchers (this is not an obscure result, it manifests in TreeMix repeatedly). The second figure shows the inferred region from which the East Asian EDAR haplotype expanded over the past 30,000 years. The authors utilized millions of forward simulations with a host of parameters to model the expansion of EDAR, so that it fit the distribution pattern that is realized (see the supplements here for the parameters). To make a long story short they infer that there was one mutation on the order of ~30,000 years before the present, and that it swept up in frequency driven by selection coefficients on the order of ~0.10 (a 10% increase in relative fitness, which is incredibly powerful!). This is on the extreme end of selective sweeps, and likely of the same class as the haplotype blocks which characterize SLC24A5 and LCT (the block is shorter, though that makes sense because of the deeper time depth). Again, I am perplexed why such an ancient allele, which is found in Amerindians and in Munda populations, is absent in Europeans who have putative East Eurasian admixture. The whole does not cohere for me. There is a weak point in one or more of my assumptions.
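For readers who have not seen a forward simulation, here is a minimal sketch of the general idea. It is not the authors' model, which is spatial and explores many parameters. It is a single well-mixed Wright-Fisher population in which each generation applies selection and then binomial drift. The function name `wright_fisher`, the population size, the starting frequency, and the assumption that selection acts on the allele itself (genic selection) are all my own illustrative choices; only the selection coefficient of 0.10 comes from the text above.

```python
import random


def wright_fisher(n_diploid, s, q0, generations, rng):
    """Return the frequency trajectory of a derived allele.

    Hypothetical toy model: one panmictic population of n_diploid
    individuals, genic selection with coefficient s, starting frequency q0.
    """
    q = q0
    trajectory = [q]
    n_copies = 2 * n_diploid
    for _ in range(generations):
        # Selection: the derived allele has relative fitness 1 + s.
        q_selected = q * (1.0 + s) / (1.0 + s * q)
        # Drift: sample 2N gene copies for the next generation.
        copies = sum(1 for _ in range(n_copies) if rng.random() < q_selected)
        q = copies / n_copies
        trajectory.append(q)
        if q in (0.0, 1.0):
            # The allele is lost or fixed; nothing further can change.
            break
    return trajectory


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a run can be repeated
    runs = [wright_fisher(200, 0.10, 0.01, 200, rng) for _ in range(20)]
    lost = sum(1 for t in runs if t[-1] == 0.0)
    print(f"{lost} of {len(runs)} replicates lost the allele")
```

Even with a strong selection coefficient, some replicates lose the allele while it is rare. That is why inference of this kind needs many replicates rather than a single run.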

Then there’s the section on the mouse model. To me this aspect was ingenious, though I’m not particularly able to assess it on its technicalities. The earlier usage of mouse models to test the effects of mutations on EDAR was in the context of coarse copy number changes which resulted in massive dosage changes of protein. The phenotypic outcomes were rather extreme in that case. Here they used a “knockin” model where they recreated the specific EDAR point mutation. Instead of extreme phenotypes they found that the mice were much more normal in their range of traits, though the hair form shifts were well aligned with what occurred in humans. Additionally there were some changes in the number of eccrine glands, with a larger number in the derived East Asian EDAR carriers (with additive effect). Finally they noticed that there were differences in mammary gland pad area and branching. None of this is that surprising, EDAR is a significant regulatory gene which shapes the peripheries and exterior of an organism.

To double check the human relevance of what they found in the mouse model they performed a genome-wide association in a large cohort of Han Chinese. The correlations of particular traits were in the directions that they expected; those individuals with East Asian EDAR variants had thicker hair, shovel-shaped incisors, and a greater density of eccrine glands. It is perhaps important to note that the frequency of the derived variant is so high in Han populations that they didn’t have enough homozygote ancestral genotypes to perform statistics, so their comparisons involved heterozygotes with the derived mutant and also a copy of the ancestral state. This is like SLC24A5 in Europeans, where it is difficult to find individuals of European heritage who have double copies of the non-European modal variant.

Let’s review all the awesome things they did in this study. They dug deeply into the evolutionary genomics of the region around EDAR, concluding that this haplotype was driven up in frequency from one ancestral variant ~30,000 years ago in a hard selective sweep. And a sweep of notable strength in terms of selection coefficient. This may be one of the largest effect targets of natural selection in the genome of non-Africans over the past 50,000 years. Second, they used a humanized mouse model to explore the range of phenotypes correlated with this mutational change in East Asians. So you have a strong selection coefficient on a locus, and a range of traits associated with changes on that locus. Third, they confirmed the correlation between the traits and the mutation in humans, despite there being prior research in this area (i.e., they reproduced). This is all great science, and shows the power of collaboration between the groups.

Much of the elegance and power of the paper applies to the discussion section as well, but to be frank this is where things start falling apart for me. You can get a sense of it in The New York Times piece, East Asian Physical Traits Linked to 35,000-Year-Old Mutation. The headline here points to a legitimately important inference from this line of research: many salient physical characteristics of the human races seem to be due to strong selection events at a few loci. In addition to EDAR I’m thinking of the pigmentation loci, such as SLC24A5. I wouldn’t be surprised if there was something similar for the epicanthic fold. If it is visible, and defines between-population differences, it is generally not genomically trivial. There’s usually a story underneath that difference.

In the broad scale of human natural history the problem that arises for me is that we have traits, we have genes under selection, but we have very weak stories to explain the mechanism and context of natural selection. Here there is a strong contrast with the loci around lactase persistence and malaria resistance. In those situations the causal mechanism for the selection seems relatively clear. Critics of evolutionary psychology are wont to accuse the field of ‘Just So’ storytelling, but the same problem crops up in the more intellectually insulated domain of evolutionary genomics (in part because the field is very new, and also mathematically and computationally abstruse). To illustrate what I’m talking about I’m going to quote from the discussion of the above paper:

A high density of eccrine glands is a key hominin adaptation that enables efficient evapo-transpiration during vigorous activities such as long-distance walking and running (Carrier et al., 1984; Bramble and Lieberman, 2004). An increased density of eccrine glands in 370A carriers might have been advantageous for East Asian hunter-gatherers during warm and humid seasons, which hinder evapo-transpiration.

Geological records indicate that China was relatively warm and humid between 40,000 and 32,000 years ago, but between 32,000 and 15,000 years ago the climate became cooler and drier before warming again at the onset of the Holocene (Wang et al., 2001; Yuan et al., 2004). Throughout this time period, however, China may have remained relatively humid due to varying contribution from summer and winter monsoons.

High humidity, especially in the summers, may have provided a seasonally selective advantage for individuals better able to functionally activate more eccrine glands and thus sweat more effectively (Kuno, 1956). To explore this hypothesis, greater precision on when and where the allele was under selection—perhaps using ancient DNA sources—in conjunction with more detailed archaeological and climatic data are needed.

A climate adaptation is always a good bet. The problem I have with this hypothesis is that modern day gradients in the distribution of this allele are exactly the reverse of what one might expect in terms of adaptation to heat and humidity. Additionally, is there no cost to this adaptation? After the initial sweep upward, the populations where the derived EDAR mutant is found in high frequencies went through the incredible cold of the Last Glacial Maximum, and groups like the Yakuts are known to have cold adaptations today. Not only that, but the Amerindians from the arctic to the tropics all exhibit a cold adapted body morphology, the historical consequence of the long sojourn in Beringia.

Granted, the authors are not so simplistic, and the somewhat disjointed discussion alludes to the fact that EDAR has numerous phenotypic effects, and it may be subject to diverse positive selection pressures. This seems plausible on the surface, but this complexity of mechanism seems ill-fitted to the fact that the signal of selection around this locus is so clean and crisp. It seems that this is not going to be an easy story to unpack, and there’s a good deal of implicit acknowledgement of that fact in this paper. But tacked right at the end of the main text is this whopper:

It is worth noting that largely invisible structural changes resulting from the 370A allele that might confer functional advantage, such as increased eccrine gland number, are directly linked to visually obvious traits such as hair phenotypes and breast size. This creates conditions in which biases in mate preference could rapidly evolve and reinforce more direct competitive advantages. Consequently, the cumulative selective force acting over time on diverse traits caused by a single pleiotropic mutation could have driven the rise and spread of 370A.

A simple takeaway is that the initial climatic adaptation may have given way to a cultural/sexual selective adaptation, whereby there was a preference for “good hair” as exemplified by pre-Western East Asian canons (black and lustrous), as well as a bias toward small breasts. This aspect gets picked up in The New York Times piece of course. I’ll quote again:

But Joshua Akey, a geneticist at the University of Washington in Seattle, said he thought the more likely cause of the gene’s spread among East Asians was sexual selection. Thick hair and small breasts are visible sexual signals which, if preferred by men, could quickly become more common as the carriers had more children. The genes underlying conspicuous traits, like blue eyes and blond hair in Europeans, have very strong signals of selection, Dr. Akey said, and the sexually visible effects of EDAR are likely to have been stronger drivers of natural selection than sweat glands.

The passage here is ambiguous because the author of the article, Nick Wade, doesn’t use quotes, and I don’t know what is Akey and what is Wade’s gloss on Akey. For example, for theoretical reasons of reproductive skew (a few men can have many children) in general sexual selection is considered to be driven most often by female preference for male phenotypes. I assume Akey knows this, so I suspect that that section is Wade’s gloss (albeit, a reasonable one given the proposition of preference for smaller breasts). The main question on my mind is how seriously prominent population geneticists such as Joshua Akey actually take sexual selection to be as a force driving variation and selection in human populations. It seems that quite often sexual selection is presented as a deus ex machina. A phenomenon which can rescue our confusion as to the origins of a particular suite of traits. But our assessment of the likelihood of sexual selection presumably has to be premised on prior expectations informed by a balance of different forces one can gauge from the literature, and here my knowledge of the current sexual selection literature is weak. Perhaps my skepticism is premised on my ignorance, and the population geneticists who proffer up this explanation are more informed as to the state of the literature.

All this brings me back to the farcical title. When this paper first made news last week I was having dinner with a friend of Japanese heritage (who spent his elementary school years in Japan). I asked him point blank, “Do you like small breasts?” His initial response was “WTF!?! Razib,” but as a mouse geneticist he understood the thrust of my question after I outlined the above results to him. From personal communication with many East Asian American males I am not convinced that there is an overwhelmingly strong preference for small breasts within this subset of the population. But the key here is American. These are individuals immersed in American culture. The norms no doubt differ in East Asia. The typical visual representation of celebrity East Asian females that we see in the American media depicts individuals who are slimmer and more understated in their secondary sexual characteristics than is the norm among Western female celebrities (e.g., Gong Li, the new crop of Korean pop stars, even taking into account the plastic surgery of the latter). Part of this is no doubt the reality that the normal range of variation across the population differs, and part of it may be the nature of aesthetic preferences.

But the possibility of deep rooted psychological reasons driving sexual selection (to my knowledge there was no culture which spanned South China and Siberia) brings us back to old ideas about the Pleistocene mind. And, it brings us back to evolutionary psychology, a field which is the whipping boy of both skeptics of the utility of evolutionary science in understanding human nature, and rigorous practitioners of evolutionary biology. And yet here it is not the evolutionary psychologists, but rock-ribbed statistical geneticists who I often see being quoted in the media invoking sexual selection. But do we know it is sexual selection, or is it just our best guess? Because more often than not best guesses are wrong (though best guesses are much more likely to be right than worst guesses!).

Evolutionary genomics has come a long way in the past 10 years. We know, for example, the genetic architecture and some aspects of the natural history of many traits. But, there are still shortcomings. Lactase persistence is the exception to the rule. Even a phenotype as straightforward as human pigmentation has no undisputed answer as to why it has been the repeated target of selection across Eurasia over the past 40,000 years. Oftentimes the right answer is simply that we just don’t know.

Citation: http://dx.doi.org/10.1016/j.cell.2013.01.016

(Republished from Discover/GNXP by permission of author or representative)
 

A few days ago I was browsing Haldane’s Sieve, when I stumbled upon an amusing discussion which arose on its “About” page. This “inside baseball” banter got me to thinking about my own intellectual evolution. Over the past few years I’ve been delving more deeply into phylogenetics and phylogeography, enabled by the rise of genomics, the proliferation of ‘big data,’ and accessible software packages. This entailed an opportunity cost. I did not spend much time on classical population and evolutionary genetic questions. Strewn about my room are various textbooks and monographs I’ve collected over the years, and which have fed my intellectual growth. But I must admit that it is a rare day now that I browse Hartl and Clark or The Genetical Theory of Natural Selection without specific aim or mercenary intent.

R. A. Fisher

Like a river inexorably coursing over a floodplain, with the turning of the new year it is now time to take a great bend, and double back to my roots, such as they are. This is one reason that I am now reading The Founders of Evolutionary Genetics. Fisher, Wright, and Haldane are like old friends, faded, but not forgotten, while Muller was always but a passing acquaintance. But ideas 100 years old still have power to drive us to explore deep questions which remain unresolved, but where new methods and techniques may shed greater light. A study of the past does not allow us to make wise choices which can determine the future with any certitude, but it may at least increase the luminosity of the tools which we have to illuminate the depths of the darkness. The shape of nature may become just a bit less opaque through our various endeavors.

Figure from “Directional Positive Selection on an Allele of Arbitrary Dominance”, Teshima KM, Przeworski M

So what of this sieve of Haldane? As noted at Haldane’s Sieve the concept is simple. Imagine two mutations, one which expresses a trait in a recessive fashion, and another in a dominant one. The sieve operates by favoring the emergence of dominantly expressing variants out of the low frequency zone where stochastic forces predominate (i.e., even if an allele confers a large fitness benefit, at low frequencies the power of random chance may still imply that it is highly likely to go extinct). An example of this would be lactase persistence, which in the modal Eurasian variant seems to exhibit dominance. The converse case, where beneficial mutations are recessive in expression, suffers from a structural problem: the benefit is more theoretical than realized.

The mathematics of this is exceedingly simple, a consequence of the Hardy-Weinberg dynamics of diploid random mating organisms. Let’s use the gene which is implicated in variation in lactase persistence as an example, LCT. Consider two alleles, LP and LNP, where the former confers persistence (one can digest lactose sugar as an adult), and the latter manifests the conventional mammalian ‘wild type’ (the production of lactase ceases as one leaves the life stage when nursing is feasible). LP is clearly the novel mutant. In a small population it is not unimaginable that by random chance the frequency of LP rises to ~10%. What now? At HWE you have:

$p^2 + 2pq + q^2 = 1$, where $q$ is the LP allele frequency. At ~10% the numbers substituted would be:

$(0.90)^2 + 2(0.90)(0.10) + (0.10)^2$

This is where dominance or recessive expression is highly relevant. The reality is that LP is a dominant trait. So in this population the frequency of LP as a trait would be:

$(0.10)^2 + 2(0.90)(0.10) = 0.19$, or 19%

Now imagine a model where LP is favored, but it expresses in a recessive fashion. Then the frequency of the trait would equal $q^2$, the homozygote LP-allele proportion. That is, 1%. Though population genetics is often constructed on an algebraic foundation, the results lend themselves to intuition. A structural parameter endogenous to the genetic system, dominant or recessive expression, can have longstanding consequences in terms of the likely trajectory of the alleles. Selection only “sees” the trait, so a recessive trait with sterling qualities may as well be a trait with no qualities. In contrast, a dominantly expressed allele can cut like a scythe through a population, because every copy “counts.”
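The arithmetic above is easy to check in a few lines. The sketch below computes the trait frequency under dominant and recessive expression at Hardy-Weinberg proportions. It then takes one generation of deterministic selection to show how much faster the allele moves when every copy is visible. The function names and the 10% fitness advantage are assumptions of mine for illustration, not values from any study of LCT.

```python
def genotype_frequencies(q):
    """Hardy-Weinberg genotype frequencies for a derived allele at frequency q."""
    p = 1.0 - q
    return {"ancestral_hom": p * p, "het": 2.0 * p * q, "derived_hom": q * q}


def trait_frequency(q, dominant):
    """Fraction of individuals that express the derived trait."""
    g = genotype_frequencies(q)
    if dominant:
        # Heterozygotes and derived homozygotes both show the trait.
        return g["het"] + g["derived_hom"]
    # Only derived homozygotes show a recessive trait.
    return g["derived_hom"]


def next_allele_frequency(q, s, dominant):
    """Derived-allele frequency after one generation of selection.

    Individuals that express the trait have fitness 1 + s (s is a
    hypothetical value here); everyone else has fitness 1.
    """
    g = genotype_frequencies(q)
    w_het = 1.0 + s if dominant else 1.0
    w_hom = 1.0 + s
    mean_w = g["ancestral_hom"] + g["het"] * w_het + g["derived_hom"] * w_hom
    # Derived homozygotes pass on only the derived allele, heterozygotes half.
    return (g["derived_hom"] * w_hom + 0.5 * g["het"] * w_het) / mean_w


if __name__ == "__main__":
    q, s = 0.10, 0.10
    for dominant in (True, False):
        label = "dominant" if dominant else "recessive"
        print(label,
              round(trait_frequency(q, dominant), 4),
              round(next_allele_frequency(q, s, dominant), 5))
```

Starting from the same 10% allele frequency, the dominant case gains close to one percentage point in a single generation, while the recessive case barely moves. That is the sieve in miniature.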

In preparation for this post I revisited the section on Haldane’s Sieve in the encyclopedic Elements of Evolutionary Genetics. The authors note that this phenomenon, though of vintage character as these things can be reckoned in a field as young as evolutionary genetics, is still a live one. The dominance of favored mutations in wild populations, or the recessive character of deleterious ones in laboratory stock, may reflect the different regimes which these two gene pools are subject to. The nature of things is such that it is easier to generate recessive mutations than dominant ones (i.e., loss is easier than gain), so the preponderance of dominant variants in wild stocks subject to positive selective pressure lends credence to the idea that evolutionary rather than developmental forces and constraints shape the genetic character of many species.

And yet things are not quite so tidy. Haldane’s Sieve, and the framework of dominant versus recessive alleles, operates differently in the area of sex chromosomes. In many lineages there is a ‘heterogametic sex’ which carries only one functional copy of the major sex chromosome. In mammals this is the male (XY), while in birds this is the female (ZW). As males have only one functional copy of most genes on the sex chromosome, the masking effect of recessive expression does not apply to them in mammals. This may imply that because of the exposure of many deleterious recessive variants to natural selection within the heterogametic sex one would see different allelic distributions and genetic landscapes on these chromosomes (e.g., more rapid adaptation because of the exposure of nominally recessive alleles in the heterogametic sex, as well as more purifying selection on deleterious variants). But the reality is more complex, and the literature in this area is somewhat muddled. More precisely, it seems phylogenetically sensitive. Validation of the theory in mammals founders once one moves to Drosophila.

And that is why research in evolutionary genetics continues. The theory stimulates empirical exploration, and is tested against it. Much of the formal theory of classical evolutionary genetics, which crystallized in the years before World War II, is now gaining renewed relevance because of empirical testability in the era of big data and big computation. This is a domain where the past is not simply of interest to historians. Scientists themselves, chasing the next grant, and producing the expected stream of publications, may benefit from a little historical perspective by standing upon the shoulders of giants.

(Republished from Discover/GNXP by permission of author or representative)
 

The above map shows the population coverage for the Geno 2.0 SNP-chip, put out by the Genographic Project. Their paper outlining the utility and rationale of the chip is now out on arXiv. I saw this map last summer, when Spencer Wells hosted a webinar on the launch of Geno 2.0, and it was the aspect which really jumped out at me. The number of markers that they have on this chip is modest, only a bit over 100,000 on the autosome, with a few tens of thousands more on the X, Y, and mtDNA. In contrast, the Axiom® Genome-Wide Human Origins 1 Array Plate being used by Patterson et al. has ~600,000 SNPs. But as is clear from the map above Geno 2.0 is ascertained in many more populations than the other comparable chips (Human Origins 1 Array uses 12 populations). It’s obvious that if you are only catching variation in a few populations, all the extra markers may not give you much bang for the buck (not to mention the biases that that may introduce in your population genetic and phylogenetic inferences).


To the left is the list of populations against which the Human Origins 1 Array was ascertained, and they look rather comprehensive to me. In contrast, for Geno 2.0 ‘ancestrally informative markers’ were ascertained on 450 populations. The ultimate question for me is this: is all the extra ascertainment on diverse and obscure groups worth it? On first inspection Geno 2.0’s number of SNPs looks modest as I stated, but in my experience when you quality control and merge different panels together you are often left with only a few hundred thousand SNPs in any case. 100-200,000 SNPs is also sufficient to elucidate relationships even in genetically homogeneous regions such as Europe in my experience (it’s more than enough for model-based clustering, and seems to be overkill for MDS or PCA). One issue that jumps out at me about the Affymetrix chip is that it is ascertained toward the antipodes. In contrast, Geno 2.0 takes into account the Eurasian heartland. I suspect, for example, that Geno 2.0 would be better for population or ancestry assignment for South Asians because it would have more informative markers for those populations.

Ultimately I can’t really say much more until I use both marker sets in different and similar contexts. Since Geno 2.0 consciously excludes many functional and medically relevant SNPs its utility is primarily in the domain of demographics and history. If the populations in question are well covered by the Human Origins 1 Array, I see no reason why one shouldn’t go with it. Not only does it have more information about biological function, but the number of markers is many fold greater. On the other hand, Geno 2.0 may be more useful on the “blank zones” of the Affy chip. Hopefully the Genographic Project results paper for Geno 2.0 will come out soon and I can pull down their data set and play with it.

Cite: arXiv:1212.4116

(Republished from Discover/GNXP by permission of author or representative)
 

To understand nature in all its complexity we have to cut the riotous variety down to size. For ease of comprehension we formalize with math, verbalize with analogies, and visualize with representations. These approximations of reality are not reality, but when we look through the glass darkly they give us filaments of essential insight. Dalton’s model of the atom is false in important details (e.g., atoms turn out to be divisible, and protons and neutrons are in turn made of quarks), but it still has conceptual utility.

Likewise, the phylogenetic trees popularized by L. L. Cavalli-Sforza in The History and Geography of Human Genes are still useful in understanding the shape of the human demographic past. But it seems that the bifurcating model of the tree must now be strongly tinted by the shades of reticulation. In a stylized sense inter-specific phylogenies, which assume the approximate truth of the biological species concept (i.e., little gene flow across lineages), mislead us when we think of the phylogeny of species on the microevolutionary scale of population genetics. On an intra-specific scale gene flow is not just a nuisance parameter in the model, it is an essential phenomenon which must be accommodated into the framework.


This is on my mind because of the emergence of packages such as TreeMix and AdmixTools. Using software such as these on the numerous public data sets allows one to perceive the reality of admixture, and overlay lateral gene flow upon the tree as a natural expectation. But perhaps a deeper result is the character of the tree itself is torn asunder. The figure above is from a new paper, Efficient moment-based inference of admixture parameters and sources of gene flow, which debuts MixMapper. The authors bring a lot of mathematical heft to their exposition, and I can’t say I follow all of it (though some of the details are very similar to Pickrell et al.’s). But in short it seems that in comparison to TreeMix MixMapper allows for more powerful inference of a narrower set of populations, selected for exploring very specific questions. In contrast, TreeMix explores the whole landscape with minimal supervision. Having used the latter I can testify that that is true.

The big result from MixMapper is that it extends the result of Patterson et al., and confirms that modern Europeans seem to be an admixture between a “north Eurasian” population, and a vague “west Eurasian” population. Importantly, they find evidence of admixture in Sardinians, which implies that Patterson et al.’s original methods were not sensitive to admixture in putative reference populations (note that Patterson is a coauthor on this paper as well). The rub, as noted in the paper, is that it is difficult to estimate admixture when you don’t have “pure” ancestral reference populations. And yet here the takeaway for me is that we may need to rethink our whole conception of pure ancestral populations, and imagine a human phylogenetic tree as a series of lattices in eternal flux, with admixed nodes periodically expanding so as to generate the artifice of a diversifying tree. The closer we look, the more likely it seems that most of the populations which have undergone demographic expansion in the past 10,000 years are also the products of admixture. Any story of the past 10,000 years, and likely the past 100,000 years, must give space at the center of the narrative arc to lateral gene flow across populations.

Cite: arXiv:1212.2555 [q-bio.PE]
(Republished from Discover/GNXP by permission of author or representative)
 

While I was at Spencer Wells’ poster at ASHG I was primarily curious about bar plots. He’s got really good spatial coverage, so I’m moderately excited about the paper (though I didn’t see much explicit testing of phylogenetic hypotheses, which I think this sort of paper has to do now; we’re beyond PCA and bar plots only papers). That being said, Spencer was more interested in me promoting the Scientific Grants Program. Here’s some more information:

The Genographic Project’s Scientific Grants Program awards grants on a rolling basis for projects that focus on studying the history of the human species utilizing innovative anthropological genetic tools. The variety of projects supported by the scientific grants will aim to construct our ancient migratory and demographic history while developing a better understanding of the phylogeographic structure of world populations. Sample research topics could include subjects like the origin and spread of the Indo-European languages, genetic insights into Papua New Guinea’s high linguistic diversity, the number and routes of migrations out of Africa, the origin of the Inca, or the genetic impact of the spread of maize agriculture in the Americas.

Recipients will typically be population geneticists, students, linguists, and other researchers or scientists interested in pursuing questions relevant to the Genographic Project’s broad goal of exploring our migratory history. Recipients of Genographic scientific grant funds will become members of the Genographic Consortium, and will be expected to act as agents of the greater Genographic mission, participating in and reporting on multiple aspects of Genographic fieldwork, in addition to their own proposed and mission‐aligned pilot projects. Openness and transparency within the Consortium are the key values of the project’s research team, and grantees will be expected to abide by this code of conduct.


If you poke through their material they say that the grant will be $25,000 to $50,000. That’s 125 to 250 Geno 2.0 chips. Speaking of which, I sent in a chip about a month ago now. The results should be back soon.

So why was Spencer so keen on me pushing this again? (I’ve mentioned it before) After being at ASHG 2012 I’m shocked at the small sample space of people interested in these sorts of historical genetic questions. I say this because I’ve reviewed/read most of the papers which were presented as posters. I wonder on occasion if I’m missing out on something, but these results indicate no, there are only so many labs doing this sort of work. The last is the key question. This is where “bottom up” non-academic science can do wonders. An Indian group presented a poster at ASHG, and when they told me of the similarities between Iyers and Bengali Brahmins I couldn’t help but admit that “Yes, I know that already, my friend Zack Ajmal came to that conclusion.” If you are an academic you need to go beyond tools and methods and analytic insights which someone with a spare computer and some marginal free time can generate. Academic monopolies on these data are going to be short-lived at best. And all for the good. I’m sick & tired of intellectual rents.

(Republished from Discover/GNXP by permission of author or representative)
 

As most readers know I was at ASHG 2012. I’m going to divide this post in half. First, the generalities of the meeting. And second, specific posters, etc.

Generalities:

- Life Technologies/Ion Torrent apparently hires d-bag bros to represent them at conferences. The poster people were fine, but the guys manning the Ion Torrent Bus were total jackasses if they thought it would be funny/amusing/etc. Human resources acumen is not always a reflection of technological chops, but I sure don’t expect organizational competence if they (HR) thought it was smart to hire guys who thought (the d-bags) it would be amusing to alienate a selection of conference goers at ASHG. Go Affy & Illumina!

- Speaking of sequencing, there were some young companies trying to pitch technologies which will solve the problem of lack of long reads. I’m hopeful, but after the Pacific Biosciences fiasco of the late 2000s, I don’t think there’s a point in putting hopes on any given firm.

- I walked the poster hall, read the titles, and at least skimmed all 3,000+ posters’ abstracts. No surprise that genomics was all over the place. But perhaps a moderate surprise was how big exomes are getting for medically oriented people.

- Speaking of medical/clinical people, I noticed that in their presentations they used the word ‘Caucasian‘ a lot. This was not evident in the pop-gen folks. It shows the influence of bureaucratic nomenclature in modern medicine, as they have taken to using somewhat nonsensical US Census Bureau categories.

- Twitter was a pretty big deal. There were so many interesting sessions that I found myself checking my feed constantly for the #ASHG2012 hashtag. It was also an easy way to figure out who else was at the same session (e.g., in my case, very often Luke Jostins).

- If you could track the patterns of movements of smartphones at the conference it would be interesting to see a network of clustering of individuals. For example, the evolutionary and population genomics posters were bounded by more straight-up informatics (e.g., software to clean your raw sequence data), from which there was bleed over. But right next to the evolution and population genomics sections (and I say genomics rather than genetics, because the latter has been totally subsumed by the former) you had some type of pediatric disease genetics aisles. I wasn’t the only one to have a freak out when I mistakenly kept on moving (i.e., you go from abstruse discussions of the population structure of Ethiopia, to concrete ones about the likely probability of death of a newborn with an autosomal dominant disorder, with photos of said newborn!).

- It was obvious which sessions were more multidisciplinary: just note the “churn” between speakers. People were switching sessions speaker-by-speaker, so if there was a stretch not to their liking, they would opt out.

- Number of questions per talk seemed to follow a power law. Many, many, talks had to have the moderator ask a token question. But there were a few panels where people rushed to the mics, and the moderators had to turn them away (this happened to me a couple of times, though I had the habit of sitting in the middle of aisles so that people wouldn’t have to edge past me, which disadvantaged me).

- 23andMe will supposedly have the new ancestry painting, with many more populations, up by the end of the year at the latest. I’ll believe it when I see it, but the person who was telling me this seemed totally sincere, and I’m hopeful.

- I drank a fair amount some of the nights, and have a lot of business cards from people I don’t remember. But one thing that seems to be emerging is a proliferation of intermediation and b2b services. With the diversity of choices it stands to reason that some firms are stepping into the clutter and attempting to make a profit by matching the two parties at the ends of the transaction. One person who I do recall is Michael Heltzen of BlueSEQ, which he pitches as the “Hotels.com of sequencing.”

- Overall this was a well run conference in terms of logistics. I’ll definitely be at Boston next year!

- Lots of stuff on archaic hominin admixture and selection. Lots.

- Friends don’t let friends use structure when they could use admixture. It seems that most people have switched to the latter, which is fast. But a few groups are still using the former. And they shouldn’t be, because their burn-in and replication parameters are set way low (I use structure for microsatellites) so that it won’t take a thousand years to converge. If you are doing this, why go for the power of Bayesian phylogenetics in the first place?

- Luke Jostins suggested that I looked different in real life from the head shot. I suspect that perhaps Luke has lower powers of perception in this domain. A very drunk member of a well respected lab decided to start yelling my name at a dim after party,* so I can’t look that different (and it isn’t as if there aren’t a lot of brown dudes walking around at these things).

- Konrad Karczewski gave me a “free the data” button which I wore, but there were mixed results when I asked if people were going to release their data sets. Some presenters offered to email me the data, but since I wasn’t flashing my badge I’m curious why they’d offer to even do this, as opposed to releasing it to millions of other strangers!

Specifics:

This section will mostly cover the talks and posters in the evolutionary and population genomics category. I can comment on the talks because I went to them, and on the posters because I looked at them multiple times. One thing to note is that many of the posters and some of the talks were on papers which are already in circulation (preprint or already published). I’m not going to touch on that much. I’ve reviewed/linked to most of those.

- One of the guys involved in fetal whole genome sequencing from last spring stated that the primary cost here is going to be in the sequencing. He’s also confident that they can move the sequencing much further back from 18 weeks (i.e., in terms of sample collection and analysis turnaround).

- There is a lot of talk about structural variants, etc., but for high-throughput sequencing methods we’re still not ‘there.’ I actually went to a CNV talk where the presenter presented some RFLP results! He stated that the reality was that for clinical purposes high-throughput isn’t feasible or accurate enough to distinguish 3 vs. 4 vs. 5 copies.

- I don’t get a lot of the CNV stuff which repeats SNP-results with CNVs. For example, the posters which recapitulated geographical fine-structure with CNV. This was OK for the first pub, but doing it over and over again seems gratuitous.

- Simon Gravel has some very awesome software.

- Luca Pagani is confident about rolloff‘s admixture estimate for Ethiopia. He’s moving to Ethiopian whole genomes now, and plans on doing follow ups on this question (his own methods are in line with rolloff).

- The rumored paper (i.e., I’ve heard about this paper for a few years) which connects Northeast African populations with the Khoisan of southern Africa will finally be published soon. At least that was what I was told…as I noted, this result has been around for a long time, but somehow hasn’t been published. Basically the group has some Cushitic speaking samples from Kenya, and it looks like these are the Ethiopian analog to Andaman Islanders (or as close as you can get).

- We’ll see something on Afrikaner genomics soon enough. I wasn’t told explicitly, but it was pretty obvious.

- The Nielsen Group is still working on high altitude adaptations. They don’t see hard sweeps. Of course I didn’t get confirmation of whether these were old variants, but it looks as if a lot of preliminary stuff did not have the power to detect anything in the first place. As usual they are up to something.

- Speaking of the Nielsen Group, Melissa Wilson Sayres’s work on purifying selection on Y chromosomal lineages was persuasive to me. Basically, effective population size differences (e.g., polygyny) just cannot explain the lower diversity of the Y lineages (they ran simulations). Luckily for the phylogeographers this won’t impact the utility of Y trees (positive selection would, but that’s not what she’s talking about). I’m a little confused whether it was Sayres’ talk or not, but these results may explain the discordance in coalescence between mtDNA and Y lineages (the former has a deeper coalescence).

- Also, Amy Goldberg from Noah Rosenberg’s lab presented some theoretical work that showed that complex demographic history has an impact on the variance, as opposed to the mean, effective population size you might infer for a given sex. Someone from Michael Hammer’s lab started asking me if I liked their research while I was looking at Amy’s poster, and I said sure (I’d blogged it), but her theoretical results also explain some of the weird stuff I’d seen out of their lab.

- Sriram Sankararaman had a poster on Neandertal admixture in modern human lineages. In the broad outlines the Reich lab and the Wall lab seem to agree (along with others, such as Melinda Yang in the Slatkin lab). We’re seeing the convergence of a new orthodoxy/paradigm. And they seem to agree broadly with Graham Coop’s conjecture.

- There was a lot of stuff on East Asian genetics, but nothing too cutting edge. I was kind of disappointed. A massive Y and mtDNA study did suggest two waves of admixture in the Tibetan highland, which a priori seems plausible to me. But the rule-of-thumb I have is not to bet against the Nielsen Group, which remains skeptical. Another paper suggested deep lineages of haplogroup M among the Burmans. This is interesting because the Burmans are presumably culturally somewhat intrusive, supplanting the Mon populations.

- The guy from the Peopling of the British Isles presented. Two points. First, ~40 percent of the ancestry in England proper seems Anglo-Saxon. Second, their clustering method seemed to find many more ‘micro-populations’ along the “Celtic Fringe” and in Scotland. Why? My hunch is that the Anglo-Saxon expansion wasn’t a diffusion process. Rather, the hordes of Hengist and Horsa probably admixed with the local Brythonic Celtic population on the East Anglian shore, and then rapidly expanded. There is a high probability of some later assimilation (there is some suggestion that Alfred the Great’s line were Brythonic nobles who were absorbed into the Anglo-Saxon power structure), but the emergence of a huge Anglo-Saxon/England proper cluster was very evident in the figure displayed. The main opposition to this thesis I can think of is that isolation-by-distance gene flow is very efficacious in the topography of England, but less so in the more rugged borderlands.

- Speaking of isolation-by-distance, an Estonian geneticist claimed to me that the distinction between Estonians and Finns probably has to do with the arrival of the original Finnic populations from the east, and their subsequent separation. While the Estonians engaged in gene flow with the Latvians, they diverged from the Finns across the water, who were more isolated until the Swedes arrived.

- There was a poster (didn’t talk to the presenters) which did whole genome analysis of a South Indian man or two, and indicated that there is evidence that these individuals are basal to all other non-Africans. This is another attempt to reaffirm the possibility of an ancient “southern route” out of Africa. I wasn’t convinced because there wasn’t much detailing of their methods (they pointed to a diversity estimate, but that’s not enough these days).

- Another Indian group confirmed a lot of stuff that Zack has found already, but supplemented it with lots of low caste/tribal samples, which most people lacked. They assert (rightly) that within South Asia there are genetic distances across populations/castes which are analogous to inter-continental differences.

- I am excited by the synthesis of spatial and genetic variation data…but am beginning to realize that this has limitations, because we can’t transpose genetic variation representation onto tesseracts (because we can’t visualize tesseracts). In short, two or three dimensional representations remove important information at the finer-grain. And it’s at the finer-grain that we’re focusing now.

- Apparently Mexicans and Chileans overestimate their European ancestry. The presenters found that 40-45% of the ancestry of their Chilean sample was Amerindian. I asked about sampling, and they admitted this might be an issue. The same applied to their other results. We need thicker data sets here. Basically if it’s a heterogeneous country, you can’t have a pie-graph labelled with that country.

- There was a poster on associating OCA2-HERC2 in Brazilians with hair, eye, and skin color. The evidence for an association of OCA2-HERC2 with skin color in unadmixed Europeans is mixed, but the association seems to show up in this population. Assuming stratification is not a problem (I believe they looked at that genomically), it seems that the effect on skin only shows up when you have a particular pigmentation genetic architecture. It’s a matter of statistics, not biology.

- Speaking of pigment, Mark Shriver had a poster which correlated perceived, apparent, and genomic racial ancestry. Perceived means how you’re perceived by others. Apparent is taking physical traits and averaging them quantitatively (facial features, skin color, etc.). And genomic ancestry is what you know about. Estimating ancestry quanta. The surprising thing is that people seemed to underestimate African ancestry from apparent physical features (looking at the scatter of apparent to genomic ancestry). This goes against folk wisdom, which asserts that “African features are dominant.”

- Lots of corrections of naive usages of Fst in the literature. A poster out of the Price lab suggested using likelihood ratios, and if not possible, Hudson’s Fst. This showed up multiple times in various forms. Fst will not die, but will be reborn!

- Saw a poster which claimed first cousin marriage decreases the expected height of offspring by 3 cm! (this was not in the evolution and pop genomics sections, and I probably should have spent more time looking over complex traits, etc., but there’s only so much you can do)

- More evidence of multiple migrations into New World. Lots of New World genomics. I didn’t talk to these presenters because they were always busy.

- Spencer Wells told me that they’d finally be publishing their paper using their Geno 2.0 results soon. They had really good population coverage, though I wish they’d had the bar plot rotated 90 degrees. I couldn’t read labels too well.

Finally, there was A LOT of software, and A LOT of methods. This is one of the things where I assume over the next decade it will shake out into a few big players. Right now labs are pumping out software to infer ancestry, phase data, etc., and playing up their advantages. This is all good, but at some point the focus will go back to biology, and the software will be the wind beneath its wings. I’m trying to free up time to play with some of the software, though much of it isn’t online yet (the presenters always assured me it would be up soon, but I know how that goes).

* This was not a pleasant experience.

(Republished from Discover/GNXP by permission of author or representative)
 

OK, perhaps I can help with that. Dr. Coop speaks of the collaboration between himself & Dr. Joseph Pickrell, Haldane’s Sieve, which I added to my RSS days ago (and you can see me pushing it to my Pinboard). From the “About”:

As described above, most posts to Haldane’s Sieve will be basic descriptions of relevant preprints, with little to no commentary. All posts will have comment sections where discussion of the papers will be welcome. A second type of post will be detailed comments on a preprint of particular interest to a contributor. These posts could take the style of a journal review, or may simply be some brief comments. We hope they will provide useful feedback to the authors of the preprint. Finally, there will be posts by authors of preprints in which they describe their work and place it in broader context.

We ask the commenters to remember that by submitting articles to preprint servers the authors (often biologists) are taking a somewhat unusual step. Therefore, comments should be phrased in a constructive manner to aid the authors.

It might be helpful if other evolution/genetics bloggers reblog this so we can push it up the Google search results. If you google “Haldane’s Sieve” some of the results are interesting…and not necessarily in a good way. I do feel guilt blogging on stuff my readers can’t get, so the more preprints become acceptable the more we (as in, the general public) can understand about evolution.

(Republished from Discover/GNXP by permission of author or representative)
 

Yesterday I pointed out that David Reich had a moderately dismissive attitude toward the new paper in PNAS, Effect of ancient population structure on the degree of polymorphism shared between modern human populations and ancient hominins. Here’s what Reich said:

…But Reich believes that the discussion would have been different if it had happened in the open. The PNAS paper questioning the Neanderthal admixture addresses issues swirling around two years ago, but not Reich and Slatkin’s latest work. “It’s been an issue for several years. They were right to work on this,” says Reich. But now, “it’s kind of an obsolete paper,” he says.

Here’s what Nick Patterson, Reich’s colleague told me via email:

Ancient structure in Africa was considered when we wrote the Green et al. paper, and we were aware that this could explain D-statistics. But the hypothesis is no longer viable as the major explanation of Neandertal genetics in Eurasia. This was discussed in the recent paper of Yang et al. (MBE, 2012). (Not referenced by the PNAS paper).

A very simple argument, that convinces me, is that the allelic frequency spectrum of Neandertal alleles in Eurasia falls off very quickly. A bottleneck flattens out the spectrum, and it turns out that the Neandertal gene flow has to be placed after the out of Africa bottleneck or the spectrum is much too flat.

The paper on the arXiv from the Reich lab (Sankararaman et al.) is trying to do something much more subtle than this and date the flow. I personally am no longer interested in explaining the introgression as ancient structure. That ship has sailed.

Of course the question of what was the genetic structure of Ancient Africa is quite open, and remains very interesting.

If Nick’s explanation is a bit cryptic for you (he was a cryptographer!), figure 2 from the Yang et al. paper lays it out quite clearly:

Let’s back up for a moment and set the stage. What did they do in the PNAS paper which claims that one can not reject the model that the Eurasian affinity to Neandertals is due to ancient African population structure (i.e., the African ancestors of Eurasians already had a closer affinity to Neandertals, perhaps due to continuous gene flow)? Basically they created an explicit spatial model with a temporal dimension. The authors simulated parameters of gene flow (and lack thereof) as well as bottlenecks, etc., and found that ancient structure easily generated the D-statistic which the original authors of the Neandertal admixture paper relied upon.

So why so dismissive from Reich & Patterson? Because the Yang et al. paper admits this problem, and formulates a way to test alternative scenarios which generate just those D-statistics, but exhibit different demographic histories. What they found in Yang et al. is that a model where a population bottleneck occurs followed by admixture is the best fit to the site frequency spectrum that you see in real populations today. In other words, they also simulated situations where ancient structure generated equivalent D-statistics to admixture, and then furthermore explored scenarios where other population genetic statistics could further prune the alternatives. One could say that the appropriate follow up paper to the PNAS contribution was actually published before it.
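
For readers who have not met the D-statistic before, a minimal sketch of the basic ABBA/BABA counting idea may help. It assumes one sequence each from two modern human populations (P1, P2), an archaic sample (P3), and an outgroup, with alleles coded 0 for ancestral and 1 for derived. The function name, the input format, and the sign convention are illustrative assumptions; published analyses use more careful estimators and block-jackknife standard errors.

```python
def d_statistic(sites):
    """Compute D = (ABBA - BABA) / (ABBA + BABA).

    sites: iterable of (p1, p2, p3, outgroup) alleles, 0 = ancestral, 1 = derived.
    ABBA: P2 shares the derived allele with P3 (pattern 0, 1, 1, 0).
    BABA: P1 shares the derived allele with P3 (pattern 1, 0, 1, 0).
    All other site patterns are uninformative and ignored.
    """
    abba = 0
    baba = 0
    for pattern in sites:
        if tuple(pattern) == (0, 1, 1, 0):
            abba += 1
        elif tuple(pattern) == (1, 0, 1, 0):
            baba += 1
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)


if __name__ == "__main__":
    # Toy data: 6 ABBA sites, 4 BABA sites, 10 uninformative sites.
    toy_sites = [(0, 1, 1, 0)] * 6 + [(1, 0, 1, 0)] * 4 + [(1, 1, 0, 0)] * 10
    print(d_statistic(toy_sites))  # (6 - 4) / (6 + 4) = 0.2
```

A D that differs from zero only says that P3 shares more derived alleles with one population than the other. Both recent admixture and ancient structure can produce that excess, which is why a further statistic such as the site frequency spectrum is needed to tell the scenarios apart.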

The paper on arXiv (to be published in PLoS Genetics) goes much further. Using patterns of the linkage disequilibrium in the genome they produce a date when the admixture occurred. The statistical genetics here is somewhat opaque to the casual reader, so interpretation of these results probably should be conditional on the Yang et al. paper, whose results are more elegant and easy to digest.
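
The core intuition behind dating admixture from linkage disequilibrium can be sketched in a few lines. Recombination breaks up admixture-generated LD, so it decays roughly as exp(-t·d), where t is the number of generations since admixture and d is the genetic distance in Morgans. The toy below fits that decay with a log-linear regression. It is a simplified illustration, not the authors’ actual procedure: the function name, the synthetic data, and the 29-year generation time are all assumptions made for the example.

```python
import math


def estimate_generations(distances_morgans, ld_values):
    """Estimate t from LD(d) ~ A * exp(-t * d) via least squares on log(LD) against d."""
    xs = list(distances_morgans)
    ys = [math.log(value) for value in ld_values]  # requires strictly positive LD values
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope  # the slope of log(LD) against distance is -t


if __name__ == "__main__":
    # Synthetic, noise-free decay curve generated with t = 100 generations (hypothetical).
    distances = [0.001 * i for i in range(1, 21)]  # 0.1 cM to 2 cM, in Morgans
    ld_curve = [0.3 * math.exp(-100 * d) for d in distances]
    generations = estimate_generations(distances, ld_curve)
    years_per_generation = 29  # assumed generation time, purely for illustration
    print(generations, generations * years_per_generation)
```

On real data the curve is noisy and subject to ascertainment effects, which is presumably a large part of why the statistical genetics in the paper is so much more involved than this.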

After all is said and done David Reich’s judgment is not atypical. Several people whom I know personally, and who are deeply immersed in human population genomics, are simply not impressed by the PNAS paper. That happens, and there’s no shame in it. But Reich has a point: a speedier process of publication and review would have saved a lot of people some energy.

Related: Dienekes’ comments.

(Republished from Discover/GNXP by permission of author or representative)
 

Dienekes has summaries up of human-related abstracts of Society for Molecular Biology & Evolution 2012.

1) Remember these are not papers, and some of the abstracts may never become papers, at least in recognizable form.

2) Speaking of which, Estimating a date of mixture of ancestral South Asian populations:


Linguistic and genetic studies have demonstrated that almost all groups in South Asia today descend from a mixture of two highly divergent populations: Ancestral North Indians (ANI) related to Central Asians, Middle Easterners and Europeans, and Ancestral South Indians (ASI) not related to any populations outside the Indian subcontinent. ANI and ASI have been estimated to have diverged from a common ancestor as much as 60,000 years ago, but the date of the ANI-ASI mixture is unknown. Here we analyze data from about 60 South Asian groups to estimate that major ANI-ASI mixture occurred 1,200-4,000 years ago. Some mixture may also be older—beyond the time we can query using admixture linkage disequilibrium—since it is universal throughout the subcontinent: present in every group speaking Indo-European or Dravidian languages, in all caste levels, and in primitive tribes. After the ANI-ASI mixture that occurred within the last four thousand years, a cultural shift led to widespread endogamy, decreasing the rate of additional mixture.

Two comments on this. The ~1,200 estimate for large-scale admixture is nearly impossible to credit. Historically the only group which is a likely candidate for this would be the Jatts of Punjab, who have myths of descent from the last pre-Islamic Central Asian populations which intruded upon the Indian subcontinent. In fact, if 1,200-4,000 represents an interval, the expected value is ~2,600 years ago. Approximately the time of the Buddha. This seems rather too recent to be plausible. But…the authors do note that there may be older admixture events. If the signal they’re picking up is the Indo-Aryan expansion, then that is somewhat plausible, in that it seems that as late as that period large swaths of the eastern Indo-Gangetic plain and much of Central India were in the process of becoming part of greater Aryavarta.

(Republished from Discover/GNXP by permission of author or representative)
 

Over the years one issue that crops up repeatedly in human evolutionary genetics and paleoanthropology (or more precisely, the popular exposition of the topics in the media) is the idea that “population X are the most ancient Y.” X will always refer to a population within a larger set, Y, which is defined by relative marginalization or retention of older cultural folkways. So, for example, I have seen it said that the Andaman Islanders are the “most ancient Asian population.” Why? The standard model for a while now has been that non-Africans derive from a line of Africans which left the ancestral continent 50 to 100 thousand years ago, and began to diversify. Presumably Andaman Islanders have ancestry which goes back to this original dispersion, just as Europeans and Chinese do (revisions which suggest that Aboriginals may have been part of an earlier wave, still put the Andamanese in the second wave). The reason that the Andaman populations are termed ancient is pretty straightforward: they’re Asia’s last hunter-gatherers, literally chucking spears at outsiders. An ancient lifestyle gets conflated with ancient genetics.

This is a much bigger problem with the hunter-gatherers of Africa, the Pygmies, Hadza, and Bushmen. The reason is that these populations are of particular interest because they seem to have diverged from the rest of humanity rather early on. Both Y chromosomes and mtDNA confirmed this, and now autosomal analyses looking across the whole genome are confirming it. In other words, they’re basal to the rest of humanity. I believe this is moderately misleading. With the Bantu Expansion much of African genetic diversity disappeared. The hunter-gatherers seem to be exceptionally long and bare branches on the phylogenetic tree because all their relatives are gone!

But the hunter-gatherers remain, and their genetic material has been collected for scientists to study. A new paper in PLoS Genetics puts the spotlight on Western Pygmies, and their relationship to their Bantu neighbors. Patterns of Ancestry, Signatures of Natural Selection, and Genetic Association with Stature in Western African Pygmies:

Africa is thought to be the location of origin of modern humans within the past 200,000 years and the source of our dispersion across the globe within the past 100,000 years. Africa is also a region of extreme environmental, cultural, linguistic, and phenotypic diversity, and human populations living there show the highest levels of genetic diversity in the world. Yet little is known about the genetic basis of the observed phenotypic variation in Africa or how local adaptation and demography have influenced these patterns in the recent past. Here, we analyze a set of admixing Bantu-speaking agricultural and Western Pygmy hunter-gatherer populations that show extreme differences in stature; Pygmies are ~17 cm shorter on average than their Bantu neighbors and among the shortest populations globally. Our multifaceted approach identified several genomic regions that may have been targets of natural selection and so may harbor variants underlying the unique anatomy and physiology of Western African Pygmies. One region of chromosome three, in particular, harbors strong signals of natural selection, population differentiation, and association with height. This region also contains a significant association with height in Europeans as well as a candidate gene known to regulate growth hormone signaling.

The method here is simple. Previous work already confirmed that the height of a given Pygmy was strongly predicted by the amount of non-Pygmy ancestry they carried within their genome. Now the authors here are focusing on regions of the genome which not only show association with the phenotype in question, but signatures of natural selection. At this point I’m cautious enough about associations and positive results from tests for natural selection to be wary of accepting this at face value, but we have some priors here which should make this plausible. That is, there are strong functional rationales, and it isn’t as if the Pygmies are not distinctive in their height phenotype.

Let’s take the likelihood of natural selection for height as a given. What fascinates me is that the authors suggest that selection post-dates the divergence of the Western and Eastern Pygmy populations. Why does this matter? Because it may give us a better clue as to the nature of the “pygmy” phenotype, which is common among relic hunter-gatherers the world over. The Bushmen, Pygmies, and various “Negritos” of Asia are small. Some have suggested this is an ancestral human type, or a natural adaptation, or an adaptation to the rainforest. On the other hand, the populations of Oceania are not small. To my knowledge the Indians of the Amazon are not the size of Pygmies. To put my own cards on the table I lean toward the proposition that the “pygmoid” body plan emerges when populations are driven to the margins, or are being buffeted by disease and stress. It seems likely now that the closest relatives of the Philippine Negritos are the people of Oceania, most of whom are not small of stature. There are non-Bushmen Khoisan populations who are not small of stature. And, reportedly the isolated Andamanese of Sentinel Island are not of small stature!

The point here is that studying marginalized hunter-gatherers has limits in telling us about the nature of the human ancestors. It may be that Pygmies are in many ways derived in their phenotypes, relatively recent adaptations to contemporary exigencies. The results above even imply that the small stature of these populations may be a byproduct of the genetic correlation between various traits, and selection in one direction resulted in a correlated response in height. I would like to make a modest proposal: simply take these people on their own terms, and stop trying to slot them into a convenient paradigm. I doubt that Pygmies are going to be the great physicists of the 21st century because of their genetic variation (this was floated by Deirdre McCloskey on Dan MacArthur’s blog), nor do I think they are a special window into the very earliest of H. sapiens sapiens. They are who they are.

Addendum: Though I do know that some people would be curious about the evolutionary origins of other traits besides height in African hunter-gatherers.

(Republished from Discover/GNXP by permission of author or representative)
 

The new article in The American Journal of Human Genetics, A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root, is open access, so you should check it out. The discussion gets to the heart of the matter:

Supported by a consensus of many colleagues and after a few years of hesitation, we have reached the conclusion that on the verge of the deep-sequencing revolution…when perhaps tens of thousands of additional complete mtDNA sequences are expected to be generated over the next few years, the principal change we suggest cannot be postponed any longer: an ancestral rather than a “phylogenetically peripheral” and modern mitogenome from Europe should serve as the epicenter of the human mtDNA reference system. Inevitably, the proposed change could raise some temporary inconveniences. For this reason, we provide tables and software to aid data transition.

What we propose is much more than a mere clerical change. We use the Ptolemaian geocentric versus Copernican heliocentric systems as a metaphor. And the metaphor extends further: as the acceptance of the heliocentric system circumvented epicycles in the orbits of planets, switching the mtDNA reference to an ancestral RSRS will end an academically inadmissible conjuncture where virtually all mitochondrial genome sequences are scored in part from derived-to-ancestral states and in part from ancestral-to-derived states. We aim to trigger the radical but necessary change in the way mtDNA mutations are reported relative to their ancestral versus derived status, thus establishing an intellectual cohesiveness with the current consensus of shared common ancestry of all contemporary human mitochondrial genomes.

Note that the problem is not restricted to mtDNA. Indeed, in the much larger perspective of complete nuclear genomes in which comparisons are often currently made relative to modern human reference sequences, often of European origin, it seems worthwhile to begin considering, as valuable alternatives, public reference sequences of ancestral alleles (common in all primates) whereby derived alleles (common to some human populations) would be distinguished.

Perhaps the first generation or so of human molecular evolutionary genetics might be thought of as a “first draft.” A serviceable first draft which rendered in broad strokes the gist of the truth as we understand it, but lacking in some essential details.

On a minor note, there are some theoretical reasons why mtDNA did not yield much evidence for archaic admixture, which is clear in the nuclear genomics (e.g., higher rate of change due to lower effective population size, so more rapid extinction of ancient lineages). But perhaps now that the number of complete mtDNA genomes is increasing in size we might start to see “long branches,” which reflect the inferences generated from the ancient nuclear genomes.

(Republished from Discover/GNXP by permission of author or representative)
 

The face is an important aspect of our phenotype. So important that facial recognition is one of many innate reflexive cognitive competencies. By this, I mean that you can recognize a face in a gestalt manner, just like you can recognize a set of three marbles. You don’t have to think about it in a step-by-step fashion. Particular types of brain injuries can actually result in disablement of this faculty, and a minority of humans seem to lack it altogether at birth (prosopagnosia). That’s why I’ve long been interested in the genetic architecture and evolution of craniofacial traits. I long ago knew the potential range of pigmentation phenotypes for my daughter because both her parents have been genotyped, but when it comes to facial features we’re stuck with the old ‘blending inheritance’ heuristic. The most obvious importance of teasing apart the genetic architecture of craniofacial traits is forensics. It might not put the sketch artist out of a job, but it would be an excellent supplement to problematic eye witness reports.

But it isn’t just forensics. The issue has evolutionary relevance. It looks like, in terms of morphology, our own lineage has had a lot of diversity up until recently. I’m thinking in particular of the ‘archaic’ looking humans recently discovered in China and Nigeria, who seem to have persisted down into the Holocene. More generally, humans as a whole have become more gracile over the last 10,000 years. Why? There are two extreme answers we can look to. First, gracile humans have replaced robust humans. Second, natural selection for gracility has resulted in the in situ evolution of many populations over the last ~10,000 years. An interesting aspect of this is that it looks as if many salient traits have been targets of selection, and therefore evolution and population differentiation.

Here are the top 10 SNPs which deviate from the overall phylogenetic tree of population relationships in the HGDP data set:

 

SNP          Chr   Nearest gene      Phenotype
rs1834640    15    SLC24A5           skin pigmentation
rs260690     2     EDAR              hair morphology
rs10882168   10    CYP26A1/FER1L3    ?
rs4918664    10    CYP26A1/FER1L3    ?
rs2250072    15    SLC24A5           skin pigmentation
rs6583859    10    CYP26A1/FER1L3    ?
rs2384319    2     KIF3C             ?
rs6500380    16    LONP2             ?
rs4497887    2     CNTNAP5           ?
rs9809818    3     FOXP1             ?

There are two things I want to say off the bat. First, a given SNP likely has many phenotypic effects. So the trait that we “see” in terms of its effect may not be the same trait that natural selection “sees.” Second, it is not a surprise that out of the traits that a given variant may affect the physically salient ones stand out; sometimes you do go looking where the light is shining on a dark street. We know that the lighter complexion of East and West Eurasians seems to be due to independent evolutionary events. In other words, they aren’t derived from common ancestry. When it comes to hair form the EDAR locus seems to be responsible for the distinctive characteristics of East Asians, and has been under recent selection.

What does all this have to do with craniofacial traits? Simple: the coarse and “skin deep” traits that physical anthropologists used decades ago to classify human beings have been rather informative to a first approximation of both details of phylogeny and natural selection. I see no reason why craniofacial traits should be any different. Humans have become more gracile, and some human populations seem to have been changing rather rapidly. I am highly skeptical that this is a neutral process. We care a great deal about facial features, and deviation from the norm can be arresting. If there has been change it is either due to population replacement, or selection (it could be a correlated response, or direct selection).

It is with that preamble that I offer up Mark Shriver’s abstract at the Modern Human Genetic Variation symposium:

The genes determining normal-range variation in human faces are arguably some of the most intrinsically interesting and fastest evolving. However, so far, little work has been focused on discovering these genes. Working under the hypothesis that genes causing Mendelian craniofacial dysmorphologies also may be important in determining normal-range facial-feature variation, and that those genes associated with population differences in facial features should have experienced greater levels of evolution (change in allele frequency), we have taken an admixture mapping/selection scan approach to identifying and studying the genes directly affecting facial features. We have applied the methods of automated quasi-landmark analyses, partial least squares regression, and individual genomic ancestry estimates to explore the distribution of facial features across two groups of human populations — West Africans and Europeans. Using three samples of admixed subjects (American; N=159, Brazilian; N=197, and Cape Verdean; N=248) we have modeled facial variation in the parental populations and compared the extent to which estimates of ancestry from the face compare to genomic-ancestry estimates. We also have tested six selection-nominated craniofacial candidate genes for functional effects on facial features using admixture mapping. In objective tests, two of these six genes (FGFR1 and TRPS1) show significant effects on facial features. In addition, human-observer ratings of the similarity between subjects and allele-specific facial morphs show the same effects for these two genes. Additionally, exaggerated allele-specific morphs based on normal-range variation in these genes recapitulates the syndromic facies of the craniofacial dysmorphologies with which they are associated.

I asked Mark about the nature of these genes and the traits. The paper is coming soon, but he told me that he does not think that the genetic architecture of craniofacial traits is going to be as simple or easy to characterize as pigmentation genes. On the other hand, he’s reportedly capturing 35% of the African vs. European difference with his marker set, so that’s not trivial, and some of the individual loci have a strong enough effect that it’s visible by eye! Also, given the preserved extant diversity within populations (pigmentation genes are often disjoint across Africans and Europeans) he believes that the selection events are recent.

(Republished from Discover/GNXP by permission of author or representative)
 

The excellent site io9 has a piece up today which is a fascinating indicator of the nature of popular science publications as a lagging indicator. It is a re-post of a piece published last April, How Mitochondrial Eve connected all humanity and rewrote human evolution. In it you have an encapsulation of a particular period in our understanding of human natural history through evolutionary genetics. Notice for example the focus on uniparentally transmitted lineages, mtDNA and Y chromosomes. And the citations on genealogy date to the middle aughts. The science is mostly correct as far as it goes in the details (or at least it is defensible, last I checked there was still debate as to the validity of the molecular clocks used for Y chromosomal lineages), but it misses the big picture of how we’ve reframed our understanding of the human past over the last few years. The distance between 2011 and 2009 is far greater in this sense than between 2009 and 1999 (or even 2009 and 1989!). The io9 piece is a reflection of the era before the paradigmatic rupture.

We are no longer talking just about African mtDNA Eve and her husband Y chromosomal Adam. I’m going to consciously avoid the term “revolutionize,” because the broad outlines of the old story certainly hold. Rather, as we are wont to do it seems that we became a bit too bold with some of our brush strokes, and elided fascinating and subtle elements of the landscape on the margins. There were Crebs, and other assorted Oogas and Boogas. And the painting is not completed yet. As such we can’t really draw any conclusions as to “what it all means,” aside from the fact that it’s fascinating.

Addendum: Someone in the comments observes in relation to a depiction of Eve in the story that “She’s awfully pale for an East African.” This is true on the merits, but the logic is kind of dumb. Why exactly do we think that people ~150,000 years ago looked anything like modern East Africans? It is very likely that Europeans ~35,000 years ago did not look like Daryl Hannah.

(Republished from Discover/GNXP by permission of author or representative)
 

I have blogged about the genetics of altitude adaptation before. There seem to be three populations in the world which have been subject to very strong natural selection, resulting in physiological differences, in response to the human tendency toward hypoxia at high altitude. Two of them are relatively well known, the Tibetans and the indigenous people of the Andes. The highlanders of Ethiopia have been less well studied, and have not received as much attention. Yet the capital of Ethiopia, Addis Ababa, is nearly 8,000 feet above sea level!

Another interesting aspect to this phenomenon is that it looks like the three populations respond to adaptive pressures differently. Their physiological response varies. And the more recent work in genomics implies that though there are similarities between the Asian and American populations, there are also differences. This illustrates the evolutionary principle of convergence, where different populations approach the same phenotypic optimum, though by somewhat different means. To my knowledge there has not been as much investigation of the African example. Until now. A new provisional paper in Genome Biology is out, Genetic adaptation to high altitude in the Ethiopian highlands:

We highlight several candidate genes for involvement in high-altitude adaptation in Ethiopia, including CBARA1, VAV3, ARNT2 and THRB. Although most of these genes have not been identified in previous studies of high-altitude Tibetan or Andean population samples, two of these genes (THRB and ARNT2) play a role in the HIF-1 pathway, a pathway implicated in previous work reported in Tibetan and Andean studies. These combined results suggest that adaptation to high altitude arose independently due to convergent evolution in high-altitude Amhara populations in Ethiopia.

The main shortcoming of this paper for me is that it does not highlight the evolutionary history of this adaptation. In the paper the authors compared the Amhara (a highland population) to nearby lowland populations, but did not explore the nature of the population structure and how it might have influenced the arc of adaptation. Are these very ancient adaptations? Or new ones? It seems that hominins have been resident in Ethiopia for millions of years. If this is so presumably there have been adaptations to higher elevations from time immemorial. But what if these adaptations are new?

More pointedly the Ethiopians can be modeled as a compound of an Arabian population with an indigenous East African one. If this is a genuine recent admixture event, then one might be able to ascertain via haplotype structure whether the adaptive variants derive from ancient African genetic variation, or whether they’re novel mutations. It seems that this paper is a good first step, but there’s a lot more to see here….

Citation: Genome Biology, doi:10.1186/gb-2012-13-1-r1

Image credit: Wikipedia

(Republished from Discover/GNXP by permission of author or representative)
 

Dienekes and Maju have both commented on a new paper which looked at the likelihood of lactase persistence in Neolithic remains from Spain, but I thought I would comment on it as well. The paper is: Low prevalence of lactase persistence in Neolithic South-West Europe. The location is on the fringes of the modern Basque country, while the time frame is ~3000 BC. Table 3 shows the major result:

Lactase persistence is a dominant trait. That means any individual with at least one copy of the T allele is persistent. As Maju noted a peculiarity here is that the genotypes are not in Hardy-Weinberg Equilibrium. Specifically, there is an excess of homozygotes. Using the SJAPL location as a potentially random mating scenario you should expect ~7 T/C genotypes, not 2. Interestingly the persistent individual in the Longar location is also a homozygote.


HWE makes a few assumptions. For example, no selection, migration, mutation, or assortative mating. Deviation from HWE is suggestive of one of these dynamics. The sample size here is small, but the deviation is not to be dismissed. Recall that lactase persistence has dominant inheritance patterns. If the trait was being positively selected for you would only need one copy. The enrichment of homozygotes is unexpected if selection in situ is occurring here. It cannot be ruled out that one is observing the admixture of two distinct populations. One generation of random mating would generate HWE, but when populations hybridize in realistic scenarios this is not always a plausible assumption. Rather, assortative mating often persists over the generations, slowing down the erosion of population substructure.
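To make the Hardy-Weinberg arithmetic concrete, here is a minimal Python sketch. The genotype counts are hypothetical placeholders chosen for illustration, not the values from the paper's Table 3, and the function names are my own. It derives the allele frequency from the observed genotypes, computes the counts expected under random mating, and sums a simple chi-square statistic.

```python
def hwe_expected(n_tt, n_tc, n_cc):
    """Expected genotype counts under Hardy-Weinberg for a biallelic locus."""
    n = n_tt + n_tc + n_cc
    p = (2 * n_tt + n_tc) / (2 * n)  # frequency of the T allele
    q = 1.0 - p                      # frequency of the C allele
    return p * p * n, 2 * p * q * n, q * q * n


def hwe_chi_square(n_tt, n_tc, n_cc):
    """Chi-square statistic comparing observed counts to HWE expectations."""
    observed = (n_tt, n_tc, n_cc)
    expected = hwe_expected(n_tt, n_tc, n_cc)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts (T/T, T/C, C/C); NOT taken from the paper.
observed = (4, 2, 14)
expected = hwe_expected(*observed)
chi2 = hwe_chi_square(*observed)
print("expected T/T, T/C, C/C:", expected)
print("chi-square:", round(chi2, 3))
```

With these made-up numbers the expected heterozygote count is far above the observed one, which is the pattern an excess of homozygotes produces. With samples this small an exact test would be the safer choice than a chi-square.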

Stepping back from speculation in this case what can we say? First, the LCT locus has a large mutational target. The trait of lactase persistence has arisen multiple times via different mutational events across the Old World. But, there does seem to be one particular variant which is found from Spain to Northern India. There is some circumstantial evidence that the allele had its origin somewhere in Central Eurasia, but currently its modal frequency is in Northern Europe, Scandinavia and Germany. The region in the genome around this mutation is characterized by a very long haplotype. It is one of the most definitive loci as a candidate for natural selection in the human genome. There is now a fair amount of ancient DNA evidence that lactase persistence in Europe is a feature of the last ~5,000 years or so. Among the modern Basques the frequency of the allele is 66 percent.

For me the key issue is teasing apart the role of migration and selection in each specific case. It does not seem to be correct that the frequency of the -13910T LCT allele in Basques and Punjabis is reflective of the frequency of recent common ancestry. That implies that natural selection is at work at this locus. On the other hand, the haplotype which is present in both the Basques and Punjabis is likely to be descended from a common set of individuals, implying that there is a genealogical chain connecting these two very distinct and distant Eurasian populations. Therefore, we can potentially make some inferences about the power of migration in spreading distinctive alleles. Often we partition selection from genealogical information, because selection so often serves to distort the signal. But the genealogical patterns may lie at the heart of the distribution of different natural selective events at the LCT locus.

Overall, I would say that the results from ancient DNA are disordering and clouding simple elegant models. One hopes and presumes that as sample sizes increase in this domain we’ll start to see more clarity as new paradigms crystallize.

Citation: European Journal of Human Genetics, 10.1038/ejhg.2011.254

(Republished from Discover/GNXP by permission of author or representative)
 

In the near future I will be analyzing the genotype of an individual where all four grandparents have been typed. But this got me thinking about my own situation: is there a way I could “reconstruct” my own grandparents? None of them are living. The easiest way to type them would be to obtain tissue samples from hospitals. This is not totally implausible, though in this case these would be Bangladeshi hospitals, so they might not have saved samples or even have a good record of them. Another way would be to extract DNA from the burial site. This is not necessarily palatable. But assuming you did this, if you have access to a forensic lab it might be pretty easy (though I think most forensic labs use VNTRs, rather than SNP chips, so I don’t know if they’d touch every chromosome). I’m not sure that the quality would be optimal for more vanilla typing operations, especially for older samples which are likely to be contaminated with a lot of bacteria.

For me the simplest option is to look at relatives. Each of my grandparents happens to have had siblings, so there are many sets of relatives related to just each of those individuals of interest. I also have many cousins, so pooling all the genotypes together and using the information of a pedigree one could ascertain which chromosomal segments are likely to derive from a particular grandparent. To give a concrete example, my mother has a maternal cousin to whom she is quite close. By typing my mother and her cousin one could infer that the segments shared across the two individuals derive from the common maternal grandparents. Of course there’s a problem that cousins have a coefficient of relatedness of only 1/8th, so there is going to be a lot of information missing. But, if you had lots of cousins you could presumably reconstruct the genotypes far better.
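The 1/8 figure for first cousins falls out of standard path counting: each parent-child link halves the expected sharing, and you sum over every pedigree path that connects the two relatives through a common ancestor. Here is a small Python sketch of that arithmetic. The function name is my own, and it assumes an outbred pedigree with no inbreeding loops.

```python
from fractions import Fraction


def relatedness(path_lengths):
    """Sum (1/2)^L over the pedigree paths linking two relatives.

    Each entry in path_lengths is the number of parent-child links on
    one path that runs through one shared ancestor.
    """
    return sum(Fraction(1, 2) ** length for length in path_lengths)


# First cousins: two paths of four links, one via each shared grandparent.
first_cousins = relatedness([4, 4])
# Full siblings: two paths of two links, one via each parent.
full_siblings = relatedness([2, 2])
# Grandparent and grandchild: a single path of two links.
grandparent_grandchild = relatedness([2])

print(first_cousins, full_siblings, grandparent_grandchild)
```

These are expectations. Because recombination is chunky, the realized sharing between any two actual cousins will scatter around 1/8, which is another reason more cousins help.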

 

But what if you didn’t have any of this? I came up with a crazy idea, and I want to throw it out there to see how crazy it is. The issue from the perspective of you, the individual without grandparental information, is that for either your mother or your father you don’t know which homologous chromosomes come from which parent (your grandparents, their parents). As it happens, everyone has a male parent and a female parent. So if you can assign a chromosomal region as having come from the male, and another as having come from the female, then you can reconstruct some of your grandparents’ genotypes because you know their sexes. How can you make this determination?

Genomic imprinting. This is a phenomenon where genes from a given parent, often of a particular sex, are expressed, while those of the other sex are repressed (often it manifests in terms of methylation or lack of methylation). Therefore, if you have a gene, A, which is usually expressed if inherited from a male parent and repressed if it is inherited from a female parent then the state of that gene within a chromosomal region can be a “tag” for the sex of the parent of origin. With enough of these imprinted genes you can create a mosaic of the genome of the individual in terms of sex of origin. Obviously genomic regions from different sexes are from different parents. If you have enough children of these two parents you should be able to infer the whole genomes of these individuals.

The big reason this probably won’t work is that there just aren’t enough imprinted genes in the human genome. But what do readers think?

(Republished from Discover/GNXP by permission of author or representative)
 


Zinedine Zidane, a Kabyle

There is a new paper out in PLoS Genetics which purports to characterize the ancestry of the populations of northern Africa in greater detail. This is important. The HGDP data set does have a North African population, the Mozabites, but it’s not ideal to represent hundreds of millions of people with just one group. The first author on this new paper is Brenna Henn, who was also first author on another paper with a diverse African data set. Importantly the data was posted online. Unfortunately though most of the populations didn’t have too many markers. This isn’t an issue in and of itself, but it becomes a big deal when trying to combine it with other data sets. If you limit the markers to those which intersect across two data sets you start to thin them down a lot, to the point where they’re not useful. Though the results of the paper are worth talking about, the authors claim that they’ll be putting the data online. This is important because they used a large number of markers, so the intersections will be nice (I can, for example, envisage exploring the relationship between the North Africans and the IBS Iberian sample in the near future).

As for the paper itself, Genomic Ancestry of North Africans Supports Back-to-Africa Migrations:

Proposed migrations between North Africa and neighboring regions have included Paleolithic gene flow from the Near East, an Arabic migration across the whole of North Africa 1,400 years ago (ya), and trans-Saharan transport of slaves from sub-Saharan Africa. Historical records, archaeology, and mitochondrial and Y-chromosome DNA have been marshaled in support of one theory or another, but there is little consensus regarding the overall genetic background of North African populations or their origin and expansion. We characterize the patterns of genetic variation in North Africa using ~730,000 single nucleotide polymorphisms from across the genome for seven populations. We observe two distinct, opposite gradients of ancestry: an east-to-west increase in likely autochthonous North African ancestry and an east-to-west decrease in likely Near Eastern Arabic ancestry. The indigenous North African ancestry may have been more common in Berber populations and appears most closely related to populations outside of Africa, but divergence between Maghrebi peoples and Near Eastern/Europeans likely precedes the Holocene (>12,000 ya). We also find significant signatures of sub-Saharan African ancestry that vary substantially among populations. These sub-Saharan ancestries appear to be a recent introduction into North African populations, dating to about 1,200 years ago in southern Morocco and about 750 years ago into Egypt, possibly reflecting the patterns of the trans-Saharan slave trade that occurred during this period.

The model outlined here is straightforward:

- A population of West Eurasian provenance migrated across the fringe of the southern Mediterranean >10,000 years B.P. (Maghrebi)

- This was subsequently overlain by a West Asian migration (Near Eastern)

- A third major element here seems to be Sub-Saharan African admixture, which these authors claim is rather new (post-Roman)

Two of the methods used will be familiar to readers of this weblog. First, they used ADMIXTURE to generate barplots which fractionate putative ancestral components given K number of components. Second, they used PCA to visualize the largest components of genetic variation within the samples on a plane.


As you “move up” the K’s you note that Maghrebi populations “split” from the Near Eastern reference, the Qataris. This is supported by the PCA, which shows that there is a dimension of variation which separates Near Easterners & Europeans from Maghrebis. The authors note that this dimension is orthogonal to the Sub-Saharan African vs. Eurasian component. That suggests that the putative Maghrebi component is likely to be part of the set of “Out of Africa” populations, rather than an African population which simply experienced continuous gene flow with West Eurasians.

They also estimate Fst, a statistic which partitions genetic variation within and between groups. The value between Sub-Saharan Africans and Europeans is ~0.15 using HGDP SNP data, and between Europeans and East Asians ~0.10. Using the Tuscans and Qataris as European and West Asian references against the North African populations along their east-west cline they estimate Fsts from ~0.03 to ~0.06. The higher end values are from populations which are less admixed with Near Eastern elements, and the colored polygons illustrate the domain generated by ADMIXTURE Fsts across inferred ancestral components. You also see in the chart estimated time of divergence. I won’t get into the assumptions in the model, but the authors do note that ~12,000 years B.P. seems to be the low bound estimate for when the Maghrebis diverged from other West Eurasians. This is important, because it predates agriculture.
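To make the idea of "partitioning variation within and between groups" concrete, here is a minimal sketch of a two-population Fst computed from allele frequencies. It is the textbook heterozygosity-based form (ratio of sums across SNPs, equal weight for each population) and is an assumption of mine for illustration; it is not necessarily the estimator used in the paper, and the function name and inputs are hypothetical.

```python
def fst_two_populations(freqs_a, freqs_b):
    """Heterozygosity-based Fst for two populations, given per-SNP allele frequencies.

    Illustrative only: equal population weights, no sample-size correction.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need a frequency for every SNP")
    sum_between = 0.0  # accumulated (H_total - H_within) across SNPs
    sum_total = 0.0    # accumulated H_total across SNPs
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2.0
        # Expected heterozygosity if the two groups were pooled into one.
        h_total = 2.0 * p_bar * (1.0 - p_bar)
        # Average expected heterozygosity within each group.
        h_within = (2.0 * p_a * (1.0 - p_a) + 2.0 * p_b * (1.0 - p_b)) / 2.0
        sum_between += h_total - h_within
        sum_total += h_total
    if sum_total == 0.0:
        return 0.0  # no variation at all, so nothing to partition
    return sum_between / sum_total
```

When the two groups have identical frequencies the statistic is 0, and when they are fixed for different alleles it is 1. Real estimators also correct for finite sample sizes, which this sketch ignores.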

The final set of methods outlined in this paper looked at ancestry on a more fine-grained genomic scale. To the left you see a plot where each horizontal bar represents an individual’s chromosome 1 (among a set of North Africans). Each color in that bar indicates a component of ancestry (except the black, which are centromeres). This sort of information is important, because saying someone is 50% X and 50% Y summarizes information to the point of eliding it. An individual who is a first generation product of a Chinese-European marriage is going to have the same ancestral proportions as someone who is a Uyghur for those respective populations. But a fine-scale mapping of the genomic ancestry would look very different, because the history of the admixture is very different.
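The contrast between a first-generation hybrid and an old admixed population can be sketched with a toy simulation. The assumptions here are mine, not the paper's: a single pulse of admixture, ancestry breakpoints arriving as a Poisson process at a rate of (generations - 1) per Morgan, and each segment drawing its ancestry independently. The function name, the 100-generation figure, and the 2.8 Morgan chromosome length are all illustrative.

```python
import random


def simulate_ancestry_tracts(generations, fraction_a, length_morgans, rng):
    """Return [(start, end, label), ...] for one haploid chromosome copy.

    Toy single-pulse model: breakpoints accumulate at (generations - 1) per
    Morgan, so a first-generation hybrid (generations = 1) has none.
    """
    rate = generations - 1
    breakpoints = []
    position = 0.0
    while rate > 0:
        # Distance to the next breakpoint is exponentially distributed.
        position += rng.expovariate(rate)
        if position >= length_morgans:
            break
        breakpoints.append(position)
    edges = [0.0] + breakpoints + [length_morgans]
    tracts = []
    for start, end in zip(edges, edges[1:]):
        label = "A" if rng.random() < fraction_a else "B"
        if tracts and tracts[-1][2] == label:
            # Merge with the previous segment when the ancestry is unchanged.
            tracts[-1] = (tracts[-1][0], end, label)
        else:
            tracts.append((start, end, label))
    return tracts


rng = random.Random(0)
first_generation = simulate_ancestry_tracts(1, 0.5, 2.8, rng)
old_admixture = simulate_ancestry_tracts(100, 0.5, 2.8, rng)
print(len(first_generation), len(old_admixture))
```

Both simulated chromosomes come from a 50/50 mixture, but the first-generation copy is one unbroken tract while the old-admixture copy is chopped into many short ones. That difference in tract lengths is the kind of signal that fine-scale ancestry methods exploit.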

There are many inferences in the paper which I won’t address. Rather, let me focus on this one assertion:

After accounting for putative recent admixture (Figure 1), the indigenous Maghrebi component (k-based) is estimated to have diverged from Near Eastern/Europeans between 18–38 Kya (Figure 3), under a range of Ne and k values. We hence suggest that the ancestral Maghrebi population separated from Near Eastern/Europeans prior to the Holocene, and that the Maghrebi populations do not represent a large-scale demic diffusion of agropastoralists from the Near East.

This is not implausible on the face of it. The component of ancestry modal in the Mozabite HGDP sample tends to have a relatively high Fst in relation to other West Eurasian groups. I had wondered if this was due to ancient Sub-Saharan African admixture which had produced a particular stabilized hybrid, but these results indicate that the component is no closer to Sub-Saharan Africans than other West Eurasians are. What confuses me, and makes me skeptical, is the range of divergence times which different papers are producing, which seem somewhat implausible taken together.

There are papers which posit that East Asians separated from Europeans ~25,000 years B.P. This is in the same range as the divergence between Maghrebis and West Eurasians, but the Maghrebi genetic distance (Fst) is about 1/2 as great. Also, these sets of results which generate a “bunching” together of the separation of many extant non-African lineages in the 20-40,000 year range imply very rapid differentiation after the “Out of Africa” event, if that event did occur ~50,000 years ago (at least for most Eurasians, even assuming a revised model whereby Australian Aboriginals derive from an earlier wave). One at a time any given divergence estimate may be broadly plausible, but the literature is just not particularly coherent on this matter, and it often seems archaeologically implausible.
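The tension between similar divergence dates and very different Fst values can be put in rough numbers. Under a simple drift-only model (two populations split cleanly, constant effective size Ne in each, no later gene flow), Fst is approximately 1 - exp(-T / 2Ne), which inverts to T = -2 Ne ln(1 - Fst) generations. This is a textbook approximation that I am assuming for illustration, not the paper's full model; the function name, the Ne value, and the 25-year generation time are hypothetical.

```python
import math


def divergence_time_years(fst, effective_size, years_per_generation=25.0):
    """Rough split time from Fst under pure drift: T = -2 * Ne * ln(1 - Fst)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must be in [0, 1)")
    generations = -2.0 * effective_size * math.log(1.0 - fst)
    return generations * years_per_generation


# Same assumed Ne for both comparisons; only Fst differs.
assumed_ne = 5000
maghrebi_vs_west_eurasian = divergence_time_years(0.05, assumed_ne)
european_vs_east_asian = divergence_time_years(0.10, assumed_ne)
print(maghrebi_vs_west_eurasian / european_vs_east_asian)
```

With the same Ne, halving Fst roughly halves the implied divergence time. So two splits dated to the same period but differing twofold in Fst imply either quite different effective population sizes or a history that the simple model does not capture, such as later gene flow.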

Citation: Henn BM, Botigué LR, Gravel S, Wang W, Brisbin A, et al. (2012) Genomic Ancestry of North Africans Supports Back-to-Africa Migrations. PLoS Genet 8(1): e1002397. doi:10.1371/journal.pgen.1002397

Image Credit: Raphaël Labbé

(Republished from Discover/GNXP by permission of author or representative)
 


Hominin increase in cranial capacity, courtesy of Luke Jostins


A few years ago a statistical geneticist at Cambridge’s Sanger Institute, Luke Jostins, posted the chart above using data from fossils on cranial capacity of hominins (the human lineage). As you can see there was a gradual increase in cranial capacity until ~250,000 years before the present, and then a more rapid increase. I should also note that from what I know about the empirical data, mean human cranial capacity peaked around the Last Glacial Maximum. Our brains have been shrinking, even relative to our body sizes (we’re not as large as we were during the Ice Age). But that’s neither here nor there. In the comments Jostins observes:

The data above includes all known Homo skulls, but none of the results change if you exclude the 24 Neandertals. In fact, you see the same results if you exclude Sapiens but keep Neandertals; the trends are pan-Homo, and aren’t confined to a specific lineage….


In other words: the secular increase in cranial capacity for our lineage extends millions of years back into the past, and also shifts laterally to “side-branches” (with our specific terminal node, H. sapiens sapiens, as a reference). This is why I often contend as an aside that humanity was to some extent inevitable. By humanity I do not mean H. sapiens sapiens, the descendants of a subset of African hominins who flourished ~100,000 years before the present, but intelligent and cultural hominins who would inevitably construct a technological civilization. The parallel trends across the different distinct branches of the hominin family tree which Luke Jostins observed indicated to me that our lineage was not special, but simply first. That is, if African hominins were exterminated by aliens ~100,000 years before the present, at some point something akin to H. sapiens sapiens in creativity and rapidity of cultural production would eventually arise (in all likelihood later, but possibly earlier!).

This does not mean that I think humanity was inevitable upon earth. For most of the history of this planet life was unicellular. I do not find it implausible that life on earth may have reached its “sell by” date due to astronomical events before the emergence of complex organisms (in fact, from what I have heard the end of life is going to occur ~1 billion years into the future due to the persistent increase in the energy output of Sol, not ~4 billion years in the future when Sol turns into a red giant). But, once complex organisms arose it does seem that further complexity was inevitable. This was Richard Dawkins’ case in The Ancestor’s Tale based simply on the descriptive record. But did the emergence of complex organisms necessarily entail the evolution of a technological species? I don’t think so. It took 500 million years for that to occur (it does not seem that coal resources formed hundreds of millions of years ago were tapped before humans). Given enough time obviously a technological species would evolve (e.g., extend the time of evaluation to 1 trillion years), but note that the earth has only ~5 billion years. Homo arrived on the scene in the last 20% of that interval.

Here I am positing at a minimum two not excessively likely or inevitable events over a 5 billion year time span which would lead to a hyper-technological and cultural species:

- The emergence of multicellular life

- The emergence of a lineage with the propensities of Homo

Once Homo evolved and expanded outside of Africa I suspect that something of the form of a technological civilization became inevitable on this planet. We see parallelism in our own short post-Pleistocene epoch. Multiple human societies shifted from hunter-gatherers to agriculturalists over the past 10,000 years. The experience of the New World civilizations in particular illustrates that human universal tendencies are real. Not only were “game changing” cultural forms such as agriculture and literacy invented independently during the Holocene, but they were not invented during earlier interglacials (at least in all likelihood).


Khufu, Necho, Augustus and Napoleon

Why not? Well, consider the cultural torpidity of Paleolithic toolkits, which might persist for hundreds of thousands of years! I suspect some of this is due to biology. But even over the Holocene we do perceive that cultural change has proceeded at a more rapid clip as time has progressed (i.e., at a minimum cultural change has been accelerating, and it may be that the rate of acceleration itself is increasing!). Consider that the civilization of ancient Egypt spanned at least 2,000 years. Though there are clear differences, the continuity between Old Kingdom Egypt and the last dynasties before the Assyrian and Persian conquests is very obvious to us, and would be obvious to ancient Egyptians. In contrast, 2,000 years separate us from Augustan Rome. The continuities here are clear as well (e.g., the Roman alphabet), but the cultural change is also clear (if you wish to argue that the early modern and modern period are sui generis, the 1,500 year interval from Augustan Rome to the Neo-Classical Renaissance would still be a stark contrast when compared against an ancient Egyptian reference*, despite the latter’s aping of the forms of the former).

So far I have focused on the vertical dimension of time. But there is also the lateral dimension, of cross-fertilization across the branches of the hominin family tree. The admixture of a Neanderthal element into non-Africans has started to become widely accepted recently, thanks to the confluence of archaeology and genomics in the field of ancient DNA. Even if one rejects the viability of Neanderthal admixture, the solution to the conundrum of these results must still entail stepping away from a simple model of recent exclusive origin of humans from a small African population. There are also hints of admixture with other archaic lineages on the Pacific fringe, and within Africa.

Until recently it was common to posit that modern humans, our own lineage, had some special genius which allowed it to sweep the field and extinguish our cousins. The qualitative result of Luke Jostins’ plot was known: other hominin lineages also exhibited encephalization. In fact, it was a curious fact that Neanderthals on average had larger cranial capacities than anatomically modern humans. But the reality remained that we replaced them, ergo, we must have a special genius. Until the lack of distinction between Neanderthals and modern humans on loci implicated in the necessary (if not sufficient) competency of language became apparent, that trait was a prime candidate for what made “us” special. But now I put “us” in quotation marks. The data do point to an overwhelming descent from an African or near-African population for non-Africans over the past 100,000 years. But the “archaic admixture” is not trivial. What was they are us, and we have become what they might have been.

For over two centuries there has been a debate in the West between monogenesis and polygenesis. The former is the position that humankind derives from one single pair or population (the former a straightforward recapitulation of the standard Abrahamic model). The latter is the position that different races of humans derive from different proto-humans, or, for the Christian polygenists, that only Europeans descend from Adam and Eve (the other races being “non-Adamic”). Echoes of this conflict persist down to the present era. Many of the earlier partisans of “Out of Africa” have claimed that the proponents of multiregionalism were latter-day polygenists (not totally without justification in some cases).

But the conflict between monogenism and polygenism is not the appropriate frame for what is being unveiled by reality before our eyes. What we see in the creation of modern humanity is a monogenic base inflected with the flavors of polygenism. Modern humans descend, by and large, from an expansion of an African population over the past 200,000 years. But on the margins there are other strands and filaments of ancestry which tie disparate populations back to lineages which branched off far earlier from the main trunk: at a minimum hundreds of thousands, and perhaps on the order of 1 million, years before our own age. Today genomics avails us of the statistical power to extract these discordant signals from the fluid “Out of Africa” narrative, but I would not be surprised if in the near future we stumble upon more and more “long branches” of less noteworthy quantity. Admixture is likely to be an old and persistent story in the hominin lineage, with only the most recent substantial bouts of separation and hybridization being of notice and curiosity at this moment in time.

What does all this mean? And why have I juxtaposed deep time natural history across the tree of life with inferences of relatively recent paleoanthropology? Let’s start with two propositions:

- Technological civilization, an outward manifestation of radically complex sentience, is not inevitable, though it is probable given certain preconditions (I believe that the existence of Homo increased its probability to ~1.0 over a reasonable time period)

- Radically complex sentience is not the monopoly of a particular exclusive lineage which accrues its genius from a particular specific forebear

John Farrell has pointed out the possible issues that the Roman Catholic church may have with the new model of human origins. But the Catholic church is but a reflection of a more general human strain of thought. Descent-groups, whether real or fictive, loom large in the human imagination. The evolutionary rationale for this is not too hard to explain, but we co-opt the importance of kinship in many different domains. Like evolution, human cultural forms simply take what is already present, and retrofit and modify elements to taste.

So why are humans special? And why do humans have inalienable rights? Many of us may not agree with the proposition that we are the descendants of Adam and Eve, and therefore we were granted the divine grace of eternal souls. But a hint of this logic can be found in the assumptions of many thinkers who do not agree with the propositions of the Roman Catholic church. Recently I listened to Sherry Turkle arguing against a reliance on “robot companions” which are able to exhibit the verisimilitude of human emotions for those who may be lacking in companionship (e.g., the aged and infirm). Though Turkle’s arguments were not without foundation, some of them were of the form that “they are not us, they are not real, we are real. And that matters.” This is certainly true now, but will it always be? Who is this “they” and this “we”? And what does “real” mean? Are emotions a mysterious human quality, which will remain outside of the grasp of those who do not descend from Adam, literal or metaphorical?

If there arises a point where non-human sentience is a reality, do they have the same rights as we? Though the difference is radical in terms of quantity, to some extent I think we know the answer: they are human by the way they are, not by the way their ancestors were. The “taint” of admixture with diverse lineages across the present human tree of life has not resulted in an updating of our understanding of human rights. That is because the idea that we are all the children of Adam, or the descendants of mitochondrial Eve, is a post facto justification for our understanding of what the rights of humanity are, and what humanity is. And what it is is a particular ecological niche, a way of being, not beings who descend in a line of biological relationship from a particular person or persons.

* The cultural fundamentals of Old Kingdom Egypt arguably persisted in a living fossil form in the temple at Philae down to the 6th century A.D.! Therefore, a 3,500-year lineage of literate continuity.

Image credits: all public domain images from Wikipedia

(Republished from Discover/GNXP by permission of author or representative)
 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"