The Pith: The human X chromosome is subject to stronger natural selection than the autosomes, resulting in less genetic diversity. But the differences in X chromosome diversity across human populations seem to be more a function of population history than of differences in the power of natural selection across those populations.
In the past few years researchers have found that the human X chromosome exhibits less genetic diversity than the autosomes, the non-sex chromosomes. Why? On the face of it this might seem inexplicable, but a few basic structural factors derived from the architecture of the human genome suggest themselves.
First, in males the X chromosome is hemizygous, rendering it more exposed to selection. This is rather straightforward once you move beyond the jargon. Human males have only one copy of each gene on the X chromosome, because they have only one X chromosome. In contrast, females have two X chromosomes. This is why X-linked recessive traits in humans show up disproportionately in males. For X-linked genes, women can be unaffected carriers of many diseases because they have two copies of each gene, and one copy may be functional. A male, by contrast, has either a functional or a nonfunctional version of the gene, with no second copy to mask it. This is different from the case on the autosome, where both males and females have two copies of every gene.
This structural divergence matters for the selective dynamics acting on the X chromosome vs. the autosome. On the autosome, recessive alleles pay a far smaller fitness cost than they do on the X chromosome, where they are much more often exposed to natural selection via males. Elsewhere in the genome, recessive alleles pay the cost of their shortcomings only when an individual carries two copies, that is, in homozygotes. A simple quasi-formal example illustrates the process.
Imagine a population with an allele that is recessive and sharply reduces fitness when it is expressed. Assume that the allele in question, q, is present at a frequency of 0.50. All the other functional alleles are classed together as p, and are also at 0.50. In the next generation Hardy-Weinberg equilibrium implies that 75% of individuals would not express the recessive trait, while 25% (the qq homozygotes) would.* But for every copy of the deleterious allele which is expressed and so exposed to natural selection, there’s another copy which is “masked” in a heterozygous individual with one good copy, and so evades natural selection. As natural selection decreases the frequency of the deleterious allele, fewer and fewer of its copies will be found in homozygotes expressing the trait, and so the power of selection to remove the allele declines along with the allele’s own frequency. When the frequency of the deleterious allele is ~0.01, only about 1 out of 100 copies will be found in a homozygote exposed to natural selection. In this way genetic diversity can be preserved even in the form of deleterious alleles, as many low-frequency recessive variants.
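The arithmetic above can be checked with a short Python sketch. It assumes Hardy-Weinberg proportions and a fully recessive allele; the function names are my own, chosen for illustration.

```python
def hardy_weinberg(q: float) -> dict[str, float]:
    """Genotype frequencies for a two-allele locus under Hardy-Weinberg equilibrium."""
    p = 1.0 - q
    return {"pp": p * p, "pq": 2 * p * q, "qq": q * q}


def autosomal_exposed_fraction(q: float) -> float:
    """Fraction of copies of a recessive allele that sit in homozygotes (and so are expressed).

    Homozygotes hold 2 * q^2 copies per two allele slots, while all carriers together
    hold 2 * q^2 + 2 * p * q = 2 * q, so the fraction works out to q itself.
    """
    genotypes = hardy_weinberg(q)
    copies_in_homozygotes = 2 * genotypes["qq"]
    total_copies = 2 * genotypes["qq"] + genotypes["pq"]
    return copies_in_homozygotes / total_copies


for q in (0.50, 0.10, 0.01):
    g = hardy_weinberg(q)
    print(f"q={q:.2f}  affected={g['qq']:.4f}  exposed share of copies={autosomal_exposed_fraction(q):.2f}")
```

The exposed share simply tracks q, which is why selection against a rare recessive allele becomes so weak.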
The situation differs on the X chromosome. In females the model above still holds: the trait is expressed only if a female has two copies of the faulty gene. But with a 1:1 sex ratio, one out of every three X chromosomes in a human population is carried by a male. That means any deleterious allele on an X will bear its full cost whenever it happens to be in a male, which it is with probability 1/3. So by my calculation, when the deleterious allele is at a frequency of ~0.01 on the X chromosome, about 1 out of 3 copies will be expressed (1/3 from males, plus only 2/3 × 0.01 from homozygous females), overwhelmingly in males. That is a roughly 34-fold difference between the X and the autosome in the number of copies of a deleterious allele exposed to natural selection, all due to the hemizygosity of males.
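The same bookkeeping for the X chromosome can be sketched as follows, assuming a 1:1 sex ratio, Hardy-Weinberg proportions among females, and full recessivity. The `male_share` parameter is my own addition so the sex ratio can be varied.

```python
def x_linked_exposed_fraction(q: float, male_share: float = 1 / 3) -> float:
    """Fraction of copies of a recessive X-linked allele exposed to selection.

    male_share: fraction of X chromosomes carried by males (1/3 with a 1:1 sex ratio).
    Every copy in a hemizygous male is expressed; a copy in a female is expressed
    only if her other X carries the same allele (probability q under Hardy-Weinberg).
    """
    female_share = 1.0 - male_share
    return male_share * 1.0 + female_share * q


q = 0.01
x_exposed = x_linked_exposed_fraction(q)
autosome_exposed = q  # on an autosome a copy is exposed only in a homozygote: probability q
print(f"X: {x_exposed:.3f}  autosome: {autosome_exposed:.3f}  ratio: {x_exposed / autosome_exposed:.0f}")
print(f"share of exposed X copies carried by males: {(1 / 3) / x_exposed:.2f}")
```

With q = 0.01 this gives 0.34 for the X against 0.01 for the autosome, and about 98% of the exposed X-linked copies are in males.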
But selection doesn’t act only negatively, by purging bad gene copies from the population. Positive selection can also reduce diversity, via a selective sweep. How and why this happens is rather straightforward. Imagine that a very beneficial mutation at a single base pair arises in one individual. To keep things simple, imagine that it is dominant and the individual is a heterozygote. About 50 percent of that individual’s offspring inherit the favored mutation, and because they are much fitter than the population average they in turn leave more offspring of their own. And so on. This favored variant can spread very fast; lactose tolerance is a good concrete case. When I say the favored variant spreads, I’m actually talking about one gene copy from one person which starts to increase in frequency because of its adaptive value. But recall that a single base pair is embedded within the genome, and that chromosomal regions are generally passed on together from parent to offspring. It’s quite often a package deal. When a favored allele emerges it enables the “hitchhiking” of nearby variants which have no selective advantage, except that they luckily sit next to a very beneficial allele (think of them as the gene’s “posse” or entourage). Of course genetic recombination breaks apart these associations over time, but this process takes generations. Until then what you see is the proliferation of a particular genomic segment along with the favored allele embedded in it. By straightforward logic, when a whole segment with its associated alleles rises in frequency, aggregate genetic diversity decreases, as variation is swept aside.
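A rough sketch of the two ingredients of a sweep, assuming a deterministic one-locus diploid model (drift ignored) with a fully dominant beneficial allele. The linkage part is a deliberately crude approximation: `(1 - r) ** t` treats every crossover as breaking the association, which overstates the decay late in a sweep, when most partner chromosomes already carry the same haplotype. All names and parameter values are hypothetical.

```python
def sweep_trajectory(p0: float, s: float, target: float = 0.99, max_gen: int = 100_000) -> list[float]:
    """Frequency trajectory of a dominant beneficial allele A.

    Fitnesses: AA = Aa = 1 + s, aa = 1. Uses the standard selection recursion
    p' = p * (p * w_AA + q * w_Aa) / w_bar, with no drift or mutation.
    """
    w_AA, w_Aa, w_aa = 1.0 + s, 1.0 + s, 1.0
    freqs = [p0]
    p = p0
    while p < target and len(freqs) <= max_gen:
        q = 1.0 - p
        w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
        p = p * (p * w_AA + q * w_Aa) / w_bar
        freqs.append(p)
    return freqs


def unbroken_linkage(r: float, generations: int) -> float:
    """Crude chance a neutral site at recombination fraction r stays on the original
    haplotype for t generations: (1 - r) ** t."""
    return (1.0 - r) ** generations


N = 10_000  # hypothetical number of diploid individuals
t = len(sweep_trajectory(p0=1 / (2 * N), s=0.05)) - 1
print(f"generations from a single copy to 99%: {t}")
for r in (1e-5, 1e-4, 1e-3):
    print(f"r={r:g}: still on the original haplotype ~{unbroken_linkage(r, t):.2f}")
```

Tightly linked sites (small r) are carried along almost intact, which is the entourage effect described above.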
And yet evolution is not simply natural selection. There are two processes which have nothing to do with selection as such, but which might reduce genetic variation. The motor on which both of these phenomena turn is random genetic drift, which is stronger the smaller the population. As drift’s power to shift allele frequencies from generation to generation increases, so does its power to drive alleles extinct, since an allele is lost for good once it hits the zero-frequency boundary. This is why populations which have gone through bottlenecks are so homogeneous; drift has squeezed most of the variation out of the gene pool by capriciously favoring some alleles and eliminating most of the rest.
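A toy neutral Wright-Fisher simulation illustrates the point. This is my own sketch, not anything from the paper: drift only, no mutation, selection, or migration, and the population sizes are hypothetical.

```python
import random


def wright_fisher(n_copies: int, p0: float, generations: int, rng: random.Random) -> float:
    """Neutral Wright-Fisher drift: each generation, n_copies gene copies are drawn
    at random (with replacement) from the previous generation. Returns the final frequency."""
    p = p0
    for _ in range(generations):
        if p == 0.0 or p == 1.0:  # absorbing boundaries: allele lost or fixed for good
            break
        k = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = k / n_copies
    return p


def fraction_absorbed(n_copies: int, generations: int, replicates: int = 50, seed: int = 1) -> float:
    """Share of replicate populations in which a 50/50 polymorphism has been lost or fixed."""
    rng = random.Random(seed)
    finals = [wright_fisher(n_copies, 0.5, generations, rng) for _ in range(replicates)]
    return sum(1 for p in finals if p in (0.0, 1.0)) / replicates


for n in (20, 200, 1000):
    share = fraction_absorbed(n, generations=100)
    print(f"{n:>5} gene copies: {share:.2f} of replicates monomorphic after 100 generations")
```

With no new mutation to replenish it, every replicate eventually ends at 0 or 1; the smaller the population, the sooner that happens.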
The dynamics relevant to this specific case are differences between male and female effective population sizes, and large fluctuations in long-term effective population size. To explain reduced X chromosomal diversity through the first, one would have to posit a lower female effective population size than male effective population size. This would affect the diversity of the X relative to the autosome because the X spends 2/3 of its time in females, while an autosome spends half its time in each sex. So if females have a lower effective population size than males, the X chromosome is buffeted by greater stochastic forces than the autosome. More generally, the X chromosome has a lower effective population size even with a balanced sex ratio, because for every 4 copies of an autosome there are only 3 X chromosomes. This smaller effective population size makes the X more sensitive to bottlenecks and the like, one consequence of which is reduced genetic diversity.
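The standard textbook formulas for effective population size with unequal numbers of breeding males (Nm) and females (Nf) make this concrete. The sketch below is mine, not the paper’s, and assumes random mating, equal mutation rates on the X and autosomes, and equilibrium, so that neutral diversity scales with effective size.

```python
def ne_autosomal(n_males: float, n_females: float) -> float:
    """Wright's effective size for an autosomal locus with unequal breeding sexes."""
    return 4 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males: float, n_females: float) -> float:
    """Effective size for an X-linked locus, with males hemizygous."""
    return 9 * n_males * n_females / (4 * n_males + 2 * n_females)


def expected_x_to_a_diversity(n_males: float, n_females: float) -> float:
    """Neutral expectation for X/autosome diversity: the ratio of effective sizes."""
    return ne_x_linked(n_males, n_females) / ne_autosomal(n_males, n_females)


for nm, nf in [(1000, 1000), (2000, 1000), (1000, 2000)]:  # hypothetical breeding numbers
    print(f"Nm={nm}, Nf={nf}: expected X/A = {expected_x_to_a_diversity(nm, nf):.3f}")
```

With equal numbers the expectation is the familiar 0.75; it drops to 0.675 when breeding males outnumber females two to one and rises to about 0.84 in the reverse case.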
All the above is important to keep in mind when reading a new report in Nature Genetics on the balance between selection and drift in reducing variation on the X chromosome, both overall and across populations. The latter point refers to the fact that Africans seem to exhibit less relative reduction of variation on the X chromosome than non-Africans. First, the abstract of the paper, Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing:
The ratio of genetic diversity on chromosome X to that on the autosomes is sensitive to both natural selection and demography. On the basis of whole-genome sequences of 69 females, we report that whereas this ratio increases with genetic distance from genes across populations, it is lower in Europeans than in West Africans independent of proximity to genes. This relative reduction is most parsimoniously explained by differences in demographic history without the need to invoke natural selection.
This research is part of the trend I’ve alluded to toward looking at whole-genome sequences. Remember, many of the 1 million SNP papers focus only on a preselected set of known variants (polymorphisms) scattered across the 3 billion base pairs. These variants are especially informative, but they miss a lot of the genome. Additionally, there is a statistical problem of ascertainment bias in how the variants were chosen, because they’re usually tuned toward one population, Europeans (different populations have somewhat different variants across the genome). The takeaway is that the time is nearly here when we can look at the genome at its most precise and fine-grained scale, rather than through approximations, whether one locus or 1 million SNPs.
With this broad canvas in mind: if there’s one thing you’ve read about the genome, it’s that much of it is not functional. It doesn’t code for proteins. There are zones of the genome which are intergenic, between genes. Natural selection generally targets functional regions, not intergenic ones. If natural selection is the primary force driving the pattern we see here, then differences should manifest between genic and intergenic regions, since selection plays a much larger role in the former than the latter, both in constraining variation and in increasing the frequency of favored alleles.
The figure below has four panels. Every panel has an x-axis defined by distance from a gene, increasing from left to right. So the leftmost points can be thought of as genic, and the rightmost as intergenic. The left panels show Europeans, and the right panels Africans. More precisely, they display results from whole-genome sequences of 36 West African Yoruba and 33 European American females. The top row shows raw nucleotide diversities for the autosomes and the X, and the bottom row shows the ratio of diversity between the two genomic classes (X vs. autosome), each as a function of distance.
In molecular evolutionary genetics it is often useful to take neutrality as the null hypothesis. Basically that means assuming that selection is not a main driver of variation, which is instead a function of random forces such as mutation and drift. When one sees deviation from neutrality, one considers the effect of natural selection and the possibility of adaptation. You see clear evidence for natural selection here: genetic diversity on the X chromosome has a much stronger relationship to distance from genes than diversity on the autosome does. This matters because, as you recall, the X chromosome is much more brutally sculpted by natural selection on a priori grounds: disfavored alleles are pruned more efficiently, while favored recessive alleles are less handicapped by being masked in heterozygotes, since in males they are always expressed. The pattern above is entirely in keeping with that model.
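The diversity measure in the top panels is nucleotide diversity, π: the average number of differences per site between two randomly chosen sequences. A minimal sketch, assuming a clean alignment with no missing data or indels (real whole-genome analyses must also handle missing calls and genotyping error); the toy sequences are invented.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of pairwise differences per site (pi) for aligned, equal-length sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Invented toy windows; an X/A ratio is pi in X-linked windows over pi in autosomal ones.
x_window = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC"]
a_window = ["ACGTACGTAC", "ACGTTCGTAC", "ACGAACGTAC"]
print(nucleotide_diversity(x_window) / nucleotide_diversity(a_window))
```

Computing this separately in bins of distance from the nearest gene gives curves of the kind shown in the figure.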
So now we’ve seen that a closer whole-genome examination of these samples implies that the X vs. autosome difference in diversity is not just a function of neutral forces, but may have been driven by natural selection. But there’s a second part of the phenomenon: the disjunction is usually starker in non-Africans. Does this imply that non-Africans have been subject to more natural selection? The authors explored this question in a clean and elegant manner: they compared ratios of ratios as a function of distance from genes. By this, I mean that they computed the ratio of diversity between the X and the autosome within each population, and then took the ratio of that value between Europeans and Africans. Unlike the panels above, the figure to the left shows no trend as a function of genetic distance. What does this tell us? If natural selection were more efficacious in Europeans than in Africans, then the difference in diversity between the two populations should be larger near genes, because that is where the power of selection is most felt. Instead, although the European-African difference in X vs. autosome diversity is real, it is constant with respect to distance from genes.
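The logic of the ratio-of-ratios test can be sketched with invented numbers (these are not the paper’s data). A demographic effect that removes a fixed share of diversity everywhere leaves the ratio flat across distance bins, while an extra reduction confined to gene-proximal bins bends it.

```python
def x_to_a(pi_x: list[float], pi_a: list[float]) -> list[float]:
    """Per-bin X/A diversity ratio."""
    return [x / a for x, a in zip(pi_x, pi_a)]


def ratio_of_ratios(eur_x: list[float], eur_a: list[float],
                    afr_x: list[float], afr_a: list[float]) -> list[float]:
    """Per-bin (X/A in Europeans) / (X/A in Africans)."""
    return [e / f for e, f in zip(x_to_a(eur_x, eur_a), x_to_a(afr_x, afr_a))]


# Invented toy diversities in four bins, from nearest to farthest from genes.
afr_a = [1.0, 1.1, 1.2, 1.2]
afr_x = [0.6, 0.75, 0.9, 0.95]

# Demographic scenario: Europeans lose a fixed share everywhere, a larger share on the X.
eur_a = [a * 0.8 for a in afr_a]
eur_x = [x * 0.7 for x in afr_x]
print([round(r, 3) for r in ratio_of_ratios(eur_x, eur_a, afr_x, afr_a)])  # the same value in every bin

# Selection scenario: an extra X reduction only in gene-proximal bins.
extra = [0.8, 0.9, 1.0, 1.0]
eur_x_sel = [x * 0.7 * e for x, e in zip(afr_x, extra)]
print([round(r, 3) for r in ratio_of_ratios(eur_x_sel, eur_a, afr_x, afr_a)])  # lower near genes
```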
This suggests that the difference between Africans and Europeans is driven by demography and not by adaptation (positive selection) or functional constraint (negative or purifying selection). Random evolutionary forces don’t see genic or intergenic regions. They’re random, and blind or neutral to functional import. Unlike selection, their impact is genome-wide, just as the inter-population differences we see here are.
In this case what happened? Going back to the beginning, there were two specific possibilities: sex-biased migration and greater fluctuation in effective population size among non-Africans. The latter model is entirely consistent with an “Out of Africa” scenario in which non-Africans derive from a small ancestral population which left Africa. This is the great “Out of Africa” bottleneck, which seems to be a consistent finding among human molecular evolutionists. Because the X chromosome has a somewhat smaller effective population size, it would presumably have been more affected by the homogenizing force of this bottleneck.
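How much harder a bottleneck hits the X can be sketched with the standard expectation for neutral heterozygosity under drift, assuming no new mutation and an X-linked effective size of 3/4 the autosomal one (balanced sex ratio). The bottleneck size and duration are hypothetical. The harmonic mean helper reflects the standard result that long-term effective size under fluctuating population size is roughly the harmonic mean, dominated by the smallest values.

```python
def retained_heterozygosity(ne_by_generation: list[float]) -> float:
    """Expected fraction of neutral heterozygosity left after drift (no new mutation):
    the product over generations of (1 - 1 / (2 * Ne))."""
    h = 1.0
    for ne in ne_by_generation:
        h *= 1.0 - 1.0 / (2.0 * ne)
    return h


def harmonic_mean(values: list[float]) -> float:
    """Approximate long-term effective size when population size fluctuates."""
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical history: 100 generations at an autosomal Ne of 200 (a bottleneck).
bottleneck_a = [200.0] * 100
bottleneck_x = [0.75 * ne for ne in bottleneck_a]  # X-linked Ne is 3/4 of autosomal
h_a = retained_heterozygosity(bottleneck_a)
h_x = retained_heterozygosity(bottleneck_x)
print(f"autosome keeps {h_a:.3f}, X keeps {h_x:.3f}; X/A ratio multiplied by {h_x / h_a:.3f}")
print(f"long-term Ne with one bottleneck generation in ten: {harmonic_mean([10_000.0] * 9 + [200.0]):.0f}")
```

Because the X loses a larger share of its diversity at every generation of small size, a bottleneck pushes the X/A ratio down uniformly along the genome, with no special dependence on distance from genes.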
The first option, though, is intriguing, if peculiar. What if there were multiple “Out of Africa” pulses which consisted disproportionately of groups of young males? This would have enriched the genetic diversity of non-Africans on the autosome more than on the X chromosome, because the males would bring only one X chromosome for every two copies of each autosome. I think the bottleneck model is more plausible, but I’m not going to dismiss this scenario out of hand. We live in interesting and strange times when it comes to the origin of modern humans.
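One way to see the asymmetry is to track how much ancestry such a pulse contributes to each chromosome class. The sketch uses standard admixture bookkeeping: if a fraction of fathers and a fraction of mothers come from a new source, the autosomes get the simple average while the X weights mothers twice, since 2/3 of X copies pass through females. The 30% figure is invented for illustration.

```python
def autosomal_share(male_contrib: float, female_contrib: float) -> float:
    """Share of autosomal ancestry from a source, given the fractions of fathers
    (male_contrib) and mothers (female_contrib) drawn from that source."""
    return (male_contrib + female_contrib) / 2


def x_share(male_contrib: float, female_contrib: float) -> float:
    """Share of X-linked ancestry, averaged over both sexes: mothers count twice."""
    return (male_contrib + 2 * female_contrib) / 3


# Hypothetical all-male pulse: 30% of fathers, and no mothers, come from the migrant group.
m, f = 0.30, 0.0
print(f"autosomes {autosomal_share(m, f):.2f}, X {x_share(m, f):.2f}")
```

Here the autosomes receive 15% of their ancestry from the migrants and the X only 10%, so any diversity the migrants bring is diluted on the X relative to the autosome.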
* $p^2 + 2pq + q^2 = 1 = 0.50^2 + 2(0.50)(0.50) + 0.50^2$
Citation: Gottipati, Srikanth, Arbiza, Leonardo, Siepel, Adam, Clark, Andrew G, & Keinan, Alon (2011). Analyses of X-linked and autosomal genetic variation in population-scale whole genome sequencing. Nature Genetics. DOI: 10.1038/ng.877