The Pith: Natural selection is a quick & dirty operator. When faced with novel environments it can react rapidly, bringing both the good and the bad. The key to successful adaptation is not perfection, but being better than the alternatives. This may mean that many contemporary diseases are side effects of past evolutionary genetic compromises.
The above is a figure from a recent paper which just came out in Molecular Biology and Evolution, Crohn’s disease and genetic hitchhiking at IBD5. You have probably heard of Crohn’s disease; hundreds of thousands of Americans are afflicted with it. It’s an inflammatory bowel ailment, and it can be debilitating even in very young people. Its prevalence also varies quite a bit by population. Why? It could be something in the environment (e.g., a different diet), genetic predisposition, or some combination of the two. What the figure above purports to illustrate is a correlation between Crohn’s disease and the expansion of the agricultural lifestyle.
But don’t get overexcited, Paleos! There are many moving parts to this story, and I need to back up to the beginning. The tens of thousands of genes which you inherited from your parents are embedded within the genome, arranged one after another along your chromosomes. For the purposes of conceptualizing evolutionary dynamics such as natural selection or random genetic drift, focusing on a single gene is useful; it has the power to illustrate some basic and elementary principles. But sometimes you need to take a more synoptic view and look at genes in their broader context. In this post I’ll set aside epistasis (gene-gene interaction), whether molecular or statistical. Rather, let’s just consider the static landscape of the genome, where genes are concrete physical entities with a particular spatial relationship to other genes, upstream or downstream along the chromosome. These physical and statistical associations of genes can form a de facto supergene through linkage, and the combinations of their variants form haplotypes, sets of markers inherited together across small stretches of the genome. But recall that these associations are counterbalanced by genetic recombination, which breaks up physical sequences by exchanging segments between the paired maternal and paternal copies of each chromosome.
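A minimal sketch of how recombination erodes an association between two linked loci. It assumes an idealized, large, randomly mating population with no selection, where the standard result is that linkage disequilibrium D shrinks by a factor of (1 - r) every generation, r being the recombination fraction between the two loci. The starting D, the recombination fractions, and the 25-year generation time are illustrative choices, not values from the paper.

```python
"""Decay of linkage disequilibrium under recombination (illustrative sketch).

Assumes an idealized, infinitely large, randomly mating population with no
selection or drift, so that D_t = D_0 * (1 - r) ** t.
"""
import math


def ld_after(d0: float, r: float, generations: int) -> float:
    """Linkage disequilibrium remaining after `generations` of recombination."""
    return d0 * (1.0 - r) ** generations


def ld_half_life(r: float) -> float:
    """Generations until D falls to half of its starting value (requires 0 < r < 1)."""
    return math.log(0.5) / math.log(1.0 - r)


if __name__ == "__main__":
    # Illustrative recombination fractions, roughly 1 cM, 0.1 cM and 0.01 cM apart.
    for r in (0.01, 0.001, 0.0001):
        # 400 generations is about 10,000 years if a generation is ~25 years (assumption).
        remaining = ld_after(0.25, r, 400)
        print(f"r={r}: D after 400 generations = {remaining:.4f}, "
              f"half-life = {ld_half_life(r):.0f} generations")
```

Tightly linked markers (small r) keep their association for thousands of generations, which is why a haplotype can persist as a recognizable block long after it first arose.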
This is what we think happened. Between 5 and 10 thousand years before the present there lived an individual who carried a dominant mutation which allowed lactase production to persist into adulthood. Only one copy of the lactase persistent allele is needed for lactose tolerance. That’s why populations such as Denmark’s, where the persistent allele is present at frequencies of 80-90%, have nearly universal tolerance. Under Hardy-Weinberg equilibrium the recessive, non-persistent trait would be expressed at frequencies of only 1-4% (the square of the 10-20% frequency of the minor allele). Going back to the individual with the mutant copy, if one considers a scenario where lactase persistence would be highly beneficial (this is not hard to imagine), then the frequency of that mutant would rise rapidly. It would “sweep” through the population. Because the trait is dominant, on average half of the children of the original mutant (with a partner lacking the allele) would carry the allele and express the trait, while half would not. Over the generations that one original copy could multiply rapidly within the population through positive selection and intermarriage.
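The arithmetic above, and the way a favored dominant allele climbs in frequency, can be sketched as follows. This assumes random mating at a single two-allele locus and an idealized infinite population (no drift, so the early risk of a new mutation simply being lost by chance is ignored); `p` is the frequency of the dominant persistence allele and `s` is a hypothetical fitness advantage shared by all carriers.

```python
"""Hardy-Weinberg phenotype frequencies and selection on a dominant allele (sketch)."""


def phenotype_frequencies(p: float) -> dict[str, float]:
    """Genotype/phenotype frequencies under Hardy-Weinberg for dominant allele frequency p."""
    q = 1.0 - p
    return {
        "persistent_homozygotes": p * p,
        "heterozygotes": 2 * p * q,
        "non_persistent": q * q,        # the only genotype lacking the dominant allele
        "lactose_tolerant": 1.0 - q * q,
    }


def next_generation(p: float, s: float) -> float:
    """One generation of selection: AA and Aa have fitness 1 + s, aa has fitness 1."""
    q = 1.0 - p
    mean_fitness = 1.0 + s * (1.0 - q * q)
    # Allele frequency after selection: (p^2 + pq)(1 + s) / mean fitness = p(1 + s) / mean fitness.
    return p * (1.0 + s) / mean_fitness


def generations_to_reach(p0: float, target: float, s: float) -> int:
    """Deterministic number of generations for the allele to go from p0 to target."""
    if s <= 0 or not 0.0 < p0 < target < 1.0:
        raise ValueError("need s > 0 and 0 < p0 < target < 1")
    p, t = p0, 0
    while p < target:
        p = next_generation(p, s)
        t += 1
    return t


if __name__ == "__main__":
    for p in (0.8, 0.9):
        freqs = phenotype_frequencies(p)
        print(f"p={p}: non-persistent {freqs['non_persistent']:.2%}, "
              f"tolerant {freqs['lactose_tolerant']:.2%}")
    # A single new copy in a hypothetical population of 10,000 people (20,000 gene copies).
    print(generations_to_reach(1 / 20_000, 0.5, s=0.05))
```

The first loop reproduces the 1-4% figure for allele frequencies of 80-90%; the second call shows that, under these assumptions, a modest advantage can carry a single copy to majority status within a few hundred generations.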
But it’s not just the functionally relevant genetic variant which would proliferate. The lactase persistent allele would be embedded among a host of other genetic variants along the stretch of chromosome on which it sat. As the lactase persistent allele rose rapidly in frequency in a selective sweep, its neighbors would hitchhike along. The extent of the hitchhiking would depend on distance from the positively selected variant and on the speed of the sweep, which itself would presumably depend upon the strength of selection. All of this together explains the very long haplotype around LCT in Northern Europeans: 5 to 10 thousand years ago a relatively large genomic segment of an individual who carried a lactase persistent allele was driven up in frequency very rapidly because of adaptation to new conditions. Not only did that individual’s functionally relevant variant, the target of selection, sweep nearly to fixation in some Northern European populations, but many adjacent variants also rose in frequency, to a degree that declines with distance from the focal variant. In other words, natural selection in this case targeted one specific functional variant near LCT, but as a side effect it also reorganized a whole swath of the genomic structure of Northern European populations.
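To see why a faster sweep drags a longer stretch of chromosome along, here is a rough numerical sketch. The simplifying assumptions are for illustration only: a deterministic, haploid-style selection trajectory starting from a single copy in a hypothetical population of N individuals, and a neutral site at recombination fraction r that escapes the sweeping haplotype only when it recombines with a chromosome lacking the favored allele (recombining back onto it later is ignored).

```python
"""How far hitchhiking extends around a sweeping allele (rough sketch)."""


def sweep_trajectory(s: float, n: int, end: float = 0.99) -> list[float]:
    """Deterministic frequency path of the favored allele from 1/(2N) up to `end`."""
    if s <= 0:
        raise ValueError("s must be positive for the allele to sweep")
    p = 1.0 / (2 * n)
    path = [p]
    while p < end:
        p = p * (1 + s) / (1 + s * p)  # logistic (haploid-style) selection step
        path.append(p)
    return path


def prob_hitchhike(r: float, s: float, n: int) -> float:
    """Probability that a neutral site at recombination fraction r stays on the sweeping haplotype."""
    prob = 1.0
    for p in sweep_trajectory(s, n):
        # Escape requires a recombination (prob r) with a non-sweeping chromosome (prob 1 - p).
        prob *= 1.0 - r * (1.0 - p)
    return prob


if __name__ == "__main__":
    n = 10_000  # hypothetical population size
    for s in (0.01, 0.05):
        for r in (0.0001, 0.001, 0.01):
            print(f"s={s}, r={r}: P(hitchhike) = {prob_hitchhike(r, s, n):.3f}")
```

Under these assumptions, for any fixed distance the stronger selection coefficient yields a higher chance of hitchhiking, because a shorter sweep leaves less time for recombination to peel neighbors off the selected haplotype.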
What does that have to do with Crohn’s disease and agriculture? Crohn’s disease may be a modification of the LCT story in a genomic sense, and the trigger of that modification may have been agriculture. Before I go any further, let me post the paper’s abstract:
IBD5 (inflammatory bowel disease 5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn’s disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn’s disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency, and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European…populations…To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1868 Crohn’s disease cases and 5550 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease…In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three gene…as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn’s disease patients. We observed a 72% increase in gene expression of IRF1 among Crohn’s disease patients (p=0.0006) and no significant difference in expression of OCTN1….
It’s all a mouthful, so let’s review. IBD5 is a 250 kilobase haplotype implicated in Crohn’s disease: a long segment of associated markers which also correlates with having the illness. This does not imply that the whole segment is causally connected with Crohn’s disease. But there are two genes which have been pegged as likely candidates, IRF1 and IL5. Finally, there’s another gene, OCTN1, which is statistically associated with Crohn’s disease but has little evidence, beyond that statistical association, of a causal role. Rather, it encodes a transporter of the antioxidant ergothioneine, and the 503F allele of OCTN1 is a gain-of-function variant that increases ergothioneine absorption. Interestingly, the authors observe that OCTN1 is centrally located within the haplotype. In other words, you can think of the genome upstream and downstream of OCTN1, extending out across the haplotype, as two wings or fringes of this gene.
The IBD5 haplotype is the broader landscape. IRF1, IL5, and OCTN1 are general features embedded within this landscape. 503F is a specific feature, in that it is one flavor of OCTN1. Crohn’s disease is another phenomenon associated with this genomic landscape, but it belongs to a different class or category. It is correlated in particular with IBD5 haplotypes carrying the 503F allele. The main aim of this paper is to tease apart all these associations. What the authors found is that Crohn’s disease is not associated with the 503F allele on haplotypes where recombination has separated OCTN1 from IRF1 or IL5; the association shows up only on 503F haplotypes where those three genes remain together. In these cases genetic recombination has broken the link which couples 503F with the putative risk variants at the neighboring genes. The architecture of the genomic landscape has thus obscured the more specific causal chain which leads to an increased risk for Crohn’s disease.
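The logic of this stratified comparison can be sketched with odds ratios from 2x2 tables. The counts below are invented purely for illustration (they are not the paper’s data), and the authors’ actual haplotype association test may be set up differently. Each class of 503F haplotype is compared against the same hypothetical reference set of haplotypes lacking 503F, with a standard log-scale (Woolf) confidence interval.

```python
"""Stratifying a haplotype association by recombinant class (illustrative sketch)."""
import math
from dataclasses import dataclass


@dataclass
class Table2x2:
    case_exposed: int       # case haplotypes in the tested 503F class
    case_reference: int     # case haplotypes in the reference (non-503F) class
    control_exposed: int    # control haplotypes in the tested 503F class
    control_reference: int  # control haplotypes in the reference class

    def odds_ratio(self) -> float:
        return (self.case_exposed * self.control_reference) / (
            self.case_reference * self.control_exposed)

    def ci95(self) -> tuple[float, float]:
        """Woolf (log-scale) 95% confidence interval for the odds ratio."""
        se = math.sqrt(1 / self.case_exposed + 1 / self.case_reference
                       + 1 / self.control_exposed + 1 / self.control_reference)
        log_or = math.log(self.odds_ratio())
        return math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se)


if __name__ == "__main__":
    # Hypothetical counts only: 503F haplotypes split by whether recombination
    # has separated OCTN1 from IRF1/IL5, against a shared reference class.
    strata = {
        "503F, recombinant": Table2x2(100, 800, 325, 2600),
        "503F, non-recombinant": Table2x2(200, 800, 400, 2600),
    }
    for label, table in strata.items():
        lo, hi = table.ci95()
        print(f"{label}: OR = {table.odds_ratio():.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With these made-up numbers the recombinant class shows no excess in cases (odds ratio of 1) while the intact class does, which is the shape of the pattern the paper reports.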
So what happened? The authors posit that the 503F allele was favored by selection at some point in the past, and that it was flanked by variants of IRF1 or IL5 which elevate the risk of Crohn’s disease. All things being equal it is best not to carry a risk for this disease, but all things are not equal. If the selective pressure on the target, 503F, was strong enough, then the downside of its coming as a “total package” with some deleterious alleles would be outweighed. Over a long enough evolutionary time the deleterious alleles would be purged by negative selection, because recombination does eventually break apart such associations, but a great deal of reality consists of the interval between beginnings and ends.
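The “total package” logic can be illustrated with a toy two-locus model. This is a hypothetical sketch, not the authors’ analysis: allele A is favored with selection coefficient s (standing in for 503F), allele B carries a small cost c (standing in for a linked risk variant), and the two start out together on a single rare haplotype. Selection acts on haploid genotypes with multiplicative fitnesses, followed by random mating and recombination at rate r; all parameter values are arbitrary.

```python
"""A beneficial allele dragging a linked deleterious allele (toy two-locus model)."""


def step(x: list[float], s: float, c: float, r: float) -> list[float]:
    """One generation for haplotype frequencies in the order AB, Ab, aB, ab."""
    w = [(1 + s) * (1 - c), 1 + s, 1 - c, 1.0]  # multiplicative fitnesses
    mean_w = sum(wi * xi for wi, xi in zip(w, x))
    xs = [wi * xi / mean_w for wi, xi in zip(w, x)]  # frequencies after selection
    d = xs[0] * xs[3] - xs[1] * xs[2]  # linkage disequilibrium after selection
    # Recombination moves r * D of frequency from coupling (AB, ab) to repulsion (Ab, aB).
    return [xs[0] - r * d, xs[1] + r * d, xs[2] + r * d, xs[3] - r * d]


def simulate(s=0.05, c=0.005, r=0.001, start=0.001, generations=5000):
    """Per-generation frequencies of A and B, starting from a rare AB haplotype."""
    x = [start, 0.0, 0.0, 1.0 - start]
    freq_a, freq_b = [], []
    for _ in range(generations):
        freq_a.append(x[0] + x[1])
        freq_b.append(x[0] + x[2])
        x = step(x, s, c, r)
    return freq_a, freq_b


if __name__ == "__main__":
    freq_a, freq_b = simulate()
    peak = max(range(len(freq_b)), key=freq_b.__getitem__)
    print(f"B peaks at {freq_b[peak]:.2f} in generation {peak}, "
          f"when A is at {freq_a[peak]:.2f}")
    print(f"After {len(freq_b)} generations B is at {freq_b[-1]:.6f}")
```

With these settings the costly B allele rides A up to a substantial frequency before recombination and negative selection eventually purge it, the transient “in between” state described above.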
To infer that 503F was the target of natural selection the authors used iHS (the integrated haplotype score), a haplotype-based test for detecting this phenomenon. The test is most sensitive to selective sweeps caught in midstream, or to those which stall short of fixation because of balancing dynamics. One implication is that the allele targeted by selection will tend to be at modest frequencies at best, and that is the case here. From the supplements, here is a list of populations with the percentage of the selected allele (some populations appear twice because they were sampled in different data sets):
| Population | 503F alleles (N) | 503L alleles (N) | 503F (%) |
|---|---|---|---|
| Pedi (northern Sotho) | 0 | 22 | 0% |
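To make the iHS statistic a little more concrete, here is a toy sketch of its core ingredients. Extended haplotype homozygosity (EHH) measures how often pairs of haplotypes carrying the same allele at a core SNP remain identical out to a given marker; the area under the EHH curve (iHH) is computed separately for the ancestral and derived alleles, and the unstandardized score is ln(iHH_ancestral / iHH_derived), so strongly negative values mean the derived allele sits on unusually long haplotypes. The input format (phased haplotypes as strings of '0' ancestral and '1' derived alleles), the use of physical positions, and the 0.05 EHH cutoff are assumptions for illustration; real analyses use a genetic map, handle missing data, and standardize scores within allele frequency bins genome-wide.

```python
"""Toy EHH and unstandardized iHS calculation (illustrative sketch)."""
import math
from collections import Counter


def ehh(haps: list[str], core: int, end: int) -> float:
    """Extended haplotype homozygosity between the core SNP and SNP `end`."""
    n = len(haps)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    groups = Counter(h[lo:hi + 1] for h in haps)  # identical stretches core..end
    pairs = sum(k * (k - 1) / 2 for k in groups.values())
    return pairs / (n * (n - 1) / 2)


def ihh(haps: list[str], positions: list[float], core: int, cutoff: float = 0.05) -> float:
    """Integrate EHH (trapezoid rule) outward on both sides until it drops below `cutoff`."""
    total = 0.0
    for direction in (-1, 1):
        prev_ehh, prev_pos = 1.0, positions[core]
        j = core + direction
        while 0 <= j < len(positions):
            cur = ehh(haps, core, j)
            total += (prev_ehh + cur) / 2 * abs(positions[j] - prev_pos)
            if cur < cutoff:
                break
            prev_ehh, prev_pos = cur, positions[j]
            j += direction
    return total


def unstandardized_ihs(haps: list[str], positions: list[float], core: int) -> float:
    """ln(iHH of ancestral-allele haplotypes / iHH of derived-allele haplotypes)."""
    ancestral = [h for h in haps if h[core] == "0"]
    derived = [h for h in haps if h[core] == "1"]
    return math.log(ihh(ancestral, positions, core) / ihh(derived, positions, core))


if __name__ == "__main__":
    # Made-up toy data: three identical derived haplotypes, four diverse ancestral ones.
    haps = ["1111111"] * 3 + ["0100010", "1010101", "0010110", "1100001"]
    positions = [0, 1000, 2000, 3000, 4000, 5000, 6000]
    print(unstandardized_ihs(haps, positions, core=3))
```

In this contrived example the derived allele’s haplotypes stay identical across the whole window while the ancestral ones break up within a few markers, so the score comes out negative, the signature a partial sweep is expected to leave.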
From these population data the authors infer that the 503F allele was selected for its enhanced transport of ergothioneine, which is lacking in many of the plant foodstuffs that became prominent with the Neolithic Revolution. In other words, on this model Crohn’s disease risk is a byproduct of an adaptation to nutrient deficiencies brought on by agricultural monocultures. The main problem this thesis seems to have is that many Middle Eastern populations, which have long been agricultural, don’t have a high frequency of the 503F allele. This doesn’t mean that the selective model proposed here is impossible, but it does suggest that if this was a genuine adaptation, Middle Eastern populations would presumably have their own distinctive variants.
I think this is a great paper, though I’m not confident about the conclusion. Agriculture was obviously one of the major selective pressures on the human genome. Even if some of the preliminary tests of natural selection from the mid-2000s don’t hold up, because they tend to confuse genuine targets of natural selection with spurious positives, I’m rather confident that genes associated in some way with agriculture are going to be enriched for functional constraint and adaptive sculpting.
Citation: Chad D. Huff, David Witherspoon, Yuhua Zhang, Chandler Gatenbee, Lee A. Denson, Subra Kugathasan, Hakon Hakonarson, April Whiting, Chad Davis, Wilfred Wu, Jinchuan Xing, W. Scott Watkins, Mike Bamshad, Jonathan P. Bradfield, Kazima Bulayeva, Tatum S. Simonson, Lynn B. Jorde, and Stephen L. Guthery. Crohn’s disease and genetic hitchhiking at IBD5. Mol Biol Evol, doi:10.1093/molbev/msr151.