The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Medical genetics


The Pith: Natural selection is a quick & dirty operator. When subject to novel environments it can react rapidly, bringing both the good and the bad. The key toward successful adaptation is not perfection, but being better than the alternatives. This may mean that many contemporary diseases are side effects of past evolutionary genetic compromises.

The above is a figure from a paper which just came out in Molecular Biology and Evolution, Crohn’s disease and genetic hitchhiking at IBD5. You have probably heard about Crohn’s disease before; hundreds of thousands of Americans are afflicted with it. It’s an inflammatory bowel ailment, and it can be debilitating even to very young people. The prevalence also varies quite a bit by population. Why? It could be something in the environment (e.g., different diet) or genetic predisposition, or some combination. What the figure above purports to illustrate is the correlation between Crohn’s disease and the expansion of the agricultural lifestyle.

But don’t get overexcited Paleos! There are many moving parts to this story, and I need to back up to the beginning. The tens of thousands of genes which you inherited from your parents are embedded within the genome and aligned in a set of sequences, one after the other. On the one hand for the purposes of conceptualizing evolutionary dynamics, such as natural selection or random genetic drift, focusing on a single gene is useful. It has power to illustrate some basic and elementary principles. But sometimes you need to take a more synoptic view, and look at genes in their broader context. In this post I’ll avoid molecular or statistical epistasis, gene-gene interaction. Rather, let’s just consider the static landscape of the genome, where genes are physical concrete entities which are embedded in a particular spatial relationship to other genes, upstream or downstream in the genetic code. These physical or statistical associations of genes can form a de facto supergene through linkage, and their variants combine to form haplotypes, sequences of markers across small stretches of the genome. But recall that these associations are counter-balanced by genetic recombination, which tears apart physical sequences and sows them to the opposite DNA strand.

The big picture that the above highlights is the fact that evolutionary dynamics operate not just on the gene, but also upon the local genetic neighborhood. Therefore, when we talk about selection upon a gene, we need to recall that this has consequences for that gene’s neighbors. Let’s use a concrete and real example. Northern Europeans tend to have very long haplotypes around the LCT gene, which encodes the production of lactase. Functionally this haplotype has embedded within it a variant which allows for continued production of lactase as an adult, and therefore the ability to extract all the calories from milk beyond childhood in the form of lactose sugar. The molecular genetic details of how this happens do not concern us. Instead, let’s consider why LCT is characterized by a very long haplotype.

This is what we think happened. Between 5 and 10 thousand years before the present there lived an individual who carried a dominant genetic mutation which allowed for the persistent production of lactase into adulthood. Only one copy of the lactase persistent allele is needed for lactose tolerance. That’s why populations such as Denmark’s, where the persistent allele is present at frequencies of 80-90%, have nearly universal tolerance. Under Hardy-Weinberg equilibrium the recessive trait, lactose intolerance, would be expressed at frequencies of only 1-4% (the square of the 10-20% frequency of the minor allele). Going back to the individual with the mutant copy, if one considers a scenario where lactase persistence would be highly beneficial (this is not hard to imagine) then the frequency of that mutant would rapidly rise. It would “sweep” through the population. As it has a dominant mode of expression half of the children of the original mutant would express the trait and carry the allele, while half would not. Over the generations that one original copy could replicate rapidly within a population due to positive selection and intermarriage.
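The 1-4% figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes random mating (Hardy-Weinberg proportions) and a fully dominant persistence allele; the function name is hypothetical and exists only for illustration.

```python
def dominant_trait_frequency(allele_freq):
    """Fraction of people showing a dominant trait under Hardy-Weinberg.

    allele_freq is the frequency p of the dominant allele. Only the q^2
    homozygotes for the other allele (q = 1 - p) lack the trait.
    """
    q = 1.0 - allele_freq
    return 1.0 - q * q


# Persistence-allele frequencies quoted in the text for Denmark (80-90%).
for p in (0.80, 0.90):
    tolerant = dominant_trait_frequency(p)
    print(f"p = {p:.2f}: tolerant = {tolerant:.0%}, intolerant = {1 - tolerant:.0%}")
```

At allele frequencies of 80% and 90% the intolerant fraction comes out at about 4% and 1% respectively, which is why tolerance looks nearly universal even though the allele has not reached fixation.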

But it’s not just the functionally relevant genetic variant which would proliferate. The lactase persistent allele would be embedded within the context of a host of other genetic variants across the sequence of the DNA strand in which it was located. As the lactase persistent allele rose rapidly in frequency in a selective sweep its neighbors would hitchhike along. The extent of the hitchhiking would be conditional upon distance from the positively selected variant and the speed of the sweep, which itself would presumably depend upon the strength of selection. All of this together explains the very long haplotype around LCT in Northern Europeans: 5 to 10 thousand years ago a relatively large genomic segment of an individual who carried a lactase persistent allele was driven up in frequency very rapidly because of adaptation to new conditions. Not only did that particular individual’s functionally relevant variant, the target of selection, sweep nearly to fixation in some Northern European populations, but many adjacent variants also rose in frequency, to a degree that declines with their distance from the focal variant. In other words, natural selection in this case was about one specific functional unit within LCT, but as a side effect it also reorganized a whole swath of the total population genome structure of Northern Europeans.

What does that have to do with Crohn’s disease and agriculture? Crohn’s disease may be a modification of the LCT story in a genomic sense, and the trigger of that modification may have been agriculture. Before I go any further, let me post the paper’s abstract:

IBD5 (inflammatory bowel disease 5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn’s disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn’s disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency, and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European…populations…To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1868 Crohn’s disease cases and 5550 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease…In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three gene…as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn’s disease patients. We observed a 72% increase in gene expression of IRF1 among Crohn’s disease patients (p=0.0006) and no significant difference in expression of OCTN1….

It’s all a mouthful. But let’s review here. IBD5 is a 250 kilobase haplotype implicated in Crohn’s disease. It is a long segment of associated markers which also seem to correlate with the illness. This does not imply that the whole segment is causally connected with Crohn’s disease. But there are two genes which have been pegged as likely candidates, IRF1 and IL5. Finally, there’s another gene, OCTN1, which is statistically associated with Crohn’s disease, but lacks a biologically plausible connection. Rather, it seems to have a role in absorption of the amino acid ergothioneine, with the 503F allele of OCTN1 resulting in gain of function in regards to this process. Interestingly the authors observe that OCTN1 is positioned exactly in the middle of the haplotype. In other words, you can think of the genome upstream and downstream of OCTN1 extending out across the haplotype as two wings or fringes of this gene.

The IBD5 haplotype is the broader landscape. IRF1, IL5, and OCTN1 are general features embedded within this landscape. 503F is a specific feature, in that it is a flavor of OCTN1. Crohn’s disease is another phenomenon which has an association with this genomic landscape, but is of a different class or category. It is correlated in particular with IBD5 haplotypes with the 503F allele. The main aim of this paper is to tease apart all these multitudinous associations. What the authors found is that in terms of biochemistry the symptoms of Crohn’s disease are not correlated with the 503F allele if that allele is not associated with known risk variants of IRF1 and IL5. These are instances where genetic recombination has broken apart the association which couples 503F with the risk alleles of those two genes. The architecture of the genomic landscape then in this case has obscured the more specific causal chain which leads to an increased risk for Crohn’s disease.

So what happened? The authors posit that the 503F allele was selectively favored at some point in the past, and flanking it were the Crohn’s disease risk elevating variants of IRF1 and IL5. All things equal it is best not to have a risk for this disease, but all things are not equal. If there was a strong enough selective pressure on the target, 503F, then the downsides of the fact that it came as a “total package” with some deleterious alleles would be irrelevant. Over a long enough evolutionary time the deleterious alleles would be purified through negative selection because recombination does break apart associations, but there’s a lot of reality which consists of being between beginnings and ends.

To infer that 503F was the target of natural selection the authors used a haplotype-based test for detecting this phenomenon, iHS. This test tends to detect selective sweeps in midstream, or those which do not shift to fixation because of balancing dynamics. One implication of this is that the allele which was the target of selection will tend to have modest frequencies at best, and that is so. From the supplements here is a list of populations with the percentage of the selected allele (some duplicates because they sampled different data sets):

Population / N 503F alleles / N 503L alleles / % 503F
Sardinian 40 16 71%
Tuscan 9 7 56%
Turku 11 9 55%
Basque 23 23 50%
Adygei 15 17 47%
Orcadian 15 17 47%
Italian 12 16 43%
Utah 40 56 42%
French 24 34 41%
Kuopio 8 12 40%
Tuscan 23 35 40%
Pole 7 13 35%
Druze 27 67 29%
Russian 13 35 27%
Uygur 5 15 25%
Terekli-Mektab (Daghestani) 13 43 23%
Makrani 11 39 22%
Balochi 10 40 20%
Mozabite 12 48 20%
Palestinian 19 83 19%
Kalash 8 42 16%
Pathan 8 42 16%
Kubachi (Daghestani) 7 39 15%
Brahmin Niyogi 4 26 13%
Brahmin 5 33 13%
Hazara 6 42 13%
Burusho 6 44 12%
Brahmin Vydika 5 41 11%
Sindhi 5 43 10%
Bedouin 10 88 10%
Brahui 5 45 10%
BantuSouthAfrica 1 15 6%
Yakut 3 47 6%
Xibo 1 17 6%
Daur 1 19 5%
Lahu 1 19 5%
Tu 1 19 5%
Yi 1 19 5%
Cambodian 1 21 5%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mbuti Pygmy 2 74 3%
Mandenka 1 47 2%
Khonda Dora 1 51 2%
Irula 1 59 2%
BiakaPygmy 1 69 1%
!Kung (San) 0 22 0%
Alur 0 16 0%
BantuKenya 0 22 0%
Biaka Pygmy 0 10 0%
Cambodian 0 10 0%
Chinese 0 16 0%
Dai 0 20 0%
Han 0 70 0%
Han-NChina 0 18 0%
Hema 0 42 0%
Hezhen 0 20 0%
Japanese 0 62 0%
Japanese 0 38 0%
Khmer Cambodian 0 8 0%
Malasian 0 12 0%
MbutiPygmy 0 30 0%
Melanesian 0 44 0%
Miao 0 18 0%
Mongola 0 20 0%
Nande 0 36 0%
Naxi 0 20 0%
Oroqen 0 20 0%
Papuan 0 34 0%
Pedi (northern Sotho) 0 22 0%
San 0 14 0%
She 0 20 0%
Sotho 0 10 0%
Southern Chinese 0 8 0%
Taiwan 0 6 0%
Tsonga 0 12 0%
Tswana 0 14 0%
Tujia 0 20 0%
Vietnamese 0 18 0%
Xhosa 0 4 0%
Yoruba 0 50 0%
Zulu (Nguni) 0 18 0%
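The percentage column is just the count of 503F alleles divided by the total number of alleles sampled. A minimal sketch of that arithmetic, using a handful of rows copied from the table above (the helper name is hypothetical):

```python
def allele_percent(n_503f, n_503l):
    """Percentage of sampled alleles that are 503F, rounded to a whole number."""
    total = n_503f + n_503l
    return round(100 * n_503f / total)


# (503F count, 503L count) for a few populations taken from the table.
counts = {
    "Sardinian": (40, 16),
    "Druze": (27, 67),
    "Palestinian": (19, 83),
    "Bedouin": (10, 88),
    "Han": (0, 70),
}

for population, (n_f, n_l) in counts.items():
    print(f"{population}: {allele_percent(n_f, n_l)}% 503F from {n_f + n_l} alleles")
```

Printing the allele totals alongside the percentages is a reminder of how small many of these samples are: a single 503F allele in a sample of 20 already registers as 5%.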

From these data the authors make the inference that the 503F allele was selected for its enhanced transport of ergothioneine, which is lacking in many plant foodstuffs which became prominent with the Neolithic Revolution. In other words, Crohn’s disease is a byproduct of an adaptation to nutrient deficiencies brought on by agricultural monocultures. The main problem this thesis seems to have is that many Middle Eastern populations which have long been agricultural don’t have a high frequency of the 503F allele. This doesn’t mean that the selective model proposed here is impossible, but, it does indicate that if this was a plausible adaptation then Middle Eastern populations must have their own distinctive variants.

I think this is a great paper, though I’m not confident about the conclusion. Agriculture was obviously one of the major selective pressures on the human genome. Even if some of the preliminary tests of natural selection from the mid-2000s don’t hold up, because they tend to confuse genuine targets of selection with spurious positives, I’m rather confident that genes which are associated in some way with agriculture are going to be enriched in terms of functional constraint and adaptive sculpting.

Citation: Chad D. Huff, David Witherspoon, Yuhua Zhang, Chandler Gatenbee, Lee A. Denson, Subra Kugathasan, Hakon Hakonarson, April Whiting, Chad Davis, Wilfred Wu, Jinchuan Xing, W. Scott Watkins, Mike Bamshad, Jonathan P. Bradfield, Kazima Bulayeva, Tatum S. Simonson, Lynn B. Jorde, and Stephen L. Guthery. Crohn’s disease and genetic hitchhiking at IBD5. Mol Biol Evol, doi:10.1093/molbev/msr151.

The Pith: You are expected to have 30 new mutations which differentiate you from your parents. But there is wiggle room around this number, and you may have more or fewer. This number may vary across siblings, and explain differences across siblings. Additionally, previously used estimates of mutation rates may have been too high by a factor of 2. This may push the “last common ancestor” of many human and human-related lineages back by a factor of 2 in terms of time.

There’s a new letter in Nature Genetics on de novo mutations in humans which is sending the headline writers in the press into a natural frenzy trying to “hook” the results into the X-Men franchise. I implicitly assume most people understand that they all have new genetic mutations specific and identifiable to them. The important issue in relation to “mutants” as commonly understood is that they have salient identifiable phenotypes, not that they have subtle genetic variants which are invisible to us. Another implicit aspect is that phenotypes are an accurate signal or representation of high underlying mutational load. In other words, if you can see that someone is weird in their traits, presumably they are rather strange in their underlying genetics. This is the logic behind models which assume that mutational load has correlates with intelligence or beauty, and these naturally tie back into evolutionary rationales for human aesthetic preferences (e.g., “good genes” models of sexual selection).

Variation in genome-wide mutation rates within and between human families:

J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline…Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans…Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.

From what I gather there’s a straightforward reason why the male germline, the genetic information which is transmitted by sperm to a male’s offspring, is more mutagenic: sperm are produced throughout your whole life, and over time replication errors creep in. This is in contrast to a female’s eggs, where the full complement are present at birth. The fact that mutations creep in through sperm is just a boundary condition of how mutations creep into the germline in the first place: errors in the DNA repair process. This is good on rare occasions (in that mutations may actually be fitness enhancing), more often this is bad (in that mutations are fitness detracting), and oftentimes it is neutral. Remember that in terms of function and fitness a large class of mutations don’t have any effect. Consider the fact that in the general population 1 out of 25 people of European descent carries a mutation which can cause cystic fibrosis if it manifests in a homozygote genotype. But the vast majority of cystic fibrosis mutations are present in people who are heterozygote, and have a conventional functional gene which “masks” the deleterious allele.* And there are many mutations which are silent even in homozygote form (e.g., if there is a change in a base at a synonymous position).

As noted in the letter above until recently estimating mutation rates was a matter of inference. On the broadest canvas one simply looked at differences between two related lineages which had been long separated (e.g., chimpanzee vs. human), and so accumulated many differential mutations, and assayed the differences. It may have been a fine-grained inference in the case of individuals who manifested a disease which exhibited a dominant expression pattern, so that one de novo mutation in the offspring could change the phenotype. For most humans this is thankfully not a major issue, and mutations remain cryptic for most of our lives. But no longer. With cheaper sequencing at some point in the near future most of us will have accurate and precise copies of our genomes available to us, and we will be able to see exactly where we have unique mutations which differentiate us from our parents and our siblings.

In this paper the authors took two “trios,” parent-child triplets, and compared their patterns of genetic variation at the scale of the full genome to a very high level of accuracy. Accuracy obviously matters a great deal when you might be looking for de novo mutations which are going to be counted on the scale of hundreds when base pairs are counted in billions. In the future when we have billions and billions of genomes on file and omnipotent computational tools I suspect there will be all sorts of ways to ascertain “typicality” of regions of your genome, but in this paper the authors naturally compared the parents to the children. If a mutation is de novo it should be underivable from the genetic patterns of the parents. But sequencing technologies are not perfect, so there’s going to be a high risk for false positives when you are looking for the de novo mutations “in the haystack” (e.g., an error in the read of the offspring can be picked up as a mutation).

So they started with ~3,000 candidate de novo mutations (DNMs) for each family trio after comparing the genomes of the trios, but narrowed it down further experimentally as they filtered out the false positives. You can read the gory details in the supplements, but it seems that they tested each identified candidate to see whether it was a germline DNM, a non-germline DNM, a variant inherited from the parents, or a false positive call. It turns out that half of the preliminary DNMs were somatic and about 1% turned out to be germline. Remember that the difference is that the germline mutations are going to be passed on to one’s offspring, while the somatic mutations only have impact on one’s physiological fitness over one’s life history. For the purposes of evolution germline mutations are much more important, though over your lifetime somatic mutations are going to be very important as you age.

After the methodological heavy-lifting the results themselves are interesting, albeit of somewhat limited generalizability because you are focusing on two trios only. Before we examine the results here’s a figure which illustrates the study design:

From what I can gather there are two primary findings in this paper:

1) Variance in the sex-mediated nature of DNMs across trios. One of the trios was much closer to expectation: the male germline contribution was responsible for the vast majority of DNMs.

2) A more precise estimate of human mutational rates which might have implications for “molecular clock” estimates used in evolutionary phylogenetics.

Here are the findings in a figure which shows the 95% confidence intervals around estimated mutation rates:

CEU refers to the sample of white Utah Mormons commonly used in medical genetics, while YRI refers to Yoruba from Nigeria. Remember, these are two families only. That severely limits the power of the insights which you can draw, but already you see that while the CEU trio shows the expected imbalance between male and female contribution to DNMs, the YRI trio does not. But both of the trios do suggest a lower mutation rate than found in previous studies which inferred the value from species divergence. Here is the portion which is relevant for human evolution: “These apparently discordant estimates can be largely reconciled if the age of the human-chimpanzee divergence is pushed back to 7 million years, as suggested by some interpretations of recent fossil finds.” I wouldn’t put my money on this quite yet, going by just this one study, but I’ve been hearing that this paper doesn’t come to this number in a scientific vacuum. Other researchers are converging upon a similar recalibration of mutational rates which might push back the time until the last common ancestor of many divergent hominoid and hominin lineages (including modern humans).
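Why a lower mutation rate pushes divergence dates back follows from the basic molecular clock relation, where time equals sequence divergence divided by twice the per-year rate (differences accumulate on both lineages). The sketch below uses purely illustrative placeholder numbers, not estimates from the paper, and it ignores complications such as ancestral polymorphism.

```python
import math


def divergence_time(divergence_per_site, rate_per_site_per_year):
    """Years since two lineages split under a strict molecular clock.

    Differences accumulate along both lineages, hence the factor of 2.
    """
    return divergence_per_site / (2.0 * rate_per_site_per_year)


# Placeholder inputs chosen only to show the scaling (NOT values from the paper).
divergence = 0.012   # fraction of sites that differ between two lineages
old_rate = 1.0e-9    # assumed per-site, per-year rate
new_rate = old_rate / 2

old_time = divergence_time(divergence, old_rate)
new_time = divergence_time(divergence, new_rate)
print(f"old rate: {old_time / 1e6:.1f} My, halved rate: {new_time / 1e6:.1f} My")
```

Whatever the inputs, halving the rate doubles the inferred time, which is the sense in which a slower clock moves the last common ancestor back by a factor of 2.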

Moving the lens back to the present and of more personal genomic relevance:

Mutation is a random process and, as a result, considerable variation in the numbers of mutations is to be expected between contemporaneous gametes within an individual. If modeled as a Poisson process, the 95% confidence intervals on a mean of ~30 DNMs per gamete (as expected from a mutation rate of ~1 × 10⁻⁸) ranges from 20 to 41, which is a twofold difference. Truncating selection might act to remove the most mutated gametes and thus reduce this variation among gametes that successfully reproduce, however, any additional heterogeneity in stem-cell ancestry or environment (for example, variation in the number of cell divisions leading to contemporaneous gametes) would likely increase inter-gamete variation in the number of mutations.
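The 20-to-41 range in the quote can be reproduced from the Poisson distribution directly. This is a minimal sketch using only the standard library; the haploid genome size of roughly 3 billion base pairs is an assumption added here to connect the rate to the mean of ~30, and the function names are illustrative.

```python
import math


def poisson_quantile(mean, p):
    """Smallest k such that P(X <= k) >= p for X ~ Poisson(mean)."""
    pmf = math.exp(-mean)  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < p:
        k += 1
        pmf *= mean / k    # P(X = k) from P(X = k - 1)
        cdf += pmf
    return k


def poisson_interval(mean, coverage=0.95):
    """Central interval covering at least `coverage` of the distribution."""
    tail = (1.0 - coverage) / 2.0
    return poisson_quantile(mean, tail), poisson_quantile(mean, 1.0 - tail)


# Assumed inputs: ~1e-8 mutations per site, ~3e9 sites per haploid genome.
expected_dnms = 1e-8 * 3e9
low, high = poisson_interval(round(expected_dnms))
print(f"mean ~{expected_dnms:.0f} DNMs per gamete, 95% range {low} to {high}")
```

The interval is wide simply because the standard deviation of a Poisson count is the square root of its mean, about 5.5 here.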

Using the much smaller marker set obtained from 23andMe I found that two of my siblings are nearly 3 standard deviations apart in identity-by-descent, measured against the distribution for full siblings. In the near future we might be able to ascertain the realized, not just theoretical, extent of mutational load across a family. As noted by the authors much of this might be a function of paternal age. Rupert Murdoch has children who are younger than many of his grandchildren, so there are many, many “natural experiments” out there, as males are having offspring over 40 years apart.

On a societal level we may be able to estimate the exact public health cost of the rising mean age of fathers. On a personal level we may also be able to note the correlations within families between high levels of DNMs and traits of interest such as intelligence and beauty. Compared to more fine-grained tools of ancestry inference I presume this is going to be dynamite. But it isn’t as if we didn’t know siblings varied before.

Citation: Donald F Conrad, Jonathan E M Keebler, Mark A DePristo, Sarah J Lindsay, Yujun Zhang, Ferran Casals, Youssef Idaghdour, Chris L Hartl, Carlos Torroja, Kiran V Garimella, Martine Zilversmit, Reed Cartwright, Guy A Rouleau, Mark Daly, Eric A Stone, Matthew E Hurles, & Philip Awadalla (2011). Variation in genome-wide mutation rates within and between human families. Nature Genetics, doi:10.1038/ng.862

* In a random mating population the proportions are defined by the Hardy-Weinberg Equilibrium, p² + 2pq + q² = 1, so where q = 0.04, q² = 0.0016 and 2pq = 0.0768. Heterozygote genotypes of CF outnumber homozygote ones 50 to 1.
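The footnote’s numbers can be reproduced with the short sketch below, which takes q = 0.04 as given. As a hedged aside, if “1 out of 25” is instead read as the carrier frequency 2pq rather than the allele frequency q, then q is closer to 0.02 and the ratio of carriers to affected homozygotes is nearer 100 to 1; the second function shows that alternative reading. Both function names are hypothetical.

```python
import math


def carrier_to_affected_ratio(q):
    """Ratio of heterozygotes (2pq) to recessive homozygotes (q^2)."""
    p = 1.0 - q
    return (2.0 * p * q) / (q * q)


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for the smaller root q."""
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


# Reading 1: the footnote's allele frequency, q = 0.04.
q = 0.04
print(f"q = {q}: q^2 = {q * q:.4f}, 2pq = {2 * (1 - q) * q:.4f}, "
      f"ratio = {carrier_to_affected_ratio(q):.0f} to 1")

# Reading 2 (assumption): 1 in 25 is the carrier frequency, 2pq = 0.04.
q_alt = allele_freq_from_carrier_freq(0.04)
print(f"q = {q_alt:.4f}: ratio = {carrier_to_affected_ratio(q_alt):.0f} to 1")
```

Either way the qualitative point stands: the rarer the allele, the more heavily its copies are concentrated in unaffected heterozygotes.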

Bloggy addendum: The first author of this letter is Don Conrad who is a contributor to Genomes Unzipped.


The figure to the left is a three dimensional representation of principal components 1, 2, and 3, generated from a sample of Gujaratis from Houston, and Chinese from Denver. When these two populations are pooled together the Chinese form a very homogeneous cluster. They don’t vary much across the three top explanatory dimensions of genetic variance. In contrast, the Gujaratis do vary. This is not surprising. In the supplements of Reconstructing Indian population history it was notable that the Gujaratis did tend to shake out into two distinct clusters in the PCAs. This is a finding you see over and over when you manipulate the HapMap Gujarati data set. In reality, there aren’t two equivalent clusters. Rather, there’s one “tight” cluster, which I will label “Gujarati_B” from now on in my data set, and another cluster, “Gujarati_A,” which really just consists of all the individuals who are outside of the Gujarati_B cluster. Even when compared to other South Asian populations these two distinct categories persist in the HapMap Gujaratis.

Zack has already identified a major difference between the two clusters: Gujarati_A has some individuals with much more “West Eurasian” ancestry. To be more formal about this in the future I simply assigned individuals in my merged data set to one of the two Gujarati clusters based on their position in the first two PCs. Last night I ran ADMIXTURE K = 2 to 10, with 75,000 SNPs. I also removed the Native American groups, and added more European and East Asian samples from the HapMap. Below are some populations at K = 4:
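The post does not spell out the rule used to split the Gujaratis by their position on the first two PCs, so the following is only one plausible sketch. It assumes the tight cluster contains most of the samples, uses the median as its center, and labels anything beyond a hand-picked radius as Gujarati_A. The function name, the radius, and the coordinates are all hypothetical.

```python
import math
from statistics import median


def assign_gujarati_clusters(coords, radius):
    """Label samples by distance from the median point in PC1/PC2 space.

    coords maps a sample id to a (pc1, pc2) tuple. Samples within `radius`
    of the median are called Gujarati_B (the tight cluster), the rest
    Gujarati_A. Assumes the tight cluster holds the majority of samples.
    """
    center_pc1 = median(pc1 for pc1, _ in coords.values())
    center_pc2 = median(pc2 for _, pc2 in coords.values())
    labels = {}
    for sample_id, (pc1, pc2) in coords.items():
        distance = math.hypot(pc1 - center_pc1, pc2 - center_pc2)
        labels[sample_id] = "Gujarati_B" if distance <= radius else "Gujarati_A"
    return labels


# Made-up coordinates: five tightly grouped samples and two outliers.
example = {
    "s1": (0.00, 0.00), "s2": (0.01, -0.01), "s3": (-0.01, 0.01),
    "s4": (0.02, 0.00), "s5": (-0.02, 0.01),
    "s6": (0.10, 0.08), "s7": (0.15, 0.12),
}
print(assign_gujarati_clusters(example, radius=0.05))
```

In practice the radius would be chosen by eyeballing the PCA plot, which fits the description of Gujarati_A as simply everyone outside the tight cluster.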

Let’s drill down to the level of individuals. Here are the Gujarati individuals, along with Sindhis, and my parents (Bengali). I’ve sorted by the “European” and then “South Asian” components (light blue and green respectively, while purple is modal in Papuans and red in East Asians):

The ADMIXTURE plots are in total alignment with the PCA. In the PCA Gujarati_A exhibit a spectrum of distance from the European cluster, and in the ADMIXTURE you see the same. In contrast, Gujarati_B is relatively uniform. So what’s going on? I will be posting something similar over at Sepia Mutiny soon. But my guess is that Gujarati_B are a subset of Patels. In other words, they’re a genetically distinct jati. I suspect that Gujarati_A are a more diverse bunch from a number of different jatis.

Does this matter? I believe it does. If Gujarati_B are a distinct ethno-social group which is a subset of Gujaratis, then they may not be as good a proxy for South Asian medical genetics as Gujarati_A. More concretely, Gujarati_B may have relatively high frequency rare disease alleles because they’re an inbred clan. In contrast, while Gujarati_A may exhibit all the hallmarks of South Asian endogamy, if they’re a larger number of different groups, then they’ll have all sorts of different rare alleles. The ones they have in common may be more generally South Asian.


The Pith: In this post I examine the relationship between racial ancestry and cancer mortality risks conditioned on particular courses of treatment. I review research which indicates that the amount of Native American ancestry can be a very important signal as to your response to treatment if you have leukemia, as measured by probability of relapse.

If you are an engaged patient who has been prescribed medication I assume you’ve done your due diligence and double-checked your doctor’s recommendations (no, unfortunately an M.D. does not mean that an individual is omniscient). Several times when I’ve been prescribed a medication I have seen a note about different recommended dosages by race when I did further research. Because of my own personal background I am curious when it says “Asian.” The problem with this term in medical literature is that “Asian” in the American context is derived from a Census category constructed in 1980 for bureaucratic and political purposes. It amalgamates populations which are genetically relatively close, East and Southeast Asians, with more distant ones, South Asians (when my siblings were born I remember that my parents listed their race as “Asian” when they filled out paperwork for the hospital).

But at least the issues with an “Asian” category are clear. Consider the “Hispanic/Latino” category. In the USA this term also became popular through government fiat around 1970, as a catchall for people whose ancestry derives from the Spanish speaking Americas, with Spaniards, Portuguese, and Brazilians being border-line cases. Additionally, it has become relatively common in the general American culture to code Hispanic as non-white. This despite the fact that all Latin American populations have large self-identified white populations, with some, such as Argentina and Uruguay, being overwhelmingly white. In the USA between 54% and 92% of Hispanics identify as white in terms of their race. The discrepancy is that some surveys allow for the “Some other race” option, which is the second most popular choice. Surveys which force respondents into a few categories such as white, black, Native American or Asian, produce a result where Hispanics default to a white self-identification. Implicitly we know it’s more complicated than this mishmash of bureaucratic convenience and opportunistic American identity politics. The HapMap has a Mexican American sample from Los Angeles. Above you see K = 3 in ADMIXTURE for Mexican Americans. Each thin “slice” is an individual, with the color proportions reflective of genomic contributions of one of three putative ancestral groups. The full plot had Europeans and Chinese as well. Blue seems to correspond with Native American, and red white European (the green residual is modal in East Asians). Los Angeles’ Mexican American community is obviously mixed-race, what in Latin America might be termed mestizo. And yet according to the survey data when forced to choose this community seems to affiliate with a white Spanish identity, blanco. Seeing as almost all of them are Spanish speaking and not indigenous (I am aware that the USA has a small and growing non-Spanish speaking Latino population of indigenous immigrants), this would make sense. But another facet of Mexican American identity surfaces in the concept of Aztlán, which is a nod to the Nahua roots of much of the Mexican population.

But whatever the cultural nuance and subtlety, which can be decomposed at length, it is also important to properly characterize the genetic structure of the Hispanic populations. Some Mexican Americans are predominantly white European in ancestry, and some are predominantly Amerindian. Many are mixed in roughly equal proportions. This is not just a minor detail. Going back to my first paragraph, a new letter to Nature Genetics reports on the differential response to treatment in children with leukemia proportional to Native American ancestry. Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia:

Although five-year survival rates for childhood acute lymphoblastic leukemia (ALL) are now over 80% in most industrialized countries…not all children have benefited equally from this progress…Ethnic differences in survival after childhood ALL have been reported in many clinical studies…with poorer survival observed among African Americans or those with Hispanic ethnicity when compared with European Americans or Asians…The causes of ethnic differences remain uncertain, although both genetic and non-genetic factors are likely important…Interrogating genome-wide germline SNP genotypes in an unselected large cohort of children with ALL, we observed that the component of genomic variation that co-segregated with Native American ancestry was associated with risk of relapse (P = 0.0029) even after adjusting for known prognostic factors (P = 0.017). Ancestry-related differences in relapse risk were abrogated by the addition of a single extra phase of chemotherapy, indicating that modifications to therapy can mitigate the ancestry-related risk of relapse.

They inferred ancestry through two different methods. First, they used principal component analysis to extract the biggest independent dimensions of variation within the genetic data set. What happens when you do this is that you quickly recapitulate totally comprehensible patterns of population genetic clustering within your data set. To the left you see a PCA where the largest component of variance separates Africans from non-Africans (x axis) and the second largest separates Europeans from East Asians (y axis). The underlying data is from a merging of the HapMap and HGDP.

This pattern crops up over and over. Within this broader framework you see more specific trends. I have labelled the Mexican American population on the two dimensional plot. Note its linear topology. This is a sign of possible admixture. Roughly, the position of any given individual along a line between two putative parental populations reflects how much of their ancestry derives from each: the closer they sit to one parental cluster, the larger that population’s contribution. In plain English, someone who is half-Chinese and half-Swedish will be placed equidistant from the Chinese and Swedish clusters on a PCA plot with those populations. The Mexican Americans span a region between Europeans and East Asians. This makes perfect sense in terms of their recent population history. It also means that just knowing that someone is “Mexican” in their heritage does not tell you as much about their ancestry as if you knew that someone was “Chinese.” There’s a lot of variance genetically in the Mexican population.
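To make the geometry concrete, here is a minimal sketch of how an individual’s position along the line between two parental clusters can be turned into a rough ancestry fraction. The function name and the cluster coordinates are hypothetical, invented for illustration; they are not taken from the paper or from the HapMap/HGDP plot, and real admixture inference uses many more markers and dimensions than a two-dimensional projection.

```python
def admixture_fraction(individual, parent_a, parent_b):
    """Fraction of ancestry attributed to parent_b, from PCA coordinates.

    Projects the individual's point onto the line from the centroid of
    parental population A to the centroid of parental population B.
    0.0 means "sits on cluster A", 1.0 means "sits on cluster B".
    """
    axis = [b - a for a, b in zip(parent_a, parent_b)]
    offset = [x - a for x, a in zip(individual, parent_a)]
    length_sq = sum(c * c for c in axis)
    if length_sq == 0:
        raise ValueError("parental centroids must differ")
    t = sum(o * c for o, c in zip(offset, axis)) / length_sq
    # Clamp: noise can push a point slightly past either parental cluster.
    return min(1.0, max(0.0, t))


# Hypothetical (PC1, PC2) centroids, for illustration only.
european = (0.0, -0.04)
east_asian = (0.0, 0.06)
halfway = (0.0, 0.01)  # a point equidistant from both clusters
fraction = admixture_fraction(halfway, european, east_asian)
```

A point equidistant from both centroids comes out at one half, which is the half-Chinese, half-Swedish case described above.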

I introduced the preamble about PCA plots because the figure where they use PCA to elucidate the ancestry of the HapMap and their sample population of leukemia-affected children can be somewhat confusing. What you see is that panels A, B, and C are PC 1, 2, and 3, respectively. That means that the top panel explains the most variation, and the third panel the least. I’ve added some extra labels because of the small font. You see in the top panel immediately what was evident in the two dimensional plot above: Africans separate out from non-Africans. The boxes represent the 25-75 percent intervals within the populations. Contrast the very tight distributions of the “pure” reference populations, and the more varied distribution of the children in their data set. It seems that some self-identified white children have a rather high load of African ancestry, while the black Americans naturally vary a great deal more than Yoruba in Nigeria. The distribution of the Mexican Americans reflects the African ancestry which has been absorbed within the Mexican population more broadly.

Panel B illustrates the second PC, which separates non-Africans on a rough west-east axis. So what’s going on with the Asians? The 1980 Census strikes again! A substantial fraction of the “Asians” are “South Asian,” who have somewhat more “European” than “Asian” ancestry. Again, a small minority of “white” children seem to have substantial Asian ancestry. The Hispanic pattern is rather easy to explain, probably simply Amerindian-European admixture variation.

The last PC seems to separate Native Americans from other populations. So why are white and Asian children also exhibiting variance here? First, a non-trivial proportion of white Americans have substantial Native American ancestry. Brett Favre has a grandparent who was a member of the Choctaw tribe, for example. Second, I suspect much of the variance is due to a common ancestry between Amerindians and some Eurasian groups which isn’t showing up in the white Utah sample, which is sampled from the far west of Eurasia, or the Chinese sample.

Another way to visualize ancestry is of course to posit K number of ancestral groups, and assign given quanta of ancestry to each individual from a given K. To the right you see a STRUCTURE bar plot where 2,500+ individuals are displayed vertically, with shading proportional to ancestry. I’ve placed tentative labels. Most of the children in the sample are white, and so exhibit mostly European (red) ancestry. From what I know about the black American community it seems that on this visualization they’ve been separated into two clusters (see the supplements for the algorithm). About 10% of black Americans are more than 50% white, while the median black American has 20-25% white ancestry. The Asian cluster is strange because it amalgamates East and South Asians. South Asians are 65-90% “European,” depending on their ancestral region. Finally, you have the Mexican Americans, who span the range of admixture between Europeans and Amerindians, with some African element as well.

STRUCTURE produces the averages to the left for self-identified populations. The proportions for African Americans are just about right. For the Hispanic category it seems more European than the Los Angeles Mexican Americans, but there are historical reasons to suspect that Mexican Americans in Texas have more Spanish ancestry, while Cubans in the USA are overwhelmingly white (and Puerto Ricans have more white ancestry than black or Amerindian). The very low percentages for non-European ancestry for whites make me skeptical of the means; I assume there are some mixed-race individuals who identify as white, but I wonder if most white Americans with ~1% “Native American” just have deep common ancestry dating back to the Ice Age, or whether it’s an artifact of the method.

But what’s the point of all this? Ancestry analysis is fun, interesting, and has some relevance to broader socio-political debates and conflicts, but this is a story with some medical relevance (which is why it’s in Nature Genetics). In short, the authors found that Native American ancestry was a very high risk factor for relapse, conditional on the extent of chemotherapy. I merged table 3 and some panels from figure 2 to show what’s going on:

On the right is a list of risk (or mitigating) factors. I’ve underlined the effect of the proportion of Native American ancestry, treated as a continuous variable. To the left, you see the probability of cancer relapse as a function of Native American ancestry. The red line represents those with less than 10% Native American ancestry, and the blue line those with more than 10%. In the top panel you see the impact on self-identified whites. The bottom panel shows the outcome for those who did not receive “delayed intensification” treatment. Panel E, which I did not show, illustrates clearly that the two lines converge when delayed intensification treatment is provided. If there is something in the Native American genetic background causing this problem, then it seems one can model it as a “gene-environment interaction.” That is, the genetically mediated outcome is conditional on particular environmental conditions (in this case, lack of the treatment).

But ancestry isn’t magic. The authors managed to track down candidate SNPs associated with Native American ancestry on the genomic level. In particular, the risk allele at rs6683977 at PDE4B was significantly more common among those with more than 10% Native American ancestry than among those with less than 10%. Native American ancestry in that genomic region was also associated with particular relapse risk.

This seems a relatively straightforward application of using genetic data in a cost-benefit program. In the United States there is a major issue with growing health care costs. Many with a technocratic “evidence based” bent are curious as to efficacies of particular treatments, and their relationship to the costs incurred. It may be that for particular genetic backgrounds the cost-benefit calculus will be different than for the general population. While further rounds of chemotherapy may not be justified in terms of return-on-investment (i.e., probability of survival 10 years out) for a white child, it may be for a Native American one. Weighing costs and probabilities like this may seem bloodless, but we do it implicitly every day. This is just another tool in that thankless enterprise.

Citation: Yang JJ, Cheng C, Devidas M, Cao X, Fan Y, Campana D, Yang W, Neale G, Cox NJ, Scheet P, Borowitz MJ, Winick NJ, Martin PL, Willman CL, Bowman WP, Camitta BM, Carroll A, Reaman GH, Carroll WL, Loh M, Hunger SP, Pui CH, Evans WE, & Relling MV (2011). Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia. Nature genetics PMID: 21297632

• Category: Science • Tags: Ancestry, Genetics, Genomics, Health, Medical genetics, Race 

I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.

The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on though if you can’t get beyond the paywall.

In this study they pooled together several studies into a meta-analysis. One thing not mentioned in the abstract: they checked their GWAS SNPs against a family-based study. This was important because in the latter, population stratification isn’t an issue. Family members naturally overlap a great deal in their genetic background. Also, if I read it correctly they’re focusing on populations of European origin, so this might not capture larger effect alleles which impact between population variance in height but don’t vary within a given population (note that if you explored pigmentation genetics just through Europeans you would miss the most important variable on the world wide scale, SLC24A5, because it’s fixed in Europeans). In any case, as you can see what they did was extrapolate out the number of loci which their methods could capture to explain variation with the predictor being the sample size. At 500,000 individuals they’re at ~700 loci, and around 20% of the heritable variation. My initial thought is that I’m not seeing diminishing returns here, but since I haven’t read the supplements I’ll let that pass since I don’t know the guts of this anyhow. They do assert that they are likely underestimating the power of these methods because there may be smaller effect common variants which can top off the fraction.

But even they admit that they can go only so far. Here are some sections from the conclusion that lay it out pretty clearly:

By increasing our sample size to more than 100,000 individuals, we identified common variants that account for approximately 10% of phenotypic variation. Although larger than predicted by some models, this figure suggests that GWA studies, as currently implemented, will not explain most of the estimated 80% contribution of genetic factors to variation in height. This conclusion supports the idea that biological insights, rather than predictive power, will be the main outcome of this initial wave of GWA studies, and that new approaches, which could include sequencing studies or GWA studies targeting variants of lower frequency, will be needed to account for more of the ‘missing’ heritability. Our finding that many loci exhibit allelic heterogeneity suggests that many as yet unidentified causal variants, including common variants, will map to the loci already identified in GWA studies, and that the fraction of causal loci that have been identified could be substantially greater than the fraction of causal variants that have been identified.

In our study, many associated variants are tightly correlated with common nsSNPs, which would not be expected if these associated common variants were proxies for collections of rare causal variants, as has been proposed. Although a substantial contribution to heritability by less common and/or quite rare variants may be more plausible, our data are not inconsistent with the recent suggestion that many common variants of very small effect mostly explain the regulation of height.

In summary, our findings indicate that additional approaches, including those aimed at less common variants, will likely be needed to dissect more completely the genetic component of complex human traits. Our results also strongly demonstrate that GWA studies can identify many loci that together implicate biologically relevant pathways and mechanisms. We envisage that thorough exploration of the genes at associated loci through additional genetic, functional and computational studies will lead to novel insights into human height and other polygenic traits and diseases.

The second to last paragraph takes a shot at David Goldstein’s idea of synthetic associations.

We’re still where we were a few years back though: old-fashioned Galtonian quantitative genetics, a branch of statistics, is the best bet to predict the heights of your offspring. As with intelligence, “height genes” are not improvements upon common sense. But if you’re going into the 10-20% range of variation explained it’s certainly not trivial, and the biological details are going to be of interest.
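For readers who haven’t seen the Galtonian approach in action, here is a minimal sketch of the standard midparent regression: the expected offspring value is the population mean plus the heritability times the parents’ average deviation from that mean. The function name and the example heights are hypothetical. The heritability of 0.8 is the estimate quoted in the paper’s conclusion above, and the sketch assumes heights have already been put on a common, sex-adjusted scale, which is a simplification.

```python
def expected_offspring_height(father_cm, mother_cm, population_mean_cm,
                              heritability=0.8):
    """Galton-style prediction: regress the midparent value toward the mean.

    Assumes both parental heights are already on a common (sex-adjusted)
    scale and that `heritability` is the narrow-sense value for the trait.
    """
    midparent = (father_cm + mother_cm) / 2
    deviation = midparent - population_mean_cm
    # Only the heritable share of the parents' deviation is passed on,
    # in expectation; the rest regresses to the population mean.
    return population_mean_cm + heritability * deviation


# Hypothetical example: parents averaging 10 cm above a 175 cm mean.
prediction = expected_offspring_height(190, 180, 175)
```

Nothing in this calculation requires knowing a single locus, which is the point: a ruler and the family tree still outperform the variants identified so far.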


It’s just a fact that contemporary human evolutionary genetics has relied upon its potential insights into disease to generate funding, support and interest. I don’t think that this is much of a silver lining when set next to the suffering caused by disease, but it’s a silver lining nevertheless. Therefore findings which would be of interest in and of themselves are able to push to the front of the line because of possible medical relevance. A new paper in PLoS Genetics illustrates the relationship between what seem like esoteric evolutionary insights and diseases of importance to the medical community. It takes a look at the gene whose disruption results in the horrible illness cystic fibrosis, CFTR, and uncovers some interesting genetic patterns of possible evolutionary relevance. The paper is The CFTR Met 470 Allele Is Associated with Lower Birth Rates in Fertile Men from a Population Isolate. From the author summary:

Cystic fibrosis (CF) is the most common lethal recessive disorder in European-derived populations and is characterized by clinical heterogeneity that involves multiple organ systems. Over 1,600 disease-causing mutations have been identified in the cystic fibrosis transmembrane regulator (CFTR) gene, but our understanding of genotype–phenotype correlations is incomplete. Male infertility is a common feature in CF patients; but, curiously, CF–causing mutations are also found in infertile men who do not exhibit any other CF–related complications. In addition, three common polymorphisms in CFTR have been associated with infertility in otherwise healthy men. We studied these three polymorphisms in fertile men and show that one, called Met470Val, is associated with variation in male fertility and shows a signature of positive selection. We suggest that the Val470 allele has risen to high frequencies in European populations due to a fertility advantage but that other genetic and, possibly, environmental factors have tempered the magnitude of these effects during human evolution.

The high frequency of alleles which result in cystic fibrosis is something of a mystery. Basic population genetic theory tells us that lethal (at least in the pre-modern era) recessive traits should be extant only at very low frequencies so that most of the deleterious alleles are “masked” by normal copies. The ΔF508 mutation is carried by about 1 in 30 people of Northern European descent (you see somewhat different ratios, but all in the same ballpark). That implies an allele frequency of roughly 1 in 60, so assuming random mating and Hardy-Weinberg equilibrium about 0.03% of offspring (around 1 in 3,500) would exhibit the disease due to the coming together of two ΔF508 alleles in a homozygous state, not a trivial proportion when you consider that the fitness of these individuals converges upon zero.
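The Hardy-Weinberg arithmetic can be sketched as follows. The function names are hypothetical, and the key assumption is how to read “1 in 30”: treated as the carrier (heterozygote) frequency it implies an allele frequency near 1 in 60 and roughly 0.03% affected homozygotes, whereas treated as the allele frequency itself it gives a touch more than 0.1%.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq):
    """Solve 2q(1 - q) = carrier_freq for the rarer allele's frequency q."""
    return (1 - math.sqrt(1 - 2 * carrier_freq)) / 2


def homozygote_freq(q):
    """Expected frequency of homozygotes under random mating (q squared)."""
    return q ** 2


# Reading "1 in 30" as the carrier frequency (an assumption).
q = allele_freq_from_carrier_freq(1 / 30)
affected_if_carrier_reading = homozygote_freq(q)

# Reading "1 in 30" as the allele frequency instead.
affected_if_allele_reading = homozygote_freq(1 / 30)
```

Either way the expected load of lethal homozygotes each generation is large enough that selection should be steadily purging the allele, which is what makes its persistence puzzling.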

In this paper they don’t get at ΔF508 and the other disease causing alleles directly. Rather, they find that one particular SNP has a strong effect on fertility, as well as having a relationship in some contexts to disease implicated alleles. Not too surprising considering that cystic fibrosis is associated with infertility. I presume that the overarching logic is that understanding the genetics of CFTR in its details will give us a better picture of its internal architecture and the various networks and pathways which result in its proper, or improper, function.

CFTR spans ~200,000 base pairs, but in the paper the authors focus on a few regions of interest within a sample from the American Hutterite community. In particular there is the 5-thymidine (5T) repeat allele at the 3′ splice site of intron 8, a variant which interferes with the proper splicing of exon 9. Then there is a TG repeat (TG) in intron 8 and a SNP in exon 10, rs213950. In the latter case the two alleles result in the amino acids methionine and valine respectively at the 470th position (Met470 and Val470). Both of these variants have an effect on the 5T allele, increasing its penetrance in relation to the outcome of cystic fibrosis. The Met470Val mutation’s molecular genetic implications are double-edged: Val470 results in a CFTR protein which matures more quickly, but with lower activity compared to the Met470 allele. Since 5T reduces splicing efficiency one could intuit why the presence of Val470, with its result of lower activity of the protein, might have a deleterious effect when the two are found in conjunction.

The paper approaches cystic fibrosis sideways because the focus on Met470Val means that they’re looking at a secondary variant from a medical perspective; a modifier, not the primary agent. But from an evolutionary perspective there’s a lot to dig into! First, let me jump to the discussion, where they seem to admit the modest current medical relevance of this paper:

Lastly, there has been a long-standing debate as to whether disease-causing CF mutations, such as ΔF508, confer a fertility advantage to healthy carriers…Unfortunately, the results we report here do not provide insight into this question. The most common CF causing mutations in Europeans (i.e. ΔF508, G542X, N1303K, W1282X) and the most common mutation in the Hutterites, M1101K…all reside on haplotypes carrying the ancestral, Met470 allele in exon 10…the 9T allele at the polyT locus, and (by inference) the TG10 or TG11 alleles…Therefore, any positive fertility effects of the Val470 allele would not be expected to affect the frequencies of the common CF disease-causing mutations in European populations.

A haplotype just refers to a sequence/correlation of alleles along the genome. You know that DNA consists of a string of base pairs, AGCGCTGAGCGCAA…. If there is variation at the first and last positions in the sequence above, and if the alternative variants at the two loci do not associate randomly but exhibit high correlations along a physical sequence, then there may be a haplotype of the variants. In the case of this paper the three regions of mutations combine to form the haplotypes. Tables 1 & 2 show the frequencies of alleles and haplotypes within their Hutterite sample.
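To see what “do not associate randomly” means numerically, here is a minimal sketch that measures the association between two variable positions from a list of observed two-locus haplotypes, using the standard linkage disequilibrium statistics D and r². The function name and the toy haplotypes are hypothetical and are not data from the paper.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r_squared) for a list of two-character haplotypes.

    Each haplotype is a string such as "AC": the allele at the first
    variable site followed by the allele at the second. The alleles on
    the first haplotype in the list are used as the reference pair.
    """
    n = len(haplotypes)
    ref_first, ref_second = haplotypes[0][0], haplotypes[0][1]
    p_first = sum(h[0] == ref_first for h in haplotypes) / n
    p_second = sum(h[1] == ref_second for h in haplotypes) / n
    p_both = sum(h == ref_first + ref_second for h in haplotypes) / n
    # D: excess of the reference haplotype over what independence predicts.
    d = p_both - p_first * p_second
    denom = p_first * (1 - p_first) * p_second * (1 - p_second)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, r_squared


# Toy data: the two sites always travel together (a clean haplotype).
linked = ["AA"] * 5 + ["GC"] * 5
# Toy data: every combination is equally common (no association).
unlinked = ["AA", "AC", "GA", "GC"]
d_linked, r2_linked = linkage_disequilibrium(linked)
d_unlinked, r2_unlinked = linkage_disequilibrium(unlinked)
```

In the linked toy sample r² is 1, meaning that knowing the allele at one site tells you the allele at the other, while in the unlinked sample it is 0.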



Table 1 lays out the frequencies of each allele within the sample, while table 2 illustrates the frequencies of combinations of these alleles, that is, the haplotypes.

The next two figures show the major finding, the association between Val470 and higher fertility in Hutterite men (not women). Remember that p-value = 0.05 is the normal bar for statistical significance. The ticks in the second figure are 95% intervals.



Do I need to emphasize how important it is that the alleles have a correlation with reproductive outcomes? Changes in gene frequencies are driven by variations in reproductive outcomes, whether random or systematically correlated with phenotypes. Drift or selection. Traits strongly tied to reproduction often have low heritabilities because the variation on such traits quickly disappears due to selection’s homogenizing power. It is interesting that in this case they’re implying that there’s heritable variation in reproductive outcomes, even though we would expect a priori that selection should have expunged the variation, all things equal.

Here’s a more stark figure which illustrates the association between haplotype and fertility in a more stepwise fashion:


OK, so how does this vary across populations? The next figure comes straight out of the HGDP browser:


The variation on Met470Val exhibits an African/non-African difference. I assume that the variation in the non-African segment (compare the Tuscans to the Russians for example) is mostly noise because of the small sizes of some of the HGDP sample groups. The 0.10 frequency in the San sample is intriguing. I’ve never heard anyone assert that the HGDP San had likely non-African admixture, so the existence of Val470 in this southern African group suggests to me that its appearance among non-Africans is not simply a random act of history (i.e., the outcome of the Out of Africa event and bottleneck). There may be common relaxations of ecological constraints on novel adaptation as one moves away from the tropics, or new selective pressures.

I wanted to highlight the nature of the haplotype variation earlier because the authors assess the possibility of natural selection driving Val470 up in frequency among non-Africans using haplotype based tests of natural selection. In the figure below panel A shows the haplotype blocks. The short of it is that Val470 has a much longer haplotype than Met470, which stands to reason if Met470 was the ancestral state around which a lot of variation had crept in through drift (LCT, the gene with a derived variant which confers lactase persistence, has a very long haplotype on the selected allele because it rose in frequency faster than recombination and mutation could break apart the distinctive genetic profile of the original copy). Panel B shows extended haplotype homozygosity (EHH), while D shows iHS (integrated haplotype score). The latter is to some extent an elaboration of the former, and is better able to detect selective sweeps which have not come as close to fixation as those best detected by EHH. Panel C has Fst between African and non-African populations. Fst is a statistic which summarizes between-population variance. It is 0.43 for Met470Val, while genome-wide it’s 0.11. Both the Fst and iHS values for the SNP are in the 5% tails of the distribution, illustrated by panel E.
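Since Fst carries much of the argument here, a minimal sketch of how it can be computed for a single biallelic SNP in two populations may help. This uses the textbook heterozygosity form, Fst = (H_T − H_S) / H_T, with equal weight on each population. The function name and the allele frequencies are hypothetical, and the paper’s own estimator may well differ.

```python
def fst_two_populations(p1, p2):
    """Wright's Fst for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)
    # Average expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # allele fixed or absent everywhere: no variance to split
    return (h_total - h_within) / h_total


# Hypothetical frequencies of a derived allele in two population groups.
fst_divergent = fst_two_populations(0.1, 0.8)
fst_identical = fst_two_populations(0.5, 0.5)
```

Identical frequencies give 0 and opposite fixation gives 1; a SNP whose value sits far above the genome-wide average is behaving unlike the neutral background.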


The Fst differences, along with suggestions of homogeneity across the genetic scale for the allele, Val470, which confers reproductive fitness, strongly point to the possibility of natural selection. But the reproductive differences they found were large; why is Met470 still around? In the discussion they throw out some possibilities:

In fact, given the large fertility effects observed in the Hutterites, it is surprising that the Val470 allele has not gone to fixation in non-African populations. However, there might be several reasons why this has not occurred. First, the combined data on fertility effects of the Val470 allele indicate that this allele can be associated with both increased and decreased fertility, depending on genetic background. In the presence of the 5T allele at the intron 8 polyT locus, Val470 increases the risk of CBAVD and male infertility…In the absence of the 5T allele (as in the Hutterites), the Val470 allele is associated with increased male fertility relative to Met470. Although the mechanism of this interaction is obscure, it provides one example of counteracting variation that could increase the time to fixation of the Val470 allele. Second, as mentioned above, the Val allele could also be deleterious in certain environments, such as in the presence of specific pathogens or the 5T allele, as a result of its pleiotropic effects in other organ systems. Third, the fertility advantage we observed is restricted to males; we found no such association in Hutterite women…This would further slow the spread of the allele as there would be no selection advantage in half of all Val carriers. Lastly, this study was conducted in a population living under optimal conditions for reproductive success, including excellent nutrition and abundant food, access to modern health care, and negligible maternal mortality. Thus, estimates of fitness effects based on Hutterite fertility rates are likely inflated compared to the effects in human populations throughout most of evolutionary history, when competing selective pressures were likely more prevalent. 
Taken together, the lack of fixation of the Val470 alleles in populations outside of Africa may not be inconsistent with the fertility effects observed in the Hutterites, but rather suggestive of antagonistic effects of other genetic variations or environment factors that tempered these effects during most of human evolution.

Remember that we’ve seen for a while now that loci which exhibit signatures of positive natural selection are often not fixed to 100%. Why not? There have been many explanations offered, and the ones above fall into the general categories mooted. Looking at a relatively isolated population in a snapshot form may not give us a full impression of what’s going on. On the other hand, the Hutterite genetic uniformity presumably eliminates many of the confounding signals which might otherwise obscure associations, so there are pluses and minuses to this sample. And of course evolution occurs over time, and peeking at slices tells us what it tells us, no more, no less. This is a place to start, but I bet it will make more sense once we have a better grasp of the distribution of dynamics across the genome. Scientific understanding often proceeds in a piecewise fashion, but the sum is greater than the parts as the sum often exhibits a structure of variation which allows us to squeeze more juice from the parts.

Citation: Kosova G, Pickrell JK, Kelley JL, McArdle PF, Shuldiner AR, Abney M, & Ober C (2010). The CFTR Met 470 allele is associated with lower birth rates in fertile men from a population isolate. PLoS genetics, 6 (6) PMID: 20532200

• Category: Science • Tags: Genetics, Genomics, Medical genetics 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"