The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog


Citation: Common genetic variants influence human subcortical brain structures, Nature (2015) doi:10.1038/nature14101

Here’s what we know. Intelligence, as defined by a general factor which explains variation across a range of cognitive tasks, is substantially heritable, with a narrow sense heritability on the order of 0.25 to 0.75 depending on who you talk to and in what context.* Intelligence itself exhibits correlations with other traits, ranging from those of social importance, such as education, to biological parameters, such as brain size. Additionally, the effect sizes of genetic variants associated with general intelligence are likely to be very small. This means that you should be immediately skeptical of claims that a common variant segregating in the population explains a large proportion of the variation in intelligence within the population. The history of this area of research, which goes back to linkage studies, is one of non-reproducibility. Large effect quantitative trait loci should already have been picked up by linkage studies decades ago, so I am usually rather skeptical when this old wine is presented again in a genomic guise. In short, the genetic architecture of general intelligence is likely to resemble height, with many loci of small effect.**

This is what Rietveld et al. found last fall in Common genetic variants associated with cognitive performance identified using the proxy-phenotype method. The sample sizes were on the order of 10,000 to 100,000 within this study. The top associations within this study explain less than 1% of the variation within the data. It seems likely that the largest effect alleles which influence intelligence variation are about an order of magnitude smaller in impact than those for height. A new paper in Nature, Common genetic variants influence human subcortical brain structures, looks at the morphology of the brain, synthesizing imaging, cognitive neuroscience, and genomics. Here’s the abstract:

…To investigate how common genetic variants affect the structure of these brain regions, here we conduct genome-wide association studies of the volumes of seven subcortical regions and the intracranial volume derived from magnetic resonance images of 30,717 individuals from 50 cohorts. We identify five novel genetic variants influencing the volumes of the putamen and caudate nucleus. We also find stronger evidence for three loci with previously established influences on hippocampal volume and intracranial volume. These variants show specific volumetric effects on brain structures rather than global effects across structures. The strongest effects were found for the putamen, where a novel intergenic locus with replicable influence on volume (rs945270; P = 1.08 × 10^−33; 0.52% variance explained) showed evidence of altering the expression of the KTN1 gene in both brain and blood tissue. Variants influencing putamen volume clustered near developmental genes that regulate apoptosis, axon guidance and vesicle transport. Identification of these genetic variants provides insight into the causes of variability in human brain development, and may help to determine mechanisms of neuropsychiatric dysfunction.

Paul Thompson was involved in the research, so I am confident that it was done thoroughly (and the author list is long enough that I hope they checked for obvious problems!). To correct for population stratification within this European sample they looked at the top four dimensions of variation, and used a regression model to capture other variables which might be confounded with the SNPs in question. The small proportion of variation explained actually increases my confidence, in that it seems to be in the same order of magnitude as the type of studies looking at endophenotypes.

Because of their sheer number I doubt that there’s a great short term likelihood of annotating all the genes responsible for variation in intelligence. Rather, I wonder if the ultimate goal is something similar to what occurred with statins. Find a small effect locus, and target a drug at that locus to help cure cognitive illnesses such as schizophrenia. It stands to reason that the same loci which impact general intelligence would also shape cognitive phenotypes which we term pathological.

* So if heritability in the narrow sense is 0.50 that means half the variation in intelligence in the population can be explained by variation of genes in the population. By way of comparison, height is 0.80 to 0.90 heritable in the narrow sense in the developed world. This does not mean that the correlation between parents and offspring is 0.80 or 0.90 for height. In fact the correlation is closer to 0.50 for height between parents and offspring and also between siblings.
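
The gap between a heritability of 0.80 and a parent-offspring correlation nearer 0.50 can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the post: a purely additive trait standardized to variance 1, random mating, and no shared family environment. The function names (`simulate_families`, `pearson`, `regression_slope`) are hypothetical. Under these assumptions the expected single-parent correlation is h²/2, while the regression of offspring on the midparent average recovers h² itself; real data can sit somewhat above h²/2 for reasons the model leaves out, such as assortative mating.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation, written out to avoid extra dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def regression_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    return cov / vx


def simulate_families(h2, n_families=20000, seed=1):
    """Return (single-parent correlation, midparent regression slope).

    Assumed model: phenotype = additive genetic value + environment,
    with total variance 1 and genetic variance h2.
    """
    rng = random.Random(seed)
    sd_g = h2 ** 0.5            # spread of additive genetic values
    sd_e = (1 - h2) ** 0.5      # spread of environmental deviations
    sd_seg = (h2 / 2) ** 0.5    # Mendelian segregation noise within a family
    mothers, midparents, children = [], [], []
    for _ in range(n_families):
        g_mother = rng.gauss(0, sd_g)
        g_father = rng.gauss(0, sd_g)
        p_mother = g_mother + rng.gauss(0, sd_e)
        p_father = g_father + rng.gauss(0, sd_e)
        # Child inherits the parental average plus segregation noise.
        g_child = (g_mother + g_father) / 2 + rng.gauss(0, sd_seg)
        p_child = g_child + rng.gauss(0, sd_e)
        mothers.append(p_mother)
        midparents.append((p_mother + p_father) / 2)
        children.append(p_child)
    return pearson(mothers, children), regression_slope(midparents, children)


if __name__ == "__main__":
    r_single, slope_mid = simulate_families(h2=0.8)
    print(f"single-parent correlation: {r_single:.2f}")
    print(f"midparent regression slope: {slope_mid:.2f}")
```

The point of the sketch is only that a child shares half of each parent's additive genetic value, so the correlation with any one parent is capped well below the heritability.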

** An alternative minority viewpoint is many rare alleles of somewhat larger effect.

• Category: Science • Tags: Genomics, GWAS, IQ 

Razib’s daughter’s ancestry composition

An F1, r = 0.5 to Razib

Genome-wide associations are rather simple in their methodological philosophy. You take cases (affected) and controls (unaffected) of the same genetic background (i.e. ethnically homogeneous) and look for alleles which diverge greatly between the two pooled populations. Visually the risk alleles, which exhibit higher odds ratios, are represented via Manhattan plots. But please note the clause: ethnically homogeneous study populations. In practice this means white Europeans, and to a lesser extent East Asians and African Americans (the last because the biomedical industrial complex in the United States performs many GWAS, and the USA is a diverse nation). Looking within ethnic groups eliminates many false positives one might obtain due to population stratification. Basically, alleles which differ between groups because of their history may produce associations when the groups themselves differ in the propensity of the trait of interest (e.g. hypertension in blacks vs. whites).
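
The stratification problem described above can be made concrete with a toy simulation. This is a sketch under invented assumptions, not a description of any real study: two equally sized groups, an allele at frequency 0.2 in one and 0.8 in the other, and disease prevalence of 10% versus 30% that is independent of genotype within each group. All names and numbers are hypothetical. The allele has no effect on disease at all, yet pooling the groups produces an odds ratio well above 1.

```python
import random


def simulate_group(rng, n, allele_freq, prevalence):
    """Return (allele_count, is_case) pairs; disease ignores genotype."""
    people = []
    for _ in range(n):
        # Two independent draws give 0, 1, or 2 copies of the allele.
        copies = (rng.random() < allele_freq) + (rng.random() < allele_freq)
        is_case = rng.random() < prevalence
        people.append((copies, is_case))
    return people


def allelic_odds_ratio(people):
    """Odds of carrying the allele in cases divided by the odds in controls."""
    case_alt = sum(a for a, c in people if c)
    case_ref = sum(2 - a for a, c in people if c)
    ctrl_alt = sum(a for a, c in people if not c)
    ctrl_ref = sum(2 - a for a, c in people if not c)
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def stratification_demo(n_per_group=20000, seed=3):
    rng = random.Random(seed)
    group_a = simulate_group(rng, n_per_group, allele_freq=0.2, prevalence=0.1)
    group_b = simulate_group(rng, n_per_group, allele_freq=0.8, prevalence=0.3)
    return (
        allelic_odds_ratio(group_a + group_b),  # pooled, stratification ignored
        allelic_odds_ratio(group_a),            # within group A only
        allelic_odds_ratio(group_b),            # within group B only
    )


if __name__ == "__main__":
    pooled, within_a, within_b = stratification_demo()
    print(f"pooled OR: {pooled:.2f}")
    print(f"within-group ORs: {within_a:.2f}, {within_b:.2f}")
```

Within each group the odds ratio hovers around 1, which is why studies restrict themselves to homogeneous samples or adjust for ancestry.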

But this raises the question: how generalizable are GWAS, and therefore portable across ethnicities? This is not a trivial question for someone like me, as South Asians tend to be understudied for natural reasons (there aren’t that many of us in the West, and funding for this sort of thing is not viable in Third World nations where most South Asians live). Not only are South Asians understudied, but we tend to have large genetic distances within the putative population, so I’m not even sure that GWAS from the HapMap Gujarati samples would be applicable to me (the genetic distance between South Asian ethnic groups is actually greater than between Europeans and some West Asians). And then there is the question of people of mixed heritage. Is there really a possibility in the near future of GWAS of various F1 combinations, let alone backcrosses like Reiko Aylesworth?

Fortunately, from where I stand it seems that most GWAS being reported today are portable across ethnicities, so we don’t have to go reinventing every wheel. Some of the evidence is plain to see in a new PLoS GENETICS paper, High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants. Here is the abstract:

Describing and identifying the genetic variants that increase risk for complex diseases remains a central focus of human genetics and is fundamental for the emergent field of personalized medicine. Over the last six years, GWAS have revolutionized the field, discovering hundreds of disease loci. However, with only a handful of exceptions, the causal variants that generate the associations unveiled by GWAS have not been identified, and their frequency and degree of sharing across populations remains unknown. Here, we present a comprehensive comparison of GWAS results designed to try to understand the nature of causal variants. By examining the results of GWAS for 28 diseases that have been performed with peoples of European, East Asian, and African ancestries, we conclude that a large fraction of associations are caused by common causal variants that should map relatively close to the associated markers. Our results indicate that many of the disease risk variants discovered by GWAS are shared across Eurasians.

I want to stipulate that my own views on this matter do not hinge on just this paper. Nor do I believe that there is no regional heterogeneity in the genomic architecture of disease risk alleles. Rather, as a prior I now would contend that when looking at the odds ratios for a relatively large effect allele in Europeans, for Eurasians at least one shouldn’t be excessively skeptical of transferring the inference toward other populations. In the paper the authors report that when accounting for differences in statistical power (European studies tend to have much larger sample sizes, and so can catch more variants) there is a decent replicability of GWAS. Additionally, there is the possibility that some non-replications are due to the fact that the GWAS are focusing on marker SNPs, rather than causal SNPs, and the marker associations are not portable across populations even if the causal ones are. Remember, current GWAS utilizing SNP-chips are often focusing on a genomic region, more than a particular SNP as such. This is why you may get strong GWAS signals in noncoding regions.

Of course there are going to be rare variants which are less portable, and as genomics scales up in population sample size and deep whole genome analyses we’re going to be plumbing private alleles. But until then there’ll be a mountain of common variants of diverse effect sizes, and that information needn’t be discarded when one considers populations outside of the study’s purview. When viewing odds ratios in 23andMe there’s always the caveat that “results X for Europeans.” This is to be expected for a business. And in terms of medical actions one still needs to be cautious. But to the question of how seriously to take GWAS performed in Europeans if you are not European? If you are non-African, I’d say moderately seriously. If you are an African, I’d probably still say somewhat seriously.

Citation: Marigorta, Urko M., and Arcadi Navarro. “High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants.” PLoS Genetics 9.6 (2013): e1003566.


Last week Luke Jostins (soon to be Dr. Luke Jostins) published an interesting paper in Nature. To be fair, this paper has an extensive author list, but from what I understand this is the fruit of the first author’s Ph.D. project. In any case, you may know Luke because I have used his loess curve on hominin encephalization for years. His bread & butter is statistical genetics, and it shows in this Nature paper. God knows how he managed to cram so much density into ~5.5 pages of plain text. Luke is also a contributor to Genomes Unzipped, and has put up a post over there on one implication of the paper, Dozens of new IBD genes, but can they predict disease? The short answer is that for individual prediction complex traits are going to be a hard haul over the long term.*

They are subject to what Jim Manzi would term “high causal density.” A simple way to state this is that outcome X is dependent on a host of variables, and if you capture only a small number of variables, you aren’t going to be explaining much in a general fashion. This is obvious from the text of Luke’s paper. Let’s look at the abstract, Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease:

Crohn’s disease and ulcerative colitis, the two common forms of inflammatory bowel disease (IBD), affect over 2.5 million people of European ancestry, with rising prevalence in other populations…Genome-wide association studies and subsequent meta-analyses…have implicated previously unsuspected mechanisms…Here we expand on the knowledge of relevant pathways by undertaking a meta-analysis of Crohn’s disease and ulcerative colitis genome-wide association scans, followed by extensive validation of significant findings, with a combined total of more than 75,000 cases and controls. We identify 71 new associations, for a total of 163 IBD loci, that meet genome-wide significance thresholds. Most loci contribute to both phenotypes, and both directional (consistently favouring one allele over the course of human history) and balancing (favouring the retention of both alleles within populations) selection effects are evident. Many IBD loci are also implicated in other immune-mediated disorders, most notably with ankylosing spondylitis and psoriasis. We also observe considerable overlap between susceptibility loci for IBD and mycobacterial infection. Gene co-expression network analysis emphasizes this relationship, with pathways shared between host responses to mycobacteria and those predisposing to IBD.

The numbers tell the tale here. This is a massive GWAS, with ~75,000 cases and controls. And yet what does that gain us? I’ll let the text speak here: “We have increased the total disease variance explained (variance being subject to fewer assumptions than heritability) from 8.2% to 13.6% in Crohn’s disease and from 4.1% to 7.5% in ulcerative colitis.” This is not trivial. But it is exactly the kind of incremental increase in knowledge that systems characterized by high causal density will yield, even granting herculean efforts at data collection. I believe that studies like this, with “best-of-breed” methods, are important, because cohorts of tens of thousands, and perhaps hundreds of thousands, are not going to be unusual in the near future. The hope is that geneticists keep pushing the boulder up the hill, ever so slightly.

If not individual prediction, then is there another value to this sort of work? First, one can still generate drug discovery from small genetic effects. And a major aspect of the paper above is that the authors are localizing classes of genes likely to be implicated in these illnesses. Not only that, they report that many of the pathogenic variants may not be SNPs, but structural variants of some sort. In other words, massively scaled up GWAS holds not the promise of individual prediction, but a fuller and better systematic knowledge of the human organism in the aggregate.

Finally, there is one aspect of the paper which jumped out at me because I’m not a practical person with biomedical interests first and foremost. Jostins et al. report that many of these loci seem to be subject to either directional or balancing selection. The latter is not unexpected to me. Many of the loci have immunological associations, and host-pathogen coevolution is assumed to be governed by negative frequency dependence. In other words, when slow reproducing organisms develop an effective anti-pathogen strategy, the pathogens adapt very quickly. But at this point the lower frequency strategies are now more fit, and effective against the pathogens, who are localized on a narrow adaptive peak.

But what about directional selection? My working assumption here is that high density living and the protean conditions of the post-hunter-gatherer world have reshaped the genome of most humans a great deal. Now recall that immediate adaptations often have deleterious consequences. They’re kludges. When a problem is confronting you, you reach for the closest and easiest solution, even if in the infinite space of possibilities there are more optimal solutions. You don’t have the time, energy, or choice, frankly. For what it’s worth Crohn’s is more frequent in Ashkenazi Jews in relation to the population wide average (though one can posit environmental rationales for this; there’s high causal density popping up again!).

The moral of the story is that many complex traits and diseases may simply be the wages of adaptation itself. Even in an environmentally unperturbed context it is difficult to imagine a situation where endemic host-pathogen coevolution wouldn’t result in fluctuations in gene frequencies which might have deleterious consequences. This may be the best of all worlds, though all the most optimal worlds may be characterized by a familiar mediocrity in physiological fitness.

Citation: doi:10.1038/nature11582

* IBD here = inflammatory bowel disease, not identical by descent!

• Category: Science • Tags: Genetics, Genomics, GWAS, Human Genetics, Human Genomics 

In science, like most things, one prefers simple over complex whenever possible. You keep adding variables until the explanatory juice starts hitting diminishing marginal returns. So cystic fibrosis is due to a mutation at one gene, and the disease expresses recessively at that locus. The reality is that one mutation accounts for ~65-70% of cystic fibrosis cases around the world, and there are nearly 1,400 known mutations on the CFTR locus. How about skin color? Mutations on a dozen genes can probably explain ~90% of the variance in the trait value across the world between populations. In fact, one single mutation on one base pair can explain ~30-40% of the trait value difference between Europeans and Africans. This is a more complex story than cystic fibrosis; you have not just many mutations, but many mutations across many genes. But the number of genes and mutations is manageable. You can keep track of most of them in your head (e.g., I can tell you that SLC24A5, SLC45A2, KITLG, and HERC2 can explain most of the trait value difference between Africans and Europeans without looking it up). Now think about something like height. The only gene I can think of off the top of my head is HMGA2. With obesity I know FTO. The reason is that there’s a veritable alphabet soup of genes which pop out of the numerous studies focusing on these traits. But the reality is that it seems possible that there are many genes which harbor variants of small effect size which in totality account for the range of the trait value. Abstractly this isn’t really that much more complex than the models above. You can imagine it as a concrete instantiation of the central limit theorem. But in practice it does change things when you can’t focus on one gene, or a few genes, but have to understand that there exists a huge class of genetic causes which modulate the expression of the phenotype.
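
The central limit theorem remark can be illustrated with a short simulation. This is a minimal sketch with made-up parameters: each locus gets an allele frequency drawn uniformly between 0.1 and 0.9 and an additive effect drawn from a standard normal, and the trait is simply the sum over loci. None of these choices come from real data, and the function names are hypothetical. With one locus the trait takes at most three values; with a couple of hundred loci the share of people within one standard deviation of the mean approaches the roughly 68% expected of a bell curve.

```python
import random


def simulate_trait(n_loci, n_people=3000, seed=7):
    """Trait value = sum over loci of (effect size x allele copies)."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_loci)]   # assumed frequencies
    effects = [rng.gauss(0, 1) for _ in range(n_loci)]       # assumed effect sizes
    traits = []
    for _ in range(n_people):
        value = 0.0
        for p, beta in zip(freqs, effects):
            # Two independent draws give 0, 1, or 2 copies of the allele.
            copies = (rng.random() < p) + (rng.random() < p)
            value += beta * copies
        traits.append(value)
    return traits


def share_within_one_sd(values):
    """Fraction of values lying within one standard deviation of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(abs(v - mean) <= sd for v in values) / n


if __name__ == "__main__":
    single = simulate_trait(n_loci=1)
    polygenic = simulate_trait(n_loci=200)
    print("distinct trait values, 1 locus:", len(set(single)))
    print(f"share within 1 SD, 200 loci: {share_within_one_sd(polygenic):.2f}")
```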

We’ve reached a stage where the mapping from genotype to phenotype is getting a bit on the baroque side. We have come to confront and wrestle with ‘genetic architecture.’ Here’s what Wikipedia says about this term:

Genetic architecture refers to the underlying genetic basis of a phenotypic trait. A synonymous term is the ‘genotype-phenotype map’, the way that genotypes map to the phenotypes.

The genotype-phenotype map has been analyzed in terms of several principal axes: epistasis, polygeny, pleiotropy, quasi-continuity, modularity, phenotypic plasticity, robustness, and evolvability.

And it gets more complicated. Epistasis comes in different flavors. As for the polygenic traits, they also exhibit differences. Pigmentation seems to be a trait where there really are common variants of very large effect. In contrast, for height, obesity, schizophrenia, and I.Q., no one has found them yet if they exist. So polygeny itself has many shades. Combine pleiotropy, the effect of one gene on multiple traits, with polygeny and epistasis, and the tangle of abstraction gets intractable very quickly.

This is why the arguments about synthetic associations can be difficult to unpack. Not only do you have the old problems with complex genetic architectures, but you also have to keep track of concepts such as linkage disequilibrium as well as a model of the physical embodiment of genetic information in the chromosome. Alas, we’re way past the “spherical cow” phase of simplifying for purposes of intelligibility.

So why does this matter? It’s about the “missing heritability”. We know that height is ~80-90% heritable in developed societies. If you are adopted your height is going to correlate with your birth parents, not your adoptive parents. But very little of the variance in height can be accounted for by genes detected in linkage or genome-wide association studies (GWAS). Neither of these techniques has the power to pick out thousands of alleles of small effect. Linkage is good at detecting rare large effect variants (usually in families), while GWAS picks up more modest effect but common variants (usually in study samples of the same ethnicity).

Unfortunately GWAS hasn’t been that effective in accounting for much of the variation which we see around us. Old fashioned quantitative genetics using statistical techniques based on family relationships is still a better bet for many traits and diseases (e.g., I have a family history of type 2 diabetes, but 23andMe gives me no greater risk). A group last year suggested a solution to the conundrum of why GWAS wasn’t picking up most of the genetic variation: synthetic associations. Let me jump to their author summary:

It has long been assumed that common genetic variants of modest effect make an important contribution to common human diseases, such as most forms of cardiovascular disease, asthma, and neuropsychiatric disease. Genome-wide scans evaluating the role of common variation have now been completed for all common disease using technology that claims to capture greater than 90% of common variants in major human populations. Surprisingly, the proportion of variation explained by common variation appears to be very modest, and moreover, there are very few examples of the actual variant being identified. At the same time, rare variants have been found with very large effects. Now it is demonstrated in a simulation study that even those signals that have been detected for common variants could, in principle, come from the effect of rare ones. This has important implications for our understanding of the genetic architecture of human disease and in the design of future studies to detect causal genetic variants.

To understand the logic, you need to recall that the SNP which is reported in a GWAS may not be the causal variant. In other words the SNP is just a marker which is near the real genetic cause, but is associated closely enough that the correlation is such that you can substitute the two in terms of their presence for purposes of predicting trait value. This has cropped up as a major issue with the genetics of blue eyes. This is a ‘quasi-Mendelian’ trait. It looks like most of the variation in Europeans is due to differences in the genomic region spanning the nearby genes HERC2 and OCA2, but different studies report different SNPs and haplotypes as diagnostic. It is unlikely that all of these markers are causal, so most of them are just strongly correlated with the true functional variant.

Because of recombination, where chromosomal regions cross over and swap partners, these sorts of associations break down over time. So linkage disequilibrium, where genetic variants (alleles) across loci (genes) exhibit non-random statistical associations, varies over time as the correlations decay due to recombination. Synthetic associations are hypothesized to be cases where very low frequency large effect variants are associated with a more common variant, the latter of which shows up in a GWAS as the associated signal with the trait. Because the correlation between the causal variant and more common variant is going to be imperfect one will only explain a small proportion of the variance (if allele 1 at locus A has frequency ~0.001 and allele 1 at locus B has frequency ~0.20, their association has to be less than 1 because the latter so outnumbers the former in terms of copies). Additionally, there may also be several low frequency causal variants associated with the common marker.
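
The parenthetical about frequencies of ~0.001 and ~0.20 can be turned into a number. This is a sketch using the standard squared-correlation measure of linkage disequilibrium, r² = D² / (pA(1−pA)pB(1−pB)); the function names are hypothetical. The best case for the marker is that every copy of the rare allele sits on a chromosome carrying the common allele, which gives an upper bound on r². For a single tagged causal variant, the variance the marker appears to explain is roughly the causal variant's share scaled by r².

```python
def max_r2(p_rare, p_common):
    """Upper bound on r^2 between a rare allele and a more common allele.

    Assumes the rare allele is found only on chromosomes that carry the
    common allele, so D = p_rare * (1 - p_common).
    """
    if not 0 < p_rare <= p_common < 1:
        raise ValueError("expected 0 < p_rare <= p_common < 1")
    return p_rare * (1 - p_common) / (p_common * (1 - p_rare))


def marker_variance_explained(causal_variance_explained, r2):
    """Variance attributed to the marker when it tags one causal variant."""
    return causal_variance_explained * r2


if __name__ == "__main__":
    bound = max_r2(p_rare=0.001, p_common=0.20)
    print(f"maximum r^2: {bound:.4f}")
    print(f"maximum |r|: {bound ** 0.5:.3f}")
    # Hypothetical: a rare variant explaining 5% of trait variance.
    print(f"marker explains at most: {marker_variance_explained(0.05, bound):.5f}")
```

With the frequencies in the text the bound on r² is about 0.004, so even in the best case the common marker reflects only a sliver of whatever the rare variant does.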

In other words, the missing heritability isn’t very missing at all. The GWAS are picking up genuine signals, only dampened because of the imperfect correlations between the high frequency marker and the low frequency causal variant. This has practical implications:

…The distance over which synthetic associations occur also offers an alternative explanation to the increasingly common observation of rare variants that occur within the vicinity of a GWAS signal but cannot explain that signal entirely. A simple explanation for such observations is that extending the sequencing to at least 4 Mb and ideally up to 10 Mb around the GWAS signal would pick up other rare variants. In some cases, identifying all the contributing rare variants may explain all of the original signal, whereas in other cases, there could be a combination of rare and common variants contributing. In addition, if synthetic associations are responsible for many of the observed signals, then sequencing in a small number of control samples (even over a much broader genomic region) is also unlikely to succeed. Under our model, the causal sites are both rare and relatively high-penetrant contributors to disease, and will therefore be unlikely to be detected in a small number of control samples. Finally, the focus of attention on genes that are near GWAS signals may be incomplete or misleading in that the actual causal sites may occur in many different genes surrounding the implicated common variant. It is also worth emphasizing that as few as one or two rare variants, at much lower frequency than the associated common SNP, can create a significant synthetic association. In such a case, sequencing a small number of cases that carry the “at risk” common variant might miss entirely the causal rare variants even if the correct genome region is resequenced. These considerations argue for caution in efforts to resequence around genome-wide associations and argue instead that genome-wide sequencing in carefully phenotyped cohorts might be a better use of resources.

One of the papers rebutting the one above (Rare Variants Create Synthetic Genome-Wide Associations) will be covered at Genomes Unzipped. So let’s look at the other one, Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results. Frankly I found the paper hard going. The basic units of each section are intelligible, but recalling them as a coherent whole is not as easy. Part of the reason is that they take the simulations of the Dickson et al. paper, and raise them one. And simulations are to some extent “black-boxes,” at least unless you replicate them and get a feel for how modulating the parameters tweaks the outcomes.

First they explored how varying the number of rare causal variants associated with a common associated SNP would affect the distribution of frequencies of the latter, and how they compared to the empirical distribution detected. What’s interesting here are panels A and D, E, and F. The first just shows the distribution of frequencies of detected SNPs in GWAS. They go from 0 to 1. D, E, and F simply show you the expected frequencies of the associated allele with the rare causal variants for a given number k of variants: 1, 9, and 18, respectively. What you see is that for synthetic associations the distribution of variants associated with the rare causal SNPs should skew toward the lower end. Also, they found that irrespective of the number k of variants the associated SNP only explained ~10% of the trait variance. Finally, they also suggested that the effect size of the rare variants would have to be very large indeed for the GWAS to pick up the associated SNP. This is a problem since there’s only so much variance to go around. And it raises the question: if the variants are of such large effect why didn’t linkage studies pick any of them up? Speaking of large effect, once you start adding up k variants to a locus you begin to narrow the regions of the genome in which causal variants can concentrate. The authors indicate that such clustering within the genome is simply not found, another argument against numerous synthetic associations.

Next they looked at results from schizophrenia research, and attempted to see how it mapped onto the predictions entailed by a synthetic association model. The top panel shows the observed data. Not quite a uniform distribution, but there are rare variants, and common variants, and variants in the mid-range frequency. The bottom panel shows simulated results using the synthetic model. As expected you see a skew toward rare alleles, and a deviation from what is observed. Additionally, they note that they ran the simulations with a lot of different parameters, and those that included common variant alleles always tended to have a better fit with the realized results than the synthetic model predicated on rare alleles of large effect size.

The short of it is that the authors conclude that the model outlined last year simply does not fit the empirical results very well. They do not deny the existence of possible synthetic associations, but they seem to suggest that this variety of associations is not that important in explaining the missing heritability. Additionally, they note that rare alleles of large effect should not span populations, since they are likely to be evolutionarily novel. But recent work in fact suggests that risk alleles in one population are highly portable to another population. So genetic architecture may not matter as much as we suspected when it comes to inter-population difference.

Why is this important? Money and time, which are both finite:

Empirical observation suggests that much of the missing heritability is contributed by causal variants (including loci comprising multiple rare variants) having effect size too small to be detected with stringent statistical significance…Larger samples for GWAS are needed to detect these which would directly compete with research funds used in sequencing studies. …Genes identified through GWAS harbouring common variants are likely to be good targets for identification of rare variants and for sorting the wheat from the chaff in next generation sequencing studies. We expect that continued GWAS will make valuable contributions to our understanding of many complex traits and will, for some time, remain as one important tool in a growing set of technologies to probe the full spectrum of genetic variation efficiently.

At the end of the day I’m interested in evolution. But to understand evolution you need to understand the genetic architecture of the traits which are the targets of natural selection. I’ve only skimmed the paper, so I really recommend you read the original for the “blood & guts.” Actually, read it a few times!

Also, please see David Goldstein’s response. I felt he was rather cordial, given the rather forceful tone of the two papers which challenged the one that came out of his lab.

Citation: Wray NR, Purcell SM, & Visscher PM (2011). Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results PLoS Biology : 10.1371/journal.pbio.1000579

Image Credit: Sailko.


PLoS Biology has four items of great interest out today:

Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results
Synthetic Associations Are Unlikely to Account for Many Common Disease Genome-Wide Association Signals
The Importance of Synthetic Associations Will Only Be Resolved Empirically
Common Disease: Are Causative Alleles Common or Rare?

These are a response to last year’s paper on synthetic associations from the Goldstein lab. Here’s a critique of that paper. I plan on reviewing the first in the list above soon. #3 is a response to #1 and #2 from David Goldstein, while #4 is a summation aimed more at the general audience.

• Category: Science • Tags: Genetics, Genomics, GWAS, Synthetic Associations 

I recall projections in the early 2000s that 25% of the American population would be employed as systems administrators circa 2020 if rates of employment growth at that time were extrapolated. Obviously the projections weren’t taken too seriously, and the pieces were generally making fun of the idea that IT would reduce labor inputs and increase productivity. I thought back to those earlier articles when I saw a new letter in Nature in my RSS feed this morning, Hundreds of variants clustered in genomic loci and biological pathways affect human height:

Most common human traits and diseases have a polygenic pattern of inheritance: DNA sequence variants at many genetic loci influence the phenotype. Genome-wide association (GWA) studies have identified more than 600 variants associated with human traits1, but these typically explain small fractions of phenotypic variation, raising questions about the use of further studies. Here, using 183,727 individuals, we show that hundreds of genetic variants, in at least 180 loci, influence adult height, a highly heritable and classic polygenic trait2, 3. The large number of loci reveals patterns with important implications for genetic studies of common human diseases and traits. First, the 180 loci are not random, but instead are enriched for genes that are connected in biological pathways (P = 0.016) and that underlie skeletal growth defects (P < 0.001). Second, the likely causal gene is often located near the most strongly associated variant: in 13 of 21 loci containing a known skeletal growth gene, that gene was closest to the associated variant. Third, at least 19 loci have multiple independently associated variants, suggesting that allelic heterogeneity is a frequent feature of polygenic traits, that comprehensive explorations of already-discovered loci should discover additional variants and that an appreciable fraction of associated loci may have been identified. Fourth, associated variants are enriched for likely functional effects on genes, being over-represented among variants that alter amino-acid structure of proteins and expression levels of nearby genes. Our data explain approximately 10% of the phenotypic variation in height, and we estimate that unidentified common variants of similar effect sizes would increase this figure to approximately 16% of phenotypic variation (approximately 20% of heritable variation). 
Although additional approaches are needed to dissect the genetic architecture of polygenic human traits fully, our findings indicate that GWA studies can identify large numbers of loci that implicate biologically relevant genes and pathways.
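The abstract switches between two denominators, phenotypic variation and heritable variation, which is easy to trip over. Here is a minimal sketch of the arithmetic, assuming the roughly 80% heritability of height cited in the paper's conclusion; the variable and function names are my own, purely for illustration.

```python
# Figures as quoted from the paper; names here are illustrative only.
heritability = 0.80          # estimated genetic contribution to variation in height
explained_now = 0.10         # phenotypic variation explained by the identified variants
explained_projected = 0.16   # projected figure once similar-effect common variants are found


def share_of_heritable(phenotypic_fraction, h2):
    """Convert a fraction of phenotypic variance into a fraction of heritable variance."""
    return phenotypic_fraction / h2


now = share_of_heritable(explained_now, heritability)
projected = share_of_heritable(explained_projected, heritability)
still_missing = 1 - projected

print(f"Heritable variation explained now:  {now:.1%}")
print(f"Heritable variation, projected:     {projected:.1%}")
print(f"Heritable variation still 'missing': {still_missing:.1%}")
```

So 10% of phenotypic variation works out to about 12.5% of the heritable variation, and the projected 16% to about 20%, which matches the parenthetical in the abstract.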

The supplements run to nearly 100 pages, and the author list is enormous. But at least the supplements are free to all, so you should check them out. There are a few sections of the paper proper that are worth passing on though if you can’t get beyond the paywall.

In this study they pooled together several studies into a meta-analysis. One thing not mentioned in the abstract: they checked their GWAS SNPs against a family-based study. This was important because in the latter population stratification isn’t an issue. Family members naturally overlap a great deal in their genetic background. Also, if I read it correctly they’re focusing on populations of European origin, so this might not capture larger effect alleles which impact between-population variance in height but don’t vary within a given population (note that if you explored pigmentation genetics just through Europeans you would miss the most important variable on the worldwide scale, SLC24A5, because it’s fixed in Europeans). In any case, as you can see in the figure, what they did was extrapolate the number of loci which their methods could capture, with sample size as the predictor. At 500,000 individuals they’re at ~700 loci, and around 20% of the heritable variation. My initial thought is that I’m not seeing diminishing returns here, but since I haven’t read the supplements I’ll let that pass since I don’t know the guts of this anyhow. They do assert that they are likely underestimating the power of these methods because there may be smaller effect common variants which can top off the fraction.

But even they admit that they can go only so far. Here are some sections from the conclusion that lay it out pretty clearly:

By increasing our sample size to more than 100,000 individuals, we identified common variants that account for approximately 10% of phenotypic variation. Although larger than predicted by some models26, this figure suggests that GWA studies, as currently implemented, will not explain most of the estimated 80% contribution of genetic factors to variation in height. This conclusion supports the idea that biological insights, rather than predictive power, will be the main outcome of this initial wave of GWA studies, and that new approaches, which could include sequencing studies or GWA studies targeting variants of lower frequency, will be needed to account for more of the ‘missing’ heritability. Our finding that many loci exhibit allelic heterogeneity suggests that many as yet unidentified causal variants, including common variants, will map to the loci already identified in GWA studies, and that the fraction of causal loci that have been identified could be substantially greater than the fraction of causal variants that have been identified.

In our study, many associated variants are tightly correlated with common nsSNPs, which would not be expected if these associated common variants were proxies for collections of rare causal variants, as has been proposed27. Although a substantial contribution to heritability by less common and/or quite rare variants may be more plausible, our data are not inconsistent with the recent suggestion28 that many common variants of very small effect mostly explain the regulation of height.

In summary, our findings indicate that additional approaches, including those aimed at less common variants, will likely be needed to dissect more completely the genetic component of complex human traits. Our results also strongly demonstrate that GWA studies can identify many loci that together implicate biologically relevant pathways and mechanisms. We envisage that thorough exploration of the genes at associated loci through additional genetic, functional and computational studies will lead to novel insights into human height and other polygenic traits and diseases.

The second to last paragraph takes a shot at David Goldstein’s idea of synthetic associations.

We’re still where we were a few years back, though: old-fashioned Galtonian quantitative genetics, a branch of statistics, is the best bet to predict the heights of your offspring. As with intelligence, “height genes” are not improvements upon common sense. But if you’re going into the 10-20% range of variation explained it’s certainly not trivial, and the biological details are going to be of interest.
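To make the Galtonian point concrete, here is a rough sketch of the textbook midparent regression: expected offspring height is the population mean plus the heritability times the midparent deviation from that mean. This is a simplification under stated assumptions, not anything from the paper. It assumes heights are already adjusted to a common scale across the sexes, a single population mean, and a narrow-sense heritability of 0.8 (the figure the paper cites). The function name and the example numbers are hypothetical.

```python
def predict_offspring_height(father_cm, mother_cm, pop_mean_cm, h2=0.8):
    """Expected adult height of offspring from midparent regression.

    Assumes both parental heights are on a common, sex-adjusted scale
    and that h2 is the narrow-sense heritability in this population.
    """
    midparent = (father_cm + mother_cm) / 2
    # Offspring regress toward the population mean by a factor of (1 - h2).
    return pop_mean_cm + h2 * (midparent - pop_mean_cm)


# Hypothetical example: both parents 10 cm above an assumed mean of 170 cm.
expected = predict_offspring_height(180.0, 180.0, pop_mean_cm=170.0)
print(f"Expected offspring height: {expected:.1f} cm")
```

With those made-up inputs the expectation is 8 cm above the mean rather than 10, which is regression toward the mean in action. No genotyping is required, which is the point.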


It looks like Genomes Unzipped has their own Mortimer Adler, with an excellent posting, How to read a genome-wide association study. For those outside the biz I suspect that #4, replication, is going to be the easiest. In the early 2000s a biologist who’d been in the business for a while cautioned against reading too much into sexy early association results, since the same thing had happened when linkage studies were all the vogue and replication was not to be. Goes to show that history of science can be useful on a very pragmatic level. It can give you a sense of perspective on the evanescent impact of some techniques over the long run.

• Category: Science • Tags: Genome-wide Association, Genomics, GWAS 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"