The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
David Goldstein


Over at Genetic Future, Dr. Daniel MacArthur has a measured response to a Nature commentary by David Goldstein, Growth of genome screening needs debate. As Dr. MacArthur notes, an excessive portion of Goldstein’s piece is taken up with inferences derived from assuming that the model of rare variants causing most diseases is correct, when that is an issue currently in scientific contention (and this is a debate where Goldstein is a primary player on one side). But the last two paragraphs of the piece are where the real action is, no matter the details of the genetic architecture of diseases:

One potential problem with this is that numerous genetic risk factors will have diverse and unexpected effects — sometimes causing disease, sometimes being harmless and sometimes perhaps being associated with behaviours or characteristics that society deems positive. Even for simpler Mendelian diseases, up to 30% of the mutations originally termed pathogenic have turned out to be apparently harmless…Wholesale elimination of variants associated with disease could end up influencing unexpected traits — increasing the vulnerability of populations to infectious diseases, for instance, or depleting people’s creativity.

There are no clear-cut answers to the questions of what should be screened for and to what end, but we must at least begin the debate.

There’s nothing that I can find objectionable at all about the last sentence. We’re really close to the “we have the technology” moment. Seeing as how more and more of the higher socioeconomic strata are delaying having children, you are going to have a phenomenon where those with disposable income are going to “want to make it count,” so to speak. This is especially so since children born to older parents already have a higher likelihood of having medical issues.

The specific point which Dr. MacArthur brings to light is that we need to balance social consequences with individual incentives. Consider something which is more clear-cut than diseases: sex selection. Intuitively we understand that humans flourish best in circumstances when the ratio between the sexes is balanced. Imbalances tend to lead to a skewing of social dynamics. But in many societies there are strong incentives toward having children of one particular sex. Therefore the rational calculus at the individual level leads to a choice which becomes totally irrational once everyone else makes the same choice. I have pointed out before that the male sex bias in both Japan and Korea has shifted toward a female sex bias (first in Japan, and now in Korea). So just like biological sex ratios, cultural pressures tend toward equilibrium. Eventually. But that equilibration may take a generation, and between now and then the social phenomena which we are confident will eventually fade may not be so positive, whether in the individual happiness of the excess sex, or in the aggregate functioning of a well balanced society. An average stable point can still manifest as uncomfortable swings and transitions for human beings living their lives.

In relation to preimplantation genetic diagnosis (PGD) the moral and ethical dilemma is somewhat different. Overall there is a boundary condition where most might agree that genetic screening of some sort is preferred. The issue is that when price points decrease it is inevitable that instead of preventing Tay-Sachs, parents will want to prevent the birth of a brown eyed child. The latter is a pretty plausible candidate for selection; readers of this weblog who have brown eyes but one blue eyed parent have expressed the wish to load the dice so that their children might be homozygote for the allele which tends* to produce blue eyes. Or, going to an example which is less Eurocentric, there are large effect genes segregating within South Asian populations which are responsible for the great range in complexion within populations and families. It is entirely plausible that South Asian parents, who are already major practitioners of sex selection, will be open to engaging in diagnostic screening so that biological children are as “fair and lovely” as possible out of the potential range (skin color has huge life implications in India, especially for women, so sex selection and complexion selection may actually have opposite effects in that the latter may diminish the “need” for the former).

I jumped straight to cosmetic issues because I’m rather skeptical that governments or cultural elites will be able to prevent a lot of discretionary genetic screening for possible disease alleles. Unless we mandate that the whole society raises children and is responsible for their needs through mass transfer payments in a cradle-to-grave welfare state, I think the demand will be strong enough that any debate of ethical concerns will be rendered moot and pushed to the margins of the anti-biotech movements of the Right and Left. A more interesting issue to me is the implication by Goldstein that we “need” genetic diversity, and that reducing it might have negative byproducts for creativity.

There are ~7 billion people alive today. In raw absolute terms we’ve got a lot more summed up genetic variation than we did 100 years ago, let alone 1,000 years ago. Is there no point of diminishing marginal returns on absolute variation levels? In other words I suspect that in terms of creativity and the downside risks of removing some of the positive externalities of “oddballs” who are weird and unexpected we’ve got a lot of slack with our huge population. There are always going to be large groups of people who will refuse to manipulate the nature of their offspring, or constrain the parameters excessively. Large perhaps not in a proportional sense, but we’ve got a huge census size now. My own suspicion is that there are limits to how many creative types a society can absorb. Most people are going to be more conventional, or going to have to be more conventional, because much economic and social productivity is driven by workaday behaviors. PGD is going to “perfect” these workaday types. I don’t see a problem with that. There will be huge numbers of Leftist Deep Ecology types and Rightist Roman Catholics who will let nature or god decide for them. These will be the cultural creatives if deviation from the genetic ideal is strongly correlated with creativity.**

* ~75% of the variation in the European population in blue vs. brown eyes is accounted for by a few SNPs around the HERC2-OCA2 locus…but, the prediction algorithm isn’t perfect, so a parent might not get their heart’s desire.

** With widespread whole genome sequencing I’m assuming we’ll actually have time to see if this is true. That is, genetic oddballs are more creative.


In science, like most things, one prefers simple over complex whenever possible. You keep adding variables until the explanatory juice starts hitting diminishing marginal returns. So cystic fibrosis is due to a mutation at one gene, and the disease expresses recessively at that locus. The reality is that one mutation accounts for ~65-70% of cystic fibrosis cases around the world, and there are ~1,400 known mutations on the CFTR locus. How about skin color? Mutations on a dozen genes can probably explain ~90% of the variance in the trait value across the world between populations. In fact, one single mutation on one base pair can explain ~30-40% of the trait value difference between Europeans and Africans. This is a more complex story than cystic fibrosis; you have not just many mutations, but many mutations across many genes. But the number of genes and mutations is manageable. You can keep track of most of them in your head (e.g., I can tell you that SLC24A5, SLC45A2, KITLG, and HERC2 can explain most of the trait value difference between Africans and Europeans without looking it up). Now think about something like height. The only gene I can think of off the top of my head is HMGA2. With obesity I know FTO. The reason is that there’s a veritable alphabet soup of genes which pop out of the numerous studies focusing on these traits. The reality is that it seems possible that there are many genes which harbor variants of small effect size which in totality account for the range of the trait value. Abstractly this isn’t really that much more complex than the models above. You can imagine it as a concrete instantiation of the central limit theorem. But in practice it does change things when you can’t focus on one gene, or a few genes, but have to understand that there exists a huge class of genetic causes which modulate the expression of the phenotype.
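To make the central limit theorem point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than anything measured: the function names are hypothetical, every locus gets the same allele frequency, each "plus" allele adds exactly one unit to the trait, and there is no environment, dominance, or epistasis. With one locus the trait takes only three values; with a couple hundred loci the share of individuals within one standard deviation of the mean drifts toward the roughly 68% you expect from a bell curve.

```python
import random
import statistics


def simulate_trait(n_loci, n_individuals, allele_freq=0.5, seed=0):
    """Toy additive trait: count of 'plus' alleles across n_loci diploid loci.

    Assumptions (illustrative only): equal allele frequency at every locus,
    equal effect size of one unit per allele, no environmental noise.
    """
    rng = random.Random(seed)
    traits = []
    for _ in range(n_individuals):
        # Diploid individuals carry two allele copies per locus.
        count = sum(1 for _ in range(2 * n_loci) if rng.random() < allele_freq)
        traits.append(count)
    return traits


def fraction_within_one_sd(values):
    """Share of values strictly within one standard deviation of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(1 for v in values if abs(v - mean) < sd) / len(values)


one_locus = simulate_trait(n_loci=1, n_individuals=2000)
many_loci = simulate_trait(n_loci=200, n_individuals=2000)

print("distinct trait values, 1 locus:", len(set(one_locus)))
print("distinct trait values, 200 loci:", len(set(many_loci)))
print("within 1 SD, 1 locus:", fraction_within_one_sd(one_locus))
print("within 1 SD, 200 loci:", fraction_within_one_sd(many_loci))
```

The single-locus case is the Mendelian picture (three genotype classes), while the 200-locus case is the "huge class of genetic causes" picture, where no individual locus is visible in the smooth distribution.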

We’ve reached a stage where the mapping from genotype to phenotype is getting a bit on the baroque side. We have come to confront and wrestle with ‘genetic architecture.’ Here’s what Wikipedia says about this term:

Genetic architecture refers to the underlying genetic basis of a phenotypic trait. A synonymous term is the ‘genotype-phenotype map’, the way that genotypes map to the phenotypes.

The genotype-phenotype map has been analyzed in terms of several principal axes: epistasis, polygeny, pleiotropy, quasi-continuity, modularity, phenotypic plasticity, robustness, and evolvability.

And it gets more complicated. Epistasis comes in different flavors. As for the polygenic traits, they also exhibit differences. Pigmentation seems to be a trait where there really are common variants of very large effect. In contrast, for height, obesity, schizophrenia, and I.Q., no one has found them yet if they exist. So polygeny itself has many shades. Combine pleiotropy, the effect of one gene on multiple traits, with polygeny and epistasis, and the tangle of abstraction gets intractable very quickly.

This is why the arguments about synthetic associations can be difficult to unpack. Not only do you have the old problems with complex genetic architectures, but you also have to keep track of concepts such as linkage disequilibrium as well as a model of the physical embodiment of genetic information in the chromosome. Alas, we’re way past the “spherical cow” phase of simplifying for purposes of intelligibility.

So why does this matter? It’s about the “missing heritability”. We know that height is ~80-90% heritable in developed societies. If you are adopted your height is going to correlate with your birth parents, not your adoptive parents. But very little of the variance in height can be accounted for by genes detected in linkage or genome-wide association studies (GWAS). Neither of these techniques has the power to pick out thousands of alleles of small effect. Linkage is good at detecting rare large effect variants (usually in families), while GWAS picks up more modest effect but common variants (usually in study samples of the same ethnicity).

Unfortunately GWAS hasn’t been that effective in accounting for much of the variation which we see around us. Old fashioned quantitative genetics using statistical techniques based on family relationships is still a better bet for many traits and diseases (e.g., I have a family history of type 2 diabetes, but 23andMe gives me no greater risk). A group last year suggested a solution to the conundrum of why GWAS wasn’t picking up most of the genetic variation: synthetic associations. Let me jump to their author summary:

It has long been assumed that common genetic variants of modest effect make an important contribution to common human diseases, such as most forms of cardiovascular disease, asthma, and neuropsychiatric disease. Genome-wide scans evaluating the role of common variation have now been completed for all common disease using technology that claims to capture greater than 90% of common variants in major human populations. Surprisingly, the proportion of variation explained by common variation appears to be very modest, and moreover, there are very few examples of the actual variant being identified. At the same time, rare variants have been found with very large effects. Now it is demonstrated in a simulation study that even those signals that have been detected for common variants could, in principle, come from the effect of rare ones. This has important implications for our understanding of the genetic architecture of human disease and in the design of future studies to detect causal genetic variants.

To understand the logic, you need to recall that the SNP which is reported in a GWAS may not be the causal variant. In other words the SNP is just a marker which is near the real genetic cause, but is correlated with it closely enough that one can stand in for the other for purposes of predicting trait value. This has cropped up as a major issue with the genetics of blue eyes. This is a ‘quasi-Mendelian’ trait. It looks like most of the variation in Europeans is due to differences in the genomic region spanning the nearby genes HERC2 and OCA2, but different studies report different SNPs and haplotypes as diagnostic. It is unlikely that all of these markers are causal, so most of them are just strongly correlated with the true functional variant.

Because of recombination, where chromosomal regions cross over and swap partners, these sorts of associations break down over time. So linkage disequilibrium, where genetic variants (alleles) across loci (genes) exhibit non-random statistical associations, varies over time as the correlations decay due to recombination. Synthetic associations are hypothesized to be cases where very low frequency large effect variants are associated with a more common variant, the latter of which shows up in a GWAS as the signal associated with the trait. Because the correlation between the causal variant and the more common variant is going to be imperfect, the latter will only explain a small proportion of the variance (if allele 1 at locus A has frequency ~0.001 and allele 1 at locus B has frequency ~0.20, their association has to be less than 1 because the latter so outnumbers the former in terms of copies). Additionally, there may also be several low frequency causal variants associated with the common marker.
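The frequency example in the parenthetical can be worked through with the standard two-locus definitions, D = p_AB − p_A·p_B and r² = D² / (p_A(1−p_A)·p_B(1−p_B)). The sketch below assumes the most favorable case for the synthetic association, where every haplotype carrying the rare allele also carries the common one; the function names are my own, and the decay function uses the textbook relation D_t = D_0(1−c)^t for recombination fraction c, which ignores drift, selection, and new mutation.

```python
def max_r_squared(p_rare, p_common):
    """Upper bound on r^2 between a rare and a common allele.

    Assumes complete coupling: every haplotype with the rare allele also
    carries the common allele, so p_AB = p_rare and D = p_rare * (1 - p_common).
    """
    if not 0.0 < p_rare <= p_common < 1.0:
        raise ValueError("need 0 < p_rare <= p_common < 1")
    d_max = p_rare * (1.0 - p_common)
    denom = p_rare * (1.0 - p_rare) * p_common * (1.0 - p_common)
    return d_max ** 2 / denom


def ld_after_generations(d0, recomb_fraction, generations):
    """Expected D after some generations of random mating: D0 * (1 - c)^t."""
    return d0 * (1.0 - recomb_fraction) ** generations


bound = max_r_squared(0.001, 0.20)
print(f"max r^2 for frequencies 0.001 and 0.20: {bound:.6f}")
print(f"max |r|: {bound ** 0.5:.4f}")
```

Under these assumptions the ceiling on r² comes out at about 0.004, so even a perfectly "tagged" rare variant leaves the common marker capturing well under 1% of what the causal variant itself explains. That is the dampening the synthetic association argument relies on.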

In other words, the missing heritability isn’t very missing at all. The GWAS are picking up genuine signals, only dampened because of the imperfect correlations between the high frequency marker and the low frequency causal variant. This has practical implications:

…The distance over which synthetic associations occur also offers an alternative explanation to the increasingly common observation of rare variants that occur within the vicinity of a GWAS signal but cannot explain that signal entirely. A simple explanation for such observations is that extending the sequencing to at least 4 Mb and ideally up to 10 Mb around the GWAS signal would pick up other rare variants. In some cases, identifying all the contributing rare variants may explain all of the original signal, whereas in other cases, there could be a combination of rare and common variants contributing. In addition, if synthetic associations are responsible for many of the observed signals, then sequencing in a small number of control samples (even over a much broader genomic region) is also unlikely to succeed. Under our model, the causal sites are both rare and relatively high-penetrant contributors to disease, and will therefore be unlikely to be detected in a small number of control samples. Finally, the focus of attention on genes that are near GWAS signals may be incomplete or misleading in that the actual causal sites may occur in many different genes surrounding the implicated common variant. It is also worth emphasizing that as few as one or two rare variants, at much lower frequency than the associated common SNP, can create a significant synthetic association. In such a case, sequencing a small number of cases that carry the “at risk” common variant might miss entirely the causal rare variants even if the correct genome region is resequenced. These considerations argue for caution in efforts to resequence around genome-wide associations and argue instead that genome-wide sequencing in carefully phenotyped cohorts might be a better use of resources.

One of the papers rebutting the one above (Rare Variants Create Synthetic Genome-Wide Associations, by Dickson et al.) will be covered at Genomes Unzipped. So let’s look at the other one, Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results. Frankly I found the paper hard going. The basic units of each section are intelligible, but recalling them as a coherent whole is not as easy. Part of the reason is that they take the simulations of the Dickson et al. paper, and raise them one. And simulations are to some extent “black-boxes,” at least unless you replicate them and get a feel for how modulating the parameters tweaks the outcomes.

First they explored how varying the number of rare causal variants associated with a common associated SNP would affect the distribution of frequencies of the latter, and how those frequencies compared to the empirical distribution detected. What’s interesting here are panels A and D, E, and F. The first just shows the distribution of frequencies of detected SNPs in GWAS. They go from 0 to 1. D, E, and F simply show you the expected frequencies of the allele associated with the rare causal variants for a given number k of variants: 1, 9, and 18, respectively. What you see is that for synthetic associations the distribution of variants associated with the rare causal SNPs should skew toward the lower end. Also, they found that irrespective of the number k of variants the associated SNP only explained ~10% of the trait variance. Finally, they also suggested that the effect size of the rare variants would have to be very large indeed for the GWAS to pick up the associated SNP. This is a problem since there’s only so much variance to go around. And it raises the question: if the variants are of such large effect, why didn’t linkage studies pick any of them up? Speaking of large effect, once you start adding up k variants to a locus you begin to narrow the regions of the genome in which causal variants can concentrate. The authors indicate that such clustering within the genome is simply not found, another argument against numerous synthetic associations.

Next they looked at results from schizophrenia research, and attempted to see how they mapped onto the predictions entailed by a synthetic association model. The top panel shows the observed data. Not quite a uniform distribution, but there are rare variants, and common variants, and variants in the mid-range frequency. The bottom panel shows simulated results using the synthetic model. As expected you see a skew toward rare alleles, and a deviation from what is observed. Additionally, they note that they ran the simulations with a lot of different parameters, and those that included common variant alleles always tended to have a better fit with the realized results than the synthetic model predicated on rare alleles of large effect size.

The short of it is that the authors conclude that the model outlined last year simply does not fit the empirical results very well. They do not deny the existence of possible synthetic associations, but they seem to suggest that this variety of association is not that important in explaining the missing heritability. Additionally, they note that rare alleles of large effect should not span populations, since they are likely to be evolutionarily novel. But recent work in fact suggests that risk alleles in one population are highly portable to another population. So genetic architecture may not matter as much as we suspected when it comes to inter-population difference.

Why is this important? Money and time, which are both finite:

Empirical observation suggests that much of the missing heritability is contributed by causal variants (including loci comprising multiple rare variants) having effect size too small to be detected with stringent statistical significance…Larger samples for GWAS are needed to detect these which would directly compete with research funds used in sequencing studies. …Genes identified through GWAS harbouring common variants are likely to be good targets for identification of rare variants and for sorting the wheat from the chaff in next generation sequencing studies. We expect that continued GWAS will make valuable contributions to our understanding of many complex traits and will, for some time, remain as one important tool in a growing set of technologies to probe the full spectrum of genetic variation efficiently.

At the end of the day I’m interested in evolution. But to understand evolution you need to understand the genetic architecture of the traits which are the targets of natural selection. I’ve only skimmed the paper, so I really recommend you read the original for the “blood & guts.” Actually, read it a few times!

Also, please see David Goldstein’s response. I felt he was rather cordial, given the rather forceful tone of the two papers which challenged the one that came out of his lab.

Citation: Wray NR, Purcell SM, & Visscher PM (2011). Synthetic Associations Created by Rare Variants Do Not Explain Most GWAS Results. PLoS Biology. DOI: 10.1371/journal.pbio.1000579

Image Credit: Sailko.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"