More importantly, if you want to make any kind of argument based on the spatial location of the points, such as testing if two populations are distinct (beyond ‘ocular methods’), or determining how strong that distinction is, then spurious positioning in PCA space is really harmful, and could lead to false answers.
obviously i defer to Nick Patterson on this, but I think Patterson et al. (cited in #3) and McVean (cited in #1) make the point that positioning in PCA space is not “spurious” (depending on what you mean by that), but can instead form a formally testable model for whether two populations have diverged, and by how much.
PCs are interpretable in terms of standard population genetic parameters:
http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1000686
I don’t follow the objection: certainly alleles that are private to a population are informative about population structure, no?
yeah, i was looking at hg18. the encode pseudogene track seems to be the same thing i’m linking to, except limited to the encode pilot regions. odd.
How do scientists distinguish introns and exons?
if you sequence a processed mRNA, you can align the sequence back to the genome–the things included in your sequence are exons, and the big gaps are introns.
My reading is that neither rebuttal letter addressed McClellan and King’s point about lack of reproducibility between studies — and this new study of blood lipid loci may be as false-positive-prone as others cited in the original editorial and the counter-rebuttal.
the current standard in GWAS is to reproduce an association in multiple cohorts before publication (in the lipid study, the authors go even beyond this and replicate the associations in several non-European populations). McClellan and King are simply misinformed about how reproducible these results are.
I’m guessing that most GWAS investigators had their fingers crossed for coding region mutations (like any other forward geneticists)
yes, i imagine they were. that said, biology is complicated, such is life. Instead of being disappointed, I think instead that this is an exciting time for human biology.
Second, as an experimental biologist, I share McClellan and King’s skepticism that effects with odds ratios in the 1.33 range will ever be understandable in a mechanistic sense.
fair enough. time will tell; i’m an optimist. look at the paper i linked to above, for example. it’s a great example of the cool biology that can come out of these things
http://www.nature.com/nature/journal/v466/n7307/abs/nature09266.html
paper on population structure with PCA:
http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.0020190
so how does this happen? they have to pass their letters to some other people for second opinions, right?
in this case, I think they must just be insulated. and since it’s a response to letters to the editor, i don’t think anyone else has to see it before its published.
you know, it’s possible to argue about the relative importance of rare variants versus common variants in different diseases. fair enough. McClellan and King, however, are arguing that many (most?) associations between common polymorphisms and diseases are false positives. this is a very different point, and one that is (luckily) wrong.
a few examples of mistakes:
Many SNPs, insertions/deletions (indels), and short tandem repeats vary widely in allele frequency among populations. This is especially true for variants that are not in coding or regulatory regions because these alleles vary with population clusters in patterns more consistent with neutral drift and migration rather than with selection (Coop et al., 2009). The colonization of the world by modern humans was carried out by a series of founder populations with subsequent rapid expansion of population size. Neutral alleles emerging at the forefront of these expansions “surfed” waves of population growth. Variations in allele frequencies across populations stem from differences in the timing of the variant’s emergence in the expansion.
I have no idea what to make of this paragraph. First, it’s not clear what this has to do with GWAS. Yes, sometimes allele frequencies are different between Africa and Europe, but this should not affect a GWAS in Europeans, for example. Second, Coop et al. (2009) does not claim that most large allele frequency differences between populations are neutral (though again, why does it matter for GWAS whether they’re neutral or not?). Third, most genetic variation in humans was present in Africa before humans expanded out, so the variation in allele frequencies has absolutely nothing to do with the “timing” of the emergence of the allele.
We suggest that associations based on such highly variable SNPs are often artifacts of cryptic population stratification. Wang et al. argue that standard GWAS strategies have been adopted to control for population stratification. However, these methods control by person, not by SNP. Because populations from large geographic areas (e.g., Europe) are genetically heterogeneous, outlier SNPs that vary widely among subgroups of such populations are not excluded by these methods and often drive positive associations.
My emphasis. Where is the evidence of this? How can you make an off-hand comment like, “Oh, by the way, i’ve found the fundamental flaw in modern genetics”, when no one agrees with you and you have no evidence? For the case of the autism locus discussed, Wang et al. make a very clear case that 1) it’s not extremely variable between populations, and 2) that it wouldn’t really matter if it were. For example, are all the 96 loci identified in the lipids study false positives due to population structure? Of course not; probably none of them are.
Both Klein et al. and Wang et al. suggest that the vast majority of GWAS risk alleles are in LD with causal mutations, and that intergenic and intronic risk variants represent regulatory elements. In principle, either or both of these hypotheses could be true. However, thus far, virtually no such mutations or elements have been found by following up on GWAS findings.
Finding a functional non-coding variant is more difficult than finding coding variants. That said, it’s simply not true that there are “virtually no” examples. See the example in this post, and in the previous post. If a causal variant has not yet been found, it does not follow that the signal is a false positive.
Only additive genetic variance contributes to the (narrow-sense) heritability. So non-linear effects are not directly relevant here, though they may be important in other situations.
eh, if you look for evidence of mixture (as they did previously, using a bunch of nuclear loci, not just mtDNA) and don’t find it, you come up with reasons why not. obviously their prior belief about the probability of admixture wasn’t 0, or they wouldn’t have even bothered looking. so maybe your prior was 90% on admixture and theirs was 10%; fair enough.
There was no positive genetic data of admixture, even after the first analyses of nuclear loci, and you can make non-silly justifications about why there wouldn’t be any (maybe hybrids had very decreased fitness, for example–even an advantageous locus has to last through the first few generations when it’s linked to the rest of the introgressed genome).
Or maybe they just convinced themselves there wasn’t any admixture so they could enjoy the surprise of finding it 🙂
interesting. I gave McClellan and King the benefit of the doubt and assumed they were using the standard shorthand when describing an association (ie. I thought when they said “associated SNPs have no known function” they meant “neither the associated SNPs, nor any correlated SNP in the region, have a known function”).
On re-reading, I think you might be right that they’re actually wondering why all associated SNPs aren’t functional. On the other hand, they’re certainly familiar with linkage studies, which also use random polymorphisms as markers to track the inheritance of functional ones, so they must see the analogy, right?
but I can’t help thinking that if we are seeing the majority of these mutations in regions with no obvious biological significance, then the big question is why there and not as frequently in the regions that do have biological significance?
so take the example of celiac disease in my point 2. I don’t know how many of the SNPs they identified would have “obvious” biological significance if you were to look at them; my guess is few of them. But that’s beside the point–when they went and assayed gene expression in a relevant tissue, they found that the SNPs influenced it (or half of them did). Now it doesn’t matter if the SNPs have any “obvious” function a priori–they clearly have some function we’re not good at identifying yet from genome sequence alone.
Or is it as the authors suggest – that because the effects tend to be on the order of an odds ratio of 1.5 or less, what we are seeing could as easily be due to cryptic population stratification causing spurious associations?
this is the most bogus of the authors’ claims, since it’s readily testable. In fact, people have looked to see whether SNPs associated with disease show more population structure than random SNPs (in the context of wondering if the structure is due to selection, but the point remains). In general, the answer is no: SNPs associated with disease look about the same as your average SNP, with a few interesting exceptions. see here:
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2440747/
http://genome.cshlp.org/content/19/5/826
the authors identified a single SNP (one of the signals from a GWAS that hasn’t replicated) that they think might be due to population structure, but this is clearly a cherry-picked example (and not even a conclusive one at that)
What comes to your point three, I’d be more optimistic and say the two papers below pretty much nailed the colorectal cancer risk function for the SNP on 8q24
good point, i’d forgotten the exact results from those two papers
whoa, what a bizarre paper. Figure 1 is a Venn diagram of nothing!
the response you link to gives a lot of good context; it seems like this is one of those cases where a legend in a field finds himself a little out of his depth and gets pugnacious…
Good question. I think the data is there, but I don’t think anyone has done a really careful analysis of this sort. maybe something like this is the closest that’s been published:
http://www.pnas.org/content/106/23/9362.long
this question would definitely be worth revisiting.
strictly speaking, every allele has some selection coefficient. the number of alleles with selection coefficient X at some frequency Y depends on the population size, the mutation rate, blah, blah…i’m sure you’re familiar with this. alleles with a small effect on risk are nearly neutral.
>As it happens, very few such associations have emerged from GWAS for psychiatric disorders, indicating a small contribution of common variants to overall phenotypic variance
again, this is a non sequitur–an alternative is that common variants have very small effects.
>On the other hand, an increasing number of very rare mutations with large effects have been and continue to be discovered
this is certainly true. but rare mutations, by definition, contribute very little to the overall phenotypic variance. let’s say we do full genome resequencing of 500 schizophrenia cases and 500 controls (presumably this will happen in the next few years). A number of interesting things will be identified, without a doubt. What fraction of the phenotypic variance do you predict these things will explain?
>Common variants are common because they are almost invariably neutral.
right. and a polymorphism with a slight effect on disease risk is essentially neutral. neutral != no phenotypic effect.
ok, i see the argument. yes, that’s plausible. It’s also consistent with background selection on new deleterious alleles, and positive selection on standing variation. probably a weighted average of all these effects, and the weights are unclear.
>mean Fst between china and japan in the hapmap is ~0.005, fwiw.
So in light of that, let me give a different example (I admit I sort of randomly pulled those initial numbers out of the air without much thought):
Allele 1: 60% in Japan, 50% in China
Allele 2: 55% in Japan, 50% in China
The observation (again, I think) in this paper is that the former are more common in regions of low recombination compared to the latter.
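for a rough sense of scale, here is a minimal sketch of the single-site Fst for those two hypothetical alleles. It assumes the simplest two-population definition, Fst = (Ht - Hs)/Ht, with both populations weighted equally; the function name is made up for illustration, and the hapmap figure quoted above may well come from a different estimator.

```python
def fst_two_pops(p1, p2):
    """Fst at one biallelic site for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    # Expected heterozygosity in the pooled population
    h_total = 2 * p_bar * (1 - p_bar)
    # Mean expected heterozygosity within each population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Frequencies from the example: (Japan, China)
allele_1 = fst_two_pops(0.60, 0.50)
allele_2 = fst_two_pops(0.55, 0.50)
print(f"allele 1: {allele_1:.4f}, allele 2: {allele_2:.4f}")
```

under these assumptions allele 1 sits at roughly twice the quoted genome-wide mean of ~0.005 and allele 2 at roughly half of it, so the two examples bracket the average.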
>A distinction we might make is that a new mutation may be better or (more likely) worse than the “wild” type; in the former case we expect “positive” selection and in the latter “negative”.
Yep, that’s what I mean.
>Of course they’re not finding new selected alleles; they’re seeing the effect of linkage with selected alleles which probably have not been identified.
Right. So we agree, it has nothing to do with low frequency, geographically restricted alleles 🙂
mean Fst between china and japan in the hapmap is ~0.005, fwiw.
>Still, there are limitations to this approach, namely: “for most disorders, we do not know what the relevant quantitative traits are”.
this is more of a fatal flaw rather than a limitation, no?
not to take anything away from you, but this was also proposed in the initial paper identifying EDAR as a target of selection–see Supplementary Figure 7 in Sabeti et al. (2007)
http://www.nature.com/nature/journal/v449/n7164/suppinfo/nature06250.html
the reviewers must have made them take all mention of this out of the main text due to being too speculative, but the figure legend is:
Figure S7 Prevalence of tooth shovelling and EDAR-Ala370 allele in 4 Sinodont populations. A great deal is known from the anthropological record about the physical traits regulated by the EDA pathway, particular teeth and to less extent hair, in human populations. There are two distinct tooth patterns common to Asia 1, defined by a phenomenon called “tooth shoveling,” in which the back surface of the upper incisors has a “shovel” appearance.1 Shoveling consists of a “combination of a concave lingual surface and elevated marginal ridges enclosing a central fossa in the upper central incisor teeth.” 2. The pattern is particular among the Sinodonts, a population that evolved from the Sundadonts (the original inhabitants of Asia) as they moved north and inland into Asia. Sinodonts evolved in present-day China, and they also migrated from the Asian mainland into Japan around 2,000 years ago. Native American populations came from Asia in at least two waves of migration,3 and may be in part populated by Sinodonts. High tooth shoveling frequencies have accordingly been reported in Sinodont populations in China-Mongolia, Japan, NE Siberia-Amur, Aleut-Eskimo, Greater NW Coast, North America, and South America. We had EDAR-Ala370 allele frequency data for four Sinodont populations, where tooth shovelling frequencies have been determined and examined the correlation. There are many limitations to this analysis. Only 4 populations (as well as Europe and Africa) frequencies are known. Moreover the samples are not the same and may reflect different subpopulations.
i guess i don’t know the field very well (I was basing that comment on a conversation about genetics and warfarin dosing, or the relative indifference of MDs to genetics when they can do a quick blood clotting test in-house); what tests are you referring to?
Does anyone get how they selected candidate genes?
they say in the supplement they were screening all circadian clock-related genes (a number of these have been identified in, eg. mutagenesis screens in various organisms).
This is not a linear regression, as pointed out by Eric Johnson above (and in the paper). Intuitively, if one samples a large number of haplotypes from some allele frequency distribution, most of them will not be “new”, since many dogs share the same haplotype.
Given a certain mutation rate and population size (a joint parameter of the mutation rate times population size is what has been estimated by the authors), it is possible to determine a distribution on the number of haplotypes sampled once, twice, three times, etc., in a sample of N haplotypes.
http://en.wikipedia.org/wiki/Ewens%27s_sa
This can be modified to be a distribution on the number of different haplotypes in a sample of N, though I can’t derive it off the top of my head (the result, though not the derivation itself, is in the paper).
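as a sketch of the intuition (not the calculation from the paper): under the infinite-alleles model behind the Ewens sampling formula, the expected number of distinct haplotypes in a sample of n is the sum over i = 0..n-1 of theta/(theta + i), where theta is the scaled mutation rate. The theta value below is an arbitrary placeholder, not the authors' estimate, and the function name is made up for illustration.

```python
def expected_distinct_haplotypes(n, theta):
    """Expected number of distinct haplotypes in a sample of n (Ewens model)."""
    # The (i+1)-th sampled haplotype is a new type with probability theta/(theta+i)
    return sum(theta / (theta + i) for i in range(n))


THETA = 5.0  # hypothetical scaled mutation rate, for illustration only

for n in (50, 100, 200, 400):
    k = expected_distinct_haplotypes(n, THETA)
    print(f"n={n:4d}  expected distinct haplotypes={k:.1f}")
```

the count grows roughly like theta * log(n), so each doubling of the sample adds about the same small number of new haplotypes; that is why a straight line through the origin is the wrong model.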
I think the current consensus is that there’s a single causal mutation (rs12913832) for blue eyes. This is also the SNP that has the geographic distribution that makes the most sense (ie. it’s absent from E. Asia and Africa)
that said, Collins has done great work in disease genetics, and has been at the helm of a number of hugely successful projects while at NHGRI. So a “reasonable choice”. my guess is that a bit of money will go towards establishing a large, prospective study of disease and physiology including genetics and as many environmental variables that people can think to analyse. This would be a fantastic resource for all sorts of questions.
exactly. see the biologos foundation, established by Collins. The answers to the questions at the bottom of the page are the sort of thing that make people a little uncomfortable.
eg.
Question 18: At what point in the evolutionary process did humans attain the “Image of God”? This is on par with “how many angels can dance on the head of a pin?”
or
Question 22: Did evolution have to result in human beings? See the long, rambling way of saying “yes”, when the answer, by all reasonable evolutionary arguments, is “no, of course not”. and really, the question is ill-posed.
No mention of Cochran’s pathogenic theory?
I wasn’t aware it was Cochran’s theory (as opposed to the theory favored by Cochran)? One of the papers:
Schizophrenia patients are more likely, compared to the general population, to have been born in the winter or the spring. Although infections such as influenza and measles have been proposed as a possible mechanism for this distortion, a clear association between infectious agents and schizophrenia has not been demonstrated. The association with the MHC region reported here supports a role for infection but, as many non-immune-related genes are also found in the extended MHC region, it does not provide strong evidence. On the basis of the 3,130 schizophrenia patients for which month of birth information was available, no significant difference in the frequency of the top SNPs from the MHC region according to season of birth (winter/spring versus summer/autumn) was identified (P > 0.29).
the chocolate example, as described, seems odd. certainly one possibility is that humans compare the prices of two things in relative terms–that is, 15 cents is 15X the price of 1c, but 14c is infinitely more expensive than free. this is testable (increasing the price of both chocolates by 10c should further skew people towards the truffles), and would explain behavior much more parsimoniously than invoking some sort of weird psychological phenomenon associated with the word “free”.
And yes, Howser, my feeling is exactly what you state, that broader-defined phenotypes should not be combined with more specific phenotype dimensions in an analytical sense
wait, i thought everyone agreed that more phenotypes should be included. right now proper ways to analyse these correlated phenotypes don’t exist, but I’m sure they will soon, and some methods (eg. bayesian networks) might actually allow one to come up with causal relationships between them.
howser,
yes, i agree, the impact of lumping together of phenotypes depends on how shared the actual genetic factors are, which of course isn’t known ahead of time (and for the record, I don’t think the WTCCC approach to bipolar was unreasonable; it’s of course only after the fact that we get to speculate about why they didn’t find much). however, in general, I think a lot of effort should be put into better phenotyping.
Take the WTCCC GWAS of bipolar disorder. Bipolar is a heterogeneous disorder, and individuals with a bipolar diagnosis can present with very different symptoms. This is a fundamental biological problem that needs to be addressed before you can accurately determine power and thresholds for statistical significance. It really wasn’t surprising nothing exciting came of it.
oh ok, yes, in this case i totally agree. I’m not an expert in bipolar disorder, but I do think a lot of these psychiatric diseases are not well defined, and people are likely lumping together things that have entirely distinct genetic etiologies. the problem is how to know this ahead of time (perhaps better phenotyping–ie performing association studies on a number of symptoms, rather than the disease diagnosis itself–would be helpful)
perhaps this is a proper extension of your metaphor, p-ter: there are 10 people who are planning on going on a boating trip together, and they need to see if any of them have a very rare (1 in a billion) contagious disease first. they’re all tested for the disease, with a test that is accurate in giving a positive result 99% of the time. a multiple testing correction is applied to the 10 hypothesis tests. even after that correction, it’s unlikely that if a person gets a positive result that it’s truly positive.
yes, that’s right. the multiple testing correction ensures that if you took random groups of 10 people without the disease and tested them all, only 5% (or whatever) of those groups would show up as having the disease (whereas if you didn’t do the correction, it would be more like 50%). but intuition (and statistics) suggests that even if you get a positive result, you probably don’t have to worry.
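a minimal sketch of the group-level numbers, assuming the 10 tests are independent, nobody actually has the disease, and the uncorrected per-test false positive rate is 5% (the names and the choice of a Bonferroni correction are assumptions for illustration):

```python
N_TESTS = 10
ALPHA = 0.05  # assumed per-test false positive rate before any correction


def familywise_error(per_test_alpha, n_tests):
    """P(at least one false positive) across independent tests of true nulls."""
    return 1 - (1 - per_test_alpha) ** n_tests


uncorrected = familywise_error(ALPHA, N_TESTS)
# Bonferroni: test each person at ALPHA / N_TESTS instead of ALPHA
bonferroni = familywise_error(ALPHA / N_TESTS, N_TESTS)
print(f"uncorrected: {uncorrected:.3f}, bonferroni: {bonferroni:.3f}")
```

with independent tests the uncorrected figure comes out near 40%; 50% is the upper bound you get by simply adding up the ten per-test error rates. Either way, the correction only calibrates how often a disease-free group gets flagged. It says nothing about how rare the disease is, which is what drives the probability that a given positive is real.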
If you instead focused on a certain tumor type in individuals with the same demographics (ethnicity, other environmental risk factors for that particular tumor, etc..), you have reduced your statistical power dramatically but wouldn’t you be much better placed to find specific genetic risk factors for that tumor type?
well, if you’re interested in a certain tumor type, throwing out people who don’t have that type will increase your power. i didn’t think that’s what was being suggested.
well, yes, replication is important, the question is what is worth trying to replicate (the motivation for this post was an association that had conflicting reports of replication in the literature for 8 years).
and throwing out half your data (ie. reducing your power even more) to toy around with “statistical significance” is a bit silly.
thanks for the clarifying metaphor. is there a way to expand that metaphor so as to explain the role of the correction statistics?
well, maybe…imagine you want to be really, really stringent about false positives because following up a positive test costs thousands of dollars, so you come up with a test that only comes up positive in 0.00001% of people who don’t have the disease. still, you can do the calculations in that link–depending on the other parameters (ie. what if it only comes up positive in 0.1% of people that do have the disease?) it’s still likely that you don’t have anything to worry about despite a positive result.
i think the main point is just that multiple test corrections are intended only to allow you to have properly-calibrated p-values (ie. you want to get p
imagine you’re getting a test for a fatal rare disease, and that the test is 99% accurate. the test comes back positive. how worried should you be? not very.
why? The p-value, P(positive result | no disease), is not the same as what you want to know: P(no disease | positive result).
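to make that concrete, here is a minimal sketch using Bayes' rule. It reads "99% accurate" as a 99% true positive rate and a 1% false positive rate, and assumes the disease affects 1 person in 10,000; that prevalence is a made-up number for illustration, as is the function name.

```python
def prob_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive result) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


posterior = prob_disease_given_positive(
    prevalence=1e-4,           # assumed: 1 in 10,000
    sensitivity=0.99,          # P(positive | disease)
    false_positive_rate=0.01,  # P(positive | no disease)
)
print(f"P(disease | positive) = {posterior:.4f}")
```

under these assumptions a positive result still leaves you with only about a 1% chance of having the disease, because the healthy people who test positive vastly outnumber the sick ones.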
this same principle is at work in the association study context.
If only a fixed percentage of the associations are false, how could none of them be real?
i think that must be a typo: only a fixed percentage of the false tests will pass your threshold, but there’s no guarantee that any of those that do pass the threshold are real.
howser,
greater than/less than signs get interpreted as html tags…
yes, he’s making a good point about models almost always being wrong in some way. that said, genotype-phenotype associations are like the birth ratio example he gives–the null hypothesis of no correlation between genotype and phenotype is almost always completely correct, and we’re looking for really tiny correlations that denote real effects. (this is why things like population structure, which cause the null to be modeled slightly incorrectly in some cases, are such a problem)
That doesn’t really address the issue–that addresses the significance threshold you choose. but the question is: given that a marker passes that significance threshold, what is the probability that it’s a real association?
I guess the issue is: people worry a lot about reporting proper p-values, but the interpretation of a p-value is not the same in every study–for these phenotypes where effect sizes are small, a p-value (even a corrected p-value) of 0.05 doesn’t mean the same thing in a small study as in a large study.
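a rough sketch of why the same threshold means different things at different sample sizes: the probability that a significant marker is real depends on power as well as on the threshold. The prior fraction of truly associated markers and the two power values below are hypothetical placeholders, not estimates from any study.

```python
def prob_real_given_significant(prior, power, alpha):
    """P(association is real | test passed the significance threshold)."""
    true_hits = power * prior          # real associations that pass
    false_hits = alpha * (1 - prior)   # null markers that pass by chance
    return true_hits / (true_hits + false_hits)


PRIOR = 0.01   # assumed fraction of tested markers with a real effect
ALPHA = 0.05   # significance threshold

small_study = prob_real_given_significant(PRIOR, power=0.10, alpha=ALPHA)
large_study = prob_real_given_significant(PRIOR, power=0.90, alpha=ALPHA)
print(f"small study: {small_study:.3f}, large study: {large_study:.3f}")
```

with these numbers a hit at p = 0.05 is real about 2% of the time in the low-powered study and about 15% of the time in the well-powered one, even though the p-value is identical.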
it is compatible with known physiology and biochemistry and makes general sense, too.
the only reason anyone looked for association in MAOA to begin with is because it’s compatible with known physiology and biochemistry and makes general sense. that’s why it’s a candidate gene. it’s somewhat circular to argue that then makes the association more believable.
my bet is that “violence”, once subject to a large, genome-wide association study, will be influenced by many loci of much weaker effect than those reported for MAOA, and that those loci will not include the polymorphism reported in the MAOA studies. this is based entirely on analogy with disease studies over the last few years. the lesson from these studies is clear: almost all small candidate gene studies were massively underpowered, leading to false associations being pursued for quite a while.
But in the case of MAOA the literature is full of the evidence of association
yes, the psychiatric genetics community (or some subset therein) has the tendency to take a single popular polymorphism (the repeat upstream of the serotonin transporter, nonsynonymous changes in COMT, DRD4, etc.), and try to associate them with any and every phenotype you can imagine. check out the list of things this single MAOA polymorphism is supposed to affect:
http://scholar.google.com/scholar?q=ma
Off the top of my head A1AT deficiency and smoking is clear or G6PD deficiency and malaria or rxn to some drugs.
agreed. drug reactions are probably the best examples of GxE interactions.
So you don’t believe that MAOA is a violence gene? So I guess you don’t believe in Brunner syndrome, either
not every gene that causes a monogenic disease also harbours common variation influencing phenotypes related to that disease.
wrt to the title, i’m guessing “Report on gene that plays modest role in depression risk via gene-environment interaction is now faulted” just didn’t fit in the space 🙂
I thought that article was actually pretty good. Why do you think it implies genes don’t play a role in depression? what’s really ridiculous is the quote from Caspi:
Others said the new analysis was unjustifiably dismissive. “What is needed is not less research into gene-environment interaction,” Avshalom Caspi, a neuroscientist at Duke University and lead author of the original paper, wrote in an e-mail message, “but more research of better quality.”
you get the impression he’s worried more about funding to studies like his being cut rather than the fact that everything he’s published is probably wrong.
I’d think that that’s a side effect and that the advantage comes in somewhere else
that’s my guess as well.
the same snp also possibly influences milk production:
http://www.springerlink.com/content/x6t1383937112546/
For instance, a 1% variant would be largely invisible to current GWAS, even if it had a large effect size (let’s say a per-allele increase of 1 SD, or 2 inches of height, which would mean it explained 2% of the total variance in height – very respectable compared to most common variants)
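the 2% figure can be checked with a small sketch, assuming a purely additive variant in Hardy-Weinberg proportions, so that the fraction of phenotypic variance explained is 2p(1-p)a², with p the allele frequency and a the per-allele effect in SD units. The common-variant comparison and the function name are hypothetical, added only for contrast.

```python
def variance_explained(allele_freq, effect_sd):
    """Fraction of phenotypic variance explained by one additive variant.

    effect_sd is the per-allele effect in phenotypic standard deviations.
    """
    return 2 * allele_freq * (1 - allele_freq) * effect_sd ** 2


# The example from the text: 1% frequency, 1 SD per allele
rare_large = variance_explained(0.01, 1.0)
# Hypothetical common variant for contrast: 30% frequency, 0.05 SD per allele
common_small = variance_explained(0.30, 0.05)
print(f"rare, large effect:   {rare_large:.2%}")
print(f"common, small effect: {common_small:.2%}")
```

under those assumptions the rare variant explains just under 2% of the variance, roughly twenty times the hypothetical common variant.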
I see. Post-1K Genomes Project, the next generation of SNP chips will include things of this frequency, right? so to some extent, you don’t need to sequence your cohort to get these?
it’s still way too expensive. I think by the end of this year Illumina should be able to do full resequencing of a human genome at acceptable coverage for $10K (maybe less?). a SNP chip costs $250.
I don’t see what’s so baffling about a division of labor between theorists and experimentalists
if “having a hypothesis” counts as theory, then all experimentalists are theorists.
ok, i’m re-reading this. he’s definitely implying that his calculations on the number of SNPs influencing a trait only apply if the SNPs are common, no?
Though the strongest SNP may have been found, many SNPs could remain unidentified in the range of the lower effects that have been determined. If such SNPs are accounted for, fewer SNPs will be required to explain a given proportion of variance. The sample sizes that have been studied for height, however, range from 14,000 to 34,000. At the lower sample size, the power of detection is 90% for the largest effect size; for effect sizes as small as 0.05%, the largest sample size provides a 10% chance of detection. Even if we conservatively assume that all remaining unidentified variants influencing height each explained as much as 0.05% of the variation, 1500 such variants would be required to explain the missing heritability. These calculations also assume that the effects of “height SNPs” are additive. If variants show meaningful interactions, a somewhat stronger genetic effect could emerge among variants with small individual effect sizes. But only dramatic departures from these assumptions would allow a manageable number of common SNPs to account for a sizable fraction of the heritability of height.
I can’t say for sure, but I imagine he’s thinking about variants at a frequency of perhaps 0.1% with a per-allele effect of, say, 0.5 SD (about an inch of adult height, IIRC). Each such variant would explain about 0.05% of the population variance and yet would be essentially completely undetectable using current GWAS.
hm, yes, true. but then those effect sizes are still subject to his exponential decay curve–there will still be 93,000 variants affecting a trait. If there are 93,000 rare variants spread across 5,000 genes (or whatever) that affect a trait, I don’t see why that leads to more biological insight than 93,000 common variants spread across 5000 genes.
actually, i may still not be getting his argument. if we assume he’s not making the mistake I say he is, then (still following his example and model), there are ~100K variants, both rare and common, that affect height. If we identify those variants and they cluster together in pathways, then we gain biological insight. If they don’t cluster together, then we don’t. I guess we’ll see, and I presume they will. But what does this have to do with the frequency of the variants? It’s perfectly possible that rare variants in many, many genes impact a phenotype, no?
P-ter: feel free to go ahead and do it. I’m sure that permission and funding will be a snap.
no, i agree the logistics would be difficult. but it’s a dirt-cheap experiment–all you need to do is assay (or infer) carrier status and IQ in families carrying the mutations. maybe i’m being naive about how willing such families would be to participate. but certainly it’s worth putting some effort into if you prefer the adjective “vindicated” to “controversial” 🙂
Cochran and Harpending readily acknowledge the need for such experiments. But they have no plans to do them. They say their role as theorists is to generate hypotheses that others can test.
“One criticism about our paper is ‘It can’t mean anything because they didn’t do any new experiments,’ ” Cochran said. “OK, then I guess Einstein’s papers didn’t mean anything either.”
seriously, what a crock. testing the ashkenazi hypothesis as pinker proposes would involve an association study, using completely standard techniques (no one expected einstein to create and perfect some new cooling technology to prove the existence of the Bose-Einstein condensate, but if such a technology had existed, one imagines he might have been interested in it). the fact that Cochran et al. profess absolutely zero interest in doing a trivial (intellectually, though admittedly not logistically) test of their hypothesis baffles me.
In any event, Cheap Sequencing Now!
agreed.
in some sense, being an advocate for sequencing, as Goldstein is trying to be, is like being an advocate for PCR–people are going to do it no matter what. ultimately, the study design for any disease is going to be full sequencing of tens of thousands of cases and controls (and discovery of both rare and common variants). Someone is going to assemble those massive cohorts, and why not genotype them on a standard chip once you’ve got them assembled? Sure, you’ll miss some things (which you might then find by sequencing those same individuals once the price gets reasonable), but you’ll probably also find a lot of interesting things.
The easiest way to identify those core pathways is to find variations that confer a severe loss (or gain) of function phenotype with high probability
i think you’re referring to monogenic forms of complex diseases. while they exist (and have been studied successfully for some time), they are not the focus of Goldstein’s argument. (it’s often helpful to look at the figures of a paper to decide what the author thinks his/her main points are. In this case, Goldstein has one figure–a plot of the exponential decay of fraction of variance explained)
if, for example, many cases of type II diabetes were due to severe mutations with high penetrance, that would have been noticed in pedigrees. Some instances have indeed been noticed [link], but they explain a relatively small fraction of all cases.
I do think looking at monogenic forms of disease is helpful. But that’s neither here nor there.
And note Goldstein does mention that common polymorphisms can identify interesting genes/drug targets:
Some experts emphasize that small effect sizes don’t necessarily mean that a gene variant is of no interest or use. Effect size is a function of what a variant does: it may change only slightly a gene’s expression or a protein’s function. The gene’s pathway, however, may be decisive for a particular condition, or pharmacologic action on the same protein may produce much larger effects in controlling disease. These arguments are reasonable, as far as they go, and there are supporting examples, such as a polymorphism of modest effect in PPARG, a gene that encodes a drug target for diabetes.
Actually, considering the spreads one normally gets in population studies, the data are compact about the regression line. Of course the trend is very small, and the r² values pathetic, but the p values are good.
sure, i’m not really disputing the result, though it’s tough to really see where the density lies in plots like this. just pointing out that the figure in the main text is quite a bit more visually striking than the actual data.
interesting. dogs are getting all the press/investment these days, i wonder when the genomic tools for mapping in cats will get to the same point.
don’t know of anything done on epigenetics in the sense of chromatin modifications.
however, there is some work on changes in the bacterial composition of the gut after weight loss surgery
http://www.ncbi.nlm.nih.gov/pubmed/19164560
it’s difficult to say what’s going on. certainly the demonstration that mouse FTO knockouts gain less weight when put on a high-fat diet makes it a strong candidate for being the relevant gene.
I find it interesting how everyone just glosses over the fact that this individual has the X4 variant. Lacking CCR5 doesn’t mean much if you have a pathogen which can enter a cell without the CCR5 motif.
good point, I didn’t fully get that. certainly complicates the interpretation.
i guess my instinct was that this has too many obstacles to become viable (esp. given that “current care” treatment of HIV is pretty good now), but in the long term perhaps this is not far-fetched at all.
man, I occasionally come across old posts I wrote that are simply embarrassing. It’s always easier to think you’re not an idiot when you don’t have documented evidence of all the stupid shit you’ve said. definitely humbling.
we can taxonomize traits into those whose genetic architecture we understand pretty well (skin coloration) and those that are still a puzzle (IQ). I am curious to see where the social network building blocks fit in.
it would be pretty astounding if it were the former, no?
the major reason these loci pop up again and again is because they are the only loci that ever get tested
this was my first thought as well.
Are you dubious of the big idea? Or just of the examples? And considering the large numbers of examples suggested, wouldn’t the law of large numbers imply that more than just the Neanderthal idea is true?
The examples. I think the big idea is more specific than just “humans vary”, but whatever.
The law of large numbers? Sure, the fraction of correct hypotheses should converge in the limit to the overall fraction of true hypotheses in hypothesis space – about zero. 🙂
re: Haldane, number of selective sweeps, etc.
I’m not sure this (a large number of strong selective sweeps) is the same situation considered by Haldane; I don’t think Anonymouse’s original questions are trivial.
Hawks et al. propose thousands of strongly selected alleles. If you consider a single selected allele at a selection coefficient of 1%, that 1% is relative to all individuals not carrying that allele (depending on how you model selection). But those individuals are also carrying alleles that, in this model, are highly beneficial. So if this allele had arisen on its own (ie. not in a population with thousands of other selected alleles), I think we have to conclude its selection coefficient would have been orders of magnitude larger?
I’m not sure exactly how to think about this, maybe other people–who actually think this situation is going on–have a better idea?
I read most of the “Savage Minds” article you linked to to explain why anthropology has a “Guns, Germs, and Steel” problem.
oh, sorry to direct you to that, I actually couldn’t get through most of it. I just liked the opening anecdote about having to discuss a book you think is wrong but which defines a field for non-specialists.
I gather that you think that the idea that the Ashkenazi Jews are A. significantly smarter than average and B. this is a consequence of selection – is implausible or unlikely. Why?
A. seems reasonable, B. is vaguely plausible. Certainly the evidence is pretty circumstantial; this will probably be answered one way or another in the relatively near future.
The kind of words that can come back to haunt a man.
I know! 🙂
see figure 4 in this paper:
http://www.nature.com/ng/journal/v40/n3/abs/ng.78.html
So this research is based on an unproven assumption?
ha. i guess that depends on what you mean by “proven”…
evolutionarily conserved (and thus presumably functional) sequence is preferentially near genes, snps that differentiate human populations are preferentially near genes, removal of gene deserts rarely leads to an obvious phenotype (while removal of genes or regions near genes often does), etc.
i think it’s fair to say that the fraction of nongenic dna that is functional is much much lower than the fraction of functional genic dna.
given that genic regions are only about 3%
correction: that precise number is not right if you include introns, etc. not sure what it is off the top of my head.
Does it mean near genes coding for proteins (as contrasted with non-coding DNA)
yes, that’s it. generally “genic” is defined as within a few kb of a protein coding gene, including introns, utrs, etc.
Why should natural selection only be limited to the gene area of the genome? Are the researchers saying that natural selection doesn’t affect other areas?
no, the premise is that selection should be enriched in or near genes (given that genic regions are only about 3% of the genome, this doesn’t even mean that, in absolute numbers, genic selection is more common than nongenic selection). ie. if there are 100 selective sweeps, 50 near genes and 50 not near genes, that’s a massive enrichment for genic regions.
basically, the assumption is that a given mutation is more likely to be functional if it falls near a gene.
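To put a number on the "massive enrichment" in the 50/50 example, here is a minimal sketch. The 3% genic fraction is the rough figure from the reply (which, as noted, isn't precise once introns etc. are counted), so it is left as a parameter; the function name is just illustrative.

```python
def fold_enrichment(hits_in_region, total_hits, region_fraction):
    """Observed share of hits in a region divided by the share of the
    genome that region occupies (the share expected if hits fell
    uniformly at random, which is the assumption being tested)."""
    observed_share = hits_in_region / total_hits
    return observed_share / region_fraction


# 100 sweeps, 50 of them near genes, genic regions ~3% of the genome
# (3% is the rough figure from the discussion, not a precise value).
print(fold_enrichment(50, 100, 0.03))
```

With those inputs the enrichment is roughly 17-fold, even though genic and nongenic sweeps are equally common in absolute numbers. A larger genic fraction shrinks the fold change but not the logic.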
A single mutation isn’t likely to become fixed if it is neutral
a single mutation isn’t likely to become fixed if it’s beneficial either.
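For a sense of scale on that last point, here is a small sketch using Kimura's diffusion approximation for the fixation probability of a single new mutation. The assumptions are mine, not the thread's: a diploid population of N individuals, effective size equal to N, additive (genic) selection with coefficient s, and a starting frequency of 1/(2N).

```python
import math


def fixation_probability(s, n_diploid):
    """Kimura's approximation for a new mutation at frequency 1/(2N):
    P = (1 - exp(-2s)) / (1 - exp(-4Ns)).

    Assumes additive selection and effective size equal to N
    (illustrative assumptions). Falls back to the neutral limit
    1/(2N) when s is exactly zero.
    """
    if s == 0:
        return 1.0 / (2 * n_diploid)
    # expm1 keeps precision when s is very small.
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_diploid * s)


# A 1% advantage in a population of 10,000 diploids:
print(fixation_probability(0.01, 10_000))
# The same mutation if it were neutral:
print(fixation_probability(0.0, 10_000))
```

Under these assumptions a new mutation with a 1% advantage still has only about a 2% chance of fixing (close to the classic 2s rule of thumb); most copies are lost to drift while rare. That is far higher than the neutral probability, but still unlikely for any single mutation.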
yes, ok, people make fair points–the distributions of selection coefficients on mutations of different types are certainly different.
but conditional on knowing the effect of a mutation (in this case, they have a massive impact on pigmentation, and potentially other things), the type of mutation (frameshift, etc) carries no additional information with regards to selection.
Neither is the most neutral-looking mutation that ever came down the pike.
well, both are certainly functional. but the type of mutation contains no information about the selection coefficient on it, the devil’s advocate would say. a mutation that changes a neutral phenotype is necessarily neutral.
I probably just can’t grasp the math, but I find it hard to believe that a trait without any advantage whatsoever can manage to swamp an entire population.
any new neutral mutation has a 1/N probability of reaching fixation in the population (where N is the effective population size). there are many mutations that could possibly knock out the function of a gene. over time, all of these mutations will occur in someone many, many times. so if there’s no selection against those mutations, the probability that one of them will reach fixation in the population is 1.
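The argument above can be sketched numerically. This assumes each new occurrence of a knockout mutation is an independent trial with the same per-mutation fixation probability, which is a simplification of mine; the per-mutation probability is taken as 1/N as in the text (for a diploid population the usual figure is 1/(2N), and the conclusion doesn't depend on which you use).

```python
import math


def prob_at_least_one_fixes(p_fix, n_mutations):
    """Probability that at least one of n independent mutations fixes,
    1 - (1 - p_fix)^n, for 0 < p_fix < 1.

    Independence between mutations is an illustrative assumption.
    """
    # log1p/expm1 avoid rounding trouble when p_fix is tiny.
    return -math.expm1(n_mutations * math.log1p(-p_fix))


N = 10_000  # hypothetical population size
for n_mutations in (1, 1_000, 10_000, 100_000):
    print(n_mutations, prob_at_least_one_fixes(1.0 / N, n_mutations))
```

Each individual mutation is a long shot, but the probability that some knockout eventually fixes climbs toward 1 as the number of independent occurrences grows past N.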
When purifying selection on a gene is removed, why can’t several different loss-of-function alleles all drift upward (regardless of whether the scope is one population or several populations)? This could be called “convergent decay.”
I think that’s a fair point–if, say, the common ancestor of a number of species had a gene for smelling compound X, then that gene became a pseudogene in all the species that adapted to environments without compound X, that wouldn’t necessarily be evidence for positive selection. In the pigmentation case, it’s plausible loss of function could be constrained to a couple loci due to pleiotropic effects.
I think having a number on the divergence times between the populations of cavefish here is pretty important–if, as i’d guess, it’s pretty recent, that’s stronger evidence for positive selection. One could also look at the sequence around these loci for other evidence of selection (in LD, diversity, etc.)
maybe i didn’t write that clearly. a priori, one might chalk depigmentation in cave fish up to drift (this was, i believe, darwin’s hypothesis). the molecular evidence–multiple convergences, including independent mutations in the same gene–is powerful evidence that it is instead due to strong positive selection.
Could it be that Hammer et al. oversampled the descendants of polygamists and vice versa for Keinan?
don’t know, i kind of doubt it. I think it’s more likely to be something like what hawks suggests–different ways of calibrating mutation rates.
keinan et al have three analyses–one of differentiation, one of the allele frequency spectrum, and one of diversity. hammer et al. present data based on diversity only. looking at those other two aspects of the data in the hammer et al. dataset would probably shed some light on where the discrepancy is coming from.
it actually doesn’t bother me when people talk about not having to “invoke selection”. In some sense, a selective model is like a neutral model with an additional parameter. So it feels natural to have to justify including an additional parameter in a model (like in all hypothesis testing). though it’s true it’s not exactly analogous.
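One way to make the "extra parameter" framing concrete is a likelihood-ratio test between nested models. This is a generic sketch, not a method from any of the papers discussed: it assumes the neutral model is the selective model with one parameter fixed, and that the usual chi-square approximation (1 degree of freedom) holds, which may not be true in practice (for instance when the neutral value sits on the boundary of the parameter space). The log-likelihood values are made-up placeholders.

```python
import math


def lrt_pvalue_one_extra_param(loglik_null, loglik_alt):
    """P-value for a likelihood-ratio test with one extra parameter.

    Uses the chi-square (1 df) approximation, whose upper tail is
    erfc(sqrt(x / 2)). Both arguments are maximized log-likelihoods.
    """
    # Clamp at zero in case of tiny negative values from rounding.
    statistic = max(0.0, 2.0 * (loglik_alt - loglik_null))
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical fits: neutral model vs. neutral + selection coefficient.
print(lrt_pvalue_one_extra_param(-1000.0, -998.08))
print(lrt_pvalue_one_extra_param(-1000.0, -990.0))
```

The selective model always fits at least as well, so the question is whether the improvement is big enough to pay for the added parameter; a small gain in log-likelihood doesn't justify invoking selection.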
I wonder what complications come with antibiotic use. Once a healthy (whatever that is) population of intestinal bacteria is wiped out, does it re-establish itself?
i think most species bounce back pretty effectively, see here:
http://biology.plos
What amazes me is that the components seem to align with perpendicular spatial axes
this is what’s expected under an isolation by distance model, see here:
http://www.gnxp.com/blog/2008/04/picture-is-worth-thousand-words-part-n.php
I think it is interesting that baldness is most common in light-skinned men of European ancestry. It would be interesting to know if the “risk variants” have been subject to recent selection.
i’d say they have. the snp with the strongest association with baldness is almost fixed between african and non-african pops
http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?rs=6625163
Common disease-common variant researchers in human genetics may get more media coverage, including the science media, but the Germ Theory people are cleaning up where it counts.
i’ve made this argument before, but this is a false dichotomy/distinction.
for example:
We present a genome-wide association study of ileal Crohn disease and two independent replication studies that identify several new regions of association to Crohn disease
…
these findings suggest that autophagy and host cell responses to intracellular microbes are involved in the pathogenesis of Crohn disease.
I think you overstate the role any given administration would have in any of this. All these restrictions on data access come from *within* the scientific community, not in response to the executive branch. If you’re going to worry about one person overreacting to these hypothetical situations, it should probably be francis collins.
for PCA analyses of a worldwide set of human populations, see here
http://www.sciencemag.org/cgi/content/abstract/319/5866/1100
So if the PC map and the real map match so well, does it mean that there has been no mass migration in Europe (other than limited, homogeneous diffusion) for a very long time?
no, I don’t think the behavior of PCA in different situations is well-enough known to make that conclusion. isolation by distance is sufficient for the matching of the real map and the PC map, but it’s probably not necessary. it’s mostly unknown how different types of migration affect PCA.
Slovaks are genetically Italian?
i think they just have a small sample size of slovaks; that’s probably not a reliable result.