Heritability: Lost and Found?
Optimal prediction to the rescue.
James Thompson
[Figure: predicted vs. actual height (from Hsu et al.)]

The “missing heritability” problem: current genetic analyses cannot account for as much variance as population heritability estimates suggest. This has been a cue for “Down with twin studies” arguments, in which those of dramatic inclinations have chosen to imagine that heritability estimates were thereby disproved. Not so. I was never particularly worried about this argument, regarding it as only a matter of time before the genetic code was cracked sufficiently to bridge the gap.

A further problem in cracking the genetic code is that some important human characteristics, like height and intelligence, are controlled by many genes of small effect. As regards height, this is in fact a matter of proportionality: tall people are usually taller not just because they have longer legs (though they do), but because they are longer overall, and thus taller as a consequence. Building a taller body involves a large set of changes. Indeed, perhaps as many as 20,000 SNPs are required, each of them doing only a little. Equally, for intelligence as many as 10,000 SNPs may be involved. However, if many SNPs are required for an important trait, each doing very little, it is hard to prove or disprove the involvement of any one of them. Rather than just identifying significant SNPs, what matters is showing that a technique can account for a good proportion of the overall variance. Prediction matters.
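To make the “many genes of small effect” point concrete, here is a toy simulation (an illustration only, not the paper’s method): thousands of SNPs, each explaining a negligible fraction of variance on its own, can still jointly account for about half the variance of a trait. The sample size, SNP count and heritability below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps, h2 = 2000, 5000, 0.5   # arbitrary sketch values

# Genotypes coded as 0/1/2 copies of the minor allele (frequency 0.3 here).
genotypes = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)

# Every SNP gets a small random effect; scale the additive score to variance h2.
betas = rng.normal(0.0, 1.0, n_snps)
genetic = genotypes @ betas
genetic = (genetic - genetic.mean()) / genetic.std() * np.sqrt(h2)
phenotype = genetic + rng.normal(0.0, np.sqrt(1 - h2), n_people)

# One SNP on its own explains almost nothing...
r_one = np.corrcoef(genotypes[:, 0], phenotype)[0, 1]
# ...but the combined additive score recovers roughly h2 of the variance.
r_all = np.corrcoef(genetic, phenotype)[0, 1]
print(f"single SNP r^2: {r_one**2:.4f}   all-SNP score r^2: {r_all**2:.2f}")
```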

Now a paper comes along which claims to have hoovered up the SNP heritability variance for height, and to have done so using machine learning, namely the LASSO, a compressed sensing technique. It also captures 9% of the variance for scholastic attainment, close to the 10% I had previously mentioned as the current upper limit.

Accurate Genomic Prediction of Human Height. Louis Lello, Steven G. Avery, Laurent Tellier, Ana I. Vazquez, Gustavo de los Campos, and Stephen D.H. Hsu. bioRxiv preprint first posted online Sep. 18, 2017.
The abstract:

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ∼40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ∼0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ∼20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.

http://www.biorxiv.org/content/biorxiv/early/2017/09/18/190124.full.pdf
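(For orientation: a prediction correlation of ∼0.65 squares to ∼0.42, which is how the ∼0.65 correlation and the ∼40 percent of variance captured for height fit together.)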

The introduction sets out the problems clearly, and distinguishes SNP-hunting techniques from genomic prediction, which, being “based on whole genome regression methods”, seeks “to construct the most accurate predictor of phenotype” and “tolerates possible inclusion of a small fraction of false-positive SNPs in the predictor set. The SNP heritability of the molecular markers used to build the predictor can be interpreted as an upper bound to the variance that could be captured by the predictor.”

The authors have used the UK Biobank database with nearly 500,000 genotypes. The paper has, quite necessarily, very technical supplementary appendices, but the underlying approach is to use large samples of the data to train the learning procedure, and then test the results on samples of 5,000 genotypes which had been held apart for that purpose. In my primitive terms, the sample of discovery is used to generate the best predictor, warts and all, and that is tested on the sample of proof. I like this, because it is pragmatic, not burdened by too many prior assumptions about genes, uses all the data to advantage, and is willing to include weak signals.
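As a rough sketch of that discovery-and-validation workflow (scaled down, with simulated stand-in genotypes and the off-the-shelf LASSO from scikit-learn; the paper’s own data, implementation and tuning differ):

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_people, n_snps, n_causal = 4000, 2000, 200   # toy sizes, far smaller than UKBB

# Simulated genotypes and a sparse set of causal effects giving h^2 of about 0.5.
X = rng.binomial(2, 0.3, size=(n_people, n_snps)).astype(float)
beta = np.zeros(n_snps)
beta[:n_causal] = rng.normal(0.0, 1.0, n_causal)
g = X @ beta
y = g + rng.normal(0.0, g.std(), n_people)

# "Sample of discovery" vs "sample of proof": hold 1,000 people out entirely.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1000, random_state=0)

model = LassoCV(cv=5, n_alphas=20).fit(X_train, y_train)   # L1 penalty chosen by CV
r = np.corrcoef(model.predict(X_test), y_test)[0, 1]
print(f"hold-out correlation: {r:.2f}; "
      f"'activated' SNPs: {int((model.coef_ != 0).sum())}")
```

The hold-out correlation plays the same role as the ∼0.65 reported for height, though the toy numbers here are not meant to reproduce it.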

Fig. 3, shown above, reveals a good fit with the data for height.

As the authors say in their discussion:

Until recently most work with large genomic datasets has focused on finding associations between markers (e.g., SNPs) and phenotype. In contrast, we focused on optimal prediction of phenotype from available data. We show that much of the expected heritability from common SNPs can be captured, even for complex traits affected by thousands of variants. Recent studies using data from the interim release of the UKBB reported prediction correlations of about 0.5 for human height using roughly 100K individuals in the training [19]. These studies forecast further improvement of prediction accuracy with increased sample size, which have been confirmed here.

We are optimistic that, given enough data and high-quality phenotypes, results similar to those for height might be obtained for other quantitative traits, such as cognitive ability or specific disease risk. There are numerous disease conditions with heritability in the 0.5 range, such as Alzheimer’s, Type I Diabetes, Obesity, Ovarian Cancer, Schizophrenia, etc. Even if the heritable risk for these conditions is controlled by thousands of genetic variants, our work suggests that effective predictors might be obtainable (i.e., comparable to the height predictor in Figure (4)). This would allow identification of individuals at high risk from genotypes alone. The public health benefits are potentially enormous.

We can roughly estimate the amount of case-control data required to capture most of the variance in disease risk. For a quantitative trait (e.g., height) with h2 ∼ 0.5, our simulations predict that the phase transition in LASSO performance occurs at n ∼ 30s, where n is the number of individuals in the sample and s is the sparsity of the trait (i.e., number of variants with non-zero effect sizes). For case-control data, we find n ∼ 100s (where n means number of cases with equal number controls) is sufficient. Thus, using our methods, analysis of ∼100k cases together with a similar number of controls might allow good prediction of highly heritable disease risk, even if the genetic architecture is complex and depends on a thousand or more genetic variants.
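Putting rough numbers on those rules of thumb (the sparsity figures are simply the ∼20k activated SNPs reported for height and the thousand-variant disease example quoted above):

```python
# n ~ 30s for a quantitative trait, n ~ 100s cases for case-control data,
# where s is the number of variants with non-zero effect.
s_height = 20_000    # ~20k activated SNPs reported for height
s_disease = 1_000    # the "thousand or more variants" disease example

print("height-like trait:", 30 * s_height, "individuals")                # 600,000
print("disease risk:", 100 * s_disease, "cases, plus as many controls")  # 100,000
```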

In summary, this is exciting stuff. It would appear that, given large enough samples and traits that meet the signal sparsity requirements, compressed sensing may help track down predictive formulas for many traits and conditions. The benefits are enormous, not least the greatest benefit of all: a gain in understanding.

 
• Category: Science • Tags: Genetics of Height, Genomics, Height, Heritability 
Comments
  1. hyperbola says:

    Still not very convincing and not very useful.

    To “predict” a single simple phenotype such as height, the authors use about 10,000 variables (for a sample size of only 500,000). It seems strange that the authors seem never to tell us how many genes are represented by those 10,000 SNPs, but Fig. 5 does seem to show that they are (randomly?) distributed across most of the genome, i.e. the number of genes is presumably also in the many thousands.

    Let’s give the authors the benefit of the doubt and imagine that this “success” applies to a complex disease such as Alzheimer’s (as they suggest). Is there any real medical utility to knowing that several thousand genes may influence whether any given patient has been susceptible since birth to “inherited” Alzheimer’s? Is it likely that knowing several thousand genes may have (mostly very small) contributions to Alzheimer’s will help in producing therapies?

    Finally, isn’t this kind of “analysis” actually subject to many unverified assumptions? For example, assume that “nurture” does have a significant role in adult height (highly likely since humans have NOT been mutating fast enough to produce the substantial increases in height observed over the last few generations). This would mean that the whole analysis set (500,000) is distorted by unknown and uncontrolled factors.

    This article seems to be another nail in the coffin of present GWAS-style approaches.

    • Replies: @dearieme (comment 2 below), @res (comment 5 below), @candid_observer:
    I find this line of criticism baffling.

    Look, if we can predict diseases, with at least some level of accuracy, that can very often be medically useful in and of itself. Suppose we can predict with some reliability someone's tendency toward colon cancer, or prostate cancer, based on their genes -- predictions that would otherwise be impossible. Then we could carefully monitor such cases, and perform diagnostic procedures only on such cases, sparing those with very low susceptibility. It's entirely possible that a great range of diseases will fall under this rubric, including many forms of cancer.

    And of course from a scientific point of view it is often very important to establish that a trait is genetic, and to what degree, especially when the other methods (such as twin studies) are, or at least are thought to be, methodologically problematic.
  2. dearieme says:
    @hyperbola

    “Is there any real medical utility to knowing that several thousand genes may influence whether any given patient has been susceptible since birth to “inherited” Alzheimers? Is it likely that knowing several thousand genes may have (mostly very small) contributions to Alzheimers will help in producing therapies?”

    Yeah, why bother doing research if you have the magical ability to tell in advance that it can’t do any good?

    • Replies: @hyperbola (comment 10 below)
  3. C T says:

    Genes tell the body to make more or less of specific molecules. Figure out what those molecules are and you can “hack” intelligence without DNA changes. For instance, mice that either 1) were given aspartic acid or 2) had higher endogenous (i.e., naturally occurring, due most likely to genetics) aspartic acid in their brains had better working memory. (Interestingly, for those looking at intelligence in populations, East Asians and seafood consumers appear to be the ones getting the most aspartic acid in their diets.)

    • Replies: @C T (correction, comment 4 below)
  4. C T says:
    @C T

    Sorry, rats not mice.

    Amino Acids. 2010 May;38(5):1561-9. doi: 10.1007/s00726-009-0369-x. Epub 2009 Nov 5.
    Evidence for the involvement of D-aspartic acid in learning and memory of rat.
    Topo E, Soricelli A, Di Maio A, D’Aniello E, Di Fiore MM, D’Aniello A.
    Abstract
    D-Aspartic acid (D-Asp) is an endogenous amino acid present in neuroendocrine systems. Here, we report evidence that D-Asp in the rat is involved in learning and memory processes. Oral administration of sodium D-aspartate (40 mM) for 12-16 days improved the rats’ cognitive capability to find a hidden platform in the Morris water maze system. Two sessions per day for three consecutive days were performed in two groups of 12 rats. One group was treated with Na-D-aspartate and the other with control. A significant increase in the cognitive effect was observed in the treated group compared to controls (two-way ANOVA with repeated measurements: F ((2, 105)) = 57.29; P value < 0.001). Five further sessions of repeated training, involving a change in platform location, also displayed a significant treatment effect [F ((2, 84)) = 27.62; P value < 0.001]. In the hippocampus of treated rats, D-Asp increased by about 2.7-fold compared to controls (82.5 +/- 10.0 vs. the 30.6 +/- 5.4 ng/g tissue; P < 0.0001). Moreover, 20 randomly selected rats possessing relatively high endogenous concentrations of D-Asp in the hippocampus were much faster in reaching the hidden platform, an event suggesting that their enhanced cognitive capability was functionally related to the high levels of D-Asp. The correlation coefficient calculated in the 20 rats was R = -0.916 with a df of 18; P < 0.001. In conclusion, this study provides corroborating evidence that D-aspartic acid plays an important role in the modulation of learning and memory.

  5. res says:
    @hyperbola

    To “predict” a single simple phenotype such as height, the authors use about 10,000 variables (for a sample size of only 500,000). It seems strange that the authors seem never to tell us how many genes are represented by those 10,000 SNPs

    First, do you understand what out of sample testing is? A related technique is described by Dr. Thompson above but not using those words:

    the underlying approach is to use large samples of the data to train the learning procedure, and then test the results on samples of 5,000 genotypes which had been held apart for that purpose. In my primitive terms, the sample of discovery is used to generate the best predictor, warts and all, and that is tested on the sample of proof. I like this, because it is pragmatic, not burdened by too many prior assumptions about genes, uses all the data to advantage, and is willing to include weak signals.

    I believe this excerpt shows that there were two different techniques being used (i.e. Figure 3 was held out UKBB data as described by Dr. Thompson rather than ARIC data):

    Figure (3) shows the correlation between predicted and actual phenotypes in a validation set of 5000 individuals not used in the training optimization described in above – this is shown both for height and heel bone mineral density.

    The paper abstract says: “We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.”

    Note that this is even more rigorous than Dr. Thompson described (the distinction is using a completely separate dataset (ARIC) for validation rather than a held-out subset of the primary dataset (UKBB); I suspect he simplified for explanatory purposes). Here is a more detailed explanation from the paper:

    For height we tested out-of-sample validity by building a predictor model using SNPs whose state is available for both UKBB individuals (via imputation) and on Atherosclerosis Risk in Communities Study (ARIC) [18] individuals (the latter is a US sample). This SNP set differs from the one used above, and is somewhat more restricted due to the different genotyping arrays used by UKBB and ARIC. Training was done on UKBB data and out-of-sample validity tested on ARIC data. A ∼5% decrease in maximum correlation results from the restriction of SNPs and limitations of imputation: the correlation fell to ∼0.58 (from 0.61) while testing within the UKBB. On ARIC participants the correlation drops further by ∼7%, with a maximum correlation of ∼0.54. Only this latter decrease in predictive power is really due to out-of-sample effects. It is plausible that if ARIC participants were genotyped on the same array as the UKBB training set there would only be a ∼7% difference in predictor performance. An ARIC scatterplot analogous to Figure (4) is shown in the Supplement. Most ARIC individuals have actual height within 4 cm or less of predicted height.

    and

    For out-of-sample validation of height, we extracted SNPs which survived the prior quality control measures, and are also present in a second dataset from the Atherosclerosis Risk in Communities Study (ARIC) [18]. This resulted in a total of 632,155 SNPs and 464,192 samples.

    More on out-of-sample testing at http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:In-sample_vs._out-of-sample_forecasts

    I believe out-of-sample testing on a completely separate data set can be considered the gold standard of verification by the original researchers. Even better is having a separate group do that with more data sets. Hopefully that will also happen.

    Regarding your genes point, I agree that is disappointing. The authors did not dig into the biology much. The 2014 height GWAS I linked in another thread: http://neurogenetics.qimrberghofer.edu.au/papers/Wood2014NatGenet.pdf
    IMHO serves as a good model for some possible analyses to perform. Hopefully some hard core systems biologists will join the existing collaboration. One notable characteristic of the recent paper is just how few authors there are compared to many current GWAS papers.

    • Replies: @James Thompson (comment 6 below), @hyperbola (comment 9 below)
  6. James Thompson says:
    @res


    Thanks for your helpful comments. Yes, I was simplifying and skipped the other samples of proof, because the paper is there for everyone to read, and of course you have done so! I am using the distinction between “sample of discovery” versus “sample of proof”, but others call the latter the “sample of testing”, which sometimes gets misunderstood. Perhaps “sample of validation” would be better. Anyway, it is another “two for the price of one” paper.
    Yes it is more rigorous to test on a completely different sample than a subset held back for that purpose, but if different measurements were taken in that other sample it might be an unfair measure.
    The authors didn’t mention any of the biology. I think their point is that they can get a good predictive equation with CS, and others can look at the biology if they wish to. And they are, as you note, not a multitude of biologists, but a small group who incline towards physics. What could be better, other than them all being mathematicians? Nonetheless, I will see if they wish to comment.

  7. lauris71 says:

    In my opinion, the talk about SNPs is a bit misleading in this context.
    20,000 SNPs is of the same order of magnitude as the number of haploblocks in the European population. So they more or less capture the total genetic variability of the population in the model and use it to predict the phenotype. It is an interesting and certainly useful approach, for example to predict disease risks. But moving from such a model to biological explanations, i.e. finding genes that should be manipulated to cure certain symptoms, is a long way off.

    • Replies: @res (comment 8 below)
  8. res says:
    @lauris71

    Do we have any sense of how many haploblocks these 20k SNPs represent? Do you have a reference for the number of haploblocks in the European population? Has anyone tried making a haploblock based predictor for height?

    I just looked for references to haploblocks and most of what I am seeing is a decade or more old. Given that we know so much more about genetic structure across populations now I am not sure how much to value old references.

    One thing I don’t know how to evaluate is how likely CS is to get to actual causal SNPs. In a traditional GWAS there tend to be multiple nearby (in linkage disequilibrium) high-significance SNPs. I think the CS pressure towards sparseness would help select only one of those SNPs, but is it causal or is it (as you discuss) only representative of a haplotype?

    My recollection of my time spent learning about and using L1-regularization was that there were some issues with enforced sparseness at different levels of penalization, especially with correlated variables. I don’t know whether or not that relates to a CS complete solution.

    But in this interesting discussion some seemingly informed people don’t think that is a big deal: https://stats.stackexchange.com/questions/30486/when-does-lasso-select-correlated-predictors
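    A quick toy illustration of that behaviour (arbitrary effect sizes and penalty, not from the paper): with two SNPs in near-perfect LD, the LASSO tends to put most or all of the weight on just one of them, and there is no guarantee it is the causal one.

    ```python
    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)
    n = 2000
    causal = rng.binomial(2, 0.5, n).astype(float)
    # A "tag" SNP in near-perfect LD with the causal one.
    tag = np.where(rng.random(n) < 0.97, causal, rng.binomial(2, 0.5, n).astype(float))
    noise = rng.binomial(2, 0.5, size=(n, 20)).astype(float)

    X = np.column_stack([causal, tag, noise])
    y = 0.5 * causal + rng.normal(0.0, 1.0, n)   # only the first SNP is causal

    coef = Lasso(alpha=0.05).fit(X, y).coef_
    print("causal SNP weight:", round(coef[0], 3), "| tag SNP weight:", round(coef[1], 3))
    ```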

    P.S. The caption to Figure 5 does offer some support for what you are saying: “Activated SNPs are distributed roughly uniformly throughout the genome.”

    P.P.S. One thing I did not pick up on initially was this from the very end of the conclusion:

    For case-control data, we find n ∼ 100s (where n means number of cases with equal number controls) is sufficient. Thus, using our methods, analysis of ∼ 100k cases together with a similar number of controls might allow good prediction of highly heritable disease risk, even if the genetic architecture is complex and depends on a thousand or more genetic variants.

    I wonder if they are still thinking about those high-IQ case control studies? Or is this more about disease?

    It seems odd to me that the necessary sample size for case-control data is ~3x that of broad population data (n ∼ 30s). I would have expected the case-control methodology to be more powerful as suggested by that recent IQ meta-study. Perhaps the relatively low power has to do with looking at fundamentally different problems (e.g. low prevalence disease rather than quantitative traits)?

  9. hyperbola says:
    @res


    “Out of sample” testing is nothing particularly new. In a sense every “parallel” case of different research groups performing “gwas” for the same disease on different data sets is “out of sample” testing. What you like to extol as “out of sample” testing might instead be regarded as the usual scientific requirement that the result can be reproduced by others with independent data collection – as you noted.

    GWAS style approaches have been spectacularly successful for “rare” diseases because the “necessary and sufficient” criterion involves only a very limited number of genes. I think that by now it is becoming ever clearer that there are many “complex” traits/diseases where simply measuring ever larger cohort sizes is not going to get us much further. As you suggest, “systems biology” in the sense that genes “co-function” in achieving biological states is probably where this has to go. BUT, I would contend that we need other kinds of measurements (loosely described as highly parallel conventional biology of massive numbers of genes/proteins) rather than ever more GWAS of the present types. Just as development of the technology for GWAS was necessary, now we need to concentrate on developing new kinds of technology that measure other things.

    • Replies: @res (comment 11 below)
  10. hyperbola says:
    @dearieme
    "Is there any real medical utility to knowing that several thousand genes may influence whether any given patient has been susceptible since birth to “inherited” Alzheimers? Is it likely that knowing several thousand genes may have (mostly very small) contributions to Alzheimers will help in producing therapies?"

    Yeah, why bother doing research if you have the magical ability to tell in advance that it can't do any good?

    Oh, I am in favor of more research, even of more GWAS type studies for “rare” diseases. BUT, I don’t expect that more GWAS is going to be very productive for many medical situations (or complex traits like IQ). Fortunately, it is now sufficiently cheap that we don’t need massive resources for GWAS measurements (access to patients is probably the main bottleneck?) and can increasingly devote resources to other research approaches. I think what I replied to res will give you an idea what I mean:

    GWAS style approaches have been spectacularly successful for “rare” diseases because the “necessary and sufficient” criterion involves only a very limited number of genes. I think that by now it is becoming ever clearer that there are many “complex” traits/diseases where simply measuring ever larger cohort sizes is not going to get us much further. As you suggest, “systems biology” in the sense that genes “co-function” in achieving biological states is probably where this has to go. BUT, I would contend that we need other kinds of measurements (loosely described as highly parallel conventional biology of massive numbers of genes/proteins) rather than ever more GWAS of the present types. Just as development of the technology for GWAS was necessary, now we need to concentrate on developing new kinds of technology that measure other things.

    • Replies: @dearieme: Fair enough.
  11. res says:
    @hyperbola
    "Out of sample" testing is nothing particularly new. In a sense every "parallel" case of different research groups performing "gwas" for the same disease on different data sets is "out of sample" testing. What you like to extol as "out of sample" testing might instead be regarded as the usual scientific requirement that the result can be reproduced by others with independent data collection - as you noted.

    GWAS style approaches have been spectacularly successful for "rare" diseases because the "necessary and sufficient" criterion involves only a very limited number of genes. I think that by now it is becoming ever clearer that there are many "complex" traits/diseases where simply measuring ever larger cohort sizes is not going to get us much further. As you suggest, "systems biology" in the sense that genes "co-function" in achieving biological states is probably where this has to go. BUT, I would contend that we need other kinds of measurements (loosely described as highly parallel conventional biology of massive numbers of genes/proteins) rather than ever more GWAS of the present types. Just as development of the technology for GWAS was necessary, now we need to concentrate on developing new kinds of technology that measure other things.

    “Out of sample” testing is nothing particularly new.

    Of course. But I do think it is the gold standard of validation. Implementing it can be hard (see the contortions involved in reconciling the data sets described in the paper), and for that reason I think even publishing results for it is a good indicator of how seriously the authors take verifying their results. Seeing good out-of-sample results is even better.

    I would contend that we need other kinds of measurements (loosely described as highly parallel conventional biology of massive numbers of genes/proteins) rather than ever more GWAS of the present types.

    I don’t think the two approaches are exclusive. I agree that integrating new genetic results into the systems biology hierarchy (from rate constants on enzyme reactions to cells to tissues to organisms) is important and should prove to be valuable over time.

  12. Factorize says:

    res, I think the infoproc site is very descriptive in calling diseases 1-bit sensing. As you mentioned, quantitative traits such as height give usable information from each person in the sample to apply to the betas. The problem with diseases is that typically you either have the disease or you don’t. Some risk threshold needs to be crossed before a disease manifests.

    I am not completely sure whether the above thinking would necessarily apply in a typical way to Alzheimer’s. By age 90 everyone has Alzheimer pathology. So with AD it is not so much having or not having dementia that is centrally important, but when such impairment emerges. This might make AD more similar to quantitative traits such as height than a disease.

    • Replies: @res (comment 13 below)
  13. res says:
    @Factorize

    This might make AD more similar to quantitative traits such as height than a disease.

    I think there are many diseases/conditions that share this similarity. See the Liability Threshold Model: http://www.wikilectures.eu/index.php/Genetic_Liability,_Threshold_Model.

    The issue is that to a large degree the diagnoses are binary, though there are exceptions where status is given a quantitative measurement. As you note, age of onset might be useful in general.
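    A toy sketch of the liability-threshold idea (arbitrary prevalence and heritability, purely illustrative): a continuous, partly genetic liability is observed only as a yes/no diagnosis once it crosses a threshold.

    ```python
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    n, h2, prevalence = 100_000, 0.5, 0.05

    genetic = rng.normal(0.0, np.sqrt(h2), n)
    liability = genetic + rng.normal(0.0, np.sqrt(1 - h2), n)   # ~ standard normal overall

    threshold = norm.ppf(1 - prevalence)   # only the top 5% of liability become "cases"
    case = liability > threshold
    print("observed prevalence:", round(case.mean(), 3))
    print("mean genetic liability, cases vs controls:",
          round(genetic[case].mean(), 2), "vs", round(genetic[~case].mean(), 2))
    ```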

    P.S. Didn’t really say much new here. More expressing agreement.

  14. Factorize says:

    res, one could use the MMSE score to quantify dementia.
    An MMSE GWAS?

    Turning a disease into a quantifiable trait might greatly amplify the power of the CS L1 Lasso: you might move n from 100 s back down to 30 s while also benefiting from diseases having fewer non-zero SNPs. infoproc suggested 1000 SNPs for diseases versus perhaps 10,000-40,000 SNPs for traits.

    I am sure many in the data processing community must be extremely frustrated by all the data sharing barriers that stand in the way of getting the job done.

    If they were to open up access to these data servers there would be a truly massive analysis frenzy. How long will we have to wait for the GCHD for height? Days? Months? Years?

    This research is critically important for anyone coping with an inherited illness or concerned about IQ or income inequality ... everyone. It is profoundly immoral for this research to be deliberately hindered.

    This could be done in less than a day if there were no barriers. Why don’t they simply open up access to running analysis while keeping the data secure? They could make it a black box in which only the processed results were returned. This could be done on an open sharing basis.

    The scientific literature could move from pretty prose to large scale data drops that could then be fed back into the servers for further analysis.

    We are still far away from achieving truly open science. There would be this enormous leap forward if the data bureaucrats actually allowed scientists to do their jobs without obstruction.
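
    Returning to the Lasso point above, the n-versus-sparsity trade-off can be seen in a toy L1 recovery on simulated genotypes. The SNP counts, effect sizes, and penalty level below are invented for illustration and are far smaller than anything in the actual paper.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(1)
    p, s = 5_000, 50                    # candidate SNPs; SNPs with a true non-zero effect

    beta = np.zeros(p)
    beta[:s] = rng.normal(0.0, 1.0, s)  # sparse true effects (arbitrary units)
    sigma = np.sqrt(beta @ beta)        # noise scaled for roughly 50% "heritability"

    for mult in (10, 30, 100):          # sample size as a multiple of the sparsity s
        n = mult * s
        maf = rng.uniform(0.1, 0.5, p)
        X = rng.binomial(2, maf, size=(n, p)).astype(float)  # 0/1/2 genotypes
        X = (X - X.mean(0)) / X.std(0)                       # standardize columns
        y = X @ beta + rng.normal(0.0, sigma, n)

        alpha = sigma * np.sqrt(2 * np.log(p) / n)  # a textbook penalty level
        hits = np.flatnonzero(Lasso(alpha=alpha, max_iter=5000).fit(X, y).coef_)
        recovered = np.isin(np.arange(s), hits).mean()
        print(f"n = {mult:>3}*s = {n:5d}: {recovered:.0%} of causal SNPs recovered, "
              f"{len(hits)} SNPs selected in total")

    Recovery improves sharply as n grows relative to s, which is the intuition behind wanting either a sparser (roughly 1,000-SNP) disease architecture or a larger sample.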

  15. @Factorize

    Those who put together collaborative projects on the genetics of intelligence find them very time consuming, telling me that 3 years of work to achieve agreed participation is not unusual.

  16. res says:
    @James Thompson

    That is interesting. Have they offered any thoughts on which parts take the most time? Is it mostly HIPAA (or equivalent non-US requirements) compliance?

    P.S. Re: MMSE, I don’t have any real knowledge there, but sounds plausible.

  17. Factorize says:

    Doctor Thompson, I was considering more what happens when there is a large-scale genetic database that is ready to be analyzed, such as the UKB. The theory for linear Lasso compressed sensing was published 4 years ago. The UKB released its data in July, and here we are in September already with the first round of results.

    I will be especially interested to see how long it will take to recycle the current output back into the UKB: for example, the non-linear analysis and then perhaps additional iterations. The theoretical work has already been done, so how long will successive steps of data feedback take? All that is needed is to let the supercomputer crunch the numbers. The computing time is probably minimal; the bureaucratic obstacles are likely much more daunting.

    If the UKB allowed a more open-access approach to analysis of the data, then results could follow almost instantly. The data itself could remain behind a firewall.

    res, did you notice that the GigaScience article almost entirely neglected to talk about compound heterozygosity? The recent GCDH article had large interaction terms for CH; the biggest interaction was almost as large as the biggest SNP effect. The GigaScience article focused almost exclusively on epistasis.

    I have been worried about the computational explosion that would occur with non-linear terms, though this does not seem to apply. If you extract all the linear SNPs in the first round, then the non-linear interactions, especially the CH ones, could be quite modest. (The CH interactions only happen within a given gene, so such interactions may themselves be quite sparse.)

    Something that I am puzzling over is how the Step 2 phase boundary might change due to the information derived from Step 1. If you eliminate nearly all the noise from the zero-effect SNPs with the Lasso in Step 1, i.e. remove all the red area in the phase diagram, then what would the phase boundary look like in Step 2? Would you then have a firmer boundary?
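
    On the worry about a combinatorial explosion of non-linear terms, the two-step idea can be sketched directly: Step 1 runs a plain additive Lasso to screen SNPs, and Step 2 fits interaction terms only among the survivors that share a gene, so the candidate set stays small. This is only a schematic of that idea, not the authors' pipeline; the gene map, sizes, and penalty level are all invented for illustration.

    import numpy as np
    from itertools import combinations
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(2)
    n, p, snps_per_gene = 3_000, 2_000, 10   # toy sizes with an invented gene map
    gene = np.arange(p) // snps_per_gene     # SNP index -> gene id

    X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
    X = (X - X.mean(0)) / X.std(0)

    # Simulated trait: 30 additive SNPs plus one within-gene pairwise term
    # between SNPs 40 and 41 (which also carry additive effects).
    causal = rng.choice(p, 30, replace=False)
    signal = (X[:, causal] @ rng.normal(0, 1, 30)
              + X[:, 40] + X[:, 41] + 2.0 * X[:, 40] * X[:, 41])
    y = signal + rng.normal(0, signal.std(), n)   # ~50% of variance is signal

    # Step 1: additive Lasso screen.
    alpha = signal.std() * np.sqrt(2 * np.log(p) / n)
    kept = np.flatnonzero(Lasso(alpha=alpha, max_iter=5000).fit(X, y).coef_)
    print(f"step 1 kept {len(kept)} of {p} SNPs")

    # Step 2: interaction terms only for kept SNPs within the same gene.
    pairs = [(i, j) for i, j in combinations(kept, 2) if gene[i] == gene[j]]
    X2 = np.hstack([X[:, kept]] + [(X[:, i] * X[:, j])[:, None] for i, j in pairs])
    fit2 = Lasso(alpha=alpha, max_iter=5000).fit(X2, y)
    found = [pairs[k] for k in np.flatnonzero(fit2.coef_[len(kept):])]
    print(f"step 2 considered {len(pairs)} within-gene pairs, kept {len(found)}")

    Whether the Step 2 phase boundary genuinely firms up after Step 1 screening is exactly the open question above; the sketch only shows that the candidate set, and hence the computation, need not explode.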

  18. @hyperbola
    Still not very convincing and not very useful.

    To "predict" a single simple phenotype such as height, the authors use about 10,000 variables (for a sample size of only 500,000). It seems strange that the authors seem never to tell us how many genes are represented by those 10,000 SNPs, but Fig. 5 does seem to show that they are (randomly?) distributed across most of the genome, i.e. the number of genes is presumably also in the many thousands.

    Let's give the authors the benefit of the doubt and imagine that this "success" applies to a complex disease such as Alzheimer's (as they suggest). Is there any real medical utility in knowing that several thousand genes may influence whether any given patient has been susceptible since birth to "inherited" Alzheimer's? Is it likely that knowing several thousand genes may have (mostly very small) contributions to Alzheimer's will help in producing therapies?

    Finally, isn't this kind of "analysis" actually subject to many unverified assumptions? For example, assume that "nurture" does have a significant role in adult height (highly likely since humans have NOT been mutating fast enough to produce the substantial increases in height observed over the last few generations). This would mean that the whole analysis set (500,000) is distorted by unknown and uncontrolled factors.

    This article seems to be another nail in the coffin of present GWAS-style approaches.

    I find this line of criticism baffling.

    Look, if we can predict diseases, with at least some level of accuracy, that can very often be medically useful in and of itself. Suppose we can predict with some reliability someone’s tendency toward colon cancer, or prostate cancer, based on their genes — predictions that would otherwise be impossible. Then we could carefully monitor such cases, and perform diagnostic procedures only on such cases, sparing those with very low susceptibility. It’s entirely possible that a great range of diseases will fall under this rubric, including many forms of cancer.

    And of course from a scientific point of view it is often very important to establish that a trait is genetic, and to what degree, especially when the other methods (such as twin studies) are, or at least are thought to be, methodologically problematic.

  19. @candid_observer

    I agree. More targeted screening would be a great advantage, also having a better idea of which drugs to use, or to develop.

  20. @res

    I did not inquire about details, and the subject is delicate, since further cooperation depends on maintaining goodwill.

  21. res says:
    @James Thompson

    Understood. I respect your judgment and discretion.

  22. res says:
    @James Thompson

    More targeted screening could be an advantage in a number of dimensions:
    - Cost control of screening. That might not be binary (say screen high risk every 2 years, low risk every 5).
    - Potentially better screening effectiveness because of higher disease base rates in the population identified. I think this one could be a big deal, but don’t have any idea of the numbers.
    - If false positives decrease relative to true positives (per the previous point), net treatment outcomes would probably improve. This would also improve cost control of treatment.
    Any more ideas?

    Also related:
    - At risk individuals might have more incentive to take preemptive measures like modifying their diets.

    As you note, the potential ability to make connections like disease genetics -> relevant tissues or metabolic pathways -> more targeted drugs could be valuable.

    This could be especially useful for something like schizophrenia which I think is believed to have a number of possible causative factors and to be something of a “group of syndromes with similar symptoms.” I think there is a high likelihood of more targeted drugs being a big improvement there. I would be interested in hearing the thoughts of someone more expert on schizophrenia about this.

  23. @res

    I think the main gain would be to overcome base-rate false positives. They cause anxiety, and needless painful investigation. Gerd Gigerenzer “Reckoning with risk” is a great read on this topic.
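
    Gigerenzer's natural-frequencies framing is easy to reproduce. The test characteristics and prevalences below are invented purely to show the mechanism: even a fairly accurate test yields mostly false positives when the base rate is low, and restricting screening to a higher-risk group raises the positive predictive value.

    def positive_predictive_value(prevalence, sensitivity, specificity, population=100_000):
        """Bayes' rule written as natural frequencies (counts per population)."""
        sick = prevalence * population
        healthy = population - sick
        true_pos = sensitivity * sick
        false_pos = (1 - specificity) * healthy
        return true_pos / (true_pos + false_pos)

    # Assumed test: 90% sensitivity, 95% specificity.
    for prevalence in (0.005, 0.05):   # general population vs a genetically higher-risk subgroup
        ppv = positive_predictive_value(prevalence, 0.90, 0.95)
        print(f"prevalence {prevalence:.1%}: a positive result is correct {ppv:.0%} of the time")

    With these made-up numbers the positive predictive value rises from roughly 8% to roughly 49%, which is the sense in which genetic risk stratification could blunt base-rate false positives.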

  24. @James Thompson

    This goes beyond painful investigations. A number of radical procedures depend on estimates of how likely a cancer is to become aggressive.

    Prostate cancer is a good example. The evidence is that a very high proportion of men over, say, age 65 have prostate cancer — perhaps over 50%. This may well show up on a biopsy. But in the vast majority of cases, that cancer is not going to be aggressive, and will simply stay at a low, relatively passive level for perhaps decades.

    Question is, do you remove the prostate if you find cancer in a biopsy? There are complicated and disputed protocols used to make this decision. Anything that could add to the reliable prediction of whether the cancer will turn aggressive would be a huge boon, saving lives on the one hand, and unnecessary radical procedures on the other. Obviously further good information from genetics could go a great distance in doing this.

    Similar points would apply to breast cancer, of course. And treatment of other cancers and other diseases would likewise profit from such insight.

  25. Factorize says:

    res, looming large for me are the implications of the current research for those born recently or soon to be born. What changes in the environment will be made to help them be more adaptive citizens?

    For example, we might not be far away now from genetic predictors of IQ, EA and other behavioral traits. Will we soon start genetics-based streaming programs even at very young ages?

    With schizophrenia, what might happen if one were able to disentangle the cognitive impairment from the other symptoms? Might the illness then transform (as the title of the book you noted suggests) into a new way of being, without the debilitating consequences that occur today?

  26. res says:
    @candid_observer

    Agreed, and well said. Prostate cancer is my go-to example in this area (it is actually what I was thinking about when writing my earlier comment; note the screening intervals, which are for PSA), both for the reasons you mention and because screening is currently controversial in the US, primarily for the reasons you describe. Also, because there is a significant literature in this area and the controversy seems relatively balanced, I think there has been a decent effort to develop good quantitative evidence and arguments.

    A better defined hierarchy of screenings and treatments for prostate cancer would be a valuable addition to medicine. Especially if it could be informed by genetic knowledge.

    It is worth expanding on your (I think implicit) point that the genetics of prostate cancer virulence should be looked at separately from the genetics of prostate cancer incidence.

    P.S. re: screening intervals: https://www.ncbi.nlm.nih.gov/pubmed/21948815

  27. res says:

    The most recent version of this paper at biorxiv: https://www.biorxiv.org/content/early/2017/09/19/190124
    has a list of blog posts referencing the paper (including this one).

    I am still going through that, but one thing I recommend is taking a look at Rick Hyatt’s comments at: http://marginalrevolution.com/marginalrevolution/2017/09/accurate-genomic-prediction-human-height.html

    One of those comments references this paper: https://www.biorxiv.org/content/early/2017/07/07/160291.1
    which looks at intelligence and uses a relatively new technique named MTAG (Multi-Trait Analysis of Genome-wide association). I wonder if that will prove generally useful.

    Razib has a brief post at: https://gnxp.nofe.me/2017/09/18/release-the-uk-biobank-the-prediction-of-height-edition/
    but it contains more enthusiasm than content (am I projecting? ; )

  28. hyperbola says:
    @candid_observer

    The distinction is the “rare” vs. “complex” disease conundrum again. When the “necessary and sufficient” genes are a handful, finding therapies (even pre-birth genetic modifications) is plausible. When hundreds or thousands of genes contribute to some kind of disease, then a single measure like “propensity to suffer” this disease can be constructed from hundreds if not thousands of different combinations of genes, if the statistics of the gene identification were correct. What is it that you then want to monitor as an indicator of disease occurrence/progression (the inheritable genes are not changing)? If you then have a reliable indicator of disease, but it involves hundreds of genes (i.e. something like 2 to the power of 100 combinations for just two variants of each gene), what are the chances that some sort of therapy for the particular gene-set of a particular patient is already known? What kind of patient sample sizes will be required to test/verify such therapies for the different patients, and how will these be obtained?

    The sheer intractability that seems now to be increasingly revealed from GWAS of complex diseases suggests that only the “rare” diseases will really be helped by current GWAS approaches.

  29. hyperbola says:
    @James Thompson

    I disagree. What is it that you propose to screen for? See my answer to candid_observer at #28.

  30. dearieme says:
    @hyperbola
    Oh, I am in favor of more research, even of more GWAS type studies for "rare" diseases. BUT, I don't expect that more GWAS is going to be very productive for many medical situations (or complex traits like IQ). Fortunately, it is now sufficiently cheap that we don't need massive resources for GWAS measurements (access to patients is probably the main bottleneck?) and can increasingly devote resources to other research approaches. I think what I replied to res will give you an idea what I mean:

    GWAS style approaches have been spectacularly successful for “rare” diseases because the “necessary and sufficient” criterion involves only a very limited number of genes. I think that by now it is becoming ever clearer that there are many “complex” traits/diseases where simply measuring ever larger cohort sizes is not going to get us much further. As you suggest, “systems biology” in the sense that genes “co-function” in achieving biological states is probably where this has to go. BUT, I would contend that we need other kinds of measurements (loosely described as highly parallel conventional biology of massive numbers of genes/proteins) rather than ever more GWAS of the present types. Just as development of the technology for GWAS was necessary, now we need to concentrate on developing new kinds of technology that measure other things.
     

    Fair enough.

  31. dearieme says:
    @James Thompson

    ‘Gerd Gigerenzer “Reckoning with risk” is a great read on this topic.’

    It’s a tremendous read on many topics. It’s my Book of the Decade for whichever decade I bought it in.

  32. @hyperbola

    You seem to have failed even to understand the point I was making.

    With regard to many major diseases, a key medical question is, how many individuals must have a procedure performed to prevent one additional bad outcome?

    https://en.wikipedia.org/wiki/Number_needed_to_treat

    With regard to prostate cancer, for example, that question arises in the following context (among others): how many individuals with a finding of prostate cancer of a given Gleason score should have their prostates removed? The “safe” thing, of course, is to have them removed in all such findings — but the removal of the prostate is a significant step for a man, not to be undertaken lightly. And, since with the lower Gleason scores, the probability of aggressive cancer is relatively low — perhaps in the lower single digits in many cases — removing all those prostates will be unnecessary in perhaps 95+% of the cases.

    This is the reality of the treatment of prostate cancer today.

    What would make all the difference? A more accurate estimation of the likelihood that an individual with a finding of a given Gleason score will in fact develop aggressive cancer.

    And that is precisely what a genetic prediction, of the sort we see in this paper, might be able to provide us.

    The same kind of issue arises in breast cancer, and in many other kinds of diseases. Prediction is more than half of the game.

    And it’s also rather naive to expect that a genetic explanation of a disease which provided information about the pathways involved would actually turn directly into a treatment. There is a huge leap from one to the other. We have known for some time that certain genes have a significant impact on breast cancer. Do we have a cure yet?

    It’s just not in any way certain that it’s more genetic information we need to create cures for cancer. The cures, if any, may well come from very different sources.
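
    The number-needed-to-treat arithmetic behind this is short enough to write out. The risk figures below are invented for illustration only; the point is how sharply NNT falls when prediction concentrates risk in a smaller group.

    def number_needed_to_treat(risk_untreated, risk_treated):
        """NNT = 1 / absolute risk reduction."""
        return 1.0 / (risk_untreated - risk_treated)

    # Illustrative only: suppose surgery halves the risk of a bad outcome.
    scenarios = {
        "all low-Gleason findings (assumed risk of aggressive cancer 4%)": 0.04,
        "a genetically flagged subgroup (assumed risk 20%)": 0.20,
    }
    for label, risk in scenarios.items():
        nnt = number_needed_to_treat(risk, risk / 2)
        print(f"{label}: NNT of about {nnt:.0f} prostatectomies per bad outcome prevented")

    Going from an NNT of about 50 to about 10 is the kind of shift that would change the treat-or-watch calculus, which is why better prediction of aggressiveness matters even before any new therapy exists.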

  33. res says:
    @dearieme

    How useful do you think it is to read “Reckoning with Risk” if I have already read Gigerenzer’s “Calculated Risks: How to Know When Numbers Deceive You” and “Gut Feelings”? I have the sense “Calculated Risks” overlaps a fair bit with “Reckoning with Risk.”

    I think an earlier recommendation of yours (dearieme) might have been part of what motivated me to read one or both of those. Thanks!

  34. @res

    One of his first books I came across was “Simple Heuristics That Make Us Smart”. Yes, there are overlaps, but “Reckoning with Risk” was a good collection. I have 7 posts on him, but here is my commendation:

    “It can rarely be said of a psychologist that everything they write is worth reading. Gigerenzer is one such psychologist. He writes in plain English (presumably his second language) and understands his material so thoroughly that he can explain it simply, the sign of an intelligent and honest teacher. This straightforward approach means that you can follow this heuristic to make you smart: if you cannot understand him first time around, it is worth reading him several times until you do. With lesser writers, if you cannot understand them first time, turn elsewhere.”

  35. hyperbola says:
    @candid_observer

    I don’t really see any contradictions between what each of us has said. You bring in the additional question of prognosis – and it is certainly crucial for therapy. I would simply say that the rare vs complex distinction is probably also valid here and that prognosis in the case of complex diseases (many contributing genes) will also remain highly ambiguous and very difficult to ever define clearly purely from gene patterns. As for this statement,

    And it’s also rather naive to expect that a genetic explanation of a disease which provided information about the pathways involved would actually turn directly into a treatment. There is a huge leap from one to the other. We have known for some time that certain genes have a significant impact on breast cancer. Do we have a cure yet?

    my argument is that for complex diseases it is only when we understand the functional networks that therapy and prognosis will advance strongly. To put this in a slightly different way, maybe we should regard diseases such as cancer as a chronic, semi-stable state in a response landscape that does not react to its environment in the “normal” way (related to some of Weinstein’s suggestions). We then need to understand such states and the transitions to/from these states (which might have many pathways) to make medical progress. Given the different genetic backgrounds of individuals, these states/pathways may show some variation, but presumably disease phenotypes reflect some common underlying features at the systems biology level.

    We probably have some common ground when you say this:

    It’s just not in any way certain that it’s more genetic information we need to create cures for cancer. The cures, if any, may well come from very different sources.

  36. hyperbola says:
    @candid_observer

    Post script.

    Some of these same kinds of considerations may also be applicable to infectious diseases. Here is something I find intriguing.

    Casanova JL, Abel L. The genetic theory of infectious diseases: a brief history and selected illustrations. Annu Rev Genomics Hum Genet. 2013;14:215-43. doi: 10.1146/annurev-genom-091212-153448. Epub 2013 May 29.

  37. Factorize says:

    Note the differences in results for height using the method below (see the supplement, page 10 and following). This research found many fewer SNPs (2,000 vs. 22,000).

    https://www.biorxiv.org/content/biorxiv/early/2017/09/27/194944.full.pdf

  38. Factorize says:

    Is anyone aware of a technology that could select individual chromosomes for inclusion into an egg/sperm/early embryo?

    Merely selecting amongst 10 embryos for the highest IQ PGS could increase IQ by 1 standard deviation. Using this same technique with more intense selection could perhaps increase intelligence by a few SD.

    If one could independently select chromosomes with the most favorable characteristics (a selection factor of up to 1 in 10^23), then very extreme changes could result from a very low-end technology.

    Comments?
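
    The back-of-the-envelope behind "select the best of 10" can be checked by simulation. Every number here is an assumption, not an established figure: sibling polygenic scores are treated as normal draws around the parental mean with within-family variance half the population variance (ignoring assortative mating), and the predictor is assumed to capture a given fraction of trait variance.

    import numpy as np

    rng = np.random.default_rng(3)
    trials, n_embryos = 100_000, 10
    iq_sd_points = 15

    # Expected maximum of n standard-normal draws (the selection intensity).
    draws = rng.standard_normal((trials, n_embryos))
    selection_intensity = draws.max(axis=1).mean()   # about 1.54 SD for n = 10

    for r2 in (0.05, 0.25, 0.50):                    # assumed variance captured by the predictor
        within_family_sd = np.sqrt(0.5)              # siblings span ~1/sqrt(2) of the population SD
        gain_sd = selection_intensity * np.sqrt(r2) * within_family_sd
        print(f"predictor capturing {r2:.0%} of variance: expected gain of about "
              f"{gain_sd:.2f} SD ({gain_sd * iq_sd_points:.1f} IQ points)")

    Under these assumptions the expected gain only approaches a full standard deviation if the predictor captures most of the trait variance, so the size of the gain hinges on how far predictors of the kind in this paper can be pushed for IQ.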

  39. @Factorize

    Except embryos have totipotent cells, which can become any cell in the body, and that therefore means that you can’t ‘estimate the potential’ of a human embryo.

    https://notpoliticallycorrect.me/2017/09/16/does-human-potential-lie-in-the-embryo/

    https://digest.bps.org.uk/2016/09/12/its-now-possible-in-theory-to-predict-life-success-from-a-genetic-test-at-birth/comment-page-1/#comment-9122

  40. dearieme says:
    @res

    Sorry I can’t help, res; I haven’t got his other books. Maybe it’s time I added them to my Christmas list.
