The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
James Thompson
The 99 Steps of Intelligence Hunters

39 steps

In my last post I said:

I think we can see the direction of travel of the debate, which is that the case for genetics being a part cause of individual differences is gaining ground. It is only doing so because it can increasingly account for some of the variance. A decade ago it was not possible to associate the genetic code with intelligent behaviour. Now studies which link snippets of code to intelligence are being published every few months. The pace of discovery is extraordinary. “Nature” and other science journals report frequently on new genetic correlations with important human behaviours, notably mental ability and mental illness and health generally.

I had no idea that my thesis would receive instant support the following day in a paper which begins with a stirring paragraph, worth quoting in full:

Since its discovery in 1904, hundreds of studies have replicated the finding that around 40% of the variance in people’s test scores on a diverse battery of cognitive tests can be accounted for by a single general factor. General cognitive function is peerless among human psychological traits in terms of its empirical support and importance for life outcomes. Individual differences in general cognitive function are stable across most of the life course. Twin studies find that general cognitive function has a heritability of more than 50% from adolescence through adulthood to older age. SNP-based estimates of heritability for general cognitive function are about 20-30%. To date, little of this substantial heritability has been explained; only a few relevant genetic loci have been discovered (Table 1 and Fig.1). Like other highly polygenic traits, a limitation on uncovering relevant genetic loci is sample size; to date, there have been fewer than 100,000 individuals in studies of general cognitive function.

Ninety-nine independent genetic loci influencing general cognitive function include genes associated with brain health and structure (N = 280,360)
Davies et al. (2017)

doi: https://doi.org/10.1101/176511

http://www.biorxiv.org/content/early/2017/08/17/176511.full.pdf+html

First of all, it is good to see papers based on f*!^#ff sample sizes. One of the authors, who shall remain nameless so long as he pays me my usual fee, relishes sample sizes large enough to tell doubters that they should go off and multiply. Not only that, but once again a large group of researchers drawn from the Western research world have got together in order to assemble the afore-mentioned large samples. At a quick glance, there are about 200 authors, so each gene takes 2 researchers to find. I have it on good authority that the first and last-named authors spent 3 years of their lives on this project. The authors say:

General cognitive function is a prominent human trait associated with many important life outcomes, including longevity. The substantial heritability of general cognitive function is known to be polygenic, but it has had little explication in terms of the contributing genetic variants. Here, we combined cognitive and genetic data from the CHARGE and COGENT consortia, and UK Biobank (total N=280,360). We found 9,714 genome-wide significant SNPs (P < 5 × 10^-8) in 99 independent loci. Most showed clear evidence of functional importance. Among many novel genes associated with general cognitive function were SGCZ, ATXN1, MAPT, AUTS2, and P2RY6. Within the novel genetic loci were variants associated with neurodegenerative disorders, neurodevelopmental disorders, physical and psychiatric illnesses, brain structure, and BMI. Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets. Genetic association results predicted up to 4% of general cognitive function variance in independent samples. There was significant genetic overlap between general cognitive function and information processing speed, as well as many health variables including longevity.

Interestingly, the figure of 9,714 SNPs is entirely in line with the calculation Steve Hsu made that genetics researchers needed ~10k causal variants and a sample size of ~1 million to “solve” intelligence. Here we have the requisite causal variations, and have gone a third of the way on sample size, resulting in a magnificent step towards the required goal. Some of the hits overlap with previously identified sections of code, others are novel. Novel genetic correlations were identified between general cognitive function and ADHD rg= -0.36, bipolar disorder rg= -0.09, major depression rg= -0.30, and longevity rg= 0.15. This is now part of a general pattern: the genetic code for ability is associated with psychiatric state, probably because vulnerability to those disorders is associated with lower ability. Remember that when one talks of “associations” with genetics, this will involve genes which are positive for ability, and genes which are negative.

The team also looked at reaction times, another one of my bombshell measures of mental ability (true zero, ratio scale) which is both phenotypically and genetically correlated with general cognitive function, and accounts for some of its association with health.

There were 330,069 individuals in the UK Biobank sample with both reaction time and genetic data. GWA results for reaction time uncovered 2,022 significant SNPs in 42 independent genomic regions; 122 of these SNPs overlapped with general cognitive function, with 76 having a consistent direction of effect. These genomic loci showed clear evidence of functionality. Using gene-based GWA, 191 genes attained statistical significance, 28 of which overlapped with general cognitive function. [] There was a genetic correlation of 0.227 between reaction time and general cognitive function.

People with higher general cognitive function are broadly healthier; here, we find overlap between genetic loci for general cognitive function and a number of physical health traits. These shared genetic associations may reflect a causal path from cognitive function to disease, cognitive consequences of disease, or pleiotropy. For psychiatric illness, conditions like schizophrenia (and, to a lesser extent, bipolar disorder) are characterised by cognitive impairments, and thus reverse causality (i.e. from cognitive function to disease) is less likely.

As per usual, the papers from this team follow the “two for the price of one” principle, in that they contain the results from the sample of discovery, which they immediately test on other samples. This shows that on 3 test samples they were able to account for 2.37% of the variance in ELSA, 3.96% in Generation Scotland and 4.00% in Understanding Society. “Why so little?” you may ask. Well, “Why so much”, I would reply. These are association studies, the first step in looking for causal links. Yes, causal. Associations are being found by an a-theoretical search process, tantamount to trying to break an enemy code without having any detailed knowledge as to how the enemy forces operate. Testing actual causality may need to involve Petri dishes and selective deletions of bits of code using CRISPR. That is what James Lee surmises might be the next step, but association techniques are developing rapidly, and may yet have some way to run.
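To put those percentages on a more familiar scale, variance explained can be converted into a correlation by taking the square root. Here is a minimal sketch in Python using the three cohort figures quoted above. The function name is my own invention, and the conversion assumes a simple linear predictor; it is not taken from the paper.

```python
import math

# Variance in general cognitive function explained in each test cohort,
# as quoted in the post (percent).
variance_explained_pct = {
    "ELSA": 2.37,
    "Generation Scotland": 3.96,
    "Understanding Society": 4.00,
}

def correlation_from_r2(r2):
    """Correlation implied by a proportion of variance explained (linear model)."""
    return math.sqrt(r2)

for cohort, pct in variance_explained_pct.items():
    r = correlation_from_r2(pct / 100)
    print(f"{cohort}: variance explained = {pct:.2f}% -> r = {r:.3f}")
```

On this reading, a few percent of variance corresponds to a correlation between predictor and measured ability in the region of 0.15 to 0.2.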

The authors studied the proportion of variance explained by all common SNPs in four of the largest individual samples, using univariate GCTA-GREML analyses: English Longitudinal Study of Ageing (ELSA: h2 = 0.12, SE = 0.06), Understanding Society (h2 = 0.17, SE = 0.04), UK Biobank Assessment Centre (h2 = 0.25, SE = 0.006), and Generation Scotland (h2 = 0.20, SE = 0.0519) (Table 2). Genetic correlations for general cognitive function amongst these cohorts, estimated using bivariate GCTA-GREML, ranged from rg = 0.88 to 1.0.

There was a genetic correlation (rg) of 0.227 (P = 4.33 × 10^-27) between reaction time and general cognitive function.

How to summarize this paper? Well, first note that as the sample sizes increase, the number of reliable genes and SNPs detected increases enormously. Size matters.

GWA summary 2017

Second, note that the association studies are done in a number of different ways, each with their own characteristics, but all contributing to a general picture. It is clear that the genetic signals relate to brain processes, and the identification of those associations is becoming far more specific. We are much closer to detecting actual causal links.

GWA tissue type 2017

Third, note that we can study correlations in two domains: the correlation between the genetic code and human behaviour (in this case intelligence) and the correlations between different parts of the genetic code. The latter gives rise to the concept of genetic correlation: a way of identifying shared genetic pathways and putative causes.

Fourth, note that the technique used to bring the different intelligence tests onto a common scale is principal components analysis, which is much closer to a simple mathematical process than is factor analysis. The simplest technique is the best for this large-scale comparative study.

The authors conclude:

General cognitive function has prominence and pervasiveness in the human life course, and it is important to understand the environmental and genetic origins of its variation in the population. The unveiling here of many new genetic loci, genes, and genetic pathways that contribute to its heritability—which it shares with many health outcomes, longevity, brain structure, and processing speed—provides a foundation for exploring the mechanisms that bring about and sustain cognitive efficiency through life.

This paper is a major achievement, a magnificent step forwards in cracking the intelligence problem. What happens next? More papers are in the offing, and then we may see the beginnings of an experimental phase, looking at neurone and dendrite development in vitro.

 
• Category: Science • Tags: Genomics, I.Q. genomics, IQ 
  1. res says:

    Wow. If this paper took three years any idea what kind of sample sizes are in the pipeline now?

    Any thoughts on the pituitary being the only non-brain tissue type to exceed their threshold in that final figure? The pituitary is even more notable in the coarser grained b panel of the paper figure.

  2. m___ says:

    Comprehension of genetic code, editing genetic code, synthesizing genetic code … and machines, based on the von Neumann computer model combined will lead to artificial intelligence.

    For now the race of what and who, when on both sides, the writing of algorithms that “think” on one side and the genetic comprehension on the other is the most interesting race of our times. The breakthroughs, according to my miserable insight as of yet: understanding genetic code will be the major pathway. There is something wrong in brute forcing as is done by machines, eating energy at rat pace, as to “tweak” instinct for pathways of minimal resistance.

    The comment section is eerily quiet, feels naked. Even the political – social consequence of a combined approach of both: still further elongating quantity from quality should have all eyeballs on the above.

  3. DFH says:

    How are they going to keep denying Race/IQ once the actual genes responsible are found to have different racial distributions?

  4. Nickname says:
    @DFH
    How are they going to keep denying Race/IQ once the actual genes responsible are found to have different racial distributions?

    Well, some genes don’t work cross-racially. Skin colour for Caucasoids and Mongoloids is race-specific and so is Red Hair on Caucasoids (it only lightens skin in Mongoloids, without the hair effect).
    They could pull this, and it would make sense.
    But then, they would have to admit that Race is a biological reality, a fact, and that it matters and makes all the difference. They would also have to admit then that Caucasoids, or Whites in this matter, have special Intelligence genes that other races supposedly wouldn’t have.
    It’s a no-win situation for “them”.

  5. utu says:

    4% explained by 200 authors. 0.02% per author. Perhaps we need more authors.

  6. “provides a foundation for exploring the mechanisms that bring about and sustain cognitive efficiency through life.”

    Exploring of course can be great fun, for the curious, people like me who wonder why we have such large brains.
    Maybe an explanation is:
    • William H. Calvin, ‘De opkomst van het intellect, Een reis naar de ijstijd’, Amsterdam 1994 (‘The Ascent of Mind: Ice Climates and the Evolution of Intelligence’, 1990)
    If this exploration will have any impact on present political problems, I wonder.
    In a documentary on differences in IQ between groups a black lady accepted that blacks have lower IQ’s, but she wanted blacks to be financially compensated for this handicap.
    Her handicap maybe explains the demand for compensation.

  7. Medvedev says:
    @DFH
    How are they going to keep denying Race/IQ once the actual genes responsible are found to have different racial distributions?

    genes responsible are found to have different racial distributions?

    Crime statistics show different rates among races. It doesn’t stop leftists from labeling you a racist for merely citing crime statistics. Because, you know, “there are no races only human race”. Yet, leftists would use the same crime statistics to bash Whiteys for supposed racism that causes such disparities. Go figure %)

  8. utu says:

    principal components analysis, which is much closer to a simple mathematical process than is factor analysis. The simplest technique is the best for this large-scale comparative study

    It’s not about simplicity. It is about clarity and sound mathematical definition. In PCA the solution is unique. The components (eigenvectors) are orthogonal. The eigenvectors maximize the Rayleigh quotient which means that the loadings and accounted variance are the greatest. So any other vector explains less variance than the eigenvector associated with the largest eigenvalue. The same applies to the second eigenvector in the n-1 subspace and the third vector in the n-2 subspace and so on. (see Rayleigh quotient in wiki)

    There is no mathematical clarity in factor analysis. If chosen so, as is common in higher order analysis, factors are not orthogonal. There is no uniqueness. Different criteria produce different results. One researcher using one set of criteria and one mathematical recipe will obtain a different result from another researcher using a different recipe. Only researchers indoctrinated in the same methodology (coming from the same PhD breeding stable) obtain congruent results. The ambiguity in factor analysis offers great flexibility for tweaking to obtain desired results and plenty of room for obfuscation to hide the tweaking. The etiology of the problems of factor analysis is in its genes. Its inventor and its early practitioners in their part time roles as mathematicians preferred creativity over mathematical rigor. It’s a good example of the mind over matter phenomenon.
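    The Rayleigh quotient property claimed for PCA can be checked numerically. The following is only a rough sketch in plain Python: the 3x3 correlation matrix is hypothetical (three made-up, positively correlated tests), and power iteration stands in for a proper eigen-solver. It compares the variance captured along the first principal component with the variance captured along 1000 random directions.

```python
import random

def mat_vec(m, v):
    """Multiply square matrix m by vector v."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rayleigh_quotient(m, v):
    """v^T M v / v^T v: the variance captured along direction v."""
    return dot(v, mat_vec(m, v)) / dot(v, v)

def first_component(m, iterations=500):
    """Power iteration: converges to the eigenvector with the largest eigenvalue."""
    v = [1.0] * len(m)
    for _ in range(iterations):
        w = mat_vec(m, v)
        norm = dot(w, w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical correlation matrix for three positively correlated tests.
corr = [
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
]

pc1 = first_component(corr)
top = rayleigh_quotient(corr, pc1)

rng = random.Random(0)  # fixed seed so the comparison is repeatable
best_random = max(
    rayleigh_quotient(corr, [rng.uniform(-1, 1) for _ in range(3)])
    for _ in range(1000)
)

print("variance along first component:", round(top, 4))
print("best of 1000 random directions:", round(best_random, 4))
print("share of total variance:", round(top / len(corr), 4))
```

    No random direction should beat the first component, which is the sense in which the eigenvector with the largest eigenvalue accounts for the greatest variance.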

  9. Since geneticists (unlike the mainstream narrative) all acknowledge race categories as genetically important, actual work in this research area almost always restricts the sample to one race. Usually they restrict the data to the European-race subsample since that has the largest subsample. This is empirically easy to do: despite the “race is a social construct” nonsense in the mainstream narrative, it is possible to identify race with near perfect accuracy from an individual’s genome. This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence.

    • Replies: @Double Juice JJ
    No, it's called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles. So the standard procedure is to test the genotype-phenotype relationship in one population and then try to replicate the findings in another population to see if gene expression is the same. They're miles away from it now.
  10. Joe Hide says:

    I enjoyed this article James Thompson. You had bits of humor that kept me alert during the statistical/ experimental math part! Actually, since I didn’t understand most of that, You helped me & similar readers by partitioning off the conclusions, which were comprehensible to my limited mathematical/ experimental background, quite well. That said, I’m glad You had the math part because it lent credibility. Keep them articles coming!

  11. nickels says:
    @m___
    Comprehension of genetic code, editing genetic code, synthesizing genetic code ... and machines, based on the von Neumann computer model combined will lead to artificial intelligence.

    For now the race of what and who, when on both sides, the writing of algorithms that "think" on one side and the genetic comprehension on the other is the most interesting race of our times. The breakthroughs, according to my miserable insight as of yet: understanding genetic code will be the major pathway. There is something wrong in brute forcing as is done by machines, eating energy at rat pace, as to "tweak" instinct for pathways of minimal resistance.

    The comment section is eerily quiet, feels naked. Even the political - social consequence of a combined approach of both: still further elongating quantity from quality should have all eyeballs on the above.

    It is unlikely that, beyond transcription of proteins and maybe a few regulatory functions, we will, anytime soon, understand how DNA works, or how it creates the phenomenon of intelligence and thought.

    For one thing, epigenetic factors also play a significant role in development and are just starting to be studied. The structure of the mother’s egg is chock-full of information that is transmitted throughout the creation of the organism.

    Also, it still leaves the problem of understanding how the mechanism created from the genes and cells (brain) works, which is 100% in the dark at the moment. Neuroscience is a joke, basically a mockery based on watching parts of the brain ‘light up’ and trying to make up a bunch of nonsense conclusions.

    Just as genetics opened a new pathway of study, now epigenetics is doing the same, and, most likely, beyond that we will find that quantum biology adds yet another level of complexity. The rabbit hole is unlikely to ever end.

    The plain fact is that consciousness cannot be a function of matter, as we have seen from the many people who have temporarily died. There are many cases of people witnessing things they could not have possibly seen from their bodies. These reports indicate a soul, a consciousness beyond matter.

    This spells a great doom for any notion of AI. A simple test to see the completely bleak state of AI is to chat for a moment with an online bot. They are simply dumber than toasters.

    That spark of will, of motive, of understanding is not from matter.

  12. @utu
    4% explained by 200 authors. 0.02% per author. Perhaps we need more authors.

    From Hsu’s post on this:

    Note 4% of total variance = 1/25 and sqrt(1/25) = 1/5, so a predictor built from these variants would correlate ~0.2 with actual cognitive ability. There is still much more variance to be discovered with larger samples, of course.

  13. hyperbola says:

    Looks like another failure to understand “big data”. From the article:

    “””Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets….”””

    At this level of multi-gene involvement, the number of combinations of the genetic variants of the 536 genes potentially present in different individuals becomes so large that even N = 280,360 is NOT a large sample size. Suppose that there are only 2 mutants per gene. Then the number of possible individual variants of this collection of genes is 2 raised to the power 536!

    In fact, a number as big as 536 already says that this line of “research” is pretty much a waste of time (and money). How many of these genetic variants are individually both necessary and sufficient for “general cognitive function”?
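    For a sense of scale, the arithmetic can be sketched in a few lines of Python. The two-variants-per-gene figure is this comment's own simplification, and the additive model at the end is an assumption brought in purely for contrast, not something the comment or the paper establishes.

```python
n_genes = 536          # genes reported in the gene-based analysis
sample_size = 280_360  # total N in the paper

# Number of distinct combinations if each gene had just two variants.
combinations = 2 ** n_genes
print(f"2^{n_genes} has {len(str(combinations))} decimal digits")

# Under a purely additive model (an assumption), one effect is estimated
# per gene rather than one per combination of genes.
additive_parameters = n_genes
print(f"participants per additive parameter: {sample_size // additive_parameters}")
```

    Whether the sample counts as large therefore depends on whether one needs to characterise every combination or only each gene's average effect.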

    By the way, note that this paper has NOT yet been reviewed. Even in these times of heavily corrupted “science”,

    Drug Companies & Doctors: A Story of Corruption

    http://www.nybooks.com/articles/2009/01/15/drug-companies-doctorsa-story-of-corruption/

    one hopes that competent referees might be found.

  14. res says:
    @hyperbola
    Looks like another failure to understand "big data". From the article:

    """Gene-based analyses found 536 genes significantly associated with general cognitive function; many were highly expressed in the brain, and associated with neurogenesis and dendrite gene sets...."""

    At this level of multi-gene involvement, the number of combinations of the genetic variants of the 536 genes potentially present in different individuals becomes so large that even N = 280,360 is NOT a large sample size. Suppose that there are only 2 mutants per gene. Then the number of possible individual variants of this collection of genes is 2 raised to the power 536!

    In fact, a number as big as 536 already says that this line of "research" is pretty much a waste of time (and money). How many of these genetic variants are individually both necessary and sufficient for "general cognitive function"?


    By the way, note that this paper has NOT yet been reviewed. Even in these times of heavily corrupted "science",

    Drug Companies & Doctors: A Story of Corruption
    http://www.nybooks.com/articles/2009/01/15/drug-companies-doctorsa-story-of-corruption/

    one hopes that competent referees might be found.

    Because the economic incentives applicable to drug papers also apply to papers about the genetics of IQ. Thanks for the laugh. We need better trolls.

    If you actually care about understanding reality rather than just spewing FUD you might also want to look at the studies exploring the contributions of additive genetic effects relative to interactions, etc.

    • Replies: @hyperbola
    If you were versed in the enormous literature from GWAS studies related to medicine, you wouldn't make such statements. Look up the verified examples of necessary and sufficient genes related to medical disease. Then look up what percentage of the "disease load" of human beings these necessary and sufficient genes account for (ca. 1%). This study shows pretty clearly that "general cognitive function" (a rather nebulous construct that is MUCH less well defined than a medical disease) has very few necessary and sufficient genes. Combinations of LARGE numbers of genetic variants in the context of the enormous number of potential gene combinations (e.g. 2 to the power 536) contribute to whatever "general cognitive function" is for any given individual person. In fact, that is probably what one should expect for almost any complex trait in an evolvable, stable complex system.
  15. in the end we will find that man is simply a machine. indeed, a complicated one, but a machine nonetheless. which means that a highly functioning artificial intelligence is a certainty.
    the future for humanity is borgs. we will augment our brainpower and reduce our susceptibility to disease. we will be evolved.
    and a long time from now, we will transplant a mind into a machine, thus achieving immortality.

  16. One Tribe says:

    I have noted the rise in ‘race’/genetic ‘media’ coverage, as a general trend in this year (some of us call) 2017.

    It has approximately aligned with the overt and ridiculous take over of the mainstream media, and looks very much like an intermediate stage social media as a tool of hybrid warfare (as per NATO Strategic Communications Centre of Excellence; see http://www.stratcomcoe.org/download/file/fid/5314 ) by proponents of racial differentiation/superiority.

    Recently, for the first time, I read the term “Racial Denialism”, maybe it was here, at UNZ.
    What a curious term?!

    These ‘arguments’ seem to implicitly suggest generational persistence.
    While I don’t see a prima facie problem with associating genetic configurations to behavioural/intelligence phenotypical traits, the question of generational persistence is greatly troubling.

    Maybe I am wrong, but as far as I know, Y-chromosome and mitochondrial DNA are the only traceable persistent genetic configurations.
    With little or no evidence of co-persistence in these other genetic configurations, the entire argument for racial differentiation is invalidated.

    In the end, what the rise in the discussions of race/intelligence in the ‘media’ shows, is that people in power want to socialize the concept, massage it along, and eventually infect people with the false belief of racial group differentiation, most likely to justify their own existing grip on…
    entitlement.

    DON’T BUY IT!

    It is a propaganda/psyops campaign under way.

  17. Logan says:
    @nickels
    It is unlikely that, beyond transcription of proteins and maybe a few regulatory functions, we will, anytime soon, understand how DNA works, or how it creates the phenomenon of intelligence and thought.

    For one thing, epigenetic factors also play a significant role in development and are just starting to be studied. The structure of the mother's egg is chock-full of information that is transmitted throughout the creation of the organism.

    Also, it still leaves the problem of understanding how the mechanism created from the genes and cells (brain) works, which is 100% in the dark at the moment. Neuroscience is a joke, basically a mockery based on watching parts of the brain 'light up' and trying to make up a bunch of nonsense conclusions.

    Just as genetics opened a new pathway of study, now epigenetics is doing the same, and, most likely, beyond that we will find that quantum biology adds yet another level of complexity. The rabbit hole is unlikely to ever end.

    The plain fact is that consciousness cannot be a function of matter, as we have seen from the many people who have temporarily died. There are many cases of people witnessing things they could not have possibly seen from their bodies. These reports indicate a soul, a consciousness beyond matter.

    This spells a great doom for any notion of AI. A simple test to see the completely bleak state of AI is to chat for a moment with an online bot. They are simply dumber than toasters.

    That spark of will, of motive, of understanding is not from matter.

    I’ve heard it explained that the genes are the design of an automobile, and the womb is the factory in which it is assembled.

    I think we’re all clear that you can’t get a great car without a great design. But we also all understand that quality control in the factory is essential if you want that design to reach its potential.

    • Replies: @nickels
  18. utu says:
    @Heracleitus
    From Hsu's post on this:

    Note 4% of total variance = 1/25 and sqrt(1/25) = 1/5, so a predictor built from these variants would correlate ~0.2 with actual cognitive ability. There is still much more variance to be discovered with larger samples, of course.

    predictor built from these variants would correlate ~0.2 with actual cognitive ability

    Correct. But what would it mean in practice? The standard deviation of the prediction error is 15*sqrt(1-0.2^2)=14.6969, where SD=15 is the standard deviation of the IQ distribution. That is not much better than the constant predictor Predicted_IQ=100, which gives a prediction-error standard deviation of 15, also with zero bias.

    Using 9,714 SNPs will reduce your error from SD=15, which you get when you are oblivious to genetics, to SD=14.7. Not very impressive.
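    The arithmetic in this comment can be checked with a short sketch. It assumes the textbook relationship for a linear predictor, residual SD = SD * sqrt(1 - r^2), with r the correlation between predictor and trait; the function name `residual_sd` is made up for illustration.

```python
import math

def residual_sd(trait_sd: float, r: float) -> float:
    """SD of the prediction error for a linear predictor correlating r with the trait."""
    return trait_sd * math.sqrt(1.0 - r ** 2)

variance_explained = 0.04            # 4% of total variance, as quoted from Hsu's post
r = math.sqrt(variance_explained)    # correlation of predictor with the trait
error_with_snps = residual_sd(15.0, r)
error_without = residual_sd(15.0, 0.0)  # constant predictor: no information used

print(f"r = {r:.2f}")
print(f"error SD with predictor:    {error_with_snps:.4f}")
print(f"error SD without predictor: {error_without:.4f}")
```

    Under that assumption, explaining 4% of the variance shrinks the error SD by only about 2%, which is the point being made above.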

  19. nickels says:
    @Logan
    I've heard it explained that the genes are the design of an automobile, and the womb is the factory in which it is assembled.

    I think we're all clear that you can't get a great car without a great design. But we also all understand that quality control in the factory is essential if you want that design to reach its potential.

    I don’t know enough about epigenetics to know how far the factory analogy goes.
    I do worry about people who are starting to use epigenetics to argue bizarre social justice constructs.

    From my preliminary reading there seems to be discrete information coded into the cell beyond DNA that directs certain functions of cell structure and organism growth.

    So does the environment play a role, or is it just more deterministic info like DNA? Not sure.

    • Replies: @Logan
    I don't know either, and I don't think anyone does.

    But I think it's reasonably clear that DNA doesn't decide everything. It probably limits the potential, but does not enforce it.

    IOW, my DNA probably decides that my maximum attainable height will be 6' and my maximum attainable IQ will be 120. But whether I achieve those maxima depends on a host of environmental factors, pre and post natal, we don't understand well at all.
  20. Everybody should watch this video – thanks to the person on 4chan for linking it

  21. hyperbola says:
    @res
    Because the economic incentives applicable to drug papers also apply to papers about the genetics of IQ. Thanks for the laugh. We need better trolls.

    If you actually care about understanding reality rather than just spewing FUD you might also want to look at the studies exploring the contributions of additive genetic effects relative to interactions, etc.

    If you were versed in the enormous literature from GWAS studies related to medicine, you wouldn’t make such statements. Look up the verified examples of necessary and sufficient genes related to medical disease. Then look up what percentage of the “disease load” of human beings these necessary and sufficient genes account for (ca. 1%). This study shows pretty clearly that “general cognitive function” (a rather nebulous construct that is MUCH less well defined than a medical disease) has very few necessary and sufficient genes. Combinations of LARGE numbers of genetic variants in the context of the enormous number of potential gene combinations (e.g. 2 to the power 536) contribute to whatever “general cognitive function” is for any given individual person. In fact, that is probably what one should expect for almost any complex trait in an evolvable, stable complex system.

    • Replies: @res
  22. res says:
    @hyperbola
    If you were versed in the enormous literature from GWAS studies related to medicine, you wouldn't make such statements. Look up the verified examples of necessary and sufficient genes related to medical disease. Then look up what percentage of the "disease load" of human beings these necessary and sufficient genes account for (ca. 1%). This study shows pretty clearly that "general cognitive function" (a rather nebulous construct that is MUCH less well defined than a medical disease) has very few necessary and sufficient genes. Combinations of LARGE numbers of genetic variants in the context of the enormous number of potential gene combinations (e.g. 2 to the power 536) contribute to whatever "general cognitive function" is for any given individual person. In fact, that is probably what one should expect for almost any complex trait in an evolvable, stable complex system.

    One of my personal favorites is people who blather on about things like “enormous literature” and “Science!” but can’t be bothered to actually cite a single study. Thanks for confirming my first impression. The capitalizing of words like “MUCH” and “LARGE” was a bonus. As was the invocation of 2^536 again showing you did not understand my additive genetics comment. Looking forward to continuing this conversation in 10 years as data continues to roll in.

    P.S. When you chose your username were you looking for “hyperbole” instead?

    • Replies: @Double Juice JJ, @hyperbola
  23. Double Juice JJ says:
    @res
    One of my personal favorites is people who blather on about things like "enormous literature" and "Science!" but can't be bothered to actually cite a single study. Thanks for confirming my first impression. The capitalizing of words like "MUCH" and "LARGE" was a bonus. As was the invocation of 2^536 again showing you did not understand my additive genetics comment. Looking forward to continuing this conversation in 10 years as data continues to roll in.

    P.S. When you chose your username were you looking for "hyperbole" instead?

    You must be seriously emotionally invested in this area of research to believe in the sample size excuse. Hyperbola is right, finding so little with such huge samples (in a preprint study) means such research likely is a waste of time and money. Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences.

    Looking forward to continuing this conversation in 10 years, when your new excuse is “we need a 10 billion sample”.

    • Replies: @utu
    , @res
    "sample size excuse"--Interesting. Perhaps you can quote the particular statement of mine that you object to?

    Did you happen to look at Table 1 included by Dr. Thompson above? Notice any trend with the number of SNPs discovered with increasing sample size? There is a plot showing the trend on page 33 of the paper.

    It will be interesting to see if Steve Hsu is right about a sample of 1M possibly being enough.

    Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences.
     
    That is not the case for height: Common SNPs explain a large proportion of heritability for human height
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3232052

    Single nucleotide polymorphisms (SNPs) discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method by simulations based upon the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium (LD) between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency (MAF) than the SNPs explored to date.

    Genome-wide association studies in human populations have discovered hundreds of SNPs significantly associated with complex traits [1,2], yet for any one trait they typically account for only a small fraction of the genetic variation. Where is the missing heritability, the so called dark matter of the genome [3,4]? Suggested explanations include the existence of gene-by-gene or gene-by-environment interactions [5], the common disease-rare variant hypothesis [6] and the possibility that inherited epigenetic factors cause resemblance between relatives [7,8]. However, the variance explained by the validated SNPs is usually much less than the narrow-sense heritability, the proportion of phenotypic variance due to additive genetic variance. Non-additive genetic effects do not contribute to the narrow-sense heritability, so explanations based on non-additive effects are not relevant to the problem of missing heritability (Supplementary Note). There are two explanations for the failure of validated SNP associations to explain the estimated heritability: either the causal variants each explain such a small amount of variation that their effects fail to reach stringent significance thresholds and/or the causal variants are not in complete linkage disequilibrium (LD) with the SNPs that have been genotyped. Lack of complete LD might, for instance, occur if causal variants have lower minor allele frequency (MAF) than genotyped SNPs. Here we test these two hypotheses and estimate the contribution of each to the heritability of height in humans as a model complex trait.

     

    This is a good introduction: Heritability and additive genetic variance
    https://sciencehouse.wordpress.com/2013/10/04/heritability-and-additive-genetic-variance/
  24. hyperbola says:
    @res
    One of my personal favorites is people who blather on about things like "enormous literature" and "Science!" but can't be bothered to actually cite a single study. Thanks for confirming my first impression. The capitalizing of words like "MUCH" and "LARGE" was a bonus. As was the invocation of 2^536 again showing you did not understand my additive genetics comment. Looking forward to continuing this conversation in 10 years as data continues to roll in.

    P.S. When you chose your username were you looking for "hyperbole" instead?

    The statements I made are well known in the area of medical GWAS studies. I do not have any idea at what level of sophistication you might read relevant literature. I should not have to do literature research for you. Nor do I have any idea what library resources are available to you. Here are some places where you could start to educate yourself if needed.

    Scholz SW, Mhyre T, Ressom H, Shah S, Federoff HJ. 2012. Genomics and bioinformatics of Parkinson’s disease. Cold Spring Harb Perspect Med 2:a009449.

    Lill CM. 2016. Genetics of Parkinson’s disease. Mol Cell Probes 30(6):386-396. doi: 10.1016/j.mcp.2016.11.001. Review.

    Ramanan VK, Saykin AJ. 2013. Pathways to neurodegeneration: mechanistic insights from GWAS in Alzheimer’s disease, Parkinson’s disease, and related disorders. Am J Neurodegener Dis 2(3):145-75.

    Levin SA. 2003. Complex adaptive systems: Exploring the known, the unknown and the unknowable. Bull Am Math Soc 40:3–19.

    Whitacre JM, Bender A. 2010. Networked buffering: A basic mechanism for distributed robustness in complex adaptive systems. Theor Biol Med Model 7:20.

    Whitacre JM. 2012. Biological robustness: Paradigms, mechanisms, systems principles. Front Genet 3:1–15.

    • Replies: @res
  25. @Peter Johnson
    Since geneticists (unlike the mainstream narrative) all acknowledge race categories as genetically important, actual work in this research area almost always restricts the sample to one race. Usually they restrict the data to the European-race subsample since that has the largest subsample. This is empirically easy to do: despite the "race is a social construct" nonsense in the mainstream narrative, it is possible to identify race with near perfect accuracy from an individual's genome. This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence.

    No, it’s called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles. So the standard procedure is to test the genotype-phenotype relationship in one population and then try to replicate the findings in another population to see if gene expression is the same. They’re miles away from it now.

    • Replies: @res
    I think it is most accurate to say both your and Peter Johnson's points regarding sampling are true. I think it is fair to say any one of them (Peter made two points: largest subsample and political toxicity) would justify using the typical samples.

    Also, it's not just about gene expression being the same. Linkage disequilibrium (SNP mapping to cause) and minor allele frequency (influences detectability, especially if one allele is near fixation in one of the populations) matter.
  26. res says:
    @hyperbola
    The statements I made are well known in the area of medical GWAS studies. I do not have any idea at what level of sophistication you might read relevant literature. I should not have to do literature research for you. Nor do I have any idea what library resources are available to you. Here are some places where you could start to educate yourself if needed.


    Scholz SW, Mhyre T, Ressom H, Shah S, Federoff HJ. 2012. Genomics and bioinformatics of Parkinson’s disease. Cold Spring Harb Perspect Med 2:a009449.

    Lill CM. 2016. Genetics of Parkinson's disease. Mol Cell Probes 30(6):386-396. doi: 10.1016/j.mcp.2016.11.001. Review.

    Ramanan VK, Saykin AJ. 2013. Pathways to neurodegeneration: mechanistic insights from GWAS in Alzheimer's disease, Parkinson's disease, and related disorders. Am J Neurodegener Dis 2(3):145-75.

    Levin SA. 2003. Complex adaptive systems: Exploring the known, the unknown and the unknowable. Bull Am Math Soc 40:3–19.

    Whitacre JM, Bender A. 2010. Networked buffering: A basic mechanism for distributed robustness in complex adaptive systems. Theor Biol Med Model 7:20.

    Whitacre JM. 2012. Biological robustness: Paradigms, mechanisms, systems principles. Front Genet 3:1–15.

    When you are making an argument and using the literature as evidence it is your obligation to support your argument with specific citations. And more typically, specific excerpts. Let’s see how one might do that. It is frustrating to have to do your work for you.

    Let’s examine one of the statements from your comment 24:

    Then look up what percentage of the “disease load” of human beings these necessary and sufficient genes account for (ca. 1%).

    Then take a look at the abstract for your second reference (emphasis mine): https://www.ncbi.nlm.nih.gov/pubmed/27818248

    Almost two decades after the identification of SNCA as the first causative gene in Parkinson’s disease (PD) and the subsequent understanding that genetic factors play a substantial role in PD development, our knowledge of the genetic architecture underlying this disease has vastly improved. Approximately 5-10% of patients suffer from a monogenic form of PD where autosomal dominant mutations in SNCA, LRRK2, and VPS35 and autosomal recessive mutations in PINK1, DJ-1, and Parkin cause the disease with high penetrance.

    Sounds more like evidence against your quoted statement than evidence for it.

    I don’t have good library access at the moment and it appears medical research believes in restricted availability so that is a problem as you note. But as we will see my access appears adequate for the task at hand.

    Happily your first reference has full text available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3385936/

    There we find:

    In PD for example only ~60% of heritability is understood, depending on the population studied

    Gosh, only 60%. That paper talks more about future possibilities than GWAS problems AFAICT, but perhaps you can supply a quote in your support from it?

    Your third reference also has full text available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783830/

    The abstract is a bit more supportive of your assertions. Emphasis mine, but note the “most”.

    Although unbiased genome-wide association studies (GWAS) have identified novel associations to neurodegenerative diseases, most of these hits explain only modest fractions of disease heritability. In addition, despite the substantial overlap of clinical and pathologic features among major neurodegenerative diseases, surprisingly few GWAS-implicated variants appear to exhibit cross-disease association. These realities suggest limitations of the focus on individual genetic variants and create challenges for the development of diagnostic and therapeutic strategies

    But looking closer we find:

    For example, although up to 60-80% of AD risk is estimated to derive from genetic factors [14], known genes including the uniquely large effect of APOE (apolipoprotein E) account for just half of this genetic variance

    Gosh, only half of 60-80%. Rather far from 1% it seems.

    Your fourth reference is interesting and freely available so here is a link: http://www.ams.org/journals/bull/2003-40-01/S0273-0979-02-00965-5/
    But it is from 2003 (lacks current genetic knowledge) and seems to focus on the unknowability of complex systems, so I’ll just note there is a difference between the glass being half full (or empty) and 1% full.

    Your fifth reference only appears to address your point tangentially, but feel free to correct me: https://tbiomed.biomedcentral.com/articles/10.1186/1742-4682-7-20

    Your final reference also seems to be tangential: http://journal.frontiersin.org/article/10.3389/fgene.2012.00067/full

    Don’t get me wrong, those last three references are incredibly fascinating from a philosophy of biology point of view. It’s just that I don’t think they work for making your point beyond being able to throw FUD around.

    I will leave it up to you (and other readers) to decide whether I (and you) am (are) capable of correctly reading and interpreting this literature.

    Next time at least try to find references that support rather than refute your position.

    • Replies: @hyperbola
    You present seriously distorted arguments that are clearly intentional misrepresentations. Your arguments are not made credible by such practices. I am not interested much in batting down your attempts at cherry picking of sentences. So I simply note:

    Ref. 2. ONLY 5-10% of Parkinson's disease can be explained by genes that approach the necessary and sufficient criteria despite over a decade of GWAS studies. This means that 90-95% cannot. For Parkinson's, numerous genes have been identified that are neither necessary nor sufficient, i.e. they indicate dependence on large networks of genes.

    References 4-6 provide you with a beginning to thinking about why we should normally expect that complex traits (e.g. "general cognitive function") will be based on large numbers of genes with limited influence of single genes. The exceptions are the "rare disease" cases that constitute about 1% of human disease load (and yes on occasion these may have been classified by medical doctors within a more inclusive category of disease, e.g. the 5% of "Parkinson's").

    Sorry you weren't up to understanding why refs 4-6 were included. "Unknowability in complex systems" should become a major criterion in research funding decisions! Especially for "outcomes" as nebulously defined as "general cognitive function".
  27. utu says:
    @Double Juice JJ
    You must be seriously emotionally invested in this area of research to believe in the sample size excuse. Hyperbola is right, finding so little with such huge samples (in a preprint study) means such research likely is a waste of time and money. Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences.

    Looking forward to continuing this conversation in 10 years, when your new excuse is "we need a 10 billion sample".

    the sample size excuse

    I thought that the sample size issue was about avoiding false positives. If the sample size is N=2, then any gene on which the two individuals differ predicts the differences in phenotype traits between them in a purely mathematical sense. When the trait is not binary, like a disease that you either have or do not have, but continuous, like height or IQ, then obviously you need many genes or SNPs to build a prediction model that matches the granularity (resolution) of the trait. You need to construct a function, like a polygenic score of many SNPs, that correlates with IQ, so the polygenic score must be able to assume as many values as IQ can assume. Now you have many SNPs to choose from, so how do you know that you are not overdoing it? In the sky you can select M stars and come up with an analytical formula by which each American resident’s social security number (N=320 million) is predicted from the coordinates of the stars in one subset out of the 2^M possible subsets. But once you have a formula that works for Americans, and you enlarge the set by appending all Chinese with freshly issued SS numbers (N=320+1500 million), the formula will fail and you will have to make an entirely new formula, presuming that M is large enough. This is, I thought, the main reason they want large samples. From the mathematical point of view the system is strongly underdetermined. If there are circa 10 million SNPs, the number of subsets of, say, 9000 SNPs is staggeringly huge: thousands of orders of magnitude greater than the number of atoms in the universe.
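    The size of that search space can be estimated with a short sketch. It works in log space because the binomial coefficient is far too large for a float. The figure of roughly 10^80 atoms in the observable universe is a commonly quoted rough estimate and is an assumption here, as is the helper name `log10_binomial`.

```python
import math

def log10_binomial(n: int, k: int) -> float:
    """log10 of C(n, k), computed via log-gamma to avoid overflow."""
    ln_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_c / math.log(10)

n_snps = 10_000_000      # "circa 10 million SNPs" from the comment
subset_size = 9_000      # "subsets of, say, 9000 SNPs"
log10_atoms = 80         # assumed rough order of magnitude for atoms in the universe

log10_subsets = log10_binomial(n_snps, subset_size)
print(f"number of subsets  ~ 10^{log10_subsets:.0f}")
print(f"excess over atoms  ~ {log10_subsets - log10_atoms:.0f} orders of magnitude")
```

    Under these assumptions the count comes out above 10^31000, so "thousands of orders of magnitude" is, if anything, an understatement.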

    What puzzles me that they obtain extremely low variance explained fractions. Like in this study it is just 4%. I think this is so they are limiting themselves to linear models only where the effects of SNP’s are additive. Using the polygenic score is a simplest possible linear model. A nonlinear models open a new can of worms. Somewhere in Washington DC there might be a computer that have two lists in the data base: the list of SS numbers and the list of IQ test score results. The two lists establish a de facto a very nonlinear SS—>IQ relationship. You can plot it on the graph. Using thins graphs SSN predicts IQ with 100% accuracy for all the subjects who are included in the graph. You can do the same with SNP’s if you permit nonlinear relationships. The only thing that can keep them in check is the requirement of having two data sets: one on which you develop your model and one on which you test it. But this works if there is no cheating.

    • Replies: @res

    The only thing that can keep them in check is the requirement of having two data sets: one on which you develop your model and one on which you test it. But this works if there is no cheating.
     
    Good point. Have researchers been doing this with their different data sets? You see some of this implicitly happening in Davide Piffer's work looking at different studies, but I haven't seen explicit attempts at using one dataset to validate another dataset's results and enumerating the outcomes. Do such studies exist?

    Worth noting that I would expect to see many "soft failures" (i.e. very small p values that just miss the threshold), especially when using older datasets to look at the newer (larger sample size) SNPs. It's not just the binary dis/confirm result that matters.

    At some point is it appropriate to use a different p value threshold for testing a subset of previously identified SNPs? Say for instance you are looking at 1000 SNPs for validation with another dataset. Isn't it reasonable to use a threshold of 5e-5 (0.05 / 1000) rather than the 5e-8 standard for GWAS?
  28. res says:
    @Double Juice JJ
    “sample size excuse”–Interesting. Perhaps you can quote the particular statement of mine that you object to?

    Did you happen to look at Table 1 included by Dr. Thompson above? Notice any trend with the number of SNPs discovered with increasing sample size? There is a plot showing the trend on page 33 of the paper.

    It will be interesting to see if Steve Hsu is right about a sample of 1M possibly being enough.

    Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences.

    That is not the case for height: Common SNPs explain a large proportion of heritability for human height

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3232052

    Single nucleotide polymorphisms (SNPs) discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method by simulations based upon the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium (LD) between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency (MAF) than the SNPs explored to date.

    Genome-wide association studies in human populations have discovered hundreds of SNPs significantly associated with complex traits1,2, yet for any one trait they typically account for only a small fraction of the genetic variation. Where is the missing heritability, the so called dark matter of the genome3,4? Suggested explanations include the existence of gene-by-gene or gene-by-environment interactions5, the common disease-rare variant hypothesis6 and the possibility that inherited epigenetic factors cause resemblance between relatives7,8. However, the variance explained by the validated SNPs is usually much less than the narrow-sense heritability, the proportion of phenotypic variance due to additive genetic variance. Non-additive genetic effects do not contribute to the narrow-sense heritability, so explanations based on non-additive effects are not relevant to the problem of missing heritability (Supplementary Note). There are two explanations for the failure of validated SNP associations to explain the estimated heritability: either the causal variants each explain such a small amount of variation that their effects fail to reach stringent significance thresholds and/or the causal variants are not in complete linkage disequilibrium (LD) with the SNPs that have been genotyped. Lack of complete LD might, for instance, occur if causal variants have lower minor allele frequency (MAF) than genotyped SNPs. Here we test these two hypotheses and estimate the contribution of each to the heritability of height in humans as a model complex trait.

    This is a good introduction: Heritability and additive genetic variance

    https://sciencehouse.wordpress.com/2013/10/04/heritability-and-additive-genetic-variance/

    • Replies: @utu
    human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis,

    Is this for real? One can assign a random sequence of numbers to 3,925 individuals and find 200,000 SNPs with a polygenic score that predicts this sequence exactly.
    , @Double Juice JJ
    Did you happen to look at Table 1 included by Dr. Thompson above? Notice any trend with the number of SNPs discovered with increasing sample size? There is a plot showing the trend on page 33 of the paper.

    This is still ridiculous. Look, their sample almost equals the population of Iceland or the whole pre-Neolithic human population. But the hits can only explain 4% of the variance. That's pathetic.

    That is not the case for height: Common SNPs explain a large proportion of heritability for human height

    And height doesn't require ridiculously huge samples to find substantial SNP-based heritability. Height also is an actual measure, not some vague concept like "intelligence" that's only estimated by proxy.
  29. res says:
    @Double Juice JJ
    No, it's called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles. So the standard procedure is to test the genotype-phenotype relationship in one population and then try to replicate the findings in another population to see if gene expression is the same. They're miles away from it now.

    I think it is most accurate to say both your and Peter Johnson’s points regarding sampling are true. I think it is fair to say any one of them (Peter made two points: largest subsample and political toxicity) would justify using the typical samples.

    Also, it’s not just about gene expression being the same. Linkage disequilibrium (SNP mapping to cause) and minor allele frequency (influences detectability, especially if one allele is near fixation in one of the populations) matter.

    • Replies: @Double Juice JJ
    No, only my point is true and explains the method that's used by specialists. Peter Johnson is just giving the usual alt-right paranoid whining.

    The loci that underwent natural selection in different populations are well studied and detecting positive selection signals is easy. Of course, no gene of negligible effect is visible to natural selection because of the much stronger effect of random drift. All the enthusiasm of the hereditarian crowd around these GWAS hits is laughable when you know the basics of genetics.
  30. utu says:
    @res
    human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis,

    Is this for real? One can assign a random sequence of numbers to 3,925 individuals and find 200,000 SNPs with a polygenic score that predicts this sequence exactly.
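
The worry raised here can be illustrated with a small simulation. The sketch below is not the method of the height paper; it is an unpenalised least-squares fit on made-up data, with toy sizes (50 people, 500 SNPs) chosen purely for speed. It shows that when there are more SNPs than individuals, a linear score can reproduce a pure-noise phenotype in the fitting sample and still carry no signal in fresh individuals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people, n_snps = 50, 500  # far more predictors than individuals (toy sizes, an assumption)

# Genotypes coded as 0/1/2 allele counts; the phenotype is pure noise.
genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
phenotype = rng.normal(size=n_people)

# Minimum-norm least-squares weights: an unpenalised linear "polygenic score".
weights, *_ = np.linalg.lstsq(genotypes, phenotype, rcond=None)
residual = phenotype - genotypes @ weights
in_sample_r2 = 1 - np.sum(residual ** 2) / np.sum((phenotype - phenotype.mean()) ** 2)

# Fresh individuals with equally random phenotypes: the score should not predict them.
new_genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
new_phenotype = rng.normal(size=n_people)
out_of_sample_r = np.corrcoef(new_genotypes @ weights, new_phenotype)[0, 1]

print(f"in-sample R^2: {in_sample_r2:.6f}")
print(f"out-of-sample correlation: {out_of_sample_r:.3f}")
```

The in-sample fit is essentially perfect because the system has more unknowns than equations, which is why validation on data not used for fitting matters.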

    • Replies: @res

    Is this for real?
     
    It is a real peer reviewed paper published in Nature Genetics: http://www.nature.com/ng/journal/v42/n7/full/ng.608.html

    Determining if their methods avoid the problem you describe is above my pay grade. But their claim to explain only 45% of the 80% additive genetic heritability suggests they are at least partially avoiding it.

    Their Simulation studies section looks like it tried to distinguish causal variants which seems to argue against this just being an example of overfitting.

    Two of the authors published a followup which may be helpful: https://www.ncbi.nlm.nih.gov/pubmed/21142928

    The technique is now known as GCTA. This blog post links a paper by two people at the requisite pay grade examining method validity: https://infoproc.blogspot.com/2014/03/why-does-gcta-work.html

    I see a few papers by Peter Visscher mentioned in Dr. Thompson's blog, but not this one. I do see mentions of GCTA including: http://www.unz.com/jthompson/2014/12/
    , @RaceRealist88
    Less than half of the variance in human height is explained with hundreds of thousands of variants. Will IQ be the same way?

    Though there was a decent study on height heritability:

    "... all independent variants, known and novel together explained 27.4% of heritability. By comparison, the 697 known height SNPs explain 23.3% of height heritability in the same dataset (vs. 4.1% by the new height variants identified in this ExomeChip study)” (pg 7).

    https://serval.unil.ch/resource/serval:BIB_CB04B9543EC2.P001/REF
  31. res says:
    @utu
    The only thing that can keep them in check is the requirement of having two data sets: one on which you develop your model and one on which you test it. But this works if there is no cheating.

    Good point. Have researchers been doing this with their different data sets? You see some of this implicitly happening in Davide Piffer’s work looking at different studies, but I haven’t seen explicit attempts at using one dataset to validate another dataset’s results and enumerating the outcomes. Do such studies exist?

    Worth noting that I would expect to see many “soft failures” (i.e. very small p values that just miss the threshold), especially when using older datasets to look at the newer (larger sample size) SNPs. It’s not just the binary dis/confirm result that matters.

    At some point is it appropriate to use a different p value threshold for testing a subset of previously identified SNPs? Say for instance you are looking at 1000 SNPs for validation with another dataset. Isn’t it reasonable to use a threshold of 5e-5 (0.05 / 1000) rather than the 5e-8 standard for GWAS?
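
The threshold arithmetic in this question can be written out as a short sketch. It assumes a plain Bonferroni correction, and it assumes the conventional 5e-8 genome-wide threshold corresponds to roughly one million independent tests; the function name is made up for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Conventional genome-wide threshold: about one million independent tests (assumption).
genome_wide = bonferroni_threshold(0.05, 1_000_000)

# Validation of 1000 previously identified SNPs in another dataset, as in the comment.
replication = bonferroni_threshold(0.05, 1_000)

print(f"genome-wide: {genome_wide:.0e}, replication of 1000 SNPs: {replication:.0e}")
```

The only thing that changes between the two cases is the number of hypotheses actually being tested.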

    • Replies: @utu
    I think dividing the data set into two subsets is common practice when developing heuristic predictive models that can potentially have too many variables. For example, you can fit an n-degree polynomial to the number of sunspots in the 1800-1900 period and see whether it predicts sunspots in the 1900-2000 period. You can easily find a polynomial of degree n+k > n that fits sunspots over the whole 1800-2000 period, but then your model is most likely "overfitted." This approach keeps the number of variables down in a heuristic model. GWAS is essentially a heuristic model. I think that GWAS studies do follow a similar procedure of having one subset for developing the model and one subset for verifying it. This, however, does not make the model any less heuristic. From the relative sizes of the subsets you can nonetheless make a claim about the model's robustness. But is this procedure alone sufficient to avoid the overfitting problem?
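
The split-sample procedure described above can be sketched as follows. The series is synthetic (a noisy cycle standing in for sunspot counts, not real data), and the polynomial degrees 3 and 15 are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
years = np.arange(1800, 2000)
# Synthetic cyclic series standing in for sunspot counts (NOT real observations).
series = 50 + 40 * np.sin(2 * np.pi * (years - 1800) / 11) + rng.normal(0, 5, years.size)

train = years < 1900   # develop the model on 1800-1899
test = ~train          # verify it on 1900-1999

def mse(model, mask):
    """Mean squared error of a fitted polynomial on the selected years."""
    return float(np.mean((model(years[mask]) - series[mask]) ** 2))

results = {}
for degree in (3, 15):
    # Polynomial.fit rescales the years internally, which keeps the fit well conditioned.
    model = Polynomial.fit(years[train], series[train], degree)
    results[degree] = (mse(model, train), mse(model, test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.1f}, "
          f"test MSE {results[degree][1]:.3g}")
```

Adding terms can only lower the error on the development half, so the held-out half is the only place where the extra flexibility is penalised.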

    The P-values need to be demystified. What do they really prove, and do they sometimes prove the opposite of what was intended? I use Davide Piffer for this purpose since you have brought him up.

    Suppose Davide Piffer found himself a girlfriend who he thinks might be the One. But being scientifically minded, he would like to verify it. He decides to estimate the P-value of his girlfriend being the One. He randomly selects 1000 women and checks whether he likes all of them less than his girlfriend. If so, the P-value is less than 10^-3. But he continues, and through a random search (he would glorify it with the name of Monte Carlo method) among 100,000 women he finds 3 that he likes no less than his girlfriend. He pronounces that the P-value of his girlfriend is 3*10^-5. It is the upper bound on the P-value. The question is whether he is going to keep his girlfriend and claim that the P-value is 3*10^-5, or switch to the one among the 3 whom he liked more than his girlfriend and whose P-value is less than 10^-5. Which one is the One? This is the unintended consequence of estimating P-values via random selection of SNP sequences in GWAS methods. Davide Piffer decided to keep his girlfriend.

    that is, over a total of 819 runs, a correlation coefficient equal to or higher than 0.88 occurred 8 times
     
    In the 819 random runs he performed, he found 8 results that produced a higher correlation with countries' IQs than the set of 9 SNPs that he started with. Why did he not go with the one of the eight that had the maximal correlation? Why did he not dump his girlfriend?

    The other problem with the P-value estimate is as follows. How good is the estimate if you run 1 million simulations? Perhaps you should run 10 million simulations. How do you know how many? Is one teaspoon of ocean water enough to estimate the salinity of the whole ocean?
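
One way to put a number on "how many runs" is the binomial standard error of an empirical P-value. The sketch below uses the figures quoted above (8 of 819 runs) and treats the runs as independent draws, which is an assumption; the function name and the 10% relative-error target are made up for illustration.

```python
import math

def empirical_p(n_as_extreme: int, n_runs: int) -> tuple[float, float]:
    """Empirical p-value and its binomial standard error from random runs."""
    p = n_as_extreme / n_runs
    se = math.sqrt(p * (1 - p) / n_runs)
    return p, se

# Figures from the quoted passage: 8 of 819 random runs matched or beat the observed correlation.
p, se = empirical_p(8, 819)
print(f"p = {p:.4f} +/- {se:.4f}")

# Runs needed for the standard error to shrink to 10% of p (illustrative target).
runs_needed = math.ceil((1 - p) / (p * 0.1 ** 2))
print(f"runs needed for 10% relative error: {runs_needed}")
```

The smaller the P-value being estimated, the more runs are needed to pin it down to the same relative precision.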

    The random search in GWAS can for all practical purposes go on ad infinitum. There are circa 10 million SNPs. Say that 1 million of these have frequencies smaller than 1 within the population of the set. Among them you may look for suspects that may correlate with the trait you are trying to explain. Let's suppose that you want to explain a complex trait like IQ with 10,000 SNPs. How many combinations are there, how many different subsets of SNPs can one test? The number is huuuuge. If the calculator I found is correct, there are 5.8*10^24318 combinations. Clearly it is not doable in the lifetime of the universe. If you decide to try 200,000 SNPs, as they did for the height study, the number of combinations jumps to 10^217319. When you are dealing with so many possibilities, even if you did a billion random simulations you could not have much confidence in the P-value you obtained.
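
The subset counts quoted above can be checked without an online calculator. The sketch below computes the base-10 logarithm of the binomial coefficient with the log-gamma function, so the huge integer itself is never formed; the helper name is made up for illustration.

```python
import math

def log10_binomial(n: int, k: int) -> float:
    """log10 of C(n, k), computed via log-gamma to avoid forming the huge integer."""
    ln_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_c / math.log(10)

# Number of ways to choose 10,000 or 200,000 SNPs out of 1,000,000 candidates.
for k in (10_000, 200_000):
    value = log10_binomial(1_000_000, k)
    exponent = math.floor(value)
    mantissa = 10 ** (value - exponent)
    print(f"C(1,000,000, {k:,}) is about {mantissa:.1f} x 10^{exponent}")
```

Both results land on the orders of magnitude given in the comment.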

    The P-values are not the issue I am concerned with. P-values are just BS to impress the naive and uninitiated. What I am wondering is why they get only 4% with circa 10,000 SNPs. I am sure it is not the sample size. Actually it goes the other way: for a lower sample size you can explain more of the variance. For a sample size of N=2, any gene that is not co-present in both subjects explains the phenotypical difference between them with correlation r=1. What is the actual constraint that holds them back? Why don't they do what they did for height, with the brute-force fit of over 200,000 SNPs? With 200,000 variables (yes, they are just binary variables) you should be able to fit any random sequence of numbers if the sample size is not too large. It is the bigness of the sample size that keeps them from getting the results they want.
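
The N=2 remark can be checked with a small simulation. The sketch below uses made-up data: 1000 random SNPs and a pure-noise trait, with the sample sizes chosen arbitrarily. It reports the largest chance correlation between the trait and any single SNP as the sample grows.

```python
import numpy as np

rng = np.random.default_rng(2)
n_snps = 1000  # number of random candidate SNPs (arbitrary)

def best_chance_correlation(n_people: int) -> float:
    """Largest |r| between a pure-noise trait and any one of n_snps random SNPs."""
    genotypes = rng.integers(0, 3, size=(n_people, n_snps)).astype(float)
    trait = rng.normal(size=n_people)
    g = genotypes - genotypes.mean(axis=0)
    t = trait - trait.mean()
    denom = np.sqrt((g ** 2).sum(axis=0) * (t ** 2).sum())
    varying = denom > 0  # skip SNPs on which nobody in the sample differs
    r = (g[:, varying] * t[:, None]).sum(axis=0) / denom[varying]
    return float(np.abs(r).max())

best = {n: best_chance_correlation(n) for n in (2, 20, 200, 2000)}
for n, r in best.items():
    print(f"N={n:5d}: best chance |r| = {r:.3f}")
```

With two people every varying SNP correlates perfectly with the trait, and the best spurious correlation shrinks as N grows, which is the sense in which a larger sample constrains what can be "explained" by chance.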
  32. res says:
    @utu
    Is this for real?

    It is a real peer reviewed paper published in Nature Genetics: http://www.nature.com/ng/journal/v42/n7/full/ng.608.html

    Determining if their methods avoid the problem you describe is above my pay grade. But their claim to explain only 45% of the 80% additive genetic heritability suggests they are at least partially avoiding it.

    Their Simulation studies section looks like it tried to distinguish causal variants which seems to argue against this just being an example of overfitting.

    Two of the authors published a followup which may be helpful: https://www.ncbi.nlm.nih.gov/pubmed/21142928

    The technique is now known as GCTA. This blog post links a paper by two people at the requisite pay grade examining method validity: https://infoproc.blogspot.com/2014/03/why-does-gcta-work.html

    I see a few papers by Peter Visscher mentioned in Dr. Thompson’s blog, but not this one. I do see mentions of GCTA including: http://www.unz.com/jthompson/2014/12/

    • Replies: @utu
    I looked at Visscher's GCTA and tried to understand it, without much success so far. However, I thought overfitting might be a problem. And then I found this paper:

    Limitations of GCTA as a solution to the missing heritability problem
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4711841/
    Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype.
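
For readers who have not seen a genetic relatedness matrix, the sketch below builds one from simulated genotypes and looks at its eigenvalues. It assumes the commonly used standardised-genotype form of the GRM; the GCTA software may differ in details, and the sample sizes and allele-frequency range are made up for illustration. It does not reproduce the analysis of the quoted paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n_people, n_snps = 200, 2000  # toy sizes chosen for speed (assumption)

# Simulate unrelated individuals: allele frequencies, then 0/1/2 genotype counts.
freqs = rng.uniform(0.1, 0.5, size=n_snps)
genotypes = rng.binomial(2, freqs, size=(n_people, n_snps)).astype(float)

# Standardise each SNP by its sample allele frequency, dropping SNPs with no variation.
p_hat = genotypes.mean(axis=0) / 2
keep = (p_hat > 0) & (p_hat < 1)
z = (genotypes[:, keep] - 2 * p_hat[keep]) / np.sqrt(2 * p_hat[keep] * (1 - p_hat[keep]))

grm = z @ z.T / keep.sum()             # n_people x n_people relatedness matrix
eigenvalues = np.linalg.eigvalsh(grm)  # returned in ascending order

print("mean diagonal:", grm.diagonal().mean())
print("smallest / largest eigenvalue:", eigenvalues[0], eigenvalues[-1])
```

The spread between the smallest and largest eigenvalues is the kind of quantity the quoted abstract refers to when it discusses skewed singular values.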
  33. @res
    Did you happen to look at Table 1 included by Dr. Thompson above? Notice any trend with the number of SNPs discovered with increasing sample size? There is a plot showing the trend on page 33 of the paper.

    This is still ridiculous. Look, their sample almost equals the population of Iceland or the whole pre-Neolithic human population. But the hits can only explain 4% of the variance. That’s pathetic.

    That is not the case for height: Common SNPs explain a large proportion of heritability for human height

    And height doesn’t require ridiculously huge samples to find substantial SNP-based heritability. Height also is an actual measure, not some vague concept like “intelligence” that’s only estimated by proxy.

    • Replies: @res

    And height doesn’t require ridiculously huge samples to find substantial SNP-based heritability.
     
    So you at least admit that. Good. At least you aren't a complete genetic denialist.

    Any thoughts on the estimates of sample sizes required given by this paper?
    http://www.biorxiv.org/content/early/2017/08/11/175406
    Free full text is available. See the bottom half of Figure 3 for their estimates of % genetic variance explained vs. sample size for both continuous traits (e.g. height, IQ) and disease traits.
    Their estimate for height was that a sample size of 200k would explain about 40% of variance. For IQ, a sample size of 200k would explain about 5% of variance (pretty close to the 4% we see from 280k).

    And here is some empirical data on the number of GWAS hits by sample size (2012): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/bin/gr2.jpg

    And a 2017 update: http://www.cell.com/ajhg/fulltext/S0002-9297(17)30240-9

    But the hits can only explain 4% of the variance. That’s pathetic.

     

    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?
  34. @res
    No, only my point is true and explains the method that’s used by specialists. Peter Johnson is just giving the usual alt-right paranoid whining.

    The loci that underwent natural selection in different populations are well studied and detecting positive selection signals is easy. Of course, no gene of negligible effect is visible to natural selection because of the much stronger effect of random drift. All the enthusiasm of the hereditarian crowd around these GWAS hits is laughable when you know the basics of genetics.

  35. @Double Juice JJ
    Disagree: res

    Care to elaborate?

    • Replies: @res
    That's a fair question. First, I was mostly disagreeing with your first paragraph (and last sentence, which I in turn consider laughable). For the first part of your last paragraph I might quibble about "easy" and add a caveat about "of sufficient effect size", but I think you are basically on target there.

    So the part I disagree with that is worth discussing:

    No, only my point is true and explains the method that’s used by specialists. Peter Johnson is just giving the usual alt-right paranoid whining.
     
    OK. Let me try to quote or restate the three points (with numbers for easy reference) I see explaining why studies are done on uniform white populations. If either you or Peter disagree with my version please correct me.

    1. Double Juice JJ (comment 28): "it's called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles."

    2. Peter Johnson (comment 11): "Usually they restrict the data to the European-race subsample since that has the largest subsample."

    3. Peter Johnson (comment 11): "This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence."

    You and I agree 1. is true. I would argue there is another issue there with things like different allele frequencies and linkage disequilibrium between the races complicating the study assumptions and interpretation.

    Regarding 2., that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?

    Regarding 3., there we have to rely on "the dog that didn't bark". A 1 SD difference in phenotypic IQ between blacks and whites provides an interesting genetic question. Why have only a few heretics looked into it? Do you have an explanation other than 3.? Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics. While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )

    I sincerely have trouble understanding how people with a good understanding of genetics can be so dismissive of GWAS given the demonstrated trends we have seen with increasing sample sizes resulting in more SNP hits and more variance explained. Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don't find Dr. Thompson's post and the evidence I have given persuasive.

    Many people dismissive of GWAS seem to have gotten stuck in the negative results they heard about in the early 2000s (before correcting for multiple hypothesis testing was common), but I think that can usually be seen by the arguments used and am not seeing that here.
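
    To make the multiple-testing point concrete, here is a minimal sketch of a Bonferroni-style correction. The figure of one million independent tests is an assumption chosen for illustration (it is not taken from any paper linked in this thread), and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate near alpha."""
    return alpha / n_tests


def expected_false_positives(p_cutoff: float, n_tests: int) -> float:
    """Expected number of hits by chance alone if every tested SNP is truly null."""
    return p_cutoff * n_tests


# Assumption for illustration: about one million independent common-variant tests.
n_independent_tests = 1_000_000

uncorrected_cutoff = 0.05
corrected_cutoff = bonferroni_threshold(0.05, n_independent_tests)

print("corrected cutoff:", corrected_cutoff)
print("chance hits, uncorrected:", expected_false_positives(uncorrected_cutoff, n_independent_tests))
print("chance hits, corrected:", expected_false_positives(corrected_cutoff, n_independent_tests))
```

    Under that assumption, an uncorrected 0.05 cutoff lets through tens of thousands of chance hits, which is one reading of why early uncorrected results failed to replicate.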
  36. res says:
    @Double Juice JJ
    Did you happen to look at Table 1 included by Dr. Thompson above? Notice any trend with the number of SNPs discovered with increasing sample size? There is a plot showing the trend on page 33 of the paper.

    This is still ridiculous. Look, their sample almost equals the population of Iceland or the pre-neolithic whole human population. But the hits can only explain 4% of the variance. That's pathetic.

    That is not the case for height: Common SNPs explain a large proportion of heritability for human height

    And height doesn't require ridiculously huge samples to find substantial SNP-based heritability. Height also is an actual measure, not some vague concept like "intelligence" that's only estimated by proxy.

    And height doesn’t require ridiculously huge samples to find substantial SNP-based heritability.

    So you at least admit that. Good. At least you aren’t a complete genetic denialist.

    Any thoughts on the estimates of sample sizes required given by this paper?

    http://www.biorxiv.org/content/early/2017/08/11/175406

    Free full text is available. See the bottom half of Figure 3 for their estimates of % genetic variance explained vs. sample size for both continuous traits (e.g. height, IQ) and disease traits.
    Their estimate for height was a sample size of 200k would explain about 40% of variance. For IQ a sample size of 200k would explain about 5% of variance (pretty close to the 4% we see from 280k).

    And here is some empirical data on the number of GWAS hits by sample size (2012): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/

    And a 2017 update: http://www.cell.com/ajhg/fulltext/S0002-9297(17)30240-9

    But the hits can only explain 4% of the variance. That’s pathetic.

    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    • Replies: @Double Juice JJ
    Quoting studies is good, understanding them is better

    SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.
     
    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    It'll stay pathetic as long as you can't come up with something like that in explaining group differences.

    The evolutionary history of the human pygmy phenotype (small body size), a characteristic of African and Southeast Asian rainforest hunter-gatherers, is largely unknown. Here we use a genome-wide admixture mapping analysis to identify 16 genomic regions that are significantly associated with the pygmy phenotype in the Batwa, a rainforest hunter-gatherer population from Uganda (east central Africa). The identified genomic regions have multiple attributes that provide supporting evidence of genuine association with the pygmy phenotype, including enrichments for SNPs previously associated with stature variation in Europeans and for genes with growth hormone receptor and regulation functions. To test adaptive evolutionary hypotheses, we computed the haplotype-based integrated haplotype score (iHS) statistic and the level of population differentiation (FST) between the Batwa and their agricultural neighbors, the Bakiga, for each genomic SNP. Both |iHS| and FST values were significantly higher for SNPs within the Batwa pygmy phenotype-associated regions than the remainder of the genome, a signature of polygenic adaptation. In contrast, when we expanded our analysis to include Baka rainforest hunter-gatherers from Cameroon and Gabon (west central Africa) and Nzebi and Nzime neighboring agriculturalists, we did not observe elevated |iHS| or FST values in these genomic regions. Together, these results suggest adaptive and at least partially convergent origins of the pygmy phenotype even within Africa, supporting the hypothesis that small body size confers a selective advantage for tropical rainforest hunter-gatherers but raising questions about the antiquity of this behavior.
     
    http://www.pnas.org/content/111/35/E3596.full

    Oh, and don't tell me "muh politically correct research, blah, blah, blah". Nothing prevents hereditarian "scholars" from getting the relevant degrees and getting funding from the pioneer fund.
    , @Double Juice JJ
    On the relevance of rare variants:

    In 2014, GIANT, studying roughly 250,000 people, brought the total number of known genetic variants to nearly 700 -- in more than 400 spots in the genome. This effort involved a powerful method called genome-wide association study (GWAS), which rapidly scans across the genomes of large populations for markers that track with a particular trait. GWAS are good at finding common genetic variants, but nearly all of the identified variants alter height by less than 1 mm (less than 1/20 of an inch). GWAS studies are not as good at capturing uncommon genetic variants, which can have larger effects. Finally, the common variants that track with traits tend to lie mostly outside the protein-coding parts of genes, making it harder to figure out which genes they affect.

    So in the new study, the GIANT investigators used a different technology: the ExomeChip, which tested for a catalogue of nearly 200,000 known variants that are less common and that alter the function of protein-coding genes. These variants point more directly to genes and can be used as a shortcut to figuring out which genes are important for a specific disease or trait. Most had not been assessed in prior genetic studies of height.

    Using ExomeChip data from a total of 711,428 adults (an initial 460,000 people and about 250,000 more to validate the findings), the investigators identified 83 uncommon variants associated with adult height: 51 "low-frequency" variants (found in less than 5 percent of people) and 32 rare variants (found in less than 0.5 percent).

    With these new findings, 27.4 percent of the heritability of height is now accounted for (up from 20 percent in earlier studies), with most heritability still explained by common variants.

    Twenty-four of the newly discovered variants affect height by more than 1 cm (4/10 of an inch), larger effects than typically seen with common variants. "This finding matches a pattern seen in other genetic studies, where the more potent variants are rarer in the population," says Hirschhorn, who is also an endocrinologist at Boston Children's and a professor of pediatrics and genetics at Harvard Medical School.
     
    https://www.sciencedaily.com/releases/2017/02/170201131513.htm
  37. @res

    And height doesn’t require ridiculously huge samples to find substantial SNP-based heritability.
     
    So you at least admit that. Good. At least you aren't a complete genetic denialist.

    Any thoughts on the estimates of sample sizes required given by this paper?
    http://www.biorxiv.org/content/early/2017/08/11/175406
    Free full text is available. See the bottom half of Figure 3 for their estimates of % genetic variance explained vs. sample size for both continuous traits (e.g. height, IQ) and disease traits.
    Their estimate for height was a sample size of 200k would explain about 40% of variance. For IQ a sample size of 200k would explain about 5% of variance (pretty close to the 4% we see from 280k).

    And here is some empirical data on the number of GWAS hits by sample size (2012): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/bin/gr2.jpg

    And a 2017 update: http://www.cell.com/ajhg/fulltext/S0002-9297(17)30240-9

    But the hits can only explain 4% of the variance. That’s pathetic.

     

    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    Quoting studies is good, understanding them is better

    SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.

    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    It’ll stay pathetic as long as you can’t come up with something like that in explaining group differences.

    The evolutionary history of the human pygmy phenotype (small body size), a characteristic of African and Southeast Asian rainforest hunter-gatherers, is largely unknown. Here we use a genome-wide admixture mapping analysis to identify 16 genomic regions that are significantly associated with the pygmy phenotype in the Batwa, a rainforest hunter-gatherer population from Uganda (east central Africa). The identified genomic regions have multiple attributes that provide supporting evidence of genuine association with the pygmy phenotype, including enrichments for SNPs previously associated with stature variation in Europeans and for genes with growth hormone receptor and regulation functions. To test adaptive evolutionary hypotheses, we computed the haplotype-based integrated haplotype score (iHS) statistic and the level of population differentiation (FST) between the Batwa and their agricultural neighbors, the Bakiga, for each genomic SNP. Both |iHS| and FST values were significantly higher for SNPs within the Batwa pygmy phenotype-associated regions than the remainder of the genome, a signature of polygenic adaptation. In contrast, when we expanded our analysis to include Baka rainforest hunter-gatherers from Cameroon and Gabon (west central Africa) and Nzebi and Nzime neighboring agriculturalists, we did not observe elevated |iHS| or FST values in these genomic regions. Together, these results suggest adaptive and at least partially convergent origins of the pygmy phenotype even within Africa, supporting the hypothesis that small body size confers a selective advantage for tropical rainforest hunter-gatherers but raising questions about the antiquity of this behavior.

    http://www.pnas.org/content/111/35/E3596.full

    Oh, and don’t tell me “muh politically correct research, blah, blah, blah”. Nothing prevents hereditarian “scholars” from getting the relevant degrees and getting funding from the pioneer fund.

    • Replies: @res

    Quoting studies is good, understanding them is better
     
    Indeed. Perhaps I am being dense, but where is my lack of understanding you are implying? Please quote my words with which you disagree.

    It’ll stay pathetic as long as you can’t come up with something like that in explaining group differences.
     
    That is an interesting redirect. It simultaneously tells me what you care about here and explains why I was so confused by your objections (I missed that that was your focus).

    I will remind you I originally objected to "But the hits can only explain 4% of the variance. That’s pathetic." which had nothing to do with group differences.

    Oh, and don’t tell me “muh politically correct research, blah, blah, blah”. Nothing prevents hereditarian “scholars” from getting the relevant degrees and getting funding from the pioneer fund.
     
    Except for wanting to avoid career suicide. And getting academic advisers and funders to sign off on controversial research. If you are an academic you know this better than I do. Here I just have to conclude you are being disingenuous. Do we really need to have a discussion about Jason Richwine and his much less contentious research? Or James Watson, who I would have thought was in an unassailable position?

    How about we agree to disagree regarding the genetics of group differences and talk about other aspects of GWAS? You clearly have a lot of knowledge to offer on this topic. Unless "muh non-PC comments" have been enough to get me blacklisted already.

    P.S. Thanks for the pygmy link. Different selection pressures operating on different populations is an important driver of genetic divergence IMHO.
    , @utu

    We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis
     
    Do you have an opinion about the validity of this study? It is a grossly underdetermined system of nearly 300k binary variables and only a 4k sample. To validate these results one must have a sample that is significantly larger than the number of variables.
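
    The underdetermination concern can be shown with a toy case: when there are more SNP variables than individuals, different sets of per-SNP effects can reproduce the same phenotypes exactly. The genotypes and effect values below are invented for illustration. This only illustrates the worry about fitting every effect separately; it does not by itself settle whether an aggregate variance estimate from such data is valid.

```python
def predict(genotypes, effects):
    """Additive model: phenotype = sum over SNPs of genotype * effect."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


# Hypothetical toy data: 2 individuals (rows), 3 SNP variables (columns).
genotypes = [
    [1, 0, 1],
    [0, 1, 1],
]

# Two different effect vectors, both invented for this sketch.
effects_a = [1.0, 2.0, 0.0]
effects_b = [0.0, 1.0, 1.0]

phenotypes_a = predict(genotypes, effects_a)
phenotypes_b = predict(genotypes, effects_b)

# Both effect vectors fit the two observed phenotypes perfectly,
# so the data cannot tell them apart.
print(phenotypes_a, phenotypes_b, phenotypes_a == phenotypes_b)
```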
  38. @res

    And height doesn’t require ridiculously huge samples to find substantial SNP-based heritability.
     
    So you at least admit that. Good. At least you aren't a complete genetic denialist.

    Any thoughts on the estimates of sample sizes required given by this paper?
    http://www.biorxiv.org/content/early/2017/08/11/175406
    Free full text is available. See the bottom half of Figure 3 for their estimates of % genetic variance explained vs. sample size for both continuous traits (e.g. height, IQ) and disease traits.
    Their estimate for height was a sample size of 200k would explain about 40% of variance. For IQ a sample size of 200k would explain about 5% of variance (pretty close to the 4% we see from 280k).

    And here is some empirical data on the number of GWAS hits by sample size (2012): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3257326/bin/gr2.jpg

    And a 2017 update: http://www.cell.com/ajhg/fulltext/S0002-9297(17)30240-9

    But the hits can only explain 4% of the variance. That’s pathetic.

     

    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    On the relevance of rare variants:

    In 2014, GIANT, studying roughly 250,000 people, brought the total number of known genetic variants to nearly 700 — in more than 400 spots in the genome. This effort involved a powerful method called genome-wide association study (GWAS), which rapidly scans across the genomes of large populations for markers that track with a particular trait. GWAS are good at finding common genetic variants, but nearly all of the identified variants alter height by less than 1 mm (less than 1/20 of an inch). GWAS studies are not as good at capturing uncommon genetic variants, which can have larger effects. Finally, the common variants that track with traits tend to lie mostly outside the protein-coding parts of genes, making it harder to figure out which genes they affect.

    So in the new study, the GIANT investigators used a different technology: the ExomeChip, which tested for a catalogue of nearly 200,000 known variants that are less common and that alter the function of protein-coding genes. These variants point more directly to genes and can be used as a shortcut to figuring out which genes are important for a specific disease or trait. Most had not been assessed in prior genetic studies of height.

    Using ExomeChip data from a total of 711,428 adults (an initial 460,000 people and about 250,000 more to validate the findings), the investigators identified 83 uncommon variants associated with adult height: 51 “low-frequency” variants (found in less than 5 percent of people) and 32 rare variants (found in less than 0.5 percent).

    With these new findings, 27.4 percent of the heritability of height is now accounted for (up from 20 percent in earlier studies), with most heritability still explained by common variants.

    Twenty-four of the newly discovered variants affect height by more than 1 cm (4/10 of an inch), larger effects than typically seen with common variants. “This finding matches a pattern seen in other genetic studies, where the more potent variants are rarer in the population,” says Hirschhorn, who is also an endocrinologist at Boston Children’s and a professor of pediatrics and genetics at Harvard Medical School.

    https://www.sciencedaily.com/releases/2017/02/170201131513.htm

    • Replies: @res
    Thanks for telling me about that paper. It was new to me. Here is a more direct link: https://www.nature.com/nature/journal/v542/n7640/full/nature21039.html
    Their Figure 1 is very interesting. I did not realize the trend was that strong towards larger effect sizes at lower frequencies. That makes sense to me for the negative variants (rare harmful mutations), but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these "height SNPs." There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.

    I agree rare variants matter. The question is: how much? If we can trust the Visscher GCTA results, they are explaining 45% of the height variance with SNPs. That leaves 35% variance from additive genetic effects to be explained.

    How much % variance explained can a variant that appears in <0.5% of people offer? % variance explained depends on both effect size and MAF. There would need to be a long tail of rare variants to make up for the low frequency. And I am not seeing that long tail in Figure 1, though who knows if the increasing effect size continues for even lower MAF.

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)
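
    A rough sketch of the "effect size and MAF" arithmetic, assuming a purely additive model, Hardy-Weinberg proportions, and a phenotype standardized to variance 1, in which case a single variant explains about 2 * MAF * (1 - MAF) * beta^2 of the variance. The effect sizes (in SD units) and the 35% target below are illustrative assumptions, not values read off the paper's Figure 1.

```python
import math


def variance_explained(maf: float, beta_sd: float) -> float:
    """Fraction of phenotypic variance explained by one additive variant.

    Assumes Hardy-Weinberg proportions and a phenotype scaled to variance 1;
    beta_sd is the per-allele effect in phenotypic standard deviations.
    """
    return 2 * maf * (1 - maf) * beta_sd ** 2


def variants_needed(target: float, maf: float, beta_sd: float) -> int:
    """How many identical, independent variants would sum to the target fraction."""
    return math.ceil(target / variance_explained(maf, beta_sd))


# Illustrative assumptions: a rare variant with a fairly large effect,
# and a common variant with a tiny effect.
rare = variance_explained(maf=0.005, beta_sd=0.15)
common = variance_explained(maf=0.30, beta_sd=0.02)

print("rare variant:", rare)
print("common variant:", common)
print("rare variants needed for 35%:", variants_needed(0.35, 0.005, 0.15))
```

    With these made-up numbers the rare, larger-effect variant explains about as little variance as the common, tiny-effect one, which is why a long tail of rare variants would be needed to close a gap of that size.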
  39. res says:
    @Double Juice JJ
    Disagree: res

    Care to elaborate?

    That’s a fair question. First, I was mostly disagreeing with your first paragraph (and last sentence, which I in turn consider laughable). For the first part of your last paragraph I might quibble about “easy” and add a caveat about “of sufficient effect size”, but I think you are basically on target there.

    So the part I disagree with that is worth discussing:

    No, only my point is true and explains the method that’s used by specialists. Peter Johnson is just giving the usual alt-right paranoid whining.

    OK. Let me try to quote or restate the three points (with numbers for easy reference) I see explaining why studies are done on uniform white populations. If either you or Peter disagree with my version please correct me.

    1. Double Juice JJ (comment 28): “it’s called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles.”

    2. Peter Johnson (comment 11): “Usually they restrict the data to the European-race subsample since that has the largest subsample.”

    3. Peter Johnson (comment 11): “This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence.”

    You and I agree 1. is true. I would argue there is another issue there with things like different allele frequencies and linkage disequilibrium between the races complicating the study assumptions and interpretation.

    Regarding 2., that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?

    Regarding 3., there we have to rely on “the dog that didn’t bark”. A 1 SD difference in phenotypic IQ between blacks and whites provides an interesting genetic question. Why have only a few heretics looked into it? Do you have an explanation other than 3.? Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics. While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )

    I sincerely have trouble understanding how people with a good understanding of genetics can be so dismissive of GWAS given the demonstrated trends we have seen with increasing sample sizes resulting in more SNP hits and more variance explained. Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don’t find Dr. Thompson’s post and the evidence I have given persuasive.

    Many people dismissive of GWAS seem to have gotten stuck in the negative results they heard about in the early 2000s (before correcting for multiple hypothesis testing was common), but I think that can usually be seen by the arguments used and am not seeing that here.

    • Replies: @Double Juice JJ

    that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?
     
    Well, height research uses racially homogeneous samples too, and they find things like that:

    Height is a complex trait under strong genetic influence. To date, numerous genetic loci have been associated with height in individuals of European ancestry. However, few large-scale discovery genome-wide association studies (GWAS) of height in minority populations have been conducted and thus information about population-specific height regulation is limited. We conducted a GWA analysis of height in 8149 African-American (AA) women from the Women's Health Initiative. Genetic variants with P < 5 × 10^-5 (n = 169) were followed up in a replication data set (n = 20 809) and meta-analyzed in a total of 28 958 AAs and African-descent individuals. Twelve single-nucleotide polymorphisms (SNPs) representing 7 independent loci were significantly associated with height at P < 5 × 10^-8. We identified novel SNPs in 17q23 (TMEM100/PCTP) and Xp22.3 (ARSE) reflecting population-specific regulation of height in AAs and replicated five loci previously reported in European-descent populations [4p15/LCORL, 11q13/SERPINH1, 12q14/HMGA2, 17q23/MAP3K3 (mitogen-activated protein kinase3) and 18q21/DYM]. In addition, we performed an admixture mapping analysis of height which is both complementary and supportive to the GWA analysis and suggests potential associations between ancestry and height on chromosomes 4 (4q21), 15 (15q26) and 17 (17q23). Our findings provide insight into the genetic architecture of height and support the investigation of non-European-descent populations for identifying genetic factors associated with complex traits. Specifically, we identify new loci that may reflect population-specific regulation of height and report several known height loci that are important in determining height in African-descent populations.
     
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259012/

    While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )
     
    If the phenotype doesn't differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don't need to control for race.
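
    The confound can be sketched with a deterministic toy example. Everything below is invented for illustration: two groups, a marker with no effect on the trait inside either group, different marker frequencies between groups, and an optional difference in group trait means. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def make_group(n_carriers, n_noncarriers, trait_mean):
    """Build (carrier, trait) pairs where the trait is unrelated to the marker.

    Counts should be even so that the +1/-1 offsets cancel exactly.
    """
    people = []
    for carrier, count in ((1, n_carriers), (0, n_noncarriers)):
        for i in range(count):
            offset = 1.0 if i % 2 == 0 else -1.0
            people.append((carrier, trait_mean + offset))
    return people


def carrier_gap(people):
    """Mean trait of carriers minus mean trait of non-carriers."""
    carriers = [t for c, t in people if c == 1]
    noncarriers = [t for c, t in people if c == 0]
    return mean(carriers) - mean(noncarriers)


# Marker is common in group A and rare in group B (made-up frequencies).
group_a = make_group(80, 20, trait_mean=11.0)
group_b = make_group(20, 80, trait_mean=9.0)
# Same marker frequencies, but no difference in trait means between groups.
group_b_same_mean = make_group(20, 80, trait_mean=11.0)

print("within A:", carrier_gap(group_a))
print("within B:", carrier_gap(group_b))
print("pooled, means differ:", carrier_gap(group_a + group_b))
print("pooled, means equal:", carrier_gap(group_a + group_b_same_mean))
```

    In this sketch the marker shows a gap only when the groups are pooled and their trait means differ; within each group, or when the means are equal, the gap is zero.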

    Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don’t find Dr. Thompson’s post and the evidence I have given persuasive.
     
    It's an error to assume height and intelligence research are comparable. Height research doesn't require unreasonable sample sizes to find substantial heritability.
    , @Double Juice JJ

    Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics.
     
    Uh! I missed that... Are you talking about Piffer's ridiculous numerology that never makes it through peer review? Even the very HBD-friendly intelligence journal trashed his "work". Piffer uses a dozen pathetic GWAS hits found in a European sample, associates them with similarly pathetic IQ data, and then claims they are part of a "naturally selected polygenic genotype" without elaborating on the signals of natural selection, and then goes whining "muh PC journals won't publish me, so f**k peer review".

    Who does he think he's fooling? It doesn't work like that. You need to replicate the hits in other populations, which will also lead to discovering novel population-specific loci (especially when it comes to Africans and their huge genetic diversity). And a complementary admixture/genetic distance analysis will come to confirm the pattern. And of course, you need a much better measure than Lynn's laughable global IQ data.
  40. res says:
    @Double Juice JJ
    On the relevance of rare variants:

    In 2014, GIANT, studying roughly 250,000 people, brought the total number of known genetic variants to nearly 700 -- in more than 400 spots in the genome. This effort involved a powerful method called genome-wide association study (GWAS), which rapidly scans across the genomes of large populations for markers that track with a particular trait. GWAS are good at finding common genetic variants, but nearly all of the identified variants alter height by less than 1 mm (less than 1/20 of an inch). GWAS studies are not as good at capturing uncommon genetic variants, which can have larger effects. Finally, the common variants that track with traits tend to lie mostly outside the protein-coding parts of genes, making it harder to figure out which genes they affect.

    So in the new study, the GIANT investigators used a different technology: the ExomeChip, which tested for a catalogue of nearly 200,000 known variants that are less common and that alter the function of protein-coding genes. These variants point more directly to genes and can be used as a shortcut to figuring out which genes are important for a specific disease or trait. Most had not been assessed in prior genetic studies of height.

    Using ExomeChip data from a total of 711,428 adults (an initial 460,000 people and about 250,000 more to validate the findings), the investigators identified 83 uncommon variants associated with adult height: 51 "low-frequency" variants (found in less than 5 percent of people) and 32 rare variants (found in less than 0.5 percent).

    With these new findings, 27.4 percent of the heritability of height is now accounted for (up from 20 percent in earlier studies), with most heritability still explained by common variants.

    Twenty-four of the newly discovered variants affect height by more than 1 cm (4/10 of an inch), larger effects than typically seen with common variants. "This finding matches a pattern seen in other genetic studies, where the more potent variants are rarer in the population," says Hirschhorn, who is also an endocrinologist at Boston Children's and a professor of pediatrics and genetics at Harvard Medical School.
     
    https://www.sciencedaily.com/releases/2017/02/170201131513.htm

    Thanks for telling me about that paper. It was new to me. Here is a more direct link: https://www.nature.com/nature/journal/v542/n7640/full/nature21039.html
    Their Figure 1 is very interesting. I did not realize the trend was that strong towards larger effect sizes at lower frequencies. That makes sense to me for the negative variants (rare harmful mutations), but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these “height SNPs.” There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.

    I agree rare variants matter. The question is: how much? If we can trust the Visscher GCTA results, they are explaining 45% of the height variance with SNPs. That leaves 35% variance from additive genetic effects to be explained.

    How much % variance explained can a variant that appears in <0.5% of people offer? % variance explained depends on both effect size and MAF. There would need to be a long tail of rare variants to make up for the low frequency. And I am not seeing that long tail in Figure 1, though who knows if the increasing effect size continues for even lower MAF.
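A rough sketch of the arithmetic behind that question, assuming a purely additive variant in Hardy-Weinberg equilibrium. The trait SD, allele frequencies and effect sizes below are illustrative assumptions, not figures from the paper, and `variance_explained` is a hypothetical helper name.

```python
def variance_explained(maf, effect_cm, trait_sd_cm):
    """Fraction of trait variance explained by one additive biallelic variant.

    Under Hardy-Weinberg equilibrium the genotype (0, 1 or 2 copies) has
    variance 2 * maf * (1 - maf), so the variant contributes
    2 * maf * (1 - maf) * effect^2 to the trait variance.
    """
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return genotype_variance * effect_cm ** 2 / trait_sd_cm ** 2


# Assumed, illustrative numbers (not taken from the paper).
TRAIT_SD_CM = 6.5  # rough adult height SD within one sex (assumption)

# A rare variant with a large per-allele effect.
rare = variance_explained(maf=0.0025, effect_cm=1.0, trait_sd_cm=TRAIT_SD_CM)
# A common variant with a small per-allele effect.
common = variance_explained(maf=0.30, effect_cm=0.1, trait_sd_cm=TRAIT_SD_CM)

print(f"rare variant   (MAF 0.25%, 1 cm per allele):  {rare:.4%} of variance")
print(f"common variant (MAF 30%, 0.1 cm per allele):  {common:.4%} of variance")
print(f"rare variants of this kind needed for 1% of variance: {0.01 / rare:.0f}")
```

Because the frequency term scales the squared effect, a rare variant needs a much larger effect than a common one to contribute the same share of variance.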

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)

    • Replies: @Double Juice JJ
    Thank you res, you're a polite debater. Sorry if I'm condescending and aggressive.

    but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these “height SNPs.” There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.
     
    Well, there is nothing automatic about natural selection. Genotypes are only selected if they provide a substantial survival and reproductive advantage. Which doesn't seem to be the case with IQ and height in most populations. You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve. Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short...) with very low average. IQ doesn't follow the pattern either. Those traits tend to be unmalleable: no Flynn effect on skin color or eye shape. And genetic studies easily spot signals of selection.

    Those complex traits have a completely different genetic architecture.

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)
     
    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.
  41. res says:
    @Double Juice JJ
    Quoting studies is good, understanding them is better

    SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.
     
    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    It'll stay pathetic as long as you can't come up with something like that in explaining group differences.

    The evolutionary history of the human pygmy phenotype (small body size), a characteristic of African and Southeast Asian rainforest hunter-gatherers, is largely unknown. Here we use a genome-wide admixture mapping analysis to identify 16 genomic regions that are significantly associated with the pygmy phenotype in the Batwa, a rainforest hunter-gatherer population from Uganda (east central Africa). The identified genomic regions have multiple attributes that provide supporting evidence of genuine association with the pygmy phenotype, including enrichments for SNPs previously associated with stature variation in Europeans and for genes with growth hormone receptor and regulation functions. To test adaptive evolutionary hypotheses, we computed the haplotype-based integrated haplotype score (iHS) statistic and the level of population differentiation (FST) between the Batwa and their agricultural neighbors, the Bakiga, for each genomic SNP. Both |iHS| and FST values were significantly higher for SNPs within the Batwa pygmy phenotype-associated regions than the remainder of the genome, a signature of polygenic adaptation. In contrast, when we expanded our analysis to include Baka rainforest hunter-gatherers from Cameroon and Gabon (west central Africa) and Nzebi and Nzime neighboring agriculturalists, we did not observe elevated |iHS| or FST values in these genomic regions. Together, these results suggest adaptive and at least partially convergent origins of the pygmy phenotype even within Africa, supporting the hypothesis that small body size confers a selective advantage for tropical rainforest hunter-gatherers but raising questions about the antiquity of this behavior.
     
    http://www.pnas.org/content/111/35/E3596.full

    Oh, and don't tell me "muh politically correct research, blah, blah, blah". Nothing prevents hereditarian "scholars" from getting the relevant degrees and getting funding from the pioneer fund.

    Quoting studies is good, understanding them is better

    Indeed. Perhaps I am being dense, but where is my lack of understanding you are implying? Please quote my words with which you disagree.

    It’ll stay pathetic as long as you can’t come up with something like that in explaining group differences.

    That is an interesting redirect. It simultaneously tells me what you care about here and explains why I was so confused by your objections (I missed that that was your focus).

    I will remind you I originally objected to “But the hits can only explain 4% of the variance. That’s pathetic.” which had nothing to do with group differences.

    Oh, and don’t tell me “muh politically correct research, blah, blah, blah”. Nothing prevents hereditarian “scholars” from getting the relevant degrees and getting funding from the pioneer fund.

    Except for wanting to avoid career suicide. And getting academic advisers and funders to sign off on controversial research. If you are an academic you know this better than I do. Here I just have to conclude you are being disingenuous. Do we really need to have a discussion about Jason Richwine and his much less contentious research? Or James Watson, who I would have thought was in an unassailable position?

    How about we agree to disagree regarding the genetics of group differences and talk about other aspects of GWAS? You clearly have a lot of knowledge to offer on this topic. Unless “muh non-PC comments” have been enough to get me blacklisted already.

    P.S. Thanks for the pygmy link. Different selection pressures operating on different populations is an important driver of genetic divergence IMHO.

    • Replies: @Double Juice JJ

    Perhaps I am being dense, but where is my lack of understanding you are implying? Please quote my words with which you disagree.
     
    I have to disagree when you say height research is facing the same missing heritability crisis because of sample size. It's not.

    I will remind you I originally objected to “But the hits can only explain 4% of the variance. That’s pathetic.” which had nothing to do with group differences.
     
    Oh yeah, only reacting to the cringy hereditarian euphoria of the post and comments section. Nothing personal.

    Except for wanting to avoid career suicide. And getting academic advisers and funders to sign off on controversial research. If you are an academic you know this better than I do.
     
    I am not an academic and I don't know much more than you. All I know is that the pioneer fund is generously granting scientific racist "research", that guys like Rushton/Flushton, Harpending/Harpoondick or Hsu/Shoe are doing their stuff un-bothered.

    Do we really need to have a discussion about Jason Richwine and his much less contentious research? Or James Watson, who I would have thought was in an unassailable position?
     
    They did no research. They only threw random comments on race.

    How about we agree to disagree regarding the genetics of group differences and talk about other aspects of GWAS?
     
    Sounds good.
  42. utu says:
    @res

    The only thing that can keep them in check is the requirement of having two data sets: one on which you develop your model and one on which you test it. But this works if there is no cheating.
     
    Good point. Have researchers been doing this with their different data sets? You see some of this implicitly happening in Davide Piffer's work looking at different studies, but I haven't seen explicit attempts at using one dataset to validate another dataset's results and enumerating the outcomes. Do such studies exist?

    Worth noting that I would expect to see many "soft failures" (i.e. very small p values that just miss the threshold), especially when using older datasets to look at the newer (larger sample size) SNPs. It's not just the binary dis/confirm result that matters.

    At some point is it appropriate to use a different p value threshold for testing a subset of previously identified SNPs? Say for instance you are looking at 1000 SNPs for validation with another dataset. Isn't it reasonable to use a threshold of 5e-5 (0.05 / 1000) rather than the 5e-8 standard for GWAS?
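The threshold arithmetic in that question can be sketched as follows. This is only an illustration of a Bonferroni correction; the `near_miss_factor` used to label "soft failures" is an arbitrary assumption made for the example, not an established convention.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


# 0.05 spread over 1000 pre-selected SNPs versus the conventional
# genome-wide threshold, which corresponds to about one million tests.
validation_threshold = bonferroni_threshold(0.05, 1_000)
genome_wide_threshold = bonferroni_threshold(0.05, 1_000_000)


def classify(p_value, threshold, near_miss_factor=10.0):
    """Label a replication attempt; near_miss_factor is a hypothetical choice."""
    if p_value <= threshold:
        return "replicated"
    if p_value <= threshold * near_miss_factor:
        return "soft failure"
    return "not replicated"


print(f"validation threshold:  {validation_threshold:.1e}")
print(f"genome-wide threshold: {genome_wide_threshold:.1e}")
for p in (1e-6, 2e-4, 0.3):
    print(f"p = {p:g}: {classify(p, validation_threshold)}")
```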

    I think dividing the set into two subsets is a common practice when developing heuristic predictive models that can potentially have too many variables. For example, you can fit an n-degree polynomial to the number of sunspots in the 1800-1900 period and check whether it predicts sunspots in the 1900-2000 period. You can easily find a polynomial of degree n+k>n that fits sunspots over the whole 1800-2000 period, but then your model is most likely “overfitted.” This approach will keep the number of variables down in a heuristic model. GWAS is essentially a heuristic model. I think that in GWAS studies they do follow a similar procedure of having one subset for developing the model and one subset for verifying the model. This, however, does not mean that the model is any less heuristic if you follow this procedure. However, from the relative sizes of the subsets you can make a claim about the model's robustness. But is this procedure alone sufficient to avoid the overfitting problem?
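A minimal sketch of the split-sample check described above. It uses a synthetic noisy periodic series as a stand-in, NOT real sunspot counts; the 11-year period, noise level and polynomial degrees are assumptions chosen only for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

# Synthetic stand-in for a noisy periodic series (NOT real sunspot data).
years = np.arange(1800, 2000)
series = np.sin(2 * np.pi * (years - 1800) / 11.0) + rng.normal(scale=0.3, size=years.size)

train = years < 1900   # develop the model on 1800-1899
test = ~train          # verify it on 1900-1999


def rmse(model, x, y):
    """Root-mean-square error of a fitted model on the given points."""
    return float(np.sqrt(np.mean((model(x) - y) ** 2)))


results = {}
for degree in (1, 15):
    # Polynomial.fit rescales the x-axis internally, which keeps the fit stable.
    model = Polynomial.fit(years[train], series[train], degree)
    results[degree] = (
        rmse(model, years[train], series[train]),
        rmse(model, years[test], series[test]),
    )
    print(f"degree {degree:2d}: train RMSE {results[degree][0]:.3g}, "
          f"test RMSE {results[degree][1]:.3g}")
```

The higher-degree polynomial fits the first century at least as well as the straight line, but its error on the held-out century is far larger, which is the pattern the held-out subset is meant to expose.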

    The P-values need to be demystified. What do they really prove, and do they sometimes prove the opposite of what was intended? I will use Davide Piffer for this purpose since you have brought him up.

    Suppose Davide Piffer found himself a girlfriend who he thinks might be the One. But being scientifically minded, he would like to verify it. He decides to estimate the P-value of his girlfriend being the One. He randomly selects 1000 women and checks whether he likes all of them less than his girlfriend. If so, the P-value is less than 10^-3. But he continues, and through a random search (he would glorify it with the name of Monte-Carlo method) among 100,000 women he finds 3 that he likes no less than his girlfriend. He pronounces that the P-value of his girlfriend is 3*10^-5. It is the upper bound on the P-value. The question is whether he is going to keep his girlfriend and claim that the P-value is 3*10^-5, or will he switch to the one among the 3 whom he liked more than his girlfriend and whose P-value is less than 10^-5? Which one is the One? This is the unintended consequence of estimating a P-value via random selection of SNP sequences in GWAS methods. Davide Piffer decided to keep his girlfriend.

    that is, over a total of 819 runs, a correlation coefficient equal to or higher than 0.88 occurred 8 times

    In the 819 random runs he performed, he found 8 results whose correlation with countries' IQs was at least as high as that of the set of 9 SNPs that he started with. Why did he not go with the one of the eight that had the maximal correlation? Why did he not dump his girlfriend?

    The other problem with the P-value estimate is as follows. How good is the estimate if you run 1 million simulations? Perhaps you should run 10 million simulations. How do you know how many? Is one teaspoon of ocean water enough to estimate the salinity of the whole ocean?
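One standard way to put numbers on the "how many simulations" question is to treat each random run as a Bernoulli trial and look at the binomial standard error of the estimated P-value. The sketch below applies that to the 8-in-819 figure quoted above; the function names are hypothetical, and the binomial approximation is an assumption of the sketch rather than something the quoted study states.

```python
import math


def monte_carlo_p(n_as_extreme, n_runs):
    """Empirical P-value from random runs, with its binomial standard error."""
    p_hat = n_as_extreme / n_runs
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n_runs)
    return p_hat, std_err


def runs_needed(p_true, relative_error):
    """Runs needed so that std_err / p_true is about relative_error."""
    return math.ceil((1.0 - p_true) / (p_true * relative_error ** 2))


p_hat, std_err = monte_carlo_p(8, 819)
print(f"estimated P-value: {p_hat:.4f} +/- {std_err:.4f} (one standard error)")

# The rarer the event, the more runs it takes to pin its probability down.
for p in (1e-2, 1e-5, 5e-8):
    print(f"P around {p:g}: about {runs_needed(p, 0.10):,} runs for 10% relative error")
```

The required number of runs grows roughly as 1/P, so a teaspoon can be enough for a P-value near 0.01 but not for one near 5e-8.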

    The random search in GWAS for all practical purposes can go on ad infinitum. There are circa 10 million SNPs. Say that 1 million of these have frequencies that are smaller than 1 within the population of the set. Among them you may look for suspects that may correlate with the trait you are trying to explain. Let's suppose that you want to explain a complex trait like IQ with 10,000 SNPs. How many combinations are there, how many different subsets of SNPs can one test? The number is huuuuge. If the calculator I found is correct, there are 5.8*10^24318 combinations. Clearly it is not doable in the lifetime of the universe. If you decide to try 200,000 SNPs as they did for the height study, the number of combinations jumps to 10^217319. When you are dealing with so many possibilities, even if you ran a billion random simulations you cannot have much confidence in the P-value you obtained.
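The combination counts quoted above can be checked without a big-number calculator by working in logarithms. A small sketch, using the log-gamma function to get log10 of a binomial coefficient (`log10_binomial` is a hypothetical helper name):

```python
import math


def log10_binomial(n, k):
    """log10 of C(n, k), computed via log-gamma to avoid huge integers."""
    ln_value = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_value / math.log(10)


for k in (10_000, 200_000):
    log_count = log10_binomial(1_000_000, k)
    exponent = math.floor(log_count)
    mantissa = 10 ** (log_count - exponent)
    print(f"C(1,000,000, {k:,}) is about {mantissa:.1f} * 10^{exponent}")
```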

    The P-values are not the issue I am concerned with. P-values are just BS to impress the naive and uninitiated. What I am wondering is why they get only 4% with circa 10,000 SNPs. I am sure it is not the sample size. Actually it goes the other way. With a smaller sample size you can explain more of the variance. For a sample size of N=2, any gene that is not co-present in both subjects explains the phenotypic difference between them with correlation r=1. What is the actual constraint that holds them back? Why don’t they do what they did for height with the brute force fit of over 200,000 SNPs? With 200,000 variables (yes, they are just binary variables) you should be able to fit any random sequence of numbers if the sample size is not too large. It is the bigness of the sample size that keeps them from getting the results they want.
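The claim that more variables than subjects can fit any sequence of numbers is easy to demonstrate for plain least squares. The sketch below uses tiny simulated data (50 people, 500 random binary markers, a phenotype that is pure noise); all sizes are assumptions for illustration. It shows only the ordinary regression case and does not model GCTA, which estimates a single variance component rather than one free coefficient per SNP.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_snps = 50, 500  # far more variables than subjects (illustrative sizes)


def simulate(n):
    """Random binary markers and a phenotype that is pure noise."""
    genotypes = rng.integers(0, 2, size=(n, n_snps)).astype(float)
    phenotype = rng.normal(size=n)  # unrelated to the genotypes by construction
    return genotypes, phenotype


def add_intercept(genotypes):
    return np.column_stack([np.ones(len(genotypes)), genotypes])


X_train, y_train = simulate(n_people)
X_new, y_new = simulate(n_people)

# Minimum-norm least-squares solution of the underdetermined system.
coef, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)


def r_squared(genotypes, phenotype):
    """Share of phenotype variance accounted for by the fitted coefficients."""
    residual = phenotype - add_intercept(genotypes) @ coef
    return 1.0 - residual.var() / phenotype.var()


r2_in_sample = r_squared(X_train, y_train)
r2_out_of_sample = r_squared(X_new, y_new)
print(f"in-sample R^2:     {r2_in_sample:.6f}")
print(f"out-of-sample R^2: {r2_out_of_sample:.3f}")
```

The in-sample fit is essentially perfect even though the phenotype is noise, while the same coefficients do not carry over to a fresh sample, which is why a held-out sample is needed to tell the two situations apart.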

  43. utu says:
    @res

    Is this for real?
     
    It is a real peer reviewed paper published in Nature Genetics: http://www.nature.com/ng/journal/v42/n7/full/ng.608.html

    Determining if their methods avoid the problem you describe is above my pay grade. But their claim to explain only 45% of the 80% additive genetic heritability suggests they are at least partially avoiding it.

    Their Simulation studies section looks like it tried to distinguish causal variants which seems to argue against this just being an example of overfitting.

    Two of the authors published a followup which may be helpful: https://www.ncbi.nlm.nih.gov/pubmed/21142928

    The technique is now known as GCTA. This blog post links a paper by two people at the requisite pay grade examining method validity: https://infoproc.blogspot.com/2014/03/why-does-gcta-work.html

    I see a few papers by Peter Visscher mentioned in Dr. Thompson's blog, but not this one. I do see mentions of GCTA including: http://www.unz.com/jthompson/2014/12/

    I looked at Visscher’s GCTA and tried to understand it w/o much success so far. However, I thought overfitting might be a problem. And then I found this paper:

    Limitations of GCTA as a solution to the missing heritability problem

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4711841/

    Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype.

    • Replies: @res
    That is interesting. Did you follow the associated controversy (yellow box at the top)? The response from the GCTA authors is pretty harsh by research paper standards: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987770/

    In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).
    ...
    There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used the incorrect expected mean and SD of σˆ2subset for comparison with those values observed from resampling. Hence, their conclusion about biasedness of GREML estimates is not supported by empirical evidence.
     
  44. @res
    That's a fair question. First, I was mostly disagreeing with your first paragraph (and last sentence, which I in turn consider laughable). For the first part of your last paragraph I might quibble about "easy" and add a caveat about "of sufficient effect size", but I think you are basically on target there.

    So the part I disagree with that is worth discussing:

    No, only my point is true and explains the method that’s used by specialist. Peter Johnson is just giving the usual alt-right paranoid whining.
     
    OK. Let me try to quote or restate the three points (with numbers for easy reference) I see explaining why studies are done on uniform white populations. If either you or Peter disagree with my version please correct me.

    1. Double Juice JJ (comment 28): "it's called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles."

    2. Peter Johnson (comment 11): "Usually they restrict the data to the European-race subsample since that has the largest subsample."

    3. Peter Johnson (comment 11): "This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence."

    You and I agree 1. is true. I would argue there is another issue there with things like different allele frequencies and linkage disequilibrium between the races complicating the study assumptions and interpretation.

    Regarding 2., that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?

    Regarding 3., there we have to rely on "the dog that didn't bark". A 1 SD difference in phenotypic IQ between blacks and whites provides an interesting genetic question. Why have only a few heretics looked into it? Do you have an explanation other than 3.? Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics. While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )

    I sincerely have trouble understanding how people with a good understanding of genetics can be so dismissive of GWAS given the demonstrated trends we have seen with increasing sample sizes resulting in more SNP hits and more variance explained. Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don't find Dr. Thompson's post and the evidence I have given persuasive.

    Many people dismissive of GWAS seem to have gotten stuck in the negative results they heard about in the early 2000s (before correcting for multiple hypothesis testing was common), but I think that can usually be seen by the arguments used and am not seeing that here.

    that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?

    Well, height research uses racially homogeneous samples too, and they find things like that:

    Height is a complex trait under strong genetic influence. To date, numerous genetic loci have been associated with height in individuals of European ancestry. However, few large-scale discovery genome-wide association studies (GWAS) of height in minority populations have been conducted and thus information about population-specific height regulation is limited. We conducted a GWA analysis of height in 8149 African-American (AA) women from the Women’s Health Initiative. Genetic variants with P< 5 × 10−5 (n = 169) were followed up in a replication data set (n = 20 809) and meta-analyzed in a total of 28 958 AAs and African-descent individuals. Twelve single-nucleotide polymorphisms (SNPs) representing 7 independent loci were significantly associated with height at P < 5 × 10−8. We identified novel SNPs in 17q23 (TMEM100/PCTP) and Xp22.3 (ARSE) reflecting population-specific regulation of height in AAs and replicated five loci previously reported in European-descent populations [4p15/LCORL, 11q13/SERPINH1, 12q14/HMGA2, 17q23/MAP3K3 (mitogen-activated protein kinase3) and 18q21/DYM]. In addition, we performed an admixture mapping analysis of height which is both complementary and supportive to the GWA analysis and suggests potential associations between ancestry and height on chromosomes 4 (4q21), 15 (15q26) and 17 (17q23). Our findings provide insight into the genetic architecture of height and support the investigation of non-European-descent populations for identifying genetic factors associated with complex traits. Specifically, we identify new loci that may reflect population-specific regulation of height and report several known height loci that are important in determining height in African-descent populations.

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259012/

    While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )

    If the phenotype doesn’t differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don’t need to control for race.
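The confounding mechanism described here (population stratification) can be sketched with a toy simulation. Everything below is synthetic: two subpopulations, one marker that has no effect on the trait, and allele frequencies and trait means picked arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000  # individuals per simulated subpopulation


def subpopulation(allele_freq, trait_mean):
    """Genotypes at one neutral marker plus a trait that ignores the marker."""
    genotype = rng.binomial(2, allele_freq, size=n)
    trait = rng.normal(loc=trait_mean, scale=1.0, size=n)
    return genotype, trait


def corr(genotype, trait):
    return float(np.corrcoef(genotype, trait)[0, 1])


# Case 1: marker frequency AND trait mean both differ between subpopulations.
g_a, t_a = subpopulation(allele_freq=0.2, trait_mean=0.0)
g_b, t_b = subpopulation(allele_freq=0.8, trait_mean=1.0)
pooled = corr(np.concatenate([g_a, g_b]), np.concatenate([t_a, t_b]))
within_a, within_b = corr(g_a, t_a), corr(g_b, t_b)

# Case 2: marker frequency differs but the trait means are equal.
g_c, t_c = subpopulation(allele_freq=0.8, trait_mean=0.0)
pooled_equal_means = corr(np.concatenate([g_a, g_c]), np.concatenate([t_a, t_c]))

print(f"pooled, different trait means: r = {pooled:.3f}")
print(f"within each subpopulation:     r = {within_a:.3f}, {within_b:.3f}")
print(f"pooled, equal trait means:     r = {pooled_equal_means:.3f}")
```

The neutral marker correlates with the trait only when the samples are pooled and the trait means differ; within a subpopulation, or with equal means, the spurious association disappears.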

    Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don’t find Dr. Thompson’s post and the evidence I have given persuasive.

    It’s an error to assume height and intelligence research are comparable. Height research doesn’t require unreasonable sample sizes to find substantial heritability.

    • Replies: @res

    Well, height research uses racially homogeneous samples too, and they find things like that:
     
    I see that your link is for a study on individuals from African descent, but I don't see how that counters what I said.

    If the phenotype doesn’t differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don’t need to control for race.
     
    OK. Then we agree on that. Thanks.

    It’s an error to assume height and intelligence research are comparable. Height research doesn’t require unreasonable sample sizes to find substantial heritability.
     
    Perhaps you could offer a more rigorous definition of "unreasonable sample sizes" and "substantial heritability"? The GIANT study you linked has a sample size of 250,000 people (more than the study in this post) and explained 20% of heritability (in contrast to the 4% for this IQ study).

    The question is how comparable are IQ and height research (i.e. it is not a binary comparable or not question). The main differences I see are that height has a higher genetic heritability and is more easily and accurately measurable (as you noted earlier). These are both going to increase the sample size needed for an IQ study to explain as much heritability. In addition, the ease and frequency of measuring height means it is far easier to get good large sample data.

    Based on the estimates in http://www.biorxiv.org/content/early/2017/08/11/175406 (also linked above) it looks like explaining 50% of heritability will take an IQ sample of 800k vs. a height sample of 350k. That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.
  45. utu says:
    @Double Juice JJ
    Quoting studies is good, understanding them is better

    SNPs discovered by genome-wide association studies (GWASs) account for only a small fraction of the genetic variation of complex traits in human populations. Where is the remaining heritability? We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis, and validated the estimation method with simulations based on the observed genotype data. We show that 45% of variance can be explained by considering all SNPs simultaneously. Thus, most of the heritability is not missing but has not previously been detected because the individual effects are too small to pass stringent significance tests. We provide evidence that the remaining heritability is due to incomplete linkage disequilibrium between causal variants and genotyped SNPs, exacerbated by causal variants having lower minor allele frequency than the SNPs explored to date.
     
    Just keep telling yourself that. At what point does it become non-pathetic in your estimation? Or do we just keep moving the goalposts?

    It'll stay pathetic as long as you can't come up with something like that in explaining group differences.

    The evolutionary history of the human pygmy phenotype (small body size), a characteristic of African and Southeast Asian rainforest hunter-gatherers, is largely unknown. Here we use a genome-wide admixture mapping analysis to identify 16 genomic regions that are significantly associated with the pygmy phenotype in the Batwa, a rainforest hunter-gatherer population from Uganda (east central Africa). The identified genomic regions have multiple attributes that provide supporting evidence of genuine association with the pygmy phenotype, including enrichments for SNPs previously associated with stature variation in Europeans and for genes with growth hormone receptor and regulation functions. To test adaptive evolutionary hypotheses, we computed the haplotype-based integrated haplotype score (iHS) statistic and the level of population differentiation (FST) between the Batwa and their agricultural neighbors, the Bakiga, for each genomic SNP. Both |iHS| and FST values were significantly higher for SNPs within the Batwa pygmy phenotype-associated regions than the remainder of the genome, a signature of polygenic adaptation. In contrast, when we expanded our analysis to include Baka rainforest hunter-gatherers from Cameroon and Gabon (west central Africa) and Nzebi and Nzime neighboring agriculturalists, we did not observe elevated |iHS| or FST values in these genomic regions. Together, these results suggest adaptive and at least partially convergent origins of the pygmy phenotype even within Africa, supporting the hypothesis that small body size confers a selective advantage for tropical rainforest hunter-gatherers but raising questions about the antiquity of this behavior.
     
    http://www.pnas.org/content/111/35/E3596.full

    Oh, and don't tell me "muh politically correct research, blah, blah, blah". Nothing prevents hereditarian "scholars" from getting the relevant degrees and getting funding from the pioneer fund.

    We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis

    Do you have an opinion about the validity of this study? It is a grossly underdetermined system of 200k binary variables and only a 4k sample. To validate these results one must have a sample that is significantly larger than the number of variables.

    • Replies: @Double Juice JJ
    Yes this is a solid study, with a huge and positive citation feedback:

    https://scholar.google.fr/scholar?cites=6393909358148752848&as_sdt=2005&sciodt=0,5&hl=en

    (2000+ citations)

    Nothing comparable to the above preprint that made hereditarians' day.
  46. @res
    Thanks for telling me about that paper. It was new to me. Here is a more direct link: https://www.nature.com/nature/journal/v542/n7640/full/nature21039.html
    Their Figure 1 is very interesting. I did not realize the trend was that strong towards larger effect sizes at lower frequencies. That makes sense to me for the negative variants (rare harmful mutations), but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these "height SNPs." There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.

    I agree rare variants matter. The question is: how much? If we can trust the Visscher GCTA results, they are explaining 45% of the height variance with SNPs. That leaves 35% variance from additive genetic effects to be explained.

    How much % variance explained can a variant that appears in <0.5% of people offer? % variance explained depends on both effect size and MAF. There would need to be a long tail of rare variants to make up for the low frequency. And I am not seeing that long tail in Figure 1, though who knows if the increasing effect size continues for even lower MAF.

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)

    Thank you res, you’re a polite debater. Sorry if I’m condescending and aggressive.

    but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these “height SNPs.” There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.

    Well, there is nothing automatic about natural selection. Genotypes are only selected if they provide a substantial survival and reproductive advantage. Which doesn’t seem to be the case with IQ and height in most populations. You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve. Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short…) with very low average. IQ doesn’t follow the pattern either. Those traits tend to be unmalleable: no Flynn effect on skin color or eye shape. And genetic studies easily spot signals of selection.

    Those complex traits have a completely different genetic architecture.

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)

    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.

    • Replies: @res

    You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve.

     

    But how do you normalize this? Large and small relative to what? And you seem to be assuming a single optimal value (which would drive low variation). In different environments there might be different tradeoffs for things like IQ/metabolic cost/brain+hip size.

    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.
     
    I think you are reading too much into needing a larger sample size to find the individual SNPs. If you believe the Visscher height GCTA then you should also believe the cognitive ability GCTA analyses (I linked one above) which indicate significant variance explained by common SNPs.

    Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short…)
     
    You mean like a >1SD difference in IQ between SSA and northern populations? (sorry, but you can't expect me not to respond to such a provocative comment ; )
  47. @res

    Quoting studies is good, understanding them is better
     
    Indeed. Perhaps I am being dense, but where is my lack of understanding you are implying? Please quote my words with which you disagree.

    It’ll stay pathetic as long as you can’t come up with something like that in explaining group differences.
     
    That is an interesting redirect. It simultaneously tells me what you care about here and explains why I was so confused by your objections (I missed that that was your focus).

    I will remind you I originally objected to "But the hits can only explain 4% of the variance. That’s pathetic." which had nothing to do with group differences.

    Oh, and don’t tell me “muh politically correct research, blah, blah, blah”. Nothing prevents hereditarian “scholars” from getting the relevant degrees and getting funding from the pioneer fund.
     
    Except for wanting to avoid career suicide. And getting academic advisers and funders to sign off on controversial research. If you are an academic you know this better than I do. Here I just have to conclude you are being disingenuous. Do we really need to have a discussion about Jason Richwine and his much less contentious research? Or James Watson, who I would have thought was in an unassailable position?

    How about we agree to disagree regarding the genetics of group differences and talk about other aspects of GWAS? You clearly have a lot of knowledge to offer on this topic. Unless "muh non-PC comments" have been enough to get me blacklisted already.

    P.S. Thanks for the pygmy link. Different selection pressures operating on different populations is an important driver of genetic divergence IMHO.

    Perhaps I am being dense, but where is my lack of understanding you are implying? Please quote my words with which you disagree.

    I have to disagree when you say height research is facing the same missing heritability crisis because of sample size. It’s not.

    I will remind you I originally objected to “But the hits can only explain 4% of the variance. That’s pathetic.” which had nothing to do with group differences.

    Oh yeah, only reacting to the cringy hereditarian euphoria of the post and comments section. Nothing personal.

    Except for wanting to avoid career suicide. And getting academic advisers and funders to sign off on controversial research. If you are an academic you know this better than I do.

    I am not an academic and I don’t know much more than you. All I know is that the pioneer fund is generously granting scientific racist “research”, that guys like Rushton/Flushton, Harpending/Harpoondick or Hsu/Shoe are doing their stuff un-bothered.

    Do we really need to have a discussion about Jason Richwine and his much less contentious research? Or James Watson, who I would have thought was in an unassailable position?

    They did no research. They only threw random comments on race.

    How about we agree to disagree regarding the genetics of group differences and talk about other aspects of GWAS?

    Sounds good.

    • Replies: @res

    I have to disagree when you say height research is facing the same missing heritability crisis because of sample size. It’s not.
     
    Where did I say that? There was a reason I asked for you to quote my words.

    Sounds good.
     
    OK. I'm even giving you last word on that except to note I disagree with "They only threw random comments on race."

    P.S. You are aware that of your three examples (Rushton, Harpending, Hsu) two are dead, right? So as far as "doing their stuff un-bothered" not so much anymore.
  48. Human height is a composite measurement, reflecting the sum of leg, spine, and head lengths. Many common variants influence total height, but the effects of these or other variants on the components of height (body proportion) remain largely unknown. We studied sitting height ratio (SHR), the ratio of sitting height to total height, to identify such effects in 3,545 African Americans and 21,590 individuals of European ancestry. We found that SHR is heritable: 26% and 39% of the total variance of SHR can be explained by common variants in European and African Americans, respectively, and global European admixture is negatively correlated with SHR in African Americans (r2 ≈ 0.03). Six regions reached genome-wide significance (p < 5 × 10−8) for association with SHR and overlapped biological candidate genes, including TBX2 and IGFBP3. We found that 130 of 670 height-associated variants are nominally associated (p < 0.05) with SHR, more than expected by chance (p = 5 × 10−40). At these 130 loci, the height-increasing alleles are associated with either a decrease (71 loci) or increase (59 loci) in SHR, suggesting that different height loci disproportionally affect either leg length or spine/head length. Pathway analyses via DEPICT revealed that height loci affecting SHR, and especially those affecting leg length, show enrichment of different biological pathways (e.g., bone/cartilage/growth plate pathways) than do loci with no effect on SHR (e.g., embryonic development). These results highlight the value of using a pair of related but orthogonal phenotypes, in this case SHR with height, as a prism to dissect the biology underlying genetic associations in polygenic traits and diseases.

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570286/

    look at how height research is productive without needing foolish sample sizes…

    • Replies: @res
    There is a lot going on in that study. If I understand correctly the high variance explained quoted comes from GCTA. Compare that to this cognitive ability GCTA study with an even smaller sample size: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3652710/

    Common DNA Markers Can Account for More Than Half of the Genetic Influence on Cognitive Abilities

    For nearly a century, twin and adoption studies have yielded substantial estimates of heritability for cognitive abilities, although it has proved difficult for genomewide-association studies to identify the genetic variants that account for this heritability (i.e., the missing-heritability problem). However, a new approach, genomewide complex-trait analysis (GCTA), forgoes the identification of individual variants to estimate the total heritability captured by common DNA markers on genotyping arrays. In the same sample of 3,154 pairs of 12-year-old twins, we directly compared twin-study heritability estimates for cognitive abilities (language, verbal, nonverbal, and general) with GCTA estimates captured by 1.7 million DNA markers. We found that DNA markers tagged by the array accounted for .66 of the estimated heritability, reaffirming that cognitive abilities are heritable. Larger sample sizes alone will be sufficient to identify many of the genetic variants that influence cognitive abilities.

     

    Also notice that your study uses a 253k sample size study (reference 2): https://www.ncbi.nlm.nih.gov/pubmed/25282103
    to get a set of height alleles it uses. A little misleading to present it as an exemplar of small sample sizes when it relies on data from a much larger sample size study for some of its conclusions.

    But all of that said, as I note above it looks like IQ studies do require larger sample sizes for equivalent results. It is just that the difference is not that big (say a factor of 2-4?).
    , @utu
    I am surprised that you are not more skeptical about height studies. You apparently believe all the GWAS and GCTA hype and are totally oblivious to the serious mathematical challenges that face both methods, and do not seem to envision the possibilities of result tweaking and falsification. The only thing you do not like is that IQ is being studied with GWAS and GCTA. You think that all other missing heritability gaps will be closed except for the IQ. You are very naive. Your line of attack on the IQ business is shortsighted. The missing heritability gaps will be closed by hook or by crook. The whole system is grossly underdetermined, with too many degrees of freedom, which leaves plenty of room for the true believers to obtain the desired results in support of their dogma. There are no skeptics left in the field who could watch what the hands of the true believers are doing.

    You are in the same league as res. You two just differ about the IQ stuff.

  49. @res
    That's a fair question. First, I was mostly disagreeing with your first paragraph (and last sentence, which I in turn consider laughable). For the first part of your last paragraph I might quibble about "easy" and add a caveat about "of sufficient effect size", but I think you are basically on target there.

    So the part I disagree with that is worth discussing:

    No, only my point is true and explains the method that’s used by specialists. Peter Johnson is just giving the usual alt-right paranoid whining.
     
    OK. Let me try to quote or restate the three points (with numbers for easy reference) I see explaining why studies are done on uniform white populations. If either you or Peter disagree with my version please correct me.

    1. Double Juice JJ (comment 28): "it's called avoiding confounds. If the samples included individuals of various races, ancestry-related alleles would mistakenly pass for intelligence alleles."

    2. Peter Johnson (comment 11): "Usually they restrict the data to the European-race subsample since that has the largest subsample."

    3. Peter Johnson (comment 11): "This sample restriction also has the political advantage that it keeps the researchers away from troubling findings, in terms of noticing gene-linked race differences in intelligence."

    You and I agree 1. is true. I would argue there is another issue there with things like different allele frequencies and linkage disequilibrium between the races complicating the study assumptions and interpretation.

    Regarding 2., that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?

    Regarding 3., there we have to rely on "the dog that didn't bark". A 1 SD difference in phenotypic IQ between blacks and whites provides an interesting genetic question. Why have only a few heretics looked into it? Do you have an explanation other than 3.? Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics. While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )

    I sincerely have trouble understanding how people with a good understanding of genetics can be so dismissive of GWAS given the demonstrated trends we have seen with increasing sample sizes resulting in more SNP hits and more variance explained. Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don't find Dr. Thompson's post and the evidence I have given persuasive.

    Many people dismissive of GWAS seem to have gotten stuck in the negative results they heard about in the early 2000s (before correcting for multiple hypothesis testing was common), but I think that can usually be seen by the arguments used and am not seeing that here.

    Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics.

    Uh! I missed that… Are you talking about Piffer’s ridiculous numerology that never makes it through peer review? Even the very HBD-friendly intelligence journal trashed his “work”. Piffer uses a dozen pathetic GWAS hits found in a European sample, he associates them with similarly pathetic IQ data and then claims they are part of a “naturally selected polygenic genotype” without elaborating on the signals of natural selection and then goes whining “muh PC journals won’t publish me, so f**k peer review”.

    Who does he think he’s fooling? It doesn’t work like that. You need to replicate the hits in other populations, which will also lead to discovering novel population-specific loci (especially when it comes to Africans and their huge genetic diversity). And a complementary admixture/genetic distance analysis will come to confirm the pattern. And of course, you need a much better measure than Lynn’s laughable global IQ data.

    • Replies: @res
    Leaving aside Piffer's work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.

    There is much more work to do to understand the relevant IQ SNPs in the different populations, but surely at least one of those SNPs will be relevant in both populations?
  50. res says:
    @Double Juice JJ

    that seems to be common practice in many studies (e.g. medical). Why do you think it is not applicable here?
     
    Well, height research uses racially homogeneous samples too, and they find things like that:

    Height is a complex trait under strong genetic influence. To date, numerous genetic loci have been associated with height in individuals of European ancestry. However, few large-scale discovery genome-wide association studies (GWAS) of height in minority populations have been conducted and thus information about population-specific height regulation is limited. We conducted a GWA analysis of height in 8149 African-American (AA) women from the Women's Health Initiative. Genetic variants with P< 5 × 10−5 (n = 169) were followed up in a replication data set (n = 20 809) and meta-analyzed in a total of 28 958 AAs and African-descent individuals. Twelve single-nucleotide polymorphisms (SNPs) representing 7 independent loci were significantly associated with height at P < 5 × 10−8. We identified novel SNPs in 17q23 (TMEM100/PCTP) and Xp22.3 (ARSE) reflecting population-specific regulation of height in AAs and replicated five loci previously reported in European-descent populations [4p15/LCORL, 11q13/SERPINH1, 12q14/HMGA2, 17q23/MAP3K3 (mitogen-activated protein kinase3) and 18q21/DYM]. In addition, we performed an admixture mapping analysis of height which is both complementary and supportive to the GWA analysis and suggests potential associations between ancestry and height on chromosomes 4 (4q21), 15 (15q26) and 17 (17q23). Our findings provide insight into the genetic architecture of height and support the investigation of non-European-descent populations for identifying genetic factors associated with complex traits. Specifically, we identify new loci that may reflect population-specific regulation of height and report several known height loci that are important in determining height in African-descent populations.
     
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3259012/

    While we are on this topic, would your 1. even be applicable if there was no difference between the IQ means of the races? Perhaps that is just a polite way of saying 3.? ; )
     
    If the phenotype doesn't differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don't need to control for race.
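    The confounding argument above can be checked with a small calculation. The sketch below is only an illustration under stated assumptions: two generic subpopulations pooled in proportions w and 1−w, a SNP with no causal effect on the trait, and genotype independent of phenotype within each subpopulation. By the law of total covariance, the pooled covariance between allele count and phenotype is then w·(1−w)·(2p1−2p2)·(μ1−μ2). All names and numbers are hypothetical.

```python
def pooled_covariance(w: float, p1: float, p2: float, mu1: float, mu2: float) -> float:
    """Covariance of allele count G and phenotype Y in the pooled sample.

    Computed from first moments, assuming G and Y are independent within
    each subpopulation (the SNP is non-causal). p1 and p2 are allele
    frequencies; mu1 and mu2 are subpopulation phenotype means.
    """
    e_g = w * 2.0 * p1 + (1.0 - w) * 2.0 * p2
    e_y = w * mu1 + (1.0 - w) * mu2
    e_gy = w * (2.0 * p1) * mu1 + (1.0 - w) * (2.0 * p2) * mu2
    return e_gy - e_g * e_y


def spurious_covariance(w: float, p1: float, p2: float, mu1: float, mu2: float) -> float:
    """Closed form of the same quantity via the law of total covariance."""
    return w * (1.0 - w) * (2.0 * p1 - 2.0 * p2) * (mu1 - mu2)


if __name__ == "__main__":
    # Means differ: a non-causal marker still covaries with the phenotype.
    print(pooled_covariance(0.5, 0.2, 0.6, 0.0, 1.0))
    # Means equal: the spurious covariance vanishes.
    print(pooled_covariance(0.5, 0.2, 0.6, 1.0, 1.0))
```

    The product form shows both conditions are needed for a spurious signal: the allele frequency must differ between the subpopulations and the phenotype mean must differ as well. If either difference is zero, the non-causal marker shows no pooled association.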

    Height serves as a great example which you seem to acknowledge. I would be very interested in better understanding why you don’t find Dr. Thompson’s post and the evidence I have given persuasive.
     
    It's an error to assume height and intelligence research are comparable. Height research doesn't require unreasonable sample sizes to find substantial heritability.

    Well, height research uses racially homogeneous samples too, and they find things like that:

    I see that your link is for a study on individuals from African descent, but I don’t see how that counters what I said.

    If the phenotype doesn’t differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don’t need to control for race.

    OK. Then we agree on that. Thanks.

    It’s an error to assume height and intelligence research are comparable. Height research doesn’t require unreasonable sample sizes to find substantial heritability.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”? The GIANT study you linked has a sample size of 250,000 people (more than the study in this post) and explained 20% of heritability (in contrast to the 4% for this IQ study).

    The question is how comparable are IQ and height research (i.e. it is not a binary comparable or not question). The main differences I see are that height has a higher genetic heritability and is more easily and accurately measurable (as you noted earlier). These are both going to increase the sample size needed for an IQ study to explain as much heritability. In addition, the ease and frequency of measuring height means it is far easier to get good large sample data.

    Based on the estimates in http://www.biorxiv.org/content/early/2017/08/11/175406 (also linked above) it looks like explaining 50% of heritability will take an IQ sample of 800k vs. a height sample of 350k. That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.

    • Replies: @Double Juice JJ

    The GIANT study you linked has a sample size of 250,000 people (more than the study in this post)
     
    What? This post's study samples 280K individuals and manages to explain just 4% of the variance.

    That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.
     
    This is completely unreasonable. Never saw such a sample size requirement for any other trait (even behavioral).
    , @Double Juice JJ

    I see that your link is for a study on individuals from African descent, but I don’t see how that counters what I said.
     
    They explicitly state that studies had only been carried out on Caucasian samples and that minority samples would be needed to discover novel population-specific variants.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”?
     
    No.

    I'm just thinking this way in comparison to research on other traits for which no clueless bloggers are making excuses, and that actually don't need such excuses.
  51. @utu

    We estimated the proportion of variance for human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis
     
    Do you have an opinion about the validity of this study? It is a grossly underdetermined system of 200k binary variables and only a 4k sample. To validate these results one must have a sample that is significantly larger than the number of variables.

    Yes this is a solid study, with a huge and positive citation feedback:

    https://scholar.google.fr/scholar?cites=6393909358148752848&as_sdt=2005&sciodt=0,5&hl=en

    (2000+ citations)

    Nothing comparable to the above preprint that made hereditarians’ day.

  54. res says:
    @Double Juice JJ

    Note that the differences between races in MAF for the IQ SNPs are well known but little commented on outside of us heretics.
     
    Uh! I missed that... Are you talking about Piffer's ridiculous numerology that never makes it through peer review? Even the very HBD-friendly intelligence journal trashed his "work". Piffer uses a dozen pathetic GWAS hits found in a European sample, he associates them with similarly pathetic IQ data and then claims they are part of a "naturally selected polygenic genotype" without elaborating on the signals of natural selection and then goes whining "muh PC journals won't publish me, so f**k peer review".

    Who does he think he's fooling? It doesn't work like that. You need to replicate the hits in other populations, which will also lead to discovering novel population-specific loci (especially when it comes to Africans and their huge genetic diversity). And a complementary admixture/genetic distance analysis will come to confirm the pattern. And of course, you need a much better measure than Lynn's laughable global IQ data.

    Leaving aside Piffer’s work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.

    There is much more work to do to understand the relevant IQ SNPs in the different populations, but surely at least one of those SNPs will be relevant in both populations?

    • Replies: @Double Juice JJ

    Leaving aside Piffer’s work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.
     
    Well, no, the MAF calculation is wrong according to intelligence journal reviewers.

    You can read his rant here: https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/

    And the corrected data here: https://topseudoscience.wordpress.com/2016/01/14/using-derived-alleles-to-amplify-selection-signatures-on-intelligence/

    And yet again, it's only done using European GWAS hits. Population-specific variants can greatly change the data, especially when the differences in polygenic score are so low.

    Piffer is a clown. This one needs a nickname too.
  55. utu says:
    @Double Juice JJ

    Human height is a composite measurement, reflecting the sum of leg, spine, and head lengths. Many common variants influence total height, but the effects of these or other variants on the components of height (body proportion) remain largely unknown. We studied sitting height ratio (SHR), the ratio of sitting height to total height, to identify such effects in 3,545 African Americans and 21,590 individuals of European ancestry. We found that SHR is heritable: 26% and 39% of the total variance of SHR can be explained by common variants in European and African Americans, respectively, and global European admixture is negatively correlated with SHR in African Americans (r2 ≈ 0.03). Six regions reached genome-wide significance (p < 5 × 10−8) for association with SHR and overlapped biological candidate genes, including TBX2 and IGFBP3. We found that 130 of 670 height-associated variants are nominally associated (p < 0.05) with SHR, more than expected by chance (p = 5 × 10−40). At these 130 loci, the height-increasing alleles are associated with either a decrease (71 loci) or increase (59 loci) in SHR, suggesting that different height loci disproportionally affect either leg length or spine/head length. Pathway analyses via DEPICT revealed that height loci affecting SHR, and especially those affecting leg length, show enrichment of different biological pathways (e.g., bone/cartilage/growth plate pathways) than do loci with no effect on SHR (e.g., embryonic development). These results highlight the value of using a pair of related but orthogonal phenotypes, in this case SHR with height, as a prism to dissect the biology underlying genetic associations in polygenic traits and diseases.
     
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4570286/

    look at how height research is productive without needing foolish sample sizes...

    I am surprised that you are not more skeptical about height studies. You apparently believe all the GWAS and GCTA hype and are totally oblivious to the serious mathematical challenges that face both methods, and do not seem to envision the possibilities of result tweaking and falsification. The only thing you do not like is that IQ is being studied with GWAS and GCTA. You think that all other missing heritability gaps will be closed except for the IQ. You are very naive. Your line of attack on the IQ business is shortsighted. The missing heritability gaps will be closed by hook or by crook. The whole system is grossly underdetermined, with too many degrees of freedom, which leaves plenty of room for the true believers to obtain the desired results in support of their dogma. There are no skeptics left in the field who could watch what the hands of the true believers are doing.

    You are in the same league as res. You two just differ about the IQ stuff.

    • Replies: @Double Juice JJ
    Yes exactly, I believe in the validity of GWAS and GCTA. I just don't buy the sample size excuse of IQ researchers.
  56. res says:
    @Double Juice JJ
    Thank you res, you're a polite debater. Sorry if I'm condescending and aggressive.

    but I would have expected the positive variants to be selected for (i.e. increased in MAF) unless there is a countervailing force. It does raise an interesting question about whether height is the primary survival related issue with these “height SNPs.” There is also a question of whether the larger effect size trend is partly an artifact of only being able to detect large effect sizes at that MAF.
     
    Well, there is nothing automatic about natural selection. Genotypes are only selected if they provide a substantial survival and reproductive advantage. Which doesn't seem to be the case with IQ and height in most populations. You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve. Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short...) with very low average. IQ doesn't follow the pattern either. Those traits tend to be unmalleable: no Flynn effect on skin color or eye shape. And genetic studies easily spot signals of selection.

    Those complex traits have a completely different genetic architecture.

    Am I confused or does that paper actually help explain why large sample sizes are important? (i.e. rather than being something to ridicule)
     
    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.

    You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve.

    But how do you normalize this? Large and small relative to what? And you seem to be assuming a single optimal value (which would drive low variation). In different environments there might be different tradeoffs for things like IQ/metabolic cost/brain+hip size.

    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.

    I think you are reading too much into needing a larger sample size to find the individual SNPs. If you believe the Visscher height GCTA then you should also believe the cognitive ability GCTA analyses (I linked one above) which indicate significant variance explained by common SNPs.

    Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short…)

    You mean like a >1SD difference in IQ between SSA and northern populations? (sorry, but you can’t expect me not to respond to such a provocative comment ; )

    • Replies: @Double Juice JJ

    But how do you normalize this? Large and small relative to what? And you seem to be assuming a single optimal value (which would drive low variation).
     
    Well, natural selection is all about selecting for a very narrow range of superior fitness in adaptation to a given environment.

    In different environments there might be different tradeoffs for things like IQ/metabolic cost/brain+hip size.
     
    Or there might not be such tradeoffs.

    I think you are reading too much into needing a larger sample size to find the individual SNPs. If you believe the Visscher height GCTA then you should also believe the cognitive ability GCTA analyses (I linked one above) which indicate significant variance explained by common SNPs.
     
    I agree with them, except I'm bothered by the absence of specific identified loci, which makes it difficult to prove causality beyond mere statistical relationship.

    However, the heritability of cognitive ability appears to be much more modest than usually stated.
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3652710/table/table1-0956797612457952/

    You mean like a >1SD difference in IQ between SSA and northern populations?
     
    No, I mean like a >∞SD difference in skin color so that a native equatorial African is never lighter skinned than any European. No overlap, no narrowing gap (reduced by 1/3 in the US, negligible in Europe), just something as stable and divergent as skin color, hair texture, facial features.
  57. @utu
    human height explained by 294,831 SNPs genotyped on 3,925 unrelated individuals using a linear model analysis,

    Is this for real? One can assign a random sequence of numbers to 3,925 individuals and find 200,000 SNP's with a polygenic score that predicts this sequence exactly.
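
    The worry in that quoted comment can be made concrete with a rank argument. This is only a sketch under assumptions that are not in the thread: an ordinary least-squares fit with every SNP entered jointly as a fixed effect, and a genotype matrix $X$ (a symbol introduced here) whose columns, together with an intercept, span all of $\mathbb{R}^n$. It does not describe the random-effect model that GCTA fits.

```latex
% n individuals, m SNPs, phenotype y in R^n, design matrix A = [1  X] of size n x (m+1)
\hat{y} = P\,y, \qquad P = A\,(A^{\top}A)^{+}A^{\top}, \qquad A = [\mathbf{1}\;\; X]
% P is the orthogonal projection onto the column space of A.
% If rank(A) = n, which requires m >= n - 1, the column space is all of R^n and P = I_n:
\hat{y} = y \quad\Longrightarrow\quad R^2 = 1 \ \text{ for every non-constant } y, \text{ including pure noise.}
% With the figures quoted above, n = 3925 and m = 294831, so m >= n - 1 holds by a wide margin.
```

    Under those assumptions an exact in-sample fit carries no information about the trait, which is why the out-of-sample behaviour of a predictor is the relevant check.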

    Less than half of the variance in human height is explained with hundreds of thousands of variants. Will IQ be the same way?

    Though there was a decent study on height heritability:

    … all independent variants, known and novel together explained 27.4% of heritability. By comparison, the 697 known height SNPs explain 23.3% of height heritability in the same dataset (vs. 4.1% by the new height variants identified in this ExomeChip study)” (pg 7).

    https://serval.unil.ch/resource/serval:BIB_CB04B9543EC2.P001/REF

  58. res says:
    @utu
    I looked at Visscher's GCTA and tried to understand it w/o much success so far. However, I thought the overfitting might be a problem. And then I found this paper:

    Limitations of GCTA as a solution to the missing heritability problem
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4711841/
    Here, we show that GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability. We show first that GCTA depends sensitively on all singular values of a high-dimensional genetic relatedness matrix (GRM). When the assumptions in GCTA are satisfied exactly, we show that the heritability estimates produced by GCTA will be biased and the standard errors will likely be inaccurate. When the population is stratified, we find that GRMs typically have highly skewed singular values, and we prove that the many small singular values cannot be estimated reliably. Hence, GWAS data are necessarily overfit by GCTA which, as a result, produces high estimates of heritability. We also show that GCTA’s heritability estimates are sensitive to the chosen sample and to measurement errors in the phenotype.

    That is interesting. Did you follow the associated controversy (yellow box at the top)? The response from the GCTA authors is pretty harsh by research paper standards: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987770/

    In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).

    There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used the incorrect expected mean and SD of $\hat{\sigma}^2_{\text{subset}}$ for comparison with those values observed from resampling. Hence, their conclusion about biasedness of GREML estimates is not supported by empirical evidence.

    • Replies: @utu
    Yes, I have read their response. I do not know who is right in this dispute. I do not have sufficient knowledge of the nitty gritty of the GCTA method to be able to make judgments, but I welcomed the fact that somebody (Kumar) stepped out and made claims that sounded right to me because they confirmed my mathematical intuitions, chiefly about the overfitting, which I believe might be a serious issue. I wish we had more people from within the field who could offer some criticism and force the insiders to work a bit harder and to maintain high integrity. I know from personal experience what groupthink is and what harm it can do to the integrity of research. The practitioners of GCTA should be subjected to blind tests of analyzing real and fake data sets without knowing where they came from or what they represent. For example, in the height study with over 200k SNPs and 4k subjects they obtained 45% heritability. I would ask them to do the same for the same set of data where height was perturbed by noise of different magnitudes, to see how the heritability would change and, more importantly, how the set of the culprit SNPs would change. This is the issue of stability and robustness. One can think of many different tests that should be performed.
    , @utu
    I found this but I could not get the copy to read it:

    http://www.biorxiv.org/content/early/2016/02/13/039594
    Response to Commentary on "Limitations of GCTA as a solution to the missing heritability problem"

    In a recent manuscript, Yang and colleagues criticized our paper, "Limitations of GCTA as a solution to the missing heritability problem". Here we show that their main claims are statistically invalid, and our results hold as stated.
    , @utu
    Here is an interesting article that explains the overfitting issue which I keep yapping about. The formula presented there also shows why the data set must be large to reduce the overfitting effect: when one uses many SNPs, the effect is compounded in proportion to the number of SNPs. This is all intuitively obvious, but the actual formulas are not. The derivation of the formulas requires some assumptions, like normal distribution, which are not always valid.

    Pitfalls of predicting complex traits from SNPs
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4096801/

    If the correlation (R) between a phenotype and a single SNP in the population is zero (that is, the SNP is not associated with the trait), the expected value of the squared correlation (R²) estimated from a sample of size N is 1/(N-1), or approximately 1/N if N is large. Hence, a randomly chosen ‘candidate’ (but not truly associated) SNP explains 1/N of variation in any sample. Usually 1/N is small enough not to worry about. However, a set of m uncorrelated SNPs that have nothing to do with a phenotype of interest would, when fitted together, explain m/N of variation (due to the summing of their effects). For example, a set of 100 independent SNPs when fitted together in a regression analysis in a discovery sample of Nd = 1000 would, on average, explain R² = 10% of phenotypic variance in the discovery sample under the null hypothesis of no true association.
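
    The m/N figure in the passage above is easy to check by simulation. What follows is a minimal sketch, not anything taken from the cited paper: the function names, the allele frequency of 0.5, and the sample sizes are all assumptions chosen for illustration. It fits a pure-noise phenotype on m unrelated SNPs by least squares (via Gram-Schmidt, to stay within the standard library) and averages the in-sample R² over many replicates.

```python
import random

def centered(v):
    mean = sum(v) / len(v)
    return [x - mean for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def r_squared(snps, phenotype):
    """In-sample R^2 of a least-squares fit of phenotype on all SNP columns plus an intercept."""
    y = centered(phenotype)
    basis = []  # orthonormal basis for the centered SNP columns (modified Gram-Schmidt)
    for col in snps:
        v = centered(col)
        for q in basis:
            proj = dot(v, q)
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-9:  # skip constant or linearly dependent columns
            basis.append([vi / norm for vi in v])
    # variance explained = squared length of the projection of y onto the SNP subspace
    return sum(dot(y, q) ** 2 for q in basis) / dot(y, y)

def mean_null_r2(n, m, reps, seed=0):
    """Average in-sample R^2 when none of the m SNPs is related to the phenotype."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # hypothetical genotypes: 0/1/2 copies of an allele with frequency 0.5
        snps = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
                for _ in range(m)]
        phenotype = [rng.gauss(0.0, 1.0) for _ in range(n)]  # pure noise
        total += r_squared(snps, phenotype)
    return total / reps

if __name__ == "__main__":
    n, m = 100, 10
    print("simulated mean R^2:", round(mean_null_r2(n, m, reps=300), 3))
    print("m / (n - 1):       ", round(m / (n - 1), 3))
```

    The two printed numbers should land close to each other. The sketch only covers the fixed-effect regression case the quoted passage describes.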

    Applying the incorrect validation procedure results in over-estimation of the accuracy of the prediction (or over-fitting). An example of where over-fitting occurs is when testing the prediction in the discovery sample, i.e., the same data are used to estimate the effect of SNPs on phenotype and to make predictions
     
    And one more thing. They call the two sets of data the discovery sample (on which you create the model) and the validation sample (on which you confirm your model). I wonder what criteria are used based on correlations r_discovery and r_validation if sample sizes are N_discovery and N_validation. Which correlation is reported as the final one? And what if the sets overlap?

    A less obvious mistake is to select the most significantly associated SNPs in the entire sample and to use these to estimate SNP effects and test their prediction accuracy in the discovery and validation sets [55]. In this case the variance explained by the SNPs when applied in the validation sample is inflated. It creates bias and misleading results because the initial selection step of the SNPs is based upon there being a chance correlation between these SNPs and the entire sample, so also between the SNPs and any sub-sample
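
    The selection leak described in the passage above can also be reproduced with pure-noise data. This is a hedged sketch rather than the procedure of any particular study: the helper names, the 50/50 split, the choice of the top 5 SNPs, and the use of marginal slopes as SNP weights are all assumptions made for illustration. It compares picking SNPs on the entire sample with picking them on the discovery half only, then scores both in the validation half.

```python
import random

def corr(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def slope(x, y):
    """Marginal least-squares slope of y on x (0.0 if x is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    return 0.0 if sxx == 0 else sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def validation_r2(snps, y, select_on_full_sample, k=5):
    """Squared correlation between a k-SNP score and the phenotype in the validation half."""
    half = len(y) // 2
    y_disc, y_val = y[:half], y[half:]
    disc = [col[:half] for col in snps]
    val = [col[half:] for col in snps]
    # Step 1: rank SNPs by |correlation|, either on everyone (leaky) or on discovery only
    if select_on_full_sample:
        strength = [abs(corr(col, y)) for col in snps]
    else:
        strength = [abs(corr(col, y_disc)) for col in disc]
    chosen = sorted(range(len(snps)), key=lambda j: strength[j], reverse=True)[:k]
    # Step 2: estimate SNP weights in the discovery half only
    betas = [slope(disc[j], y_disc) for j in chosen]
    # Step 3: build the score in the validation half and evaluate it there
    score = [sum(b * val[j][i] for b, j in zip(betas, chosen)) for i in range(len(y_val))]
    return corr(score, y_val) ** 2

def simulate(reps=100, n=60, m=100, seed=0):
    """Return (mean leaky R^2, mean clean R^2) when no SNP has any true effect."""
    rng = random.Random(seed)
    leaky = clean = 0.0
    for _ in range(reps):
        snps = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
                for _ in range(m)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        leaky += validation_r2(snps, y, select_on_full_sample=True)
        clean += validation_r2(snps, y, select_on_full_sample=False)
    return leaky / reps, clean / reps

if __name__ == "__main__":
    leaky, clean = simulate()
    print("selection on entire sample:", round(leaky, 3))
    print("selection on discovery only:", round(clean, 3))
```

    Since every SNP here is noise, any validation R² well above chance level in the first line comes from the selection step alone.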
     

    In humans, a polygenic prediction analysis of height in 5,117 individuals from the Framingham Heart Study (FHS; original and offspring cohorts only) reported a prediction R² of 0.25 using 10-fold cross-validation when including all individuals in the analysis [60]. However, because FHS includes many related individuals, the authors repeated the analysis restricting the 10-fold cross-validation samples to individuals with no known close relatives (parent-offspring, sibling, or half-sib) in the data set based on pedigree information. In this restricted analysis, the prediction R² decreased to 0.15. We caution that cryptic relatedness can still inflate prediction accuracy even when known close relatives are excluded.
     
    Another way in which prediction accuracy can be inflated is if the discovery and validation samples contain similar patterns of population stratification and the eventual target population is not similarly stratified. For example, this could occur if discovery and validation samples are independently sampled from a stratified population such as European Americans
  59. @utu
    I am surprised that you are not more skeptical about height studies. You apparently believe all the GWAS and GCTA hype and are totally oblivious to the serious mathematical challenges that face both methods, and you do not seem to envision the possibilities of result tweaking and falsification. The only thing you do not like is that IQ is being studied with GWAS and GCTA. You think that all other missing heritability gaps will be closed except for IQ. You are very naive. Your line of attack on the IQ business is shortsighted. The missing heritability gaps will be closed by hook or by crook. The whole system is grossly underdetermined, with too many degrees of freedom, which leaves plenty of room for the true believers to obtain the desired results in support of their dogma. There are no skeptics left in the field who could watch what the hands of the true believers are doing.

    You are in the same league as res. You two just differ about the IQ stuff.

    Yes exactly, I believe in the validity of GWAS and GCTA. I just don’t buy the sample size excuse of IQ researchers.

    • Replies: @utu
    Yes exactly, I believe in the validity of GWAS and GCTA. I just don’t buy the sample size excuse of IQ researchers.

    Clearly you do believe. It would be nice if you made an attempt to understand what you believe, but be careful, because it might be a traumatic event for you.

    The sample size might actually not be an excuse. The lower the sample size, the easier it is to show high correlation and high heritability. With a bigger sample it is harder for them. They could probably wing it more easily with a lower sample size by overfitting, which is probably what Visscher's people are guilty of anyway. The reason they want a large sample size is that they need to identify many more SNPs. So far they went as high as 10,000 SNPs and got 4%. The height took 200,000 SNPs to get 45% heritability. And you are very happy, Mr. Believer, with the height result, right? So why do you not want to let the IQ-ists have 200,000 or more SNPs?
  60. @res

    You can primarily test whether a phenotype is selected by looking at the variance. Natural selection results in very low variance, nothing like the huge IQ bell curve.

     

    But how do you normalize this? Large and small relative to what? And you seem to be assuming a single optimal value (which would drive low variation). In different environments there might be different tradeoffs for things like IQ/metabolic cost/brain+hip size.

    Large sample sizes are ridiculous when someone is stubbornly trying to prove that common variants are responsible for most variance yet can only explain 4% of it. There is a moment when you just need to stop grasping at straws.
     
    I think you are reading too much into needing a larger sample size to find the individual SNPs. If you believe the Visscher height GCTA then you should also believe the cognitive ability GCTA analyses (I linked one above) which indicate significant variance explained by common SNPs.

    Also, when a trait is selected, you notice very high between group differences and consistent geographic distribution (equatorial populations are dark, rainforest dwellers are short…)
     
    You mean like a >1SD difference in IQ between SSA and northern populations? (sorry, but you can't expect me not to respond to such a provocative comment ; )

    But how do you normalize this? Large and small relative to what? And you seem to be assuming a single optimal value (which would drive low variation).

    Well, natural selection is all about selecting for a very narrow range of superior fitness in adaptation to a given environment.

    In different environments there might be different tradeoffs for things like IQ/metabolic cost/brain+hip size.

    Or there might not be such tradeoffs.

    I think you are reading too much into needing a larger sample size to find the individual SNPs. If you believe the Visscher height GCTA then you should also believe the cognitive ability GCTA analyses (I linked one above) which indicate significant variance explained by common SNPs.

    I agree with them, except I’m bothered by the absence of specific identified loci, which makes it difficult to prove causality beyond mere statistical relationship.

    However, the heritability of cognitive ability appears to be much more modest than usually stated.

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3652710/table/table1-0956797612457952/

    You mean like a >1SD difference in IQ between SSA and northern populations?

    No, I mean like a >∞SD difference in skin color so that a native equatorial African is never lighter skinned than any European. No overlap, no narrowing gap (reduced by 1/3 in the US, negligible in Europe), just something as stable and divergent as skin color, hair texture, facial features.

  61. @res
    Leaving aside Piffer's work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.

    There is much more work to do to understand the relevant IQ SNPs in the different populations, but surely at least one of those SNPs will be relevant in both populations?

    Leaving aside Piffer’s work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.

    Well, no the MAF calculation is wrong according to intelligence journal reviewers.

    You can read his rant here: https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/

    And the corrected data here: https://topseudoscience.wordpress.com/2016/01/14/using-derived-alleles-to-amplify-selection-signatures-on-intelligence/

    And yet again, it’s only done using European GWAS hits. Population-specific variants can greatly change the data, especially when the differences in polygenic score are so low.

    Piffer is a clown. This one needs a nickname too.

    • Replies: @res

    Piffer is a clown. This one needs a nickname too.
     
    From my point of view statements like that just indicate someone does not have real counter arguments to offer. Such statements seem to be epidemic these days. Friends don't let friends act like SJWs.

    Well, no the MAF calculation is wrong according to intelligence journal reviewers.
     
    So are you saying the MAFs given by 1000 genomes and SNPedia are wrong? Pretend Piffer does not exist. Take the SNPs from the IQ studies and look at the MAFs in 1000 genomes and SNPedia. Do you think they are identical between groups?
  62. @res

    Well, height research uses racially homogeneous samples too, and they find things like that:
     
    I see that your link is for a study on individuals from African descent, but I don't see how that counters what I said.

    If the phenotype doesn’t differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don’t need to control for race.
     
    OK. Then we agree on that. Thanks.

    It’s an error to assume height and intelligence research are comparable. Height research doesn’t require unreasonable sample sizes to find substantial heritability.
     
    Perhaps you could offer a more rigorous definition of "unreasonable sample sizes" and "substantial heritability"? The GIANT study you linked has a sample size of 250,000 people (more than the study in this post) and explained 20% of heritability (in contrast to the 4% for this IQ study).

    The question is how comparable are IQ and height research (i.e. it is not a binary comparable or not question). The main differences I see are that height has a higher genetic heritability and is more easily and accurately measurable (as you noted earlier). These are both going to increase the sample size needed for an IQ study to explain as much heritability. In addition, the ease and frequency of measuring height means it is far easier to get good large sample data.

    Based on the estimates in http://www.biorxiv.org/content/early/2017/08/11/175406 (also linked above) it looks like explaining 50% of heritability will take an IQ sample of 800k vs. a height sample of 350k. That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.

    The GIANT study you linked has a sample size of 250,000 people (more than the study in this post)

    What? This post’s study samples 280K individuals and manages to explain just 4% of the variance.

    That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.

    This is completely unreasonable. Never saw such a sample size requirement for any other trait (even behavioral).

    • Replies: @res

    What? This post’s study samples 280K individuals and manages to explain just 4% of the variance.
     
    My mistake. I misremembered it as 230k (which I think corresponds to one of the other studies I mentioned).

    Never saw such a sample size requirement for any other trait (even behavioral).
     
    Maybe you can have a debate with hyperbola about that.
  63. Where did I say that? There was a reason I asked for you to quote my words.

    You literally said height and IQ research were comparable.

    OK. I’m even giving you last word on that except to note I disagree with “They only threw random comments on race.”

    Show me their peer-reviewed papers on the topic.
    And their arrest warrant from the PC inquisition.

    P.S. You are aware that of your three examples (Rushton, Harpending, Hsu) two are dead, right? So as far as “doing their stuff un-bothered” not so much anymore.

    Yeah, but you know what I mean. Both Flushton and Harpoondick died from a natural cause, without ever being fired from their universities or anything. Academic freedom, a lot of charlatans get away with junk research. As far as steve shoe, he’s even an administrator or something.

    • Replies: @res

    You literally said height and IQ research were comparable.
     
    If it was so literal you shouldn't have trouble pointing to a quote, should you? This eagerness to "rebut" strawman paraphrases is another tell that the debate is not worth continuing.

    I assume you are referring to my comment 31 where I said: "That is not the case for height" in response to: "Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences."

    I stand by that statement. Do you stand by that last quote from you? You do understand that "additive heritability" already excludes epigenetics and environmental influences, right?

    Height is an imperfect but useful analogy for IQ genetics. Since height is easier to measure and more heritable the genetic research for height proceeds more quickly (e.g. smaller sample sizes), but the progress over time seems similar. IQ is just delayed (sample sizes) and will have a smaller max % variance explained (lower heritability).

    Show me their peer-reviewed papers on the topic.
    And their arrest warrant from the PC inquisition.
     
    Yeah, James Watson is clearly not a real scientist on the topic of genetics. LOL!
    And that second sentence is another indicator of you being disingenuous on this topic. We both know the PC inquisition does not rely on arrest warrants ; ) It is more about mob justice (aka lynching).

    Academic freedom, a lot of charlatans get away with junk research. As far as steve shoe, he’s even an administrator or something.
     
    Ad hominems. The best way of declaring one has lost the reasoned argument part of the conversation.
  64. @res

    Well, height research uses racially homogeneous samples too, and they find things like that:
     
    I see that your link is for a study on individuals from African descent, but I don't see how that counters what I said.

    If the phenotype doesn’t differ by ancestry, there is no risk of mistaking neutral ancestry markers for genes that act on phenotype. So you don’t need to control for race.
     
    OK. Then we agree on that. Thanks.

    It’s an error to assume height and intelligence research are comparable. Height research doesn’t require unreasonable sample sizes to find substantial heritability.
     
    Perhaps you could offer a more rigorous definition of "unreasonable sample sizes" and "substantial heritability"? The GIANT study you linked has a sample size of 250,000 people (more than the study in this post) and explained 20% of heritability (in contrast to the 4% for this IQ study).

    The question is how comparable are IQ and height research (i.e. it is not a binary comparable or not question). The main differences I see are that height has a higher genetic heritability and is more easily and accurately measurable (as you noted earlier). These are both going to increase the sample size needed for an IQ study to explain as much heritability. In addition, the ease and frequency of measuring height means it is far easier to get good large sample data.

    Based on the estimates in http://www.biorxiv.org/content/early/2017/08/11/175406 (also linked above) it looks like explaining 50% of heritability will take an IQ sample of 800k vs. a height sample of 350k. That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.

    I see that your link is for a study on individuals from African descent, but I don’t see how that counters what I said.

    They explicitly state that studies had only been carried out on Caucasian samples and that minority samples would be needed to discover novel population-specific variants.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”?

    No.

    I’m just thinking this way in comparison to research on other traits, for which no clueless bloggers are making excuses and which actually don’t need such excuses.

    • Replies: @res

    They explicitly state that studies had only been carried out on Caucasian samples and that minority samples would be needed to discover novel population-specific variants.
     
    Yes. And where exactly does that disagree with what I said? We were discussing the reasons studies are typically done on the larger (in the countries doing most of the studies) white subpopulation. You are giving the reason that is not sufficient. They are complementary (not opposing) points.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”?

    No.
     

     
    That says a lot about how worthwhile it is to continue this debate beyond this point. Thanks for the clarity.
  65. res says:
    @Double Juice JJ

    Leaving aside Piffer’s work, it is easy enough to go to the 1000 genomes browser (or SNPedia etc.) and look at the MAF for different IQ SNPs and see how much they vary between populations. Surely that is an interesting observation? It would be an amazing coincidence to have that and a zero overall contribution of genes to group differences in IQ both be true.
     
    Well, no the MAF calculation is wrong according to intelligence journal reviewers.

    You can read his rant here: https://topseudoscience.wordpress.com/2016/01/10/the-forbidden-paper-on-the-population-genetics-of-iq/

    And the corrected data here: https://topseudoscience.wordpress.com/2016/01/14/using-derived-alleles-to-amplify-selection-signatures-on-intelligence/

    And yet again, it's only done using European GWAS hits. Population-specific variants can greatly change the data, especially when the differences in polygenic score are so low.

    Piffer is a clown. This one needs a nickname too.

    Piffer is a clown. This one needs a nickname too.

    From my point of view statements like that just indicate someone does not have real counter arguments to offer. Such statements seem to be epidemic these days. Friends don’t let friends act like SJWs.

    Well, no the MAF calculation is wrong according to intelligence journal reviewers.

    So are you saying the MAFs given by 1000 genomes and SNPedia are wrong? Pretend Piffer does not exist. Take the SNPs from the IQ studies and look at the MAFs in 1000 genomes and SNPedia. Do you think they are identical between groups?

  66. utu says:
    @Double Juice JJ
    Yes exactly, I believe in the validity of GWAS and GCTA. I just don't buy the sample size excuse of IQ researchers.

    Yes exactly, I believe in the validity of GWAS and GCTA. I just don’t buy the sample size excuse of IQ researchers.

    Clearly you do believe. It would be nice if you made an attempt to understand what you believe, but be careful, because it might be a traumatic event for you.

    The sample size might actually not be an excuse. The lower the sample size, the easier it is to show high correlation and high heritability. With a bigger sample it is harder for them. They could probably wing it more easily with a lower sample size by overfitting, which is probably what Visscher’s people are guilty of anyway. The reason they want a large sample size is that they need to identify many more SNPs. So far they went as high as 10,000 SNPs and got 4%. The height took 200,000 SNPs to get 45% heritability. And you are very happy, Mr. Believer, with the height result, right? So why do you not want to let the IQ-ists have 200,000 or more SNPs?

    • Replies: @hyperbola
  67. hyperbola says:
    @res
    When you are making an argument and using the literature as evidence it is your obligation to support your argument with specific citations. And more typically, specific excerpts. Let's see how one might do that. It is frustrating to have to do your work for you.

    Let's examine one of the statements from your comment 24:

    Then look up what percentage of the “disease load” of human beings these necessary and sufficient genes account for (ca. 1%).
     
    Then take a look at the abstract for your second reference (emphasis mine): https://www.ncbi.nlm.nih.gov/pubmed/27818248

    Almost two decades after the identification of SNCA as the first causative gene in Parkinson's disease (PD) and the subsequent understanding that genetic factors play a substantial role in PD development, our knowledge of the genetic architecture underlying this disease has vastly improved. Approximately 5-10% of patients suffer from a monogenic form of PD where autosomal dominant mutations in SNCA, LRRK2, and VPS35 and autosomal recessive mutations in PINK1, DJ-1, and Parkin cause the disease with high penetrance.

     

    Sounds more like evidence against your quoted statement than evidence for it.

    I don't have good library access at the moment and it appears medical research believes in restricted availability so that is a problem as you note. But as we will see my access appears adequate for the task at hand.

    Happily your first reference has full text available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3385936/

    There we find:

    In PD for example only ~60% of heritability is understood, depending on the population studied
     
    Gosh, only 60%. That paper talks more about future possibilities than GWAS problems AFAICT, but perhaps you can supply a quote in your support from it?

    Your third reference also has full text available: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3783830/

    The abstract is a bit more supportive of your assertions. Emphasis mine, but note the "most".

    Although unbiased genome-wide association studies (GWAS) have identified novel associations to neurodegenerative diseases, most of these hits explain only modest fractions of disease heritability. In addition, despite the substantial overlap of clinical and pathologic features among major neurodegenerative diseases, surprisingly few GWAS-implicated variants appear to exhibit cross-disease association. These realities suggest limitations of the focus on individual genetic variants and create challenges for the development of diagnostic and therapeutic strategies
     
    But looking closer we find:

    For example, although up to 60-80% of AD risk is estimated to derive from genetic factors [14], known genes including the uniquely large effect of APOE (apolipoprotein E) account for just half of this genetic variance

     

    Gosh, only half of 60-80%. Rather far from 1% it seems.

    Your fourth reference is interesting and freely available so here is a link: http://www.ams.org/journals/bull/2003-40-01/S0273-0979-02-00965-5/
    But it is from 2003 (lacks current genetic knowledge) and seems to focus on the unknowability of complex systems, so I'll just note there is a difference between the glass being half full (or empty) and 1% full.

    Your fifth reference only appears to address your point tangentially, but feel free to correct me: https://tbiomed.biomedcentral.com/articles/10.1186/1742-4682-7-20

    Your final reference also seems to be tangential: http://journal.frontiersin.org/article/10.3389/fgene.2012.00067/full

    Don't get me wrong, those last three references are incredibly fascinating from a philosophy of biology point of view. It's just that I don't think they work for making your point beyond being able to throw FUD around.

    I will leave it up to you (and other readers) to decide whether I (and you) am (are) capable of correctly reading and interpreting this literature.

    Next time at least try to find references that support rather than refute your position.

    You present seriously distorted arguments that are clearly intentional misrepresentations. Your arguments are not made credible by such practices. I am not interested much in batting down your attempts at cherry picking of sentences. So I simply note:

    Ref. 2. ONLY 5-10% of Parkinson's disease can be explained by genes that approach the necessary and sufficient criteria despite over a decade of GWAS studies. This means that 90-95% cannot. For Parkinson's, numerous genes have been identified that are neither necessary nor sufficient, i.e. indicate dependence on large networks of genes.

    References 4-6 provide you with a beginning to thinking about why we should normally expect that complex traits (e.g. “general cognitive function”) will be based on large numbers of genes with limited influence of single genes. The exceptions are the “rare disease” cases that constitute about 1% of human disease load (and yes, on occasion these may have been classified by medical doctors within a more inclusive category of disease, e.g. the 5% of “Parkinson's”).

    Sorry you weren’t up to understanding why refs 4-6 were included. “Unknowability in complex systems” should become a major criterion in research funding decisions! Especially for “outcomes” as nebulously defined as “general cognitive function”.

    • Replies: @res

  68. hyperbola says:
    @utu
    Getting ever more SNPs is not a step forward if it means that there are ever more influences on heritability. It is for example extremely deleterious for the “promised” land of personalized medicine. And, of course, you are right about the “under-determination” of such studies.

    Personally I think the way this will go forward is much broader than simple DNA sequencing. This will never do much more than supply some starting points for more relevant studies. Something along the lines suggested by this paper.

    Scholz SW, Mhyre T, Ressom H, Shah S, Federoff HJ. 2012. Genomics and
    bioinformatics of Parkinson’s disease. Cold Spring Harb Perspect Med
    2:a009449.

    In fact, I would almost predict that eventually gene sequencing will become totally irrelevant even as a diagnostic tool. If we can find ways to monitor crucial network function, then using gene sequencing to look for thousands (millions?) of variant genes (some of which may not have been previously identified in the context of the individual being diagnosed) is a chimerical waste of time.

    • Replies: @res
  69. res says:
    @Double Juice JJ

    The GIANT study you linked has a sample size of 250,000 people (more than the study in this post)
     
    What? This post's study samples 280K individuals and manages to explain just 4% of the variance.

    That does not seem especially unreasonable to me, but we will have to wait and see how the estimates match reality.
     
    This is completely unreasonable. Never saw such a sample size requirement for any other trait (even behavioral).

    What? This post’s study samples 280K individuals and manages to explain just 4% of the variance.

    My mistake. I misremembered it as 230k (which I think corresponds to one of the other studies I mentioned).

    Never saw such a sample size requirement for any other trait (even behavioral).

    Maybe you can have a debate with hyperbola about that.

  70. res says:
    @Double Juice JJ

    I see that your link is for a study on individuals from African descent, but I don’t see how that counters what I said.
     
    They explicitly state that studies had only been carried on Caucasian samples and that minority samples would be needed to discover novel population specific variants.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”?
     
    No.

    I'm just thinking this way in comparison to research on other traits for which no clueless bloggers are making excuses, and that actually don't need such excuses.

    They explicitly state that studies had only been carried on Caucasian samples and that minority samples would be needed to discover novel population specific variants.

    Yes. And where exactly does that disagree with what I said? We were discussing the reasons studies are typically done on the larger (in the countries doing most of the studies) white subpopulation. You are giving the reason why that is not sufficient. They are complementary (not opposing) points.

    Perhaps you could offer a more rigorous definition of “unreasonable sample sizes” and “substantial heritability”?

    No.

    That says a lot about how worthwhile it is to continue this debate beyond this point. Thanks for the clarity.

  71. res says:
    @Double Juice JJ

    Where did I say that? There was a reason I asked for you to quote my words.
     
    You literally said height and IQ research were comparable.

    OK. I’m even giving you last word on that except to note I disagree with “They only threw random comments on race.”
     
    Show me their peer-reviewed papers on the topic.
    And their arrest warrant from the PC inquisition .

    P.S. You are aware that of your three examples (Rushton, Harpending, Hsu) two are dead, right? So as far as “doing their stuff un-bothered” not so much anymore.
     
    Yeah, but you know what I mean. Both Flushton and Harpoondick died from a natural cause, without ever being fired from their universities or anything. Academic freedom, a lot of charlatans get away with junk research. As far as steve shoe, he's even an administrator or something.

    You literally said height and IQ research were comparable.

    If it was so literal you shouldn’t have trouble pointing to a quote, should you? This eagerness to “rebut” strawman paraphrases is another tell that the debate is not worth continuing.

    I assume you are referring to my comment 31 where I said: “That is not the case for height” in response to: “Whatever heritability exists must be mostly due to rare variants and confounded by epigenetics and environmental influences.”

    I stand by that statement. Do you stand by that last quote from you? You do understand that “additive heritability” already excludes epigenetics and environmental influences, right?

    Height is an imperfect but useful analogy for IQ genetics. Since height is easier to measure and more heritable, the genetic research for height proceeds more quickly (e.g. with smaller sample sizes), but the progress over time seems similar. IQ is just delayed (sample sizes) and will have a smaller max % variance explained (lower heritability).

    Show me their peer-reviewed papers on the topic.
    And their arrest warrant from the PC inquisition .

    Yeah, James Watson is clearly not a real scientist on the topic of genetics. LOL!
    And that second sentence is another indicator of you being disingenuous on this topic. We both know the PC inquisition does not rely on arrest warrants ; ) It is more about mob justice (aka lynching).

    Academic freedom, a lot of charlatans get away with junk research. As far as steve shoe, he’s even an administrator or something.

    Ad hominems. The best way of declaring one has lost the reasoned argument part of the conversation.

  72. res says:
    @hyperbola
    You present seriously distorted arguments that are clearly intentional misrepresentations. Your arguments are not made credible by such practices. I am not interested much in batting down your attempts at cherry picking of sentences. So I simply note:

    I think the validity of the arguments in my comment 29 can easily be compared to your comment 71 by others, but a few thoughts.

    My “distorted” quotes vs. your vague paraphrases. I think we can all judge for ourselves the relative worth of those.

    “not interested” = unable to.

    If my arguments are such distortions and misrepresentations (when you don’t have data on your side pound the table and use pejorative language) they should be easy to rebut.

    Ref. 2. ONLY 5-10% of Parkinson's disease can be explained by genes that approach the necessary and sufficient criteria despite over a decade of GWAS studies. This means that 90-95% cannot.

    Conveniently ignoring your refs 1. and 3. and you accuse me of cherry picking?! LOL! Projection is a terrible thing.

    And remember, 5-10% >> 1% which was the number of yours I quoted in comment 29. Are you disavowing that statement now in favor of 5-10%?

    Sorry you weren’t up to understanding why refs 4-6 were included. “Unknowability in complex systems” should become a major criterion in research funding decisions!

    Since you’ve proved you aren’t up to actually debating the facts I think I’ll start with the ad hominems now, hyperbole (1%! the name is well earned). You might want to investigate the psychological concept of projection before any more discourse on my perceived inadequacies.

    “Unknowability” probably does not mean what you think. Inability to know the detailed state and mechanisms down to the fundamental particle level does not imply we are unable to do useful science and engineering with the things we do know. To think otherwise is simply ignorant. But then I guess physiology, thermodynamics, etc. are all worthless.

    • Replies: @hyperbola
    Not interested was the correct statement. Your attempts to cherry-pick individual sentences are just obnoxious. As others (such as utu) have also pointed out, statements that you like (e.g. "60% of variability is explainable") become pretty meaningless when the number of genes required to reach that level runs to several hundred.
  73. Logan says:
    @nickels
    I don't know enough about epigenetics to know how far the factory analogy goes.
    I do worry about people who are starting to use epigenetics to argue bizarre social justice constructs.

    From my preliminary reading there seems to be discrete information coded into the cell beyond DNA that directs certain functions of cell structure and organism growth.

    So does the environment play a role, or is it just more deterministic info like DNA? Not sure.

    I don’t know either, and I don’t think anyone does.

    But I think it’s reasonably clear that DNA doesn’t decide everything. It probably limits the potential, but does not enforce it.

    IOW, my DNA probably decides that my maximum attainable height will be 6′ and my maximum attainable IQ will be 120. But whether I achieve those maxima depends on a host of environmental factors, pre and post natal, we don’t understand well at all.

    • Replies: @res

  74. res says:
    @hyperbola
    Direct link to your reference: https://www.ncbi.nlm.nih.gov/pubmed/22762024

    Last sentence in the abstract:

    Herein we discuss how neurogenomics and bioinformatics are applied to dissect the nature of this complex disease with the overall aim of developing rational therapeutic interventions.

    What is it with you and the self-refuting references? Do you not understand what the papers say or do you just think the rest of us will blindly bow down to you because you give a proper citation? You do understand that the SNP detection is a necessary but not sufficient part of this process, right?

    I don’t think anyone here is arguing that finding more SNPs will be the end of the process. But it is a key step towards understanding the relationship of genetics to various traits (and biological mechanisms) and hopefully eventually offers (as described in your reference) the opportunity for therapeutic interventions, as well as a greater understanding of reality (which I think is fair to say is the basic goal of science).

    In fact, I would almost predict that eventually gene sequencing will become totally irrelevant even as a diagnostic tool.

    This might be the funniest statement I have read here. It sure seems counter to current trends. A good one to revisit in 10 or 20 years.

    For some counterexamples, do you think people will stop using APOE as an Alzheimer's vulnerability screen? Do you think JScreen is a waste of time? https://jscreen.org/faq/

    Look, I understand genetic testing is not the be-all and end-all of understanding biology. There are reasons PKU screening is done by measuring phenylalanine levels in the blood (which rise when the enzyme phenylalanine hydroxylase is not working) rather than just looking at SNPs (e.g. a rare non-SNP variant breaking the enzyme production). And you are correct about the importance of integrating our understanding of genetics with other aspects of human biology. FWIW I have taken multiple systems biology courses and that integration is a major focus, as is the integration between the multiple levels of biology (subcellular, cellular, tissue, organ, organism).

    The reason the genetic screens will likely always remain relevant is because of the predictive power they offer before the human entity even exists. They currently allow screening potential parents for risks (JScreen) and may eventually allow various uses of PGD (Preimplantation genetic diagnosis).

    • Replies: @hyperbola
    You are once again cherry-picking in your first sentences. Following the introduction, the paper includes extensive discussion of why genetic approaches like measurement of SNPs will always be inadequate.

    No matter how many SNPs you measure, they will never be a complete set for all human beings and in many cases they may be totally irrelevant to the genetic context of the individual for which you propose to carry out a diagnosis/cure. This is the unavoidable conclusion of finding genetic variants that are neither necessary nor sufficient. It is intimately related to the creation of robustness in biological systems through complexity/redundancy of networks. That very complexity means that the kinds of screens you envisage (e.g. JScreen) are likely to be relevant only to "rare" diseases and irrelevant to most of human disease. If the functional "networks" include hundreds of genes (as the original article suggests for "general cognitive function"), then there may well be many thousands, even millions, of SNPs that may somehow affect the network function, i.e. are "associated" with the disease. Many (most?) of these may never be found by population screens that include only an infinitesimal part of the human population. The intertwining of such complex networks also means that those SNPs that you claim to have identified with a particular disease may well have many other unexpected functional consequences. All of this means that, apart from rare diseases, "personalized" medicine and most germ line manipulations are heavily overmarketed and may well be criminal manipulations.
  75. res says:
    @Logan
    But I think it’s reasonably clear that DNA doesn’t decide everything. It probably limits the potential, but does not enforce it.

    IOW, my DNA probably decides that my maximum attainable height will be 6′ and my maximum attainable IQ will be 120. But whether I achieve those maxima depends on a host of environmental factors, pre and post natal, we don’t understand well at all.

    This is one of the most sensible things I have read on this comment thread. Probably a good way of explaining it to people who don’t want more details, but if I might offer a variant.

    DNA decides a distribution of possible heights over possible environments. The characteristics of that distribution are interesting, but unknowable in detail. Let’s assume it is somewhat Gaussian (i.e. concentrated around a center with variation in either direction). Over typical environments DNA will define a likely range of heights around a “typical” height (I suspect the range widths vary: some people will be more sensitive to the environment, some more robust). Specifically targeted interventions in the environment might cause exceptional results (probably easier in the less functional rather than more functional direction).

    As an example with height, sufficient application of human growth hormone should be able to make people (almost, within the range seen over human history) arbitrarily tall.

    As a non-height example, a PKU specific diet might prevent adverse outcomes which would occur in almost any “typical” environment.
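
    The distribution idea above can be sketched as a toy simulation, under the comment's own assumption that the distribution is roughly Gaussian. Two hypothetical genotypes share the same "typical" height but differ in how strongly the environment moves them. The function name, the 178 cm center and the sensitivity values are made up for illustration and are not estimates of anything.

```python
import random
import statistics

def simulate_heights(typical_cm, sensitivity, n_envs, rng):
    """Heights one genotype reaches across randomly drawn 'typical' environments."""
    # The environment is modelled as a standard-normal score (an assumption);
    # sensitivity converts that score into centimetres for this genotype.
    return [typical_cm + sensitivity * rng.gauss(0.0, 1.0) for _ in range(n_envs)]

rng = random.Random(1)  # fixed seed so the run is repeatable
robust = simulate_heights(typical_cm=178.0, sensitivity=1.5, n_envs=10_000, rng=rng)
sensitive = simulate_heights(typical_cm=178.0, sensitivity=4.0, n_envs=10_000, rng=rng)

for label, heights in (("robust", robust), ("sensitive", sensitive)):
    mean = statistics.mean(heights)
    spread = statistics.stdev(heights)
    print(f"{label:9s} mean={mean:.1f} cm  sd={spread:.1f} cm")
```

    Both genotypes center on the same height, but the sensitive one spans a wider range of outcomes. Targeted interventions such as growth hormone or a PKU diet would sit outside this "typical environment" draw and are not modelled here.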

    • Replies: @Logan
    Thanks. Your explanation is much better.

    Sadly, I don't have the background to really explain what I intuitively understand about this issue.

    I listen to books while driving, and I recently listened to the novel Brave New World. In it, they clone people, then damage their IQ to create groups to handle the drone tasks without resenting it or rebelling. (Huxley didn't realize that in the real future there will be little need for drones.)

    But what I took away from it is that, while we don't know how to increase people's IQ except on the margins, we know exactly how to reduce it.
    , @hyperbola
    Your growth hormone example is probably flawed. We already know that network redundancy can allow compensation for lack of individual hormones by others in the appropriate environmental context. Although we can't do the experiment on humans, artificial overabundance of hormones is more than likely subject to compensation. This is what robustness is about.
  76. utu says:
    @res
    That is interesting. Did you follow the associated controversy (yellow box at the top)? The response from the GCTA authors is pretty harsh by research paper standards: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987770/

    In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).
    ...
    There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used the incorrect expected mean and SD of σˆ2subset for comparison with those values observed from resampling. Hence, their conclusion about biasedness of GREML estimates is not supported by empirical evidence.
     

    Yes, I have read their response. I do not know who is right in this dispute. I do not have sufficient knowledge of the nitty-gritty of the GCTA method to make judgments, but I welcomed the fact that somebody (Kumar) stepped out and made claims that sounded right to me because they confirmed my mathematical intuitions, chiefly about overfitting, which I believe might be a serious issue. I wish we had more people from within the field who could offer some criticism and force the insiders to work a bit harder and to maintain high integrity. I know from personal experience what groupthink is and what harm it can do to the integrity of research. The practitioners of GCTA should be subjected to blind tests in which they analyze real and fake data sets without knowing where the data came from or what they represent. For example, in the height study with over 200k SNPs and 4k subjects they obtained 45% heritability. I would ask them to do the same for the same data set with height perturbed by noise of different magnitudes, to see how the heritability would change and, more importantly, how the set of culprit SNPs would change. This is the issue of stability and robustness. One can think of many different tests that should be performed.

  77. utu says:
    @res
    That is interesting. Did you follow the associated controversy (yellow box at the top)? The response from the GCTA authors is pretty harsh by research paper standards: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987770/

    In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).
    ...
    There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used the incorrect expected mean and SD of σˆ2subset for comparison with those values observed from resampling. Hence, their conclusion about biasedness of GREML estimates is not supported by empirical evidence.
     

    I found this, but I could not get a copy to read:

    http://www.biorxiv.org/content/early/2016/02/13/039594

    Response to Commentary on “Limitations of GCTA as a solution to the missing heritability problem”

    In a recent manuscript, Yang and colleagues criticized our paper, “Limitations of GCTA as a solution to the missing heritability problem”. Here we show that their main claims are statistically invalid, and our results hold as stated.

  78. Logan says:
    @res

    But I think it’s reasonably clear that DNA doesn’t decide everything. It probably limits the potential, but does not enforce it.

    IOW, my DNA probably decides that my maximum attainable height will be 6′ and my maximum attainable IQ will be 120. But whether I achieve those maxima depends on a host of environmental factors, pre and post natal, we don’t understand well at all.
     
    This is one of the most sensible things I have read on this comment thread. Probably a good way of explaining it to people who don't want more details, but if I might offer a variant.

    DNA decides a distribution of possible heights over possible environments. The characteristics of that distribution are interesting, but unknowable in detail. Let's assume it is somewhat Gaussian (e.g. concentrated around a center with variation in either direction). Over typical environments DNA will define a likely range of heights (I suspect of varying range widths, some people will be more sensitive to the environment, some more robust) around a "typical" height. Specifically targeted interventions in the environment might cause exceptional results (probably easier in the less functional rather than more functional direction).

    As an example with height, sufficient application of human growth hormone should be able to make people (almost, within the range seen over human history) arbitrarily tall.

    As a non-height example, a PKU specific diet might prevent adverse outcomes which would occur in almost any "typical" environment.

    Thanks. Your explanation is much better.

    Sadly, I don’t have the background to really explain what I intuitively understand about this issue.

    I listen to books while driving, and I recently listened to the novel Brave New World. In it, they clone people, then damage their IQ to create groups to handle the drone tasks without resenting it or rebelling. (Huxley didn’t realize that in the real future there will be little need for drones.)

    But what I took away from it is, that while we don’t know how to increase people’s IQ except on the margins, we know exactly how to reduce it.

  79. hyperbola says:
    @res
    Direct link to your reference: https://www.ncbi.nlm.nih.gov/pubmed/22762024

    Last sentence in the abstract:

    Herein we discuss how neurogenomics and bioinformatics are applied to dissect the nature of this complex disease with the overall aim of developing rational therapeutic interventions.
     
    What is it with you and the self-refuting references? Do you not understand what the papers say or do you just think the rest of us will blindly bow down to you because you give a proper citation? You do understand that the SNP detection is a necessary but not sufficient part of this process, right?

    I don't think anyone here is arguing that finding more SNPs will be the end of the process. But it is a key step towards understanding the relationship of genetics to various traits (and biological mechanisms) and hopefully eventually offers (as described in your reference) the opportunity for therapeutic interventions, as well as a greater understanding of reality (which I think is fair to say is the basic goal of science).

    In fact, I would almost predict that eventually gene sequencing will become totally irrelevant even as a diagnostic tool.
     
    This might be the funniest statement I have read here. It sure seems counter to current trends. A good one to revisit in 10 or 20 years.

    For some counter examples, do you think people will stop using APOE as an Alzheimers vulnerability screen? Do you think JScreen is a waste of time? https://jscreen.org/faq/

    Look, I understand genetic testing is not the be all and end all of understanding biology. There are reasons PKU testing is done by looking for the necessary enzyme (phenylalanine hydroxylase) in the blood rather than just looking at SNPs (e.g. a rare non SNP variant breaking the enzyme production). And you are correct about the importance of integrating our understanding of genetics with other aspects of human biology. FWIW I have taken multiple systems biology courses and that integration is a major focus. (as is the integration between the multiple levels of biology, subcellular, cellular, tissue, organ, organism)

    The reason the genetic screens will likely always remain relevant is because of the predictive power they offer before the human entity even exists. They currently allow screening potential parents for risks (JScreen) and may eventually allow various uses of PGD (Preimplantation genetic diagnosis).

    You are once again cherry-picking in your first sentences. Following the introduction, the paper includes extensive discussion of why genetic approaches like measurement of SNPs will always be inadequate.

    No matter how many SNPs you measure, they will never be a complete set for all human beings and in many cases they may be totally irrelevant to the genetic context of the individual for which you propose to carry out a diagnosis/cure. This is the unavoidable conclusion of finding genetic variants that are neither necessary nor sufficient. It is intimately related to the creation of robustness in biological systems through complexity/redundancy of networks. That very complexity means that the kinds of screens you envisage (e.g. JScreen) are likely to be relevant only to “rare” diseases and irrelevant to most of human disease. If the functional “networks” include hundreds of genes (as the original article suggests for “general cognitive function”), then there may well be many thousands, even millions, of SNPs that may somehow affect the network function, i.e. are “associated” with the disease. Many (most?) of these may never be found by population screens that include only an infinitesimal part of the human population. The intertwining of such complex networks also means that those SNPs that you claim to have identified with a particular disease may well have many other unexpected functional consequences. All of this means that, apart from rare disease, “personalized” medicine and most germ line manipulations are heavily overmarketed and may well be criminal manipulations.

  80. hyperbola says:
    @res

    But I think it’s reasonably clear that DNA doesn’t decide everything. It probably limits the potential, but does not enforce it.

    IOW, my DNA probably decides that my maximum attainable height will be 6′ and my maximum attainable IQ will be 120. But whether I achieve those maxima depends on a host of environmental factors, pre and post natal, we don’t understand well at all.
     
    This is one of the most sensible things I have read on this comment thread. Probably a good way of explaining it to people who don't want more details, but if I might offer a variant.

    DNA decides a distribution of possible heights over possible environments. The characteristics of that distribution are interesting, but unknowable in detail. Let's assume it is somewhat Gaussian (e.g. concentrated around a center with variation in either direction). Over typical environments DNA will define a likely range of heights (I suspect of varying range widths, some people will be more sensitive to the environment, some more robust) around a "typical" height. Specifically targeted interventions in the environment might cause exceptional results (probably easier in the less functional rather than more functional direction).

    As an example with height, sufficient application of human growth hormone should be able to make people (almost, within the range seen over human history) arbitrarily tall.

    As a non-height example, a PKU specific diet might prevent adverse outcomes which would occur in almost any "typical" environment.

    Your growth hormone example is probably flawed. We already know that network redundancy can allow compensation for lack of individual hormones by others in the appropriate environmental context. Although we can’t do the experiment on humans, artificial overabundance of hormones is more than likely subject to compensation. This is what robustness is about.

  81. hyperbola says:
    @res

    You present seriously distorted arguments that are clearly intentional misrepresentations. Your arguments are not made credible by such practices. I am not interested much in batting down your attempts at cherry picking of sentences. So I simply note:
     
    I think the validity of the arguments in my comment 29 can easily be compared to your comment 71 by others, but a few thoughts.

    My "distorted" quotes vs. your vague paraphrases. I think we can all judge for ourselves the relative worth of those.

    "not interested" = unable to.

    If my arguments are such distortions and misrepresentations (when you don't have data on your side pound the table and use pejorative language) they should be easy to rebut.

    Ref. 2. ONLY 5-10% of Parkinsons disease can be explained by genes that approach the necessary and sufficient criteria despite over a decade of GWAS studies. This means that 90-95% cannot.
     
    Conveniently ignoring your refs 1. and 3. and you accuse me of cherry picking?! LOL! Projection is a terrible thing.

    And remember, 5-10% >> 1% which was the number of yours I quoted in comment 29. Are you disavowing that statement now in favor of 5-10%?

    Sorry you weren’t up to understanding why refs 4-6 were included. “Unknowability in complex systems” should become a major criterion in research funding decisions!
     
    Since you've proved you aren't up to actually debating the facts I think I'll start with the ad hominems now, hyperbole (1%! the name is well earned). You might want to investigate the psychological concept of projection before any more discourse on my perceived inadequacies.

    "Unknowability" probably does not mean what you think. Inability to know the detailed state and mechanisms down to the fundamental particle level does not imply we are unable to do useful science and engineering with the things we do know. To think otherwise is simply ignorant. But then I guess physiology, thermodynamics, etc. are all worthless.

    Not interested was the correct statement. Your attempts to cherry-pick individual sentences are just obnoxious. As others (such as utu) have also pointed out, statements that you like (e.g. “60 % of variability is explainable”) become pretty meaningless when the number of genes required to reach that level becomes several hundreds.

  82. res says:
    @hyperbola
    You are once again cherry-picking in your first sentences. Following the introduction, the paper includes extensive discussion of why genetic approaches like measurement of SNPs will always be inadequate.

    No matter how many SNPs you measure, they will never be a complete set for all human beings and in many cases they may be totally irrelevant to the genetic context of the individual for which you propose to carry out a diagnosis/cure. This is the unavoidable conclusion of finding genetic variants that are neither necessary nor sufficient. It is intimately related to the creation of robustness in biological systems through complexity/redundancy of networks. That very complexity means that the kinds of screens you envisage (e.g. JScreen) are likely to be relevant only to "rare" diseases and irrelevant to most of human disease. If the functional "networks" include hundreds of genes (as the original article suggests for "general cognitive function"), then there may well be many thousands, even millions, of SNPs that may somehow affect the network function, i.e. are "associated" with the disease. Many (most?) of these may never be found by population screens that include only an infinitesimal part of the human population. The intertwining of such complex networks also means that those SNPs that you claim to have identified with a particular disease may well have many other unexpected functional consequences. All of this means that, apart from rare disease, "personalized" medicine and most germ line manipulations are heavily overmarketed and may well be criminal manipulations.

    No matter how many SNPs you measure, they will never be a complete set for all human beings and in many cases they may be totally irrelevant to the genetic context of the individual for which you propose to carry out a diagnosis/cure.

    The weakness of this statement (who is asserting there will ever be a complete set?, nice strawman) compared to things like your earlier “1% of variance” makes it clear how little you stand by your earlier statements. You’re just spewing FUD at this point.

    For those who aren’t familiar with the technique, walking back statements in this fashion is a great example of Motte and Bailey: https://rationalwiki.org/wiki/Motte_and_bailey

  83. res says:
    @hyperbola
    Not interested was the correct statement. Your attempts to cherry-pick individual sentences are just obnoxious. As others (such as utu) have also pointed out, statements that you like (e.g. "60 % of variability is explainable") become pretty meaningless when the number of genes required to reach that level becomes several hundreds.

    So now you are engaging in the argumentum ad opinione utu fallacy as well. Good to know.

    You do understand that the references (from you) that I quoted in comment 29 were using a relatively small number of SNPs, right?

    I’m curious, does this style of debate work for you in real life?

    It is funny that you still don’t seem to have figured out that Parkinson’s Disease was a terrible example for you to use given the importance of a relatively small number of SNPs.

  84. res says:
    @hyperbola
    Not interested was the correct statement. Your attempts to cherry-pick individual sentences are just obnoxious. As others (such as utu) have also pointed out, statements that you like (e.g. "60 % of variability is explainable") become pretty meaningless when the number of genes required to reach that level becomes several hundreds.

    Your attempts to cherry-pick individual sentences are just obnoxious.

    As opposed to you never actually quoting something from the papers you cite. Got it.

    Paraphrases > quotes
    No links to references > links to references

    Have I missed any?

  85. res says:
    @hyperbola
    Your growth hormone example is probably flawed. We already know that network redundancy can allow compensation for lack of individual hormones by others in the appropriate environmental context. Although we can't do the experiment on humans, artificial overabundance of hormones is more than likely subject to compensation. This is what robustness is about.

    Your growth hormone example is probably flawed.

    More FUD, but you may be right. Emphasis mine below.

    We already know that network redundancy can allow compensation for lack of individual hormones by others in the appropriate environmental context.

    True. But notice that “lack” is not the same as “overabundance”. Though given that the body is full of feedback systems that is likely true as well for different reasons.

    Although we can’t do the experiment on humans, artificial overabundance of hormones is more than likely subject to compensation.

    Although we might not be able to do it as an experiment plenty of people are dosing children with HGH. It seems to have an effect (i.e. “robustness” has limits). A big problem with HGH is it is (at least has been historically) expensive. Between this and a reasonable concern over side effects I don’t think anyone has ever tried high dose HGH on humans (which is what my thought experiment was about).

    This is what robustness is about.

    Indeed. And the interesting question is how subject that is to overload. I believe we don’t know the answer. Yet?

  86. utu says:
    @hyperbola
    Not interested was the correct statement. Your attempts to cherry-pick individual sentences are just obnoxious. As others (such as utu) have also pointed out, statements that you like (e.g. "60 % of variability is explainable") become pretty meaningless when the number of genes required to reach that level becomes several hundreds.

    I cut in and respond so I won’t have to punish you for invoking my name in vain. Sometimes you must enforce the laws you wrote yourself.

    The blogger res can be a real pain in the neck, but while he is a true believer in the IQ gospel and DNA determinism, he tries to understand things at levels above what is attainable to the mob of IQ aficionados you find at places like unz.com. At rare moments he can display good will, and then he is amenable to mathematical arguments because he seems to have a decent background in math, respects math, and seems to be really curious. If he shed a few of the dogmas that constrain him, his curiosity could flourish. But he is like that pig that looks only for truffles and keeps missing other treasures hidden in the forest.

    In comparison to Double Juice JJ I take res every time. While I share Double Juice JJ's skepticism about IQ stuff, he is a perfect exemplar of 21st-century obrazovanshchina, while res is capable of independent thought and has no fear of traveling into the Verboten Zonen.

  87. utu says:
    @res
    That is interesting. Did you follow the associated controversy (yellow box at the top)? The response from the GCTA authors is pretty harsh by research paper standards: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4987770/

    In a recent publication in PNAS, Krishna Kumar et al. (1) claim that “GCTA applied to current SNP data cannot produce reliable or stable estimates of heritability.” We show below that those claims are false due to their misunderstanding of the theory and practice of random-effect models underlying genome-wide complex trait analysis (GCTA) (2).
    ...
    There are many other errors in the paper by Krishna Kumar et al. (1), as pointed out by us (2) and others (8). In conclusion, Krishna Kumar et al. (1, 5) misunderstood the model and assumptions underlying GCTA-GREML, and therefore used the incorrect expected mean and SD of σˆ2subset for comparison with those values observed from resampling. Hence, their conclusion about biasedness of GREML estimates is not supported by empirical evidence.
     

    Here is an interesting article that explains the overfitting issue which I keep yapping about. The formula presented there also shows why the data set must be large to reduce the overfitting effect, which grows in proportion to the number of SNPs used. This is all intuitively obvious, but the actual formulas are not. Deriving the formulas requires some assumptions, such as normal distribution, which are not always valid.

    Pitfalls of predicting complex traits from SNPs

    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4096801/

    If the correlation (R) between a phenotype and a single SNP in the population is zero (that is, the SNP is not associated with the trait), the expected value of the squared correlation (R^2) estimated from a sample of size N is 1/(N-1), or approximately 1/N if N is large. Hence, a randomly chosen ‘candidate’ (but not truly associated) SNP explains 1/N of variation in any sample. Usually 1/N is small enough not to worry about. However, a set of m uncorrelated SNPs that have nothing to do with a phenotype of interest would, when fitted together, explain m/N of variation (due to the summing of their effects). For example, a set of 100 independent SNPs when fitted together in a regression analysis in a discovery sample of Nd = 1000 would, on average, explain R^2 = 10% of phenotypic variance in the discovery sample under the null hypothesis of no true association.
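
    The m/N figure in the excerpt above can be checked with a quick simulation. What follows is only a minimal sketch and is not taken from the paper: it assumes genotypes coded 0/1/2 with equal probability and a phenotype that is pure Gaussian noise, and the function name null_r2 is made up for illustration. It regresses the noise phenotype on m = 100 unrelated SNPs in samples of N = 1000 and averages the in-sample R^2 over 50 replicates.

```python
import numpy as np

def null_r2(n_samples, n_snps, rng):
    """In-sample R^2 from regressing a pure-noise phenotype on unrelated SNPs."""
    # Genotypes coded 0/1/2 (an assumption), drawn independently of the phenotype.
    X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
    y = rng.standard_normal(n_samples)
    # Add an intercept column and fit ordinary least squares.
    A = np.column_stack([np.ones(n_samples), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    # Residuals have zero mean because of the intercept, so this is 1 - SSR/SST.
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
mean_r2 = np.mean([null_r2(1000, 100, rng) for _ in range(50)])
print(f"mean in-sample R^2 with m=100, N=1000: {mean_r2:.3f}")
```

    Under these assumptions the average should land near m/N = 0.10 even though no SNP has any effect, which is the over-fitting the excerpt describes.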

    Applying the incorrect validation procedure results in over-estimation of the accuracy of the prediction (or over-fitting). An example of where over-fitting occurs is when testing the prediction in the discovery sample, i.e., the same data are used to estimate the effect of SNPs on phenotype and to make predictions

    And one more thing. They call the two sets of data the discovery sample (on which you create the model) and the validation sample (on which you confirm your model). I wonder what criteria are used based on the correlations r_discovery and r_validation if the sample sizes are N_discovery and N_validation. Which correlation is reported as the final one? And what if the sets overlap?

    A less obvious mistake is to select the most significantly associated SNPs in the entire sample and to use these to estimate SNP effects and test their prediction accuracy in the discovery and validation sets (55). In this case the variance explained by the SNPs when applied in the validation sample is inflated. It creates bias and misleading results because the initial selection step of the SNPs is based upon there being a chance correlation between these SNPs and the entire sample, so also between the SNPs and any sub-sample
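
    The selection mistake in the excerpt above is easy to reproduce on fake data. The sketch below is my own illustration under stated assumptions, not the paper's procedure: genotypes are 0/1/2 with equal probability, the phenotype is pure noise, and the helper names top_snps and holdout_corr are hypothetical. It picks the 20 "best" of 2000 SNPs either from the entire sample (leaky) or from the discovery half only (clean), fits on the discovery half, and reports the correlation between predicted and observed phenotype in the validation half.

```python
import numpy as np

def top_snps(X, y, k):
    """Indices of the k SNPs with the largest absolute correlation to y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]

def holdout_corr(X_d, y_d, X_v, y_v):
    """Fit OLS on discovery data; correlate predictions with y on validation data."""
    A_d = np.column_stack([np.ones(len(y_d)), X_d])
    beta, *_ = np.linalg.lstsq(A_d, y_d, rcond=None)
    pred = np.column_stack([np.ones(len(y_v)), X_v]) @ beta
    return np.corrcoef(pred, y_v)[0, 1]

rng = np.random.default_rng(1)
n, m, k, reps = 400, 2000, 20, 20
leaky, clean = [], []
for _ in range(reps):
    X = rng.integers(0, 3, size=(n, m)).astype(float)
    y = rng.standard_normal(n)            # phenotype unrelated to every SNP
    d, v = slice(0, n // 2), slice(n // 2, n)
    bad = top_snps(X, y, k)               # selection has seen the validation half
    good = top_snps(X[d], y[d], k)        # selection uses the discovery half only
    leaky.append(holdout_corr(X[d][:, bad], y[d], X[v][:, bad], y[v]))
    clean.append(holdout_corr(X[d][:, good], y[d], X[v][:, good], y[v]))

leaky_mean, clean_mean = np.mean(leaky), np.mean(clean)
print(f"validation correlation, SNPs selected on entire sample: {leaky_mean:.2f}")
print(f"validation correlation, SNPs selected on discovery only: {clean_mean:.2f}")
```

    With no true signal anywhere, the clean procedure should hover around zero while the leaky one should show a clearly positive validation correlation, purely because the selection step already used the validation subjects.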

    In humans, a polygenic prediction analysis of height in 5,117 individuals from the Framingham Heart Study (FHS; original and offspring cohorts only) reported a prediction R^2 of 0.25 using 10-fold cross-validation when including all individuals in the analysis (60). However, because FHS includes many related individuals, the authors repeated the analysis restricting the 10-fold cross-validation samples to individuals with no known close relatives (parent-offspring, sibling, or half-sib) in the data set based on pedigree information. In this restricted analysis, the prediction R^2 decreased to 0.15. We caution that cryptic relatedness can still inflate prediction accuracy even when known close relatives are excluded.

    Another way in which prediction accuracy can be inflated is if the discovery and validation samples contain similar patterns of population stratification and the eventual target population is not similarly stratified. For example, this could occur if discovery and validation samples are independently sampled from a stratified population such as European Americans

    • Replies: @res
    I get it, utu. I have a pretty good background in machine learning and understand the reflexive concern with N << x (sample size << number of explanatory variables).

    Not sure if you know much about cross validation (mentioned in one of your excerpts), but if not this is worth a look: https://en.wikipedia.org/wiki/Cross-validation_(statistics)

    I think usual practice is to report results based on either cross validation or a held-out test set (validation in your terminology, but notice the nuances in the first answer at https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set ) and to be explicit about which it is. The held-out test set is best, as long as it is large enough. If someone reports results based on their (non-cross validated) training (discovery in your terminology) set with most ML algorithms they should be laughed at.
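
    To make the difference between training-set and cross-validated results concrete, here is a small sketch using plain NumPy. Everything in it is an assumption chosen for illustration (50 SNPs coded 0/1/2, of which the first 5 are causal with unit effects, noise SD of 3, 10 folds, and the helper names), not anyone's published setup. It fits ordinary least squares and compares the in-sample R^2 with the R^2 of out-of-fold predictions.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares coefficients with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def ols_predict(X, beta):
    return np.column_stack([np.ones(len(X)), X]) @ beta

def r2(y, pred):
    """Coefficient of determination; can be negative for poor out-of-sample fits."""
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

def kfold_r2(X, y, k, rng):
    """R^2 of predictions where each subject is predicted from the other folds."""
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        pred[test_idx] = ols_predict(X[test_idx], ols_fit(X[train], y[train]))
    return r2(y, pred)

rng = np.random.default_rng(2)
n, m = 300, 50
X = rng.integers(0, 3, size=(n, m)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 3.0, size=n)   # 5 causal SNPs plus noise

in_sample = r2(y, ols_predict(X, ols_fit(X, y)))
cross_val = kfold_r2(X, y, 10, rng)
print(f"in-sample R^2: {in_sample:.2f}   10-fold cross-validated R^2: {cross_val:.2f}")
```

    The in-sample number should come out noticeably higher than the cross-validated one, which is why the training-set figure on its own says little about prediction.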

    Your points about relatedness and stratification are good ones. Just don't throw the baby out with the bathwater. Misleading results because of a worst case screw up does not mean the entire body of work should be ignored (this sort of thing is why I keep griping about "FUD"). I think people in the field are fairly aware of issues like that, but it's hard to be sure. A lot of now misleading work was done in the 2000s by people who did not understand multiple hypothesis testing issues.

    If you are interested in learning more about this sort of thing from a POV with a good mix of theory and practice, this class is excellent: http://online.stanford.edu/course/statistical-learning-winter-2014
    The textbook is freely available as a PDF and is also excellent, as is its big brother referred to in the StackExchange answer.
    If you are more theoretically inclined this class is also excellent, but is more challenging IMHO (it is a real Caltech class as taught to students there): https://work.caltech.edu/telecourse

    If anyone here can describe the particular ways in which GCTA manages to avoid these issues (or not) I would love to hear it, but for now I am going with people I trust to get the math right who seem to be OK with it.
    , @res
    I just took a closer look at that paper. Did you notice who the final author was by any chance? Peter Visscher.

    Speaking of prediction, I suspect the best prediction algorithms will incorporate the parent's genetic and phenotypic information as well. Idea being that the parent's phenotypic information offers estimates of both the full spectrum of genetic effects AND the environmental effects they experienced (perhaps a decent estimate of child environment?). Then look at the difference in genetic scores between parents and (potential) child to correct the old style mid-parental (e.g. height) estimate.

    Here is a paper discussing the mid-parental height estimate. They give correlation coefficients of about 0.6 for mid-parental height to final height. So an R^2 of about 0.36. Not great, but useful. I wonder if adding the genetic scores would improve that much.

    This technique could be extended to include siblings if they are old enough to give phenotypic information (even the current height percentile of a few year old child would probably give useful information if combined with its genotypic score).
  88. utu says:

    Blind test of efficacy of GWAS and GCTA methods

    Data (genome) generation:

    Generate sequence G(i)={P(k)} where k=1,…,K and each P(k) takes one of the values 0,1,2, or 4 (or only the binary values 0 or 1), assigned randomly with the same probability for each value and each location k. This is the sequence of all SNPs for one subject (individual).

    Repeat the process to generate the sequences G(i) for i=1,…,N. (There are N subjects.)

    Select the subset of active (trait-associated) SNPs

    Randomly select a subset of indices k(j), j=1,…,J.

    Generate trait value:

    For the sequence G(i) calculate polygenic score for each i:

    PS(i)=P(k(1))+….+P(k(J))

    Calculate mean M and variance V from all PS(i)’s

    Scale to mean=0 and SD=1: PS(i)<– [PS(i)-M]/sqrt(V)

    Add environment factor as Gaussian noise to the score:

    PS(i)<–PS(i)+Gnoise(0,SD)
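
    As a rough illustration of the generation steps above, here is a minimal Python sketch. It assumes NumPy, restricts genotype values to 0, 1 and 2, and uses far smaller sizes than the N=10^6 proposed; the function name simulate_trait and its parameters are made up for this example.

```python
import numpy as np

def simulate_trait(n_subjects, n_snps, n_causal, noise_sd, seed=0):
    """Simulate genotypes and a trait following the steps listed above."""
    rng = np.random.default_rng(seed)
    # Genotypes: each SNP is 0, 1 or 2 with equal probability (an assumption).
    genotypes = rng.integers(0, 3, size=(n_subjects, n_snps), dtype=np.int8)
    # Randomly choose which SNPs are "active" (trait-associated).
    causal_idx = rng.choice(n_snps, size=n_causal, replace=False)
    # Polygenic score: simple unweighted sum over the active SNPs.
    score = genotypes[:, causal_idx].sum(axis=1).astype(float)
    # Scale the score to mean 0 and SD 1.
    score = (score - score.mean()) / score.std()
    # Add the environment factor as Gaussian noise.
    trait = score + rng.normal(0.0, noise_sd, size=n_subjects)
    return genotypes, causal_idx, score, trait

genotypes, causal_idx, score, trait = simulate_trait(
    n_subjects=5_000, n_snps=2_000, n_causal=100, noise_sd=1.0
)
# Share of trait variance carried by the score; should be near 1 / (1 + noise_sd**2).
h2_realized = score.var() / trait.var()
print(f"realized share = {h2_realized:.3f}, 1/(1+SD^2) = {1 / (1 + 1.0**2):.3f}")
```

    The sketch only produces the data. Whether GWAS or GCTA recovers causal_idx or the variance share from such data is the open question the comment poses.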

    Data sizes

    The number of subjects (sample size) should be large, N=10^6. But we will give the researchers subsets smaller than N, say n=10^4, 5*10^4, 10^5, 2*10^5, 5*10^5…, to test how their solutions depend on the sample size.

    The number J of SNPs associated with the trait will vary from J=1000 to, say, 200,000. As J grows, the difficulty increases with the number of candidate subsets, the binomial coefficient C(K,J) (the Newton symbol).

    The tests should be run for various n numbers of subjects, for various J numbers of trait associated SNPs and for various standard deviations SD.

    Generating such a database is trivial. A couple of hours on a laptop?

    Solutions?

    Will GWAS and/or GCTA identify the sequence {k(j)} of active SNPs and will it estimate the explainable variance (heritability): h^2=1/(1+SD^2)?

    How would GWAS and GCTA perform? How would they find, say, J=200,000 SNPs from among K=10^6 if there are about 10^217319 possible combinations?
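
    The size of that search space can be checked with the binomial coefficient C(K,J). A minimal sketch using only the Python standard library; the helper name log10_binomial is made up for this example, and log-gamma is used because the number itself is far too large to print.

```python
import math

def log10_binomial(n, k):
    """Return log10 of the binomial coefficient C(n, k) via log-gamma."""
    ln_c = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return ln_c / math.log(10)

# Exponent for choosing J = 200,000 active SNPs out of K = 10^6.
exponent = log10_binomial(10**6, 200_000)
print(f"C(10^6, 200000) is about 10^{exponent:.2f}")
```

    The exponent should come out close to the one quoted in the comment. Note that this counts subsets for an exhaustive search, which is not necessarily how an association method proceeds.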

    Keep in mind that although the data are constructed in a unique way, that does not guarantee that the solution is unique. There might be more than one set of SNPs that produces the same polygenic scores. Let them find all solutions or deal with it as a linkage disequilibrium issue.

    In which cases will they overestimate the heritability? The researchers will have no a priori knowledge of heritability. We will tell them we could not find twins for these traits.

    Without doing exhaustive tests like these and publishing the results I will not have confidence in GWAS and GCTA methods and neither should you. Is Peter Visscher or any of his flunkeys reading this blog?

  89. utu says:
    @utu
    The test can be made more difficult by calculating the polygenic score as a weighted sum instead of a simple sum.

    Furthermore, nonlinear interaction effects between SNPs can be generated by making the weights depend on the values of other SNPs.

  90. res says:
    @utu
    Here is an interesting article that explains the overfitting issue I keep yapping about. The formula presented there also shows why the data set must be large to limit overfitting, an effect that grows in proportion to the number of SNPs used. This is all intuitively obvious, but the actual formulas are not. Deriving the formulas requires some assumptions, such as normality, which are not always valid.

    Pitfalls of predicting complex traits from SNPs
    https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4096801/

    If the correlation (R) between a phenotype and a single SNP in the population is zero (that is, the SNP is not associated with the trait), the expected value of the squared correlation (R^2) estimated from a sample of size N is 1/(N-1), or approximately 1/N if N is large. Hence, a randomly chosen ‘candidate’ (but not truly associated) SNP explains 1/N of variation in any sample. Usually 1/N is small enough not to worry about. However, a set of m uncorrelated SNPs that have nothing to do with a phenotype of interest would, when fitted together, explain m/N of variation (due to the summing of their effects). For example, a set of 100 independent SNPs when fitted together in a regression analysis in a discovery sample of Nd = 1000 would, on average, explain R^2 = 10% of phenotypic variance in the discovery sample under the null hypothesis of no true association.

    Applying the incorrect validation procedure results in over-estimation of the accuracy of the prediction (or over-fitting). An example of where over-fitting occurs is when testing the prediction in the discovery sample, i.e., the same data are used to estimate the effect of SNPs on phenotype and to make predictions
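
    The m/N effect described in these excerpts is easy to reproduce. The following is a minimal sketch, not the paper's own procedure: it assumes NumPy, ordinary least squares, genotypes coded 0, 1 or 2, and a phenotype that is pure noise. The names squared_corr and null_r2 are made up for this example.

```python
import numpy as np

def squared_corr(y, y_hat):
    """Squared correlation between observed and predicted phenotype."""
    return np.corrcoef(y, y_hat)[0, 1] ** 2

def null_r2(n_discovery=1000, n_validation=1000, m_snps=100, seed=0):
    """Fit m null SNPs in a discovery sample, then score both samples."""
    rng = np.random.default_rng(seed)
    n_total = n_discovery + n_validation
    snps = rng.integers(0, 3, size=(n_total, m_snps)).astype(float)
    phenotype = rng.normal(size=n_total)  # independent of every SNP
    design = np.column_stack([np.ones(n_total), snps])  # intercept + SNPs
    disc = slice(0, n_discovery)
    val = slice(n_discovery, n_total)
    # Effects are estimated on the discovery sample only.
    beta, *_ = np.linalg.lstsq(design[disc], phenotype[disc], rcond=None)
    return (squared_corr(phenotype[disc], design[disc] @ beta),
            squared_corr(phenotype[val], design[val] @ beta))

results = np.array([null_r2(seed=s) for s in range(20)])
mean_disc, mean_val = results.mean(axis=0)
print(f"mean discovery R^2 = {mean_disc:.3f}, mean validation R^2 = {mean_val:.4f}")
```

    With 100 SNPs and 1000 discovery subjects the in-sample figure should sit near 10% even though nothing is associated, while the figure from the independent validation sample should stay near zero.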
     
    And one more thing. They call the two sets of data the discovery sample (on which you create the model) and the validation sample (on which you confirm your model). I wonder what criteria are applied to the correlations r_discovery and r_validation when the sample sizes are N_discovery and N_validation. Which correlation is reported as the final one? And what if the sets overlap?

    A less obvious mistake is to select the most significantly associated SNPs in the entire sample and to use these to estimate SNP effects and test their prediction accuracy in the discovery and validation sets (ref. 55). In this case the variance explained by the SNPs when applied in the validation sample is inflated. It creates bias and misleading results because the initial selection step of the SNPs is based upon there being a chance correlation between these SNPs and the entire sample, so also between the SNPs and any sub-sample
     

    In humans, a polygenic prediction analysis of height in 5,117 individuals from the Framingham Heart Study (FHS; original and offspring cohorts only) reported a prediction R^2 of 0.25 using 10-fold cross-validation when including all individuals in the analysis (ref. 60). However, because FHS includes many related individuals, the authors repeated the analysis restricting the 10-fold cross-validation samples to individuals with no known close relatives (parent-offspring, sibling, or half-sib) in the data set based on pedigree information. In this restricted analysis, the prediction R^2 decreased to 0.15. We caution that cryptic relatedness can still inflate prediction accuracy even when known close relatives are excluded.
     
    Another way in which prediction accuracy can be inflated is if the discovery and validation samples contain similar patterns of population stratification and the eventual target population is not similarly stratified. For example, this could occur if discovery and validation samples are independently sampled from a stratified population such as European Americans

    I get it, utu. I have a pretty good background in machine learning and understand the reflexive concern with N << x (sample size << number of explanatory variables).

    Not sure if you know much about cross validation (mentioned in one of your excerpts), but if not this is worth a look: https://en.wikipedia.org/wiki/Cross-validation_(statistics)

    I think usual practice is to report results based on either cross validation or a held-out test set, and to be explicit about which it is. (The test set is the validation set in your terminology, but notice the nuances in the first answer at https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set .) The held-out test set is best, as long as it is large enough. If someone reports results based on their (non-cross-validated) training set (discovery in your terminology) with most ML algorithms they should be laughed at.
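
    For readers who have not seen it, k-fold cross validation can be sketched in a few lines. This is a minimal illustration assuming NumPy and plain least squares on simulated data, not a recipe from any of the papers discussed; the function name kfold_r2 and the simulated effect sizes are made up for this example.

```python
import numpy as np

def kfold_r2(x, y, n_folds=10, seed=0):
    """Estimate out-of-sample squared correlation by k-fold cross validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    predictions = np.empty(len(y))
    for test_idx in folds:
        # Train on every subject that is not in the current fold.
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        design_train = np.column_stack([np.ones(train_mask.sum()), x[train_mask]])
        beta, *_ = np.linalg.lstsq(design_train, y[train_mask], rcond=None)
        # Predict only the held-out fold with the coefficients just fitted.
        design_test = np.column_stack([np.ones(len(test_idx)), x[test_idx]])
        predictions[test_idx] = design_test @ beta
    return np.corrcoef(y, predictions)[0, 1] ** 2

rng = np.random.default_rng(1)
x = rng.normal(size=(2000, 50))
# Five predictors carry signal; the noise variance is chosen so the true R^2 is 0.5.
y = x[:, :5].sum(axis=1) + rng.normal(scale=np.sqrt(5.0), size=2000)
cv_r2 = kfold_r2(x, y)
print(f"10-fold cross-validated R^2 = {cv_r2:.3f}")
```

    Every prediction comes from a model that never saw that subject, which is what keeps the reported figure from being inflated by fitting noise. It does not by itself address relatedness or stratification shared across folds.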

    Your points about relatedness and stratification are good ones. Just don't throw the baby out with the bathwater. Misleading results from a worst-case screw-up do not mean the entire body of work should be ignored (this sort of thing is why I keep griping about "FUD"). I think people in the field are fairly aware of issues like that, but it’s hard to be sure. A lot of work now seen as misleading was done in the 2000s by people who did not understand multiple hypothesis testing issues.

    If you are interested in learning more about this sort of thing from a POV with a good mix of theory and practice, this class is excellent: http://online.stanford.edu/course/statistical-learning-winter-2014
    The textbook is freely available as a PDF and is also excellent, as is its big brother referred to in the StackExchange answer.
    If you are more theoretically inclined this class is also excellent, but is more challenging IMHO (it is a real Caltech class as taught to students there): https://work.caltech.edu/telecourse

    If anyone here can describe the particular ways in which GCTA manages to avoid these issues (or not) I would love to hear it, but for now I am going with people I trust to get the math right who seem to be OK with it.

  91. res says:
    @utu
    Is Peter Visscher or any of his flunkeys reading this blog?

    Come on, utu. Thinking you know better than the people actually doing this work, and responding in an arrogant and dismissive fashion, shows immense hubris, and I think it is a big part of the reason researchers who show up here get annoyed with you. That sort of thing frustrates me because this blog is a great opportunity to interact with people doing or using the research professionally. Gratuitously annoying them squanders that opportunity.

    We all make mistakes at times, but when interacting with someone who is doing original cutting edge work in the field (e.g. Visscher IMHO) it is much more appropriate to say something like: “Did you/he take this into account? How? How did you verify that?” My understanding is the simulation results in the original GCTA paper were an attempt to answer questions like that.

    • Agree: utu
  92. res says:
    @utu
    It is important to remember that including nonlinear effects greatly increases the number of potential explanatory variables (e.g. it doubles if you add a quadratic term for each SNP, and grows roughly as the square if you add all of the pairwise interaction terms), which makes your other concerns much worse. If you care about nonlinear effects I think it is worth looking into Steve Hsu’s compressed sensing work, which exploits the sparsity of the explanatory variables that actually have an effect.
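
    To put numbers on that growth, here is a small counting sketch in Python. It assumes one squared term per SNP and one product term per unordered pair of SNPs; the function name n_explanatory_variables is made up for this example.

```python
from math import comb

def n_explanatory_variables(n_snps):
    """Count candidate terms for three model sizes built from n_snps SNPs."""
    linear = n_snps
    with_quadratic = 2 * n_snps                   # linear + one squared term per SNP
    with_pairwise = 2 * n_snps + comb(n_snps, 2)  # plus every SNP-by-SNP product
    return linear, with_quadratic, with_pairwise

for k in (1_000, 1_000_000):
    print(k, n_explanatory_variables(k))
```

    For a million SNPs the pairwise count is on the order of 5*10^11 terms, against sample sizes of at most about 10^6 in this discussion.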

  93. utu says:
    @res

    I pressed the Agree button, but on second thought I feel I need to point out a few things. I understand that my irreverence is ineffective, but at least I formulate questions that often need to be asked. I growl and bark and I know I will be ignored. I can’t be like a tail-wagging puppy waiting in anxious trepidation for scraps from the master's table. It is not my nature.

    Every professional group must be scrutinized. They are just a group of humans who share similar beliefs and biases. The sociological processes apply equally to physicists or biologists as to psychologists or gender studies scientists. The scientific process is supposed to be self-correcting, but are we sure that we are not letting the periods between corrections run for too long by acquiescing to their narrative and shying away from irreverence, as if we were too concerned about their fragile egos?

    Look at climate general circulation models (GCM). All the people who work on them in various groups come from similar backgrounds and share similar beliefs. Nobody from outside is able to verify or even understand how the models were constructed. Nobody can run them independently of the national labs that have expensive supercomputers. These people depend on grants from the same sources and know very well on which side their bread is buttered. They are probably the only people who will eventually implement the necessary corrections, but do we need to give them unconditional reverence in the meantime? Shouldn’t we ask questions and point out what we want them to do?

    The situation is similar among people who do GWAS and GCTA studies. They all believe in the same thing. I bet there is not a single one among them who is a genuine skeptic. They all think, just like you, that it is just a matter of time before the results finally materialize. They already know the future outcomes. They do not think even for a second that the results might not be coming, or that if they come they might be wrong.

    Genome sequencing did not deliver what was expected. The problem turned out to be much more difficult, and not just in controversial areas like human intelligence. Even the simple trait of height made them sweat. The missing heritability gap is getting to the MSM and is spoiling the optimistic narrative of the Brave New World. The movers and shakers behind the narrative are getting impatient. Desperate men do desperate things. Is Visscher desperate?

  94. factorize says:

    When will the heritability scores for intelligence reported in these GWAS cross the line into offering real world predictability? It would seem that once teachers, those interested in finding mates, and others start to access this science, the debates will become somewhat moot. Even if the research turned out to be somehow mistaken, perception of IQ and perhaps other traits would tend to conform to the gene chip results and not casual armchair science.

    I am greatly looking forward to the completion of some of the GWAS research into autistic and schizoidal type behavior. There is a suspicion that such behavior might be present in my family, though we have never been completely sure ourselves. Yet it would not be overly surprising if this turned out to be completely untrue. If so, it would be a great boost to us to know that we are in fact quite normal.

    There has been a certain amount of politicization of behavior (e.g., in the former Soviet Union among others) that it would be best to halt for the sake of freedom and democracy. Locking up people that are deemed crazy by the state has been a useful tactic to maintain state power. However, such measures would be less acceptable to the citizenry if they could be proven to be based on pseudo-science.

  95. res says:
    @utu
    I just took a closer look at that paper. Did you notice who the final author was by any chance? Peter Visscher.

    Speaking of prediction, I suspect the best prediction algorithms will incorporate the parents' genetic and phenotypic information as well. The idea is that the parents' phenotypic information offers estimates of both the full spectrum of genetic effects AND the environmental effects they experienced (perhaps a decent estimate of the child's environment?). Then look at the difference in genetic scores between the parents and the (potential) child to correct the old style mid-parental (e.g. height) estimate.

    Here is a paper discussing the mid-parental height estimate. They give correlation coefficients of about 0.6 for mid-parental height to final height. So an R^2 of about 0.36. Not great, but useful. I wonder if adding the genetic scores would improve that much.
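
    How much a second predictor can add depends on how strongly it overlaps with the first. A minimal sketch using the standard two-predictor regression formula; only the 0.6 comes from the paper quoted above, while the genetic score correlations of 0.4 and 0.3 are purely hypothetical placeholders and the function name two_predictor_r2 is made up for this example.

```python
def two_predictor_r2(r_y1, r_y2, r_12):
    """R^2 from regressing an outcome on two standardized predictors.

    r_y1, r_y2: correlation of each predictor with the outcome.
    r_12: correlation between the two predictors (requires |r_12| < 1).
    """
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)

# Mid-parental height alone, using the correlation of about 0.6 quoted above.
r2_parents_only = two_predictor_r2(0.6, 0.0, 0.0)
# Adding a HYPOTHETICAL genetic score: r = 0.4 with height, r = 0.3 with mid-parental height.
r2_with_score = two_predictor_r2(0.6, 0.4, 0.3)
print(f"parents only: {r2_parents_only:.2f}, with hypothetical score: {r2_with_score:.3f}")
```

    The more the genetic score correlates with mid-parental height, the less it adds, so the real improvement would have to be measured rather than assumed.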

    This technique could be extended to include siblings if they are old enough to give phenotypic information (even the current height percentile of a few year old child would probably give useful information if combined with its genotypic score).

    • Replies: @res
    Argh. Left out the paper link. http://www.nature.com/pr/journal/v44/n4/full/pr1998502a.html
    Target Height as Predicted by Parental Heights in a Population-Based Study
    , @utu
    Idea being that the parent’s phenotypic information offers estimates of both the full spectrum of genetic effects AND the environmental effects

    Not sure what this would prove. By using genes only you can estimate the true heritability, but by including the parents' phenotype you may end up overestimating it, though yes, the prediction would be better.
  96. res says:
    @res
    Argh. Left out the paper link. http://www.nature.com/pr/journal/v44/n4/full/pr1998502a.html
    Target Height as Predicted by Parental Heights in a Population-Based Study

  97. factorize says:

    This is an important point that you raise about there being scientific tribalism.
    The current democratization of scientific datasets, computational resources and
    open source publishing, while helping to broaden the scientific talent pool, has at the
    same time raised significant concerns about publication quality. Nearly anyone can now
    run a GWAS mechanically, without any conceptual awareness of basic
    psychometric ideas.

    With this in mind, it might be helpful to draw attention to basic findings in the
    psychometric literature. One such finding that does not appear to have gained its
    due attention is the study of g in other animals. Apparently, not only do primates
    display general cognitive ability, but so too do mice, rabbits, raccoons, ravens and others.
    These animals could help normalize the discussion of g in the mainstream.
    And of course, having the ability to manipulate the genes and environments of chimps
    in particular could result in powerful insights into the nature of human intelligence.

  98. factorize says:

    Just got cut off. Here is what I wanted to post.

    Animals offer a very profound insight into the question of g, though for whatever reason
    they have largely not been included in the debate. For example, it would hardly be worth
    disputing that chimps have a higher g score than mice. Yet, this is a speciesist statement. Is speciesism
    (defined here as claims that suggest IQ differences exist between species) somehow more acceptable because it is essentially incontestable by even a casual non-expert observer, while racism (defined here specifically as claims that g differences exist by race) is less acceptable because it is contestable by casual non-expert observers? Is a speciesist non-racist an oxymoron?

  99. utu says:
    @res
    "Idea being that the parent's phenotypic information offers estimates of both the full spectrum of genetic effects AND the environmental effects"

    Not sure what this would prove. By using genes only you can estimate the true heritability, but by including the parents' phenotype you may end up overestimating it, though yes, the prediction would be better.
