James Thompson Archive
Piffer Replies to Prof Posthuma

Debaters-Council

Dear Prof Posthuma,

Thank you for your comments. These comments are not new, and that is not necessarily a bad thing. Actually, it works to my advantage because over the years I have had the opportunity to develop ways to rebut these criticisms.

One of the ways of answering your criticisms, and the one which convinces me the most about the validity of my findings, is the new Monte Carlo approach I developed. I show that thousands of unlinked random SNPs (matched for Minor Allele Frequency using the SNPSNAP algorithm) rarely (p<0.01) achieve the same predictive power as the polygenic scores built from GWAS hits. The issues of Linkage Disequilibrium decay, different causal variants, etc., that you mention simply create noise; they do not bias the results in one direction. There is no reason why Linkage Disequilibrium decay should produce the pattern we observe, and magically match the IQ scores of populations so closely. As the paper you cite (Martin et al., 2017) explains: “We demonstrate that scores inferred from European GWASs are biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction are possible and unpredictable”.
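To make the logic of that comparison concrete, here is a minimal sketch in Python, using made-up allele frequencies and IQ values in place of the real 1000 Genomes and population data (variable names are illustrative, not taken from the papers):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: rows = populations, columns = SNPs.
# freq_hits[i, j] = frequency of the trait-increasing allele of GWAS hit j in population i.
# freq_random[i, k] = frequency of a random, MAF-matched SNP k in population i.
n_pop, n_hits, n_random = 26, 9, 5000
freq_hits = rng.uniform(0, 1, size=(n_pop, n_hits))
freq_random = rng.uniform(0, 1, size=(n_pop, n_random))
pop_iq = rng.normal(90, 10, size=n_pop)              # placeholder phenotype values

def score(freqs):
    # Population-level polygenic score: mean frequency of the chosen alleles.
    return freqs.mean(axis=1)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

observed_r = corr(score(freq_hits), pop_iq)

# Null distribution: scores built from random sets of the same number of SNPs.
n_runs = 1000
null_r = np.array([
    corr(score(freq_random[:, rng.choice(n_random, n_hits, replace=False)]), pop_iq)
    for _ in range(n_runs)
])

# Empirical (one-sided) Monte Carlo p-value: how often do random sets do as well?
p_value = (np.sum(null_r >= observed_r) + 1) / (n_runs + 1)
print(observed_r, p_value)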

But genetic drift has been controlled for and ruled out in my papers by two different and complementary methods. The first is a Mantel-like test, based on regressing phenotypic values on Fst distances and polygenic score distances, which shows that polygenic scores predict average intelligence above and beyond Fst distances (i.e. drift and everything else that is not directional). The second, which shows that drift cannot plausibly explain my results, is a Monte Carlo simulation with several thousand random SNPs, whose correlations with population IQ are outperformed by the GWAS hits in 99% or more of runs (for a demonstration, see my paper: https://www.preprints.org/manuscript/201701.0127/v3).
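One way to implement the Mantel-like check mentioned above is sketched below, again with synthetic inputs; the exact specification used in the papers may differ. The idea is to unfold the pairwise distance matrices and ask whether polygenic-score distances still predict phenotypic distances once Fst distances are in the model:

import numpy as np

rng = np.random.default_rng(1)

n = 26
pop_iq = rng.normal(90, 10, n)                  # placeholder phenotypes
pgs = rng.normal(0, 1, n)                       # placeholder polygenic scores
fst = np.abs(rng.normal(0, 0.05, (n, n)))       # placeholder pairwise Fst distances
fst = (fst + fst.T) / 2
np.fill_diagonal(fst, 0)

iq_dist = np.abs(pop_iq[:, None] - pop_iq[None, :])
pgs_dist = np.abs(pgs[:, None] - pgs[None, :])

iu = np.triu_indices(n, k=1)                    # unfold the upper triangles
y = iq_dist[iu]
X = np.column_stack([np.ones_like(y), fst[iu], pgs_dist[iu]])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # IQ distance ~ intercept + Fst + PGS distance
print(dict(zip(["intercept", "fst", "pgs_dist"], beta)))

# Distance pairs are not independent, so the PGS coefficient would be tested by
# permuting population labels (a Mantel-style permutation test), not by ordinary
# regression standard errors.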

The factor analysis of GWAS hits produced even better results, outperforming 99.8% of the random SNPs. For a report, check: https://rpubs.com/Daxide/279148
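A factor score of this kind can be sketched as follows; here the single factor is approximated by the first principal component of the population-by-SNP frequency matrix (synthetic data, illustrative only):

import numpy as np

rng = np.random.default_rng(2)

n_pop, n_snps = 26, 9
freqs = rng.uniform(0, 1, size=(n_pop, n_snps))   # placeholder trait-increasing allele frequencies
pop_iq = rng.normal(90, 10, size=n_pop)           # placeholder phenotypes

# Standardize the columns, then take the first principal component as a
# stand-in for a single-factor score over populations.
Z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0, ddof=1)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
factor_scores = Z @ Vt[0]

print(np.corrcoef(factor_scores, pop_iq)[0, 1])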

What is remarkable is that height GWAS hits fail to predict population IQ. Guess what they predict? Height. The East Asian advantage we observe for education- or intelligence-related SNPs disappears and turns into a lower score for the notoriously not gigantic Chinese, Vietnamese and Japanese. A demonstration of this can be seen here: https://f1000research.com/articles/4-15/v3. Look at table 1 and compare the polygenic scores to those for intelligence, such as table 2 of my 2015 Intelligence paper http://www.sciencedirect.com/science/article/pii/S0160289615001087 or the more recent scores (https://topseudoscience.wordpress.com/2017/06/02/new-genes-same-results-group-level-genotypic-intelligence-for-26-and-52-populations). They almost look like mirror images of each other, with ranks reversed.

An issue I see in the Martin et al. paper is that the polygenic scores were created using a very liberal p-value threshold for inclusion, thus pulling in a lot of false positives. False positives are expected to behave like random SNPs, hence it is not surprising that they could not reproduce the results in non-Europeans.

When we home in on the causal variants by picking the right alleles, instead of using a brute-force approach, we tend to see that the same genes have the same effects across different super-populations. For example, countless studies showed that the APOE4 allele is involved in Alzheimer’s disease and has a variety of health-related effects. This allele confers risk on African Americans and European Americans alike (http://www.nytimes.com/2013/04/10/health/african-americans-have-higher-risk-of-alzheimers-study-shows.html). Incidentally, I should mention that this variant also has a population pattern closely mirroring the intelligence polygenic scores, perhaps due to its general effect on cognition.

The strength of my approach is in using the SNPs that replicated across many GWAS studies, increasing the chance of dealing with true causal variants or SNPs in close Linkage Disequilibrium with them, hence reducing the effect of Linkage Disequilibrium decay.
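In code, the replication filter amounts to intersecting the hit lists from independent GWAS; the rs numbers below are placeholders, not the actual replicated SNPs:

# Hypothetical genome-wide-significant hit lists from three independent GWAS.
gwas_a = {"rs0000001", "rs0000002", "rs0000003", "rs0000004"}
gwas_b = {"rs0000001", "rs0000002", "rs0000005"}
gwas_c = {"rs0000001", "rs0000002", "rs0000006"}

# Only SNPs appearing in all three studies are carried into the polygenic score.
replicated = gwas_a & gwas_b & gwas_c
print(sorted(replicated))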

And Europeans are not even the top scorers, as the “reference-population-bias” hypothesis would predict. This hypothesis is widespread but lacks any logical rationale. In fact, I consistently observed higher polygenic factor scores for East Asians than for Europeans. If there were a pro-European (i.e. pro GWAS-reference population) bias built into the cross-population comparison, it would imply that my method underestimates all non-European scores, not just African ones. I am so amused that the debate is fixated on the lower African scores, and nobody notices the East Asian advantage. You cannot have it both ways: if my method had a pro-White bias, then the East Asian scores would also be underestimated. This would actually imply that the East Asian advantage is even bigger than the one I have found. This reductio ad absurdum shows the absurdity of claims against my method.

Finally, a paper published this week, using GWAS hits, replicates the East Asian advantage on educational attainment found in several of my papers (funnily, they do not acknowledge my studies, even though one of the authors is familiar with my results, because a while ago I shared them with him via email): http://biorxiv.org/content/early/2017/06/04/146043

This paper strengthens the argument that SNPs which predict within-population differences can be used to predict between-population differences.

I recently published a paper where I put together all my main findings to date: https://www.preprints.org/manuscript/201706.0039/v1

That paper should be able to answer general questions about my findings and my methods.

In summary, SNPs that predict within-population differences can be used to predict between-population differences.

 
• Category: Science 
  1. Gerhard says:

    This research program is based on two fundamental conjectures: (1) within-population differences in education, IQ etc are caused by the same causal variants everywhere, and (2) allele frequencies vary among populations. As long as these two conjectures are true, causal SNPs discovered in Europeans and the polygenic scores constructed from them can predict between-population differences.

    The problem is that our present polygenic scores are not computed from known causal variants, but from GWAS hits that in the vast majority of cases are merely in linkage disequilibrium with the causal variants. Even if the same causal polymorphisms were polymorphic in all populations, the linkage phase and extent of linkage disequilibrium between GWAS hit and causal polymorphism is not necessarily the same everywhere. This is the most important reason why most of the polygenic scores defined for Europeans have low predictive power for non-Europeans. Also, causal variants that are polymorphic in Europeans may be monomorphic elsewhere, and vice versa.

    Instead of bickering about the limitations of our present polygenic scores, what needs to be done is to make the transition from microarray-based discovery studies to sequencing-based fine mapping of the causal variants. This will necessitate large-scale studies in non-Europeans in order to capitalize on ethnic variations in linkage patterns. African populations are especially suitable for fine mapping because of their generally lower linkage disequilibrium.

    The wider issue, not within science but more generally, is what is preferable in this case: knowledge or ignorance. Knowledge constrains the kinds of beliefs that people can reasonably hold. Racists have their own favored beliefs about polygenic scores, and politically correct types have different favored beliefs. We would inflict great emotional damage on these people by telling them the truth. That would be cruel and unreasonable, wouldn’t it?

  2. Davide Piffer says:
    @Gerhard

    LD decay is not racist. I think nobody is denying the existence of LD decay, or even that it is a problem. What the LD decay objection leaves unanswered is this: if these SNPs were complete noise, why do they match population IQ better than almost all of the random SNPs and better than the height SNPs? And not just in the first GWAS SNP set used by Piffer in 2015, but in all subsequent polygenic scores computed from independent studies?
    LD decay is not complete, and when choosing the SNPs replicated across studies it is less of an issue, because the chance of hitting on a true causal SNP is much higher, and even for the tag SNPs the average LD decay will be lower. It is simply a nuisance that adds error to the prediction. As the Martin et al. paper pointed out, patterns of LD decay follow genetic drift and do not bias results toward particular populations. In other words, LD decay is not racist.

  3. Emil O. W. Kirkegaard says:
    @Gerhard

    Sequencing is very expensive, and not necessary. Higher density arrays will also reduce the LD decay problems. One does not need the specific causal variants, just some marker in very close vicinity of it so that the decay is unproblematic.

    The more important issue is getting a larger sample of countries, because n=26 is not very convincing no matter what is done. Keep in mind that there’s genomic autocorrelation too, so the real independent sample size is much smaller.

    Note: there is real data-based simulation evidence behind the claim of unitary causal patterns.

    http://biorxiv.org/content/early/2016/11/03/085092

    We might also note that it is not wise for environmentalists to gamble on non-unitary causal patterns in variants, because this plays directly into the hands of people who say that the races are so different that they should be labeled different species. Unitary causal patterns with some LD decay are more in line with the ‘only one species’ position.

    IMO, direct evidence not convincing as of now, but looking forward to larger databases of genomic data, e.g. country level (of natives!).

  4. res says:
    @Gerhard

    This research program is based on two fundamental conjectures: (1) within-population differences in education, IQ etc are caused by the same causal variants everywhere, and (2) allele frequencies vary among populations.

    Let’s look at both of your statements here. Your (2) is demonstrably true. Anybody who doubts that should take a look at a reference like SNPedia or for the visually inclined see:
    Geography of Genetic Variants Browser – http://popgen.uchicago.edu/ggv

    As for your (1) my sense is that Piffer’s work makes a looser conjecture. That the SNPs detected in (European) GWAS are indicative of selection effects everywhere. This is subject to LD issues as Piffer has discussed above and in comments, but there is no requirement that the European SNPs be the only relevant SNPs.

    Although not directly applicable to this, this statement from the height paper linked above (thanks) seems related:

    Controlling for population differences in derived allele frequencies

    Allele status could be ascertained for 691 of the 697 SNPs. Among the alleles with a positive effect, there were 370 derived and 321 ancestral alleles, respectively. Since this is not an equal representation, it creates a potential confounding factor. The derived allele frequency (DAF) was computed including both positive and negative effect alleles, to verify that these varied among populations. Average DAF is reported in Table 6. These indeed confirmed previous findings that non-African populations have higher frequencies of derived alleles (Henn et al., 2015). Since there are more derived alleles with a positive effect in this sample, the polygenic scores for African populations are lowered compared to non-African populations. Correcting for this bias will thus increase the polygenic scores of African populations relative to the others.

    Some thoughts/questions about this passage:
    - I don’t recall seeing something similar in the IQ/EA work. Did I miss it, or is it not relevant there, or …?
    - Was the correction large enough to explain the observed height polygenic score results disparity for Africans (a 4″ underestimate IIRC)?
    - What about African specific causal SNPs (e.g. Pygmy height?)?

    P.S. Is anyone currently doing either fine mapping or large scale GWAS work on African populations?
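    To make the quoted DAF computation concrete, here is a minimal sketch with made-up inputs (not the paper's code or data):

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical data: rows = populations, columns = the effect alleles under study.
    n_pop, n_snps = 26, 691
    freqs = rng.uniform(0, 1, size=(n_pop, n_snps))   # frequency of the positive-effect allele
    derived = rng.random(n_snps) < 370 / 691          # True where the positive-effect allele is derived

    # Derived allele frequency at each SNP: flip where the positive-effect allele is
    # ancestral, so the derived allele there is the negative-effect one.
    daf = np.where(derived, freqs, 1 - freqs)
    mean_daf_per_population = daf.mean(axis=1)        # the average DAF reported per population
    print(mean_daf_per_population[:5])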

  5. Davide Piffer says:
    @Emil O. W. Kirkegaard

    We already have samples with more populations (52+) and ALL natives. I am talking about ALFRED. The big problem is that coverage is very low, so only about 10% of the variants are present. Nonetheless, it is still possible to create polygenic scores or factor-analyze those. What we lose in terms of number of SNPs/genomic resolution we gain in population N/spatial resolution.

    • Replies: @Emil O. W. Kirkegaard
    How far can we go by imputing the ALFRED data?
  6. res says:
    @Emil O. W. Kirkegaard

    Interesting link. Thanks.

    Do you have current cost data (in study quantities) for sequencing and lower/higher density arrays?

    Nice “one species” point.

  7. utu says:

    I show that thousands of unlinked random SNPs (matched for Minor Allele Frequency using the SNPSNAP algorithm) rarely (p<0.01) achieve the same predictive power as the polygenic scores built from GWAS hits.

    Rarely does not mean never, right? How rarely? If any other set of randomly selected SNPs has the same predictive power as the original 9 SNPs with respect to the sequence of 26 IQ numbers, then we may ask what else is lurking in the set of 10 million SNPs and what else it can predict. What is the predictive power of randomly selected SNPs from among 10 million? What else can they predict? If we generate a random sequence of 26 numbers, can we find 9 or more SNPs that will predict these numbers with r=0.9 correlation?

    magically match the IQ scores of populations so closely

    Clearly the author did not explore the issue. Perhaps there is nothing magical about this. Spurious correlations will occur in an underdetermined system where you have 10 million SNPs and only 26 data points. Perhaps we may find SNPs that will explain tomorrow’s lottery results in London, Lusaka and Tokyo.

  8. Davide Piffer says:
    @utu


    Your comment simply shows a lack of understanding. You clearly have not read my papers, or you would find the answer there. It is you who have not even bothered exploring the issue and are uttering nonsense. The percentage of random sets of 9 SNPs that predict IQ with the same correlation varies between 1% and 0.2%, depending on the polygenic/factor score being used. See: https://rpubs.com/Daxide/279148

  9. Anonymous says:
    @utu

    Rarely does not mean never, right? How rarely?

    Poor Utu, you need to go to school (or at least start reading Wikipedia or something). You quote Davide’s “rarely (p<0.01)” and proceed to ask just how rarely. Seriously? Come on, man, get an education.

  10. utu says:
    @Davide Piffer

    I went to your preprint that you gave a link to on another thread and found this:

    The polygenic score computed using the 9 SNPs was highly correlated (r=0.88) to an estimate [2] of average population IQ (fig. 1). A Monte Carlo simulation was run using 818 PS computed from groups of 9 SNPs taken from the random dataset. The average correlation between population IQ and the random polygenic scores was 0.22 (N=818). The slightly positive correlation can be interpreted as an effect of spatial/phylogenetic autocorrelation [8]. A Monte Carlo approach was used: the percentile corresponding to a correlation coefficient r=0.88 was found to be 99% (using the 818 random polygenic scores), implying that the result is highly significant.

    What were the top 8 correlations out of 818?

  11. utu says:
    @utu

    I found the answer

    that is, over a total of 819 runs, a correlation coefficient equal to or higher than 0.88 occurred 8 times

    Eight randomly chosen groups were found that produced correlations with population IQ equal to or higher than that of the 9 SNPs used in the study.

  12. saxo says:

    utu,

    Again, your lack of statistical reasoning in interpreting experimental results is showing. As the paper said, p < 0.01, i.e. the confidence level is better than 1 in 100. With 819 runs the upper limit of false positive is 8, which is exactly what you found.

    In the life and social sciences the accepted significance level is p < 0.05, i.e. 1 in 20;
    p < 0.01 already exceeds the expected significance level.

    Go learn some more statistics.

    You also have a preconceived, one-track assumption that ALL random trials should be false, while in real life there could be real effects not yet discovered. Those 8 cases should be examined in more detail in case they are real.
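    As a quick check of the arithmetic behind this point (not from the paper):

    from scipy.stats import binom

    n_runs, p = 819, 0.01
    expected = n_runs * p                      # about 8.2 random sets expected to reach the threshold
    prob_at_least_8 = binom.sf(7, n_runs, p)   # chance of seeing 8 or more such sets under the null
    print(expected, prob_at_least_8)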

  13. Davide Piffer says:
    @utu

    Have you ever heard of stuff called p value or significance testing?
    You can think of p= 0.01 this way. In a room with 1000 people, you will find on average only 10 whose poor understanding of statistics is as bad or worse than yours.

  14. VisitorQ says:

    JT: Finally, a paper published this week, using GWAS hits, replicates the East Asian advantage on educational attainment found by several of my papers (although funnily they do not acknowledge my studies, although one of the authors is familiar with my results, because a while ago I had shared my results with him via email):

    http://biorxiv.org/content/early/2017/06/04/146043

    This paper strengthens the argument that SNPs which predict within-population differences can be used to predict between-population differences.

    For this paper, though, supplemental tables 4 and 20 show that the binomial tests on the ≈85 polygenic alleles associated with educational attainment tend to be driven by East Asian positive:European negative results and East Asian positive:Native American negative results. East Asian positive:Near East negative results also drive associations in the extended dataset.

    The binomial tests on the ≈85 polygenic alleles under study for Africans vs Europeans, East Asians, etc. are essentially all neutral. Native Australian and Papuan scores also tend to be neutral in the binomial tests in the extended dataset.

    Do you have any ideas on why the binomial tests give these results? Is the structure of between-population genetic variance in educational attainment East Asian > African/Oceanian > Native American/European, as they suggest?
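    For readers unfamiliar with the setup, a binomial sign test of this kind can be sketched as follows (made-up frequencies, not the paper's data):

    import numpy as np
    from scipy.stats import binomtest

    rng = np.random.default_rng(6)

    # Hypothetical frequencies of ~85 education-increasing alleles in two populations.
    n_alleles = 85
    freq_east_asian = rng.uniform(0, 1, n_alleles)
    freq_european = rng.uniform(0, 1, n_alleles)

    # At how many loci is the trait-increasing allele more common in the first population?
    higher = int(np.sum(freq_east_asian > freq_european))
    print(higher, binomtest(higher, n_alleles, 0.5).pvalue)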

  15. utu says:
    @saxo

    “the upper limit of false positive is 8”

    “Those 8 cases should be examined in more detail in case they are real.”

    At this point those 8 are all positive and there is nothing false about them. The only criterion is the correlation with IQ, and those 8 might be performing better than the original group, so there is no category of false or not false.

    What is the main claim of Davide Piffer’s paper: “Look guys, I found 9 SNPs with which I can explain IQ differences among populations with r=0.88.” And then he proceeds with his random search to prove some statistical significance (significance of what, I fail to see) and quickly finds 8 other groups of 9 SNPs that outperform the original one in terms of correlation. So he actually undermines the significance of his own claim. Finding 9 SNPs which explain IQ differences among populations with r=0.88 is apparently very easy. A random process can do it in 100 trials. There are millions of 9-SNP groups out there that can do the job. Should one paper be written for each case?

    My question is: why not continue the random search beyond 818 trials and find the group that maximizes the correlation with population IQs? How high can you get? Can you get to r=0.99? And if you did, how would you explain it? My next question would be as follows: replace the IQ list with 26 random numbers and search for the SNPs that correlate with it best. And do it for many different sets of 26 random numbers. Then analyze the random-number sets for which you can get high correlations. Only then can you talk about the significance of the results against the spurious correlations that are bound to happen in this heavily underdetermined system (26 dependent variables and potentially millions of independent variables).
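    The check proposed here can be sketched as follows, with synthetic frequencies standing in for the real data:

    import numpy as np

    rng = np.random.default_rng(5)

    n_pop, n_candidates, set_size = 26, 10000, 9
    freqs = rng.uniform(0, 1, size=(n_pop, n_candidates))   # stand-in for SNP frequency data

    def best_r(target, n_search=818):
        # Best correlation found by a random search of 9-SNP groups against a fixed target.
        best = -1.0
        for _ in range(n_search):
            cols = rng.choice(n_candidates, size=set_size, replace=False)
            best = max(best, np.corrcoef(freqs[:, cols].mean(axis=1), target)[0, 1])
        return best

    # Repeat the whole search for many random 26-number "phenotypes" to see what
    # best-of-search correlations arise when there is nothing real to find.
    best_rs = [best_r(rng.normal(0, 1, n_pop)) for _ in range(100)]
    print(np.percentile(best_rs, [50, 95, 99]))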

  16. Jack667 says:

    My understanding from reading papers like this one is that Piffer’s approach can, at best, take us only half-way towards understanding the genetic basis of race differences. That paper shows that at least half of the genetic variants influencing IQ are very rare; lots of the causal variants are different for each extended family. Piffer’s approach tests for differences in common variants between races. However, even if it was firmly established that the frequencies of common causal variants differ between races, that would not say anything about those rare variants that GWAS cannot find. I guess you could make some theoretical argument about selective pressures, but you cannot show it empirically because the variants are so rare.

  17. Davide Piffer says:
    @utu

    "the upper limit of false positive is 8"

    "Those 8 cases should be examined in more details in case they are real."
     
    At this point those 8 are all positive and there is nothing false about them. The only criterion is the correlation with IQ and those 8 might be performing better than the original group, so there is no category of false or not-false.

    What is the main claim of Davide Piffer's paper: "Look guys, I found 9 SNPs with which I can explain IQ differences among populations with r=0.88." And then he proceeds with doing his random search to prove some statistical significance (significance of what - I fail to see it) and he quickly finds 8 other groups of 9 SNP's that outperform the original one in terms of correlation. So actually he undermines the significance of his claim. Finding 9 SNPs which explain IQ differences among populations with r=0.88 is apparently very easy. A random process can do it in 100 trials. There are millions of 9 SNPs groups out there that can do the job. Should one paper be written for each case?

    My question is why not continue the random search beyond 818 trials and find the one that maximizes the correlation with populations IQ's. How high can you get? Can you get to r=0.99? And if you did, how would you explain it? My next question would be as follows: Replace the IQ list with 26 random numbers and do the search for SNP's that will correlate with it the best. And do it for many different 26 random numbers sets. Then analyze the random numbers sets for which you can get high correlations. Only then you can talk about significance of the results against the spurious correlations that are bound to happen in this heavily undetermined system (26 dependent variables and potentially millions of independent variables).

    You utter utter nonsense

    • Replies: @utu
    Look, I found this on the net; I think it may fit your preoccupation with p-values.

    The social sciences don’t really have any proven underlying factors and need things like p-factors to sound like they’re onto something when, in fact, none of their premises are testable. Not to mention that p-factors, and confidence intervals appear in every paper because nowadays no self respecting paper would omit them (IOW: it’s fashionable). I don’t think anyone in social sciences really pays attention to those things and futhermore assume that no one else does either.
     
    Still, it was fortunate that you undertook this random search to satisfy the prevailing fashion in your field for p-value testing, because you found groups of SNPs that outperformed the one you had selected. The problem is that you do not realize the significance of your finding.
  18. utu says:
    @Davide Piffer

    You are failing to see the significance of your own significance testing. It is not that only 1% of randomly selected 9-SNP groups outperform the group you have identified. It is the opposite. After a short search of 818 trials you found 8 groups whose correlations with population IQ were as high as or higher than that of the group you had originally identified. Why not concentrate on the one that yields the highest correlation? If you keep searching, perhaps you could find a group with r=0.999 on the N=26 population IQ set, and then you could write a paper titled: 99% of IQ variance explained with 9 SNPs. Don’t you want to get a higher correlation? Why settle for just r=0.88? You know exactly why: because higher correlations strongly suggest that the correlation might be spurious, which is likely because the set of N=26 is small. So it is possible that the r=0.88 you obtained is also the result of a lucky spurious effect and thus might be meaningless. If you want to find out how significant your result is with respect to spurious effects, you need to randomize the IQ values and start the search for SNP groups that correlate with them.

  19. res says:
    @utu

    "the upper limit of false positive is 8"

    "Those 8 cases should be examined in more details in case they are real."
     
    At this point those 8 are all positive and there is nothing false about them. The only criterion is the correlation with IQ and those 8 might be performing better than the original group, so there is no category of false or not-false.

    What is the main claim of Davide Piffer's paper: "Look guys, I found 9 SNPs with which I can explain IQ differences among populations with r=0.88." And then he proceeds with doing his random search to prove some statistical significance (significance of what - I fail to see it) and he quickly finds 8 other groups of 9 SNP's that outperform the original one in terms of correlation. So actually he undermines the significance of his claim. Finding 9 SNPs which explain IQ differences among populations with r=0.88 is apparently very easy. A random process can do it in 100 trials. There are millions of 9 SNPs groups out there that can do the job. Should one paper be written for each case?

    My question is why not continue the random search beyond 818 trials and find the one that maximizes the correlation with populations IQ's. How high can you get? Can you get to r=0.99? And if you did, how would you explain it? My next question would be as follows: Replace the IQ list with 26 random numbers and do the search for SNP's that will correlate with it the best. And do it for many different 26 random numbers sets. Then analyze the random numbers sets for which you can get high correlations. Only then you can talk about significance of the results against the spurious correlations that are bound to happen in this heavily undetermined system (26 dependent variables and potentially millions of independent variables).

    utu, you state

    What is the main claim of Davide Piffer’s paper: “Look guys, I found 9 SNPs with which I can explain IQ differences among populations with r=0.88.”

    At the top of page 2 of his latest paper Piffer states: “Piffer [8] identified 9 genomic loci (table S1) that were replicated across the three largest GWAS of educational attainment published to date [9-11].”
    (Davide, I have been trying to find the referenced Table S1 and failed after looking in both your most recent paper and reference 8, aka doi: 10.20944/preprints201611.0047.v1. The latter mentions “table 13”, which I also could not find. I looked for supplemental material for both papers but did not see any. Could you please point me to a description of your selection process for your 9 SNPs?)

    The key point (which I have mentioned at least once recently) is that Piffer did not AFAICT pick these SNPs himself–they were derived from earlier GWAS. He is testing a hypothesis based on SNPs derived from other research. This is very different from cherry picking SNPs himself. Even if that were not the case, being able to replicate the results with more SNPs from later studies (as I have also mentioned) is additional evidence in favor of this being a real phenomenon.

    In my opinion the people chastising you (utu) for lack of statistics knowledge are on target. If you really don’t understand the idea of getting a p value from the random simulations described as a way of testing the hypothesis then you need to spend some time educating yourself. Your criticisms would have merit if Piffer had gone looking for best fitting SNPs from the whole genome and then presented those, but that’s not what he did.

    For utu’s benefit, here is the definition of p value from http://www.statsdirect.com/help/basics/p_values.htm

    The P value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true – the definition of ‘extreme’ depends on how the hypothesis is being tested.

    Compare that definition to Piffer’s random SNP methodology.
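Read as a recipe, the Monte Carlo p value being described works roughly as follows. This is a hedged sketch with synthetic inputs (hit_idx stands in for the 9 GWAS hits); it is not Piffer's actual script.

```python
import numpy as np

rng = np.random.default_rng(1)

def population_score(freqs, snp_idx):
    """Population-level polygenic score: mean frequency of the chosen SNPs."""
    return freqs[:, snp_idx].mean(axis=1)

def monte_carlo_p(freqs, iq, hit_idx, n_draws=10_000):
    """Empirical p value: how often do random SNP sets of the same size correlate with
    population IQ at least as strongly as the GWAS-hit set?"""
    observed = np.corrcoef(population_score(freqs, hit_idx), iq)[0, 1]
    k = len(hit_idx)
    exceed = 0
    for _ in range(n_draws):
        rand_idx = rng.choice(freqs.shape[1], size=k, replace=False)
        if np.corrcoef(population_score(freqs, rand_idx), iq)[0, 1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_draws + 1)   # add-one correction so p is never exactly zero

# toy usage with synthetic inputs (real inputs would be 1000 Genomes frequencies and the 26 IQ means)
freqs = rng.uniform(0.05, 0.95, size=(26, 20_000))
iq = rng.normal(100, 10, size=26)
print(monte_carlo_p(freqs, iq, hit_idx=np.arange(9), n_draws=2_000))
```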

  20. utu says:
    @res

My point is not about cherry picking. He did the right thing. He identified a group of SNPs on the basis of previous studies and decided to test it against the set of 26 IQs. My point is the opposite of cherry picking. My point is about the preponderance (1%) of randomly selected groups of 9 SNPs that produce a correlation r≥0.88. Why did Davide Piffer write a paper on the group of SNPs that he found in other publications which “merely” produced r=0.88, while among the randomly found groups there were some that had r>0.88? Shouldn't a higher correlation warrant more interest?

    If you really don’t understand the idea of getting a p value from the random simulations

Tell me what one could possibly fail to understand about such a trivial concept. I was surprised that it even had a name. Small minds usually like to give lofty names to trivial things. There is a reason why statisticians do not enjoy the highest respect among mathematicians.

    To master probability and statistics requires mastering a great chunk of math. But we begin to go wrong when we mindlessly apply equations in inappropriate situations because of the allure of quantification. Worse, we routinely reify the mathematics; for example, p-values positively wriggle with life: to most, they are mysterious magic numbers.

The social sciences don’t really have any proven underlying factors and need things like p-factors to sound like they’re onto something when, in fact, none of their premises are testable. Not to mention that p-factors, and confidence intervals appear in every paper because nowadays no self respecting paper would omit them (IOW: it’s fashionable). I don’t think anyone in social sciences really pays attention to those things and furthermore assume that no one else does either.

    http://wmbriggs.com/post/3169/

  21. @res

Utu lacks statistics knowledge, and repeatedly fails to understand simple concepts even when several people try to explain them to him. What I did is a Monte Carlo simulation; it's not arcane or illegal, as Utu is trying to suggest, but totally legit, and it's an empirical way of finding a p value, in this case in my opinion much better than traditional statistical tests, which rely on too many unsatisfied assumptions (e.g. normality, lack of spatial autocorrelation in the data, etc.)
    See:

    https://www.biomedware.com/files/documentation/OldCSHelp/MCR/Calculating_Monte_Carlo_p-values.htm

    The supplementary table will be added to the final publication. In the meantime, you can view it here: https://docs.google.com/document/d/198eH3X87-969Hxv60MuuL7sGtPEmMYeB2RoM-pGd7cg/edit?usp=sharing

    By the way, I have posted a new paper on calculating LD decay and getting rid of it when computing polygenic scores: https://rpubs.com/Daxide/283453

    I don’t have a lot of free time to work on this so that is the provisional version.

  22. utu says:
    @Davide Piffer
    You utter utter nonsense

    Look, I found this on the net that, I think, may fit your preoccupation with p-value.

The social sciences don’t really have any proven underlying factors and need things like p-factors to sound like they’re onto something when, in fact, none of their premises are testable. Not to mention that p-factors, and confidence intervals appear in every paper because nowadays no self respecting paper would omit them (IOW: it’s fashionable). I don’t think anyone in social sciences really pays attention to those things and furthermore assume that no one else does either.

Still, it was fortunate that you undertook this random search to satisfy the prevalent fashion in your field for p-value tests, because you found groups of SNPs that outperformed the one you selected. The problem is that you do not realize the significance of your own finding.

  23. @utu

    Let’s stop feeding this troll

  24. @utu

Ok, this is going to be my last reply to Utu if he keeps trolling. If only he would read my papers more carefully he'd realize he could have spared us all his posts.
A while ago I had proposed the technique of reverse-engineering polygenic scores by finding SNPs whose frequencies correlate highly with them. It's just not the focus of this paper. This paper adopts a more conservative approach. If we assume that the random SNPs with a higher correlation to IQ have some meaning, then my results would become even more significant than they are.
This is what I wrote (Piffer, 2015, Intelligence): "Finally, this method can be “reverse-engineered” to aid in the detection of new GWAS hits by selecting polymorphisms whose frequencies correlate with the polygenic score or selection factor. These genes (or “polygenes”) will have a higher probability of being intelligence related, thus reducing the need for extremely large samples and the reliance upon ‘chance capitalization’ typical of current intelligence GWA studies".
    This is just not the focus of this paper. Look up Monte Carlo and then you will understand what I did.

25. Anonymous says:
    @Davide Piffer

    If only he would read my papers more carefully

    Utu is not a reader, he is a writer. :-)

  26. saxo says:
    @utu

You seem to rely more on subjective hand-waving and foaming at the mouth than on objective statistical p values. Sad.

Have you bought any bridges lately?

  27. utu says:
    @Davide Piffer

    If we assume that the random SNPs with a higher correlation to IQ have some meaning, then my results would become even more significant than they are.

Your result (singular) was identifying 9 SNPs that correlate highly with population IQs. It turns out that 1 in 100 randomly selected groups of 9 SNPs has the same (or better) property. This neither strengthens nor weakens your result, but it raises the question of whether the correlations are spurious. You do not address the issue of spurious correlation. I suggested in my other trolling comments that IQ should be randomized to estimate statistically how likely a spurious correlation is on the set of 26 populations. This is the issue of the underdetermined system.

I am pretty confident that if you continued the random search for SNPs you could easily find single SNPs, pairs, or groups of 3 SNPs that also have high correlations with IQ. If randomly finding a group of 9 has p≈0.01, then this might imply that the probability of a single SNP having the "right property" of belonging to such a group is (0.01)^(1/9)≈0.6, which is very high. Wow, 60% of the genome has the right kind of SNPs to contribute to a high correlation. When you did your randomization it would have been useful to look at the histogram of correlations to see how close to a binomial distribution it was.

Let's look at the implications of your result. The straight line that fits polygenic scores (PS) to IQs (Table 2 in your paper) with correlation r=0.91 is as follows:

IQ = 24.5 + 131.1*PS

Let's create a virtual population Cloud 9 of people who have all 9 SNPs and Cloud 8 of people who have 8 out of 9 SNPs. Your formula makes the following predictions about the IQs of these populations:

    Cloud 9: IQ=155 (9 SNPs)
    Cloud 8: IQ=141 (8 out of 9 SNPs)
    Cloud 7: IQ=126 (7 out of 9 SNPs)

Data in Table 8 do not preclude the existence of these high-IQ populations Cloud 9, Cloud 8, Cloud 7. In Europe the lowest frequency is 0.229 for rs11584700, so Cloud 9 could be as large as 22.9%, but obviously it depends on the mutual (spatial) correlations among frequencies, which I do not have. On the other hand, from the IQ distribution among Europeans we can estimate the population sizes:

    Cloud 9: IQ=155 0.01%
    Cloud 8: IQ=141 0.3%
    Cloud 7: IQ=126 4.1%

Go to some large genome database of Europeans and find how large Cloud 9, Cloud 8, and Cloud 7 are. This should be your first check. If the results do not make sense, you throw in the towel then. If they do make sense, which I think is highly unlikely, you should look at a large database of IQs and try to come up with a predictive model based on these or other SNPs. How big a correlation can you get there when N=70,000 rather than N=26? If you get r=0.25 that will be worth publishing. But do not count on more.
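For what it is worth, the tail-area arithmetic above approximately reproduces the quoted 0.01%, 0.3% and 4.1% figures under the usual assumption of a normal IQ distribution with mean 100 and SD 15. A quick verification sketch:

```python
from scipy.stats import norm

# Cloud k = people carrying k of the 9 SNPs; PS = k/9; IQ from the fitted line above.
for k in (9, 8, 7):
    iq = 24.5 + 131.1 * k / 9
    share = norm.sf(iq, loc=100, scale=15)            # upper-tail share of a normal(100, 15) IQ distribution
    print(f"Cloud {k}: predicted IQ ~ {iq:.0f}, expected population share ~ {share:.2%}")
```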

______
You should relax a bit. And be more appreciative that somebody gave your paper the time of day with some level of scrutiny.

  28. @utu

Really, this is my last reply to this troll. You are entirely missing the point. I am not interested in predicting IQ within populations (we have GWAS for that). I apply GWAS results to between-population differences. You are making a silly confusion between within-population and between-population variance. Since you constantly fail to understand the logic of my method and, what's worse, even basic statistics, you are entirely unqualified to take part in this discussion.

  29. utu says:

    I am not interested in predicting IQ within populations

    I think you have just blinked, Davide.

The formula that follows from your fit of PS to IQ (IQ = 24.5 + 131.1*PS) has an objective existence independent of your interests or wishes. It is you who brought it to life, and now it asks questions and you pretend that you do not hear them. But these are really your own questions, which you failed or were afraid to ask in the first place. Remember the story of the Golem?

    I apply GWAS results to between-population differences.

That's why I constructed for you two populations, Cloud 9 and Cloud 8, to get your attention. These two populations have polygenic scores PS=9/9=1 and PS=8/9≈0.889. While they do not have separate geographic locations or separate ethnic identities, they do exist, dispersed and distributed (just like a cloud) among and within other populations. If these populations existed on the map you would have included them in your studies, but since they are dispersed you want to pretend they do not exist? Where is your curiosity and follow-through on the consequences of what you have started?

Anyway, the formula from your fit predicts that the IQs of Cloud 9 and Cloud 8 are 155 and 141, respectively. This can be verified by looking up a large group of individuals who qualify as members of Cloud 9 or Cloud 8 (have all 9, or 8 out of 9, SNPs, respectively) and who underwent IQ testing.

    You are making a silly confusion between within-population and between population variance.

I think you are being disingenuous in pretending that you can separate one from the other. What made you think to use the average of the frequencies of 9 SNPs to predict average population IQ? Where does the average come from? It is a sum of all individual IQs, right? For the average to contain information on the SNPs, it is required that individual IQs contain information on the SNPs and that by summing them up this information is not wiped out (averaged out). So there must be a functional (not random) relationship between individual IQ and the SNPs. You have established that average IQ is a linear function of the average frequency of the 9 SNPs. Granted, it does not follow that exactly the same relationship will hold for individual IQs, but the reverse is always true, i.e., if IQ is a linear function of the SNPs then the average of the IQs is a linear function of the average frequencies of the SNPs.
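The linearity step appealed to here is easy to check numerically: if individual IQ is a linear function of allele counts plus noise, the population mean IQ is the same linear function of the mean allele counts (equivalently, the allele frequencies). A toy check with made-up effect sizes, not taken from any real GWAS:

```python
import numpy as np

rng = np.random.default_rng(2)
n_people, n_snps = 10_000, 9
intercept = 100.0
effects = rng.normal(0, 1.5, n_snps)                           # hypothetical per-allele effects
freq = rng.uniform(0.2, 0.8, n_snps)
geno = rng.binomial(2, freq, size=(n_people, n_snps))          # allele counts per person
iq = intercept + geno @ effects + rng.normal(0, 10, n_people)  # individual IQ: linear in counts plus noise

# Averaging preserves the linear form: mean IQ equals the same line evaluated at mean allele counts.
print(iq.mean())
print(intercept + geno.mean(axis=0) @ effects)
```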

  30. utu says:
    @utu

    From what you wrote to Dr. Thompson http://www.unz.com/jthompson/the-dna-of-genius-n2/

“One of those SNPs was missing from Watson and Venter”

we know that Watson and Venter are members of population Cloud 8, which has an expected IQ of 141 according to the formula you have established. And lo and behold:

    Watson IQ=120 (https://www.simonsfoundation.org/science_lives_video/james-d-watson/)
    Venter IQ=141 (https://www.bbvaopenmind.com/en/craig-venter-the-man-who-knew-himself/)

  31. @utu

These SNPs that explain variance within populations are markers of polygenic selection. They do not have to explain a lot of the variance between populations, or even within populations. Again, after your statistical ignorance, your ignorance of evolutionary genetics is showing. The polygenic evolution model predicts that a few SNPs will have frequencies correlated with the frequencies of countless other SNPs. I just need to know the few most important SNPs to gather a signal and infer the distribution of the other, unknown SNPs.
If selection pressure acted on these 9 SNPs by driving their frequencies up in population A compared to B, then it has also done the same to other SNPs. We don't need to know what these other SNPs are because theory predicts that they will have a similar distribution.
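A toy simulation of this argument (not Piffer's actual model; every parameter below is invented): give each population a latent selection intensity that shifts the frequencies of many trait-raising alleles, and a score built from only a handful of those loci then tracks the trait level driven by all of them.

```python
import numpy as np

rng = np.random.default_rng(3)
n_pops, n_trait_snps, n_known = 26, 500, 9
selection = rng.normal(0, 1, n_pops)                          # latent selection intensity per population
base = rng.uniform(0.2, 0.8, n_trait_snps)                    # ancestral frequencies of trait-raising alleles
shift = 0.05 * np.outer(selection, rng.uniform(0.5, 1.5, n_trait_snps))
freqs = np.clip(base + shift + rng.normal(0, 0.02, (n_pops, n_trait_snps)), 0, 1)

trait_level = freqs.mean(axis=1)                              # driven by ALL trait SNPs
known_score = freqs[:, :n_known].mean(axis=1)                 # score built from only the 9 "known" hits
print(np.corrcoef(known_score, trait_level)[0, 1])            # high, despite the 9 SNPs being a tiny subset
```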

  32. res says:
    @Davide Piffer

    It would be interesting to see if the other SNPs correlating with either your polygenic score or the outcome variables themselves are showing up in the GWAS but just not at the extreme p-values needed to qualify.

    It seems to me one could argue for a methodology using your techniques to select the thousand best correlating SNPs and then look at those in the GWAS but using a multiple hypothesis correction for only one thousand alternative hypotheses rather than the hundreds of thousands presented by using all SNPs.

    Perhaps not justifiable for the outcome variable correlations since the same coincidence could affect both results (though if the within and between genetics really are as different as critics claim…), but I think it works for your polygenic scores.
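A sketch of the two-stage idea being described, assuming one already has per-SNP correlations with the polygenic score and per-SNP GWAS p values; the input arrays and the function name are illustrative, not an existing pipeline.

```python
import numpy as np

def two_stage_candidates(score_corr, gwas_p, n_candidates=1_000, alpha=0.05):
    """Stage 1: keep the SNPs whose frequencies correlate best with the polygenic score.
    Stage 2: test only those in the GWAS, Bonferroni-corrected for n_candidates tests
    (e.g. p < 5e-5) instead of the usual genome-wide ~5e-8."""
    ranked = np.argsort(-np.abs(score_corr))
    candidates = ranked[:n_candidates]
    return candidates[gwas_p[candidates] < alpha / n_candidates]

# toy usage with synthetic inputs
rng = np.random.default_rng(4)
print(len(two_stage_candidates(rng.uniform(-1, 1, 100_000), rng.uniform(0, 1, 100_000))))
```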

  33. @res

    Seems like a viable test. Currently I don’t have the time to work on this. Do you? I am happy to offer my support.

  34. res says:
    @Davide Piffer

    I have time, but I’m not sure how to get access to the GWAS results for SNPs giving strong (but not strong enough) signals. One approach would be to use your technique to find the SNPs and send the candidates to a GWAS researcher who could check them. Not sure if you know anyone like that friendly enough to your ideas to do it. I think the decoupling would also help reduce the possibility of people claiming cherry picking.

    TLDR; I’d be happy to help if I have access to the necessary data.

    P.S. On a related note, if there is other work like this (e.g. I have spent a fair amount of time with R and R markdown) that would be helpful to share perhaps we should talk. I can contact you offline (e.g. email) as long as you are OK with respecting my anonymity here.

    P.P.S. I seem to remember seeing an analysis discussing the number of SNPs typically seen in GWAS for differing values of -log10(p), but I don’t have a handy reference. It might be helpful to estimate the number of SNPs (both possible and actual?) likely to appear in the region ~2 (i.e. factor of hundreds per above) below the current threshold.

  35. res says:
    @res

    Another thought about this idea. It would be interesting to take a look at it by “rerunning history.” Say by starting from the IQ GWAS (plural) leading to the 9 SNPs in the polygenic score, then using that polygenic score to power the methodology described on the old data. It would be interesting to see if any (how many?) of the IQ SNPs found more recently would have been found. I think this would make a good validation test for the methodology of using Piffer’s polygenic score to augment GWAS SNP discovery. It might also give insights about an appropriate threshold to use for the correlation cutoff.
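One way that validation loop could be organized, with every name below a placeholder for data that would have to be assembled (the old frequency matrix, the polygenic score built from the early hits, and the column indices of SNPs discovered only in later GWAS):

```python
import numpy as np

def rerun_history(old_freqs, early_score, later_hit_cols, top_k=1_000):
    """Rank SNPs by how well their (old-data) frequencies track the early-hit polygenic
    score, then report what share of the later-discovered GWAS hits fall in the top ranks."""
    corr = np.array([np.corrcoef(old_freqs[:, j], early_score)[0, 1]
                     for j in range(old_freqs.shape[1])])
    top = set(np.argsort(-np.abs(corr))[:top_k].tolist())
    return sum(j in top for j in later_hit_cols) / len(later_hit_cols)
```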

  36. @res

    I had proposed this reverse engineering of my method in my 2015 paper but had not thought about using existing GWAS to validate it. If you email me we can discuss it in depth.

  37. @res
pifferdavide@gmail.com
  38. utu says:
    @res

    using that polygenic score

Traits like height, weight, and IQ that have an extended continuum must depend on a very large number of SNPs. A polygenic score which is a sum of N SNPs can produce only N discrete values regardless of whether all SNPs have the same weight or different weights. So either N must be very large if you define PS as a sum of SNPs, or there is a nonlinear dependence that takes into account mutual interactions among SNPs. For example, for two SNPs you may have 3 possible outcomes:

(0,1) -> Y01; (1,0) -> Y10; (1,1) -> Y11

Then if you take into account all possible combinations among N SNPs you will have 2^N possible discrete values. What I am saying is that defining a polygenic score as a sum of SNPs is probably too simplistic.

    Two interesting papers:

    Zeroing in on the Genetics of Intelligence, Ruben C. Arslan and Lars Penke

    Thanks to the stringent statistical standards adopted in genetics nowadays, many false positives were rooted out or prevented. Collaborative efforts, yielding very large sample sizes, provide a ceiling on possible effect sizes, and confidence that reported null findings are true negatives. We are slowly learning where the genetic basis of intelligence is not, and thus also where it can still be.

    Sample sizes on the order of hundreds of thousand may be needed to actually explain substantial variance by causal common genetic markers in exploratory analyses [12], although new approaches are starting to supersede both ill-fated candidate gene analyses and purely exploratory GWAS

    Results of a “GWAS Plus:” General Cognitive Ability Is Substantially Heritable and Massively Polygenic

    http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0112390#s1

    Height is highly heritable, uncontroversial in definition, and easily measured, almost without error. And yet, the SNPs identified by initial GWAS for height (reviewed in Ref [22]) each accounted for around 0.3% or less of the phenotypic variance, and in total, 3%.

  39. res says:
    @utu
    Traits like height, weight, and IQ that have an extended continuum must depend on a very large number of SNPs. A polygenic score which is a sum of N SNPs can produce only on the order of N discrete values [...] What I am saying is that defining a polygenic score as a sum of SNPs is probably too simplistic.

    After all of this discussion, have you still not figured out that the idea of the polygenic score is to serve as a signal of polygenic selection?! (i.e. not just as an explanation of the variance accounted for directly by those SNPs.) The idea, as I understand it, is that even though the individual SNPs represent a small fraction of IQ variance, they effectively indicate the overall selection effect operating on all IQ SNPs, which allows for more explanatory power than one would expect from the small percentage of variance explained.

    Thanks for reminding us that, before the need for strict multiple-hypothesis corrections in genetic studies was recognized, false positives were common. I think all of the SNPs we are talking about postdate that realization.

    P.S. Your N-values argument applies only to individuals. Piffer's work primarily looks at population frequencies, an important difference. It helps to understand what you are criticizing before making the criticisms.
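
    A minimal sketch, with invented numbers throughout, of one simple way the population-level calculation referred to in the P.S. could look: the score for a population is a weighted sum of its allele frequencies rather than of an individual's allele counts, so it varies continuously across populations even when only a handful of SNPs are used.

    import numpy as np

    populations = ["POP_A", "POP_B", "POP_C"]               # hypothetical population labels
    # rows = populations, columns = GWAS-hit SNPs; entries are frequencies of the
    # trait-increasing allele (all values invented)
    freqs = np.array([
        [0.41, 0.22, 0.65, 0.58],
        [0.47, 0.30, 0.61, 0.66],
        [0.35, 0.18, 0.70, 0.52],
    ])
    betas = np.array([0.02, 0.05, 0.01, 0.03])              # per-SNP effect sizes (placeholders)

    pop_scores = freqs @ betas                               # one continuous score per population
    for name, score in zip(populations, pop_scores):
        print(f"{name}: {score:.4f}")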

  40. @Davide Piffer
    We already have samples with more populations (52+), all of them natives. I am talking about ALFRED. The big problem is that coverage is very low, so only about 10% of the variants are present. Nonetheless, it is still possible to create polygenic scores from those frequencies or to factor analyze them. What we lose in number of SNPs (genomic resolution) we gain in population N (spatial resolution).
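
    A rough Python sketch of the kind of analysis described above, with everything invented: a populations-by-SNPs matrix of allele frequencies in which a large share of entries is missing (standing in for ALFRED's low coverage) is crudely mean-imputed, and its first principal component is extracted as a stand-in for the factor analysis mentioned (not the actual method used in the papers discussed).

    import numpy as np

    rng = np.random.default_rng(1)
    n_pops, n_snps = 52, 40
    freqs = rng.random((n_pops, n_snps))                 # invented allele frequencies
    freqs[rng.random(freqs.shape) < 0.5] = np.nan        # knock out half the entries (low coverage)

    col_means = np.nanmean(freqs, axis=0)                # naive per-SNP mean imputation
    filled = np.where(np.isnan(freqs), col_means, freqs)

    centered = filled - filled.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    factor_scores = centered @ vt[0]                     # one score per population on the first component

    print(factor_scores[:5])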

    How far can we go by imputing the ALFRED data?
