The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Reanalyzing Data, It Does a Mind Good

There’s been a lot of talk on Twitter and the blogs about PLOS’ new data sharing policy. I don’t have anything deep to say, except that I’m for it. From what I can tell, there is a cultural element to the reaction, pro or con. People in genomics seem to be responding along the lines of “yes, of course.” Those in other fields, on the other hand, have had less positive reactions.

You can go elsewhere to hear “both sides.” I am confident that this will be the future, and the naysayers will have to deal. One of the major reasons that formalized data release is good is that in a field like genomics there is more data than there are people to analyze it. By this, I mean that you can ask many different questions of the data, but you may only be interested in a subset of those questions. Other people in your lab might have different questions, but ultimately you’re probably leaving avenues unexplored because you don’t have the time or inclination. To give you a funny example, a few years ago I stumbled on the fact that Dan MacArthur probably has recent (>200 years) South Asian ancestry. As an academic genomicist Dan could have dug up this fact himself, but he has grants and papers to write, not to mention a non-scientific life. So it was left to me to stumble upon it. On the margin it’s not that useful to Dan, but it’s something. You never know what’s going to happen when you release data, because you can’t read the minds of others. And that sort of surprise is a good thing.

One of the greatest intellectual philanthropists in recent years has been Mait Metspalu. He has plenty of publications to his name, but he’s also generously assembled the data and released it in convenient form. This allows for easy reanalysis. A few days ago I noticed that he had put up a few more European populations, including understudied groups like Greeks. With the recent flare-up in Ukraine I thought I would process some of the new data. I pruned the data set down to 230,000 high quality SNPs, and focused on a larger and a smaller data set of 500 and 340 individuals respectively.

- As suggested by Dienekes, modern Greeks seem to have been impacted more by northern gene flow (Slavs) than the inhabitants of Magna Graecia (Southern Italy and Sicily)

- There’s not much difference between Poles, Ukrainians, and Russians (though there are Russian samples from traditionally Finnic regions which are more diverse)

- Not much difference between Romanians, Bulgarians, and Hungarians

- The Northern European clusters separate reasonably well: Slavic, Finnic, and Germanic

I’ll leave it to readers to make further comments.

Tools used: Plink 1.9, ADMIXTURE and TreeMix.

Methods: The first two plots are MDS representations of pairwise genetic differences between individuals. I used kerneling to lasso around the centroids of specific populations. The middle two are from TreeMix; I asked for 5 migrations, rooted with outgroups, and allowed global rearrangements. Finally, the last is just ADMIXTURE, run at K = 6. The plot shows the mean for each population.
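
For readers who want to try something similar on their own data, here is a minimal Python sketch of two of the steps in the methods: an MDS embedding of pairwise genetic distances, and a per-population average (usable for MDS centroids or for mean ADMIXTURE fractions). This is not the pipeline behind the plots in this post. The function names, the toy genotypes, and the distance measure (mean absolute allele-count difference) are all assumptions made for illustration.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Keep the eigenvectors with the largest eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which appear when distances are not exactly Euclidean.
    scales = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scales


def population_means(values, labels):
    """Average the rows of `values` within each population label."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    return {str(pop): values[labels == pop].mean(axis=0) for pop in np.unique(labels)}


# Hypothetical toy genotypes: rows are individuals, columns are SNPs,
# values are allele counts (0, 1, or 2).
genotypes = np.array([
    [0, 0, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [2, 2, 1, 0, 0, 2],
    [2, 2, 0, 0, 1, 2],
])
labels = np.array(["PopA", "PopA", "PopB", "PopB"])

# Assumed distance: mean absolute allele-count difference, scaled to [0, 1].
dist = np.abs(genotypes[:, None, :] - genotypes[None, :, :]).mean(axis=2) / 2.0

coords = classical_mds(dist)                   # one (x, y) point per individual
centroids = population_means(coords, labels)   # one centroid per population

for pop, centroid in centroids.items():
    print(pop, centroid)
```

The same `population_means` helper can be pointed at an ADMIXTURE-style table of ancestry fractions (one row per individual, one column per cluster) to get the mean for each population.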

• Category: Science • Tags: Population Genetics 
  1. Chad says:

    To date, nearly all my major work has incorporated reanalysis of public data in some way (microarrays, whole-genome sequencing, etc). Some of these data sets have been sitting around for years. To me the value that this data has in furthering science is obvious. Furthermore, if I can answer the same question with a public dataset, rather than producing a new one, then I am saving the taxpayers time and money. What has really blown my mind are the arguments that if somebody takes public data, reanalyzes it and produces a new result, this is somehow “scooping” or “stealing”.

