The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Reanalyzing Data, It Does a Mind Good


There’s been a lot of talk on Twitter and the blogs about PLOS’ new data sharing policy. I don’t have anything deep to say, except that I’m for it. From what I can tell, though, there is a cultural element to the reaction, pro or con. People in genomics seem to be responding along the lines of “yes, of course,” while those in other fields have less positive reactions.

You can go elsewhere to hear “both sides.” I am confident that this will be the future, and the naysayers will have to deal with it. One of the major reasons formalized data release is good is that in a field like genomics there is more data than there are people to analyze it. By this I mean that you can ask many different questions of a data set, but you may only be interested in a subset of them. Other people in your lab might have different questions, but ultimately you’re probably leaving avenues unexplored because you don’t have the time or inclination. To give you a funny example, a few years ago I stumbled on the fact that Dan MacArthur probably has recent (>200 years) South Asian ancestry. As an academic genomicist Dan could have dug up this fact himself, but he has grants and papers to write, not to mention a non-scientific life. So it was left to me to stumble upon it. On the margin it’s not that useful to Dan, but it’s something. You never know what’s going to happen when you release data, because you can’t read the minds of others. And that sort of surprise is a good thing.

One of the greatest intellectual philanthropists in recent years has been Mait Metspalu. He has plenty of publications to his name, but he has also generously assembled and released data in a convenient form, which allows for easy reanalysis. A few days ago I noticed that he had put up a few more European populations, including understudied groups like Greeks. With the recent flare-up over Ukraine I thought I would process some of the new data. I pruned the data set down to 230,000 high-quality SNPs and focused on a large and a small data set of 500 and 340 individuals, respectively.
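The post doesn’t say which criteria defined “high-quality” SNPs. As a minimal sketch, assuming the common per-SNP filters on missingness and minor allele frequency (the thresholds and function names are hypothetical, and in practice this step would be done in PLINK rather than by hand), the filtering looks something like this:

```python
from typing import List, Optional

# Genotypes: 0, 1, or 2 copies of the alternate allele; None = missing call.
Genotype = Optional[int]

MAX_MISSING = 0.05  # hypothetical: drop SNPs missing in more than 5% of individuals
MIN_MAF = 0.01      # hypothetical: drop SNPs with minor allele frequency below 1%


def snp_passes_qc(calls: List[Genotype]) -> bool:
    """Return True if one SNP (its calls across all individuals) passes QC."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed or missing_rate > MAX_MISSING:
        return False
    alt_freq = sum(observed) / (2 * len(observed))  # two alleles per individual
    maf = min(alt_freq, 1 - alt_freq)
    return maf >= MIN_MAF


def filter_snps(genotypes: List[List[Genotype]]) -> List[int]:
    """Rows are individuals, columns are SNPs; return indices of SNPs that pass."""
    n_snps = len(genotypes[0])
    return [j for j in range(n_snps)
            if snp_passes_qc([row[j] for row in genotypes])]
```

For example, `filter_snps([[0, 1, 2, 0], [0, 1, None, 0], [0, 2, 2, 0]])` keeps only SNP 1: SNPs 0 and 3 are monomorphic, and SNP 2 is missing in a third of the samples.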

[Five figures: two MDS plots, two TreeMix graphs, and one ADMIXTURE plot (images not preserved)]
– As suggested by Dienekes, modern Greeks seem to have been affected more by northern gene flow (Slavs) than were the inhabitants of Magna Graecia (Southern Italy and Sicily)

– There’s not much difference between Poles, Ukrainians, and Russians (though some Russian samples, from traditionally Finnic regions, are more diverse)

– Not much difference between Romanians, Bulgarians, and Hungarians

– The Northern European clusters separate reasonably well into Slavic, Finnic, and Germanic groups

I’ll leave it to readers to make further comments.

Tools used: Plink 1.9, ADMIXTURE and TreeMix.

Methods: The first two plots are MDS representations of pairwise genetic differences between individuals. I used kerneling to lasso around the centroids of specific populations. The middle two are from TreeMix; I asked for 5 migration edges, rooted with outgroups, and allowed global rearrangement. The last is ADMIXTURE, run at K = 6, showing the mean for each population.
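For readers unfamiliar with MDS, here is a toy sketch of the logic behind the first two plots: compute pairwise distances between individuals, double-center the squared distances, and use the top eigenvectors as coordinates. It assumes Euclidean distance over 0/1/2 genotype vectors (swapped in for PLINK’s IBS-based distance) and no missing data, finds the axes by power iteration with deflation, and all function names are hypothetical. The `population_means` helper gives per-population centroids in MDS space or, applied to an ADMIXTURE Q matrix, per-population mean ancestry proportions. It does not reproduce the kernel-density “lasso” or TreeMix.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean_distances(genotypes: Sequence[Sequence[int]]) -> Matrix:
    """Pairwise Euclidean distances between individuals' 0/1/2 genotype vectors."""
    n = len(genotypes)
    return [[math.sqrt(sum((x - y) ** 2 for x, y in zip(genotypes[i], genotypes[j])))
             for j in range(n)] for i in range(n)]


def double_center(d: Matrix) -> Matrix:
    """Classical MDS matrix B = -1/2 * J D^2 J, where J is the centering matrix."""
    n = len(d)
    sq = [[x * x for x in row] for row in d]
    means = [sum(row) / n for row in sq]  # row means equal column means (D symmetric)
    grand = sum(means) / n
    return [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
            for i in range(n)]


def top_eigenpair(b: Matrix, iters: int = 1000, seed: int = 0) -> Tuple[float, List[float]]:
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    n = len(b)
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v  # matrix annihilated v: no remaining variance
        v = [x / norm for x in w]
    bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, bv)), v  # Rayleigh quotient


def mds(genotypes: Sequence[Sequence[int]], k: int = 2) -> Matrix:
    """Return k MDS coordinates per individual (one row per individual)."""
    b = double_center(euclidean_distances(genotypes))
    n = len(b)
    axes = []
    for _ in range(k):
        lam, v = top_eigenpair(b)
        axes.append([math.sqrt(max(lam, 0.0)) * x for x in v])
        # Deflate: subtract the axis just found so the next pass finds the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]


def population_means(rows: Matrix, labels: Sequence[str]) -> Dict[str, List[float]]:
    """Per-population mean of each column (MDS centroids or ADMIXTURE Q proportions)."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for c, x in enumerate(row):
            acc[c] += x
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
```

Euclidean distance is chosen here because it makes B positive semidefinite, so power iteration reliably finds the top axes. The pure-Python loops are quadratic in the number of individuals per iteration, which is fine for toy data; for hundreds of samples PLINK’s `--mds-plot` is the practical route.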

• Category: Science • Tags: Population Genetics 
  1. To date, nearly all my major work has incorporated reanalysis of public data in some way (microarrays, whole-genome sequencing, etc.). Some of these data sets have been sitting around for years. To me, the value this data has in furthering science is obvious. Furthermore, if I can answer the same question with a public data set rather than producing a new one, then I am saving the taxpayers time and money. What has really blown my mind are the arguments that when somebody takes public data, reanalyzes it, and produces a new result, this is somehow “scooping” or “stealing”.


Subscribe to All Razib Khan Comments via RSS