Razib Khan
Open genomics

A friend recently emailed me about human genetic data sets. Some, like POPRES, are restricted to researchers. But there is a lot of data available to the public. Zack Ajmal has posted on most of them at some point. Feel free to post links to others in the comments.

(Republished from Discover/GNXP by permission of author or representative)
A few months ago I purchased a decent desktop just to run ADMIXTURE and other packages for analyzing genomic data. More recently I set up a ~100 GB Dropbox account and have started to “push” all of my output files from ADMIXTURE, PLINK, etc., as well as various scripts (Perl, shell, R, etc.), into the public folder. More precisely, a script is running ADMIXTURE and moving the files into the appropriate Dropbox folders as I type this, and Dropbox syncs them with the online folders. I’m doing this for two reasons.

First, I want to make the pipeline of data generation easier for me. Instead of running ADMIXTURE and then laboriously processing the files with R to generate plots, I’ve now created a system where a few automated scripts begin ADMIXTURE runs, and then another script creates the input files for distruct, runs distruct, trims the output images, and converts them into PNGs. This should allow me to resurrect my side projects, even while I’m rather busy with the “main events” of my life.
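For readers who want a feel for what such a pipeline looks like, here is a minimal sketch in Python. It is not the actual script set described above (those are Perl, shell, and R); the function names, the file naming for the images, and the `drawparams` file name are all hypothetical. It assumes ADMIXTURE is invoked as `admixture <bed file> <K>` and writes `<prefix>.<K>.Q` into the working directory, that distruct accepts `-d`, `-K`, and `-o` on the command line, and that ImageMagick's `convert` (with Ghostscript installed) handles the PostScript output. The step that reshapes the `.Q` file into distruct's input format depends on your population labels, so it is left as a comment.

```python
import subprocess
from pathlib import Path


def plan_pipeline(bed_file, k_values, drawparams="drawparams"):
    """Build the command lists for each K without running anything.

    All output names other than ADMIXTURE's .Q file are illustrative assumptions.
    """
    prefix = Path(bed_file).stem
    steps = []
    for k in k_values:
        ps_file = f"{prefix}.{k}.ps"    # hypothetical name for the distruct plot
        png_file = f"{prefix}.{k}.png"  # hypothetical name for the final image
        steps.append({
            "k": k,
            # ADMIXTURE writes <prefix>.<K>.Q (ancestry fractions) to the cwd.
            "q_file": f"{prefix}.{k}.Q",
            "commands": [
                ["admixture", str(bed_file), str(k)],
                # A conversion of the .Q file into distruct's input files
                # would go here; it is dataset-specific and omitted.
                ["distruct", "-d", drawparams, "-K", str(k), "-o", ps_file],
                # Trim the whitespace around the plot and write a PNG.
                ["convert", ps_file, "-trim", png_file],
            ],
        })
    return steps


def run_pipeline(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        for cmd in step["commands"]:
            print(" ".join(cmd))
            if not dry_run:
                subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Dry run for K = 2..5 on a hypothetical PLINK .bed file.
    run_pipeline(plan_pipeline("mydata.bed", range(2, 6)))
```

Keeping the planning separate from the execution means you can inspect every command before committing hours of CPU time to it, which is most of the point of making the process “turnkey.”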

Second, I am beginning to feel that the promise of the “genome blogging revolution” has kind of faded out. Granted, there’s only so much you can do with the same data sets, so I’m going to try to put together large pedigree files in my Dropbox account. But it seems like people need more of a push. Toward that end, I hope that distributing scripts which make the process more “turnkey” will stimulate people going forward.

Addendum: I know that some of the first paragraph is going to be gibberish to some readers. But I hope you’ll appreciate the outcomes of that gibberish!

Call to Participate in a New Study on Social Networking and Personal Genomics:

Do you share your information with others? How has your personal genetic information influenced your lifestyle and the way you approach your health and medical decisions? Can genetic information create new communities and connections?

The Social Networking and Personal Genomics Study at the Center for Biomedical Ethics invites participants between the ages of 18 and 75 to spend approximately 2 hours with us in a focus group setting. Participants must have purchased direct-to-consumer personal genetic information from 23andMe, Inc., shared their information with others, and be willing to discuss their perspectives and experiences. Focus group members will receive a $50 gift card for their participation, and childcare will be available on an as-needed basis at no cost. For additional information or to enroll, please contact Simone Vernez, Project Manager, by email at or by telephone at (650) 723-9364. For more information on the study itself, including specific research aims and funding, please visit For general information about participant rights, contact 1-866-680-2906.

I released my 1,000,000 SNPs into the public domain yesterday. Why? To borrow a line from William Jefferson Clinton: because I could. And perhaps, because I should? There’s nothing to fear. Genomes Unzipped hit all the salient points when that crew released their data into the public domain. How about releasing yours?

Also, I didn’t consult my parents or my siblings about this decision. I weigh the probability of downside risk to them as trivially low. Far lower than the chance of them getting killed in a car crash. My genome, my right. Of course I’m no Howard Roark, a pure egoist. I asked one person of interest who might share some future downside risk with me, and they had no concern about my doing this so long as my judgement was that the fear was unfounded. Honestly, I’m a cerebral person, a man of reflection, not action, but this act did make me feel as if I’m effecting a little bit of change in this world.

A friend on Facebook asked me if I wasn’t worried about being Gattacaed in the future. No, not at all. There’s nothing actionable in my SNPs, and once the scientific research catches up with the genotype data I suspect that our full genome will be part of the record which institutions are already accessing routinely. The technology to surreptitiously sequence and analyze the genetic information of others will be there, and will be used. The issue at hand is not if, but when. I think we need to confront the possibility of radical transparency as part of the near term future. Speaking of which: how do you know that is my genotype? Trust me, it is! :-)

