The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

I really admire what 23andMe has done. To a great extent they are the “Uber” of DTC personal genomics. FamilyTree DNA really pioneered the sector in the early 2000s, while The Genographic Project scaled things up massively in the middle 2000s. But in the late 2000s 23andMe brought Silicon Valley “disruption” to the game, pushing into disease and traits in a way that the two earlier efforts consciously avoided. We know how that ended.

But it wasn’t all in vain. 23andMe is today a healthy company, and its shoot-first-ask-questions-later actions in the first half of the teens really brought personal genomics into people’s lives.

So what’s going on with stories like this, 23andMe Has Abandoned The Genetic Testing Tech Its Competition Is Banking On:

For years, genetic-testing startup 23andMe was working to develop a cutting-edge technology that could dramatically expand what its customers might learn about their DNA. While the company’s core product, a $199 “spit kit,” can tell you about your health and ancestry based on small bits of your genetic code, tests based on the new technology — called next-generation sequencing — could provide much more comprehensive information, including your potential risks for many diseases.

But 23andMe has given up on the technology for now, BuzzFeed News has learned.

I think one way to understand what’s going on is that though the firm’s consumer face is still as a DTC personal genomics outfit, it is really banking on becoming a genetically savvy pharmaceutical corporation. Genomics is the future, but pharm is the present.

23andMe probably has ~1.5 million genotypes now. They’ll confirm more than 1 million. If they had more than 2 million I assume they would tell us they did. What are they doing with those genotypes? It was always understood by most that 23andMe was increasing its database to the point where they could generate associations that academics could not because of lack of statistical power. The problem now, with more than 1 million genotypes, is that they need phenotypes.

It is much more valuable for 23andMe to get rich data on one customer, than it is to gain one hundred more random genotypes. That’s probably why they’re not sweating that the $199 price point discourages people, especially when those people are getting less than they did in the past. That’s also why they are pulling out of the game in next generation sequencing. Sequencing is basically a commodity business now, and just not as good a return on investment as gearing toward the pharmaceutical market. Sequencing deeply has some benefits, but there is no way 23andMe would be able to subsidize the $1,000 cost of a good 30x genome to get enough of a sample size to return the investment.

None of this is a big secret. A friend of mine was talking about this in the broadest sketches at the 23andMe party at ASHG.

• Category: Science • Tags: 23andMe, Genomics 

Update: In light of further comments I may have been wrong about Hong’s recent admixture! See the comments below (also, further discussion with Spencer Wells offline). I don’t have total clarity on what’s going on, because I’m sure my friends weren’t lying…but they were also early adopters, and the methods may have changed. And, I do think 23andMe has the talent and methods to resolve Korean ancestry, so it’s a matter of investment, not data.

All that being said, all individuals should pull down the raw data and do a reanalysis.

End update

Quartz has an article up, 23andMe has a problem when it comes to ancestry reports for people of color, which I want to comment on at length. Though literally taken the title is not something I’d disagree with too much, the tone and details I have serious issues with.

First, some disclosure. Hong talked to me on the phone for an hour about this story. Mostly we talked about her Korean ancestry results. More on that later. Second, I consulted for 2.5 years for Family Tree DNA, am friends with Spencer Wells (who is quoted), and am on friendly terms (I’d like to think!) with Joanna Mountain, and quite respect many of the scientists at 23andMe (e.g., Kaisa Bryc and Ivan Juric off the top of my head).

I will go through the article point by point. First:

I doubt that most 23andMe users realize how paltry the company’s data is for non-Caucasians. For example: The data set that 23andMe used to generate my report has 76 Koreans in it, according to Dr. Joanna Mountain, the company’s senior director of research. 76 Koreans. It is estimated there are at least 7 million Koreans living outside of the Korean peninsula—including 1.7 million in the US—among a worldwide population of 83 million.

Seventy-six Koreans seemed small to me, but what do I know? I’m just a journalist. So I spoke to geneticist Spencer Wells, founder and former director of National Geographic’s Genographic Project (arguably a 23andMe competitor), which he ran from 2005-2015. “[76] is a really low number,” he concurred.

The small sample sizes seem really, really problematic if you are a lay person, or a journalist. The issue is that with genotype technology that looks for common polymorphisms you really don’t get that much more information from 1,000 individuals than you do from 100. All things equal, more sample size is better, but the gap between 10 and 100 is much much greater than 100 and 1,000 or 100 and 10,000. You can see this in the robustness of results for model-based clustering conditional on different sample sizes. For a homogeneous population like the peoples of the Korean peninsula, who seem relatively panmictic, a bigger sample size would have only marginal effect on the overall outcomes using these methods (also, it might matter if you were looking at low-frequency alleles from whole genome sequencing).
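The diminishing return from larger reference panels can be illustrated with a simple calculation. The sketch below is not how 23andMe's clustering works; it only assumes that the quantity being estimated is the frequency of a common allele, and that the estimate has the usual binomial standard error. The 30% frequency is a made-up value for illustration.

```python
import math

def allele_freq_standard_error(p, n_individuals):
    """Binomial standard error of an allele frequency estimated from
    n_individuals diploid samples (that is, 2 * n_individuals chromosomes)."""
    return math.sqrt(p * (1.0 - p) / (2 * n_individuals))

# Hypothetical common polymorphism at 30% frequency (illustrative value only).
P = 0.30
for n in (10, 100, 1000, 10000):
    se = allele_freq_standard_error(P, n)
    print(f"n={n:>6}  standard error = {se:.4f}")
```

Each tenfold increase in sample size shrinks the error by the same factor (the square root of ten), so the absolute improvement from 10 to 100 individuals is far larger than the improvement from 100 to 1,000.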

Before I talked to Hong I checked in with a friend who was half north Korean (in that her father’s family was from the northern half of the peninsula and migrated south) and half central Korean (i.e., her mother’s family was from around Seoul). Just like her husband, whose family was from Busan in the far south, her results came back as 99% Korean. Some genetic research has been done on Koreans, and there just isn’t that much structure. The Koreans have a composite origin if you go far back enough, but they’ve been intermarrying with each other a long time.


Also, astonishingly, the report shows that I am 13.4% Japanese and 14% Chinese—and only 61.6% Korean. I was looking forward to watching my parents freak out. My sister texted me, “Oh [Dad will] probably blame Mom.”

To my disappointment, my parents did not freak out, nor did they get into an amusing argument about which of their ancestors was the ho. Because they simply did not believe the data. And, for once, they were right.

The public relies on journalists for the truth. Sometimes the truth can be slippery. But sometimes it is clear. Most of conversation between Hong and myself was about her Korean ancestry. As I said to her, I asked a handful of my Korean friends about their 23andMe results before we spoke. From that I told Hong I was 99% sure that she had recent non-Korean ancestry. 23andMe’s results are really robust. I tried to emphasize that over and over. Hong can believe what she wants, but it is obvious that she almost certainly has non-Korean ancestry relatively recently in the past.

Because 23andMe uses chromosome painting, you can see she has very long segments of inferred Chinese and Japanese ancestry. This non-Korean ancestry is probably from within the last three generations because ancestry tract lengths indicate that recombination hasn’t broken apart the associations across the chromosomes (there are 20-40 recombination events across the genome per generation).
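The link between tract length and recency can be made concrete with a common rule of thumb. The sketch below assumes a simplified model that is not spelled out in the post: segments inherited from an ancestor a given number of meioses back are roughly exponentially distributed with a mean of about 100 cM divided by that number, ignoring chromosome ends and crossover interference. The function names are hypothetical.

```python
import math

def expected_segment_cm(meioses):
    """Rule-of-thumb mean length (in cM) of a segment inherited from an
    ancestor separated from you by the given number of meioses."""
    return 100.0 / meioses

def prob_segment_longer_than(length_cm, meioses):
    """Probability that such a segment exceeds length_cm, assuming an
    exponential length distribution (a simplifying assumption)."""
    return math.exp(-meioses * length_cm / 100.0)

for k in (2, 3, 6, 10):
    print(k, expected_segment_cm(k), round(prob_segment_longer_than(50, k), 4))
```

Under these assumptions, very long tracts become rapidly less likely as the number of intervening generations grows, which is the intuition behind reading long segments as a sign of recent admixture.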


I asked Wells whether my percentage breakdowns of Korean, Chinese, and Japanese meant anything. “Yes,” he said, “but I think it is misleading to go to a decimal place or even to go out two digits.” Wells said that another problem with the data is that “Most of those [samples] are from the US. They’re not terribly useful for studies of indigenous composition—which is effectively what this analysis is trying to do.”

I had a long text conversation with Spencer on this after the article came out. I can see where he’s coming from. And 23andMe does have a shortfall of indigenous and non-European samples. But as I said, I asked around to Korean friends who had used 23andMe before and the population is pretty homogeneous, and the friends’ results I cited above were representative. I have also worked with and seen samples from Family Tree DNA, and it’s the same story. There might be undersampled populations from Korea, but I’d bet against it. Koreans are relatively homogeneous, with a position between Japanese and North Chinese. Where you would expect them to be.

Spencer is correct about the decimal places issue. They give people a false impression of precision. I do know that scientists within DTC companies struggle against it. But scientists don’t always win these arguments.


I also interviewed Harvard geneticist Robert Green, who made the important point that private companies have different methods and standards from those of an academic lab. “There is a difference between analysis you can do with hundreds of [genetic] markers at a research level, and the kind of analysis that even the best companies can do, which is more an approximation,” he said.

Green is a medical geneticist who does great work. But I’ll be generous and assume he’s taken totally out of context here, because what he says makes no sense. The genotyping platforms do have error rates (no-calls, mistypings, etc.) on the order of 1%. But they’re using hundreds of thousands of SNPs. This error rate doesn’t matter too much for what 23andMe is doing in relation to ancestry. And with population structure inference these errors usually don’t cause a major issue if they aren’t systematic.

Then there’s this:

A few of the geneticists I interviewed for this article (but not Green or Wells) outright accused 23andMe of commercially driven ethnic bias. For example, no distinction is made between northern and southern Chinese, who have very different traits. This was a serious allegation, so I put the question directly before 23andMe’s Mountain. “As a scientist, I find that insulting,” she said in a phone interview.

I brought up the issue with the Chinese to Hong, and I apologize to Mountain here if it came off as offensive, because I certainly didn’t mean it that way. My point, which I’ve brought up for years both in public, and when I have consulted for DTC companies, is that South and East Asians are huge groups, and it’s incongruous that they aren’t differentiated as much as the Europeans. These tests basically tell you that you are South Asian, or Chinese, or Korean, or Japanese. In the case of Koreans and Japanese there isn’t that much structure within these groups, but that is not the case with the Han Chinese. There is a decent amount of structure, but last I checked 23andMe has a catchall Han Chinese group. Why? I’ll get to that later. (It’s not because they don’t have the data.)

Though I disagree with the tone and the emphasis, a simple inspection by Hong has shed light on something that has been glaringly obvious in the genetic genealogy community: there is laser-like focus on differentiating very close Northern European groups, such as Irish and English, and not so much emphasis on differentiating diverse populations such as South Asians. This was one thing I did talk to Hong about at length. I don’t think it’s crass racism, and I think that I made that clear to her, but I’m not happy with the situation either (23andMe representatives know I’m not happy, and have talked to me about it at ASHG).

The final sections involve Hong reviewing the disparities in sample representation. As I said above, some of this is overdone. But, it is a little ridiculous that there are only a few hundred African population samples in their data. Granted, it turns out that between-population genetic distance in Africa is actually not as much as you’d think based on aggregate variation (the within population variation is what makes all the news). I think Hong is correct that 23andMe should have made more effort on sample collection these past few years…but I’m not CEO of 23andMe, and Joanna Mountain and her scientists don’t call all the shots. I think Hong’s piece leaves Mountain and the researchers holding the bag for something that really isn’t their doing (perhaps it is, but I’m really skeptical of that).


Could the company be doing a better job with collecting ethnographic data? “Absolutely they could,” Wells said, “but it’s not their raison d’être.” Which, of course, is pharma and health research. Fair enough—it’s their money. But how about a disclaimer attached to the ancestry part of the report? Like, “for entertainment purposes only?” Because data based on 76 Koreans (or any other ethnic group) is definitely not worth potentially causing family discord or a blood feud. I don’t know whether the company understands the realities of deadly global ethnic tensions and the potential damage created by people’s trust in these reports.

I think Spencer has highlighted the major dynamic here: 23andMe is pivoting towards biomedical research. It has a database of north of a million, mostly European-origin individuals. The real money now comes from leveraging the database to collect information on health, and combining it with the genotypes they already have. On the margin, getting greater population diversity is probably not a major avenue by which they could gain higher valuations. And getting from one million to ten million genotypes is nothing without increasing their database of phenotypes.

The real story here is not one of racism. It’s one of capitalism. Most of 23andMe’s customers are white European in ancestry, and a disproportionate number of those are Northern European. Is it a surprise that their tools break down Northern European ancestry so finely? That’s their customer base.

Second, many Asians I’ve talked to are relatively uninterested in fine-grained breakdowns in their ancestry. For several years I worked with an engineer from Fujian, and his Family Tree DNA results showed that he was shifted toward the southern end of the north-south Chinese cline. He didn’t care at all, because he was from Fujian, so of course he knew this. Many Asians seem to have this attitude where the ancestry results are viewed as confirmatory. Hong’s case, where there was a surprise, is exceptional.

If 23andMe wanted to they could easily break down Asians into further subcomponents. I think there are two reasons they don’t want to, aside from the firm’s recent focus on health and pharma. First, they don’t have that many Asian customers. Second, their Asian customers might actually get a bit irritated!

Ultimately, Hong can think whatever she wants to about her 23andMe results. But the data are out there. It’s pretty obvious that unless there was a sample mix-up, she has recent Chinese and Japanese ancestry (she could put the raw results in the public domain and have people cross-check with other methods, like PCA; I’m pretty sure they would confirm the 23andMe results).

On a last nerdy note: the data generated by DTC companies is great. Their Illumina SNP-chips are really good, with 99% or so correct-call rates. Hong referred to data in the piece when she really meant results. The thing is that results are basically generated through a sieve of methods geared toward human digestibility. 23andMe and other DTC companies differ because of different methods and parameters in those methods, that are determined by what humans want out of these techniques. But the data, that’s pretty straightforward and robust.

If you are interested in a more philosophical take, Joe Pickrell’s What is ancestry?

Addendum: My conversation with Hong was very wide-ranging. We talked about EDAR, random mating populations, and local ancestry deconvolution. Well, perhaps not in those words. It’s a little saddening to me that ultimately what came out of all that is a piece which tries to paint 23andMe as prejudiced against minorities. The only prejudice they exhibit as a firm is against smaller market share.

• Category: Science • Tags: 23andMe 

23andMe just made a huge deal with the biotech firm Genentech. You’ve probably heard about the details elsewhere, but if not, Matt Herper has an excellent lowdown:

A deal being announced today with Genentech points the way for 23andMe, the personal genetics company backed by Facebook billionaire Yuri Milner and Google Ventures to become a sustainable business – even if the company’s discussions with the U.S. Food and Drug Administration stretch on for years.

According to sources close to the deal, 23andMe is receiving an upfront payment from Genentech of $10 million, with further milestones of as much as $50 million. The deal is the first of ten 23andMe says it has signed with large pharmaceutical and biotech companies.

Such deals, which make use of the database created by customers who have bought 23andMe’s DNA test kits and donated their genetic and health data for research, could be a far more significant opportunity than 23andMe’s primary business of selling the DNA kits to consumers. Since it was founded in 2006, 23andMe has collected data from 800,000 customers and it sells its tests for $99 each. That means this single deal with one large drug company could generate almost as much revenue as doubling 23andMe’s customer base.

From what I know 23andMe has been losing money on the sales of kits. They were loss leaders. And 23andMe isn’t a non-profit. Though “insiders” have been talking for years about how 23andMe has wanted to start selling its huge database to pharma/biotech, even if you were a casual consumer you could connect the dots and assume that there’s a reason they were collecting data on your traits and pestering you to fill in all sorts of personal details on your profile. It’s a business. And businesses try to work the angles to make a profit.

Nevertheless, some people are outraged. My question is simple: is there something about genetic and medical information that privileges it and makes it more precious than the enormous cloud of data private firms already have on you? I don’t think genes are magic, so I’m not convinced, though there are some real issues like life insurance risk for those with highly penetrant disease variants.

Years ago I had a discussion with Mike Snyder about the utility of genomic information for practical ends such as in personalized medicine. The problem to me seemed to be that sample sizes for most academic and even industry studies were just too small. Everything was too under-powered to really find much new and interesting. The real juice would be squeezed by enormous sample sizes on the order of hundreds of thousands, as well as whole genomes at reasonable coverage, intersected with non-genetic data. Yes, health history, but also various lifestyle factors. All of this requires a level of permeability across what are today information silos, and also a comfort of individuals with their information swelling the massive data cloud. The upside on the individual scale is a yield of new results. Of course the problem is that this system encourages “free riding” from those who wish to receive the benefits of the research, but do not want to “opt-in.” They get the benefits of science without sacrificing their privacy. 23andMe has at least shown that this isn’t an insurmountable problem, as it has convinced hundreds of thousands to opt-in to its research. The question is whether this was done transparently.

• Category: Science • Tags: 23andMe, Personal Genomics 
IBD segments in my children

One minor aspect of having multiple children when you’re interested in genetics is that you get to “run your own experiment”, so to speak. The above image is a screenshot of 23andMe’s Genome View feature, which basically allows you to compare your genome with relatives. Above you see the comparison of the autosomal genome of my son and my daughter. They’re full siblings. So the expectation of their relatedness in terms of identity by descent is 0.50. The reality is that it’s 0.53. They’re a little more related than the expected value. But a simple summary isn’t that interesting. What’s obvious above is that across long regions of the genome my children are identical twins. In other regions of the genome they’re half identical. Finally, there are segments where they aren’t related at all. If you are a bit confused how that could be, imagine that my daughter inherits segments from her maternal grandmother and paternal grandfather, while my son inherits segments from his paternal grandmother and maternal grandfather.
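The 0.50 expectation, and the three kinds of region (fully identical, half identical, unrelated), can be recovered by brute-force enumeration at a single locus. This is a minimal sketch assuming unrelated parents and independent transmission from each parent; the function name is made up for illustration.

```python
from itertools import product
from collections import Counter

def sibling_ibd_distribution():
    """Enumerate which of each parent's two chromosome copies two full
    siblings inherit at one locus, and count how many copies they share
    identical by descent (0, 1, or 2)."""
    counts = Counter()
    # Each sibling receives copy 0 or 1 from the father and copy 0 or 1 from the mother.
    for pat1, mat1, pat2, mat2 in product((0, 1), repeat=4):
        shared = int(pat1 == pat2) + int(mat1 == mat2)
        counts[shared] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in (0, 1, 2)}

dist = sibling_ibd_distribution()
# Fully identical regions count in full, half-identical regions count half.
expected_relatedness = dist[2] * 1.0 + dist[1] * 0.5
print(dist, expected_relatedness)
```

A quarter of the genome is expected to be fully identical, half to be half identical, and a quarter to share nothing, which averages out to 0.50. Because real chromosomes are inherited in a limited number of long blocks, any actual pair of siblings will scatter around that value, as with the 0.53 observed here.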

It’s called life in the 21st century.

• Category: Science • Tags: 23andMe, Genomics 
Razib’s daughter’s 23andme Ancestry Composition

Everyone knows that I think 23andme provides a great service. But I’ve had some criticisms in the past. Several years ago I thought it was rather strange of them to limit their chromosome painting feature to only a few ancestral components when it produced strange confusing results (e.g., many East Africans being mostly European in ancestry). Over the past few years they’ve nicely expanded their ancestry components, addressing this concern. But at this point my own inclination is to say that they’ve gone too far. For example a friend who is ethnically Japanese from Honshu gets these results: 76.1% – Japanese, 3.5% – Korean, 19.7% – Nonspecific East Asian, 0.4% – Nonspecific East Asian & Native American and 0.3% – European. I can give you reasonable explanations for these proportions, but it’s going to be confusing for many Japanese to be told they’re only 3/4 Japanese genetically. The issue is the scope of their reference population. It’s not capturing the diversity of the whole population of Japan.

But this is a minor concern in comparison to something else I’ve noticed. Here are some ancestral proportions from my family:

Individual               French & German   British & Irish
Razib’s daughter         14.0%             8.0%
Razib’s wife             5.8%              1.6%
Razib’s father-in-law    5.2%              2.4%
Razib’s mother-in-law    3.0%              3.1%

I’ll tell you right now that I don’t have any European ancestry according to 23andMe, so my daughter’s elevated French & German and British & Irish have nothing to do with her paternal lineage. In addition I can look at the chromosome level ancestry. Most of my daughter’s chromosome 3 and all of chromosome 6 are French & German (at least the ones obviously inherited from her mother). But there is no French & German ancestry for my wife on these chromosomes (either copy). There is one recombination event from her maternal chromosome 6 and three for chromosome 3, but I don’t think that should have such a large effect.

So hypotheses? My own hunch is that clusters like French & German are somewhat artificial, insofar as they cover a very large geographic area (though granted Europe from the Bay of Biscay to the Elbe is definitely relatively genetically homogeneous). People of mixed European ancestry, like many American whites, often may resolve strangely because the methods used have a difficult time distinguishing mixed ancestry from populations which are composed of mixed ancestry (like many American whites the “French” have diverse ancestries from different regions of Europe, so many Americans may look somewhat “French”). A friend from Guatemala who is ethnically mestizo of many generations has 20% unassigned ancestry, presumably because so many recombination events have intercalated Amerindian and European segments, making it impossible for 23andMe to give a correct assignment. My parents and myself have unassigned proportions of nearly 10%, likely due to ancient admixture between our South and East Asian components, which dates back over 1,000 years in the past.

Personal genomics services are great, and I heartily recommend them. But people should be careful about taking all the results at face value. It’s like with anything else, be an informed and cautious consumer. It can be great for some things, like Dan MacArthur’s South Asian ancestry, which literally jumped out of the genetic background. But the finer you get in the grain, the more confusion will result.

Update: In the comments 23andMe scientist Eric Durand asked if I’d enabled split view. The reason he asks is that 23andMe phases the genotypes (reconstructs each physically linked segment of chromosome of the pair) before assigning ancestry along a segment. If you don’t have family information you have to use population based information. But more powerful are parental trios, since offspring simply have mixed & matched segments of their parents. And yes, I do have split view enabled, and checked it. Same weird result:



• Category: Science • Tags: 23andMe, Personal Genomics 

As most of you know, 23andMe is no longer providing health interpretation services, though they are still providing genealogy (and are rolling out a more advanced ancestry painting right now). You can still download raw data though, so you can find third party providers to calculate health risks. Unfortunately, as some have noted, 23andMe actually did a rather good job outlining the probabilities. And, I think this answers the question about what 23andMe’s strategy was when it started courting FDA intervention about a year ago when it began to emphasize their health services (you probably encountered the marketing push somewhere). There doesn’t seem to have been a master strategy with contingencies. I obviously wish the company luck, and am hopeful it will make it through this rough patch, but they need to be on the ball from now on. For the customer base, I think it is fine to order their kits so you can get the raw data and genealogy services.

• Category: Science • Tags: 23andMe 

Earlier editions:

Using your 23andMe data: exploring with MDS
Using your 23andMe data in Plink

From Reconstructing Indian Population History:

We hypothesize that founder effects are responsible for an even higher burden of recessive diseases in India than consanguinity. To test this hypothesis, we used our data to estimate the probability that two alleles from a group share a common ancestor more recently than that group’s divergence from other Indians, and compared this to the probability that an individual’s two alleles share an ancestor in the last few generations due to consanguinity…Nine of the 15 Indian groups for which we could make this assessment had a higher probability of recessive disease due to founder events than to consanguinity, including all the Indo-European speaking groups (Table 2). It is important to systematically survey Indian groups to identify those with the strongest founder effects, and to prioritize them for studies to identify recessive diseases and map genes.

South Asian populations exhibit a lot of between population genetic distance, and not simply as a function of geography. With more markers and an expansive data set Dan MacArthur will be able to assess exactly which South Asian caste his ancestry is from.

But this is an issue where I have fancied myself an outlier. My own background is moderately heterogeneous, and I’ve always explained to people that I’m not inbred like most South Asians, only half in jest (from what I can tell Muslims in the subcontinent have castes too, though they may use somewhat different terminology). I know that my paternal grandmother came from a Brahmin family (clear by the customs preserved in the family even in her generation), while my maternal grandfather was almost certainly from a group with a Kayastha origin (going by surname, and who my mother actually clusters with). My maternal grandmother had considerable non-Bengali ancestry, which does show up in Middle Eastern signatures in my mother.

But this is talk. Am I truly not as inbred as the average brown? Leveraging methods which I discussed earlier (see posts above) I can very quickly check this.

First, you need to prune your data set to a reasonably homogeneous reference population which resembles your own ethnic makeup. The way you infer the extent of inbreeding is simply to look at the distribution of genetic variants, and see how shifted away from the population norm you are. Since different populations have different background distributions putting yourself within the wrong reference set leads to absurdity. Compared to a Bushmen reference every non-African would come out as inbred. The computation is not faulty, but it’s not giving you useful information.

In the .fam file of PHYLO I picked out every single non-Pakistani South Asian as my reference, mostly Gujarati, but with some South Indians as well. By looking at the expected genotypes by pooling this population together I want to get a sense of my own place. Additionally I’ll add my daughter and my 1/4 Filipino friend as controls, in that they should be way less “inbred” than everyone else since they are the products of recent admixture.

After using the --keep function of Plink I merged the file with my own, my daughter’s, and friend’s. There were north of 90,000 SNPs, more than sufficient for the simple computation I wanted to do. I’ll output the F-statistic with the --het function like so:

plink --noweb --bfile DATASET --het

The output is in plink.het. You’ll see the labels in the leftmost column, and the statistic you want in the rightmost column. The results below are sorted from most to least inbred, at least using the F-statistic as a measure of that (this isn’t really totally accurate because the population isn’t really a homogeneous random mating set, but I think it gets the intuition out there):

The reason that my daughter and my friend have negative values is that they have way fewer homozygotes than you should get by chance. But they’re recent admixtures, so the question of inbreeding is near not-even-wrong for them. The Plink documentation says that negative F values are noise (they are not contamination in this case), but I think I’ll chalk it up to a not-totally-homogeneous population. My position on this list is not as low as I’d like, but I’ll take it. I believe I can still claim I’m less inbred than the average brown.
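To see why fewer homozygotes than expected gives a negative number, here is a minimal sketch of the kind of method-of-moments estimate that the F column reports: observed homozygous calls minus expected, divided by total calls minus expected. It assumes Hardy-Weinberg expectations from reference allele frequencies; Plink's own implementation may differ in details such as small-sample corrections, and the function name and toy values below are hypothetical.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Estimate F for one individual.

    genotypes: alternate-allele counts per SNP (0, 1, 2), or None if missing.
    alt_freqs: alternate-allele frequency per SNP in the reference population.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip no-calls
        n_called += 1
        if g in (0, 2):
            observed_hom += 1
        # Hardy-Weinberg probability of being homozygous at this SNP.
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    return (observed_hom - expected_hom) / (n_called - expected_hom)

# Toy data: four SNPs, each at 50% frequency in the reference set.
freqs = [0.5, 0.5, 0.5, 0.5]
print(inbreeding_f([0, 2, 0, 2], freqs))  # all homozygous
print(inbreeding_f([0, 1, 2, 1], freqs))  # as many homozygotes as expected
print(inbreeding_f([1, 1, 1, 1], freqs))  # all heterozygous, as in a recent admixture
```

The estimate only means something relative to the reference frequencies supplied, which is why the choice of reference population above matters so much.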


Note: please read the earlier post on this topic if you haven’t.

The above image is from 23andMe. It’s from a feature which seems to have been marginalized a bit with their ancestry composition. Basically it is projecting 23andMe customers on a visualization of genetic variation from the HGDP data set. This is actually a rather informative sort of representation of variation. But there has always been an issue with the 23andMe representation: you are projected onto their invariant data set. In other words, you can’t mix & match the populations so as to explore different relationships. The nature of the algorithm and representation produces strange results, so varying the population sets is often useful in smoking out the true shape of things.

With the MDS feature I wrote about yesterday you can now compute positions with different weights of populations and mixes. This post will focus on how to manipulate the overall data set. You should have PHYLO from the earlier post. Open up the .fam file. It should look like this:

Malayan A382 0 0 1 -9
Paniya D36 0 0 1 -9
BiakaPygmies HGDP00479 0 0 1 -9
BiakaPygmies HGDP00985 0 0 1 -9
BiakaPygmies HGDP01094 0 0 1 -9
MbutiPygmies HGDP00982 0 0 1 -9
Mandenkas HGDP00911 0 0 1 -9
Mandenkas HGDP01202 0 0 1 -9
Yorubas HGDP00927 0 0 1 -9
BiakaPygmies HGDP00461 0 0 1 -9
BiakaPygmies HGDP00986 0 0 1 -9
MbutiPygmies HGDP00449 0 0 1 -9
Mandenkas HGDP00912 0 0 1 -9
Mandenkas HGDP01283 0 0 1 -9
Yorubas HGDP00928 0 0 2 -9

And so forth. PHYLO has 1,500+ individuals. This is a bit much, which is why the --genome command took so long. To ask particular questions it is often useful to prune the population down. I have a friend who is 1/4 Filipino who is curious as to whether his ancestry was more Chinese or native Filipino. How to answer this?

– You want a range of East Asian populations, north to south.

– You want a good out group. I’ll use the Utah whites.

All you need to do is go through the .fam file and keep only those lines you want, and put them into a new file, keep.txt. Then you run this command:

plink --noweb --bfile PHYLO --keep keep.txt --make-bed --out PHYLONARROW
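If you would rather not edit the .fam file by hand, the filtering step can be scripted. This is a minimal Python sketch, assuming the first two columns of each .fam line are the family (population) ID and the individual ID, which is what --keep reads. The population labels come from the .fam excerpt above; the helper names are hypothetical, and you would substitute whatever labels your own copy of PHYLO uses.

```python
# A few lines copied from the .fam excerpt above, for illustration.
SAMPLE_FAM = """
BiakaPygmies HGDP00479 0 0 1 -9
Mandenkas HGDP00911 0 0 1 -9
Yorubas HGDP00927 0 0 1 -9
MbutiPygmies HGDP00449 0 0 1 -9
Yorubas HGDP00928 0 0 2 -9
"""


def select_fam_lines(fam_text, wanted_populations):
    """Return 'FID IID' strings for individuals in the wanted populations."""
    wanted = set(wanted_populations)
    keep = []
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fid, iid = fields[0], fields[1]
        if fid in wanted:
            keep.append(f"{fid} {iid}")
    return keep


def write_keep_file(path, keep_lines):
    """Write the selected IDs, one 'FID IID' pair per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(keep_lines) + "\n")


keep = select_fam_lines(SAMPLE_FAM, {"Yorubas", "Mandenkas"})
print(keep)
```

Calling write_keep_file("keep.txt", keep) would then produce the file that the command above expects. The same file works with --remove if you want the complement instead.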

So I’ve now made a new pedigree data set which is a subset of the original. Then I merged my friend’s and my daughter’s genotypes into this data set. What if I wanted to remove some individuals, for example, the ones in keep.txt? You do it like so:

plink --noweb --bfile PHYLO --remove keep.txt --make-bed --out PHYLOAFEWGONE

With --keep and --remove, and making files drawn from the .fam file(s), you can customize your own data set for your own purposes. Again you want to produce an MDS, so run:

plink --noweb --bfile PHYLONARROW --genome

plink --noweb --bfile PHYLONARROW --read-genome plink.genome --mds-plot 6

This time --genome will run very fast, because there are far fewer individuals. Here is my plot of the result (my friend is “RF,” my daughter is “RD”):

Note that RF is aligned straight toward the “Dai” population, an ethnic group from South China, but not Han (they are related to the Thai). It seems plausible that my friend is of mixed Chinese and Filipino background. My daughter’s minimal East Asian ancestry is indeed Southeast Asian, and this is clear from this plot, as she is shifted further toward the Cambodians (this may be due to South Asian affinities as well).
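As a rough numeric companion to eyeballing the plot (this is not how the plot above was made), you can average the MDS coordinates by population. The sketch below is Python and assumes the MDS output has a header row with FID, IID, SOL and then C1, C2, and so on. The coordinates and the "UtahWhites" label are invented for illustration.

```python
from collections import defaultdict

# Illustrative only: made-up coordinates and hypothetical labels.
SAMPLE_MDS = """
FID         IID   SOL  C1      C2
Dai         D1    0    0.030   0.010
Dai         D2    0    0.034   0.014
UtahWhites  U1    0   -0.050   0.000
UtahWhites  U2    0   -0.054   0.004
RF          RF    0   -0.030   0.003
"""


def population_centroids(mds_text, dims=("C1", "C2")):
    """Average the chosen MDS dimensions within each family/population ID."""
    lines = [ln.split() for ln in mds_text.strip().splitlines() if ln.strip()]
    header, body = lines[0], lines[1:]
    fid_col = header.index("FID")
    dim_cols = [header.index(d) for d in dims]
    sums = defaultdict(lambda: [0.0] * len(dims))
    counts = defaultdict(int)
    for fields in body:
        fid = fields[fid_col]
        counts[fid] += 1
        for k, col in enumerate(dim_cols):
            sums[fid][k] += float(fields[col])
    return {fid: tuple(s / counts[fid] for s in sums[fid]) for fid in sums}


for fid, centroid in population_centroids(SAMPLE_MDS).items():
    print(fid, [round(c, 3) for c in centroid])
```

Centroids flatten a lot of structure, so treat them as a sanity check on the plot rather than a replacement for it.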

The point is not to rely on one plot, but to generate many so as to explore the possibilities, and develop an intuition.

• Category: Science • Tags: 23andMe, Genomics 

Looks like 23andMe has a new $99 price point. If so, that’s 100 markers per cent! (here’s the press release)

1) Privacy: Yes, this is a privacy risk. 23andMe is fundamentally an IT company, and IT companies mess up. But I am confident that within 10-15 years genetic information is going to be pretty easy to get anyhow. Your data will be in too many places for any expectation of privacy.

2) Cost/worth it: That is dependent on your income. If you are willing to spend $100 on a nice meal, I think $100 for 1 million markers is an excellent proposition. The markers never depreciate, though in the near future you will get sequence data which will supersede them.

• Category: Science • Tags: 23andMe, Personal Genomics 

At this point if you have spare cash why not shell out $300 for a raw copy of your genotype? (yes, I know 23andMe provides other services) I’m sure many readers spend $100 on nice meals now and then. That’s one day. Your genotype won’t ‘depreciate’ in a literal sense, and more practically until whole-genome sequencing gets affordable within the next decade (i.e., < 10 years) 1 million SNPs is a pretty good deal. And not to be morbid, but it is probably best to get older family members typed now (though if they have had hospital stays you can probably later retrieve genetic material, it will be a bureaucratic pain).

The reason I’m posting this now though is that I received a notification about a $50 discount code from 23andMe. Here it is: YHPRD7. It’s valid for the next few days. $50 isn’t trivial for most people, so perhaps it will prompt a few here to go and purchase.

• Category: Science • Tags: 23andMe, Personal Genomics 

23andMe has done some great things, and I highly recommend its service to friends. But I’m really glad that CeCe Moore is being consulted by them in regards to improving their ancestry feature set. Below are the “ancestry paintings” for myself & my daughter.

According to 23andMe I’m 40% Asian, and she is 8% Asian. Obviously something is off here. The situation easily resolved itself when I tuned my parameters and increased my sampled populations in Interpretome. But it just goes to show you the limits of this sort of thing without fine-grained control of the details of the analysis.


• Category: Science • Tags: 23andMe 

A few readers have pointed me to controversies having to do with 23andMe’s “terms of use”. You can read about it over at Your Genetic Genealogist, who has two posts up on the issues. I think the crux is that the early enthusiasts for personal genomics in the genetic genealogy community can not support the revenue needs of a firm like 23andMe. The question for the firm is how to expand its reach more fully into the domain of personalized healthcare, where the big money and mainstream impact is, without alienating these early adopters, who are not bashful about spreading bad buzz all over the blogosphere.


From what I can tell there’s a lot of confusion as to what’s going on. Myself, I don’t care about the details too much. My main interest is getting the raw data, I don’t pay that much attention to the various health & genealogy services that 23andMe provides. But I can understand why others feel differently. I also know that 23andMe is not irrational, and is trying to run a firm which can generate a profit. They’re not a charity.

The key is how they can make the “person on the street” more interested. I have purchased eight accounts in their system, most of them with the monthly personal genome service fees. It’s pretty clear that most of the people who I’ve purchased these accounts for don’t pay close attention to the results. Yes, they were curious, but they haven’t kept up with the health report updates, or explored the other services. Obviously I’m going to cancel the subscriptions for that reason, as I’m not interested in paying for a service that’s not being utilized.

I wish 23andMe, and all the new personal genomics firms, the best of luck. This is a time of great change, and I think in 2020 this sort of service is going to be a seamless part of our lives. But working out the details isn’t always going to be without error (my own suggestion would be a reversion to more fine-grained service with the subscriptions). Life comes at you fast….

• Category: Science • Tags: 23andMe, Personal Genomics 

From 23andMe: “To show our appreciation and to encourage others to join in this research revolution we are giving you a $50 coupon that you can share with as many people as you like. This coupon expires in 7 days (August 9, 2011) so make sure you get the word out fast.” At current prices that works out to 24% off the yearly price ($9/month x 12 months + $99 = $207).

(this is for “new customers only”)
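The 24% figure is easy to verify. A quick Python check, using only the prices quoted above (the variable names are just for illustration):

```python
monthly_fee = 9      # dollars per month for the subscription
months = 12          # one-year commitment
kit_price = 99       # up-front price in dollars
coupon = 50          # value of the discount code in dollars

first_year_cost = monthly_fee * months + kit_price   # 108 + 99 = 207
discount_pct = 100 * coupon / first_year_cost         # roughly 24.2

print(f"${first_year_cost} total, {discount_pct:.1f}% off")
```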

• Category: Science • Tags: 23andMe, Personal Genomics 

First, Sam Snyder. Here’s the link to the file in dropbox.

Second, Heather Frawley. I’ve uploaded her text file as well as pedigree format at RapidShare as a zip file. Click “Free Download” at the bottom right of the page. It’ll take about 5 minutes to pull down the 10 MB file.

Remember, if you want to have your public genotype posting publicized or want me to upload and format it, email me at contactgnxp -at- gmail -dot- com.

• Category: Science • Tags: 23andMe, Personal Genomics 

According to Your Genetic Genealogist, it is:

1,000 African American
3,500 Latino/Hispanic
5,500 East Asian
3,400 South Asian
4,900 Southern European
6,200 Ashkenazi Jewish
56,000 Northern European
1,000 First generation from two continents

I’m kind of surprised that there are so few African Americans, since the marginal return on ancestry matching technologies for the black American community is going to be higher than for other groups. If these numbers are true then I have on the order of ~10% of the 23andMe genotypes for black Americans in the African Ancestry Project. Zack Ajmal, referring to the more than 3,000 South Asians, quips: “Now if 10-20% of them would participate in Harappa Ancestry Project!” My main concern is that if HAP gets better known Zack will have hundreds of Tamil Brahmins sending him pretty much duplicate genotypes.

• Category: Science • Tags: 23andMe, Personal Genomics 

Call to Participate in a New Study on Social Networking and Personal Genomics:

Do you share your information with others? How has your personal genetic information influenced your lifestyle and the way you approach your health and medical decisions? Can genetic information create new communities and connections?

The Social Networking and Personal Genomics Study at the Center for Biomedical Ethics invites participants between the ages of 18 and 75 to spend approximately 2 hours with us in a focus group setting. Participants must have purchased direct-to-consumer personal genetic information from 23andMe, Inc., shared their information with others, and be willing to discuss their perspectives and experiences. Focus group members will receive a $50 gift card for their participation and childcare will be available on an as-needed basis at no cost. For additional information or to enroll, please contact Simone Vernez, Project Manager, by email at [email protected] or by telephone at (650) 723-9364. For more information on the study itself, including specific research aims and funding please visit For general information about participant rights, contact 1-866-680-2906.

I released my 1,000,000 SNPs into the public domain yesterday. Why? To borrow a line from William Jefferson Clinton: because I could. And perhaps, because I should? There’s nothing to fear. Genomes Unzipped hit all the salient points when that crew released their data into the public domain. How about releasing yours?

Also, I didn’t consult my parents or my siblings about this decision. I weigh the probability of downside risk to them as trivially low. Far lower than the chance of them getting killed in a car crash. My genome, my right. Of course I’m no Howard Roark, a pure egoist. I asked one person of interest who might suffer some future downside risk with me, and they had no concern about my doing this if my judgement was that the fear was unfounded. Honestly, I’m a cerebral person of the mind, a man of reflection, not action, but this act did make me feel as if I’m effecting a little bit of change in this world.

A friend on Facebook asked me if I wasn’t worried about being Gattacaed in the future. No, not at all. There’s nothing actionable in my SNPs, and once the scientific research catches up with the genotype data I suspect that our full genome will be part of the record which institutions are already accessing routinely. The technology to surreptitiously sequence and analyze the genetic information of others will be there, and will be used. The issue at hand is not if, but when. I think we need to confront the possibility of radical transparency as part of the near term future. Speaking of which: how do you know that is my genotype? Trust me, it is! 🙂

• Category: Science • Tags: 23andMe, Genetics, Genomics, Open genomics 

Dr. Daniel MacArthur at Genomes Unzipped:

23andMe announced yesterday that it will now be releasing information on Alzheimer’s disease risk markers in the APOE gene to customers who purchased their recently upgraded v3 test. The APOE markers are famously associated with a major increase in risk for late-onset Alzheimer’s, with individuals carrying two copies of the ε4 version of the gene being around 15 times more likely than average to develop the disease. Customers who have been tested on the v3 platform will be able to access their APOE status after “unlocking” it; customers on earlier versions of the test will need to upgrade to get access. You can see screenshots of the unlocking and results pages here.

I don’t put much weight on 23andMe’s disease risk estimates since I have a relatively large pedigree, and my four grandparents all made it at least to age 75 (one made it to 100, and two to 80+), so I have some sense of my odds of late onset diseases. But, I will admit I was still a little anxious when “unlocking” my results for this locus. This is a classic “tail risk” event which hooks into all the cognitive biases which we as humans come preloaded with. I will probably die of cancer or heart disease, but not due to a mutation of large effect which exhibits Mendelian inheritance patterns.* But I still fear that possibility!

Well, I don’t have the at-risk genotype. As someone who doesn’t focus too strongly on medical genetics I was surprised that Alzheimer’s is 60-80% heritable. This actually makes me somewhat less worried about this disease for myself, as across my extended pedigree this disease doesn’t seem to crop up very often (my grandparents experienced rapid mental degeneration all within the last year of their lives, it wasn’t slow and gradual). Of course this is balanced by the recurrent issues with circulatory problems and such within my family. But I take proactive efforts to mitigate the environmental component of risk elevation, as well as the gene x environment interactions.

In sum, there’s no long term point in being ignorant. Though I will concede depending on your own psychology there may be a short to medium term benefit to not knowing your long term risks.

* In fact, I don’t have much of a history of cancer in my family. So it will probably be heart disease. This is what killed all four of my grandparents, though my maternal grandfather did make it to 95 before being diagnosed with the illness.

• Category: Science • Tags: 23andMe, Health 

23andMe Sale tomorrow:

For a limited time, you can order a 23andMe kit for $0 up front, plus a 12-month commitment to our Personal Genome Service® at $9/month. This is down from the regular price of $199 plus $9/month.

This promotional price will be available from 12:00AM PST until 11:59PM PST on Monday 4/11/11, or while supplies last!

Update: Sale is a go right now. 5 kits per person.

• Category: Science • Tags: 23andMe, Genetics 

Dan MacArthur points me to this nice post over at Daily Kos, Our Genome Decoded: How Companies Like 23andMe Are Advancing the Field of Personal Genomics:

…However, in the past few years several private biotech companies have started offering a “personal genome service” that involves sequencing the most variable portions of our DNA. The goals are straightforward – to give individuals information about their ancestry and inherited traits. While there are definite limitations – both technically and bioethically – to the amount and type of information that can be obtained from personal genome sequencing, in my case the service answered a lingering question about something important to me, and thus was well worth it.

In this article, I’m going to tell the story about why I chose to purchase a personal genome service, briefly explain how it works, show my interesting results, and finally, provide some commentary on how these services will impact the fields of genomics and medicine.

One step at a time. I also appreciate that Michelle keeps posting on her ADMIXTURE results.

• Category: Science • Tags: 23andMe, Personal Genomics 

Over the past few days I’ve been very disturbed…and angry. The reason is that I’ve been reading Misha Angrist and Dr. Daniel MacArthur. First, watch this video:

In the very near future you may be forced to go through a “professional” to get access to your genetic information. Professionals who will be well paid to “interpret” a complex morass of statistical data which they barely comprehend. Let’s be real here: someone who regularly reads this blog (or Dr. Daniel MacArthur or Misha’s blog) knows much more about genomics than 99% of medical doctors. And yet someone reading this blog does not have the guild certification in the eyes of the government to “appropriately” understand their own genetic information. Someone reading this blog will have to pay, either out of pocket, or through insurance, someone else for access to their own information. Let me repeat: the government and professional guilds which exist to defend the financial interests of their members are proposing that they arbitrate what you can know about your genome. A friend with a background in genomics emailed me today: “If they succeed in ramming this through, then you will not be able to access your own damn genome without a doctor standing over your shoulder.” That is my fear. Is it your fear? Do you care?

In the medium term this is all irrelevant. Sequencing will be so cheap that it will be impossible for the government and well-connected self-interested parties to prevent you from gaining access to your own genetic information. Until then, they will slow progress and the potential utility of this business. Additionally, this sector will flee the United States and go offshore, where regulatory regimes are not so strict. BGI should give glowing letters of thanks to Jeffrey Shuren and the A.M.A.! This is a power play where big organizations, the government, corporations, and professional guilds, are attempting to squelch the freedom of the consumer to further their own interests, and also strangle a nascent economic sector of start-ups as a side effect.

You are so much more than your genes. So much more than those 3 billion base pairs. But they are a start, a beginning, and how dare the government question your right to know the basic genetic building blocks of who you are. This is the same government which attempted to construct a database of genetic information on foreign leaders. We know very well then who they think should have access to this data. The Very Serious People with a great deal of Power. People with “clearance,” and “expertise,” have a right to know more about your own DNA sequence than you do.

What can you do? What can we do? Can we effect change? I don’t know, I can’t predict the future. But this is what I’m going to do.

1) I am going to release my own 23andMe sequence into the public domain soon. I encourage everyone to download it. I would rather have someone off the street know my own genetic information than be made invisible by the government. That is my right. For now that right is not barred by law. I will exercise it.

2) Spread word of this video via social networking websites and twitter. The media needs to get the word out, but they only will if they know you care. Do you care? I hope you do. This is a power grab, this is not about safety or ethics. If it was, I assume that the “interpretative services” would be provided for free. I doubt they will be.

3) Contact your local representative in congress. I’ve never done this myself, but am going to draft a quick note. They need to be aware that people care, that this isn’t just a minor regulatory issue.

4) The online community needs to get organized. We’re not as powerful as a million doctors and a Leviathan government, but we have right on our side. They’re trying to take from us what is ours.

5) Plan B’s. We need to prepare for the worst. Which nations have the least onerous regulatory regimes? Is genomic tourism going to be necessary? How about DIYgenomics? The cost of the technology to genotype and sequence is going to crash. I know that the Los Angeles DIYbio group has a cheap cast-off sequencer. For those who can’t afford to go abroad soon we’ll be able to get access to our information in our homes. Let’s prepare for that day.

This is a call to arms, a start. I’ve been complacent about this issue, focusing more on the fascinating aspects of ancestry inference which are enabled by personal genomics. No more. I’ll be doing a lot of reading today. If you have a blog, post the video. Raise awareness. Let’s make our voices heard. If they take away our rights because we’re silent, we have only ourselves to blame. If they take away our rights despite our efforts, we’ll set up the infrastructure for the day when we can take back what is ours.

P.S. Feel free to post info and ideas in the comments. I just literally woke up to the urgency of this issue in the past 48 hours.

Update: here is Jeffrey Shuren’s email address: [email protected].

• Category: Science • Tags: 23andMe, FDA, Genetics, Genomics, Jeffrey Shuren, Select Post 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"