The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Admixture Analysis Isn't Wrong, It Misleads

Screenshot 2016-09-18 20.57.52

The above results are from Ancestry. You can see here 4% Melanesian. This is common in South Asians. And it’s not an error in the method. Rather, it is a natural outcome of the methods used to generate admixture profiles.

Basically what’s going on is this:

1) You have data. In this case, the data are your own genotypes, as well as those of a set of individuals who represent world genetic variation and are categorized into discrete populations.

2) You have a model or set of models. These models have different parameters.

3) You look at the data you have, and pick the parameters which best explain the data given the model.

If you have 100,000 or more markers, that’s more than enough genotype data for an individual. The models themselves are quite stylized (e.g., sets of randomly mating populations in Hardy-Weinberg equilibrium, or HWE), but close enough to reality to give good results in many cases. For example, Ashkenazi Jews are often assigned to be ~100% Ashkenazi Jewish through these methods.
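The data, model, and fitting steps described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method any testing company actually runs: the function names, the two "panels," and every allele frequency are invented for illustration, there are six markers rather than 100,000, and a coarse grid search stands in for the optimizers real tools use. The model scores each genotype as two independent draws at the person's expected allele frequency, which is the HWE-style simplification.

```python
import math

def log_likelihood(genotypes, ref_freqs, q):
    """Log-likelihood of one person's genotypes given ancestry proportions q.

    genotypes: 0/1/2 copies of the counted allele, one entry per marker.
    ref_freqs: ref_freqs[k][j] is the allele frequency of marker j in panel k.
    q:         one proportion per panel, summing to 1.
    """
    total = 0.0
    for j, g in enumerate(genotypes):
        # Expected allele frequency at marker j for someone with ancestry q.
        f = sum(share * pop[j] for share, pop in zip(q, ref_freqs))
        # Binomial(2, f) log-probability; the constant term is dropped
        # because it does not depend on q.
        total += g * math.log(f) + (2 - g) * math.log(1 - f)
    return total

def best_two_way_fit(genotypes, ref_freqs, steps=100):
    """Try every split between two panels in 1% steps and keep the best."""
    candidates = [(i / steps, (steps - i) / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda q: log_likelihood(genotypes, ref_freqs, q))

# Made-up allele frequencies at six markers for two reference panels.
ref_freqs = [
    [0.9, 0.8, 0.1, 0.2, 0.7, 0.3],  # hypothetical panel A
    [0.1, 0.3, 0.8, 0.9, 0.2, 0.6],  # hypothetical panel B
]
# One person's genotypes, chosen to resemble panel A at every marker.
genotypes = [2, 2, 0, 0, 2, 0]

best_q = best_two_way_fit(genotypes, ref_freqs)
print(best_q)
```

Because this person looks like panel A at every marker, the best split puts all of the ancestry in panel A, which is the toy analogue of a well-matched reference panel returning ~100%.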

Then again, Ashkenazi Jews are a good test case. This is a population which went through a bottleneck about 500 to 1,000 years ago, and has been reasonably endogamous most of this time. Additionally, it’s not extremely structured due to inbreeding in different clan lineages. Though cousin marriage and uncle-niece marriage have been practiced by Ashkenazi Jews, the runs of homozygosity you see in Jewish genomes are not of the kind that indicates a highly inbred population, as is common in the Middle East or South Asia. Rather, there are lots of medium-length segments identical by descent across individuals.

The Ashkenazi Jewish population is rather simple, and it is actually a rather clear and distinct population cluster. It stands to reason that when you create an Ashkenazi Jewish reference panel in your training data set, it’s a pretty good match to the individuals you are testing.

The problems occur when you try to generate clusters and ancestry assignments for populations which are not so clear and distinct. Why do South Asians routinely come out as part Melanesian or Polynesian? This post was prompted by a Facebook thread where a South Asian customer of Ancestry was interested to see she had Polynesian ancestry. The reality is she almost certainly does not have Polynesian ancestry.

What’s going on is that the reference panel for South Asians used by many of the DTC genomics companies is not diverse enough to capture South Asian genetic diversity. There is an element of South Asian ancestry, “Ancestral South Indian” or ASI, which has deep shared ancestry with populations across Southern Eurasia and out toward Oceania. The admixture analysis method is searching through the reference panels for combinations of genotypes which can explain individual genetic variation. Since the South Asian training set is insufficient to explain all the South Asian variation, the algorithms are filling in the balance of the variation with the closest available proxies to the “ghost clusters.”
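A toy version of this "ghost cluster" effect can be sketched in Python. Everything here is an assumption made for illustration: the allele frequencies are invented, the panel has only three populations and six markers, and the hidden component is built, purely for convenience, as an even blend of the panel's South Asian and Melanesian frequencies to stand in for deep shared ancestry. To keep the example deterministic it fits expected allele dosages rather than sampled genotypes, and it uses a coarse grid search rather than whatever optimizer a real pipeline uses.

```python
import math

# Made-up allele frequencies at six markers; every number here is invented.
panel = {
    "South Asian": [0.60, 0.30, 0.70, 0.20, 0.50, 0.80],
    "European":    [0.90, 0.10, 0.40, 0.60, 0.20, 0.70],
    "Melanesian":  [0.20, 0.80, 0.90, 0.10, 0.70, 0.30],
}

# Hypothetical "ghost" component that is NOT in the panel. Assumption: its
# frequencies sit halfway between the South Asian and Melanesian panels.
ghost = [0.5 * sa + 0.5 * mel
         for sa, mel in zip(panel["South Asian"], panel["Melanesian"])]

# Test individual: 60% like the South Asian panel, 40% ghost component.
true_freqs = [0.6 * sa + 0.4 * gh for sa, gh in zip(panel["South Asian"], ghost)]
dosages = [2 * f for f in true_freqs]  # expected copies of the counted allele

def log_likelihood(dosages, panel, q):
    """Binomial-style log-likelihood of the dosages given proportions q."""
    total = 0.0
    for j, d in enumerate(dosages):
        f = sum(q[name] * freqs[j] for name, freqs in panel.items())
        total += d * math.log(f) + (2 - d) * math.log(1 - f)
    return total

def fit_three_way(dosages, panel, steps=20):
    """Search all three-way splits on a grid and keep the most likely one."""
    names = list(panel)
    best_q, best_ll = None, -math.inf
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            shares = (i / steps, j / steps, (steps - i - j) / steps)
            q = dict(zip(names, shares))
            ll = log_likelihood(dosages, panel, q)
            if ll > best_ll:
                best_q, best_ll = q, ll
    return best_q

estimate = fit_three_way(dosages, panel)
print(estimate)
```

The fit has no "unknown" bucket, so the ghost 40% cannot be reported as missing. In this toy setup it gets split between the two panels it resembles, and the individual comes out as 80% South Asian and 20% Melanesian despite having no Melanesian ancestry by construction.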

The method is constrained and conditioned on two things:

1) The data being put in, which is often insufficient.

2) The set of populations that it is forced to work with to generate the combinations in individuals (the parameter values in the model to explain the data), which is often insufficient or artificial.

What I mean by the last is that many of the genetic clusters are not taxonomically equivalent. “South Asian” ancestry is much more diverse and diffuse than “Melanesian” ancestry. This is why Melanesian ancestry can explain South Asian ancestry, but generally not the reverse.

 
• Category: Science • Tags: Genetics, Genomics 
2 Comments
  1. If you and I can see what is going on with far less detail-rich factual data than the reference genome sets, it seems like it ought to be possible to redesign the models to overcome some of these problems (and perhaps also others like the Korean-Japanese issue discussed earlier).

    For example, what if you trained a model with ANI-ASI data sets first from published data?

    More generally, could you train a model with sort of a consensus big picture population history of all major developed populations from published work rather than going straight from raw genomes to analysis without that kind of intermediation?

  2. Thanks for this clarification! I’ve had to explain to so many people that just because they have Polynesia coming up, it does not mean that they have Polynesian ancestry. While that category works perfectly fine for Polynesians, those of Southeast Asian ancestry will pick up some Polynesia. Filipinos get about 32% – 40%, Chinese have been getting 10% – 12%, and I heard a Vietnamese person say he had 15% Polynesia.

    I figured it was the same with the Melanesia category when I heard of those of Aboriginal background getting some of that, as well as Asia South.

    I’m sorry I missed your presentation at FTDNA’s conference last year, had to cancel my trip a few hours before I left. Would’ve enjoyed it I’m sure! Still waiting for the “Oceania” category to be put back.


Subscribe to All Razib Khan Comments via RSS