The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog
Visualizing Variation, Input → Output

I have noted a few times that you have to be careful with two-dimensional plots of genetic variance, because the dimensions onto which the data are projected are often generated from the data itself. Adding more data can therefore change the spatial relationships of the existing data points. Additionally, in 23andMe’s global similarity advanced plot you are projected onto dimensions generated from the HGDP data set. There are some practical reasons for this. First, it’s computationally intensive to recalculate components of variance every time someone is added to the data set. Second, it isn’t as if the ethnic identity of any given individual is validated. What would you do if an alien sent in a kit and spuriously put “French” as their ancestry?
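To make the distinction concrete, here is a minimal sketch in Python with NumPy. It is not 23andMe's pipeline. The two reference populations, their allele frequencies, and the function names `fit_axes` and `project` are all invented for illustration. It contrasts projecting a new sample onto axes fixed by a reference panel with refitting the axes after the sample is added.

```python
import numpy as np

def fit_axes(reference, n_components=2):
    """Derive plot axes (principal components) from a reference panel.

    reference: array of shape (n_samples, n_markers).
    Returns the per-marker mean and the top principal axes.
    """
    mean = reference.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by variance explained.
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(samples, mean, axes):
    """Place samples on existing axes without changing those axes."""
    return (samples - mean) @ axes.T

rng = np.random.default_rng(0)
n_markers = 200

# Hypothetical allele frequencies for two made-up reference populations.
freq_a = rng.uniform(0.1, 0.9, size=n_markers)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, size=n_markers), 0.05, 0.95)

# Genotypes coded as 0, 1, or 2 copies of an allele.
pop_a = rng.binomial(2, freq_a, size=(30, n_markers)).astype(float)
pop_b = rng.binomial(2, freq_b, size=(30, n_markers)).astype(float)
reference = np.vstack([pop_a, pop_b])

# Option 1: axes come from the reference panel only; a new sample is projected.
mean, axes = fit_axes(reference)
ref_coords = project(reference, mean, axes)
customer = rng.binomial(2, (freq_a + freq_b) / 2, size=(1, n_markers)).astype(float)
customer_coords = project(customer, mean, axes)

# Option 2: refit the axes with the new sample included.
mean_refit, axes_refit = fit_axes(np.vstack([reference, customer]))
refit_coords = project(reference, mean_refit, axes_refit)

# Compare absolute values because the sign of each axis is arbitrary.
reference_moved = not np.allclose(np.abs(ref_coords), np.abs(refit_coords))
print("customer position on fixed axes:", customer_coords)
print("reference points moved after refit:", reference_moved)
```

With fixed axes the reference points never move, and each new sample costs one matrix product. Refitting means a fresh decomposition every time, and every earlier point shifts at least slightly.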

So, in reply to this comment: “Let me rephrase: is there any difference when you switch to the world-wide plot? I imagine not, or you would’ve mentioned it.” Actually, there is a slight difference. Below on the right you have a “world view,” with my position marked in green, and on the left a “zoom in” for Central/South Asia in the HGDP data set.

Because of the busyness of the plot it is hard to see the difference. But when I wasn’t “sharing” genes with people this is what you saw:

1) There is a definite gap between a Central Asian Hazara/Uyghur cluster and a South Asian one which consists of the Pakistani groups.

2) In the Central/South Asia zoom I’m in the gap between the two clusters, about 1/3 of the way toward the Central Asian cluster away from the South Asian cluster (the next closest individual shifted in that direction who isn’t a family member is Bangladeshi).

3) In contrast, in the world view I’m on the edge of the Central Asian cluster, toward the South Asian one, but definitely separated by a clean gap from it.
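A statement like "about 1/3 of the way toward the Central Asian cluster" can be made precise by projecting a point onto the line joining the two cluster centroids. The sketch below assumes made-up plot coordinates, since the real 23andMe coordinates are not given here. The helper name `fraction_along` is also hypothetical.

```python
import numpy as np

def fraction_along(point, start_cluster, end_cluster):
    """Fractional position of `point` along the line from one cluster
    centroid to another (0 = at the start centroid, 1 = at the end centroid)."""
    start = np.mean(start_cluster, axis=0)
    end = np.mean(end_cluster, axis=0)
    direction = end - start
    return float(np.dot(point - start, direction) / np.dot(direction, direction))

# Hypothetical 2-D plot coordinates, chosen only for illustration.
south_asian = np.array([[0.0, 0.0], [0.2, -0.1], [-0.2, 0.1]])   # centroid (0, 0)
central_asian = np.array([[3.0, 0.0], [3.1, 0.2], [2.9, -0.2]])  # centroid (3, 0)
me = np.array([1.0, 0.3])

frac = fraction_along(me, south_asian, central_asian)
print(f"fraction of the way toward the Central Asian centroid: {frac:.2f}")
```

The same individual can yield a different fraction on a different plot, because both the centroids and the axes depend on which reference populations generated the dimensions.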

You can see some generalized differences between the two plots. The Central/South Asia view has a major linear cluster, with the Kalash a distinctive outgroup. In the world view this is not so. Rather, you have a group of Pakistanis with non-trivial African admixture shifted in that direction (mostly Makrani, but one of the Sindhis in the HGDP data set seems to be a brownlatto!). Since there isn’t much African variance in the South Asian zoom aside from what the admixed individuals bring to the table, naturally it doesn’t shake out as one of the two top dimensions.

So what’s going on with me? I don’t have a good hypothesis, but I suspect that my likely Southeast Asian ancestry shifted me further toward the Asian cluster in the world view. There are some groups very closely related to the Burmese in the HGDP (e.g., Naxi) which are in the world view and, naturally, not in the Central/South Asia zoom. When you break ancestry into “European” and “Asian” components, the Hazara/Uyghur cluster is an OK substitute (both are hybrids, with “European” and “Asian” ancestry in about equal proportions), but this is only a first approximation. These two groups have more “northern” Asian ancestry, while mine is more “southern.” Because of their inclusion in the Central/South Asia cluster, the west-east dimension in Eurasia is constructed from more northern East Asian populations, which might underestimate my East Asian element.
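The effect described above can be sketched with a toy model. Everything in it is an assumption made for illustration: the marker blocks, the four synthetic populations, and the 30% ancestry figure are invented, and real genotype data are far noisier. The idea is that a "zoom" panel containing only northern-type East Asian ancestry cannot display the southern-specific part of a person's ancestry, while a "world" panel that includes a southern East Asian population can.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 300

def block(start, stop):
    """Indicator vector for a contiguous block of markers (toy construct)."""
    v = np.zeros(n_markers)
    v[start:stop] = 1.0
    return v

# Hypothetical ancestry signals: one shared East Asian block plus
# northern-specific and southern-specific blocks.
shared_east = block(0, 100)
north_east = shared_east + block(100, 200)
south_east = shared_east + block(200, 300)

def sample_population(center, n=20, noise=0.1):
    """Draw n noisy individuals around a population center."""
    return center + rng.normal(0.0, noise, size=(n, n_markers))

south_asian = sample_population(np.zeros(n_markers))
central_asian = sample_population(0.5 * north_east)   # hybrid with northern ancestry
northern_east_asian = sample_population(north_east)
southern_east_asian = sample_population(south_east)

zoom_panel = np.vstack([south_asian, central_asian])
world_panel = np.vstack([zoom_panel, northern_east_asian, southern_east_asian])

def residual_off_plot(sample, panel, n_components=2):
    """Length of the part of `sample` that the panel's top axes cannot show."""
    mean = panel.mean(axis=0)
    _, _, vt = np.linalg.svd(panel - mean, full_matrices=False)
    axes = vt[:n_components]
    centered = sample - mean
    on_plot = (centered @ axes.T) @ axes
    return float(np.linalg.norm(centered - on_plot))

# A made-up person with 30% southern East Asian ancestry.
person = 0.3 * south_east

resid_zoom = residual_off_plot(person, zoom_panel)
resid_world = residual_off_plot(person, world_panel)
print(f"hidden variation, zoom panel:  {resid_zoom:.2f}")
print(f"hidden variation, world panel: {resid_world:.2f}")
```

In this toy setup, most of the southern-specific signal is invisible on the zoom axes, so the person's position there reflects only the part of their ancestry that overlaps with the northern-derived axis.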

There’s actually a much better example than me, though, among the people I’m sharing genes with. This individual is an ethnic Persian. Note that in the world view they seem to be on the margins of the European cluster, verging toward the Central/South Asia group. But when you do the Central/South Asia zoom view, they’re in that cluster! Note the very different positions. Their “neighbor” in the zoom view is totally different from their neighbor in the world view:

My argument for why I’m more “Asian” in the world view is that the world view has Asian groups to which I am closer, which are excluded in my zoom view. A much more extreme case seems to be happening with this Persian individual, whose family is from northern Iran and has an oral history of Russian ancestry on one of his lineages.

This is the sort of reason why I assume any reader who points to a paper and a plot and asserts that “this proves X” is somewhat cognitively challenged. The patterns in PCA aren’t necessarily arbitrary, but they do need to be interpreted with care. One set of results isn’t dispositive of any given position in a debate, at least until you get to the ridiculous boundary conditions. (In some ways, I think of a lot of genetic data visualization the way I think of regression: it’s how people use and interpret it that is problematic, not the method itself.)

Finally, doesn’t it seem ridiculous to you that South Asians are being projected onto a plot where the dimensions are generated from liminal populations? Imagine, if you will, that Europeans were projected onto a plot generated from the variance of Finnic and Slavic groups only. That’s a good analogy. The Pakistani groups in the HGDP data set are not good representatives of South Asian genetic variation, because they’re shifted to the margins of the distribution. That’s one reason that the Harappa Ancestry Project is so needful (and why, if you just got your v3 results and are Iranian, Tibetan, Burmese, or South Asian, you should send them in. And v2 folks as well!).

(Republished from Discover/GNXP by permission of author or representative)
  1. “A much more extreme case seems to be happening with this Persian individual, whose family is from northern Iran and has an oral history of Russian ancestry on one of his lineages.”

    Russians have very little Mongoloid DNA material, and if just one lineage of the Persian individual has any Russian ancestry, Mongoloid DNA material coming from the Russian ancestry must be even much less, if there is any, as to have no effect on the plot results of the Persian individual.

  2. BTW, I doubt that the Persian individual inherited any DNA material from the Russian ancestor at all if the Russian ancestor is from a relatively distant generation (e.g., great great great grandfather).

  3. onur, don’t mind-read. it irritates me a lot. i wasn’t talking about mongoloid at all, but european ancestry. DO NOT CONNECT DOTS IN THE FUTURE FOR PURPOSES OF FURTHER ARGUMENT. i hate that.

  4. “i wasn’t talking about mongoloid at all, but european ancestry.”

    Then I misunderstood you in this case. Sorry for that.

  5. Are the Kalash really that divergent from other C/S Asians or just a highly inbred bunch?

Comments are closed.

Subscribe to All Razib Khan Comments via RSS