The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
India genomics


Zack Ajmal has been taking his Reference 3 data set for a stroll over at the Harappa Ancestry Project. Or, more accurately, he’s been driving his computer to crunch up ADMIXTURE results ascending a ladder of K’s. Because it is the Harappa Ancestry Project, Zack’s populations are overloaded a touch on South Asians. He managed to get a hold of the data set from Reconstructing Indian History. If you will recall, this paper showed that the South Asian component which falls out of ancestry structure inference algorithms may actually be a stabilized hybrid of two ancient populations, “Ancestral North Indian” (ANI) and “Ancestral South Indian” (ASI). The ANI are a population which can be compared pretty easily to other West Eurasians. There are no “pure” groups of ASI, but the indigenous peoples of the Andaman Islands are the closest, having diverged from the mainland ASI populations tens of thousands of years ago.

At K = 11, that is, 11 inferred ancestral populations, Zack seems to have now stumbled onto the patterns which one would expect from this hybrid model of South Asians. Let me quote him:

Now let’s take all the reference populations with an Onge component between 10% to 50% and use the equation above to calculate their ASI percentage. The results are in a spreadsheet. There are several populations with an even higher Ancestral South Indian than any of the Reich et al groups, with Paniya being the highest at 67.4%.

The r-squared between % ASI and % Onge, an Andaman group, is 0.994. That means 99.4% of the variation in the former can be explained by variation in the latter. The % ASI is consistently higher than the % Onge. Why? The last common ancestors of Andaman Islanders and the ASI diverged on the order of tens of thousands of years ago. Dienekes observed that ADMIXTURE needs good reference populations, and the Onge have been diverged for so long from the last common ancestor with the mainland ASI populations that they’re not a perfect proxy for this ancient group. But it seems that the underestimate is systematically biased in the same direction, so that explains the good fit between the two trends.
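To make the point about a systematic bias concrete, here is a minimal sketch in Python. The percentages are made-up numbers chosen for illustration, not Zack's results, and `r_squared` is a helper written for this example. It shows that one series can sit consistently above another and still track it almost perfectly.

```python
# Hypothetical percentages for five populations; NOT Zack's actual numbers.
onge_pct = [10.0, 20.0, 30.0, 40.0, 50.0]
asi_pct = [15.5, 27.5, 41.3, 53.7, 67.0]


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


r2 = r_squared(onge_pct, asi_pct)
always_higher = all(a > o for a, o in zip(asi_pct, onge_pct))
print(f"r-squared = {r2:.4f}, ASI always above Onge: {always_higher}")
```

Because the gap between the two series grows in a regular way, the correlation stays very high even though the Onge values understate the ASI values everywhere.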

Zack naturally generated a pairwise matrix of Fsts between these inferred ancestral populations. Remember, the Fst value is the proportion of the total genetic variance across the two populations that lies between them rather than within them. So it’s a rough measure of genetic distance.
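As a rough illustration of that definition, here is a sketch of a textbook Wright-style Fst at a single biallelic locus for two equally weighted populations. This is an assumption made for illustration only; the post does not say which estimator sits behind the numbers in the matrix, and the function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_pops(p1, p2):
    """Wright-style Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # mean within-population
    h_t = heterozygosity((p1 + p2) / 2.0)                   # pooled population
    if h_t == 0.0:
        return 0.0  # same allele fixed in both populations; nothing to partition
    return (h_t - h_s) / h_t


# Identical frequencies give 0, opposite fixation gives 1,
# and intermediate differences fall in between.
for p1, p2 in [(0.5, 0.5), (0.2, 0.8), (0.0, 1.0)]:
    print(p1, p2, round(fst_two_pops(p1, p2), 3))
```

Real analyses average over very many loci and correct for sample size, so treat this only as a way to see what "variance between rather than within" means.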

Here’s the matrix. I’ve renamed some populations:

Population     S Asian  Andaman  E Asian  SW Asian  European  Siberian  W African  Papuan  Amerindian  Khoisan/Pygmy  E African
S Asian        0        0.165    0.121    0.09      0.071     0.134     0.184      0.21    0.175       0.261          0.15
Andaman        0.165    0        0.122    0.161     0.152     0.144     0.224      0.209   0.207       0.304          0.195
E Asian        0.121    0.122    0        0.152     0.137     0.067     0.216      0.205   0.139       0.294          0.187
SW Asian       0.09     0.161    0.152    0         0.048     0.163     0.179      0.235   0.208       0.257          0.143
European       0.071    0.152    0.137    0.048     0         0.143     0.186      0.223   0.178       0.261          0.148
Siberian       0.134    0.144    0.067    0.163     0.143     0         0.232      0.228   0.141       0.311          0.203
W African      0.184    0.224    0.216    0.179     0.186     0.232     0          0.286   0.281       0.123          0.059
Papuan         0.21     0.209    0.205    0.235     0.223     0.228     0.286      0       0.29        0.367          0.26
Amerindian     0.175    0.207    0.139    0.208     0.178     0.141     0.281      0.29    0           0.364          0.252
Khoisan/Pygmy  0.261    0.304    0.294    0.257     0.261     0.311     0.123      0.367   0.364       0              0.133
E African      0.15     0.195    0.187    0.143     0.148     0.203     0.059      0.26    0.252       0.133          0

The South Asian population above is very different from the components you’ve seen before. It seems equivalent to ANI more than anything else. This is a good reminder that the labels we’re giving to these ancestral groups are mnemonics; they’re not to be taken literally or concretely. Personally I find Fst matrices hard to read, so I’ve generated a number of multidimensional scaling plots illustrating the relationships within the matrix. Clarity can be achieved by mixing & matching the populations, so that’s what I did. Also, I only display dimension 1 and dimension 2. Remember that dimension 1 is the one with more weight.
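For readers who want to reproduce something like those plots, here is a hedged sketch of classical (Torgerson) multidimensional scaling with NumPy. The post does not say which MDS implementation or settings were used, so this is one reasonable choice and not a record of the actual method. It assumes the Fst values can be treated directly as distances, and it fills the matrix from the values below the diagonal as printed in the rows above.

```python
import numpy as np

labels = ["S Asian", "Andaman", "E Asian", "SW Asian", "European", "Siberian",
          "W African", "Papuan", "Amerindian", "Khoisan/Pygmy", "E African"]

# Values below the diagonal of the Fst matrix, row by row.
lower = [
    [],
    [0.165],
    [0.121, 0.122],
    [0.09, 0.161, 0.152],
    [0.071, 0.152, 0.137, 0.048],
    [0.134, 0.144, 0.067, 0.163, 0.143],
    [0.184, 0.224, 0.216, 0.179, 0.186, 0.232],
    [0.21, 0.209, 0.205, 0.235, 0.223, 0.228, 0.286],
    [0.175, 0.207, 0.139, 0.208, 0.178, 0.141, 0.281, 0.29],
    [0.261, 0.304, 0.294, 0.257, 0.261, 0.311, 0.123, 0.367, 0.364],
    [0.15, 0.195, 0.187, 0.143, 0.148, 0.203, 0.059, 0.26, 0.252, 0.133],
]

n = len(labels)
D = np.zeros((n, n))
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        D[i, j] = D[j, i] = value  # fill both halves so D is symmetric


def classical_mds(dist, k=2):
    """Classical MDS: embed a distance matrix in k dimensions."""
    size = dist.shape[0]
    J = np.eye(size) - np.ones((size, size)) / size  # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J                   # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)             # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]            # indices of the k largest
    top_vals = np.maximum(eigvals[order], 0.0)       # guard against negative values
    return eigvecs[:, order] * np.sqrt(top_vals), eigvals[order]


coords, weights = classical_mds(D)
for name, (x, y) in zip(labels, coords):
    print(f"{name:14s} dim1={x:+.3f} dim2={y:+.3f}")
```

The eigenvalues in `weights` are what "more weight" refers to: the first dimension corresponds to the largest eigenvalue. Dropping rows and columns from `D` before calling `classical_mds` is the mixing and matching of populations described above.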

Do not think of these as real concrete populations from which all modern populations emerged. These eleven populations are abstractions which fulfill the dictates of the algorithm. But, I do think that with that caveat in mind, there are suggestive patterns.

First, the “SW Asian” component isn’t that much closer to “W Africans” than the other West Eurasian groups. Yet we know in reality that Southwest Asian populations are closer to Africans. What’s going on? Southwest Asian populations have African admixture. And that admixture is recent enough that it shakes out rather easily. This is in contrast to the normal South Asian modal components, which are indicative of a greater time since admixture, which was thorough enough that it is not trivial to tease out the two ancestral groups from each other’s genetic background. Fission and fusion are normal parts of the history of any geographically expansive species. ADMIXTURE will capture the earlier parts of fusion. But after a long enough period of time that fusion becomes its own distinctive element.

There is the conventional east-west division you see in Eurasia on PCA, but you see evidence of the north-south secondary component on these plots too. The Andaman populations are closer to East Eurasians than West Eurasians, but they also occupy their own position, which highlights a north-south axis.

Finally, the S. Asian/ANI population seems somewhat closer to “Europeans” than “SW Asians.” That is interesting. But this is where you have to be very careful and remember that these “pure” ancestral components can themselves fractionate into constituent elements at higher K’s or when you constrain the data set appropriately (Africans and inbred groups tend to hog clusters in ADMIXTURE). If you’ve read all the genome bloggers you will be aware that “European” and “SW Asian” components themselves break apart upon closer inspection. The “SW Asian” component usually divides into a northern and southern branch. The northern branch is often positioned closer to the other “European” groups than it is to the southern branch in terms of genetic distance. Here is a selection of West Eurasian groups sorted by their “S Asian” proportion:

South Asian %
Iranians 30%
Lezgins (Caucasian) 29%
Georgians (Caucasian) 26%
Adygei (Caucasian) 24%
Armenians 22%
Turks 21%
Syrians 19%
Druze 18%
Lebanese 17%
Samaritans 16%
Palestinian 15%
Cypriots 14%
Saudis 14%
Yemenis 14%
Russian 8%
Tuscans 7%
Hungarians 7%
Utah whites 7%
Orcadian 5%
British 5%
French 5%
Italian 5%
Finnish 4%

Also observe that the distance between SW Asians and Europeans is smaller than between Europeans and S Asians. Crunching up the K’s, or limiting the data set to West Eurasian groups, would probably show more fine-grained relationships.


Whenever Zack Ajmal posts a new update to the Harappa Ancestry Project he appends some data to his ethnic database. This sends me to Wikipedia, because how many people are supposed to know what a “Muslim Rawther” means? Well, if you are a Muslim Rawther, and perhaps from Southern India, you would. But South Asian ethno-linguistic categories and hierarchies are notoriously Byzantine, and I have difficulty making sense of them. This isn’t too surprising in my case, as my family’s background is relatively mixed in the very recent past (e.g., Hindus and Muslims, and people of various caste backgrounds), so we’re not the sort who can go on at length about our pure ancestry and all that stuff. Unfortunately, Wikipedia isn’t always useful, because the people editing the entries on particular South Asian ethnic groups are often people from those ethnic groups, so you get a lot of extraneous information, and a particular slant on how awesome and high achieving the group is (also, sometimes there’s funny stuff about how notoriously good looking that particular caste is!). On occasion there are other sources which are informative. For example, Zack has several individuals from the Tamil Nadar caste. I know a little about this group because 1) I have a friend whose family is Nadar (he’s American, so saying he’s an American Nadar is pretty worthless), and 2) The New York Times profiled the group last fall.

When Zack noted that a group termed Tamil Vishwakarma had submitted entries, I went to Wikipedia. That was the first time I’d heard of the group. This is what I found:

Viśvákarma is the term used in India for a caste of priests, engineers, architects, sculptors, temple builders and artists. The term is applied to five sub-castes; blacksmiths, carpenters, coppersmiths, goldsmiths and sculptors.They connect themselves as Pancha janas of vedic period [Rathakara, Karmakara, Thakshaka, Kumbhakara,and NishadaSthapathies] and worshiping various forms of Viswakarma, i.e., Twostar, Daksha prajapathy, Takshaka and Maya and Rhibhus etc.

Vishwakarma Brahmins are also called Rathakara Brahmins, and the Rathakara mentioned in the Rigveda (1.6.32) indicates high status and is associated with the placing of the holy sacrificial fire in the Yajna kunda…According to the Srautasutras, the Rathakara (Chariot-maker) is entitled to perform all the sacrifices….In many sacrifices, like the Rajasuya, the Rathakara played a role as recipient of the offerings (ratninaḥ)….

First, I don’t know what a lot of this means. For example, “many sacrifices, like the Rajasuya….” makes no impression on me, as I don’t know what Rajasuya is supposed to be. But the salient point here is that the Vishwakarma are making some assertion of a relationship with Brahmins. This, I can understand. Many non-Brahmin groups in South Asia want to associate with Brahmins, because Brahmins are high status and socially superior. I assume most of the time this is made up; how many fallen Brahmins can there be exactly? It’s kind of like claiming descent from Muhammad among Muslims, or being descended from a particular lecherous and promiscuous king among the poor of Europe.

But after months of the Harappa Ancestry Project you can shift your assessment of the probabilities based on the genetics alone. South Indian Brahmins are consistently genetically distinct from non-Brahmin South Indians. So how would I go about exploring the veracity of the Vishwakarma’s claims?

First, I am looking at K = 4. So the data set has four ancestral populations: South Asians, Europeans, East Asians, and Africans. These are hypothetical abstractions, so focus on the relative relationships across individuals and groups, not on the absolute quanta. I took Zack’s ADMIXTURE results and ethnic labels, and added a few categories myself. You can see the CSV here. Basically I took the ones with caste identification and partitioned them into Brahmin vs. non-Brahmin. Note that the non-Brahmin category includes groups of all caste ranks. It’s socially heterogeneous. I also added a geographical label. NW = Pakistan, and the northwestern third of India. NE = Bangladesh and the northeastern third of India (Bihar is in the northeast here). S is south, for the four Dravidian dominated states. And C, central, includes Maharashtra, Gujarat, etc.
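The labeling and sorting steps described here can be sketched in a few lines of Python. Everything in this example is an assumption for illustration: the column names, the group names, and the numbers are invented, since the actual CSV layout is not shown in the post.

```python
import csv
import io

# Invented rows and column names; this is NOT the real CSV and NOT real results.
SAMPLE_CSV = """id,ethnicity,region,s_asian,european,e_asian,african
H001,Example Brahmin 1,S,0.62,0.35,0.02,0.01
H002,Example Caste A,S,0.80,0.17,0.02,0.01
H003,Example Caste B,NW,0.45,0.52,0.02,0.01
H004,Example Brahmin 2,NE,0.55,0.33,0.11,0.01
H005,Example Caste C,S,0.82,0.15,0.02,0.01
"""


def load_samples(text):
    """Parse the CSV and add a coarse Brahmin vs. non-Brahmin label."""
    samples = []
    for row in csv.DictReader(io.StringIO(text)):
        for key in ("s_asian", "european", "e_asian", "african"):
            row[key] = float(row[key])
        # Naive rule for this sketch: any label containing "brahmin" counts as Brahmin.
        is_brahmin = "brahmin" in row["ethnicity"].lower()
        row["caste_group"] = "Brahmin" if is_brahmin else "non-Brahmin"
        samples.append(row)
    return samples


samples = load_samples(SAMPLE_CSV)

# Sort by region first, then by South Asian fraction (highest first).
by_region = sorted(samples, key=lambda r: (r["region"], -r["s_asian"]))
for r in by_region:
    print(r["region"], r["ethnicity"], r["caste_group"], r["s_asian"])
```

Sorting on a tuple key keeps each region together while ordering individuals within it, which is what makes a discontinuity between groups easy to spot by eye.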

First, let’s look at all the Brahmins and the two Vishwakarma. I sorted by South Asian ancestry.

The Vishwakarma are outliers among the Brahmins. You can see a discontinuity.

Sorting by South Asian, European, and then East Asian ancestry, here are the Vishwakarma’s neighbors:

They’re like other non-Brahmin South Indians. No discontinuity. I can’t attest to the spiritual Brahmanitude of the Vishwakarma, but I’d say that they’re probably asserting Brahmin associations to elevate their status vis-a-vis other castes.

Now let’s look at all the Harappa samples. I will sort first by region, and then by South Asian ancestry.

A few notes. Jatts are originally the freehold peasant cultivators of Punjab, I think. They think they’re pretty awesome! Sourastrians are transplants from Gujarat to Tamil Nadu in the South. They’ve maintained their Indo-Aryan dialect. The two Bengalis with a lot of East Asian ancestry are my parents. There is a pretty good fit here to a two parameter model for predicting South Asian ancestral quanta: geography & caste. Where this breaks down the most seems to be in the far northwest, where the Brahmins don’t seem to be that much less South Asian than non-Brahmins, and in fact are perhaps more South Asian than the peasant Jatts.

Finally, the origin of Caribbean Indians is generally presumed to be among the peasants of the northeastern half of the Indo-Gangetic plain, going by the historical sources and the persistence of Bhojpuri in Trinidad and Guyana. This looks about right.

• Category: Science • Tags: Genetics, Genomics, India Genetics, India genomics 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"