The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan, Gene Expression Blog


Iman, a Somali model

Since I started up the African Ancestry Project, one of the primary sources of interest has been individuals whose families hail from Northeast Africa, more specifically the Horn of Africa: Ethiopia, Eritrea, and Somalia. The problem seems to be that 23andMe’s “ancestry painting” algorithm uses the West African Yoruba as a reference population, and East Africans are often not well modeled as derivative of West Africans. So, for example, the Nubian individual whom I’ve analyzed supposedly comes out well over 50% “European” in ancestry painting. Then again, I’m 55-60% “European” as well according to that method! So we shouldn’t take these judgments too much to heart. Obviously something was off, and thanks to genome bloggers like Dienekes Pontikos we know what the problem was: the populations of the Horn of Africa have almost no distinctive “Bantu” element to connect them with West Africans like the Yoruba. Additionally, a closer inspection shows that the “Eurasian” component present in these populations is very specific as well, almost totally derived from Arabian-like sources. When you break apart the West Eurasian populations it is no surprise that Northern Europeans and Arabians are among the most distant pairs, even after excluding recent Sub-Saharan African admixture. The HapMap Utah European American and Nigerian Yoruba samples are very suboptimal references for people with eastern African backgrounds. In contrast, African Americans are a mixture of West Africans and Northern Europeans, so the ancestry painting algorithm has nearly perfect reference populations for them. The results for African Americans may not be very detailed and rich, but they’re probably pretty accurate at the level of granularity at which they’re offered.

Though I’m happy to give people of Northeast African ancestry more detailed results than 23andMe, one of my motivations for the African Ancestry Project was to obtain a data set which would allow me to explore the genomic variation in the east of Africa myself. This region is a strong candidate for the “source” populations of non-Africans within the last 100,000 years, and it seems to have experienced rapid population turnover within the last 2,000-3,000 years. My data set is not yet particularly adequate to my ambitions. But I do now have 5 unrelated Somalis. To my knowledge there hasn’t been much exploration of Somali genomics using thick-marker SNP chips, so why not? N = 5 is better than N = 0 in these cases of extreme undersampling.

Before I proceed to methods and results, I want to note that I put up most of my files here. It’s a ~25 MB compressed folder with images and spreadsheets, as well as raw output from ADMIXTURE and EIGENSOFT. I hope readers will take this as an invitation to poke around themselves.

Since my focus was on the Horn of Africa, the coverage of populations is relatively constrained compared to what I normally run. From the HapMap I took the Yoruba, Masai, and Luhya. I renamed the Masai “Nilotic Kenya” and the Luhya “Bantu Kenya.” The Behar et al. data set has a fair number of Ethiopians, both gentiles and Jews. A reader helpfully labelled the various ethnicities by ID. I was going to do that myself, but because this tedious work was done for me I felt much more motivated to produce something instead of putting the task off! From Behar et al. I also took some Arab populations, as well as Georgians, Lithuanians, and Belorussians. I combined the latter two populations into “Baltic.” Syrians and Jordanians were merged into “Levantine” in the bar plots. I left Saudis, Yemenis, and Yemeni Jews disaggregated. Finally, I added some individuals from the AAP: all the people from the Horn of Africa who are unmixed in ancestry, as well as my Nubian individual. In the display that follows, AAP members are combined with the appropriate ethnic groups from Behar et al.: Oromos, Amharas, and Tigray. Ethiopian Jews (the Beta Israel) I left as is. To mix it up I also brought over the Sandawe from Henn et al. The Somalis are all from the AAP. They do not seem to be related (close relatives generally form their own cluster).

I tried to balance my populations in an ad hoc fashion. I took only ~30 Yoruba, but decided to add in more Masai, because they seemed to be a mixed population rather than a reference, and I wanted to flesh out their variation. I removed individuals who were closely related as per Zack Ajmal’s findings in his review of his reference data sets. After combining the data sets I was left with ~210,000 SNPs, with less than 0.1% missing. I ran ADMIXTURE from K = 2 to K = 8, and I also generated the top six independent dimensions of genetic variation in EIGENSOFT. Finally, I took the Fst values between the inferred ancestral populations from ADMIXTURE and generated MDS representations of those genetic distances (the original Fst file can be found in the attached folder).
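For readers who want to turn the Fst file in the folder into an MDS plot themselves, here is a minimal sketch of classical (Torgerson) MDS in Python with NumPy. It assumes the pairwise Fst values between the inferred ancestral populations have already been read into a symmetric matrix and are used directly as distances. That is an assumption for illustration, not necessarily the exact procedure used for the plots here, and the toy matrix below is made up rather than taken from this analysis.

```python
import numpy as np


def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate `dist`."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J the centering matrix
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so use eigh; it returns eigenvalues in ascending order
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:n_components]
    # Negative eigenvalues (possible with non-Euclidean input such as Fst) are clipped to zero
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


# Toy, made-up distances between three hypothetical components A, B, C
toy = np.array([[0.00, 0.05, 0.15],
                [0.05, 0.00, 0.10],
                [0.15, 0.10, 0.00]])
coords = classical_mds(toy)
print(coords.round(3))
```

Because Fst is not a true Euclidean distance, some eigenvalues can come out negative with real data; clipping them, as above, is a common but crude fix.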

There are several different types of plots below. The MDS and PCA should be rather straightforward, but the ADMIXTURE bar plots need a little explanation. There are three for every K: first, average results by population; second, a fine-grained display of all the individuals from all the populations; third, a fine-grained display of some populations of interest. Please note that in the second set of plots I don’t label all the individuals by population, since it would be unreadable. But they go alphabetically, so you should be able to see where populations start and where they end.
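The first two kinds of bar plot come from the same ADMIXTURE output. ADMIXTURE writes a `.Q` file with one row per individual and one column per ancestral component. The sketch below (Python with NumPy; the function names are hypothetical) shows how the population averages and the alphabetical ordering of individuals could be computed from it. It assumes population labels are supplied in the same order as the rows of the `.Q` file, and the proportions in the example are invented.

```python
from collections import defaultdict

import numpy as np


def population_means(q, labels):
    """Average ancestry proportions (rows of an ADMIXTURE Q matrix) by population label."""
    groups = defaultdict(list)
    for row, label in zip(np.asarray(q, dtype=float), labels):
        groups[label].append(row)
    return {label: np.mean(rows, axis=0) for label, rows in groups.items()}


def alphabetical_order(labels):
    """Row indices sorted by population name; sorted() is stable, so input order is kept within a population."""
    return sorted(range(len(labels)), key=lambda i: labels[i])


# Invented K = 2 proportions for three individuals in hypothetical populations PopA and PopB
# (real values could be loaded with np.loadtxt on a file such as "data.2.Q")
q = [[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]]
labels = ["PopA", "PopA", "PopB"]
print(population_means(q, labels))
print(alphabetical_order(["PopB", "PopA", "PopB", "PopA"]))
```

The file name `data.2.Q` is only an assumed example of ADMIXTURE’s `<prefix>.<K>.Q` naming pattern.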

Before you even look at the results and we discuss them, one clear issue jumps out: there are closely related individuals or clans in the Masai data set which I need to remove in future runs. Though these individuals hogged up the higher K’s, they didn’t affect the relationships across other populations, so I decided to publish this now before refining it for the future. It’s a learning experience. You can see that these individuals form their own clusters in the MDS and PCA as well. At least the problem recurs systematically across different methodologies.

(note: some of the images are larger than shown, so if you want to see better labels for the fine-grained plots, get the image URL and look at it separately)

[Image gallery: MDS and PCA plots, plus ADMIXTURE bar plots for K = 2 to K = 8]



The fact that the Masai “break down” at K = 6 is really problematic, as information is probably being lost here. But here are several immediate observations:

1) The Somalis, like the Ethiopian groups, show almost no impact from the Bantu expansion. This is in contrast to the one Nubian individual, who may have more West African ancestry through intermediate groups, or through direct contact with Bantus who were enslaved and brought to Sudan.

2) When you break apart West Eurasian ancestry, the Ethiopian and Somali groups have their contribution almost exclusively from an ancestral component in southern Arabia. This makes some sense because of geography, but the fractions of “northern” admixture found even among Yemeni Jews are not reflected among the Horn of Africa groups. One hypothesis consistent with this is that the admixture event involving the Arabian-like group occurred at a time when south Arabians were more genetically isolated and distinct from populations to the north. I suspect this was almost certainly the case before the camel, let alone Islam. Interestingly, just as the Nubian individual has more West African affinities, they also have more European affinities. The Nubian individual’s ancestry is simply more cosmopolitan than that of Ethiopians and Somalis, which is not historically that surprising.

3) There is a rough rank order in the admixture estimates. In terms of Africanness it goes Somali > Oromo > Beta Israel ~ Amhara > Tigray. The sample sizes are small, though, so we should be cautious. The Amhara seem to vary the most. One might suspect that the Amhara, being the traditional core ethnicity of Ethiopia of late, assimilated other groups. If you look at the PCA, the Somalis actually look the most “East African” of the groups on PC 2. Note also the linear distribution of the other Ethiopians and of the Masai, stretching toward Arabians and Bantu respectively. This is suggestive of some sort of ancient admixture event between an East African substrate and other populations. I will label this substrate population “Ancestral East Africans” (AEA).
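The “linear pattern” on the PCA has a simple geometric reading: because a PCA projection is linear, an individual descended from a two-way mixture is expected (ignoring drift since the admixture) to sit on the straight line joining the two source clusters, at a position set by the mixing fraction. A minimal Python sketch of that projection follows; the function name and coordinates are hypothetical. Since the true “AEA” source is not sampled, this only places individuals along a cline between sampled clusters and should not be read as literal admixture fractions.

```python
import numpy as np


def cline_position(points, source_a, source_b):
    """Scalar position of each point along the axis from centroid A (0.0) to centroid B (1.0)."""
    pts = np.asarray(points, dtype=float)
    a = np.asarray(source_a, dtype=float).mean(axis=0)
    b = np.asarray(source_b, dtype=float).mean(axis=0)
    axis = b - a
    # Orthogonal projection onto the A->B line, rescaled so A maps to 0 and B to 1
    return (pts - a) @ axis / (axis @ axis)


# Hypothetical PC1/PC2 coordinates, not values from the plots in this post
cluster_a = [[0.0, 1.0], [0.0, -1.0]]
cluster_b = [[4.0, 1.0], [4.0, -1.0]]
print(cline_position([[1.0, 3.0], [2.0, 0.0]], cluster_a, cluster_b))
```

Distance off the line is ignored here; individuals far from the A-B axis would suggest a third source or drift that this two-way picture does not capture.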

4) The relationship of the Sandawe to the other groups is interesting. It seems clear that the Sandawe are related to the AEA, but are somewhat at a remove. Note that a “Sandawe” component is often found in low proportions outside of the Sandawe across East Africa. While the Ethiopians and Somalis do not have a Bantu aspect to their ancestry, they may have an “Ancestral Sandawe” (AS) one.

I don’t want to say more until I get the Masai data set fixed (and I might make recourse to some of Dienekes’ “tricks,” as well as supervised runs). But overall I would say that the ethnogenesis of the Semitic and Cushitic peoples of the Horn of Africa predates the Bantu expansion. I will do some more playing with this, but they do not seem to generate an “Ethiopian-Somali” cluster as easily as South Asians do. This may be because they are never numerous in any of these analyses. Or it may be because the admixture event was recent enough that the underlying populations are not as obscured as they are among South Asians. I lean toward the latter, for now. As in South Asia, I do not think that the ethnogenesis of the Ethiopian peoples was quite a “one off” admixture event. It is suggestive that you have two major language branches, Semitic and Cushitic, in this region.

Image credit: Wikimedia


In the open thread someone asked: “Any recent stuff on the genetics of Ethiopians?” That prompted me to look around, because I’m curious too. Poking around Wikipedia I couldn’t find anything recent. A lot of the studies are older works based on uniparental lineages (NRY and mtDNA). Ethiopia is interesting because, unlike almost all other Sub-Saharan African nations, it has a long written history. Culturally and linguistically it has both Sub-Saharan African and non-Sub-Saharan African affinities. The languages of highland Ethiopia are clearly Semitic. Those of lowland Ethiopia are Cushitic, a branch of the broader Afro-Asiatic language family concentrated around the Horn of Africa (Somali is a Cushitic language, though most Ethiopian nationals who speak a Cushitic language are of the Oromo group).

From a human evolutionary genetic perspective, Ethiopia is also of particular interest. It is likely that the main recent pulse of humans Out of Africa traversed this region. Additionally, there is some evidence of deep-time connections between the groups ancestral to Ethiopians and the Khoisan of southern Africa. It may be that Ethiopians and Khoisan are reservoirs of ancient genetic variation in Sub-Saharan Africa which has been overlain by Bantu in most other regions outside of West Africa. Finally, Ethiopians are known to have high altitude adaptations. This could be due to long-term residence in the region, or to the assimilation of favorable alleles from the long-term residents by later populations.

Fortunately we can get a sense of the genetic affinities of Ethiopians thanks to a paper published last spring, The genome-wide structure of the Jewish people. The focus was clearly on Jews, but they surveyed Amhara & Tigray (Semitic-speaking highlanders), Ethiopian Jews (ethnically similar to the Amhara & Tigray, but religiously non-Christian), and Oromo. In the PCA the Oromo and the Semitic-speaking populations form pretty obviously distinct clusters.

This just means that when you take worldwide genetic variation, pull out the biggest independent dimensions, and then plot individuals on the two dimensions which explain the most variance, the Oromo and the other Ethiopians don’t really overlap. Interestingly, the Amhara and Tigray are almost indistinguishable, but the Ethiopian Jews are in their own cluster. There are, for the record, 7 Oromo, 7 Amhara, 5 Tigray, and 13 Ethiopian Jews in the sample.
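For the curious, the PCA described above boils down to a few lines of linear algebra. This is a simplified Python/NumPy sketch, not EIGENSOFT’s smartpca: it assumes a complete individuals × SNPs matrix of 0/1/2 allele counts with no missing data, standardizes each SNP, and takes the top components from a singular value decomposition. The two populations in the example are simulated, not real samples.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals x SNPs matrix of 0/1/2 allele counts (no missing data)."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                     # sample allele frequency of each SNP
    keep = (p > 0) & (p < 1)                     # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    x = (g - 2 * p) / np.sqrt(2 * p * (1 - p))   # center and scale each SNP column
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)          # share of total variance per component
    return scores, explained[:n_components]


# Simulated example: two hypothetical populations with different allele frequencies
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=500)
freq_b = 1.0 - freq_a
geno = np.vstack([rng.binomial(2, freq_a, size=(20, 500)),
                  rng.binomial(2, freq_b, size=(20, 500))])
scores, explained = genotype_pca(geno)
```

With groups this differentiated, the first component separates them cleanly; with real data the leading components can just as easily pick up a family cluster, as happened with the Masai here.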

Now let’s look at the genetic variation in ADMIXTURE. Remember, this assigns the genomes of individuals in proportions to K ancestral units. As an example, if you had African Americans, Yoruba, and White Americans in one pool and ran K = 2, you might find that the Yoruba and White Americans fall into two totally different ancestral populations, while African Americans are 80% in one ancestry and 20% in the other. The interpretation of this is straightforward, but when it comes to populations whose backgrounds we don’t know as well, one should be careful. The selection of a particular value for K is going to be really important, and we shouldn’t confuse the method with the reality which the method is trying to plumb.
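To make the K = 2 example concrete: ADMIXTURE’s model says the probability of carrying a given allele at a SNP is the average of that allele’s frequency in the K ancestral populations, weighted by the individual’s ancestry proportions. The Python sketch below estimates one individual’s proportions under that model with a plain expectation-maximization loop, holding the ancestral allele frequencies fixed. This is a simplification: ADMIXTURE itself estimates frequencies and proportions jointly with a different, faster optimizer. The function name and the simulated “80% / 20%” individual are hypothetical.

```python
import numpy as np


def estimate_ancestry(genotypes, freqs, n_iter=200):
    """EM estimate of one individual's ancestry proportions with source frequencies held fixed.

    genotypes: length-J array of 0/1/2 counts of the tracked allele
    freqs:     J x K array; freqs[j, k] is that allele's frequency in source k (strictly in (0, 1))
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.asarray(freqs, dtype=float)
    q = np.full(f.shape[1], 1.0 / f.shape[1])        # start from equal proportions
    for _ in range(n_iter):
        # E-step: chance that a tracked / untracked allele copy came from each source
        w1 = f * q
        w0 = (1.0 - f) * q
        w1 /= w1.sum(axis=1, keepdims=True)
        w0 /= w0.sum(axis=1, keepdims=True)
        # M-step: share of all 2J allele copies attributed to each source
        q = (g[:, None] * w1 + (2.0 - g)[:, None] * w0).sum(axis=0) / (2.0 * len(g))
    return q


# Simulated individual who is 80% source 1 and 20% source 2, echoing the example above
rng = np.random.default_rng(1)
freqs = rng.uniform(0.05, 0.95, size=(5000, 2))
true_q = np.array([0.8, 0.2])
geno = rng.binomial(2, freqs @ true_q)
print(estimate_ancestry(geno, freqs).round(2))
```

The estimate is only as good as the source frequencies supplied: feed it references that are poor proxies for the true sources and it will still return confident-looking proportions, which is the ancestry painting problem for East Africans in miniature.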

First, K = 8 from Behar et al. I’ve re-edited it to highlight populations which might inform the variation of Ethiopians.

Now let’s look at a series of K’s. Note the changes.

Luckily for us, we don’t need to stop here. Dienekes included Behar’s Ethiopians (non-Jews) in his Dodecad runs. Additionally, he included the Masai population from the HapMap. This turns out to be important, because he found that Ethiopian Sub-Saharan ancestry is similar to that of the Masai, not to that of the other African groups.

Dienekes also provided individual outputs. I’ve stitched together Ethiopians with Egyptians and Saudis. The color coding is the same as above.

You should be able to tell where the three groups start and stop pretty easily. I’m 99% sure that the six individuals with more East African and less Southwest Asian ancestry are all Oromo. Ethiopians, in particular highland Ethiopians, seem to me likely to be an ancient, stabilized hybrid between a population from Arabia and a local Sub-Saharan population. That local population seems unlikely to have been related to the peoples of West-Central Africa, whose Bantu descendants spread across eastern and southern Africa. The Bantu agricultural toolkit runs into ecological constraints in various regions, and it is in those regions that non-Bantu populations have persisted. Ethiopia, with its unique climate and topography, naturally remains non-Bantu (as does the Horn of Africa as a whole). The possible connections between the Khoisan and Ethiopians may be a function of the fact that these areas harbor genetic variants which have disappeared in the intervening regions because of the Bantu expansion. I have a hard time accepting that the Bantu expansion was particularly eliminationist, but I am starting to suspect that outside of Ethiopia population densities were very, very low.

To me, the antiquity of this hybridization event is attested by the fact that Ethiopians lack any of the other Middle Eastern components besides the one modal in Saudi Arabia. There is a great deal of intra-population variance in the Saudi data set. Why? Part of this must be the slave trade, as well as pilgrims who remained in places like Mecca. But I think part of the untold story here is that there may have been a larger genetic impact from the Levant on Arabia after the rise of Islam than vice versa! The gene flow probably precedes Islam, as Arabia was hooked into worldwide trade and population movements from which Ethiopia was relatively insulated. The Saudi data set has several people who are “pure” Southwest Asian, but also several who have a great deal of West Asian + South European ancestry. These seem likely to be people who have some background in the Fertile Crescent.

About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"