The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
East Africa


In the post yesterday I reported what was generally known about the Horn of Africa, that its populations seem to lie between those of Sub-Saharan Africa and Eurasia genetically. This is totally reasonable as a function of geography, but there are also suggestions that this is not simply a function of isolation by distance (i.e., populations at position 0.5 on the interval 0.0 to 1.0 would presumably exhibit equal affinities in both directions due to gene flow). For example, you observe the almost total lack of “Bantu” genetic influence on the Semitic and Cushitic populations of the Horn of Africa, and the lack of Eurasian influence in groups to the south and west of the Horn except to some extent the Masai.

Tacking horizontally in terms of discipline, over the past few generations there has been a veritable cottage industry making the case for the recent origin of many ethno-linguistic populations through a process of cultural self-creation. Clearly there are many cases of this, some of them studied in depth by anthropologists (e.g., the shift from Dinka to Nuer identity). But there has been an unfortunate tendency to over-generalize in this direction. In some ways this is peculiar insofar as these models presuppose the infinite plasticity of culture without observing the sharp and strong norms which those very same phenomena can enforce. The genetic isolation of non-Muslims in the Middle East after the rise of Islam seems rather well validated by the evidence from genomics. The norms of both Muslims and non-Muslims strongly biased them toward endogamy, and the nature of Islamic hegemony and domination was such that Muslims were the ones who were likely to have cosmopolitan affinities with the “Islamic international.” In contrast, non-Muslim minorities began a long process of involution after the Islamic Arab conquests, only disrupted in the past century by emigration and to a lesser extent emancipation.

So back to the Horn of Africa. The vast majority of the people of the Horn of Africa speak an Afro-Asiatic language. Arabic and Hebrew are the most famous members of this group, but it is a very broad classification, ranging from the dialects of the Berbers in the Maghreb all the way to ancient Akkadian. There are two large subfamilies of particular note and interest here: Semitic and Cushitic. The map above shows the distribution within the Horn of Africa. One can “quick & dirty” summarize the pattern here by observing that Semitic languages in Ethiopia tend to be concentrated in the north-central Christian highlands, while Cushitic is found everywhere else. Additionally, there is the confluence between religion and ethnicity, as there are Cushitic Muslims (Somalis, Afar, etc.) and Cushitic Christians (many Oromo, etc.). From what I can gather many Cushitic social and political elites have had a tendency toward assimilating into an Amhara Semitic identity (Haile Selassie’s mother was a Muslim Oromo). We could therefore generate a possible model where Semitic languages arrived late to Ethiopia and spread through elite emulation, so the difference between Semitic and Cushitic peoples should be marginal in the genomic dimension (such as the marginal differences between Hausa and Yoruba in Nigeria). Or, we could posit that the Semitic element is distinctive from a pre-existent Cushitic substratum.

To make a long story short, by running more ADMIXTURE with a Horn of Africa centered data set I have discerned that one can actually differentiate Cushitic and Semitic elements in the Horn and tentatively identify them with different ancestral components. First, the technical details….

I began with the data set I started with in the runs I posted yesterday. Strange outliers in the Masai were removed. These are a few sets of individuals who “fix” for minority ancestral components. This is a tell that there’s structure within the Masai being picked up, but more like distantly related individuals, not ethnic level differences. After running this I noticed that a lot of the same then popped up in the non-Jewish Yemeni and Saudi samples. To some extent this is like “whack-a-mole.” If you remove one problem others simply pop out of the woodwork. So I removed all the non-Jewish Yemenis and Saudis. The number of markers remained the same, 210,000 SNPs.

There were still a few issues with outliers, especially with the Bantu Kenya, and to a lesser extent the Levantine samples. But at this point I decided to go with it, since these are marginal to the story of the Horn of Africa in any case. I stated yesterday that in general Horn of Africa populations don’t present their own clusters, but are a composite of others, mostly East African and Arabian. After I removed some of the spurious Masai components and ran ADMIXTURE up to K = 10 I did finally get a Horn of Africa cluster, “HoAc”. Additionally, I also found that you can see systematic differences between Cushitic Oromo and Somalis, and the Semitic Amhara, Ethiopian Jews, and Tigray.
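For readers who want to try a similar sweep over K, here is a minimal sketch of how the command lines could be generated. It is not the script used for these runs: the file name `horn_of_africa.bed` is a placeholder, and the `--cv=N` and `-jN` options reflect my understanding of the ADMIXTURE manual, so check them against the version you have installed.

```python
from typing import List


def admixture_commands(bed_path: str, k_min: int, k_max: int,
                       cv_folds: int = 0, threads: int = 1) -> List[List[str]]:
    """Build one ADMIXTURE command line per value of K (inclusive range).

    bed_path is a hypothetical PLINK .bed file; cv_folds=0 means no
    cross-validation flag is added.
    """
    commands = []
    for k in range(k_min, k_max + 1):
        cmd = ["admixture"]
        if cv_folds:
            # Assumed flag form: --cv=N requests N-fold cross-validation.
            cmd.append(f"--cv={cv_folds}")
        cmd += [bed_path, str(k)]
        if threads > 1:
            # Assumed flag form: -jN requests N threads.
            cmd.append(f"-j{threads}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the sweep from K = 2 to K = 10 without executing anything.
    for cmd in admixture_commands("horn_of_africa.bed", 2, 10):
        print(" ".join(cmd))
```

The function only builds the commands. Each list could be handed to `subprocess.run` once the paths are checked.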

Below are bar plots of K = 7 and K = 9. The lower K’s aren’t too different from what I posted yesterday, while K = 8 and K = 10 have too many minor components. I’ve posted only fine-grained and Horn of Africa focused plots, instead of the more general summary plots which show average ancestral quanta. Also, below these I’ve posted two dimensional representations of genetic distances between inferred ancestral groups for K = 7 and K = 9. I’ve removed several components though, in the case of one because it was clearly a spurious “extended family” cluster, and in some cases to better visualize relationships.

To cut to the chase, it looks like all Horn of Africa populations share a HoAc base, which one might term “Cushitic,” though that is not totally accurate. On top of that base you see differences based on language family. The Semitic speaking groups have an ancestral component which is identical to the one fixed in Yemeni Jews, while the Cushitic speaking ones tend to lack this. But observe that the Semitic speaking populations generally have the component found in the Cushitic speaking groups, and especially the Somalis in which it often fixes. This is why I put the sequence of language-population expansions so that the Semitic is overlain upon a Cushitic base. Additionally, there does seem to be admixture from Nilotic groups into Ethiopian, but not Somali, populations. This is most consistent and evident in the Oromo, and here an isolation by distance model seems plausible, as the Oromo are geographically the most likely to have interacted with Nilo-Saharan populations and the Somali the least.

Finally, please keep in mind that if the Somalis are 100% cluster X, that does not mean that the Somalis are derived from some real homogeneous ancestral cluster X. These ADMIXTURE components are very interesting in helping to flesh out relationships horizontally across populations today, but we should be cautious about what they can tell us about relationships vertically in terms of how populations emerged over time. A thoroughly admixed group can break out into its own distinctive cluster if it exhibits a level of internal homogeneity and the ancestral “reference” populations themselves no longer exist. This seems to be what has occurred in South Asia, where certain groups shake out as “100% South Asian,” but themselves on the deeper genomic level seem to be stabilized admixtures of ancient fusions between two ancestral groups which were very diverged. A South Asian analogy to the Horn of Africa might lead us to infer that Somalis are the equivalent of these populations, where they lack admixture with more recent arrivals to the region after the initial admixture event between “Ancestral East Africans” (AEA) and the Arabians of yore. This may simply be a function of geography and historical contingency, as the position of Somalis is more “sheltered” because of the quasi-peninsular nature of their region of the Horn. Additionally, Somalia is relatively dry and unsuitable for agriculture, making it perhaps less ecologically friendly than the highlands of Ethiopia to Semitic populations bringing a new agricultural toolkit.

There’s plenty more you can say, but I’ll hold off, and add a word of caution: it is very possible that I was looking for these specific clusters and arrived at them via confirmation bias. As I’ve noted before, if you tune ADMIXTURE’s parameters in the proper fashion you can “arrive” at the answers you want. How to protect against this? If I keep performing ad hoc runs and going by intuition, lots of repetition often helps. You naturally arrive at a sense of the underlying distribution of possibilities, and can guard against anchoring upon an outlier result, because you know that it is atypical (this is, though, one reason that ground-breaking results are ignored, as they don’t fit the paradigm, so there’s a flip-side to this bias). I also run cross-validation now and then to find the optimal number of K’s, but that really slows down the program, so this is a matter of trade-offs for me. I’m rather sure that the differences between Ethiopian and Somali groups are robust, because the same pattern of relationships (e.g., the Amhara tendency to resemble the Tigray more than the Somali) reoccurs over and over. But I’m not so confident about the inference I’ve drawn here about the Afro-Asiatic language families and the partitioning of the Cushitic and Semitic groups.
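To make the cross-validation step concrete, here is a rough sketch of how one might pick the K with the lowest cross-validation error from ADMIXTURE's log output. It assumes the logs contain lines of the form `CV error (K=7): 0.5312`, which is how I understand the program reports them; the numbers in the example are made up for illustration and are not results from these runs.

```python
import re
from typing import Dict

# Assumed log line format: "CV error (K=7): 0.5312"
CV_LINE = re.compile(r"CV error \(K=(\d+)\): (\d+\.\d+)")


def parse_cv_errors(log_text: str) -> Dict[int, float]:
    """Map each K found in the log text to its reported CV error."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(cv_errors: Dict[int, float]) -> int:
    """Return the K with the smallest cross-validation error."""
    if not cv_errors:
        raise ValueError("no CV error lines found")
    return min(cv_errors, key=cv_errors.get)


if __name__ == "__main__":
    # Illustrative, invented values only.
    example_log = (
        "CV error (K=7): 0.5312\n"
        "CV error (K=8): 0.5290\n"
        "CV error (K=9): 0.5301\n"
    )
    errors = parse_cv_errors(example_log)
    print(errors, "-> lowest CV error at K =", best_k(errors))
```

A lowest CV error is only a guide. Repeating runs and checking that the same relationships recur, as described above, is a separate check that this sketch does not replace.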

You can find some more files here.

Image credit: Wikipedia

(Republished from Discover/GNXP by permission of author or representative)
In light of my last post I had to take note when Dienekes today pointed to this new paper in the American Journal of Physical Anthropology, Population history of the Red Sea—genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup. The authors looked at the relationship of mitochondrial genomes, with a particular emphasis upon Yemen and the Horn of Africa. This sort of genetic data is useful because these mtDNA lineages are passed from mother to daughter to daughter to daughter, and so forth, and are not subject to the confounding effects of recombination. They present the opportunity to generate nice clear trees based on distinct mutational “steps” which define ancestral to descendant relationships. Additionally, using neutral assumptions mtDNA allows one to utilize molecular clock methods to infer the time until the last common ancestor of any two given lineages relatively easily. This is useful when you want to know when a mtDNA haplogroup underwent an expansion at some point in the past (and therefore presumably can serve as a marker for the people who carried those lineages and their past demographic dynamics).

What did they find? Here’s the abstract:

Archaeological studies have revealed cultural connections between the two sides of the Red Sea dating to prehistory. The issue has still not been properly addressed, however, by archaeogenetics. We focus our attention here on the mitochondrial haplogroup HV1 that is present in both the Arabian Peninsula and East Africa. The internal variation of 38 complete mitochondrial DNA sequences (20 of them presented here for the first time) affiliated into this haplogroup testify to its emergence during the late glacial maximum, most probably in the Near East, with subsequent dispersion via population expansions when climatic conditions improved. Detailed phylogeography of HV1 sequences shows that more recent demographic upheavals likely contributed to their spread from West Arabia to East Africa, a finding concordant with archaeological records suggesting intensive maritime trade in the Red Sea from the sixth millennium BC onwards. Closer genetic exchanges are apparent between the Horn of Africa and Yemen, while Egyptian HV1 haplotypes seem to be more similar to the Near Eastern ones.

Much of this is totally concordant with the results we’ve generated from the autosomal genome. Though the autosomal genome is much more difficult when it comes to implementing many of the tricks & techniques of phylogeography outlined above, it does offer up a much more robust and thorough picture of genetic relationships between contemporary populations. Instead of a distinct and unique line of paternal or maternal ancestry, thousands of autosomal SNPs can allow one to get a better picture of the nature of the total genome, and the full distribution of ancestors.

The map to the left shows the spatial gradients of the broader haplogroup under consideration, HV1. But what about the branches? Below is an illustration of the phylogenetic network of branches of HV1, with pie-charts denoting the regional weights of a given lineage:

Since the shading is so difficult, let me jump to the text:

…Curiously, the HV1 root haplotype with substitution at position 16,067 was not observed in the Arabian Peninsula except in four Yemeni Jews, but was observed in 11 Caucasus, four Egyptian, one European, two Maghreb, and six Near Eastern samples, thus supporting a possible origin in the Near East. Haplotype 16,067–16,362, possibly defining a pre-HV1 haplogroup, has so far been observed in Dubai (one), Ethiopia (four), Maghreb (one), and Yemen (three)….

I think you have to be very, very careful to not read too much into mtDNA lineage distributions and what they may tell you about the past, at least in and of themselves. With the rise of ancient DNA and deeper analyses of mtDNA sequences as well as better geographical coverage many of the inferences of the last 10 years are being radically revised. But, combined with the autosomal results the origin of these mtDNA haplogroups in the Middle East within the last ~10 thousand years seems eminently possible.

Finally, here are their time until the most recent common ancestor estimates:

…The TMRCA estimate for HV1 was 22,350 (14,737–30,227) years when taking into consideration the sequences without the polymorphism at 15,218—a figure which closely matches the estimate of 18,695 (13,094–24,449) years when not considering those two sequences. The control region age estimate of HV1 also presents a similar age, dating to 19,430 (6,840–32,023) years. Age estimates of HV1 daughter sub-haplogroups are only slightly lower—15,178 (8,893–21,671) years for HV1a and 17,682 (10,320–25,316) years for HV1b. The common Arabian Peninsula and East African sub-haplogroups HV1a3 and HV1b1 share a close age of 6,549 (2,456–10,746) years and 10,268 (4,792–15,918) years, respectively. Sub-haplogroups HV1a1 and HV1a2, which despite being rare seem to have a wider geographical distribution, have TMRCA of 10,268 (3,602–17,194) years and 9,518 (3,963–15,255) years, respectively. The ratio of the dates based on the ρ statistic for the synonymous clock relative to the complete sequence was 1.24, closely overlapping in most branches except for HV1a1 which has a very broad age estimate based only on synonymous diversity [23,616 (4,917–42,315) years]….

The confidence intervals on these estimates are really large. All you can say with a high degree of certainty is that the expansion of the family of HV1 haplogroups does not predate the Last Glacial Maximum, 15 to 20 thousand years ago. Many of the daughter branches seem to have emerged in the Holocene, possibly after the rise of agriculture. But with the huge possible set of ranges these temporal estimates come close to offering up pretty much zero additional clarity on the chronology of population dynamics in this region.
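For readers unfamiliar with the ρ statistic mentioned in the quoted passage, here is a toy sketch of the basic idea: ρ is the average number of mutations separating the sampled sequences from the root haplotype, and an age is obtained by multiplying ρ by an assumed number of years per mutation. The mutation counts and the calibration of 3,000 years per mutation below are invented placeholders, not values from the paper, and the sketch does not reproduce how the authors computed their intervals.

```python
from typing import Sequence


def rho(mutations_from_root: Sequence[int]) -> float:
    """Average number of mutations between each sampled sequence and the root."""
    if not mutations_from_root:
        raise ValueError("need at least one sequence")
    return sum(mutations_from_root) / len(mutations_from_root)


def tmrca_years(mutations_from_root: Sequence[int],
                years_per_mutation: float) -> float:
    """Point estimate of coalescence age: rho times the clock calibration."""
    return rho(mutations_from_root) * years_per_mutation


if __name__ == "__main__":
    # Hypothetical sample: mutation counts from the root for eight sequences.
    counts = [0, 1, 1, 2, 3, 2, 1, 2]
    # Hypothetical calibration: one mutation per 3,000 years.
    print(rho(counts))                  # 1.5
    print(tmrca_years(counts, 3000.0))  # 4500.0
```

With only a handful of sequences the average rests on very few mutations, which is one intuitive reason the intervals around such estimates tend to be wide.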

Readers might also be interested in this from last January, Internal Diversification of Mitochondrial Haplogroup R0a Reveals Post-Last Glacial Maximum Demographic Expansions in South Arabia (with some of the same authors). One aspect of these sorts of papers working with mtDNA is that they remain generally oriented toward the proposition that Pleistocene population structure is extremely important in predicting contemporary patterns of genetic variation. I’m not sure this is such a robust model. The autosomal and uniparental data from Ethiopia and Somalia strongly lean us toward the proposition of admixture of two very distinct populations, one in East Africa (“Ancestral East Africans”), and a Eurasian group which is likely to have been intrusive. The genetic distance between the Eurasian inferred ancestral component, which is nearly identical to that of southern Arabia, and other Eurasian components is not so large that it seems plausible that there could have been a separation during the Pleistocene. In other words, there was a lot of Holocene migration. If I had to guess I would say it had something to do with the agricultural and pastoral lifestyles brought by Arabians to the Horn of Africa within the last 10,000 years. Simple ecology imposed a limit upon the expansion of these peoples into more classical lush tropical Africa. Eventually a population did emerge to exploit these territories, Bantus from west-central Africa. Just like the Arabian-AEA hybrid population they encountered ecological, and also demographic, limits on the margins of the Semitic and Cushitic dominated territories in the Horn of Africa. And then of course there are the Nilotes….

Citation: Musilová, Eliška, Fernandes, Verónica, Silva, Nuno M., Soares, Pedro, Alshamali, Farida, Harich, Nourdin, Cherni, Lotfi, Gaaied, Amel Ben Ammar El, Al-Meeri, Ali, Pereira, Luísa, & Černý, Viktor (2011). Population history of the Red Sea—genetic exchanges between the Arabian Peninsula and East Africa signaled in the mitochondrial DNA HV1 haplogroup American Journal of Physical Anthropology : 10.1002/ajpa.21522

(Republished from Discover/GNXP by permission of author or representative)

Maju pointed me to a new paper on the genetics of Sudanese today. My interest was piqued, then not so much when I looked more closely. Genetic variation and population structure among Sudanese populations as indicated by the 15 Identifiler STR loci:

There is substantial ethnic, cultural and linguistic diversity among the people living in east Africa, Sudan and the Nile Valley. The region around the Nile Valley has a long history of succession of different groups, coupled with demographic and migration events, potentially leading to genetic structure among humans in the region.

We report the genotypes of the 15 Identifiler microsatellite markers for 498 individuals from 18 Sudanese populations representing different ethnic and linguistic groups. The combined power of exclusion (PE) was 0.9999981, and the combined match probability was 1 in 7.4 × 10^17. The genotype data from the Sudanese populations was combined with previously published genotype data from Egypt, Somalia and the Karamoja population from Uganda. The Somali population was found to be genetically distinct from the other northeast African populations. Individuals from northern Sudan clustered together with those from Egypt, and individuals from southern Sudan clustered with those from the Karamoja population. The similarity of the Nubian and Egyptian populations suggest that migration, potentially bidirectional, occurred along the Nile river Valley, which is consistent with the historical evidence for long-term interactions between Egypt and Nubia.

We show that despite the levels of population structure in Sudan, standard forensic summary statistics are robust tools for personal identification and parentage analysis in Sudan. Although some patterns of population structure can be revealed with 15 microsatellites, a much larger set of genetic markers is needed to detect fine-scale population structure in east Africa and the Nile Valley.
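As a side note on the forensic statistics in the abstract, here is a small sketch of how combined figures of this kind are conventionally built up from per-locus values, assuming the loci are independent: the combined match probability is the product of the per-locus match probabilities, and the combined power of exclusion is one minus the product of the per-locus failures to exclude. The per-locus numbers below are invented for illustration and are not taken from the paper.

```python
from math import prod
from typing import Sequence


def combined_match_probability(per_locus_mp: Sequence[float]) -> float:
    """Product of per-locus match probabilities (assumes independent loci)."""
    return prod(per_locus_mp)


def combined_power_of_exclusion(per_locus_pe: Sequence[float]) -> float:
    """1 minus the product of per-locus (1 - PE) values (assumes independence)."""
    return 1.0 - prod(1.0 - pe for pe in per_locus_pe)


if __name__ == "__main__":
    # Hypothetical per-locus values for three loci.
    print(combined_match_probability([0.1, 0.05, 0.2]))
    print(combined_power_of_exclusion([0.5, 0.6, 0.7]))
```

This is why even 15 markers can give tiny match probabilities for identifying individuals while still saying little about fine-scale population structure: the two tasks ask very different things of the data.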

The upside: nearly 500 individuals from a huge range of ethnic groups in Sudan. This is the level of population coverage you’d want. Most of the ethnic groups cover the sample size range from 10 to 50. The downside: only 15 microsatellite markers. About the same number as in the study which I critiqued earlier this week. This is just not a huge number. The authors did try very hard to prune the marker set to be ancestrally informative on this scale, but I think it’s pretty obvious that there are major shortcomings in their analysis. 15 STRs is probably useful for inter-continental genetic variation, but not for intra-continental differences. The paper is open access so you can read the whole thing, but I want to highlight a speculation which they offer based on their results:

The number of unique alleles (Figure 2B) was greatest in the Somali population, and in the population structure analyses (Figure 5), the Somali population grouped separately from other populations. Because the Somali population is separated both geographically and linguistically from the other populations included in our study, it is not surprising that it is also genetically distinct. It is possible that the Bantu expansion from West Africa had a stronger effect on the region of the Horn of Africa, where Somalia is located, compared with the region where Sudan is located. For example, the languages in Somalia belong to two major linguistic families, the Afro-Asiatic and Niger-Congo, whereas Nilo-Saharan is absent and the Bantu Swahili language is one of the major languages in Somalia (Ethnologue[1]). Another explanation could be that the Somali population is of both Eurasian and sub-Saharan origin, as suggested by a recent study[33], potentially explaining the differentiation of this population from some east African groups, although many of the Sudanese populations, such as Arabs and the Beja, may also have mixed Eurasian and sub-Saharan origin.

I think what is more likely is that, as hard as they tried, these markers don’t give an insightful picture at the fine scale. By insightful, I mean that there aren’t too many results I’d trust beyond what you’d already intuitively accept. The genome bloggers have already shown that there’s hardly any Bantu admixture in the Horn of Africa.

But the main reason I’m talking about this paper is this: I have one Nubian sample in the African Ancestry Project. Just one. As opposed to 34 in this paper. But my N = 1 makes me really wary of the results from this paper based on 15 STRs. How can my one sample make me wary of the results from 34? Because I have nearly 1 million SNPs from 23andMe’s v3 raw data! So there you have it. The number 1 million isn’t really that big of a deal. I’d be wary if I had 50,000 SNPs (I came up with the number based on running a lot of ADMIXTURE on African populations).

So this is what I did. I took my data set from the African Ancestry Project, pruned a lot of the populations, added Egyptians, and limited AAP members to Ethiopians, Somalis, a Yemeni, and my Nubian. I ran them from K = 2 to K = 12 with ~40,000 SNPs. You can find all of the results for this run at the African Ancestry Project website. But here I want to focus on K = 8. Below is the plot for the reference populations, and then the individual plot for all the Egyptians, and AAP project members who are Ethiopian, Somali, then the Nubian (AF070), and finally the Yemeni (AF091). The Nubian individual is highlighted with a red line, while I’ve placed a blue line underneath the Egyptians. In case you are curious, AF004, AF005, AF006, and AF034 are Somali. AF023 and AF064 are Oromo Ethiopian. AF036 is 7/8 Eritrean and 1/8 Italian, while AF001 is 100% Eritrean.

If the Nubian sample I have is representative it seems plausible that Nubians do have a minor component of Egyptian ancestry, but that Nubians are by and large a more conventional East African population. And contrary to the speculation in the paper, Somalis have surprisingly little ancestry from the Bantu expansion. I’m a lot more confident of this assertion than about the nature of Nubians from my one sample; you see this pattern of Bantu exclusion from Afro-Asiatic groups in Ethiopia and Somalia when modulating the parameters every which way. I’d bet $250 that Sudanese as a whole don’t have less Bantu admixture than the Somalis (both groups have a little affinity, perhaps more through common ancestry with Bantu groups than real Bantu expansion ancestry, despite the existence of a Bantu ex-slave class in Somalia). I’d bet $100 that my Nubian is representative and that Nubians don’t have as much gene flow from Egypt as this paper infers.

The best thing about people releasing genome data is that you can actually go beyond the armchair when it comes to critiquing a paper.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, East Africa, Genetics, Genomics, Nubia 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"