The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
African Genetics

inferred effective population size over time

The Khoisan are not the oldest people on the face of this earth; they simply have been the least impinged upon by population crashes over the past ~200,000 years. This is not a shocking assertion, but it is supported with greater robustness by a new paper in Nature Communications, Khoisan hunter-gatherers have been the largest population throughout most of modern-human demographic history. Unfortunately the standard tropes that come to the fore whenever the Khoisan are the objects of genetic scrutiny have emerged fully formed. But the worst problem is that one of the authors seems to be contributing to the misleading perceptions. Here is the press release:

“Khoisan hunter/gatherers in Southern Africa have always perceived themselves as the oldest people,” said Prof Schuster, an NTU scientist at the Singapore Centre on Environmental Life Sciences Engineering (SCELSE) and a former Penn State University professor.

“Our study proves that they truly belong to one of mankind’s most ancient lineages, and these high quality genome sequences obtained from the tribesmen will help us better understand human population history, especially the understudied branch of mankind such as the Khoisan.”

Though I think I know what Dr. Schuster is saying, I also believe that it is important to remember that both you and I are also descendants of one of mankind’s most “ancient lineages.” We’re all equally descended from ancient lineages, because we’re all descended from ancient proto-humans. The Khoisan are not fossils preserved from a bygone age; they’re a modern people who descend from those same archaic Africans that we descend from. The phylogenetic distinction of the Khoisan is that their ancestors seem to have diverged from other human beings rather early on. In fact, it is defensible to suggest that they departed from the family tree of other humans at the first fork in the road, with the ancestors of Nilotic and Bantu Africans separating from the ancestors of Eurasians rather later on (also, it seems that the lineage that led up to the Khoisan after their divergence from the proto-non-Africans/agriculturalist Africans diversified and is represented by the Hadza and Pygmies, in addition to the Khoisan).

So what is the figure at the top of this post telling us? It’s showing you the effective population in the past by using the variation within the genome of individuals. In any given generation only a fraction of the population reproduces, and among the fraction which reproduces there is variation in output (e.g., some have one offspring live to adulthood, others have four). This increases random genetic drift by reducing the effective population size and increasing the variance of the sampling outcome. Drift tends to remove variation as some alleles fix and others go extinct due to the fluctuations that it produces. What happens when populations go through a bottleneck is that a lot of the variation that was present in a population can be squeezed out of it, and it takes a long time for that variation to replenish itself through mutation alone. This is why the long term effective population of the Khoisan is so much larger than that of the Han Chinese, despite there being four orders of magnitude more Han Chinese in the world today than Khoisan. As you can see in the figure the Khoisan were not subject to nearly as strong a bottleneck in the past. The small long term effective population size of non-Africans, reflected in their relative genetic homogeneity in comparison to Africans, is due to the bottlenecks that occurred as the ancestors of non-Africans were migrating across the world. ~30,000 years ago the effective population of the Khoisan was ~10,000, while that of the ancestors of Eurasians was ~1,000. Both groups were going through a bottleneck, but the proto-Eurasians were experiencing a much stronger one. The ancestors of agriculturalist Africans, the vast majority of Sub-Saharan Africans today, were also experiencing a bottleneck, but a milder one.
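To make the bottleneck argument concrete, here is a minimal sketch using two textbook relationships: the long term effective size behaves like the harmonic mean of the per-generation sizes, and under drift alone expected heterozygosity shrinks by a factor of 1 - 1/(2N) each generation. The two population histories below are invented for illustration and are not taken from the paper.

```python
def harmonic_mean_ne(sizes):
    """Long term effective size as the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1.0 / n for n in sizes)


def heterozygosity_after(sizes, h0=1.0):
    """Expected heterozygosity after drift alone (no mutation, no migration).

    Each generation multiplies the starting value h0 by (1 - 1/(2N)).
    """
    h = h0
    for n in sizes:
        h *= 1.0 - 1.0 / (2.0 * n)
    return h


# Hypothetical histories over 1,000 generations (illustrative numbers only).
steady = [10_000] * 1_000                    # no bottleneck
bottleneck = [10_000] * 900 + [1_000] * 100  # 100 generations at one tenth the size

for name, history in (("steady", steady), ("bottleneck", bottleneck)):
    print(name, round(harmonic_mean_ne(history)), heterozygosity_after(history))
```

In this toy case a bottleneck that covers only a tenth of the generations pulls the long term effective size down to roughly half of the steady value, because small generations dominate a harmonic mean. That is the sense in which a huge census population today does not undo a small effective population in the past.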

One major caveat with the time scale above is that it is sensitive to the mutation rate. The figure above uses a value of ~2.5 × 10⁻⁸, while some estimates are closer to ~1.25 × 10⁻⁸. Though the authors brush this problem off, arguing that paleoclimate models can fit both parameter values in terms of explanations for the demographic patterns, from what I know the paleoanthropology more neatly fits the dates from the higher mutation rate. With the lower value the divergence between Khoisan and non-Khoisan is almost contemporaneous with the first fossil evidence of anatomically modern H. sapiens in Africa. But it is true that in either scenario non-Africans are subject to a strong bottleneck, long evident from the genetic data, and that the Khoisan have relatively large long term effective populations.
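The sensitivity to the mutation rate can be sketched in a few lines. This assumes the inference is made in mutation-scaled units, as coalescent-based methods of this kind generally do, so that every date and every effective size is inversely proportional to the assumed rate. The function name and the example date are illustrative, not taken from the paper.

```python
def rescale(value, mu_used, mu_new):
    """Rescale a date or effective size inferred under mu_used to mu_new.

    Assumption: the quantity is inversely proportional to the mutation rate,
    which holds when inference is done in mutation-scaled units.
    """
    return value * mu_used / mu_new


MU_HIGH = 2.5e-8   # rate used for the figure
MU_LOW = 1.25e-8   # lower estimate mentioned in the text

# An event dated to 30,000 years under the higher rate (illustrative input).
print(rescale(30_000, MU_HIGH, MU_LOW))
```

Halving the rate doubles every date on the horizontal axis and every effective size on the vertical axis, which is why the lower value pushes the Khoisan divergence back toward the earliest anatomically modern fossils.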

To put my cards on the table I do not accept the paleoclimatic explanation for why the Khoisan in particular had a large population while the ancestors of agriculturalist Africans and non-Africans did not. First, in Science Pontus Skoglund observes that “Many African populations are not included for comparison….” Yes. He also points out that the Khoisan could have gotten their diversity from interbreeding, something that has support in the literature. That being said, that’s not my biggest problem with the explanatory model outlined in the paper. In the blog post yesterday I reviewed results which imply that the Igbo of Nigeria have some admixture from a hunter-gatherer group genetically closer to the Khoisan than to the Pygmies. This is peculiar because geographically the Igbo are far closer to the Pygmy populations. So what’s going on here?

Human populations move. Paleoclimate models are often predicated on the idea that populations were geographically fixed for eons. But there’s a fair amount of circumstantial evidence that Khoisan-like people were extant as far north as Ethiopia in the recent past. The major element of the ancestry of the Bantu speaking majority in South Africa was resident in Cameroon ~3,000 years ago. Hybrid Eurasian-Sub-Saharan African populations in the highlands of Ethiopia only emerged in the past 3,000 years. And so on. We don’t know that the ancestors of agriculturalist Africans were resident in the northern fringe of Sub-Saharan Africa ~100,000 years ago. We don’t know if the ancestors of the modern Khoisan have such deep time history in southern Africa.

In light of evidence of Eurasian back-migration into Sub-Saharan Africa in deep antiquity it is quite reasonable to suggest that the intermediate population dynamics of the agricultural Sub-Saharan populations in this paper is simply due to the fact that they are a hybrid between long resident hunter-gatherer populations with some affinity with Khoisan and Pygmies, and descendants of the original Out of Africa migration. The blogger Dienekes Pontikos has been proposing a flavor of this model for a long time. Initially I thought it was a cranky fixation of his which had little basis in reality, but over the years I’ve come around to the position that it’s a plausible theory. At least as plausible as the myriad attempts to reconcile genetic patterns with paleoclimate models.

Citation: Khoisan hunter-gatherers have been the largest population throughout most of modern-human demographic history.

• Category: Science • Tags: African Genetics 
Adapted from The African Genome Variation Project shapes medical genetics in Africa

Ten years ago the story of how modern humans expanded across the face of the world would have been a relatively simple one. The story generally recounted for popular consumption was most forcefully articulated in Richard Klein’s The Dawn of Human Culture. Around ~50 thousand years ago a small group of Africans resident in the east of the continent changed in some sense. The most likely cause for this change was presumably a mutation which conferred upon this group the ability to engage in fully fluent language, and therefore engendered cultural flexibility previously unseen in our lineage.* The rest, as they say, was prehistory. These Africans expanded across the whole Afro-Eurasian zone, replacing archaic hominins, such as the Neandertals, in totality. They pushed the frontier of human habitation into Oceania and the New World.

That model is in some important aspects very wrong. First, the expanding African population mixed with other groups, including likely within Africa. Though this admixture accounts for less than ten percent of the ancestry outside of Africa, some of it was functionally very important. Second, the demographic expansion ~50-100 thousand years ago more accurately captures the dynamics of non-Africans than of Africans, who were characterized by much larger ancestral populations when the 100 to 1,000 ancestors of modern non-Africans left the continent. Third, modern populations seem to be to a large extent the product of fusions across the Pleistocene branches of humanity, brought together during the Holocene by rapid demographic expansion triggered by cultural innovations such as agriculture. Finally, these dynamics were not limited to populations outside of Africa.

For far too long Africa was conceived of as a black box in genetic terms, eternally useful as an outgroup, basal to the rest of humanity. Yes, there were platitudes about how most human genetic diversity was localized in African populations due to the bottleneck during the Out of Africa event, but beyond that maps of human migration implicitly left one with the impression that after the ancestors of modern humans left, little occurred within Africa. Yes, everyone could agree that there were back migrations along the periphery of the continent, the great swath north of the Sahara around to the Horn of Africa, but Sub-Saharan Africa was neglected in these treatments. No more.

A new paper in Nature is a major step forward in bringing together a lot of the elements of new findings we’ve seen in other work: The African Genome Variation Project shapes medical genetics in Africa. The medical genetics part is important downstream. It doesn’t loom large in the paper itself. I recommend you check out the supplementary info and supplementary data; they contain a lot of the meat of the paper. For me the big topline result is in the figure above, where you see the collection of results suggesting admixture into African populations from Eurasia and also between agriculturalists and hunter-gatherers.

On the Eurasian admixture, the authors confirm what we always knew about Ethiopia and the Horn of Africa, that it was the scene of a relatively recent admixture event between an Afro-Asiatic people, and a group related to modern Nilotic peoples. What is more interesting is that they observed Eurasian admixture within Yoruba people. This admixture has been suggested by others, as the Yoruba have traces of Neandertal ancestry. This group dates the admixture back to nearly 10,000 years ago, so it is likely associated with goings on that were trans-Saharan. If that is the case these were almost certainly quasi-Eurasian hunter-gatherers, and their ancestry might have been diminished in current North African groups subject to waves of farmers issuing from the east during the Neolithic. But there is also admixture with Eurasians further east in Uganda among Bantu groups. Reading the details of the supplements there is a chance that this was mediated through admixture of Eurasians with hunter-gatherer populations, and then the absorption of this hybrid group into the expanding wave of Bantu farmers. Speaking of which, this issue is solved: it is clear that the Bantu expansion was a major demographic transformation of eastern and southern Africa. The genes speak loudly and clearly. Additionally, the Sub-Saharan African admixture of Ethiopians is more closely related to that of the Nilotic people than the Bantus. The paper didn’t tease out the details archaeologically and historically, but if you look at the dates all this was going on in eastern Africa during the rise and fall of ancient Egypt. In other words, within historical memory the whole demographic landscape of Africa was reshaped. Contrary to the idea that Africa was static, there are indications here of massive transformations.

Though the Eurasian admixture story among these populations is fascinating, there is also nuance in the input of hunter-gatherer ancestry within West African and Bantu populations. First, I suspect that these estimates are lower bounds, because they don’t have exact reference populations. Some of the hunter-gatherers mixed into the Igbo and Bantu groups may have been more like agriculturalists than the extant hunter-gatherer groups within Africa. One of the peculiarities of the genetics is that it looks as if the hunter-gatherers of Sub-Saharan Africa, the Khoisan in the south and the Pygmies in the center, share more recent common ancestors with each other than either does with the agriculturalists. This may simply be due to the fact that the agriculturalists went through rapid expansion, and this whole constellation of peoples derive from a group which was an outgroup to extant hunter-gatherers. The only complicating issue is that of Eurasian admixture; it seems likely that for very old admixture events we’re seeing underestimates, or they aren’t picked up. In other words, the “reference” Sub-Saharan Africans themselves are compounds of people who remained within Africa, and Out of Africa. The eastern Pygmies may be the only people in the world without much Out of Africa input (recall that the Khoisan have some level of Out of Africa input mediated by East African pastoralists).

A second interesting aspect of the paper is about selection within African populations. As I said above you can find much in the supplements, so I won’t review that laundry list. But, it is interesting that many of the signatures disappeared once Eurasian ancestry was “masked.” That is, within the genomes of individuals you have a mosaic of ancestries, and high genetic distances between populations at particular loci turn out often to be simply due to historical demography. Once you remove this confound you pick out signals of selection which might be due to local adaptation (though some of the Eurasian alleles might also have been subject to selection, so in some ways the filtering might be too stringent). But the masking of Eurasian ancestry also highlighted something important: the genetic variation across African populations once you remove Eurasian ancestry is not that high. This is curious in light of the truism that most genetic variation in humans is found within Africa, but as Nick Patterson pointed out to me years ago: this applies to variation within populations, not across them. Since most variation is not partitioned across populations, that explains why Africans can be so genetically varied despite exhibiting between-population variation that is not too high. After masking Eurasian ancestry the mean pairwise Fst was ~0.015. To give a sense of perspective, the Fst between Northern Italians and Lithuanians is 0.01. The Fst between the Ethiopian African ancestry (so Eurasian segments are masked) and other African populations is still 0.027, on average (the distance between Lithuanians and Southern Italians is 0.015). This reinforces the fact that the African ancestors of Ethiopians are somewhat atypical (further confirmed by the relative inaccuracy of imputation from public data sets).
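For readers who want a feel for what an Fst of ~0.015 means, here is a minimal sketch of the textbook definition for two equally weighted populations at biallelic SNPs: Fst = (H_T - H_S) / H_T, where H_S is the average heterozygosity within the populations and H_T is the heterozygosity of the pooled population. This is the simple Wright-style quantity, not necessarily the estimator used in the paper, and the allele frequencies are invented.

```python
def fst_two_pops(freqs1, freqs2):
    """Wright-style Fst for two equally weighted populations.

    freqs1, freqs2: allele frequencies at the same biallelic SNPs.
    Uses a ratio of sums across SNPs rather than a mean of per-SNP ratios.
    """
    between = 0.0  # accumulates H_T - H_S
    total = 0.0    # accumulates H_T
    for p1, p2 in zip(freqs1, freqs2):
        p_bar = (p1 + p2) / 2.0
        h_t = 2.0 * p_bar * (1.0 - p_bar)
        h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
        between += h_t - h_s
        total += h_t
    return between / total if total > 0 else 0.0


# Hypothetical allele frequencies at four SNPs (illustrative only).
pop_a = [0.40, 0.55, 0.10, 0.72]
pop_b = [0.60, 0.50, 0.15, 0.70]
print(fst_two_pops(pop_a, pop_b))
```

The within-population share of variation is simply 1 - Fst, which is the point Patterson was making: a low Fst between African populations is perfectly compatible with each of those populations being highly diverse internally.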

The result that the Igbo seem to have ancestry from a hunter-gatherer group genetically closer to the Khoisan than the Mbuti Pygmies makes a lot more sense when you accept that much of the genetic population structure within Africa disappeared with the rise of agriculturalist groups which demographically swamped them. It seems plausible that the preexistent variation can be reconstructed to some extent by analyzing patterns within agriculturalists, as they likely absorbed hunter-gatherer groups over time. Within the paper the authors suggest that whole genome sequencing of more populations should be high on the priority list, and I agree. The future is going to be interesting.

Citation: The African Genome Variation Project shapes medical genetics in Africa.

* Klein appealed to some of Stephen Jay Gould’s more macromutationist/saltationist speculations.

• Category: Science • Tags: African Genetics 

I mentioned this in passing in my post on ASHG 2012, but it seems useful to make explicit. For the past few years there has been word of research pointing to connections between the Khoisan and the Cushitic people of Ethiopia. To a great extent the forthcoming paper likely answers the question of who lived in East Africa before the Bantu, and before the most recent back-migration of West Eurasians. On one level I’m confused as to why this has to be something of a mystery, because the most recent genetic evidence suggests an admixture event on the order of 2,000-3,000 years before the present.* If the admixture was so recent we should find many of the “first people,” no? As it is, we don’t. I think these groups, and perhaps the Sandawe, are the closest we’ll get.

Publication is imminent at this point (of this, I was assured), so I’m going to just state the likely candidate population (or at least one of them): the Sanye, who speak a Cushitic language with possible Khoisan influences. There really isn’t that much information on these people, which is why when I first heard about the preliminary results a few years back and looked around for Khoisan-like populations in Kenya I wasn’t sure I’d hit upon the right group. But at ASHG I saw some STRUCTURE plots with the correct populations, and the Sanye were one of them. I would have liked to see something like TreeMix, but the STRUCTURE results were of a quality that I could accept that these populations were not being well modeled by the variation which dominated their data set. Though Cushitic in language the Sanye had far less of the West Eurasian element present among other Cushitic speaking populations of the Horn of Africa. Neither were their African ancestral components quite like that of the Nilotic or Bantu populations. The clustering algorithm was having a “hard time” making sense of them (it seemed to want to model them as linear combinations of more familiar groups, but was doing a bad job of it).

Here is an interesting article on these groups: Little known tribe that census forgot. Like the Sandawe this is a population which seems to have been hunter-gatherers very recently, and to some extent still engages in this lifestyle. In this way I think they are fundamentally different from Indian tribal populations, who are often held up to be the “first people” of the subcontinent. More and more it seems that the tribes of India are less the descendants of the original inhabitants of the subcontinent, at least when compared to the typical Indian peasant, and more simply those segments of the Indian population which were marginalized and pushed into less productive territory. Over time they naturally diverged culturally because of their isolation, but the difference was not primal. In contrast, groups like the Sanye and Sandawe may have mixed to a great extent with their neighbors (and lost their language like the Pygmies), but evidence of full-featured hunting & gathering lifestyles implies a sort of direct cultural continuity with the landscape of eastern Africa before the arrival of farmers and pastoralists from the west and north.

* I understand some readers refuse to accept the likelihood of these results because of other lines of information. I am just relaying the results of the geneticists. I am not interested in re-litigating prior discussions on this. We’ll probably have a resolution soon enough.

There are two papers in Nature Genetics on the 17q21.31 region and the variation of inversion haplotypes in worldwide populations. Here’s a part of the discussion from the first paper:

In conclusion, we propose that the ancestral H2′ haplotype arose in eastern or central Africa and spread to southern Africa before the emergence of anatomically modern humans…Approximately 2.3 million years ago, the inversion rearranged to what we now refer as the direct orientation haplotype (H1′). This haplotype spread throughout the Homo ancestral populations in the African continent, virtually replacing the H2′ haplotype and becoming the predominant haplotype. We note that both the Denisova and Neandertal sister groups are predicted to have H1′ haplotypes…These early haplotypes were much simpler in their duplication architecture, similar to the patterns seen in great apes. We find that the more complex duplication architectures are particularly enriched in populations that migrated out of Africa. On the basis of sequence at the duplication loci, we estimate that the H2-specific duplication event occurred approximately 1.3 million years ago. Independent of the H2 duplication, the H1-specific duplication event occurred much more recently, approximately 250,000 years ago. Notably, we did not observe this haplotype in any of the African or Asian populations studied, suggesting that it may have been lost in these groups as a result of genetic drift. The H2D haplotype has risen to frequencies of 10–25% in European populations with virtually no genetic variation, suggesting an extremely recent and rapid expansion of this haplotype. High-coverage sequencing of more individuals along with fecundity data will likely shed further light on whether the high frequency of the haplotype-specific duplication in Europeans is due to selection or the effects of demographic history specific to this locus.

H2D individuals are susceptible to disease. If there is a fitness gain, there is also a loss. Despite his eugenical enthusiasms W. D. Hamilton ultimately gave up on the idea because he admitted it was difficult to predict what is beneficial and what is deleterious. Context matters. The distribution of haplotypes in this region seems to reflect echoes of deep pre-“Out of Africa” history in our species.

After the second Henn et al. paper I did download the data. Unfortunately there are only 62,000 SNPs intersecting with the HGDP. This is somewhat marginal for fine-grained ADMIXTURE analyses, though sufficient for PCA from what I recall. That being said, the intersection with the HapMap data sets runs from ~190,000 SNPs, to the full 250,000 SNPs (this makes sense since the Henn et al. #2 data set has some HapMap populations in it). So I’ve been experimenting a fair amount in the past few days, and I thought I would post on one issue which was clear in the original paper, but which I have replicated.

The Fulani (Fula) people of the western Sahel seem to have a relatively old West Eurasian component which has distinct affinities with the “Maghrebi” element discerned by Henn et al. In fact, the non-Sub-Saharan African ancestry of the Fulani is almost exclusively of this origin. To me this serves as a peculiar mirror of what you see in the Cushitic and Ethiopian Semitic peoples of the far east of the Sahel-Sudan latitudinal region. These populations also seem to be compounds of a Sub-Saharan African element with a West Eurasian one, but in their case the admixture is almost exclusively from a Southwest Eurasian (Arabian) component. Geographically these two symmetric admixture events make sense, but the exclusivity is still a bit surprising. Additionally, in the case of both the Fulani and the Ethiopian and Cushitic groups the admixture is widely distributed and even enough to imply that they are old events. I also assumed this because in some admixture runs a “pure” Fulani cluster partitions out, which is not unexpected for stabilized hybrid populations (all human populations are stabilized hybrids if you go back far enough).

To give you a flavor of what I’m talking about here are some screen shots of a run which is currently going. It has 180,000 markers. I removed Tunisians and many African populations from the Henn et al. data set, and included the Utah whites from the HapMap. The individual plots show the ancestral proportions for each Fulani in the data set:

So what can we see here? First, let’s reiterate something: as in the case of the populations of the Horn of Africa the West Eurasian element in the Fulani is difficult to find in “pure” form in the populations from which it putatively derived. What does that imply? I think that means that the Fulani have an origin in relatively recent historic time, on the order of 2,000, not 10,000, years. That is because I am skeptical that the Fulani would be able to maintain genetic distinctiveness for ~10,000 years from other populations around them. In contrast, the last 2,000 years have seen the rise of various cultural institutions, from trans-Saharan nomadism to Islam, which might slow down admixture sufficiently to maintain the differences between the Fulani and their neighbors. It also implies to me that the non-Maghrebi “Near Eastern” element which Henn et al. discerned is a relatively recent phenomenon in northwest Africa, else the Fulani should also carry it. How recent? Probably from Classical Antiquity down to the Muslim period. Observe that many North African groups have a red “European” element. This may be from Near Eastern populations, but I suspect that the fraction here is just too high to be explained by that. Also, you can see above that some groups in Morocco have nearly as much of this as Egyptians, but far less of the more genuine Near Eastern components.

In all likelihood the West Eurasian component came to the Fulani via the Tuareg or a related or antecedent population. So if you typed the Tuareg you would probably get a better sense of the “pure” “Maghrebi” genetic profile. These genetic results also can serve as fodder to understanding the ethnogenesis of the landscape of the Sahel. In the map above it is interesting to observe that the Hausa speak an Afro-Asiatic language, even though their West Eurasian component is far lower than the Fulani, who speak Niger-Congo dialects. What gives? I suspect that the difference here is that the Hausa are a case of elite emulation of a cultural complex which was much more integrated and elaborated by the time it arrived on the West African scene. This explains how there could be language shift, while in the case of the Fulani there was none. Another hypothesis is that Afro-Asiatic derives from Sub-Saharan Africa itself, and the Chadic (Hausa) group are basal to the phylogeny. I’ll let readers explore the implications of that. A final aspect, I put the quotations in the title because perhaps the Berber dialects spread via elite emulation, and the original Maghrebi ancestors of the Fulani spoke a different language, which has been lost? As they say, for every answer there bloom a thousand questions….


Zinedine Zidane, a Kabyle

There is a new paper out in PLoS Genetics which purports to characterize the ancestry of the populations of northern Africa in greater detail. This is important. The HGDP data set does have a North African population, the Mozabites, but it’s not ideal to represent hundreds of millions of people with just one group. The first author on this new paper is Brenna Henn, who was also first author on another paper with a diverse African data set. Importantly the data from that paper was posted online. Unfortunately most of the populations didn’t have too many markers. This isn’t an issue in and of itself, but it becomes a big deal when trying to combine it with other data sets. If you limit the markers to those which intersect across two data sets you start to thin them down a lot, to the point where they’re not useful. The results of the new paper are worth talking about, and the authors also say that they’ll be putting the data online. This is important because they used a large number of markers, so the intersections will be nice (I can, for example, envisage exploring the relationship between the North Africans and the IBS Iberian sample in the near future).
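The thinning problem is just set intersection, as the following minimal sketch shows. The marker identifiers and panel sizes are made up for illustration; real merges also have to reconcile strand and allele coding, which is ignored here.

```python
def shared_markers(*marker_sets):
    """Return the markers present in every data set (at least one set required)."""
    shared = set(marker_sets[0])
    for markers in marker_sets[1:]:
        shared &= set(markers)
    return shared


# Hypothetical marker panels (identifiers invented for illustration).
dense_panel = {f"snp{i}" for i in range(1000)}
sparse_panel = {f"snp{i}" for i in range(0, 1000, 4)} | {f"other{i}" for i in range(50)}
third_panel = {f"snp{i}" for i in range(0, 1000, 6)}

print(len(shared_markers(dense_panel, sparse_panel)))
print(len(shared_markers(dense_panel, sparse_panel, third_panel)))
```

Each additional data set can only shrink the shared set, so a single sparse panel caps what every combined analysis can use, no matter how dense the other panels are.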

As for the paper itself, Genomic Ancestry of North Africans Supports Back-to-Africa Migrations:

Proposed migrations between North Africa and neighboring regions have included Paleolithic gene flow from the Near East, an Arabic migration across the whole of North Africa 1,400 years ago (ya), and trans-Saharan transport of slaves from sub-Saharan Africa. Historical records, archaeology, and mitochondrial and Y-chromosome DNA have been marshaled in support of one theory or another, but there is little consensus regarding the overall genetic background of North African populations or their origin and expansion. We characterize the patterns of genetic variation in North Africa using ~730,000 single nucleotide polymorphisms from across the genome for seven populations. We observe two distinct, opposite gradients of ancestry: an east-to-west increase in likely autochthonous North African ancestry and an east-to-west decrease in likely Near Eastern Arabic ancestry. The indigenous North African ancestry may have been more common in Berber populations and appears most closely related to populations outside of Africa, but divergence between Maghrebi peoples and Near Eastern/Europeans likely precedes the Holocene (>12,000 ya). We also find significant signatures of sub-Saharan African ancestry that vary substantially among populations. These sub-Saharan ancestries appear to be a recent introduction into North African populations, dating to about 1,200 years ago in southern Morocco and about 750 years ago into Egypt, possibly reflecting the patterns of the trans-Saharan slave trade that occurred during this period.

The model outlined here is straightforward:

– A population of West Eurasian provenance migrated across the fringe of the southern Mediterranean >10,000 years B.P. (Maghrebi)

– This was later overlain by a West Asian migration (Near Eastern)

– A third major element here seems to be Sub-Saharan African admixture, which these authors claim is rather new (post-Roman)

Two of the methods used will be familiar to readers of this weblog. First, they used ADMIXTURE to generate barplots which fractionate putative ancestral components given K number of components. Second, they used PCA to visualize the largest components of genetic variation within the samples on a plane.
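For readers unfamiliar with what sits behind those barplots, here is a minimal sketch of the kind of model that ADMIXTURE-style clustering fits: each individual's expected allele frequency at a SNP is a weighted average of the K component frequencies, weighted by that individual's ancestry fractions, and genotypes are scored with a binomial log-likelihood. The function names and all numbers are hypothetical, and this only evaluates the likelihood; it does not reproduce the program's optimization.

```python
import math


def expected_freq(q, p):
    """Weighted average of K component allele frequencies p, with weights q."""
    return sum(qk * pk for qk, pk in zip(q, p))


def log_likelihood(genotypes, q, component_freqs):
    """Binomial log-likelihood of one individual's genotypes (constants dropped).

    genotypes: allele counts (0, 1 or 2) per SNP.
    q: the individual's ancestry fractions across K components (sums to 1).
    component_freqs: per SNP, the allele frequency in each of the K components.
    """
    ll = 0.0
    for g, p in zip(genotypes, component_freqs):
        f = expected_freq(q, p)
        ll += g * math.log(f) + (2 - g) * math.log(1.0 - f)
    return ll


# Hypothetical data: two components with strongly contrasting frequencies.
freqs = [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]
genotypes = [1, 1, 1, 1]  # heterozygous at every SNP

print(log_likelihood(genotypes, [0.5, 0.5], freqs))  # modeled as an even mix
print(log_likelihood(genotypes, [1.0, 0.0], freqs))  # modeled as unmixed
```

In this toy case the even mix has the higher log-likelihood. The bars in the plots are the ancestry fractions that maximize this kind of likelihood for each individual, given the chosen K.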

As you “move up” the K’s you note that Maghrebi populations “split” from the Near Eastern reference, the Qataris. This is supported by the PCA, which shows that there is a dimension of variation which separates Near Easterners & Europeans from Maghrebis. The authors note that this dimension is orthogonal to the Sub-Saharan African vs. Eurasian component. That suggests that the putative Maghrebi component is likely to be part of the set of “Out of Africa” populations, rather than an African population which simply experienced continuous gene flow with West Eurasians.

They also estimate Fst, a statistic which partitions genetic variation within and between groups. The value between Sub-Saharan Africans and Europeans is ~0.15 using HGDP SNP data, and between Europeans and East Asians ~0.10. Using the Tuscans and Qataris as European and West Asian references against the North African populations along their east-west cline they estimate Fsts from ~0.03 to ~0.06. The higher end values are from populations which are less admixed with Near Eastern elements, and the colored polygons illustrate the domain generated by ADMIXTURE Fsts across inferred ancestral components. You also see in the chart estimated time of divergence. I won’t get into the assumptions in the model, but the authors do note that ~12,000 years B.P. seems to be the lower-bound estimate for when the Maghrebis diverged from other West Eurasians. This is important, because it predates agriculture.
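To make the statistic concrete, here is a minimal sketch of a Wright-style Fst between two populations, computed from per-SNP allele frequencies as a ratio of sums. The estimator the paper actually used is not stated here, so treat this as an illustration of the idea rather than their method; the function names and the allele frequencies are made up.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) for a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(freqs_a, freqs_b):
    """Wright-style Fst between two populations from per-SNP allele frequencies.

    Uses a ratio of sums across SNPs: sum(H_T - H_S) / sum(H_T), where H_S is
    the mean within-population heterozygosity and H_T is the heterozygosity of
    the pooled (equally weighted) population.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations need the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        h_s = (heterozygosity(pa) + heterozygosity(pb)) / 2.0
        h_t = heterozygosity((pa + pb) / 2.0)
        numerator += h_t - h_s
        denominator += h_t
    # If every SNP is fixed for the same allele there is no variation to partition.
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical allele frequencies at five SNPs (made-up numbers).
pop_a = [0.10, 0.50, 0.80, 0.30, 0.65]
pop_b = [0.30, 0.50, 0.60, 0.45, 0.70]
print(round(fst(pop_a, pop_b), 4))
```

Identical frequencies give 0 and populations fixed for opposite alleles give 1, which is why values of ~0.03 to ~0.06 read as modest but real differentiation.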

The final set of methods outlined in this paper looked at ancestry on a more fine-grained genomic scale. To the left you see a plot where each horizontal bar represents an individual’s chromosome 1 (among a set of North Africans). Each color in that bar indicates a component of ancestry (except the black, which are centromeres). This sort of information is important, because saying someone is 50% X and 50% Y summarizes information to the point of eliding it. An individual who is a first generation product of a Chinese-European marriage is going to have the same ancestral proportions as someone who is a Uyghur for those respective populations. But a fine-scale mapping of the genomic ancestry would look very different, because the history of the admixture is very different.
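The difference between recent and old admixture can be sketched with a toy simulation. Everything here is an assumption for illustration: ancestry switch points are modeled as a Poisson process with rate (generations - 1) per Morgan, the chromosome length of 2.8 Morgans is only an approximate figure for chromosome 1, and `simulate_haplotype` is a hypothetical helper, not the method used in the paper.

```python
import random


def simulate_haplotype(length_morgans, generations, frac_a, rng):
    """Return a list of (start, end, ancestry) segments along one chromosome copy.

    Simplifying assumption: ancestry switch points form a Poisson process with
    rate (generations - 1) per Morgan, and each segment is independently
    labelled 'A' with probability frac_a, otherwise 'B'.
    """
    rate = max(generations - 1, 0)
    breakpoints = []
    position = 0.0
    while rate > 0:
        position += rng.expovariate(rate)
        if position >= length_morgans:
            break
        breakpoints.append(position)
    edges = [0.0] + breakpoints + [length_morgans]
    segments = []
    for start, end in zip(edges, edges[1:]):
        label = "A" if rng.random() < frac_a else "B"
        if segments and segments[-1][2] == label:
            # Merge neighbouring segments that share an ancestry.
            segments[-1] = (segments[-1][0], end, label)
        else:
            segments.append((start, end, label))
    return segments


rng = random.Random(1)
CHROM_LENGTH = 2.8  # assumed approximate genetic length of chromosome 1, in Morgans
recent = simulate_haplotype(CHROM_LENGTH, 1, 0.5, rng)
old = simulate_haplotype(CHROM_LENGTH, 100, 0.5, rng)
print(len(recent), "segment(s) one generation after admixture")
print(len(old), "segments one hundred generations after admixture")
```

A first-generation individual carries one unbroken copy from each parental population, while an anciently admixed individual carries the same overall proportions chopped into many short tracts.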

There are many inferences in the paper which I won’t address. Rather, let me focus on this one assertion:

After accounting for putative recent admixture (Figure 1), the indigenous Maghrebi component (k-based) is estimated to have diverged from Near Eastern/Europeans between 18–38 Kya (Figure 3), under a range of Ne and k values. We hence suggest that the ancestral Maghrebi population separated from Near Eastern/Europeans prior to the Holocene, and that the Maghrebi populations do not represent a large-scale demic diffusion of agropastoralists from the Near East.

This is not implausible on the face of it. The component of ancestry modal in the Mozabite HGDP sample tends to have a relatively high Fst in relation to other West Eurasian groups. I had wondered if this was due to ancient Sub-Saharan African admixture which had produced a particular stabilized hybrid, but these results indicate that the component is no closer to Sub-Saharan Africans than other West Eurasians are. What I’m confused and skeptical about is the range of divergence times which different papers are producing, which seem somewhat implausible taken together.

There are papers which posit that East Asians separated from Europeans ~25,000 years B.P. This is in the same range as the divergence between Maghrebis and West Eurasians, but the Maghrebi genetic distance (Fst) is about 1/2 as great. Also, these sets of results which generate a “bunching” together of the separation of many extant non-African lineages in the 20-40,000 year range imply very rapid differentiation after the “Out of Africa” event, if that event did occur ~50,000 years ago (at least for most Eurasians, even assuming a revised model whereby Australian Aboriginals derive from an earlier wave). One at a time any given divergence estimate may be broadly plausible, but the literature is just not particularly coherent on this matter, and it often seems archaeologically implausible.

Citation: Henn BM , Botigué LR , Gravel S , Wang W , Brisbin A , et al. 2012 Genomic Ancestry of North Africans Supports Back-to-Africa Migrations. PLoS Genet 8(1): e1002397. doi:10.1371/journal.pgen.1002397

Image Credit: Raphaël Labbé


In my post below, Tutsi probably differ genetically from the Hutu, there were many comments. Some I did not post because they were rude, though they did ask valid questions. I will address those issues, but let me quote one comment:

That’s an interesting possibility, but this admixture run didn’t split the non-hunter-gatherer Africans that well. In one of your previous analyses on East Africa you managed to get a pretty accurate ‘Afro-Asiatic/Cushitic’ and ‘Nilotic’ cluster. Is it possible that you could run this Tutsi sample using the same admixture settings as in the ‘Flavors of Afro-Asiatic’ blog post to see if he carries a significant Nilotic component or is mainly Bantu & Cushitic derived?

So I replicated ADMIXTURE runs for many of the same populations as I did in my post, Flavors of Afro-Asiatic. I also pared down the population set and generated a PCA with EIGENSOFT. Before I get to those results, let me tackle the questions.

1) “Are the Luhya suitable proxies for the Hutus?”

Probably. The reason is that Bantu-speaking populations, from the Congo to South Africa, are surprisingly similar. Not only that, but these populations are very distinct from groups which are close to them geographically, but linguistically different (e.g., Khoe, Sandawe, Masai). The Luhya are not exceptional. I’ve run the Henn et al. data sets enough to be convinced that they’re exactly as they should be. They are pretty much what you’d expect from Kenyan Bantu: a predominant element which ties them back to an East-Central African point of origin, with some admixture with other East African elements (similarly, South African Bantu exhibit Khoisan admixture). The Hutu may be peculiar, but we don’t know, and my null is that they’re mostly Bantu with some admixture, as is the case with most Bantu-speaking populations (this one Tutsi seems to be an exception in that context, as they are presumably Bantu speaking). If you think that the Luhya are not suitable, I invite you to download the HapMap Luhya, and merge them with some of the Henn et al. data sets (or HGDP or Behar data sets). I think that should convince you.

2) “The admixture percentages you give are weird for population X.”

Someone who is more technically fluent than I can correct me, but I suggest that you be very careful about taking absolute percentages too literally. If you tell a statistical algorithm to push the genetic variation you’ve input into it into a certain number of boxes, it will do that, even if it has to squeeze them in all sorts of ways. In other words, modulating the parameters is an easy way to generate plenty of weird absolute proportions. Often it’s pretty obvious that deeply admixed populations are showing up as their own distinctive cluster…but that begs the question, when is admixture so distant that it shouldn’t count? Instead of focusing on absolute percentages, look at the relationships between individuals and populations. These too can be tweaked and massaged, but my personal experience is that they’re somewhat less volatile.

3) “The Nilotic cluster doesn’t map well onto Nilotic populations.”

The labels one gives, formally or informally, to a population cluster are for ease of recollection. They are not there to transmit to you real concrete information about the deep history of a population and its relationship. Additionally, there is always going to be a lot of confusion when you leverage geographical or linguistic terminologies which have only approximate relationships to genetic clusters. Don’t get so caught up in semantics that you forget that ADMIXTURE components are abstractions, useful for smoking out genetic variation, not for perfectly mapping onto some idealized set of ur-populations.
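The point about an algorithm pushing variation into a fixed number of boxes can be made concrete with a toy version of the likelihood model behind ADMIXTURE-style programs, in which each genotype is binomial with an allele frequency that is a weighted mix of the source frequencies. This is only a sketch under strong assumptions: the source frequencies are treated as known (the real program estimates them jointly with the proportions), K is fixed at 2, and all data are simulated.

```python
import math
import random


def log_likelihood(q, genotypes, freqs_1, freqs_2):
    """Binomial log-likelihood of one individual's genotypes given ancestry q.

    q is the fraction of ancestry from source 1; each genotype counts copies
    (0, 1 or 2) of the reference allele.
    """
    total = 0.0
    for g, f1, f2 in zip(genotypes, freqs_1, freqs_2):
        p = q * f1 + (1.0 - q) * f2
        p = min(max(p, 1e-9), 1.0 - 1e-9)  # guard against log(0)
        total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total


def estimate_q(genotypes, freqs_1, freqs_2, steps=100):
    """Grid-search maximum-likelihood estimate of q on [0, 1]."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: log_likelihood(q, genotypes, freqs_1, freqs_2))


def draw_genotypes(freqs, rng):
    """Sample diploid genotypes with two independent alleles at each SNP."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]


rng = random.Random(42)
n_snps = 2000
freqs_1 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freqs_2 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]
freqs_3 = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # unmodelled third source

# An individual who really is 30% source 1 and 70% source 2.
mixed = draw_genotypes([0.3 * a + 0.7 * b for a, b in zip(freqs_1, freqs_2)], rng)
print("admixed individual:", estimate_q(mixed, freqs_1, freqs_2))

# An individual from the third source still gets a proportion assigned.
outsider = draw_genotypes(freqs_3, rng)
print("unmodelled individual:", estimate_q(outsider, freqs_1, freqs_2))
```

The second individual has no ancestry from either source, yet the fit still returns a proportion, which is the reason to read relationships rather than absolute percentages.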

Now to my results. I used 200,000 markers. I combined Lithuanians and Belorussians into one pot as “Baltic,” and Syrians and Jordanians into another as “Levantines.” For the PCA I focused on African populations, and used the Yemeni Jews as the outgroup. Additionally, there is clearly structure due to some family relationships amongst the Masai. This is a problem in many runs with them. Even when you remove the “problem” individuals other clusters tend to crop up at higher K’s where the Masai are very numerous. In any case, for the purpose of these runs ignore the family clusters, and focus on the more typical individuals amongst the Masai.

Remember that the Tutsi is 3/4 Tutsi, 1/4 Hutu. It is N = 1. So is the Nubian. You see in many of the Horn of Africa populations that the Eurasian component has an affinity with Yemenis, not Europeans. In contrast, the Nubian does have some European-like component. That’s probably simply due to the fact that in this run Levantines themselves have that, and Egyptians who also carry that component are part of the heritage of the Nubian. The Tutsi does have the Southwest Asian component, which the Masai seem to lack.

To get a better sense, let’s look at a slice of individuals. The Tutsi is last. The family relationships of some of the Masai are also clear. Focus on the more typical Masai and the Tutsi:

Looking at the individual results it seems that the Tutsi can be placed within the range of combinations of ancestral components of the Masai, though not the Luhya. To get a different vantage point let’s look at some PCAs, which visualize the largest components of genetic variance in the data set.

The results are not cut & dry. I am less skeptical of some Afro-Asiatic element in the Tutsi heritage, though it still seems that the dominant affinity is with the Masai.

Note: I ran K = 7 to K = 10. There wasn’t anything different in the general pattern of the runs I did not show.


A Cape Coloured family

I’ve mentioned the Cape Coloureds of South Africa on this weblog before. Culturally they’re Afrikaans in language and Dutch Reformed in religion (the possibly related Cape Malay group is Muslim, though also Afrikaans speaking traditionally). But racially they’re a very diverse lot. In this way they can be analogized to black Americans, who are ~75% West African and ~25% Northern European, with the variance in ancestral proportions being such that ~10% are ~50% or more European in ancestry. The Cape Coloureds though are much more complex. Some of their ancestry is almost certainly Bantu African. This element is related to the West African affinities of black Americans. And, they have a Northern European element, which likely came in via the Dutch, German, and Huguenot settlers (mostly males). But the Cape Coloureds also have other contributions to their genetic heritage. Firstly, they have Khoisan ancestry, whether from Bushmen or Khoi. This is well known in their oral memory. The hinterlands of the Cape of Good Hope are beyond the ecological range of the Bantu agricultural toolkit, so the region was still dominated by the Khoisan when the Europeans arrived. But there are also other suggestions of ancestry from Asia. The existence of the Cape Malays, whose adherence to Islam derives from the Muslim slaves brought by the Dutch, hints at likely relationships to the populations of maritime Southeast Asia. Finally, there are the Indians. This element is not too well recalled in cultural memory. But the Dutch brought many slaves from India as well as Southeast Asia. The first Dutch governor of the Cape Colony had a maternal grandmother who was an Indian slave, by various accounts Goan or Bengali (the town of Stellenbosch is named for him). No doubt it was far more likely that the usual lot of the descendants of Indian slaves during the Dutch era would be to be absorbed into the melange of the Coloured population than assimilated into what later became the Afrikaners.

Why is this aspect of Cape Coloured ancestry forgotten? I think part of the reason is that there is a large South African Indian community present today, but that community post-dates the Dutch period, and arrived with the British. When South Africans think of Indians they think of these people. Interestingly when the new genetic studies confirming Indian ancestry came on the scene I was “corrected” several times by Indians themselves when reporting this part of the Coloured heritage. They were under the impression I must be mistaken, as no one was familiar with the Cape Coloureds having Indian ancestry. Unfortunately pointing to PCA and STRUCTURE plots did not clear up the confusion.

In any case, thanks to the African Ancestry Project I now have three unrelated Coloured samples (I have more, but they are related). Since AAP is Afrocentric I thought it would be appropriate to run the Coloured samples separate first. So that’s what I did.

First, the methodology. I took the Gujaratis, Utah whites, Chinese from Denver, and Luhya (Bantu) from Kenya, and merged them with the Bushmen from the Henn et al. thick-marker data set. I also decided to add in the Yemeni Jews from Behar et al., mostly to check that the West Eurasian ancestry of the Cape Coloureds was in fact Northern European. I limited the Gujarati sample to those from “Gujarati_B”, which is the “more South Asian” cluster within the HapMap data set. I also reduced the numbers for a lot of HapMap populations. I’m looking at inter-continental differences, so I assumed that N of ~20 would suffice. After merging these data sets with the Cape Coloured samples I pruned all the SNPs with missing calls. This left me with ~230,000 markers. In my experience this is kind of overkill for ADMIXTURE at this level of genetic distance between the hypothetical parent populations, but better safe than sorry. I also ran the samples through EIGENSOFT to generate PCAs. Also know that I performed a few “trials” with Sandawe and Hadza from Henn et al., as well as with larger samples from the HapMap. That either added nothing on the margin, or just got confusing (there’s not really too much Sandawe and Hadza in the Cape Coloureds beyond what the Bantu must have picked up).
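The merge-then-prune step can be sketched in a few lines. This is not the actual pipeline, which the text does not describe in detail; the data layout, sample ids, SNP ids, and function names are all hypothetical, and a real run would use dedicated genotype tools rather than Python dictionaries.

```python
def shared_complete_snps(datasets):
    """Return the sorted SNP ids genotyped, with no missing calls, in every sample.

    `datasets` is a list of dicts mapping sample id -> {snp id: genotype},
    where a genotype of None marks a missing call.
    """
    shared = None
    for dataset in datasets:
        for genotypes in dataset.values():
            called = {snp for snp, g in genotypes.items() if g is not None}
            shared = called if shared is None else shared & called
    return sorted(shared or [])


def merge(datasets):
    """Merge data sets, keeping only the SNPs returned by shared_complete_snps."""
    keep = shared_complete_snps(datasets)
    merged = {}
    for dataset in datasets:
        for sample, genotypes in dataset.items():
            merged[sample] = {snp: genotypes[snp] for snp in keep}
    return merged


# Hypothetical toy data: sample and SNP ids are made up.
hapmap_like = {
    "LWK_1": {"rs1": 0, "rs2": 1, "rs3": 2},
    "GIH_1": {"rs1": 1, "rs2": None, "rs3": 2},
}
project_like = {"COL_1": {"rs1": 2, "rs3": 1, "rs4": 0}}

merged = merge([hapmap_like, project_like])
print(sorted(merged["COL_1"]))
```

Only markers typed on every platform and called in every sample survive, which is why the marker count drops when data sets from different chips are combined.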

After I ran ADMIXTURE up to K = 7 it was clear that the optimal point in terms of informativeness was K = 6. You can see that the Cape Coloured samples have Northern European, Khoisan, Bantu African, Indian, and East Asian ancestry. There is a Yemeni component in two of the Coloured individuals which begs to be explained. This component is too high to be explained by Northern European ancestry alone. It could be explained by slaves from the Muslim Arab world. Also, the Indian reference sample used here was pruned to be very homogeneous. The slaves from South Asia were almost certainly much more diverse than the Gujarati_B population, which is mostly a group of Patels. Finally, sometimes when you run ADMIXTURE you see that combinations of atypical genetic backgrounds (e.g., Khoisan + Chinese) can generate components which are likely artifacts. This tends to be an issue when you have two components which aren’t normally found together, and one is at a far lower level than the other. I’ve noticed this in particular with people with low amounts of Sub-Saharan African ancestry and Eurasian genetic backgrounds. They often come out to be East African or Pygmy or Bushmen when the probability of this is likely to be very low a priori. Notice that a few of the Bushmen have the Yemeni component but nothing else besides what you’d expect. This to me increases the likelihood that the light green in the Coloureds is also an artifact of the Khoisan genetic background against one of the other components.

So below is the K = 6 ADMIXTURE plot, along with the informative PCA’s. Observe that the three Coloureds have IDs.

Image Credit: Wikimedia Commons.


Aka Pygmies

The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.

In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many, genes. This may seem a simple formulation, but to try and understand how thousands of genes may act in concert to modulate variation on a trait is often a more difficult task to grok (yes, you can appeal to the central limit theorem, but that means little to most intuitively). This is probably why heritability is such a knotty issue in terms of public understanding of science, as it concerns the component of variation in quantitative continuous traits which is dispersed across the genome. These are the traits where there is no “gene for X.” Additionally, quantitative traits are likely to have a substantial environmental component of variation, confounding a simple genotype to phenotype mapping. Arguably the classic quantitative trait is height. It is clear and distinct (there aren’t arguments about the validity of measurement as occurs in psychometrics), and it is substantially heritable. In Western societies with a surfeit of nutrition height is ~80-90% heritable. What this means is that ~80-90% of the variance of the trait value within the population is due to variance of the genes within the population. Concretely, there will be a very strong correspondence between the heights of offspring and the average height of the two parents (controlled for sex, so you’re thinking standard deviation units, not absolute units). And yet height is at the heart of the question of the “missing heritability” in genetics. By this, I mean the fact that so few genes have been associated with variation in height, despite the reality that who your parents are is the predominant determinant of height in developed societies.
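The correspondence between offspring and the parental average can be illustrated with a small simulation: under a purely additive model, the regression slope of offspring on midparent value estimates the heritability. The model below is an assumption-laden sketch (additive effects only, random mating, no shared family environment, phenotypes in standard deviation units), and the function names are made up for illustration.

```python
import random


def simulate_families(n_families, h2, rng):
    """Simulate (midparent, offspring) phenotypes in standard-deviation units.

    Assumptions: purely additive genetics, random mating, no shared family
    environment, total phenotypic variance of 1.
    """
    sd_a = h2 ** 0.5             # SD of additive genetic values
    sd_e = (1.0 - h2) ** 0.5     # SD of environmental deviations
    sd_seg = (h2 / 2.0) ** 0.5   # SD of Mendelian segregation within a family
    pairs = []
    for _ in range(n_families):
        a_mother, a_father = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_mother = a_mother + rng.gauss(0, sd_e)
        p_father = a_father + rng.gauss(0, sd_e)
        a_child = (a_mother + a_father) / 2.0 + rng.gauss(0, sd_seg)
        p_child = a_child + rng.gauss(0, sd_e)
        pairs.append(((p_mother + p_father) / 2.0, p_child))
    return pairs


def regression_slope(pairs):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    return cov / var


families = simulate_families(20000, 0.8, random.Random(7))
print(round(regression_slope(families), 2))
```

With the simulated heritability set to 0.8 the slope should land close to 0.8, up to sampling noise.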

The issue gets even more thorny when you talk about variation across societies. This is a simple and yet complex issue. On the one hand we know that over time people across the world have gotten taller as nutrition has gotten better. What is less well known is that human populations were shrinking in stature from the Last Glacial Maximum ~20,000 years ago until the past few centuries. Why? One can posit many reasons, both genetic and environmental, but it does point us to the reality that the story of height is not monotonic. That is, it doesn’t go in one direction, and has no simple one size fits all answer.

But that’s just the dimension of time. How about space? The question of whether different populations have different final genetic potentials for height is a disputed one. And yet it seems plausible that at the extremes there are genuine differences in the gene frequencies across populations which will speak to their different distributions in trait values. This is particularly interesting in the case of populations characterized by very low median adult heights, often termed “pygmies.” Of particular note are the Pygmies of Central Africa, who exist in a state of cultural symbiosis with their Bantu and Nilotic neighbors, adopting their languages, but remaining distinct.

These populations have very low median heights, but they are clearly not dwarfs (they are proportionate). Thankfully at least the population genetics of the Pygmies of Africa are now relatively well understood. It seems that the Western and Eastern Pygmy populations are very distinct clusters, with a common ancestry perhaps on the order of tens of thousands of years in the past. And not surprisingly the genetic distance between the Pygmy groups and their non-Pygmy neighbors is very large. The Western Pygmies tend to show more evidence of admixture with their Bantu neighbors than the Eastern ones (I suspect this is due to the longer residence of Bantus in this region). But for me the hardest issue to grapple with is the reality that the Pygmies of Central Africa seem to be genetically closer to the Khoisan people of Southern Africa than their Bantu or Nilotic neighbors! I believe this is evidence of an ancient hunter-gatherer continuum within Africa which has been marginalized and overlain by the recent expansion of Bantu farmers and Nilotic pastoralists.

In any case, what does all this have to do with the genetics of height? A new paper in the American Journal of Physical Anthropology synthesizes the inferences generated from population genetics with the basic logical assumptions of quantitative genetics to adduce that the difference between Pygmies and non-Pygmies in height is actually likely to be due to heritable differences. Indirect evidence for the genetic determination of short stature in African Pygmies:

Central African Pygmy populations are known to be the shortest human populations worldwide. Many evolutionary hypotheses have been proposed to explain this short stature: adaptation to food limitations, climate, forest density, or high mortality rates. However, such hypotheses are difficult to test given the lack of long-term surveys and demographic data. Whether the short stature observed nowadays in African Pygmy populations as compared to their Non-Pygmy neighbors is determined by genetic factors remains widely unknown. Here, we study a uniquely large new anthropometrical dataset comprising more than 1,000 individuals from 10 Central African Pygmy and neighboring Non-Pygmy populations, categorized as such based on cultural criteria rather than height. We show that climate, or forest density may not play a major role in the difference in adult stature between existing Pygmies and Non-Pygmies, without ruling out the hypothesis that such factors played an important evolutionary role in the past. Furthermore, we analyzed the relationship between stature and neutral genetic variation in a subset of 213 individuals and found that the Pygmy individuals’ stature was significantly positively correlated with levels of genetic similarity with the Non-Pygmy gene-pool for both men and women. Overall, we show that a Pygmy individual exhibiting a high level of genetic admixture with the neighboring Non-Pygmies is likely to be taller. These results show for the first time that the major morphological difference in stature found between Central African Pygmy and Non-Pygmy populations is likely determined by genetic factors.

First, is there a plausible physiological reason for the difference in adult height between Pygmies and non-Pygmies? The authors review the relevant evidence:

Endocrinologists have described the physiological determination of the African Pygmies’ short stature: serum levels of Insulin-Like Growth Factor 1 (IGF1) and of Growth Hormone Binding Protein (GHBP) are abnormally low, whereas the levels of Growth Hormone (GH) and IGF2 do not differ from Non-Pygmy controls…In this context, Merimee…proposed that the short stature of African Pygmies could be attributed to the absence of a growth spurt during puberty and that the genetic factor(s) implicated in the Pygmy stature were to be found in the GH-IGF1 axis…A recent gene-expression study further showed a slight (1.8-fold) under-expression of GH and a more dramatic (8-fold) under-expression of the GH receptor in adult African Pygmies, which was not found in Non-Pygmy Bantu speakers…However, the only genetic study focusing specifically on Pygmies’ stature, failed to find allele frequency differences in the promoter region of the gene encoding IGF1 between two African Pygmy populations and Non-Pygmy controls…In this context, whether the Pygmy populations’ short stature is solely due to environmental pressures experienced by individuals during growth (i.e., phenotypic plasticity), or to a complex genetic mechanism, remains to be demonstrated.

I believe that IGF can be found in meat and milk, so there are plausible dietary reasons that one could imagine this difference. As far as looking at differences between the genes which are known to impact height within populations across populations, there simply aren’t that many genes known which could account for the large between population differences. Not to mention that many of the current studies have used European populations, and so would likely have an ascertainment bias which might miss a lot of variance which is common within African populations.

The basic method in this paper is not too difficult to understand:

1) Use STRUCTURE, a program which assigns different ancestral quanta to individuals.

2) And compare the variation in a particular Pygmy-modal quantum across the population with variation in height.

If there are many genetic variants of small effect within the Pygmy genome which are resulting in their relatively low adult median height then dollops of Pygmy genome through admixture will reduce the height of non-Pygmies and dollops of non-Pygmy admixture in Pygmies will increase their height. The presumption is that if there are strong environmental impacts on height due to social differences then the disjunction between genetic identity and anthropological identity will be informative. For example, if Pygmies are put under particular stress or deprived specific nutritional intake because of their communal identity as marginalized Pygmies then different admixture levels with non-Pygmies should not matter much (and vice versa).
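The logic above reduces to a correlation between an individual’s inferred ancestry fraction and their height within one socially defined group. A minimal sketch follows; the numbers are invented for illustration and are not from the paper, and the plain Pearson correlation stands in for whatever statistical tests the authors actually ran.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical individuals who all identify as Pygmy (made-up values):
# fraction of the non-Pygmy-modal ancestry component, and adult height in cm.
non_pygmy_ancestry = [0.05, 0.10, 0.20, 0.30, 0.45]
height_cm = [142.0, 145.5, 144.0, 150.0, 153.5]

r = pearson(non_pygmy_ancestry, height_cm)
print(round(r, 2))
```

A clearly positive correlation within the group is what a genetic explanation predicts; if social identity alone drove height, ancestry fraction within the group should carry little signal.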

There’s a lot of statistics toward the aim of achieving significance in this paper (p-value < 0.05). And I really don’t understand the point of disaggregating males and females, for example. Just convert them to standard deviation units deviated from sex median! But in any case the major correlation is well illustrated by the two panels below. Pygmies are in red and non-Pygmies are in blue:

The y-axis is straightforward, height. You can see the Pygmies in their sample are shorter, on average. The x-axis is an ancestral component inferred from STRUCTURE which is generally found in non-Pygmies. You can see that as expected non-Pygmies have more of this than the Pygmies, but the descriptive statistic of a correlation between the non-Pygmy ancestry and height in Pygmies is evident even in this plot. Conversely, the Pygmy ancestry is correlated with lower adult height in non-Pygmies.
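The alternative to splitting by sex, converting heights to standard deviation units within each sex and then pooling, might look like the sketch below. One labeled assumption: it centers on the sex-specific mean rather than the median, since that is the usual z-score, and the records are made-up values.

```python
from statistics import mean, stdev


def standardize_by_sex(records):
    """Map each (sex, height) record to a z-score within its own sex."""
    by_sex = {}
    for sex, height in records:
        by_sex.setdefault(sex, []).append(height)
    # Mean and sample standard deviation for each sex.
    stats = {sex: (mean(hs), stdev(hs)) for sex, hs in by_sex.items()}
    return [(height - stats[sex][0]) / stats[sex][1] for sex, height in records]


# Hypothetical heights in cm; values are made up.
records = [("F", 140.0), ("F", 144.0), ("F", 148.0),
           ("M", 148.0), ("M", 152.0), ("M", 156.0)]
print(standardize_by_sex(records))
```

After this step men and women sit on a common scale, so a single pooled analysis against ancestry is possible.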

As a single result this particular finding isn’t too earth-shaking. If there was one population which was short due to genetic factors, I suspect that one would have to bet on the Pygmies of Central Africa. And as noted in the paper Pygmoid morphology is found among other hunter-gatherer tropical populations. This may not be a human ancestral type, but it is a type which has emerged repeatedly in our history, whether due to genetic or environmental factors. The big picture is that this same general procedure can be used to explore the differences in genetic dispositions across groups for many quantitative traits. With the coming era of cheap genotyping and sequencing I’m sure it will be done. An intrepid researcher has plenty of admixed populations in the New World to select from. There are in Brazil people who are socially identified and self-identify as white who have less European ancestry than those who are socially identified and self-identify as non-white. To compare the social and genetic valences of African and European ancestral contributions for medical and psychological quantitative traits these sorts of populations will be of great future interest.

Link credit: Dienekes

Citation: Becker NS, Verdu P, Froment A, Le Bomin S, Pagezy H, Bahuchet S, & Heyer E (2011). Indirect evidence for the genetic determination of short stature in African Pygmies. American journal of physical anthropology PMID: 21541921


Over the past few days I’ve been trying to read a bit on the Sandawe. Most of the stuff I’ve been able to find is in the domain of linguistics, and is basically unintelligible to me in any substantive manner. The crux of the curiosity here is that the Sandawe, like their Hadza neighbors, have clicks in their language, and so have been classified with the Khoisan. Here’s some background:

The most promising candidate as a relative of Sandawe are the Khoe languages of Botswana and Namibia. Most of the putative cognates Greenberg (1976) gives as evidence for Sandawe being a Khoesan language in fact tie Sandawe to Khoe. Recently Gueldemann and Elderkin have strengthened that connection, with several dozen likely cognates, while casting doubts on other Khoisan connections. Although there are not enough similarities to reconstruct a Proto-Khoe-Sandawe language, there are enough to suggest that the connection is real.

I can’t speak to the validity of this at all, obviously. Some scholars do argue that the clicks in the Sandawe language were only acquired through interaction with peoples such as the Hadza, making an analogy to Xhosa, a Bantu language which has been strongly influenced by Khoi dialects. In any case, after having run ADMIXTURE a bunch of times on African population sets, and checked the genetic distances of the inferred ancestral ones, one thing that is clear is that the Sandawe don’t show a particularly close genetic relationship to the Bushmen, nor do they show a close relationship to the Hadza. In fact, the Hadza, Pygmies, and Bushmen show a closer relationship to each other, distant as it is, than to the Sandawe. The Sandawe themselves are distinctive from their Bantu neighbors, but, their connections seem more clear to the Masai and other peoples to the north.

Some of the anthropological stuff that I did find on the Sandawe not having to do with linguistics considered the issue of their status as hunter-gatherers, and their shift toward a form of agriculture within the past few centuries. Not surprisingly much of this literature consisted of ideologically shrill posturing, denouncing past scholarship for insensitivity and bigotry, while taking their own maximalist position. For example there has been the hypothesis that hunter-gatherer populations tend to be genetically and culturally isolated from agriculturalists, with several African groups used as exemplars. A group of anthropologists argue strenuously that this model may just be a construction of the biases of previous generations of scholars. But they offer little in the way of counterargument, more keen on uncovering the faults in the motives and methods of their predecessors than in building anything anew.

Genetics can help us a little here. Below are the results of ADMIXTURE and PCA I ran for a selection of populations. I pulled in some Behar et al. samples and merged them with the Henn et al. data set. The marker list was pruned down to ~160,000 SNPs. The limited selection of populations was conscious, insofar as I was exploring specific questions about the relationship of East African populations to Eurasian ones. At K = 8 the populations in my data set separated rather well. Do not take this separation as evidence that this K is a reflection of absolute concrete ancestral populations. Here’s the bar plot:

Since I’ve been running this data set, with some modifications, for a week now I can pick out some trends which I feel are robust at K = 8. For example, the Eurasian-like admixture you see across eastern Africa seems to be distinctively of a southern nature, centered on Arabia (probably Yemen). This makes total geographical sense. The Ethiopians and Somalis (I have some Somali samples which I threw in with the Ethiopians since the Cushitic Ethiopians seem more similar to the Somalis than to Semitic Ethiopians) lack the genetic influence of Bantus in totality. Rather, they have an affinity with the Nilo-Saharan peoples. Finally, the Sandawe tend to “break out” as a separate population only at higher K’s, generally clustering with the Nilo-Saharan element as long as possible.

Let’s also look at a PCA of the populations above on the first two principal components:

The PCA looks a little different from the ones you’re used to seeing because there are only West Eurasian and African groups in the sample. So the second component is not the familiar west-east axis in Eurasia, but the separation between the Mbuti and other Africans. On the far right of the plot you have Orcadians, then Druze, Saudis, and Yemenis. Then you have Horn of Africa populations, Ethiopians and Somalis along the vertical axis. Then Masai and Sandawe, and Luhya, a Kenyan Bantu group. The Masai are a confusing group. Even after removing problem individuals who might be related there tends to be a choppiness in the Masai results. The Sandawe on the other hand are more consistent by and large.

The genetic distances of the inferred ancestral groups aren’t too surprising. Here are MDS visualizations:

One of the consistent trends you see is that the Masai are closer to Eurasians than the Sandawe are, but the “Masai” modal ancestral component is no closer to Eurasians than the “Sandawe” ancestral component, and may even be further from them. At higher K’s, once the “Sandawe” element partitions out, it is extremely dominant among the Sandawe, and found in lower fractions among other East African groups, especially non-Bantu ones such as the Masai. I wouldn’t put too much stock in the high proportion in the Ethiopians above, as the outcomes are rather scattered across the K’s and population combinations. The Masai are a population who always seem to have a low fraction of the Eurasian-like “Arabian” component, and this is what drags the population toward the Eurasians in the PCA above. The Sandawe seem to lack this admixture; rather, their affinity with Eurasians is deeper and may not be due to admixture at all (ADMIXTURE itself is not perfect, and may transform an admixed group into a “pure” component, as we can sometimes see among the Fulani or among South Asians, and, I suspect, the Mozabites).

Back to the Sandawe and their position in the history of East Africa. Unlike the Pygmies and Khoisan they are not basal in relation to other human lineages from what I can see here. That is, they don’t “split off” as early from the main cluster of branches in a phylogenetic tree of human populations. In fact, unlike the Pygmies and Khoisan, and like the Masai, they are closer to Eurasians than the West African or Bantu peoples. In other words, they’re less basal. In fact, the Sandawe may be closer to Eurasians than most of the Nilotic groups when recent admixture with Eurasians is removed from the picture.

I do not know if the Sandawe are indigenous to their region of Tanzania. If I had to bet money I’d say not, and that some scholarly suppositions for a northerly origin may be plausible based on the affinities with the Masai and even Cushitic and Semitic peoples of Ethiopia and Somalia. The distinctiveness of the Sandawe from their Bantu neighbors seems clear, and there is no special closeness to the Khoisan of Southern Africa. Many anthropologists and historians have pointed out that some groups can “revert” to hunting and gathering facultatively. But the total Bantu domination of much of East Africa suggests to me that this was not the case with the Sandawe. I think a plausible model is that the Sandawe were part of the substrate of East African hunter-gatherers who have mostly been eliminated and absorbed by the Bantu. In the north related peoples contributed to the emergent Nilo-Saharan and Ethiopian and Cushitic societies, which were able to avoid being swamped by the Bantu because of ecology and their own agricultural traditions. In this model the Sandawe affinities to Khoisan groups were more a matter of horizontal cultural borrowing and influence due to proximity than of a close genetic relationship.


Khoikhoi on the move….

Dienekes mentioned today a new paper, Signatures of the pre-agricultural peopling processes in sub-Saharan Africa as revealed by the phylogeography of early Y chromosome lineages. Because of the recent comments in this space on the genetic history of Africa I was curious, but after reading it I have to say I can’t make much sense of the alphabet soup of haplogroups. Remember, there are different ways to capture and analyze the variation in one’s genes. A common activity is to sweep over the whole genome and focus on single nucleotide polymorphisms, variation at the base pair level. So my own analyses using ADMIXTURE focus on tens or hundreds of thousands of such markers. But there are other types of genomic variation, such as copy number, microsatellites, and minisatellites.

Additionally, much of the older human phylogeographic literature focused on mtDNA and Y chromosomal variance. For mtDNA it was partly a function of how easy it was to extract the genetic material (it’s copious on the cellular level). But perhaps more importantly these two types of variance aren’t subject to recombination. This means they are defined by clean phylogenetic trees which do not exhibit reticulation (recombination chops apart correlated markers and mixes & matches them) and presumably are not subject to natural selection, and so perfect for coalescent theory. So you can posit lineages related to each other by steps of sets of mutations, and also easily calculate the time until the last common ancestor for two different branches of the tree using a “molecular clock” model.
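As a rough sketch of the “molecular clock” calculation, assuming a constant mutation rate and no recombination: if two lineages differ at d sites out of L compared, and the rate is μ mutations per site per year, then the time to their last common ancestor is about d / (2 μ L), because mutations accumulate along both branches. The function name and every number below are placeholders of my own, not values from the paper.

```python
def tmrca_years(n_differences: int, n_sites: int, mu_per_site_per_year: float) -> float:
    """Time to the most recent common ancestor under a strict molecular clock.

    Assumes a constant per-site, per-year mutation rate and that each
    difference arose once, on one of the two branches since the split.
    """
    if n_sites <= 0 or mu_per_site_per_year <= 0:
        raise ValueError("n_sites and mutation rate must be positive")
    # Divergence accumulates on both lineages, hence the factor of 2.
    return n_differences / (2.0 * mu_per_site_per_year * n_sites)


# Hypothetical inputs: 100 differences over 1,000,000 sites, rate 1e-9.
print(tmrca_years(100, 1_000_000, 1e-9))
```

With these made-up inputs the estimate comes out at about 50,000 years; the result scales directly with whatever mutation rate you assume, which is why clock calibrations matter so much.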

Here’s the abstract:

The study of Y chromosome variation has helped reconstruct demographic events associated with the spread of languages, agriculture and pastoralism in sub-Saharan Africa, but little attention has been given to the early history of the continent. In order to overcome this lack of knowledge, we carried out a phylogeographic analysis of haplogroups A and B in a broad dataset of sub-Saharan populations. These two lineages are particularly suitable for this objective because they are the two most deeply rooted branches of the Y chromosome genealogy. Their distribution is almost exclusively restricted to sub-Saharan Africa where their frequency peaks at 65% in groups of foragers. The combined high resolution SNP analysis with STR variation of their sub-clades reveals strong geographic and population structure for both haplogroups. This has allowed us to identify specific lineages related to regional pre-agricultural dynamics in different areas of sub-Saharan Africa. In addition, we observed signatures of relatively recent contact, both among Pygmies, and between them and Khoisan speaker groups from southern Africa, thus contributing to the understanding of the complex evolutionary relationships among African hunter-gatherers. Finally, by revising the phylogeography of the very early human Y chromosome lineages, we have obtained support for the role of southern Africa as a sink, rather than a source, of the first migrations of modern humans from eastern and central parts of the continent. These results open new perspectives on the early history of Homo sapiens in Africa, with particular attention to areas of the continent where human fossil remains and archaeological data are scant.

The authors posit that the connections between southern African Bushmen and the Pygmies of central Africa which they find in the Y chromosomal lineages might have been mediated by the peregrinations of Khoikhoi pastoralists, who possibly diffused from a central-southern African ur-heimat in advance of the Bantu expansion. This seems plausible to me.

The main issue which I’m curious about in regards to all these results is the connection between Pygmies and Bushmen set against the Bantus. I certainly had not expected it, and it has been repeated several times. There is now a lot of weird evidence that demands a hypothesis.

Image credit: Wikipedia


The Pith: I review a recent paper which argues for a southern African origin of modern humanity. I argue that the statistical inference shouldn’t be trusted as the final word. This paper reinforces previously known facts, but does not add much that is both novel and robust.

I have now read the paper which I expressed a touch of skepticism toward yesterday. Do note, I did not dispute the validity of their results. They seem eminently plausible. I was simply skeptical that we could, with any level of robustness, claim that anatomically modern humans arose in southern vs. eastern, or western, Africa. If I had to bet, my rank order would be southern ~ eastern > western. But my confidence in my assessment is very low.

First things first. You should read the whole paper, since someone paid for it to be open access. Second, much props to whoever decided to put their original SNP data online. I’ve already pulled it down, and sent off emails to Zack, David, and Dienekes. There are some northern African populations which allow us to expand beyond the Mozabites, though unfortunately there are only 55,000 SNPs in that case (I haven’t merged the data, so I don’t know how much will remain after combining with the HapMap or HGDP data sets).

The abstract:

Africa is inferred to be the continent of origin for all modern human populations, but the details of human prehistory and evolution in Africa remain largely obscure owing to the complex histories of hundreds of distinct populations. We present data for more than 580,000 SNPs for several hunter-gatherer populations: the Hadza and Sandawe of Tanzania, and the ≠Khomani Bushmen of South Africa, including speakers of the nearly extinct N|u language. We find that African hunter-gatherer populations today remain highly differentiated, encompassing major components of variation that are not found in other African populations. Hunter-gatherer populations also tend to have the lowest levels of genome-wide linkage disequilibrium among 27 African populations. We analyzed geographic patterns of linkage disequilibrium and population differentiation, as measured by FST, in Africa. The observed patterns are consistent with an origin of modern humans in southern Africa rather than eastern Africa, as is generally assumed. Additionally, genetic variation in African hunter-gatherer populations has been significantly affected by interaction with farmers and herders over the past 5,000 y, through both severe population bottlenecks and sex-biased migration. However, African hunter-gatherer populations continue to maintain the highest levels of genetic diversity in the world.

Why would hunter-gatherers have so much diversity? The historical and ethnographic data here are clear: it is not that hunter-gatherers are particularly diverse, but that descendants of farming populations tend to be less diverse, and most of the world’s population are descendants of farmers. To give a classic example, ~30,000 Puritans and fellow travelers who arrived in New England in the 1630s gave rise to ~700,000 New Englanders in 1790. This is a roughly 23-fold increase in about 160 years, or growth by a factor of a bit under 2 per generation if a generation is 25 to 30 years. And this does not include the substantial back migration to England during the 1650s, or the fact that there was already spillover of New Englanders to other regions of the American colonies in the 17th and 18th centuries (e.g., eastern Long Island was dominated by New Englanders). 30,000 is not small enough to constitute a bottleneck genetically, but one can imagine much smaller founding populations rapidly compounding as agriculturalists push their way through ecologically constraining bottlenecks.

For Africa we have a good candidate for this phenomenon: the Bantu expansion. This rise of African farmers began around the region of eastern Nigeria and Cameroon ~ 3,000 years ago. It swept east, toward the lakes of eastern Africa, and down along the Atlantic coast toward modern day Angola. Between 1,000 and 2,000 years ago in its broad outlines the expansion had crested, reaching its limit in southern Africa, where the climatic regime was not favorable for their tropical agricultural toolkit (e.g., the Cape region has a Mediterranean climate). Here you still have the hunter-gatherer Bushmen, and other Khoisan groups such as the Nama, who practiced animal husbandry. By and large this expansion seems to have resulted in a great deal of biological replacement of previous peoples. South African Bantu speakers, such as Desmond Tutu, share more with Nigerians genetically than they do with the nearby Bushmen, though there has been some admixture on the frontier among Xhosa.

As I have stated, most of this paper elicits little objection from me. The major issue that I do take objection to is the inference that these results indicate the likelihood of southern, not eastern, Africa, being the origin of anatomically modern humanity. The authors do point out that many of the hallmarks of modern humanity have their earliest dates in southern, not eastern, Africa. That does add to the plausibility of their overall case, and I would be curious as to the opinion of someone more versed in the material culture and fossil remains to weigh in. But that’s where we started, not where we are, assuming that their specific contribution to the model does push it forward. So I’ll focus on the genetic data. Here’s the point which seems tendentious to me:

…Regressions of LD on distance from southwestern Africa were highly statistically significant (at 5-Kb windows, P ≈ 4.9 × 10⁻⁶) (Fig. 2C). Best-fit (Materials and Methods) locations based on LD are consistent with a common origin in southern Africa. A point of origin in southwestern Africa was approximately 300–1,000 times more likely than in eastern Africa….

If you’ve calculated regressions, you know that this can be quite the art. They are sensitive to various assumptions, as well as the data you throw into them. They’re dumb algorithms, so they’ll give you a result, even if it doesn’t always make sense. To really understand why I remain moderately skeptical of the inference in this paper, you need to look at figure 2B. I’ve reedited a bit for style. Also, some of the groups were so obscure that even I didn’t know them, so I just put in their nation.

On the y axis is linkage disequilibrium. Basically, population bottlenecks and admixture events, along with localized selective sweeps, can elevate this statistic. The LD statistic for non-African populations is invariably higher than for African ones, and the further away, the higher the value. On the x axis is the distance from the inferred point of origin of the human expansion in southwestern Africa. The Hadza seem to have gone through a recent bottleneck (or are going through it now) according to other measures in the paper, so it is no surprise that they deviate above the trend line. The other hunter-gatherer groups, the Bushmen and Pygmies (Namibian and South African Bushmen, the Biaka from western Congo and the Mbuti from the east of that nation), have low LD values, consistent with relatively stable and deep time histories for the populations when viewed as a coherent whole (all humans have equally ancient lineages, but coherent populations can be older, or younger, depending on how you view them). My main issue is this: once you remove the non-Sub-Saharan African populations the trend line is far less stark. The Fang, who are a Bantu group near the point of origin of that language family, have nearly the same LD as some of the hunter-gatherer groups. The Mandenka, in far western Africa, have elevated LD vis-a-vis hunter-gatherers, but not nearly so much as the groups with more “northern” admixture (e.g., the Fulani).

The moral of the story here is to not just rely on the final numbers generated by statistical methods, which can be of quite a large magnitude, but to look at the figures and try to make sense of them. Overall, I would say that this paper presents many interesting results, but the most robust look to be confirming what we knew previously, rather than increasing the probability of a novel locus for the point of origin of modern humans (though the southern origin already gains some support from archaeology).

Citation: Brenna M. Henn, Christopher R. Gignoux, Matthew Jobin, Julie M. Granka, J. M. Macpherson, Jeffrey M. Kidd, Laura Rodríguez-Botigué, Sohini Ramachandran, Lawrence Hon, Abra Brisbin, Alice A. Lin, Peter A. Underhill, David Comas, Kenneth K. Kidd, Paul J. Norman, Peter Parham, Carlos D. Bustamante, Joanna L. Mountain, & Marcus W. Feldman (2011). Hunter-gatherer genomic diversity suggests a southern African origin for modern humans PNAS : 10.1073/pnas.1017511108

Image credit: Mark Dingemanse.


In the open thread someone asked: “Any recent stuff on the genetics of Ethiopians.” That prompted me to look around, because I’m curious too. Poking around Wikipedia I couldn’t find anything recent. A lot of the studies are older uniparental lineage based works (NRY and mtDNA). Ethiopia is interesting because unlike almost all other Sub-Saharan African nations it has a long written history. Culturally and linguistically it has both Sub-Saharan African, and non-Sub-Saharan African, affinities. The languages of highland Ethiopia are clearly Semitic. Those of lowland Ethiopia are Cushitic, a branch of the broader Afro-Asiatic language family concentrated around the Horn of Africa (Somali is a Cushitic language, though most Ethiopian nationals who speak a Cushitic dialect are of the Oromo group).

From a human evolutionary genetic perspective, Ethiopia also has specific interest. It is likely that the main recent pulse of humans Out of Africa traversed this region. Additionally, there is some evidence of deep time connections between the groups ancestral to Ethiopians and the Khoisan of southern Africa. It may be that Ethiopians and Khoisan are reservoirs of ancient genetic variation in Sub-Saharan Africa which has been overlain by Bantu in most other regions outside of West Africa. Finally, Ethiopians are known to have high altitude adaptations. This could be due to long term residence in the region, or to assimilation of favorable alleles from the long term residents by later populations.

Fortunately we can get a sense of the genetic affinities of Ethiopians thanks to a paper published last spring, The genome-wide structure of the Jewish people. The focus was clearly on Jews, but they surveyed Amhara & Tigray (Semitic speaking highlanders), Ethiopian Jews (similar ethnically to the Amhara & Tigray, but religiously non-Christian), and Oromo. In the PCA the Oromo and Semitic speaking populations are pretty obviously distinct clusters.

This just means that when you take worldwide genetic variation, and pull out the biggest independent dimensions, and then visualize individuals on the two largest dimensions in terms of how they explain variance, the Oromo and other Ethiopians don’t really intersect. Interestingly the Amhara and Tigray are almost indistinguishable, but the Ethiopian Jews are in their own cluster. There are, for the record, 7 Oromo, 7 Amhara, 5 Tigray, and 13 Ethiopian Jews in the sample.

Now let’s look at the genetic variation in ADMIXTURE. Remember this assigns the genomes of individuals in proportions to K ancestral units. As an example, if you had African Americans, Yoruba, and White Americans in a total pool, and did K = 2, you might have a tendency where Yoruba and White Americans are in two totally different ancestral populations of K, while African Americans are 80% in one ancestry and 20% in another. The interpretation of this is straightforward, but when it comes to populations whose backgrounds we don’t know as well, one should be careful. The selection of a particular value for K is going to be really important, and we shouldn’t confuse the method with the reality which the method is trying to plumb.
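To illustrate the idea behind the African American example, here is a toy sketch in Python. It is not the ADMIXTURE program: it assumes K = 2, assumes the allele frequencies of the two ancestral populations are already known, and estimates a single individual’s ancestry fraction by a grid search over the likelihood. All frequencies and genotypes are simulated and hypothetical, and the two populations are far more differentiated than real ones.

```python
import numpy as np

rng = np.random.default_rng(1)

n_snps = 5000
# Hypothetical allele frequencies in two ancestral populations (K = 2).
f1 = rng.uniform(0.05, 0.95, n_snps)
f2 = rng.uniform(0.05, 0.95, n_snps)


def simulate_genotypes(q: float) -> np.ndarray:
    """Genotypes (0/1/2) for an individual with fraction q of ancestry from population 1."""
    p = q * f1 + (1.0 - q) * f2  # per-SNP allele frequency for this individual
    return rng.binomial(2, p)


def estimate_q(geno: np.ndarray, grid_size: int = 101) -> float:
    """Grid-search maximum-likelihood estimate of q, given known f1 and f2."""
    best_q, best_ll = 0.0, -np.inf
    for q in np.linspace(0.0, 1.0, grid_size):
        p = q * f1 + (1.0 - q) * f2
        # Binomial log-likelihood of the genotypes (constant terms dropped).
        ll = np.sum(geno * np.log(p) + (2 - geno) * np.log(1.0 - p))
        if ll > best_ll:
            best_q, best_ll = q, ll
    return float(best_q)


for true_q in (1.0, 0.8, 0.0):
    print("simulated q:", true_q, "estimated q:", estimate_q(simulate_genotypes(true_q)))
```

The real method has the harder job of inferring the ancestral frequencies and every individual’s proportions at the same time, for a K that you choose, which is exactly why the choice of K and of input populations shapes the output.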

First, K = 8 from Behar et al. I’ve reedited to highlight populations which might inform the variation of Ethiopians.

Now let’s look at a series of K’s. Note the changes.

Luckily for us, we don’t need to stop here. Dienekes included Behar’s Ethiopians (non-Jews) for Dodecad. Additionally, he included the Masai population from the HapMap. This turns out to be important because he found that Ethiopian Sub-Saharan ancestry is similar to that of the Masai, not the other African groups.

Dienekes also provided individual outputs. I’ve stitched together Ethiopians with Egyptians and Saudis. The color coding is the same as above.

You should be able to tell where the three groups start and stop pretty easily. I’m 99% sure that the six individuals with more East African and less Southwest Asian ancestry are all Oromo. Ethiopians, in particular highland Ethiopians, seem to me likely an ancient stabilized hybrid population between a population from Arabia and a local Sub-Saharan population. This population seems unlikely to have been related to the peoples of West-Central Africa, who are associated with the Bantus across eastern and southern Africa. The Bantu agricultural toolkit runs into ecological constraints in various regions, and it is in those regions that non-Bantu populations have persisted. Ethiopia, with its unique climate and topography, naturally remains non-Bantu (as does the Horn of Africa as a whole). The possible connections between Khoisan and Ethiopia may be a function of the fact that these areas harbor genetic variants which have disappeared in the intervening regions because of the Bantu expansion. I have a hard time accepting that the Bantu expansion was particularly eliminationist, but I am starting to suspect that outside of Ethiopia population densities were very, very, low.

The antiquity of this ancient hybridization event to me is attested by the fact that Ethiopians lack any of the other Middle Eastern components besides the one modal in Saudi Arabia. There is a great deal of intra-population variance in the Saudi data set. Why? Part of this must be the slave trade, as well as pilgrims who remained in places like Mecca. But, I think part of the untold story here is that there may have been a larger genetic impact on Arabia after the rise of Islam from the Levant than vice versa! Probably the gene flow precedes Islam, as Arabia was hooked into worldwide trade and population movements, which Ethiopia was relatively insulated from. The Saudi data set has several people who are “pure” Southwest Asian, but also several who have a great deal of West Asian + South European. These seem likely to be people who have some background in the Fertile Crescent.


Last year a paper came out in Science which made a rather large splash, The Genetic Structure and History of Africans and African Americans by Tishkoff et al. Since it’s more than a year old I recommend that those of you who are curious about the details of the paper and don’t have academic access go through the free registration, as you can then read it in full. Unlike Reich et al. the Science paper didn’t unveil a new method of analysis. It was the standard bread & butter, with PCA’s & STRUCTURE plots & phylogenetic trees. But the coverage of populations within Africa was massive. They had a lot of results and relationships to cover, and ended up with a 100 page supplement.

I commend the whole paper to you. But there are two elements I want to highlight. First, a three dimensional PCA plot. It has the first, second and third principal components of variation. In other words, the three largest independent dimensions in terms of explanatory power of genetic variation. Panel A includes all world populations, and panel B just Africans.
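For readers who want to see where “percent of variance” figures for principal components come from, here is a minimal sketch in Python. The genotype matrix is randomly generated toy data for two invented groups (not data from the paper), and the variable names are my own; the point is only that each component’s share is its squared singular value divided by the total.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 40 individuals x 200 SNPs, genotypes coded 0/1/2.
# Two groups with shifted allele frequencies (invented for illustration).
n_per_group, n_snps = 20, 200
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.15, n_snps), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_snps)),
    rng.binomial(2, freq_b, size=(n_per_group, n_snps)),
]).astype(float)

# Center each SNP, then take the SVD. Squared singular values are
# proportional to the variance captured by each principal component.
centered = geno - geno.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s**2 / np.sum(s**2)
pcs = u * s  # coordinates of each individual on each component

print("share of variance, PC1 to PC3:", np.round(explained[:3], 3))
```

A PCA plot is then just a scatter of the first two or three columns of `pcs`, one point per individual.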


For panel A, PC1 = 20% of the variance, PC2 = 5%, and PC3 = 3.5%. For panel B the PCs didn’t drop off quite so much: PC1 = 11%, PC2 = 6%, PC3 = 5% and PC4 = 4%. In case you don’t know, the Hadza are Africa’s last obligate hunter-gatherers, and speak a language with clicks in it, just as the Bushmen do. The big division highlighted in this paper is that between the “indigenous” relict populations, the Hadza, Sandawe, Bushmen and Pygmies, and those who belong to the more widespread agriculturalist and pastoralist societies of Africa. Implicit within the paper is the model of a Bantu Expansion of farmers, as well as a possible later Nilotic expansion of herders (which brought the Tutsi and Masai), in a north-south direction. In the process they assimilated and/or displaced the indigenous populations, of whom the aforementioned peoples are relict islands persisting in ecologically isolated or unfavorable domains.

The map to the left shows the population coverage within this paper of African groups. The pie graphs simply show ancestral quanta as inferred by STRUCTURE. You can read the paper for the blow-by-blow. But ultimately it seems there will be need for a finer-grained coverage to the south of the equator. If the Bantu expansion is as recent as archaeologists and linguists assume, on the order of ~2,000 years ago, then the gradients of genetic signals should persist. From what I can tell it is assumed on both genetic and phenotypic grounds that the Xhosa have a higher load of Khoisan ancestry than the Zulu or Tswana. The Bantu Expansion is recent enough that the semi-legendary Phoenician circumnavigation of Africa would have encountered many Khoisan peoples along the eastern coast.

Below is a selection of figures from the above paper. After selecting an image it is probably best to hit F11 for “Full Screen” if you aren’t on a very big monitor (you can copy image location and view it in a separate window as well).



Since we’ve been talking about Fst a fair amount, I thought it might be nice to put it in some concrete graphical perspective. First, to review: Fst in the genetic context measures the proportion of genetic variation which can be attributed to between population differences. To give a “toy” example, if you randomly divided the population of a large Swedish village into two groups and calculated their Fst, it should be ~0, because if you randomly select from an unstructured population by definition there shouldn’t be noticeable between population differences. In contrast, if you compare a Swedish village to a Japanese village, a large fraction of the genetic variation is going to be distinct to each population. Around ~10% of the genetic variation in fact will be between the two groups. Many of the genes will be extremely informative, so that if you know the allelic state from a given individual you can predict with a high degree of certitude which population that individual was from (e.g., SLC24A5 and EDAR). A small set of ancestrally informative alleles would produce a sequence of conditional probabilities of extremely high certitude (on the order of 10 genes for these two populations should suffice, perhaps three for “government work”).
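To make the definition concrete, here is a minimal Python sketch of Fst for a single biallelic marker in two equally weighted populations, using the textbook heterozygosity form Fst = (Ht − Hs) / Ht. The function name and the allele frequencies are illustrative assumptions of mine, not values from either paper, and real studies average over many markers and correct for sample size.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Fst for one biallelic marker, two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    Hs is the mean within-population expected heterozygosity, and Ht is
    the expected heterozygosity of the pooled population.
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # marker is monomorphic overall; nothing to partition
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_t - h_s) / h_t


print(fst_two_pops(0.5, 0.5))  # identical frequencies
print(fst_two_pops(0.0, 1.0))  # fixed difference
print(fst_two_pops(0.4, 0.6))  # modest difference
```

Identical frequencies give 0 (the randomly split Swedish village), a fixed difference gives 1, and frequencies of 0.4 vs. 0.6 give about 0.04. A genome-wide value of ~0.1 is an average over many markers, most of which differ only modestly between the two populations.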

But to put this in perspective, and show how genetic variation differs from locale to locale, I thought I would compare continental-scale Fst values with those in a small region, southern Africa. The Fst values for the first I obtained from Investigation of the fine structure of European populations with applications to disease association studies, and the second from Complete Khoisan and Bantu genomes from southern Africa. The Bantu in this case is Desmond Tutu, who is from the Xhosa tribe, and has substantial admixture from the non-Bantu populations which were resident in South Africa prior to the arrival of the Bantus.

First, in tabular format:

          Spain     Sweden    Russia    Japan
France    0.0008    0.0023    0.0037    0.1116
Spain     -         0.0047    0.0059    0.1118
Sweden    -         -         0.0025    0.1095
Russia    -         -         -         0.1057

          KB1       NB1       TK1       MD8       Desmond Tutu
KB1       -         0.021     0.024     0.022     0.08
NB1       -         -         -0.007    0.006     0.091
TK1       -         -         -         0.016     0.088
MD8       -         -         -         -         0.061

Second, two adjacent bar graphs. In the foreground I’ve simply taken the Spain vs. other Eurasian population comparisons, while in the background Desmond Tutu is the reference for the four Bushmen.


In some ways this comparison is an exaggeration of the variation in African genes. The Bushmen and Bantu populations are of very distinct origins, as the latter spread over eastern and southern Africa only in the last 2,000 years. The Bushmen-Bantu cultural gap is one of sharp discontinuity, and despite gene flow it is still to some extent a genetic one as well. But there are other factors dampening Fst in this case. First, Tutu is himself of partial Khoisan ancestry (of whom there are other groups besides the Bushmen), so his genetic distance is likely to be smaller than that of someone from the Zulu tribe, which has presumably had less admixture with the indigenous populations, being a bit farther from the edge of the demographic “wave of advance.” Second, the gene chips are geared toward Eurasian populations, and presumably missed African, and particularly Bushmen, specific variants because they didn’t go looking.

My own confusion on these issues the past week illustrates, I suppose, the difficulty in mapping these abstruse and yet materially concrete patterns onto human categories. But quite often wrestling with the difficulties is the surest path to illumination.

• Category: Science • Tags: African Genetics, Bushmen, Fst, Genetics, Genomics 
Razib Khan