The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog / Population Structure


“Tree thinking”, just like “population thinking”, is essential to understanding evolutionary biology. But there are problems with this. First, even on a macroevolutionary scale there is massive violation of the separation between the branches of the tree of life due to lateral gene flow, whether direct or mediated via viruses. As you drill down to a finer phylogenetic grain the issue of reticulation, the transformation of a tree into a graph, becomes something you have to integrate into your model. This is one reason that TreeMix was developed: it allows one to model gene flow across the branches of trees on a microevolutionary scale. But there’s another major issue, and that is clines. The “graph” generated by TreeMix still models pulse admixture events. But a lot of genetic variation is generated in the context of isolation by distance dynamics. There are no singular events where populations mix; rather, every instance is part of a continuous process of mixing.* An excellent illustration of this scenario is a ring species.

When generating a “tree of life” it seems that it behooves us to take these details into account, because to a great extent the details are really what we’re interested in now (OK, at least if we’re population geneticists). Each species or section of the tree of life may exhibit different local dynamics, just like the topography of our planet exhibits local variation. Some regions may be subject to far greater gene flow across the branches rather far from the tips (e.g., plants), while others exhibit rather less (e.g., some mammalian lineages). As far as humans go, about 10 years ago a hybrid tree-cline model seemed viable, as outlined in Ramachandran et al., a serial founder event out of Africa, and then equilibration with isolation by distance, leading to a clinal overlay upon the branching process. After Pickrell et al. I think that this model just can’t suffice. Rather, there were powerful pulse admixture events due to meta-population dynamics and large scale demographic implications of cultural change over the last 10,000 years, which wreaked havoc with patterns of human genetic variation. As on the larger tree of life the dynamics which characterized particular segments of the human phylogenetic tree/graph vary. In the New World Ramachandran et al.’s original formulation might actually be rather good south of the Rio Grande. In contrast it just doesn’t work very well in South and Southeast Asia, which has been subject to a great deal of genetic change over the past ~10,000 years, well after the Out of Africa event (in these two cases there look to be pulse fusions between very distinct branches of the human phylogenetic tree in recent history).

These thoughts were stimulated in part by comments over at 3 Quarks Daily in response to an Omar Ali post where he mirrored content from this weblog. This elicited two broad reactions. First, some readers objected to Omar posting content from me because I’m a conservative who holds views taboo to the liberal mainstream of that website. Because obviously that’s what being liberal is, knowing what to believe, and not giving any voice to beliefs and viewpoints outside of what you label to be right, true, and orthodox.** But a more interesting objection comes from those who think that my adherence to the race concept is not supported by science. Here I feel like I’m talking to Creationists: they know what they believe, but they don’t know much about what they don’t believe. Most Creationists don’t really know much about evolutionary biology, they just know their talking points. Similarly, many who object to my acceptance of the validity of the race concept just trot out talking points. Intriguingly there is a similarity between the objections of anti-evolution Creationists and anti-population structure Creationists: a fixation on Platonic categories/kinds. A major confusion that anti-evolution Creationists exhibit is the belief that species, “kinds”, are clear and distinct categories. They’re not. So many of the critiques fail at the get-go, because when you reject a Platonic idea of species the rejection of anti-evolution Creationism is almost axiomatic. The same aspect of disagreement emerges when arguing with anti-population structure Creationists. First, they don’t know the literature, so their objections are often weird and ad hoc. That’s fine. But the bigger problem is that I don’t hold the Platonic model of population structure they seem to think I must hold. Rather, our terms or categories are only useful in an instrumental sense. Their validity or lack thereof reflects how well they model the real processes which shape the genealogies which collectively define particular populations. I can try to get this viewpoint across, but since I hold views which disagree with their views (or so they think) and so naturally am wrong, I generally don’t make much headway.

* This is where Gideon Bradburd’s SpaceMix should help. The preprint will be out soon….

** This is a joke. I’m aware there are many liberal readers of this website. But, you must admit that it’s pretty funny how narrow-minded people who label themselves “liberal” can be! E.g., the fact that I’m a self-described conservative is reason enough not to give voice to my ideas.

 
• Category: Race/Ethnicity, Science • Tags: Population Structure, Race 

One of the most interesting results in the preprint on ancient European genetics (or more accurately, the ethnogenesis of Europeans in a genetic sense) is the fact that the ~8,000 year old hunter-gatherer sample from Luxembourg had a GG genotype on the SLC24A5 locus. Actually, interesting isn’t the right word; shock, and frankly a little skepticism, is more precise. The reason for my reflexive incredulity is that the GG genotype is very much the minor variant in Western Eurasia, and extremely rare among unadmixed Europeans. Europeans have such a high fraction of the A allele that some population genetic statistics to test for selection at a locus are not viable, because there’s not enough variation segregating in that region. The A allele is also present outside of Europe; it is the major variant in South Asians, albeit at a lower fraction, verging on ~50% or less in some South Indian groups. Yet looking at the genomic features* of the region in which it is embedded, it is not entirely implausible that this allele only swept to fixation in Europe over the past 8,000 years.

I want to make more concrete why this result is a pretty big deal. If you look at the 1000 Genomes data you have results for British, Finnish, Tuscan, and Spanish individuals, as well as a well characterized sample of white Utahans of Northwest European heritage. There is also a less well characterized pooled data set of “European Americans.” Here are the genotype counts by population:

Population           AA   AG   GG
Utah white           85    0    0
British              89    0    0
Finnish              91    2    0
Tuscan               97    1    0
Spanish              14    0    0
European American  4256   40    1

Yesterday on Twitter I suggested that I’d want at least 10,000 individuals of unadmixed Northern European ancestry before I might take a bet that I’d find someone with a GG genotype. I don’t think I was exaggerating. The sample size might be one, but the fact that the individual was homozygous for GG implies to me that the G allele was present at a far higher fraction in Northern Europe 8,000 years ago than today. In contrast the LBK farmer individual was AA on SLC24A5. Why this matters functionally is that no matter how you look at it, when comparing Europeans and dark skinned populations (e.g., Africans, South Indians, and Australasians) this locus is the one that explains the highest proportion of the variation on pigmentation of any gene. Comparing simply people of African ancestry and Europeans the variation at this gene accounts for on the order of ~1/3 of the difference.** I myself have the “European” AA genotype, with most of my other large effect loci being of the “dark” correlated alleles. The pigmentation difference between a Sub-Saharan African and myself is probably accounted for just by this locus alone. But a twist on this story is that the hunter-gatherer also exhibited the genotype associated with blue eyes in Europeans. In contrast, the farmer genotype was the one not correlated with blue eyes. On another locus which is not quite fixed for a derived light encoding variant, but very close in Europeans (and found in much lower proportions in other West Eurasians), SLC45A2, it looks as if both the hunter-gatherer and the farmer carry the modal European form.
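
To put a rough number on the 10,000-person bet, here is a minimal sketch in Python. It pools every row of the genotype table above and assumes Hardy-Weinberg proportions (random mating, no substructure), which is an assumption of mine and not something the 1000 Genomes data guarantee. The function names are made up for illustration.

```python
# Genotype counts (AA, AG, GG) copied from the table above.
counts = {
    "Utah white": (85, 0, 0),
    "British": (89, 0, 0),
    "Finnish": (91, 2, 0),
    "Tuscan": (97, 1, 0),
    "Spanish": (14, 0, 0),
    "European American": (4256, 40, 1),
}

def g_allele_frequency(genotype_counts):
    """Pooled frequency of the G allele across all listed samples."""
    aa = sum(c[0] for c in genotype_counts.values())
    ag = sum(c[1] for c in genotype_counts.values())
    gg = sum(c[2] for c in genotype_counts.values())
    # Each AG carrier has one G allele, each GG carrier has two.
    return (ag + 2 * gg) / (2 * (aa + ag + gg))

def prob_at_least_one_gg(q, n):
    """P(at least one GG among n people), assuming Hardy-Weinberg proportions."""
    return 1 - (1 - q ** 2) ** n

q = g_allele_frequency(counts)
print(f"pooled G frequency:    {q:.4f}")
print(f"expected GG frequency: {q ** 2:.2e}")
print(f"P(>=1 GG in 10,000):   {prob_at_least_one_gg(q, 10_000):.2f}")
```

Under these assumptions the chance of seeing even one GG individual among 10,000 comes out at roughly one in five. The pooled sample includes the less well characterized European Americans, so the figure for strictly unadmixed Northern Europeans would presumably be lower still.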

Rather than squeezing too much more out of a few samples, I want to posit that these results increase the plausibility that the suite of genetic variants across many loci which are often diagnostic of the complexion of Northern Europeans are a function of a combination of admixture and then selection within the resultant Northwest European lineages. It seems plausible that independent selection events were occurring across these groups, and with admixture more novel variants were present in the combined population which allowed for a skew even further along the phenotypic continuum, toward the physiological limit (at least for non-albinos). Though it looks like the majority of the ancestry of Northern Europeans, especially populations around the coastal East Baltic region, derives from hunter-gatherer groups indigenous to the continent (i.e., pre-Holocene), if they were not fixed for the derived variant on SLC24A5 it seems implausible that these ur-Europeans were defined by the rosy complexions which are archetypical for Northern Europeans. This is part of the broader picture whereby the phenotypically salient population clusters we see around us today, as if they are Platonic ideals of underlying racial forms, may themselves be phenomena distinctive to the Holocene.

* A large correlated block of markers which seem to have risen in frequency recently and rapidly within the population.

** Northeast Asians have their own distinct mutations which confer light skin.

 
• Category: Science • Tags: Pigmentation, Population Structure, Race 

Some have asked what the point is in poking around African population structure when Tishkoff et al. and Henn et al. have done such a good job in terms of coverage. First, it is nice to run your own analyses so you can slice & dice to your preference, and not rely on the constrained menu provided by others. There’s value in home cooking; you can flavor to your taste. Second, you never know what data people might leave on your doorstep. I’ve received the genotypes of three Somalis. Nothing too surprising, a touch more Cushitic than the Ethiopians in Behar et al., but interesting nonetheless.

Also, you can see how ADMIXTURE tends to come to weird conclusions in certain circumstances. Below is a K = 12 run with ~50,000 SNPs. I’ve added a few Behar et al. and HGDP populations to the Henn et al. set, and pruned a lot of the African groups which seem redundant in terms of information. I’ve added a few geographically informative labels as well.

Observe below that there is a Fulani cluster. I think this is pretty much an artifact. At K = 7 the Fulani have a majority component which is modal in West Africa & Bantu speakers, and a minority component which is identical to the one modal in Mozabite Berbers from Algeria. The Mozabites reside in the far northern Sahara, and their modal component drops off as one goes east toward western Asia and the eastern Mediterranean. I suspect that what is showing up in ADMIXTURE is the ancient hybridization of the Fulani, and perhaps their demographic expansion from this core group. We have some glimmers of the prehistory of the Fulani, and no expectation for them to be such a distinctive cluster, so I naturally jump to these inferences. But it does make me reconsider the nature of the “Sandawe,” “Mbuti” or “San” clusters in ADMIXTURE. These populations are culturally distinctive in deep ways from their neighbors, so a reflexive inference one might make is that they’re “pure” ancient substrate groups which have been overlain and marginalized by their Bantu neighbors. But their prehistory is far murkier than that of the Fulani because of their geographical isolation, so there is far less to go on. These “ancient” isolated groups themselves may have gone through the same sort of distinctive recent ethnogenesis processes which we presume occurred with the Fulani (also, in the plot below the Biaka are pure; but in most of the bar plots they have a minor element which they share with their neighbors, probably due to greater admixture and interaction between western Pygmies and their Bantu neighbors than among the eastern ones).

OK, now let’s prune some of the “pure” and extraneous populations. Additionally, I’ll remove some of the K’s. So the proportions are going to be recalculated with a new base. So, keep in mind that the South African Bantus show elevated West African in part because the Khoisan proportion was removed, inflating the percentages for all the other elements.
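
The rebasing effect is easy to see with a small sketch. The proportions below are hypothetical numbers chosen for illustration, not values from the ADMIXTURE run, and `renormalize` is a helper name I made up.

```python
def renormalize(proportions, dropped):
    """Drop the named components and rescale the rest so they sum to 1."""
    kept = {k: v for k, v in proportions.items() if k not in dropped}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("nothing left after dropping components")
    return {k: v / total for k, v in kept.items()}

# Hypothetical individual; illustrative values only.
south_african_bantu = {"W African": 0.45, "Bantu": 0.30, "Khoisan": 0.25}

rescaled = renormalize(south_african_bantu, {"Khoisan"})
for component, share in rescaled.items():
    print(f"{component:10s} {share:.2f}")
```

With the Khoisan quarter removed, the hypothetical West African share moves from 45% of the whole to 60% of what remains, even though nothing about the individual has changed.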

Now let’s look at the pairwise Fst values between inferred populations. Remember, this measures the proportion of genetic variance which can be attributed to between population differences. The bigger the value, the larger the genetic distance. I’ll give the inferred populations labels, but don’t take those too seriously.
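
For intuition about what the statistic measures, here is a textbook-style sketch of Fst at a single biallelic locus for two equally weighted populations, using Fst = (H_T - H_S) / H_T. This is a simplification for illustration and not necessarily the estimator ADMIXTURE uses; the function names and allele frequencies are made up.

```python
def heterozygosity(p):
    """Expected heterozygosity 2pq at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)

def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = heterozygosity(p_bar)                            # pooled population
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2   # mean within-population
    if h_total == 0:
        return 0.0  # locus is monomorphic overall; no variance to partition
    return (h_total - h_within) / h_total

print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.6, 0.4))  # modest difference
print(fst_two_populations(1.0, 0.0))  # fixed for different alleles
```

Identical frequencies give zero, populations fixed for different alleles give one, and a 0.6 versus 0.4 split gives a value of about 0.04, which is the same order as the smaller entries in the table.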


Fst divergences between estimated populations:

          Fulani    San       Euro      Maya      Nilotic   Biaka     W African SW Asian  Sandawe   Mbuti     Mozabite  Bantu
Fulani    0.00      0.19      0.15      0.26      0.11      0.13      0.09      0.14      0.10      0.18      0.12      0.10
San       0.19      0.00      0.27      0.37      0.16      0.11      0.13      0.25      0.13      0.13      0.23      0.13
European  0.15      0.27      0.00      0.18      0.17      0.22      0.19      0.05      0.15      0.26      0.06      0.19
Maya      0.26      0.37      0.18      0.00      0.27      0.31      0.28      0.19      0.25      0.36      0.20      0.28
Nilotic   0.11      0.16      0.17      0.27      0.00      0.10      0.07      0.17      0.08      0.14      0.13      0.07
Biaka     0.13      0.11      0.22      0.31      0.10      0.00      0.07      0.21      0.09      0.09      0.18      0.07
W African 0.09      0.13      0.19      0.28      0.07      0.07      0.00      0.17      0.07      0.12      0.14      0.05
SW Asian  0.14      0.25      0.05      0.19      0.17      0.21      0.17      0.00      0.14      0.25      0.06      0.18
Sandawe   0.10      0.13      0.15      0.25      0.08      0.09      0.07      0.14      0.00      0.13      0.12      0.07
Mbuti     0.18      0.13      0.26      0.36      0.14      0.09      0.12      0.25      0.13      0.00      0.22      0.12
Mozabite  0.12      0.23      0.06      0.20      0.13      0.18      0.14      0.06      0.12      0.22      0.00      0.14
Bantu     0.10      0.13      0.19      0.28      0.07      0.07      0.05      0.18      0.07      0.12      0.14      0.00

Here’s the genetic distance between non-African groups and African ones on a bar plot.

Some consistent trends:

- Mbuti and Khoisan show the largest distance from non-Africans.

- Biaka are next. Again, this may be due to admixture between Biaka and neighboring groups, or, a closer relationship between the Biaka Pygmies and the non-Khoisan/Mbuti African groups with reference to the last common ancestors.

- Roughly equal distance of Bantus and West Africans.

- Marginally smaller distances between the Nilotic cluster and non-Africans.

- Finally, a consistently smaller difference between non-Africans and the Sandawe cluster.
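
These trends can be checked directly against the Fst table above. The sketch below averages each African cluster's Fst to the European, Maya and SW Asian clusters and ranks the results. Treating those three as "the non-Africans", and leaving the Mozabite cluster out of both sides, is my own simplification.

```python
# Fst values copied from the table above:
# African cluster -> (European, Maya, SW Asian).
fst_to_non_africans = {
    "Fulani":    (0.15, 0.26, 0.14),
    "San":       (0.27, 0.37, 0.25),
    "Nilotic":   (0.17, 0.27, 0.17),
    "Biaka":     (0.22, 0.31, 0.21),
    "W African": (0.19, 0.28, 0.17),
    "Sandawe":   (0.15, 0.25, 0.14),
    "Mbuti":     (0.26, 0.36, 0.25),
    "Bantu":     (0.19, 0.28, 0.18),
}

def mean_distance(values):
    """Unweighted mean of the Fst values for one cluster."""
    return sum(values) / len(values)

# Largest average distance from non-Africans first.
ranking = sorted(
    fst_to_non_africans,
    key=lambda name: mean_distance(fst_to_non_africans[name]),
    reverse=True,
)

for name in ranking:
    print(f"{name:10s} {mean_distance(fst_to_non_africans[name]):.3f}")
```

The ordering it produces puts San and Mbuti at the top, Biaka next, Bantu and West African close together, Nilotic slightly lower, and Sandawe at the bottom, just below the Fulani, whose position presumably reflects their admixture.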

As always we need to remember that these probably aren’t pure concrete real ancestral groups. I have no hesitation in presuming some low level consistent gene flow over time between the western Mediterranean groups of which Mozabites are part and some of the Nilotic populations in north-central Africa. This equilibration of gene frequencies would reduce the Fst value naturally. Second, the relative closeness of the Sandawe cluster jumped out at me initially when I looked at the African data. It just strikes me as weird.

Here’s Wikipedia on the Sandawe:

The Sandawe are an agricultural ethnic group based in the Kondoa district of Dodoma Region in central Tanzania. In 2000 the Sandawe population was estimated to number 40,000.

The Sandawe language is a tonal language with clicks, apparently related to the Khoe languages of southern Africa. Recent research suggests that the ancestors of the Khoe were pastoralists, and migrated into southern Africa from the northeast, perhaps from the region of the modern Sandawe.

But the Sandawe don’t seem to be that close to the South African Bushmen samples. Here’s a multidimensional scaling of the Fst relationships of selected inferred ancestral African groups (weight the x-axis more):

An aspect of PCA plots which always jumps out at you is the gap between African groups and non-African ones, often spanned by populations which have likely recent admixture. One hypothesis to explain this is that there’s been little gene flow between Africa and the rest of the world since the Out of Africa event, probably due to ecology (the Sahara). But here’s another explanation: the Bantu expansion has wiped clean much of the genetic variation of central and eastern Africa, the very variation which might span in part the African vs. non-African gap. The archaeology and anthropology indicate that both the groups currently dominant in much of eastern Africa and down to the south, the Bantu and Nilotic peoples, are intrusive on the scale of the past 3,000 years. So groups like the Hadza and the Sandawe are presumed to be relics of the older cultural and genetic variation. This may be why the Sandawe are closer to Eurasians than other African groups once you control for clear likely admixture (e.g., the Fulani). Or, it may be that the Sandawe themselves have an older admixture event due to back-migration from Eurasia….

Finally, let me leave you with a bunch of MDS plots which visualize the Fst differences.


(Republished from Discover/GNXP by permission of author or representative)
 


mtDNA haplogroup G1a2

The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.

Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling. And finally, yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which is mostly outside of South Asia. In case you care, I asked my father her background, and like my patrilineage she was a “Khan,” though an unrelated one (“Khan” is just an honorific). I received these results before the total genome assessment, and so initially assumed this confirmed my hunch that my father had some unknown recent ancestry of “eastern” provenance. But it turns out my hunch is probably wrong. In fact, my parents have about the same “eastern” proportion, with my mother slightly more! My expectation was that perhaps my mother would be around 25-30% “Asian,” and my father above 50%. The reality turns out that my father is 38%, and my mother 40%.


Image credit: f_mafra

Below are the “Ancestry Paintings” generated by 23andMe for my family (so far). What you see are the 22 non-sex chromosomes, which have two copies each, and assignments to “Asian,” “European,” and “African,” ancestry groups. The reference populations to generate these assignments come from the HapMap, the northern European sample of white Americans from Utah, Chinese from Beijing, Japanese from Tokyo, and ethnic Yoruba from Nigeria. What the assignment to one of these classes denotes is that that region of the genome is closest to that category in identity. It does not imply that your recent ancestry is European or Asian (African is probably a different matter, but there are many complaints about the results for African Americans and East Africans in the 23andMe forums). This caveat is especially important for South Asians, because we generally find that we’re ~75% European and ~25% Asian. All that means is that though most of our genetic affinity is with Europeans, a smaller fraction seems to resemble Asians more. Via “gene sharing” on 23andMe I can see that the Asian fraction varies from ~35% in South India and Sri Lanka, to ~10% in Pakistan and Punjab. This is not because South Indians have more East Asian ancestry than Punjabis. Rather, to a great extent the South Asian genome can be decomposed into two ancestral elements, one with a distant, but closer, affinity to populations of eastern Eurasia, and one with a close affinity to populations of western Eurasia. What some have termed “Ancient South Indians” (ASI) and “Ancient North Indians” (ANI). ASI ancestry, which is probably just a touch under 50% in South Asians overall, seems to shake out then as somewhat more Asian than European.* The fraction of ASI increases as one moves south and east in South Asia (and as one moves down the caste status ladder).




First, I want to note that I’ll be using abbreviations for my family members now and then (this applies to future posts). My father will be RF, my mother will be RM, and my siblings will be RS, with a number to denote which sibling. So currently we have RS1. As you can see in a gestalt sense we resemble each other a great deal as a family. We’re about 40% Asian, and 60% European. The extent of fragmentation indicates that we’re not that recent of an admixture; otherwise, the Asian and European fragments would cluster on one strand or the other. Some have suggested that my mother does exhibit less fragmentation. A hypothesis for why this may be is that her maternal grandfather was reputedly from a family of Middle Eastern origin who had resettled in South Asia, first in Delhi, and later in southeast Bengal (specifically, the district of Noakhali). Since he presumably would hardly have had any Asian ancestry according to 23andMe’s algorithm the homologs inherited from him would be overwhelmingly European, with only one generation of recombination intervening.

To assess the plausibility of the various hypotheses that might explain the pattern of results, you need all the non-genomic information. Above is a map of British India. I’ve pointed to the region of Bengal from which my family comes. Of my great-grandparents 7 out of 8 were born in Comilla (which is actually a greater expanse to the southeast of Dhaka than the current Bangladeshi administrative division). 1 great-grandparent was born in Noakhali, which is just to the southeast of Comilla. 4 out of 8 great-grandparents were born within 5 miles of the town of Chandpur (RF’s grandparents). 3 of the remaining 4 great-grandparents were born within 5 miles of the village of Homna (RM’s grandparents). These two locations are about 30 miles from each other as the crow flies, though transport between them would have been by water in an earlier era (Homna is on the Meghna river, which is actually a more substantial body of water than the Ganges by the time the latter reaches Bangladesh). This region is bounded on the west by the Padma river, which narrows at Chandpur to about 2 miles in width (average depth ~1,000 feet). To the east is the Indian state of Tripura. This is a relatively porous border, defined on the map, not imposed by geography. You can see that in some regions the Bangladesh-India border here in the east actually bisects rice paddies.


Tripuri children

Today Tripura state is majority ethnic Bengali due to mass migration of Hindus from what was East Bengal during the 20th century (and later East Pakistan, and now Bangladesh). But its indigenous people are the Tripuri, a tribe whose native language is clearly Tibeto-Burman, and whose physical type points to their connection with populations to the north and east. At the same time, ~90% of the Tripuri are Hindus, and during the period of Islamic rule in South Asia the rajahs of Tripura styled themselves defenders of Hindu civilization (just as the Tibeto-Burman Ahoms of Assam did). As such, linguistically and genetically the native people of Tripura exhibit a sharp contrast to the Indo-Aryan peoples of the Gangetic Plain, of whom the Bengalis are the easternmost representatives along with the Assamese. But they have also long been part of the South Asian cultural scene, and can no longer be viewed as purely intrusive (their oral history indicates that they arrived before the Muslims, for one).

Finally, in regards to the detailed backgrounds of my 8 great-grandparents, 2 were of the Khan class. 1 was from a family of Hindu Thakurs who were recently converted to Islam. Another was of the family name Sarkar. 1 was likely from a family of Middle Eastern transplants to South Asia, at least in part. The 4 remaining great-grandparents were Bengali Muslims, with no particular background information beyond that known by my parents.

I gave you all this because genetic variation is strongly conditioned upon geographical and cultural parameters. Water barriers seem to have been particularly efficacious in the pre-modern period in dividing people culturally and genetically (though ironically water was also a precondition for any bulk trade). Language is another major parameter of difference. And finally, there is religion. In the last section I would not be surprised if 300 years ago the majority of my ancestors in that generation were Hindus; there is some fluidity in this obviously. I provide the data on radius of place of birth because we know from European results that even villages exhibit genetic clustering. This is mitigated in my family because my father has a diverse background among his grandparents as far as community goes, while my mother has a grandparent who was from a different district, and to a great extent a different ethnic group in biological terms.

When I initially saw that I was ~40% Asian I was a little taken aback by the high proportion (remember, the average South Asian is about 25% Asian), but there were two parsimonious explanations: a) I had a lot of ASI, b) I had ancestry which did not seem South Asian as such, but was genuinely from East Asia. To ascertain whether it was the former I began proactively gene sharing with a wide range of South Asians on 23andMe. After dozens of individuals it became clear that I was outside of the normal interval of variation. I was more Asian than individuals from South India or Sri Lanka. Additionally, even these individuals tended to be genetically closest to Central/South Asians in the HGDP data set. I was closest to East Asians. Also, on the two dimensional PCA projected onto Central/South Asians I was definitely outside of the cluster of all the other South Asians. Finally, I did find someone who broke the magic 35% barrier of Asian…and that individual was a Bangladeshi, at 38%. And, like me, he was closer to East Asians on the basic “Global Similarity” match. He also carried a Y chromosomal lineage which was rare in South Asia and common among the Hmong. Finally, when Dienekes started his Dodecad Ancestry Project it was clear that ~15% of my ancestry clustered with an element which was not South Asian, but East Asian. If one removes this fraction, I would be about 70% European and 30% Asian, absolutely within the normal range for someone with ancestry to the east or south of the subcontinent.

If you’ve read up to this point, you may be wondering how it is that my father is 38% Asian and my mother is 40% Asian, and I’m 43% Asian. After all, shouldn’t I be an average between the two? Actually, on the PCA scatter plot I am (along with my sibling) exactly between my parents (you can’t see the offspring because the flags are just too large). So why the difference? First, remember that the PCA is projecting you onto a two dimensional axis where the x and y represent the two biggest components of variance in the data set. In other words, it’s yanking out the subset of genetic variance which really stands out in terms of between population difference. This is how an individual who is a first generation Eurasian can be so far from their parents on this plot, but still exhibit a great deal of identity by state in terms of total genome; there’s a lot of variation that the two dimensional plot does not capture (e.g., private variants to family lineages). The Ancestry Painting estimates are different; they’re looking across the whole genome and making assessments for each region as to its genetic affinity between the three reference populations. So to repeat, you have over 50 reference populations vs. 3, and, you have a small proportion of the total genetic variation, vs. the whole genome. Both methods are reporting real and valid results, but they’re somewhat different.

So there are two very simple methodological explanations for the discrepancy above which I can think of. I’m on V2, while my parents and sibling are on V3. I know this has made a difference in other measurements. Additionally, there’s clearly some “noise” within this algorithm, resulting in people with trace African or Asian ancestry which isn’t real, even if you take into account the kludgey nature of the reference populations. But let’s take the results at face value. With the ancestry painting, recall how the European and Asian components were chunky across the genome? Both of my parents received half their genomes from each of their parents. My own chromosomes are a mosaic of those of my grandparents. Some of the original linkage between genomic regions, due to their physical location on the same strand, has been broken apart by recombination in the two generations downstream from my grandparents. Concretely, two instances of meiosis which produced sex cells. Therefore, some of the associations of alleles present in my grandparents have been transformed within me. But even without recombination, it is clear that one homologous chromosome could be more European or Asian than the total genome average. Because only one of these is passed to any given offspring, there is going to be variance from sibling to sibling. Genetics is not a pure blending process. That may be why I am 43% Asian while RS1 is 40% Asian. We’re both sampling from our parents’ genes, and there’s going to be variance in that process (on the chromosomal level you have 22 autosomal draws from each parent where each draw has two outcomes).
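
The "22 draws from each parent" picture can be simulated in a few lines. This is a toy sketch under loud assumptions: no recombination, all autosomes treated as equal in length, and a made-up spread of ±20 percentage points between the two homologs of each chromosome. Only the 38% and 40% parental averages come from the results above; every function name is hypothetical.

```python
import random
import statistics

N_AUTOSOMES = 22

def make_parent(mean_fraction, spread, rng):
    """Build a parent as 22 (homolog_a, homolog_b) pairs of "Asian" fractions.

    Each pair is offset symmetrically around mean_fraction, so the parent's
    genome-wide average is exactly mean_fraction. The spread is illustrative.
    """
    pairs = []
    for _ in range(N_AUTOSOMES):
        offset = rng.uniform(-spread, spread)
        pairs.append((mean_fraction + offset, mean_fraction - offset))
    return pairs

def make_child(father, mother, rng):
    """Pick one homolog per chromosome from each parent (no recombination)."""
    total = 0.0
    for father_pair, mother_pair in zip(father, mother):
        total += rng.choice(father_pair) + rng.choice(mother_pair)
    return total / (2 * N_AUTOSOMES)

def simulate_siblings(n_children, seed=0):
    """Simulate many hypothetical siblings from one fixed pair of parents."""
    rng = random.Random(seed)
    father = make_parent(0.38, 0.20, rng)  # RF: 38% "Asian"
    mother = make_parent(0.40, 0.20, rng)  # RM: 40% "Asian"
    return [make_child(father, mother, rng) for _ in range(n_children)]

children = simulate_siblings(2000)
print(f"mean  {statistics.mean(children):.3f}")
print(f"stdev {statistics.stdev(children):.3f}")
```

The simulated siblings average out near the parental midpoint of 39%, but individual children scatter around it by something on the order of a couple of percentage points. Under these toy assumptions a 43% versus 40% difference between siblings is unremarkable; real recombination would narrow the spread somewhat.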

An interesting implication of this is that the grandchildren of a multiracial couple will exhibit variance in their ancestral quanta from major racial groups. This is one reason why it is a fallacy to presume that intermarriage will result in the washing away of biological diversity. And processes such as assortative mating could even presumably extract out “pure” individuals from an originally admixed random-mating population.

With all that said, I now believe, with an N = 3 from eastern Bengal, that I am not an exception with recent Southeast Asian ancestry; rather, eastern Bengal is part of the gene frequency cline between South Asia and Southeast Asia, and as such has a substantial fraction of eastern ancestry. Zack has my parents’ data, so once the results come back from the first runs of HAP I believe that he will see the same pattern of substantial non-South Asian ancestry in them that Dienekes found in me. The cline here is still sharp. The average Bangladeshi probably overlaps only 10-20% with the average Burmese in the proportions inferred by ancestral quanta algorithms (remember that the Burmese probably have a small South Asian component too). In contrast, the average Bangladeshi probably overlaps 80-90% with a resident of Bihar (my closest match in total SNP comparison among the people I’m sharing with on 23andMe is a Bihari, not either of the two other ethnic Bengalis). This is clearly a function of geography: the north-south ranges in Burma seal it off from South Asia. In contrast, there are open plains from northern Bangladesh to Bihar. In some ways Burma has more cultural affinity and connection with peninsular South Asia because of the ease of maritime travel. The prevalence of Theravada Buddhism in Burma is a testament to the association of the lower Irrawaddy region with Sri Lanka.


Back to Bangladesh. One aspect of the Indian subcontinent in terms of religious demography is that the heart of Indo-Islam, the Delhi area, never had a Muslim majority. Rather, Muslims were a majority along the northwestern and northeastern fringe (along with a few other districts, such as northern Kerala). The predominance of Islam on the northwest isn’t that surprising, as that region borders upon the Dar-al-Islam proper. But what about Bengal? In the late 19th century the British were apparently surprised that united Bengal (which includes roughly the modern state of West Bengal in India, and Bangladesh) had a Muslim majority. Because of differential birth rates and conversion (this second includes sections of my family as I note above) about 2 out of 3 ethnic Bengalis alive today are Muslim, with the balance being Hindu. Bangladesh is estimated to be 90% Muslim, while West Bengal is 25% Muslim. Even today after generations of Hindu outmigration one pattern within Bangladesh is the relative concentration of Hindus to the west and north (also, Hindus in Bangladesh tend to be urban). The “buckle” of the “Koran belt” in Bangladesh is actually the district of Noakhali, on the southeast fringe of Bengal. My mother’s maternal grandfather, who came from a lineage of pirs who had originally settled in the Muslim heartland in Delhi, was from Noakhali. It is apparently said that in Noakhali even the Hindus know proper Islamic forms!**

An explanation for this pattern is that the religious influence and power of Hindu elites declined as a direct function of distance from the regions of West Bengal, which were closer to the core Aryavarta, and had traditionally been the locus of power of Hindu dynasties before the rise of Islam. Additionally, Bengal was the last region of the mainland subcontinent with a robust Buddhist society during the flowering of the Pala Empire around the year 1000. It is therefore suggested that many Bengali Muslims were converted directly from Buddhism, not Hinduism (there remains even today a small minority of ethnic Bengali Buddhists, who carry the surname “Barua.” This is in distinction to the descendants of Tibeto-Burman people who now speak Bengali, but retain a tribal identity and Theravada Buddhist religion). Also, it may be that eastern Bengal was populated mostly by animist tribes before the arrival of Muslims, and just as European colonial powers were more successful in Asia at spreading their religion among marginalized people (e.g., tribal peoples in northeast India and Southeast Asia are often Christian), so Islam found purchase among those outside of the Hindu caste system.

These models are broadly persuasive to me. But, I still am suspicious that there was such a strong disjunction in the depth of Hindu institutions in western vs. eastern Bengal; after all, the kings of Tripura to the east were Hindu when Islam was new in South Asia. If being tribal and marginal to the core Hindu civilization was one of the grounds for susceptibility to Islam it is peculiar that it is precisely many tribal people in modern Bangladesh who are not Muslim. Indeed, the Tibeto-Burman populations nearer to Indian groups in eastern South Asia are Hindu or Buddhist, not Muslim (those further in the hinterlands were not integrated into any South Asian religion, but converted to Christianity by Western missionaries within the last century).

Instead, I find the model espoused in The Rise of Islam and the Bengal Frontier, 1204-1760 broadly plausible as a complement, or even substitute, to the above hypotheses. Additionally, it has the utility of making sense of the genetic data which I have presented here so far. The author argues that eastern Bengal, most of Bangladesh, was very lightly populated before the conquest of Bengal by Muslims in the 13th century. During the modern era the western region of Bengal, in India, has tended to have issues with the moribund nature of many of the water courses. But one thousand years ago this region was more active in terms of sedimentation, while eastern Bengal was a wilderness. Over the centuries there has been a shift of large rivers to the east, opening up that area to cultivation because of improved transport. Additionally, the arrival of Muslims also resulted in the spread of new techniques of land clearing and settlement. The rough model is that eastern Bengal is in fact a relatively newly settled territory in terms of its current demographic density. As the clearance and settlement operations were performed by Muslim elites, many of the peasants who settled these lands were either Muslim, or more likely, adopted the religion of their landlords. Because of the virgin nature of the territory these original settlers entered into a phase of massive demographic expansion, to the point where eastern Bengal (Bangladesh) is today twice as populous as western Bengal (West Bengal). The key here is that there need not be a massive conversion of the enormous masses of marginal animist, Hindu or Buddhist peasants. Rather, all one needs is a modest number of converted Bengali peasants to enter into exponential population growth until the land is “filled.” (Interestingly, one sees similar patterns among descendant populations both in the USA and among Koreans. The religions in the “core” homelands are very different in constitution from those of the Diaspora.)

I find this persuasive for two major reasons. First, Peter Bellwood’s First Farmers documents the difficulty that populations which have not been engaged in intensive farming have in switching to that modality. At least back to the Mughal period Bengal was a densely settled land from which one could extract massive rents simply due to aggregate productivity. Today a united Bengal would have a population of 240 million, making it the fourth most populous nation in the world, below the USA, and just above Indonesia. In hindsight I find it less likely that the peasants of eastern Bengal descend from tribal peoples who had been practicing extensive agriculture, but were introduced to new techniques, than that western populations already habituated to the grinding expectations of intensive farming colonized the “empty” lands (in fact, Bengali peasants migrate to Assam in part because of the perception of land surplus there, even though Assam has 30 million inhabitants). But this initial phase of colonization would entail relatively few peasants, and probably exhibit some male bias. Therefore, this can explain a substantial fraction of the eastern ancestry among Bangladeshis, as in the first generations the Bengali peasants did assimilate the native tribal peoples of the region, whether they were the Munda Santhals or Tibeto-Burman relatives of the Tripura. With the massive numbers of ethnic Bengalis in comparison to Tibeto-Burman groups it seems one would need a great deal of gene flow in any model which posited that exchange between these two groups over long periods of time explains the high fractions that one finds of non-South Asian ancestry. In all of India there are only 10 million speakers of Tibeto-Burman languages, vs. the 240 million speakers of Bengali alone in the Indian subcontinent.

Where does this leave us? From what I gather you’ll probably not make it into the first round of results for HAP, but if you have 23andMe results and haven’t sent them to Zack, and want to learn more about the historical genetics of the Indian subcontinent, you can still get involved! With my parents Zack now has an N = 2 of Bengalis. It would be nice to get more. We still need samples from North-Central India. The number of Punjabis is in the 5-10 range, Tamils is around 5. Enough to make inferences, but certainly not robust enough to bet the house on. In the near future I’ll get results from my other siblings, and I’ve decided to “upgrade” to the V3 chip. Once that comes in I’ll phase some of the results, and probably start comparing myself to my siblings, etc.

* Native Americans, descendants of pre-Columbian Americans, have the inverted results from South Asians, mostly Asian with a European minority. This is not just due to recent European admixture. Rather, though Amerindians have affinities to East Asians, the two groups have been distinct for at least 10,000 years, and probably considerably longer.

** Also, some have stated that the people of Noakhali are sly and cunning, adept at following the letter of the law, but not the spirit. I only know this because when I was young one of my father’s friends, also from Bangladesh, complained that a mutual acquaintance from Noakhali who made much of his piety (he put his wife in purdah when she arrived from Bangladesh) requested that someone else purchase a pornographic magazine for him. His reasoning was that he did not want to be seen purchasing the magazine. It was a sin for a good Muslim to purchase such an item. Later my father and his friend (who was from northern Bangladesh for what it’s worth) commiserated that such was the way of the people of Noakhali, amongst whom you have to have your wits about you lest they exploit some angle for their own self-interest. The pious-porn-non-purchaser was notorious for being a non or late payer of rent when he was a lodger with other Bangladeshis, always emphasizing his religious piety as surety of final payment of the debt. He also eventually finagled a loophole in the immigration law of the time, obtaining a green card with relative ease and no necessity of sponsorship. The proper connotation of how people from Noakhali are is probably captured by the American English word slick.

(Republished from Discover/GNXP by permission of author or representative)
 

In my post below I quoted my interview with L. L. Cavalli-Sforza because I think it gets to the heart of some confusions which have emerged since the finding that most variation on any given locus is found within populations, rather than between them. The standard figure is that 85% of genetic variance is within continental races, and 15% is between them. You can see some Fst values on Wikipedia to get an intuition. Concretely, at a given locus X in population 1 the frequency of allele A may be 40%, while in population 2 it may be 45%. Obviously the populations differ, but the small difference is not going to be very informative of population substructure when most of the variance is within populations.
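To put a number on the 40% vs. 45% example, here is a minimal sketch of Fst at a single biallelic locus, using one common textbook definition, Fst = (H_T - H_S) / H_T, with two equally weighted populations. The function names are mine, and the second pair of frequencies (99% vs. 1%) is a hypothetical stand-in for a locus where different alleles are nearly fixed in the two populations.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    # H_S: average heterozygosity within each population
    h_s = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    # H_T: heterozygosity of the pooled population
    h_t = heterozygosity((p1 + p2) / 2.0)
    return (h_t - h_s) / h_t

modest = fst_two_populations(0.40, 0.45)      # the example in the text
near_fixed = fst_two_populations(0.99, 0.01)  # hypothetical near-fixed locus

print(f"40% vs 45%: Fst = {modest:.4f}")
print(f"99% vs 1%:  Fst = {near_fixed:.4f}")
```

The first locus gives an Fst of roughly 0.003, the second roughly 0.96. A genome-wide average of about 0.15 is therefore compatible with individual loci sitting almost anywhere between those extremes.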

But there are loci which are much more informative. Interestingly, one controls variation on a trait which you are familiar with, skin color (unless you happen to lack vision). A large fraction (on the order of 25-40%) of the between population variance in the complexion of Africans and Europeans can be predicted by substitution on one SNP in the gene SLC24A5. The substitution has a major phenotypic effect, and, exhibits a great deal of between population variation. One variant is nearly fixed in Europeans, and another is nearly fixed in Africans. In other words the component of genetic variance on this trait that is between population is nearly 100%, not 15%. This illustrates that the 15% value was an average across the genome, and in fact there are significant differences on the genetic level which can be ancestrally informative. You can take this to the next level: increase the number of ancestrally informative markers to obtain a fine-grained picture of population structure. In the illustration above the top panel shows the frequencies at the SNP mentioned earlier on SLC24A5. The second panel shows variation at another SNP controlling skin color, SLC45A2. This second SNP is useful in separating South and Central Asians from Europeans and Middle Easterners, if not perfectly so. In other words, the more markers you have, the better your resolution of inter-population difference. This is why I found the following comment very interesting:

Razib’s final concession (that genetic variation exists) is revealing because I think that’s as far as the argument can really be taken. It’s a bit of a strawman, in that people who argue that race is entirely a social construct don’t actually deny that human genetic variation exists. What they deny is that there are non-arbitrary and mutually exclusive categories into which humans can be resolved. This is, I think, the point being made by the “Race by Fingerprints” etc. rhetorical device cited earlier.

In other words, it may be possible for any particular phenotypic trait or genetic locus to be resolved into a strictly cladistic system but humans, being an amalgam of such traits and loci, defy such resolution. So while the study of human genetic variation does, indeed, have “instrumental utility” the concept of biological races is, itself, an archaic relic.

As I noted below, the comment doesn’t make sense. Here is a PCA of world populations using 250,000 markers:

[Figure: PCA of world populations using 250,000 markers]

The relationships between individuals are hypothesis-free. That is, the two largest components of variance in the data just happen to produce clusters which neatly map onto geographic realities. If you think about this a little, it makes total sense: populations share a history of intermarriage, so over time they will develop population-specific distinctiveness. It may be true that most of the variance is within populations, but it is not difficult at all to discriminate populations, or generate clusters which are not arbitrary as a function of geography or social identity.

There are relationships which do not match intuition. Or at least intuition as it crystallized during the period of the rise of modern taxonomic science. The various phenotypically “black” peoples of the world, Africans, Melanesians, and some South Asians, do not cluster together. Rather, all non-Africans are separated from Africans by the largest component of variance within the data set. The traits used to make inferences of taxonomy in “folk biology” and early scientific attempts to generate a systematic tree of life in relation to the human races were not necessarily representative of total genome variation, which captures the evolutionary history of a population with greater accuracy and precision.

And obviously you don’t need 250,000 markers, let alone all ~3 billion base pairs in the human genome, to distinguish on the level of continental races/populations. A paper in 2002 laid out the parameters. δ is a measure of between population difference on genes.

[Two figures from the 2002 paper]

From the paper:

…we can estimate that about 120 unselected SNPs or 20 highly selected SNPs can distinguish group CA from NA, AA from AS and AA from NA. A few hundred random SNPs are required to separate CA from AA, CA from AS and AS from NA, or about 40 highly selected loci. STRP loci are more powerful and have higher effective δ values because they have multiple alleles. Table 3 reveals that fewer than 100 random STRPs, or about 30 highly selected loci, can distinguish the major racial groups. As expected, differentiating Caucasians and Hispanic Americans, who are admixed but mostly of Caucasian ancestry, is more difficult and requires a few hundred random STRPs or about 50 highly selected loci. These results also indicate that many hundreds of markers or more would be required to accurately differentiate more closely related groups, for example populations within the same racial category.
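The logic behind these marker counts can be illustrated with a toy simulation. To be clear, this is not the paper's method or data: it is a sketch which assumes every marker is independent and has the same modest frequency difference (40% vs. 45%, so δ = 0.05), and which assigns each simulated individual to whichever population gives their genotype the higher likelihood. All names and parameters here are my own assumptions.

```python
import math
import random

def simulate_individual(p, n_markers, rng):
    """Diploid genotype: copies (0, 1 or 2) of allele A at each marker."""
    return [(rng.random() < p) + (rng.random() < p) for _ in range(n_markers)]

def log_likelihood(genotype, p):
    """Log-probability of a genotype given allele frequency p.

    The binomial coefficient is dropped because it is identical for
    both candidate populations and cancels in the comparison.
    """
    return sum(g * math.log(p) + (2 - g) * math.log(1.0 - p) for g in genotype)

def assignment_accuracy(n_markers, p1=0.40, p2=0.45, trials=100, seed=1):
    """Fraction of simulated individuals assigned to their true population."""
    rng = random.Random(seed)
    correct = 0
    for true_p in (p1, p2):
        for _ in range(trials):
            genotype = simulate_individual(true_p, n_markers, rng)
            if log_likelihood(genotype, p1) > log_likelihood(genotype, p2):
                guess = p1
            else:
                guess = p2
            correct += guess == true_p
    return correct / (2 * trials)

accuracy = {n: assignment_accuracy(n) for n in (10, 100, 1000)}
for n, acc in accuracy.items():
    print(f"{n:>5} markers: {acc:.2f} assigned correctly")
```

Each marker on its own is nearly useless, but the small differences add up across markers while the noise averages out, so assignment improves steadily as markers are added. Markers with a larger δ would get there with far fewer loci, which is the point of the "highly selected" figures in the quote.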

The paper was written in 2002. Since then much has changed. Here is an image from a post from last summer:

[Image from last summer’s post]

People within European villages tend to be relatively closely related. Again, it is totally reasonable that given enough markers you could assign individuals to different villages with a high confidence. Concretely, person X may show up in the pedigree of individuals from village 1 ~100 times at a given generation, while the same person may show up in the pedigree of individuals from village 2 ~10 times at a given generation. This isn’t rocket science, the basic logic as to why populations shake out based on geography and endogamy patterns is pretty obvious when you think about it.

At about the same time as the above work, A. W. F. Edwards, a statistical geneticist, published a paper titled Lewontin’s Fallacy which took direct aim at the misunderstanding of the human Fst statistic and its relevance for classification. Here is Edwards answering why he wrote the article in 2002 (my co-blogger at GNXP, David B, is doing the questioning):

4. Your recent article on ‘Lewontin’s Fallacy’ criticises the claim that human geographical races have no biological meaning. As the article itself points out, it could have been written at any time in the last 30 years. So why did it take so long – and have you had any reactions from Lewontin or his supporters? [David B's question -R]

I can only speak for myself as to why it took me so long. Others closer to the field will have to explain why the penny did not drop earlier, but the principal cause must be the huge gap in communication that exists between anthropology, especially social anthropology, on the one hand, and the humdrum world of population and statistical genetics on the other. When someone like Lewontin bridges the gap, bearing from genetics a message which the other side wants to hear, it spreads fast – on that side. But there was no feedback. Others might have noticed Lewontin’s 1972 paper but I had stopped working in human and population genetics in 1968 on moving to Cambridge because I could not get any support (so I settled down to writing books instead). In the 1990s I began to pick up the message about only 15% of human genetic variation being between, as opposed to within, populations with its non-sequitur that classification was nigh impossible, and started asking my population-genetics colleagues where it came from. Most had not heard of it, and those that had did not know its source. I regret now that in my paper I did not acknowledge the influence of my brother John, Professor of Genetics in Oxford, because he was independently worrying over the question, inventing the phrase ‘the death of phylogeny’ which spurred me on.

Eventually the argument turned up unchallenged in Nature and the New Scientist and I was able to locate its origin. I only started writing about it after lunch one day in Caius during which I had tried to explain the fallacy across the table to a chemist, a physicist, a physiologist and an experimental psychologist – all Fellows of the Royal Society – and found myself faltering. I like to write to clear my mind. Then I met Adam Wilkins, the editor of BioEssays, and he urged me to work my notes up into a paper.

I have had no adverse reaction to it at all, but plenty of plaudits from geneticists, many of whom told me that they too had been perplexed. Perhaps the communication gap is still too large, or just possibly the point has been taken. After all, Fisher made it in 1925 in Statistical Methods which was written for biologists so it is hardly new. [my emphasis -R]

Richard Dawkins repeated Edwards’ argument in The Ancestor’s Tale. You can read Edwards’ full essay online. Also see p-ter’s lucid exposition at GNXP.

So far I’ve been talking mostly about genes. But in terms of classification there isn’t anything magical about genes. Biological anthropologists using more robust morphometric traits have discerned an “Out of Africa” movement, just as geneticists have. You have above five individuals. All of them have dark hair and dark eyes. There’s total overlap on those traits. And yet I’m pretty sure you can assign their rough population identity to each. Why? Because humans take a look at correlated clusters of traits in assigning population identity intuitively. Some traits are more salient, such as skin color, but early geographers understood that East Asians and Europeans were different populations despite similarity of light complexion. The ancient Greeks understood that Indians and Ethiopians were different groups despite their similar complexions, because they differed on other informative traits.

Let’s bring it back down to earth. Population structure exists. Phylogenetic analyses of humans are trivial in their difficulty. They track geography rather closely, at least before the age of mass migration. Additionally, they tend to follow endogamous social groups, such as Ashkenazi Jews. A South Asian is going to be more genetically related to a South Asian than they are to an African. There are many cosmetic differences between populations. But there are also less cosmetic differences which are very important. You can even assign different regions of a chromosome to different ancestral components.

Where does this leave us? Ultimately, it’s about the “R-word.” “Race is a myth.” Or, as PBS stated, an illusion. Here’s some of the precis of the PBS documentary:

Everyone can tell a Nubian from a Norwegian, so why not divide people into different races? That’s the question explored in “The Difference Between Us,” the first hour of the series. This episode shows that despite what we’ve always believed, the world’s peoples simply don’t come bundled into distinct biological groups. We begin by following a dozen students, including Black athletes and Asian string players, who sequence and compare their own DNA to see who is more genetically similar. The results surprise the students and the viewer, when they discover their closest genetic matches are as likely to be with people from other “races” as their own.

Much of the program is devoted to understanding why. We look at several scientific discoveries that illustrate why humans cannot be subdivided into races and how there isn’t a single characteristic, trait – or even one gene - that can be used to distinguish all members of one race from all members of another.

Modern humans – all of us – emerged in Africa about 150,000 to 200,000 years ago. Bands of humans began migrating out of Africa only about 70,000 years ago. As we spread across the globe, populations continually bumped into one another and mixed their mates and genes. As a species, we’re simply too young and too intermixed to have evolved into separate races or subspecies.

So what about the obvious physical differences we see between people? A closer look helps us understand patterns of human variation:

  • In a virtual “walk” from the equator to northern Europe, we see that visual characteristics vary gradually and continuously from one population to the next. There are no boundaries, so how can we draw a line between where one race ends and another begins?
  • We also learn that most traits – whether skin color, hair texture or blood group – are influenced by separate genes and thus inherited independently one from the other. Having one trait does not necessarily imply the existence of others. Racial profiling is as inaccurate on the genetic level as it is on the New Jersey Turnpike.
  • We also learn that many of our visual characteristics, like different skin colors, appear to have evolved recently, after we left Africa, but the traits we care about – intelligence, musical ability, physical aptitude – are much older, and thus common to all populations. Geneticists have discovered that 85% of all genetic variants can be found within any local population, regardless of whether they’re Poles, Hmong or Fulani. Skin color really is only skin deep. Beneath the skin, we are one of the most similar of all species.

Certainly a few gene forms are more common in some populations than others, such as those controlling skin color and inherited diseases like Tay Sachs and sickle cell. But are these markers of “race?” They reflect ancestry, but as our DNA experiment shows us, that’s not the same thing as race. The mutation that causes sickle cell, we learn, was passed on because it conferred resistance to malaria. It is found among people whose ancestors came from parts of the world where malaria was common: central and western Africa, Turkey, India, Greece, Sicily and even Portugal – but not southern Africa.

This documentary came out in 2003. In late 2005 scientists discovered the role that SLC24A5 plays in skin color. It is the second most ancestrally informative locus typed so far to differentiate Europeans and Africans. It actually does come close to being a single gene which differentiates two populations! It is true that human populations have mixed. I probably have ancestors who were resident in China and Northern Europe within the last 1,000 years. That’s the way genealogy works. All Eurasians may be able to find a genealogical line of ancestry back to Genghis Khan (though not necessarily distinctive genes attributable to him). But that does not negate the fact that some of your ancestors show up in your pedigree orders of magnitude more than others of your ancestors. The vast majority of my ancestors within the last 1,000 years were South Asian, though a substantial minority were Southeast Asian. The question of our youth as a species and its relation to our differentiation into races and subspecies is an empirical matter, not an a priori one determined by a fixed number of years. Since races and subspecies are fuzzy characteristics they’re easy to refute, just pick the definition which is refutable. I have no idea how they adduce that traits like intelligence, musical ability, and physical aptitude, are that much older than the “Out of Africa” migration. Humans have been getting much more gracile over the last 10,000 years as a whole, while I don’t know how one can know about the musical abilities of anatomically modern humans in Africa 200,000 years ago. These traits are quantitative, and based on standing genetic variation, so the architecture is qualitatively different from that of skin color (though since in 2003 we didn’t know the architecture of skin color, the confusion is explainable).
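The arithmetic behind "that's the way genealogy works" is worth a quick sketch. The only assumption here is a round generation length of 25 years, which is my own illustrative figure; the function name is likewise hypothetical.

```python
def ancestor_slots(years, years_per_generation=25):
    """Number of positions in a pedigree at a given depth.

    Every person has 2 parents, 4 grandparents, and so on, so the
    pedigree has 2**g positions g generations back.
    """
    generations = years // years_per_generation
    return 2 ** generations

slots = ancestor_slots(1000)  # 40 generations at 25 years each
print(f"pedigree positions 1,000 years back: {slots:,}")
```

Forty generations back the pedigree has over a trillion positions, vastly more than the number of people alive at the time. The same individuals must therefore fill many positions each, and nothing requires them to do so evenly: a local ancestor can occupy huge numbers of positions while a distant one occupies a handful.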

The old concept of “race” as outlined by anthropologists in the early 20th century, and accepted broadly, was often unclear, ad hoc, and not empirical. Over the past generation by way of refuting the concept of race people are wont to make unclear, ad hoc, and non-empirical, assertions. The reason that scholars discuss race and refute it is to eliminate confusions and misconceptions from the public, but their presentation has produced more confusions and misconceptions. The idea that human phylogeny is impossible is in the air; I have heard it from many intelligent people. I have no idea why people would be skeptical of it, since the way it is presented by many scholars makes the implication clear that phylogeny is impossible, and that differences are trivial. Both of these are false impressions. I do not believe that the real problems mixed-race people have in obtaining organs with the appropriate tissue match are a trivial affair. Human genetic differences have plenty of concrete impacts which are not socially constructed.

Personally I have no problem with abandoning the word race and all the baggage which that entails. But there’s no reason to throw the baby out with the bathwater here. In the “post-genomic” era human population substructure is taken for granted. The outlines of the history of our species, and its various branches, are getting clearer and clearer. There’s no point in replacing old rubbish with new rubbish. We have the possibility for clear and useful thought, if we choose to grasp it.

(Republished from Discover/GNXP by permission of author or representative)
 
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"