The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Population Substructure

Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly. “Inference of population structure using multilocus genotype data.” Genetics 155.2 (2000): 945-959.

Before there was Structure there was just structure. By this, I mean that population substructure has always existed. The question is how we as humans shall characterize and visualize it in a manner which imparts some measure of wisdom and enlightenment. A simple way to assess population substructure is to visualize the genetic distances across individuals or populations on a two-dimensional plot. Another quite popular way is to represent the distances on a neighbor-joining tree, as on the left. As you can see this is not always satisfying: dense trees with too many tips are often almost impossible to interpret beyond the most trivial inferences (though there is an aesthetic beauty in their feathery topology!). And where graphical representations such as neighbor-joining trees and MDS plots remove too much relevant information, cluttered Fst matrices have the opposite problem. All the distance data is there in its glorious specific detail, but there’s very little Gestalt comprehension.

Rosenberg, Noah A., et al. “Genetic structure of human populations.” Science 298.5602 (2002): 2381-2385.

Into this confusing world stepped the Structure bar plot. When I say “Structure bar plot,” in 2013 I really mean the host of model-based clustering packages. Because it is faster I prefer Admixture. But Admixture is really just a twist on the basic rules of the game which Structure set. What you see to the right is one of the beautiful bar plots which have made their appearance regularly on this blog over the past half a decade or more. I’ve repeated what they do, and don’t mean, ad nauseam, though it doesn’t hurt to repeat oneself. What you see is how individuals from a range of human populations shake out at K = 6. More verbosely, assume that your pool of individuals can be thought of as an admixture, in various proportions, of six ancestral populations. Each vertical line is an individual, each color represents one of the K ancestral populations (for K = 6, populations 1 through 6), and the share of a line shaded in a given color is the estimated proportion of that individual’s ancestry drawn from that population.
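To pin down what one bar encodes, here is a minimal sketch of the admixture likelihood that this family of methods works with, assuming biallelic loci with each genotype coded as the count (0, 1 or 2) of a reference allele. The function name is hypothetical and this is not code from Structure or ADMIXTURE. `q` is one individual’s row of ancestry proportions (one bar in the plot), and `p[k][l]` is the reference-allele frequency at locus `l` in ancestral population `k`.

```python
import math

def genotype_loglik(genotypes, q, p):
    """Log-likelihood of one individual's genotypes under an admixture model.

    genotypes: reference-allele counts (0, 1 or 2), one per locus
    q: ancestry proportions for this individual, one per population (sum to 1)
    p: p[k][l] = reference-allele frequency at locus l in population k,
       assumed strictly between 0 and 1 so the logs stay finite
    """
    total = 0.0
    for l, g in enumerate(genotypes):
        # Allele frequency in this individual's personal mix of ancestries
        f = sum(q[k] * p[k][l] for k in range(len(q)))
        # Binomial (Hardy-Weinberg) probability of g copies out of 2
        total += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1 - f)
    return total
```

Fitting the model means searching for the `q` rows (and `p`) that make the observed genotypes most probable; nothing in the formula checks whether K ancestral populations ever really existed.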

This is when I should remind you that this does not mean that these individuals are actually combinations of six ancestral populations. When you think about it, that is common sense. Just because someone generates a bar plot with a given K, that does not mean that that bar plot makes any sense. I could set K = 666, for example. The results would be totally without value (evil even!), but, they would be results, because if you put garbage in, the algorithm will produce something (garbage). This is why I say that population structure is concrete and ineffable. We know that it is the outcome of real history which we can grasp intuitively. But how we generate a map of that structure for our visual delectation and quantitative precision is far more dicey and slippery.

To truly understand what’s going on it might be useful to review the original paper which presented Structure, Inference of Population Structure Using Multilocus Genotype Data. Though there are follow-ups, the guts of the package are laid out in this initial publication. Basically you have some data, multilocus genotypes. Because Structure debuted in 2000, it predates the era of SNP-chip data with hundreds of thousands of loci; today the term multilocus sounds almost quaint. In 2000 the classical autosomal era was fading out, but people did still use RFLPs and what not. It is a testament to the robustness of the framework of Structure that it transitioned smoothly to the era of massive data sets. Roughly, the three major ingredients of Structure are the empirical genotype data, formal assumptions about population dynamics, and powerful computational techniques to map between the first two elements. In the language of the paper you have X, the genotypes of the individuals, Z, the (unknown) populations of origin of the individuals, and P, the allele frequencies of the populations. They’re multi-dimensional vectors. That’s not as important here as the fact that you only observe X. The real grunt work of Structure is generating a vector, Q, which defines the contributions to each individual from the set of ancestral populations. This is done via an MCMC, which explores the space of probabilities given the data and the priors which are baked into the cake of the package. Though some people seem to treat the details of the MCMC as a black box, actually having some intuition about how it works is often useful when you want to shift from default settings (there are indeed people who run Structure who are not clear about what the burn-in is exactly: the initial stretch of iterations which is discarded before the chain settles down and its samples start being kept). What’s going on ultimately is that in structured populations the genotypes are not in Hardy-Weinberg Equilibrium. Structure is attempting to find a solution which will result in populations in HWE (and in linkage equilibrium).
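For intuition about the MCMC, here is a toy Gibbs sampler for the simplest case, a no-admixture model in which every individual comes wholly from one of K clusters. It is a sketch under stated assumptions, not Structure’s implementation: biallelic diploid loci, flat Beta(1, 1) priors on allele frequencies, a uniform prior on cluster membership, and hypothetical function names. It alternates between drawing P given Z and Z given P (the HWE likelihood within each cluster), throws away the burn-in, and reports each individual’s most frequent label.

```python
import math
import random

def simulate(freq, n_ind, n_loci, rng):
    """Hypothetical helper: diploid genotypes (0/1/2) drawn under HWE,
    using the same reference-allele frequency at every locus."""
    return [[(rng.random() < freq) + (rng.random() < freq) for _ in range(n_loci)]
            for _ in range(n_ind)]

def gibbs_no_admixture(genotypes, k, n_iter=100, burn_in=50, seed=0):
    """Toy Gibbs sampler for a no-admixture clustering model.

    Alternates between sampling P (allele frequencies per cluster) given Z
    and Z (cluster of origin per individual) given P. Iterations before
    `burn_in` are discarded. Returns each individual's most frequent label
    after burn-in; label numbers themselves are arbitrary.
    """
    rng = random.Random(seed)
    n, n_loci = len(genotypes), len(genotypes[0])
    z = [rng.randrange(k) for _ in range(n)]      # random starting assignment
    tally = [[0] * k for _ in range(n)]
    for it in range(n_iter):
        # P | Z: Beta(1 + ref count, 1 + alt count) for each cluster and locus
        p = []
        for c in range(k):
            members = [g for g, zi in zip(genotypes, z) if zi == c]
            row = []
            for l in range(n_loci):
                ref = sum(g[l] for g in members)
                f = rng.betavariate(1 + ref, 1 + 2 * len(members) - ref)
                row.append(min(max(f, 1e-9), 1 - 1e-9))  # keep log() finite
            p.append(row)
        # Z | P: HWE likelihood in each cluster (the binomial coefficient
        # is the same for every cluster, so it cancels and is omitted)
        for i, g in enumerate(genotypes):
            logs = [sum(x * math.log(p[c][l]) + (2 - x) * math.log(1 - p[c][l])
                        for l, x in enumerate(g)) for c in range(k)]
            top = max(logs)
            weights = [math.exp(v - top) for v in logs]
            z[i] = rng.choices(range(k), weights=weights)[0]
        if it >= burn_in:  # only post-burn-in samples are tallied
            for i, zi in enumerate(z):
                tally[i][zi] += 1
    return [max(range(k), key=row.__getitem__) for row in tally]
```

With two well-separated simulated groups (say `simulate(0.85, 15, 40, rng)` plus `simulate(0.15, 15, 40, rng)`), the chain quickly sorts them into two clusters. Structure itself adds more general priors and, in its admixture model, also samples each individual’s Q.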

This brings us to the question of how we make sense of the results and which K to select. If you run Structure you are probably iterating over many K values, and repeating the iteration multiple times. You will likely have to merge the outputs of the replicates with a separate algorithm, because they are going to vary from run to run. In any case, each run generates a likelihood (which derives from the probability of the data given the K value). The most intuitive way to “pick” an appropriate K is simply to wait until the likelihood begins to plateau, which means that the algorithm can’t squeeze more informative juice out of going up the K values.* This may seem dry and tedious, but it brings home exactly why you should not view any given K as natural or real in a deep sense. The selection of a K has less to do with reality, and more with instrumentality. If, for example, your aim is to detect African ancestry in a worldwide population pool, then a low K will suffice, even if a higher K gives a better model fit (higher K values often take longer in the MCMC). In contrast, if you want to discern much finer population clusters then it is prudent to go up to the most informative K, no matter how long that might take.
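The plateau rule of thumb can be made mechanical. A minimal sketch, assuming you already have the mean log-likelihood across replicates for each K; the 1% relative threshold and the function name are hypothetical choices, not a standard.

```python
def plateau_k(mean_loglik, rel_tol=0.01):
    """Smallest K beyond which the mean log-likelihood improves by less than
    `rel_tol` of its magnitude. The 1% default is an arbitrary threshold."""
    ks = sorted(mean_loglik)
    for k, k_next in zip(ks, ks[1:]):
        gain = mean_loglik[k_next] - mean_loglik[k]
        if gain < rel_tol * abs(mean_loglik[k]):
            return k
    return ks[-1]  # never levelled off within the K values tried

print(plateau_k({1: -1000.0, 2: -800.0, 3: -780.0, 4: -777.0}))
```

Change `rel_tol` and you change the “answer,” which is the point: the chosen K reflects a tolerance you picked, not a fact about the populations.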

Today model-based clustering packages like Structure, frappe, and Admixture are part of the background furniture of the population genetic toolkit. There are now newer methods on the block. A package like TreeMix uses allele frequencies to transform the stale phylogram into a more informative set of graphs. Other frameworks do not treat the information locus by locus independently, but assimilate patterns across loci, generating ancestry tracts within individual genomes. Though some historical information can be inferred from Structure, it is often an ad hoc process which resembles reading tea leaves. Linkage disequilibrium methods have the advantage that they explicitly explore historical processes in the genome. But with all that said, the Structure bar plot revolution of the aughts wrought a massive change, and what was once wondrous has become banal.

* The ad hoc Delta K statistic is very popular too. It combines the rate of change of the likelihoods and the variation across replicate runs.
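A sketch of Delta K as it is commonly computed following Evanno et al. (2005): the absolute second difference of the mean log-likelihood across K, divided by the standard deviation of the replicate log-likelihoods at that K. Conventions differ slightly between tools, so treat this as illustrative; it assumes consecutive K values with at least two replicates each, and the function name is hypothetical.

```python
import statistics

def delta_k(runs):
    """Evanno-style Delta K.

    runs: dict mapping K to a list of replicate log-likelihoods.
    Returns {K: Delta K} for every K that has both neighbours.
    """
    ks = sorted(runs)
    means = {k: statistics.mean(v) for k, v in runs.items()}
    out = {}
    for k_prev, k, k_next in zip(ks, ks[1:], ks[2:]):
        # |L(K+1) - 2 L(K) + L(K-1)|, the curvature of the likelihood curve
        second_diff = abs(means[k_next] - 2 * means[k] + means[k_prev])
        # Scaled by how much replicate runs disagree at this K
        out[k] = second_diff / statistics.stdev(runs[k])
    return out
```

Because the denominator is replicate spread, a K whose runs happen to agree closely gets a boost, which is one reason to read Delta K alongside the raw likelihoods rather than instead of them.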

The Pith: New software which gives you a more fine-grained understanding of relationships between populations and individuals.

According to the reader survey, >50 percent of you don’t know how to interpret PCA or model-based (e.g., ADMIXTURE) genetic plots, so I am a little hesitant to point to this new paper in PLoS Genetics, Inference of Population Structure using Dense Haplotype Data, as it extends the results of those earlier methods. But it’s an important paper, and at some point I’ll start using their software. The “big picture” is that earlier methods left “some information on the table.” That’s partly due to the fact that they were developed (or in the case of PCA leveraged, as it’s a very general technique) in an era when very dense marker data sets were not available (today we’re shifting to full genome sequences in many cases!). The information left on the table would be haplotype structure. Genetic variation in a concrete form manifests as sequences along a line, many of them physically connected. These correlations of nearby variant markers represent haplotypes of great interest, because they are excellent clues to admixture or divergence events across populations. In contrast the older methods were looking at variation from marker to marker, each in turn independently, which collapses some of the important genomic structure that we can now inspect (in fact, linkage disequilibrium due to these correlations can distort some of the results of the older methods, so you want to “thin” your marker set).
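“Thinning” usually means LD pruning. A simplified greedy sketch, loosely in the spirit of window-based pruning tools such as PLINK’s `--indep-pairwise` but not a reimplementation of it: walk along the chromosome and keep a SNP only if its squared genotype correlation (r²) with every already-kept SNP in a recent window is below a threshold. The threshold, window, and function names are hypothetical.

```python
import statistics

def r_squared(x, y):
    """Squared Pearson correlation between two genotype-dosage vectors."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    return cov * cov / (vx * vy)

def ld_prune(snps, threshold=0.5, window=50):
    """Greedy LD thinning.

    snps: dosage vectors (0/1/2 per individual) in chromosome order.
    Keeps SNP i only if r^2 < threshold with every kept SNP among the
    previous `window` positions. Returns the indices of kept SNPs.
    """
    kept = []
    for i, s in enumerate(snps):
        recent = [j for j in kept if i - j <= window]
        if all(r_squared(snps[j], s) < threshold for j in recent):
            kept.append(i)
    return kept
```

The haplotype-based method in the paper goes the other way: rather than discarding correlated markers, it treats those correlations as the signal.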

Let me make this concrete for you. On 23andMe you can see where your friends shake out on a PCA plot using the HGDP data set as a reference. What this means is that the HGDP data set is used to generate independent dimensions of genetic variation. As is usually the case in these analyses, the largest dimension separates Africans from everyone else, and the second largest dimension separates Asians from Europeans and Africans. 23andMe customers are then projected onto this variation, so you can get a sense of where you are positioned in the clusters. To the left is a zoom-in on the section for Central/South Asians. You can see that one of my friends, highlighted in green, falls almost perfectly in the Uygur cluster. According to ancestry estimates my friend is 50 percent Asian and 50 percent European. The “representative” Uygur in the 23andMe chromosome painting gives about the same results. But these are total genome estimates. The historical nature of my friend’s admixture and that of the Uygur woman is very different, as one can see in the below figure.


My friend is to the right, and the Uygur woman is to the left. Why the big difference? My friend has an East Asian parent and a European parent. The Uygur woman is the product of a marriage between Uygurs, a population which is the result of admixture between East Asians and Europeans one to two thousand years ago. Recombination has broken apart the perfect linkage between European and East Asian regions among the Uygurs. Obviously this isn’t the case with my friend, as recombination has had no time to generate alternative sequences of ancestry. This is critical information which genome-wide estimates displayed on PCA or ADMIXTURE plots will miss.
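A toy simulation shows why the two paintings look so different. This is a sketch under simplifying assumptions (a haploid Wright-Fisher population, one 1-Morgan chromosome, crossovers as a Poisson process with no interference, one admixture pulse), and all names are hypothetical. At zero generations every chromosome is a single unbroken tract, like those my friend inherited from each unadmixed parent; after tens of generations of random mating, as among the Uygurs (one to two thousand years is very roughly 40 to 80 generations at 25 years each), chromosomes become mosaics of many short tracts.

```python
import random

def _pieces(hap, start, end):
    """(position, ancestry) pieces of `hap` overlapping [start, end)."""
    out = []
    for i, (pos, anc) in enumerate(hap):
        nxt = hap[i + 1][0] if i + 1 < len(hap) else float("inf")
        if nxt > start and pos < end:
            out.append((max(pos, start), anc))
    return out

def make_gamete(mom, dad, rng, length=1.0):
    """Recombine two parental haplotypes with Poisson crossovers (rate 1/Morgan)."""
    bounds, x = [0.0], rng.expovariate(1.0)
    while x < length:
        bounds.append(x)
        x += rng.expovariate(1.0)
    bounds.append(length)
    parents, cur, pieces = (mom, dad), rng.randrange(2), []
    for start, end in zip(bounds, bounds[1:]):
        pieces.extend(_pieces(parents[cur], start, end))
        cur = 1 - cur  # switch parent at each crossover
    merged = []
    for pos, anc in pieces:
        if not merged or merged[-1][1] != anc:  # merge runs of equal ancestry
            merged.append((pos, anc))
    return merged

def mean_tracts(n, frac_east_asian, generations, seed=0):
    """Mean ancestry tracts per chromosome `generations` after one admixture pulse."""
    rng = random.Random(seed)
    pop = [[(0.0, "EAS")] if i < frac_east_asian * n else [(0.0, "EUR")]
           for i in range(n)]
    for _ in range(generations):
        pop = [make_gamete(rng.choice(pop), rng.choice(pop), rng) for _ in range(n)]
    return sum(len(h) for h in pop) / n
```

`mean_tracts(200, 0.5, 0)` is exactly 1.0, while after 20 generations the average chromosome already carries several tracts. Whole-genome ancestry fractions are about 50/50 in both cases, which is exactly the information the tract counts add.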

As for this particular paper and method, I want to point you to figure 5. The darker/bluish colors indicate higher coancestry estimates, and yellower colors lower ones. Red is in the middle. The diagonal tends to be blue/red because it represents populations’ correlations with themselves, which one would expect to be high. You can’t really read the labels, but I wanted to highlight the Italian and Sardinian blocks. Explanation below.

You can see an ADMIXTURE plot underneath the heat-map. What’s going on? Sardinians exhibit the hallmarks of an isolated population with a smaller effective population size, which has undergone more genetic drift than Italians over the same amount of time. This is naturally one reason that they “break out” rather quickly in ADMIXTURE and PCA. You see this in South Asia with the Kalash, who often emerge as their own cluster rather quickly, and separate out in a PCA as well. This is simply a function of their isolation and lower effective population size. Most of the people who use ADMIXTURE and PCA know this, but those reading these plots do not. Without that knowledge one can make incorrect inferences. The methods outlined in the paper allow one to see these trends immediately, while keeping broader worldwide correlations across populations in view. This is a big step forward not only in data analysis, but in result visualization.
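The link between small effective population size and fast “breaking out” is the textbook Wright-Fisher result that heterozygosity decays by a factor of (1 − 1/(2Ne)) per generation. A sketch, with hypothetical function names and illustrative sizes that are not estimates for Sardinians, Italians, or the Kalash:

```python
def heterozygosity_after_drift(h0, ne, generations):
    """H_t = H_0 * (1 - 1/(2*Ne))**t for an idealized diploid population."""
    return h0 * (1 - 1 / (2 * ne)) ** generations

def drift_divergence(ne, generations):
    """1 - H_t/H_0: the Fst expected among isolated populations of size Ne
    that have drifted apart for `generations` under Wright-Fisher."""
    return 1 - heterozygosity_after_drift(1.0, ne, generations)

# Hypothetical sizes over 400 generations:
# roughly 0.18 for Ne = 1,000 versus roughly 0.02 for Ne = 10,000
for ne in (1_000, 10_000):
    print(ne, round(drift_divergence(ne, 400), 3))
```

A tenfold smaller population accumulates close to tenfold more drift over the same span, so a clustering algorithm finds it distinctive early, whether or not it has a deep separate history.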

If you are more interested in this topic, the first author has a comparison of the various tools up. Both Dienekes and Eurogenes are using the new software. Get the software at!

Citation: Lawson DJ, Hellenthal G, Myers S, Falush D (2012) Inference of Population Structure using Dense Haplotype Data. PLoS Genet 8(1): e1002453. doi:10.1371/journal.pgen.1002453

Representatives of Szechuan and Shandong cuisine

The Pith: The Han Chinese are genetically diverse, due to geographic scale of range, hybridization with other populations, and possibly local adaptation.

In the USA we often speak of “Chinese food.” This is rather peculiar because there isn’t any generic “Chinese cuisine.” Rather, there are regional cuisines, which share a broad family similarity. Similarly, American “Mexican food” and “Indian food” also have no true equivalent in Mexico or India (naturally the novel American culinary concoctions often exhibit biases in the regions from which they sample due to our preferences and connections; non-vegetarian Punjabi elements dominate over Udupi, while much authentic Mexican American food has a bias toward the northern states of that nation). But to a first approximation there is some sense in speaking of a general class of cuisine which exhibits a lot of internal structure and variation, so long as one understands that there is an important finer grain of categorization.

Some of the same applies to genetic categorizations. Consider two of the populations in the original HapMap, the Yoruba from Nigeria, and the Chinese from Beijing. There are ~30 million Yoruba, but over 1 billion Han Chinese! Even granting that the Yoruba seem excellent representatives of Sub-Saharan African genetic variation (not Bantu, but not far from the Bantu), there are still more Han Chinese than Sub-Saharan Africans (including the African Diaspora). So it’s nice that over the past few years there’s been a deep dive into Han genetics. A new paper in the European Journal of Human Genetics, Natural positive selection and north–south genetic diversity in East Asia, focuses on the north-south difference among Han Chinese, using groups flanking them to their north and south as references.

First, let’s back up for a moment. Who are the Han? Where did they come from? The details aren’t simple, insofar as there wasn’t a “Han Homesteading Act” which pushed the frontiers of Chinese culture and civilization to a limit demarcated by a national boundary line. But overall the shift in Chinese society over the past ~3,000 years has been outward, from a northern focus to the south. 2,000 years ago China proper, the zone where dominant Han ethnic habitation overlapped with Chinese political hegemony, consisted primarily of the Yellow River plain. Though the Han Dynasty extended its empire south toward Vietnam, the landscape was still predominantly non-Han outside of a few locales beyond the Yangtze. During the Han Dynasty even the Yangtze River basin was still somewhat liminal. This changed over the first millennium CE. The collapse of the Han Dynasty in the 3rd century led to what are sometimes termed the Chinese Dark Ages. During this period of political fragmentation much of northern China was dominated by barbarian dynasties, and Han political elites controlled the commanding heights only in the south. With the rise of the Tang in the 7th century the shift to the Yangtze River which had occurred in the interregnum solidified. Economically, demographically, and to some extent culturally, what during the Han Dynasty would have been defined as a zone of barbarian habitation, or marginal Han civilization, had become the center of gravity of the Sinic world by 1000. By this period the domains of the Han began to push far south of the Yangtze, and some of the most preeminent intellectuals came out of relatively isolated southern provinces such as Fujian, on the coast between the Yangtze and Pearl River deltas. In the next 1,000 years the Han spread through many sections of southern China which had previously been redoubts of aboriginal peoples. Yunnan, for example, likely did not become majority Han until the past few centuries.

This poses a question: was this expansion of the Han a biological process, or a cultural one? It seems likely some of both. There are even customs particular to some Chinese dialect groups, such as the Cantonese, which may have a pre-Han origin. This amalgamation combined with the widespread geographic diversity of China is a perfect laboratory for evolutionary processes. In Plagues and Peoples William H. McNeill notes that demographic expansion by Han peasants (as opposed to military or bureaucratic outposts) into much of southern China during the early Imperial period was limited due to diseases. One presumes that transforming the landscape would have some mitigating effect on the power of pestilence, but admixture and selection may also have allowed the biologically inoculated Han to occupy areas which were previously no-go.

Here’s the abstract of the paper:

Recent reports have identified a north–south cline in genetic variation in East and South-East Asia, but these studies have not formally explored the basis of these clinical differences. Understanding the origins of these variations may provide valuable insights in tracking down the functional variants in genomic regions identified by genetic association studies. Here we investigate the genetic basis of these differences with genome-wide data from the HapMap, the Human Genome Diversity Project and the Singapore Genome Variation Project. We implemented four bioinformatic measures to discover genomic regions that are considerably differentiated either between two Han Chinese populations in the north and south of China, or across 22 populations in East and South-East Asia. These measures prioritized genomic stretches with: (i) regional differences in the allelic spectrum for SNPs common to the two Han Chinese populations; (ii) differential evidence of positive selection between the two populations as quantified by integrated haplotype score (iHS) and cross-population extended haplotype homozygosity (XP-EHH); (iii) significant correlation between allele frequencies and geographical latitudes of the 22 populations. We also explored the extent of linkage disequilibrium variations in these regions, which is important in combining genetic association studies from North and South Chinese. Two of the regions that emerged are found in HLA class I and II, suggesting that the HLA imputation panel from the HapMap may not be directly applicable to every Chinese sample. This has important implications to autoimmune studies that plan to impute the classical HLA alleles to fine map the SNP association signals.

The authors do not focus much on phylogenetic relationships and the historical inferences one can make from them. For example, they don’t posit any complex migration scenario to explain the pattern of genetic substructure in China today. Instead the spotlight is on differences in allele frequencies which seem outside of the normal expectation, and so might have been targets of selection. To frame that appropriately in a phylogenetic context they pooled a wide range of data sets together (HGDP, HapMap, SGVP) and generated a PCA which illustrates the relationships of East Asian populations on a two-dimensional plot. The figure is rather hard to make out because of similarities in color coding, but the basic result is shown to the left. You see a north-south axis within China, and some separation from groups to the north and south. Interestingly, some Chinese ethnic minorities are within the range of variation of the Han. There are many reasons this could be. They might have already been nested within the original Han range of variation before the demographic expansion of the latter. There could have been extensive gene flow between the Han and minorities, in particular in the direction of the latter if the Han were far more numerous. And of course many Han dialect groups could simply be culturally assimilated minorities if you go back far enough. A combination of these, with various weights in different contexts, is certainly the best approximation to what occurred. Pure replacement and pure cultural diffusion both seem untenable as robust explanations. Additionally, the best check for the relationship between Han and minorities is to look for the differences within the same province. So Han from Yunnan should be cross-referenced with ethnic minorities from the same locale, instead of Han from Guangdong being proxies for “South Chinese.” I suspect that the gap between the Dai and the southern Chinese is partially an artifact of undersampling Han from those particular isolated regions of China where they live cheek-by-jowl with Dai.

But the rationale for this paper was to shine a light on the effects of natural selection on the Han genome and possible adaptations, not the systematics of East Asian human populations. As noted in the abstract they used several methods to get at this issue. First, they looked at the correlation between allele frequencies and latitude, the logic presumably being that latitude is correlated with climate and other geographical parameters which serve as environmental selection pressures. All things being equal, northern climes, for example, will have fewer pathogens and parasites; consider the value of a frost season in killing many surface soil organisms. Second, they looked at Fst between Han of the north and Han of the south. Fst is a measure of between-population genetic differences. As it converges upon zero there’s basically no difference between the populations in question, while a value of 1.0 would indicate that all the variation is partitioned across the two groups, so that you could use a marker to perfectly assign an individual to a population. The authors had the average difference between north and south Han in mind, and looked for genomic regions where the differences were far greater than expected. They also looked at the contribution of a given SNP to the variation you saw illustrated in the PCA. Big contributions to the inter-population variation obviously indicate differences across populations. Finally, they looked at haplotype structure as a signature of natural selection. While Fst focuses on specific points in the genome, haplotype structure elucidates patterns across genes, along sequences of markers. Natural selection tends to homogenize genomic regions temporarily, as a particular variant rises in frequency and drags along its neighbors, which hitchhike in a selective sweep.
The two haplotype methods they used have different powers to detect selective events; iHS is better at catching sweeps in mid-stream, where the selected allele has not yet reached fixation. XP-EHH, on the other hand, picks up nearly completed sweeps. These two methods complement each other and rely on similar logic. Again, as with Fst, the authors focused on regions of the genome which were at the tails of the expected distribution, given the genetic distance between each pair of populations seen across the total genome.

What did they find? Here’s a table which shows you some genes:

MAF-latitude correlation | FST (CHB vs CHS) | XP-EHH | iHS (CHB) | iHS (CHS) | SNP loadings | Genes
2.1 × 10⁻⁵ (rs6901084) | 0.50% | 0.5% (positive) | 0.01% | 0.01% | 0.10% | HLA-DRB1, HLA-DQA1-2, HLA-DOB, PSMB9, BRD2, TAP2, PSMB8, TAP1, HLA-DMB, HLA-DMA, HLA-DOA
2.0 × 10⁻⁴ (rs4489283) | No evidence | 0.5% (positive) | 0.50% | 0.50% | 0.10% | NRG1
6.6 × 10⁻⁵ (rs2370969) | No evidence | 0.1% (negative) | 0.50% | 0.10% | 0.10% | WDR48, GORASP1, TTC21A, AXUD1, CMYA1, CX3CR1, CCR8, SLC25A38, LAMR1, MOBP
9.3 × 10⁻⁴ (rs6762261) | No evidence | No evidence | 0.10% | 0.50% | 0.50% | EPHB1
9.5 × 10⁻⁴ (rs986148) | No evidence | 0.1% (positive) | 0.10% | NA

The first thing that jumps out at me is HLA. These genes are involved in immune response, and are extremely polymorphic. If you’re going to see regional differences correlated with ecology, this is where you’d look. The expansion of the Han to the south of China was probably accompanied by changes in the type of immunological portfolio which was the norm among the peasants. It isn’t in this table, but other genes found at the intersection of tests are LPP and ADH. The former has been implicated in celiac disease, while the latter is an alcohol dehydrogenase locus. When it comes to natural selection disease matters a lot, but so does digestion. I don’t have a good explanation for the patterns here, but there are differences in cuisine within China. Rice is dominant in the center and south, while wheat and millet dominate the north. It would be interesting to know if there are also variations in alcohol production and consumption. China is in many ways equivalent to Europe, and there are differences between north and south both in ADH and in cultural norms regarding the amount and nature of alcohol consumption. Finally you have something like NRG1, which seems to be a locus of neurological function. This doesn’t exhibit a difference across the two Han classes, but seems to have been the target of natural selection within the overall population. Perhaps the social norms of the culture and society of Han China reshaped the personality profiles of the population?

Going back to the analogy with cuisine: like food, the components and elements of genetic variation are shaped by different forces. Modern Italian cuisine, for example, depends upon basic elements which were common in Italy 2,000 years ago (e.g., olive oil), but it has changed a great deal with the Columbian Exchange (e.g., tomatoes). Descent shapes the possibilities of future culinary options by fixing some constraints and preferences (traditional Jewish food is light on shellfish!). But over time new variants can arise and alter the original base. Additionally, there are local adaptations. The Cajuns are descended from Acadians, from the maritime provinces of Canada. Obviously spicy crayfish concoctions were not part of their original culinary portfolio, but they had to make do with the options that they had in their new ecology. There’s a strong correlation between warmer climes and spice, probably having to do with the anti-bacterial properties of many of these non-nutritious additives (from what I know, South Indian and South Chinese cuisines are both much spicier than North Indian and North Chinese fare). Within any broad family of cuisines one must acknowledge both the unity and the diversity. And the same applies within a cultural-genetic macro-region on the scale of China.

Image credit: Rolf Muller

Some have asked what the point is in poking around African population structure when Tishkoff et al. and Henn et al. have done such a good job in terms of coverage. First, it is nice to run your own analyses so you can slice & dice to your preference, and not rely on the constrained menu provided by others. There’s value in home cooking; you can flavor to your taste. Second, you never know what data people might leave on your doorstep. I’ve received the genotypes of three Somalis. Nothing too surprising, a touch more Cushitic than the Ethiopians in Behar et al., but interesting nonetheless.

Also, you can see how ADMIXTURE tends to come to weird conclusions in certain circumstances. Below is a K = 12 run on ~50,000 SNPs. I’ve added a few Behar et al. and HGDP populations to the Henn et al. set, and pruned a lot of the African groups which seem redundant in terms of information. I’ve added a few geographically informative labels as well.

Observe below that there is a Fulani cluster. I think this is pretty much an artifact. At K = 7 the Fulani have a majority component which is modal in West Africans & Bantu speakers, and a minority component which is identical to the one modal in Mozabite Berbers from Algeria. The Mozabites reside in the far northern Sahara, and their modal component drops off as one goes east toward western Asia and the eastern Mediterranean. I suspect that what is showing up in ADMIXTURE is the ancient hybridization of the Fulani, and perhaps their demographic expansion from this core group. We have some glimmers of the prehistory of the Fulani, and no expectation for them to be such a distinctive cluster, so I naturally jump to these inferences. But it does make me reconsider the nature of the “Sandawe,” “Mbuti” or “San” clusters in ADMIXTURE. These populations are culturally distinctive in deep ways from their neighbors, so a reflexive inference one might make is that they’re “pure” ancient substrate groups which have been overlain and marginalized by their Bantu neighbors. But their prehistory is far murkier than that of the Fulani because of their geographical isolation, so there is far less to go on. These “ancient” isolated groups themselves may have gone through the same sort of distinctive recent ethnogenesis processes which we presume occurred with the Fulani (also, in the plot below the Biaka are pure; but in most of the bar plots they have a minor element which they share with their neighbors, probably due to greater admixture and interaction between western Pygmies and their Bantu neighbors than among the eastern ones).

OK, now let’s prune some of the “pure” and extraneous populations. Additionally, I’ll remove some of the K’s, so the proportions are going to be recalculated with a new base. Keep in mind that the South African Bantus show elevated West African ancestry in part because the Khoisan proportion was removed, inflating the percentages for all the other elements.
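The inflation is simple arithmetic: drop a component and rescale what remains so it sums to 1 again. A minimal sketch with made-up proportions (not values from the plot) and a hypothetical function name:

```python
def drop_component(q, drop):
    """Remove one ancestry component and rescale the rest to sum to 1.

    q: dict mapping component label -> proportion for one individual.
    """
    kept = {k: v for k, v in q.items() if k != drop}
    total = sum(kept.values())
    return {k: v / total for k, v in kept.items()}

# Hypothetical individual: 60% West African, 20% Khoisan, 20% Nilotic
print(drop_component({"West African": 0.6, "Khoisan": 0.2, "Nilotic": 0.2}, "Khoisan"))
```

Here West African rises from 0.60 to 0.75 without any change in the underlying genotypes; only the denominator moved.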

Now let’s look at the pairwise Fst values between inferred populations. Remember, this measures the proportion of genetic variance which can be attributed to between-population differences. The bigger the value, the larger the genetic distance. I’ll give the inferred populations labels, but don’t take those too seriously.
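To make the definition concrete, here is a sketch of a two-population Fst for biallelic loci in the style of Nei’s G_ST, (H_T − H_S)/H_T with equal population weights, averaged across loci as a ratio of averages. ADMIXTURE’s reported Fst between estimated populations may use a different estimator, so this is illustrative only, and the function names are hypothetical. `top_tail` shows the “look in the tails” outlier logic described for the north/south Han scan.

```python
def fst_two_pops(p1, p2):
    """Two-population Fst (Nei's G_ST style, equal population weights).

    p1, p2: allele frequencies at the same biallelic loci in two populations.
    Returns (per-locus Fst list, multi-locus Fst as a ratio of averages).
    """
    per_locus, num, den = [], 0.0, 0.0
    for a, b in zip(p1, p2):
        hs = (2 * a * (1 - a) + 2 * b * (1 - b)) / 2  # mean within-population heterozygosity
        pbar = (a + b) / 2
        ht = 2 * pbar * (1 - pbar)                    # heterozygosity of the pooled population
        per_locus.append((ht - hs) / ht if ht > 0 else 0.0)
        num += ht - hs
        den += ht
    return per_locus, (num / den if den > 0 else 0.0)

def top_tail(values, fraction=0.01):
    """Indices of the loci in the top `fraction` of a distribution."""
    n = max(1, int(len(values) * fraction))
    return sorted(range(len(values)), key=values.__getitem__, reverse=True)[:n]
```

Identical frequencies give 0, and a locus fixed for different alleles in the two populations (1.0 versus 0.0) gives 1, the “perfectly distinguish membership” extreme.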

Fst divergences between estimated populations:
Population | Fulani | San | European | Maya | Nilotic | Biaka | W African | SW Asian | Sandawe | Mbuti | Mozabite | Bantu
Fulani | 0.00 | 0.19 | 0.15 | 0.26 | 0.11 | 0.13 | 0.09 | 0.14 | 0.10 | 0.18 | 0.12 | 0.10
San | 0.19 | 0.00 | 0.27 | 0.37 | 0.16 | 0.11 | 0.13 | 0.25 | 0.13 | 0.13 | 0.23 | 0.13
European | 0.15 | 0.27 | 0.00 | 0.18 | 0.17 | 0.22 | 0.19 | 0.05 | 0.15 | 0.26 | 0.06 | 0.19
Maya | 0.26 | 0.37 | 0.18 | 0.00 | 0.27 | 0.31 | 0.28 | 0.19 | 0.25 | 0.36 | 0.20 | 0.28
Nilotic | 0.11 | 0.16 | 0.17 | 0.27 | 0.00 | 0.10 | 0.07 | 0.17 | 0.08 | 0.14 | 0.13 | 0.07
Biaka | 0.13 | 0.11 | 0.22 | 0.31 | 0.10 | 0.00 | 0.07 | 0.21 | 0.09 | 0.09 | 0.18 | 0.07
W African | 0.09 | 0.13 | 0.19 | 0.28 | 0.07 | 0.07 | 0.00 | 0.17 | 0.07 | 0.12 | 0.14 | 0.05
SW Asian | 0.14 | 0.25 | 0.05 | 0.19 | 0.17 | 0.21 | 0.17 | 0.00 | 0.14 | 0.25 | 0.06 | 0.18
Sandawe | 0.10 | 0.13 | 0.15 | 0.25 | 0.08 | 0.09 | 0.07 | 0.14 | 0.00 | 0.13 | 0.12 | 0.07
Mbuti | 0.18 | 0.13 | 0.26 | 0.36 | 0.14 | 0.09 | 0.12 | 0.25 | 0.13 | 0.00 | 0.22 | 0.12
Mozabite | 0.12 | 0.23 | 0.06 | 0.20 | 0.13 | 0.18 | 0.14 | 0.06 | 0.12 | 0.22 | 0.00 | 0.14
Bantu | 0.10 | 0.13 | 0.19 | 0.28 | 0.07 | 0.07 | 0.05 | 0.18 | 0.07 | 0.12 | 0.14 | 0.00

Here’s the genetic distance between non-African groups and African ones on a bar plot.

Some consistent trends:

– Mbuti and Khoisan show the largest distance from non-Africans.

– Biaka are next. Again, this may be due to admixture between Biaka and neighboring groups, or, a closer relationship between the Biaka Pygmies and the non-Khoisan/Mbuti African groups with reference to the last common ancestors.

– Roughly equal distance of Bantus and West Africans.

– Marginally smaller distances between the Nilotic cluster and non-Africans.

– Finally, a consistently smaller difference between non-Africans and the Sandawe cluster.
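The ranking in these trends can be checked directly against the Fst table. This is a minimal sketch, not the script behind the bar plot: it simply averages each African cluster’s Fst to the non-African clusters. Treating Mozabite as non-African (alongside European, Maya and SW Asian) is my assumption, based on its small Fst to the European and SW Asian clusters.

```python
# Pairwise Fst between inferred clusters, transcribed from the table above.
LABELS = ["Fulani", "San", "European", "Maya", "Nilotic", "Biaka",
          "W African", "SW Asian", "Sandawe", "Mbuti", "Mozabite", "Bantu"]
FST = [
    [0.00, 0.19, 0.15, 0.26, 0.11, 0.13, 0.09, 0.14, 0.10, 0.18, 0.12, 0.10],
    [0.19, 0.00, 0.27, 0.37, 0.16, 0.11, 0.13, 0.25, 0.13, 0.13, 0.23, 0.13],
    [0.15, 0.27, 0.00, 0.18, 0.17, 0.22, 0.19, 0.05, 0.15, 0.26, 0.06, 0.19],
    [0.26, 0.37, 0.18, 0.00, 0.27, 0.31, 0.28, 0.19, 0.25, 0.36, 0.20, 0.28],
    [0.11, 0.16, 0.17, 0.27, 0.00, 0.10, 0.07, 0.17, 0.08, 0.14, 0.13, 0.07],
    [0.13, 0.11, 0.22, 0.31, 0.10, 0.00, 0.07, 0.21, 0.09, 0.09, 0.18, 0.07],
    [0.09, 0.13, 0.19, 0.28, 0.07, 0.07, 0.00, 0.17, 0.07, 0.12, 0.14, 0.05],
    [0.14, 0.25, 0.05, 0.19, 0.17, 0.21, 0.17, 0.00, 0.14, 0.25, 0.06, 0.18],
    [0.10, 0.13, 0.15, 0.25, 0.08, 0.09, 0.07, 0.14, 0.00, 0.13, 0.12, 0.07],
    [0.18, 0.13, 0.26, 0.36, 0.14, 0.09, 0.12, 0.25, 0.13, 0.00, 0.22, 0.12],
    [0.12, 0.23, 0.06, 0.20, 0.13, 0.18, 0.14, 0.06, 0.12, 0.22, 0.00, 0.14],
    [0.10, 0.13, 0.19, 0.28, 0.07, 0.07, 0.05, 0.18, 0.07, 0.12, 0.14, 0.00],
]
# Assumption: Mozabite is grouped with the non-African clusters.
NON_AFRICAN = ["European", "Maya", "SW Asian", "Mozabite"]
AFRICAN = [name for name in LABELS if name not in NON_AFRICAN]


def mean_fst(group, targets):
    """Average Fst between one cluster and a list of other clusters."""
    i = LABELS.index(group)
    return sum(FST[i][LABELS.index(t)] for t in targets) / len(targets)


def rank_african_clusters():
    """African clusters sorted from most to least distant from the non-African ones."""
    scores = {name: mean_fst(name, NON_AFRICAN) for name in AFRICAN}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, dist in rank_african_clusters():
        print(f"{name:<10} {dist:.3f}")
```

Averaging over four columns is a blunt summary (the Maya column inflates every value), so the per-population bars remain the better picture; the average is only a quick consistency check on the ordering.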

As always, we need to remember that these probably aren’t real, discrete ancestral groups. I have no hesitation in presuming some low-level, consistent gene flow over time between the western Mediterranean groups, of which the Mozabites are part, and some of the Nilotic populations in north-central Africa. This equilibration of gene frequencies would naturally reduce the Fst value. Second, the relative closeness of the Sandawe cluster to non-Africans jumped out at me when I first looked at the African data. It just strikes me as weird.
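Why gene flow lowers Fst can be made concrete with Wright’s island model, where at equilibrium Fst ≈ 1/(1 + 4Nm), with N the effective size of each population and m the fraction of migrants per generation. The sketch below only evaluates that textbook approximation. It assumes an idealized infinite-island model at equilibrium, which real African and Mediterranean populations certainly are not, and the parameter values are hypothetical.

```python
def island_model_fst(n_e: float, m: float) -> float:
    """Equilibrium Fst under Wright's infinite-island model: 1 / (1 + 4 * Ne * m)."""
    if n_e <= 0 or not 0.0 <= m <= 1.0:
        raise ValueError("need Ne > 0 and 0 <= m <= 1")
    return 1.0 / (1.0 + 4.0 * n_e * m)


if __name__ == "__main__":
    n_e = 1000  # hypothetical effective size per population
    for m in (0.0001, 0.001, 0.01):  # hypothetical migration rates
        print(f"Nm = {n_e * m:>5.1f}  Fst ~ {island_model_fst(n_e, m):.3f}")
```

Under these assumptions about one migrant per generation (Nm = 1) already holds Fst near 0.2, and ten migrants per generation push it down to roughly 0.02.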

Here’s Wikipedia on the Sandawe:

The Sandawe are an agricultural ethnic group based in the Kondoa district of Dodoma Region in central Tanzania. In 2000 the Sandawe population was estimated to number 40,000.

The Sandawe language is a tonal language with clicks, apparently related to the Khoe languages of southern Africa. Recent research suggests that the ancestors of the Khoe were pastoralists, and migrated into southern Africa from the northeast, perhaps from the region of the modern Sandawe.

But the Sandawe don’t seem to be that close to the South African Bushmen samples. Here’s a multidimensional scaling of the Fst relationships of selected inferred ancestral African groups (weight the x-axis more):

An aspect of PCA plots which always jumps out at you is the gap between African groups and non-African ones, often spanned by populations which likely have recent admixture. One hypothesis to explain this is that there’s been little gene flow between Africa and the rest of the world since the Out of Africa event, probably due to ecology (the Sahara). But here’s another explanation: the Bantu expansion has wiped clean much of the genetic variation of central and eastern Africa, the very variation which might partly span the African vs. non-African gap. The archaeology and anthropology indicate that both of the groups currently dominant in much of eastern Africa and down to the south, the Bantu and Nilotic peoples, are intrusive on the scale of the past 3,000 years. So groups like the Hadza and the Sandawe are presumed to be relics of the older cultural and genetic variation. This may be why the Sandawe are closer to Eurasians than other African groups are, once you control for likely admixture (e.g., the Fulani). Or, it may be that the Sandawe themselves have an older admixture event due to back-migration from Eurasia….

Finally, let me leave you with a bunch of MDS plots which visualize the Fst differences.
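For readers who want to produce MDS plots like these from the Fst table, here is a sketch that applies classical (Torgerson) MDS to the African clusters. Which MDS variant was used for the plots is not stated, so this is an assumption. It also treats Fst directly as a distance even though Fst is not a Euclidean metric, which is why negative eigenvalues are clipped. The small Jacobi eigen-solver is only there to keep the example dependency-free; in practice `numpy.linalg.eigh` would do the job.

```python
import math

# Fst among the African clusters, taken from the table above.
NAMES = ["Fulani", "San", "Nilotic", "Biaka", "W African", "Sandawe", "Mbuti", "Bantu"]
D = [
    [0.00, 0.19, 0.11, 0.13, 0.09, 0.10, 0.18, 0.10],
    [0.19, 0.00, 0.16, 0.11, 0.13, 0.13, 0.13, 0.13],
    [0.11, 0.16, 0.00, 0.10, 0.07, 0.08, 0.14, 0.07],
    [0.13, 0.11, 0.10, 0.00, 0.07, 0.09, 0.09, 0.07],
    [0.09, 0.13, 0.07, 0.07, 0.00, 0.07, 0.12, 0.05],
    [0.10, 0.13, 0.08, 0.09, 0.07, 0.00, 0.13, 0.07],
    [0.18, 0.13, 0.14, 0.09, 0.12, 0.13, 0.00, 0.12],
    [0.10, 0.13, 0.07, 0.07, 0.05, 0.07, 0.12, 0.00],
]


def jacobi_eigen(a, max_sweeps=100, tol=1e-30):
    """Symmetric eigen-decomposition by cyclic Jacobi rotations.

    Returns (eigenvalues, V) where column k of V is the eigenvector for eigenvalue k.
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A, which zeroes a[p][q]
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d, k=2):
    """Double-centre -D^2/2 and keep the top-k eigenpairs as coordinates."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)
    top = sorted(range(n), key=lambda i: vals[i], reverse=True)[:k]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    coords = [[vecs[r][c] * math.sqrt(max(vals[c], 0.0)) for c in top] for r in range(n)]
    return coords, [vals[c] for c in top]


if __name__ == "__main__":
    coords, _ = classical_mds(D)
    for name, (x, y) in zip(NAMES, coords):
        print(f"{name:<10} {x:+.3f} {y:+.3f}")
```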


Zack has started exploring the K’s of his merged data set for HAP. A commenter suggests that:

As you have begun interpreting the reference results, let me make a friendly warning: you have to keep in mind that most of the reference populations of ethnic groups are extremely limited in sample size (with only between 2 and 25 individuals) and from very obscure sources, and you should keep away from drawing conclusions about millions of people based on such limited number of individuals.

This seems a rather reasonable caution. But I don’t think such a vague piece of advice really adds any value. These sorts of caveats are contingent upon:

– The scope of the question being asked (i.e., how fine-grained the variation you are attempting to measure is)

– The sample size

– The representativeness

– The thickness of the marker set (10 autosomal markers vs. 500,000 SNPs)

This isn’t a qualitative issue, easily divided into “right” and “wrong.” Sometimes an N = 1 is very insightful. That’s why the whole genome of one Bushman was very useful. In fact, the whole genome of any random Sub-Saharan African and that of any random non-African (meaning ancestry from before 1500 in those regions) will clearly reflect the differences between these two broad population sets in terms of genomic variation. Adding individuals to build a larger sample would of course be very informative, and would allow us to answer many more questions. But the point is that even small sample sizes can answer properly framed queries.
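The reason a single genome can answer a properly framed question is marker count: each locus adds a little evidence, and the evidence sums. The simulation below is a hypothetical illustration, not an analysis of real data. It assumes two populations whose allele frequencies differ by only 0.05 at every locus, Hardy-Weinberg genotypes, independent loci, and known frequencies, and it assigns one individual to whichever population gives the higher likelihood.

```python
import math
import random


def simulate_frequencies(n_loci, delta, rng):
    """Hypothetical allele frequencies; population B differs from A by +/- delta at each locus."""
    freqs = []
    for _ in range(n_loci):
        p_a = rng.uniform(0.1, 0.9)
        freqs.append((p_a, p_a + rng.choice((-delta, delta))))
    return freqs


def draw_genotype(freqs, rng):
    """One diploid individual from population A: alt-allele counts 0, 1 or 2 (HWE)."""
    return [int(rng.random() < p_a) + int(rng.random() < p_a) for p_a, _ in freqs]


def log_likelihood_ratio(genotype, freqs):
    """log P(genotype | A) - log P(genotype | B); positive means 'assign to A'."""
    total = 0.0
    for g, (p_a, p_b) in zip(genotype, freqs):
        total += g * math.log(p_a / p_b) + (2 - g) * math.log((1 - p_a) / (1 - p_b))
    return total


def assignment_accuracy(n_loci, trials=200, delta=0.05, seed=1):
    """Fraction of single individuals from A that are correctly assigned to A."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        freqs = simulate_frequencies(n_loci, delta, rng)
        if log_likelihood_ratio(draw_genotype(freqs, rng), freqs) > 0:
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n_loci in (10, 100, 2000):
        print(n_loci, assignment_accuracy(n_loci))
```

With 10 loci the assignment is not much better than a coin flip. With a few thousand loci of the same tiny frequency difference it is essentially always right, which is the “thickness of the marker set” point in miniature.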

Another issue is representativeness. The HGDP data set was biased at the outset toward more isolated and distinctive groups. There was a belief that many of these groups were going to disappear within a generation, and that their genetic uniqueness should be recorded (this seems to have been correct). So apparently the clusters generated from HGDP are “cleaner” in their separation than those from the POPRES sample, which is derived from a more cosmopolitan urban set of populations. We also have the HapMap sample, and some of the data sets Zack has merged into HGDP and HapMap (there are likely other public data sets; Zack was looking for those with South Asians).

After 10 years of results generated from these data sets I think we have some idea of the errors and biases introduced by skewed representativeness and small sample size (HapMap has a thicker marker set, but HGDP has better population coverage). In other words, we should have some intuition of where to be careful, and where not to be. For example, small tribal groups (as well as cultural isolates, like the Roma) are likely to exhibit genetic distinctiveness due to their small long-term effective population size. On the other hand, if you have a set of distinct tribal groups, one presumes that the patterns they share would reflect broad macro-regional genetic variation. In Zack’s combined data set he has a South Indian tribe and a Pakistani one (I mean the Kalash; I understand that Pathans and Baloch are tribal peoples, but they’re expansive and heterogeneous). Any common element between these two groups in relation to Iranians is presumably not a coincidence. Random genetic drift usually pushes allele frequencies in different directions in different populations, so genetic commonalities between separate isolates probably reflect common ancestry.
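The drift argument can be illustrated with a minimal Wright-Fisher simulation: replicate isolated populations start at the same allele frequency, and each drifts on its own. The population sizes and the number of generations are hypothetical. The expected spread across replicates is p0(1 − p0)[1 − (1 − 1/2N)^t], so small isolates wander much further, and because each wanders in its own random direction, shared deviations across many loci are unlikely to arise by drift alone.

```python
import random


def wright_fisher(p0, n_diploid, generations, rng):
    """Allele frequency after neutral Wright-Fisher drift (binomial resampling of 2N gene copies)."""
    p, two_n = p0, 2 * n_diploid
    for _ in range(generations):
        p = sum(rng.random() < p for _ in range(two_n)) / two_n
    return p


def expected_variance(p0, n_diploid, generations):
    """Expected variance of the frequency across replicate isolates."""
    return p0 * (1 - p0) * (1 - (1 - 1 / (2 * n_diploid)) ** generations)


def simulated_variance(p0, n_diploid, generations, replicates=100, seed=3):
    """Variance of final frequencies across independently drifting replicate isolates."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_diploid, generations, rng) for _ in range(replicates)]
    mean = sum(finals) / replicates
    return sum((f - mean) ** 2 for f in finals) / replicates


if __name__ == "__main__":
    for n in (10, 200):  # hypothetical small tribe vs. larger population
        print(n, round(simulated_variance(0.5, n, 20), 4), round(expected_variance(0.5, n, 20), 4))
```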

The main point I’m trying to make is that we’re beyond the point of generic cautions. Rather, there are specific pitfalls which we need to be cognizant of. So if you know specific ethnographic details, that is useful. If there are statistical tricks and tips, that is also useful (for example, larger sample sizes exhibit diminishing returns in statistical power). One also needs to keep in mind ascertainment bias: the current generation of SNP chips is tuned to European polymorphisms, so the chips might miss loci where other populations are polymorphic but Europeans are not.
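The diminishing returns are easy to see from the binomial standard error of an allele-frequency estimate, sqrt(p(1 − p)/2n) for n diploid individuals. This is the textbook formula under random sampling of unrelated individuals, a simplification that says nothing about ascertainment bias.

```python
import math


def allele_freq_se(p, n_individuals):
    """Binomial standard error of an allele-frequency estimate from n diploid individuals (2n alleles)."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


if __name__ == "__main__":
    for n in (2, 5, 25, 100, 400):  # from the small reference samples discussed above, and beyond
        print(f"n = {n:>3}  SE = {allele_freq_se(0.5, n):.3f}")
```

Going from 2 to 25 individuals cuts the error from 0.25 to about 0.07, while every further quadrupling of the sample only halves it.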

By analogy, unsecured credit can be problematic. Yes, I think we knew that. The key is to identify those with the means and ability to use credit responsibly. The tools and data are now available to the masses. A big “BE CAREFUL” sticker is not helpful. What is helpful are concrete and specific pointers.

For what it’s worth, I found Zack’s bar plot hard to read, so here is one I generated with larger labels (K = 6):

Yesterday Zack gave me a personal vector: 66, 1, 4, 10, 14, 0, 4, 0, 0, 3. If you’ve been reading my posts I think you know how to interpret that….


mtDNA haplogroup G1a2

The pith: In this post I examine the most recent results from 23andMe for my family in the context of familial and regional (Bengal) history. I also use these results to offer up a framework for the ethnogenesis of the eastern Bengali people within the last 1,000 years, and their relationship to other South Asian and Southeast Asian populations.

Since I received my 23andMe results last May I’ve been blogging about it a fair amount. In a recent post I inferred that perhaps I had a recent ancestor who was an ethnic Burman or from some related group. My reasoning was that this explained a pattern of elevated matches on chromosomal segments with populations from southwest China in the HGDP data set. But now we have more than my genome to go on. This week I got the first V3 chip results from a sibling, and yesterday the results from my parents came in. One thing that I immediately found interesting was my father’s mtDNA haplogroup assignment, G1a2. This came from his maternal grandmother, and as you can see it has a distribution which is mostly outside of South Asia. In case you care, I asked my father about her background, and like my patrilineage she was a “Khan,” though an unrelated one (“Khan” is just an honorific). I received these results before the total genome assessment, and so initially assumed this confirmed my hunch that my father had some unknown recent ancestry of “eastern” provenance. But it turns out my hunch is probably wrong. In fact, my parents have about the same “eastern” proportion, with my mother slightly more! My expectation was that perhaps my mother would be around 25-30% “Asian,” and my father above 50%. In reality, my father is 38% and my mother 40%.

Image credit: f_mafra

Below are the “Ancestry Paintings” generated by 23andMe for my family (so far). What you see are the 22 non-sex chromosomes, which have two copies each, and assignments to “Asian,” “European,” and “African” ancestry groups. The reference populations used to generate these assignments come from the HapMap: the northern European sample of white Americans from Utah, Chinese from Beijing, Japanese from Tokyo, and ethnic Yoruba from Nigeria. An assignment to one of these classes denotes only that that region of the genome is closest to that category in identity. It does not imply that your recent ancestry is European or Asian (African is probably a different matter, but there are many complaints about the results for African Americans and East Africans in the 23andMe forums). This caveat is especially important for South Asians, because we generally find that we’re ~75% European and ~25% Asian. All that means is that though most of our genetic affinity is with Europeans, a smaller fraction seems to resemble Asians more. Via “gene sharing” on 23andMe I can see that the Asian fraction varies from ~35% in South India and Sri Lanka to ~10% in Pakistan and Punjab. This is not because South Indians have more East Asian ancestry than Punjabis. Rather, to a great extent the South Asian genome can be decomposed into two ancestral elements: one with a distant, but closer, affinity to populations of eastern Eurasia, and one with a close affinity to populations of western Eurasia. These are what some have termed “Ancient South Indians” (ASI) and “Ancient North Indians” (ANI). ASI ancestry, which is probably just a touch under 50% in South Asians overall, then seems to shake out as somewhat more Asian than European.* The fraction of ASI increases as one moves south and east in South Asia (and as one moves down the caste status ladder).
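The point that a painted segment is only “closest to” a reference, not evidence of recent ancestry from it, can be illustrated with a toy painter. This is emphatically not 23andMe’s algorithm, which is not described here. It is a minimal sketch assuming three reference panels with hypothetical allele frequencies, independent SNPs, and fixed windows, each window labeled with the panel under which it is most likely. The demo paints a haplotype from a hypothetical unsampled population whose frequencies sit 60:40 between the “Asian” and “European” panels.

```python
import math
import random

REFS = ("European", "Asian", "African")  # labels mirror the three painting categories


def reference_freqs(n_snps, rng):
    """Hypothetical per-SNP allele frequencies for each reference panel."""
    return {r: [rng.uniform(0.05, 0.95) for _ in range(n_snps)] for r in REFS}


def draw_haplotype(freqs, rng):
    """One haploid chromosome (0/1 alleles) drawn from the given frequencies."""
    return [int(rng.random() < f) for f in freqs]


def window_log_lik(alleles, freqs):
    return sum(math.log(f if a else 1.0 - f) for a, f in zip(alleles, freqs))


def paint(haplotype, refs, window):
    """Label each window with the reference panel under which it is most likely."""
    labels = []
    for start in range(0, len(haplotype), window):
        sl = slice(start, start + window)
        scores = {r: window_log_lik(haplotype[sl], refs[r][sl]) for r in REFS}
        labels.append(max(scores, key=scores.get))
    return labels


def fractions(labels):
    return {r: labels.count(r) / len(labels) for r in REFS}


if __name__ == "__main__":
    rng = random.Random(7)
    n_windows, window = 100, 100
    refs = reference_freqs(n_windows * window, rng)
    # Hypothetical unsampled population sitting 60:40 between the Asian and European panels.
    unsampled = [0.6 * a + 0.4 * e for a, e in zip(refs["Asian"], refs["European"])]
    print(fractions(paint(draw_haplotype(unsampled, rng), refs, window)))
```

Every window must receive one of the three labels, so the unsampled population shows up as a mix of “Asian” and “European” segments rather than as something of its own.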


First, I want to note that I’ll be using abbreviations for my family members now and then (this applies to future posts). My father will be RF, my mother will be RM, and my siblings will be RS, with a number to denote which sibling. So currently we have RS1. As you can see, in a gestalt sense we resemble each other a great deal as a family. We’re about 40% Asian and 60% European. The extent of fragmentation indicates that our admixture is not that recent; otherwise, the Asian and European fragments would cluster on one strand or the other. Some have suggested that my mother does exhibit less fragmentation. One hypothesis for why this may be is that her maternal grandfather was reputedly from a family of Middle Eastern origin who had resettled in South Asia, first in Delhi, and later in southeast Bengal (specifically, the district of Noakhali). Since he presumably would have had hardly any Asian ancestry according to 23andMe’s algorithm, the homologs inherited from him would be overwhelmingly European, with only one generation of recombination intervening.
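The link between fragmentation and the age of admixture can be sketched with a toy simulation: a population formed by a single 50:50 pulse of “E” and “A” chromosomes, followed by random mating, with one 1-Morgan chromosome and Poisson-distributed crossovers. All parameters are hypothetical, and real painting output is far noisier than this idealization. Ancestry tracts shrink as generations of recombination accumulate, which is why long unbroken runs point to recent input.

```python
import bisect
import math
import random


def poisson(lam, rng):
    """Knuth's method for a Poisson draw (adequate for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def pieces(chrom, lo, hi):
    """(start, label) pieces of a chromosome restricted to the interval [lo, hi)."""
    starts = [s for s, _ in chrom]
    i = bisect.bisect_right(starts, lo) - 1
    out = [(lo, chrom[i][1])]
    for s, label in chrom[i + 1:]:
        if s >= hi:
            break
        out.append((s, label))
    return out


def recombine(x, y, rng):
    """One meiosis on a 1-Morgan chromosome: Poisson(1) crossovers at uniform positions."""
    points = sorted(rng.random() for _ in range(poisson(1.0, rng)))
    bounds = [0.0] + points + [1.0]
    parents = (x, y) if rng.random() < 0.5 else (y, x)
    merged = []
    for k in range(len(bounds) - 1):
        for start, label in pieces(parents[k % 2], bounds[k], bounds[k + 1]):
            if not merged or merged[-1][1] != label:  # merge runs of the same ancestry
                merged.append((start, label))
    return merged


def simulate(generations, pop_size=200, seed=0):
    """Random mating after a single 50:50 admixture pulse of 'E' and 'A' chromosomes."""
    rng = random.Random(seed)
    pop = [[(0.0, "E" if i % 2 else "A")] for i in range(pop_size)]
    for _ in range(generations):
        pop = [recombine(rng.choice(pop), rng.choice(pop), rng) for _ in range(pop_size)]
    return pop


def mean_tract_length(pop):
    """Mean length (in Morgans) of runs carrying a single ancestry label."""
    lengths = []
    for chrom in pop:
        edges = [s for s, _ in chrom] + [1.0]
        lengths.extend(b - a for a, b in zip(edges, edges[1:]))
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    for g in (1, 2, 5, 20):
        print(g, round(mean_tract_length(simulate(g)), 3))
```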

To assess the plausibility of various hypotheses to explain the pattern of the results you need all the non-genomic information. Above is a map of British India. I’ve pointed to the region of Bengal from which my family comes. Of my great-grandparents, 7 out of 8 were born in Comilla (here meaning a greater expanse to the southeast of Dhaka than the current Bangladeshi administrative division). The eighth, one of RM’s grandparents, was born in Noakhali, which is just to the southeast of Comilla. 4 out of 8 great-grandparents were born within 5 miles of the town of Chandpur (RF’s grandparents). 3 out of 4 of RM’s grandparents were born within 5 miles of the village of Homna. These two locations are about 30 miles from each other as the crow flies, though transport between them would have been by water in an earlier era (Homna is on the Meghna river, which is actually a more substantial body of water than the Ganges by the time the latter reaches Bangladesh). This region is bounded on the west by the Padma river, which narrows at Chandpur to about 2 miles in width (average depth ~1,000 feet). To the east is the Indian state of Tripura. This is a relatively porous border, defined on the map, not imposed by geography. You can see that in some regions the Bangladesh-India border here in the east actually bisects rice paddies.

Tripuri children

Today Tripura state is majority ethnic Bengali due to mass migration of Hindus from what was East Bengal (later East Pakistan, and now Bangladesh) during the 20th century. But its indigenous people are the Tripuri, a tribe whose native language is clearly Tibeto-Burman, and whose physical type points to their connection with populations to the north and east. At the same time, ~90% of the Tripuri are Hindus, and during the period of Islamic rule in South Asia the rajahs of Tripura styled themselves defenders of Hindu civilization (just as the Tai Ahoms of Assam did). As such, linguistically and genetically the native people of Tripura exhibit a sharp contrast to the Indo-Aryan peoples of the Gangetic Plain, of whom the Bengalis are the easternmost representatives along with the Assamese. But they have also long been part of the South Asian cultural scene, and can no longer be viewed as purely intrusive (their oral history indicates that they arrived before the Muslims, for one).

Finally, regarding the detailed backgrounds of my 8 great-grandparents: 2 were of the Khan class, 1 was from a family of Hindu Thakurs who had recently converted to Islam, another was of the family name Sarkar, and 1 was likely from a family of Middle Eastern transplants to South Asia, at least in part. The 4 remaining great-grandparents were Bengali Muslims, with no particular background information beyond that known by my parents.

I gave you all this because genetic variation is strongly conditioned upon geographical and cultural parameters. Water barriers seem to have been particularly effective in the pre-modern period at dividing people culturally and genetically (though ironically water was also a precondition for any bulk trade). Language is another major parameter of difference. And finally, there is religion. Given the backgrounds above, I would not be surprised if 300 years ago the majority of my ancestors in that generation were Hindus; obviously there is some fluidity in this. I provide the data on the radius of birthplaces because we know from European results that even villages exhibit genetic clustering. This is mitigated in my family because my father has a diverse background among his grandparents as far as community goes, while my mother has a grandparent who was from a different district, and to a great extent from a different ethnic group in biological terms.

When I initially saw that I was ~40% Asian I was a little taken aback by the high proportion (remember, the average South Asian is about 25% Asian), but there were two parsimonious explanations: (a) I had a lot of ASI, or (b) I had ancestry which did not seem South Asian as such, but was genuinely from East Asia. To ascertain whether it was the former I began proactively gene sharing with a wide range of South Asians on 23andMe. After dozens of individuals it became clear that I was outside of the normal interval of variation. I was more Asian than individuals from South India or Sri Lanka. Additionally, even these individuals tended to be genetically closest to Central/South Asians in the HGDP data set, while I was closest to East Asians. Also, on the two-dimensional PCA projected onto Central/South Asians I was definitely outside of the cluster of all the other South Asians. Eventually, I did find someone who broke the magic 35% barrier of Asian…and that individual was a Bangladeshi, at 38%. And, like me, he was closer to East Asians on the basic “Global Similarity” match. He also carried a Y chromosomal lineage which was rare in South Asia and common among the Hmong. Finally, when Dienekes started his Dodecad Ancestry Project it was clear that about 15% of my ancestry clustered with an element which was not South Asian, but East Asian. If one removes this fraction, I would be about 70% European and 30% Asian, absolutely within the normal range for someone with ancestry to the east or south of the subcontinent.

If you’ve read up to this point, you may be wondering how it is that my father is 38% Asian and my mother is 40% Asian, and I’m 43% Asian. After all, shouldn’t I be an average between the two? Actually, on the PCA scatter plot I am (along with my sibling) exactly between my parents (you can’t see the offspring because the flags are just too large). So why the difference? First, remember that the PCA is projecting you onto a two-dimensional plane where the x and y axes represent the two biggest components of variance in the data set. In other words, it’s yanking out the subset of genetic variance which really stands out in terms of between-population difference. This is how an individual who is a first generation Eurasian can be so far from their parents on this plot, but still exhibit a great deal of identity by state in terms of total genome; there’s a lot of variation that the two-dimensional plot does not capture (e.g., private variants in family lineages). The Ancestry Painting estimates are different; they look across the whole genome and assess each region’s genetic affinity to the three reference populations. So to repeat: with PCA you have over 50 reference populations vs. 3, and a small proportion of the total genetic variation vs. the whole genome. Both methods are reporting real and valid results, but they’re somewhat different.
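The reason a child lands between its parents on a PCA plot is that projection onto principal axes is linear, and a child’s expected genotype is the average of the parents’ genotypes. The sketch below illustrates this on two simulated populations (all frequencies hypothetical) with a bare-bones power-iteration PCA. A real analysis would use dedicated software rather than this toy, and an actual child deviates somewhat from the midpoint because each parent transmits only one allele per locus.

```python
import random


def simulate_genotypes(n_per_pop, n_snps, rng):
    """Two hypothetical populations; rows are individuals, entries are alt-allele counts."""
    fa = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]
    fb = [min(0.95, max(0.05, f + rng.gauss(0.0, 0.25))) for f in fa]
    return [[int(rng.random() < f) + int(rng.random() < f) for f in freqs]
            for freqs in (fa, fb) for _ in range(n_per_pop)]


def top_axes(rows, k=2, iters=60, seed=0):
    """Column means and top-k principal axes via power iteration with Gram-Schmidt deflation."""
    rng = random.Random(seed)
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    xc = [[g - m for g, m in zip(row, mu)] for row in rows]
    cols = list(zip(*xc))
    axes = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in mu]
        for _ in range(iters):
            u = [sum(a * b for a, b in zip(row, v)) for row in xc]    # X v
            v = [sum(a * b for a, b in zip(col, u)) for col in cols]  # X^T (X v)
            for ax in axes:  # remove components along axes already found
                dot = sum(a * b for a, b in zip(v, ax))
                v = [a - dot * b for a, b in zip(v, ax)]
            norm = sum(a * a for a in v) ** 0.5
            v = [a / norm for a in v]
        axes.append(v)
    return mu, axes


def project(genotype, mu, axes):
    """Coordinates of one genotype vector on the principal axes."""
    centred = [g - m for g, m in zip(genotype, mu)]
    return [sum(c * a for c, a in zip(centred, ax)) for ax in axes]


if __name__ == "__main__":
    rng = random.Random(5)
    rows = simulate_genotypes(20, 200, rng)
    mu, axes = top_axes(rows)
    father, mother = rows[0], rows[20]  # one parent from each hypothetical population
    expected_child = [(a + b) / 2 for a, b in zip(father, mother)]
    for label, g in (("father", father), ("mother", mother), ("child", expected_child)):
        print(label, [round(x, 2) for x in project(g, mu, axes)])
```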

So there are two very simple methodological explanations for the discrepancy above which I can think of. First, I’m on V2, while my parents and sibling are on V3; I know this has made a difference in other measurements. Second, there’s clearly some “noise” within this algorithm, resulting in people with trace African or Asian ancestry which isn’t real, even if you take into account the kludgey nature of the reference populations. But let’s take the results at face value. With the ancestry painting, recall how the European and Asian components were chunky across the genome? Both of my parents received half their genomes from each of their parents. My own chromosomes are a mosaic of those of my grandparents. Some of the original linkage between genomic regions due to their physical location on the same strand has been broken apart by recombination in the two generations downstream from my grandparents. Concretely, there were two instances of meiosis which produced sex cells. Therefore, some of the associations of alleles present in my grandparents have been transformed within me. But even without recombination, it is clear that one homologous chromosome could be more European or Asian than the total genome average. Because only one of these is passed to any given offspring, there is going to be variance from sibling to sibling. Genetics is not a pure blending process. That may be why I am 43% Asian while RS1 is 40% Asian. We’re both sampling from our parents’ genes, and there’s going to be variance in that process (on the chromosomal level you have 22 autosomal draws from each parent where each draw has two outcomes).
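The sibling-to-sibling variance can be simulated under the same simplification the paragraph uses, no recombination: each parent passes one intact homolog of each of the 22 autosomes, chosen at random. The parental averages (38% and 40%) come from the text. The homolog-to-homolog spread, and therefore the per-chromosome values, are hypothetical, and the chromosome lengths are approximate and used only as weights.

```python
import random

# Approximate autosome lengths in Mb (GRCh38, rounded); used only as weights.
CHROM_MB = [249, 242, 198, 190, 182, 171, 159, 145, 138, 134, 135, 133,
            114, 107, 102, 90, 83, 80, 59, 64, 47, 51]
TOTAL_MB = sum(CHROM_MB)


def make_parent(mean, spread, rng):
    """Hypothetical parent: (homolog 1, homolog 2) 'Asian' fraction for each autosome."""
    def clamp(x):
        return min(1.0, max(0.0, x))
    return [(clamp(rng.gauss(mean, spread)), clamp(rng.gauss(mean, spread))) for _ in CHROM_MB]


def haploid_fraction(homologs):
    """Length-weighted 'Asian' fraction of one haploid set of 22 autosomes."""
    return sum(f * mb for f, mb in zip(homologs, CHROM_MB)) / TOTAL_MB


def person_fraction(parent):
    """Genome-wide fraction of a diploid individual: the average of both haploid sets."""
    return (haploid_fraction([h1 for h1, _ in parent]) +
            haploid_fraction([h2 for _, h2 in parent])) / 2


def child_fraction(father, mother, rng):
    """No recombination: the child gets one intact homolog per autosome from each parent."""
    return (haploid_fraction([rng.choice(pair) for pair in father]) +
            haploid_fraction([rng.choice(pair) for pair in mother])) / 2


if __name__ == "__main__":
    rng = random.Random(2)
    father, mother = make_parent(0.38, 0.15, rng), make_parent(0.40, 0.15, rng)
    kids = [child_fraction(father, mother, rng) for _ in range(5000)]
    mean = sum(kids) / len(kids)
    sd = (sum((k - mean) ** 2 for k in kids) / len(kids)) ** 0.5
    print(f"parents {person_fraction(father):.3f} {person_fraction(mother):.3f}")
    print(f"children mean {mean:.3f}, sd {sd:.3f}")
```

The children average out to the midparent value, yet individual siblings scatter around it. Adding recombination would shrink that spread somewhat, since each transmitted chromosome would then be a mix of both homologs.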

An interesting implication of this is that the grandchildren of a multiracial couple will exhibit variance in their ancestral quanta from major racial groups. This is one reason why it is a fallacy to presume that intermarriage will result in the washing away of biological diversity. And processes such as assortative mating could even presumably extract out “pure” individuals from an originally admixed random-mating population.

With all that said, given an N = 3 from eastern Bengal I now believe that I am not an exception with recent Southeast Asian ancestry, but rather that eastern Bengal is part of the gene frequency cline between South Asia and Southeast Asia, and as such has a substantial fraction of eastern ancestry. Zack has my parents’ data, so once the results come back from the first runs of HAP I believe that he will see the same pattern of substantial non-South Asian ancestry in them that Dienekes found in me. The cline here is still sharp. In the proportions inferred by ancestral quanta algorithms, the average Bangladeshi is probably only 10-20% interchangeable with the average Burmese (remember that the Burmese probably have a small South Asian component too). In contrast, the average Bangladeshi is probably 80-90% interchangeable with a resident of Bihar (the closest match in total SNP comparison among the 23andMe users I’m sharing with is a Bihari, not the two other ethnic Bengalis). This is clearly a function of geography: the north-south ranges in Burma seal it off from South Asia, while there are open plains from northern Bangladesh to Bihar. In some ways Burma has more cultural affinity and connection with peninsular South Asia because of the ease of maritime travel. The prevalence of Theravada Buddhism in Burma is a testament to the association of the lower Irrawaddy region with Sri Lanka.

Back to Bangladesh. One aspect of the Indian subcontinent in terms of religious demography is that the heart of Indo-Islam, the Delhi area, never had a Muslim majority. Rather, Muslims were a majority along the northwestern and northeastern fringes (along with a few other districts, such as northern Kerala). The predominance of Islam in the northwest isn’t that surprising, as that region borders upon the Dar-al-Islam proper. But what about Bengal? In the late 19th century the British were apparently surprised that the united Bengal (which includes roughly the modern state of West Bengal in India, and Bangladesh) had a Muslim majority. Because of differential birth rates and conversion (the latter includes branches of my family, as I note above) about 2 out of 3 ethnic Bengalis alive today are Muslim, with the balance being Hindu. Bangladesh is estimated to be 90% Muslim, while West Bengal is 25% Muslim. Even today, after generations of Hindu outmigration, one pattern within Bangladesh is the relative concentration of Hindus to the west and north (also, Hindus in Bangladesh tend to be urban). The “buckle” of the “Koran belt” in Bangladesh is actually the district of Noakhali, on the southeast fringe of Bengal. My mother’s maternal grandfather, who came from a lineage of pirs who had originally settled in the Muslim heartland in Delhi, was from Noakhali. It is apparently said that in Noakhali even the Hindus know proper Islamic forms!**

An explanation for this pattern is that the religious influence and power of Hindu elites declined as a direct function of distance from the regions of West Bengal, which were closer to the core Aryavarta, and had traditionally been the locus of power of Hindu dynasties before the rise of Islam. Additionally, Bengal was the last region of the mainland subcontinent with a robust Buddhist society, during the flowering of the Pala Empire around the year 1000. It is therefore suggested that many Bengali Muslims were converted directly from Buddhism, not Hinduism (there remains even today a small minority of ethnic Bengali Buddhists, who carry the surname “Barua.” This is in distinction to the descendants of Tibeto-Burman people who now speak Bengali, but retain a tribal identity and Theravada Buddhist religion). Also, it may be that eastern Bengal was populated mostly by animist tribes before the arrival of Muslims, and just as European colonial powers were more successful in Asia at spreading their religion among marginalized people (e.g., tribal peoples in northeast India and Southeast Asia are often Christian), so Islam found purchase among those outside of the Hindu caste system.

These models are broadly persuasive to me. But I am still suspicious that there was such a strong disjunction in the depth of Hindu institutions in western vs. eastern Bengal; after all, the kings of Tripura to the east were Hindu when Islam was new in South Asia. If being tribal and marginal to the core Hindu civilization was one of the grounds for susceptibility to Islam, it is peculiar that it is precisely many tribal people in modern Bangladesh who are not Muslim. Indeed, the Tibeto-Burman populations nearer to Indian groups in eastern South Asia are Hindu or Buddhist, not Muslim (those further in the hinterlands were not integrated into any South Asian religion, but were converted to Christianity by Western missionaries within the last century).

Instead, I find the model espoused in The Rise of Islam and the Bengal Frontier, 1204-1760 broadly plausible as a complement, or even substitute, to the above hypotheses. Additionally, it has the utility of making sense of the genetic data which I have presented here so far. The author argues that eastern Bengal, most of modern Bangladesh, was very lightly populated before the conquest of Bengal by Muslims in the 13th century. During the modern era the western region of Bengal, in India, has tended to have issues with the moribund nature of many of its water courses. But one thousand years ago this region was more active in terms of sedimentation, while eastern Bengal was a wilderness. Over the centuries there has been a shift of large rivers to the east, opening up that area to cultivation because of improved transport. Additionally, the arrival of Muslims also resulted in the spread of new techniques of land clearing and settlement. The rough model is that eastern Bengal is in fact a relatively newly settled territory in terms of its current demographic density. As the clearance and settlement operations were performed by Muslim elites, many of the peasants who settled these lands were either Muslim, or more likely, adopted the religion of their landlords. Because of the virgin nature of the territory these original settlers entered into a phase of massive demographic expansion, to the point where eastern Bengal (Bangladesh) is today twice as populous as western Bengal (West Bengal). The key here is that there need not have been a massive conversion of enormous masses of marginal animist, Hindu or Buddhist peasants. Rather, all one needs is a modest number of converted Bengali peasants to enter into exponential population growth until the land is “filled.” (Interestingly, one sees similar patterns between descendant populations in both the USA and among Koreans: the religions in the “core” homelands are very different in constitution from those of the Diaspora.)

I find this persuasive for two major reasons. First, Peter Bellwood’s First Farmers documents the difficulties that populations which have not been engaged in intensive farming face in switching to that modality. At least back to the Mughal period Bengal was a densely settled land from which one could extract massive rents simply due to aggregate productivity. Today a united Bengal would have a population of 240 million, making it the fourth most populous nation in the world, below the USA, and just above Indonesia. In hindsight I find it less likely that the peasants of eastern Bengal descend from tribal peoples who had been practicing extensive agriculture, but were introduced to new techniques, than that western populations already habituated to the grinding expectations of intensive farming colonized the “empty” lands (in fact, Bengali peasants migrate to Assam in part because of the perception of land surplus there, even though Assam has 30 million inhabitants). But this initial phase of colonization would entail relatively few peasants, and would probably exhibit some male bias. Therefore, it can explain a substantial fraction of the eastern ancestry among Bangladeshis, since in the first generations the Bengali peasants did assimilate the native tribal peoples of the region, whether the Munda-speaking Santhals or Tibeto-Burman relatives of the Tripuri. Given the massive numbers of ethnic Bengalis in comparison to Tibeto-Burman groups, it seems one would need a great deal of gene flow in any model which posited that exchange between these two groups over long periods of time explains the high fractions of non-South Asian ancestry that one finds. In all of India there are only 10 million speakers of Tibeto-Burman languages, vs. the 240 million speakers of Bengali alone in the Indian subcontinent.
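The gene-flow argument can be put in rough numbers with a continuous-migration model: if a fraction m of each generation’s parents come from a fully “eastern” source, eastern ancestry after t generations is 1 − (1 − m)^t. Everything here is a hypothetical simplification. The ~15% target echoes the East Asian-like fraction mentioned earlier, 30 generations is an arbitrary horizon, and the present-day sizes quoted in the text stand in for historical populations that were far smaller.

```python
def required_migration_rate(target, generations):
    """Per-generation migrant fraction m such that 1 - (1 - m)^t equals `target`.

    Assumes a recipient starting with no 'eastern' ancestry and a purely 'eastern' source.
    """
    if not 0.0 <= target < 1.0 or generations <= 0:
        raise ValueError("need 0 <= target < 1 and generations > 0")
    return 1.0 - (1.0 - target) ** (1.0 / generations)


def migrants_relative_to_source(m, recipient_size, source_size):
    """Migrants needed per generation, as a fraction of the source population's size."""
    return m * recipient_size / source_size


if __name__ == "__main__":
    m = required_migration_rate(0.15, 30)  # hypothetical target and horizon
    print(f"m per generation: {m:.4f}")
    # Present-day sizes from the text (240 million vs. 10 million) as a crude proxy.
    print(f"share of source needed each generation: {migrants_relative_to_source(m, 240e6, 10e6):.2f}")
```

Under these assumptions the required migrants amount to more than a tenth of the source population every generation, whereas a founder model reaches the same fraction through a one-time assimilation followed by growth.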

Where does this leave us? From what I gather you’ll probably not make it into the first round of results for HAP, but if you have 23andMe results and haven’t sent them to Zack, and want to learn more about the historical genetics of the Indian subcontinent, you can still get involved! With my parents Zack now has an N = 2 of Bengalis. It would be nice to get more. We still need samples from North-Central India. The number of Punjabis is in the 5-10 range, and of Tamils around 5. Enough to make inferences, but certainly not robust enough to bet the house on. In the near future I’ll get results from my other siblings, and I’ve decided to “upgrade” to the V3 chip. Once that comes in I’ll probably start comparing myself to my siblings, “phase” the results, etc.

* Native Americans, descendants of pre-Columbian Americans, show the inverse of the South Asian pattern: mostly Asian with a European minority. This is not just due to recent European admixture. Rather, though Amerindians have affinities to East Asians, the two groups have been distinct for at least 10,000 years, and probably considerably longer.

** Also, some have stated that the people of Noakhali are sly and cunning, adept at following the letter of the law, but not the spirit. I only know this because when I was young one of my father’s friends, also from Bangladesh, complained that a mutual acquaintance from Noakhali who made much of his piety (he put his wife in purdah when she arrived from Bangladesh) requested that someone else purchase a pornographic magazine for him. His reasoning was that he did not want to be seen purchasing the magazine, since it was a sin for a good Muslim to purchase such an item. Later my father and his friend (who was from northern Bangladesh, for what it’s worth) commiserated that such was the way of the people of Noakhali, amongst whom you have to keep your wits about you lest they exploit some angle for their own self-interest. The pious-porn-non-purchaser was notorious for being a non- or late payer of rent when he was a lodger with other Bangladeshis, always emphasizing his religious piety as surety of final payment of the debt. He also eventually finagled a loophole in the immigration law of the time, obtaining a green card with relative ease and no necessity of sponsorship. The connotation is probably best captured by the American English word “slick.”


In my post below I quoted my interview with L. L. Cavalli-Sforza because I think it gets to the heart of some confusions which have emerged since the finding that most variation at any given locus is found within populations, rather than between them. The standard figure is that 85% of genetic variance is within continental races, and 15% is between them. You can see some Fst values on Wikipedia to get an intuition. Concretely, at a given locus X the frequency of allele A may be 40% in population 1 and 45% in population 2. Obviously the populations differ, but such a small difference is not going to be very informative of population substructure when most of the variation is within populations.
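To make the arithmetic concrete: with two equally weighted populations, one common Fst estimator (Nei’s, based on expected heterozygosity) is (H_T − H_S)/H_T. Applied to the 40% vs. 45% example it gives a tiny value, whereas a locus with nearly fixed differences (99% vs. 1%, a hypothetical stand-in for an SLC24A5-like locus) gives a value close to 1. This is a per-locus sketch only; genome-wide figures such as the 15% average come from aggregating across many loci, usually with sample-size-corrected estimators.

```python
def fst_two_pops(p1, p2):
    """Nei-style Fst = (H_T - H_S) / H_T for one biallelic locus in two equally weighted populations."""
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


if __name__ == "__main__":
    print(round(fst_two_pops(0.40, 0.45), 4))  # the example in the text: ~0.0026
    print(round(fst_two_pops(0.99, 0.01), 4))  # hypothetical near-fixed difference: ~0.96
```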

But there are loci which are much more informative. Interestingly, one controls variation in a trait you are familiar with, skin color (unless you happen to lack vision). A large fraction (on the order of 25-40%) of the between-population variance in the complexion of Africans and Europeans can be predicted by a substitution at one SNP in the gene SLC24A5. The substitution has a major phenotypic effect and exhibits a great deal of between-population variation: one variant is nearly fixed in Europeans, and the other is nearly fixed in Africans. In other words, the component of genetic variance at this locus that is between populations is nearly 100%, not 15%. This illustrates that the 15% value is an average across the genome, and that there are in fact significant differences at the genetic level which can be ancestrally informative. You can take this to the next level: increase the number of ancestrally informative markers to obtain a fine-grained picture of population structure. In the illustration above the top panel shows the frequencies at the SNP on SLC24A5 mentioned earlier. The second panel shows variation at a SNP in SLC45A2, another gene implicated in skin color. This second SNP is useful in separating South and Central Asians from Europeans and Middle Easterners, if not perfectly so. In other words, the more markers you have, the better your resolution of inter-population differences. This is why I found the following comment very interesting:
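To make the contrast between an average locus and a locus like SLC24A5 concrete, here is a minimal Python sketch using one standard measure of between-population differentiation, Nei’s G_ST (a form of Fst): the share of total expected heterozygosity that disappears when you look within populations rather than at the pooled sample. The 40%/45% case is the example from the text; the 2%/99% frequencies are hypothetical stand-ins for a nearly fixed difference, not measured SLC24A5 values, and the 85/15 figure itself comes from analyses that may use somewhat different estimators.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """Nei's G_ST for one biallelic locus in two equally weighted populations.

    H_S: mean expected heterozygosity within the two populations.
    H_T: expected heterozygosity at the pooled allele frequency.
    (H_T - H_S) / H_T is the share of total gene diversity lying between them.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


# The typical locus from the text: allele A at 40% in one population, 45% in the other.
print(round(fst_two_pops(0.40, 0.45), 4))  # 0.0026
# Hypothetical nearly fixed difference (illustrative values, not measured SLC24A5 data).
print(round(fst_two_pops(0.02, 0.99), 3))  # 0.941
```

Almost none of the diversity at the first locus lies between populations, while almost all of it does at the second, which is why a handful of loci like the second carry so much ancestral information.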

Razib’s final concession (that genetic variation exists) is revealing because I think that’s as far as the argument can really be taken. It’s a bit of a strawman, in that people who argue that race is entirely a social construct don’t actually deny that human genetic variation exists. What they deny is that there are non-arbitrary and mutually exclusive categories into which humans can be resolved. This is, I think, the point being made by the “Race by Fingerprints” etc. rhetorical device cited earlier.

In other words, it may be possible for any particular phenotypic trait or genetic locus to be resolved into a strictly cladistic system but humans, being an amalgam of such traits and loci, defy such resolution. So while the study of human genetic variation does, indeed, have “instrumental utility” the concept of biological races is, itself, an archaic relic.

As I noted below, the comment doesn’t make sense. Here is a PCA of world populations using 250,000 markers:


The relationships between individuals here are hypothesis-free. That is, the two largest components of variance in the data just happen to produce clusters which neatly map onto geographic realities. If you think about it a little, this makes total sense: members of a population share a history of intermarriage with one another, so over time each population will develop its own genetic distinctiveness. It may be true that most of the variance is within populations, but it is not at all difficult to discriminate populations, or to generate clusters which are non-arbitrary with respect to geography or social identity.
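For readers who want to see what “hypothesis-free” means in practice, here is a minimal PCA sketch in Python with NumPy on simulated data. Assumptions: three hypothetical populations whose allele frequencies are small random perturbations of shared ancestral frequencies, independent loci, and genotypes standardized per locus before a singular value decomposition. Population labels are never given to the PCA; they are used only afterwards to check whether the top two components separate the groups. Real analyses of 250,000 markers would also prune linked markers and handle missing data, which this sketch ignores.

```python
import numpy as np


def simulate_genotypes(rng, n_per_pop, freqs):
    """Draw diploid genotypes (0, 1 or 2 copies of an allele) for each population."""
    blocks = [rng.binomial(2, f, size=(n_per_pop, f.size)) for f in freqs]
    return np.vstack(blocks).astype(float)


def top_pcs(genotypes, k=2):
    """Scores of each individual on the top-k principal components."""
    p = genotypes.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)  # drop monomorphic loci to avoid dividing by zero
    # Center each locus and scale by its binomial SD (a common convention, assumed here).
    x = (genotypes[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
n_loci, n_per_pop = 2000, 50
ancestral = rng.uniform(0.1, 0.9, n_loci)
# Three hypothetical populations: small independent perturbations of shared ancestral frequencies.
freqs = [np.clip(ancestral + rng.normal(0, 0.07, n_loci), 0.01, 0.99) for _ in range(3)]
pcs = top_pcs(simulate_genotypes(rng, n_per_pop, freqs))
labels = np.repeat([0, 1, 2], n_per_pop)  # used only to inspect the result, never by the PCA
for pop in range(3):
    print(pop, pcs[labels == pop].mean(axis=0).round(2))
```

Each simulated population ends up in its own region of the PC1/PC2 plane even though the per-locus frequency differences are small.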

There are relationships which do not match intuition. Or at least intuition as it crystallized during the period of the rise of modern taxonomic science. The various phenotypically “black” peoples of the world, Africans, Melanesians, and some South Asians, do not cluster together. Rather, all non-Africans are separated from Africans by the largest component of variance within the data set. The traits used to make inferences of taxonomy in “folk biology” and early scientific attempts to generate a systematic tree of life in relation to the human races were not necessarily representative of total genome variation, which captures the evolutionary history of a population with greater accuracy and precision.

And obviously you don’t need 250,000 markers, let alone all ~3 billion base pairs in the human genome, to distinguish populations at the level of continental races/populations. A paper in 2002 laid out the parameters. δ is a measure of the between-population difference in allele frequency at a locus.


From the paper:

…we can estimate that about 120 unselected SNPs or 20 highly selected SNPs can distinguish group CA from NA, AA from AS and AA from NA. A few hundred random SNPs are required to separate CA from AA, CA from AS and AS from NA, or about 40 highly selected loci. STRP loci are more powerful and have higher effective δ values because they have multiple alleles. Table 3 reveals that fewer than 100 random STRPs, or about 30 highly selected loci, can distinguish the major racial groups. As expected, differentiating Caucasians and Hispanic Americans, who are admixed but mostly of Caucasian ancestry, is more difficult and requires a few hundred random STRPs or about 50 highly selected loci. These results also indicate that many hundreds of markers or more would be required to accurately differentiate more closely related groups, for example populations within the same racial category.
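The δ statistic in the quoted paper is an allele-frequency differential. The sketch below assumes the common definition: |p_A - p_B| for a biallelic SNP, generalizing to half the summed absolute frequency differences across all alleles at a multi-allelic locus such as a short tandem repeat (STRP). The frequencies are hypothetical and only illustrate how the definition extends to many alleles; they are not drawn from the paper, and by themselves they don’t show that STRPs are generally more informative.

```python
def delta(freqs_a, freqs_b):
    """Allele-frequency differential between two populations at one locus.

    For a biallelic SNP this equals |p_A - p_B|; for a multi-allelic locus it is
    half the sum of absolute differences over all alleles, so it ranges from 0 to 1.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both populations must list the same alleles")
    for freqs in (freqs_a, freqs_b):
        if abs(sum(freqs) - 1) > 1e-9:
            raise ValueError("allele frequencies must sum to 1")
    return 0.5 * sum(abs(a - b) for a, b in zip(freqs_a, freqs_b))


# Hypothetical SNP: one allele at 30% in population A and 55% in population B.
print(round(delta([0.30, 0.70], [0.55, 0.45]), 3))  # 0.25
# Hypothetical four-allele STRP: frequencies spread across several alleles.
print(round(delta([0.40, 0.30, 0.20, 0.10], [0.10, 0.20, 0.30, 0.40]), 3))  # 0.4
```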

The paper was written in 2002. Since then much has changed. Here is an image from a post from last summer:


People within European villages tend to be relatively closely related. Again, it is totally reasonable that given enough markers you could assign individuals to different villages with high confidence. Concretely, person X may show up in the pedigree of individuals from village 1 ~100 times at a given generation, while the same person may show up in the pedigree of individuals from village 2 ~10 times at the same generation. This isn’t rocket science; the basic logic as to why populations shake out based on geography and endogamy patterns is pretty obvious when you think about it.
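The pedigree arithmetic can be made explicit. Under the simplifying assumption that each appearance of an ancestor g generations back contributes an expected 2^-g of a descendant’s autosomal genome, expected genetic contribution scales linearly with the number of pedigree appearances. The 20-generation depth is hypothetical, and real inheritance comes in recombination blocks, so an individual’s actual share varies around this expectation (and can be zero for distant ancestors).

```python
def expected_share(appearances: int, generations_back: int) -> float:
    """Expected fraction of the autosomal genome inherited from one ancestor.

    Each of the 2**g pedigree slots g generations back contributes an expected
    1 / 2**g of the genome, so an ancestor filling several slots contributes
    proportionally more (a simplifying expectation; actual inheritance varies).
    """
    if appearances > 2 ** generations_back:
        raise ValueError("more appearances than pedigree slots")
    return appearances / 2 ** generations_back


g = 20  # hypothetical depth: roughly 500-600 years at 25-30 years per generation
village_1 = expected_share(100, g)  # person X appears ~100 times in village 1 pedigrees
village_2 = expected_share(10, g)   # ... and ~10 times in village 2 pedigrees
print(f"{village_1:.2e} vs {village_2:.2e}, ratio {village_1 / village_2:.0f}")
# 9.54e-05 vs 9.54e-06, ratio 10
```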

At about the same time as the above work, A. W. F. Edwards, a statistical geneticist, published a paper titled Lewontin’s Fallacy which took direct aim at the misunderstanding of the human Fst statistic and its relevance for classification. Here is Edwards answering why he wrote the article in 2002 (my co-blogger at GNXP, David B, is doing the questioning):

4. Your recent article on ‘Lewontin’s Fallacy’ criticises the claim that human geographical races have no biological meaning. As the article itself points out, it could have been written at any time in the last 30 years. So why did it take so long – and have you had any reactions from Lewontin or his supporters? [David B’s question -R]

I can only speak for myself as to why it took me so long. Others closer to the field will have to explain why the penny did not drop earlier, but the principal cause must be the huge gap in communication that exists between anthropology, especially social anthropology, on the one hand, and the humdrum world of population and statistical genetics on the other. When someone like Lewontin bridges the gap, bearing from genetics a message which the other side wants to hear, it spreads fast – on that side. But there was no feedback. Others might have noticed Lewontin’s 1972 paper but I had stopped working in human and population genetics in 1968 on moving to Cambridge because I could not get any support (so I settled down to writing books instead). In the 1990s I began to pick up the message about only 15% of human genetic variation being between, as opposed to within, populations with its non-sequitur that classification was nigh impossible, and started asking my population-genetics colleagues where it came from. Most had not heard of it, and those that had did not know its source. I regret now that in my paper I did not acknowledge the influence of my brother John, Professor of Genetics in Oxford, because he was independently worrying over the question, inventing the phrase ‘the death of phylogeny’ which spurred me on.

Eventually the argument turned up unchallenged in Nature and the New Scientist and I was able to locate its origin. I only started writing about it after lunch one day in Caius during which I had tried to explain the fallacy across the table to a chemist, a physicist, a physiologist and an experimental psychologist – all Fellows of the Royal Society – and found myself faltering. I like to write to clear my mind. Then I met Adam Wilkins, the editor of BioEssays, and he urged me to work my notes up into a paper.

I have had no adverse reaction to it at all, but plenty of plaudits from geneticists, many of whom told me that they too had been perplexed. Perhaps the communication gap is still too large, or just possibly the point has been taken. After all, Fisher made it in 1925 in Statistical Methods which was written for biologists so it is hardly new. [my emphasis -R]

Richard Dawkins repeated Edwards’s argument in The Ancestor’s Tale. You can read Edwards’s full essay online. Also see p-ter’s lucid exposition at GNXP.
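Edwards’s point can be illustrated with a small simulation, in the spirit of his argument rather than as his own calculation. Assume two hypothetical populations that differ by only 10 percentage points in allele frequency at every locus (a per-locus Fst of roughly 0.01, far below the 15% figure), independent loci, and a classifier that knows the true frequencies and assigns each individual to the population with the higher likelihood. Any single locus is nearly useless, yet because the evidence accumulates across loci, the misclassification rate collapses as loci are added.

```python
import numpy as np


def misclassification_rate(rng, n_loci, diff=0.1, n_test=2000):
    """Share of individuals from population A that a likelihood classifier assigns to B.

    Hypothetical setup: at every locus A has allele frequency 0.5 + diff/2 and
    B has 0.5 - diff/2, loci are independent, and the classifier knows both.
    """
    p_a, p_b = 0.5 + diff / 2, 0.5 - diff / 2
    g = rng.binomial(2, p_a, size=(n_test, n_loci))  # allele counts for A individuals
    # Per-locus log-likelihood ratio of A versus B under Hardy-Weinberg, summed over loci.
    llr = (g * np.log(p_a / p_b) + (2 - g) * np.log((1 - p_a) / (1 - p_b))).sum(axis=1)
    ties = np.abs(llr) < 1e-9
    # Wrong calls count fully; exact ties are treated as coin flips.
    return float(np.mean(llr < -1e-9) + 0.5 * np.mean(ties))


rng = np.random.default_rng(1)
for n in (1, 10, 100, 1000):
    print(n, round(misclassification_rate(rng, n), 3))
```

With these settings the error rate should fall from close to a coin flip at one locus to essentially zero at 1,000 loci, even though the between-population share of variation at each locus stays around 1%.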

So far I’ve been talking mostly about genes. But in terms of classification there isn’t anything magical about genes. Biological anthropologists using more robust morphometric traits have discerned an “Out of Africa” movement, just as geneticists have. Above you have five individuals. All of them have dark hair and dark eyes; there’s total overlap on those traits. And yet I’m pretty sure you can assign a rough population identity to each. Why? Because humans intuitively look at correlated clusters of traits when assigning population identity. Some traits are more salient, such as skin color, but early geographers understood that East Asians and Europeans were different populations despite their similarly light complexions. The ancient Greeks understood that Indians and Ethiopians were different groups despite their similar complexions, because they differed on other informative traits.

Let’s bring it back down to earth. Population structure exists. Phylogenetic analysis of humans is not especially difficult, and its results track geography rather closely, at least before the age of mass migration. Additionally, they tend to follow endogamous social groups, such as Ashkenazi Jews. A South Asian is going to be more genetically related to another South Asian than to an African. There are many cosmetic differences between populations. But there are also less cosmetic differences which are very important. You can even assign different regions of a chromosome to different ancestral components.

Where does this leave us? Ultimately, it’s about the “R-word.” “Race is a myth.” Or, as PBS stated, an illusion. Here’s some of the precis of the PBS documentary:

Everyone can tell a Nubian from a Norwegian, so why not divide people into different races? That’s the question explored in “The Difference Between Us,” the first hour of the series. This episode shows that despite what we’ve always believed, the world’s peoples simply don’t come bundled into distinct biological groups. We begin by following a dozen students, including Black athletes and Asian string players, who sequence and compare their own DNA to see who is more genetically similar. The results surprise the students and the viewer, when they discover their closest genetic matches are as likely to be with people from other “races” as their own.

Much of the program is devoted to understanding why. We look at several scientific discoveries that illustrate why humans cannot be subdivided into races and how there isn’t a single characteristic, trait – or even one gene – that can be used to distinguish all members of one race from all members of another.

Modern humans – all of us – emerged in Africa about 150,000 to 200,000 years ago. Bands of humans began migrating out of Africa only about 70,000 years ago. As we spread across the globe, populations continually bumped into one another and mixed their mates and genes. As a species, we’re simply too young and too intermixed to have evolved into separate races or subspecies.

So what about the obvious physical differences we see between people? A closer look helps us understand patterns of human variation:

  • In a virtual “walk” from the equator to northern Europe, we see that visual characteristics vary gradually and continuously from one population to the next. There are no boundaries, so how can we draw a line between where one race ends and another begins?
  • We also learn that most traits – whether skin color, hair texture or blood group – are influenced by separate genes and thus inherited independently one from the other. Having one trait does not necessarily imply the existence of others. Racial profiling is as inaccurate on the genetic level as it is on the New Jersey Turnpike.
  • We also learn that many of our visual characteristics, like different skin colors, appear to have evolved recently, after we left Africa, but the traits we care about – intelligence, musical ability, physical aptitude – are much older, and thus common to all populations. Geneticists have discovered that 85% of all genetic variants can be found within any local population, regardless of whether they’re Poles, Hmong or Fulani. Skin color really is only skin deep. Beneath the skin, we are one of the most similar of all species.

Certainly a few gene forms are more common in some populations than others, such as those controlling skin color and inherited diseases like Tay Sachs and sickle cell. But are these markers of “race?” They reflect ancestry, but as our DNA experiment shows us, that’s not the same thing as race. The mutation that causes sickle cell, we learn, was passed on because it conferred resistance to malaria. It is found among people whose ancestors came from parts of the world where malaria was common: central and western Africa, Turkey, India, Greece, Sicily and even Portugal – but not southern Africa.

This documentary came out in 2003. In late 2005 scientists discovered the role that SLC24A5 plays in skin color. It is the second most ancestrally informative locus typed so far for differentiating Europeans and Africans. It actually does come close to being a single gene which differentiates two populations! It is true that human populations have mixed. I probably have ancestors who were resident in China and Northern Europe within the last 1,000 years. That’s the way genealogy works. All Eurasians may be able to find a genealogical line of ancestry back to Genghis Khan (though not necessarily distinctive genes attributable to him). But that does not negate the fact that some of your ancestors show up in your pedigree orders of magnitude more often than others. The vast majority of my ancestors within the last 1,000 years were South Asian, though a substantial minority were Southeast Asian. The question of our youth as a species and its relation to our differentiation into races and subspecies is an empirical matter, not an a priori one determined by a fixed number of years. Since races and subspecies are fuzzy categories, they’re easy to refute: just pick a definition which is refutable. I have no idea how they adduce that traits like intelligence, musical ability, and physical aptitude are that much older than the “Out of Africa” migration. Humans as a whole have been getting much more gracile over the last 10,000 years, and I don’t know how one can know about the musical abilities of anatomically modern humans in Africa 200,000 years ago. These traits are quantitative and based on standing genetic variation, so their architecture is qualitatively different from that of skin color (though since in 2003 we didn’t know the architecture of skin color, the confusion is understandable).

The old concept of “race” as outlined by anthropologists in the early 20th century, and accepted broadly, was often unclear, ad hoc, and not empirical. Over the past generation, in the course of refuting the concept of race, people have been wont to make unclear, ad hoc, and non-empirical assertions. Scholars discuss race and refute it in order to eliminate confusions and misconceptions among the public, but their presentation has produced more confusions and misconceptions. The idea that human phylogeny is impossible is in the air; I have heard it from many intelligent people. It is no wonder people believe it: the way many scholars present the topic clearly implies that phylogeny is impossible and that differences are trivial. Both of these are false impressions. I do not believe that the real problems mixed-race people face in obtaining organs with an appropriate tissue match are a trivial affair. Human genetic differences have plenty of concrete impacts which are not socially constructed.

Personally I have no problem with abandoning the word race and all the baggage which that entails. But there’s no reason to throw the baby out with the bathwater here. In the “post-genomic” era human population substructure is taken for granted. The outlines of the history of our species, and its various branches, are getting clearer and clearer. There’s no point in replacing old rubbish with new rubbish. We have the possibility for clear and useful thought, if we choose to grasp it.


After linking to Marnie Dunsmore’s blog on the Neolithic expansion, and reading Peter Bellwood’s First Farmers, I’ve been thinking a bit about how we might integrate some models of the rise and spread of agriculture with the new genomic findings. Bellwood’s thesis basically seems to be that the contemporary world pattern of expansive macro-language families (e.g., Indo-European, Sino-Tibetan, Afro-Asiatic, etc.) is a shadow of the rapid demographic expansions of farmers in prehistory, in particular hoe-farmers rapidly pushing into virgin lands. First Farmers was published in 2005, and so it had access mostly to mtDNA and Y chromosomal studies. Today we have a richer data set, ranging from hundreds of thousands of markers per person to mtDNA and Y chromosomal results from ancient DNA. I would argue that the new findings tend to somewhat reinforce the plausibility of Bellwood’s thesis.

The primary datum I want to enter into the record in this post, which was news to me, is this: the island of Cyprus seems to have been first settled (at least in anything but trivial numbers) by Neolithic populations from mainland Southwest Asia.* In fact, the first farmers in Cyprus perfectly replicated the physical culture of the nearby mainland in toto. This implies that the genetic heritage of modern Cypriots is probably attributable on the whole to expansions of farmers from Southwest Asia. With this in mind let’s look at Dienekes’ Dodecad results at K = 10 for Eurasian populations (I’ve re-edited them a bit):


Modern Cypriots exhibit genetic signatures which shake out into three putative ancestral groups: West Asian, which is modal in the Caucasus region; South European, modal in Sardinia; and Southwest Asian, modal in the Arabian peninsula. Cypriots basically look like Syrians, but with less Southwest Asian, a more even balance between West Asian and South European, and far less of the minor components of ancestry.
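The K = 10 bar plots come from model-based clustering, in which each individual’s genome is treated as a mixture of K ancestral components. A minimal sketch of the core idea, assuming the component allele frequencies are already known (tools such as ADMIXTURE estimate those frequencies jointly with the proportions, which this sketch does not), is an expectation-maximization loop over one individual’s genotypes. The three components, their frequencies, and the 60/30/10 mixture are all simulated and hypothetical; they are not Dienekes’ data.

```python
import numpy as np


def estimate_admixture(genotype, anc_freqs, n_iter=1000):
    """EM estimate of one individual's ancestry proportions, ancestral frequencies held fixed.

    genotype: length-L array of allele counts (0, 1 or 2).
    anc_freqs: K x L array of allele frequencies in K ancestral components (assumed known).
    Each of the 2L allele copies is modeled as coming from component k with probability q[k].
    """
    k, n_loci = anc_freqs.shape
    q = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior probability that each observed allele copy came from each component.
        w_alt = q[:, None] * anc_freqs
        w_ref = q[:, None] * (1 - anc_freqs)
        w_alt /= w_alt.sum(axis=0)
        w_ref /= w_ref.sum(axis=0)
        # M-step: new proportions are the average memberships over all 2L allele copies.
        q = (genotype * w_alt + (2 - genotype) * w_ref).sum(axis=1) / (2 * n_loci)
    return q


rng = np.random.default_rng(2)
n_loci = 20_000
anc = rng.uniform(0.05, 0.95, size=(3, n_loci))   # hypothetical ancestral frequencies
q_true = np.array([0.6, 0.3, 0.1])                # hypothetical admixture proportions
comp = rng.choice(3, size=(2, n_loci), p=q_true)  # source component of each allele copy
genotype = (rng.random((2, n_loci)) < anc[comp, np.arange(n_loci)]).sum(axis=0)
print(estimate_admixture(genotype, anc).round(3))
```

The estimate should land close to the simulated 60/30/10 split; a bar in a K = 10 plot is this kind of vector, one per individual.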

Just because an island was settled by one group of farmers, it does not mean that subsequent invasions or migrations could not have an impact. The indigenous tribes of Taiwan seem to be the original agriculturalists of that island, and after their settlement there were thousands of years of gradual and continuous cultural change in situ. But within the last 300 years settlers from Fujian on the Chinese mainland have demographically overwhelmed the native Taiwanese peoples.

During the Bronze Age it seems Cyprus was part of the Near East political and cultural system. The notional kings of Cyprus had close diplomatic relations with the pharaohs of Egypt. But between the end of the Bronze Age and the Classical Age Cyprus became part of the Greek cultural zone. Despite centuries of Latin and Ottoman rule, it has remained so, albeit with a prominent Turkish minority.

One thing notable about Cyprus, and which distinguishes it from mainland Greece, is the near total absence of a Northern European ancestral component. Therefore we can make the banal inference that Northern Europeans were not initially associated with the demographic expansions of farmers from the Middle East. Rather, I want to focus on the West Asian and Southern European ancestral components. One model for the re-population of Europe after the last Ice Age is that hunter-gatherers expanded from the peninsular “refugia” of Iberia and Italy, later being overlain by expansions of farmers from the Middle East, and perhaps Indo-Europeans from the Pontic steppe. I have a sneaking suspicion, though, that what we’re seeing among Mediterranean populations are several waves of expansion out of the Near East. I would now offer the tentative hypothesis that the South European ancestral element at K = 10 is a signature of the first wave of farmers which issued out of the Near East, and that the West Asians were a subsequent wave. I assume that the two groups must correspond to some sort of cultural or technological shift, though I have no hypothesis as to what it was.

From the above assertions, it is clear that I believe modern Sardinians are descendants of that first wave of farmers, unaffected by later demographic perturbations. I believe, then, that the Basques are a people who emerged from an amalgamation of the same wave of seafaring agriculturalists with the indigenous populations preceding them (the indigenes were likely the descendants of a broad group of northern Eurasians who expanded from the aforementioned refugia after the end of the last Ice Age). These agriculturalists leap-frogged across fertile regions of the Mediterranean, pushed up the valleys of southern France, and went out through the Straits of Gibraltar. Interestingly, the Basques lack the West Asian minority element evident in Dienekes’ Spaniards and Portuguese, as well as in the HGDP French (even up to K = 15 they don’t shake out as anything but a two-way admixture, while the Sardinians show a minor West Asian component). Also, the West Asian and Southern European elements are proportionally several times better represented among Scandinavians than among Finns. The Southern European element is not found among the Uyghur, though the Northern European and West Asian ones are. I infer from all these patterns that the Southern European element derived from pre-Indo-European farmers who pushed west from the Near East. It is the second largest component across much of Northwestern Europe, and the largest across much of Southern Europe, including Greece.

A second issue which First Farmers clarified is the difference between the spread of agriculture from the Near East to Europe and to South Asia. It seems that the spread of agriculture across South Asia was more gradual, or at least had a longer pause, than in Europe. A clearly transplanted West Asian culture arrived in what is today Pakistan ~9,000 years ago. But it does not seem that the Neolithic arrived in the far south of India until ~4,000 years ago. I think that a period of “incubation” in the northwest part of the subcontinent explains the putative hybridization between “Ancestral North Indians” and “Ancestral South Indians” described in Reconstructing Indian population history. The high proportion of “Ancestral North Indian” ancestry, on the order of ~40%, as well as Y chromosomal markers such as R1a1a, among South Indian tribal populations is a function of the fact that these groups are themselves secondary amalgamations between shifting cultivators expanding from the Northwest and local resident hunter-gatherer groups, which were related to the ASI whom the original West Asian agriculturalists encountered and assimilated in ancient Pakistan (Pathans are ~25% ASI). I believe that the Dravidian languages arrived in the south of India from the Northwest only within the last 4,000-5,000 years, along with the farmers (some of whom may have reverted to facultative hunter-gathering, as is common among tribals). To my mind, this relatively late arrival of Dravidian-speaking groups explains why Sri Lanka has an Indo-European presence; the island was probably only lightly settled by farming Dravidian speakers, if at all, allowing Indo-European speakers from Gujarat and Sindh to leap-frog and quickly replace the native Veddas, who were hunter-gatherers.

Note: Here is K = 15.

* Wikipedia says there were hunter-gatherers, but even here the numbers were likely very small.


One of the more popular posts on this weblog (going by StumbleUpon and search engine referrers) focuses on genetic variation in Europe as a function of geography. In some ways the results are common sense; populations closer to each other are more genetically related. Why not? Historically people have married their neighbors, and so gene flow is often well modeled as isolation by distance. The scientific rationale for these studies is to smoke out population stratification in medical genetics research programs which attempt to find associations between genes and particular diseases. By population stratification I mean the fact that different populations will naturally have different gene frequencies, and if those populations also exhibit different frequencies of the disease or trait under investigation then one may have to deal with spurious correlations. If, for example, your study population includes many people of African and European descent, cautious researchers would presumably be aware of this problem immediately and attempt to take it into account. But what about populations which are genetically closer, or whose genetic differences may not be so evident in physical characteristics which might clue you in to the issue of stratification? That’s why the sorts of results which might seem common sense in the aggregate are useful. One can ask questions about the genetic closeness of Irish and English, or Irish and Spanish, in a rigorous sense. In the United States, research programs which are restricted to white cases and controls may still hide population stratification because of the ethnic diversity of the American white population. A primary motivation for studies of Jewish genetics is the cluster of “Jewish diseases” which are common within that population. In our age it is fashionable to focus on what binds us together as a species, but genetic differences matter a great deal. Ask the parents of multiracial children who require bone marrow transplants.
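Here is a minimal sketch of how stratification manufactures an association, with hypothetical allele counts. In each of two populations the allele is entirely unrelated to disease (odds ratio 1), but the population with the higher allele frequency also has the higher disease prevalence, so pooling the samples produces a spurious odds ratio. Stratifying by population with the Mantel-Haenszel estimator, one standard correction, removes it. Genome-wide studies more often adjust for principal components or use genomic control, but the logic is the same.

```python
def odds_ratio(t):
    """Allelic odds ratio for a 2x2 table ((case_A, case_a), (control_A, control_a))."""
    (a, b), (c, d) = t
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel odds ratio pooling 2x2 tables across strata (here, populations)."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den


# Hypothetical allele counts (two per person). Population 1: allele frequency 0.2,
# prevalence 10%. Population 2: allele frequency 0.6, prevalence 30%. Within each
# population the allele frequency is identical in cases and controls.
pop1 = ((40, 160), (360, 1440))  # 100 cases, 900 controls
pop2 = ((360, 240), (840, 560))  # 300 cases, 700 controls
pooled = tuple(tuple(x + y for x, y in zip(r1, r2)) for r1, r2 in zip(pop1, pop2))

print(round(odds_ratio(pop1), 3), round(odds_ratio(pop2), 3))  # 1.0 1.0
print(round(odds_ratio(pooled), 3))                            # 1.667: spurious association
print(round(mantel_haenszel_or([pop1, pop2]), 3))              # 1.0 after stratifying
```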

A new paper in Human Heredity examines a large sample of five European populations, and goes over the between population allele frequency differences with a fine tooth comb. Genetic Differences between Five European Populations:

We sought to examine the magnitude of the differences in SNP allele frequencies between five European populations (Scotland, Ireland, Sweden, Bulgaria and Portugal) and to identify the loci with the greatest differences…We found 40,593 SNPs which are genome-wide significantly…The largest differences clustered in gene ontology categories for immunity and pigmentation. Some of the top loci span genes that have already been reported as highly stratified: genes for hair color and pigmentation (HERC2, EXOC2, IRF4), the LCT gene, genes involved in NAD metabolism, and in immunity (HLA and the Toll-like receptor genes TLR10, TLR1, TLR6). However, several genes have not previously been reported as stratified within European populations, indicating that they might also have provided selective advantages: several zinc finger genes, two genes involved in glutathione synthesis or function, and most intriguingly, FOXP2, implicated in speech development. Conclusion: Our analysis demonstrates that many SNPs show genome-wide significant differences within European populations and the magnitude of the differences correlate with the geographical distance. At least some of these differences are due to the selective advantage of polymorphisms within these loci

They looked at ~350,000 SNPs across the five populations. The sample sizes were pretty large: 1,129 individuals from Bulgaria, 1,142 from Ireland, 656 from Scotland, 620 from Sweden, and 563 from Portugal. In the supplements they had a figure where they displayed the genetic variation on the two largest principal components for their sample and color-coded by region of origin. Next to this they transposed the PCA onto a map of Europe.
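The supplement places the PCA next to a map, and the post doesn’t say exactly how the two were matched, so the following is only one common way to overlay PCs on geography: an orthogonal Procrustes fit that rotates, scales, and shifts the PC coordinates to best match longitude and latitude. The country centroids below are rough and illustrative, and the “PCs” are a synthetic rotated, shrunken copy of them with a little noise, so the sketch only demonstrates the fitting step.

```python
import numpy as np


def procrustes_align(pcs, coords):
    """Rotate, scale and translate 2-D PC scores to best match geographic coordinates.

    Orthogonal Procrustes: with centered X (PCs) and Y (coords), choose an orthogonal R
    and a scale s minimizing ||s X R - Y||^2, then add back the mean of the coordinates.
    """
    x = pcs - pcs.mean(axis=0)
    y = coords - coords.mean(axis=0)
    u, sv, vt = np.linalg.svd(x.T @ y)
    r = u @ vt                     # best rotation (PC signs are arbitrary, so reflections are allowed)
    s = sv.sum() / (x ** 2).sum()  # best isotropic scale for that rotation
    return s * x @ r + coords.mean(axis=0)


# Rough (longitude, latitude) centroids for Scotland, Ireland, Sweden, Bulgaria, Portugal (illustrative).
coords = np.array([[-4.0, 56.5], [-8.0, 53.0], [15.0, 62.0], [25.5, 42.7], [-8.0, 39.5]])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
rng = np.random.default_rng(3)
# Synthetic population-mean "PCs": a rotated, shrunken copy of geography plus a little noise.
fake_pcs = 0.01 * (coords - coords.mean(axis=0)) @ rot + rng.normal(0, 0.0005, coords.shape)
print(procrustes_align(fake_pcs, coords).round(1))
```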


This confirms previous findings that the largest component of variation in Europe is north-south (at least for populations west of a particular geographical cutoff), with a secondary east-west dimension. But the focus of the paper wasn’t really the phylogenetic relationships between the populations as such, but the patterns of genetic differences across them. Table 1 shows the pairwise population differences in SNPs. “Rescaled” here means that the results were adjusted for sample size, which differed between populations; the table also gives the values after a Bonferroni correction.
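The Bonferroni correction is simple arithmetic: to keep the chance of even one false positive across n tests at or below α, each test must clear α/n. A minimal sketch, taking the ~350,000 SNPs in this study as the number of tests (an approximation; the paper’s exact count of tests may differ):

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff keeping the family-wise error rate at or below alpha."""
    return alpha / n_tests


print(f"{bonferroni_threshold(0.05, 350_000):.2e}")    # 1.43e-07
print(f"{bonferroni_threshold(0.05, 1_000_000):.0e}")  # 5e-08
```

The familiar genome-wide cutoff of 5 × 10^-8 corresponds to correcting for roughly a million independent tests.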


The pairwise differences are what you’d expect from the PCA. Most of the between-population difference is probably due to history; through isolation by distance, populations random walk into their own gene frequencies. But there’s more to the story than that, as is clear in Table 2.
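The “random walk” of allele frequencies is genetic drift, and a Wright-Fisher simulation shows how isolated populations drift apart. The sketch assumes two hypothetical populations of 500 diploid individuals each, no migration, mutation, or selection, and 100 generations of separation from a shared starting frequency of 0.5. Under these assumptions the expected variance of a population’s allele frequency around its starting value is p0(1 - p0)[1 - (1 - 1/2N)^T], which the simulation should approximately reproduce. Isolation by distance adds migration between neighbors, which slows this divergence for nearby populations; that is not modeled here.

```python
import numpy as np


def drift(rng, p0, n_diploid, generations, n_loci):
    """Wright-Fisher drift: each generation draws 2N allele copies binomially from the last."""
    p = np.full(n_loci, p0)
    for _ in range(generations):
        p = rng.binomial(2 * n_diploid, p) / (2 * n_diploid)
    return p


rng = np.random.default_rng(4)
N, T, p0, n_loci = 500, 100, 0.5, 5000
pop_a = drift(rng, p0, N, T, n_loci)  # two isolated populations with the
pop_b = drift(rng, p0, N, T, n_loci)  # same hypothetical ancestral frequency
theory = 1 - (1 - 1 / (2 * N)) ** T   # expected Var(p_T) / [p0 (1 - p0)]
observed = np.mean(np.concatenate([pop_a - p0, pop_b - p0]) ** 2) / (p0 * (1 - p0))
print(round(theory, 3), round(float(observed), 3))
print("mean |p_a - p_b| after", T, "generations:", round(float(np.mean(np.abs(pop_a - pop_b))), 3))
```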


As noted by the authors, genes in specific categories or classes are overrepresented among those with large between-population differences. In particular, they focus on genes related to immune function and pigmentation. The reason for variation in the former is relatively straightforward: research on patterns of natural selection in the human genome has long pinpointed loci implicated in immune function as having been particularly shaped by selection, no doubt because disease resistance has a major impact on reproductive fitness. Additionally, it seems likely that immune-related function is constantly being buffeted by selection because of the prominence of frequency-dependent dynamics. As for pigmentation, it has also shown up as a major target of natural selection in many of the more recent papers, and it’s a trait whose genetic architecture we now have a reasonably good grasp of. They also found that the NAD synthetase 1 gene was stratified. They note that it affects metabolism and has been found to have a relationship to the disease pellagra. Loci related to diet also seem to be disproportionately affected by natural selection, and that stands to reason, as the shift to agriculture was relatively recent and many populations may still be going through transients (e.g., gluten sensitivity). The population densities and diets of European populations vary a great deal even today. Italy is roughly an order of magnitude more densely populated than Sweden, and this has likely been the case for many millennia due to differences in primary agricultural productivity. Finally, the authors observe that FOXP2 is also stratified. This is the famous “language gene,” which regularly makes the press every few years. The short of it is that FOXP2 seems to be involved in complex vocalization, and to have been subject to selection in tetrapod lineages where vocal ability is pronounced (birds, humans, etc.). They don’t make much of the variation in the paper, but it seemed worth noting that the gene had popped up in their tests.

The authors freely admit that their findings are provisional:

Our paper focuses on the top 11 loci and suggests plausible mechanisms for most of them. However, the total number of genome-wide significant SNPs is 150,000 and the top hits clustered in several GO categories. We cannot judge which ones are due to the effects of selection or to other mechanisms. We present a full list of genes with the best and median p values for SNPs within them (separately for the full sample and for controls only), so that others can make use of this information in future studies…

Citation: Moskvina V, Smith M, Ivanov D, Blackwood D, St Clair D, Hultman C, Toncheva D, Gill M, Corvin A, O’Dushlaine C, Morris DW, Wray NR, Sullivan P, Pato C, Pato MT, Sklar P, Purcell S, Holmans P, O’Donovan MC, Owen MJ, & Kirov G (2010). Genetic Differences between Five European Populations. Human Heredity, 70 (2), 141-149 PMID: 20616560


A follow up to the post below, see John Hawks, Selection’s genome-wide effect on population differentiation and p-ter’s Natural selection and recombination. As I said, it’s a dense paper, and I didn’t touch on many issues.

Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"