The Unz Review - Mobile

The Unz Review: An Alternative Media Selection

A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

 Gene Expression Blog

Graham Coop’s group has been exploring the implications of more complex models of spatially structured genetic variation and admixture for the last few years. I’ve already pointed to Gideon Bradburd’s SpaceMix preprint, which attempts to differentiate genetic relatedness due to geographic proximity, and therefore continuous gene flow, from an admixture event which is not congruous with spatial position (e.g., the Norwegian Sami have more Siberian ancestry than many groups to their east). Alisa Sedghifar now has a paper out in Genetics, The Spatial Mixing of Genomes in Secondary Contact Zones. Here’s the abstract:

Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. These patterns of LD can be used to infer the timing of admixture, but the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations have come into contact and are mixing by diffusive local migration. Assuming that this secondary contact is recent enough that genetic drift can be ignored, we derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer age of contact zones using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. To demonstrate an application of our model, we use our approach to explore the fit of a geographic contact zone model to three human genomic datasets from populations in Indonesia, Central Asia and India and compare our results to inference under different demographic models. We obtain substantially different results to the commonly used model of panmictic admixture, highlighting the sensitivity of admixture timing results to the choice of demographic model.

In a stylized fashion, what’s going on here is that genome-wide data sets have allowed for the inference of admixture events, and the methods usually assume a single pulse of rapid random mating between two extremely diverse populations. This works in a controlled laboratory situation, but is less plausible for humans. There are cases which fit, such as the settlement of Pitcairn by the mutineers from the Bounty, but they’re exceptional (another case might be the admixture you see in some areas of Latin America from Amerindians, where the indigenous groups seem to have disappeared after a few generations, but it turns out that native women were assimilated into the European and African populations in a very short period of time). An alternative scenario is one where two populations come into contact, and admixture takes a longer period of time. In a spatial rendering there’d be a “contact zone” where gene flow might occur in a fashion well modeled as a diffusion process. To give a concrete example of the latter case I will offer the Kalmyk people. The Estonian Biocentre has posted some data from this population, and all of them have varying levels of European admixture. As there is variance it is likely that this admixture did not happen all at once. Rather, since the Kalmyks migrated to Russia three hundred years ago there has been continuous gene flow into the community, as opposed to a frenzy of admixture, after which barriers might be thrown up. The latter scenario actually might have been likely in a case where only male Kalmyks migrated, but as it was, this was a full folk wandering, where the tribes evacuated Dzungaria as a whole (I am aware that there were also back migrations, please don’t leave a comment explaining this to me!).


Citation: Moreno-Estrada, Andrés, et al. “Reconstructing the population genetic history of the Caribbean.” (2013): e1003925.

So what happens after an admixture event? As noted in this paper, assuming a simple pulse admixture, the distribution of ancestry tract lengths and the decay of LD are both exponential. This is a function of the fact that recombination is going to break apart ancestral multi-locus allelic associations as a function of generation time. As an extreme example, the F1 offspring of two very different populations would have alternative ancestry tracts on their paternal and maternal chromosomes. Obviously LD would be very high as well. But as the F1 population randomly mates the LD would be broken apart by recombination, as ancestry tracts would begin to alternate on chromosomal segments. You can see it when you perform ancestry deconvolution on groups such as Puerto Ricans. There are short segments due to old Native American ancestry, which entered the population over a narrow period of time and has since been chopped up by recombination. In contrast, the African segments have a wider range of block lengths, in part because there has been more continuous admixture since the settlement of the island by the Spaniards.
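To make the single-pulse expectation concrete, here is a minimal sketch. It assumes the textbook approximation that admixture LD between loci d Morgans apart decays as exp(-t·d) after t generations, so the negated slope of log(LD) against distance estimates t. The function names, the distance grid, and the pulse age of 20 generations are illustrative assumptions, not values or code from Sedghifar et al.

```python
import math

def pulse_ld(d_morgans, t_generations, amplitude=1.0):
    """Expected admixture LD at genetic distance d (Morgans), single pulse t generations ago."""
    return amplitude * math.exp(-t_generations * d_morgans)

def fit_pulse_age(distances, ld_values):
    """Least-squares slope of log(LD) on distance; the negated slope estimates t."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_l = sum(logs) / n
    cov = sum((d - mean_d) * (l - mean_l) for d, l in zip(distances, logs))
    var = sum((d - mean_d) ** 2 for d in distances)
    return -cov / var

distances = [0.005 * i for i in range(1, 21)]  # 0.5 cM to 10 cM, expressed in Morgans
true_t = 20                                    # hypothetical pulse age in generations
curve = [pulse_ld(d, true_t) for d in distances]
estimated_t = fit_pulse_age(distances, curve)

# Rough rule of thumb (an approximation): tracts of a minor ancestry
# average on the order of 1/t Morgans, i.e. 100/t centimorgans.
mean_tract_cm = 100.0 / true_t
print(estimated_t, mean_tract_cm)
```

If admixture was actually continuous, the observed curve is a mixture of such exponentials with different ages, and a single-exponential fit of this kind is pulled toward the more recent gene flow.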

Sedghifar et al. are building an analytical framework to allow one to make inferences which are hopefully truer to the multi-textured manner in which populations actually admix than the single pulse is. As the paper is open access I invite readers to peruse the formalism as well as the simulations which were performed to evaluate their framework. It strikes me that this is a definite first pass, but a necessary one. As noted in the paper, but well known for years, the single pulse admixture models tend to underestimate the dates of mixing (or, more charitably, they pick up the last “pulse”). So, often when I saw a paper giving an admixture estimate, I took that as a floor, and nothing more.

In the final section the framework is applied to real data sets. There are two issues that jump out into the foreground. As noted by the authors, the HUGO Pan-Asian data set, which is what you need to use for many maritime Southeast Asian groups, has very few markers. At ~50,000 SNPs it’s really an animal grade set of chip data, not a human grade one (and even for animals they’re going beyond 60K SNP-chips). The second issue is geographical coverage. It strikes me that ideally they’d have transects with denser sampling by position. This obviously isn’t something that can be changed right now, so I assume that in the future the situation will improve on the data side and the methods can be applied more robustly.

They compared admixture in India, Southeast Asia, and Central Asia. It seems that their framework did not yield much in India, probably because the admixture patterns are complex and old, and could not be easily retrieved from the data with the few assumptions they had (though that in itself tells you something about the real dynamics). This is really a situation where hopefully ancient DNA will allow researchers to fix some parameters in the future. There are cases where compound pulse admixtures are actually a better model for reality than contact zones and diffusion gene flow across the borders. India may be an instance. For example, the Tamil Brahmins seem to have some indigenous South Indian admixture, but very little variation of this admixture across individuals. That implies that once the admixture occurred, there was a long period where gene flow did not occur due to strict endogamy, else you’d see more variation. In a world unencumbered by social constraints a contact zone model would work well, but South Asia may not be that world.

As expected, the secondary contact zone model gave an older date of admixture for Southeast Asia, where Austronesians arrived over the last 4,000 years. Perhaps even too old! They note: “Linguistic evidence suggests that the Austronesian expansion through Indonesia dates to ∼ 4000 years ago (Gray et al. 2009)…Our estimate of timing based on fitting a geographic contact zone (5800 years ago) is much older than dates estimated by single pulse models, but is also considerably older than the Austronesian expansion.” The citation for Gray et al. seems to be this paper, but I’m pretty sure it was meant to be Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. In an interdisciplinary field like this you need to rely on other researchers to complement your own understanding of specific domains. As it happens I am now more skeptical of linguistic phylogenetics than I was, so I don’t put too much stock in the fact that the date inferred using their methods was much older than what the linguists believe. Rather, I’d put more of an emphasis on material remains and archaeology, though dating and provenance can be hard to pin down on some occasions.

The last empirical illustration has to do with Central Asia, and I have a bit to say about this. The authors seem to be concerned that their signal of admixture is much older than the period of the Mongol invasions, ~700 years ago. Some other studies, based on a pulse admixture model, pin this exact date, and others do not. The problem I have with this is that the real demographic history actually aligns well, in my opinion, with the dates that are given in this paper. I don’t think they needed to take the Mongol model nearly as seriously as they did. But I doubt that any Central Asianists peer reviewed this for Genetics, so a lot of weight was probably given to the older papers in genetics, where the Mongol angle is always played up. The reality is that there was a massive continuous movement of Turkic peoples from about 500 A.D. from greater Mongolia down into the Persianate world of Central Asia. While in India a contact zone model may not work well due to a history of endogamy, the situation is more amenable to that in Central Asia. I think further extensions of this framework in Inner Asia will be fruitful and necessary.

Though the authors here focus on human data sets, presumably because there was data and we know something about human demographic history, the secondary contact zone model formalized in this paper may be more useful with populations of animals and plants, where social constraints don’t exist to enforce endogamy (unless you count reinforcement!). Also, it probably will be useful in island situations, such as in Japan, where the migration patterns are probably defined by a single admixture followed by a wave of advance which likely had secondary contact zone dynamics (the Ainu have Yayoi ancestry).

Citation: Sedghifar, Alisa, et al. “The Spatial Mixing of Genomes in Secondary Contact Zones.” Genetics (2015).

• Category: Science • Tags: Genomics

This really distills important aspects of human male behavior.

• Category: Miscellaneous • Tags: South Park

When every song sounded like this? We do.

• Category: Miscellaneous • Tags: Music


The bar plot above shows the Kalash people in yellow as a very distinctive group among a panoply of Eurasian populations. The figure is from a Rosenberg lab paper. There’s nothing aberrant about this result; you can generate this plot pretty easily by using any motley set of markers. The Kalash are distinctive. But it is important to keep the distinction in perspective. They’re not a relic population, remnants of an ancient race lost to time and memory. Rather, they happen to be a highly diverged northwest South Asian group. Their divergence is due to a small isolated breeding population which has been highly endogamous.

What this means is that the Kalash have a low long term effective population and have been more strongly impacted by drift in their allele frequency spectra. Small populations are subject to great allele frequency volatility generation to generation, and tend to lose a lot of their genetic diversity, and also fix many alleles. One consequence of this is genetic inbreeding and a higher recessive disease load. These populations with a lot of drift will have less efficacy of selection in removing deleterious alleles, and if a recessive expressing variant is fixed, then that’s that.
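The volatility of small populations is easy to see in a toy Wright-Fisher simulation. This is only a sketch: the population sizes, number of generations, and replicate counts below are arbitrary assumptions for illustration, not estimates for the Kalash or any real group.

```python
import random

def wright_fisher(pop_size, start_freq, generations, rng):
    """Allele frequency after binomially resampling 2N gene copies every generation."""
    copies = 2 * pop_size
    freq = start_freq
    for _ in range(generations):
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
    return freq

def spread_of_outcomes(pop_size, replicates=100, generations=50, seed=1):
    """Variance of the final frequency across replicates, and how many replicates fixed or lost the allele."""
    rng = random.Random(seed)
    finals = [wright_fisher(pop_size, 0.5, generations, rng) for _ in range(replicates)]
    mean = sum(finals) / replicates
    variance = sum((f - mean) ** 2 for f in finals) / replicates
    fixed_or_lost = sum(1 for f in finals if f in (0.0, 1.0))
    return variance, fixed_or_lost

small_var, small_fixed = spread_of_outcomes(pop_size=25)    # small, isolated population
large_var, large_fixed = spread_of_outcomes(pop_size=250)   # ten times larger
print(small_var, small_fixed, large_var, large_fixed)
```

Starting from the same frequency, the small population scatters much more widely across replicates and loses or fixes the allele far more often, which is the loss of diversity described above.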

But another major consequence of strong drift on a population, so that everyone is quasi-related for all practical purposes, is that when you attempt some sort of clustering they fall out as a very natural grouping. They’re low hanging fruit. When you plot populations on a PCA you normally remove closely related individuals, because they will naturally form a tight cluster and overwhelm the between population variation you’re looking for, hogging up all the highest dimensions by making them distinct from non-relatives. Inbred groups like the Kalash do the same thing, if less boldly so. If you can keep this in mind it will allow for proper inferences about the natural history of a population. If you can’t, then you will be confused.

This is a preface to a nice paper in PLOS GENETICS, Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference, which reports that an earlier publication, Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool, did not control for the effect of drift due to endogamy and so came to the wrong conclusion.* I won’t repeat the methods they used, as the paper is open access. But they account for drift much better, and show that the divergence of a presumably genetically distinct caste had much more to do with increased drift due to endogamy than it did with the separation of the two lineages at some time in the distant past. Remember, drift builds up between any pair of lineages which separate. But if the population size in one of the daughter lineages is very low, then drift will shift it away from the ancestral frequency spectra much faster, producing an artificially “long branch.”
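The “long branch” effect follows from a standard result: in a constant-size population of effective size N, the expected drift accumulated over t generations is F = 1 - (1 - 1/(2N))^t. The sketch below uses hypothetical population sizes and split times chosen only to illustrate the point; they are not estimates from the Ari paper.

```python
def accumulated_drift(effective_size, generations):
    """Expected drift F on a lineage: 1 - (1 - 1/(2N))^t, assuming constant size N."""
    return 1.0 - (1.0 - 1.0 / (2.0 * effective_size)) ** generations

# Hypothetical scenarios (all numbers are assumptions for illustration).
endogamous_caste = accumulated_drift(effective_size=500, generations=150)     # recent split, small N
large_neighbour = accumulated_drift(effective_size=10000, generations=150)    # same split time, large N
old_large_split = accumulated_drift(effective_size=10000, generations=3000)   # much older split, large N

print(endogamous_caste, large_neighbour, old_large_split)
```

Because drift depends on the ratio t/(2N), the small lineage that split 150 generations ago carries about as much drift as a large population that split 3,000 generations ago. A method that reads branch length as time will therefore mistake recent endogamy for ancient separation.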

The Kalash and the Ari are extreme cases of this. But they illustrate the general principle that we should be cautious about making inferences when we don’t control for the vicissitudes of demographic history, which may skew the power of our methods to see in a fair and balanced manner.

* There’s an overlap of authors across the two publications, showing that scientists do and can overturn their own conclusions if new data or analysis can persuade them.

• Category: Race/Ethnicity, Science • Tags: Genetics

When is a jackal a wolf? All the time apparently. At least according to a persuasive new paper, Genome-wide Evidence Reveals that African and Eurasian Golden Jackals Are Distinct Species.

First, let’s put this in context. Canids are a big deal. They’re big social mammals whose distribution and speciose character have undergone big changes across the Pleistocene. Sound familiar? Is it any surprise that one of their kind is our “best friend”? And, according to the anthropologist Pat Shipman, the symbiotic relationship between dog and man is responsible for the victory of our lineage of hominins in the evolutionary war of all against all. About six months ago that thesis would have seemed a stretch, as the origin of dogs does not date until almost the Holocene according to most genetic scholarship (the paleontologists have found rather old suggestive skulls, though). That would be tens of thousands of years after modern humans replaced other lineages. But ancient DNA suggests problems with the calibration of earlier work, which may have dated their divergence from wolves too recently. That and the fact that the emergence of dogs as a distinct group of canids might be concurrent with the arrival of modern humans in Eurasia make Shipman’s thesis at least feasible, if not probable. And note that I stated divergence from wolves, not derivation. It turns out that dogs are a sister lineage to Palearctic wolves, not derived from them. As observed in this paper, extant lineages of wolves are genetically rather homogeneous, and seem to have diversified relatively recently, within the last 20,000 years, on the order of 10 to 20 thousand years after the last common ancestor of extant wolves and dogs.

Where do jackals play into this? The golden jackal has a distribution which covers both Eurasia and Africa. The species was determined morphologically. In other words, they look similar across their range. But sometimes you can’t judge a book by its cover. As an obvious example, most people would think on superficial inspection that a hyrax was a rodent. But a close examination of anatomical details indicated a relationship to elephants to classical taxonomists, which has been validated by DNA. In the jackal case, as the paper above states plainly in the title, the DNA contradicts inferences made from morphology. Wolves and dogs, and African golden jackals, form a monophyletic lineage, to which Eurasian golden jackals are an outgroup! This determination was achieved through mtDNA analyses, as well as phylogenetic reconstruction from specific genetic regions, and genome-wide comparisons on millions of polymorphisms.

But wait there’s more! One major difference between the example above of the hyrax vs. elephant and jackal vs. wolf is that the phylogenetic distance in the latter case is far smaller across the tips of the branch. That probably explains why morphological characters were not sufficient to discern the shared ancestry and derived characteristics of the wolf and the African jackal, as opposed to the Eurasian jackal. And, a corollary to this is that hybridization between these lineages is possible. In other words, this isn’t a phylogenetic tree, it’s a phylogenetic graph! Using D-statistics the authors show that there has been a fair amount of gene flow between Eurasian wolves and Eurasian jackals. And, in particular a lot of admixture from the Eurasian jackal to the dingo and basenji breeds.
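For readers who haven’t met D-statistics, here is a minimal sketch of the ABBA-BABA count they are built on. For populations (P1, P2, P3, outgroup), D = (nABBA - nBABA) / (nABBA + nBABA); under a clean tree with no gene flow the two patterns should be equally common, so D is expected to be near zero. The site patterns below are made-up toy data, not canid genotypes, and real analyses use allele frequencies and block jackknife standard errors, which this sketch omits.

```python
def d_statistic(sites):
    """ABBA-BABA D for (P1, P2, P3, outgroup); each site is a 4-tuple of 0 (ancestral) or 1 (derived)."""
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if outgroup != 0:
            continue  # keep only sites where the outgroup carries the ancestral allele
        if (p1, p2, p3) == (0, 1, 1):
            abba += 1  # derived allele shared by P2 and P3
        elif (p1, p2, p3) == (1, 0, 1):
            baba += 1  # derived allele shared by P1 and P3
    total = abba + baba
    return (abba - baba) / total if total else 0.0

# Toy data: 30 ABBA sites, 10 BABA sites, 50 sites that fit the tree ((P1, P2), P3).
toy_sites = [(0, 1, 1, 0)] * 30 + [(1, 0, 1, 0)] * 10 + [(1, 1, 0, 0)] * 50
d_value = d_statistic(toy_sites)
print(d_value)
```

A clearly positive D, as in this toy case, points to excess allele sharing between P2 and P3, which is the kind of signal interpreted as gene flow between lineages.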

Is this starting to sound a bit familiar? As population genomics has increased coverage of human populations, modern and ancient, as well as increasing marker density and accuracy, first approximation coarse phylogenetic trees have given way to threads of gene flow edges tracing their way across the thick branches. The trees have given way to myriad graphs which force us to make more subtle our understanding of the genetic background of our own lineage. I see no reason why the same will not be true for large mammals, or, frankly, an innumerable number of clades.

In the near future sequencing will be ubiquitous in ecological and systematic studies. At the coarsest big picture scale we’ll still see a confirmation of the tree of life as it’s classically envisioned, exploding outward from node to node, in subdivisions of clean monophyletic lineages, pruned by extinction, diversified by drift and selection. But as you focus in closely the bifurcations will turn in on themselves or thread together in a tangle, as the branches begin to be stitched together by gene flow. Look even closer and you’ll see that even within a young species, like humans, our local geographic pedigrees also collapse in on themselves, and tangle and coalesce down to a finite set of individuals, rather than the infinite space of genealogical possibilities.

• Category: Science • Tags: Genetics, Science

Japan orange, Taiwan navy, South Korea green, China light blue

Update: On Twitter it came to my attention that some think that this post is about growth. Actually, my point is that the Communist period, and Mao’s period of domination, with the Great Leap Forward and the Cultural Revolution, probably represent huge decrements to utility over the 20th century which the Chinese are only now compensating for. I think a KMT China, even if it had unified less quickly and thoroughly than Communist China did, would probably have resulted in a far more prosperous China far earlier than in our “timeline.” Perhaps not as prosperous as South Korea, and definitely not Japan, but still quite prosperous over the past three generations in comparison to Communist China when state socialism was the dominant motor of the economy. Ergo, look not at the growth itself but at the “area under the curve” from 1950 on.

Organized international Communism was responsible for on the order of tens of millions of deaths in a direct and concerted fashion, conservatively estimated. It also resulted in decades of repression for those who lived under it, but did not die under it. It fell with the Soviet Union, and today post-Communist (e.g., Russia) and quasi-Communist (e.g., China) nations are trying to move on beyond what was by and large a failed experiment in social engineering, with the failure resulting in massive levels of mortality and reduced life satisfaction on the part of those who lived under Communist regimes.

But can we move on? I have noted before that over the past generation in the aggregate Chinese economic development has resulted in the greatest reduction in poverty in the history of the world. With the economic crisis which is starting to afflict China, in all likelihood a deceleration from the very rapid growth phase induced by increased labor and capital inputs is upon us, and people are wondering about the long term trajectory of the nation. The problem is that China may grow old before it grows rich. The Chinese total labor force already peaked a few years ago. Over the next few decades its dependency ratio will shift in a direction similar to Japan’s. I am hopeful that the Chinese can meet their demographic challenges, and there are those who are optimistic. But we really don’t know.

And yet it has been brought to my attention that one could argue the Communist period in China is the cause of our current predicament. Compare the wealth trajectories of South Korea and Taiwan to the People’s Republic of China. It may be that for various reasons (e.g., Japanese investments in Korea and Taiwan, as well as differences between China’s Han population and the Fujianese preponderant in Taiwan) China under a non-communist regime would never have been as wealthy as South Korea or Taiwan are today. But does anyone doubt that China would be wealthier far earlier without the convulsions of the Great Leap Forward, Cultural Revolution, and grinding poverty of the 1970s? A billion people experienced deprivation due to the miscalculations of elite intellectuals in the mid-20th century, when Communism fused with nationalism was on the march. That’s behind us. But the late economic start for China is something we continue to live with today. We might have avoided this problem of China growing old before it grows rich, if it had a 30 year head start toward entrance into the modern economy. The world might have been a very different place…. (in fact, a best case scenario is that a dynamic China would have prodded India’s Permit Raj to liberalize earlier than the 1990s).

• Category: Economics, Foreign Policy • Tags: Economics

Update: I think Richard Stallman left a comment on my blog!!! OMG.

I remember very precisely that it was in the spring of 2008 that I finally transitioned toward being a total desktop Linux user. Basically I’d been in Linux for a few days, forgot, and tried to watch something on Netflix streaming. I then realized I wasn’t in Windows! Now that Netflix works on Ubuntu I don’t really use Windows at all. I still have a dual-boot notebook, but I have two desktop computers that are Linux-only machines.

Well, it looks like I’m somewhat of an outlier. I think the rise of Mac utilization among nerds over the past 10 years has really had an effect. Since you can go into the terminal on a Mac it removes a lot of the advantage of Ubuntu, which after all is still somewhat less “turn-key” than Windows or Mac OS.

Then of course there’s Android. So in a way Linux has won. Just not in the way people were imagining in the mid-2000s.

• Category: Miscellaneous • Tags: Technology

I get notifications for a lot of different things. Some of them are way off base (e.g., I think at some point a publicist sent me the contact information of a client who had written a book on homeopathy?). But some of them are spot on. I drink a lot of coffee. On the order of two to six large cups a day probably. So I was curious when I got a notification of a new KickStarter, Cultured Coffee: Reinventing Coffee. Of course for many people Folgers instant coffee will do, but that’s probably not most people who have the marginal income to engage in discretionary spending on coffee made from recently ground whole beans. Definitely curious where this will all go in the long run, though unlike the horrible “cupcake craze” it seems that coffee is here to stay.

The author of A New History of Western Philosophy admits a fondness for medieval thought, which he believes has been undervalued. There is something of a backlash to the Renaissance way of thinking about the “Middle Ages” recently, but it can get a little out of control. I have to be honest and admit that for whatever reason, many, though not all, medieval philosophers and their thoughts seem to be no more than hilarious language games. Much of this has to do with the fact that their metaphysics were different from what we take for granted today. Or, perhaps more accurately, they took metaphysics seriously, so their linguistic analyses of terms were a very serious affair for them.

But I’m taking a break to check out Michael Cook’s Ancient Religions, Modern Politics: The Islamic Case in Comparative Perspective. I got this on the recommendation of T Greer, and he is correct that I disagree with some of the premises. But, it’s pretty interesting and detailed on facts. I doubt I’ll finish it before getting back to A New History of Western Philosophy, but it will probably be a quick read once I have time.

• Category: Miscellaneous • Tags: Open Thread

This probably isn’t aimed at most readers, but I think it’s important to pass the word around: Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. An ungated version. From the abstract:

The identification of the genetic structure of populations from multilocus genotype data has become a central component of modern population-genetic data analysis. Application of model-based clustering programs often entails a number of steps, in which the user considers different modelling assumptions, compares results across different predetermined values of the number of assumed clusters (a parameter typically denoted K), examines multiple independent runs for each fixed value of K, and distinguishes among runs belonging to substantially distinct clustering solutions. Here, we present clumpak (Cluster Markov Packager Across K), a method that automates the postprocessing of results of model-based population structure analyses. For analysing multiple independent runs at a single K value, clumpak identifies sets of highly similar runs, separating distinct groups of runs that represent distinct modes in the space of possible solutions. This procedure, which generates a consensus solution for each distinct mode, is performed by the use of a Markov clustering algorithm that relies on a similarity matrix between replicate runs, as computed by the software clumpp. Next, clumpak identifies an optimal alignment of inferred clusters across different values of K, extending a similar approach implemented for a fixed K in clumpp and simplifying the comparison of clustering results across different K values. clumpak incorporates additional features, such as implementations of methods for choosing K and comparing solutions obtained by different programs, models, or data subsets. clumpak, available at, simplifies the use of model-based analyses of population structure in population genetics and molecular ecology.

The website deploys the package as a web based application (kind of like Structure-harvester). I don’t do GUIs, but I thought I would mention it (the package is downloadable). I’ve shied away from posting admixture barplots not because I think they lack utility. Rather, over the years readers have had a hard time understanding their limitations, and tend to reify a little too much for my taste.
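For readers wondering why replicate runs need aligning at all, the underlying issue is label switching: two runs can find the same clusters but number them differently. The toy sketch below aligns one run’s cluster columns to another’s by brute-force search over permutations. It is not CLUMPP’s or CLUMPAK’s actual algorithm, the membership matrices are invented, and brute force is only practical for small K.

```python
from itertools import permutations

def align_clusters(reference, run):
    """Reorder the columns of `run` to best match `reference` (rows = individuals, columns = clusters)."""
    k = len(reference[0])

    def similarity(order):
        # Negative sum of squared differences, so a larger value means a better match.
        return -sum((ref_row[c] - run_row[order[c]]) ** 2
                    for ref_row, run_row in zip(reference, run)
                    for c in range(k))

    best = max(permutations(range(k)), key=similarity)
    return [[row[j] for j in best] for row in run], best

# Invented membership coefficients for three individuals at K = 3.
run_a = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
run_b = [[0.1, 0.0, 0.9], [0.8, 0.1, 0.1], [0.2, 0.8, 0.0]]  # same solution, clusters relabeled

aligned_b, column_order = align_clusters(run_a, run_b)
print(aligned_b, column_order)
```

Once runs are aligned like this, it becomes meaningful to ask whether they agree (one mode) or represent genuinely different solutions (multiple modes), which is the distinction the abstract describes.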

• Category: Science • Tags: Genomics


I was playing around with some data, and I saw a strange migration from Amerindians to Finns in one run. I looked through replicate runs and the same pattern reoccurred. The weird thing is that I had a Siberian data set in there (Nganassan, Koryak, and Chukchi). The Amerindians were a mix of Pima, Maya, and a few 1000 Genomes Peruvians.

To explore this further I ran TreeMix with progressively fewer populations. I got the Ancient North Eurasian genotype and also put it in there. Using various quality filters I got down to 112,000 SNPs. All of the plots are here, but representative ones are below.

As you can see, whatever I saw was an artifact, probably due to merging the various Siberian populations together. Now there is a gene flow edge from at least near the Nganassans and Nenets toward the Finnic groups, or from the Finnic groups. The relationship of Mal’ta is complicated by the fact that it’s so old, and the population structure of North-Central Eurasia seems to have changed several times over the past few tens of thousands of years.


• Category: Science • Tags: Genomics

About ten years ago David Reich and Nick Patterson were involved in a paper which posited “complex speciation” in the lineage that led to humans and chimpanzees. What that means is that there was some hybridization between the proto-chimp/bonobo lineage and the one leading to hominins. As the authors state: “These unexpected features [of the genome] would be explained if the human and chimpanzee lineages initially diverged, then later exchanged genes before separating permanently.” The primary result happens to be a disjunction between the patterns you see in the broader genome, the autosome, and the X chromosome. The divergence on the X chromosome is far less than it should be if you set your expectation from the autosome, suggesting that it harbors signatures of recent gene flow across the two lineages.

A new paper in PLOS GENETICS, Strong Selective Sweeps on the X Chromosome in the Human-Chimpanzee Ancestor Explain Its Low Divergence, offers up a different set of possibilities. One of the authors, Thomas Mailund, has a write-up of where they were going with this paper and how they got there. Definitely read what he has to say.

The crux of the issue seems to be that the diversity on the X chromosome varies in a peculiar manner. In particular, incomplete lineage sorting, basically the overlap of variation across two species due to common ancestral alleles, seems to exhibit a bimodal distribution on the X chromosome (the bottom panel above). Going beyond just a chromosome-wide summary or average, the authors found that there were huge deserts where variation was gone, in contrast with broad swaths of the X chromosome where the variation is totally in line with roughly neutral assumptions (i.e., the effective population of the X chromosome is ~3/4 of the autosome, so that increases the power of drift, etc.).
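The ~3/4 figure comes from counting chromosomes: a breeding pair carries four copies of each autosome but only three X chromosomes. A small sketch using the standard formulas for unequal numbers of breeding males and females shows that 3/4 is the equal-sex-ratio case. The population numbers below are hypothetical, and the function names are my own.

```python
def autosomal_ne(breeding_males, breeding_females):
    """Standard effective size for autosomes with unequal sex ratio: 4*Nm*Nf / (Nm + Nf)."""
    return 4.0 * breeding_males * breeding_females / (breeding_males + breeding_females)

def x_linked_ne(breeding_males, breeding_females):
    """Standard effective size for the X chromosome: 9*Nm*Nf / (4*Nm + 2*Nf)."""
    return 9.0 * breeding_males * breeding_females / (4.0 * breeding_males + 2.0 * breeding_females)

def x_to_autosome_ratio(breeding_males, breeding_females):
    """Ratio of X-linked to autosomal effective size."""
    return x_linked_ne(breeding_males, breeding_females) / autosomal_ne(breeding_males, breeding_females)

equal_ratio = x_to_autosome_ratio(5000, 5000)    # equal numbers of breeding males and females
skewed_ratio = x_to_autosome_ratio(1000, 5000)   # hypothetical: far fewer breeding males
print(equal_ratio, skewed_ratio)
```

Demography alone can push the ratio above 3/4 when fewer males than females breed, but it shifts the whole chromosome together. It does not by itself produce the patchwork of deserts and normal regions the authors describe.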

Why this pattern? One explanation could be background selection. This is basically the removal of deleterious alleles as they arise, often resulting in reduced variation across a genomic region because of linkage. The X chromosome has a peculiar dynamic because in males normally recessive alleles, whether favored or disfavored, are subject to the full force of selection (since most recessive mutations are deleterious, they’d be purged more effectively). But background selection is a relatively gentle and continuous process. The width of the flanking regions impacted by selection against a focal mutant should be modest. What they found was that there were huge genomic blocks without any segments of incomplete lineage sorting in humans and chimpanzees. That is, variation was removed in some portions of the genome and not others. One process that can cause this is a positive selective sweep. The authors posit there were many of these to explain how many regions of the X chromosome seem to have been affected.

What was driving these sweeps? At this point they’re really tentative. But they suggest meiotic drive. Meiotic drive is pretty famous from the deleterious t haplotype in mice, but there might be a major bias in when we see drive, because if it doesn’t have a deleterious drag it might result in such rapid sweeps to fixation that we won’t ever catch it in the act. It could be pervasive as a phenomenon, but we might have a skewed perspective of its basic nature.

Finally, they also report that these regions of reduced ILS correlate with regions of the X chromosome where there is very little Neanderthal admixture. So this might be part of a broader evolutionary dynamic among apes. Mailund promises more, and I’ll be waiting….

• Category: Science • Tags: Genomics

14 – And Moses was wroth with the officers of the host, with the captains over thousands, and captains over hundreds, which came from the battle.

15 – And Moses said unto them, Have ye saved all the women alive?

16 – Behold, these caused the children of Israel, through the counsel of Balaam, to commit trespass against the Lord in the matter of Peor, and there was a plague among the congregation of the Lord.

17 – Now therefore kill every male among the little ones, and kill every woman that hath known man by lying with him.

18 – But all the women children, that have not known a man by lying with him, keep alive for yourselves.

- King James Bible, Numbers 31

In the 20th century the Lithuanian archaeologist Marija Gimbutas posited that the emergence of pre-Christian European culture went through two phases after the Mesolithic. First, there were the Neolithic Old Europeans who brought agriculture. Then there were the Kurgan people from the steppe, who brought Indo-European languages and warlike patriarchal values to the continent. By the 1990s many archaeologists had turned against the Kurgan model of Indo-Europeanization, leaning rather toward the proposition that the Old Europeans themselves were Indo-Europeans. I believe that the latest work in genetics, utilizing powerful statistical inference techniques leveraging genomics and computational biology, and ancient DNA, suggests that Gimbutas was right in terms of the role of the Kurgan people as promoters of Indo-European culture in Northern Europe. Even those who supported the Kurgan hypothesis, such as David Anthony, were apparently shocked at the magnitude of the genetic turnover.

But where Gimbutas probably went very wrong is the idea that Old Europeans were a peaceful and matriarchal society. First, though there are matrilineal societies, and matrifocal societies, to my knowledge there are no matriarchal societies which are analogs to the patriarchies you might find in the modern Arab world or ancient Athens (and frankly, most agricultural and post-agricultural societies). Certainly there were societies where powerful women were shaping the course of events. This influence may even be institutionalized (I’m thinking of the Iroquois as one such case). But there were no societies where rulers were exclusively women and men were forced into roles of total passivity in matters of war, politics, and property as a class.

That’s the truism as informed by what we know from surveying cultures in the historical record and extant today. But there is a spectrum of empirical phenomena in terms of magnitude. During the Roman Empire the women of the Latin West continued to have liberties and freedoms that were customary for them during antiquity (the power of the Julio-Claudian women and Theodora seem less shocking when considering the public prominence of elite women during the Republican period, which some ascribe to the role of Etruscan women in their society). When the focus of Roman power shifted toward Constantinople in the 4th century, one visible marker distinguishing elite women of western cultural affiliation, as opposed to those who were of the Greek nobility, is that the latter were often veiled, perhaps echoing the seclusion of ancient Athenian women of good family.

Similarly, though Japanese civilization is influenced, perhaps even derived, in large part from Chinese civilization, one major distinction between the two is that in the ideal, and often in practice, the Chinese have subordinated military values to civilian ones to an exceptional extent for a pre-modern society. In contrast, the Japanese developed a military aristocracy which eventually superseded the civilian nobility. This results in the anachronistic romanticization of a martial ethos such as in bushido, which has no clear analogy in the Chinese world view. Obviously here I am not saying that the Chinese were a purely pacific people. And there were ages when martial values were ascendant, for example the early Tang. But the fact that the founder of the Song dynasty, a general, encouraged a demilitarization of his ruling class makes much more sense in light of the ethos of Chinese elite culture going back to the end of the Warring States period. In contrast, the Western aristocracy, often directly descended from Germanic warlords, have retained an ethos where physical violence and competition are more meritorious. The emergence of firearms necessitated a shift away from direct front-line combat to minimize casualties, and a channeling of energies into patronage of high culture and foppish self-cultivation. But even today the princes of the House of Windsor continue to serve in military professions, putting the role of the soldier in Western society in stark relief as one of esteem.

I bring this up to reiterate that though we see the past through a dark mirror, we must filter its probabilities through what we know of societies today, and those that are historically attested. Human phenomena are not infinitely flexible, but exhibit modal peaks across the distribution of possibilities. Our expectations should not be uniform and agnostic. The Old Europeans may have been gynocentric pacifists, but if they were then they were sui generis among human societies. As time machines are not feasible we will never truly know in a direct sense what they were like. Rather, we must look to aligning material remains with theoretical expectations given what we know about the nature of human societies. Interpretation will always occur. The key is to obtain the proper framework to generate true inferences. In Lawrence Keeley’s War Before Civilization the author observes how the objects which might be useful as weapons in graves have often been interpreted as “ritual” markers of status, as if conspicuous consumption was always the primary form of status competition. Written in the 1990s, War Before Civilization was a seminal work taking on head-on the neo-Rousseauan model, the idea that war was somehow a contingent invention of civilization. A terrible mistake.

A recent paper in PNAS puts the final nail in the coffin of this strong form of the neo-Rousseauan paradigm, which now has little support even from scholars such as Brian Ferguson. The paper is The massacre mass grave of Schöneck-Kilianstädten reveals new insights into collective violence in Early Neolithic Central Europe:

The Early Neolithic massacre-related mass grave of Schöneck-Kilianstädten presented here provides new data and insights for the ongoing discussions of prehistoric warfare in Central Europe. Although several characteristics gleaned from the analysis of the human skeletal remains support and strengthen previous hypotheses based on the few known massacre sites of this time, a pattern of intentional mutilation of violence victims identified here is of special significance. Adding another key site to the evidence for Early Neolithic warfare generally allows more robust and reliable reconstructions of the possible reasons for the extent and frequency of outbreaks of lethal mass violence and the general impact these events had on shaping the further development of the Central European Neolithic.

The body of the text engages in a deep osteological analysis, but in the language of the street, “they fucked these people up.” In particular, the victims seem to have had their lower extremities maimed or crushed. If they were still alive when this occurred then it was clearly a form of torture. If they were dead, then it was clearly a spiteful mutilation of the dead, and the valence has to be symbolic rather than utilitarian. The victims in the assemblage exhibited a curious demographic pattern. There were infants below one year of age, as well as young children, but no older children or adolescents. The only two adult women were over the age of forty. The rest of the adults killed were men.

We can’t know what happened with certainty. These were preliterate people. But with what we know about the nature of human culture it seems that an obvious narrative presents itself. As noted in the paper this was an LBK site. But it seems that the community was on the border of two LBK trade networks (as inferred from the distribution and character of material remains). On the frontier of agricultural production, when land was in surplus, one can imagine that there was little inter-group conflict between LBK coalitions, what we would probably term “tribes.” Additionally, there was almost certainly a “meta-ethnic frontier” with Mesolithic hunter-gatherers, who we now know were genetically and physically very distinct from the LBK people (naively projecting genetic variance statistics, their difference was in the same ballpark as that between modern Chinese and Northern Europeans, Fst ~ 0.10).

But what happens when Malthusian constraints begin to close in? In the Moral Consequences of Economic Growth Benjamin Friedman suggests that in American history economic stagnation and stress lead to greater xenophobia, and reduced openness. And one doesn’t need a deep history lesson to observe what occurred in Europe during the 1930s. Retrenchment invariably leads to turning back to collective units of organization and protection. Once the LBK reached a stationary state, which reduced marginal returns to labor input, and likely produced increased sensitivity to environmental perturbations, then it is entirely expected that “inter-group competition” would emerge as one of the ways in which the carrying capacity would maintain a “check” on numbers. Sedentary agriculturalists must scramble for scarce resources. There’s no running off, at least at this stage of social complexity.

The fact that the LBK turned on each other should condition our understanding of how the transition to the Corded Ware may have occurred. The Y chromosomes of the LBK period are very different from what we find in Bronze Age Europe. The most reasonable model I believe is that these lineages did not go silently into the night. As they did to each other, so was done unto them. In J. R. R. Tolkien’s work there are allusions to the coming Fourth Age of Middle Earth, an age of men. The rise of agricultural mass society was the age of men in our world. Hunter-gatherer societies were no idyll, but due to their small scale, and complementarity in economic production, the relationship between the sexes was not one of male domination, where women were property to be traded as chattel. But concentrated and sedentary units of economic production that arose with village life became an inevitable target of extraction from collective groups of males, who translated their significantly superior upper body strength into a reign of coercive terror. That coercion was translated into reproductive success, which is evident in the explosion of a finite set of Y chromosomal lineages on the order of 5,000 years ago. The common R1a1a ancestor of Daniel MacArthur and myself was the original O G thug.

In evolutionary genetics R. A. Fisher introduced the idea that when selection pressures come to bear upon a population, large effect mutations may increase rapidly in frequency to increase population mean fitness. But, these mutations are not without cost, one reason that they were likely at low frequency in the first place. For example, one of the most well known adaptations to malaria famously has a very large segregation load in terms of a recessive disease. Evolutionary theory predicts over time that the adaptation will be less genetically disruptive. New mutations which allow for adaptation without the costs may emerge, or, other mutations may arise to “mask” and “modify” the deleterious effect of the initially favored allele.
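To make the cost of such a large-effect adaptation concrete, here is a minimal sketch of the classic heterozygote-advantage model behind the malaria example. The fitness values are purely illustrative assumptions, not estimates for any real population, and the function names are my own.

```python
def next_generation(q: float, s: float, t: float) -> float:
    """One generation of selection on allele S.

    Genotype fitnesses: AA = 1 - s (malaria-susceptible),
    AS = 1 (protected carrier), SS = 1 - t (recessive disease).
    """
    p = 1.0 - q
    mean_fitness = p * p * (1.0 - s) + 2.0 * p * q + q * q * (1.0 - t)
    # S alleles after selection come from AS and SS genotypes.
    return (p * q + q * q * (1.0 - t)) / mean_fitness


def equilibrium_frequency(s: float, t: float) -> float:
    """Stable frequency of S under heterozygote advantage."""
    return s / (s + t)


def segregation_load(s: float, t: float) -> float:
    """Drop in mean fitness at equilibrium relative to an all-AS population."""
    return s * t / (s + t)


s, t = 0.1, 0.8  # assumed selection coefficients, for illustration only
q = 0.01         # start the protective allele at low frequency
for _ in range(2000):
    q = next_generation(q, s, t)

print(round(q, 4), round(equilibrium_frequency(s, t), 4))
print(round(segregation_load(s, t), 4))
```

The allele rises quickly from rarity but stalls at an intermediate frequency, and the population keeps paying the load every generation. That standing cost is what creates the opening for modifiers or cleaner replacement mutations.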

When John Maynard Keynes purchased the papers of Isaac Newton he was shocked at the proportion of the great physicist’s writings devoted to matters occult and esoteric. Keynes declared that Newton was the “last of the magicians, the last of the Babylonians and Sumerians, the last great mind which looked out on the visible and intellectual world with the same eyes as those who began to build our intellectual inheritance rather less than 10,000 years ago.” In opening the new age with his beautiful system of rational science, Newton nevertheless reflected an ancient ethos which persisted down into the modern period.

The Jewish people have been critical in the development of a universal ethical monotheism in the West, part of the broader evolution away from the supernatural systems of the Bronze Age that occurred across the Axial Age. But the Hebrew Bible preserves within it a world far removed from the divine Logos, a God of law and morality. The angry and jealous sky god of the Hebrews also enjoins upon them genocide of other tribes. Though the Hebrew Bible is pregnant with the possibilities of religious ethical universalism, the voice of the prophets’ righteous indignation raw with rage alive in our age, and channeled through the gentler voices of Hillel and Jesus, it also is a record of a parochial and peculiar people, who wash their hands of their atrocity by attributing it to the capricious and vindictive will of their god. If Moses and Joshua did exist, they almost certainly would have more in common with the war-chiefs of early Neolithic Europe, 4,000 years before their time, than men such as Constantine, who 1,300 years later promulgated a universal religion for a universal empire.

Ancient Egypt, with its autocratic god-kings, was arguably one of the end-points of the Neolithic experiment with mass culture and ideology. So were Shang China and Mycenaean Greece, with their human sacrifices to propitiate the gods. Increasing primary productivity by an order of magnitude, which farming did, resulted in the emergence of huge amalgamations of humanity, and we as a species are culturally creative enough to have come up with adaptations. Literacy, cities, and social stratification were all responses to the stresses and pressures that the opportunity of mass society presented. The emergence of powerful, menacing, and extortionate patrilineages was another. This was a world of gangs and thugs, and the question was not whether you would become a thug, it was whether you would be a thug or a victim of a thug. They were necessary, inevitable, cultural mutations against the background pressures that agriculture imposed upon humanity.

But as per Fisher’s model, mutants with deleterious consequences invite their own response. They are tamed and civilized by a scaffold of modifiers. The brutal gods which were but reflections of human vice and caprice were drafted in the service of primal human psychological impulses forged during the Paleolithic. Reciprocity and egalitarianism arose against the background of brutality beyond imagining unleashed by the social dislocation that was a consequence of agricultural society. The men and women shaped by the Hebrew prophets and Christian Church Fathers, the rishis of the Upanishads and the Chinese sages, are all closer to us 2,000 years later than they were to their own forebears only a few hundred years earlier in their own past.

These models operate in the world between one of naive innate cognitive reflexes and pure cultural inventions generated without reference to the functional constraints of our minds and environments. The independent experiment of the Aztec Mesoamerican society suggests that the same stage of brutal social order that had occurred during the Neolithic was playing out in the New World. The Aztecs were engaging in ritual cannibalism and human sacrifice in a manner not seen in Old World civilizations since the Bronze Age. Some inventions are inevitable, emergent properties of the intersection of our biobehavioral toolkit and our species’ incredible cultural flexibility. Though we may believe ourselves to be far beyond the LBK people, the Nazi gas chambers or the more recent events in Rwanda suggest that the same mental reflexes of coalition-building and competition can be co-opted toward organized violent ends even today. Peace is possible, but violence is always imaginable.

Addendum: This Azar Gat article argues for the reality of war among hunter-gatherers, extensively citing what we know about Australian Aboriginal culture on the eve of European settlement. It would indicate that the only thing separating our Pleistocene ancestors from ourselves in terms of violence would be scale and organization, with ideology a novel handmaid.

• Category: History, Science • Tags: Indo-Europeans, War
Credit: Graham Crumb


If population genetics is the “study of the distributions and changes of allele frequency in a population,” then the understanding of the maintenance of variation (or lack thereof) is one of the major topics of focus. In the first half of the 20th century, when there was a lot more theory than data, there were arguments about whether polymorphism (in this era they’re talking about classical markers) was maintained through balancing selection or whether it was just a transient phenomenon, and that at any given moment you’re just getting a snapshot of alleles sweeping up to fixation, or being purged out of the gene pool. In the second half of the 20th century it was all about neutral theory, and its discontents. Then the post-genomic era showed up, and geneticists had access to a lot of data and computational power to analyze it. Rather than relying on older molecular tests which were geared toward detecting inter-specific selection events, population geneticists began scouring haplotype structure.

But even now there’s a lot of mystery. First, you might be able to adduce that selection is highly likely in a given region, but you may have no clue what that region does functionally (in some cases the region may not even be genic, in which case it has to be a mysterious regulatory element). There are some good case studies where the mystery has cleared. Lactase persistence. The ways you can fight malaria. But over the past day I’ve been having to admit that it sure looks like the regions of the genome around pigmentation function are the targets of selection. But we don’t really know what selection is selecting for. And this is actually a set of selection events that I can imagine some day reaching a resolution into their probable cause. But we’re far from that.

A few years ago Eimear Kenny and company solved the mystery of why some Melanesian populations had very dark skins but blonde hair. I blogged about it, but didn’t read the paper too closely. Looking at the publication date, May 2012, I realize I was busy studying for some really big end of first year exams at that time, so that explains my lack of attention. In any case they found that a mutation, rs13289810 in TYRP1, results in blonde hair in homozygotes. They didn’t find strong evidence for recent selection. That is, there wasn’t a long haplotype block indicating a sweep in the past 10,000 years. The allele frequency difference across populations as well as long range linkage disequilibrium was suggestive of past selection.

This was in the Solomon Islands. Today I decided to see if there was any follow up on this work. Well, Heather Norton’s group published a paper, Distribution of an allele associated with blond hair color across Northern Island Melanesia. It’s on a different set of islands, but the same results pretty much hold. The allele has a recessive effect on hair color, not much on skin color (there was a small effect in the original paper, so it seems it’s not wholly tissue specific in expression). But I just kept staring at this map and the frequencies. Look at the derived proportions…they don’t get above 0.50. But in most of the populations they’re around in appreciable proportions. I had a hard time not thinking there was balancing selection going on here. That this was something old that was persisting, but not fixing.

I asked Carlos Bustamante, and he got back to me on Twitter:

I also had an exchange with the first author, and she pointed out in the supplements that the frequencies in the Solomons were quite curious too:

Region        Genotype counts       Frequency of 93C
Central       126     80     22     0.27
Choiseul       17      2      0     0.05
Guadalcanal    33     33     13     0.37
Isabel         23     17      7     0.33
Makira         13     11      3     0.31
Malaita        98    185     92     0.49
(not given)    40     11      0     0.11
Temotu         13     11      3     0.31
Western        40     22      2     0.2
Total         405    374    142     0.36
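If you want to check the frequency column yourself, here is a minimal sketch. It assumes the three genotype counts are ordered as homozygous ancestral, heterozygous, homozygous derived, which is my inference from the fact that this ordering reproduces the reported frequencies. The function names are my own, and the Hardy-Weinberg helper is just a convenience for eyeballing heterozygote excess.

```python
def derived_allele_frequency(hom_anc: int, het: int, hom_der: int) -> float:
    """Frequency of the derived allele from diploid genotype counts."""
    n_individuals = hom_anc + het + hom_der
    return (het + 2 * hom_der) / (2 * n_individuals)


def hwe_expected_counts(hom_anc: int, het: int, hom_der: int):
    """Genotype counts expected under Hardy-Weinberg at the observed frequency."""
    n = hom_anc + het + hom_der
    q = derived_allele_frequency(hom_anc, het, hom_der)
    p = 1.0 - q
    return (n * p * p, 2 * n * p * q, n * q * q)


# Rows copied from the table above; column order is an assumption (see lead-in).
counts = {
    "Central": (126, 80, 22),
    "Malaita": (98, 185, 92),
    "Total": (405, 374, 142),
}

for region, genotype_counts in counts.items():
    freq = derived_allele_frequency(*genotype_counts)
    expected = [round(x, 1) for x in hwe_expected_counts(*genotype_counts)]
    print(region, round(freq, 2), "observed", genotype_counts, "HWE expected", expected)
```

Comparing observed heterozygote counts against the Hardy-Weinberg expectation is only a crude first look. Pooling islands with different frequencies distorts the totals, so the per-region rows are the ones worth staring at.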


When they looked in the HGDP data set it’s ancestral everywhere else. The derived variant isn’t floating around at low frequencies. One might naively think that it’s overdominance, but I suspect we’re looking at some negative frequency dependent selection. In the 2014 paper by Norton et al. it’s pretty clear that this is distributed across rather disparate populations. It is unlikely in my opinion to be purely due to population structure, as diverse islands have been sampled. It looks to be an old variant that’s persisted, so it dates to the Pleistocene settlement of Near Oceania. Blond hair is also found among Aboriginal Australians, though we don’t know the genetic basis there.

Ten years ago I would have been super excited to know the genetic basis of an interesting trait like this. But now I’m left with why? Why? We’ll be grappling with a lot of why’s in the next few decades.

• Category: Science • Tags: Blondism, Selection

Whenever I post about Indian genetics there are really weird comments that pop up which go like this: “this guy doesn’t know anything about genetics, he totally ignores the research [usually published in the late 2000s and utilizing mtDNA haplogroups] of [Indian researcher that I don't really know] who has proven [something which has been superseded long ago].” Usually these are at my Facebook account, though they also pop up on Twitter. Often I’ll indulge these people, but usually I just ignore them. If your world-view needs to be supported by mtDNA haplogroup analyses published in Human Biology, more power to you! Or if eight marker autosomal microsatellite studies from 2005 are the last you want to hear about genetics…by all means.

As it is, today in a few hours you can really resolve what’s going on with questions about Indian genetics, or whether the Chinese are genetically differentiated, as long as you don’t have too strong of an agenda, can get data, and don’t go sniffing for particular results. A few weeks ago a friend who is from a Tamil Brahmin background asked me if I knew anything about the genetics of this group. Well, a bit. Above and to the left is a bar plot with admixture fractions from Harappa DNA Project. You can see that the Tamil Brahmins are homogeneous. This suggests that they’re an endogamous community with genetic coherency.

But how do they relate to other South Indians and other Brahmins? This is a question that is politically fraught. I really don’t care though, because I’m not Indian, and even if I was, I still wouldn’t care. I don’t have Zack’s data set, but I do have three Tamil Brahmin genotypes. You can see them on the PCA plot above. The North Indian data set is all Punjabi, while the South Indians are a mix of non-Brahmin Tamils and Telugus, from the 1000 Genomes. The rest is from the Estonian Biocentre data. The results are clear, you can see that Tamil Brahmins are strongly shifted toward the North Indian cluster but in comparison to Uttar Pradesh Brahmins they are South Indian skewed. The most parsimonious explanation taking into account their generally agreed upon communal history of migration from northern India is that they are predominantly a northern origin caste with some admixture from the local substrate. This seems entirely reasonable with how we know demographic processes work.

Using TreeMix I ran 20 plots each of two different data sets with Tamil Brahmins. All the plots are here (tar.gz). But below are two representative plots.



In the first set of plots the Tamil Brahmins tend to be near the positions of the North Indian groups, but have a consistent migration edge from near the Velamas. From what I can tell the Velamas are not a marginal group, but somewhat elite. It seems entirely reasonable that native gene flow into Brahmins coming from the north would be from local high status populations, since the Brahmins themselves were coming into the region as a priestly elite to serve the rulers of South India and sanctify their domains. Usually I read something about the assimilation of local religious elites, so that’s probably what happened. Also, note that Uttar Pradesh Brahmins consistently receive gene flow from Chamars, a Dalit caste in Uttar Pradesh. I suspect what’s going on here is that the Chamars are representative of the pre-Indo-Aryan population, and the Indo-Aryans amalgamated with local elites as they pushed the Aryavarta beyond the Punjab. There are allusions which can be interpreted this way in the older Hindu texts.

The second set of plots is a little more confused. The positioning of the various groups is a little schizophrenic, and you can see gene flow edges back and forth attempting to make the “fit” of the topology better. The position of the Tamil Brahmins is next to the Chamar here, but they are getting a lot of gene flow (nearly 50%) from the Uttar Pradesh Kshatriya, which again indicates that the group is a composite. The Chamar make direct contributions to both Uttar Pradesh high castes.

A major shortcoming of these analyses is a paucity of good source populations for these gene flow edges. A lot of the public data is from obscure tribal groups who are somewhat inbred, and so often drift into long branches. The 1000 Genomes data has no ethnic label, so you are pooling a lot of different groups together. For whatever reason we know a lot more about the genetics of the Tharu people or the Kol than we do about the Brahmins of Tamil Nadu or Uttar Pradesh, or the Kayastha of West Bengal.

Finally, I was curious about runs of homozygosity. If the South Indian Brahmins went through a bottleneck of some sort, and have been endogamous, they’d have built up some of these. I have three 23andMe South Indian Brahmin samples, along with a Kayastha from Uttar Pradesh, and myself. I took the HapMap populations and intersected SNPs so that I got 750,000. Below is a density plot of total kb of runs of homozygosity of HapMap populations, as well as vertical lines which show where some individuals come out. I was struck that the South Indian Brahmins had 24, 25, and 26 runs respectively using default cutoffs. The Kayastha from UP had 19. And I had 11. I think my relative lack is due to two factors. First, the last few generations above me in my pedigree have seen a lot of intermarriage between what in different parts of India would be different jatis (it doesn’t map totally to Muslims, but I do have a fair number of Hindu ancestors in the last few hundred years and sort of know their caste by the surname). Second, I’m Bengali, with a lot of East Asian ancestry, so without inbreeding that’s going to break apart a lot of blocks which might otherwise exist in the genome because of population admixture. If you are curious about the GIH, Gujarati population, there are a lot of Patels in that sample. They’re skewing the distribution up.
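For those unfamiliar with the idea, here is a toy sketch of what counting runs of homozygosity means. It is not the algorithm or the default cutoffs of whatever software produced the numbers above: real tools use sliding windows and tolerate a few heterozygous or missing calls. The thresholds, the genotype coding, and the example data here are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def runs_of_homozygosity(
    genotypes: Sequence[int],
    positions_kb: Sequence[float],
    min_snps: int = 5,
    min_kb: float = 100.0,
) -> List[Tuple[float, float]]:
    """Return (start_kb, end_kb) for each unbroken run of homozygous calls.

    genotypes: 0 or 2 for homozygous calls, 1 for heterozygous calls.
    A run is kept only if it spans at least min_snps SNPs and min_kb kilobases.
    """
    runs = []
    start = None
    # A trailing heterozygous sentinel flushes a run that reaches the last SNP.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i
        elif g == 1 and start is not None:
            end = i - 1
            n_snps = end - start + 1
            length_kb = positions_kb[end] - positions_kb[start]
            if n_snps >= min_snps and length_kb >= min_kb:
                runs.append((positions_kb[start], positions_kb[end]))
            start = None
    return runs


# Made-up genotypes at evenly spaced SNPs, 50 kb apart.
toy_genotypes = [0, 0, 2, 2, 0, 0, 1, 2, 0, 1, 0, 0, 0, 0, 0, 2]
toy_positions = [50.0 * i for i in range(len(toy_genotypes))]

toy_runs = runs_of_homozygosity(toy_genotypes, toy_positions)
total_kb = sum(end - start for start, end in toy_runs)
print(len(toy_runs), "runs,", total_kb, "kb total")
```

Both the number of runs and the total length depend heavily on the thresholds, so counts are only comparable across individuals when the same SNP set and the same cutoffs are used for everyone.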


CEU = Utah White

GIH = Gujarati

CHB = Beijing, Chinese

ASW = African Americans from Oklahoma City

• Category: Science • Tags: Tamil Brahmins

People routinely mistake the action of adaptive evolutionary process as occurring on the level of the species. Not only is this a misunderstanding that crops up in the general public, but I’ve talked to biologists who make the same mistake. The reality is that the mainstream tradition in modern evolutionary biology is very skeptical of “for the good of the species” arguments. For me one simple reason is that I don’t think species are necessarily a clear and distinct taxonomic class. But the major factor is the reality that altruism of this sort is vulnerable to being superseded by an invading selfish free-riding strategy. As a matter of pervasive phenomena much of the “struggle for survival” that an organism experiences won’t be due to exigencies of environment or the threat from other lineages, but rather within one’s own species. Though this can be conceptualized in terms of violence, more often one can chalk it up to competition for finite resources in a Malthusian world at carrying capacity.

The logical conclusion leads to the sort of individual-level focus that is at the heart of The Selfish Gene, though Richard Dawkins’ book is to a large extent an exposition of a Neo-Darwinian tradition which goes back to R. A. Fisher, and matured with W. D. Hamilton and George Williams. But over the past few decades there has been a small group of biologists who have rebelled against the focus on individuals and genes, and made the case for selection operating at multiple levels of biological organization, from the intra-genomic all the way to “super-organisms” such as ant colonies. Rather than old-style species/group selection, the new theorists refer to “multi-level selection.” The primary force behind this movement has been David Sloan Wilson. I like David personally and he’s a great scientist (I did a BloggingHeads with him 6 years ago). But he has a tendency in my opinion of declaring unilateral victory when most people would argue that there’s still a lot to hash out, and the war continues. His new book, Does Altruism Exist?, is in my Kindle “to-read” stack, but from what I’ve seen of the reviews he does do this again! (no worries, the rest of the book looks interesting anyway)

My own views have evolved…over the years I have realized I am not entirely satisfied with models of human cultural variation that are individual level or entirely non-adaptive. I have long been broadly sympathetic to the project of Peter Richerson and Robert Boyd of using the frameworks developed in evolutionary biology to understand cultural processes. Additionally, I’m a big fan of Joe Henrich’s research, to the point of pre-ordering his book The Secret of Our Success: How Culture Is Driving Human Evolution, Domesticating Our Species, and Making Us Smarter six months ahead of time. There are good reasons why above-the-individual level selection would be able to operate in humans. Reduced to verbal terms, human cultural phenomena are such that between group variation can dwarf within group variation. The canonical example of this is language, where differences between groups are very large, and rather smaller within groups. This is simply a function of how culture, and language in particular, spreads in a population: it can be asymmetric in terms of vertical transmission. This is in contrast to genes, where you have equal contributions from both parents. One can imagine a population expanding into another where it absorbs individuals from hostile groups, changes its own genetic makeup, but by and large maintains its cultural integrity. To give a concrete example, the Xhosa people of South Africa are approximately 25 percent Khoisan in genetic ancestry. But their culture is not “25% Khoisan.” There are influences, such as click sounds in their language, but those are accents on the basic Bantu cultural substrate which is preserved, and ties them with populations in Central and Eastern Africa.
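The point about between group variation dwarfing within group variation can be made concrete with a simple variance partition. This is a minimal sketch on made-up trait values (think of a numerical score for some language-like cultural trait). The group names and numbers are hypothetical and the function is my own.

```python
from typing import Dict, List, Tuple


def partition_variance(groups: Dict[str, List[float]]) -> Tuple[float, float, float]:
    """Split total variance into between-group and within-group components.

    Uses population variances weighted by group size, so that
    total = between + within holds exactly.
    """
    all_values = [v for values in groups.values() for v in values]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total

    between = 0.0
    within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        between += len(values) * (group_mean - grand_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in values)

    total = sum((v - grand_mean) ** 2 for v in all_values) / n_total
    return between / n_total, within / n_total, total


# Hypothetical trait scores: tight clusters within groups, big gap between them.
toy_groups = {"group_a": [1.0, 2.0, 3.0], "group_b": [11.0, 12.0, 13.0]}
between, within, total = partition_variance(toy_groups)
print(round(between / total, 3), "of the variance is between groups")
```

Selection among groups only has something to act on when the between-group share is appreciable. Asymmetric cultural transmission can keep that share high even with substantial migration, whereas symmetric genetic transmission tends to erode it.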

It is a rather different matter with biological processes because of the enforced symmetry in transmission. Maintaining between group variance requires ingenious processes, which some find implausible. But ultimately it’s an empirical matter on a species-by-species basis. I would commend readers to look through the first half of Wilson’s Unto Others to get a sense of how inter-demic selection processes might be ubiquitous. My position on the role that biological above-the-level-of-individual selection plays in evolution is to be skeptical in the generality but open-minded in the specifics. After all, bdelloid rotifers show that there are cases where complex asexual species can persist, even if the general rule about asexual lineages is that they are prone to extinction.

For whatever reason arguments about multi-level selection get rather heated among evolutionary biologists. It’s often closely related to the debate about kin selection (see this post from Jerry Coyne, and follow the links). I suppose David Sloan Wilson would suggest that it’s an illustration of inter-group competition, as individuals conform to particular positions due to their identity as members of a coalition.

With the preliminaries out of the way, I’d like to recommend a series of papers in the Journal of Evolutionary Biology which are open access to readers if they want to dig further (especially for those with a formal bent). First, The genetical theory of multilevel selection. The title should give some readers a clue as to the tradition which this theorist works in! Charles Goodnight makes a spirited response in Multilevel selection theory and evidence: a critique of Gardner, 2015 (his point about some researchers who come out of quantitative genetics has always been obvious to me when I read their papers; it’s a different tradition). Finally, the original author responds: More on the genetical theory of multilevel selection.

I’m not an evolutionary theorist, so I’m not going to take sides (though to be honest I always find Goodnight to be a little too vociferous for my taste, but perhaps that’s just how it comes across in print). Rather, I’m chewing through some of the ideas, and find that these papers are excellent starting points to explore the literature. It’s also nice that they’re open access, as there are people who are not in academia who might have some things to say about these topics, or, who might pursue research as a career after stumbling upon these sorts of papers.

If the copious references to the Price Equation confuse, try this paper.

• Category: Science • Tags: Evolutionary Biology

About a year ago I heard a pop song on my Pandora that was a little less annoying than Ke$ha, and I looked the singer up. Her real name was Jessica Malakouti. My immediate thought was “that last name sounds Iranian.” Then I watched the video above, and my revised thought with the new priors (i.e., what she looks like) was “well, she’s probably of Lithuanian heritage, and that’s an archaized surname that sounds vaguely Iranian.” For reasons I don’t even recall somehow I stumbled onto this singer’s Wikipedia page recently, and it had been updated with the fact that she is of Iranian heritage. It turns out her father is from Iran, and she is a product of the greater Los Angeles Iranian Diaspora community. The citation for the Wikipedia entry is a Youtube interview where she refers to herself as “mixed-race” and talks about rapping in Farsi (I guess she’s a wannabe Arash).

If someone who looks like this refers to herself as “mixed-race” this country is going to need to update its 1960s era Civil Rights framework soon. One of the podcasts I listen to is On Point with Tom Ashbrook, and a week ago it had a show with the title Race In America, From Watts To Ferguson And Beyond. Actually, on the podcast version it was shortened to “Race in America.” Despite the fact that less than 40 percent of people who are in some way not non-Hispanic white (which includes people from the Middle East, like Jessica Malakouti’s father in any case) are of black American heritage, they loom large enough in this nation’s history and consciousness that I knew that “Race in America” was going to be about two races, with the rest of us rendered invisible. Of the guests on that particular show only John McWhorter even grappled with the fact that there were groups outside of the black-white dichotomy. When I was a kid in the 1980s this was how it went too. And to a great extent it was how it should have gone. Black Americans have been in this country since the Founding, and most of their ancestry dates to before the Founding. They were the largest racial minority for most of its history, and were when I was a child. Things are different now on the ground. But you wouldn’t know that from the media. 15 years ago The New York Times published its prize winning series How Race is Lived in America. I thought that that was going to be the last testament to the old biracial America due to the nation’s changing demographics. I was wrong.

I have just finished The Making of Modern Japan, and Japan is big in the media this week. There are yearly stories on the apology, but there is a new twist with the current Prime Minister’s attempts to modify the hyper-pacifist orientation of the Japanese state and society (more precisely, it strikes me that he wants to make it so that the “Self Defense Forces” have some real bite and can be more flexible in their operations internationally). After reading a book which outlines how Japan got to where it is today, I really value the importance of dense historical knowledge. It’s like going from sepia photographs to high resolution digital color imagery. I had been meaning to get back to A New History of Western Philosophy, which I dropped in the medieval section when I switched to reading The Indo-European Controversy, but now I am curious as to whether I should fill in my blank spot (in relative terms) in regards to modern Chinese history. Perhaps it’s the antiquarian in me, but I’ve never been much curious about Chinese history beyond the reign of the Qianlong Emperor. Now I’m inclined to pick up Jonathan Spence’s The Search for Modern China, though part of me wants to finally learn about the Taiping Rebellion through God’s Chinese Son. Recommendations are welcome.

Recently I posted some analysis where it seems pretty clear that there’s Indian admixture into the Cambodian population. The main issue that I have when trying to get a fix on this is whether it’s deep common shared ancestry via the South Eurasian substrate which was present from India all the way to the South China Sea and down toward maritime Southeast Asia, or whether it was more recent, on the edge of historical times (and whether it was connected to the cultural impact of India on Southeast Asia). I think I presented persuasive evidence that it was in part more recent. Yesterday I stumbled onto a smoking gun which was right in the literature all along. The supplementary table for Norton et al.’s 2007 paper on convergent light skin adaptation reports that of 22 Cambodians the frequency of the derived variant of SLC24A5 is 9% (so 4 allele copies out of 44). They are the only East Asian group south of the Yangzi with this allele. One hypothesis is that it could be French admixture. But there are no copies of the SLC45A2 derived allele. The sample size is small, but I checked the 1000 Genomes, and the Vietnamese have very low frequencies of both alleles, consistent with French admixture. The best candidate for a donor group is obviously a South Asian one. It is interesting that the allele frequency is pretty low, probably consistent with overall admixture proportion, consistent with no selection in situ.
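The 9% figure is just a count of allele copies divided by the number of chromosomes sampled. Here is a minimal sketch of that arithmetic in Python, with a rough binomial standard error added as my own illustration of how small the sample is. The function names are hypothetical and nothing here comes from Norton et al. beyond the two counts quoted above.

```python
import math


def allele_frequency(derived_copies: int, n_individuals: int) -> float:
    """Frequency of an allele among diploid individuals (two copies each)."""
    return derived_copies / (2 * n_individuals)


def binomial_se(freq: float, n_individuals: int) -> float:
    """Approximate binomial standard error of an allele frequency estimate."""
    return math.sqrt(freq * (1 - freq) / (2 * n_individuals))


# Counts quoted in the text: 4 derived copies among 22 Cambodians (44 chromosomes).
freq = allele_frequency(4, 22)
se = binomial_se(freq, 22)
print(f"frequency = {freq:.3f}, approximate standard error = {se:.3f}")
```

The standard error is a large fraction of the estimate itself, which is why the caveat about sample size matters.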

In January I wrote that op-ed in The New York Times last year so that the debate would be a little less “battle of the sexes” in rhetoric about abortion. But whenever I read/listen to liberals talk about it they often fall back on the trope that women implicitly support abortion rights and men do not. I have no idea what the point of this caricature is, because if you are talking to your own side everyone already agrees, while pro-life people are probably going to be pretty annoyed by your blatant mischaracterization. Perhaps I’m wrong, but I think part of it is that some people feel better about their own viewpoint when they can couch it in anti-sexism, where they (often these are liberal men) are on the side of women and their opponents are not.

Over the last year and a half or so I’ve gotten more into my fitness. Partly it was for reasons of health. I’m South Asian, and we have issues with morbidity relating to metabolic disease. It runs in my family. I have kids now and I want to be around for them. I was never that fat. Probably the highest my BMI ever got was 26 in March of 2002 (I’m 5’8, that’s 170 lbs), when I pretty much cut out all soft drinks from my diet (I was never a big consumer, but I went from occasional to literally zero). Since then I’ve been as light as 140 lbs (spring of 2008), but have veered between 150 and 160 in graduate school. My weight has not shifted much since I began to make changes, but I’ve been lifting, so losing fat and gaining muscle. This has really helped the second, and not secondary, reason that I am working out, and that is aesthetic. It is really nice not to be soft anymore!

In any case, there was a recent link posted about low fat vs. low carb diets. The problem is that nutrition is basically a semi-science, and people rightly can offer their own opinions. Personally I find cutting carbs the main way I can sustain cutting calories, but the robotic pattern of responses by low-carb folks is too reminiscent of low-fat propaganda. For a balanced, and striving toward scientific view, I’d suggest you check out my friend Kevin Klatt’s blog (or send him questions on Twitter, that’s what I do).

Preemptive apology if I can’t respond to all your comments, though I try to read the “open threads.” I’ve got a lot of responsibilities in “real life” as I attempt to be a “grown-up,” so don’t take it personally.

• Category: Miscellaneous • Tags: Open Thread

I got curious about pigmentation about ten years ago when reading the coda to Armand Leroi’s Mutants: On Genetic Variety and the Human Body, where he observes curiously that after all these decades geneticists still didn’t understand very well the basis of normal variation in skin color. I read that in the summer of 2005, so Armand had probably written it in 2004 (he can correct me if he has time, he occasionally comments here). Depending on how you view it, it was a fortunate or unfortunate time to write something like this. Over the past ten years geneticists have solved the basis of normal variation in human pigmentation. In fact, most of the major work was completed between 2005 and 2007. In December of 2005 Science published SLC24A5, a Putative Cation Exchanger, Affects Pigmentation in Zebrafish and Humans. The authors reported that rs1426654 was nearly disjoint in distribution between Africans and Europeans, and that it explained on the order of 1/3 of the variance in pigmentation between the two populations (European populations are fixed for the A allele, Africans for the G allele).

There are several facts just within that statement that illustrate why pigmentation genomics has been such a success in comparison to other domains tackled by the new methods. First, pigmentation pathways seem to be somewhat constrained across animals, so model organisms can give us a lot of insight and clues. A lot of the pigmentation genes, such as KITLG, TYR, and SLC24A5, actually increase or decrease melanin production and alter tissue specific expression just as they do in humans, across vertebrates. Second, the fact that I just named genes off the top of my head highlights the fact that there are a few conserved loci that explain most of the variance and crop up in study after study. This is in contrast to height, where the variance is distributed across thousands of genes, and the only one I can name off the top of my head is HMGA2. And it explains a princely ~0.3% of the variance of the trait.

This wasn’t entirely a surprise. I happen to have had a copy of The Genetics of Human Populations. In it, L. L. Cavalli-Sforza reported on a classical pedigree analysis of individuals in Britain of varying levels of African ancestry dating to the 1950s. In particular, in genetic jargon the study focused on the variance in trait values between parentals, F1 individuals, and “back-cross” individuals (as well as a few F2 individuals from what I recall). The research concluded that pigmentation was probably controlled by on the order of 10 genes or so. In particular, the authors suggested that the trait was unlikely to be highly polygenic, which for the designs of that period really meant more than a dozen loci or so, beyond which they lacked the power to differentiate the number of independent effects with any precision (i.e., they wouldn’t be able to distinguish between a trait where 25 loci explain 90% of the variance, and a trait where 500 loci explain 90% of the variance). Third, pigmentation loci exhibit a relatively high pairwise Fst. That is, most of the variation on many of these alleles is partitioned between populations, rather than within them. Obviously that is convenient when you are trying to detect associations between genes and phenotypes which are partitioned on an inter-continental scale.
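To make the Fst point concrete, here is a minimal sketch of a two-population Fst at a single biallelic locus, computed as (Ht - Hs) / Ht from expected heterozygosities. It assumes two equally weighted populations and ignores sample-size corrections, so it is a textbook approximation and not the estimator any of the cited papers used. The example frequencies are illustrative placeholders that I made up.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2pq at a biallelic locus with allele frequency p."""
    return 2 * p * (1 - p)


def pairwise_fst(p1: float, p2: float) -> float:
    """Fst = (Ht - Hs) / Ht for two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = heterozygosity(p_bar)
    if h_total == 0:
        # Both populations are fixed for the same allele: nothing to partition.
        return 0.0
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2
    return (h_total - h_within) / h_total


# Illustrative values only: a nearly disjoint locus versus a mildly differentiated one.
print(pairwise_fst(0.99, 0.01))
print(pairwise_fst(0.55, 0.45))
```

A locus that is nearly fixed for opposite alleles in the two populations gives a value close to 1, while a locus with similar frequencies in both gives a value close to 0.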

The illustration with SLC24A5 is pretty straightforward; the frequency of the derived allele is 100% in Europeans, and over 99% ancestral in unadmixed Sub-Saharan Africans. In the 1000 Genomes the frequency of the derived A allele in the Utah white American sample is 100% (out of 99 individuals). In the 91 British individuals it is 100%. In the Tuscan set of 107, there are 213 A alleles, and 1 G allele. In the 107 Spanish individuals, the A allele is at 100%. In contrast, for the Yoruba Nigerian data set, there are 3 A alleles for 213 G variants. For the Esan of Nigeria, it is 5 A for 193 G. For the Chinese samples from Beijing, 6 A alleles, and 200 G. At this point you might think that the A variant at this SNP position is diagnostic of European ancestry, but it is not. I, for example, am homozygous for the A variant, as are both of my parents. In the 1000 Genomes data there are 25 Bengalis who are AA, 42 who are AG, and 19 who are GG. In the Sri Lankan Tamil population A is at 49% frequency.
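For readers who want to turn those counts into frequencies themselves, here is a minimal sketch. The counts are the ones quoted above. The helper names are hypothetical, and the only trick is that genotype counts have to be converted to allele counts first (each AA carries two A copies, each AG carries one).

```python
def freq_from_allele_counts(a_copies: int, g_copies: int) -> float:
    """Frequency of the A allele given counts of A and G allele copies."""
    return a_copies / (a_copies + g_copies)


def freq_from_genotype_counts(n_aa: int, n_ag: int, n_gg: int) -> float:
    """Frequency of the A allele given counts of AA, AG and GG individuals."""
    a_copies = 2 * n_aa + n_ag
    total_copies = 2 * (n_aa + n_ag + n_gg)
    return a_copies / total_copies


# Counts as quoted in the text for rs1426654 in the 1000 Genomes samples.
derived_a_frequency = {
    "Tuscan": freq_from_allele_counts(213, 1),
    "Yoruba": freq_from_allele_counts(3, 213),
    "Esan": freq_from_allele_counts(5, 193),
    "Beijing Chinese": freq_from_allele_counts(6, 200),
    "Bengali": freq_from_genotype_counts(25, 42, 19),
}

for population, freq in derived_a_frequency.items():
    print(f"{population}: {freq:.3f}")
```

The Bengali value lands just over one half, which is the point of the paragraph: the A allele is common well outside Europe.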

The figure to my left is from Heather Norton’s Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians, and it uses neighbor-joining trees to represent genetic distances at particular loci then known (2007) to be implicated in inter-continental variation in pigmentation. The abbreviations are pretty self-evident, WA = West African, NA = Native American, EA = East Asian, IM = Island Melanesian, SA = South Asian, and EU = European. What you see is that pigmentation genes are not particularly phylogenetically representative. That is, whole genome relationships, whereby all non-Africans form one clade set against Africans, are not reflected here. Looking at these patterns, you would have inferred that Europeans were the outgroup. And the population with the lowest genetic distance from West Africans is the Island Melanesians. What’s going on here is Island Melanesians and West Africans have similar phenotypes in skin color, and that is being reflected in these genes. Roughly, Melanesians and West Africans exhibit a fair amount of functional constraint around pigmentation genes. They haven’t changed much. In contrast, East Asians and Europeans actually are not too different in their pigmentation on a world-wide scale, but that is not reflected in these trees. Why? As is made clear in the title of Norton et al.’s paper, East Asians and Europeans arrived at their phenotypes via different mutational paths. I say different mutational paths because there is a broad overlap in genes, but the alleles are often different (different SNPs or regulatory elements within the gene).

One of the questions that I often get is how to translate genetic variation into realized trait value shifts in individuals, as opposed to simply proportion of variation explained within the population. Luckily, geneticists who study pigmentation have a quantitative unit, a “melanin index” (MI), which naturally utilizes the fact that individuals with darker skin exhibit less reflectance. But there are two problems giving a simple answer to these sorts of questions. First, a substitution of an allele may have an average effect, but, that effect may not be realized for various reasons (e.g., epistasis). And there are still individual differences between people with the exact same genotype. Second, that effect manifests within a population, and different populations have different mixes of alleles.

The table to the left is adapted from The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent. I think we can agree that the results here fit our intuitions. These are averages. Some of the populations in this list, such as the South Asian ones, as well as African Americans, exhibit a lot of variance within population. We now know why; they have a lot of segregating variants. Even within families you can see variation across siblings of quite an extreme nature. The subtle difference between Europeans and East Asians comports with my experience too. The American white population is mostly Northern European, so this is probably a bit on the low side in MI for a typical European population. A paper on Cuban pigmentation genetics gives a median MI for self-identified whites as 34. The ancestry is 86% European, 7% African, and 7% Native American, in this set. Therefore the average Iberian probably is somewhat lighter complected, but not by much. Notice how much darker Bougainville Islanders are than African Americans. Though the latter may be “black” in figurative terms, Bougainville Islanders are black in literal terms. Along with some Sudanic people they are among the darkest skinned in the world. In these data Tamil Brahmins are at 41. These are people whose surnames are often, but not always, Iyer. The stereotype, and my personal experience, is that the modal Tamil Brahmin is light to medium brown. Some are rather dark, while a few may have complexions that veer on brunette white. To be honest in my personal experience I have not met any Tamil Brahmins whose skins are white, though it has not been uncommon for me to meet such individuals with such fair skins from Northwest India, in particular Punjabis and Kashmiris (the best way to judge for me is meeting people in real life, as I’ve heard that Indian celebrities often are made up in a way that lighten them up somewhat).

The supplements of the paper have allele frequencies of SLC24A5 for various castes. Kashmiri Pandits are at >95% frequency for the A allele. Other Brahmins are at ~80%, irrespective of whether they are in the North or South. Punjabis, irrespective of caste, are at ~95%. Middle castes in South India, like the Reddy and Naidu, are at ~60 to 65%. Chamars, a Dalit caste in North India, clock in at 68%, while the Toda people of the Nilgiri plateau of the far south of India have a derived allele frequency of 86%. The low caste individuals in Bihar are at 78%. At the other end of the distribution some of the Austro-Asiatic tribes have very low frequencies. The Juang people for example are at 7%. Part of this may just be recent East Asian admixture. But it can’t explain all of it, these groups are mostly of the same component elements as other South Asians, albeit at fractions skewed toward the Ancestral South Indians (ASI). I don’t see any geographic pattern that suggests why selection would happen in certain regions and not in others, though it is suggestive that the Kashmiris and Toda are both living at high elevations, so are the Austro-Asiatic groups. I’ll get back to this paper when we talk about selection, but I’ll set it aside for now.

Rather, what are the effects on MI of substitutions of particular alleles at given genes again? The paper on Cuban admixture and pigmentation genetics and another using Cape Verde as the population of interest are particularly useful, because these two data sets have a wide range in ancestral quanta (these are not the only papers with these sorts of results, but this post isn’t a literature review!). The figure to the right is from the second paper, and shows the effect size in standardized units of variants which were statistically significant in their study. Pretty much every study tends to come to the conclusion that SLC24A5 is the biggest effect locus in the genome on this trait if the data set includes substantial West Eurasian ancestry. The main qualification I’d put on that is that East Asians have been understudied for this trait, so the European derived alleles are much more well understood. Be that as it may, each substitution of the SLC24A5 derived allele, A, reduces MI by ~5 units. That is, it’s additive to a first approximation. Some studies do show a mild dominance effect…but of the A allele. That is, light is dominant to dark (e.g., in the Cape Verde study GG is further away from GA than AA is). It’s actually a consistent result. This is curious, because many people believe that dark skin is dominant to light skin. Thanks to genetics we know in a quantitative sense that that’s not true. In fact, perhaps the reverse is true at SLC24A5 and KITLG (concretely, individuals who are heterozygous will be lighter than you would expect going by mid-parent mean).

But, in a qualitative sense it is true, because many people simply “bin” complexion into white and non-white, with the latter encompassing a range all the way from pale olive-brown to black. Really the perception is a function of human culture, and ideas of contagion. I don’t like to make invidious accusations of racism often (I don’t think they’re warranted most of the time), but the perception that dark skin is dominant over white skin seems pretty easily explained by hypodescent within a framework of white racial superiority and exclusivity. Most people who have this impression are not racist at all, but, as per the cliche they’ve internalized some perspectives about the recessive nature of whiteness which derives from a model whereby racial purity is essential and necessary for white identity. And, as I like to say, revealed preferences are telling. The majority of whites rapturously reading Ta-Nehisi Coates’ Between the World and Me have mostly white friends, live in mostly white neighborhoods, and date mostly white people. Yes, some of this is happenstance, but a sequence of events which consistently fall in one direction indicate preferences at variance with avowals of racial neutrality (Seinfeld and Girls operate in core white social worlds in a riotously diverse megalopolis where whites are a minority; believe it or not you can be friends mostly with people who are not the same race and exhibit good mental health, just ask me about my experience).

With that sociological tangent out of the way, what does this mean? What if I was GG, instead of AA, on SLC24A5? You would expect I’d be about 10 MI units darker. Instead of being an average complected South Asian, neither dark nor fair, I’d be a dark skinned one. As the above statistics suggest it is very rare to find someone of unadmixed European background who carries a G allele at this SNP. But some do exist in the above data, so what would they look like? Let’s take a Northern European, with an MI ~30. The predicted value is about the same as for a “white Cuban.” In other words, they would be swarthy, notably so in Northern Europe. How about two alleles, so they are a homozygote for the ancestral allele, G? You don’t really see Europeans with this genotype at all today. Assuming all other loci the same (e.g., probably the derived variant on SLC45A2), it looks as if you’d expect this Northern European substituted at that SNP to be about the same complexion as many Northern Indians today. Though some Northern Indians can pass as white, they are not common. Most are visibly brown in some sense.

But wait, there’s more! SLC45A2 is not as strong an effect as SLC24A5, but it’s still significant. In the Cuban study a substitution at its major SNP of interest has an effect of ~3 units. If the genotypes at both these loci were ancestral homozygous in a Northern European, then the expected MI would be > 45. That’s around where the Senoi of Malaysia are. Definitely brown, a touch on the darker shade. Then there are other loci, TYR, TYRP1, ASIP, KITLG, and APBA2. Few enough that I can name, but enough that touching on each would be repetitious and boring. SLC24A5 and SLC45A2 seem relevant to pigmentation anytime you have a West Eurasian population in the mix. The other loci are hit and miss. But one thing that comes out of the studies in admixed populations is that there is still a significant residual that has not been accounted for in this variation. In the Cape Verde study 44% of the variance seems to be due to “genomic ancestry.” That is, African vs. European. The implication here is that the loci we’re catching are at the large effect end of the long tail of distribution of effects, and there are smaller effect loci still segregating which we haven’t picked up. In European populations where a lot of this work began only a few large effect loci may be segregating, with the others being fixed, and so not variable. This doesn’t change the big picture about the genomic architecture. But, it’s more like half a dozen loci can explain half the heritable variation, as opposed to 90%. At least in that study (it seems that the population you are studying matters for the final summary statistic).
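The back-of-the-envelope predictions in the last few paragraphs amount to a simple additive model, which can be sketched in a few lines. The baseline MI of 30 and the per-allele effects of ~5 (SLC24A5) and ~3 (SLC45A2) are the approximate figures quoted in this post. Treating them as exactly additive with no dominance, epistasis, or residual ancestry effect is a deliberate simplification on my part, and the function name is hypothetical.

```python
# Approximate per-allele effects on melanin index (MI) quoted in the text.
EFFECT_PER_ANCESTRAL_ALLELE = {"SLC24A5": 5.0, "SLC45A2": 3.0}

# Rough MI for a Northern European who is derived-homozygous at both loci.
BASELINE_MI = 30.0


def predicted_mi(ancestral_copies: dict, baseline: float = BASELINE_MI) -> float:
    """Purely additive prediction: baseline plus effect times ancestral copies."""
    mi = baseline
    for locus, copies in ancestral_copies.items():
        if copies not in (0, 1, 2):
            raise ValueError(f"{locus}: a diploid carries 0, 1 or 2 copies")
        mi += EFFECT_PER_ANCESTRAL_ALLELE[locus] * copies
    return mi


print(predicted_mi({"SLC24A5": 1}))                # one G allele
print(predicted_mi({"SLC24A5": 2}))                # GG homozygote
print(predicted_mi({"SLC24A5": 2, "SLC45A2": 2}))  # ancestral at both loci
```

Under these assumptions a single G allele puts the prediction near the “white Cuban” median, and ancestral homozygosity at both loci puts it just above 45, matching the figures in the text. The mild dominance of the A allele mentioned earlier would pull heterozygotes slightly lighter than this sketch predicts.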

I left OCA2 and HERC2 out of the above list for a reason. Looking at them alone gives me a reason to post this beautiful figure of eye color distributions on a two dimensional axis. As most of you may know, SNPs in the OCA2 and HERC2 region of the genome account for most of the blue vs. brown eye color variation in Europeans. Eye color varies less in human populations, and fewer genes likely affect this variation. In the Cape Verde sample the proportion of variation explained by African vs. European ancestry was 44% (the r-squared). For eye color? A mere 8% (note that they used an RGB quantification scale, rather than binning phenotypes). The correlation between skin color and eye color in this data set was 0.38, so 14% of the variation of eye color could be explained by variance of skin color.
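The step from a correlation of 0.38 to “14% of the variation” is just squaring the correlation coefficient. A trivial sketch, for anyone who wants to check it:

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two variables with correlation r."""
    return r ** 2


# Skin color vs. eye color correlation quoted for the Cape Verde sample.
r_squared = variance_explained(0.38)
print(f"{r_squared:.1%}")
```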

The combination of brown skin and light eyes in women such as Vanessa Williams, the first black Miss America, is totally understandable. All black Americans with roots in this country have ancestry that goes back to the 18th century at the latest, and all of them have white American ancestry (I’ve looked at a lot of black American genotypes; they’re mostly African, but all have some European ancestry, and I literally mean all). So the derived variants around OCA2 and HERC2 are segregating at frequencies weighted by European ancestry in African Americans, ~20% × 75% = 0.15, so 0.15² ≈ 0.02, which implies that a few percent of African Americans should have light eyes. While skin color seems mostly additive, eye color does seem to exhibit a recessive expression pattern for the lighter variants. Therefore you need to square the q element of the Hardy-Weinberg equation in this case.
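That calculation can be written out as a short sketch. The ~20% European ancestry and ~75% derived allele frequency in Europeans are the rough figures from the paragraph above. The assumptions that the derived variant is absent in the African source population and that mating is random with respect to the locus are mine, made only to keep the arithmetic simple.

```python
def admixed_allele_frequency(
    ancestry_fraction: float,
    source_frequency: float,
    other_frequency: float = 0.0,
) -> float:
    """Allele frequency in an admixed population as an ancestry-weighted mean."""
    return (
        ancestry_fraction * source_frequency
        + (1 - ancestry_fraction) * other_frequency
    )


def recessive_phenotype_frequency(q: float) -> float:
    """Hardy-Weinberg frequency of homozygotes (q squared) for an allele at q."""
    return q ** 2


q = admixed_allele_frequency(0.20, 0.75)       # roughly 0.15
light_eyed = recessive_phenotype_frequency(q)  # roughly 0.02
print(f"q = {q:.2f}, expected light-eyed fraction = {light_eyed:.4f}")
```

The same squaring applies anywhere a recessive trait comes up, which is why a fairly common allele can still produce a rare phenotype.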

But are the variants that result in blue eyes only relevant for eye color? Might they not explain skin color as well? That depends. The Cape Verde study did not find any of the blue vs. brown eye color SNPs to correlate with skin color when one controlled for genomic ancestry and the state of a nearby pigmentation gene. In contrast, the Cuba study did find that an OCA2 marker had an effect on skin color, a little over 1 MI unit. This is a smaller effect compared to SLC24A5 obviously, but it is still an effect. As I indicated above, if you follow this literature you notice that a few genes have major effects no matter how you mix and match the data set and population coverage. Others are spottier, and may not reach statistical significance, depending on your mix of populations. It is important to not make one study dispositive of any particular thesis.

What about hair color? While blue eyes are the majority state in much of Northern Europe, blonde hair in adults is rarer. This makes sense when you notice that the derived allele of KITLG, one of the major pigmentation genes associated with blonde hair, only has a frequency of 15% in much of Northern Europe. That means that only a few percent of individuals are homozygotes. The above image of mice is from A molecular basis for classic blond hair color in Europeans. The individual in the middle is a heterozygote. The authors claim that they can see a subtle effect. I suppose it’s there if you squint (my son is a heterozygote, and I will report his hair is lighter than his sister’s, who is homozygote for the ancestral variant). The individual to the right in the figure is a pale homozygote for the derived allele. This locus also shows up in cats and horses in generating tissue specific depigmentation, though in humans it has also been implicated in skin color and testicular cancer as well (yes, you read that right!).

But the scientific story about pigmentation isn’t simply one of GWAS after GWAS. There’s a huge evolutionary story here involving classic population genetic parameters, in particular natural selection. Many of these alleles have been implicated in selective sweep events. That is, the allele has increased in frequency very rapidly, often very recently. One major tell is that there are long haplotype blocks around these alleles. This means that there are sequences of variants closely associated with each other, which is suggestive of the fact that they’re co-inherited together as a unit in a region of the genome where the frequency is increasing faster than recombination can break apart the association. The region around OCA2 and HERC2 in Europeans is the third longest haplotype in the Northern European genome. SLC24A5 is a long haplotype that has very little variation in it from which one can infer structure. In the paper above, The Light Skin Allele of SLC24A5 in South Asians and Europeans Shares Identity by Descent, the authors sequence the region around that locus to smoke out variation. There just hasn’t been much time for the derived allele to have accrued mutations. They conclude that the SNP in SLC24A5 responsible for lighter skin derives from a common mutation across all the populations in which it is prevalent. That is, the SNP spread through migration or selection from one individual, rather than from the extant variation of a population, in which case there would have been several genetic backgrounds from which selection could draw. A paper from 2013, Molecular phylogeography of a human autosomal skin color locus under natural selection, attempts to look at the haplotype patterns with a bigger population coverage but lower marker density. It comes to the conclusion that “The distributions of C11 and its parental haplotypes make it most likely that these two last steps occurred between the Middle East and the Indian subcontinent.” In other words, the SNP took off from a launching pad in West Asia. If you look at their evidence it is modest at best; they don’t have many variants to generate haplotypes, especially in a genetic region which lacks diversity.

All this talk about the past has been about inference. In the South Asian paper they use Bayesian methods to infer that the derived allele of SLC24A5 arose in a genetic background which coalesces 20-30 thousand years ago, with enormous confidence intervals on the order of tens of thousands of years. You don’t know much more than you already did, as the distribution of the derived variant strongly suggests it arose after East and West Eurasians diverged. Haplotype based methods suggest that the sweep up in frequency occurred only in the last 5-10 thousand years.

So what does the ancient DNA tell us? The figure to the left is from Eight thousand years of natural selection in Europe. You can see that there is a transect in time of alleles in Northern Europe. Blue is the variant in SLC24A5, green is SLC45A2, and red is OCA2. The variation in allele frequencies over time is pretty similar to what you’d expect for a positive selective sweep, which is what the genomics is telling us occurred. The sweep of SLC24A5 is to fixation. This makes sense for an additive trait where selection prefers the homozygote state to the heterozygote state. SLC45A2 is close to fixation, though not as total as SLC24A5. Its trajectory has been more gentle, indicating a lower selection coefficient, at least across its arc up toward fixation. For OCA2 the pattern looks like one of demographic decline, as it was fixed in European hunter-gatherers. And yet at some point the frequency began to increase again. As this region of the genome has a long haplotype it’s suggestive of selection, and not just demographic change. Since blue eyes are recessive one major issue for any selective model that hinges on this trait is how selection would be effective at lower frequencies. E.g., if the allele is at a frequency of 20%, then under random mating only 4% of the population (0.2 × 0.2) has the favored trait.
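Both the point about a gentler trajectory implying a lower selection coefficient and the 20% to 4% example can be played with numerically. What follows is only a minimal sketch, not anything taken from the papers discussed: it assumes random mating, a textbook one-locus selection model, and made-up values for the selection coefficient `s` and the dominance `h`. All function names and numbers are hypothetical.

```python
def recessive_trait_fraction(q):
    """Fraction of people carrying two copies of an allele at frequency q.

    Assumes random mating (Hardy-Weinberg proportions).
    """
    return q * q


def next_frequency(p, s, h):
    """One generation of selection on a favored allele at frequency p.

    Assumed relative fitnesses (a standard textbook parameterization):
      two copies: 1 + s, one copy: 1 + h*s, zero copies: 1.
    h = 0.5 is additive; h = 0 means the favored allele is fully recessive.
    """
    q = 1.0 - p
    mean_fitness = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
    return (p * p * (1 + s) + p * q * (1 + h * s)) / mean_fitness


def generations_to_reach(p0, target, s, h, max_generations=100_000):
    """Count generations until the allele frequency first reaches target."""
    p = p0
    for generation in range(max_generations):
        if p >= target:
            return generation
        p = next_frequency(p, s, h)
    return None  # target not reached within max_generations


if __name__ == "__main__":
    print("homozygote fraction at q=0.20:", recessive_trait_fraction(0.20))
    for s in (0.01, 0.05):  # hypothetical selection coefficients
        for h, label in ((0.5, "additive"), (0.0, "recessive")):
            gens = generations_to_reach(0.05, 0.95, s, h)
            print(f"s={s}, {label}: {gens} generations from 5% to 95%")
```

Under these assumptions a smaller `s` gives a slower climb, and a fully recessive favored allele climbs more slowly from low frequency than an additive one with the same `s`, because few carriers actually express the trait early on. Drift, migration and changing population size are all ignored here.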

Of course there is Population Genomics in Bronze Age Eurasia, which has a much larger number of SNPs. But unfortunately as they went with a whole genome methodology, they didn’t target the most important functional markers, but caught a lot of tag SNPs which are associated with the major ones. You can find the list for the populations in the supplements, but there are a lot of other genes. I took the table and filtered it for pigmentation SNPs, and also added the ones from the above paper. There is one overlap, at OCA2. As most of the SNPs are not super critical, I just pared them down to the really informative ones. You can access the full spreadsheet here.

                                                 Bronze Age
SNP         gene     Africa N_Eur  S_Asia S_Eur  Asia   Eur    Step   HG     Neo    SHG    WHG    EN     BA     Yam
rs12821256  KITLG    0.00   0.17   0.03   0.05   0.13   0.07   0.33   0.00   0.10   -      -      -      -      -
rs1805005   MC1R     0.00   0.08   0.01   0.20   0.00   0.05   0.00   0.00   0.00   -      -      -      -      -
rs1805007   MC1R     0.00   0.10   0.00   0.00   0.00   0.00   0.00   0.00   0.00   -      -      -      -      -
rs1805008   MC1R     0.00   0.07   0.00   0.03   0.00   0.03   0.00   0.00   0.00   -      -      -      -      -
rs1805009   MC1R     0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   0.00   -      -      -      -      -
rs2228479   MC1R     0.00   0.07   0.09   0.10   0.00   0.13   0.20   0.00   0.00   -      -      -      -      -
rs885479    MC1R     0.00   0.12   0.08   0.03   0.09   0.00   0.00   0.00   0.00   -      -      -      -      -
rs12913832  OCA2     0.01   0.85   0.08   0.30   0.40   0.41   0.00   1.00   0.56   1      1      0.5    0.5    0.1
rs2470102   SLC24A5  0.05   1.00   0.73   1.00   0.94   0.95   1.00   0.33   0.88   -      -      -      -      -
rs28777     SLC45A2  0.12   0.98   0.23   0.95   0.50   0.61   0.33   0.43   0.56   -      -      -      -      -
rs35395     SLC45A2  0.16   0.98   0.23   0.95   0.78   0.56   0.00   0.20   0.33   -      -      -      -      -
rs1426654   SLC24A5  0.00   1.00   0.69   1.00   -      -      -      -      -      0.65   0.18   0.9    1      1
rs16891982  SLC45A2  0.00   0.98   0.06   0.90   -      -      -      -      -      0.65   0.00   0.2    0.75   0.4

I didn’t mention MC1R much above because it doesn’t explain much variance. It’s well known for two things. First, there’s a huge body of research from the era of classical mouse genetics on this locus because of its importance in fur coloration, and coat color across mammals in general. Second, having a lot of knockouts at this locus seems to be a necessary, but not sufficient, condition for being red-haired or a ginger. The decreased production of eumelanin combined with constitutive production of pheomelanin results in a reddish tinge. Most people have pheomelanin, but it’s masked by eumelanin. When I’ve bleached my hair there are two stages. First, the eumelanin gets stripped out, and my hair is left reddish/copper colored. Then a second bleaching removes the pheomelanin.

Before the “golden age” of pigmentation genetics, basically between December of 2005 and the end of 2007, there was a lot of exploration of MC1R because that’s where the light was. Here’s a paper from 2000, Evidence for Variable Selective Pressures at MC1R:

It is widely assumed that genes that influence variation in skin and hair pigmentation are under selection. To date, the melanocortin 1 receptor (MC1R) is the only gene identified that explains substantial phenotypic variance in human pigmentation. Here we investigate MC1R polymorphism in several populations, for evidence of selection. We conclude that MC1R is under strong functional constraint in Africa, where any diversion from eumelanin production (black pigmentation) appears to be evolutionarily deleterious. Although many of the MC1R amino acid variants observed in non-African populations do affect MC1R function and contribute to high levels of MC1R diversity in Europeans, we found no evidence, in either the magnitude or the patterns of diversity, for its enhancement by selection; rather, our analyses show that levels of MC1R polymorphism simply reflect neutral expectations under relaxation of strong functional constraint outside Africa.

The basic model here is that MC1R started losing function due to relaxation of constraint, and variation started to become dominated by neutral processes. It turns out that Neanderthals too had variation around MC1R. Further investigation suggests that modern Europeans don’t seem to have this variant. More recent evidence suggests that some haplotypes did introgress from Neanderthals at this locus, though perhaps into East Asians far more than Europeans.

So look at the MC1R SNPs in the table above. The Neolithic and HG samples are all fixed for the ancestral variant (every frequency is 0.00). That is, one reason it seems implausible that the diversity of MC1R in Europe today is due to long term drift in situ is that it didn’t exist on the continent before the arrival of people from the steppe.

Second, rs12821256, in KITLG, associated with blonde hair in Europeans, is also not present in the ancient hunter-gatherers. But it is present in the Neolithic farmers, as well as the people coming from the steppe. In fact the steppe samples have a higher fraction than any modern population (in the 1000 Genomes the frequency is ~20% in the British and Finnish samples). Remember, KITLG has been implicated in skin depigmentation in several studies, though the effect size is more modest than SLC24A5.

For the two solute carrier genes the trends are what we already knew. The frequency for 24A5 is high in the steppe, in fact, fixed, and high among the Neolithic farmers. It is low in Western European hunter-gatherers, and segregating at modest frequencies among the Scandinavian hunter-gatherers. The work above suggests that the genetic background around rs1426654, which is a nonsynonymous change, dates to the Upper Paleolithic. But, both ancient DNA and haplotype based selection methods suggest that in places like Europe and India the frequency of this allele and its flanking sequence has been rapidly rising over the past ~10,000 years. The fact that some European hunter-gatherers had the derived variant of rs1426654 seems to confirm the idea that this mutation arose during the Ice Age, and was widely distributed. But, we can’t really adduce where the particular variant came from until we get good haplotype data from these ancient samples. Let me quote from Molecular Phylogeography of a Human Autosomal Skin Color Locus Under Natural Selection:

With sufficiently strong positive selection for C11, it is possible that this haplotype could have originated anywhere within its current range and spread via local migration. However, selection acting in concert with major population migrations would have facilitated a much more rapid dispersal. Archeological, mitochondrial, and Y-chromosomal data suggest involvement of multiple dispersals in shaping the current populations of Europe and the Middle East (Soares et al. 2010). Because A111T is far from fixation in most Indian samples (Table S1), the high diversity of B-region haplotypes associated with C11 in the GIH sample may be the result of prolonged recombination rather than early arrival of A111T. In fact, the decrease in frequency of A111T to the east of Pakistan suggests that C11 originated farther to the west and after the initial genetic split between western and eastern Eurasians. On this basis, we hold the view that an origin of C11 in the Middle East, broadly defined, is most likely.

Where does this leave us? First, we understand the genetic architecture of normal variation in pigmentation in humans to a good degree. Depending on how much residual there is in smaller effect QTLs there are publications to come which will probably yield a few more genes, but the remaining variance may simply be distributed across many small-effect loci. Second, the frequency of many pigmentation genes seems to have changed due to natural selection. In South Asia and Ethiopia the methods have been able to detect genomic signatures of positive selection at SLC24A5. It can’t be ancestry alone, just look at table S5 for South Asia. The range across populations is huge, even if you exclude those with enriched East Asian ancestry.

Third, we don’t really know why this selection occurred across these pigmentation genes. This is going to sound strange of course. There are many theories out there. Readers regularly ask me what I think about Peter Frost’s thesis. My standard response is that I’m skeptical, but who knows? Peter has asserted that the selection he speaks of began in a very narrowly delimited area in northeastern Europe. In the next few years we will have ancient DNA and be able to test some of his predictions. A more widely accepted thesis is promoted by Nina Jablonski in Skin: A Natural History. In her model at lower latitudes selection constrains variation due to high UV, while at higher latitudes there is relaxation of that constraint, and selection for vitamin D synthesis. The story is neat, but selection for SLC24A5 also occurs at lower latitudes, and at higher elevations at those latitudes.

The map to the left makes clear that the Sudan has some of the highest radiation levels in the world. It is reasonable then that people in this area would have darker skin than anywhere else. But Ethiopia’s radiation levels are not that much lower. And yet we know that there hasn’t been strong selection against the light skin alleles presumably derived from West Eurasian migrants. Rather, the reverse has occurred! None of the parsimonious models seem to explain very well the complexity on offer here.

Then, as Graham Coop observed in response to an Ewen Callaway piece in Nature where the latter inferred that European hunter-gatherers must have been dark skinned and blue eyed because of what genetics implies, we don’t really know the genetic architectures of pigmentation of ancient individuals. The reason is simple: we have genotype data, but not phenotype data. East Asians and Western Europeans converge upon lighter complexions via diverse genetic mechanisms, so why couldn’t ancient European hunter-gatherers be the same? This is a fair point. And, if true, then selection on pigmentation loci couldn’t, by definition, target pigmentation, since there wouldn’t be much heritable phenotypic variation to select upon.

But in response to the idea we should be phenotype-agnostic, pigmentation is one of the most well characterized traits for mammals in regards to the genetics. The parameter space of possibilities is not infinite. The same genes, and sometimes the same mutations, re-occur across different populations. The reason some Melanesians have blonde hair is due to a mutation in TYRP1. Again, this is a locus implicated in pigmentation variation across many populations, and in other mammalian lineages. If we had good high quality whole genome sequences we could actually look for functional mutations across a set of pigmentation loci. If ancient European hunter-gatherers were functionally constrained around the pigmentation genes, or subject to neutral dynamics, that would be informative. A better characterization of all the diverse modern populations will probably give us better expectations of the size of the parameter space of genetic variation and how it maps onto phenotypic variation.

I’ve been giving a lot of thought to this topic for a while. And I have to say that in terms of the evolutionary origin of this trait and its variation, I’m left befuddled. After talking to researchers who are on the cutting edge in this area I’m pretty sure they are confused, too. That’s not dispiriting; that’s the state of science before discoveries push the edge of knowledge further. But, I’d also appreciate it if in response to this very long post readers don’t go Google Pundit on me and start throwing down a list of publications which resolve all these problems. I’m moderately familiar with this literature, and have probably internalized studies which go in both directions. In response to a post into which I put more effort over the last day than I probably should have, I expect the comments to be not-annoying. Or else (I assume you know what’s in that conditional!).

• Category: Science • Tags: Genomics, Pigmentation


Byzantine Empire 717 A.D.

I’ve been looking at some European genotype data. So I have some samples from Greece. One of the things I noticed is that there seem to be two clusters of Greeks. You can see it above. The Italian sample is really a southern Italian one (not Sicilian though). The Balkan samples are Serbs, Bulgarians, and Romanians. You can see that they are shifted toward the Poles. And so are the Greeks, in comparison to the Italians. This is not entirely surprising. What was surprising to me was that there were a number of Greeks who are in the same cluster as the Italians.

The historical context for this is the Sclaveni migrations. These were Slavic peoples who pushed south, as far as the Peloponnese, after the Byzantine Empire ceded the Balkans to barbarian groups due to threats in the east from Persians and then Muslims. In fact the demographic basis of the Byzantine Empire between the loss of the Levant and Egypt and the Battle of Manzikert in 1071 was Anatolia, though there were fortifications around major cities such as Thessaloniki. After the loss of their Anatolian heartlands to the Seljuks the Empire turned back toward the Balkans, which had been conquered by Basil II in the first decades of the 11th century.

These results, and others, indicate the impact of the Slavic migrations on the Balkans, and Greece proper. But, what they also suggest is that there is population structure within Greece. Why? I can think of two hypotheses. First, some of the islands in the Aegean were never touched by Slavs, and may have maintained endogamy until the modern period. Even if the Slavs never conquered the cities, their impact would be felt by migration from rural areas. But in a pre-modern era barriers such as water and mountains often serve as potent obstacles to continuous gene flow. The second, to me more plausible, scenario is that the cluster without much Slavic genetic impact consists of those who descend from Anatolian Greeks, who arrived in the early 20th century due to the population exchange with Turkey. These western Anatolian Greeks would obviously have been shielded from the Sclaveni migrations.

To tease the relationships apart I decided to run TreeMix 20 times. As per reader suggestion, I won’t give you all the plots. But you can download them. Below is a representative one. The various Jewish groups form their own clade. The affinity of Cypriot Greeks with Anatolians is, I believe, a function of the fact that they are culturally Hellenized (the ancient Bronze Age polity of Cyprus was part of the orbit of Egypt, and was not Greek), even if that is an ancient occurrence. I separated the Greeks into two clusters, the major one being “Greece” and the minor one, clustering with southern Italians, being “GreeceItaly.” What is pretty obvious is that GreeceItaly has much less of the Slavic admixture. In this tree the Greeks proper are placed near the Balkan and Polish position on the graph, but with a huge migration arrow from nearly the GreeceItaly position. The Balkan node has a smaller migration parameter. Across runs the Greeks tend to flip from being near the Poles to being near the GreeceItaly cluster, with the migration arrow swapping direction.
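For anyone who wants to try repeated runs like this, here is a minimal sketch of how 20 replicate command lines could be generated, each with its own random seed. Everything specific in it is an assumption rather than the setup used for the plots: the input file name, the output prefix, the number of migration edges and the SNP block size are placeholders, and the flags (`-i`, `-o`, `-m`, `-k`, `-seed`) are written as recalled from the TreeMix manual, so check them against your installed version.

```python
import shlex


def treemix_commands(input_path, out_prefix, n_runs=20, migration_edges=2, block_size=500):
    """Build one TreeMix command line per replicate, each with a distinct seed.

    All values are placeholders. The flags -i, -o, -m, -k and -seed are
    assumed from the TreeMix manual and should be verified before use.
    """
    commands = []
    for run in range(1, n_runs + 1):
        commands.append([
            "treemix",
            "-i", input_path,                    # gzipped allele-count input
            "-o", f"{out_prefix}_run{run:02d}",  # output stem for this replicate
            "-m", str(migration_edges),          # number of migration edges to fit
            "-k", str(block_size),               # SNPs per block
            "-seed", str(run),                   # different seed for each run
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical file name; this only prints the commands, it runs nothing.
    for cmd in treemix_commands("greeks.treemix.frq.gz", "greeks"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the sketch harmless; each list could be handed to `subprocess.run(cmd, check=True)` once the flags are confirmed. Comparing the resulting trees across seeds is what reveals flips of the kind described above.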


• Category: Science • Tags: Greeks

Yesterday I tweeted out an article, Coca-Cola Funds Scientists Who Shift Blame for Obesity Away From Bad Diets. The title, and frankly, the story is a bit slanted. I wasn’t totally comfortable about the piece…but I really hate the soft drink industry. So much of our obesity problem would go away if people stopped drinking the stuff. The funding does not necessarily entail a particular conclusion. Rather, conclusions can lead to funding. But, we’ve all seen the research which suggests that pharmaceutical companies do trials which have suspiciously high success rates. And scientists are human beings, and it seems that biases can slip in even unconsciously. We need to balance the tensions and not get carried away by an extreme perspective about the nature of human motivations. Scholars are no saints.

But the ultimate focus should be on the science. That’s really what’s at the heart of the matter. My friend Kevin Klatt, who studies nutrition at Cornell, outlines his own concerns at length about The New York Times piece, Funding: Tales of Defamation:

Until the industry funded research argument is balanced by an equally loud message that non-industry funding is highly limited, those shouting the loudest do little to address their own issue. This notion that researchers seeking industry money are doing conflicted research does little but subtly suggest that academic researchers find a new job or risk having their reputations threatened due to their funding source (no, I’m not being dramatic – go look at article’s written about Susan Jebb). Keeping up with the academia lifestyle is busy enough without a bunch of people who aren’t in your field telling you how you should fund yourself. If you get the time, I’d also urge you to consider educating individuals’ to encourage NIH to fund nutrition research that has been established as a priority by organizations like ASN. As evidenced by the seemingly consistent stream of low-fat vs low-carb studies in the literature, NIH doesn’t seem to be paying attention to these.

What’s a scientist to do? This is a fallen world, and we are of it. Obviously there are cases where the conflict of interest is extreme. But often taking funding from private sources is what researchers have to do to keep their work afloat. If money was what scientists were after…they would actually go work for their funders.

Second, I want to point you to what’s going on with Kevin Folta. He’s a passionate researcher at the University of Florida who works on GMOs. You know where this is going. The Radical Activist Attack on a Teacher:

When asked about my speaker fees I always just say, “Take what you think would be customary and donate it to my outreach program.” We’re talking thousands of dollars here.

In Fall of 2014 the Monsanto company offered support for the program, and I thought that was great. Love ‘em or hate ‘em, my workshops were teaching everyone from kids to scientists, so I was glad to welcome their support.

It never was a secret. At universities, our records are public, and people know where our funding is from. You can probably find it online if you look hard enough, but just ask and I’m glad to tell you about who sponsors my research or who sponsors my outreach.

Last week the public information voluntarily hit the right activist ear, and they went ballistic. Screams of “Shill!” could be heard everywhere from drum circles, to hackeysack games, to the Whole Foods Gluten Free Bisque Repository. After all, $25K is a lot of money, so to most people this was the smoking gun of high collusion they always suspected. Heck, anyone that talks about science must be getting paid off.

Kevin’s been put on blast by activists. It’s Mon$anto all the time. He’ll persevere, because he didn’t do anything wrong or untoward. But now those who are not heavily engaged on the topic are going to have to discern whether Monsanto is poisoning our crops and buying our scientists.

I guess it shows that sometimes the substance of science matters less than style. No one really knows anything about nutrition. I exaggerate for effect, but you know of what I speak. In contrast, we know a fair amount about GMOs. But in both cases there are passionate public debates, and egos being bruised and reputations shredded.

I’m glad I’m not very controversial!

• Category: Science • Tags: GMO

As you can see from the Tweet above some people are trying to score political points off Sundar Pichai being tapped to lead Google. I joked in response that these CEOs “sure don’t look like America.” Excessive focus on whom/whom issues inevitably gets knotty and difficult to navigate. I don’t personally care who makes good products as long as the products are good. But reading a Time magazine piece, Everything You Need to Know About the New CEO of Google, made me reconsider an assumption I’d had. The article ends: “He’ll join Microsoft chief Satya Nadella as one of the few minority CEOs in Silicon Valley.” This is a pretty strong assertion. My impression is that at large firms like Apple the management does tend to be white males, while the engineering talent is Asian or Asian American to a much higher degree. But I’d never bothered to check.

If you go to the Wikipedia entry for “Silicon Valley” it has a list of notable companies. In particular, I looked at the ones which were “Fortune 1000.” Some are very well known. Google, Yahoo, and Apple, for example. Others are lower key, but not obscure. Juniper Networks is probably one of those. Then there’s Xilinx and Maxim Integrated Products, which occupy opposite poles of distinctiveness in corporate names, despite both being obscure to the general public. I don’t recall hearing of them before I saw them on the list.

It’s not that hard to look up CEOs, and that’s what I did. The results are below.


To my surprise there’s actually a fair number of minorities as CEOs at large firms with a presence in Silicon Valley. This went against my expectation. 5 out of 32 CEOs in “Fortune 1000” Silicon Valley firms were of Indian ethnicity. That’s ~16%. As ~1% of the American population is Indian American, that means they are more than an order of magnitude over-represented among CEOs. 21 out of the 32 CEOs were white, 23 if you include the two Middle Eastern men (if they had Southern European names they would definitely be categorized as white). So whites are actually barely over-represented among these CEOs in comparison to the general American population (~63% for non-Hispanic whites). Of course I don’t deny that in comparison to their representation in professional ranks at these types of firms people of Asian origin do seem under-represented in management overall. But, I’d challenge the null hypothesis that society can or should aim for perfect proportionality in all facets of life, and that deviations are only due to invidious discrimination, implicit or explicit (there’s very little explicit discrimination, but there is some implicit discrimination when people use words like “corporate culture”). We don’t know all the various factors which result in these sorts of statistics, and Silicon Valley is too important to American productivity to tinker with too much.

Fortune 1000 Company Demographic
Adobe Systems Brown man
Marvell Semiconductors Asian man
Nvidia Asian man
Advanced Micro Devices (AMD) Asian woman
Brocade Communications Systems Black man
Google Brown man
LSI Logic Brown man
NetApp Brown man
SanDisk Brown man
Juniper Networks Middle Eastern man
Maxim Integrated Products Middle Eastern man
Agilent Technologies White man
Apple Inc. White man
Applied Materials White man
Cisco Systems White man
eBay White man
Electronic Arts White man
Facebook White man
Intel White man
Intuit White man
KLA Tencor White man
National Semiconductor White man
Netflix White man
Oracle Corporation White man White man
Sanmina-SCI White man
Symantec White man
Western Digital Corporation White man
Xilinx White man
Hewlett-Packard White woman
Lockheed Martin White woman
Yahoo! White woman
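
The proportions quoted above can be re-derived from the table with a few lines of arithmetic. This is just a sketch under two assumptions: that the five “Brown” entries are the five CEOs of Indian ethnicity mentioned in the text, and that the rough population shares (~1% and ~63%) are the ones quoted in the post. The names in the code are made up for illustration.

```python
from collections import Counter

# Tallies transcribed from the table (Oracle is listed with two CEOs).
CEO_COUNTS = Counter({
    "Brown": 5,           # assumed to be the five CEOs of Indian ethnicity
    "Asian": 3,
    "Black": 1,
    "Middle Eastern": 2,
    "White": 21,
})

# Rough US population shares as quoted in the post.
INDIAN_AMERICAN_SHARE = 0.01
NON_HISPANIC_WHITE_SHARE = 0.63


def representation_ratio(ceo_share, population_share):
    """Values above 1 mean over-representation relative to the population."""
    return ceo_share / population_share


total = sum(CEO_COUNTS.values())
indian_share = CEO_COUNTS["Brown"] / total
white_share = CEO_COUNTS["White"] / total
white_broad_share = (CEO_COUNTS["White"] + CEO_COUNTS["Middle Eastern"]) / total

if __name__ == "__main__":
    print(f"total CEOs: {total}")
    print(f"Indian share: {indian_share:.1%}")
    print(f"Indian ratio: {representation_ratio(indian_share, INDIAN_AMERICAN_SHARE):.1f}x")
    print(f"white share: {white_share:.1%} (broad: {white_broad_share:.1%})")
    print(f"white ratio: {representation_ratio(white_share, NON_HISPANIC_WHITE_SHARE):.2f}x")
```

With these inputs the Indian share works out to 5/32 and the white share to 21/32, so the first ratio is well above ten and the second is only slightly above one, which is the contrast the paragraph is drawing.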
• Category: Economics • Tags: Silicon Valley
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"