The Unz Review - Mobile

The Unz Review: An Alternative Media Selection

A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media


 Gene Expression Blog

Though Nassim Taleb is better known for The Black Swan, I actually liked his earlier book, Fooled by Randomness, better. It seemed aimed toward more general issues than The Black Swan.

One of Taleb’s hobby-horses in Fooled by Randomness is that the book The Millionaire Next Door was based on faulty inferences and misled many people. This was back in the heady days before the property bust, so many middle class individuals were investing in the “can’t miss” and eternally appreciating real estate bubble. In any case, The Millionaire Next Door had a simple strategy: observe the characteristics of millionaires, and so gain insight into what might make you a millionaire. The problem pointed out by Taleb is that the sample set is highly biased; you see all the millionaires with the characteristics of interest, but not the more numerous non-millionaires who share them. One of the major variables, perhaps the major variable, in becoming a millionaire is what we’d call luck. There may be many necessary conditions, but luck is one we can’t cultivate. One might increase the chance that one is a millionaire…but The Millionaire Next Door misled many people into thinking that just by doing what millionaires had done any person could become one themselves.
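To make the sampling problem concrete, here is a minimal sketch with made-up numbers. The 30% share of people with the trait and the 2% versus 1% chances of becoming a millionaire are assumptions chosen purely for illustration, not figures from Taleb or from The Millionaire Next Door.

```python
# Hypothetical inputs, chosen only to illustrate the sampling bias.
population = 1_000_000
share_with_trait = 0.30       # assumed share of people who are, say, frugal
p_rich_with_trait = 0.02      # assumed chance of becoming a millionaire with the trait
p_rich_without_trait = 0.01   # assumed chance without it

with_trait = population * share_with_trait
without_trait = population - with_trait

rich_with = with_trait * p_rich_with_trait
rich_without = without_trait * p_rich_without_trait

# What a survey that only looks at millionaires sees.
trait_share_among_rich = rich_with / (rich_with + rich_without)

# What someone who copies the trait actually faces.
failure_rate_with_trait = 1 - p_rich_with_trait

print(f"Share of millionaires with the trait: {trait_share_among_rich:.0%}")
print(f"Share of people with the trait who are not millionaires: {failure_rate_with_trait:.0%}")
```

With these assumed inputs, nearly half of the surveyed millionaires have the trait, yet 98% of the people who have it never become millionaires. The survey alone cannot tell you the second number.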

So consider this from The Wall Street Journal, Best-Paid CEOs Run Some of Worst-Performing Companies:

The analysis, from corporate-governance research firm MSCI, examined the pay of some 800 CEOs at 429 large and midsize U.S. companies during the decade ending in 2014, and also looked at the total shareholder return of the companies during the same period.

MSCI found that $100 invested in the 20% of companies with the highest-paid CEOs would have grown to $265 over 10 years. The same amount invested in the companies with the lowest-paid CEOs would have grown to $367. The report is expected to be released as early as Monday.

The original report is also online. There are other studies which support this conclusion. The correlation between CEO pay and firm performance is relatively weak to non-existent.
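To put the quoted MSCI figures on an annual footing, a quick back-of-the-envelope calculation helps. This is just compound-growth arithmetic on the two numbers in the article; the function name is mine, and it says nothing about how MSCI computed total shareholder return.

```python
def annualized_return(start_value, end_value, years):
    """Constant yearly growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1 / years) - 1

# $100 invested for 10 years, per the figures quoted above.
highest_paid = annualized_return(100, 265, 10)
lowest_paid = annualized_return(100, 367, 10)

print(f"Highest-paid CEOs: {highest_paid:.1%} per year")
print(f"Lowest-paid CEOs:  {lowest_paid:.1%} per year")
```

That works out to roughly 10% a year for the highest-paid group versus roughly 14% a year for the lowest-paid group.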

Does this mean CEOs are worthless? Not necessarily. There’s some range constriction going on. The average person on the street wouldn’t have the minimum necessary skills and aptitudes to be a CEO of a large firm. But the variation among CEOs in pay might be due to a whole different set of skills than the characteristics which constrain the set of individuals who might become CEOs. For example, the average CEO might be far more conscientious and intelligent than the average person. But, it may be that the less conscientious CEOs actually get paid more. And then of course there is luck in falling into a good board situation, which anchors you to a particular set point in terms of future salary expectations. And the outcome of a firm may have only the most marginal relationship to the CEO performance (consider how we attribute macroeconomic performance to American presidents, when they probably have only marginal influence on the business cycle).

And once you make it into a particular class, social connections can help prevent you from sliding back down. To a great extent the sale of Yahoo to Verizon is a failure for Marissa Mayer. But she’ll be fine, and obtain another CEO position if she so chooses. If she had turned around Yahoo, always a long shot, she would have been dubbed a genius. As it is, she’ll get a golden parachute and look to future opportunities.

What’s the take-home lesson? Social mobility is a thing in the United States. But the reality is that what you really need to do is somehow make it into a particular segment of the class structure. Once you are there, your own competence probably matters less than chance and necessity. Even if you don’t become a superstar, the nature of the American class structure will probably make it so you’ll be shielded from the bracing consequences of creative destruction.

• Category: Economics • Tags: Economics

When people ask me what they should read to understand genetics, I don’t really know what to say. But An Introduction to Genetic Analysis is what I reviewed for my genetics qualifying exam. If you want to understand what PCA is, the Wikipedia page should suffice, especially if you have taken linear algebra. Perhaps ironically for someone interested in evolution and genetics I’ve read only a few textbooks devoted to these topics. Rather, I try and read papers. And with the preprint revolution there’s really far less of an excuse to not engage with the literature in such a direct fashion if you are interested.

re: question about inferring admixture from allele, as opposed to genotype data. One could convert to diploid genotype. Or, one could use a PCA based admixture method which takes allele data as inputs.

First CRISPR trial in humans is reported to start next month. In China.

The Great Ordeal finished with a bang. I’d recommend it, though it is a difficult and frustrating read. Even though the previous books had conditioned me to expect a pretty creepy protagonist, The Great Ordeal went even further. But R Scott Bakker shines where you’d expect, in world-building and haunting evocations and expositions of what had heretofore been beyond the horizon. In particular the sections in Ishterebinth illustrate Bakker’s ability to take a tired trope, elves (he calls them nonmen), and transform it into something novel and multi-textured. Interestingly, as I was reading these sections I began to think that the nonmen looked just like the engineers in the world of the Alien films, and someone also added that observation to their entry in the wiki.

Congo: The Epic History of a People is kind of like reading Oedipus Rex. It’s hurtling toward tragedy. For the section on the “Great War in Africa” I’d just recommend Dancing in the Glory of Monsters. One might ask, why read books like this? Because to confront reality is hard, but to understand the world one must expose oneself to horrible truths.

One of the aspects of American culture that I have long disliked is the inability to acknowledge that democratic polities will naturally lead to an element of populism, and the people are often illiberal. The Founders were aware of the pitfalls of democratic populism, but the skepticism of the 18th century gave way to the embrace of democracy in the Age of Jackson. I’ve long been skeptical of this, but it’s interesting to watch people attempt to deny legitimacy to popular will where in other cases that is all that matters.

Joshua Schraiber is looking to get some post-docs.

In other news, why do people with Ph.D.s aim to get post-docs so that they can get a job in the private sector? Shouldn’t the 5+ years in a Ph.D. program in the biological sciences train you for jobs outside of academia? If not, then we’re doing it wrong.

I don’t talk about contemporary politics much. That’s because I don’t have much to say. On some topics, such as international affairs, not to be immodest, I’m actually better informed on history and ethnographic detail than many people who write columns. But because I know a fair amount I’m also conscious of how little we can say concretely. Stuff happens. Big coarse heuristics are probably for the best, because this isn’t like sending a probe to Jupiter. We just don’t have a good grasp of mechanics. As for domestic politics, my current attitude is to ask my friends every now and then what’s happening. My time is better spent on intellectual interests, working, and spending time with my family.

So are there neighborhoods where kids hang around on the block? A suburban cul-de-sac? That’s the childhood I want for my kids, but the streets seem to be empty of children. Are they playing video games?

Uncle Sam Wants You — Or at Least Your Genetic and Lifestyle Information.

Someone asked me about Game of Thrones a few weeks back. Everything seems to be moving in directions you’d predict. I suspect that much of the narrative in the book is not going to be so pat. The show-runners for the HBO series seem to want to squeeze an incredible amount into the last two seasons, while Martin has at least two books to go, and probably three (his books are barely physically feasible, there are so many pages).

One thing watching the television show has impressed upon me: the average IQ of people watching television is much lower than that of those who read books. The “theories” promoted by those who primarily watch the television show are often far stupider than anything I remember from the message boards of the late 1990s and early 2000s, when those who read the books came up with plausible models such as R+L=J.

Unlike most of my friends I don’t have a problem with gentrification. If a city is expensive, then only people who can afford it should be able to live there. That might impact the cost or availability of services provided by low wage earners, but that’s just how life goes. But being a gentrifier myself it’s interesting to see neighborhoods in transition. The demographic switch can happen very rapidly (e.g., if I see young white women on a block I assume it’s safe). But there is the phenomenon of established businesses often being geared toward the lower-income population that was previously dominant. Eateries and churches might still be frequented by old-timers, who hang around in some way almost as ghosts, strangers in the neighborhoods that grew around them.

The Kindle version of The High Frontier: Human Colonies In Space was free yesterday, so I bought it. There are some awesome things going on in space right now, and it’s fascinating to look back to a time when this was the science which captured the public imagination. It strikes me we are in the golden age of planetary probes, so who is the Richard Dawkins of this field?

The whole DNC email leak and Debbie Wasserman-Schultz resignation strikes me as strange. Obviously I don’t follow politics, because everyone knew they were engaging in these shenanigans. Is it different because we know for a fact?

Detecting Heterogeneity in Population Structure Across the Genome in Admixed Populations. I think the method is a bit under-powered…but I think that’s because local ancestry deconvolution hasn’t progressed that far in the past 3-4 years. I hear things will change soon. Also, high-quality whole genome sequences will change things.

Evolution Is Happening Faster Than We Thought.

I’ve started a Blue Apron subscription. Pretty impressed so far in that it has “nudged” me to start cooking.

• Category: Miscellaneous • Tags: Open Thread

One of the interesting things about genetics, and population genetics even more specifically, is how the theory and analysis outran the biophysical mechanism of the phenomenon. By this, I mean that the Mendelian laws inferred from transmission of physical characteristics predate any understanding about how genes were embedded within chromosomes, let alone the structural nature of DNA.

Population genetics, which fused the quantitative evolutionary thinking of the biometrical school with Mendelism, arguably outran the data by decades. Until the molecular evolution revolution of the 1960s, controversies such as the role of selection and drift in shaping variation were rhetoric rich and data poor. Though the allozyme era was clarifying, I do think people who were shaped by that era get a bit fixated on being in a particular camp. In contrast, with the genomics revolution many researchers seem to be more willing to let the data speak, because the data is so copious. A model that is relevant in one part of the tree of life may not be as predictive in another portion of it.

The rise of data makes old questions live again. With that, I present a paper in PNAS where the first author is John Wakeley, a pioneer of coalescent theory, Effects of the population pedigree on genetic signatures of historical demographic events:

Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.

The paper is open access, so read the whole thing. The major math is tucked away in the extended material. Many of the formalisms in the text are those you’d regularly encounter in population genetics. The issue they’re addressing here is the fact that real populations exhibit pedigree structure, and even unlinked loci, which we treat as independent evolutionary histories, share a pedigree history.

If you read the text, though, it is notable how robust standard population genetic inferences are to the fact that in a literal sense they’re based on false assumptions. Massive demographic expansion (e.g., Genghis Khan haplotype) and unrealistic selection coefficients don’t seem to disturb the lineages enough for the assumption of independent assortment to become misleading.

This shouldn’t be entirely surprising. I would argue that genomics has not really revolutionized evolution or population biology. The big frameworks are vindicated because nature is one, and the glimmers of reality you see in sparse data nevertheless sample from a comprehensible underlying distribution. As we get more data we’re getting more clarity, but the overall picture is not shocking or surprising.

Citation: John Wakeley, Léandra King, and Peter R. Wilton, Effects of the population pedigree on genetic signatures of historical demographic events

• Category: Science • Tags: Evolution, Genetics

Reading The Essential Talmud about ten years ago I vaguely recall the author stating that it was common for working class males to devote each day to one page of a tractate from the commentaries on the oral law of the Jewish religion. As I am not religious, and look dimly on excessive orthopraxy, it struck me as a depressing thought.

But I am not entirely different. I often will relax at some point in the day and open up a random page of a population genetics textbook. Just as those Jewish men attempted to gain insight into the divine intent for how they should live their life, so with population genetics I am attempting to refine the theory which allows me to interpret the world around me.

It would probably help anyone who reads many of my posts as well, as it develops particular habits of mind. Though I often recommend Principles of Population Genetics, Elements of Evolutionary Genetics is also excellent. So in the future I’ll try to write up short insights which are pretty banal to most population geneticists, but which might be interesting to a motivated public, if my modest readership can be considered the “public.”

Page 100 has a section, “Selection in inbreeding populations.” The most important formal relationship on this page is:

Δq ≈ qs[h(1 − f) + f]

q = minor allele frequency on a biallelic locus, that is, the remainder from 1 – p

h = dominance coefficient, so that h = 0 means q is totally recessive and h = 0.5 means that the locus is additive in regards to allelic effect.

f = inbreeding coefficient, a basic measure of two alleles at the same locus sharing recent common ancestry (and therefore, rendering the genotype likely homozygous). From 0 to 1, with 1 meaning totally inbred and homozygous.

s = selection coefficient against the population mean fitness. Usually the value is near zero, though not exactly zero. A positive selection coefficient of 0.01 is considered very favorable for a new mutant.

What you see here is that in an instance where q is entirely recessive, inbreeding increases the selection on the locus. In a normal population with lots of random mating homozygous recessive genotypes are rare. When f ≈ 0 the change in the frequency of q is just a function of the selection coefficient and the dominance. As inbreeding increases, the importance of alleles (or lack thereof) in heterozygote genotypes decreases. For recessive traits inbreeding is another way to expose the novel alleles to selection.
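A short numerical sketch makes the point visible. It reads the relationship above as Δq ≈ qs[h(1 − f) + f], which I am treating as an approximation for a rare allele under weak selection; that reading, the function name, and the example values q = 0.01 and s = 0.01 are my assumptions, not something taken from the textbook page.

```python
def delta_q(q, s, h, f):
    """Approximate one-generation change in allele frequency.

    Implements delta_q ~ q * s * [h * (1 - f) + f], assuming the allele is
    rare and selection is weak.
    """
    return q * s * (h * (1 - f) + f)

q = 0.01   # assumed starting frequency of the favored allele
s = 0.01   # assumed selection coefficient

for h in (0.0, 0.5):
    for f in (0.0, 0.25, 1.0):
        print(f"h={h:.1f}  f={f:.2f}  delta_q={delta_q(q, s, h, f):.2e}")
```

For the fully recessive case (h = 0) the change is zero to this order when f = 0 and grows in direct proportion to f. At f = 1 dominance no longer matters at all, because there are no heterozygotes left for it to act in.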

This is one reason that unscrupulous breeders of animals sometimes utilize very close relatives in programs to change traits. The problem is that inbreeding has an effect across the whole genome, even if you are interested in particular loci. And that effect on the whole genome is often very bad, as lots of deleterious alleles with recessive expression are present in populations which are normally outbred. Of course in plants this also results in purging of genetic load, as alleles get flushed out of the system. Unfortunately for mammals, and complex metazoans in general, this doesn’t seem to work too well for our lineage. If it did work well zoological veterinarians, who I’ve talked to, would be a lot more hopeful about what they’re trying to do by mating near relations in the hopes that they can get a large enough population to maintain a viable breeding program.

• Category: Science • Tags: Inbreeding, Selection

The mutation rate in human evolution and demographic inference:

The germline mutation rate has long been a major source of uncertainty in human evolutionary and demographic analyses based on genetic data, but estimates have improved substantially in recent years. I discuss our current knowledge of the mutation rate in humans and the underlying biological factors affecting it, which include generation time, parental age and other developmental and reproductive timescales. There is good evidence for a slowdown in mean mutation rate during great ape evolution, but not for a more recent change within the timescale of human genetic diversity. Hence, pending evidence to the contrary, it is reasonable to use a present-day rate of approximately 0.5 × 10⁻⁹ bp⁻¹ yr⁻¹ in all human or hominin demographic analyses.

Even since this review came out there has been new work. Fast changing.
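To get a feel for what a per-year rate of 0.5 × 10⁻⁹ per base pair implies, here is a rough sketch. The generation lengths of 25 and 30 years and the example divergence value are my assumptions for illustration. The split-time function uses the simplest possible clock, d = 2μT, and ignores ancestral polymorphism and rate variation.

```python
MU_PER_YEAR = 0.5e-9  # per base pair per year, the rate quoted in the abstract

def per_generation_rate(generation_years):
    """Per-bp, per-generation mutation rate for an assumed generation length."""
    return MU_PER_YEAR * generation_years

def split_time_years(divergence_per_bp):
    """Years since two lineages split under a strict clock, d = 2 * mu * T.

    Differences accumulate along both branches, hence the factor of 2.
    """
    return divergence_per_bp / (2 * MU_PER_YEAR)

for generation_years in (25, 30):  # assumed generation lengths
    rate = per_generation_rate(generation_years)
    print(f"{generation_years}-year generations: {rate:.2e} per bp per generation")

example_divergence = 1e-3  # hypothetical: one difference per 1,000 bp
print(f"Implied split time: {split_time_years(example_divergence):,.0f} years")
```

Halving or doubling the assumed rate halves or doubles every date that comes out of this kind of calculation, which is why the uncertainty matters so much for demographic inference.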

• Category: Science • Tags: Genetics


The Time and Place of European Admixture in the Ashkenazi Jewish History:

The Ashkenazi Jewish (AJ) population is important in medical genetics due to its high rate of Mendelian disorders and other unique genetic characteristics. Ashkenazi Jews have appeared in Europe in the 10th century, and their ancestry is thought to involve an admixture of European (EU) and Middle-Eastern (ME) groups. However, both the time and place of admixture in Europe are obscure and subject to intense debate. Here, we attempt to characterize the Ashkenazi admixture history using a large Ashkenazi sample and careful application of new and existing methods. Our main approach is based on local ancestry inference, assigning each Ashkenazi genomic segment as EU or ME, and comparing allele frequencies across EU segments to those of different EU populations. The contribution of each EU source was also evaluated using GLOBETROTTER and analysis of IBD sharing. The time of admixture was inferred using multiple tools, relying on statistics such as the distributions of segment lengths and the total EU ancestry per chromosome and the correlation of ancestries along the chromosome. Our simulations demonstrated that distinguishing EU vs ME ancestry is subject to considerable noise at the single segment level, but nevertheless, conclusions could be drawn based on chromosome-wide statistics. The predominant source of EU ancestry in AJ was found to be Southern European (≈60-80%), with the rest being likely Eastern European. The inferred admixture time was ≈35 generations ago, but multiple lines of evidence suggests that it represents an average over two or more admixture events, pre- and post-dating the founder event experienced by AJ in late medieval times. The time of the pre-bottleneck admixture event was bounded to 25-55 generations ago.

I think this preprint is coming close to the answer. Why does a small ethno-religious minority in Europe matter? Well, that’s a matter of historical contingency.
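The generation counts in the abstract only translate to calendar time once you assume a generation length. A minimal sketch, where the 25 and 30 year generation lengths are my assumption and not something the preprint's abstract specifies:

```python
ASSUMED_GENERATION_YEARS = (25, 30)  # assumed lower and upper generation lengths

def generations_to_years(generations):
    """Return (low, high) years before present for a count of generations."""
    low, high = ASSUMED_GENERATION_YEARS
    return generations * low, generations * high

# Point estimate of about 35 generations from the abstract.
point_estimate = generations_to_years(35)

# Pre-bottleneck event bounded to 25-55 generations: widest range under both assumptions.
pre_bottleneck = (generations_to_years(25)[0], generations_to_years(55)[1])

print(f"~35 generations: {point_estimate[0]} to {point_estimate[1]} years ago")
print(f"25-55 generations: {pre_bottleneck[0]} to {pre_bottleneck[1]} years ago")
```

Under those assumptions the average admixture date falls roughly 900 to 1,000 years ago, and the bound on the earlier event spans more than a millennium, so the generation length you pick matters about as much as the estimate itself.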

In any case, there were some good papers on Ashkenazi Jewish genetics which came out in the spring of 2010. They really moved the ball forward from the uniparental work. But they suffered from two major problems. First, the putative “parent” populations of Ashkenazi Jews are not that genetically distinct. Second, the hypothesized parental populations were often implausible; e.g., Northern Europeans and modern Levantines.

The likely parental populations of Ashkenazi Jews are Roman period peoples of the eastern Mediterranean, particularly the swath of territory from Alexandria up to Anatolia, and, the peoples of the western Mediterranean. That is, Levantines and Iberians & Italians. These two groups are distinct, but they’re not that distinct.

Additionally, the more we learn about the Middle East, the more likely it seems that Muslim populations, who are often modeled as a parental group, are highly cosmopolitan compared to ancient groups. Recall that Neolithic farmers from the Levant resemble Sardinians more than they do modern locals, because of later migration from further east in Eurasia, as well as later African gene flow. Using imperfect reference populations will probably skew the results accordingly.

The major change in the past few years is the usage of more genetic information than common genotypes. This paper, for example, looks at haplotype information, that is, sequences of variants across the genome. This preserves more recent genetic variation. In other cases you can look at whole genome sequences, and focus on low frequency variants which are extremely informative of recent population differentiation.

Ultimately the only reason I’d suggest that this paper is lacking is the imperfection of Middle Eastern source populations. That’s probably increasing the European and decreasing the Middle Eastern fraction somewhat on the margins. The contemporary populations of the Near East have changed a fair amount over the past 2,000 years, though there is still some continuity.

• Category: Science • Tags: Genomics, Jewish Genetics


Been busy with work. Lots of data coming in. Will be good to turn around some science.

But I’m eating OK. Location matters….

Here’s a FB post from a researcher on Eran Elhaik’s weird results which regularly make press. I’ve started ignoring Elhaik’s stuff because it’s also so crazy.

I’ll try to monitor the open thread better this week and respond to questions.

• Category: Miscellaneous • Tags: Open Thread

In my free moments I have been reading R. Scott Bakker’s The Great Ordeal, as I needed to take a break from Congo: The Epic History of a People (I stopped before the Great War). As you might guess the latter is not a ‘feel-good’ work. And to be frank, The Great Ordeal is probably not the best choice to lighten the mood as a change of pace. It is one of the darkest and most philosophically textured examples of the fantasy genre I’ve ever encountered, but that’s not surprising given Bakker’s previous works, and his background as an academic philosopher. Though the series does not indulge in as many graphic and visually rich descriptions of death and gore as George R. R. Martin’s A Song of Ice and Fire, it’s more deeply haunting and horrible. If Martin deals in shades of gray, from the honorable lightness of Jon Snow to the black depravity of Ramsay Bolton, Bakker’s characters seem to be swallowed by a blankness of color. Amorality rather than immorality.

Martin is a master of creating vivid characters with deep color who operate in a world of frenetic and engaging activity (at least up until the third book, when the plot was relatively fast). In contrast, Bakker’s plotting and characterization are both inferior, but that is in part because he gives more space over to a broader philosophical and moral framework, which hangs heavily over the whole narrative. Golgotterath and the Inchoroi are more memorable to me, alive in my imagination, than assorted protagonists swept up along the tides of history over the course of Bakker’s five books so far.

Where R. Scott Bakker excels, and where he rivals Tolkien in my opinion, is world building on a cosmic scale, complete with a well thought out mythos for humanity in his Secondary World. Bakker’s vision exhibits a great deal of verisimilitude, traversing humanity’s Bronze Age to the medieval period in ~4,000 years. The main actors within the narrative action are people from three of the races of men, of whom there are five total, and whose history goes back to an event termed the Breaking of the Gates, as humanity streamed into the western portion of the continent on which they reside, and engaged in a campaign of genocide against the Nonmen and their human servile caste, the Emwama.

Why am I regaling you with the narrative of a fantasy book series? Because the recent results out of ancient DNA and historical genetic inference of human prehistory suggest that the ‘make-believe’ narratives of epic fantasy may actually be an appropriate model of the formation of human populations in the wake of the Holocene. A friend of mine half-seriously quipped that the last 200,000 years of human history are a matter of collapsing ancient population structure. In fantasy novels often main characters themselves are exemplars of such broken population structure; the ‘half-blood’ trope as it were.

As a primal and backward looking genre fantasy dispenses with the need for a liberal individualist ethical framework, as historical relativism allows us to “put ourselves in the place” of protagonists whose motives and concerns are profoundly alien to moderns, albeit often with a sympathetic and contemporary twist. Jon Snow’s life to a great extent is motivated by his need to prove himself despite his bastardy. The specific motivation here would be hard to understand today, as legitimacy is not legally or normatively privileged as it has been historically, but the general need to find a place for yourself is one we can empathize with. Snow’s situation within a world of great noble houses and warring polities divided by region and language is one which most moderns are not comfortable with, but he is no revolutionary who yearns to overthrow the old regime. On the contrary, he is likely to play a large role in its maintenance and perpetuation.

The meteoric rise of individuals from a humble station in the context of a static and hierarchical world is not an aberration on a world-historical scale. Sargon of Akkad, the first recorded emperor, whose dominion spanned multiple polities, was from a humble background. Gilgamesh, the scion of a noble family, may be semi-mythical, but Sargon was a real person. On the edge of history, but a real person. In a world of corporate entities, defined by group identity, affinity, and affiliation, his success occurred through co-option of a system of city-states with roots over 1,000 years old at that point.

Sargon’s world is one whose outlines we are only vaguely aware of. There are many lacunae, not least of which is the origin of the Sumerian people, who served as cultural progenitors to Sargon’s Akkadians. Sumerian is a linguistic isolate, and the origin of the Sumerians is an unresolved mystery to this day. The end of the Sumerian cultural hegemony occurred in part due to the depredations of the Gutians, people from the hills of what is today Kurdistan, and rivalry with the people of Elam, from modern day Khuzistan.
The linguistic affinities of the Gutians are unknown, while the Elamites, like the Sumerians, seem to have spoken a linguistic isolate.

Much of this ignorance has to do with the importance of literacy in history. What we know about Elam is often through a Mesopotamian lens. The people of Sumer and Akkad, and later Babylonia and Assyria, saw Elam as the great enemy, the Persia to their Rome. The Gutians were a coalition of tribes from the mountainous areas to the east of Mesopotamia, and so had no real indigenous literate tradition. They do not even seem to have a distinctive enough archaeological tradition to trace their migrations.

Without text and material where does that leave us? Obviously we have a new method: ancient DNA. With this method one can infer demographic change by looking at patterns of genetic variation. The genetic relationship of various peoples who are “mysterious” to us today with modern populations will give us great insight. I predict that when the first results come back from Elamite Iran there will be a strong affinity to peoples in southern Pakistan, especially the Baloch and Brahui, as well as connections to India more broadly, above and beyond the expected local continuity.

Last week Science published a new paper on ancient Iranian genomes, from a period thousands of years before what I discussed above, Early Neolithic genomes from the eastern Fertile Crescent. It’s open access, so you can read it yourself, and I encourage you to do so.

What makes this paper different from what has come before? Two things. The first is minor: better sampling. In particular, they have better regional sampling. For example, Iranian Zoroastrians (the link has plink format files). Second, and more important, they have at least one sample at 10x or more coverage. This means they can use haplotype based methods and make better calls on genotypes. It’s much more extensive in the supplements, but the authors discuss the functional characteristics of these populations more than in the earlier papers because of access to higher quality whole genome data. You need to be more confident at a specific locus when inferring function from that locus, than you need to be across the whole genome.
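To see why depth matters for genotype calls, consider a site where the individual is truly heterozygous. The sketch below assumes each read samples one of the two alleles with equal probability and ignores sequencing error, DNA damage, and mapping bias, all of which matter for real ancient samples. The function name is mine.

```python
def prob_het_looks_homozygous(depth):
    """Chance that every read at a heterozygous site carries the same allele."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Either all reads show allele A or all show allele B.
    return 2 * 0.5 ** depth

for depth in (1, 2, 5, 10):
    print(f"{depth:>2}x coverage: {prob_het_looks_homozygous(depth):.4f}")
```

Under these simplified assumptions a heterozygous site always looks homozygous at 1x, while at 10x the chance drops to about 0.2%. That is the difference between guessing and calling a diploid genotype at a specific locus.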

The phylogenetic portion reinforces what the earlier work argues: there were two great tribes of founding farmers who brought agriculture to North Africa, and Western & Southern Eurasia. Though the “cradles of civilization” were often in riverine landscapes, the agricultural revolution began in the Near East in the uplands, which would later become backwaters. Only here could primitive dryland agriculture take root in the desiccated landscape. This was the “Breaking of the Gates”.

There were, it seems, two major phases. The first phase was expansionary. The western farmers pushed outward to Europe and North Africa. The eastern farmers pushed toward South Asia and Central Asia. But look at the position of Iranians in the PCA, and the affinities within Iran. Modern Iranians are much more west shifted than you might expect from perfect continuity. Additionally, the haplotype affinities of populations to western vs. eastern farmers show that Iranians today have much more affinity to western farmers than do Iranian-speaking people from Pakistan, especially the Baloch and Makrani in the southwest of the country. This is because there was a second phase: the great scrambling, when reflux from the west into Iran, and vice versa, erased the great division.

In the initial expansionary phase a stylized model was probably as good as any model. The world was dominated by hunter-gatherers, whose social-political ability to scale and organize was minimal. The farming populations probably began to organize chiefdoms rather early, and the spread of their lifestyle was to some extent at the tip of the spear. The hunter-gatherers fled, or were rapidly assimilated as subordinates, losing their cultural distinctiveness. But the next stage after the chiefdoms was more complex arrangements, which might transcend tribal loyalties, especially when one’s tribe spanned a continent.

A close look at the map shows that the Baloch and Sardinians have more affinity with these two ancient peoples than many of the groups which today occupy the Middle East. Why? Mostly because they are distinctive in being less subject to the reflux migrations in the wake of the Neolithic. And, if you look at Europe and South Asia, you can see that Indo-Europeans also left a stamp on these areas, by mediating gene flow from these tribes into areas where the other tradition had been dominant. Northern Europe is less biased toward western farmers than Southern Europe. Within South Asia, the most skewed bias toward eastern farmers are the Baloch, who happen to co-inhabit territory with a non-Indo-European speaking population, the Brahui. These Dravidian speakers are basically indistinguishable from the Baloch. Among the other groups, the Vishwabrahmin are biased toward eastern farmers. In contrast, the Tiwari, North Indian Brahmins, are more balanced. I believe this is because the Indo-Aryans brought western farmer ancestry with them from the steppe.

Rather than talking about the phylogenetic aspects anymore, I want to move to the functional considerations. It seems that the ancient eastern farmers did not have many of the adaptations that we associate with farmers. This is entirely logical. Much of our genetic character is the product of cultural changes, rather than cultural changes being the product of our genetic character. The null hypothesis should be that hunter-gatherers who had just taken to farming are basically like hunter-gatherers who adopted a new lifestyle.

But there are some intriguing elements of the pigmentation genetics, a topic I know a fair amount about. The results from this paper show that the derived variant of SLC24A5, the largest effect pigmentation allele we know of, was segregating in these farmers. This is not surprising. It was segregating at high frequency in western farmers as well, among Caucasian hunter-gatherers, and even among hunter-gatherers from Mesolithic Sweden. It was, though, not so much found among Western European hunter-gatherers. The derived variant is totally fixed in Europe today. Curiously, the authors mention that SLC45A2, another skin-lightening derived allele, which is much more concentrated in Europe, has been found segregating in Neolithic Aegeans. So it may be that the two major skin-lightening alleles were introduced by western and eastern farmers. Finally, the allele known to produce blue eyes in Europeans, found in high frequencies in Mesolithic European hunter-gatherers, was also found segregating in WC1. WC1 is the highest quality genome in their ancient data, so this seems a likely inference.

What this tells us, I think, is that skin-lightening alleles have been segregating at appreciable frequencies for a long time. They have a deep history. Periodically, a particular haplotype gets targeted for selection, and a sweep occurs. Personally, I am more and more leaning to the hypothesis that a diversity of functions and characteristics are the targets of this selection, with the phenotype often being a side effect. What is even more intriguing to me is that peoples as distinct as Sardinians and Baloch don’t actually look that different physically. The great reflux even affected them, and with it perhaps came alleles which were selected upon and produced a relatively uniform phenotype from the Atlantic to the Indus?

Much of the prior understanding of history and prehistory has been driven by a banal and workaday conception of progress and change. Proponents of demic diffusion imagined stateless villagers pushing outward. Diffusionists assumed that techniques and material would flow along trade routes. There were no great disruptions, rather, there were evolutions and continuities.

That is not what ancient DNA tells us. In another context I’ve mentioned that ISIS is appealing to some because of its “heroic” narrative. Similarly, the origins of modern humanity may be much more heroic than we’d have thought. We are the descendants of humans who crossed into Australia. The descendants of humans who finally made it to the New World. Would it be any surprise that nearer prehistory was as ground-breaking and tumultuous?

• Category: History • Tags: Prehistory

I’ve been in upstate New York, working this week. So busy. Should take a break to crank out some blog posts. In particular, probably “How to read an admixture estimate”, since even after so many years readers are confused….

While I’ve been holed away, Pokemon Go happened. What?

Great Ordeal, the third book in R. Scott Bakker’s Aspect Emperor series, is going to come out in two days. #Excited

Though who knows when I will have time to read it?

• Category: Miscellaneous

Screenshot 2016-07-04 23.33.00

One can appreciate a work of art on two levels. When one beholds the sculpted renderings of the Classical Greeks, across the distance of more than 2,000 years we can feel viscerally that they have touched something beautiful, and made it stone. To reduce this to biology, our perception maps onto deep grooves in our evolutionary landscape of aesthetic judgments. As a savanna ape the darkness of the forest haunts us with its beauty and majesty; but we are the children of the meadows and edges of the Paleolithic pastoral. Similarly, on some level we acknowledge physical beauty when we see it, before we even think it.*

Another level of appreciation is narrower, and that is one where you have awareness of the ingenuity of technique, the deep virtuosity and fluency of execution. This aspect of understanding aesthetics is naturally delimited to those with equivalent skills, or whose skills aspire toward the plane of the masters.

Reading Iosif Lazaridis’ The genetic structure of the world’s first farmers you can evaluate on both levels. The results are broadly accessible, but the depth of the analysis is clear to anyone who has ever attempted something analogous. These papers coming out of David Reich’s lab have a certain template, but they are definitely not paint-by-numbers. For those who are interested in technical details, you have to read the supplements.

Ten years ago the insights gleaned from this preprint were only glimmers in the eyes of assorted researchers and “genome bloggers.” The problem now is one of going from the raw result, back to the dynamics which produced the result. A deep problem of inference.

To get to where we are now, and the embarrassment of copious conclusions, researchers needed three things:

1) Lots of genetic data, and methods designed to leverage that data (basically, genomics, and the statistical genetics geared toward analyzing large data sets).

2) Genetic data from time points in the human past, and not just present.

3) The technological infrastructure necessary to handle the data (from computational power to the arcane arts of the ancient DNA lab).

What have we learned? Ancient DNA has revealed that genetic variation in the human past has been characterized by very strong discontinuities, both over time and space. What do I mean by this?

As a stylized fact it has been fashionable in some quarters to describe human variation as being overwhelmingly clinal. That is, a continuous change in gene frequencies as a function of space. One associated fact has been the expectation that gene frequencies will change over time in a similar steady and regular fashion.

Obviously there is some truth to the clinal variation in our species. If, for example, you walked from France to the Punjab, it would be difficult to establish a hard-and-fast line where there was a definite discontinuity in genes. But there could be candidates. In particular, in Central Asia there would be regions where you would find rather high frequencies of alleles more typical of East Asia, while in Afghanistan the genetic signatures of non-West Eurasian peoples of a different sort, typically found in South Asia, would start to crop up.

But these two points of discontinuity illustrate the general principle that discontinuity emerges from specific historical-demographic events. In the case of the rather high fraction of East Asian associated genes in Central Asia, this is almost certainly a product of the Turkic expansion, which occurred in fits and starts over the ~1,000 years between 500 and 1500. In South Asia, we now suspect that there was a relatively recent intrusion of West Eurasian populations, and likely some reciprocal gene flow between indigenous groups and the incomers.

These two instances point out that major disruptions in gene flow are likely correlated with major cultural disruptions. The Turkic expansion occurred in historical time, so we can inspect it and note that the decline of Iranian populations within Central Asia began during the late Sassanian period, but came to near completion with the major shocks of the Mongol period 700 years later. These were events of geopolitical note.

This is important to consider, because the older models which posit clinal variation assume that genetic change occurs through a ‘mass action’ process, whereby small family or village groups enter into a phase of demographic expansion, and literally outbreed others. This was to some extent the model implicit in the ‘demic diffusion’ theories of the expansion of the Neolithic lifestyle into Europe from the Near East, pioneered by Colin Renfrew, and extended by L. L. Cavalli-Sforza and colleagues.

In a classical economic framework one can simply assume that those who practice the farming lifestyle will be in a state of land surplus on the frontier. Therefore, they will have large families, and keep expanding their range. In such a fashion individual decisions of Homo economicus can drive cultural and demographic change over large regions in relatively short time periods.


The decisions of the many in an uncoordinated fashion can lead to the ordered patterns we see around us, with clines of variation, as well as signals of genetic expansion. As L. L. Cavalli-Sforza noted the argument here is not that most of the ancestry of modern Europeans is exogenous to the continent when using Pleistocene groups as the indigenous reference, but that the demographic wave of advance is responsible for agriculture, not cultural emulation. Even with this wave of advance model, which has been widely explored in population genetics, assimilation of native groups on the frontier means that most of the ancestry on the frontier by the end of the process could be “indigenous.”

Cavalli-Sforza’s assertions came in the wake of a series of results in the early 2000s which were interpreted to suggest that most of the ancestry of modern Europeans derives from populations resident during the Pleistocene. These results were taken to suggest that agriculture must have then spread by cultural diffusion, not demographic expansion. All Cavalli-Sforza was pointing out was that the model he was supporting was about a dynamic process, not some specific value of haplotype counting by region.

Ultimately this rearguard apologia was not necessary. It turns out that a majority of the ancestry of modern Europeans is likely exogenous to the continent over the last ~10,000 years. The earlier results which were used to support the converse were right in their results, but were misinterpreted. Additionally, I also think that the model outlined by Cavalli-Sforza and his colleagues is in some ways too elegant and stylized to be useful. If you read The War Before Civilization there are plenty of archaeological hints that there were massive inter-group conflicts during prehistory, and the arrival of farmers to the continent probably exhibited some coordination and collective action beyond the village. The 3,200 year old battle on the Baltic is probably the continuation of a long tradition in Europe, and the world, of collective action and conflict.

This is a “problem” because inter-group conflicts on a geopolitical scale are not as tractable in terms of a general model as a “wave of advance” demographic scenario where endogenous growth parameters rule supreme. Rather, demographic patterns are not due to continuous predictable dynamics, but the intersection of such parameters and contingent events. History has no guarantees, though its wheels tend toward certain favored grooves.

Twenty years ago L. L. Cavalli-Sforza wrote a book geared toward the lay audience, Great Human Diasporas. The culmination of a lifetime’s work, it surveyed what we then knew about human genetic variation with classical markers derived from contemporary populations. The tools we have today are far more precise, with hundreds of thousands of markers rather than hundreds, and DNA samples from populations thousands or tens of thousands of years in the past. Instead of simply inferring the tree of life, researchers are now constructing a lattice of relationships derived not only from the nodes visible today, but also positions within the lattice from the deep past.

The evidence which is coming back is that pre-modern populations exhibited a great deal of genetic differentiation over even small distances, and that differentiation could persist for thousands of years. Between group proportions of variation on the order of 10% of the total variance, what you see between Europeans and Han Chinese, were not atypical for nearby peoples, even though one migrant between them per generation would have eliminated that difference in short order. This equilibrium of difference would eventually get disrupted by radical demographic turnover, as local populations went extinct or were absorbed by newcomers, who reshaped whole landscapes through their expansions. In other words, if Cavalli-Sforza were to write a book today I believe it would be titled “Great Human Disruptions and their Diasporas.”

And this isn’t just about agriculture. Ancient DNA from Pleistocene Europe indicates turnover there too. There may be meta-population dynamics which are at work on the edge of the modern human range in Eurasia. As local populations go extinct, new populations expand to occupy their territory. The ancient human landscape may have been relatively sparsely populated, diminishing opportunities for gene flow.

But this is likely not the whole story. Inter-group conflict certainly played a role, and ancient DNA has uncovered evidence of long periods of genetic distinctiveness between neighboring populations. This suggests cultural practices serving as a barrier to gene flow. We do have one case where this occurs today: India. The caste system is such that continent-wide genetic distances can be found within local populations in the same region, which have coexisted for thousands of years.

So what are the results of the Lazaridis paper? The figure at the top gives you a PCA-centric view. Basically, all West Eurasian populations today can be modeled to a first approximation as a mixture of four ancestral groups which flourished on the order of ~10,000 years ago. If modern genetic variation can be conceived of as an algebra, then for West Eurasia these are the four variables with differential weights you need to produce any reasonable output.

The four are:

1) Western hunter-gatherers (WHG), the indigenous populations of Europe and surrounding areas.

2) Eastern hunter-gatherers (EHG), the indigenous populations of the northeastern fringe of Europe.

3) Western farmers, the ancestors of Early European Farmers (EEF), with roots in the zone from the southern Levant north into Anatolia.

4) Eastern farmers, who are rooted in populations which flourished in the Zagros mountains of western Iran (Central Asian Farmers, CAF).
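To make the “algebra” concrete, here is a minimal sketch of what it means to model a population as a weighted mixture of four sources. Everything in it is an assumption for illustration: the allele frequencies are simulated, the weights are made up, and the function name estimate_mixture_weights is hypothetical. The paper itself relies on more sophisticated statistics; plain least squares is swapped in here only to show the idea.

```python
import numpy as np


def estimate_mixture_weights(source_freqs, target_freqs):
    """Least-squares ancestry weights for one target population.

    source_freqs: array of shape (n_sources, n_markers)
    target_freqs: array of shape (n_markers,)
    """
    A = np.asarray(source_freqs, dtype=float).T  # markers x sources
    b = np.asarray(target_freqs, dtype=float)
    weights, *_ = np.linalg.lstsq(A, b, rcond=None)
    weights = np.clip(weights, 0.0, None)  # ancestry cannot be negative
    return weights / weights.sum()         # proportions sum to one


rng = np.random.default_rng(0)
labels = ["WHG", "EHG", "EEF", "CAF"]

# Simulated allele frequencies for the four sources, NOT real data.
sources = rng.uniform(0.05, 0.95, size=(4, 5000))

# Hypothetical weights, chosen only for illustration.
true_weights = np.array([0.15, 0.25, 0.40, 0.20])
target = true_weights @ sources  # the "modern" population's frequencies

estimated = estimate_mixture_weights(sources, target)
for name, w in zip(labels, estimated):
    print(f"{name}: {w:.3f}")
```

In this noise-free toy the weights are recovered essentially exactly. With real data, drift since the admixture event and sampling error make the problem much harder.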

These four themselves exhibit some compound ancestry. On the order of half the ancestry of EEF and CAF was basal Eurasian (BEu), a population which seems to have diverged from other non-Sub-Saharan Africans more than 50,000 years ago, before Neanderthal admixture. To be clear, BEu seems to be an outgroup to populations as diverse as Pleistocene European hunter-gatherers, Australian indigenous groups, and Andaman Islanders. The other half of EEF and CAF ancestry derives from two distinct sources, which explain their different positions on the PCA plot. The EEF have a WHG-like admixture. That is, some of their ancestors are nested within the broader clade which includes European hunter-gatherers, and far more distantly the Ancestral North Eurasians (ANE). Work on Pleistocene genomics indicates that there was a major increase in affinity between European hunter-gatherers and Near Easterners ~15,000 years before the present, suggesting that there was major gene flow uniting these two regions. The Near Eastern element of this movement probably fused with BEu.

Second, the CAF population, which is known from far fewer samples, seems to have shared a lot of ancestry with EHG, so the two must have shared common ancestry from related groups. It seems that the most likely source of this was ANE. Due to the genetic distance between ANE and WHG, the Fst between EEF and CAF was on the order of ~0.10, similar to that between Chinese and Europeans today. These two groups seem to have stumbled upon agriculture very near to each other at similar times.
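For readers who want to see what an Fst of ~0.10 is measuring, here is a minimal sketch using the textbook heterozygosity definition, Fst = (Ht - Hs) / Ht, for two populations at biallelic markers. The function name and the toy frequencies are assumptions for illustration. Published estimates use estimators that correct for sample size, which this sketch ignores.

```python
import numpy as np


def fst_two_populations(freqs_a, freqs_b):
    """Fst from allele frequencies in two populations (ratio of sums).

    Assumes biallelic markers, equal weighting of the two populations,
    and at least one marker that is polymorphic overall.
    """
    p_a = np.asarray(freqs_a, dtype=float)
    p_b = np.asarray(freqs_b, dtype=float)
    p_bar = (p_a + p_b) / 2.0

    # Expected heterozygosity within each population, averaged.
    h_within = (2 * p_a * (1 - p_a) + 2 * p_b * (1 - p_b)) / 2.0
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_bar * (1 - p_bar)

    return float((h_total.sum() - h_within.sum()) / h_total.sum())


# Toy frequencies, not real data.
print(fst_two_populations([0.5], [0.5]))  # identical frequencies
print(fst_two_populations([0.3], [0.7]))  # moderately diverged marker
print(fst_two_populations([0.0], [1.0]))  # fixed difference
```

The value is the share of total heterozygosity that is due to frequency differences between the groups, which is why “10% of the total variance” and “Fst of 0.10” are the same statement.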

Were they independent events? I suspect that they weren’t. I’m not implying here cultural diffusion. There is evidence of independent domestication of landraces in the Zagros. Rather, these two populations were part of a broader network of trade connections within a similar ecological landscape. It was not coincidental that both stumbled upon agriculture. Likely there was diffusion between the two of similar cultural precursors to agriculture. Their location in such proximity cannot be coincidence, though the details are to be worked out.

Interestingly once these two populations stumbled onto agriculture they expanded in opposite directions. Why? Probably because they could. That is, both of them had high population densities and social complexity, and rather than impinging upon each other’s territories they expanded into “empty” landscape. Regions inhabited by hunter-gatherers who were easier to eliminate or assimilate. The spread of Cardial and LBK people in Europe was so fast that it is almost certain that they were all one cultural unit initially. Something similar probably applies to the CAF groups which expanded east into South Asia, and north to the steppe.

Another intriguing result in this paper is that WHG themselves seem to have had admixture from eastern populations. More precisely, the Mesolithic hunter-gatherers used in earlier analyses as “pure” exemplars of WHG turn out not to be, but exhibit some admixture with other groups. This is probably why the ANE proportion of EHG is much higher in this paper. An older sample from Bichon in Switzerland lacks the eastern admixture, and so serves as a better reference for WHG. Though not definitive, it now looks as if ANE admixture into East Eurasians (e.g., Han Chinese) has resulted in some affinity between these populations and Europeans today, going back to, but not limited to, the WHG. This is no surprise. The emergence of agriculture is not singularly new; cultural innovation seems to trigger demographic disruptions, no matter the time or place.

Though the centerpiece of this preprint is the fact that four populations are sufficient to explain the genetic variation, and demographic history, of West Eurasian populations, I think perhaps a more interesting element is the role of ANE and BEu. Neither of these groups exist in “pure” form today. We don’t know who BEu were. We don’t know where they came from. To me it is suspicious that BEu ancestry exists in about the same fractions in both EEF (at least their precursors in the Middle East) and in CAF. It does not seem that the two BEu components were very differentiated. To me that indicates that BEu may have expanded relatively recently. I also believe that BEu may have a role mediating “back to Africa” gene flow. As BEu lacks Neanderthal admixture that would explain the very low levels in most of the continent, and yet the presence of what now look to be Eurasian origin E Y chromosomal haplotypes.

As for the ANE, their geographic coverage is incredible, from Western Europe all the way to the New World. It seems that as an unadmixed group they persisted into the Holocene, but in numbers they were always stretched thin. Through their amalgamation into agriculturalists they’ve persisted, but likely many of their Paleo-Siberian folkways diminished. I do believe though the R1 haplogroups on the Y chromosome likely derive from them, as it is a sister to the Amerindian Q.

There’s a lot in the paper to chew on, especially the supplements. For example, the percentages of “steppe” ancestry are non-trivial throughout South Asia. What to make of this? I think I’ll hold off until ancient DNA comes in, as it will in the next 6 months.

But, I do think ancient DNA and the model of disruptions and discontinuity supports the proposition that punctuated equilibrium as a thesis has much more validity for cultural evolution than it does for biology. Cultures exhibit inertia and a tendency toward conformity. Learning new things is difficult. Very special conditions must have existed for agriculture to “take” in the Near East, as hunter-gatherers shifted from facultative cultivation to obligate modes of production of crops. Once these cultures became farming cultures, it wasn’t easy for neighbors to adopt them, as cultural packages often come as a whole, with many contingent parts. The advantage of agriculture is that it extracts more yield from the ground, and population densities go up. Higher population densities mean more resources in inter-group conflict, if it comes to that, and the need to expand to continue to race beyond the Malthusian limit. Once the space is occupied, a new equilibrium is reached.

And I want to reiterate, this model does not apply to just agriculture. A tweet from Spencer Wells:

New Guinea is a horticultural society, and the highlands are very densely populated. The high Fst is in line with what you see in early Holocene Western Eurasia, or in India today. But observe that the genetic differentiation is from the past 10,000 years, not the past 50,000. 10,000 years ago is when horticulture began. In all likelihood one population in the highlands began to practice this, and expanded demographically, eliminating or absorbing its neighbors. But the landscape of New Guinea sets tight limits to the range of possibilities, as the highlands of the island are a very isolated ecosystem. The genetic differentiation began once the expansion phase ceased, and groups began to struggle for existence at the Malthusian limit.

One of the insights of Lazaridis et al.’s paper is that this didn’t happen in Eurasia. The differences between EEF and CAF diminished, as the Near East saw reciprocal gene flow during the Bronze Age. The difference was not in agriculture, but post-agricultural social complexity, which allowed for the emergence of what Peter Turchin would term “meta-ethnic” identities, and complex institutions which transcend locality. In the new equilibrium state the Fst did not begin to go up as populations jostled for resources, as innovation began to gently push the production frontier outward, and foster connections of material (e.g., trade) and ideas (e.g., religion).

The whole story is not written in stone yet. The next few years are going to be interesting. China is the next frontier, and ancient DNA will open up its history soon enough. But it’s an exciting time to be witnessing the unveiling of prehistory before our eyes.

* There is a dimension of aesthetic judgement which is culturally conditional, and another which is not. I speak of the latter here.

• Category: Science • Tags: Genomics

The Great Ordeal, the third book in R. Scott Bakker’s Aspect Emperor series, is going to come out in nine days. Bakker is apparently working on revisions to the fourth book, The Unholy Consult. So this series will complete (apparently Bakker’s original vision was for three related sequential series, so this would be the second of the three). There has been a large gap in time between the second and third book, but it seems mostly due to issues relating to publishing, and not the writer.

You may be interested in his blog, which has assorted links and comments. Also, there is a history of Earwa online which you might find interesting. The artwork is very evocative.

Bangladesh Attack Is New Evidence That ISIS Has Shifted Its Focus Beyond the Mideast. Apparently the attackers were educated at an English immersion school of some sort. Spoken Bengali, like British English, is very sharply differentiated by class. The menial workers at the bakery were apparently surprised at the obviously upper middle class speech patterns of the terrorists, as well as the fact that some of them spoke casually in English.

This is not surprising. Terrorist ideologues are often from privileged backgrounds. Marc Sageman has presented the most thorough ethnographies that I know of, and they’re rather clear. Generally middle to upper class, often with a technical background. Also a disproportionate number of converts. Many people with cosmopolitan backgrounds, in terms of having exposure to other cultures and traveling.

As for why radicalism is cropping up in Bangladesh, it’s because they know that they’re in the middle of a culture war they might lose. There are atheists and gays in most countries, but in Bangladesh they are starting to be public. That’s a sign that the norms of conservative Islam are breaking down. Additionally, economic development and NGO influence are also integrating the nation into a web of international commerce.


Michael Cimino, Director of ‘The Deer Hunter’ and ‘Heaven’s Gate,’ Dies at 77.

Growing Pains for Field of Epigenetics as Some Call for Overhaul. The correction.

This robot-powered burger joint could put fast food workers out of a job.

3 reasons the American Revolution was a mistake.

Democratizing DNA Fingerprinting.

Multi-layered population structure in Island Southeast Asians.

• Category: Miscellaneous • Tags: Open Thread

Screenshot 2016-07-02 22.20.21
Deep Sequencing of 10,000 Human Genomes:

We report on the sequencing of 10,545 human genomes at 30-40x coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single nucleotide variants in the coding and non-coding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries in average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

The 30x means that they’re hitting each base an average of 30 times, so they can be very confident of their call. This matters a lot for rare variants, as might be useful when it comes to idiopathic diseases. The 10,000 number is obviously to take it a step beyond the “1,000” genomes, which went well above 1,000 genomes in any case. But the coverage means that these are very confident calls for any given individual.
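A quick back-of-the-envelope sketch shows why depth buys confidence. Assume, as a deliberate simplification, that each read at a heterozygous site samples one of the two alleles with probability 0.5, that there is no sequencing error, and that the depth is exactly the stated number rather than an average. The function name is hypothetical.

```python
def prob_het_looks_homozygous(depth):
    """Chance that every read at a heterozygous site shows the same allele.

    Simplified model: each read independently samples one of the two
    alleles with probability 0.5, with no sequencing error.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # All reads from allele A, or all reads from allele B.
    return 2 * 0.5 ** depth


for depth in (2, 5, 10, 30):
    print(f"{depth:>2}x: {prob_het_looks_homozygous(depth):.2e}")
```

Under these assumptions a heterozygote is missed half the time at 2x, about twice in a thousand at 10x, and essentially never at 30x. Real variant callers also have to deal with errors and uneven depth, so the true picture is less rosy than this toy suggests, but the direction is the same.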

A distribution of variants shows that their panel of unrelated individuals (~8,000) yields ~150,000,000 single nucleotide variants (out of a genome of 3,000,000,000 bases). You see that half of these 150 million are found at counts of one across their whole sample set. In contrast, you have ~5 million variants present at allele frequencies of about 5% or more, and a bit more than ~10 million variants at 1% or more, and ~20 million variants at 0.1% or more. Remember that the 1000 Genomes paper reported that each individual within their data set has ~5 million variants in comparison to the human reference genome.

I reiterate these dull numbers to give people a sense of what it means to have 100,000 to 1 million marker SNP-chips in humans. It is true that without imputation these chips aren’t capturing a lot of functional variants (though they’re typically designed to target a lot of the most important disease markers in particular). But when it comes to capturing the shape of genetic variation they’re a very good sampling indeed. Consider, for example, the proportion and number of voters who are part of the sample for exit polls or pre-election surveys. For standard PCA or genotypic model based clustering (e.g., ADMIXTURE/STRUCTURE) anything more than 1 million markers is pretty useless from what I’ve seen, and the 100,000 to 500,000 interval is sufficient for pretty much everything. And haplotype based methods that generally use phasing, like fineSTRUCTURE, seem to do fine in the ~250,000 marker range.
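The polling analogy can be checked with a toy simulation. The sketch below assumes two populations with simulated (not real) allele frequencies, and compares a simple summary of their divergence, the mean squared frequency difference, computed on all markers versus a random 10% of them. The marker counts and noise level are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_markers = 200_000

# Simulated allele frequencies for two related populations, NOT real data.
pop_a = rng.uniform(0.2, 0.8, n_markers)
pop_b = np.clip(pop_a + rng.normal(0.0, 0.05, n_markers), 0.0, 1.0)


def mean_squared_difference(freqs_a, freqs_b):
    """Average squared allele-frequency difference across markers."""
    return float(np.mean((freqs_a - freqs_b) ** 2))


full = mean_squared_difference(pop_a, pop_b)

# A random "SNP-chip" containing one marker in ten.
chip = rng.choice(n_markers, size=20_000, replace=False)
subset = mean_squared_difference(pop_a[chip], pop_b[chip])

relative_error = abs(subset - full) / full
print(f"all markers:   {full:.6f}")
print(f"10% of them:   {subset:.6f}")
print(f"relative diff: {relative_error:.2%}")
```

The subset lands within a few percent of the full-data value, which is the sense in which a well-chosen sample captures the shape of variation. Real chips are not random samples of the genome, so this shows the sampling logic and says nothing about ascertainment.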

• Category: Science • Tags: Genetics

There have been some media “explainers” about how genetics can’t speak to Elizabeth Warren’s Native American heritage. This is a complicated issue, and not all the assertions in the media pieces I’ve seen are wrong, but a lot of the details are very confused or wrong. In sum, this is very bad journalism from people who don’t know where to start, and had no idea they were relaying confusions or falsehoods. (I’m being generous here in assuming they didn’t know that they were repeating falsehoods.)

The point of this post isn’t to get too involved in the political points. Or even to argue that Elizabeth Warren should take a genetic test (I don’t think she should unless she wants to for other reasons besides the political sideshow, but that’s my personal opinion). Rather, I think that genetics is being distorted for the sake of political points and demerits. That is not optimal. Normally I don’t do much “fisking” type posts, but this is necessary at this point.

Let’s start with The Washington Post, Sorry, Scott Brown: A DNA test can’t tell us if Elizabeth Warren has Native American roots.

First, the title is false. If a few percent of Elizabeth Warren’s ancestry was derived from people whose ancestors lived in the New World before 1492, then it would be visible on a PCA with Europeans and Native Americans. She’d be shifted a bit toward Native Americans.
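Here is a rough simulation of what “shifted a bit” means. It is a sketch under loud assumptions: the two reference populations have simulated allele frequencies, markers are treated as independent (real genomes are inherited in linked segments), and instead of a full PCA the individual is simply projected onto the axis running between the two reference populations. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_markers = 300_000

# Simulated reference allele frequencies, NOT real data.
p_eur = rng.uniform(0.1, 0.9, n_markers)
p_nat = np.clip(p_eur + rng.normal(0.0, 0.2, n_markers), 0.01, 0.99)


def simulate_genotypes(native_fraction):
    """Draw 0/1/2 genotypes for a person with the given ancestry share."""
    p_mix = (1 - native_fraction) * p_eur + native_fraction * p_nat
    return rng.binomial(2, p_mix)


def position_on_axis(genotypes):
    """0 means at the European reference, 1 at the Native American one."""
    axis = p_nat - p_eur
    return float(((genotypes / 2.0 - p_eur) @ axis) / (axis @ axis))


admixed = position_on_axis(simulate_genotypes(1 / 32))
control = position_on_axis(simulate_genotypes(0.0))
print(f"1/32 ancestry: {admixed:.3f}")
print(f"no admixture:  {control:.3f}")
```

With hundreds of thousands of markers the noise in the estimate is a fraction of a percent, so a shift of about 3% stands out clearly from zero in this toy setup.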

Second, the journalist at The Washington Post interviews someone with serious credentials to serve as a primary source:

Nanibaa’ Garrison is a bioethicist and assistant professor of pediatrics at Seattle Children’s Hospital. A Native American, she earned a PhD in the Department of Genetics at Stanford, with a dissertation focused on ancestry.

She certainly has done genetic research, but I’m not sure that she can speak to modern genomic inference, which has advanced a lot in the past ten years.


That’s because determinations of ancestry are based on “ancestry-informative markers” — genetic flags that offer probabilities of the likelihood of certain ancestries. Most of those markers, AIMs, are “based on global populations that are outside of the U.S.,” she said, “primarily people of European descent, people of Asian descent and people of African descent.”

Those three populations are not enough to determine how much Native American ancestry a person has.

AIMs were popular in the 2000s. Basically they are usually less than 100 markers with very high between-population differences in frequency between your populations of interest. But today most people would not use AIMs unless cost is a major issue (e.g., I’ve seen that AIMs are still used sometimes in work from developing nations because they can’t afford SNP-chips). So all the talk about AIMs is totally irrelevant to the question at hand.

Today you can download data sets with hundreds of thousands, and in the case of the 1000 Genomes data, millions of markers. These are still ascertained for polymorphisms; variants. But they’re really not AIMs in the classical sense as they are not targeted to a narrow set of populations, but look for variation across most human groups.

Also, panels are not restricted to three populations. You can get plenty of indigenous American samples from various public panels, as well as looking in the 1000 Genomes Peruvian data set. The focus on three populations is again an artifact of 2005, probably due to the HapMap era (CEU, YRI, CHB+JPT, if you know what I mean).


Warren’s understanding of her heritage was that she was part Cherokee, perhaps as little as 1/32nd based on outside sleuthing. (Brown dismissed that claim specifically on this week’s call.) The odds of identifying a particular tribal identity are essentially zero, according to Garrison, but such a small percentage of Native American blood would also make identification much harder, even if the necessary AIMs existed.

Again, AIMs are irrelevant. This is like explaining that Netflix won’t work because of 56K modem download speeds. Most people don’t use 56K modems anymore. The 1/32 fraction may be an issue, but not because ~3% is not detectable. It is. A few years ago I stumbled onto the fact that geneticist Dan MacArthur is ~2% South Asian. He checked, and his brother is in the same range, while his father is about double. It turns out that he had an ancestor who was an officer in the British army in India….

The bigger problem here is that as you proceed back generations you are less and less likely to have genetic segments from any given ancestor. So if you had an ancestor 200 years ago who was Native American, even if they were 100% Native American, you may not have any genetic segments from that individual.

So, the article says:

Even a test that was fine-tuned to pick out Native American identity might not find any on Warren’s genes, because the requisite markers simply may not have made the cut over multiple generations.

This is correct. But, you probably do have segments from someone five generations back. There’s about 5-10% chance that five generations back you wouldn’t inherit any segments from an ancestor at that remove. The expert consulted by The Washington Post states:

“It would be impossible to go back that far,” Garrison said. “One-32nd is low enough that, even if she does have Native American ancestry, just by chance the genes that show up on these AIM panels might not necessarily be passed down, even if she might have other genetic variants that are highly prevalent among Native Americans. It’s all just by chance, what you inherit from your parents.”

As I said, AIMs are irrelevant. Today you would use dense SNP-chip panels or even whole genome sequencing. But even with AIMs if you had 100 well distributed throughout the genome it would be quite possible to detect divergent ancestry from the rest of the genome. It is not “impossible” as asserted. The source is just incorrect.


“There’s a confidence interval that’s associated with [the results],” Garrison said. “That confidence interval can be very wide, especially when you’re talking about such low ancestral contribution.” So maybe Warren gets the results back and it says that she’s Native American — but that it can only be determined with 20 percent confidence. Scott Brown might not be convinced.

This is only an issue with AIMs. You can get results of 3% back pretty robustly. And it would show up on PCA too.

Then there are weird tangents, which I think exist to make the author look like they’ve “done their research” and reassure the lay audience:

Huntington disease, for example, can be spotted in DNA — but the test wouldn’t tell you when the disease might develop, which doesn’t do you much good if you’re worried about a four-year window. “There are so many different environmental factors or dietary factors and other health behaviors that would feed into whether or not a disease might develop and what time in their life it would develop,” Garrison said, making that sort of prediction impossible. (For now, at least.)

I’m not a medical geneticist, but I think the example of Huntington’s is kind of strange to put here (perhaps because people know about it?). It’s really well genetically characterized. From the link provided in the article:

As the altered HTT gene is passed from one generation to the next, the size of the CAG trinucleotide repeat often increases in size. A larger number of repeats is usually associated with an earlier onset of signs and symptoms. This phenomenon is called anticipation. People with the adult-onset form of Huntington disease typically have 40 to 50 CAG repeats in the HTT gene, while people with the juvenile form of the disorder tend to have more than 60 CAG repeats.

Individuals who have 27 to 35 CAG repeats in the HTT gene do not develop Huntington disease, but they are at risk of having children who will develop the disorder. As the gene is passed from parent to child, the size of the CAG trinucleotide repeat may lengthen into the range associated with Huntington disease (36 repeats or more).

Warren is old enough that she is unlikely to have 60 repeats or more. But Huntington’s is one of those diseases where we have a good sense of age of onset because its triplet repeat length is inversely related to age of onset.

Next we have an article in Slate, A DNA Test Won’t Explain Elizabeth Warren’s Ancestry. First:

But here’s the thing: DNA testing cannot definitively prove whether a person is Cherokee. Or a member of any community, at least not reliably. To assume it can is to assume that there’s something inherently different in the genetic makeup of tribal members and that this thing is universal within that community. That’s not true.

Strawman. We’re always talking probabilities. Then:

The problem is that DNA snippets, or markers, are inconsistent. Sometimes they are passed on and sometimes they are not, and whether they are or aren’t is random. Sure, a large percentage of Native Americans may share certain genetic markers. But many Native Americans may lack the same marker, and many non–Native Americans may carry it by coincidence.

I don’t have a good sense of what the author is trying to get at, though I think there’s something underlying all this verbiage. The issue that allele frequencies are not (usually) disjoint across populations is well known. That’s why modern SNP-chip panels use hundreds of thousands of markers. Much of the Slate article is engaging a strawman when it comes to genetics because it acts as if we’d actually rely on a few markers, though perhaps not in the public’s perceptions of how these things work. In the latter case, the author could simply put in this sentence: “genetic tests to detect ancestry usually rely on hundreds of thousands of markers today, not only a few….”
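To make the point about marker counts concrete, here is a toy simulation in Python. It is a minimal sketch under loud assumptions, not how any DTC company or published method actually works: markers are treated as unlinked, the allele frequencies in the two source populations are invented and assumed known, and the function name `simulate_estimate` and all of its parameters are hypothetical. It estimates one person's minority ancestry fraction by regressing their genotype on the frequency difference between the two populations, and reports a rough standard error.

```python
import math
import random


def simulate_estimate(n_markers, true_fraction, seed):
    """Toy estimate of a minority ancestry fraction from unlinked markers.

    All frequencies are simulated (an assumption for illustration only).
    Returns (estimated_fraction, approximate_standard_error).
    """
    rng = random.Random(seed)
    num = 0.0    # sum of (observed - expected under reference) * freq difference
    den = 0.0    # sum of squared frequency differences
    noise = 0.0  # sum of d^2 * genotype sampling variance in the reference pop

    for _ in range(n_markers):
        # Made-up allele frequencies: the two populations overlap heavily,
        # so no single marker is diagnostic.
        p_ref = rng.uniform(0.05, 0.95)
        p_src = min(0.99, max(0.01, p_ref + rng.gauss(0.0, 0.2)))
        d = p_src - p_ref

        # Each of the person's two alleles is drawn from the mixed frequency.
        p_ind = (1 - true_fraction) * p_ref + true_fraction * p_src
        g = (rng.random() < p_ind) + (rng.random() < p_ind)  # genotype 0, 1 or 2

        num += (g / 2 - p_ref) * d
        den += d * d
        noise += d * d * p_ref * (1 - p_ref) / 2

    # Least-squares slope, plus its standard error assuming no admixture.
    return num / den, math.sqrt(noise) / den


if __name__ == "__main__":
    for m in (50, 200_000):
        est, se = simulate_estimate(m, true_fraction=0.03, seed=1)
        print(f"{m:>7} markers: estimate={est:.3f}, approx. standard error={se:.3f}")
```

With a few dozen ordinary markers the standard error in this toy model is far larger than 3%, so the estimate is noise. With a couple hundred thousand markers the same estimator can recover a 3% fraction, even though every individual marker is shared across both populations.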

This lack of specifics crops up over and over:

So when a DNA test comes back saying you are 28 percent Finnish, all it’s really saying is that of the DNA analyzed (most companies don’t analyze all of your DNA), 28 percent of it was most similar to that of a completely Finnish person. In the end, these comparisons are a fun but ultimately unreliable way to think about the possibilities of whom your ancestors might have been, rather than definitive proof of your ethnic background.

There’s a link in the piece that takes you to a 2007 piece on how DTC tests aren’t all they’re cracked up to be. 2007 is ages in genomics. So ignore that. Second, the selection of Finnish is unfortunate for the author, as Finns are actually one of the more genetically distinctive European populations out there because of a small effective population size. So, for example, one of my friends has a grandfather whose parents were from Finland. 23andMe says she is 19% Finnish. It’s simply wrong that it’s “unreliable.” With segment matching it’s quite reliable if you get a positive hit assuming you set the genetic distance threshold high enough. Also, depending on how you delimit “ethnic background” it can be quite definitive. Samples from Northern Europe never show much evidence of African ancestry. A minority of white Americans do. That’s not a coincidence.

As in The Washington Post, the author of the Slate piece has an authority who lays down the truth as they see it:

“Scientists who don’t know better claim that when more Natives are sampled they’ll have better data bases, i.e. more Native markers,” said Kim TallBear, professor of Native studies at the University of Alberta in a 47-tweet takedown of Brown’s remarks about Warren. “[Geneticists] think that with more markers, and greater historical-genetic resolution they’ll be able to pinpoint tribe-specific markers.” But this does not account for the fact that people are continuously moving and reproducing with other, diverse people. They mix their genetic code with other communities (as they always have, going back to the dawn of our species). If anything our DNA is getting more muddled, not more clear.

Can you read a paper like The genetic structure of the world’s first farmers, and believe this? Geneticists who work in historical population genomics are quite familiar with the ideas of migration and gene flow. More data is clarifying, just as science should be.

The first authority cited in The Washington Post did some legitimate science at some point, though a bit outside of the core area of expertise she was being consulted on, and her knowledge definitely seems out of date (the constant talk about AIMs is a good tell here). Kim TallBear’s publications are quite different….

The author of the Slate piece ends:

Another issue is limited and inconsistent data. [One company], for example, divides the world up into 26 genetic regions and uses just 115 samples to create the representative of each region—a very small sample size. And different companies place different weight on these samples, which come from burial grounds, modern isolated communities, and academically published data, like the Human Genome Diversity Project. For the consumer, this means if you don’t like your heritage results, try a different company. You’ll get a completely different breakdown.

Whether there’s any harm in people basing their identity on faulty reasoning is unclear, but the success of these commercial endeavors proves that at the very least, consumers find it kind of fun. Genetic testing is basically just a low-cost way to get a blurry picture of whom your ancestors might have been related to.

First, the author needs to issue a correction. I immediately knew the company didn’t use 115 samples; that’s just too low. Fifteen seconds of Google shows me that they have a sample size of 3,000. No idea where 115 samples comes from, and I don’t care. He’s wrong. Slate should correct this. [see addendum; I may have misunderstood or been too harsh here, but a different point then crops up....]

Second, it’s misleading to say the picture is “blurry.” No, arguably it’s overly precise, and misleads people. Many of these ancestry inferences are quite precise and robust. They don’t vary between replicates that much even though they have a stochastic parameter. But model-based clustering gives results conditioned on a model. The results themselves are then sensitive to the parameters you’re putting into the model. The different regions from different DTC companies and sample sets are these different conditions.

This isn’t mysterious or difficult to understand. If you want to separate your individuals into Africans and non-Africans all the non-Africans will go into one cluster. This is robust, precise, and highly reproducible. In fact, a non-African individual will never be clustered with Africans with normal SNP-chip densities. At least not in the thousands of iterations I’ve personally run and inspected. Similarly, as you separate populations further you’ll see reasonable and comprehensible divisions.

The problems crop up when you begin to slice and dice very close genetic groups, where there isn’t much between-population difference. This is what happens in Northern Europe, and this is where most of the DTC firms’ client base is from. So this causes problems, and often produces results that are difficult to interpret. Moderate changes in parameters then can produce divergent results because the question we’re trying to get at is really hard to resolve with the data on hand, less than one million SNPs.

There are ways to resolve this. And that has to do with more data. In particular, whole genome sequencing at high coverage can pick up very rare alleles, which are highly informative of more recent genealogical history, and so divide up even Northern Europeans in a way that is more comprehensible and historically accurate.

But really the problem isn’t with the data. We have very dense SNP-chip markers now. The problem isn’t with the methods. We have genotype and haplotype-based methods which can make pretty strong inferences, especially at the intercontinental level (e.g., a friend who is 1/4 Japanese genealogically comes out to be 24% Japanese genomically; the rest is European). The problem is that the public, including journalists, aren’t always clear what the results are telling them. Sometimes the DTC companies themselves may be at fault because of their unclear communication. And to be frank, Henry Louis Gates Jr. has, in my opinion, often sown a lot of confusion as well with his television show, informative as it may be.

Looping back to Elizabeth Warren, the biggest issue with her maybe not having any indigenous ancestry combined with a Cherokee ancestor five generations back is that the Cherokee nation in the 19th century was already genetically mixed. The great chief John Ross was 1/8th Cherokee by blood quantum. That is, 1/8th of his ancestors were present in the New World in 1492. So a simple reason for why Elizabeth Warren might be Cherokee, but without indigenous ancestry, is that her Cherokee ancestor may not have had much indigenous ancestry. It’s not because genetics can’t pick up indigenous ancestry; genetics can. It’s just that this is a case where social and cultural history and definitions are important.

To be honest this post is a bit trivial. But lots of people read The Washington Post and Slate. As I just explained above there is a simple reason why Elizabeth Warren could come out 100% European in her ancestry and be of Cherokee descent. Instead of explaining this, the media has decided to look for people who claim that genetics just can’t answer this question. In the process they garble, mislead, and repeat falsehoods (the quoted sample size is obviously wrong to anyone who is familiar with that field, but the journalist is not familiar, so it passed their smell test since they had no grounds for discernment).

This post exists only so that at least there is someone out there correcting the record.

I am a consultant for Gene By Gene and was a developer for their MyOrigins tool. This is one reason I know a lot about DTC genetic companies. But it also means I have a conflict of interest, as I think DTC genomics is useful with the proper caveats.

Addendum: A reader:

This seems, um, contrivedly obtuse. 115 samples per region times 26 regions is a total sample size of 2990, which seems reasonably close to 3000. Going the other way, 3000 / 26 is 115.4, so that will be where the claim of “115 per region” came from. There was no claim of “115 total”; the piece says that the representative of each region is constructed from 115 samples.

It’s true that 115 is an average figure and that’s not made clear in the article, but I’m not sure how comforting I should find it that the representative of “Polynesia” is actually constructed from 18 samples rather than 115.

A fair, but inadvertently ignorant, point. Sample sizes of ~20 are actually quite sufficient to generate reference populations. It partially depends on how diverse the populations you are trying to use as a reference are.

• Category: Science • Tags: Genetics

It is commonly asserted that Christianity helped maintain the continuity of Classical civilization down to the Medieval era, through the “Dark Age” of Europe after the Fall of Rome. A more extreme position is that Christianity was a necessary condition for the maintenance of this civilizational tradition. I recall once reading an alternative history short story where illiterate tribesmen visit the ruins of Rome, and muse about the consequences of Maxentius’ victory over Constantine at Milvian Bridge (this is the “point of departure”).

Obviously no one denies that the Christian Church was essential in maintaining ancient learning and ideas, whether through concrete steps such as copying in scriptoriums, or, more abstractly, by integrating into its intellectual armamentarium tools developed by the Greeks (e.g., Greek philosophy). But, there is a line of thinking that asserts that there was something profound about the Christian religion which allowed for the maintenance of civilization against the barbarian hordes. Whether it is true or not is not an argument that is winnable in this space. But, the power of ideas to shape the course of human history is more tractable.

What I would suggest is that complex human phenomena, such as Christianity, are not reducible down to abstract sets of ideas in terms of how they manifest themselves in our world. That is, Christianity is only marginally about the Athanasian Creed, or even the sacrifice made by the Son of God, from a naturalistic perspective. Rather, the religion includes a broader set of institutions and folkways which derive from the culture at large (e.g., the Roman Catholic Church is the “ghost of the Roman Empire”). Additionally, it also expresses common human intuitions about the world and social relations.

But, as a complex cultural phenomenon, Christianity is conditional on complex culture. That is, Christianity may have aided the preservation of learning in the Dark Ages, but it couldn’t be the necessary cause of this preservation because it too is an effect. The persistence of Christianity in the post-Roman world was a hallmark of those regions which maintained Romanitas to a greater extent. Christianity seems to have disappeared broadly (even if it persisted residually) from areas of the Roman Empire where there was total social collapse and transformation; the regions of Britain conquered by the Anglo-Saxons, much of the interior of Pannonia, Dacia, and Thrace. These are zones of cultural turnover. But, we know from genetics that a substantial local population persisted. In the Balkans and England a large minority of the ancestry derives from migrations which occurred after the year 500, but only a minority. But, the Roman majority clearly lost the cultural commanding heights, and with that the elite support for Christianity. These were zones that had to be re-Christianized in later centuries, even though a substantial proportion of the population probably had had Christian ancestors before.

It isn’t that there was a proactive campaign of paganization, analogous to what occurred in 17th century Japan against the Christian population, who were forced to register with Buddhist temples. Rather, the total defenestration of the old Roman elites in these areas made it so that the new elites seem to have had little incentive to convert and patronize the old religion. This is in contrast to the situation in post-Roman Gaul (Francia), Spain and Italy, where Roman era elites maintained enough continuity to influence the German warrior elites (though in many cases these elites were already Christian, they were Arian sectarians, whose religious difference marked them off from the old nobility and the peasantry).

This all came to mind when I began to read portions of Congo: The Epic History of a People. I am reading this book for two reasons. After Dancing in the Glory of Monsters: The Collapse of the Congo and the Great War of Africa, I have come to think that the Congo basin is one of the great laboratories of the forces which drive cultural geography. As such, I have an eye out for books on the Congo. Second, it was a summer reading deal for the Kindle, and so cheaper than a Starbucks coffee.

What is relevant to this post: after the decline of the Kingdom of Kongo a residual memory of Christianity persisted across broad areas. But, Christianity became integrated into African shamanism and folk religion, and lost all its substantive distinctiveness from African traditional religion. The few Europeans who ventured into the interior in the 19th century reported villages where there were survivals of Christian ideas, but they had transformed beyond simple recognition. In the 20th century the southwest portion of the Congo basin, which had been under Kongo rule, therefore became the focal point for missionary activity again.

What is true for Christianity is probably true for many complex human ideas and institutions that we think are here for good. The reality is that complexity of thought and contingency of logic are dependent on the surpluses generated by a highly developed economy and centralized state.

Addendum: The tendency to culturally evolve seems normal. It happened to Islam in China when it was isolated from the broader world Islamic community.

• Category: History, Ideology • Tags: Kongo, Religion

My Netflix account is going up in price from $7.99 to $9.99. They had warned this was going to happen. I don’t use Netflix much, so I’ve wondered if I should cancel (I have Amazon video options through Amazon Prime too). I probably won’t do so now, as it’s really cheap. But I don’t have time to “binge watch” television shows, and never get around to watching movies. So who knows?

The New York Times magazine had a piece up recently, Can Netflix Survive In the New World It Created. It’s interesting, but I want to again highlight Netflix’s culture in relation to employment: they focus on ‘superstars’ and don’t have any loyalty nor do they expect loyalty. The person who pushed for this policy was herself let go:

One of my last interviews at Netflix was with Tawni Cranz, the company’s current chief talent officer, who started under Patty McCord in 2007. Five years later, McCord, her mentor, left. When I asked her why, she visibly flinched. She wouldn’t explain, but I learned later that Hastings had let her go.

As long as Netflix is riding high, its policy in relation to employees will return yields. The problem is that when the first sign of trouble crops up, literally every employee will be running out the doors. I think Netflix is basically like an asexual lineage. When it’s optimized for its environment, it doesn’t pay the “two-fold cost of sex,” and it enters growth phase. But these lineages are far less robust to environmental turbulence, as all their eggs are in one genetic basket. Similarly, Netflix has put all its eggs into the basket of ideal and high return skills for current market conditions. There is no reserve of loyalty or cohesion to push through tough times, when ‘rational’ employees with prospects, which all of its employees presumably have, would simply jump ship.

• Category: Miscellaneous • Tags: Netflix

I have been very busy obviously. This is not a complaint, though I wish I could spend more time with my family. I do things professionally that I love. And, I’m well compensated for it.

Many people are not in a similar situation. I don’t have a major comment on the recent British vote aside from the fact that in a democracy with one person (adult) one vote the outcomes are not always going to be congenial to elites. I’d rather not be reductive, but, if people in large numbers are behaving in a manner that you perceive to be nihilistic, it may have something to do with your lack of comprehension about their values or prospects. The elites over the past ten years do seem to be engaging in a full throttle game of economic (neoliberal and pro-corporate) and cultural capture of the nation-state. This has triggered populism of the Left and Right. In popular democracies that means that the elites can sometimes lose, because non-elites believe they have nothing to lose.

Obviously I have not been able to sit down and write a long treatment of Iosif Lazaridis’ magisterial The genetic structure of the world’s first farmers. Greg Cochran has some comment, while the comment threads of Eurogenes are often informative. I would recommend that you read the supplementary document first. It’s basically a small book.

A few quick comments though. David Reich has stated that all of the world’s major populations are the products of relatively recent admixtures (i.e., the last 10,000 years after the Ice Age). In Lazaridis et al. the authors suggest that West Eurasian populations can be thought of as a mix of four root populations which flourished ~10,000 years ago. But I’d like to add that two of the four, the farmer populations, are themselves admixtures between two very distinct streams. A step backward and you have three root populations: Basal Eurasians, Ancient North Eurasians, and a variegated “West Hunter-Gatherer” set of groups. We have ancient genomes for the last two groups in a relatively unadmixed form, but not the first.

Also, a 2007 PNAS paper, Genetic evidence for a second domestication of barley (Hordeum vulgare) east of the Fertile Crescent: “We use differences in haplotype frequency among geographic regions at multiple loci to infer at least two domestications of barley; one within the Fertile Crescent and a second 1,500–3,000 km farther east. The Fertile Crescent domestication contributed the majority of diversity in European and American cultivars, whereas the second domestication contributed most of the diversity in barley from Central Asia to the Far East.” (via Jeff Ross-Ibarra)

One of the things that ancient genomes have taught us is that the past was subject to heroic tumult. Demographic shifts were not like the diffusion of heat through space, but a phase transition. At some point I want to go back to the most ancient oral and textual memories of Holocene man. In particular, the Rig Veda seems likely to have fragments of a world that made us. Any suggestions for good translations? (I have the Griffith one).

Comments have been pretty good recently by the way. Keep it up.

What else is going on?

• Category: Miscellaneous • Tags: Open Thread


The chart is from an article in Nature. But the source is NHGRI. It illustrates that between 2008 and 2012 genomics as a field crushed Moore’s law. Then there was a leveling off between 2012 and the middle of 2015. Illumina had a quasi-monopoly for that period and sequencing costs did not decrease too much. But something changed in the past year. As you can see it’s as if we’re back in 2008, or at least we see hints of the beginning of another major crash. I assume part of this is that Oxford Nanopore is finally starting to present a possible future of a disruption of Illumina’s dominance. Though that’s still a very speculative possibility.

But second, Illumina itself sees sequencing as a commodity service, and is pushing the price point down to get more data out there. Sort of like IBM transitioned from being a company that sold you big metal boxes, to a company where big metal boxes were part of an ecology of services that you purchased, Illumina is imagining a future where the sequence is just the first step, and not a very remunerative one at that.

To take a step back, we’ve gone from the 1990s, where the human genome cost about $3 billion to sequence, to today, where small firms like Full Genomes are pushing $1,450 high quality whole genomes to consumers. Veritas has gone even lower, though it seems that they’re limiting the supply right now.

It’s easy to be pessimistic. But there is reason for optimism about the power of technology.

• Category: Science • Tags: Genomics

There has been extensive discussion online about the fact that the character of Ramsay Bolton on the HBO television show Game of Thrones was irredeemably psychopathic, cruel, and so ghoulishly sadistic as to be a cartoon of evil. But as a reader of the books I’ve generally shrugged off these complaints, because the character is even more perverse on the page than the screen. If you don’t believe me, this article in Vulture lays it out comparatively. It isn’t just that Ramsay kills people; most of the “nobility” in George R. R. Martin’s world are butchers. It is who and how that is more shocking. For Ramsay killing is not simply a means, but an end.

Not only is the book Ramsay even more inhumane than the television Ramsay, but he doesn’t exhibit an incongruity between his physical appearance and his behavior, as he does on the television show. That is, while the actor who plays Ramsay is handsome, in the books he is described as not physically attractive at all.

All this in and of itself doesn’t raise eyebrows. George R. R. Martin doesn’t write characters who are boy-scouts. He admits to preferring shades of gray. But Ramsay is no shade of gray. Who then is the equivalent to Ramsay? It seems that in this case Martin’s world is somehow unbalanced.

• Category: Miscellaneous • Tags: Game of Thrones

The show runners of Game of Thrones (the HBO television show which will actually complete its run under its original creators) admitted that they patterned part of the battle in yesterday’s episode on the Battle of Cannae. This was obvious to me, as I was actually thinking that the Boltons were exhibiting something similar to the Carthaginian double envelopment. Pretty cool synthesis of a callback to The Two Towers, as well as integrating real history.*

If you want to read a great description of Cannae, I’d recommend Adrian Goldsworthy’s The Punic Wars.

* Somewhat anachronistically, as the phalanx formation was used by the Romans, and perhaps not the Carthaginians.

• Category: History • Tags: Game of Thrones

The genetics of an early Neolithic pastoralist from the Zagros, Iran.

The genetic structure of the world’s first farmers.

Busy at the Evolution Meeting.

• Category: Miscellaneous • Tags: Open Thread
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"