The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Gene Expression Blog

MIT Technology Review has an article up, Do Your Family Members Have a Right to Your Genetic Code?, which is part of the genomics-human-interest-piece genre you now see regularly. Here you have the exemplar of this sort of narrative: what do you do when one twin gets a test and the other does not, and they disagree on how much they want to know? (Obviously the twin being tested is an “I-want-to-know” type.) In this case, though, there is a twist: both of them are scientifically highly trained. And it turns out that the twin who was nervous about genetic results ended up with ‘actionable’ results from the other twin’s decision.

There’s a major problem I have with this genre: genomics is not magic, genetics predates genomics by 100 years, and genetics predates DNA based molecular biology by 50 years. That means the basics of this dilemma have been present for over a century. Here are the conditions:

1) You have a condition
2) That condition is heritable
3) You have family members with varying degrees of relatedness to you

If it’s a highly penetrant variant, by which I mean that a very high proportion of individuals with the mutant allele will develop the condition, then the very fact of any individual being diagnosed has implications across the whole pedigree. Do you then keep it a family secret? Do you speak of it to friends? Do you avoid support groups based on the disease for reasons of genetic privacy?

Obviously sequencing an identical twin is a reductio ad absurdum. There’s very little uncertainty for highly penetrant alleles in this case. You know what the allele is simply by inspecting the genome, even before diagnosis of any disease or condition (it may be a late-in-life disease). And identical twins are almost perfectly concordant in their genomes. But qualitatively the same concerns have been present in some form for over a century.
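
The pedigree logic can be made concrete with a little arithmetic. The sketch below is only an illustration, assuming a rare autosomal dominant variant carried in a single inherited copy (no new mutations); the function name and the 0.9 penetrance figure are hypothetical.

```python
# Chance that a relative carries the same rare dominant allele as a
# diagnosed carrier. Assumes one inherited copy and no de novo mutation.
SHARING_PROBABILITY = {
    "identical twin": 1.0,
    "parent": 0.5,
    "sibling": 0.5,
    "child": 0.5,
    "grandparent": 0.25,
    "aunt or uncle": 0.25,
    "first cousin": 0.125,
}


def risk_to_relative(relationship, penetrance):
    """Probability the relative carries the allele AND develops the condition."""
    return SHARING_PROBABILITY[relationship] * penetrance


# Hypothetical penetrance of 90%.
for relationship in SHARING_PROBABILITY:
    print(relationship, risk_to_relative(relationship, 0.9))
```

The identical twin sits at the extreme of a scale that every relative is already on, which is why a diagnosis has always said something about the rest of the pedigree.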

Every disease you manifest and every trait you show on your face is a reflection on the genetics of your extended pedigree. Any information you divulge as a matter of course or happenstance then becomes a “bioethical dilemma.” The choices you make will smoke out values. For example, I weight the interests and privacy of the nuclear family far more than even the nearest circle of relations. I would without much thought sacrifice the diluted expectation of privacy (diluted because of diminished relatedness) for my extended kin (siblings, parents, cousins) if I needed information to help my children. In contrast, if I lived in a Hindu joint family my calculus might be different.

• Category: Science • Tags: Genetics 

If you follow Y genealogy you know that the distribution of R1ba2 exhibits a peculiar pattern. R1b is the most common haplogroup in Western Eurasia, and shares a deep common ancestry with R1a. It seems to have risen to high frequencies in Europe only during the Bronze Age, though it has been found in earlier periods. But within Africa R1b is found in very high concentrations around Lake Chad. This particular R1b lineage seems to have diverged from other Eurasian branches in the latter portion of the Pleistocene, so one possible consideration is that this was an instance of Eurasian backflow during the Ice Age.

One reason I have been somewhat skeptical of this model is that the Sahara desert was much more extensive and arid during much of the Pleistocene than today. And during this period humans had less cultural technology to endure the rigors of the deep desert. Or, if they did, their population densities were likely much lower, which probably served as an impediment to gene flow.

A new paper in The American Journal of Human Genetics sheds light on what might have been going on here: Chad Genetic Diversity Reveals an African History Marked by Multiple Holocene Eurasian Migrations. The major findings are straightforward. First, much greater sampling of populations, and a better depth/density of marker coverage, allowed the researchers to detect low levels of Eurasian admixture, on the order of ~1%, in some Central African groups. This admixture seems to date to the Holocene, ~5,000 to ~7,000 years before the present (they used LD-based methods on the autosome). Interestingly, the R1b lineage common in Central Africa also seems to coalesce during this time. Finally, the admixture seems to be closest to Sardinians among extant populations.
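
As a rough illustration of how LD-based dating works in general (not the paper's actual pipeline): after a single pulse of admixture, admixture-induced LD between markers decays roughly as exp(-g × d), where g is generations since admixture and d is genetic distance in Morgans. The sketch below recovers g from two LD values. The function name, the input numbers and the 29-year generation time are all assumptions made for the example.

```python
import math

YEARS_PER_GENERATION = 29  # assumed generation time, not taken from the paper


def generations_since_admixture(ld_near, ld_far, dist_near, dist_far):
    """Solve LD(d) ~ exp(-g * d) for g, with distances in Morgans.

    Assumes a single pulse of admixture and noise-free LD estimates.
    """
    return math.log(ld_near / ld_far) / (dist_far - dist_near)


# Hypothetical LD values generated from a true age of 200 generations,
# measured at 0.5 cM and 2 cM.
true_g = 200
ld_near = math.exp(-true_g * 0.005)
ld_far = math.exp(-true_g * 0.02)

g = generations_since_admixture(ld_near, ld_far, 0.005, 0.02)
print(round(g), "generations, about", round(g * YEARS_PER_GENERATION), "years")
```

Real methods fit the whole decay curve across many marker pairs, and the signal flattens as g grows, which is part of why old admixture dates come with wide intervals.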

The Sardinian affinity of much of African Eurasian admixture may seem peculiar, but it makes more sense when one considers that Sardinians are probably the best modern proxies for the earliest Neolithic farmers from the Eastern Mediterranean. Modern Middle Eastern populations are very different from those which flourished in prehistory, between the rise of agriculture and complex civilizations, because of admixture within Middle Eastern groups. The initial push into Africa by the agriculturalists dates to a period before we have a good understanding of the ethnographic balance.

Very high frequencies of R1b in modern Central African groups may indicate drift. But another possibility is that the migration was male-mediated. This seems to have been the case in much of Eurasia, so it would not be surprising in this context. The status of these males was such that, despite their diminishing genetic impact on overall ancestry, their Y chromosomes, and possibly their language in the varied forms of Afro-Asiatic, persisted down to the present.

Finally, here’s the last paragraph of the discussion:

Our study has shown that human genetic diversity in Africa is still incompletely understood and that ancient admixture adds to its complexity. This work highlights the importance of exploring underrepresented populations, such as those from Chad, in genetic studies to improve our understanding of the demographic processes that shaped genetic variation in Africa and globally.

• Category: Science • Tags: Africa, Genetics 

A new paper in The American Journal of Human Genetics, The Divergence of Neandertal and Modern Human Y Chromosomes, reports on possible reasons why we don’t see Y chromosomes from this archaic lineage in modern humans, despite modern humans exhibiting detectable levels of autosomal admixture. As you might recall, the clear lack of deep-branching Y and mtDNA lineages was long one of the major genetic rationales for why gene flow between Neanderthals and modern humans was presumably not very significant. This, despite suggestive evidence from morphological analysis as well as inferences from autosomal data. The problem is that it is harder to do the sort of clean phylogenetic reconstruction via a coalescent model utilizing autosomal data (which recombines, as opposed to the Y and mtDNA, which do not for the regions of interest), so ancient genome sequences were really what was needed to convince most people with these sorts of markers.

This makes us ask: why are Neanderthal Y and mtDNA lineages not found in modern humans who exhibit indications of gene flow from other hominin lineages? After all, the lack of these really led many people off on the wrong track for years. I recall in 2008 going to a talk by Svante Paabo, who reported that the Neanderthal mtDNA he had sequenced was definitely very different from anything in the current databases for our species, which confirmed his assumption that there was no admixture into modern populations (Paabo changed his tune very soon after, due to the whole genome sequencing obviously). One simple explanation is that because effective population sizes of Y and mtDNA are smaller than those of autosomal regions of the genome, they’ll be more strongly subject to drift and exhibit higher extinction rates. In other words, it wouldn’t be that surprising if all Neanderthal Y and mtDNA went extinct after admixture because they were a small minority, and most lineages went extinct in any case. Researchers who work in non-human phylogeography and relied on mtDNA in particular can tell many stories of being led astray by looking at one informative locus.
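
A rough way to see the drift argument is the textbook expectation that, under drift alone, variation decays by a factor of (1 - 1/n) per generation, where n is the number of gene copies in the breeding population. The sketch below assumes an idealized, neutrally evolving population with an equal sex ratio; the population size and time span are hypothetical.

```python
def diversity_retained(gene_copies, generations):
    """Expected fraction of heterozygosity left after drift alone."""
    return (1.0 - 1.0 / gene_copies) ** generations


BREEDING_ADULTS = 10_000  # hypothetical effective size, equal sex ratio
GENERATIONS = 2_000       # hypothetical time span

# Autosomes: two copies per adult. Y (or mtDNA): one copy per male (or female).
autosomal = diversity_retained(2 * BREEDING_ADULTS, GENERATIONS)
y_or_mtdna = diversity_retained(BREEDING_ADULTS // 2, GENERATIONS)

print(f"autosomal: {autosomal:.3f}, Y or mtDNA: {y_or_mtdna:.3f}")
```

With a quarter as many copies in play, the Y and mtDNA lose lineages several times faster. They are also each a single non-recombining locus, so one loss removes the whole signal, whereas the autosome offers thousands of independent chances for an archaic segment to survive.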

But chance may not be what is at work here. Buried in the discussion of the paper:

…polypeptides from several Y-chromosome genes act as male-specific minor histocompatibility (H-Y) antigens that can elicit a maternal immune response during gestation. Such effects could be important drivers of secondary recurrent miscarriages30 and might play a role in the fraternal birth order effect of male sexual orientation.31 Interestingly, all three genes with potentially functional missense differences between the Neandertal and modern humans sequences are H-Y genes, including KDM5D, the first H-Y gene characterized…It is tempting to speculate that some of these mutations might have led to genetic incompatibilities between modern humans and Neandertals and to the consequent loss of Neandertal Y chromosomes in modern human populations. Indeed, reduced fertility or viability of hybrid offspring with Neandertal Y chromosomes is fully consistent with Haldane’s rule, which states that “when in the [first generation] offspring of two different animal races one sex is absent, rare, or sterile, that sex is the [heterogametic] sex.”

The origin of species is obviously one of the founding questions which arose with the emergence of evolutionary biology. Haldane’s rule dates to the 1920s. In mammals the heterogametic sex is male, so these are the hybrids which will be selected against (or they may be sterile). There’s been a lot of research of late on why Neanderthals went extinct, and whether there were speciation barriers in keeping with the biological species concept between our two lineages. This result suggests that there is going to be interesting stuff coming out of the population genomics of ancient hominins in the near future….

• Category: Science • Tags: Genetics, Neanderthals 

Eurogenes points me to this interesting conference with a book of abstracts, Human Dispersals in the Late Pleistocene – Interdisciplinary Approaches Towards Understanding the Worldwide Expansion of Homo sapiens. Below are those of interest to me….

Philipp Gunz

Max Planck Institute for Evolutionary Anthropology
Leipzig, Germany

Evolution and development of the modern human face and brain

A number of fossils from North, South, and East Africa document the early stages of our species, and fossils from the Levant document the presumed first wave of migration out of Africa. The exact place and time of our species’ emergence remain obscure as large gaps in the fossil record and the chronological age of many key specimens make it difficult interpreting the evolutionary processes and population dynamics shaping the cranial diversity of modern humans. Here we use 3D geometric morphometrics based on landmarks and semilandmarks to compare facial and endocranial shape in a worldwide sample of recent and fossil humans from Africa, Europe, and Asia.

Our data support a complex evolutionary history of our species involving the whole African continent. Regarding facial shape, we find that even the early H. sapiens specimens fall within the shape variation of recent modern humans. Endocranial shape, however, changes considerably within the Homo sapiens lineage.

I think I understand archaic introgression better now. Humans really care about faces. Brains? Not as much. If our species developed its normal range of species-typical faces rather early on, then we’d recognize each other as conspecifics, despite widespread phenotypic differences (likely including cognitive and behavioral ones) and genetic divergence. Basically, it’s just like the Trojan War; a face can launch ships, and mediate gene flow.

John Hawks

University of Wisconsin-Madison, USA

African population diversity and its relevance for human dispersals

As modern humans dispersed throughout the world, they encountered and mixed with populations with much greater genetic distinctiveness than any living humans today. This process is now relatively well documented by ancient DNA in Eurasia and Australasia due to the ancient DNA records of Neanderthal and Denisovan samples. Within Africa this process of contact and mixture between genetically differentiated populations also took place, evidenced by the evidence of population mixture from genomes of some African populations today. The process began earlier, well before 100,000 years ago, and may have extended over a longer period of time. The evidence suggests that modern humans originated and began their dispersals within an African continental context equally or more genetically structured than Eurasia. However the fossil record of this population is very sparse, and it is not evident how archaeological distributions may relate to biological populations. Here I discuss the implications of this population structure for human dispersal and adaptability. The modern human phenotype originated as one well adapted for dispersal within a long-existing network of successful populations of potential competitors.

Basically it strikes me that John is developing and extending the neo-multiregionalist framework that he was operating within in the early 2000s. Also, African substructure is a thing. A major thing.

Finally, but not least:

Stephan Schiffels

Max Planck Institute for the Science of Human History, Jena, Germany

Analysing Australian genomes to learn about early modern human dispersal out of Africa

When and how modern humans left the African continent is still a debated question. Recently, three projects have analysed new genetic data from modern populations in Papua New Guinea and Australia, which has provided new insights on this topic. I will present analyses from one of these publications (Malaspinas et al. 2016), and compare results with findings from the two other projects (Mallick et al. 2016, Pagani et al. 2016). Here, we used MSMC2, a novel computational framework to analyse the distribution of times to the most recent common ancestor along multiple sequences. We find that all non-African populations that we analysed, including Australians, experienced a very similar population bottleneck in the past, consistent with only one out-of-Africa migration for all extant non-African populations. At the same time, we find evidence that some African populations are more distantly related to Australians than to Eurasian populations, and we show that this result is robust to haplotype phasing errors and archaic introgression. We interpret our result as evidence for gene flow between some Africans and Eurasians after the initial split, which is also consistent with results from other population genetic methods. Our analysis suggests that in order to understand human dispersal out of Africa, we need to better understand ancient population substructure within Africa, which is an important direction for future research.

Again, ancient African substructure. No coincidence. Talk to the cutting-edge people in the field, and this is the fabric of reality that the knife’s edge is going to slice in the near future. Second, I do believe it is likely that there was non-trivial gene flow between Sub-Saharan Africa and Western Eurasia over the past 50,000 years. Some of this is perhaps masked by low levels but, just as likely in my mind, by ancient African structure which has been erased due to population turnover.

• Category: Science • Tags: Genetics, Human Evolution 

A friend of mine introduced me to Mr. Robot a month ago. The show was difficult for me to follow, and I don’t watch much TV in the first place (“watching TV” is like making a “mix tape”; there’s no television involved anymore). But the star, Rami Malek, had an intriguing look.

It was only later that I realized why: his face resembled the Fayum portraits. These miniatures represented people in Roman Egypt from all walks of life. They are one of the best sets of representations we have of normal individuals, albeit ones prosperous enough to commission these works.

Malek is from a Coptic family, so presumably genetically representative of people in Roman Egypt during that time. It stands to reason that he’d look quite like many of these ancient Romans.

Anyway, I happen to have some data lying around, so I put it through PCA, Treemix and ADMIXTURE. If you click the plot to the left, PC 1 shows a cline from Sardinians to Lithuanians. PC 2 is from (modern) Egyptians to Basques. The Egyptians are clearly being shifted by their Sub-Saharan African admixture, which in other analyses usually comes in at between 10% and 25% depending on the individual. The Assyrian Christian samples, and Cypriots, are much closer to the other populations on PC 2 (several of the Lebanese). Then come the Sicilians, Tuscans, Bergamo Northern Italians, and Spanish (before the Basque).

Sometimes Treemix is more informative. Below is a pretty representative graph with 5 migration edges (I set Egyptians to be the root):

[Figure: Treemix graph with 5 migration edges]

And here’s K = 4.


These sorts of plots are a Rorschach test. But I’m pretty sure ancient DNA will confirm that migration around the Mediterranean during the Classical Era was non-trivial, but a minor component in the ancestry of most modern populations.

• Category: Science • Tags: Genetics 
Many people have skin problems. Though luckily I’ve never had an issue with acne, most people who know me personally are aware that I suffered from extreme eczema as a child. Most of the major issues occurred when I was under five years of age, and in my first few years, so I have only minimal first hand recollection. The problem runs in my family, though I was the most extreme sufferer. Eczema also correlates with asthma, something I also suffer from.

Naturally I was curious about this new paper in Genome Biology, Atopic Dermatitis Susceptibility Variants In Filaggrin Hitchhike Hornerin Selective Sweep:

Human skin has evolved rapidly, leaving evolutionary signatures in the genome. The filaggrin (FLG) gene is widely studied for its skin-barrier function in humans. The extensive genetic variation in this gene, especially common loss-of-function (LoF) mutations, has been established as primary risk factors for atopic dermatitis. To investigate the evolution of this gene, we analyzed 2,504 human genomes and genotyped the copy number variation of filaggrin repeats within FLG in 126 individuals from diverse ancestral backgrounds. We were unable to replicate a recent study claiming that LoF of FLG is adaptive in northern latitudes with lower ultraviolet light exposure. Instead, we present multiple lines of evidence suggesting that FLG genetic variation, including LoF variants, have little or no effect on fitness in modern humans. Haplotype-level scrutinization of the locus revealed signatures of a recent selective sweep in Asia, which increased the allele frequency of a haplotype group (Huxian haplogroup) in Asian populations. Functionally, we found that the Huxian haplogroup carries dozens of functional variants in FLG and hornerin (HRNR) genes, including those that are associated with atopic dermatitis susceptibility, HRNR expression levels and microbiome diversity on the skin. Our results suggest that the target of the adaptive sweep is HRNR gene function, and the functional FLG variants that involve susceptibility to atopic dermatitis, seem to hitchhike the selective sweep on HRNR. Our study presents a novel case of a locus that harbors clinically relevant common genetic variation with complex evolutionary trajectories.

This shows the importance of whole genomes, as the earlier result correlating variation with climate seems to be due to ascertainment bias in SNP discovery. Additionally, it’s intriguing that the haplotype with which eczema-like diseases are associated is very ancient. It’s found in Africans, and in ancient genomes. So it’s been segregating in human populations for a while. That indicates some sort of balancing selection going on, so that it’s never purified.

What they found is that in Chinese samples there has been a recent positive sweep. I don’t really buy their conclusion that the haplotype wasn’t found in European hunter-gatherers; they don’t have a large enough sample. But it does seem to be a case where there is balancing selection maintaining standing genetic variation, which may be selected for or purified in some populations.

This may be a more common dynamic than we might realize. Many complex diseases may exhibit risk profiles due to being dragged up in frequency because of associations with a nearby region, or, a genetic-correlation where the positive benefit is greater than the negative.

• Category: Science • Tags: Genetics 

[Map: Austronesian migrations]

One of the most incredible journeys that the human species has undergone is the Austronesian expansion of the past 4,000 years. These maritime peoples seem to have emerged from the island of Taiwan, and pushed south, west, and east, so that their expansion reached East Africa and the fringes of South America. There is now also some circumstantial evidence that Polynesian contact with the Americas predates the Columbian Exchange. Looking at the map above in hindsight it seems natural to imagine such contacts.

Though where the Austronesians went is incredible, their origins are somewhat more opaque, but rather tantalizing. That is because their original expansion was likely just before the horizon of history. In Guns, Germs, and Steel Jared Diamond alluded to the “express train” vs. “slow boat” models of the expansion. Basically, whether the Lapita peoples rapidly pushed out from Taiwan, or whether there was a long period of coexistence with Melanesians in Near Oceania. Over the past few years genetics seems to have supported the “slow boat” model.

Here is a paper from 2012, Population Genetic Structure and Origins of Native Hawaiians in the Multiethnic Cohort Study:

The “Express Train” and the “Slow Boat” models of Polynesian migration are expected to have uniquely distinct genetic signatures on present day genomes of Native Hawaiians. Under the “Express Train” model, the proportion of admixture in Native Hawaiians of Melanesian and Asian ancestry is expected to be near zero, whereas under the “Slow Boat” model, the proportion of admixture is expected to be substantially greater than zero. To test these two models, we conducted a supervised ADMIXTURE analysis using Papuan and Melanesians as one source population of Polynesians and Han Chinese, She, Cambodian, Japanese, Yakut, and Yi as surrogates for the second source population of Taiwanese aborigines[18],[19]. Importantly, we did not fix ancestry for the Melanesians or Asians and therefore allowed for admixture within either ancestral groups–thus, mitigating bias by earlier admixture processes and allowing for accurate clusters of ancestry membership. We set K = 2 and estimated in 40 100% Native Hawaiians an average of 32% and 68% of their genomes to be derived from Melanesian and Asian origins, respectively (Figure 4). This notable proportion of Melanesian admixture (32%) among Native Hawaiians, substantially greater than zero, lends support of the “Slow Boat” model of ancestral origins.

This is not an isolated study. Y chromosomes indicate substantial Melanesian admixture, while the mtDNA does not. One inference then was a “slow boat” model predicated on matrifocality. That is, expanding Polynesian groups were centered around matrilineal lineages, and absorbed Melanesian men into their communities. The above research was from a Hawaiian data set, but the results are consistent across Polynesia in relation to the proportion of Melanesian ancestry.

Case closed? Not so fast! Ancient DNA has now been brought to the question, and fundamentally changed our perceptions. Genomic insights into the peopling of the Southwest Pacific:

The appearance of people associated with the Lapita culture in the South Pacific around 3,000 years ago1 marked the beginning of the last major human dispersal to unpopulated lands. However, the relationship of these pioneers to the long-established Papuan people of the New Guinea region is unclear. Here we present genome-wide ancient DNA data from three individuals from Vanuatu (about 3,100–2,700 years before present) and one from Tonga (about 2,700–2,300 years before present), and analyse them with data from 778 present-day East Asians and Oceanians. Today, indigenous people of the South Pacific harbour a mixture of ancestry from Papuans and a population of East Asian origin that no longer exists in unmixed form, but is a match to the ancient individuals. Most analyses have interpreted the minimum of twenty-five per cent Papuan ancestry in the region today as evidence that the first humans to reach Remote Oceania, including Polynesia, were derived from population mixtures near New Guinea, before their further expansion into Remote Oceanian…our finding that the ancient individuals had little to no Papuan ancestry implies that later human population movements spread Papuan ancestry through the South Pacific after the first peopling of the islands.

These results strongly indicate that the original Lapita migrants did not mix with Melanesians. And the ancient samples share common ancestry with modern Polynesians, so that their heritage persists down to the present. Looking at the distribution of Melanesian ancestry, the authors concluded that this admixture occurred on the order of ~1,500 years before the present (their intervals were wide, but the ancient samples serve as a boundary). Additionally, in line with the Y and mtDNA, the X chromosome indicated more of the ancient ancestry than the autosome. The authors conclude that “it is also possible that some of these patterns reflect a scenario in which the later movement of Papuan ancestry into Remote Oceania was largely mediated by males who then mixed with resident females.”
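
The X-versus-autosome comparison can be turned into a back-of-the-envelope estimate of sex bias. Under a simple single-pulse model, if a fraction m of the founding males and f of the founding females come from the incoming source, the autosomal share of that ancestry is (m + f)/2 and the X-chromosomal share is (m + 2f)/3, because females carry two of every three X chromosomes. The sketch below inverts those two equations. The model is a textbook simplification, not the paper's method, and the input figures are hypothetical.

```python
def sex_specific_contributions(autosomal_fraction, x_fraction):
    """Invert A = (m + f) / 2 and X = (m + 2f) / 3 for m and f.

    m, f: fraction of founding males / females drawn from the incoming
    source, under a single-pulse admixture model.
    """
    male = 4 * autosomal_fraction - 3 * x_fraction
    female = 3 * x_fraction - 2 * autosomal_fraction
    return male, female


# Hypothetical example: 25% incoming ancestry on the autosomes, 20% on the X.
male, female = sex_specific_contributions(0.25, 0.20)
print(f"male contribution ~{male:.2f}, female contribution ~{female:.2f}")
```

Less incoming ancestry on the X than on the autosomes implies the incoming ancestry arrived mostly through men, which is the direction the authors describe.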

The take-home message, then, is that we need to be more modest with our models. Without ancient DNA it seems likely that we would not have stumbled onto this result; the ancestry deconvolution methods which date admixture have wide confidence intervals when you go back far in time.

• Category: Science • Tags: Genetics, Genomics 

As you may know, in Britain there is a new direct-to-consumer genetic testing service, Living DNA. Debbie Kennett has a post up where she talks about how it works and why it’s different. For now it is British focused, and leverages haplotype-based methods with the PoBI database to give really fine-grained analysis to their customers on those sceptered isles.

Brought to you by the same people who brought you FineStructure, this is a major offering in this space.

But there’s a more general issue I want to comment on. In the video accompanying the website, one of the presenters states that people are “very surprised how admixed” humans are. This depends on scale.

Let’s start at a time, 0. In the short term you are usually not very admixed. Outside of the Americas and other settler societies, admixture is not common on the scale of recent generations. But as you go further back, you become quite admixed. E.g., The Geography of Recent Genetic Ancestry across Europe showed lots of admixture on 1,000 year scales within Europe. David Reich’s lab has shown lots of admixture between very diverged populations on a 5,000 year scale. But as you go further back the ‘admixture’ gets lower and lower, and at some point you hit the species barrier, and you see the same genealogies coalescing again and again through the bottleneck.

Now, look at space. As distance is ~0 you’ll see lots of admixture. But as you push further and further away, the admixture drops. At some point the admixture is quite boring. Ergo, all the “you are 100% European” results from DTC companies.

The two dimensions look quite different. Admixture increases as you go further back in time…until it doesn’t. You’re hitting the “species” taxonomical barrier. In terms of space, admixture decreases the further you go away from your focal point of interest.

• Category: Science • Tags: Genetics 

[Screenshot: ancestry composition results]

The above results are from Ancestry. You can see here 4% Melanesian. This is common in South Asians. And it’s not an error in the method. Rather, it is a natural outcome of the methods used to generate admixture profiles.

Basically what’s going on is this:

1) You have data. In this case, the data are your own genotypes, as well as that of a set of individuals which represent world genetic variation, and are categorized into discrete populations.

2) You have a model or set of models. These models have different parameters.

3) You look at the data you have, and pick the parameters which best explain the data given the model.

If you have 100,000 or more markers that’s more than enough genotype data for individuals. The models themselves are quite stylized (e.g., HWE random mating sets of populations), but close enough to reality to give good results in many cases. For example, Ashkenazi Jews are often assigned to be ~100% Ashkenazi Jewish through these methods.

Then again, Ashkenazi Jews are a good test case. This is a population which went through a bottleneck about 500 to 1,000 years ago, and has been reasonably endogamous most of this time. Additionally, it’s not extremely structured due to inbreeding in different clan lineages. Though cousin marriage and uncle-niece marriage have been practiced by Ashkenazi Jews, the runs of homozygosity you see in Jewish genomes are not of the sort that indicate a highly inbred population, as is common in the Middle East or South Asia. Rather, there are lots of medium-length segments identical by descent across individuals.

The Ashkenazi Jewish population is rather simple in structure, and it is actually a rather clear and distinct population cluster. It stands to reason that when you create an Ashkenazi Jewish reference panel in your training data set it’s a pretty good match to the individuals you are testing.

The problems occur when you try to generate clusters and ancestry assignments for populations which are not so clear and distinct. Why do South Asians routinely come out as part Melanesian or Polynesian? This post was prompted by a Facebook thread where a South Asian customer of Ancestry was interested to see she had Polynesian ancestry. The reality is she almost certainly does not have Polynesian ancestry.

What’s going on is that the reference panel for South Asians used by many of the DTC genomics companies is not diverse enough to capture South Asian genetic diversity. There is an element of South Asian ancestry, “Ancestral South Indian” or ASI, which has deep shared ancestry with populations across Southern Eurasia and out toward Oceania. The admixture analysis method is searching through the reference panels for combinations of genotypes which can explain individual genetic variation. Since the South Asian training set is insufficient to explain all the South Asian variation the algorithms are filling in the balance of the variation with the closest available proxies to the “ghost clusters.”
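
A toy version of this effect is easy to reproduce. The sketch below is not any company's algorithm: it fits mixture weights over three reference populations by brute-force least squares on allele frequencies, and all of the frequencies are invented. The individual is built as 70% "South Asian panel" plus 30% of an unsampled "ghost" source whose frequencies lean toward the Melanesian ones, standing in for deep shared ancestry.

```python
# Invented allele frequencies at six SNPs for three reference populations.
REFERENCE_PANEL = {
    "South Asian panel": [0.7, 0.3, 0.5, 0.5, 0.4, 0.6],
    "Melanesian": [0.1, 0.9, 0.9, 0.1, 0.5, 0.5],
    "European": [0.9, 0.1, 0.5, 0.5, 0.2, 0.8],
}
# Unsampled source, absent from the panel, that leans toward Melanesian.
GHOST = [0.3, 0.7, 0.8, 0.2, 0.5, 0.5]


def mix(weights, sources):
    """Weighted average of allele-frequency vectors."""
    return [sum(w * s[i] for w, s in zip(weights, sources))
            for i in range(len(sources[0]))]


def fit_proportions(individual, panel, steps=100):
    """Grid-search three mixture weights (summing to 1) minimizing squared error."""
    names = list(panel)
    sources = [panel[name] for name in names]
    best_error, best_weights = float("inf"), None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            predicted = mix(weights, sources)
            error = sum((p - x) ** 2 for p, x in zip(predicted, individual))
            if error < best_error:
                best_error, best_weights = error, weights
    return dict(zip(names, best_weights))


# The individual has no Melanesian ancestry at all in this construction.
individual = mix([0.7, 0.3], [REFERENCE_PANEL["South Asian panel"], GHOST])
estimate = fit_proportions(individual, REFERENCE_PANEL)
print(estimate)
```

Because the ghost source is not among the options, the fit has to spend that share of the genome somewhere, and it lands on the nearest available proxy. Real tools use likelihood models over hundreds of thousands of markers, but the constraint is the same.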

The method is constrained and conditioned on two things:

1) The data being put in, which is often insufficient.

2) The set of populations that it is forced to work with to generate the combinations in individuals (the parameter values in the model to explain the data), which is often insufficient or artificial.

What I mean by the last is that many of the genetic clusters are not taxonomically equivalent. “South Asian” ancestry is much more diverse and diffuse than “Melanesian” ancestry. This is why Melanesian ancestry can explain South Asian ancestry, but generally not the reverse.
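To make the "closest available proxy" effect concrete, here is a minimal toy sketch. It is not the ADMIXTURE algorithm or any company's pipeline: it simply grid-searches non-negative mixture weights over three hypothetical reference panels to best match an individual's allele frequencies. The panel names and every frequency are invented for illustration; the individual is built from 80% of the "South Asian" panel plus 20% of a "ghost" population that is absent from the panels but resembles the "Oceanian" one.

```python
# Toy illustration only: real tools fit a likelihood model over hundreds of
# thousands of markers. All names and frequencies below are invented.

def fit_mixture(target, panels, steps=100):
    """Grid-search non-negative weights summing to 1 that minimise squared error."""
    names = list(panels)
    assert len(names) == 3, "this sketch handles exactly three panels"
    a, b, c = (panels[n] for n in names)
    best = None
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            w = (i / steps, j / steps, (steps - i - j) / steps)
            err = sum(
                (t - (w[0] * x + w[1] * y + w[2] * z)) ** 2
                for t, x, y, z in zip(target, a, b, c)
            )
            if best is None or err < best[0]:
                best = (err, dict(zip(names, w)))
    return best[1]

panels = {
    "south_asian_panel": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "oceanian_panel": [0.9, 0.8, 0.2, 0.1, 0.5, 0.3],
    "european_panel": [0.5, 0.5, 0.9, 0.9, 0.1, 0.1],
}
# "Ghost" ancestry: not in any panel, but most similar to the Oceanian one
ghost = [0.85, 0.75, 0.25, 0.15, 0.5, 0.35]
individual = [0.8 * s + 0.2 * g for s, g in zip(panels["south_asian_panel"], ghost)]

weights = fit_mixture(individual, panels)
print(weights)
```

In this toy setup the fit hands a sizeable share of the individual's ancestry to the Oceanian panel, even though the individual has no ancestry from it; the unexplained variation has to go somewhere.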

• Category: Science • Tags: Genetics, Genomics 

A new paper in Quaternary International, Western Eurasian genetic influences in the Indonesian archipelago, confirms what has long been suspected from smaller batches of data:

…To locate the primary areas of Western Eurasian genetic influence in Indonesia, we have assembled published uniparental genetic data from ∼2900 Indonesian individuals. Frequency distributions show that Western Eurasian paternal lineages are found more commonly than Western Eurasian maternal lineages. Furthermore, the origins of these paternal lineages are more diverse than the corresponding maternal lineages, predominantly tracing back to South West and South Asia, and the Indian sub-continent, respectively. Indianized kingdoms in the Indonesian archipelago likely played a major role in dispersing Western Eurasian lineages, as these kingdoms overlap geographically with the current distribution of individuals carrying Western Eurasian genetic markers. Our data highlight the important role of these Western Eurasian migrants in contributing to the complexity of genetic diversity across the Indonesian archipelago today.

The table above highlights the distribution of paternal Indian lineages in several parts of Indonesia. These Y chromosomal haplotypes are found in the core of what was Majapahit. Some of these haplotypes might be due to shared ancient ancestry, but the presence of R1a means that the connection dates to within the past 4,000 years, as I believe R1a is relatively intrusive into South Asia. Many of the other haplogroups are a diverse cross-section of those typical for South Asia.

The further question then is whether these date to the period of European colonization, or to the first millennium A.D., when the first “Indianized kingdoms” arose in Southeast Asia. The fact that there is compelling evidence of old and even admixture in Cambodia, where colonialism was not as pervasive or longstanding as in maritime Southeast Asia, suggests that it can’t be chalked up to the Dutch presence, and their role as mediators for migration (more plainly, they enslaved many South Asians and moved them around the Indian Ocean basin).

But the text of the paper makes some things rather clear:

…constant since the first contacts and exchanges between the Indian sub-continent and Indonesia in the late 1st millennium B.C.E., it is likely that this gene flow was particularly intense during the period of the Hindu kingdoms in Indonesia (7th to the 16th century AD). These assumptions, based on archaeological and historical data, are also in broad agreement with dating on unpublished genome-wide SNP markers from Island Southeast Asia (unpublished data).

• Category: Science • Tags: Genetics 

For whatever reason I missed this paper which came out in July in AJHG, Human Y Chromosome Haplogroup N: A Non-trivial Time-Resolved Phylogeography that Cuts across Language Families. Basically it blows up sample size and utilizes NGS techniques (whole-genome) to resolve some questions around haplogroup N, and in particular the M46/TAT subclade which exhibits a peculiar geographic distribution, from the shores of the Baltic to easternmost Siberia.

I actually blogged about this as far back as 2003, so it’s a long term mystery. There’s no autosomal rhyme or reason to the frequency of this lineage. Yes, there is a vague Uralic affinity, but this Y chromosomal variant is higher in the Lithuanians than the Finns, and found in peoples as distant as the Koryaks. One of the major early questions was whether it was a marker that indicated east-west movement, or west-east movement. In other words, was it associated with Siberian ancestry in Finns and affiliated people, or did it indicate European ancestry in Siberian people?

Rurik, carrier of N1c

If the results in this paper are correct the likely answer is: none of the above. The core TAT lineage looks like it underwent an explosion ~5,000 years ago. This is around the same time as Northern Europeans and Siberians as we understand them were coming into being. So the TAT lineage didn’t come with a specific people, it was part of the process which made the people. I’ll quote from the discussion:

Overall, a considerable proportion of men inhabiting much of the Arctic and temperate zones of western and eastern Eurasia share N3a3’6 lineages that date back to the mid-Holocene (4.5–5.0 kya). This common patrilineal ancestry unites widely different linguistic phyla, including Indo-European, particularly Balto-Slavic, branches of the Altaic, such as the Mongolic, Turkic, Tungusic, and Chukotko-Kamchatkan branches, as well as the Balto-Finnic branch of the Finno-Ugric.

The autosomal genome-wide data is clear, pretty much all the Finnic peoples in Europe seem to have a small (to various degrees, with Finns proper the least), but clear, signal of admixture that is Siberian. It is tempting to associate this with the men who carried TAT into these populations, but observe that the Lithuanians seem to be lacking in this signature. Y chromosomes and autosomes are not always in alignment, but recall that many Siberians have some West Eurasian ancestry, some of it likely quite ancient, and carry R1a1a Y chromosomes. The past was more complex than we had assumed, and the relationship between movements of men and languages is likely not so straightforward in the inferences we can make. It may be that the Siberian admixture into Finnic peoples, and their linguistic identity, post-dates the arrival of TAT into the far north of Europe.

One of the aspects of the explosion of many Y chromosomal lineages 4,000 to 5,000 years ago is how poorly they associate with ethno-linguistic boundaries. The “Indo-Aryan” R1a1a in South Asia is very common in some low caste South Indian tribal populations. The R1b brought by the Corded-Ware culture, which presumably transmitted Indo-European languages, is at very high frequency among the non-Indo-European Basques, as well as groups such as Sardinians, who were Indo-Europeanized only in Classical Antiquity. The Y lineages seem to have expanded far beyond the totality of the cultural unit.

Genetics is giving us lots of data. But there are no theoretical bones to scaffold this flesh.

• Category: Science • Tags: Finns, Genetics 

About thirteen years ago I expressed the opinion that an understanding of population structure will become a matter of intellectual curiosity once we have a better understanding of the genetic basis of characteristics. A friend, who was a statistical geneticist, told me that this was unlikely. We were unlikely to capture the ability to predict all outcomes well enough on even highly heritable complex traits to simply discard population structure information. Some of this is not due to genetics; different populations may expose themselves to different environmental conditions. For example, it would be useful to know which individuals in the CEU white European American data set are practicing Mormons, and which are not, because Mormonism tends to result in a lot of behavior modification.

But some of the concern about population structure has to do with the fact that genetic background matters, and we are unlikely to ever have total omniscience as to the nature of genetic interactions and dependencies. By this, I mean that if we have a strong causal signal which associates disease risk with a genetic variant, that risk is still conditional on dependencies of other genetic variations across the genome. Those variations are the outcome of demographic histories, which one can “control” for to some extent by accounting for population structure. In more plain language, a signal that predicts an outcome in Norwegians may not predict the same outcome in Nigerians. This may be due to different frequencies of other variants which are not directly causal, but interact with the causal signals, and which vary between populations.

More recently I’ve been a bit more sanguine. I don’t follow the literature closely, but papers like High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants make me wonder if the genetic background concerns weren’t over-wrought.

A new preprint, Population genetic history and polygenic risk biases in 1000 Genomes populations, suggests we should be worried. Or, more precisely, we should be cognizant of the limitations genetic background imposes upon us for certain classes of variants and disease. In particular, rare variants are going to be less portable across populations because of the shallower time depth of their emergence, after populations have diverged. So, if you have a low frequency major effect causal variant in Europeans, there is a much lower likelihood that it is in other populations.

The histogram above illustrates an excellent case study from the preprint. The genetic architecture of height and its genomic basis has been best elucidated for Europeans. We know, for example, many of the loci which distinguish Northern and Southern Europeans, and we know that selection has resulted in divergence between the two populations over the past 5,000 years. But as you can see the predicted heights seem to simply follow genetic distance from Europeans. SAS = South Asians, while AMR = a mixed cohort of populations from the Americas. EAS and AFR are East Asians and Africans. In reality, Africans are nearly as tall as Europeans (taller or shorter depending upon the reference European population), and taller than East Asians. The predictions here are off because the causal variants inferred from the studies of European cohorts are portable in direct proportion to shared demographic history. South Asians share a relatively ancient demographic history with Europeans, while many mixed groups from the Americas have Europeans as one of their recent founding populations. But in both cases the causal variants were likely segregating in the ancestral populations before divergence, so there is no major difference in the consequence.

The preprint has a lot more than just a reanalysis of GWAS. Using local ancestry deconvolution methods they show how one can infer history from patterns of genetic variation (though as always, this should not be taken as gospel, as there are biases in the methods currently used). The major take home is simple: population structure is real, and, it has real consequences functionally.

• Category: Race/Ethnicity, Science • Tags: Genetics, Genomics 

In 2011 I was having dinner with an old friend who was an engineer at Intel. He also has a Ph.D. from MIT. Smart guy. But when I mentioned casually offhand that we were all a few percent Neanderthal (outside of Africa), he was surprised. I was a bit shocked, as I explained that this was a huge science story. The Neanderthal genome had been published the previous year. How could my friend not have known?

He was totally unembarrassed, and told me I overestimated how closely the public followed genetics and paleontology. I’m sure he was right. But it’s hard to remember sometimes.

We’ve gone further beyond where we were in 2010. We now have a really good grasp of a lot of population dynamics in Eurasia over the past 20,000 years. Probably the best place to start is with this preprint, The genetic structure of the world’s first farmers. But the general outlines were already evident a few years back in Toward a new history and geography of human genes informed by ancient DNA.

Most of the world’s population seems to descend from a mixing of a set of groups which 10,000 years ago were distinct. How distinct? We’re talking about Fst values on the order of 0.10, which means that in a pairwise comparison ~10% of the genetic variation is partitioned between the two populations rather than within them. That’s about what you see between Europeans and Chinese today. Some of the Fst values were a bit higher, some lower, but the 0.10 seems about right.
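For readers who want a feel for what an Fst of ~0.10 means, here is a minimal sketch using the textbook heterozygosity definition, Fst = (H_T - H_S) / H_T, for a single biallelic marker in two equally weighted populations. The allele frequencies are made up for illustration, and real estimates average over many markers with sample-size corrections that this sketch ignores.

```python
def fst_two_pops(p1, p2):
    """Wright's Fst for one biallelic locus and two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                       # heterozygosity if pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population value
    if h_total == 0:
        return 0.0  # both populations fixed for the same allele
    return (h_total - h_within) / h_total

# Hypothetical allele frequencies of 30% and 60% give an Fst of about 0.09
print(round(fst_two_pops(0.3, 0.6), 3))
```

So a marker has to differ quite a lot in frequency between two groups to produce a value near 0.10, and yet about 90% of the variation at that marker is still found within each group.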

To make it easy for some of you, I’ve labeled and placed the approximate locations of ancestral groups to modern Northern Europeans ~10,000 years ago. What I’m trying to represent is a map which shows the modal regions of distribution of ancestors that Northern Europeans today had 10,000 years ago. So, for example, since ~15% of the ancestry of Northern Europeans is “Ancient North Eurasian” (ANE), a lot of ancestors of Northern Europeans alive today would be living somewhere in the broad expanse of Central Eurasia (now, because of various demographic events the number of ANE was probably lower than farmers, perhaps lower than the 15% contribution to the modern genomes).

A substantial proportion of the ancestry of Northern Europeans is “European hunter-gatherer,” dating to the Pleistocene. But here’s the kicker: most of that ancestry dates to after the LGM, to about ~15,000 years ago. The really deep Pleistocene ancestry in Europe is only found at very low levels now.

The final issue is that a lot of the phenotypes that we racially code are recent. This probably explains why groups like the Kalash and Nuristanis can look more like Europeans than South Asians, but they’re genetically more like South Asians.

What does any of this have to do with non-scientific things? I don’t really know. My interest in population structure is intellectual, not personal. But a certain type of person should probably stop talking about how white people have been in Europe for 40,000 years. First, the ancestors of modern Europeans 40,000 years ago were almost all residing outside of Europe, an assertion that holds until 15,000 years ago. And most would still have been resident outside of Europe 8,000 years ago, depending on how you count/calculate.* And, perhaps more importantly, the typical phenotype of Northern Europeans probably really coalesced only around ~5,000 years ago.

* Definitely true for Southern Europeans, but for Northern Europeans it is conditional on where you draw Europe’s eastern boundary.

Addendum: I stole the title from John McWhorter’s book, Our Magnificent Bastard Tongue.

Also, this is not to say that

1) population structure today is trivial in a phylogenetic sense, it isn’t.

2) population structure is functionally irrelevant, it isn’t.

• Category: Science • Tags: Genetics, Genomics 

Evolutionary theory famously predated the emergence of genetics by decades. Initially there was some conflict between the heirs of Charles Darwin and the first geneticists in terms of their mechanistic understanding of how the evolutionary process occurs. Within a few decades though genetics and evolutionary biology were synthesized so that the former came to be integral toward understanding the processes and parameters which shape the character of the latter (see The Genetical Theory of Natural Selection). E.g., imagine attempting to understand the origins and maintenance of sexual reproduction without any genetic understanding of the determination of sex and its implications for transmission.

But obviously genes are not everything when it comes to phenotypes. In particular with humans, there are complex behaviors and social interactions which seem to be persistent, and perhaps adaptive, which may not be directly contingent upon any simple genotype-phenotype map. This is not to say that cultural and behavioral traits have no genetic basis. To give an example, religion is a complex phenomenon which is both universal and does not seem directly encoded in one’s genes. The search for a “god gene” is futile, because religion as a phenotype is mediated by innumerable other phenotypes, which themselves have complex genetic bases.

Though culture is contingent upon genes, it exhibits a character which is separable from genetic evolution. In particular, dual inheritance theory explicitly acknowledges that human cultural variation over time and space is a function of the interaction between both cultural and genetic evolution. Though there are similarities between the two, and in fact the field of cultural evolution consciously utilizes much of the same formalism as population and quantitative genetics, the modes of inheritance and nature of the origination and perpetuation of variation of the two differ a great deal.

As a rule of thumb you can posit that genetic evolution is relatively slow and torpid in relation to cultural evolution, which is protean and quicksilver. Consider that lactase persistence and high altitude adaptations are the two fastest we know of in human genetics, and they occur on 1,000 year time scales. A 1,000 year time scale takes you from Julius Caesar to Otto the Great. It takes you from the first of the Mycenaeans to the Athens of Pericles.

The differences between culture and genes are important to keep in mind when one is making predictions. I’m a big fan of the Eric Kaufmann book, Shall the Religious Inherit the Earth?: Demography and Politics in the Twenty-First Century. The model outlined within the book, higher fertility for religious people, ergo, the reemergence of religion, is logically plausible. But I must always remind people that the same concerns were prevalent in France before 1850, with the arrival of more traditional Roman Catholics into a milieu which had notably secularized and undergone early demographic transition. Why is France today not a uniformly Catholic republic? First, there is history: the migration of Muslims from North Africa. But even more important is cultural evolution, as the descendants of Spaniards, Poles, and Italians secularized.

There is though a difference between description, and formal modeling. The field of cultural evolution attempts to do the latter. There are several lay and specialist introductions to the field (just click some of the book links and you’ll find them all). It’s worth attempting to grapple with the domain in a more systematic way, because that’s the only way you can make predictions which make sense of the diversity we see around us.

A new preprint is an interesting addition to the literature, Gene-culture co-inheritance of a behavioral trait:

Human behavioral traits are complex phenotypes that result from both genetic and cultural transmission. But different inheritance systems need not favor the same phenotypic outcome. What happens when there are conflicting selection forces in the two domains? To address this question, we derive a Price equation that incorporates both cultural and genetic inheritance of a phenotype where the effects of genes and culture are additive. We then use this equation to investigate whether a genetically maladaptive phenotype can evolve under dual transmission. We examine the special case of altruism using an illustrative model, and show that cultural selection can overcome genetic selection when the variance in culture is sufficiently high with respect to genes. Finally, we show how our basic result can be extended to nonadditive effects models. We discuss the implications of our results for understanding the evolution of maladaptive behaviors.

The most relevant section is probably 3.2 Model 2: Cultural prisoner’s dilemma. If you don’t know what the Price Equation is, read the original paper. It will induce some clarity.
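For orientation, this is the standard textbook form of the Price equation, not the extended gene-culture version derived in the preprint. Here z_i is the trait value of individual (or group) i, w_i its fitness, \bar{w} mean fitness, and \Delta z_i the difference between the trait value of i's offspring and i itself:

```latex
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}(w_i\,\Delta z_i)
```

The covariance term captures selection, and the expectation term captures biased transmission. The preprint's contribution, per its abstract, is to carry out this kind of partition when a phenotype is inherited through both genes and culture.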

The fact that more variance in culture in relation to genes allows for selection to act more powerfully on culture, and arguably in a maladaptive manner from the gene-centric perspective, is no surprise. This preprint adds more precision and clarity. For adaptation to occur there needs to be heritable variation. One reason that cultural group selection is more plausible than genetic group selection is that genetic variation across demes is often very low. The Fst between racial groups may be 0.10 to 0.30, but it is not very common for such Fst values to be realized between two groups genuinely in competition. More often neighboring populations have much lower Fst values, though ancient DNA is suggesting that 0.05 to 0.10 values were maintained in some areas 5 to 10 thousand years ago. A simple population genetic rule of thumb is that there needs to be less than one migrant per generation between two populations for the genetic variation between them to increase, rather than decrease. In other words, minimal gene flow on a general scale quickly reduces between group genetic variance.
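The one-migrant rule of thumb can be illustrated with Wright's island model, where equilibrium Fst is approximately 1 / (4Nm + 1) and Nm is the number of migrants per generation. This is a minimal sketch under that idealized model (many equal-sized demes, neutral markers, equilibrium reached), which real populations only loosely resemble.

```python
def equilibrium_fst(migrants_per_generation):
    """Approximate equilibrium Fst under Wright's island model: 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * migrants_per_generation + 1.0)

for nm in (0.1, 1, 10):
    print(nm, round(equilibrium_fst(nm), 3))
```

Under these assumptions a single migrant per generation already caps Fst at about 0.2, and ten migrants per generation push it below 0.03, regardless of how large the populations are.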

In contrast, cultural variation can be maintained because migrants can switch cultures, or their genetic progeny can adopt the culture of one of the parents in totality. In this way the later Ottoman Sultans and Umayyad rulers of Al-Andalus had been genetically transformed by generations of mixing with concubines derived from Europeans or Caucasians (i.e., those from the Caucasus), while remaining culturally very Turk and Arab respectively.

As noted in the preprint, this formal/theoretical avenue of research will allow for the development of a robust empirical research program. The data is out there.

• Category: Science • Tags: Cultural Evolution, Genetics 

An excellent open access review of population genetics history from 1966 to the present in Heredity, Population genetics from 1966 to 2016. From the abstract:

We describe the astonishing changes and progress that have occurred in the field of population genetics over the past 50 years, slightly longer than the time since the first Population Genetics Group (PGG) meeting in January 1968. We review the major questions and controversies that have preoccupied population geneticists during this time (and were often hotly debated at PGG meetings). We show how theoretical and empirical work has combined to generate a highly productive interaction involving successive developments in the ability to characterise variability at the molecular level, to apply mathematical models to the interpretation of the data and to use the results to answer biologically important questions, even in nonmodel organisms. We also describe the changes from a field that was largely dominated by UK and North American biologists to a much more international one (with the PGG meetings having made important contributions to the increased number of population geneticists in several European countries). Although we concentrate on the earlier history of the field, because developments in recent years are more familiar to most contemporary researchers, we end with a brief outline of topics in which new understanding is still actively developing.

Charlesworth & Charlesworth are giants in the field, and they’ve seen a lot of changes over the past few decades. If you are inclined toward a deeper exploration of population genetics with an evolutionary focus, then Elements of Evolutionary Genetics is the book for you.

• Category: Science • Tags: Genetics 

The Emperor Heraclius was a great man. It’s a shame most people don’t know more about him. His campaigns against the Persians in the early 7th century were truly audacious. But, he also lived long enough to witness the loss of Syria and Egypt. If you haven’t read it, I would highly recommend A History of the Byzantine State and Society.

In any case, I was double-checking the marriage to his niece Martina because of some comments below, and came upon this interesting passage on Wikipedia:

Martina and Heraclius had at least 10 children, though the names and order of these children are questions for debate…

Of these at least two were handicapped, which was seen as punishment for the illegality of the marriage.

The coefficient of relatedness between uncles and nieces is 1/4. That is twice as close as first cousins, and the same as that between half-siblings. It isn’t entirely surprising that debilities would show up at this genetic distance, though two out of ten at that extreme might be a bit high.
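The fractions above follow from simple path counting: for each common ancestor, count the number of parent-child steps linking the two relatives through that ancestor, and sum (1/2)^steps over all such paths. A minimal sketch, assuming the common ancestors are themselves unrelated and not inbred:

```python
from fractions import Fraction

def relatedness(paths):
    """Sum (1/2)^L over each path of L parent-child steps via a common ancestor."""
    return sum(Fraction(1, 2) ** length for length in paths)

uncle_niece = relatedness([3, 3])    # two shared ancestors (the uncle's parents), 3 steps each
first_cousins = relatedness([4, 4])  # two shared grandparents, 4 steps each
half_siblings = relatedness([2])     # one shared parent, 2 steps
child_inbreeding = uncle_niece / 2   # inbreeding coefficient of an uncle-niece child

print(uncle_niece, first_cousins, half_siblings, child_inbreeding)
```

The last figure is the one that matters for the children: an inbreeding coefficient of 1/8 means that, on average, an eighth of their loci carry two copies identical by descent, which is where recessive conditions can surface.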

• Category: Science • Tags: Genetics 

One of the interesting things about genetics, and population genetics even more specifically, is how the theory and analysis outran the biophysical mechanism of the phenomenon. By this, I mean that the Mendelian laws inferred from transmission of physical characteristics predate any understanding about how genes were embedded within chromosomes, let alone the structural nature of DNA.

Population genetics, which fused the quantitative evolutionary thinking of the biometrical school with Mendelism, arguably outran the data by decades. Until the molecular evolution revolution of the 1960s controversies such as the role of selection and drift in shaping variation were rhetoric rich and data poor. Though the allozyme era was clarifying, I do think people who were shaped by that era get a bit fixated on being in a particular camp. In contrast, with the genomics revolution many researchers seem to be more willing to let the data speak, because the data is so copious. A model that is relevant in one part of the tree of life may not be as predictive in another portion of it.

The rise of data makes old questions live again. With that, I present a paper in PNAS where the first author is John Wakeley, a pioneer of coalescent theory, Effects of the population pedigree on genetic signatures of historical demographic events:

Genetic variation among loci in the genomes of diploid biparental organisms is the result of mutation and genetic transmission through the genealogy, or population pedigree, of the species. We explore the consequences of this for patterns of variation at unlinked loci for two kinds of demographic events: the occurrence of a very large family or a strong selective sweep that occurred in the recent past. The results indicate that only rather extreme versions of such events can be expected to structure population pedigrees in such a way that unlinked loci will show deviations from the standard predictions of population genetics, which average over population pedigrees. The results also suggest that large samples of individuals and loci increase the chance of picking up signatures of these events, and that very large families may have a unique signature in terms of sample distributions of mutant alleles.

The paper is open access, so read the whole thing. The major math is tucked away in the extended material. Many of the formalisms in the text are those you’d regularly encounter in population genetics. The issue they’re addressing here is the fact that real populations exhibit pedigree structure, and even unlinked loci, which we treat as independent evolutionary histories, share a pedigree history.

If you read the text though it is notable how robust standard population genetic inferences are to the fact that in a literal sense they’re based on false assumptions. Massive demographic expansion (e.g., Genghis Khan haplotype) and unrealistic selection coefficients don’t seem to disturb the lineages enough so that the assumption of independent assortment starts to become misleading.

This shouldn’t be entirely surprising. I would argue that genomics has not really revolutionized evolution or population biology. The big frameworks are vindicated because nature is one, and the glimmers of reality you see in sparse data nevertheless sample from a comprehensible underlying distribution. As we get more data we’re getting more clarity, but the overall picture is not shocking or surprising.

Citation: John Wakeley, Léandra King, and Peter R. Wilton, Effects of the population pedigree on genetic signatures of historical demographic events

• Category: Science • Tags: Evolution, Genetics 

The mutation rate in human evolution and demographic inference:

The germline mutation rate has long been a major source of uncertainty in human evolutionary and demographic analyses based on genetic data, but estimates have improved substantially in recent years. I discuss our current knowledge of the mutation rate in humans and the underlying biological factors affecting it, which include generation time, parental age and other developmental and reproductive timescales. There is good evidence for a slowdown in mean mutation rate during great ape evolution, but not for a more recent change within the timescale of human genetic diversity. Hence, pending evidence to the contrary, it is reasonable to use a present-day rate of approximately 0.5 × 10^−9 bp^−1 yr^−1 in all human or hominin demographic analyses.

Even since this review came out there has been new work. Fast changing.

• Category: Science • Tags: Genetics 

Deep Sequencing of 10,000 Human Genomes:

We report on the sequencing of 10,545 human genomes at 30-40x coverage with an emphasis on quality metrics and novel variant and sequence discovery. We find that 84% of an individual human genome can be sequenced confidently. This high confidence region includes 91.5% of exon sequence and 95.2% of known pathogenic variant positions. We present the distribution of over 150 million single nucleotide variants in the coding and non-coding genome. Each newly sequenced genome contributes an average of 8,579 novel variants. In addition, each genome carries in average 0.7 Mb of sequence that is not found in the main build of the hg38 reference genome. The density of this catalog of variation allowed us to construct high resolution profiles that define genomic sites that are highly intolerant of genetic variation. These results indicate that the data generated by deep genome sequencing is of the quality necessary for clinical use.

The 30x means that they’re hitting each base an average of 30 times, so they can be very confident of their call. This matters a lot for rare variants, as might be useful when it comes to idiopathic diseases. The 10,000 number is obviously to take it a step beyond the “1,000” genomes, which went well above 1,000 genomes in any case. But the coverage means that these are very confident calls for any given individual.
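To see why average coverage matters, here is a minimal sketch that treats per-base read depth as Poisson distributed and asks how often a base ends up covered by fewer than 10 reads. Both the Poisson model and the threshold of 10 are illustrative assumptions; real sequencing depth is more uneven than Poisson, so these numbers are optimistic.

```python
import math

def prob_depth_below(threshold, mean_coverage):
    """P(depth < threshold) when per-base depth is Poisson with the given mean."""
    return sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(threshold)
    )

print(prob_depth_below(10, 30))  # deep sequencing: thin coverage is very rare
print(prob_depth_below(10, 4))   # low-pass sequencing: thin coverage is the norm
```

Under this idealization, at 30x almost no bases fall below ten reads, whereas at 4x nearly all of them do, which is the difference between calling a singleton variant confidently and guessing at it.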

A distribution of variants shows that their panel of unrelated individuals (~8,000) yields ~150,000,000 single nucleotide variants (out of a genome of 3,000,000,000 bases). You see that half of these 150 million are found at counts of one across their whole sample set. In contrast, you have ~5 million variants present at allele frequencies of about 5% or more, and a bit more than ~10 million variants at 1% or more, and ~20 million variants at 0.1% or more. Remember that the 1000 Genomes paper reported that each individual within their data set has about ~5 million variants in comparison to the human reference genome.

I reiterate these dull numbers to give people a sense of what it means to have 100,000 to 1 million marker SNP-chips in humans. It is true that without imputation these chips aren’t capturing a lot of functional variants (though they’re typically designed to target a lot of the most important disease markers in particular). But when it comes to capturing the shape of genetic variation they’re a very good sampling indeed. Consider, for example, the proportion and number of voters who are part of the sample for exit polls or pre-election surveys. For standard PCA or genotypic model based clustering (e.g., ADMIXTURE/STRUCTURE) anything more than 1 million markers adds little from what I’ve seen, and the 100,000 to 500,000 interval is sufficient for pretty much everything. And haplotype based methods that generally use phasing, like fineSTRUCTURE, seem to do fine in the ~250,000 marker range.

• Category: Science • Tags: Genetics 

There have been some media “explainers” about how genetics can’t speak to Elizabeth Warren’s Native American heritage. This is a complicated issue, and not all the assertions in the media pieces I’ve seen are wrong, but a lot of the details are very confused or wrong. In sum, this is very bad journalism from people who don’t know where to start, and had no idea they were relaying confusions or falsehoods. (I’m being generous here in assuming they didn’t know that they were repeating falsehoods)

The point of this post isn’t to get too involved in the political points. Or even to argue that Elizabeth Warren should take a genetic test (I don’t think she should unless she wants to for other reasons besides the political sideshow, but that’s my personal opinion). Rather, I think that genetics is being distorted for the sake of political points and demerits. That is not optimal. Normally I don’t do much “fisking” type posts, but this is necessary at this point.

Let’s start with The Washington Post, Sorry, Scott Brown: A DNA test can’t tell us if Elizabeth Warren has Native American roots.

First, the title is false. If a few percent of Elizabeth Warren’s ancestry was derived from people whose ancestors lived in the New World before 1492, then it would be visible on a PCA with Europeans and Native Americans. She’d be shifted a bit toward Native Americans.

Second, the journalist at The Washington Post interviews someone with serious credentials to serve as a primary source:

Nanibaa’ Garrison is a bioethicist and assistant professor of pediatrics at Seattle Children’s Hospital. A Native American, she earned a PhD in the Department of Genetics at Stanford, with a dissertation focused on ancestry.

She certainly has done genetic research, but I’m not sure that she can speak to modern genomic inference, which has advanced a lot in the past ten years.


That’s because determinations of ancestry are based on “ancestry-informative markers” — genetic flags that offer probabilities of the likelihood of certain ancestries. Most of those markers, AIMs, are “based on global populations that are outside of the U.S.,” she said, “primarily people of European descent, people of Asian descent and people of African descent.

Those three populations are not enough to determine how much Native American ancestry a person has.

AIMs were popular in the 2000s. Basically they are panels of usually fewer than 100 markers with very large frequency differences between your populations of interest. But today most people would not use AIMs unless cost is a major issue (e.g., I’ve seen that AIMs are still used sometimes in work from developing nations because they can’t afford SNP-chips). So all the talk about AIMs is totally irrelevant to the question at hand.

Today you can download data sets with hundreds of thousands, and in the case of the 1000 Genomes data, millions of markers. These are still ascertained for polymorphisms, that is, variants. But they’re really not AIMs in the classical sense as they are not targeted to a narrow set of populations, but look for variation across most human groups.

Also, panels are not restricted to three populations. You can get plenty of indigenous American samples from various public panels, as well as looking in the 1000 Genomes Peruvian data set. The focus on three populations is again an artifact of 2005, probably due to the HapMap era (CEU, YRI, CHB+JPT, if you know what I mean).


Warren’s understanding of her heritage was that she was part Cherokee, perhaps as little as 1/32nd based on outside sleuthing. (Brown dismissed that claim specifically on this week’s call.) The odds of identifying a particular tribal identity are essentially zero, according to Garrison, but such a small percentage of Native American blood would also make identification much harder, even if the necessary AIMs existed.

Again, AIMs are irrelevant. This is like explaining that Netflix won’t work because of 56K modem download speeds. Most people don’t use 56K modems anymore. The 1/32 fraction may be an issue, but not because ~3% is not detectable. It is. A few years ago I stumbled onto the fact that geneticist Dan MacArthur is ~2% South Asian. He checked, and his brother is in the same range, while his father is about double. It turns out that he had an ancestor who was an officer in the British army in India….

The bigger problem here is that as you proceed back generations you are less and less likely to have genetic segments from any given ancestor. So if you had an ancestor 200 years ago who was Native American, even if they were 100% Native American, you may not have any genetic segments from that individual.

So, the article says:

Even a test that was fine-tuned to pick out Native American identity might not find any on Warren’s genes, because the requisite markers simply may not have made the cut over multiple generations.

This is correct. But you probably do have segments from someone five generations back. There’s about a 5-10% chance that you wouldn’t inherit any segments from an ancestor at that remove. The expert consulted by The Washington Post states:

“It would be impossible to go back that far,” Garrison said. “One-32nd is low enough that, even if she does have Native American ancestry, just by chance the genes that show up on these AIM panels might not necessarily be passed down, even if she might have other genetic variants that are highly prevalent among Native Americans. It’s all just by chance, what you inherit from your parents.”

As I said, AIMs are irrelevant. Today you would use dense SNP-chip panels or even whole genome sequencing. But even with AIMs if you had 100 well distributed throughout the genome it would be quite possible to detect divergent ancestry from the rest of the genome. It is not “impossible” as asserted. The source is just incorrect.


“There’s a confidence interval that’s associated with [the results],” Garrison said. “That confidence interval can be very wide, especially when you’re talking about such low ancestral contribution.” So maybe Warren gets the results back and it says that she’s Native American — but that it can only be determined with 20 percent confidence. Scott Brown might not be convinced.

This is only an issue with AIMs. You can get results of 3% back pretty robustly. And it would show up on PCA too.

Then there are weird tangents, which I think exist to make the author look like they’ve “done their research” and reassure the lay audience:

Huntington disease, for example, can be spotted in DNA — but the test wouldn’t tell you when the disease might develop, which doesn’t do you much good if you’re worried about a four-year window. “There are so many different environmental factors or dietary factors and other health behaviors that would feed into whether or not a disease might develop and what time in their life it would develop,” Garrison said, making that sort of prediction impossible. (For now, at least.)

I’m not a medical geneticist, but I think the example of Huntington’s is kind of strange to put here (perhaps because people know about it?). It’s really well genetically characterized. From the link provided in the article:

As the altered HTT gene is passed from one generation to the next, the size of the CAG trinucleotide repeat often increases in size. A larger number of repeats is usually associated with an earlier onset of signs and symptoms. This phenomenon is called anticipation. People with the adult-onset form of Huntington disease typically have 40 to 50 CAG repeats in the HTT gene, while people with the juvenile form of the disorder tend to have more than 60 CAG repeats.

Individuals who have 27 to 35 CAG repeats in the HTT gene do not develop Huntington disease, but they are at risk of having children who will develop the disorder. As the gene is passed from parent to child, the size of the CAG trinucleotide repeat may lengthen into the range associated with Huntington disease (36 repeats or more).

Warren is old enough that she is unlikely to have 60 repeats or more. But Huntington’s is one of those diseases where we have a good sense of age of onset, because its triplet repeat length is inversely correlated with age of onset: more repeats, earlier onset.
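The repeat ranges in the passage quoted above are simple enough to write down as code. This is only an illustrative Python sketch: the function names and the example sequence are made up, counting the longest uninterrupted CAG run is a simplification of how repeats are actually sized, and none of this is a clinical tool.

```python
import re

def longest_cag_run(seq: str) -> int:
    """Length, in repeat units, of the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", seq.upper())
    return max((len(run) // 3 for run in runs), default=0)

def classify_repeats(n: int) -> str:
    """Map a CAG repeat count onto the ranges given in the quoted passage."""
    if n >= 36:
        return "disease-associated range (36 or more)"
    if n >= 27:
        return "intermediate (27-35): not expected to develop the disease, but children are at risk"
    return "below the ranges discussed in the quoted passage"

# Made-up sequence for illustration only; this is not real HTT sequence.
example = "ATG" + "CAG" * 42 + "CCGCCA"
n = longest_cag_run(example)
print(n, classify_repeats(n))
```

The point is that the test result is a count, and the count itself carries information about likely onset.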

Next we have an article in Slate, A DNA Test Won’t Explain Elizabeth Warren’s Ancestry. First:

But here’s the thing: DNA testing cannot definitively prove whether a person is Cherokee. Or a member of any community, at least not reliably. To assume it can is to assume that there’s something inherently different in the genetic makeup of tribal members and that this thing is universal within that community. That’s not true.

Strawman. We’re always talking probabilities. Then:

The problem is that DNA snippets, or markers, are inconsistent. Sometimes they are passed on and sometimes they are not, and whether they are or aren’t is random. Sure, a large percentage of Native Americans may share certain genetic markers. But many Native Americans may lack the same marker, and many non–Native Americans may carry it by coincidence.

I don’t have a good sense of what the author is trying to get at, though I think there’s something underlying all this verbiage. The issue that allele frequencies are not (usually) disjoint across populations is well known. That’s why modern SNP-chip panels use hundreds of thousands of markers. Much of the Slate article is engaging a strawman when it comes to genetics because it acts as if we’d actually rely on a few markers, though perhaps not in the public’s perceptions of how these things work. In the latter case, the author could simply put in this sentence: “genetic tests to detect ancestry usually rely on hundreds of thousands of markers today, not only a few….”
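A toy simulation shows why the number of markers matters so much. The Python sketch below is not how any DTC company does it. It invents two populations whose allele frequencies differ by only 0.1 at every marker (so no single marker is diagnostic), draws diploid genotypes, and assigns each simulated person to whichever population gives the higher likelihood. All names, the frequency range, and the seed are assumptions made for illustration.

```python
import math
import random

def simulate_assignment(n_markers: int, n_people: int = 200,
                        delta: float = 0.1, seed: int = 1) -> float:
    """Fraction of simulated individuals assigned to their true population."""
    rng = random.Random(seed)
    # Hypothetical allele frequencies: population B differs from A by only `delta`.
    freq_a = [rng.uniform(0.2, 0.7) for _ in range(n_markers)]
    freq_b = [p + delta for p in freq_a]
    correct = 0
    for i in range(n_people):
        true_is_a = (i % 2 == 0)
        freqs = freq_a if true_is_a else freq_b
        log_lr = 0.0  # log-likelihood ratio, population A versus population B
        for pa, pb, p in zip(freq_a, freq_b, freqs):
            # Diploid genotype: copies of the allele across two chromosomes.
            g = (rng.random() < p) + (rng.random() < p)
            log_lr += g * math.log(pa / pb) + (2 - g) * math.log((1 - pa) / (1 - pb))
        if (log_lr > 0) == true_is_a:
            correct += 1
    return correct / n_people

for m in (10, 100, 2000):
    print(f"{m} markers: {simulate_assignment(m):.2f} assigned correctly")
```

With a handful of markers the assignment is only somewhat better than a coin flip; with thousands of individually weak markers it becomes very reliable, even though every allele is present in both populations.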

This lack of specifics crops up over and over:

So when a DNA test comes back saying you are 28 percent Finnish, all it’s really saying is that of the DNA analyzed (most companies don’t analyze all of your DNA), 28 percent of it was most similar to that of a completely Finnish person. In the end, these comparisons are a fun but ultimately unreliable way to think about the possibilities of whom your ancestors might have been, rather than definitive proof of your ethnic background.

There’s a link in the piece that takes you to a 2007 piece on how DTC tests aren’t all they’re cracked up to be. 2007 is ages in genomics. So ignore that. Second, the selection of Finnish is unfortunate for the author, as Finns are actually one of the more genetically distinctive European populations out there because of a small effective population size. So, for example, one of my friends has a grandfather whose parents were from Finland. 23andMe says she is 19% Finnish. It’s simply wrong that it’s “unreliable.” With segment matching it’s quite reliable if you get a positive hit assuming you set the genetic distance threshold high enough. Also, depending on how you delimit “ethnic background” it can be quite definitive. Samples from Northern Europe never show much evidence of African ancestry. A minority of white Americans do. That’s not a coincidence.

As in The Washington Post, the author of the Slate piece has an authority who lays down the truth as they see it:

“Scientists who don’t know better claim that when more Natives are sampled they’ll have better data bases, i.e. more Native markers,” said Kim TallBear, professor of Native studies at the University of Alberta in a 47-tweet takedown of Brown’s remarks about Warren. “[Geneticists] think that with more markers, and greater historical-genetic resolution they’ll be able to pinpoint tribe-specific markers.” But this does not account for the fact that people are continuously moving and reproducing with other, diverse people. They mix their genetic code with other communities (as they always have, going back to the dawn of our species). If anything our DNA is getting more muddled, not more clear.

Can you read a paper like The genetic structure of the world’s first farmers, and believe this? Geneticists who work in historical population genomics are quite familiar with the ideas of migration and gene flow. More data is clarifying, just as science should be.

The first authority cited in The Washington Post did some legitimate science at some point, though a bit outside of the core area of expertise she was being consulted on, and her knowledge definitely seems out of date (the constant talk about AIMs is a good tell here). Kim TallBear’s publications are quite different….

The author of the Slate piece ends:

Another issue is limited and inconsistent data., for example, divides the world up into 26 genetic regions and uses just 115 samples to create the representative of each region—a very small sample size. And different companies place different weight on these samples, which come from burial grounds, modern isolated communities, and academically published data, like the Human Genome Diversity Project. For the consumer, this means if you don’t like your heritage results, try a different company. You’ll get a completely different breakdown.

Whether there’s any harm in people basing their identity on faulty reasoning is unclear, but the success of these commercial endeavors proves that at the very least, consumers find it kind of fun. Genetic testing is basically just a low-cost way to get a blurry picture of whom your ancestors might have been related to.

First, the author needs to issue a correction. I immediately knew didn’t use 115 samples; that’s just too low. Fifteen seconds of Google shows me that they have a sample size of 3,000. No idea where 115 samples comes from, and I don’t care. He’s wrong. Slate should correct this. [see addendum; I may have misunderstood or been too harsh here, but a different point then crops up….]

Second, it’s misleading to say the picture is “blurry.” No, arguably it’s overly precise, and misleads people. Many of these ancestry inferences are quite precise and robust. They don’t vary between replicates that much even though they have a stochastic parameter. But model based clustering gives results conditioned on a model. The results themselves are then sensitive to the parameters you’re putting into the model. The different regions from different DTC companies and sample sets are these different conditions.

This isn’t mysterious or difficult to understand. If you want to separate your individuals into Africans and non-Africans all the non-Africans will go into one cluster. This is robust, precise, and highly reproducible. In fact, a non-African individual will never be clustered with Africans with normal SNP-chip densities. At least not in the thousands of iterations I’ve personally run and inspected. Similarly, as you separate populations further you’ll see reasonable and comprehensible divisions.

The problems crop up when you begin to slice and dice very close genetic groups, where there isn’t much between-population difference. This is what happens in Northern Europe, and this is where most of the DTC firms’ client base is from. So this causes problems, and often difficult to interpret results. Moderate changes in parameters then can produce divergent results because the question we’re trying to get at is really hard to resolve with the data on hand, less than one million SNPs.

There are ways to resolve this. And that has to do with more data. In particular, whole genome sequencing at high coverage can pick up very rare alleles, which are highly informative of more recent genealogical history, and so divide up even Northern Europeans in a way that is more comprehensible and historically accurate.

But really the problem isn’t with the data. We have very dense SNP-chip markers now. The problem isn’t with the methods. We have genotype and haplotype-based methods which can make pretty strong inferences, especially at the intercontinental level (e.g., a friend who is 1/4 Japanese genealogically comes out to be 24% Japanese genomically; the rest is European). The problem is that the public, including journalists, aren’t always clear what the results are telling them. Sometimes the DTC companies themselves may be at fault because of their unclear communication. And to be frank, Henry Louis Gates Jr. has, in my opinion, often sown a lot of confusion as well with his television show, informative as it may be.

Looping back to Elizabeth Warren, the biggest issue with her maybe not having any indigenous ancestry combined with a Cherokee ancestor five generations back is that the Cherokee nation in the 19th century was already genetically mixed. The great chief John Ross was 1/8th Cherokee by blood quantum. That is, 1/8th of his ancestors were present in the New World in 1492. So a simple reason for why Elizabeth Warren might be Cherokee, but without indigenous ancestry, is that her Cherokee ancestor may not have had much indigenous ancestry. It’s not that genetics can’t pick up indigenous ancestry; it can. It’s just that this is a case where social and cultural history and definitions are important.
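The arithmetic behind this is worth spelling out. The short Python sketch below computes only the expected share of the genome, halving once per generation; the actual amount inherited from any one ancestor varies around that expectation because of recombination. The ancestor with 1/8 indigenous ancestry is a hypothetical modeled on Ross’s blood quantum, not a claim about Warren’s actual genealogy, and the function name is made up.

```python
from fractions import Fraction

def expected_fraction(ancestor_fraction: Fraction, generations_back: int) -> Fraction:
    """Expected share of the genome tracing to a source population via one ancestor."""
    return ancestor_fraction * Fraction(1, 2) ** generations_back

# One ancestor five generations back, with 100% indigenous ancestry.
fully_indigenous = expected_fraction(Fraction(1), 5)
# Same remove, but the ancestor is 1/8 indigenous (hypothetical, like John Ross).
mixed_like_ross = expected_fraction(Fraction(1, 8), 5)

print(fully_indigenous, f"{float(fully_indigenous):.2%}")  # 1/32, about 3%
print(mixed_like_ross, f"{float(mixed_like_ross):.2%}")    # 1/256, about 0.4%
```

An expectation of roughly 3% is in the range discussed above as detectable; an expectation of well under 1% is a much harder target, and it has nothing to do with the limits of AIMs.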

To be honest this post is a bit trivial. But lots of people read The Washington Post and Slate. As I just explained above there is a simple reason why Elizabeth Warren could come out 100% European in her ancestry, and, be of Cherokee descent. Instead of explaining this, the media has decided to look for people who claim that genetics just can’t answer this question. In the process they garble, mislead, and repeat falsehoods (the sample size for is obviously wrong to anyone who is familiar with that field, but the journalist is not familiar, so it passed their smell test since they had no grounds for discernment).

This post exists only so that at least there is someone out there correcting the record.

Note: I am a consultant for Gene By Gene and was a developer for their MyOrigins tool. This is one reason I know a lot about DTC genetic companies. But it also means I have a conflict of interest, as I think DTC genomics is useful with the proper caveats.

Addendum: A reader:

This seems, um, contrivedly obtuse. 115 samples per region times 26 regions is a total sample size of 2990, which seems reasonably close to 3000. Going the other way, 3000 / 26 is 115.4, so that will be where the claim of “115 per region” came from. There was no claim of “115 total”; the piece says that the representative of each region is constructed from 115 samples.

It’s true that 115 is an average figure and that’s not made clear in the article, but I’m not sure how comforting I should find it that the representative of “Polynesia” is actually constructed from 18 samples rather than 115.

A fair, but inadvertently ignorant, point. Sample sizes of ~20 are actually quite sufficient to generate reference populations. It partially depends on how diverse the populations you are trying to use as a reference are.

• Category: Science • Tags: Genetics 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"