The Unz Review: An Alternative Media Selection

A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

 Gene Expression Blog


Over at Eurogenes there was a casual mention of the Simons Genome Diversity Project. This is kind of a big deal, because, as the map above shows, this project has really good population coverage. Additionally, the quality of the genomes is very good. The vast majority are sequenced to 30× to 50× coverage, and all are above 20× (coverage gives you a sense of how many times a particular position is likely to have been sampled by sequence reads, and therefore how accurate your call for a polymorphism is likely to be).
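As a rough illustration of what those coverage figures buy you, here is a minimal Python sketch. It assumes read depth at each site is Poisson-distributed around the mean coverage (the classic Lander-Waterman idealization); real sequencing data are overdispersed, so treat the numbers as optimistic. The function names and the 10-read threshold are hypothetical choices for illustration, not anything from the project.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a site is covered by exactly k reads, given mean depth `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_depth_at_least(k: int, mean: float) -> float:
    """Probability that a site is covered by at least k reads under the Poisson model."""
    return 1.0 - sum(poisson_pmf(i, mean) for i in range(k))


if __name__ == "__main__":
    for mean_depth in (20, 30, 50):
        # Expected fraction of sites with at least 10 reads supporting a genotype call.
        print(mean_depth, round(prob_depth_at_least(10, mean_depth), 6))
```

The point of the comparison is simply that moving from 20× to 30× or 50× pushes ever fewer sites into the thinly sampled tail where heterozygotes are easy to miscall.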

The tranche of papers for this project doesn’t seem to have started coming out yet, and the researchers have asked people to not report genome-wide results if they get permission to download it. It’s a total of 10 terabytes of data. Below is a table of the populations with counts. Remember that with high coverage whole genomes even one individual can give you lots of inferences (e.g., PSMC).

Population N
Abkhasian 2
Adygei 2
Albanian 1
Aleut 2
Altaian 1
Ami 2
Armenian 2
Atayal 1
Australian 3
Balochi 1
BantuHerero 2
BantuKenya 2
BantuTswana 2
Basque 2
BedouinB 2
Bengali 2
Bergamo 1
Biaka 2
Bougainville 2
Brahmin 2
Brahui 2
Bulgarian 2
Burmese 2
Burusho 2
Cambodian 2
Chane 1
Chechen 1
Chukchi 1
Czech 1
Dai 5
Daur 1
Dinka 3
Druze 2
Dusun 2
English 2
Esan 2
Eskimo_Chaplin 1
Eskimo_Naukan 2
Eskimo_Sireniki 2
Estonian 2
Even 3
Finnish 3
French 3
Gambian 2
Georgian 2
Greek 2
Han 4
Hawaiian 1
Hazara 2
Hezhen 2
Hungarian 2
Icelandic 2
Igorot 2
Iranian 2
Iraqi_Jew 2
Irula 2
Itelman 1
Japanese 3
Jordanian 3
Ju_hoan_North 4
Kalash 2
Kapu 2
Karitiana 4
Khomani_San 2
Khonda_Dora 1
Kinh 2
Korean 2
Kusunda 2
Kyrgyz 2
Lahu 2
Lezgin 2
Luhya 2
Luo 2
Madiga 2
Makrani 2
Mala 2
Mandenka 4
Mansi 2
Maori 1
Masai 2
Mayan 2
Mbuti 4
Mende 2
Miao 2
Mixe 3
Mixtec 2
Mongola 2
Mozabite 2
Naxi 3
North_Ossetian 2
Orcadian 2
Oroqen 2
Palestinian 3
Papuan 16
Pathan 2
Piapoco 2
Pima 2
Polish 1
Punjabi 4
Quechua 3
Relli 2
Russian 2
Saharawi 2
Samaritan 1
Sardinian 3
She 2
Sindhi 2
Somali 1
Spanish 2
Surui 2
Tajik 2
Thai 2
Tlingit 2
Tu 2
Tubalar 2
Tujia 2
Turkish 2
Tuscan 2
Ulchi 2
Uygur 2
Xibo 2
Yadava 2
Yakut 2
Yemenite_Jew 2
Yi 2
Yoruba 3
Zapotec 2

To mark the release of the 1000 Genomes papers, here are pedigree files with the 2,500 1000 Genomes samples. The 290,000 SNPs included are those that overlap with HGDP and other public SNP-chip data sets. The .fam file has the population IDs. For what it’s worth, I just used PLINK 2 to convert from VCF format.
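For anyone who downloads the files, a PLINK .fam file is plain whitespace-delimited text with six columns per sample (family ID, individual ID, paternal ID, maternal ID, sex, phenotype). The post says the population IDs are in the .fam but not which column, so this minimal sketch assumes they sit in the family-ID column; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def population_counts(fam_lines: Iterable[str]) -> Counter:
    """Count samples per population label in PLINK .fam records.

    Assumes (hypothetically) that the population ID is stored in the first,
    family-ID column. Blank or malformed lines are skipped.
    """
    counts: Counter = Counter()
    for line in fam_lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # not a six-column .fam record
        family_id = fields[0]
        counts[family_id] += 1
    return counts
```

Usage, with a hypothetical file name: `with open("1000g.fam") as handle: print(population_counts(handle).most_common(5))`. If the labels turn out to live in another column, only the `fields[0]` index needs to change.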

• Category: Science • Tags: Genome

Jerry Coyne, an eminent evolutionary geneticist and all-around public intellectual, is retiring, and has posted a bittersweet and hopeful farewell letter to his conventional scientific career. For the general public Coyne is probably more famous as a New Atheist, though he is actually a vocal atheist of long standing. His most recent book was on that topic, Faith Versus Fact: Why Science and Religion Are Incompatible. I’m an atheist, but on balance I demur from many of his positions with regard to religion and science. More precisely, I am quite willing to defend atheism and dismiss religion, but on philosophical or meta-scientific grounds, not on scientific grounds as such.

When it comes to science on the whole, I tend to agree with Coyne more often than not, in particular in his attitude toward the dynamics driving evolutionary processes. As for the science, this section jumped out at me:

What I’m proudest of, I suppose, is the book I wrote with my ex-student Allen Orr, Speciation, published in 2004. It took each of us six years to write, was widely acclaimed and, more important, was influential. I still see that book as my true legacy, for it not only summed up where the field had gone, but also highlighted its important but unsolved questions, serving as a guide for future research.

As readers know, I read Speciation in 2005, and it has really influenced my perspective on the broader topic. It’s an ambitious book even if the focus is on the process of speciation, rather like The Structure of Evolutionary Theory in spirit, though far more economical in prose and clearer in execution. I don’t know whether Speciation is out of date, as I don’t study speciation, but I’d recommend it to anyone who wants to understand how an evolutionary geneticist might view the process and the concept.

• Category: Science • Tags: Evolution


The 1000 Genomes paper is out, A global reference for human genetic variation. It’s open access, read the whole thing. Here’s the abstract:

The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

The PSMC above is interesting to me. It shows BEB, the Bengali population from Dhaka, starting from a small base and exploding in size. There are some issues relating to ascertainment that need to be acknowledged here, though. The Indian Gujarati sample turns out to be about half Patel and half other Gujaratis. In contrast, the Bengalis are relatively homogeneous in ancestry (sampled from Dhaka), and don’t seem to exhibit much population structure. What I’m saying is that when the authors talk about “Gujaratis” they are really talking about “sort of Patels,” while when they talk about Bengalis, they are talking about Bengalis as a whole. There’s an apples-to-oranges aspect to this. It also needs to be kept in mind when they note the alleles private to the Gujarati (GIH) sample; that’s almost certainly due to the large number of endogamous Patels in the original Houston data set, who are going to share a lot more demographic history than you’d otherwise expect among Gujaratis.

Secondly, the bottleneck, together with the genetic homogeneity of admixture among Bengalis, reinforces the model outlined in The Rise of Islam and the Bengal Frontier, 1204-1760. Basically, the population size change above highlights that eastern Bengalis descend from a small group of founders who lived relatively recently, despite their >100 million modern census size. Genetically this has resulted in the ancestral homogeneity you see in the plot above, but culturally it also allowed for the degradation of the social institutions of Indian society which had allowed Hinduism to remain robust through nearly one thousand years of Islamic hegemony across the subcontinent. Additionally, the lack of structure in ancestral components reflects relatively little endogamy (I have checked the runs of homozygosity in my parents’ genotypes, and they’re lower than those of my South Asian friends from particular caste/jati backgrounds).
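Runs of homozygosity (ROH) are long stretches of a genome with no heterozygous sites, and their total length is a common proxy for parental relatedness and endogamy. The sketch below is deliberately simplified and is not how I (or any particular tool) actually ran the check: it assumes genotypes coded as 0/1/2 alternate-allele counts (1 = heterozygous) with no missing calls, and the thresholds and function names are hypothetical. Real tools such as PLINK’s `--homozyg` use sliding windows and tolerate occasional heterozygous or missing calls.

```python
from typing import List, Sequence, Tuple


def runs_of_homozygosity(
    genotypes: Sequence[int],
    positions: Sequence[int],
    min_snps: int = 50,
    min_length_bp: int = 1_000_000,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) for homozygous runs passing both thresholds.

    Genotypes are alternate-allele counts (0, 1, 2); 1 marks a heterozygote.
    Positions are sorted base-pair coordinates on one chromosome.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    # Append a sentinel heterozygote so a run reaching the last SNP is closed.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1 and start is None:
            start = i  # a homozygous stretch begins
        elif g == 1 and start is not None:
            end = i - 1  # last homozygous SNP of the stretch
            n_snps = end - start + 1
            if n_snps >= min_snps and positions[end] - positions[start] >= min_length_bp:
                runs.append((positions[start], positions[end]))
            start = None
    return runs


def total_roh_length(runs: Sequence[Tuple[int, int]]) -> int:
    """Sum of run lengths in base pairs, a simple per-individual summary."""
    return sum(end - start for start, end in runs)
```

Comparing `total_roh_length` across individuals (summed over chromosomes) is the kind of comparison described above: more endogamous ancestry tends to leave more, and longer, runs.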

• Category: Science • Tags: Genomics

As most of you who regularly read me know, I’m not too interested in persuading people of things. I think that the truth is what it is, and that through a collaborative process of searching for it we’ll all eventually converge upon it, given enough time (which is a big condition!). Rather, the goal on this weblog is to create a set of like-minded readers and explorers. I know some of you think I’m smart and well read, but the point is that I don’t really care what you think of me. And similarly, I hope you don’t care what I think of you. The truth as we understand it is its own reward: a sweetness of discovery and comprehension which most people neither seek nor desire. Rather, they’d prefer to run with their own horde of fellow travelers.

With that out of the way, I was curious what books readers had purchased over the years. I’ve been an Amazon affiliate for over 15 years, mostly because I do so much book-blogging. Amazon’s records for me currently go back to 2010. In that time over 5,000 books have been purchased through links on this website. So what are the top 30? (I picked that number because these are the titles with counts well above N = 10.) It’s probably no surprise that The Fall of Rome: And the End of Civilization tops the list. I’ve read this book three times cover to cover since 2006. It has really shaped my perception of how we can understand history in a positive, rather than just interpretative, sense. Second, I’m rather proud that I’ve somehow been involved in ~20 purchases of Principles of Population Genetics. These were people who didn’t purchase it for a class, but because they were interested in the topic. Finally, I have no idea why so many people bought Different Brains, Different Learners. I had never heard of this book before today. No surprise that no fiction is in the top 30.

Rank Books
1 The Fall of Rome: And the End of Civilization
2 The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World
3 War in Human Civilization
4 Empires and Barbarians: The Fall of Rome and the Birth of Europe
5 The Shape of Ancient Thought: Comparative Studies in Greek and Indian Philosophies
6 Theological Incorrectness: Why Religious People Believe What They Shouldn’t
7 Ancestral Journeys: The Peopling of Europe from the First Venturers to the Vikings
8 Dancing in the Glory of Monsters: The Collapse of the Congo and the Great War of Africa
9 Albion’s Seed: Four British Folkways in America
10 Different Brains, Different Learners: How to Reach the Hard to Reach
11 1493: Uncovering the New World Columbus Created
12 The Blank Slate: The Modern Denial of Human Nature
13 The Journey of Man: A Genetic Odyssey
14 The 10,000 Year Explosion: How Civilization Accelerated Human Evolution
15 Empires of the Silk Road: A History of Central Eurasia from the Bronze Age to the Present
16 Principles of Population Genetics
17 Uncontrolled: The Surprising Payoff of Trial-and-Error for Business, Politics, and Society
18 In Gods We Trust: The Evolutionary Landscape of Religion
19 Population Genetics: A Concise Guide
20 Before the Dawn: Recovering the Lost History of Our Ancestors
21 The Cousins’ Wars: Religion, Politics, Civil Warfare, And The Triumph Of Anglo-America
22 The Dawn of Human Culture
23 Religion Explained
24 The Nurture Assumption
25 The Price Of Altruism: George Price and the Search for the Origins of Kindness
26 War and Peace and War: The Rise and Fall of Empires
27 1491: New Revelations of the Americas Before Columbus
28 Born That Way: Genes, Behavior, Personality
29 Darwin’s Cathedral: Evolution, Religion, and the Nature of Society
30 Empires of the Word: A Language History of the World
• Category: Miscellaneous • Tags: Books


Congratulations are obviously in order.

Update: For those who are not familiar with the paper, Genes mirror geography within Europe.

• Category: Science • Tags: Science

I’ve been on the internet for over 20 years. When I initially got on the net I remember interacting with people who lived in England, and it was so cool! At one point I recall getting into a talk session with someone who lived in Ecuador. If you lived through the era of Wired circa 1995 to 1999 you remember all the talk about how the internet was going to make location irrelevant, and we were going to congeal into a world cross-linked by cyber-connections. In the mid-2000s the Second Life boomlet brought back some of those feelings, but that faded.

Unlike many Americans, I have a lot of family abroad. One of my Facebook friends is my cousin, who happens to be a religious teacher and was brought up in Tablighi Jamaat by an uncle who has long been a partisan of that movement. I know this cousin a bit (I met him when I visited Bangladesh in 1990 and 2004), and he’s a nice enough fellow. He even “likes” some of my posts about personal events (e.g., the births of my children). We’ve had chat sessions here and there. Since my “religion” is listed as “atheist” on my profile, he also knows that about me (he double-checked with me when he became my Facebook friend).

I bring all this up because I hardly ever interact with the cousins on Facebook who live abroad. Rather, my Facebook feed is mostly devoted to people I grew up with in the States, and in particular those I work with or went to school with recently. Basically what you’d expect. Facebook has over 1 billion users, but we’re all in our own cultural silos, chattering amongst ourselves. This isn’t totally surprising, and today it seems banal. Yes, there are millions of people from India on Facebook, but they’re not part of my social graph, and won’t be…unless they immigrate to the United States.

When the internet was young we didn’t anticipate many things about its later development. One was that rather than transforming our social networks, it would simply facilitate them. Yes, e-mail and Facebook have changed the way we interact and socialize. But they’ve probably just amplified and smoothed preexisting trends, rather than changed the underlying dynamic.

• Category: Miscellaneous • Tags: Facebook

I’ve been very busy of late and had to travel this weekend, which explains the relatively light blogging recently. That will probably change in the near future.

While on the airplane I decided to reread my Kindle version of John Gillespie’s Population Genetics: A Concise Guide. The subtitle is accurate: it’s short and quick. Longtime readers know that if you want to “understand” population genetics, you should probably check out Principles of Population Genetics. But that’s not light reading, and, like many textbooks, it doesn’t have a Kindle version (and in many cases the e-book version of a text is as expensive as the print edition, and sometimes more expensive). But for the purposes of following along with some of the more abstruse posts in this space, I have to say that Gillespie’s precis of popgen probably does hit all the major notes. The main demerit is that since it’s short, and was first written in the 1990s, it is not as genomic or coalescent-heavy as a book written today with the same aims and constraints might be (I read the version that came out in 2004, but even in this edition there are some assertions about the limitations of what we know about genetic variation, especially human genetic variation, which no longer hold in our time).

• Category: Miscellaneous • Tags: Open Thread

I haven’t paid much attention to the “three-person babies” controversy, because it seems like a manufactured one. After all, we’re balancing people who might develop a severe illness against vague and inchoate concerns. Very few (though some) biologists that I know of express any concern about this issue. Mostly it seems to be the public, whose fears are stoked by ethicists and religious moralists.

Nature now has an article, The hidden risks for ‘three-person’ babies, which smokes out the major concern. Much of the piece focuses on what looks to be some sort of “hybrid breakdown” due to conflicts between mitochondria and differing nuclear genetic backgrounds in animal models (flies and mice, for example). The worry is that mitochondria, specified by their unique sequences, exhibit functional differences which may interact in a deleterious fashion with different nuclear genomes. In plainer language, there may be a problem when you mix racial heritages. This is not a thesis that I usually see proposed outside of racialist circles, but it’s pretty obviously what the author is dancing around. Rather than focus on animal models, why not acknowledge that there are plenty of “natural experiments” which test this thesis? Here’s the rejoinder on Twitter:

And here’s the relevant section in the article (which is rather abbreviated after all the focus on Drosophila!):

They also pointed out that most of the evidence for risk stems from studies that used strains of flies and mice that had been highly inbred — a process that would increase the genetic differences between the strains and therefore produce a greater ‘mismatch’ when the mitochondria are swapped. They argued that such studies have little relevance for human populations that interbreed all the time. The “lack of any reliable evidence of mitochondrial–nuclear interaction as a cause of disease in human outbred populations”, they wrote, “provides the necessary reassurance to proceed”.

Aside from very rare instances such as in Helgadottir et al., there isn’t any evidence of hybrid breakdown across human populations. Greg Cochran and Henry Harpending looked in the fertility literature about ten years ago, and they found no evidence of depressed fitness. Pontus Skoglund has looked for the sort of purifying selection you see in the Neanderthal-modern human admixture event (the X chromosome and genic regions have less Neanderthal ancestry), and found none. The earliest-branching human populations are the Khoisan, who diverged ~200,000 years before the present, and there’s no good evidence that major incompatibilities exist which are fundamentally racial, except for the one I gave above (that is, where differences are nearly fixed between populations, and crosses have reduced fitness due to lower fitness in the heterozygote state). And of course there are the “natural experiments” of populations across the New World, and places like South Africa, where highly diverged mtDNA lineages are moving into different genetic backgrounds.
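The heterozygote-disadvantage scenario in the parenthetical can be made concrete with the standard one-locus textbook model of underdominance. This is an illustration only, not anything from the studies mentioned, and the fitness values are assumptions: genotypes AA and aa have fitness 1, Aa has fitness 1 - s, with random mating. The interior equilibrium at p = 0.5 is unstable, so selection pushes populations toward fixation of one allele or the other, which is how such incompatibilities end up as nearly fixed differences. A cross between populations fixed for different alleles then yields F1 offspring that are all heterozygotes, with fitness 1 - s.

```python
def next_allele_freq(p: float, s: float) -> float:
    """One generation of selection at a biallelic locus with heterozygote disadvantage.

    Assumed genotype fitnesses: AA = 1, Aa = 1 - s, aa = 1, random mating.
    """
    q = 1.0 - p
    mean_fitness = p * p + 2 * p * q * (1 - s) + q * q
    # Frequency of A after selection: AA contribute two A copies, Aa contribute one.
    return (p * p + p * q * (1 - s)) / mean_fitness


def iterate(p0: float, s: float, generations: int) -> float:
    """Frequency of A after a number of generations of deterministic selection."""
    p = p0
    for _ in range(generations):
        p = next_allele_freq(p, s)
    return p


if __name__ == "__main__":
    # Starting just below or above 0.5 sends the allele toward loss or fixation.
    for p0 in (0.4, 0.6):
        print(p0, round(iterate(p0, s=0.1, generations=200), 4))
```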

Presumably there have to be other issues that people worried about ‘three-person’ individuals are concerned with. What are they? I can accept the functional importance of variation in mitochondria, but I don’t see what this has to do with the three-person individual as opposed to people who are racially mixed. If the different genetic backgrounds are an issue, then it is more defensible to object to interracial relationships than to three-person individuals, since in the latter case the only reason you’re introducing the novel mitochondria is to prevent illness.

• Category: Science • Tags: Genetics

CRISPR as search term in Google

Remember interactive television? In the mid-1990s Microsoft was betting the farm on this new technology. As it happens, they had to make a course correction. The Mosaic browser was the first “killer app” of the internet (sorry, e-mail and Usenet), creating the world wide web as we know it. The rest is history. The lesson is that sometimes no one sees a technology coming. And when it does come, it disrupts the whole landscape. It can both create and destroy. This was clear in my recent post for the Genetics Society of America. The discipline is over 100 years old, and yet over the past 30 years we’ve seen genomics go from being invented as a term (in 1986) to revolutionizing the field, and finally, to a great extent, to becoming coextensive with it. Similarly, the internet existed for two decades before the world wide web came along, but rather soon our conception of “the internet” became synonymous with the web (and e-mail and newsgroups have become absorbed into the web architecture as well).

Similarly, genetic engineering has been around for decades. Direct manipulation of DNA sequences emerged as a technique in the 1970s, and the Asilomar Conference on Recombinant DNA agreed upon a set of guidelines for how the method would be deployed. Despite what you might have gathered from movies such as Gattaca, genetic engineering was both difficult and limited in its power to effect change. Of course, despite public concern, GMO crops have been moving into circulation for years in the United States, while medical research would be hampered without access to engineered mice. But very few people would assert that genetic engineering is ubiquitous today.

The CRISPR/Cas system has the potential to change this. It is easy, cheap, and fast. It can take genetic engineering from a vital niche to a pervasive aspect of human culture. CRISPR first began to gain some attention in scientific circles in 2012. As I write now, in 2015, its presence in discussions relating to genetics can seem ubiquitous, even cloying. Seminars with the word “CRISPR” in the title are suddenly standing room only. If the world wide web is an analogy to what is going on, then we are in 1993. The implication is that we haven’t seen the first Netscape of CRISPR, nor the emergence of a whole economy built around the technology. Right now it is a scientific superstar, but we’ll know that it’s made the “big time” when we see it mentioned ubiquitously on CNBC.

So what’s holding us back? There are two primary things I can think of. First, fear and uncertainty. The regulatory environment is essential for the success of any technology today (well, except Uber!), and the framework is currently ad hoc rather than formalized. The Chinese scientists who modified embryos were only newsworthy because of the bioethical and regulatory consequences of their actions, not the science. It is certainly more significant that a British group is now asking for permission to do experiments on the developmental genetics of embryos using CRISPR technology. The outrage over the modifications last spring had more to do with the breaking of a tacit social norm within science, whereby everyone wants to establish an agreed-upon framework for novel human research, than with concern about the scientific implications. If the British group receives approval, it will set a precedent which could open the door for other reputable researchers.

But what about concrete terms on a near-term horizon? The ability to “edit DNA” sounds incredible in the abstract, and is almost certainly civilization-changing in the long term. But over the next few years it seems likely that CRISPR/Cas will result in the reemergence of gene therapy as a means by which Mendelian diseases may be treated. Gene therapy as a field suffered a major blow in the late 1990s due to a series of fatalities, arguably tied to unethical practices by one researcher. But the idea of curing someone of a genetic illness by modifying the gene responsible for that illness is straightforward in its logic.

Many diseases, such as diabetes or schizophrenia, are complex in their origins. There is no specific gene responsible in the vast majority of instances. The road to genetically engineering a “fix” would be long, and the outcome not assured, in these cases. Any risks would have to be weighed carefully. In contrast, Mendelian diseases are often due to a single locus, and the cause is that precise biological malfunction. And their outcomes are often easy to quantify. Cystic fibrosis takes decades off your life expectancy, and entails hundreds of thousands of dollars in extra costs over a lifetime. There is some debate as to the frequency of Mendelian disease within the population, but something on the order of ~10% of the American population seems likely using a very liberal definition of disease. If only a percent or two of these have illnesses of some severity, that may still justify intervention if it is feasible and safe.

The feasibility of gene editing to cure Mendelian disease is conditional partly on the mode of delivery, which is not a genetic concern per se. That is, how do you modify a sufficient number of cells in a living human’s body to result in a change in function? A second concern is “off-target” changes. You may be attempting to modify one thing but modify another, in which case you’ve gone from the frying pan into the fire. Both of these, though, seem to be soluble problems over the time scale of a decade (CRISPR precision has gotten better even in the past few years). And for diseases such as sickle cell and cystic fibrosis, which entail shortened lives and constant monitoring and treatment, the perfect can’t be the enemy of the good. In the near future the ethical question will not be whether, but why not.
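To make the off-target worry concrete, here is a toy Python sketch of the first step most guide-design tools perform: scanning a sequence for sites that resemble a guide. It assumes a 20-nucleotide SpCas9-style guide immediately followed by an NGG PAM, searches only the forward strand, and counts simple mismatches. Real off-target prediction also considers the reverse strand, DNA/RNA bulges, and position-dependent mismatch tolerance. The function name and example sequences are hypothetical.

```python
from typing import List, Tuple


def candidate_sites(genome: str, guide: str, max_mismatches: int = 3) -> List[Tuple[int, int]]:
    """Return (position, mismatch_count) for forward-strand sites resembling `guide`.

    A site qualifies if the len(guide) bases at `position` differ from the guide
    at no more than `max_mismatches` positions and are immediately followed by
    an NGG PAM (N = any base).
    """
    genome, guide = genome.upper(), guide.upper()
    n = len(guide)
    hits: List[Tuple[int, int]] = []
    for i in range(len(genome) - n - 3 + 1):
        pam = genome[i + n : i + n + 3]
        if pam[1:] != "GG":
            continue  # no NGG PAM, so this toy model assumes Cas9 cannot cut here
        protospacer = genome[i : i + n]
        mismatches = sum(a != b for a, b in zip(protospacer, guide))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits


if __name__ == "__main__":
    guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt guide
    # Intended target (exact match + TGG) and a one-mismatch decoy (+ CGG).
    genome = "TT" + guide + "TGG" + "CCCC" + "ACGTACGTACGTACGTACGA" + "CGG" + "TT"
    print(candidate_sites(genome, guide))  # [(2, 0), (29, 1)]
```

The second hit is the frying-pan-to-fire case: a site one base away from the intended target that the nuclease might also cut.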

When that happens you will see a shift in the medical system in the United States. Instead of attempting to tackle the symptoms of Mendelian diseases, physicians will plausibly offer the possibility of eliminating the root cause. This will make some companies very rich, as health care is a growing sector of our economy. Sequencing will be ubiquitous and obligatory, with exemptions requiring a request, rather than sequencing being elective. In a classic sense it will be a “win-win,” as medical costs per individual will decrease, and lifetime earning power will increase due to greater health.

The effective utilization of genetic engineering to make lives better for a minority of Americans will also change public perceptions of genetic engineering. Instead of a dystopian future, people will begin to see their own present, and the fear will give way to acceptance. And it is in the time horizon beyond 2025 that I think we may need to start thinking about tackling germ-line modifications and more radical ‘experiments’ in biological engineering….

• Category: Science • Tags: Genomics

I know I’ve mentioned that I stopped reading much about religion a few years back because I had hit diminishing marginal returns. But this Peter Turchin review of Big Gods: How Religion Transformed Cooperation and Conflict made me reconsider. I have neither the time nor the inclination to read this book in the near term, but it’s definitely in my mental stack now. I found the thesis plausible, and am familiar with the author’s published research, but remain mildly skeptical. Some of the experimental cognitive science I’ve seen in this literature is kind of “wow, that’s cool!”, but of late I’ve become more skeptical, as much of it turns out either not to generalize or not to be robust (see The Invisible Gorilla: And Other Ways Our Intuitions Deceive Us). But it does seem that this research program is starting to move in a more multidisciplinary direction, and that’s a good thing, as you have multiple domains of “cross-checking” with which to build your positive case.

On Twitter Steven Pinker points out that IQ has been immune from the replicability crisis. Unlike with a concept like implicit association, there aren’t debates about IQ’s relevance to other characteristics (e.g., no, you do not necessarily behave in a more racist manner if you score as more racist on the IAT), or about the robustness of the result itself (e.g., the same people can get wildly different scores on re-tests which aren’t spaced that far apart). But that’s one reason I haven’t read much about IQ in years. The last book I read all the way through on IQ was probably James Flynn’s What Is Intelligence? (though I am excited to read my friend Garett Jones’ book Hive Mind: How Your Nation’s IQ Matters So Much More Than Your Own, which is coming out in early November). Basically, the major findings of intelligence testing are pretty well settled and good enough for someone for whom the topic isn’t a specialization. Similarly, from the lay perspective you don’t really need to keep up with the latest details in evolutionary science. The big sketch is probably already good enough for you. But I did buy a copy of Stuart Ritchie’s Intelligence: All That Matters. I’ve sampled a fair amount of the book, though not read it front to back, and I think I can recommend it to those who want a primer.

Current Biology has a new paper, The Role of Recent Admixture in Forming the Contemporary West Eurasian Genomic Landscape. It uses the fineStructure framework, basically looking at haplotype sharing across groups. The time depth of the inferences here is relatively recent. There’s a lot in the paper, and I don’t know how to interpret all of it. But it does reiterate that recent gene flow is a pervasive feature of the human landscape, and not just one of the modern era.

I will be in the DC-Baltimore area for ASHG in a few weeks. Excited about the poster buffet. Also going to eat spicy Chinese food. Any recommendations in Baltimore for Sichuan?

• Category: Miscellaneous • Tags: Open Thread

I don’t want to disappoint my low-carb readers, but now and then I eat pizza. Especially when you have little kids, pizza is a really good choice, since it tastes good, and even liminal toddler savages can consume it (it’s soft, it’s easy to grasp, and the mess isn’t that big of a deal). I probably should do more reading on food, since food is important, and I spend a larger share of my total budget on it than the typical American (yes, I definitely lean SWPL in this domain). The family has an edition of On Food and Cooking which I used to thumb through, but I probably haven’t touched it in 5 years. Do any readers know if Pizza: A Global History is good? It’s part of a series.

My pizza preferences aren’t too sophisticated (though if you say that you like “Chicago style” you are dead to me). Usually I avoid the chains, because there’s often a good local joint. But of late I have been going to a chain, Blaze Pizza. I really love the fact that you can order a specific pizza tailored to your preferences online, and then go and pick it up. And I’m not the only one: a 2015 survey has Blaze up to the #2 “fast casual” brand. I asserted to my wife that Blaze is obviously trying to be the Chipotle of pizza, and apparently I’m not the only one making the obvious analogy. But the CEO of Blaze has higher ambitions. He wants the chain to be the “Starbucks of fast casual.” Good luck with that! (I think Starbucks is a fundamentally different beast as a Third Place, which an eatery like Blaze is never likely to be.)

But here’s the reason I’m putting up this post: Blaze’s online ordering system has a major problem, and that is what they expect you to do when you pick up. Specifically, I make an online order, and then it tells me I need to pick it up at a specific time. They have everything set up, and they put it into the grill when you arrive. But you are supposed to go up to the cashier and tell them you’re an online order, and most of the time the other customers in line make it really hard to get the cashier’s attention. Half the time the cashiers themselves seem to wonder if you are trying to cut ahead in line. Perhaps it’s a feature of my local Blaze, but where you pick up pizza and where you pay the cashier are so close that it’s hard to distinguish myself from the people waiting to order. So yesterday it said I could pick up the pizza at 6:05. But they didn’t put it into the grill until 6:15 because 1) a woman decided to harangue the cashier about the fact that they didn’t have specified quantities for the pesto drizzle (she liked a “medium” amount), and 2) even after she was done, the cashier didn’t realize I was waiting for an online order even though I kept trying to make eye contact (I could have shouted “online order,” but that would have entailed cutting in on conversations that were going on).

If you aspire to be the Starbucks of a sector, you need to fix a problem this basic. The convenience of online ordering disappears when there’s such an annoying rate-limiting step. If you want a frictionless experience (and I found that most of it is really smooth), you need to work just a little bit harder.

• Tags: Pizza


By now you may have read about the new paper in Science, Greenlandic Inuit show genetic signatures of diet and climate adaptation. Carl Zimmer has an excellent treatment in The New York Times, Inuit Study Adds Twist to Omega-3 Fatty Acids’ Health Story. The backstory here is that for decades people have been told to take fish oils because of their possible protective role against heart disease. Apparently some of these recommendations were based on observing the dietary habits and health outcomes of indigenous peoples of the Arctic. Unfortunately, studies which attempt to gauge the impact of these recommendations on Western populations have come back mixed at best. I myself stopped taking fish oils years ago after reviewing the literature and asking around. Well, it turns out that there may have been a confound: the populations of the Arctic are adapted to their particular diet.

The figure to the right gets at the heart of the result. Greenland Inuit (the authors selected individuals with less than 5% European ancestry), Europeans, and Chinese exhibit a particular genome-wide pattern of relatedness, which you can see at the bottom. Looking at their results, the authors found gene flow from a Greenland-like population to the Chinese at some point over the last 20,000 years. This seems plausible. Additionally, I recall that Greenland natives and Europeans share Ancestral North Eurasian heritage. This is not a population genomics paper focused on phylogenomics, so these details aren’t too important. The takeaway is that at a set of derived alleles around the fatty acid desaturase (FADS) genes, the Greenland population seems fixed for variants that are very different from the major alleles in both Chinese and Europeans. These genes are extreme outliers on the population branch statistic (PBS), which measures how far allele frequencies in a focal population have diverged relative to two reference groups.
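
For readers who want to see what PBS actually computes, here is a minimal sketch using the statistic as it is commonly defined: each pairwise Fst is converted to a branch length T = -log(1 - Fst), and the focal population's branch is (T_AB + T_AC - T_BC) / 2. The Fst inputs are assumed to be computed elsewhere, and the function names are my own, not from the paper.

```python
import math


def branch_length(fst: float) -> float:
    """Convert a pairwise Fst into a divergence-like branch length, T = -log(1 - Fst)."""
    return -math.log(1.0 - fst)


def pbs(fst_ab: float, fst_ac: float, fst_bc: float) -> float:
    """Population branch statistic for focal population A against references B and C.

    A large value means A's allele frequencies moved a lot on its own branch
    since it split from B and C, which is what a local selection signal looks like.
    """
    t_ab = branch_length(fst_ab)
    t_ac = branch_length(fst_ac)
    t_bc = branch_length(fst_bc)
    return (t_ab + t_ac - t_bc) / 2.0
```

In this setup A would be the Greenland sample and B and C the Europeans and Chinese; any Fst values you plug in are illustrative, not the paper's.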

The details are somewhat gnarly. The authors look at several groups of genes before zeroing in on the FADS group, and they also look at several variants within FADS, as well as various phenotypes. It turns out some markers are associated with differences in height and weight among the Inuit, and the height association also applies to Europeans (the allele frequency there is far lower, which may be why it wasn’t picked up in earlier GWAS).

But I want to focus on a major top-line result. First, here is Rasmus Nielsen in Carl’s piece: “The same diet may have different effects on different people.” And from the paper itself: “In addition to the associations with height, we also found known associations with low fasting serum levels of insulin, total cholesterol, and LDL cholesterol for European carriers of low-frequency–derived alleles of FADS1 variation, suggesting that there may be a protective effect of these variants on cardiometabolic phenotypes.” The implications of this study are commonsense, but they’re also very deep, as they confirm an intuition that the same dietary regime may not have the same outcome in all humans. As the authors note in the piece, many of the alleles at high frequency in Greenlanders are also at high frequency in Native American populations in general. Looking at the time depth of the selection event, it seems likely that a lot of the change occurred in Beringia or Siberia, so for New World groups this may be an ancestral suite of characteristics. But perhaps even more interesting is that many populations have high minor allele frequencies for these alleles. I looked at one marker in the 1000 Genomes data set, and the range is wide. Many Eurasian populations carry the “Greenland” variant at ~10% frequency, so ~1% might be homozygous for it.
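
The ~1% figure is just Hardy-Weinberg arithmetic: if an allele is at frequency p and mating is random with respect to the locus (an assumption, not something I checked for each population), then p² of people carry two copies. A quick sketch:

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies for an allele at frequency p under Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be between 0 and 1")
    q = 1.0 - p
    return {
        "hom_alt": p * p,     # two copies of the "Greenland" variant
        "het": 2 * p * q,     # one copy
        "hom_ref": q * q,     # no copies
    }


genotypes = hardy_weinberg(0.10)
```

At p = 0.10 that works out to roughly 1% homozygotes and 18% heterozygotes, so the carriers far outnumber the homozygotes.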

What that means is that studies in small populations like the natives of Greenland may still have wide-ranging implications. There are literally hundreds of millions of people with these alleles. Though one might suggest caution about extrapolating results out-of-population, some of the phenotypes are already replicated in Europeans who carry the variant.

This study can’t be understood in isolation. It allows for broader generalizations. Ten years ago I read Why Some Like It Hot: Food, Genes, and Cultural Diversity. This was really a pre-genomic era book, drawing on an older body of work. But it is very interesting, and it reports on a wide range of studies and the author’s own experiences. Much of it won’t be a surprise to many, but others would still benefit from its comparative method. Today we know a lot more about population-level variation, and what it might tell us about individual-level variation. It’s going to be fun times ahead, as I suspect that the intersection of diet, nutrition, genomics, and quantified-self is going to be a very big deal in the near future. How and what we eat is important. The diet industry is nearly a one hundred billion dollar market.

• Category: Science • Tags: Genomics

I have a post up at the new GSA blog, Read/write access to your genomes? Using the past to jump to the future. One thing I would say: I didn’t get into human germline modification because I don’t think it’s going to be a major issue in the near term. And I think it’s more a bioethical aspect of genetic engineering technology than a scientific one. I’m pretty sure we’ll have the technology, but we’ll cross that bridge when we get to it. Also, special thanks to Yaniv Erlich for writing A vision for ubiquitous sequencing. It really has gotten me thinking….

• Category: Science • Tags: Genomics

The abstracts can be found here. Will be updating this post repeatedly….

Genome-wide data on 34 ancient Anatolians identifies the founding population of the European Neolithic.

I. Lazaridis1,2 ; D. Fernandes3 ; N. Rohland1,2 ; S. Mallick1,2,4 ; K. Stewardson1,4 ; S. Alpaslan5 ; N. Patterson2 ; R. Pinhasi*3 ; D. Reich*1,2,4

1) Department of Genetics, Harvard Medical School, Boston, MA USA; 2) Broad Institute of MIT and Harvard, Cambridge, MA USA; 3) Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland; 4) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA USA; 5) Independent physical anthropologist, Netherlands.

It has hitherto been difficult to obtain genome-wide data from the Near East. By targeting the inner ear region of the petrous bone for extraction [Pinhasi et al., PLoS One 2015] and using a genome-wide capture technology [Haak et al., Nature, 2015] we achieved unprecedented success in obtaining genome-wide data on more than 1.2 million single nucleotide polymorphism targets from 34 Neolithic individuals from Northwestern Anatolia (~6,300 years BCE), including 18 at greater than 1× coverage. Our analysis reveals a homogeneous population that is genetically a plausible source for the first farmers of Europe in the sense of (i) having a high frequency of Y-chromosome haplogroup G2a, and (ii) low Fst distances from early farmers of Germany (0.004 ± 0.0004) and Spain (0.014 ± 0.0009). Model-free principal components and model-based admixture analyses confirm a strong genetic relationship between Anatolian and European farmers. We model early European farmers as mixtures of Neolithic Anatolians and Mesolithic European hunter-gatherers, revealing very limited admixture with indigenous hunter-gatherers during the initial spread of Neolithic farmers into Europe. Our results therefore provide overwhelming support for the migration of Near Eastern/Anatolian farmers into southeast and Central Europe around 7,000-6,500 BCE [Ammerman & Cavalli Sforza, 1984, Pinhasi et al., PLoS Biology, 2005]. Our results also show differences between early Anatolians and all present-day populations from the Near East, Anatolia, and Caucasus, showing that the early Anatolian farmers, just as their European relatives, were later demographically replaced to a substantial degree.
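
To give a feel for what an Fst of 0.004 means, here is a sketch of one widely used estimator: Hudson's Fst in the ratio-of-averages form, which sums a per-SNP numerator and denominator across loci before dividing. The abstract does not say which estimator the authors used, so treat this as illustrative. Inputs are assumed to be per-SNP allele frequencies and sampled chromosome counts (each above 1) for the two populations.

```python
from typing import Sequence


def hudson_fst(p1: Sequence[float], p2: Sequence[float],
               n1: Sequence[int], n2: Sequence[int]) -> float:
    """Hudson's Fst across SNPs, as a ratio of summed numerators to summed denominators.

    p1, p2: allele frequencies per SNP in populations 1 and 2.
    n1, n2: number of sampled chromosomes per SNP (must be > 1).
    """
    num = 0.0
    den = 0.0
    for a, b, na, nb in zip(p1, p2, n1, n2):
        # Squared frequency difference, corrected for finite-sample noise.
        num += (a - b) ** 2 - a * (1 - a) / (na - 1) - b * (1 - b) / (nb - 1)
        # Probability that one chromosome from each population differs.
        den += a * (1 - b) + b * (1 - a)
    return num / den
```

Summing before dividing keeps rare variants from dominating the estimate. Values near zero, like the Anatolia-Germany figure, mean the two samples' allele frequencies are nearly interchangeable.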

Ancient European haplotype enrichment in modern Eurasian populations.

D. Harris1 ; T. O’Connor2

1) Graduate Program in Molecular Medicine, University of Maryland School of Medicine, Baltimore, MD; 2) Institute for Genome Sciences, Program in Personalized and Genomic Medicine, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD.

The diversification of modern European populations is a fascinating puzzle that has recently advanced due to the sequencing of ancient European genomes. We analyzed 732 modern West Eurasian individuals using three ancient samples from the Lazaridis et al. Human Origins Array dataset. Specifically, we determined ancient European haplotype enrichment by calculating pairwise differences (PWD) between each ancient European individual and modern West Eurasian individuals in 50-SNP blocks. Across all population groups, modern West Eurasians had the fewest PWD with the Stuttgart farmer and the most PWD with the Loschbour and Motala12 hunter-gatherers, confirming the Lazaridis et al. observation that modern Europeans are more closely related to ancient individuals from a farming community. We selected SNP blocks for gene ontology enrichment analysis with GORILLA based on 1) the 10% of regions with the greatest differences in PWD between groups, and 2) the 10% of those regions from the first criterion that most closely correlated with the geography of those groups. Most SNP blocks positively correlated with PC1 (latitude) and PC2 (longitude), so we focused on outliers that negatively correlated with biogeography. For SNP blocks that negatively correlated with PC1, “regulation of chondrocyte development”, “androsterone dehydrogenase activity”, and “antigen processing and presentation of endogenous peptide antigen” had the highest enrichment scores in the comparisons with the Stuttgart, Loschbour, and Motala12 individuals, respectively. Interestingly, “alpha-beta T cell receptor complex” and “interleukin-17 receptor activity” (including CD3D,E,G and IL17RC,E) were enriched in the Loschbour and Motala12 comparisons of SNP blocks that were positively correlated with PC2.
In addition, the Stuttgart individual had the lowest PWD disparity among all modern populations for the SNP blocks that contain the IL17R and CD3 genes, which potentially indicates selection acting on these immune system haplotypes from the Stuttgart individual, consistent with the Stuttgart farmer’s and modern Europeans’ continual close interaction with animals and exposure to zoonotic disease. In conclusion, our approach of calculating PWD in small SNP blocks supported prior conclusions made by Lazaridis et al. and illuminated small genomic haplotypes that are important to the evolution of modern West Eurasian populations.
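
The abstract doesn't spell out how a pairwise difference was counted, so here is a minimal sketch under stated assumptions. Genotypes are coded as 0/1/2 alternate-allele counts, a site's difference is the absolute dosage difference, sites missing in either sample are skipped, and the final partial block is kept. The function name is hypothetical.

```python
from typing import Optional, Sequence


def block_pwd(ancient: Sequence[Optional[int]], modern: Sequence[Optional[int]],
              block_size: int = 50) -> list[int]:
    """Sum absolute allele-dosage differences in consecutive blocks of SNPs.

    Genotypes are 0/1/2 counts of the alternate allele; None marks missing data.
    Returns one total per block (the last block may be shorter than block_size).
    """
    if len(ancient) != len(modern):
        raise ValueError("genotype vectors must be the same length")
    totals = []
    for start in range(0, len(ancient), block_size):
        diff = 0
        for a, m in zip(ancient[start:start + block_size], modern[start:start + block_size]):
            if a is None or m is None:  # skip sites missing in either sample
                continue
            diff += abs(a - m)
        totals.append(diff)
    return totals
```

Running this for every ancient-modern pair gives a per-block matrix of differences. The "outlier blocks" in the abstract are the ones where those totals differ most between groups.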

Clarifying the disputed role of FOXP2 in modern human origins.

E. G. Atkinson; B. M. Henn

Dept. of Ecology and Evolution, Stony Brook University, Stony Brook, NY.

Identified for its pivotal role in the development of spoken language, the FOXP2 gene is also known for its controversial role in human evolution. Early genetic work identified a selective sweep for two derived amino acid substitutions in FOXP2 during recent human evolution (within the past 200,000 years), supported in large part by detection of an extremely low Tajima’s D value at the gene. When the genomes of other ancient hominids were found to contain the same fixed genetic variants, however, the conflicting timelines between the signals of selection obtained from the molecular sequence of the gene and the divergence time estimates between humans and other ancient hominid species were irreconcilable. Selection for these two amino acids thus appears not to be human-specific, yet many papers continue to work from a hypothesis of positive selection of FOXP2 in humans. Here, we comprehensively re-analyze FOXP2 with next-generation genomic datasets comprising hundreds of individuals and thousands of SNPs. Specifically, we test for fine-scale molecular patterns in the gene and between various human populations in order to resolve estimates of selection. We are unable to replicate the original negative D signal in the expanded human genomic datasets, despite having many more variants, more diverse individuals, and greater statistical power. We can, however, mimic the negative D result when running calculations on a subset of the HGDP genomic dataset with a sample of human populations comparable to the original work, i.e., one-third Africans and two-thirds individuals who underwent the Out-of-Africa expansion. The D signal thus appears to have been due to the pooling of Africans and non-Africans together for analyses, which increases the number of segregating sites relative to pairwise genetic differences. Such a result seems to have been an unintended consequence of a small sampling strategy. 
We apply additional selective sweep statistics and haplotype analysis to this locus to evaluate evidence for selection over the past 200,000 years, finding indications of balancing selection in Africans but not non-Africans. FOXP2 does not appear to have undergone a recent selective sweep, as had been previously proposed.
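
The key point in the FOXP2 abstract is that Tajima's D compares two estimates of diversity: π (mean pairwise differences) and S/a₁ (segregating sites scaled by sample size). Anything that inflates S relative to π, such as pooling differently structured samples, pushes D negative. Below is a minimal sketch of the standard Tajima (1989) formula. It assumes aligned, equal-length haplotype strings with no missing data, which real datasets rarely give you.

```python
import math
from itertools import combinations
from typing import Sequence


def tajimas_d(seqs: Sequence[str]) -> float:
    """Tajima's D for aligned, equal-length haplotypes (e.g. strings of '0'/'1')."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two haplotypes of equal length")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    s = sum(1 for i in range(length) if len({seq[i] for seq in seqs}) > 1)
    if s == 0:
        return float("nan")  # D is undefined without variation
    # pi: average number of differences over all pairs of haplotypes.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of singletons (many segregating sites, few pairwise differences) gives a negative D. Two deeply split haplotype classes give a positive one, the kind of pattern balancing selection tends to leave.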

Haplogroup C Phylogeny for Altaian Populations and its Implications for the Peopling of Siberia and the Americas.

A. Askapuli1,3 ; M. C. Dulik1 ; S. I. Zhadanov1,2 ; L. P. Osipova2 ; T. G. Schurr1

1) Department of Anthropology, University of Pennsylvania, Philadelphia, PA 19104-6398, USA; 2) Institute of Cytology and Genetics, SB RAS, Novosibirsk 630090, Russia; 3) Center for Life Sciences, NLA, Nazarbayev University, Astana 010000, Kazakhstan.

Characterization of mitochondrial DNA at a genomic level is very important, since it provides opportunities for more accurately estimating the timing and directionality of prehistoric human migrations from a maternal perspective. The Altai Mountains are located at the geographic center of the Eurasian landmass and have been a hotspot of human activity since ancient times due to their geographic location and rich natural resources. Aiming to contribute to a better understanding of the prehistoric human expansions in Siberia and the subsequent colonization of the Americas, we sequenced and characterized eighteen whole mtDNA genomes belonging to haplogroup C from Altaian populations. The sequenced Altaian mtDNAs represent all four subgroups of haplogroup C (C1, C4, C5, and C7), and two of them belong to C1a, the Asian sister branch of Native American C1. The Altaian whole mitochondrial sequences were analyzed together with 313 previously published haplogroup C sequences from different parts of the world. The analyses of whole mitochondrial genomes reveal that haplogroup C lineages in Siberia are distributed without any specific association with geography or language, and suggest northeastern Siberia as a place of origin for haplogroup C and its subbranches C1, C4, C5, and C7. The analyses also indicate that Native American haplogroup C types are distantly related to their Siberian sister branches. Given the distribution pattern of haplogroup C in Eurasia, the timing of expansions could be inferred from the age estimates of the lineages within haplogroup C.
Age estimation of haplogroup C sequences in our data set via ρ statistics shows that haplogroup C has a TMRCA of 31.25 kyr (24.13-38.56), and its subbranches C1, C4, C5, and C7 have TMRCAs of 21.64 kyr (16.83-26.55), 24.88 kyr (16.65-33.41), 19.76 kyr (13.63-26.08), and 27.2 kyr (16.69-38.17), respectively. Still, it is almost impossible to pinpoint the geographic origin of Native Americans and the directionality of prehistoric migrations in Siberia with certainty. Based on the results of the current study, the Amur region in northeastern Siberia could be the geographic origin for ancestral Native Americans. To obtain a clearer picture of human population movements in Siberia and the Americas from a maternal perspective, more mitochondrial genomes need to be sequenced, especially those belonging to the relatively diverse haplogroups C and D.
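
The ρ statistic behind those TMRCA figures is simple at its core. It is the average number of mutations separating each sampled sequence from the root (ancestral) haplotype of the clade, which is then multiplied by a calibrated number of years per mutation. Here is a minimal sketch that assumes each sample's mutations relative to the root have already been called. The `years_per_mutation` value is a placeholder you must supply from a published calibration, and the confidence intervals in the abstract come from a variance estimate not shown here.

```python
from statistics import mean
from typing import Iterable, Set


def rho_age(mutations_from_root: Iterable[Set[str]],
            years_per_mutation: float) -> tuple[float, float]:
    """Return (rho, age_in_years) for one clade.

    mutations_from_root: for each sampled sequence, the set of mutations
    (labelled however you like) on the path from the clade's root haplotype.
    years_per_mutation: calibration constant, supplied by the caller.
    """
    counts = [len(m) for m in mutations_from_root]
    if not counts:
        raise ValueError("need at least one sequence")
    rho = mean(counts)  # average distance from the root, in mutations
    return rho, rho * years_per_mutation
```

The design choice here is to keep the calibration outside the function. Published mtDNA clocks differ, and the ages scale linearly with whatever rate you choose.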

Genetic, Geographic and Cultural Reconstruction of an Ancient Endogamous Community.

D. K. Sanghera1 ; A. Raina2 ; C. E. Aston1 ; D. D. Mascarenhas3

1) Pediatrics, University of Oklahoma HSC, Oklahoma City, OK, USA; 2) All India Institute of Medical Sciences, New Delhi, India; 3) Mayflower Organization for Research and Education, Sunnyvale, CA, USA.

The provenance of a rare R1a1 Y-haplogroup (Y-HG) subtype designated as 657A lies in proximity to an ancient migration route running through Afghanistan but is largely absent from other geographic locations. A clan of 657A Brahmin “founder” family lineages within the Goud Saraswat community (GSB) in a town in Western India was identified in which 15 of 16 males from nine families were R1a1 Y-HG, including 10 who were 657A. TMRCA calculations using pairwise comparisons to control cohorts suggested a probable migration history for this priestly subgroup. To support this genetic narrative we present archeological, toponymic, numismatic, linguistic, iconographic, architectural, sociological and literary data. Specifically, in this study we test two main hypotheses regarding these 657A families: (1) Using Y-HG centroid analysis, chi-square analysis of TMRCA distributions and archeological find-spots, and discriminant function analysis we show that the parental Z93 L342.2 sub-clade in which 657A occurs originated in West Asia and that 657A individuals migrated toward the southeast by a Bolan Pass route distinct from the traditionally presumed route of “Vedic” ingress into the Indian subcontinent; and (2) Priestly 657A lineages in Western India retain distinct family practices with respect to literacy, religious practice and migration not shared by other more orthodox Brahmins of canonical geographic origin within the same community, despite intermarriage. Long-term transmission of differentiated family practices within a single patrilineal endogamous community has rarely been documented.

Reconstructing genetic history of Siberian and Northeastern European populations.

E. Wong1 ; A. Khrunin2 ; L. Nichols2 ; D. Pushkarev3 ; D. Khokhrin2 ; D. Verbenko2 ; O. Evgradov4 ; J. Knowles4 ; J. Novembre5; S. Limborska2 ; A. Valouev1

1) Department of Preventive Medicine, Keck School of Medicine of USC, Los Angeles, CA; 2) Department of Molecular Bases of Human Genetics, Institute of Molecular Genetics, Russian Academy of Sciences, Moscow, Russian Federation; 3) Illumina, Inc., Advanced Research Group, San Diego, CA, USA; 4) Department of Psychiatry and Behavioral Sciences, Keck School of Medicine, Zilkha Neurogenetic Institute, University of Southern California, CA, USA; 5) Department of Human Genetics, University of Chicago, Chicago, IL, USA.

Siberia and Western Russia are home to some of the least studied ethnic groups in the world, and their genetic history holds keys to understanding the peopling of the world. We present whole-genome sequencing data from 28 individuals belonging to 14 distinct indigenous populations from that region. We used these datasets together with an additional 32 modern-day and 15 ancient human genomes to build and compare autosomal, Y-DNA, and mtDNA trees and delineate genetic history. Our analyses uncover complex migratory processes that shaped the genetic landscapes of Asia and Europe. Admixture events between ancient Siberian groups resulted in the distinct ancestries of present-day Western and Eastern Siberians. Western Siberians share genetic affinity with modern Europeans; both can trace their ancestry to the lineage of the 24,000-year-old Mal’ta boy from Siberia. Eastern Siberians have a much weaker genetic affinity with Europeans, and their ancestors separated from East Asians much later (approximately 10,000 years ago). A major migration wave from Eastern Siberians into Western Siberian groups occurred approximately 7,000 years ago, and it extended into Northeastern Europe. This is based on the admixture we observed between Siberians and lineages represented by the 5,000-year-old hunter-gatherer Ire8 from the Pitted Ware Culture excavated in Sweden, the 2,900-year-old Iron Age Hungarian IR1 from the Mezocsat Culture, and modern-day northeastern Europeans. Our whole-genome data, based on a broad sample of populations in Siberia and Western Russia, provide new high-resolution insights into the genetic history of Eurasians.

Ages of mitochondrial DNA lineages coincide with the spread of agriculture in Finland.

S. Översti1 ; P. Onkamo1 ; J. Palo2

1) Department of Biosciences, PO Box 56 (Viikinkaari 5) FI-00014 University of Helsinki, FINLAND; 2) Laboratory of Forensic Biology, Department of Forensic Medicine, Hjelt Institute, PO Box 40 (Kytösuontie 11) FI-00014 University of Helsinki, FINLAND.

The current inhabitants of Finland in Northeastern Europe are quite distinctive in terms of their genetic composition. Based on Y-chromosomal and genome-wide studies, Finns differ from other European populations: in particular, Y-chromosomal diversity is reduced and distinctive. In contrast, the Finnish mitochondrial DNA (mtDNA) haplogroup distribution is similar to that of other European populations. The mitochondrial gene pool of modern Europeans is a mixture of Mesolithic hunter-gatherer associated haplogroups (U and V) and Neolithic farmer associated haplogroups (H, J, K, and T). The frequency of the hunter-associated haplogroup U in Finland is one of the highest in Western Eurasia. It is also more common in the eastern and northern parts of the country, while farmer haplogroups are more frequent in southern and western Finland. In this study we compiled a comprehensive data set of 833 modern Finnish complete mtDNA sequences from public databases and used coalescent-based Bayesian phylogenetic inference (BEAST v.1.8.1) to perform fine-resolution phylogenetic analyses on the sequences. We also used previously published radiocarbon-dated ancient complete mtDNA sequences from Western Eurasia as calibration points for the phylogenetic trees, enhancing their accuracy. Our results demonstrate that among Finns, many typically “European” haplogroups, both hunter-gatherer and farmer associated, actually comprise lineages specific to Finns. Several of these lineages, despite being rather common in the present Finnish population, are virtually absent from other populations. The oldest of these haplogroups date back over 7,000 years, though most appear to be around 3,000-5,000 years old. This period coincides with the arrival, and especially the spread, of agriculture and the Corded Ware culture in Finland.
Age estimates are also concurrent with the arrival of another culture, textile ceramics, into Finland from the Volga region (the main period of textile ceramics lies between 1,700-1,000 BC). According to these results, there is distinct evidence that the arrival of these cultural entities also influenced the Finnish mitochondrial DNA pool, and this impact is still visible in modern-day Finns.

An empirical recombination model for demographic inference and IBD detection.

T. Y. Wang; J. H. Loo; M. Lin

Mackay Memorial Hospital, New Taipei City, Taiwan.

Genome-wide data facilitate the investigation of genomic relatedness between individuals within or across populations, providing insight into demographic histories. Genomic regions that are identical by descent (IBD) in individuals, co-inherited from common ancestors, can be detected and analyzed to reveal genetic relatedness for demographic inference. Many methods have recently been developed to detect IBD regions, aiming at detecting identical regions that are statistically unlikely to occur without common ancestors. Some employ coalescent or probabilistic models to identify IBD regions with significantly low frequencies of occurrence; others use non-coalescent and non-probabilistic models to detect long IBD regions, whose length serves as a proxy for low frequency. However, because of the high computational cost of coalescent or probabilistic models, the former are usually not suited to large datasets, and because they do not detect short IBD regions, the latter cannot provide comprehensive information for demographic inference. We propose an empirical approach that is able to infer demographic histories and detect IBD regions simultaneously. This approach comprises an empirical model of recombination and an IBD detection algorithm. The empirical model builds coalescent trees with recombination events based on genomic similarities of individuals, and the detection algorithm incorporates information from these coalescent trees with recombination events to identify IBD regions. These two procedures can be executed iteratively until no new IBD regions are found and the coalescent trees no longer change. In addition, the two procedures can be run in parallel in each iteration to improve computational efficiency. We applied our method to simulated data and two real datasets: the 1000 Genomes data and the HLA alleles in Taiwan populations.
First, in the simulation analysis, our method is able to infer demographic histories and detect short IBD regions with high accuracy while maintaining high computational efficiency. Second, in the 1000 Genomes dataset, our approach reveals not only recent demographic events based on long detected IBD regions, but also ancient histories from short IBD regions. Finally, in the HLA alleles in Taiwan populations, we demonstrate the utility of the empirical recombination model on its own for recent demographic inference. Therefore, our proposed method is capable of detecting IBD regions efficiently and making demographic inference comprehensively.
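
To make the "long identical regions as a proxy" idea concrete, here is a toy version of the simplest non-probabilistic approach the abstract contrasts itself with. It scans two phased haplotypes and reports runs of matching markers at least `min_len` long. This is not the authors' empirical-recombination method. It also ignores genotyping error and genetic map distances, and `min_len` is an arbitrary threshold you choose.

```python
from typing import Sequence


def identical_runs(h1: Sequence[int], h2: Sequence[int],
                   min_len: int) -> list[tuple[int, int]]:
    """Return half-open [start, end) marker ranges where two haplotypes match
    for at least min_len consecutive markers."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must be the same length")
    runs = []
    start = None
    for i, (a, b) in enumerate(zip(h1, h2)):
        if a == b:
            if start is None:
                start = i  # a new matching run begins
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None  # mismatch ends any current run
    # Close a run that extends to the final marker.
    if start is not None and len(h1) - start >= min_len:
        runs.append((start, len(h1)))
    return runs
```

The weakness the abstract points out is visible here. Raise `min_len` to avoid chance matches and you lose the short, old segments. Lower it and chance identity-by-state floods the output.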

A new locus of genetic resistance to severe malaria is associated with a locus of ancient balancing selection.

G. Band; on behalf of the MalariaGEN consortium

Wellcome Trust Centre for Human Genetics, Roosevelt Drive, Oxford OX3 7BN UK.

We describe a genome-wide association study of severe malaria susceptibility using DNA from over 10,000 individuals from across sub-Saharan Africa with replication in a further 15,000. We identify a new locus of association near the glycophorin gene cluster on chromosome 4, which encodes red cell surface proteins previously shown to interact with malaria parasite surface receptors during invasion, and determines the MNS blood group. A single haplotype at this locus, common in parts of East Africa, confers 33% protection against severe malaria, and is linked to variation displaying signatures of ancient balancing selection. We describe attempts to elucidate the possible causal mutations, including imputation into an African-enriched reference panel and the refinement and imputation of large structural variants in the region. This association brings the number of loci confirmed by GWAS to be associated with severe malaria to four, all of which are involved in red blood cell function or morphology, and at least three of which display unambiguous signals of balancing selection. These analyses bring important new insights into malaria biology and may have implications for genome-wide association studies of infectious diseases more generally.
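
A "33% protection" figure is most naturally read as an odds ratio of about 0.67, since 1 − OR = 0.33. The abstract doesn't say exactly how it was computed, though, and a GWAS like this would estimate it in a regression with covariates rather than from a raw table. Still, the basic allelic odds ratio from case/control allele counts looks like this. The counts in the tests are invented for illustration.

```python
def allelic_odds_ratio(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Odds ratio for carrying the alt allele in cases versus controls (allele counts)."""
    if case_ref == 0 or ctrl_alt == 0:
        raise ValueError("zero cell: odds ratio undefined without a continuity correction")
    return (case_alt * ctrl_ref) / (case_ref * ctrl_alt)


def protection(odds_ratio: float) -> float:
    """Express an odds ratio below 1 as a proportional reduction in odds."""
    return 1.0 - odds_ratio
```

An OR below 1 means the haplotype is depleted among severe malaria cases. That is the signature of a protective allele, and balancing selection can keep such an allele at intermediate frequency.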

The evolutionary impact of Denisovan ancestry in Australo-Melanesians.

S. Sankararaman1,2 ; S. Mallick1,2,3 ; N. Patterson2 ; D. Reich1,2,3 ; for The Simons Genome Diversity Project

1) Department of Genetics, Harvard Medical School, Boston, MA USA; 2) Broad Institute of Harvard and MIT, Cambridge, MA USA; 3) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA USA.

Analyses of genome sequences from archaic and modern humans have documented major admixture events between the ancestors of Neanderthals and non-Africans as well as between the Denisovans (a sister-group of the Neanderthals) and populations in island south-east Asia. Understanding the impact of these ancient admixture events on evolution and phenotypes is a central goal in human population genomics. While a number of recent studies have made progress towards understanding the structure and impact of Neanderthal admixture [Sankararaman et al. Nature 2014; Vernot and Akey Science 2014], the Denisovan admixture event remains poorly understood. To this end, we adapted a statistical method previously developed for inferring Neanderthal ancestry to infer Neanderthal and Denisovan local ancestries in Melanesian populations. We applied this method to a dataset of high-coverage whole-genome sequences from 11 Melanesian individuals (2 Aboriginal Australians, 1 Bougainville Islander, 8 Papua New Guineans) that were sequenced as part of the Simons Genome Diversity Project to infer maps of Denisovan and Neanderthal ancestry in these populations. Power to confidently infer Denisovan ancestry is estimated to be about half that of Neanderthal ancestry – a consequence of the greater divergence of the sequenced Denisovan genome from the ancestral population. Nevertheless, our statistical method identifies around 38,000 Neanderthal-derived alleles and around 25,000 Denisovan-derived alleles. Using the confidently inferred ancestries across multiple individuals, we can reconstruct about 150 Mb of the genome of the introgressing Denisovan. We observe that the proportion of both Denisovan and Neanderthal local ancestry is reduced in regions of the genome with strong background selection. 
This observation is consistent with a model in which Neanderthal and Denisovan alleles are subject to strong purifying selection in the admixed Melanesian populations analogous to the previous observation of strong purifying selection against Neanderthal alleles in non-Africans. In addition, we document a number of regions with elevated proportions of archaic ancestry (including a previously reported example at the STAT2 locus) which represent putative candidates for adaptive introgression.

IBD sharing in the 1000 Genomes Project Phase 3 data reveals relationships from Neanderthals to present day families.

G. Povysil; S. Hochreiter

Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria.

The 1000 Genomes Project data harbor information about a great variety of relationships, which can be recovered using identity by descent (IBD) analysis. Short IBD segments convey information about events far back in time, because the shorter an IBD segment is, the older it is assumed to be. At the same time, longer IBD segments can be used to detect more recent relationships, such as those within families. The identification of short IBD segments becomes possible through next-generation sequencing (NGS), which offers high variant density and reports variants of all frequencies. However, HapFABIA has only recently been proposed as the first method for detecting very short IBD segments in NGS data. HapFABIA utilizes rare variants to identify IBD segments with a low false discovery rate. We applied HapFABIA to the 1000 Genomes Phase 3 whole-genome sequencing data to identify IBD segments that are shared within and between populations as well as with the genomes of Neandertal and Denisova. Using the proportion of IBD segments an individual shares with any other individual in the data set, we were able to discover first-degree relatives, whom we consequently removed from further analyses. Not only are most IBD segments found in Africans, but each African individual also has about ten times more IBD segments than any East Asian, South Asian, or European individual. Furthermore, the number of IBD segments of an individual correlates with their degree of African ancestry as reported by other methods. IBD segments can be used to recover the population of origin of an individual and to find individuals with wrong population labels. By comparing the rare variants that tag an IBD segment with the genomes of Neandertal and Denisova, we were able to find IBD segments shared with these ancient genomes. 
We extracted two types of very old IBD segments that are shared with Neandertals/Denisovans: (1) longer segments primarily found in East Asians, South Asians, and Europeans that indicate introgression events outside of Africa; (2) shorter segments mainly shared by Africans that may indicate events involving ancestors of humans and other ancient hominins within Africa. Our results from the autosomes are further supported by an analysis of chromosome X, on which segments that are shared by Africans and match the Neandertal and/or Denisova genome were even more prominent.
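
Why are shorter segments older? Under the textbook model (no crossover interference, recombination as a Poisson process along the chromosome), two haplotypes whose common ancestor lived g generations ago are separated by 2g meioses. The segment they share around a given point therefore has a length that is roughly exponential with mean 1/(2g) Morgans, or 100/(2g) cM. This is the generic population-genetics approximation, not HapFABIA's model:

```python
def expected_ibd_length_cm(generations: float) -> float:
    """Mean IBD segment length (cM) for a common ancestor `generations` back.

    Assumes 2 * generations meioses separate the two haplotypes and that
    crossovers occur as a Poisson process (no interference).
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 100.0 / (2.0 * generations)
```

A relationship 50 generations deep leaves segments averaging about 1 cM. Introgression from Neandertals or Denisovans, thousands of generations back, should leave segments far shorter than that, which is why detecting very short segments matters.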

Novel probabilistically interpretable methods for identifying and localizing genomic targets of selective sweeps.

L. A. Sugden; S. Ramachandran

Department of Ecology and Evolutionary Biology, Brown University, Providence, RI.

Human populations throughout the world have had to adapt to novel pathogens and environments; this adaptive evolution has shaped present-day genomes. Here, we introduce novel frameworks for detecting adaptive sweeps from de novo mutations that are easily extensible to detecting adaptive evolution from standing variation. While current methods for detecting adaptive mutations rely on single statistics that probe one of three major signatures of a sweep (long-range haplotype blocks, changes in the site frequency spectrum, and population differentiation), composite methods have recently shown increased power by combining multiple statistics. However, these methods falter when a subset of their component statistics is undefined, as often happens with long-range haplotype statistics, and they yield scores that are fundamentally difficult to interpret. Our approach classifies local targets of selective sweeps within multiple populations in a way that combines multiple statistics, has an easy probabilistic interpretation, and deals naturally with undefined statistics. We introduce two classifiers that infer the probability that a new locus has undergone a sweep, based on distributions learned from demographic simulations. The first is a Naïve Bayes classifier, which assumes independence among component statistics, while the second uses a machine-learning tool called an Averaged One-Dependence Estimator (AODE) to allow for pairwise dependencies. In simulated data, we show that the Naïve Bayes classifier vastly outperforms state-of-the-art methods in detection and localization of sweep signals, in some cases reducing the number of false positive predictions seven-fold. We show that this classifier performs particularly well when identifying completed sweeps and fast sweeps, which have great biological significance. For a subset of sweep parameters, the AODE further improves classification performance.
In data from the 1000 Genomes Project, we show that both classifiers can detect known sweep targets, including the DARC locus in West Africans, the EDAR locus in East Asians, and the SLC24A5 locus in Europeans. We also show that the dependency structure implemented in the AODE is necessary for detection of some signals, including the CD36 locus in West Africans, which harbors malaria resistance alleles. Our methods produce fewer false positives and negatives compared to existing approaches, thus identifying promising targets for experimental validation.
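
A minimal Python sketch of the Naïve Bayes idea, assuming each statistic's class-conditional density is a smoothed histogram of simulated values; the class labels, binning, ranges, and 50% prior are illustrative assumptions, not the authors' settings. Because the classifier multiplies per-statistic likelihoods, an undefined statistic can simply be left out of the product, which is equivalent to marginalizing it out provided that missingness itself carries no information about the class. The AODE's pairwise dependencies are not shown.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple


class HistogramDensity:
    """Smoothed histogram density for one statistic, learned from simulations."""

    def __init__(self, values: Sequence[float], lo: float, hi: float, bins: int = 20):
        self.lo, self.hi, self.bins = lo, hi, bins
        counts = [1.0] * bins  # add-one smoothing keeps every density positive
        for v in values:
            counts[self._bin(v)] += 1.0
        width = (hi - lo) / bins
        total = sum(counts)
        self.density = [c / (total * width) for c in counts]

    def _bin(self, v: float) -> int:
        idx = int((v - self.lo) / (self.hi - self.lo) * self.bins)
        return min(max(idx, 0), self.bins - 1)  # clamp out-of-range values

    def __call__(self, v: float) -> float:
        return self.density[self._bin(v)]


class NaiveBayesSweepClassifier:
    def __init__(self, sims: Dict[str, List[List[Optional[float]]]],
                 ranges: List[Tuple[float, float]], prior_sweep: float = 0.5):
        """`sims` maps 'sweep' and 'neutral' to simulated rows, one entry per
        statistic; None marks a statistic that was undefined in that simulation."""
        self.prior = {"sweep": prior_sweep, "neutral": 1.0 - prior_sweep}
        self.densities = {
            label: [HistogramDensity([r[j] for r in rows if r[j] is not None], lo, hi)
                    for j, (lo, hi) in enumerate(ranges)]
            for label, rows in sims.items()
        }

    def prob_sweep(self, stats: Sequence[Optional[float]]) -> float:
        log_post = {}
        for label, dens in self.densities.items():
            lp = math.log(self.prior[label])
            for density, x in zip(dens, stats):
                if x is not None:  # an undefined statistic simply drops out
                    lp += math.log(density(x))
            log_post[label] = lp
        top = max(log_post.values())  # log-sum-exp for numerical stability
        z = sum(math.exp(v - top) for v in log_post.values())
        return math.exp(log_post["sweep"] - top) / z


sims = {
    "sweep": [[2.5, 3.0], [3.0, 2.8], [3.5, None], [2.9, 3.2]],
    "neutral": [[0.1, -0.2], [-0.3, 0.0], [0.2, None], [0.0, 0.3]],
}
clf = NaiveBayesSweepClassifier(sims, ranges=[(-5.0, 5.0), (-5.0, 5.0)])
print(clf.prob_sweep([3.0, 3.0]))   # both statistics look sweep-like
print(clf.prob_sweep([3.0, None]))  # second statistic undefined: still classified
```

The output is a posterior probability rather than a composite score, which is what gives the approach its direct interpretation.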

Whole genome view of the Finnish bottleneck effects using 2926 whole genome sequences from Finland and UK.

H. Chheda1 ; P. Palta1 ; M. Pirinen1 ; S. McCarthy2 ; V. Salomaa3 ; R. Durbin2 ; T. Aittokallio1 ; A. Palotie1,4,5 ; S. Ripatti1,2,6

1) Institute for Molecular Medicine Finland (FIMM), Helsinki, Finland; 2) Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom; 3) National Institute for Health and Welfare, Helsinki, Finland; 4) Program in Medical and Population Genetics and Genetic Analysis Platform, The Broad Institute of MIT and Harvard, Cambridge, MA, USA; 5) Department of Medical Genetics, University of Helsinki and University Central Hospital, Helsinki, Finland; 6) Public Health, University of Helsinki, Helsinki, Finland.

Lim et al. (PLoS Genetics, 2014) recently showed that loss-of-function (LoF) and missense variants at 0.5-5% frequency are enriched in the Finnish population compared with non-Finnish Europeans, providing an opportunity to study the downstream effects of these variants in Finns. However, this shift in frequencies may not be confined to coding regions. To this end, we studied the enrichment of variants in the Finnish population across the whole genome. To study bottleneck effects genome-wide, we analyzed single nucleotide variants (SNVs) from 1463 low-coverage whole genome sequences each from Finland (~4.6x) and the UK (~6x). These samples were processed together by the Haplotype Reference Consortium to harmonize variant calls and minimize batch effects. As observed previously, we see a 1.34x enrichment of LoF variants (p-value LoF = 0.056) in the 2-5% minor allele frequency (MAF) range and a 1.1x enrichment of missense variants (p-value missense = 2.95e-05). Across the whole genome, we found significant enrichment in Finns in the MAF range from 0.5-5%, with maximum enrichment in the 2-5% range (p-value = 6.4e-323). We also see enrichment across different functional subcategories in Finns, with the highest enrichment observed for conserved regions (p-value conserved_regions = 9.36e-24, p-value TFBS = 6.02e-46, p-value promoter = 1.67e-11, p-value enhancers = 0.001), although it is less pronounced than for LoF variants. Furthermore, in regulatory regions, rare and low-frequency variants (MAF <= 2%) are enriched beyond what is expected from bottleneck effects. When limiting the analysis to the 23,441 variants that were enriched at least 100x in Finns, genes in pathways related to neuron development, signal transduction and cation transport channels were significantly overrepresented after correcting for multiple testing.
These results show that the enrichment of low-frequency variants in founder populations is not limited to coding loss-of-function and missense variants, but is also observed in conserved regions and regulatory elements. This finding provides opportunities to study the downstream health effects of these variants outside coding regions in founder populations with multiple bottlenecks, such as Finns.
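
One simple way to quantify such enrichment is sketched below in Python. As a hypothetical definition, it treats enrichment in a MAF bin as the ratio of variant counts in that bin between two equal-sized cohorts (1463 genomes each, i.e. 2926 chromosomes), and treats "enriched at least 100x" as a per-variant frequency ratio with a one-allele floor for variants unseen in the UK. The abstract does not state its exact definitions or the tests behind its p-values.

```python
from typing import Dict, List, Tuple

MafBin = Tuple[float, float]


def count_in_bin(mafs: List[float], lo: float, hi: float) -> int:
    # Half-open bins (lo, hi] so adjacent bins never double-count a variant.
    return sum(1 for f in mafs if lo < f <= hi)


def bin_enrichment(fin_maf: List[float], uk_maf: List[float],
                   bins: List[MafBin]) -> Dict[MafBin, float]:
    """Number of variants per MAF bin in Finns divided by the number in the UK.
    Raw counts are only comparable because both cohorts have the same size."""
    out = {}
    for lo, hi in bins:
        n_fin, n_uk = count_in_bin(fin_maf, lo, hi), count_in_bin(uk_maf, lo, hi)
        out[(lo, hi)] = n_fin / n_uk if n_uk else float("inf")
    return out


def strongly_enriched(fin_af: List[float], uk_af: List[float],
                      n_uk_chrom: int, fold: float = 100.0) -> List[int]:
    """Indices of variants whose Finnish frequency is at least `fold` times the UK
    frequency; a one-allele floor keeps variants absent from the UK finite."""
    floor = 1.0 / n_uk_chrom
    return [i for i, (f, u) in enumerate(zip(fin_af, uk_af))
            if f / max(u, floor) >= fold]


# Toy MAF lists (hypothetical values, not data from the study).
print(bin_enrichment([0.03, 0.04, 0.025, 0.01], [0.03, 0.008, 0.001, 0.012],
                     [(0.005, 0.02), (0.02, 0.05)]))
```

With unequal cohort sizes, counts would need to be normalized (for example by downsampling) before taking ratios.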

Reconstructing the Genetic History of Indigenous Caribbean Populations.

T. Schurr1 ; J. Benn Torres2 ; M. Vilar3 ; C. Melendez4 ; G. Torres2 ; J. Gaieski1 ; M. Stevenson5 ; R. Bharath Hernandez6 ; Z. Browne5 ; W. Waters5

1) Anthropology, University of Pennsylvania, Philadelphia, PA; 2) Anthropology, University of Notre Dame, Notre Dame, IN; 3) Science and Exploration, National Geographic Society, Washington, DC; 4) Liga Guakia Taina-Ke, Humacao, Puerto Rico; 5) The Garifuna Heritage Foundation Inc., Kingstown, St. Vincent; 6) Santa Rosa First Peoples Community, Arima, Trinidad.

In collaboration with the Garifuna/Kalinago of St. Vincent, the First Peoples Community of Arima, Trinidad, and Taíno descendant communities in Puerto Rico, we are conducting an anthropological genetic study of the prehistoric and historic settlement of the Caribbean. Using genetic data generated with the GenoChip, we are evaluating hypotheses concerning the original settlement of the Greater and Lesser Antilles, as well as the expansion of Carib- and Arawakan-speaking populations into this region over the past few thousand years. Our initial results suggest that the Greater Antilles were colonized by indigenous populations from South America and possibly Mesoamerica, whereas the Lesser Antilles were settled only by South American groups. In addition, while sharing some indigenous mtDNA (maternal) and Y-chromosome (paternal) lineages, populations from the Greater and Lesser Antilles otherwise appear to be largely genetically distinct from each other. Autosomal SNP data from these indigenous Caribbean communities further expand our understanding of the genetic contributions from African, European and South Asian populations since European contact. Overall, this study demonstrates the ongoing legacy of the region's first peoples in shaping the genetic diversity of contemporary Caribbean populations.

Selective constraint and sex-biased demography of human populations from X chromosome-autosome comparisons.

M. H. Quiver1 ; J. Lachance1 ; K. Mullen2 ; M. E. B. Hansen2 ; M. A. Chen2 ; P. H. Hsieh3 ; K. R. Veeramah4 ; S. A. Tishkoff2

1) School of Biology, Georgia Institute of Technology, Atlanta, GA; 2) Departments of Biology and Genetics, University of Pennsylvania, Philadelphia, PA; 3) Department of Molecular and Cell Biology, University of Arizona, Tucson, AZ; 4) Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY.

Because men and women carry different numbers of X chromosomes, comparisons between sex-linked and autosomal loci reveal sex-biased patterns of human demography. Using 44 high-coverage whole genomes from a diverse global set of 11 human populations, we quantified the strength of selective constraint on different chromosomes, found evidence of sex-biased colonization, and determined whether recent migrations are matrilocal or patrilocal. Relative amounts of genic and intergenic diversity were similar across all studied populations, regardless of subsistence pattern or geography. The strength of selective constraint on genes was greater for X-linked loci than for autosomal loci, a pattern that is consistent with selection against deleterious recessive alleles. The ratio of X chromosome to autosome diversity (Q) was greater than the null expectation of 0.75 (the value expected when breeding males and females are equally numerous) for African populations and less than 0.75 for non-African populations, with lower values of Q for populations located farther from Africa. This pattern is consistent with a male-biased serial founder effect model, and computer simulations suggest a plausible out-of-Africa bottleneck size of 320-340 males and 60-70 females. Using PSMC, we found evidence of large historic population sizes for West African Pygmies, but not for the Hadza or Sandawe. Genetic distances revealed female-biased gene flow between Hadza and Sandawe hunter-gatherers, between Maasai pastoralists and African farmers, and between Chinese and Japanese populations. We found evidence of male-biased gene flow between African farmers and hunter-gatherers, and between different African farmer populations. This calls into question the idea that patrilocality is coupled with the emergence of agriculture.
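
The 0.75 null for Q follows from standard Wright-Fisher formulas for effective population size with separate sexes. The Python sketch below computes that expectation from hypothetical numbers of breeding males and females, plus a diversity-based estimate that divides each diversity by outgroup divergence (a common correction for mutation-rate differences, assumed here). It describes equilibrium expectations only; the bottleneck sizes quoted above come from the authors' simulations of a serial founder model, not from this formula.

```python
def expected_q(n_males: float, n_females: float) -> float:
    """Expected Ne(X)/Ne(A) under a Wright-Fisher model with separate sexes:
    Ne(A) = 4*Nm*Nf/(Nm + Nf) and Ne(X) = 9*Nm*Nf/(4*Nm + 2*Nf)."""
    ne_a = 4.0 * n_males * n_females / (n_males + n_females)
    ne_x = 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)
    return ne_x / ne_a


def observed_q(pi_x: float, pi_a: float, div_x: float, div_a: float) -> float:
    """Diversity-based estimate: X and autosomal diversity, each divided by
    divergence from an outgroup to correct for mutation-rate differences."""
    return (pi_x / div_x) / (pi_a / div_a)


print(expected_q(100, 100))  # 0.75: equal numbers of breeding males and females
print(expected_q(330, 65))   # below 0.75: male-biased breeding population
print(expected_q(65, 330))   # above 0.75: female-biased breeding population
```

Q ranges from 9/16 (all variance from males) to 9/8 (all from females), so values above 0.75 point to an excess of breeding females and values below it to an excess of breeding males.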

A Genomic Map of Positive Selection in Sardinia.

J. H. Marcus1,9 ; M. Steri2,9 ; M. Floris3,9 ; C.WK. Chiang4 ; J. Smith5 ; F. Busonero2 ; A. Maschio2,6 ; A. Mulas2,8 ; S. Sanna2 ; G. Pistis2 ; M. Pitzalis2 ; M. Zoledziewska2 ; A. Angius2 ; C. Sidore2,6 ; D. Schlessinger7 ; G. R. Abecasis6 ; J. Novembre1,5,10 ; F. Cucca2,8,10

1) Department of Human Genetics, University of Chicago, IL, USA; 2) Istituto di Ricerca Genetica e Biomedica, CNR, Monserrato, Cagliari, Italy; 3) Center for Advanced Studies, Research, and Development in Sardinia (CRS4), AGCT Program, Parco Scientifico e Tecnologico della Sardegna, Pula, Italy; 4) Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA, USA; 5) Department of Ecology and Evolution, University of Chicago, IL, USA; 6) Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA; 7) Laboratory of Genetics, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA; 8) Università degli Studi di Sassari, Sassari, Italy; 9) Co-First Authors; 10) Co-Last Authors.


The recent production of population-scale genomic data offers an unprecedented opportunity to understand how natural selection has shaped human phenotypic variation within populations. Sardinia has a rich history of genetic studies, driven by its relative isolation and by the high incidence of malaria, which was endemic there until eradication efforts in the 1940s. To identify signatures of recent positive selection in Sardinia, we use 23 million single nucleotide polymorphisms from low-coverage whole genomes of 3,514 Sardinians, along with data from the 1000 Genomes Project. Using haplotype-based (iHS, nSL), cross-population (Fst, PBS, XP-EHH), and site frequency spectrum-based (CLR) statistics, we find that many genetic variants show evidence of selection. To assess the significance of these selection statistics, we use an empirical null distribution generated from randomly chosen variants matched by minor allele frequency, local recombination rate, and background score. We also evaluate these statistics relative to a neutral null model using a demographic history inferred from deeply sequenced Sardinian individuals. We show that the selection statistics computed for outlier variants cannot be explained by neutral forces alone. By intersecting genome-wide association study data for hundreds of traits in Sardinia with publicly available functional genomic databases, we find that autoimmunity-related genes are significantly enriched for these putatively adaptive variants. Taken together, these results illustrate the importance of characterizing both the demographic history of a population and the phenotypic variation within it, and especially the utility of whole-genome sequence data, when proposing and interpreting genetic signatures of positive selection.
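
A matched empirical null of the kind described above can be sketched in Python. The record layout, bin edges, and sampling scheme are assumptions for illustration; in practice each covariate would likely be binned on genome-wide quantiles, and the number of matched draws would be much larger.

```python
import bisect
import random
from typing import List, Sequence, Tuple

# Hypothetical per-variant record: (statistic, MAF, recombination rate, background score).
Variant = Tuple[float, float, float, float]


def bin_key(v: Variant, maf_edges: Sequence[float], rec_edges: Sequence[float],
            bg_edges: Sequence[float]) -> Tuple[int, int, int]:
    """Bin indices for the three matching covariates (edges must be sorted)."""
    return (bisect.bisect_right(maf_edges, v[1]),
            bisect.bisect_right(rec_edges, v[2]),
            bisect.bisect_right(bg_edges, v[3]))


def matched_empirical_p(target: Variant, genome: List[Variant],
                        maf_edges: Sequence[float], rec_edges: Sequence[float],
                        bg_edges: Sequence[float],
                        n_draws: int = 1000, seed: int = 0) -> float:
    """One-sided empirical p-value of the target's statistic against randomly
    chosen variants falling in the same covariate bins."""
    key = bin_key(target, maf_edges, rec_edges, bg_edges)
    pool = [v[0] for v in genome
            if bin_key(v, maf_edges, rec_edges, bg_edges) == key]
    if not pool:
        raise ValueError("no matched variants for this covariate bin")
    rng = random.Random(seed)
    draws = rng.sample(pool, min(n_draws, len(pool)))  # without replacement
    exceed = sum(1 for s in draws if s >= target[0])
    # The +1 terms avoid reporting p = 0 from a finite null sample.
    return (exceed + 1) / (len(draws) + 1)


genome = [(0.1, 0.1, 0.5, 0.8), (0.5, 0.1, 0.5, 0.8), (1.0, 0.1, 0.5, 0.8),
          (2.0, 0.1, 0.5, 0.8), (10.0, 0.3, 0.5, 0.8)]
edges = ([0.05, 0.2], [1.0], [0.5])
print(matched_empirical_p((3.0, 0.12, 0.4, 0.9), genome, *edges))  # 0.2
```

The last genome entry falls in a different MAF bin, so its large statistic does not enter the target's null.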

Adaptation in global human populations has been hard, soft and polygenic.

Z. A. Szpiech1 ; R. D. Hernandez1,2,3

1) Department of Bioengineering and Therapeutic Sciences, University of California at San Francisco, San Francisco, CA 94158; 2) Institute for Human Genetics, University of California at San Francisco, San Francisco, CA 94158; 3) Institute for Quantitative Biosciences (QB3), University of California at San Francisco, San Francisco, CA 94158.

There is ample debate about the strength and mode of natural selection in recent human evolution. This is particularly so for classical hard sweeps, during which an adaptive allele quickly drags a single haplotype to high frequency. An alternative model of adaptation involves soft sweeps, whereby multiple haplotypes are brought to high frequency (e.g., when a previously segregating neutral or slightly deleterious allele becomes adaptive in a new environment). Yet another alternative is polygenic selection, whereby complex phenotypes driven by multiple loci across the genome are selected. Here we develop new statistics designed to identify both hard and soft sweeps by tracking the decay of homozygosity of the k most frequent haplotypes away from a core locus. We evaluate our statistics with rigorous simulations under multiple realistic models of human demography and find that they have high power. We then integrate signals of selection across the genome to identify characteristic signals of polygenic selection. We apply our approaches to a large dataset of 1,728 unrelated individuals spanning 20 worldwide human populations from the 1000 Genomes Project. We find a large number of novel regions consistent with soft sweeps, particularly in African populations, and instances of polygenic selection driving the regulatory architecture of several genes. We then use an Approximate Bayesian Computation framework to infer selection parameters for these regions.
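
A minimal sketch of the homozygosity-decay idea, assuming phased haplotypes encoded as equal-length 0/1 strings. Pooling the k most frequent haplotypes into one class follows the spirit of the H12 statistic of Garud et al. (k = 2). The symmetric window scheme, function names, and trapezoid summary are illustrative assumptions rather than the authors' exact statistics. With k > 1, a soft sweep carrying several common haplotypes still produces high pooled homozygosity.

```python
from collections import Counter
from typing import List


def pooled_homozygosity(haps: List[str], k: int) -> float:
    """Homozygosity after merging the k most frequent haplotypes into one class:
    (p1 + ... + pk)^2 + sum of squares of the remaining frequencies."""
    n = len(haps)
    freqs = [c / n for c in sorted(Counter(haps).values(), reverse=True)]
    top = sum(freqs[:k])
    return top * top + sum(f * f for f in freqs[k:])


def homozygosity_decay(haps: List[str], core: int, max_dist: int, k: int) -> List[float]:
    """Pooled homozygosity of windows [core - d, core + d] for d = 0..max_dist,
    clipped at the ends of the haplotypes."""
    length = len(haps[0])
    profile = []
    for d in range(max_dist + 1):
        lo, hi = max(0, core - d), min(length, core + d + 1)
        profile.append(pooled_homozygosity([h[lo:hi] for h in haps], k))
    return profile


def area_under_decay(profile: List[float]) -> float:
    """Trapezoid-rule area with unit spacing; larger values mean slower decay."""
    return sum((a + b) / 2.0 for a, b in zip(profile, profile[1:]))


# Two equally common haplotypes, as after a soft sweep on two backgrounds.
soft = ["0000000"] * 5 + ["1111111"] * 5
print(homozygosity_decay(soft, core=3, max_dist=3, k=1))  # [0.5, 0.5, 0.5, 0.5]
print(homozygosity_decay(soft, core=3, max_dist=3, k=2))  # [1.0, 1.0, 1.0, 1.0]
```

With k = 1 the soft sweep looks only moderately homozygous, while k = 2 reveals that nearly all haplotypes belong to the two sweeping classes.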

The relative effective population size of chromosome X and the autosomes along distinct branches of the human population tree.

L. Arbiza; A. Keinan

Biological Statistics and Computational Biology, Cornell University, Ithaca, NY.

In recent years, many studies have focused on the effective population size of chromosome X relative to the autosomes. This comparison can help reveal past demographic processes, differences in the histories of males and females, and the action of natural selection. We have recently shown how the ratio of nucleotide diversity between the two (X-to-autosome ratio; X/A), when compared between pairs of populations (relative X/A), can be used to uncover sex-biased processes in human history. While this strategy helps to cancel out the influence on genetic diversity of events that largely predate the split of the studied populations, a different and more natural approach to capturing recent changes that occurred after populations split can be based on the differentiation of allele frequencies between populations, as commonly summarized by the FST statistic. Here, we consider population differentiation in humans and go beyond simple pairwise comparisons, using allele frequency differences across several populations to learn about the ratio of X-to-autosomal effective population size along distinct branches in the tree of human populations. We then test these ratios for deviations from the expectation under equal female-to-male breeding ratios, as well as for differences between branches. Using coalescent simulations of a variety of previously published human demographic models, we show that our approach captures the ratio of interest and is more accurate than estimates based only on pairwise FST across all pairs of populations. We then turn to the latest data from the 1000 Genomes Project, controlling for the effect of uncertainty associated with low-coverage sequencing, as well as for the influence of linked selection (background selection or hitchhiking), all of which affect the X chromosome and the autosomes differently.
Estimating the X-to-autosomal effective population size ratio for branches leading to different 1000 Genomes populations, as well as for internal branches in the population tree, points to a higher female effective population size in African-specific population history, but not in non-Africans. More interestingly, we localize previously debated observations to a significant increase in male effective population size on the branch leading to all non-African populations, suggesting male-biased processes associated with the Out-of-Africa event.
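
The pairwise building block can be sketched in Python, assuming pure drift, so that -ln(1 - FST) is proportional to t/(2 Ne) and is approximately additive along a tree. The split of three pairwise distances into terminal branches is a hypothetical simplification of the branch-wise approach described above, not the authors' estimator. It also ignores the corrections for low coverage and linked selection that they apply.

```python
import math
from typing import Dict, Tuple


def drift_length(fst: float) -> float:
    """Under pure drift FST ~= 1 - exp(-t / (2 Ne)), so -ln(1 - FST) ~= t / (2 Ne)."""
    return -math.log(1.0 - fst)


def q_from_fst(fst_x: float, fst_a: float) -> float:
    """Pairwise Ne(X)/Ne(A): the shared split time t cancels in the ratio."""
    return drift_length(fst_a) / drift_length(fst_x)


def star_branches(d_ab: float, d_ac: float, d_bc: float) -> Tuple[float, float, float]:
    """Terminal branch lengths of a three-population tree from additive pairwise distances."""
    return ((d_ab + d_ac - d_bc) / 2.0,
            (d_ab + d_bc - d_ac) / 2.0,
            (d_ac + d_bc - d_ab) / 2.0)


def branch_q(fst_x: Dict[str, float], fst_a: Dict[str, float]) -> Tuple[float, ...]:
    """Branch-specific Ne(X)/Ne(A) for populations a, b, c, given pairwise FST
    keyed 'ab', 'ac', 'bc' for chromosome X and for the autosomes."""
    keys = ("ab", "ac", "bc")
    bx = star_branches(*(drift_length(fst_x[k]) for k in keys))
    ba = star_branches(*(drift_length(fst_a[k]) for k in keys))
    return tuple(a / x for a, x in zip(ba, bx))


# Hypothetical pair: X drifts 1/0.75 times faster than the autosomes.
print(q_from_fst(1 - math.exp(-0.1 / 0.75), 1 - math.exp(-0.1)))  # about 0.75
```

Because drift accumulates faster on a smaller population, a larger FST on X than on the autosomes translates into Q below 1.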

Estimation of growth rates for populations and haplogroups using full Y chromosome sequences.

F. L. Mendez; G. D. Poznik; C. D. Bustamante; 1000 Genomes Project Consortium

Department of Genetics, Stanford University, Stanford, CA.

Evolutionary processes affecting a population influence gene genealogies across the genome. Coalescent theory provides the mathematical framework to connect realized genealogies to the underlying evolutionary processes. However, in most cases, information about the genealogies is obtained only indirectly through the observation of genetic variation. Therefore, in general, very limited information about any individual locus is available. As the longest non-recombining portion of the human genome, the Y chromosome accumulates mutations relatively quickly. When large amounts of sequence are used, the Y chromosome provides an unparalleled ability to resolve the structure and coalescence times of its genealogy. Because patterns of variation in the Y chromosome are influenced only by processes affecting men, they can be used to study both demographic and social phenomena. The 1000 Genomes Project includes whole Y-chromosome data from more than 1000 men and has an extensive representation of most lineages that have experienced recent massive expansions in size. Though the dynamics of population growth have likely changed over time, we are more interested in the growth rates at the times of these rapid expansions than in an average effect. To study this, we have developed a new method that takes advantage of the temporal resolution provided by Y-chromosome data and of historical data, while accounting for the uncertainties associated with the coalescent and mutational processes. We estimate the growth rates for several branches of the Y-chromosome tree, including those in Europe, sub-Saharan Africa and South Asia. We estimate that several lineages within the European R1b, sub-Saharan African E1b, and South Asian R1a haplogroups experienced growth rates of at least 20-60% per generation at the onset of their massive expansions, some 3-5 thousand years ago. These high growth rates are comparable to those experienced by human populations during the 20th century.
However, we find that most observed genealogies are unlikely to be the result of whole population expansion or of natural selection.
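
The per-generation rates quoted above can be related to lineage expansion with simple compound-growth arithmetic. This Python sketch assumes constant exponential growth and ignores the coalescent and mutational uncertainty that the authors' method accounts for; the function names are illustrative.

```python
import math


def per_generation_growth(n_start: float, n_end: float, generations: float) -> float:
    """Constant rate r such that n_end = n_start * (1 + r) ** generations."""
    return (n_end / n_start) ** (1.0 / generations) - 1.0


def generations_to_grow(fold: float, rate: float) -> float:
    """Generations needed for a `fold`-fold increase at per-generation rate `rate`."""
    return math.log(fold) / math.log1p(rate)


for r in (0.2, 0.6):
    print(r, generations_to_grow(1000.0, r))
```

At 20% per generation a lineage grows 1,000-fold in about 38 generations; at 60% it takes about 15, which shows how such rates can produce star-like Y-chromosome expansions within a few centuries.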

Polygenic Adaptation Regression Analysis.

Y. Field1,2 ; N. Telis3 ; E. A. Boyle1,4 ; D. Golan1,5 ; J. K. Pritchard1,2,6

1) Genetics, Stanford University; 2) Howard Hughes Medical Institute; 3) Biomedical Informatics, Stanford University; 4) Stanford School of Medicine; 5) Statistics, Stanford University; 6) Biology, Stanford University.

Understanding how natural selection has shaped existing genetic variation within humans is a major goal in population genetics. With the growing understanding that many human diseases and complex traits have a polygenic genetic architecture, it has been hypothesized that adaptation in recent human history might be largely polygenic as well. The increased frequency in northern Europe of many alleles associated with tall stature has been the major example supporting the polygenic adaptation model. However, beyond this outstanding example, the nature and extent of polygenic adaptation in recent human history is still poorly understood. Current methods for testing for polygenic adaptation, based on allele frequency differences between populations, do not account for linkage disequilibrium between loci. As a result, there is no general framework for testing for adaptation over one set of functionally related loci while controlling for possible causal effects (on allele frequency differences) of other, genetically linked genomic features. For example, one would like to test for adaptation among known GWAS hits while controlling for selection on height, to test for selection within regulatory regions while controlling for possible selection on non-synonymous sites, or to control for admixture effects on allele frequency differences. To address this need, we have developed POLARIS, a novel and general method for POLygenic Adaptation Regression analysIS. Our method is based on a multivariate normal model for the frequency differences between populations, which is structured to explicitly represent linkage disequilibrium, drift and annotation-dependent polygenic adaptation. The method allows one to test and control for annotation-dependent effects on both the mean and the variance of allele frequency differences, giving it great flexibility to mix directed and undirected hypotheses.
As we demonstrate with an initial analysis of publicly available datasets, POLARIS opens the road to a richer and more extensive characterization of the nature and extent of polygenic adaptation in recent human history.
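
This is not POLARIS. It is a generic generalized least squares (GLS) sketch in Python/NumPy showing how a multivariate normal model with an LD-aware covariance can test an annotation's effect on the mean frequency difference while controlling for another covariate. It assumes the neutral covariance matrix is known, omits the variance (undirected) part of the model, and uses toy data with an AR(1)-style LD decay chosen for illustration.

```python
import numpy as np


def gls_annotation_test(delta: np.ndarray, design: np.ndarray, sigma: np.ndarray):
    """Generalized least squares for delta ~ N(design @ beta, s2 * sigma).

    delta  : (m,) allele-frequency differences between two populations
    design : (m, p) intercept plus annotation columns (e.g. a GWAS-hit
             indicator and a covariate to control for)
    sigma  : (m, m) neutral covariance of delta, including LD between nearby
             SNPs; assumed known here
    Returns the coefficient estimates and their z-scores.
    """
    # Whiten with the Cholesky factor: if sigma = L L^T, then L^-1 delta has
    # identity covariance (up to s2), so ordinary least squares applies.
    chol = np.linalg.cholesky(sigma)
    y_w = np.linalg.solve(chol, delta)
    x_w = np.linalg.solve(chol, design)
    beta, *_ = np.linalg.lstsq(x_w, y_w, rcond=None)
    resid = y_w - x_w @ beta
    m, p = x_w.shape
    s2 = float(resid @ resid) / (m - p)  # overall drift scale, estimated from residuals
    cov_beta = s2 * np.linalg.inv(x_w.T @ x_w)
    return beta, beta / np.sqrt(np.diag(cov_beta))


rng = np.random.default_rng(0)
m = 200
idx = np.arange(m)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # toy LD decay between neighbors
design = np.column_stack([np.ones(m), (idx % 2).astype(float)])
delta = design @ np.array([0.0, 0.05]) + 0.01 * np.linalg.cholesky(sigma) @ rng.standard_normal(m)
beta, z = gls_annotation_test(delta, design, sigma)
print(beta, z)
```

Whitening by the Cholesky factor is what lets correlated (linked) SNPs be analyzed jointly without overcounting their evidence.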

Rare variants are a large source of heritability for gene expression patterns.

R. Hernandez1,2,3 ; D. Vasco1 ; L. Uricchio1,4 ; C. Ye2,5 ; N. Zaitlen6,2,3

1) Bioeng. & Therapeutic Sci, UCSF, San Francisco, CA; 2) Institute for Human Genetics, UCSF, San Francisco, CA; 3) Institute for Quantitative Biosciences, UCSF, San Francisco, CA; 4) Department of Biology, Stanford University, Stanford, CA; 5) Epidemiology & Biostatistics, UCSF, San Francisco, CA; 6) Department of Medicine Lung Biology Center, UCSF, San Francisco, CA.

Understanding the genetic architecture of complex traits is a central challenge in human genetics. There currently exists a large disparity between heritability estimates from family-based studies and from large-scale genome-wide association studies (GWAS), which has been sensationalized as the “missing heritability problem”. Among the possible explanations for this disparity are rare variants of large effect that are not tagged by genotyping platforms. However, recent population genetic models suggest that the conditions under which rare variants are expected to contribute substantially to heritability may be fairly limited. To better understand the heritability of complex phenotypes, we investigated the role of cis alleles in gene expression levels across European and African individuals using RNA and whole genome sequencing data from the GEUVADIS and 1000 Genomes Projects. In particular, we investigate whether rare variants are likely to be a source of missing heritability in expression across genes. Using variance-component methods, we partitioned the heritability of expression levels explained by cis variants for each gene in the genome across several frequency bins, from rare (≤1%) to common (>10%). We performed extensive simulations to validate our heritability estimation procedure. We find that when pooling all variants in cis (within 500kb of a gene), heritability estimates average h_c² = 17.6% (with 4.7% of genes having h_c² > 50%). In cis, rare variants (MAF ≤ 1%) contribute significantly more heritability than common variants (MAF > 10%) across genes (Mann-Whitney U test p = 1.1×10⁻⁶). In particular, 35.6% of h_c² across genes is contributed by rare variants, while common variants contribute 22.3%. This observation suggests that rare variants play a substantial role in the heritability of gene expression patterns, which is inconsistent with neutral evolutionary forces operating on the cis-regulatory architecture of most genes.
We discuss our results in the light of recent population genetic models of quantitative traits, and highlight the importance of understanding how natural selection can shape the genetic architecture of gene expression in humans. We conclude by discussing implications for studying a variety of complex phenotypes in humans.
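
The abstract does not name its variance-component estimator. As a simpler, swapped-in illustration of partitioning cis heritability by frequency bin, the Python/NumPy sketch below uses Haseman-Elston regression with one genetic relationship matrix (GRM) per MAF bin. The simulated genotypes and pure-noise phenotype are toy data, and the function names are hypothetical.

```python
import numpy as np


def standardized_grm(genotypes: np.ndarray) -> np.ndarray:
    """GRM from an (n individuals, m SNPs) 0/1/2 matrix with standardized columns."""
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0) & (p < 1)  # monomorphic SNPs carry no information
    if not keep.any():
        raise ValueError("no polymorphic SNPs in this bin")
    z = (g[:, keep] - 2.0 * p[keep]) / np.sqrt(2.0 * p[keep] * (1.0 - p[keep]))
    return z @ z.T / z.shape[1]


def he_partition(y, genotypes, maf, bins):
    """Haseman-Elston regression with one GRM per MAF bin (lo, hi].

    With y standardized to variance 1, E[y_i * y_j] = sum_b h2_b * K_b[i, j]
    for i != j, so regressing off-diagonal phenotype products on GRM entries
    gives one heritability estimate per bin.
    """
    y = np.asarray(y, dtype=float)
    y = (y - y.mean()) / y.std()
    iu = np.triu_indices(len(y), k=1)  # off-diagonal pairs only
    cols = []
    for lo, hi in bins:
        sel = (maf > lo) & (maf <= hi)
        cols.append(standardized_grm(genotypes[:, sel])[iu])
    x = np.column_stack(cols)
    h2, *_ = np.linalg.lstsq(x, np.outer(y, y)[iu], rcond=None)
    return h2


rng = np.random.default_rng(1)
n, m = 300, 400
freqs = np.concatenate([np.full(200, 0.005), rng.uniform(0.05, 0.5, 200)])
geno = rng.binomial(2, freqs, size=(n, m)).astype(float)
af = geno.mean(axis=0) / 2.0
maf = np.minimum(af, 1.0 - af)
y = rng.standard_normal(n)  # pure-noise phenotype: estimates are noisy, centered near 0
print(he_partition(y, geno, maf, [(0.0, 0.01), (0.01, 0.5)]))
```

Because each SNP is standardized, rare variants receive large per-SNP weights in their GRM, which is one reason estimates for rare bins need large samples and careful validation.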

Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia.

K. J. Galinsky1,2 ; G. Bhatia2,3 ; P. Loh2,3 ; S. Georgiev4 ; S. Mukherjee5 ; N. J. Patterson2 ; A. L. Price1,2,3

1) Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA; 2) Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA; 3) Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA; 4) Google, Palo Alto, CA; 5) Departments of Statistical Science, Computer Science, and Mathematics, Duke University, Durham, NC.

Population differentiation is a widely used approach to detect the action of natural selection. Existing methods search for unusual differentiation in allele frequencies across discrete populations, e.g. using FST. Loci that are unusually differentiated with respect to the genome-wide FST, or with respect to a null distribution of FST, are reported as signals of selection. These approaches are particularly powerful for closely related populations with large sample sizes. However, population genetic data are often not naturally partitioned into discrete populations. We developed a test for selection that uses SNP loadings from principal components analysis (PCA). For a given PC reflecting geographic ancestry, under the null hypothesis of no selection, the square of the SNP loadings, rescaled by a scaling factor derived from the eigenvalue of the PC, follows a chi-square (1 d.o.f.) distribution. This statistic can detect selection at genome-wide significance, a key consideration in genome scans for selection. We confirmed via simulations that this statistic is correctly calibrated under the null across a wide range of demographies and is well powered to detect selection at large sample sizes. We applied the method to a cohort of 54,734 European Americans genotyped on genome-wide arrays. PCs were inferred using our FastPCA software (running time: 57 minutes). The top 4 PCs corresponded to clines of Irish, Eastern European, Northern European, Southeast European and Ashkenazi Jewish ancestry, validated via PCA projection of samples of known ancestry. We detected genome-wide significant signals of selection at 4 known selected loci (LCT, HLA, OCA2 and IRF4) and 3 novel loci: ADH1B, IGFBP3 and IGH. Two of the 3 novel loci could not be detected using discrete-population tests (or other existing tests).
The ADH1B gene is associated with alcoholism (via the same coding SNP rs1229984 producing a signal in our selection scan) and has been shown to be under recent selection in East Asians (via a haplotype-based test for recent selection); we show here that it is a rare example of independent evolution on two continents. The IGFBP3 gene and IGH locus have been implicated in breast cancer and multiple sclerosis, respectively. Our results show that application of our PC-based selection statistic to large data sets can infer novel, genome-wide significant signals of selection at loci linked to disease traits.
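
The chi-square statistic described above can be sketched with NumPy under one assumed normalization, in which each SNP row is standardized across the n individuals before the SVD X = U S V^T. Then s_k * u_ik equals sqrt(n) times the correlation between SNP i and PC k, so (s_k * u_ik)^2 = n r^2, which is approximately chi-square with 1 d.o.f. under the null. Since s_k^2 is the k-th eigenvalue of X X^T, this is a squared loading rescaled by the eigenvalue. The authors' exact scaling and their FastPCA implementation may differ, and a full SVD is only practical for small toy data.

```python
import math

import numpy as np


def pc_selection_stats(genotypes: np.ndarray, k: int):
    """Chi-square(1) statistics and p-values for PC index k (0-based).

    genotypes : (m SNPs, n individuals) allele counts; every SNP must be polymorphic.
    """
    g = np.asarray(genotypes, dtype=float)
    x = (g - g.mean(axis=1, keepdims=True)) / g.std(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    # Squared loading times eigenvalue s_k^2 equals n * corr(SNP, PC)^2.
    stat = (s[k] * u[:, k]) ** 2
    # Chi-square(1) survival function: P(X > t) = erfc(sqrt(t / 2)).
    pvals = np.array([math.erfc(math.sqrt(t / 2.0)) for t in stat])
    return stat, pvals


rng = np.random.default_rng(2)
geno = rng.binomial(2, 0.3, size=(50, 40)).astype(float)  # 50 SNPs x 40 individuals
stat, pvals = pc_selection_stats(geno, k=0)
print(stat[:5], pvals[:5])
```

Tying the statistic to a per-SNP correlation with the PC is what makes it comparable across SNPs and calibratable against a single chi-square distribution.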

Genetic origins and admixed ancestry characterization of Japanese people.

W. Ko1,2 ; K. Higasa3 ; M. Narahara2 ; F. Matsuda3 ; R. Yamada2

1) Faculty of Life Sciences and Institute of Genome Sciences, National Yang-Ming University, Taipei, 112, Taiwan; 2) Statistical Genetics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, 606-8507, Japan; 3) Human Disease Genomics, Center for Genomic Medicine, Kyoto University Graduate School of Medicine, Kyoto, 606-8507, Japan.

A modern human population found at a given geographic location is often descended from multiple ethnic groups, owing to the complex migration history of human expansion. Although the Japanese population has been studied extensively over the past decades, the genetic origins of the Japanese people remain controversial. Current genetic evidence supports a dual model, which suggests that the Japanese people derive mainly from an early settlement of human populations during the Upper Paleolithic period (i.e., the Jomon people), followed by admixture with people who migrated from the Korean peninsula around 2,300 years ago (i.e., the Yayoi people). However, the genetic origin(s) of the native Jomon remain unclear. Tracing the genomic signatures of admixture history can not only reveal unknown human migration events but also provide information that facilitates the genetic profiling of disease susceptibility, which is critical for the success of personalized medicine. Here, we analyzed a combined dataset of whole genome SNP genotyping data from 2,277 individuals sampled globally across >100 populations, for a total of 19,290 SNPs (after intersecting the two datasets). We performed principal component analysis to project individuals onto a series of orthogonal axes and reveal the genetic structure among diverse ethnic groups. After separating out the genetic components contributed by the populations representing the Yayoi, we identified several candidate populations that share common non-Yayoi ancestry with modern Japanese people. Our results suggest that the genetic origins of the Jomon may involve multiple migration events from both Southeast and Northeast Asia. Surprisingly, we also identified an additional migration wave from the Hmong population.
We assigned local ancestry (LA) on the phased chromosomes of mainland and Okinawa Japanese using RFMix, which uses the identified candidate ancestral populations to infer LA tracts in admixed chromosomes by finding the most likely sequence of ancestries through maximum a posteriori estimation. Because an ancient admixture event allows more recombination to break LA tracts into shorter segments than a recent admixture event does, the LA tract-length distributions we observe, which differ significantly among the Yayoi, Hmong, and Jomon ancestries (in descending order of tract length), suggest that the Hmong migration may have occurred before the Yayoi migration.
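
The intuition that older admixture leaves shorter tracts can be made quantitative under a single-pulse model, in which tracts of an ancestry contributing a fraction m of the genome have roughly exponential lengths with rate T(1 - m) per Morgan after T generations. The Python sketch below inverts that relation. It is a simplification the abstract does not claim to use, it assumes tract lengths measured in Morgans on a genetic map, and the inputs are hypothetical.

```python
def admixture_generations(mean_tract_morgans: float, ancestry_fraction: float) -> float:
    """Approximate generations since a single admixture pulse. Tracts of an
    ancestry contributing fraction m of the genome are roughly exponential
    with rate T * (1 - m) per Morgan, so T ~= 1 / (mean_length * (1 - m))."""
    if not 0.0 <= ancestry_fraction < 1.0:
        raise ValueError("ancestry_fraction must be in [0, 1)")
    return 1.0 / (mean_tract_morgans * (1.0 - ancestry_fraction))


# Hypothetical inputs: at the same ancestry fraction, shorter tracts imply older admixture.
recent = admixture_generations(0.10, 0.3)  # mean tract 10 cM
older = admixture_generations(0.02, 0.3)   # mean tract 2 cM
print(recent, older)
```

The (1 - m) factor matters when comparing ancestries with different genome-wide proportions, since a more common ancestry also yields longer tracts at the same admixture time.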

The genetic structure of the Saudi Arabian population.

H. Al-Saud1 ; SM. Wakil1 ; BF. Meyer1 ; M. Falchi2 ; N. Dzimiri1

1) Genetics Department, King Faisal Hospital and Research Centre, Riyadh, Saudi Arabia; 2) Department of Twin Research and Genetic Epidemiology, King’s College London, London, United Kingdom.

Saudi Arabia is the largest Gulf Cooperation Council (GCC) country. Its population consists of different tribes that originated in the northern, western, eastern, central and southern regions of Saudi Arabia. For political and cultural reasons, there has historically been very limited admixture between different tribes. People from the different Saudi tribes later migrated out of Saudi Arabia, contributing to the founding of the populations now inhabiting other Gulf countries. Few population genetics studies have been conducted on this highly consanguineous population, which has been shown to have one of the world’s highest prevalences of recessive disorders and common metabolic diseases, especially diabetes. It is therefore important to identify the genetic substructure of the Saudi population, both to help trace the migratory gene flows that contributed to other Gulf populations and to permit the design of efficient genetic studies aimed at identifying risk factors underlying common and rare diseases in the GCC countries. We carried out the largest population genetic study in Saudi Arabia to date, genotyping 2,150 Saudi nationals sampled from different regions of Saudi Arabia on Axiom GWH-96 arrays (Affymetrix). We applied model-based and model-free clustering to these data, also including data on eight populations (encompassing Europe, America, Oceania, East Asia, Central South Asia, the Middle East, Africa and Qatar) from the Human Genome Diversity Project (HGDP) data set.
We identified clear clustering of the Saudi samples into different subgroups, with some tribes showing similarity with Central South Asian (Kalash, Balochi, Sindhi, Makrani and Brahui subpopulations from Pakistan), European (Orkney Islands, Russian and Russian Caucasus subpopulations) and Qatari populations, while other tribes appear to have a distinct genetic background. These data strongly support the presence of genetic stratification within the Saudi population and suggest the presence of subgroups characterized by a unique genetic background that differs from other Arabian populations. Our findings constitute a valuable resource for investigating both general and population-specific genetic risk variants associated with different disorders in this population.

Recent genetic history of Denmark.

G. Athanasiadis1,2 ; F. G. Jørgensen3 ; J. Cheng1,2 ; P. C. Kjærgaard2,4 ; M. H. Schierup1,2,5 ; T. Mailund1,2

1) Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark; 2) Centre for Biocultural History, Aarhus University, Aarhus, Denmark; 3) Tørring Gymnasium, Tørring, Denmark; 4) Department of Culture and Society, Aarhus University, Aarhus, Denmark; 5) Department of Bioscience, Aarhus University, Aarhus, Denmark.

Purpose. Denmark has strong historical bonds not only with Norway and Sweden, but also with Western and Eastern Europe through a series of invasions, conquests and alliances. In addition, within Denmark, industrialization in the second half of the 19th century led to considerable migration from the countryside to the cities. In this work we explore the extent to which such distant and more recent historical events left their mark on the genetic structure of the current Danish population. Methods. We ran an extensive genetic analysis on the “Where Are You From?” data set of ~600 students from 36 high schools across Denmark. Each student provided a saliva sample for DNA analysis and completed an online questionnaire about family origin, education level and basic biometric data. All participants gave their informed consent, and the Ethical Committee of the University of Aarhus approved the study. Genotyping was outsourced to 23andMe, and more than 500,000 SNPs were available for analysis. After merging our data with data from POPRES, we ran PCA and ADMIXTURE to detect genetic structure. For finer-grained effects, we identified each individual’s closest genetic relatives through IBD tract sharing and calculated the geographic distance between the individual’s place of birth and the weighted average geographic coordinates of their closest relatives. Finally, we explored population structure within Denmark resulting from recent admixture with adjacent populations using an IBD-based local ancestry method (i.e., “chromosome painting”). Results. Although Denmark forms a cluster distinguishable from neighboring countries in the PCA plots (compatible with isolation by distance), no strong structure was observed within the country. Similarly, ADMIXTURE revealed high levels of homogeneity in the Danish samples compared with other North European countries.
However, we did observe a significant correlation between PC1 (south-north orientation) and average grandparental geographic coordinates rotated ~30° clockwise. Also, the IBD-based geographic correlation analysis revealed that Danes tend to live near their closest genetic relatives, at a median distance of 100 km, significantly closer than the random expectation. Finally, chromosome painting revealed strong genetic influence from neighboring Nordic (Sweden and Norway) and Germanic (Germany and Holland) countries and negligible influence from Finland, France and Portugal.
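The IBD-based geographic analysis comes down to two small computations: a weighted average of the relatives' coordinates and a great-circle distance from the birthplace to that average. Below is a sketch in Python, assuming weights such as shared IBD length (the abstract does not say which weights were used) and a plain weighted mean of latitude and longitude, which is adequate over an area the size of Denmark but not across continents. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_relatives(birth, relatives):
    """Distance from a birthplace to the weighted centroid of close relatives.

    birth: (lat, lon) in degrees.
    relatives: list of (lat, lon, weight); the weight could be, e.g., total
    shared IBD length (an assumption, not stated in the abstract).
    """
    total = sum(w for _, _, w in relatives)
    lat = sum(la * w for la, _, w in relatives) / total
    lon = sum(lo * w for _, lo, w in relatives) / total
    return haversine_km(birth[0], birth[1], lat, lon)
```

Comparing each individual's distance with the same quantity computed for randomly chosen "relatives" gives the kind of random expectation the abstract refers to.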

Assessing the benefits of priors that encourage sparsity for estimating ancestral admixture from genome-wide data.

P. Carbonetto; Y. Wang; K. Noto; M. Barber; J. Byrnes; R. Curtis; K. Chahine; J. Granka; E. Han; A. Kermany; N. Myres; C. Ball

AncestryDNA, San Francisco, CA.

Several recent papers have demonstrated the benefits of using sparse matrix factorization techniques (sparse factor analysis and non-negative matrix factorization) to infer population structure from genetic polymorphism data. The primary strength of sparse matrix factorization is its flexibility; it can capture a wide range of population structure scenarios, and can do so in a way that often has a natural interpretation. For example, sparse matrix factorization can recapitulate a mixture of continuous and discrete population structure, whereas other methods, such as PCA and STRUCTURE, cannot. However, we have found that this flexibility can come at a cost: in realistic demographic settings, it produces inaccurate estimates of individual admixture proportions. We hypothesize that this is because sparse matrix factorization does not fully specify an admixture model. Motivated by this, we propose a model-based approach, building on ADMIXTURE, that encourages sparsity in the admixture proportions (or “loadings”). We encourage sparse estimates by introducing an exact L0-norm penalty term into the cost function that penalizes non-zero admixture proportions, and then iteratively solve for the model parameters using a hybrid EM algorithm. This penalty can also be interpreted as a prior on the number of ancestral populations contributing to an individual’s genome. We explore the behaviour of penalized and unpenalized admixture estimates in data from the Human Genome Diversity Project. Although the idea of encouraging sparse admixture estimates has been suggested previously, to our knowledge the features of this approach have not been empirically assessed in real genetic data from human populations.
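To show what an L0 penalty on admixture proportions looks like, here is the penalized objective for a single individual under the standard ADMIXTURE binomial likelihood, written in plain Python. It only evaluates the cost function; it does not implement the authors' hybrid EM optimizer, and the penalty weight `lam` and the helper name are hypothetical.

```python
import math


def penalized_neg_loglik(genotypes, q, freqs, lam, tol=1e-12):
    """ADMIXTURE-style binomial negative log-likelihood plus an L0 penalty.

    genotypes: allele counts (0, 1, 2) at each SNP for one individual.
    q: admixture proportions over K ancestral populations (sums to 1).
    freqs: freqs[k][j] is the allele frequency at SNP j in population k.
    lam: cost added per non-zero admixture proportion (hypothetical value).
    """
    nll = 0.0
    for j, g in enumerate(genotypes):
        # Individual-specific allele frequency: mixture over ancestral populations.
        p = sum(qk * freqs[k][j] for k, qk in enumerate(q))
        p = min(max(p, tol), 1 - tol)  # guard against log(0)
        nll -= g * math.log(p) + (2 - g) * math.log(1 - p)
    # L0 "norm": the number of ancestral populations actually contributing.
    n_nonzero = sum(1 for qk in q if qk > tol)
    return nll + lam * n_nonzero
```

With `lam = 0` this reduces to the unpenalized likelihood; each extra non-zero proportion adds `lam` to the cost, so a fit with fewer contributing populations is preferred unless the extra component improves the likelihood by more than `lam`.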

Sex-Biased Admixture in the Americas.

S. Musharoff1 ; C. R. Gignoux1 ; S. Shringarpure1 ; M. A. Taub2 ; T. O’Connor3; R. A. Mathias4 ; C. D. Bustamante1 ; K. C. Barnes4 ; CAAPA Consortium

1) Genetics, Stanford University School of Medicine, Stanford, CA; 2) Department of Biostatistics, Johns Hopkins University, Baltimore, MD; 3) University of Maryland School of Medicine, Baltimore, MD; 4) Division of Allergy & Clinical Immunology Department of Medicine, Johns Hopkins University, Baltimore, MD.

We studied sex-biased population histories using high-coverage (~30x depth) whole genomes from the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA). CAAPA comprises 673 individuals who are African-American, African, Afro-Caribbean (Barbados, Jamaica), or Latin-American (Colombia, Brazil, Puerto Rico, Honduras, Dominican Republic). X chromosomes show reduced European ancestry as estimated with ADMIXTURE, consistent with a history of European male-driven colonization. CAAPA Latin Americans have female-biased Native American ancestry (5.36% mean excess X-chromosomal), male-biased European ancestry (1.36% mean excess autosomal), and female-biased African ancestry (6.72% mean excess X-chromosomal). Some CAAPA African-descent populations had never been studied genetically. The Garifuna from Honduras have very little autosomal European ancestry (2.2%) but high Native American ancestry (16.6%). The Afro-Brazilians from Condé have a high proportion of African ancestry (50.5%). The Cartagena Colombians (from one of two slave ports in South and Central America) have more African ancestry than the TGP Colombians (CLM): on average, CAAPA individuals have 31.1% autosomal and 29.7% X-chromosomal African ancestry, while TGP CLM have 7.7% and 6.8%, respectively. Y and MT haplotype analyses support the above sex-biased admixture findings: Afro-Caribbeans have African mitochondria and Latin Americans have a mix of African and Native American mitochondria, yet both groups have mostly European Y chromosomes. We identify three Native American Y haplotypes in the Honduran Garifuna only, highlighting their unique history. Unexpectedly, we identified a new subgroup of MT-E1a1a that suggests a connection with the Malagasy slave trade. We apply a novel method to autosomal and X-chromosomal site frequency spectra to infer sex-biased demography during specific time epochs.
CAAPA Latin Americans show evidence of a female bias over a longer time scale, male-biased bottlenecks out of Africa and into the Americas, and male-biased admixture events. We analyze ancestry tracts with the program TRACTS to estimate the timing and magnitude of sex-biased admixture events. Overall, our findings recapitulate the complex history of the Americas and highlight key differences between populations based on their local admixture histories. Because some of these unique populations are studied here for the first time, these data represent a valuable population and medical genetic resource.
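A common back-of-the-envelope way to relate X-chromosomal and autosomal ancestry to sex-specific contributions assumes a single admixture pulse. Because females carry two X chromosomes and males one, a source population's X-chromosomal share is (2·s_f + s_m)/3 while its autosomal share is (s_f + s_m)/2, where s_f and s_m are the fractions of female and male founders from that source. Solving these two equations gives the sketch below. This simplified model is not the site-frequency-spectrum or TRACTS methods the abstract describes, and the input values in the test are hypothetical.

```python
def sex_specific_contributions(autosomal, x_chrom):
    """Solve for female (s_f) and male (s_m) contributions of one source.

    Assumes a single admixture pulse, so that
        autosomal = (s_f + s_m) / 2
        x_chrom   = (2 * s_f + s_m) / 3
    Results outside [0, 1] signal that this simple model does not fit
    (e.g., continuous or multi-pulse gene flow).
    """
    s_f = 3 * x_chrom - 2 * autosomal
    s_m = 4 * autosomal - 3 * x_chrom
    return s_f, s_m
```

For instance, a source at 50% autosomal and 60% X-chromosomal ancestry would correspond to roughly 80% of female and 20% of male founders under this model, i.e., a strongly female-biased contribution.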

The Demographic Patterns Revealed by New World African Diaspora Genome.

W. Song1 ; R. A. Mathias2 ; K. C. Barnes2 ; T. D. O’Connor1 ; CAAPA Consortium

1) Medicine, University of Maryland, Baltimore, MD; 2) Medicine, Johns Hopkins University, Baltimore, MD.

One of the great interests in human genetics research is understanding human population structure, demographic patterns, and evolutionary history. New World populations, such as African Americans and Latino Americans with African ancestry, provide good examples for studying large migration and admixture events in recent human history. Three questions in particular are: 1) where the different sources of admixture came from, 2) when admixture happened, and 3) how subpopulations differ. To answer these questions we use the Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA), which contains high-coverage (~30× depth) whole genome sequence data for 952 individuals of African ancestry. These individuals were selected from populations in North and South America, the Caribbean and continental Africa to span a broad spectrum of the New World African Diaspora. We merged all CAAPA data with 1,963 individuals from the publicly available Human Origins genotype data. After filtering out rare variants (MAF < 5%), 389,397 autosomal SNPs remain for the analysis of population admixture in this work. The filtered SNPs are first phased using SHAPEIT with the 1000 Genomes Project reference panel. PCA-based local ancestry estimation on the CAAPA dataset is performed with PCAdmix, using continental reference samples from the Human Origins dataset. Ancestry-specific PCA (ASPCA) analysis with PCAmask, in which ancestry-specific regions from Europeans, Africans, and Native Americans are masked in PCA with sub-continental reference panels, reveals that the European ancestry in these New World African Diaspora populations comes from two main parts of Europe: the Northwest (English/French) and the Southwest (Spanish). We use MALDER and TRACTS to identify the timing of admixture in these populations.
The African and Native American ancestries admixed with each other about 13-16 generations ago, and European ancestry later entered these populations 6-8 generations ago. We show that the origin and timing of European introgression differ between New World African-ancestry populations. Our results reflect the ancestry patterns of African admixed populations in the Americas and provide a general pipeline for studying the evolutionary history of other New World populations.
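The MAF < 5% filtering step is simple enough to sketch directly. This assumes a NumPy array of 0/1/2 genotypes with missing calls already removed or imputed; in practice the step is usually run with a tool such as PLINK (`--maf 0.05`), and the helper below is only illustrative.

```python
import numpy as np


def maf_filter(genotypes, min_maf=0.05):
    """Keep SNP columns whose minor allele frequency is at least min_maf.

    genotypes: (n_individuals, n_snps) array of allele counts (0, 1, 2),
    with missing calls assumed to be already handled.
    Returns (filtered_genotypes, boolean_keep_mask).
    """
    g = np.asarray(genotypes, dtype=float)
    alt_freq = g.mean(axis=0) / 2.0
    # The minor allele is whichever of the two alleles is rarer.
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    keep = maf >= min_maf
    return g[:, keep], keep
```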

Percent African admixture is associated with telomere length in a healthy adult population.

L. R. Yanek1,2 ; K. R. Iyer1 ; M. A. Taub3 ; D. Vaidya1,2,3 ; B. G. Kral1,2 ; L. C. Becker1,2 ; M. Armanios1 ; D. M. Becker1,2,3 ; R. A. Mathias1,2,3

1) Medicine, Johns Hopkins University, Baltimore, MD; 2) GeneSTAR Research Program, Johns Hopkins University, Baltimore, MD; 3) Public Health, Johns Hopkins University, Baltimore, MD.

Africans and African Americans (AA) have been shown to have longer leukocyte telomere length (LTL) than persons of European ancestry (EA), but the extent to which this depends on finer-scale admixture (i.e., percent African ancestry) remains unknown. We examined whether the percentage of African admixture is associated with telomere length in 283 healthy subjects (mean age 42.3 years, range 21-80; 161 female; 127 AA). Telomere length was calculated from whole genome sequence (WGS) data (>30x coverage on the Illumina HiSeq platform) using reads containing 7 contiguous repeats of the telomere motif (TTAGGG or CCCTAA), following the approach of Ding et al. (2014). Admixture was estimated in STRUCTURE from 50,000 LD-pruned SNPs, assuming three ancestral groups, to calculate % African and % European ancestry. Standard linear models were used to evaluate the association between telomere length and admixture estimates. In the AAs, average African ancestry was 80% (range: 45-98%) and average European ancestry was 20% (range: 2-55%). In the EAs, average European ancestry was 99% (range: 72-99%) and average African ancestry was 1% (range: 0.01-19%). As previously observed, an overall comparison between the two groups reveals longer telomere length in AAs than in EAs (84364 vs 78560 kb, p=0.0008). On a continuous scale, % African ancestry was significantly correlated with telomere length (r=0.22, p=0.0002). Furthermore, in AAs there is a strong association between % African ancestry and telomere length (each percentage-point increase in African admixture was associated with an increase in telomere length of 6284 kb, p=0.0036). Given the minimal variation in EAs, no association between % African ancestry and telomere length was observed in this group (p=0.1552). We confirm prior observations that telomere length differs between AAs and EAs, and show here that within African Americans the % of African admixture is a significant predictor of telomere length.
Future studies of racial differences in telomere length may need to account for differences in the proportion of African ancestry among subjects, particularly within African Americans.
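The association tests above are ordinary linear models. A minimal, unadjusted version (no covariates such as age or sex, which the authors' models may have included) fits telomere length against percent African ancestry and also returns Pearson's r. It is written in plain Python, and `fit_line` is a hypothetical helper.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r).

    x: e.g. percent African ancestry per subject.
    y: e.g. telomere length per subject.
    r is Pearson's correlation coefficient between x and y.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope: change in y per unit of x
    a = my - b * mx        # intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```

With x in percent, the slope b is the change in telomere length per percentage point of African ancestry, which is the quantity reported for the AA subgroup.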

The lingering load of archaic admixture in modern human populations.

K. Harris1,2 ; R. Nielsen2,3

1) Stanford University, Stanford, CA; 2) University of California Berkeley, Berkeley, CA; 3) Center for Bioinformatics, University of Copenhagen, Copenhagen, Denmark.


Founder effects and bottlenecks can damage fitness by letting deleterious alleles drift to high frequencies. This almost certainly imposed a burden on Neanderthals and Denisovans, archaic hominid populations whose genetic diversity was less than a quarter of the level seen in humans today. A more controversial question is whether the out-of-Africa bottleneck created differences in genetic load between modern human populations. Some previous studies concluded that this bottleneck saddled non-Africans with potentially damaging genetic variants that could affect disease incidence across the globe today (e.g. Lohmueller, et al. 2009; Fu, et al. 2014), while other studies have concluded that there is little difference in genetic load between Africans and non-Africans (e.g. Simons, et al. 2014; Do, et al. 2015). Although previous studies have devoted considerable attention to simulating the accumulation of deleterious mutations during the out-of-Africa bottleneck, none to our knowledge have incorporated the fitness effects of introgression from Neanderthals into non-Africans. We present simulations showing that archaic introgression may have had a greater fitness effect than the out-of-Africa bottleneck itself, saddling non-Africans with weakly deleterious alleles that accumulated as nearly neutral variants in Neanderthals. Assuming that the exome experiences deleterious mutations with additive fitness effects drawn from a previously inferred gamma distribution, we predict that the fitness of the average Neanderthal was about 50% lower than the fitness of the average human, implying the existence of strong selection against early Neanderthal-human hybrids. This is a direct consequence of mutation accumulation during a period of low Neanderthal population size that is thought to have lasted ten times longer than the out-of-Africa bottleneck (Pruefer, et al. 2014). 
Although our model predicts some transmission of deleterious Neanderthal variation to present-day non-Africans, it also predicts that many Neanderthal alleles have been purged away, depleting conserved genomic regions of Neanderthal ancestry as observed empirically by Sankararaman, et al. (2014). Our results imply that the deficit of Neanderthal DNA from functional genomic regions can be explained without the action of epistatic reproductive incompatibilities between human and Neanderthal alleles.
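The load argument rests on compounding many small deleterious effects. The sketch below computes the relative fitness of one genome carrying a given number of heterozygous and homozygous deleterious sites, drawing selection coefficients from a gamma distribution with Python's `random.gammavariate`. The shape and mean values are placeholders rather than the previously inferred distribution the abstract cites, and treating "additive" as dominance h = 0.5 with effects multiplied across sites is an assumption about the model.

```python
import random


def genome_fitness(n_het, n_hom, shape=0.2, mean_s=0.01, h=0.5, seed=None):
    """Relative fitness of a genome carrying deleterious variants.

    Selection coefficients s are drawn from a gamma distribution with the
    given shape and mean (illustrative placeholders, not inferred values).
    With dominance h (0.5 = additive), a heterozygous site costs h*s, a
    homozygous site costs s, and sites combine multiplicatively.
    """
    rng = random.Random(seed)
    scale = mean_s / shape  # gamma mean = shape * scale
    w = 1.0
    for _ in range(n_het):
        s = min(rng.gammavariate(shape, scale), 1.0)  # cap so fitness stays >= 0
        w *= 1.0 - h * s
    for _ in range(n_hom):
        s = min(rng.gammavariate(shape, scale), 1.0)
        w *= 1.0 - s
    return w
```

Because per-site effects multiply, a population that accumulated many weakly deleterious variants while small ends up with noticeably lower fitness even when each variant is nearly neutral on its own.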

Y-chromosome diversity suggests southern origin and Paleolithic backwave migration of Austro-Asiatic speakers from eastern Asia to the Indian subcontinent.

XM. Zhang1 ; SY. Liao2 ; XB. Qi1 ; JW. Liu1,8 ; J. Kampuansai5 ; H. Zhang1 ; ZH. Yang3,4 ; B. Serey6 ; T. Sovannary6 ; L. Bunnath6 ; H. Seang Aun6 ; H. Samnom7 ; D. Kangwanpong5 ; H. Shi3,4 ; B. Su1,4

1) Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, China; 2) School of Life Sciences, Anhui University, Hefei 230039, China; 3) Center for Primate Translational Medical Research, Kunming University of Science and Technology, Kunming 650500, China; 4) Yunnan Key Laboratory of Primate Biomedical Research, Kunming 650500, China; 5) Department of Biology, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand; 6) Department of Geography and Land Management, Royal University of Phnom Penh, Phnom Penh 12000, Cambodia; 7) Capacity Development Facilitator for Handicap International Federation and Freelance Research, Battambang 02358, Cambodia; 8) Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing 100101, China.

Analyses of an Asian-specific Y-chromosome lineage (O2a-M95), the dominant paternal lineage (60.65% on average) in Austro-Asiatic (AA) speaking populations, who are found on both sides of the Bay of Bengal, have led to two competing hypotheses about this group’s geographic origin and migratory routes. One hypothesis posits an origin of the AA speakers in India and an eastward dispersal to Southeast Asia, while the other places the origin in Southeast Asia with a westward dispersal to India. Here, we collected samples of AA-speaking populations from mainland Southeast Asia and southern China and then analyzed both Y-chromosome and mtDNA diversity. Combining our samples with previous data, we generated a comprehensive picture of the O2a-M95 lineage in Asia, including both AA- and Daic-speaking populations. We demonstrated that the O2a-M95 lineage originated in southern East Asia among Daic-speaking populations ~20-40 thousand years ago and then dispersed southward to Southeast Asia after the Last Glacial Maximum before moving westward to the Indian subcontinent. This migration resulted in the current distribution of this Y-chromosome lineage in AA-speaking populations. Further analysis of mtDNA diversity showed a different pattern, supporting a previously proposed sex-biased admixture of the AA-speaking populations in India.

Historical mating patterns in the U.S. revealed through admixture and IBD patterns from genome-wide data from over 800,000 individuals.

J. M. Granka1 ; Y. Wang1 ; E. Han1 ; J. K. Byrnes1 ; A. Kermany1 ; R. E. Curtis2 ; P. Carbonetto1 ; K. Noto1 ; M. J. Barber1 ; N. M. Myres2 ; C. A. Ball1 ; K. G. Chahine2

1) AncestryDNA, San Francisco, CA; 2) AncestryDNA, Provo, UT.

Within a diverse population like the United States, many individuals are admixed, with ancestry from many worldwide regions. Non-random mating and migration can result in non-random combinations of ancestries within admixed individuals (i.e., certain sets of ancestries may be common, and others rare); such dynamics can also affect patterns of identity-by-descent (IBD) among admixed and non-admixed individuals. To shed light on historical mating and migration, we study genome-wide genotype data from over 800,000 AncestryDNA customers, as well as a subset of over 400,000 born in the US. First, we use a supervised algorithm to estimate individuals’ genetic admixture proportions across 26 global regions. We measure correlations between the estimated ancestries and find that certain sets of ancestries frequently co-occur in individuals’ estimates. Such relationships may reflect historical events; e.g., the association between ancestry from the Americas and the Iberian Peninsula could reflect Colonial Era admixture. In addition to historical mating patterns, however, the admixture inference procedure and the delineation of global regions could also affect such correlations. Second, to determine whether these trends could reflect mating patterns and preferences, we examine associations between the estimated ancestries of the parents in over 10,000 trios. The observed correlations agree with many of those identified within individuals and potentially reflect more recent historical trends. Third, we extend our study to IBD patterns in an inferred IBD network among genotyped individuals. Sub-clusters of the IBD network, which can often be annotated by ethnicity or historical US migration, are often interconnected by bridging IBD connections; we highlight several connected sub-clusters in light of findings from genetic ancestry.
Finally, we corroborate findings from these three analyses, as well as their potential timescales, by examining over 500,000 AncestryDNA customer pedigrees. Associations between the country-level birth locations of the two members of each couple support many of the non-random associations of ethnicities and IBD connections identified using genetic data. Many of the associations we observe reflect historical phenomena and, while not conclusive about their cause, suggest that many admixed individuals, including those in the US, carry present-day genetic signatures of the migration and subsequent non-random mating of their ancestors.
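Measuring how often ancestries co-occur can be sketched as a correlation matrix over individuals' estimated proportions, here using NumPy's `corrcoef`; the region labels are hypothetical. One caveat: because each row sums to 1, the proportions are compositional, which by itself pushes correlations toward negative values. This is one way the estimation procedure itself can shape such correlations.

```python
import numpy as np


def ancestry_correlations(proportions, labels):
    """Pairwise Pearson correlations between estimated ancestry proportions.

    proportions: (n_individuals, n_regions) array; each row sums to ~1.
    labels: region names, one per column.
    Returns {(label_a, label_b): r} for every pair of regions. A region
    with zero variance across individuals yields NaN.
    """
    p = np.asarray(proportions, dtype=float)
    corr = np.corrcoef(p, rowvar=False)  # columns are the variables
    out = {}
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            out[(labels[i], labels[j])] = corr[i, j]
    return out
```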

Discovery of a previously unknown ancestral origin of the modern Taiwanese population that is distinct from the north-south gradient seen in other Han Chinese populations using the Taiwan Biobank.

C. W. Lin1,2 ; C. H. Chen1,2,3,4 ; J. H. Yang3 ; P. E. Wu1,2 ; C. N. Hsiung1,2 ; L. C. Chang3 ; J. Chang2 ; I. W. Song2 ; S. L. Yang2 ; F. T. Liu1,2 ; C. Y. Shen1,2,5

1) Taiwan Biobank, Academia Sinica, Taipei, Taiwan; 2) Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan; 3) National Center for Genome Medicine, Academia Sinica, Taipei, Taiwan; 4) School of Chinese Medicine, China Medical University, Taichung, Taiwan; 5) College of Public Health, China Medical University, Taichung, Taiwan.

The aim of the Taiwan BioBank is to build a nationwide biomedical research database that integrates genomic profiles, lifestyle patterns, dietary habits, environmental exposure histories, and long-term health outcomes of 300,000 Taiwanese residents (almost 1.5% of the Taiwanese population). We describe here results from 8265 samples that were genotyped using the Taiwan BioBank array, which was specifically designed for the Taiwanese population. After data quality control, genotype data for 589,016 single-nucleotide polymorphisms (SNPs) in 7203 unrelated individuals were designated TWB7203 and further analyzed. The 7203 individuals were clustered into three cline subgroups: 4.5% were of northern Han Chinese ancestry, 77.6% were of southern Han Chinese ancestry, and 17.8% were an admixture of Han Chinese and a previously unknown ethnic group. This unknown group was genetically distinct from neighboring Southeast Asian groups and Austronesian tribes, but was similar to the southern Han Chinese. Long-range linkage disequilibrium and flips of major alleles at about 400 SNPs across the major histocompatibility complex region suggested that the previously unknown group may have experienced evolutionary events different from those of the other southern Han Chinese. This difference was further supported by a distinctive pattern of body-figure measurements in this unknown group. Genome-wide summary statistics for the ethnic subgroups of TWB7203 were released through a publicly accessible web-based calculation platform, Taiwan View, on which genome-wide association analyses can be performed using TWB7203 as the reference. The release of this large-scale population-level and subpopulation-level genomic information will greatly benefit human genetic research.
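The "flips of major alleles" mentioned above can be illustrated as SNPs where a given allele is the majority in one group and the minority in the other. Below is a minimal sketch, assuming per-group frequencies of the same reference allele are already computed; the helper name is hypothetical, and a SNP at exactly 0.5 in either group is treated as having no defined major allele.

```python
def major_allele_flips(freqs_a, freqs_b):
    """Indices of SNPs whose major allele differs between two groups.

    freqs_a, freqs_b: frequency of the same reference allele at each SNP in
    group A and group B. A flip means the reference allele is the majority
    (> 0.5) in one group and the minority (< 0.5) in the other.
    """
    flips = []
    for i, (fa, fb) in enumerate(zip(freqs_a, freqs_b)):
        if (fa > 0.5 and fb < 0.5) or (fa < 0.5 and fb > 0.5):
            flips.append(i)
    return flips
```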

Fine scale population structure of Spain and the genetic impact of historical invasions and migrations.

C. Bycroft1 ; C. Fernandez-Rozadilla1,2 ; A. Carracedo2 ; C. Ruiz-Ponte2 ; I. Quintela-García3 ; P. Donnelly1,4 ; S. Myers1,4

1) Wellcome Trust Centre for Human Genetics, University of Oxford; 2) Galician Public Foundation of Genomic Medicine (FPGMX)-Grupo de Medicina Xenómica-Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERer)-University of Santiago de Compostela, Spain; 3) Grupo de Medicina Xenomica, Universidade de Santiago de Compostela, Centro Nacional de Genotipado – Plataforma de Recursos Biomoleculares y Bioinformaticos – Instituto de Salud Carlos III (CeGen-PRB2-ISCIII); 4) Department of Statistics, University of Oxford, 1 South Parks Road, Oxford, OX1 3TG, UK.

As well as being linguistically and culturally diverse, the Iberian Peninsula is unusual among European regions in that its demographic history includes a prolonged and large-scale occupation by people of predominantly north-west African origin. The Iberian Peninsula therefore provides a unique opportunity for studying fine-scale population structure and admixture, and for testing cutting-edge methods of detecting complex or subtle population genetic patterns. Previous studies using Y-chromosome, mtDNA and autosomal data have detected limited genetic structure in Iberia. However, powerful new methods and larger datasets have recently made it possible to detect and characterise genetic differentiation at a sub-national level. We performed the largest and most comprehensive study of Spanish population structure to date by analysing a dataset of ~1,400 Spanish individuals typed at ~700,000 SNPs. Using the fineSTRUCTURE method, we detected striking and rich patterns of population differentiation within Spain, at scales down to tens of kilometres. Strikingly, the major axis of genetic differentiation in Spain runs from west to east, whereas there is remarkable genetic similarity in the north-south direction. To infer details of historical population movements into Spain, we analysed Spain alongside a sample of ~6,000 individuals from Europe, North Africa, and sub-Saharan Africa. Across Spanish groups, we identify varying genetic contributions from north-west African ancestral populations, at times that all fall within the period of Islamic occupation. We also identify Basque-like admixture within Spanish groups to the south of the Basque-speaking region, implying southerly gene flow from this region. This analysis has revealed details of the strengths and weaknesses of different approaches to investigating population genetic history, as well as providing important new insights into the complex genetic history of Spain.

Prevalence of an archaic high altitude adaptive EPAS1 haplotype in the Himalayas.

Q. Ayub1 ; S. Hackinger1 ; T. Kraaijenbrink2; Y. Xue1 ; M. Mezzavilla1 ; G. van Driem3 ; M. A. Jobling4 ; P. de Knijff2 ; C. Tyler-Smith1

1) Wellcome Trust Sanger Institute, Hinxton, Cambs, United Kingdom; 2) Department of Human Genetics, Leiden University Medical Center, Leiden, The Netherlands; 3) Institute of Linguistics, University of Bern, Bern CH-3012, Switzerland; 4) Department of Genetics, University of Leicester, Leicester, United Kingdom.

Genetic, biochemical and morphological changes have enabled humans to adapt to living at high altitudes in Asia, Africa and South America. High-altitude adaptation in Tibetans is reportedly influenced by introgression of a 32.7 kb haplotype from the Denisovans, an extinct branch of archaic humans. This haplotype lies within the gene encoding endothelial PAS domain protein 1 (EPAS1), a transcription factor acting in the hypoxia-inducible factor pathway. A parallel study indicated that the same haplotype had probably entered the Tibetan population from the Sherpa, a high-altitude-adapted population from Nepal, suggesting that the Denisovan introgression most likely occurred in a population ancestral to the Sherpa and Tibetans. We genotyped 22 single nucleotide variants (SNVs) in this region in 1,550 Eurasian individuals, including 1,233 from Bhutan and Nepal residing at altitudes ranging from 86 to 4,550 m above sea level. Derived alleles for 5 SNVs (rs115321619, rs73926263, rs73926264, rs73926265, rs55981512) that characterize the core Denisovan haplotype (AGGAA) were present at high frequency not only in Tibetans and Sherpa but also among many ethno-linguistic groups from Bhutan and Nepal. The frequency of the Denisovan core haplotype in these populations shows a significant correlation with altitude (Spearman’s correlation coefficient = 0.797, p = 6.996 × 10⁻¹²). The Denisovan derived alleles were also observed at frequencies of 3-14% in the 1000 Genomes Project African samples, and an additional 10 East and South Asian samples shared the Denisovan haplotype, which extends beyond the 32 kb region. These additional samples enabled us to refine the haplotype structure and identify candidate functional variants that might be driving the selection signal.

OCA2 confers convergent skin lightening of East Asians during recent human evolution.

B. Su1 ; Z. Yang1,2 ; H. Zhong3 ; J. Chen4 ; X. Zhang1 ; H. Zhang1 ; X. Luo1,2 ; S. Xu5 ; H. Chen6 ; D. Lu5 ; Y. Han7 ; L. Li8 ; L. Fu8; X. Qi1 ; Y. Peng1 ; K. Xiang1 ; Q. Lin1,2 ; Y. Guo1 ; M. Li1 ; X. Cao1 ; Y. Zhang1 ; L. Zhang4 ; X. Guo9,10 ; S. Dong9 ; F. Liang9; J. Wang9,10 ; A. Willden1 ; Q. Li7 ; A. Meng4 ; H. Shi1

1) State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China; 2) Kunming College of Life Science, University of Chinese Academy of Sciences, Beijing, China; 3) Department of Pathology and Immunology, Baylor College of Medicine, Houston, USA; 4) State Key Laboratory of Biomembrane and Membrane Engineering, School of Life Sciences, Tsinghua-Peking Center for Life Sciences, Tsinghua University, Beijing, China; 5) Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; 6) Center for Computational Genomics, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China; 7) College of Life Science, Liaoning Normal University, Dalian, China; 8) No.1 School of Clinical Medicine of Kunming Medical University, Kunming, China; 9) BGI-Shenzhen, Shenzhen, China; 10) Department of Biology, University of Copenhagen, Copenhagen, Denmark.

Skin lightening among Eurasians is considered an adaptation to high latitude environments that likely occurred independently in Europe and eastern Asia due to convergent evolution. In Europeans, several genes responsible for lightening have been found, but for East Asians the situation remains elusive. We conducted a genome-wide comparison between dark-skinned Africans and Austro-Asiatic speaking aborigines on one side and light-skinned northern Han Chinese on the other, and identified a pigmentation gene, OCA2, showing unusually deep allelic divergence between them. An amino acid substitution (His615Arg) in OCA2, prevalent in most eastern Asian populations but absent in Africans and Europeans, was significantly associated with skin lightening in northern Han Chinese. Transgenic and targeted gene modification analyses in zebrafish and mice both recapitulated the phenotypic effect of the OCA2 variant, which results from decreased melanin production. Our results indicate that OCA2 plays a key role in the convergent skin lightening of East Asians during recent human evolution.

Genetic Basis of Polygenic Adaptation in Indigenous Siberian Populations Inferred using Exome Sequencing Data.

P. Hsieh1 ; B. Hallmark2 ; TM. Karafet3 ; MF. Hammer1,3 ; RN. Gutenkunst1,4

1) Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ; 2) Interdisciplinary Program in Statistics, University of Arizona, Tucson, AZ; 3) Arizona Research Laboratories, Division of Biotechnology, University of Arizona, Tucson, AZ; 4) Department of Molecular and Cellular Biology, University of Arizona, Tucson, AZ.

Siberia is one of the coldest environments on Earth and has great seasonal temperature variation. Recent archeological studies indicate that humans have occupied Siberia for at least ~45,000 years, and persisted through the Last Glacial Maximum in North Eurasia. As early modern humans dispersed from their ancestral tropical African homeland into much cooler environments, long-term settlement in Siberia undoubtedly required biological adaptation to severe cold stress, dramatic variation in photoperiod, as well as limited and highly variable food resources. Humans are the only primate species other than the Japanese macaque to have adapted to boreal conditions, where temperatures remain far below freezing for more than half the year, which points to intense selection pressures that likely drove the enhancement of physiological processes that generate and conserve heat. Physiological evidence, such as differences in basal metabolic rates and brown adipose tissue, suggests genetic adaptations in Arctic populations to life at high latitude. Because many of these physiological traits, including body mass and metabolic processes, are highly polygenic, we sought signatures of polygenic selection in Siberian populations. We sequenced exomes of individuals from two indigenous Siberian populations: the Nganasans (N = 21), who are the northernmost indigenous group in the world, and the Yakuts (N = 21), who live in the coldest regions on our planet. To detect polygenic selection, we performed gene-set enrichment analysis using pathways from the NCBI Biosystems database as well as a set of candidate genes that have been previously implicated in cold adaptation. The significance of the candidate gene sets for polygenic selection was assessed using whole-exome coalescent simulations to account for potential biases caused by demographic processes and heterogeneity in mutational and recombination rates across the entire genome.
Our results thus give insight into the complex polygenic basis of adaptation to life in cold environments in human populations.
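The abstract judges candidate gene sets against whole-exome coalescent simulations rather than against other genes in the same data. A minimal sketch of that kind of comparison, assuming a per-gene selection statistic has already been computed both for the real exomes and for each neutral simulated replicate; the mean as the set score, the gene labels and the function names are illustrative assumptions, not the authors' method.

```python
from statistics import mean


def gene_set_score(per_gene_stat, gene_set):
    """Mean of a per-gene selection statistic over the genes in one set."""
    return mean(per_gene_stat[g] for g in gene_set)


def enrichment_p_value(observed_stats, simulated_stats, gene_set):
    """One-sided empirical p-value of a gene-set score against neutral replicates.

    observed_stats: dict gene -> statistic from the real exomes.
    simulated_stats: list of such dicts, one per neutral whole-exome simulation.
    """
    observed = gene_set_score(observed_stats, gene_set)
    null_scores = [gene_set_score(sim, gene_set) for sim in simulated_stats]
    exceed = sum(1 for s in null_scores if s >= observed)
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (exceed + 1) / (len(null_scores) + 1)


# Hypothetical placeholder genes and values, for illustration only.
observed = {"GENE_A": 3.0, "GENE_B": 5.0, "GENE_C": 0.0}
sims = [
    {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 9.0},
    {"GENE_A": 5.0, "GENE_B": 5.0, "GENE_C": 0.0},
    {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 1.0},
]
print(enrichment_p_value(observed, sims, ["GENE_A", "GENE_B"]))  # prints 0.5
```

Because every simulated replicate is scored on exactly the same genes, whatever demography and rate heterogeneity went into the simulations is carried straight into the null distribution.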

Demographic inferences from 447 complete human genome sequences from 148 populations worldwide.

M. Metspalu1 ; L. Pagani2 ; D. Lawson3 ; A. Kushniarevich1 ; R. Mägi5 ; L. Saag1 ; A. Eriksson4 ; A. Manica4 ; T. Kivisild2 ; International Collaboration effort of the Estonian Centre for Genomics

1) Estonian Biocentre, Tartu, Estonia; 2) Department of Biological Anthropology, University of Cambridge, Cambridge, United Kingdom; 3) Heilbronn Institute, School of Mathematics, University of Bristol, Bristol BS8 1TH, UK; 4) Department of Zoology, University of Cambridge, Cambridge, UK; 5) Estonian Genome Center, University of Tartu, Tartu, Estonia.

Complete high coverage individual genome sequences carry the maximum amount of information for reconstructing the evolutionary past of a species in the interplay between random genetic drift and natural selection. Here we use a novel dataset of 447 human genomes sequenced at 40X on the same platform (Complete Genomics) and processed with uniform bioinformatic pipelines. Based on SNP-chip data we generally chose three samples to represent each population of interest. We cover a wide range of mostly Eurasian populations with additional populations from Oceania, South America and Africa. Here we describe the dataset in terms of data quality and newly recovered genetic variation, which originates predominantly from previously undersampled continental regions. Using MSMC, D-statistics and Finestructure we have shown that the peopling of the World from Africa is best explained by at least two migration waves (See Lawson et al abstract nr …). Here we expand on these conclusions by investigating short IBD segment sharing patterns using diCal, Hapfabia etc. We also disentangle split times involving the two migrations out of Africa (OoA) by running MSMC separately on genome chunks derived from OoA1 and OoA2. We also present detailed regional population histories in reconstructions of past dynamics of effective population size and population split times.
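D-statistics are one of the tools listed above. As a reminder of what such a statistic measures, here is a minimal frequency-based sketch of Patterson's D (the ABBA-BABA test) for populations (P1, P2; P3, Outgroup), assuming per-site derived-allele frequencies are already in hand. Real analyses also need a block jackknife for standard errors, which is omitted; the function name is hypothetical.

```python
def patterson_d(p1, p2, p3, p_out):
    """Frequency-based D(P1, P2; P3, Outgroup) over a list of biallelic sites.

    Each argument is a list of derived-allele frequencies, one per site.
    D > 0: excess ABBA, i.e. P2 shares more derived alleles with P3 than P1 does.
    """
    abba = baba = 0.0
    for a, b, c, o in zip(p1, p2, p3, p_out):
        # Expected ABBA and BABA site-pattern probabilities at this site.
        abba += (1 - a) * b * c * (1 - o)
        baba += a * (1 - b) * c * (1 - o)
    return (abba - baba) / (abba + baba)


# Toy single-genome example: two ABBA sites and one BABA site.
print(patterson_d([0, 0, 1], [1, 1, 0], [1, 1, 1], [0, 0, 0]))
```

With two ABBA patterns and one BABA pattern, D is (2 - 1) / (2 + 1), one third.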

Global diversity in the TAS2R38 bitter taste receptor: revisiting a classic evolutionary PROPosal.

D. Risso1,2 ; M. Mezzavilla3 ; L. Pagani2,4 ; A. Robino3 ; G. Morini5 ; S. Tofanelli6 ; M. Carrai6 ; D. Campa6 ; R. Barale6 ; F. Caradonna7 ; P. Gasparini3 ; D. Luiselli2 ; S. Wooding8 ; D. Drayna1

1) National Institute on Deafness and Other Communication Disorders, NIH, Bethesda, MD 20892, USA; 2) Department of BiGeA, Laboratory of Molecular Anthropology and Centre for Genome Biology, University of Bologna, via Selmi 3, 40126 Bologna, Italy; 3) Institute for Maternal and Child Health – IRCCS ‘Burlo Garofolo’, Trieste, Italy; University of Trieste, Trieste, Italy; 4) Division of Biological Anthropology, University of Cambridge, CB2 1QH, Cambridge, UK; 5) University of Gastronomic Sciences, Piazza Vittorio Emanuele 9, Bra, Pollenzo 12042, CN, Italy; 6) Department of Biology, University of Pisa, Via Ghini 13, 56126 Pisa, Italy; 7) Biological, Chemical and Pharmaceutical Sciences and Technologies Department, STEBICEF, Università degli Studi di Palermo, V.le delle Scienze, Edificio 16, 90128 Palermo, Italy; 8) Health Sciences Research Institute, University of California at Merced, 5200 North Lake Road, Merced, CA 95343, USA.

The ability to taste phenylthiocarbamide (PTC) and 6-n-propylthiouracil (PROP) is a classic polymorphic trait that is mediated by the TAS2R38 bitter taste receptor gene. These taste phenotypes have been shown to be correlated with the ability to taste other taste-active compounds, as well as with food habits. Nonetheless, several features of its evolutionary significance and population dynamics are still unresolved. In particular, it is not clear why the worldwide frequency of the TAS2R38 non-taster AVI haplotype is very high, almost equivalent to that of the taster PAV haplotype. While the long-standing hypothesis suggests that balancing selection has been acting on this locus, other theories have emerged more recently. We performed a detailed analysis of the TAS2R38 gene and its surrounding regions in a sample of 5511 individuals belonging to 104 different worldwide populations. Our results show no departures from neutral expectations. This suggests that recent demographic events have had a major role in shaping the genetic diversity at this locus, and calls for a reconsideration of the classic hypothesis. We also hypothesize that interactions with the adjacent maltase-glucoamylase (MGAM) gene may have contributed to the current distribution of PAV and AVI haplotypes. One hypothesis is that the distribution of the uncommon TAS2R38 AAI haplotype is interpretable as the product of a recent recombination event that occurred in Africa, after the Out Of Africa (OOA) event. Collectively, our results offer novel insights into the evolutionary history of the TAS2R38 gene, showing a relaxation of the selective forces previously acting on this gene, and providing a new hypothesis for the observed present-day worldwide distribution of AVI and AAI haplotypes.
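The abstract reports no departure from neutral expectations but does not say which tests were used. One classic summary is Tajima's D, which compares two estimates of diversity. Under balancing selection, the long-standing hypothesis for this locus, an excess of intermediate-frequency variants would be expected to push D above zero. A minimal sketch of the standard formula, assuming the number of sequences n, the number of segregating sites S and the mean pairwise difference pi have already been computed for the region:

```python
import math


def tajimas_d(n, segregating_sites, mean_pairwise_diff):
    """Tajima's D from n sequences, S segregating sites and pi (mean pairwise differences)."""
    if n < 4:
        raise ValueError("the variance term vanishes for n < 4")
    S = segregating_sites
    if S == 0:
        raise ValueError("D is undefined with no segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pi minus Watterson's estimate S / a1.
    return (mean_pairwise_diff - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))
```

A value near zero, as when pi equals S / a1, is what neutrality with constant population size predicts. Demographic events can also shift D, which is why a neutral-looking result is read here as pointing toward demography.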

Recent polygenic adaptation in Europe.

N. Telis1 ; E. A. Boyle2,5 ; Y. Field3,4 ; J. K. Pritchard3,2,4

1) Biomedical Informatics, Stanford University; 2) Biology, Stanford University; 3) Genetics, Stanford University; 4) Howard Hughes Medical Institute; 5) Stanford School of Medicine.

Adaptive evolution in recent human history remains poorly characterized. Human population genetics has focused on strong selective sweeps. However, our understanding of other selective patterns and their effects on patterns of human genetic diversity is still limited. Although there is compelling evidence for recent selection pushing increased height in northern Europe, the literature is devoid of other strong, notable examples of recent polygenic selective events. We develop a non-comparative scoring method for individual polymorphisms to detect signals of recent adaptation, based on using singleton density to approximate haplotype age. Simulations suggest that this method preferentially detects recent evolutionary events with several different sweep patterns. We apply this method to 1,600 individuals from the ALSPAC cohort and confirm known selection signals in Northern Europeans, as well as broader signals of polygenic selection. We investigate associations with these signals and demonstrate that these signals are robust to population allele frequency differences in Europeans. We use this method in combination with population allele frequency differences to identify novel signals of polygenic adaptation in modern Europeans.
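The idea of using singleton density as a proxy for haplotype age can be illustrated with a deliberately simplified toy. This is not the authors' statistic. On phased haplotypes, it measures the singleton-free interval around a focal SNP and compares derived and ancestral carriers. The intuition, stated here as an assumption, is that an allele that rose in frequency recently sits on young haplotypes that have had little time to accumulate private mutations, so their singleton-free stretches tend to be longer. All function names are hypothetical.

```python
import bisect
import math
from statistics import mean


def singleton_free_span(focal_pos, singletons, chrom_start, chrom_end):
    """Length of the singleton-free interval around focal_pos on one haplotype.

    `singletons` is a sorted list of singleton positions on this haplotype.
    Chromosome ends stand in when one side has no singleton (an assumption).
    """
    i = bisect.bisect_left(singletons, focal_pos)
    left = singletons[i - 1] if i > 0 else chrom_start
    right = singletons[i] if i < len(singletons) else chrom_end
    return right - left


def toy_singleton_score(focal_pos, haplotypes, chrom_start, chrom_end):
    """Mean log span for derived carriers minus mean log span for ancestral ones.

    `haplotypes` is a list of (allele, sorted_singleton_positions), allele 1 =
    derived. Positive values mean derived-allele haplotypes sit in longer
    singleton-free stretches, i.e. fewer recent mutations nearby.
    """
    log_spans = {0: [], 1: []}
    for allele, singletons in haplotypes:
        span = singleton_free_span(focal_pos, singletons, chrom_start, chrom_end)
        log_spans[allele].append(math.log(span))
    return mean(log_spans[1]) - mean(log_spans[0])


# Hypothetical haplotypes around a focal SNP at position 500.
haps = [(1, [100, 900]), (1, [50, 950]), (0, [400, 600]), (0, [450, 550])]
print(toy_singleton_score(500, haps, 0, 1000))
```

In this toy, derived carriers have spans of 800 and 900 and ancestral carriers 200 and 100, so the score is log(6), a positive value consistent with a recent rise of the derived allele.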

Strong selection at MHC in Mexicans since admixture.

Q. Zhou1,2,3 ; L. ZHAO1,2 ; Y. GUAN1,2,3,4

1) USDA/ARS Children’s Nutrition Research Center, Houston, TX; 2) Department of Pediatrics, Baylor College of Medicine, Houston, TX; 3) SCBMB, Baylor College of Medicine, Houston, TX; 4) Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX.


Mexicans are a recently admixed population of Amerindian, European and African ancestry. We performed local ancestry analysis of Mexican samples from two genome-wide association studies obtained from dbGaP and discovered that at the MHC region Mexicans have an excess of African ancestral alleles compared to the rest of the genome, which is the hallmark of recent selection in admixed samples. The estimated selection coefficients are 0.07 and 0.09 for the two datasets, which puts our finding among the strongest known selection signals observed in humans, alongside the lactase gene in northern Europeans and the sickle-cell allele in Africans. Inaccurate Amerindian training samples were a major concern for the credibility of previously reported selection signals in Latinos. Taking advantage of the flexibility of our statistical model, we devised a model fitting method that can learn Amerindian ancestral haplotypes from the admixed samples, which allows us to infer local ancestries for Mexicans using only European and African training samples. The strong selection signal at MHC remains without Amerindian training samples. One wonders why such a strong selection signal was not discovered by the 1000 Genomes Project in its analysis of Mexican samples using other competing local ancestry inference models. Our simulation studies suggested that the approach adopted by the 1000 Genomes admixture analysis group, which used consensus estimates from four methods, is perhaps to blame. Finally, we point out that medical history studies suggest such a strong selection signal is plausible in Mexicans.
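The abstract's statistical model is not described, so here is a much simpler textbook stand-in to show how an ancestry excess translates into a selection coefficient. Under deterministic genic selection, the odds of carrying the favoured ancestry are multiplied by (1 + s) each generation, so s can be backed out from the starting and present-day frequencies. Taking genome-wide ancestry as the starting frequency, the number of generations since admixture, and the function names are all assumptions for illustration; drift and the real admixture history are ignored.

```python
def project_frequency(p0, s, generations):
    """Frequency after deterministic genic selection (relative fitness 1 + s vs 1)."""
    odds = p0 / (1 - p0) * (1 + s) ** generations
    return odds / (1 + odds)


def estimate_s(p0, pt, generations):
    """Invert project_frequency: per-generation s from start and end frequencies."""
    odds_ratio = (pt / (1 - pt)) / (p0 / (1 - p0))
    return odds_ratio ** (1 / generations) - 1


# Round trip with hypothetical inputs: start at 0.4, select for 15 generations.
pt = project_frequency(0.4, 0.07, 15)
print(pt, estimate_s(0.4, pt, 15))
```

The round trip recovers s = 0.07. In real data the present-day local ancestry at a locus is measured with noise and drift, which is part of why a proper statistical model is needed.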

Young Northern Finnish founder population reveals enrichment of rare recessive and dominant gene variants in neurodevelopmental disorders.

M. I. Kurki1,2 ; O. Pietiläinen3 ; E. Saarentaus4 ; E. Hämäläinen4 ; J. S. Moilanen5 ; O. Kuismin5 ; M. Daly6 ; A. Palotie1,2,4 ; Sequencing Initiative Suomi consortium

1) Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA; 2) Stanley Center, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA; 3) Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, Massachusetts, USA; 4) Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland; 5) Department of Clinical Genetics, Oulu University Hospital, Medical Research Center Oulu and PEDEGO Research Group, University of Oulu, Oulu, Finland; 6) Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, USA.

Genetic variants with strong reproductive disadvantages are evolutionarily constrained and generally remain rare in the population. However, these variants can still exist at higher frequencies in young populations, such as Finns, when negative selection hasn’t had time to counteract the effect of genetic drift on rare alleles. Thus, population isolates provide a valuable study design to explore the role of rare genetic variants in complex traits. In Finland, the youngest settlement is in the north and east parts of the country, dating back to a small number of founder families only a few centuries ago. In addition, this region has a higher prevalence of schizophrenia and intellectual disability (ID). We explored this hypothesis by producing whole exome sequence (WES) and GWAS data from 352 patients from Northern Finland with an ICD-10 diagnosis of ID of unknown etiology, and their 293 family members (97 trios, 109 duos and 146 index cases). The Northern Finland Intellectual Disability Project (NFID) exomes were combined with 8000 Finnish exomes sequenced in the Sequencing Initiative Suomi project (SISu). As expected, we observed a comparable number of large CNVs and de novo mutations as reported in similar patient collections, both of these categories being enriched in the NFID patients. Given the genetic origin of NFID, we expected to observe variants enriched in Finland that are 1) strong acting recessive variants that seem Mendelian but account for ~1% of a 1% phenotype rather than all of a 1/10000 phenotype and 2) dominant alleles with odds ratios in the range of 2-5. As per our hypothesis, we discovered a Finnish-specific recessive cause of ID in 4 cases: homozygosity for a variant in CRADD (p=4e-8). The variant is not observed in the homozygous state in 61,000 individuals worldwide or in 8000 Finnish individuals. We also observed Finnish-enriched dominant missense variants in multiple genes (OR range 3-6), including a gene encoding TUBA1A1 (OR 5.2, p=4e-8).
Significant and promising variants are replicated by sequencing additional Northern Finnish ID cases and their family members (n=315; 150 cases; 51 trios). In conclusion, we demonstrate that young founder populations are a powerful resource to study rare variants. Specifically, we show that an enrichment of deleterious alleles increases power to detect causal and disease-associated variants that would require very large sample sizes in more diverse populations.
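The phrase "~1% of a 1% phenotype rather than all of a 1/10000 phenotype" can be made concrete with Hardy-Weinberg arithmetic. Assuming random mating and full penetrance in homozygotes (assumptions a founder population with consanguinity may violate), both scenarios imply a homozygote frequency of 1/10,000 and hence an allele frequency q of about 1%, with roughly 2% of people, 2q(1 - q) = 0.0198, carrying one copy. What differs is the phenotype: a Mendelian-looking recessive variant can still explain only a small slice of a common, heterogeneous disorder. A minimal sketch (function name hypothetical):

```python
import math


def recessive_allele_frequency(phenotype_prevalence, fraction_explained):
    """Allele frequency q implied by a homozygote frequency of q**2 under HWE.

    Assumes random mating and that every homozygote shows the phenotype.
    """
    homozygote_freq = phenotype_prevalence * fraction_explained
    return math.sqrt(homozygote_freq)


q_partial = recessive_allele_frequency(0.01, 0.01)       # ~1% of a 1% phenotype
q_mendelian = recessive_allele_frequency(1 / 10000, 1.0)  # all of a 1/10000 phenotype
carrier_freq = 2 * q_partial * (1 - q_partial)            # heterozygote frequency
print(q_partial, q_mendelian, carrier_freq)
```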

A population genetics perspective on quantitative traits.

Y. B. Simons1 ; K. Bullaughey2 ; R. R. Hudson2 ; G. Sella1

1) Biology, Columbia University, New York, NY; 2) Ecology & Evolution, University of Chicago, Chicago, Illinois.

Genome wide association studies (GWAS) have begun uncovering the genetic architecture of a wide array of human quantitative traits, including morphological traits like height and BMI as well as complex diseases like schizophrenia and diabetes. Interpreting GWAS results in evolutionary terms can provide us with an unprecedented insight into the evolution of quantitative traits in humans and may help guide the design of future mapping studies. Evolutionary processes, such as mutation, selection, drift and pleiotropy, shape the genetic architecture of quantitative traits but very few existing models incorporate them in a way that’s meaningful for GWAS interpretation. We extend Fisher’s Geometric Model to quantitative variation and use it to obtain predictions of the genetic architecture of quantitative traits under different evolutionary scenarios. Under this model, we relate the phenotypic effects of variants to the selection acting upon them and we see that weakly and strongly selected variants are expected to carry more of the variance than nearly neutral sites and therefore be easier to detect in GWAS. However, variants under very strong selection would be too rare to be detected. Pleiotropy is represented by the dimensionality of the trait space and pleiotropic effects weaken the relationship between effect size and selection. The overall effect of pleiotropy on GWAS success is to effectively increase GWAS power. Our analysis suggests that the increase in GWAS success with increasing study size may be highly sigmoidal for some traits and that the increase may be quite dramatic once a large enough study size is reached. This model may provide the basis for using GWAS results to infer the strengths of evolutionary forces shaping quantitative traits in humans.
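The tension described above, where strongly selected variants tend to have larger effects but are kept rare, shows up in the standard per-site contribution to additive genetic variance, 2p(1 - p)a², for allele frequency p and additive effect a. This is the textbook additive-model formula, not the authors' extension of Fisher's Geometric Model, and the example frequencies and effect sizes are hypothetical.

```python
def additive_variance(p, a):
    """Additive genetic variance from one diploid biallelic site: 2 p (1 - p) a^2."""
    return 2 * p * (1 - p) * a ** 2


# Hypothetical sites: a common small-effect allele and a rare large-effect allele.
common_small = additive_variance(0.30, 0.1)
rare_large = additive_variance(0.001, 1.0)
print(common_small, rare_large)
```

Here the rare allele has ten times the effect but contributes roughly half the variance of the common one (about 0.0020 versus 0.0042). This is one way to see why variants under very strong selection can be too rare to detect.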

Chinese population allele frequency estimations based on large-scale non-invasive prenatal testing samples.

H. Xu; F. Chen; X. Jin; Y. Zhang; H. Jiang; X. Xu

BGI-Shenzhen, Shenzhen, China.

Allele frequency estimates for Chinese people are important for building a genetic map of the Chinese population and for other epidemiological studies, including studies of the molecular prevalence of genetic diseases. The Chinese Han populations in the 1000 Genomes Project, released in 2012, have been the most widely used database for variants, especially SNPs, in Chinese people, providing an intact and precise map of genome variation and accelerating a great many Chinese population studies. With only 90 Han people from China sequenced, however, the sampling proportion was very small compared with the more than one billion Chinese people, and the estimated allele frequencies may deviate from those of the wider Chinese population because of sampling randomness. It was estimated that over 1,000,000 pregnant women took sequencing-based non-invasive prenatal testing (NIPT) for fetal aneuploidy screening in China from January 2011 to June 2015. A low-coverage (~0.1X) WGS strategy was mostly used in clinical labs in China; these tests represent a large-scale, randomly sampled population and constitute a significant genetic database for Chinese populations, with informative phenotypes including territory distributions, maternal age, nationality and region. Until now, most population-based algorithms for allele frequency computation were developed and validated on 30X WGS data and are not appropriate for low-coverage sequencing data. Extracting population-level knowledge from such large but ultra-low-coverage data is hard: it requires specific models for population SNP calling, demanding computational tools and mass storage. Here we developed a maximum-likelihood method to estimate allele frequency in the Chinese population and applied it to NIPT data from over 150,000 samples. A Chinese genetic map of over 150,000 Chinese people was built, and allele frequencies across the whole genome in this large-scale Chinese sample were studied.
We also analyzed the prevalence of common single-gene disorders such as thalassemia, DMD, SMA and hearing loss in different regions of China from 2011 to 2015. Our findings were compatible with current epidemiology reports in Chinese populations and give a picture of the molecular prevalence of genetic diseases in China. To our knowledge, this is the largest population studied in this way. Our study improves the understanding of variants in Chinese populations and promotes further uses of NIPT samples in population genetics.
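The abstract does not spell out its maximum-likelihood method. A common way to estimate an allele frequency from ultra-low-coverage reads without calling genotypes is expectation-maximization over genotype likelihoods. The sketch below assumes a single biallelic site, Hardy-Weinberg proportions, a uniform per-read error rate and independent reads; its function names are hypothetical. It works in plain probability space, so a production version would use log-likelihoods to avoid underflow at high depth.

```python
def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """P(reads | genotype) for genotypes carrying 0, 1 or 2 alt alleles.

    Each read shows the alt base with probability g/2*(1-e) + (1-g/2)*e. The
    binomial coefficient is dropped because it is the same for all genotypes.
    """
    likelihoods = []
    for g in (0, 1, 2):
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(p_alt ** alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods


def em_allele_frequency(samples, error=0.01, max_iter=200, tol=1e-10):
    """ML alt-allele frequency at one site from (ref_reads, alt_reads) pairs.

    Assumes Hardy-Weinberg proportions; individuals with no reads contribute
    only the prior and so carry no information.
    """
    gls = [genotype_likelihoods(r, a, error) for r, a in samples]
    f = 0.5  # starting guess
    for _ in range(max_iter):
        priors = ((1 - f) ** 2, 2 * f * (1 - f), f ** 2)
        expected_alt = 0.0
        for gl in gls:
            # E-step: posterior genotype weights for this individual.
            w = [lik * prior for lik, prior in zip(gl, priors)]
            expected_alt += (w[1] + 2 * w[2]) / sum(w)
        # M-step: frequency = expected alt alleles / total alleles.
        new_f = expected_alt / (2 * len(gls))
        if abs(new_f - f) < tol:
            return new_f
        f = new_f
    return f


# Hypothetical ultra-low-coverage site: most people have zero or one read.
reads = [(1, 0), (0, 1), (0, 0), (1, 0), (0, 0), (0, 1), (1, 0), (0, 0)]
print(em_allele_frequency(reads))
```

The point of pooling is that no single individual's genotype is known, yet across many samples the expected allele counts still pin down the population frequency.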

Refining the South Asian origin of the Roma people.

B. Melegh1,2 ; Zs. Banfai1,2 ; M. Kayser3 ; B. Melegh1,2

1) Medical Genetics, University of Pecs, Pecs, Hungary; 2) Szentagothai Research Centre, University of Pecs, Pecs, Hungary; 3) Department of Forensic Molecular Biology, Erasmus University, Netherlands.

Purpose: Historical and linguistic studies have suggested that the Roma people, living mainly in Europe, migrated into the continent from South Asia about 1000-1500 years ago. Genetic studies based on Y chromosome and mitochondrial DNA data confirmed these findings. Recent genetic studies based on genome-wide single nucleotide polymorphism (SNP) data further investigated the history of the Roma and, among many other findings, suggested that the South Asian ancestry of the Roma originates mainly from the Northwest region of India.
Methods: In this study, also using genome-wide SNP data, we attempted to refine these findings using a significantly larger number of European Roma samples. We also had the opportunity to use more data from distinct Indian ethnic groups, which gave us a higher resolution view of the Indian population. The study uses several ancestry estimation methods: the algorithmic method of principal component analysis, and model-based methods that take a Bayesian approach and use Markov chain Monte Carlo or maximum likelihood estimation.
Results: According to our analyses, Roma showed significant common ancestry with Indian ethnic groups of the Jammu and Kashmir, Punjab, Rajasthan, Gujarat and Uttarakhand states, e.g. with Kashmiri Pandit, Punjabi, Meghawal, Gujarati and Tharu. However, we also found strong common ancestry with Pashtun and Sindhi, ethnic groups living in Pakistan. Populations of Northeast India also have strong common ancestry with Roma. These ethnic groups are Brahmin, Kshatriya and Vaish.
Conclusion: We conclude that Northwest India plays an important role in the South Asian ancestry of the Roma, but they have similarly strong ancestry with some Pakistani ethnic groups, and populations in the east region of North India could also have been a source of the Indian ancestry of the Roma. However, ethnic groups of the southern region of India do not show a strong relationship with the Roma people living in Europe.

Recessive disease gene mapping in India: extraordinary opportunities for understanding health and disease.

N. J. Nakatsuka1 ; K. Thangaraj2 ; P. Moorjani1,3,4 ; A. Tandon1,3 ; N. Patterson3; L. Singh2 ; D. Reich1,3,5

1) Department of Genetics, Harvard Medical School, Boston, MA USA; 2) Centre for Cellular and Molecular Biology, Hyderabad, India; 3) Broad Institute of MIT and Harvard, Cambridge, MA USA; 4) Department of Biological Sciences, Columbia University, New York, NY USA; 5) Howard Hughes Medical Institute, Harvard Medical School, Boston, MA USA.

Modern India is a region of remarkable cultural, linguistic, and genetic diversity with over 4,500 anthropologically well-defined groups. Large genetic differentiation has been observed between many of these groups, reflecting strong founder events with effects that have been preserved in some cases for thousands of years due to low genetic exchange between groups. We undertook a systematic survey to assess the strength of founder events in over 1200 individuals from over 230 Indian groups genotyped on Affymetrix (6.0 and Human Origins) and Illumina (650K) arrays. These groups include tribes, castes, and religious groups with a wide range of census sizes, spanning every state in India. We also analyzed Ashkenazi Jews and Finns, two groups known to have high rates of recessive diseases due to strong founder events. To determine the severity of founder events, we measured the total length of the genome inherited identical-by-descent (IBD) in each group. The data were phased with Beagle 3.3.2, and detection of IBD fragments was performed using FastIBD and GERMLINE. The HaploScore algorithm was used to filter out false positive fragments. To reduce the influence of recent consanguinity, we excluded closely related individuals detected by the presence of very long IBD segments. We quantified the IBD score for a group as the combined length of IBD segments between 3 and 20 cM long, averaged over all pairwise comparisons within the group. We find that over 100 Indian groups in our dataset have founder effects stronger than in Ashkenazi Jews and Finns, including many groups with large census sizes (>1 million). This represents an extraordinary opportunity for biological discovery and potential reduction of genetic disease burden through mapping of recessive disease genes and prenatal counseling.
Future work should focus on better characterization of the history and relationships amongst the founder events, as well as mapping variants associated with genetic diseases in the groups with the strongest founder events.
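The group IBD score is defined above as the combined length of 3 to 20 cM IBD segments, averaged over all pairs within a group. A minimal sketch, assuming segments have already been detected and filtered (in the abstract, by FastIBD, GERMLINE and HaploScore) and close relatives already removed; treating both bounds as inclusive and the function name are assumptions.

```python
from itertools import combinations


def ibd_score(individuals, segments_by_pair, min_cm=3.0, max_cm=20.0):
    """Average over all within-group pairs of total IBD length in [min_cm, max_cm].

    segments_by_pair maps a sorted (id1, id2) tuple to a list of segment
    lengths in cM; pairs with no entry share no IBD and count as zero.
    """
    pairs = list(combinations(sorted(individuals), 2))
    total = 0.0
    for pair in pairs:
        lengths = segments_by_pair.get(pair, [])
        total += sum(length for length in lengths if min_cm <= length <= max_cm)
    return total / len(pairs)


# Hypothetical group of three: the 2 cM and 25 cM segments fall outside the window.
segments = {("a", "b"): [4.0, 2.0, 25.0], ("a", "c"): [10.0]}
print(ibd_score(["a", "b", "c"], segments))
```

Dividing by every pair, including pairs that share nothing, is what makes the score a group-level measure of founder strength rather than a summary of a few related pairs.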

The UK 100,000 Genomes Project.

K. Smith1 ; J. Davies1 ; A. Devereau1 ; T. Fowler1 ; T. Hubbard1,2 ; E. McDonagh1 ; M. Parker1 ; A. Rendon1 ; L. Riley1 ; A. Rueda Martin1 ; M. Ryten1 ; E. Thomas1 ; C. Turnbull1 ; M. Caulfield1 ; Genomics England

1) Genomics England, London, United Kingdom; 2) King’s College London, London, United Kingdom.

The UK’s 100,000 genomes project has begun consenting participants to its main programme through a network of 11 NHS Genomic Medicine Centres (GMCs) spanning over 70 local delivery partner institutions. It has four main aims: (1) to bring benefit to patients; (2) to create an ethical and transparent programme based on consent; (3) to enable new scientific discovery and medical insights; and (4) to kick-start the development of a UK genomics industry. The project focuses on patients with a rare disease and their families (approximately 50,000 genomes), as well as patients with certain common cancers (about 25,000 tumour-normal pairs). Whole genome sequencing of DNA extracted from blood samples is performed to at least 30x depth for germline samples and 75x for tumour samples, using Illumina’s HiSeq X Ten sequencing platform. PCR-free library preparation is employed when feasible. All germline samples are required to achieve at least 15x high-quality coverage over 95% of the autosomal genome. One of the innovations of the project is the collection of phenotype data from the GMCs in a comprehensive and standardised manner, either through direct data entry or by the population of data models directly from Electronic Health Record systems. Data models for each of the 120 currently eligible rare genetic conditions, as well as for cancers, have been developed in consultation with clinical experts to support clinical data capture and phenotyping. This approach was designed to enable clinical interpretation and large-scale genomic research. Clinical data models include questions about the presence or absence of human phenotype ontology (HPO) terms, additional clinical tests, and family history. To date, data models include 1370 different HPO terms, with a median 37 terms per condition and range 2 (hyperinsulinism) to 116 (mitochondrial disorders). Over 200 phenotypes have been proposed for addition to the HPO.
One of the important benefits of the programme will be to obtain HPO term frequencies based on large numbers of patients, rather than occurrence in OMIM entries, informing more powerful variant prioritisation strategies.
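The germline QC rule (at least 15x high-quality coverage over 95% of the autosomal genome) amounts to a fraction check on per-position depth. A minimal sketch, assuming `depths` already counts only reads and bases that pass whatever filters "high-quality" means in the project's pipeline, which the abstract does not specify; the function name is hypothetical.

```python
def coverage_qc(depths, min_depth=15, min_fraction=0.95):
    """Return (fraction of positions at >= min_depth, whether it meets min_fraction)."""
    covered = sum(1 for d in depths if d >= min_depth)
    fraction = covered / len(depths)
    return fraction, fraction >= min_fraction


# Hypothetical per-position depths for a tiny 100-position region.
print(coverage_qc([30] * 96 + [10] * 4))
print(coverage_qc([30] * 94 + [10] * 6))
```

The first toy region passes with 96% of positions at 15x or more; the second fails at 94%, even though its mean depth is well above 15x. That distinction is why the rule is stated as a fraction rather than an average.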

• Category: Science • Tags: ASHG 2015

Varieties of evolution

Years ago I remember Joe Thornton asking me if I wanted to be an evolutionary biologist, and I didn’t have a really good answer. Yes, I have degrees in biology and biochemistry, but it seemed weird to make your living studying evolution. It had long been a hobby of mine, back to the descriptive paleontology days, and in my adulthood more in the domain of theory undergirded by genetics. But it had always been an avocation until I decided to go to graduate school. At this point I’m focused on mammalian genomics professionally, with an obvious interest in domestication. But sometimes you get too narrowly focused, and it’s important to take a step back, and evaluate. Some hot chains to explore different peaks if you will.

On Twitter I got into a discussion with Nathaniel Comfort about his review of Richard Dawkins’s latest book, Brief Candle in the Dark: My Life in Science. Broadly Nathaniel takes the perspective that Dawkins’s views on evolutionary biology are somewhat anachronistic, that they’ve become frozen in the 1970s, when he was beginning his career as a science communicator, and still primarily focused on research. Comfort’s contention is that things have changed, but that Dawkins has stood in place. Specifically he seemed to suggest that the rise of genomics has changed how we view evolution. This I am skeptical of. Evolutionary biology pre-dates genetics, and genetics pre-dates our understanding of DNA as its concrete substrate. Genomics changes a great deal (as Graham Coop observed, genomics makes a big difference in understanding the molecular dynamics of evolutionary process, but says less about the details of phenotype). But I’m not convinced that it has revolutionized our understanding of evolution. Rather, it has had an evolutionary effect on the broadest scale. There is a diversity of opinions on a variety of topics, and Richard Dawkins’s views are not necessarily “orthodox” on all counts, but he’s not an out-of-touch dinosaur. There are serious active researchers who would defend his oeuvre (here’s something I found out, a sequence of advisers and students, Richard Dawkins → Alan Grafen → Laurence Hurst → Gil McVean). In fact, I would contend Dawkins is closer to the “center” of opinion among evolutionary biologists than the “extended synthesis” crowd, who many feel are a touch self-aggrandizing.

But I do think “dissenting views” are interesting, and illuminating. One thing I’m going to endeavor to do in the next few months is get around to reading James Shapiro’s Evolution: A View from the 21st Century. I purchased it when Jerry Coyne mentioned that for a limited time it was free on the Kindle, but it’s sat there ever since.

Commenters’ policy

I continue to have problems with commenters not understanding the ground rules, and becoming enraged when I tell them they aren’t very smart. This has been a bigger issue since I moved to Unz Review. I suspect part of this is that my policy is somewhat of a shock set against Ron’s radical libertarianism in this regard, and the general liberality of most of the other bloggers here. But I need to keep reiterating the framework. It’s something I’ve come to feel works for me, and I’ve evolved organically toward it over 13 years of blogging.

There are two general things to consider: how comments relate to me, and how they relate to you (commenters). First, me. I don’t talk about my non-blog life in detail here because it’s not important, nor is it really any of your business. But I’m a person who is in graduate school (which often involves being a teaching assistant as well as doing research in the lab), does a fair amount of consulting work in genomics, and has a family (small children) and a wide circle of friends. On a side issue of interest to some of you: one reason I didn’t want to dwell on what happened with The New York Times is that I’m just very busy, and most of my professional focus isn’t on writing at all. I had other things to do, and so didn’t really have the marginal time to be sad about missing out on an opportunity that literally fell into my lap and was never going to be a major income source. I know it’s unhealthy, and I worry about it, but one way I “get shit done” (as David Mittleman would say) is that I don’t sleep as much as I should. This means I’m cranky, and my time is precious. Every comment I read has an opportunity cost.

So how do I justify reading and allowing comments? It’s important that the commenters add genuine value. That means new ideas and concepts which I find interesting. Unfortunately there are several classes of commenters who don’t meet this standard. First, there are those of low intelligence. There’s not much to say here, and the whole situation is unfortunate for everyone involved in any intellectual enterprise. Second, there is the class of commenters whose priors are so different from mine that there’s no point in having a discussion. For example, if a commenter is a Creationist, even if they are intelligent, there’s really not going to be a fruitful exchange. So I don’t post Creationist comments (I get one about once a month). Of course Creationism is an extreme case. Consider this comment. The individual is well informed and intelligent, but unfortunately I’m going to ignore the whole comment, because I disagree with one of the axioms you have to hold to make the comment worth reading in its entirety (that scripture/text is in a deep way determinative of a religion). My disagreement here is predicated on my reading of a particular domain of scholarly literature, and I came to this conclusion after holding the position of the commenter before changing my views. I understand many people disagree with me, so I often post these comments, though I generally respond as I did: I simply don’t accept the premise of the argument, and so it’s pointless to continue. Unfortunately, the next stage for many commenters is anger and accusations that I’m stupid or ignorant. Rather than taking disagreement for what it is, these commenters begin to hector me when I don’t recognize their self-evident genius, at which point I have to ban them (I’m not saying this has happened with this commenter, just that it often does when I dismiss someone’s axioms and so render their own logical/analytic enterprise moot in relation to me).

This gets to the fact that ultimately I am the judge of what’s useful/edifying for a comment thread. Commenters sometimes seem to think that the threads exist to show off their erudition or filibustering capabilities. In other cases people get into juvenile debates where they really seem to believe that winning an argument on the internet accomplishes anything. The aim of discussion, in my opinion, is less about convincing your interlocutor and more about fleshing out and dissecting your own opinions. On a related matter, commenters often want to talk about their own hobby-horses, or move the discussion to a topic of their own choosing. This is tolerable, but I have my limits, and I am particularly harsh on monomaniacal individuals (well, except perhaps the guy who kept going back to the lack of female pubic hair today; that guy was just funny). Some commenters think that I’ve been caught out, or that I’m trying to hide something, when I don’t want to talk about what they want to talk about. Actually, I just don’t want to talk about what they want to talk about.

Finally, there is the whole issue of commenters who are insulting and/or awesome. Let me start with the awesome ones. They are awesome in their own eyes. These are the individuals who are supposedly incredibly smart, despite my judging them to be rather dull. That might simply be because I’m dull. But that makes me wonder: why is it that they are reading me, and I’m not reading them? These are people who read what I have to say, proactively leave a comment on it, and then throw a fit when I tell them to please not comment in the future. Though they invariably tell me I’m a loser, and explain at length how awesome they are (sometimes for paragraph after paragraph!), I suspect that something related to ego is going on here (yes, they accuse me of ego, which is fine; I have a big enough ego that I really don’t give a shit what they think). And last but not least, there are the insulting/presumptuous types. These two tendencies go together; the correlation is very high. Being told I’m stupid isn’t really an insult; it’s more a descriptive hypothesis. I could be stupid, in which case I invite you never to read me or comment here. But sometimes people leave weird, creepy comments about my race/personal history/background, of which they have no knowledge. For example, there are the commenters who leave statements of the form “you were obviously raised abroad, so you can’t understand Americans….”, or some variant of that. Or, “your mentality is obviously South Asian….” A lot of these are in the what?/not even wrong category. I’m sure they have their own logic, but the statements are often difficult to parse, and usually presuppose facts which are false (e.g., I was raised abroad [my formative years were spent in the intermontane West], I want to marry a white woman but never will [box already checked, hope your head doesn’t explode], I’m a Muslim [no], I’m pro-life [no], etc.).

In conclusion, if I ban you or don’t post your comment, perhaps you should be insulted and angered. But rather than leaving a bizarre and self-indulgent comment and wasting your own time (after all, you’re an awesome mind and your time is precious!), you should just move on. A few of the readers of this blog have become friends, but in general I’m not here to make friends; those I already have are sufficient. I’m here to extract interesting information out of you if you want to engage.

Humanity as a plesiomorphy

So, Homo naledi. About two years ago I happened by chance to be in town when Lee Berger gave a talk in Washington, D.C. So I’ve known since then what he’s been sitting on. People are asking me: so what? I’m an aspiring evolutionary geneticist, not a paleontologist, so why should my opinion even count? But here’s what I’d say: the likely conscious deposition of these individuals, who are morphologically very different from modern humans, makes us reconsider what “human” is. My own opinion on this changed when Luke Jostins crunched the data and showed that the cranial capacities of all hominin lineages seem to have been increasing over the past 2 million years. Relevance? I don’t think that “behavioral modernity” was a contingent fluke. Rather, I think that once our own lineage reached a particular point in evolutionary development ~2 million years ago, some sort of adaptive ratchet kicked in, and humanity was inevitable. The Neanderthal Parallax, then, could be understood as alternative history in a fundamental sense. Being “behaviorally modern” is not a derived character of our particular lineage of Homo sapiens. Its potential was present at the base of our line, as far back as the australopiths.

Human evolution

In light of the recent discovery, a friend asked me what they should read to understand human evolution. Unfortunately, that’s like asking what you should read to understand physics while quantum mechanics was being developed. Things are just changing so fast. I would say that one book I read a few years ago has struck me as very useful when it comes to paleoanthropology: The Humans Who Went Extinct. In particular, the author focuses on the role of the Gravettian culture in developing the toolkit which allowed modern humans to conquer Northern Eurasia and ultimately push beyond Beringia. And I have to say that Richard Klein’s The Dawn of Human Culture is still very interesting and important. It basically tells you what the dominance of an “Out of Africa” and total replacement framework had led many to conclude: that modern humans were nearly a saltation that occurred 50,000 years ago. That modern humans were in fact the humans, the only ones with speech, and therefore culture.

Basal Eurasians

It looks like ancient Anatolian genomes will answer a lot of questions. Hopefully. I’m going to ASHG 2015, and there are some posters there, so updates soon. I looked at some data, and it is weird that the LBK “First Farmer” has such a strong affinity to Cypriots, as well as to groups like Tunisian Jews. I got into a discussion on Twitter with Iosif Lazaridis, and he pointed out that there is less Basal Eurasian ancestry in modern Middle Eastern groups than there was in the past. Before digging into the data myself, I had thought that North Middle Eastern groups (e.g., Armenians) would have a lot of Basal Eurasian ancestry, but they don’t seem to have an inordinate proportion.

• Category: Miscellaneous • Tags: Open Thread

If you care about human evolution, keep an eye out for reports on what happened in South Africa a few years ago. A massive cache of bones was discovered. I’ve been privy to a few preliminary findings, and the implications are explosive, revolutionary, all the hyperbolic language that I tend to avoid. This is a big deal, not just because of the results, but also because this may be an inflection point in how paleoanthropology is done. That is, rather than fossils being hoarded, the “sharing economy” of science will make itself felt within the individualistic and proprietary domain of the fossil hunters.

If I did the timing right, the announcement should drop a little over a day after I post this. Keep track of Lee Berger’s and John Hawks’ Twitter feeds.

• Category: Science • Tags: Paleoanthropology


There’s a new paper in PNAS, Ancient genomes link early farmers from Atapuerca in Spain to modern-day Basques. It is a nice complement to the earlier paper on an earlier Iberian Neolithic sample. These individuals all date to a later period, most to ~5,000 years ago and one to ~3,500 years ago. Despite the media hype, the results of this paper were pretty much expected, and it’s the final nail in the coffin of the idea that the Basque language and culture are relics of Paleolithic Europe. Rather, it confirms that the Basques descend in large part from the agriculturalists who brought the Neolithic revolution to Europe. The genetic result began to be clear as early as 2010, when PLOS Biology published A Predominantly Neolithic Origin for European Paternal Lineages. The interpretation in that paper was wrong in some of the specific details. It is quite likely that the R1b haplogroup did not come with the first farmers, but was a later arrival. But the authors were early in refuting the contention that the high frequency of this lineage among Basques was ipso facto evidence that it was a primal Paleolithic signature. In fact much of that earlier work exhibited some circularity: the premise that Basques were primal descendants of hunter-gatherers was the linchpin for archaeogenetic inferences, which then came back around to treat the intuited genetic distinctiveness of the Basques as further evidence of their uniqueness.

The admixture plot in the paper reiterates a few things I’ve been asserting of late. First, the Spanish Basque are unique among groups in the Iberian peninsula in showing weaker signatures of North African gene flow, and of the genetic signal associated with people from the Eurasian steppe. This isn’t a new finding. What is interesting, though, is that the authors confirm through a variety of methods that the Basque have Western European hunter-gatherer gene flow which post-dates the arrival of the first farmers. The earlier paper I allude to above suggested that the Iberian Cardial individual, which predates the oldest of these samples by ~2,500 years, had hunter-gatherer ancestry which exhibited affinities with a Hungarian, not a Spanish, sample. In other words, the first European farmers were themselves a compound population to begin with. Subsequent to their expansion all across Europe, they seem to have absorbed local hunter-gatherer populations. This is the resurgence of hunter-gatherer ancestry over thousands of years that David Reich has mentioned before. It was a phenomenon across much of Europe, not just in the Iberian peninsula.

Which brings us to how we go about solving this puzzle. It seems that archaeologists and anthropologists have to start tackling the issue. One possibility is that the human geography of Neolithic Europe was intercalated, with hunter-gatherer populations occupying zones between the expanding farmers that were not amenable to the farmers’ agricultural practices. I suspect that the Pygmy example might be informative, as this group has had a long period of symbiotic coexistence with agriculturalists. Note also that the results from earlier work suggest that the fraction of hunter-gatherer ancestry increased even before the arrival of the Eurasian Steppe populations, which changed the character of Europe’s north, and to a lesser extent its south.

Finally, there’s the enigma of the Basque language. The authors of the above paper mention possible connections with Paleo-Sardinian, which predates the Romance dialects on the island. And Sardinians, like Basques, exhibit strong signatures of farmer ancestry. In fact, Sardinians have more farmer ancestry than any other Europeans, likely due to the marginal pre-Neolithic human presence on the island. The genetic closeness of the farmer groups from Spain up into Germany in the early Neolithic indicates a rapid expansion from a small founding stock with roots in the Balkans and/or Anatolia. This sort of expansion is highly likely to be accompanied by the spread of a common language and culture, and in that way the Basque can actually give us some vague insight into the cultural character of the first Neolithic people, not of the hunter-gatherers. These results reiterate that some of the ancestry of the Basques does derive from the people of Paleolithic Europe in a genetic sense. But perhaps more importantly, they point to the likelihood that there was a massive cultural rupture between Ice Age and Neolithic Europe, and the Basque stand with the latter.

• Category: Science • Tags: Ancient DNA, Basque, Genomics

Giant study poses DNA data-sharing dilemma:

Next month, the group is expected to release a project plan. Observers are eager to learn its answer to a key question: how much information about disease risk, especially genetic data, will the project share with participants?

That issue is the subject of much debate. Dishman and others say that participants should at least have the option to see all their personal data so that they can investigate their own health, just as he did. But some specialists in the field say that showing participants their data is irresponsible, because the information is challenging for people to interpret and its significance is often uncertain.

Most genetic variants linked to disease increase risk only slightly, yet people who discover that their genome holds such a variant might worry excessively or seek unnecessary medical tests. Or they might do nothing: the limited research on how people react suggests that, far from causing panic, information about common variants of small-to-moderate effect does not seem to motivate people to make recommended long-term behavioural changes to lessen risk. “Unless you give people the tools and the skills to deal with the raw data, I don’t see how you could give them the raw data,” says Brian Van Ness, a geneticist at the University of Minnesota in Minneapolis.

Years ago I had a short conversation with Mike Snyder where we discussed the fact that for human genomics to become useful, a culture of openness and ubiquity of data had to develop. That is, you need huge sample sizes and lots of information on those samples. Several years ago I elaborated on some of my views in an essay in Genome Biology, co-written with my friend David Mittleman, Rumors of the death of consumer genomics are greatly exaggerated. The point is that the gains from genomics, and from what is now being called “precision medicine”, are really going to come when there is widespread adoption of sequencing, and comfort among the populace with analyzing the intersection of sequence data with phenotype data. That is, there are returns to scale. But all this sounds very “Brave New World” to people, and they are not comfortable with it. Yet. One way to put people more at ease is to give them some “ownership” of the process. They may never analyze their raw data, but they’ll have it, forever. Or they may analyze it with third-party tools such as Promethease. Ultimately personal genomics needs to be personal, and if you lock the data behind gates, that totally undermines the message.

Second, I think on normative grounds one can say that it is neither ethical nor right for researchers to withhold your own sequence when they are doing research with your data. After all, they are making career gains with your own information, your raw data. It strikes me as unjust that they’d withhold that data because you might not do the right thing with it, as though it were up to them to decide when you can see information about your own body. There is perhaps a place for paternalism on some issues, but in general this is not a hill that I think many geneticists will die on. Those who stand athwart history are going to look foolish in hindsight.

• Category: Science • Tags: Personal Genomics
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"