The Unz Review: An Alternative Media Selection

A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

 Gene Expression Blog

• Category: Humor

Quartz has an article up, 23andMe has a problem when it comes to ancestry reports for people of color, which I want to comment on at length. Taken literally, the title is not something I’d disagree with too much, but I have serious issues with the tone and the details.

First, some disclosure. Hong talked to me on the phone for an hour about this story. Mostly we talked about her Korean ancestry results. More on that later. Second, I consulted for 2.5 years for Family Tree DNA, am friends with Spencer Wells (who is quoted), and am on friendly terms (I’d like to think!) with Joanna Mountain, and quite respect many of the scientists at 23andMe (e.g., Kaisa Bryc and Ivan Juric off the top of my head).

I will go through the article point by point. First:

I doubt that most 23andMe users realize how paltry the company’s data is for non-Caucasians. For example: The data set that 23andMe used to generate my report has 76 Koreans in it, according to Dr. Joanna Mountain, the company’s senior director of research. 76 Koreans. It is estimated there are at least 7 million Koreans living outside of the Korean peninsula—including 1.7 million in the US—among a worldwide population of 83 million.

Seventy-six Koreans seemed small to me, but what do I know? I’m just a journalist. So I spoke to geneticist Spencer Wells, founder and former director of National Geographic’s Genographic Project (arguably a 23andMe competitor), which he ran from 2005-2015. “[76] is a really low number,” he concurred.

The small sample sizes seem really, really problematic if you are a lay person, or a journalist. The issue is that with genotyping technology that looks for common polymorphisms you really don’t get that much more information from 1,000 individuals than you do from 100. All things equal, a bigger sample is better, but the gap between 10 and 100 is much, much greater than the gap between 100 and 1,000, or between 100 and 10,000. You can see this in the robustness of results from model-based clustering conditional on different sample sizes. For a homogeneous population like the peoples of the Korean peninsula, who seem relatively panmictic, a bigger sample would have only a marginal effect on the overall outcomes of these methods (it might matter more if you were looking at low-frequency alleles from whole-genome sequencing).
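A back-of-the-envelope sketch of those diminishing returns, under simplifying assumptions that are mine rather than 23andMe’s: a single biallelic SNP, a made-up allele frequency of 0.3, and simple binomial sampling of 2n chromosomes with no genotyping error. Under that model the standard error of an allele frequency estimate from n diploid individuals is sqrt(p(1 - p) / 2n).

```python
import math

def allele_freq_se(p: float, n_individuals: int) -> float:
    """Standard error of an allele frequency estimate from n diploid individuals,
    assuming simple binomial sampling of 2n chromosomes (an idealization)."""
    return math.sqrt(p * (1 - p) / (2 * n_individuals))

p = 0.3  # hypothetical frequency of a common polymorphism
for n in (10, 76, 100, 1_000, 10_000):
    print(f"n={n:>6}: SE = {allele_freq_se(p, n):.4f}")
```

Each tenfold increase shrinks the error by the same factor (the square root of 10), but the absolute gain from 10 to 100 individuals (about 0.07) dwarfs the gain from 100 to 10,000 (about 0.03). Clustering methods also pool information across hundreds of thousands of SNPs, which this single-locus toy ignores.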

Before I talked to Hong I checked in with a friend who is half north Korean (in that her father’s family was from the northern half of the peninsula and migrated south) and half central Korean (i.e., her mother’s family was from around Seoul). Her results came back as 99% Korean, just like those of her husband, whose family was from Busan in the far south. Some genetic research has been done on Koreans, and there just isn’t that much structure. The Koreans have a composite origin if you go back far enough, but they’ve been intermarrying with each other for a long time.


Also, astonishingly, the report shows that I am 13.4% Japanese and 14% Chinese—and only 61.6% Korean. I was looking forward to watching my parents freak out. My sister texted me, “Oh [Dad will] probably blame Mom.”

To my disappointment, my parents did not freak out, nor did they get into an amusing argument about which of their ancestors was the ho. Because they simply did not believe the data. And, for once, they were right.

The public relies on journalists for the truth. Sometimes the truth can be slippery. But sometimes it is clear. Most of the conversation between Hong and me was about her Korean ancestry. As I told her, I had asked a handful of my Korean friends about their 23andMe results before we spoke. Based on that, I told Hong I was 99% sure that she had recent non-Korean ancestry. 23andMe’s results are really robust; I tried to emphasize that over and over. Hong can believe what she wants, but she almost certainly has non-Korean ancestry in the relatively recent past.

Because 23andMe uses chromosome painting, you can see that she has very long segments of inferred Chinese and Japanese ancestry. This non-Korean ancestry is probably from within the last three generations, because the tract lengths indicate that recombination hasn’t yet broken up the associations along the chromosomes (there are 20-40 recombination events across the genome per generation).
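To make the tract-length argument concrete, here is a minimal sketch under an idealized model that is my assumption, not 23andMe’s method: a single pulse of admixture g generations ago contributing a fraction m of the genome, with tracts of that ancestry roughly exponentially distributed with mean 1/((1 - m)g) Morgans. The ancestry fraction used below is hypothetical.

```python
def expected_tract_cm(generations: int, ancestry_fraction: float) -> float:
    """Mean tract length (cM) of a minority ancestry under an idealized
    single-pulse admixture model: tracts are roughly exponential with rate
    (1 - m) * g per Morgan, so the mean is 100 / ((1 - m) * g) cM."""
    return 100.0 / ((1.0 - ancestry_fraction) * generations)

# Hypothetical: ~13% of one ancestry, introduced 1 to 8 generations back.
for g in (1, 2, 3, 5, 8):
    print(f"{g} generation(s) ago: mean tract ~ {expected_tract_cm(g, 0.13):.0f} cM")
```

The exact numbers matter less than the scaling: mean tract length falls roughly as 1/g, so segments tens of centimorgans long point to an ancestor within a few generations. The model is crude for a single recent ancestor, but the direction of the inference is the same.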


I asked Wells whether my percentage breakdowns of Korean, Chinese, and Japanese meant anything. “Yes,” he said, “but I think it is misleading to go to a decimal place or even to go out two digits.” Wells said that another problem with the data is that “Most of those [samples] are from the US. They’re not terribly useful for studies of indigenous composition—which is effectively what this analysis is trying to do.”

I had a long text conversation with Spencer about this after the article came out. I can see where he’s coming from. And 23andMe does have a shortfall of indigenous and non-European samples. But as I said, I asked around among Korean friends who had used 23andMe, and the population is pretty homogeneous; the friends’ results I cited above were representative. I have also worked with and seen samples from Family Tree DNA, and it’s the same story. There might be undersampled populations from Korea, but I’d bet against it. Koreans are relatively homogeneous, and they sit between Japanese and North Chinese, which is where you would expect them to be.

Spencer is correct about the decimal places issue. They give people a false impression of precision. I do know that scientists within DTC companies struggle against it. But scientists don’t always win these arguments.


I also interviewed Harvard geneticist Robert Green, who made the important point that private companies have different methods and standards from those of an academic lab. “There is a difference between analysis you can do with hundreds of [genetic] markers at a research level, and the kind of analysis that even the best companies can do, which is more an approximation,” he said.

Green is a medical geneticist who does great work. But I’ll be generous and assume he’s taken totally out of context here, because what he says makes no sense. The genotyping platforms do have error rates (no-calls, mistypings, etc.) on the order of 1%. But they’re using hundreds of thousands of SNPs. This error rate doesn’t matter too much for what 23andMe is doing in relation to ancestry. And with population structure inference these errors usually don’t cause a major issue if they aren’t systematic.
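A toy illustration of why random, non-systematic genotyping errors barely move an ancestry estimate. Everything here is an assumption for illustration, not 23andMe’s pipeline: two hypothetical source populations with made-up allele frequencies, 2,000 unlinked SNPs (far fewer than a real chip), a grid-search maximum-likelihood estimate of the admixture fraction q, and 1% of calls replaced with a random genotype.

```python
import math
import random

def simulate_genotypes(q, p_a, p_b, rng):
    """Draw diploid genotypes (0, 1, or 2 copies of an allele) for an individual
    whose ancestry is a fraction q from source A and 1 - q from source B."""
    genotypes = []
    for fa, fb in zip(p_a, p_b):
        f = q * fa + (1 - q) * fb  # expected allele frequency for this admixed individual
        genotypes.append(int(rng.random() < f) + int(rng.random() < f))
    return genotypes

def add_random_errors(genotypes, error_rate, rng):
    """Replace roughly `error_rate` of the calls with a uniformly random genotype."""
    return [rng.choice((0, 1, 2)) if rng.random() < error_rate else g for g in genotypes]

def estimate_q(genotypes, p_a, p_b, grid_points=101):
    """Grid-search maximum-likelihood estimate of q, treating SNPs as independent."""
    best_q, best_ll = 0.0, -math.inf
    for i in range(grid_points):
        q = i / (grid_points - 1)
        ll = 0.0
        for g, fa, fb in zip(genotypes, p_a, p_b):
            f = q * fa + (1 - q) * fb
            # Binomial log-likelihood of g copies out of 2, constant term dropped
            ll += g * math.log(f) + (2 - g) * math.log(1 - f)
        if ll > best_ll:
            best_q, best_ll = q, ll
    return best_q

rng = random.Random(2016)
n_snps = 2000
p_a = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # hypothetical source-A frequencies
p_b = [rng.uniform(0.05, 0.95) for _ in range(n_snps)]  # hypothetical source-B frequencies
true_q = 0.3
clean = simulate_genotypes(true_q, p_a, p_b, rng)
noisy = add_random_errors(clean, 0.01, rng)
q_clean = estimate_q(clean, p_a, p_b)
q_noisy = estimate_q(noisy, p_a, p_b)
print(f"true q = {true_q}, clean estimate = {q_clean:.2f}, with 1% random errors = {q_noisy:.2f}")
```

Because the errors are random, they mostly wash out across loci. A systematic error, such as one allele consistently miscalled at many sites, would not average out this way.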

Then there’s this:

A few of the geneticists I interviewed for this article (but not Green or Wells) outright accused 23andMe of commercially driven ethnic bias. For example, no distinction is made between northern and southern Chinese, who have very different traits. This was a serious allegation, so I put the question directly before 23andMe’s Mountain. “As a scientist, I find that insulting,” she said in a phone interview.

I brought up the issue with the Chinese to Hong, and I apologize to Mountain here if it came off as offensive, because I certainly didn’t mean it that way. My point, which I’ve made for years both in public and when I have consulted for DTC companies, is that South and East Asians are huge groups, and it’s incongruous that they aren’t differentiated as much as Europeans are. These tests basically tell you that you are South Asian, or Chinese, or Korean, or Japanese. In the case of Koreans and Japanese there isn’t that much structure within these groups, but that is not the case with the Han Chinese. There is a decent amount of structure there, but last I checked 23andMe had a catchall Han Chinese group. Why? I’ll get to that later. (It’s not because they don’t have the data.)

Though I disagree with the tone and the emphasis, a simple inspection by Hong has shed light on something that has been glaringly obvious in the genetic genealogy community: there is a laser-like focus on differentiating very close Northern European groups, such as Irish and English, and not so much emphasis on differentiating diverse populations such as South Asians. This was one thing I did talk to Hong about at length. I don’t think it’s crass racism, and I think I made that clear to her, but I’m not happy with the situation either (23andMe representatives know I’m not happy, and have talked to me about it at ASHG).

The final sections involve Hong reviewing the disparities in sample representation. As I said above, some of this is overdone. But it is a little ridiculous that there are only a few hundred African population samples in their data. Granted, it turns out that between-population genetic distance in Africa is actually not as large as you’d think based on aggregate variation (the within-population variation is what makes all the news). I think Hong is correct that 23andMe should have made more effort on sample collection these past few years…but I’m not CEO of 23andMe, and Joanna Mountain and her scientists don’t call all the shots. I think Hong’s piece leaves Mountain and the researchers holding the bag for something that really isn’t their doing (perhaps it is, but I’m really skeptical of that).


Could the company be doing a better job with collecting ethnographic data? “Absolutely they could,” Wells said, “but it’s not their raison d’être.” Which, of course, is pharma and health research. Fair enough—it’s their money. But how about a disclaimer attached to the ancestry part of the report? Like, “for entertainment purposes only?” Because data based on 76 Koreans (or any other ethnic group) is definitely not worth potentially causing family discord or a blood feud. I don’t know whether the company understands the realities of deadly global ethnic tensions and the potential damage created by people’s trust in these reports.

I think Spencer has highlighted the major dynamic here: 23andMe is pivoting toward biomedical research. It has a database of more than a million individuals, mostly of European origin. The real money now comes from leveraging the database to collect information on health, and combining it with the genotypes they already have. On the margin, greater population diversity is probably not a major avenue by which they could gain higher valuations. And going from one million to ten million genotypes means little without also expanding their database of phenotypes.

The real story here is not one of racism. It’s one of capitalism. Most of 23andMe’s customers are white European in ancestry, and a disproportionate number of those are Northern European. Is it a surprise that their tools break down Northern European ancestry so finely? That’s their customer base.

Second, many Asians I’ve talked to are relatively uninterested in fine-grained breakdowns in their ancestry. For several years I worked with an engineer from Fujian, and his Family Tree DNA results showed that he was shifted toward the southern end of the north-south Chinese cline. He didn’t care at all, because he was from Fujian, so of course he knew this. Many Asians seem to have this attitude where the ancestry results are viewed as confirmatory. Hong’s case, where there was a surprise, is exceptional.

If 23andMe wanted to, they could easily break down Asians into further subcomponents. I think there are two reasons they don’t, aside from the firm’s recent focus on health and pharma. First, they don’t have that many Asian customers. Second, their Asian customers might actually get a bit irritated!

Ultimately, Hong can think whatever she wants to about her 23andMe results. But the data are out there. It’s pretty obvious that unless there was a sample mix-up, she has recent Chinese and Japanese ancestry (she could put the raw results in the public domain and have people cross-check with other methods, like PCA; I’m pretty sure they would confirm the 23andMe results).

On a last nerdy note: the data generated by DTC companies is great. Their Illumina SNP chips are really good, with correct-call rates of 99% or so. Hong referred to data in the piece when she really meant results. The thing is that results are generated through a sieve of methods geared toward human digestibility. 23andMe and other DTC companies differ because they use different methods, and different parameters within those methods, chosen according to what humans want out of these techniques. But the data, that’s pretty straightforward and robust.

If you are interested in a more philosophical take, see Joe Pickrell’s What is ancestry?

Addendum: My conversation with Hong was very wide-ranging. We talked about EDAR, random mating populations, and local ancestry deconvolution. Well, perhaps not in those words. It’s a little saddening to me that ultimately what came out of all that is a piece which tries to paint 23andMe as prejudiced against minorities. The only prejudice they exhibit as a firm is against smaller market share.

• Category: Science • Tags: 23andMe


No matter the Yelp reviews, if it doesn’t have dry pot or whole boiled fish on the menu, not worth it. Also, should feature something where the peppercorn is salient.

• Category: Miscellaneous • Tags: Sichuan cuisine

I got this hot sauce at Whole Foods. The original Whole Foods.

What a disappointment. Salty. Without much other flavor besides the spice. It was like a watery spin on Louisiana hot sauce. I couldn’t taste the “aromatic spices” and “fresh herbs.” And don’t tell me it is because it’s too spicy, I didn’t find it too spicy. I did find it very salty though.

• Category: Miscellaneous • Tags: Hot Sauce


In 2011 I was having dinner with an old friend who was an engineer at Intel. He also has a Ph.D. from MIT. Smart guy. But when I mentioned offhand that we were all a few percent Neanderthal (outside of Africa), he was surprised. I was a bit shocked, and explained that this was a huge science story. The Neanderthal genome had been published the previous year. How could my friend not have known?

He was totally unembarrassed, and told me I overestimated how closely the public followed genetics and paleontology. I’m sure he was right. But it’s hard to remember sometimes.

We’ve gone far beyond where we were in 2010. We now have a really good grasp of a lot of population dynamics in Eurasia over the past 20,000 years. Probably the best place to start is with this preprint, The genetic structure of the world’s first farmers. But the general outlines were already evident a few years back in Toward a new history and geography of human genes informed by ancient DNA.

Most of the world’s population seems to descend from a mixing of a set of groups which 10,000 years ago were distinct. How distinct? We’re talking about Fst values on the order of 0.10, which means that ~10% of the genetic variation is partitioned between the two populations in a pairwise comparison. That’s about what you see between Europeans and Chinese today. Some of the Fst values were a bit higher, some lower, but 0.10 seems about right.
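For readers who want to see where a number like 0.10 comes from, here is a minimal sketch using Nei’s heterozygosity-based definition, Fst = (H_T - H_S) / H_T, combined across loci as a ratio of sums. This is one of several Fst estimators, and published ancient-DNA work typically uses others with sample-size corrections; the allele frequencies below are made up. For a single biallelic SNP it reduces to (p1 - p2)^2 / (4 p̄(1 - p̄)).

```python
def nei_fst(freqs_pop1, freqs_pop2):
    """Multi-locus Fst (Nei's G_ST) for two populations from allele frequencies,
    computed as sum(H_T - H_S) / sum(H_T) over loci."""
    num = den = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        p_bar = (p1 + p2) / 2
        h_t = 2 * p_bar * (1 - p_bar)                       # heterozygosity of the pooled population
        h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
        num += h_t - h_s
        den += h_t
    return num / den

# One hypothetical SNP at 50% in one population and 20% in the other
print(round(nei_fst([0.5], [0.2]), 4))  # 0.0989
```

So a frequency gap of 0.5 versus 0.2, if typical across the genome, is roughly the scale of differentiation described above.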

To make it easy for some of you, I’ve labeled and placed the approximate locations of groups ancestral to modern Northern Europeans ~10,000 years ago. What I’m trying to represent is a map showing the modal regions of distribution of the ancestors that today’s Northern Europeans had 10,000 years ago. So, for example, since ~15% of the ancestry of Northern Europeans is “Ancient North Eurasian” (ANE), a lot of the ancestors of Northern Europeans alive today would have been living somewhere in the broad expanse of Central Eurasia (though, because of various demographic events, the number of ANE individuals was probably lower than the number of farmers, perhaps lower than the 15% contribution to modern genomes would suggest).

A substantial proportion of the ancestry of Northern Europeans is “European hunter-gatherer,” dating to the Pleistocene. But here’s the kicker: most of that ancestry dates to after the LGM, to about ~15,000 years ago. The really deep Pleistocene ancestry in Europe is only found at very low levels now.

The final issue is that a lot of the phenotypes that we racially code are recent. This probably explains why groups like the Kalash and Nuristanis can look more like Europeans than South Asians, but they’re genetically more like South Asians.

What does any of this have to do with non-scientific things? I don’t really know. My interest in population structure is intellectual, not personal. But a certain type of person should probably stop talking about how white people have been in Europe for 40,000 years. First, the ancestors of modern Europeans 40,000 years ago were almost all residing outside of Europe, and that holds until 15,000 years ago. Most would still have been resident outside of Europe 8,000 years ago, depending on how you count/calculate.* And, perhaps more importantly, the typical phenotype of Northern Europeans probably coalesced only around ~5,000 years ago.

* Definitely true for Southern Europeans, but conditional on Northern Europeans depending on where you draw Europe’s eastern boundary.

Addendum: I stole the title from John McWhorter’s book, Our Magnificent Bastard Tongue.

Also, this is not to say that

1) population structure today is trivial in a phylogenetic sense; it isn’t.

2) population structure is functionally irrelevant; it isn’t.

• Category: Science • Tags: Genetics, Genomics

So I don’t really have strong opinions on the whole controversy over women’s sports at the elite level…mostly because I have a really hard time following all the logic. For me the biggest problem seems to be that we have two categories, men’s and women’s, and there are those who are arguing that they’re actually nearly plastic catchalls…which then suggests to me we shouldn’t have two categories in the first place in competition at the highest levels.

With that in mind, D. J. Grothe points me to this prescient interview from a few months back, Hyperandrogenism and women vs women vs men in sport: A Q&A with Joanna Harper. Joanna Harper is a trans woman who is (was?) also a competitive racer and a sports scientist. This portion lays out where the facts stand:

I would also like to relate a two-part epiphany that I had after my transition. In 2005, nine months after starting HRT, I was running 12% slower than I had run with male T levels; women run 10-12% slower than men over a wide range of distances. In 2006 I met another trans woman runner and she had the same experience. I later discovered that, if aging is factored in, this 10-12% loss of speed is standard among trans women endurance athletes. The realization that one can take a male distance runner, make that runner hormonally female, and wind up with a female distance runner of the same relative capability was life changing for me.

As they say, “read the whole thing.” It’s long, and detailed, and doesn’t offer easy answers. Ultimately the reality is that no “solution” is going to be fair to world-class athletes. But, it’s probably important to remind ourselves that it is also unfair to those of us without the genetics of world-class athletes, and we seem to be OK with that.

Compare and contrast with this piece, Let Caster Run! We Should Celebrate Semenya’s Extraordinary Talent. The title makes it pretty obvious that the author was going to come down on one side and make a lawyerly case. Rather disappointed with Nate Silver’s shop.

• Category: Science • Tags: Science, Sports

Guy Gavriel Kay’s Children of Earth and Sky is set in the same world as the Sarantine Mosaic duology, The Lions of Al-Rassan, The Last Light of the Sun, and A Song for Arbonne. I’ve enjoyed Kay’s work for more than half my life at this point, so it’s no surprise that I enjoy Children of Earth and Sky. As I’ve noted before, Kay is arguably the world’s greatest historical fantasist, and for someone like me it’s always pleasurable to make connections between our own real history and his secondary creation. This sort of fantasy is more magical than characterized by magic.

I know I have readers in India because of IP addresses. Keep an eye out for my byline in India Today, where I’ll make some contributions now and then. The first should drop this Friday in print and online, a short review of Shadi Hamid’s Islamic Exceptionalism. An Indian friend told me that India is one nation where the sales of print are actually increasing, so I’m curious how this will go.

On Twitter most of my blocks happen because I’m being tweeted at directly by someone. If I don’t follow you on Twitter, in most cases I don’t want to be bothered. The main reason I block isn’t because I’m a coward or I feel unsafe. It’s because the person is probably stupid, and starting to annoy me. Sometimes it’s because they want me to make a point that they want to make. Needless to say, I don’t take kindly to that. With all the adult responsibilities I have at this age, I don’t really feel guilty at all about muting stupid people (who invariably think they’re geniuses, because, you know, they’re stupid).

So you’ve convinced me that functional programming is the way to go.
Since my genotype is public, it somehow got used in an rsnps tutorial. Pretty funny.

Also, Running Structure-like Population Genetic Analyses with R. Looks like there are some interesting visualizations of admixture components which are feasible with the new program.

People keep emailing me about the HGDP plink data set. I think I removed it from where it initially was, and it’s linked from my old Admixture tutorials. Well, download this zip, and look at the .fam file. It has clear population labels, so you should be able to do what you want in Plink.
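For those who want to subset the data, here is a minimal Python sketch that tallies the population labels in a PLINK .fam file and writes a keep list. It assumes the labels sit in the first (family ID) column, which is a guess about how that particular file is laid out, so check your copy. The demo rows and individual IDs are made up.

```python
import tempfile
from collections import Counter
from pathlib import Path

def read_fam(path):
    """Return (FID, IID) pairs from a PLINK .fam file.

    .fam columns are whitespace-delimited: FID, IID, father, mother, sex, phenotype.
    """
    pairs = []
    for line in Path(path).read_text().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs

def write_keep_file(pairs, populations, out_path):
    """Write 'FID IID' lines for the chosen populations, the two-column format
    that plink --keep reads. ASSUMPTION: the population label is the FID."""
    kept = [(fid, iid) for fid, iid in pairs if fid in populations]
    Path(out_path).write_text("".join(f"{fid} {iid}\n" for fid, iid in kept))
    return len(kept)

# Demo with made-up rows; point read_fam at the real .fam file instead.
demo_dir = Path(tempfile.mkdtemp())
fam_path = demo_dir / "demo.fam"
fam_path.write_text(
    "Yoruba ind1 0 0 1 -9\n"
    "Yoruba ind2 0 0 2 -9\n"
    "French ind3 0 0 1 -9\n"
)
pairs = read_fam(fam_path)
print(Counter(fid for fid, _ in pairs))  # individuals per population label
n_kept = write_keep_file(pairs, {"Yoruba"}, demo_dir / "keep.txt")
print(n_kept, "individuals written to keep.txt")
```

The resulting file could then be passed to PLINK with something like plink --bfile hgdp --keep keep.txt --make-bed --out hgdp_subset (the file prefixes here are hypothetical).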

Should We Be Having Kids In The Age Of Climate Change? The arguments really go all the way back to the ZPG movement. Actually, they’ve popped up in philosophical movements from the beginning of time. The world is a “vale of tears”, etc. Myself, I have no guilt about having children. My children are attractive, and seem rather intelligent so far. Would that more children like mine exist in the world!

Stop Tweeting Your #Firstsevenjobs: It’s just a way to disguise your privilege. FUCK YOU. The author of the piece has a degree in French language and literature from Columbia University according to her LinkedIn. She gets to write for a living for Slate about food, and was editorial assistant for Mark Bittman. What. The. Fuck. She gets paid to write about food! She was Mark Bittman’s assistant. I guess it takes one to know one. Not that the author tries to mask her privilege: “But this list doesn’t tell you that I went to an Ivy League school and graduated without debt, since my parents were able and willing to pay for my tuition.”

I hate it when people say that gender is a continuum, because that tacitly brings to mind a uniform distribution. It’s not. It’s highly bimodal.

Lou Pearlman is dead. The weirdest thing about his career is that several stories have implied he became a boy-band impresario because he was a closeted gay man, and that was a way for him to have access to young vulnerable teenagers. The fact that he became very prominent in the late 1990s boy-band boom was almost a coincidence.

Sarah Haider has been accused of being a white supremacist.

Let Caster Run! We Should Celebrate Semenya’s Extraordinary Talent. As they would say, “I don’t even.”

The company I work for has a 20% discount on kits right now. So if you want your dog to get genotyped on 200,000 markers, and get ancestry and health, it’s as good a time as any.

What are you reading?

The Amazing Atheist YouTube channel has some pretty funny videos. E.g.,

• Category: Miscellaneous • Tags: Open Thread

Taking a break from my work today, I stumbled upon the fact that Bernard Cornwell’s series set in King Alfred’s period, which began with The Last Kingdom, is a Netflix series. To be honest I much preferred the three-volume Warlord Chronicles, set more than three centuries earlier, in post-Roman and pre-Saxon Britain. It is a retelling of the Arthurian romance without too much romance; George R. R. Martin admitted to me in correspondence in the late 1990s that he quite enjoyed it as well. The protagonist of The Last Kingdom is peculiarly similar to the one in the Warlord Chronicles.

As a fan of alternate history I’ve occasionally stumbled upon the “what-if” scenario whereby Alfred’s Wessex is conquered, and England becomes Daneland. Would we today be speaking another Scandinavian language? Would Christianity disappear, and the pagan rites of the Norse come to rule the day? It seems broadly likely that neither would be the case.

First, the victory of Christianity in Europe was overdetermined by the 9th century. Even in this period there was a Christian presence in Scandinavia. A Scandinavian ruled England would almost certainly be a Christian one. And in fact in the century before the Norman conquest the Scandinavians created a hybrid society with the native English. Harold Godwinson had a Danish mother, and connections to the Danish monarchy.

The second issue is one of language. The English language of Alfred’s time was much more Germanic, so the gap between it and the tongue of the Danes was not that large in any case. And, from what I have seen, it seems that the number of Scandinavians in relation to the native population was much smaller than that of the Saxons in relation to the British, though even in the latter case it must be acknowledged that the Germans who arrived in the 5th to 6th centuries were outnumbered by the native Romano-British (see PoBI results).

Perhaps if the kingdom of Wessex fell England’s identity would be more indubitably aligned with Scandinavia, as it was arguably in the decades before Norman conquest in any case. But cultural identities can be curiously resilient. The Finns endured nearly 600 years of Scandinavian domination, but maintained their language, while the long Irish interaction with the Vikings still left the Irish identity intact.

• Category: History • Tags: History

Joe Pickrell and Yaniv Erlich did an AMA on Reddit yesterday. I recommend you check it out.

They promote their new project, seeq. It looks pretty slick, and I’m excited to be part of the batch of beta testers.

• Category: Science • Tags: Genomics

Obviously I’m doing more development right now than I would have expected. But in the long term I want to move beyond hacking to survive for the present, and write some code that’s sustainable. So I think I want to read a design patterns book. The last one I read was 15 years ago and I don’t really have much retention of it. I’m particularly interested in stuff geared toward Python (the language I’m starting to get comfortable in right now).

Readers with recommendations are invited to weigh in. I know I have a fair number of software engineers in the readership, so I’m asking for your thoughts and suggestions. Perhaps the classic from the GoF is still the way to go? Remember, I’m not a software engineer who works on scientific data; I’m a scientist who sometimes needs to do a little engineering and data analysis.

Sean, if you take this as an opportunity to leave a long-winded comment about sexual selection and blond women, again, I’m going to have to finally ban you!

• Category: Miscellaneous • Tags: Engineering

Over at The Genetic Literacy Project Jon Entine has a post up, Usain Bolt’s Olympic gold proves again why no Asian, white–or East African–will ever be crowned world’s fastest human. Fifteen years ago Jon wrote Taboo: Why Black Athletes Dominate Sports And Why We’re Afraid To Talk About It, so he knows something about this topic.

Actually, I think Jon is wrong on this. Better drugs and biological engineering mean that I suspect at some point in the near future the fastest “human” alive is going to be non-African, and, if I had to bet, Chinese. But you know what Jon meant.

There is a lot of detail in Jon’s post because he knows a lot about this topic. But at the end of the day the specific details are less important than the general theoretical framework, which makes it unsurprising that a single group of genetically related humans dominates sprinting. First, unlike figure skating, sprinting is entirely objective. All that matters are physical inputs. Second, unlike swimming, which is also objective, sprinting seems to have pushed very close to the boundaries of what unmodified, drug-free individuals are capable of. To my knowledge there’s no expectation of a Fosbury Flop in sprinting.

Therefore, sprinting is selecting for raw ability. Training is not irrelevant, but the issue with training is that others can train too. What can’t be mimicked is raw ability due to one’s biological aptitudes (again, excepting bioengineering). Let’s assume that Olympic-caliber sprinters are among the 10,000 fastest humans on the planet, because not all people with the aptitudes become sprinters. Assuming a normal distribution, that’s about five standard deviations above the human norm. I suspect I’m being conservative. Someone like Usain Bolt is probably a six-standard-deviation human. Google tells me that a fit human can run the 100 meter dash in 13.5 seconds. The world record is about 9.5 seconds. The absolute range here is not incredibly large. So when you select for extreme individuals, small differences in the mean across populations make all the difference.
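The “about five standard deviations” figure is easy to check under the post’s own assumption of a normal distribution. The world population of roughly 7.4 billion is my assumption for 2016; the rest comes from the paragraph above.

```python
from statistics import NormalDist

std_normal = NormalDist()   # the post's assumption: sprint ability is normally distributed
world_population = 7.4e9    # rough 2016 world population (assumption)
top_n = 10_000              # "the 10,000 fastest humans on the planet"

tail_fraction = top_n / world_population
z_cutoff = -std_normal.inv_cdf(tail_fraction)  # z above which only tail_fraction lies
print(f"Top {top_n:,} of {world_population:.1e} people: above z = {z_cutoff:.2f}")

# Expected number of people more than six standard deviations above the mean
above_six_sd = world_population * std_normal.cdf(-6)  # cdf(-6) equals the upper tail by symmetry
print(f"Expected number above 6 SD: {above_six_sd:.1f}")
```

The cutoff comes out near 4.7 standard deviations, so “about five” holds up, and a six-sigma human would be one of only a handful (about seven) alive under this idealization.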

If sprinting were less objective, then there would probably be more equality in outcome. I suspect judges would be biased for various reasons, and one set of nations or people of a particular ethnic background dominating a field can get quite embarrassing. But sprinting is rather objective, and the socioeconomic obstacles are low. Given basic nutrition, and the ability to huff it, you have a shot. What matters is the magnitude of your ability.

One peculiar thing population genetics teaches us is that non-adaptive traits are more heritable. This is because selection tends to remove variation as it favors fitter individuals. Humans are good runners; there are entire evolutionary theories based around our biomechanical modifications and adaptations. But there’s really no benefit to running the 100 meter dash in 9.5 seconds rather than 10.5. We’re not that sort of ambush predator. There’s probably some heritable variation in burst ability, but it’s small, and not visible in any normal set of tasks among large groups of humans.

But modern competitive sports at the Olympic level are not selecting for normality; they’re selecting from outliers. It isn’t that West Africans were guaranteed to be the best sprinters; it’s just that, a priori, it shouldn’t be surprising that for a trait with so little adaptive benefit as running a second faster in the 100 meter dash, some populations had the genetic die loaded in their direction.

Note that I’m not denying any sort of selective or adaptive argument. There’s a fair amount of evidence that there is some selection in favor of greater height in Northern Europeans vs. Southern Europeans, which probably explains why Lithuanians are more prominent in basketball in relation to their numbers than Italians. But the selection wasn’t for basketball, and the fact that there is heritable variation suggests that selection wasn’t that strong and unidirectional….

Humans vary. Populations vary too. When you select from the tails of the distribution, the differences between populations are going to be very noticeable. If a sport is objective, and pushing its limits, it will select from the tails of the distribution.
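A quick way to see the tail effect: assume two populations with normally distributed ability, the same variance, and means that differ by a hypothetical half a standard deviation. This is a sketch of the statistical logic only; real performance distributions need not be normal, and the 0.5 SD shift is illustrative, not an estimate for any actual population.

```python
from statistics import NormalDist

def tail_ratio(mean_shift_sd: float, threshold_sd: float) -> float:
    """How over-represented a population shifted up by `mean_shift_sd` is above
    `threshold_sd`, relative to an unshifted population with the same variance."""
    z = NormalDist()
    shifted_tail = z.cdf(mean_shift_sd - threshold_sd)  # P(X > t) when X ~ N(shift, 1)
    reference_tail = z.cdf(-threshold_sd)               # P(X > t) when X ~ N(0, 1)
    return shifted_tail / reference_tail

for threshold in range(6):
    print(f"above {threshold} SD: {tail_ratio(0.5, threshold):.1f}x over-represented")
```

Near the mean the shifted population is only modestly over-represented (under 1.5x), but by five standard deviations out the ratio exceeds tenfold, which is the sense in which selecting from the tails makes small mean differences very noticeable.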

• Category: Race/Ethnicity, Science • Tags: Race, Sports

Sabine Hossenfelder on her side gig as a physics consultant, What I learned as a hired consultant to autodidact physicists:

Sociologists have long tried and failed to draw a line between science and pseudoscience. In physics, though, that ‘demarcation problem’ is a non-problem, solved by the pragmatic observation that we can reliably tell an outsider when we see one. During a decade of education, we physicists learn more than the tools of the trade; we also learn the walk and talk of the community, shared through countless seminars and conferences, meetings, lectures and papers. After exchanging a few sentences, we can tell if you’re one of us. You can’t fake our community slang any more than you can fake a local accent in a foreign country.

I haven’t learned any new physics in these conversations, but I have learned a great deal about science communication. My clients almost exclusively get their information from the popular science media. Often, they get something utterly wrong in the process. Once I hear their reading of an article about, say, space-time foam or black hole firewalls, I can see where their misunderstanding stems from. But they come up with interpretations that never would have crossed my mind when writing an article.

I’ve been blogging since 2002. Like Sabine, I can often tell if someone has a scientific background after a few sentences, especially if they are biologists of some sort. As for the rest, the chasm is between the intelligent and the not so intelligent, and that is usually pretty clear too. Mostly the intelligent have liberal arts or social science backgrounds, but have the basic analytic tools to decompose problems at the most general levels. The less intelligent tend to speak in simple formulas when coherent, and devolve into total incomprehensibility when they attempt originality.*

The second issue is somewhat different from physics. Usually at a given moment there is a topic of particular interest to the media; evo-devo and epigenetics come to mind. These are real scientific fields of inquiry. But because of disproportionate media attention to these sorts of topics, those who get their science knowledge from popularizations will usually assume that evo-devo and epigenetics have “revolutionized” our understanding of evolution and genetics, when in reality these are still developing areas whose ultimate impact is yet to be determined.

In fact, I’d take this further: the area of evolutionary genetics has arguably not been “revolutionized” since the 1970s, with the theoretical and empirical debates triggered by allozyme work and the neutralist-selectionist debates. All the rest, including genomics, is just commentary.

* Here is a good example: the stupid reader who was explaining to me patiently how splicing and gene regulation “disprove” heritability estimates. I dismissed them, but the reality is that I’m 99% sure that that reader thinks I’m an idiot as well.

• Category: Science • Tags: Science

Spencer Wells argued in his book Pandora’s Seed, as have many others such as Jared Diamond, that agriculture was a disaster for the quality of life of the average human. This is broadly plausible to me. On the other hand, I also think it is highly likely that agriculture and civilization were basically inevitable.

The “great leap forward” in cultural complexity and explosion of symbolic expression ~50,000 years ago, give or take, seems likely to have been only the culmination of a process of encephalization and increased sophistication which had proceeded over millions of years. The precursors to the agricultural life were likely already there before the Holocene.

To a great extent the hypothesis of inevitability has been tested: in the Americas many of the dynamics which characterize the Old World were recapitulated. Agriculture, civilizations with writing and class stratification, and monumental architecture, all with analogs in the Old World, are there. In fact, this National Geographic piece, In Search of the Lost Empire of the Maya, is fascinating to read, because the history it describes seems to me to parallel developments in the Old World two thousand years earlier. The Snake Kings were warlords in a manner which would have been familiar to the “Great Kings” of the ancient Near East.

There are two great schools of history from the pre-modern era: those which are cyclical, and those which exhibit some intuition that there is an endpoint or progress. The “independent” experiments of human history suggest that both are true, with an arc of history on the macroscale scaffolded by innumerable cycles of rise and fall.

• Category: History • Tags: History

I just bought my friend Sanjeev Sanyal’s book, The Ocean of Churn: How the Indian Ocean Shaped Human History. Sanjeev is a polymath with varied interests, some of which intersect with my own. A few years back I had the pleasure of having dinner with him and Reihan Salam, and the server kept unapologetically offering drinks one of us had ordered to the wrong person. I don’t think we look that much alike!

The top start-up mecca in America is far from Silicon Valley. It’s cheap to live here, and fun for young brogrammers. Also, not too long a flight from elsewhere. As Mark Krikorian observed on Twitter, being a blue bubble in a red state means that Austin can take advantage of low-cost and low-tax public policies while maintaining a culturally liberal social aesthetic.

Bought Python Essential Reference.

Hubby and Lewontin on Protein Variation in Natural Populations: When Molecular Genetics Came to the Rescue of Population Genetics.

Sausage Party is a surprising mix of high and lowbrow.

Update: If I don’t post your initial comment, posting five additional times won’t result in your comment being posted.

• Category: Miscellaneous • Tags: Open Thread

A friend recently emailed to ask about the best way to pick a proper “K” value when inferring structure. K is the parameter which defines how many putative ancestral populations your model uses to explain some data on genetic variation. Obviously some values of K are more informative about population history than others.

For example, if you had 100 Swedes and 100 Yoruba Nigerians, to model the population structure you could select K = 2 or K = 50. The algorithm would produce results in the latter case, but you “know” a priori that K = 2 is a really good model of the population history in a straightforward, interpretable sense. There’s just not that much more juice to squeeze out of this sort of data with many clustering methods.

But it’s harder when you have population structure in organisms which we don’t know much about aside from the genetic data. How does one “objectively” select a K? The most common method is outlined in a 2005 paper, Detecting the number of clusters of individuals using the software structure: a simulation study:

The identification of genetically homogeneous groups of individuals is a long standing issue in population genetics. A recent Bayesian algorithm implemented in the software structure allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated ‘log probability of data’ does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic ΔK based on the rate of change in the log probability of data between successive K values, we found that structure accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.

There’s an old saying, “garbage in, garbage out.” The method of ΔK is useful as far as it goes, but as inputs it takes the log likelihoods from the Structure program. For Admixture you can look at cross-validation. But these statistics are subject to various assumptions and approximations (in addition, some of the priors within the clustering algorithms are gross simplifications).
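The ΔK statistic itself is simple to compute. A minimal sketch following the formula in the 2005 paper, ΔK = mean|L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], where L(K) is the log probability of data reported by Structure across replicate runs. As an assumption, the second difference here is taken on the mean L(K) across replicates rather than run by run, and the spread is the sample standard deviation; the function name `evanno_delta_k` and all log likelihood values are made up for illustration.

```python
import statistics


def evanno_delta_k(log_probs: dict[int, list[float]]) -> dict[int, float]:
    """Delta K for every K that has both neighbours (K - 1 and K + 1) in the input.

    log_probs maps K to the log probability of data from each replicate run at that K.
    """
    means = {k: statistics.mean(v) for k, v in log_probs.items()}
    delta = {}
    for k in sorted(log_probs):
        if k - 1 in means and k + 1 in means:
            # Absolute second difference of mean L(K): the "rate of change of the rate of change".
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # Spread of L(K) across replicate runs (needs at least two runs per K).
            sd = statistics.stdev(log_probs[k])
            delta[k] = second_diff / sd
    return delta


if __name__ == "__main__":
    # Placeholder log probabilities for three replicate runs per K (not real data).
    runs = {
        1: [-5200.0, -5201.0, -5199.0],
        2: [-4300.0, -4302.0, -4298.0],
        3: [-4250.0, -4255.0, -4245.0],
        4: [-4230.0, -4240.0, -4220.0],
    }
    scores = evanno_delta_k(runs)
    best = max(scores, key=scores.get)
    print(scores, "-> largest delta K at K =", best)
```

With these placeholder numbers ΔK peaks at K = 2 (425 versus 6 at K = 3). Note that ΔK cannot be computed for the smallest and largest K tested, and it inherits every assumption baked into the log likelihoods it is fed.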

This is one reason I was excited about Estimating the Number of Subpopulations (K) in Structured Populations:

A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favor of a particular value of K cannot usually be computed exactly, and instead programs such as Structure make use of heuristic estimators to approximate this quantity. We show—using simulated data sets small enough that the true evidence can be computed exactly—that these heuristics often fail to estimate the true evidence and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology we demonstrate the effectiveness of this approach, using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Finally, we test our solution in a reanalysis of a white-footed mouse data set. The TI methodology is implemented for models both with and without admixture in the software MavericK1.0.
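The core idea of thermodynamic integration can be shown on a model small enough that the evidence is known exactly, in the spirit of the paper’s small simulated data sets. A minimal sketch, not MavericK’s code: one binomial success probability θ with a uniform prior, for which the evidence of k successes in n trials is exactly 1/(n + 1). TI rewrites the log evidence as log Z = ∫₀¹ E_β[log L] dβ, where E_β is the expectation under the “power posterior” proportional to L(θ)^β times the prior. In practice each rung’s expectation would usually be estimated by MCMC; here it is computed on a θ grid, and the β schedule and grid sizes are arbitrary choices.

```python
import math


def binomial_log_lik(theta: float, k: int, n: int) -> float:
    """Log likelihood of k successes in n trials with success probability theta."""
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + k * math.log(theta) + (n - k) * math.log(1 - theta)


def power_posterior_mean(beta: float, log_liks: list[float]) -> float:
    """E[log L] under likelihood ** beta times a uniform prior, on an even theta grid."""
    top = max(beta * ll for ll in log_liks)  # subtract the max to avoid overflow
    weights = [math.exp(beta * ll - top) for ll in log_liks]
    return sum(w * ll for w, ll in zip(weights, log_liks)) / sum(weights)


def ti_log_evidence(k: int, n: int, steps: int = 64, grid: int = 4000) -> float:
    """Thermodynamic integration: log Z = integral over beta in [0, 1] of E_beta[log L]."""
    # Midpoint grid over (0, 1), so theta never equals 0 or 1.
    log_liks = [binomial_log_lik((i + 0.5) / grid, k, n) for i in range(grid)]
    # beta_i = (i / steps) ** 5 packs rungs near 0, where E_beta[log L] changes fastest.
    betas = [(i / steps) ** 5 for i in range(steps + 1)]
    values = [power_posterior_mean(b, log_liks) for b in betas]
    # Trapezoid rule over the unevenly spaced rungs.
    return sum(
        (betas[i + 1] - betas[i]) * (values[i] + values[i + 1]) / 2
        for i in range(steps)
    )


if __name__ == "__main__":
    k, n = 3, 10  # toy data: 3 successes in 10 trials
    exact = -math.log(n + 1)  # uniform prior: evidence is exactly 1 / (n + 1)
    print(f"TI estimate: {ti_log_evidence(k, n):.4f}, exact: {exact:.4f}")
```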

The website for MavericK 1.0 is informative if you don’t have academic access.

Unfortunately, and probably not surprisingly, this method is not scalable to genomic data sets. E.g., they’re looking at 10, 20, or 50 loci. A “modest” human genotyping array will provide you with tens of thousands of loci (SNPs). A “standard” array will provide you with on the order of 500,000 SNPs.

But the conclusion of the paper is worth keeping in mind:

Finally, it is important to keep in mind that when thinking about population structure, we should not place too much emphasis on any single value of K. The simple models used by programs such as Structure and MavericK are highly idealized cartoons of real life, and so we cannot expect the results of model-based inference to be a perfect reflection of true population structure (see discussion in Waples and Gaggiotti 2006). Thus, while TI can help ensure that our results are statistically valid conditional on a particular evolutionary model, it can do nothing to ensure that the evolutionary model is appropriate for the data. Similarly—in spite of the results in Table 2—we do not advocate using the model evidence (estimated by TI or any other method) as a way of choosing the single “best” value of K. The chief advantage of the evidence in this context is that it can be used to obtain the complete posterior distribution of K, which is far more informative than any single point estimate. For example, by averaging over the distribution of K, weighted by the evidence, we can obtain estimates of parameters of biological interest (such as the admixture parameter a) without conditioning on a single population structure. Although one value of K may be most likely a posteriori, in general a range of values will be plausible, and we should entertain all of these possibilities when drawing conclusions.
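Averaging over K rather than picking a single value comes down to normalizing the evidence into a posterior distribution. A minimal sketch, assuming a uniform prior over the K values considered; the log evidences and the per-K posterior means of an admixture-type parameter are hypothetical placeholders, of the kind TI or another estimator might supply.

```python
import math


def posterior_over_k(log_evidence: dict[int, float]) -> dict[int, float]:
    """P(K | data) under a uniform prior on K, computed from log model evidence.

    Subtracting the maximum (log-sum-exp) keeps very negative log evidences from underflowing.
    """
    top = max(log_evidence.values())
    unnorm = {k: math.exp(v - top) for k, v in log_evidence.items()}
    total = sum(unnorm.values())
    return {k: w / total for k, w in unnorm.items()}


def model_average(estimates: dict[int, float], posterior: dict[int, float]) -> float:
    """Weight each K's posterior-mean estimate by the posterior probability of that K."""
    return sum(posterior[k] * estimates[k] for k in estimates)


if __name__ == "__main__":
    # Hypothetical log evidences and per-K posterior means of a parameter.
    log_z = {2: -4210.0, 3: -4208.5, 4: -4209.0}
    alpha_hat = {2: 0.40, 3: 0.25, 4: 0.20}
    post = posterior_over_k(log_z)
    print({k: round(p, 3) for k, p in post.items()})
    print("model-averaged estimate:", round(model_average(alpha_hat, post), 3))
```

Because each K’s posterior mean is weighted by P(K | data), the result is the posterior mean of the parameter without conditioning on a single population structure; here no single K holds even 60% of the posterior mass.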


• Category: Science • Tags: K, Structure

Evolutionary theory famously predated the emergence of genetics by decades. Initially there was some conflict between the heirs of Charles Darwin and the first geneticists in terms of their mechanistic understanding of how the evolutionary process occurs. Within a few decades, though, genetics and evolutionary biology were synthesized, so that the former came to be integral to understanding the processes and parameters which shape the character of the latter (see The Genetical Theory of Natural Selection). E.g., imagine attempting to understand the origins and maintenance of sexual reproduction without any genetic understanding of the determination of sex and its implications for transmission.

But obviously genes are not everything when it comes to phenotypes. In particular with humans, there are complex behaviors and social interactions which seem to be persistent, and perhaps adaptive, which may not be directly contingent upon any simple genotype-phenotype map. This is not to say that cultural and behavioral traits have no genetic basis. To give an example, religion is a complex phenomenon which is both universal and does not seem directly encoded in one’s genes. The search for a “god gene” is futile, because religion as a phenotype is mediated by innumerable other phenotypes, which themselves have complex genetic bases.

Though culture is contingent upon genes, it exhibits a character which is separable from genetic evolution. In particular, dual inheritance theory explicitly acknowledges that human cultural variation over time and space is a function of the interaction between both cultural and genetic evolution. Though there are similarities between the two, and in fact the field of cultural evolution consciously utilizes much of the same formalism as population and quantitative genetics, the modes of inheritance, and the ways variation originates and persists, differ a great deal between the two.

As a rule of thumb you can posit that genetic evolution is relatively slow and torpid in relation to cultural evolution, which is protean and quicksilver. Consider that lactase persistence and high altitude adaptations are the two fastest examples of genetic adaptation we know of in humans, and they occur on 1,000 year time scales. A 1,000 year span takes you from Julius Caesar to Otto the Great. It takes you from the first of the Mycenaeans to the Athens of Pericles.
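To put the 1,000 year figure in generational terms: assuming roughly 25 to 30 years per human generation, 1,000 years is only about 33 to 40 generations. A minimal sketch of how long a favoured allele takes to rise from 1% to 50% under simple deterministic haploid selection with hypothetical selection coefficients; it ignores dominance and drift, so it is not a model of lactase persistence itself.

```python
def generations_to_reach(p0: float, target: float, s: float) -> int:
    """Generations for a favoured allele (fitness 1 + s) to rise from p0 to at least `target`.

    Deterministic haploid selection with no drift, mutation, or migration.
    """
    if s <= 0:
        raise ValueError("s must be positive or the allele never rises")
    p, gens = p0, 0
    while p < target:
        p = p * (1 + s) / (1 + p * s)
        gens += 1
    return gens


if __name__ == "__main__":
    years_per_generation = 25  # assumption; 25 to 30 years is a common range
    for s in (0.01, 0.05, 0.10):  # hypothetical selection coefficients
        g = generations_to_reach(0.01, 0.5, s)
        print(f"s = {s:.2f}: {g} generations, about {g * years_per_generation} years")
```

Under these assumptions even s = 0.10, which is strong selection, needs 49 generations, well over 1,000 years, to take an allele from 1% to 50%.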

The differences between culture and genes are important to keep in mind when one is making predictions. I’m a big fan of the Eric Kaufmann book, Shall the Religious Inherit the Earth?: Demography and Politics in the Twenty-First Century. The model outlined within the book, higher fertility for religious people, ergo, the reemergence of religion, is logically plausible. But I must always remind people that the same concerns were prevalent in France before 1850, with the arrival of more traditional Roman Catholics into a milieu which had notably secularized and undergone an early demographic transition. Why is France today not a uniformly Catholic republic? First, there is history: the migration of Muslims from North Africa. But even more important is cultural evolution, as the descendants of Spaniards, Poles, and Italians secularized.

There is, though, a difference between description and formal modeling. The field of cultural evolution attempts to do the latter. There are several lay and specialist introductions to the field (just click some of the book links and you’ll find them all). It’s worth attempting to grapple with the domain in a more systematic way, because that’s the only way you can make predictions which make sense of the diversity we see around us.

A new preprint is an interesting addition to the literature, Gene-culture co-inheritance of a behavioral trait:

Human behavioral traits are complex phenotypes that result from both genetic and cultural transmission. But different inheritance systems need not favor the same phenotypic outcome. What happens when there are conflicting selection forces in the two domains? To address this question, we derive a Price equation that incorporates both cultural and genetic inheritance of a phenotype where the effects of genes and culture are additive. We then use this equation to investigate whether a genetically maladaptive phenotype can evolve under dual transmission. We examine the special case of altruism using an illustrative model, and show that cultural selection can overcome genetic selection when the variance in culture is sufficiently high with respect to genes. Finally, we show how our basic result can be extended to nonadditive effects models. We discuss the implications of our results for understanding the evolution of maladaptive behaviors.

The most relevant section is probably 3.2 Model 2: Cultural prisoner’s dilemma. If you don’t know what the Price Equation is, read the original paper. It will induce some clarity.
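For readers meeting the Price equation for the first time: with parents indexed by i, fitness w_i (number of offspring), trait value z_i, and mean offspring trait z′_i, the change in the mean trait splits into a selection term Cov(w, z)/w̄ and a transmission term E(w Δz)/w̄. A minimal sketch that checks this identity on a hypothetical toy population; it is the textbook single-channel form, not the preprint’s version with separate cultural and genetic inheritance.

```python
def price_terms(w: list[float], z: list[float], z_off: list[float]) -> tuple[float, float]:
    """Split the change in mean trait into (selection, transmission) terms.

    w[i]: fitness of parent i (here, its number of offspring)
    z[i]: trait value of parent i
    z_off[i]: mean trait among parent i's offspring (multiplied by 0 when w[i] == 0)
    Means and the covariance are unweighted averages over parents.
    """
    n = len(w)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    cov_wz = sum(wi * zi for wi, zi in zip(w, z)) / n - w_bar * z_bar
    e_w_dz = sum(wi * (zo - zi) for wi, zi, zo in zip(w, z, z_off)) / n
    return cov_wz / w_bar, e_w_dz / w_bar


def mean_change_direct(w: list[float], z: list[float], z_off: list[float]) -> float:
    """Offspring mean trait (parents weighted by offspring count) minus the parental mean."""
    return sum(wi * zo for wi, zo in zip(w, z_off)) / sum(w) - sum(z) / len(z)


if __name__ == "__main__":
    # Hypothetical toy population of four parents.
    w = [0, 1, 2, 3]
    z = [0.1, 0.4, 0.6, 0.9]
    z_off = [0.0, 0.5, 0.5, 0.8]  # parent 0 has no offspring, so its entry is irrelevant
    selection, transmission = price_terms(w, z, z_off)
    print(f"selection {selection:.4f} + transmission {transmission:.4f}"
          f" = {selection + transmission:.4f}")
    print(f"direct change in mean: {mean_change_direct(w, z, z_off):.4f}")
```

In this toy case selection raises the mean by about 0.217 while imperfect transmission lowers it by about 0.067; the two terms sum to the directly computed change of 0.15. Opposing terms like these are the kind of conflict the preprint models between inheritance systems.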

The fact that more variance in culture in relation to genes allows for selection to act more powerfully on culture, and arguably in a maladaptive manner from the gene-centric perspective, is no surprise. This preprint adds more precision and clarity. For adaptation to occur there needs to be heritable variation. One reason that cultural group selection is more plausible than genetic group selection is that genetic variation across demes is often very low. The Fst between racial groups may be 0.10 to 0.30, but it is not very common for such Fst values to be realized between two groups genuinely in competition. More often neighboring populations have much lower Fst values, though ancient DNA is suggesting that 0.05 to 0.10 values were maintained in some areas 5 to 10 thousand years ago. A simple population genetic rule of thumb is that one needs to have less than one migrant between two populations per generation for their genetic variation to increase, rather than decrease. In other words, minimal gene flow on a general scale quickly reduces between group genetic variance.
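The “one migrant per generation” rule of thumb comes from Wright’s island model, where equilibrium Fst is approximately 1/(1 + 4Nm), with N the deme size and m the migration rate, so Nm is the number of migrants per generation. A minimal sketch of that approximation, which assumes an idealized island model at equilibrium with no selection or mutation:

```python
def island_model_fst(nm: float) -> float:
    """Approximate equilibrium Fst under Wright's island model: 1 / (1 + 4Nm)."""
    return 1.0 / (1.0 + 4.0 * nm)


def migrants_for_fst(fst: float) -> float:
    """Invert the approximation: the Nm that yields a given equilibrium Fst."""
    return (1.0 / fst - 1.0) / 4.0


if __name__ == "__main__":
    for nm in (0.1, 0.5, 1, 2, 10):
        print(f"Nm = {nm:>4}: Fst ~ {island_model_fst(nm):.3f}")
    # Fst values of the magnitude discussed above, converted to migrants per generation.
    for fst in (0.05, 0.10, 0.30):
        print(f"Fst = {fst:.2f}: Nm ~ {migrants_for_fst(fst):.2f}")
```

Under this approximation Nm = 1 gives Fst = 0.2, and an Fst of 0.05 to 0.10 corresponds to only about 2 to 5 migrants per generation.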

In contrast, cultural variation can be maintained because migrants can switch cultures, or their genetic progeny can adopt the culture of one of the parents in totality. In this way the later Ottoman Sultans and Umayyad rulers of Al-Andalus had been genetically transformed by generations of mixing with concubines derived from Europeans or Caucasians (i.e., those from the Caucasus), while remaining culturally very Turkish and Arab respectively.

As noted in the preprint, this formal/theoretical avenue of research will allow for the development of a robust empirical research program. The data is out there.

• Category: Science • Tags: Cultural Evolution, Genetics

Recently Daniel Falush’s group came out with a preprint, A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. If you read the science posts on this weblog (basically, if you read this weblog), and you haven’t read it, read it now.

At his weblog, Paint My Chromosomes, Falush has talked about both the production of the preprint (I had a minor stimulatory role), and the attempt to get it published somewhere. This reaction is strange to me:

We also had our first journal rejection, from eLife. It has not been my habit to live-tweet journal rejections and am not intending to start now. I am a journal editor myself and do not think the process would benefit from being turned into a public performance. I was disappointed because eLife claims to hold itself to higher standards, trying to change publication by judging papers on their true worth rather than on simple measures of impact and also because the reason given was silly:

“..but feel that the target audience is a rather specialised one.”

Of course I’m biased. But this strikes me as crazy. The third most cited paper in the history of the journal Genetics is Jonathan Pritchard’s Inference of Population Structure Using Multilocus Genotype Data. Take a look at the list, and note the papers that it is more cited than (e.g., a Sewall Wright paper from 1931, and Tajima’s 1989 paper!).

To be sure, the number of times that a paper is cited is not a good measure of how often it is read and understood. And that’s kind of the point of Falush’s preprint, to actually give some guidance to people who use model based clustering in a turnkey fashion without any deep comprehension of its limitations and biases. The nuts & bolts of the inferences of population structure may be specialized, but analysis of structure is a routine part of many different types of papers, in particular in medical genetics where variants may have different effects in different genetic backgrounds.

• Category: Science • Tags: Structure

Probably the most incredible science story of the week, Eye lens radiocarbon reveals centuries of longevity in the Greenland shark (Somniosus microcephalus):

The Greenland shark (Somniosus microcephalus), an iconic species of the Arctic Seas, grows slowly and reaches >500 centimeters (cm) in total length, suggesting a life span well beyond those of other vertebrates. Radiocarbon dating of eye lens nuclei from 28 female Greenland sharks (81 to 502 cm in total length) revealed a life span of at least 272 years. Only the smallest sharks (220 cm or less) showed signs of the radiocarbon bomb pulse, a time marker of the early 1960s. The age ranges of prebomb sharks (reported as midpoint and extent of the 95.4% probability range) revealed the age at sexual maturity to be at least 156 ± 22 years, and the largest animal (502 cm) to be 392 ± 120 years old. Our results show that the Greenland shark is the longest-lived vertebrate known, and they raise concerns about species conservation.

Elisabeth Pennisi has a nice write-up, Greenland shark may live 400 years, smashing longevity record:

…Using this technique, the researchers concluded that two of their sharks—both less than 2.2 meters long—were born after the 1960s. One other small shark was born right around 1963.

The team used these well-dated sharks as starting points for a growth curve that could estimate the ages of the other sharks based on their sizes. To do this, they started with the fact that newborn Greenland sharks are 42 centimeters long. They also relied on a technique researchers have long used to calculate the ages of sediments—say in an archaeological dig—based on both their radiocarbon dates and how far below the surface they happen to be. In this case, researchers correlated radiocarbon dates with shark length to calculate the age of their sharks. The oldest was 392 plus or minus 120 years, they report today in Science. That makes Greenland sharks the longest lived vertebrates on record by a huge margin; the next oldest is the bowhead whale, at 211 years old. And given the size of most pregnant females—close to 4 meters—they are at least 150 years old before they have young, the group estimates.

• Category: Science • Tags: Greenland Shark, Science

A follow-up on the Ancient Archaic Admixture Into the Andamanese story, No evidence for unknown archaic ancestry in South Asia:

Genomic studies have documented a contribution of archaic Neanderthals and Denisovans to non-Africans. Recently, Mondal et al. 2016 (Nature Genetics, doi:10.1038/ng.3621) published a major dataset–the largest whole genome sequencing study of diverse South Asians to date–including 60 mainland groups and 10 indigenous Andamanese. They reported analyses claiming that nearly all South Asians harbor ancestry from an unknown archaic human population that is neither Neanderthal nor Denisovan. However, the statistics cited in support of this conclusion do not replicate in other data sets, and in fact contradict the conclusion.

Last I heard they hadn’t released the BAM files. Mistakes are made, that’s how science is done, and other people help in the process of correction. But it is starting to worry me to see papers with bioinformatic errors being published in high impact journals.

• Category: Science • Tags: Genomics

Sorry about the light posting. I’ll get back into gear in a few days. Very busy professionally and personally the past week or so.

I’ve been getting into writing Python code, as opposed to reading it. It’s a different beast altogether, obviously. I’m a lot slower than I would be in Perl, but I’m getting stuff done, so that’s something. I would highly recommend Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, if you have a background in R and another scripting language.

I went to my high school reunion. It was fun and interesting. Apparently people change in a few decades…

• Category: Miscellaneous • Tags: Open Thread
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"