|Year||Google Scholar hits|
The above talk is from Alice Dreger, author of Galileo’s Middle Finger: Heretics, Activists, and One Scholar’s Search for Justice. I don’t know Dreger personally, but she seems like a brave and courageous person. In the broadest strokes there’s very little where we disagree. Yes, our politics, and many of our specific beliefs, diverge, but we generally at least hold to the ideal of truth.
There is one section of her talk where Dreger waxes eloquent about the Enlightenment and freedom of thought, which caught my attention. We have always missed the mark, but there was a point in Western intellectual culture when freedom of thought and striving toward truth were at least held up as the paramount method and goal. I am not so sure that is the case today.
When Dreger pointed approvingly on Twitter to the University of Chicago’s statement on “safe spaces,” I told her that most of my liberal Twitter follows were enthusiastically sharing this piece: UChicago’s anti-safe spaces letter isn’t about academic freedom. It’s about power. The piece makes some coherent points, but mostly it is self-congratulatory intellectual masturbation. At a certain point the cultural Left no longer made any pretense to being liberal, and transformed themselves into “progressives.” They have taken Marcuse’s thesis in Repressive Tolerance to heart.
Though I hope that Dreger and her fellow travelers succeed in rolling back the clock, I suspect that the battle here is lost. She points out, correctly, that the total politicization of academia will destroy its existence as a producer of truth in any independent and objective manner. More concretely, she suggests it is likely that conservatives will simply start to defund and direct higher education even more stridently than they do now, because they will correctly see higher education as purely a tool toward the politics of their antagonists. I happen to be a conservative, and one who is pessimistic about the persistence of a public liberal space for ideas that offend. If progressives give up on liberalism of ideas, and it seems that many are (the most famous defenders of the old ideals are people from earlier generations, such as Nadine Strossen and Wendy Kaminer, with Dreger being a young example), I can’t see those of us in the broadly libertarian wing of conservatism making the last stand alone.
Honestly, I don’t want any of my children learning “liberal arts” from the high priests of the post-colonial cult. In the near future the last resistance on the Left to the ascendancy of identity politics will probably be extinguished, as the old guard retires and dies naturally. The battle will be lost. Conservatives who value learning, and intellectual discourse, need to regroup. Currently there is a populist mood in conservatism that has been cresting for a generation. But the wave of identity politics is likely to swallow the campus Left with its intellectual nihilism. Instead of expanding outward it is almost certain that academia will start cannibalizing itself in internecine conflict when all the old enemies have been vanquished.
Let the private universities, such as Oberlin, wallow in their identity politics contradictions. Dreger already points to the path we will probably have to take: gut the public universities even more than we have. Leave STEM and some professional schools intact, and transform them for all practical purposes into technical universities. All the other disciplines? Some private universities, the playgrounds of the rich and successful, will continue to be traditionalist in maintaining “liberal arts,” which properly parrot the latest post-colonial cant. But much learning will be privatized, and knowledge will spread through segregated “safe spaces.” Those of us who read and think will continue to read and think, like we always have. We just won’t have institutional backing, because there’s not going to be a societal consensus for such support.
I hope I’m wrong.
Since there was some discussion about East Asian genetic structure below…I pulled about 20 South Koreans I have in my data. I merged them with Han and Japanese from the HGDP. I then ran a PCA and plotted it, and also ran unsupervised ADMIXTURE, and plotted it.
The results are below.
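For readers who want to see what the PCA step amounts to, here is a minimal sketch in Python. It does not use my actual Korean, Han, or Japanese genotypes. The data are simulated, and every name and parameter in it (`simulate_genotypes`, `drift_sd`, the sample sizes) is an assumption made up for illustration. Real analyses would typically use dedicated tools on quality-filtered, LD-pruned SNPs.

```python
import numpy as np


def simulate_genotypes(n_per_pop=20, n_snps=2000, n_pops=3, drift_sd=0.1, seed=0):
    """Simulate 0/1/2 genotypes for a few hypothetical populations.

    Each population's allele frequencies are a shared ancestral frequency
    plus independent noise; this is a toy stand-in for real drift.
    """
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.2, 0.8, size=n_snps)
    genotypes, labels = [], []
    for pop in range(n_pops):
        freqs = np.clip(ancestral + rng.normal(0, drift_sd, size=n_snps), 0.05, 0.95)
        genotypes.append(rng.binomial(2, freqs, size=(n_per_pop, n_snps)))
        labels += [pop] * n_per_pop
    return np.vstack(genotypes).astype(float), np.array(labels)


def pca(genotypes, n_components=2):
    """Project individuals onto the top principal components via SVD."""
    centered = genotypes - genotypes.mean(axis=0)  # center each SNP column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    pcs = u[:, :n_components] * s[:n_components]   # individual coordinates
    explained = (s ** 2) / np.sum(s ** 2)          # fraction of variance per PC
    return pcs, explained[:n_components]


genotypes, labels = simulate_genotypes()
pcs, explained = pca(genotypes)
for pop in np.unique(labels):
    print(pop, pcs[labels == pop].mean(axis=0).round(2))
```

Plotting `pcs[:, 0]` against `pcs[:, 1]`, colored by `labels`, gives the usual scatter. The simulated populations here are far more differentiated than Koreans, Japanese, and Han are in reality, so the clusters separate much more cleanly than real data would.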
About thirteen years ago I expressed the opinion that an understanding of population structure will become a matter of intellectual curiosity once we have a better understanding of the genetic basis of characteristics. A friend, who was a statistical geneticist, told me that this was unlikely. We were unlikely to capture the ability to predict all outcomes well enough on even highly heritable complex traits to simply discard population structure information. Some of this is not due to genetics; different populations may expose themselves to different environmental conditions. For example, it would be useful to know which individuals in the CEU white European American data set are practicing Mormons, and which are not, because Mormonism tends to result in a lot of behavior modification.
But some of the concern about population structure has to do with the fact that genetic background matters, and we are unlikely to ever have total omniscience as to the nature of genetic interactions and dependencies. By this, I mean that if we have a strong causal signal which associates disease risk with a genetic variant, that risk is still conditional on dependencies of other genetic variations across the genome. Those variations are the outcome of demographic histories, which one can “control” for to some extent by accounting for population structure. In more plain language, a signal that predicts an outcome in Norwegians may not predict the same outcome in Nigerians. This may be due to different frequencies of other variants which are not directly causal, but interact with the causal signals, and which vary between populations.
More recently I’ve been a bit sanguine. I don’t follow the literature closely, but papers like High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants, make me wonder if the genetic background concerns weren’t over-wrought.
A new preprint, Population genetic history and polygenic risk biases in 1000 Genomes populations, suggests we should be worried. Or, more precisely, we should be cognizant of the limitations genetic background imposes upon us for certain classes of variants and disease. In particular, rare variants are going to be less portable across populations because they tend to have emerged recently, after populations diverged. So, if you have a low frequency major effect causal variant in Europeans, there is a much lower likelihood that it is present in other populations.
The histogram above illustrates an excellent case study from the preprint. The genetic architecture of height and its genomic basis has been best elucidated for Europeans. We know, for example, many of the loci which distinguish Northern and Southern Europeans, and we know that selection has resulted in divergence between the two populations over the past 5,000 years. But as you can see the predicted heights seem to simply follow genetic distance from Europeans. SAS = South Asians, while AMR = a mixed cohort of populations from the Americas. EAS and AFR are East Asians and Africans. In reality, Africans are nearly as tall as Europeans (taller or shorter depending upon the reference European population), and taller than East Asians. The predictions here are off because the causal variants inferred from the studies of European cohorts are portable in direct proportion to shared demographic history. South Asians share a relatively ancient demographic history with Europeans, while many mixed groups from the Americas have Europeans as one of their recent founding populations. But in both cases the causal variants were likely segregating in the ancestral populations before divergence, so there is no major difference in the consequence.
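To make the mechanics concrete, here is a toy sketch of how a polygenic score is computed, and how the average predicted value shifts between populations purely because allele frequencies differ. This is not the preprint's analysis. The effect sizes and frequencies below are invented numbers, and `polygenic_score` and `expected_score` are hypothetical helper names.

```python
import numpy as np


def polygenic_score(dosages, betas):
    """Score for one individual: sum over SNPs of effect size times allele count (0/1/2)."""
    return float(np.asarray(dosages, dtype=float) @ np.asarray(betas, dtype=float))


def expected_score(freqs, betas):
    """Population mean score, since the expected dosage at a SNP is 2 * allele frequency."""
    return float(np.sum(2 * np.asarray(freqs, dtype=float) * np.asarray(betas, dtype=float)))


# Invented effect sizes, imagined as estimated in a single discovery population.
betas = [0.5, -0.3, 0.2]

# Invented allele frequencies in the discovery population and in a diverged one.
freqs_discovery = [0.5, 0.5, 0.5]
freqs_diverged = [0.3, 0.6, 0.4]

print(polygenic_score([2, 0, 1], betas))
print(expected_score(freqs_discovery, betas))
print(expected_score(freqs_diverged, betas))
```

The two population means differ even though nothing in the sketch says anything about the true trait in the second population. If the weights only tag causal variation well in the discovery population, a shift like this can track drift rather than real differences in the trait.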
The preprint has a lot more than just a reanalysis of GWAS. Using local ancestry deconvolution methods they show how one can infer history from patterns of genetic variation (though as always, this should not be taken as gospel, as there are biases in the methods currently used). The major take home is simple: population structure is real, and, it has real consequences functionally.
About 2/3 of the way through The Ocean of Churn: How the Indian Ocean Shaped Human History by Sanjeev Sanyal. It’s a wide-ranging book which synthesizes diverse disciplinary threads. The big over-arching thesis seems to be that movement of peoples and ideas was far less unidirectional than we often tend to think and are told. Probably one of the major examples of this, and one which I think has misled many people, is the idea that migration out of Africa can be defined purely as unidirectional migration in a series of stepwise events.
That being said, there are the usual problems that occur when you synthesize diverse disciplines. Since I know a fair amount about the intersection of genetics and history I can say with great confidence that some of the genetics in the book is now outdated. The reason is that it relies on work that was published ~5 years ago. Also, there is the unfortunate reality that sometimes high-impact journals publish works that are almost certainly wrong. For example, Sanyal cites Genome-wide data substantiate Holocene gene flow from India to Australia. This paper is interesting, but it was clear to many that it was probably wrong almost immediately upon publication.
Longer review when I have time.
— Razib Khan (@razibkhan) August 27, 2016
I need to read the paper closely. But the demographic-historical implications of this are pretty straightforward. (it’s open access)
G.E., the 124-Year-Old Software Start-Up. The story is interesting to me mostly because it illustrates how contingent modern civilization is. There are so many people doing so many specialized things that we take for granted.
Forget “Earth-Like”—We’ll First Find Aliens on Eyeball Planets. M. J. Engh’s Wheel of Winds was set on one of these planets.
Ohana is a suite of software for analyzing population structure and admixture history using unsupervised learning methods. We construct statistical models to infer individual clustering from which we identify outliers for selection analyses.
It may be better than ADMIXTURE, but we’re reaching a point where “good-enough” tools are achieving “lock-in.”
Down in the valley, up on the ridge. On Melungeons.
The crescent and the globe. I wrote this.
Update: In light of further comments I may have been wrong about Hong’s recent admixture! See the comments below (also, further discussion with Spencer Wells offline). I don’t have total clarity on what’s going on, because I’m sure my friends weren’t lying…but they were also early adopters, and the methods may have changed. And, I do think 23andMe has the talent and methods to resolve Korean ancestry, so it’s a matter of investment, not data.
All that being said, all individuals should pull down the raw data and do a reanalysis.
Quartz has an article up, 23andMe has a problem when it comes to ancestry reports for people of color, which I want to comment on at length. Though literally taken the title is not something I’d disagree with too much, the tone and details I have serious issues with.
First, some disclosure. Hong talked to me on the phone for an hour about this story. Mostly we talked about her Korean ancestry results. More on that later. Second, I consulted for 2.5 years for Family Tree DNA, am friends with Spencer Wells (who is quoted), and am on friendly terms (I’d like to think!) with Joanna Mountain, and quite respect many of the scientists at 23andMe (e.g., Kaisa Bryc and Ivan Juric off the top of my head).
I will go through the article point by point. First:
I doubt that most 23andMe users realize how paltry the company’s data is for non-Caucasians. For example: The data set that 23andMe used to generate my report has 76 Koreans in it, according to Dr. Joanna Mountain, the company’s senior director of research. 76 Koreans. It is estimated there are at least 7 million Koreans living outside of the Korean peninsula—including 1.7 million in the US—among a worldwide population of 83 million.
Seventy-six Koreans seemed small to me, but what do I know? I’m just a journalist. So I spoke to geneticist Spencer Wells, founder and former director of National Geographic’s Genographic Project (arguably a 23andMe competitor), which he ran from 2005-2015. “ is a really low number,” he concurred.
The small sample sizes seem really, really problematic if you are a lay person, or a journalist. The issue is that with genotype technology that looks for common polymorphisms you really don’t get that much more information from 1,000 individuals than you do from 100. All things equal, more sample size is better, but the gap between 10 and 100 is much, much greater than the gap between 100 and 1,000 or between 100 and 10,000. You can see this in the robustness of results for model-based clustering conditional on different sample sizes. For a homogeneous population like the peoples of the Korean peninsula, who seem relatively panmictic, a bigger sample size would have only a marginal effect on the overall outcomes using these methods (though it might matter if you were looking at low-frequency alleles from whole genome sequencing).
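One way to see the diminishing returns is through the sampling error on an allele frequency estimate. The sketch below assumes simple binomial sampling of 2n chromosomes from a randomly mating population, which is a textbook simplification and not a description of 23andMe's actual pipeline. The function name `allele_freq_se` is mine.

```python
import math


def allele_freq_se(p, n_individuals):
    """Standard error of an allele frequency estimated from n diploid individuals.

    Assumes binomial sampling of 2n chromosomes: sqrt(p * (1 - p) / (2n)).
    """
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


# A common variant at 50% frequency, at increasing reference panel sizes.
for n in (10, 100, 1000, 10000):
    print(n, round(allele_freq_se(0.5, n), 4))
```

Each tenfold increase in sample size shrinks the error by the same factor, the square root of 10, so the absolute gain from going 10 to 100 is much larger than the gain from 100 to 1,000. For a rare variant the picture is different, since a small panel may not contain the allele at all.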
Before I talked to Hong I checked in with a friend who was half north Korean (in that her father’s family was from the northern half of the peninsula and migrated south) and half central Korean (i.e., her mother’s family was from around Seoul). Just like her husband, whose family was from Busan in the far south, her results came back as 99% Korean. Some genetic research has been done on Koreans, and there just isn’t that much structure. The Koreans have a composite origin if you go far back enough, but they’ve been intermarrying with each other a long time.
Also, astonishingly, the report shows that I am 13.4% Japanese and 14% Chinese—and only 61.6% Korean. I was looking forward to watching my parents freak out. My sister texted me, “Oh [Dad will] probably blame Mom.”
To my disappointment, my parents did not freak out, nor did they get into an amusing argument about which of their ancestors was the ho. Because they simply did not believe the data. And, for once, they were right.
The public relies on journalists for the truth. Sometimes the truth can be slippery. But sometimes it is clear. Most of the conversation between Hong and myself was about her Korean ancestry. As I said to her, I asked a handful of my Korean friends about their 23andMe results before we spoke. From that I told Hong I was 99% sure that she had recent non-Korean ancestry. 23andMe’s results are really robust. I tried to emphasize that over and over. Hong can believe what she wants, but it is obvious that she almost certainly has non-Korean ancestry relatively recently in the past.
Because 23andMe uses chromosome painting, you can see she has very long segments of inferred Chinese and Japanese ancestry. This non-Korean ancestry is probably from within the last three generations because ancestry tract lengths indicate that recombination hasn’t broken apart the associations across the chromosomes (there are 20-40 recombination events across the genome per generation).
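The tract-length logic can be put into rough numbers. The sketch below uses two back-of-the-envelope approximations that are my assumptions here, not 23andMe's method: a single ancestor g generations back contributes an expected fraction of 0.5^g of the genome, and segments inherited from that ancestor have a mean length of roughly 100/g centimorgans. Both helper names are hypothetical.

```python
def expected_ancestry_fraction(generations):
    """Expected genome fraction from one ancestor `generations` back (parent = 1)."""
    return 0.5 ** generations


def approx_mean_tract_cm(generations):
    """Rough mean length, in centimorgans, of segments from an ancestor
    `generations` back. Rule-of-thumb approximation: about 100 / g cM."""
    return 100.0 / generations


for g in (1, 2, 3, 5, 10, 20):
    print(g, expected_ancestry_fraction(g), round(approx_mean_tract_cm(g), 1))
```

Under these approximations, long unbroken segments point to recent admixture, while ancestry from many generations back shows up as many short fragments. Actual tract lengths vary a great deal around these expectations, so this is a guide to the reasoning, not a dating method.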
I asked Wells whether my percentage breakdowns of Korean, Chinese, and Japanese meant anything. “Yes,” he said, “but I think it is misleading to go to a decimal place or even to go out two digits.” Wells said that another problem with the data is that “Most of those [samples] are from the US. They’re not terribly useful for studies of indigenous composition—which is effectively what this analysis is trying to do.”
I had a long text conversation with Spencer on this after the article came out. I can see where he’s coming from. And 23andMe does have a shortfall of indigenous and non-European samples. But as I said, I asked around to Korean friends who had used 23andMe before and the population is pretty homogeneous, and the friends’ results I cited above were representative. I have also worked with and seen samples from Family Tree DNA, and it’s the same story. There might be undersampled populations from Korea, but I’d bet against it. Koreans are relatively homogeneous, with a position between Japanese and North Chinese. Where you would expect them to be.
Spencer is correct about the decimal places issue. They give people a false impression of precision. I do know that scientists within DTC companies struggle against it. But scientists don’t always win these arguments.
I also interviewed Harvard geneticist Robert Green, who made the important point that private companies have different methods and standards from those of an academic lab. “There is a difference between analysis you can do with hundreds of [genetic] markers at a research level, and the kind of analysis that even the best companies can do, which is more an approximation,” he said.
Green is a medical geneticist who does great work. But I’ll be generous and assume he’s taken totally out of context here, because what he says makes no sense. The genotyping platforms do have error rates (no-calls, mistypings, etc.) on the order of 1%. But they’re using hundreds of thousands of SNPs. This error rate doesn’t matter too much for what 23andMe is doing in relation to ancestry. And with population structure inference these errors usually don’t cause a major issue if they aren’t systematic.
Then there’s this:
A few of the geneticists I interviewed for this article (but not Green or Wells) outright accused 23andMe of commercially driven ethnic bias. For example, no distinction is made between northern and southern Chinese, who have very different traits. This was a serious allegation, so I put the question directly before 23andMe’s Mountain. “As a scientist, I find that insulting,” she said in a phone interview.
I brought up the issue with the Chinese to Hong, and I apologize to Mountain here if it came off as offensive, because I certainly didn’t mean it that way. My point, which I’ve brought up for years both in public, and when I have consulted for DTC companies, is that South and East Asians are huge groups, and it’s incongruous that they aren’t differentiated as much as the Europeans. These tests basically tell you that you are South Asian, or Chinese, or Korean, or Japanese. In the case of Koreans and Japanese there isn’t that much structure within these groups, but that is not the case with the Han Chinese. There is a decent amount of structure, but last I checked 23andMe has a catchall Han Chinese group. Why? I’ll get to that later. (It’s not because they don’t have the data.)
Though I disagree with the tone and the emphasis, a simple inspection by Hong has shed light on something that has been glaringly obvious in the genetic genealogy community: there is laser-like focus on differentiating very close Northern European groups, such as Irish and English, and not so much emphasis on differentiating diverse populations such as South Asians. This was one thing I did talk to Hong about at length. I don’t think it’s crass racism, and I think that I made that clear to her, but I’m not happy with the situation either (23andMe representatives know I’m not happy, and have talked to me about it at ASHG).
The final sections involve Hong reviewing the disparities in sample representation. As I said above, some of this is overdone. But it is a little ridiculous that there are only a few hundred African population samples in their data. Granted, it turns out that between-population genetic distance in Africa is actually not as much as you’d think based on aggregate variation (the within population variation is what makes all the news). I think Hong is correct that 23andMe should have made more effort on sample collection these past few years…but I’m not CEO of 23andMe, and Joanna Mountain and her scientists don’t call all the shots. I think Hong’s piece leaves Mountain and the researchers holding the bag for something that really isn’t their doing (perhaps it is, but I’m really skeptical of that).
Could the company be doing a better job with collecting ethnographic data? “Absolutely they could,” Wells said, “but it’s not their raison d’être.” Which, of course, is pharma and health research. Fair enough—it’s their money. But how about a disclaimer attached to the ancestry part of the report? Like, “for entertainment purposes only?” Because data based on 76 Koreans (or any other ethnic group) is definitely not worth potentially causing family discord or a blood feud. I don’t know whether the company understands the realities of deadly global ethnic tensions and the potential damage created by people’s trust in these reports.
I think Spencer has highlighted the major dynamic here: 23andMe is pivoting towards biomedical research. It has a database of north of a million, mostly European-origin individuals. The real money now comes from leveraging the database to collect information on health, and combining it with the genotypes they already have. On the margin, getting greater population diversity is probably not a major avenue by which they could gain higher valuations. And getting from one million to ten million genotypes is nothing without increasing their database of phenotypes.
The real story here is not one of racism. It’s one of capitalism. Most of 23andMe’s customers are white European in ancestry, and a disproportionate number of those are Northern European. Is it a surprise that their tools break down Northern European ancestry so finely? That’s their customer base.
Second, many Asians I’ve talked to are relatively uninterested in fine-grained breakdowns in their ancestry. For several years I worked with an engineer from Fujian, and his Family Tree DNA results showed that he was shifted toward the southern end of the north-south Chinese cline. He didn’t care at all, because he was from Fujian, so of course he knew this. Many Asians seem to have this attitude where the ancestry results are viewed as confirmatory. Hong’s case, where there was a surprise, is exceptional.
If 23andMe wanted to, they could easily break down Asians into further subcomponents. I think there are two reasons they don’t want to, aside from the firm’s recent focus on health and pharma. First, they don’t have that many Asian customers. Second, their Asian customers might actually get a bit irritated!
Ultimately, Hong can think whatever she wants to about her 23andMe results. But the data are out there. It’s pretty obvious that unless there was a sample mix-up, she has recent Chinese and Japanese ancestry (she could put the raw results in the public domain and have people cross-check with other methods, like PCA; I’m pretty sure they would confirm the 23andMe results).
On a last nerdy note: the data generated by DTC companies is great. Their Illumina SNP-chips are really good, with 99% or so correct-call rates. Hong referred to data in the piece when she really meant results. The thing is that results are basically generated through a sieve of methods geared toward human digestibility. 23andMe and other DTC companies differ because of different methods and parameters in those methods, that are determined by what humans want out of these techniques. But the data, that’s pretty straightforward and robust.
If you are interested in a more philosophical take, see Joe Pickrell’s What is ancestry?
Addendum: My conversation with Hong was very wide-ranging. We talked about EDAR, random mating populations, and local ancestry deconvolution. Well, perhaps not in those words. It’s a little saddening to me that ultimately what came out of all that is a piece which tries to paint 23andMe as prejudiced against minorities. The only prejudice they exhibit as a firm is against smaller market share.
No matter the Yelp reviews, if it doesn’t have dry pot or whole boiled fish on the menu, not worth it. Also, should feature something where the peppercorn is salient.
What a disappointment. Salty. Without much other flavor besides the spice. It was like a watery spin on Louisiana hot sauce. I couldn’t taste the “aromatic spices” and “fresh herbs.” And don’t tell me it is because it’s too spicy, I didn’t find it too spicy. I did find it very salty though.
In 2011 I was having dinner with an old friend who was an engineer at Intel. He also has a Ph.D. from MIT. Smart guy. But when I mentioned casually offhand that we were all a few percent Neanderthal (outside of Africa), he was surprised. I was a bit shocked, as I explained that this was a huge science story. The Neanderthal genome had been published the previous year. How could my friend not have known?
He was totally unembarrassed, and told me I overestimated how closely the public followed genetics and paleontology. I’m sure he was right. But it’s hard to remember sometimes.
We’ve gone further beyond where we were in 2010. We now have a really good grasp of a lot of population dynamics in Eurasia over the past 20,000 years. Probably the best place to start is with this preprint, The genetic structure of the world’s first farmers. But the general outlines were already evident a few years back in Toward a new history and geography of human genes informed by ancient DNA.
Most of the world’s population seems to descend from a mixing of a set of groups which 10,000 years ago were distinct. How distinct? We’re talking about Fst values on the order of 0.10, which means that in a pairwise comparison ~10% of the genetic variation is partitioned between the two populations, with the rest found within them. That’s about what you see between Europeans and Chinese today. Some of the Fst values were a bit higher, some lower, but 0.10 seems about right.
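For anyone who wants to see where a number like 0.10 comes from, here is a minimal sketch of a pairwise Fst calculation from allele frequencies. It uses the simple heterozygosity-based definition, Fst = (H_T - H_S) / H_T, with two equally weighted populations and a ratio of sums across SNPs. Published estimates use estimators that correct for sample size, so treat this as an illustration only. The function name `pairwise_fst` and the example frequencies are made up.

```python
import numpy as np


def pairwise_fst(p1, p2):
    """Fst between two populations from per-SNP allele frequencies.

    H_T is the expected heterozygosity of the pooled population, H_S the
    average expected heterozygosity within the two populations.
    Undefined if every SNP is fixed for the same allele in both populations.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return float(np.sum(h_total - h_within) / np.sum(h_total))


print(pairwise_fst(0.2, 0.8))                # strongly differentiated single SNP
print(pairwise_fst([0.3, 0.5], [0.3, 0.5]))  # identical frequencies
print(pairwise_fst(0.34, 0.66))              # close to the 0.10 range
```

In this toy setup it takes a frequency difference of roughly 0.3 at a SNP to reach an Fst near 0.10, which gives some feel for what that value means across the genome.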
To make it easy for some of you, I’ve labeled and placed the approximate locations of ancestral groups to modern Northern Europeans ~10,000 years ago. What I’m trying to represent is a map which shows the modal regions of distribution of ancestors that Northern Europeans today had 10,000 years ago. So, for example, since ~15% of the ancestry of Northern Europeans is “Ancient North Eurasian” (ANE), a lot of ancestors of Northern Europeans alive today would be living somewhere in the broad expanse of Central Eurasia (now, because of various demographic events the number of ANE was probably lower than farmers, perhaps lower than the 15% contribution to the modern genomes).
A substantial proportion of the ancestry of Northern Europeans is “European hunter-gatherer,” dating to the Pleistocene. But here’s the kicker: most of that ancestry dates to after the LGM, to ~15,000 years ago. The really deep Pleistocene ancestry in Europe is only found at very low levels now.
The final issue is that a lot of the phenotypes that we racially code are recent. This probably explains why groups like the Kalash and Nuristanis can look more like Europeans than South Asians, but they’re genetically more like South Asians.
What does any of this have to do with non-scientific things? I don’t really know. My interest in population structure is intellectual, not personal. But a certain type of person should probably stop talking about how white people have been in Europe for 40,000 years. First, the ancestors of modern Europeans 40,000 years ago were almost all residing outside of Europe, an assertion that holds until 15,000 years ago. And most would still be resident outside of Europe 8,000 years ago, depending on how you count/calculate.* And, perhaps more importantly, the typical phenotype of Northern Europeans probably really coalesced only ~5,000 years ago.
* Definitely true for Southern Europeans, but conditional on Northern Europeans depending on where you draw Europe’s eastern boundary.
Addendum: I stole the title from John McWhorter’s book, Our Magnificent Bastard Tongue.
Also, this is not to say that:
1) population structure today is trivial in a phylogenetic sense; it isn’t.
2) population structure is functionally irrelevant; it isn’t.