The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

The Pith: In this post I examine the relationship between racial ancestry and cancer mortality risks conditioned on particular courses of treatment. I review research which indicates that the amount of Native American ancestry can be a very important signal as to your response to treatment if you have leukemia, as measured by probability of relapse.

If you are an engaged patient who has been prescribed medication I assume you’ve done your due diligence and double-checked your doctor’s recommendations (no, unfortunately an M.D. does not mean that an individual is omniscient). Several times when I’ve been prescribed a medication I have seen a note about different recommended dosages by race when I did further research. Because of my own personal background I am curious when it says “Asian.” The problem with this term in medical literature is that “Asian” in the American context is derived from a Census category constructed in 1980 for bureaucratic and political purposes. It amalgamates populations which are genetically relatively close, East and Southeast Asians, with more distant ones, South Asians (when my siblings were born I remember that my parents listed their race as “Asian” when they filled out paper work for the hospital).

But at least the issues with an “Asian” category are clear. Consider the “Hispanic/Latino” category. In the USA this term also became popular through government fiat around 1970, as a catchall for people whose ancestry derives from the Spanish-speaking Americas, with Spaniards, Portuguese, and Brazilians being borderline cases. Additionally, it has become relatively common in general American culture to code Hispanic as non-white. This is despite the fact that all Latin American countries have large self-identified white populations, with some, such as Argentina and Uruguay, being overwhelmingly white. In the USA between 54% and 92% of Hispanics identify as white in terms of their race. The discrepancy arises because some surveys allow for the “Some other race” option, which is the second most popular choice. Surveys which force respondents into a few categories such as white, black, Native American, or Asian produce a result where Hispanics default to a white self-identification. Implicitly we know it’s more complicated than this mishmash of bureaucratic convenience and opportunistic American identity politics. The HapMap has a Mexican American sample from Los Angeles. Above you see K = 3 in ADMIXTURE for Mexican Americans. Each thin “slice” is an individual, with the color proportions reflecting genomic contributions from each of three putative ancestral groups. The full plot had Europeans and Chinese as well. Blue seems to correspond to Native American ancestry, and red to white European (the green residual is modal in East Asians). Los Angeles’ Mexican American community is obviously mixed-race, what in Latin America might be termed mestizo. And yet according to the survey data, when forced to choose, this community seems to affiliate with a white Spanish identity, blanco.
Seeing as almost all of them are Spanish speaking and not indigenous (I am aware that the USA has a small and growing non-Spanish speaking Latino population of indigenous immigrants), this would make sense. But another facet of Mexican American identity surfaces in the concept of Aztlán, which is a nod to the Nahua roots of much of the Mexican population.

But whatever the cultural nuance and subtlety, which can be decomposed at length, it is also important to properly characterize the genetic structure of Hispanic populations. Some Mexican Americans are predominantly white European in ancestry, and some are predominantly Amerindian. Many are mixed in roughly equal proportions. This is not just a minor detail. Going back to my first paragraph, a new letter to Nature Genetics reports on differences in response to treatment among children with leukemia according to their proportion of Native American ancestry. Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia:

Although five-year survival rates for childhood acute lymphoblastic leukemia (ALL) are now over 80% in most industrialized countries…not all children have benefited equally from this progress…Ethnic differences in survival after childhood ALL have been reported in many clinical studies…with poorer survival observed among African Americans or those with Hispanic ethnicity when compared with European Americans or Asians…The causes of ethnic differences remain uncertain, although both genetic and non-genetic factors are likely important…Interrogating genome-wide germline SNP genotypes in an unselected large cohort of children with ALL, we observed that the component of genomic variation that co-segregated with Native American ancestry was associated with risk of relapse (P = 0.0029) even after adjusting for known prognostic factors (P = 0.017). Ancestry-related differences in relapse risk were abrogated by the addition of a single extra phase of chemotherapy, indicating that modifications to therapy can mitigate the ancestry-related risk of relapse.

They inferred ancestry through two different methods. First, they used principal component analysis (PCA) to extract the largest orthogonal (mutually uncorrelated) dimensions of variation within the genetic data set. When you do this you quickly recapitulate totally comprehensible patterns of population genetic clustering within your data set. To the left you see a PCA where the largest component of variance separates Africans from non-Africans (x axis) and the second largest separates Europeans from East Asians (y axis). The underlying data is from a merging of the HapMap and HGDP.
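For readers who want to see the mechanics, here is a minimal sketch of genotype PCA in Python with NumPy. It assumes genotypes coded as 0/1/2 copies of one allele and applies a common per-SNP standardisation before the SVD. The two populations are simulated toy data, not HapMap or HGDP samples, and `genotype_pca` is a hypothetical helper, not the authors' pipeline.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an (individuals x SNPs) matrix of 0/1/2 allele counts.

    Each SNP is centred on 2p and scaled by sqrt(2p(1-p)), where p is the
    sample allele frequency; principal components come from an SVD.
    Returns PC scores and the fraction of variance each PC explains.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                    # sample allele frequency per SNP
    keep = (p > 0) & (p < 1)                    # drop monomorphic SNPs
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardised genotypes
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]

# Toy data: two simulated populations with diverged allele frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)
pop_a = rng.binomial(2, freq_a, size=(50, n_snps))
pop_b = rng.binomial(2, freq_b, size=(50, n_snps))
scores, explained = genotype_pca(np.vstack([pop_a, pop_b]))
```

On this toy data PC1 should split the two simulated populations onto opposite sides of zero, loosely analogous to the first PC separating Africans from non-Africans in the real plot.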

This pattern crops up over and over. Within this broader framework you see more specific trends. I have labelled the Mexican American population on the two-dimensional plot. Note its linear topology. This is a sign of possible admixture. Roughly, an individual’s position along the line between two putative parental populations reflects their ancestry proportions: the more ancestry from one population, the closer they sit to that cluster. In plain English, someone who is half-Chinese and half-Swedish will be placed roughly equidistant from the Chinese and Swedish clusters on a PCA plot with those populations. The Mexican Americans span a region between Europeans and East Asians. This makes perfect sense in terms of their recent population history. It also means that just knowing that someone is “Mexican” in their heritage does not tell you as much about their ancestry as knowing that someone is “Chinese.” There’s a lot of genetic variance in the Mexican population.
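The claim that admixed individuals land partway along the line between parental clusters can be checked with a small simulation. This sketch assumes two hypothetical parental populations A and B with simulated allele frequencies, and a simplified model in which an admixed genotype is drawn from the ancestry-weighted average allele frequency (ignoring which chromosome segments came from which parent). It runs PCA on everyone together and measures where each group's mean PC1 score falls between the A and B means.

```python
import numpy as np

rng = np.random.default_rng(1)
n_snps = 500
freq_a = rng.uniform(0.05, 0.95, n_snps)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, n_snps), 0.01, 0.99)

def sample(q_b, n):
    """Draw n diploid genotypes carrying a fraction q_b of population-B ancestry."""
    freq = q_b * freq_b + (1 - q_b) * freq_a   # expected allele frequency
    return rng.binomial(2, freq, size=(n, n_snps))

groups = {"A": sample(0.0, 50), "B": sample(1.0, 50),
          "q=0.25": sample(0.25, 20), "q=0.75": sample(0.75, 20)}
g = np.vstack(list(groups.values())).astype(float)

# Standardise each SNP and take PC1 from an SVD, as in ordinary genotype PCA.
p = g.mean(axis=0) / 2
keep = (p > 0) & (p < 1)
z = (g[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
u, s, _ = np.linalg.svd(z, full_matrices=False)
pc1 = u[:, 0] * s[0]

# Mean PC1 per group, then each group's relative position on the A-to-B line
# (0 = at the A mean, 1 = at the B mean). The sign of PC1 cancels out.
edges = np.cumsum([0] + [len(v) for v in groups.values()])
means = {name: pc1[edges[i]:edges[i + 1]].mean() for i, name in enumerate(groups)}
position = {name: (m - means["A"]) / (means["B"] - means["A"]) for name, m in means.items()}
```

Because expected standardised genotypes are linear in the admixture fraction, the relative positions of the two admixed groups should come out close to 0.25 and 0.75, up to sampling noise.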

I introduced the preamble about PCA plots because the figure where they use PCA to elucidate the ancestry of the HapMap populations and their sample of leukemia-affected children can be somewhat confusing. Panels A, B, and C are PCs 1, 2, and 3, respectively. That means that the top panel explains the most variation, and the third panel the least. I’ve added some extra labels because of the small font. You see in the top panel immediately what was evident in the two-dimensional plot above: Africans separate out from non-Africans. The boxes represent the interquartile (25th to 75th percentile) range within each population. Contrast the very tight distributions of the “pure” reference populations with the more varied distribution of the children in their data set. It seems that some self-identified white children have a rather high load of African ancestry, while the black Americans naturally vary a great deal more than Yoruba in Nigeria. The distribution of the Mexican Americans reflects the African ancestry which has been absorbed within the Mexican population more broadly.

Panel B illustrates the second PC, which separates non-Africans on a rough west-east axis. So what’s going on with the Asians? The 1980 Census strikes again! A substantial fraction of the “Asians” are “South Asian,” who have somewhat more “European” than “Asian” ancestry. Again, a small minority of “white” children seem to have substantial Asian ancestry. The Hispanic pattern is rather easy to explain: it is probably simply variation in Amerindian-European admixture.

The last PC seems to separate Native Americans from other populations. So why are white and Asian children also exhibiting variance here? First, a non-trivial proportion of white Americans have substantial Native American ancestry. Brett Favre has a grandparent who was a member of the Choctaw tribe, for example. Second, I suspect much of the variance is due to shared ancestry between Amerindians and some Eurasian groups which isn’t captured by the white Utah sample, whose ancestry derives from the far west of Eurasia, or by the Chinese sample.

Another way to visualize ancestry is of course to posit K ancestral groups and assign each individual a fraction of ancestry from each of the K groups. To the right you see a STRUCTURE bar plot where 2,500+ individuals are displayed vertically, with shading proportional to ancestry. I’ve placed tentative labels. Most of the children in the sample are white, and so exhibit mostly European (red) ancestry. From what I know about the black American community it seems that on this visualization they’ve been separated into two clusters (see the supplements for the algorithm). About 10% of black Americans are more than 50% white, while the median black American has 20-25% white ancestry. The Asian cluster is strange because it amalgamates East and South Asians. South Asians are 65-90% “European,” depending on their ancestral region. Finally, you have the Mexican Americans, who span the range of admixture between Europeans and Amerindians, with some African element as well.
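STRUCTURE and ADMIXTURE estimate ancestral allele frequencies and individual ancestry fractions jointly, each with its own (much more sophisticated) inference machinery. As a simplified illustration only, the sketch below handles a supervised K = 2 case: it assumes the two parental allele frequencies are already known, treats SNPs as independent, and finds one individual's ancestry fraction by maximising a binomial likelihood over a grid. The function name and the simulated data are hypothetical.

```python
import numpy as np

def estimate_admixture(genotypes, freq_pop1, freq_pop2, grid_size=201):
    """Grid-search maximum-likelihood estimate of q, the fraction of ancestry
    from population 1, for one individual with K = 2 and KNOWN parental allele
    frequencies. Each 0/1/2 genotype is modelled as Binomial(2, q*f1 + (1-q)*f2)
    and SNPs are treated as independent (binomial constants are dropped)."""
    g = np.asarray(genotypes, dtype=float)
    qs = np.linspace(0.0, 1.0, grid_size)
    f = np.outer(qs, freq_pop1) + np.outer(1.0 - qs, freq_pop2)  # (grid, snps)
    f = np.clip(f, 1e-9, 1 - 1e-9)                              # avoid log(0)
    loglik = (g * np.log(f) + (2.0 - g) * np.log(1.0 - f)).sum(axis=1)
    return qs[np.argmax(loglik)]

# Toy check on simulated data (not real populations).
rng = np.random.default_rng(2)
n_snps = 5000
f1 = rng.uniform(0.05, 0.95, n_snps)
f2 = rng.uniform(0.05, 0.95, n_snps)
true_q = 0.3
genotypes = rng.binomial(2, true_q * f1 + (1 - true_q) * f2)
q_hat = estimate_admixture(genotypes, f1, f2)
```

With thousands of moderately informative SNPs the estimate should land near the simulated value of 0.3; with fewer or less differentiated markers the likelihood surface flattens and the estimate becomes noisier.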

STRUCTURE produces the averages to the left for self-identified populations. The proportions for African Americans are just about right. For the Hispanic category the estimate seems more European than for the Los Angeles Mexican Americans, but there are historical reasons to suspect that Mexican Americans in Texas have more Spanish ancestry, while Cubans in the USA are overwhelmingly white (and Puerto Ricans have more white ancestry than black or Amerindian). The very low percentages of non-European ancestry for whites make me skeptical of the means; I assume there are some mixed-race individuals who identify as white, but I wonder if most white Americans with ~1% “Native American” just have deep common ancestry dating back to the Ice Age, or whether it’s an artifact of the method.

But what’s the point of all this? Ancestry analysis is fun, interesting, and has some relevance to broader socio-political debates and conflicts, but this is a story with some medical relevance (which is why it’s in Nature Genetics). In short, the authors found that Native American ancestry was a very high risk factor for relapse, conditional on the extent of chemotherapy. I merged table 3 and some panels from figure 2 to show what’s going on:

On the right is a list of risk (or mitigating) factors. I’ve underlined the effect of the proportion of Native American ancestry, treated as a continuous variable. On the left you see the cumulative probability of cancer relapse, with patients stratified by Native American ancestry. The red line represents those with less than 10% Native American ancestry, and the blue line those with more than 10%. In the top panel you see the impact on self-identified whites. The bottom panel shows the outcome for those who did not receive “delayed intensification” treatment. Panel E, which I did not show, illustrates clearly that the two lines converge when delayed intensification treatment is provided. If there is something in the Native American genetic background causing this problem, then it seems one can model it as a “gene-environment interaction.” That is, the genetically mediated outcome is conditional on particular environmental conditions (in this case, the absence of the treatment).
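A gene-environment interaction of this kind can be expressed as a difference of differences on the risk scale: the ancestry effect among untreated patients minus the ancestry effect among treated patients. The numbers below are purely hypothetical, chosen only to mimic the qualitative pattern described (an ancestry gap without delayed intensification that closes with it). They are not taken from the paper, and other scales, such as odds or hazard ratios, can give different answers.

```python
def ancestry_effect_by_arm(risk):
    """risk maps (ancestry_group, received_treatment) -> relapse probability.
    Returns the high-minus-low ancestry risk difference in each treatment arm
    and the interaction contrast (how much the ancestry effect shrinks with treatment)."""
    untreated = risk[("high", False)] - risk[("low", False)]
    treated = risk[("high", True)] - risk[("low", True)]
    return untreated, treated, untreated - treated

# Purely illustrative numbers, NOT from Yang et al. (2011).
hypothetical_risk = {
    ("low", False): 0.10, ("high", False): 0.25,
    ("low", True): 0.08, ("high", True): 0.08,
}
untreated, treated, interaction = ancestry_effect_by_arm(hypothetical_risk)
```

A nonzero interaction contrast with a near-zero treated-arm difference is the additive-scale signature of the pattern in the figure: the ancestry-related risk exists only under the untreated condition.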

But ancestry isn’t magic. The authors managed to track down candidate SNPs associated with Native American ancestry at the genomic level. In particular, the risk allele at rs6683977 at PDE4B was significantly more common among those with more than 10% Native American ancestry than among those with less than 10%. Native American ancestry in that genomic region was itself associated with relapse risk.
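Comparing how common a risk allele is in two ancestry groups is, at its simplest, a 2x2 test on allele counts. This is a minimal sketch with hypothetical counts, not the counts for rs6683977 and not necessarily the test the authors used. It computes a Pearson chi-square with one degree of freedom, without continuity correction, using the identity that the upper tail of a 1-df chi-square at x equals erfc(sqrt(x/2)).

```python
import math

def allelic_chi_square(risk_a, other_a, risk_b, other_b):
    """Pearson chi-square (1 df, no continuity correction) comparing allele
    counts between two groups; returns (statistic, p-value)."""
    table = [[risk_a, other_a], [risk_b, other_b]]
    total = risk_a + other_a + risk_b + other_b
    row = [risk_a + other_a, risk_b + other_b]
    col = [risk_a + risk_b, other_a + other_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Hypothetical allele counts (NOT the paper's data):
# group with >10% Native American ancestry: 60 risk, 40 other alleles
# group with <10%: 30 risk, 70 other alleles
stat, p = allelic_chi_square(60, 40, 30, 70)
```

A real analysis would also adjust for covariates and, as the paper did, look at local ancestry in the region, which a bare allele-count test cannot do.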

This seems a relatively straightforward application of genetic data to cost-benefit analysis. In the United States there is a major issue with growing health care costs. Many with a technocratic “evidence based” bent are curious about the efficacy of particular treatments and its relationship to the costs incurred. It may be that for particular genetic backgrounds the cost-benefit calculus will be different than for the general population. While further rounds of chemotherapy may not be justified in terms of return on investment (i.e., probability of survival 10 years out) for a white child, they may be for a Native American one. Weighing costs and probabilities like this may seem bloodless, but we do it implicitly every day. This is just another tool in that thankless enterprise.

Citation: Yang JJ, Cheng C, Devidas M, Cao X, Fan Y, Campana D, Yang W, Neale G, Cox NJ, Scheet P, Borowitz MJ, Winick NJ, Martin PL, Willman CL, Bowman WP, Camitta BM, Carroll A, Reaman GH, Carroll WL, Loh M, Hunger SP, Pui CH, Evans WE, & Relling MV (2011). Ancestry and pharmacogenomics of relapse in acute lymphoblastic leukemia. Nature Genetics PMID: 21297632

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Ancestry, Genetics, Genomics, Health, Medical genetics, Race 

Nature profiles Dodecad, the Pickrell Affair, and the emergence of amateur genomicists in a new piece. Interestingly, David of BGA is going to try to get something through peer review, in particular on the relationship between Assyrians and Jews.

So we have Genomes Unzipped, Dodecad, and BGA. What next? Who next? I hope Dienekes doesn’t mind if I divulge the fact that the computational resources needed to utilize ADMIXTURE as he has are within the theoretical capability of everyone reading this post. Rather, the key is getting familiar with PLINK and writing some code to merge data sets. After you do that, to really add value you’d probably want to get raw data from more than what you can find in the HGDP, HapMap, and other public resources.
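Much of the grunt work in merging data sets is reconciling SNPs that two sources report on different DNA strands. PLINK has its own merge and strand-flip functionality; the Python below is only a hypothetical sketch of the underlying logic, assuming each panel is a dict from rsID to its two listed alleles. It keeps shared SNPs whose alleles match directly or after complementing, and drops strand-ambiguous A/T and C/G SNPs, a common conservative choice. The rsIDs in the example are made up.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(alleles):
    """Complement a collection of alleles; non-ACGT codes are left unchanged."""
    return {COMPLEMENT.get(a, a) for a in alleles}

def merge_snp_panels(panel_a, panel_b):
    """Match SNPs shared by two panels (dicts rsid -> (allele1, allele2)).

    Returns (kept, dropped): kept maps rsid -> True if panel_b must be
    strand-flipped to match panel_a; dropped lists shared rsids that are
    strand-ambiguous or whose alleles cannot be reconciled.
    """
    kept, dropped = {}, []
    for rsid in sorted(panel_a.keys() & panel_b.keys()):
        a, b = set(panel_a[rsid]), set(panel_b[rsid])
        if a == complement(a):       # A/T, C/G (or non-ACGT): strand unknowable
            dropped.append(rsid)
        elif a == b:                 # same strand, same alleles
            kept[rsid] = False
        elif a == complement(b):     # opposite strand: flip panel_b
            kept[rsid] = True
        else:                        # irreconcilable alleles
            dropped.append(rsid)
    return kept, dropped

panel_a = {"rs1": ("A", "G"), "rs2": ("C", "T"), "rs3": ("A", "T"),
           "rs4": ("G", "T"), "rs6": ("A", "C")}
panel_b = {"rs1": ("G", "A"), "rs2": ("G", "A"), "rs3": ("A", "T"),
           "rs5": ("C", "G"), "rs6": ("A", "G")}
kept, dropped = merge_snp_panels(panel_a, panel_b)
```

Only SNPs present in both panels are considered; rs4 and rs5 are silently excluded, which is usually what you want before running ADMIXTURE on the merged set.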

But here I make an open offer: if you start a blog or a project which replicates the methods of Dodecad and BGA I’ll link to you and promote you. When Dienekes began Dodecad I actually started to play around with the data sets in ADMIXTURE, but I’ve personally held off until I see what he and David find, and what their pitfalls and successes might be. Here’s to 2011 being more interesting than we can imagine!

Update: Already had a friend with a computational background contact me about doing something on South Asian genomics. So again: if you get a site/blog set up, and start pumping out plots, I will promote you. In particular, if you need 23andMe raw data files of geographical region X it might be useful to try and get the word out via blogs and what not.

(Republished from Discover/GNXP by permission of author or representative)

Since I have been promoting the Dodecad Ancestry Project, it seems only fair to bring to your attention Eurogenes 500K SNP BioGeographicAncestry Project. The sample populations are a bit different from Dodecad, but again ADMIXTURE is the primary tool. But the author also makes recourse to other methodologies to explore more than simply population level variation. For example, his most recent post is Locating and visualizing minority non-European admixtures across our genomes:

Imagine, for example, a white American carrying a couple of tiny segments of West African origin, from an ancestor who lived 250 years ago, and an eastern Finn with no Asian ancestors in the last 4000 years or more. If we run an inter-continental ADMIXTURE analysis with these two, it’s very likely the American will score 100% European, while the eastern Finn will probably come out around 9% North and East Asian due to really old Uralic influence.

That sort of thing isn’t a huge problem when comparing the genetic structure of populations. Obviously, overall, eastern Finns rather than white Americans are genetically closer to North Asians, and that’s basically what ADMIXTURE picks up. However, if the focus is also on individuals, this certainly can become an issue. Our hypothetical American might be aware of that African ancestor, with solid paperwork backing up their genealogical connection, but he’s pulling his hair out because nothing’s showing up via genetic tests.

So let’s take a look at a real life example of how RHHcounter can pick up segments of potentially recent Sub-Saharan African origin…

Olivia Munn & Uyghur woman

The basic issue here is that, in terms of genomic variation, old admixture looks different from new admixture. Someone who is a first-generation Eurasian, with a Chinese and a European parent, may have about the same ancestral mix proportionally as a Uyghur. They would resemble a Uyghur on STRUCTURE and be placed within that cluster on a PCA chart (this is what happens in 23andMe). But the Uyghur “Eastern” and “Western” genetic heritage has been reshuffled to a great extent by recombination over the past 1,000-2,000 years. In contrast, a first-generation Eurasian will have huge swaths of their genome which are Eastern or Western, with one copy of each chromosome inherited whole from the Chinese parent and the other from the European parent. In population genetic language, a group of first-generation hybrids would exhibit a lot of linkage disequilibrium (LD). In a panmictic hybrid population LD will decay due to recombination, which breaks apart the distinctive allelic associations inherited from the parental populations.
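The decay of LD under random mating follows a standard textbook recursion: with recombination fraction r between two loci, D shrinks by a factor of (1 - r) each generation, so D_t = D_0 (1 - r)^t. A minimal sketch, assuming an idealised infinite, randomly mating population:

```python
def ld_after(d0, r, generations):
    """Remaining linkage disequilibrium D after `generations` rounds of random
    mating, for two loci with recombination fraction r: D_t = D_0 * (1 - r)**t."""
    return d0 * (1.0 - r) ** generations

# Compare tightly and loosely linked marker pairs, starting from D_0 = 0.25.
for r in (0.01, 0.1):
    print(f"r={r}: after 1 gen {ld_after(0.25, r, 1):.4f}, "
          f"after 60 gens {ld_after(0.25, r, 60):.4f}")
```

D_0 = 0.25 is the maximum for two loci with 50% allele frequencies, as in a 50:50 mix of populations fixed for opposite alleles. Markers about 1 cM apart (r ≈ 0.01) would still retain a bit over half of that LD after 60 generations, roughly 1,500 years at an assumed 25 years per generation, while pairs with r = 0.1 would retain almost none.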

This is the key to differentiating between the old “Asian” ancestry which sometimes falls out of the genetic variation of Finns at low frequencies, and more recent Asian ancestry. For example, the paleoanthropologist Vance Haynes is apparently a great-grandson of one of the original “Siamese Twins,” Chang Bunker. Chang Bunker was a Chinese Thai, so presumably Vance Haynes would come out to be ~10% Asian, and would be shifted toward the Asian cluster in relation to other Europeans. On the other hand, a closer look at his genome would indicate differences from a Turk who was ~10% Asian, because Vance Haynes’ Asian ancestry has only had three generations for recombination to break apart the original allelic associations which were passed down from Chang Bunker. After only these few generations the genome would still show many segments of clustered ancestry with distinctive sets of markers characteristic of Han Chinese.
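The arithmetic behind this can be sketched with two rules of thumb: the expected share of the genome inherited from a single ancestor g generations back is (1/2)^g, and segments that have passed through m meioses have an approximately exponential length distribution with a mean of roughly 100/m cM. Both are idealised expectations (real inheritance from a great-grandparent varies considerably), and the 40-generation figure below is a hypothetical value for contrast, not a date for any Turkish population.

```python
def expected_fraction(generations):
    """Expected share of the genome from one ancestor `generations` back
    (1 = parent, 2 = grandparent, 3 = great-grandparent, ...)."""
    return 0.5 ** generations

def mean_segment_cm(meioses):
    """Approximate mean length in centimorgans of an inherited segment after
    `meioses` meioses, using the rough 100/m cM rule of thumb."""
    return 100.0 / meioses

great_grandparent = 3
older_admixture = 40   # hypothetical number of generations, for contrast only
print(expected_fraction(great_grandparent))
print(mean_segment_cm(great_grandparent), mean_segment_cm(older_admixture))
```

An expected 12.5% from a great-grandparent is in line with the ~10% guess above, and the contrast between segments averaging tens of centimorgans and ones averaging a few is what lets segment-based methods tell recent ancestry from old.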

Let’s make this more concrete. Below are two “ancestry paintings” from 23andMe. One is of a reference example, a Uyghur woman, and another is of a Eurasian individual. The difference is pretty obvious:


For the record, 23andMe says that the Eurasian man is 50% Asian, 50% European. For the Uyghur woman, 52% European, 48% Asian. As I indicated above, Eurasian individuals who are projected onto the variation of the HGDP sample tend to cluster with the Uyghurs. In the image to the left the black mark indicates the Eurasian man. The Uyghurs are green. The purple rectangles are Hazaras.

But obviously this is a trivial example. What’s the point of sniffing around for non-European ancestry in individuals whose non-European ancestry is 1) visible and 2) recent and immediate? The bigger question here concerns claims and suggestions by some white Americans that they have significant non-European ancestry. Usually this is Native American. But in the case of one of the European-origin samples which “Polako” (the principal behind the BGA Project) analyzed, there seems to be a suggestion of West African ancestry.

This individual is Dr. Don Conrad of Genomes Unzipped. In particular, Polako found two segments, on two chromosomes, which exhibited a pattern of population-atypical heterozygosity in Dr. Don Conrad’s genome. Look at chromosomes 7 and 13. Contrast the pattern with my distant paternal cousin, Dr. Daniel MacArthur. He also exhibits points of heterozygosity, but they’re randomly distributed across the genome, which suggests old admixture or just noise.
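The difference between clustered and scattered signals can be illustrated with a generic sliding-window scan. This is not RHHcounter’s actual algorithm; it is a hypothetical sketch assuming each marker along one chromosome has already been flagged (1) or not (0) as atypical for the individual’s main ancestry. It reports stretches where windows are dense with flags, while isolated flags of the kind attributed above to old admixture or noise never reach the threshold.

```python
def dense_flag_segments(flags, window=20, min_fraction=0.5):
    """Merge sliding windows in which at least `min_fraction` of markers are
    flagged. `flags` is an ordered list of 0/1 values along one chromosome.
    Returns half-open (start, end) marker-index ranges; each range can extend
    up to one window beyond the true flagged stretch."""
    n = len(flags)
    prefix = [0]
    for f in flags:                       # prefix sums for O(1) window counts
        prefix.append(prefix[-1] + f)
    segments = []
    for start in range(n - window + 1):
        count = prefix[start + window] - prefix[start]
        if count >= min_fraction * window:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)   # extend overlapping run
            else:
                segments.append((start, end))
    return segments

# Toy chromosome: one clustered stretch (markers 100-139) plus scattered flags.
flags = [0] * 200
for i in range(100, 140):
    flags[i] = 1
for i in (10, 60, 170):
    flags[i] = 1
clustered = dense_flag_segments(flags)
scattered = dense_flag_segments([1 if i % 10 == 0 else 0 for i in range(200)])
```

The clustered stretch is reported as a single segment, while a chromosome whose flags are spread evenly at one in ten markers yields nothing, mirroring the contrast between the two genomes described above.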

Polako doesn’t make much of Dr. Don Conrad’s results, and neither do I (presumably, as Dr. Don Conrad is a member of Genomes Unzipped, it’s easy to talk about his results without any of the ethical or moral hassles about confidentiality). On the other hand, a little utilization of the powers of the interwebs indicates that, unlike Dr. Dan MacArthur, Dr. Don Conrad is an American, in particular of recent Midwestern background. I’m not a total creep, so I didn’t poke around further. But after the Pickrell affair I am probably just a touch more hesitant to laugh off peculiar results from these sorts of analyses as simply algorithms-gone-meshugana.

Image Credit: Colegota

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Admixture, Ancestry, BGA, Genetics, Genomes Unzipped, Genomics 

In Jonathan Spiro’s Defending the Master Race it is recounted that as American states were passing more robust anti-miscegenation laws and legally enshrining the concept of the one-drop rule, an exception was made in Virginia for those with 1/16th or less Native American ancestry. The reason for this was practical: many of the aristocratic “First Families of Virginia” claimed descent from Pocahontas. Included within this set was Senator Harry F. Byrd Sr. of Virginia, who was 1/16th Native American, being a great-great-grandson of Pocahontas. This sort of background was probably not exceptional among the “Founding Stock” of Anglo-Americans whose ancestors were resident within the boundaries of the American republic at independence. Only around 1700 did the white population of the British American colonies exceed the indigenous population, so no doubt some amalgamation did occur.

But from what I’ve seen the extent of admixture with the indigenous substrate was very marginal, especially in comparison to white populations in Argentina or Brazil. Or so I thought. In conversation a friend recently claimed that over 50% of American whites were 5% or more non-European in ancestry. I expressed skepticism, and he dug up the citation. Genetic ancestry: A new look at racial disparities in head and neck cancer:

The study included 358 patients; 37 percent were African American.

The researchers examined diagnosis (late versus early stage) and overall survival for African Americans with HNSCC based on self-reported race and genetic West African ancestry.

During the past decade, many groups have developed and characterized sets of single nucleotide polymorphism markers that can distinguish genetic ancestry among major ethnic groups such as Asian and West African, called ancestry information markers (AIMs).

For the study, genetic ancestry was based on a panel of 100 AIMs to estimate genetic background.

“Using these genetic markers gives you additional statistical power. It’s no longer just two categories – Black or White; it becomes a continuous variable. Race is not equal to genetics. Genetic markers don’t define specific races,” says Dr. Worsham.

Ultimately, the study found no correlation between West African genetic ancestry and HNSCC outcomes. Only self-reported race was associated with head and neck cancer stage.

Only 5 percent of self-reported African Americans had more than 95 percent West African ancestry, with 27 percent having less than 60 percent West African ancestry. By comparison, 48 percent who self-reported as Caucasian had more than 95 percent European American ancestry.

I’m not too worried about the number of markers. 100 should be sufficient on the scale of continents if well selected. But I’m curious about the representativeness of the sample. The African American one seems more European than others I’ve seen previously. And I really haven’t seen that much admixture with non-Europeans in the CEPH Utah white sample in the HapMap. But perhaps the Utah whites aren’t representative? Dienekes ran ADMIXTURE on the HapMap3 populations a few weeks ago, and I don’t see any elevated component of non-European ancestry in the Utah whites when compared to the Tuscans from Italy.
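For a sense of the precision involved, consider an idealised case in which every AIM is perfectly diagnostic (fixed for different alleles in the two source populations) and the markers are unlinked. Each of the 2n allele copies in a diploid is then an independent draw of population of origin, so the estimated ancestry fraction has a binomial standard error of sqrt(q(1 - q)/(2n)). Real AIMs are rarely fixed, so actual errors will be larger; this is a back-of-the-envelope sketch, not the study’s method.

```python
import math

def ancestry_standard_error(q, n_markers):
    """Binomial standard error of an ancestry fraction q estimated from
    n_markers perfectly diagnostic, unlinked SNPs in a diploid individual
    (2 * n_markers independent allele copies). Idealised lower bound."""
    return math.sqrt(q * (1 - q) / (2 * n_markers))

print(ancestry_standard_error(0.95, 100))
print(ancestry_standard_error(0.95, 1000))
```

With 100 such markers and q = 0.95 the standard error is about 0.015, so a single individual’s estimate near a 95% cut-off carries an uncertainty of a few percentage points even in this best case.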


A factoid such as “fewer than half of white Americans are 95% or more European in ancestry” can gain traction quickly. But I think we should wait a bit and just get more samples. The results are from a presentation at a conference, not even a paper. Of course there’s a possibility that many people have more interesting backgrounds than multi-generational families which settled in Utah rather early. Time will tell.

Addendum: I believe that Native American admixture is going to be more common among white Americans of the South than among Yankees from New England. The reason I would give is that powerful and populous tribes and confederacies such as the Creek and Cherokee persisted in the Southern highlands far longer than in New England. The CEPH sample is going to be biased toward Yankees, as well as European converts from the British Isles and Scandinavia, which would perhaps yield a somewhat lower estimate of non-European ancestry in American whites.

Addendum II: I thought about it more. Something went wrong in their analysis, or they had a very unrepresentative sample. Perhaps they had many Latinos and only coded their self-identified race and not ethnicity (50% of American Latinos identify as white). Maybe the AIMs aren’t good. I don’t know. But I do know that American genealogy buffs who assume Native American ancestry are often very disappointed. They seem to far outnumber those who find surprising non-white ancestry.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Ancestry, Genetics, Genomics, Race 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"