R1a1a is one of the most geographically expansive Y chromosomal haplogroups. It spans the Irish Sea to the Bay of Bengal. I am of this lineage, as is my friend Daniel MacArthur. But with deeper exploration of the phylogeny of this haplogroup it seems clear now that it is very diverse, with a great deal of geographic structure. There is a wide range of South Asian lineages, but also one very dominant one in Eastern Europe.

A new paper in Nature Communications, Phylogenetic applications of whole Y-chromosome sequences and the Near Eastern origin of Ashkenazi Levites, addresses a peculiarity in the domain of Jewish genetics. The Levites, the helpers of the Cohen priestly class, seem to be carriers of R1a1a, and this lineage has exploded rapidly in this population (the classic “star-like phylogeny”). The historical genetic question though is this: are the Levites descendants of a Slav proselyte? Within Europe R1a1a exhibits the highest frequency in what was once the Pale of Settlement, so this is a reasonable question.

Using whole genome analyses and more extensive geographic coverage, the answer to this question seems to be no. Rather, the Levites descend from a distinct West Asian branch of non-European R1a1a. This is evident in panel A of Figure 1. You can see clearly that the Ukrainian samples are the outgroup in relation to the other branches of R1a1a. And within those there is further structure, as the South Asian Gujaratis are distinct from the clade in which most of the Ashkenazi Levites are nested (the authors posit that the presence of the Iberians may be attributable to the Moors, plausible enough). The Golden Age of Y chromosomal phylogenies is over, but these markers still have some juice which can be squeezed out.

Citation: Phylogenetic applications of whole Y-chromosome sequences and the Near Eastern origin of Ashkenazi Levites (open access).


A few people have asked me about a new paper on arXiv, The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses. Since it is on arXiv you can read the preprint yourself. And, since it is a preprint it is not quite polished, so keep that in mind when evaluating it. After a fashion we are part of the polishing process. So what do I think?

First, it seems to me that the author has a sense of humor about this, and I don’t know how seriously to take some of his assertions. Consider this passage: “Such an unnatural growth rate (1.7-2% annually) over half a millennia, affecting only Jews residing in Eastern Europe is commonly explained by a miracle (Atzmon et al. 2010). Unfortunately, this divine intervention explanation poses a new kind of problem – it is not science.” Taken literally this seems rather bizarre. In the paper referenced the author refers to the “so-called demographic miracle of population expansion,” alluding to another scholar’s observation. It seems obvious that miracle in this context simply means an inexplicable phenomenon, not a genuine supernatural intervention. There are also plain factual problems which I assume will get cleared up in the final draft. Romania and Hungary are referred to as Slavic nations which were targets of migration by Khazars fleeing the collapse of their polity. Neither of these nations was then, or is now, Slavic. In general I have to say that the historical framework of the paper is very skeletal, verging on incoherent (at least to me).

That being said, there are positives. The author uses methods which you yourself could replicate with a public data set. When it comes to the “methods” section he seems to have it down (this is clearly a side project, judging by his research focus). In particular my first instinct was to look for the keyword “IBD.” To get a really good sense of history through genetics utilizing dense marker data sets you need to look at correlations across the genome which are indicative of relatively recent relatedness, not just PCA and model-based clustering which give you summaries of affinities between populations (e.g., ADMIXTURE). And he used many of the methods you’d want to see in concert. What more could he have done? Well, tested some explicit demographic models. But that’s computationally intensive from what I recall.

Setting aside the historical fuzziness of the paper, the major issue I have is that though the methods are totally kosher, so to speak, the data you put into them strongly shape your outcomes. Dienekes and Maju both anticipated my own key concern. The “Middle Eastern” aspect of Ashkenazi Jewish ancestry might in fact be best represented by populations in the zone of the northern Fertile Crescent and Eastern Anatolia, rather near or overlapping with the homelands of several of the Caucasian populations used in the above study as a proxy for Khazars. Additionally, modern Palestinians (the HGDP data set) are used as a reference for the Middle Eastern ancestors of Jews. I now believe that the Arabian contribution to the ancestry of Levantine and Iraqi Muslim populations, which dates to after the 7th century and differentiates Muslim Arabs from their local non-Muslim Arab* co-ethnics, is significant. Perhaps on the same order as the Germanic ancestry in modern England which dates to the 6th century and later. In plainer language the Caucasian component that is being detected in this paper may simply be an indigenous Middle Eastern ancestral element which has now been somewhat displaced northward in its modal frequency due to the expansion of the Arabs, and the later introduction of some Sub-Saharan ancestry among Muslim Arabs. This would explain the finding of the author that the Druze, who are an endogamous community which has roots in the mountains of Lebanon, have affinities to the Turks. From this the author posits a Druze migration southward, but I suspect a more parsimonious explanation is simply that the Druze are a relatively isolated population which is more reflective of the Near Eastern genetic substratum which has been somewhat modified by over 1,000 years of cosmopolitan Muslim polities in the lowlands.
In this model the modern Turks and Kurds would also be reflective of this ancient substratum, being more insulated from Sub-Saharan admixture as well as the population movements of Arabian tribes from the peninsula in the first century or so of Islam.

One aspect of the paper which requires some clarification is the idea that the Armenians are a Caucasian people. If you look at the modern state of Armenia this is eminently reasonable. But for most of its history Armenia was a marginal Caucasian nation, with its center of gravity further south, straddling Anatolia and western Iran, and looming over the plains of Mesopotamia. The Caucasian nature of modern Armenia is to a great extent a function of the extermination of Armenians from much of eastern Anatolia in the early 20th century. In contrast, Georgia is much more fundamentally a Caucasian nation. If you kept this reality in mind I suspect that passages such as this would not be necessary: “The high genetic similarity between European Jews and Armenian compared to Georgians…is particularly bewildering because Armenians and Georgians are very similar populations that share a similar genetic background…and long history of cultural relations….” I wouldn’t place too much stock in one particular result, but it becomes a lot less bewildering if you know that Armenians have been much more active players in Near Eastern history than Georgians, because their locus of concentration was further south (e.g., Lesser Armenia).

Mind you, I wouldn’t be totally shocked if there was a Khazar contribution to modern European Jewish ancestry. There have been some suggestive uniparental results. But the smoking gun for me would be a simple one: East Asian ancestry. The Khazars were Turkic, and as such they would have had substantial proportions of East Asian ancestry. This is evident in the modern Chuvash, who have had a thousand years to admix with surrounding Slavic populations (and have). There are reasonable explanations for the “Caucasian” ancestry of Ashkenazi Jews which do not make recourse to the Khazar hypothesis. But a Mongoloid element is almost certainly feasible only through Turks of some sort, and the presence of a Judaized Turkic population on the fringes of Europe would be far too convenient to be a coincidence. There are some suggestive results which indicate small components of Mongoloid ancestry in Ashkenazi Jews, but the proportions are low enough that they may be artifacts. This is one area where more investigation is warranted. For example, whole-genome analyses which look at “East Asian” segments in Ashkenazi Jews, and match them to various East Asian populations. That would almost certainly answer the Khazar question, as there are relatively undiluted Turkic populations, such as the Kirghiz, that one could use as a reference.

Finally, despite the fact that I praise the author’s utilization of a wide array of contemporary statistical genetic methods, one can’t just do away with a thick and sturdy historical framework and reasonable questions derived from this superstructure. The historical models tested in this paper are moderately inscrutable to me (e.g., the “Rhineland hypothesis”). As others have noted there is a peculiar lacuna in regards to models of ethnogenesis during Roman antiquity, even though other lines of historical and genetic evidence do point in that direction. Instead, the author concocts a scenario of a mass migration after the Muslim conquest from the Middle East into Europe. To my knowledge Europe after the fall of Rome was not taking in the huddled Hebrew masses (though it was taking in some Middle Eastern Christians). But perhaps I haven’t read the proper books on this issue. In some ways to me this paper screams of the problem with taking a mass of data and using legitimate methods, and coming out with very specific results because of the way the parameters are set. In this case the parameters happen to be two contrasting models, and a neglect of other alternatives. This is unfortunately one of the primary problems with “hypothesis driven research” in the age of big data.

Overall I still commend the author for putting this up on arXiv. I hope this sort of feedback will result in some revision, and we’ll get a better handle on what’s going on here.

* Though the majority of Arab non-Muslims are Arabicized (ergo, some of them still reject Arab identity despite the usage of Arabic as their day to day language), a minority may date to Arab Christian populations which were numerous on the fringes of the Roman and Persian Empires by 600.

I know I excoriate readers of this weblog for being stupid, ignorant, or lazy. But this constant badgering does result in genuinely insightful and important comments precisely and carefully stated on occasion. I put up my previous post in haste, and when I published it I wasn’t totally happy with the evidence from which the authors adduced that Ashkenazi Jews were not inbred. Here’s why, from the comments: “Doesn’t identity-by-state permutations test reflect a counterbalance of admixture vs. inbredness + drift? Rather than just the degree of inbreeding? Since the population has strong admixture effects, a low IBS doesn’t exclude strong inbreeding, does it?”

From my little personal experience IBS is not the best statistic from which to generalize widely, and can be highly misleading in admixed individuals, as implied by the commenter. First, since I’ve stated above that the Ashkenazi Jews are admixed, let me go into a tangent as to why Ashkenazi are admixed between a Middle Eastern and Western European population, as opposed to being a relatively unadmixed ancient Eastern Mediterranean group with affinities to both regions. The previous previous paper found evidence of linkage disequilibrium decay. This means that LD was high in admixed individuals in the past, and declined over time. Why?

Imagine someone who is half black and half white. There are particular alleles which are highly diagnostic for black or white ancestry (e.g., SLC24A5 or Duffy). In admixed individuals these alleles will be correlated on the same chromosomal segment from one parent. They will have linear blocks of ancestry; colloquially, one chromosome of each pair will be from one population and the other from the other population. Now imagine that you have a population of individuals who are mixed in white and black ancestry, and they pair off with each other over the generations exclusively. The ancestral fractions will remain roughly the same (let’s assume a large effective population), but the genomic segments of ancestry from a specific population will be broken apart by recombination. The more generations from the initial admixture event, the more the blocks of ancestral segments will be scrambled. LD is a statistic which can measure this, since we’re talking about patterns of correlated alleles across loci. Given a long enough time LD will converge upon what you would expect in a random mating population. But if the admixture event was recent enough, then LD will be elevated over intervals which can give you a sense of the time in the past that the admixture event occurred. Ashkenazi Jews exhibit an LD pattern of a population which went through admixture between Europeans and Middle Easterners, not just a population whose allele frequencies lie between these two groups.
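
To make the decay concrete, here is a minimal sketch in Python. It assumes a single pulse of admixture followed by random mating in a large population, in which case the LD created by the admixture between two loci shrinks by a factor of (1 - r) each generation, where r is the recombination fraction between them. The function names and the example numbers are mine, purely for illustration, and this is not the procedure used in the paper.

```python
import math


def admixture_ld(d0: float, r: float, generations: int) -> float:
    """Expected LD between two loci some generations after one admixture pulse.

    d0 is the LD right after admixture, r the recombination fraction between
    the loci. Assumes random mating and ignores drift (illustrative model).
    """
    return d0 * (1.0 - r) ** generations


def generations_since_admixture(d0: float, d_now: float, r: float) -> float:
    """Invert the decay formula: how many generations turn d0 into d_now."""
    return math.log(d_now / d0) / math.log(1.0 - r)


# Hypothetical numbers: loci with r = 0.01, starting LD of 0.25.
for t in (0, 10, 30, 100):
    print(t, admixture_ld(0.25, 0.01, t))
```

The point of the second function is the one made above: if you can observe how much admixture LD remains at a given genetic distance, the same formula run backwards gives a rough date for the admixture event.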

With that out of the way, how about inbreeding? I will put up a follow up posting on what I mean by inbreeding, but here is a paper which gives a better sense of what’s going on with Ashkenazi Jews, Abraham’s Children in the Genome Era – Major Jewish Diaspora Populations Comprise Distinct Genetic Clusters with Shared Middle Eastern Ancestry:

IBD between Jewish individuals exhibited high frequencies of shared segments…The median pair of individuals within a community shared a total of 50 cM IBD (quartiles: 23.0 cM and 92.6 cM)—such levels are expected to be shared by 4th or 5th cousins in a completely outbred population. However, the typical shared segments in these communities were shorter than expected between 5th cousins (8.33 cM length), suggesting multiple lineages of more remote relatedness between most pairs of Jewish individuals….

Within the different Jewish communities, three distinct patterns were observed…The Greek and Turkish Jews had relatively modest levels of IBD, similar to that observed in the French HGDP samples. The Italian, Syrian, Iranian, and Iraqi Jews demonstrated the high levels of IBD that would be expected for extremely inbred populations. Unlike the other populations, the Ashkenazi Jews exhibited increased sharing of segments at the shorter end of the range (i.e., 5 cM length), but decreased sharing at the longer end (i.e., 10 cM)….

Another paper, Fine-scale population structure and the era of next-generation sequencing:

The amount of genomic sharing can be easily quantified and the degree of relationships between a random pair of individuals in a population can be estimated. For example, a random pair of individuals in the Ashkenazi Jewish population are as genetically similar, on average, as fourth cousins…indicating that recent genealogy may be of importance. The impact of inbreeding on the frequency of rare diseases has been demonstrated in historically endogamous populations, such as the Ashkenazim, Hutterites and some island groups…However, if many geographic regions throughout the world are finely structured as suggested by identical-by-descent (IBD) analysis, then populations in those regions may also have elevated rates of rare alleles and correspondingly of unique rare diseases. Indeed, in terms of the average amount of genomic segments’ IBD, the Ashkenazim are not outliers in the global sample mentioned above.

As indicated by the paper by Ralph et al., many European populations share IBD tracts indicating a fair amount of relatedness on the time scale of ~2,000 years. The Ashkenazi Jewish pattern of lots of short tracts is why services like 23andMe yield so many “relative” matches for individuals of that background. These people share a lot of the same ancestors, but these are often rather far back in time. So, when considering “inbred” as in the products of frequent cousin marriages, Ashkenazi Jews are not inbred in that manner. They don’t exhibit an abnormal level of very long IBD tracts, which is what you get floating around in the population when there are lots of extremely recent common ancestors between one’s two parents. The coefficient of relationship between parents and children is 50%. Between first cousins (in an outbred population) it is 12.5%. Between fourth cousins it is 0.2%.
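
Those three percentages follow from the standard rule for an outbred pedigree: each connecting path through a shared ancestor contributes (1/2) raised to the number of meioses along it. A small Python sketch, with function names of my own choosing, reproduces them:

```python
def coefficient_of_relationship(meioses_per_path: int, shared_ancestors: int) -> float:
    """Sum (1/2)**meioses over the paths through each shared ancestor.

    Assumes an outbred pedigree in which every path has the same length.
    """
    return shared_ancestors * 0.5 ** meioses_per_path


def nth_cousin_relationship(n: int) -> float:
    """Full nth cousins share two ancestors, each n + 1 generations back."""
    return coefficient_of_relationship(2 * (n + 1), shared_ancestors=2)


print(coefficient_of_relationship(1, shared_ancestors=1))  # parent and child
print(nth_cousin_relationship(1))  # first cousins
print(nth_cousin_relationship(4))  # fourth cousins
```

Fourth cousins come out at 1/512, which rounds to the 0.2% quoted above.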

Just because Ashkenazi Jews are not inbred by this measure does not mean that they are not relatively genetically homogeneous person to person, and exhibit a great deal of distinctiveness. They do. But as I stated earlier, that distinctiveness is likely due to an ancient phase of cosmopolitan admixture, followed by a shift toward relatively strict endogamy imposed by their position as a marginalized minority within Islamic and Christian civilization.

