The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

The question of Italy’s population genetic structure comes up rather often for various reasons. I haven’t visited this topic in much detail since reading Consanguinity, Inbreeding, and Genetic Drift in Italy, a very old book using classical genetic techniques. L. L. Cavalli-Sforza did not find much structure in Italy at the time, but it turns out that there wasn’t enough power in the methods. I have some access to Italian data sets and I can tell you that there is a lot of variation. Sicilians in particular are mixed in ways unique outside of the Iberian peninsula. A few years ago, using the PopRes data set, Peter Ralph and Graham Coop found in The Geography of Recent Genetic Ancestry across Europe some interesting facts about Italy:

In addition to the very few genetic common ancestors that Italians share both with each other and with other Europeans, we have seen significant modern substructure within Italy (i.e., Figure 2) that predates most of this common ancestry, and estimate that most of the common ancestry shared between Italy and other populations is older than about 2,300 years (Figure S16). Also recall that most populations show no substructure with regards to the number of blocks shared with Italians, implying that the common ancestors other populations share with Italy predate divisions within these other populations. This suggests significant old substructure and large population sizes within Italy, strong enough that different groups within Italy share as little recent common ancestry as other distinct, modern-day countries, substructure that was not homogenized during the migration period. These patterns could also reflect in part geographic isolation within Italy as well as a long history of settlement of Italy from diverse sources.

There were limitations in terms of how much geographic specificity the PopRes data set provided them, so there was only so much you could say. One hypothesis could be that, unlike in much of Europe, deep local structure within the Italian peninsula predating the Roman Empire persists to this day. The Latinization of Italy during the late Republican and early Imperial period could then be thought of primarily as a matter of cultural diffusion and elite emulation. This stands to reason in part because much of the Italian peninsula was inhabited by peoples who were already speaking languages very close to Latin. But another possibility is that this deep structure exists because of more recent migrations. For example, the existence of Magna Graecia in southern Italy and Sicily was due to the migration of males from Greece in the centuries before the rise of Rome. The genetic distance of this population would be inflated due to this gene flow, and if Italian demographic history is such that gene flow across regions is low, then it would persist.

But things have changed since 2013. We know a fair amount more about European genetic history, thanks to ancient DNA. Just read Ancient human genomes suggest three ancestral populations to get a flavor. In short, it turns out that most European populations can be modeled as a three-way admixture, between one group with ancient Middle Eastern affinities, but different from modern Middle Easterners. Modern Sardinians are very close to this group. A second group are the indigenous European hunter-gatherers, who presumably expanded after the retreat of the tundra and had deeper roots in the continent, possibly at least back to the Gravettian period. Finally, a third group is a compound with a different Middle Eastern group, the European hunter-gatherer ancestry, and an ancient North Eurasian population more distant to other West Eurasians.

Most readers of this weblog are familiar with this song and dance. Now I want to submit new results from a paper in EJHG, The Italian genome reflects the history of Europe and the Mediterranean basin. A minor nit: I would assume that the Italian genome reflects the history of Europe and the Mediterranean basin! It would be really surprising if the Italian genome reflected the history of East Asia and the South China Sea!

What immediately jumped out for me about the results from this paper is that it seems clear that all non-Sardinian populations exhibit equal distance to Sardinians. That is, there is no “Sardinian-cline” in these data. Perhaps there are populations on the mainland that do exhibit a Sardinian-cline, but they haven’t been sampled in this study. What does this mean? The circumstantial evidence is strong that there was an intrusive population which arrived from the steppes and spread across Northern Europe ~4,500 years ago. The linguistic evidence tends to bind the Celtic and Italic branches of the Indo-European language family, so it seems likely that there was an intrusive population from Northern Europe that arrived sometime between 2500 BC, when the Indo-Europeans swept Northern Europe, and 500 BC, when the Italian populations start to edge into history. These people would presumably have amalgamated with the original Sardinian-like group. The best work suggests that though Sardinians have the most of this ancestry, it is still predominant in Southern Europe overall. It is curious then that the Sardinian fraction is so low, and that it is relatively even. In fact, it is lowest in the southernmost Italian groups, and highest in Lombardy! Part of this is probably because Sardinian is not the same as Sardinian-like farmer. But I still would have expected some cline (I presume the Sardinians shifted toward the mainland are due to migration from the mainland). On the other hand, there is a large north-south gradient that you can see on the admixture plot.

The plot to the left is too small to make out well, but as people allude to Italian population structure in a world-wide context, this PCA does just that. The bright green are the Southern Italians, the bright light blue the Central Italians, and the red the Northern Italians. You see that the Southern Italians are shifted toward the Middle Eastern groups, while the Northern Italians are closer to groups like the Spanish and French. To the top right are Northern European groups, in purple, and the bottom right are Mozabites, with Turks in dark green in the middle, shifted toward Italians. Sardinians occupy the far left. As you can see, contrary to a commenter earlier this week, Italians of all stripes are not that distinct from other Europeans.

But, Southern Italians, and from what I have seen in private data Sicilians in particular, are distinct because of a possible admixture signal with exotic groups you don’t normally see in Europeans. If you look in the supplements the possibility becomes clearer. There is a lot of evidence that this admixture is North African. You see this in the ADMIXTURE plots in the supplements, as well as the IBD sharing patterns. The South Italian groups are enriched with the Mozabites and Moroccans, not groups from the eastern Mediterranean. The likely period when this admixture occurred is when Sicily was an Arab emirate, from 830 to 1070. More or less Sicily was then part of the greater Maghreb. Calabria also had a Muslim presence, though more tenuous.

Finally, the authors used LD patterns and reference populations to attempt to estimate admixture times:

We found evidence of the presence of a mix of Central-Northern European and Middle Eastern-North African ancestries in the Italian individuals (Supplementary Table S5). The estimated times of admixture ranged between ~2050 and 1300 years ago (y.a.), with an average of about 1650 y.a. – assuming 29 years per generation– for Northern Italians, and between ~3000 and 1450 y.a. (~2100 y.a. on average) for Central Italians. Finally, for the Southern Italian individuals, admixture between European and Northern African-Middle Eastern ancestry was estimated to have occurred about 1000 y.a. (see Supplementary Table S5 and Supplementary Results for a complete report of significant results).

The admixture in Southern Italy is estimated to have occurred ~1000 years ago. That’s pretty much what you’d expect. These methods tend to pick up the last signal of admixture, so there may have been earlier ones (e.g., Magna Graecia?). That might explain the relatively low fraction of “Sardinian” ancestry, as this area of Italy has had significant gene flow from outside Italy over the past 2,500 years, whether from Greeks, people from other parts of the Mediterranean, or lastly Maghrebis.
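The quoted dates depend on the authors' stated assumption of 29 years per generation. LD-based dating generally yields an estimate in generations, so the calendar figure moves with the generation time you pick. Here is a minimal sketch of that arithmetic, using only the average dates quoted above; the function names and the alternative 25-year generation time are my own illustrative assumptions, not anything from the paper.

```python
# Convert the quoted admixture dates (years ago) to generations and back.
# 29 years per generation is the value stated in the quoted passage;
# the 25-year alternative below is a hypothetical value for comparison only.
YEARS_PER_GENERATION = 29


def years_to_generations(years_ago, years_per_generation=YEARS_PER_GENERATION):
    """Number of generations implied by a date in years ago."""
    return years_ago / years_per_generation


def rescale_date(years_ago, old_gen_time, new_gen_time):
    """Re-express a date under a different assumed generation time."""
    return years_ago / old_gen_time * new_gen_time


# Average dates as quoted from the paper (years ago).
quoted_dates = {
    "Northern Italians": 1650,
    "Central Italians": 2100,
    "Southern Italians": 1000,
}

for group, years in quoted_dates.items():
    gens = years_to_generations(years)
    alt = rescale_date(years, YEARS_PER_GENERATION, 25)
    print(f"{group}: {years} y.a. = {gens:.1f} generations "
          f"(or {alt:.0f} y.a. at 25 years per generation)")
```

The point is only that a few years' difference in assumed generation time shifts these dates by a century or more, which is one reason not to lean too hard on matching them to specific historical events.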

The difference between Northern and Central Italians is intriguing. The reference populations are not optimal, and the dates have a wide interval. We actually know what was happening 2,100 years ago in Central Italy, and there was no admixture between Middle Eastern and Northern European groups. The Roman world empire was still in a nascent state. The Northern Italian admixture date might align with a German migration into Italy, or perhaps the Gauls in the centuries earlier. I really don’t know. I am inclined to suggest that the Central Italian signal might somehow be lowballing the Indo-European admixture.

The authors say that their data will be released. But I looked up the accession number, and it’s not up there yet.

• Category: Science • Tags: Genomics, Italy 

Since the Ralph & Coop paper on IBD patterns across Europe I’ve been keen to see what gets uncovered about Italy. Recall, if you will, that in that paper the authors noted that Italy in particular, among European nations, exhibits a lot of deep population structure. Whereas the network of descent ties together many European nations and regions, in Italy there are deep regional differences which seem to go back to antiquity. Additionally, more recently Sardinia has come under focus as possibly particularly informative in the ethnogenesis of European peoples. Until recently I was moderately skeptical of the utility of Sardinian samples in the HGDP data set. After all, it was an isolated island, and perhaps subject to peculiarities of low effective population size. Well, it turns out that it may be that modern Sardinians are the best approximation we have today to Southern Europeans ~5,000 years ago.

A new paper in PLoS ONE has a huge sample of Italians, and applies standard techniques to ascertain population structure. An Overview of the Genetic Structure within the Italian Population from Genome-Wide Data:

In spite of the common belief of Europe as reasonably homogeneous at genetic level, advances in high-throughput genotyping technology have resolved several gradients which define different geographical areas with good precision. When Northern and Southern European groups were considered separately, there were clear genetic distinctions. Intra-country genetic differences were also evident, especially in Finland and, to a lesser extent, within other European populations. Here, we present the first analysis using the 125,799 genome-wide Single Nucleotide Polymorphisms (SNPs) data of 1,014 Italians with wide geographical coverage. We showed by using Principal Component analysis and model-based individual ancestry analysis, that the current population of Sardinia can be clearly differentiated genetically from mainland Italy and Sicily, and that a certain degree of genetic differentiation is detectable within the current Italian peninsula population. Pair-wise FST statistics Northern and Southern Italy amounts approximately to 0.001 between, and around 0.002 between Northern Italy and Utah residents with Northern and Western European ancestry (CEU). The Italian population also revealed a fine genetic substructure underscoring by the genomic inflation (Sardinia vs. Northern Italy = 3.040 and Northern Italy vs. CEU = 1.427), warning against confounding effects of hidden relatedness and population substructure in association studies.

The number of SNPs is rather good for the tasks which they attempted. My personal experience is that for clustering algorithms like ADMIXTURE or PCA you’re hitting diminishing returns >100,000, if you are looking at intra-national differences. And the sample size is rather large, though the authors admit that they could have had denser coverage of central Italy. For Italy they pooled a lot of data sets, including from biomedical studies. Naturally they also took in the HGDP and HapMap Italians.

On some methodological notes, the PCA is really hard to read. I’m not quite sure if the labeling is correct (see figure 1 to check me here). So I’ll just report the ADMIXTURE results. I looked at the methods, and I do have some concerns here. I am not clear if they ran ADMIXTURE for K = 2 to 10 more than once. The reality is that you should. That’s because ADMIXTURE is sensitive to the value of the seed parameter (you should change it from the default and allow it to be generated pseudo-randomly from the computer’s time), and when you do statistical checks such as cross-validation that value itself can vary across runs! What I’m saying is that one run of ADMIXTURE may tell you that K = 4 is the best fit, but another run may tell you that K = 6 is the best fit. It’s happened to me. I once ran a data set up to K = 20, 20 times over, and the cross-validation values themselves exhibited considerable variation across runs depending upon the K (there were some K’s, though, where the value seemed extremely stable, so I was more confident of the fit of that K).
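To make the replicate-run check concrete, here is a rough sketch of how one might tabulate cross-validation error across several runs and count how often each K comes out best. It assumes each run's log contains a line of the form `CV error (K=3): 0.55248`; the function names are mine, and the numbers in `example_runs` are invented purely for illustration, not results from any real data set.

```python
import re
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Assumed shape of the cross-validation line in each run's log.
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def parse_cv_error(log_text):
    """Return (K, cv_error) from one log, or None if no CV line is present."""
    match = CV_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))


def summarize(runs):
    """runs maps a run id (e.g. a seed) to a list of log texts, one per K.

    Returns per-K (mean, population std dev) of CV error across runs,
    and a Counter of how many runs picked each K as the lowest-error fit.
    """
    errors_by_k = defaultdict(list)
    best_k_counts = Counter()
    for logs in runs.values():
        parsed = [p for p in (parse_cv_error(text) for text in logs) if p]
        for k, err in parsed:
            errors_by_k[k].append(err)
        if parsed:
            best_k = min(parsed, key=lambda pair: pair[1])[0]
            best_k_counts[best_k] += 1
    stats = {k: (mean(v), pstdev(v)) for k, v in sorted(errors_by_k.items())}
    return stats, best_k_counts


# Invented values: three replicate runs with different seeds, K = 2 to 4.
example_runs = {
    "seed_a": ["CV error (K=2): 0.560", "CV error (K=3): 0.552", "CV error (K=4): 0.551"],
    "seed_b": ["CV error (K=2): 0.560", "CV error (K=3): 0.550", "CV error (K=4): 0.553"],
    "seed_c": ["CV error (K=2): 0.561", "CV error (K=3): 0.553", "CV error (K=4): 0.549"],
}

stats, best = summarize(example_runs)
for k, (avg, sd) in stats.items():
    print(f"K={k}: mean CV error {avg:.4f}, sd {sd:.4f}, best in {best[k]} run(s)")
```

In this toy input the "best" K flips between 3 and 4 depending on the seed, which is exactly the situation where a single run would give false confidence. A K whose CV error barely moves across seeds is the one to trust more.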

Also, there was one passage which makes me a little curious as to how clearly the authors understand the clustering techniques being used here, and what it tells us (and does not tell us):

The average admixture proportions for Northern European ancestry within current Sardinian population is 14.3% with some individuals exhibiting very low Northern European ancestry (less than 5% in 36 individuals on 268 accounting the 13% of the sample).

I’d be careful of labeling a modal component in Northern Europeans “Northern European ancestry.” I’ve posted on enough topics related to this to illustrate how easy it is to generate statistical artifacts which have little correspondence to the real biological world. It’s one thing when you have two populations which are genetically very distinct, and clustering in a disjoint fashion almost immediately. For example, Africans and Europeans. But when you have intra-European variation, and the clusters don’t distribute in an exclusive fashion, one should be wary of reifying them into real populations. “Northern European modal cluster” may not roll off the tongue, but it has the benefit of being precise and not false.

So what about the results? Nothing too surprising; I invite you to peruse the figures and read the supplements yourself. I did note that the evidence of intra-Italian migration is very obvious in these results. People whose geographic origins are in the north often cluster with southerners (i.e., the southern cluster), but people whose origins are in the south rarely seem to cluster with northerners. In the 20th century there were massive flows of migration from the Italian south to northern cities like Turin, while Mussolini encouraged the migration of southerners to the German-speaking regions of the northeast. In contrast, few northerners headed south. In short, many people in northern Italy have grandparents or great-grandparents who left southern Italy. Far fewer southern Italians have grandparents or great-grandparents who left northern Italy (though they do exist; I actually met a young man recently whose mother was a Neapolitan whose parents were from the Veneto). Additionally, I’m curious about the fact that Sardinians seem to exhibit some level of genetic homogeneity. This surprises many people because of the history of Sardinia, under Carthaginian, Roman, and Vandal rule. I have a simple explanation for what’s going on: the coasts of Sardinia are malarial. The modern population of Sardinia are the descendants of the indigenous mountaineers, who repopulated the coastal cities periodically.

I want to note that if you look at the ADMIXTURE runs the Mozabites have nearly as much of the Sardinian modal component as mainland Italians. This doesn’t mean equal genetic distance; the Mozabite dominant cluster has a higher distance. But, it does suggest to me that it may be that in the Copper Age the western Mediterranean was dominated by a Sardinian-like population, which later was displaced and assimilated by newcomers.

Finally, I have no idea where to get this data. That’s sad, since it is so large a set. But I specifically noted the biomedical origin of some of the data because I suspect that’s going to make it difficult to get it into the public domain.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Anthropology, Genetics, Genomics, Italy 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"