Eunjung Han, Peter Carbonetto, Ross E. Curtis, Yong Wang, Julie M. Granka, Jake Byrnes, Keith Noto, Amir R. Kermany, Natalie M. Myres, Mathew J. Barber, Kristin A. Rand, Shiya Song, Theodore Roman, Erin Battat, Eyal Elyashiv, Harendra Guturu, Eurie L. Hong, Kenneth G. Chahine & Catherine A. Ball
Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies. …
From north to south: Northeast and Utah, Pennsylvania, Lower Midwest and Appalachians, Upland South, and Lower South. In other words, these are the American-Americans, largely English, Scottish, Welsh, Irish, and German in origin and mixing for hundreds of years.
which we describe as assimilated immigrant clusters, account for a large portion (60%) of the IBD [identity-by-descent] network and exhibit a markedly different profile. Lacking distinctive affiliations to non-US populations, they show almost no differentiation in allele frequencies (FST at most 0.001; Supplementary Table 5) and high levels of IBD to non-cluster members (Supplementary Data 2), suggestive of high gene flow between these clusters. Moreover, few members of these clusters could be assigned to a stable subset, indicating that this clustering is largely driven by continuous variation in IBD.
Genealogical data reveal a north-to-south trend (Fig. 5), most consistently east of the Mississippi River (Fig. 3). These findings imply greater east-west than north-south gene flow, which is broadly consistent with recent westward expansion of European settlers in the United States, and possibly somewhat limited north-south migration due to cultural differences.
Michael Barone points out that 19th-century railroad men loved building east-west railroads, and built more of them than were profitable, probably because they had cultural and political connections along particular latitudes. In contrast, a rare north-south railroad, the Illinois Central, made huge profits due to lack of competition.
While the precise numbers and boundaries of these clusters are not necessarily meaningful and may be partly driven by the assumption that inter-cluster connectivity follows a random graph model, these findings demonstrate that isolation-by-distance, and specifically geography in the continental United States, can be captured from IBD alone.
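For readers wondering what a "random graph model" has to do with finding clusters, here is a minimal sketch. It assumes the clustering objective is something like Newman's modularity, which scores a partition by comparing the edge weight inside each cluster to what a degree-preserving random graph would put there. That reading is my assumption, not something the excerpt spells out, and the toy graph and every name in the code are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected weighted graph.

    edges:     iterable of (u, v, weight) tuples, one per undirected edge
    community: dict mapping each node to its cluster label
    """
    degree = defaultdict(float)   # weighted degree of each node
    within = defaultdict(float)   # edge weight falling inside each cluster
    total = 0.0                   # total edge weight m
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
        total += w
        if community[u] == community[v]:
            within[community[u]] += w
    if total == 0:
        return 0.0

    # Sum of weighted degrees per cluster.
    cluster_degree = defaultdict(float)
    for node, d in degree.items():
        cluster_degree[community[node]] += d

    # Observed within-cluster fraction minus the fraction expected if
    # edges were rewired at random while keeping every node's degree.
    return sum(
        within[c] / total - (d / (2 * total)) ** 2
        for c, d in cluster_degree.items()
    )


# Hypothetical toy "IBD network": two triangles joined by a single edge.
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in two_clusters}

print(modularity(edges, two_clusters))  # positive: the split beats chance
print(modularity(edges, one_cluster))   # zero: no information in one big blob
```

The point of the sketch is that where you draw cluster boundaries depends on the random baseline you compare against, which is why the authors caution that the exact number and edges of the clusters shouldn't be taken too literally.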
These five groups are basically the famous four groups in historian David Hackett Fischer’s Albion’s Seed: Four British Folkways in America (New England Puritans, Pennsylvanian Midlanders, Scots-Irish highlanders, and Lowland Southerners), but with, I believe, Fischer’s Scots-Irish split up by the genome folks into two groups: “Lower Midwest and Appalachians” and “Upland South.”