Citation: Decker JE, McKay SD, Rolf MM, Kim J, Molina Alcalá A, et al. (2014) Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle. PLoS Genet 10(3): e1004254. doi:10.1371/journal.pgen.1004254
I am a man of a particular age, old enough to remember when the idea of thousands of what were then quaintly termed ‘molecular markers’ would have left one aghast at the surfeit of data. Today the term “post-genomic” strikes me as almost as anachronistic as the “information superhighway.” This is not the post-genomic era; it just is. The wildest dreams that were, now are. But the glorious present of data abundance is not without its limitations and pitfalls. As a friend explained once, bioinformaticians just “do stuff,” sometimes without understanding why they do stuff. Somewhere along the way the bio part seems to have been forgotten in the hurry to assemble the next organism, as the machine demands more and more for its hungry maw. But the mechanical monster slurping through the fire hose of data with a hacked-together chimera of a regular expression isn’t without some purpose. Many biologists with an interest in evolution have a dream of dense markers painting vast swaths of the tree of life, an empire of phylogenetic information to be conquered.
But these vistas need some context, a horizon of information about the organism. This came to mind when I read Jared Decker’s new paper on the phylogenetics of domestic cattle, Worldwide Patterns of Ancestry, Divergence, and Admixture in Domesticated Cattle. In many ways it is a straightforward paper. You can see discussions of the earlier iterations over at Haldane’s Sieve (the preprint process seems to have worked to make it a more robust and clear publication, from what I can tell!). Decker utilizes some straightforward methods (at least straightforward in 2014), in particular TreeMix, Admixture, and PCA, on a very large SNP marker data set with expansive geographic coverage. With ~40,000 SNPs these packages should blast through the data rather quickly (I’ve used all of them at this marker density, with sample sizes approximately as large as Decker’s).
You can read the whole paper yourself since it is open access. To me it seems to reiterate that cattle truly are cattle, to be pulled and prodded and traded at the whim of human beings. The fact that many East African cattle have predominantly Indian heritage (one of the two major clades) illustrates that domestic animals exhibit the protean tendencies of human culture, rather than following the pattern of other biological organisms, which are governed by standard geographical and morphological diversification through conventional population genetic pressures. But I still have to admit that much of the narrative force of this paper escapes me because I lack an understanding of cattle at a level beyond the plainly statistical genetic. In other words, the organism matters. Cattle geneticists who may “hum through” the plots may still be able to grasp the force of the argument with greater clarity because their understanding of the topic is fundamentally thicker than that of outsiders. Many of the paper’s inferences from genetic data clearly draw their plausibility from elements of natural history which bovine biologists would take for granted.
And this is just the beginning. Over the next decade it seems inevitable that the clusters at the heart of “genomics cores” across the world will be gorging on whole sequences of thousands of individuals for many organisms. It will be a “flood the zone” era for attempting to understand the tree of life. An army of bioinformaticians will be thrown at the data in human waves, absorbing shock after shock, slowly transforming the ad hoc kludge pipelines of the pre-Model T era of genomics into simpler turnkey solutions. And then the biology will come back to the fore, and the deep wellspring of knowledge held by those who focus on specific organisms is going to be the essence of the enterprise once more.