The Unz Review - Mobile

The Unz Review: An Alternative Media Selection

A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

 Gene Expression Blog

Obviously I’m doing more development right now than I would have expected. But in the long term I want to move beyond hacking to survive for the present, and write some code that’s sustainable. So I think I want to read a design patterns book. The last one I read was 15 years ago and I don’t really have much retention of it. I’m particularly interested in stuff geared toward Python (the language I’m starting to get comfortable in right now).

Readers with recommendations are invited to weigh in. I know I have a fair number of software engineers in the readership, so I’m asking for your thoughts and suggestions. Perhaps the classic from the GoF is still the way to go? Remember, I’m not a software engineer who works on scientific data, I’m a scientist who sometimes needs to do a little engineering and data analysis.

Note:
Sean, if you take this as an opportunity to leave a long-winded comment about sexual selection and blond women, again, I’m going to have to finally ban you!

 
• Category: Miscellaneous • Tags: Engineering

Over at The Genetic Literacy Project Jon Entine has a post up, Usain Bolt’s Olympic gold proves again why no Asian, white–or East African–will ever be crowned world’s fastest human. Fifteen years ago Jon wrote Taboo: Why Black Athletes Dominate Sports And Why We’re Afraid To Talk About It, so he knows something about this topic.

Actually, I think Jon is wrong on this. I suspect that, with better drugs and biological engineering, at some point in the near future the fastest “human” alive is going to be non-African, and, if I had to bet, Chinese. But you know what Jon meant.

There is a lot of detail in Jon’s post because he knows a lot about this topic. But at the end of the day the specific details are less important than the general theoretical framework, which makes it unsurprising that a single group of humans who are genetically related dominate sprinting. First, unlike figure skating, sprinting is entirely objective. All that matters are physical inputs. Second, unlike swimming, which is also objective, sprinting seems to have pushed very close to the boundaries of what individuals who are neither modified nor drug-enhanced are capable of. To my knowledge there’s no expectation of a Fosbury Flop in sprinting.

Therefore, sprinting is selecting for raw ability. Training is not irrelevant, but the issue with training is that others can train too. What can’t be mimicked is raw ability due to one’s biological aptitudes and abilities (again, excepting bioengineering). Let’s assume that Olympic caliber sprinters are among the 10,000 fastest humans on the planet, because not all people with the aptitudes become sprinters. Assuming a normal distribution, that’s about five standard deviations above the human norm. I suspect I’m being conservative. Someone like Usain Bolt is probably a six standard deviation unit human. Google tells me that a fit human can run the 100 meter dash in 13.5 seconds. The world record is about 9.5 seconds. The absolute range here is not incredibly large. Small differences in the mean across populations suggest that when you select for extreme individuals those small differences will make all the difference.
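The back-of-the-envelope arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the world population figure (7.4 billion) and the 0.2 standard deviation shift between two hypothetical populations are my own placeholders, not numbers from the post, and the trait is treated as exactly normal with equal variance in both groups.

```python
import math
from statistics import NormalDist

WORLD_POPULATION = 7.4e9  # assumption: rough mid-2010s figure, not from the post
ELITE_COUNT = 10_000      # from the post: the 10,000 fastest humans


def upper_tail(z):
    """Fraction of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# How many standard deviations above the mean is the cutoff for the elite group?
elite_fraction = ELITE_COUNT / WORLD_POPULATION
z_cutoff = NormalDist().inv_cdf(1 - elite_fraction)

# Hypothetical second population whose mean is shifted up by 0.2 SD.
# Its members need to clear a cutoff that is 0.2 SD lower in their own units.
MEAN_SHIFT = 0.2  # assumption for illustration only
tail_ratio = upper_tail(z_cutoff - MEAN_SHIFT) / upper_tail(z_cutoff)

print(f"cutoff: {z_cutoff:.2f} standard deviations above the mean")
print(f"per-capita representation above the cutoff: {tail_ratio:.1f}x")
```

Under these assumptions the cutoff comes out a bit under five standard deviations, and a mean difference that would be invisible in everyday life translates into a severalfold difference in per-capita representation beyond the cutoff. Pushing the cutoff further out makes the ratio larger still.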

If sprinting were less objective, then there would probably be more equality in outcome. I suspect judges would be biased for various reasons, and one set of nations or people of a particular ethnic background dominating a field can get quite embarrassing. But sprinting is rather objective, and the socioeconomic obstacles are low. Given basic nutrition, and the ability to huff it, you have a shot. What matters is the magnitude of your ability.

One peculiar thing population genetics teaches us is that non-adaptive traits are more heritable. This is due to the fact that selection tends to remove variation, selecting for fitter individuals. Humans are good runners; there are entire evolutionary theories based around our biomechanical modifications and adaptations. But there’s really no benefit in running in bursts of 10.5 in the 100 meter dash vs. 9.5. We’re not that sort of ambush predator. There’s probably some heritable variation in burst ability, but it’s small, and not visible in any normal set of tasks among large groups of humans.

But modern competitive sports at the Olympic level is not selecting for normality, it’s selecting from outliers. It isn’t that West Africans were guaranteed to be the best sprinters, it’s just that a priori it shouldn’t be surprising that in such a non-adaptively beneficial trait as running a few seconds faster in the 100 meter dash some populations had the genetic die loaded in their direction.

Note that I’m not denying any sort of selective or adaptive argument. There’s a fair amount of evidence that there is some selection in favor of greater height in Northern Europeans vs. Southern Europeans, which probably explains why Lithuanians are more prominent in basketball in relation to their numbers than Italians. But the selection wasn’t for basketball, and the fact that there is heritable variation suggests that selection wasn’t that strong and unidirectional….

Humans vary. Populations vary too. When you select from the tails of the distribution, the differences between populations are going to be very noticeable. If a sport is objective, and pushing its limits, it will select from the tails of the distribution.

 
• Category: Race/Ethnicity, Science • Tags: Race, Sports

Sabine Hossenfelder on her side gig as a physics consultant, What I learned as a hired consultant to autodidact physicists:

Sociologists have long tried and failed to draw a line between science and pseudoscience. In physics, though, that ‘demarcation problem’ is a non-problem, solved by the pragmatic observation that we can reliably tell an outsider when we see one. During a decade of education, we physicists learn more than the tools of the trade; we also learn the walk and talk of the community, shared through countless seminars and conferences, meetings, lectures and papers. After exchanging a few sentences, we can tell if you’re one of us. You can’t fake our community slang any more than you can fake a local accent in a foreign country.

I haven’t learned any new physics in these conversations, but I have learned a great deal about science communication. My clients almost exclusively get their information from the popular science media. Often, they get something utterly wrong in the process. Once I hear their reading of an article about, say, space-time foam or black hole firewalls, I can see where their misunderstanding stems from. But they come up with interpretations that never would have crossed my mind when writing an article.

I’ve been blogging since 2002. Like Sabine I can often tell if someone has a scientific background after a few sentences, especially if they are biologists of some sort. As for the rest, the chasm is between the intelligent vs. not so intelligent, and it is usually pretty clear too. Mostly the intelligent have liberal arts or social science backgrounds, but have the basic analytic tools to decompose problems at the most general levels. The less intelligent tend to speak in simple formulas when coherent, and devolve into total incomprehensibility when they attempt originality.*

The second issue is a somewhat different one from physics. Usually at a given moment there is a topic of particular interest to the media. Evo-devo and epigenetics come to mind. These are real scientific fields of inquiry. But because of disproportionate media attention to these sorts of topics, those who rely on popularizations for their science knowledge will usually assume that evo-devo and epigenetics have “revolutionized” our understanding of evolution and genetics, when in reality these are still developing areas, whose ultimate impact is to be determined.

In fact, I’d take this further: the area of evolutionary genetics has arguably not been “revolutionized” since the 1970s, with the theoretical and empirical debates triggered by allozyme work and the neutralist-selectionist debates. All the rest, including genomics, is just commentary.

* Here is a good example, the stupid reader who was explaining to me patiently how splicing and gene regulation “disprove” heritability estimates. I dismissed them, but the reality is that I’m 99% sure that that reader thinks I’m an idiot as well.

 
• Category: Science • Tags: Science

Spencer Wells, along with many others, such as Jared Diamond, argued in his book Pandora’s Seed that agriculture was a disaster in terms of what it wrought for the quality of life of the average human. This is broadly plausible to me. On the other hand, I also think it is highly likely that agriculture and civilization were basically inevitable.

The “great leap forward” in cultural complexity and explosion of symbolic expression ~50,000 years ago, give or take, seems likely to have been only the culmination of a process of encephalization and increased sophistication which had proceeded over millions of years. The precursors to the agricultural life were likely already there before the Holocene.

To a great extent the hypothesis of inevitability has been tested: in the Americas much of the dynamics which characterize the Old World were recapitulated. Agriculture, civilizations with writing and class stratification, and monumental architecture, all with analogs in the Old World, are there. In fact, this National Geographic piece, In Search of the Lost Empire of the Maya, is fascinating to read, because it seems to me that it likely parallels developments in the Old World two thousand years before. The Snake Kings were warlords in a manner which would have been familiar to the “Great Kings” of the ancient Near East.

There are two great schools of history from the pre-modern era. Those which are cyclical, and those which exhibit some intuition that there is an endpoint or progress. The “independent” experiments of human history suggest that both are true, with an arc of history on the macroscale scaffolded by innumerable cycles of rise and fall.

 
• Category: History • Tags: History

I just bought my friend Sanjeev Sanyal’s book, The Ocean of Churn: How the Indian Ocean Shaped Human History. Sanjeev is a polymath with varied interests, some of which intersect with my own. A few years back I had the pleasure of having dinner with him and Reihan Salam, and the server kept unapologetically offering the wrong person the alcohol that they ordered. I don’t think we look that much alike!

The top start-up mecca in America is far from Silicon Valley. It’s cheap to live here, and fun for young brogrammers. Also, not too far a flight away from elsewhere. As Mark Krikorian observed on Twitter, being a blue bubble in a red state means that Austin can take advantage of low cost and low tax public policies, while maintaining a culturally liberal social aesthetic.

Bought Python Essential Reference.

Hubby and Lewontin on Protein Variation in Natural Populations: When Molecular Genetics Came to the Rescue of Population Genetics.

Sausage Party is a surprising mix of high and lowbrow.

Update: If I don’t post your initial comment, posting five additional times won’t result in your comment being posted.

 
• Category: Miscellaneous • Tags: Open Thread

A friend recently emailed to ask about the best way to pick a proper “K” value when inferring structure. K is just the parameter which defines how many putative ancestral populations you have in your model to explain some data on genetic variation. Obviously some values of K are more informative than others about population history.

For example, if you had 100 Swedes and 100 Yoruba Nigerians, to model the population structure you could select K = 2 or K = 50. The algorithm would produce results in the latter case, but you “know” a priori that K = 2 is a really good model of the population history in a straightforward, interpretable sense. There’s just not that much more juice to squeeze out of this sort of data with many clustering methods.

But it’s harder when you have population structure in organisms which we don’t know much about aside from the genetic data. How does one “objectively” select a K? The most common method is outlined in a 2005 paper, Detecting the number of clusters of individuals using the software structure: a simulation study:

The identification of genetically homogeneous groups of individuals is a long standing issue in population genetics. A recent Bayesian algorithm implemented in the software structure allows the identification of such groups. However, the ability of this algorithm to detect the true number of clusters (K) in a sample of individuals when patterns of dispersal among populations are not homogeneous has not been tested. The goal of this study is to carry out such tests, using various dispersal scenarios from data generated with an individual-based model. We found that in most cases the estimated ‘log probability of data’ does not provide a correct estimation of the number of clusters, K. However, using an ad hoc statistic ΔK based on the rate of change in the log probability of data between successive K values, we found that structure accurately detects the uppermost hierarchical level of structure for the scenarios we tested. As might be expected, the results are sensitive to the type of genetic marker used (AFLP vs. microsatellite), the number of loci scored, the number of populations sampled, and the number of individuals typed in each sample.

There’s an old saying, “garbage in, garbage out.” The method of ΔK is useful as far as it goes, but as inputs it takes the log likelihoods from the Structure program. For Admixture you can look at cross-validation. But these statistics are subject to various assumptions and approximations (in addition, some of the priors within the clustering algorithms are gross simplifications).
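To make the ΔK idea concrete, here is a minimal Python sketch. It assumes the commonly used form of the statistic, the absolute second-order difference of the mean log probability across successive K values, divided by the standard deviation of the log probabilities at that K across replicate runs. The function name and every number in the example are hypothetical, made up purely for illustration; they are not Structure output.

```python
from statistics import mean, stdev


def delta_k(log_probs):
    """Compute an ad hoc Delta-K style statistic for each interior K.

    log_probs maps K -> list of ln P(D|K) values from replicate runs.
    Assumption: the second difference is taken on the per-K means.
    """
    means = {k: mean(values) for k, values in log_probs.items()}
    result = {}
    for k in sorted(log_probs):
        # Delta-K needs both neighbours, so the smallest and largest K drop out.
        if k - 1 not in means or k + 1 not in means:
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / stdev(log_probs[k])
    return result


# Hypothetical replicate log probabilities for K = 1..4.
example_runs = {
    1: [-5000.0, -5010.0, -4990.0],
    2: [-4200.0, -4205.0, -4195.0],
    3: [-4150.0, -4160.0, -4140.0],
    4: [-4140.0, -4170.0, -4110.0],
}

scores = delta_k(example_runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Note that the statistic is undefined for the smallest and largest K tested, so by construction it can never pick K = 1. And since its only inputs are the log probabilities themselves, any problem with those values carries straight through.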

This is one reason I was excited about Estimating the Number of Subpopulations (K) in Structured Populations:

A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favor of a particular value of K cannot usually be computed exactly, and instead programs such as Structure make use of heuristic estimators to approximate this quantity. We show—using simulated data sets small enough that the true evidence can be computed exactly—that these heuristics often fail to estimate the true evidence and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology we demonstrate the effectiveness of this approach, using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Finally, we test our solution in a reanalysis of a white-footed mouse data set. The TI methodology is implemented for models both with and without admixture in the software MavericK1.0.

The website for MavericK 1.0 is informative if you don’t have academic access.

Unfortunately, and probably not surprisingly, this method is not scalable to genomic data sets. E.g., they’re looking at 10, 20 or 50 loci. A “modest” human genotyping array will provide you with tens of thousands of loci (SNPs). A “standard” array will provide you with on the order of 500,000 SNPs.

But the conclusion of the paper is worth keeping in mind:

Finally, it is important to keep in mind that when thinking about population structure, we should not place too much emphasis on any single value of K. The simple models used by programs such as Structure and MavericK are highly idealized cartoons of real life, and so we cannot expect the results of model-based inference to be a perfect reflection of true population structure (see discussion in Waples and Gaggiotti 2006). Thus, while TI can help ensure that our results are statistically valid conditional on a particular evolutionary model, it can do nothing to ensure that the evolutionary model is appropriate for the data. Similarly—in spite of the results in Table 2—we do not advocate using the model evidence (estimated by TI or any other method) as a way of choosing the single “best” value of K. The chief advantage of the evidence in this context is that it can be used to obtain the complete posterior distribution of K, which is far more informative than any single point estimate. For example, by averaging over the distribution of K, weighted by the evidence, we can obtain estimates of parameters of biological interest (such as the admixture parameter a) without conditioning on a single population structure. Although one value of K may be most likely a posteriori, in general a range of values will be plausible, and we should entertain all of these possibilities when drawing conclusions.

Amen!
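The point about using the evidence to get a full posterior over K, rather than a single “best” value, can be sketched as follows. This is a toy illustration, not MavericK’s implementation: the log-evidence values and the per-K parameter estimates are hypothetical, and a uniform prior over K is assumed.

```python
import math


def posterior_over_k(log_evidence):
    """Turn log model evidence per K into posterior probabilities.

    Assumes a uniform prior over the K values supplied. Subtracting the
    maximum before exponentiating avoids numerical underflow.
    """
    largest = max(log_evidence.values())
    weights = {k: math.exp(v - largest) for k, v in log_evidence.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# Hypothetical log-evidence values for K = 1..4.
log_evidence = {1: -5200.0, 2: -4210.0, 3: -4209.0, 4: -4212.0}
posterior = posterior_over_k(log_evidence)

# Hypothetical per-K estimates of some parameter of biological interest.
estimate_given_k = {1: 0.00, 2: 0.10, 3: 0.14, 4: 0.20}

# Model-averaged estimate: weight each K's estimate by its posterior probability.
averaged = sum(posterior[k] * estimate_given_k[k] for k in posterior)

for k in sorted(posterior):
    print(f"K={k}: posterior probability {posterior[k]:.3f}")
print(f"model-averaged estimate: {averaged:.3f}")
```

In this made-up example K = 3 is the most probable value, but K = 2 retains substantial weight, which is exactly the situation where reporting a single K would hide real uncertainty.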

 
• Category: Science • Tags: K, Structure

Evolutionary theory famously predated the emergence of genetics by decades. Initially there was some conflict between the heirs of Charles Darwin and the first geneticists in terms of their mechanistic understanding of how the evolutionary process occurs. Within a few decades though genetics and evolutionary biology were synthesized so that the former came to be integral toward understanding the processes and parameters which shape the character of the latter (see The Genetical Theory of Natural Selection). E.g., imagine attempting to understand the origins and maintenance of sexual reproduction without any genetic understanding of the determination of sex and its implications for transmission.

But obviously genes are not everything when it comes to phenotypes. In particular with humans, there are complex behaviors and social interactions which seem to be persistent, and perhaps adaptive, which may not be directly contingent upon any simple genotype-phenotype map. This is not to say that cultural and behavioral traits have no genetic basis. To give an example, religion is a complex phenomenon which is both universal and does not seem directly encoded in one’s genes. The search for a “god gene” is futile, because religion as a phenotype is mediated by innumerable other phenotypes, which themselves have complex genetic bases.

Though culture is contingent upon genes, it exhibits a character which is separable from genetic evolution. In particular, dual inheritance theory explicitly acknowledges that human cultural variation over time and space is a function of the interaction between both cultural and genetic evolution. Though there are similarities between the two, and in fact the field of cultural evolution consciously utilizes much of the same formalism as population and quantitative genetics, the modes of inheritance and nature of the origination and perpetuation of variation of the two differ a great deal.

As a rule of thumb you can posit that genetic evolution is relatively slow and torpid in relation to cultural evolution, which is protean and quicksilver. Consider that lactase persistence and high altitude adaptations are the two fastest cases we know of in human genetics, and they occur on 1,000 year time scales. A 1,000 year time scale takes you from Julius Caesar to Otto the Great. It takes you from the first of the Mycenaeans to the Athens of Pericles.

The differences between culture and genes are important to keep in mind when one is making predictions. I’m a big fan of the Eric Kaufmann book, Shall the Religious Inherit the Earth?: Demography and Politics in the Twenty-First Century. The model outlined within the book, higher fertility for religious people, ergo, the reemergence of religion, is logically plausible. But I always must remind people that the same concerns were prevalent in France before 1850, with the arrival of more traditional Roman Catholics into a milieu which had notably secularized and undergone early demographic transition. Why is France today not a uniformly Catholic republic? First, there is history: the migration of Muslims from North Africa. But even more important is cultural evolution, as the descendants of Spaniards, Poles, and Italians secularized.

There is though a difference between description and formal modeling. The field of cultural evolution attempts to do the latter. There are several lay and specialist introductions to the field (just click some of the book links and you’ll find them all). It’s worth attempting to grapple with the domain in a more systematic way, because that’s the only way you can make predictions which make sense of the diversity we see around us.

A new preprint is an interesting addition to the literature, Gene-culture co-inheritance of a behavioral trait:

Human behavioral traits are complex phenotypes that result from both genetic and cultural transmission. But different inheritance systems need not favor the same phenotypic outcome. What happens when there are conflicting selection forces in the two domains? To address this question, we derive a Price equation that incorporates both cultural and genetic inheritance of a phenotype where the effects of genes and culture are additive. We then use this equation to investigate whether a genetically maladaptive phenotype can evolve under dual transmission. We examine the special case of altruism using an illustrative model, and show that cultural selection can overcome genetic selection when the variance in culture is sufficiently high with respect to genes. Finally, we show how our basic result can be extended to nonadditive effects models. We discuss the implications of our results for understanding the evolution of maladaptive behaviors.

The most relevant section is probably 3.2 Model 2: Cultural prisoner’s dilemma. If you don’t know what the Price Equation is, read the original paper. It will induce some clarity.
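For readers who want a feel for the Price equation before opening the paper, here is a small numerical check in Python. It uses the classic single-inheritance form, in which the change in the mean trait equals a selection term, Cov(w, z) / mean(w), plus a transmission term, E(w * dz) / mean(w). This is not the preprint’s dual-inheritance extension, and the trait values, fitnesses, and transmission biases below are hypothetical.

```python
def price_terms(z, w, dz):
    """Return the (selection, transmission) terms of the classic Price equation.

    z  : trait value of each parent
    w  : fitness (number of offspring) of each parent
    dz : average difference between a parent's offspring and the parent
    """
    n = len(z)
    w_bar = sum(w) / n
    z_bar = sum(z) / n
    # Population covariance between fitness and trait value.
    cov_wz = sum(wi * zi for wi, zi in zip(w, z)) / n - w_bar * z_bar
    selection = cov_wz / w_bar
    transmission = (sum(wi * dzi for wi, dzi in zip(w, dz)) / n) / w_bar
    return selection, transmission


def direct_change(z, w, dz):
    """Change in mean trait computed directly from the offspring generation."""
    offspring_mean = sum(wi * (zi + dzi) for wi, zi, dzi in zip(w, z, dz)) / sum(w)
    return offspring_mean - sum(z) / len(z)


# Hypothetical population of four parents.
traits = [0.0, 0.0, 1.0, 1.0]
fitness = [1.0, 2.0, 3.0, 2.0]
bias = [0.0, 0.1, -0.2, 0.0]

selection, transmission = price_terms(traits, fitness, bias)
print(selection, transmission, direct_change(traits, fitness, bias))
```

The two terms always sum to the directly computed change, because the equation is an identity rather than a model. The modeling content comes from what one assumes about fitness and transmission, which is where the gene-culture conflict in the preprint enters.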

The fact that more variance in culture in relation to genes allows for selection to act more powerfully on culture, and arguably in a maladaptive manner from the gene-centric perspective, is no surprise. This preprint adds more precision and clarity. For adaptation to occur there needs to be heritable variation. One reason that cultural group selection is more plausible than genetic group selection is that genetic variation across demes is often very low. The Fst between racial groups may be 0.10 to 0.30, but it is not very common for such Fst values to be realized between two groups genuinely in competition. More often neighboring populations have much lower Fst values, though ancient DNA is suggesting that 0.05 to 0.10 values were maintained in some areas 5 to 10 thousand years ago. A simple population genetic rule of thumb is that there needs to be fewer than one migrant per generation between two populations for the genetic differentiation between them to increase, rather than decrease. In other words, even minimal gene flow quickly reduces between-group genetic variance.
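The one-migrant rule of thumb is usually motivated with Wright’s island model, where the equilibrium value is approximately Fst = 1 / (4Nm + 1) and Nm is the number of migrants per generation. The sketch below simply tabulates that textbook approximation. The post itself does not cite this formula, so treat it as my assumption about where the rule comes from, and remember that it rests on an idealized model of many equally sized demes at equilibrium.

```python
def equilibrium_fst(migrants_per_generation):
    """Approximate equilibrium Fst under Wright's island model.

    migrants_per_generation is Nm: effective population size times
    migration rate, i.e. the number of migrants entering a deme each generation.
    """
    return 1.0 / (1.0 + 4.0 * migrants_per_generation)


for nm in (0.1, 0.5, 1.0, 5.0, 10.0):
    print(f"Nm = {nm:>4}: equilibrium Fst is about {equilibrium_fst(nm):.3f}")
```

With one migrant per generation the approximation gives 0.2, and with ten it is already down to roughly 0.02, which is why sustained between-group genetic differences require very restricted gene flow.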

In contrast, cultural variation can be maintained because migrants can switch cultures, or their genetic progeny can adopt the culture of one of the parents in totality. In this way the later Ottoman Sultans and Umayyad rulers of Al-Andalus had been genetically transformed by generations of mixing with concubines derived from Europeans or Caucasians (i.e., those from the Caucasus), while remaining culturally very Turk and Arab respectively.

As noted in the preprint, this formal/theoretical avenue of research will allow for the development of a robust empirical research program. The data is out there.

 
• Category: Science • Tags: Cultural Evolution, Genetics

Recently Daniel Falush’s group came out with a preprint, A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. If you read the science posts on this weblog (basically, if you read this weblog), and you haven’t read it, read it now.

At his weblog, Paint My Chromosomes, Falush has talked about both the production of the preprint (I had a minor stimulatory role), and the attempt to get it published somewhere. This reaction is strange to me:

We also had our first journal rejection, from eLife. It has not been my habit to live-tweet journal rejections and am not intending to start now. I am a journal editor myself and do not think the process would benefit from being turned into a public performance. I was disappointed because eLife claims to hold itself to higher standards, trying to change publication by judging papers on their true worth rather than on simple measures of impact and also because the reason given was silly:

“..but feel that the target audience is a rather specialised one.”

Of course I’m biased. But this strikes me as crazy. The third most cited paper in the history of the journal Genetics is Jonathan Pritchard’s Inference of Population Structure Using Multilocus Genotype Data. Take a look at the list, and note the papers that it is more cited than (e.g., a Sewall Wright paper from 1931, and Tajima’s 1989 paper!).

To be sure, the number of times that a paper is cited is not a good measure of how often it is read and understood. And that’s kind of the point of Falush’s preprint, to actually give some guidance to people who use model based clustering in a turnkey fashion without any deep comprehension of its limitations and biases. The nuts & bolts of the inferences of population structure may be specialized, but analysis of structure is a routine part of many different types of papers, in particular in medical genetics where variants may have different effects in different genetic backgrounds.

 
• Category: Science • Tags: Structure

Probably the most incredible science story of the week, Eye lens radiocarbon reveals centuries of longevity in the Greenland shark (Somniosus microcephalus):

The Greenland shark (Somniosus microcephalus), an iconic species of the Arctic Seas, grows slowly and reaches >500 centimeters (cm) in total length, suggesting a life span well beyond those of other vertebrates. Radiocarbon dating of eye lens nuclei from 28 female Greenland sharks (81 to 502 cm in total length) revealed a life span of at least 272 years. Only the smallest sharks (220 cm or less) showed signs of the radiocarbon bomb pulse, a time marker of the early 1960s. The age ranges of prebomb sharks (reported as midpoint and extent of the 95.4% probability range) revealed the age at sexual maturity to be at least 156 ± 22 years, and the largest animal (502 cm) to be 392 ± 120 years old. Our results show that the Greenland shark is the longest-lived vertebrate known, and they raise concerns about species conservation.

Elizabeth Pennisi has a nice write-up, Greenland shark may live 400 years, smashing longevity record:

…Using this technique, the researchers concluded that two of their sharks—both less than 2.2 meters long—were born after the 1960s. One other small shark was born right around 1963.

The team used these well-dated sharks as starting points for a growth curve that could estimate the ages of the other sharks based on their sizes. To do this, they started with the fact that newborn Greenland sharks are 42 centimeters long. They also relied on a technique researchers have long used to calculate the ages of sediments—say in an archaeological dig—based on both their radiocarbon dates and how far below the surface they happen to be. In this case, researchers correlated radiocarbon dates with shark length to calculate the age of their sharks. The oldest was 392 plus or minus 120 years, they report today in Science. That makes Greenland sharks the longest lived vertebrates on record by a huge margin; the next oldest is the bowhead whale, at 211 years old. And given the size of most pregnant females—close to 4 meters—they are at least 150 years old before they have young, the group estimates.

 
• Category: Science • Tags: Greenland Shark, Science

A follow up on the Ancient Archaic Admixture Into the Andamanese story, No evidence for unknown archaic ancestry in South Asia:

Genomic studies have documented a contribution of archaic Neanderthals and Denisovans to non-Africans. Recently, Mondal et al. 2016 (Nature Genetics, doi:10.1038/ng.3621) published a major dataset–the largest whole genome sequencing study of diverse South Asians to date–including 60 mainland groups and 10 indigenous Andamanese. They reported analyses claiming that nearly all South Asians harbor ancestry from an unknown archaic human population that is neither Neanderthal nor Denisovan. However, the statistics cited in support of this conclusion do not replicate in other data sets, and in fact contradict the conclusion.

Last I heard they hadn’t released the BAM files. Mistakes are made, that’s how science is done, and other people help in the process of correction. But it is starting to get worrisome to me to see papers with bioinformatic errors being published in high impact journals.

 
• Category: Science • Tags: Genomics

Sorry about the light posting. I’ll get back into gear in a few days. Very busy professionally and personally the past week or so.

I’ve been getting into writing Python code, as opposed to reading it. It’s a different beast altogether, obviously. I’m a lot slower than I would be in Perl, but I’m getting stuff done, so that’s something. I would highly recommend Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, if you have a background in R and another scripting language.

I went to my high school reunion. It was fun and interesting. Apparently people change in a few decades…

 
• Category: Miscellaneous • Tags: Open Thread

An excellent open access review of population genetics history from 1966 to the present in Heredity, Population genetics from 1966 to 2016. From the abstract:

We describe the astonishing changes and progress that have occurred in the field of population genetics over the past 50 years, slightly longer than the time since the first Population Genetics Group (PGG) meeting in January 1968. We review the major questions and controversies that have preoccupied population geneticists during this time (and were often hotly debated at PGG meetings). We show how theoretical and empirical work has combined to generate a highly productive interaction involving successive developments in the ability to characterise variability at the molecular level, to apply mathematical models to the interpretation of the data and to use the results to answer biologically important questions, even in nonmodel organisms. We also describe the changes from a field that was largely dominated by UK and North American biologists to a much more international one (with the PGG meetings having made important contributions to the increased number of population geneticists in several European countries). Although we concentrate on the earlier history of the field, because developments in recent years are more familiar to most contemporary researchers, we end with a brief outline of topics in which new understanding is still actively developing.

Charlesworth & Charlesworth are giants in the field, and they’ve seen a lot of changes over the past few decades. If you are inclined toward a deeper exploration of population genetics with an evolutionary focus, then Elements of Evolutionary Genetics is the book for you.

 
• Category: Science • Tags: Genetics

For the past few days I’ve been using the Python data analysis library, or “pandas.” Most of the time I work with Perl, R, and shell scripting. But the Perl/R combination has gotten to be pretty unwieldy recently, and some of my coworkers swear by pandas. So in the interest of firm cohesion I converted to their sect. Honestly it wasn’t hard, as I’ve always enjoyed reading Python code, though I haven’t written much before.

So far I’ve mostly been relying on Wes McKinney’s Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. If Python aficionados have other recommendations, I am open to them.

If I need to do text processing fast & in a pinch, I can still see falling back on Perl. And R is still a great environment to work in; I won’t be uninstalling RStudio anytime soon.

 
• Category: Miscellaneous • Tags: Programming

I don’t follow cycling closely, but I once praised Lance Armstrong, who I had read about in the media, to a friend who had been a journeyman professional in the sport in the late 1990s. My friend expressed some irritation, shrugged, and told me that everyone in the sport knew that Armstrong doped. He didn’t seem to want to talk about it in any depth, as he’d left the sport anyhow, and I didn’t pursue the conversation any further. Honestly I wasn’t sure at the time whether my friend was correct, or whether he was jealous. I assumed the former, but I didn’t totally discount the latter. How could I truly know at the time?

This was in the early 2000s. Obviously if my friend, who was very far down the rankings of competitors, knew this, many more did so. The media almost certainly suspected, but Armstrong was a great story, and most people didn’t have definitive proof. I thought of this when reading this piece in The New York Times, Clean Athletes, and Olympic Glory Lost in the Doping Era:

Babashoff arrived at the Montreal Olympics in 1976 with a chance to match the performance in 1972 of Mark Spitz, whose seven golds sealed his status as an American icon and propelled him into a career as a product pitchman. Babashoff, a teammate of Spitz’s at those Munich Olympics, swam significantly faster four years later only to settle for four Olympic silver medals and one relay gold. Her career path as a high-profile endorser and motivational speaker was blockaded by broad-backed, husky-voiced East Germans later found to have been unwitting victims of a government-sponsored doping program.

Shamed by the news media and shunned by swimming officials for pointing out her competitors’ cartoonish musculature and suggesting they were cheating, Babashoff retreated into a self-imposed, decades-long exile. She raised her son, Adam, now 30, as a single mother well out of the spotlight while working as a postal carrier in Huntington Beach, Calif.

“People knew what was going on at the time, they just didn’t know what to do about it,” Babashoff said. “It just seems so weird in this day and age that they can’t right the wrongs. It just seems like such an easy fix.”

“Well, except for their deep voices and mustaches, I think they’ll probably do fine,” she said. Her remarks were the beater that churned Cold War politics. Apologetic United States Olympic Committee officials sent the East German women flower arrangements, Babashoff wrote. In her book, Babashoff includes an open letter to Bach requesting that the female swimmers from the 1976 Olympics who finished behind the East Germans be awarded duplicate medals.

At the 1996 Olympics in Atlanta, the swimmer Allison Wagner finished second in the 400 individual medley to Michelle Smith, 26, of Ireland, whose winning time was 19.76 seconds faster than her 26th-place effort four years earlier at the Barcelona Olympics. Smith’s remarkable improvement at a relatively advanced age made her competitors suspicious.

…she had left the pool deck, panting from exhaustion, after the 400 I.M. final and had been cut off by Hungary’s Krisztina Egerszegi, the defending champion, whom she had defeated by five-tenths of a second. Wagner said: “She came right up to me and said: ‘Congratulations. You’re the true winner. I just want you to know that.’ I had never talked to her before in my life, and she said that to me.”

But when Wagner met with the news media shortly thereafter, she refrained from denigrating Smith or questioning her performance.

“I didn’t say anything because people in our swimming federation used to say to me, ‘You don’t want to be Surly Shirley, do you?’” she said, referring to Babashoff.

Depressing. But in sports where differentiation at the highest echelons can come down to a split second, resulting in huge variation in monetary outcomes, I do wonder if there is a lot of subtle cheating which we can never even hope to detect.

 
• Category: Science • Tags: Sport


The employment data above are from Randall Parker (seasonally adjusted for what it’s worth), and originally the Labor Department. Randall had it as a tabular display, but I think a simple bar plot is more illustrative. The percentage of unmarried births is from the Census.

It looks like Americans with university degrees or higher are basically at full employment. Additionally, the substantial majority of Americans with university degrees or higher are in the labor force. In contrast, only a minority of Americans without high school diplomas, and only a simple majority of Americans with high school diplomas, are in the labor force.

Labor force participation is pretty straightforward. If you are looking for a job, or have a job, you are part of the labor force. Everyone else (e.g., homemakers) is outside the labor force, but still counts toward the whole population.
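To make the two ratios concrete, here is a minimal sketch of how participation and unemployment rates fall out of three head counts. The function name and the numbers are hypothetical, made up purely for illustration; they are not the Labor Department figures in the chart.

```python
def labor_stats(employed: int, unemployed: int, not_in_labor_force: int) -> dict:
    """Return participation and unemployment rates from raw head counts.

    "unemployed" here means jobless but looking for work; everyone else
    without a job goes in "not_in_labor_force".
    """
    labor_force = employed + unemployed
    population = labor_force + not_in_labor_force
    return {
        # share of the whole population that has, or is looking for, a job
        "participation_rate": labor_force / population,
        # share of the labor force (not the population) without a job
        "unemployment_rate": unemployed / labor_force,
    }


# Hypothetical group of 1,000 people (illustrative numbers only).
stats = labor_stats(employed=570, unemployed=30, not_in_labor_force=400)
print(stats)
```

The point of separating the two denominators is that a group can look fine on unemployment while having low participation, since people who have stopped looking drop out of the unemployment rate entirely.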

As for births to unmarried women, those with university degrees basically live in a different universe. I didn’t want to clutter the above chart any more, so I didn’t mention divorce. But you can see from the data to the left that college educated Americans tend to have very long marriages. In contrast, when the non-college do get married, divorce is rather common.

I’m pretty bullish on America, and the world. But that’s easy for me to say, since I am the sort of person who has more work than time, and my work is very fulfilling. Also, I’m married, with beautiful healthy children. I’m a lucky person, and the world seems charmed. It’s simply not in my interest to rock the boat.

But for those for whom only desperation stretches out before them, desperate acts can seem quite rational. Those with nothing to lose have nothing to lose.

 
• Category: Economics, Ideology • Tags: Class

They say to write about what you know. One thing I know is peppers, and hot sauce. So in addition to my writings on genetics, history, and assorted odds & ends, expect more pepper writing than before.

Class is important, but it doesn’t seem to be a good organizing principle around which an organic social movement can develop, like race or religion. The Soviet Union and People’s Republic of China have both evolved into nationalistic states because the ideology of Communism never erased, and in fact only complemented, the nationalist ethos which served as the true substrate of the modern polities.

Hillbilly Elegy: A Memoir of a Family and Culture in Crisis is an important book because it gives an impression of the possibilities of much of the human future. These poor white people can be described in unvarnished terms because they’re white, and white people can be described somewhat objectively. Their world is in crisis, as the world economy leaves them behind. The golden age of well-paid unskilled and semi-skilled work is gone. The future is uncertain, and without dignity.

This is the lot of the bottom 90 percent of all races. But because class can’t motivate human emotions in the same way as race and religion, we might see a return to more nationalist organizing principles in the near future because the elites really don’t have anything to give in terms of dignity and economic hope to the masses. Yes, they’ll live at a marginal consumer level, but they won’t obtain honor and self-worth through work, because they will have been rendered redundant by productivity gains and globalization.

I had a discussion about gentrification at a start-up event recently. As a gentrifier and small-l libertarian I don’t have a problem as such with gentrification. My interlocutor had local roots, and talked about the dislocation imposed upon his maternal Mexican American side. I was sympathetic, but, I suggested that America is a global nation, and a diverse one. He made the case for non-economic social capital, and cultural cohesion, and I suggested that sounds a lot like the sort of thing working class whites might also offer up for why mass immigration is a problem (he was taken aback by the analogy).

Ultimately the public discussion tends to avoid the hard questions. And that’s why we’re where we are.

A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. Unless you’ve produced a lot (a lot) of these plots, please read this. Whether you read my blog, or plan to do admixture analysis in the future.

The Strange Rites of the Ancient Olympics.

The domesticated brain: genetics of brain mass and brain structure in an avian species.

 
• Category: Miscellaneous • Tags: Open Thread

The above visualization is from a Reddit thread, Almost all men are stronger than almost all women. It’s based on grip strength, and basically reiterates my post from last year, Men Are Stronger Than Women (On Average). The same metric, grip strength, is highlighted. The plot above shows that the “great divergence” occurs on the cusp of puberty, exactly when secondary sexual characteristics of males and females become much more pronounced. In my post I pointed out that the Olympic caliber female German fencers were on the lower end of the male distribution.

This came to my mind when reading this nice piece in The New York Times Magazine, The Phenom: The most dominant swimmer in the pool this summer is 19-year-old Katie Ledecky. The question isn’t whether she’ll win, but by how much:

It’s not unusual for men and women swimmers to train together, but being in the pool with Ledecky is something that many men can’t handle. In April, Conor Dwyer, a 6-foot-5, 27-year-old American swimmer who won a gold medal in the 4-by-200 freestyle relay in London, gave a revealing interview posted online by USA Swimming. In it, he talked about male swimmers being “broken” by Ledecky when they practiced together at the Olympic Training Center in Colorado Springs.

Ledecky’s ability to crush men in practice does not necessarily mean she would defeat them in competition. There’s a difference between imposing her will, and perhaps superior conditioning, over the course of a two-hour practice and doing it in a shorter race in which men’s generally greater strength provides an advantage. Her best chance would probably be in the 1,500 freestyle, which women race at the FINA World Championships but not at the Olympics. (The men don’t swim the 800 in the Olympics, so there are the same number of events for male and female swimmers.) Ledecky’s best time in the event would put her among the dozen or so top American men and is 25 seconds faster than their qualifying time at the United States Olympic trials — but it is much too slow to earn a medal at the Games. On the other hand, because no other woman offers a real challenge to her, she is never pushed in that event. I asked Andrew Gemmell, who specializes in the 1,500 free, a hypothetical question: What if, in some dystopian swim universe, Ledecky was told that there would be no women’s events and that she would have to try to make the American team by competing with the men in the 1,500?

His father, who trains her, had told me that he did not think she could qualify, a feat that under current rules would require her to finish first or second at the trials. Andrew, who trains side by side with her, had a different answer. “It would be really difficult, but I would never bet against her,” he said. “I don’t think anybody knows yet what she’s capable of.”

I’m a little surprised honestly that the term “dystopian” got in there, because there are now people with academic appointments arguing for the ending of sex segregation in sports. Often they are sociologists, who believe all things are socially constructed, and take the non-binary aspects of gender to mean that the distribution of possibilities is entirely flat and arbitrary.

Katie Ledecky has preternatural gifts, as well as opportunities afforded to her by her class status. The whole piece highlights Ledecky’s exceptional physical abilities and mental attributes. But even it acknowledges she would likely not beat the top men in her events.

One of the authors of the above book, Sex Segregation in Sports: Why Separate Is Not Equal, Adrienne Milner, was interviewed last year on NPR about the thesis. The interviewer was polite, but a little incredulous. When he brought up biological differences, her response was illuminating, after a fashion.

First, she argued that sex segregation in sport denoted women’s inferiority, and that was a problem. The fact is that when it comes to strength, especially upper body strength, all the data do suggest that women, on average, are markedly inferior to men. This is a fact. This fact causes problems. But the fact that this fact causes problems does not entail that we literally deny the fact. At least that’s my opinion.

Second, she analogizes sex and gender as social constructs to race as a social construct. I knew she was going to go there, because this is a rhetorical nuclear option which is going to quickly defenestrate interlocutors. She observes that:

“We look at race as a social construction. It is not genetic, it is not biological, and we believe the same is [true] for sex … The male-female dichotomy doesn’t cover everyone, right? We have trans people, intersex people.”

As I said above, the reporter was incredulous, but he had a hard time responding after Dr. Milner explicitly connected race and sex, because it is the mainstream position now that race is a social construct and lacks any biological basis. The facts may not be on Milner’s side, but she has the theory and the “moral arc of history” backing her. It would take great courage to still dig in and defend reality as it is, as opposed to her preferences.

The reality is that race and sex/gender are social constructs. The atom is a social construct. Matter and energy are social constructs. Cities are social constructs. Everything is a social construct, as we look through the glass darkly. But social constructs operate on various levels of clarity and distinctiveness and exhibit different levels of pliability and utility. Dalton’s atomic model is profoundly wrong. It has long been superseded by quantum physical models, which have the utility of making correct predictions, whatever their correspondence to reality on a metaphysical level might be. But the Daltonian model is still often implicitly the one introduced to children to allow them to gain some intuition as to the nature of how matter is constituted. In contrast, the metaphysical ideas of the ancients as to the material nature of the universe are both wrong, and, lacking in utility.

All models are wrong, but there are still superior and inferior models. Their measure is in how they correspond to, and predict, reality. Not how they correspond to our ethical judgements of how the universe should be.

Many sociologists dissent from this position. They’ve marched into the academy and taken it over. Because of their ideology that all things are social, they believe they can reshape the fabric of the universe through their own normative preferences. To me this is a problem. I struggle against it. Our deep human intuitions often reject, and recoil, against fragments of reality. But to successfully grapple with reality we need to attempt to understand reality on its own terms, not our own.

I may struggle in vain. Could it be the liberal Whiggish scientific moment in history is over? History is written by the winners, but perhaps in the future science will also be written by the winners. I’m not sure that the truth will win out. Perhaps the glass will become darker, rather than clearer. There are genuinely difficult empirical questions about the nature of human variation and our dispositions, and how they relate to the values that we hold to be true. The fact that we’re still discussing sex segregation in sports and how it is unjust illustrates how far we’ve come in the solipsistic and socially constructionist direction.

Imagine that in the end of days all the mandarins will be sociologists, who come not to bring illumination of the truth, but to determine the nature of the truth for us to agree upon. Perhaps this is the true end of history, as humanity returns to an equilibrium where the bracing aspects of reality are shielded from the masses, which lie indolent in their delusions, while the technocrats and artificial intelligences confront the outside.

 
• Category: Science • Tags: Science, Sex

Yellowbird is a pretty good hot sauce. As you can see it gives you quantity, and the quality is decent. But there’s a major problem with the serrano and habanero brands.

According to the Scoville scale the habanero is about 10 times spicier than the serrano. That sounds about right to me. So if you buy a habanero sauce, it should be around 10 times spicier, right? Well, not exactly, since a sauce has other ingredients. But it should be considerably spicier, at least.

That’s not what I perceive in the Yellowbird brand. The serrano sauce is nearly as spicy as the habanero line. What’s going on? If you look at the ingredients serrano is listed first for its sauce, but habanero is not first. Carrot is first. A lot of hot sauces use carrot puree in their sauces, but I find that a lot of “habanero” sauces overwhelm you with carrot flavor so that you can say you bought a habanero sauce, without tasting much habanero.

I suspect that Yellowbird adds a lot more serrano to that line than they add habanero to the habanero sauce. So the label is officially accurate, but when you buy the two sauces they are not that different in spice levels, because the concentration of capsaicin is actually pretty close.
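A back-of-the-envelope sketch of that dilution argument is below. Every number in it is an assumption chosen for illustration: the two pepper heat values are picked only to respect the roughly 10x ratio mentioned above, and the pepper fractions are hypothetical, not Yellowbird’s actual recipe. It also assumes the non-pepper ingredients (carrot and so on) contribute no heat.

```python
def sauce_heat(pepper_shu: float, pepper_fraction: float) -> float:
    """Approximate sauce heat as pepper heat scaled by the pepper's share.

    Assumes everything that is not pepper contributes zero heat.
    """
    return pepper_shu * pepper_fraction


# Illustrative pepper heat values with a 10x ratio (assumed, not measured).
SERRANO_SHU = 15_000
HABANERO_SHU = 150_000

# Hypothetical recipes: pepper-first serrano sauce, carrot-first habanero sauce.
serrano_sauce = sauce_heat(SERRANO_SHU, pepper_fraction=0.50)
habanero_sauce = sauce_heat(HABANERO_SHU, pepper_fraction=0.06)

ratio = habanero_sauce / serrano_sauce
print(f"habanero sauce is about {ratio:.1f}x the serrano sauce")
```

Under these made-up proportions a tenfold gap between the peppers shrinks to a small gap between the sauces, which is the pattern I think I’m tasting.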

Overall I would say that the habanero sauce isn’t worth it. The Yellowbird serrano though is a good sauce. Because there’s a lot of pepper there is a fresh green flavor to it, and it’s not so sweet as some hot sauces. Like most good hot sauces there’s an astringency, but it doesn’t overpower.

 
• Category: Miscellaneous • Tags: Hot Sauce

The Emperor Heraclius was a great man. It’s a shame most people don’t know more about him. His campaigns against the Persians in the early 7th century were truly audacious. But, he also lived long enough to witness the loss of Syria and Egypt. If you haven’t, I would highly recommend A History of the Byzantine State and Society.

In any case, I was double-checking the marriage to his niece Martina because of some comments below, and came upon this interesting passage on Wikipedia:

Martina and Heraclius had at least 10 children, though the names and order of these children are questions for debate…

Of these at least two were handicapped, which was seen as punishment for the illegality of the marriage.

The coefficient of relatedness between uncles and nieces is 1/4. That is twice as close as first cousins, and the same as that between half-siblings. It isn’t entirely surprising that debilities would show up at this genetic distance, though two out of ten at that extreme might be a bit high.
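For anyone who wants to check those fractions, here is a minimal sketch of the standard path-counting rule: each chain of parent-child links connecting two relatives through a common ancestor contributes (1/2) raised to the number of links, and the contributions are summed. The function name is my own, and the sketch assumes the common ancestors are not themselves inbred.

```python
from fractions import Fraction


def relatedness(path_lengths):
    """Coefficient of relatedness by path counting.

    path_lengths: one entry per common ancestor, giving the number of
    parent-child links in the chain joining the two relatives through
    that ancestor. Assumes the common ancestors are not inbred.
    """
    return sum(Fraction(1, 2) ** n for n in path_lengths)


# Uncle and niece share two ancestors (his parents, her grandparents);
# each chain has 1 + 2 = 3 links.
uncle_niece = relatedness([3, 3])

# First cousins share two grandparents; each chain has 2 + 2 = 4 links.
first_cousins = relatedness([4, 4])

# Half-siblings share one parent; the chain has 1 + 1 = 2 links.
half_siblings = relatedness([2])

print(uncle_niece, first_cousins, half_siblings)
```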

 
• Category: Science • Tags: Genetics

A new paper in Nature Genetics, Characterization of Greater Middle Eastern genetic variation for enhanced disease gene discovery, is both interesting and important. But, as with the paper on the Andaman Islander genomes, it starts out with a naive and misleading utilization of model-based clustering to frame the later results. Here’s a major offending section:

The least admixed samples were found in the NWA, AP, and PP subregions, suggesting that populations in these regions are derived from founder populations, but there was evidence of inter-regional variation in GME-specific components, suggesting the occurrence of local admixture (Fig. 1b) and potentially supporting historical events. The NWA component was found in regions from west to east across North Africa, likely representing the Berber genetic background…The AP component likely represents ancestral Arab populations and was observed in nearly all regions, possibly as a result of the Arab conquests of the seventh century coincident with the expansion of the Arabic language now spoken over much of the GME. Similarly, the Persian expansion into the TP and SD regions and parts of NEA in the fifth century was the most likely contributor of the PP signal.

Patterns of human migration and drift were recapitulated using TreeMix for GME subregions, on the basis of 1000 Genomes Project control populations…The inferred tree with no migration showed tight clusters for European and Asian populations but much greater apparent divergence among subjects from GME regions. The ordering of the GME subregional populations from the root corroborated much of the ‘out-of-Africa’ ordering of subsequent founder populations…For GME populations, distance from the root emulated the west-to-east organization of GME samples, with the PP population showing the largest inferred drift parameter, supporting a west-to-east trajectory of human migrations.

You can’t assume that a population which is near fixed for a cluster, K, is actually not admixed. If you don’t have enough variation within your data set then the ‘least admixed’ populations will come out as similar to the reference, even though they themselves are admixed.

Second, I am quite open to the idea that the Arab conquests of the 7th century were demographically significant, but these results don’t show that. The Tuscan population is not 25% Arab due to the Arab conquests. Additionally, Arabs did not permanently alter the interior of Anatolia. Their raids went rather far to the west, such as the one of Amorium, but the high water mark of Arab rule in relation to the Byzantines, arguably in the decades around 800 A.D., simply resulted in a “no man’s land” along the borders (though some Semitic peoples, some of them Arabic speaking, of Christian background did migrate into Byzantium). Similarly, the Persian-Pakistani modal cluster has nothing to do really with the Persian Empire.

This is not a big deal, but these passages are just silly. They’re wrong on the face of it. But the “peer reviewers” that Nature Genetics assigned to this paper were probably not well versed in human historical phylogenomics. Probably they saw that the methods were sound in the broadest sense (e.g., Admixture, Treemix, PCA, etc., are all fine methods), and were unaware that the inferences made were totally wrong. Anyone who had read Lazaridis et al.’s The genetic structure of the world’s first farmers would see how these passages needed to be revised and changed. The clusters in admixture above are to a great extent artifacts (useful ones for GWAS, but still artifacts). The historical inferences made have little basis in reality.

Third, the genetic pattern of variation above has nothing to do with the “out-of-Africa” migration. Rather, it has to do with the fact that there is cryptic Sub-Saharan African admixture even in the “pure” samples from some regions, because Sub-Saharan admixture is rather well mixed in some groups (e.g., in Northwest Africa). The cline is less about “out-of-Africa,” and more about a cline of African ancestry. These patterns of variation have literally nothing to say about the “out-of-Africa” migration. The whole passage should have been excised.

It’s a shame that there’s all this wrong stuff in the paper. I’m a big fan of Jean-Laurent Casanova because his medical genetics is going to make a difference in lives, and his hairdo is awesome. Andy Clark is on the paper; he’s my St. Jerome for having co-authored Principles of Population Genetics. I feel a little ridiculous making these criticisms, but I think I’m right, and it’s a shame that the authors didn’t have anyone who knew enough human population genomics to fix this portion of the paper, and it’s a shame that Nature Genetics couldn’t find peer reviewers to steer them in the right direction.

Aside from the random wrong historical inference stuff, the paper is kind of a big deal (I think Nature Genetics worthy, but I don’t know anything about this stuff in regards to publications). It confirms in the broadest outlines a lot of what we knew. The further you go from Africa the less genetically diverse populations get when it comes to looking at polymorphism diversity. Native Americans have fewer segregating polymorphisms than Eurasian populations, for example. One way to model this is as serial bottlenecks out of Africa. I think that’s too simple of a picture, as there has been a lot of gene flow and admixture over the last 10,000 years, but on the coarsest of all scales it’s not totally misleading.

But a peculiar aspect of these dynamics is that when you look at runs of homozygosity in the genome, which usually measure more recent inbreeding, the Middle East and South Asia tend to have lower genetic diversity. To get a sense of South Asian populations, you can read The promise of disease gene discovery in South Asia. Because of caste/jati endogamy a lot of the South Asian groups have less genetic diversity than you might expect. This has disease implications.
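If “runs of homozygosity” is an unfamiliar term, the toy sketch below shows the basic idea: scan along a chromosome and record stretches of consecutive homozygous sites. This is a deliberately simplified illustration, not the method used in the paper. Real analyses measure runs in physical or genetic distance and tolerate genotyping errors; here I just count sites, and the genotype coding (0 and 2 for the two homozygotes, 1 for a heterozygote) and the function name are my own assumptions.

```python
def runs_of_homozygosity(genotypes, min_sites):
    """Return (start, end) index pairs for runs of homozygous sites.

    genotypes: sequence of alternate-allele counts per site, where
               0 or 2 is homozygous and 1 is heterozygous.
    min_sites: shortest run (in number of sites) worth reporting.
    The end index is exclusive, as with Python slices.
    """
    runs = []
    start = None
    for i, g in enumerate(genotypes):
        if g != 1:
            # homozygous site: open a run if one is not already open
            if start is None:
                start = i
        else:
            # heterozygous site closes any open run
            if start is not None and i - start >= min_sites:
                runs.append((start, i))
            start = None
    # close a run that extends to the end of the sequence
    if start is not None and len(genotypes) - start >= min_sites:
        runs.append((start, len(genotypes)))
    return runs


# Toy chromosome of ten sites (made-up data).
toy = [0, 2, 1, 0, 0, 2, 2, 0, 1, 2]
long_runs = runs_of_homozygosity(toy, min_sites=3)
all_runs = runs_of_homozygosity(toy, min_sites=1)
print(long_runs, all_runs)
```

Raising the minimum length is what separates the long runs, which point to recent inbreeding in the pedigree, from the short ones that every population carries.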

Middle Eastern, North African, and Pakistani populations are even more extreme. You can see it in the figure above. Across short runs of homozygosity the results converge onto what you’d expect, roughly. But Middle Eastern populations are a huge anomaly at long runs. That’s because of this:

From 20–50% of all marriages in the GME are consanguineous (as compared with <0.2% in the Americas and Western Europe)1, 2, 3, with the majority between first cousins. This roughly 100-fold higher rate of consanguinity has correlated with roughly a doubling of the rate of recessive Mendelian disease19, 20. European, African, and East Asian 1000 Genomes Project populations all had medians for the estimated inbreeding coefficient (F) of ~0.005, whereas GME F values ranged from 0.059 to 0.098, with high variance within each population (Fig. 2c). Thus, measured F values were approximately 10- to 20-fold higher in GME populations, reflecting the shared genomic blocks common to all human populations. F values were dominated by structure from the immediate family rather than historical or population-wide data trends (Supplementary Fig. 8). Examination of the larger set of 1,794 exomes that included many parent–child trios also showed an overwhelming influence of structure from the immediate family, with offspring from first-cousin marriages displaying higher F values than those from non-consanguineous marriages (Fig. 2d).
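As a rough sanity check on the figures quoted above, the sketch below computes the textbook expectation for the inbreeding coefficient of a child of first cousins and the fold differences implied by the reported medians. The only inputs taken from the paper are the three F values in the quote; the rule that a child’s F is half the parents’ coefficient of relatedness assumes the parents are not themselves inbred, and the variable names are mine.

```python
from fractions import Fraction

# First cousins have a coefficient of relatedness of 1/8.
r_first_cousins = Fraction(1, 8)

# A child's inbreeding coefficient is half the relatedness of its parents
# (assuming the parents are not themselves inbred).
f_expected = r_first_cousins / 2
print(f"expected F for a first-cousin child: {float(f_expected):.4f}")

# Values quoted from the paper.
F_BASELINE = 0.005  # median for European, African, and East Asian populations
F_GME_LOW, F_GME_HIGH = 0.059, 0.098

fold_low = F_GME_LOW / F_BASELINE
fold_high = F_GME_HIGH / F_BASELINE
print(f"fold increase: {fold_low:.1f}x to {fold_high:.1f}x")
```

The expected value for a single first-cousin marriage sits at the low end of the reported GME range, which fits with the quoted statement that F is dominated by structure from the immediate family.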

The authors masked alleles which were part of the reason that individuals were included in the data set in the first place (to prevent ascertainment bias). Rather, they were focused on genome-wide patterns of loss of function and derived alleles. Because they were looking at many low frequency variants naturally they found a lot of new variation, totally unobserved in European dominated genetic data sets. This is why bringing genomics to the world is kind of a big deal.

For me this was the most interesting, and sad, result:

Despite millennia of elevated rates of consanguinity in the GME, we detected no evidence for purging of recessive alleles. Instead, we detected large, rare homozygous blocks, distinct from the small homozygous blocks found in other populations, supporting the occurrence of recent consanguineous matings and allowing the identification of genes harboring putatively high-impact homozygous variants in healthy humans from this population. Applying the GME Variome to future sequencing projects for subjects originating from the GME could aid in the identification of causative genes with recessive variants across all classes of disease. The GME Variome is a publicly accessible resource that will facilitate a broad range of genomic studies in the GME and globally.

The theory is simple. If you have inbreeding, you bring together deleterious recessive alleles, and so they get exposed to selection. In this way you can purge the segregating genetic load. It works with plants. But humans, and complex animals in general, are not plants. More precisely the authors “compared the distributions of derived allele frequencies (DAFs) in GME and 1000 Genomes Project populations.” If the load was being purged the frequency of deleterious alleles should be lower in the inbreeding populations. It wasn’t.
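The theoretical expectation can be sketched with the standard one-locus recursion for a fully recessive deleterious allele under inbreeding. This is only a toy model of the theory, not anything the authors ran: it is deterministic, ignores mutation and drift, and the starting frequency, selection coefficient, and inbreeding coefficient are all hypothetical values I picked for illustration.

```python
def next_q(q: float, s: float, F: float) -> float:
    """One generation of selection against a recessive allele.

    q: frequency of the deleterious allele
    s: fitness cost to homozygotes (heterozygotes are unaffected)
    F: inbreeding coefficient of the population
    """
    p = 1.0 - q
    # inbreeding shifts genotype frequencies toward homozygotes
    homozygous_recessive = q * q + F * p * q
    mean_fitness = 1.0 - s * homozygous_recessive
    # allele frequency after selection removes a share of homozygotes
    return (q - s * homozygous_recessive) / mean_fitness


def simulate(q0: float, s: float, F: float, generations: int) -> float:
    """Iterate next_q and return the final allele frequency."""
    q = q0
    for _ in range(generations):
        q = next_q(q, s, F)
    return q


# Hypothetical parameters: same allele, with and without inbreeding.
q_outbred = simulate(q0=0.01, s=0.5, F=0.0, generations=100)
q_inbred = simulate(q0=0.01, s=0.5, F=1 / 16, generations=100)
print(q_outbred, q_inbred)
```

In this idealized model the inbred population ends up with the lower allele frequency, which is the purging prediction. The paper’s point is that the real data did not show it.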

Middle Easterners should stop marrying cousins to reduce the disease load. But that’s just a recommendation. Some of these nations, like Qatar, have a lot of money to throw at Mendelian diseases. Perhaps they’ll use preimplantation genetic diagnosis? I don’t know.

 
• Category: Science • Tags: Genomics, Inbreeding
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"