The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Genetics of Height


Kobe Bryant is an exceptional professional basketball player. His father was a “journeyman”. Similarly, Barry Bonds and Ken Griffey Jr. both surpassed their fathers as baseball players. Both of Archie Manning’s sons are superior quarterbacks in relation to their father. This is not entirely surprising. Though there is a correlation between parent and offspring in their traits, that correlation is imperfect.

Note though that I put journeyman in quotes above because any success at the professional level in major league athletics indicates an extremely high level of talent and focus. Kobe Bryant’s father was among the top 500 best basketball players of his age. His son is among the top 10. This is a large realized difference in professional athletics, but across the whole distribution of people playing basketball at any given time it is not so great a difference.

What is more curious is how this relates to the reality of regression toward the mean. This is a very general statistical concept, but for our purposes we’re curious about its application in quantitative genetics. People often misunderstand the idea from what I can tell, and treat it as if there is an orthogenetic-like tendency of generations to regress back toward some idealized value.

Going back to the basketball example: Michael Jordan, the greatest basketball player in the history of the professional game, has two sons who are modest talents at best. The probability that either will make it to a professional league seems low, a reality acknowledged by one of them. In fact, from what I recall both received special attention and consideration because they were Michael Jordan’s sons. It is still noteworthy of course that both had the talent to make it onto a roster of a Division I NCAA team. This is not typical for any young man walking off the street. But the range in realized talent here is notable. Similarly, Joe Montana’s son has been bouncing around college football teams to find a roster spot. Again, it suggests a very high level of talent to be able to plausibly join a roster of a Division I football team. But for every Kobe Bryant there are many, many, Nate Montanas. There have been enough generations of professional athletes in the United States to illustrate regression toward the mean.

So how does it work? A few years ago a friend told me that the best way to think about it was a bivariate distribution, where the two random variables are additive genetic variation and environmental variation. Clearer? For many, probably not. To make it concrete, let’s go back to the old standby: the quantitative genetics of height.

For height in developed societies we know that ~80% of the variation of the trait in the population can be explained by variation of genes in the population. That is, the heritability of the trait is 0.80. This means that the correspondence between parents and offspring on this trait is rather high. Having tall or short parents is a decent predictor of having tall or short offspring. But the heritability is imperfect. There is a random “environmental” component of variation. I put environmental in quotations because that really just means it’s a random noise effect which we can’t capture in the additive or dominance components (this sort of thing may be why homosexual orientation in individuals is mostly biologically rooted, even if its population-wide heritability is modest). It could be biological, such as developmental stochasticity, or gene-gene interactions. The point is that this is the component which adds an element of randomness to our ability to predict the outcomes of offspring from parents. It is the darkening of the mirror of our perceptions.

Going back to height, the plot to the left shows an idealized normal distribution of height for males. I set the mean as 70 inches, or 5 feet 10 inches. The standard deviation is 2.5 inches, which is roughly the typical deviation of an individual from the mean (it’s a measure of dispersion). Obviously the height of a male is dependent upon the height of a father, but the mother matters as well (perhaps more due to maternal effects!). Here we have to note that there’s clearly a sex difference in height. How do you handle this problem? Actually, that’s easy. Just convert the heights of the parents to sex-controlled standard deviation units. For example, if you are 5 feet and 7.5 inches as a male you are 1 standard deviation unit below the mean. If you are a female at the same height you are 1.4 standard deviation units above the mean (assuming female mean height of 5 feet and 4 inches, and standard deviation of 2.5 inches). If height were ~100% heritable you’d just average the two parental values in standard deviation units to get the expectation of the offspring in standard deviation units. In this case, the offspring should be 0.2 standard deviation units above the mean.

But height is not ~100% heritable. There is an environmental component of variation which isn’t accounted for by the parental genotypic values (at least the ones with effects of interest to us, the additive components). If height is ~80% heritable then you’d expect the offspring to regress 1/5th of the way back to the population mean. For the example above, the expectation of the offspring would be 0.16 standard deviation units, not 0.20.
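The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the author's own calculation: the function names are hypothetical, and the means, standard deviation and heritability are simply the values assumed in the text (70 and 64 inches, 2.5 inches, 0.80).

```python
# Illustrative sketch only; function names are hypothetical and the
# parameter values are the ones assumed in the text.
MALE_MEAN_IN = 70.0     # 5 feet 10 inches
FEMALE_MEAN_IN = 64.0   # 5 feet 4 inches
SD_IN = 2.5             # same standard deviation assumed for both sexes


def z_score(height_in, sex):
    """Convert a height in inches to sex-controlled standard deviation units."""
    mean = MALE_MEAN_IN if sex == "male" else FEMALE_MEAN_IN
    return (height_in - mean) / SD_IN


def expected_offspring_z(father_in, mother_in, heritability=0.80):
    """Expected offspring value (in SD units) given both parents' heights."""
    midparent = (z_score(father_in, "male") + z_score(mother_in, "female")) / 2
    # Only the heritable fraction of the midparent deviation is expected to
    # show up in the offspring; the rest regresses to the population mean.
    return heritability * midparent


# Father and mother both 5 feet 7.5 inches (67.5 inches), as in the example.
print(round(z_score(67.5, "male"), 2))             # -1.0
print(round(z_score(67.5, "female"), 2))           # 1.4
print(round(expected_offspring_z(67.5, 67.5), 2))  # 0.16
```

Setting `heritability=1.0` recovers the plain midparent average of 0.2 from the previous paragraph.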

Let’s make this more concrete. Imagine you sampled a large number of couples whose midparent phenotypic value is 0.20 standard deviation units above the mean in height. This means that if you convert the father and mother into standard deviation units, their average is 0.20. So one pair could be 0.20 and 0.20, and another could be 2.0 and -1.6 standard deviation units. What’s the expected distribution of male offspring height?

The relevant points:

1) The midparent value naturally is constrained to have no variance (though as I indicate above since it’s an average the selected parents may have a wide variance)

2) The male offspring are somewhat above the average population in distribution of height

3) It remains a distribution. The expected value of the offspring is a specific value, but environmental and genetic variation remains to produce a range of outcomes (e.g., Mendelian segregation and recombination)

4) There has been some regression back to the population mean

I only displayed the males. There are obviously going to be females among the offspring generation. What would the outcome be if you mated the females with the males? Recall that the female heights would exhibit the same mean, 0.16 units above the original population mean. This is where many people get confused (frankly, those whose intelligence is somewhat closer to the mean!). They presume that a subsequent generation of mating would result in further regression back to the mean. No! Rather, the expected value of the offspring would be 0.16 units. Why?

Because through the process of selection you’ve created a new genetic population. The selection process is imperfect in ascertaining the exact causal underpinning of the trait value of a given individual. In other words, because height is imperfectly heritable some of the tall individuals you select are going to be tall for environmental reasons, and will not pass that trait to their offspring. But height is ~80% heritable, which means that the filtering process of genes by using phenotype is going to be rather good, and the genetic makeup of the subsequent population will be somewhat deviated from the original parental population. In other words, the reference population to which individuals “regress” has now changed. The environmental variation remains, but the additive genetic component around which the regression is anchored is now no longer the same.
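A small simulation can make the "no further regression" point tangible. What follows is a toy sketch under loudly stated assumptions that are not in the original post: phenotype is additive genetic value plus independent environmental noise, total variance is 1, mating is random, Mendelian segregation adds noise with half the additive variance, and "midparent of 0.20" is approximated by a window of plus or minus 0.05. All names are hypothetical.

```python
import random
from statistics import fmean


def simulate_two_generations(h2=0.80, target=0.20, window=0.05,
                             n_couples=400_000, seed=1):
    """Return mean phenotype (SD units) of offspring and grand-offspring.

    Assumed toy model: phenotype = additive value + environmental noise.
    """
    rng = random.Random(seed)
    sd_a = h2 ** 0.5           # SD of additive genetic values
    sd_e = (1 - h2) ** 0.5     # SD of environmental noise
    sd_seg = (h2 / 2) ** 0.5   # SD of Mendelian segregation within a family

    child_additive, child_pheno = [], []
    for _ in range(n_couples):
        a_father = rng.gauss(0, sd_a)
        a_mother = rng.gauss(0, sd_a)
        p_father = a_father + rng.gauss(0, sd_e)
        p_mother = a_mother + rng.gauss(0, sd_e)
        # Selection acts on phenotype: keep only couples whose midparent
        # value is close to the target.
        if abs((p_father + p_mother) / 2 - target) > window:
            continue
        a_child = (a_father + a_mother) / 2 + rng.gauss(0, sd_seg)
        child_additive.append(a_child)
        child_pheno.append(a_child + rng.gauss(0, sd_e))

    # Second generation: the offspring mate among themselves, unselected.
    rng.shuffle(child_additive)
    grand_pheno = []
    for i in range(0, len(child_additive) - 1, 2):
        a_mid = (child_additive[i] + child_additive[i + 1]) / 2
        grand_pheno.append(a_mid + rng.gauss(0, sd_seg) + rng.gauss(0, sd_e))

    return fmean(child_pheno), fmean(grand_pheno)


gen1_mean, gen2_mean = simulate_two_generations()
print(f"offspring mean:       {gen1_mean:.2f}")
print(f"grand-offspring mean: {gen2_mean:.2f}")
```

Under these assumptions both means should land near 0.16, up to sampling noise: the first generation regresses from 0.20, and the second does not regress any further because the environmental deviations of the selected parents were already filtered out.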

This is why I state that regression toward the mean is not magical in a biological sense. There is no population with fixed traits to which selected individuals naturally regress or revert. Rather, populations are useful abstractions in making sense of the statistical correlations we see around us. The process of selection is informed by population-wide trends, so we need to bracket a set of individuals as a population. But what we really care about are the genetic variables which underpin the variation across the population. And those variables can change rather easily through selection. Obviously regression toward the mean would exhibit the magical reversion-toward-ideal-type property that some imagine if the variables were static and unchanging. But if this was the matter of things, then evolution by natural selection would never occur!

Therefore, in quantitative genetics regression toward the mean is a useful dynamic, a heuristic which allows us to make general predictions. But we shouldn’t forget that it’s really driven by biological processes. Many of the confusions which I see people engage in when talking about the dynamic seem to be rooted in the fact that individuals forget the biology, and adhere to the principle as if it is an unthinking mantra.

And that is why there is a flip side: even though the offspring of exceptional individuals are likely to regress back toward the mean, they are also much more likely to be even more exceptional than the parents than any random individual off the street! Let’s go back to height to make it concrete. Kobe Bryant is 6 feet 6 inches tall. His father is 6 feet 9 inches. I don’t know his mother’s height, but her brother was a basketball player whose height is 6 feet 2 inches. Let’s use him as a proxy for her (they’re siblings, so not totally inappropriate), and convert everyone to standard deviation units.

Kobe’s father: 4.4 units above mean

Kobe: 3.2 units above mean

Kobe’s mother: 1.6 units above the mean

Using the values above the expected value for the offspring of Kobe’s father & mother is a child 2.4 units above the mean. Kobe is somewhat above the expected value (assuming that Kobe’s mother is a taller than average woman, which seems likely from photographs). But here’s the important point: his odds of being this height are much higher with the parents he has than with any random parents. Using a perfect normal distribution (this is somewhat distorted by “fat-tailing”) the odds of an individual being Kobe’s height are around 1 in 1,500. But with his parents the odds that he’d be his height are closer to 1 out of 5. In other words, Kobe’s parentage increased the odds of his being 6 feet 6 inches by a factor of 300! The odds were still against him, but the die was loaded in his direction in a relative sense. By analogy, in the near future we’ll see many more children of professional athletes become professional athletes both due to nature and nurture. But, we’ll continue to see that most of the children of professional athletes will not have the requisite talent to become professional athletes.
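The figures in this paragraph can be roughly reproduced with normal tail probabilities. This is a hedged sketch, not necessarily the calculation behind the post: it reads "being Kobe's height" as "at least Kobe's height", and it assumes the spread of offspring around their expected value is 1 standard deviation unit, which is an assumption chosen here because it gives numbers close to the ones quoted. The helper name `upper_tail` is hypothetical.

```python
from math import erfc, sqrt


def upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * erfc(z / sqrt(2))


MALE_MEAN_IN, SD_IN = 70.0, 2.5
H2 = 0.80

kobe = (78 - MALE_MEAN_IN) / SD_IN      # 6 ft 6 in -> 3.2 SD units
father = (81 - MALE_MEAN_IN) / SD_IN    # 6 ft 9 in -> 4.4 SD units
mother = (74 - MALE_MEAN_IN) / SD_IN    # uncle's 6 ft 2 in as proxy -> 1.6

expected_child = H2 * (father + mother) / 2   # about 2.4 SD units

p_random = upper_tail(kobe)
# Assumption: offspring are spread around the expectation with SD = 1 unit.
p_given_parents = upper_tail((kobe - expected_child) / 1.0)

print(f"random male:   about 1 in {1 / p_random:.0f}")
print(f"these parents: about 1 in {1 / p_given_parents:.1f}")
print(f"ratio:         about {p_given_parents / p_random:.0f}x")
```

With a narrower assumed offspring spread the conditional odds get somewhat longer, but the qualitative point, a boost of a couple of orders of magnitude over a random draw, does not change.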

Image Credit: Wikipedia

(Republished from Discover/GNXP by permission of author or representative)

In the post below, Moderate marginal value to genomics, I left some things implicit. It turns out that this was an ill-considered decision. In reality my comments were simply more cryptic and opaque than implicit. This is pretty obvious because even those readers who are biologists didn’t seem to catch what I had assumed would be obvious in the thrust of my argument.

The point in the broadest sense is that DNA and genomics are not magical. Genetics existed before either of them. Understanding the physical basis of genetics has certainly been incredibly fruitful, and genomics has altered the playing field in many ways. But there was a broad understanding of genetics before DNA and genomics, both in a Mendelian sense and in the area of biometrics and quantitative genetics. In the earlier post I indicated that the tools for predictions of adult traits due to the effect of genes have been around for a long time: our family history. By this, I mean that a lot of traits of interest are substantially heritable. A great deal of the variation within the population can be explained by variation of genes in the population, as inferred by patterns of correlation between individuals in their traits as a function of genetic relatedness. This is genetics as a branch of applied statistics. It has great “quick & dirty” power, especially in agricultural science.

Let’s look at something simple, height. It’s a continuous trait which is rather concrete. No one argues that “height” is a social construct. In Western societies height is ~80-90% heritable. That means that most of the variation within the population of the trait can be explained by variation in one’s family background. Tall people have tall children, short people have short children, and so forth. Here’s a “toy” scatterplot which shows the relation between mid-parent heights and adult offspring heights (I made up the numbers):

The correlation isn’t perfect. But it’s pretty good. The more heritable a trait is, the more a scatterplot of this form (offspring regressed on parents) approaches tight linearity with a slope of ~1. These plots are measuring narrow sense heritability, which is the additive genetic variance over the phenotypic variance. Additive genetic variance just means the variants which have additive or subtractive values to the trait value (or, they can be transformed as such).
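To illustrate how the slope of an offspring-on-midparent regression estimates narrow-sense heritability, here is a minimal sketch on simulated families. The data are generated, not real, and the generating model (purely additive genetics plus independent environmental noise, random mating, values in standard deviation units) is an assumption of this example, as are the function names.

```python
import random
from statistics import fmean


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_families(h2, n_families=50_000, seed=7):
    """Simulate (midparent, offspring) phenotypes under an additive toy model."""
    rng = random.Random(seed)
    sd_a, sd_e, sd_seg = h2 ** 0.5, (1 - h2) ** 0.5, (h2 / 2) ** 0.5
    midparents, offspring = [], []
    for _ in range(n_families):
        a_f, a_m = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        p_f, p_m = a_f + rng.gauss(0, sd_e), a_m + rng.gauss(0, sd_e)
        # Child's additive value: midparent additive value plus segregation.
        a_child = (a_f + a_m) / 2 + rng.gauss(0, sd_seg)
        midparents.append((p_f + p_m) / 2)
        offspring.append(a_child + rng.gauss(0, sd_e))
    return midparents, offspring


mid, off = simulate_families(h2=0.85)
print(f"estimated narrow-sense heritability: {ols_slope(mid, off):.2f}")
```

The estimated slope should come out close to the heritability fed into the simulation (0.85 here), which is the sense in which a plot like this "measures" narrow-sense heritability.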

To make this plot in a fashion which is more than illustrative you need a lot of data on a large number of individuals and their parents. This would be tedious and require a substantial labor investment in earlier periods, but today with powerful data mining techniques I think it would be much, much, easier. In a world where the child is the father of the man these methods would have great power.

But they’re not perfect. Siblings vary in height, even though the trait seems mostly controlled by variation in genes on the population level. What’s going on? Genetically, Mendelian segregation and genetic recombination are going to reshuffle the many alleles which control variation in height from parent to offspring in terms of what the gamete contributes. Additionally, the nature of the environmental “noise” may vary from sibling to sibling. Using population wide data you can infer the expected value of the offspring based on heritability and mid-parent value, but there’s going to be variance about the mean of the theoretical distribution. For example, the standard deviation of I.Q. within the population is 15 points, and across full-siblings it is also 15 points.

This is where genomics comes in. It does make a difference, on the margin. I suspect it would do so by removing some of the uncertainty of segregation and genetic recombination. Going back to the height example, imagine that you know of the ~1,000 genes which vary within the population to control variation in height. You sequence two parents, and so know which regions of the genomes they’re enriched for “tall” or “short” alleles. Some of the variance in the offspring is going to be due to the fact that the offspring don’t receive a perfect proportional representation of their parent’s alleles in terms of aggregate effect size. You could then remove some of the uncertainty in outcome because you can check the child’s genome against the parents’ and assess whether they received more or less of the “tall” or “short” alleles.

But there would still be environmental “noise” which you probably couldn’t account for. You can see an illustration of what I have in mind in the two normal distributions I plotted above. Both of them represent the theoretical distribution of possibilities of a child on a quantitative trait which only becomes realized in adulthood. The blue line shows what you can infer from the plain information of parental phenotypes. But what happens when you give them a genomic test? You remove some of the uncertainty from your calculus, and the variance drops. You see that in the red line.

This is what I mean when I say that genomics matters on the margin. It does have an effect. But all the tools to profile and predict are around us now. Even determined amateurs can find out quite a bit about someone’s family if they’re determined. This is no different in deep principle from the sort of techniques which large corporations are utilizing to create a “profile” of your possible future purchases by what you purchased in the past. The parents are past purchases. The adult offspring are future purchases. Knowing a lot of behavior genetic implicated genes might help the profile, but at the end of the day it’s not a deal-breaker or a game-changer.

An analogy to current market research and prediction algorithms is particularly apropos I think. They creep people out. So I naturally expect people to be creeped out if the state or insurance company has detailed fleshed out actuarial tables based on genetics and genomics. But genetics or genomics don’t make it any more or less scary on a deep level. Nor do they make the techniques qualitatively more effective. And the policy questions and responses are going to be the same no matter what.

(Republished from Discover/GNXP by permission of author or representative)

It is known that Northern Europeans tend to be somewhat taller than Southern Europeans. This seems intuitively obvious if you spend a bit of time around representative populations. Growing up in the Pacific Northwest I’ve always been on the short side at 5 feet 8 inches, but when I was in Italy for 3 weeks one year back (between Milan and Rome, with disproportionate time spent in the Piedmont) I didn’t feel as small (I recall feeling similarly when I was in Cajun country in the early 2000s). Steve Hsu alerts me to the fact that Luke Jostins is back blogging at Genetic Inference, reporting from the Biology of Genomes meeting. Apparently Michael Turchin has found that:

1) Alleles known to be associated with greater height are found at higher frequencies in Northern Europeans

2) Alleles known to be associated with greater height also exhibit signatures of natural selection

He used the GIANT consortium data set. How big is it? 129 thousand individuals! Luke adds:

This is a textbook example of how an evolutionary study should be done; you show a phenotypic difference exists, that it is heritable, and that it is under selection. This opens the question as to why height has been selected in Northern Europe (or shortness in Southern Europe). Could the same data be used to test specific hypotheses there?

One thing we do know is that there isn’t much difference in heights between black Americans and white Americans, who are predominantly Northern European in ancestry. I wonder if perhaps the smaller sizes of Southern Europeans are due to the fact that these populations have lived for a longer period under a high density agricultural regime than either Northern Europeans or Africans (northern Sweden still was dominated by hunter-gatherers until ~5,000 years B.P.). My working hypothesis is that for various reasons stable agricultural societies may reduce lifetime mortality rates but maintain higher levels of morbidity, making large body sizes less feasible. But that’s just speculation. At least Europe is a good testing ground for these sorts of explorations, as obviously obligate nutritional differences aren’t much of an issue anymore on that continent.

(Republished from Discover/GNXP by permission of author or representative)

Aka Pygmies

The Pith: There has been a long running argument whether Pygmies in Africa are short due to “nurture” or “nature.” It turns out that non-Pygmies with more Pygmy ancestry are shorter and Pygmies with more non-Pygmy ancestry are taller. That points to nature.

In terms of how one conceptualizes the relationship of variation in genes to variation in a trait one can frame it as a spectrum with two extremes. On the one hand you have monogenic traits where the variation is controlled by differences on just one locus. Many recessively expressed diseases fit this pattern (e.g., cystic fibrosis). Because you have one gene with only a few variants of note it is easy to capture in one’s mind’s eye the pattern of Mendelian inheritance for these traits in a gestalt fashion. Monogenic traits are highly amenable to a priori logic because their atomic units are so simple and tractable. At the other extreme you have quantitative polygenic traits, where the variation of the trait is controlled by variation on many, many, genes. This may seem a simple formulation, but to try and understand how thousands of genes may act in concert to modulate variation on a trait is often a more difficult task to grok (yes, you can appeal to the central limit theorem, but that means little to most intuitively). This is probably why heritability is such a knotty issue in terms of public understanding of science, as it concerns the component of variation in quantitative continuous traits which is dispersed across the genome. The traits where there is no “gene for X.” Additionally, quantitative traits are likely to have a substantial environmental component of variation, confounding a simple genotype to phenotype mapping. Arguably the classic quantitative trait is height. It is clear and distinct (there aren’t arguments about the validity of measurement as occurs in psychometrics), and, it is substantially heritable. In Western societies with a surfeit of nutrition height is ~80-90% heritable. What this means is that ~80-90% of the variance of the trait value within the population is due to variance of the genes within the population.
Concretely, there will be a very strong correspondence between the heights of offspring and the average height of the two parents (controlled for sex, so you’re thinking standard deviation units, not absolute units). And yet height is at the heart of the question of the “missing heritability” in genetics. By this, I mean the fact that so few genes have been associated with variation in height, despite the reality that who your parents are is the predominant determinant of height in developed societies.

The issue gets even more thorny when you talk about variation across societies. This is a simple and yet complex issue. On the one hand we know that over time people across the world have gotten taller as nutrition has gotten better. What is less well known is that human populations had been shrinking from the Last Glacial Maximum ~20,000 years ago until the past few centuries. Why? One can posit many reasons, both genetic and environmental, but it does point us to the reality that the story of height is not monotonic. That is, it doesn’t go in one direction, and has no simple one size fits all answer.

But that’s just the dimension of time. How about space? The question of whether different populations have different final genetic potentials for height is a disputed one. And yet it seems plausible that at the extremes there are genuine differences in the gene frequencies across populations which will speak to their different distributions in trait values. This is particularly interesting in the case of populations characterized by very low median adult heights, often termed “pygmies.” Of particular note are the Pygmies of Central Africa, who exist in a state of cultural symbiosis with their Bantu and Nilotic neighbors, adopting their languages, but remaining distinct.

These populations have very low median heights, but they are clearly not dwarfs (they are proportionate). Thankfully at least the population genetics of the Pygmies of Africa are now relatively well understood. It seems that the Western and Eastern Pygmy populations are very distinct clusters, with a common ancestry perhaps on the order of tens of thousands of years in the past. And not surprisingly the genetic distance between the Pygmy groups and their non-Pygmy neighbors is very large. The Western Pygmies tend to show more evidence of admixture with their Bantu neighbors than the Eastern ones (I suspect this is due to the longer residence of Bantus in this region). But for me the hardest issue to grapple with is the reality that the Pygmies of Central Africa seem to be genetically closer to the Khoisan people of Southern Africa than their Bantu or Nilotic neighbors! I believe this is evidence of an ancient hunter-gatherer continuum within Africa which has been marginalized and overlain by the recent expansion of Bantu farmers and Nilotic pastoralists.

In any case, what does all this have to do with the genetics of height? A new paper in the American Journal of Physical Anthropology synthesizes the inferences generated from population genetics with the basic logical assumptions of quantitative genetics to adduce that the difference between Pygmies and non-Pygmies in height is actually likely to be due to heritable differences. Indirect evidence for the genetic determination of short stature in African Pygmies:

Central African Pygmy populations are known to be the shortest human populations worldwide. Many evolutionary hypotheses have been proposed to explain this short stature: adaptation to food limitations, climate, forest density, or high mortality rates. However, such hypotheses are difficult to test given the lack of long-term surveys and demographic data. Whether the short stature observed nowadays in African Pygmy populations as compared to their Non-Pygmy neighbors is determined by genetic factors remains widely unknown. Here, we study a uniquely large new anthropometrical dataset comprising more than 1,000 individuals from 10 Central African Pygmy and neighboring Non-Pygmy populations, categorized as such based on cultural criteria rather than height. We show that climate, or forest density may not play a major role in the difference in adult stature between existing Pygmies and Non-Pygmies, without ruling out the hypothesis that such factors played an important evolutionary role in the past. Furthermore, we analyzed the relationship between stature and neutral genetic variation in a subset of 213 individuals and found that the Pygmy individuals’ stature was significantly positively correlated with levels of genetic similarity with the Non-Pygmy gene-pool for both men and women. Overall, we show that a Pygmy individual exhibiting a high level of genetic admixture with the neighboring Non-Pygmies is likely to be taller. These results show for the first time that the major morphological difference in stature found between Central African Pygmy and Non-Pygmy populations is likely determined by genetic factors.

First, is there a plausible physiological reason for the difference in adult height between Pygmies and non-Pygmies? The authors review the relevant evidence:

Endocrinologists have described the physiological determination of the African Pygmies’ short stature: serum levels of Insulin-Like Growth Factor 1 (IGF1) and of Growth Hormone Binding Protein (GHBP) are abnormally low, whereas the levels of Growth Hormone (GH) and IGF2 do not differ from Non-Pygmy controls…In this context, Merimee…proposed that the short stature of African Pygmies could be attributed to the absence of a growth spurt during puberty and that the genetic factor(s) implicated in the Pygmy stature were to be found in the GH-IGF1 axis…A recent gene-expression study further showed a slight (1.8-fold) under-expression of GH and a more dramatic (8-fold) under-expression of the GH receptor in adult African Pygmies, which was not found in Non-Pygmy Bantu speakers…However, the only genetic study focusing specifically on Pygmies’ stature, failed to find allele frequency differences in the promoter region of the gene encoding IGF1 between two African Pygmy populations and Non-Pygmy controls…In this context, whether the Pygmy populations’ short stature is solely due to environmental pressures experienced by individuals during growth (i.e., phenotypic plasticity), or to a complex genetic mechanism, remains to be demonstrated.

I believe that IGF can be found in meat and milk, so there are plausible dietary reasons that one could imagine this difference. As far as looking at differences between the genes which are known to impact height within populations across populations, there simply aren’t that many genes known which could account for the large between population differences. Not to mention that many of the current studies have used European populations, and so would likely have an ascertainment bias which might miss a lot of variance which is common within African populations.

The basic method in this paper is not too difficult to understand:

1) Use STRUCTURE, a program which assigns different ancestral quanta to individuals.

2) And compare the variation in a particular Pygmy-modal quantum across the population with variation in height.

If there are many genetic variants of small effect within the Pygmy genome which are resulting in their relatively low adult median height then dollops of Pygmy genome through admixture will reduce the height of non-Pygmies and dollops of non-Pygmy admixture in Pygmies will increase their height. The presumption is that if there are strong environmental impacts on height due to social differences then the disjunction between genetic identity and anthropological identity will be informative. For example, if Pygmies are put under particular stress or deprived specific nutritional intake because of their communal identity as marginalized Pygmies then different admixture levels with non-Pygmies should not matter much (and vice versa).

There’s a lot of statistics toward the aim of achieving significance in this paper (p-value < 0.05). And I really don’t understand the point of disaggregating males and females, for example. Just convert them to standard deviation units deviated from sex median! But in any case the major correlation is well illustrated by the two panels below. Pygmies are in red and non-Pygmies are in blue:

The y-axis is straightforward, height. You can see the Pygmies in their sample are shorter, on average. The x-axis is an ancestral component inferred from STRUCTURE which is generally found in non-Pygmies. You can see that as expected non-Pygmies have more of this than the Pygmies, but the descriptive statistic of a correlation between the non-Pygmy ancestry and height in Pygmies is evident even in this plot. Conversely, the Pygmy ancestry is correlated with lower adult height in non-Pygmies.
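The comparison described above, an ancestry component against height with the sexes pooled in standard deviation units, can be sketched as follows. This is not the paper's analysis and it does not call STRUCTURE; it assumes you already have a per-individual ancestry fraction from such a program. The function names are hypothetical and the handful of numbers are invented purely to show the mechanics.

```python
from statistics import fmean, median, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def standardize_within_sex(heights, sexes):
    """Express each height in SD units from the median of its own sex."""
    stats = {}
    for sex in set(sexes):
        group = [h for h, s in zip(heights, sexes) if s == sex]
        stats[sex] = (median(group), pstdev(group))
    return [(h - stats[s][0]) / stats[s][1] for h, s in zip(heights, sexes)]


# Invented illustrative values, NOT data from the paper.
heights_cm = [150, 154, 158, 162, 143, 146, 150, 155]
sexes = ["M", "M", "M", "M", "F", "F", "F", "F"]
non_pygmy_ancestry = [0.05, 0.15, 0.30, 0.45, 0.05, 0.10, 0.25, 0.40]

z = standardize_within_sex(heights_cm, sexes)
print(f"ancestry vs. height correlation: {pearson(non_pygmy_ancestry, z):.2f}")
```

Standardizing within sex first is what lets men and women share one scatterplot and one correlation, which is the alternative to disaggregating them that the post suggests.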

As a single result this particular finding isn’t too earth-shaking. If there was one population which was short due to genetic factors, I suspect that one would have to bet on the Pygmies of Central Africa. And as noted in the paper Pygmoid morphology is found among other hunter-gatherer tropical populations. This may not be a human ancestral type, but it is a type which has emerged repeatedly in our history, whether due to genetic or environmental factors. The big picture is that this same general procedure can be used to explore the differences in genetic dispositions across groups for many quantitative traits. With the coming era of cheap genotyping and sequencing I’m sure it will be done. An intrepid researcher has plenty of admixed populations in the New World to select from. There are in Brazil people who are socially identified and self-identify as white who have less European ancestry than those who are socially identified and self-identify as non-white. To compare the social and genetic valences of African and European ancestral contributions for medical and psychological quantitative traits these sorts of populations will be of great future interest.

Link credit: Dienekes

Citation: Becker NS, Verdu P, Froment A, Le Bomin S, Pagezy H, Bahuchet S, & Heyer E (2011). Indirect evidence for the genetic determination of short stature in African Pygmies. American journal of physical anthropology PMID: 21541921

(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"