The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog


Meeting the Taino

In the comments below a few days ago someone expressed concern at the diminishing of genetic diversity due to the disappearance of indigenous populations. My response was basically that it depends. The issue here is whether that disappearance is due to assimilation, or extinction. If a given population is genetically absorbed into another, obviously their genetic diversity is by and large maintained. What disappears are the specific genotypes, the combinations of gene pairs, which are distinctive to that given group. This is the same dynamic at the heart of the ‘disappearing blonde gene’ meme. Unless there is selection at the loci which encode or predispose one to blonde hair the ‘gene’ isn’t going anywhere. Rather, the implicit issue here is that blonde people are intermarrying with non-blonde people, and if the genetic variant has a recessive expression then the frequency of the trait will decrease. Populations with a high degree of homozygosity at the ‘blonde loci’ are distinctive in a very particular manner, but they’re no more or less ‘diverse’ than other populations which don’t manifest the same tendency.

A toy example will suffice. Take two populations, A and B, and one locus, 1, with two variants, X and x. Assume that the two populations are the same size. At locus 1 population A is 100% X, and population B is 100% x. In a diploid scenario then all the individuals in population A will be XX, and in B will be xx. When you add A + B you get a frequency of X of 0.5, and of x of 0.5 (since the two populations are balanced in size).


Now imagine a scenario where all individuals in population A pair up with someone in population B (assume sex balance in both populations). In the first generation, F1, all the offspring will be heterozygote Xx (hybrids). The frequency of X and x will be 0.5 still, as in the previous generation. But no individual now reflects the genotype of the parental populations, as all individuals are heterozygotes. At the level of alleles, specific genetic variants, you’ve got the same diversity (X and x at locus 1). But at the level of genotype there’s a huge shift. Two genotypes (XX and xx) no longer exist, but a novel one is now fixed in the population (Xx).

A novel combination

Finally, in the F2 generation, the offspring of F1, Hardy-Weinberg will reassert itself. 25% of the genotypes will be XX, 25% xx, and 50% Xx, due to $p^2 + 2pq + q^2 = 1$. In this scenario some of the genotypic distinctiveness of the parental and F1 generations is evident, but the diversity in the allelic sense remains the same as in the parental and F1 states, X = 0.5 and x = 0.5. Observe that if you’re looking at genotypic diversity the F2 generation is actually more diverse than the parental (because Xx is a different genotype). In other words, in some ways the aggregation of various distinct populations may increase diversity by generating novel combinations.
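The toy example can be checked with a few lines of Python. This is only a sketch of the single-locus case described above: the function names `allele_freqs` and `random_mating` are made up for illustration, and the F1 generation is entered by hand because it comes from a forced A-with-B cross rather than random mating.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def allele_freqs(genotype_freqs):
    """Allele frequencies implied by a {genotype: frequency} mapping."""
    freqs = Counter()
    for genotype, freq in genotype_freqs.items():
        for allele in genotype:          # each diploid genotype carries two alleles
            freqs[allele] += freq / 2
    return dict(freqs)


def random_mating(genotype_freqs):
    """Genotype frequencies after one round of random mating (Hardy-Weinberg)."""
    alleles = allele_freqs(genotype_freqs)
    offspring = Counter()
    for (a, pa), (b, pb) in product(alleles.items(), repeat=2):
        # Sorting makes "xX" and "Xx" count as the same genotype.
        offspring["".join(sorted((a, b)))] += pa * pb
    return dict(offspring)


parental = {"XX": Fraction(1, 2), "xx": Fraction(1, 2)}  # populations A and B pooled
f1 = {"Xx": Fraction(1)}            # every A individual pairs with a B individual
f2 = random_mating(f1)              # offspring of the F1 hybrids

for name, generation in [("parental", parental), ("F1", f1), ("F2", f2)]:
    print(name, generation, allele_freqs(generation))
```

The allele frequencies stay at one half in all three generations, while the set of genotypes present goes from two (parental) to one (F1) to three (F2).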

This is not to deny that a very specific historically contingent form of diversity in terms of distinctness of particular groups is threatened today. That’s why it was important that the HGDP was overloaded with threatened groups like the Bushmen, Kalash, and Pygmies. These populations may be assimilated soon, and with that assimilation it will be more difficult to extract out historically very important information which will inform us about the human past.

But another issue is extinction instead of assimilation. Wouldn’t this eliminate a lot of genetic variation? Perhaps. I actually considered this issue a few years back with the Star Trek reboot. If you haven’t watched the film, there’s a major spoiler next. So basically on the order of ~10,000 Vulcans survived the destruction of their planet. Culturally the preservation was rather good, because the Vulcan elders, who are the repositories of the culture, were saved. In this way a fully fleshed Vulcan culture could easily reemerge out of the genocide. On the other hand, the vast majority of Vulcans died. Isn’t this population bottleneck a genetic catastrophe? It depends. If the Vulcans who survived are a relatively random assortment of the population genetically, then the disaster isn’t that bad in terms of genetic diversity.

To get some idea of why, consider the statistic of heterozygosity. This measures the extent of heterozygote states, where the two gene copies differ at a locus, across the population. It’s a proxy for genetic diversity, as more allelic diversity produces more heterozygosity.

The decay of heterozygosity over time due to random genetic drift (without mutation) can be modeled like so:

$$H_t = H_0 \left(1 - \frac{1}{2N}\right)^t$$

The variable “t” is simply the generation time, from an initial time. H0 refers to the initial heterozygosity, and Ht is simply the value at a given time out from that initial value. The N is effective population size. This formula can be used to model population bottlenecks. The Vulcan population reduction from one on the order of billions to 10,000 was basically a massive population bottleneck. The decrease in heterozygosity that you’d expect would be:

$$H_t = H_0 \left(1 - \frac{1}{2 \times 10{,}000}\right)^1$$

Ht = 0.99995 of the initial value. Basically almost nothing. Why? Because 10,000 turns out to be a relatively large population. This makes some intuitive sense. If you have a sample size of 10,000, and it’s representative, sample variance isn’t going to be that high. If you have an infinite number of coin flips so that the ratio of heads and tails is 50:50, reducing that to 10,000 flips isn’t going to result in much of a deviation from 50:50.

Let’s look at the effect of population bottlenecks of 20 generations at various values of N. The x axis shows generation time, while the y axis illustrates the proportion of the initial heterozygosity which remains.
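For readers who want the numbers behind this, here is a minimal Python sketch of the decay formula. The function name `retained_heterozygosity` is made up, and the list of bottleneck sizes is an illustrative assumption rather than the set of N values used in the original plot.

```python
def retained_heterozygosity(n_effective, generations):
    """Fraction of initial heterozygosity left after drift: (1 - 1/(2N))^t."""
    return (1 - 1 / (2 * n_effective)) ** generations


# The Vulcan case: N = 10,000 for a single generation.
vulcan = retained_heterozygosity(10_000, 1)
print(f"N=10,000 after 1 generation: {vulcan:.5f}")

# Assumed bottleneck sizes, chosen only to show the range of outcomes.
bottleneck_sizes = [10, 100, 1_000, 10_000]
table = {
    n: [retained_heterozygosity(n, t) for t in range(21)]   # generations 0..20
    for n in bottleneck_sizes
}
for n, series in table.items():
    print(f"N={n:>6}: after 20 generations {series[20]:.4f}")
```

Only the smallest populations lose much: with N = 10 roughly a third of the initial heterozygosity is left after 20 generations, while at N = 10,000 the loss is about a tenth of a percent.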

This is not to downplay the impact of bottlenecks and demographic stochasticity. Rather, it’s to suggest that population genetic diversity is relatively resistant to a crash in numbers. The extinction of small tribal groups is a tragedy, but genetically it may not be as much of a problem as we think. Even in groups such as the Bushmen with a great deal of genetic diversity it is likely that most of that diversity is already found within non-Bushmen populations.

Image credits: Ian Beatty and Lesley-Ann Brandt.


The Pith: When it comes to the final outcome of a largely biologically specified trait like human height it looks as if it isn’t just the genes your parents give you that matter. Rather, the relationship of their genes also counts. The more dissimilar they are genetically, the taller you are likely to be (all things equal).

Dienekes points me to an interesting new paper in the American Journal of Physical Anthropology, Isolation by distance between spouses and its effect on children’s growth in height. The results are rather straightforward: the greater the distance between the origin of one’s parents, the taller one is likely to be, especially in the case of males. These findings were robust even after controlling for confounds such as socioeconomic status. Their explanation? Heterosis, whether through heterozygote advantage or the masking of recessive deleterious alleles.

The paper is short and sweet, but first one has to keep in mind the long history of this sort of research in the murky domain of human quantitative genetics. This is not a straight-forward molecular genetic paper where there’s a laser-like focus on one locus, and the mechanistic issues are clear and distinct. We are talking about a quantitative continuous trait, height, and how it varies within the population. We are also using geographical distance as a proxy for genetic distance. Finally, when it comes to the parameters affecting these quantitative traits there are a host of confounds, some of which are addressed in this paper. In other words, there’s no simple solution to the fact that nature can be quite the tangle, more so in some cases than others.

Because of the necessity for subtlety in this sort of statistical genetic work one must always be careful about taking results at face value. From what I can gather the history of topics such as heterosis in human genetics is always fraught with normative import. The founder of Cold Spring Harbor Laboratory, Charles Davenport, studied the outcomes of individuals who were a product of varied matings in relation to genetic distance in the early 1920s. This was summed up in his book Race Crossing in Jamaica:

A quantitative study of 3 groups of agricultural Jamaican adults: Blacks, Whites, and hybrids between them; also of several hundred children at all developmental stages. The studies are morphological, physiological, psychological, developmental and eugenical. The variability of each race and sex in respect to each bodily dimension and many basis vary just as morphological traits do. In some sensory tests the Blacks are superior to Whites; in some intellectual tests the reverse is found. A portion of the hybrids are mentally inferior to the Blacks. The negro child has, apparently, from birth on, different physical proportions than the white child.

Because of the fears of miscegenation in the early 20th century scholars had a strong bias toward finding the data to confirm the assumption that admixture between divergent human kinds resulted in a breakdown and depression in trait value in relation to both parental lineages. Today this is not so. Rather, I would argue that the bias is now in the opposite direction, at least in the West. My friend Armand Leroi wrote Meet the world’s most perfect mutant seven years ago. Who is the most perfect human according to Armand? She is Saira Mohan, a model of Indian, Irish and French ancestry. Armand concludes:

If deleterious mutations rob us of it, they should do so with particular efficacy if we marry our relatives. Most novel mutations are at least partly recessive, and inbreeding should accentuate their negative effects. Many weird genetic disorders come from Pakistan and Saudi Arabia, where there is a strong tradition of first-cousin marriage.

Conversely, people of mixed ancestry should show the benefits of concealing recessive mutations. And this, I suspect, is the true meaning of Saira Mohan: half Punjabi, quarter Irish, quarter French and altogether delightful. She, too, is a mutant – but a little less so than most of us.

Thandie Newton masking recessive alleles

This is entirely in keeping with the dominant ethos of the global elite, which aims for a panmixia of genes in concert with an alignment of a particular set of cosmopolitan post-materialist memes. But, as I pointed out to Armand there are also cases where crosses between genetic backgrounds may have deleterious consequences. For example, a European specific allele in African Americans may have a negative fitness interaction with the predominant African genetic background of this population. I am not implying here that science is fiction, a construction of our biases and preconceptions. But the dominant cultural narrative framework does put pressure upon how we interpret science, and all the more so in domains which require a level of statistical subtlety and personal candor.

Of course now that we can see exactly how individuals are mutant at the level of the genome Armand’s supposition can actually be tested. That is, we can see how many deleterious recessive alleles are in fact masked in people of hybrid origin. That at least may plug one of the fuzzy spots in our picture of how genetic backgrounds interact in humans.

I prefaced the review of a paper on marital distance and height with some history of science and a reflection of how contemporary values influence the generation and interpretation of knowledge because there’s a lot of confusing material in the literature on correlations between genetic distance and trait value. There is the result that marriages between 3rd cousins seem the most fertile in Iceland. Is this because of a balance between genetic incompatibilities and expression of recessive diseases? Or perhaps the answer lies in social dynamics, insofar as people who come from related lineages are more likely to weather difficult times in their relationship? It’s one study from Iceland. But of course the minority who vociferously argue against racial amalgamation and admixture on moral/normative grounds will focus upon this specific positive empirical finding in the literature. Now, Iceland is ideal for many human genetic studies because it has excellent records and is culturally homogeneous. But at the end of the day Iceland is still Iceland.

And today Poland is still Poland. I say that because this study tracks thousands of Polish youth over the years. Here’s the abstract:

Heterosis is thought to be an important contributor to human growth and development. Marital distance (distance between parental birthplaces) is commonly considered as a factor favoring the occurrence of heterosis and can be used as a proximate measure of its level. The aim of this study is to assess the net effect of expected heterosis resulting from marital migration on the height of offspring, controlling for midparental height and socioeconomic status (SES). Height measurements on 2,675 boys and 2,603 girls ages 6 to 18 years from Ostrowiec Świętokrzyski, Poland were analyzed along with sociodemographic data from their parents. Midparental height was calculated as the average of the reported heights of the parents. Analyses revealed that marital distance, midparental height, and SES had a significant effect on height in boys and girls. The net effect of marital distance was much more marked in boys than girls, whereas other factors showed comparable effects. Marital distance appears to be an independent and important factor influencing the height of offspring. According to the “isolation by distance” hypothesis, greater distance between parental birthplaces may increase heterozygosity, potentially promoting heterosis. We propose that these conditions may result in reduced metabolic costs of growth among the heterozygous individuals.

As you may know, height is substantially heritable. That means that ~80-90% of the variation in the trait within the population in developed nations is due to variation in genes. This has some validity even within families. Tall parents tend to give rise to tall offspring, though there is variation around the expectation. In other words, siblings differ in height, in part because of environmental factors, but also in part because siblings differ in their genetic endowments from their parents. So naively one can model this like so:

Height ~ Genetic endowment + Environmental contingencies

The genetic endowment is a function of the mid-parent value in standard deviation units. That means you average the standard deviations of the parents from the sex-controlled mean. Let’s give a concrete example. Imagine a male who is 5’8 inches, and a female who is 5’7 inches. The standard deviation for height is ~3 inches, with the American male mean being 5’10 inches and female being 5’4 inches. That means that the male is 2/3 of a standard deviation below the mean, and the female is 1 standard deviation above the mean. Averaging -2/3 and +1, the expectation for their offspring will be 1/6 of a standard deviation above the mean (about 5’10.5 for males, 5’4.5 for females). But because of the variation in the nature of genetics and environment, there’s actually going to be a standard deviation of ~3 inches for the offspring (e.g., ~70% chance that the male will be between 5’7.5 and 6’1.5). There is also the reality that because environmental factors aren’t heritable the offspring should regress somewhat back to the population mean all things equal, though in the case of height not too much because it is so genetically influenced.

A few years ago I played this game with libertarian pundits Megan McArdle and Peter Suderman, who announced their engagement. Megan and Peter are both 6’2. I estimated that the expected value is that any son of theirs would be 6 feet 3.6 inches, and any daughter 5 feet 9.6 inches. How can it be that their sons should be taller than either of them? Remember that Megan is much taller than Peter in standard deviation units in relation to her sex.
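The arithmetic behind that estimate can be sketched in Python. The means and standard deviation are the ones quoted earlier in this post (male mean 5’10, female mean 5’4, ~3 inches). The heritability of 0.8 is an assumption on my part: it sits at the low end of the ~80-90% range mentioned above and happens to reproduce the figures quoted for Megan and Peter. The function name is hypothetical.

```python
MALE_MEAN_IN = 70.0     # 5'10, as quoted in the post
FEMALE_MEAN_IN = 64.0   # 5'4, as quoted in the post
SD_IN = 3.0             # ~3 inches for both sexes
HERITABILITY = 0.8      # assumed value; controls regression toward the mean


def expected_offspring_heights(father_in, mother_in, h2=HERITABILITY):
    """Return (expected son height, expected daughter height) in inches."""
    father_z = (father_in - MALE_MEAN_IN) / SD_IN      # father's deviation in SD units
    mother_z = (mother_in - FEMALE_MEAN_IN) / SD_IN    # mother's deviation in SD units
    midparent_z = (father_z + mother_z) / 2            # mid-parent value
    offspring_z = h2 * midparent_z                     # shrink toward the population mean
    return (MALE_MEAN_IN + offspring_z * SD_IN,
            FEMALE_MEAN_IN + offspring_z * SD_IN)


# Both parents at 6'2 (74 inches).
son_in, daughter_in = expected_offspring_heights(74.0, 74.0)
print(f"expected son: {son_in:.1f} in, expected daughter: {daughter_in:.1f} in")
```

With both parents at 74 inches this gives about 75.6 inches for a son and 69.6 inches for a daughter, i.e. 6 feet 3.6 inches and 5 feet 9.6 inches. Setting `h2=1.0` removes the regression step and returns the raw mid-parent expectation.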

Now how would expectation be altered if Megan McArdle and Peter Suderman were full-siblings? (they are not full-siblings, this is a thought experiment!) At this point even if you had never taken college genetics you might be wondering whether it makes sense to calculate an expectation for the height of the offspring of two full-siblings. You know very well that there are much more serious genetic issues at hand. Going back to the relation above, you might update it like so:

Height ~ Genetic endowment + Environmental contingencies – Incest decrement

Even stipulating viability of the offspring, any child of full-siblings would exhibit all the problems that Armand alludes to above. It seems likely that whatever potential their parents might impart to their offspring, the combination of their genotypes would be highly deleterious, because near kin carry the same recessives. The paper above posits the inverse effect, where outbreeding results in greater outcomes than are to be expected based on the mid-parent trait value. In this telling, height is a proxy for health and development. This seems biologically plausible in the case of humans. Individuals who marry those genetically dissimilar impart gains of fitness to their offspring by virtue of elevated heterozygosity. So now we create a new relation:

Height ~ Genetic endowment + Environmental contingencies + Magnitude of outbreeding

In pre-modern societies individuals tended to marry those close to them geographically. Even if cousin marriage was not normally practiced, over time clusters of villages would form networks of de facto consanguinity. In the 19th and especially 20th century much of this in the extreme cases abated in Europe because of better transport. L. L. Cavalli-Sforza documented this in Consanguinity, Inbreeding, and Genetic Drift in Italy. Modern roads resulted in a radical drop in inbreeding in mountainous regions of the country. Some researchers have argued that this shift resulted in an increased level of height, intelligence, and health, among European populations.

With that, here’s a nice map from

Going back to the paper, after controlling for socioeconomic status they found that:

1) The increased marital distance predicts taller height than expected, especially in boys.

2) This effect is most noticeable in boys who already have parents who are relatively tall.

3) Finally, greater marital distance seems to be correlated with greater height in the parents!

The last is actually a possible reason why there’s no need to appeal to heterosis at all. This might simply be a function of assortative mating of tall individuals who are more mobile. In the paper the authors go on at length about sexual selection, greater mobility of individuals who are taller, etc. But whatever the reason, this shows exactly the care which must be taken with these sorts of results. It is known for example that taller individuals seem to have higher I.Q.s, leading some to assert that the genes which control height and I.Q. variance must be the same (some of them almost certainly are if there are many loci of small effect). But, it turns out that this height-I.Q. correlation disappears within families (tall siblings are no smarter than short siblings), implying that the correlation might be a function of assortative mating.

As for why there may be a sex difference, the authors suggest that heterosis may manifest at different points in the developmental arc of children. Females mature somewhat faster than males. This may be so, the sexes differ and such. But my own preference is that the original results merit a deeper and expanded examination before we posit an evolutionary story (that’s not possible in a scientific paper which needs a discussion, but I’m proposing an ideal world of knowledge generation and refinement!). The empirics need to be firmed up before we scaffold it in theory. Poland is Poland, and if you troll through enough data sets there’ll be millions of correlations which are publishable. And yet we are living in the age of information, so we had better get going in sieving through it. At the end of the paper the authors go in a direction which I think might yield some interesting finds in the future:

One possible limitation of our study and explanation of the results may come from the fact that we used geographical distance between parental birthplaces as the only approximate measure of offspring heterozygosity. Further studies should focus on more direct examination of individuals’ allele diversity and its influence on physiological processes. Of particular interest would be investigation of a possible relationship between the level of basal metabolic rate and individual’s heterozygosity both in general term as well as heterozygosity of specific locus. Such suggestion seems to be supported by previous studies which indicate that the variation in energy expenditure at rest is determined by substantial genetic component (Bouchard et al., 1989; Bouchard and Tremblay, 1990) and heterogeneity of gene loci (Jacobson et al., 2006; Loos et al., 2007). More studies in this regard may be crucial for a better and profound understanding of the Homo sapiens metabolism and energy budget.

Because of the advances in genomics, as well as the proliferation of social science data sets (thanks to corporations and government) I hope that we can begin breaking out of the habit of being led about by the nose by our norms in more areas of human genetics than just the study of Mendelian diseases! That’s a hope. I’m not saying I’d bet money on it.

Citation: Sławomir Kozieł, Dariusz P. Danel, & Monika Zaręba (2011). Isolation by distance between spouses and its effect on children’s growth in height. American Journal of Physical Anthropology. DOI: 10.1002/ajpa.21482

Image Credit: Caroline Bonarde Ucci.


The number 1 gets a lot more press than -1, and the concept of heterozygosity gets more attention than homozygosity. Concretely the difference between the latter two is rather straightforward. In diploid organisms the genes come in duplicates. If the alleles are the same, then they’re homozygous. If they’re different, then they’re heterozygous. Sex chromosomes can be an exception to this because in the heterogametic sex you generally have only one copy of a gene as one of the chromosomes is sharply truncated. This is why human males are subject to X-linked recessive traits at such a great frequency in comparison to females; recessive expression is irrelevant when you don’t have a compensatory X chromosome to mask the malfunction of one allele.

Of course recessive traits are not limited to sex-linked loci. Consider microcephaly, an autosomal recessive disease. To manifest the trait you need two malfunctioning copies of the gene, one from each parent. In other words, you exhibit a homozygous genotype with two mutant copies. I suspect that this particularly common context of homozygosity, recessive autosomal diseases, is one reason why it is less commonly discussed outside of specialist circles: there is a whole cluster of medical and social factors which lead to homozygosity which are already the focus of attention. The genetic architecture of the trait is of less note than the etiology of the disease and the possible reasons in the family’s background which might have increased the risk probability, especially inbreeding. In contrast heterozygosity is generally not so disastrous. Even if functionality is not 100%, it is close enough for “government work.” The deleterious consequences of a malfunctioning allele are masked by the “wild type” good copy. The exceptions are in areas such as breeding for hybrid vigor, when heterozygote advantage may be coming to the fore. The details of complementation of two alleles matter a great deal to the bottom line, and the concept of hybrid vigor has percolated out to the general public, with the more informed being cognizant of heterozygosity. But homozygosity is of interest beyond the unfortunate instances when it is connected to a recessive disease. Like heterozygosity, homozygosity exists in spades across our genome. My 23andMe sample comes up as 67.6% homozygous on my SNPs (which are biased toward ~500,000 base pairs which tend to have population wide variation), while Dr. Daniel MacArthur’s results show him to be 68.1% homozygous across his SNPs. This is not atypical for outbred individuals. In contrast someone whose parents were first cousins can come up as ~72% homozygous.
This is important: zygosity is not telling you simply about the state of two alleles, in this case base pairs, it may also be telling you about the descent of two alleles. Obviously this is not always clear on the base pair level; mutations happen frequently enough that even if you carry two minor alleles it is not necessarily evidence that they’re identical by descent (IBD), or autozygous (just a term which denotes ancestry of the alleles from the same original copy). What you need to look for are genome-wide patterns of homozygosity, in particular “runs of homozygosity” (ROH). These are long sequences biased toward homozygous genotypes.
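To make the idea concrete, here is a deliberately simplified Python sketch of spotting runs of homozygosity in an ordered list of SNP genotype calls. Everything in it is an assumption for illustration: the function name, the toy genotype list, and the `min_length` threshold counted in SNPs. Real ROH callers work with physical distance along the chromosome and tolerate occasional heterozygous or missing calls, which this sketch does not.

```python
def runs_of_homozygosity(genotypes, min_length=3):
    """Return (start, end) index pairs, end exclusive, for each stretch of at
    least `min_length` consecutive homozygous calls.

    `genotypes` is a list of two-letter calls such as "AA" or "AG",
    ordered by position along one chromosome.
    """
    runs = []
    start = None                              # index where the current run began
    for i, (first, second) in enumerate(genotypes):
        if first == second:                   # homozygous call extends or opens a run
            if start is None:
                start = i
        else:                                 # heterozygous call closes any open run
            if start is not None and i - start >= min_length:
                runs.append((start, i))
            start = None
    if start is not None and len(genotypes) - start >= min_length:
        runs.append((start, len(genotypes)))  # run reaching the end of the list
    return runs


# Toy data, invented for this example.
calls = ["AA", "AG", "CC", "TT", "GG", "AA", "CT", "GG", "GG"]
roh = runs_of_homozygosity(calls, min_length=3)
fraction_homozygous = sum(a == b for a, b in calls) / len(calls)
print(roh, round(fraction_homozygous, 3))
```

The distinction the sketch draws is the one made in the text: seven of the nine toy calls are homozygous, but only the stretch at indices 2 to 5 is long enough to count as a run, and it is the runs rather than the overall fraction that point toward identity by descent.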

What ROH can tell you about an individual, and perhaps a population, becomes more clear when you conceptualize in your mind’s eye the basic dynamics which occur in the course of biological replication in diploid sexual organisms. Each individual receives half their autosomal genome from each parent. Though genes are abstractions, individual units at the root of a complex causal sequence which maps to a phenotype, a trait, they’re also physical entities embedded within the structure of DNA. This structure is a physical sequence, whereby you have adjacent base pairs, clusters of which define genes, intergenic regions, exons, introns, promoters, etc. In other words, the whole alphabet soup of molecular genetics. The spatial relationship of genes to each other along the chromosome allowed for linkage mapping decades before the biophysical substrate of DNA was known to be critical to the whole process. Particular sequences of alleles may therefore be inherited together, and form a haplotype. Over the generations the associations of these distinctive alleles in haplotypes dissolve through recombination, a physical process which erodes the structural integrity of chromosomal sequences.

With these basics in mind, let’s move to a specific repulsive example. Imagine a father who impregnates his daughter. Why is this repulsive to us? From a consequential “gene’s eye” perspective the father is suborning the beauty of sexual reproduction whereby genetic variation is mixed & matched across individuals. Colloquially, where the daughter would be 50% of the father genetically, the child of the daughter and her father would be 75% of the father genetically. From a gene-only perspective this may be favorable, as the father is coming closer to cloning himself, but we all know that the rate of breakdown of the “vehicle” in these individuals is high. Why? Inbreeding leads to a relatively massive increase in homozygosity as chromosomal segments identical by descent are paired off against each other. We know that the problem is that a host of nasty recessive diseases are highly likely in inbred individuals.

All humans carry a large load of deleterious alleles. Some of these may be potentially lethal. But like bombs without the trigger a functional copy of the alleles complements and masks the mutant variety and we carry on. Many of these mutants are particular to our family, and some of them are private even to ourselves, the outcome of de novo mutations which make each human a distinctive genetic island (at least until they reproduce and pass on their mutational distinctions). Therefore a man who mixes his own genes together in the act of incest is potentially lighting the fuse whereby these hidden malevolent mutants will explode from being cryptic genetic abnormalities toward full-blown disease monstrosities.

One statistic which would register incest would be ROH; naturally when you have long regions of recently IBD chromosomal segments adjacent to each other you’ll have a lot of homozygosity, since the paired alleles are replica copies. Assuming that an individual with many long ROH can survive and reproduce over time these massive swaths of homogeneity will be wiped away by mutation and recombination as well as outbreeding. Incest is still arguably a health disaster, but one can imagine the motive genetic engines of evolutionary variation healing the damage over time.

And it doesn’t have to be so extreme. Father-daughter or sibling incest is only a boundary condition. First cousin marriages aren’t nearly as disastrous, the fecundity of British Pakistanis despite higher rates of genetic abnormalities being clear evidence of this. They are certainly more evolutionarily fit than non-Pakistani Brits, who do not reproduce at the clip of 4 children per family. These clans will exhibit more modest levels of ROH because the coefficient of relationship between cousins is only 1/8, as opposed to 1/2 between parents and children or full siblings.
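The 1/8 and 1/2 figures follow from counting pedigree paths. Below is a small Python sketch under the standard textbook assumption that the ancestors themselves are not inbred. The function name and the way paths are passed in (as a list of step counts) are illustrative choices, not an established API.

```python
from fractions import Fraction


def coefficient_of_relationship(path_lengths):
    """Sum (1/2)^L over every pedigree path linking two relatives through a
    common ancestor, where L is the number of parent-child steps on the path."""
    return sum(Fraction(1, 2) ** steps for steps in path_lengths)


parent_child = coefficient_of_relationship([1])        # a single direct step
full_siblings = coefficient_of_relationship([2, 2])    # one path via each shared parent
first_cousins = coefficient_of_relationship([4, 4])    # one path via each shared grandparent

# Expected inbreeding coefficient of a child is half the parents' relationship,
# assuming the parents are not themselves inbred.
for label, r in [("parent-child", parent_child),
                 ("full siblings", full_siblings),
                 ("first cousins", first_cousins)]:
    print(f"{label}: r = {r}, offspring F = {r / 2}")
```

On these assumptions a child of first cousins is expected to be identical by descent at 1/16 of loci, against 1/4 for a child of a father-daughter or full-sibling pairing, which is why the cousin case shows more modest ROH.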

The figure to the left is from a 2008 paper on ROH in Europeans. Specifically these are Orcadians or part-Orcadians, a population you should be familiar with from the HGDP panel. Orcadians are natives of the Orkney islands just off the north coast of Scotland. Though of somewhat diverse origins, Viking, Scot and Pict, being islanders they’ve developed their own genetic peculiarities because of their isolation. A good rule of thumb is that any body of water is a fearsome barrier to casual gene flow. On the y-axis you see the total number of ROH in the genome of a given individual. I point you to the methods if you are curious as to the exact parameters they specified in their calculation. ROH is assessed over a window of the genome, and naturally one can vary its width, as well as the stringency in registering a particular region as a run or not a run. On the x-axis are the total lengths in terms of base pairs. What you see is a positive correlation between the number of ROH, and the total genomic length of the sequences. Those Orcadians who are genetically more diverse because of non-Orcadian parentage have the least homozygosity in their genomes. Those who are products of recent cousin marriage have the most. But notice a peculiar pattern: there’s a curvilinear trend to the values. In those individuals who presumably have very high inbreeding coefficients the total length of ROH seems to exceed one’s expectation based on just the total number of ROHs. Why? Because they have very long runs of homozygosity indeed. This is just what we’d expect from the sort of process I described earlier, where it takes many generations for the long chromosomal sequences to be broken apart by recombination.

Before I get you too excited about the genetics of European homozygosity, let’s take a wider view. Some of the same researchers who published the paper above have come out with a set of results which survey the world. Genomic Runs of Homozygosity Record Population History and Consanguinity:

The human genome is characterised by many runs of homozygous genotypes, where identical haplotypes were inherited from each parent. The length of each run is determined partly by the number of generations since the common ancestor: offspring of cousin marriages have long runs of homozygosity (ROH), while the numerous shorter tracts relate to shared ancestry tens and hundreds of generations ago. Human populations have experienced a wide range of demographic histories and hold diverse cultural attitudes to consanguinity. In a global population dataset, genome-wide analysis of long and shorter ROH allows categorisation of the mainly indigenous populations sampled here into four major groups in which the majority of the population are inferred to have: (a) recent parental relatedness (south and west Asians); (b) shared parental ancestry arising hundreds to thousands of years ago through long term isolation and restricted effective population size (Ne), but little recent inbreeding (Oceanians); (c) both ancient and recent parental relatedness (Native Americans); and (d) only the background level of shared ancestry relating to continental Ne (predominantly urban Europeans and East Asians; lowest of all in sub-Saharan African agriculturalists), and the occasional cryptically inbred individual. Moreover, individuals can be positioned along axes representing this demographic historic space. Long runs of homozygosity are therefore a globally widespread and under-appreciated characteristic of our genomes, which record past consanguinity and population isolation and provide a distinctive record of the demographic history of an individual’s ancestors. Individual ROH measures will also allow quantification of the disease risk arising from polygenic recessive effects.

Their data set consists of the HGDP sample populations, so you naturally have the broad geographic clusters such as Africa, Europe, West Asia, Central/South Asia, East Asia, Oceania, and the New World. Two big dynamics are superimposed upon each other in the patterns of ROH: “deep history” demographic processes such as bottlenecks and population expansions, and cultural anthropological patterns which we see around us such as cousin marriage within inbred clans. To find the former you need to survey the genome finely. In contrast the latter leaves pretty obvious signs genomically in the form of very long ROH, as well as clusters of recessive diseases.

The first figure shows the distribution of different lengths of ROH by population:


Here’s the take away:

– Oceanians have many short ROH, but as you increase the length of ROH threshold they are not exceptional at all

– The New World samples persist in having a disproportionate number of ROH no matter the length, though the number does drop as you increase the length threshold. This makes sense: the human genome is of finite length and you can only have so many very long ROHs

– The West Asian and Central/South Asian populations seem to have more long ROHs than the other Eurasian or African groups, though they’re not exceptional in the lowest category

– The Africans have the least ROH, especially in the category of very short runs

Before I comment on these patterns in detail, let’s quickly check out the next figure. It looks at Africans only, but divides the sample into those which are hunter-gatherers and those which are agriculturalists.


The hunter-gatherers have more, and longer, ROH than the agriculturalists. Why? The answer in large part explains the geographical patterns as well: the agriculturalists have had a larger long term effective population. Effective population just refers to the proportion of the population which contributes genetically to the next generation. Small effective populations mean a lot of genetic drift because of increased sample variance, and tend to converge upon consanguinity. If your tribe is small enough the only people you may find to marry are your cousins. As I noted above, this will produce long ROH as individuals will have descent through multiple lines from the same ancestor, increasing the probability of autozygosity greatly. The same process explains why West Asians and Central/South Asians are enriched for long ROH relative to other groups excepting Amerindians. Here’s a map from


Many Muslim societies practice cousin marriage, and many Muslims even argue that it is the Islamic practice (Muhammad married one of his cousins among his many wives. Strangely somehow these Muslims don’t argue that it is also the Muslim custom to marry old rich widows, though some do argue for the importance of marrying barely pubescent girls). Additionally, in India many Hindu groups in the South practice consanguineous marriages, including uncle-niece marriage. This is all occurring now, and so produces signatures of long ROH in many families. The final figure breaks down the individuals from selected populations, with again the y-axis being the number of ROH and the x-axis being the total length of the ROH:


The population sets are representative of broader geographic clusters. The Karitiana are from the Amazon, the Mandenka from Senegal, and the Balochi from Pakistan. If you don’t know where the French and Japanese are from, I would ask you never leave a comment on this weblog. Notice that a few French, Mandenka, and Japanese individuals deviate from their main clusters. These are cryptically inbred; perhaps their parents were cousins, or some of their grandparents were cousins. In contrast the Baloch have a wide range in terms of length of ROH; this is typical of populations where a large proportion of individuals are the products of cousin marriage, but many are not. The fact that individuals would exhibit a large variance of expected relatedness between their parents means that their own inbreeding coefficients and the genomic correlates (in this case ROH) would also vary greatly. The same parameter is operative among the Karitiana, an endangered ethnic group which presumably has a small “mate market” available to each individual.

So what about the Papuans? Their cluster is tight, and they don’t have nearly the total length of ROH as the Amazonian tribe. But remember that in the first figure they had many short ROH. A plausible explanation for this is that the Papuans went through an ancient bottleneck, from which they have expanded. The bottleneck increased genetic drift and so generated highly common haplotype blocks which combined to produce runs of homozygosity. But over time these blocks would have disintegrated through mutation and recombination. ROH in the Papuans then is simply a shadow of demographic events past, while ROH in Baloch is evidence of demographic events present.
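The contrast between old and recent events can be put in rough numbers. The sketch below uses a standard textbook approximation, not anything taken from the paper: a segment inherited from a common ancestor g generations back through both parents has passed through about 2g meioses, and with roughly one crossover per 100 cM per meiosis the surviving segments have a mean length of about 100 / (2g) cM. It ignores crossover interference, variation in recombination rate along the genome, and mutation.

```python
def expected_roh_length_cm(generations_to_ancestor):
    """Approximate mean autozygous segment length in centimorgans.

    Assumption: the shared ancestor sits g generations above the individual on
    both the maternal and paternal side, so the segment survived 2 * g meioses,
    each breaking chromosomes at about one crossover per 100 cM.
    """
    meioses = 2 * generations_to_ancestor
    return 100.0 / meioses

# g = 3 corresponds to the child of first cousins (shared great-grandparents).
for g in (3, 10, 100, 1000):
    print(f"ancestor {g} generations back: ~{expected_roh_length_cm(g):.2f} cM")
```

On this approximation a cousin marriage leaves segments averaging tens of centimorgans, while shared ancestry a hundred or more generations old leaves only fragments well under one centimorgan, which is why the short runs have to be hunted for with dense markers.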

These two balancing realities are starkly illustrated in the supplements when you drill down to the South and Central Asian groups. In the figure it is clear that the group with the consistently highest number of ROH is the Kalash. This makes sense. The Kalash are a genetic isolate because they’re traditionally a pagan non-Muslim group isolated in the remote Chitral region of Pakistan. Because Muslims cannot join their tribe, for over a thousand years the gene flow has been unidirectional, as the Kalash convert to Islam and so assimilate into the broader Pakistani society. In contrast the other Pakistani groups have a huge variance in the total amount of ROH. The individuals with the least ROH in both total length and number in the sample are Baloch, Brahui and Makrani, as are some of the individuals with the highest values on these statistics! While the Kalash have been slowly and consistently ground down by the pressure of small population size, the Baloch, Brahui, and Makrani are subject to the hammer-blows of several generations of first cousin marriages in inbred clans. These repeated marriages across the generations rapidly increase the ROH as first cousins may be more closely related to each other genetically than they are anthropologically.

In the pre-genomic era it was simple to calculate inbreeding. Just look at pedigrees. From this you derived the inbreeding coefficient. The key is to remember that the relationships among the sum totality of one’s ancestors were critical in this calculation. In the USA marriages between first cousins occur between individuals whose grandparents are not usually related. But in other societies the generation of the grandparents, and perhaps great-grandparents, may also have been cousins. But pedigrees have limits, and may miss deep ancestry. The figure to the left, from the first paper I referenced, shows the relationship between the proportion of an individual’s ancestry which is identical by descent as calculated by genomic (ROH) methods on the y-axis and by conventional (pedigree) ones on the x-axis. There’s an obvious correlation, but observe the slight bias toward values above the line of best fit, and the fact that the y values are higher than the x. Genomic estimates capture common ancestry which lies outside the purview of conventional genealogy!
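One common way to turn ROH into a number comparable with a pedigree inbreeding coefficient is to divide the summed length of an individual's runs by the length of the genome that was surveyed. The sketch below illustrates that idea only; the autosome length and the run lengths are made-up round figures chosen for the example, not values from either paper.

```python
def f_roh(roh_lengths_bp, autosome_length_bp):
    """Fraction of the surveyed autosomal genome that lies inside ROH."""
    return sum(roh_lengths_bp) / autosome_length_bp

# Hypothetical inputs (assumptions for illustration only).
ASSUMED_AUTOSOME_BP = 2_700_000_000
example_runs_bp = [100_000_000, 68_750_000]

genomic_estimate = f_roh(example_runs_bp, ASSUMED_AUTOSOME_BP)
pedigree_estimate = 1 / 16  # child of first cousins with otherwise unrelated ancestors

print(f"genomic F: {genomic_estimate:.4f}, pedigree F: {pedigree_estimate:.4f}")
```

In real data the genomic figure can exceed the pedigree one because the runs also pick up shared ancestors who lived before the genealogy begins.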

The implications of these patterns are two-fold: first, looking backward toward human history, and second, forward toward biomedical science. Patterns of ROH here are roughly in line with a serial bottleneck model Out of Africa; the further populations are from Africa the more short ROH they have. African populations have the least of these because of their larger long term effective population size, and relative insulation from the bottlenecking process. A shorter term phenomenon is that of consanguineous marriage patterns, whether conscious and culturally normative (as in the Muslim world and parts of South Asia), or due to demographic constraint, as is the case among hunter-gatherers. These two processes together are relevant because of the prominence of recessive diseases within the domain of medical genetics. Clearly very long ROH is a sign of inbreeding, and so a likely higher susceptibility of an individual to a host of ailments. But the authors note that the sum effect of many short ROH may also be problematic, especially due to the fact that these together may form the preponderance of the ROH within the genomes of many populations.

So far I’ve basically alluded to demographic history, and how it shapes the genome through processes which are fundamentally neutral and stochastic. Inbreeding itself can be thought of as a form of super-charged drift, as the long term effective population of a breeding group collapses in on itself. But what about natural selection? I decided to take a closer look at the ROH of Dr. Daniel MacArthur of Genomes Unzipped. One of his longest regions is on Chromosome 2; it is about 2 Mb in length, and runs from position 134606441 to position 136593184. In 23andMe there’s a position which I think might explain this: 136325116. That’s the number for rs4988235 in the 23andMe data file. Variation on this SNP tracks lactase persistence in Europeans. Dr. Daniel MacArthur has the genotype for lactase persistence in the homozygote form. Are we seeing in this long ROH the long haplotype associated with lactase persistence, which rose rapidly in frequency in the last 10,000 years because of natural selection? In general the parameters outlined in the paper satisfy the broad sketch of human history, but there may be interesting detail on the margins left out of the picture.
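The coordinates quoted above are easy to check for yourself. This small sketch uses only the three positions given in the text and confirms the length of the run and whether the lactase persistence SNP falls inside it; the variable names are mine.

```python
# Chromosome 2 coordinates exactly as quoted in the text.
ROH_START, ROH_END = 134_606_441, 136_593_184
RS4988235_POS = 136_325_116

length_bp = ROH_END - ROH_START
inside = ROH_START <= RS4988235_POS <= ROH_END

print(f"run length: {length_bp:,} bp (~{length_bp / 1e6:.2f} Mb)")
print(f"rs4988235 inside run: {inside}")
```

The SNP sits within the run, toward its far end, which is consistent with the lactase persistence haplotype being part of this stretch but does not by itself show that selection produced it.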

Finally, let’s go back to heterozygosity vs. homozygosity. I recently watched the documentary “Is it Better to be Mixed Race?” Setting aside the obvious reality that this sort of program reflects the Zeitgeist of the era (it is rather obvious that a Victorian scientist could have produced a different documentary, even with the same evidence), near the end there is a comparison of ROH across populations and individuals. The comparison was actually done by the research group which published the paper I just reviewed. If you jump to 38 minutes into the film and just watch they’ll lay out the results, but I’ll tell you what they found. They compared two European men, a South Indian woman, and a man whose father was English and mother Nigerian. The European men had expected levels of homozygosity, on the higher end. The South Indian woman had lower levels of aggregate homozygosity. This should be expected, as India is relatively genetically diverse on a pan-Eurasian scale. Finally, the mixed race male had almost no homozygosity to speak of. The principal investigator admitted that out of 5,000 individuals whom he had tested and analyzed this was the most extreme result, and he had to recheck it. Why? Three factors:

– The mother is Nigerian, from a population which is relatively genetically diverse

– The genetic distance between the father and mother is rather high

– Finally, because the man is a first generation hybrid, on all the loci where Africans and Europeans tend to differ he’ll be much more likely to be heterozygous
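The last of these factors is simple to illustrate. The sketch below assumes a single locus with two alleles and random mating within each population; the allele frequencies are invented for the example and do not describe any real marker.

```python
def het_within(p):
    """Chance a child is heterozygous when both parents come from one
    population in which the allele has frequency p."""
    return 2 * p * (1 - p)

def het_between(p_a, p_b):
    """Chance a child is heterozygous when one parent comes from a population
    with allele frequency p_a and the other from one with frequency p_b."""
    return p_a * (1 - p_b) + p_b * (1 - p_a)

# Hypothetical locus where the two populations differ sharply (assumption).
P_A, P_B = 0.9, 0.1

print(f"both parents from A: {het_within(P_A):.2f}")
print(f"both parents from B: {het_within(P_B):.2f}")
print(f"one parent from each: {het_between(P_A, P_B):.2f}")
```

With these made-up frequencies the first generation hybrid is heterozygous far more often than a child of either population alone, and a heterozygous site ends any run of homozygosity passing through it.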

I’ll let the authors have the last word:

Long ROH are a neglected feature of our genome, which we have shown here to be universally common in human populations and to correlate well with demographic history. ROH are, however, only partially predictable from an individual’s background (due to the stochastic nature of inheritance). As well as conferring susceptibility to recessive Mendelian diseases, ROH are also potentially an underappreciated risk factor for common complex diseases, given the evidence for a recessive component in many complex disease traits…they will allow quantification of the risk arising from recessive genetic variants in different populations.

Citation: Mirna Kirin, Ruth McQuillan, Christopher S. Franklin, Harry Campbell, Paul M. McKeigue, & James F. Wilson (2010). Genomic Runs of Homozygosity Record Population History and Consanguinity. PLoS ONE. doi:10.1371/journal.pone.0013996

Image Credit: Allison Stillwell

Razib Khan