The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog

The Pith: You are expected to have about 30 new mutations which differentiate you from your parents. But there is wiggle room around this number, and you may have more or fewer. This number may vary across siblings, and may explain differences across siblings. Additionally, previously used estimates of mutation rates may have been too high by a factor of 2. This may push the “last common ancestor” of many human and human-related lineages back by a factor of 2 in terms of time.

There’s a new letter in Nature Genetics on de novo mutations in humans which is sending the headline writers in the press into a natural frenzy trying to “hook” the results into the X-Men franchise. I implicitly assume most people understand that they all have new genetic mutations specific and identifiable to them. The important issue in relation to “mutants” as commonly understood is that they have salient identifiable phenotypes, not that they have subtle genetic variants which are invisible to us. Another implicit aspect is that phenotypes are an accurate signal or representation of high underlying mutational load. In other words, if you can see that someone is weird in their traits, presumably they are rather strange in their underlying genetics. This is the logic behind models which assume that mutational load has correlates with intelligence or beauty, and these naturally tie back into evolutionary rationales for human aesthetic preferences (e.g., “good genes” models of sexual selection).

Variation in genome-wide mutation rates within and between human families:

J.B.S. Haldane proposed in 1947 that the male germline may be more mutagenic than the female germline…Diverse studies have supported Haldane’s contention of a higher average mutation rate in the male germline in a variety of mammals, including humans…Here we present, to our knowledge, the first direct comparative analysis of male and female germline mutation rates from the complete genome sequences of two parent-offspring trios. Through extensive validation, we identified 49 and 35 germline de novo mutations (DNMs) in two trio offspring, as well as 1,586 non-germline DNMs arising either somatically or in the cell lines from which the DNA was derived. Most strikingly, in one family, we observed that 92% of germline DNMs were from the paternal germline, whereas, in contrast, in the other family, 64% of DNMs were from the maternal germline. These observations suggest considerable variation in mutation rates within and between families.

From what I gather there’s a straightforward reason why the male germline, the genetic information which is transmitted by sperm to a male’s offspring, is more mutagenic: sperm are produced throughout your whole life, and over time replication errors creep in. This is in contrast to a female’s eggs, where the full complement is present at birth. The fact that mutations creep in through sperm is just a boundary condition of how mutations creep into the germline in the first place: errors in the DNA repair process. This is good on rare occasions (in that mutations may actually be fitness enhancing), more often it is bad (in that mutations are fitness detracting), and oftentimes it is neutral. Remember that in terms of function and fitness a large class of mutations don’t have any effect. Consider the fact that 1 out of 25 people of European descent in the general population carries a mutation which can cause cystic fibrosis if it manifests in a homozygote genotype. But the vast majority of cystic fibrosis mutations are present in people who are heterozygotes, and have a conventional functional gene which “masks” the deleterious allele.* And there are many mutations which are silent even in homozygote form (e.g., if there is a change in a base at a synonymous position).

As noted in the letter above until recently estimating mutation rates was a matter of inference. On the broadest canvas one simply looked at differences between two related lineages which had been long separated (e.g., chimpanzee vs. human), and so accumulated many differential mutations, and assayed the differences. It may have been a fine-grained inference in the case of individuals who manifested a disease which exhibited a dominant expression pattern, so that one de novo mutation in the offspring could change the phenotype. For most humans this is thankfully not a major issue, and mutations remain cryptic for most of our lives. But no longer. With cheaper sequencing at some point in the near future most of us will have accurate and precise copies of our genomes available to us, and we will be able to see exactly where we have unique mutations which differentiate us from our parents and our siblings.

In this paper the authors took two “trios,” parent-child triplets, and compared their patterns of genetic variation at the scale of the full genome to a very high level of accuracy. Accuracy obviously matters a great deal when you might be looking for de novo mutations which are going to be counted on the scale of hundreds when base pairs are counted in billions. In the future when we have billions and billions of genomes on file and omnipotent computational tools I suspect there will be all sorts of ways to ascertain “typicality” of regions of your genome, but in this paper the authors naturally compared the parents to the children. If a mutation is de novo it should be underivable from the genetic patterns of the parent. But, sequencing technologies are not perfect, so there’s going to be a high risk for false positives when you are looking for the de novo mutations “in the haystack” (e.g., an error in the read of the offspring can be picked up as a mutation).

So they started with ~3,000 candidate de novo mutations (DNMs) for each family trio after comparing the genomes of the trios, but narrowed it down further experimentally as they filtered out the false positives. You can read the gory details in the supplements, but it seems that they classified each identified candidate as one of: a germline DNM, a non-germline DNM, a variant inherited from the parents, or a false positive call. It turns out that about half of the preliminary DNMs were somatic and about 1% were germline. Remember that the difference is that the germline mutations are going to be passed on to one’s offspring, while the somatic mutations only have impact on one’s physiological fitness over one’s life history. For the purposes of evolution germline mutations are much more important, though over your lifetime somatic mutations are going to be very important as you age.

After the methodological heavy-lifting the results themselves are interesting, albeit of somewhat limited generalizability because you are focusing on two trios only. Before we examine the results here’s a figure which illustrates the study design:

From what I can gather there are two primary findings in this paper:

1) Variance in the sex-mediated nature of DNMs across trios. One of the trios was much closer to expectation, in that the male germline contribution was responsible for the vast majority of DNMs.

2) A more precise estimate of human mutational rates which might have implications for “molecular clock” estimates used in evolutionary phylogenetics.

Here are the findings in a figure which shows the 95% confidence intervals around estimated mutation rates:

CEU refers to the sample of white Utah Mormons commonly used in medical genetics, while YRI refers to Yoruba from Nigeria. Remember, these are two families only. That severely limits the power of the insights which you can draw, but already you see that while the CEU trio shows the expected imbalance between male and female contribution to DNMs, the YRI trio does not. But, both of the trios do suggest a lower mutation rate than found in previous studies which inferred the value from species divergence. Here is the portion which is relevant for human evolution: “These apparently discordant estimates can be largely reconciled if the age of the human-chimpanzee divergence is pushed back to 7 million years, as suggested by some interpretations of recent fossil finds.” I wouldn’t put my money on this quite yet, going by just this one study, but I’ve been hearing that this paper doesn’t come to this number in a scientific vacuum. Other researchers are converging upon a similar recalibration of mutational rates which might push back the time until the last common ancestor of many divergent hominoid and hominin lineages (including modern humans).

Moving the lens back to the present and of more personal genomic relevance:

Mutation is a random process and, as a result, considerable variation in the numbers of mutations is to be expected between contemporaneous gametes within an individual. If modeled as a Poisson process, the 95% confidence intervals on a mean of ~30 DNMs per gamete (as expected from a mutation rate of ~1 × 10^−8) ranges from 20 to 41, which is a twofold difference. Truncating selection might act to remove the most mutated gametes and thus reduce this variation among gametes that successfully reproduce, however, any additional heterogeneity in stem-cell ancestry or environment (for example, variation in the number of cell divisions leading to contemporaneous gametes) would likely increase inter-gamete variation in the number of mutations.
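
The Poisson range quoted here can be checked with a few lines of Python. This is only a minimal sketch, not the authors' code: the function name poisson_quantile is made up for illustration, and the only number taken from the paper is the mean of ~30 DNMs per gamete.

```python
import math

def poisson_quantile(p, lam):
    """Smallest k such that P(X <= k) >= p for X ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)   # P(X = 0)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k     # P(X = k) computed from P(X = k - 1)
        cdf += pmf
    return k

MEAN_DNMS = 30  # assumed mean number of DNMs per gamete, as quoted

lower = poisson_quantile(0.025, MEAN_DNMS)
upper = poisson_quantile(0.975, MEAN_DNMS)
print(lower, upper)
```

Under these assumptions the central 95% of the distribution runs from 20 to 41 mutations, which matches the range given in the letter.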

Using the much smaller marker set obtained from 23andMe I found that two of my siblings are nearly 3 standard deviations apart in identity-by-descent when it comes to the distribution of full-siblings. In the near future we might be able to ascertain the realized, not just theoretical, extent of mutational load across a family. As noted by the authors much of this might be a function of paternal age. Rupert Murdoch has children who are younger than many of his grandchildren, so there are many, many, “natural experiments” out there, as males are having offspring over 40 years apart.

On a societal level we may be able to estimate the exact cost in terms of public health costs of rising mean age of fathers. Personally we may also be able to note the correlations within families between high levels of DNMs and traits of interest such as intelligence and beauty. Compared to more fine-grained tools of ancestry inference I presume this is going to be dynamite. But it isn’t as if we didn’t know siblings varied before.

Citation: Donald F Conrad, Jonathan E M Keebler, Mark A DePristo, Sarah J Lindsay, Yujun Zhang, Ferran Casals, Youssef Idaghdour, Chris L Hartl, Carlos Torroja, Kiran V Garimella, Martine Zilversmit, Reed Cartwright, Guy A Rouleau, Mark Daly, Eric A Stone, Matthew E Hurles, & Philip Awadalla (2011). Variation in genome-wide mutation rates within and between human families Nature Genetics DOI: 10.1038/ng.862

* In a random mating population the proportions are defined by the Hardy-Weinberg Equilibrium, p² + 2pq + q² = 1, so where q = 0.04 (and p = 0.96), q² = 0.0016 and 2pq = 0.0768. Heterozygote genotypes of CF outnumber homozygote ones roughly 50 to 1 (0.0768 / 0.0016 = 48).
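
The footnote's arithmetic can be reproduced directly. The snippet below is a small illustrative sketch; the function name hardy_weinberg is hypothetical, and q = 0.04 is simply the value used in the footnote.

```python
def hardy_weinberg(q):
    """Genotype frequencies (p^2, 2pq, q^2) for a two-allele locus under random mating."""
    p = 1.0 - q
    return p * p, 2.0 * p * q, q * q

# q = 0.04 is the allele frequency used in the footnote
non_carriers, carriers, affected = hardy_weinberg(0.04)
ratio = carriers / affected

print(round(carriers, 4), round(affected, 4), round(ratio, 1))
```

With q = 0.04 the heterozygote-to-homozygote ratio comes out at 48, which is where the "roughly 50 to 1" figure comes from.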

Bloggy addendum: The first author of this letter is Don Conrad who is a contributor to Genomes Unzipped.

(Republished from Discover/GNXP by permission of author or representative)

The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings with the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.

Humans are diploid organisms. We have two copies of each gene, one inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability.

To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can reasonably assume then that on the locus which controls the blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” allele. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that on one of those slots the individual has only the possibility of having a brown eye allele. Their probability of variation then is operative only on the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote.

The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation.
Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expectation, as theory predicts, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.
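
The one-locus toy example can be worked through by brute-force enumeration. The sketch below is illustrative only: the allele labels "b" (blue) and "B" (brown) and the helper names are made up for the example, and it assumes simple Mendelian segregation with each allele equally likely to be passed on.

```python
from fractions import Fraction
from itertools import product

def offspring_genotypes(parent_a, parent_b):
    """All equally likely offspring genotypes: one allele drawn from each parent."""
    return [tuple(sorted(pair)) for pair in product(parent_a, parent_b)]

def prob_heterozygote_grandchild(gp1, gp2, gp3, gp4):
    """Exact probability that a grandchild is heterozygous at the locus."""
    hits = total = 0
    for parent1 in offspring_genotypes(gp1, gp2):          # child of gp1 x gp2
        for parent2 in offspring_genotypes(gp3, gp4):      # child of gp3 x gp4
            for child in offspring_genotypes(parent1, parent2):
                total += 1
                hits += child[0] != child[1]
    return Fraction(hits, total)

swedish = ("b", "b")   # assumed homozygous for the blue-eye allele
chinese = ("B", "B")   # assumed homozygous for the brown-eye allele

print(prob_heterozygote_grandchild(swedish, chinese, chinese, chinese))
```

The enumeration returns 1/2, the 50% figure from the example: the only slot that can vary is the one inherited from the heterozygous parent.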

We know now that the human genome consists of about 3 billion base pairs of A, G, C, and T. In the oldest classical evolutionary genetic models each of these base pairs can be conceived to be inherited independently from the other. In other words, evolution is a game of independent probabilities. But this idealization is not the concrete reality. To the left is a visualization of a human male karyotype, the set of 23 chromosomal pairs which the human genome (excluding the mtDNA) manifests as. Because the ~3 billion aforementioned base pairs have a physical position within these chromosomes the reality is that some are inherited together. That is, their inheritance patterns are associated due to their physical linkage. The karyotype you see is clearly diploid. Each chromosome is divided into two symmetrical homologs, one inherited from each parent (except 23, the sex chromosomes). The chromosomal numbers also correspond roughly to a rank order of size. To give you a sense of the gap, chromosome 1 has 250,000,000 bases and 4,200 genes, while chromosome 22 has 50,000,000 bases and 1,100 genes (the Y chromosome has a paltry 450 genes, as opposed to the 1,800 on the X).

In the toy example above the eye color locus is on a chromosome. Specifically, chromosome 15. Each individual will inherit one copy of 15 from each of their parents. But, there is no guarantee that each sibling will inherit the same copy from the generation of the grandparents. Let’s illustrate this schematically. Below you see the four combinations possible in relation to the chromosomes inherited by an individual’s parents from their own parents. Here “paternal” and “maternal” are defined from the perspective of the parental generation, so there are two of each. In each cell the first entry is the copy passed on by the father and the second is the copy passed on by the mother.

Possible outcomes of combinations from grandparents
(rows: copy passed on by the father; columns: copy passed on by the mother)

                  | Paternal            | Maternal
Father Paternal   | Paternal, Paternal  | Paternal, Maternal
Father Maternal   | Maternal, Paternal  | Maternal, Maternal

The outcomes are as follows:

Top-left cell: paternal grandfather’s chromosome + maternal grandfather’s chromosome
Top-right cell: paternal grandfather’s chromosome + maternal grandmother’s chromosome
Bottom-left cell: paternal grandmother’s chromosome + maternal grandfather’s chromosome
Bottom-right cell: paternal grandmother’s chromosome + maternal grandmother’s chromosome

As an example, if on chromosome 15 two siblings were characterized by the top-left cell, we might say that they were 100% “identical-by-descent” (IBD). This just means that their genes came down from the exact same ancestors. On the other hand, if one sibling was characterized by the top-left cell, and another the bottom-right, then they would be 0% IBD! In other words, in theory with this model siblings could be 0% IBD on the autosomal chromosomes if they kept inheriting different homologs from their grandparents, chromosome by chromosome (This would not be possible for chromosome 23. Males by necessity inherit the same Y from their father. While two females must share the same X from their father).
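
To make the no-recombination model concrete, one can enumerate every pair of cells that two siblings might land in for a single chromosome. This is a sketch under the simplifying assumption stated in the text (chromosomes passed down intact); the labels PGF, PGM, MGF and MGM for the four grandparental copies are made up for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

FATHER_COPIES = ("PGF", "PGM")  # father's copies: from paternal grandfather / grandmother
MOTHER_COPIES = ("MGF", "MGM")  # mother's copies: from maternal grandfather / grandmother

# The four cells of the table: (copy from father, copy from mother)
cells = list(product(FATHER_COPIES, MOTHER_COPIES))

def ibd_fraction(sib_a, sib_b):
    """Fraction of the two homologs that the siblings share identical-by-descent."""
    shared = sum(x == y for x, y in zip(sib_a, sib_b))
    return Fraction(shared, 2)

# Every ordered pair of cells is equally likely for two siblings
counts = Counter(ibd_fraction(a, b) for a, b in product(cells, repeat=2))
total = sum(counts.values())
probs = {share: Fraction(n, total) for share, n in counts.items()}

for share in sorted(probs):
    print(share, probs[share])
```

For one unrecombined chromosome the siblings are 0% IBD a quarter of the time, 50% IBD half the time, and 100% IBD a quarter of the time, so the expectation is still 0.50 but the spread is wide.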

If you have a background in biology, you know this is wrong, because there’s more to the story. Recombination means that in fact you don’t invariably inherit intact copies of your grandparents’ chromosomes. Rather, during meiosis, an individual’s chromosomes often “mix & match” their strands so that new mosaics are formed. So instead of inheriting homologous chromosomes which resemble exactly those carried by their grandparents, individuals often have chromosomes which are a mosaic of maternal and paternal due to the two meiosis events which intervened (one during the formation of the gametes which led to one’s parents, and another during the formation of the gametes of their parents). If you are still confused, the following 3 minute instructional video may help. The narration has information, so if you can’t listen, the blue = paternal chromosomal segments, and the red = maternal chromosomal segments. Focus especially on recombination, about halfway through the video.

This process works against the conditional dependence of inheritance of variants due to physical linkage on the same chromosomal regions. In other words, though it is still theoretically possible with no recombination for siblings to be very different, realistically recombination breaks apart many of the associations and reduces the realized variance. In the figure above the low-bound outliers in terms of genetic relatedness across sibling pairs are about midway between the coefficients of relatedness of half-siblings (0.25) and full-siblings (0.50), at ~0.35 or so (the high bounds are ~0.65).

At any given locus the variance of IBD for siblings is 1/8. Since expectation is ~0.50, you can infer from this that on a specific gene there’s a lot of deviation across a cohort of siblings. This makes sense when you consider that siblings differ a great deal on single gene Mendelian traits. But what about the whole genome? Because now you have many more “draws” the “law of large numbers” tends to reduce the variance. The figure to the right shows the standard deviation of IBD by chromosome. Remember that expectation is ~0.50. Observe that longer chromosomes have lower deviations. This is due to the variation of rates of recombination across the genome. We’ve come a long way from an abstract Mendelian model, to the point where one can integrate an understanding of differences of rates of recombination across regions of the genome into the model. The total genome standard deviation of IBD turns out to be 0.036, which is close to older theoretical models which predicted ~0.04. This means that if you randomly drew two full-siblings and compared the extent of total genome IBD, they would typically differ from 0.50 by about 0.036. Assuming a normal distribution that means that ~68% of siblings would fall within the interval 0.464 to 0.536 coefficient of relatedness. About 95% would fall within two standard deviations, 0.428 to 0.572. About 99.7% would fall within three standard deviations, 0.392 to 0.608.
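
The numbers in this paragraph follow from simple arithmetic, sketched below. Two things are assumptions of the sketch rather than anything taken from the paper's code: the single-locus sharing probabilities (1/4, 1/2, 1/4 for sharing 0, 1 or 2 alleles IBD, the standard Mendelian values) and the normal approximation for genome-wide sharing. The standard deviation of 0.036 is the figure quoted from the paper.

```python
import math
from fractions import Fraction

# Assumed single-locus IBD sharing for full siblings: proportion shared -> probability
locus_dist = {
    Fraction(0): Fraction(1, 4),
    Fraction(1, 2): Fraction(1, 2),
    Fraction(1): Fraction(1, 4),
}
mean = sum(x * p for x, p in locus_dist.items())
variance = sum((x - mean) ** 2 * p for x, p in locus_dist.items())
print(mean, variance)

GENOME_SD = 0.036  # whole-genome standard deviation of sibling IBD, as quoted

def normal_coverage(k):
    """Probability mass of a normal distribution within k standard deviations."""
    return math.erf(k / math.sqrt(2))

def interval(k, centre=0.5, sd=GENOME_SD):
    """Bounds of the interval centre +/- k standard deviations, rounded to 3 places."""
    return round(centre - k * sd, 3), round(centre + k * sd, 3)

for k in (1, 2, 3):
    print(k, interval(k), round(normal_coverage(k), 4))
```

The first line of output confirms the single-locus mean of 1/2 and variance of 1/8; the loop prints the one, two and three standard deviation intervals around 0.50 together with the share of sibling pairs expected inside each under the normal approximation.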

The paper from which I’m drawing the figures and statistics is Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. The citations, as well as follow-up papers are very interesting. It shows how modern genomics is literally swallowing whole the insights of classical quantitative genetics. Nature is one, and abstractions ultimately map onto the concrete. I’d long thought I should review this paper and its insights, as comparisons across siblings are likely going to be a future avenue of understanding the genetic basis of many traits. But I have a more personal reason for looking into this issue.

This week many of my family members came “online” to the 23andMe system. To review:

RF = Father
RM = Mother
RS1 = Sibling 1 (female)
RS2 = Sibling 2 (male)

Later to come will be RS3, another male. But his data has not loaded….

23andMe has many features related to disease risk and ancestry information. The former was not of great interest to me, as my family is large enough that I had a good sense of what we were at risk for. 23andMe told me that I was at more risk for various ailments which are common across my extended pedigree. It also told me I was at more risk for ailments which are not known in my family. And, it told me I was at less risk for ailments common across my extended pedigree. Finally, it told me I was at less risk for ailments not common across my pedigree. You get the picture. For most people there isn’t much value-add here. I haven’t even touched the issue of “odds ratios”.

In regards to ancestry, I have received some value. I suspect I’m near the end of the line in this area, unless I get into some serious DIY genetics. My involvement in the Harappa Ancestry Project is more about understanding regional patterns of variation than that of my own family.

So we’re at the next stage: looking at patterns in my own family. The screenshot you see above is from the ‘family inheritance’ feature, and shows the IBD between RS2 and RF chromosome by chromosome. My male sibling and my father. As you can see they are “half-identical” across the whole genome, as they should be. Of each gene my father contributes one copy on the autosomes. There’s no variance here. The total 2.86 GB value is also what you’d expect: there are ~3 billion base pairs, and you’re excluding the X and Y, as well as “no calls.” I can tell you that I exhibit the exact same relationship to my father as my brother. In contrast, my sister has more segments shared. That’s because she has an X chromosome from my father. The relationship to our mother is also as expected. We’re all equally related to our parents, once you account for sex differences on chromosome 23.

Below are the screenshots from family inheritance comparing the three siblings in terms of our genomes. Remember that half-identical (light blue) has half the weight as full-identical (dark blue).

[Screenshot gallery: ‘family inheritance’ comparisons among the three siblings]

Here’s the top-line. I share about the same length of segments that are half-identical to both RS1 and RS2, 2.26 and 2.27 GB. But, while I have 0.60 full-identical with RS1, I have 0.86 full-identical with RS2. And here’s the even more surprising part: RS1 and RS2 have much less in common than I do with either of them. 2.09 GB half-identical, and 0.5 full-identical.

But that’s not all. 23andMe has a “relative finder” feature. Its main goal is to find relatives you don’t know about. I don’t have any non-close relatives so far, in contrast to most others from what I have heard. It may be that most of the Bangladeshis in the database are from my own immediate family! (Though there are some Indian Bengalis, I’ve found only one other Bangladeshi in the database to “share” genes with.) You can though include your own family in the mix. You get two different values, % of DNA shared, and # of shared segments. The former basically seems to be a proxy for IBD. I have a person of European American ancestry on my account, and they have many “relatives” matched with whom they share 0.1-1% of their genome. One individual who asked for a contact did turn out to be a very distant cousin (his surname was the same as that of a grandparent). In any case, the matrix above shows the results so far for my family. My parents are not related; they share no segments or DNA IBD. In contrast, we are all about 50% IBD with our parents (remember that father contributes no X chromosome to sons). But look at the sibling comparisons. In particular, RS1 & RS2 share only 42% of their DNA! This aligns with the earlier results. RS1 and I are a bit closer than expectation. RS2 and I are a bit more distinct. Interestingly, while RS2 and I have 49 segments in common, RS1 and RS2 have 55 in common. Why the discrepancy? Presumably RS1 and RS2 load up on the number of segments on smaller chromosomes. This seems clear in the images above.

Where does this leave us? We know intuitively that siblings differ, and cluster, in their traits. These data and methods illustrate how in the near future parents may be able to determine which siblings cluster on the total genome content level! As I have stated before, RS2 and I in particular resemble each other physically, far more than either of us resembles RS1. Could this relate to what we’ve found genomically? I believe so. Physical appearance is controlled by many different variants across many different genes, so the phenotype may be a good reflection of the character of the total genome. This can be generalized to other quantitative traits.

Finally, this has clear implications for our study of genetic inheritance within families. Classical genetic techniques had to assume that the coefficient of relatedness between siblings was 0.50. The deviation from this expectation would have introduced errors into estimates of heritability and possibly masked the understanding of the genetic architecture of a trait. But now we can correct for deviations from the 0.50 value, and so better understand the genetic basis of complex traits such as behavior.

Citation: Visscher, P., Medland, S., Ferreira, M., Morley, K., Zhu, G., Cornes, B., Montgomery, G., & Martin, N. (2006). Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings PLoS Genetics, 2 (3) DOI: 10.1371/journal.pgen.0020041

(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"