The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog
Quantitative Genomics


In earlier discussions I’ve been skeptical of the idea of “designer babies” for many traits which we may find of interest in terms of selection. For example, intelligence and height. Why? Because variation on these traits seems highly polygenic and widely distributed across the genome. Unlike cystic fibrosis (Mendelian recessive) or blue eye color (quasi-Mendelian recessive) you can’t just focus on one genomic region and then make a prediction about phenotype with a high degree of certainty. Rather, you need to know thousands and thousands of genetic variants, and we just don’t know them.

But I just realized one way that genomics might make it a little easier even without this specific information.

The method relies on the phenotypic correlation between relatives. Even before genomics, and genetics, biometricians could generate rough & ready predictions about phenotypic values based on parental values. The extent of the predictive power depends upon the heritability of the trait. A trait like height is ~80-90% heritable. That means that ~80-90% of the variation in the population of the trait is due to genes. The expected value of your height is strongly conditional upon the heights of your parents.
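The kind of rough & ready prediction described above can be sketched with the standard regression of offspring on the midparent value, where the slope is the heritability. This is a minimal sketch, not a method taken from the post: the function name, the population mean, and the parental heights are all hypothetical, and it assumes purely additive inheritance and ignores sex differences in height.

```python
def predict_offspring(father, mother, pop_mean, h2):
    """Expected offspring trait value given both parents.

    Uses the midparent regression: the child's expected deviation from the
    population mean is h2 times the midparent's deviation.
    """
    midparent = (father + mother) / 2.0
    return pop_mean + h2 * (midparent - pop_mean)


# Hypothetical heights in cm; h2 = 0.8 is the low end of the range quoted above.
expected = predict_offspring(father=190.0, mother=180.0, pop_mean=175.0, h2=0.8)
print(f"expected offspring height: {expected:.1f} cm")

# With h2 = 0 the parents tell you nothing: the prediction is the population mean.
print(predict_offspring(190.0, 180.0, pop_mean=175.0, h2=0.0))
```

The higher the heritability, the less the prediction regresses toward the population mean, which is why the predictive power depends on it.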

That’s all common sense. What does this have to do with genomics? Simple. You are 50% identical by descent with each parent. That means half your gene copies come from your mother and half from your father. You can’t change that unless you’re a clone. But, because of the law of segregation and recombination, you are not necessarily 25% identical by descent with each grandparent! The expectation is that your coefficient of relatedness is 25%, but there is variation around this. For each chromosome, a given parent contributes either their own paternal or their own maternal homolog. There’s a 50% chance that you’re going to inherit one or the other, independently for each chromosome. You have 22 autosomal chromosome pairs (non-sex chromosomes), so there’s a strong chance that you won’t be equally balanced between the two grandparents on a given side (e.g., you have more genes identical by descent from your paternal grandfather than paternal grandmother).* Second, recombination is also going to generate new combinations. In the generation we’re concerned about this will work against the dynamic we’re relying on, by swapping segments between the homologous chromosomes a parent inherited from their own mother and father.

The ultimate logic here is to select for zygotes or gametes which are biased toward the grandparents with phenotypic values which you are interested in. To give a concrete example, if you have a parent who is moderately tall, whose own father was very tall, while the mother was somewhat short, and you want the tallest possible child, you’ll want to select zygotes with the most gene content identical by descent with the tall grandparent. The point isn’t to pick specific genetic variants, you don’t need to know that. All you know is that the tall grandfather probably had genes which resulted in a predisposition toward being tall. So just make sure that the grandchild has as much of that grandparent “in them.”

I still don’t know if this is going to be cost effective in the near term. But I began to think of it because in the near future I’ll be checking the genotype of a child who has a full pedigree of 1,000,000 SNPs of their parents and grandparents.

* Modeling it as a binomial, about 1 in 6 cases will have the expected 11 chromosomes from a focal grandparent. The standard deviation is more than 2 chromosomes. You need to have about 100 zygotes to expect to get any individuals who are 5 chromosomal units away from the expected value (i.e., the individual is 10-15% one grandparent instead of 25%, or 35-40%). Obviously you need more to be assured of getting zygotes of that value. And I neglected recombination, which would work against this, by swapping genomic regions….
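The binomial model in the footnote can be checked directly. The sketch below treats each of the 22 autosomes passed on by one parent as an independent coin flip between that parent's two parents, and, like the footnote, ignores recombination. The function names are my own, chosen for illustration.

```python
from math import comb, sqrt

N_CHROM = 22  # autosomes transmitted by one parent; recombination is ignored


def pmf(k, n=N_CHROM):
    """P(exactly k of the n chromosomes trace to the focal grandparent)."""
    return comb(n, k) / 2 ** n


def prob_at_least_away(d, n=N_CHROM):
    """P(a zygote is at least d chromosomes away from the expected n/2)."""
    return sum(pmf(k, n) for k in range(n + 1) if abs(k - n / 2) >= d)


sd = sqrt(N_CHROM * 0.5 * 0.5)  # binomial standard deviation, in chromosomes

print(f"P(exactly 11 of 22)       = {pmf(11):.3f}")
print(f"standard deviation        = {sd:.2f} chromosomes")
print(f"P(5 or more away from 11) = {prob_at_least_away(5):.3f}")

# Share of the child's whole autosomal genome (44 chromosomes) that a count
# of k out of 22 from the focal grandparent corresponds to.
for k in (6, 11, 16):
    print(f"{k:2d} of 22 -> {k / (2 * N_CHROM):.1%} of the genome")
```

Dividing 1 by any of these probabilities gives the number of zygotes you would need, on average, to see one such outcome under this simplified model.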

(Republished from Discover/GNXP by permission of author or representative)

A new paper in Molecular Psychiatry has been reported on extensively in the media, and readers have mentioned it several times in the comments. I read it. It’s titled Genome-wide association studies establish that human intelligence is highly heritable and polygenic. But the fact is that I read this paper last year. Back then it was titled Common SNPs explain a large proportion of the heritability for human height. I kid, but you get the picture. The new paper establishes for intelligence what we already suspected: most of the genetic variation in this heritable trait is accounted for by numerous genes of small effect. You inherit variants of these numerous genes from your two parents, and your own trait value is to a large extent a combination of the parental values. The issue is not if intelligence is heritable, but the extent of that heritability.

The standard way to estimate human heritability was to track similarities across individuals with varying degrees of relatedness. For example, compare identical twin correlations on a trait with fraternal twin correlations. The main objection to these methods is that one could argue that environmental factors were correlated with particular genetic relationships (e.g., you treat individuals who are presumed identical twins more similarly). There are many reasons that I’m skeptical of extreme objections in this vein, but they are out there. This particular experimental design sidesteps that issue by looking at unrelated individuals. Not just notionally unrelated individuals, but actually those who were not genomically related. That’s a key difference between quantitative genetics and quantitative genomics. The former takes biological relatedness at face value, translating from ideal categories. The relatedness between full siblings is 0.50 for example. But when you look at the genomic level you can account and correct for the variation of that relatedness amongst siblings (e.g., two of my siblings exhibit a relatedness of only 0.42)! In this study they focused on numerous widely dispersed single nucleotide polymorphisms (SNPs), specific variants within genes, and used these to infer the nature of the genetic architecture of intelligence. More specifically, the genetic variation of two forms of intelligence, crystallized and fluid. The former seems to correspond to knowledge and the latter to raw problem solving abilities. Perhaps the difference between having an excellent operating system and applications vs. top of the line hardware?

In any case, here’s their abstract:

General intelligence is an important human quantitative trait that accounts for much of the variation in diverse cognitive abilities. Individual differences in intelligence are strongly associated with many important life outcomes, including educational and occupational attainments, income, health and lifespan. Data from twin and family studies are consistent with a high heritability of intelligence, but this inference has been controversial. We conducted a genome-wide analysis of 3511 unrelated adults with data on 549 692 single nucleotide polymorphisms (SNPs) and detailed phenotypes on cognitive traits. We estimate that 40% of the variation in crystallized-type intelligence and 51% of the variation in fluid-type intelligence between individuals is accounted for by linkage disequilibrium between genotyped common SNP markers and unknown causal variants. These estimates provide lower bounds for the narrow-sense heritability of the traits. We partitioned genetic variation on individual chromosomes and found that, on average, longer chromosomes explain more variation. Finally, using just SNP data we predicted ∼1% of the variance of crystallized and fluid cognitive phenotypes in an independent sample (P=0.009 and 0.028, respectively). Our results unequivocally confirm that a substantial proportion of individual differences in human intelligence is due to genetic variation, and are consistent with many genes of small effects underlying the additive genetic influences on intelligence.

The authors suggest that these values are a floor to heritability estimates, at least with the sorts of homogeneous populations they have here. That’s because their statistical genetic method is likely to miss a lot of true genetic variance due to its diminishing power when the causal genes are at too low of a frequency. They’re working within a framework where a given typed marker is correlated with a nearby I.Q. causal marker. At very low correlations they are going to miss the causal variant.

Some of the psychologists interviewed by the media contended that on one level these were banal findings. A value in the range they report is entirely within the mainstream of behavior genetic studies, which use pedigrees and what not. But many people don’t trust behavior genetics for whatever reason. One person’s banality is another person’s profundity.

But I think these sorts of findings should tilt us away from the proposition that large effect quantitative trait loci are common for I.Q. By this, I mean an “I.Q. gene” which is responsible for a huge difference between two people. There are some of these no doubt, especially those which result in mental retardation, but they don’t play that much of a role in all likelihood in ‘normal’ variation. Earlier linkage studies which reported such genes made huge media splashes and tended to fade because of lack of replication. Those genes may actually have been real QTLs, but the huge effect was likely to have been a random chance occurrence. Genome-wide association is better able to detect smaller effect genes within populations, but even it has been notably lacking in robust results.

Overall, this is good science. The results aren’t what those of us who were hoping that the intersection between psychometrics and genomics would yield low hanging fruit were pulling for. But reality is often likely to dash our hopes. No matter the banality of the final results, I do think the figure to the left is rather cool. It shows that the larger the chromosome, the greater the proportion of variance explained by that chromosome. This is entirely expected in theory (large chromosomes would carry more causal variants), but it is gratifying to still see it borne out empirically.

(Republished from Discover/GNXP by permission of author or representative)

The Pith: In this post I examine how looking at genomic data can clarify exactly how closely related siblings really are, instead of just assuming that they’re about 50% similar. I contrast this randomness among siblings to the hard & fast deterministic nature of parent-child inheritance. Additionally, I detail how the idealized spare concepts of genetics from 100 years ago are modified by what we now know about how genes are physically organized, and reorganized. Finally, I explain how this clarification allows us to potentially understand with greater precision the nature of inheritance of complex traits which vary within families, and across the whole population.

Humans are diploid organisms. We have two copies of each gene, one inherited from each parent (the exception here is for males, who have only one X chromosome inherited from the mother, and lack many compensatory genes on the Y chromosome inherited from the father). Our own parents have two copies of each gene, one inherited from each of their parents. Therefore, one can model a grandchild from two pairs of grandparents as a mosaic of the genes of the four ancestral grandparents. But, the relationship between grandparent and grandchild is not deterministic at any given locus. Rather, it is defined by a probability. To give a concrete example, consider an individual who has four grandparents, three of whom are Chinese, one of whom is Swedish. Imagine that the Swedish individual has blue eyes. One can reasonably assume then that on the locus which controls blue vs. non-blue eye color difference one of the grandparents is homozygous for the “blue eye” allele, while the other grandparents are homozygous for the “brown eye” alleles. What is the probability that any given grandchild will carry a “blue eye” allele, and so be a heterozygote? Each individual has two “slots” at a given locus. We know that on one of those slots the individual has only the possibility of having a brown eye allele. Their probability of variation then is operative only on the other slot, inherited from the parent whom we know is a heterozygote. That parent in their turn may contribute to their offspring a blue eye allele, or a brown eye allele. So there is a 50% probability that any given grandchild will be a heterozygote, and a 50% probability that they will be a homozygote. The above “toy” example on one locus is to illustrate that the variation that one sees among individuals is in part due to the fact that we are not a “blend” of our ancestors, but a combination of various discrete genetic elements which are recombined and synthesized from generation to generation.
Each sibling then can be conceptualized as a different “experiment” or “trial,” and their differences are a function of the fact that they are distinctive and unique combinations of their ancestors’ genetic variants. That is the most general theory, without any direct reference to proximate biophysical details of inheritance. Pure Mendelian abstraction as a formal model tells us that reproductive events are discrete sampling processes. But we live in the genomic age, and as you can see above we can measure the variation in genetic relationships among siblings today in an empirical sense. The expectation, as theory predicts, is 0.50, but there is variance around that expectation. It is not likely that all of your siblings are “created equal” in reference to their coefficient of genetic relationship to you.
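The toy eye-color example can be enumerated explicitly. This is a minimal sketch under the same simplifying assumption as the text (a single locus with two alleles). The allele labels "b" (blue) and "B" (brown) and the helper names are my own.

```python
from fractions import Fraction
from itertools import product

# Grandparental genotypes in the toy example.
swedish_grandparent = ("b", "b")  # homozygous "blue eye"
chinese_grandparent = ("B", "B")  # homozygous "brown eye"


def offspring_genotypes(parent1, parent2):
    """All equally likely genotypes: one allele drawn from each parent."""
    return [tuple(sorted(pair)) for pair in product(parent1, parent2)]


def distribution(genotypes):
    """Map each distinct genotype to its probability."""
    total = len(genotypes)
    return {g: Fraction(genotypes.count(g), total) for g in set(genotypes)}


# The parent with one Swedish and one Chinese parent is always a heterozygote.
mixed_parent_dist = distribution(
    offspring_genotypes(swedish_grandparent, chinese_grandparent)
)
print(mixed_parent_dist)

# The other parent has two Chinese parents, so is homozygous brown.
mixed_parent = ("B", "b")
other_parent = ("B", "B")
grandchild_dist = distribution(offspring_genotypes(mixed_parent, other_parent))
print(grandchild_dist)  # heterozygote and homozygote are equally likely
```

Each grandchild is a separate draw from `grandchild_dist`, which is the sense in which siblings are independent "trials."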

We know now that the human genome consists of ~3 billion base pairs of A, G, C, and T. In the oldest classical evolutionary genetic models each of these base pairs can be conceived to be inherited independently from the other. In other words, evolution is a game of independent probabilities. But this idealization is not the concrete reality. To the left is a visualization of a human male karyotype, the set of 23 chromosomal pairs which the human genome (excluding the mtDNA) manifests as. Because the ~3 billion aforementioned base pairs have a physical position within these chromosomes the reality is that some are inherited together. That is, their inheritance patterns are associated due to their physical linkage. The karyotype you see is clearly diploid. Each chromosome comes as a pair of homologs, one inherited from each parent (except 23, the sex chromosomes). The chromosomal numbers also correspond roughly to a rank order of size. To give you a sense of the gap, chromosome 1 has 250,000,000 bases and 4,200 genes, while chromosome 22 has 50,000,000 bases and 1,100 genes (the Y chromosome has a paltry 450 genes, as opposed to the 1,800 on the X).

In the toy example above the eye color locus is on a chromosome. Specifically, chromosome 15. Each individual will inherit one copy of 15 from each of their parents. But, there is no guarantee that each sibling will inherit the same copy from the generation of the grandparents. Let’s illustrate this schematically. Below you see the four combinations possible in relation to the chromosomes inherited by an individual’s parents from their own parents. So “paternal” and “maternal” here are in reference to the parental generation, so there are two of each. The ones inherited from the parental mother I’ve italicized.

Possible outcomes of combinations from grandparents

                       Mother: Paternal       Mother: Maternal
Father: Paternal       Paternal + Paternal    Paternal + Maternal
Father: Maternal       Maternal + Paternal    Maternal + Maternal

The outcomes are as follows:

Top-left cell: paternal grandfather’s chromosome + maternal grandfather’s chromosome
Top-right cell: paternal grandfather’s chromosome + maternal grandmother’s chromosome
Bottom-left cell: paternal grandmother’s chromosome + maternal grandfather’s chromosome
Bottom-right cell: paternal grandmother’s chromosome + maternal grandmother’s chromosome

As an example, if on chromosome 15 two siblings were characterized by the top-left cell, we might say that they were 100% “identical-by-descent” (IBD). This just means that their genes came down from the exact same ancestors. On the other hand, if one sibling was characterized by the top-left cell, and another the bottom-right, then they would be 0% IBD! In other words, in theory with this model siblings could be 0% IBD on the autosomal chromosomes if they kept inheriting different homologs from their grandparents, chromosome by chromosome (this would not be possible for chromosome 23: males by necessity inherit the same Y from their father, while two females must share the same X from their father).
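The four-cell model above can be enumerated for a pair of siblings on a single chromosome. This sketch assumes, as the model does at this point, that chromosomes are passed on intact with no recombination. The labels "P" and "M" (the parent's own paternal or maternal homolog) and the helper names are my own.

```python
from fractions import Fraction
from itertools import product

# A child's state on one chromosome: (homolog from father, homolog from mother),
# each being that parent's own paternal "P" or maternal "M" copy.
cells = list(product("PM", repeat=2))  # the four cells of the table


def ibd_fraction(sib_a, sib_b):
    """Fraction of the two inherited homologs that two siblings share IBD."""
    return Fraction(sum(x == y for x, y in zip(sib_a, sib_b)), 2)


# All 16 equally likely (sibling A cell, sibling B cell) combinations.
pairs = list(product(cells, repeat=2))
dist = {}
for a, b in pairs:
    share = ibd_fraction(a, b)
    dist[share] = dist.get(share, 0) + Fraction(1, len(pairs))

mean = sum(share * p for share, p in dist.items())
variance = sum((share - mean) ** 2 * p for share, p in dist.items())
print("IBD distribution:", dist)
print("mean:", mean, "variance:", variance)

# Chance that two siblings are 0% IBD on all 22 autosomes in this model.
p_zero_everywhere = dist[Fraction(0)] ** 22
print("P(0% IBD on every autosome):", float(p_zero_everywhere))
```

The last number shows why the 0% IBD sibling pair is a theoretical possibility rather than something you would expect to observe.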

If you have a background in biology, you know this is wrong, because there’s more to the story. Recombination means that in fact you don’t invariably inherit intact copies of your grandparents’ chromosomes. Rather, during meiosis, an individual’s chromosomes often “mix & match” their strands so that new mosaics are formed. So instead of inheriting homologous chromosomes which resemble exactly those carried by their grandparents, individuals often have chromosomes which are a mosaic of maternal and paternal due to the two meiosis events which intervened (one during the formation of the grandparents’ gametes which led to one’s parents, and another during the formation of the parents’ own gametes). If you are still confused, the following 3 minute instructional video may help. The narration has information, so if you can’t listen, the blue = paternal chromosomal segments, and the red = maternal chromosomal segments. Focus especially on recombination, about half way through the video.

This process works in contradiction to conditional dependence of inheritance of variants due to physical linkage on the same chromosomal regions. In other words, though it is still theoretically possible for siblings to be very different, as in the model with no recombination, realistically recombination breaks apart many of the associations and reduces the realized variance. In the figure above the low bound outliers in terms of genetic distance across sibling pairs are about mid-way between the coefficient of relatedness of half-siblings (0.25) and full-siblings (0.50), at ~0.35 or so (the high bounds are ~0.65).

At any given locus the variance of IBD for siblings is 1/8. Since expectation is ~0.50, you can infer from this that on a specific gene there’s a lot of deviation across a cohort of siblings. This makes sense when you consider that siblings differ a great deal on single gene Mendelian traits. But what about the whole genome? Because now you have many more “draws” the “law of large numbers” tends to reduce the variance. The figure to the right shows the standard deviation of IBD by chromosome. Remember that expectation is ~0.50. Observe that longer chromosomes have lower deviations. This is due to the variation of rates of recombination across the genome. We’ve come a long way from an abstract Mendelian model, to the point where one can integrate an understanding of differences of rates of recombination across regions of the genome into the model. The total genome standard deviation of IBD turns out to be 0.036, which is close to older theoretical models which predicted ~0.04. This means that if you randomly drew two full-siblings and compared the extent of total genome IBD, they would typically differ from 0.50 by about 0.036. Assuming a normal distribution, about 68% of sibling pairs would fall within one standard deviation, a coefficient of relatedness of 0.464 to 0.536. About 95% would fall within two standard deviations, 0.428 to 0.572. About 99.7% would fall within three standard deviations, 0.392 to 0.608.
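The intervals above follow from the normal approximation, and can be reproduced with the standard library alone. This is a sketch under the normality assumption stated in the text. The mean and standard deviation are the values quoted there, the function names are mine, and the 0.42 example is the sibling relatedness figure mentioned in these posts.

```python
from math import erf, sqrt

MEAN = 0.50  # expected genome-wide IBD sharing between full siblings
SD = 0.036   # genome-wide standard deviation quoted in the text


def coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))


def fraction_below(x):
    """Normal CDF at x: share of sibling pairs with IBD sharing below x."""
    return 0.5 * (1 + erf((x - MEAN) / (SD * sqrt(2))))


for k in (1, 2, 3):
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"within {k} SD: {low:.3f} to {high:.3f}, covering {coverage(k):.1%}")

# How unusual is a sibling pair sharing only 0.42 under this approximation?
print(f"share of pairs below 0.42: {fraction_below(0.42):.3f}")
```

The same `fraction_below` helper can be used to ask how unusual any observed sibling pair is under this approximation.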

The paper from which I’m drawing the figures and statistics is Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. The citations, as well as follow-up papers are very interesting. It shows how modern genomics is literally swallowing whole the insights of classical quantitative genetics. Nature is one, and abstractions ultimately map onto the concrete. I’d long thought I should review this paper and its insights, as comparisons across siblings are likely going to be a future avenue of understanding the genetic basis of many traits. But I have a more personal reason for looking into this issue.

This week many of my family members came “online” to the 23andMe system. To review:

RF = Father
RM = Mother
RS1 = Sibling 1 (female)
RS2 = Sibling 2 (male)

Later to come will be RS3, another male. But his data has not loaded….

23andMe has many features related to disease risk and ancestry information. The former was not of great interest to me, as my family is large enough that I had a good sense of what we were at risk for. 23andMe told me that I was at more risk for various ailments which are common across my extended pedigree. It also told me I was at more risk for ailments which are not known in my family. And, it told me I was at less risk for ailments common across my extended pedigree. Finally, it told me I was at less risk for ailments not common across my pedigree. You get the picture. For most people there isn’t much value-add here. I haven’t even touched the issue of “odds ratios”.

In regards to ancestry, I have received some value. I suspect I’m near the end of the line in this area, unless I get into some serious DIY genetics. My involvement in the Harappa Ancestry Project is more about understanding regional patterns of variation than that of my own family.

So we’re at the next stage: looking at patterns in my own family. The screenshot you see above is from the ‘family inheritance’ feature, and shows the IBD between RS2 and RF, my male sibling and my father, chromosome by chromosome. As you can see they are “half-identical” across the whole genome, as they should be. For each autosomal gene my father contributes one copy. There’s no variance here. The total 2.86 GB value is also what you’d expect: there are ~3 billion base pairs, and you’re excluding the X and Y, as well as “no calls.” I can tell you that I exhibit the exact same relationship to my father as my brother. In contrast, my sister has more segments shared. That’s because she has an X chromosome from my father. The relationship to our mother is also as expected. We’re all equally related to our parents, once you account for sex differences on chromosome 23.

Below are the screenshots from family inheritance comparing the three siblings in terms of our genomes. Remember that half-identical (light blue) has half the weight as full-identical (dark blue).

[Gallery: 23andMe family inheritance screenshots comparing the three siblings]

Here’s the top-line. I share about the same length of segments that are half-identical to both RS1 and RS2, 2.26 and 2.27 GB. But, while I have 0.60 full-identical with RS1, I have 0.86 full-identical with RS2. And here’s the even more surprising part: RS1 and RS2 have much less in common than I do with either of them. 2.09 GB half-identical, and 0.5 full-identical.

But that’s not all. 23andMe has a “relative finder” feature. Its main goal is to find relatives you don’t know about. I don’t have any non-close relatives so far, in contrast to most others from what I have heard. It may be that most of the Bangladeshis in the database are from my own immediate family! (Though there are some Indian Bengalis, I’ve found only one other Bangladeshi in the database to “share” genes with.) You can though include your own family in the mix. You get two different values, % of DNA shared, and # of shared segments. The former basically seems to be a proxy for IBD. I have a person of European American ancestry on my account, and they have many “relatives” matched with whom they share 0.1-1% of their genome. One individual who asked for a contact did turn out to be a very distant cousin (his surname was the same as that of a grandparent). In any case, the matrix above shows the results so far for my family. My parents are not related; they share no segments or DNA IBD. In contrast, we are all ~50% IBD with our parents (remember that a father contributes no X chromosome to sons). But look at the sibling comparisons. In particular, RS1 & RS2 share only 42% of their DNA! This aligns with the earlier results. RS1 and I are a bit closer than expectation. RS2 and I are a bit more distinct. Interestingly, while RS2 and I have 49 segments in common, RS1 and RS2 have 55 in common. Why the discrepancy? Presumably RS1 and RS2 load up on the number of segments on smaller chromosomes. This seems clear in the images above.

Where does this leave us? We know intuitively that siblings differ, and cluster, in their traits. These data and methods illustrate how in the near future parents may be able to determine which siblings cluster on the total genome content level! As I have stated before, RS2 and I in particular resemble each other physically, far more than either of us resembles RS1. Could this relate to what we’ve found genomically? I believe so. Physical appearance is controlled by many different variants across many different genes, so the phenotype may be a good reflection of the character of the total genome. This can be generalized to other quantitative traits.

Finally, this has clear implications for our study of genetic inheritance within families. Classical genetic techniques had to assume that the coefficient of relatedness between siblings was 0.50. The deviation from this expectation would have introduced errors into estimates of heritability and possibly masked the understanding of the genetic architecture of a trait. But now we can correct for deviations from the 0.50 value, and so better understand the genetic basis of complex traits such as behavior.

Citation: Visscher, P., Medland, S., Ferreira, M., Morley, K., Zhu, G., Cornes, B., Montgomery, G., & Martin, N. (2006). Assumption-Free Estimation of Heritability from Genome-Wide Identity-by-Descent Sharing between Full Siblings. PLoS Genetics, 2(3). DOI: 10.1371/journal.pgen.0020041

(Republished from Discover/GNXP by permission of author or representative)
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"