The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
DavidB@GNXP Blogview


While writing a recent short note on Richard Dawkins and kin selection, I looked through my previous posts on the subject, and found what I thought was a blunder in an old post from 2004. To avoid misleading anyone who came across it in a search, I deleted it from the archive. But on further reflection I have concluded that there was no blunder after all…

My original post was a critique of an argument used by an anthropologist against the importance of kin selection in human social evolution. I did not mention the anthropologist’s name, and I do not now recall it, so I will refer to him simply as ‘Anthropologist’. Anthropologist’s argument was, in essence, that kin selection cannot be important because (a) for relatives beyond the closest it is too weak to be effective, and (b) in the human species (unlike, say, ants) an individual has few close relatives.

As it happens, I am sceptical about the importance of kin selection in human social evolution, and I agree with Anthropologist that kin selection is too weak to be important in relationships more distant than (roughly) uncle-nephew. (I will not consider Anthropologist’s second point, that there are too few close relatives.) According to Hamilton’s Rule, the value (to the relevant gene in the donor) of providing benefits to a relative is proportional to the coefficient of relatedness, and this declines rapidly with each extra degree of remoteness in the relationship: 1/2 for siblings, 1/8 for first cousins, 1/32 for second cousins and so on (or half of these figures if the relationship is through a single common ancestor, and not a pair). The problem with giving benefits to distant relatives is not just that beyond first cousins the relatedness is weak in absolute terms, but that it will usually be possible to give the benefit to a closer relative. Other things being equal, it is better to give a benefit to a sibling than a nephew, a nephew than a cousin, and so on. The circumstances in which it is advantageous to give benefits to a distant relative are probably rare.

So I agree with Anthropologist that kin selection in favour of helping distant relatives is unlikely to be important. But Anthropologist went beyond this qualitative conclusion, and attempted to quantify the importance of kin selection at each degree of relatedness in a way that I originally believed – and now believe again – is fallacious.

The importance of kin selection at any given degree of relatedness depends on the number of relatives helped and the inclusive fitness benefit of helping them. The number of relatives actually helped is an empirical matter, but it is possible to set an approximate upper limit to the number of relatives in any given degree who may be helped. In a stable or slowly changing population, each individual (or monogamous pair) will on average have 2 offspring, and the number of descendants after n generations will be approximately 2^n. Anthropologist takes this as his figure for the number of relatives in any given degree that can be helped. Even if we only count the descendants of a single ancestor, this is not strictly correct. For each individual ancestor, only about half of its descendants after n generations will have the most distant degree of relatedness to each other (e.g. second cousins, if they are descended from a common great-grandparent), while the other half will be more closely related (e.g. first cousins), because they have a more recent common ancestor. But it is true that on average the number of relevant descendants approximately doubles with each generation.

The selective value to a donor of giving benefits to a particular relative in any given degree depends on the coefficient of relatedness, or some similar measure, as used in Hamilton’s Rule. This may be calculated by tracing the connection between two relatives through their nearest common ancestor or ancestors, and taking a factor of 1/2 for each link in the chain. For example, for second cousins linked through a single great-grandparent, there are three links in the chain up from one cousin to the great-grandparent, and another three links in the chain down to the other cousin, so the coefficient of relatedness is (1/2)^6 = 1/64. More generally, if n is the number of steps back to the common ancestor, the relatedness through that ancestor is (1/2)^(2n). Relatedness therefore falls to 1/4 of its previous value for each unit increase in n.
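The link-counting rule can be checked with a few lines of Python. This is only a sketch: the function name `relatedness` and its parameters are my own labels, and it assumes no inbreeding, so that each common ancestor contributes an independent path.

```python
from fractions import Fraction


def relatedness(steps_up, steps_down, common_ancestors=1):
    """Coefficient of relatedness: a factor of 1/2 per link in the chain,
    added up over each common ancestor (assumes no inbreeding)."""
    links = steps_up + steps_down
    return common_ancestors * Fraction(1, 2) ** links


# Second cousins through a single great-grandparent: 3 links up, 3 down.
print(relatedness(3, 3))                      # 1/64
# Through a pair of great-grandparents the figure doubles.
print(relatedness(3, 3, common_ancestors=2))  # 1/32
# One more step back on each side multiplies relatedness by 1/4.
print(relatedness(4, 4) / relatedness(3, 3))  # 1/4
```

With `common_ancestors=2` the same function gives the familiar 1/2 for full siblings and 1/8 for first cousins.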

So far, so good. But here comes the problematic part. Anthropologist calculates the (potential) importance of kin selection in each degree of relatedness by multiplying the estimated number of descendants from a single ancestor (or pair) by the coefficient of relatedness appropriate to that degree of relatedness. As already noted, one of these numbers increases by a factor of 2 with each generation, while the other declines by a factor of 1/4. Since 2 x 1/4 = 1/2, Anthropologist concludes that the potential aggregate value of helping relatives in any given degree is halved with each step of distance in relatedness, and rapidly becomes negligible.

In my 2004 post I raised several objections of detail to Anthropologist’s calculations, which I will not repeat here. But my more fundamental objection was that Anthropologist had taken into account only the relatives descended from a single pair of ancestors, for example second cousins descended from a single pair of great-grandparents. This underestimates the number of relatives an individual is likely to have in any given degree, since he or she may have such relatives by several different lines of descent. For example, excluding inbreeding, an individual will have 8 great-grandparents, and may have second cousins descended from any of them. When this is taken into account, the number of relatives in any given degree increases in the same ratio as genetic relatedness declines. With each step back, the number of ancestors doubles (ignoring inbreeding), which exactly cancels out the alleged halving in the importance of kin selection.

Why then on re-reading my post did I think I had blundered?

In dealing with kin selection it is essential to take a gene’s-eye-view. If we focus on the particular gene of interest, kin selection only promotes it by increasing the fitness of genes that are identical by descent (IBD) with it. Since IBD genes are, by definition, derived by replication from a single common ancestor’s gene, IBD genes can only be found in the descendants of that ancestor, such as the descendants of a common great-grandparent. Other individuals may be equally closely related to the donor in the conventional sense (e.g. they may also be second cousins), but their genes cannot be IBD with the gene of interest. The relevant number of relatives, namely those who can possess IBD copies of the gene of interest, is therefore confined to the descendants of a single common ancestor. On re-reading my post I thought I must have overlooked this, and that Anthropologist was right to count only the descendants of a single ancestor (or pair).

But on further reflection, I think I was right first time. (Indeed, on checking my old working papers I find that I considered this objection and dismissed it, unfortunately without mentioning it in my post.)

What is relevant to kin selection is the probability that a relative will have a copy of the gene that is IBD with the copy possessed by the actor. This probability is conventionally measured by the coefficient of relatedness, r. But probabilities are relative to the statistical ‘population’ under consideration, and this depends on the information available to us. For example, if all we know about an individual is that he is a white American male, and we wish to know the probability that he will die within a year, the relevant population consists of all white American males. If on the other hand we also know that he has been diagnosed with cancer, then the relevant population consists of all white American males who have been diagnosed with cancer. In the case of relatedness, the usual calculation of r assumes that the ancestral source of a gene is unknown, and that it may with equal probability have come from any ancestor at the appropriate distance. If on the other hand we know (or assume) that the source is some particular ancestor, then the usual calculation of r is not appropriate to determine the probability that another descendant of that ancestor has inherited the same gene. If that ancestor is definitely the source of the gene in one descendant, then the probability that it is IBD in another descendant is simply (1/2)^n, where n is the number of generations from the common ancestor. It is easy to see that this probability is greater than r (as normally calculated) by a factor of 2^n.

We may therefore legitimately measure the potential importance of kin selection among relatives of a given degree in two ways. Either we may take account of all the relatives in that degree descended from all ancestors at the appropriate distance back, and multiply their number by the usual r, or we may take the number of relatives descended from a particular ancestor and multiply by (2^n)r. The result by both methods is the same. What we cannot legitimately do is to take the descendants of a single ancestor (or pair) and multiply their number by the usual r, as done by Anthropologist.
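The three calculations can be compared numerically. The sketch below uses deliberately crude assumptions: every ancestor leaves exactly 2^n descendants after n generations, all of them are counted as relatives of the relevant degree, there is no inbreeding, and r is taken through a single common ancestor. The function and variable names are mine.

```python
from fractions import Fraction


def aggregate_values(n):
    """Aggregate potential value of helping relatives n steps removed,
    computed three ways under the idealised 2**n assumptions."""
    r = Fraction(1, 2) ** (2 * n)   # usual r through one common ancestor
    descendants = 2 ** n            # descendants of one ancestor
    ancestors = 2 ** n              # ancestors n generations back
    all_lines = ancestors * descendants * r   # every line of descent, usual r
    one_line = descendants * (2 ** n) * r     # one ancestor, probability (2^n)r
    anthropologist = descendants * r          # one ancestor, usual r
    return all_lines, one_line, anthropologist


for n in range(1, 6):
    all_lines, one_line, anthropologist = aggregate_values(n)
    print(n, all_lines, one_line, anthropologist)
```

Under these assumptions the first two columns stay at 1 for every n, while the third is halved with each step, which is the decline Anthropologist reported.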

In my original post I made various more technical points. The only one worth mentioning here is that a randomly selected individual is more likely to come from a large, flourishing lineage than from a small, declining one. He certainly does not come from an extinct one! The average number of relatives in a given degree, for a randomly selected individual in the present generation, may therefore be significantly larger than the simple 2^n formula would suggest. However, I doubt that this consideration is enough to make kin selection in favour of distant relatives a major factor in human evolution. I agree with Anthropologist that this is unlikely, even if I disagree with the reasoning by which he reached this conclusion.

(Republished from by permission of author or representative)
• Category: Science 

The BBC have just finished a short (3-part) series of documentaries by Adam Curtis, under the general heading ‘All Watched Over by Machines of Loving Grace’. It’s impossible to describe them briefly, so I won’t try; suffice to say I found them fascinating but often exasperating with their wild leaps of logic. For GNXP readers the most interesting will probably be the last, which has a lot of material about W. D. Hamilton, George Price, and Dian Fossey, as well as extraordinary archive footage from Central Africa. (I bet you never heard a BBC reporter casually referring to ‘jungle bunnies’ before.)

They are all available here for the next week, at least. Unfortunately I don’t know if this will be accessible outside the UK – some things are and some aren’t, usually for copyright reasons.

[Added: if you can't view it on the BBC iPlayer, the final part is currently available on YouTube - search YouTube for recent postings on 'Adam Curtis'.]

• Category: Science

This concludes a series of posts on the work of George Price. For the most recent one, with links to the others, see here*. This final post covers the subject of group selection.

Price and Group Selection

The application of Price’s Equation to group selection, and the related problem of biological altruism, is largely responsible for the current interest in Price, as shown in Oren Harman’s biography. The controversy over group selection dates from the early 1960s, as discussed here*. Price attempted to cut through the controversy with a simple new approach. Using Price’s Equation, the overall change in frequency of a gene in a population between two generations can be broken down into two components, which I call the Covariance and Transmission terms. Price’s simple proposal was to identify the effect of group selection with the Covariance term, while selection on individuals (or genes within individuals) is covered by the Transmission term [Price, 1972, 488]. Price’s own work was cut short by his untimely death, but his approach received a boost when it was endorsed (with some qualifications) by W. D. Hamilton [Narrow Roads, vol.1, 333]. Yet it failed to attract much interest for another decade, and is still not generally accepted.

The issues raised are partly semantic: what do (or should) we mean by ‘group selection’? Some biologists, like the late John Maynard Smith, think it is important to draw a sharp distinction between group selection and kin selection. Since Price’s approach makes no such distinction, it has little attraction to these biologists. But there is also a more technical problem with the Price-Hamilton approach. As I discovered for myself when working through some numerical examples, the Covariance term (if it is not zero) will be affected by a factor which can only reasonably be described as individual selection, and which may in extreme cases account for most or all of the term. The division between ‘group’ and ‘individual’ selection is therefore not as clear as Price himself seems to have believed. On searching the literature I find that this point has been sporadically recognised at least since 1985, but it still does not seem to be sufficiently emphasised. For the moment, however, I will assume that the blurring of individual and group effects is small enough to be neglected. Making this assumption, how useful is Price’s Equation in clarifying the issues involved in the controversy over group selection?

Applying Price’s Equation to group selection

Group selection is explicitly dealt with only in Price’s 1972 paper, where he derives what I have called Price’s Second Equation. Unlike the First Equation, this can be applied to groups that are unequal in size. Against this advantage, it is more complicated than the First Equation, and it involves statistical functions (such as weighted covariances) whose meaning is not intuitively clear. For the purpose of analysing the concept of group selection I will therefore make the simplifying assumption that groups are of equal size. We may then use Price’s First Equation, in the knowledge that via the Second Equation the conclusions can be generalised to groups of unequal size.

Price’s First Equation, as published in 1970, can be expressed as:

[1] dQ = cov(z,q)/Z + S(z[i]dq[i])/(NZ).

In the 1970 paper this equation is applied only to populations of individual organisms, but it is valid for any interpretation consistent with the conditions under which it is derived. It can be applied, with only minor modifications, to the case of group selection, provided the groups are of equal size in the first generation. We may interpret the population as being divided into N groups, indexed by the numbers 1, 2…i…N, each containing equal numbers of individuals. There is no migration or interbreeding between groups. The gene frequency of the relevant gene in the i’th group in generation 1 is designated by q[i], and the number of offspring of that group’s members is designated by z[i]. S indicates summation, and cov(z,q) is the covariance between z and q. The mean number of offspring Z is the average of z[i] over all groups (Sz[i]/N). dq[i] is the change in the frequency of the gene within the i’th group, and dQ the change in the population frequency of the gene, between generations 1 and 2.

When the equation is applied to sub-groups of a population, rather than to individuals, the size of the Covariance term depends on the relationship between the aggregate fitness of each group and the frequency of the relevant gene in that group. If groups with a high frequency of the gene (relative to the population average) also have a higher-than-average number of offspring, while groups with a low frequency have fewer offspring, then the term will be positive. Conversely, if groups with a high frequency of the gene have a lower-than-average number of offspring, while groups with a low frequency have more offspring, then the term will be negative. In contrast, the size of the Transmission term depends on the relative fitness of the gene within groups. If, for whatever reason, the frequency of the gene within groups (averaged over groups) is increasing, then the Transmission term will be positive; while if it is decreasing the term will be negative.

W. D. Hamilton was apparently the first to point out that the equation could be applied hierarchically to groups within groups within groups, etc. In equation[1] the second term on the RHS contains the expression dq[i] for the change in frequency of the gene within the i’th group. But if the i’th group itself contains sub-groups, the change in frequency within the group can be expressed by means of equation[1] itself. (As a special case, a diploid individual can be treated as a ‘group’ with the two genes at the relevant locus as its ‘members’.) In principle there is no limit to the number of ‘levels of selection’ that can be treated in this way. This is a major theoretical attraction of the Price-Hamilton approach, though in practice uses involving more than two levels of selection appear to be rare.

Advantages of Price’s Equation

Price’s Equation has several obvious merits for the analysis of group selection. It is simple, mathematically rigorous, and very general in its application. There have been many models of group selection, with a bewildering variety of assumptions and parameters, but in every case we can ask the question: under this model, how does the change of gene frequency between one generation and the next come out under Price’s Equation?

If we accept Price’s view that the Covariance term represents the contribution of group selection, then group selection requires covariance between group fitness and group gene frequency. The Price approach focuses attention on the circumstances that may give rise to covariance.

First, there cannot be covariance between two quantities unless there is variance in both of them. As Price himself noted, ‘the current “group selection” controversy hinges on the question of whether the intergroup variance [in gene frequency] is likely to be of significant magnitude under realistic natural conditions’ [1972, p.489]. If all groups had the same fitness, or the same gene frequency, or both, then the Covariance term would be zero. For example, if all groups were constrained by a ceiling on resources, such that the number of survivors was always the same in each generation, there would be no variance in group fitness, and the Covariance term would be zero.

The existence of variance in both group fitness and group gene frequency is a necessary but not sufficient condition for covariance between them. What is needed in addition is some connection between gene frequency and group fitness: they need to covary – literally, to vary together. We hypothesise that the groups’ genetic composition has some effect on fitness. This effect, if any, may be statistically measured by the regression of group fitness on group gene frequency. The regression coefficient measures the extent to which the high or low frequency of a given gene, relative to the population average, is associated with high or low fitness. The covariance is proportional to the regression coefficient, and as Price himself pointed out, in considering any model of selection the investigator should ask whether the assumed regression coefficient is plausible in nature. A high regression coefficient would imply that a small difference in group gene frequency has a large effect on group fitness: an assumption that would need to be justified.
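The link between the covariance and the regression coefficient can be illustrated numerically. In this sketch the four group frequencies and offspring counts are invented purely for illustration, and the helper names are mine; the covariance is the population covariance (divided by the number of groups), as in Price's Equation.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by N, not N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: gene frequency and total offspring for four groups.
q = [0.1, 0.3, 0.5, 0.7]
z = [90, 100, 120, 130]

var_q = cov(q, q)              # intergroup variance in gene frequency
slope = cov(z, q) / var_q      # regression of group fitness on frequency
print(cov(z, q), slope * var_q)

# A resource ceiling that fixes every group's output removes the
# variance in fitness, and the covariance with it.
z_capped = [100, 100, 100, 100]
print(cov(z_capped, q))
```

The covariance is the regression coefficient multiplied by the variance in gene frequency, so a given covariance can come from a steep slope acting on little variance or a shallow slope acting on a lot of it.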


I have so far said little about altruism. Price himself does not use the term at all. (He does refer to ‘group-benefiting’ and ‘individual-benefiting’ genes: 1972, p.489.) The problem of biological altruism was however at the heart of the group selection controversy, and there is little doubt that it motivated Price’s interest in the subject [see Harman, passim].

Within the framework of Price’s Equation there is an obvious approach to the definition of altruism. Assuming that the Covariance and Transmission terms are both non-zero, there are four possible combinations:

1. Both are positive.
2. Both are negative.
3. The Covariance term is positive but the Transmission term is negative.
4. The Covariance term is negative but the Transmission term is positive.

If we assume, with Price, that the Covariance term represents ‘group’ selection, and the Transmission term ‘individual’ selection, then group selection is operating in all four cases, but in cases 1 and 2 it is operating in the same direction as individual selection, so there is no conflict between them. In cases 3 and 4, in contrast, the outcome depends on the balance of opposing forces. Case 3 seems on the face of it to represent group selection in favour of a gene for altruism, while case 4 represents group selection resisting a gene for selfishness.
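The four combinations can be written out as a small lookup. This is just a restatement of the list in code, following Price's reading of the two terms; the function name and the numbers in the usage line are hypothetical.

```python
def classify(covariance_term, transmission_term):
    """Return (case number, description) for the sign combination,
    reading the Covariance term as 'group' selection and the
    Transmission term as 'individual' selection."""
    if covariance_term == 0 or transmission_term == 0:
        raise ValueError("the four cases assume both terms are non-zero")
    if covariance_term > 0:
        case = 1 if transmission_term > 0 else 3
    else:
        case = 4 if transmission_term > 0 else 2
    descriptions = {
        1: "both positive: no conflict, gene favoured at both levels",
        2: "both negative: no conflict, gene disfavoured at both levels",
        3: "group selection for the gene opposed by individual selection",
        4: "group selection against the gene opposed by individual selection",
    }
    return case, descriptions[case]


# Hypothetical values; the net change is simply the sum of the two terms.
cov_term, trans_term = 0.02, -0.01
print(classify(cov_term, trans_term), cov_term + trans_term)
```

In cases 3 and 4 the sign of the sum decides which force prevails.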

Price’s Equation has indeed been interpreted in this way to define altruism. If the Covariance term is positive, but the Transmission term is negative, then the frequency of the relevant gene is, on average, declining within groups. A number of investigators have used this as their criterion for the presence of altruism, defining altruism as reducing individual fitness relative to the group average, while increasing the overall fitness of the group. However, this is a criterion for what is often called weak altruism. Weak altruism does not necessarily reduce the altruist’s own reproductive fitness relative to the population average, which would constitute strong altruism. An important distinction between weak and strong altruism is that the former, but not the latter, can evolve even when individuals are allocated randomly to groups in each generation. The distinction between weak and strong altruism was formulated after Price’s work, and his Equation does not resolve the choice between them. A weak definition of altruism fits neatly into Price’s mathematical treatment, but this is hardly an overriding reason for adopting it. It is noteworthy that Hamilton, despite his words of endorsement for Price’s approach, used a strong definition of altruism in his own work. Alan Grafen is also an advocate of Price’s Equation, but thinks the term ‘weak altruism’ misleading, and prefers to call it ‘a self-interested refusal to be spiteful’ [Grafen 1984, p83]. The distinction is discussed more fully here.*

The blur between individual and group fitness

As I mentioned earlier, the Covariance and Transmission terms in Price’s Equation do not mark the division between ‘group’ and ‘individual’ selection as clearly as Price seems to have believed. To introduce this problem, consider an analogy. Ten fat ladies join a Weightwatchers club. At the beginning of the year they have a collective weight of 2000 pounds, and thus an average individual weight of 200 pounds. At the end of the year, nine of them are exactly the same weight as before, but one of them has lost 100 pounds. The collective weight is now 1900 pounds and the average individual weight 190 pounds. We may therefore analyse the weight change of each member into two components: a Group effect (a loss of 10 pounds), and an Individual effect (in one case, a loss of 90 pounds, and in the other nine, a gain of 10 pounds).

The arithmetic is impeccable, but the interpretation is nonsense. There is no ‘Group effect’; or rather, if there is, it cannot be proved in this way. Not every mathematically consistent way of treating a problem makes sense in reality.
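For what it is worth, the bookkeeping of the analogy takes only a few lines. The variable names are mine, and the point of the sketch is that the split balances automatically, whatever actually caused the weight loss.

```python
# Weights in pounds at the start and end of the year.
before = [200] * 10
after = [200] * 9 + [100]

changes = [a - b for a, b in zip(after, before)]

# 'Group effect': the change in the average weight.
group_effect = sum(changes) / len(changes)

# 'Individual effect': each member's change minus the group effect.
individual_effects = [c - group_effect for c in changes]

print(group_effect)            # -10.0
print(individual_effects[0])   # 10.0 (a 'gain' for the nine who did not change)
print(individual_effects[-1])  # -90.0
```

Each member's change equals the group effect plus her individual effect by construction, so the fact that the sums come out right tells us nothing about whether a group effect exists.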

Applying this lesson to Price’s Equation, suppose a population is divided into groups. In each group there are two gene types, A and B, differing in frequency in the various groups. Let us suppose that both A and B have a constant fitness, regardless of their frequency within the groups, and that A is fitter than B. The groups in which A has a higher frequency therefore have a higher collective fitness, and the Covariance term will be positive. (For a simple worked example see the Note.) Yet there is nothing that can reasonably be called a group effect, since by assumption the two gene types have a constant fitness regardless of group composition. The change of gene frequency in the population between generations would be exactly the same if the groups were mingled together.

It is evident in this example that the positive Covariance term is just a side-effect of individual fitness differences. It is not quite so obvious what is happening to the Transmission term. Since the Covariance and Transmission terms must still add up to the overall change in population frequency, an increase in the Covariance term (as compared with the position if all groups had the same gene frequency) must be offset by a reduction in the Transmission term, which in Price’s analysis is identified with individual fitness. A little consideration of the example in the Note should clarify what is happening. The Transmission term depends on the average change of frequency within groups. But this is not a reliable indicator of the overall fitness of different gene types in the population. Suppose for example that in one group type A has already reached a very high frequency – say 95%. It cannot possibly grow by more than a further 5 percentage points in that group. Yet that might represent a much larger increase of frequency in the population as a whole. There is no simple way of inferring frequency changes in the whole population from the average change within groups.

In the worked example the Covariance term arising from purely individual selection is relatively small, accounting for only about 10% of the overall frequency change, even though I have deliberately made the difference in initial frequency of the gene between groups moderately large (.5 against .2). In general the Covariance term arising from individual selection will be small unless the difference in initial frequency is very large. At the extreme, if all groups have either 100% or 0% frequency of the gene, then there will be no within-group change in frequency, and the whole of the population change will be attributable to the Covariance term, even though it might be proved from other evidence that the fitness of the gene is independent of group composition.
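The extreme case is easy to verify. The sketch below is my own illustration, with hypothetical group counts: one group is entirely type A, the other entirely type B, each A has 2 offspring and each B has 1, regardless of group. It computes the two terms of equation [1] with exact fractions, assuming equal group sizes in the first generation.

```python
from fractions import Fraction


def price_terms(first, second):
    """Covariance and Transmission terms of Price's First Equation.

    first, second: lists of (A count, B count) per group in
    generations 1 and 2; groups are equal in size in generation 1.
    """
    n_groups = len(first)
    q = [Fraction(a, a + b) for a, b in first]        # initial frequency of A
    q_next = [Fraction(a, a + b) for a, b in second]  # frequency in generation 2
    z = [a + b for a, b in second]                    # offspring of each group
    z_bar = Fraction(sum(z), n_groups)
    q_bar = sum(q) / n_groups
    cov = sum(zi * qi for zi, qi in zip(z, q)) / n_groups - z_bar * q_bar
    transmission = sum(
        zi * (qn - qi) for zi, qn, qi in zip(z, q_next, q)
    ) / (n_groups * z_bar)
    return cov / z_bar, transmission


def pooled_change(first, second):
    """Change in the frequency of A if the groups are simply mingled."""
    before = Fraction(sum(a for a, _ in first), sum(a + b for a, b in first))
    after = Fraction(sum(a for a, _ in second), sum(a + b for a, b in second))
    return after - before


first = [(100, 0), (0, 100)]    # hypothetical: all-A group and all-B group
second = [(200, 0), (0, 100)]   # A doubles, B replaces itself
covariance_term, transmission_term = price_terms(first, second)
print(covariance_term, transmission_term, pooled_change(first, second))  # 1/6 0 1/6
```

The whole change of 1/6 appears in the Covariance term, although the fitness of each type was set without any reference to group composition.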

There will also be cases where the fitness of different gene types is not completely independent of group composition, but where part of the overall fitness of the gene can reasonably be attributed (by experimental controls or otherwise) to purely individual-level effects. Yet these individual effects will still affect the Covariance term. The Covariance and Transmission terms simply do not, as a matter of principle, map precisely onto group and individual effects.

This complication is apparent as soon as one works through a simple numerical example of the kind in the Note, yet it does not seem to have been mentioned in the literature until 1985, when papers by Grafen and Nunney independently drew attention to it. (Grafen believed that an obscure passage in Hamilton’s 1975 paper alluded to the same problem, but Hamilton seems to me to have been making a rather different point.) Even after 1985 it has seldom been given much prominence: for example, in Sober and Wilson’s long book Unto Others the main text waxes eloquent on Price’s Equation, while the complications are discreetly buried in a note on page 343.

A partial remedy to the problem is to use what is known as ‘contextual analysis’ (see for example Okasha). The general principle of this is to use partial regression techniques to distinguish the individual fitness of genes when group composition is ‘held constant’, and vice versa. This does indeed eliminate spurious effects like my ‘Weightwatchers’ example, but it gives up the attractive simplicity of the Price approach, and except in special cases it does not entirely separate group and individual effects; it just shows what the individual fitness of a gene type would be in a group of ‘average’ composition so far as the particular population is concerned.


My main conclusions from this discussion can be stated briefly. Price’s approach is indeed a useful contribution to the debate on group selection. In considering any particular example or model of (purported) group selection or altruism, it is possible to use Price’s Equation to analyse the change in between-group and within-group frequencies. If the Covariance term is non-zero, it is then important to consider what is giving rise to a positive or negative covariance. As Price himself noted, we also need to look at the intergroup variance in gene frequency, without which there can be no covariance.

On the other hand, Price’s Equation is far from giving an uncontroversial solution to the questions raised in the group selection debate. Even if there were no ‘blur’ between individual and group effects, Price’s approach implicitly adopts a ‘weak’ criterion for altruism, and interprets kin selection as a form of group selection rather than an alternative to it. Many biologists would think that this ignores some of the key questions in the debate: can there be strong altruism without kin selection, and if so what mechanisms are at work?


Note

Suppose there are two gene types, A and B, and two groups each containing some A and B members. Suppose in the first and second generations the numbers in the groups are:
First generation
Group 1: 50 A, 50 B
Group 2: 20 A, 80 B

Second generation
Group 1: 100 A, 50 B
Group 2: 40 A, 80 B

It can be seen that in both groups the A and B types have a constant fitness: each A member has (on average) 2 offspring (we assume asexual reproduction), and each B member has on average 1 offspring.

The overall increase in the frequency of A in the population is 140/270 – 70/200 = .1685. If we calculate the increase of frequency within each group it is .1666 for Group 1 and .1333 for Group 2. Note that neither of these is as high as the increase in the whole population: the population change is not in any sense a weighted average of the group changes.

If we now calculate the terms of Price’s Equation (I will omit the details), we get .01666 for the Covariance term, and .15185 for the Transmission term, which total to .1685, the overall population increase, as expected.
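For readers who want the omitted details, the calculation can be reproduced step by step. This is a sketch using exact fractions; the variable names are mine, and z[i] is taken as the total number of offspring of each group, as in the definitions given for equation [1].

```python
from fractions import Fraction as F

# (A count, B count) for Group 1 and Group 2 in each generation.
first = [(50, 50), (20, 80)]
second = [(100, 50), (40, 80)]
N = len(first)

q = [F(a, a + b) for a, b in first]        # 1/2 and 1/5
q_next = [F(a, a + b) for a, b in second]  # 2/3 and 1/3
dq = [qn - qi for qn, qi in zip(q_next, q)]
z = [a + b for a, b in second]             # 150 and 120 offspring
Z = F(sum(z), N)                           # mean offspring per group: 135
Q = sum(q) / N                             # initial population frequency: 7/20

# Covariance term: cov(z, q) / Z
cov_zq = sum(zi * qi for zi, qi in zip(z, q)) / N - Z * Q
covariance_term = cov_zq / Z

# Transmission term: S(z[i] dq[i]) / (N Z)
transmission_term = sum(zi * dqi for zi, dqi in zip(z, dq)) / (N * Z)

# Direct calculation from the pooled population.
dQ = F(140, 270) - F(70, 200)

print(covariance_term, float(covariance_term))      # 1/60, about 0.0167
print(transmission_term, float(transmission_term))  # 41/270, about 0.1519
print(dQ, covariance_term + transmission_term)      # both 91/540, about 0.1685
```

The Covariance term comes to just under 10% of the total change, the proportion referred to in the main text.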


References

G. R. Price: ‘Selection and covariance’, Nature, 227, 1970, 520-21.
G. R. Price: ‘Extension of covariance selection mathematics’, Annals of Human Genetics, 35, 1972, 485-90.

Alan Grafen: ‘Natural selection, kin selection and group selection’ in J. Krebs and N. Davies (eds.) Behavioural Ecology: an evolutionary approach, 2nd edn., 1984
Alan Grafen: ‘A geometric view of relatedness’ in Oxford Surveys in Evolutionary Biology, 2, 1985.
W. D. Hamilton: Narrow Roads of Gene Land, vol. 1, 1996.
Oren Harman: The Price of Altruism: George Price and the Search for the Origins of Kindness, 2010.
L. Nunney: ‘Group selection, altruism, and structured-deme models’, American Naturalist, 126, 1985, 212-35.
Samir Okasha: ‘Multi-level selection, covariance, and contextual analysis’, British Journal for the Philosophy of Science, 55, 2004, 481-504.
Elliott Sober and David Sloan Wilson: Unto Others: the evolution and psychology of unselfish behaviour, 1998.

• Category: Science

Back in 2004 I wrote a summary of Richard Dawkins’s oft-cited article ‘Twelve Misunderstandings of Kin Selection’. At that time the article was not, as far as I could see, available on the internet or in any easily accessible reprint. However, I have found that a free online pdf is now available, and anyone interested in the subject can read it here.

• Category: Science

There is one last loose end to tie up before concluding my series on George Price.

In a previous post I discussed the meaning of altruism in biology, and the distinction between strong and weak altruism. With strong altruism the altruist obtains no benefit from its own actions, whereas with weak altruism it does, though less than other members of a group to which it belongs.

W. D. Hamilton showed that strong altruism cannot evolve if altruism is randomly distributed in each generation. By varying his key assumptions several authors have put forward models in which a population is divided randomly into groups, yet altruism still increases in frequency. For example, groups may be formed randomly, but the aggregate benefit of altruism increases disproportionately with the number of altruists in the group.

A more surprising result is that strong altruism may evolve if groups are formed randomly, but allowed to persist for more than one generation. A natural first suspicion is that this is simply due to kin selection. In small groups which persist for several generations, many of the members in later generations are likely to be close relatives. However, an ingenious recent study by Fletcher and Zwick shows that in some circumstances strong altruism can evolve even if benefits to close relatives are excluded.

The general principle behind Fletcher and Zwick’s model can be illustrated with an example. Suppose a large (theoretically infinite) population is divided into groups of 4 members. Let the proportion of altruists in the population be 20%. Altruists and non-altruists are assigned to groups randomly. Groups may therefore have 0, 1, 2, 3 or 4 altruists. The expected proportions of these types of group in the population, and the proportions of all altruists belonging to each type, can be worked out using elementary probability theory. For example, with the parameters just stated, the number of groups which contain a single altruist, as a proportion of all groups, is just over 40%, while the proportion of all altruists belonging to such groups is just over 51%. Of course there are also groups – in fact around 40% of them – containing no altruists at all.
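The proportions quoted above follow from the binomial distribution. The short sketch below reproduces them; the function name and its return format are illustrative choices, not part of Fletcher and Zwick's model.

```python
from math import comb


def group_composition(n, p):
    """For random groups of size n with altruist frequency p, return two lists
    indexed by k (the number of altruists in a group): the share of all groups
    with k altruists, and the share of all altruists living in such groups."""
    group_share = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    altruist_share = [k * g / (n * p) for k, g in enumerate(group_share)]
    return group_share, altruist_share


groups, altruists = group_composition(4, 0.2)
for k, (g, a) in enumerate(zip(groups, altruists)):
    print(f"{k} altruists: {g:.4f} of groups, {a:.4f} of altruists")
```

With groups of 4 and 20% altruists, groups with exactly one altruist make up 40.96% of all groups and contain 51.2% of all altruists, while groups with no altruists also make up 40.96% of the total.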

For simplicity we assume that reproduction is asexual. As a baseline for fitness assume that a non-altruist in a group with no altruists has 1 offspring. An altruist incurs a cost of c and distributes a total benefit b to the other members of its group, who share it equally. In a group with only 1 altruist the fitness of the altruist will be 1 – c, while the fitness of each of the 3 non-altruists will be 1 + b/3. In a group with 2 altruists each altruist will have a fitness of 1 – c + b/3 (since it obtains a benefit b/3 from the other altruist) and each non-altruist will have a fitness of 1 + 2b/3. In a group with 3 altruists each altruist will have a fitness of 1 – c + 2b/3 and the non-altruist will have a fitness of 1 + b. In a group with 4 altruists, each altruist will have a fitness of 1 – c + b.

At the end of the first generation, in the ‘mixed’ groups (on average) the proportion of altruists will have declined, since non-altruists are fitter than altruists within the group. It can also be shown that the proportion of altruists in the total population declines, as we would expect from Hamilton’s result. If all groups were now broken up and their members assigned randomly to new groups, as in Hamilton’s model, the proportion of altruists would decline again in the next generation, and so on indefinitely. It is also easy to see that as the frequency of altruists in the population declines, the proportion of all altruists belonging to single-altruist groups will increase (and more generally, the average number of fellow-altruists in a group will fall). Since altruists get less of the benefit of altruism in each successive generation, altruism is doomed.
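The decline under random regrouping can be checked numerically. The sketch below applies the fitness rules of the preceding paragraph to groups of 4 formed at random in every generation. The values b = 0.5 and c = 0.1 are hypothetical, chosen only for illustration, and the function name is my own.

```python
from math import comb


def next_frequency(p, b, c, n=4):
    """Altruist frequency after one generation of random grouping (strong altruism)."""
    altruist_offspring = 0.0
    total_offspring = 0.0
    for k in range(n + 1):
        share = comb(n, k) * p**k * (1 - p)**(n - k)  # share of groups with k altruists
        # Each altruist receives b/(n-1) from each of the other k-1 altruists.
        # (For k = 0 this value is multiplied by zero altruists, so it has no effect.)
        w_alt = 1 - c + b * (k - 1) / (n - 1)
        # Each non-altruist receives b/(n-1) from each of the k altruists.
        w_non = 1 + b * k / (n - 1)
        altruist_offspring += share * k * w_alt
        total_offspring += share * (k * w_alt + (n - k) * w_non)
    return altruist_offspring / total_offspring


# Hypothetical parameters: start at 20% altruists and regroup randomly each generation.
history = [0.2]
for _ in range(5):
    history.append(next_frequency(history[-1], b=0.5, c=0.1))
print([round(x, 4) for x in history])
```

Averaged over all groups, an altruist has fitness 1 – c + bp and a non-altruist 1 + bp, where p is the population frequency of altruists, so the altruist is always worse off by exactly c, however large b may be.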

The crucial difference in Fletcher and Zwick’s model is that the offspring produced by each group are allowed to stay together for at least one more generation. Altruists in relatively altruist-rich groups have more offspring than those in altruist-poor ones, since they receive fitness benefits from their fellow-altruists. If all offspring stay in their parental groups the average number of fellow-altruists in a group will therefore rise. This contrasts with the position if the offspring were assigned randomly to new groups. It follows that in the Fletcher-Zwick model there is a departure from random assortment, and a positive association of altruists in the second and any subsequent generations. It is this positive association which in principle allows even strong altruism to escape from Hamilton’s negative result. If altruists are positively associated, then the benefits of altruism within the total population go disproportionately to altruists, even though within a mixed group non-altruists receive more benefits than altruists. All that is needed is for groups to have some persistence over more than one generation, which seems a very modest and realistic requirement. (Of course it is also necessary for b to be greater than c, otherwise even in a group consisting entirely of altruists their fitness will be below the baseline.) Fletcher and Zwick show that their finding is still valid if various modifications are made to their model. Notably, altruism towards close relatives (kin selection) can be excluded, groups consisting entirely of altruists can be excluded, and a moderate amount of migration between groups can be allowed, yet altruism can still evolve (though less easily). It is also possible for altruism to evolve when the altruistic gene is initially rare (of the order of 1 in 1000), provided the benefits are high enough.

The Fletcher-Zwick model is a notable contribution to the debate on group selection, and may clarify some otherwise puzzling cases [see Note 1].

Now for some reservations. The Fletcher-Zwick process only works if groups are very small, if the collective benefit of altruism is large compared to the cost, or both. Fletcher and Zwick consider mainly groups between 2 and 5 in size, and even for these tiny groups, the collective benefit has to be around 5 times the cost for altruism to survive when the relevant gene is already common in the population. The collective benefit has to be much larger if altruism is to grow from initial rarity. But if groups are very small, and persist over two or more generations, we would expect many of the members (after the first generation) to be close relatives, and the growth or survival of altruism would be assisted by kin selection. Kin selection is a stronger mechanism than the Fletcher-Zwick process, as it allows altruism to evolve even when the benefit-cost ratio is relatively low (as low as 2:1 for full siblings), and is as effective when a gene is rare as when it is common (provided it is not a brand new mutation). With kin selection groups may also persist indefinitely without altruism losing its selective advantage, whereas with the Fletcher-Zwick process a periodical reassortment is necessary. It may therefore be that in nature there are few circumstances where the Fletcher-Zwick process would be a major factor in the evolution of altruism.

Note 1
The Fletcher-Zwick process may help explain a result which I found puzzling a few years ago. A paper by Harpending and Rogers described a model of group selection in which altruism could evolve despite an apparently random distribution of altruists. In their model groups are formed randomly but persist for more than a generation. The group size is fixed, and in every generation there are more births than deaths. The surplus goes into a ‘migrant pool’. Deaths are replaced either by births within the group or, with a specified probability, by a migrant. Groups may contain altruists and non-altruists. The birth-rate of all members of the group increases in proportion to the number of altruists (the beneficial effect of altruism), but altruists have a higher death-rate in any given period of time (the cost of altruism).

At the time when I commented on this model I did not appreciate the distinction between weak and strong altruism. On considering it again now, it seems that it involves weak altruism, since the birth rate of all members of a group rises with the number of altruists; altruists therefore enhance their own birth rate. With weak altruism it is accepted that altruism can evolve even with a random distribution of altruism in each generation. This may therefore account for Harpending and Rogers’s results. But in their model groups also have a partial persistence for more than one generation. The Fletcher-Zwick process may therefore also come into play. I still don’t fully understand what is going on, but at least it is now less puzzling that altruism can evolve despite the initial random distribution.

• Category: Science

I am writing a series of posts on the work of George Price. For the most recent one, with links to the others, see here. I was planning next to cover Price’s treatment of group selection, but this raises side issues more conveniently dealt with separately. A previous post here considered what is meant by group selection. In the present post I look at definitions of altruism as used in biology. It has taken me a while to complete, partly because I found there is a lot of recent literature on the subject which I needed to digest. A valuable but difficult recent survey is here.

Of course in biology the term ‘altruism’ does not imply conscious intention on the part of the altruist. The altruist could be a plant, an insect, or a bacterium. Definitions of altruism such as ‘regard for others as a principle of action’ (Pocket Oxford Dictionary) are therefore not appropriate in biology. What is needed is a definition in terms of reproductive fitness.

The recent interest in biological altruism stems from W. D. Hamilton’s brief 1963 paper on ‘The evolution of altruistic behaviour’ and his two-part paper of 1964 on ‘The genetical evolution of social behaviour’. Both papers refer frequently to altruism, but Hamilton’s concept of altruism has to be inferred from what he says about it rather than from an explicit definition. Altruistic behaviour involves a ‘loss’, ‘cost’, ‘risk’ or ‘disadvantage’ to the personal fitness of the altruist, as measured by its production of offspring, while conferring a ‘gain’, ‘benefit’, or ‘advantage’ to one or more other individuals in the population. An altruist gains no direct benefit from its actions. It is implicit in Hamilton’s treatment that the altruist has lower personal fitness as a result of its altruistic behaviour than if it had not behaved altruistically. Later, in 1975, Hamilton more explicitly modelled the circumstances in which altruism could evolve, and again assumed that the benefits of altruistic behaviour go only to other members of the population. In this model altruism can only evolve if altruists receive sufficient benefits from each other to offset the costs.

Hamilton took the starting point for his discussion of altruism from J. B. S. Haldane, the only previous author to give it a mathematical treatment. In a section of Haldane’s book The Causes of Evolution (1932) devoted to ‘socially valuable but individually disadvantageous characters’, Haldane refers to ‘altruistic’ or ‘self-sacrificing’ conduct. In his mathematical model altruism is defined in relation to fitness within a group, not (as in Hamilton’s approach) in terms of its effects on the altruist’s own fitness. Haldane assumes that a population is divided into groups, some of which contain altruists and others do not. The possession of the altruistic trait decreases the fitness (measured by ‘probable progeny’) of the altruist to (1 – k) times that of the non-altruists in its own group. The benefit of altruism is measured by its effects on the fitness of all members of the group, including the altruists: the presence of a fraction x of altruists increases the ‘probable progeny’ of all members of the group to (1 + Kx) times that of a group with no altruists. In this model an ‘altruist’ obtains some of the benefits of its own altruism. Within a group, non-altruists are always fitter than altruists, but an altruist may still be fitter than non-altruists in the population generally. It is therefore possible in principle that an individual who ‘converts’ to altruism may have more offspring than otherwise.
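The last point can be illustrated with a small calculation. This is only one simplified reading of Haldane's model, in which a non-altruist in a group with a fraction x of altruists has fitness 1 + Kx and an altruist in the same group has (1 – k) times that. The parameter values are hypothetical and the function name is mine.

```python
def haldane_fitness(x, k, K):
    """Fitness of (altruist, non-altruist) in a group with a fraction x of altruists,
    relative to a fitness of 1 in a group with no altruists.
    ASSUMPTION: the multiplier (1 + K*x) is applied to non-altruists, and altruists
    have (1 - k) times the non-altruists' fitness in the same group."""
    non_altruist = 1 + K * x
    altruist = (1 - k) * non_altruist
    return altruist, non_altruist


# Hypothetical values: a 10% within-group handicap and a group benefit of K = 0.5.
alt, non = haldane_fitness(x=0.5, k=0.1, K=0.5)
print(alt, non)
```

On this reading, with half the group altruistic, the altruist is less fit than its non-altruist group-mates but fitter than a non-altruist in a group with no altruists, since (1 – k)(1 + Kx) exceeds 1.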

The distinction between Hamilton-type and Haldane-type altruism is sometimes described as a difference between ‘strong’ and ‘weak’ altruism. ‘Weak’ altruism can be defined by one or both of the following characteristics:

1. a weak altruist is less fit than non-altruists within its own group, but not necessarily in the general population;
2. a weak altruist receives some net fitness benefit from its own altruistic acts, but not as much as other members of its group.

A scenario that fits description (1) will also usually fit description (2), and vice versa, but they tend to lead to different mathematical formulations that are not easily interconvertible. The natural way of formulating description (1) is to set the fitness of an altruist as some fixed proportion of that of non-altruists within its group, as in Haldane’s model. The natural way of formulating description (2) is to suppose that an altruist donates some fixed benefit B, shared equally among all the N members of its group, including itself, while incurring a cost C, so that its net benefit from its own action is B/N – C. On this approach the fitness of an altruist cannot be expressed as a fixed proportion of the fitness of non-altruists, since this will depend on the number of altruists in the group. If there is more than one altruist, the altruists will derive some benefit from each other, and their fitness relative to non-altruists will vary with the composition of the group. As the proportion of altruists in a group rises, they receive a larger proportion of the total benefit, so that the fitness of altruists approaches that of non-altruists. [See Note 1 for a numerical example.]

The distinction between strong and weak altruism may seem a rather minor one, but it is important in some contexts. Notably, if altruism is ‘strong’, it cannot evolve by natural selection if the distribution of altruistic benefits in the population is random. If an altruist distributes a benefit B randomly to the other members of the population, while incurring a cost C, the benefit is shared in proportion to the existing frequency of altruists and non-altruists [but see Note 2], while the cost falls only on the altruists. Altruism will always be at a disadvantage. A division of the population into groups, followed by an equal or random distribution of benefits within them, makes no difference if the division itself is random in every generation, as this is just a roundabout way of distributing the benefit randomly.

If altruism is ‘weak’, on the other hand, division of the population into groups may well give altruism an advantage. An altruist is always in its own group to receive the benefit of its actions. Suppose that a population of size N is divided randomly into groups of size n, and that each altruist distributes a benefit B/n to each member of its group, including itself. The altruist incurs a cost C, so its net benefit from its own actions is B/n – C. The other members of the group each receive the larger benefit B/n. But in the population as a whole not all members will receive a benefit from that altruist, and if altruism is rare some non-altruists may not receive a benefit from any altruist. Even if altruism is common, each altruist still has a privileged access to the benefits of its own action [see Note 3].

It may be misleading to put so much emphasis on the fact that a weak altruist receives a benefit from its own actions. If we reformulate the model so that an altruist distributes the benefit to a group of individuals randomly selected from the entire population, including the altruist itself, then altruism is still ‘weak’ by the usual definitions. But under this model the distribution of benefits is truly random, and therefore neither increases nor decreases the frequency of altruists in the population, while the costs fall only on the altruists. Altruism is therefore doomed. In contrast, under the usual models of weak altruism, the formation of groups is random, but the distribution of benefits is not, since an altruist has a guaranteed share of its own benefit. It is this non-randomness which can give ‘altruism’ an edge.

We would get the same result if each altruist gave part of its benefit to some other selected altruist, while the rest of the benefit was distributed randomly to the whole population except the donor itself. This model would fall under the usual definition of strong altruism, but would behave like weak altruism. The crucial point is not whether or not the altruist enjoys part of its own benefit, but whether there is preferential treatment of altruists. In the model just described, it is obvious that the overall distribution of benefits is not fully random: it is a mixture of random and selective distribution. In the usual models of weak altruism, the selective element is obscured by the fact that distribution within the group is random, and it is easy to overlook the altruist’s non-random access to its own benefit.

Whether one chooses to call ‘weak’ altruism a form of altruism at all is a matter of taste. Definitions are not right or wrong, but convenient or inconvenient. The choice need not however be entirely arbitrary. We do not make definitions just for fun: there is some underlying point to them. In the case of biological altruism, the main point of classifying some actions as altruistic is that they seem to require some special evolutionary explanation, beyond straightforward natural selection of individual reproductive fitness. If an action does not, on closer analysis, require any such special explanation, no purpose seems to be served by classifying it as altruistic, and there is a danger of confusion by lumping it together with actions which do require such explanations.

Note 1

Suppose baseline fitness (the fitness of a non-altruist in a group with no altruists) is 1, the group size N = 4, the benefit B = 10, and the cost C = 1. In a group with one altruist, the fitness of the altruist will be 1 + 10/4 – 1 = 2.5, while the fitness of the non-altruists will be 1 + 10/4 = 3.5, giving a ratio of 2.5/3.5 = .714. In a group with two altruists, the fitness of the altruists will be 1 + (2 x 10/4) – 1 = 5, while the fitness of the non-altruists will be 1 + (2 x 10/4) = 6, giving a ratio of 5/6 = .833. In a group with three altruists, the fitness of the altruists will be 1 + (3 x 10/4) – 1 = 7.5, while the fitness of the non-altruists will be 1 + (3 x 10/4) = 8.5, giving a ratio of 7.5/8.5 = .882.
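The figures in this note can be reproduced with a few lines of code. The function name and argument names are illustrative; the default values are those used in the note.

```python
def weak_altruism_fitness(k, n=4, benefit=10, cost=1, baseline=1):
    """Fitness of (altruist, non-altruist) in a group of n with k altruists, where each
    altruist shares `benefit` equally among all n members, itself included."""
    shared = k * benefit / n  # total benefit received by every member of the group
    return baseline + shared - cost, baseline + shared


for k in (1, 2, 3):
    alt, non = weak_altruism_fitness(k)
    print(k, alt, non, round(alt / non, 3))
```

The ratio of altruist to non-altruist fitness rises from .714 to .833 to .882 as the number of altruists in the group goes from one to three.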

Note 2

For mathematical convenience it is common to assume that the population is infinite, with a population frequency P of altruism. If groups of size n are formed at random, then the frequency of altruism among the fellow group members of any given altruist is still P, and if the benefits are distributed equally or randomly to the fellow members, the benefits will go to altruists and non-altruists in the proportions P:(1 – P). In any real, finite, population this cannot strictly be true. If there are K altruists in a total population of N, then the frequency of altruism among the fellows of a given altruist will be (K – 1)/(N – 1), which is slightly less than K/N (assuming N > K, and both N and K are reasonably large). Thus proportionately slightly more of the benefit will go to non-altruists than to altruists. For most purposes this slight discrepancy does not matter, but in special cases – for example, very small populations – it should not be overlooked.
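The size of the discrepancy is easy to check with exact fractions. In the sketch below the function name and the sample values of K and N are my own choices for illustration.

```python
from fractions import Fraction


def fellow_frequency(K, N):
    """Frequency of altruists among the N - 1 fellows of a given altruist."""
    return Fraction(K - 1, N - 1)


for K, N in [(50, 100), (5, 10), (2, 4)]:
    gap = Fraction(K, N) - fellow_frequency(K, N)
    # The gap equals (N - K) / (N * (N - 1)), which is positive whenever N > K.
    assert gap == Fraction(N - K, N * (N - 1))
    print(N, float(Fraction(K, N)), float(fellow_frequency(K, N)), float(gap))
```

The gap shrinks roughly in proportion to 1/N, which is why it can be ignored in large populations but not in very small ones.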

Note 3

A numerical example may illustrate the point. I will avoid the simplification mentioned in Note 2.

Suppose there is a population of 100, divided into 50 altruists and 50 non-altruists. The population is randomly divided into groups of 10. Each altruist distributes a total benefit of 20 fitness units equally among the members of its group, including itself. Each therefore receives 2 units. The altruist incurs a cost of 1 unit and therefore receives a net benefit of 1 unit from its own actions.

Since the division of the population into groups is random, the 9 fellow group members of a given altruist will, on average, have the same proportion of altruists and non-altruists as in the population excluding the given altruist itself, namely 49/99 and 50/99. The expected value of the altruist’s benefits distributed within the group to non-altruists is therefore 50/99 x 9 x 2 = 9.0909… fitness units. The expected value of the altruist’s benefits distributed within the group to altruists other than itself is 49/99 x 9 x 2 = 8.90909…

As already noted, the net benefit to the altruist from its own actions is 1. The total net benefit to all altruists in the group, including the ‘focal’ altruist, is 1 + 8.90909… = 9.9090…, which is greater than the 9.0909… units going to non-altruists. Overall, the altruist therefore confers more fitness units on altruists (including itself) than on non-altruists. But the initial ratio of altruists to non-altruists in the population was 50:50, so altruists are getting more than their proportionate share of the benefits, and the frequency of altruism in the population will increase.
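The arithmetic of this example can be reproduced with exact fractions. The variable names below are illustrative; the numbers are those given in the note.

```python
from fractions import Fraction

N, K = 100, 50                   # population size and number of altruists
GROUP, BENEFIT, COST = 10, 20, 1

per_member = Fraction(BENEFIT, GROUP)  # 2 units to each member of the group
fellows = GROUP - 1                    # the 9 other members of the altruist's group

# Expected units going to fellow altruists and to non-altruists within the group.
to_other_altruists = Fraction(K - 1, N - 1) * fellows * per_member
to_non_altruists = Fraction(N - K, N - 1) * fellows * per_member

to_self = per_member - COST            # net benefit of the altruist's own act
to_altruists = to_self + to_other_altruists

print(float(to_non_altruists), float(to_other_altruists), float(to_altruists))
```

In exact terms the altruists' share is 109/11 units against 100/11 for non-altruists, so the focal altruist's own net unit is what tips the balance.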

• Category: Science

I am writing a series of posts on the work of George Price. For the most recent one, with links to the others, see here. I was planning next to cover Price’s treatment of group selection, but this raises side issues more conveniently dealt with separately. This post considers what is meant by ‘group selection’. I have tried to establish what various key authors meant by the term (or similar expressions) up to the mid-1970s, when Price’s own work began to be influential.

The concepts of both group selection and kin selection can be traced back to Charles Darwin. I discussed his views at length here. Other biologists of the late 19th century, such as Alfred Russel Wallace, George Romanes, August Weismann and Karl Pearson, were more willing than Darwin himself to assume that natural selection could operate against the direct fitness of individuals, but they seldom considered in any depth how this might work, and so far as I know the term ‘group selection’ was not used. Weismann introduced the curious term ‘Cormal Selection’ (in German ‘Cormal-selektion’): ‘We may distinguish as a fourth grade of selection Cormal Selection, that is, the process of selection which effects [sic] the adaptation of animal or plant stocks or corms, and which depends on the struggle of the colonies among themselves. This differs from personal selection only in that it decides, not the fitness of the individual person, but that of the stock as a whole. It is a matter of indifference whether the stocks concerned are stocks in the actual material sense, or only in the metaphorical sense of sharing the common life of a large family separated by division of labour’ [Weismann, v2, p.378]. The examples Weismann gives (polyps, ants and termites) suggest that he was thinking mainly of species where reproduction is clonal or confined to a specialised reproductive caste.

In the early 20th century evolutionary biology was embroiled in bitter controversies, surveyed in a book by Vernon Kellogg around 1910. I have not read this for the purposes of the present discussion, but on looking through Kellogg’s detailed table of contents I do not see anything directly relevant. Biologists in this period were probably too busy arguing about whether natural selection was important at all to worry much about secondary issues like the levels at which selection operates.

One of the most influential discussions of group selection before the 1930s was written not by a biologist but by the demographer A. M. Carr-Saunders. He collected evidence that families in hunter-gatherer tribes have fewer children than the physiological maximum, and that they regulate their numbers by abortion, infanticide, and abstention from intercourse. He believed that they do this to maintain the size of their group at an optimum in relation to their food supplies. Where many biologists would have been content to mutter ‘for the good of the species’, Carr-Saunders saw the need for a better explanation, and went on: ‘The problem we have to face is how these practices could come to be of the necessary intensity. Now men and groups of men are naturally selected on account of the customs they practise just as they are selected on account of their mental and physical characters. Those groups practising the most advantageous customs will have an advantage in the constant struggle between adjacent groups over those that practise less advantageous customs. Few customs can be more advantageous than those which limit the group to the desirable number, and there is no difficulty in understanding how – once any of these three customs had originated – it would by a process of natural selection come to be so practised that it would produce an approximation to the desirable number’. [Carr-Saunders p.223] This would now be regarded as an example of cultural group selection: those groups with the best customs are most likely to survive and flourish. We may fairly see this as an example of the concept of group selection, but the term itself has not yet appeared. Carr-Saunders’s discussion is cited by several later writers, including V. C. Wynne-Edwards.

The first use of the exact term ‘group selection’ I have found is in R. A. Fisher’s Genetical Theory of Natural Selection (1930). In discussing the origin of ‘the qualities recognized by man as socially valuable’, he says that ‘it has hitherto only been possible to ascribe their evolutionary development to the selection of whole organized groups, comparable to the hives of the social insects. The selection of whole groups is, however, a much slower process than the selection of individuals, and in view of the length of the generation in man the evolution of his higher mental faculties, and especially the self-sacrificing element in his moral nature, would seem to require the action of group selection [my italics] over an immense period’. [p.245] Fisher’s own explanation for ‘socially valuable’ traits avoids group selection. He argues that in primitive societies there is a positive association of fertility and social status, and that individuals showing qualities such as courage, generosity, and honesty would have higher social status, bringing with it higher fertility (more wives, etc) and survival of offspring.

Fisher’s contemporary J. B. S. Haldane also discussed ‘socially valuable but individually disadvantageous characters’. He considered three ways in which such characters could evolve. The first is the special case of parental care. The second is that of social insects. Haldane notes that ‘in a beehive the workers and young queens are samples of the same set of genotypes, so any form of behaviour in the former (however suicidal it may be) which is of advantage to the hive will promote the survival of the latter, and thus tend to spread through the species’ [Haldane p.120]. Finally, he considers the case of ‘small social groups where every individual is a potential parent’. Though he does not use the term ‘group selection’ as such, he develops the first mathematical model of group selection, concluding that it was theoretically possible but required a rather stringent set of conditions to work in the long term.

Fisher’s other great contemporary, Sewall Wright, was more positive about group selection. From 1929 onwards Wright used the term ‘intergroup selection’, and argued that it was important in evolution. I have discussed Wright’s views elsewhere, so will not describe them again, beyond remarking that he did not initially claim that intergroup selection could account for ‘altruistic’ traits. He added this claim only in 1945, saying: ‘it is indeed difficult to see how socially advantageous but individually disadvantageous mutations can be fixed without some form of intergroup selection’ [Wright p.397]. But the evolution of altruism is difficult to reconcile with the process of intergroup selection Wright had previously described. In this process Wright postulates that some local populations (demes) of a species will, by genetic drift and other chance events, produce new and advantageous combinations of genes, giving their members a selective advantage over those of other populations. They will then spread, by expanding their range or by sending out successful migrants. Whether or not this process is plausible in itself, it would not explain the spread of altruism. By definition altruistic individuals are at a selective disadvantage in competition with non-altruists. While it is possible that by genetic drift altruistic traits might reach a high frequency (or even fixation) in some demes, the intergroup selection of altruism seems to require some kind of competition between groups as a whole, and some mechanisms, such as restrictions on interbreeding, or punishment of free-riders, to prevent ‘altruistic’ groups from being contaminated and undermined by non-altruists.

In the 1930s Wright was based at the University of Chicago, which was also the home of the ‘Chicago school’ of ecologists led by W. C. Allee. Allee and his colleagues placed great emphasis on co-operation among animals, for example in his book Animal Aggregations (1931). This work seems to have influenced Wright in believing that altruism was a common phenomenon, and in turn Wright’s endorsement of intergroup selection gave the Chicago ecologists an apparent theoretical basis for their views. The standard work Principles of Animal Ecology (1949) by W. C. Allee, A. E. Emerson, et al, frequently referred to intergroup selection, with reference to Sewall Wright for its justification.

In Britain as well as in the United States many biologists before the 1960s held a vague belief that evolution could work ‘for the good of the species’. There were however some defenders of a more individualistic concept of natural selection, such as E. B. Ford and the ornithologist David Lack. Lack was especially influential in combating the view that the size of animal populations is self-regulated in the interests of the species. The research of Lack and others tended to show that individual animals maximised the number of offspring they could successfully raise [Lack 1954]. By 1960 this was probably the mainstream doctrine. It was however resisted by the British ornithologist V. C. Wynne-Edwards, whose studies culminated in his book Animal Dispersion in Relation to Social Behaviour (1962). (An accessible summary is in his article in Nature.)

It is difficult now to read Wynne-Edwards’s book, not only because it is very long, but because it is exasperating to see his constant invocation of group selection to explain phenomena which usually have obvious explanations by individual, kin, or sexual selection. It is however desirable at least to skim the book, in order to get a sense of why the group selection controversy aroused such passions in the 60s and 70s. As John Maynard Smith plaintively recalled when combating revivals of group-selectionism by Elliott Sober and others, ‘In 1962, when Wynne-Edwards published his book, biology was riddled with “good-of-the-species” thinking… It was quite clear to me, as it must have been clear to George Williams, that no progress would be made toward understanding the evolution of such traits until this kind of thinking was ended… If Sober’s way of describing the world is taken seriously, it will again cease to be obvious, and someone (not me, next time) will have the job to do over again.’ [Maynard Smith, 1987, p.147]

The main thesis of Wynne-Edwards’s book is that a very wide range of animal behaviour, including mating rituals, territory-holding, dominance hierarchies, breeding in colonies, dispersal, and migration, has evolved in order to maintain population size at an optimal level for populations, to avoid exhausting their food supplies. Wynne-Edwards explicitly appeals to group selection to explain these phenomena. (While he sometimes uses Wright’s term ‘intergroup selection’, more often he refers to ‘group-selection’.) For example, he argues that sea birds nest in colonies so that they can assess the size of the population and adjust their breeding to maintain it within the available fishing resources. A necessary component of the theory is that animals exercise ‘prudential restraint’ in the interests of their group as a whole. Of course Wynne-Edwards does not suggest that this is done by conscious calculation, but that groups in which such habits prevail will in the long term survive, while those without such habits will die out. To explain how such traits emerge in the first place, he appeals, predictably, to Sewall Wright’s theory of genetic drift.

Even before Wynne-Edwards’s book was published, W. D. Hamilton was working on his theory of inclusive fitness. (Fisher and Haldane had both anticipated the concept in principle, but had not developed a mathematical theory of it.) In his first short paper in 1963, Hamilton noted that ‘the explanation usually given for such cases [of 'altruistic' behaviour towards individuals other than direct descendants] and for all others where selfish behaviour seems moderated by concern for the interests of a group is that they are evolved by natural selection favouring the most stable and co-operative groups’ [Hamilton p.6]. As Hamilton proceeded to show in detail in his papers of 1964, an alternative explanation in terms of inclusive fitness is usually available.

One of the first to see the advantages of Hamilton’s approach was John Maynard Smith. In a short but influential paper in Nature Maynard Smith [1964] coined the term ‘kin selection’, saying ‘it is possible to distinguish two rather different processes, both of which could cause the evolution of characteristics which favour the survival, not of the individual, but of other members of the species. These processes I will call kin selection and group selection, respectively’. He ascribes the concept of kin selection to Haldane and Hamilton, and says ‘by kin selection I mean the evolution of characteristics which favour the survival of close relatives of the affected individual, by processes which do not require any discontinuities in population breeding structure’. ‘Close relatives’ include offspring and siblings, but presumably sometimes more distant relatives.

Maynard Smith says little further about kin selection, and turns to discuss group selection: ‘If groups of relatives stay together, wholly or partially isolated from other members of the species, then the process of group selection can occur. If all members of a group acquire some characteristic which, although individually disadvantageous, increases the fitness of the group, then that group is more likely to split into two, and in this way bring about an increase in the proportion of individuals in the whole population with the characteristic in question. The unit on which selection is operating is the group and not the individual… The distinction between kin selection and group selection as here defined is that for kin selection the division of the population into partially isolated breeding groups is a favourable but not an essential condition, whereas it is an essential condition for group selection, which depends on the spread of a characteristic to all members of a group by genetic drift’. Maynard Smith later discusses the circumstances in which group selection can succeed, using a simplified mathematical model, and concludes, like his mentor Haldane before him, that it requires rather stringent conditions.

Several points about Maynard Smith’s definitions should be noted. First, he applies both kin and group selection only to ‘characteristics which favour the survival, not of the individual, but of other members of the species’. Selection which goes in the same direction at both individual and group level is therefore ignored, presumably not because it cannot happen but because it presents no interesting problem. Second, Maynard Smith’s definition of kin selection does not require that the behaviour of ‘altruists’ is directed preferentially towards relatives, but only that its effect is to favour them (for example because they are geographically concentrated). This point was sometimes misunderstood by later writers. Third, the definition of kin selection explicitly includes parental care. Fourth, the definition of group selection requires separation of the population into partially isolated groups. Finally, Maynard Smith confines group selection to cases where some groups consist entirely of altruists, as a result of genetic drift.

These last two points imply a rather narrow definition of group selection. Later writers have devised models, which they describe as models of group selection, in which groups are not geographically isolated, or which contain a mixture of altruists and non-altruists, but where altruism still prevails. Maynard Smith himself criticised this wider usage of the term group selection, mainly because he believed that it blurred the distinction between group and kin selection, as in such models the success of altruism often depended on the geographical proximity of relatives. But this criticism was not always valid, as there are at least some models in which altruism can prevail without help to relatives.

The other critics of Wynne-Edwards’s theory included C. S. Elton, David Lack and George C. Williams. Elton, Britain’s leading animal ecologist, sceptically reviewed Wynne-Edwards’s book, pointing out that animal populations often fluctuated widely, with little sign of ‘self-regulation’ [Elton 1963]. Lack’s contribution [Lack 1966] was primarily to survey the empirical evidence for alternatives to group selection in explaining animal behaviour. Williams covered both empirical and theoretical aspects, with an entire chapter of his book devoted to group selection (a term which he attributes to Wynne-Edwards [Williams p.96]) and several others dealing with specific aspects of the problem. Like Maynard Smith, Williams is concerned only with cases where selection at the level of the group seems to act against the interest of individuals, and he specifies that ‘a group in this discussion should be understood to mean something other than a family and to be composed of individuals that need not be closely related’ [p.93]. In general he is even more sceptical than Maynard Smith about the importance of group selection, stressing that it should only be invoked ‘when the simpler explanation is clearly inadequate’ [p.124]. He does however appear to accept that over the very long term (geological time) group selection at the level of species as a whole could be important in the pattern of evolution. [p.97-101]

The work of George C. Williams brings us almost to the time at which George Price entered the debate on group selection. Price’s work however made little impact until the mid 1970s. In particular, his papers on ‘Price’s Equation’, and its application to group selection, are not mentioned in E. O. Wilson’s book Sociobiology: the New Synthesis (1975). It will therefore be worth looking briefly at the treatment of group selection in that book. At the outset of his discussion Wilson adopts a very broad definition of group selection: ‘selection can be said to operate at group level, and deserves to be called group selection, when it affects two or more members of a lineage group as a unit’. However, within this broad definition he distinguishes between kin selection and ‘interdemic’ selection: ‘If selection operates on any of the groups [of relatives] as a unit, or operates on an individual in any way that affects the frequency of genes shared by common descent in relatives, the process is referred to as kin selection. At a higher level, an entire breeding population may be the unit, so that populations (that is, demes), possessing different genotypes are extinguished differentially, or disseminate different numbers of colonists, in which case we speak of interdemic (or interpopulation) selection’ [p.106].

Curiously, in the Glossary to his book Wilson defines kin selection as ‘the selection of genes due to one or more individuals favoring or disfavoring the survival and reproduction of relatives (other than offspring) who possess the same gene by common descent…’ [p.587]. The exclusion of offspring from kin selection does not seem consistent either with Wilson’s own earlier definitions, with Maynard Smith’s original definition of kin selection, or with Hamilton’s concept of inclusive fitness. Wilson’s rationale for the exclusion may have been that offspring contribute to an individual’s own reproductive fitness, so that help given to offspring is covered by conventional Darwinian natural selection without any need for the concept of kin selection. But this is an oversimplification if there is any choice in the allocation of an individual’s resources between producing new offspring, helping existing offspring, and helping other relatives. In this situation the principles of inclusive fitness have to be applied to explain why, for example, it is usually better to help offspring than nephews.

To draw a few conclusions from this brief survey:

1. The term ‘group selection’ itself was seldom used before the 1960s, though Wright’s term ‘intergroup selection’ was used by various authors.

2. There was very little serious analysis of the concept of group selection before Wynne-Edwards’s book and the subsequent controversy. The only quantitative approach, so far as I know, was that of Haldane. The remarks of Fisher and Wright on the subject were vague and qualitative. Sewall Wright’s authority, however, was such that his acceptance of group selection was influential on other biologists, especially in America.

3. The introduction of the concept of inclusive fitness by Hamilton (and Maynard Smith’s handy terminology of ‘kin selection’) brought about a useful clarification. There was little prospect for a useful discussion of group selection so long as biologists failed to distinguish cases such as the social insects, where close genetic relatedness is important, from cases (real or hypothetical) where group selection does not depend at all on relatedness.

4. Even after the debates of the 1960s, there was no uniformity of usage in the definition of group selection – E. O. Wilson, for example, had a different usage from Maynard Smith. (I would add that there was also disagreement over the definition of altruism, but that is a subject for another post.) The continuing controversy in the 70s and beyond has been aggravated by the confusion of semantic and substantive issues.

Added on 16 January: In comments Peter Mazsa has linked to a Google Books text search for the phrases ‘group selection’ and ‘intergroup selection’. This shows some uses of these terms going back to the 1890s. As I would have expected, there is a great surge in the use of ‘group selection’ after 1960, but I was surprised to see that between 1930 and 1960 there are more results for ‘group’ than ‘intergroup’ selection. I have looked at some of the detailed search results and find there is a snag in interpreting these data, as the phrase ‘group selection’ had several technical uses in contexts quite unconnected with its use in evolutionary biology, for example in forestry, telephony, insurance and psychotherapy! These uses probably account for the majority of results before 1960. But there are certainly also some relevant biological uses from this period. The earliest I have so far identified is in an essay of 1895 by the sociologist and political philosopher Bernard Bosanquet. His comments are quite interesting and I may do a separate note on them.
Added on 21 January: Using Google Books search I looked for the very first relevant use of the term ‘group selection’. The earliest I can find is in an essay of 1895 on ‘Socialism and Natural Selection’, by the political philosopher Bernard Bosanquet, available here (see the middle of page 294). Bosanquet’s essay was prompted in part by an article of 1894 with the same title by Karl Pearson, reprinted in book form here.
Pearson does not use the exact phrase ‘group selection’, but does use the terms ‘intra-group selection’ and ‘extra-group selection’. Intra-group selection is selection within a group resulting from competition between its members. Extra-group selection (meaning literally outside-group selection) could mean selection between individual members of different groups, but it is clear from the context that Pearson intended it to mean primarily selection between groups as a whole. Pearson regarded himself as a socialist as well as a good Darwinian, and was keen to rebut claims that socialism was incompatible with natural selection. Pearson argued that as human society becomes more advanced, competition and selection within groups becomes less important, as it gives way to co-operation and collective action, whereas competition and selection between groups (tribes, nations or races) becomes even stronger.

These early writers on group selection seldom gave much attention to the problem raised, but not solved, by Charles Darwin in the Descent of Man: if the qualities promoting group success, such as co-operation and self-sacrifice, conflict with individual success within the group, how is the conflict resolved? Bernard Bosanquet’s essay does however at least address the problem. His answer is essentially that there is no conflict. As society evolves, it creates a new selective environment for individuals, and this favours co-operation: ‘the struggle for existence has, in short, become a struggle for a place in the community; and these places are reserved for those individuals which in the highest degree possess the co-operative qualities demanded by circumstances’ (p.294). One may wonder if this assertion would stand up to close empirical testing, but Bosanquet deserves credit for recognising a problem generally ignored.



W. C. Allee: Animal Aggregations: a study in general sociology, 1931

W. C. Allee, A. E. Emerson, et al.: Principles of Animal Ecology, 1949

A. M. Carr-Saunders: The Population Problem, 1925

C. S. Elton: Review of Wynne-Edwards, Nature, 1963, 197, p.634

R. A. Fisher: Genetical Theory of Natural Selection: a complete variorum edition, ed. Henry Bennett, 1999.

J. B. S. Haldane: The Causes of Evolution, 1932, Princeton UP edition 1990

W. D. Hamilton: Narrow Roads of Gene Land, vol. 1, 1996

Vernon Kellogg: Darwinism Today, 1907

David Lack: The Natural Regulation of Animal Numbers, 1954

David Lack: Population Studies of Birds, 1966

John Maynard Smith: ‘Group Selection and Kin Selection’, Nature, 1964, 201, p.1145-7

John Maynard Smith: ‘Reply to Sober’ in The Latest on the Best: Essays on Evolution and Optimality, ed. John Dupre, 1987

August Weismann: Lectures on the Evolution Theory, English edn., 1904

George C. Williams: Adaptation and Natural Selection, 1966

E. O. Wilson: Sociobiology: the New Synthesis, 1975

Sewall Wright: Evolution: Selected Papers, edited and with Introductory Materials by William B. Provine, 1986

V. C. Wynne-Edwards: Animal Dispersion in Relation to Social Behaviour, 1962

V. C. Wynne-Edwards: ‘Intergroup selection in the evolution of social systems’, Nature, 1963, 200, p.623-6

(Republished from by permission of author or representative)
• Category: Science 

A while ago I pointed out a discussion in R. A. Fisher’s Genetical Theory of Natural Selection which showed a pretty clear understanding of the concept of inclusive fitness, and shortly after that I mentioned a passage in his published correspondence that seemed equally clear.

While consulting GTNS for another purpose I have noticed another small example. In his section on ‘reproductive value’ Fisher considers how the ‘direct’ reproductive value of a woman, measured by the ‘present value of their future offspring’, will vary with age, increasing as the child passes through the dangers of high infant and child mortality, reaching a peak in early adulthood, then declining as the years of childbearing go by, and reaching zero around age 50 with the end of reproduction. But Fisher adds in passing that ‘the reproductive value of an older woman… is undervalued [in Fisher's calculations] in so far as her relations profit by her earnings or domestic assistance’ [p.29]. The use of the term ‘relations’ rather than ‘children’ implies recognition of one of the essential points of inclusive fitness, that the evolutionary ‘value’ of an individual includes their contributions to the reproduction of close genetic relatives (not necessarily offspring).

Altogether in GTNS Fisher gives at least seven examples of inclusive fitness:

1. Suckling of children [27]

2. The help a post-reproductive mother gives to her children [27]

3. The inhibiting effect of a foetus or baby on further conception [27]

4. The services of neuter insects to their queen [27]

5. The services of post-reproductive women to their ‘relations’ [29]

6. The protective effect of distastefulness on the siblings of gregarious insect larvae [159]

7. The evolution of reproductive specialisation in social insects [186].

While of course the main credit for developing the theory of inclusive fitness should still go to W. D. Hamilton, Fisher’s various comments show that he had more than an inkling of the theory.

All page references are to R. A. Fisher: Genetical Theory of Natural Selection: a complete variorum edition, ed. Henry Bennett, 1999.


• Category: Science 

I have previously written several posts on the work of George Price. This one contained general reflections on Price’s reputation; this one attempted to explain how he arrived at the first (1970) form of his famous equation; this one did the same for the later (1972) version of the equation; and this one attempted to give a more intuitively satisfying account of the ‘covariance’ term in the equation.

I said I would conclude with some general comments and criticisms. This post will comment on aspects of the equation except its application to the problem of group selection, while the final post in the series will cover that.

The full equation
Price’s 1970 paper contained two versions of the equation, a short form:

[1] dQ = cov(z,q)/Z

and the full version:

[2] dQ = cov(z,q)/Z + Sz[i]dq[i]/(NZ).

(For an explanation of the notation see the previous posts.)

Dealing first with the full version, its main virtue is to analyse the change in a gene frequency (or other properties of a population) rigorously into two components: one of which depends on the covariance between the fitness of individuals in the ‘parent’ generation and their own possession of the gene in question, and the other of which depends on any change of frequency of the gene among the offspring as compared with their own parents.

I think this is a genuinely important contribution to our understanding of selective processes. As Price himself pointed out, it is in principle applicable to changes other than those of gene frequencies, or even outside biology altogether. (These applications require some reinterpretation of the terms involved.) For example, if we wish to analyse an increase in average height in a population (excluding migration), this can be broken down into any covariance between the height of individuals and their number of offspring, and any tendency for offspring to be taller than their own parents. The two components may work in the same direction or in opposite directions, as would be the case if tall parents have fewer offspring than short parents, but offspring are taller than their parents. Price’s Equation provides a clear way of analysing such issues.
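The decomposition in [2] can be checked numerically. The following is a minimal Python sketch, not anything taken from Price's paper. It assumes that z[i] is the number of offspring of parent i, q[i] is the frequency of the gene in parent i, q_off[i] is the average frequency in that parent's offspring (so dq[i] = q_off[i] - q[i]), Z is the mean of z, and cov is the population covariance (dividing by N). It also assumes each offspring is attributed to a single parent. The function names and the toy data are invented for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by N, not N - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def price_terms(z, q, q_off):
    """Return the two right-hand terms of equation [2]."""
    n = len(z)
    Z = mean(z)
    selection = cov(z, q) / Z
    transmission = sum(zi * (qo - qi) for zi, qi, qo in zip(z, q, q_off)) / (n * Z)
    return selection, transmission


def direct_change(z, q, q_off):
    """dQ computed directly: offspring-weighted mean minus parental mean."""
    q_parents = mean(q)
    q_offspring = sum(zi * qo for zi, qo in zip(z, q_off)) / sum(z)
    return q_offspring - q_parents


# Hypothetical data: four parents, their offspring counts and gene frequencies.
z = [1, 2, 3, 2]
q = [0.0, 0.5, 1.0, 0.5]
q_off = [0.0, 0.5, 0.9, 0.6]  # small parent-offspring differences

selection, transmission = price_terms(z, q, q_off)
total = direct_change(z, q, q_off)
print(selection, transmission, total)
```

With these made-up numbers the covariance term is positive and the transmission term is negative, so the two components partly cancel. Replacing q with a measure such as height gives the same bookkeeping for the height example above.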

A particularly interesting application of the equation has been made by Steven A. Frank. Suppose we take the property of interest (represented by q in the equation) to be fitness itself. The equation can then take the form:

[3] dZ = cov(z,z)/Z + Sz[i]dz[i]/(NZ).

The first term on the RHS now looks distinctly odd. What on earth is the covariance of fitness with itself? Odd though it may seem, it is mathematically quite legitimate, and if the term cov(z,z) is expanded using the definition of covariance it will be seen to equal S(z[i] – Z)(z[i] – Z)/N. But this is the same as the variance of z. Equation [3] may therefore be interpreted as showing that the change in average fitness between two generations depends on two quantities: the variance in fitness among the parent generation, and the change (if any) in average fitness between parents and their offspring. Some readers will have noticed a similarity in this proposition to R. A. Fisher’s Fundamental Theorem of Natural Selection. Steven A. Frank has shown that with an appropriate interpretation of the FTNS, it can be simply and elegantly derived from Price’s Equation [Frank p.20-25]. Anything that makes this possible must be regarded as a notable contribution to biological theory.
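The identity between cov(z,z) and the variance of z is easy to confirm with a few lines of Python. This is only an illustrative sketch: the fitness values are hypothetical, and the covariance is taken as the population covariance (dividing by N), which is compared with the standard library's population variance.

```python
from statistics import pvariance


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical offspring counts for four parents.
z = [1.0, 2.0, 3.0, 2.0]

self_covariance = cov(z, z)      # S(z[i] - Z)(z[i] - Z)/N
variance = pvariance(z)          # population variance of z
first_term = self_covariance / mean(z)  # first RHS term of equation [3]

print(self_covariance, variance, first_term)
```

Because a variance cannot be negative, the first term of [3] can never reduce average fitness; any decline must come from the second term.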

My only reservation about this contribution is that the conceptual breakdown of the change between the two components does not necessarily correspond to different causal processes, because the same process may sometimes contribute to both components. I fell into this trap myself in an earlier post when I said that the full equation ‘makes a sharp distinction between the effect of a gene on the fitness of the individual organism and any changes in gene frequency occurring solely during the transmission of genes from parents to offspring. These two types of effect can therefore be divided neatly between the two components in the right hand side of the equation.’ To see that this is misleading, consider the case of segregation distortion. This may happen because in individuals who are heterozygous for the relevant gene, the ‘male-producing’ allele manages to disable gametes from the same parent which carry a ‘female-producing’ allele. In this case the gene may have two effects: it will certainly increase the proportion of males among the offspring, but it may also reduce the overall number of offspring, since fewer viable gametes are produced. (See Ridley p. 172 for an actual example.) The same causal process (the sabotage of ‘female-producing’ gametes) therefore affects both components in the equation simultaneously.

The short form
Turning to the short form of the equation, dQ = cov(z,q)/Z, this is strictly valid when there is no change in gene frequency between individual parents and their own offspring, and a good approximation if such changes are confined to small random fluctuations in a large population. Whether it is actually useful, either for practical or conceptual purposes, is more debatable. Price himself, in his 1970 paper, said that ‘Recognition of covariance… is of no advantage for numerical calculation, but of much advantage for evolutionary reasoning and mathematical model building’. Biologists have differed in their opinions on this. Few paid the equation much attention until the mid-1980s, which is a long time for a supposedly valuable tool to go unused. This may be contrasted with the rapid and widespread use of evolutionary game theory, which Price also helped create in the early 1970s. Some very distinguished evolutionary biologists, including W. D. Hamilton, Alan Grafen, and D. C. Queller, claim to have found the ‘covariance’ approach to selection enlightening, but on the other side John Maynard Smith, in the interview mentioned in my last post, had no time for it. This may be, as JMS himself suggested, because scientists’ minds work in different ways.

I am hardly qualified to take sides in this debate, but I will. As I pointed out in my last post, there is nothing inherently surprising about finding some connection between covariance and selection. If one analyses the concept of covariance itself more closely, the equation is dangerously close to triviality. It is neat, and ingenious, but does it give any insight into selective processes that we could not get more easily in other ways? Price himself thought that the equation could be especially useful when the covariance is expressed as ‘regression times variance’. In these terms the short form of the equation can be expressed as:

[4] dQ = reg(z,q)Vq/Z

where reg(z,q) is the regression of z on q, and Vq is the variance of q.

The change in gene frequency resulting from selection is therefore proportional to the regression of z on q. This can be represented diagrammatically as the slope of a line through a scatter plot, where fitness is measured along one axis and the gene frequency of individuals along the other. Price claimed that ‘at any stage in constructing hypotheses about evolution by natural selection, one can visualize such a diagram and consider whether the slope really would be appreciably non-zero under the assumptions of the theory’. But this seems a very roundabout way of achieving something that could usually be reached more directly by considering the relative fitness of genotypes (which must be known or assumed for such a diagram to be constructed). I just don’t see any real advantage in Price’s approach.
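To make the ‘regression times variance’ form concrete, here is a small Python sketch under stated assumptions. The genotype fitnesses are hypothetical (they must be assumed before any scatter plot can be drawn, which is the point made above), reg(z,q) is taken to be the ordinary least-squares slope of z on q, that is cov(z,q)/Vq, and all names are invented for the example.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance (divides by N)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def regression_slope(ys, xs):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    return cov(xs, ys) / cov(xs, xs)


# Hypothetical fitnesses for genotypes aa, Aa, AA, keyed by the
# individual's own gene frequency q (0, 1/2 or 1).
fitness_by_genotype = {0.0: 1.0, 0.5: 1.5, 1.0: 2.0}

# A made-up population of six individuals.
q = [0.0, 0.0, 0.5, 0.5, 0.5, 1.0]
z = [fitness_by_genotype[qi] for qi in q]

Z = mean(z)
Vq = cov(q, q)
slope = regression_slope(z, q)

dQ_regression = slope * Vq / Z      # equation [4]
dQ_covariance = cov(z, q) / Z       # equation [1]

print(slope, dQ_regression, dQ_covariance)
```

The two routes give the same dQ, as they must, since the regression form is only a rearrangement of the covariance. The slope here simply restates the assumed fitness difference between genotypes.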

Applications to inclusive fitness
Another claim about the equation in its short form is that it helps clarify, or even supersedes, W. D. Hamilton’s inclusive fitness theory. Oren Harman’s biography of Price implies that Hamilton got some important ideas for developing the theory from Price. There is some basis for this view, as Hamilton admired and championed Price’s work, and made some use of the short form of Price’s Equation in his 1970 paper on ‘selfish and spiteful behaviour’. (Hamilton later made use of the ‘multilevel’ version of the equation in an attempt to clarify the issues raised by group selection, but that is not covered in the present post.)

It is therefore undeniable that the equation can be used in formulating the theory of inclusive fitness. But one may still ask the historical question how far the equation actually helped Hamilton develop the theory; and the practical question whether the equation is useful for understanding inclusive fitness.

After his initial presentation of inclusive fitness theory in 1963 and 1964, Hamilton made four notable corrections or extensions of the theory:

1. He recognised that in the case of haplodiploids his original measure of relatedness was incorrect. He corrected it in a short paper published in 1971, after he became aware of Price’s work, but he did not refer to Price and there is nothing to suggest that Price’s Equation had any influence on this particular point.

2. He recognised that in cases where one individual is inbred and the other is not, or where one is more inbred than the other, then the appropriate measure of relatedness for the purpose of Hamilton’s Rule must be asymmetrical, and is technically a regression rather than a correlation coefficient. Hamilton introduced a revised coefficient in the 1970 paper which first made use of Price’s Equation [Narrow Roads, 177-82], and this may have led some to suppose that it was Price’s work which prompted the change (see e.g. Harman’s biography of Price, p.208). But Hamilton has said elsewhere (Hamilton, Narrow Roads, vol. 2, p.99) that he recognised the need for a correction ‘a year or two after’ his 1964 paper, which was long before he came in contact with Price. Indeed, he had recognised even in his first (1963) paper on altruistic behaviour that in principle a regression coefficient is needed [Narrow Roads, vol. 1, p.7], but he considered at that time that Sewall Wright’s measure of relationship, which is a correlation coefficient, was a good enough approximation in most circumstances.

3. Hamilton’s original work did not allow for ‘spite’, that is, for behaviour which is harmful to the individual’s fitness but even more harmful to others. His 1970 paper rectified this. Again, it is tempting to suppose that Price’s work influenced this development, and Harman (p.223) implies this, but Hamilton’s own 1970 paper says that the two men reached their conclusions independently. The basic point about ‘spite’ can be made with a very simple example: a gene which caused an individual to kill some of its own offspring, but all the other members of the population, might at a stroke greatly increase its frequency in the population. The point can also be made within the framework of the original version of Hamilton’s Rule, provided one is aware that Wright’s coefficients of relationship can be negative. (See my post here.) If we put the Rule in the form:

[5] br – c > 0

where b is the ‘benefit’ to the recipient of the action, c is the cost to the actor, and r is the measure of relationship between them, it is evident that if r is negative then b, the ‘benefit’, may also be negative (i.e. spiteful) and still satisfy the Rule.

4. In a paper published in 1975 (based on a lecture given in 1973) Hamilton pointed out that altruism can spread if there is a positive association of altruists in the population, whether this is due to genetic relatedness (common ancestry) or to genetic similarity arising from other reasons (such as habitat preference). This is in principle an important extension of inclusive fitness theory. The 1975 paper is strongly influenced by Price’s Equation in its ‘multi-level’ form of 1972, but the basic point could also be made using the simple covariance formula. Anything that causes a positive covariance between fitness and the frequency of a gene in groups of a population could increase the frequency of a gene, and it is not difficult to see that factors other than common ancestry could in principle give rise to such a covariance. Price’s work may well have given Hamilton the inspiration for his treatment of this point in the early 1970s, but it is worth noting that Hamilton’s 1964 paper already recognised the possibility of extending the concept of inclusive fitness to cover genetic similarity for reasons other than common ancestry: ‘if some sort of attraction between likes [i.e. between genetically similar individuals] for purposes of co-operation can occur, the limits to the evolution of altruism… would be very much extended’ [Narrow Roads, vol. 1, p.54]. This is the origin of the ‘Green Beard’ concept.
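The sign logic behind the point about ‘spite’ in item 3 can be illustrated with the Rule in form [5]. This is a minimal sketch only: the function name and the numerical values of b, r and c are hypothetical, chosen to show how a negative r allows a negative ‘benefit’ to satisfy the Rule.

```python
def hamilton_rule_satisfied(b, r, c):
    """Return True if br - c > 0, i.e. the Rule in form [5] is met.

    b: effect on the recipient's fitness (negative if the act is harmful)
    r: measure of relationship between actor and recipient (may be negative)
    c: cost to the actor
    """
    return b * r - c > 0


# Ordinary altruism towards a positively related recipient.
altruism = hamilton_rule_satisfied(b=4.0, r=0.5, c=1.0)

# Spite: a harmful act (negative b) towards a negatively related recipient.
spite = hamilton_rule_satisfied(b=-4.0, r=-0.5, c=1.0)

# The same harmful act towards a positively related recipient fails the Rule.
harm_to_kin = hamilton_rule_satisfied(b=-4.0, r=0.5, c=1.0)

print(altruism, spite, harm_to_kin)  # True True False
```

The product br is what matters: it is positive both when a benefit goes to a positively related recipient and when harm falls on a negatively related one.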

Overall, I do not think that the short version of Price’s Equation played a major part in the development of inclusive fitness theory by Hamilton in the 1970s. However, it may still be argued that it is useful for further work on that theory. Alan Grafen, one of the most brilliant recent evolutionary theorists, has made much use of Price’s work, and in 1985 showed how Price’s Equation could be used to derive a version of Hamilton’s Rule itself [Grafen 1985]. Grafen defines a certain quantity r in such a way that the frequency of the relevant gene will increase if:

[6] br – c > 0

This is identical with [5] above. Should we therefore say that Hamilton’s Rule can be derived from Price’s Equation? Well, not really. The term r represents different concepts in [5] and [6]. In [5] it stands for a measure of genetic relatedness, while in [6] it stands for an altogether more abstract quantity. It involves a ratio between two covariances, one of which measures the connection between possession of a gene and the frequency with which a given individual performs a costly social act, and the other of which measures the connection between possession of that gene and the frequency with which the same individual receives the benefit of such acts. The underlying model of the selective process is quite different from that in Hamilton’s Rule. The terms of Hamilton’s Rule represent the costs and benefits of a particular social act. The Rule states the condition under which that act will tend to increase the frequency of a particular gene which gives rise to the act. In contrast, Grafen’s derivation of [6] considers the totality of social interactions in a population, and states a condition under which the possessors of a particular gene will have an average fitness higher than that of non-possessors. This condition has no obvious connection with genetic relatedness. It is true that, as Grafen shows, Hamilton’s Rule in the ordinary sense complies with the condition. If individuals, on average, are sufficiently closely related by ancestry to the individuals with whom they interact for Hamilton’s Rule to be satisfied, then they will also meet the condition stated in Grafen’s model. But this cannot be deduced from Price’s Equation alone: it requires an elaborate analysis of the nature and measurement of relatedness. It is highly unlikely that anyone armed only with Price’s Equation would ever have hit on the expression [6] unless they had already arrived at something equivalent to Hamilton’s Rule by more direct means.
It is arguable that Grafen’s version is both more rigorous and more general than Hamilton’s Rule in its usual interpretation (more rigorous because it is ultimately a mathematical tautology; more general because it allows for covariance due to factors other than common ancestry), but unlike Hamilton’s Rule in the ordinary sense it cannot directly be applied to any actual biological situation.
Summing up my subjective opinion on the value of Price’s Equation (apart from its use in ‘multilevel’ selection theory, and in areas outside biology), I think that the full version of the equation does make a notable contribution to evolutionary theory, but I am not convinced that the short version is as valuable as the enthusiasts claim. It has so far been used mainly by theorists with a taste for abstraction and mathematical elegance. I think that mathematical elegance in itself is less useful in biology than in physics, where it is often a clue to underlying laws of nature or to new phenomena, as for example Maxwell’s field equations led to the discovery of electromagnetic waves, and Dirac’s Equation led to the discovery of the positron. There is probably nothing comparable to this in biology; for example, Fisher’s comparison of his Fundamental Theorem of Natural Selection to the Second Law of Thermodynamics is quite misleading. The FTNS, when properly interpreted, is a mathematical tautology; whereas the Second Law of Thermodynamics is a fact of nature: we can easily imagine a world in which it would be false. So far as I know, Price’s Equation in its short form has not yet helped solve any hitherto unsolved problems in biology or led to any new empirical findings. Perhaps it will in future, but until then I don’t think we should get too excited about it.
G. R. Price, ‘Selection and covariance’, Nature, 227, 1970, 520-21.
G. R. Price, ‘Extension of covariance selection mathematics’, Annals of Human Genetics, 35, 1972, 485-90.
Steven A. Frank, Foundations of Social Evolution, 1998.
Alan Grafen, ‘A geometric view of relatedness’, Oxford Surveys in Evolutionary Biology, vol. 2, 1985, pp. 28-89.
W. D. Hamilton, Narrow Roads of Gene Land, vol. 1, 1996; vol. 2, 2001.
Oren Harman, The Price of Altruism: George Price and the Search for the Origins of Kindness, 2010.
Mark Ridley, Mendel’s Demon: Gene Justice and the Complexity of Life, 2000 (UK paperback edition).


(Republished from by permission of author or representative)
• Category: Science 

In previous posts here and here I have discussed how George Price arrived at his eponymous Equation. I said that there would be one more post of comment and criticism, but it will take at least two posts to cover the points I want to make.

The present post is aimed at getting a better understanding of the ‘covariance’ element in Price’s Equation. A common reaction to this is that it is somehow mysterious and intuitively hard to grasp. There are some interesting comments in this video talk by the late John Maynard Smith, who claimed that he could never understand the ‘variance and covariance’ approach to population genetics. The problem was not that he could not follow the mathematics, but that he just could not think in those terms.

I believe that the supposed mysteriousness of Price’s Equation is largely due to the way it is usually presented and explained. I hope the following way of interpreting it is intuitively more transparent and satisfying.

First it is necessary to recall that Price’s Equation comes in several versions. Here I will deal mainly with the ‘short’ version of what I have called the First Equation, from Price’s 1970 paper, in the form:

dQ = cov(z,q)/Z.

This contains only the ‘covariance’ term of the full equation. It is valid when there is no change in gene frequency between individual parents and their own offspring, and useful as an approximation if such changes are confined to small random fluctuations in a large population. In what follows I will assume that the short version holds exactly, but the interpretation given will be applicable, with minor modifications, to the covariance term in the other versions of the Equation.

There is nothing inherently odd about a covariance being involved in measuring the effects of selection. Whenever we want to measure the strength of a relationship statistically, then a covariance is likely to be involved, either in its own right or as a component in a correlation or regression formula. In Price’s case we are interested in the relationship between the fitness of organisms and their possession of certain genes. It is not surprising that it should be possible to describe this by means of a covariance. What is more surprising is that the change in frequency of a gene can be described simply and concisely in this way.

I do not suggest that this element of surprise can be removed entirely, but I think the meaning of the covariance can be made to emerge more naturally if we reverse the process of derivation (as presented by Price) and start by considering the end-product: the change in gene frequency.

The aim is to find a way of expressing the change of frequency of a gene between two generations. I make the usual simplifying assumption of separate generations. It is also convenient to avoid complications about ploidy (whether the organism is haploid, diploid, etc.) by interpreting the population number in a given generation as the total number of genes at the relevant locus rather than the number of organisms. A ‘parent’ can therefore be interpreted as a gene in Generation 1 (G1), and an ‘offspring’ as a replica of that gene in Generation 2 (G2).

With these assumptions, let us designate the total number of genes at a locus in G1 as N, and in G2 as N*. We designate the total number of genes of a particular type at the locus in G1 as A, and in G2 as A*. The frequency of that gene-type (which I will call the A-type) in the population in G1 is Q = A/N, and in G2 is Q* = A*/N*. The change, if any, of the frequency of the A-type gene between the two generations is therefore:

(1) dQ = Q* – Q = A*/N* – A/N.

The expression A*/N* – A/N can be manipulated in various ways, most obviously to get a fraction with a common denominator in the form:

(2) dQ = (A*N – AN*)/NN*.

However, it is more useful for comparison with Price’s Equation to find an expression with the common denominator N*, the population number of G2 [note 1]. To do this we can first express N* as a multiple of N, in the form:

(3) N* = kN.

The coefficient k can be regarded as a measure of population growth (or decline), so, for example, if the population grows by 20% then N* will be (1.2)N. Obviously k = N*/N.

With this usage, Q = A/N = kA/kN = kA/N*. Since Q* = A*/N* it follows easily that:

(4) dQ = Q* – Q = A*/N* – kA/N* = (A* – kA)/N*.

The final expression on the right is the desired formula with N* as the common denominator. The most interesting term in (4) is kA. From the definitions this is equal to QN*, which is the population of G2 multiplied by the proportion of A-type genes in G1. It can be interpreted as the number of A-type genes that would be expected in G2 if A-type genes reproduce at the same rate as the whole population, in other words, if there is no selection for or against them. The difference between the actual and the expected number of A-type genes in G2 is equal to (A* – kA), which can be interpreted as the surplus or deficit of A-type genes in G2 attributable to selection. When divided by N*, this gives the difference between the actual and expected frequencies of A-type genes in G2. Since the ‘expected’ frequency, in the absence of selection, is simply Q (the frequency in the previous generation), this also represents the change in frequency between generations, Q* – Q.
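
A minimal numerical sketch of (1) to (4) may make the 'actual minus expected' reading concrete. The counts below are hypothetical, chosen only for illustration, and the variable names are not Price's.

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration.
N, A = 100, 30             # genes at the locus in G1, and A-type genes among them
N_star, A_star = 120, 42   # the same two counts in G2

Q = Fraction(A, N)
Q_star = Fraction(A_star, N_star)
k = Fraction(N_star, N)    # population growth factor, equation (3)

expected_A = k * A               # A-type genes expected in G2 with no selection
surplus = A_star - expected_A    # surplus (or deficit) attributable to selection
dQ_direct = Q_star - Q           # equation (1)
dQ_eq4 = surplus / N_star        # equation (4)

print(k, expected_A, surplus, dQ_direct, dQ_eq4)
```

With these numbers the population grows by 20%, 36 A-type genes would be expected in G2 without selection, the surplus is 6, and both routes give a change in frequency of 1/20.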

We now want to find an interpretation of Price’s Equation which mirrors the quantities expressed in (4). Suppose we consider the covariance between fitness (number of ‘offspring’) of individual genes in G1 and the frequency of the A-type in those genes (which can only be 1 or 0, since an individual gene either is or is not A-type). Using one of the standard formulae for a covariance, this can be expressed as:

(5) cov(z,q) = (Sz[i]q[i])/N – ZQ

where S is a summation sign, summation is over all members of G1, z[i] is the number of ‘offspring’ of the i’th gene in G1, q[i] is the frequency of A-type in the i’th gene in G1 (either 1 or 0), N is the ‘population’ size in G1, and Z and Q are the mean values of z and q (the totals Sz[i] and Sq[i] divided by N).

To derive Price’s Equation in its usual form we need to eliminate N from the RHS of (5). If we multiply both sides of (5) by N, and note that NZ = Sz[i], we can get:

(6) cov(z,q)N = Sz[i]q[i] – Sz[i]Q.

We are assuming that the frequency of A-type genes does not change between individual parents and offspring, so the term Sz[i]q[i] can be interpreted as the actual total number of A-type genes in G2. The term Sz[i]Q can be interpreted as the total number of A-type genes that would be expected in G2 if the population frequency of A-type genes does not change from G1, where the frequency is Q. The expression on the RHS can therefore be interpreted as the total actual number of A-type genes in G2 minus the total expected number of A-type genes in the absence of selection. But recalling (4) and its interpretation, this is equal to (A* – kA) as defined earlier. Since by (4) dQ = (A* – kA)/N*, we can therefore divide (6) by N* to get:

(7) dQ = cov(z,q)N/N* = (Sz[i]q[i])/N* – Sz[i]Q/N*.

A little consideration shows that N* = NZ = Sz[i], the total of all ‘offspring’ of G1, so with appropriate substitutions we can finally get:

(8) dQ = cov(z,q)/Z = (Sz[i]q[i])/Sz[i] – Q.

The equation dQ = cov(z,q)/Z is Price’s First Equation in its short form. This now has a natural interpretation in terms of equation (4) above. Cov(z,q)/Z, and the equivalent expression (Sz[i]q[i])/Sz[i] – Q, can be interpreted as the surplus or deficit of A-type genes in G2 attributable to selection, as a proportion of the total number of genes at the relevant locus in G2. The components of (Sz[i]q[i])/Sz[i] – Q correspond neatly to the components of Q* – Q, since (Sz[i]q[i])/Sz[i] can be seen to be equal to Q*, the total number of A-type genes in G2 as a proportion of all genes at the relevant locus, on the assumption that q[i] does not change between generations.
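
The chain from (5) to (8) can be checked on a toy population. This is only a sketch under the assumptions stated in the text (separate generations, replicas identical to their parent gene); the six-gene population is hypothetical.

```python
from fractions import Fraction

# Hypothetical G1 of six genes: q[i] is 1 for an A-type gene, 0 otherwise;
# z[i] is the number of replicas ('offspring') of gene i in G2.
q = [1, 1, 1, 0, 0, 0]
z = [3, 2, 1, 1, 1, 0]

N = len(q)
Q = Fraction(sum(q), N)   # mean of q, i.e. frequency of A-type in G1
Z = Fraction(sum(z), N)   # mean number of replicas per gene

# Equation (5): cov(z,q) = (Sz[i]q[i])/N - ZQ
sum_zq = sum(zi * qi for zi, qi in zip(z, q))
cov_zq = Fraction(sum_zq, N) - Z * Q

# Direct count in G2, assuming replicas are exact copies of their parent.
A_star = sum_zq
N_star = sum(z)
dQ_direct = Fraction(A_star, N_star) - Q

dQ_price = cov_zq / Z     # equation (8)
print(cov_zq, dQ_direct, dQ_price)
```

Here cov(z,q) is 1/3 and Z is 4/3, so the short equation gives 1/4, which matches the change obtained by simply counting A-type genes in each generation (6/8 minus 3/6).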

It may be thought that the concept of covariance has played very little part in this interpretation. It would be possible in principle to reach the equation:

(9) dQ = (Sz[i]q[i])/Sz[i] – Q

from the definitions and assumptions without referring to the concept of covariance at all, and it might be wondered whether the equivalence of (9) to cov(z,q)/Z is more than a curious mathematical coincidence. To see that it is more than this, we may note that a covariance itself involves a difference between actual and expected values.

If we consider in general the covariance between variables x and y, it may be represented in a variant of one of the standard formulae as:

(10) cov(x,y) = (Sx[i]y[i] – NXY)/N.

In this expression Sx[i]y[i] is the sum of the actual values of the products xy, for some definite pairing of the variables x and y, whereas NXY (where X and Y are the mean values of x and y) is the sum of the expected values of N products xy if x and y values are paired randomly [note 2]. The difference between these two quantities therefore represents the departure (if any) of the actual product sum from its expected value under random pairing. This may be compared with Sz[i]q[i] – Sz[i]Q in expression (6) above, where the first term gives the actual number of A-type genes in G2 (assuming no change in frequency between individual parents and their offspring) while the second term gives the expected number assuming no selection. ‘No selection’ is in this context another way of saying that values of z and q are paired randomly. ‘Selection’ is precisely a departure from randomness. The connection between selection and covariance is therefore not merely a formal coincidence.

The only thing that prevents the change in gene frequency being identical with a straightforward covariance is the possibility of a change in the size of the population between G1 and G2. The problem is then that the summation of product terms in the covariance is over members of G1 [see note 3], so the definition of covariance requires division by N, whereas the difference in gene frequencies requires division by N*. As N* = NZ, this may be achieved by dividing the covariance itself by Z. If there were no change in the size of the population, then Sz[i] would be N, Z would be 1, and (8) would be a straightforward covariance between z and q with N as the denominator. In a static population the covariance would therefore directly measure the effect of selection on the frequency of a gene.

In the more general case of a growing or declining population the covariance by itself would not give the right measure. If the population has grown, then the covariance would overstate the change in gene frequency, and division by Z (which in this case is greater than 1) will scale it down; and conversely if the population has shrunk. Another way of interpreting the formula is to take Z as a divisor, not of the covariance itself, but of all the z values. The various terms z[i]/Z can then be regarded as ‘standardised’ values of z, and the formula can be regarded strictly as a covariance between q and the standardised z. This is an interpretation suggested by Price himself in his 1972 paper, where he introduces a ‘relative selection coefficient’ which in the notation of the 1970 paper would be z[i]/Z. The resulting value of dQ is of course the same however we interpret it.
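
Both readings of the divisor Z can be compared numerically. The sketch below uses hypothetical data and a helper function `cov` that is introduced here for illustration only; it also includes a static population, where Z = 1 and the covariance alone gives the change in frequency.

```python
from fractions import Fraction

def cov(x, y):
    """Ordinary covariance: mean of the products minus product of the means."""
    n = len(x)
    mean_x = Fraction(sum(x), n)
    mean_y = Fraction(sum(y), n)
    return Fraction(sum(a * b for a, b in zip(x, y)), n) - mean_x * mean_y

q = [1, 1, 1, 0, 0, 0]          # hypothetical A-type indicator for six genes
z = [3, 2, 1, 1, 1, 0]          # growing population: 6 genes leave 8 replicas
Z = Fraction(sum(z), len(z))

# Reading 1: divide the covariance by Z.
dQ_scaled = cov(z, q) / Z

# Reading 2: divide every z[i] by Z first, then take a plain covariance.
z_std = [zi / Z for zi in z]
dQ_standardised = cov(z_std, q)

# Static population: the z values sum to N, so Z = 1 and no scaling is needed.
z_static = [2, 1, 1, 1, 1, 0]
Z_static = Fraction(sum(z_static), len(z_static))
dQ_static = cov(z_static, q)

print(dQ_scaled, dQ_standardised, Z_static, dQ_static)
```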

I hope (but doubt) that this has done something to dispel the air of mystery about Price’s Equation. There remains the question whether it is actually as useful or important as its enthusiasts claim. I will consider this in another post.

Note 1

It would also be possible to derive a formula with N as the denominator, but this would not lead easily to Price’s Equation in its usual form.

Note 2

It is perhaps obvious that if pairs of x and y values are taken at random and multiplied together the average value of the products, in a long run of trials, will be XY. For a more formal argument, consider the set of all possible products of x and y pairs. There are NN such products, with a total value of SxSy, since in the total every x is multiplied by every y. The mean value of all possible pairs is therefore SxSy/NN = XY. A random sample of N such pairs will therefore have an expected value of N times the mean value, or NXY.
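
The counting argument in this note can be verified by brute force. The following sketch enumerates all NN products for a small hypothetical data set and then applies formula (10); the data are illustrative only.

```python
from fractions import Fraction
from itertools import product

x = [3, 2, 1, 1, 1, 0]    # hypothetical x values
y = [1, 1, 1, 0, 0, 0]    # hypothetical y values
N = len(x)
X = Fraction(sum(x), N)
Y = Fraction(sum(y), N)

# Every x multiplied by every y: NN products in all.
all_products = [a * b for a, b in product(x, y)]
mean_of_all = Fraction(sum(all_products), len(all_products))

# Actual product sum for the given pairing, and its expected value
# under random pairing.
actual_sum = sum(a * b for a, b in zip(x, y))
expected_sum = N * X * Y

cov_xy = (actual_sum - expected_sum) / N   # equation (10)
print(len(all_products), mean_of_all, X * Y, cov_xy)
```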

Note 3

It might be wondered if there is any way of developing Price’s Equation as a covariance between properties of offspring (G2), rather than members of the parental generation (G1). I think there would be several difficulties with this, the clincher being that not all members of G1 have offspring, so that a covariance between properties of members of G2 would leave out an important part of the selective process.

(Republished from by permission of author or representative)
• Category: Science 

In a previous post I discussed what I called Price’s First Equation, as contained in George Price’s 1970 paper ‘Selection and covariance’. The present post deals with the 1972 paper ‘Extension of covariance selection mathematics’ (so far as I know, not currently available free online). The main result of the 1972 paper is often known simply as Price’s Equation, but to avoid ambiguity I will call it Price’s Second Equation. As in the previous post, my aim will be to help people follow Price’s own derivation, which uses the relatively unfamiliar concept of a weighted covariance. Comment and criticism will be reserved for another post. This one won’t be easy reading, but people who aren’t interested in the technical details may still find a few points worth noting.

Price’s First Equation, in the notation explained previously, stated that:

dQ = cov(z,q)/Z + Sz[i]dq[i]/NZ

This equation breaks down the change in frequency (if any) of a gene from one generation to the next into two components: (a) the effect of the gene on the reproductive success of members of the parent generation; and (b) the change in frequency of the gene between individual parents and their own offspring.
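
A small numerical sketch of the two components follows. The notation was explained in an earlier post that is not reproduced here, so the reading used below is an assumption: q[i] is the frequency of the gene in parent i, z[i] is the number of successful gametes contributed by parent i, and dq[i] is the frequency of the gene among those gametes minus q[i]. The four parents are hypothetical, and the second parent is given a transmission bias so that component (b) is not zero.

```python
from fractions import Fraction as F

# Assumed interpretation (see the lead-in): one entry per parent.
q      = [F(1), F(1, 2), F(1, 2), F(0)]   # gene frequency in each parent
z      = [3, 4, 2, 1]                     # successful gametes per parent
q_star = [F(1), F(3, 4), F(1, 2), F(0)]   # gene frequency among those gametes

N = len(q)
Q = sum(q) / N
Z = F(sum(z), N)
dq = [qs - qi for qs, qi in zip(q_star, q)]

cov_zq = sum(zi * qi for zi, qi in zip(z, q)) / N - Z * Q
selection_term = cov_zq / Z                                            # component (a)
transmission_term = sum(zi * di for zi, di in zip(z, dq)) / (N * Z)    # component (b)

# Direct calculation of the change in frequency, for comparison.
Q_star = sum(zi * qs for zi, qs in zip(z, q_star)) / sum(z)
dQ_direct = Q_star - Q

print(selection_term, transmission_term, dQ_direct)
```

In this example each component contributes 1/10, and their sum equals the directly calculated change of 1/5.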

In the 1970 paper Price said that he was working on an extension of this approach to the problem of group selection. The First Equation itself can be applied to groups rather than individuals, with only slight changes of definition and interpretation, provided the groups are all of the same size.

The main technical contribution of the 1972 paper is to show how the Equation can be generalised to groups of differing size.


As previously, I will avoid Greek letters and other special typography, which may not be compatible with some browsers. Where Price uses Greek capital sigma to indicate summation, I will use capital S. In the 1972 paper capital sigma always has a subscript i, which can be taken as implied here. Where Price uses Greek capital delta to indicate an increase or decrease in a quantity, I will use the letter d. Where Price uses a bar over a letter to indicate the mean (average) value of a variable, I will use the capital form of the letter; so, for example, the capital letter W will indicate the mean value of the variable w. Where Price uses subscripts, I will use square brackets, so for example where Price has w with a subscript i, I will use w[i]. It would be possible to shorten the formulae by omitting most ‘subscripts’, and making some other notational changes, but I have decided to stick as closely as possible to Price’s formulae to facilitate comparison. I have however replaced ‘primes’ by asterisks, as I found that when I first posted this the primes came out inconsistently (and difficult to read). The expression cov(x,y) stands for the covariance between x and y. (See the previous post for the meaning of ‘covariance’.) The numbers [A1], [A2], etc, are Price’s own numbered equations or definitions. Other numbers in brackets are my own.

Weighted averages and covariances

The 1972 paper uses the concepts of weighted average and weighted covariance. The concept of a weighted average (mean) is relatively familiar, and was treated in standard works on statistics available to Price (e.g. Yule and Kendall, 332). Consistently with the usual treatment, Price defines the weighted average of a variable x as:

[A1] ave[w]x = (Sw[i]x[i])/Sw[i]

where w[i] stands for a ‘weight’ or ‘weighting variable’. Price mentions two well-known properties of weighted averages: if all w[i] are equal, then the weighted average has the same value as the ordinary (unweighted) average; and if all w[i] are multiplied by the same constant, the value of the weighted average is unchanged. Another fairly obvious property is that provided all weights are positive (as is usually assumed), then the weighted average will fall somewhere within the range of the x values.

Weighted averages have various uses. For example they may be used to give different weight to data that differ in reliability or importance. Perhaps the most common use of weighted averages is to calculate an overall average when averages are known for subgroups of a population but not for the total population itself. If the sizes of the subgroups are used as the weighting factors, the resulting weighted average will be the same as the unweighted average for the total population. For example, if we know the average score of pupils on a test in the various schools of a city, but not the average in the city as a whole, the averages for each school may be weighted by the number of pupils taking the test to obtain the average for the city.
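
The school example can be sketched in a few lines. The scores are invented for illustration; the point is only that weighting the school averages by the number of pupils recovers the city average, while a plain average of the school averages does not.

```python
from fractions import Fraction

# Hypothetical test scores for three schools (illustrative data only).
schools = {
    "A": [50, 70],
    "B": [80, 90, 100, 70],
    "C": [60],
}

sizes = [len(scores) for scores in schools.values()]
means = [Fraction(sum(scores), len(scores)) for scores in schools.values()]

# [A1] with school sizes as the weights and school averages as the x values.
weighted_avg = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Ordinary average over every pupil in the city.
all_scores = [s for scores in schools.values() for s in scores]
city_avg = Fraction(sum(all_scores), len(all_scores))

# Unweighted average of the school averages, for contrast.
naive_avg = sum(means) / len(means)

print(weighted_avg, city_avg, naive_avg)
```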

Price states an important result about weighted averages, which he would have known from the textbooks (e.g. Yule and Kendall, 334):

[A7] ave[w]x = X + cov(w,x)/W.

It can be seen that the weighted average will be greater than the ordinary average if the covariance between the weights and the values of x is positive, and less than the ordinary average if it is negative.
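
A quick check of [A7] on hypothetical numbers (three items, with weights and values chosen arbitrarily):

```python
from fractions import Fraction

w = [2, 4, 1]       # hypothetical weights
x = [60, 85, 60]    # hypothetical values
N = len(x)

X = Fraction(sum(x), N)    # ordinary average of x
W = Fraction(sum(w), N)    # ordinary average of the weights
sum_wx = sum(wi * xi for wi, xi in zip(w, x))
cov_wx = Fraction(sum_wx, N) - W * X

ave_w_x = Fraction(sum_wx, sum(w))   # weighted average by [A1]
rhs_A7 = X + cov_wx / W              # right-hand side of [A7]

print(ave_w_x, rhs_A7, cov_wx)
```

The covariance between weights and values is positive here (the largest weight goes with the largest value), so the weighted average of 520/7 exceeds the ordinary average of 205/3.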

The concept of a weighted covariance is much less familiar than that of a weighted average, and on consulting several pre-1972 statistics textbooks I have not found it discussed. Price however treats it as an established concept (see also his more technical paper of 1971), and defines it as:

[A2] cov[w](x,y) = [Sw[i](x[i] – ave[w]x)(y[i] – ave[w]y)]/Sw[i].

This can be compared with the standard formula for an ordinary covariance, cov(x,y) = [S(x[i] – X)(y[i] – Y)]/N. It can be seen that in the weighted covariance the product terms are all weighted by their corresponding w[i]’s, the ordinary averages of x and y are replaced by their weighted averages, and Sw[i] replaces the population number N.

Price states the following equivalence, which is important in his subsequent derivation:

[A6] (Sw[i]x[i]y[i])/Sw[i] = cov[w](x,y) + (ave[w]x)(ave[w]y).

It can be seen that this corresponds to (Sx[i]y[i])/N = cov(x,y) + XY for ordinary covariances. Proving it from the definitions is just a tedious algebraic grind, which I will omit.
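
In place of the algebra, [A6] can at least be confirmed numerically. The sketch below implements [A1] and [A2] directly; the function names `wave` and `wcov` and the data are illustrative assumptions, not Price's notation. It also checks that with equal weights the weighted covariance reduces to the ordinary one.

```python
from fractions import Fraction

def wave(w, x):
    """Weighted average, following [A1]."""
    return Fraction(sum(wi * xi for wi, xi in zip(w, x)), sum(w))

def wcov(w, x, y):
    """Weighted covariance, following [A2]."""
    ax, ay = wave(w, x), wave(w, y)
    total = sum(wi * (xi - ax) * (yi - ay) for wi, xi, yi in zip(w, x, y))
    return Fraction(total, sum(w))

w = [2, 4, 1]    # hypothetical weights
x = [1, 3, 2]    # hypothetical x values
y = [5, 1, 4]    # hypothetical y values

# [A6]: (Sw[i]x[i]y[i])/Sw[i] = cov[w](x,y) + (ave[w]x)(ave[w]y)
lhs = Fraction(sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)), sum(w))
rhs = wcov(w, x, y) + wave(w, x) * wave(w, y)

# With equal weights, [A2] should give the ordinary covariance.
N = len(x)
ordinary_cov = (Fraction(sum(a * b for a, b in zip(x, y)), N)
                - Fraction(sum(x), N) * Fraction(sum(y), N))
equal_weight_cov = wcov([1, 1, 1], x, y)

print(lhs, rhs, ordinary_cov, equal_weight_cov)
```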

As in the case of weighted averages, the formula for weighted covariance can be applied to cases where the covariance is known for each of the subgroups of a population but not for the total population. But just weighting the covariance within groups by the size of the groups would not necessarily give an accurate covariance for the population as a whole. Covariance in the whole population is relative to the population averages, and these may be different from the group averages. It is quite possible for the population covariance to be greater or less than any of the group covariances, or even to be different in sign [see Note 1]. Suppose, for example, that the ‘groups’ are human males and females. It is likely that within each group the covariance between height and hair length is small but positive, because other things being equal taller individuals in each group are likely to have slightly longer hair, whereas in the total human population the covariance may be negative, because men tend to be taller but to have shorter hair than women. Part of the total population covariance thus only becomes apparent when calculated by reference to the average values for the population as a whole. The weighted covariance formula [A2] allows for this by using weighted averages for the whole population as the ‘baseline’ from which deviations are measured. It can be shown that when this is done, and group sizes are used for weighting, then the weighted covariance in accordance with [A2] is equal to the ordinary (unweighted) covariance for the population.

Price also defines weighted versions of variance, standard deviation, regression, and correlation, but I will not make use of these.

Further definitions

The main derivation in the paper begins with some further definitions. Price gives hints as to what his new terms are intended to designate, but it is probably best in the first instance to regard them as purely formal symbols.

The term p is used to represent the numerical value of some property of population members.

By definition:

[A8] dp[i] = p*[i] – p[i],

d(ave[w]p) = ave[w*]p* – ave[w]p.

Price then says that the term s will be called a ‘selection coefficient’, and by definition s[i] = w*[i]/w[i]. (In my notation there would be an ambiguity between S as a summation sign and S as the unweighted average of s, but the latter does not occur in the derivation.)

It might be thought that the definitions at [A8] are formally defective, as the terms d and * (‘prime’ in Price’s text) have not been previously defined in the paper, and one undefined term is therefore defined by reference to another. However, in the subsequent working the derivations are always carried out strictly in accordance with the definitions, and the result is an equation that is formally valid even though it contains terms that have not yet been fully interpreted. It thus provides a schema that allows valid inferences to be made whatever interpretations are given to the undefined terms.


As in the 1970 paper, after all the preliminary definitions and assumptions have been set out, the actual derivation of the main result is quite brief. I will set out the various steps more fully than the paper, as Price tends to combine three or four operations in each step, which saves space but leaves the reader to do much of the work.

It will first be convenient to note some equivalences which follow easily from the definitions:

(1) p*[i] = p[i] + dp[i] (from [A8])

(2) w[i]s[i] = w*[i] (from the definition of s[i])

(3) Sw[i]s[i] = Sw[i](ave[w]s) (from [A1])

(4) ave[w*]p* = ave[w]p + d(ave[w]p) (from [A8]).

Now, substituting w* and p* for w and x in [A1], we have:

(5) ave[w*]p* = (Sw*[i]p*[i])/Sw*[i].

Using the equivalence (1) we can substitute p[i] + dp[i] for p*[i], giving:

(6) ave[w*]p* = [Sw*[i](p[i] + dp[i])]/Sw*[i].

Following the usual rules for summation we can split the expression on the right into two parts, giving:

(7) ave[w*]p* = (Sw*[i]p[i])/Sw*[i] + (Sw*[i]dp[i])/Sw*[i].

Then substituting w[i]s[i] for w*[i] in the first term on the right, as permitted by (2), we have:

[A10] ave[w*]p* = (Sw[i]s[i]p[i])/(Sw[i]s[i]) + (Sw*[i]dp[i])/Sw*[i].

Using the equivalence (3) we can substitute Sw[i](ave[w]s) for (Sw[i]s[i]) in the denominator of the first term on the right, giving:

(8) ave[w*]p* = (Sw[i]s[i]p[i])/[(Sw[i])(ave[w]s)] + (Sw*[i]dp[i])/Sw*[i].

But by [A6], substituting s and p for x and y, we have:

(9) (Sw[i]s[i]p[i])/Sw[i] = cov[w](s,p) + (ave[w]s)(ave[w]p).

We can therefore substitute cov[w](s,p) + (ave[w]s)(ave[w]p) into (8), getting for the first term on the right:

(10) [cov[w](s,p) + (ave[w]s)(ave[w]p)]/(ave[w]s),

which can be simplified to:

(11) cov[w](s,p)/ave[w]s + ave[w]p.

Equation [A10] as a whole is therefore equivalent to:

(12) ave[w*]p* = cov[w](s,p)/ave[w]s + ave[w]p + (Sw*[i]dp[i])/Sw*[i].

By [A1] the final term on the right is equivalent to ave[w*]dp, so we also have:

(13) ave[w*]p* = cov[w](s,p)/ave[w]s + ave[w]p + ave[w*]dp.

But by (4) we have ave[w*]p* = ave[w]p + d(ave[w]p), so we can subtract ave[w]p from both sides of (13) to get:

[A11] d(ave[w]p) = cov[w](s,p)/ave[w]s + ave[w*]dp.

This is the basic form of Price’s Second Equation. Price also proposes a simplification by defining a ‘relative selection coefficient’, s[i]/ave[w]s. He designates this by s with a tilde above it, which I will represent by [s~]. With this convention [A11] can be presented as:

[A12] d(ave[w]p) = cov[w]([s~],p) + ave[w*]dp.

The Second Equation is as yet just a formal identity, with no specific interpretation. As Price emphasises, it is an identity for all values of w[i], p[i], w*[i], and p*[i]. The equation will therefore be valid whatever values or symbols are substituted for these terms.
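
Since the claim is that [A11] and [A12] hold for any values whatever, a mechanical check on arbitrary numbers is a reasonable sanity test. The sketch below draws random integers (positive for the weights, so that the divisions are defined) and compares both sides exactly. The helper names are illustrative assumptions.

```python
from fractions import Fraction
import random

def wave(w, x):
    """Weighted average, following [A1]."""
    return Fraction(sum(wi * xi for wi, xi in zip(w, x)), sum(w))

def wcov(w, x, y):
    """Weighted covariance, following [A2]."""
    ax, ay = wave(w, x), wave(w, y)
    total = sum(wi * (xi - ax) * (yi - ay) for wi, xi, yi in zip(w, x, y))
    return Fraction(total, sum(w))

rng = random.Random(0)
results = []
for _ in range(20):
    n = 5
    w = [rng.randint(1, 9) for _ in range(n)]        # weights must be non-zero
    w_star = [rng.randint(1, 9) for _ in range(n)]
    p = [rng.randint(-5, 5) for _ in range(n)]       # p values are unrestricted
    p_star = [rng.randint(-5, 5) for _ in range(n)]

    s = [Fraction(ws, wi) for ws, wi in zip(w_star, w)]    # s[i] = w*[i]/w[i]
    dp = [ps - pi for ps, pi in zip(p_star, p)]            # [A8]

    lhs = wave(w_star, p_star) - wave(w, p)                # d(ave[w]p)
    rhs_A11 = wcov(w, s, p) / wave(w, s) + wave(w_star, dp)
    s_rel = [si / wave(w, s) for si in s]                  # relative selection coefficient
    rhs_A12 = wcov(w, s_rel, p) + wave(w_star, dp)
    results.append(lhs == rhs_A11 == rhs_A12)

print(all(results))
```

A numerical check of this kind does not prove the identity, but it does guard against slips in transcribing the formulae.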

Application to group selection

The main application which Price himself makes of the equation is to the issue of group selection. He supposes that a large population is subdivided into a number of subpopulations or groups. For simplicity he assumes that generations are non-overlapping and that there is no interbreeding or migration between groups. (He also tacitly assumes sexual reproduction, though it would be easy to adapt the model to cover asexual or self-fertilising reproduction.) He then makes certain definitions:

n[i] is the population of group G[i] in the parent generation.

n*[i] is the population of group G[i] in the offspring generation.

s[i] = n*[i]/n[i]. Price notes that this is the mean number of offspring per parent generation member of group G[i], when each parent is given credit for half of each offspring conceived (assuming sexual reproduction). Without this proviso the mean number of offspring per member would be overstated, since each offspring would be counted twice, once for each parent.

p[i] is the frequency of gene A in group G[i] in the parent generation.

p*[i] is the frequency of gene A in group G[i] in the offspring generation. Price does not specify whether p[i] and p*[i] are frequencies of gene A at a particular locus or among all genes in the genome, but this does not matter provided the same approach is taken for each generation.

P is the frequency of gene A in the total population of the parent generation.

P* is the frequency of gene A in the total population of the offspring generation.

dP = P* – P (by definition).

Price notes that ‘it can easily be seen’ that P = ave[n]p[i], and P* = ave[n*]p*[i]. This does indeed follow from the definition of weighted average. It further follows from the definition of d(ave[w]p) in [A8] (substituting n for w) that:

dP = d(ave[n]p).

The final step is to note that n and n* can be substituted for w and w* respectively in all the previous formal derivations, up to [A12], which is possible because s[i] (in its ‘group’ interpretation) is defined as n*/n, corresponding to w*/w in the previous working. Equation [A11] can therefore be validly restated in the form:

dP = d(ave[n]p) = cov[n](s,p)/ave[n]s + ave[n*]dp

and equation [A12] in the form:

[A15] dP = d(ave[n]p) = cov[n]([s~],p) + ave[n*]dp.

That completes the main part of the derivation. It is of course a very roundabout way of reaching a result on group selection, and later authors have generally gone straight to an interpretation in terms of groups and gene frequencies. Price’s approach does however have the advantage that his equations [A1 - A12] are extremely general, and can in principle be applied to many other problems.
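
The group version can be illustrated with three hypothetical groups. For simplicity the sketch treats each group's p[i] as a given frequency and does not model the mating system; the group sizes and frequencies are invented, and the helper names are illustrative assumptions.

```python
from fractions import Fraction as F

def wave(w, x):
    """Weighted average, following [A1]."""
    return F(sum(wi * xi for wi, xi in zip(w, x)), sum(w))

def wcov(w, x, y):
    """Weighted covariance, following [A2]."""
    ax, ay = wave(w, x), wave(w, y)
    return F(sum(wi * (xi - ax) * (yi - ay) for wi, xi, yi in zip(w, x, y)), sum(w))

# Hypothetical groups: sizes and gene frequencies in parent and offspring generations.
n      = [10, 20, 30]
n_star = [20, 20, 15]
p      = [F(3, 10), F(1, 2), F(1, 5)]
p_star = [F(2, 5), F(1, 2), F(1, 5)]

s = [F(ns, ni) for ns, ni in zip(n_star, n)]     # s[i] = n*[i]/n[i]
dp = [ps - pi for ps, pi in zip(p_star, p)]

P = wave(n, p)                  # frequency in the whole parent population
P_star = wave(n_star, p_star)   # frequency in the whole offspring population
dP_direct = P_star - P

between_groups = wcov(n, s, p) / wave(n, s)   # first term of [A11]
within_groups = wave(n_star, dp)              # second term of [A11]

print(dP_direct, between_groups, within_groups)
```

In this example the first group doubles in size and also increases its internal frequency, so both terms are positive: 19/660 from differential group growth and 2/55 from change within groups, summing to 43/660.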

Price goes on to discuss the interpretation of group selection in the light of the Second Equation. I will reserve comments on this for another post. However, while the technical details are still fresh on the page, it may be useful to look at some issues of terminology.

Price himself refers to the term cov[n](s,p)/ave[n]s as a covariance or weighted covariance, and later authors have usually called it the Covariance term. It is however an odd kind of covariance, and there might be doubts whether it has all the statistical properties of an ordinary (unweighted) covariance, notably for the purposes of correlation and regression. It involves a combination of group and individual properties: summation is ostensibly over the groups, as the [i]’s are index numbers for groups, but the averages from which deviations are measured are the averages for the total population, and the product sum is divided by Sn[i], which is the total population number, not the number of groups.

I think any doubts can be alleviated by keeping in mind that the weighted covariance, when the weights are the numbers in the groups, is equal to the ordinary covariance for the total population, and has all the usual properties of a covariance in that context. One may still balk at claims of conceptual ‘transparency’. I suggest that purely for the purpose of thinking about group selection it might be better to assume that group sizes are initially equal, so that the covariance term does not need to be weighted at all. The covariance term can then be interpreted simply as a covariance between properties of groups.

The second term, ave[n*]dp, also raises a problem. It has become customary to call it the Expectation term, though this terminology was not used by Price himself and seems to have been introduced by Hamilton in 1975 [Hamilton, 332]. The problem is that ‘Expectation’ or ‘Expected value’ has a traditional meaning in probability theory (see e.g. von Mises, 148; Hacking, 80). In this usual sense, Expectation involves values (utilities) multiplied by probabilities, whereas Price’s term has nothing to do with either utilities or probabilities: it is simply a weighted average of the changes in frequency within groups. It might be better to call it the Transmission term, since it measures the fidelity of transmission of a property between generations. There are some other problems in interpreting it, but I will discuss these in a further post.

Note 1

I have not seen this stated, but it can be shown that if a population is divided into groups, then the covariance between traits in the whole population equals the weighted mean of the covariance within groups, plus the weighted covariance between the group means, using group sizes as the weights. (This corresponds to the better known additive property of within-group and between-group variances.) If the covariance between group means is negative, this may be enough to offset or reverse a positive covariance within groups, or vice versa.
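The decomposition can be checked numerically. The following is a minimal Python sketch using made-up data for two groups of unequal size; the numbers and the helper names mean and cov are my own illustrations, not taken from any source. Exact fractions are used so that the two sides can be compared without rounding error.

```python
from fractions import Fraction as F

def mean(values):
    return sum(values, F(0)) / len(values)

def cov(x, y):
    """Ordinary (unweighted) covariance: S(x[i] - X)(y[i] - Y)/n."""
    mx, my = mean(x), mean(y)
    return sum(((a - mx) * (b - my) for a, b in zip(x, y)), F(0)) / len(x)

# Hypothetical data: each group is a list of (x, y) pairs for its members.
groups = [
    [(1, 2), (2, 4), (3, 3)],
    [(6, 1), (8, 2), (7, 5), (9, 4), (10, 3)],
]
groups = [[(F(a), F(b)) for a, b in g] for g in groups]

all_x = [a for g in groups for a, _ in g]
all_y = [b for g in groups for _, b in g]
total = len(all_x)
X, Y = mean(all_x), mean(all_y)

# Weighted mean of the within-group covariances (weights = group sizes).
within = sum(
    len(g) * cov([a for a, _ in g], [b for _, b in g]) for g in groups
) / total

# Weighted covariance between the group means, with deviations measured
# from the means of the whole population (weights = group sizes).
between = sum(
    len(g) * (mean([a for a, _ in g]) - X) * (mean([b for _, b in g]) - Y)
    for g in groups
) / total

whole = cov(all_x, all_y)
print("whole population:", whole)
print("within + between:", within + between)
```

In this data set the within-group covariances and the between-group covariance need not have the same sign, which is the situation described in the last sentence of the note.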


Works by George Price

G. R. Price, ‘Selection and covariance’, Nature, 227, 1970, 520-21.

G. R. Price, ‘Extension of the Hardy-Weinberg law to assortative mating’, Annals of Human Genetics, 34, 1971, 455-58.

G. R. Price, ‘Extension of covariance selection mathematics’, Annals of Human Genetics, 35, 1972, 485-90.

Other works

Steven A. Frank, ‘George Price’s contributions to evolutionary genetics’, Journal of Theoretical Biology, 175, 1995, 373-88.

Ian Hacking: An Introduction to Probability and Inductive Logic, 2001.

W. D. Hamilton: Narrow Roads of Gene Land, vol. 1, 1996.

Richard von Mises: Probability, Statistics and Truth, Dover edn., 1981.

G. Udny Yule and M. Kendall: An Introduction to the Theory of Statistics, 14th edn., 1950.

(Republished from by permission of author or representative)
• Category: Science 

In a recent post I said I would write again about the work of George Price. My main aim will be to help people read Price’s own papers. There are various ‘Price guides’ in existence, some of which are very good in their way, but I have not seen any that follow Price’s own treatment at all closely. My own old post on ‘Price’s Equation’ was based mainly on an account by W. D. Hamilton, which differs in several ways from Price’s approach. [Added: a reader has drawn my attention to a recent paper by M. van Veelen. I have now seen this, and it does follow Price's treatment quite closely. See comments for the reference. The published version of the paper gives more detail than the preprint which I mention in the comments.]

Price published only six biological papers in his lifetime, two of them co-authored (for details see References). Two of the six items (nos. 2 and 5) are on relatively technical issues. The paper co-authored with John Maynard Smith on evolutionary game theory is more general, but self-explanatory. The remaining paper, on Fisher’s Fundamental Theorem, is also important, but I would need to consider it in conjunction with Fisher’s own writings on the subject. Maybe some other time.

That leaves the 1970 paper ‘Selection and covariance’ and the 1972 paper ‘Extension of covariance selection mathematics’. Here I will deal with the 1970 paper, which is currently available online here. The next post will cover the 1972 paper. A third post will contain some assessment and criticism. One of the main claims made for Price’s work is that it provides a clear way of analysing the effects of ‘group’ and ‘individual’ selection. I think this claim needs to be qualified, not only for the reasons indicated in my old post on ‘Price’s Equation’, but for a further reason which only struck me recently when working through Price’s derivations using some numerical examples.

The main result of the 1970 paper, in notation which will be explained shortly, is an equation stating that:

dQ = Cov(z,q)/Z + Sz[i]dq[i]/NZ

In words, this can be loosely translated as:

The change in frequency (if any) of a gene from one generation to the next is composed of two elements: (a) the effect of the gene on the reproductive success of the parent generation, as measured by the connection between possession of the gene and the number of offspring; and (b) the change in frequency of the gene between individual parents and their own offspring.

As presented in words, the equation may seem almost a platitude. Of course the change in frequency depends on these two factors: what else could it depend on? But even in these loose verbal terms, the result has some value in focusing on the two distinct, and possibly conflicting, elements.

The mathematical equation, which uses the precise statistical concept of covariance, is more important than the verbal formulation. Price himself considered it literally miraculous that someone like himself, with no training in statistics and genetics, who previously ‘did not know a covariance from a coconut’, should have discovered such a fundamental relationship (Hamilton 322-3). The equation had in fact been partially anticipated by other writers, but it was not widely known. C. A. B. Smith, an experienced geneticist and statistician, told Price ‘he had never seen anything like it before’ (Harman 210). [Added: W. D. Hamilton has said that 'Central to Price's approach was a covariance formula the like of which I had never seen... The formula was easily checked to be correct, yet the approach by which it came obviously owed nothing to any previous account of selection theory I knew of.' (Hamilton, 172)] The Editor of Nature initially rejected Price’s paper as ‘too hard to understand’ (Harman 224). As late as 1997 John Maynard Smith still claimed, not entirely in jest, that he ‘did not understand Price’s Theorem’.

Whether favourable or not, these reactions suggest that Price’s work is somehow obscure and mysterious. I think this is exaggerated. While it is not trivial, it is not exceptionally obscure either.


I shall stick as closely as possible to Price’s own notation, to facilitate comparison with his paper, but for practical reasons will avoid Greek letters and other special typography. Where Price uses capital Greek sigma to indicate summation, I will use capital S. Where Price uses capital Greek delta to indicate an increase or decrease in a quantity, I will use the letter d. Where Price uses a bar over a letter to indicate the mean (average) value of a variable, I will use the capital form of the letter; so, for example, the capital letter Z will indicate the mean value of the variable z. Where Price uses subscripts, I will use square brackets, so for example where Price has n with a subscript g, I will use n[g]. A forest of subscripts makes formulae more difficult to read, and subscripts can often be omitted without danger of ambiguity. However, for consistency with Price’s paper I have decided to retain them. [Added September 21: I have now substituted asterisks for 'primes', which in the WordPress print format were very difficult to read.]

Statistical preliminaries

Price assumes some elementary knowledge of statistics, including the concepts of mean and covariance. The mean of a set of quantities is simply the sum of their values divided by the number of items. The covariance is defined as follows. Suppose there are two variables, x and y, with n values (not necessarily different) for each variable, and each value of x is paired with a value of y in some definite way, for example because they are measurements of different properties of the same individual. Let the values of each variable be indexed with the numbers [i] = 1, 2, … n in such a way that paired values have the same index number. By definition the covariance between the variables for that pairing is Cov(x,y) = S(x[i] – X)(y[i] – Y)/n. The intended scope of the summation sign S is the whole of the expression following it, and a term of the sum is to be formed for each pair of x and y values with the same index number. The value of the covariance will of course depend on the particular pairing of values under consideration; if they were paired in some other way the covariance might be quite different.

An important result used by Price is that Cov(x,y) = Sx[i]y[i]/n – XY. This was well-known in the statistical literature available to Price, but I have given a derivation in Note 1.
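As a small illustration, the definition and the shortcut formula can be compared directly. This is only a sketch with invented numbers; the function names cov_definition and cov_shortcut are my own labels. It also shows that the covariance depends on the pairing: reversing the order of the y values, while leaving the values themselves unchanged, changes the result.

```python
from fractions import Fraction as F

def mean(values):
    return sum(values, F(0)) / len(values)

def cov_definition(x, y):
    """Cov(x,y) = S(x[i] - X)(y[i] - Y)/n"""
    X, Y = mean(x), mean(y)
    return sum((a - X) * (b - Y) for a, b in zip(x, y)) / len(x)

def cov_shortcut(x, y):
    """Cov(x,y) = Sx[i]y[i]/n - XY"""
    return sum(a * b for a, b in zip(x, y)) / len(x) - mean(x) * mean(y)

# Hypothetical paired values, indexed so that x[i] goes with y[i].
x = [F(v) for v in (1, 2, 3, 4, 5)]
y = [F(v) for v in (2, 1, 4, 3, 6)]

# The same y values paired with x in the opposite order.
y_reversed = list(reversed(y))

print(cov_definition(x, y), cov_shortcut(x, y))                     # 2 2
print(cov_definition(x, y_reversed), cov_shortcut(x, y_reversed))   # -2 -2
```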

The covariance between two variables can be regarded as a measure of the closeness of the relationship between them. For given sets of values, if high values of x are paired with high values of y, and low values with low values, then the covariance will be relatively large. If values are paired at random, then the covariance will be close to zero, while if high values are paired with low values, then the covariance will be negative. The numerical value of the covariance is however affected by differences of scale: for example it will be larger if linear measurements are expressed in inches rather than feet. For this reason statisticians usually prefer a standardised measure, such as a correlation coefficient, which is not affected by changes of scale.
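The point about scale can be seen in a short sketch. The measurements below are invented, and the helper names are mine. Converting both variables from feet to inches multiplies the covariance by 12 x 12 = 144, while the correlation coefficient is unchanged.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cov(x, y):
    X, Y = mean(x), mean(y)
    return sum((a - X) * (b - Y) for a, b in zip(x, y)) / len(x)

def corr(x, y):
    # Correlation = covariance divided by the product of the standard deviations.
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))

# Hypothetical paired linear measurements, in feet.
feet_x = [5.0, 5.5, 6.0, 6.5]
feet_y = [5.2, 5.4, 6.1, 6.3]

# The same measurements expressed in inches.
inch_x = [12 * v for v in feet_x]
inch_y = [12 * v for v in feet_y]

print("covariance ratio:", cov(inch_x, inch_y) / cov(feet_x, feet_y))
print("correlations:", corr(feet_x, feet_y), corr(inch_x, inch_y))
```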

Assumptions and definitions

For simplicity Price assumes that a population of a species has separate generations. (Towards the end of his paper he indicates ways in which this assumption might be relaxed.) In comparing the number of individuals in different generations it is important to count them at corresponding points in their life cycle. Price does not discuss this at length, but his comments imply that the ‘offspring’ of a set of parents include all ‘zygotes conceived’. This suggests that the count of each generation should be taken immediately after conception. This makes good sense in the context of Price’s First Equation, because it makes a sharp distinction between the effect of a gene on the fitness of the individual organism and any changes in gene frequency occurring solely during the transmission of genes from parents to offspring. These two types of effect can therefore be divided neatly between the two components in the right hand side of the equation. If the count were taken at some later stage, say at the hatching of eggs, then the effect of a gene on the fitness of individuals would be smeared between the two components. (Contrast this with the current Wikipedia account, which seems to assume that the ‘count’ of the second generation is taken after a process of selection on the offspring. There is no basis for this in Price’s own treatment, and it conflicts with his statement that ‘if meiosis and fertilization are random with respect to gene A, the summation term at the right will be zero except for statistical sampling effects…’, which would not be the case if the gene undergoes selection in the offspring before the second generation is counted.)

P[1] and P[2] are two generations of a species. P[1] contains all parents of P[2] members, and P[2] consists of all offspring of P[1] members. Note the distinction between ‘contains’ and ‘consists’. It is common for some individuals in one generation to have no offspring, and, as later becomes clear, it is Price’s intention to include such individuals in P[1]. It would in principle be possible to count P[1] only as a generation of actual parents, but if this were the case then the equation would exclude much of the selective effect of a gene.

N is the number of members of P[1]. They are to be labelled with index numbers i = 1, 2, …, N.

It is assumed that each individual has the same number of genes (zygotic ploidy) at a given locus. (Price later indicates how this assumption might be modified for more complicated cases, such as X and Y chromosomes.) The zygotic ploidy of the species for the gene of interest will be called n[z] (where z presumably stands for ‘zygotic’). This n must not be confused with the population number N.

The number of genes of a particular kind, A, in individual i, will be called g[i]. For a haploid locus g[i] must be 0 or 1, for a diploid locus it must be 0, 1, or 2, and so on.

The frequency of genes of type A, in individual i, will be called q[i]. By ‘frequency’ Price means the proportion of A genes at the relevant locus. From the previous definitions this must be q[i] = g[i]/n[z]. For a haploid the frequency q[i] can only be 0 or 1; for diploids it could be 0, 0.5, or 1, and so on.

In what follows, summation is always over the N members of P[1], even when properties of P[2] are under consideration. (This is one of the distinctive features of Price’s approach, which at first may make it difficult to follow.) When variables with an [i] subscript are to be multiplied together, the intention is that the i’th member of one set is to be multiplied by the i’th member of the other.

Q[1] is the overall frequency of gene A in population P[1]. Since this is the total number of A genes as a proportion of all genes at the relevant locus, it is Sg[i]/n[z]N, where summation is taken over all the individuals in P[1]. From the definition of g[i] and q[i] this also equals Sn[z]q[i]/n[z]N. In this expression n[z] is a constant and can be cancelled out, leaving Sq[i]/N. But from the definition of a mean this is the mean value of q[i] for population P[1]. In Price’s notation this is q with a bar over it, and in my notation Q. We therefore have Q[1] = Q.

We turn now to the following generation, P[2]. A gamete from a member of P[1] which contributes genes to a member of P[2] is termed a ‘successful gamete’. The number of successful gametes produced by individual i is designated z[i]. (The reason for choosing the letter z is not clear. It could stand for ‘zygote’, or ‘zygote conceived’.) This is said by Price to be the same as the number of i’s offspring. In the special case of self-fertilisation this is not true, in ordinary usage, since there are then two successful gametes but only one offspring, but this oversight (if it is one) does not affect the validity of Price’s derivation. It is assumed that the gametes contributing to P[2] all have the same ploidy at the locus of gene A, which will be designated by n[g] (where g presumably stands for ‘gametic’). Gametes are usually haploid, but diploid, triploid, etc, gametes are possible. The number of A genes among all of i’s successful gametes is designated g*[i] (note the asterisk after g; asterisks are used here to designate quantities in P[2] compared with P[1]). Since individual i has z[i] successful gametes, with a ploidy of n[g], the total number of genes at the relevant locus among all of i’s successful gametes is z[i]n[g]. The frequency (proportion) of A genes at the relevant locus among all of i’s successful gametes is designated q*[i]. In cases where the number of i’s successful gametes is not zero, the frequency q*[i] is evidently g*[i]/z[i]n[g]. When z[i] is zero, the concept of frequency appears not to apply, and the formula g*[i]/z[i]n[g] would require division by zero, which is invalid. However, for further working Price wants q*[i] to be defined even when z[i] is zero, and he stipulates that in this case, q*[i] = q[i]. This seemingly strange stipulation turns out to be workable in the further stages of the derivation.

I note in passing that Price’s use of the letter ‘g’ to refer both to ‘number of genes’ and to ‘gametic ploidy’, and his use of the letter ‘z’ to refer both to ‘zygotic ploidy’ and to the ‘number of i’s successful gametes’, is unfortunate. While in their context these uses can always be distinguished, one would think that with 26 letters to choose from Price could have found clearer alternatives. The confusion is compounded when several commentators, following Steven A. Frank, use the letter ‘z’ for an entirely different purpose, and in some cases (though not that of Frank himself) assert that this was Price’s usage.

As a final group of definitions, dq[i] = q*[i] – q[i], the change in frequency of A between a parent and its own offspring; Q[2] is the frequency of gene A in population P[2]; and dQ = Q[2] – Q[1], the change in the overall frequency of gene A between the two populations.

As a summary of all these definitions and equivalences, we have:

P[1]: the first generation, containing all parents of P[2], but also including any individuals with no offspring.
P[2]: all offspring of members of P[1].
N: the number of members of P[1].
i: an index number of a member of P[1].
Gene A: the particular gene type we are interested in.
n[z]: the number of genes (zygotic ploidy) at the locus of gene A in a member of P[1].
g[i]: the number of genes of type A in individual i.
q[i]: the number of genes of type A as a proportion of all genes at the relevant locus in individual i.
q[i] = g[i]/n[z].
g[i] = n[z]q[i].
Q: the mean value of q[i], by definition equal to Sq[i]/N.
Q[1]: the overall frequency of gene A at the relevant locus in P[1].
Q[1] = Sg[i]/n[z]N = Sn[z]q[i]/n[z]N = Sq[i]/N = Q.
Successful gamete: a gamete from P[1] which contributes genes to a member of P[2].
z[i]: the number of successful gametes produced by individual i.
n[g]: the number of genes (gametic ploidy) at the locus of gene A in a gamete.
g*[i]: the number of A genes among all of i’s successful gametes.
z[i]n[g]: the total number of genes at the locus of gene A among all of i’s successful gametes.
q*[i]: the number of genes of type A as a proportion of all genes at the relevant locus among all of i’s successful gametes.
q*[i] = g*[i]/z[i]n[g], when z[i] is not zero. When it is zero, by stipulation q*[i] = q[i].
dq[i]: the change, if any, in the frequency of gene A between individual i and i’s successful gametes.
dq[i] by definition = q*[i] – q[i].
Q[2]: the frequency of gene A at the relevant locus in population P[2].
dQ by definition = Q[2] – Q[1].
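To make the definitions concrete, here is a minimal Python sketch for an invented population of four diploid parents with haploid gametes. The numbers are hypothetical and the variable names (n_z, g_star and so on) are simply my transcription of the notation above. Note how the stipulation q*[i] = q[i] is applied to the parent with no successful gametes.

```python
from fractions import Fraction as F

n_z = 2  # zygotic ploidy (diploid parents)
n_g = 1  # gametic ploidy (haploid gametes)

# Hypothetical data for the N = 4 members of P[1].
g      = [2, 1, 1, 0]  # g[i]:  number of A genes in individual i
z      = [3, 2, 0, 1]  # z[i]:  number of i's successful gametes
g_star = [3, 2, 0, 0]  # g*[i]: number of A genes among i's successful gametes
N = len(g)

# q[i] = g[i]/n[z]
q = [F(gi, n_z) for gi in g]

# q*[i] = g*[i]/z[i]n[g] when z[i] is not zero; otherwise, by stipulation, q[i].
q_star = [F(gs, zi * n_g) if zi else qi for gs, zi, qi in zip(g_star, z, q)]

# dq[i] = q*[i] - q[i]
dq = [qs - qi for qs, qi in zip(q_star, q)]

# Q[1] = Sg[i]/n[z]N, which equals the mean of the q[i].
Q1 = F(sum(g), n_z * N)

# Q[2] = Sg*[i]/Sz[i]n[g]
Q2 = F(sum(g_star), sum(z) * n_g)

print("q  =", q)
print("q* =", q_star)
print("dq =", dq)
print("Q[1] =", Q1, " Q[2] =", Q2, " dQ =", Q2 - Q1)
```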


After all the preliminaries, Price’s actual derivation of his equation is almost startlingly brief. [Added: Hamilton (p.172) describes it as 'like a rabbit from a conjuror's hat'.] The underlying aim is to find an expression for the change in the frequency of gene A between P[1] and P[2], in other words, to find dQ, using information about the number of offspring and gene frequencies of individual members of P[1]. We already have an expression for Q[1], which is Sq[i]/N = Q (the mean value of q[i]). If we can find a suitable expression for Q[2], dQ will therefore be Q[2] – Q.

From the definitions already laid down, Q[2] is the proportion of A genes among all genes at the relevant locus in P[2]. This is given by the number of all A genes in P[2] divided by the number of all genes at the locus: Q[2] = Sg*[i]/Sz[i]n[g] = Sz[i]n[g]q*[i]/Sz[i]n[g]. (It may be noted that in cases where z[i] is zero, the product z[i]q*[i] is also zero, so these products add nothing to the total, which is why Price’s definition q*[i] = q[i] for these cases is workable.) Since n[g] is a constant it may be cancelled out, giving:

Q[2] = Sz[i]q*[i]/Sz[i] = Sz[i]q*[i]/NZ.

This is relatively simple, but not obviously useful. Price’s next step is more ingenious, and is the key to the whole derivation. Recalling that by definition dq[i] = q*[i] – q[i], and therefore q*[i] = q[i] + dq[i], we can substitute q[i] + dq[i] for q*[i] in the above equation, which gives us Q[2] = Sz[i](q[i] + dq[i])/NZ. Using the standard rules of summation this may be rearranged as:

Q[2] = Sz[i]q[i]/NZ + Sz[i]dq[i]/NZ.

Price’s next piece of ingenuity is to notice that the first term on the right is related to the covariance between z and q. By one of the standard formulae for covariance (see above and Note 1), Cov(z,q) = Sz[i]q[i]/N – ZQ, therefore Sz[i]q[i]/N = Cov(z,q) + ZQ, and Sz[i]q[i]/NZ = Cov(z,q)/Z + Q.

We therefore obtain:

Q[2] = Cov(z,q)/Z + Q + Sz[i]dq[i]/NZ.

But Q[1] = Q, and dQ = Q[2] – Q[1], therefore:

dQ = Cov(z,q)/Z + Sz[i]dq[i]/NZ.

This is Price’s First Equation. It is often expressed in the form ZdQ = Cov(z,q) + Sz[i]dq[i]/N, but so far as I know Price himself did not use this form.
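The equation can be checked against a direct count. The sketch below uses invented values of z[i], q[i] and q*[i] for four parents; the function name price_terms is my own. It computes the two terms on the right hand side and compares their sum with Q[2] – Q[1] calculated directly from the definitions. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction as F

def price_terms(z, q, q_star):
    """Return (Cov(z,q)/Z, Sz[i]dq[i]/NZ) for the parents in P[1]."""
    N = len(z)
    Z = F(sum(z), N)   # mean number of successful gametes
    Q = sum(q) / N     # mean of q[i], equal to Q[1]
    cov_zq = sum(zi * qi for zi, qi in zip(z, q)) / N - Z * Q
    covariance_term = cov_zq / Z
    second_term = sum(
        zi * (qs - qi) for zi, qs, qi in zip(z, q_star, q)
    ) / (N * Z)
    return covariance_term, second_term

# Hypothetical data. The third parent has no successful gametes,
# so q*[i] is set equal to q[i] as Price stipulates.
z      = [3, 2, 0, 1]
q      = [F(1), F(1, 2), F(1, 2), F(0)]
q_star = [F(1), F(1),    F(1, 2), F(0)]

covariance_term, second_term = price_terms(z, q, q_star)

# Direct calculation from the definitions of Q[1] and Q[2].
Q1 = sum(q) / len(q)
Q2 = sum(zi * qs for zi, qs in zip(z, q_star)) / sum(z)

print("Cov(z,q)/Z      =", covariance_term)   # 1/6
print("Sz[i]dq[i]/NZ   =", second_term)       # 1/6
print("Q[2] - Q[1]     =", Q2 - Q1)           # 1/3
```

Changing the stipulated q*[i] for the childless parent to any other value leaves the result unaffected, since that parent’s z[i] is zero in every product where q*[i] appears.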

The ‘covariance’ term on the right hand side is a measure of the relationship between the fitness of individuals in the parent generation and their possession of one or more copies of the A gene. If the A gene has no effect on fitness, the covariance will be zero (apart from statistical fluctuations), while if it has a strong beneficial effect the covariance will be relatively high, and if the effect is adverse the covariance will be negative. As fitness is measured by the individual’s number of ‘successful gametes’, anything that affects this, such as differential survival, mating success, or fecundity, will be taken account of in the covariance term. In so far as the genotype of the parent (not the gamete) affects the fate of sperms and eggs before fertilisation, this will also count in the covariance term.

The second term on the right hand side measures the change in frequency, if any, of the A gene between individual parents and their offspring. (Individuals who have no offspring do not contribute to the second term, as in this case z[i] = zero.) If the count of offspring (or successful gametes) is taken immediately after fertilization, as Price seems to envisage, the effects of gene A on the fitness of the offspring themselves will not affect the second term. Changes in gene frequency between parents and offspring may occur purely by random sampling effects, in which case they will tend to cancel out in large populations. The other main factors would be any tendency for some genes to get into more gametes than others (segregation distortion or meiotic drive), any effect of the gametic genes on the viability or mating success of the gametes themselves, and gene mutation. The effects of these factors (other than chance) are usually small. Gene mutation is a rare event, and the mechanisms of heredity seem to actively work against any distortion of gene ratios in the formation of gametes (see Chapter 7 of Mark Ridley’s Mendel’s Demon). And, in animals at least, the genes of the gamete itself are in general inactive until after fertilization. (I do not know the position in plants.) For these reasons in practice the second term will often be small, and Price therefore gives as an approximation the following form:

dQ = Cov(z,q)/Z.

As Price points out, this equation can also be expressed in terms of regression or correlation coefficients, as Cov(z,q) = Reg(z,q)Var(q) = Corr(z,q)sd(z)sd(q), where ‘Reg’ stands for Regression, ‘Corr’ stands for Correlation, ‘Var’ stands for Variance, and ‘sd’ stands for standard deviation.
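A brief numerical sketch of these equivalent forms, with invented data and my own variable names. The regression coefficient is computed from the usual least-squares formula for the slope of z on q, and variances and standard deviations are taken with divisor N, to match the definition of covariance used here.

```python
import math

# Hypothetical data for four parents.
z = [3, 2, 0, 1]            # z[i]: number of successful gametes
q = [1.0, 0.5, 0.5, 0.0]    # q[i]: frequency of gene A in parent i
N = len(z)

Z = sum(z) / N
Q = sum(q) / N
cov_zq = sum(zi * qi for zi, qi in zip(z, q)) / N - Z * Q
var_q = sum(qi * qi for qi in q) / N - Q * Q
var_z = sum(zi * zi for zi in z) / N - Z * Z

# Least-squares slope of z on q (textbook formula for simple regression).
sum_q, sum_z = sum(q), sum(z)
sum_qz = sum(zi * qi for zi, qi in zip(z, q))
sum_qq = sum(qi * qi for qi in q)
reg_zq = (N * sum_qz - sum_q * sum_z) / (N * sum_qq - sum_q ** 2)

# Correlation coefficient between z and q.
corr_zq = cov_zq / math.sqrt(var_z * var_q)

# Three ways of writing the approximate change in frequency, dQ = Cov(z,q)/Z.
dQ_cov = cov_zq / Z
dQ_reg = reg_zq * var_q / Z
dQ_corr = corr_zq * math.sqrt(var_z) * math.sqrt(var_q) / Z

print(dQ_cov, dQ_reg, dQ_corr)
```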

The main value of Price’s First Equation in itself, it seems to me, is in distinguishing between two types of selection that may affect the change of gene frequency between generations. One of these is selection on parents affecting the number of successful gametes produced, the other is selection (or random change) among genes in the process of gamete formation and fertilisation. The second type of change is usually unimportant, provided chance events are averaged out, but there is no a priori reason why it should be, and there is an important evolutionary question why in fact it is (see Mark Ridley’s book in general).

The first term – the Covariance term – seems to me less interesting in itself than Price and his enthusiasts believe. But I will defer general comments on the value of Price’s various equations until a later post.

As to the ‘obscurity’ of the equation and its proof, Price’s treatment may seem elaborate, and in places difficult to follow, but much of it is quite mechanical, and follows almost inevitably once one tries to derive the change in population gene frequency from data about individual gene frequencies and reproductive fitness. (A slightly simpler derivation is possible by omitting separate terms for g and g* – see Note 2.) There are however those two flashes of ingenuity which lift it above a mere algebraic grind.

Apart from its intrinsic interest, the First Equation is also important as the prelude to the treatment in Price’s 1972 paper, where an equation is derived for ‘multi-level’ selection. It is the 1972 equation that is probably best known as ‘Price’s Equation’. At the end of the 1970 paper Price indicated that he was already working on these further developments.

There remain some biographical questions. How did Price, who had no biological training, get interested in these issues at all? And how did he hit on the particular approach of his First Equation?

Oren Harman’s biography of Price throws some light on these questions, but it is not always clear where hard evidence ends and speculation begins. I won’t discuss this further, but Harman does make one interesting claim about the equation itself. This is that Price first derived the simpler form of the equation, with just the ‘covariance’ term, and only later added the second term (Harman 220). This is entirely possible, but it is not clear whether there is any explicit documentary source for it. To see how it is possible, suppose we take the previous analysis as far as the equation:

Q[2] = Sz[i]q*[i]/NZ.

Then at this point suppose we say: On average the change in gene frequencies between adults and their own offspring is usually slight, so to a good approximation, q*[i] = q[i]. We can therefore say, approximately,

Q[2] = Sz[i]q[i]/NZ.

The final stage of the derivation can then proceed as before, but without the second term.

Whether this is how Price himself arrived at the simpler version of the equation I have no idea.

Note 1

By definition Cov(x,y) = S(x[i] – X)(y[i] – Y)/n. Expanding the right hand side by the usual rules of algebra, as applying to sums, we get (Sx[i]y[i] – Sx[i]Y – Sy[i]X + nXY)/n. (Note that the product XY has to be counted n times, once for each pair of corresponding x and y values.) But Sx[i]Y = nXY, and Sy[i]X = nYX, therefore the right hand side can be simplified to (Sx[i]y[i] – nXY)/n, or Sx[i]y[i]/n – XY, which was the required formula.

Note 2

Definitions and notation are as above, except that:
a = zygotic ploidy
b = gametic ploidy
the terms g and g* are omitted.

From the definitions:
Q[1] = Sq[i]a/Na = Sq[i]/N = Q
Q[2] = Sq*[i]bz[i]/Sbz[i] = Sq*[i]z[i]/Sz[i] = Sq*[i]z[i]/NZ

The remaining stages of the derivation are as before.


Works by George Price
1. G. R. Price, ‘Selection and covariance’, Nature, 227, 1970, 520-21.
2. G. R. Price, ‘Extension of the Hardy-Weinberg law to assortative mating’, Annals of Human Genetics, 34, 1971, 455-58.
3. G. R. Price, ‘Extension of covariance selection mathematics’, Annals of Human Genetics, 35, 1972, 485-90.
4. G. R. Price, ‘Fisher’s Fundamental Theorem made clear’, Annals of Human Genetics, 36, 1972, 129-40.
5. G. R. Price and C. A. B. Smith, ‘Fisher’s Malthusian parameter and reproductive value’, Annals of Human Genetics, 36, 1972, 1-7.
6. J. Maynard Smith and G. R. Price, ‘The logic of animal conflict’, Nature, 246, 1973, 15-18.

Other works
Steven A. Frank, ‘George Price’s contributions to evolutionary genetics’, Journal of Theoretical Biology, 175, 1995, 373-88.
W. D. Hamilton: Narrow Roads of Gene Land, vol. 1, 1996.
Oren Harman: The Price of Altruism: George Price and the Search for the Origins of Kindness, 2010.
Mark Ridley: Mendel’s Demon: Gene Justice and the Complexity of Life, 2000 (UK edition. A US edition, which I have not seen, was published with the title The Cooperative Gene.)

(Republished from by permission of author or representative)
• Category: Science 

I haven’t posted here for some time, but Razib’s recent review of a book by Oren Harman about George Price prompted me to read the book, and I think I will have a few things worth saying about Price. Harman’s book itself is a good biography, but is sketchy on the mathematical details of Price’s work (as one would expect in a book aimed at a general audience), so it encouraged me to look closely at Price’s original papers for the first time. A few years ago I wrote a commentary on ‘Price’s Equation’, but I based my discussion largely on a treatment by W. D. Hamilton, which I now find is not quite the same as Price’s own approach. I have not seen any close analysis of Price’s own key papers. (The descriptions in Harman’s book are taken almost verbatim from a treatment by Steven A. Frank which itself is very brief and a long way from Price’s original.) My main purpose will therefore be to give a commentary and (I hope) elucidation of Price’s own derivations.

All this will take me a week or two to prepare. Meanwhile, I have been reflecting on the recent upsurge of interest in Price. Not many scientists attract the attention of biographers, unless they are in the class of Galileo, Darwin, Newton, or Einstein, so what is special about Price? More generally, what makes some scientists and areas of science attractive to popular historians and biographers, while others are not?

These thoughts were prompted partly by the references in Harman’s book to Cedric (C. A. B.) Smith. Smith was extremely important in Price’s career, being among the first to take an interest in his biological work and providing him with the facilities to continue it. He continued to support Price in various ways, despite Price’s eccentricities, and he must have been a kind and tolerant man (it is no surprise to find that he was a Quaker). But the point I want to make is that Smith was an important scientist in his own right. He made notable contributions to both genetics and pure mathematics. In purely academic terms he was far more successful than Price, holding J. B. S. Haldane’s former chair in genetics at the University of London, and obtaining various awards and distinctions. Yet nobody is likely to write a popular biography of C. A. B. Smith. So what is the difference between Smith and Price?

A first obvious possibility is that Price’s work, though sparse in quantity, was in fact more fundamental and influential than Smith’s. I think this is probably true, but it can hardly be the whole explanation. Fundamental work (like the Lotka-Volterra equations in ecology) does not necessarily make the stuff of popular biography.

A further requirement is that the subject should in some way be ‘sexy’. It is difficult to say what makes a subject sexy at any given time. At present evolutionary biology is undeniably sexy, but this has not always been the case. Anyone today reading Ronald Clark’s biography of J. B. S. Haldane, published in 1968, must be surprised how little space is given to Haldane’s work in evolutionary theory and population genetics, compared to his other exploits. To qualify as sexy it helps that the subject should be somewhat mysterious, like the problem of altruism in biology, but not so mysterious that the layman cannot even understand the problem. To take some examples from mathematics, Fermat’s Last Theorem, Godel’s Theorem, and Goldbach’s Conjecture are sexy, but Riemann’s Hypothesis and Cantor’s Continuum Problem are probably not, because it takes a great deal of mathematical knowledge even to understand what they are about. (At some points in reading John Derbyshire’s brave attempt to explain Riemann’s Hypothesis I thought I was beginning to see the light, but the illusion soon faded.)

Beyond the subject matter of the science, in biography it seems to help if the person concerned meets one or more of the following criteria:

- eccentricity to the point of, if not beyond the point of, madness (Price, Haldane, W. D. Hamilton, John Nash, Charles Babbage, Godel, Cantor, Erdos, and Turing are examples). Being conspicuously sane and socially well-adapted, like John von Neumann or Niels Bohr, is a definite drawback.

- an unconventional or adventurous sex life is always a plus point (Turing, Haldane, Nash, and apparently Price himself)

- facing neglect or resistance from the ‘establishment’, and having greatness only recognised late in life or, better still, posthumously (Hamilton, Price, Turing, Cantor, Nash). The precedent here was set by Dava Sobel’s hugely successful book Longitude, which set its hero John Harrison against the villainous Astronomer Royal. If no obvious neglect or resistance are apparent, try to manufacture some (for example, Alfred Russel Wallace is always presented as overshadowed or neglected in comparison with Darwin, though in reality Wallace was one of the most widely read and highly regarded scientists of his day).

- being female. There are so few female scientists or mathematicians of any distinction that the occasional exception is bound to garner at least one biography, usually adulatory (for an exception see Dorothy Stein’s judicious debunking of Ada Lovelace).

- dying prematurely, preferably in suspicious or tragic circumstances (Price, Turing, Hamilton, Dian Fossey).

It will be seen that George Price ticks most of the boxes (apart from the female thing), so his current prominence shouldn’t really come as a surprise. It is interesting to see whether there are any scientists who tick a lot of the boxes but have not yet been given the pop-biog treatment. The one who looks most promising is Georg Cantor. There is a good scholarly biography by J. W. Dauben, but so far as I know no popular treatment. All it needs is for someone to discreetly plagiarise Dauben, strip out all the boring technical stuff, beef up the sexy bits (insanity, religion), and give it a jazzy title, preferably with the words ‘God’ and ‘infinity’ in it. Something like The Man who Found God but Lost his Mind in Infinity. I offer this suggestion free of charge to anyone who wants to have a go at it.

(Republished from by permission of author or representative)
• Category: Science 

There has been much discussion in the blogosphere (for example by Olivia Judson here) of the current libel case between the science writer Simon Singh and the British Chiropractic Association. Most of the comments have supported Singh and criticised both the BCA and the trial Judge, Sir David Eady. Science writers complain that the libel laws are stifling fair criticism of unscientific claims (which makes this at least marginally relevant to gnxp).

I have no interest, of any kind, in chiropractic, and I support freedom of speech, so you might expect me to join the chorus of Singh-lovers and Eady-haters. Unfortunately, much of the commentary has been ill-informed or self-interested (since journalists and bloggers view the libel laws much as turkeys view Christmas). The British press has other motives for attacking Judge Eady, who has extended the legal right of privacy against paparazzi and tabloid journalists. So protestations of concern for ‘free speech’ need to be taken with a hefty pinch of salt…

Some red herrings to dispose of. First, there is a legitimate debate over the practice of ‘libel tourism’ or ‘forum shopping’. But this issue does not arise in the Singh case, where a British writer made comments about a British organisation in a British newspaper. There is no question that a British Court is entitled to try the case.

Second, on libertarian grounds I would be willing to argue for complete freedom of speech, with no restrictions on libel. But that is not where we start from. Every country has some kind of libel law. The details vary, and the balance between freedom of speech and protection of individual reputation is struck in different ways. It is arguable that American law leans too far in favour of the libeller, while English law leans too far in favour of the libelled. But Eady’s critics argue that even within the general framework of English libel law his rulings are dangerous to freedom of speech. I will therefore take that general framework as given.

What then are the issues?

Here is the key passage from Singh’s article, which prompted the libel action:

The British Chiropractic Association claims that their members can help treat children with colic, sleeping and feeding problems, frequent ear infections, asthma and prolonged crying, even though there is not a jot of evidence. This organization is the respectable face of the chiropractic profession and yet it happily promotes bogus treatments.

Before going any further it is necessary to set out the various stages of a libel action under English law, since some of the critical commentary seems to misunderstand this. A case can be divided into four main stages:

Stage 1: It must be established what was said or written, who said it, and who was its ‘target’. In the present case this is straightforward.

Stage 2: It is necessary to decide whether what was said is defamatory. Roughly, this means whether or not it is damaging to the reputation of the complainant. At this stage, under English libel law, the truth or falsity of what was said is irrelevant. [Note 1] Much of the comment on the case has failed to grasp this. A true statement may be defamatory, and a false statement may be non-defamatory. The point at issue is not its truth, but whether it is damaging.

Stage 3: If it is decided that a statement is defamatory, the person responsible for the statement may then defend it. Except in certain special circumstances, the defence is either that the statement is true (the defence of ‘justification’), or that it constitutes ‘fair comment’. In the English system it is usually for a jury to decide whether the defence is convincing.

Stage 4: If the jury finds in favour of the complainant, a decision is then needed on the amount of damages or other remedial action. Damages are decided by the jury. All costs of the case are usually paid by the losing side. It has been suggested in some commentaries that it is cheap to bring a libel action, because the complainant can hire a lawyer on a no-win no-fee basis. But this is only true if the complainant has a strong case; otherwise no lawyer will touch it.

The basic complaint of the BCA is that Singh’s article accuses them of dishonesty, by promoting treatments which they know to be ‘bogus’.

Judge Eady was asked to give preliminary rulings on two issues: what Singh’s words meant; and whether they amounted to an assertion of fact or merely an expression of opinion. On the first point, he decided, agreeing with the BCA, that Singh’s article accuses them of dishonesty, saying: ‘[the quoted passage] is in my judgment the plainest allegation of dishonesty and indeed it accuses them of thoroughly disreputable conduct.’ After this, it was straightforward to take the further step of deciding that the passage is defamatory, since an accusation of dishonesty could hardly not be. On the second point, Judge Eady concluded that the passage amounts to an assertion of fact. The importance of this is that if the defamatory passage is an assertion of fact, the defence of ‘fair comment’ is not available, and the only defence (usually) is to show that the assertion is factually true, or ‘justified’. This defence remains open to Singh.

The case so far therefore raises two issues:

1. Was Eady right to conclude that Singh had accused the BCA of dishonesty?

2. Was Eady right to conclude that the accusation was an assertion of fact, rather than merely an expression of opinion?

On the first point, the matter is perhaps not as clear-cut as Eady’s ruling suggests, but on a common-sense reading of Singh’s passage it is at least a very reasonable interpretation. Singh’s words are strong: he says there is ‘not a jot of evidence’ for the BCA’s claims, and that while it is the ‘respectable face’ of chiropractic, it still ‘happily promotes bogus treatments’. Whether or not Singh intended this to be an accusation of dishonesty, it is a natural inference for the reader to draw. The word ‘bogus’ by itself usually has an implication of dishonesty; the dictionary gives synonyms such as ‘sham’, ‘spurious’, and ‘counterfeit’. To say that someone promotes bogus treatments therefore might in itself be taken as implying dishonesty. This interpretation is reinforced by the contrast Singh draws between the ‘respectable face’ of the BCA and its ‘happily’ promoting ‘bogus treatments’. The contrast between ‘respectable face’ and ‘bogus’ seems to imply that the BCA is not, after all, as respectable as it may appear. If Singh did not intend an imputation of dishonesty, he expressed himself carelessly. An alternative possibility is that he did intend to impute dishonesty, but chose his words so as to insinuate that conclusion without making it explicit. In any case, under English libel law, Singh’s intention is irrelevant: what matters is the interpretation that reasonable readers are likely to put on his words.

On the second point, namely whether the defamatory claim was a matter of fact or opinion, the issues are more technical, and I do not pretend to understand all the legal subtleties. According to Eady’s ruling:

It will have become apparent by now that I also classify the defendant’s remarks as factual assertions rather than the mere expression of opinion. Miss Rogers reminded me, by reference to Hamilton v Clifford [2004] EWHC 1542 (QB), that one is not permitted to seek shelter behind a defence of fair comment when the defamatory sting is one of verifiable fact. [Note 2] Here the allegations are plainly verifiable and that is the subject of the defence of justification. What matters is whether those responsible for the claims put out by the BCA were well aware at the time that there was simply no evidence to support them. That is an issue capable of resolution in the light of the evidence called. In other words, it is a matter of verifiable fact. That is despite the fact that the words complained of appear under a general heading “comment and debate”. It is a question of substance rather than labelling.

Given the assumption that there was an accusation of dishonesty, this seems a reasonable enough decision. The defence of ‘fair comment’ is more narrowly circumscribed than the layman might imagine. The test of whether something is ‘opinion’ depends on the substance of the alleged disreputable conduct, and not on the form in which the allegation is made. It does not become a matter of opinion just because the author uses the words ‘in my opinion’ or some other verbal dodge.

Clearly the whole case (so far) hinges on the question whether a reasonable reader would interpret Singh’s words as containing an accusation of dishonesty. Much of the commentary has either missed this point, or strained to find alternative interpretations. For example, the words are interpreted as imputing mere gullibility or ignorance, rather than dishonesty. In some circumstances that might be the most natural interpretation of the same or similar words. For example, it might be said that exorcism is a ‘bogus’ treatment for mental illness, yet that some religious sects ‘happily promote’ this bogus treatment. In this case it might plausibly be argued that the implied accusation is one of gullibility or ignorance rather than dishonesty. But this interpretation relies on the background knowledge that religious sects are commonly ill-informed and gullible. In the case of the BCA, the contrast that Singh himself makes is between the BCA’s position as the ‘respectable face’ of a medical profession, and its willingness ‘happily’ to promote ‘bogus’ treatments for which there is ‘not a jot of evidence’. It is difficult to regard this merely as an accusation of gullibility. According to Judge Eady’s ruling:

It is alleged that the claimant promotes the bogus treatments “happily”. What that means is not that they do it naively or innocently believing in their efficacy, but rather that they are quite content and, so to speak, with their eyes open to present what are known to be bogus treatments as useful and effective. That is in my judgment the plainest allegation of dishonesty and indeed it accuses them of thoroughly disreputable conduct.

The critics complain that this is reading too much into the word ‘happily’, which could have a variety of other meanings. But again the question is not what the word might conceivably mean, but what a reasonable reader is likely to take it to mean. The meaning of words often depends on their context. In this case, the word ‘happily’ does not have its literal meaning as a description of an emotional state. The word must in some way describe the collective state of mind of the BCA in promoting ‘bogus’ treatments, and in the context it does (it seems to me) have a strong suggestion of dishonesty. The alternative is to suppose that it has a weaker connotation of recklessness or irresponsibility, but not of conscious dishonesty, or that it leaves several possibilities open, meaning (roughly) ‘dishonest or gullible or reckless or irresponsible…’. These interpretations are not impossible, but Singh himself has made it more difficult to accept them by saying that there is ‘not a jot of evidence’ for the ‘bogus’ treatments. If this were true, then the BCA, as a body of specialists in the field, could hardly be unaware of it, and their promotion of such treatments would go beyond mere recklessness into conscious dishonesty. Judge Eady’s interpretation is therefore not unreasonable.

Nor does the case have the far-reaching implications for freedom of speech or scientific research that some critics claim. No-one is suggesting that it is improper to criticise chiropractic or other alternative therapies. The only lesson to be drawn is that if you wish to accuse someone of dishonesty, at least in England, you must be ready to back up your accusation with evidence; and if you do not wish to accuse someone of dishonesty, you should choose your words with care.

Note 1: This is the position in most of the Common Law world. It was also the position in the United States until a series of Supreme Court decisions shifted the burden of proof onto complainants, where they are ‘public figures’, to show that the words complained of are not only defamatory but deliberately false. A useful account of American libel law is here.

Note 2: Out of curiosity I looked up this case. British readers may recall the incident when the former MP Neil Hamilton and his wife were accused of having raped a woman. The accusation was investigated by the police and disproved. The accuser was subsequently prosecuted and jailed for making false accusations. But before this, she had sold her story to the tabloids, using the PR consultant Max Clifford as intermediary. During the police investigation Max Clifford had gone on television to defend the woman’s claims, and among other things said he personally believed the claims were true. This was what led to the libel action, as the Hamiltons claimed that by endorsing the woman’s accusations Clifford was himself in effect accusing the Hamiltons of rape. Clifford argued in his defence that he was merely expressing an opinion, but the Judge ruled that he was making an assertion of fact, and could not shield behind the defence of ‘fair comment’. And who was the Judge? – step forward, Mr Justice Eady!

Added on 27 November: it has been pointed out that Simon Singh has recently been granted leave to appeal on some of the issues raised by the case. The Appeal Court may well reverse Judge Eady’s rulings on some or all matters. In my post I did not suggest that Eady was necessarily right, just that his rulings were a lot more reasonable than some commentators have claimed. As I said at the outset I have no interest in chiropractic. I have only commented on the case because I was getting tired of misrepresentations of it, which recur in an article in the (London) Times yesterday. Two things in particular have irritated me. One is the one-sided presentation of the case by the commentators. I have not seen a single comment which recognises that the BCA might just have a legitimate complaint when they are, arguably, accused of dishonesty. You can argue about the precise meaning of the words used by Singh, but no-one can sensibly deny that they could be used to make an accusation of dishonesty. Second, I am concerned that scare-mongering about the effects of the case on free speech and scientific enquiry could be a self-fulfilling prophecy. If scientists and science writers (including bloggers) are led to believe that they cannot make strong criticisms of pseudo-science without facing a libel action, freedom of speech and enquiry really will be inhibited. For the reasons given in my post, I do not think that the Singh case has these implications, and those who claim that it does are harming the cause they wish to defend.

I am also happy to acknowledge that I obtained the text of Judge Eady’s ruling through JackOfKent’s blog, via Olivia Judson’s blog, which is linked in my post. I would also stress that my criticism of ‘ill-informed’ commentators does not include JackOfKent. I don’t agree with his assessment of the case, but he is certainly well-informed about it – far more so than me.

Added on 29 November: I hold no brief, in any sense, for the BCA, but it seems to me that in fairness one should not accuse them of ‘litigiousness’, without at least checking their own statements of position. Here is one of their press notices on the Singh case. I do not know (obviously) whether the quote they attribute to Simon Singh at the end of their statement is true, but if it is, it puts Singh in a very different light from that presented by his cheerleaders.

(Republished from by permission of author or representative)
• Category: Science 

This is the seventh and last in a series of posts about Charles Darwin’s view of evolution. Previous posts were:

1: The Pattern of Evolution.
2: Mechanisms of Evolution.
3: Heredity.
4: Speciation.
5: Gradualism (A), which dealt with Darwin’s views on gradualism in the rate of evolutionary change.
6: Gradualism (B), about the size of the mutations adopted by natural selection.

This final post deals with Darwin’s views on the levels of selection in evolution. Does selection occur mainly between genes, individuals, families, groups, species, or what? In the modern debate on levels of selection, Darwin has been quoted in support by both sides: those who accept, and those who reject, a major role for selection above the level of the individual organism.

Unless otherwise stated, all page references are to Charles Darwin: The Origin of Species: a Variorum Text, edited by Morse Peckham, 1959, reprinted 2006.

This post will be (relatively) brief, because there is already an excellent detailed study [Ruse] which I have little to add to.

Darwin’s position on levels of selection can be summarised in four points:

1. His formulation of the process of natural selection is expressed almost entirely in terms of selection among individuals, based on what he calls ‘individual differences’. In this respect he differs from Wallace, who referred mainly to selection between ‘varieties’. It has recently been argued that Wallace (in 1858) did not quite ‘get’ the idea of natural selection after all. Be that as it may, Wallace was always more welcoming than Darwin to what we would now call group selection.

2. Darwin gave no autonomous role to selection between species or varieties. In so far as he did mention selection at these levels, it was as a by-product of selection at lower levels. For example, if a newly introduced species displaces an indigenous one, it is because the individual organisms of the first species are competitively superior to those of the second.

3. Darwin recognised the possibility that selection might operate on individuals indirectly, via the individual’s relatives, as in the case of neuter insects. Thus he had the germ of the modern ideas of kin selection and inclusive fitness, but these were not fully developed until much later.

4. At a level between the family and the species, Darwin recognised a role for selection between social communities, notably among social insects and human ‘tribes’. Most of the recent debate about Darwin’s views on levels of selection has concerned the interpretation of this ‘community selection’.

Darwin’s most explicit statement on the issue in the Origin says in the first edition (with italics added):

Natural selection will modify the structure of the young in relation to the parent, and of the parent in relation to the young. In social animals it will adapt the structure of each individual for the benefit of the community; if each in consequence profits by the selected change[172]

In the fifth edition the word each is revised to this and in the sixth to the community. It has been suggested [Richards p.217] that these changes involve an important shift towards group selectionism. In the first edition, traits benefiting the community are only selected if they are also beneficial to the individual, but in the fifth and sixth editions such a trait can be selected even if it is harmful to the individual. I agree that this is an important revision, but I think it is only stating as a general principle something that Darwin had already accepted in individual cases. He believed that the sterility of neuter insects had been selected for the good of the community[417]. Likewise, the sting of bees is useful to the community, and is selected for that reason, even though it kills the individual bee when it is used[374]. Since dying, or becoming sterile, are clearly against the interests of the individual, these examples were inconsistent with Darwin’s original formulation, and his revisions may just have been a belated recognition of this.

If a trait is beneficial to the community, but harmful to the individual who possesses the trait (like the bee’s sting), the question arises how such a trait can increase in frequency. In the case of the sterile classes of social insects Darwin saw fairly clearly that the solution was in the relatedness of the members of the colony:

This difficulty, though appearing insuperable, is lessened or, as I believe, disappears, when it is remembered that selection may be applied to the family, as well as to the individual, and may thus gain the desired end… Thus I believe it has been with social insects: a slight modification of structure, or instinct, correlated with the sterile condition of certain members of the community, has been advantageous to the community: consequently the fertile males and females of the same community flourished, and transmitted to their fertile offspring a tendency to produce sterile members having the same modification [416-17].

The same mechanism does not apply where individuals are not genetically related. In the fifth edition Darwin discussed the problem in the context of the sterility of hybrids:

With sterile insects we have reason to believe that modifications in their structure and fertility have been slowly accumulated by natural selection, from an advantage having been thus indirectly given to the community to which they belonged over other communities of the same species; but an individual animal not belonging to a social community, if rendered slightly sterile when crossed with some other variety, would not thus itself gain any advantage or indirectly give any advantage to the other individuals of the same variety, thus leading to their preservation[445]

Darwin concluded (contrary to the position of Wallace) that the sterility of hybrids, and the inter-sterility of different species, had not evolved directly by natural selection but as a by-product of other changes. Unfortunately in the sixth edition the quoted passage was omitted, as Darwin believed he had more convincing new evidence that the sterility had not been selected.

In the Descent of Man, Darwin returned to the issue in the context of the evolution of human morality. He believed that tribes containing ‘a greater number of courageous, sympathetic, and faithful members’ [Descent of Man, 1871, p.162] would succeed in competition against other tribes, but he saw a problem in explaining how such virtues could evolve within a tribe: ‘But it may be asked, how within the limits of the same tribe did a large number of members first become endowed with these social and moral qualities, and how was the standard of excellence raised?’[163] He thought it was very unlikely that these qualities could be directly favoured by natural selection within a tribe. As a ‘probable’ solution, he suggested two important factors. One was what we now call ‘reciprocal altruism’, i.e. that a benefit might be provided in the expectation of a return benefit[163]. To complicate matters, Darwin believed that habitual behaviour, once acquired, could be transmitted by ‘Lamarckian’ inheritance [163-4]. The second, and more important, factor was ‘the praise and blame of our fellow-men’[164]: ‘it is hardly possible to exaggerate the importance during rude times of the love of praise and the dread of blame’[165]. Darwin does not explain how praise and blame are converted into individual fitness, but modern theorists have devised game theoretical models to handle these issues, which tend to confirm the importance of reputation. An individual who gains a reputation as a cheat or shirker will be excluded from the benefits of social life, with adverse effects on fitness.

Finally, Darwin returns to the point that tribes with many individuals possessing traits of courage, etc, ‘would be victorious over most other tribes; and this would be natural selection’[166]. This passage is the main basis for the claim that Darwin became a ‘group selectionist’. In a sense this is true, since it does give selection between groups (tribes) a role in promoting the spread of a trait. However, I do not think Darwin intends it as part of the solution to the question ‘how within the limits of the same tribe did a large number of members first become endowed with these social and moral qualities’. If he did, the solution would clearly be invalid. The process of group selection envisaged by Darwin presupposes that some tribes already have ‘many individuals’ possessing the qualities in question. At best group selection has a role in reinforcing and extending the prevalence of altruistic traits which have first emerged within the tribes for other reasons.

The crucial problem for group selectionists has always been to explain how altruistic traits can become common within a group despite harming individual fitness. Darwin sidesteps the problem in this form, since his two suggested mechanisms (reciprocal altruism and ‘praise and blame’) in fact raise individual fitness, perhaps sufficiently to offset the loss of fitness. The problem of altruism still remains for those theories in which altruists suffer a net loss of individual fitness. If ‘genes for altruism’ are randomly distributed, and the benefits of altruism are simply proportional to the number of altruists in the group, then altruism will always be eliminated (apart from recurrent mutations) [Maynard Smith p.166]. A solution is however possible if either (a) genes for altruism are concentrated in some groups above chance levels, for example because close relatives tend to live near each other; or (b) the benefits of altruism are not simply proportional to the number of altruists. If chance concentrations of altruists gain disproportionate benefits, altruism can be selected despite its fitness detriment to those altruists who fall outside such concentrations. ‘Synergistic’ effects of this kind are quite plausible [Maynard Smith p.167], yet this solution to the problem has been strangely neglected.
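The contrast between proportional and ‘synergistic’ benefits can be illustrated with a toy calculation. The sketch below is my own simplification, not Maynard Smith’s formulation: it assumes groups of size n formed at random, a payoff benefit(k) received by every member of a group containing k altruists, and a cost paid by altruists only. The function names and the numerical values are hypothetical, chosen so that the cost exceeds an altruist’s own share of the proportional benefit.

```python
from math import comb

def altruist_advantage(p, n, cost, benefit):
    """Expected fitness of an altruist minus that of a non-altruist.

    Groups of size n are formed at random from a population in which a
    fraction p are altruists. benefit(k) is the payoff every member
    receives in a group containing k altruists; altruists also pay `cost`.
    """
    advantage = -cost
    for j in range(n):  # j = altruists among the focal individual's n-1 groupmates
        prob = comb(n - 1, j) * p**j * (1 - p) ** (n - 1 - j)
        # Being an altruist raises the group's count from j to j + 1.
        advantage += prob * (benefit(j + 1) - benefit(j))
    return advantage

# Hypothetical values (assumptions for illustration), with C > B / N.
N, B, C = 10, 3.0, 0.4

def linear(k):
    # Benefit simply proportional to the number of altruists.
    return B * k / N

def synergistic(k):
    # Benefit rises faster than proportionally with the number of altruists.
    return B * (k / N) ** 2

for p in (0.1, 0.5, 0.9):
    print(p,
          round(altruist_advantage(p, N, C, linear), 3),
          round(altruist_advantage(p, N, C, synergistic), 3))
```

Under these assumptions the proportional case gives altruists the same disadvantage at every frequency, whereas the synergistic case turns in their favour once altruists are common enough for chance concentrations to occur. In this particular sketch altruism still cannot spread when rare, so it illustrates only the direction of the effect.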

Group selection of some kind is therefore possible, and it is an empirical matter to determine its prevalence and the mechanisms responsible in any particular case. Darwin did not solve the problem, but at least it may be said that he recognised it more clearly than any evolutionist before R. A. Fisher, and that he sketched out most of the possible solutions to the problem that have been explored more fully by his successors.

This post brings to an end my series of posts on ‘What Darwin Said’, which I regard as my contribution to ‘Darwin Year’. I have not aimed to cover every aspect of Darwin’s work, even in evolutionary theory – notably, I have not discussed sexual selection. I hope however that I have clarified Darwin’s views on most of the issues that are still under serious debate. I have also tried to evaluate how far Darwin’s views have stood the test of time. Overall, I think the answer is ‘remarkably well’, considering the extent of ignorance and false beliefs in Darwin’s time on many key issues such as the nature of inheritance. But Darwin was not infallible, even with the evidence available to him, and it would be short-sighted to defend evolutionism in general by pretending (in the manner of diehard Marxists) that the Master was always right.

John Maynard Smith, Evolutionary Genetics, 1989.
Robert J. Richards, Darwin and the Emergence of Evolutionary Theories of Mind and Behavior, 1987.
Michael Ruse: ‘Charles Darwin and Group Selection’, Annals of Science, 37, 1980, 615-30, repr. in The Darwinian Paradigm, 1989.

(Republished from by permission of author or representative)
• Category: Science 

This is the sixth in a series of posts about Charles Darwin’s view of evolution. Previous posts were:

1: The Pattern of Evolution.
2: Mechanisms of Evolution.
3: Heredity.
4: Speciation.
5: Gradualism (A), which dealt with Darwin’s views on gradualism in the rate of evolutionary change.

The present part deals with another aspect of gradualism: the size of the variations adopted by natural selection. A gradualist in this respect maintains that successful variations (mutations, in modern terminology) are always or usually relatively small in effect.

[Added on 20 October: there is an article on Panda evolution (discussed below) here.]


Unless otherwise stated, all page references are to Charles Darwin: The Origin of Species: a Variorum Text, edited by Morse Peckham, 1959, reprinted 2006.

To denote small variations, Darwin usually refers to ‘individual differences’, which he describes as ‘many slight differences which may be called individual differences, such as are known frequently to appear in the offspring of the same parents, or which may be presumed to have thus arisen, from being frequently observed in the individuals of the same species inhabiting the same confined locality’[122]. ‘Individual differences’ are therefore envisaged by Darwin as being both small and relatively common.

To denote larger or rarer variations, Darwin uses several terms, with some differences of meaning. Occasionally he refers to ‘single variations’, meaning those occurring only in rare and isolated individuals. Single variations may be either ‘slight or strongly-marked’[178]. He also sometimes refers to ‘sports’, a traditional term used to describe unexpected new characters such as buds different from the rest of a plant[81]. Another term is ‘monstrosity’: ‘some considerable deviation of structure in some part, either injurious to or not useful to the species, and not generally propagated’[120].

But Darwin most often uses the phrase ‘great and sudden’, ‘great and abrupt’, or related terms, to describe variations larger than individual differences [264, 267, 345, 362, , 713, 735, 751]: ‘As natural selection acts solely by accumulating slight, successive, favourable variations, it can produce no great or sudden modification; it can act only by very short and slow steps’ [735; the word 'very' is omitted in the 5th and 6th editions].

The adequacy of individual differences

Darwin believed that in general individual differences were sufficient for the observed pattern of evolution: ‘A large amount of inheritable and diversified variability is favourable [to natural selection], but mere individual differences probably suffice’[192].

Some authors have interpreted Darwin’s concept of individual differences as covering only continuous variations (quantitative traits, in the modern jargon). I find no strong evidence to support this interpretation. Darwin himself does not use the terms continuous and discontinuous. In the first edition of the Origin he does twice refer to ‘insensibly’ small gradations [321, 714], and on one occasion to ‘infinitesimally small’ inherited modifications[185], expressions which might be taken to imply strict continuity. But the words ‘insensibly’ and ‘infinitesimally’ were removed in later editions. Darwin moreover says that ‘Every one who believes in slow and gradual evolution, will of course admit that specific changes may have been as abrupt and as great as any single variation which we meet with under nature, or even under domestication’[263]. He also says that ‘the general pattern of an organ might become so much obscured as to be finally lost, by the atrophy and ultimately by the complete abortion of certain parts, by the soldering together of other parts, and by the doubling or multiplication of others, – variations which we know to be within the limits of possibility’[679]. It is a general rule that when the same body part is repeated many times in the same individual, like the vertebrae of snakes, the number is variable[297]. These statements clearly imply that ‘meristic’ (numerical) changes can occur in evolution.

No great and sudden changes

Darwin makes several statements against the likelihood of ‘great and sudden’ changes. Perhaps the strongest is that: ‘If it could be demonstrated that any complex organ existed, which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down’[344]. This still leaves open the possibility of ‘great and sudden changes’ which do not produce a new complex organ, but some other change such as a large increase in size. Darwin was sceptical about the likelihood of any great and sudden changes. Many of his comments on the subject were added in the 3rd or later editions of the Origin, for example: ‘It may perhaps be doubted whether monstrosities, or such sudden and great deviations of structure as we occasionally see in our domestic productions, more especially in plants, are ever permanently propagated in a state of nature’ [121, 3rd edn. onwards]. Darwin’s position is not that such changes are impossible, but that he sees little evidence for them, and several arguments against them. His reasoning is considered further below.

Darwin does make a partial exception for two kinds of sudden change. One is where a plant has flowers, etc, of two different kinds. If for some reason the plant ceased to produce one of these, there would be a sudden change, though the different types might originally have been produced gradually[121]. Darwin also acknowledged the case, raised by E. D. Cope, of what is now called heterochrony, where different stages of the life cycle of an organism are speeded up or slowed down. In such circumstances some parts of the life cycle may eventually be omitted entirely, and this could be a relatively sudden change, but ‘Whether species have often or ever been modified by this comparatively sudden mode of transition, I can form no opinion; but if this has occurred, it is probable that the differences between the young and mature, and between the mature and the old, were primordially acquired by graduated steps’[349].

Darwin’s reasons for gradualism

Curiously, in the first edition of the Origin Darwin gives little argument against ‘great and sudden changes’. This was not because he had not thought about the matter. In the draft ‘big species book’ which he was working on before receiving the celebrated letter from Wallace, he included the following passage:

I cannot believe that in a state of nature new species arise from changes in structure in old species so great & sudden as to deserve to be called monstrosities. Had this been so, we should have had monstrosities closely resembling other species of the same genus or family; as it is comparisons are instituted with distant members of the same great order or even class, appearing as if picked out almost by chance. Nor can I believe that structures could arise from any sudden & great change of structure (excepting possibly in the rarest instances) so beautifully adapted as we know them to be, to the extraordinarily complex conditions of existence against which every species has to struggle. Every part of the machinery seems to have been slowly & cautiously modelled to guard against the innumerable contingencies to which it has to be exposed [p. 319, Charles Darwin's Natural Selection, ed. R. C. Stauffer, 1975]

In abridging his draft for the Origin, Darwin omitted this passage, apparently because he did not expect his position to be controversial. In the event, several commentators, including T H Huxley, thought he had been unwise to rule out great and sudden changes or ‘saltations’. Darwin’s correspondence shows that he was surprised at this objection, and in subsequent editions of the Origin he gave fuller reasons for his position. These may be broken down into the following main points:

1. It is improbable that large and sudden changes would produce the refinement of adaptation we observe in nature: ‘almost every part of every organic being, at least with animals, is so beautifully related to its complex conditions of life that it seems as improbable that any part should have been suddenly produced perfect, as that a complex machine should have been invented by man in a perfect state’[121].

2. It is very rare for wholly new organs to appear ‘as if created for some special purpose…nature is prodigal in variety, but niggard in innovation’[361]. In his chapters on ‘Difficulties of theory’ and ‘Miscellaneous objections’ (in the 6th edition) Darwin discusses at length the evidence for transitional stages in the evolution of organs.

3. Closely related species usually differ only slightly, by a number of small differences, and cannot be sharply distinguished from sub-specific varieties, which in turn are merely well-marked individual differences. There is a continuity of variation which suggests that the differences between species are the accumulated effects of individual differences [135, 265].

4. ‘Monsters’ are usually sterile[121].

5. Embryology shows gradual rather than sudden transitions[267].

6. Great and sudden changes occur only rarely and in isolated individuals. Even if they have some selective advantage, such changes are likely to be eliminated by chance extinction, or diluted by interbreeding with other individuals, before they can establish themselves[178]. Darwin based this argument on an anonymous article in the North British Review (in fact by Fleeming Jenkin). Jenkin intended it as an argument against natural selection in general, but Darwin welcomed it as supporting his position against the importance of ‘single variations’.

7. As in his pre-Origin draft, Darwin pointed out that if evolution often occurred through large and sudden changes in a single trait or organ, we ought to be able to find many cases of closely related species, resembling each other in most respects, but differing sharply in some particular way resembling a monstrosity. Darwin claimed that he had diligently searched for such cases but not found any[121].

Was Darwin right?

Mainstream neo-Darwinian evolutionists generally follow Darwin in minimising the role of ‘great and sudden’ variations. Gradualism usually goes together with a belief in the importance of adaptation and natural selection. In contrast, critics of natural selection, from Darwin’s lifetime onwards, have often favoured some kind of saltationism: Mivart, Bateson, Goldschmidt, and Schindewolf being notable examples. More recently, punctuationists such as Stephen Jay Gould and Steven Stanley have combined acceptance of macromutations with a lukewarm attitude towards natural selection. On the other hand, there is no logical incompatibility between macromutation and natural selection, if large mutations are favoured by selection. A minority of evolutionists, including Francis Galton, De Vries and J. B. S. Haldane, have combined selectionism with acceptance of macromutation as a possibility.

It is sometimes supposed that R. A. Fisher proved the impossibility of large mutations being selected. I am not sure that Fisher himself made such a bold claim. What he did prove, given a few reasonable assumptions, was that:

a. a small mutation is more likely to be beneficial than a large one

b. the probability that a mutation will be beneficial declines as the number of different traits affected by the mutation increases.

These points fall short of the strong claim that macromutations can never be beneficial. Confusion on this point has perhaps arisen from Fisher’s informal ‘microscope’ analogy. If a microscope is already fairly well focused (corresponding to an organism which is fairly well adapted, as it must be to survive at all), a small adjustment of the focus has a 50:50 chance of making an improvement, whereas a large adjustment is bound to make things worse. But the latter conclusion depends on the tacit assumption that there is only a single optimum focus. In the case of a microscope we know that this is true, but if instead of a microscope we have some instrument with more than one locally optimal setting, such as an FM radio receiver with many stations, then a large adjustment has a non-zero probability of improving on the current position. Whether the probability is significant will depend on the number and spacing of local optima within the range of possible adjustment. Turning to the biological case, we can make use of the phenotypic version of the ‘adaptive landscape’ concept. Assuming that an organism is close to a peak in the landscape, a mutation will take its offspring to some other point of the landscape. Whether this is higher or lower in fitness than the parent will depend on the structure of the landscape. If peaks of fitness are few and far between, relative to the range of possible mutational effects, then the probability of a large mutation being beneficial will be very small. If on the other hand there are many peaks within the range of feasible mutations, then a large mutation may well be advantageous. I do not know of any rigorous argument against this scenario.
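The contrast between one optimum and many can be illustrated with a toy one-dimensional calculation. This is only a sketch of the microscope and radio analogies, not Fisher's own model: the two fitness functions, the starting offset of 0.1, the step sizes, and the function names are all assumptions chosen for illustration.

```python
import math

def fraction_beneficial(fitness, x0, step):
    """Fraction of the two equally likely directions (+step, -step)
    that give a higher fitness than the starting point x0."""
    start = fitness(x0)
    outcomes = (fitness(x0 + step), fitness(x0 - step))
    return sum(f > start for f in outcomes) / 2

def single_peak(x):
    # Assumed landscape with one optimum at x = 0 (the 'microscope').
    return -x * x

def many_peaks(x):
    # Assumed landscape with equal optima at every integer (the 'radio dial').
    return math.cos(2 * math.pi * x)

x0 = 0.1  # the organism starts close to, but not exactly on, a peak
for step in (0.05, 1.05):
    print(step,
          fraction_beneficial(single_peak, x0, step),
          fraction_beneficial(many_peaks, x0, step))
```

With these assumed landscapes the small step improves fitness in one direction out of two in both cases, while the large step never improves fitness on the single peak but does so in one direction out of two when peaks are spaced one unit apart. The result for large steps depends entirely on how the peaks are spaced relative to the step, which is the point made in the text.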

Darwin himself [121, 267] objects to the sheer improbability that a single, sudden transformation would produce an organism perfectly adapted to its environment. But the macromutationist does not need to claim this much: he need only claim that the organism would on balance be fitter than its predecessors. The adaptation might then be refined by smaller changes.

Turning to Darwin’s other points, the argument from embryology is a weak one. It assumes that the evolutionary sequence of mature organisms is closely followed in the embryonic development of an individual, so that there can be no sudden transitions in the former without sudden transitions in the latter. This requires a strong ‘recapitulation’ theory of embryology, which would not now be accepted.

Fleeming Jenkin’s arguments against the selection of ‘single variations’ depend on the assumption of ‘blending inheritance’, according to which an offspring is always intermediate between its parents. But for single gene mutations, which would include many mutations of large effect, the ‘blending’ assumption fails.
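The difference between the two assumptions can be put in a short sketch. It assumes, purely for illustration, that the variant and its descendants always mate with average individuals, and that the single gene in the second case is fully dominant; the function names are hypothetical.

```python
def blended_deviation(initial_deviation, generations):
    """Blending inheritance: each offspring is midway between a variant
    parent and an average parent, so the deviation halves every generation."""
    return initial_deviation / 2 ** generations

def particulate_outcomes(effect):
    """Single dominant gene (an assumption): offspring of a carrier and a
    non-carrier. Returns (probability, deviation) pairs. Half the offspring
    inherit the gene with its full effect, half do not inherit it at all."""
    return [(0.5, effect), (0.5, 0.0)]

# Under blending, a deviation of 1.0 shrinks to 1/32 after five generations.
print(blended_deviation(1.0, 5))
# Under particulate inheritance the effect is undiluted in those who carry it.
print(particulate_outcomes(1.0))
```

Under blending the variation itself fades away, whereas under particulate inheritance the gene may become rarer or commoner but its effect in a carrier is not diluted, so selection still has something to act on.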

The argument that ‘monsters’ are usually sterile carries some weight. Large mutations are sometimes due to chromosomal abnormalities, which reduce fertility, while even in the case of single gene mutations the resulting ‘monster’ may have difficulty finding a mate.

Darwin’s remaining arguments are empirical. He claims that it is rare to find an organ in one species which cannot be traced through transitional forms in other species. New organs or body parts are evolved from old ones: for example the wings of birds and bats are evolved from the standard tetrapod forelimb, and not wholly new parts. Related species differ by a number of small differences, not by single radical mutation of the kind known to occur in ‘monstrosities’.

It would be difficult to evaluate these claims without a wide-ranging survey of the animal and plant kingdoms. It seems to be true that closely related species often differ by numerous small changes. Cases like the ‘geminate species’ on the opposite sides of the Panama Isthmus, where several million years of separate evolution have produced slightly differing pairs of species, support this position. There is however evidence that single mutations of striking phenotypic effect have had a larger role in evolution than Darwin supposed. The classic cases of melanism and mimicry in insects seem to be of this kind: a melanic or mimic form first appears by a relatively large mutation, which is then refined by smaller changes. It is also claimed (by Vorontsov) that the hairlessness of the naked mole rat and of the bat species Cheiromeles is due to a single mutation. Whether we call these ‘macromutations’ is a matter of taste, but they go beyond what Darwin described as ‘individual differences’.

There are some cases where the nature of the variation itself seems to require a sudden change. For example, some snail shells coil in the opposite direction to the standard one, and a single-step reversal of chirality (of a kind known to occur by rare mutations) is more credible than a transition through an uncoiled stage. In starfish, most species have five rays, which is evidently the ‘primitive’ condition, but some species have more than five. A transition through forms with ‘five-and-a-bit’ rays seems highly improbable, so there was presumably a sudden increase in some lineages. When significant sudden changes do occur, they may conceivably mark the origin of a new higher taxon, such as a family or order. Macromutationists have often argued for this. However, most such claims are vague and poorly supported. For example, Stephen Jay Gould endorsed the view of Dwight Davis that the distinctive features of the Panda (Ailuropoda) may have resulted from a few large mutations (with some subsequent ‘polishing’), but there seems to be no direct genetic or fossil evidence for this. On perusing Davis’s original 1964 monograph, I found that he really had no reason for his view other than a gut feeling that the changes, such as the enlargement of the radial sesamoid bone, could not have been gradual. It has also been claimed (e.g. by Oliver Rieppel) that Chelonia (tortoises and turtles) must have evolved their armoured shell by large mutational steps. There were until recently no intermediate stages in the fossil record, but some transitional forms have now been discovered, which casts doubt on the macromutationist argument.

Overall, Darwin’s gradualism probably went too far. There is no reason (Fisher notwithstanding) to be opposed in principle to macromutational changes, and there is evidence that they have sometimes occurred (perhaps more often in plants than animals). On the other hand, enthusiasm for ‘saltationism’ has often been linked with hostility towards natural selection, and a subjective inability to see how such-and-such a change can have been gradually selected. 150 years after the Origin, evolutionists should arguably be more open-minded than Darwin to the role of large mutations in evolution, but still cautious in claiming that any given change ‘must’ have been sudden.

• Category: Science 

Having recently posted on the subject of Charles Darwin’s ‘gradualism’, I was pleased to see a news report on research showing the gradual evolution of the distinctive head of flatfish, which, like many of Picasso’s portraits, have both eyes on the same side of the face. In Darwin’s time this case was raised, especially by St George Mivart, as a fatal objection to the theory of gradual evolution by natural selection, since (it was argued) there would be no advantage to having one eye gradually moving around the face from one side to the other. The new research claims to show that this is precisely what happened. I am not sure whether the research is entirely new, because I vaguely recall something similar before, but the new study is presumably fuller and more definitive.

Added: The researcher, Matt Friedman, had an article in Nature last year. The Abstract is here.

• Category: Science 

Exciting news for Anglo-Saxonists here.

• Category: Science 

This is the fifth in a series of posts about Charles Darwin’s view of evolution. Previous posts were:

1: The Pattern of Evolution.
2: Mechanisms of Evolution.
3: Heredity.
4: Speciation.

The present part deals with the subject of gradualism. Gradualism is contrasted with views of evolution as a sudden, discontinuous or even instantaneous process. At various times, from Darwin’s lifetime onwards, discontinuous processes of evolution have often been advocated, most recently by the proponents of ‘punctuated equilibrium’.

What is gradualism?

It is necessary to distinguish between

(A) gradualism with respect to the rate of evolutionary change, and
(B) gradualism with respect to the size of the variations adopted by natural selection.

A gradualist in sense (A) maintains that evolution is always or usually slow, while a gradualist in sense (B) maintains that successful variations (mutations, in modern terminology) are always or usually relatively small in effect. The two kinds of gradualism often go together, but in principle they can be separated. It would be possible that evolution, when it occurs, usually occurs rapidly (on a geological timescale), but that the mutations responsible are individually quite small. For example, small increases in size might cumulatively double the size of an organism within a few thousand years, which would be ‘sudden’ on a geological timescale. Conversely, it would be logically possible for a large phenotypic change, such as a reduction from four to two legs, to occur as the result of a single mutation, but to take a very long time to spread through a species. In practice this seems unlikely. Large mutations are seldom advantageous, but when they are, the selective advantage is also likely to be large, and the spread of the mutation will be rapid.
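The arithmetic behind the size-doubling example can be sketched in a few lines. The growth rate of 0.1% per generation and the one-year generation time are assumptions chosen purely for illustration, not figures from Darwin or from any study, and the function name is hypothetical.

```python
import math

def generations_to_double(rate_per_generation):
    """Smallest whole number of generations needed for size to at least
    double, assuming a constant proportional increase per generation."""
    if rate_per_generation <= 0:
        raise ValueError("rate must be positive")
    return math.ceil(math.log(2) / math.log(1 + rate_per_generation))

# Assumed figures, for illustration only.
rate = 0.001               # 0.1% mean increase in size per generation
years_per_generation = 1   # an annually reproducing species

n = generations_to_double(rate)
print(n, "generations, about", n * years_per_generation, "years")
```

On these assumptions the doubling takes 694 generations, comfortably within ‘a few thousand years’ even though each step is far too small to notice.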

These two senses of ‘gradualism’ depend on terms such as ‘slow’, ‘rapid’, ‘large’, and ‘small’, which are themselves vague. They can be made more definite by specifying what is meant by ‘large’, ‘rapid’, etc. For example, it might be suggested that an individual mutation affecting a quantitative trait is ‘large’ if it produces a trait more than three standard deviations larger or smaller than the current mean size. Or the rate of evolution might be considered ‘rapid’ if more than 90% of evolutionary change occurs within only 10% of the available time. Stephen Jay Gould proposed that speciation be considered ‘punctuational’ if it takes place in less than 2% of the duration of the species.
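The three suggested criteria can be written down as simple tests, which makes clear how much depends on the chosen thresholds. This is a minimal sketch: the function names are hypothetical, and treating ‘10% of the available time’ as a single unbroken stretch is my own simplifying assumption.

```python
def is_large_mutation(trait_value, mean, sd, threshold_sd=3.0):
    """True if the mutant's trait lies more than threshold_sd standard
    deviations above or below the current population mean."""
    return abs(trait_value - mean) > threshold_sd * sd

def is_rapid(changes, change_share=0.9, time_share=0.1):
    """changes: amount of change in each of n equal time intervals.
    True if one unbroken run of intervals covering time_share of the total
    time (assumption: contiguous) holds more than change_share of the
    total absolute change."""
    n = len(changes)
    total = sum(abs(c) for c in changes)
    if n == 0 or total == 0:
        return False
    window = max(1, int(n * time_share + 1e-9))
    best = max(sum(abs(c) for c in changes[i:i + window])
               for i in range(n - window + 1))
    return best / total > change_share

def is_punctuational(speciation_years, species_duration_years, threshold=0.02):
    """Gould's criterion as described in the text: speciation occupies less
    than 2% of the species' duration."""
    return speciation_years / species_duration_years < threshold

# Made-up figures for illustration.
print(is_large_mutation(13.5, mean=10.0, sd=1.0))   # 3.5 SD from the mean
print(is_rapid([0.0] * 95 + [1.0] * 5))             # all change in the last 5%
print(is_punctuational(50_000, 5_000_000))          # 1% of the duration
```

All three examples return True. Changing any threshold changes the verdict, so these definitions make the terms precise without making them any less arbitrary.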

I had intended to cover both types of gradualism in this post, but for convenience I will now defer consideration of type (B) to another post.

Darwin’s views

So far as the rate of evolutionary change is concerned, Darwin says repeatedly that it is slow, e.g.: ‘we have every reason to believe the process of manufacturing new species to be a slow one’ [Origin, Variorum text, 140], ‘that natural selection will always act with extreme slowness, I fully admit’ [201. In the 6th edition he changed 'will always act' to 'generally acts']. This does not imply that the rate of evolution is constant, and Darwin said several times that it is irregular and intermittent (see Part 1 of this series).

‘Slow’ is a vague term, but Darwin believed that evolution in natural populations is unlikely to be detectable within a human lifetime. Evolution by deliberate artificial selection could produce noticeable changes more quickly, but Darwin thought this was unlikely for species in nature. A closer analogy would be with ‘unconscious’ artificial selection, by which ‘various breeds have been sensibly changed in the course of two or three centuries’[486], but species in nature ‘probably change much more slowly’ than this[486], implying a timescale of thousands rather than hundreds of years. One secondary source does however hint at a possibility of more rapid change: E. B. Ford, in his Ecological Genetics [4th edn., p. 393], says ‘Major Leonard Darwin told me of a conversation with his father, the great Charles Darwin, who expressed his belief that by choosing the right material it might be possible actually to detect evolutionary changes taking place at the present time. For this purpose he said that long-continued investigations and careful records would be needed, extending over a period which he estimated at perhaps fifty years in species reproducing annually’.

Darwin apparently believed that substantial evolutionary change, sufficient to form a new species, was usually slow even by geological standards: ‘although each formation may mark a very long lapse of years, each probably is short compared with the period requisite to change one species into another’ [495. The word 'probably' replaced 'perhaps' from the 4th edn. onwards]. It is not entirely clear what Darwin means by a ‘formation’. The term is generally used to refer to the smallest units of the stratigraphic column having their own distinctive rock type and fossil fauna, such as the Chalk of the Cretaceous period. With modern dating methods most such units would be estimated as having a duration of over a million years. In Darwin’s time there was much debate about the absolute length of geological time, with physicists arguing (wrongly) that the age of the Earth, and therefore the entire geological record, could not be greater than 100 million years. Even with this timescale, formations would usually have a duration of at least 100 thousand years.

Darwin recognised that his doctrine of slow evolutionary change, in which the emergence of a new species would usually take longer than the duration of a geological formation, faced serious difficulties. It appeared to conflict with a literal reading of the geological record. Darwin identified three main difficulties:

1. New species were not observed to evolve gradually from old ones: they usually appeared quite suddenly in the record.

2. Whole groups of related species sometimes appeared simultaneously in the fossil record.

3. The fossil record itself began rather suddenly at the base of the Paleozoic era, and showed considerable diversity from the outset.

These phenomena appeared inconsistent with Darwin’s gradualism. Darwin’s answer was that the fossil record could not be taken at face value. In his chapter on ‘The Imperfection of the Geological Record’ he examines the processes by which fossils are formed, preserved, and then discovered, and argues that the fossil record gives a very incomplete picture of past life. Sediments can only be deposited over long periods in the same area if the land (or more usually the sea bed) is gradually subsiding at a rate just matched by the supply of new sediment, and the circumstances in which this can happen are rare. The geological record in any one place is therefore usually intermittent, and shows occasional periods of sedimentation interrupted by long gaps during which either there is no deposition, or in which deposits are subsequently eroded. Even where there is no conspicuous discontinuity, as shown by differences in the inclination of bedding planes, there may be long gaps in the record. This is not just a theoretical prediction, but can be demonstrated by cases where formations known to exist in some localities are missing from a seemingly continuous sequence elsewhere.

The sudden appearance of new species in a formation therefore does not imply that they have been suddenly evolved (or created). The alternatives, consistent with gradualism, are that evolution has taken place in the same locality during an unrecorded period, or that the species has evolved elsewhere and then migrated into the area, producing a seemingly instantaneous change in the fossil record. Darwin recognises both of these possibilities [496, 499].

Modern views and controversies

It would now generally be accepted that the fossil record in any one locality is very incomplete. As the geologist Derek Ager put it, ‘there is more gap than record’. Whether this entirely explains the apparent patterns of evolution as observed in the record is more controversial. The doctrine of punctuated equilibrium (PE) maintains that the apparent pattern is true to the reality, and that evolutionary change (in any given species) is usually concentrated in relatively short periods of time, too short to be often preserved in the fossil record. The corollary of this is that for most of the time there is ‘stasis’, i.e. an absence of significant change. However, in the ‘fine print’ of PE it is admitted that there may be gradual changes in size. The doctrine of stasis is also weakened to the claim that there is no ‘directional’ change, leaving the possibility that there is fluctuating change rather than strict stasis. It is also admitted that there may be changes in pigmentation, behaviour, and other traits which are not detectable in the fossil record. Taken together, these qualifications substantially dilute the doctrine of ‘stasis’.

Most evolutionary biologists accept that punctuation and stasis are possible, but maintain they are consistent with orthodox neo-Darwinism. They point out that the notion of ‘phyletic gradualism’ denounced by PE – a belief in steady and continuous evolutionary change in a definite direction – is a straw man, and that ever since G. G. Simpson’s work in the 1940s evolutionists have believed in wide variations in the rate of evolution, including bradytely (near-stasis) and ‘quantum evolution’ (rapid bursts of change, often associated with an adaptive radiation).

There remains the question how far the core doctrine of PE is empirically supported. Advocates of PE have compiled meta-analyses which purport to show that the majority of evolutionary change conforms to the PE pattern. However, any meta-analysis can only be as good as the primary data it is based on. In view of the notorious disagreements among paleontologists, such as the division between ‘lumpers’ and ‘splitters’, one may be sceptical about any analysis which simply takes paleontological data (such as lists of species and genera) on trust. As recently as 1989 (some 20 years after PE was first advocated) an eminent paleontologist could still say that ‘many practising palaeontologists do not record adequate stratigraphic data because they do not think that such information is useful’. [Christopher Paul in Evolution and the Fossil Record, p. 102] There appear to be few studies which directly show punctuation ‘events’ occurring in the fossil record, and it is part of the PE doctrine that such events are unlikely to be recorded. (In this respect PE agrees with Darwin in appealing to the incompleteness of the fossil record.) The sudden appearance of a new species in a formation is not in itself proof of punctuational evolution, since the species may have evolved slowly elsewhere and then migrated. One of the few cases claimed as showing punctuation in action (P. G. Williamson’s study of Lake Turkana molluscs) has recently been re-assessed as a case of migration: see here.

There is wider agreement that ‘stasis’ is often observed, as can be shown by striking cases of near-identical fossils separated by millions of years in time. Whether stasis is a predominant mode is less clear. The majority of recorded fossil species are known only from a single specimen, so there is no means of assessing their rate of evolutionary change. In studies of living species, there is little evidence of stasis in a strict sense. Almost every population that has been studied in detail has shown some evolutionary change going on. Most species that are geographically widely distributed also show geographical variation, which implies recent evolutionary change. Some evolutionary biologists, including John Maynard Smith and George C. Williams, have been disturbed by the apparent conflict between the evidence of change among living species and the evidence of stasis in the fossil record. One possible resolution is to suppose that most changes are short-term fluctuations which are soon reversed, and that species have an underlying stability due to ecological factors. Only if the ecosystem breaks down, as in a mass extinction, will there be a burst of more substantial and lasting change. Another possible resolution of the puzzle is that different types of species in fact show different evolutionary patterns. Most studies of evolution among living species are of land animals, whereas most paleontologists study marine invertebrates. It is possible that evolution among marine invertebrates is in general slower than among land animals. A case in point is the marine invertebrates on the opposite sides of the Panama isthmus. The emergence of the isthmus is dated to around 3.5 million years ago. Since then the populations have been isolated from each other and have generally evolved slight morphological differences, to the extent that they are classified as closely related ‘geminate’ species. If however they were only known as fossilised remains, and it were not known that the populations had been geographically separated, they would probably be lumped together and regarded as a case of ‘stasis’, with no significant change and no speciation over a period of at least 3 million years. If such slow rates of change are the rule among marine invertebrates, Darwin could be right in arguing that evolution may be too slow, rather than too fast, to be often observable in the fossil record. A final consideration is that paleontologists are interested especially in those parts of animals which are convenient for classification, and these are often features that can be counted, such as the number of teeth in the hinge of a bivalve shell, rather than continuous quantitative variables which require careful measurement. But there are theoretical reasons for supposing that these countable or ‘meristic’ traits are subject to stabilising selection; see the section on meristic traits in Fisher’s GTNS. If so, they would tend to change only rarely, but when they do, the change would be rapid as the species moves from one stable state to another.

The Cambrian Explosion

The sudden appearance of a diverse range of fossil life at the base of the Paleozoic (now often known as the Cambrian Explosion) was recognised by Darwin as a special problem [512-6]. He considered that if his theory was true, there must have been a long period of evolution before the earliest known fossils, probably longer than the period of the known fossil record itself. Why then were there no fossils from this early period? (Darwin did mention fossils from the late pre-Cambrian Longmynd formation, and in later editions of the Origin he mentioned the so-called Eozoon in the pre-Cambrian of Canada. Eozoon is now regarded as inorganic, while the interpretation of the Longmynd fossils remains uncertain – see here.) Darwin speculated on possible explanations, such as very long term changes in the position of land and sea, which might have obliterated the pre-Cambrian record, but he concluded ‘The case at present must remain inexplicable; and may be truly urged as a valid argument against the views here entertained’.

Subsequent discoveries have partially resolved Darwin’s problem. Microscopic fossils have now been discovered far back in the early pre-Cambrian, but larger organisms are not found until quite late in the pre-Cambrian (the Ediacaran fauna), and hard-bodied animals not until the base of the Cambrian. The appearance of hard-bodied animals was not quite as sudden and simultaneous as believed in Darwin’s time, as small shells, and trace fossils attributed to arthropods, have been found in the early Cambrian before the appearance of the trilobites, molluscs, echinoderms, and other forms known in Darwin’s time. But the appearance of hard bodies is itself a relatively sudden event which requires some explanation, and a number of speculative hypotheses have been proposed. It also remains unclear whether the division of animals into the existing major phyla (arthropods, echinoderms, etc) was simultaneous with the Cambrian Explosion, or whether there was a long prior period of ‘cryptic’ evolution of phyla among small soft-bodied forms. There is some molecular evidence to support the ‘cryptic’ theory, and it is not inherently absurd. Most existing phyla of small soft-bodied animals, such as Rotifera, have no fossil record to speak of, despite presumably having existed in vast numbers since at least the Cambrian (see the table on p.186 of James W. Valentine, On the Origin of Phyla, 2004). The issue of timing remains unresolved (Valentine, p.195).

Overall, it is not clear whether Darwin overestimated the gradualism of the rate of evolution. Some of the issues raised by Darwin, such as the nature of the Cambrian Explosion, remain active and controversial areas of research. Darwin was undoubtedly right – and frank – in highlighting ‘the most obvious and gravest objection which can be urged against my theory’ – the lack of direct evidence in the fossil record for the gradual transitions which the theory postulated. Subsequent paleontological research has provided some examples of the kind of transitions required, but these remain a small minority of all species in the fossil record. This has led some paleontologists and biologists to think that Darwin also put too much emphasis on gradualism in sense (B): that is, his insistence that the variations favoured by natural selection were always very small. If larger variations – ‘saltations’ or ‘macromutations’ – are important in the evolutionary process, the problem of the scarcity of gradual transitions in the fossil record would largely disappear. I will consider gradualism in sense (B) in my next post.

• Category: Science 

This is the fourth in a series of posts about Charles Darwin’s view of evolution. Previous posts were:

1: The Pattern of Evolution.
2: Mechanisms of Evolution.
3: Heredity.

The present part deals with the subject of speciation, that is, the formation of new species. Modern commentators often regard this as one of the weaker parts of Darwin’s theory. They complain either that Darwin didn’t understand the problem of speciation, or that he did, but gave the wrong solution. On the other hand, some biologists reject the current orthodoxy, and suggest that Darwin’s approach was closer to the truth.


Speciation. By speciation I mean the formation of a new species. This may either occur by change of an existing species to the point where it is classified as a new species, or by the splitting of an existing species into two or more different species. Some authors prefer to confine the term speciation to the latter process (splitting, or ‘cladogenesis’), but I will use it in the broader sense.

The term ‘speciation’ was not coined until early in the 20th century, and therefore was not used by Darwin himself. In letters he occasionally used the term ‘specification’ with much the same sense: Life and Letters, vol.3, p.160, letter of 26 November 1878 to Karl Semper, and More Letters, vol.1, p.380, letter of 25 November 1869 to George Bentham.

Species. The most widely used modern definition of ‘species’ is Ernst Mayr’s Biological Species Definition (BSD), according to which species are ‘groups of actually or potentially interbreeding natural populations which are reproductively isolated from other such groups’ [Mayr, 19]. If two sets of organisms are living in the same place at the same time, and are successfully interbreeding, then they belong to the same species. If they are living in the same place at the same time, but are not successfully interbreeding to any significant extent, then they belong to different species. If two populations live at different places and/or times, they cannot be ‘actually’ interbreeding, and the question is then whether they are ‘potentially’ interbreeding. As Mayr recognises, this cannot usually be directly tested [Mayr, 22]. The crucial question is whether there are ‘isolating mechanisms’ that would be sufficient to prevent successful interbreeding if the two populations were combined under natural conditions [Mayr, chapter 5]. Geographical barriers or distance by themselves do not count as isolating mechanisms, since two separated populations may be reunited and then interbreed. Some intrinsic difference of genetics or behaviour, sufficient to prevent successful interbreeding, is necessary for reproductive isolation[91].

The BSD is not universally accepted. Some taxonomists find little use for it, as the status of ‘potential interbreeding’ is often uncertain. Mayr himself admitted that the BSD is not always applicable (for example, to wholly asexual species). The BSD does however have one great merit for theoretical purposes. If the pattern of evolution is ‘tree-like’, as generally accepted by modern biologists, then a crucial part in any system of classification is played by the points at which branching takes place. A species, under the BSD, marks the lowest level of classification at which two populations have passed such a point. It is desirable to have a term to mark this distinction, and the BSD meets this requirement. The problem remains that in many cases there is no way of verifying whether two populations have passed the point of separation.

Modes of speciation

A large number of different terms have been used to describe processes, observed or hypothetical, by which speciation (in accordance with the BSD) may occur. Usage by different authors is not always consistent. I will use the following terms.

Allopatric speciation occurs when two or more populations of the same species, living in different places (the literal meaning of ‘allopatric’) are separated by a geographical barrier or unoccupied space, and become reproductively isolated from each other. (This is equivalent to what Mayr calls ‘geographic’ speciation.) Reproductive isolation may be acquired by evolutionary changes in the populations following their geographic separation, or by the extinction of intermediate forms in part of a continuous range, leaving the remaining forms reproductively as well as geographically isolated from each other.

Sympatric speciation occurs when two or more species are formed out of a single species living in the same place (the literal meaning of ‘sympatric’). On this interpretation of ‘sympatric’, the terms ‘allopatric’ and ‘sympatric’ do not exhaust the possibilities, because there could be a third case where new species arise in adjacent parts of a continuous range. Mayr, on the other hand, uses ‘sympatric’ to cover every case other than ‘allopatric’.

Parapatric speciation is the third case just mentioned, where new species arise in adjacent areas without geographical separation between them. Mayr calls this hypothetical process ‘semigeographic’ speciation [Mayr 525].

Peripatric speciation is a form of allopatric speciation where new species arise from relatively small geographically isolated populations on the periphery of a species range.

Stasipatric speciation occurs when a new species is formed in a relatively small locality within an existing species range, displacing or coexisting with the parent species, but not interbreeding with it. It is generally assumed that in this case reproductive isolation occurs as a result of polyploidy or some other major change in the chromosomes, and some authors make this part of the definition of ‘stasipatric’. I prefer to avoid assuming a particular mechanism of reproductive isolation, and will use ‘stasipatric’ to refer to any form of speciation in a small area within a species range.

Theories of speciation

In the second half of the 20th century the dominant theory of speciation was that of Ernst Mayr. Mayr maintained that, except for speciation by polyploidy and other major chromosomal changes, the only major form of speciation was allopatric (or geographic in Mayr’s terminology). He particularly stressed the importance of peripatric speciation in small isolated areas. He argued vigorously against sympatric and parapatric speciation. His main theoretical argument was that major divergence between populations, to the point of reproductive isolation, is not possible without geographical obstacles to gene flow. He did however define ‘geographical obstacles’ very widely, so that, for example, different host species of parasites could be regarded as spatially separated [Mayr 349]. There might be geographical obstacles even within a single fresh water lake [Mayr 465]. Mayr accepted that polyploidy was an important mode of speciation among plants and some invertebrates, but disputed the importance of chromosomal changes in speciation among vertebrates.

Mayr’s views dominated post-war thinking on speciation, but were increasingly challenged from about 1970 onwards. Mayr’s rejection of sympatric and parapatric speciation was based mainly on verbal arguments, whereas quantitative models showed that sympatric and parapatric speciation were theoretically possible. There also seemed to be cases, like species swarms of fishes in lakes, which were difficult to explain by allopatric speciation. However, recent reviews of the evidence suggest that there are few clear examples of sympatric speciation [C&O 178], while it is very difficult to distinguish between the effects of allopatric and parapatric speciation [C&O 118] in those situations where the question arises.

Darwin on Speciation

There is no single section in the Origin devoted to what we now call speciation, so the first step is to identify which parts of Darwin’s work are relevant. Many parts of the Origin could have some bearing on the subject, but the following are the main ones:

‘Variation under Nature’ – important for its discussion on the distinction between species and varieties

‘Natural Selection’ – important for the discussion of circumstances favourable and unfavourable to natural selection, including isolation and interbreeding, and for the ‘principle of divergence’

‘Difficulties of the theory’ – probably the most important chapter for the problem of speciation, because it deals with the question how varieties and species can be formed despite interbreeding

‘Hybridism’ – discusses the evidence on interspecific breeding and the viability and fertility of hybrids

‘Geographical Distribution’ (two chapters) – important for discussion of isolation and means of dispersal.

‘Recapitulation and Conclusion’ – contains brief statements of most of Darwin’s key propositions.

Relevant comments may also be found in other works, and in Darwin’s correspondence.

Darwin’s definition of species

Darwin does not propose a formal definition of ‘species’, and he implies that any such definition would be arbitrary. He argues, especially in the section ‘doubtful species’ [Origin, 126-38], that there is no sharp distinction between varieties and species: ‘Certainly no clear line of demarcation has as yet been drawn between species and sub-species – that is, the forms which in the opinion of some naturalists come very near to, but do not quite arrive at the rank of species; or again between sub-species and well-marked varieties, or again between lesser varieties and individual differences’. Summing up his position, he says: ‘From these remarks it will be seen that I look at the term species as one arbitrarily given for the sake of convenience to a set of individuals closely resembling each other, and that it does not essentially differ from the term variety, which is given to less distinct and more fluctuating forms’. In the chapter on Hybridism Darwin discusses the varying degrees of intersterility and viability of offspring between recognised species, and concludes that ‘neither sterility nor fertility affords any clear distinction between species and varieties’[427]. Darwin was of course concerned to refute the traditional view that species were created with unbridgeable differences between them, and that intersterility was a special endowment designed by the Creator to keep them separate. This may have led him to play down the importance of reproductive isolation as a criterion of species status. It is unlikely that he would have accepted the BSD, if it had been put to him.

Darwin on Speciation

Regardless of whether or not Darwin would have accepted the BSD, we may still ask whether his theory can account for the division of existing species into new species as defined under the BSD. In other words, does he adequately explain reproductive isolation?

Some critics would claim that Darwin did not even recognise the problem, and that he therefore did not offer a theory of speciation at all. But this is an inaccurate criticism, as a section of the chapter on ‘Difficulties of the theory’ is devoted to the problem. Whether or not one agrees with Darwin’s ‘solution’, he did offer one.

First, it may be noted that Darwin deliberately rejected one tempting option: the proposal that the barriers to interbreeding between species were due to the natural selection of sterility between them. In a lengthy correspondence Alfred Russel Wallace tried to persuade Darwin to accept this solution, but after much agonising Darwin rejected it. He concluded that the observed pattern of sterility and fertility was difficult to reconcile with an explanation by natural selection, for example because species from widely separated areas, where there could be no selective pressure against interbreeding, were nevertheless often intersterile, or produced sterile hybrids, in captivity. He also saw a fundamental theoretical objection to Wallace’s theory. Wallace argued that intersterility would be selected because it was beneficial to the species, or to the variety, but Darwin pointed out that there would be no advantage to individuals (or indirectly to their ‘nearest relatives’ or other individuals of the same variety) in a reduction of fertility [444]. He therefore did not see how the sterility could be initiated and gradually increased by natural selection. This is one of Darwin’s most important discussions of ‘levels of selection’, and I will return to it in another post. Since intersterility could not be explained by natural selection, or by a ‘special endowment’, Darwin concluded that it was a by-product, ‘an incidental result of differences in the reproductive systems of the parent species’[425]. This would be generally accepted by modern biologists.

So how did Darwin explain the divergence of varieties to the extent of what we now call speciation?

An important part of the answer was always geographical isolation. Long before the Origin, in a letter of 1844 to Joseph Hooker, Darwin wrote that ‘the most general conclusion, which the geographical distribution of all organic beings, appears to me to indicate, is that isolation is the chief concomitant or cause of the appearance of new forms’ [L&L, ii, 28]. In the Origin the emphasis on isolation is somewhat reduced, but it is still one of the most important factors, for example in the chapters on geographical distribution.

For highly mobile animals, Darwin comes close to regarding isolation as essential for speciation: ‘intercrossing will affect those animals most which unite for each birth, which wander much, and which do not breed at a very fast rate. Hence in animals of this nature, for instance in birds, varieties will generally be confined to separate countries, and this I believe to be the case’[194].

In contrast, for ‘hermaphrodite organisms which cross only occasionally, and likewise in animals which unite for each birth, but which wander little and which can increase at a very rapid rate, a new and improved variety might be quickly formed on any one spot, and might there maintain itself in a body, so that whatever intercrossing took place would be chiefly between the individuals of the same new variety. A local variety when thus formed might subsequently slowly spread to other districts’[194]. This might be regarded as a form of stasipatric speciation.

For the generality of organisms, which are neither highly mobile nor static, the problem remains of explaining how distinct species are formed, rather than a smooth continuous distribution, now called a cline[323]. Darwin considers the possibility that intermittent periods of isolation have always been involved in speciation. But Darwin rejects this option, saying ‘I will pass over this way of escaping from the difficulty; for I believe that many perfectly defined species have been formed on strictly continuous areas; though I do not doubt that the formerly broken condition of many areas now continuous has played an important part in the formation of new species, more especially with freely crossing and wandering animals’[324]. To account for the formation of distinct species within a continuous area, Darwin describes what would now be called a form of parapatric speciation. He stresses that the organic and inorganic environment seldom change smoothly. Within the range of a species, there are likely to be zones where conditions are relatively unfavourable, and the population will be sparse and liable to periodic ‘extermination’ [325], for example when a predator or prey species fluctuates in numbers. For these reasons the population of a species will be much larger and more continuous (in time) in some areas than others. The areas where the population flourishes will be more favourable to evolution, since there will be more chance for new variations to arise, whereas in sparsely populated intermediate areas there will be less variation, the population will be liable to ‘accidental extermination’, and intermediate forms will be constantly at risk of being overrun by the more successful surrounding varieties, which are more sharply distinct.[326]

This account of the processes leading to parapatric speciation has much in common with more modern approaches. There is however still one element missing from Darwin’s theory. Modern theories generally incorporate the idea that behavioural mechanisms will evolve to discourage mating between different varieties in ‘border’ zones, where the offspring of such matings would be disadvantaged. There is perhaps a hint of such mechanisms in one remark of Darwin, where he says that ‘I can bring a considerable category of facts, showing that within the same area, varieties of the same animal can long remain distinct, from haunting different stations [ecological niches], from breeding at slightly different seasons, or from different varieties of the same kind preferring to pair together’[194]. But this passage is a long way from Darwin’s discussion of his ‘parapatric’ model, and it would be straining interpretation to suppose that he intended them to be connected.

We may conclude that Darwin believed in the occurrence of allopatric, parapatric, and possibly stasipatric modes of speciation, even if he did not by modern standards have fully worked-out models of the process.

There remains the question whether Darwin also believed in the occurrence of sympatric speciation. Of course, if ‘sympatric’ is defined so as to include ‘parapatric’, then the answer is trivially ‘yes’. But if we define ‘sympatric’ more narrowly, to require divergence of two populations living together in the same or widely overlapping areas, then the answer is not so clear. Some modern commentators are confident that Darwin did accept sympatric speciation in this sense [C&O 125]. Against this, we may set a quite explicit denial by Darwin himself: ‘I do not believe that one species will give birth to two or more new species, as long as they are mingled together within the same district. Nevertheless I cannot doubt that many new species have been simultaneously developed within the same large continental area; and in my ‘Origin of Species’ I endeavoured to explain how two new species might be developed, although they met and intermingled on the borders of their range. [Darwin's emphasis] It would be a strange fact if I had overlooked the importance of isolation, seeing that it was such cases as the Galapagos Archipelago, which chiefly led me to study the origin of species’ [letter of 13 October 1876 to Moritz Wagner, Life and Letters, iii, 159]. One could hardly expect a clearer statement of the distinction between parapatric and sympatric speciation, or a clearer rejection of the latter. How then can it be maintained that Darwin believed in sympatric speciation?

It is, unfortunately, common for an author to be confused or inconsistent in his or her views, so a clear denial by Darwin of sympatric speciation in one passage does not rule out his acceptance of the process elsewhere. The interpretation of Darwin as an advocate of sympatric speciation rests on the section on ‘divergence of character’ in the chapter on Natural Selection. Here Darwin attempts to explain why the descendants of a single species diverge into many different types. The general explanation is that there are advantages in an ecological ‘division of labour’: ‘the more diversified the descendants from any one species become in structure, constitution and habits, by so much will they be better enabled to seize on many and widely diversified places in the polity of nature, and so be enabled to increase in numbers’.[207] They may, for example, feed on different kinds of prey, or live in different habitats such as trees or water. Even in the same patch of ground, a diverse mixture of species and genera of plants will produce more vegetation than a single species or variety.[207]

The principle of divergence of character is important and in general plausible, but its application to varieties within a single species and in a single geographical area is problematic. If Darwin had confined the principle to varieties which had already reached the stage of distinct species, there would be no problem, but some of his wording does seem to apply to sub-specific varieties. Darwin opens his discussion by asking, ‘how then does the lesser difference between varieties become augmented into the greater difference between species?’[205], and the principle of divergence is his ostensible answer[208]. There would still be no problem if he confined the divergence of varieties to cases where they live in different areas, but he does not explicitly limit the principle in this way, and some of his illustrations of the principle seem to involve such cases; notably, he refers to different varieties of grass in the same patch of ground[207]. Yet the evolution of different varieties within the same small area would conflict not only with Darwin’s clear contrary statement to Wagner, but with those passages of the Origin itself which deal with the ‘blending’ of varieties through interbreeding. Moreover, even in the section dealing with divergence of character, Darwin goes on to say that the animals and plants living on a small patch of ground, and which therefore compete most severely with each other, in general belong to different genera or orders: ‘where they come into the closest competition with each other, the advantages of diversification of structure, with the accompanying differences of habit and constitution, determine that the inhabitants which thus jostle each other most closely, shall, as a general rule, belong to what we call different genera and orders’.[208] In this case they cannot be recently descended from different varieties of the same species. It is all rather confusing.

The charitable interpretation is that Darwin wished to deal only with one issue at a time, and intended his discussion of divergence to be qualified by the discussion of ‘blending’ in the chapter on ‘Difficulties of the Theory’. I think however it is more likely that Darwin simply overlooked the tension between his comments on divergence and his comments on blending.

Overall, Darwin’s position on the modes of speciation is pluralistic. He recognised what we call allopatric and parapatric speciation, and possibly also stasipatric speciation. His position on sympatric speciation is more doubtful.

This pluralism contrasts with the dominant modern doctrine of Ernst Mayr, which recognises only allopatric speciation (with polyploidy and other major chromosomal changes admitted as a special exception). Mayr and his adherents therefore found Darwin’s position unsatisfactory. Orthodox ‘Mayrism’ has however come under increasing criticism in the last few decades. Not surprisingly, some of those who criticise Mayrism have found support in Darwin’s writings, and applaud his supposed acceptance of sympatric speciation [H&B 90]. The importance and prevalence of different modes of speciation remain open questions in evolutionary biology.


Origin: Charles Darwin, The Origin of Species: a Variorum Text, ed. Morse Peckham, 1959, reprinted 2006.
Mayr: Ernst Mayr, Animal Species and Evolution, 1963.
C&O: Jerry Coyne and H. Allen Orr, Speciation, 2004.
H&B: Endless Forms: Species and Speciation, ed. D. J. Howard and S. H. Berlocher, 1998.
L&L: Life and Letters of Charles Darwin, 3 vols, ed. Francis Darwin.
More Letters: More Letters of Charles Darwin, 2 vols, ed. A. C. Seward and Francis Darwin.

• Category: Science 