I have no idea what you will be thinking or doing on 12th December, but efforts are being made to determine how UK citizens will vote on that day. Why the fuss? A rational approach to elections is to read the party manifestos, judge the personal and societal impact of the proposals, calculate the probability of the promises being kept, and then vote in advance by postal ballot. That done, the good citizen has no need of reading the news, watching debates, or even waiting in a queue on election day. Certainly, the rational citizen has no need of opinion polls: they are not relevant to informed decision-making.
In a further show of rationality, the wise elector will not stay up on election night, since the event cannot be influenced by watching it. Wiser still, the elector should not check the results for a few days, to avoid wasting time with dangling chads, recounts and other frivolities.
Sadly, the current culture of instant gratification requires a daily feed of speculation about how the election is going, that is, how other people are thinking about which way they might vote on election day, and whether they are likely to vote at all. That provides hours of comment. Then, in a meta-analytic frenzy, commentators discuss how much the opinion polls themselves are causing citizens to change their intended behaviours, as they realize that they are in a minority or, conversely, in such a majority that they might be giving too much power to the most popular party. Others, feeling exposed at being revealed to be in a minority, switch to the most popular party, just as others are abandoning it.
Opinion polls took off in the US in the 1920s. Gradually pollsters learned about the effects of selection bias. If you only poll the readers of your newspaper, you leave out the majority who don’t read it, and certainly those who don’t read any newspapers. Sampling is crucial. However good your sample, you can never include those who don’t want to be sampled. Different avenues of approach lead to different types of voter. Telephones (remember those?) only catch older people at home, and they may answer at only 20% of the home numbers dialled.
To keep calm and avoid election fever, I have been reading a book on statistics. The best statistics books are those one does not admit reading. They explain complex matters in simple terms, and one is forever grateful, without admitting it.
David Spiegelhalter. The Art of Statistics: Learning from data. Random House, 2019.
This is a good book, with helpful explanations which concentrate on key concepts, not specific formulae. Brian Everitt, one of the developers of cluster analysis, always said to me that it was a deficiency of statistics that when you asked a statistical question you got a number instead of an answer. “Yes” or “no” are generally the answers one is searching for.
Far from bringing solace, the book discusses the problem of using opinion polls to predict election results, taking the UK election of 2017 as a prime example of the shortcomings of conventional techniques, which failed to spot a late surge for the Labour party, leading to a hung Parliament and a precarious working majority derived from a cobbled-together Conservative coalition.
Can statisticians do better in 2019?
Spiegelhalter’s book is worth reading because he poses interesting questions, and answers them without numbers (or at least, without too many complicated numbers in the first instance). His aim is to get you to think straight about problems, and to solve them in a systematic way, leaving the number and calculation issues till later. Think hard, plan carefully, and then you can let the (properly selected and presented) numbers do the talking. OK, numbers don’t talk, but if you have thought things through, then you can explain the findings in ordinary language.
For example, has a nice family doctor been murdering his patients? How does one estimate normal death rates in medical practices? What counts as an excess? Anything else worth measuring? Is time of the day the death occurs something worth counting?
Spiegelhalter shows in a simple figure (page 5) that Dr Harold Shipman’s victims disproportionately died in the afternoons, when he did his home visits, and administered a lethal opiate overdose to at least 215 elderly patients. As Spiegelhalter dryly observes: “The pattern does not require sophisticated statistical analysis.”
Spiegelhalter’s approach is immensely sensible. He shows that statistics require careful thinking, and only then some number crunching, followed by an honest depiction of the findings. He is a good guide to statistics, particularly for those who panic at the sight of mathematical notation. He is good at explaining (yet again) the difference between relative and absolute risk; the distorting effects of question framing (in the UK 57% supported “giving 16 and 17 year olds the right to vote” but only 37% agreed with the logically identical proposal to “reduce the voting age from 18 to 16”); and the distorting effects of telephone polls which do not declare what proportion of the telephone numbers dialled never answered. He explains that a causal link does not mean that every single person shows an effect (many people smoke and don’t get cancer, and some non-smokers do get cancer, but smoking makes cancer more common among those who smoke than among those who don’t). He advises against relying on a single study: instead, review all studies systematically. Some potential causes can be called “lurking factors”, but actual causes follow the Bradford Hill criteria (page 115): the effect size is so large that it cannot be explained by plausible confounding; there is appropriate temporal or spatial proximity, in that cause precedes effect and the effect occurs after a plausible interval, or at the same site; the effect increases as exposure increases, and reduces upon reduction of the dose; there is a plausible mechanism of action; the effect fits in with what is known already; the effect replicates; and the effect is found in similar but not identical studies.
He is excellent at explaining regression to the mean (two thirds of the apparently beneficial effect of speed cameras is due to regression to the mean); at discussing the bias/variance trade-off (over-fitting predictors to correct “bias” and reflect local circumstances, at the cost of lower reliability); and at showing that, when algorithms are accused of prejudice in predicting whether criminals will re-offend, notions of justice are being favoured over predictive accuracy.
There is much to recommend in this book. He covers a wide range of statistical issues with clarity, particularly on probability, so Chapters 8 and 9 are worth buying the book for. I will probably refer further to this book in subsequent posts.
What does this mean for the UK election on 12 December? I will try to explain. The current YouGov snapshot shows the following voting intentions, and what they are likely to mean in terms of parliamentary seats.
UK elections are carried out in 650 constituencies, and use the “first past the post” system, in which the winner is the one with most votes, and becomes the Member of Parliament, and all the other votes contribute nothing to the election. Harsh but effective, like the race between spermatozoa.
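To see how harsh the system can be, here is a minimal sketch with three invented constituencies and three invented parties (the names and vote counts are hypothetical, not real results). It applies the first-past-the-post rule seat by seat and then compares the seat tally with the national vote.

```python
from collections import Counter

# Hypothetical vote counts in three invented constituencies (not real data).
results = {
    "Seat 1": {"Red": 20_000, "Blue": 19_000, "Yellow": 5_000},
    "Seat 2": {"Red": 18_000, "Blue": 17_500, "Yellow": 8_000},
    "Seat 3": {"Red": 5_000, "Blue": 30_000, "Yellow": 4_000},
}

def fptp_winner(votes):
    """Return the party with the most votes; every other vote counts for nothing."""
    return max(votes, key=votes.get)

# Seats won by each party under first past the post.
seats = Counter(fptp_winner(votes) for votes in results.values())

# National vote totals, which play no part in deciding the seats.
national = Counter()
for votes in results.values():
    national.update(votes)

print("Seats:", dict(seats))
print("National votes:", dict(national))
```

In this made-up example Blue piles up votes in one safe seat and narrowly loses the other two, so Red takes two of the three seats despite having far fewer votes overall.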
Most constituencies vote solidly one way or the other, so can almost be counted in advance, and are ignored by campaigners who concentrate on the “marginals” where a change of winner is possible. The constituency boundaries are somewhat biased towards urban centres: urban constituencies usually have fewer voters, giving city voters an advantage. This is not democratic, but the parties have not agreed to a proper reform of the boundary setting process. Currently cities tend to vote Labour. After the election the political parties can do some horse-trading. In 2017 the Conservatives were short of seats and borrowed them, at significant cost, from the Unionists in Ulster, with whom they have traditional links. This time if the Labour Party do not win they are likely to seek support from the Scottish Nationalists (who don’t want to be part of the Union anyway) and the Liberal Democrats, who say they won’t work with Labour, but the prospect of power will make them pause for only a few minutes. So, it is possible for the Conservatives to win the popular vote (which is irrelevant) and gain the largest number of seats under their own banner, but still lose power if the minority parties decide, after the election, on a coalition of convenience in which they bury their differences to thwart the winner.
Because of these complications, pollsters can only attempt to predict voting intentions, and then need to use some interesting statistics to predict what they will mean for each constituency. They use available demographic and social data at constituency level to estimate how much the national figures will vary in each constituency, and therefore what the likely winner will be. There are several ways of doing this, and different groups have come up with somewhat different results.
Into this mix we must put that large tribe, about one tenth of voters, who don’t yet know how they will vote. They will possibly get inspiration as they look at the ballot sheet. Animal charities may benefit. They could jump any way, or stay at home if it is cold and rainy. Indeed, all voters could stay at home. Furthermore, all the polling data is always a day or two late, and changes of mind happen at the moment of voting.
Plus, all the predictions come with a margin of error of 3 points. Spiegelhalter recommends doubling that to 6 points to account for samples which are too small, sampling approaches which leave out particular groups, questions which are too leading, polling which is too late to detect surges of support (in 2017 there was a late Labour surge which was missed by pollsters) and plain ordinary human cussedness. Those factors, plus idiosyncratic three-way races in marginal constituencies, complicate the picture even before you factor in deliberate tactical voting, and make this a hard election to call. The current state of the polls may be deceiving, and already out of date as the crunch vote heaves into view. Yet, it seems likely at the moment that the Conservatives will scrape in with a working majority. With any luck, that will spare us from elections until late 2024.
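For those who want to see where a figure like 3 points comes from, here is a rough sketch. It assumes a simple random sample of 1,000 respondents and a party on 50%, which is the worst case for sampling error. Those numbers are my illustration, not taken from any particular poll, and real polls are not simple random samples, which is part of the reason for doubling.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error, in percentage points,
    for a share p estimated from a simple random sample of size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 respondents, party on 50% (assumed values).
nominal = margin_of_error(0.5, 1000)

# The doubling rule of thumb described above, to allow for non-sampling errors.
doubled = 2 * nominal

print(f"nominal: +/-{nominal:.1f} points, doubled: +/-{doubled:.1f} points")
```

Under these assumptions the nominal margin comes out at a little over 3 points, and the doubled margin at a little over 6, so a lead of a few points between two parties is well within the noise.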
Vote early, vote often.