Some things are associated with others. Some things you eat make you ill. Some animals attack you. Some places are dangerous, some people likewise. On a brighter note, some foods are tasty and healthy. Some animals can be domesticated, or at least are easy to hunt or trap. Some places are safe, and some people likewise.
Correlation is not causation, but it’s the way to bet. Your life may depend upon it. Under-predict dangers and you could end up dead. Better to be safe than sorry. Better to be sorry that you have missed some opportunities than to be dead. It is sensible to worry about what may happen. Stereotypes are your friend. They are preliminary observations about life. Improve them as you learn more. Some must be discarded, but many more can be sharpened up and refined.
Life is a dilemma. When searching for a meal you must avoid ending up as a meal. Be careful, but don’t worry so much that you cannot forage for food. Hunger will make you adventurous, and then you are at risk again.
Ideally, we would never calculate correlation coefficients at all, but would just look at the data properly plotted out, preferably over a long period, and judge things by eye. The shape of the distribution matters. Intellectual and scholastic tests need not yield a perfect bell curve, though they can come pretty close to one.
Sometimes an unknown force distorts the distribution, as when illness and infections sap the wits of poor citizens living in bad circumstances. More mysteriously, sometimes distributions are almost normal, but pinched into a narrower range, as if bound by a tighter central limit. Why are some groups narrower than others? Women, for example? African Americans, for another? Easy to see how systematic disadvantage could shift a mean downwards, less easy to see how those forces could both encourage low scorers and discourage high scorers.
A correlation coefficient is a straight-line simplification. Useful, though. It captures a lot in a little number. Standard deviations are also very informative.
It is no disproof of a correlation that it is not unity. Most real-life correlations are far from perfect, but they will be much better than guessing, even though there will always be outliers. Summing those outliers in terms of residuals (errors of prediction) is a useful way of understanding the power of predictions based on correlations. For example, if you have to predict the height of an unknown person, your best bet (least error-prone) is to predict that they are of average height. If you are asked to predict the heights of 100 people, betting that every one of them is of average height results in your error of prediction being the same as the standard deviation of height in the general population.
If you have extra knowledge, such as being told the height of the individual’s parents, then you can improve your prediction by taking that into account. You will have reduced your error of prediction, and can compare how much it improves your bets by comparing your reduced residual with that of the standard deviation of the population.
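The height example can be sketched numerically. The simulation below is illustrative only: it assumes a mid-parent/child height correlation of 0.5 and a population mean of 170 cm with SD 7 cm, figures chosen for convenience rather than taken from any survey. The point it demonstrates is general: knowing a correlated predictor shrinks the residual error by the factor √(1 − r²).

```python
import math
import random
import statistics

random.seed(1)
n = 100_000
r = 0.5             # assumed parent-child height correlation (illustrative)
MEAN, SD = 170, 7   # assumed population mean and SD of height, in cm

# Simulate correlated parent and child heights
parent, child = [], []
for _ in range(n):
    p = random.gauss(0, 1)
    c = r * p + math.sqrt(1 - r ** 2) * random.gauss(0, 1)
    parent.append(MEAN + SD * p)
    child.append(MEAN + SD * c)

m_child = statistics.fmean(child)
m_parent = statistics.fmean(parent)

# Bet 1: predict that everyone is of average height.
# The root-mean-square error is simply the population SD.
rmse_mean = math.sqrt(statistics.fmean((c - m_child) ** 2 for c in child))

# Bet 2: adjust the prediction using parental height (simple regression).
cov = statistics.fmean((p - m_parent) * (c - m_child)
                       for p, c in zip(parent, child))
var_p = statistics.fmean((p - m_parent) ** 2 for p in parent)
slope = cov / var_p
rmse_reg = math.sqrt(statistics.fmean(
    (c - (m_child + slope * (p - m_parent))) ** 2
    for p, c in zip(parent, child)))

# The residual shrinks by roughly the factor sqrt(1 - r^2)
print(round(rmse_mean, 1), round(rmse_reg, 1))
```

Comparing the two root-mean-square errors is exactly the comparison of the reduced residual against the population standard deviation described above.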
Some people really believe they have invalidated a correlation by drawing attention to a particular outlier. If you conceive of a correlation as an ellipse rather than a straight line you can see that the highest scorer on one variable will not be the highest scorer on the other variable. That only happens with perfect correlation. Steve Hsu explains the issue here:
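A quick simulation makes the ellipse point concrete. With an illustrative (assumed) correlation of 0.6 between two variables, the top scorer on one is rarely the top scorer on the other:

```python
import math
import random

random.seed(2)
r = 0.6        # assumed correlation between the two variables (illustrative)
n = 10_000

# Draw correlated standard-normal pairs (x, y)
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    xs.append(x)
    ys.append(r * x + math.sqrt(1 - r ** 2) * random.gauss(0, 1))

# Empirical correlation, as a sanity check on the simulation
mx, my = sum(xs) / n, sum(ys) / n
sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)
sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
emp_r = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n * sx * sy)

# Rank on y (1 = highest) of the individual who scored highest on x;
# with an imperfect correlation this is usually not rank 1
top_x = max(range(n), key=lambda i: xs[i])
rank_on_y = 1 + sum(b > ys[top_x] for b in ys)
print(round(emp_r, 2), rank_on_y)
```

The scatter of ranks is precisely the ellipse: only at a correlation of 1.0 does the top scorer on one variable have to be the top scorer on the other.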
Correlation is not causation, but you are more likely to find a cause in a correlated variable than in an uncorrelated one. Search where there is at least a trace of a putative connective tissue. If you think it was the tomato that upset your digestion, start your controlled trial on tomatoes.
Correlation is not causation, but sometimes a finding is suggestive, like a trout in the milk. It does not prove that the milk was watered, but it makes you suspicious.
The “correlation is not causation” mantra is true as far as it goes, but it tends to be used so as to argue that, despite many correlations linking A with B being found in different circumstances, these will somehow never suffice to strongly suggest a causal link between A and B. On the contrary, correlation is a necessary feature of causation, but not a sufficient proof. Correlation is not always causation, but it helps find causes. Correlation is a pre-condition of causality.
Michael Woodley has set a challenge: “Sure, correlation does not equal causation, but find me just one single instance of a causal relationship where there is no correlation (just one would suffice).”
Whilst it is true that correlation does not necessarily equate to causation, all causally related variables will be correlated. Thus correlation is always necessary (but not in and of itself sufficient) for establishing causation.
The claim that ‘correlation does not equal causation’ is therefore meaningless when used to counter the results of correlative studies in which specific causal inferences are being made, as the inferred pattern of causation necessarily supervenes upon correlation amongst variables. Whether the variables being considered are in actuality causally associated as per the inference is another matter entirely.
The correct critique of such findings therefore is from mediation, i.e. the idea that a given correlation might be spurious owing to the presence of ‘hidden’ variables that are generating the apparent correlation. A famous example is yam production and national IQ, which across countries correlate negatively. It would be wrong to say that yam production somehow inhibits IQ, as the association will in fact turn out to be mediated by something like temperature and latitude. These variables are in turn proxies for historical and ecological trends that make the sort of countries that yield fewer yams the sort of countries that are typically populated by higher ability people, and vice versa. The causation in this case is via additional variables, which cause the covariance between the two variables of interest, without there being a direct effect of one on the other.
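The yam example can be sketched in a few lines. The loadings below (0.7 on the hidden mediator) are invented for illustration, but the pattern is general: a hidden variable generates a real correlation between two variables that have no direct effect on each other, and partialling it out makes the association vanish.

```python
import math
import random
import statistics

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = statistics.fmean((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

random.seed(3)
n = 50_000
# Hidden mediator (think latitude/temperature), never observed directly
z = [random.gauss(0, 1) for _ in range(n)]
# "Yam production" loads negatively on z, "national IQ" positively;
# neither affects the other directly (illustrative loadings of 0.7)
yams = [-0.7 * zi + random.gauss(0, 1) for zi in z]
iq = [0.7 * zi + random.gauss(0, 1) for zi in z]

r_xy = corr(yams, iq)          # a real, negative, but spurious correlation
r_xz, r_yz = corr(yams, z), corr(iq, z)
# Partial correlation of yams and IQ, controlling for the mediator
partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
print(round(r_xy, 2), round(partial, 2))
```

The raw correlation is clearly negative, while the partial correlation controlling for the mediator is close to zero, which is the signature of mediation rather than direct causation.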
Properly constructed multivariate models can use these patterns of mediation to infer the likelihood of causation going in one direction or another. Thus, it is possible to actually test causal inference amongst a population of correlated variables. By far the best way of doing this is to compare the fits of models containing specific theoretically prescribed patterns of causal inference against (preferably many) alternative theoretically plausible models, in which alternative patterns of causation are inferred (Figueredo & Gorsuch, 2007).
Sir William Gemmell Cochran termed this “Fisher’s Dictum”:
“About 20 years ago, when asked in a meeting what can be done in observational studies to clarify the step from association to causation, Sir Ronald Fisher replied: ‘Make your theories elaborate.’ The reply puzzled me at first, since by Occam’s razor, the advice usually given is to make theories as simple as is consistent with known data. What Sir Ronald meant, as subsequent discussion showed, was that when constructing a causal hypothesis one should envisage as many different consequences of its truth as possible, and plan observational studies to discover whether each of these consequences is found to hold.” (Cochran, 1965, §5)
Cochran, W. G. (1965). The planning of observational studies of human populations (with Discussion). Journal of the Royal Statistical Society. Series A, 128, 134–155.
Figueredo, A. J., & Gorsuch, R. L. (2007). Assortative mating in the jewel wasp: 2. Sequential canonical analysis as an exploratory form of path analysis. Journal of the Arizona-Nevada Academy of Science, 39, 59–64.
Following on from Woodley it is interesting to calculate how many replications of a correlation would be needed to strongly suggest a causal relationship, including the possibility that it was caused by an ever-present hidden variable yet to be identified. Would three replications be enough if one was in a hurry?
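One crude way to frame the replication question: if there were in fact no real association, and each study's chance of a false-positive "significant" correlation is the conventional 5%, then the probability that k independent replications all report the correlation by luck alone is 0.05 to the power k. This sketch assumes the replications are independent and unaffected by publication bias, which real literatures rarely are.

```python
# Probability that k independent studies all report a "significant"
# correlation purely by chance, at the conventional alpha = 0.05.
# Assumes independent studies and no publication bias (a strong assumption).
alpha = 0.05
for k in (1, 2, 3, 5):
    print(k, alpha ** k)
```

On those (generous) assumptions, three clean replications already push the pure-luck explanation to about one chance in eight thousand, though they do nothing by themselves to rule out a hidden variable present in all three studies.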
Faced with the 1854 cholera outbreak in London, John Snow had no idea what mechanism caused cholera, and his instruments could not reliably identify the contaminants in water supplies, but he noted what we would now call correlations: some water companies had more of their clients die than others, even though all of them supplied rich and poor households alike. South of the river the companies were more deadly: they drew more contaminated water from the river rather than from other sources, and filtered it less than other companies did. Some neighbourhood pumps had more deaths nearby than others. Mapping deaths by their nearest pump gave him a geographic form of correlation (a partition of the kind now called a Voronoi diagram), and it was on that correlational basis, without knowledge of the real mechanism, that he took the handle off the Broad Street pump and stopped the epidemic.
That is the way we tell the story now, but Snow was a careful and clever man, and pointed out another explanation: the cholera outbreak was coming to an end anyway, as people ran away from areas where there were many deaths. The common folk who believed that correlation implied causation ran for their lives and lived to see another day.
Snow also had to cope with a major anomaly in his geographic correlational investigations. None of the brewery workers right next to the Broad Street water pump fell ill with cholera. It turned out that they received free beer, and the water for the beer was boiled so as to release the flavour of hops, thus inadvertently killing off the water-borne organisms.
Snow jumped to a conclusion because his mind was prepared to interpret associations in a particular way, initially by his doubts about the miasma theory of airborne transmission, and later by his own hypothesis of water-borne transmission. He jumped to the right conclusion, without proof of the causal mechanism, which became available only years later.
Correlation is not causation, but it’s the way to bet.