If someone tells me I must not read something, I am tempted to give it a look. If you are reading this, you probably have the same curiosity, and the same wish to rebel against other people telling you what you may not read, and what you must not think.
In that light, here is an interesting story. Some authors who had published an academic paper on 26 June asked for it to be withdrawn. Very odd: getting papers published is difficult and very time-consuming. Most authors want them to be read as widely as possible. If other researchers have objections, they submit papers criticising the original paper. Sometimes an individual error is found and corrected in an Erratum statement. To withdraw a whole paper in this way is unusual.
Psychological Science 26 June 2020 “Declines in Religiosity Predict Increases in Violent Crime—but Not Among Countries With Relatively High Average IQ”
What mistake did the authors find which made them take down a paper which had been peer reviewed (by four reviewers, including a statistician, plus two members of the editorial team) and accepted into the public domain as a scholarly publication?
The journal, Psychological Science, says that the authors:
requested that this article be retracted out of concern that some of the measures used in the research were invalid. Specifically, they note that the National IQ data used in their analyses, largely based on Lynn and Vanhanen’s (2012) compilation, are plagued by lack of representativeness of the samples, questionable support for some of the measures, an excess of researcher degrees of freedom, and concern about the vulnerability of the data to bias. They also noted that the cross-national homicide data used in the research are unreliable, given that many countries included in the data set provided no actual data on homicides that had occurred. Instead, in these countries, homicide rates were estimated on the basis of other variables that may or may not be closely related to homicide rates. Importantly, some of the variables used to create the estimates were confounded with variables of interest in the research. When the authors re-analyzed the data without the imputed values, the reported effects were no longer apparent.
In the conclusion of their request for retraction, the authors reflected that although articles with certain types of errors may still be helpful to have in the literature, they do not believe theirs falls into that category. They explicitly expressed concern that leaving the article in the literature could “prolong the use of Lynn & Vanhanen’s cross-national IQ measures.”
So, the authors withdrew mainly because of criticisms of Lynn’s work on country IQs. Usually, once a paper has been accepted, other people read it and then write papers in reply. Those papers, like the one they are criticising, go through their own review process, and eventually get published. In that way we, the readers, see what the original authors have said, and what critics have replied. We can judge argument and counter-argument. This is usual academic practice. Debate takes time, but is done in the open.
This case is unusual. The paper was published on 26 June and withdrawn 3 days later, on the basis of arguments we haven’t seen published. We haven’t even seen the re-analysis done by the authors. In fact, because of the inordinate delays imposed by academic journals (in which authors write for free, referees referee for free, and then every student and member of the public has to pay to read), the paper was accepted “In Press” in January 2020 and caused no particular critical reaction. As the actual publication date approached, the authors received more criticisms in the final weeks, resulting in this withdrawal.
The Editor, Patricia Bauer, adds:
Critiques of Lynn and Vanhanen’s (2012) National IQ data were available in the literature prior to the publication of Clark et al. (2020). It is unfortunate that these critiques were not consulted, thereby potentially avoiding publication and the necessity for retraction.
Of course, this assumes that the critiques were right. Almost every paper of note generates critiques, and at best these criticisms can improve later work. At worst they throw up a lot of dust. Sometimes criticisms can be shown to be wrong, or to apply selectively standards that no other research has achieved. In fact, the authors did consult David Becker, who is in charge of editing the Lynn database, which is now in a public form that allows any critic to make their own evaluations of the quality of individual papers.
We would like to thank David Becker for his helpful correspondence regarding the NIQ dataset and the relative merits of different country-level IQ measures.
You can get the database here, and download it.
Why has the usual debate been circumvented? Perhaps, very late in the day, the authors have accepted criticisms which they did not accept or know about when the paper was accepted last year. That seems unlikely. Lynn’s work has been very widely criticized, often by concentrating on a few of the least representative studies. Nonetheless, the general findings have been replicated by others, often in the economics literature, coming to the same conclusions without mentioning Lynn, and using different terms like “human capital” which do not arouse so many emotions as IQ. The same general pattern is observable in PISA and other international scholastic studies. Not all countries participate in those studies, but those countries sometimes use PISA items in their national tests, so one can deduce what the general levels would be if they did participate.
Becker has ensured that you can compare the Lynn estimates with the scholastic estimates, and anyone can see how the different variations correlate. You can compare Lynn’s list with the shorter reference list that Becker has been able to use, and compare results.
I have read the paper, and in my view the authors are correct in their assessment of the Lynn database, that it is the best source of country intelligence results, and that in the Becker editions there are different variants (some with, and others without, scholastic data, and some with and without estimates for missing countries based on geographic neighbours). That is, they do not go overboard with it, and are aware of restrictions and shortcomings. On those they note:
Note also that noise in the data, if anything, should obscure our hypothesized pattern of results.
They are cautious about their measures.
Study 2 examined the interaction between country-level IQ and religiosity on homicide rates. All countries for which the relevant data could be obtained were included. Given that there are no objective best measures of religiosity and IQ nor an objective best list of relevant control variables, we conducted a multiverse analysis using three operationalizations of religiosity, three operationalizations of IQ, all possible combinations of four control variables, and additional interactions between those control variables and each operationalization of religiosity.
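To give a sense of the scale of such a multiverse, here is a minimal sketch that enumerates only the base grid described in that passage. It is an illustration, not the authors' code: the variable names are placeholders, since the quoted text gives counts rather than names, and the additional interactions between controls and religiosity are left out because the passage does not say how they were combined.

```python
from itertools import combinations, product

# Placeholder labels (assumptions): the quoted passage gives only the counts.
religiosity_measures = ["religiosity_1", "religiosity_2", "religiosity_3"]
iq_measures = ["iq_1", "iq_2", "iq_3"]
controls = ["control_1", "control_2", "control_3", "control_4"]


def control_subsets(items):
    """Return every subset of the controls, from none of them to all of them."""
    return [
        subset
        for size in range(len(items) + 1)
        for subset in combinations(items, size)
    ]


def multiverse_specifications():
    """Build one specification per (religiosity, IQ, control subset) combination."""
    return [
        {"religiosity": r, "iq": q, "controls": c}
        for r, q, c in product(
            religiosity_measures, iq_measures, control_subsets(controls)
        )
    ]


specs = multiverse_specifications()
print(len(specs))  # 3 * 3 * 2**4 = 144 base specifications
```

Even before any interaction terms are added, the base grid alone implies 144 model specifications, which is why a result that holds across the grid is harder to attribute to one convenient choice of measure.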
The authors’ conclusion is very modest, and in some ways in favour of religious belief.
One-size-fits-all social prescriptions to complicated social problems may lack important nuance. And indeed, some cultural institutions (like religion, but also others such as monogamous marriage norms [see Henrich, Boyd, & Richerson, 2012]) that are denigrated as outmoded among high-IQ populations, may still serve valuable functions among other groups around the world.
Furthermore, the sample sizes in the Lynn database, at 353 subjects on average, are larger than the norm in psychology papers, which is 40 to 120 depending on subfield (Kirkegaard 2019). So, far from this set being the worst, it is standard psychology papers which are “plagued by lack of representativeness of the samples”.
Incidentally, the Kirkegaard review recommended more up-to-date statistical techniques and more discussion about cultural biases, while also noting that other researchers are using similar data, and coming to the same general conclusions, and that the Lynn database has led to a very productive research program.
In response to one of their reviewers they even ran their analyses again without the Lynn data (page 15) using school assessment data only, and got the same results.
So, we are led to believe that, after submitting their paper, having it reviewed and accepted for publication, they suddenly found that the Lynn database had been subject to criticism, and must be abandoned.
One paper which influenced the authors in this judgment was Kanis et al. (2017). This paper is a very interesting investigation of the methods used to record homicide rates across the world, either by direct methods in the case of well-organized wealthy countries, or by inferential modelling in the case of less-well-organized poorer countries. At the end they suggest that the directly estimated WHO figures are the best to use, and the inferred ones must be treated cautiously. Kanis et al. do not spell this out, but some dangerous places do not carefully record homicides. Absence of data often means that things are pretty bad.
I don’t see this as a reason not to study international homicide rates: merely a warning that you should try to run studies with the direct data and the inferred data, and see what holds up, and (my emphasis) use your judgment about the inferences which come out of the modelling. If the Clark et al. authors did that, it would be good to see their workings.
So, how did the journal react to this retraction? One approach would have been to state it and move on. After all, the journal had accepted it for publication. In fact, the journal has gone further.
The Editor in Chief says:
In discussing their findings and their implications, the authors made a number of statements that have been interpreted as politically charged and that some members of the academic community interpreted as racist. Other members of the community questioned not only the claimed implications but also the empirical foundation on which they were based. Still others questioned how the manuscript came to be published in Psychological Science.
Concerns were also raised about the measures of national IQ used in the research; the measures tend to trend lower in non-Western, educated, industrialized, rich, and democratic (non-WEIRD) countries. Ultimately, it was these concerns that led Clark et al. to request that their article be retracted. Yet throughout the process of review and response to review (and in the now-retracted article itself), the authors defended the measures.
As social scientists, we have a responsibility to be sensitive to the political, social, and cultural issues raised by our work. We have a responsibility to clearly distinguish between the measures we use and the theoretical constructs those measures are intended to assess. To paraphrase Steve Lindsay, we have a responsibility to be appropriately modest in asserting our claims, clear in articulating the limits on the generalizability of our findings, and circumspect in our conclusions and their possible implications (S. Lindsay, personal communication, June 15, 2020). We must be especially sensitive when the topics with which we are dealing are associated with a history of injustice and when the message of our work could be inflammatory or incendiary.
My supposition was that the Editor wanted to go ahead with publication, but the authors became alarmed after receiving criticism. However, the Editor says that the journal must be “especially sensitive” about some topics because of historical injustices.
The Editor says:
We failed to recognize that the message of this article could be interpreted to have racial overtones and thus could be highly controversial. We therefore failed to act to mitigate the potential harm to which the message could contribute. We failed to provide a more direct, deliberate, and explicit alternative perspective on the data and the conclusions of the article. We should not and will not shy away from publishing articles on sensitive political, social, and cultural issues. But what we must and will do is exercise greater care in our handling of all submissions, including those on sensitive topics.
It is to be hoped that since the journal will not shy away from publishing articles on sensitive issues, they will shortly be publishing papers making the case for country intelligence research.
The authors seem to have been put in a very difficult position, and subjected to considerable pressure. There should be one standard of criticism in psychology, and that should be applied to all papers. Scholars must feel that they can examine different explanations without putting their careers at risk. Evidence must be insensitive. Lord Lindsay, founder of Keele University, said that it was for “the pursuit of truth in the company of friends”, but such high ideals are not always achieved. The authors probably came to the conclusion that the level of criticism they received was far more than would be meted out to usual psychology papers, and that it was best to withdraw their publication in the hope that they could retain their careers.
In these censorious times, it is hard to blame them for this pragmatic choice, but it makes it harder for any subsequent researcher to study national differences in intellectual and scholastic achievement. If a journal determines that a particular set of findings must be subject to special sensitivity, then new applicants submitting papers may well judge that they want their slim chance of publication not to be made even slimmer by choosing the wrong subject, or entertaining the wrong interpretation. Scholars are being warned: sensitive issues are best avoided, sensitively.
As if it were not enough that a just-about-to-be-published paper is retracted because of unpublished and unrefereed criticisms, papers that were published years ago have been retracted by editors after they have received critical petitions, even though the authors in question are dead. You might imagine that is unfair, because dead authors cannot reply to criticism, but this seems no longer relevant to editors. So, in other news:
Rushton & Templer. Do pigmentation and the melanocortin system mediate aggression and sexuality in humans as they do in other animals? Personality and Individual Differences. 53 (2012) 4-8
has ceased to exist.
The usual practice in academia is that anyone critical of any particular paper would write their own paper arguing against the original work. The new paper could then be discussed, and readers could judge what they thought of it. They might agree that the original paper was wrong, even on the available knowledge at the time of publication. That would be fair comment. The new paper would reference the old one, which would be no longer relied upon, unless of course another researcher came to its rescue, by writing a third paper, this time defending it. This is the way researchers used to work, letting you see the debate unfold, and keeping track of the history of claim and counter-claim. That standard procedure meant that even decades later someone doing new research could look back at previous work with new techniques, and say which of the competing claims had turned out to be right. History should be the whole thing, the different views and different fashions, the agony and the ecstasy, but now work can be cancelled both before and after publication, if it offends the sensibilities of petition signatories.
Amusingly, I can remember this paper being discussed at one of our conferences, and being put aside because there were no up-to-date global studies of skin luminosity. The best way to improve papers is to improve them: better measures, more representative samples, more powerful statistical techniques should be used by all researchers, on all subjects.
Critics have deemed that the authors of the recently retracted paper have eaten forbidden fruit and must be cast out of academe.
“They, looking back, all the eastern side beheld
Of Paradise, so late their happy seat,
Waved over by that flaming brand, the gate
With dreadful faces thronged and fiery arms:
Some natural tears they dropped, but wiped them soon;
The world was all before them, where to choose
Their place of rest, and Providence their guide;
They, hand in hand, with wandering steps and slow,
Through Eden took their solitary way.”