My first experience with Raven’s Matrices was as a psychology student. We did the test as a group, and then the Alice Heim 5 test of high grade intelligence, and finally inexpertly attempted to give each other the Wechsler test of adult intelligence. As you will have noted, the concept of intelligence and the ways it could be measured were among the many topics considered central to a proper psychology degree.
I can remember thinking as I handed in my paper that I had got the last Raven problem wrong, but truly cannot remember what final result I got. It was assumed we would get them all right, which is why the last item bugged me. I can remember that on the Alice Heim test my result was an overall grade score of B, but not the actual score. The scores confirmed the university entry requirement of the 1960s: being “in the top 2%”.
I was certainly interested in finding out how intelligent I was, because a school is usually less selective than a university, in which supposedly brighter students congregate, so any school-based hierarchy has to be re-calibrated for much tougher competition in higher education. It was clear to me that there were people much brighter than me, and that this group included the team on University Challenge and certainly the leading lights in Philosophy, Psychology and the sciences.
To create his test, John Raven supposedly went around the British Museum looking at designs on pottery from across the world in order to use motifs from different cultures. The test has a very strong underlying structure: the procedure is made evident in the practice items, and the same basic format is maintained throughout. People have to choose the correct item to complete the matrix. All that changes is that the problems get harder as you go through each set. (Quite why any problem is harder than another is my pet subject, on which I have not been making much progress, but I assume as a working hypothesis that it is the number of elements times the number of operations which need to be done on those elements.) Raven did item analysis for each of the 60 problems, so he was able to study their particular characteristics.
It is a general rule of test construction that you need a large number of items and a large number of people to try them out on. Although on the face of them all items are possible tests of intelligence, most of them will fall by the wayside. They may fail because everyone gets them right (too easy, waste of time, no discriminative power); everyone gets them wrong (too hard, waste of time, no discriminative power); or because whether people get them right or wrong does not relate at all to how well they do on all the other items, or even to subsets of those items (the item has an ambiguity or inconsistency in it which makes it unreliable). A good item is one which 50% of the people get right, and in addition those who pass that item are more likely to get the next item right (maximum discriminative and predictive power).
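The two checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Raven's own procedure: the response data are invented, the function names are hypothetical, and the "relates to the other items" check is implemented here as an item-rest correlation (the correlation between passing an item and the total score on all the remaining items), which is one common way of doing it.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # everyone passed or everyone failed: no discriminative power
    return sxy / sqrt(sxx * syy)

def item_analysis(responses):
    """responses: one row per person, each row a list of 0/1 item scores."""
    results = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        # Score on all the OTHER items, so the item is not correlated with itself.
        rest = [sum(row) - row[j] for row in responses]
        results.append({
            "item": j,
            "pass_rate": sum(item) / len(item),
            "item_rest_r": pearson(item, rest),
        })
    return results

# Invented data: 6 people, 4 items.
responses = [
    [1, 1, 0, 1],
    [1, 1, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]

for row in item_analysis(responses):
    print(row)
```

In this made-up data the first item is passed by everyone and the third by no one, so both would be dropped; the second item is passed by half the sample and correlates positively with the rest of the test, which is the profile a test constructor is looking for.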
There are always arguments about what a test actually tests (and these arguments apply to all examinations), and everything any person does can reflect the culture in which they live, including how often they take tests. I would turn this argument on its head and ask: How can people develop a culture without understanding cause and effect relationships? How can a culture understand cause and effect if it has no notion of sequence? The Raven’s items are about the problem elements getting bigger or smaller, more or less numerous, adding or subtracting features, hiding in front of or behind each other. In short, they are about changes which a culture would note in tracking and hunting animals, in searching for food, and in finding out how to achieve favourable outcomes by noting positive developments.
Could a person do well on this task without solving the problems of sequencing which are embedded in the matrix? Apart from the option of cheating, every subject has to look at the elements and work out what progression is being revealed by the items, and which of the proffered options correctly completes it. The aim of each series is clear, but the actual solution to each problem is not. Virtually all subjects understand the very easy practice problems. Most subjects can complete the very easy next items. How is this possible, unless it taps very basic skills? It is later, when the task is still understood but the problems get harder (more elements combined) that individual differences reveal themselves. I think that the Matrices test reveals power differences between people, and not a fundamental operating-system incompatibility between continents.
I have already described the work David Becker is doing on the Richard Lynn database, which is the best collection of country IQs. The link below explains the background, and gives a link to the second edition of the database, with all the different tests included. Overall, the aim is to make every reference traceable and every procedure transparent so that readers can make their own judgments about data quality, and decide for themselves which studies to accept or reject in their own research. There are many new papers to be added, which Becker will be working through.
Now Becker has refined his search by reporting only on the subset of data in which Raven’s Matrices were used to assess intelligence. Although all the tests in the database have a contribution to make, by restricting himself to only Raven’s Matrices in this particular exercise, Becker can avoid the effects of test heterogeneity. There are fewer test results, and fewer countries covered, but that is a cost worth paying in order to reduce an important source of possible error variance. Here is the link to the Matrices “Raven’s only” subset of results.
Becker explains how to read the file:
LEVEL(N) shows a list of all nations for which Raven data were available and replicable.
LEVEL(R) shows a list of all sources and samples from replicable Raven data.
Both are connected to the working sheet WORLDIQ. This includes replicated (blue) but also non-replicable data (red). This is the best summary of all the available data, though less reliance should be placed on the results in red.
CALCULATIONS shows tables for special estimations carried out on each paper. IDs of sources are noted in every table header, so it should be possible for readers to see to which they belong.
Further sheets contain norm tables and the Flynn Effect estimation for the UK.
- P&V means Flynn Effect-correction according to Pietschnig and Voracek (2015)
- L&V means Flynn Effect-correction according to Lynn & Vanhanen (from Richard’s working paper)
- 3PD means Flynn Effect-correction by a rough estimate of 3 IQ-points per decade
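As a rough illustration of what the 3PD correction amounts to, here is a minimal Python sketch. It assumes the usual direction of a Flynn Effect correction: a score obtained against older norms is taken to be inflated, so 3 IQ points are subtracted for every decade between the year the norms were collected and the year the sample was tested. The function name and the example years are hypothetical, and the spreadsheets themselves may apply the correction differently, so check the CALCULATIONS sheet before relying on this.

```python
def flynn_correct_3pd(iq, norm_year, test_year, points_per_decade=3.0):
    """Subtract a flat Flynn Effect allowance from an IQ scored on older norms.

    Assumption: scores drift upward by `points_per_decade` relative to
    fixed norms, so a later test date means a larger deduction.
    """
    decades_elapsed = (test_year - norm_year) / 10
    return iq - points_per_decade * decades_elapsed

# Hypothetical sample: mean IQ of 100 against norms collected 20 years earlier.
print(flynn_correct_3pd(100, norm_year=1979, test_year=1999))  # 94.0
```

The P&V and L&V corrections follow the same logic but use the rates given in their respective sources instead of a flat 3 points per decade.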
The other tabs are for the Advanced, Standard and Coloured versions of the Matrices.
Here are Becker’s explanatory lecture slides
There is a ton of work here in these spreadsheets, and if you can help improve the database even further, then contact David directly. We are well on the road to having an accurate and transparent database of the world’s intelligence, openly available for all to use.