The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
James Thompson
The Matrix

My first experience with Raven’s Matrices was as a psychology student. We did the test as a group, and then the Alice Heim 5 test of high grade intelligence, and finally inexpertly attempted to give each other the Wechsler test of adult intelligence. As you will have noted, the concept of intelligence and the ways it could be measured was one of the many topics considered central to a proper psychology degree.

I can remember thinking as I handed in my paper that I had got the last Raven problem wrong, but truly cannot remember what final result I got. It was assumed we would get them all right, which is why the last item bugged me. I can remember that on the Alice Heim test my result was an overall grade score of B, but not the actual score. The scores confirmed the university entry requirement of that time in the 1960s of being “in the top 2%”.

I was certainly interested in finding out how intelligent I was, because a school is usually less selective than a university, in which supposedly brighter students congregate, so any school based hierarchy has to be re-calibrated for much tougher competition in higher education. It was clear to me that there were people much brighter than me, and that that group included the team on University Challenge and certainly the leading lights in Philosophy, Psychology and the sciences.

To create his test, John Raven supposedly went around the British Museum looking at designs on pottery from across the world in order to use motifs from different cultures. The test has a very strong underlying structure: the procedure is made evident in the practice items, and the same basic format is maintained throughout. People have to choose the correct item to complete the matrix. All that changes is that the problems get harder as you go through each set. (Quite why any problem is harder than another is my pet subject, on which I have not been making much progress, but I assume as a working hypothesis that it is the number of elements times the number of operations which need to be done on those elements.) Raven did item analysis for each of the 60 problems, so he was able to study their particular characteristics.

It is a general rule of test construction that you need a large number of items and a large number of people to try them out on. Although on the face of them all items are possible tests of intelligence, most of them will fall by the wayside. They may fail because everyone gets them right (too easy, waste of time, no discriminative power); everyone gets them wrong (too hard, waste of time, no discriminative power); or because whether people get them right or wrong does not relate at all to how well they do on all the other items, or even to subsets of those items (the item has an ambiguity or inconsistency in it which makes it unreliable). A good item is one which 50% of the people get right, and in addition those who pass that item are more likely to get the next item right (maximum discriminative and predictive power).
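The screening described above can be sketched in a few lines of Python. This is only an illustration, not Raven's actual procedure: the response matrix is made up, the function names are hypothetical, and the cut-off values (0.9, 0.1 and 0.2) are arbitrary assumptions chosen to make the example work. Difficulty is taken as the proportion of people passing an item, and discrimination as the correlation between an item and the total score on all the other items.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_analysis(responses):
    """responses: one row per person, one 0/1 entry per item."""
    n_items = len(responses[0])
    rows = []
    for i in range(n_items):
        item = [person[i] for person in responses]
        # Score on all OTHER items, so the item is not correlated with itself.
        rest = [sum(person) - person[i] for person in responses]
        rows.append({
            "item": i,
            "difficulty": mean(item),               # proportion who got it right
            "discrimination": pearson(item, rest),  # item-rest correlation
        })
    return rows


def flag(row, easy=0.9, hard=0.1, min_disc=0.2):
    """Illustrative cut-offs only (assumed, not taken from any test manual)."""
    if row["difficulty"] >= easy:
        return "too easy"
    if row["difficulty"] <= hard:
        return "too hard"
    if row["discrimination"] < min_disc:
        return "unrelated to the rest of the test"
    return "keep"


# Made-up data: six people, five items.
responses = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0],
]

results = item_analysis(responses)
for row in results:
    print(row["item"], round(row["difficulty"], 2),
          round(row["discrimination"], 2), flag(row))
```

In this toy data the first item is passed by everyone and so tells us nothing, the third is passed by half the group and tracks the rest of the test, and the last runs against the other items, which is the pattern an ambiguous item would show.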

There are always arguments about what a test actually tests (and these arguments apply to all examinations), and everything any person does can reflect the culture in which they live, including how often they take tests. I would turn this argument on its head and ask: How can people develop a culture without understanding cause and effect relationships? How can a culture understand cause and effect if it has no notion of sequence? The Raven’s items are about the problem elements getting bigger or smaller, more or less numerous, adding or subtracting features, hiding in front of or behind each other. In short, they are about changes which a culture would note in tracking and hunting animals, in searching for food, and in finding out how to achieve favourable outcomes by noting positive developments.

Could a person do well on this task without solving the problems of sequencing which are embedded in the matrix? Apart from the option of cheating, every subject has to look at the elements and work out what progression is being revealed by the items, and which of the proffered options correctly completes it. The aim of each series is clear, but the actual solution to each problem is not. Virtually all subjects understand the very easy practice problems. Most subjects can complete the very easy next items. How is this possible, unless it taps very basic skills? It is later, when the task is still understood but the problems get harder (more elements combined) that individual differences reveal themselves. I think that the Matrices test reveals power differences between people, and not a fundamental operating-system incompatibility between continents.

I had already described the work David Becker is doing on the Richard Lynn database, which is the best collection of country IQs. The link below explains the background, and gives a link to the second edition of the database, with all the different tests included. Overall, the aim is to make every reference traceable and every procedure transparent so that readers can make their judgments about data quality, and decide for themselves which studies to accept or reject in their own research. There are many new papers to be added, which Becker will be working through.

Now Becker has refined his search by reporting only on the subset of data in which Raven’s Matrices were used to assess intelligence. Although all the tests in the database have a contribution to make, by restricting himself to only Raven’s Matrices in this particular exercise, Becker can avoid the effects of test heterogeneity. There are fewer test results, and fewer countries covered, but that is a cost worth paying in order to reduce an important source of possible error variance. Here is the link to the Matrices “Raven’s only” subset of results.

Becker explains how to read the file:

LEVEL(N) shows a list of all nations for which Raven data were available and replicable.

LEVEL(R) shows a list of all sources and samples from replicable Raven data.

Both are connected to the working sheet WORLDIQ. This includes replicated (blue) but also non-replicable data (red). This is the best summary of all the available data, though less reliance should be placed on the results in red.

CALCULATIONS shows tables for special estimations carried out on each paper. IDs of sources are noted in every table header, so it should be possible for readers to see to which they belong.

Further sheets contain norm tables and the Flynn Effect estimation for the UK.

– P&V means Flynn Effect-correction according to Pietschnig and Voracek (2015)
– L&V means Flynn Effect-correction according to Lynn & Vanhanen (from Richard’s working paper)
– 3PD means Flynn Effect-correction by a rough estimate of 3 IQ-points per decade
The other tabs are for the Advanced, Standard and Coloured versions of the Matrices.
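To show what the simplest of these corrections (3PD) amounts to, here is a rough Python sketch. It rests on assumptions: the function name and the example figures are hypothetical, and the direction of the adjustment (subtracting points from a score obtained on out-of-date norms) reflects how such corrections are commonly applied, not a reading of Becker's spreadsheet formulas. The P&V and L&V corrections use their own estimates and are not reproduced here.

```python
def flynn_corrected_iq(observed_iq, norm_year, test_year, points_per_decade=3.0):
    """Rough linear Flynn Effect correction (hypothetical helper).

    Assumes scores drift upwards by `points_per_decade` IQ points for every
    ten years between the norming of the test and its administration, so
    that amount is subtracted from the IQ read off the old norms.
    """
    decades_elapsed = (test_year - norm_year) / 10.0
    return observed_iq - points_per_decade * decades_elapsed


# Example with made-up numbers: a sample scored as IQ 100 on 1979 norms,
# but actually tested in 2009, thirty years after the norms were collected.
corrected = flynn_corrected_iq(100, norm_year=1979, test_year=2009)
print(corrected)  # 91.0
```

A flat rate like this is only a rule of thumb; the reason for offering three alternative corrections is presumably that the size of the effect varies by period, country and test.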

Here are Becker’s explanatory lecture slides

There is a ton of work here in these spreadsheets, and if you can help improve the database even further, then contact David directly. We are well on the road to having an accurate and transparent database of the world’s intelligence, openly available for all to use.

• Category: Science • Tags: IQ 
  1. Out of curiosity, JT, how strongly do you think that working memory predicts IQ?

    • Replies: @Stephen R. Diamond
  2. Haven’t got the papers to hand at the moment, but seem to remember it is in the moderate range.

  3. Agent76 says:

    Why Libertarianism Is So Dangerous

    A former libertarian abandons his dream of a voluntary world and explains the potential worst-case scenario after the overnight disappearance of government. The ending will SHOCK you! DISCLAIMER: Most libertarians do support such a drastic change, and many might say something like: “liberty is a philosophical evolution, and not an overnight thing.”

    • Replies: @Daniel Chieh, @Wally
  4. @Agent76

    How in God’s name is this related to the article?

    • Replies: @jacques sheete, @Agent76
  5. “As you will have noted, the concept of intelligence and the ways it could be measured was one of the many topics considered central to a proper psychology degree.”

    What went wrong*

    “In short, they are about changes which a culture would note in tracking and hunting animals, in searching for food, and in finding out how to achieve favourable outcomes by noting positive developments.”

    personnél provocation, irrrrrr grrrrrr

    “I think that the Matrices test reveals power differences between people, and not a fundamental operating-system incompatibility between continents.”

    If that is true, then people with the highest IQs would be organic pattern-recognition machines, ESPECIALLY for those patterns which are, or appear to be, too obvious to be mis-captured.

    What we understand as intelligence tends to be quite specialized, even when it has broader general knowledge as a possible support or basis. Remember that knowledge and understanding are not the same. It seems most people understand, or deeply know, in more specific ways.

    Knowledge is more democratic, understanding is not.

  6. utu says:

    Recently I looked through the test and noticed that some matrices were much harder than others. My question is: how is the cumulative score calculated? Is it a sum of correct solutions, or is it a weighted sum of correct solutions where the weights reflect the varying difficulty of the matrices?

    The scaling of score to IQ is age dependent, right? So a 14-year-old needs to solve more matrices correctly to get the same IQ as, say, a 10-year-old, right? This presupposes that IQ is constant.

    It appeared to me that there are several simple rules which, if learned, could greatly facilitate solving the matrices correctly. I would suspect that if the test is taken a second time a higher score will be achieved. This means that Raven matrices may not be useful in longitudinal studies. This may also have ramifications for the Flynn effect. Ron Unz suggested that our environment exposes us, as technology develops, to solving problems that facilitate solving Raven matrices. The teaching of math has also changed. I think that in 1940 students were not exposed to set theory and Venn diagrams, while in the 1970s “new math” was introduced, where students learned about Venn diagrams and different transformations and symmetries. I am sure that a student who has taken such a course will perform better on the Raven matrices test than a student who was exposed to traditional geometry, trigonometry and algebra only.

  7. utu says:

    Important question: Does anybody know of data sets where children’s and their parents’ IQs are gathered together and differentiated by race? I would like to do a child-parent IQ regression, say for whites and blacks separately, to find (1) the heritability and (2) the “genetic” mean for both populations.

  8. Anon • Disclaimer says:

    A way to circumvent the monopoly of the Ivory Tower.

    Radical professors, bureaucrats, and rabid students can shut down speech on campus, but they can’t shut down Discourse on the Internet.

    • Replies: @Wally, @CanSpeccy
  9. “The scores confirmed the university entry requirement of that time in the 1960s of being “in the top 2 %”.”

    That’s the most depressing sentence, given that about 40% of UK school leavers (and they ALL have to stay til age 18) now attend uni.

  10. @Daniel Chieh

    The vid is an intelligence test of sorts. But it has little to do with G-d’s name. 😉

  11. @utu

    Thanks. If you look at the tabs on the datasheets for the individual tests, and for example look at Standard PM 1979 and Standard PM 2007 you will get all the data you need. For simplicity, I looked at a score of 20, and then cast my eye on the imputed IQs for each age. A score of 20 is hard for a younger child, and much easier to achieve as the child develops. The scores are just the totals correct. By looking at the difference between standardization dates you can work out a Matrices based Flynn effect. Many people have argued that children now have better knowledge of sequence type tests, and that is likely to contribute to “IQ inflation”. Put “Elijah Armstrong” into my archive search bar to look at his publication on this issue.

    • Replies: @Stephen R. Diamond
  12. Agent76 says:
    @Daniel Chieh

    Had you viewed this video you would not be asking me that very question. It should be the first and foremost and basic thing you should do before commenting on videos and also read the articles and comments before replying as well. Have a great day Daniel Chieh.

    • Replies: @Pericles
  13. Wally says:

    That’s a false, strawman, Leftist argument.

    Libertarians do not advocate the elimination of Government or laws.

    • Replies: @Agent76
  14. Wally says:

    Indeed, the gatekeepers are losing control and don’t like it one bit.

    Here are the college professors:

  15. CanSpeccy says: • Website

    Professor Jordan Peterson’s deconstruction of the post modernist thought that underlies the war on traditional morality, whiteness, and free market economics is profound. But, as Anon presumably intended to demonstrate, it also demolishes the significance of IQ testing by showing that the way people relate to problems in the world, including the items in an IQ test, depends on what, for the sake of brevity, can be referred to as their mindset, which is to say their understanding of the world in the political, economic, social, philosophical and spiritual dimensions. What that means is that people with different views of the world will deal with a real world problem in a multitude of different ways; some ways proving to be more productive or remarkable than others, and some perhaps being seen as “genius,” but a genius that no IQ test could ever predict.

  16. Agent76 says:

    Please view the whole video before commenting. This is not rocket science!

    • Replies: @Wally
  17. Anon • Disclaimer says:
    @James Thompson

    Comment swallowed by UR software. Short version in response to #6, #11, #12. Thanks for confirming a part of my intuition about the Flynn Effect on top of Flynn’s own and Ron Unz’s cultural and sociological explanations (mine included antibiotics and learning to attend to radio comedy in a long list). The prominence of Raven’s Matrices in demonstrating the Flynn Effect over other tests seemed the key to the puzzle and now the young genius has turned the key (surely he is Griffe’s Prodigy). There’s still some interesting work to be done even on a one (or two) time use test like Raven’s (OK once every 8 years). How much faster do the smart learn the rules and thereby increase their score advantage over the not so bright on repeated testing?
    So what advantage does a very high IQ person have as CEO or President? Well, he will start a long way behind in solving ME problems without the depth of knowledge of a 30 year career diplomat or oil industry manager who has been ambassador or local CEO to 3 ME countries. But… it is surely when the nature of the problem has to be reconceptualised that the extra IQ points come into their own. And Raven’s Matrices’ first time tests may well be predictive of the ability to reframe the question.

  18. Pat Boyle says:

    I have a unique perspective on IQ testing.

    I had switched to psychology from economics. One day in some undergraduate psychology class they asked for volunteers to take the Stanford-Binet test. Some graduate students in psychology needed to practice. The Stanford-Binet is of course an individual test where the test giver makes judgements about the answers. It takes a long time to administer and is strictly one subject at a time.

    As it happened this was San Francisco State just as the student riots were beginning. Someone in the Black Student’s Union had invited a large number of black thugs from the nearby Fillmore District. These street people came on campus and trashed the place. They broke up classes in session and beat up random whites.

    My girl friend and one of my male friends were panicked. They knew about my fiery temper and were sure my bleeding remains would be found somewhere in a lump on the lawn. But I was not only safe, I was calm and peaceable. I spent the whole period of the riot taking the Stanford-Binet.

    So I can say with some justice that IQ testing saved my life.

    • Replies: @Wally
  19. AaronB says:

    The Flynn Effect is *particularly* pronounced on the Raven’s Matrices – more so than any other test.

    The RM is also our best and most neutral measure of intelligence…….

    • Replies: @utu
  20. Wally says:

    Please review what you said.

    I’ll spoon feed you:

    Why Libertarianism Is So Dangerous

    A former libertarian abandons his dream of a voluntary world and explains the potential worst-case scenario after the overnight disappearance of government.

    • Replies: @Agent76
  21. Wally says:
    @Pat Boyle

    Your score was …

    • Replies: @Pat Boyle
  22. I have spent some time today working with some Perl Web Scraping software scraping lists of articles from various sites.

    Does that require intelligence?

    Whether it does or not it was fun to write a short piece of code to do a lot of work.

  23. @Daniel Chieh

    If I read IQ as g, I think the answer is: high to nearly perfect.

    See: Working memory is (almost) perfectly predicted by g
    Roberto Colom (a), Irene Rebollo (a), Antonio Palacios (a), Manuel Juan-Espinosa (a), Patrick C. Kyllonen (b)
    (a) Facultad de Psicología, Universidad Autónoma de Madrid (UAM), Madrid 28049, Spain
    (b) Educational Testing Service (ETS), Princeton, NJ, USA
    Received 19 May 2001; received in revised form 16 December 2003; accepted 22 December 2003
    Available online 20 February 2004

    But you have to extract a working-memory latent factor. Otherwise, the low reliability of individual working-memory tests will produce a much less substantial result. [Thus, digit span is a relatively poor test on the WAIS.]

    • Replies: @ANON
  24. utu says:

    The RM is also our best and most neutral measure of intelligence…….

    It is supposed to be culturally neutral. You can be illiterate and still take it. But it is also, in my opinion, the easiest test to prepare for. I think one day of training can improve the score significantly.

  25. ANON • Disclaimer says:
    @Stephen R. Diamond

    I know some bright people where both father and daughter are said to suffer from ADHD (it didn’t stop the daughter getting 10 A*s in GCSE which put her in the top 16 per cent at her highly academic selective school). Now her mother tells me that she is very low on working memory (by what tests I don’t know but by some psychologist).

    It sounds very odd to me but I wonder now if there is a relationship between ADHD and working memory. (BTW though the daughter got an A* in maths with the help of tutoring it is apparently far from her favourite subject).

    • Replies: @Stephen R. Diamond
  26. Agent76 says:

    At least I now know you are a child. “The opinion of 10,000 men is of no value if none of them know anything about the subject.” Marcus Aurelius

  27. Anonymous [AKA "ACCprof"] says:

    As I understand the facts:
    (1) RPM has one of the highest g-loadings of any (sub)test.
    (2) RPM shows one of the largest Flynn effects
    (3) RPM shows one of the largest black-white gaps
    (4) RPM shows one of the largest international gaps

    Since the Flynn effect is surely environmental, and may be “non-real” in some sense (i.e. not reflecting true differences in functional intelligence), this suggests that the black-white and international IQ gaps are likewise environmental, and may be “non-real” in a similar sense. (Of course, by itself this is not conclusive.)

    (1) further suggests that a common interpretation of the Jensen effect is false — that higher g-loadings do not imply a higher genetic component.

    One missing datum that would be useful here is measures of the heritability of RPM in particular.

    I would be curious to hear your thoughts on this line of argument, and whether it is consistent with comparisons of other subtests. (i.e. that g-loadings, and implication in Flynn, racial, and international gaps are positively related.)

  28. Pat Boyle says:

    I never disclose my IQ scores. I will only say that I got an 800 on my GRE verbal.

  29. utu says:

    The claim (1) is irrelevant.

    • Replies: @ANON, @ANON
  30. ANON • Disclaimer says:

    Would you say that it is irrelevant (to his point) if
    1. RPM had a high g loading
    2. RPM had a low g loading
    3. RPM had one of the lowest g loadings?

  31. @Anonymous

    Thanks for this complex question, which you have set out very clearly, but which is complicated to answer.

    Initially I wondered if there really was one paper which would make a start at the answer, and then remembered this one.
    Rushton worked out the heritability and the environmentality of each of the items in two versions of the Raven’s on twin samples, MZ vs DZ, and this gives a partial answer to your questions. I think he presented it at the ISIR conference in Amsterdam in 2007; I remember thinking it was an elegant treatment of the issue. See what you think, but there is much to discuss in your sequence of implications.

    • Replies: @ANON, @utu
  32. Anonymous • Disclaimer says:

    It looks as though it would add considerably to the validity of any conclusions about the level or heritability of g if one also had a measure of the presumptive test sophistication of subjects and also knew how many times they had done RPM tests. The former measure would help distinguish between the testing of a ten year old who had just learned to use pencil and paper and one who had been taught in a mission school over several years to read, write and do arithmetic.

  33. ANON • Disclaimer says:

    Also if low g loaded or just high g loaded?

    • Replies: @utu
  34. ANON • Disclaimer says:
    @James Thompson

    Would it not be very valuable in assessing what valid implications can be drawn to know whether the g or raw scores from RPM were those of first time RPM test subjects or second or fifth, and which first time scores came from the completely illiterate and unschooled as against those from children as literate and numerate and test savvy as you would expect of a child in his fourth year in a mission school?

    • Replies: @James Thompson
  35. @ANON

    I think they are virtually always first time subjects. The other social variables depend on sampling and the country’s educational system and general wealth, so that is far more problematical to specify with precision, but one generally assumes that poorer countries have less well prepared students. However, Rushton did some work in South Africa with local psychologists to ensure that test conditions (proper lighting, seating, privacy etc) and test preparation (ensuring that practice items were understood) were of good quality. Rindermann (2015) has a good review.

  36. utu says:

    g-loading depends on the provenance of g. How was it constructed? What battery of subtests was used to construct this g? Was RPM test part of this battery or not?

    I do not want to get to the whole line of arguments about g but it suffices to say that any declarative statements with g in it should be taken with a big grain of salt. So if you read in Wiki entry on g which is heavily biased to Jensen that RPM have high g-loading which is 0.8 it really means the following:

    In some studies conducted by some guys who are very partial to Jensen’s methodology a test was constructed from which some g was extracted by means of heavily massaged factor analysis and this g had high correlation with RPM.

  37. utu says:
    @James Thompson

    I looked at Rushton’s paper you suggested. It does not have a single graph or table. They are in an electronic supplement to which I could not find access. It is really hard to say from his paper what the question was and what the answer is. You say that when you heard his presentation you found it an elegant treatment. Elegant treatment of what?

    • Replies: @res, @James Thompson
  38. res says:

    Is the spreadsheet linked here what you are looking for?

    • Replies: @utu
  39. Pericles says:

    After all, we have nothing better to do with our time than watch random videos posted by obscure commenters.

  40. utu says:

    Thanks. I still can’t figure out their methodology and the significance of the result. Actually, what is the result? My rule of thumb is that if it is after 11am and I have had one cup of coffee and I still do not understand, then it is not my fault. If they shortened their paper and concentrated on explaining their methodology instead of using half of the paper’s volume to pay tribute to authors of other publications, perhaps they could convey what they really did. But apparently they did not try hard enough. Perhaps they themselves do not think it is of much importance.

  41. utu says:
    @James Thompson

    Thanks. res also provided a link to the supplement (above). It wasn’t helpful unfortunately. Very poor quality of writing is often an indication of poor quality of research. If one can’t explain clearly what he/she did and what the significance of the results is, that is usually very telling.

  42. @ANON

    ADHD does deleteriously affect working memory. Then, you may ask, how can working memory be the basis for g? (ADHD does not much affect g.) The answer, I think, is that ADHD affects working memory performance rather than capacity, and it is capacity which grounds g. Persons with ADHD are uneven rather than depressed in their working memory performance.

  43. @James Thompson

    Is there any evidence (or do you have an opinion) regarding whether Ravens is a better test for (uniformly) experienced subjects or inexperienced subjects? Another way of putting this is whether Ravens measures the ability to deal with novelty or the ability to apply new complex rules. (I would expect the latter.)

  44. Virtually everyone tested on Raven’s is tested once only. It is not used in clinical assessments where repeated testing is required.
    The requirements are shown in the practice items, the test has a logical structure, and it is easy to see that you have to pick an answer, but increasingly difficult to work out what the answers are.
    So, I think it is neither a test of novelty nor of applying new complex rules.
    It is a test of problem solving, using features which are common to most problems: things are changing in some ways, so what happens next?
