The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
 BlogviewJames Thompson Archive
Origins of IQ Tests
Psychological test. 1990.0034.173.


When I started work in September 1968 one of the first things I was taught was that intelligence testing had a long history, and that many of the subtests in the Wechsler assessments I used had been taken from previous research. Kohs’ blocks (1920), I used to mutter, when people talked about Block Design. I was also taught something about the Stanford-Binet tests that I would not be using, because some clinicians still used them, and there was historical data I would need to know about. In hearing about skilled Binet testers I learned about dynamic testing: going from one domain to another as quickly as possible, just to establish general levels efficiently. I also learned that such procedures were only possible once one had achieved a very good knowledge of the test.

I was required to know my material almost by heart so that I could concentrate on every aspect of the patient’s behaviour. After 200 test administrations I began to feel confident I had seen it all, and knew all error types intimately. On my 201st test administration I encountered an entirely new error on Block Design, a scope and size error which was highly unusual. Even psychologists can learn something.

Does history matter? I think so. The early history of intelligence testing allows us to test the idea that IQ items are “arbitrary” and have no relevance to real life problems.

First publication of subtests in the Stanford-Binet 5, WAIS-IV, WISC-V, and WPPSI-IV. Aisa Gibbons, Russell T. Warne. Intelligence, 2019, Volume 75, July–August 2019, Pages 9-18.

The authors discuss the pre-history of intelligence testing from 1905 onwards. The period up to 1920 was extremely productive, and testing was popular, perhaps because of its widespread use in the military. Binet was interested in the lower levels of ability, Terman in the highest levels. Tests have to cater for the entire range, all 7 tribes of intellect. Not only that, they have to maintain discriminatory power throughout the whole range, though that is hard to do at the extremes.

Wechsler favored test formats and items that (a) showed high discrimination in intelligence across much of the continuum of ability, (b) produced scores with high reliability, (c) correlated strongly with other widely accepted measures of intelligence, and (d) correlated with “pragmatic” subjective ratings of intelligence from people who knew the examinee—such as a work supervisor (Wechsler, 1944).

That is a good summary of what an intelligence test item must achieve: discrimination, reliability, validity with other tests and, most importantly, with intelligence in everyday life.

Gibbons and Warne show that many tests go back a long way, and are earlier than generally realized. Their list of tests is an excellent way to understand all the tasks which have constituted the core elements of intelligence testing.

I learned a great deal reading through this section of the paper. For example, I did not know that Binet said of his early reasoning test that it was the best of the lot:

“the 1908 scale (of reasoning) has three images, each containing at least one human figure. The child then was asked to describe the picture, and more complex responses based on interpretation (rather than simply naming objects in the image) were viewed as indicative of greater intellectual ability. Binet found this subtest so useful when diagnosing intellectual disabilities that he wrote, ‘Very few tests yield so much information as this one. . . . We place it above all the others, and if we were obliged to retain only one, we should not hesitate to select this one’ (Binet & Simon, 1908/1916, p. 189).”

Intelligence goes beyond the obvious.

Here are some historical points which were news to me:

Jean Marc Gaspard Itard was the first to use a form board-like task when he studied and educated a young boy found in the wild (named the “wild boy of Aveyron”) in 1798.

The very similar visual puzzles and object assembly subtests have an origin in the puzzles used for entertainment and geography education, which were first created in the 1750s in England and were in wide-spread use in the early 20th century when the first intelligence tests were being created (Norgate, 2007).

One discovery that we found striking was the diverse sources of inspiration for subtests. While the majority did have roots in the creation of cognitive tests, others have their origin in games (the delayed response subtest, the object assembly subtest), classroom lessons (the block design subtest), the study of a feral child (form boards and related subtests), school assessments (vocabulary subtest) and more. To us, this means that items on intelligence tests often have a connection with the real world—even when they are presented in a standardized, acontextual testing setting. Additionally, this undercuts the suggestion that critics of intelligence testing often make that intelligence test items are meaningless tasks that are divorced from any relationship to an examinee’s environment (e.g., Gould, 1981).

On the other hand, one criticism of intelligence tests seems justified from our study: subtests that appear on popular intelligence tests have changed little in the past century (Linn, 1986). While one could argue that the enduring appeal of these subtests is due to their high performance in measuring intelligence, the fact remains that many of these subtests were often created with little guiding theory or understanding of how the brain and mind work to solve problems (Naglieri, 2007). While sophisticated theories regarding test construction and the inter-relationships of cognitive abilities have developed in recent decades (e.g., Carroll, 1993), it is often not clear exactly how the tasks on modern intelligence tasks elicit examinees to use their mental abilities to respond to test items.

One way to test this criticism is to think of new tests more suited to the present age. Of course, others have already had that thought, and have created computer games which measure intelligence. Fun, but is this a big advance? It is only a gain if the results are more accurate, more predictive of real-life achievements and more speedily obtained. That is a hard bar to clear, since reasonable overall measures can be obtained in a few minutes. More likely, corporations are measuring our intelligence very quickly and surreptitiously by noting our Google searches, Facebook likes, and perhaps even commenting histories.

A more pressing problem, to which the authors allude in passing, is that some new-fangled tests are launched each year, and most fall out of use. The reason is that Wechsler testers have now become highly pragmatic, and do not take kindly to complicated administration procedures, nor to test materials which are difficult to assemble and present quickly.

The reality appears to be that any puzzling task taps ability, and there are diminishing returns when using new psychometric tasks. This is the familiar “indifference of the indicator” which Spearman proposed in 1904. This does not exclude finding that individuals have strengths and weaknesses in specific domains, but simply that all tasks lead to g, either quickly or slowly, to slightly varying degrees.

• Category: Science • Tags: General Intelligence, Intelligence, IQ 
35 Comments to "Origins of IQ Tests"
  1. dearieme says:

    What’s been the most satisfying part of your career in IQ, doc?

  2. dearieme says:

    Irresistible to those who used to take donkey-rides on the beach.

  3. Lot says:

    “though that is hard to do at the extremes.”

    Isn’t making IQ tests extend to the highest range simple to do by taking an existing test that goes relatively high, and then having the perfect or near perfect scorers repeat it, or repeat it with a stricter time limit?

    • Replies: @Anon, @EH
  4. Anon[152] • Disclaimer says:

    “though that is hard to do at the extremes.”

    Isn’t making IQ tests extend to the highest range simple to do by taking an existing test that goes relatively high, and then having the perfect or near perfect scorers repeat it, or repeat it with a stricter time limit?

    Speed is one, but not the only component of intelligence. I think the recent paper that found that smarter brains have more sparsely connected neurons relates to speed. But the size of different brain parts is also a factor, and other things as well.

    So to test an extreme right tail person, you could speed up certain tests. But for instance, with vocabulary, you simply test a lot of words and include more difficult words in the smart person test. This is one area where the test maker doesn’t have to be as smart as the test taker.

    For things like Raven’s matrices, it almost seems like smart people would have to come up with the new test questions, although trial and error may also produce new patterns.

    For digit span and block tap span type tests, they are easy to scale since they are scored from a true zero on up in integral steps.

    Validating tests for the right scale is almost impossible given the few people you could test them on.

  5. res says:

    Validating tests for the right scale is almost impossible given the few people you could test them on.

    That does seem to be the trouble. This page might be worthwhile for those interested in high range IQ tests.
    Uncommonly Difficult IQ Tests

  6. Cortes says:

    “Nine times out of ten it’s your imagination. It’s the tenth that’s the killer.” recently read about advice for special forces guys.

    Everything’s ok until it’s not and alertness keeps one alive. The terrific wartime diary “With The Jocks” has numerous examples of sanctions against the platoon members (country boys by and large) going out on patrol against the Wehrmacht without ammunition or helmets etc. And that was in a military environment with geographic intake – KOSB. Imagine if drafted into a vast war machine where nobody cared…?

  7. EH says:

    About 15 years ago in the Ultranet/Mega Society East mailing list I proposed a way of extending regular IQ tests up to 4 sd beyond their usual range by a calibrated reduction of the subjects’ capability using variable concentration anaesthetic gasses and choice reaction time tests. This is based on Rasch measurement theory; the odds of getting an answer correct are the ability divided by the difficulty, so to get the likelihood into the maximally informative 50:50 range, one can not only increase the difficulty, but reduce the ability. (Terse since typed on phone.)
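
The Rasch relation described in this comment can be sketched in a few lines. This is only an illustration of the standard one-parameter logistic form, in which ability and difficulty are expressed in logits; the function names and the example numbers are hypothetical and do not come from any published test.

```python
import math

def p_correct(theta, b):
    """Rasch probability of a correct answer.

    theta: person ability in logits, b: item difficulty in logits.
    Equivalent to A / (A + D) with A = exp(theta), D = exp(b),
    so the odds of success are A / D.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def odds(theta, b):
    """Odds of a correct answer, p / (1 - p)."""
    p = p_correct(theta, b)
    return p / (1.0 - p)

def item_information(theta, b):
    """Fisher information of one Rasch item, p * (1 - p); peaks at p = 0.5."""
    p = p_correct(theta, b)
    return p * (1.0 - p)

# Hypothetical numbers: a subject 3 logits above the hardest available item.
theta, b = 4.0, 1.0
print(round(p_correct(theta, b), 3))          # near ceiling, little information
print(round(item_information(theta, b), 3))

# Raising difficulty by 3 logits or lowering ability by 3 logits
# gives the same success probability, which is the point of the proposal.
print(p_correct(theta, b + 3.0), p_correct(theta - 3.0, b))
```

Only the difference between ability and difficulty enters the model, which is why reducing ability is formally interchangeable with increasing difficulty. Whether a calibrated impairment shifts ability by a known, uniform amount is an empirical question the sketch does not address.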

    • Replies: @James Thompson
  8. @EH

    Thanks. Up to 6 sd achievable for the last 4 decades by giving a hard Maths test designed for 18-year-olds to 13-year-olds. SMPY. Neat, not expensive, highly predictive.

    • Replies: @EH, @prime noticer, @Bruno
  9. EH says:
    @James Thompson

    Nice to hear from you, hope you’re enjoying the conference.

    I’d like to ask a favor – does anybody there have any pull with Riverside Publishing? I’d really like to get a graph of the Rasch / SB5 CSS / WJ W scores and s.d.s for a full test similar to this one for the block rotation subtest I mentioned to you a few years ago. Kevin McGrew, who made the original, said he thought Riverside would consider it proprietary, but I think it would give a better understanding of the meaning of their Rasch measures, of intelligence itself, and would encourage the use of Riverside’s tests. A table of the means and s.d.s for different ages would be nearly as useful.

    Thanks again for your enlightening articles.

    • Replies: @res, @res
  10. Well, I met Woodcock for the first time, and had a good conversation with him, but I don’t have any contact with Riverside, or know the best person on this. Perhaps Prof Nathan Kuncel, University of Minnesota, a specialist in occupational selection is in fact the best person to approach.

    • Replies: @EH
  11. res says:

    Not exactly what you are looking for, but have you seen this?
    Figure 3 has some growth curves by age for individual subtests.

    This (389 page!) document:
    discusses age-equivalent scores which seems relevant.

    Calculation of Age- and Grade-Equivalent Scores
    In the WJ IV, bootstrap-based smoothed curves from the entire age range for a test or cluster were used to generate the age- and grade-equivalent scores that are reported by the online scoring program. An age-equivalent score was obtained for each W score (from the y-axis of the fitted curve) by identifying the corresponding age (in months) along the x-axis. Grade-equivalent scores were obtained in the same manner, except that the smoothed curves were based on bootstrap samples of norming study participants sorted in order by grade placement (to the tenth of each grade). Points along these curves represent the median W score (REF W) of students at each tenth-of-grade placement. A grade-equivalent score was then obtained for each REF W score (from the y-axis of the fitted curve) by identifying the corresponding grade (in tenths of a year) along the x-axis.

    Pages 136-142 including Figures 5-3 through 5-8 looks like a nicer version of the plots in the first document. These are expressed in “W Score Difference From Age 6” which maps nicely into your plot IIUC. The figures contain a wide variety of factors and composites, but only the means. The other big issue I see relative to your plot is these cover ages 0-90 so childhood is fairly compressed and much harder to extract accurate numerical estimates for.

    That graph of yours is very informative. Thanks for creating your version (it is easier to read than the original, the y-axis grid is especially helpful). I don’t know what process you used to create it (i.e. how hard my request would be to do), but would it be possible to merge in the following slide with ages after 25? The y-axis looks like the same scale. Having those both (so ages 2-100) on the same graph would be even more interesting.

    I hope you are able to find the data you are looking for. The graph you have in mind would be a valuable addition to the (accessible) literature.

    P.S. The SB5_ASB_3.pdf link in your blog post is broken now. Here is another:

    • Replies: @EH
  12. res says:

    This paper might also be of interest. Again, means only. I had a little trouble finding the PDF, but it is buried in the linked journal issue. See page 590 for Figure 2. Curve of mean W-scale Broad Cognitive scores by month of age for the WJ-R normative sample (ages 2.21).

    If you want the SDs I think the thing to look for is the Woodcock Johnson Standard Scores computation. Here is an excerpt from page 83 of the WJIV Technical Manual linked above.

    Calculation of Percentile Rank and Standard Score Norms

    The WJ IV standard scores are calculated using a special procedure that combines features of both area and linear transformations of the distribution of scores (McGrew et al., 1991; McGrew & Woodcock, 2001). The percentile rank and standard score norms for the WJ IV were constructed as follows.
    1. The WJ IV, as in prior editions of the test, employs a unique procedure for maintaining the real-world skew of score distributions. Different standard deviations (SDs) are estimated for the two halves of the score distribution (High SD and Low SD value) above the median REF W at different ages.
    2. For each normative comparison (age- or grade-based) for each test and cluster, the mathematical algorithms representing the REF W score equations and either the High SD or Low SD are used to calculate the percentile rank and corresponding standard scores for each individual’s obtained score. The standard score scale is based on a mean of 100 and standard deviation of 15.
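
One way to picture the two-SD procedure quoted above is the rough sketch below. It is not the publisher's algorithm: the excerpt does not give the actual equations, so the normal-curve percentile step, the function name and all the numbers are assumptions made purely for illustration.

```python
from statistics import NormalDist

def standard_score(w, ref_w, sd_low, sd_high):
    """Convert a W score to a (standard score, percentile rank) pair.

    Assumption: scores at or above the reference (median) W use the
    High SD and scores below it use the Low SD, and the percentile is
    read from a normal curve. The manual excerpt does not confirm this.
    """
    sd = sd_high if w >= ref_w else sd_low
    z = (w - ref_w) / sd
    percentile = NormalDist().cdf(z) * 100.0
    return 100.0 + 15.0 * z, percentile

# Hypothetical norms for one age group: median W of 500,
# a wider spread below the median than above it.
ref_w, sd_low, sd_high = 500.0, 10.0, 8.0

for w in (490.0, 500.0, 508.0):
    ss, pr = standard_score(w, ref_w, sd_low, sd_high)
    print(w, round(ss, 1), round(pr, 1))
```

With these made-up norms, 8 W points above the median and 10 W points below it both sit one SD from the centre, which is how separate High and Low SDs can preserve a skewed raw distribution while still reporting on a mean-100, SD-15 scale.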

    OK. I think Appendix B has the mean/SD information you want (for subtests only) in table form. See pp. 253-277
    And similar information for clusters (e.g. General Intellectual Ability) is in Appendix C. See pp. 279-305.

    P.S. Do you know if your plot is actual +-1/2/3 SD values or did they just plot multiples of the overall SD (or low/high SDs, as discussed above)? From looking at the plot I would guess either low/high SD multiples or discrete measurements (probably the latter given the differences in curve slopes).

    • Replies: @EH
  13. EH says:
    @James Thompson

    The Woodcock? I’m impressed, not many can drop a name like that. Even Taki, a name-dropper with decades of professional experience, would wear steel-toed boots before dropping one that solid. Thanks for the suggestion of Prof. Kuncel, and I’ll also try Deborah Ruf.

    • Replies: @James Thompson
  14. EH says:

    Wow! Thanks, res. The first link’s fig. 3 is a bit difficult to make out the labels, but the W-score vs. age curves for different clusters vary far more than I would have expected and often show more decline with age as well. The WJ technical manual in the second link is an amazing goldmine that I’ll be spending months if not years exploring. The SEM (error of measurement) vs. difficulty graphs in particular are fascinating, I would have thought the error would rise much earlier than ~550, which is close to +5 sd for block rotation (if I recall correctly 510 mean, 8.5 s.d), and some items seem to have difficulties of over 600!

    I think I have the remake of the older range of the block rotation score graph on my HDD, but my laptop ist geborken (water spill). I did them in using layers to overlay Bezier curves over the original lines and the measurement tool to lay out the axis scales. GIMP or Photoshop would do just as well. When I can get my password file off the HDD so I can log in to the blog I’ll update the broken link.

    Thanks again for your amazing research, res, I’ve been at a standstill for years but now it’s as if there were no time lost.

    • Replies: @res
  15. res says:

    You are very welcome. I look forward to hearing about what you discover! Your block rotation post and graph have been a useful resource for me to help understand how growth by age compares with SDs within peers.

  16. EH says:

    Yes, Appendix C of the WJIV technical manual starting on page 279 is almost exactly what I’m looking for, though as you note it doesn’t give the spacing of all the s.d.s at each age – I think the McGrew graph does, looking at the 5-10 age range, particularly the +/- 2 and 3 s.d. curves. The means and s.d.s for general intelligence are substantially larger in the technical manual than for the block rotation subtest, e.g. about 517.5 and 12 for ages in the 40s.

    Thanks very much again!

  17. @EH

    An agreeable man, with whom I discussed the Wechsler/Woodcock divide, and the rise in the use of his tests.

  18. James, if I wanted a relatively quick, accurate estimate of my IQ, what would you recommend?

    • Agree: Macumazahn
    • Replies: @Anon, @James Thompson
  19. @James Thompson

    this tests mathematical ability though, which is related, but not exactly the same thing being tested on intelligence tests.

    many genius ability level mathematicians are surprisingly not very smart the moment they leave the math room. math ability like that seems to be highly specialized. many math geniuses are not solid all around thinkers.

    sort of like testing near superhuman reaction times. it’s related to intelligence, but the fastest guys aren’t going to be physicists. their reactions are much faster than physicists, but they’re used for sports or war. or, today, video games.

  20. Anon[424] • Disclaimer says:
    @Yapius the 2nd

    You can use the Wilson test, or the Kent test, or both. They are short, and they have an approximate equivalence with the WAIS.

  21. Guys, what do you all recommend for an IQ test? One that is good for as many people as possible. I am gonna try to get my nephew to take one just to see how well he does.

  22. @dearieme

    Presumably watching all the high-IQ whites and Jews deconstruct and dissolve the social fabric, the nation, and the very idea of what it means to be human. But, hey, whites are smarter than José Olé. With so few victories to celebrate in Weimar Amerika nowadays even a pyrrhic one must be savored!

    Yes, the Mexicans, blacks, and Muslims are violent, dumb, etc.—but they also are not consumed by a barely concealed desire to genocide themselves. Whites, especially the high IQ ones, certainly are, leading all of us into the depths. Or am I mistaken and it is the beaners, moon crickets, and falafel-jockeys that are the ones aggressively proselytizing the Gospel of trannies, poofters, vaccination avoidance, sexualization of children, junk culture, drug addiction, and self hate? Because “social justice”, “safe spaces”, and the perennial wailing about “toxic masculinity” originated in Ghana and Pakistan, right?

    (A) Intelligence: is the ability to solve problems that you have never seen before.
    (B) Creativity: is the ability to come up with something new.
    In a sense (A) always involves (B), because in (A) you have to come up with something “new to YOU”. Nevertheless, (B) usually implies “new to EVERYONE”.
    Creativity: the ability to create.
    Create: Etymology: from Latin creātus, “to beget, give birth to”.

  24. My autism has detected a missing “had” in line 3. Nothing personal, doc. To me, trivial literals that a normie would skip are like having grit in my eye. Luckily I miss quite a few.

  25. @Yapius the 2nd

    Oh dear, a good question, and I may have to give a long answer.
    If you look at my recent postings, there is a series I have done on very quick tests. Getting the material might be difficult but you might find a psychologist who could administer some of those.
    Getting someone to give you Raven’s matrices would be a good option. It will be 40 minutes but will be reliable.

    • Replies: @EH
  26. The very similar visual puzzles and object assembly subtests have an origin in the puzzles used for entertainment and geography education, which were first created in the 1750s in England

    What were these puzzles? Any modern equivalents?

  27. padre says:

    These IQ tests seem to me like statistics or opinion polls, they tell you nothing about nothing! You can bend them and twist them till they suit your purpose!

    • Replies: @Wizard of Oz
  28. EH says:
    @James Thompson

    For someone with a verbal tilt, the Miller Analogies Test is a good option with lots of top. Old (early ’90s) “10 SATs” practice books and an online IQ conversion table are also a good option. Though not normed as well as a proper IQ test, being made for teens of 25 years ago, it’s a much better test than the Raven’s in most ways. Come to think of it, the fastest estimate of IQ would be from past standardized test scores, hardly anybody hasn’t taken a bunch of these already. Wonderlic is worth a mention, too.

  29. Bruno says:
    @James Thompson

    6 sd is one in a billion. It’s not testable yet. Even 5 sd, around 1 in 3 million, is not testable.
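
The rarity figures quoted here can be checked with a short calculation. The sketch assumes a normal distribution of scores with mean 100 and SD 15, which is itself a strong assumption this far into the tail; the function names are illustrative.

```python
import math

def upper_tail(z):
    """Proportion of a normal population scoring above z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def one_in(z):
    """Rarity expressed as 'one person in N' scoring above z."""
    return 1.0 / upper_tail(z)

def iq(z, mean=100.0, sd=15.0):
    """IQ on the conventional mean-100, SD-15 scale."""
    return mean + sd * z

for z in (4.0, 5.0, 6.0):
    print(f"+{z} sd  IQ {iq(z):.0f}  about 1 in {one_in(z):,.0f}")
```

Under that assumption +5 sd comes out at roughly 1 in 3.5 million and +6 sd at roughly 1 in a billion, in line with the figures above.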

    In my opinion, math tests, when they are not knowledge intensive, are one of the best IQ tests – but it’s a sufficient and not a necessary condition of g – when they are tricky like math Olympiad problems, which are the highest range IQ test available.

    But contrary to Derbyshire, I don’t think the threshold is anywhere 1 in 1 million. The average candidate presented (a bronze medalist) would be around 145 and the average gold medalist around 150 IQ.

    But the perfect scorers – 2 or 3 each year – are probably around 165 in g and 4.3 sd is the highest you can seriously go from what I have seen.

    The Vanderbilt test and Duke test – SAT score at 12/13 – which allowed them to track 1 in 10,000 scorers (would be a 155 IQ) have not retested the children as adults. I guess they would regress to around 140. That’s why they are a brilliant group but nowhere near geniuses.

    IQ would need a Bill Gates giving 1 billion to have a team of people really trying to identify and measure differences in abilities.

    It’s possible that the GCHQ test – very amusing but biased toward teen geek culture – could have a 160 IQ discrimination level.

  30. @padre

    Maybe you have the good fortune for it not to matter for your welfare that you haven’t learned how to get something out of statistics, opinion polls or IQ tests or even that other, smarter or better educated people do. Good luck to you.

    • Replies: @obwandiyag
  31. IQ tests intelligence, but it does not test for the most important feature of humanity:


  32. GMC says:

    LOL – What good is a high IQ, if you work as a yes man or woman for some NWO government? Granted you may be able to come up with fantastic ways to screw your countrymen, but is that the IQ talkin or your severely flawed personality? IQ tests should test common sense, and whether you know how to rebuild an engine, build a house from scratch, keep 50 men on a job working together, fly airplanes, sail ships. College boys/girls with daddy’s big money all can have big IQs but ain’t worth a shit in normal life. Unless you work for the Governments, Corporations or MSM.

  33. Tim too says:

    “Physician, heal thy self…” an existential IQ test.

  34. @Wizard of Oz

    You are so stupid you don’t even get what padre said. And we don’t need an IQ test to know that.

  35. Schmedly says:

    Why is it NO ONE has any problem admitting “Asians” have higher IQs. But just imply blacks have lower ones, and you’re labeled a RACIST ???
