The Unz Review • An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Blogview: James Thompson Archive
The Factor Factory

over factored test graph

If you or a family member, beset by a clinical or neurological problem, are given a face-to-face intelligence test, it is likely to be a Wechsler. It is considered the gold standard, and the Full Scale IQ result, the consequence of spending over an hour doing the 10 subtests, is like doing the decathlon: you get a reliable result across a broad range of intelligence domains. It was David Wechsler’s pragmatic approach of testing different skills which gave the Wechsler dominance over other tests, particularly in the clinical domain. In the case of a head injury, even a mild one; or when investigating incipient memory problems or childhood developmental disorders, it is usual to give a broad-band intelligence test to act as a baseline for other investigations. The positive manifold holds across a very broad domain of abilities. What may be evidence of a memory deficit in a bright person may be normal memory in a less able person. To some extent all specialized clinical tests act in the shadow of general intelligence.

Although Wechsler tests are the best known, others like the Woodcock-Johnson have established a very good record, particularly in occupational settings, and yet others like Raven’s Matrices have been widely used cross-culturally. All of them work and produce very useful results. On average, a retest on normal samples will be within 4 IQ points of the original test. Others on the list, as you will see below, include the Kaufman Assessment Battery, Stanford-Binet, Differential Ability Scales and the Reynolds Intellectual Assessment Scales.

However, there is a problem which might impact you if you are given the Wechsler tests. Apart from the Full Scale result they also produce factor scores, and sometimes test givers interpret these factor scores and the discrepancies between them in ways which may go beyond prudent extrapolation. Why is this? Well, you already know my dictum “No-one gets around sampling theory, not even the Spanish Inquisition”.

In my ancient professional history, the Wechsler had 5 verbal subtests, plus one supplementary test; and 5 non-verbal tests, plus one supplementary. So, in addition to the full total, there were two factors, a Verbal IQ and a Performance IQ. Given that each factor summed up 5 or even 6 tests, that seemed reasonable. This approach held up for decades, leading to easy comparability between childhood and adult results, and between middle age and elderly testing. What happened next was either exciting or dismaying, depending on your attitude to what is now called an “update”.

Test constructors seem to have decided to extract more factors from the same subtests, perhaps to give users the impression that they were getting more bang for their buck. This is very silly, because all test result summaries are based on the individual items taken, and it is only worth declaring that a factor exists if it accounts for a large proportion of the variance. Otherwise, just give the subtest scores. If you extract too many factors you run into a sampling problem: your ratio of test items to factors goes down, and although it is not visible to test users, the reliability and validity are compromised. I was taught a simple rule of thumb: you should have at least five times as many people as variables. It might also be the case that you should have at least 5 subtests to each factor. As you will see below, there is a general trend for all intelligence tests to have a lower ratio of tests to factors. That is, the length of the test is pretty static, but as the years go by most of them claim to have found more factors.
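
To make the arithmetic concrete, here is a minimal sketch of the subtests-per-factor ratio. The first two rows use the figures from the older Wechsler described above (10 core subtests, or 12 with the supplementary ones, summarised by 2 factors). The last two rows are hypothetical illustrations, not figures from any published test, and the cut-off of 5 is only the tentative rule of thumb suggested above, not an established standard. The names `Battery` and `RULE_OF_THUMB` are my own.

```python
from dataclasses import dataclass

# Tentative rule of thumb from the text: at least 5 subtests per factor.
RULE_OF_THUMB = 5


@dataclass(frozen=True)
class Battery:
    """A test battery described only by its subtest and factor counts."""
    name: str
    subtests: int
    factors: int

    @property
    def ratio(self) -> float:
        """Number of subtests available to support each claimed factor."""
        return self.subtests / self.factors

    @property
    def meets_rule(self) -> bool:
        """True if the battery has at least RULE_OF_THUMB subtests per factor."""
        return self.ratio >= RULE_OF_THUMB


batteries = [
    # Figures taken from the description of the older Wechsler.
    Battery("Older Wechsler, core subtests, Verbal/Performance", 10, 2),
    Battery("Older Wechsler, with supplementary subtests", 12, 2),
    # Hypothetical: same test length, more factors claimed.
    Battery("Hypothetical: same 10 subtests, 4 factors claimed", 10, 4),
    Battery("Hypothetical: same 10 subtests, 5 factors claimed", 10, 5),
]

for b in batteries:
    verdict = "ok" if b.meets_rule else "too thin"
    print(f"{b.name}: {b.ratio:.1f} subtests per factor ({verdict})")
```

The point of the sketch is simply that if the test stays the same length while the factor count rises, the ratio falls below the rule of thumb very quickly.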

Revisiting the Historical Increase in the Number of Factors Measured by Commercial Intelligence Tests: An Update and Extension of Frazier and Youngstrom (2007)
Ryan J. McGill and Thomas J. Ward, Thomas W. Frazier and Eric A. Youngstrom. ISIR 2018 poster session.

Over-factored intelligence tests

The standardization sample sizes are mostly pretty good, given that this is an expensive process involving face-to-face testing of over an hour. However, some of the samples are small, and the number of presumed factors identified is unbelievably high. Here is the full poster:

Indeed, the same authors say of the most frequently used WISC-V (Wechsler Intelligence Scale for Children) that the best factorial solution for the test is one factor: general intelligence.

Construct validity of the Wechsler Intelligence Scale for Children – Fifth UK Edition: Exploratory and confirmatory factor analyses of the 16 primary and secondary subtests
Gary L. Canivez (1), Marley W. Watkins (2) and Ryan J. McGill (3)

(1) Eastern Illinois University, Charleston, Illinois
(2) Baylor University, Waco, Texas
(3) William & Mary, Williamsburg, Virginia

In sum, the tests are fine, but too many factors are being claimed. This allows some clinicians free rein to speculate as to why a person does well on one factor and not another, proposing that there is a deficit due to some extraneous cause. This is a common claim in medico-legal cases. I think that test constructors and interpreters should stick to more reliable, valid and prudent claims. Intelligence tests, so long as they sample a broad range of abilities, can give you an accurate measure of overall ability. They can probably distinguish between verbal and non-verbal, but not always with confidence. At a pinch they can hazard a guess about three factors, but that is pushing it.

Don’t let yourself or your family members be over-factored.

• Category: Science • Tags: IQ 
  1. if Woodley of Menie is right (“g” is going down, while more specific abilities are going up) these “many factored” tests may be more appropriate 100 or more years from now:)

  2. res says:

    Thanks. I am not used to thinking about how sample size relates to the number of reliable factors.

    Table 1 of your final link gives the % variance for each factor of an oblique four factor solution. I was surprised by how low the % variance explained was for each factor: 36.55, 6.5, 3.38, 3.03. Is this a typical result?

    Supporting information for that paper is available at

    Figure S1 gives a scree plot comparing the WISC-V standardization sample (N = 415) to random data. I think that is a good thing to refer to when responding to Shalizi’s argument against g. It also helps put the explanatory power of each factor in perspective. The plot suggests to me there is some justification for one, two, or four factor models. But the first factor is clearly dominant.

    Tables S1 and S2 are similar to Table 1 except they give the results for two, three, and five oblique factor solutions.

  3. dearieme says:

    “beset by a clinical or neurological problem …”: does this imply that IQ results from one’s salad days might be useful years later, allowing comparisons that might lead to diagnosis?

    • Replies: @James Thompson
  4. @dearieme

    Yes, a baseline gives a good indication of previous levels from which one may have fallen. The National Adult Reading Test, however, gives a very good prediction of those levels, if only for English speakers.

    • Replies: @dearieme
  5. dearieme says:
    @James Thompson

    Ah well, then it’s a pity that I don’t have any numerical results for the tests I sat at 11 and 21.

    My university exam results were pinned on noticeboards for anyone to see – indeed, some were published in national newspapers. However it’s presumably a bit much to expect anyone to get hold of my results and have a stab at inferring an IQ from them.

  6. After ISIR 2018 and the discussions I had there, I think that the usual standardisation sample is probably too small for factor analytic precision. The variance accounted for is small, but that is not a problem if the main factor really absorbs most of the variance, which it often does. The real difficulty is deciding whether your factorial solution will replicate on a new sample, essentially the thousands who will take the test over the next decade. A national birth sample, or something close to it, will give more powerful results.

    • Replies: @Wizard of Oz
  7. Anonymous [AKA "CGVibes"] says:

    As a clinician I watch the debates between academics and researchers keenly.

    But, CHC theory (on which the Woodcock-Johnson is based), with the three distinct levels (g / Broad abilities / Narrow abilities) can be very practically useful for assessing possible reasons for specific learning problems. These broad and narrow abilities (which have been investigated via factor analysis) have practical utility.

    For example, testing Phonetic Coding (narrow ability) from Auditory Processing (broad ability) can help to pick up students who struggle with reading (decoding) and spelling (encoding). Within Fluid Reasoning (broad ability) differences in ability for induction and deduction (narrow abilities) can help with instruction guidelines for maths teaching.

  8. You forgot to add: “And blacks are inferior.”

  9. “which it often (sic) does” What kind of useful tests (and for what uses) don’t have a dominant g factor?

  10. g usually emerges, so long as there is a broad range of tasks and of test takers.
