The Flynn Effect, the rise in raw scores on IQ tests over the generations, is one of the most interesting phenomena in all the human sciences. It was first noticed in the 1940s, but for a long time little attention was paid to the fact that IQ test publishers had to renorm their tests periodically because people kept doing better on them. Political philosopher James Flynn began exploring this pattern from around 1979 onward, and the phrase “Flynn Effect” was coined in his honor in 1994’s The Bell Curve.
One interesting aspect of the Flynn Effect is that it tends to be larger on the less culturally biased tests, such as the outer-space-looking Raven’s Progressive Matrices.
Historically, much effort was put into the obvious challenge of developing IQ tests that are stable across space, from culture to culture. In contrast, nobody until Flynn paid all that much attention to the question of IQ tests being stable across time.
For example, the alien-looking Raven’s Matrices IQ test, introduced in the 1930s in the hope of being more culture-free than previous IQ tests, has seen a huge Flynn Effect of around 3 points per decade, or a full standard deviation (15 points) in a half century. A score on the Raven’s that would have put you at the 50th percentile a half century ago would put you at only the 16th percentile today.
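The percentile arithmetic in the paragraph above can be checked against the normal distribution. This is a minimal sketch, assuming scores are normally distributed with a standard deviation of 15, that the Flynn gain is a steady upward shift of the population mean, and that the spread of scores stays the same; `normal_cdf` and `percentile_today` are hypothetical helper names chosen for this example.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def percentile_today(old_score: float, points_per_decade: float, decades: float,
                     mean: float = 100.0, sd: float = 15.0) -> float:
    """Percentile (0-100) that a performance originally normed at `old_score`
    would earn against today's norms.

    Assumes a linear gain of `points_per_decade`, a normal distribution,
    and an unchanged standard deviation.
    """
    mean_shift = points_per_decade * decades  # how far the population mean has moved
    z = (old_score - (mean + mean_shift)) / sd
    return 100.0 * normal_cdf(z)


# Raven's: about 3 points per decade for five decades, starting from the old median.
print(round(percentile_today(100, 3, 5), 1))  # prints 15.9
```

A 15-point gain puts the old median exactly one standard deviation below today's mean, and about 15.9% of a normal distribution lies below that point, which rounds to the 16th percentile. Plugging in the Wechsler's roughly 2 points per decade instead gives a smaller drop.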
The more human-seeming Wechsler Intelligence Scale for Children (WISC) has seen a still-substantial Flynn Effect of about two points per decade, though that’s less than on the Raven’s.
Importantly, the size of the Flynn Effect from 1947 to 2002 differed sharply among the WISC subtests, as shown in the table above: from only 2 points over the 55 years on the “Information” and “Arithmetic” subtests to 22 points on “Picture Arrangement” and 24 points on “Similarities.” (In the table above, the Flynn Effect column is taken from my 2007 review in VDARE of Flynn’s book What Is Intelligence?)
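To put these subtest gains on the same per-decade scale as the Raven’s and full-scale WISC figures, divide each 55-year total by 5.5 decades. This is a rough sketch that assumes the gains accrued at a steady rate over 1947 to 2002 and that the subtest figures are in the same IQ-point metric; the dictionary holds only the four values quoted in the text, and `per_decade` is a hypothetical helper.

```python
# WISC subtest gains from 1947 to 2002, as quoted in the text (points over 55 years).
# Assumes each gain accrued at a steady (linear) rate over the whole period.
gains_1947_2002 = {
    "Information": 2,
    "Arithmetic": 2,
    "Picture Arrangement": 22,
    "Similarities": 24,
}
YEARS = 2002 - 1947  # 55 years


def per_decade(total_gain: float, years: int = YEARS) -> float:
    """Convert a total gain over `years` into an average gain per decade."""
    return total_gain / years * 10


for subtest, gain in gains_1947_2002.items():
    print(f"{subtest:<20} {per_decade(gain):.2f} points per decade")
```

On this scale the fastest-rising subtests gained roughly 4 points per decade, while Information and Arithmetic gained well under half a point per decade.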
The kinds of cognitive faculties that come up in normal conversation, such as vocabulary, arithmetic, and general knowledge, have seen only small Flynn Effects, which is why the Flynn Effect isn’t easily noticeable in much of daily life (although I’ll point out below where it can be seen).
Recently, James Thompson’s Psychological Comments had a table of the “cultural load” of each WISC subtest from a 2013 paper:
Kees-Jan Kan, Jelte M. Wicherts, Conor V. Dolan, and Han L. J. van der Maas. “On the Nature and Nurture of Intelligence and Specific Cognitive Abilities: The More Heritable, the More Culture Dependent.” Psychological Science 24(12) 2420–2428
… Cultural load was operationalized as the average proportion of items that were adjusted in each subtest of the WAIS-III when the scale was adapted for use in 13 countries.
I presume that means adjustments in questions beyond simple translation. IQ test publishers validate new editions of their tests in each country in which they intend to sell them, and that lets them notice proposed questions that don’t work well due to local idiosyncrasies. (In contrast, the PISA international school achievement tests have a “we’ll fix it in post-production” philosophy of dropping poorly designed questions after the PISA test is given. But in either case, it’s important to figure out at some point which questions just don’t work the same across space and which ones work well around the world with just simple translations.)
Kan, Wicherts, and their colleagues found that heritability is strongest on the most culture-loaded subtests, which is very important. But today I want to focus on the potential implications of their data (the Cultural Load column in my table above) for better understanding the Flynn Effect.
My table above combines the two sets of figures for the Wechsler subtests. (Note the oranges-to-tangerines comparison of WISC [Flynn Effect] to WAIS [Cultural Load]; there are a ton of technical issues here, such as the Digit Span subtest being missing from Flynn’s data, but I’m just going to blunder onward.)
Eyeballing my table, I see what looks like a moderate negative correlation between the size of the Flynn Effect and the size of the Cultural Load. The correlation is -0.44.
This overall pattern shouldn’t be surprising because it’s in line with the general difference between the Raven’s and the Wechsler’s: the more a Wechsler subtest is like the Raven’s, the higher the Flynn Effect. Conversely, the more culture-dependent a Wechsler subtest is, the lower the Flynn Effect.
For example, “Vocabulary” is, not surprisingly, the most culturally sensitive Wechsler subtest, and it has quite a small Flynn Effect. Interestingly, Vocabulary is also a really good measure of overall intelligence. For instance, the ongoing General Social Survey includes a 10-word vocabulary test that has proven to be a surprisingly decent proxy for IQ.
If we leave out the “Similarities” outlier, the correlation is -0.74.
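The correlations above are presumably ordinary Pearson correlations computed across subtests, and the effect of dropping a single outlier is easy to reproduce. This is a minimal sketch: the `toy_table` values are made-up placeholders, NOT the figures from the table in this post, and `pearson_r` and `r_without` are hypothetical helper names. It only shows the mechanics of computing r and then recomputing it without one row.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def r_without(table, label):
    """Recompute r after dropping one named row (for example, an outlier subtest)."""
    kept = [v for k, v in table.items() if k != label]
    return pearson_r([v[0] for v in kept], [v[1] for v in kept])


# HYPOTHETICAL placeholder data, not the post's table:
# subtest -> (cultural_load, flynn_gain)
toy_table = {
    "A": (0.10, 20.0),
    "B": (0.20, 15.0),
    "C": (0.30, 10.0),
    "D": (0.40, 5.0),
    "Outlier": (0.35, 25.0),
}
xs = [v[0] for v in toy_table.values()]
ys = [v[1] for v in toy_table.values()]
print(f"r, all rows:        {pearson_r(xs, ys):+.2f}")
print(f"r, without Outlier: {r_without(toy_table, 'Outlier'):+.2f}")
```

With these invented numbers the four well-behaved rows lie on a perfectly straight line, so dropping the single off-trend row pushes r from a weak negative value all the way to -1. Real data are noisier, but one off-trend point can weaken a correlation in the same way.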
Besides obvious causes like better nutrition, my best theory for the Flynn Effect is that most cultures around the world have been undergoing a major cultural and environmental shift, at a fairly steady pace, that makes young people better at certain subtests, typically the Performance IQ subtests, but doesn’t do them much good on the Verbal IQ subtests, except for “Similarities.”
As I wrote in 2007 about “Similarities:”
Finally, the fastest rising subtest on the WISC, Similarities, rewards abstract scientific thinking, what Flynn calls viewing the world through “scientific spectacles.”
A child gets a maximum score for replying that dogs and rabbits are “mammals.” A kid in 1947 who had never seen a nature documentary on TV would likely have said “They have four legs” or something else more concrete than the Linnaean category “mammals.”
In 1947 a child in the hollers of Kentucky would probably know more concrete things about dogs and rabbits than an urban child today. But IQ tests have tended to anticipate the direction in which global culture has evolved, away from the concrete and toward the abstract and two-dimensional, toward what can be represented on a piece of paper or a screen.
Whatever this change is, it’s reminiscent of Moore’s Law in its endurance and steady pace. As you know, in 1965 Gordon Moore, then at Fairchild Semiconductor and later a cofounder of Intel, the famous Silicon Valley chip firm descended (by way of Fairchild) from Shockley Semiconductor, pointed out that the number of components, such as transistors, that could be fit on an integrated circuit had been doubling about every year, and he predicted that the industry would be able to keep up this pace for some time into the future. (In 1975 he revised the pace to a doubling about every two years.) This more or less proved true for at least four decades, with world-changing consequences, such as the coining of the term “Silicon Valley” in 1971 and the rise of Silicon Valley to immense economic importance.
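Because the pace of Moore’s Law is quoted as a doubling every year or every two years, it helps to see how much that choice matters when compounded over four decades. This is simple compound-growth arithmetic, assuming a perfectly steady doubling period; `growth_factor` is a hypothetical helper written for this example, not a standard function.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Total multiplication factor after `years` of steady doubling
    every `doubling_period_years` years."""
    return 2 ** (years / doubling_period_years)


for period in (1, 2):
    print(f"Doubling every {period} year(s) for 40 years: x{growth_factor(40, period):,.0f}")
```

A two-year doubling period compounds to roughly a millionfold increase over 40 years, while a one-year period compounds to roughly a trillionfold increase, so “every year or two” spans a very wide range of outcomes.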
I don’t know if Moore’s Law is still in effect. (The laptop I bought in 2015 is only trivially faster than the one I bought in 2012; that was the first time in my computer-owning career, which goes back to 1984, that a new computer wasn’t tangibly faster.) Similarly, I don’t know if the Flynn Effect is still operating everywhere. (I haven’t really been following the data this decade.)
But Moore’s Law has been kind of like the Flynn Effect in that it has been relatively incremental, decade after decade, rather than erratic, and its effects have been felt globally even though its heartland has been Silicon Valley. Similarly, IQ testing’s heartland has been Silicon Valley ever since Stanford’s Lewis Terman released the first widely used American IQ test, the Stanford-Binet, a century ago.
Moreover, Moore’s Law (in the sense of higher performance in general) has had multiple causes. For example, when clock speeds on CPU chips topped out, the chip companies were able to regroup and keep Moore’s Law progressing for a number of years further by doing other things, such as putting multiple cores on each chip. Similarly, better nutrition likely contributed to the Flynn Effect in the past (the U.S. added micronutrient supplementation of both iodine and iron to staples between WWI and WWII), but improved nutrition has contributed less to the Flynn Effect in some countries in recent years, as nutrition has gotten about as good as it’s going to get. Other, more mysterious factors apparently stepped in to keep the Flynn Effect going a while longer.
So, Moore’s Law is an informative analogy for the Flynn Effect.
But I would go further and suggest, somewhat hand-wavingly, that one of the driving forces of the Flynn Effect has been Moore’s Law, or, to be both more precise and more vague, some broader direction of technological change of which Moore’s Law is a part.
One of the big changes in daily life over recent centuries has been the growth of what I might call “machine logic” that humans must deal with. People today deal far more often than in the past with semi-intelligent machines that can only be handled in a certain way, according to their own logic. You deal with an ATM rather than a bank teller, with a gasoline pump rather than a pump jockey, with elevator buttons rather than an elevator operator. You can’t wave your hands around at these machines until they figure out what you want done. You have to follow a precise logical series of steps.
(This trend may not continue forever. For example, searching the Internet with Google today requires less logic from users than searching with AltaVista did in 1998. Understanding the term “Boolean operators” helped you get more out of AltaVista, while Google is so smart today that you don’t have to be as smart.)
This trend toward people having to interface more each decade with machine logic didn’t start with the silicon chip. Before the silicon chip came the transistor, co-invented and refined by William Shockley, and before that the vacuum tube, on which Lee de Forest made significant progress in Palo Alto around the time Lewis Terman of Stanford was adapting Binet’s pioneering IQ test for the American market.
Granted, I’m waving my hands around in making this argument in the hope that you’ll grasp what I’m trying to get across. I don’t have it reduced to a precise series of steps that a machine intelligence could understand, but I do think I’m onto something: the high-Flynn Effect, low-Cultural Load IQ subtests are something like tests of skill at dealing with information technologies, and kids these days get more practice at that than we did, just as we got more practice than our parents did.
In contrast, kids these days likely get less practice dealing with complex 3-d objects, such as automobile engines that need repair. Instead, they are used to dealing with 2-d paper and, increasingly, 2-d screens. But IQ tests tend to shy away from much 3-d testing, other than a few block-manipulation subtests such as Block Design on the WISC and other children’s IQ tests, largely for reasons of economy. Asking and answering questions in a 2-d format, whether on paper or on a computer screen, is cheap.
But because 2-d is cheap, the real world has also moved in the 2-d direction that IQ tests anticipated.
One thing that seems pretty likely is that in each person’s life, he has a window where it’s easy and fun to learn to communicate logically with a new set of systems, and over time that window closes. For example, when I was in the marketing research industry, I jumped all over the coming of the personal computer in 1984 and the Internet in 1996.
More senior executives at the information company where I worked back then tended to find the new personal information technologies difficult to master. They were used to issuing orders to intelligent human beings, such as their secretaries, who wouldn’t take everything quite so literally. The founders of the company where I worked were superbly intelligent at dealing with human psychology, but they found arbitrary machine logic daunting.
But similar information technology developments in this century have not struck me as fun at all to learn about. On Twitter, for example, I’m basically clueless about whether I’m replying to one person or to thousands. Today, I feel like the Vice Chairman of my employer back in 1984 when he gave me his $9,000 IBM PC XT with the coveted 10-meg hard disk because he was too old to learn to type.
Generation after generation, children grow up in an environment ever denser with the kind of systems logic that the most Flynn-Effected Wechsler subtests ask about. Growing up, kids these days get more practice with the kind of thinking tested on the Raven’s and on some of the Wechsler subtests. And they legitimately are better at it.
The Flynn Effect is a side effect of the developers of the IQ test being on “the right side of history.” We’re used to hearing progressives denounce IQ tests as obsolete pseudoscience on the wrong side of history, but in reality, IQ testing in the United States has some amusing organic ties to the triumph of Silicon Valley. Lewis Terman’s son Fred Terman (1900-1982), a professor of electrical engineering at Stanford, was perhaps the single most important figure in the rise of Silicon Valley. The mentor of Hewlett and Packard, he largely invented the model of Stanford grad students like Larry Page and Sergey Brin starting up high-tech firms like Google.
You are supposed to believe that the Termans were all wrong, but it sure looks like we’re living in the world the Terman family anticipated.