The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media

There has been much discussion lately of the “Replication Crisis” in psychology, especially since the publication of a recent study attempting to replicate 100 well-known psychology experiments. From The Guardian:

Study delivers bleak verdict on validity of psychology experiment results

Of 100 studies published in top-ranking journals in 2008, 75% of social psychology experiments and half of cognitive studies failed the replication test

For more analysis, see Scott Alexander at SlateStarCodex: “If you can’t make predictions, you’re still in a crisis.”

(By the way, some fields in psychology, most notably psychometrics, don’t seem to have a replication crisis. Their PR problem is the opposite one: they keep making the same old predictions, which keep coming true, and everybody who is anybody therefore hates them for it, kill-the-messenger style. For example, around the turn of the century, Ian Deary’s team tracked down a large number of elderly individuals who had taken the IQ test given to every 11-year-old in Scotland in 1932 to see how their lives had turned out. They found that their 1932 IQ score was a fairly good predictor. Similarly, much of The Bell Curve was based on the lives of the huge National Longitudinal Survey of Youth 1979 sample up through 1990. We now have another quarter of a century of data with which to prove that The Bell Curve doesn’t replicate. And we even have data on thousands of the children of women in the original Bell Curve sample. This trove of data is fairly freely available to academic researchers, but you don’t hear much about findings in The Bell Curve failing to replicate.)

Now there are a lot of reasons for these embarrassing failures, but I’d like to emphasize a fairly fundamental one that will continue to plague fields like social psychology even if most of the needed methodological reforms are enacted.

Consider the distinction between short-term and long-term predictions, as illustrated by two fields that both use scientific methods but come up with very different types of results.

At one end of the continuum are physics and astronomy. They tend to be useful at making very long term predictions: we know to the minute when the sun will come up tomorrow and when it will come up in a million years. The predictions of physics tend to work over very large spatial ranges, as well. As our astronomical instruments improve, we’ll be able to make similarly long term sunrise forecasts for other planetary systems.

Why? Because physicists really have discovered some Laws of the Universe.

At the other end of the continuum is the marketing research industry, which uses scientific methods to make short-term, localized predictions. In fact, the marketing research industry doesn’t want its predictions to be assumed to be permanent and universal because then it would go out of business.

For example, “Dear Jello Pudding Brand Manager: As your test marketer, it is our sad duty to report that your proposed new TV commercials nostalgically bringing back Bill Cosby to endorse your product again have tested very poorly in our test market experiment, with the test group who saw the new commercials going on to buy far less Jello Pudding over the subsequent six months than the control group that didn’t see Mr. Cosby endorsing your product. We recommend against rolling your new spots out nationally in the U.S. However, we do have some good news. The Cosby commercials tested remarkably well in our new test markets in China, where there has been far less coverage of Mr. Cosby’s recent public relations travails.”

I ran these kinds of huge laboratory-quality test markets over 30 years ago in places like Eau Claire, Wisconsin and Pittsfield, Massachusetts. (We didn’t have Chinese test markets, of course.) The scientific accuracy was amazing, even way back then.

But while our marketing research test market laboratories were run on highly scientific principles, that didn’t necessarily make our results Science, at least not in the sense of discovering Permanent Laws of the Entire Universe. I vaguely recall that other people in our company did a highly scientific test involving Bill Cosby’s pudding ads, and I believe Cosby’s ads tested well in the early 1980s.

But that doesn’t mean we discovered a permanent law of the universe: Have Bill Cosby Endorse Your Product.

In fact, most people wouldn’t call marketing research a science, although it employs many people who studied sciences in college and more than a few who have graduate degrees in science, especially in psychology.

Marketing Research doesn’t have a Replication Crisis. Clients don’t expect marketing research experiments from the 1990s to replicate with the same results in the 2010s.

Where does psychology fall along this continuum between physics and marketing research?

Most would agree it falls in the middle somewhere.

My impression is that economic incentives push academic psychologists more toward interfacing closely with marketing research, which is corporate funded. For example, there are a lot of “priming” studies by psychologists of ways to manipulate people. “Priming” would be kind of like the active ingredient of “marketing.”

Malcolm Gladwell discovered a goldmine in recounting to corporate audiences findings from social sciences. People in the marketing world like the prestige of Science and the assumption that Scientists are coming up with Permanent Laws of the Universe that will make their jobs easier because once they learn these secret laws, they won’t have to work so hard coming up with new stuff as customers get bored with old marketing campaigns.

That kind of marketing money pushes psychologists toward experiments in how to manipulate behavior, making them more like marketing researchers. But everybody still expects psychological scientists to come up with Permanent Laws of the Universe even though marketing researchers seldom do. Psychologists don’t want to disabuse marketers of this delusion because then they would lose the prestige of Science!


Something that’s important to keep in mind in all the hoopla over the Norwegian study of conscripts showing IQs a few points higher for oldest brothers, with the New York Times running three articles on the subject over the last week, is that small differences in IQ scores like this can be influenced by methodological issues of specific tests.

Now, big differences in average IQ, such as 15 points (one standard deviation), are test-independent. For decades, the Holy Grail of cognitive test designers has been to invent a test on which blacks and whites would average the same, without losing most real world predictive power. The first psychometrician to accomplish this would be rich and celebrated. Unfortunately, it has turned out to be the equivalent of the perpetual motion machine for engineers and cold fusion for physicists.

But small differences are sensitive to test design trade-offs. For example, the U.S. military’s 1979 version of its very heavily g-loaded entrance exam for enlistment applicants, the Armed Forces Qualification Test, found an anomalously large 18.6 point gap between whites and blacks when it was standardized on the National Longitudinal Survey of Youth in 1980 (this is the study that provides much of the new data in The Bell Curve). The average study has found a 16.5 point difference, so this 18.6 point gap was strange: the AFQT is a test the military has spent a fortune developing, and the NLSY sample, with about 13,000 participants including an oversampling of minorities, was the gold standard for a nationally representative data set.

The 1979 AFQT was designed to be highly accurate around IQs of 100. For instance, from 1992-2004, the military took very few applicants with IQs of 90, but would take quite a few 95s.

So, the 1979 AFQT was designed to be extremely thorough for the average person: it was 105 pages long! As Charles Murray pointed out recently, in the 1990s it was finally realized by studying results on a question-by-question basis that the length of the test had a downside that explained the unusually large 18.6 point white-black gap. Low IQ applicants, especially black males, often got discouraged by all the questions they couldn’t answer and would give up, either leaving the rest blank or bubbling in answers at random the rest of the way.

In 1997, the 105-page paper-and-pencil AFQT was replaced with a computerized test that dynamically adjusted its difficulty to reflect performance so far. For instance, if you missed a lot of early questions, the computer would serve up easier questions. The white-black gap turned out to be 14.7 points on the 1997 normalization of the computerized AFQT.

Unfortunately, we don’t know enough to be able to divvy up this 3.9 point narrowing of the white-black gap from one version of the AFQT to the next between the test methodology change and actual change in the size of the gap.

Somewhat similarly, Half Sigma hypothesizes that the brother result on the Norwegian equivalent of the AFQT is caused by older brothers being more conscientious. Perhaps they study harder in school and thus do better on the parts of the test that are less g-weighted. Or perhaps they just don’t give up as easily.

Or this could be a real result.

The point is, however, that it’s exactly backward for the media to get all worked up over one study reporting a 3 point difference between demographic groups (older and younger siblings) while ignoring the dozens of studies reporting much larger differences between demographic groups, such as between whites and Hispanics — especially because the Senate is voting on an immigration bill right now!

The best estimate I’ve yet seen of Hispanic-American IQs is the 2001 meta-analysis by Roth of 39 studies covering a total of 5,696,519 individuals in America (aged 14 and above). It came up with an overall difference of 0.72 standard deviations in g (the “general factor” in cognitive ability) between “Anglo” whites and Hispanics. The 95% confidence range of the studies ran from 0.60 to 0.88 standard deviations, so there’s not a huge amount of disagreement among the studies.

One standard deviation equals 15 IQ points, so that’s a gap of 10.8 IQ points, or an IQ of 89 on the Lynn-Vanhanen scale where white Americans equal 100. That would imply the average Hispanic would fall at the 24th percentile of the white IQ distribution. This inequality gets worse at higher IQs. Assuming a normal distribution, 4.8% of whites would fall above 125 IQ versus only 0.9% of Hispanics, which explains why Hispanics are given ethnic preferences in prestige college admissions.

In contrast, 105 studies of 6,246,729 individuals found an overall average white-black gap of 1.10 standard deviations, or 16.5 points. (I typically round this down to 1.0 standard deviation and 15 points.) So, the white-Hispanic gap appears to be about 65% as large as the notoriously depressing white-black gap.
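
The unit conversions in the last few paragraphs are easy to check. Here is a minimal Python sketch, assuming (as the text does) normal distributions with a standard deviation of 15 and a reference mean of 100. The function and variable names are illustrative and do not come from Roth's paper, and the script checks only the arithmetic, not the underlying estimates.

```python
from statistics import NormalDist

SD = 15.0          # IQ points per standard deviation (from the text)
REF_MEAN = 100.0   # reference-group mean on the scale used in the text
GAP_A_SD = 0.72    # standardized gap quoted from the meta-analysis
GAP_B_SD = 1.10    # second standardized gap quoted in the text


def share_above(mean, threshold, sd=SD):
    """Fraction of a normal(mean, sd) population scoring above threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


gap_a_points = GAP_A_SD * SD                 # gap expressed in IQ points
shifted_mean = REF_MEAN - gap_a_points       # mean of the lower distribution

# Where the shifted mean falls within the reference distribution.
percentile = NormalDist(REF_MEAN, SD).cdf(shifted_mean) * 100

# Tail shares above 125 for each distribution, in percent.
ref_tail = share_above(REF_MEAN, 125.0) * 100
shifted_tail = share_above(shifted_mean, 125.0) * 100

# Size of the first gap relative to the second.
ratio = GAP_A_SD / GAP_B_SD

print(f"gap in points: {gap_a_points:.1f}")
print(f"shifted mean: {shifted_mean:.1f}")
print(f"percentile in reference distribution: {percentile:.1f}")
print(f"share above 125 (reference): {ref_tail:.2f}%")
print(f"share above 125 (shifted): {shifted_tail:.2f}%")
print(f"gap ratio: {ratio:.2f}")
```

Because the two curves are assumed to have the same spread, a fixed shift in the mean changes the tail shares far more than it changes the middle of the distribution, which is the point the text makes about higher thresholds.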

So, the white-Hispanic IQ gap is about what you’d guess from observing life around you with your lying eyes: not as big and deleterious as the white-black gap, but not trivial either.

If a 3 point IQ difference between brothers is worth three articles in the New York Times, you might think that an eleven point gap between whites and Hispanics would be worth, oh, say, eleven articles, especially when the immigration bill is up for debate in the Senate. But almost nobody has ever mentioned Roth’s finding in the press.

(Republished from iSteve by permission of author or representative)
Steve Sailer
About Steve Sailer

Steve Sailer is a journalist, movie critic for Taki's Magazine, columnist, and founder of the Human Biodiversity discussion group for top scientists and public intellectuals.
