The Unz Review: An Alternative Media Selection
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
James Thompson Blogview

Cognitive power leads to monetary accumulation.


It is just a coincidence, but the initials NNT are best known to me as Numbers Needed to Treat. This is the number of patients you need to give a drug to in order to get one cure. For example, an NNT of 5 means that you have to treat five patients in order to get one patient cured. Even very useful drugs do not help everybody, so several people need to take the drug before one of them benefits.
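
As a quick sketch, NNT is simply the reciprocal of the absolute risk reduction. The cure rates below are invented for illustration:

```python
def numbers_needed_to_treat(cure_rate_treated: float, cure_rate_control: float) -> float:
    """NNT = 1 / absolute risk reduction (extra cures per patient treated)."""
    absolute_risk_reduction = cure_rate_treated - cure_rate_control
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment shows no benefit over control")
    return 1.0 / absolute_risk_reduction

# A drug that cures 30% of patients when only 10% would recover untreated:
nnt = numbers_needed_to_treat(0.30, 0.10)
print(f"NNT = {nnt:.0f}")  # treat five patients to gain one extra cure
```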

I do not know how many posts I need to write to convince one person that Nassim Nicholas Taleb is an unreliable guide to intelligence research. It might take 30 posts, but I still think it would be worthwhile.

One of NNT’s complaints was that intelligence test scores did not predict important real-world achievements, such as being a good investor. A very quick search immediately came up with a paper showing that brighter people had more successful investment histories. Those authors found that, by making better stock selections and achieving lower transaction costs, high-IQ subjects did 4.9% per year better than low-IQ subjects. Given that real returns average 7%, this is a massive difference which will accumulate over time and result in far higher net personal worth for brighter investors. By the way, intelligence was measured at conscription age, long before there is much investment history, so it is more likely to be causal.
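
To see how a 4.9-point annual edge compounds, here is a minimal sketch; the starting sum and the 30-year horizon are my own assumptions, not figures from the paper:

```python
def final_wealth(principal: float, annual_return: float, years: int) -> float:
    """Compound a constant annual real return over a number of years."""
    return principal * (1.0 + annual_return) ** years

years = 30
low_iq = final_wealth(10_000, 0.07, years)           # 7% average real return
high_iq = final_wealth(10_000, 0.07 + 0.049, years)  # 4.9 points better per year
print(f"after {years} years: {low_iq:,.0f} vs {high_iq:,.0f} "
      f"({high_iq / low_iq:.1f}x richer)")
```

Over three decades the high-IQ portfolio ends up several times larger, which is the sense in which a "massive difference" accumulates.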

In this post I will describe another paper which shows that an intelligence test predicts savings 40 years later. This is real-world skin-in-the-game of which NNT should approve.

Furnham, A., & Cheng, H. (2019). Factors influencing adult savings and investment: Findings from a nationally representative sample. Personality and Individual Differences, 151, 109510. doi:10.1016/j.paid.2019.109510

The authors review the cognitive and personality measures which affect savings rates: neuroticism and conscientiousness both affect outcomes. These traits are measured by paper-and-pencil questionnaires designed by psychologists, yet are associated with real-life outcomes.

The National Child Development Study 1958 is a large-scale longitudinal study of the 17,415 individuals who were born in Great Britain in a week in March 1958 (Ferri, Bynner, & Wadsworth, 2003). They were a representative sample of the country at the time. 14,134 children at age 11 completed tests of cognitive ability (response = 87%).

At 33 years, 11,141 participants provided information on their educational qualifications obtained (response = 72%). At age 50 years, 8,210 participants provided information on their current occupational levels (response = 67%); 9,790 participants completed a questionnaire on personality (response = 79%); 9,762 participants provided information on their self-assessed financial situation (response = 79%); 9,729 participants provided information on their savings and investment (response = 57%). The analytic sample comprises 5,766 cohort members (50% females) for whom complete data were collected at birth, at age 11 years, and the outcome measure at age 50 years. Bias due to attrition of the sample during childhood has been shown to be minimal (Davie, Butler, & Goldstein, 1972).

3.2. Measures

1. Family Social Background at Birth Family social background includes information on parental social class and parental education. Parental social class at birth was measured by the Registrar General’s measure of social class (RGSC). Parental education is measured by the age parents had left their full-time education.

2. Childhood Intelligence Childhood intelligence was assessed at age 11 in school using a general ability test (Douglas, 1964) consisting of 40 verbal and 40 non-verbal items. For the verbal items, children were presented with an example set of four words that were linked either logically, semantically, or phonologically. For the non-verbal tasks, shapes or symbols were used. The children were then given another set of three words or shapes or symbols with a blank. Participants were required to select the missing item from a list of five alternatives. Scores from these two sets of tests correlate strongly with scores on an IQ-type test used for secondary school selection (r = 0.93; Douglas, 1964), suggesting a high degree of validity.

3. Educational Qualifications At age 33, participants were asked about their highest academic or vocational qualifications. Responses are coded to the six-point scale of National Vocational Qualifications levels (NVQ) ranging from ‘none’ to ‘higher degree level’: 0= no qualifications; 1=some qualifications [Certificate of Secondary Education Grades 2 to 5]; 2= O level [equivalent to qualifications taken at the end of compulsory schooling]; 3= A level [equivalent to university entrance level qualifications]; 4=postsecondary degree/diploma and equivalent; and 5=higher post-graduate degrees and equivalent.

4. Personality Traits Personality traits were assessed at age 50, using 50 questions from the International Personality Item Pool (IPIP; Goldberg, 1999). Responses (5-point, from “Strongly Agree” to “Strongly Disagree”) are summed to provide scores on the so-called ‘Big-5’ personality traits: Extraversion, Emotionality/neuroticism, Conscientiousness, Agreeableness and Openness. Each trait is measured by 10 items, and scores on each trait range between 5 and 50, with higher scores equating to higher levels of the trait. A preliminary test showed that Extraversion and Agreeableness were not significantly associated with adult savings and investment, so these two traits were excluded from the subsequent analyses. A preliminary analysis which included these two traits in the SEM model demonstrated that they were not moderator variables. Alpha was 0.88 for emotionality/neuroticism, 0.77 for conscientiousness, and 0.79 for intellect/openness.

5. Occupational Prestige Data on current or last occupation held by cohort members at age 50 are coded according to the RGSC described above, using a 6-point classification.

6. Financial Assessment At age 50, participants were asked to assess their personal financial situation on a 5-point measure (1 = Finding it very difficult, 2 = Finding it quite difficult, 3 = Just about getting by, 4 = Doing all right, 5 = Living comfortably).

7. Adult Savings and Investment At age 50, participants provided information on the amount of savings and investment they had, which was log-transformed for the following analyses. Participants also specified the types of their savings and investment: bank or building society = 70.2%, ISA = 51.8%, premium bonds = 35.0%, stocks and/or other shares = 32.9%.

All this is fine, though self-report on wealth could be a problem. Asking people about their wealth can be a tricky business in the UK. If anything, people might be tempted to downplay it to avoid any tax enquiries. One simple control would have been to study the postcodes of participants, from which it is easy to get wealth estimates. Worth doing as a further measure of the accuracy of self-reports.

The verbal and non-verbal scores were added to form an overall ability latent variable, which has more predictive power.

The strongest association was between personal financial assessment and adult savings and investment, followed by education and occupation. This is a well-established finding. However, what was particularly interesting was the correlation between IQ measured at age 11 and savings measured 39 years later.

• Category: Science • Tags: Finance, Intelligence, Nassim Nicholas Taleb, Wealth 
When dull meet bright, tough love rules

What happens when above average and below average ability people have to deal with each other?

Specifically, how will they interact when potentially both are able to gain from the exchange? It seems obvious that they should cooperate, and extract the greatest amount of mutual gain, but does this really happen in situations where there are also gains to be made from not cooperating?

How do bright and dull cope with each other now, in ordinary life, particularly when they cannot all meet face to face, but have to deal with the consequences of each other’s behaviours? Do these two groups understand each other, or are they always at loggerheads, doomed to perpetual conflict? Why can’t we all get along with each other?

I never say of any book that it has changed my life. People and events have changed my life, but books have only changed my mind. Robert Axelrod’s 1990 “The evolution of co-operation” was one such book. He wondered how co-operation could emerge in a world of self-seeking egoists – whether superpowers, businesses, or individuals – when there was no central authority to police their actions. I enjoyed his analysis of Prisoner’s Dilemma competitions, and the simplicity of “Tit for Tat”, which turned out to be the winning strategy. Start by cooperating, then do unto others as they do unto you.

The Prisoner’s Dilemma is a conceptual game in which two people accused of a crime are held separately, and each is told that if they implicate the other they will be set free. Clearly, if both keep quiet they both will be released for lack of evidence, but the sting is that the person who cooperates with the Police is set free while the denounced person serves a long time behind bars. If there is solidarity among criminals then both will keep quiet and each will be released. If either one breaks, then the other is heavily punished. If prisoners doubt each other, both may denounce each other, to their profound mutual disadvantage.
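
The iterated version of the game, and the "Tit for Tat" strategy that won Axelrod's tournaments, can be sketched in a few lines. The payoff values below are the conventional ones from the Prisoner's Dilemma literature (3,3 for mutual cooperation; 5 and 0 for unilateral defection; 1,1 for mutual defection), not figures from any particular study:

```python
# Payoffs (row player, column player): "C" = cooperate, "D" = defect.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play(strategy_a, strategy_b, rounds=10):
    """Iterated Prisoner's Dilemma; each strategy sees the opponent's last move."""
    score_a = score_b = 0
    last_a = last_b = None  # no history before the first round
    for _ in range(rounds):
        move_a = strategy_a(last_b)
        move_b = strategy_b(last_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        last_a, last_b = move_a, move_b
    return score_a, score_b

# Start by cooperating, then do unto others as they did unto you.
tit_for_tat = lambda opp_last: "C" if opp_last is None else opp_last
always_defect = lambda opp_last: "D"

print(play(tit_for_tat, tit_for_tat))    # mutual cooperation every round
print(play(tit_for_tat, always_defect))  # exploited once, then mutual defection
```

Two Tit-for-Tat players reap the full cooperative payoff; against a constant defector, Tit for Tat loses only the first round and then punishes every defection with one of its own.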

It is a long time since I looked at the literature on this game, but decades ago I think no-one bothered to research whether intelligence made a difference. Experimentalists rarely considered this possibility. Now a team have looked at this, with very interesting results, which may have wide application. They studied how intelligence and personality affected the outcomes of games, focusing on repeated interactions that provide the opportunity for profitable cooperation.

Intelligence, personality and gains from cooperation in repeated interactions.
Journal of Political Economy (2019).

This is a very interesting and complex paper, and I have left out any consideration of the other games they have tested, and the further neuro-scanning measures they took of participants while playing games, which reveal that intelligent people showed more brain activity, presumably as they worked on the different strategies required for optimal cooperation. I will mention their personality measures in passing, because the intelligence differences were the most significant.

The method was as follows:

Our design involves a two-part experiment administered over two different days separated by one day in between. Participants are allocated into two groups according to some individual characteristic that is measured during the first part, and they are asked to return to a specific session to play several repetitions of a repeated game. Each repeated game is played with a new partner. The individual characteristics that we consider are: intelligence, Agreeableness and Conscientiousness, across different treatments that we will define as IQ-split, A-split and C-split, respectively.

In one treatment, participants are not separated according to any characteristic, but rather allocated to ensure similar groups across characteristics; we define this the combined treatment.

There were 792 subjects in all, university students on a wide range of courses, and on average they earned £20 each, of which £4 was paid for showing up. Motivating but not life-changing. Only 1 or 2 per 100 mentioned intelligence as the possible difference between groups, which strongly suggests that other explanations came to mind more readily. Instructive that we assume complicated motives rather than simple lack of understanding. It may be yet another example that bright people assume that others can think like them.

First, the authors show that brighter people do better at cooperation games than duller people. They cooperate more, and thus end up with higher final scores. Since the scores convert to money, they end up richer. They avoid immediate selfish gains in order to obtain higher long-term cooperative returns. Smart strategy. As an analogy for how they operate in real life, they are likely to reap the benefits of maximal cooperation, leading to increasing wealth.

The researchers then deliberately paired up an above average intelligence player with one who was below average to see what happened. The overall return to the participants fell, because lower ability players tended to defect so as to obtain an immediate advantage, at great cost to the other player. How should the bright player respond? Simply continuing to try to cooperate does not work, because the duller player is then rewarded for his lack of cooperation. Instead, the “tit for tat” punishment strategy is required. Start by cooperating, and on the next round do whatever the other person did: if they cooperated, you cooperate; if they defected, you defect. The researchers call this “tough love”.

Four applications of retaliation were, on average, required to teach the lesson that lack of cooperation would be punished with reciprocal lack of cooperation. Eventually cooperation is established between bright and dull, but at an initial cost. Lower intelligence players learn to cooperate, because higher intelligence players punish them if they don’t. In societies where cooperation is already low, lenient and forgiving strategies become less frequent. There is very probably a level at which trust can be assumed, but below that punishment will be the norm. Where is the social tipping point below which cooperation is too costly a strategy? At what point do civil societies collapse and turn into uncivil bands?

The outcome of games with a trade-off between short-run gain and continuation value loss was strikingly different when played by subjects with higher or lower levels of intelligence. Higher intelligence resulted in significantly higher levels of cooperation and earnings. The failure of individuals with lower intelligence to appropriately estimate the future consequences of current actions accounts for these differences in outcomes. Personality also affects behavior, but in smaller measure, and with low persistence. These results have potentially important implications for policy. For example, while the complex effects of early childhood intervention on the development of intelligence are still currently being evaluated (e.g. Heckman, 2006), our results suggest that any such effect would potentially enhance not only the economic success of the individual, but the level of cooperation in society (at least when interactions are repeated).

• Category: Science • Tags: Game Theory, IQ 

As every conference attendee knows, a few minutes with a researcher is worth many hours of reading their work. What researchers say in person will be up to date, generally unvarnished and to the point. Compared to writing, conversation is speedy, interactive, and tends towards confession: the spoken word accompanied by the revealed emotion, a multi-level signal, rich in content. Ambiguities can be probed with short queries about meaning and anything contentious subjected to rapid forensic examination, in a two-way process which homes in on core issues. All this would take weeks by email, and in 5 exchanges would lead to boiling rage on Twitter.

Minneapolis is a fine city, with an excellent gallery. Cranach knew a thing or two about the human condition.

ISIR2019 was a conference at which one was spoilt for choice, since within speaking distance over coffee one could chat with Charles Murray, Steve Hsu, James Lee, Greg Cochran, Greg Clark, Razib Khan, Bruce Lahn, Neven Sesardic, and many others. At breakfast with Tim Bates I met an amiable couple and, assuming they were wild-variant humans who happened to be staying at the hotel, launched immediately into a general enquiry about life in Minneapolis. They were a sparky and fun couple, and later in the day I realized I had been giving car buying advice to Prof Tom Bouchard, a legendary figure in twin research.

Even better, all of the prominent researchers were excited to see so many younger researchers, whom they quizzed enthusiastically. There is an excellent crop of young scientists already making their mark, and they were the de facto stars of the event, because established participants are all too aware that a decade ago such new talent was rare: it was a conference for older researchers. (ISIR offers special inducements for researchers at the start of their careers).

The first day of the conference had a Symposium on Science and Ethics of genetic engineering, with Greg Cochran, Steve Hsu, Razib Khan, Bruce Lahn and Neven Sesardic. Sesardic argued that John Rawls’s work was a far from perfect guide to ethics in this field. Impossible to cover each contribution, but a general theme was that “designer babies” were unlikely, mostly because of doubts about unintended consequences. Crispr techniques are accurate for point deletions and small sequence insertions, but not so accurate when dealing with longer stretches of DNA. The panel as a whole was cautious about any gene editing procedures at this stage, though Razib Khan said that some in the genetics world, while condemning He Jiankui for his work on twin babies susceptible to HIV, were also grudgingly impressed with what he had done.

In answer to a question, Bruce Lahn said that genetic engineering in mice was accurate, and produced very few unintended effects, of the order of 1%. There was common agreement among the panel that the appropriate ethical standards would prevent such experimentation in the West, but uncertainty about whether this would be the case in China. This raised the possibility that whichever nation relaxed ethical standards to allow experimentation might gain a considerable advantage over others, merely by the deletion of intelligence-lowering mutations. The panel also noted that screening for Down’s syndrome was now routine. In-vitro fertilization was now running at over a million births a year, and these children had previously been stigmatized as “test tube babies”. Attitudes change if people are given the ability to choose new techniques.

This is a very brief summary, but here is the sequence as I see it, from those likely to happen soon to those much further downstream and happening later, if at all:

1) In countries where pregnancies can be terminated, more pre-natal screening, not only for Down’s syndrome, but for other forms of severe mental handicap and, when possible, for some genetic disorders like cystic fibrosis and Huntington’s chorea.
2) In the case of in-vitro fertilization, far more screening based on polygenic risk scores for a wider variety of disorders, concentrating initially on those with the very highest scores which put embryos most at risk. This depends on having viable foetuses to select from. No changes are made to the foetus, but choice is guided by polygenic risk scores.
3) Limited use of Crispr on foetuses to remove mutations directly linked to serious genetic disorders.
4) Crispr being used more generally to remove SNPs which increase vulnerability to a broader range of genetic disorders.
5) Crispr being used even more generally to remove SNPs which increase vulnerability to psychiatric disorders and low intelligence.
6) Crispr or other techniques being used to create “disease resistant” embryos.
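
Step 2 in the list above is selection rather than editing: computationally it reduces to ranking the viable embryos by polygenic risk score and choosing the least risky candidate. The scores below are invented purely for illustration:

```python
# Hypothetical polygenic risk scores for a set of viable embryos (higher = riskier).
risk_scores = {"embryo_1": 0.82, "embryo_2": 0.35, "embryo_3": 0.57}

# Selection, not modification: no change is made to any embryo;
# choice is simply guided by the scores.
selected = min(risk_scores, key=risk_scores.get)
print(selected)  # embryo_2
```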

Incidentally, one prominent researcher said that he and his colleagues were perplexed as to exactly what had been said at the London Conference which had caused so much trouble. I replied that I too was perplexed, but thought that it was because one of the 59 papers given at UCL was about eugenics, arguing that it would only be contemplated in the setting of Malthusian over-population, and that it would not select for intelligence, but for a propensity to be happy. “Really,” he replied, “but that is far, far less than we have discussed here today.”

Strange are the ways of humankind.

• Category: Science • Tags: Genetic Engineering 

The full conference began yesterday. In the midst of listening to all the papers I can’t post anything much, but will keep live tweeting some of the presentations.

As ever, the best thing is meeting participants and finding out first hand about their work, stuff which will be published a year from now. Great fun watching people exchange data analyses by showing principal components results on their phones, and making arrangements to swop data sets. Also very entertaining to see the mix of ages, and to see Tom Bouchard talking to James Lee about the progress of our understanding of the genetics of intelligence, from what I will call the “twin age” to the genomic age.

Fabulous to see the youngest delegates attending and meeting the very researchers whose publications several decades ago inspired those students to enter the field. I saw first-hand a few “your book changed my life” testimonial statements.

More to come.


When I started work in September 1968 one of the first things I was taught was that intelligence testing had a long history, and that many of the subtests in the Wechsler assessments had been taken from previous research. Kohs’ blocks (1920), I used to mutter, when people talked about Block Design. I was also taught something about the Stanford-Binet tests that I would not be using, because some clinicians still used them, and there was historical data I would need to know about. In hearing about skilled Binet testers I learned about dynamic testing: going from one domain to another as quickly as possible, just to establish general levels efficiently. I also learned that such procedures were only possible once one had achieved a very good knowledge of the test.

I was required to know my material almost by heart so that I could concentrate on every aspect of the patient’s behaviour. After 200 test administrations I began to feel confident I had seen it all, and knew all error types intimately. On my 201st test administration I encountered an entirely new error on Block Design, a scope and size error which was highly unusual. Even psychologists can learn something.

Does history matter? I think so. The early history of intelligence testing allows us to test the idea that IQ items are “arbitrary” and have no relevance to real life problems.

First publication of subtests in the Stanford-Binet 5, WAIS-IV, WISC-V, and WPPSI-IV. Aisa Gibbons, Russell T. Warne. Intelligence, Volume 75, July–August 2019, Pages 9-18.

The authors discuss the pre-history of intelligence testing from 1905 onwards. The period up to 1920 was extremely productive, and testing was popular, perhaps because of its widespread use in the military. Binet was interested in the lower levels of ability, Terman in the highest levels. Tests have to cater for the entire range, all 7 tribes of intellect. Not only that, they have to maintain discriminatory power throughout the whole range, though that is hard to do at the extremes.

Wechsler favored test formats and items that (a) showed high discrimination in intelligence across much of the continuum of ability, (b) produced scores with high reliability, (c) correlated strongly with other widely accepted measures of intelligence, and (d) correlated with “pragmatic” subjective ratings of intelligence from people who knew the examinee—such as a work supervisor (Wechsler, 1944).

That is a good summary of what an intelligence test item must achieve: discrimination, reliability, validity with other tests and, most importantly, with intelligence in everyday life.

Gibbons and Warne show that many tests go back a long way, and are earlier than generally realized. Their list of tests is an excellent way to understand all the tasks which have constituted the core elements of intelligence testing.

I learned a great deal reading through this section of the paper. For example, I did not know that Binet said of his early reasoning test that it was the best of the lot:

“The 1908 scale (of reasoning) has three images, each containing at least one human figure. The child then was asked to describe the picture, and more complex responses based on interpretation (rather than simply naming objects in the image) were viewed as indicative of greater intellectual ability. Binet found this subtest so useful when diagnosing intellectual disabilities that he wrote, ‘Very few tests yield so much information as this one… We place it above all the others, and if we were obliged to retain only one, we should not hesitate to select this one’ (Binet & Simon, 1908/1916, p. 189).”

Intelligence goes beyond the obvious.

Here are some historical points which were news to me:

Jean Marc Gaspard Itard was the first to use a form board-like task when he studied and educated a young boy found in the wild (named the “wild boy of Aveyron”) in 1798.

The very similar visual puzzles and object assembly subtests have an origin in the puzzles used for entertainment and geography education, which were first created in the 1750s in England and were in wide-spread use in the early 20th century when the first intelligence tests were being created (Norgate, 2007).

One discovery that we found striking was the diverse sources of inspiration for subtests. While the majority did have roots in the creation of cognitive tests, others have their origin in games (the delayed response subtest, the object assembly subtest), classroom lessons (the block design subtest), the study of a feral child (form boards and related subtests), school assessments (vocabulary subtest) and more. To us, this means that items on intelligence tests often have a connection with the real world—even when they are presented in a standardized, acontextual testing setting. Additionally, this undercuts the suggestion that critics of intelligence testing often make that intelligence test items are meaningless tasks that are divorced from any relationship to an examinee’s environment (e.g., Gould, 1981).

On the other hand, one criticism of intelligence tests seems justified from our study: subtests that appear on popular intelligence tests have changed little in the past century (Linn, 1986). While one could argue that the enduring appeal of these subtests is due to their high performance in measuring intelligence, the fact remains that many of these subtests were often created with little guiding theory or understanding of how the brain and mind work to solve problems (Naglieri, 2007). While sophisticated theories regarding test construction and the inter-relationships of cognitive abilities have developed in recent decades (e.g., Carroll, 1993), it is often not clear exactly how the tasks on modern intelligence tasks elicit examinees to use their mental abilities to respond to test items.

One way to test this criticism is to think of new tests more suited to the present age. Of course, others have already had that thought, and have created computer games which measure intelligence. Fun, but is this a big advance? It is only a gain if the results are more accurate, better predictive of real-life achievements and more speedily obtained. That is a hard bar to clear, since reasonable overall measures can be obtained in a few minutes. More likely, corporations are measuring our intelligence very quickly and surreptitiously by noting our google searches, Facebook likes, and perhaps even commenting histories.

A more pressing problem, to which the authors allude in passing, is that new-fangled tests are launched each year, and most fall out of use. The reason is that Wechsler testers have become highly pragmatic, and do not take kindly to complicated administration procedures, nor to test materials which are difficult to assemble and present quickly.

The reality appears to be that any puzzling task taps ability, and there are diminishing returns when using new psychometric tasks. This is the familiar “indifference of the indicator” which Spearman proposed in 1904. This does not exclude finding that individuals have strengths and weaknesses in specific domains, but simply means that all tasks lead to g, either quickly or slowly, to slightly varying degrees.
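
Spearman's point can be illustrated with simulated data: four hypothetical tasks, each loading on a single latent g plus task-specific noise, produce a correlation matrix dominated by its first eigenvalue. The loadings and sample size here are arbitrary choices, not estimates from any real battery:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
g = rng.standard_normal(n)                 # latent general ability
loadings = np.array([0.8, 0.7, 0.6, 0.5])  # hypothetical g-loadings of four tasks

# Each observed score = loading * g + independent task-specific noise,
# scaled so every task has unit variance.
noise = rng.standard_normal((n, 4)) * np.sqrt(1.0 - loadings ** 2)
scores = g[:, None] * loadings + noise

corr = np.corrcoef(scores, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
print(eigvals)  # the first eigenvalue dwarfs the rest: every task "leads to g"
```

Whatever puzzling tasks one chooses, the shared first factor soaks up most of the common variance, which is why new psychometric tasks show diminishing returns.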

• Category: Science • Tags: General Intelligence, Intelligence, IQ 

I have good memories of San Antonio, host city of the ISIR 2012 conference. We visited the Alamo, where throngs of tourists looked respectfully at an ancient wall of the building which was being restored with lime mortar. It was regarded as a restoration of national importance, and the wall was cordoned off, with detailed explanations. At that same time in England the local stone mason was restoring our similarly aged west-facing cottage walls, putting in lime mortar, so it seemed a natural process, though accorded a profound reverence in this case. If only stones could speak, and so on. History is when things turn, not just when they happen. Impact matters, so a distant siege can become an icon of national affirmation. All respect to those citizens who saved the monument from encroaching development.

It was that visit in December 2012 which really got me blogging. It seemed a waste of an airfare to attend a conference and then not tell anyone about it. I listed the papers and made some comments, aware that I was casting my words into empty space. My actual conference report got 20 views. As I blogged more about intelligence research the numbers built up slowly. In the subsequent year I got 71,701 pageviews.

Next week I am off to the ISIR 2019 conference in Minneapolis. I will do my best to post up the papers presented, and to tweet comments and links. On the other hand, I tend to just listen to papers, so might well blog less for a while.

This morning I find that I have reached a million page views. Thanks for reading.

• Category: Science 

Teachers loom large in most children’s lives, and are long remembered. Class reunions often talk of the most charismatic teacher, the one whose words and helpfulness made a difference. Who could doubt that they can have an influence on children’s learning and future achievements?

Doug Detterman is one such doubter:

Education and Intelligence: Pity the Poor Teacher because Student Characteristics are more Significant than Teachers or Schools.

Douglas K. Detterman, Case Western Reserve University (USA)
The Spanish Journal of Psychology (2016), 19, e93, 1–11.


Education has not changed from the beginning of recorded history. The problem is that focus has been on schools and teachers and not students. Here is a simple thought experiment with two conditions: 1) 50 teachers are assigned by their teaching quality to randomly composed classes of 20 students, 2) 50 classes of 20 each are composed by selecting the most able students to fill each class in order and teachers are assigned randomly to classes. In condition 1, teaching ability of each teacher and in condition 2, mean ability level of students in each class is correlated with average gain over the course of instruction. Educational gain will be best predicted by student abilities (up to r = 0.95) and much less by teachers’ skill (up to r = 0.32). I argue that seemingly immutable education will not change until we fully understand students and particularly human intelligence. Over the last 50 years in developed countries, evidence has accumulated that only about 10% of school achievement can be attributed to schools and teachers while the remaining 90% is due to characteristics associated with students. Teachers account for from 1% to 7% of total variance at every level of education. For students, intelligence accounts for much of the 90% of variance associated with learning gains. This evidence is reviewed.
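
Detterman's variance shares can be turned into a toy simulation. The 10%/90% split comes from the abstract above; everything else (the number of simulated classes, the standardized scales) is my own assumption, made only to show why gain tracks student ability far more closely than teacher quality:

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes = 5000  # many more than 50, so the correlations are stable

# Assumed variance shares from the abstract: ~10% teachers, ~90% students.
teacher_quality = rng.standard_normal(n_classes)
mean_student_ability = rng.standard_normal(n_classes)
gain = (np.sqrt(0.10) * teacher_quality
        + np.sqrt(0.90) * mean_student_ability)

r_teacher = np.corrcoef(gain, teacher_quality)[0, 1]
r_student = np.corrcoef(gain, mean_student_ability)[0, 1]
print(f"gain ~ teacher quality:  r = {r_teacher:.2f}")  # modest
print(f"gain ~ student ability:  r = {r_student:.2f}")  # dominant
```

With a 10% variance share, the teacher correlation sits near sqrt(0.10) ≈ 0.32, while the student-ability correlation approaches sqrt(0.90) ≈ 0.95, matching the bounds in the thought experiment.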

Have we over-rated the impact of teachers, and ignored the importance of innate ability? How can we have been so mistaken? Read on.

At least in the United States and probably much of the rest of the world, teachers are blamed or praised for the academic achievement of the students they teach. Reading some educational research it is easy to get the idea that teachers are entirely responsible for the success of educational outcomes. I argue that this idea is badly mistaken. Teachers are responsible for a relatively small portion of the total variance in students’ educational outcomes. This has been known for at least 50 years. There is substantial research showing this but it has been largely ignored by educators. I further argue that the majority of the variance in educational outcomes is associated with students, probably as much as 90% in developed economies. A substantial portion of this 90%, somewhere between 50% and 80% is due to differences in general cognitive ability or intelligence. Most importantly, as long as educational research fails to focus on students’ characteristics we will never understand education or be able to improve it.

Doug Detterman is a noble toiler in the field of intelligence, and has very probably read more papers on intelligence than anyone else in the world. He notes that the importance of student ability was known by Chinese administrators in 200 BC, and by Europeans in 1698.

The main reason people seem to ignore the research is that they concentrate on the things they think they can change easily and ignore the things they think are unchangeable.

Despite some experiments, the basics of teaching have not changed very much: the teacher presents material on a blackboard or projector screen, the students learn it from the pages of a book or screen, and then write answers on a page or screen. By now you might have expected all lessons to be taught by computer-driven correspondence tutorials, cheaply delivered remotely. There is some of that, but not as much as was dreamed of decades ago.

Detterman reviews Coleman et al. (1966) and Jencks et al. (1972), which first brought to attention that 10% to 20% of variance in student achievement was due to schools and 80% to 90% due to students. He then looks at more recent reviews of the same issue.

Gamoran and Long (2006) reviewed the 40 years of research following the Coleman report, but also included data from developing countries. They found that for countries with an average per capita income above $16,000 the general findings of the Coleman report held up well: schools accounted for a small portion of the variance. For countries with lower per capita incomes, however, the proportion of variance accounted for by schools is larger. Heyneman and Loxley (1983) had earlier found that the proportion of variance accounted for by schools in poorer countries was related to those countries' per capita income, which became known as the Heyneman-Loxley effect. A recent study by Baker, Goesling, and LeTendre (2002) suggests that the increased availability of schooling in poorer countries has decreased the Heyneman-Loxley effect, so that these countries now show school effects consistent with or smaller than those in the Coleman report.

The largest effect of schooling in the developing world is 40% of variance, and that includes “schooling” where children attend school inconsistently, and staff likewise.

After being destroyed during the Second World War, Warsaw came under the control of a Communist government which allocated residents randomly to the reconstructed city, to eliminate cognitive differences by avoiding social segregation. The redistribution was close to random, so the authorities expected that Raven's Matrices scores would not correlate with parental class and education, since the old class neighbourhoods had been broken up and everyone attended the schools to which they had been randomly assigned. The assumption was that the correlation between student intelligence and the social class index of the home would be 0.0, but in fact it was R² = 0.97, almost perfect. The difference due to different schools was 2.1%. In summary, in this Communist heaven, student variance accounted for 98% of the outcome.

Angoff and Johnson (1990) showed that the type of college or university attended by undergraduates accounted for 7% of the variance in GRE Math scores. Fascinatingly, a full 35% of students did not take up the offer from the most selective college they were admitted to, instead choosing to go to a less selective college. Their subsequent achievements were better predicted by the average SAT score of the college they turned down than the average SAT scores of the college they actually attended, the place where they received their teaching. Remember the Alma Mater you could have attended.

Twins attending the same classroom are about 8% more concordant than those with different teachers, which is roughly in line with the usual school effect of 10%.

Detterman’s paper continues with a review of other more recent studies. A good summary is shown below.

Here is a summary of the characteristics of students which predict good scholastic outcomes.

• Category: Science • Tags: IQ, Public Schools 

For some years I have been organizing the London Conference on Intelligence, which brings together about 25 invited researchers to present papers and debate issues in a critical but friendly setting. (“The London School” was the name given to those who argued that intelligence had a general component, and was heritable.) Speakers are chosen for innovative work, independent thought, and for being more interested in whether things are true than in whether they are comfortable. We are in favour of the under-dog and the rebellious, but if there is a theme at all, it is that all views must have empirical support.

As you would expect from any group of academics, there were many differences of opinion, and less emphasis on organization. We made sure there was plenty of time for informal discussion, and that resulted in many of the researchers working together on scientific papers. In fact, about half of the presented papers eventually ended up as published work, slightly better than the norm for conferences. The only common project we ever agreed upon was that the Lynn database of country IQs should be thoroughly revised and every aspect documented on a public database.

By way of background, I had originally intended that these meetings would be public, with university students attending and journalists invited. Speakers told me I was naïve to even consider that option, because many of them were already under political pressure, and feared loss of grant money, promotion, or even their academic survival. So we moved to invitation only, and reduced publicity.

One young speaker was a bit different because he was a sociologist by background, and attended the group primarily to seek a sounding board for his work on the link between political attitudes and intelligence. He did no work on race and intelligence, though he later wrote a paper explaining why in his view such research should continue, and that suppressing it would be wrong.

Last year he won a Fellowship at the University of Cambridge, the best of over 900 candidates. This was a great achievement. Before he could take up this prestigious post, which deservedly would have launched him on a brilliant career, a political campaign was launched against him, and one of his supposed crimes was to have attended the London Conference on Intelligence. Additionally, he had written an empirical paper arguing that people’s views of immigrant groups were affected by that immigrant group’s criminality.

Short story: Cambridge threw him out. He lost his job, and effectively has lost his career. We had no way to defend him from this outrageous injustice. We wondered how he would ever find a job, in today’s very censorious climate.

It is a pleasure to report that he has launched a crowdfunded lawsuit against the Cambridge college which hounded him out. He is doing this simply to show that he was unfairly judged. Any surplus funds, should there be any, will be held over for the next person to be treated in this awful way.

Could you please support him? It turns out that the investigation into his appointment process confirms he was the best person to get the job.

Even $10 from each person reading this blog would help him mount a case, and I think he will win. This could be a turning point.

• Category: Science • Tags: Academia, IQ, Political Correctness 

As an undergraduate, my psychology tutor dryly commented to me that the best way to get a paper widely read was to give it a memorable title, like “the magic number 7, plus or minus 2”.

Miller, G. A. (1956). The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychological Review, 63(2), 81–97.

Here is the abstract, in full:

A variety of researches are examined from the standpoint of information theory. It is shown that the unaided observer is severely limited in terms of the amount of information he can receive, process, and remember. However, it is shown that by the use of various techniques, e.g., use of several stimulus dimensions, recoding, and various mnemonic devices, this informational bottleneck can be broken. 20 references.

Out of respect for George Miller, this post will be equally brief. His paper became a classic because it showed that we perceive the world through an attentional bottleneck, one which we would like to expand, but which has not proved possible despite every training effort over the last 63 years, other than for a few heroic individuals who practised digit span hard for months, and then found their abilities did not generalize to other memory tasks. As in a funnel, the many possible inputs of experience must slowly swirl down a narrow spout into the waiting brain. A grievous restriction.

All is not lost, because we can cope with our restricted scope by learning how to “chunk” data into other more meaningful units. So, although we are cabined, cribbed, confined in actuality, we have found heuristics to cope with our limitations. Despite that, people still yearn to achieve even more if they could increase their span just a little.

The much-vaunted Flynn effect has done nothing for digit span, although it may have increased the easy “digits forwards” performance just a fraction, at the cost of reducing the harder and more predictive “digits backwards” performance by a similar fraction, resulting in no change overall.

How do other species fare in this regard? In a very brief review, Manoochehri (2019) lays out the basic picture and wonders how memory span evolved.

The evolution of memory span: a review of the existing evidence. Majid Manoochehri

The existing evidence shows that chimpanzees have a memory span of about 5 items (Inoue & Matsuzawa, 2007; Kawai & Matsuzawa, 2000).

Lately, Toyoshima et al. (2018) have stipulated that rats are able to remember 5 objects at once.

Baboons reveal a memory span of about 4 to 5 items (Fagot & DeLillo, 2011).

Herman et al. (2013) have suggested a memory span of about 4 to 5 items for bottlenose dolphins.

The results of Swartz et al.’s (1991) study of two rhesus monkeys suggest a memory span of about 4 objects.

Similar work by Sugita et al. (2015) has argued that rats’ memory span is approximately 4 items.

Terrace (1993) has found it takes a pigeon 3-4 months to learn a 4-item list, which suggests that 4 is a pigeon’s memory span.

More studies of more animals would be needed to show if the jump from 4-5 items up to the human 7 items is the massive discontinuity it appears to be. Did we pick up a mutation 60 to 130 thousand years ago which gave us the bandwidth to use grammatical computations, greater articulatory rehearsal, leading to automatic long-term storage, and the beginnings of introspection, self-reflection, consciousness and symbolic thought? It might even have given us the ability to create and enjoy music, a language-like spin off from newly acquired processing skills.

• Category: Science • Tags: Animal IQ 

Early in any psychology course, students are taught to be very cautious about accepting people’s reports. A simple trick is to stage some sort of interruption to the lecture by confederates, and later ask the students to write down what they witnessed. Typically, they will misremember the events, sequences and even the number of people who staged the tableaux. Don’t trust witnesses, is the message.

Another approach is to show visual illusions, such as getting estimates of line lengths in the Muller-Lyer illusion, or studying simple line lengths under social pressure, as in the Asch experiment, or trying to solve the Peter Wason logic problems, or the puzzles set by Kahneman and Tversky. All these appear to show severe limitations of human judgment. Psychology is full of cautionary tales about the foibles of common folk.

As a consequence of this softening up, psychology students come to regard themselves and most people as fallible, malleable, unreliable, biased and generally irrational. No wonder psychologists feel superior to the average citizen, since they understand human limitations and, with their superior training, hope to rise above such lowly superstitions.

However, society still functions, people overcome errors and many things work well most of the time. Have psychologists, for one reason or another, misunderstood people, and been too quick to assume that they are incapable of rational thought?

Gerd Gigerenzer thinks so.

He is particularly interested in the economic consequences of apparent irrationality, and whether our presumed biases really result in us making bad economic decisions. If so, some argue we need a benign force, say a government, to protect us from our lack of capacity. Perhaps we need a tattoo on our forehead: Diminished Responsibility.

The argument leading from cognitive biases to governmental paternalism—in short, the irrationality argument—consists of three assumptions and one conclusion:

1. Lack of rationality. Experiments have shown that people’s intuitions are systematically biased.

2. Stubbornness. Like visual illusions, biases are persistent and hardly corrigible by education.

3. Substantial costs. Biases may incur substantial welfare-relevant costs such as lower wealth, health, or happiness.

4. Biases justify governmental paternalism. To protect people from their biases, governments should “nudge” the public toward better behavior.

The three assumptions—lack of rationality, stubbornness, and costs—imply that there is slim chance that people can ever learn or be educated out of their biases; instead governments need to step in with a policy called libertarian paternalism (Thaler and Sunstein, 2003).

So, are we as hopeless as some psychologists claim we are? In fact, probably not. Not all the initial claims have been substantiated. For example, it seems we are not as loss averse as previously claimed. Does our susceptibility to printed visual illusions show that we lack judgement in real life?

In Shepard’s (1990) words, “to fool a visual system that has a full binocular and freely mobile view of a well-illuminated scene is next to impossible” (p. 122). Thus, in psychology, the visual system is seen more as a genius than a fool in making intelligent inferences, and inferences, after all, are necessary for making sense of the images on the retina.

Most crucially, can people make probability judgements? Let us see. Try solving this one:

A disease has a base rate of .1, and a test is performed that has a hit rate of .9 (the conditional probability of a positive test given disease) and a false positive rate of .1 (the conditional probability of a positive test given no disease). What is the probability that a random person with a positive test result actually has the disease?

Most people fail this test, including 79% of gynaecologists giving breast screening tests. Some researchers have drawn the conclusion that people are fundamentally unable to deal with conditional probabilities. On the contrary, there is a way of laying out the problem such that most people have no difficulty with it. Watch what it looks like when presented as natural frequencies:

Among every 100 people, 10 are expected to have a disease. Among those 10, nine are expected to correctly test positive. Among the 90 people without the disease, nine are expected to falsely test positive. What proportion of those who test positive actually have the disease?

In this format the positive test result gives us 9 people with the disease and 9 people without the disease, so the chance that a positive test result shows a real disease is 50/50. Only 13% of gynaecologists fail this presentation.
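The conditional-probability and natural-frequency presentations are arithmetically identical, as a few lines of Python (using the numbers from the problem above) make explicit:

```python
# The problem in conditional-probability form.
base_rate, hit_rate, false_pos_rate = 0.1, 0.9, 0.1

p_positive = base_rate * hit_rate + (1 - base_rate) * false_pos_rate
posterior = base_rate * hit_rate / p_positive   # Bayes' rule
print(posterior)  # 0.5

# The same problem as natural frequencies: counts out of 100 people.
sick = 10            # base rate 0.1 of 100 people
true_positives = 9   # hit rate 0.9 of the 10 sick
false_positives = 9  # false-positive rate 0.1 of the 90 healthy
print(true_positives / (true_positives + false_positives))  # 0.5
```

The second version needs no probability algebra at all: it is the same division, but performed on whole-person counts rather than nested conditional rates.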

Summing up the virtues of natural frequencies, Gigerenzer says:

When college students were given a 2-hour course in natural frequencies, the number of correct Bayesian inferences increased from 10% to 90%; most important, this 90% rate was maintained 3 months after training (Sedlmeier and Gigerenzer, 2001). Meta-analyses have also documented the “de-biasing” effect, and natural frequencies are now a technical term in evidence-based medicine (Akl et al., 2011; McDowell and Jacobs, 2017). These results are consistent with a long literature on techniques for successfully teaching statistical reasoning (e.g., Fong et al., 1986). In sum, humans can learn Bayesian inference quickly if the information is presented in natural frequencies.

If the problem is set out in a simple format, almost all of us can do conditional probabilities.

I taught my medical students about the base rate screening problem in the late 1970s, based on Robyn Dawes (1962), “A note on base rates and psychometric efficiency”. Decades later, alarmed by the positive scan detection of an unexplained mass, I confided my fears to a psychiatrist friend. He did a quick differential diagnosis on bowel cancer, showing I had no relevant symptoms, and reminded me that I had lectured him as a student on base rates decades before, so I ought to relax. Indeed, it was a false positive.

Here are the relevant figures, set out in terms of natural frequencies:

Every test has a false positive rate (every step is being taken to reduce these), and when screening is used for entire populations many patients have to undergo further investigations, sometimes including surgery.

Setting out frequencies in a logical sequence can often prevent misunderstandings. Say a man on trial for having murdered his spouse has previously physically abused her. Should his previous history of abuse not be raised in court because only 1 woman in 2,500 cases of abuse is murdered by her abuser? Of course, whatever a defence lawyer may argue and a court may accept, this is back to front. OJ Simpson was not on trial for spousal abuse, but for the murder of his former partner. The relevant question is: what is the probability that a man murdered his partner, given that she has been murdered and that he previously battered her?

Accepting the figures used by the defence lawyer, if 1 in every 2,500 abused women is murdered by her abusive male partner each year, how many murdered women were killed by men who did not previously abuse them? Using the government figure that 5 women in 100,000 are murdered every year, and putting everything onto the same population of 100,000, the frequencies look like this:
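The back-of-envelope arithmetic can be spelled out directly. One simplifying assumption of this sketch: the overall rate of 5 murders per 100,000 women is treated as the rate of murder by someone other than the abusive partner, which is roughly right since partner-murders make up only a small share of that overall figure:

```python
# All numbers are the ones quoted in the text; treating the overall murder
# rate as the "murdered by someone else" rate is an assumption of this sketch.
battered_women = 100_000                      # one year's population at risk
murdered_by_abuser = battered_women // 2500   # the defence's own figure: 40
murdered_by_others = 5                        # ~5 per 100,000 women per year
murdered = murdered_by_abuser + murdered_by_others

# Of battered women who end up murdered, what share were killed by the abuser?
print(murdered_by_abuser / murdered)  # 40/45, about 0.89
```

So, conditional on a battered woman having been murdered, the previously abusive partner is the killer in roughly nine cases out of ten — the opposite of the impression the defence figure was meant to create.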

James Thompson
About James Thompson

James Thompson has lectured in Psychology at the University of London all his working life. His first publication and conference presentation was a critique of Jensen’s 1969 paper, with Arthur Jensen in the audience. He also taught Arthur how to use an English public telephone. Many topics have taken up his attention since then, but mostly he comments on intelligence research.