The Unz Review - Mobile
A Collection of Interesting, Important, and Controversial Perspectives Largely Excluded from the American Mainstream Media
Razib Khan
Gene Expression Blog


A few years ago I put up a post, WORDSUM & IQ & the correlation, as a “reference” post. Basically if anyone objected to using WORDSUM, a variable in the General Social Survey, then I would point to that post and observe that the correlation between WORDSUM and general intelligence is 0.71. That makes sense, since WORDSUM is a vocabulary test, and verbal fluency is well correlated with intelligence.

But I realized that over the years I’ve written many posts using the GSS and WORDSUM without ever explicitly laying out the distribution of WORDSUM scores, which range from 0 (0 out of 10) to 10 (10 out of 10). I’ve used categories like “stupid, interval 0-4,” but often only mentioned the percentiles in the comments after prompting from a reader. This post is to fix that problem forever, and will serve as a reference for the future.

First, please keep in mind that I limited the sample to the year 2000 and later. The N is ~7,000, but far lower for some of the variables crossed. Therefore, I invite you to replicate my results. After the charts I will list all the variables, so if you care you should be able to replicate the results, sample sizes included, in ~10 minutes. I am also going to attach a csv file with the raw table data. As for the charts, they are simple.

- The x-axis is a WORDSUM category, ranging from 0 to 10

- The y-axis is the percent of a given demographic class who received that score. I’ve labelled some of them where the chart doesn’t get too busy

All of the charts have a line which represents the total population in the sample (“All”).
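As a rough illustration of what each chart plots, here is a minimal sketch in Python with pandas. It assumes a respondent-level extract with a WORDSUM column and one demographic column. The toy data and the function name `wordsum_distribution` are hypothetical, and survey weights are ignored.

```python
import pandas as pd

# Hypothetical toy extract; a real GSS extract has one row per respondent.
df = pd.DataFrame({
    "WORDSUM": [4, 6, 6, 7, 7, 9, 5, 6],
    "SEX": ["Male", "Male", "Male", "Male",
            "Female", "Female", "Female", "Female"],
})

def wordsum_distribution(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Percent of each class in `column` at each WORDSUM score, plus "All"."""
    # normalize="columns" makes every demographic class sum to 100%.
    table = pd.crosstab(data["WORDSUM"], data[column], normalize="columns") * 100
    # The "All" line is the score distribution of the whole sample.
    table["All"] = data["WORDSUM"].value_counts(normalize=True).sort_index() * 100
    # Keep every score from 0 to 10 on the x-axis, even if nobody received it.
    return table.reindex(range(11), fill_value=0.0)

dist = wordsum_distribution(df, "SEX")
print(dist.round(1))
```

Each column of `dist` is one line on a chart: the index is the x-axis (score 0 to 10) and the values are the y-axis (percent of that class).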


The “Row” variable in all cases was WORDSUM. I put in YEAR(2000-*) in “Selection Filter(s).”

For the columns:

Sex = SEX

Race/ethnicity = For non-Hispanic blacks and whites, put HISPANIC(1) in the filter, then use RACE as the column. For Hispanics, just limiting the sample to Hispanics with HISPANIC(2-*) in the filter will do; no column variable is needed.

Education = DEGREE

Region = REGION

Political ideology = POLVIEWS(r:1-3"Liberal";4"Moderate";5-7"Conservative")

Political party = PARTYID(r:0-2"Democrat";3"Independent";4-6"Republican")

Belief in God = GOD(r:1-2"Atheist & agnostic";3-5"Theist";6"Convinced Theist")

Religion = RELIG

Opinion about Bible = BIBLE

Income standardized to 1986 = REALINC(r:0-20000"0-20";20000-40000"20-40";40000-60000"40-60";60000-80000"60-80";80000-100000"80-100";100000-120000"100-120";120000-140000"120-140";140000-*"140-")

Wealth = WEALTH(r:1-3″”)

Evolution = EVOLVED

You can find the raw table here.
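For readers working from a raw extract rather than the GSS web interface, the recodes listed above can be approximated along these lines. This is only a sketch: the helper name `recode` and the rule lists are hypothetical, and treating each range as inclusive at both ends is an assumption about how the recode syntax behaves.

```python
import pandas as pd

# Recode rules copied from the variable list: (low, high, label).
# Assumption: both ends of each range are inclusive.
POLVIEWS_RULES = [(1, 3, "Liberal"), (4, 4, "Moderate"), (5, 7, "Conservative")]
PARTYID_RULES = [(0, 2, "Democrat"), (3, 3, "Independent"), (4, 6, "Republican")]

def recode(values: pd.Series, rules) -> pd.Series:
    """Map numeric codes to labels; codes outside every range become missing."""
    def label_for(code):
        for low, high, label in rules:
            if low <= code <= high:
                return label
        return None  # e.g. "don't know" or "no answer" codes
    return values.map(label_for)

codes = pd.Series([1, 4, 7, 8])
labels = recode(codes, POLVIEWS_RULES)
print(labels.tolist())
```

Codes that match no range fall out of the table, which is the same effect as leaving them out of the recode in the web interface.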

(Republished from Discover/GNXP by permission of author or representative)

Mike the Mad Biologist has a post up, A Modest Proposal: Alabama Whites Are Genetically Inferior to Massachusetts Whites (FOR REALZ!). The post is obviously tongue-in-cheek, but it’s actually an interesting question: what’s the difference between whites in various regions of the United States? I’ve looked at this before, but I thought I’d revisit it for new readers.

First, I use the General Social Survey. Second, I use the WORDSUM variable, a 10-question vocabulary test which has a correlation of 0.71 with general intelligence. My curiosity is about differences across white ethnic groups by region. To do this I use the ETHNIC variable, which asks respondents where their ancestors came from by nation. I omitted some nations because of small sample size, and amalgamated others.

Here are my amalgamations:

German = Austria, Germany, Switzerland

French = French Canada, France

Eastern Europe = Lithuania, Poland, Hungary, Yugoslavia, Russia, Czechoslovakia (many were asked before 1992), Romania

Scandinavian = Denmark, Norway, Sweden, Finland (yes, I know that Finland is not part of Scandinavia, Jaakkeli!)

British = England, Wales, Scotland

Next we need to break it down by region. The REGION variable uses the Census divisions. You can see them to the left. I combined a few of these to create the following classes:

Northeast = New England, Middle Atlantic

Midwest = E North Central, W North Central

South = W S Central, E S Central, South Atlantic

West = Pacific, Mountain
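The two sets of amalgamations above amount to lookup tables. A minimal sketch, assuming the extract carries country and division labels as text (the GSS itself stores numeric codes, so the label strings and the names `ETHNIC_GROUPS`, `REGION_CLASSES`, and `invert` are illustrative only):

```python
# Amalgamations copied from the lists above. The label strings are
# assumptions; the GSS stores ETHNIC and REGION as numeric codes.
ETHNIC_GROUPS = {
    "German": ["Austria", "Germany", "Switzerland"],
    "French": ["French Canada", "France"],
    "Eastern Europe": ["Lithuania", "Poland", "Hungary", "Yugoslavia",
                       "Russia", "Czechoslovakia", "Romania"],
    "Scandinavian": ["Denmark", "Norway", "Sweden", "Finland"],
    "British": ["England", "Wales", "Scotland"],
}

REGION_CLASSES = {
    "Northeast": ["New England", "Middle Atlantic"],
    "Midwest": ["E North Central", "W North Central"],
    "South": ["W S Central", "E S Central", "South Atlantic"],
    "West": ["Pacific", "Mountain"],
}

def invert(groups: dict) -> dict:
    """Turn {group: [members]} into {member: group} for direct lookup."""
    return {member: group
            for group, members in groups.items()
            for member in members}

COUNTRY_TO_GROUP = invert(ETHNIC_GROUPS)
DIVISION_TO_CLASS = invert(REGION_CLASSES)

print(COUNTRY_TO_GROUP["Finland"], DIVISION_TO_CLASS["Mountain"])
```

Countries not listed (those omitted for small sample size, or kept as their own group) simply have no entry in `COUNTRY_TO_GROUP`.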

The key method I used is to look for mean vocabulary test scores by ethnicity and region. I also later broke down some of these ethnic groups by religion. Finally, all bar plots have 95 percent confidence intervals. This should give you a sense of the sample sizes for each combination.
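The post does not say how the confidence intervals were computed. One common choice, shown here purely as an assumption, is the normal approximation: mean plus or minus 1.96 standard errors. The toy data and the function name `mean_with_ci` are hypothetical, and survey weights are ignored.

```python
import pandas as pd

def mean_with_ci(data: pd.DataFrame, by, value: str = "WORDSUM",
                 z: float = 1.96) -> pd.DataFrame:
    """Group means with a normal-approximation 95% confidence interval."""
    stats = data.groupby(by)[value].agg(["mean", "std", "count"])
    # Standard error of the mean = sample SD / sqrt(n); small n gives wide bars.
    half_width = z * stats["std"] / stats["count"].pow(0.5)
    stats["ci_low"] = stats["mean"] - half_width
    stats["ci_high"] = stats["mean"] + half_width
    return stats

# Hypothetical toy data, not GSS results.
toy = pd.DataFrame({
    "REGION": ["Northeast"] * 4 + ["South"] * 4,
    "WORDSUM": [6, 7, 8, 7, 5, 6, 5, 6],
})
result = mean_with_ci(toy, "REGION")
print(result.round(2))
```

Because the interval width shrinks with the square root of the group size, the error bars double as a rough guide to how many respondents sit behind each bar.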

First let’s break it down by race/ethnicity and compare it by region to get a reference:

Next, the main course:

Finally, let’s separate by religion for Germans and Eastern Europeans:

I include the last plot because these reports of nationality have to be taken with a consideration for the structure they may mask. People whose ancestors came from Poland in the United States fall into two large categories: people of Jewish heritage whose identity as ethnic Poles was contested (recall that Jews often spoke Yiddish as their first language, a Germanic language), and Roman Catholic Slavs. I suspect many of those in the “None” category are also Jews by culture, if not religion.

Second: there is a tendency of people of all ethnic groups to have lower vocabulary scores if they are from the South or Midwest. This tendency is in many cases outside of the 95 percent confidence interval. It’s especially striking in the three groups with huge sample sizes in all regions: Germans, Irish, and British. Irish here includes both Scots-Irish and those of Irish Catholic background. Not only are the sample sizes for these groups large, but the roots of these groups in some of these regions go rather far back. In particular, the division between the people of British ancestry goes back centuries in the North vs. South divide.

How to understand this? There are a lot of complicating factors. But as outlined in Albion’s Seed and The Cousins’ Wars the divisions between the Anglo-Celtic folkways run deep and long. If a time traveler from the 18th century arrived in the United States today and were asked which region was the heart of intellectual ferment they would correctly guess New England. Early Puritan New England was the first universal-literacy society in the world. This was to some extent a matter of conscious planning. The leaders of the New England colonies enforced limitations upon who could emigrate to their dominion. Religious exclusions and persecutions in this region are well known, but there was also a policy of rejecting the settlement of those who were perceived to be possible burdens upon the community. New England then selected for a middle class migration out of East Anglia and the port towns of southwest England. But the fathers of the early colony also rejected the transfer of the privileges of the blood nobility from the motherland, thereby throwing up a barrier to the migration of the aristocracy.

In contrast the lowland South received a more representative selection of the British class strata. The younger sons of the British nobility and self-styled gentlemen arrived to make their mark, as did those who became indentured servants and even slaves. A class society on the model of southwestern England recapitulated itself in this region. As for the uplands, what became Appalachia, an influx of Scots-Irish came to dominate the scene by the middle of the 18th century, disembarking in Philadelphia and pushing down the spine of the high country to the Deep South.

Conflicts between these “Anglo” groups framed the terms of debate over the 18th and 19th centuries. They were to some extent at the root of the Age of Sectionalism. Today because of the salience of race, and the prominence of the later wave of migration in the late 19th and early 20th century which remained vibrant in living memory for most, these early divisions have moved out of sight. But they still remain. The difference between Germans in Texas and the Anglos of Southern extraction remains to this day, but note that Germans exhibit the same regional differences in vocabulary score as Anglos. Why? This may be a case where the original cultural substratum has an outsized impact (the dialect of eastern New England, made famous by the Catholic Irish of Boston, is descended from East Anglian English!).

Of course there might be a genetic difference. Intelligence is a quantitative trait, so it would be trivial to generate two populations which are genetically similar, but very different in trait value, simply through selection. In the 1630s ~20 thousand Puritans settled New England. For various reasons there was very little migration over the next century and a half. By 1780 New England’s population was 700,000, almost all through natural increase (not only was New England the world’s first universal-literacy society, but its fertility was the highest in the late 17th century).
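To get a feel for what “almost all through natural increase” implies, the figures in the paragraph above can be turned into an average annual growth rate. This is back-of-the-envelope arithmetic only: it assumes a span of 150 years (“a century and a half”), no net migration, and smooth compound growth.

```python
import math

start_pop = 20_000   # ~20 thousand Puritan settlers in the 1630s (from the text)
end_pop = 700_000    # New England's population by 1780 (from the text)
years = 150          # assumption: "a century and a half"

# Compound growth: end = start * (1 + r) ** years, solved for r.
annual_rate = (end_pop / start_pop) ** (1 / years) - 1
# Years needed to double at that constant rate.
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"{annual_rate:.2%} per year, doubling every {doubling_years:.0f} years")
```

Under these assumptions a 35-fold increase works out to roughly 2.4 percent a year, or a doubling about once a generation.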

Finally, there’s the issue of disease and pathogen load. Endemic hookworm infection does seem likely to have made Southerners, of both races, relatively indolent and lethargic in comparison to Northerners. Who knows what pathogens simply fall below our radar?

Overall I think that a more fine-grained and detailed exploration of these topics is warranted. Our public discussion is too coarse, and data-thin.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Data Analysis, Demographics, GSS, WORDSUM 

WORDSUM is a variable in the General Social Survey. It is a 10-word vocabulary test. A score of 10 is perfect. A score of 0 means you didn’t know any of the vocabulary words. WORDSUM has a correlation of 0.71 with general intelligence. In other words, variation of WORDSUM can explain 50% of the variation of general intelligence. To the left is a distribution of WORDSUM results from the 2000s. As you can see, a score of 7 is modal. In the treatment below I will label 0-4 “Dumb,” 5-7 “Not Dumb,” and 8-10 “Smart.” Who says I’m not charitable? You also probably know that general intelligence has some correlation with income and wealth. But to what extent? One way you can look at this is inspecting the SEI variable in the GSS, which combines both monetary and non-monetary status and achievement, and see how it relates to WORDSUM. The correlation is 0.38. It’s there, but not that strong.

To further explore the issue I want to focus on two GSS variables, WEALTH and INCOME. WEALTH was asked in 2006, and it has a lot of categories of interest. INCOME has been asked since 1974, but unfortunately its highest category is $25,000 and more, so there’s not much information at the non-low end of the scale (at least in current dollar values).

Below you see WEALTH crossed with WORDSUM. I’ve presented columns and rows adding up to 100%. Then you see INCOME crossed with WORDSUM. I’ve just created two categories, low and non-low (less than $25,000, and $25,000 or more). Additionally, since the sample sizes were large I constrained to those 50 years and older for INCOME.
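The step from a correlation to “variation explained” is the squared correlation, under the usual linear reading of r. Applied to the two correlations quoted above:

```latex
r^2_{\text{WORDSUM, IQ}} = 0.71^2 \approx 0.50
\qquad
r^2_{\text{WORDSUM, SEI}} = 0.38^2 \approx 0.14
```

So on the same reading, WORDSUM accounts for roughly half of the variation in general intelligence but only about 14% of the variation in SEI.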

Wealth & Intelligence (2006)

Columns = 100%
            Less than $40 K   $40-$100 K   $100-$250 K   $250-$500 K   More than $500 K
Dumb               22             14            12            13              5
Not Dumb           55             65            63            57             48
Smart              23             22            25            31             47

Row = 100%
            Less than $40 K   $40-$100 K   $100-$250 K   $250-$500 K   More than $500 K
Dumb               50             13            18            16              4
Not Dumb           32             16            24            18             10
Smart              29             11            20            20             20

Income & intelligence (2000-2008), age 50 and above

Columns = 100%
            Low   Not Low
Dumb         32      11
Not Dumb     50      50
Smart        18      39

Row = 100%
            Low   Not Low
Dumb         58      42
Not Dumb     32      68
Smart        17      83
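The difference between the “Columns = 100%” and “Row = 100%” panels is just the direction in which the same cross-tabulation is normalized. A minimal sketch with pandas, using hypothetical toy data and the 0-4 / 5-7 / 8-10 labels defined earlier (the function name `wordsum_class` is an assumption, and survey weights are ignored):

```python
import pandas as pd

def wordsum_class(score: int) -> str:
    """Bin a WORDSUM score using the labels from the post."""
    if score <= 4:
        return "Dumb"
    if score <= 7:
        return "Not Dumb"
    return "Smart"

# Hypothetical toy respondents, not GSS data.
toy = pd.DataFrame({
    "WORDSUM": [3, 6, 9, 2, 8, 10, 5, 7],
    "INCOME": ["Low"] * 4 + ["Not Low"] * 4,
})
toy["CLASS"] = toy["WORDSUM"].map(wordsum_class)

# Columns = 100%: of people in an income group, what share is in each class?
col_pct = pd.crosstab(toy["CLASS"], toy["INCOME"], normalize="columns") * 100
# Row = 100%: of people in a class, what share is in each income group?
row_pct = pd.crosstab(toy["CLASS"], toy["INCOME"], normalize="index") * 100

print(col_pct.round(0))
print(row_pct.round(0))
```

The column view answers “how smart are the poor?”, while the row view answers “how poor are the smart?”, which is why the two panels can tell different stories.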

Of those with low income, about 1 out of 5 are smart. And of those who are smart, about 1 out of 5 are poor. Remember, this is for those above the age of 50, not college students. I thought perhaps retirees might be skewing this. Constraining the sample to ages 50-64 changes the results in one significant way: 1 out of 5 of the poor remain smart, but only 1 out of 10 of the smart are poor. As for the rich dumb, you have to look to wealth. It is notable to me that there’s a big drop-off at more than $500,000 in wealth. And a large fraction of those with wealth in the $100,000 to $500,000 range are dumb. I think we might be seeing the 2000s real estate boom.

In any case, I began to think of this after a recent post by the quant-blogger Audacious Epigone, Average IQ by occupation (estimated from median income). This is what he did:

…It’s not supposed to be an exact measure of IQ by profession by any means, as it is based entirely on average annual income figures. In other words, it’s an income table with the values converted to IQ scores….

…the following table estimates average IQ scores by occupation solely on the basis of the Career Cast mid-level income figures. The median salary (of a paralegal assistant) is taken to correspond to an IQ of 100. One standard deviation is assumed to be 15 IQ points….
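The quoted description pins down only two things: the median income maps to 100, and one standard deviation maps to 15 points. A minimal sketch of such a conversion follows. It assumes a plain linear z-score on raw income with the standard deviation taken across the listed occupations; the original calculation may well have differed (for example, by using log income). The incomes and the function name `income_to_iq` are hypothetical.

```python
import statistics

def income_to_iq(incomes: dict[str, float]) -> dict[str, float]:
    """Convert incomes to an IQ-style scale: median -> 100, one SD -> 15 points.

    Assumption: the SD is the population SD of the listed incomes.
    """
    values = list(incomes.values())
    median = statistics.median(values)
    sd = statistics.pstdev(values)
    return {job: 100 + 15 * (pay - median) / sd
            for job, pay in incomes.items()}

# Hypothetical incomes, not the Career Cast figures.
example = {"low earner": 30_000, "median earner": 50_000, "high earner": 70_000}
scores = income_to_iq(example)
print({job: round(iq, 1) for job, iq in scores.items()})
```

Because raw incomes are strongly right-skewed, a linear conversion like this stretches the top of the scale, which is one way to end up with implausible values for the highest-paid occupations.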

You can see the full list at the Audacious Epigone‘s place, but here’s a selection I found of interest:

Occupation Estimated IQ from median income
Surgeon 234
Physician 161
CEO 148
Dentist 140
Attorney 128
Petroleum engineer 126
Pharmacist 126
Physicist 125
Astronomer 125
Financial planner 123
Nuclear engineer 121
Optometrist 121
Aerospace engineer 120
Mathematician 120
Economist 117
Software engineer 117
School principal 116
Electrical engineer 115
Web developer 115
Construction foreman 115
Geologist 114
Veterinarian 114
Mechanical engineer 113
Biologist 111
Statistician 111
Architect 111
Chemist 109
Stockbroker 109
Registered nurse 107
Historian 107
Philosopher 106
Accountant 106
Farmer 105
Zoologist 104
Author 103
Undertaker 103
Librarian 103
Anthropologist 103
Dietician 102
Archeologist 102
Physiologist 102
Teacher 102
Police officer 101
Actor 101
Electrician 100
Paralegal 100
Plumber 100
Clergy 98
Social worker 97
Carpenter 97
Machinist 96
Nuclear decontamination technician 96
Welder 95
Roofer 95
Bus driver 95
Agricultural scientist 95
Typist 94
Travel Agent 93
Butcher 92
Barber 90
Janitor 90
Maid 88
Dishwasher 88

Off the top of my head, I would say that the highest disjunction in the low income direction would be clergy. This is especially true for Roman Catholic and mainline Protestant denominations in the United States, which have moderately stringent educational prerequisites for their clerics. I assume that the biggest disjunctions in the other direction are surgeons and medical doctors, who enter a market where there’s less and less real price signalling, and where labor controls the supply of future labor as well as influencing the range of services that competitive professions (e.g., nurses) can provide.

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Data Analysis, I.Q., WORDSUM 

Every time I use the WORDSUM variable from the GSS people will complain that a score on a 10-question vocabulary test is not a good measure of intelligence. The reality is that “good” is too imprecise a term. The correlation between adult IQ and WORDSUM is 0.71. The source for this number is a 1980 paper, The Enduring Effects of Education on Verbal Skills. I’ve reproduced the relevant table…

Estimated Correlations for Variables in a Model of Enduring Effects of Education for White, Native-Born People 25 to 72 Years Old in the Contemporary [1970s] United States

                Child IQ   Age     Sex      Father’s Educ   Father’s SEI   Educ     Adult IQ   WORDSUM
Child IQ           -       0       0         0.31            0.30           0.51     0.80       -
Age                -       -       0.026    -0.304          -0.130         -0.304   -0.42      -0.005
Sex                -       -       -        -0.054           0.058          0.050    0         -0.121
Father’s Educ      -       -       -         -               0.488          0.469    0.30       0.302
Father’s SEI       -       -       -         -               -              0.347    0.31       0.285
Educ               -       -       -         -               -              -        0.66       0.511
Adult IQ           -       -       -         -               -              -        -          0.71
WORDSUM            -       -       -         -               -              -        -          -

Obviously since the WORDSUM test was not given to those under 18 you can’t calculate the correlation between childhood IQ and WORDSUM score. Additionally, I suspect since 1980 there’s been a bit more cognitive stratification by education. I notice in the GSS sample that there are many older people, especially women, who have high WORDSUM scores but no college education. In the younger age cohorts this pattern is not as evident because if you are intelligent the probability is much higher that you’ll obtain a university education.

A correlation of 0.71 is not mind-blowing; there’s a significant difference between IQ and WORDSUM as they relate to each other linearly. But I think it’s good enough to get a sense that WORDSUM is a serviceable substitute for a more rigorous measure of g in the absence of any alternatives, and not so clumsy a proxy as to be useless. Though that call is up to you, and readers are free to disagree with the methodology of the model used to obtain this correlation. Additionally, I would point out that WORDSUM is a subset of the vocabulary subsection of the Wechsler Adult Intelligence Scale. WORDSUM is in effect a slice of an IQ test.

I am bookmarking this post so that in the future I can simply place a link in the comment threads in response to objections to WORDSUM.

Note: Thanks to Bryan Caplan for pointing me to this paper.

Citation: Lee M. Wolfle, Sociology of Education, Vol. 53, No. 2 (Apr., 1980), pp. 104-114

(Republished from Discover/GNXP by permission of author or representative)
• Category: Science • Tags: Blog, Data, Data Analysis, GSS, IQ, WORDSUM 
Razib Khan
About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at"