Philip Tetlock is a veteran American academic with a specialty in studying forecasting accuracy regarding affairs of state: e.g., what will China do this calendar year regarding its disputed territorial claim to the Spratly Islands in the South China Sea? He gets funding from the American intelligence community to run an impressive annual forecasting tournament, and now he’s given a class on what he’s learned from the first four years of the tournament.
He says in The Edge:
The second story is the parable of Tom Friedman and Bill Flack. It’s a tale of two forecasters. Everybody around this table knows who Tom Friedman is. He’s this world famous New York Times columnist, award-winning columnist, and award-winning writer. … Then there’s this guy, Bill Flack. Bill Flack is a retired irrigation specialist, who worked for the U.S. Department of Agriculture in Nebraska. He has no track record of writing on world politics. He’s never been invited to sit on a panel with Tom Friedman or, for that matter, anybody else. The question is, who is likely to be a better forecaster? …
It turns out though, and we know this from the IARPA tournament that Barb and I worked on for a number of years now, we also knew this from my earlier forecasting work as well, that the correlation between your ability to tell a good explanatory story and your forecasting accuracy is rather weak. It’s not as strong an argument as you might think. Even if it is positive, it’s not all that strong. There’s a very powerful countervailing reason for believing that Bill Flack is a better forecaster, and that is because Bill Flack is a scientifically-documented, officially-certified IARPA tournament superforecaster. He did a great job assigning probability estimates to hundreds of questions posed over four years in the IARPA forecasting tournament, a superb performance. This is with neutral umpires, no room for fudging, this is objective scoring.
It’s quite possible that Bill Flack is a better forecaster. I guess it brings us to what I see as a core paradox and that is, why is it that we know so much about Bill Flack’s forecasting record and so little about Tom Friedman’s? Why is it that most of the time we don’t even know that we don’t know that, and we don’t even seem to care? Is that a satisfactory intellectual state of affairs? Is that a good way to be conducting high stakes policy debates, by relying on proxies like social status, the ability to tell a good story in determining who has the most impact on the policy debates?
Here’s a useful observation by Tetlock on the incentives facing America’s intelligence agencies:
One of the things about learning in Washington, D.C. and the intelligence community is if you’re going to make a mistake, make sure you don’t make the last mistake.
Tetlock’s superforecasters spend a lot of time updating their probabilities by small amounts:
Among the best forecasters in the IARPA tournament, belief change tends to be quite granular. They think that Hillary Clinton has a 60 percent chance of being the next President of the United States today, and then some information comes out from the State Department Inspector General about Hillary’s emails and her possible culpability for her email policy: “Okay, I think I’m going to move it down to .58 now.” That’s the sort of thing superforecasters do, and the cumulative result is that their probability scores, as defined in these handouts here and in the book, are much better. The gaps between their probability judgments and reality are smaller where your dummy code reality is 0 or 1, depending on whether the event didn’t occur or did occur.
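To make the "gap between probability judgments and reality" concrete, here is a minimal sketch of one common scoring rule, the Brier score, in its single-outcome form (the mean squared difference between a forecast probability and the 0-or-1 outcome, so lower is better). The text above doesn't spell out the tournament's exact formula, so treating this as the scoring rule is an assumption, and the forecasts and outcomes below are made-up numbers for illustration only.

```python
def brier_score(forecasts, outcomes):
    """Mean squared gap between probability forecasts and 0/1 outcomes.

    forecasts: probabilities in [0, 1] that each event occurs.
    outcomes:  1 if the event occurred, 0 if it did not.
    Returns a value in [0, 1]; 0 is a perfect record.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    if not forecasts:
        raise ValueError("need at least one forecast")
    squared_gaps = [(p - o) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_gaps) / len(squared_gaps)


# Hypothetical data: three questions, none taken from the tournament.
outcomes = [0, 0, 1]
granular = [0.60, 0.58, 0.30]        # hedged, finely graded estimates
overconfident = [1.00, 1.00, 0.00]   # all-or-nothing calls that all missed

print("granular:     ", brier_score(granular, outcomes))
print("overconfident:", brier_score(overconfident, outcomes))
```

In this toy case both forecasters lean the wrong way on every question, but the one giving graded probabilities is penalized far less than the one who went all-in, which is why small, frequent updates can add up to a better cumulative score.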
Personally, I haven’t spent more than 10 minutes over the months following the Hillary email thingie. I probably wouldn’t have much to contribute on the subject that meets my criteria of being true, new, important, and funny. Closely monitoring Hillary’s chances sounds dreadfully boring but, apparently, some people like to do it. And good for them.
Earlier, Tetlock did work showing that, to use Isaiah Berlin’s terminology, foxes who know many small things tend to make better forecasters than hedgehogs who know one big thing. Hedgehogs achieve success by identifying Factor X in Situation A. They then get nice jobs speaking at conferences and on television where they try to use Factor X to explain Situations B and C. And they are often reluctant to admit that now that they’ve made the Factor X -> Situation A correlation famous, maybe decisionmakers are adjusting for it and it’s becoming less powerful.
It’s starting to occur to Tetlock that his tournament’s medium-short time frame has some fundamental problems.
A particularly interesting example of this was in my 2005 book, which a CIA analyst was reading and wrote to me last year saying, “Professor Tetlock, does this make you change your mind about hedgehogs?”
Foxes know many things but a hedgehog knows one big thing. The hedgehogs come in many flavors in expert political judgment. There are free market hedgehogs, there are socialist hedgehogs, there are boomster hedgehogs, there are doomster hedgehogs. They come in a variety of ideological complexions. This particular hedgehog was an ethnonationalist hedgehog.
What do I mean by that? In the Daniel Patrick Moynihan sense, he thinks that the world is seething with these primordial ethnic national identifications and existing nation-states are going to be rupturing all over the place in the next 50, 60 years (he wrote it in 1980). Then things like Yugoslavia and Soviet Union and so forth happened, and the Moynihan view started to look quite prescient. This ethnonationalist hedgehog writing, offering some forecasts in 1992, anticipates that by 1997 there is going to be a war between Russia and the Ukraine. The Russians are particularly obsessed with Crimea, but they’re also going to see some eastern provinces of the Ukraine where there is a somewhat pro-Russian population, and they’re going to use oil and gas as a weapon, and they’re going to do things to the Ukrainians that more or less happened in 2014.
We’re seventeen years off. The hedgehog gets a terrible accuracy score in the five-year time frame between 1992 and 1997. What do you think of that? That is an interesting complication. That is something we can’t sweep under the rug. We need to think more systematically about how short-term and long-term foresight are related to each other, and that is one of the major focuses of what I thought would be the second session where we would talk about the importance of the questions we ask.
I would say that Greece has lasted longer in the Eurozone than many economists thought it should or would, and if you want to go hedgehog-y, go to Martin Feldstein in 1990 or so when he said, “This is really stupid; you’re going to have a common currency for these countries at these very different levels of economic development? Sounds like the United States and Mexico going in for a currency union. You guys have got to be kidding.”
Then, a few years later the Euro is at $1.50 to the dollar, and the Euro looks really strong and people are saying, “Well, so much for Feldstein.” Again, this is like the ethnonationalist hedgehog on the Ukraine or the Feldstein hedgehog on the Euro or Friedrich von Hayek in the 1930s thought that the Soviet Union was finished. The Soviet Union couldn’t even exist, for that matter, because central planning was such an abominably bad idea. But the Soviet Union managed to limp along until 1991.
Linking short-term foresight and long-term foresight in tournaments is one of the great intellectual challenges here.
Timing is one aspect of a general problem in forecasting: even if you do an outstanding job of identifying a factor in the present that correlates with a future outcome, you don’t necessarily know how all the many other factors work, or how they will interact.
For example, weather forecasters are increasingly optimistic that the California drought will end this winter due to heavy El Nino rains. El Nino years are correlated with warm Pacific water (and sure enough, when I was at San Onofre beach last Saturday the ocean was much less chilly than is common).
All else being equal, warm ocean water increases the chance of El Nino rains. That’s a very important thing to know.
On the other hand, an El Nino winter in California is kind of a Rube Goldberg contraption dependent upon multiple factors besides warm water. If the winds blow in the wrong direction, for instance, it doesn’t happen.
Similarly, ethnonationalism is definitely a thing, but there are a lot of things in the human world.
Tetlock: My thinking has evolved since 2005; that’s certainly true. I like to think of forecasting tournaments as intellectual ecosystems that require different types of creatures existing and reciprocal patterns of interdependence. The foxes need hedgehogs. The hedgehogs are very useful sources of information and insights. The foxes are almost parasitic in some ways on hedgehogs; they use hedgehog ideas and they’re eclectic and they often combine, “Yes, I’ll take this from this hedgehog, and that from this one.” Just as there is a complementarity between Tom and Bill, and question generation and answer generation, forecasting tournaments, there’s a complementarity between hedgehog and fox forecasters. The hedgehogs are the ones looking for the deep parsimonious covering laws that capture the underlying drivers of history. The foxes wonder whether history has any underlying drivers. It may just be, as one famous historian said, “History is just one damn thing after another.” There’s a tension there between those two views of history, and it’s a productive dialectical tension. I would not wish it away. …
For example, I know a few big things. In fact, I specialize in focusing upon some things that are so obviously big (e.g., race differences, IQ, crime rates, sex differences, correlates of sexual orientation, etc.) that most pundits won’t publicly mention them because they are so obvious, so stereotypical, that you aren’t allowed to publicly think about them.
So I’m kind of a multi-fox. (Or is that multi-hog? Hedgefox? Foxhog?)
That means I tend to be not too far off on some big issues year after year after year. When George W. Bush and Ted Kennedy passed the No Child Left Behind act requiring every public school student in America to be “proficient” in reading and math by 2014, I didn’t expect that to happen. Similarly, I’ve long argued that if you open up the borders, it’s not going to turn out as well as you dream; if you make some fashionable change in schooling, it won’t make much difference in outcomes; black youths get in more trouble with the cops than, say, Asian girls less because of racism than because they are male and black, etc etc…
On the other hand, I’m not terribly interested in the Tetlock forecasting tournament’s typical foreign affairs questions with a one year time horizon: in the Spratly Islands territorial dispute between China and the Philippines, will Beijing do Y by December 31 if Manila first does X?
These are, no doubt, important questions. Tetlock’s discovery in his tournament that there are amateur Moneyballers (foreign affairs junkies with an outstanding track record, over four annual tournaments, of being right more often than chance would predict) is an important one.
On the other hand, there is very little in our society telling you that you can’t speak honestly about the Spratly Islands, while there are many social penalties for speaking honestly about housing, educational, criminal justice, and other policies in which our culture is actively hostile and punitive toward the use of Occam’s Razor in understanding how things work.
In other parts of the discussion on The Edge, Robert Axelrod, a game theorist who worked with William D. Hamilton, reminds us of something we really ought to memorize:
Axelrod: People in positions of power on average have been luckier than average. They’ve made a lot of choices like, along the way to becoming President, some of which were disputed by their best advisors, and they were right and their advisors were wrong. Their known batting average has a substantial regression to the mean, which they tend not to account for, so they think that they’re better judges. And they have good evidence for it because they have been better judges.
Hitler, for example, had a long winning streak up through the Fall of France in 1940 — where he overruled his senior generals and went with a junior officer’s brilliant plan — and beyond; it subsequently made him assume he was a military genius, a self-estimate which the events of December 6, 1941-May 9, 1945 did not bear out.
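Axelrod's regression-to-the-mean point can be illustrated with a toy simulation. This is only a sketch under invented assumptions: the number of judges, their range of "true" hit rates, and the number of calls each makes are all hypothetical, chosen to show the effect rather than to model any real group of decisionmakers.

```python
import random


def simulate_selection(num_judges=10_000, calls_per_round=20,
                       top_fraction=0.01, seed=0):
    """Pick the judges with the best early record, then re-test them.

    Each judge has a fixed (hypothetical) true hit rate. Round one decides
    who "rises to power"; round two shows how the winners do afterward.
    """
    rng = random.Random(seed)
    # Assumed spread of real skill: hit rates between 45% and 65%.
    skills = [rng.uniform(0.45, 0.65) for _ in range(num_judges)]

    def observed_hit_rate(skill):
        hits = sum(rng.random() < skill for _ in range(calls_per_round))
        return hits / calls_per_round

    first_round = [observed_hit_rate(s) for s in skills]

    # Promote the top slice by observed record, which mixes skill and luck.
    top_n = max(1, int(num_judges * top_fraction))
    ranked = sorted(range(num_judges), key=lambda i: first_round[i],
                    reverse=True)
    winners = ranked[:top_n]

    second_round = [observed_hit_rate(skills[i]) for i in winners]

    return {
        "winners_first_round": sum(first_round[i] for i in winners) / top_n,
        "winners_true_skill": sum(skills[i] for i in winners) / top_n,
        "winners_second_round": sum(second_round) / top_n,
    }


result = simulate_selection()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The winners really are somewhat better than average, so they "have good evidence" for their self-regard, but the record that got them selected overstates their skill, and their later performance falls back toward their true hit rate.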
One plausible way to improve forecasting is to be frank about the biases that caused past mistaken judgments. For example, consider the 1990s Russian catastrophe facilitated by American economists like Jeffrey Sachs, Larry Summers, Andrei Shleifer, and Stanley Fischer.
Tetlock: Let me offer you an example of one of the ways in which tournaments might create better hedgehogs. In a forecasting tournament, you don’t have the luxury that you have in academia of saying, “That’s not in my field.” When Jeffrey Sachs was arguing for shock therapy and rapid privatization in the early post-Soviet economies of the early ’90s, he had a lot of critics saying he’s going too fast. Later on, he said, “Well, I was right about the economics of rapid privatization, but there needed to be a legal system.” He needed more institutions.
But it was said almost as an afterthought as a way of saying, “I’m trying to look at economics. My economic analysis was sound, but I missed these other factors that are in another field: the law, institutions, corruption, culture. I missed that.”
Yale Law School professor Amy Chua pointed out in her 2003 book “World on Fire” that most of the American economists instructing the Russian government on what policies to follow and most of the billionaire oligarchs who were the big winners from those policies happened to share the same ethnicity (the ethnicity of Professor Chua’s husband, so she felt brave enough to point this out).
In a world in which making better forecasts and policies for the future is a high priority, Chua’s finding should have been cause for self-reflection and public discussion. Maybe there has been a tiny amount of the former, but in the 12 years since Chua’s book came out, there has been virtually none of the latter. For example, when ex-Bank of Israel governor Stanley Fischer was nominated to be the #2 at America’s Federal Reserve, as far as I can recall, I was the only one looking up Fischer’s catastrophic forecasts and advice for Russia in 1991-1998.
So my advice for better forecasting is to give up enforcing sacred cows: stop making people choose between career marginalization and not mentioning massively important factors.