forecasting, superforecasters, executives

What CEOs Can Learn from Superforecasters

The best forecasters in the world are not subject-matter experts. They are people who have learned, deliberately, how to update.

Amateurs beat the CIA on geopolitical forecasting using three habits. Each one translates directly to the calls a CEO is making this quarter.

The Bayeseon Team, 8 min read

In 2011, the U.S. intelligence community launched a multi-year forecasting tournament. Teams of forecasters competed on hundreds of geopolitical questions (will country X hold elections by date Y, will commodity Z cross threshold W), with their predictions scored against the actual outcomes. The CIA's own analysts were one team. A group of amateurs assembled by Phil Tetlock was another. By the end of the tournament, the amateurs had reportedly beaten the professionals by something like thirty percent. The top two percent of them, the "superforecasters" Tetlock named in the book that followed, beat the intelligence community by margins that would have ended careers in any other field.

The amateurs did not have access to classified information. Several of them were retirees. One was a pharmacist. What they had was a small set of habits that, taken together, produced more accurate forecasts than the most expensive intelligence apparatus on earth.

Those habits are not specific to geopolitics. They translate, almost line for line, to the calls a CEO is making this quarter. We want to walk through three of them — and address the objection we hear every time we bring this up in a CEO's office, which is some version of: "but I know more than a pharmacist about my industry."

Habit one: break the big question into testable sub-questions

The superforecasters, faced with a question like "will the Assad regime fall within eighteen months," did not try to answer it directly. They decomposed it. What is the regime's current military position? What is the financial state of its key sponsors? What is the historical base rate of regime collapses under conditions resembling these? Then, having answered each sub-question with its own range, they recomposed the parts into a probability for the whole.

The technique works because most big questions are emotionally loaded and analytically vague at the same time. "Will Assad fall" is the kind of question on which a smart person can hold three contradictory intuitions before lunch. "What is the historical base rate of regime collapse for governments that have lost more than 30% of territory but retained their capital city" is a question with an answer. Five answers like that, combined honestly, produce a forecast more accurate than any holistic gut call.

The CEO version is almost identical. Take the kind of question that lands on a CEO's desk: "will Competitor X enter our segment in the next 18 months?" The bad version of this question gets a single answer, usually based on what the CEO heard at the last conference. The superforecaster version decomposes it.

What is X's hiring pattern in roles that would be required for entry? (A factual question, answerable from LinkedIn in a day.) What is the historical base rate of companies of X's size entering an adjacent segment within 18 months of the hiring pattern we see? (A question that requires looking at a reference class — five or ten analogous moves over the past decade.) What is X's current capital position, and what is the implied opportunity cost of the entry given other public commitments? (Answerable from the 10-K.) What is the CEO of X on record as saying about this segment over the last six quarters? (Answerable from earnings transcripts.)

None of these sub-questions is hard. None of them requires the CEO's special domain knowledge. What they require is the discipline to ask them separately, answer them honestly, and combine them into a number, rather than running the whole question through the CEO's gut and producing the answer the room expects.
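One way to do the combining, sketched in Python. This is our illustration, not a documented superforecaster procedure: it starts from the reference-class base rate and adjusts it with one likelihood ratio per sub-question, which assumes the sub-questions are roughly independent of one another. The function name `recompose` and every number below are hypothetical.

```python
import math


def prob_to_log_odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 to log-odds."""
    return math.log(p / (1.0 - p))


def log_odds_to_prob(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def recompose(base_rate: float, likelihood_ratios: dict[str, float]) -> float:
    """Adjust a base rate by one likelihood ratio per sub-question.

    A ratio above 1 means the evidence is more likely if the event happens;
    a ratio below 1 means it is more likely if the event does not happen.
    Multiplying the ratios assumes the pieces of evidence are independent.
    """
    log_odds = prob_to_log_odds(base_rate)
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)
    return log_odds_to_prob(log_odds)


# Hypothetical inputs for "will Competitor X enter our segment in 18 months?"
base_rate = 0.20  # share of analogous companies that entered (invented)
evidence = {
    "hiring_pattern": 2.0,    # hiring looks like pre-entry staffing
    "capital_position": 1.5,  # balance sheet can fund the move
    "ceo_statements": 0.8,    # public remarks lean slightly against
}

forecast = recompose(base_rate, evidence)
print(f"Recomposed probability of entry: {forecast:.3f}")
```

The value of writing it down this way is that each input can be challenged on its own. If the team disagrees about the answer, they have to say which sub-question they disagree about.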

Habit two: update frequently, in small increments

The second habit is harder to teach because it cuts against almost everything an executive has been trained to do. The superforecasters update their probabilities constantly — sometimes daily — as new information arrives. Not in large dramatic swings, but in small adjustments. A forecast at 60% becomes 62% on a piece of supporting news, 58% on a piece of disconfirming news, 65% on a stronger piece of supporting news a week later.

The executive default is the opposite. Most CEOs we work with form a view, defend that view publicly for months, and then either continue defending it (if events cooperate) or revise it sharply in a single dramatic moment (if they don't). The pattern is partly cognitive — sunk-cost commitments to the original view — and partly social. Updating frequently looks, in a typical boardroom, like flip-flopping. The CEO who told the board in February she was 70% confident in a competitor's entry, and walks into May at 50%, has to spend the meeting explaining the change.

This is why the second habit is much easier to adopt in private than in public. We typically encourage executives to keep a private forecast log — a running document, weekly, where they update probabilities on the half-dozen questions that actually matter to the business. The log is not for the board. It is for the CEO. Six months in, the log is doing two valuable things at once. It is producing more accurate forecasts than any single moment of executive judgment would. And it is giving the CEO a track record — a written, dated, scored history of how often her 70% forecasts actually come in at 70%, which is the only honest way to know whether her 70%s deserve the label.
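A forecast log needs very little machinery. The sketch below is one hypothetical shape for it, assuming the log is append-only and each entry is dated. The names `Question`, `update`, and `largest_step` are ours, and the dates and probabilities are invented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Question:
    """One forecast question with a dated, append-only probability history."""

    text: str
    resolves_by: date
    entries: list[tuple[date, float]] = field(default_factory=list)
    outcome: Optional[bool] = None  # filled in only when the question resolves

    def update(self, when: date, probability: float) -> None:
        """Append a new dated probability; earlier entries are never rewritten."""
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must be between 0 and 1")
        if self.entries and when < self.entries[-1][0]:
            raise ValueError("entries must be added in date order")
        self.entries.append((when, probability))

    def current(self) -> float:
        """Return the most recent probability."""
        return self.entries[-1][1]

    def largest_step(self) -> float:
        """Return the biggest single move between consecutive entries."""
        probs = [p for _, p in self.entries]
        return max((abs(b - a) for a, b in zip(probs, probs[1:])), default=0.0)


# Hypothetical weekly entries for a single question.
q = Question("Competitor X enters our segment", resolves_by=date(2025, 12, 31))
q.update(date(2024, 6, 3), 0.60)
q.update(date(2024, 6, 10), 0.62)  # supporting news
q.update(date(2024, 6, 17), 0.58)  # disconfirming news
q.update(date(2024, 6, 24), 0.65)  # stronger supporting news

print(f"Current: {q.current():.0%}, largest weekly move: {q.largest_step():.0%}")
```

A large value from `largest_step` is a prompt to ask whether the evidence that week really was that strong, or whether several weeks of small updates were skipped.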

The frequent-update discipline also reframes what a "change of mind" feels like. The CEO who has moved from 60% to 55% to 52% to 48% over four weeks does not experience the move below 50% as a reversal. It is the next data point. The CEO who held 70% in public for four months and then revised to 45% experiences — and is experienced as — flipping. The first version is correct. The second version is theater.

Habit three: track accuracy with brutal honesty

The third habit is the one that does most of the actual work, and the one that almost no executive practices. The superforecasters keep score on themselves. Not loosely. Specifically. Every prediction they make is scored against the outcome, using a proper scoring rule (Tetlock's tournaments used Brier scores), and the running average is what tells them whether their 70%s are actually 70%.
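For readers who have not met it: in its common binary form, the Brier score is the mean squared gap between the stated probability and what happened (1 if the event occurred, 0 if it did not). Zero is perfect, and always saying 50% earns 0.25. Brier's original formulation sums over both outcomes and runs from 0 to 2, so scores quoted elsewhere may be on a different scale. A minimal sketch, with invented forecasts:

```python
def brier_score(forecasts: list[float], outcomes: list[bool]) -> float:
    """Mean squared error between stated probabilities and 0/1 outcomes.

    Binary form: 0.0 is a perfect record, 1.0 is confidently wrong every time.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    squared_errors = [(p - float(o)) ** 2 for p, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical resolved forecasts: stated probability, and whether it happened.
stated = [0.7, 0.7, 0.9, 0.6]
happened = [True, False, True, True]

score = brier_score(stated, happened)
print(f"Brier score: {score:.4f}")
```

The score rewards both being right and being appropriately unsure. A confident miss (0.9 on something that did not happen) costs 0.81 on that question, while a hedged miss at 0.6 costs 0.36.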

The discipline reveals, reliably and often uncomfortably, that the forecaster's stated confidence is miscalibrated. Most people who say 90% are right closer to 75% of the time. Most people who say 60% are right closer to 55%. The gap between the stated confidence and the actual frequency is the calibration error, and it is correctable — if you can see it. You can only see it by keeping score.
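Seeing the calibration error takes nothing more than grouping resolved forecasts by the confidence that was stated and counting how often each group came true. The sketch below assumes forecasts are bucketed to the nearest ten percentage points. The function name `calibration_table` and the data are hypothetical, and a real log would need many more forecasts per bucket before the gaps mean much.

```python
from collections import defaultdict


def calibration_table(
    forecasts: list[float], outcomes: list[bool]
) -> dict[float, tuple[float, int]]:
    """Map each stated-confidence bucket to (observed hit rate, sample size)."""
    buckets: dict[float, list[int]] = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        # Round to the nearest 0.1 so 0.88 and 0.92 land in the same bucket.
        buckets[round(p, 1)].append(1 if happened else 0)
    return {
        stated: (sum(hits) / len(hits), len(hits))
        for stated, hits in sorted(buckets.items())
    }


# Invented history: four forecasts at 90% and four at 60%.
stated = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6, 0.6, 0.6]
happened = [True, True, True, False, True, True, False, False]

table = calibration_table(stated, happened)
for confidence, (hit_rate, n) in table.items():
    gap = confidence - hit_rate
    print(f"said {confidence:.0%}, right {hit_rate:.0%} (n={n}), gap {gap:+.0%}")
```

A positive gap means overconfidence in that bucket. The correction is mechanical: if the 90% bucket keeps resolving at 75%, the next forecast that feels like 90% should be written down lower.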

A forecasting culture is just a scorekeeping culture with a slightly fancier name. The fancier name is what people use to avoid the part that actually works.

The CEO version of this is the thing we are most often asked to help install. It is also the thing companies most often resist. To keep score honestly, the CEO has to write down her forecasts in dated form, leave them unedited, and revisit them when the outcome resolves. The first cycle through this is bruising. Executives discover that their 90%s are 70%s and their 70%s are coin flips. Some discover that one of their most-trusted heuristics has a near-zero hit rate. Almost everyone discovers that the forecasts they make in board meetings are systematically more confident than the ones they make in private. The gap between board-meeting confidence and private confidence is, in our experience, the largest single calibration error a CEO carries.

The good news is that the gap closes fast once you start measuring it. Tetlock's superforecasters did not begin as superforecasters. They became superforecasters by keeping score and noticing the patterns of their own misjudgment. The CEO who runs the same loop for a year — predictions, dated, scored, reviewed — is, by year-end, a meaningfully better forecaster than the version of herself who started.

The obvious objection

Whenever we walk through this with a CEO, the objection arrives on schedule. "The superforecasters were generalists. I am a domain expert. I have inside information about my industry, my customers, my competitors. The forecasting habits help amateurs catch up. They don't help me."

There is a piece of this that is true. Domain knowledge matters. The CEO does see things a generalist forecaster doesn't. But the objection misses the larger point in two ways.

First, the gap between a domain expert with superforecaster habits and a domain expert without them is larger, not smaller, than the gap between two generalists. The habits are multiplicative with knowledge, not substitutive for it. A CEO who knows her industry deeply and decomposes the question into sub-parts, updates frequently, and keeps score is operating with both engines. A CEO with only the domain knowledge is operating with one. The superforecaster results, in fact, suggest that the second engine is the larger of the two — the amateurs beat the professionals because the habits added more than the classified information did.

Second, the CEO's domain knowledge has rarely been tested against the questions she most needs to forecast. The questions that matter for the company over the next eighteen months (will the competitor move, will the macro turn, will the new product land, will the key hire ramp) are not questions where the CEO has thousands of reps. They are mostly questions where she has a handful of reps and no scorecard. The same problem we wrote about in the piece on intuition applies here. Domain expertise is local. Forecasting habits are general. The decisions that matter most for the company are usually outside the perimeter where the local expertise has been tested.

What this looks like in practice

We typically install the three habits in a six-month engagement structured around the CEO's existing calendar. Decomposition is taught against the half-dozen big questions actually on the desk that quarter — the questions get broken down, the sub-questions get answered with the team's help, the forecasts get recomposed honestly. The updating discipline gets installed as a weekly forecast log. The scoring discipline gets installed at six months, when the first cohort of predictions resolves and the CEO sees, in writing, how often her stated confidence matched reality.

Most CEOs find the first cycle through this uncomfortable in the way the first quarter of any honest measurement is uncomfortable. By the second cycle the discomfort fades. By the third, the CEO has begun reading her board's forecasts the way the superforecasters read their tournament rivals' — as numbers to be tested rather than statements to be agreed with. The board meetings change. The annual plan starts missing by less. The CEO has joined, quietly, the small club of executives who are better forecasters at the end of the year than they were at the beginning.

The pharmacist did not have inside information. She had habits. The habits are available. The question is whether the CEO is willing to keep score. Most are not. The ones who are end up running a different kind of company.


The Bayeseon Team

Writes about decision quality at Bayeseon. Reach the team at hello@bayeseon.com.
