Run a Decision Audit on Your Last Twelve Months

Pick any reasonably well-run company and you will find two artifacts in the operating rhythm. The first is the project retrospective: what shipped, what slipped, what we'd do differently. The second is the incident post-mortem: the system went down, here is the chain of failures, here is the remediation. Both are good things. Both have a literature, a template, an owner, and a place on the calendar.

Now ask the same company to show you its audit of how it decided last year. Not what it built. Not what broke. How its largest twenty calls were framed, what alternatives were weighed, what was assumed, what was bet on, and — crucially — whether the reasoning was any good independent of how things turned out.

In a decade of working with leadership teams, we have found this artifact almost nowhere. The retro lives in engineering. The post-mortem lives in operations. The decision audit, which is the version that compounds, lives nowhere. It is the highest-leverage hour a leadership team can spend in a calendar year, and almost nobody spends it.

Why the audit is different from the things you already do

A project retro asks: did we execute? A post-mortem asks: did the system hold? Both work backwards from an outcome. Both are useful. Neither tells you whether the decision that kicked off the project was a good decision.

This is the gap. Execution can be excellent on top of a bad call. Systems can hold through a project that should never have been funded. The retro will not catch it. The post-mortem will not catch it. The board, looking at the outcome a year later, will not catch it — because the outcome, by then, has done all the talking. If it worked, the decision must have been good. If it didn't, the decision must have been bad. The reasoning runs entirely on the back end. We have written about this confusion at length, and it is the single most expensive cognitive shortcut a leadership team can take.

The decision audit fixes the gap by doing something specific and slightly uncomfortable: it scores each major decision against what was knowable at the time it was made, not against what is known now. Howard Raiffa made the foundational case for this discipline in the 1960s. Annie Duke has restated it for a generation that prefers narrative to math. Phil Tetlock's decades of forecasting research show, with about as much rigor as a social science question admits, that calibrated, process-based scoring of decisions is one of the few things that reliably trains judgment. The audit is the operational form of that finding.

What it inoculates against

Two biases do most of the damage in leadership review meetings, and the audit is the most reliable defense we know against both.

The first is hindsight bias: the felt sense, after the fact, that the outcome was more predictable than it was. The team that lived through the project remembers the path they took. They do not remember, with any fidelity, the other paths that were live at the time. So the path they took feels inevitable. So the decision feels obvious. So no learning occurs.

The second is outcome bias: the substitution of "did it work?" for "was it a good call?" These are different questions. A good decision is a defensible choice between options given what was knowable; a good outcome is what the world happens to deliver. The first is largely in your control. The second is partly luck. Confuse the two and you reward luck and punish skill, in approximately equal measure, for about a decade — at which point the talent base of the firm is so badly mis-selected that the next downturn becomes uncomfortable.

The audit forces the question back to the decision. Given what we knew, given what we could have known with reasonable effort, was the reasoning sound? That is the only question whose answer can be used to improve the next decision.

The format

A decision audit, run well, is two days of work and twelve hours of meetings. The format is unspectacular, which is part of why teams skip it.

Step one: pull the list. The leadership team, working from board materials, investment memos, hiring approvals, and any commitment over a pre-agreed threshold (we usually peg the threshold to one or two percent of operating profit), assembles every meaningful call from the last twelve months. The target is fifteen to twenty decisions. Fewer and the audit reads as anecdote. More and the team will not finish.
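As a rough illustration of the threshold filter in step one, the sketch below assumes a hypothetical list of commitments and pegs the threshold at 1.5 percent of operating profit. The names (`Commitment`, `pull_list`) and all figures are invented for the example, not taken from any real audit.

```python
from dataclasses import dataclass


@dataclass
class Commitment:
    name: str
    amount: float  # size of the commitment, in the same currency as operating profit


def pull_list(commitments, operating_profit, threshold_pct=0.015):
    """Return commitments at or above the threshold, largest first."""
    cutoff = operating_profit * threshold_pct
    selected = [c for c in commitments if c.amount >= cutoff]
    return sorted(selected, key=lambda c: c.amount, reverse=True)


# Hypothetical figures, for illustration only.
commitments = [
    Commitment("CRM migration", 900_000),
    Commitment("Team offsite", 60_000),
    Commitment("New regional office", 2_400_000),
]
audit_list = pull_list(commitments, operating_profit=50_000_000)
print([c.name for c in audit_list])
```

With these made-up numbers the cutoff is 750,000, so the offsite drops out and the two larger commitments go onto the audit list. A real list would also include calls that carry no price tag, such as hiring approvals, which a pure amount filter would miss.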

Step two: reconstruct the knowledge state. For each decision, the team writes a one-page summary of what was on the table at commitment time. What was the question? What were the options? What did the deck say? What did the deck not say that, in retrospect, the team did know but did not surface? This step is harder than it sounds. Memory is reconstructive. The team that lived through the year will, without discipline, smuggle in things they only learned later. The fix is to work from contemporaneous documents — the deck, the email thread, the board minutes — not from the recollections of the people in the room.

Step three: score each decision on five dimensions, independent of outcome.

Framing: was the question well-posed? Was the right decision being made, at the right level, at the right time?
Alternatives considered: were there at least two genuine options on the table, or did the team converge prematurely on the recommendation and then build the case?
Assumption stress-testing: were the one or two load-bearing assumptions identified, and were they actually stressed, or were they nodded at and moved past?
Explicitness of priors: did the team write down, in numbers or in words, how confident they were in the part that mattered? Did anyone state a probability, a range, a base rate?
Post-commitment revisitation: did the team write kill criteria? Did anyone schedule the check-in where the decision could be unwound? Did the unwind, when warranted, actually happen?

A decision can score well on outcome and badly on process. A leadership team that cannot distinguish the two is a team that learns from noise. The audit is the only routine we know that forces the distinction onto the page.

Each dimension gets a score from one to five. The scores are added, giving a total between five and twenty-five. The total is, by design, assigned without reference to whether the decision worked. This is the point. A team that audits honestly will find good outcomes that scored seven out of twenty-five, and bad outcomes that scored twenty-two. Both findings are gold. Both are unobtainable from any other routine in the company's operating rhythm.
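The scoring arithmetic is simple enough to sketch. The following is one possible encoding of the five dimensions, assuming a plain one-to-five integer per dimension; the class and field names are our own shorthand for illustration. Note that the outcome is deliberately not a field.

```python
from dataclasses import dataclass, fields


@dataclass
class ProcessScore:
    framing: int
    alternatives: int
    assumption_stress: int
    explicit_priors: int
    revisitation: int

    def __post_init__(self):
        # Every dimension must be scored on the one-to-five scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 1 <= value <= 5:
                raise ValueError(f"{f.name} must be between 1 and 5, got {value}")

    def total(self) -> int:
        """Sum of the five dimensions, so between 5 and 25."""
        return sum(getattr(self, f.name) for f in fields(self))


# Hypothetical scores for one decision.
score = ProcessScore(
    framing=4,
    alternatives=1,
    assumption_stress=2,
    explicit_priors=1,
    revisitation=3,
)
print(score.total())  # 11
```

Keeping the outcome out of the record is the design choice that matters here: the scorer cannot let "it worked" leak into the process total if there is nowhere to write it down.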

The five failure modes that surface most often

We have run this audit, in one form or another, for dozens of leadership teams. The findings are surprisingly consistent. Five categories of decision-failure account for most of what shows up in the bottom half of the scoring matrix.

The first is the missing alternative. The deck contains one option dressed up as a decision. The team did not generate, evaluate, or even seriously discuss a meaningfully different path. The reasoning is recommendation-shaped, not choice-shaped. We find this in roughly four out of five strategic decisions we audit. It is, by some distance, the most common failure mode in business decision-making, and it is invisible to retros because the executed plan was the only plan ever considered.

The second is the un-stressed load-bearing assumption. Every major case has one or two assumptions on which the entire economic argument depends. Usually it is a growth rate, a retention number, a synergy estimate, a build cost, a regulatory outcome, or a key hire's productivity curve. A well-framed decision identifies that assumption explicitly and stresses it: what happens at half? What happens at zero? What does the prior literature say about this kind of estimate, and how far off do such estimates typically come in? In our audits, the load-bearing assumption is named explicitly in fewer than one in five decision documents. It is stressed in fewer than one in ten.
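To make "what happens at half? What happens at zero?" concrete, here is a minimal sketch built on a deliberately toy business case in which the growth rate is the load-bearing assumption. The model, the function names, and every number are hypothetical; a real case would substitute its own economics.

```python
def case_value(growth_rate, base_revenue=10_000_000, margin=0.30,
               investment=4_000_000, years=3):
    """Toy case: cumulative incremental margin over `years`, minus the upfront investment."""
    incremental = 0.0
    revenue = base_revenue
    for _ in range(years):
        revenue *= 1 + growth_rate
        incremental += (revenue - base_revenue) * margin
    return incremental - investment


def stress(assumed_rate, multipliers=(1.0, 0.5, 0.0)):
    """Re-run the case with the assumption at full value, at half, and at zero."""
    return {m: round(case_value(assumed_rate * m)) for m in multipliers}


# The deck assumes 40 percent annual growth (an invented figure).
for multiplier, value in stress(0.40).items():
    print(f"assumption x {multiplier}: case value {value:,}")
```

In this invented case the bet is worth roughly 5.3 million at the assumed growth rate, barely clears break-even at half of it, and loses the full 4 million at zero. That is the kind of result the stress test exists to put on the page before commitment.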

The third is the un-written kill criterion. The team committed to the bet. The team did not commit to a date, a metric, or a threshold at which the bet would be reconsidered. The result is the slow-motion failure: by the time the project is clearly going wrong, the constituency around it is large enough that killing it requires a political act, not an analytical one. The kill criterion is the single cheapest piece of decision insurance available. Almost nobody writes one.
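A kill criterion can be as small as three fields written down at commitment time. The sketch below is one hypothetical shape for it, assuming a single metric with a floor and a scheduled review date; the names and values are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class KillCriterion:
    metric: str
    minimum: float     # value below which the bet is reconsidered
    review_date: date  # date on which the check is scheduled

    def should_reconsider(self, observed: float, today: date) -> bool:
        """True once the review date has arrived and the metric is below the minimum."""
        return today >= self.review_date and observed < self.minimum


# Hypothetical criterion, written at commitment time.
criterion = KillCriterion(
    metric="monthly active accounts",
    minimum=5_000,
    review_date=date(2025, 6, 30),
)
print(criterion.should_reconsider(observed=3_200, today=date(2025, 7, 1)))
```

The value is less in the check itself than in the fact that the threshold and the date were fixed before the project acquired a constituency.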

The fourth is premature framing lock-in. The decision was made before it was a decision — at an offsite, in a hallway, in the CEO's mind during a flight — and the formal process was reverse-engineered to produce the answer that had already been chosen. Premature framing lock-in is hard to detect from outside the room. It is easy to detect in audit, because the alternatives section of the deck is thin and the assumptions section is fragile. The team didn't generate options because there was no longer a choice to make.

The fifth is what we have come to call decision laundering. A decision is moved into a committee, a working group, a sub-board, a steering function — and emerges with the imprimatur of "process" without the actual deliberation having happened. Everyone in the chain assumed someone else had stressed the assumption. Nobody had. The committee structure functioned as a way to distribute responsibility for a decision that was, in substance, made by one person who never had to defend it. We see this pattern especially in matrixed organizations and in regulated industries, where the appearance of process is itself a deliverable.

What to do with the findings

The audit is not the point. The audit is the diagnostic. What matters is what the leadership team does with what surfaces.

Three moves account for most of the value.

The first is to publish the rubric. The five dimensions above (or whatever rubric the team prefers) become the standard for how new decisions are framed going forward. Every memo above the threshold now answers the five questions on its face. This sounds bureaucratic. It is not. It adds about thirty minutes of work to the average investment memo, and it removes about thirty hours of post-hoc litigation from the average failed bet.

The second is to separate decision-quality scoring from outcome scoring in the people review. Promotion, compensation, and bench planning should weight both, and weight them separately. A regional GM who runs a well-framed bet that gets unlucky should not be penalized, just as a GM who runs a sloppy bet that gets lucky should not be rewarded. Most companies do exactly the opposite, and the consequences compound for a decade.

The third is to make the audit annual. The first one is hard. The second one is easier. By the third one, the leadership team has internalized the rubric to the point that the audit's findings get smaller every year, because the decisions are getting cleaner upstream. This is what compounding looks like in organizational learning. It is the only mechanism we know that produces it.

A checklist for your first audit

If your team is going to attempt this without help, run it as follows.

Pick a threshold (one to two percent of operating profit; or any commitment requiring board notification; pick something and write it down).
Pull every decision above the threshold from the last twelve months. Aim for fifteen to twenty.
For each, retrieve the contemporaneous decision document — deck, memo, board paper, email — and write a one-page reconstruction of the knowledge state at the time.
Score each on the five dimensions (framing; alternatives considered; assumption stress-testing; explicitness of priors; post-commitment revisitation). Use a one-to-five scale. Two scorers per decision, independently, then reconciled.
Plot the score against the outcome. Pay particular attention to the off-diagonals: low score, good outcome (lucky); high score, bad outcome (well-framed; bad draw). These are the most valuable findings.
Identify the dominant failure modes. Most teams will find two or three from our list above. The composition varies; the existence of a pattern does not.
Write the rubric. Publish it. Embed it in the next decision cycle.
Schedule next year's audit before you finish this year's. The thing that does not get scheduled does not get done.
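The reconciliation, off-diagonal, and failure-mode steps of the checklist can be sketched in a few lines. This is an illustration under stated assumptions, not a prescribed method: the midpoint cutoff of 15, the five-point gap that triggers discussion, the averaging rule, and all the sample decisions are hypothetical choices made for the example.

```python
from collections import Counter


def reconcile(total_a, total_b, max_gap=5):
    """Average two independent totals and flag large gaps for discussion."""
    needs_discussion = abs(total_a - total_b) > max_gap
    return (total_a + total_b) / 2, needs_discussion


def quadrant(process_total, good_outcome, cutoff=15):
    """Place a decision on the process-versus-outcome grid."""
    sound = process_total >= cutoff
    if sound and good_outcome:
        return "sound process, good outcome"
    if sound:
        return "well-framed, bad draw"
    if good_outcome:
        return "lucky"
    return "poor process, bad outcome"


# (name, scorer A total, scorer B total, good outcome?, failure modes) - invented data.
decisions = [
    ("Acquisition A", 8, 10, True, ["missing alternative", "no kill criterion"]),
    ("Plant expansion", 21, 23, False, []),
    ("Pricing change", 18, 19, True, []),
    ("Platform rebuild", 9, 16, False, ["missing alternative", "unstressed assumption"]),
]

failure_modes = Counter()
for name, a, b, good, modes in decisions:
    total, discuss = reconcile(a, b)
    failure_modes.update(modes)
    flag = " (scorers disagree, discuss)" if discuss else ""
    print(f"{name}: {total} -> {quadrant(total, good)}{flag}")

print(failure_modes.most_common(1))
```

In this made-up sample, "Acquisition A" lands in the lucky quadrant and "Plant expansion" in the well-framed, bad-draw quadrant, which are the two off-diagonal findings the checklist asks you to look at first. Averaging is only one way to reconcile two scorers; a team may prefer to have them talk through every disagreement instead.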

The audit is not exotic and it is not expensive. The reason almost no one does it is that it is uncomfortable in a particular way: it forces the team to admit, in writing, that some of last year's wins were lucky and some of last year's losses were well-played. Most teams cannot bring themselves to put that on paper. The teams that can — in our experience, a small minority — are the ones whose forecasts get tighter, whose framing gets sharper, and whose bench gets deeper, for as long as the discipline holds.

We run this audit with leadership teams before the next planning cycle. The first one is the hardest. It is also the one that pays for the next ten.

The Bayeseon Team

Writes about decision quality at Bayeseon. Reach the team at hello@bayeseon.com.

