Where Data-Driven Strategy Goes Wrong

A senior executive once explained to us, with the controlled patience reserved for difficult clients, why his team had passed on entering a category that subsequently became the most profitable segment of his industry. We didn't have the data, he said. We couldn't underwrite it. He said this as if it settled the matter. To him it did. To us it was the matter — the entire matter — and the company had lost a decade of compounding because nobody in the room recognized it as such.

The phrase "data-driven strategy" has become, in most companies we work with, a sort of professional virtue signal. It means: we are serious; we don't go on hunches; we underwrite our decisions. The intent is good. The pathology hidden inside it is specific and expensive, and it is one of the most common failures of strategic reasoning we encounter.

The pathology is this. The data, when it exists, exists for the parts of the decision that are already cheap to be right about. The hard parts — the parts where the strategy actually lives — are precisely the parts where the data does not exist, will not exist before the commitment has to be made, and would not be diagnostic even if it did.

What the data is good at

This is not an argument against data. Data-driven analysis belongs wherever the question can actually be measured, and the discipline of applying it there is genuinely valuable. Pricing a known SKU into a known channel. Optimizing a fulfillment route. Sizing a marketing program with measurable attribution. Reducing churn in a cohort whose behavior is observable. These are the cheap-to-be-right-about parts of running a business, and the firms that do them best build durable operational advantage by treating them with the rigor data deserves.

Note the shape of these problems. Each one has the property that the system you are trying to predict is, roughly, the system you have already been running. The customer who churned last month is informative about the customer who will churn next month, because the underlying generative process is stable. The channel that responded to last quarter's pricing is informative about how it will respond to next quarter's. The route that the trucks took last week is the route they will take next week, plus or minus traffic.

In these problems, data is not just useful — it is the substrate of good decision-making. We have spent significant parts of our careers building systems that do this work and we are not, by temperament, anti-data. The argument is not against the use of data. The argument is against the substitution of data for strategy when the data is not, in fact, available for the strategic question.

What the data is not good at

Strategy lives, almost by definition, in the parts of the business that are not the running operations. The new geography. The adjacent category. The repositioning. The acquisition of a company in a different stage. The pricing of a product into a customer base that does not yet exist. The build-versus-buy decision on a capability that, by the time it matters, will have changed shape.

For these decisions, the data you have is not informative about the system you are trying to predict, because the system has not yet been run. The CAC of a customer in a segment your company has only just entered is, six months in, fiction. We have watched teams torture six months of trailing CAC data from a brand-new segment, build a sophisticated cohort model on it, and present the result to a board as if it were a forecast. It was not a forecast. It was a hallucination expressed in basis points.

The LTV of a customer in that same segment is worse than fiction; it is anti-information. The earliest customers in a new category are systematically unrepresentative: they self-selected into a thing that did not yet exist, which means they are among the strongest adopters in the population. The LTV of the first cohort, projected onto the population, will overstate the population's value by some unknown but substantial multiplier. The team that built the model knows this. The deck the model produces does not.
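The selection effect can be made concrete with a small simulation. The sketch below is illustrative only: the latent "fit" score, the LTV formula, and the five percent early-adopter share are all assumptions invented for the example, not figures from any engagement. It shows the mechanism, not the size of the multiplier, which in a real segment stays unknown.

```python
import random
import statistics


def simulate_first_cohort_bias(population_size=10_000, early_share=0.05, seed=0):
    """Compare mean LTV of self-selected early adopters with the population mean.

    Assumptions (hypothetical): every customer has a latent 'fit' score in
    [0, 1], LTV rises with fit, and the earliest adopters are simply the
    customers with the highest fit.
    """
    rng = random.Random(seed)

    customers = []
    for _ in range(population_size):
        fit = rng.random()
        # Illustrative LTV: a base of 100, up to 900 more for fit, plus noise.
        ltv = 100 + 900 * fit + rng.gauss(0, 50)
        customers.append((fit, ltv))

    # Early adopters self-select: take the top slice of the population by fit.
    customers.sort(key=lambda c: c[0], reverse=True)
    n_early = max(1, int(population_size * early_share))

    early_mean = statistics.fmean([ltv for _, ltv in customers[:n_early]])
    population_mean = statistics.fmean([ltv for _, ltv in customers])
    return early_mean, population_mean, early_mean / population_mean


early, population, multiplier = simulate_first_cohort_bias()
print(f"first cohort mean LTV: {early:.0f}")
print(f"population mean LTV:   {population:.0f}")
print(f"overstatement factor:  {multiplier:.2f}")
```

Under these assumptions the first cohort's mean LTV sits well above the population mean. The point is that the team projecting the first cohort has no way to read the factor off its own data, because the rest of the population has not yet been observed.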

The competitive response to a strategic move is similarly un-modelable from data. Your competitors have never responded to this move because you have never made it. The historical record contains responses to other moves by other companies in other moments. The relevance of those responses to yours is an act of judgment, not measurement.

This is the un-measurable middle, and it is where strategy lives. It is also where most companies, having committed to the rhetoric of being data-driven, simply stop. The decision either gets deferred ("we don't have the data yet") or gets dressed up in numbers that look like underwriting and aren't.

Naming the data you don't have is a more strategic act than mining the data you do. The unmeasured assumption is the one that usually decides whether the bet works.

The cost of "we don't have the data"

The sentence "we don't have the data" is one of the most consequential sentences spoken in modern corporate strategy meetings, and it is almost never recognized as a strategic act. It sounds like a deferral of judgment. It is, in fact, a judgment — the judgment that the absence of data is sufficient reason not to act.

Consider the implicit logic. The team has assembled in a room to decide whether to enter a new category. The relevant data — historical CAC, LTV, market share evolution — does not exist for this team in this category. The team observes this, names it, and recommends not entering. The CEO accepts. The board ratifies. The category is left to a competitor who, in due course, makes the entry, runs the experiment, generates the data, and within four years has built a position that the original company can no longer cheaply contest.

The original team will, in the post-mortem, say they made the right call given what they knew. They will be wrong. The call was a strategic commitment, not an analytical one. The company chose to wait for data that could only be generated by acting. By the time the data existed, the act was no longer available.

The reverse case is just as common. A team has detailed data on a strategic move because the move is, in fact, similar to a thing the company has done before. The team underwrites the move heavily, presents the numbers with confidence, gets approval. The move is approved precisely because the data is available — but the data is available because the move is easy, and easy moves do not produce strategic differentiation. The company spends three years and meaningful capital on a category extension that was always going to work and was always going to be margin-dilutive. It declined to make the harder, undermeasured bet next to it. The harder bet went to someone else.

Both failures share a structure. The presence or absence of data was treated as a proxy for the quality of the strategic argument. It is not. Often it is the opposite.

What to do when the data isn't there

A few practical postures, drawn from how we run engagements where the central question is exactly the one the data cannot answer.

First, write down what the data would need to look like for the decision to be obvious in either direction. This is unfamiliar to most teams, but it is the most useful exercise we know of for clarifying what the strategic question actually is. If the answer is "we would need eighteen months of cohort data from the new segment," then the decision is not actually about whether to enter — it is about whether to design an entry that produces, in six months, half the data we'd need to commit fully. The framing converts an unmakable decision into a designable experiment.

Second, name the one or two assumptions that the case actually hangs on, separately from the analysis. Most strategic decisions, in our experience, hinge on between one and three assumptions — usually about a customer behavior, a competitor response, or a unit economic that has not yet been observed at scale. The deck around the case may be a hundred pages, but the load-bearing claims are a handful of sentences. Identify them. Stress them. Ask, explicitly, what evidence between now and the decision date would update each. The deck's value is roughly proportional to how well it has done this and almost not at all to its overall thickness.
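One way to find the load-bearing assumptions is a one-at-a-time sensitivity pass: move each assumption across its plausible range, hold the rest at base, and rank by how far the case swings. The sketch below is a minimal illustration. The value formula, the assumption names, and every number in it are hypothetical.

```python
def case_value(a):
    """Toy three-year value of an entry case. The formula is illustrative only."""
    customers = a["addressable_customers"] * a["share_won"]
    return customers * a["margin_per_customer"] * 3 - a["entry_cost"]


def rank_assumptions(base, ranges):
    """Sort assumptions by how far the case swings across each one's range."""
    swings = []
    for name, (low, high) in ranges.items():
        low_value = case_value({**base, name: low})
        high_value = case_value({**base, name: high})
        swings.append((name, abs(high_value - low_value)))
    return sorted(swings, key=lambda s: s[1], reverse=True)


# Hypothetical base case and plausible ranges for each assumption.
base = {
    "addressable_customers": 200_000,
    "share_won": 0.05,
    "margin_per_customer": 120.0,
    "entry_cost": 2_000_000,
}
ranges = {
    "addressable_customers": (150_000, 250_000),
    "share_won": (0.01, 0.10),
    "margin_per_customer": (100.0, 140.0),
    "entry_cost": (1_500_000, 3_000_000),
}

ranking = rank_assumptions(base, ranges)
for name, swing in ranking:
    print(f"{name:24s} swing = {swing:,.0f}")
```

In this toy case the share the company wins dominates everything else, which is typical of the pattern: an unobserved customer behavior carries the case, and the rest of the deck is detail. A one-at-a-time pass ignores interactions between assumptions, so treat the ranking as a prompt for discussion rather than a result.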

Third, use the outside view aggressively where the inside view has no data. Kahneman and Lovallo's reference-class forecasting framework is the most underused tool in strategy. When the inside-view data does not exist, the outside-view data almost always does, at the level of base rates rather than the level of the specific case. Base rates take forms like these: eighty percent of category entries of this scale by companies of this size hit half their case within three years; sixty percent of acquisitions of this size lose more in integration friction than they gain in the first eighteen months. These are not the data the team's models want, but they are the data the team's models can be checked against. A strategic case whose inside-view confidence diverges sharply from its outside-view base rate is a case the team needs to defend more carefully, and frequently cannot.
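The check against the outside view can be as simple as the sketch below. It assumes the team has already chosen a reference class and recorded, for each comparable past case, whether it met the success criterion. The function name, the tolerance of 0.15, and the example figures are hypothetical, and choosing the reference class remains a judgment the code does not make.

```python
def outside_view_check(inside_probability, reference_outcomes, tolerance=0.15):
    """Compare an inside-view success probability with a reference-class base rate.

    reference_outcomes: one boolean per comparable past case, True if that
    case met the success criterion. The tolerance is an arbitrary threshold
    chosen for illustration.
    """
    if not reference_outcomes:
        raise ValueError("reference class is empty")

    base_rate = sum(reference_outcomes) / len(reference_outcomes)
    gap = inside_probability - base_rate
    return {
        "base_rate": base_rate,
        "gap": gap,
        "needs_defense": abs(gap) > tolerance,
    }


# Hypothetical example: 6 of 20 comparable entries met their case,
# while the team's own model implies an 80 percent chance of success.
reference_class = [True] * 6 + [False] * 14
result = outside_view_check(inside_probability=0.80, reference_outcomes=reference_class)
print(result)
```

A flag here does not mean the case is wrong. It means the team owes the room an explicit argument for why this case differs from its reference class.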

Fourth, be honest, on the record, about the difference between underwriting and judgment. Underwriting is what you do when you have the data. Judgment is what you do when you don't. Both are legitimate. Confusing them is not. A strategy document that calls itself underwritten when it is, in fact, judgment-led has misrepresented its own confidence to the room reading it. We discuss this at greater length in our companion essay on framing strategy as bets with stated odds.

The thing nobody says

There is a deeper reason data-driven rhetoric has captured so much corporate strategy, and it is worth naming. The rhetoric is defensible. An executive who passes on a category for lack of data has a story to tell when the category later proves valuable. An executive who passes on a category on judgment, having reviewed the available evidence and concluded it did not warrant the bet, has to defend the judgment. The first story is institutionally easier. The second is more honest.

The companies that compound strategically are, almost without exception, the ones whose leadership has accepted that the hardest decisions cannot be defensibly underwritten, and that taking them anyway — with stated reasoning, calibrated confidence, and explicit kill criteria — is the job. The companies that decline this responsibility, in our observation, eventually find themselves in categories that are already commoditized, because those were the only categories in which the data was conclusive in time.

If your last twelve months of strategic meetings have produced more "we don't have the data" deferrals than you are comfortable with, that pattern is itself information. It is often the highest-leverage thing a strategy function can surface, and the work we do most often begins with making it visible.

The Bayeseon Team

Writes about decision quality at Bayeseon. Reach the team at hello@bayeseon.com.
