On Sequential Decision Analytics.

November 10, 2025

supply chain science

Joannes Vermorel

I admire Warren Powell’s ambition to unify the sprawling family of “decisions over time.” His Sequential Decision Analytics (SDA) sets out a generous tent: from control to reinforcement learning, from transportation to energy and e‑commerce, the message is that sequential decisions share a common structure and should be solved by optimizing over policies. Within that structure sit four broad ways of making decisions—myopic or cost‑function approximations, value‑function approximations, direct lookahead, and policy function approximations—each a pathway through the intractability of dynamic problems. It is a powerful framing, and it has influenced many fields at once.¹

My own work proceeds from a different starting point. In Introduction to Supply Chain I argue that supply chain is not a branch of mathematics or of software per se; it is an applied branch of economics. The day‑to‑day craft is about turning optionality under variability into money, with profit—properly risk‑adjusted—as the yardstick. This stance is not a slogan. It governs how we model, how we measure, and ultimately how we automate. If the objective is coins on a ledger, then every concept that matters—scarcity, trade‑offs, opportunity cost—must be priced before it is optimized. See Chapter 3 (“Epistemology”) and Chapter 4 (“Economics”).²

Where SDA and I meet

SDA is right to treat the future as a sequence of observations and choices where agency is preserved through policies that react to what is known at each step. Supply chains live in exactly this world. But anyone who has tried to run an enterprise at scale knows that data arrive as the by‑product of systems of record, that incentives are sometimes adversarial to truth, and that evidence is expensive to obtain. This is why the book spends time on how knowledge is produced inside firms, and on the distortions that creep in—what I call “epistemic corruption.” A framework that excels in the lab must still survive contact with the incentives and semantics of the shop floor. See Chapter 3 (“Epistemology,” esp. 3.6).²

SDA’s taxonomy of policy classes is also a useful checklist when we must approximate what cannot be solved exactly. In that sense, my work is sympathetic: supply chain engines often mix simple myopic steps with short lookahead where it pays. SDA’s vocabulary helps compare such strategies and reminds us that no single class dominates across problems.¹

Where we part ways

The divergence begins with the first move. SDA starts from a model—state, decision, exogenous information, transition, objective—and then searches over policies. I start earlier, with pricing. Before I accept any “state,” I want the costs and benefits that make a decision economically legitimate to be visible and auditable. Put differently, I prefer to price the consequences until many sequential intricacies collapse into sound, one‑step choices.

This is most obvious when we “flatten” a sequential problem by inserting the right prices. Consider dispatching scarce stock from a distribution center. If we attach a visible hold price to the DC inventory—a shadow price that reflects the option to serve a better request tomorrow—then a store receives a unit only when its marginal return genuinely beats that hold price. We are not ignoring the future; we are buying it out with a number that reflects the cost of capital, the value of information, and the opportunity to wait. See Chapter 8 (“Decisions,” §8.5).²
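A minimal sketch of the dispatch rule, under stated assumptions: the function name `dispatch_with_hold_price`, the per-store lists of marginal returns, and all numbers are hypothetical and only illustrate the idea. Each store's marginal returns are assumed to be non-increasing, so a greedy unit-by-unit allocation is enough.

```python
import heapq


def dispatch_with_hold_price(dc_stock, hold_price, marginal_returns):
    """Ship units one at a time while the best request beats the hold price.

    marginal_returns maps a store to the expected returns of its 1st, 2nd, ...
    extra unit (assumed non-increasing). All values are in the same currency
    as hold_price. Returns (units shipped per store, units kept at the DC).
    """
    # Max-heap (via negated returns) on each store's next marginal unit.
    heap = [(-r[0], store, 0) for store, r in marginal_returns.items() if r]
    heapq.heapify(heap)
    shipped = {store: 0 for store in marginal_returns}

    while dc_stock > 0 and heap:
        neg_return, store, idx = heapq.heappop(heap)
        if -neg_return <= hold_price:
            break  # the best remaining request does not beat holding the unit
        shipped[store] += 1
        dc_stock -= 1
        nxt = idx + 1
        if nxt < len(marginal_returns[store]):
            heapq.heappush(heap, (-marginal_returns[store][nxt], store, nxt))

    return shipped, dc_stock


# Illustrative numbers only: 5 units at the DC, hold price of 4 per unit.
shipped, kept = dispatch_with_hold_price(
    dc_stock=5,
    hold_price=4.0,
    marginal_returns={"A": [9.0, 6.0, 2.0], "B": [7.0, 3.0]},
)
```

With these numbers, three units ship and two stay at the distribution center, because no remaining request is worth more than the option to wait. The sequential consideration lives entirely in the value chosen for `hold_price`.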

Two instruments make this flattening safe enough to run every day. The first is a window of responsibility: a bounded horizon over which today’s decision is held to account, with later decisions inheriting the remainder. We do not need to script the entire season to judge whether ordering a container (or shipping to a store) was wise; we measure the coin‑denominated consequences in a window and move on. The second is the economics of waiting: doing nothing yet is a legitimate option, with a cut‑off rule that acts only when the expected, risk‑adjusted return of the best admissible move clears the firm’s shadow cost of capital plus the option value of delay. Together, these devices preserve agency while avoiding the fragility of deep lookahead when data and semantics are imperfect. See Chapter 8 (“Decisions,” §8.5).²
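The cut-off rule can be sketched in a few lines. This is an illustration under assumptions, not a definitive implementation: the function name `decide`, the move labels, and the figures are hypothetical, and every quantity is assumed to be expressed as a rate of return over the same window of responsibility.

```python
def decide(candidates, cost_of_capital, option_value_of_delay):
    """Return the best admissible move, or "wait" if nothing clears the cut-off.

    candidates maps a move label to its expected, risk-adjusted return,
    measured over the window of responsibility. The cut-off is the shadow
    cost of capital plus the option value of delay (same units).
    """
    cutoff = cost_of_capital + option_value_of_delay
    if not candidates:
        return "wait"
    best_move = max(candidates, key=candidates.get)
    # Act only when the best move strictly clears the cut-off.
    return best_move if candidates[best_move] > cutoff else "wait"


moves = {"order_container": 0.12, "ship_to_store": 0.08}

# Cut-off of 0.09: ordering the container clears it.
act_now = decide(moves, cost_of_capital=0.06, option_value_of_delay=0.03)

# A higher option value of delay (0.07) lifts the cut-off to 0.13: wait.
hold_off = decide(moves, cost_of_capital=0.06, option_value_of_delay=0.07)
```

Treating "wait" as the default return value makes inaction an explicit, auditable outcome of the rule and not an omission.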

Pricing also lets us internalize long‑horizon side effects without modeling every contingency. A retailer that values inventory solely on observed sales will under‑invest in service; the remedy is a stockout penalty, a shadow valuation that reflects the long‑run cost of lost sales. With that price in place, the sequential pain of disappointing a customer tomorrow is felt—properly—by today’s allocation. See Chapters 4 and 8.²
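To make the effect of a stockout penalty concrete, here is a small single-period sketch. The demand distribution, margin, holding cost, and penalty are made-up numbers, and the helper names `expected_profit` and `best_stock` are hypothetical; the point is only to show how the penalty shifts the stocking decision.

```python
def expected_profit(stock, demand_probs, margin, holding_cost, stockout_penalty):
    """Expected one-period profit for a given stock level.

    demand_probs maps a demand quantity to its probability. Each unit sold
    earns margin, each leftover unit costs holding_cost, and each unit of
    unmet demand is charged stockout_penalty.
    """
    total = 0.0
    for demand, prob in demand_probs.items():
        sold = min(stock, demand)
        leftover = stock - sold
        lost = demand - sold
        total += prob * (
            margin * sold - holding_cost * leftover - stockout_penalty * lost
        )
    return total


def best_stock(demand_probs, margin, holding_cost, stockout_penalty):
    """Stock level, from 0 up to the largest demand, with the highest expected profit."""
    levels = range(0, max(demand_probs) + 1)
    return max(
        levels,
        key=lambda s: expected_profit(
            s, demand_probs, margin, holding_cost, stockout_penalty
        ),
    )


demand = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

# Valuing inventory on observed sales only (no stockout penalty).
sales_only = best_stock(demand, margin=2.0, holding_cost=3.0, stockout_penalty=0.0)

# Same setting with a stockout penalty of 6 per lost unit.
with_penalty = best_stock(demand, margin=2.0, holding_cost=3.0, stockout_penalty=6.0)
```

In this toy setting the preferred stock level moves from one unit to two once the penalty is priced in. Nothing about tomorrow is modeled explicitly; the long-run cost of a disappointed customer enters today's decision through a single number.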

This “pricing first” posture carries into engineering. SDA is largely model‑first; I am engineering‑first. The book argues that the programming paradigm used to express decisions matters at least as much as the statistical model. Supply chains benefit from languages and runtimes where time, money and uncertainty are first‑class citizens; where arrays and tables dominate; where determinism enables audit; and where partial recomputation shortens feedback loops. The goal is unattended engines whose decisions are legible in coins, not dashboards that need rescuing at 7 a.m. See Chapter 9 (“Engineering,” §9.5) and Chapter 6 (“Intelligence,” §6.3).²

Finally, there is the matter of how we learn. Field evidence is costly and ambiguous; the only practical antidote is experimental optimization: instrument, emit decisions, watch for “insane” recommendations, fix the drivers, and rerun. This loop does not pretend to converge once and for all; it keeps the system moored to reality as conditions evolve. See Chapter 9 (“Engineering,” §9.2).²

What this means in practice

SDA’s breadth is a feature. When you are calibrating a lookahead for an energy store, designing a policy for a robotic controller, or comparing value‑function approximations to direct rollouts, SDA offers a coherent language and a map of methods to try. It also reminds us that we are, in the end, optimizing over policies.¹

But the enterprise supply chain is a different kind of wilderness. Data semantics shift under your feet; incentives bend evidence; experiments are risky and slow. In that terrain, I have had more success by pricing first and modeling second. The method is simple to state, if demanding to execute. Price what is scarce—including attention and capacity. Attach explicit penalties where the future hurts—stockouts, congestion, obsolescence. Bound attribution with a window. Admit “wait” as an option and enforce a cut‑off that respects both capital and uncertainty. Express the whole thing in a paradigm that makes money and time native. Then iterate until unattended decisions stop looking insane.

This is not a refutation of SDA. It is a choice of order. SDA seeks the approximations that make dynamic optimization feasible. I seek the prices that make everyday decisions economically correct, so that the dynamic problem we do have to approximate is smaller, better‑behaved, and worth the extra effort. The two views can be combined: a priced, engineered perimeter outside; a targeted lookahead or value‑function approximation inside, where it is truly needed.

Readers interested in my detailed stance will find the economic foundations in Chapters 3–4, the treatment of sequential decisions in Chapter 8, and the engineering posture—programming paradigms and experimental optimization—in Chapter 9 of Introduction to Supply Chain. For a compact statement of SDA’s scope, and of the four policy classes that span its methods, Powell’s unified framework and his modeling text are the best places to start.¹

Notes
