Chatting With Your Supply Chain Won’t Fix It
Every few days, a new demo makes the rounds: you type a question in plain English—“What if demand spikes in Germany?”, “What if we shut down this plant?”, “What if we switch suppliers?”—and an assistant confidently replies with a revised plan, a short explanation, and a tidy list of next actions. The pitch is irresistible: supply chain, finally made conversational. Planning, finally made frictionless. Optimization, finally made accessible.
In my book Introduction to Supply Chain (see Chapter 6, especially the parts that deal with “general intelligence” and the legacy of OR), I argued that supply chain is a discipline that repeatedly mistakes slick interfaces for progress. That argument matters now, because large language models (LLMs) are being positioned—by serious researchers and by serious vendors—as the missing piece that will at last connect mathematics to operations, and planning to reality.
What the community is actually building
The operations research community has been remarkably consistent in what it wants from LLMs: a universal translator between human intent and mathematical machinery.
The NL4Opt competition series at NeurIPS is explicit about the target: increase the “accessibility and usability of optimization solvers” by letting non-experts express problems in natural language, then generating the modeling artifacts (including LP and MIP modeling code) that solvers can consume. This is not a fringe idea; it is being formalized into benchmark tasks, datasets, evaluation metrics, and prize-driven competitions.
OptLLM pushes the same idea from the research side into an end-to-end workflow: the user states an optimization problem in natural language; the system converts it into mathematical formulations and programming code; then it calls an external solver and iterates through a multi-round dialogue to refine the model. In other words: English → math → code → solver → results → English, with the LLM orchestrating the loop.
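To make the target concrete, here is a toy of the final artifact such a pipeline aims to emit. This is my own sketch, not code from OptLLM or NL4Opt; it uses the open-source PuLP library, and the problem data is invented.

```python
# Toy illustration of the "English -> math -> code -> solver" target.
# The natural-language statement a user might type:
#   "We ship from two plants to three regions. Plant capacities are 80 and 70;
#    regional demands are 40, 50, and 60; unit shipping costs are known.
#    Minimize total shipping cost."
# Below is the kind of modeling code an NL4Opt/OptLLM-style system aims to generate.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpStatus, value

plants = ["P1", "P2"]
regions = ["R1", "R2", "R3"]
capacity = {"P1": 80, "P2": 70}
demand = {"R1": 40, "R2": 50, "R3": 60}
cost = {("P1", "R1"): 4, ("P1", "R2"): 6, ("P1", "R3"): 9,
        ("P2", "R1"): 5, ("P2", "R2"): 4, ("P2", "R3"): 7}

model = LpProblem("toy_transport", LpMinimize)
ship = {(p, r): LpVariable(f"ship_{p}_{r}", lowBound=0)
        for p in plants for r in regions}

model += lpSum(cost[p, r] * ship[p, r] for p in plants for r in regions)
for p in plants:
    model += lpSum(ship[p, r] for r in regions) <= capacity[p], f"cap_{p}"
for r in regions:
    model += lpSum(ship[p, r] for p in plants) >= demand[r], f"dem_{r}"

model.solve()
print(LpStatus[model.status], value(model.objective))
```

Generating such a block from an English paragraph is genuinely impressive. Keep the example in mind, though: everything that follows is about whether the capacities, demands, and costs inside it are true.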
INFORMS’ 2024 TutORials chapter by Wasserkrug, Boussioux, and Sun offers a broader, profession-facing framing: LLMs can accelerate the creation and deployment of OR/MS solutions by (i) reducing development time while improving solution quality, (ii) extracting structured data from unstructured text without building bespoke NLP models (they explicitly mention demand forecasting as a motivating use case), and (iii) enabling natural-language interfaces so business users can interact flexibly with analytical models. The chapter is notable not just for the optimism, but for its emphasis on responsible use: LLMs are presented as powerful tools, but tools that need governance and accountability.
Meanwhile, the learning-to-optimize line of work is also absorbing LLMs—not only as “interfaces,” but as components in the solver ecosystem itself. The “foundation model” ambition has arrived in MILP: Microsoft researchers propose training a single model across diverse MILP classes and introduce MILP‑Evolve, an LLM-based evolutionary framework that generates large, diverse MILP problem classes “with an unlimited amount of instances,” supporting tasks like learning to branch and aligning MILP instances with natural-language descriptions. A 2025 survey on LLMs for evolutionary optimization goes further and categorizes emerging paradigms: LLMs as stand-alone optimizers, LLMs embedded inside optimization algorithms, and LLMs used at a higher level for algorithm selection and generation—explicitly pointing toward “self-evolving agentic ecosystems” for optimization.
On the supply chain planning side, the proposals rhyme almost perfectly with the OR agenda—but with a different focal point: not “make solvers accessible,” but “make planners fast.”
MIT Sloan Executive Education’s 2025 piece on LLMs in supply chain planning is representative. It describes planners asking complex what-if questions in natural language; the LLM translates the query into mathematical adjustments and returns human-readable insights. It claims that what previously required “days” of back-and-forth with engineers can become “minutes,” and it extends the promise to “real-time” disruptions: planners can “dynamically adjust mathematical models” without relying on IT teams, and rerun optimization to produce revised plans explained in plain language. The same article admits, almost as an aside, the central difficulty: validation—ensuring the AI-generated adjustments match real business conditions—remains a challenge.
Simchi-Levi, Mellou, Menache, and Pathuri (2025) make the “democratization” thesis explicit: planners and executives currently spend too much time understanding outputs, running scenarios, and updating models, so LLMs can “democratize supply chain technology” by facilitating understanding and interaction “without human-in-the-loop,” reducing decision time from “days and weeks” to “minutes and hours.”
Microsoft Research’s OptiGuide project sits exactly at this intersection of OR and planning: it aims to “democratize decision making” by combining optimization with generative AI, and it claims production use for cloud supply chain planners answering “what if” questions and receiving actionable recommendations.
The software vendors, unsurprisingly, have converged on the same themes: copilots, agents, natural language queries, and faster decisions.
SAP positions Joule as a generative AI copilot integrated into its enterprise applications, including supply chain. Its IBP-specific pages emphasize conversational access to product documentation (a smoother way to search and retrieve guidance), and SAP’s own community release notes for IBP list the Joule integration as an innovation in that release line.
Oracle’s 2025 announcements emphasize “AI agents” embedded in Oracle Fusion Cloud Applications. For supply chain, Oracle says these agents automate processes, optimize planning and fulfillment, and accelerate decision-making; it even enumerates agents such as an exceptions-and-notes planning advisor that summarizes alerts and notes. Oracle’s readiness documentation also describes a “Supply Chain Planning Process Advisor” agent built with LLMs + RAG to answer planners’ questions about responsibilities, daily tasks, and escalation processes, grounded in enterprise documents the customer uploads.
Blue Yonder’s ICON 2025 press release presents “AI agents” and a “supply chain knowledge graph,” claiming the agents enable businesses to “see, analyze, decide, and act” with machine speed and that agentic AI is infused across planning and execution.
Kinaxis describes agentic and generative AI capabilities that lower barriers to interacting with supply chain data—querying a “digital twin” in natural language and receiving answers—and its public materials, along with independent analyst coverage, point to a multi-agent framework intended to reduce reliance on IT and let planners interact directly with operational data.
o9 pushes the same storyline with different branding: “GenAI agents” plus an enterprise knowledge graph evolving into a “large knowledge model,” positioned as a step-change in productivity and expertise for planning functions.
Above all of this sits Gartner, which provides the category language. In an August 2025 press release, Gartner predicts rapid proliferation of AI assistants and task-specific AI agents in enterprise applications, outlines “stages” of agentic AI evolution, and warns that a common misconception is calling assistants “agents”—a misunderstanding it labels “agentwashing.”
So, yes: there is real work happening. Benchmarks, frameworks, agent architectures, integrations, and plenty of production-minded engineering. The question is not whether LLMs can be wired into supply chain tooling—they already are. The question is whether that wiring addresses the actual reasons supply chain decision-making remains stubbornly mediocre.
Why I’m unconvinced that the interface is the bottleneck
The OR community is correct about one thing: optimization is hard to express cleanly. The planning community is also correct about one thing: people waste time translating questions into spreadsheets, emails, meetings, tickets, and ad hoc model changes. An LLM can reduce that friction. It can shorten the path between “I wonder…” and “here is a computed answer,” just as the MIT Sloan piece describes.
But supply chains are not failing because planners lack a faster way to ask questions.
Decades of solvers, modeling languages, and academic progress already exist. The claim embedded in NL4Opt—“make solvers usable by non-experts via natural language”—presumes that the main obstacle is the user’s inability to formulate problems. That is a comforting story for researchers, because it turns a messy organizational failure into a translation problem. It also happens to be largely false in supply chain.
What blocks real-world adoption is not that people can’t type constraints. It is that the constraints, the objectives, and the data semantics are routinely wrong—or, more precisely, they are wrong in ways that matter economically. You can generate mountains of modeling code and still be optimizing nonsense.
This is not a new critique. Russell Ackoff argued in 1979 that OR had become “contextually naive” and that its application was increasingly restricted to problems relatively insensitive to their environments. Supply chains are the opposite of environment-insensitive. They are dominated by volatility, substitution effects, missing data, delayed feedback, and incentives that reshape behavior as soon as you start “optimizing.”
The LLM agenda in OR tends to underestimate a brutal asymmetry: writing a model is easier than validating it. OptLLM can generate formulations and code, and then iterate in dialogue. Yet the real difficulty is not producing another mathematical object—it is ensuring the object matches the company’s actual economic trade-offs, operational constraints, and measurement reality. An LLM can produce a plausible constraint; it cannot certify that this constraint is what your warehouse truly experiences at 3 a.m. during a promotion, with the real labor roster, the real packing policies, the real replenishment rules, and the real failure modes.
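What certification looks like in practice is not another conversation; it is checks against execution data. Here is a minimal sketch of one such check, with invented field names and an invented throughput cap:

```python
# Minimal sanity check: does a model's asserted warehouse throughput cap
# survive contact with execution history? The column names and the cap
# value are invented for illustration.
import csv
from collections import Counter

ASSERTED_DAILY_CAP = 5000  # units/day, as written into the generated model

def cap_violation_rate(log_path: str) -> float:
    """Fraction of observed days where actual picks exceeded the asserted cap."""
    picks_per_day = Counter()
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):  # expects columns: date, units_picked
            picks_per_day[row["date"]] += int(row["units_picked"])
    days = len(picks_per_day)
    violations = sum(1 for total in picks_per_day.values()
                     if total > ASSERTED_DAILY_CAP)
    return violations / days if days else 0.0

# If this prints anything well above zero, the "constraint" is fiction:
# the warehouse routinely does what the model says it cannot.
# print(cap_violation_rate("warehouse_pick_log.csv"))
```

An LLM can write this check too, of course. The point is that someone has to know which checks matter, and that knowledge is exactly what the natural-language interface does not supply.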
In planning, the same asymmetry becomes even more dangerous, because the cultural reflex is to trust the plan. The MIT Sloan article celebrates that planners can “dynamically adjust mathematical models” without IT. Simchi-Levi and coauthors frame it as removing the need for data science teams and technology providers in the loop. This is sold as empowerment. I see it as bypassing the very friction that used to prevent uncontrolled model drift.
If changing the model is easy, the organization will change the model constantly. Not because it is wise, but because it is psychologically satisfying. Every disruption invites a new exception. Every executive question invites a new scenario. Every meeting invites a new tweak to satisfy someone’s intuition. You get faster motion, yes—but you also get faster entropy.
Vendors have understandably steered toward “agents” that summarize, explain, and guide. Oracle’s Planning Process Advisor is essentially an LLM+RAG layer over policies and procedures: it answers questions based on documents you upload and embeds this chat experience inside workflow pages. SAP’s Joule positioning in IBP similarly highlights conversational search and documentation-based answers. These are legitimate productivity features. They can reduce time wasted hunting for the right process document. But they do not meaningfully change the quality of decisions. They are, at best, a better help system.
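The pattern behind these advisors is simple enough to sketch: retrieve the most relevant policy snippets, then constrain the model to answer from them. Below is a minimal illustration in plain Python, with a toy bag-of-words retriever standing in for a real embedding index and a stubbed llm() call; none of this is any vendor’s actual implementation.

```python
# Minimal sketch of the LLM+RAG "process advisor" pattern: retrieve the most
# relevant policy snippets, then ask the model to answer only from them.
# The retriever is a toy bag-of-words scorer; llm() is a stub, not a real API.
import math
from collections import Counter

DOCS = [
    "Escalate supply shortages above 10% of weekly demand to the regional planner.",
    "Safety stock reviews run every Monday; planners confirm exceptions by noon.",
    "New supplier onboarding requires a quality audit before first purchase order.",
]

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, k: int = 2) -> list:
    q = _vec(question)
    return sorted(DOCS, key=lambda d: _cosine(q, _vec(d)), reverse=True)[:k]

def llm(prompt: str) -> str:
    return "[model answer would go here]"  # stub: replace with a real model call

def advise(question: str) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(question))
    prompt = (f"Answer using ONLY these policy excerpts. If they do not cover "
              f"the question, say so.\n{context}\n\nQuestion: {question}")
    return llm(prompt)

print(advise("Who do I escalate a shortage to?"))
```

The sketch also shows why such features cap out as a better help system: the answer can only be as good as the retrieved documents, and retrieval says nothing about whether the documented process is the right one.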
Then we get to the grander “agentic” story. Blue Yonder announces multiple AI agents, infused across planning and execution, aiming for continuous optimization and machine-speed decisioning. Kinaxis describes agentic frameworks, AI agents that monitor and act in real time, and natural language interaction with a digital twin. Gartner predicts a world full of these things, while warning that much of what is being sold as agents is really assistants—“agentwashing.”
The irony is that Gartner’s warning is not a side note; it is the central fact. Most of these “agents” are wrappers. They sit on top of existing workflows, automate fragments of interaction, and make the interface nicer. They do not repair the economic logic of the decisions, nor do they solve the foundational issues of data fidelity, uncertainty, and incentives. They may reduce decision-making lag in the narrow sense that someone can click faster—but not in the deeper sense that the company is making fewer mistakes.
Even the more substantial academic integrations—LLM + network optimization, with dashboards and role-aware explanations—frame the core difficulty as “bridging the gap” between OR outputs and stakeholder understanding by translating results into natural language and contextual KPIs. Again: useful. But it keeps the same assumption intact: the model is correct; people just need to understand it. In supply chain, the model is often the problem.
So when I hear “LLMs will democratize optimization,” I translate it to: “we will democratize the production of pseudo-models.” When I hear “LLMs will let planners update models without IT,” I translate it to: “we will accelerate the rate at which fragile models are patched, unpatched, and repatched.” And when I hear “agentic AI will orchestrate end-to-end workflows,” I ask: orchestrate toward what objective, on what semantics, with what accountability?
Where LLMs can be genuinely useful
None of this is an argument to ignore LLMs. It is an argument to stop pretending that conversation is a substitute for competence.
The OR/MS TutORials chapter is right to emphasize that LLMs can speed up the production of decision-making applications, help turn unstructured text into structured signals, and provide natural-language interaction layers—while also insisting on responsible use. This is exactly where LLMs shine: not as the decision engine, but as a productivity multiplier for the engineers and analysts who build the real decision engine.
If you use LLMs to write glue code faster, to generate tests, to draft documentation, to extract messy business rules from contracts and SOPs into structured representations, you are improving the part of the pipeline that actually is a bottleneck: the slow, expensive, failure-prone engineering of data and logic. If you use LLMs to help auditors, executives, and operators understand why a computed decision was made—by producing explanations grounded in traceable evidence—you can improve adoption without surrendering control.
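The extraction work, in particular, has a well-understood shape: ask the model for JSON against an explicit schema, then refuse anything that does not validate. A minimal sketch, with an invented schema:

```python
# Sketch: turning an LLM's extraction of a contract clause into a validated
# structured record. The schema is invented for illustration; the point is
# that the model's output is never trusted until it passes explicit checks.
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class LeadTimeRule:
    supplier: str
    sku: str
    lead_time_days: int
    penalty_pct_per_day: float

def parse_rule(llm_output: str) -> LeadTimeRule:
    """Reject anything that is not a well-formed, economically sane rule."""
    raw = json.loads(llm_output)  # raises on malformed JSON
    rule = LeadTimeRule(
        supplier=str(raw["supplier"]),
        sku=str(raw["sku"]),
        lead_time_days=int(raw["lead_time_days"]),
        penalty_pct_per_day=float(raw["penalty_pct_per_day"]),
    )
    if not 0 < rule.lead_time_days <= 365:
        raise ValueError(f"implausible lead time: {rule.lead_time_days}")
    if not 0 <= rule.penalty_pct_per_day <= 100:
        raise ValueError(f"implausible penalty: {rule.penalty_pct_per_day}")
    return rule

# Example: the kind of output a model might produce from a contract clause.
print(parse_rule('{"supplier": "ACME", "sku": "SKU-42", '
                 '"lead_time_days": 21, "penalty_pct_per_day": 1.5}'))
```

The model proposes the candidate record; the explicit checks are what make it safe to feed downstream.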
Even the “democratization” line can be reframed productively. OptiGuide’s claim that it has been used in production for cloud supply chain planners answering what-if questions is not trivial. But the value is not that a planner can chat with an optimizer. The value is that a company with a serious optimization backbone can build a safer, more usable interface—one that exposes the right levers and explanations, rather than letting anyone casually rewrite the underlying math.
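Concretely, “exposing the right levers” can mean a whitelist with hard bounds, so a chat session can request scenarios but never edit the model itself. A minimal sketch of that design choice, with invented lever names and bounds:

```python
# Sketch of a "safe levers" scenario interface: planners adjust a whitelisted,
# bounded set of parameters; the underlying model is never edited by chat.
# Lever names and bounds are invented for illustration.
from dataclasses import dataclass

LEVER_BOUNDS = {
    "demand_multiplier_germany": (0.5, 2.0),
    "plant_munich_capacity_pct": (0.0, 1.0),
}

@dataclass(frozen=True)
class ScenarioRequest:
    levers: dict

    def validate(self) -> None:
        for name, val in self.levers.items():
            if name not in LEVER_BOUNDS:
                raise ValueError(f"unknown lever: {name}")  # no ad hoc model edits
            lo, hi = LEVER_BOUNDS[name]
            if not lo <= val <= hi:
                raise ValueError(f"{name}={val} outside [{lo}, {hi}]")

req = ScenarioRequest({"demand_multiplier_germany": 1.4})
req.validate()  # passes; a request to "add a new constraint" would be rejected
```

Under this design, the conversational layer is a consumer of the optimization backbone, not an editor of it.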
And yes, the solver-learning agenda—foundation models for MILP, learned branching, and so on—may yield genuine algorithmic improvements. But supply chain will benefit from those improvements only if the rest of the stack is healthy: correct semantics, economic objectives, robust handling of uncertainty, and an execution loop that turns recommendations into actions reliably.
In other words, treat LLMs the way you treat any powerful but fallible tool: as an accelerant for the people building systems, not as a layer of varnish on broken processes. Treat them as a way to reduce the cost of rigor, not as a way to skip rigor.
Supply chain is not short on words. It is short on good decisions. LLMs are exceptional at producing words. They can help us build better systems, faster. But if we confuse eloquence with correctness, we will simply get mistakes delivered in better prose—and at higher speed.