The Harvard Business Review, as part of its January-February 2025 issue, recently published How Generative AI Improves Supply Chain Management1 by Ishai Menache (Microsoft), Jeevan Pathuri (Microsoft), David Simchi-Levi (Professor, MIT), and Tom Linton (McKinsey). As the title suggests, the article provides a series of examples supposedly illustrating how LLMs (large language models) can contribute to supply chain management. Considering the list of elite organizations (Microsoft, McKinsey, MIT, and even Harvard) involved in the publication of this piece, one would expect some deep, insightful views—the kind that carefully articulate how a brilliant piece of technology, LLMs, is going to contribute to the betterment of supply chains.

[Image: a 60s-style salesman stands in front of a futuristic warehouse busy with activity.]

Instead, we get a mediocre contribution. More precisely: it is lazy, hyperbolic, and profoundly misguided—a piece that happens to ride a technological buzzword. The entire premise of the article can be summarized as LLMs can both generate and explain code of supply chain relevance. The authors adopt what I usually refer to as a gadgetoid take on the subject. The gadgetoid take consists of hastily bolting a newly discovered piece of tech onto an existing system, typically with no consideration whatsoever for either the limitations of the tech or the changes it would bring to the system. As a result, this approach invariably produces gadgets—amusing tools of limited interest—and no business value whatsoever.

To the various organizations involved:

  • Microsoft: you have many talented engineers on board. You need to hold yourself to higher standards.

  • McKinsey: do not steer potential clients in directions that are guaranteed to be a waste of time and money.

  • MIT and Harvard: you should be voices of reason and not amplifying the drivel of tech vendors.

Let’s immediately clarify that while I do agree that LLMs can be used for the betterment of supply chains, LLMs can also be a complete distraction—depending on how the endeavor is approached. This point happens to be exactly the fundamental problem plaguing the article under review. A closely related paper by Microsoft2 suffers from the same issue, although, for the sake of clarity, my review will remain focused on the Harvard article.

Before addressing the core technological nonsense, let’s point out a notable ethical issue. This article is nothing but a thinly disguised advertisement for Microsoft Cloud Supply Chain. Microsoft is naturally free to advertise its services in any way it sees fit, but roping in both Harvard (via publishing) and MIT (via co-authoring) for a promotional piece rubs me the wrong way. At the very least, the article should point out that its co-authors have obvious conflicts of interest. Academia should not be condoning the covert promotional activities of tech vendors, no matter how large and influential those vendors happen to be.

Disclaimer: The discerning reader will have realized that I also have a conflict of interest. Yes, I run a supply chain software company and, thus, have a vested interest in contradicting the nonsense Microsoft, McKinsey, MIT, and Harvard are spreading. My cards are now on the table.

Let’s now proceed with a review of the claims made by the article.

Planners also monitor changes in the demand plan, called the demand drift, on a monthly basis to ensure that the revised plan fulfills all customer requirements and falls within budget guidelines […] LLM-based technology now does all this. It automatically generates an email report that details who made each change and the reason for doing so.

In short, LLMs can automate the busywork of employees. However, there are two obvious objections. First, LLMs are completely unneeded to execute this sort of role-based policy. A simple programming language and a few imperative statements will suffice. Moreover, programmatic plumbing will be needed anyway to access the data and send the email. In those settings, LLMs are guaranteed to represent a massive engineering complication that delivers no tangible benefits. Indeed, LLMs are very slow and very costly; about 10 orders of magnitude worse than a short list of rules. They are a tool of last resort, to be used when everything else has failed. This is clearly not the case here.
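
To make the point concrete, here is a minimal sketch in Python of the monthly change report described above, implemented as a few imperative statements rather than an LLM. The revision-log schema and the field names are hypothetical; a real system would read them from the planning database.

```python
# A few imperative rules replacing the LLM-generated "email report".
# The revision log schema and its field names are hypothetical.
from datetime import date

revisions = [  # one row per change made to the demand plan
    {"sku": "A-101", "editor": "jdoe", "day": date(2025, 1, 14),
     "old_qty": 500, "new_qty": 650, "reason": "promo uplift"},
    {"sku": "B-207", "editor": "asmith", "day": date(2025, 1, 20),
     "old_qty": 300, "new_qty": 240, "reason": "supplier constraint"},
]

since = date(2025, 1, 1)  # start of the monthly reporting window
lines = [
    f"{r['day']}: {r['editor']} changed {r['sku']} "
    f"from {r['old_qty']} to {r['new_qty']} ({r['reason']})"
    for r in revisions if r["day"] >= since
]
report = "Demand plan changes this month:\n" + "\n".join(lines)
print(report)  # in production, hand `report` to the existing SMTP plumbing
```

The plumbing needed to fetch the revision log and send the email is required either way; the LLM adds nothing to this picture.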

Second, the busywork pointed out above is entirely unnecessary. The company should eliminate this class of pointless tasks3. Bureaucracies are bad enough, but technocracies are worse. Bringing overcomplicated pieces of technology to the party guarantees that the bureaucracy will be further entrenched in its dysfunctional ways. Lokad, my company, has been refactoring away this sort of busywork for more than a decade and it doesn’t require anything as complicated and costly as LLMs.

These contracts specify the details of the price paid by the OEM, quality requirements, lead times, and the resiliency measures suppliers must take to ensure supply. After feeding the LLM data from thousands of contracts, one OEM was able to identify price reductions it was entitled to for surpassing a certain volume threshold.

While it is true that companies routinely fail to apply some of the contractual clauses with their suppliers that would benefit them, identifying the relevant contractual terms is but a microscopic portion of the challenge—say, 1% of it, possibly much less. Moreover, it can be addressed with a tool like ChatGPT with no preparation whatsoever. All it takes is composing a query and submitting the PDFs, batch after batch, through the user interface. This sort of trivia belongs in a LinkedIn post titled “The 10 things I did with ChatGPT today”, not in a Harvard-MIT publication.
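
For illustration, and purely as my own wording rather than the authors’, a plausible one-off query would be as simple as:

```
For each attached contract, list every clause granting the buyer a price
reduction, rebate, or discount once a volume threshold is reached. Quote
the clause, the threshold, and the discount.
```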

Now, the actual challenge is twofold: instrumentation and relationship. On the instrumentation side, the clerical steps to emit orders and payments are largely automated in most companies that operate anything that qualifies as a “supply chain”. Reclaiming a missed rebate means injecting exceptions, such as credit notes or adjusted invoices, into this automated flow. Thus, unless there is extensive supporting instrumentation, dealing with these fringe edge cases is going to complicate and delay everything.

Moreover, on the relationship side, if a contractual clause has been ignored for years, it is naïve to think that the company can activate the clause without consequences. More often than not, the supplier has already priced this lax behavior of the client into its rates, and nitpicking over fringe contractual terms will be answered in kind—or possibly through a price increase.

More generally, discounts and price breaks should be managed as part of the regular transactional business systems. This is not rocket engineering, but plain old CRUD4. Once again, bringing an LLM where a few imperative rules would suffice is technologically nonsensical.
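
As a sketch of what “a few imperative rules” means here, consider a volume price break written in Python. The thresholds, multipliers, and function name are hypothetical; the point is that this is a one-screen affair inside the transactional system.

```python
# A volume price break handled as plain transactional logic, no LLM needed.
# The thresholds, multipliers, and schema are hypothetical.
PRICE_BREAKS = [(100_000, 0.90), (50_000, 0.95)]  # (units, multiplier)

def unit_price(base_price: float, cumulative_volume: int) -> float:
    """Apply the contractual price break matching the cumulative volume."""
    for threshold, multiplier in PRICE_BREAKS:
        if cumulative_volume >= threshold:
            return base_price * multiplier
    return base_price

assert unit_price(10.0, 60_000) == 9.5   # 5% rebate past 50k units
assert unit_price(10.0, 120_000) == 9.0  # 10% rebate past 100k units
```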

An LLM allows planners to ask detailed questions. Here are a few examples: “What would be the additional transportation cost if overall product demand increased by 15%?” […] Here’s how an LLM can answer questions like these accurately and efficiently. Many optimization tasks are written in the form of mathematical programs […] An LLM […] translates a human query into a mathematical code that is a small change to the original mathematical model used to produce the plan.

This is wishful thinking. To date, the percentage of companies enjoying a “unified monolithic codebase” to derive their supply chain decisions is virtually nil (more on that below). Instead, companies have an ocean of spreadsheets. Contrary to the pretty picture the authors are painting, there are no “mathematical programs” for the LLMs to interact with. While LLMs could, conceptually, edit and improve a messy pile of half-obsolete spreadsheets, until proven otherwise, this is pure speculation. Even state-of-the-art LLMs would be hard-pressed to edit a single large spreadsheet—the pervasive code duplication that spreadsheets entail isn’t favorable at all for LLMs—but making sense of hundreds, if not thousands, of sheets remains pure science fiction at this point.

Now, there are indeed a few companies that benefit from a unified monolithic codebase managing their supply chain decision-making processes—namely, the clients of Lokad. If the authors had any actual experience in the matter, as we do, they would have known that those codebases are sizeable: typically tens of thousands of lines of code, despite the fact that we use a DSL (domain-specific language)5 dedicated to supply chain. This DSL is, by the way, about 10x more concise than Python or Java for this sort of task. There is unfortunately no shortcut: for any decision of interest, dozens of tables, covering hundreds of fields, are involved in the calculation. While it is conceivable that improvements to our DSL may further reduce the line count, those codebases won’t ever be small.

Again, LLMs could, conceptually, edit and improve a complex codebase while being directed by a non-technical contributor. However, we are again in science-fiction territory. LLMs have already proven to be fantastic productivity tools for capable programmers. In other words, if you already know how to program, LLMs can help you program faster. Yet, this is not what the authors of the article are saying. Their proposition is precisely that LLMs can be used to let non-technical contributors make technical contributions. Based on the current LLM state of the art, this proposition is invariably false, except within the confines of tiny sandboxes that do not reflect the massive ambient complexity of real-world supply chains.

Planners can use LLM technology to update the mathematical models of a supply chain’s structure and the business requirements to reflect the current business environment. Further, an LLM can update planners on a change in business conditions.

The authors are doubling down on the science fiction. This claim is technically indistinguishable from “an LLM can issue patches to a code repository on GitHub based on tickets submitted by non-technical users”. It would be fantastic news if this were possible, but again, present-day LLMs are nowhere near being able to accomplish this sort of feat reliably for serious requests. When presenting use cases for a novel technology, it’s critical to accurately convey the limits of said technology. The four co-authors appear to be entirely oblivious to the current state of the art of LLMs; that said, I have a sneaking suspicion they are not. Instead, we get infomercial drivel by people who are keenly aware of what they are doing—which is arguably a lot worse.

The need to change the supply plan may also be driven by LLM-based technology. For example, after analyzing shipment data from a specific supplier, it may generate an alarm that the lead time from the supplier has increased significantly over the past few months.

Detecting whether a lead time is abnormal is absolutely not what an LLM can do. An LLM can, however, be prompted to write a piece of code to perform this analysis. We are back to the LinkedIn post “Top 10 things I did with ChatGPT today”. Why stop there and not directly update the ordering logic where the lead time information is consumed? This is exactly what the authors suggest later on in the article.
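
For the record, the sort of code an LLM could be prompted to write here fits in a dozen lines of Python. The data and the two-sigma threshold below are illustrative, not a production-grade test.

```python
# Flag a supplier whose recent lead times drift above the historical norm.
from statistics import mean, stdev

history = [12, 14, 13, 12, 15, 13, 14, 12, 13, 14]  # past lead times, days
recent = [19, 21, 20]                                # last few shipments

mu, sigma = mean(history), stdev(history)
if mean(recent) > mu + 2 * sigma:
    print(f"Alert: recent lead time {mean(recent):.1f}d vs norm {mu:.1f}d")
```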

We envision that in the next few years LLM-based technology will support end-to-end decision-making scenarios.

If by “support” the authors meant “make programmers more productive”—said programmers being in charge of coding the end-to-end decision-making—then this is already possible; in fact, it is something Lokad has been doing for some time. However, if we remove the human programmers from the picture, this statement becomes something closer to “we envision that in the next few years LLM-based technologies will achieve AGI (artificial general intelligence)”.

The authors, riding hard on the “Gen AI” buzzword, entirely dismiss the possibility that LLMs might have limitations of their own. Here is what the authors put forward in their concluding “Overcoming Barriers” section:

Adoption and training. Using an LLM to optimize supply chains requires very precise language […] Each interpretation leads to different decisions.

No, this is plain wrong—unless “very precise language” is understood as “programming language” (programming languages being, indeed, very precise). To optimize a supply chain using an LLM, you need a human engineer who is capable of doing the coding entirely on his own, albeit more slowly. For the foreseeable future, no amount of training, short of becoming proficient at programming, will make users capable of performing supply chain optimizations with the support of LLMs.

Telling the CEO or the supply chain director that he merely needs to train his team to use a “precise language” is wholly misleading. The sort of training workshops that would result from this view are guaranteed to be a complete waste of time for all parties involved.

Verification. LLM technology may occasionally produce a wrong output. Thus, a general challenge is to keep the technology “inside the rails”—namely, identify mistakes and recover from them.

While LLMs are probabilistic by design, this issue is dwarfed by the semantic uncertainty that permeates supply chain systems. In other words, the LLM is very likely to give you the correct answer to the wrong problem. Lokad’s experience indicates that, frequently, the only way to check whether a given implementation (driving a supply chain decision) is correct is to perform a limited experimental test6. Real-world feedback is not optional. Knowledge cannot be conjured out of thin air—even AGI-level LLMs would still be confronted with this hurdle.

Here, the authors are caught padding. They make a correct—but trivial—statement about the nature of LLMs without even attempting to assess whether the issue is a core concern. Had the authors actually managed real-world supply chains using LLMs, they would have realized, as we have, that this concern is a small problem within a long list of much bigger—and much more impactful—problems.

To conclude, Brandolini’s law applies here: the amount of energy needed to refute bulls**t is an order of magnitude bigger than that needed to produce it. This article is so bad that it could have been written by ChatGPT—and maybe it was, for all I know. Based on my casual observations, there are dozens of equally bad articles produced every day about Gen-AI and SCM. The notoriety of the authors motivated me to put together this review. Stepping back, it wouldn’t be the first time that a vendor promises to revolutionize a domain while having nothing of substance to offer. Yet the same vendor doing it twice7, two years in a row, within the same domain, might be excessive. Then again, academia should at least be attempting some critical thinking instead of happily jumping aboard the buzzword bandwagon.


  1. The original article is available on hbr.org, and a copy can also be retrieved from archive. ↩︎

  2. Beyond this article, there is also a separate Microsoft paper providing more details: Large Language Models for Supply Chain Optimization (2023), Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache. While this paper is marginally better than the article under review—a very low bar—this is still a very weak contribution. The OptiGuide framework is nothing but a bit of trivial plumbing on top of LLMs. The paper doesn’t alleviate in any way the limitations of LLMs, nor does it bring anything usable for a real-world supply chain. ↩︎

  3. See Control and bureaucracies in Supply Chains (2022), Joannes Vermorel. ↩︎

  4. See CRUD business apps (2023), Joannes Vermorel. ↩︎

  5. Lokad uses Envision precisely for this purpose. ↩︎

  6. This is the whole point of the Experimental Optimization methodology. ↩︎

  7. See Microsoft to end Supply Chain Center preview less than a year after launch, 2023, by Kelly Stroh. ↩︎