00:00:00 Panel kickoff, audience-driven KPI debate
00:04:00 Supply chain equals economics: allocate scarce resources
00:08:00 Value of information links forecasts to finance
00:12:00 Teeny Beanie Babies expose accuracy fallacy
00:16:00 Service levels hide asymmetry and incentives
00:20:00 Poker-like inventory bets; stockout prediction matters
00:24:00 Optionality: pricing, discounts, transfers reshape outcomes
00:28:00 Lead-time uncertainty and correlations break simple metrics
00:32:00 Purple swans: tails reveal future stockouts
00:36:00 Aerospace demand: retrofits and phantom requisitions
00:40:00 Decision-to-cash simulations replace KPI chasing
00:44:00 Two dashboards: executive value and data sanity
00:48:00 Data quality weirdness: negatives, returns, missing power
00:52:00 Manual overrides signal model defects
00:56:00 KPI bonuses create conflicts and gaming
01:00:00 Goodhart’s law: targets rot over time
01:04:00 Purge metric walls; keep five essentials
01:08:00 Final takeaways and sign-off
Summary
Supply chain is applied economics: allocating scarce resources for maximum return. Percent KPIs like forecast accuracy and service levels look “scientific” yet often ignore the real asymmetries—stockouts can wipe out margin, while excess stock usually “only” carries costs or markdown risk. The alternative is end-to-end, euro-denominated decision evaluation: probabilistic forecasts, Monte Carlo simulation, and expected-vs-actual financial outcomes. Governance should track model failure signals (notably manual overrides) and data sanity, while avoiding incentive schemes that invite KPI gaming.
Extended summary
The discussion begins by treating supply chain for what it is: applied economics under scarcity. Every choice—buying inventory, consuming materials, moving stock—spends limited resources that cannot be spent twice. So the proper goal is not to maximize “nice-looking” percentages, but to maximize return on resources deployed.
From that premise, the panel dismantles the usual idols: forecast accuracy and service levels. Those metrics are easy to compute and easy to worship, precisely because they are detached from business reality. A percentage can look scientific—97.17% has a comforting ring—while saying little about profit, cash flow, or risk. Worse, standard accuracy metrics penalize over-forecasting and under-forecasting symmetrically, even though the economics are asymmetric: a stockout can destroy margin and customer goodwill, while excess inventory often “only” incurs carrying cost or markdown risk.
The alternative is to connect decisions to financial outcomes end-to-end. Patrick frames this as “value of information”: use probabilistic forecasts (full distributions, not point estimates), simulate decisions via Monte Carlo, propagate uncertainty through KPIs into financial statements, and then compare expected versus actual. Joannes agrees, adding that the technical debate—simulation versus density modeling—is secondary; the key is that the chain must terminate in euros or dollars, not in abstract metrics.
The conversation also attacks the “static forecast” mindset. In retail, demand depends on price actions and liquidation options; forecasting a single future without recognizing optionality turns planning into premature commitment. Real operations are dynamic: transfers between stores, discounts, and other levers change outcomes after the initial decision.
On uncertainty, lead times and rare events matter. Distributions can be bimodal with long tails, correlations appear for large orders, and edge cases dominate losses. “Accuracy” often ignores the expensive 1%—the glitch that triggers foolish purchase orders, the perishable seasonal miss, the aerospace part that becomes urgent because of retrofits or maintenance windows. These are not statistical curiosities; they are where money burns.
Finally, governance: Joannes argues the most important non-financial signal is manual overrides of automated decisions—because overrides reveal model ignorance or data failure. Both emphasize data sanity checks and warn that incentives tied to KPIs invite gaming. If you make a metric a target, you degrade it; better to keep metrics few, financially anchored, and rely on management judgment rather than bureaucratic scorekeeping.
Full transcript
Conor Doherty: This is Supply Chain Breakdown, and today’s panel will be breaking down the KPIs that matter most to your supply chain performance. You know who I am. I’m Conor, Communication Director here at Lokad.
To my left, as always, Joannes Vermorel, Lokad’s founder. Our special guest today, joining us remotely, is Patrick McDonald. He’s Executive Adviser at Evolution Analytics, and he brings to today’s panel about 30 years of very relevant experience.
So Patrick, first of all, thank you very much for joining us.
Patrick McDonald: Thank you so much, Conor. It’s so, so great to be here.
Conor Doherty: Perfect. Good to have you.
Now before we start, this is a live chat. This panel is here at the behest of our audience. So if you have any questions or comments, get them in. Do you think that forecast accuracy is an important KPI for your supply chain? Why? What about service levels? Don’t get us started.
But let’s move on. Patrick, as the guest, we come first to you. So before we get into the weeds of deconstructing KPIs, we’re all logicians here, so I think the first question should be: what exactly do you see as the goal of supply chain decision-making? And then later we can talk about KPIs for measuring the efficacy of that, right?
Patrick McDonald: And I think that’s a really important question. I’ve done, as I said, 30 years of management consulting and data science work. Getting that question answered correctly is often a lot more difficult than you might think.
We come in and we say, “What is it exactly we’re trying to do here?” And I think pretty much routinely the answer is: we’re trying to make a decision about how to allocate a resource.
Very often that’s about where we’re going to be positioning inventory. In some other context it might be where we’re going to focus our efforts, how we’re going to allocate our staff time, etc.
But from a supply chain perspective, it’s very much: how are we going to allocate the inventory? And that’s the core decision that we’re trying to make. I think you have to look at the decisions that you’re making in that context in order to be able to really focus on and get the best results.
So that’s something I’ve tried to do over the course of my career, and that’s what we’re going to get into today. Looking forward to it.
Conor Doherty: Well, thank you, Patrick. And Joannes, I know you prefer to focus on optimizing service levels in isolation, right? That’s the goal of supply chain for you. I’m paraphrasing, right?
Joannes Vermorel: Not quite. I really appreciate the approach of Patrick, which is focus on the decisions.
Most specifically, my own take is that supply chain is an applied branch of economics. So we have a series of choices that are the allocation of scarce resources.
Inventory: first you have your money that you need to spend on what do you buy. Once you spend a dollar on something, you can’t spend it on something else. Then if you have your raw materials and you consume them to produce anything, as soon as it’s consumed, again, it’s gone.
If you move a piece of inventory of a finished product from one place A to place B, as soon as it’s moved it’s not available anymore at place A. You have those scarce resources that should be made maximally useful for the company.
Then when we get to what do we mean by actually making the best use of those scarce resources, the very short answer is: maximize the rate of return.
You want to, essentially, for every dollar or dollar-equivalent of resource that you put to any use, make sure that you have the maximum amount of dollars in return. So you want to maximize, effectively, the rate of return.
Conor Doherty: Well, thank you. Patrick, back to you. Do you align with that perspective—supply chain fundamentally is an applied branch of economics?
Patrick McDonald: I think absolutely. The approach that I've taken over the last few years draws on a concept from that branch that maybe isn't publicized very well. It's called value of information.
The work that I tend to do is to help answer the question: from a data science perspective we might use a forecast as an insight, right? And based on that we’re going to make a decision about how we’re going to place inventory.
So we do that, then we can make a prediction based on our model: if we place the inventory there, what are sales likely to be? If I know what my sales are likely to be, then I can also say: what is my inventory level likely to be?
If I know that, then I can calculate what my KPIs are likely to be. If I know my KPIs, I can then calculate what my financial line items are likely to be.
So I’ve been in the process for the last several years of looking at that entire chain. Typically what I’ll do is I’ll do a Monte Carlo analysis based on the uncertainty level that we have in the forecast.
That’s one of the key things, right? People chase that point accuracy, which I don’t really care that much about at all. I care much, much more about the actual probability mass function, or the density function, around the forecast itself.
So I’m leveraging that. I’m doing Monte Carlo analysis, and I’m following that chain all the way from the insight through the decision through the KPIs to the financial statements. It allows me to really understand what’s happening and do simulation, and look at possible outcomes with an understanding of the likelihood of those happening.
Being able to do that all the way to a financial statement that you can then take into the boardroom and say, “Okay, if you’re going to make these kinds of decisions, here’s the kind of outcomes that you can expect.”
Then we can go back and we can actually measure actual versus expected and have some real value. I’ve just found that to be a much better approach than what I see most companies do, which varies from really simple spreadsheet stuff, looking at linear regression trying to get a line, to more sophisticated forecasting and trying to chase the individual forecasting accuracy.
Does that make sense?
Conor Doherty: Well, Joannes, does that make sense?
Joannes Vermorel: It does. It’s very, very aligned with the way Lokad approaches supply chains.
Indeed, there is even a duality between any kind of simulator—Monte Carlo techniques—and probabilistic forecasting with direct density modeling. If you have something that can generate many deviates about the future, you can reconstruct the probability densities.
And if you have the probability densities, then you can generate deviates that reflect those. So you can go back and forth. Sometimes it’s more practical to use one versus another, but that’s more like a technical aspect rather than a high-level thinking aspect.
The intention on both is the same. The intention is the same. Yes, the idea is we really want to connect the thing end-to-end to the financial outcome.
You have many, many steps, but fundamentally all those steps are numerical artifacts that are just means to an end. The end being the financial outcome that you want to maximize for the company.
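To make that chain concrete, here is a minimal sketch of the idea both panelists describe: take a probabilistic demand forecast, push Monte Carlo scenarios through a candidate stocking decision, and read the result off in currency rather than in percentages. All distributions and unit economics below are invented for illustration, not taken from either speaker's clients.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical unit economics and decision (illustrative numbers only).
unit_margin = 4.0        # euros earned per unit sold
unit_carry_cost = 0.5    # euros of holding/markdown risk per unsold unit
stock_decision = 120     # the decision under evaluation: units to position

# Probabilistic forecast: a full distribution of demand, not a point estimate.
demand = rng.lognormal(mean=np.log(100), sigma=0.4, size=10_000)

# Propagate every demand scenario through the decision to a financial outcome.
sales = np.minimum(stock_decision, demand)        # you cannot sell what you do not have
leftover = stock_decision - sales                 # units carried or marked down
profit = unit_margin * sales - unit_carry_cost * leftover   # euros, not percentages

print(f"Expected profit: {profit.mean():.0f}")
print(f"5th/95th percentile of profit: {np.percentile(profit, [5, 95]).round(0)}")
print(f"Probability of a stockout: {(demand > stock_decision).mean():.1%}")
```

Comparing this expected outcome against the realized outcome, period after period, is the "actual versus expected" loop Patrick refers to.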
Conor Doherty: Well, okay. I heard a lot of mathematical terms. Perfect. I understand them.
But I also want to keep this somewhat more grounded. So if we come back, Patrick, to the point: we seem to be in complete unanimity—total consensus—that economic impact is what you should be looking at. So supply chain is applied economics.
Okay, to somebody who says, “Well, hold on, Patrick. If we keep pushing for higher and higher levels of forecast accuracy and higher and higher service levels, we’re going to maximize the economic return of our supply chain.” Like, how could better numbers—better forecast accuracy and better service levels—not translate to better economic performance?
Patrick McDonald: Right. I know that’s counterintuitive, but it’s just not true.
There’s a mathematical principle—sorry, I’m going to have to go just a little bit math-wonky on you here—called Jensen’s inequality that helps us understand why that’s not true.
There are a couple of key pieces. First, traditional accuracy metrics weight both sides equally: whether you're too high or too low, the error counts the same.
But that’s not what we have in inventory. The value of information model says: if I lose a sale, I lose all my margin. If I have too much, then I just have the carrying cost of inventory. So already I’ve got an asymmetry around there that I need to account for.
Typically I will do that kind of evaluation on my accuracy metrics anyway.
The other thing that we miss is that we know there’s uncertainty inherent in the forecast— inherent in the future outcome—and that uncertainty is basically tied up in that density function.
Now you can handle that in a number of ways. The simplest way we’ve done it for years and years is to understand standard deviation and try to use that, and set some boundaries.
That was good back in the day when we had really limited capabilities for calculation. But now I can do on my personal MacBook stuff that would have taken a Cray supercomputer back when I was in college.
So our calculation capability is so much greater today than it was that we can do so much more. We need to look both at that value of information function and we also need to look at the shape of that probability density function.
That’s where I think some key metrics come into play.
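Patrick's Jensen's-inequality point can be shown in a few lines: because sales are capped by stock, profit is a concave function of demand, so a plan evaluated at the average demand systematically overstates the average profit over the full distribution. The numbers below are invented purely to illustrate the effect.

```python
import numpy as np

rng = np.random.default_rng(0)

stock, margin, carry = 100, 4.0, 0.5

def profit(demand):
    sales = np.minimum(stock, demand)          # sales are capped by available stock
    return margin * sales - carry * (stock - sales)

demand = rng.gamma(shape=4.0, scale=25.0, size=100_000)   # mean 100, right-skewed

print(profit(demand.mean()))     # profit of the point-forecast plan
print(profit(demand).mean())     # expected profit over the distribution: strictly lower
```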
If you’ll indulge me just for a minute, I’ll tell a little bit of a story about how I first got involved in this—where I was first looking at it—and why it became so important to me over the course of my career.
I was just starting out, and we were building out a data warehouse for McDonald’s. I was a consultant with Proco, and it was 1997, so that tells you how old I was.
It was the first year they did the Teeny Beanie Baby promotion. I don’t know if you remember those or not, but it was huge. Made the news. People were buying the Happy Meals, throwing away the food just to get the toy.
There was a poor guy that was a delivery guy that got assaulted because somebody tried to steal the Teeny Beanie Babies. Made the national news. It was a big, big deal.
We came in basically Monday morning that first week of the promo, and the business execs were in our face saying, “We need a store run-out date report,” right?
Because they were flying through the Teeny Beanie Babies so fast. It’s their promotional item. It’s got to be cheap. It drives a whole market basket, right?
It’s got to be cheap. They order them from China, so they got to do it a year in advance. They had a fixed amount. No opportunity for replenishment.
If your promotion lasts as long as it lasts, when you’re out of Teeny Beanie Babies, promotion’s over. Down goes sales, right?
So we did the report for them. Sure enough, they ran out in about a week and a half.
So was that a good promotion or not? I think yes, potentially, but think about it: your sales were four times what you thought they were going to be for a week and a half, and then they plummeted to below normal because you had no promotion for the remaining three weeks of the month.
So they came back to us and they said, “Well, how many should we order next year?” And we said, “Well, you know, four times as much.” And the decision was: “No, that’s way too much. We’ll do twice as much.”
The next year they ran out in two and a half weeks.
Those are the kinds of things where I started to understand the challenge. Part of the challenge they had was they couldn’t get good forecasts.
So we wrote what was called an interim forecasting system for them, and it ran for 18 years. It was supposed to just be there until they got something better, and it was calculating Happy Meal toys and McDonald’s burger patties for 18 years.
The thing that really bugged me is the forecasting vendors would come in and say, “I can get you better accuracy, and if I give you better accuracy, you’ll get better business results.”
To me it was kind of like the Gary Larson Far Side cartoon where a guy’s standing in front of the board with all the equations on it, and he points to the spot in the middle that’s unexplained and says, “And then a miracle happens.”
I could never really connect the dots from those things until I figured out how to do this value chain, with value of analytics or value of information, all the way through the chain to be able to model it.
So that’s been kind of the path of my career, and one of the things that I’ve been working on over the course of that 30 years that I’ve been doing data science.
Conor Doherty: Well thank you. Joannes, that dovetails very much into your perspective.
In your book, on the table, I know—and I’m quoting the terms—you refer to accuracy and service level as distractions, poor proxies, and one of your favorite terms: numerical artifacts.
Now in simple terms, why do you feel this way? And I presume it’s some version of what Patrick has just said.
Joannes Vermorel: Fundamentally, if you look at service level, first it’s a very mathematical construct in the sense that fundamentally it is a percentage. It does not reflect any kind of economic value for the business.
So first thing is that whenever we have things that are percentage-based, we have to be careful because it’s fundamentally not clear at all that this thing is rooted in anything real for the business.
That’s the trick: it may sound like being very scientific because you have measurements and whatnot, but is it?
I really appreciate the comment of Patrick: “and then a miracle happens.” You have your percentage, but how does that connect to profitability? “And then a miracle happens.” Maybe, but maybe not.
The danger is that whenever you have those percentages, you have the danger of scientism. It looks like science. There is a number, there is a metric, you can have even a very precise percentage—97.17—so that makes it look extra rational.
But it doesn’t. It’s still a percentage, and it’s not clear that it’s connected at all to the long-term interest of the company.
Now, if we go back to those service levels, we can have the descriptive aspect and the prescriptive aspect.
Descriptive is: I look back. The problem is that if I look back at a given SKU, and I typically run a relatively high quality of service, I can have many SKUs that are at 100%. That doesn't tell me much.

Because what if, for example, this Beanie Baby toy had been at 100% (or undefined) for all eternity before the promotion started? In descriptive terms, that was not very useful.

And then when you go to zero—because you go to zero sales because you don't have it anymore—the service level doesn't differentiate whether you're missing one unit or a million.
So if you see that, that’s also another problem: it says “I’m stocked out,” yes, but it’s really not the same thing to be stocked out because I sold 100 and I was short of one unit, versus I sold one and I missed 100 and I was stocked out.
So again, in terms of descriptive, it doesn’t tell the story.
Then if we say in terms of prescription—prescriptive perspective—like what should I be looking for in terms of service level, it’s exactly as you described: the asymmetries in terms of economics are not taken into account.
If I have something that can, like at McDonald's, have a double-digit percentage impact on my sales growth—so I can substantially increase my sales—and it costs me a few cents per meal to have this toy from China, this is highly, highly asymmetric.
It means that with a limited investment I can have a massive uptake.
In this sort of situation you would say: you know what, those plastic toys are not perishable, they are very cheap, the upside is super high, maybe I should take the risk of running overstock.
If things don’t go well, I would just liquidate them over time. Clients will not be… for example, at McDonald’s, worst case clients will get two toys with their meal for a month.
Probably even if the second toy is not that great, it probably will not piss off that many customers if the Happy Meal has a second toy.
So I can see that the risk that I’m taking in having too much is not that huge, as long as the cost per meal is kind of under control.
Patrick McDonald: Exactly. I love the way, first of all, that you talk about accuracy as a mathematical artifact.
I think basically I've looked at it a little bit differently. I call it one of the seductive six. I have six assumptions that data scientists make all the time but shouldn't.
One of them is local versus global optima. So if you’re saying it’s a numerical artifact—if I focus on accuracy—I’m focusing on a local mathematical artificial optimum rather than what I should be focusing on, which is: how can I maximize how much cash flow or profit am I going to be making?
That’s local versus global.
The other thing that comes to mind when we talk about this is: what decision are you really trying to make?
I think about inventory decisions a lot like playing poker, right? You are pushing your chips in on the table to make a bet based on uncertain information.
You want to have as much information as you can, and there is a risk element to it.
My current client is over in the Netherlands. They’re a software-as-a-service company where, for retail, they do forecasting and help smaller retailers position their inventory.
One of the things that they found in their forecasting applications was they went after accuracy first and found that that was not the most important thing for their particular retailers.
Most important thing was being able to predict whether or not they were going to stock out, because retailers are going to ship.
If you have an item of clothing, you’re going to have one of them in a style and color and a size, typically—maybe a couple—but not more than that. You ship once a week.
Do you replenish or not, right? You’ve got it in the warehouse, it’s sitting there. There’s not a lot of difference in terms of operating expense whether it’s sitting in the warehouse or sitting in the store, but you don’t want too much stuff on the floor of the store.
But you want to have enough. So their forecasting capability is really geared and targeted towards being able to understand: okay, am I going to sell that one unit, and do I stock out?
That’s how they answer the question: do I go ahead and push inventory from the DC to the retail outlet?
It’s a very good approach because they’re answering a different question rather than trying to say: okay, how many am I going to sell, and can I hit that number specifically and exactly?
It’s: do I make the decision to ship or not?
Really understanding that gets to the point of which question you’re trying to answer, and that means which metric am I going to use.
So they’re looking at recall and precision much more closely around a categorical decision—am I going to stock out or not—being much more important than the numerical one: how much am I going to sell?
That’s another type of metric that I think sometimes we need to look at from a forecasting perspective that gives us better information in terms of the kinds of decisions we’re actually going to be making.
Does that make sense?
Conor Doherty: Absolutely. Absolutely it does.
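A minimal sketch of the categorical framing Patrick describes: treat "will this store/SKU stock out before the next weekly shipment?" as a yes/no prediction and score it with precision and recall instead of a numerical accuracy metric. The probabilities and the threshold below are simulated placeholders, not his client's data.

```python
import numpy as np

rng = np.random.default_rng(7)

# Predicted stockout risk per store/SKU, e.g. taken from a probabilistic forecast.
p_stockout = rng.uniform(0.0, 1.0, size=1_000)
stocked_out = rng.uniform(0.0, 1.0, size=1_000) < p_stockout   # what actually happened

# The decision: push a replenishment unit when the predicted risk is high enough.
predicted = p_stockout > 0.5

tp = np.sum(predicted & stocked_out)
fp = np.sum(predicted & ~stocked_out)
fn = np.sum(~predicted & stocked_out)

precision = tp / (tp + fp)   # of the units pushed, how many were really needed
recall = tp / (tp + fn)      # of the stockouts that occurred, how many were pre-empted
print(f"precision={precision:.2f}, recall={recall:.2f}")
```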
Joannes Vermorel: There is another benefit if you start looking at decisions. For example, the demand forecast: the problem is that depending on how you operate, there might not even be any good answer—any accurate answer.
An example: you have your fashion retail network. At the end of the collection, at the end of the season, they have the opportunity to do discounts to liquidate the current collection.
So if you say, “I forecast this level of demand,” the question is: at which price point are you forecasting?
There is the current price, but there is also the option of doing a discount. Now you have to look at the various strategies.
If I keep the unit in the warehouse until the end of the season, I will have to push it somewhere and do a discount.
If I push it into the store, I may have the opportunity to sell it before the end of the season, before I have to do a discount.
But maybe this particular store, in terms of market power when it comes to discounts—maybe the clientele locally is very poor and unresponsive to discounts for this specific store compared to the others.
So you see, it touches several things—your local versus global point—but what I wanted to point out is that if you think of the future as something static, you remove all the agency that you have to shape this future as you walk the path into it.
What I’m saying is that the problem with the forecasting-first mindset is that it removes all the agency that the company might have, because you’re essentially saying: this is the plan, this becomes a commitment, and this is what we’re doing.
Instead of thinking: this is the right decision, and I keep many options available to accommodate this decision that I’m taking right now.
I don’t need to be committed beyond the decision. The decision that I’m taking is my only commitment. The rest remains open.
That’s the problem with forecasting: it tends to lock the company into a trajectory that is completely blind to any optionality.
What if you want to do a transfer later on between stores? Maybe that’s an option, maybe it’s not. But if you think in terms of decision, that comes very naturally.
If you think in terms of forecasting—especially time series forecast—this is something that is almost impossible to express.
Patrick McDonald: My current client actually is doing some of that. They have a transfer module, so they look at: okay, do I transfer this particular variant from one store to another because it’s more likely to sell there?
We also were doing some analysis on the pricing. We did a basic price elasticity analysis, looking at it and saying: okay, if I do a discount, am I going to get more volume, or am I just giving away margin?
We’re starting to be able to answer that question.
I love that you talk a little bit about whether it's dynamic or kind of a static thing. That's number four of my seductive six that we think about all the time: we look at everything as a static equilibrium problem, and most things are dynamic.
So absolutely, I think those are critical kinds of decisions and we need to take that optionality into account.
I think the other area that we don’t think about enough—and even I haven’t gotten to this yet because the problem is fairly complicated, and I know there are some other people that have worked on it—is there’s a lot of uncertainty in terms of delivery time.
Your suppliers, particularly if you’re in manufacturing and you have a whole bill of materials that you’re waiting on: you have supply coming in and you have to wait on that.
Sometimes they deliver on time, sometimes they don’t. There’s uncertainty there that needs to be modeled.
I know there are folks that do a pretty good job of doing some of that, but I think we have more capability from a calculation perspective now to be able to look at that at a little more depth.
So I think there are opportunities even on the supply side in terms of understanding what that looks like and where it goes.
Joannes Vermorel: Yes. I even have lectures, by the way, on probabilistic lead time modeling on YouTube.
We have built technologies for quite a few years now to combine many sources of uncertainty, and that’s why typically the classic accuracy falls flat on its face.
It’s very difficult due to those asymmetrical concerns. You’re combining many uncertainties, each one having their own asymmetries that can be quite counterintuitive.
They can combine. For example, lead times tend to have… most lead time distributions are bimodal. You have a mode, a peak, for the normal expected date when everything goes smoothly, and then you have a tail that is super long when things do not go according to plan.

It's even very frequently a distribution that doesn't have an average, because some stuff never gets delivered. So mathematically, there isn't even an average. It's a bit strange.

In addition to that, the case where the delivery times tend to completely go off the charts is when you place a bigger order, which is unsurprising. You place a big, unusual order, and then your supplier struggles.
So you have correlations.
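The lead-time shape Joannes sketches can be mimicked with a simple mixture: an "on time" mode, a long exponential tail for late deliveries, and a small mass of orders that never arrive, which is enough to make the average meaningless. All parameters here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Mixture: ~85% delivered around the nominal 30 days, ~13% badly late, ~2% never.
kind = rng.choice(["on_time", "late", "never"], size=n, p=[0.85, 0.13, 0.02])
lead_time = np.where(kind == "on_time",
                     rng.normal(30.0, 3.0, n),         # the mode when all goes smoothly
                     30.0 + rng.exponential(60.0, n))  # the long tail when it does not
lead_time = np.where(kind == "never", np.inf, lead_time)   # some orders never arrive

print(np.median(lead_time))           # still informative
print(np.quantile(lead_time, 0.95))   # the tail, where the money burns
print(np.mean(lead_time))             # inf: "the average lead time" does not exist here
```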
What I’m saying is that if we go with the classic supply chain paradigm that thinks in terms of accuracy, with percentages all over the place, when you combine all those effects you realize that what can cost you money is absolutely not obvious from a percentage viewpoint.
You may end up with very dumb things. For example, if you have a product that is fresh and perishable and that you’re going to sell at Christmas—let’s say oysters—it’s incredibly time-sensitive.
If you miss Christmas and New Year, you’re toast. You will be selling whatever you have at 80% discount, and that could be the best case.
It's not always an alignment of the planets hurting you that way, but very frequently you have plenty of edge cases. You have a forest of edge cases, where so many products have their own edge cases.
That’s why I was going back initially to say we need to tie everything to dollars at the end—or euros.
Because when you combine those uncertainties, you realize that the weakness of your predictive model can be very counterintuitive. It might be things that, on the surface, a statistician would say, “Oh, it looks fairly accurate and well calibrated,” but you realize that you end up with problems.
An example: if you rerun your logic every day to know if you want to place a purchase order from, let's say, China, and one day out of 100—so a 1% chance—the thing spikes just due to numerical instability, it means you will be placing something like three orders per year with your supplier in China just because of the numerical instability of the model.
Today it spiked and then you’re chasing like a ghost. It’s a numerical artifact that was just the numerical instability of your model on that day.
Accuracy-wise, if these sorts of problems only happen 1% of the time, they will not even appear in your average accuracy because they will be completely dwarfed by other stuff.
That's what I'm saying: weaknesses of your predictive model need to be assessed in dollars. Otherwise you have things that look insignificant for most percentage-based metrics, but once you look at them in dollars you realize: "Oh crap, this thing that looks small, in fact it's not small, it's big," because I forgot about the ratchet effect, for example, on the purchase orders placed with this supplier in China.
Patrick McDonald: We’ve begun to kind of think about that. Taleb talked about it first in his book The Black Swan. I’m sure you’re familiar with that, right?
We now have gray swans and black swans. I have what I call purple swans, which are really weird-looking distributions that happen because of certain things that are pretty unique. They’re those edge cases.
They don't fit a standard distribution. They're not describable by a simple formula. I have to use a real probability density function, stored as an array of values, in order to describe them.
The first example I had of that was: we were doing a forecasting solution for a proof of concept for an aerospace company, and they asked me to calculate safety stock. That was something I hadn’t done before.
I looked at it a bit differently. I leveraged Sam Savage’s work out of Stanford with probability management, and he’s got a little Excel tool that allows you to basically do calculations with probability density functions.
So I did something really simple. I took the prediction interval for the forecast and I used just a basic one, so it was just a normal distribution.
It was the first time I actually did this mapping of value of information model all the way through and said, okay, based on that, how do I set my safety stocks?
I started looking at it. I was looking at an individual time series and: normal distribution, normal distribution, normal distribution. All of a sudden I see this one I’ve never seen before, and it looks like this—boom.
What the heck is that?
I figured it out. What I had done was I had in there a function where I said: my sales are the minimum of my stock level or demand, right?
The demand was a normal distribution. Stock level was right here. And if demand exceeded stock level, the distribution—the tail—pops up. Well, that’s just your stockout risk.
I was able to see that 11 months in advance by looking at that probability distribution.
So that’s an example of kind of a purple swan.
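The "purple swan" Patrick describes falls out of one line of arithmetic: if sales are the minimum of stock and demand, all the demand scenarios beyond the stock level pile up as a spike at the stock level in the sales distribution, and the size of that spike is exactly the stockout risk. Assumed normal demand and invented numbers below.

```python
import numpy as np

rng = np.random.default_rng(3)

stock_level = 120
demand = rng.normal(100.0, 20.0, size=100_000)   # forecast density for a future month

sales = np.minimum(stock_level, demand)          # sales cannot exceed what is on hand

print(f"P(demand > stock): {(demand > stock_level).mean():.1%}")               # the pop-up tail
print(f"mass at the stock level: {np.isclose(sales, stock_level).mean():.1%}")  # same number
```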
In terms of your point, when we’re chasing that accuracy, the thing that so many of us do is we’ll take that accuracy metric in isolation. We calculate a prediction interval, but we throw it out and we don’t use it.
Using that and putting it into simulation is where you start to actually see those edge cases manifest and can get a better understanding.
Joannes Vermorel: Yes, absolutely. For aerospace, it’s very interesting because we have done so much.
A few examples: accuracy locks you into the time series perspective, which is extra wrong in this case.
One of our first discoveries when we were working in aerospace was to discover the concept of retrofits. You have parts that are being demanded because you need a repair, so a part has to come in.
But then you realize that you have retrofits, which are parts where the OEM says: you need to push those parts—those new parts—as replacement to the aircraft because we don’t trust the old parts anymore.
So in your time series you’re mixing, in fact, two different types of units: the ones that are pulled for repairs, and the ones that are pushed for retrofits.
But that’s not the end.
Another element: when we look at the demand for aerospace, very frequently the aircraft needs to complete the repair within, let’s say, eight hours for the small maintenance.
As a result, the crew are going to ask for a lot more parts than what they actually need, because they will have only eight hours to do the repair.
So they would say, “We need 100 of those parts,” but then you will have, the next day, a massive amount of parts being returned, but not used.
So you need to understand your demand signal.
Those things are not super complicated, but they need to be taken into account, and you need to have the aerospace-friendly perspective to really understand: okay, what are they trying to do?
They are trying to repair a plane. They have concerns for the timing of their operation. So they need to ask a little bit more and they will return a lot of things.
Some parts are actually demanded by the crew. Some are actually pushed by the OEM.
So we have nuance here to take into account, etc.
That’s why all those ideas—unlike percentages—come from understanding how do you get an aircraft to be repaired in the first place. It’s a different domain of knowledge.
My message here when it comes to those indicators is that, as a rule of thumb, you have to be extremely suspicious of any indicators that come right from the world of mathematics—pure mathematics—as opposed to something that is really driven by a very precise understanding of what it is that you’re trying to achieve on the ground.
Unfortunately, most of the KPIs that I see come from a lot of mathematics, usually because they are so much easier to define.
If we go back to aerospace, it’s a kind of crappy situation where the crew say, “I want 100,” you have only 80, but in the end they return 30.
Did you satisfy the demand? Yes, no, yes?
The crew was very tense because they thought that they were maybe going to lack, but in the end they didn’t lack.
So that’s the sort of thing where, suddenly, we are into the nitty-gritty detail of understanding the situation, as opposed to mean square error versus MAPE versus absolute error versus etc.—all the theoretical criteria.
Conor Doherty: Well, gentlemen, on that note, I think we have thoroughly denounced what you would both see as the mainstream approach to tracking performance.
But what remains a little bit fuzzy is what we’re proposing as an alternative.
So, for example, to come back to Patrick: you said, again, chasing forecast accuracy in isolation is a fool’s errand. Let’s grant that for the sake of discussion.
Okay, but then what are we supposed to be tracking? Saying “let’s just track money,” that’s a bit unclear to people.
So what is the actual thesis that we’re proposing—or you’re proposing—to replace the traditional KPIs?
Patrick McDonald: Right. I tend to do what I call value of information analysis.
I want to simulate that cash flow in terms of the decisions that we’re actually looking at and how we’re making them.
To do that, you really have to have a clear understanding of that value of information model.
What’s the cost of not making the sale if we’re going to stock out? What is my holding cost associated with it?
What does that uncertainty look like, whether it’s a forecast or a convolution with delivery times or whatever it is in your simulation, to model that?
Try to track that from: here’s my insight, here’s the decision I’m going to make, this is the lever I’m going to throw in the business, this is where I’m going to set my positioning, this is where I’m going to set my effort.
What are the outcomes I expect?
Do that simulation—do the Monte Carlo analysis—and look at what the probability distributions of the outcomes are going to be.
Then measure your actual versus predicted outcomes.
That’s the approach that I’ve taken. It tends to work fairly well. It does require a bit of additional sophistication from your business executives.
Business executives tend to want to think of things in very linear fashion and somewhat simplistic. That’s another one of my six things that I’m concerned about.
But that’s the approach that I take, and that’s what I would really recommend: think about that value of information model and how you apply it.
Now, the traditional forecasting metrics—are they still helpful from a statistics perspective? Yeah, they can be.
I tend to try to weight them. I look at, like I say, a weighted precision and a weighted recall; a weighted pinball metric, I think, is much more useful than a plain accuracy metric.
Pinball allows you to look at accuracy across that whole demand profile—that density function.
If you put it at the right position and put a value-of-information skew on it, and weight it properly, it can give you some insight towards that value analysis that you’re looking at.
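For reference, the pinball (quantile) loss Patrick mentions is asymmetric by construction: at the 0.9 quantile, an under-forecast (the stockout side) is penalized nine times more than an over-forecast. The demand figures below are invented.

```python
import numpy as np

def pinball_loss(actual, forecast, quantile):
    """Quantile (pinball) loss: asymmetric scoring of a quantile forecast."""
    diff = actual - forecast
    return np.mean(np.maximum(quantile * diff, (quantile - 1.0) * diff))

actual = np.array([80.0, 120.0, 95.0, 300.0, 60.0])   # realized demand (illustrative)
forecast = np.full(5, 100.0)                          # a flat quantile forecast

print(pinball_loss(actual, forecast, quantile=0.9))   # punishes missing on the high side
print(pinball_loss(actual, forecast, quantile=0.5))   # symmetric, equals half the MAE
```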
These are things that I’m really only just now implementing. I’m in the process of learning as I’m doing this stuff.
It’s been over the course of, like I say, 30 years of work, and every day I go into the office, I learn something new.
So that’s where I’m at. That’s the approach that I’m currently taking, and it appears to be having some real impact for some of my clients.
Conor Doherty: Thank you, Patrick.
Joannes, same question. I presume you’re coming at it from a very concrete financial perspective as well.
Joannes Vermorel: I would say there are really two broad sets of indicators that we typically generate and monitor, for completely different purposes and different audiences.
The first audience would be supply chain management—supply chain executives. For those, it will be essentially economic drivers.
We say: we want to maximize the return on investment, so the rate of return of every decision. But we need to decompose that into a series of impacts: cost of inventory, expected margin, expected cost of inventory write-off, stockout penalty.
So we do this decomposition.
Within this realm, we have indicators that are forward-looking—so they are dependent on the predictive model—and those that are purely descriptive: they just look at what happens.
Here we're talking about maybe, I would say, a dozen indicators max. Some are purely descriptive—descriptive statistics—and some are conditional on the correctness of the predictive model.
For example, if I tell you this inventory carries a risk of inventory write-off of this many dollars, I have not observed this inventory write-off yet. So it is a number that I’m building thanks to a predictive model of some kind.
That’s for the audience of executives and practitioners.
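A stylized sketch of the decomposition described here for the executive-facing indicators: the score of a candidate decision is expressed in currency, as expected upside minus the expected costs and penalties. Every driver name and number below is illustrative, not an actual recipe.

```python
def decision_score(expected_margin: float,
                   expected_writeoff: float,
                   expected_stockout_penalty: float,
                   carrying_cost: float) -> float:
    """Euros expected to be gained, net of the euros put at risk, for one decision."""
    return expected_margin - expected_writeoff - expected_stockout_penalty - carrying_cost

# Forward-looking drivers come from the predictive model (e.g. the Monte Carlo
# sketch earlier); descriptive drivers are simply observed. Numbers are made up.
print(decision_score(expected_margin=450.0,
                     expected_writeoff=60.0,
                     expected_stockout_penalty=30.0,
                     carrying_cost=25.0))
```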
Then we have a second set of indicators that are typically relatively monstrous. It can be hundreds of numbers for the data scientists themselves.
Here we are typically looking for the scent of something going wrong in the whole chain of processing, because we are ingesting data. It's typically a haphazard process where we have dozens and dozens of tables extracted from ERPs—potentially multiple ones—a WMS, a CRM.
So we consolidate a lot of stuff from the applicative landscape of the company, and there are so many things that can go wrong.
For example: what if, from one day to the next, you suddenly have 5% more suppliers? Is that significant or not? Did someone introduce duplicates or not?
What about 20% more suppliers? Okay, 20%—it’s probably a duplicate or probably a bug.
So you have to monitor plenty of things that are issues that can creep along your data processing pipeline.
Pretty much everything: you will monitor number of SKUs, number of suppliers.
Sometimes you try to identify things that are close to invariant, which you can use to detect that there has been a bug in your data pipeline.

Example: an e-commerce client of ours selling car parts. We noticed years ago an extremely stable figure of two and a half parts per basket.

It was incredibly stable—during the summer, the winter, Christmas—all super, super stable: two and a half parts per basket.
That means we have something super stable. The business might be fluctuating a lot, but in fact this thing is very stable.
Which means that if we see a deviation on this, it probably means we have a bug in the pipeline: for example, order lines have been dropped so that we only have the first order line of every order, or something nonsensical like that.
So the supply chain scientist will compose a dashboard, but here it’s absolutely not really value-driven. It is, from our perspective, what we call insanity-driven.
You want to keep an eye on all the stuff that can literally wreak havoc on your calculations and completely undermine your models. It can be a lot of dumb things.
It can even be, for example, the ratio of letters and numbers in product description labels. If you have labels that suddenly are only made of numbers, most likely you don’t have the correct labels anymore for the description.
There are plenty of heuristics that are just there to make sure that the data you process automatically at scale is still sane.
That is only of interest to what we call the supply chain scientist, because the supply chain scientist wants to make sure that every single day, decisions at scale have 0% insanity.
I define insanity as something where anybody would look at those decisions and say, “An insane decision—oh no, that’s crazy, you should not be doing that.” Something went wrong somewhere in the data pipeline.
For us, it’s very important to make sure that this number of insane decisions is zero.
We can't guarantee accuracy, but we can eliminate the gross, gross insanity.
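A minimal sketch of the invariant-style sanity check described here: watch a historically stable ratio, such as order lines per basket, and refuse to run the pipeline when it drifts. The column names, the 2.5 figure, and the tolerance are all assumptions for illustration.

```python
import pandas as pd

def check_lines_per_order(order_lines: pd.DataFrame,
                          expected: float = 2.5,
                          tolerance: float = 0.3) -> None:
    """Raise if a normally stable data invariant suddenly drifts."""
    lines_per_order = len(order_lines) / order_lines["order_id"].nunique()
    if abs(lines_per_order - expected) > tolerance:
        raise ValueError(
            f"Lines per order = {lines_per_order:.2f}, expected about {expected}; "
            "possible pipeline bug (e.g. order lines dropped during extraction)."
        )
```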
Conor Doherty: Patrick, anything to add before we push on?
Patrick McDonald: I just real quick want to say: over the course of my career in data science, data accuracy and data quality have always been an issue.
I’ve seen weird things—negative inventory levels. How do you have a negative inventory level? That doesn’t make any sense.
I had one client—a big FAANG client—where the director of data centers would walk into a brand-new data center, pull up his power report, and it would say no power was being drawn.
He looks around and of course all the machines are going and the lights are flashing, so he knows the report’s wrong. What happened?
So yeah, a lot of things can happen in the data processing perspective.
I think statistical process control is something that can be used to kind of do some of that. Sounds like that’s what you guys are using, and it’s the right way to handle it.
If you don’t have good data, garbage in, garbage out still applies. With big data, that means you’re going to have a lot of garbage. So you have to deal with it.
Joannes Vermorel: Yes, absolutely.
Sometimes the most positive thing is that the data is actually correct—just in a very, very weird way. For example, SAP decided 30 years ago that returns would be counted as negative sales.
So that means you have days where you have negative sales. It just means that you had actually more returns than items being sold.
If you go for e-commerce in Germany, where the percentage of return is like 40% of your items being shipped are being returned, you will have tons of negative sales.
But that's very important information—except it's not a negative sale, it's a return.
So that’s the sort of things.
But I agree: data is extremely messy, and it’s very important to make sure that it stays under control.
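The returns-as-negative-sales convention mentioned above is typically handled by splitting the signal back into two series before modeling, along these lines (the column names are assumptions):

```python
import pandas as pd

# Raw extract where returns show up as negative quantities on sales lines.
raw = pd.DataFrame({"sku": ["A", "A", "B"], "qty": [5, -2, -1]})

sold = raw["qty"].clip(lower=0).groupby(raw["sku"]).sum()         # true units sold
returned = (-raw["qty"]).clip(lower=0).groupby(raw["sku"]).sum()  # returns, kept as their own signal

print(sold)
print(returned)
```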
Conor Doherty: Great.
Well, gentlemen, I do have a closing question, but I’m going to push that to the end and instead prioritize some of the audience questions.
These are pulled from some direct messages questions on this thread, but also an interesting question from last week that I think applies to today quite well.
So I’ll start with a question from Miguel Lara. This is to the panel. Patrick, I’ll go to you first.
Is there any KPI—or are there any KPIs—that don’t necessarily affect financial results but you still would consider high impact, or of some utility?
Patrick McDonald: Yeah.
I think a lot of the standard metrics that we do look at are still important to look at.
MAPE is an important thing when you’re looking at and trying to understand how well a forecasting model is going to work.
Your MAE or your MASE are also things that I will look at and use.
It’s not that those aren’t important metrics and that we don’t think about them, but they’re not the most important metric, and they’re not the ones that we should chase.
I think that’s kind of the perspective that I would have.
Understand what the metrics are telling you and use them for their designated purpose.
Don't make that "and then a miracle happens" assumption, applying those metrics in ways in which they were not intended and assuming you're going to get financial value out of it.
So I guess that’s my answer to that. Fit-for-purpose is something I try to keep in mind in everything that I do, and that’s what I would recommend for a lot of those metrics.
Conor Doherty: Thank you.
Joannes, same question. Do you need me to repeat?
Joannes Vermorel: I think for Lokad, the most important non-financial metric is the number of manual overrides per day for decisions that should be automated. That’s the number one thing.
So it’s not financial, but for us anything that is above zero is a problem, and we see that as a defect.
The problem is that when we have a defect, it means that our model is kind of wrong. If it’s wrong, then we can’t even trust the economic modeling that we have.
By definition, if people are doing those overrides, it means that they see things that we don’t.
So all those economic measurements that we have might be completely undermined by this proof of incorrectness.
So for us, that is the number one: those manual overrides, because whenever they happen it means that there is something that we are getting wrong in the model itself, and thus potentially it can undermine all the economic analysis end-to-end.
That's why we need to pay a lot of attention. We treat that as a bug. Unless we understand the bug, we don't know how deep the rabbit hole goes.
Patrick McDonald: I found that to be true too.
I guess the question I would have is: is that always the case? Have you found that almost all those overrides are wrong?
Or are there cases where somebody comes in and overrides and, yeah, there was a problem with the model and we figured it out and fixed it?
Joannes Vermorel: First, the thing is that initially, when we start a project, very frequently most of the overrides—unfortunately for us—are real. There is stuff that we have missed, stuff that we did not understand.
So when we get started, most of those overrides usually reflect stuff that we didn’t know.
For example, when we started in aviation, that was like a decade and a half ago, I didn't know what a retrofit was, or how important retrofits were.
We had a lot of people tweaking numbers, and at some point I said, “What is going on?” They told us it’s a retrofit. I said, “What?”
Then we went back. The guy who did the modeling was very ignorant on something very important.
But then over time it can be anything. It can be a new guy who is overriding for no reason whatsoever, just because he was used to overriding stuff in his previous company; he comes here and says, "Oh, I should override." Well, actually, it doesn't make any sense.
Over time, the percentage of overrides that point to something real diminishes a lot.
But it's a little bit like when you're a software company: you have bug reports from users. You have a lot of rejects. People will say, "Oh, it's buggy"—no, you just misunderstood the feature; it's really the intended behavior.
Nevertheless, we tend to monitor those very carefully, because, as Nassim Taleb would point out, the occasional ones that prove to be correct can be very significant.
So even if you say 99% are like "the user is wrong," maybe that 1% turns out to be very impactful.
So that’s why we keep an eye.
For a mature project, the vast majority are essentially noise, but a few are not, and those are the ones we are really concerned about.
Conor Doherty: All right. Thank you.
This next question was posted by— I hope I’m pronouncing it correctly—Lucio Zona. Now this was on the safety stocks discussion from last week, but it’s very relevant.
So I need to read it, and then there’s two questions.
A little bit of context: Lucio pointed out that most supply chain managers' bonuses are tied to KPIs, for example on-time-in-full. As a result, no one really gets fired for excess stock, only for bad on-time-in-full.
That naturally pushes—or possibly incentivizes—people to inflate inventories.
In theory, we could size safety stocks using true economic costs per unit short, per stockout event, but those numbers are fuzzy. So companies default to simple KPIs.
Now the two questions: to whom do KPIs actually matter, and how should incentive structures be designed so that those metrics or KPIs don’t basically reward people gaming the system?
Patrick, I’ll go to you first.
Patrick McDonald: It’s a difficult question. I know. No, it’s not. It’s a simple question that aggravates the heck out of me.
The amount of incentives in conflict in major organizations that I’ve been in is mind-numbing.
That’s one of the biggest challenges that I have.
A whole different topic.
I have a different set of overall metrics that I think organizations should be looking at in terms of how they run their businesses.
Metrics in conflict, particularly the ones that incentivize certain behaviors that are in conflict, is problematic.
That’s part of why we have sales and operations planning. Operations and supply chain have their own problems for many reasons.
Not the least of which is that everybody has what they call a forecast, and there are at least seven different forecasts in each organization.
One’s a supply plan, one’s a demand plan, one’s a statistical forecast, one is the sales targets, one’s the marketing plan, and one’s a financial plan. They’re all called “the forecast.” They don’t reconcile.
Then the organization wonders why they’re out of alignment.
Part of the reason is you go back to the individual metrics that people’s bonuses are based upon that incentivize their behavior.
So yes, that is a huge problem. Fixing it is not a trivial challenge. It’s a big management consulting sort of work—or big management problem—that needs to be dealt with.
It can be dealt with, but it’s going to take some real leadership inside an organization to make that happen.
So does that cover the first part of the question? What was the second part? Did I miss anything there?
Conor Doherty: KPIs matter to who—or to whom do KPIs actually matter? And then how do you incentivize people not to game the system?
Patrick McDonald: You make sure your metrics are in alignment and they’re not in conflict. That’s how you do that.
Conor Doherty: All right. Thank you.
Joannes, same questions.
Joannes Vermorel: My approach will be a little bit different.
There is a corporate law that says: any good metric that is established as a target ceases to be a good metric.
My take is that this idea that you can incentivize people on metrics is infantilization of your workforce. It’s always going to backfire.
People will always game the metric. It’s just a disaster waiting to happen.
It is a sort of idea that looks good but has invariably disastrous outcomes over time.
It doesn’t matter which metric. People think, “Oh, if we pick the right metric, this time it will not be gamed. This time it will be good. This time it will be aligned with the long-term interest of the company.”
Turns out: no, it won’t.
I don’t know how, but give it a few months to your team members and they will find a way to make your life hell and your company less profitable by gaming the metric.
Again, this is one of the greatest strengths of humans. Humans are ingenious. That’s a good thing—they are ingenious. So stuff will happen. Stuff will happen and it will go wrong.
My suggestion is: give up on this naive rationalism. For me it’s naive rationalism to say, “Oh, what I only need to do is to have a very clear metric, and then people will maximize this metric, it will align with long-term interest, problem solved.”
“I don’t have to manage people, I just have to let the incentive do the work for me as a manager.” That is, for me, a very infantile view of human nature. It doesn’t work like that.
Incentives are very powerful, but—Taleb, by the way, discusses this a lot in his books, not only The Black Swan but also Antifragile—humans are creatures that look at second-order, third-order, fourth-order effects.
People will do something because they think their next employer will think better of them, and this and that and that.
So they can do things with a long plan that completely exceeds your expectations.
Approaching your staff with metrics is like first-order reasoning. You don’t take into account second order, third order, etc.
Bottom line: if you have those incentives in your company, the first priority should be just to remove them entirely—all of them.
The only things that are kind of acceptable are things like stock options. You grant stock options, they vest over five years. Fine. It's close enough to the long-term interest of the company, and that's it.
Keep it super simple and unspecific.
Then you have to appreciate that when you start to do optimization—economic optimization—the half-life of your objective function is like one week.
Your objective function is not something that you put on a pedestal and say “this is it.” This thing will evolve a lot.
For example, when the Trump administration decided that tariffs could be changed like five times a day, then suddenly the game was changed.
There have been, what, 400 tariff updates in the US since… we’re all being very antifragile right now. We have to be.
So that’s the sort of thing: new rules, the objective function is moving.
Now we would have to introduce a probabilistic forecast of the tariff distribution to anticipate it, because nobody can really be in the head of the president.
So apparently all you can have is a probabilistic forecast of where the tariff will be, and it will be between 0% and 200%, apparently.
My take here is: understand that because the economic function is evolving rapidly—not always as rapidly as the tariffs in the US, but still rapidly—the problem is that the metric that you have for your teams is something that is going to be revised once a year, every two years.
You’re going to drive everybody nuts if you change the way they earn money every month. That’s just crazy.
The reality is: for psychological safety, people need something that will give them some kind of at least 12 months of projection where they think, “This is what I earn.”
But supply chain requires you to think sometimes in the matter of days and to be very reactive. Those things are not compatible.
That’s why I say: remove those incentives. In practice, it will backfire if only because you need to update your economic functions much more rapidly than you can revisit your packages that you offer to your employees, even to your executives.
Conor Doherty: Cool. Thank you.
Two points there. One, I think I said "Godwin's law" earlier—that's a very different thing. Goodhart's law is what I meant to say. Those are two very, very different phenomena.
Actually just some comments coming through, basically just agreement. Miguel Lara pointing out: if you create a KPI that doesn’t drive any impact, it’s basically just extra work and has no real value at the end of the day.
Joannes Vermorel: And by the way, large corporations only think in terms of adding stuff—never subtracting.
So whenever there is an indicator of any kind, whether it’s a key indicator or a performance indicator or whatever, it’s just going to be added to the pile.
Fast forward a decade and you have what I call the wall of metrics, which is the 100 numbers that nobody reads.
And yet it still freezes the business intelligence tool every first day of the month, because they have so many performance indicators to compute.
They have like 100 indicators and the whole BI instance is frozen for the entire day.
I have even seen companies where, on the first day of the month, they literally have to stop certain operations because the ERP is half frozen.
So they would pause a plant to let the ERP do the reporting.
Conor Doherty: Well, gentlemen, we’ve been going for just over an hour. I believe, Patrick, you’d mentioned that as a possible out.
However, by way of summation: Miguel has commented— I’ll put this as a summary, and you just give me your closing thoughts relative to this.
Just because there are many KPIs doesn’t mean you should track all of them at once. You should select only the KPIs that really matter depending on your approach and current focus.
In reverse order, your closing thoughts relative to that. Joannes?
Joannes Vermorel: I would say yes and no.
The problem is that this is the common wisdom in companies today. If you take it at face value, well, let's see where this common wisdom got us.
In most companies, you have a wall of metrics. You have literally dozens, if not hundreds, of indicators that nobody cares about.
It is completely opaque. Nobody even understands.
Usually you take the average supply chain practitioner: they have a screen with 15 numbers, and for half of them they can't even tell you what is actually being computed.
They would say, “Oh, this thing, I think it computes the average stock level over the last 30 days. Maybe. I’m not sure.” Or something like that.
So I would say: this is a common wisdom, but my approach would be much more aggressive.
We know that larger organizations tend to accumulate bureaucracy and crap much more easily than they get rid of it.
You have to double down on purging—on being merciless—and purge your process from all those numbers that are not critical.
For typical supply chain executives, what we are thinking is: there is a quota of, let's say, 10 numbers. If you want a new number in, you have to remove one.
You need to maintain that.
Then if we go to the data scientist side, I say you can have hundreds of numbers, but then it's the numerical recipe of the data scientist—the supply chain scientist—who is doing their own thing.

The rest of the organization is not involved. That's okay, because it's not a tax that the rest of the organization has to pay; it's only a tax on the scientist himself or herself, not the rest of the company.
So that common wisdom: yes, it looks reasonable, but again, beware.
Fast forward a decade, my observation is that usually you end up in a very bad place by following this common wisdom.
Conor Doherty: Patrick, your closing thoughts. How do you feel about that?
Patrick McDonald: If you have more than five, you have too many.
I’m much more strict. I have five specifically that I look at, and they all tie to financial statements.
Number one is a market share metric. I call it brand equity.
If your sales are growing but the market’s growing faster, you’re still losing market share. So don’t just track sales, track where you’re at in the marketplace.
I want to make sure that we’re reliably meeting customer demand, right? So there’s a reliability metric.
I want to make sure that we’re effectively allocating our resources. So there’s an effectiveness metric.
I hate to talk about efficiency because efficiency always ends up being local optima. So I talk about productivity instead.
Then the last one is one I’m still working on. I still don’t have a really good metric for it, but it’s agility: how quickly can you respond to the rapid changes we see in the marketplace on a daily basis.
If you cover those five, I think that covers most of what you want to understand from a business perspective.
Of course, there’s different ways to look at those, but those are kind of my five. They don’t have a lot of overlap, and they get to the key things that go to the financials.
Conor Doherty: All right.
Well, gentlemen, thank you very much. I don’t have any more questions. We’ve addressed everything in the live chat and I think we’re just about out of time.
Joannes, as always, thank you very much for joining me.
Patrick, really appreciate you joining us remotely and for your time. Your insights were really great.
Patrick McDonald: Pleasure. Hopefully we’ll be able to do it again sometime. Thank you.
Conor Doherty: And to everyone else watching, thank you very much for attending—for your private messages, your comments.
If you want to continue the discussion, as I always say, reach out to Joannes, me, and Patrick. We’re always happy to talk to new people.
And yeah, that’s it. Thank you very much. We’ll see you next week, and get back to work.