ISF2024 Panel: Demand Planning & Human Judgement in the World of AI

July 17, 2024


00:00:00 Opening remarks by Robert Fildes
00:01:08 Conor Doherty introduces the panel and topic
00:03:11 Nicolas Vandeput’s perspective
00:06:16 Sven Crone’s presentation
00:10:34 Alexey Tikhonov’s perspective
00:15:01 Need for automation in decision making
00:20:13 Sharing information between humans is a waste of time
00:25:29 Perspective on human intervention
00:30:23 Evaluating a forecast
00:35:18 Financial perspective and decision making
00:40:14 Cost of forecasting errors
00:45:43 Automation and trust
00:50:27 Augmented AI and its applications
00:55:03 Impact of AI on human translators
01:00:16 Importance of clear vision in AI implementation
01:06:00 Closing thoughts and future of demand planners
01:11:50 Audience question: forecasting for hospitals
01:15:38 Audience question: model and human bias reduction

Panel background

The panel was first proposed by Robert Fildes (Professor Emeritus, Lancaster University) in response to Conor Doherty’s article critiquing forecast value added (FVA). This article was republished in the Q2 2024 edition of Foresight (produced by the International Institute of Forecasters, the same organization running the symposium). The panel was subsequently expanded to include Sven Crone, Nicolas Vandeput, and Alexey Tikhonov, in order to provide a more balanced array of perspectives from both academia and industry.

Summary of panel discussion

Filmed in July 2024 at the 44th International Symposium on Forecasting in Dijon, the panel of four speakers discussed “Demand Planning and the Role of Judgment in the New World of AI/ML”. Moderated by Lokad’s Head of Communication, Conor Doherty, the panel included Alexey Tikhonov (Lokad), Sven Crone (Lancaster University & iqast), and Nicolas Vandeput (SupChains). The discussion revolved around the integration of AI in demand planning, the value of forecasting in decision-making, and the future of demand planners. The panelists shared differing views on the role of human judgment in demand planning, the potential of AI to replace demand planners, and the importance of accuracy in forecasting.

Extended summary

The 44th International Symposium on Forecasting in Dijon, France, hosted by the International Institute of Forecasters, featured a panel discussion on “Demand Planning and the Role of Judgment in the New World of AI/ML”. The discussion was moderated by Conor Doherty from Lokad, and the panelists included Alexey Tikhonov from Lokad, Sven Crone from iqast, and Nicolas Vandeput from SupChains. The session was introduced by Robert Fildes, Professor Emeritus at Lancaster University.

The discussion began with Nicolas Vandeput outlining his vision for demand planning in the age of machine learning. He proposed a four-step process that included viewing demand forecasting as an information game, creating an automated machine learning demand forecasting engine, allowing human demand planners to enrich the forecast with information not included in the model, and tracking the added value of everyone involved in the process.

Sven Crone shared his experience in AI and forecasting, noting the slow adoption of AI in demand planning. He discussed the complexities of integrating AI into demand planning processes and suggested that AI could potentially replace demand planners in the future. However, he also highlighted the heterogeneity of forecasting, with different industries requiring different approaches.

Alexey Tikhonov argued that demand planning is an obsolete approach and that judgmental forecasting interventions are wasteful. He advocated for probabilistic forecasting, which captures the structural pattern of risk, and criticized demand planning for its lack of economic perspective and automation. He also argued for complete automation of the decision-making process in supply chains, stating that the complexity and scale of decisions required in supply chains necessitate this.

The panelists also discussed the value of forecasting in decision-making. Nicolas Vandeput emphasized that forecasts are made to facilitate decision-making and argued for models that can process as much information as possible. He also suggested that when evaluating a forecast, he would focus on forecasting accuracy rather than business outcomes, as the latter can be influenced by many other factors beyond the control of the forecaster.

Sven Crone discussed the industrial perspective of demand planning, emphasizing the importance of long-term strategic decisions and scenario-based planning. He also highlighted the challenges in measuring value add and the importance of judgment in the process.

Alexey Tikhonov questioned the value of a more accurate forecast if it does not lead to a different decision. He argued that the value of a decision does not solely depend on the forecast, but also on other factors such as decision drivers.

The panelists also discussed the trust in forecasts, with Nicolas Vandeput suggesting that the only way to build trust in a forecast, whether it’s generated by a human or a machine, is to track the accuracy of every single step in the process. Sven Crone agreed that trust is important and suggested that a combination of AI and simple, transparent methods could be used to automate parts of the process.

The panelists also discussed the future of demand planners. Sven Crone believes that demand planners will still have a role in the future, but they will face increasing challenges due to the increasing frequency of decisions and the growing amount of data available. Nicolas Vandeput sees the role of demand planners evolving to focus on collecting, structuring, and cleaning data and information. Alexey Tikhonov believes that demand planners will not be able to compete with systems of intelligence in the long run.

The panel concluded with a Q&A session, where the panelists addressed questions from the audience on topics such as the conditions or requirements for creating automatic decisions in demand planning, the role of judgment in demand planning, and how to incorporate the bias from human judgment into the statistical bias to reduce the overall bias.

Full Transcript

Robert Fildes: I’m Robert Fildes and I’m introducing these two sessions. For logistics reasons, they’ve been swapped over and we’re going for the next hour or so to talk about the changing role of demand planners and their role being potentially changed substantially by the developments in AI and machine learning. The panel will be pronouncing their words of wisdom shortly. The session of Paul Goodwin and myself talking about lots of empirical evidence on the role of judgment is swapped to this afternoon at 15:10. It’s in the program anyway. Yes, judgmental adjustment, we’ll be talking about that but not in this room, in another room. So I look forward to seeing you then and I look forward to a stimulating and preferably controversial discussion and hand over to the chair, Conor.

Conor Doherty: Well, thank you very much Robert. Hi everyone, I’m Conor, head of communications at Lokad and I’m very pleased to be joined on stage by an illustrious panel from academia and industry. To my immediate left, Alexey Tikhonov, business and product developer at Lokad. To his left, Dr. Sven Crone of Lancaster University, CEO and founder of iqast and last but very much not least, Nicolas Vandeput of SupChains. Now, the topic of today’s discussion as you can see hopefully on screen is demand planning and the role of judgment in the new world of AI and machine learning.

I’m quite confident given the people involved on stage that this will be a lively exchange of ideas and I think any advancements in technology typically raise questions of how those advancements will impact human involvement. So very much looking forward to hearing our three panelists discuss that. Now before I get into it, time is a bit of a scarce resource today so a little bit of admin. Each panelist will have 5 minutes to present their perspective relative to the topic. First will be Nicolas, then Sven and finally Alex.

Following that, I will ask some questions designed to tease apart some of the details and the implications of their perspectives and given how that goes, if we’re all still on speaking terms then hopefully some questions from the audience. What I will say is, please if possible, given how scarce time is, do have some ideas before the mic is handed over rather than a monologue followed by a question mark. But with that, I hand over first to Nicolas, please, your perspective relative to the topic.

Nicolas Vandeput: Thank you, Conor. Hello everyone. Great, I have the slides. Let me introduce to you the vision I have for demand planning excellence at the age of machine learning and how basically you can for supply chain demand forecasting integrate machine learning together with human enrichment. So you have four steps on the slide. Let me explain that.

The first main thing for me is to see demand forecasting as an information game. So basically what you want to do, what it means is you want to collect as much data, information, insights, however you call it, about future demand. That could be your promotional calendar, how much advertisement you’re going to make, sales data, inventory located at your client, orders you already got from your clients in advance, all of that. And it will be different for some industries, that’s totally fine but basically my point is the first step is to find information about what’s lying ahead. You are a journalist, a reporter, a detective, go and find this information.

Now once we have all of this information, data that can be structured needs to be fed to machine learning and you want to create a machine learning demand forecasting engine that is automated and bulletproof. By automated, I mean that it’s a tool, it’s an engine that does not require any manual modification, review or fine-tuning by a human. It’s done by maybe a data science team or it’s done automatically so it’s fully automated. Bulletproof means that your machine learning engine needs to react to most of your business drivers, namely promotions, prices, shortages, maybe advertisement, maybe the weather, things like that, holidays and so on. So it’s bulletproof to most of the business drivers and it’s fully automated, you don’t need to touch or review it.

Once you have this, humans, so demand planners, can still enrich the forecast based on the information they found that is not included in this model. So let’s for example imagine that they call a client and the client says, “Well, it’s really a tough time, I will not order this month.” The client will not call the machine learning model, the machine learning model is not aware of that. The planner is. The planner should review the forecast and edit it because they know something that the model is not aware of.

Final step, forecast value added. This is an absolutely critical step. I would even start with this one. It means that we need to track the added value of everyone in the process. So we need to track the forecasting accuracy before and after the enrichment to ensure that over time, the enrichment adds value. Of course, everyone can be lucky or unlucky at times, that’s totally fine. Not every single enrichment will add value but what we want to prove and to show over time is that on average, these enrichments add value. So it’s worth our time. Well, that was my vision in four steps on how to integrate machine learning and demand planners.
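The before-and-after tracking described in this step can be sketched in a few lines of Python. This is only an illustration, not something shown in the panel: the choice of mean absolute error as the accuracy metric, the function names, and all the numbers are assumptions.

```python
def mean_absolute_error(actuals, forecasts):
    """Average absolute gap between actual demand and a forecast."""
    errors = [abs(a - f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


def forecast_value_added(actuals, baseline, enriched):
    """Error of the baseline minus error of the enriched forecast.

    A positive result means the human enrichment reduced the error
    on average over the tracked periods.
    """
    return (mean_absolute_error(actuals, baseline)
            - mean_absolute_error(actuals, enriched))


# Hypothetical data: six periods of demand for one product.
actuals = [100, 120, 80, 90, 110, 100]
baseline = [90, 100, 100, 95, 100, 105]   # automated engine output (made up)
enriched = [95, 115, 100, 85, 105, 105]   # after planner edits (made up)

fva = forecast_value_added(actuals, baseline, enriched)
print(f"Forecast value added (MAE units): {fva:.2f}")
```

In this made-up series the edits help in some periods and change nothing in others, yet the average over all six periods is positive, which is the kind of evidence over time that the step asks for.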

Conor Doherty: Well, thank you very much, Nicolas. Now I’ll hand over to Sven.

Sven Crone: Thank you. Yeah, thanks. I’m Sven Crone from Lancaster University, assistant professor there with probably almost two decades of research in AI and forecasting. So I’m completely biased towards AI, I have to say that out front. I have been trying to get AI to work in forecasting for many, many years. So just that you’re aware of the fundamental bias. At the same time, we created a small company where we’ve been trying to help large multinational companies leverage new technology and looking back at those decades, it’s excruciatingly hard. I think we have to address today, hopefully on the panel, some elephants in the room.

While the vision has been there for many years, we can replace statistics with AI, we can replace that with statistics. Actually, we’ve not been terribly transformative when it comes to looking at the demand planning processes, I think. So that’s my fundamental bias. When we’re looking at this, some of the experience I can share is that we’ve trained a lot of demand planners in trying to get them to appreciate some of the exponential smoothing algorithms and ARIMA algorithms out there. I can tell you that is not a pleasant exercise. They’re slow to adopt some of it. It’s a pleasant exercise with the demand but it’s very hard to get them to accept some of the simpler technology. So I think we’ll later on talk a little bit about what happens if that technology gets even more advanced and people have to interact with it.

But the current state, roughly 10 years ago, there was very limited use of AI although neural networks have been around for the better part of 60 years, going back certainly to some of the early innovations, certainly the 1980s. But adoption has been comparatively slow. In the last two or three years, we regularly do surveys at practitioner conferences. We were just speaking at a practitioner conference, the ASM, together in Brussels and we did a survey in the audience. We asked how many people are actually live with a proof of concept in AI and ML and roughly 50% of the audience were there. So that’s up from 5 to 10% 10 years ago. Now 50% are doing a proof of concept. They’re not in production yet but quite a few are already in production and we saw some great companies here already that are trying this out. So no fears in the audience and a few others, so really interesting case studies. But what’s also striking is that as many projects that succeed, fail.

So we had a large chunk in the audience where AI projects did not succeed and I think it is exactly that intersection between embedding a technology into a demand planning process which is much more than just the forecasting step. If we’re looking at industry, we have to look at master data management, data cleansing, prioritization, error metrics, running a statistical model, then potentially analyzing errors, identifying alerts and then making adjustments to that. And by the way, even if you have a fully blown statistical baseline forecasting process, which even today the majority of companies don’t have, Gartner has a beautiful maturity landscape of the different stages of the S&OP maturity.

Very few companies are at level four; most of them are between one, two and three. Even if you have a statistical process, how to cleanse the time series history, do it automatically or manually, that’s a judgmental decision. Which algorithm to choose is a decision. Which meta parameters to allow it to search for is a judgmental decision. So there’s a lot of judgment but I think traditionally we think about the judgment, the final adjustment of a statistical baseline forecast that is or is not understood. And maybe to look into the future, I have not seen a lot of movement or innovation when it comes to innovating the S&OP process as it was designed by O.W. and I see executives getting seriously disgruntled with the lack of progress, the perceived lack of progress, although there is often progress in developing the processes.

But, you know, I think there was a CEO of Unilever that said we’ve got to get rid of demand planning, it doesn’t work during COVID times. There are some real challenges ahead for demand planners to keep their jobs unless they leverage AI, and I do think that there’s a realistic scenario that AI, as you said, if you manage to do all of this, will be able to replace demand planners even in the judgmental adjustment steps. But we’re not there yet. So looking forward to seeing what your views are.

Conor Doherty: Well, thank you, Sven. Alexey, your thoughts, please.

Alexey Tikhonov: Thank you. My proposition will be radically different. I think first we need to expand the scope because demand planning exists in supply chain and the goal of the supply chain is to make decisions under uncertainty, to make profitable decisions under uncertainty, under the presence of constraints. As far as this goal is concerned, my take is that demand planning is an obsolete approach and judgmental forecasting interventions are wasteful, even if they help to improve slightly forecasting accuracy. Why is that?

Demand planning is an obsolete approach because it presumes that we need to separate forecasting and decision making. This separation inevitably leads to the choice of simple tools because we introduce a human-to-human interface. We have to convey information in a very simple manner and we choose point forecasts because everyone can understand point forecasts. Accuracy metrics computations are simple so we can argue about those points, we can adjust them up or down. But unfortunately, this choice of tools prevents us from making profitable decisions.

To make a profitable decision, we need to assess financial risks and financial returns. We can only do that if we capture the structural pattern of the risk. There’s only one tool, as far as I’m aware, that does that. It’s called probabilistic forecasting, where instead of single point predictions, you come with an educated opinion about how all possible futures look like, what are the probabilities for different futures.

I’m not just talking about demand. There are other uncertainties, for instance, there might be uncertainties of lead time that you need to take into account. This is especially relevant if you’re dealing with goods shipped from overseas. Then you might have yield uncertainty if you’re dealing with the production of food. You may have return uncertainty if you’re dealing with e-commerce. So there are multiple sources of uncertainty and you need specific tooling called probabilistic modeling to combine all those uncertainties to be able to derive decisions at the later stages.
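One common way to combine several sources of uncertainty is Monte Carlo simulation. The sketch below is an illustration only, not the tooling the speaker refers to: it draws a random lead time, then sums random daily demand over that lead time, to approximate the distribution of demand during the lead time. The function names and both probability tables are hypothetical.

```python
import random


def simulate_lead_time_demand(daily_demand_dist, lead_time_dist,
                              n_runs=10_000, seed=0):
    """Sample total demand over an uncertain lead time.

    Both arguments map a value to its probability (assumed inputs).
    """
    rng = random.Random(seed)
    demand_values, demand_weights = zip(*daily_demand_dist.items())
    lt_values, lt_weights = zip(*lead_time_dist.items())
    totals = []
    for _ in range(n_runs):
        # First uncertainty: how many days the shipment takes.
        lead_time = rng.choices(lt_values, weights=lt_weights)[0]
        # Second uncertainty: demand on each of those days.
        daily = rng.choices(demand_values, weights=demand_weights, k=lead_time)
        totals.append(sum(daily))
    return totals


def quantile(samples, q):
    """Empirical quantile by sorting (simple, no interpolation)."""
    ordered = sorted(samples)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical inputs for a slow-moving item shipped from overseas.
daily_demand = {0: 0.5, 1: 0.3, 2: 0.15, 5: 0.05}   # units per day
lead_time_days = {20: 0.6, 30: 0.3, 45: 0.1}        # days in transit

samples = simulate_lead_time_demand(daily_demand, lead_time_days)
median_demand = quantile(samples, 0.50)
high_demand = quantile(samples, 0.95)
print(f"median: {median_demand}, 95th percentile: {high_demand}")
```

The gap between the median and the upper quantile is the kind of tail information that a single point forecast does not carry.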

Demand planning offers us only one version of the future which is a point prediction, which is the most likely scenario. But we are interested in the tails of this distribution because risks are concentrated at two extremes. Then, demand planning perspective inevitably leads us to considering only one decision option. You have one point forecast, you apply either a safety stock formula or simple inventory policy, you derive a decision. But is this decision profitable? What if I change it up or down? How does my expected profitability change? I cannot do that because my predictions are point predictions. I don’t state any probabilities about those scenarios.
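The question "what if I change it up or down?" can be made concrete with a small expected-profit calculation. This is a minimal sketch under strong simplifying assumptions (a single period, unsold units are worth nothing, no stockout penalty); the demand distribution, prices, and function name are all made up for illustration.

```python
def expected_profit(order_qty, demand_dist, unit_cost, unit_price):
    """Expected profit of an order, assuming unsold units are worthless."""
    profit = 0.0
    for demand, probability in demand_dist.items():
        units_sold = min(order_qty, demand)
        profit += probability * (unit_price * units_sold
                                 - unit_cost * order_qty)
    return profit


# Hypothetical demand distribution: the single most likely outcome is 50.
demand_dist = {0: 0.1, 50: 0.5, 100: 0.3, 200: 0.1}
unit_cost, unit_price = 4.0, 12.0

candidates = [0, 50, 100, 200]
profits = {q: expected_profit(q, demand_dist, unit_cost, unit_price)
           for q in candidates}
best_qty = max(profits, key=profits.get)

for qty, value in profits.items():
    print(f"order {qty:>3}: expected profit {value:7.2f}")
print(f"best candidate: {best_qty}")
```

With these made-up numbers, ordering 100 units has a higher expected profit than ordering the most likely demand of 50, a comparison that is only possible because probabilities are attached to each scenario.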

The third problem with demand planning is that the economic perspective is completely absent. We are talking about forecasting accuracy in percentages or if we use various metrics that deal with accuracy in units, we are missing the financial perspective. So what we need is to estimate the expected cost, expected rewards and we also need to estimate secondary drivers such as how do we compare inventory decisions for different products. Every retailer knows, for example, that having diapers in stock is much more important than having premium chocolate because if you don’t have the first product then your customers will be disappointed, you will lose customer loyalty.

Demand planning prevents us from extensive automation. What are we going to automate? We are going to automate the production of the endgame decisions, not just the forecast. We need to automate the entire pipeline. We need to convert our raw transactional data into actionable decisions that already respect all decision constraints such as minimum order quantities (MOQs) and other decision drivers such as price breaks.

And last but not least, there has to be ownership. Currently, there is no ownership over the endgame decisions because we have this transition process of forecast going to a different team and then they derive the decision and then you have a blame game. “Oh, we made this decision because your forecast was inaccurate or there was some judgmental increase of the forecast and thus it led us to a wrong decision.”

So to summarize, the demand planning perspective is obsolete and we have something better. Thank you.

Sven Crone: Well, we’re going to agree today, but not on everything. Can I just respond to that because I think I fully agree with you on some aspects. I think you mentioned point versus interval or probability distribution forecasting, agreed, right. But there’s many software packages out there that have been doing that for many years but practitioners ignore it. But I think we all understand the value of communicating not only the demand value but also the risk associated with it.

SAS has been doing this for a long time, Forecast Pro has been doing that and even Forecast X has been doing that but it’s widely ignored. So why is it ignored? We probably should talk about why demand planners don’t understand interval forecasts. I mean that would be interesting. The other thing that you mentioned which I think is also a good point is that there often is a disconnect between demand planning and inventory planning, supply network planning, production planning which would be beneficial to derive a holistic solution.

But if you’re thinking about large multinational companies, I think the processes are set out that you would actually drive decisions over tens of thousands of employees that are all intervening and they don’t share their knowledge freely. And the third thing, I think we have to think about what is what we consider. I think we all have different backgrounds in looking at demand planning. I come from industry multinationals, you know, maybe multinational transportation companies, fast-moving consumers, pharma. You probably have more of a retailing vision and background where these things can be bundled together.

But what we see, I mean there are very nice books by Charlie Chase of SAS. I think he has written about demand-driven supply planning. It’s about the demand supply reconciliation. It’s as much a process to share information and to share expectations and risk management, exactly risk management over a long-term horizon in S&OP, roughly looking at 6 to 18 months out. S&OE is looking at one to four months out and it’s that information sharing that can also be invaluable, irrespective of what the final one number forecast is.

So I don’t disagree that on some of these things we have a long way to go when it comes to the integration of decision-making. But I think demand planning serves more than one purpose. But I think for the interest of things, we’re looking at automation today. Maybe AI is used in the context of automation and we’re looking at accuracy predominantly. But very few people actually look at information sharing. There’s a few interesting papers on robustness of forecast. I mean, what happens if you change your forecast all the time with a very reactive machine learning model and production planning just goes crazy because they sum it up over the lead time and you get overstock or understock and you introduce production orders.

So I think if we can focus maybe on the demand planning process and the contribution of forecasting in demand planning because the other stuff is supply planning and network planning, production planning. But that’s going to make our discussion harder. But I fully agree with you that that would be an important thing.

Alexey Tikhonov: Maybe with a short comment. Yeah, so my take was not that we need better information sharing, better forecast. My take is that we need to automate completely the decision-making part. We should not let supply chain practitioners be involved in the forecasting process because supply chains are immensely complex. We’re talking about companies like, even if you take medium-sized companies like 100 million in turnover, in retail they will have tens of thousands of SKUs, hundreds of stores. You multiply one by another and for every SKU location, you have to make a decision daily.

Even if you have predefined decision cycles such as “I make this decision once a week”, you better still recompute those decisions on a daily basis. Why? Because human bandwidth is limited. You need a computer to be checking like, “Okay, if there’s a spike of demand, I better know it before my usual decision cycle because I expect demand to be more smooth.” So you need to reproduce those decisions on a daily basis even though the vast majority of those decisions will be trivial decisions which is “we are not making a purchase today”.

The human bandwidth is very limited and when we talk about sharing information between humans, we are wasting time. We need to automate as much as possible. We need to be capitalistic. We need to build assets, decision-making robots. Then only we can get profitability out of supply chains.

Nicolas Vandeput: Would you mind just going back to my slide for a minute? I’d like to structure that. So I’ll start from my framework and I would like to make it even more extreme to continue the discussion from there. So I think we all agree that we only make forecasts because we want to make great decisions. And we would say, well, maybe this type of forecast would be better, this type of forecast, this granularity, this horizon, point forecast and so on. But we all agree that we make forecasts because we need at some point to make a decision.

Now why are we so inclined towards automation, all of us here? It’s because we have to make so many forecasts and so many decisions at scale and we don’t want variability. If you have a human, you have an issue of variability. You have an issue that there are so many forecasts and decisions to be made and so on. Now the way I see forecasts, again, it’s only about how much information can you process and how good are you at processing this information. So I want to have a model that is as good as possible at dealing with as much information as possible.

The technology we have might change in 10 years. We could have a single model that can cope with all available information in the world. Let’s take a very simple example, the COVID-19 pandemic. Let’s imagine it’s mid-March 2020. If you have a forecast engine, even with the best learning technology we have today, you know as a human COVID is going to get there and the state of the world and the city will change in the following weeks. Your model is not aware of that.

Now, you could have a point forecast, you can have a probabilistic forecast, but you as a human, you still need to enrich and review that because you have access to information that your model has no access to. So for me, the question of whether it should be a point forecast is not relevant to this discussion because the conclusion is still the same. It’s about feeding as much information as possible to your model.

And if you cannot feed certain information to your model, then it’s time for a human to enrich that. And that’s why it always makes sense to have a single last human who’s able to review some decision or some forecast based on some information that the model cannot process.

Sven Crone: I think we’re talking about different industries here. When we’re looking at demand planning, I’m fully with you that if you’re in a retail space and you have tens of thousands of decisions to make on a daily level, then you need a significant degree of automation.

But we work with retailers in the UK quite extensively for some time and even there, adjustments are made across assortments for things where uncertainty rules like extreme weather effects or the effect of COVID closures on shower gel versus toilet paper in Germany.

But if you’re looking at a pharmaceutical manufacturer for example that has maybe two to 400 core products, they’re quite manageable by a human. I mean, why are all these companies getting by? We did surveys and roughly 50% of the companies use very simple statistical measures. They’re getting by, they’re profitable, they’re growing, they’re agile in their supply chain, they have all embedded S&OP.

So there’s a whole spectrum of problems that we have and I think that’s one of the things that I always love at this conference. All these different forecasting flavors come together. We have people that are showing us electricity load for smart meters. Yes, you have hundreds of thousands of smart meters with minute-by-minute forecast, there’s no human intervention feasible.

But if you have very few important items with, you know, that you understand really well, let’s say in a pharmaceutical company, you know we looked at vaccines for example that are well understood. I think so there are different flavors of demand planning.

What we’re doing in forecasting is as heterogeneous as products and markets are and that’s the beauty of it. That’s why we all sit at the bar and we talk about forecasting, we talk about completely different things. But I’m fully with you, so if we talked about retail space, I’m sure that automation is feasible.

We agreed to disagree that the world is big enough for all the different software packages as well in order to have specialized solutions. That’s why big companies like SAP have specialized solutions for retail and they will work differently than for consumer goods and pharmaceuticals in industry or for other areas. So I think we do agree, just coming from different points of view.

Conor Doherty: Alex, do you wish to comment?

Alexey Tikhonov: Just a slight comment. I cannot agree with the proposition that people need to intervene in the forecasting process. Where people can add value, they can add value at the inputs of this decision engine to clarify data semantics, to bring more data, to explain to the engineer who is crafting all those algorithms or decision engine, how the data are used in business so he has a better comprehension of the picture.

And then they can add value at the end, checking out the decisions that the decision engine is generating and find out those insane decisions or inaccurate decisions in their opinion and then revisit this numerical recipe to find out what’s wrong. What are the assumptions that are wrong? Why is it making the wrong decision? Because if they intervene in between, they touch forecasting or they override the decision, this is like resources are being consumed instead of being invested. You first and foremost need to look at the ways how can you improve this robotization because if you do it manually, you are wasting time.

And you should start from decisions, by the way, not from the forecast. Because what if I had an MOQ of 100 units and you came up to me and said, “Oh, now I have a better forecast. Instead of 50 units of demand, there is 55.” Well, I still have an MOQ, so from my perspective, both forecasts are kind of irrelevant despite the fact that yes, one is more accurate. I’m still making the same decision, and you have invested more resources in producing a forecast that is more accurate but potentially more computationally expensive. So you achieved better forecast accuracy but overall we are in a negative situation because we are still producing the same decision while investing more resources.
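The MOQ example can be written out directly. The ordering rule below is a deliberately simple assumption for illustration (order the net requirement, but never less than the MOQ); the function name and the zero on-hand stock are hypothetical.

```python
def order_quantity(forecast_units, on_hand, moq):
    """Order the net requirement, raised to the MOQ when it falls below it."""
    need = max(forecast_units - on_hand, 0)
    if need == 0:
        return 0
    return max(need, moq)


# The two forecasts from the example, with an MOQ of 100 units.
decision_a = order_quantity(forecast_units=50, on_hand=0, moq=100)
decision_b = order_quantity(forecast_units=55, on_hand=0, moq=100)

print(decision_a, decision_b)
```

Both forecasts lead to the same order of 100 units, so under this rule the extra accuracy changes nothing about the decision.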

So that’s why I’m against touching the steps of the process manually. Revisit the whole recipe in order to improve the automation to make it more robust, more reliable, more sane.

Sven Crone: It’s just a pity we didn’t have the discussion from Robert and Paul beforehand, who were probably looking at forecast value added in the afternoon, because we have a lot of evidence that judgment can add significant value on top of advanced statistical methods.

And I think the question that was coined here today is, can machine learning overcome the value add of statistics with judgment on top, or is it the same? And then I think the total cost of ownership question is a good one, right? Which one is more efficient?

I’m pretty sure Nicolas has lots of examples that you’ve mentioned before or you reported where you’re actually automating the whole thing and it’s adding more value than actually doing judgmental adjustments. But let’s move on to the next question maybe.

Conor Doherty: Well, actually it’s more of a segue, which is still a transition of a kind. It’s to push forward to what I think is actually the foundational question because, listening to you, there’s almost an instrumental disagreement here, but there’s an overall agreement.

And I’m actually going to come to you, Nicolas, first. Something you said earlier, and again forgive me if I misquote, I’m paraphrasing: we use forecasting to make better decisions. That means that forecasting is instrumental in a larger process, which then raises the question, where does the value reside?

Because if you’re using forecasting as a tool to accomplish something larger, where is the value? Is the value in the tool that builds the house or is the value in the structure of the house? The house would be the decision in that analogy.

Nicolas Vandeput: I think you have two important questions with forecast. You have the first question, is your forecast accurate or not? And again, we could have a debate on how do you measure accuracy, very interesting debates, not the point today. Let’s just imagine first step is just to assess how accurate or good is it.

The second question is, well, based on a given forecast, how good is your company at making the right decision? Now, unfortunately, the teams doing these are different teams with different inputs, different outputs, different KPIs.

I would not, from a supply chain point of view, blame forecasters or the people making the forecast engine for bad decisions or for decision-related KPIs, because they have no control over that. At the same time, you could have a very bad forecast and a decision maker who either got lucky or is extremely good at making good decisions despite a bad forecast.

So you might get some very nice KPIs, business KPIs. Again, we could discuss which KPI is relevant or not, but it could go in either direction. So for me, when I want to evaluate a forecast, I would not look at the business outcome. I would look purely at forecasting accuracy, because I know that the business outcome can be driven by so many other things that are totally out of the hands of whoever makes the forecast, machine or human.

Now, what’s very interesting, and I simulated that for a few clients, is that depending on the quality of your forecast (again, we could debate how to measure accuracy), and especially depending on the bias, for example, are you very positively biased, are you under-forecasting, or do you have a tool that’s properly calibrated, I hope so, the resulting supply chain optimization engine might go for a very different policy.

So the optimal policy or the optimal way to make decisions might change depending on the quality of the forecast. So it also means that if today you go from a forecast that’s extremely over-forecasting everything because your process is so politically biased to a forecast that’s actually made by machine learning, you need at the same time to review how your supply chain decisions are made.

People rely on the process tool engine. Historically, the forecast was always 30% too high. So if you change that now, you also need to change the supply process as well. Both need to be integrated. But from a supply chain point of view, I would evaluate KPIs independently as much as I can.

Conor Doherty: Thank you. Sven, your thoughts? Just to reiterate the question again, if forecasting is a tool for accomplishing something else, where do you ascribe value? Is it in the quality of the decision or the quality of the forecast? Define those however you please.

Sven Crone: I think we already touched upon that. For me, demand planning, or going with the Gartner definition of demand planning and the Oliver Wight process, we’re looking at it from an industrial point of view. Take pharmaceutical industries, consumer industries, they’re normally looking at much longer horizons than what you’re looking at. They’re looking at, say, 6 to 18 months out. The expensive decisions are the strategic ones. You identify long-term plans, you identify forecasts, you do gap closing activities, you try to reconcile supply and demand on a much longer horizon. You don’t have promotional information, you don’t have weather information, you don’t have disruption information. So this goes very much into scenario-based planning and end-to-end planning of scenarios that have to materialize into some inventory positions.

But that’s a long-term position. I think the majority of the industry still has something like a three-month frozen period, so we’re looking at the time frame outside that. S&OE, on the other hand, which has recently been embraced by Gartner, looks at this very differently. So there, for the long-term horizon, I think transparency and communication are important things, to brace yourself for, you know, changing budgets, realigning budgets, reconciling volume and value. It’s a very aggregate process. You’re looking at high levels of the hierarchy, you’re looking at markets, you’re looking at channels, but you’re not looking at individual products.

Then you’re down in the day-to-day work and that’s S&OE. In S&OE, I agree, you have automated decisions, standardized decisions, accuracy is an important thing, transparency is still important, which is often overlooked, or robustness. But I think 85% of all presentations at this conference are talking about accuracy. There’s this whole session about robustness, which is good to see. But I think the most important innovation to measure the whole process is forecast value add. That is really to take the individual constituents and see how they add value. That’s a very management-focused view because you want to invest resources in where you add value.

Unfortunately, it doesn’t embrace things like master data management. It’s hard to measure that. It doesn’t embrace things like data cleansing, which I think is instrumental in order to get anything done in statistical forecasting. And it looks at the very end of the process, towards the human. But we have a hard time measuring value add, because forecast accuracy is typically cost-weighted or volume-weighted just to see, on aggregate, whether they are making the best decisions, with error metrics that academics would not touch. But what they’re really looking at is the value add. And we have a lot of demand planners who actually choose a statistical algorithm in the tool. They can override the choice, but that value is attributed to the statistical algorithm. So I think there’s a lot of value in judgment today and it can be measured. And I think the majority of companies have taken that on, and value add is a good tool for that. And again, I’m looking at the long term and much less at the short term, where inventory performance should be linked and so forth.

Conor Doherty: Okay, thank you.

Alexey Tikhonov: I think when we talk about forecast accuracy changes, improvements potentially, and we relate to forecast value added process, I think we are confused in the terms because the better name would be forecast accuracy added, not value. Because value, when we talk about value, for me, it triggers a financial perspective immediately. And the financial perspective is driven by decisions. The results of your business depend solely on the decisions you make and how well you execute these decisions. So there’s nothing else involved in this perspective, only decisions and how well you execute them.

When we consider decisions, like for instance, I have a forecast number one, it leads me to making decision A and forecast number two, it’s more accurate. I know about how much, I know the accuracy difference. But the question is, does this different, more accurate forecast lead me to a different decision? If not, then it’s wasteful, despite being more accurate. So despite better apparent value, we have net negative from the financial perspective. And then if the decision is different, how do we evaluate that this decision discrepancy, like the change, like swapping from decision A to decision B, how much more profit we gain in comparison to the difference in the resources investment? For me, it’s an open question. We can use both forecasting models to evaluate the potential returns of both decisions, but it’s still a matter of speculation because we don’t have two alternative universes to test out two decisions. We finally have to pick only one.

And even more, decision value does not solely depend on the forecast per se. There are other considerations, such as decision drivers. Case in point, we work with Air France, their MRO wing for spare parts for aircraft. We recently implemented a decision engine with them, and just recently a robot triggered a large purchase of several dozen, I believe, auxiliary power units for aircraft, worth several million euros overall. And people were like, “Oh, it must be a mistake.” But when they started to inspect, it turned out that somebody on the other side had made an error and set the price well below the average price on the market. The robot picked up on that and immediately issued the orders. It has nothing to do with forecast accuracy, but this decision has enormous value.

So you see, we are focusing on forecast accuracy, but there are so many other things that can influence the value of the decision. So I think when we talk about monetary value, we should not only focus on the forecasting. We should look at the entire process of how we’re deriving decisions, what goes into consideration, what is our decision process, which things are important, which things are of secondary concern.

Sven Crone: Just to add, I do agree, and you mentioned that before in the first round, that there should be an inventory decision and then you should evaluate it on inventory decisions. And I think, to be fair, in academia, the majority of people, or a whole range of journals, have taken this as good practice or as minimum practice. So if you’re looking at the journals in production economics and at ISIR, the inventory research society, that is good practice. You’ll see quite a lot of presentations from academics here that actually measure the decision cost on trade-off curves over service level and lead times, and actually do give the inventory cost associated with it. That is something that I don’t see in practice at all.

It’s very hard to do in practice, but it’s possible. You have to make assumptions, of course. The supply chain is complex, but I fully agree with you. Decision cost should be the goal. But that goes back to Granger 1969, asymmetric loss functions. We typically don’t have the decision cost, so we have to assume something. What I see as a huge gap, and maybe an omission, what this community has not been able to achieve, is to establish the link between forecast accuracy, however you want to measure it, and the decision cost associated with it.

So we actually had a research project. Very few companies have published this; Johnson and Johnson has released that in the past. One percentage point in forecast accuracy, typically measured as cost-weighted MAPE, equates to 8 million US dollars in finished goods inventory, distribution center and expedited production cost, and so forth. So they had a whole line of reasoning for how they established this. We recently presented something where we did a bottom-up simulation with TESA. There are some calculators out on the web that I don’t fully trust, but I think that’s an important thing, that forecasting errors are costly, purely from the decision on safety stocks and inventory in the immediate next step, not even going to production planning and raw material sourcing and long-term decisions.

So I think that’s a real omission. That’s why I think demand planning teams in companies are still too small. If they knew how expensive their decisions were, how valuable forecasting is, we would see a lot more resources there. And by the way, for TESA, of course it depends on the company size, and we’re not allowed to say the number, but it’s one Bugatti Veyron. Bugatti Veyrons have a pretty rigid price, so it’s 1.5 million per percentage point of accuracy in inventory, in direct translation. And we’re working with a few other companies now to establish this, given the inferior inventory models. But this is a really important thing. You’re solving the problem directly and showing them the decision cost. But when the process is decoupled, you can still do that. And I think that’s the missing link. But I’m fully on board. Inventory or decision cost would be ideal. Inventory is a direct linkage that can be done and must be done, and it’s being done by academics.
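
To make the point about asymmetric loss functions and decision cost concrete, here is a minimal sketch in Python. The linear cost structure and the unit costs (5.0 per unit short, 1.0 per unit of excess) are purely illustrative assumptions; they are not the Johnson and Johnson or TESA figures mentioned above. The sketch shows two forecasts with the same absolute error producing very different decision costs.

```python
def decision_cost(quantity, demand, unit_underage_cost, unit_overage_cost):
    """Linear asymmetric cost: shortages and excess stock are priced differently."""
    shortfall = max(demand - quantity, 0)
    excess = max(quantity - demand, 0)
    return unit_underage_cost * shortfall + unit_overage_cost * excess


actual_demand = 100

# Both forecasts miss by 10 units, so any symmetric accuracy metric rates them equally.
cost_when_under = decision_cost(90, actual_demand, unit_underage_cost=5.0, unit_overage_cost=1.0)
cost_when_over = decision_cost(110, actual_demand, unit_underage_cost=5.0, unit_overage_cost=1.0)

print(cost_when_under, cost_when_over)
```

With these assumed costs, under-forecasting by 10 units is five times as expensive as over-forecasting by 10 units, which is the kind of link between error and cost that a symmetric accuracy metric hides.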

Conor Doherty: I want to push forward and stitch together some points that have been raised, particularly around assumptions. I mean, if a demand planner says, “Oh, this is wrong, I need to make a manual override,” that’s an assumption. An assumption also goes into building a forecast or an automated model. So, I mean, these are distinctions. Roses by any other name. But my question is, is it reasonable to expect management, so people without training in the things that we’re talking about today, to have the same level of trust in automation, like an automatically generated forecast, as in a forecast that’s gone across your desk, for example? Sorry, or Nicolas’ desk, or Alex’s.

Nicolas Vandeput: If you don’t mind going back to my slide, the most common question, if I summarize it in just a few words, is “How can I trust machine learning?” And you could replace machine learning with statistical tools. I like to change this question to, “Yeah, but how do you trust humans?” Because people are like, “Okay, Nicolas, how can I trust you? How can I trust your machine learning?” And I’m like, “How can you trust your team?” And that’s, I think, the real question. And there is, for me, just one way to track that and to answer that. It’s called forecast value added. And I’ll try to explain the idea in just a few sentences again. You want to track the accuracy of every single step in your process. That could be a human, that could be a machine, that could be information from your client, the forecast coming from your client. For every single step, you’re going to track the accuracy before the step and the accuracy after the step.

As you do that, and I would also advise that you compare the overall accuracy of your process against a statistical benchmark, which could be any simple model you could find for free. As you do that over weeks, over months, over days, depending on your horizon, you can really prove that some part of your process, could be human, could be machines, are adding value and are accurate. This is the only way to do it. And I would even go as far as saying, if you don’t do that, it’s like you don’t have the lights on in the room. You’re in the dark. You have no clue.

And when I’m contacted by companies who want to drive demand planning improvement projects, the first question is, “Do you track forecast value added?” Because if you do not, there is no way you can know if my model is adding value and if we’re doing great or bad. So this is the first step to answer the question, “How do you know if I can trust machine learning?” It’s the same question as, “How do you know if I can trust humans?” And the answer is, you need to track forecast value added.
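
The tracking Nicolas describes, accuracy before and after each step compared against a simple benchmark, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: mean absolute error stands in for whatever accuracy metric a company uses, and the step names and numbers are invented for the example.

```python
def mean_absolute_error(forecasts, actuals):
    """Average absolute gap between forecast and actual demand."""
    return sum(abs(f - a) for f, a in zip(forecasts, actuals)) / len(actuals)


def forecast_value_added(steps, actuals):
    """steps is an ordered list of (name, forecasts); the first entry is the benchmark.

    Returns (name, error, error reduction versus the previous step) per step.
    A negative reduction means the step made the forecast worse.
    """
    report = []
    previous_error = None
    for name, forecasts in steps:
        error = mean_absolute_error(forecasts, actuals)
        added = None if previous_error is None else previous_error - error
        report.append((name, error, added))
        previous_error = error
    return report


# Hypothetical data: four periods of demand and three stages of the process.
actuals = [100, 120, 90, 110]
steps = [
    ("naive benchmark", [110, 100, 120, 90]),
    ("statistical model", [105, 115, 100, 105]),
    ("planner override", [115, 125, 100, 120]),
]

report = forecast_value_added(steps, actuals)
for name, error, added in report:
    print(name, error, added)
```

In this made-up data the statistical model reduces the error relative to the benchmark, while the override increases it again, so the report flags the override step as the one to question. In practice the comparison would be run over many periods and items before drawing any conclusion.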

Alexey Tikhonov: I think switching to the decisions is important. How can I trust a forecast? I don’t know if a forecast is good or bad unless I see what decision it recommends, which decisions I can derive using this forecast. And from that perspective, humans, practitioners, oftentimes have very good intuition. If you produce insane decisions, they will point them out, and they will tell you why. Like if your purchase order is too large, they will tell you what it should be, what’s the ballpark for a good purchase, and why they think this way. So, tracking forecast accuracy, yes. Having humans in the loop doing judgmental forecast overrides, as I said already, is pretty much wasteful, because those interventions have very short expiration times.

You may override, but the effect doesn’t last for another year. It will probably have an impact on your next purchase order, but not the one that comes after. So, you involve a very expensive resource, a very unreliable resource, because we should also discuss, we don’t have time, but we should also discuss what the process behind those judgmental overrides is. They are semi-quantitative in nature. There is no rigorous process like there is for an automatically generated forecast, where we can inspect and decompose it and see what’s wrong if it’s wrong. So, you need to automate as much as possible. And how do you gain trust? Well, in the same way that you trust your weather application. If it produces consistent forecasts, if it says there is a high chance of rain and, yeah, most of the time it rains when it says it will rain. Or take a different technology, like spam filters.

Think of when they were first introduced in email clients. We were checking the spam box very frequently because the percentage of misclassified emails was quite high. Nowadays, I only go to the spam box when I know that a person who is not yet in my contact list sent me an email and I have not received it. I go and, yeah, it’s there, I click “not spam”, and it will never go to spam again. You see, so trust is gained over time, and you need a process. We call it experimental optimization, when you fine-tune this decision engine. Once it starts producing sane results, all you need to do is track the metrics. Yes, you track forecast accuracy. If it changes drastically, you need an engineer to inspect with practitioners what’s going on behind the scenes. But you should never touch, you should never manually intervene in this decision pipeline. You want to fix what’s broken and then let it run, pretty much like you do with machines. You do maintenance, and then you drive your car.
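
The weather-application analogy above amounts to a calibration check: when the forecast says 80%, does the event happen about 80% of the time? Here is a minimal sketch in Python. The function name `calibration_table` and the ten days of data are assumptions made for the illustration.

```python
from collections import defaultdict


def calibration_table(stated_probabilities, outcomes):
    """Group forecasts by stated probability and report how often the event occurred."""
    grouped = defaultdict(list)
    for probability, happened in zip(stated_probabilities, outcomes):
        grouped[round(probability, 1)].append(1 if happened else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(grouped.items())}


# Hypothetical history: stated chance of rain and whether it actually rained.
stated = [0.8, 0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.2]
rained = [True, True, True, True, False, False, False, True, False, False]

table = calibration_table(stated, rained)
for probability, observed_frequency in table.items():
    print(probability, observed_frequency)
```

When the observed frequencies stay close to the stated probabilities over a long history, the forecast has earned trust in the sense described here; a persistent gap is the signal to call in the engineer.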

Sven Crone: Yeah, I think trust is an important question, right? I mean, looking back at the last 10, 20 years, a lot of companies have tried to go live, for example, in the context of forecasting with statistical forecasting. And why is there such a skepticism? Why have so many companies started and stopped it, started and stopped it, started and stopped it? We all know about overfitting. We all know how we run these sandbox experiments with all the future variables where you have to be really careful designing things in a proof of concept, then in a pilot study, running side by side rather than burn this. And indeed, the majority, I mean with so many degrees of freedom and meta parameters and so many leading indicators that can just leak into whatever some decomposition, it’s very, very easy to actually promise a level of accuracy that you do not see materialize afterwards. We promised some accuracy, and then COVID hit, and the management didn’t understand why we were not achieving that level of accuracy. To us, it was clear. To them, it was not. But I’m saying, you know, trust was lost.

Trust, and more generally technology acceptance, is a topic in Information Systems, right? Technology acceptance is a big problem. There are whole conferences analyzing how you can convey this. And I think it has something to do with the general acceptance of technology. Like, you know, most of us would be skeptical getting into a driverless car in San Francisco, at least the first couple of times. But in maybe 20 years, everybody will be happy with it. So that’s what you said, you know, the usability, you see nothing happens and so forth, and that’s how you can build up trust. You also have to communicate things. But I don’t think the answer is explainable AI. Everybody goes on about how the algorithm has to explain itself to me. I mean, I’ve tried hard to explain what a gamma factor of 0.4 does over 12 seasonal indices that change over time, right? No manager gets that. But the manager at the top, you know, has to make the decision in the end. Does he trust this significant investment into inventory? Does he trust his team to be working effectively and efficiently?

And I think for that, we’ve lost a lot of trust with statistics, possibly with some inferior software implementations that were state-of-the-art at the time, but some of them over-parameterized, in-sample, one step ahead. There is lots of evidence here on model selection. So a lot of these innovations have not been taken up, except by younger and more innovative companies. One step in between that we’ve seen work to build up trust is in medicine. For example, there’s an incredible case study on detecting breast cancer from images. The algorithms were significantly more accurate, with a much higher true positive rate and a much lower false positive rate, you know, with incredible cost to individual lives associated with it. And doctors would not adopt it. Back in the 1980s, they didn’t adopt some of the decision processes because they didn’t trust them. They trusted themselves more than others.

The solutions we’re building now work like this: AI can do outlier correction, but we actually highlight the outlier to the demand planner. AI can do model selection, but we rather highlight the ranking of what we think is meaningful. We try to explain what we see in the data, for example that we think this is highly seasonal and it has a disruption. This augmented approach was also used for medical doctors: rather than just giving you a classification, it highlighted on the picture where the cancer was probably detected. And not only did it give you a true/false answer, cancer or not, it gave you a probability of this being cancer, which allowed doctors in time-critical situations to sort by probability and look not at the cases that are clearly cancer or clearly not cancer, but at the ones that were uncertain. That’s where the doctors could actually use their expertise, and suddenly there was massive acceptance.

So I think it has a lot to do with the design of the systems, the design of the decision process and it’s not all automation because we have ABC and XYZ and new products and ending products, you know, it’s not, you can’t automate, shouldn’t automate everything, but automate some parts with AI, automate other parts with very simple methods that are transparent maybe, and automate others with algorithms that are robust. But I think for now, for the current level of acceptance and skepticism towards technology, although we all love GPT to plan our next birthday party, I think probably augmented AI is a good step to get acceptance and then we can fully automate with AI.
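
The augmented workflow from the medical example, sorting by probability and sending only the uncertain cases to an expert, can be sketched as follows. This is a hypothetical illustration in Python: the thresholds of 0.1 and 0.9, the function name `triage`, and the case data are all assumptions, and the same pattern could apply to flagged outliers or model choices in demand planning.

```python
def triage(cases, low=0.1, high=0.9):
    """Split (case_id, probability) pairs into clear negatives, clear positives,
    and an expert review queue ordered from most to least uncertain."""
    clear_negative, clear_positive, uncertain = [], [], []
    for case_id, probability in cases:
        if probability <= low:
            clear_negative.append(case_id)
        elif probability >= high:
            clear_positive.append(case_id)
        else:
            uncertain.append((case_id, probability))
    # Cases closest to 0.5 are the least certain, so the expert sees them first.
    uncertain.sort(key=lambda item: abs(item[1] - 0.5))
    review_queue = [case_id for case_id, _ in uncertain]
    return clear_negative, review_queue, clear_positive


# Hypothetical model outputs: probability that each case is positive.
cases = [("A", 0.97), ("B", 0.03), ("C", 0.55), ("D", 0.30), ("E", 0.92), ("F", 0.48)]

clear_negative, review_queue, clear_positive = triage(cases)
print(clear_negative, review_queue, clear_positive)
```

The design choice is that the algorithm handles the clear cases automatically while human expertise is spent only where the model itself signals doubt.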

Conor Doherty: A comment on that. Well, just to follow up on that, because again, you both raised very good points there. But I just want to tease apart one of the comparisons. So, I think Alexey, you gave the example of using a weather forecasting app, and, I mean, the foundation of meteorology has been probabilistic forecasting for quite some time. And then you compared it to an autonomous vehicle, or at least indirectly compared it. And I think we’ll take that and just try to frame a question. So, to use Alex’s example, suppose everyone in the room were told you have to take a vacation next week, and your only destination is Bermuda. The weather forecast says there’s going to be a tsunami next week. Are you going to pay your own money? Are you going to invest your time, effort, and energy flying to Bermuda? Most people would say no. Now take that exact same perspective, which is finance and probabilistic forecasting, put all those exact same people in a company and say, here is a prioritized, ranked list of risk-adjusted decisions that was generated by an algorithm. “Oh, no, absolutely not, I don’t trust that.” So again, is it a selective lack of trust? Your comments.

Alexey Tikhonov: I could make a short comment. I think the true issue with humans resisting adoption of new technologies in general, when we talk about automation, is fear of becoming irrelevant and displaced. And actually, what usually happens is quite the opposite. Yes, we automate some parts where humans are just not financially efficient. Like for example, we used to have translators for multiple languages to translate our own website, because we publish a lot, and cumulatively we probably spent something like 400,000 euros over several years. And now, whenever we publish anything, it’s translated through LLMs.

We have programs that take a markdown page as input and produce a markdown page where all the markdown syntax and shortcodes remain intact and only the relevant parts are translated into other languages. Costs have dropped immensely, by two orders of magnitude; it is now 100 times cheaper. So, should we pay 100 times more to a human translator? No. Now, do we still need human translators? Yes, like if you want to craft a legal document, you’d better use a human translator, because a single word or a single comma may cost you a tremendous amount of money. So, do we need human translators? Yes, we still need them, but in different areas, and there will probably be more need for human translators in this legal field than there was before.

And the same applies to supply chains. For example, there are entire areas which remain intact because of lack of available human resources. For example, very often when you want to pass an order, you don’t know upfront if there is an MOQ, so you need to retrieve this information. You can use humans, you can use a copilot as an AI, but you still need a human to retrieve some information that is poorly structured to feed it into your decision engine so that it produces a decision that is compliant with the MOQ. So, I think we still need humans just for different types of tasks which will be evolving.
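
The translation setup described above, where markup and shortcodes stay intact and only the prose is translated, can be sketched roughly as follows. This is not Lokad’s actual program. It is a minimal Python illustration in which `translate` is a hypothetical callable standing in for an LLM call, and the protected patterns (fenced code, inline code, and `{{< ... >}}` or `{{% ... %}}` style shortcodes) are assumptions about what needs protecting.

```python
import re

FENCE = "`" * 3  # built indirectly so the example can itself sit inside a fenced block

# One capturing group, so re.split keeps the protected spans at odd positions.
PROTECTED = re.compile(
    "(" + FENCE + r".*?" + FENCE   # fenced code blocks
    + r"|`[^`\n]+`"                # inline code
    + r"|\{\{[<%].*?[>%]\}\}"      # shortcodes
    + ")",
    re.DOTALL,
)


def translate_markdown(page, translate):
    """Send only prose segments to `translate`; pass protected spans through unchanged."""
    parts = PROTECTED.split(page)
    output = []
    for index, part in enumerate(parts):
        is_protected = index % 2 == 1
        if is_protected or not part.strip():
            output.append(part)
        else:
            output.append(translate(part))
    return "".join(output)


# Stand-in "translator" for demonstration: upper-casing instead of calling a model.
sample = "Hello `code` world {{< note >}} end"
print(translate_markdown(sample, str.upper))
```

A production version would need to handle more markup (links, front matter, HTML), but the principle is the same: the model never sees the parts that must not change.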

Sven Crone: I think you mentioned an important part when it comes to the acceptance of some of these machine translation techniques, because they’ve been around. IBM used neural networks back in 1982, right? So, they have been there, but the identification accuracy was somewhere around 90%. So, a human would have to go in and change quite a lot of letters, quite a lot of words. Every 10th word would be wrong, and that meant it was unacceptable because it was below a threshold that was deemed good enough.

And now, if you’re getting this accuracy not necessarily to a human level but above a threshold, then you suddenly have a massive adoption of technology. In forecasting, we’re quite guilty of that, because we have seen implementations with careless use of multiplicative models on time series with zeros. And if you have 10 examples out of 100 time series that explode once a year, you have zero acceptance because the trust is gone.
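
One way to see why multiplicative models can explode on series with zeros is to look at a simplified level update. The sketch below is an assumption-laden illustration in Python, not a reconstruction of the implementations mentioned: it uses a textbook-style multiplicative seasonal smoothing step without a trend term, with invented numbers, and contrasts it with the additive equivalent.

```python
def multiplicative_level(observation, previous_level, seasonal_index, alpha=0.3):
    """Simplified multiplicative update: the observation is divided by the seasonal index."""
    return alpha * (observation / seasonal_index) + (1 - alpha) * previous_level


def additive_level(observation, previous_level, seasonal_offset, alpha=0.3):
    """Additive counterpart: the seasonal effect is subtracted instead of divided out."""
    return alpha * (observation - seasonal_offset) + (1 - alpha) * previous_level


previous_level = 10.0
observation = 5.0  # a few units sold in a season that historically sold almost nothing

# A near-zero seasonal index, learned from past zeros, inflates the level estimate.
exploded = multiplicative_level(observation, previous_level, seasonal_index=0.01)
stable = additive_level(observation, previous_level, seasonal_offset=-9.9)

# An index of exactly zero is not even computable.
try:
    multiplicative_level(observation, previous_level, seasonal_index=0.0)
    zero_index_fails = False
except ZeroDivisionError:
    zero_index_fails = True

print(exploded, stable, zero_index_fails)
```

With these numbers the multiplicative level jumps from 10 to roughly 157 while the additive one stays near 11, which is the sort of once-a-year explosion that destroys trust in an automated forecast.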

So, you actually have to get to robustness in order to allow automation to take place. So, I think it’s a good point. And we have typically tried to build accurate models rather than robust models, which doesn’t work well. Neural networks have always had this issue. Also, I think we’re all biased because we adopt technology pretty readily, though not as much as my younger brother, for example, who just loves technology. So, there’s an age consideration there as well. But what drives trust? I think in a senior management board meeting, I know one very large software company is actively looking at LLMs, and there was a recent decision. I think Eric Wilson of the IBF, the Institute of Business Forecasting, has a blog, and they’re quite outspoken about AI not taking over the demand planning process and everybody keeping their job.

But recently there were examples where actually in a boardroom, an LLM model trained on a majority of the knowledge that was fed in, the promotional information, the disruptions, the supply chain, and in the end, there was a forecast and the CEO asked the LLM model why this was the case. And people had different views. Marketing had a different view, Finance had a different view. The LLM model was the only one capable of giving a comprehensible argument why this is the right number. And I think there’s another bias that is there, but if you can tell a salient story about things, people will trust this. So, this also introduces trust, even if it’s wrong.

So, think about being able to argue why that’s the case. As a demand planner with a thousand products, you’re sitting there with the CEO and have to argue comprehensively why you think that in six months it’s going to be twice the amount, and you don’t remember. You’ve been busy working all month on all these numbers, slicing and dicing them up and down, translating them to value, and then getting top-down per-channel adjustments, and in the end, you have a number that comes out. But the LLM was able to argue for that. And who’s going to say that an LLM cannot read all the sales and key account meeting notes, take in all the funnel information, justify the funnel, align that with supply values, and actually come up with an adjustment that is better than a human’s, because it can just handle much more data? I think that’s where we might jump over the trust question and just go directly to accuracy. But there is evidence that these models can provide trust because they can finally explain what’s going on.

Nicolas Vandeput: So, I see an interesting question and then I think we finally found our subject where I would disagree.

The first thing I would like to tackle is change management regarding adoption of machine learning for forecasting. As with any technology, you have people who are totally opposing that and you have some people who are more in favor of it. And I see it on LinkedIn each time I post, I always get a few people from the same side saying this is never going to work, I would never do it. You know what, I just stopped trying to convince them. That’s fine, just stay where you are and I will be working with people who want to make better supply chains.

Now, I’ve seen multiple clients, I’ve seen great leaders, I’ve seen average leaders, I’ve seen poor leaders. To me, if you want to successfully implement an automated process, and we can discuss machine learning demand planning but we could discuss any process, you need, as a leader, to give everyone in the room a clear vision of what their role is going to be in the future. I’m going to go back to demand planning, but then again this would apply to any process. If you say to your demand planner, “Your job, what I pay you for, is to change the forecast and modify and tweak the models,” that’s what people are going to do. And it needs to change. It needs to change to, “Your job is to make sure that the data that goes into the demand forecast engine is as good as possible, and your job is to find information beyond what’s fed to the model and then, based on that, maybe enrich the forecast if needed.” If you don’t say that, people will keep on modifying the forecast day in, day out, because they will simply feel that if they don’t do that, they cannot justify their salary. So again, for adoption, it’s extremely important that we give a clear picture of what people should do, which again relates to my slide with insight-driven review and insight collection and so on.

Now, something I’d like to add is this explainability. I think it’s an open topic and I’m myself growing on that, but I would say that for me, explainability is not required at all. I don’t know how a car works, I still use it, and I would never send an email to Mercedes to say, “I will never use it again unless you can explain to me how it works.” I would never do that. I don’t know how the internet works, I don’t know how this thing works, I have no clue, I still use it.

If a supply chain relies on explainability or storytelling or stories to use the forecast and to trust the forecast, you will never be able to scale because it means that your supply chain and your process rely on the fact that some human has some persuasion capabilities to influence other people to use your forecast because they have a good story. For me, you need to trust the forecast because the accuracy, however you measure accuracy, is reliable and over time it has been accurate or the decision has been meaningful over time. You trust things, you trust process, people, models because quantitatively it’s great, not because the story makes sense. If you just go for the story, it’s going to be a failure. I’ve seen myself so many consultants winning projects because the story made sense and then it never drove any value because actually once you do the model, it doesn’t create any value. But the story is nice, so that’s why I would really try to stay away as much as possible from the story.

Conor Doherty: Further comments? There’s no obligation.

Alexey Tikhonov: Few words on explainability and understanding what’s going on, how decisions are produced, how forecasts are produced.

I can only speak about what we do at Lokad. We approach problems with a principle of correctness by design. One of the problems we know people will have is a lack of trust because they don’t understand how things work. That’s why we use what we call a “white boxing” element. Whenever possible, we use explicit models where you understand what parameters mean, instead of some obscured feature engineering. This way, people can comprehend what’s going on. These models are not drastically difficult to comprehend. I invite the audience to watch our submission to the M5 forecasting competition. Lokad team was ranked number one in the uncertainty challenge. If you watch the lecture delivered by our CEO, Joannes Vermorel, you will see that the model is quite simple. You will be surprised how this simple model could achieve state-of-the-art results.

It’s not necessary to use bleeding-edge AI to achieve an extra percent of forecasting accuracy. In supply chain, you want to be approximately right, not precisely wrong. That’s why we choose, for example, probabilistic methods because they can show you the structure of uncertainty, and then when you have economic drivers, you can translate this structure of uncertainty into the structure of financial risks, and you can make well-educated decisions that are risk-adjusted as opposed to just some decisions that are ranked against achieving a service level target.
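To make the step from a "structure of uncertainty" to a risk-adjusted decision concrete, here is a minimal sketch. It is an illustration only, not Lokad's method: the demand distribution, the margin and overstock cost, and the function names `expected_profit` and `best_order` are all hypothetical, and the decision is reduced to a single-product, single-period order quantity.

```python
from typing import Dict

def expected_profit(order_qty: int, demand_pmf: Dict[int, float],
                    unit_margin: float, unit_overstock_cost: float) -> float:
    """Expected profit of ordering `order_qty` units under a discrete demand distribution."""
    total = 0.0
    for demand, prob in demand_pmf.items():
        sold = min(order_qty, demand)      # cannot sell more than demand or stock
        leftover = order_qty - sold        # unsold units incur an overstock cost
        total += prob * (sold * unit_margin - leftover * unit_overstock_cost)
    return total

def best_order(demand_pmf: Dict[int, float],
               unit_margin: float, unit_overstock_cost: float) -> int:
    """Order quantity with the highest expected profit (brute force over 0..max demand)."""
    candidates = range(0, max(demand_pmf) + 1)
    return max(candidates,
               key=lambda q: expected_profit(q, demand_pmf, unit_margin, unit_overstock_cost))

# Hypothetical probabilistic forecast: probability of each demand level.
demand_pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

# Hypothetical economic drivers: earn 5 per unit sold, lose 2 per unit left over.
qty = best_order(demand_pmf, unit_margin=5.0, unit_overstock_cost=2.0)
print(qty, round(expected_profit(qty, demand_pmf, 5.0, 2.0), 2))
```

With these assumed numbers the most likely demand is 2 units, yet the order that maximizes expected profit is 3, because the margin on an extra sale outweighs the cost of an extra leftover. The quantity comes from the economics applied to the whole distribution rather than from a service level target.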

I think people can comprehend the top-level story, like what you do and why you do it. At the lower level, if they are curious, they can dig into that as well, but it is almost inconsequential once you understand the top level. Once you see the decisions are sane, why would you want to go deeper? For instance, people typically use computers, but they are not interested in memory allocation, like how your random access memory is used by the computations. Nobody’s interested in that. Same with the computer chips in your car. Yes, you have a robot handling the shifting of the gears, but typically nobody is interested in that. It isn’t consequential. It will not make your drive safer if you know that.

Conor Doherty: I was actually just going to ask for closing thoughts. You all seem to agree broadly that understanding the “how” of these methodologies is going to be beyond the scope of most people if they do not have the requisite training. The “what”, like what is happening, be it greater accuracy or a better decision, is comprehensible. But before we go, just maybe 30 seconds to close off. What do you see being the future of demand planners? Because again, I already can kind of guess your response, but in terms of Nicolas and Sven, you seem to, and I don’t want to put words in your mouths, but earlier, you did seem to suggest, “Well, we’re not there yet in terms of complete end-to-end optimization automation.” From your perspective, okay, well then, what is the future of demand planners? Will there be a position for them 5 years from now, 10 years, etc., etc.?

Sven Crone: I think looking at the data availability out there and the technology adoption rate, there’s definitely going to be a job for demand planners for much longer than the next five years. I’m pretty sure. Because also, the pressure on companies to restructure or to innovate is not as big. If you look at all these initiatives for digitalization, right, most companies don’t even have cloud storage. I mean, it’s surprising how some of the largest multinationals in Europe have actually been able to operate so well.

So, it’s probably due to the amazing people that are there. But in the long range, I think we’re really at risk if we’re not adopting, if the software vendors are not adopting, if you don’t automate, if you don’t support meaningful decisions like history correction. And I do think that explainability is important, not to understand, “This is how a neural network works,” but to say, “These are the input variables that were fed into this,” and to be able to answer a question like, “Have you considered that the promotion has been shifted from week 5 to week 12?” I think those are the questions you have to answer. They’re much simpler questions.

But I do think in the long run, with more data becoming available, it will be very hard for demand planners. The frequency of decisions is increasing: we’re going from monthly to weekly to possibly intra-weekly forecasting, also to align with retailers. I see a lot more promotions, and a lot more disruptions are happening. There are so many disruptions that it’s going to grow incredibly hard for planners to tackle so much information in such a short time frame. And therefore, I don’t really think that in the long run they will be able to compete on accuracy and reliability with machine learning models, if all the data is there.

Conor Doherty: Thank you, Sven. Your closing thoughts, Nicolas.

Nicolas Vandeput: To summarize in just one minute what the role of demand planners is in the coming years and how it’s going to evolve: for me, these are people who are going to spend most of their time collecting, gathering, structuring, and cleaning data, information, and insights, and feeding most of it to machine learning models. Forecasting demand is going to be automated, and the information and insights that cannot be fed to machine learning models will still be used by these planners to do manual enrichment of the forecasts. But these planners will not spend time flagging outliers or manually correcting them. They will not spend time tweaking models, reviewing or fine-tuning model parameters, or even selecting models. For me, these tasks need to be 100% automated. Humans should not do that. The planners will focus on finding, collecting, and cleaning information and insights.

Conor Doherty: Thank you. And Alexey, your closing thoughts?

Alexey Tikhonov: I think that currently, demand planning is occupying the niche of a class of software products called systems of intelligence. There are typically three types of enterprise software: systems of records, which are ERPs and other transactional systems; systems of reports, which are business intelligence applications; and systems of intelligence. This is an emerging field. Those are the systems that can automate decision-making, such as the one Lokad delivers to its clients. And currently, demand planners are trying to compete with this field.

My understanding is that in the long run, they cannot compete, they will lose. Why? Because humans are great creatures, they are super smart. They can, if we consider a single decision, they can outcompete a robot because they will always come up with a greater insight, something that a robot is not aware of, like some additional information. But this is not scalable. Humans are costly. We’re talking about supply chains of immense scale, so we cannot scale this up. And that’s the primary reason why, in the long run, they will be displaced. For the same sort of reasons, like in Paris, we don’t have water carriers anymore, we have running water. Why? Because it’s cheaper. Yeah, there are still some undeveloped countries where, in small villages, you still have people carrying water in buckets because, due to the economies of scale, running water is not an option yet. But even in those villages, at some point, they will have running water. So in the long run, they have no place. And at the moment, some companies already got rid of those.

Conor Doherty: Thank you very much to everyone on stage for your insights and your answers. At this point, I’ll turn over. Does anyone have any questions? And I will sprint and pass over the mic. Of course, it’ll be right at the back. Okay, shouldn’t be so many rows. Robert, so whose hands were up?

Audience Member (Bahman): Thank you, everyone. My name is Bahman. I’m from Cardiff University. I just want to make a very short comment. You mentioned about profitable decisions. I just wanted to highlight, and actually, this is a point about what Sven also mentioned about the spectrum. There are thousands of supply chains that are not about making profits. So, I think that’s important to consider.

My understanding is the panel was more focused on the supply chain, but there is a whole spectrum of demand planning. If you think about it, there are millions of hospitals in the world doing demand planning, and they’re dealing with one or two or three time series. So, my question is more about what the conditions or requirements are for creating automatic decisions, given that decisions rely on the forecast as only one input. There are many other inputs; some of them might be forecasts, but the majority probably are not.

Sven Crone: I’ll try to answer that. Hospitals, for instance, hold a large stock of replenishment items: important products like blood, less important products, and so forth, you know, cancer treatment medicine, some of it make-to-order, some make-to-stock. My background is not in hospitals or health systems. We very much looked at the industry side, you know, supply chain management and logistics, probably as defined by Gartner, where we’re looking at very large multinational companies that have introduced these well-defined, tried and tested processes and that measure forecast value added.

I think you’re right, it’s probably applicable to many other sectors, pharmacies and hospitals and so forth. I have little evidence on the adoption there. But the logistics and supply chain industry is roughly one-sixth of global GDP, right? So we’re talking about a sixth of global GDP that’s mainly driven by very large companies. There, we’re really concerned about the lack of innovation with regard to these things. But that does not mean it should not apply elsewhere.

Audience Member (Bahman): I mean, maybe the better terminology would be a service supply chain. For instance, in hospitals, you have demand planning for emergency services. It’s not necessarily about products, it’s about the service itself. So I think my question is more about automatic decision-making, because there is a spectrum, you know. Emergency departments are actually dealing with one time series, and this is maybe where you don’t deal with thousands or millions of time series. So the question is, what are the requirements for creating automatic decision-making?

Sven Crone: I think you’re right. I mean, there are many interesting areas that we in the forecasting community have not paid as much attention to as others, right? If you look at the 10,000 papers in neural network forecasting, I think half of those are on electricity, right? But very few are on pharmaceuticals. So it’s a good point. I think we should pay more attention to the important things.

Conor Doherty: Sorry to cut off, Nicolas, you can answer the next question. I do want to hand over to the next question.

Audience Member: Hi, thank you for a very wonderful discussion. My question is more on the role of judgment. So my question is, every expert has got different judgments. So there is a pattern of bias generating from human judgment and there is a pattern of bias generating from the AI or ML models, be it any statistical model. So we have two biases, from the human judgment and the statistical models. So how can we incorporate the bias from human judgment into the statistical bias to reduce the overall bias when we are doing demand planning? Thank you.

Nicolas Vandeput: Thank you for your question. It’s a common case when I work with supply chains. One of the first things we do is to look at how your forecast performed historically. If it had very high or very low bias, over forecasting or under forecasting, it always boils down to a story. People over forecast because they want to be on the safe side, most likely because the supply process is not good and they don’t really know how to manage inventory. So instead of changing policies or safety stock targets, they rely on a very high forecast. Maybe they also have a very high forecast because they want to be optimistic, they want to fit the budget and so on.

On the other hand, under forecasting might happen because people want to beat the forecast to get a bonus. So usually if your supply chain creates very high bias, it’s an issue of wrong incentive or wrong supply process and you need to disconnect that, retrain people, maybe improve your supply process and maybe make it impossible for some people to change the forecast if they have a direct incentive in making it high or low. That’s for the process part, so we need to stick to people who don’t have incentives in making high or low forecasts.

The model part, if you have a model that generates, over the long run, very high or very low forecasts, too high or too low, I’m not saying that one month is wrong, of course, I’m saying that over multiple periods you have the same issue, that is very likely an issue due to how you optimize your model engine, and most likely due to the fact that the KPI that’s used to optimize the model is not the right one. I would bet that it relies on MAPE, but that’s another subject.
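A small sketch of both halves of this point: measuring bias over several periods, and seeing how a model tuned to minimize MAPE can end up systematically low. The demand history is made up for illustration, and `relative_bias` and `mape` are hypothetical helper names rather than functions from any particular forecasting library.

```python
from statistics import mean

def relative_bias(forecasts, actuals):
    """Total forecast minus total demand, as a share of total demand.
    Negative means under forecasting over the whole horizon."""
    return (sum(forecasts) - sum(actuals)) / sum(actuals)

def mape(forecasts, actuals):
    """Mean absolute percentage error; assumes every actual is strictly positive."""
    return mean(abs(f - a) / a for f, a in zip(forecasts, actuals))

# Hypothetical demand history over five periods (skewed, as demand often is).
actuals = [4, 2, 10, 4, 20]

# "Optimize" the simplest possible model, a flat forecast, against MAPE
# by trying every integer level from 1 to 20.
candidates = range(1, 21)
mape_optimal = min(candidates, key=lambda c: mape([c] * len(actuals), actuals))

flat_forecast = [mape_optimal] * len(actuals)
print("MAPE-optimal flat forecast:", mape_optimal)
print("Average demand:", mean(actuals))
print("Relative bias of that forecast:", relative_bias(flat_forecast, actuals))
```

On this assumed history the MAPE-optimal flat forecast is 4 while average demand is 8, so the forecast covers only half of total demand (a relative bias of -0.5). Percentage errors penalize misses on small periods more heavily, which pulls the optimum toward low values.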

Conor Doherty: Can you give Alexey a chance for a closing thought because we do have to end soon.

Sven Crone: I just want to add something to what Nicolas said. When we’re talking about LLMs maybe taking over judgmental adjustments, we didn’t really go into much detail on how that works. But there’s a lot of evidence these days that you don’t have one LLM trained on all the data to come up with one value. You would actually have personas that you would train these on. You would have a supply chain LLM, a finance LLM, a CEO LLM, a marketing LLM and a key account management LLM, all trained on different data. Often these biases come from the different costs associated with the decisions, for key account versus supply chain, say. But often they also have different information, and what they each see can lead to enhanced decision-making if you have agents that then converse with each other and argue for a consolidated process.

It’s not uncommon to see good practice in S&OP where they reach a consensus and that consensus is more accurate than the decision of a single LLM. It’s really scary, because the biases are there, the decision-making is there, and then you have somebody at the end who decides based on weighting the information. It’s ghostly.

Conor Doherty: Alex, the last word to you and then we’ll finish.

Alexey Tikhonov: On the same question, I think that bias is a problem of point forecast perspective. Typically, why you want your forecast to be biased intentionally is because your forecast is kind of naive in terms of capturing the structure of the risk. You predict the most likely future scenario and then you assume that the model residuals are normally distributed, which is never the case. That’s why you introduce the bias that shifts your prediction towards where most of the risk is concentrated. For instance, to the right tail, like you want to forecast the tail probability of not meeting your service level targets and thus, you shift the bias.

When you switch to a probabilistic perspective, you don’t need this bias anymore, because what you come up with is an opinion about the future that looks like: this future with this probability, that future with that probability. As soon as you train the parameters that capture the structure of the risk accurately enough, all you need on top of that is an economic perspective, like costs, profits and some higher-order drivers, such as those that let you make trade-off decisions. For instance, should I purchase an extra unit of this good or an extra unit of that good? Because your budget is always constrained. With a probabilistic perspective, you don’t have this issue because bias is not needed.
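The "extra unit of this good versus an extra unit of that good" trade-off can be sketched as a greedy allocation: with a probabilistic forecast per product, buy one unit at a time, always choosing the unit with the best expected return per unit of budget. This is a simplified heuristic under made-up numbers, not a description of Lokad's actual tooling; the product data and the names `prob_demand_at_least` and `allocate` are hypothetical.

```python
def prob_demand_at_least(demand_pmf, k):
    """Probability that demand reaches at least k units."""
    return sum(p for d, p in demand_pmf.items() if d >= k)

def allocate(products, budget):
    """Greedy purchase plan: repeatedly buy the affordable unit with the highest
    positive expected gain per unit of budget, until none remains."""
    qty = {name: 0 for name in products}
    while True:
        best_name, best_score = None, 0.0
        for name, p in products.items():
            if p["unit_cost"] > budget:
                continue  # cannot afford another unit of this product
            # The next unit sells only if demand reaches qty + 1.
            p_sell = prob_demand_at_least(p["demand_pmf"], qty[name] + 1)
            gain = p_sell * p["unit_margin"] - (1 - p_sell) * p["unit_overstock_cost"]
            score = gain / p["unit_cost"]
            if score > best_score:
                best_name, best_score = name, score
        if best_name is None:
            return qty  # nothing affordable has a positive expected gain
        qty[best_name] += 1
        budget -= products[best_name]["unit_cost"]

# Hypothetical products, each with its own probabilistic demand forecast.
products = {
    "A": {"demand_pmf": {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2},
          "unit_margin": 4.0, "unit_overstock_cost": 2.0, "unit_cost": 5},
    "B": {"demand_pmf": {0: 0.5, 1: 0.3, 2: 0.2},
          "unit_margin": 12.0, "unit_overstock_cost": 4.0, "unit_cost": 10},
}

print(allocate(products, budget=15))
print(allocate(products, budget=100))
```

Under these assumptions a budget of 15 buys one unit of each product, and even a budget of 100 buys only two units of A and one of B, because every further unit has a negative expected gain. No bias is added to any forecast; the quantities follow from the probabilities and the economics.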

Conor Doherty: On that note, I’m aware we’ve now run a little bit over time. For anyone who wants to ask any follow-up questions, we’ll try by the corner of the stage. But once again, Sven, Nicolas and Alexey, thank you very much for joining us and enjoy the rest of the day. Thank you.
