00:00:00 Introduction of the interview
00:00:47 Nikos Kourentzes’ background and work
00:03:25 Understanding forecasting congruence
00:04:44 Limitations of accuracy in forecasting
00:06:14 Congruence in time series forecasts
00:08:02 Supply chain inventory modeling considerations
00:09:03 Congruence and forecast consistency
00:10:29 Mathematical metrics in production
00:12:08 Luxury watchmaker inventory considerations
00:14:47 Upward fluctuation triggering production
00:16:03 Optimizing model for demand of one SKU
00:17:41 Research in shrinkage estimators and temporal hierarchies
00:19:05 Best models for all horizons
00:21:32 Controversy around forecast congruence
00:24:05 Calibrating inventory policies
00:26:27 Balancing accuracy and congruence
00:31:14 Tricks from temporal aggregation smooth out forecasts
00:32:54 Importance of gradients in optimization
00:35:28 Correlations in supply chain
00:38:10 Beyond time series forecasting
00:40:27 Honesty of probabilistic forecasting
00:42:32 Similarities between congruence and bullwhip ratio
00:45:18 Importance of sequential decision making analysis
00:47:27 Benefits of keeping stages separate
00:49:34 Human interaction with models
00:52:05 Retaining human element in forecasting
00:54:35 Trust in experts and analysts
00:57:28 Realistic situation of managing millions of SKUs
01:00:01 High level model adjustments
01:02:13 Decisions steered by probability of rare events
01:04:44 Nikos’ take on adjustments
01:07:14 Wasting time on minor adjustments
01:09:08 Against manual day-to-day adjustments
01:11:43 Company-wide benefits of code tweaking
01:13:33 Role of data science team
01:15:35 Probabilistic forecasts deter manual interference
01:18:12 The million-dollar question on AI
01:21:11 Importance of understanding AI models
01:24:35 Value and cost of AI models
01:26:02 Addressing problems in inventory

About the guest

Nikolaos Kourentzes is a professor in predictive analytics and AI at the University of Skövde AI Lab in Sweden. His research interests are in time series forecasting, with recent work on modelling uncertainty, temporal hierarchies, and hierarchical forecasting models. His research focuses on translating forecasts into decisions and actions, in areas such as inventory management, liquidity modelling for monetary operations, and healthcare. He has extensive experience working in both industry and the public sector and has authored various open-source libraries to aid the use of advanced forecasting methods in practice.

Summary

In a recent LokadTV interview, Nikos Kourentzes, a professor at the University of Skövde, and Joannes Vermorel, CEO of Lokad, discussed forecast congruence in supply chain decision-making. They emphasized the importance of aligning forecasts with decisions, acknowledging that models may be misspecified. They distinguished between forecasting accuracy and congruence, arguing that the most accurate forecast may not be the best for decision-making if it doesn’t align with the decision’s objective. They also discussed the practical application of forecasting congruence in inventory decision making and its potential to mitigate the bullwhip effect. The role of AI and human involvement in forecasting congruence was also discussed.

Extended Summary

In a recent interview hosted by Conor Doherty, Head of Communication at Lokad, Nikos Kourentzes, a professor at the University of Skövde, and Joannes Vermorel, CEO and founder of Lokad, discussed the concept of forecast congruence in the context of supply chain decision-making.

Kourentzes, who leads a team focused on AI research at the University of Skövde, explained that his work primarily revolves around model risk and model specification. He emphasized the importance of aligning forecasts with the decisions they support, a concept he refers to as forecasting congruence. This approach aims to improve accuracy by acknowledging that models may be misspecified.

Kourentzes further distinguished between forecasting accuracy and forecasting congruence. While accuracy is a measure of the magnitude of forecast errors, congruence describes the consistency of forecasts over time. He argued that the most accurate forecast may not necessarily be the best for decision-making if it does not align with the decision’s objective function.

Vermorel, agreeing with Kourentzes, pointed out that mathematical metrics often fall short when put into practice. He gave examples of how different decisions can have diverse asymmetrical costs, such as selling perishable goods versus luxury items. Vermorel also discussed the ratchet effect in supply chain management, where fluctuations in demand forecasts can lead to irreversible decisions.

Kourentzes shared his shift from focusing solely on accuracy to considering other factors in forecasting. He emphasized the importance of understanding the underlying workings of the models and the assumptions they are based on. He suggested that once a collection of accurate forecasts is found, the most congruent one should be chosen.

Vermorel, on the other hand, shared that at Lokad, they optimize directly for financial outcomes, rather than focusing on mathematical metrics. He explained that gradients are crucial for optimization, as they provide the direction in which parameters should be adjusted to minimize errors. He also discussed the importance of probabilistic forecasting, which considers all possible futures, not just for demand, but also for varying lead times and uncertainties.

The discussion then moved to the practical application of forecasting congruence in inventory decision making and its potential to mitigate the bullwhip effect. Kourentzes explained that congruence and the bullwhip ratio have many similarities, and designing forecasts with congruence in mind can help reduce the bullwhip effect.

The role of human involvement in forecasting congruence was also discussed. Kourentzes believes that human intervention should not be eliminated, but rather guided to add value where it can. Vermorel, however, shared that Lokad no longer allows humans to adjust forecasts, as removing those adjustments led to improved results.

The conversation concluded with a discussion on the role of AI in forecasting congruence and decision-making in supply chains. Both Kourentzes and Vermorel agreed that while AI has a role to play in addressing forecasting challenges, it should not replace all existing methods and understanding the process is crucial.

In his final remarks, Kourentzes called for a shift away from traditional forecasting methods and towards a more integrated approach with decision-making. He emphasized the need to update our way of thinking, software, and textbooks, and welcomed the inclusion of people from various fields in the forecasting field. He concluded by stressing the importance of collaboration and diverse perspectives in addressing these challenges.

Full Transcript

Conor Doherty: Welcome back. Typically, discussions on forecasting center around the idea of accuracy. Today’s guest, Nikos Kourentzes, has a different perspective. He’s a professor at the Artificial Intelligence Lab at the University of Skövde. Today, he’s going to talk to Joannes Vermorel and me about the concept of forecast congruence. Now, Nikos, can you please confirm on camera that I pronounced Skövde correctly?

Nikos Kourentzes: That’s the best I can do as well.

Conor Doherty: Well, then I don’t have any more questions. Thank you very much for joining us.

Nikos Kourentzes: It’s my pleasure.

Conor Doherty: In seriousness, you work at the University of Skövde, in the Artificial Intelligence Lab. That sounds very impressive. What exactly do you get up to and what is your background in general?

Nikos Kourentzes: Right, so let me first introduce a bit about the lab and then I’ll go a bit into my background. We’re a diverse team of academics that are interested in AI research. The focus is mainly around data science, but the application space is quite diverse. For instance, as you already introduced, I will probably be talking about forecasting and time series modeling. But for instance, other colleagues are interested in topics like information fusion, visual analytics, self-driving cars, cognitive aspects of AI. That’s the great thing about the team because we have a polyphony of research and then, you know, when you have the discussions, you get a lot of diverse ideas that are going beyond your typical literature. At least I find it a very nice space to be at.

About the university, what I usually say to my colleagues is that, not being a Swede myself, I know that when you use Swedish names internationally, they can mean anything to people. So it would probably be helpful to say that the university, in terms of data science and AI, has quite a bit of tradition even though its name is not widely known. But, you know, I'm quite happy to have joined the team. As for myself, I've been working in forecasting and time series modeling, whether with statistics, econometrics, or AI, for the last more or less 20 years. I did my PhD at Lancaster University in artificial intelligence. That was in the business school. And my background originally is actually in management. But at some point, I said, okay, that's quite nice, I know what questions to ask, but I don't know how to solve them. So then I went and did some work in operational research, hence my supply chain interests, and eventually my PhD in artificial intelligence. And afterwards, I became more interested in econometrics. So I managed to make myself a bit diverse in the understanding of time series as well.

Conor Doherty: Thank you, Nikos. And actually, the way Joannes and I first came across your profile, or firstly how I came across it, was that a supply chain scientist who follows some of your work on LinkedIn sent me an article you had written about forecasting congruence, which included a link to your working paper on the topic. The thrust of the conversation today will be around forecasting and applying it to supply chain. But before we get into the specifics, could you give a bit of background on what forecasting congruence is and how it emerged as a research area for you?

Nikos Kourentzes: A good chunk of my work has been around model risk and model specification. Often in time series forecasting, we identify a model and we say, alright, now we go with it. But we don't really recognize that every model will be wrong in some ways. I mean, it's the usual mantra in forecasting, we always hear it: all models are wrong, some are useful. But I think we can go further than that, because we can start quantifying how wrong the models are. The other aspect, which the literature often doesn't go as far into, and this is changing, I have to say, it's not just me saying that, there are a lot of colleagues who say it, is that we have to connect the forecast to the decision that is being supported.

So congruence came out of these two ideas. I’ve worked with my colleague at Lancaster University, Kandrika Pritularga, who is also the co-author in the paper you mentioned. And we were quite interested to say, okay, if we both come with a point of view that models are in some sense misspecified, so we’re just approximating the demand that we’re facing or the sales depending on how you want to take that, then what is the real cost of that? And forecasting congruence essentially goes into the idea of saying, can we do something better than accuracy? Because accuracy in many ways assumes that you are doing a good job in approximating your data.

And you know, yeah, we're trying to do that in all earnestness, but we may just not be using the right model. For instance, you may have software that gives you a selection of models, but the correct approximation would be a model that is missing from your model pool. So that is where all of this comes in as a motivation: trying to connect forecasting with the decision once we recognize that our models will probably be misspecified. So that's a bit of the background.

If I want to be more scientific about it, one thing I should say is that usually, with my colleagues, we start our research topics with a slightly silly idea. So, you know, we're doing something else and we say, oh, there is an interesting hook here, let's explore it a bit further. And often, once you do that, you end up with something that can be a useful idea. Why I'm mentioning that is because I think what forecasting congruence offers on the table is a bit of different thinking. And that's why I think it's nice, because starting as a joke in some sense allowed us to see the whole point from a different perspective.

Conor Doherty: Joannes, I will come to you in a moment about this, but could you expand a little bit more? Again, when you say forecasting accuracy, everyone has more or less an understanding of what that means. But when you say that congruence, or forecasting congruence, helps people see things from a different perspective, could you explain that distinction a little more, so that people understand exactly what you mean by congruence in the context of time series forecasts?

Nikos Kourentzes: Right, so first of all, the name is not the most straightforward, and there is a reason for that. What we're trying to describe with forecasting congruence is essentially how similar forecasts are over time. Now, that is the easier way to say it, but there are a few problems. Many of the words that one could use for this, for instance stability, are already used in statistical forecasting, so we don't want to cause confusion there.

And the other problem is that, as we will probably get into a bit further on in the discussion, there are technical difficulties in measuring how similar forecasts are over time. For instance, if you think about a seasonal time series and a nonseasonal time series, they imply something very different, as the seasonality would itself impose a difference in the forecasts over time. That's the pattern that you have to manage there, so that's not the kind of dissimilarity we're interested in. And that's what requires a bit of, if you wish, mathematical gymnastics to define congruence. But here lies the difference with accuracy. Accuracy, irrespective of what metric you're going to use, we usually understand as a summary of the magnitude of your forecast errors.

Now, we would of course assume that getting the most accurate forecast implies providing the best information for the supported decisions. However, that assumes that the supported decisions have the same kind of objective function as the most accurate forecast, let's say minimizing your squared errors. But that's not the case. I mean, if you think about supply chain inventory modeling, we may have to think about costs due to order batching, we may have to think about overage and underage costs that may shift your position away from, let's say, the most accurate forecast. We may have to think about other aspects, like for instance constraints coming from our suppliers, or capacity limitations from production lines or from our warehousing, and so on. So once you think about the true cost of inventory, or of the supply chain more generally, then you suddenly see that the most accurate forecast is not necessarily the one that is best aligned with the decision. And that's really the more interesting point about congruence.

So, on one hand, there is a line of research, and my coauthors and I have published quite a bit in that direction, evidencing that most accuracy metrics don't correlate well with good decisions. That doesn't mean they are useless or anything like that, it's just that they don't provide the full story. So that pushes a bit towards congruence. Congruence, on the other hand, tries to say that if the forecasts are not changing too much over time, then on the one hand there is probably some confidence in the forecasts. But on the other hand, it is also a forecast on which people can plan with some consistency. I don't have to update my whole planning every forecasting cycle, because the forecast will be quite similar. So even if they are not the most accurate forecasts, they are failing in a predictable way that might make decision making easier. And that's actually what we find in our work as well. We find that the decisions supported by more congruent forecasts are decisions that are more consistent over time as well. So there is less effort to take those decisions.
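As a rough illustration of "forecasts that do not change too much over time", the sketch below measures how much the forecasts issued for the same target period get revised across successive forecast origins, and compares two toy models. This revision-based proxy, the synthetic data, and the two models are invented for illustration; they are not the formal congruence metric from the working paper, which is defined relative to the conditional mean, as discussed later in the interview.

```python
import numpy as np

def revision_instability(forecasts):
    """Rough congruence proxy: average variance, across forecast origins,
    of the forecasts issued for the same target period. Lower values mean
    forecasts that change less from one forecasting cycle to the next."""
    by_target = {}
    for origin, path in forecasts.items():
        for h, value in enumerate(path, start=1):
            by_target.setdefault(origin + h, []).append(value)
    return float(np.mean([np.var(v) for v in by_target.values() if len(v) > 1]))

# Toy monthly demand: seasonal pattern plus noise
rng = np.random.default_rng(0)
demand = 100 + 10 * np.sin(np.arange(60) * 2 * np.pi / 12) + rng.normal(0, 5, 60)

def naive_model(history, horizon=6):
    return np.repeat(history[-1], horizon)           # jumpy: follows the last noisy point

def averaging_model(history, horizon=6):
    return np.repeat(history[-12:].mean(), horizon)  # smoother, but ignores seasonality

for name, model in [("naive", naive_model), ("12-month average", averaging_model)]:
    forecasts = {t: model(demand[:t]) for t in range(36, 48)}
    print(f"{name:17s} revision instability: {revision_instability(forecasts):6.2f}")
```

The averaging model scores as far more "congruent" here, yet its flat forecast throws away the seasonality, which is exactly why Kourentzes cautions against chasing congruence alone.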

Conor Doherty: Well, thank you, Nikos. And Joannes, I'll turn over to you now. I feel like some of that probably resonates quite a bit with you. More accurate forecasts don't necessarily translate into better inventory decision-making.

Joannes Vermorel: Yes, I mean exactly. Our general perspective nowadays is that pretty much all the mathematical metrics, in the sense where you choose a formula and say this is the mathematical formula that characterizes the metric you are trying to optimize, when this formula basically falls from the sky or is just made up, even if it comes with good intent, let's say norm one, norm two, something that has some mathematical properties attached to it, are usually very underwhelming once put in production, for a variety of reasons.

More than a decade ago, Lokad started to evangelize the idea that people should not be doing what we call now naked forecasts. Fundamentally, I support Nikos in his proposition that a forecast is an instrument for a decision and you can only assess the validity of the forecast through the lenses of the validity of the decisions.

And that’s kind of strange because if you have 10 different decisions, then you might end up with inconsistent forecasts to support those decisions. And it feels bizarre, but the reality is it’s okay, even if it’s counterintuitive. And why is it okay? Well, because you have a set of decisions that may have very diverse asymmetrical costs in terms of overshooting or undershooting.

And thus, if you have a decision where if you overshoot, it is a catastrophe. Let’s say, for example, you’re selling strawberries. So strawberries, whatever you don’t sell at the end of the day, you pretty much throw it away. So whatever you overshoot, it’s catastrophic in the sense that it’s an immediate guaranteed loss or inventory write-off.

On the contrary, if you’re a luxury watchmaker and your items are made of gold, platinum and other fancy metals and stones, if you don’t sell them, the stock doesn’t expire. Even if whatever you forge and put into articles comes out of fashion, you can always take the materials back and reshape something that is more in tune with the present desire of the market.

So fundamentally, if you’re doing jewelry, you never have inventory write-offs. You might have some cost to reshape your products, but it’s a very, very different game.

One of the basic problems that is pretty much never mentioned in supply chain textbooks is just the ratchet effect. Let's say you are playing an inventory replenishment game. Every day, for a given SKU, you have a demand forecast, and if the forecasted demand exceeds a certain threshold, you pass a reorder.

But it turns out that if your forecast is fluctuating, your inventory always ends up catching the highest point of the fluctuation. Consider one month, for example: if your typical reorder cycle is something like one month, then your forecast is fluctuating during this month. And if every single day, so 30 or 31 days of the month, you just re-run the forecasting logic, you will invariably pass a purchase order on the day where your forecast is the highest.

It's a ratchet effect because your forecast is fluctuating up and down, and accuracy-wise it can be quite good to have those fluctuations; they capture the short-term variation nicely. But the price that you have to pay is that whenever you trigger a decision, you're committed to this decision.

And when you have those fluctuations, what typically happens is that you’re going to capture the upward fluctuation. The downward fluctuation is not so bad, you just delay something for one more day, but the upward fluctuation triggers the production batch, the inventory replenishment, the inventory allocation, the price drop.

Because again, it's the same sort of thing. You drop your price and you get a surge of demand caused by the price drop, but you had underestimated the demand: you thought you had too much stock, when in reality that wasn't the case. And now that you've lowered the price, you have accidentally engineered a stock-out position for yourself.

That’s all those sorts of things where you have those ratchet effects where if you have those fluctuations, you will act, and then the performance of your company will reflect the sort of extreme variation of your statistical model, predictive statistical model, whatever it is. It’s not good because you’re capturing, decision-wise, the noise of the predictive model.
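To make the ratchet effect concrete, here is a minimal simulation sketch with invented numbers and a deliberately naive trigger rule: an unbiased but noisy forecast is re-run every day of a monthly cycle, and the order quantity is committed on the first day the forecast crosses the reorder threshold. The committed quantity ends up systematically above the true demand, because the rule latches onto upward noise.

```python
import numpy as np

rng = np.random.default_rng(42)
true_monthly_demand = 100.0    # stable underlying demand over the cycle
trigger = 100.0                # reorder as soon as the forecast exceeds this
n_days, n_runs = 30, 10_000    # daily re-forecasting over a monthly cycle

all_forecasts, committed = [], []
for _ in range(n_runs):
    # The forecast is unbiased but fluctuates from one daily re-run to the next
    daily_forecast = true_monthly_demand + rng.normal(0, 5.0, n_days)
    all_forecasts.extend(daily_forecast)
    above = np.where(daily_forecast > trigger)[0]
    if above.size:
        # Ratchet: we commit to the quantity seen on the first upward swing
        committed.append(daily_forecast[above[0]])

print("average daily forecast (unbiased):", round(np.mean(all_forecasts), 1))
print("average quantity committed at order time:", round(np.mean(committed), 1))
```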

Nikos Kourentzes: May I add something? First of all, I quite agree. But it may help a bit to see also the same argument from the perspective of a time series guy like myself who was brought up to think in accuracy.

Where I eventually changed my mind is here: suppose you have the demand of one stock keeping unit, one SKU, and you find your best model and optimize that model on something like a likelihood, or by minimizing your mean squared error.

Now, the assumption behind doing that is that the model is a good approximation, and typically the error you minimize is the one-step-ahead prediction error. That's what we usually do, at least for the in-sample error that we minimize.

If your model were the correct model, the correct model implying that you somehow know the data-generating process, which is never true, then minimizing that error would give you a forecast that is good for all forecast horizons. But that's not the case, because your model is just an approximation.

So suppose you minimize your errors for one step ahead as we usually do, then your model may perform actually very well for this one-step-ahead prediction, but not over the lead time. The lead time requires further steps ahead.

If you would then say, “Oh, I can tune my model to be very good at maybe 3 months from now, let’s say three steps ahead,” well then you end up having the opposite effect. Your model is very good at being tuned at that forecast horizon, but not the forecast horizon that is shorter. So again, on the lead time, you miss out on information.

So what I'm trying to get at is that the traditional way of thinking about how we optimize models will invariably lead to effectively inaccurate forecasts, in the sense that they will always be calibrated for the error that the optimizer is looking at, and not for the actual decision that we're trying to support, which has a different horizon.
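A minimal sketch of this horizon effect, using invented autocorrelated demand, a simple exponential smoothing model, and a plain grid search: calibrating the same model on one-step-ahead errors and on six-step-ahead errors typically selects different smoothing parameters, so the model tuned for one horizon is not the one you would want over the lead time.

```python
import numpy as np

rng = np.random.default_rng(7)
# Autocorrelated demand: a deliberately misspecified target for exponential smoothing
y = np.empty(400)
y[0] = 100.0
for t in range(1, 400):
    y[t] = 100 + 0.7 * (y[t - 1] - 100) + rng.normal(0, 5)

def ses_forecasts(y, alpha):
    """Simple exponential smoothing; fc[t] is the (flat) forecast made before seeing y[t]."""
    level, fc = y[0], np.empty(len(y))
    for t in range(len(y)):
        fc[t] = level
        level = alpha * y[t] + (1 - alpha) * level
    return fc

def mse_at_horizon(y, alpha, h):
    fc = ses_forecasts(y, alpha)
    # SES forecasts are flat, so fc[t] is also the h-step-ahead forecast of y[t + h - 1]
    return np.mean((y[h - 1:] - fc[: len(y) - h + 1]) ** 2)

alphas = np.linspace(0.05, 0.95, 19)
for h in (1, 6):
    best = alphas[np.argmin([mse_at_horizon(y, a, h) for a in alphas])]
    print(f"smoothing parameter minimizing {h}-step-ahead MSE: {best:.2f}")
```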

This is where, for instance, a lot of research in shrinkage estimators or the work that colleagues and I have been doing on temporal hierarchies have helped a bit because these techniques always think about, “Let’s not overfit to the data. Let’s not get obsessed about minimizing some error statistic.”

So, you know, what Joannes described is essentially you can see it from the two perspectives. One is the effect on the supply chain, and the other is the statistical underpinning why you will have this invariably.

Joannes Vermorel: Yes, indeed. At Lokad, our practice nowadays, and it has been the case for quite a while as part of the quantitative supply chain framework, we do a pure financial optimization. So we directly optimize Euros or dollars.

And indeed, these metrics are discovered. We even have a specific methodology for that, called experimental optimization, because supply chain systems are very opaque and very complex, so the metric is not a given; discovering it is a whole topic in itself.

Now, the interesting thing is about the forecasting horizons and the forecast varying with them. I have been thinking along those lines for a long time, but essentially the latest Makridakis forecasting competitions, M4, M5, M6, have shown that pretty much the best models are the best for all horizons, no matter which one you pick.

At Lokad, we landed number one at the SKU level for Walmart in 2020, and we were the best for one day ahead, 7 days ahead, everything. For a long time I had been working with the possibility that you might have models that perform better at certain horizons.

But if you look at the modern sort of models, the ones built with differentiable programming, for example, the modern classes of forecasting models, the performance is now pretty uniform. It's very rare nowadays that we have models that perform better one step ahead rather than six months ahead.

And essentially, there are models that are indefinite horizon, they forecast till the end of time, and you just stop to save compute resources because that would be a waste. But nevertheless, the point stands that in general, the metric that is being optimized should not be assumed as known.

It should not be assumed to be one of the elegant mathematical metrics, like log likelihood if you want to go Bayesian, or mean squared error, or whatever that is. That is very nice if you want to prove theorems in papers, but proving theorems and properties of models does not translate into operational results.

It can create a lot of subtle defects in the behavior that are not readily apparent from the mathematical perspective.

Conor Doherty: Well, thank you. Nikos, just to come back to something you said earlier and push forward, you referred to yourself as a time series guy and said that previously you had focused on accuracy, and then you said, “Oh, I changed my mind and I moved beyond accuracy, or beyond focusing on accuracy in isolation.” Could you actually describe that process? Because whenever I have conversations about forecasting, it's quite difficult to convince people not to look at forecasting accuracy as the end in and of itself. I recall even in your paper you said, “The goal of forecasting is not accuracy.” That statement is quite controversial depending on who you say it to. So, how exactly did you go about that journey?

Nikos Kourentzes: Yeah, I mean it is controversial, you’re quite right. But I think it’s an argument that people who are in the time series world are happier to accept than users of forecasts, if I can say it like that. Let me start by first picking up on something you just mentioned on the forecast horizons.

I think this understanding that the models are able to produce good forecasts for all horizons comes in how we compare the models themselves. Like you know, picking up again on the M competitions that you mentioned. This is a useful reading of the M competitions, but all these models are optimized in similar ways. Even if you take a simple exponential smoothing and you change your objective function, how you estimate your parameters, you can actually make it perform much better or much worse on different objectives or different horizons.

So this for me was also a starting point of saying, well, maybe there is something going on here. And this is where, for instance, I am a bit critical of just using standard… let me rephrase that. When I work with doctoral students or Master's students doing a dissertation, sometimes I ask them to do the implementation the hard way rather than pick up a library, because I want them to understand what's really happening under the model. And that is where you can find some of the details and say, well, does this make sense?

One of the things that was mentioned before is that we do enjoy formulas and expressions that are easy to handle mathematically. I mean easy in quotes, you know, sometimes they're quite involved, but they're still easy in the sense that, with the right assumptions, you can still work out the mathematics. But this is where the issue lies for me: in doing that, we end up with a nice understanding of what is going on under the assumptions, and that's very useful. But we often then forget to ask, right, what if this assumption is now violated? What if we have model misspecification?

So this model misspecification for me is the starting point. Once you introduce that, many of these expressions become problematic. I should be careful here, being an academic myself: that does not make this research in any way useless. But it's a stepping stone. We have to understand all the properties and then say, right, now let's introduce model misspecification.

I have a few colleagues from Spain I have worked with on calibrating inventory policies. And one paper we are trying to get through review, which is always a complicated aspect for academics, is exactly trying to do that. It's trying to say, you know, suppose we have a very simple policy like an order-up-to policy: this is what we would get if we assumed the model is fine, and this is what we would get if we say no, the model is misspecified. Because you can see that there are additional risks in the supply chain, there are additional risks in setting the inventory.

So for me, the moment of saying accuracy is not enough is when I start thinking, well, the model is misspecified, what does this additional risk imply? If you think about it in terms of stochastic inventory policies, what we're saying is, oh, there is a stochastic risk coming from the demand process, fine. But that's not the only risk. And I'm not suggesting in any way that I'm capturing all the risks in the way I'm thinking about it, but at least the logic says there has to be something more than the single objective of accuracy.

It doesn't mean dropping that objective; even if you drop it, there must still be some kind of correlation between that objective and the other objectives. Because if you completely ignore having an accurate forecast, in the wider sense, then you're not going to be doing your job well, at least in my experience.

You may switch the objective completely; for instance, with congruence we find, even theoretically, that there is a connection with accuracy. It's not a 100% connection, but there is a weak connection. So that doesn't mean for me, okay, therefore we throw accuracy out of the window. But it's surely not the end of the discussion. Now, if you can replace it with a better metric that still has similar properties, or a collection of metrics, great, I'm happy with that. I don't care what we call the metric, or whether it's my metric or someone else's metric. But I really believe that when we accept model misspecification and the risks it implies in the process, we cannot stick with the traditional metrics.

Conor Doherty: Thank you, Nikos. And Joannes I will come back to you in a moment, but I do want to underline a point, well two points. One, I think I misspoke. I should have said accuracy is not the goal of forecasting. I think I said it the other way around. But to follow up on a point that you just made, and it is a key point I think of the paper, is that you’re not advocating, correct me if I’m wrong, you’re not advocating pursuing the most congruent forecast. It’s a mix between accuracy and congruence. Is that a fair reading? And if so, could you please expand on that for someone who might not understand how do you pursue a mix of these two metrics?

Nikos Kourentzes: So I should first stress this is work in progress, so I don’t have the full answer on that. But it seems that a simple heuristic would be something like, once you find your collection of accurate forecasts, then from those you pick the most congruent. Don’t pick the most congruent forecast directly because that might be a very inaccurate forecast, if that makes sense.

So if I phrase these two objectives somewhat differently, there is a region where both of them improve together, and then you end up with a trade-off. When you reach that trade-off, go and weight the congruence side more.

Conor Doherty: Well, that was going to be the next question. You use the term trade-off, and that's something we focus on a lot, the trade-offs. How do you, and I understand this is a work in progress, how do you, or how does a company, weigh those trade-offs between accuracy and congruence? And I know you're trying to reduce the jitteriness, the fluctuation, among the congruent forecasts. But even still, forecast accuracy is simple. We can agree it might be flawed, but it's simple to comprehend: I just want to be more accurate, I want the number to go up. But now we're introducing another dimension. So again, the weighting of that, how does a company approach it, is more specifically what I mean.

Nikos Kourentzes: Yeah, so I’m struggling here to give a clear answer because I don’t have the clear answer yet. But maybe I can give an example of the logic.

I made the point earlier about seasonal time series. The difficulty in defining congruence as a metric, and this is a discussion I have had with some colleagues who say oh, but you could do this or you could do that instead, is essentially the idea of the conditional mean of the forecast. What is that? Suppose that the demand is indeed seasonal, so there is some underlying structure. That underlying structure, which is unknown, is the conditional mean.

If I would say I want the forecast that is the most stable or the way we call it congruent, in principle that would be a straight line, a flat line. That flat line would carry no information about the seasonality. So the most congruent forecast effectively would be a deterministic forecast that assumes no stochasticity, no structure in the series, nothing like that. So that’s clearly a bad forecast.

So where the balancing act comes is that we want the most congruent forecast in terms of this conditional mean. So we want it to try to be seasonal, we want it to try to be following this structure. But we’re not going to push it enough to say I’m going to try to pick every single detail. So you could say there is some connection with overfitting and underfitting, but it’s not 100% that connection because we can all agree that overfitting is a bad thing.

But when we looked at the same aspect in terms of over-congruence and under-congruence, it's easy to show that under-congruence is a bad thing, like the flat line we mentioned before. But over-congruence is actually not necessarily a bad thing. And the “not necessarily” is where things get interesting and complicated. The “not necessarily” connects a lot with the points that Joannes raised before, that there are other aspects of inventory management and the supply chain we're interested in. So by having this additional congruence in the forecasts, we're effectively making the life of the decision makers easier later on. From a statistical perspective, these will not be the most accurate forecasts, but they will provide sufficient information for the decision maker to act, so that the subsequent decisions will be easier to obtain financially, or in whatever other inventory metric you're going to use, like for instance less waste or something along those lines.

I'm being a bit vague here because I don't have something better than the heuristic I mentioned before to offer right now. As I said, hopefully the next paper will provide the full mathematical expression to say, ah, it's actually a trivial problem. I do not have that yet. So I would say in practice right now, what I suggest people do is identify your collection of accurate forecasts and, from those forecasts, pick the one that maximizes congruence. So in some sense a two-step selection: first get a pool of accurate forecasts, and then go for the congruent one.

What’s interesting is that it turns out that in most of our experiments, this happens to be a model that either is using some sort of tricks from shrinkage estimators or some sort of tricks from temporal aggregation and so on because these tend to smooth out forecasts. I should stress here that there are other colleagues as well who have come up with similar ideas. They can modify the loss function to have, for instance, a term to also try to minimize the variability of the forecast and so on. Where I think the congruence metric comes a bit different is because we try to show also the connection with accuracy, so provide the expressions to say this is exactly where they’re connected, this is exactly where they diverge.
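A minimal sketch of the two-step heuristic described above, with everything invented for illustration: a few toy models are evaluated over rolling origins, the models within an arbitrary 10% tolerance of the best mean absolute error form the accuracy pool, and the least-revised model in that pool is selected as the most congruent (using forecast revision variance as a crude stand-in for the paper's congruence metric).

```python
import numpy as np

rng = np.random.default_rng(3)
y = 100 + 10 * np.sin(np.arange(72) * 2 * np.pi / 12) + rng.normal(0, 5, 72)
H = 6  # horizon of interest (e.g. the lead time)

def seasonal_naive(history):           # repeat the values from 12 periods ago
    return np.array([history[len(history) - 12 + h] for h in range(H)])

def moving_average(history, window):   # flat forecast from a trailing average
    return np.repeat(history[-window:].mean(), H)

candidates = {
    "seasonal naive": seasonal_naive,
    "MA(3)":  lambda hist: moving_average(hist, 3),
    "MA(12)": lambda hist: moving_average(hist, 12),
}

scores = {}
for name, model in candidates.items():
    errors, by_target = [], {}
    for origin in range(48, 66):                   # rolling forecast origins
        fc = model(y[:origin])
        errors.extend(np.abs(y[origin:origin + H] - fc))
        for h, v in enumerate(fc, start=1):
            by_target.setdefault(origin + h, []).append(v)
    mae = float(np.mean(errors))
    revision = float(np.mean([np.var(v) for v in by_target.values() if len(v) > 1]))
    scores[name] = (mae, revision)

# Step 1: accuracy pool = models within 10% of the best MAE (tolerance is arbitrary)
best_mae = min(mae for mae, _ in scores.values())
pool = {name: s for name, s in scores.items() if s[0] <= 1.10 * best_mae}
# Step 2: within the pool, pick the model whose forecasts get revised the least
selected = min(pool, key=lambda name: pool[name][1])
print({k: (round(v[0], 2), round(v[1], 2)) for k, v in scores.items()})
print("selected:", selected)
```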

Conor Doherty: Thank you, Nikos. Joannes, your thoughts?

Joannes Vermorel: Yeah, I mean, at Lokad, we take this from a slightly different angle. We go the radical route of literally dollars of error, euros of error, and we assume that the metrics are going to be discovered, so they are completely arbitrary. It's brutally in your face to optimize something where the metric can be anything. So how do we approach that? Well, it turns out that if the metric can be anything, it's effectively a program, you know, a computer program. You might have metrics that cannot even be represented as computer programs; in mathematics you can invent the sort of things that escape even computers. But for the sake of grounding the discussion, we assume that we are not going into super bizarre, hyper-abstract mathematical spaces. So we have something that can at least be computed. So this is a program, an arbitrary program.

The good thing is, if you want to optimize pretty much anything, what you need is gradients. As soon as you have gradients, you can steer. For the audience: as soon as you have the slope, it means that you can steer your parameters in the direction that minimizes whatever you're trying to minimize. So whenever you want to optimize, to get something higher or lower with a specific intent, being able to get the gradients, which give you the direction in which you should go, helps enormously.

That's where differentiable programming really helps, because differentiable programming is literally a programming paradigm, one that Lokad uses extensively. It lets you take any program and get the gradients, and that's super powerful. That's typically how we connect this financial perspective. We are going to discover those financial elements. It's going to be a messy, very haphazard process, and what we end up with is a program that is kind of weird and just reflects the quirks, the oddities, of the supply chain of interest.

We can differentiate any program, so we can differentiate that, and then we can optimize whatever model we have based on it, granted that the model itself is differentiable. So that restricts our approach to models that have a differentiable structure, but lo and behold, that is actually the majority. In the M5 competition, the Walmart competition, we basically ranked number one at the SKU level with a differentiable model.

So, enforcing differentiability is not something that prevents you from getting state-of-the-art results. Now, fast forward, that's just the gist of what happens when you give up on predefined metrics, because typically we end up balancing tons and tons of things.
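As a minimal sketch of steering a decision directly against a financial cost using gradients: this is not Lokad's differentiable programming stack, just a toy newsvendor-style cost with asymmetric overage and underage penalties, where the (sub)gradient is worked out by hand as a stand-in for what an automatic differentiation tool would produce for an arbitrary cost program. All numbers and distributions are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
demand = rng.gamma(shape=4.0, scale=25.0, size=10_000)   # sampled uncertain demand

underage_cost = 7.0   # euros lost per unit of unmet demand
overage_cost = 1.0    # euros lost per unit of leftover stock

def expected_cost(q):
    return np.mean(underage_cost * np.maximum(demand - q, 0.0)
                   + overage_cost * np.maximum(q - demand, 0.0))

def cost_gradient(q):
    # Hand-derived (sub)gradient of the cost above; autodiff would provide this
    # automatically for an arbitrary differentiable cost program.
    return np.mean(np.where(demand > q, -underage_cost, overage_cost))

q = float(demand.mean())          # start from the "accurate" mean forecast
for _ in range(500):
    q -= 0.5 * cost_gradient(q)   # steer the decision down the cost slope

print("mean demand:           ", round(demand.mean(), 1))
print("cost-optimal quantity: ", round(q, 1))   # near the 7/8 demand quantile
print("expected cost there:   ", round(expected_cost(q), 1))
```

The optimum lands well above the mean forecast, which is the asymmetric-cost point Vermorel made earlier with strawberries versus luxury watches.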

Now, another thing is probabilistic forecasting, the idea that we look at all the possible futures, and not just for demand. For example, you were mentioning lead times with possible horizons and whatnot, but the reality is that the lead time is varying; you have uncertainty there as well.

Even worse, the lead time that you will observe is coupled to the quantity that you order. If you order, for example, 100 units, it might go faster than if you order a 1000 units just because, well, the factory that is producing the stuff is going to need more time.

So you end up with tons of correlations that shape and structure the uncertainty. So the one-dimensional perspective on the time series is insufficient, even if we are talking of just one SKU, because we have to add some layers of extra uncertainty, at least with the lead times, at least with the returns with e-commerce, and so on.
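As an illustration of this compounded uncertainty, here is a minimal Monte Carlo sketch in which the demand over the lead time combines a stochastic demand rate with a stochastic lead time, and the lead time is loosely coupled to the order quantity, echoing the 100 versus 1,000 units example. The distributions and the coupling rule are invented, not calibrated to anything real.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 100_000   # Monte Carlo samples of possible futures

def lead_time_demand(order_qty):
    base_lead = rng.gamma(shape=9.0, scale=1.0, size=n)   # lead time in days, ~9 on average
    lead_time = base_lead * (1.0 + order_qty / 2000.0)    # larger orders take longer (coupling)
    return rng.poisson(lam=12.0 * lead_time)              # demand accumulated over the lead time

for qty in (100, 1000):
    ltd = lead_time_demand(qty)
    print(f"order {qty:4d} units: mean lead-time demand {ltd.mean():6.1f}, "
          f"95% quantile {np.quantile(ltd, 0.95):6.0f}")
```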

I will be using the term congruence loosely because you just introduced it, but our practical observation, when we went to probabilistic models, was that those models, numerically speaking, were vastly more stable.

That was very interesting because most of those instabilities, incongruencies, whatever you call them, simply reflect that you have a lot of ambient uncertainty. And you have areas of relatively flattish probabilities. So, according to pretty much any metric, for as long as you have just a point forecast, the model can fluctuate widely.

And in terms of metrics, pretty much any metric that you pick is going to say pretty much the same thing. So you end up with the bizarre property, if you're stuck with point forecasts, that in a high ambient uncertainty situation you can have very, very widely different forecasts that are, according to your metrics, quasi the same.

And thus, you end up with this jitter and whatnot. And it’s when you go to those probabilistic forecasts, you enter a realm where, well, the good model is just going to be one that expresses this spread, that expresses this high ambient uncertainty. And that in itself is much more, I would say, constant.

That’s very strange, but you end up with, we had tons of situations where we were struggling so much to get a little bit of numerical stability, and then when you go to the realm of probabilistic forecasts, out of the box, you have something that is vastly more stable where those problems that were really hurting just become secondary.

So, that’s kind of interesting. And then we can tie all of that with other things. When we go beyond time series forecasting, we have discussed that a little bit on this channel, but that would be a tangent is that most of the supply chain problems come with a lot of coupling between SKUs, coupling between products.

And thus, we very frequently have to upgrade toward a non-time series perspective, a more high-dimensional perspective. But again, that’s a digression upon digression.

Nikos Kourentzes: I completely agree. Probabilistic forecasting is absolutely necessary. I've reached the point where, when I'm looking at some of the unfinished papers that have been on the back burner for a few years and I see there's no probabilistic forecasting, I think I need to rework the whole thing. It has to have probabilistic forecasting; it is 2024 now. But here's the thing: I like probabilistic forecasting, especially the way that Joannes has explained it, because it gives me another way to make the point about model misspecification.

When you look at the uncertainty around your forecast, we typically assume that this uncertainty is due to the stochasticity of the time series. But a good part of that uncertainty is because of the model being uncertain. You have the uncertainty coming from the data, the uncertainty coming from your estimation, and the uncertainty of the model itself. It may be missing some terms, or it may have more terms, or it may just be completely off. Splitting that uncertainty remains a big problem.

If you don't split that uncertainty, you will often find that a lot of different models, unless they're substantially different, will end up masking the demand uncertainty with their model uncertainty. They will give you higher uncertainty, empirically speaking at least, and a good part of that uncertainty will look similar across models, because what it's trying to tell you is that all of these models are problematic.

You’re not getting to the real depth of having this uncertainty due to the stochastic elements of the demand. I still have not managed to find a good way to solve it and I haven’t seen something in the literature. But at least the probabilistic forecasting is honest about saying, well look, this is your uncertainty. It’s a bit bigger than we thought if you went from the point forecast. That’s a good step towards the solution.
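A minimal sketch of the decomposition Nikos is pointing at, under a deliberately oversimplified model (a constant mean with Gaussian noise): one prediction interval is built from the demand noise alone, the other also resamples the estimation step via a bootstrap. Model uncertainty proper, the wrong functional form, is not captured at all, which is exactly the part he says remains hard to split out. The data and history length are invented; the gap widens as the history gets shorter.

```python
import numpy as np

rng = np.random.default_rng(5)
history = rng.normal(loc=50.0, scale=10.0, size=8)   # only eight periods of history

n_sims = 50_000
sigma_hat = history.std(ddof=1)

# (a) Demand noise only: the estimated mean is plugged in as if it were exact
data_only = rng.normal(history.mean(), sigma_hat, n_sims)

# (b) Demand noise + estimation uncertainty: bootstrap the mean estimate as well
boot_means = np.array([rng.choice(history, size=history.size, replace=True).mean()
                       for _ in range(2000)])
with_estimation = rng.normal(rng.choice(boot_means, n_sims), sigma_hat)

for label, sims in [("demand noise only        ", data_only),
                    ("+ estimation uncertainty ", with_estimation)]:
    lo, hi = np.quantile(sims, [0.05, 0.95])
    print(label, f"90% interval width = {hi - lo:5.1f}")
```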

Conor Doherty: Thank you both. It does occur to me that I have both two academics and two practitioners right here. I think at this point it would behoove me to steer things towards the practical. The entire thrust of what Lokad does, but certainly your paper and your research overall, Nikos, is applying this to inventory decision making. On that note, Joannes, you talked about the quirks and oddities of supply chain, varying lead times, and the bullwhip effect. Your position, Nikos, in the working paper we're talking about, was that pursuing forecasting congruence can help mitigate the bullwhip effect. Could you sketch that out, so people understand how this idea can help contend with what is a serious problem, the bullwhip effect?

Nikos Kourentzes: I presume your audience is quite well aware of that. The issue I have with a lot of bullwhip effect research is that it’s more about describing it rather than actually providing actions to remedy it. At least coming especially from the time series point of view where we say, oh look, here’s your bullwhip ratio. But that in many ways is just a description of the problem. It doesn’t tell you how do you deal with it once you have measured it.

This is where I’m saying, well okay, if I want to connect forecasting to the decision rather than keep them separate, then necessarily I need to have something that can tell me, well if you go that direction, you’re going to reduce your bullwhip. It turns out that without understanding that in the beginning, if you work out the equations, the congruence and the bullwhip ratio at least seem to have a lot of similarities. This imposition of similarity over periods, or congruence as we simply say it, seems to be aligned a lot with the idea of having a low bullwhip coming from your forecasts. Of course, there are many other reasons you’re going to have a bullwhip.

So if we're going to use a congruence metric, or something similar to it, for selecting or specifying your forecasting models, then you can already target a solution that will be more favorable in terms of the bullwhip. Here, since I'm working in the forecasting sphere, I have to recognize that the bullwhip is much wider than forecasting. Forecasting is just one part of it; there are so many other elements that come into play. But at least for the forecasting part, if you think in terms of congruence and similar ways of thinking, you can design forecasts that are at least favorable towards reducing it.
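As a minimal, illustrative sketch of the link between forecast reactivity and the bullwhip, the following computes the bullwhip ratio, the variance of orders over the variance of demand, under a simple order-up-to policy driven by exponential smoothing. The policy (no safety stock, negative orders allowed) and the numbers are invented simplifications; the point is only that a more reactive forecast amplifies the variability passed upstream.

```python
import numpy as np

rng = np.random.default_rng(9)
T, lead_time = 2000, 4
demand = 100 + rng.normal(0, 10, T)   # i.i.d. demand for simplicity

def bullwhip_ratio(alpha):
    """Var(orders) / Var(demand) under an order-up-to policy driven by
    exponential smoothing with parameter alpha (no safety stock)."""
    level, prev_target, orders = demand[0], None, []
    for d in demand:
        level = alpha * d + (1 - alpha) * level
        order_up_to = level * (lead_time + 1)        # cover demand over lead time + review
        if prev_target is not None:
            # Order = observed demand + change in the order-up-to level
            orders.append(d + (order_up_to - prev_target))
        prev_target = order_up_to
    return np.var(orders) / np.var(demand)

for alpha in (0.5, 0.2, 0.05):
    print(f"smoothing alpha = {alpha:4.2f}  ->  bullwhip ratio = {bullwhip_ratio(alpha):4.2f}")
```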

Joannes Vermorel: When we start touching on the bullwhip, when I said we look at the decision and we optimize euros and dollars, I was actually simplifying. Because the reality is we are actually looking at the sequential decision-making process. And here we are touching on essentially the stochastic optimization of sequential decision-making processes, which was a topic discussed with Professor Warren Powell.

We are optimizing not just the one decision that comes next but all the other ones that come afterwards. We need to have a mechanism to bring back all this information from the future where we have role-played the future decisions that will be generated through those forecasts into the present day. That’s where differentiable programming shines because essentially you have a program that role-plays, simulates if you want, the decisions of the future and you need to be able to gradient it back so that you can reinject those future financial outcomes into the engineering of your present-day forecasting.

The way we typically look at it, if we go back to the bullwhip, is: don't be surprised by the bullwhip when there is nothing in your optimization framework that even acknowledges the euros of cost it will generate over time, nothing that does this sequential decision-making analysis of just repeating the decision over time and seeing whether you are going to have bullwhip problems.

The solution is not that complicated. It is to optimize not just the one next decision we are looking at, but all that follow. Implicitly, what we are optimizing is kind of the policy. But typically, people think of policy optimization as strictly independent from forecasting. They would have the policy optimization just consume the forecast. The way Lokad sees it is that no, those things are actually entangled.

The superior forecast comes hand in hand with the superior policy; the two are very connected. There is even some recent paper from Amazon, “Deep Inventory Optimization”, where they literally do away with the distinction entirely. They directly have something that unifies the predictive modeling take and the operational research take, which are typically separated. They say no, we are just going to do the two things at once, and they have a predictive optimization model all at once through deep learning.

That's very interesting because it literally means the decision is optimized predictively, but the forecast itself becomes completely latent. That's just another way to look at the problem, but it's very futuristic and creates other problems. The way we look at it, we still have the predictive modeling part and the stochastic optimization part as two stages, but two stages that are highly coupled, with a lot of back and forth between the two.

Nikos Kourentzes: I actually think that keeping the stages separate has its benefits. However, they should not be isolated and there is a reason for it. I completely agree that one should lead the other. I have worked in the past with the idea of having a joint optimization for both inventory policy and forecasting. The paper is out, so the details are there for people to go in if they want to see what’s happening. My concern with this work was that I couldn’t make it scalable. I didn’t have a way to make the optimization in a way that would allow me to handle a large number of SKUs. This could be due to my limitations in optimization rather than the setup itself.

I do think that keeping the two steps separate helps in having more transparency in the process. If I have a joint solution and I suddenly say your order for the next period should be 10, and someone says well, I think it should be 12, it's very difficult to justify why 10 has more merit than 12. If you understand the forecast and the policy driven by the forecast, you can have a more transparent discussion. “Alright, here's my forecast, these are the ins and outs of the forecast, here is my policy driven by a good forecast, or potentially even adjusted because of the forecasting options I have,” or vice versa, you can say, “If I'm stuck with these policies, maybe only these kinds of forecasting options should be at play.” But then you still have the transparency of saying, “I can see elements of problematic forecasting here, I can see elements of problematic ordering here.”

And the other element is that I have an issue with people going completely into obscure optimization or forecasting, where you place a very big trust in deep learning. No matter how we do the modeling, at some point humans will interact with the model and its outputs. Research and my experience suggest that if people understand what is happening, their interaction with the model and the numbers, and the adjustments they may make to incorporate contextual information, will be more successful.

If it's a very obscure number, this black box, then people will either not know what to do with the number or they will interact with it destructively. I like to keep the separation because it helps transparency. It decomposes the problem: this is the contribution coming from here, this is the contribution coming from there. So I'm inclined to agree quite a bit with the approach that Joannes is describing. We have to somehow join the tasks, have one lead to the other, but we also have to be able to describe what each step is doing.

Conor Doherty: Thank you, Nikos. I’ll come back to you, but I do want to follow up on a point there. You mentioned human involvement and override a few times. What is the role of human involvement in terms of forecasting congruence? The tendency often is if you’re just measuring accuracy to say, “the model is wrong, I know better, let me intervene,” and of course then you’re just increasing the noise in many cases. How does forecasting congruence as a concept deal with that? Does it involve a lot of override or not?

Nikos Kourentzes: This behavioral forecasting, or judgmental adjustments, it has different names in the literature, is something I think we still don't know enough about, although it's a very active area of research. Some papers argue we should eliminate these adjustments because they're counterproductive or even destructive in terms of accuracy or the end result. The issue with this thinking is that you have to have a metric. If I use mean absolute percentage error, I'm going to get one answer. If I use mean squared error, I'm going to get another answer. If I use congruence, I'm going to get another answer.

However, the question I then have goes back to our very initial point of the discussion, which is: why don't I just stick to accuracy? I mean, it's the same for you guys, you're not sticking just to accuracy. As long as we recognize that this is important, then obviously we need to adjust or evaluate the behavioral aspects of the forecasting process, or the inventory process, with a metric that is more aware than just accuracy. I don't think we should do away with human intervention. I think there is sufficient evidence that when the contextual information people can use is rich, they can do better than most models. However, they cannot add value consistently. There are many cases where they just feel they need to do something, or they may be overreacting to hype, or to information where it is very difficult to understand how it would impact your inventory. In those cases, it's a destructive interaction with the model or the forecasts.

We need to retain the human element because it can add value, but we need to guide where people should add value. It is a time-consuming process. If I can tell the analysts to leave certain tasks to full automation and put their attention on specific actions, then I can also make their job more effective. They can put more time and resources into doing what they are good at, better. Congruence comes into this discussion because, if we have to go beyond accuracy, then in evaluating which interventions add value, it can help to discriminate those in the inventory setting, or in the decision-making setting more generally.

I would make a similar argument for the orders. Models or policies will provide you a probably good baseline if you're doing your job well as an analyst. However, I cannot see that this can universally be the most informative number. There will always be some elements, some disruption that just happened this morning in the supply chain for example, something that is difficult to assess. And this example will not have a problem of aging well or not: there is some conflict happening in the world. Typically, there is always some conflict happening in the world. Sometimes it will affect your supply chain, sometimes it will not. Sometimes it may put pressure, let's say, on inflation and so on, so your consumers may start acting differently. These are things that are extremely difficult to model.

So this is where I have trust in experts and analysts who have the time to do their job properly. And maybe I can finish with this, in terms of the adjustments: research suggests that decomposing your adjustments helps. That is, if you're going to say, “Okay, I'm going to refine the number by 100,” then also say, “Okay, why 100? 20 because of this reason and 80 because of that reason.” That correlates a lot with what we were saying before, decomposing, if you wish, or keeping the two steps of forecasting and inventory distinct, yet not isolated.

Because if you're going to say, “Alright, I'm going to change my order by x%,” and we ask the person who is doing that, “Can you please explain which part of that comes from your understanding of the risk in the forecasting model, and which part from the supply chain realities?”, then potentially they can come up with a better adjustment.

Conor Doherty: Thank you, Nikos. Joannes, I’ll turn to you. You’re a huge fan of human overrides, am I correct?

Joannes Vermorel: No, during the first five years at Lokad, we were kind of letting people do forecast adjustment and it was a terrible mistake. The day we started becoming a little bit dogmatic and we just prevented it entirely, results just improved dramatically. So, we don’t allow that pretty much anymore.

So first, let’s consider the role of humans. People talk about one SKU, but that’s not typical. A typical supply chain is millions of SKUs. So when people say they want to make adjustments, they are actually micromanaging an incredibly complex system. It’s a little bit like stepping into the random access memory of your computer and trying to rearrange the way things are stored while you have gigabytes of memory and drive and whatnot. You’re just cherry-picking some stuff that caught your attention, and it’s just not a good use of your time.

And no matter how much information you get, you almost never get it at the SKU level. So yes, there is something happening in the world, but is it something at the SKU level? Because if your interaction with the system is tweaking a SKU, on what grounds does this high-level information translate into anything remotely relevant at the SKU level? So we have this massive disconnect.

People tend to think in toy examples, but a realistic situation is 10 million SKUs, and that’s a baseline for a company that is not even super large. That’s my beef, and that’s where we at Lokad have seen massive improvement, because those adjustments are mostly nonsense. You’re just cherry-picking 0.5% of the SKUs to do stuff; it doesn’t make sense and it usually creates a lot of problems. More than that, it creates a lot of code, because people don’t realize that allowing interaction means you need to write a lot of code to support it, and a lot of code that may have bugs. That’s the problem of enterprise software. People typically look at it as if only the mathematical properties mattered, but enterprise software has bugs, even the software that Lokad writes, unfortunately.

And when you have a large company and you want human interaction, you need workflows, approvals, checks, auditability. So you end up with so many features that you start with a model of maybe a thousand lines of code, the statistical model if you wish, and you end up with a workflow of a million lines of code just to enforce everything.

So yes, the intent is kind of good, and I believe there is value in human interaction, but absolutely not in the typical way it’s practiced. The way Lokad approaches human interaction is to say: okay, something is happening in the world, yes; now let’s revisit the very structure of the model, of the predictive model and of the optimization. The classical stance in the literature is to think of models as something that is given: you have a paper, it’s published, so you operate with that. At Lokad, we don’t operate that way. We approach predictive modeling and optimization through programming paradigms. Lokad doesn’t have any models; we only have a long series of programming paradigms. So it’s always completely bespoke and assembled on the spot.

And so essentially it’s code with the right programming paradigms. When something happens, those programming paradigms give you a way to express your predictive or optimization models in ways that are very tight, very lean, very concise. It’s literally: take those 1,000 lines of code and make them 20, with proper notation if you want.

Then you can actually go back to your code and think: okay, I have something and I need to make an intervention. It’s not at the SKU level; it’s very rare that you have information at that level. The information you get from the outer world is typically much more high-level, and so you will typically tweak some high-level aspect of your model. And that’s the beauty: you don’t necessarily need to have a lot of very precise information.

For example, let’s say you’re in the semiconductor industry and you’re worried about tensions between China and Taiwan heating up. What you would say is: I’m just going to take the lead times and add a tail, where I say, for example, there is a 5% chance that the lead times double. Normally lead times are very long in semiconductors, like 18 months, but here you add, out of thin air, say a 5% chance annually that the lead times will double, for whatever reason.

You don’t need to be precise. In the end it can be a conflict, it can be a series of lockdowns, it can be a flu that closes harbors, it can be any sort of thing. But that’s the beauty of this sort of probabilistic approach: combined with programming paradigms, it lets you inject high-level intent into the very structure of your models. It’s going to be very crude, but it will let you do what you want directionally, as opposed to micromanaging the overrides at the SKU level.

And the interesting thing, if I go back to this example where we add this 5% chance of doubling lead times, is that you can literally name this factor. You would say: this is our Fear Factor, and that’s it. It’s my fear factor for the really bad stuff happening, and that’s fine. The beauty of it is that once you have that, all your decisions will be gently steered toward this extra probability of a rare event, and you don’t have to micromanage SKU by SKU and do all sorts of things that won’t age well.
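To make the Fear Factor idea concrete, here is a minimal Python sketch, purely illustrative and not Lokad’s actual tooling: lead times are represented as Monte Carlo samples, and the 18-month baseline, the normal distribution, and the parameter name are all assumptions chosen for the example. The point is that the rare-event risk lives in one named, commented line.

```python
# Hypothetical sketch of a "Fear Factor": a named tail risk added to a
# lead-time distribution represented as Monte Carlo samples. All numbers
# (18-month baseline, 5% annual risk) are illustrative only.
import numpy as np

rng = np.random.default_rng(42)

FEAR_FACTOR = 0.05  # annual probability that lead times double; set by judgment

def sample_lead_times(n, mean_days=540.0, sd_days=60.0):
    """Sample lead times in days, doubled with probability FEAR_FACTOR."""
    base = np.clip(rng.normal(mean_days, sd_days, size=n), 1.0, None)
    disrupted = rng.random(n) < FEAR_FACTOR   # rare-event indicator
    return np.where(disrupted, 2.0 * base, base)

lt = sample_lead_times(100_000)
print(f"mean lead time: {lt.mean():.0f} days")
print(f"95th percentile: {np.quantile(lt, 0.95):.0f} days")
```

Removing the factor later amounts to setting `FEAR_FACTOR` back to zero, which is the reversibility point made next.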

And if six months down the road you realize that your fear was unjustified, then it’s very easy to undo. Why? Because in the code this Fear Factor term comes with a comment that says: this is the Fear Factor. So in terms of documentation, traceability, and reversibility, when you approach a problem through programming paradigms you end up with something that is super maintainable. That was also a problem we had in the past when people were doing manual interventions: the bulk of the cost was actually the poor maintenance of the overrides.

People may sometimes, not always, have the proper idea: they put in an override and then they forget about it. The thing stays, and then it becomes radically bad. And that’s the problem, because once you introduce overrides, you have to ask why they are there. The problem with overrides is that when you’re a software vendor like Lokad, you regenerate your forecasts every single day. So people can’t just override your forecast once and be done with it, because tomorrow you’re going to regenerate everything.

So they need to persist the override somehow. And the problem is that now you have a persistent setting that is going to sit there, and who is in charge of maintaining it? You end up with an even more complex workflow to handle the maintenance of the overrides, their phase-out, etc. And all those things are never discussed in the literature. It’s very interesting, but from the perspective of an enterprise software vendor it’s just a very painful situation, and you end up with 20 or even 100 times more lines of code to deal with it, which is a very uninteresting aspect compared to dealing with the more fundamental aspects of the predictive optimization.

Nikos Kourentzes: In principle, the position that Joannes takes is a position that I don’t think many people would disagree with, or at least people who have faced both sides. My take is that adjustments don’t have to happen in this way. I don’t have a solution to that yet because that’s a very active area of research. As I said, I know that a lot of people have worked on saying, should we eliminate this type of adjustments or that type of adjustments?

You could also think about the problem in a very different way. Let me try to respond by picking up on related research with one of my colleagues, Yves Sagaert, in Belgium. We’ve been working a lot on trying to figure out how we can transfer information that exists at the strategic level, or at the company level, to the SKU level.

So that could potentially give a way where you could say: look, I’m not going to go and adjust every SKU. I completely agree that micromanaging is not a good idea, for your SKUs or in general, I would say. But that’s a different discussion. If you let people go wild with their adjustments, then most of the time, because of human biases, ownership, and so on, they will waste time. Whether their adjustments turn out to be destructive or constructive remains to be seen, but they are surely going to waste time.

On the software side that Joannes mentioned, I have to take your word for it; I’m not in the same area, though I will agree that bugs are everywhere, in my code for sure. But I can see that there is a different way someone could think about adjustments as a process as a whole.

I don’t think it’s valuable to go and say: I now need to manage X number of time series. It would be more like: strategically, we make a change in direction, or our competitor did X. These are actions that are very hard to quantify, so it may still be better to say that inaction beats quantifying randomly.

But I can also see that this information is not in the models. So, whether I add to the model some additional risk that the user can calibrate, or I ask the user to come up with a different way of adjusting the output, a judgmental element remains one way or another. What the best way to introduce that judgmental element is, I think, is an open question.

I don’t see the usual way of doing adjustments as the productive way. It’s not only the aspects of complicating the process that Joannes mentions; it’s also that I see people wasting their time. They get too caught up in it; they think their job is to come into the office and go through each time series one by one, looking at the numbers or at the graphs. That’s not what an analyst should be doing.

Especially nowadays that companies start having data science teams, there is expertise, there are well-trained people out in the market. We shouldn’t waste their time like that, we should use them to fix the process. So, that’s why I think that there is a space for adjustments, but not the traditional way of doing it. I think the research is quite conclusive there, that because of inconsistencies, because of biases, on average, you’re not going to get the benefit.

Conor Doherty: There’s nothing about pursuing forecasting congruence as a metric that precludes the ability to have automation. Automation could still be a part of the forecasting process in pursuit of congruence, yes? Or have I misunderstood?

Nikos Kourentzes: In some sense, you’re correct. My understanding of congruence, as it is defined and as we have seen it empirically in company data, is that it would actually point the user towards eliminating all the minor adjustments, because those adjustments cause additional fluctuations that would be incongruent. So naturally, it would push towards eliminating a lot of adjustments.

But I’m being a bit skeptical because we would need to understand where we’re becoming over congruent, where the information that the experts would have would be critical. That’s still an open question. But if we think about the usual process that both Joannes and I criticized, congruence metrics would help you to see the problem.

Conor Doherty: So, neither of you is of the opinion that there ought to be manual, day-to-day adjustment of every single SKU. That would just be a fatuous waste of money. So there’s total agreement there.

Joannes Vermorel: But that’s the de facto practice of most companies. I agree when you say you want to translate the strategic intent, I completely agree. And when I use the words programming paradigms, I’m just referring to the sort of instruments that let you do that. Just like you don’t want people to be bogged down in micromanaging SKUs, you don’t want whoever in the data science team is translating the strategic intent to be bogged down in writing lengthy, inelegant code that is more likely than not to have even more bugs and problems.

For example, you have a probability distribution for the demand, you have a probability distribution for the lead times, and you just want to combine the two. Do you have an operator to do that? If you have an operator, and Lokad has one, you can literally have a one-liner that gives you the lead demand, that is, the demand integrated over a varying lead time. If you don’t, then you can Monte Carlo your way out of the situation, no problem. It’s not very difficult: you sample your demand, you sample your lead times, and lo and behold, you get there. But instead of one line, it takes more code, and you have a loop. And if you have a loop, you can have index-out-of-range exceptions, off-by-one errors, all sorts of problems. Again, you can mitigate that with pair programming, unit tests, and whatnot, but it adds code.
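As an illustration of the Monte Carlo route Joannes describes, here is a minimal Python sketch of the lead-demand computation. The Poisson choices, rates, and function name are assumptions made for the example; this is not Lokad’s operator, just the generic sample-then-sum approach.

```python
# Minimal Monte Carlo sketch of "lead demand": demand integrated over a varying
# lead time, assuming i.i.d. per-period demand independent of the lead time.
import numpy as np

rng = np.random.default_rng(0)

def lead_demand(n_scenarios=50_000, mean_lead_periods=6, mean_demand=20.0):
    lead_times = rng.poisson(mean_lead_periods, size=n_scenarios) + 1
    # The explicit loop below is exactly where off-by-one or indexing bugs can
    # creep in, which a dedicated one-line lead-demand operator would avoid.
    return np.array([rng.poisson(mean_demand, size=lt).sum() for lt in lead_times])

ld = lead_demand()
print(f"mean lead demand: {ld.mean():.1f} units")
print(f"P90 lead demand:  {np.quantile(ld, 0.90):.0f} units")
```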

So my point, and I really follow you here, that’s the gist of what you were mentioning about having a data science team: it’s to displace the fix from tweaking a number to tweaking a piece of code. And I think on that we are aligned. If we move the human intervention from cherry-picking a constant in the system and tweaking it, to dealing with the code, rethinking a little bit what the intent is, and making the adjustment there, then I can approve of that, and it works.

And indeed, if we go back to the waste of time, the interesting thing is that when you tweak the code, yes, it takes a lot more time to change one line of code. It will maybe take one hour, where changing one number takes a minute. But this hour then applies to the entire company. When it’s done at the right level, that one hour of coding gives you a company-wide benefit, as opposed to one minute on a SKU that possibly gives you a benefit, but just for that SKU.

Conor Doherty: So, you’re talking about the difference between manually tweaking an output, what the forecast said, versus tweaking the numerical recipe that produces the forecast?

Joannes Vermorel: Exactly. The basic premise, I think, is that there is information in the world, in the news, or maybe private information you have access to through the network of the company itself. So you have an extra piece of information that is not in the model and not in the historical data.

So I agree with the statement, and I agree that we don’t have superintelligence yet, or general intelligence. We can’t have ChatGPT process all the emails of the company and do that for us. We don’t have that degree of intelligence available to us, so it has to be human minds that do this sifting. And I agree that there is value in having people who think critically about this information and try to reflect it accurately in the supply chain.

And I really follow Nikos when he says: and then data science. Because ultimately, it should be the role of the data science team to ask, every single day: I have a model, is it truly faithful to the strategic intent of my company? Which is a very high-level question: do I genuinely reflect the strategy as expressed by whoever is coming up with it in the company? That is a qualitative problem, not a quantitative one.

Nikos Kourentzes: Let me just add something here because I think Joannes said something that is very helpful for people to understand why we’re critical of the traditional adjustments. He mentioned that it’s not the point prediction, it is the probabilistic expression of that. People adjust point predictions, that doesn’t make any sense in terms of the inventory. We care about the probabilities of the whole distribution.

So if someone could do that, maybe they could actually add something. But nobody does that, and, you know, I’ve been working with statistics for the better part of 20 years, as I said, and I cannot do it readily in an easy way. My inability doesn’t mean other people cannot do it, but all I’m saying is that when you think in a probabilistic sense, the information is so abstracted that it’s very difficult for someone to go in manually and say: just tweak it by 10 units. That’s a very difficult process. So in a sense, many people do all these adjustments on the wrong quantity anyway.

Joannes Vermorel: I completely agree. When I said at Lokad we stopped making adjustments a decade ago, it was exactly the time when we went probabilistic. People were saying we need to do adjustments, and then we were showing them the histograms of probability distribution.

We would say, be my guest, and then people would step back and say, no, we’re not going to do that. It was indeed a mechanism to stop people interfering at the wrong level. When they were shown the probability distributions, they realized that there is a lot of depth. People tend to think of those probability distributions for a supply chain as gentle bell curves, Gaussian and whatnot. This is not the case.

For example, let’s say you have a do-it-yourself (DIY) store. People buy certain products only in multiples of four, eight, or 12, because there is some logic to that. So your histogram is not like a bell curve; it has spikes, where people either buy one because they need a spare, or they buy four or eight, and nothing in between. So you start thinking: should I move the average from 2.5 to 3.5? But you look at the histogram, and the histogram is three spikes: one unit, four units, eight units.

Suddenly people say: it doesn’t really make sense for me to try to move those things. I’m not going to move the probability currently allocated to four units over to five units, because five never happens. If I want to increase the mean, what I probably want is to diminish the probability of zero and increase the probability of all the other occurrences.
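To illustrate why such a spiky histogram resists manual tweaking, here is a small Python sketch with made-up numbers: the only sensible way to raise the mean is to move probability mass from the zero bucket onto the existing spikes, never onto in-between values that simply do not occur.

```python
# Illustrative only: a discrete demand distribution with mass at 0, 1, 4 and 8
# units, and a mean increase done by reallocating probability away from zero
# rather than sliding buckets to values that never occur.
import numpy as np

units = np.array([0, 1, 4, 8])
probs = np.array([0.55, 0.20, 0.15, 0.10])            # sums to 1.0
print("original mean:", units @ probs)                 # 1.6

shift = 0.10                                           # mass taken from the zero bucket
adjusted = probs.copy()
adjusted[0] -= shift
adjusted[1:] += shift * probs[1:] / probs[1:].sum()    # spread over existing spikes
print("adjusted mean:", round(units @ adjusted, 2))    # ~1.96, support unchanged
print("still a distribution:", np.isclose(adjusted.sum(), 1.0))
```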

People realize that there is a lot of depth in those probability distributions. There are a lot of shenanigans, just to mention those magic multiples that exist. That was our observation. We are completely in agreement that when people see those probability distributions, they realize they are not going to adjust this histogram manually, bucket by bucket. So this reaction of impracticality is real.

Conor Doherty: Well, again, I’m mindful that we’ve actually taken quite a lot of your time, Nikos, but I do have one last question. You work at an artificial intelligence lab, so it would seem remiss not to ask you how AI might fit into the entire context of what we’re talking about going forward. Be that automation of forecasting congruence, AI doing the overrides, I don’t know, sketch out what you see as the future there, please.

Nikos Kourentzes: That’s a million-dollar question. I can respond in the same way that one of the reviewers looking at the paper had some concerns. The question was like, “Alright, so what? You know, here is another metric, so what?”

And I was saying, “Look, if you have a statistical model that is fairly straightforward, you can work out through the calculations, you can find everything analytically, fine. When you start going into machine learning and especially with the massive AI models we’re using now, this is a very difficult task. So it’s very helpful if we have some measuring sticks, something like that, that can actually make it a bit simpler to figure out what these models are doing.

If, for instance, I have a massive AI model and we can say, look, this one pushes the forecast towards increased congruence, then I may have a way to consider this model in a simpler way. That simpler way does not reduce the complexity of the model in any way, but it helps me understand how it affects my inventory, how it affects my decision-making process, how it affects the bullwhip behavior we mentioned before, and so on.

This is essentially how we actually finish the working paper. We’re saying the benefit of this metric is to understand how models that are black boxes may behave. I don’t think we will see, going forward, models that are not in some way inspired by AI. I’m a bit skeptical when people want to replace everything with AI, because some things can just be simpler and more efficient. My concern is not necessarily coming from the mathematics of the problem, or even the data richness and so on; I think these are problems we can resolve. My concern is coming more from a very simple process aspect and the sustainability issue.

If I have a massive AI model that, once I start scaling everything up to it, starts burning a lot of cloud computing and a lot of electricity, do I need all that if I’m only going to get a 1% difference over exponential smoothing? If sometimes I get much more than a 1% difference, go for it. But sometimes I don’t need all this complication; I can go with something simpler that is also more transparent for the non-AI experts.

AI is a way forward for many of the problems we have. I think in many cases the forecasting challenges we’re facing, and especially the decisions we’re supporting with those forecasts, are a very good ground for AI applications. But that is not a blanket “let’s forget everything we knew and go AI.” That is also reflected a bit in the paper, because as I mentioned before, it’s not the first paper that says, “Oh, let’s modify the objective a bit so it’s not just accuracy.” Other colleagues have done that as well. The difference is that we’re trying to do a bit of the algebra to show: this is really what’s happening once we do that. So I like it when we are able to do this kind of interpretation, or get the intuition behind the action.

AI is a way forward for many questions, but we shouldn’t forget that it’s useful for us to understand what on Earth we are doing. We shouldn’t just trust blindly and say the AI model will somehow do what I hope it’s doing. I’m not suggesting AI models cannot do really good stuff. I’m just saying: let’s not leave it at “I hope it works.” It should be better than just hoping.

Conor Doherty: Your thoughts there?

Joannes Vermorel: I think Nikos is absolutely spot on. Just as I was saying for adjustments that the number of lines of code needs to be considered, the overhead of deep learning models is absolutely huge and it complicates everything. Few people realize that for many GPU cards, it’s not even clear how to make calculations deterministic. There are many situations where you literally run the compute twice and get two different numbers, because the hardware itself is not deterministic.

That means you end up with Heisenbugs. A Heisenbug is when you have a bug, you try to reproduce it, and it disappears. So at some point you stop chasing it, because you say, “Well, I’m trying to reproduce the case, it doesn’t happen, so I guess it works.” And then you put it back into production, the bug happens again, and you can’t reproduce it.

So I fully agree. Simplicity makes everything better when the performance is remotely in the same ballpark. If you have something that is massively simpler, the simpler thing wins all the time in practice. I’ve never seen a model that outperforms another by a few percent, according to any metric, actually outperform in the real world when the alternative is an order of magnitude simpler for roughly the same result, even if the metric is those so-called dollars or euros that Lokad tries to optimize. The reason is a little bit strange, but it is that supply chains change, which brings us back to human intervention.

When you want to step in to make a change, time is of the essence. If you have a complex model, thousands of lines of code, even the logistics become a problem. For example, a few years back at Lokad, we had dozens of clients impacted when the Evergreen container ship Ever Given blocked the Suez Canal. We had essentially 24 hours to adjust the lead times for pretty much all our European clients importing from Asia.

That’s where being able to respond in a matter of hours, instead of needing a week just because my model is very complicated, is crucial. If you want me to deliver the fix without introducing so many bugs in the process that they undermine what I do, you need a simpler model. I completely agree: there is value and there is cost. For those companies that have started to play with GPT-4, the cost is very steep.

Conor Doherty: Well, Nikos, I don’t have any further questions, but it is customary to give the final word to the guest. So, please, any call to action or anything you would like to share with viewers?

Nikos Kourentzes: The call to action for me is that we have to move on from the traditional view of forecasting in isolation from decision-making. In the context of our discussion, inventory and so on, we have to try to see these things in a more joint way.

I’m an academic, other colleagues will have other opinions, Lokad has its perspective as well. I think there is value in all these perspectives because all of them point in the same direction. We have to leave what we were doing a few decades ago, update our way of thinking, update our software, update our textbooks. There is value in doing that. It’s not just about changing our software or whatever, it will actually lead to different decisions.

I welcome the inclusion in the forecasting field of a lot of people coming from computer science, deep learning, programming, and the inventory side, because this is now the point where we can actually address these problems in earnest. I don’t want to give the impression that this takes anything away from the value of the forecasting world as a research field. I belong to that world, so I would also like to say that we cannot just grab a bunch of libraries, run a bit of code, and say this is fine.

A lot of the time, when I work with industry or institutions, the value is in getting the right process and fixing a wrong methodology, which is exactly what the forecasting field can offer. I like the idea of keeping the steps there in the process, but we have to work together to come up with a joint solution. It’s a good space.

Back to the very beginning of our conversation, where I said that I enjoy working with the team at the university: there is polyphony, there are a lot of ideas. I will come up with my forecasting question and other people will say, “What about this? Have you thought of this perspective?” And I’m like, “Look at that, I’ve never thought of it.”

Conor Doherty: Thank you, Nikos. I don’t have any further questions. Joannes, thank you for your time. And again, Nikos, thank you very much for joining us and thank you all very much for watching. We’ll see you next time.