00:00:06 Importance of data for optimization projects and debunking data myths.
00:01:50 Accidental data collection and the challenges of using data from different systems.
00:03:39 The limitations of time series data and the importance of transaction granularity.
00:06:18 The need for better and more relevant data for forecasting.
00:07:26 Practical example: optimizing stock in a retail chain and the importance of transaction data.
00:10:01 The role of transactional layers and data storage in historical data collection.
00:11:38 ERP system transitions and the need for improved forecasting processes.
00:13:37 The drawbacks of data cleansing and the importance of full-spectrum data.
00:15:20 The use of computer systems for supply chain operations and data accuracy.
00:17:31 Importance of considering stock levels and returns in forecasting.
00:19:24 Adapting forecasting approach based on domain-specific perspective.
00:21:46 Understanding the importance of better data and expanding the horizon of relevant data.
00:24:48 Having a clear understanding of data generation and achieving better forecasts.


Kieran Chandler interviews Joannes Vermorel, founder of Lokad, about the importance of data collection in supply chain optimization. Vermorel suggests that companies often collect data incidentally rather than intentionally for optimization purposes, but this data can still be useful for forecasting and optimization processes. He emphasizes the significance of granular data, as aggregating data into time series can result in the loss of valuable information. Vermorel advises companies to work with the raw, transactional data and approach their supply chain issues with a domain-specific perspective. The conversation also touches on the importance of considering factors like pricing, returns, backorders, and stock movements in forecasting processes.

Extended Summary

In this interview, host Kieran Chandler and Joannes Vermorel, founder of Lokad, discuss the importance of data collection and its role in supply chain optimization. They tackle the myth that data needs to be perfect for machines to work with it and explore how companies can improve their data collection processes.

Vermorel points out that most companies collect data accidentally, as a byproduct of their transactional systems, rather than intentionally for optimization purposes. Systems such as ERPs and point-of-sale devices were initially designed to streamline mundane operations, not to collect a comprehensive transactional history. This incidental data collection, however, can still serve as a foundation for forecasting and optimization processes.

Chandler questions whether there are untapped troves of data within companies that have not been utilized. Vermorel explains that data generated by corporate systems is often complex and difficult to interpret because it more closely reflects the inner workings of the IT system than the reality of the process. When companies attempt to implement forecasting processes, they often extract a simplified version of this data, such as daily or weekly sales. This simplification, however, can result in the loss of critical information about the business and its operations.

The granularity of data is a significant concern, as aggregated data may not provide sufficient insights for effective forecasting and optimization. Vermorel argues that when companies transform their raw data into simplified versions, they lose massive amounts of information that could be valuable for supply chain optimization.

The interview discusses the importance of data collection in supply chain optimization and highlights the challenges companies face in utilizing the data they collect, often incidentally. The conversation emphasizes that perfect data is not a prerequisite for effective forecasting and optimization but acknowledges that there is significant room for improvement in the way companies collect, process, and analyze their data.

They discuss the challenges and importance of working with granular data to better optimize supply chain processes.

Vermorel explains that many companies aggregate their data into time series, which simplifies the data into a single number per day. While this method is easy to work with, it may not be relevant or useful for making informed business decisions. He asserts that better forecasting and supply chain optimization can be achieved by working with data at the transaction level, as it provides more context and insight into the actual business operations.

The interview highlights some of the pitfalls of working with aggregated data, as it can be misleading and cause companies to miss out on important scenarios. For instance, in a retail chain scenario, Vermorel explains how aggregating data can lead to misinterpreting the demand at the distribution center level. By processing data into time series, companies eliminate ambiguity, which can be both advantageous and disadvantageous, as they might inadvertently make incorrect assumptions about their business operations.

The conversation also touches on historical data and how many companies lose valuable information when transitioning between different ERP systems. In the past, preserving data was not a priority, as the goal of ERP systems was to help companies operate more smoothly. Additionally, data storage used to be expensive, leading to the implementation of heuristics to get rid of data in various ways. However, nowadays, data storage is relatively cheap, so preserving data is more feasible.

Vermorel emphasizes that when Lokad works with companies, they often find that existing forecasting processes are not a suitable starting point for supply chain optimization. This is because much of the relevant information has been lost due to the crude projection of transactional data into time series. Instead, he suggests that companies should focus on working with the raw, transactional data to optimize their supply chain processes.

Lastly, the interview touches on the topic of data cleansing. Vermorel asserts that the raw transactional data is already clean enough for their purposes, and that the concept of “data cleansing” often refers to the oversimplification of data into time series, which may not be helpful in understanding the true nature of a company’s operations.

Vermorel begins by explaining that if companies only look at their data in a limited way, like only considering shades of green, their understanding of the world will be restricted. He emphasizes that data should be seen in its full spectrum of colors for a more accurate picture. He also points out that data is not inherently incorrect, but rather a reflection of the company’s processes. Companies need to acknowledge their data for what it is and utilize it to make better forecasts.

Vermorel goes on to say that companies should recognize that their systems were not initially designed to produce data but to operate the supply chain. The fact that companies have invoices, payments, and other documentation is proof that their data is largely correct. However, when it comes to forecasting, companies often overlook crucial factors such as pricing, returns, and stock levels.

Pricing has a significant impact on demand and the supply chain. When companies look at their forecasting processes, they usually find that pricing is absent. This is just the tip of the iceberg, as factors like returns and stock levels are also often missing. Vermorel explains that understanding stock levels is essential because if there is no stock, there will be no sales. Similarly, backorders represent a unique type of demand that should not be treated the same as regular demand.

Vermorel advises companies to approach their supply chain issues with a domain-specific perspective. They should consider what factors are most relevant to their industry and focus on those. For example, in aerospace, the goal might be to minimize aircraft on ground (AOG) incidents by optimizing investments, while in fresh food retail, the focus should be on maximizing long-term customer loyalty by ensuring product availability and freshness.

Rather than perfecting and aggregating historical data, Vermorel suggests expanding the horizon of relevant data to include mundane aspects such as prices, returns, backorders, and stock movements. He emphasizes the importance of understanding how data is generated to avoid “garbage in, garbage out” situations. Vermorel also argues that better forecasts should be measured in dollars and tied to better decision-making, rather than judged by percentage-based metrics.

Full Transcript

Kieran Chandler: So today, we’re going to discuss what a company that already collects data can do to improve it, and tackle the myth that data needs to be perfect for machines to work with it. So Joannes, if a company is already collecting data, is there that much more they can actually do?

Joannes Vermorel: Yes. The first thing to understand is that most companies collect data, but in a completely accidental way. It was never the intent to collect data; the intent was just to operate. For example, an ERP is not designed to collect data; it’s designed so that all the very mundane operations happening all the time in the company can be supported by a centralized IT system. It’s just like the point of sale in a store: the electronic cash register is there to take your payment faster. The system was not engineered or put in place to collect a full transactional history of all the receipts. Because those systems have been collecting data for ages, companies do end up with a lot of data, but it’s not naturally designed for optimization. So there is a lot of data floating around, and usually, over the last few decades, some sort of forecasting or optimization process has emerged on top of it. But that doesn’t mean there isn’t enormous room for improvement.

Kieran Chandler: If companies weren’t intending to collect data when they first started, does that mean there’s an entire trove of data just sitting somewhere in storage that hasn’t really been thought about?

Joannes Vermorel: The problem is, usually, data is not really collected intentionally. It’s just an artifact, a byproduct of your transactional systems. It’s not exactly messy; it’s just when you look at the data as it is generated in typical corporate systems, it’s something very alien. It’s not mimicking the real world; it usually has more to do with just the inner plumbing of the IT system than with the reality of the process. As a result, when people in a large organization start a new process, let’s say an ERP process, and they want to have some kind of forecasting in place, they end up with data that is very strange and alien, and has tons of accidental complexities that have nothing to do with the forecasting challenge. Typically, what companies do is extract a very simplified version of this data, so they end up with daily sales or weekly sales, and then they build their forecasts on top of that. That’s where there is a whole range of issues: the fact that this data, once it has been extracted as daily or weekly sales, loses a lot of very critical information. It’s a very lossy transformation that looks simple and reasonable, but actually, you’ve lost a massive amount of information about what is going on in the business when you do that.

Kieran Chandler: But how granular does that data need to be? If we’re looking at a company that’s been collecting data for 20-odd years, surely aggregating that data makes things a lot more manageable?

Joannes Vermorel: When you aggregate data, you typically end up reformatting it so that it fits nicely into a time series. And yes, time series are super nice: one number per day, just like that. You have a series, one number per day, and you want to extend it into the future. It’s very simple. There are plenty of nice models that operate on this type of data, starting with moving averages, and you can get a bit fancier from there. But just because it’s easy doesn’t mean it’s relevant. That’s the problem: it is very easy to do it that way, but that doesn’t mean it’s actually relevant for the company.

That’s the danger. People think, “Oh, I need more disaggregated data, so I need to go from monthly data to weekly, or from weekly data to daily.” That’s just changing the timeframe of the aggregation. They would say, “If we do it better, we’ll go to hourly data.” That’s absolutely not the problem. When you think in terms of time series, you’re already framing the problem in a way that is completely different from how the data actually exists in your systems. In your systems, there is no such thing as a time series. What matters is having data at transaction granularity, because it can tell you a lot more. If you want better forecasts with better data, in our experience, that means getting very close to the way things are in your IT system, as opposed to a dumbed-down version where all the relevant information has already been lost.
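To make the point about lossy aggregation concrete, here is a minimal Python sketch. The `Transaction` record and its fields are hypothetical; real ERP tables look very different and carry many more columns. It collapses a few transaction lines into one number per day, the usual input of a time-series model, and then shows questions that only the raw lines can still answer.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Transaction:
    # Hypothetical transaction line; real ERP schemas differ widely.
    day: date
    sku: str
    quantity: int       # negative quantities represent returns
    unit_price: float


def daily_series(transactions, sku):
    """Collapse transactions into one net quantity per day (a classic time series)."""
    totals = defaultdict(int)
    for t in transactions:
        if t.sku == sku:
            totals[t.day] += t.quantity
    return dict(sorted(totals.items()))


transactions = [
    Transaction(date(2024, 3, 1), "A", 10, 2.00),
    Transaction(date(2024, 3, 1), "A", -4, 2.00),  # a return
    Transaction(date(2024, 3, 2), "A", 6, 1.00),   # a discounted sale
]

series = daily_series(transactions, "A")
print(series)  # {datetime.date(2024, 3, 1): 6, datetime.date(2024, 3, 2): 6}

# The raw lines still answer questions the daily series cannot:
returned_units = -sum(t.quantity for t in transactions if t.quantity < 0)
prices_seen = sorted({t.unit_price for t in transactions})
print(returned_units, prices_seen)  # 4 [1.0, 2.0]
```

Both days read as 6 units in the series, even though one combined a full-price sale with a return and the other was a half-price sale. Making the series hourly would not bring that distinction back.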

Kieran Chandler: So the relevant information is lost, and basically, the data you’re looking at could be slightly misleading. What are the sort of scenarios that you might be missing out on?

Joannes Vermorel: Usually, it’s things that are so mundane that people even forget about them. For example, let’s look at a retail chain, like groceries. Imagine you have a series of distribution centers, each serving, say, 20 supermarkets. Let’s take the position of optimizing the stock in a distribution center. Every single day, stores place orders to the distribution center. When a store orders 100 units of something, two things can happen at the distribution center: either it fulfills the order, typically shipping the 100 units the next day, or it doesn’t. So the store places an order for 100 units, and the distribution center does not send anything. Then, the next day, the same store places another order, this time for 150 units.

Now the question is, if you want to acknowledge the demand at the distribution center level for those two days, what’s the demand? Is it 100 units plus 150 units? But that feels wrong because you see, the reason why on the second day the store is placing an order of 150 units is that the order the previous day for 100 units was not fulfilled. So basically, they had to cover both the demand for the day that was not fulfilled plus another day. So then you end up ordering more, but it’s a mistake to think that the demand is 250 units. Maybe actually the total demand should be just 150 units because you should be discarding the initial 100 units entirely. But the reality can be messy.
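The distribution center example can be sketched in a few lines of Python. The `StoreOrder` record and the reconciliation rule are hypothetical assumptions for illustration: an unfulfilled order is treated as superseded when the same store re-orders the same SKU the next day. It shows how a naive sum of orders reports 250 units, while the reading suggested in the interview reports 150.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreOrder:
    # Hypothetical order line placed by a store on the distribution center.
    day: int          # day index
    store: str
    sku: str
    quantity: int
    fulfilled: bool


def naive_demand(orders):
    """Sum every order line, as a time series of orders would."""
    return sum(o.quantity for o in orders)


def adjusted_demand(orders):
    """Drop unfulfilled orders that the same store re-placed the next day.

    Assumption for illustration: the later order already covers the quantity
    of the unfulfilled one, so counting both would double-count demand.
    """
    placed = {(o.day, o.store, o.sku) for o in orders}
    kept = [
        o for o in orders
        if o.fulfilled or (o.day + 1, o.store, o.sku) not in placed
    ]
    return sum(o.quantity for o in kept)


orders = [
    StoreOrder(day=1, store="S1", sku="milk", quantity=100, fulfilled=False),
    StoreOrder(day=2, store="S1", sku="milk", quantity=150, fulfilled=True),
]
print(naive_demand(orders), adjusted_demand(orders))  # 250 150
```

The rule is deliberately simple. Real cases (partial shipments, re-orders two days later, a store that genuinely needed 250 units) are messier, which is why this kind of interpretation is best made explicitly on transaction-level data rather than silently baked into a time series.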

As soon as you start producing a time series, this information is lost. All the ambiguities, and there are a lot of them, are eliminated, and you could say that’s a good thing: suddenly, my data scientists can work on non-ambiguous data. But yes and no, because by removing the ambiguity, you’ve already made a statement about how your business operates, and this statement can be very wrong.

Kieran Chandler: One of the reasons companies aggregate their old data is that they might be moving from one ERP system to something newer. So, is it really useful to have all of that historical information re-imported into the new ERP system?

Joannes Vermorel: Initially, the systems we nowadays call ERPs, the transactional layers that just manage routine operations, were not meant to collect historical data. When it all started, say in the late ’70s or ’80s, preserving data was not the goal; the goal was simply to let the company operate more smoothly.

Because computing hardware was very expensive at the time compared to today, and data storage especially so, a lot of software vendors did the right thing back then: they implemented all sorts of heuristics to get rid of data in many ways. Nowadays, most of those heuristics simply don’t make sense anymore, because data storage is super cheap.

Kieran Chandler: So, is there anywhere these companies should be doing some sort of data cleansing? Or are you saying they should just take the raw data and leave it as is?

Joannes Vermorel: The data is already clean. The problem is, when you say data cleansing, what does it mean? Suppose you want a good, accurate picture of the world, and for some reason you decide to look at the world only in shades of green. You end up with a picture made only of shades of green: anything that is not green is just black, and you don’t see it at all. Things that are more or less green show up as various shades of green, and that’s your picture of the world.

Obviously, you would say, “I think I need to do some data cleansing; this picture is not very accurate. I need to maybe kind of improve it.” But you need a full spectrum of colors. The problem is not the shade of green; there is no cleansing. Your picture is just what it is. The problem is just if you want to have a better picture of the world, you need the full spectrum of colors.

Kieran Chandler: So, to get better forecasts, the first thing is just to start looking at the company as it is?

Joannes Vermorel: Data is not incorrect; it’s simply what it is. What I’m saying is that producing data was never the first objective of any of your systems. Your systems were put in place so that the supply chain can operate: so that it’s possible to produce, move stuff around, and sell it. All the layers you have are just a reflection of those processes, which is fine. The fact that it works, and that you have invoices, payments, and whatnot, proves that this data is largely correct. It has to be; otherwise, you would not know what to invoice or how much to pay your suppliers.

So, for most companies nowadays, at least those that have been using computer systems for decades, whether in Europe, North America, or, at present, most of Asia, it’s all in place and already solid. The problem is thinking about forecasting in simplistic terms. It’s not just about sales; it can be about returns, for example. Take one of the most basic questions people ask: how can we improve our forecasts?

Usually, when we start looking at those time series, we say: you know what, you don’t even know the price. When we start working with companies that want to improve their forecasts, we look at the data pipeline that generates the forecast, and we see that pricing is absent. Obviously, prices have a massive impact on the supply chain. If you suddenly discount all your products by 50%, your demand is going to explode, and maybe your profitability will vanish as well; either way, prices usually have a massive impact on demand and on your supply chain. Most of the time, when we look at those S&OP and forecasting processes, pricing is absent, and that’s usually just the tip of the iceberg.
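As a minimal Python sketch of why a price-blind series misleads, assume a hypothetical product whose daily records keep the price in effect alongside the units sold (the figures are made up). Grouping by price separates two demand regimes that a single averaged series blends together.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records for one product: (day, units_sold, unit_price).
daily = [
    (1, 20, 10.0), (2, 22, 10.0), (3, 18, 10.0),
    (4, 45, 5.0), (5, 50, 5.0),  # days 4-5: a 50% discount
]


def units_by_price(records):
    """Average daily units sold, grouped by the price in effect that day."""
    groups = defaultdict(list)
    for _day, units, price in records:
        groups[price].append(units)
    return {price: mean(units) for price, units in sorted(groups.items())}


# A price-blind series only sees the quantities.
naive_avg = mean(units for _day, units, _price in daily)
by_price = units_by_price(daily)
print(naive_avg, by_price)
```

The blended average of 31 units per day matches neither the full-price level (about 20) nor the discounted level (47.5), so a forecast built on it would be wrong in both regimes.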

Kieran Chandler: We’ve touched on this before: people focus on demand, and pricing is obviously one thing they can look at. But perhaps it’s worth reiterating: what other things might be of interest?

Joannes Vermorel: Usually, returns are absent, and stock levels are absent. You might wonder why you need stock levels. The answer is that, first, if you have a stockout, you’re not going to sell anything, simply because there is nothing to sell. Maybe you will get backorders, but again, that’s a very specific pattern. Can you really count a backorder like a regular sale? It takes commitment. A backorder is when the product is not there, so I ask the vendor to put my order on backorder and ship it later on, and as a client, I’m willing to accept a long delay. So again, this is demand, but it’s not exactly of the same nature as regular demand. Is one unit of backorder exactly the same as one unit sold? Not really.
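The stockout point can be illustrated with a small Python sketch on hypothetical data. On days with no stock, recorded sales are zero because nothing could be sold, not because nobody wanted the product. Excluding those days is only one simple treatment of this censored demand, assumed here for illustration; it also ignores days where stock ran out partway through.

```python
from statistics import mean

# Hypothetical daily records for one SKU: (units_sold, stock_on_hand_at_open).
days = [(12, 40), (9, 28), (11, 19), (0, 0), (0, 0), (10, 30)]

# Without stock levels, the zero-sales days look like zero demand.
naive = mean(units for units, _stock in days)

# With stock levels, stockout days can be recognized as censored and excluded.
in_stock = mean(units for units, stock in days if stock > 0)

print(naive, in_stock)  # 7 10.5
```

Here the stock-blind average understates typical in-stock sales by a third, and the gap grows with the frequency of stockouts.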

Just to give you an example: if a large portion of your demand, maybe in B2B, comes from clients who are okay with backorders and long fulfillment delays if they can get a better price, then, from a forecasting perspective, it’s very nice, because suddenly you don’t have to forecast anything. You just look at the backorders and orchestrate your supply chain so that, when the expected delivery time comes, you have the stuff ready. You don’t need to forecast this demand, because it is known in advance: it has already been ordered.
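A minimal Python sketch of treating backorders as known demand follows. The `Backorder` record, the `units_needed` helper, and the simple netting formula are hypothetical illustrations, not a description of any particular planning system. Confirmed backorders due within the horizon are summed directly, and only the regular demand enters as a forecast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backorder:
    # Hypothetical confirmed order awaiting shipment.
    sku: str
    quantity: int
    due_day: int


def units_needed(backorders, sku, horizon_end, regular_forecast, on_hand):
    """Units to make available for `sku` by day `horizon_end` (illustrative).

    Backorders are known demand, so they are summed rather than forecast;
    only the regular demand is represented by `regular_forecast`.
    """
    known = sum(
        b.quantity for b in backorders
        if b.sku == sku and b.due_day <= horizon_end
    )
    return max(0, known + regular_forecast - on_hand)


backorders = [Backorder("X", 30, 5), Backorder("X", 20, 12), Backorder("Y", 7, 3)]
print(units_needed(backorders, "X", horizon_end=10, regular_forecast=15, on_hand=25))  # 20
```

Only the 30-unit backorder due on day 5 falls inside the horizon; the 20-unit one due on day 12 is already known and can be scheduled later without any forecast.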

Kieran Chandler: If you want to make the most of the data that you currently have, what’s your kind of advice then to companies?

Joannes Vermorel: The first thing is that you need to approach the problem from a very domain-specific perspective. You need to ask yourself, “I have a supply chain. What really matters?” And the answer is, “It depends.” It really depends on the type of supply chain you operate. If you’re in aerospace, the question boils down to: “For every dollar I invest in my supply chain, how can I avoid the maximum number of AOG (aircraft on ground) incidents, where a part is missing and an aircraft is stuck on the ground?” So the question is, “How can I avoid the maximum number of AOG incidents per dollar invested?” That would be the perspective for aerospace. For fresh food, it’s a completely different problem: “How can I maximize the long-term loyalty of my clients?”, because food is all about repeat business. The service level of one single product is kind of pointless; you don’t really care about it, because there are so many substitutes. What you want is to make sure that your loyal clients, who come to your store to buy not one product but a whole basket, have a very good experience: if something is missing, there is always a substitute, and they walk away from your store very satisfied with both the overall availability and the overall freshness of what they bought. That is part of the whole experience, and again, the question is how to optimize it per dollar invested. The question is really domain-specific. Which aspects are most relevant, and where do you really need to pay attention? Only your domain expertise lets you judge that, and usually it doesn’t require advanced data science skills. It just takes a direct understanding of the domain to tell whether something is nice to have or absolutely critical to avoid making very poor decisions.

Kieran Chandler: What’s the core message of today?

Joannes Vermorel: So, if we take the goal of this episode, it’s: how can I have better forecasts with better data? That’s something we want, and the better data is typically not what you expect.

Yes, you can have data that is much better for your forecasting undertaking, but the question is: what do I mean by better? In the Lokad experience, we mean very specific things that are absolutely not what most people expect. First, better data means having a complete picture of all the things you should be looking at, and usually that’s not Instagram, social networks, or weather forecasts. It’s something much more mundane: things that already exist in your systems, things that many people, maybe people before you, had already decided were not even worth looking at.

Our message is that this data is very much worth looking at. I’m talking about prices, returns, backorders, stock movements, all of that. They do matter, and the good news is that they are already present somewhere in your systems. So first, expand your horizon on what you even consider relevant data. The second thing would be to forget about this idea of data preparation. You need to understand how the data is generated. Why? Because otherwise, you’re going to end up in a garbage in, garbage out situation. Understanding the data is tricky because it involves two things: understanding the software, and understanding the process followed by the people operating on top of the software.

Usually, the semantics of the data have two parts: it’s half in the head of the person who operates the software and half in the head of the software engineer who designed the enterprise software in the first place. When I say the person, unfortunately, usually it’s many, many people, and the worst case is when those people have conflicting interpretations. That’s where you can have a very messy situation. So you see, expand your horizon in terms of relevant data, nothing very fancy, just basic mundane things for your business, but not just sales. Then, you need to have an understanding of this data.

Finally, if you want better forecasts, it comes down to: what does a better forecast mean? This is where people would say, “Oh, it’s a better mean absolute percentage error, or a better mean absolute error,” or all sorts of other metrics. Again, I would say that if it’s expressed in percentages, it’s just not good. It needs to be expressed in dollars. And, just as we discussed in one of the previous episodes about decision-first, a forecast can ultimately only be deemed better if it leads you to better decisions.

Unfortunately, the only way to judge whether a forecast is better is through the prism of the final decision you take. It’s difficult, but this is the way to do it. If you just say, “Oh, I have a better MAPE, so the forecast is better,” that’s very wrong, and you’re not even on the right path; you will end up making mistakes.
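The gap between percentage metrics and dollars can be shown with a small Python sketch. Everything here is a hypothetical assumption: the two products, the two competing forecasts, and the unit costs (5 dollars per unit short, 1 dollar per unit of excess). The sketch also simplifies heavily by treating each forecast as the stock quantity itself, whereas a real decision would weigh many more factors.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(f - a) / a for a, f in zip(actual, forecast)) / len(actual)


def dollar_cost(actual, forecast, shortage_cost, excess_cost):
    """Cost if each forecast were used directly as the stock quantity (simplification)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        if f < a:
            total += (a - f) * shortage_cost  # units of demand left unserved
        else:
            total += (f - a) * excess_cost    # units left over
    return total


actual = [2, 100]        # a slow mover and a fast mover
forecast_a = [2, 70]
forecast_b = [4, 100]

for name, fc in [("A", forecast_a), ("B", forecast_b)]:
    print(name, round(mape(actual, fc), 1), dollar_cost(actual, fc, 5.0, 1.0))
# A 15.0 150.0
# B 50.0 2.0
```

Forecast A wins on MAPE (15% versus 50%) yet costs 75 times more under these assumed costs, because MAPE weighs a 2-unit miss on a slow mover as heavily as a 30-unit miss on a fast mover.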

Kieran Chandler: Yeah, okay, I’ll have to live with that, but I’m guessing there’s probably a few IT managers that are going to thank us for this because they’re going to be scrubbing around the archives now. Okay, that’s everything for this week. Thanks very much for tuning in, and we’ll see you again in the next episode. Bye for now.