Walmart forecasting competition - post game analysis (with Rafael de Rezende)

July 9, 2020


00:00:07 Introduction to Kaggle and guest Rafael de Rezende.
00:00:39 Rafael’s role and background at Lokad.
00:01:29 Kaggle and its machine learning competitions.
00:04:22 The competitive and collaborative nature of Kaggle.
00:06:55 Impact of collaboration on creativity in Kaggle competitions.
00:08:02 M5 competition and its scope (sales forecasting for 30,000 SKUs in Walmart stores).
00:08:39 Pinball loss function as the scoring metric.
00:10:26 The team members and their specific roles in the competition.
00:12:05 The difference between competition methodologies and real-life applications.
00:14:25 Analyzing the top 10 solutions and their closeness in performance.
00:16:00 Discussing operational costs and maintainability of models.
00:17:47 Importance of numerical stability in real-world scenarios.
00:19:21 Extensibility and real-world constraints in competition models.
00:20:35 Possible improvements and future focus after competition.
00:22:14 Significance of domain expertise and performance comparison to state-of-the-art.

Summary

In an interview with Kieran Chandler, Joannes Vermorel and Rafael de Rezende of Lokad discuss their participation in a Kaggle competition involving Walmart sales forecasting. They emphasize the importance of models being numerically stable, maintainable, and extensible. Despite constraints, their approach led to results close to state-of-the-art, validating their focus on practical, cost-effective, and maintainable supply chain optimization methods. The experience demonstrated the benefits of being both supply chain and data science experts. The team now plans to implement the insights gained to improve solutions for their clients.

Extended Summary

In this interview, host Kieran Chandler speaks with Joannes Vermorel, founder of Lokad, and Rafael de Rezende, Head of Product Development at Lokad. Both guests have backgrounds in supply chain optimization and bring their expertise to the discussion of a recent Kaggle competition involving the uncertainty distribution of Walmart sales.

Rafael de Rezende introduces himself as an industrial engineer with a supply chain background. He has been working at Lokad for the past three years, and his role has evolved during that time. Currently, he serves as the Head of Product Development, leading a team that tackles the “geeky” topics at Lokad. They primarily work on time series forecasting, as well as multi-scale image resolution and the MOQ system, which has been discussed previously on the show.

Joannes Vermorel provides an overview of the Kaggle competition in question and Kaggle itself, describing it as a “very specific subculture.” Kaggle, now owned by Google, is an organization that hosts machine learning competitions in which participants must predict or forecast certain outcomes. Companies typically provide a dataset and offer large cash prizes to incentivize participation. In the case of the Walmart sales competition, there was $100,000 at stake.

The environment in these competitions is highly competitive, attracting hundreds of professionals who are well-versed in the latest algorithms and publications. Despite its niche nature, Kaggle has a significant global presence, boasting over a million registered participants. The competitors are not necessarily researchers themselves but are skilled at identifying the state-of-the-art algorithms for a given problem. They then find minor improvements to gain a slight edge over others and ultimately win the competition.

According to Vermorel, Kaggle winners typically come from North America or Asia, although the platform has a worldwide community of participants. Its acquisition by Google further underscores its growing popularity and significance in machine learning and data science.

Vermorel and Rezende both appreciate the sports analogy when it comes to the competition. They highlight the collaborative approach in the event, with participants helping each other improve their skills. At the same time, they recognize the fierce competition that occurs due to monetary incentives and the involvement of big organizations.

The Lokad team was new to this competition, but they had previous experience with intense competitions in supply chain management. Rezende acknowledges that the level of competition in this event was much higher than in previous challenges they had faced.

Some critics argue that the collaborative nature of the competition may hinder creativity, as participants may abandon their own ideas and follow the lead of high-scoring solutions. This flock effect might limit innovative thinking.

From a company perspective, Vermorel is glad that the team participated in the competition, even though winning was not part of Lokad’s strategic roadmap. He believes that such competitions clarify the state of the art in the field without necessarily changing it fundamentally. In this specific competition, the team, led by Rezende, ranked sixth out of 909 teams in a sales forecasting challenge involving 30,000 SKUs in Walmart stores over 28 days.

Vermorel finds it interesting that this competition used a pinball loss function as the scoring metric for the first time, something Lokad had proposed years before. He explains that quantile forecasts come with a purposeful bias to ensure availability of goods in stores, aiming for a high service level. This competition explicitly stated the use of a quantile forecast, which was a first in the field.
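For reference, the basic pinball loss for a single observation can be written as below. This is a sketch in standard notation rather than the competition's exact metric: the symbols $y$ (observed sales), $\hat{q}_\tau$ (the forecast for the target quantile level) and $\tau$ (the quantile level, between 0 and 1) are introduced here for illustration, and any scaling or weighting applied by the organizers is not covered.

```latex
L_\tau(y, \hat{q}_\tau) =
\begin{cases}
\tau \,(y - \hat{q}_\tau) & \text{if } y \ge \hat{q}_\tau,\\
(1 - \tau)\,(\hat{q}_\tau - y) & \text{if } y < \hat{q}_\tau.
\end{cases}
```

With $\tau = 0.98$, under-forecasting by one unit costs 0.98 while over-forecasting by one unit costs only 0.02, which is why a forecast that minimizes this loss is deliberately biased upward.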

Rezende’s team consisted of four people, each with a specific role. They worked on the core model, analyzed data, and built the infrastructure for the competition. As team leader, Rezende focused on keeping everyone motivated and working together.

Rezende then introduces a racing analogy to explain how the team’s approach differed from that of the other competitors.

The conversation revolved around the differences between competition models and real-world applications in supply chain optimization.

The participants compared competition models to Formula One cars, which are fine-tuned for specific tracks but not practical for everyday use. They noted that the methods used in competitions are computationally expensive and not always suitable for real-world scenarios. For example, the top ten winners in a recent competition took about 10 hours to process a small subset of Walmart’s data, which would be impractical for real-world operations.

Vermorel and de Rezende explained that Lokad took a different approach by using a theoretical framework similar to their daily operations, making only minor adjustments for competition purposes. They emphasized the importance of being supply chain professionals first, using their experience and intuition to inform their decisions.

The interviewees also pointed out that the top 10 solutions in the competition were numerically very close, with only small differences in their performance. They identified three key concerns that make competition models different from real-world supply chain solutions: operational cost, maintainability (including numerical stability when the data is imperfect), and extensibility. Lokad’s approach, in contrast, focused on minimizing compute cost and ensuring maintainability, while also accounting for real-world obstacles and imperfections.

Overall, the discussion highlighted the need for practical, cost-effective, and maintainable supply chain optimization methods that can be applied in real-world scenarios, rather than purely theoretical or competition-driven approaches.

They talk about the importance of having models that are numerically stable, maintainable, and extensible. Numerical stability ensures that the models can handle imperfect data and not produce wildly inaccurate results. Maintainability means that the model can perform well even under less than ideal conditions. Extensibility allows for additional factors, like stock levels and future promotions, to be incorporated into the model.

The team participated in a forecasting competition that emphasized the importance of domain knowledge in supply chain optimization. Despite having constraints unrelated to forecasting accuracy, they were able to achieve results close to state-of-the-art. The challenge validated their approach, proving that their models are indeed competitive while being lean, maintainable, and extensible.

After the competition, the focus now is to bring the insights and improvements gained to the whole Lokad team, making sure they can be implemented quickly for their clients. The experience also highlighted the benefits of being both supply chain and data science experts in the competition, as opposed to solely data science experts.

Full Transcript

Kieran Chandler: Today we’re delighted to be joined by one of our colleagues, Rafael de Rezende. He’s going to talk to us about a recent M5 competition that looked at the uncertainty distribution of Walmart sales. So, Rafael, thanks very much for making the trip down the corridor to join us.

Rafael de Rezende: It’s great to be here. Perhaps I can just tell you a little bit more about myself and my background. I’ve been working here at Lokad for the past three years. I have a supply chain background, I’m an industrial engineer, and during my time here at Lokad, my role has been changing quite a lot. Right now, I’m the Head of Product Development at Lokad, and my team and I take on the very geeky topics of Lokad. We work on time series forecasting, image resolution, and MOQs, which I believe have already been talked about here in the show.

Kieran Chandler: Great, and Joannes, today we’re going to be talking about a recent Kaggle competition that was all about looking at the uncertainty distribution of Walmart sales. Perhaps you could give us a bit of an overview of the actual challenge and Kaggle itself?

Joannes Vermorel: Yes, and maybe about Kaggle itself. It’s a very specific kind of subculture. Kaggle is a fairly large organization that has been acquired by Google. What Kaggle organizes is machine learning competitions where you have to predict or forecast something. To set up a competition, you need a dataset to forecast or predict something, and you need large prizes. For the competition we’re talking about, there were $100,000 in prizes. It’s a very competitive environment, with hundreds of people who are complete professionals at it. It’s like a high-level sport in a way.

The community that wins Kaggle competitions typically aren’t researchers but are very good at figuring out what the state of the art is. They look at all the things that get published all the time and pinpoint which one is going to be the state of the art. Then, they have to do some extra magic on top of it to gain a small extra percentage of accuracy that lets them win. They need to find a tiny thing that gives them a tiny edge, and they will be ahead of the rest. It’s a very specific subculture and it’s massive. Kaggle has more than 1 million registered participants from all over the world, even if the winners are usually from North America or Asia.

Kieran Chandler: I mean, it was purchased by Google, so it’s definitely on the rise. Joannes mentioned the analogy of it being like a sporting event, where you have data scientists competing but also collaborating, which I think is a really nice analogy. So let’s talk a little bit more about the challenge itself. What were the key challenges that you faced and who were you up against?

Joannes Vermorel: I think the sports analogy is really good. I mean, Kaggle does feel like a sport indeed. What is nice in Kaggle is the very collaborative approach where people help each other and spend considerable time helping others get better. This happens at the same time as the brutal competition because there’s money involved and there are big organizations either sponsoring or watching closely what you’re doing. We were Kaggle newbies, but our team had already been in some sort of competition before, with clients challenging our solution in supply chains. However, it was not at the same level as Kaggle. In Kaggle, we had 900 teams, while before we might have had competition against two or three other teams. Everybody was really trying to help each other even more than in Kaggle.

Rafael de Rezende: One interesting thing about Kaggle’s collaborative side is that many people criticize it for possibly hindering creativity. What often happens is that some people publish a solution that scores well in the beginning, and suddenly many other teams start following it, abandoning what they were doing before. So there’s this flock effect, and everybody goes to the same side of the corner. The collaborative notion is beneficial, but I have to agree with those who say it hinders creativity from time to time.

Kieran Chandler: What about it from a company perspective? You’ve got these guys working on their free time. What do you think of that?

Joannes Vermorel: I’m very glad that they did it. It has never been Lokad’s strategy to try to win these competitions. I did a few with much less success during the early years of Lokad, but I realized that these competitions don’t necessarily fundamentally modify the state of the art. They clarify what the state of the art is, which is very good. For example, this competition where Rafael’s team ranked sixth out of 909 teams was intended as a demand forecasting competition but turned out to be a sales forecasting competition due to not properly factoring out stock-outs. So they were forecasting sales, not demand. It was a demand forecasting competition for 30,000 SKUs in Walmart stores over 28 days. These competitions reveal the state of the art but don’t fundamentally modify it.

Joannes Vermorel: It’s very interesting that in this competition, it was the first time ever, to my knowledge, that they were using a pinball loss function as the metric to score who was winning. That’s very obscure, you know? That’s literally the metric used to measure the accuracy.

I believe that Lokad was the first to propose, back in 2012, that supply chain forecasts needed to transition to quantile forecasts. Actually, later on, we said we had to transition to probabilistic forecasts and do even more things. Eight years ago, we stated that we had to make this transition. By the way, these forecasts are very odd because they come with a bias on purpose. For the audience that might be a bit confused, why would you even want bias on purpose for a demand forecast? The answer is because in stores, what you want is to ensure the availability of goods. You don’t want a forecast where, on average, people find what they seek half of the time. That’s not the goal. You want people to have, like, a 98% service level or something, where usually what they are looking for is present in the store. Thus, you want to have a forecast with a bias, and that technique is known as a quantile forecast. This competition was very interesting because it was the first time ever that there was a public competition where it was explicitly stated as a quantile forecast.
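As an illustration of the “bias on purpose” described here, the following minimal Python sketch searches for the constant forecast that minimizes the average pinball loss on a short sales history. The sales figures, the function names and the brute-force search over integer candidates are all assumptions made for this example; it is not the method used by the Lokad team or by any other competitor.

```python
from statistics import mean


def pinball_loss(actual: float, forecast: float, tau: float) -> float:
    """Pinball (quantile) loss of one forecast for a quantile level tau in (0, 1)."""
    if actual >= forecast:
        # Under-forecast: the shortfall is weighted by tau.
        return tau * (actual - forecast)
    # Over-forecast: the excess is weighted by (1 - tau).
    return (1.0 - tau) * (forecast - actual)


def best_constant_forecast(history, tau):
    """Return the integer candidate with the lowest average pinball loss."""
    candidates = range(0, max(history) + 1)
    return min(
        candidates,
        key=lambda q: mean(pinball_loss(y, q, tau) for y in history),
    )


# Hypothetical daily unit sales for a single SKU (made-up numbers).
history = [0, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 12, 15, 2, 3, 1, 4]

average_sales = mean(history)
median_forecast = best_constant_forecast(history, tau=0.5)
high_forecast = best_constant_forecast(history, tau=0.98)

print(average_sales, median_forecast, high_forecast)
```

On this made-up history the average is 4.4 units, the 50% target selects 3 units and the 98% target selects 15 units. The high quantile sits well above the average, which is the deliberate upward bias a store needs to reach a high service level.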

Rafael de Rezende: Then, you have to build the technology and tools to solve this problem. I’m very glad and proud that my team managed to rank sixth in such a brutal competition.

Kieran Chandler: Let’s talk a bit more about your team. You mentioned you were working with a team. How many of you were on this team, and who were the other people you were working with?

Rafael de Rezende: We were a team of four. It was me and three designers from Lokad. One is no longer working here, but still came from here. Each one had a very specific role within the team. Huggy was working with me on the core model, focusing on the small mathematical details on how we were going to tackle the problem. Catarina was bringing her business vision to it, analyzing the data and identifying the main points we should take from it to ensure we modeled things properly. Lastly, there was Marine, who did about 80% of the real hard work. She was working as a data engineer this time and built our own infrastructure for the competition. My role was to get everybody to work together and keep people motivated.

Kieran Chandler: Can you talk through a bit more about the approach you took? How did it vary from other methodologies out there? What was different?

Rafael de Rezende: I think a good analogy is to compare it to Formula One. If you take a Formula One car versus a regular car, you would see that it’s not exactly the same car as the one you’re buying in a store. They’re both cars, but they’re not the same. When it comes to this sort of competition, people tend to work more or less in the same way. They build methods that are extremely computationally expensive, which are great for the competition, but they’re still not exactly what you’re going to get if you actually buy the product in the end. There might have to be some changes. For instance, most of the top ten winners used methods that took a long time to run, even for a fairly small subset of data from Walmart. It was completely absurd, like ten hours just for a very small subset.

So we took it in a different direction, and I think this idea was really present from the beginning. We wanted to take the same theoretical framework that we use here on a daily basis and put it on the track. Most of the things we used are not really far away from what we do on a daily basis. Of course, we changed the car a bit to give it a sort of race setup: we made some changes, took out the back seat, and so on, to make it more performance-oriented. But if you really compare what we did with what people are doing here, you would have to be an expert just to identify what the difference actually is.

Kieran Chandler: Okay, so what you’re saying is computationally, because there were only about 30,000 SKUs, some of those other approaches did work, but if they were kind of at scale, it would be much more challenging for them to work in the real world.

Rafael de Rezende: I think so. I’m not saying they wouldn’t work in the real world; I think it would be complicated. I mean, you have a lot of maintenance. We used low-dimensionality methods that have been known for a long time, but the way we cracked the problem was not from the data science perspective. We were really like supply chain professionals first. We came up with everything we know about supply chains and our intuition about the problem because we have been through many other internal competitions before, so we know how things behave, and that’s really what we put forward there.

Kieran Chandler: What were your thoughts on this approach, and perhaps you could give a basic overview of how you saw it from your perspective?

Joannes Vermorel: It’s very interesting because in the top 10, so basically there were 909 teams that competed. I did not look at all the solutions that were provided, I looked at the top 10. So, there were people that were better than Lokad, worse than Lokad. What was very interesting first is that if you look at the top 10 solutions, they are all numerically incredibly close. So basically, you know, from number one to number ten, it’s almost nothing. I think we were something like 0.01% after the guy that was ranked number five, and the person that was number seven was like 0.01% behind. The team that ranked number one was a few percent ahead, but it was still incredibly close overall.

Now, I think there are at least three angles in which these competitions don’t reflect what you need to have in the real supply chain world. And I think the difference between having a Formula One and a car that you’ve just kind of tuned for the race is very accurate in this respect. There were actually three top concerns.

First was operational cost just to get results. Among the methods in the top ten, Lokad’s was the only one that did not have insane compute costs. And again, imagine a car that consumes something like 50 liters per 100 kilometers. That would be something like ten times more expensive to run than a normal car, consuming as much fuel as a truck. I mean, if you can do pit stops every 20 minutes, it’s kind of great, but otherwise, no. And by the way, with cloud computing, those costs are very real. If you need to rent a thousand servers, that costs a lot of money.

The second thing is maintainability. If you look at this Formula One analogy, which I think is excellent, a Formula One car is super nice because it works on circuits where the road is perfect.

It’s literally as if you were trying to run a Formula One car in Paris. For example, even just the dampness on the sidewalks would actually damage the car. The car cannot take a bump of more than a few centimeters because a Formula One car is so close to the road. It’s literally one centimeter away from the ground. So, if you have any obstacle, it would actually break the car. Obviously, if you decide to have a normal car, you have a bit more leeway and you’re not completely stuck to the ground. You don’t drive as fast, but guess what? If your road is a bit more bumpy, you will survive obstacles.

So, with those models I’m talking about, in terms of numerical stability, you need to have something where if your data is not perfect, if you have things that are, you know, a bit corrupted here and there, your model doesn’t go crazy and you don’t end up with completely insane results, which would be just like your Formula One car going completely off track. Maintainability means that if you have conditions that are not ideal, it’s still relatively sane and conservative, which again translates to the fact that it will work.

In this competition, you can have people who spend literally hundreds of hours making sure that everything is perfect, just like a racing circuit. But in the real world, stuff happens all the time, it’s messy, and roads are not perfect. That is why numerical stability is very important: you want models that are numerically stable, even if they are maybe a bit less accurate. There’s a saying in data mining: “garbage in, garbage out.” But the reality is that you always have a bit of garbage, so you need to have something that doesn’t go crazy when there’s a bit of garbage.

The last thing, which is also completely absent from this competition, is extensibility. The reality is that, for example, in this competition, stock levels were absent, future promotions were absent. The team had to forecast 28 days ahead. They had the pricing history but they didn’t have future prices for the duration to be forecast, which was 28 days. So, basically, they didn’t know the future promotions. If we wanted to have a real-world setup, we’d have to embed stock levels, future promotions, and probably, we would have liked shelf constraints on how much stock you can really have on the shelves. Those are constraints you would have to deal with. And then, the loss that was used to assess the accuracy was a pinball loss, but the reality is that you can have all sorts of nonlinearities that happen.

Kieran Chandler: And what are your thoughts now that the competition is over? Joannes mentioned there are really fine lines of 0.1%. Getting into the top five must be kind of frustrating. Any thoughts on how you could have improved?

Rafael de Rezende: We had many things that we did not try during the competition. At some point, you’re reaching the deadline, and you have to say, “Okay, that’s going to be it. Let’s go.” Of course, there are many ideas out there that could have improved our results. I don’t think that’s going to be the next focus right now. The main focus would be to take the few improvements that we made and try to bring them to the whole team, to all the other scientists, and make sure those insights get reproduced quickly for all our clients.

Joannes Vermorel: Which Walmart isn’t, by the way.

Rafael de Rezende: So we’re going to take all the things that we learned and try to put them to work as fast as we can, especially sharing the knowledge with others so that we can help more clients.

Kieran Chandler: Nice. What about you, Joannes? Is there anything that you noticed the team do that you think is particularly useful for maybe the future?

Joannes Vermorel: Frankly, it was great. It validated so much of our approach. When I say we need to have a model that is super lean, maintainable, and extensible, we have a lot of constraints that are completely unrelated to the forecasting accuracy. The question is, when you take those constraints into account, how far are you from state-of-the-art? Maybe the conclusion would have been that our models have good properties, are maintainable, and are extensible, but they are light years behind what you could get with state-of-the-art. The conclusion is the exact opposite: we are actually just a hair’s thickness behind what is state-of-the-art.

Rafael de Rezende: I’d like to add that it was a supply chain competition. It’s nice to know that your domain knowledge is actually helpful. We were competing mostly against people who were not supply chain experts but data science experts. We were the supply chain provider, which also happened to be data science experts playing in the field, so this might have differentiated us in the competition.

Kieran Chandler: Great. And, Rafael, what’s next for you? Do you have any more competitions on the horizon?

Rafael de Rezende: No, I think we had a lot of stress this year, so we’re going to leave it to next year, get some time to refresh, and then maybe next year.

Kieran Chandler: I think you probably deserve a break. That’s the show, and let’s leave it at that. Thanks for your time. That’s it for this week. Thanks very much for tuning in, and we’ll see you again in the next episode. Bye for now!
