Ranked 6th out of 909 teams in the M5 forecasting competition

July 2, 2020

technology

Joannes Vermorel

A team of Lokad employees, namely Rafael de Rezende (leader), Ignacio Marín Eiroa, Katharina Egert and Guilherme Thompson ¹, finished in 6th position in the M5 Forecasting competition out of 909 competing teams. It’s an impressive feat, and I am proud of what this team has achieved. Building a culture oriented toward quantitative results has been a long-standing goal for Lokad, and the result of this competition demonstrates just how far we have progressed on this journey.

Lokad ranked 6th out of 909 teams in the M5 forecasting competition

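To my knowledge, this is the first time a public demand forecasting competition has involved quantile forecasts. This is directly in line with what Lokad was already working on back in 2012. It took the academic community 8 years to catch up with quantiles, but that does not make this achievement any less important. Naked “classic” forecasts are pretty much broken by design as far as supply chain is concerned. Quantile forecasts are not the endgame, but they do work where safety stocks don’t. I see this as a big step in the right direction.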

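Result-wise, the teams ranked 1 to 6 were remarkably close. The first placed team ² was ahead by a few percent. However, in my experience, even for a super large retail network like Walmart, a 5% improvement in pinball loss - the metric that can be used to assess the accuracy of quantile forecasts - is barely noticeable in terms of dollars of error. Indeed, at this level of accuracy, the forecasting models are essentially equivalent, and other concerns that were not addressed in the M5 competition dominate, such as the ability to cope with stock-outs, varying assortments, cannibalization and varying lead times. These concerns make a much bigger difference than a few percent of pinball loss.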
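For readers unfamiliar with the metric, here is a minimal sketch of the plain pinball loss for a single quantile level. It is only an illustration: the function names are hypothetical, and the M5 competition itself scored entries with a scaled and weighted variant of this loss, which is not reproduced here.

```python
def pinball_loss(actual: float, forecast: float, tau: float) -> float:
    """Plain pinball (quantile) loss for one observation.

    tau is the target quantile level, strictly between 0 and 1.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    diff = actual - forecast
    # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_pinball_loss(actuals, forecasts, tau: float) -> float:
    """Average pinball loss over paired observations and forecasts."""
    pairs = list(zip(actuals, forecasts))
    if not pairs:
        raise ValueError("at least one observation is required")
    return sum(pinball_loss(a, f, tau) for a, f in pairs) / len(pairs)
```

At a high quantile level such as 0.9, under-forecasting by two units costs nine times more than over-forecasting by two units, which is the asymmetry that makes the metric suitable for quantile forecasts.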

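Model-wise, the Lokad team adopted a low-dimensional parametric model. It includes the relevant cyclicities (day of week, day of month, month) at the store/category level, a baseline from which the cyclicities and the stock-out noise have been removed, and a two-parameter state space model, with the cyclicities contributing multiplicatively, that turns this baseline into daily trajectories. Like the winning team, Lokad used neither price data nor any external data. The biggest technical challenge for the Lokad team was coping with the stock-outs that had to be forecast: this was a sales forecast, not a demand forecast. We will revisit this point in more detail later on, when we review the fine print of this model.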
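To make the idea of multiplicative cyclicities concrete, here is a minimal sketch that scales a deseasonalized baseline by calendar factors. This is not the Lokad team's model: the function name, the factor values and the dictionary layout are all assumptions made for illustration, and the two-parameter state space component is deliberately left out because the post does not specify it.

```python
from datetime import date


def expected_daily_sales(baseline, day, dow_factors, dom_factors, month_factors):
    """Scale a deseasonalized baseline by multiplicative calendar factors.

    Missing keys default to 1.0, meaning "no effect".
    """
    return (
        baseline
        * dow_factors.get(day.weekday(), 1.0)  # Monday = 0 ... Sunday = 6
        * dom_factors.get(day.day, 1.0)        # 1 ... 31
        * month_factors.get(day.month, 1.0)    # 1 ... 12
    )


# Hypothetical factors, for illustration only.
dow = {5: 1.3, 6: 1.2}  # busier weekends
dom = {1: 1.1}          # small lift on the first day of the month
month = {12: 1.25}      # December uplift

# December 3, 2016 was a Saturday: 4.0 * 1.3 * 1.0 * 1.25
example = expected_daily_sales(4.0, date(2016, 12, 3), dow, dom, month)
```

In a setup of this kind, the factors would presumably be estimated once per store/category group rather than per SKU, which is one way to keep the parameter count low.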

Overall, if a well-chosen low-dimensional parametric model, like the one Lokad used in the M5 competition, can get you within a few percent of the accuracy of the state-of-the-art method - which happens to be range-augmented gradient boosted trees - then in production, this model is guaranteed to be much better behaved than nonparametric or hyperparametric models, and much easier to structurally tweak ³ when the need arises.

Also, the computing performance of the model tends to be a not-so-subtle operational killer. The first placed team reported that running their prediction took “a couple hours” (sic) on a 10+10 CPU workstation setup. This may seem fast, but keep in mind that the M5 dataset covered only 30k SKUs (a few categories over a few stores), which is very small compared to the number of SKUs in most retail networks. I guesstimate that Walmart has over 100M SKUs to manage globally, so we are talking about tens of thousands of compute hours per prediction ⁴. The retail networks that Lokad serves typically give us a ~2 hour window every day to refresh our forecasts, so whatever models we pick need to be compatible with this schedule for both training and forecasting ⁵. Deploying the model of the first placed team is certainly possible at Walmart’s scale, but managing the compute cluster alone would take a team of its own.
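The back-of-the-envelope arithmetic behind this guesstimate can be sketched as follows. The assumptions are loud ones: “a couple hours” is read as 2 hours, “10+10 CPU” is read as 20 cores, and compute time is assumed to scale linearly with the number of SKUs. The function name and its parameters are hypothetical.

```python
def scale_up(m5_skus=30_000, target_skus=100_000_000,
             run_hours=2.0, cores=20, window_hours=2.0):
    """Linear extrapolation of the reported run time to a larger SKU count."""
    factor = target_skus / m5_skus            # roughly 3,333x more SKUs
    workstation_hours = run_hours * factor    # roughly 6,667 workstation-hours
    return {
        "factor": factor,
        "workstation_hours": workstation_hours,
        "core_hours": workstation_hours * cores,  # roughly 133,000 core-hours
        # Workstations needed to finish within the daily refresh window.
        "workstations_for_window": workstation_hours / window_hours,
    }


estimate = scale_up()
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the total lands between several thousand workstation-hours and over a hundred thousand core-hours, which brackets the “tens of thousands of compute hours” figure. Fitting this into a 2 hour window would take on the order of a few thousand such workstations running in parallel.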

The M5 competition was a major improvement upon its previous iterations. However, the dataset is still a far cry from a real retail situation. For example, the pricing information was only available for the past. In practice, promotions don’t just happen randomly: they are planned. As such, if the price data had been provided for the time period to be forecast, the competition would have been steered toward models actually making use of this information instead of dismissing it straight away.

Besides future prices, two major pieces of data happened to be missing from the M5 competition: stock levels and disaggregated transactions, both of which are nearly always available in retail chains. Stock levels matter because obviously without stock there are no sales (censorship bias). Disaggregated transactions matter because, in my experience, it’s nearly impossible to assess any kind of cannibalization or substitution without them - whereas a casual observation of the retail shelves clearly indicates that they do play a big role. The model that the Lokad team used to rank sixth did not have anything in this regard, and the model that ranked first did not either.
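The censoring effect of stock levels can be illustrated with a small sketch. The numbers below are made up for illustration and the helper name is hypothetical. The point is only that observed sales are capped by the stock on hand, so averaging sales understates demand whenever stock-outs occur.

```python
def observed_sales(demand, stock):
    """Sales are capped by the stock on hand: one cannot sell what is not there."""
    return [min(d, s) for d, s in zip(demand, stock)]


# Hypothetical daily figures, for illustration only.
demand = [3, 5, 2, 6]  # what customers wanted (never observed directly)
stock = [4, 2, 0, 6]   # units available on the shelf each day

sales = observed_sales(demand, stock)

mean_demand = sum(demand) / len(demand)
mean_sales = sum(sales) / len(sales)

# Days where everything on the shelf was sold are only *possibly* censored:
# demand may have been exactly equal to the stock, or higher.
possibly_censored = [s == k for s, k in zip(sales, stock)]
```

Here the mean of the observed sales (2.75) sits well below the mean of the true demand (4.0). Without the stock series there is no way to tell the zero-sales day caused by an empty shelf apart from a day with genuinely no demand.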

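In conclusion, this is a great result for Lokad. While there is certainly room to make forecasting competitions more realistic, I would strongly urge readers not to take these results too literally. The M5 is a _forecasting_ competition. In the real world, stock-outs, product launches, promotions, assortment changes, supplier problems and delivery schedules all have to be taken into account. The biggest challenge is not to shave a few percent off the error, but to ensure that the end-to-end numerical recipe has no dumb blind spot that ends up wrecking the whole supply chain optimization initiative.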

¹ Technically an ex-Lokad employee at the time of the competition.
² The winning team included Northquay (pseudonym) and Russ Wolfinger. Their team was named Everyday Low SPLices for this M5 competition. For the sake of clarity, I am simply referring to them here as the first placed team.
³ Crises are routine in supply chains. Covid-19 is merely the latest worldwide crisis, and local crises happen all the time. Historical data does not always reflect the events unfolding in the supply chain. Frequently, the high-level insight of the supply chain scientist is the only way to steer the models toward sensible decisions.
⁴ The first placed team used LightGBM, a C++ library capable of delivering state-of-the-art algorithmic performance for this class of models. Furthermore, the team used somewhat advanced numerical performance tricks such as using half-precision numbers. When transitioning towards a production setup, the per-SKU compute performance would most likely decrease due to the extra complexity / heterogeneity imposed by an actual production environment.
⁵ Not all models are equally suitable for isolating training from evaluation (forecasting). Mileage may vary. Data problems happen once in a while, and in these situations models need to be retrained, which needs to happen fast.
