Forecasting via Deep Learning (2018)

Predictive modelling at Lokad is now done through differentiable programming rather than deep learning. Differentiable programming is a descendant of deep learning, better suited to address supply chain challenges. In particular, differentiable programming is more amenable to whiteboxing than deep learning.

From probabilistic forecasting to deep learning

Our 5th generation forecasting engine relies on a relatively recent flavor of machine learning named deep learning. For supply chains, large improvements in forecasting accuracy can translate into equally large returns: serving more clients, serving them faster, and facing less inventory risk.

About 18 months ago, we announced the 4th generation of our forecasting technology, the first to deliver true probabilistic forecasts. Probabilistic forecasts are essential in supply chains because costs are concentrated on the statistical extremes, when demand happens to be unexpectedly high or low. In contrast, traditional forecasting methods, such as classic daily, weekly or monthly forecasts that only deliver median or average values, are blind to the problem. As a consequence, those methods usually fail to deliver satisfying returns for companies. The 5th generation does not deny its origins: it also embraces probabilistic forecasts and builds on the experience gained with the previous generation.
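The statement that costs concentrate on the statistical extremes can be illustrated with the textbook newsvendor setting. This is a minimal sketch, not Lokad’s engine: the demand distribution `DEMAND_PMF` and both unit costs are assumed values. When a missing unit costs 9 and a leftover unit costs 1, the cost-minimizing stock level is the 90% quantile of demand (the critical ratio 9 / (9 + 1)), not the average, and a forecast that only delivers the average cannot provide that quantile.

```python
# Hypothetical demand distribution for one SKU over one lead time: value -> probability.
DEMAND_PMF = {0: 0.05, 1: 0.15, 2: 0.25, 3: 0.25, 4: 0.15, 5: 0.10, 10: 0.05}

UNDERAGE_COST = 9.0  # assumed cost per unit of unserved demand
OVERAGE_COST = 1.0   # assumed cost per unit of leftover stock
CRITICAL_RATIO = UNDERAGE_COST / (UNDERAGE_COST + OVERAGE_COST)  # 0.9


def expected_cost(stock, pmf):
    """Probability-weighted cost of stocking `stock` units."""
    cost = 0.0
    for demand, prob in pmf.items():
        shortfall = max(demand - stock, 0)
        leftover = max(stock - demand, 0)
        cost += prob * (UNDERAGE_COST * shortfall + OVERAGE_COST * leftover)
    return cost


def quantile(pmf, tau):
    """Smallest demand value whose cumulative probability reaches tau."""
    cumulative = 0.0
    for demand in sorted(pmf):
        cumulative += pmf[demand]
        if cumulative >= tau:
            return demand
    return max(pmf)


mean_demand = sum(d * p for d, p in DEMAND_PMF.items())
stock_at_mean = round(mean_demand)  # 3 units
best_stock = min(range(11), key=lambda s: expected_cost(s, DEMAND_PMF))

print(stock_at_mean, round(expected_cost(stock_at_mean, DEMAND_PMF), 2))  # 3 7.0
print(best_stock, round(expected_cost(best_stock, DEMAND_PMF), 2))        # 5 4.5
print(quantile(DEMAND_PMF, CRITICAL_RATIO))                               # 5
```

At the optimal stock level, the rare scenario where demand reaches 10 (5% probability) still accounts for half of the expected cost, which is why the shape of the tail matters more than the average.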

Partly by chance, deep learning is heavily geared toward probabilistic forecasts by design, although the motivation for this perspective was entirely unrelated to supply chain concerns. Deep learning algorithms favor optimization built on top of a probabilistic (Bayesian) perspective, with metrics like cross entropy, because these metrics yield large gradients, especially when the model is confidently wrong. Such gradients are well suited to stochastic gradient descent, the “one” algorithm that makes deep learning possible.
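A minimal sketch of the gradient argument, assuming the simplest possible case: a single logistic output trained on a binary label. For a confidently wrong prediction, the gradient of cross entropy with respect to the logit stays close to 1 in magnitude, while the gradient of squared error nearly vanishes, so one stochastic gradient descent step under cross entropy corrects the model far more. The helper functions are illustrative, not part of any deep learning library.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def grad_cross_entropy(z: float, y: float) -> float:
    # Loss: -(y*log(p) + (1 - y)*log(1 - p)) with p = sigmoid(z).
    # Its derivative with respect to the logit z simplifies to p - y.
    return sigmoid(z) - y


def grad_squared_error(z: float, y: float) -> float:
    # Loss: 0.5*(p - y)**2 with p = sigmoid(z).
    # The chain rule adds the factor p*(1 - p), which is tiny when p is near 0 or 1.
    p = sigmoid(z)
    return (p - y) * p * (1.0 - p)


def sgd_step(z: float, y: float, grad_fn, learning_rate: float = 0.5) -> float:
    # One stochastic gradient descent update of the logit.
    return z - learning_rate * grad_fn(z, y)


# Confidently wrong: the true label is 1, but sigmoid(-6) is about 0.0025.
z0, y = -6.0, 1.0
print(grad_cross_entropy(z0, y))            # about -0.9975
print(grad_squared_error(z0, y))            # about -0.0025
print(sgd_step(z0, y, grad_cross_entropy))  # about -5.50: a large correction
print(sgd_step(z0, y, grad_squared_error))  # about -5.999: almost no movement
```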

In the specific case of supply chains, the foundations of deep learning turn out to be fully aligned with the actual business requirements!

Beyond the hype of artificial intelligence

Artificial intelligence, powered in practice by deep learning, was the buzzword of 2017. Claims are bold, enthralling and, well, fuzzy. From Lokad’s vantage point, we observe that the majority of these enterprise AI technologies are not living up to expectations. Very few companies can, like Instacart, secure over half a billion USD in funding to assemble a world-class deep learning team and successfully tackle a supply chain challenge.

With this release, Lokad is making AI-grade forecasting technology accessible to any reasonably “digitalized” company. Obviously, the whole thing is still powered by historical supply chain data, so the data must be accessible to Lokad, but our technology requires zero deep learning expertise. Unlike virtually every other “enterprise” AI technology, Lokad does not rely on manual feature engineering. As far as our clients are concerned, the upgrade from our previous probabilistic forecasts to deep learning will be seamless. Lokad is the first software company to provide a turnkey AI-grade forecasting technology, accessible to tiny one-person ecommerce businesses and yet scaling up to the largest supply chain networks, which can include thousands of locations and a million product references.

The age of GPU computing

Deep learning remained somewhat niche until the community managed to upgrade its software building blocks to take advantage of GPUs (graphics processing units). GPUs differ greatly from CPUs (central processing units), which still power the vast majority of apps nowadays, with the notable exception of computer games, which rely intensively on both CPUs and GPUs. Along with the complete rewrite of our forecasting engine for this 5th iteration, we have also significantly upgraded the low-level infrastructure of Lokad: the Lokad platform now leverages GPUs as well as CPUs, taking advantage of the GPU-powered machines that can be rented on Microsoft Azure, the cloud computing platform that supports Lokad. Through the massive processing power of GPUs, we are not only making our forecasts more accurate, we are making them much faster too. With a grid of GPUs, we now typically get forecasts about 3x to 6x faster for any sizeable dataset (*).

(*) For ultra-small datasets, our 5th gen forecasting engine is actually slower, taking a few extra minutes, which is largely inconsequential in practice.

Product launches and promotions

Our 5th generation forecasting engine brings substantial improvements to hard forecasting situations, most notably product launches and promotions. From our perspective, product launches, albeit very difficult, remain a tad easier to forecast than promotions. The difference in difficulty is driven by the quality of the historical data, which is invariably lower for promotions than for product launches. Promotion data gets better over time once the proper quality assurance processes are in place.

In particular, we see deep learning as a massive opportunity for fashion brands struggling with product launches that dominate their sales: launching a new product isn’t the exception, it’s the rule. Moreover, color and size variants vastly inflate the number of SKUs, which makes the situation even more complex.

Our forecasting FAQ

Which forecasting models are you using?

Our deep forecasting engine uses a single model built from deep learning principles. Unlike classic statistical models, it features tens of millions of trainable parameters, about 1000 times more than our previous, most complex, non-deep machine learning model. Deep learning dramatically outperforms older machine learning approaches (random forests, gradient boosted trees). Yet, it’s worth noting that these older machine learning approaches were already outperforming all the time-series classics (Box-Jenkins, ARIMA, Holt-Winters, exponential smoothing, etc.).

Do you learn from your forecasting mistakes?

Yes. The statistical training process, which ultimately generates the deep learning model, leverages all the historical data available to Lokad through a process known as backtesting: the history is replayed, the model forecasts each past period using only the data that was available before it, and those forecasts are compared with what actually happened. Thus, the more historical data available to the model, the more opportunities the model has to learn from its own mistakes.
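The backtesting idea can be sketched as a rolling-origin evaluation. This is a minimal sketch, assuming a hypothetical moving-average forecaster as a stand-in for the actual model; Lokad’s training process itself is not shown. Each additional day of history adds one more cutoff, hence one more error to learn from.

```python
from statistics import mean
from typing import Callable, List


def moving_average_forecast(history: List[float], window: int = 7) -> float:
    # Hypothetical stand-in model: average daily demand over the last `window` days.
    recent = history[-window:]
    return sum(recent) / len(recent)


def rolling_backtest(
    series: List[float],
    forecaster: Callable[[List[float]], float],
    min_train: int = 14,
    horizon: int = 1,
) -> List[float]:
    """Replay the history: forecast from each cutoff using only the past,
    then record the absolute error against the demand actually observed."""
    errors = []
    for cutoff in range(min_train, len(series) - horizon + 1):
        past = series[:cutoff]  # nothing after the cutoff leaks into the forecast
        predicted_total = forecaster(past) * horizon
        actual_total = sum(series[cutoff:cutoff + horizon])
        errors.append(abs(predicted_total - actual_total))
    return errors


daily_demand = [3, 4, 2, 5, 3, 8, 9] * 4  # hypothetical 4 weeks of daily demand
errors = rolling_backtest(daily_demand, moving_average_forecast)
print(len(errors), round(mean(errors), 2))  # 14 2.12
```

Here the moving average ignores the weekly pattern, so the backtest exposes a systematic, repeating error: precisely the kind of mistake a richer model can learn to avoid.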

Does your forecasting engine handle seasonality, trends, days of week?

Yes, the forecasting engine handles all the common cyclicities, and even the quasi-cyclicities (e.g. Easter, whose date moves from one year to the next), whose importance is frequently underestimated. Under the hood, the deep learning model intensively uses a multiple time-series approach, leveraging the cyclicities observed in other products to improve the forecasting accuracy of any given product. Naturally, two products may share the same seasonality but not the same day-of-week pattern; the model is capable of capturing this difference. Also, one of the major upsides of deep learning is its capacity to properly capture the variability of the seasonality itself. Indeed, a season may start earlier or later depending on external variables, such as the weather, and those variations are detected and reflected in our forecasts.
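The multiple time-series idea can be illustrated, in a far simpler form than a deep learning model, with day-of-week profiles: each product’s own weekday pattern is blended with the pattern pooled across all products, and products with little data lean more heavily on the pooled pattern. This is a minimal sketch; the shrinkage rule and the `prior_strength` parameter are assumptions for illustration, not how Lokad’s model works. Since every product keeps its own blended profile, two products can still end up with different day-of-week patterns.

```python
from typing import Dict, List

WEEKDAYS = 7


def weekday_shares(daily_sales: List[float]) -> List[float]:
    """Share of volume falling on each weekday; day 0 of the series is weekday 0."""
    totals = [0.0] * WEEKDAYS
    for offset, qty in enumerate(daily_sales):
        totals[offset % WEEKDAYS] += qty
    volume = sum(totals)
    if volume == 0:
        return [1.0 / WEEKDAYS] * WEEKDAYS  # no information: flat profile
    return [t / volume for t in totals]


def pooled_weekday_profiles(
    products: Dict[str, List[float]], prior_strength: float = 20.0
) -> Dict[str, List[float]]:
    """Blend each product's weekday shares with the average shares of all products.
    The weight of the pooled profile shrinks as the product's own volume grows."""
    own = {name: weekday_shares(sales) for name, sales in products.items()}
    pooled = [sum(shares[d] for shares in own.values()) / len(own) for d in range(WEEKDAYS)]
    blended = {}
    for name, sales in products.items():
        w = prior_strength / (prior_strength + sum(sales))  # low volume -> high w
        blended[name] = [w * p + (1 - w) * o for p, o in zip(pooled, own[name])]
    return blended


products = {  # hypothetical two weeks of daily sales
    "bestseller": [10, 10, 12, 12, 15, 40, 30] * 2,
    "slow-mover": [0, 0, 1, 0, 0, 1, 0] * 2,
}
profiles = pooled_weekday_profiles(products)
for name, profile in profiles.items():
    print(name, [round(share, 3) for share in profile])
```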

What data do you need?

As was the case with our previous generation of forecasting technology, the forecasting engine needs, at a minimum, the daily historical demand in order to forecast demand, and a disaggregated order history is even better. As far as the length of the history is concerned, the longer the better. While seasonality cannot be detected with less than 2 years of history, we consider 3 years of history to be good, and 5 years excellent. In order to forecast lead times, the engine typically requires purchase orders that contain both the order dates and the delivery dates. Specifying your product or SKU attributes also helps to considerably refine the forecasts. In addition, providing your stock levels is very helpful for us to deliver a first meaningful stock analysis to you.
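Turning purchase orders into observed lead times is the first step toward forecasting them. The sketch below assumes a hypothetical record layout (`order_date`, `delivery_date`). Orders not yet delivered are simply skipped here, although a more careful treatment would keep them as censored observations, since their lead time is at least the time elapsed so far.

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Optional

# Hypothetical purchase order records; an open order has no delivery date yet.
purchase_orders: List[Dict[str, Optional[date]]] = [
    {"order_date": date(2017, 1, 3), "delivery_date": date(2017, 1, 17)},
    {"order_date": date(2017, 2, 1), "delivery_date": date(2017, 2, 22)},
    {"order_date": date(2017, 3, 1), "delivery_date": date(2017, 3, 15)},
    {"order_date": date(2017, 3, 20), "delivery_date": None},
]


def observed_lead_times(orders: List[Dict[str, Optional[date]]]) -> List[int]:
    """Lead time in days for every purchase order that has been delivered."""
    return [
        (po["delivery_date"] - po["order_date"]).days
        for po in orders
        if po["delivery_date"] is not None
    ]


lead_times = observed_lead_times(purchase_orders)
# Empirical distribution: lead time in days -> observed frequency.
distribution = {days: count / len(lead_times) for days, count in Counter(lead_times).items()}
print(lead_times)  # [14, 21, 14]
print(distribution)
```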

Can you forecast my Excel sheet?

As a rule of thumb, if all of your data fits into one Excel sheet, then we usually cannot do much for you, and to be honest, nobody can either. Spreadsheet data is likely to be aggregated per week or per month, and most of the historical information ends up being lost through such aggregation. In addition, such a spreadsheet is unlikely to contain much information about the categories and hierarchies that apply to your products. Our forecasting engine leverages all the data you have, and a test on a tiny sample is not going to give satisfying results.
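A toy example, with hypothetical numbers, of how weekly aggregation erases information: the two daily series below have identical weekly totals, yet one sells 2 units every day while the other sells 14 units on a single day each week. The stock needed to serve them on a given day is very different, and a weekly spreadsheet cannot tell them apart.

```python
from typing import List


def weekly_totals(daily: List[int]) -> List[int]:
    """Sum daily values into complete 7-day buckets (a trailing partial week is dropped)."""
    full_weeks = len(daily) // 7
    return [sum(daily[7 * w:7 * w + 7]) for w in range(full_weeks)]


steady = [2, 2, 2, 2, 2, 2, 2] * 4   # hypothetical: small, regular daily demand
lumpy = [0, 0, 0, 14, 0, 0, 0] * 4   # hypothetical: one large order per week

print(weekly_totals(steady))    # [14, 14, 14, 14]
print(weekly_totals(lumpy))     # [14, 14, 14, 14]
print(max(steady), max(lumpy))  # 2 14: daily peaks differ sevenfold
```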

What about stock-outs and promotions?

Both stock-outs and promotions introduce bias in historical sales. Since the goal is to forecast demand, not sales, this bias needs to be taken into account. One frequent but incorrect way of dealing with these events consists of rewriting the history to fill in the gaps and truncate the peaks. We dislike this approach because it amounts to feeding forecasts to the forecasting engine, which can result in major overfitting problems. Instead, our engine natively supports “flags” that indicate where the demand has been censored (stock-outs) or inflated (promotions).
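One standard way for such flags to enter a model is a censored likelihood: on a day flagged as a stock-out, the observed sales only tell us that demand was at least that high. The sketch below assumes Poisson-distributed daily demand and fits its rate by grid search. It illustrates the general technique, not Lokad’s implementation, and it does not cover flags for inflated demand (promotions).

```python
import math
from typing import List


def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_at_least(k: int, lam: float) -> float:
    """P(demand >= k) for Poisson demand with rate lam."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


def log_likelihood(lam: float, sales: List[int], stock_out: List[bool]) -> float:
    total = 0.0
    for qty, censored in zip(sales, stock_out):
        # Flagged day: demand was at least `qty`. Unflagged day: demand equals `qty`.
        p = poisson_at_least(qty, lam) if censored else poisson_pmf(qty, lam)
        total += math.log(max(p, 1e-300))  # guard against log(0)
    return total


def fit_rate(sales: List[int], stock_out: List[bool]) -> float:
    grid = [i / 100 for i in range(1, 1001)]  # candidate rates 0.01 .. 10.00
    return max(grid, key=lambda lam: log_likelihood(lam, sales, stock_out))


sales = [4, 5, 3, 4, 1, 0, 4, 5]  # hypothetical daily sales
stock_out = [False, False, False, False, True, True, False, False]
print(sum(sales) / len(sales))     # naive average of sales: 3.25
print(fit_rate(sales, stock_out))  # about 4.18: stock-out days no longer drag demand down
```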

Do you forecast new products?

Yes, we do. However, in order to forecast new products, the engine requires the launch dates of the other, “older” products, as well as their historical demand at the time of launch. Specifying your product categories and/or a product hierarchy is also advised. The engine forecasts new products by auto-detecting the “older” products that can be considered comparable to the new ones. As no demand has yet been observed for the new items, the forecasts rely entirely on the attributes associated with them.
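The idea of finding comparable older products can be sketched with a deliberately naive rule: rank older products by the Jaccard similarity of their attributes, then average the demand they showed in the weeks following their own launch date. This is a hypothetical stand-in with made-up products and attributes, not Lokad’s method.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def launch_forecast(
    new_attributes: Set[str],
    older: Dict[str, Tuple[Set[str], List[float]]],
    k: int = 2,
) -> List[float]:
    """Average the post-launch demand curves of the k most similar older products.
    Each curve is aligned on that product's launch date (week 0 = launch week)."""
    ranked = sorted(older.values(), key=lambda item: jaccard(new_attributes, item[0]), reverse=True)
    neighbours = [curve for _, curve in ranked[:k]]
    horizon = min(len(curve) for curve in neighbours)
    return [sum(curve[w] for curve in neighbours) / len(neighbours) for w in range(horizon)]


older_products = {  # hypothetical: attributes, weekly demand since launch
    "dress-red": ({"dress", "red", "summer"}, [30, 22, 15, 10]),
    "dress-blue": ({"dress", "blue", "summer"}, [26, 20, 14, 9]),
    "coat-black": ({"coat", "black", "winter"}, [8, 12, 15, 14]),
}
print(launch_forecast({"dress", "green", "summer"}, older_products))
# [28.0, 21.0, 14.5, 9.5]
```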

Do you use external data to refine the forecasts?

We can use competitive pricing data, typically obtained through 3rd party companies that specialize in web scraping, for example. Web traffic data can also be used, and possibly acquired, to enrich the historical data and further boost statistical accuracy. In practice, the biggest bottleneck in using external data sources isn’t the Lokad forecasting engine, which is fairly capable, but setting up and maintaining a high-quality data pipeline attached to those external data sources.
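Much of that pipeline work is mundane alignment. For example, competitor prices scraped at irregular dates must be mapped onto the daily calendar of the sales history, and stale values should be discarded rather than silently carried forward. This is a minimal sketch; the snapshot format and the `max_age_days` cut-off are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, Optional


def align_snapshots(
    start: date,
    end: date,
    snapshots: Dict[date, float],
    max_age_days: int = 7,
) -> Dict[date, Optional[float]]:
    """Map irregular price snapshots onto a daily calendar.
    Each day gets the most recent snapshot taken on or before it, unless that
    snapshot is older than `max_age_days`; unknown days get None, not a guess."""
    ordered = sorted(snapshots.items())
    aligned: Dict[date, Optional[float]] = {}
    last_date: Optional[date] = None
    last_price: Optional[float] = None
    i = 0
    day = start
    while day <= end:
        # Advance to the latest snapshot taken on or before `day`.
        while i < len(ordered) and ordered[i][0] <= day:
            last_date, last_price = ordered[i]
            i += 1
        fresh = last_date is not None and (day - last_date).days <= max_age_days
        aligned[day] = last_price if fresh else None
        day += timedelta(days=1)
    return aligned


competitor_prices = {date(2018, 1, 2): 19.99, date(2018, 1, 5): 18.49}  # hypothetical scrapes
daily = align_snapshots(date(2018, 1, 1), date(2018, 1, 10), competitor_prices, max_age_days=3)
for day, price in daily.items():
    print(day.isoformat(), price)
```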