Differentiable Programming as in 'AI' that works

March 27, 2019

technology

Joannes Vermorel

We are proud to announce the immediate availability of the Lokad private beta for differentiable programming intended for quantitative supply chain optimization. Differentiable programming is the descendant of deep learning, and represents the convergence of two algorithmic fields: machine learning and numerical optimization.

Timeline of differentiable programming at Lokad

Differentiable programming unlocks a series of supply chain scenarios that were seen as largely intractable: joint optimization of prices and stocks, loyalty-driven assortment optimization, forecasting demand for non-standard products (e.g. precious stones, artworks), large scale multi-echelon flow optimization, many-channel joint optimization, stock optimization under partially incorrect electronic stock values, large scale flow maximization under many constraints, etc. For many other scenarios that were already approachable with alternative methods, differentiable programming delivers superior numerical results with only a fraction of the overhead, both in terms of data scientists’ efforts and computational resources.

Application to Supply Chains

At its core, Differentiable Programming (DP) offers a path to unify problems that have remained disconnected for too long and to resolve them jointly: assortment, pricing, forecasting, planning, merchandising. While such unification may seem unrealistically ambitious, the reality is that companies are already applying an insane amount of duct-tape to their own processes to cope with the endless problems generated by the fact that those challenges have been siloed within the organization in the first place. For example, pricing obviously impacts the demand and yet both planning and forecasting are nearly always performed while ignoring prices altogether.

DP unlocks the massive opportunity to deliver approximately correct decisions from a holistic perspective on the business, as opposed to being exactly wrong while displacing problems within the organization instead of resolving them. Anecdotally, seeking approximate correctness while taking into account the business as a whole is exactly what most spreadsheet-driven organizations are about; and for lack of a technology - like DP - capable of embracing a whole-business perspective, spreadsheets remain the least terrible option.

Ecommerce: Being able to attach 100% of the units sold to known clients represents a massive latent amount of information about the market; yet, when it comes to pricing and inventory optimization, the loyalty information is usually not even used nowadays. DP offers the possibility to move from time-series forecasting to temporal graph forecasting where every single client-product pair ever observed matters, leading to smarter decisions both for stocks and prices.

Luxury brands: the optimization of pricing and assortments - down to the store level - has long been considered largely intractable due to the sparsity of the data, that is, the very low sales volume per item per store - as low as one unit sold per product per store per year. DP provides angles to deliver classes of solutions that work on such ultra-sparse data, because they are engineered to deliver a much greater data efficiency than regular deep learning methods.

Fashion brands: the joint optimization of stocks and prices is a clear requirement - as the demand for many articles can be highly price-sensitive - yet, the joint optimization of both purchasing and pricing could not be achieved due to the lack of a tool capable of even apprehending this coupling - i.e. the capacity to purchase more at a lower price generates more demand (and vice-versa). DP provides the expressiveness to tackle this challenge.

Manufacturing: the numerical optimization of sizeable multi-echelon networks falls apart when attempted with classic numerical solvers (see “Beyond branch-and-cut optimization” below). Indeed, those solvers become largely impractical when dealing with either millions of variables or stochastic behaviors. Unfortunately, manufacturing exhibits both many variables and stochastic behaviors. DP offers a practical angle to cope with multi-echelon networks without betraying the complexity of the actual flow patterns within the network.

MRO (maintenance, repair, overhaul): If one part needed for the repair of a system is missing, then the whole system - which might be an aircraft - stays down. Probabilistic forecasts are the first step to deal with such erratic and intermittent demand patterns, but figuring out the fine print of the co-occurrences of the parts required and turning this analysis into actionable inventory recommendations was too complex to be of practical use. DP streamlines the resolution of such problems.

Retail networks: Cannibalizations within retail networks - usually between products but sometimes between stores - have been recognized as of primary importance for a long time. This problem is amplified by promotions, precisely intended to steer clients from one brand to another. DP offers the possibility to address cannibalization in the presence of promotions. Instead of merely “forecasting promotions”, DP offers the possibility to optimize promotions for what they are: profitable deals jointly operated by both the distribution channel and the brand.

Beyond the Artificial Intelligence hype

Artificial Intelligence (AI) has certainly been the tech buzzword of 2018 and the buzz is still going strong in 2019. However, while Lokad is extensively using techniques that usually do qualify for the AI buzzword - e.g. deep learning - we have been reluctant to put any emphasis on the “AI” part of Lokad’s technology. Indeed, as far as quantitative supply chain optimization is concerned, packaged AI simply does not work. Supply chains are nothing like, say, computer vision: data, metrics and tasks are all extremely heterogeneous. As a result, companies that bought supposedly “turnkey” AI solutions are starting to realize that those solutions simply won’t ever work, except maybe in the simplest situations where “dumb” rule-based systems would have also worked just fine anyway.

At their core, supply chains are complex, man-made systems, and it’s usually unreasonable to expect that the AI system - based on data alone - will rediscover on its own fundamental insights about the domain such as:

doing promotions for a luxury brand is a big no-no.
negative sales orders in the ERP are actually product returns.
fresh food products must be transported within specified temperature ranges.
variants in colors might be good clothing substitutes, but not variants in sizes¹.
aircraft maintenance is driven by flight hours and flight cycles.
the sales in the UK are actually in GBP even if the ERP displays EUR as the currency.
people buy car parts for their vehicles, not for themselves.
every diamond is unique, but prices mostly depend on carat, clarity, color and cut.
any NOGO part missing from an aircraft causes the aircraft to be grounded.
many chemical plants take weeks to restart after being turned off.

In a distant future, there might be a time when machine learning succeeds at emulating human intelligence and gets results when facing wicked problems²; so far, however, results have only been obtained on relatively narrow problems. Machine learning technologies are steadily pushing back, every year, the boundaries of what constitutes a “narrow” problem, and after decades of effort, important problems such as safe autonomous driving and decent automated translation are solved, or very close to being solved.

Nevertheless, as illustrated by the list above, supply chains remain desperately too heterogeneous for a direct application of packaged machine learning algorithms. Even if deep learning provides the strongest generalization capabilities to date, it still takes the input of a supply chain scientist to frame the challenge in a way that is narrow enough for algorithms to work at all.

In this respect, deep learning has been tremendously successful because unlike many former approaches in machine learning, deep learning is profoundly compositional: it is possible to extensively tailor the model structure to better learn in a specific situation. Tailoring the model structure is different from tailoring the model input - a task known as feature engineering - which was typically the only option available for most non-deep machine learning algorithms such as random forests³.

However, deep learning frameworks emerged from the “Big Problems” of machine learning, namely computer vision, voice recognition, voice synthesis and automated translation. Those frameworks have been engineered and tuned in depth for scenarios that are literally nothing like the problems faced in supply chains. Thus, while it is possible to leverage those frameworks⁴ for supply chain optimization purposes, it was neither an easy nor a lightweight undertaking.

In conclusion, with deep learning frameworks, much can be achieved for supply chains, but the impedance mismatch between supply chains and existing deep learning frameworks is strong, increasing costs and delays and limiting the real-world applicability of those technologies.

Beyond branch-and-cut optimization

Most problems in supply chains have both a learning angle - caused by an imperfect knowledge of the future, an imperfect knowledge of the present state of the market, and sometimes even an imperfect knowledge of the supply chain system itself (e.g. inventory inaccuracies) - and a numerical optimization angle. Decisions need to be optimized against economic drivers while satisfying many nonlinear constraints (e.g. MOQs while purchasing or batch sizes while producing).

On the numerical optimization front, integer programming and its related techniques such as branch-and-cut have been dominating the field for decades. However, these branch-and-cut algorithms and their associated software solutions mostly failed⁵ to deliver the flexibility and scalability it takes to furnish operational solutions for many, if not most, supply chain challenges. Integer programming is a fantastically capable tool when it comes to solving tight problems with few variables (e.g. component placement within a consumer electronic device) but shows drastic limitations when it comes to large-scale problems where randomness is involved (e.g. rebalancing stocks between 10 million SKUs when facing both probabilistic demand and probabilistic transportation times).

One of the most underappreciated aspects of deep learning is that its success is as much the result of breakthroughs on the learning side as it is the result of breakthroughs on the optimization side: the scientific community has uncovered tremendously efficient algorithms for performing large-scale optimizations⁶.

Not only are these “deep learning” optimization algorithms - all revolving around stochastic gradient descent - vastly more efficient than their branch-and-cut counterparts, but they are also a much better fit for the computing hardware that we have, namely SIMD CPUs and (GP)GPUs, which in practice yields two or three orders of magnitude of extra speed-up.
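To make the update rule concrete, here is a minimal sketch of Adam (the algorithm cited in the footnotes) written in plain Python for a single scalar parameter. The toy objective, the noise level and the names `adam_minimize` and `noisy_grad` are assumptions made purely for illustration; this is not Lokad's implementation.

```python
import math
import random


def adam_minimize(grad_fn, x0, steps=2000, lr=0.05,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a scalar function from a (possibly noisy) gradient oracle."""
    x = x0
    m = 0.0  # exponential moving average of the gradients
    v = 0.0  # exponential moving average of the squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


rng = random.Random(0)


def noisy_grad(x):
    # Gradient of the toy objective (x - 3)^2, observed with zero-mean noise.
    return 2.0 * (x - 3.0) + rng.gauss(0.0, 1.0)


x_star = adam_minimize(noisy_grad, x0=-10.0)
print(round(x_star, 2))  # lands close to the true minimizer, 3
```

Each step only needs one noisy gradient and a handful of multiplications, which is why such methods map so well onto vectorized hardware once the scalar is replaced by millions of parameters.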

These breakthroughs in pure numerical optimization are of high relevance for supply chains in order to optimize decisions. However, if the deep learning frameworks were already somewhat ill-suited to address learning problems in supply chains, they are even less suited to address the optimization problems in supply chains. Indeed, these optimization problems are even more dependent on the expressiveness of the framework in order to let the supply chain scientists implement the constraints and metrics to be respectively enforced and optimized.

Toward Differentiable Programming

In theory, there is no difference between theory and practice. But, in practice, there is. Walter J. Savitch, Pascal: An Introduction to the Art and Science of Programming (1984)

Differentiable Programming (DP) is the answer to bring to supply chains the best of what deep learning has to offer on both the learning front and the numerical optimization front. Through DP, supply chain scientists can make the most of their human insights to craft numerical recipes aligned - in depth - with the business goals.

There is no absolute delimitation between deep learning and differentiable programming: it’s more a continuum from the most scalable systems (deep learning) to the most expressive systems (differentiable programming) with many programming constructs that are gradually becoming available - at the expense of raw scalability - when moving toward differentiable programming.

Yet our experience at Lokad indicates that transitioning from tools dominantly engineered for computer vision to tools engineered for supply chain challenges makes precisely the difference between an “interesting” prototype that never makes it to production, and an industrial-grade system deployed at scale.

	Deep Learning	Differentiable Programming
Primary purpose	Learning	Learning+Optimization
Typical usage	Learn-once, Eval-many	Learn-once, Eval-once
Input granularity	Fat objects (images, voice sequences, lidar scans, full text pages)	Thin objects (products, clients, SKUs, prices)
Input variety	Homogeneous objects (e.g. images all having the same height/width ratio)	Heterogeneous objects (relational tables, graphs, time-series)
Input volume	From megabytes to petabytes	From kilobytes to tens of gigabytes
Hardware acceleration	Exceptionally good	Good
Expressiveness	Static graphs of tensor operations	(Almost) arbitrary programs
Stochastic numerical recipes	Built-in	Idem

The typical usage is a subtle but important point. From the “Big AI” perspective, training time can be (almost) arbitrarily long: it’s OK to have a computational network being trained for weeks if not months. Later, the resulting computational network usually needs to be evaluated in real-time (e.g. pattern recognition for autonomous driving). This angle is completely unlike supply chains, where the best results are obtained by re-training the network every time. Moreover, from a DP perspective, the trained parameters are frequently the very results that we seek to obtain; making the whole real-time evaluation constraint moot.
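The point that the trained parameters can be the decisions themselves can be sketched with a toy joint optimization of a price and a stock level. Everything in this example is a hypothetical assumption chosen for illustration: the linear demand curve, the uniform noise, the unit cost, the starting point and the hand-written gradients. It is a sketch of the idea, not the recipe used by Lokad.

```python
import random

A, B = 100.0, 2.0      # hypothetical demand curve: mean demand = A - B * price
UNIT_COST = 10.0       # hypothetical purchase cost per unit
START_PRICE, START_STOCK = 15.0, 20.0


def profit_and_grad(price, stock, eps):
    """Profit for one demand scenario, with its (sub)gradient in price and stock."""
    base = A - B * price
    demand = base * eps if base > 0 else 0.0
    d_demand_d_price = -B * eps if base > 0 else 0.0
    profit = price * min(stock, demand) - UNIT_COST * stock
    if stock < demand:   # everything in stock gets sold
        return profit, stock, price - UNIT_COST
    # Stock exceeds demand: sales follow demand, extra stock only adds cost.
    return profit, demand + price * d_demand_d_price, -UNIT_COST


def optimize(steps=20000, lr=0.01, seed=0):
    rng = random.Random(seed)
    price, stock = START_PRICE, START_STOCK
    for _ in range(steps):
        eps = rng.uniform(0.5, 1.5)  # one random demand scenario per step
        _, g_price, g_stock = profit_and_grad(price, stock, eps)
        price += lr * g_price        # gradient ascent on profit
        stock = max(stock + lr * g_stock, 0.0)
    return price, stock


def expected_profit(price, stock, n=5000, seed=1):
    rng = random.Random(seed)
    total = sum(profit_and_grad(price, stock, rng.uniform(0.5, 1.5))[0]
                for _ in range(n))
    return total / n


price, stock = optimize()
initial_profit = expected_profit(START_PRICE, START_STOCK)
final_profit = expected_profit(price, stock)
print(round(price, 1), round(stock, 1), round(final_profit))
```

There is no separate evaluation phase here: once the descent stops, `price` and `stock` are the outputs, and the whole computation would simply be run again on fresh data.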

The expectations surrounding the data inputs, in granularity, variety and volume, are also widely different. Typically, the “Big AI” perspective emphasizes near infinite amounts of training data (e.g. all the text pages of the web) where the prime challenge is to find tremendously scalable methods that can effectively tap into those massive datasets. In contrast, supply chain problems have to be addressed with a limited amount of highly structured yet diverse data.

This steers deep learning toward tensor-based frameworks, which can be massively accelerated through dedicated computing hardware, initially GPUs and now increasingly TPUs. Differentiable Programming, being based on stochastic gradient descent, also exhibits many good properties for hardware acceleration, but to a reduced degree when compared to static graphs of tensor operations.
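The contrast between static graphs of tensor operations and (almost) arbitrary programs can be illustrated with a small sketch of forward-mode automatic differentiation using dual numbers. The choice of forward mode, the `Dual` class and the toy cost function are assumptions made for illustration only; the source does not describe how Lokad differentiates programs.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __neg__(self):
        return Dual(-self.val, -self.der)

    def __mul__(self, other):
        other = lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def lift(x):
    """Treat plain numbers as constants (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def stock_cost(order_up_to, demands, holding=1.0, penalty=4.0):
    """Ordinary Python with a loop and a branch, evaluated on Dual numbers."""
    cost = Dual(0.0)
    for d in demands:
        leftover = order_up_to - d
        if leftover.val >= 0:          # branch on the current value
            cost = cost + holding * leftover
        else:                          # shortage: pay the penalty instead
            cost = cost + penalty * (-leftover)
    return cost


# Seed the derivative with 1.0 to differentiate with respect to the level.
result = stock_cost(Dual(6.0, 1.0), demands=[3, 7, 5, 9, 4])
print(result.val, result.der)  # 22.0 -5.0
```

The negative derivative says that raising the level from 6 would lower the cost on these five scenarios, which is exactly the signal a gradient descent needs, and it was obtained without rewriting the loop or the branch as tensor operations.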

The importance of the stochastic numerical recipes is twofold. First, these recipes play an important role from a learning perspective. Variational auto-encoders or dropouts are examples of such numerical recipes. Second, these recipes also play an important role from a modeling perspective in order to properly factor probabilistic behaviors within the supply chain systems (e.g. varying lead times).
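The modeling role of stochastic recipes can be sketched by sampling a varying lead time inside the objective itself. In the example below, the lead-time range, the daily demand distribution, the two cost coefficients and the step-size schedule are all hypothetical assumptions; the only established fact relied upon is the classic newsvendor result that the optimal level sits at the quantile penalty / (penalty + holding) of the demand over the lead time.

```python
import random

HOLDING = 1.0   # hypothetical cost per unit left over
PENALTY = 9.0   # hypothetical cost per unit short


def sample_lead_time_demand(rng):
    """One scenario: draw a lead time, then the daily demands over it."""
    lead_time_days = rng.randint(2, 6)
    return sum(rng.randint(0, 10) for _ in range(lead_time_days))


def fit_order_up_to(steps=20000, seed=0):
    rng = random.Random(seed)
    level = 0.0
    for t in range(1, steps + 1):
        demand = sample_lead_time_demand(rng)
        # Subgradient of HOLDING * max(level - demand, 0)
        #              + PENALTY * max(demand - level, 0)
        grad = HOLDING if level > demand else -PENALTY
        level -= (2.0 / t ** 0.5) * grad   # decaying step size
    return level


def service_level(level, n=20000, seed=1):
    """Share of sampled scenarios fully covered by the given level."""
    rng = random.Random(seed)
    return sum(sample_lead_time_demand(rng) <= level for _ in range(n)) / n


level = fit_order_up_to()
achieved = service_level(level)
target = PENALTY / (PENALTY + HOLDING)
print(round(level, 1), round(achieved, 3), target)
```

The lead-time uncertainty never has to be summarized into a formula: it is simply sampled at each step, and the descent settles on a level whose achieved service rate is close to the target ratio.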

Conversely, there is a huge gap between differentiable programming and mixed integer programming - the dominant approach over the last few decades for performing complex numerical optimizations.

	Mixed integer Programming	Differentiable Programming
Primary purpose	Optimization	Learning+Optimization
Input granularity and variety	Thin objects, heterogeneous	Idem
Input volume	From bytes to tens of megabytes	From kilobytes to tens of gigabytes
Hardware acceleration	Poor	Good
Expressiveness	Inequalities over linear and quadratic forms	(Almost) arbitrary programs
Stochastic numerical recipes	None	Built-in

In defense of mixed integer programming tools, those tools - when they succeed at tackling a problem - can sometimes prove - in the mathematical sense - that they have obtained the optimal solution. Neither deep learning nor differentiable programming provides any formal proof in this regard.

Conclusions

Differentiable Programming is a major breakthrough for supply chains. It is built on top of deep learning, which proved tremendously successful in solving many “Big AI” problems such as computer vision, but re-engineered at the core to be suitable for real-world challenges as faced by real-world supply chains.

Lokad has been building upon its deep learning forecasting technology to transition toward Differentiable Programming, which is the next generation of our predictive technology. However, DP is more than just being predictive: it unifies optimization and learning, unlocking solutions for a vast number of problems which had no viable solutions before.

Interested in taking part in our private beta? Drop us an email at contact@lokad.com

¹ Anecdotally, there is one segment where clothing sizes don’t matter that much, and still act as good substitutes: children’s clothes. Indeed, parents on a tight budget frequently buy several sizes ahead for their children, anticipating their growth, when presented with a strong price discount.
² Many, if not most, supply chain problems are “wicked”, as in problems whose social complexity means that they have no determinable stopping point. See wicked problem on Wikipedia.
³ Random Forests offer a few, tiny options for a meta-parameterization of the model itself: maximal depth of the tree, sampling ratio. This means that whatever patterns the random forest fails to capture have to be feature engineered for lack of better options. See also Columnar Random Forests.
⁴ The fact that deep learning frameworks work at all for supply chain purposes is a testament to the tremendous power behind these breakthroughs whose uncovered principles extend far beyond the original scope of study that gave birth to these frameworks. See also Deep Learning at Lokad.
⁵ Optimization software such as CPLEX, Gurobi and now their open-source equivalents have been available for more than three decades. In theory, every single purchasing situation facing MOQs (minimal order quantities) or price breaks should have been addressed with software delivering similar capabilities. Yet, while I’ve had the opportunity to be in contact with over 100 companies over the last decade in many sectors, I have never seen any purchasing department using any of these tools anywhere. My own experience with such tools indicates that few problems actually fit into a set of static inequalities including only linear and quadratic forms.
⁶ The algorithm Adam (2015) is probably the best representative of those simple-yet-tremendously efficient optimization algorithms that have made the whole machine learning field leap forward.
