Differentiable Programming as in 'AI' that works
We are proud to announce the immediate availability of the Lokad private beta for differentiable programming intended for quantitative supply chain optimization. Differentiable programming is the descendant of deep learning, and represents the convergence of two algorithmic fields: machine learning and numerical optimization.
Differentiable programming unlocks a series of supply chain scenarios that were seen as largely intractable: joint optimization of prices and stocks, loyalty-driven assortment optimization, forecasting demand for non-standard products (e.g. precious stones, artworks), large-scale multi-echelon flow optimization, many-channel joint optimization, stock optimization under partially incorrect electronic stock values, large-scale flow maximization under many constraints, etc. For many other scenarios that were already approachable with alternative methods, differentiable programming delivers superior numerical results with only a fraction of the overhead, both in terms of data scientists’ efforts and computational resources.
Application to Supply Chains
At its core, Differentiable Programming (DP) offers a path to unify problems that have remained disconnected for too long and to resolve them jointly: assortment, pricing, forecasting, planning, merchandising. While such unification may seem unrealistically ambitious, the reality is that companies are already applying an insane amount of duct tape to their own processes to cope with the endless problems generated by the fact that those challenges have been siloed within the organization in the first place. For example, pricing obviously impacts demand, and yet both planning and forecasting are nearly always performed while ignoring prices altogether.
DP unlocks the massive opportunity to deliver approximately correct decisions from a holistic perspective on the business, as opposed to being exactly wrong while displacing problems within the organization instead of resolving them. Anecdotally, seeking approximate correctness while taking into account the business as a whole is exactly what most spreadsheet-driven organizations are about; and for lack of a technology, like DP, capable of embracing a whole-business perspective, spreadsheets remain the least terrible option.
E-commerce: Being able to attach 100% of the units sold to known clients represents a massive latent amount of information about the market; yet, when it comes to pricing and inventory optimization, the loyalty information is usually not even used nowadays. DP offers the possibility to move from time-series forecasting to temporal graph forecasting where every single client-product pair ever observed matters, leading to smarter decisions both for stocks and prices.
Luxury brands: the optimization of pricing and assortments, down to the store level, has long been considered largely intractable due to the sparsity of the data, that is, the very low sales volume per item per store (as low as one unit sold per product per store per year). DP provides angles to deliver classes of solutions that work on such ultra-sparse data, because they are engineered to deliver a much greater data efficiency than regular deep learning methods.
Fashion brands: the joint optimization of stocks and prices is a clear requirement, as the demand for many articles can be highly price-sensitive. Yet the joint optimization of both purchasing and pricing could not be achieved due to the lack of a tool capable of even apprehending this coupling, i.e. the capacity to purchase more at a lower price generates more demand (and vice versa). DP provides the expressiveness to tackle this challenge.
Manufacturing: the numerical optimization of sizeable multi-echelon networks falls apart when attempted with classic numerical solvers (see “Beyond branch-and-cut optimization” below). Indeed, those solvers become largely impractical when dealing with either millions of variables or stochastic behaviors. Unfortunately, manufacturing exhibits both many variables and stochastic behaviors. DP offers a practical angle to cope with multi-echelon networks without betraying the complexity of the actual flow patterns within the network.
MRO (maintenance, repair, overhaul): If one part needed for the repair of a system is missing, then the whole system, which might be an aircraft, stays down. Probabilistic forecasts are the first step to deal with such erratic and intermittent demand patterns, but figuring out the fine print of the co-occurrences of the parts required and turning this analysis into actionable inventory recommendations was too complex to be of practical use. DP streamlines the resolution of such problems.
Retail networks: Cannibalizations within retail networks, usually between products but sometimes between stores, have long been recognized as being of primary importance. This problem is amplified by promotions, precisely intended to steer clients from one brand to another. DP offers the possibility to address cannibalization in the presence of promotions. Instead of merely “forecasting promotions”, DP offers the possibility to optimize promotions for what they are: profitable deals jointly operated by both the distribution channel and the brand.
Beyond the Artificial Intelligence hype
Artificial Intelligence (AI) has certainly been the tech buzzword of 2018 and the buzz is still going strong in 2019. However, while Lokad is extensively using techniques that usually do qualify for the AI buzzword (e.g. deep learning), we have been reluctant to put any emphasis on the “AI” part of Lokad’s technology. Indeed, as far as quantitative supply chain optimization is concerned, packaged AI simply does not work. Supply chains are nothing like, say, computer vision: data, metrics and tasks are all extremely heterogeneous. As a result, companies who bought supposedly “turnkey” AI solutions are starting to realize that those solutions simply won’t ever work, except maybe in the simplest situations where “dumb” rule-based systems would have also worked just fine anyway.
At their core, supply chains are complex, man-made systems, and it’s usually unreasonable to expect that the AI system, based on data alone, will rediscover on its own fundamental insights about the domain such as:
 doing promotions for a luxury brand is a big no-no.
 negative sales orders in the ERP are actually product returns.
 fresh food products must be transported within specified temperature ranges.
 variants in colors might be good clothing substitutes, but not variants in sizes^{1}.
 aircraft maintenance is driven by flight hours and flight cycles.
 the sales in the UK are actually in GBP even if the ERP displays EUR as the currency.
 people buy car parts for their vehicles, not for themselves.
 every diamond is unique, but prices mostly depend on carat, clarity, color and cut.
 any NO-GO part missing from an aircraft causes the aircraft to be grounded.
 many chemical plants take weeks to restart after being turned off.
In a distant future, there might be a time when machine learning succeeds at emulating human intelligence and gets results when facing wicked problems^{2}. So far, however, results have only been obtained on relatively narrow problems. Machine learning technologies are steadily pushing back, every year, the boundaries of what constitutes a “narrow” problem, and after decades of effort, important problems such as safe autonomous driving and decent automated translation are solved, or very close to getting solved.
Nevertheless, as illustrated by the list above, supply chains remain desperately too heterogeneous for a direct application of packaged machine learning algorithms. Even if deep learning provides the strongest generalization capabilities to date, it still takes the input of a supply chain scientist to frame the challenge in a way that is narrow enough for algorithms to work at all.
In this respect, deep learning has been tremendously successful because, unlike many former approaches in machine learning, deep learning is profoundly compositional: it is possible to extensively tailor the model structure to better learn in a specific situation. Tailoring the model structure is different from tailoring the model input, a task known as feature engineering, which was typically the only option available for most non-deep machine learning algorithms such as random forests^{3}.
However, deep learning frameworks emerged from the “Big Problems” of machine learning, namely computer vision, voice recognition, voice synthesis and automated translation. Those frameworks have been engineered and tuned in depth for scenarios that are literally nothing like the problems faced in supply chains. Thus, while it is possible to leverage those frameworks^{4} for supply chain optimization purposes, it was neither an easy nor a lightweight undertaking.
In conclusion, with deep learning frameworks, much can be achieved for supply chains, but the impedance mismatch between supply chains and existing deep learning frameworks is strong, which increases costs and delays and limits the real-world applicability of those technologies.
Beyond branch-and-cut optimization
Most problems in supply chains have both a learning angle, caused by an imperfect knowledge of the future, an imperfect knowledge of the present state of the market, and sometimes even an imperfect knowledge of the supply chain system itself (e.g. inventory inaccuracies), and a numerical optimization angle. Decisions need to be optimized against economic drivers while satisfying many non-linear constraints (e.g. minimum order quantities, or MOQs, while purchasing, or batch sizes while producing).
On the numerical optimization front, integer programming and its related techniques such as branch-and-cut have been dominating the field for decades. However, these branch-and-cut algorithms and their associated software solutions mostly failed^{5} to deliver the flexibility and scalability it takes to furnish operational solutions for many, if not most, supply chain challenges. Integer programming is a fantastically capable tool when it comes to solving tight problems with few variables (e.g. component placement within a consumer electronic device) but shows drastic limitations when it comes to large-scale problems where randomness is involved (e.g. rebalancing stocks between 10 million SKUs when facing both probabilistic demand and probabilistic transportation times).
One of the most underappreciated aspects of deep learning is that its success is as much the result of breakthroughs on the learning side as it is the result of breakthroughs on the optimization side: the scientific community has uncovered tremendously efficient algorithms for performing large-scale optimizations^{6}.
Not only are these “deep learning” optimization algorithms, all revolving around stochastic gradient descent, vastly more efficient than their branch-and-cut counterparts, but they are also a much better fit for the computing hardware that we have, namely SIMD CPUs and (GP)GPUs, which in practice yields two or three orders of magnitude of extra speed-up.
These breakthroughs in pure numerical optimization are of high relevance for supply chains in order to optimize decisions. However, if the deep learning frameworks were already somewhat ill-suited to address learning problems in supply chains, they are even less suited to address the optimization problems in supply chains. Indeed, these optimization problems are even more dependent on the expressiveness of the framework in order to let the supply chain scientists implement the constraints and metrics to be respectively enforced and optimized.
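As a rough illustration of how stochastic gradient descent can optimize a decision directly against an economic driver under uncertainty, here is a minimal sketch in plain Python. It is not Lokad's implementation: the single-period ordering setup, the function `newsvendor_sgd`, the uniform demand model and all numeric values are assumptions made for this example only.

```python
import random


def newsvendor_sgd(sample_demand, unit_margin, unit_overstock_cost,
                   steps=20_000, learning_rate=0.05, seed=0):
    """Learn an order quantity by SGD on sampled demand scenarios.

    Hypothetical per-scenario cost being minimized:
        unit_overstock_cost * max(quantity - demand, 0)
      + unit_margin         * max(demand - quantity, 0)
    """
    rng = random.Random(seed)
    quantity = 0.0
    averaged = 0.0
    tail = steps // 2  # average the second half of the iterates to damp noise
    for step in range(steps):
        demand = sample_demand(rng)
        # Subgradient of the cost for this single sampled scenario:
        # one more unit costs `unit_overstock_cost` if it is left over,
        # and recovers `unit_margin` if it would have been sold.
        gradient = unit_overstock_cost if demand <= quantity else -unit_margin
        quantity -= learning_rate * gradient
        if step >= tail:
            averaged += quantity / (steps - tail)
    return averaged


def uniform_demand(rng):
    # Assumed demand model for illustration: uniform between 0 and 100 units.
    return rng.uniform(0.0, 100.0)


order_quantity = newsvendor_sgd(uniform_demand,
                                unit_margin=3.0,
                                unit_overstock_cost=1.0)
# In theory the optimum is the 3 / (3 + 1) = 75% demand quantile,
# i.e. 75 units for this demand model; the estimate should land near it.
print(round(order_quantity, 1))
```

Note that the learned parameter is the decision itself, and that swapping the cost function or the demand sampler requires no change to the optimization loop, which is the kind of flexibility that static inequalities over linear and quadratic forms struggle to offer.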
Toward Differentiable Programming
In theory, there is no difference between theory and practice. But, in practice, there is. Walter J. Savitch, Pascal: An Introduction to the Art and Science of Programming (1984)
Differentiable Programming (DP) is the answer to bring to supply chains the best of what deep learning has to offer on both the learning front and the numerical optimization front. Through DP, supply chain scientists can make the most of their human insights to craft numerical recipes aligned, in depth, with the business goals.
There is no absolute delimitation between deep learning and differentiable programming: it’s more a continuum from the most scalable systems (deep learning) to the most expressive systems (differentiable programming), with many programming constructs that are gradually becoming available, at the expense of raw scalability, when moving toward differentiable programming.
Yet our experience at Lokad indicates that transitioning from tools dominantly engineered for computer vision to tools engineered for supply chain challenges makes precisely the difference between an “interesting” prototype that never makes it to production, and an industrial-grade system deployed at scale.
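To make the notion of differentiating an ordinary program concrete, the sketch below pushes a derivative through a small inventory simulation that contains a loop and a branch. It uses forward-mode automatic differentiation with dual numbers, chosen here only because it fits in a few lines; it is not a description of how Lokad's technology works, and the `Dual` class, the `total_cost` function and the figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    value: float
    deriv: float = 0.0

    def __add__(self, other):
        other = lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return lift(other) - self

    def __mul__(self, other):
        other = lift(other)
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__

    def __lt__(self, other):
        # Comparisons only look at the value; they merely select a branch.
        return self.value < lift(other).value


def lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def total_cost(order_qty, demands, holding_cost=1.0, stockout_cost=4.0):
    """Toy simulation: the same quantity is received at every period."""
    stock = 0.0
    cost = 0.0
    for demand in demands:
        stock = stock + order_qty
        if stock < demand:  # stockout: unmet demand is lost and penalized
            cost = cost + stockout_cost * (demand - stock)
            stock = 0.0
        else:               # leftover stock is carried over and incurs holding
            cost = cost + holding_cost * (stock - demand)
            stock = stock - demand
    return cost


# Seed the derivative with 1.0 to differentiate with respect to order_qty.
result = total_cost(Dual(5.0, 1.0), demands=[3, 8, 4])
print(result.value, result.deriv)  # 7.0 -6.0
```

The derivative of -6 indicates that, around 5 units, ordering slightly more per period reduces the simulated cost, which is exactly the signal a gradient descent step would exploit. The simulation itself was written as a regular program rather than as a static graph of tensor operations.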
| | Deep Learning | Differentiable Programming |
|---|---|---|
| Primary purpose | Learning | Learning+Optimization |
| Typical usage | Learn-once, Eval-many | Learn-once, Eval-once |
| Input granularity | Fat objects (images, voice sequences, lidar scans, full text pages) | Thin objects (products, clients, SKUs, prices) |
| Input variety | Homogeneous objects (e.g. images all having the same height/width ratio) | Heterogeneous objects (relational tables, graphs, time-series) |
| Input volume | From megabytes to petabytes | From kilobytes to tens of gigabytes |
| Hardware acceleration | Exceptionally good | Good |
| Expressiveness | Static graphs of tensor operations | (Almost) arbitrary programs |
| Stochastic numerical recipes | Built-in | Idem |
The typical usage is a subtle but important point. From the “Big AI” perspective, training time can be (almost) arbitrarily long: it’s OK to have a computational network being trained for weeks if not months. Later, the resulting computational network usually needs to be evaluated in real time (e.g. pattern recognition for autonomous driving). This angle is completely unlike supply chains, where the best results are obtained by retraining the network every time. Moreover, from a DP perspective, the trained parameters are frequently the very results that we seek to obtain, making the whole real-time evaluation constraint moot.
The expectations surrounding the data inputs, in granularity, variety and volume, are also widely different. Typically, the “Big AI” perspective emphasizes near-infinite amounts of training data (e.g. all the text pages of the web) where the prime challenge is to find tremendously scalable methods that can effectively tap into those massive datasets. In contrast, supply chain problems have to be addressed with a limited amount of highly structured yet diverse data.
This steers deep learning toward tensor-based frameworks, which can be massively accelerated through dedicated computing hardware, initially GPUs and now increasingly TPUs. Differentiable Programming, being based on stochastic gradient descent, also exhibits many good properties for hardware acceleration, but to a reduced degree when compared to static graphs of tensor operations.
The importance of the stochastic numerical recipes is twofold. First, these recipes play an important role from a learning perspective. Variational autoencoders or dropouts are examples of such numerical recipes. Second, these recipes also play an important role from a modeling perspective in order to properly factor probabilistic behaviors within the supply chain systems (e.g. varying lead times).
Conversely, there is a huge gap between differentiable programming and mixed integer programming, which has been the dominant approach over the last few decades for performing complex numerical optimizations.
| | Mixed Integer Programming | Differentiable Programming |
|---|---|---|
| Primary purpose | Optimization | Learning+Optimization |
| Input granularity and variety | Thin objects, heterogeneous | Idem |
| Input volume | From bytes to tens of megabytes | From kilobytes to tens of gigabytes |
| Hardware acceleration | Poor | Good |
| Expressiveness | Inequalities over linear and quadratic forms | (Almost) arbitrary programs |
| Stochastic numerical recipes | None | Built-in |
In defense of mixed integer programming tools, those tools, when they succeed at tackling a problem, can sometimes prove (in the mathematical sense) that they have obtained the optimal solution. Neither deep learning nor differentiable programming provides any formal proof in this regard.
Conclusions
Differentiable Programming is a major breakthrough for supply chains. It is built on top of deep learning, which proved tremendously successful in solving many “Big AI” problems such as computer vision, but re-engineered at the core to be suitable for real-world challenges as faced by real-world supply chains.
Lokad has been building upon its deep learning forecasting technology to transition toward Differentiable Programming, which is the next generation of our predictive technology. However, DP is more than just predictive: it unifies optimization and learning, unlocking solutions for a vast number of problems that had no viable solutions before.
Interested in taking part in our private beta? Drop us an email at contact@lokad.com

1. Anecdotally, there is one segment where clothing sizes don’t matter that much and still act as good substitutes: children’s clothes. Indeed, parents on a tight budget frequently buy several sizes ahead for their children, anticipating their growth, when presented with a strong price discount.

2. Many, if not most, supply chain problems are “wicked”, as in problems whose social complexity means that they have no determinable stopping point. See wicked problem on Wikipedia.

3. Random Forests offer a few, tiny options for a meta-parameterization of the model itself: maximal depth of the tree, sampling ratio. This means that whatever patterns the random forest fails to capture have to be feature engineered for lack of better options. See also Columnar Random Forests.

4. The fact that deep learning frameworks work at all for supply chain purposes is a testament to the tremendous power behind these breakthroughs, whose uncovered principles extend far beyond the original scope of study that gave birth to these frameworks. See also Deep Learning at Lokad.

5. Optimization software such as CPLEX, Gurobi and now their open-source equivalents have been available for more than three decades. In theory, every single purchasing situation facing MOQs (minimal order quantities) or price breaks should have been addressed with software delivering similar capabilities. Yet, while I’ve had the opportunity to be in contact with over 100 companies over the last decade in many sectors, I have never seen any purchasing department using any of these tools anywhere. My own experience with such tools indicates that few problems actually fit into a set of static inequalities including only linear and quadratic forms.

6. The algorithm Adam (2015) is probably the best representative of those simple-yet-tremendously-efficient optimization algorithms that have made the whole machine learning field leap forward.