**machine learning and numerical optimization**. Differentiable programming unlocks a series of supply chain scenarios that were seen as largely intractable: joint optimization of prices and stocks, loyalty-driven assortment optimization, forecasting demand for non-standard products (e.g. precious stones, artworks), large scale multi-echelon flow optimization, many-channel joint optimization, stock optimization under partially incorrect electronic stock values, large scale flow maximization under many constraints, etc. For many other scenarios that were already approachable with alternative methods, differentiable programming delivers superior numerical results with only a fraction of the overhead, both in terms of data scientists’ efforts and computational resources.

## Application to Supply Chains

At its core, Differentiable Programming (DP) offers a path to unify problems that have remained disconnected for too long and to resolve them jointly: assortment, pricing, forecasting, planning, merchandising. While such unification may seem unrealistically ambitious, the reality is that companies are already applying an insane amount of duct-tape to their own processes to cope with the endless problems generated by the fact that those challenges have been siloed within the organization in the first place. For example, pricing obviously impacts demand, and yet both planning and forecasting are nearly always performed while ignoring prices altogether.

DP unlocks the massive opportunity to deliver approximately correct decisions from a holistic perspective on the business, as opposed to being exactly wrong while displacing problems within the organization instead of resolving them.

Anecdotally, seeking approximate correctness while taking into account the business as a whole is exactly what most spreadsheet-driven organizations are about; and for lack of a technology - like DP - capable of embracing a whole-business perspective, spreadsheets remain the least terrible option.

### Ecommerce

Being able to attach 100% of the units sold to known clients represents a massive latent amount of information about the market; yet, when it comes to pricing and inventory optimization, the loyalty information is usually not even used nowadays. DP offers the possibility to move from time-series forecasting to temporal graph forecasting where every single client-product pair ever observed matters, leading to smarter decisions for both stocks and prices.

### Luxury brands

The optimization of pricing and assortments - down to the store level - has long been considered largely intractable due to the sparsity of the data, that is, the very low sales volume per item per store - as low as one unit sold per product per store per year. DP provides angles to deliver classes of solutions that work on such ultra-sparse data, because they are engineered to deliver a much greater data efficiency than regular deep learning methods.

### Fashion brands

The joint optimization of stocks and prices is a clear requirement - as the demand for many articles can be highly price-sensitive - yet, the joint optimization of both purchasing and pricing could not be achieved due to the lack of a tool capable of even apprehending this coupling - i.e. the capacity to purchase more at a lower price generates more demand (and vice-versa). DP provides the expressiveness to tackle this challenge.
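As a rough illustration of this coupling, the sketch below jointly adjusts a selling price and a purchase quantity by stochastic gradient ascent on simulated profit. Everything in it is an assumption made for the example rather than Lokad's method: the exponential demand curve, the parameters `BASE_DEMAND`, `PRICE_SENS` and `UNIT_COST`, the uniform demand noise, and the step sizes. The gradients are derived by hand to keep the code dependency-free; a differentiable programming stack would be expected to obtain them automatically.

```python
import math
import random

# Hypothetical model parameters (illustrative only, not from the article).
BASE_DEMAND = 1000.0   # demand scale as the price tends to zero
PRICE_SENS = 0.05      # exponential price sensitivity
UNIT_COST = 20.0       # purchase cost per unit


def sample_demand(price, noise):
    """Price-sensitive demand, scaled by a multiplicative noise factor."""
    return BASE_DEMAND * math.exp(-PRICE_SENS * price) * noise


def profit(price, stock, noise):
    """Revenue on units actually sold minus the cost of all purchased units."""
    sold = min(stock, sample_demand(price, noise))
    return price * sold - UNIT_COST * stock


def profit_gradient(price, stock, noise):
    """Hand-derived (sub)gradient of profit with respect to price and stock."""
    demand = sample_demand(price, noise)
    if demand < stock:
        # Demand-limited: sold = demand, and d(demand)/d(price) = -PRICE_SENS * demand.
        d_price = demand * (1.0 - PRICE_SENS * price)
        d_stock = -UNIT_COST
    else:
        # Stock-limited: sold = stock.
        d_price = stock
        d_stock = price - UNIT_COST
    return d_price, d_stock


def optimize(price=30.0, stock=100.0, steps=20000, seed=0):
    """Stochastic gradient ascent on both decisions at once."""
    rng = random.Random(seed)
    for t in range(1, steps + 1):
        noise = rng.uniform(0.5, 1.5)          # assumed demand noise
        d_price, d_stock = profit_gradient(price, stock, noise)
        scale = 1.0 / math.sqrt(t)             # decaying step size
        price += 0.01 * scale * d_price
        stock += 0.5 * scale * d_stock
    return price, stock


def expected_profit(price, stock, samples=20000, seed=123):
    """Monte Carlo estimate of the expected profit for a given decision."""
    rng = random.Random(seed)
    total = sum(profit(price, stock, rng.uniform(0.5, 1.5)) for _ in range(samples))
    return total / samples
```

Calling `optimize()` and comparing `expected_profit` at the starting point and at the returned point indicates whether the joint update improved the simulated profit under these assumed parameters.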

### Manufacturing

The numerical optimization of sizeable multi-echelon networks falls apart when attempted with classic numerical solvers (see “Beyond branch-and-cut optimization” below). Indeed, those solvers become largely impractical when dealing with either millions of variables or stochastic behaviors. Unfortunately, manufacturing exhibits both many variables and stochastic behaviors. DP offers a practical angle to cope with multi-echelon networks without betraying the complexity of the actual flow patterns within the network.

### MRO (maintenance, repair, overhaul)

If one part needed for the repair of a system is missing then the whole system - which might be an aircraft - stays down. Probabilistic forecasts are the first step to deal with such erratic and intermittent demand patterns, but figuring out the fine print of the co-occurrences of the parts required and turning this analysis into actionable inventory recommendations was too complex to be of practical use. DP streamlines the resolution of such problems.

### Retail networks

Cannibalizations within retail networks - usually between products but sometimes between stores - have been recognized as of primary importance for a long time. This problem is amplified by promotions, precisely intended to steer clients from one brand to another. DP offers the possibility to address cannibalization in the presence of promotions. Instead of merely “forecasting promotions”, DP offers the possibility to optimize promotions for what they are: profitable deals jointly operated by both the distribution channel and the brand.

## Beyond the Artificial Intelligence hype

Artificial Intelligence (AI) has certainly been the tech buzzword of 2018 and the buzz is still going strong in 2019. However, while Lokad is extensively using techniques that usually do qualify for the AI buzzword - e.g. deep learning - we have been reluctant to put any emphasis on the “AI” part of Lokad’s technology. Indeed, as far as quantitative supply chain optimization is concerned, packaged AI simply does not work. Supply chains are nothing like, say, computer vision: data, metrics and tasks are all extremely heterogeneous. As a result, companies who bought supposedly “turnkey” AI solutions are starting to realize that those solutions simply won’t ever work, except maybe in the simplest situations where “dumb” rule-based systems would have also worked just fine anyway. At their core, supply chains are complex, man-made systems, and it’s usually unreasonable to expect that the AI system - based on data alone - will rediscover on its own fundamental insights about the domain such as:

- doing promotions for a luxury brand is a big no-no.
- negative sales orders in the ERP are actually product returns.
- fresh food products must be transported within specified temperature ranges.
- variants in colors might be good clothing substitutes, but not variants in sizes.
- aircraft maintenance is driven by flight hours and flight cycles.
- the sales in the UK are actually in GBP even if the ERP displays EUR as the currency.
- people buy car parts for their vehicles, not for themselves.
- every diamond is unique, but prices mostly depend on carat, clarity, color and cut.
- any NOGO part missing from an aircraft causes the aircraft to be grounded.
- many chemical plants take weeks to restart after being turned off.

In this respect, deep learning has been tremendously successful because unlike many former approaches in machine learning, deep learning is profoundly compositional: it is possible to extensively tailor the model structure to better learn in a specific situation. Tailoring the model structure is different from tailoring the model input - a task known as feature engineering - which was typically the only option available for most non-deep machine learning algorithms such as random forests.

However, deep learning frameworks emerged from the “Big Problems” of machine learning, namely computer vision, voice recognition, voice synthesis and automated translation. Those frameworks have been engineered and tuned in-depth for scenarios that are literally nothing like the problems faced in supply chains. Thus, while it is possible to leverage those frameworks for supply chain optimization purposes, it is neither an easy nor a lightweight undertaking. In conclusion, with deep learning frameworks, much can be achieved for supply chains, but the impedance mismatch between supply chains and existing deep learning frameworks is strong, which increases the costs and delays and limits the real-world applicability of those technologies.

## Beyond branch-and-cut optimization

Most problems in supply chains have both a learning angle - caused by an imperfect knowledge of the future, an imperfect knowledge of the present state of the market, and sometimes even an imperfect knowledge of the supply chain system itself (e.g. inventory inaccuracies) - but also a numerical optimization angle. Decisions need to be optimized against economic drivers while satisfying many nonlinear constraints (e.g. MOQs while purchasing or batch sizes while producing).

On the numerical optimization front, integer programming and its related techniques such as branch-and-cut have been dominating the field for decades. However, these branch-and-cut algorithms and their associated software solutions mostly failed to deliver the flexibility and scalability it takes to furnish operational solutions for many, if not most, supply chain challenges. Integer programming is a fantastically capable tool when it comes to solving tight problems with few variables (e.g. component placement within a consumer electronic device) but shows drastic limitations when it comes to large scale problems where randomness is involved (e.g. rebalancing stocks between 10 million SKUs when facing both probabilistic demand and probabilistic transportation times).

One of the most underappreciated aspects of deep learning is that its success is as much the result of breakthroughs on the learning side as it is the result of breakthroughs on the optimization side: the scientific community has uncovered tremendously efficient algorithms for performing large scale optimizations. Not only are these “deep learning” optimization algorithms - all revolving around stochastic gradient descent - vastly more efficient than their branch-and-cut counterparts, but they are also a much better fit for the computing hardware that we have, namely SIMD CPUs and (GP)GPUs, which in practice yields two or three orders of magnitude of extra speed-up. These breakthroughs in pure numerical optimization are of high relevance for supply chains in order to optimize decisions. However, if the deep learning frameworks were already somewhat ill-suited to address learning problems in supply chains, they are even less suited to address the optimization problems in supply chains. Indeed, these optimization problems are even more dependent on the expressiveness of the framework in order to let the supply chain scientists implement the constraints and metrics to be respectively enforced and optimized.
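To make the idea concrete, here is a minimal sketch of stochastic gradient descent used as a pure optimizer for a stocking decision under random demand. The cost structure (`HOLDING_COST`, `STOCKOUT_COST`), the uniform demand on [0, 100] and the step-size schedule are hypothetical choices for illustration; the article does not prescribe them.

```python
import math
import random

HOLDING_COST = 1.0    # hypothetical cost per unsold unit
STOCKOUT_COST = 9.0   # hypothetical cost per unit of unmet demand


def cost_gradient(stock, demand):
    """Subgradient, with respect to stock, of the cost for one demand sample.

    cost = HOLDING_COST * max(stock - demand, 0)
         + STOCKOUT_COST * max(demand - stock, 0)
    """
    return HOLDING_COST if stock > demand else -STOCKOUT_COST


def sgd_stock_level(steps=20000, seed=0):
    """Minimize the expected cost by following one noisy gradient per step."""
    rng = random.Random(seed)
    stock = 0.0
    for t in range(1, steps + 1):
        demand = rng.uniform(0.0, 100.0)       # one sampled demand scenario
        step_size = 2.0 / math.sqrt(t)         # decaying step size
        stock -= step_size * cost_gradient(stock, demand)
    return stock
```

Under these assumptions the expected cost is minimized at the demand quantile `STOCKOUT_COST / (STOCKOUT_COST + HOLDING_COST)`, that is a stock level of 90, and the iterate returned by `sgd_stock_level()` should settle close to that value. Each step only needs one sampled scenario, which is what lets this family of methods cope with randomness at scale.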

## Toward Differentiable Programming

“In theory, there is no difference between theory and practice. But, in practice, there is.”

Walter J. Savitch, *Pascal: An Introduction to the Art and Science of Programming* (1984)

Differentiable Programming (DP) is the answer to bring to supply chains the best of what deep learning has to offer on both the learning front and the numerical optimization front. Through DP, supply chain scientists can make the most of their human insights to craft numerical recipes aligned - in depth - with the business goals. There is no absolute delimitation between deep learning and differentiable programming: it’s more a continuum from the most scalable systems (deep learning) to the most expressive systems (differentiable programming) with many programming constructs that are gradually becoming available - at the expense of raw scalability - when moving toward differentiable programming. Yet our experience at Lokad indicates that transitioning from tools dominantly engineered for computer vision to tools engineered for supply chain challenges makes precisely the difference between an “interesting” prototype that never makes it to production, and an industrial-grade system deployed at scale.
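One minimal way to see what differentiating through ordinary programming constructs can mean is forward-mode automatic differentiation with dual numbers, sketched below in plain Python. The `Dual` class, the `total_cost` simulation and its cost coefficients are hypothetical names and values chosen for this example, not part of Lokad's tooling; the technique shown is plain forward-mode autodiff, which is only one of several ways to obtain such derivatives.

```python
class Dual:
    """A number carrying its value and its derivative w.r.t. one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(x):
        """Wrap a plain number as a constant (zero derivative)."""
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.value - other.value, self.deriv - other.deriv)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __lt__(self, other):
        # Branching decisions only look at the value.
        return self.value < Dual.lift(other).value


def total_cost(reorder_qty, weekly_demand, start_stock=50.0):
    """An ordinary program with a loop and a branch: simulate stock week by week."""
    stock = Dual.lift(start_stock)
    cost = Dual(0.0)
    for demand in weekly_demand:
        stock = stock + reorder_qty - demand
        if stock < 0.0:
            cost = cost + (0.0 - stock) * 5.0   # assumed stock-out penalty per unit
            stock = Dual(0.0)                    # lost sales, no backorder
        else:
            cost = cost + stock * 1.0            # assumed holding cost per unit
    return cost
```

For example, `total_cost(Dual(20.0, 1.0), [30.0, 10.0, 90.0, 25.0])` returns a cost of 215 with a derivative of -17 with respect to the reorder quantity, which matches the finite difference obtained by re-running the simulation with a quantity of 21 (cost 198). The loop and the `if` are left untouched; only the number type changes.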

| | Deep Learning | Differentiable Programming |
|---|---|---|
| Primary purpose | Learning | Learning+Optimization |
| Typical usage | Learn-once, Eval-many | Learn-once, Eval-once |
| Input granularity | Fat objects (images, voice sequences, lidar scans, full text pages) | Thin objects (products, clients, SKUs, prices) |
| Input variety | Homogeneous objects (e.g. images all having the same height/width ratio) | Heterogeneous objects (relational tables, graphs, time-series) |
| Input volume | From megabytes to petabytes | From kilobytes to tens of gigabytes |
| Hardware acceleration | Very good | Good |
| Expressiveness | Static graphs of tensor operations | (Almost) arbitrary programs |
| Stochastic numerical recipes | Built-in | Idem |

The typical usage is a subtle but important point. From the “Big AI” perspective, training time can be (almost) arbitrarily long: it’s OK to have a computational network being trained for weeks if not months. Later, the resulting computational network usually needs to be evaluated in real-time (e.g. pattern recognition for autonomous driving). This angle is completely unlike supply chains, where the best results are obtained by re-training the network on every run. Moreover, from a DP perspective, the trained parameters are frequently the very results that we seek to obtain, making the whole real-time evaluation constraint moot.

The expectations surrounding the data inputs in granularity, variety and volume are also widely different. Typically, the “Big AI” perspective emphasizes near infinite amounts of training data (e.g. all the text pages of the web) where the prime challenge is to find tremendously scalable methods that can effectively tap into those massive datasets. In contrast, supply chain problems have to be addressed with a limited amount of highly structured yet diverse data.

This emphasis on scale steers deep learning toward tensor-based frameworks, which can be massively accelerated through dedicated computing hardware, initially GPUs and now increasingly TPUs. Differentiable Programming, being based on stochastic gradient descent, also exhibits many good properties for hardware acceleration, but to a reduced degree when compared to static graphs of tensor operations.

The importance of the stochastic numerical recipes is twofold. First, these recipes play an important role from a learning perspective. Variational auto-encoders or dropouts are examples of such numerical recipes. Second, these recipes also play an important role from a modeling perspective in order to properly factor probabilistic behaviors within the supply chain systems (e.g. varying lead times).

Conversely, there is a huge gap between differentiable programming and mixed integer programming, which has been the dominant approach over the last few decades for performing complex numerical optimizations.

| | Mixed Integer Programming | Differentiable Programming |
|---|---|---|
| Primary purpose | Optimization | Learning+Optimization |
| Input granularity and variety | Thin objects, heterogeneous | Idem |
| Input volume | From bytes to tens of megabytes | From kilobytes to tens of gigabytes |
| Hardware acceleration | Poor | Good |
| Expressiveness | Inequalities over linear and quadratic forms | (Almost) arbitrary programs |
| Stochastic numerical recipes | None | Built-in |

In defense of mixed integer programming tools, those tools - when they succeed at tackling a problem - can sometimes *prove* - in the mathematical sense - that they have obtained the optimal solution. Neither deep learning nor differentiable programming provides any formal proof in this regard.

## Conclusions

Differentiable Programming is a major breakthrough for supply chains. It is built on top of deep learning, which proved tremendously successful in solving many “Big AI” problems such as computer vision, but re-engineered at the core to be suitable for real-world challenges as faced by real-world supply chains. Lokad has been building upon its deep learning forecasting technology to transition toward Differentiable Programming, which is the next generation of our predictive technology. However, DP is more than just predictive: it unifies *optimization* and *learning*, unlocking solutions for a vast number of problems which had no viable solutions before.