# Generalized probabilistic forecasting

Optimizing the supply chain critically depends on the capacity to anticipate the future needs of the market. Thus, demand forecasting was identified early in the 20th century as a cornerstone of the supply chain optimization process. However, the classic perspective on forecasting was primarily shaped by limited access to computing resources. In fact, some of the earliest forecasting methods were even tailored for manual calculations. Access to a vast amount of computing resources has fundamentally changed the perspective on forecasting. One immediate upside is more accurate forecasts; another is more expressive forecasts. While it may appear counter-intuitive, it’s the latter aspect that accounts for most of the performance improvements of a supply chain. This refined type of forecast is known as a probabilistic forecast. By combining this probabilistic perspective with the modern toolset offered by machine learning - both algorithms and their software counterparts - Quantitative Supply Chain can address a much wider spectrum of challenges than the earlier supply chain methodologies geared around the notion of classic forecasts.

## Classic forecasting and its limitations

By "classic" forecasts, we refer to the notion of mean, periodic, equispaced, time-series forecasts. This specific type of forecast remains dominant in the world of supply chain and demand planning. While this type of forecasting holds limited practical interest from the modern perspective on supply chain optimization, it has considerable historical importance: classic forecasts serve as the foundation for many supply chain methodologies. As a result, those methodologies inherit the limitations of the classic forecasts.

Let’s further refine this classic perspective on forecasting:

• Mean: The forecast tries to estimate the mean of the future demand; i.e., a well-balanced forecast has 50% of the mass of future demand that is above the forecast, and 50% below. It can be proved mathematically that estimating the mean is equivalent to minimizing the squared error.
• Periodic: The future demand is chunked into periods, such as day, week, month or year. The forecast takes the shape of a real-valued vector. The size of the vector is commonly referred to as the horizon. For example, when forecasting 10 weeks ahead, the period is the week, while the horizon is 10.
• Equispaced: The periods are assumed to hold uniform properties, and, in particular, to be of identical duration. This assumption is valid for day and week periods; however, it is only loosely correct for months and years. Despite the approximation, the numerical processing does not differentiate between periods and assumes full uniformity.
• Time-series: The historical data is assumed to be homogeneous in format with the forecast that will be produced. In particular, the historical data is modeled as a real-valued vector, related to the same period as the one that has been selected for the forecast. The length of the input vector is limited by the depth of available historical data.

Sometimes, the mean estimator is replaced by the median. In the case of the median, a well-balanced forecast has a 50% chance of being above or below the future demand, irrespective of the demand mass. Optimizing an estimator to get the best median is equivalent to minimizing the mean absolute error, just as estimating the mean is equivalent to minimizing the squared error. As far as the present discussion is concerned, it makes no difference whether mean or median forecasts are being considered; the same holds if the metric is the MAPE (mean absolute percentage error), or a weighted flavor of the MAPE. Thus, for the sake of simplicity, we refer to all these forecasts as “classic”, placing all the minor variants under the same umbrella.
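The correspondence between estimator and error metric can be checked numerically. The sketch below is only an illustration: the demand figures are hypothetical, and a brute-force scan over candidate forecasts stands in for a proper proof.

```python
from statistics import mean, median

def squared_error(estimate, observations):
    """Sum of squared deviations between one forecast value and the observations."""
    return sum((x - estimate) ** 2 for x in observations)

def absolute_error(estimate, observations):
    """Sum of absolute deviations between one forecast value and the observations."""
    return sum(abs(x - estimate) for x in observations)

# Hypothetical demand observed over five periods, including one spike.
demand = [2, 3, 3, 4, 18]

# Candidate forecasts from 0.0 to 20.0 in steps of 0.1.
candidates = [c / 10 for c in range(0, 201)]

best_for_squares = min(candidates, key=lambda c: squared_error(c, demand))
best_for_absolute = min(candidates, key=lambda c: absolute_error(c, demand))

print("minimizes squared error: ", best_for_squares, "| mean:  ", mean(demand))
print("minimizes absolute error:", best_for_absolute, "| median:", median(demand))
```

On this sample, the squared error is minimized at the mean and the absolute error at the median. The single spike pulls the mean well above the median, yet both remain single-number summaries of the same history.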

Such types of forecasts are so deeply rooted in the world of supply chain, that it may take considerable effort to step back and consider all the far-reaching implications of this perspective. By way of anecdotal evidence, multiple leading software solutions actually "hard-code" those classic forecasts in their very architecture, featuring database tables containing 52 columns, where each column is associated with a particular week. By doing so, the very architecture of those software solutions is blind to entire classes of supply chain optimizations.

## Uncertainty, the elephant in the room

Future demand comes with a high degree of irreducible uncertainty: most events that impact capitalist markets simply cannot be deterministically modeled. Your company cannot predict when your competitor is going to lower its prices, suddenly recapturing market share. Your company cannot foresee a major industrial accident in a port in China, which will leave a critical supplier unable to deliver its products on time. Your company cannot even predict whether a new technology, currently being developed internally, will prove itself to be a superior alternative to the technology now in use.

Yet, while this irreducible uncertainty of the future state of markets seems obvious, it’s peculiar that classic forecasts almost entirely dismiss the challenge instead of tackling it. Indeed, from the classic forecasting perspective, uncertainty is simply absent. In practice, the forecasts' accuracy can be estimated through techniques such as backtesting, but it’s an external process that is largely decoupled from the forecasting process.

As the uncertainty is ignored, many situations simply cannot be expressed in the forecasts. For example, let’s consider the case of a wholesaler servicing a couple of large retail networks. A given product may appear to be associated with a very steady demand, as the product is consistently ordered every week in quantities that are relatively stable over time. However, delving further into the demand's structure, the orders are all actually coming from a single retail network. Hence, if the wholesaler gets delisted by this retail network for this specific product, the demand will drop to zero, immediately turning the wholesaler's leftover stock into dead stock. Thus, while the average projected demand for the product may be high, the risk of the demand suddenly dropping to zero is real. This risk cannot be modeled by simply lowering the average demand. The classic forecasts fail to express such a bimodal situation: one mode is more-of-the-same, while another mode is termination.
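To make the bimodal situation concrete, here is a minimal sketch. The two scenarios, their quantities and their probabilities are illustrative assumptions, not figures from an actual wholesaler.

```python
# Hypothetical one-week demand scenarios for the wholesaler's product.
# Both the quantities and the probabilities are illustrative assumptions.
scenarios = {
    100: 0.90,  # more of the same: the retail network keeps ordering
    0: 0.10,    # termination: the product gets delisted
}

def expected_leftover(stock, scenarios):
    """Expected number of units still in stock once the demand has been served."""
    return sum(p * max(stock - demand, 0) for demand, p in scenarios.items())

def expected_shortfall(stock, scenarios):
    """Expected number of units of demand that cannot be served."""
    return sum(p * max(demand - stock, 0) for demand, p in scenarios.items())

mean_demand = sum(demand * p for demand, p in scenarios.items())

# The mean is not one of the outcomes that can actually happen.
probability_of_mean = scenarios.get(round(mean_demand), 0.0)

print(f"mean demand: {mean_demand:.1f} units")
print(f"probability that demand equals the mean: {probability_of_mean:.2f}")
for stock in (round(mean_demand), 100):
    print(
        f"stock {stock}: expected leftover {expected_leftover(stock, scenarios):.1f}, "
        f"expected shortfall {expected_shortfall(stock, scenarios):.1f}"
    )
```

Under these assumptions, the mean sits at roughly 90 units, a quantity that is never actually demanded. Lowering the forecast from 100 to 90 creates a shortfall in the likely scenario while barely reducing the dead stock in the termination scenario.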

## Known biases in demand history

The time-series perspective on a demand forecast assumes that the historical demand record we have properly reflects the demand itself. However, in practice, this assumption is rarely true, as the market demand is only indirectly observed. This point is subtle yet important in practice: the history of client orders is only an approximation of the market demand, not the real market demand, which is to some extent unknowable. For example, whenever a company faces a stock-out, clients start looking elsewhere to get their supplies. The quantities acquired through alternative channels to mitigate a stock-out won’t be reflected in the sales history, hence generating biases.

Those biases are ubiquitous. Even in a situation where all the clients’ demands are logged, whether or not those demands have been serviced on time, there are still biases. Let’s consider, for example, a regional warehouse servicing a list of retail stores. Each store sends an order to the warehouse on a daily basis for its replenishment. The orders do not take into account the stock availability in the warehouse; it’s the responsibility of the warehouse to make a best effort to serve all stores fairly, based on the stock physically available. In this situation, if a store order cannot be fulfilled on day 1, the same order is, as usual, moved to day 2. Possibly, it is a bit larger, and it may still face the same stock-out, which continues to plague the warehouse. Yet, this process creates another artefact: when facing a stock-out situation, stores are virtually ordering a lot more than they would otherwise, because they keep reordering the same quantities as long as they aren’t delivered. Thus, while everything is logged, the total ordered quantities cannot be interpreted as properly reflecting the demand. In practice, the situation is further complicated by the consequences of the warehouse stock-outs, which, in turn, generate stock-outs at the store level. Nor are those store stock-outs logged by the customers walking the aisles of the stores.
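The reordering artefact can be reproduced with a few lines of simulation. This is a deliberately simplified sketch: the daily need, the stock-out days and the rule "an unfilled order is re-sent in full the next day, on top of the new need" are all assumptions made for the illustration.

```python
def logged_orders(daily_need, warehouse_can_deliver):
    """Return the quantity ordered each day when unfilled orders are re-sent.

    Assumption: an order is either fully delivered or not delivered at all,
    and an unfilled order is added to the next day's order.
    """
    backlog = 0
    orders = []
    for need, can_deliver in zip(daily_need, warehouse_can_deliver):
        order = backlog + need  # yesterday's unfilled quantity is ordered again
        orders.append(order)
        backlog = 0 if can_deliver else order
    return orders

# Hypothetical store needing 10 units a day; the warehouse is out of stock on days 2 to 4.
daily_need = [10, 10, 10, 10, 10]
warehouse_can_deliver = [True, False, False, False, True]

orders = logged_orders(daily_need, warehouse_can_deliver)
print("logged orders:       ", orders)
print("total logged:        ", sum(orders))
print("total underlying need:", sum(daily_need))
```

In this toy case the order log totals more than twice the underlying need, even though every order has been faithfully recorded.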

Considering the modern statistical tools that are presently available, the problem is not so much the existence of the biases, but the inadequacy of the classic forecasting perspective to embrace those biases. In fact, the time-series perspective is not only simple, it’s simplistic. Furthermore, the input data, modelled as a real-valued vector associated with past periods, cannot reflect the information that might be available about those biases. As a result, in order to mitigate the problem, classic forecasting typically involves some pre-processing steps, which recursively use the forecasting process itself to "fill the gaps" during the periods where the demand is known to be severely biased, i.e., replacing the zeros generated by a stock-out with the demand values that had been originally forecast for those dates. Yet, by doing so, the company ends up building forecasts on top of forecasts, which is doubly inefficient. First, forecast-on-forecast is a recipe for generating very inaccurate forecasts. Second, it further complicates the data preparation stage, which is already the most complicated part of quantitative modeling.

## Forecasting is not just about the future demand

The time-series forecasting perspective has been so dominant in supply chain history that it has frequently led to what we could call a golden hammer problem: if all you have is a hammer, then everything looks like a nail. The future demand is only one of the many elements that need to be forecast, and a time-series forecast is only one of the approaches that can be used to produce a forecast.

Lead times are of primary importance. The inventory kept by a company is only appropriate if the quantities held in stock are just enough to cover the demand over the duration of the lead time. Holding more stock is unnecessary because the stock will have been replenished by then. Yet, the lead times themselves exhibit complicated behaviors. Assuming that the supplier lead time is 7 days, just because it’s written in a contract, is both inefficient and dangerous. It’s inefficient because suppliers tend to negotiate lead times that they feel capable of sustaining, even under adverse circumstances, i.e., worst-case scenarios. Yet, in practice, it’s frequent to observe suppliers vastly outperforming their contractual lead times on average. It’s also dangerous because if a supplier routinely fails to meet its contractual lead times, the rest of the supply chain will keep "pretending" that all is well and entirely dismiss any attempt to mitigate the problems caused by this supplier.

Thus, lead times need to be forecast. Just like demand, lead times can be forecast based on historical data, and, just like demand, lead times exhibit complex statistical patterns, such as seasonality, that can be used to refine the forecasts. For example, manufacturers in China are likely to exhibit lead times that increase by 3 to 4 weeks every year around the period of the Chinese New Year, merely as a consequence of the factories being closed during that time.
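As a minimal sketch of what forecasting a lead time from history can look like, the snippet below builds an empirical distribution from past deliveries and compares it with the contractual figure. The observations are hypothetical, and the sketch deliberately ignores seasonality and every other pattern mentioned above.

```python
from collections import Counter

def empirical_distribution(observations):
    """Map each observed value to its observed frequency, in increasing order."""
    counts = Counter(observations)
    total = len(observations)
    return {value: counts[value] / total for value in sorted(counts)}

def quantile(distribution, level):
    """Smallest value whose cumulative probability reaches `level`."""
    cumulative = 0.0
    for value, probability in distribution.items():
        cumulative += probability
        if cumulative >= level - 1e-12:  # small tolerance for rounding errors
            return value
    return max(distribution)

# Hypothetical lead times, in days, observed on the last ten purchase orders.
observed_lead_times = [4, 5, 5, 5, 6, 4, 5, 9, 5, 6]
contractual_lead_time = 7

distribution = empirical_distribution(observed_lead_times)
late_probability = sum(p for days, p in distribution.items() if days > contractual_lead_time)

print("distribution:", distribution)
print("median lead time:", quantile(distribution, 0.50), "days")
print("95% of deliveries arrive within", quantile(distribution, 0.95), "days")
print(f"share of deliveries later than the contract: {late_probability:.0%}")
```

With these made-up figures, the supplier typically delivers two days ahead of the contract, yet one delivery in ten arrives later than the contract allows. A single contractual number conveys neither fact.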

Then, beyond lead time and demand, there are many more supply chain elements that also require a forecast of a kind. For example, we have:

• Customer returns: in fashion e-commerce, a customer may return a sizable portion of the goods being ordered. For example, in Germany, it’s typical for customers to order multiple pairs of shoes and to later return the sizes that do not fit. There are many situations where returned quantities exceed 30% of the quantities originally ordered. Thus, the quantities to be returned should be forecast as well.
• Unserviceable received goods: in food retail, the merchandise is fragile and perishable, and it’s frequent that a sizable portion of the merchandise received by the warehouse does not pass the quality controls. For example, half of the strawberry cartons received by a warehouse might be discarded on the spot, because the merchandise is no longer considered sellable. When passing purchase orders to the suppliers, it’s important to take into account the expected fraction of merchandise that will not pass quality control checks. Thus, forecasting how much will be rejected is highly relevant.
• Electronic record inaccuracies: in retail, the inventory accuracy at store level is frequently quite low. Indeed, customers may damage, steal or simply move products within the store, creating discrepancies between the electronic record of the stock level and the real physical stock level on the shelf. The discrepancies between the real stock level and the electronic one can be predicted by using the history of stock corrections generated by the stock counting process.

Depending on the vertical, there are many more problems that require a forecast or a predictive statistical estimation of some kind. It’s important to identify those problems as such, because otherwise the supply chain keeps operating on rules that may or may not be appropriate, without ever putting itself in a position to benchmark and improve upon those rules.

## Generalized forecasting with machine learning

Over the last few decades, the field of machine learning, which can be seen as an intersection of computer science and statistics, has been making tremendous progress. Far from having reached its peak, machine learning is still progressing faster than ever, fueled by the recent breakthroughs in deep learning. The machine learning field has gathered a large body of both software implementations and quantitative insights, to extract and leverage knowledge found in datasets of all kinds. While machine learning itself is beyond the scope of the present discussion, it’s important to understand its implications as far as Quantitative Supply Chain is concerned.

Machine learning offers a method of handling in a systematic fashion pretty much any sufficiently large body of data. More data does not make the challenge more complicated; on the contrary, it makes it easier. This insight is very important and is counter-intuitive from the classic supply chain perspective. Indeed, many supply chain practitioners, when facing a tough supply chain challenge, are tempted to narrow it down to a smaller scope, in order to make the problem more manageable. Yet, from a machine learning perspective, a smaller amount of data nearly always implies more work for the data scientist, in order to get the algorithms working despite the limitations of the dataset. Machine learning algorithms - all of them - are built to work better with more data. Arguably, many of the largest operational successes in machine learning, including voice recognition or machine translation, to name a few, have been achieved by finally succeeding at processing much larger datasets compared to the earlier attempts.

Once enough relevant data has been gathered, machine learning provides numerous methods that require little or no tuning to start generating very diverse types of forecasts. Removing nearly all manual tuning from the data pipelines powered by machine learning is an industry-wide, decades-long effort in both academic circles and the software industry. At present, most modern machine learning methods require very little manual adjustment. Actually, the machine learning community is increasingly skeptical of the sustainability of any approach that requires more than a few superficial manual adjustments. This stance has fueled many of the greatest successes in the machine learning field and is especially strong in deep learning. Beware: while machine learning algorithms may need little or no tuning, preparing the data requires considerable effort. However, those efforts are largely agnostic of the type of machine learning algorithms used at a later stage in the data pipeline.

Thus, we advise supply chain practitioners to be highly skeptical of any predictive statistical solutions that offer even the possibility of manually adjusting the forecasts. Such a feature indicates that the solution's design is undermining some of the most important insights acquired in machine learning. In practice, it’s a near guarantee that the solution will suffer from the same issues that plagued the early rule-based computer systems, which proved to be a maintenance nightmare a few decades ago.

One key upside for predictive supply chain obtained from the advances in machine learning is that generating diverse types of forecasts does not require more effort than producing traditional demand forecasts. Nearly all efforts are concentrated on the data preparation, and, later on, on aligning the company's organization to make the most of those newly available forecast flavors.

Quantitative Supply Chain embraces machine learning in order to inject predictive supply chain capabilities whenever relevant and feasible. Instead of focusing on pure demand forecasts, Quantitative Supply Chain seeks to confront all sources of uncertainty in the supply chain: lead times, production defects, market shifts, etc. Under the hood, Quantitative Supply Chain makes extensive use of machine learning technologies, which allow the generation of very diverse types of forecasts to precisely fit the supply chain's requirements. This approach differs greatly from the classic perspective, which forces weekly or monthly demand forecasts to provide answers to loosely related problems.

## Probabilistic forecasting to cope with uncertainty

When facing irreducible uncertainty, as is usually the case when looking ahead while considering complex supply chains, it is very desirable to forecast not only the most probable future outcome, but many alternative outcomes as well. Probabilistic forecasting is the most popular statistical formalization of this very insight. It generates a statistical estimate for all the possible outcomes. This generalized estimator takes the form of a probability distribution associated with every possible outcome. Probabilistic forecasting can be considered a rather extreme variant of the what-if methodology, where all scenarios are being considered.

While probabilistic forecasting might feel like a rather theoretical approach, it’s actually both tractable and straightforward. Let’s consider a probabilistic demand forecast. Instead of computing a single number representing the expected value of the future demand, we compute a list of probabilities: the probability of observing 0, 1, 2, 3, etc. units of demand. In order to visualize all the probabilities, the most common approach is to use a histogram, where each bucket represents the probability associated with a specific demand level. The probabilities put together are collectively referred to as forming a probability distribution.
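The list of probabilities described above can be sketched in a few lines. The Poisson distribution used here is purely an assumption chosen because its formula is short; the discussion does not prescribe any particular model, and the mean of 2 units is hypothetical.

```python
import math

def poisson_probabilities(mean_demand, max_units):
    """Probability of observing 0, 1, ..., max_units units of demand.

    Assumption: demand follows a Poisson distribution; the (small) probability
    of observing more than `max_units` units is left out of the list.
    """
    return [
        math.exp(-mean_demand) * mean_demand ** units / math.factorial(units)
        for units in range(max_units + 1)
    ]

# Hypothetical product selling 2 units per period on average.
forecast = poisson_probabilities(mean_demand=2.0, max_units=8)

# Text histogram: one bucket per demand level.
for units, probability in enumerate(forecast):
    bar = "#" * round(probability * 50)
    print(f"{units:>2} units | {bar} {probability:.3f}")

print(f"probability mass covered: {sum(forecast):.4f}")
```

A classic forecast would summarize this product as "2 units". The list of probabilities additionally tells us how likely it is that nothing sells at all, or that 5 or more units are requested.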

Forecasting distributions of probabilities, i.e. probabilistic forecasting, is a generalization of the traditional mean or median forecast. While it might appear as considerably more complex, it is a well-established statistical approach that is already widely used in many domains. For example, nearly all the latest advances in deep learning, which are making self-driving vehicles a reality, are leveraging the probabilistic perspective at their core (and more specifically a Bayesian perspective, which is beyond the scope of the present discussion). The practice of probabilistic forecasting is well known, with thousands of research papers already published on the subject and numerous software implementations available.

Quantitative Supply Chain emphasizes that probabilistic forecasting is the preferred form of forecasting. Indeed, in supply chain, it is not the average situations that cost money, it’s the extreme ones: unexpectedly high demand resulting in a stock-out, or unexpectedly low demand creating an overstock. Probabilistic forecasts confront this problem directly by delivering probabilities for all possible situations, including the problematic ones. Probabilistic forecasting is the cornerstone of structured risk management in supply chain. Through those forecasts it becomes possible to profitably mitigate problems. Indeed, supply chain is all about trade-offs: zero stock-outs usually imply infinite inventory, which is obviously not a reasonable option. Without probabilistic forecasts, comparing the respective cost of stock vs. cost of stock-out is mere guesswork.
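The trade-off between the cost of stock and the cost of stock-out becomes a simple computation once a probability distribution is available. The sketch below is an illustration under stated assumptions: the demand probabilities and the two unit costs are hypothetical, and a real decision would involve many more economic drivers.

```python
def expected_cost(stock, demand_probabilities, overstock_cost, stockout_cost):
    """Expected cost of holding `stock` units, averaged over all demand levels.

    `demand_probabilities[d]` is the probability of observing exactly d units.
    """
    total = 0.0
    for demand, probability in enumerate(demand_probabilities):
        leftover = max(stock - demand, 0)  # units that remain unsold
        missed = max(demand - stock, 0)    # units of demand left unserved
        total += probability * (leftover * overstock_cost + missed * stockout_cost)
    return total

# Hypothetical probabilistic forecast: P(demand = 0), P(demand = 1), ..., P(demand = 5).
demand_probabilities = [0.10, 0.20, 0.30, 0.20, 0.10, 0.10]
overstock_cost = 1.0  # assumed cost per unsold unit
stockout_cost = 3.0   # assumed cost per unserved unit

costs = {
    stock: expected_cost(stock, demand_probabilities, overstock_cost, stockout_cost)
    for stock in range(len(demand_probabilities))
}
best_stock = min(costs, key=costs.get)
mean_demand = sum(d * p for d, p in enumerate(demand_probabilities))

for stock, cost in costs.items():
    print(f"stock {stock}: expected cost {cost:.2f}")
print(f"mean demand: {mean_demand:.1f}, cheapest stock level: {best_stock}")
```

With these assumed costs, the cheapest stock level sits above the mean demand, because a missed unit costs more than an unsold one. Changing either cost shifts the answer, which a single mean forecast cannot reflect.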

A minor downside of probabilistic forecasting is that it tends to be a lot more intensive, in terms of computing resources, than simple forecasting methods. In particular, while it is feasible to have a classic forecasting model (e.g., exponential smoothing) implemented in Microsoft Excel, many, if not most, probabilistic forecasting models require more computing resources than can be conveniently pushed through a spreadsheet. Yet, with the advent of cloud computing, computing resources have never been so inexpensive: currently, some cloud computing platforms are already offering public prices below \$10 for 1000 compute hours on a high-end 2GHz single core server. However, in practice, in order to take advantage of such low-cost processing power, the software supporting the probabilistic forecasting process needs to have been designed from the start to operate within cloud computing platforms.

Once again, most probabilistic forecasting algorithms are derived from the insights and the findings of the broad machine learning field. Yet, just like you don’t need to be a machine learning expert to enjoy the benefits of having a spam filter - powered by machine learning - to keep your email inbox clean, you don’t need to be a machine learning expert to bring a supply chain to its next stage of performance through machine learning technologies. As discussed above, two of the key aspects of machine learning are precisely a core focus on automation and the (near) elimination of all manual tweaking of statistical models.