Oddities in quantile forecasts - Inventory Optimization Software

Oddities in quantile forecasts

Update May 2016: Most of the oddities that exist in Lokad's quantile forecasting technology are addressed by our probabilistic forecasts, which represent our latest generation of forecasting technology. In particular, quantile crossings and quantile instabilities are eliminated by design. See also building a purchase priority list with probabilistic forecasts.

Quantile forecasts are invaluable for inventory optimization. However, the numerical behavior of a small percentage of the quantile values produced by Lokad can appear quite surprising. On this page, we provide some insights into the underlying mechanism that generates these oddities, and what should be done about them.

A selection from a model library

Our Forecasting Engine is designed as a library of forecasting models that all compete with each other to deliver the most accurate forecasts. The overall architecture is described in our Forecasting Engine page. The competitive process that chooses the most accurate models (1) is called the selection. The selection relies on an extensive backtesting process where every single model is challenged against many truncated datasets.
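The selection step can be sketched in a few lines. This is a minimal illustration, not Lokad's actual engine: it assumes accuracy is measured with the pinball loss (a standard metric for quantile forecasts, which the text above does not name), and the two toy models, `empirical_quantile`, `backtest` and `select_model` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, Sequence

# Assumption: a "model" is any function mapping (history, tau) to a quantile forecast.
Model = Callable[[Sequence[float], float], float]

def pinball_loss(actual: float, forecast: float, tau: float) -> float:
    """Pinball loss: under-forecasts are weighted by tau, over-forecasts by 1 - tau."""
    if actual >= forecast:
        return tau * (actual - forecast)
    return (1.0 - tau) * (forecast - actual)

def empirical_quantile(values: Sequence[float], tau: float) -> float:
    """Crude empirical quantile of the observed values."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(tau * len(ordered)))
    return ordered[index]

def backtest(model: Model, demand: Sequence[float], tau: float, min_history: int) -> float:
    """Average pinball loss when forecasting demand[t] from the truncated data demand[:t]."""
    losses = [
        pinball_loss(demand[t], model(demand[:t], tau), tau)
        for t in range(min_history, len(demand))
    ]
    return sum(losses) / len(losses)

def select_model(models: Dict[str, Model], demand: Sequence[float],
                 tau: float, min_history: int = 4) -> str:
    """Return the name of the model with the lowest backtested loss."""
    scores = {name: backtest(m, demand, tau, min_history) for name, m in models.items()}
    return min(scores, key=scores.get)

# Two hypothetical toy models competing on a steadily growing demand history.
models: Dict[str, Model] = {
    "full_history": lambda h, tau: empirical_quantile(h, tau),
    "recent_only": lambda h, tau: empirical_quantile(h[-2:], tau),
}
demand = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(select_model(models, demand, tau=0.5))
```

On this growing series the model that only looks at recent demand wins the backtest, because the full-history model lags further and further behind the trend.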

This architecture is powerful because:
  • Forecasts can be made much more accurate than with mono-model approaches.
  • It does not make a lot of prior assumptions about the nature of the demand.
  • It’s resilient against overfitting and other kinds of systematic bias.

However, this same process may generate counter-intuitive behaviors.

Quantile instability

When selecting forecasting models, the most accurate ones typically express very similar behaviors. Hence, the second best model on Day 1 may become the best model on Day 2. However, the resulting change in forecast values is usually unnoticeable from a practical perspective.

Occasionally, though, the best and second-best models behave very differently. For example, model A can be very good at capturing seasonality but poor at capturing the trend, while model B is very good at capturing the trend but poor with seasonality. In such situations, the two best models A and B might have nearly identical accuracies overall but still produce very different forecasts at a given point in time.

Although such situations are infrequent, in practice they can be observed in almost any sizeable dataset. Indeed, any dataset with more than 100 items is likely to trigger this type of situation for at least one of the items, and the probability increases if the historical demand is erratic or sparse.

What we just described is in fact what happens behind the scenes at Lokad. The vast majority of the time, you won't even notice it. Nevertheless, you might observe such a situation if you trigger two runs in Lokad with similar but slightly different datasets. For instance, imagine that you generated your forecasts on Monday, based on a dataset A that included all your sales until Sunday. Then, for some reason (maybe because you had forgotten to include some new items), you decide on Wednesday to generate your forecasts again, this time with a dataset B that includes all your sales until Tuesday. In your eyes, the datasets are very similar, since there is only a 2-day difference, and you expect the forecasts to be exactly the same. And yet, you may observe some differences, sometimes significant ones. This is a typical case of quantile instability: during the selection process, the 2 days of added data simply tipped the balance from one mathematical model to another whose overall performance is very close. The overall accuracy that you get will be slightly better, but individual forecasts may differ between the two runs, which can lead to what look like isolated oddities.
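The Monday/Wednesday scenario can be reproduced on a toy scale. The sketch below is only an illustration under strong simplifications: the two models (`seasonal_model`, `trend_model`), the mean absolute error used for the backtest, and the demand figures are all hypothetical and chosen so that the two models are nearly tied.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]

def seasonal_model(history: Sequence[float]) -> float:
    """Hypothetical 'seasonal' model: repeat the value observed two days earlier."""
    return history[-2]

def trend_model(history: Sequence[float]) -> float:
    """Hypothetical 'trend' model: extend the last observed day-to-day change."""
    return history[-1] + (history[-1] - history[-2])

MODELS: Dict[str, Model] = {"seasonal": seasonal_model, "trend": trend_model}

def backtest_error(model: Model, demand: Sequence[float]) -> float:
    """Mean absolute error when forecasting demand[t] from demand[:t]."""
    errors = [abs(demand[t] - model(demand[:t])) for t in range(2, len(demand))]
    return sum(errors) / len(errors)

def run(demand: Sequence[float]) -> Tuple[str, float, Dict[str, float]]:
    """Select the best backtested model and return its next-day forecast."""
    scores = {name: backtest_error(m, demand) for name, m in MODELS.items()}
    winner = min(scores, key=scores.get)
    return winner, MODELS[winner](demand), scores

dataset_a: List[float] = [4, 6, 4, 6, 8, 9]   # first run
dataset_b: List[float] = dataset_a + [10, 11]  # second run, two more days of sales

for label, data in (("A", dataset_a), ("B", dataset_b)):
    winner, forecast, scores = run(data)
    print(label, winner, forecast, scores)
```

With dataset A the seasonal model wins narrowly (mean error 1.75 versus 2.25); with the two extra days in dataset B the trend model takes the lead (1.5 versus about 1.83). On dataset B the two models would forecast 10 and 12 respectively, so the switch of winner alone moves the forecast by 20%.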

Quantile crossing

Similarly, when the service level is increased, the reorder point is expected to increase as well, and indeed this pattern holds in the vast majority of cases when experimenting with Lokad. However, in the same kind of situation as described above, when triggering two runs in Lokad, this time with similar datasets but slightly different service levels (say, shifting from 97% to 98%), one can observe that, for a handful of items, increasing the service level leads to a decrease of the reorder point when the results of the two runs are compared. Of course, from the user's point of view, this pattern feels wrong.

What we face here is actually an issue that has been known in statistics for decades and is called quantile crossing.

Once again, quantile crossing takes place in Lokad because of our selection process. Each of our quantile forecasting models is consistent: increasing the tau-factor (service level) does indeed increase the quantile value (reorder point). However, if the selection process elects another model, whose overall accuracy is slightly superior to that of the initial one but which happens to behave very differently, a bump appears locally in the series of quantile values, and we observe a quantile crossing. Keep in mind that the overall accuracy of the forecasts does not deteriorate in such cases; the best model is chosen for the specific service level you decide to target.
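The mechanism can be made concrete with two quantile curves that are each consistent on their own. The sketch below is purely illustrative: `model_a`, `model_b`, their coefficients and the `winner_by_level` mapping (standing in for the outcome of the selection at each service level) are hypothetical values, not Lokad outputs.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical quantile curves, both increasing with the service level (in percent).
def model_a(level_pct: int) -> int:
    return 100 + 4 * (level_pct - 90)   # steep curve

def model_b(level_pct: int) -> int:
    return 110 + (level_pct - 90)       # flat curve

MODELS: Dict[str, Callable[[int], int]] = {"A": model_a, "B": model_b}

# Assumed selection outcome: model A wins up to 97%, model B wins at 98%.
winner_by_level: Dict[int, str] = {95: "A", 96: "A", 97: "A", 98: "B"}

def reorder_points(winners: Dict[int, str]) -> List[Tuple[int, int]]:
    """Reorder point per service level, using the selected model at each level."""
    return [(level, MODELS[name](level)) for level, name in sorted(winners.items())]

def find_crossings(points: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Pairs of consecutive service levels where the reorder point decreases."""
    return [
        (low_level, high_level)
        for (low_level, low_value), (high_level, high_value) in zip(points, points[1:])
        if high_value < low_value
    ]

points = reorder_points(winner_by_level)
print(points)                  # [(95, 120), (96, 124), (97, 128), (98, 118)]
print(find_crossings(points))  # [(97, 98)]
```

Neither curve ever decreases by itself; the drop from 128 to 118 exists only because the selected model changes between 97% and 98%.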

Again, in practice, this situation is both very infrequent and yet observable in any sizeable dataset. Note that high service levels also increase the frequency of quantile crossing, because the forecast values are less stable. Indeed, it is more difficult to evaluate the top 1% extreme demand (99% service level) than to evaluate the top 10% extreme demand (90% service level). There is a strong leverage effect involved, so the range of mathematical models used to generate forecasts for a 98% service level is generally not the same as the one used for a 96% or 97% service level.
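A small simulation gives a feel for why extreme quantiles are harder to pin down. It is only a sketch under assumptions that are not part of the text above: demand is drawn from an exponential distribution, each estimate uses 100 observations, and `empirical_quantile` and `estimate_spread` are hypothetical helper names.

```python
import random
import statistics
from typing import List

def empirical_quantile(sample: List[float], level_pct: int) -> float:
    """Crude empirical quantile at an integer service level (in percent)."""
    ordered = sorted(sample)
    index = min(len(ordered) - 1, len(ordered) * level_pct // 100)
    return ordered[index]

def estimate_spread(level_pct: int, n_obs: int = 100,
                    n_trials: int = 300, seed: int = 0) -> float:
    """Standard deviation of the quantile estimate across repeated samples."""
    rng = random.Random(seed)  # fixed seed: both levels see the same samples
    estimates = [
        empirical_quantile([rng.expovariate(1.0) for _ in range(n_obs)], level_pct)
        for _ in range(n_trials)
    ]
    return statistics.pstdev(estimates)

spread_90 = estimate_spread(90)
spread_99 = estimate_spread(99)
print(f"spread of the 90% estimate: {spread_90:.3f}")
print(f"spread of the 99% estimate: {spread_99:.3f}")
```

Under these assumptions the 99% estimate fluctuates several times more than the 90% estimate from one sample to the next, which is the kind of instability that makes near-tied models swap places.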

Quantile collapse

The quantile collapse issue represents a more extreme variant of quantile crossing. If the service levels are constantly being increased towards extreme values, at some point, not just some but most of the quantile forecasts start to shrink. Thus, by increasing the service level, it is possible to end up with suggested inventory quantities that are near-uniformly lower than the previous ones. We call this highly counter-intuitive behavior a quantile collapse.

In order to observe a quantile collapse, unreasonable service levels must be used. Thus, if you happen to face the situation detailed in this section, we strongly advise you to take a second look at our page about choosing your service levels as sticking to our guidelines will help to directly address this issue.

The quantile collapse is caused by known weaknesses in our forecasting technology. Simply put, using extreme service levels such as 99% (or even as low as 97% if data are sparse) degrades the capacity of our forecasting engine to distinguish good forecasting models from even better ones. Indeed, at such high service level values, all forecasting models massively overestimate the demand, and they do so on purpose. As a result, the model selection tends to regress towards the “average” behavior among our forecasting models, which results in the possibility of a collapse whereby forecasted quantities decrease while the service level increases.

While addressing the quantile collapse issue is certainly interesting from a theoretical viewpoint, in practice it only happens with service levels that vastly exceed sustainable inventory levels in commerce. While producing better forecasts is a core mission for Lokad, our efforts are focused on scenarios that actually matter for businesses; thus, we plan to invest little effort in mitigating quantile collapses.

Dealing with these numerical oddities

Both quantile instability and quantile crossing reflect a kind of imperfection inherent in forecasting technologies based on statistics. However, it is important to realize that quick-fixing such imperfections will actually make the situation worse.

It is possible to force consistency by overriding the forecast value so that it never changes by more than X% from one run to the next. It is also possible to override the forecast values so that they strictly increase as the service level increases.

However, in both cases, we end up favoring one forecast value over another, but without any statistical ground to do so: when a single forecast, or a couple of forecasts are considered, the resulting value might appear more consistent, but on the whole it does not make things more accurate, and hence, not more profitable for a business relying on this value.
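For concreteness, the two overrides discussed above could look like the sketch below. It is shown only to make the idea tangible, not as a recommended practice: `clamp_change` and `enforce_monotone` are hypothetical helpers, and the second one produces non-decreasing (rather than strictly increasing) values for simplicity.

```python
from typing import List

def clamp_change(previous: float, new: float, max_change_pct: float) -> float:
    """Limit the new forecast to within max_change_pct of the previous run's value."""
    lower = previous * (1.0 - max_change_pct / 100.0)
    upper = previous * (1.0 + max_change_pct / 100.0)
    return min(max(new, lower), upper)

def enforce_monotone(values: List[float]) -> List[float]:
    """Replace each value by the running maximum, so values never decrease
    as the service level increases (values must be ordered by service level)."""
    result: List[float] = []
    for value in values:
        result.append(value if not result else max(value, result[-1]))
    return result

# Hypothetical reorder points for service levels 95%, 96%, 97%, 98%.
print(enforce_monotone([120, 124, 128, 118]))  # [120, 124, 128, 128]
print(clamp_change(previous=100.0, new=130.0, max_change_pct=10.0))
```

Both functions simply keep the earlier or lower-service-level value and discard the other one, which is exactly the arbitrary choice the paragraph above warns against: the code has no way of knowing which of the two values was the more accurate.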

As a result, we recommend sticking with the quantile values as they come. Do not try to manually correct one report by inserting a value that comes from another report and that seemed to make more sense, because you usually have no way of knowing which of the two values was the "best". Trust in the overall coherence of a single report.

In the end, yes, there might be a visible gain to be made by tracking these oddities, but visible does not mean important. From Lokad's perspective, there are entire categories of improvements that could be brought to our technology and that would yield much more significant gains. From the merchant's perspective, investing more effort in establishing accurate lead times and more adequate target service levels yields far greater benefits than chasing the sub-percent extra accuracy that could be gained by mitigating both quantile instability and quantile crossing.

Lokad's Forecasting Technology is a constant work in progress. We are committed to delivering the best forecasts on the market, and we will keep working on those infrequent odd behaviors described above.

(1) We are somewhat oversimplifying the actual selection process here. The actual Forecasting Engine leverages a complex combination of the winning models. However, for the sake of clarity, it is more straightforward to think of the selection process as picking a single “winning” model.