# Probabilistic demand forecasting

## General syntax

The forecasting engine has a function dedicated to probabilistic forecasts. The syntax is the following:

    Demand = forecast.demand(
      category: C1, C2, C3, C4
      hierarchy: H1, H2, H3, H4
      label: PlainText
      location: Store
      demandStartDate: LaunchDate
      demandEndDate: EndDate
      offset: 0
      present: (max(Orders.Date) by 1) + 1
      demandDate: Orders.Date
      demandValue: Orders.Quantity
      censoredDemandDate: Stockouts.StartDate, Stockouts.EndDate
      promotionDate: Promotions.StartDate, Promotions.EndDate
      promotionDiscount: Promotions.Discount
      promotionCategory: Promotions.Type
      covariable: Campaign
      covariableObserved: false)

Unlike regular functions, call functions take named arguments instead of positional arguments. Named arguments suit complex functions because they make the source code much more readable, at the cost of a little extra verbosity. They otherwise behave just like regular function arguments, so any Envision expression can be passed as an argument value.

The function returns a vector Demand of type distribution (see also Algebra of Distributions). Distributions are an advanced data type that represents functions $p: \mathbb{Z} \to \mathbb{R}$. More specifically, the forecasting engine returns random variables, that is, distributions that are non-negative and have a total mass equal to 1. In the present case, $p(k)$ represents the probability associated with a demand of $k$ units. Each item, in the Envision sense, is associated with its own demand distribution.
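To make the notion of a random variable concrete, here is a minimal sketch written in Python rather than Envision. It represents a distribution as a plain dictionary mapping a demand of $k$ units to $p(k)$; the names `is_random_variable` and `mean` are hypothetical helpers for illustration and not part of the forecasting engine.

```python
from typing import Dict


def is_random_variable(p: Dict[int, float], tol: float = 1e-9) -> bool:
    """True when p is non-negative everywhere and its total mass is 1."""
    non_negative = all(v >= 0 for v in p.values())
    unit_mass = abs(sum(p.values()) - 1.0) <= tol
    return non_negative and unit_mass


def mean(p: Dict[int, float]) -> float:
    """Expected demand, i.e. the sum of k * p(k) over all k."""
    return sum(k * v for k, v in p.items())


# Hypothetical demand distribution for one item: p(0)=0.5, p(1)=0.3, p(2)=0.2.
demand = {0: 0.5, 1: 0.3, 2: 0.2}
print(is_random_variable(demand), mean(demand))
```

A dictionary whose values do not sum to 1 is still a distribution in the broader sense described above, but it is not a random variable.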

The full forecast.demand syntax includes many arguments; however, only four of them are mandatory:

• present: a scalar date value
• demandDate: a date vector with an item affinity
• demandValue: a number vector with an item affinity
• horizon: a distribution vector

The present value is the date intended as the first day to be forecast, under the assumption that the data is complete up to the day before. This date must be stated explicitly because it cannot always be inferred from the data: some businesses are closed on Sundays, for example, and if the most recent date found in the dataset is a Saturday, it is ambiguous whether the forecast should start on Sunday or Monday. In the illustrative syntax above, we use (max(Orders.Date) by 1) + 1, assuming that orders are observed every day and that the input data is fresh from the day before.
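The same computation can be sketched in Python (not Envision) to show what the expression evaluates to. The function name `present_date` and the sample order dates are hypothetical.

```python
from datetime import date, timedelta


def present_date(order_dates):
    """Day after the most recent order date, assuming data is complete up to that date."""
    return max(order_dates) + timedelta(days=1)


# Hypothetical order history ending on Saturday, July 1st 2023.
orders = [date(2023, 6, 29), date(2023, 7, 1), date(2023, 6, 30)]
present = present_date(orders)
print(present, present.strftime("%A"))
```

Here the computed date falls on a Sunday. A business that is closed on Sundays would have to decide whether the forecast should start on that day or on the following Monday, which is why the present date is an explicit argument.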

The demandDate and demandValue are expected to belong to the same table, which exhibits an item affinity, that is [Id, *] in Envision terminology. The dates indicate when the demand was observed in the past. The values represent the scale of the demand, which is typically counted in units or eaches. Fractional demand values are not supported. This table contains the demand history that the forecasting engine extrapolates into the future. Ideally, the history should be as long as possible, although in practice there are limited benefits in exceeding 5 years' worth of demand. The forecasting engine accommodates both short and long demand histories; when the history is long, older data points simply fade into statistical irrelevance.

The horizon represents the probabilistic lead time to be used when forecasting the demand. And while the lead time is treated as an input when computing an integrated demand forecast, the lead time is typically also a forecast in itself. The forecasting engine offers the possibility to forecast lead times. The lead time forecast is decoupled from the demand forecast itself because this offers the possibility to perform ad-hoc adjustments on the lead time distributions before feeding them into the forecasting engine.
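As an illustration of what an ad-hoc adjustment could look like, the following Python sketch (not Envision) delays every lead time in a distribution by a fixed number of days, for example to account for an announced shipping delay. The dictionary representation and the name `shift_lead_time` are assumptions made for this example only.

```python
def shift_lead_time(lead_time, extra_days):
    """Return a new distribution with every lead time delayed by extra_days."""
    return {days + extra_days: prob for days, prob in lead_time.items()}


# Hypothetical lead time forecast: 7 days with probability 0.6, 10 days with 0.4.
supplier_lead_time = {7: 0.6, 10: 0.4}
adjusted = shift_lead_time(supplier_lead_time, 2)
print(adjusted)
```

The adjusted distribution keeps a total mass of 1, so it remains a valid random variable that could be used as a horizon.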

Beyond these mandatory arguments, the accuracy of the forecasts can be greatly improved by providing more data to the forecasting engine. The following sections explain this in more detail.

## Formal definition

In this section, we briefly detail the formal definition of the statistical operation performed by the forecasting engine when computing an integrated demand forecast.

Let $y(t)$ be the demand function and $t$ the time. Let the integrated demand $D$, associated with the random variable $\Lambda$ representing the lead times, be defined as follows:

$$D : (y,\Lambda,t_0) \to \int_0^{\infty} \mathbf{P}[\Lambda=\lambda] \left( \int_{t_0}^{t_0+\lambda} y(t) \, dt \right) d\lambda$$

where $\mathbf{P}[\Lambda=\lambda]$ represents the probability for the lead time random variable $\Lambda$ to be equal to $\lambda$. The demand is qualified as integrated because it is an integration over a probabilistic lead time.
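A discrete analogue of this definition can be sketched in Python, under the assumptions that time is counted in whole days, that the demand $y$ is a known list of daily quantities, and that the lead time takes a finite number of values. The function name and sample figures are hypothetical. Note that this computes the weighted sum exactly as written in the formula for a known $y$; the engine's actual output is a full probability distribution over the unknown future demand.

```python
def integrated_demand(y, lead_time, t0):
    """Sum over lam of P[lead time = lam] * (y[t0] + ... + y[t0 + lam - 1])."""
    total = 0.0
    for lam, prob in lead_time.items():
        total += prob * sum(y[t0:t0 + lam])
    return total


# Hypothetical daily demand and a lead time of 2 or 4 days with equal probability.
y = [2, 3, 1, 4, 2, 5]
lead_time = {2: 0.5, 4: 0.5}
print(integrated_demand(y, lead_time, 0))  # 0.5 * 5 + 0.5 * 10 = 7.5
```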

If $t_0$ represents the present date, then the demand is known (because it has been observed) until the time $t_0$, but unknown afterwards. The purpose of the forecasting engine is to compute $\hat{D}(y, \Lambda)$, a probabilistic estimate of this future demand expressed as a random variable.

## Categories, hierarchy and label

Categories, hierarchy and plain-text labels play a very similar role from the forecasting engine perspective: they help the forecasting engine cope with sparse historical data.

See Forecasting with categories and a hierarchy.

## New product forecasting

From a forecasting perspective, a new product is a product that has not yet been sold. This represents a rather specific forecasting challenge because, by definition, there is no historical data associated with this new product. Our forecasting engine supports new product forecasting through the demandStartDate argument. When historical start dates are known, it is advised to provide this information to the forecasting engine as it contributes to improving the forecasting accuracy, for both new and old products.

The demandStartDate argument expects a date to be provided for each item. This date is intended to represent the first day when the demand becomes effective for this item. This date is in the past for items that have already been sold, and it remains in the future for items that are yet to be launched.

There are two distinct benefits to providing the demandStartDate argument. Obviously, the first benefit consists of forecasting new items. In this case, it is usually also important to specify the offset argument. Indeed, if the offset is kept at zero - its default value - then the period covered by the forecast might not overlap the active period for the item.

Example: today is July 1st. The forecasting horizon is a Dirac distribution at 7 days; that is a constant lead time of 7 days. Product A is launched on July 15th - its start date. If the forecast is carried out today, then the forecast distribution for Product A is a Dirac at zero because the horizon ends prior to the start date of Product A. In order to forecast the first week of demand for Product A, the offset for Product A should be set at 14 days.
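The date arithmetic of this example can be checked with a short Python sketch (not Envision). It assumes that the forecast window starts at the present date plus the offset and lasts for the horizon; the function names and the year 2024 are hypothetical choices for illustration.

```python
from datetime import date, timedelta


def forecast_window(present, horizon_days, offset_days=0):
    """Return (start, end) of the forecast window, end date excluded."""
    start = present + timedelta(days=offset_days)
    return start, start + timedelta(days=horizon_days)


def active_days(window, start_date):
    """Number of days in the window on or after the item's start date."""
    start, end = window
    return max(0, (end - max(start, start_date)).days)


today = date(2024, 7, 1)
launch = date(2024, 7, 15)
print(active_days(forecast_window(today, 7), launch))       # 0
print(active_days(forecast_window(today, 7, 14), launch))   # 7
```

With the default offset, the window contains no day on which Product A is active, hence the Dirac at zero. With an offset of 14 days, the window covers exactly the first week after launch.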

The second benefit of specifying the demandStartDate is to increase the forecasting accuracy for all items, and not just the items that are yet to be launched. Indeed, observing the first unit sold on the start date for a given item is not the same thing as observing the first unit sold six months after its launch. While the former case hints at steady upcoming sales, the latter hints at a very limited demand of only a handful of units per year. The forecasting engine leverages the demandStartDate argument to refine the demand forecasts for all the items.

## Censored and inflated demand

The intent is to forecast the demand. Yet, frequently, historical data only approximates the real demand, thus creating distortions (willingly or not). For example, historical data might be represented by historical sales. However, in the case of stock-outs, sales volumes drop while the demand itself might remain steady. Lokad's forecasting engine is natively designed to take care of these distortions, and this is the very purpose of both the censoredDemandDate and inflatedDemandDate arguments. Both arguments expect a date vector of item affinity, that is (Id, Date) in Envision terminology.

When a date for a given item is marked as censored through censoredDemandDate, the forecasting engine will assume that the demand is greater than or equal to the observed value. The engine makes no assumptions on how high the demand would have been on this particular day, as this value can never be known. Yet, by pinpointing the bias, the engine can roll out entire classes of optimization tailored around this case. In practice, the most common cause of censored demand is stock-outs, as observed through sales data that does not capture information on prospects who silently walk away while goods are missing.

Similarly, demand can be inflated. The inflatedDemandDate argument offers the possibility to pinpoint the dates and the items where the demand should be considered as lower than or equal to the observed demand. Again, the real demand remains unknowable, but pinpointing the bias is already very helpful to the forecasting engine. In practice, demand is inflated when there are temporary non-recurrent market boosts: for example, the exceptional victory of a local sports team in a national championship may boost the sales of local supermarkets for a few days.
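To illustrate why flagging the bias helps even though the real demand stays unknown, here is a Python sketch of how a one-sided observation can be scored against a candidate demand rate. The Poisson model is an assumption chosen for simplicity; the document does not describe the engine's internal model, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k units under a Poisson demand with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def likelihood(observed, rate, flag="exact"):
    """Likelihood of an observation that is exact, censored or inflated."""
    if flag == "censored":   # real demand >= observed value
        return 1.0 - sum(poisson_pmf(k, rate) for k in range(observed))
    if flag == "inflated":   # real demand <= observed value
        return sum(poisson_pmf(k, rate) for k in range(observed + 1))
    return poisson_pmf(observed, rate)


# A day with zero sales, scored against a demand rate of 5 units per day.
print(likelihood(0, 5.0))              # about 0.0067: strong evidence against the rate
print(likelihood(0, 5.0, "censored"))  # 1.0: a stock-out day says nothing against it
```

Under this simplified model, an unflagged zero-sales day counts as strong evidence of low demand, whereas the same day flagged as censored is compatible with any demand rate.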

The two arguments inflatedDemandDate and censoredDemandDate can take one or two vectors as input. If two date vectors are provided, the pairs (start, end) are treated as inclusive segments, with the first date being the start of the segment and the second date being the end of the segment. If only one date vector is provided, segments are considered to be 1-day long: the dates flag the exact days to be considered as inflated or censored.
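The one-vector and two-vector conventions can be sketched in Python (not Envision) as follows. The function name `flagged_days` is hypothetical; the sketch only shows which calendar days end up flagged under the inclusive-segment rule.

```python
from datetime import date, timedelta


def flagged_days(starts, ends=None):
    """Expand inclusive (start, end) segments into the set of flagged days.

    When only one vector is given, every segment is 1-day long.
    """
    if ends is None:
        ends = starts
    days = set()
    for start, end in zip(starts, ends):
        day = start
        while day <= end:
            days.add(day)
            day += timedelta(days=1)
    return days


# Two vectors: one stock-out from March 1st to March 3rd, both days included.
print(sorted(flagged_days([date(2024, 3, 1)], [date(2024, 3, 3)])))
# One vector: two isolated stock-out days.
print(sorted(flagged_days([date(2024, 3, 1), date(2024, 3, 5)])))
```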

If demand censorship or inflation is recurrent (per year, per week, etc.), there is no need to mark the demand as such, as the forecasting engine handles such patterns automatically.

## Forecasting promotions

The forecasting engine offers native support for promotions. Providing data about promotions is optional. However, when promotions data is provided, promotions are expected to be specified both in the past and in the future. At the very least, the argument promotionDate can be provided alone. The argument promotionDate follows the same usage pattern as censoredDemandDate: when a single date vector is provided, promotional periods are considered to be 1-day long; if two date vectors are provided, the first vector represents inclusive start dates, while the second represents inclusive end dates.

The promotionDiscount argument is optional, and can be provided in order to help the forecasting engine gain insights about the intensity of a given promotion. A number vector is expected for this argument, and the forecasting engine treats this data as ordinal values: the greater the discount, the greater the expected promotional impact. In practice, it is the forecasting engine that computes the expected demand uplift based on uplifts observed for past promotions.

The promotionCategory argument is also optional, and can be provided as a classification of promotional events. When provided, this argument is leveraged by the forecasting engine to test the affinities between promotional events and to detect whether events marked within the same category achieve similar demand uplifts. This argument is very similar in spirit to the category argument, except that it is applied to promotions instead of being applied to items.

Caveat lector. Promotions are notoriously difficult to forecast even with excellent historical data. Lokad's experience indicates that most companies do not have high-accuracy promotional data readily available. That being said, such data can be obtained through careful preparation at the later stages of a project. As a rule of thumb, gathering promotional data that is good enough to actually improve the accuracy of a forecast requires a significant amount of effort. Feeding the forecasting engine with approximate promotional data only decreases the resulting accuracy.

When promotions data is provided, the periods relating to promotional activity should typically not be flagged through inflatedDemandDate. Flagging a period through both promotionDate and inflatedDemandDate has a subtle meaning: it indicates that the promotional uplift has been inflated beyond what would be reasonably expected from a promotion, and the promotion itself would be considered as biased.

## Covariables

Covariables represent a fairly advanced mechanism for conveying information to the forecasting engine. In practice, it is not recommended to use covariables, because obtaining satisfying results with them generally proves to be quite complicated. Intuitively, covariables are designed to take advantage of indicators that accurately and reliably anticipate the demand. For most businesses there are no such indicators, and the demand history itself remains the best signal there is for anticipating future demand. However, there are some verticals where such a signal does exist.

Contoso Inc is a maintenance company taking care of wind turbines. The company does not build wind turbines, but wins tenders for large maintenance contracts. In order to service the wind turbines, Contoso needs to maintain its stock of spare parts. The quantity of spare parts required depends almost linearly on the number of wind turbines that need to be maintained. Because the results of tenders are known months prior to the contract start dates, Contoso uses the number of wind turbines as a covariable to help refine its spare parts needs.

At most one covariable can be assigned per item, and this is done through the covariable argument. All covariables are expected to be defined within a single table containing 3 columns:

• covariableName which acts as a foreign key in relation to the covariable argument
• covariableDate which represents the date associated with the covariable, possibly in the future
• covariableValue which represents the value of the covariable at the specified date

The last argument is a Boolean flag named covariableObserved, which expects a Boolean scalar. The default value is false. When this argument is set to true, the forecasting engine assumes that, from a historical perspective, the covariable values were not known before their date, and hence that the values were observed. Incorrectly setting this flag would mislead the forecasting engine into assuming that covariable data is available in advance when this might not be the case.
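One way to read the covariableObserved flag is as a rule on which covariable values may be used when reasoning about a given past date. The Python sketch below (not Envision) follows that reading; it is an interpretation of the paragraph above rather than a description of the engine's internals, and the table contents and the name `usable_values` are hypothetical.

```python
from datetime import date

# Hypothetical covariable table: (covariableName, covariableDate, covariableValue).
covariables = [
    ("Turbines", date(2024, 1, 1), 120),
    ("Turbines", date(2024, 2, 1), 150),
    ("Turbines", date(2024, 3, 1), 180),
]


def usable_values(rows, name, as_of, observed):
    """Covariable values that may be relied upon on the date as_of.

    observed=True: a value only becomes known on its own date.
    observed=False: values are assumed to be known in advance.
    """
    return [
        (day, value)
        for row_name, day, value in rows
        if row_name == name and (not observed or day <= as_of)
    ]


print(usable_values(covariables, "Turbines", date(2024, 2, 15), observed=True))
print(usable_values(covariables, "Turbines", date(2024, 2, 15), observed=False))
```

In the Contoso example, tender results are known months ahead of the contract start dates, which corresponds to the default value of false.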