
The stock reward function quantifies the expected returns, both positive and negative, of holding a certain number of units in stock. Fundamentally, the stock reward function answers the question of how much the company stands to gain or lose by holding a given number of units in stock.

While the probabilities of a probabilistic demand forecast give a detailed picture of the future, they don't tell us anything about the decisions to be made as far as inventory is concerned. Inventory decisions cannot be based on demand probabilities alone; the financial risks should be factored in too.

For example, let’s consider two products having the same probabilities of demand. If the first product is long-lived while the second has a short shelf-life, then, from an inventory perspective, it makes sense to keep more units in stock for the long-lived product.

The stock reward is a mathematical function that computes the profitability of adding one more unit of stock for a given item, by taking into account a probabilistic forecast of the future demand and a few economic variables reflecting the expected profit when servicing the item, as well as the expected costs when the unit remains in stock due to lack of demand.

Lokad considers the stock reward function to be a **cornerstone of modern inventory optimization**. The solutions brought by the stock reward function are typically superior to those obtained through the naïve approaches that consist of targeting a specific service level or a fill rate. In reality, these latter approaches ignore all downside scenarios, that is, the costs associated with not selling the items in stock.

The economic angle should not be restricted to a naive profit-maximization analysis. In particular, the costs incurred from clients experiencing stock-outs should constitute an integral part of the analysis. However, the economic approach only provides the framework for balancing inventory costs against stock-out costs; finding the right balance tends to be completely business-specific.

Let's define three variables associated with a single SKU when considering a duration that is equal to the lead-time:

- $M$ is the gross margin for selling 1 unit
- $S$ is the stock-out penalty (negative) for not serving 1 unit
- $C$ is the carrying cost penalty (negative) for not selling 1 unit in stock

These variables are

- the method fails to reflect the inventory risks associated with a future demand that may not happen. Hence, while the method may deliver good service levels, it also creates dead stock.
- the method fails to reflect the costs incurred on the client-side due to stock-outs, and also fails to demonstrate the opportunity loss of not serving the clients.
- the method fails to reflect the importance of serving the given unit and generating a profit that actually sustains the inventory itself.

Based on these considerations, let's review two simple scenarios depending on whether the demand exceeds the stock or not. Let $k$ be the number of units in stock, and let $y$ be the number of units requested by clients.

If the stock meets or exceeds the demand, that is, $k \geq y$, then the immediate reward associated with the stock is $yM+(k-y)C$. Indeed, $yM$ accounts for the $y$ units that are served with their associated rewards, while $(k-y)C$ accounts for the carrying costs of the $k-y$ units not sold at the end of the period.

If the demand exceeds the stock, that is, $k < y$, then the immediate reward is alternatively written as $kM+(y-k)S$. In this case, the first $k$ units get properly serviced and account for $kM$ in rewards, while the remaining $y-k$ units are missing and incur the stock-out penalty $(y-k)S$.
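The two scenarios above can be sketched as a small Python function. This is only an illustration of the immediate (single-period) reward; the function name and the numeric values of $M$, $S$ and $C$ are made up for the example.

```python
def immediate_reward(k, y, M, S, C):
    """Single-period reward for k units in stock facing a demand of y units."""
    if k >= y:
        # All y units are served; the k - y leftover units incur carrying costs.
        return y * M + (k - y) * C
    # Only k units are served; the y - k missing units incur the stock-out penalty.
    return k * M + (y - k) * S


# Hypothetical economic variables: S and C are negative, as in the text.
M, S, C = 10.0, -5.0, -1.0

print(immediate_reward(5, 3, M, S, C))  # 3 * 10 + 2 * (-1) = 28.0
print(immediate_reward(3, 5, M, S, C))  # 3 * 10 + 2 * (-5) = 20.0
```

Both branches agree when $k = y$, since the leftover and the shortfall are then both zero.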

We define the stock reward function as:

$$R(t, k)= \begin{cases} kM+(y_t-k)S & \text{if } y_t \geq k \text{ (stock-out)} \\ y_tM+(k-y_t)C + \alpha R^*(t+1, k-y_t) & \text{if } y_t < k \text{ (leftover)} \end{cases}$$

where:

- $k$ is the number of units held in stock
- $y_t$ is the demand for the period $t$
- $M$, $S$ and $C$ are the economic variables introduced previously
- $\alpha$ is a discount factor which will be discussed below
- $R^*$ is identical to $R$ but with $S=0$, and will also be discussed below
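To make the recursion concrete, here is a minimal Python sketch that evaluates $R(t, k)$ along a known demand sequence. Two things are assumptions of the sketch rather than part of the definition: the demand path is given in advance (in reality it is unknown), and the recursion stops at the end of the list, with no reward beyond it. The function and parameter names are hypothetical.

```python
def stock_reward(t, k, demand, M, S, C, alpha, penalize_stockout=True):
    """R(t, k) along a known demand path; R* is the same with S forced to 0."""
    if t >= len(demand):
        # Assumption of this sketch: nothing is earned or lost past the horizon.
        return 0.0
    y = demand[t]
    s = S if penalize_stockout else 0.0
    if y >= k:
        # Stock-out case: k units served, y - k units missing.
        return k * M + (y - k) * s
    # Leftover case: the k - y unsold units are carried to the next period,
    # where the recursion uses R* (no stock-out penalty).
    future = stock_reward(t + 1, k - y, demand, M, S, C, alpha,
                          penalize_stockout=False)
    return y * M + (k - y) * C + alpha * future


# Hypothetical values: 5 units in stock, demand of 3 then 4 units.
# Period 0: 3 * 10 + 2 * (-1) = 28; period 1 (R*): 2 * 10 = 20.
print(stock_reward(0, 5, [3, 4], M=10.0, S=-5.0, C=-1.0, alpha=0.5))  # 28 + 0.5 * 20 = 38.0
```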

At first glance, this formula may look a bit overwhelming, but it's actually a straightforward model of a single SKU with $k$ units in stock confronting a demand of $y_t$ units. In fact, except for the $\alpha R^*(t+1, k-y_t)$ component, this expression is just like the immediate reward that we have detailed in the previous section.

Then, in order to take all the subsequent time periods into account, there are two twists. First, we have a recursive call to the reward function itself, signifying that the reward is the sum of the rewards (or losses) for the next time period plus all the rewards (or losses) for all the time periods that follow. At first, it might look puzzling to have a function that "walks" indefinitely into the future, but it merely reflects the fact that unsold inventory is carried over from one time period to the next.

Second, we introduce $\alpha$ as a discount factor for future rewards. This approach is inspired by the discounted cash flow concept, which reflects the fact that a profit generated in a distant future has less value than a profit generated in the very near future. The same logic applies to costs as well: an immediate cost has more impact than a cost incurred in a distant future.

Finally, the recursion is performed using $R^*$, which ignores stock-out costs, instead of $R$. This reflects the fact that it is not the "responsibility" of current stock to prevent stock-outs for any other lead time period but the current one. By definition, the lead time represents the time duration to be covered by the current stock. For the next time period, there will be, by definition, another opportunity to buy more stock (we will see how the no-reorder case can be accommodated in the following section). Therefore, the responsibility of not hitting a stock-out for a time period that follows the next one falls on a later inventory decision.

As we will see in the following section, $\hat{R}$ can be computed for practical purposes. As a matter of fact, Lokad provides a built-in function named `stockrwd` that implements this precise formula. This point is covered in greater detail in the next section.

In practice, the only measurement available is $\hat{R}$, because $R$ itself cannot be computed: the future demand is not yet known. Thus, when we refer to the stock reward function, we actually refer to its estimate $\hat{R}$ rather than to the "real" $R$ function. It should also be noted that the accuracy of the $\hat{R}$ estimate naturally depends on the accuracy of the underlying probabilistic forecasts. However, this discussion goes beyond the scope of the present document.
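As an illustration of what such an estimate can look like, the Python sketch below averages the recursive reward over a demand distribution. It is not Lokad's implementation: it assumes that $\hat{R}$ is the expected value of $R$, that demand is drawn independently from the same distribution in every period, and that the recursion is cut off after a fixed number of periods. All names (`expected_stock_reward`, `pmf`, `horizon`) are hypothetical.

```python
from functools import lru_cache


def expected_stock_reward(k, pmf, M, S, C, alpha, horizon):
    """Expected R(0, k) when pmf[y] is the probability of a demand of y units."""

    @lru_cache(maxsize=None)
    def r(t, stock, penalize):
        if t >= horizon:
            # Assumption of this sketch: the recursion is truncated at the horizon.
            return 0.0
        s = S if penalize else 0.0
        total = 0.0
        for y, p in enumerate(pmf):
            if y >= stock:
                total += p * (stock * M + (y - stock) * s)
            else:
                # Leftover units are carried over; R* (no stock-out penalty) applies.
                total += p * (y * M + (stock - y) * C
                              + alpha * r(t + 1, stock - y, False))
        return total

    return r(0, k, True)


# Hypothetical forecast: demand of 0, 1 or 2 units with these probabilities.
pmf = [0.2, 0.5, 0.3]
rewards = [expected_stock_reward(k, pmf, 10.0, -5.0, -1.0, 0.5, horizon=3)
           for k in range(5)]
# Reward increment of each extra unit of stock.
increments = [rewards[k] - rewards[k - 1] for k in range(1, 5)]
print(increments)
```

The list of increments is the quantity of interest when deciding whether one more unit is worth holding.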

`stockrwd` is an Envision function provided by Lokad that implements the stock reward function (or rather its probabilistic estimation), given that a probabilistic forecast is readily available. The script below computes the reward increment of each extra unit of stock, one component at a time:

```envision
// margin reward component
RM = stockrwd.m(Demand, AM) * M
// stockout penalty component
RS = stockrwd.s(Demand) * S
// carrying cost component
RC = stockrwd.c(Demand, AC) * C
// recomposing the stock reward
// with point-wise additions
R = RM + RS + RC
```

Envision decomposes the stock reward function into its three components. As the components are linear with respect to their respective economic variables, the economic variables are kept outside the calls to the `stockrwd` functions. This decomposition facilitates the inspection of the economic quantities generated by the stock reward, and makes it easier to tune the economic assumptions that drive the calculation.

The first argument `Demand` is expected to be a distribution. This distribution represents the probabilistic demand and is typically produced by the forecasting engine.

The three variables `M`, `S` and `C` are the economic variables defined at the beginning of this document. In practice, `S` and `C` are expected to be negative. The arguments `AM` and `AC` are two distinct discount factors, and both are expected to lie in the interval $[0, 1)$.

The script yields `R`, a distribution which reflects $k \to R(k) - R(k-1)$. Beware, this distribution is not a random variable, but an economic reward function.

Let's review a typical definition of the economic variables:

```envision
M = SellPrice - BuyPrice
// 0.5 arbitrary
S = -0.5 * (SellPrice - BuyPrice)
// 0.3 arbitrary
C = -0.3 * BuyPrice * mean(Leadtime) / 365
// 'AM' for margin component
AM = 0.3
// 'AC' for carrying cost component
AC = 1 - 0.2 * mean(Leadtime) / 365
```

We have:

- `M` is defined as the gross margin per unit.
- `S` is arbitrarily defined as a penalty of 0.5 times the gross margin, hence the negative sign. Naturally, the impact may vary from one industry vertical to the next depending on client tolerance for stock-outs.
- `C` is expressed as annual carrying costs accounting for 30% of the initial purchase price per year. The factor `C` reflects periods of `mean(Leadtime)` days instead of years.
- `AM`, the discount factor for margin reward, is expressed as a step decay of 70% from one period to the next.
- `AC`, the discount factor for carrying cost, is expressed as a 20% annual discount on future rewards. Likewise, the value is scaled to fit the lead time through `mean(Leadtime) / 365`.
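For a sense of the orders of magnitude, the same definitions can be evaluated by hand. The Python sketch below uses made-up prices and a made-up mean lead time of 73 days (one fifth of a year); unlike the Envision script, it treats the lead time as a single number rather than a distribution.

```python
# Hypothetical inputs, chosen only to keep the arithmetic simple.
sell_price = 25.0
buy_price = 15.0
mean_lead_time_days = 73.0  # 73 / 365 = 0.2 year

M = sell_price - buy_price                          # gross margin per unit: 10.0
S = -0.5 * (sell_price - buy_price)                 # stock-out penalty: -5.0
C = -0.3 * buy_price * mean_lead_time_days / 365    # carrying cost per period: about -0.9
AM = 0.3                                            # margin discount factor
AC = 1 - 0.2 * mean_lead_time_days / 365            # carrying cost discount factor: about 0.96

print(M, S, round(C, 4), AM, round(AC, 4))
```

With a lead time of one fifth of a year, the 30% annual carrying cost becomes 6% of the purchase price per period, and the 20% annual discount becomes a 4% discount per period.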

In practice, the probabilistic lead times are also expected to be forecast by the forecasting engine. Consequently, in the Envision script above, we assume that `Leadtime` is a distribution.

The discount factor `A` as documented above is not intended to be used in the same manner for the three components of the stock reward function.

- For the margin component, the script above uses `AM = 0.3`.
- For the stock-out component, `stockrwd.s` takes no discount factor.
- For the carrying cost component `AC`, we are suggesting a 20% annual discount because inventory only generates costs over time, and because there is the opportunity cost to be considered: the money invested now to buy stock won't be available later on when the future demand has been observed.

When backorders are present, the recomposition can be written as follows:

```envision
MB = 0.5 * SellPrice // arbitrary
SB = 0.5 * SellPrice // arbitrary
MBU = MB * uniform(1, Backorder)
SBU = SB * uniform(1, Backorder)
RM = MBU + (stockrwd.m(Demand, AM) * M) >> Backorder
RS = SBU + zoz(stockrwd.s(Demand) * S) >> Backorder
RC = (stockrwd.c(Demand, AC) * C) >> Backorder
R = RM + RS + RC // plain recomposition
```

The two economic variables `MB` and `SB` represent the per-unit margin and stock-out penalty for the backordered units themselves. We could have used `M` and `S` instead, but as indicated above, back orders are typically considered as more important than just regular orders.

The script extensively leverages the `>>` shift operator provided by Envision. Indeed, as the backordered quantities are assumed to be known demand, the distribution of rewards is shifted to the right accordingly. Beware, shifting the demand first, i.e. shifting `Demand` itself, would not yield the same results. Indeed, shifting the demand would tell the stock reward that at every period in the future, the `Backorder` quantity would be guaranteed demand.


The final stock reward, not represented above, would be obtained by summing the three components of the stock reward function. The resulting distribution would be interpreted as the ROI for each extra unit of stock to be acquired. This distribution typically starts with positive values, the first units of stock being profitable, but converges to negative infinity as we move to higher stock levels, given the unbounded carrying costs.
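The recomposition and its interpretation can be sketched in a few lines of Python. The component values below are invented for illustration (they are not outputs of `stockrwd`), and the stopping rule, which keeps adding units while the reward of the next unit is strictly positive, is one simple reading of the result rather than a prescribed policy.

```python
def units_worth_stocking(marginal_rewards):
    """Count the leading units whose marginal reward is strictly positive."""
    count = 0
    for reward in marginal_rewards:
        if reward <= 0:
            break
        count += 1
    return count


# Made-up per-unit components: index 0 is the first unit of stock.
RM = [9.0, 7.5, 4.0, 1.5, 0.4]       # margin component
RS = [5.0, 3.0, 1.0, 0.2, 0.0]       # stock-out component (penalty avoided)
RC = [-0.5, -1.0, -2.0, -3.5, -5.0]  # carrying cost component

# Point-wise sum of the three components.
R = [m + s + c for m, s, c in zip(RM, RS, RC)]

print(units_worth_stocking(R))  # the fourth unit is the first with a negative reward
```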

The term _support_ classically refers to the demand levels associated with non-zero probabilities. In the graphs above, the term _support_ is used loosely to refer to the entire range that needs to be processed as non-zero values by Envision. In particular, it’s worth mentioning that there are multiple calculations that require the distribution support to be extended in order to make sure that the final resulting distribution isn’t truncated.

- The shift operation, which happens when backorders are present, requires the support to be increased by the number of backordered units.
- The margin and carrying cost components of the stock reward function have no theoretical limits on the right, and can require arbitrarily large extensions of the support.
- Ordering constraints, such as MOQs, may require having inventory levels that are even greater than the ones reached by the shifted distributions. Properly assessing the tail of the distribution is key for estimating whether the MOQ can be profitably satisfied or not.

One notable insight of the illustration above is the need to extend the calculation of the stock reward function beyond the range of non-zero demand. Indeed, whenever MOQs are present, the company can be forced to buy goods beyond a 100% service level coverage of the future demand for the next period. The stock reward function covers those situations as well. The impact of MOQs is discussed in greater detail in the following. In practice, the Envision runtime takes care of automatically adjusting the support to make sure that distributions aren’t truncated during the calculations.
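A rough way to picture the MOQ question is to add up the marginal rewards of all the units that the MOQ forces the company to buy. The Python sketch below does exactly that; the helper name, the decision rule (order only if the summed reward is positive) and the numbers are assumptions made for the example. It also shows why a truncated support is a problem: if the marginal rewards stop before the stock level reached by the MOQ, the question cannot be answered.

```python
def moq_is_profitable(marginal_rewards, stock_on_hand, moq):
    """Sum the marginal rewards of the units added by ordering exactly the MOQ."""
    start, end = stock_on_hand, stock_on_hand + moq
    if end > len(marginal_rewards):
        # The support does not reach the stock level implied by the MOQ.
        raise ValueError("marginal rewards do not cover the stock level reached by the MOQ")
    return sum(marginal_rewards[start:end]) > 0


# Made-up marginal rewards: index 0 is the first unit of stock.
marginal = [13.5, 9.5, 3.0, -1.8, -4.6, -6.0]

print(moq_is_profitable(marginal, stock_on_hand=0, moq=4))  # first units outweigh the fourth
print(moq_is_profitable(marginal, stock_on_hand=2, moq=3))  # mostly unprofitable units
```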

In this context, it's reasonable to have:

- M = 0: unless the parts are serviced for a price, there is no differentiated upside in servicing the part.
- S = *constant*: since all NO-GO parts are equally capable of grounding an aircraft, the stock-out penalty is uniform.
- C = *constant* (annualized): since most parts are long-lived, it's acceptable, in a first approach, to approximate the annual carrying costs as a constant.

In this context, we would have:

- M, the gross margin
- S, a fraction of the gross margin
- C = 0, as nothing is carried on from one period to the next
- A = 0, idem, no reward can be gained from future periods.