Stock Reward Function (Supply Chain)

By Joannès Vermorel, December 2015 (last revised May 2016)

The stock reward function quantifies the expected returns, both positive and negative, of holding a certain number of units in stock. Fundamentally, the stock reward function answers the question: "What do we get from holding one more unit in stock?" This function can be used to compose a prioritized ordering policy where all units are prioritized according to their specific economic returns. Lokad recommends using the stock reward function for most inventory optimization situations.

The not-too-technical perspective

From a purely forecasting perspective, the future demand is best represented through probabilities associated with all possible futures, that is, the probability of having a demand of 0 units, the probability of having a demand of 1 unit, and so on. These probabilities are computed for every single item (product, SKU, barcode) depending on the context.

However, while these probabilities give a detailed picture of the future, they don't tell us anything about decisions to be made as far as inventory is concerned. Inventory decisions cannot be based on demand probabilities alone; the financial risks should be factored in too.

For example, let's consider two products with the same demand probabilities. If the first product is long-lived while the second has a short shelf life, then, from an inventory perspective, it makes sense to keep more units in stock for the long-lived product.

The stock reward is a mathematical function that computes the profitability of adding one more unit of stock for a given item, by taking into account a probabilistic forecast of the future demand and a few economic variables reflecting the expected profit when servicing the item, as well as the expected costs when the unit remains in stock due to lack of demand.

Lokad considers the stock reward function to be a key ingredient in inventory optimization. The solutions brought by the stock reward function are typically superior to those obtained through naïve approaches that consist of targeting a specific service level or fill rate. Indeed, these latter approaches ignore all downside scenarios, that is, the costs associated with not selling the items in stock.

Fundamental economic factors

The stock reward analysis is an economic analysis in the sense that it seeks to establish the financial returns of an inventory position. In order to achieve this, we need to introduce a few basic economic factors that impact the returns obtained from inventory.

The economic angle should not be restricted to a naive profit-maximization analysis. In particular, the costs incurred when clients experience stock-outs should be an integral part of the analysis. However, the economic approach only provides a framework for balancing inventory costs against stock-out costs; finding the right balance itself tends to be completely business-specific.

Let's define three variables associated with a single SKU, considering a duration equal to the lead time:

  • $M$ is the profit reward for selling 1 unit
  • $S$ is the stock-out penalty (negative) for not serving 1 unit
  • $C$ is the carrying cost penalty (negative) for not selling 1 unit in stock

These variables are fundamental in the sense that no inventory optimization can take place without an estimation of these very variables. Without even a rudimentary estimation of the aforementioned variables, any ordering methodology is bound to suffer from one or more of the issues listed below:

  • the method fails to reflect the inventory risks associated with a future demand that may not happen. Hence, while the method may deliver good service levels, it creates dead stock.
  • the method fails to reflect the costs incurred on the client side due to stock-outs, and also fails to account for the opportunity loss of not serving the clients.
  • the method fails to reflect the importance of serving the given unit and generating a profit that actually sustains the inventory itself.

Based on these considerations, let's review two simple scenarios depending on whether the demand exceeds the stock or not. Let $k$ be the number of units in stock, and let $y$ be the number of units requested by clients.

If the stock exceeds the demand, that is, $k \geq y$, then the immediate reward associated with the stock is $yM+(k-y)C$. Indeed, $yM$ accounts for the $y$ units that are served with their associated rewards, while $(k-y)C$ accounts for the carrying costs of the $k-y$ units left unsold at the end of the period.

If the demand exceeds the stock, that is, $k < y$, then the immediate reward is instead $kM+(y-k)S$. In this case, the first $k$ units are properly serviced and account for $kM$ in rewards, but the remaining $y-k$ units are missing and incur the stock-out penalty $(y-k)S$.
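The two scenarios can be captured in a few lines. This is a minimal Python sketch; the function name immediate_reward is hypothetical and only mirrors the two expressions above, with $S$ and $C$ passed as negative numbers.

```python
def immediate_reward(k, y, M, S, C):
    """Immediate reward of holding k units when y units are requested.

    M is the reward per unit sold; S and C are negative penalties
    per missing unit and per unsold unit, respectively.
    """
    if k >= y:
        # Leftover: y units served, k - y units carried until period end.
        return y * M + (k - y) * C
    # Stock-out: k units served, y - k units missing.
    return k * M + (y - k) * S
```

For instance, with $M=10$, $S=-5$ and $C=-1$, holding 5 units against a demand of 3 yields $3 \times 10 + 2 \times (-1) = 28$, while holding 3 units against a demand of 5 yields $3 \times 10 + 2 \times (-5) = 20$.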

Definition of the stock reward function

In the previous section, we computed an immediate reward. However, inventory optimization is an iterated process. Units of stock that don't get sold in the next time period may get sold in the period that follows, thus generating a delayed profit. Alternatively, they may not get sold in the following period either, thus incurring further carrying costs. The stock reward function addresses this challenge by taking into account not only the next time period, but all the periods that follow.

We define the stock reward function as: $$R(t,k)= \begin{cases} kM+(y_t-k)S, & \text{if $y_t \geq k$ (stock-out)} \\ y_tM+(k-y_t)C + \alpha R^*(t+1, k-y_t), & \text{if $y_t < k$ (leftover)} \end{cases}$$ where:

  • $k$ is the number of units held in stock
  • $y_t$ is the demand for the period $t$
  • $M$, $S$ and $C$ are the economic variables introduced previously
  • $\alpha$ is a discount factor which will be discussed below
  • $R^*$ is identical to $R$ but with $S=0$, and will also be discussed below

At first glance, this formula may look a bit overwhelming, but it's actually a straightforward model of a single SKU with $k$ units in stock confronting a demand of $y_t$ units. In fact, except for the $\alpha R^*(t+1, k-y_t)$ component, this expression is just like the immediate reward that we have detailed in the previous section.

Then, in order to take all the subsequent time periods into account, there are two twists. First, the reward function calls itself recursively, meaning that the reward is the sum of the reward (or loss) for the next time period plus the rewards (or losses) for all the time periods that follow. A function that "walks" indefinitely into the future may look puzzling at first, but it merely reflects the fact that unsold inventory is carried over from one time period to the next.

Second, we introduce $\alpha$ as a discount factor for future rewards. This approach is inspired by the discounted cash flow concept, which reflects the fact that a profit generated in a distant future has less value than a profit generated in the near future. The same logic applies to costs as well: an immediate cost has more impact than a cost incurred in a distant future.

Finally, the recursion is performed using $R^*$, which ignores stock-out costs, instead of $R$. This reflects the fact that the current stock is not "responsible" for preventing stock-outs in any lead time period other than the current one. By definition, the lead time represents the time duration to be covered by the current stock. For the next time period, there will be, by definition, another opportunity to buy more stock (we will see how the no-reorder case can be accommodated in the following section). Therefore, the responsibility for avoiding stock-outs in any subsequent period falls on a later inventory decision.
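When the demand path $y_t, y_{t+1}, \dots$ is known, the definition translates directly into a recursive function. The sketch below is a minimal Python illustration under one added assumption: since the text's $R$ walks indefinitely into the future, periods beyond the end of the supplied list are taken to contribute nothing. The name stock_reward is hypothetical.

```python
def stock_reward(demands, k, M, S, C, alpha, t=0):
    """R(t, k) for a known demand path demands[t], demands[t + 1], ...

    Periods past the end of the list contribute nothing (a truncation
    assumption; the definition in the text is open-ended).
    """
    if t >= len(demands):
        return 0.0
    y = demands[t]
    if y >= k:
        # Stock-out branch: no recursive term.
        return k * M + (y - k) * S
    # Leftover branch: recurse with R*, i.e. the same function with S = 0.
    return (y * M + (k - y) * C
            + alpha * stock_reward(demands, k - y, M, 0.0, C, alpha, t + 1))
```

For example, with 5 units, demands of 2, 1 and 3, $M=10$, $S=-5$, $C=-1$ and $\alpha=0.9$, the first period yields $17$, and the remaining 3 units yield $R^* = 8 + 0.9 \times 20 = 26$ afterwards, for a total of $17 + 0.9 \times 26 = 40.4$.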

Probabilistic estimate of the stock reward function

The expression of the stock reward function $R$ depends on the future demand $y_t$, which is typically unknown. However, $R$ can still be estimated if forecasts are available. To do so, we advise leveraging a probabilistic forecast of the future demand, that is, not just an estimate of the average future demand, but an estimate of its entire probability distribution. Based on this insight, we introduce $\hat{R}$, the empirical estimate of $R$ that relies on a probabilistic demand forecast. The function $\hat{R}$ is written as follows: $$ \begin{align} \hat{R}(t,k)= & \sum_{y |y \geq k} \mathbf{P}(Y_t=y) ( kM+(y-k)S ) \\ & + \sum_{y|y<k} \mathbf{P}(Y_t=y) ( yM+(k-y)C + \alpha \hat{R}^*(t+1, k-y) ) \end{align} $$ This expression turns the original $R$ into an expectation over the forecast distribution of the demand $Y_t$. The first line covers the stock-out scenarios, while the second line covers the leftover scenarios; each scenario is weighted by its probability.
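The formula for $\hat{R}$ can be evaluated by direct recursion over the forecast. The following is a minimal Python sketch, not Lokad's stockrwd implementation: it assumes per-period forecasts given as lists pmfs[t][y] = P(Y_t = y), and, as an extra assumption, that periods beyond the last forecast contribute nothing. The names expected_stock_reward and r_hat are hypothetical.

```python
from functools import lru_cache


def expected_stock_reward(pmfs, k, M, S, C, alpha):
    """Estimate R_hat(0, k) with the recursion given in the text.

    pmfs[t][y] is P(Y_t = y). Periods past len(pmfs) contribute nothing,
    which truncates the otherwise open-ended recursion (an assumption).
    """

    @lru_cache(maxsize=None)
    def r_hat(t, stock, penalty):
        if t >= len(pmfs):
            return 0.0
        total = 0.0
        for y, p in enumerate(pmfs[t]):
            if p == 0.0:
                continue
            if y >= stock:
                # Stock-out line: all units sold, y - stock units missing.
                total += p * (stock * M + (y - stock) * penalty)
            else:
                # Leftover line: later periods use R_hat*, i.e. penalty 0.
                total += p * (y * M + (stock - y) * C
                              + alpha * r_hat(t + 1, stock - y, 0.0))
        return total

    return r_hat(0, k, S)
```

With a single period where $\mathbf{P}(Y=0)=\mathbf{P}(Y=1)=0.5$, and $M=10$, $S=-5$, $C=-1$, one unit in stock yields $0.5 \times (-1) + 0.5 \times 10 = 4.5$. Memoization keeps repeated $(t, k)$ sub-problems from being recomputed.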

$\hat{R}$ can be computed in practice. As a matter of fact, Lokad provides a built-in function named stockrwd that implements this precise formula, as covered in greater detail in the next section.

In practice, $\hat{R}$ is the only quantity available, because $R$ cannot be computed as long as the future demand is not known. Thus, when referring to the stock reward function, we actually mean its estimate $\hat{R}$ rather than the "real" $R$ function. It should also be noted that the accuracy of the $\hat{R}$ estimate naturally depends on the accuracy of the underlying probabilistic forecasts; however, this discussion goes beyond the scope of the present document.

The stockrwd function in Envision

stockrwd is a function of Envision, Lokad's domain-specific language, that implements the stock reward function (or rather its probabilistic estimate), given that a probabilistic forecast is readily available. Since we are interested in the reward increment for the $k$-th unit in stock, the Envision function is defined as: $$\text{stockrwd}: k \to R(k)-R(k-1)$$ The corresponding Envision syntax is the following:
R = stockrwd(D, M, S, C, A)

The first argument D is expected to be a distribution. This distribution represents the probabilistic demand and is typically produced by the forecasting engine. As such, D is expected to be not just any distribution, but a random variable (total mass equal to 1).

The arguments M, S and C reflect the economic variables defined at the beginning of this document, while A is the discount factor $\alpha$. In practice, S and C are expected to be negative, and A is expected to lie in the segment $[0;1[$, that is, $0 \leq A < 1$.

The function returns R, a distribution which reflects $k \to R(k) - R(k-1)$. Note that this distribution is not a random variable. Strictly speaking, the formal definition implies that it is not even a compact distribution, since it is defined for every value of $k$. In Envision, however, R is truncated to match the support of the distribution D.
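The increments returned by stockrwd can be mimicked outside Envision under a simplifying assumption swapped in here: the same demand distribution applies to every period (stationary demand), with an infinite horizon and $0 \leq \alpha < 1$. Under that assumption, $\hat{R}^*(j)$ depends only on $j$, and the $y = 0$ outcome makes $\hat{R}^*(j)$ appear on both sides of its own equation, which is solved by dividing by $1 - \alpha \mathbf{P}(Y=0)$. This is a minimal Python sketch, not Lokad's implementation; the name stock_reward_increments is hypothetical.

```python
def stock_reward_increments(pmf, M, S, C, alpha):
    """Return {k: R(k) - R(k-1)} for k = 1 .. len(pmf) - 1.

    pmf[y] is P(Y = y) for a single period; the same distribution is
    assumed for every period (stationary demand), and 0 <= alpha < 1.
    """
    n = len(pmf) - 1  # largest demand value covered by pmf

    # r_star[j] = R*(j): infinite-horizon reward of j units with S = 0.
    # The y = 0 outcome leaves the stock unchanged, so R*(j) appears on
    # both sides of its equation; dividing by (1 - alpha * P(Y = 0))
    # solves for it.
    r_star = [0.0] * (n + 1)
    for j in range(1, n + 1):
        acc = 0.0
        for y, p in enumerate(pmf):
            if y >= j:
                acc += p * j * M                  # all j units sold
            else:
                acc += p * (y * M + (j - y) * C)  # y sold, j - y carried
                if y > 0:
                    acc += p * alpha * r_star[j - y]
        r_star[j] = acc / (1.0 - alpha * pmf[0])

    def reward(k):
        # R(k): the first period includes the stock-out penalty S.
        total = 0.0
        for y, p in enumerate(pmf):
            if y >= k:
                total += p * (k * M + (y - k) * S)
            else:
                total += p * (y * M + (k - y) * C + alpha * r_star[k - y])
        return total

    rewards = [reward(k) for k in range(n + 1)]
    # Keep only the demand values covered by pmf, mirroring the truncation
    # to the support of D described above.
    return {k: rewards[k] - rewards[k - 1] for k in range(1, n + 1)}
```

With $\mathbf{P}(Y=0)=\mathbf{P}(Y=1)=0.5$, $M=10$, $S=-5$, $C=-1$ and $\alpha=0.5$, this gives $\hat{R}^*(1) = 6$, $\hat{R}(0) = -2.5$ and $\hat{R}(1) = 6$, hence an increment of $8.5$ for the first unit.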

Let's review a typical definition of the economic variables:
M = SellPrice - BuyPrice
S = -0.5 * (SellPrice - BuyPrice) // 0.5 arbitrary
C = -0.3 * BuyPrice * mean(LeadTime) / 365 // 0.3 arbitrary
A = 1 - 0.2 * mean(LeadTime) / 365 // 0.2 arbitrary
We have:

  • M is defined as the gross margin per unit.
  • S is arbitrarily defined as minus 0.5 times the gross margin. Naturally, the appropriate factor may vary from one industry vertical to the next, depending on client tolerance for stock-outs.
  • C is derived from an annual carrying cost equal to 30% of the purchase price. The factor mean(LeadTime) / 365 scales it to a period of mean(LeadTime) days instead of a year.
  • A is expressed as a 20% annual discount on future rewards. Likewise, the value is scaled to fit the lead time through mean(LeadTime) / 365.
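The same arithmetic can be reproduced outside Envision to sanity-check the magnitudes. This Python sketch mirrors the four Envision lines above, including their arbitrary 0.5, 0.3 and 0.2 factors; the function name, the sample prices and the 73-day mean lead time are hypothetical, and the mean lead time is passed as a plain number rather than computed from a distribution.

```python
def economic_variables(sell_price, buy_price, mean_lead_time_days):
    """Return (M, S, C, A) following the illustrative Envision script."""
    period = mean_lead_time_days / 365      # lead time as a fraction of a year
    margin = sell_price - buy_price
    M = margin                              # gross margin per unit
    S = -0.5 * margin                       # stock-out penalty, 0.5 arbitrary
    C = -0.3 * buy_price * period           # 30% per year carrying cost
    A = 1 - 0.2 * period                    # 20% per year discount
    return M, S, C, A
```

With a sell price of 25, a buy price of 15 and a mean lead time of 73 days (a fifth of a year), this yields $M=10$, $S=-5$, $C \approx -0.9$ and $A \approx 0.96$.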

In practice, lead times are also expected to be forecast probabilistically by the forecasting engine. Consequently, in the illustrative example, LeadTime is assumed to be a distribution.

The backorder case

Backorders represent units that have already been sold but are not physically available in stock. When backorders are involved, the stock reward calculation should take into account not only the future demand, but also the backorders that are already present. Let's assume that the vector BackorderQty represents the quantities backordered for each item (possibly zero if there are no backorders). Then, the stock reward calculation can be adjusted as follows:
R = stockrwd(D +* dirac(BackorderQty), M, S, C, A)
The +* operator is the additive convolution, and dirac(BackorderQty) is the Dirac distribution at BackorderQty. The convolution shifts the demand distribution to the right by BackorderQty units, which represent a certain demand. Then, based on this revised demand distribution, the stock reward function proceeds as detailed above, assigning the usual stock-out penalties to those extra units of demand.
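The effect of convolving with a Dirac distribution can be seen on plain probability lists. In this Python sketch, convolve and dirac are hypothetical helpers that only imitate the behavior described for Envision's +* operator and dirac function; index $y$ of each list holds the probability of a demand of $y$ units.

```python
def convolve(p, q):
    """Additive convolution: distribution of X + Y for independent X, Y."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def dirac(n):
    """Distribution putting all of its mass on the value n."""
    return [0.0] * n + [1.0]


demand = [0.2, 0.5, 0.3]               # P(Y=0), P(Y=1), P(Y=2)
shifted = convolve(demand, dirac(2))   # two units already backordered
```

Here shifted is [0.0, 0.0, 0.2, 0.5, 0.3]: the same probabilities, moved two units to the right, so a demand of fewer than 2 units becomes impossible.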

Grid-flavored syntax

The stockrwd() function also supports a grid-flavored syntax:
Grid.Reward = stockrwd(Id, Grid.Probability, Grid.Min, Grid.Max, M, S, C, A)

where the first four arguments Id, Grid.Probability, Grid.Min and Grid.Max merely represent the probabilistic forecasts produced by Lokad's forecasting engine. The other arguments remain as described above. However, whenever possible, the distribution-flavored stockrwd() syntax described above should be preferred.

Properties of the stock reward function

The stock reward function can be written as $R(k, M, S, C)$ to make the economic variables explicit. The stock reward function is additive with respect to its components: $$\begin{align} R(k, M, S, C) = & R(k, M, 0, 0) + \\ & R(k, 0, S, 0) + \\ & R(k, 0, 0, C) \end{align}$$ Moreover, the stock reward function is linear with respect to its parameters $M$, $S$ and $C$: $$\begin{align} R(k, aM, bS, cC) = & aR(k, M, 0, 0) + \\ & bR(k, 0, S, 0) + \\ & cR(k, 0, 0, C) \end{align}$$ These properties naturally extend to stockrwd, the Envision function provided by Lokad.
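A short induction argument shows why linearity holds, sketched here under the assumption of a finite horizon beyond which $R^*$ is zero. Write $R^*(t, k, M, C)$ for $R$ with $S = 0$. For a fixed demand path, the branch taken depends only on $y_t$ and $k$, never on the economic variables, and scaling them gives:

$$R(t,k,aM,bS,cC) = \begin{cases} a\,kM + b\,(y_t-k)S, & \text{if } y_t \geq k \\ a\,y_tM + c\,(k-y_t)C + \alpha R^*(t+1,k-y_t,aM,cC), & \text{if } y_t < k \end{cases}$$

In the last period, $R^*(t+1, \cdot)$ vanishes, so both branches are linear combinations of the $a$, $b$ and $c$ terms. If $R^*(t+1, \cdot)$ is linear in $(M, C)$, the leftover branch is linear as well, and the property propagates backward to every period. Since $\hat{R}$ is a probability-weighted sum of these branches, it inherits linearity, and additivity is the special case $a = b = c = 1$.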

Reviews of typical situations

The stock reward function is straightforward in the sense that it reflects the financial outcome of an inventory situation in a relatively minimal way. Indeed, as we have seen, removing anything from this model would be unwise, as the model would then no longer reflect the three basic states of an SKU unit: sold, missing or in stock. By adjusting the economic variables, the stock reward function can be tailored to situations specific to various industry verticals.


In the aerospace industry, spare parts are required to keep aircraft properly maintained. A missing NO-GO spare part triggers AOG (Aircraft On Ground) incidents that typically cost a lot more than the spare part itself.

In this context, it's reasonable to have:

  • M = 0: unless the parts are serviced for a price, there is no differentiated upside in servicing the part.
  • S = constant: since all NO-GO parts are equally capable of grounding an aircraft, the stock-out penalty is uniform.
  • C = constant (annualized): since most parts are long-lived, it's acceptable, as a first approximation, to treat the annual carrying costs as constant.


Regarding newspaper inventory, we consider an item that can only be sold during the next time period, and that loses its entire market value at the next iteration. Newspapers represent the archetype of such items, but they are not a unique case. Similar behaviors are also observed for highly seasonal and perishable items that offer only a very narrow time window to be sold.

In this context, we would have:

  • M, the gross margin
  • S, a fraction of the gross margin
  • C = 0, as nothing is carried over from one period to the next
  • A = 0, likewise, as no reward can be gained from future periods
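With A = 0, the recursive term disappears and the increment for the $k$-th unit has a simple closed form, derived here from the immediate-reward expressions rather than stated in the original: $R(k) - R(k-1) = (M - S)\,\mathbf{P}(Y \geq k) + C\,\mathbf{P}(Y < k)$. The following Python sketch evaluates it for a single-period demand distribution; the function name is hypothetical, and C defaults to 0 as in the newspaper case above.

```python
def newspaper_increments(pmf, M, S, C=0.0):
    """R(k) - R(k-1) for a single period (A = 0), k = 1 .. len(pmf) - 1.

    pmf[y] is P(Y = y) and is assumed to sum to 1.
    """
    increments = {}
    tail = 1.0  # P(Y >= 0)
    for k in range(1, len(pmf)):
        tail -= pmf[k - 1]  # now P(Y >= k)
        # The k-th unit is sold (gain M, avoid penalty S) when Y >= k,
        # and is left over (carrying cost C) otherwise.
        increments[k] = (M - S) * tail + C * (1.0 - tail)
    return increments
```

For a demand of 0, 1 or 2 newspapers with probabilities 0.25, 0.25 and 0.5, a margin $M=10$ and a penalty $S=-2$, the first and second units are worth 9.0 and 6.0 respectively.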