Stock Reward Function (Supply Chain)

By Joannès Vermorel, December 2015 (last revised Feb 2017)

The stock reward function quantifies the expected returns, both positive and negative, of holding a certain number of units in stock. Fundamentally, the stock reward function answers the question: "What do we get from holding one more unit in stock?" This function can be used to compose a prioritized ordering policy, where all units are prioritized according to their specific economic returns. Lokad recommends using the stock reward function for most inventory optimization situations.

The not too technical perspective

From a pure forecasting perspective, the future demand is best represented through probabilities associated with all possible futures; that is, the probability of having a demand of 0 units, the probability of having a demand of 1 unit, and so on, with such probabilities being computed for every single item (product, SKU, barcode) depending on the context.

However, while these probabilities give a detailed picture of the future, they don't tell us anything about decisions to be made as far as inventory is concerned. Inventory decisions cannot be based on demand probabilities alone; the financial risks should be factored in too.

For example, let’s consider two products having the same probabilities of demand. If the first product is long-lived while the second has a short shelf-life, then from an inventory perspective, it makes sense to keep more units in stock for the long-lived product.

The stock reward is a mathematical function that computes the profitability of adding one more unit of stock for a given item, by taking into account a probabilistic forecast of the future demand, and a few economic variables reflecting the expected profit when servicing the item, as well as the expected costs when the unit remains in stock due to lack of demand.

Lokad considers the stock reward function to be a cornerstone of modern inventory optimization. The solutions brought by the stock reward function are typically superior to those obtained through the naïve approaches that consist of targeting a specific service level or a fill rate. In reality, these latter approaches ignore all downside scenarios, that is, the costs associated with not selling the items in stock.

Economic factors of the stock reward

The stock reward analysis is an economic analysis in the sense that it seeks to establish the financial returns of an inventory position. In order to achieve this, we need to introduce a few basic economic factors that impact the returns obtained from inventory.

The economic angle should not be restricted to a naive profit-maximization analysis. In particular, the costs incurred from clients experiencing stock-outs should constitute an integral part of the analysis. However, the economic approach only provides the framework, which aims to balance inventory costs with stock-out costs. Finding the right balance itself tends to be completely business-specific.

Let's define three variables associated with a single SKU when considering a duration that is equal to the lead-time:

  • $M$ is the gross margin for selling 1 unit
  • $S$ is the stock-out penalty (negative) for not serving 1 unit
  • $C$ is the carrying cost penalty (negative) for not selling 1 unit in stock

These variables are fundamental in the sense that no inventory optimization can take place without an estimation of these very variables. Without even a rudimentary estimation of the aforementioned variables, any ordering methodology is bound to suffer from one or more of the issues listed below:

  • the method fails to reflect the inventory risks associated with a future demand that may not happen. Hence, while the method may deliver good service levels, it creates dead stock.
  • the method fails to reflect the costs incurred on the client-side due to stock-outs, and also fails to demonstrate the opportunity loss of not serving the clients.
  • the method fails to reflect the importance of serving the given unit and generating a profit that actually sustains the inventory itself.

Based on these considerations, let's review two simple scenarios depending on whether the demand exceeds the stock or not. Let $k$ be the number of units in stock, and let $y$ be the number of units requested by clients.

If the stock covers the demand, that is, $k \geq y$, then the immediate reward associated with the stock is $yM+(k-y)C$. Indeed, $yM$ accounts for the $y$ units that are served with their associated rewards, while $(k-y)C$ accounts for the carrying costs of the $k-y$ units not sold at the end of the given period.

If the demand exceeds the stock, that is, $k < y$, then the immediate reward is instead written as $kM+(y-k)S$. In this case, the first $k$ units get properly serviced and account for $kM$ in rewards, but then $y-k$ units are missing and incur the $(y-k)S$ stock-out penalty.
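
The two scenarios above can be sketched in a few lines of Python. This is only an illustration of the formulas, not Lokad's implementation; the function name and the numeric values are hypothetical.

```python
def immediate_reward(k: int, y: int, M: float, S: float, C: float) -> float:
    """Single-period reward for k units in stock facing a demand of y units.

    M is the per-unit gross margin; S and C are the (negative) stock-out
    and carrying cost penalties defined above.
    """
    if y >= k:
        # Demand reaches or exceeds the stock: k units sold, y - k units missed.
        return k * M + (y - k) * S
    # Leftover: y units sold, k - y units remain in stock.
    return y * M + (k - y) * C


# Hypothetical figures: margin 10, stock-out penalty -5, carrying cost -1.
print(immediate_reward(k=5, y=3, M=10.0, S=-5.0, C=-1.0))  # 3*10 + 2*(-1) = 28.0
print(immediate_reward(k=5, y=8, M=10.0, S=-5.0, C=-1.0))  # 5*10 + 3*(-5) = 35.0
```

When $k = y$, both formulas give $kM$, so it does not matter which branch handles the equality.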

Definition of the stock reward function

In the previous section we computed an immediate reward. However, inventory optimization is an iterated process. Units of stock that don't get sold in the next time period may get sold in the period that follows, thus generating a delayed profit. Alternatively, units that don't get sold in the next time period may not get sold in the period that follows either, thus incurring further delayed carrying costs. The stock reward function addresses this challenge by taking into account not only the next time period, but all the periods that follow.

We define the stock reward function as: $$R(t, k)= \begin{cases} kM+(y_t-k)S & \text{if $y_t \geq k$ (stockout)} \\ y_tM+(k-y_t)C + \alpha R^*(t+1, k-y_t) & \text{if $y_t < k$ (leftover)} \end{cases}$$ where:

  • $k$ is the number of units held in stock
  • $y_t$ is the demand for the period $t$
  • $M$, $S$ and $C$ are the economic variables introduced previously
  • $\alpha$ is a discount factor which will be discussed below
  • $R^*$ is identical to $R$ but with $S=0$, and will also be discussed below

At first glance, this formula may look a bit overwhelming, but it's actually a straightforward model of a single SKU with $k$ units in stock confronting a demand of $y_t$ units. In fact, except for the $\alpha R^*(t+1, k-y_t)$ component, this expression is just like the immediate reward that we have detailed in the previous section.

Then, in order to take all the subsequent time periods into account, there are two twists. First, we have a recursive call to the reward function itself, signifying that the reward is the sum of the rewards (or losses) for the next time period plus all the rewards (or losses) for all the time periods that follow. At first, it might look puzzling to have a function that "walks" indefinitely into the future, but it merely reflects the fact that unsold inventory is carried over from one time period to the next.

Second, we introduce $\alpha$ as a discount factor for future rewards. This approach is inspired by the discounted cash flow concept that reflects the fact that a profit generated in a distant future has less value than a profit generated in a very near future. Conversely, the same logic applies for costs as well: an immediate cost is more impacting than a cost that is incurred in a distant future.

Finally, the recursion is performed using $R^*$, which ignores stock-out costs, instead of $R$. This reflects the fact that it is not the "responsibility" of current stock to prevent stock-outs for any other lead time period but the current one. By definition, the lead time represents the time duration to be covered by the current stock. For the next time period, there will be, by definition, another opportunity to buy more stock (we will see how the no-reorder case can be accommodated in the following section). Therefore, the responsibility of not hitting a stock-out for a time period that follows the next one falls on a later inventory decision.
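
To make the recursion concrete, here is a minimal Python sketch that evaluates $R$ on a known demand path. It assumes, for illustration only, that the path is finite and that nothing is gained or lost beyond its last period; the function name and the figures are hypothetical.

```python
from typing import Sequence


def stock_reward(k: int, demand: Sequence[int], M: float, S: float, C: float,
                 alpha: float, t: int = 0) -> float:
    """R(t, k) for a known demand path demand[0], demand[1], ...

    Assumption of this sketch: the reward beyond the end of the path is 0.
    """
    if t >= len(demand):
        return 0.0
    y = demand[t]
    if y >= k:
        # Stock-out: k units sold, y - k units missed.
        return k * M + (y - k) * S
    # Leftover: the recursion uses R*, that is, R with S = 0.
    future = stock_reward(k - y, demand, M, 0.0, C, alpha, t + 1)
    return y * M + (k - y) * C + alpha * future


# 5 units in stock, demand of 3, then 1, then 4 units.
print(stock_reward(5, [3, 1, 4], M=10.0, S=-5.0, C=-1.0, alpha=0.5))  # 35.0
```

In this example the last period faces a demand of 4 with a single unit left, yet no stock-out penalty is charged for it, because the recursion runs on $R^*$.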

Probabilistic estimate of the stock reward function

The expression of the stock reward function $R$ depends on the future demand $y_t$, which is typically unknown. However, $R$ can still be estimated if forecasts are available. To do so, we advise leveraging a probabilistic forecast of future demand, that is, not just an estimate of the average future demand, but an estimate of the entire probability distribution. Based on this insight, we can introduce $\hat{R}$, the empirical estimate of $R$ that relies on a probabilistic demand forecast. The function $\hat{R}$ is written as follows: $$ \begin{align} \hat{R}(t,k)= & \sum_{y |y \geq k} \mathbf{P}(Y_t=y) ( kM+(y-k)S ) \\ & + \sum_{y|y < k} \mathbf{P}(Y_t=y) ( yM+(k-y)C + \alpha \hat{R}^*(t+1, k-y) ) \end{align} $$ This expression turns the original definition of $R$ into an expectation over the demand distribution. The first line reflects the stock-out scenarios while the second line reflects the stock left-over scenarios. Each scenario is weighted by its probability.

As we will see below, $\hat{R}$ can be computed for practical purposes: Lokad provides a built-in function named stockrwd that implements this precise formula. This function is covered in greater detail in a later section.

In practice, the only measurement available is $\hat{R}$, because $R$ cannot be computed while the future demand is not yet known. Thus, when we refer to the stock reward function, we actually mean its estimate $\hat{R}$ rather than the "real" $R$ function. It should also be noted that the accuracy of the $\hat{R}$ estimate naturally depends on the accuracy of the underlying probabilistic forecasts. However, this discussion goes beyond the scope of the present document.
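
The estimate $\hat{R}$ can be sketched in Python as follows. This is not how stockrwd is implemented. It is an illustration under two simplifying assumptions that are not part of the definition above: the same demand distribution applies to every period, and the recursion is cut after a fixed number of periods (the hypothetical `horizon` parameter), beyond which the reward is taken as zero.

```python
from functools import lru_cache
from typing import Sequence


def expected_stock_reward(k: int, pmf: Sequence[float], M: float, S: float,
                          C: float, alpha: float, horizon: int) -> float:
    """Estimate of R-hat(0, k), where pmf[y] = P(Y = y) for every period.

    Assumptions of this sketch: stationary demand, and a recursion
    truncated after `horizon` periods.
    """

    @lru_cache(maxsize=None)
    def r_hat(t: int, units: int, s: float) -> float:
        if t >= horizon:
            return 0.0
        total = 0.0
        for y, p in enumerate(pmf):
            if y >= units:
                # Stock-out scenarios (first line of the formula).
                total += p * (units * M + (y - units) * s)
            else:
                # Leftover scenarios; the recursion uses R-hat*, i.e. S = 0.
                future = r_hat(t + 1, units - y, 0.0)
                total += p * (y * M + (units - y) * C + alpha * future)
        return total

    return r_hat(0, k, S)


# Hypothetical forecast: P(Y=0)=0.2, P(Y=1)=0.5, P(Y=2)=0.3.
pmf = [0.2, 0.5, 0.3]
print(round(expected_stock_reward(1, pmf, 10.0, -5.0, -1.0, 0.5, horizon=1), 6))  # 6.3
print(round(expected_stock_reward(1, pmf, 10.0, -5.0, -1.0, 0.5, horizon=2), 6))  # 7.08
```

Extending the horizon from one to two periods raises the reward of the first unit, because a unit left unsold in the first period can still earn a discounted margin in the second.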

Properties of the stock reward function

The stock reward function can be written as $R(k, M, S, C)$ to emphasize the economic variables. The stock reward function is additive with respect to its components: $$\begin{align} R(k, M, S, C) = & R(k, M, 0, 0) + \\ & R(k, 0, S, 0) + \\ & R(k, 0, 0, C) \end{align}$$ Moreover, the stock reward function is linear with respect to its parameters $M$, $S$ and $C$: $$\begin{align} R(k, aM, bS, cC) = & aR(k, M, 0, 0) + \\ & bR(k, 0, S, 0) + \\ & cR(k, 0, 0, C) \end{align}$$ These properties naturally extend to stockrwd, the Envision function provided by Lokad.

The stockrwd functions in Envision

stockrwd is a function of Envision, Lokad's programming language, that implements the stock reward function (or rather its probabilistic estimation), given that a probabilistic forecast is readily available. Since we are interested in the reward increment for the kth unit in stock, we define the Envision function as: $$\text{stockrwd}: k \to R(k)-R(k-1)$$ The corresponding Envision syntax is the following:
// margin reward component
RM = stockrwd.m(Demand, AM) * M
// stockout penalty component
RS = stockrwd.s(Demand) * S
// carrying cost component
RC = stockrwd.c(Demand, AC) * C
// recomposing the stock reward
// with point-wise additions
R = RM + RS + RC
Envision decomposes the stock reward function into its three components. As the components are linear with respect to their respective economic variables, the economic variables are kept outside the call to the stockrwd() function. This decomposition facilitates the inspection of the economic quantities generated by the stock reward, and makes it easier to tune the economic assumptions that drive the calculation.

The first argument Demand is expected to be a distribution. This distribution represents the probabilistic demand and is typically produced by the forecasting engine. As such, Demand is not only expected to be a distribution, but it is also expected to be a random variable (mass equal to 1).

The three variables M, S, C are the economic variables defined at the beginning of this document. The arguments AM and AC are two distinct discount factors. In practice, S and C are expected to be negative. The two values AM and AC are also expected to lie in the interval $[0, 1)$.

The function returns R, a distribution which reflects $k \to R(k) - R(k-1)$. Beware, this distribution is not a random variable, but an economic reward function. Actually, the formal definition implies that it is not even a compact support distribution. Envision has dedicated algorithms precisely intended to handle this type of non-compact distribution.
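
To build intuition for the marginal view $k \to R(k) - R(k-1)$, the sketch below computes the three components in Python for the simplest case, a single period with no discounted future ($\alpha = 0$). In that case, subtracting the expected immediate reward at $k-1$ from the one at $k$ gives $(M - S)\,\mathbf{P}(Y \geq k) + C\,\mathbf{P}(Y < k)$. This is a simplification for illustration, not the algorithm behind stockrwd, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def marginal_components(pmf: Sequence[float], M: float, S: float, C: float,
                        max_units: int) -> List[Tuple[float, float, float]]:
    """Marginal reward of the k-th unit, for k = 1..max_units, split into
    (margin, stock-out, carrying cost) components.

    Assumption of this sketch: single period, no discounted future (alpha = 0).
    pmf[y] = P(Y = y); demand levels beyond the list have probability zero.
    """
    components = []
    p_below = 0.0  # running value of P(Y < k)
    for k in range(1, max_units + 1):
        if k - 1 < len(pmf):
            p_below += pmf[k - 1]
        p_reached = max(0.0, 1.0 - p_below)  # P(Y >= k)
        components.append((M * p_reached, -S * p_reached, C * p_below))
    return components


# Hypothetical forecast and economic variables.
for m, s, c in marginal_components([0.2, 0.5, 0.3], M=10.0, S=-5.0, C=-1.0, max_units=3):
    print(round(m, 6), round(s, 6), round(c, 6), "total:", round(m + s + c, 6))
```

With these figures the first unit is worth about 11.8, the second about 3.8, and the third about -1.0. The stock-out component is positive because each extra unit removes part of the expected penalty.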

Let's review a typical definition of the economic variables:
M = SellPrice - BuyPrice
// 0.5 arbitrary
S = -0.5 * (SellPrice - BuyPrice)
// 0.3 arbitrary
C = -0.3 * BuyPrice * mean(Leadtime) / 365
// 'AM' for margin component
AM = 0.3
// 'AC' for carrying cost component
AC = 1 - 0.2 * mean(Leadtime) / 365
We have:

  • M is defined as the gross margin per unit.
  • S is arbitrarily defined as minus 0.5 times the gross margin. Naturally, the impact may vary from one industry vertical to the next, depending on client tolerance for stock-outs.
  • C is expressed as annual carrying costs accounting for 30% of the initial purchase price per year. The factor C reflects periods of mean(Leadtime) days instead of years.
  • AM, the discount factor for margin reward, is expressed as a steep decay of 70% from one period to the next.
  • AC, the discount factor for carrying cost, is expressed as a 20% annual discount on future rewards. Likewise, the value is scaled to fit the lead time through mean(Leadtime) / 365.

In practice, the probabilistic lead times are also expected to be forecast by the forecasting engine. Consequently, in the script above, we assume that Leadtime is a distribution.
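
As a worked example of the definitions above, the following Python sketch evaluates the five variables for hypothetical figures: a selling price of 20, a purchase price of 12, and a mean lead time of 30 days. The function name and the input values are illustrative, and the 0.5, 0.3 and 0.2 coefficients are the arbitrary ones used in the script.

```python
from typing import Dict


def economic_variables(sell_price: float, buy_price: float,
                       mean_lead_time_days: float) -> Dict[str, float]:
    """Economic variables for one SKU, scaled to a period of one lead time."""
    period_fraction = mean_lead_time_days / 365  # lead time as a fraction of a year
    margin = sell_price - buy_price
    return {
        "M": margin,                            # gross margin per unit
        "S": -0.5 * margin,                     # stock-out penalty (0.5 is arbitrary)
        "C": -0.3 * buy_price * period_fraction,  # 30% annual carrying cost
        "AM": 0.3,                              # discount factor, margin component
        "AC": 1 - 0.2 * period_fraction,        # 20% annual discount, carrying cost
    }


for name, value in economic_variables(20.0, 12.0, 30.0).items():
    print(name, round(value, 4))
```

With these inputs, M is 8, S is -4, C is close to -0.296 per unit and per period, and AC is close to 0.984.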

Discount factors for margin and carrying cost

The discount factor $\alpha$, as documented above, is not intended to be used in the same manner for the three components of the stock reward function.

For the margin component of the stock reward, the opportunity remains to buy more inventory at a later stage. Hence, the discount factor should heavily penalize purchased quantities, which would only generate margin at later periods. By definition, the opportunity remains to buy more stock later on to address those future periods. This is why we suggest a heavy discount AM = 0.3.

For the stock-out penalty component of the stock reward, the discount factor is, by definition, always zero. Thus, the discount factor has no impact on this component.

For the carrying cost component of the stock reward, the stock is a decaying financial asset. For AC, we are suggesting a 20% annual discount, because inventory only generates costs over time, and because there is the opportunity cost to be considered: the money invested now to buy stock won't be available later on when the future demand has been observed.

Back orders and stock reward

Back orders complicate the situation. When back orders are present, the future demand is only partially unknown, as the back ordered quantities are assumed to be known. Also, because clients have put some extra effort into back ordering the products, fulfilling back orders is typically considered even more important than fulfilling regular orders. The script below illustrates how the stock reward function can be combined with back orders.
MB = 0.5 * SellPrice // arbitrary
SB = 0.5 * SellPrice // arbitrary

MBU = MB * uniform(1, Backorder)
SBU = SB * uniform(1, Backorder)

RM = MBU + (stockrwd.m(Demand, AM) * M) >> Backorder
RS = SBU + zoz(stockrwd.s(Demand) * S) >> Backorder
RC = (stockrwd.c(Demand, AC) * C) >> Backorder
R = RM + RS + RC // plain recomposition
The two economic variables MB and SB represent the per-unit margin and stock-out penalty for the backordered units themselves. We could have used M and S instead, but as indicated above, back orders are typically considered as more important than just regular orders.

The script extensively leverages the >> shift operator provided by Envision. Indeed, as the backordered quantities are assumed to be known demand, the distribution of rewards is shifted to the right accordingly. Beware, shifting the demand first, i.e. shifting Demand by Backorder before calling stockrwd, would not yield the same results. Shifting the demand would tell the stock reward that at every period in the future, the Backorder quantity would be guaranteed demand.
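
The effect of the shift on a reward series can be mimicked with plain Python lists. This is only an analogy for what >> and uniform(1, Backorder) do on Envision distributions. The sketch assumes that position i in a list stands for the (i+1)th unit of stock, and the helper names are hypothetical.

```python
from typing import List, Sequence


def shift_right(values: Sequence[float], n: int) -> List[float]:
    """List analogue of a right shift: prepend n zeros."""
    return [0.0] * n + list(values)


def with_backorders(marginal: Sequence[float], backorder: int,
                    per_unit_reward: float) -> List[float]:
    """Shift a marginal reward series past the backordered units, then credit
    each backordered unit with a fixed, certain reward."""
    shifted = shift_right(marginal, backorder)
    return [per_unit_reward + v if i < backorder else v
            for i, v in enumerate(shifted)]


# Hypothetical margin rewards for the first three units of uncertain demand,
# with 2 backordered units each worth a certain reward of 10.
print(with_backorders([8.0, 3.0, 0.0], backorder=2, per_unit_reward=10.0))
# [10.0, 10.0, 8.0, 3.0, 0.0]
```

The first two units capture the backorder reward with certainty, and the rewards tied to the uncertain demand only start at the third unit.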

Visual illustration of the stock reward

At this point, the stock reward function might still feel a bit cryptic. Below is a visual representation of the series of transformations applied to the demand when performing a stock reward analysis in the presence of back orders.


The first graph - entitled Future demand - represents a probabilistic demand forecast associated with a given SKU. The curve represents a distribution of probabilities, with the total area under the curve equal to one. In the background, this future demand is implicitly associated with a probabilistic lead time forecast, also represented as a distribution of probabilities. Such a distribution is typically generated through a probabilistic forecasting engine.

The Marginal fill rate graph represents the fraction of extra demand that is captured by each extra unit of stock. In other words, this graph demonstrates what happens to the fill rate as the stock increases. Since we are representing a marginal fill rate here, the total area under the curve remains equal to one. The marginal fill rate distribution can be computed with the fillrate() function.
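
One way to read this definition is that the kth unit of stock serves an extra unit of demand whenever the demand reaches at least $k$, so its share of the expected demand is $\mathbf{P}(Y \geq k) / \mathbf{E}[Y]$. The Python sketch below follows this reading. It is an interpretation of the description above, not the documented behavior of fillrate(), and the function name is hypothetical.

```python
from typing import List, Sequence


def marginal_fill_rate(pmf: Sequence[float]) -> List[float]:
    """Share of the expected demand captured by the k-th unit of stock,
    for k = 1..len(pmf) - 1, where pmf[y] = P(Y = y)."""
    mean_demand = sum(y * p for y, p in enumerate(pmf))
    if mean_demand <= 0:
        raise ValueError("the expected demand must be positive")
    rates = []
    tail = 1.0 - pmf[0]  # P(Y >= 1)
    for k in range(1, len(pmf)):
        rates.append(tail / mean_demand)
        tail -= pmf[k]  # move from P(Y >= k) to P(Y >= k + 1)
    return rates


rates = marginal_fill_rate([0.2, 0.5, 0.3])
print([round(r, 4) for r in rates], "sum:", round(sum(rates), 6))
```

The values sum to one because the sum of $\mathbf{P}(Y \geq k)$ over all $k \geq 1$ equals the expected demand.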

The Demand with backorders graph is identical to the Future demand graph, except that 8 units have been introduced to represent a back order. The backordered units represent guaranteed demand, since they have already been bought by clients. As a result, when backordered units are introduced, the probability distribution of demand is shifted to the right by that quantity. The shift operator >> is available as part of the algebra of distributions to compute such a transformation over the initial distribution.

The Fill rate with backorders graph is also very similar to the original Marginal fill rate graph, but has also been shifted 8 units to the right. Here, the plotted fill rate is only associated with the uncertain demand, hence the shape of the distribution remains the same.

The Margin graph represents the margin economic reward as computed by the stock reward function taking the Demand with backorders as input. The stock reward can be visualized as a distribution, but this is not a distribution of probabilities: the area under the curve is not equal to one but is instead equal to the total margin that would be captured with unlimited inventory. On the left of the graph, each backordered unit yields the same margin, which is not surprising as there is no uncertainty in capturing the margin, given that the units have already been bought.

The Stockout penalty represents the second component of the stock reward function. The shape of the distribution might feel a bit unexpected, but this shape merely reflects that, by construction of the stock reward function, the total area under the curve is zero. Intuitively, starting from a stock level of zero, we have the sum of all the stockout penalties as we are missing all the demand. Then, as we move to the right with higher stock levels we are satisfying more and more demand and thus further reducing the stockout penalties, until there is no penalty left because the entire demand has been satisfied. The stock-out penalty of not serving backorders is represented as greater than the penalty of not serving the demand that follows. Here we are illustrating the assumption that clients who have already backordered typically have greater service expectations than clients who haven’t yet bought any items.

The Carrying costs graph represents the third and last component of the stock reward function. As there is no upper limit for the carrying costs - it’s always possible to keep one more unit in stock thus further increasing the carrying costs - the distribution is divergent: it tends to negative infinity on the right. The total area under the curve is negative infinity, although this is a rather theoretical perspective. On the left, the carrying costs associated with the backordered units are zero: as those units have already been bought by clients they won’t incur any carrying costs, since those units will be shipped to clients as soon as possible.

The final stock reward - not represented above - would be obtained by summing the three components of the stock reward function. The resulting distribution would be interpreted as the ROI for each extra unit of stock to be acquired. This distribution typically starts with positive values, the first units of stock being profitable, but converges to negative infinity as we move to higher stock levels given the unbounded carrying costs.
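
The recomposition can be sketched in Python with plain lists standing in for the three component series. The helper name and the figures are hypothetical, and counting the leading units with a positive total is only one simple way to read the result, not a policy prescribed here.

```python
from typing import List, Sequence, Tuple


def recompose(margin: Sequence[float], stockout: Sequence[float],
              carrying: Sequence[float]) -> Tuple[List[float], int]:
    """Point-wise sum of the three components, plus the number of leading
    units whose total marginal reward is strictly positive."""
    total = [m + s + c for m, s, c in zip(margin, stockout, carrying)]
    profitable = 0
    for reward in total:
        if reward <= 0:
            break
        profitable += 1
    return total, profitable


# Hypothetical per-unit components for the first three units of stock.
total, profitable = recompose([8.0, 3.0, 0.0], [4.0, 1.5, 0.0], [-0.2, -0.7, -1.0])
print([round(r, 6) for r in total], profitable)  # [11.8, 3.8, -1.0] 2
```

Here the first two units are worth holding, while the third only adds carrying costs.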

The term support classically refers to the demand levels associated with non-zero probabilities. In the graphs above, the term support is used loosely to refer to the entire range that needs to be processed as non-zero values by Envision. In particular, it’s worth mentioning that there are multiple calculations that require the distribution support to be extended, in order to make sure that the final resulting distribution isn’t truncated.

  • The shift operation, which happens when backorders are present, requires the support to be increased by the number of backordered units.
  • The margin and carrying cost components of the stock reward function have no theoretical limits on the right, and can require arbitrarily large extensions of the support.
  • Ordering constraints, such as MOQs, may require having inventory levels that are even greater than the ones reached by the shifted distributions. Properly assessing the tail of the distribution is key for estimating whether the MOQ can be profitably satisfied or not.

One notable insight of the illustration above is the need to extend the calculation of the stock reward function beyond the range of non-zero demand. Whenever MOQs are present, the company can be forced to buy goods beyond a 100% service level coverage of the future demand for the next period. The stock reward function covers those situations as well. The impact of MOQs is discussed in greater detail in the following section. In practice, the Envision runtime takes care of automatically adjusting the support to make sure that distributions aren’t truncated during the calculations.

Reviews of typical situations

The stock reward function is straightforward in the sense that it reflects the financial outcome of an inventory situation in a relatively minimal way. As we have seen, it would be unwise to actually remove anything from this model, as this would not even reflect the three basic states of an SKU unit: sold, missing or in stock. By adjusting the economic variables, the stock reward can be modified to reflect situations associated with specific industry verticals.


In the aerospace industry, spare parts are required to keep aircraft properly maintained. A missing NO-GO spare part triggers AOG (Aircraft On Ground) incidents that typically cost a lot more than the spare part itself.

In this context, it's reasonable to have:

  • M = 0: unless the parts are serviced for a price, there is no differentiated upside in servicing the part.
  • S = constant: since all NO-GO parts are equally capable of grounding an aircraft, the stock-out penalty is uniform.
  • C = constant (annualized): since most parts are long-lived, it is acceptable, as a first approximation, to treat the annual carrying costs as a constant.


With newspaper inventory, we refer to an item that can only be sold during the next time period and loses its entire market value at the next iteration. Newspapers represent the archetype of such items, but they are not a unique case. Similar behaviors are also observed for highly seasonal and perishable items that offer only a very narrow time window to be sold.

In this context, we would have:

  • M, the gross margin
  • S, a fraction of the gross margin
  • C = 0, as nothing is carried on from one period to the next
  • A = 0, likewise, as no reward can be gained from future periods.