Algebra of Distributions

Mathematical distributions are powerful tools for modelling many business situations, especially those where uncertainty exists. Envision treats distributions as first-class citizens and supports a wide range of operations on them. These operations are collectively referred to as the algebra of distributions. In this page, we introduce the distribution data type, and review the various operators and functions that apply to it.


Foreword

Lokad's forecasting engine started delivering quantile grids in early 2015. These grids were not exactly probability distributions yet - merely interpolated quantile forecasts - but we were getting fairly close. Working with our clients, we started to realize the massive potential of applying probabilistic analysis to quantitative supply chain optimization. However, our grids were just that: big tables listing all the probabilities. And although these grids represented a breakthrough both for our clients and for ourselves, we quickly realized that processing probabilities represented in the form of lists was no easy task.

The algebra of distributions is Lokad's broad technological answer to supply chain challenges that involve unknown futures. Indeed, those situations do not merely require a single median forecast, but a complete risk analysis covering all possibilities. Envision embraces the idea that all scenarios should be considered, instead of just focusing on a handful of scenarios. For this purpose, random variables can be introduced within Envision scripts and manipulated through operations specifically tailored for them, such as convolutions, which are detailed further below. In practice, the algebra of distributions is an elegant way to model complex supply chain situations where both the future demand and the future lead time are uncertain.

The distribution data type

Mathematical distributions are objects that generalize the notion of functions. Within Envision, our ambition is more modest, and what we call distributions are actually functions $f: \mathbb{Z} \to \mathbb{R}$. We refer to these (mathematical) functions as distributions because the most frequent use case in Envision is to handle probability distributions, that is, non-negative distributions that have a mass equal to 1.

Also, Envision distributions (referred to simply as distributions in the following paragraphs) are compact: they take non-zero values at only a finite number of points. This constraint has been introduced because non-compact distributions, while possible, generate many complications for little practical benefit.

Within Envision, distributions are materialized through a special data type named distribution. Other data types include number and text. The distribution data type exhibits relatively complex behaviors precisely because it is a function rather than a single value. For example, below, we generate a Dirac, that is, a discrete function with a value of 0 everywhere except at the point 42, where its value is 1.

d := dirac(42)
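
To make the idea of a compact function on the integers concrete, here is a minimal sketch in Python (not Envision). It assumes a distribution can be modelled as a dictionary mapping integer points to values, with every absent point valued at 0. The helper names `dirac`, `value_at` and `mass` are illustrative and say nothing about how Envision stores distributions internally.

```python
# Illustrative Python model, not Envision code.
# A distribution is a dict {integer point: value}; absent points are valued 0,
# so only finitely many points can be non-zero.

def dirac(n):
    """Distribution equal to 1 at n and 0 everywhere else."""
    return {n: 1.0}

def value_at(dist, k):
    """Evaluate the distribution at the integer k."""
    return dist.get(k, 0.0)

def mass(dist):
    """Sum of the values over all integers."""
    return sum(dist.values())

d = dirac(42)
print(value_at(d, 42), value_at(d, 41), mass(d))  # 1.0 0.0 1.0
```

A dictionary keeps the finite support explicit, which is the property the compactness constraint is about.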

Distributions can be exported to a file using Ionic data files. However, distributions cannot be exported as such to CSV or Excel files.

Envision offers many more ways to generate distributions. They will be reviewed in the following sections.

Plotting a distribution

Distributions can be visualized with histograms. Let's consider a simple Poisson distribution:



This plot has been generated in Envision with the one-liner below:
show histogram "My first distribution!" a1d4 tomato with poisson(21)
The histogram tile expects a single scalar distribution to be provided after the with keyword.
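
For readers who want to see the values behind such a histogram, the following Python sketch (not Envision) computes a truncated Poisson distribution and prints a rough text histogram. The truncation rule, which stops once the remaining tail mass falls below a small threshold, is an assumption made for this illustration; the way Envision truncates poisson() is not described here.

```python
import math

def poisson(lam, tail=1e-9):
    """Truncated Poisson pmf as {k: probability}, using p(k) = p(k-1) * lam / k."""
    dist = {}
    p = math.exp(-lam)  # p(0)
    k = 0
    cumulative = 0.0
    # Assumed truncation rule: stop once the remaining tail mass is negligible.
    while 1.0 - cumulative > tail:
        dist[k] = p
        cumulative += p
        k += 1
        p *= lam / k
    return dist

d = poisson(21)
# Rough text histogram for a few points around the mean.
for k in range(15, 28, 3):
    print(f"{k:3d} {'#' * round(d[k] * 400)}")
```

Because the support is cut off, the mass of the result is very slightly below 1.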

Point-wise operations

The simplest operations on distributions are known as point-wise operations. For example, let $f$ and $g$ represent two distributions $\mathbb{Z} \to \mathbb{R}$. Then, we can define the addition as:

$$f+g: k \to f(k) + g(k)$$

From Envision's perspective, assuming that both X and Y are distribution vectors, the same operation is written as:
Z = X + Y
It must be noted that even when dealing with distributions, Envision remains a vector language. Hence, we are typically not processing a single distribution at a time, but a whole vector of distributions at once. The same operation can be performed from a scalar perspective using the following:
Z := X + Y
In this and the following sections, whenever we use X and Y in script examples, we assume that these two variables are actual distributions.

Point-wise multiplication and subtraction are defined similarly:

$$f \times g: k \to f(k) \times g(k)$$

$$f-g: k \to f(k)-g(k)$$

which translates directly into the following Envision syntax:
Z = X * Y
Z = X - Y
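
The point-wise definitions can be sketched in Python (not Envision) as follows, assuming a distribution is modelled as a dictionary from integer points to values, with absent points valued at 0. The helper `pointwise` is a hypothetical name introduced for this illustration.

```python
import operator

def pointwise(f, g, op):
    """Apply op(f(k), g(k)) at every point of the union of both supports.

    Valid for operations where op(0, 0) == 0 (addition, subtraction,
    multiplication), so that the result stays zero outside that union.
    """
    out = {}
    for k in set(f) | set(g):
        v = op(f.get(k, 0.0), g.get(k, 0.0))
        if v != 0.0:  # keep only the non-zero points
            out[k] = v
    return out

X = {0: 0.5, 1: 0.5}
Y = {1: 0.25, 2: 0.75}

print(pointwise(X, Y, operator.add))  # values 0.5, 0.75, 0.75 at points 0, 1, 2
print(pointwise(X, Y, operator.mul))  # {1: 0.125}
print(pointwise(X, Y, operator.sub))  # values 0.5, 0.25, -0.75 at points 0, 1, 2
```

Note that the point-wise sum of two probability distributions has a mass of 2, and the difference can take negative values: point-wise operations work on functions and do not by themselves have a probabilistic meaning.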
Since a number $\alpha$ can be implicitly treated as the constant function $f_{\alpha}: k \to \alpha$, Envision allows numbers and distributions to be combined - but only if the resulting distribution is compact.
Z = 2 * X // OK, it's compact
Z = X / 2 // not dividing by zero is OK
Z = X + 1 // incorrect, not a compact distribution
Z = X / Y // incorrect, Y is compact hence has zero values
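
The compactness rule can be checked mechanically: outside its support a distribution is 0, so the combined result there equals the operation applied to 0 and the number. The Python sketch below (not Envision) uses a hypothetical helper `with_number` and models a distribution as a dictionary from integer points to values. It illustrates the reasoning only, not how Envision validates scripts.

```python
import operator

def with_number(dist, alpha, op):
    """Combine a distribution point-wise with the constant function k -> alpha."""
    # Outside the support the distribution is 0, so the result there is
    # op(0, alpha). A non-zero value would occur at infinitely many points.
    if op(0.0, alpha) != 0.0:
        raise ValueError("result would not be compact")
    return {k: op(v, alpha) for k, v in dist.items()}

X = {0: 0.5, 1: 0.5}

print(with_number(X, 2, operator.mul))      # {0: 1.0, 1: 1.0}
print(with_number(X, 2, operator.truediv))  # {0: 0.25, 1: 0.25}

try:
    with_number(X, 1, operator.add)         # 0 + 1 != 0 outside the support
except ValueError as err:
    print("rejected:", err)
```

The same reasoning explains why X / Y is rejected: Y is zero outside its support, so the quotient would be undefined at infinitely many points.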
Distributions can also be shifted. The shift operator is typically written as:

$$f_{n}: k \to f(k+n)$$

With this definition, the values of $f$ move $n$ positions to the left. The corresponding Envision syntax is:
Z = X << n // left shift
Z = X >> n // right shift
Naturally, if n is negative, then the shift operators keep working, but the left shift becomes a right shift, and vice versa.
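
A short Python sketch (not Envision) of the shift, assuming that the left shift follows the formula $k \to f(k+n)$ given above and that a distribution is modelled as a dictionary from integer points to values. The function names are illustrative.

```python
def shift_left(dist, n):
    """Return g with g(k) = f(k + n): the value at point p moves to p - n."""
    return {k - n: v for k, v in dist.items()}

def shift_right(dist, n):
    """Return g with g(k) = f(k - n): the value at point p moves to p + n."""
    return shift_left(dist, -n)

d = {42: 1.0}  # a Dirac at 42

print(shift_left(d, 2))    # {40: 1.0}
print(shift_right(d, 2))   # {44: 1.0}
print(shift_left(d, -2))   # {44: 1.0}, a negative left shift is a right shift
```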

Generating distributions

There are multiple ways to create distributions. Lokad's forecasting engine generates distributions for future lead times or future demand. When these distributions have been serialized as a grid (*), it is possible to regenerate the distribution through the distrib() function. The relevant syntax is:
Demand = distrib(Id, Grid.Probability, Grid.Min, Grid.Max)
The resulting Demand variable is a distribution. When the original grid includes segments that are longer than 1, distrib() uniformly spreads the mass across the segment. The mass of the distribution is preserved by the distrib() function.

(*) The serialization of a distribution is the process of turning the distribution data into a regular tabular format which can be stored as a flat file. In order to handle the distribution as an actual distribution - and not as a table - we need to de-serialize the table first. This is exactly what is being done above with the distrib() function.
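
The de-serialization step can be pictured with the Python sketch below (not Envision). It assumes each grid row is a (probability, min, max) triple with inclusive integer bounds, and spreads the probability of a row uniformly over its segment, as described above. The function name `from_grid` and the sample rows are hypothetical.

```python
def from_grid(rows):
    """Rebuild a distribution {k: value} from (probability, lo, hi) rows.

    Bounds are inclusive; the mass of each row is spread uniformly over
    the integers lo..hi, so the total mass is preserved.
    """
    dist = {}
    for p, lo, hi in rows:
        width = hi - lo + 1
        for k in range(lo, hi + 1):
            dist[k] = dist.get(k, 0.0) + p / width
    return dist

# Hypothetical grid: the last row is a segment of length 4.
rows = [(0.5, 0, 0), (0.25, 1, 2), (0.25, 3, 6)]
demand = from_grid(rows)

print(demand[0], demand[1], demand[3])  # 0.5 0.125 0.0625
print(sum(demand.values()))             # 1.0
```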

In addition, Envision also offers the possibility to generate a distribution directly from a set of observed numeric values. This is the purpose of the ranvar() aggregator:
X = ranvar(Orders.Quantity)
The ranvar() aggregator returns a random variable that matches the frequency observed in the aggregation groups. When there is nothing to aggregate, ranvar() returns dirac(0).
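
As a rough picture of what such an aggregation computes for a single group, the Python sketch below (not Envision) builds the empirical frequencies of a list of integer observations and falls back to a Dirac at 0 when the list is empty. The name `empirical_ranvar` is hypothetical, and integer-valued observations are an assumption of this sketch.

```python
from collections import Counter

def empirical_ranvar(observations):
    """Empirical distribution {value: frequency} of integer observations."""
    obs = list(observations)
    if not obs:
        return {0: 1.0}  # nothing to aggregate: Dirac at 0
    counts = Counter(obs)
    return {k: c / len(obs) for k, c in counts.items()}

print(empirical_ranvar([1, 1, 2, 5]))  # {1: 0.5, 2: 0.25, 5: 0.25}
print(empirical_ranvar([]))            # {0: 1.0}
```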

Extending a distribution into a table

In the previous section, we have seen how a table can be aggregated into a distribution. The reverse process, i.e. extending a distribution into table lines, is also possible. In this section, we review the extend.distrib() function, which does precisely this. The syntax is illustrated as follows:
X = poisson(1)
table Grid = extend.distrib(X)
show table "My Grid" with Id, Grid.Min, Grid.Max, Grid.Probability
where X is the distribution vector generated on line 1 as a Poisson distribution. On line 2, the distributions are inflated into a table named Grid. This table has an affinity (Id, *), and as illustrated on line 3, the table is auto-populated with the numeric columns Grid.Min, Grid.Max and Grid.Probability. Both Grid.Min and Grid.Max are inclusive boundaries.

When extending relatively compact distributions, the resulting table typically contains lines of +1 increments, that is, Grid.Min and Grid.Max increase by 1 from one line to the next. However, if we were to consider the extension of high-valued distributions, for example dirac(1000000), then it would be extremely inefficient to generate millions of lines. Thus, the function extend.distrib() aggregates large distributions into thicker buckets. This explains why we have both Grid.Min and Grid.Max, which represent the inclusive boundaries of the bucket.

In order to gain more control over the granularity of the generated buckets, the function extend.distrib() offers a first overload:
table Grid = extend.distrib(X, S)
where S is a number vector. The resulting table provides buckets aligned with the segments [0;0] [1;S] [S+1;S+M] [S+M+1;S+2*M] ... where M is the default bucket size - also called the multiplier. This overload is typically used when the demand above the total stock needs to be considered.

Finally, the second overload of extend.distrib() provides even more control with:
table Grid = extend.distrib(X, S, M)
where M is a mandatory bucket size. If M is zero, then the extension reverts to the default bucket size, auto-adjusted by Envision. This second overload is particularly useful when lot multipliers are involved in the ordering process, as the demand needs to be batched into buckets of a specific size.

Beware that extend.distrib(X, S, M) may fail, depending on the capacity allocated to your Lokad account, if you try to extend a high-valued distribution while forcing a low multiplier.
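
The bucket layout [0;0] [1;S] [S+1;S+M] ... can be sketched in Python (not Envision) as below. The sketch assumes a distribution supported on non-negative integers, S at least 0, and an explicit M of at least 1. It stops at the first bucket that covers the highest point of the support, which is an assumption: the exact stopping rule and the auto-adjusted default bucket size of Envision are not modelled. The name `extend_buckets` is hypothetical.

```python
def extend_buckets(dist, s, m):
    """Return rows (lo, hi, probability) for buckets [0;0] [1;s] [s+1;s+m] ...

    Bounds are inclusive. Assumes non-negative support, s >= 0 and m >= 1.
    """
    top = max(dist)
    bounds = [(0, 0)]
    if s >= 1:
        bounds.append((1, s))
    lo = s + 1
    while lo <= top:  # assumed stopping rule: cover the support, then stop
        bounds.append((lo, lo + m - 1))
        lo += m
    return [
        (a, b, sum(v for k, v in dist.items() if a <= k <= b))
        for a, b in bounds
    ]

# Uniform distribution over 0..7, stock level S = 3, lot multiplier M = 2.
X = {k: 0.125 for k in range(8)}
for row in extend_buckets(X, 3, 2):
    print(row)
# (0, 0, 0.125)
# (1, 3, 0.375)
# (4, 5, 0.25)
# (6, 7, 0.25)
```

The number of rows grows with the top of the support divided by M, which illustrates why a high-valued distribution combined with a low multiplier produces a very large table.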

Convolving probability distributions

Convolutions represent a more advanced class of operations on distributions. The prime use cases of convolutions involve random variables. Unlike point-wise operations, convolutions have probabilistic interpretations such as summing or multiplying independent random variables. Convolutions can be recognized in Envision by their two-character operators ending in *, namely:
Z = X +* Y // additive convolution
Z = X -* Y // subtractive convolution, same as X +* reflect(Y)
Z = X ** Y // multiplicative convolution
Z = X ^* Y // convolution power
The additive (resp. subtractive) convolution can be interpreted as the distribution of the sum $X+Y$ (resp. the difference $X-Y$) of two independent random variables. The multiplicative convolution, also known as the Dirichlet convolution, can be interpreted as the distribution of the product of two independent random variables.
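
The probabilistic reading of these operators can be illustrated with a small Python sketch (not Envision). It assumes probability distributions modelled as dictionaries from integer points to probabilities and enumerates every pair of outcomes of two independent random variables. The helpers `convolve` and `reflect` are illustrative names, and `reflect` is assumed here to mirror a distribution around 0.

```python
import operator

def convolve(x, y, combine):
    """Distribution of combine(A, B) for independent A ~ x and B ~ y."""
    out = {}
    for a, pa in x.items():
        for b, pb in y.items():
            k = combine(a, b)
            out[k] = out.get(k, 0.0) + pa * pb  # independence: P = pa * pb
    return out

def reflect(dist):
    """Mirror the distribution around 0 (assumed meaning of reflect)."""
    return {-k: v for k, v in dist.items()}

X = {0: 0.5, 1: 0.5}
Y = {1: 0.5, 2: 0.5}

print(convolve(X, Y, operator.add))  # {1: 0.25, 2: 0.5, 3: 0.25}
print(convolve(X, Y, operator.mul))  # {0: 0.5, 1: 0.25, 2: 0.25}
# Subtracting Y gives the same result as adding the reflection of Y.
print(convolve(X, Y, operator.sub) == convolve(X, reflect(Y), operator.add))  # True
```

Unlike point-wise operations, each of these results is again a probability distribution with a mass of 1.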

The convolution power is more complex and represents:

$$X ^ Y = \sum_{k=0}^{\infty} X^k \, \mathbf{P}[Y=k]$$

where $X^k$ denotes the distribution of the sum of $k$ independent copies of $X$, that is, the additive convolution X +* ... +* X with $k$ terms, and $X^0$ is the Dirac at 0 (the empty sum). This last operation is of interest because of its relationship to the process leading to an integrated demand forecast, where $X$ represents the daily demand - assumed stationary - and where $Y$ represents the probabilistic lead times.
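
The sum defining the convolution power can be evaluated directly when $Y$ has a finite support on the non-negative integers. The Python sketch below (not Envision) does so, with distributions modelled as dictionaries from integer points to probabilities. It takes $X^0$ to be the Dirac at 0 and builds $X^k$ by repeated additive convolution. The function names and the sample figures are assumptions made for this illustration.

```python
def add_conv(x, y):
    """Distribution of A + B for independent A ~ x and B ~ y."""
    out = {}
    for a, pa in x.items():
        for b, pb in y.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

def convolution_power(x, y):
    """Evaluate sum over k of P[Y = k] * X^k, with X^k the k-fold sum of X."""
    out = {}
    x_k = {0: 1.0}  # X^0: the empty sum is always 0
    for k in range(max(y) + 1):
        weight = y.get(k, 0.0)
        if weight:
            for point, p in x_k.items():
                out[point] = out.get(point, 0.0) + weight * p
        x_k = add_conv(x_k, x)  # X^(k+1) = X^k + X
    return out

daily_demand = {0: 0.5, 1: 0.5}  # 0 or 1 unit per day, equally likely
lead_time = {1: 0.5, 2: 0.5}     # 1 or 2 days, equally likely

print(convolution_power(daily_demand, lead_time))
# {0: 0.375, 1: 0.5, 2: 0.125}
```

In this example the result is the distribution of the total demand over an uncertain lead time: for instance, a total of 2 units requires a 2-day lead time and a demand of 1 on both days, hence 0.5 × 0.25 = 0.125.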

See also our page on convolution power.