Making calculations with Envision

Making calculations

Home » Resources » Here

Envision offers the possibility to carry out just about any calculation that could otherwise be done with Excel. In this respect, the syntax to compute such calculations is similar to the syntax used for Excel formulas. Envision puts an emphasis on vector calculations. Vector-based operations are used to process many values at once rather than working with a single value at a time. Tables and vectors are covered in greater detail in this page, and should help you get started doing your own calculation with Envision.

In order to follow this page more effectively, we suggest you set up the sample dataset. We have not yet detailed out the Envision capabilities relative to uploading the input data, but our examples will show you the lines that you need to put at the top of your script.

Tables and vectors

Envision’s data model revolves around tables and vectors. An Envision table is similar to the tables that exist in relational databases. From an Excel perspective, a table is a well-formed spreadsheet where the first line contains column headers and where the (many) lines below contain data that is correctly aligned with these headers. In an Envision script, the tables are also named and the table names are typically driven by the names of the underlying tabular files. Vectors are associated with the columns in the table and similarly, they are also named. Envision uses the term “vector” rather than “column” to emphasize that operations can be performed on all vector values at once, that is, on all the lines of the original table.

Let’s illustrate this idea with a few lines of script that can be applied to the sample dataset. Below, for every order line, we calculate the tax rate, or in other words, the ratio between the amount of tax and the amount net of tax being charged to the customer.
read "/sample/Lokad_Items.tsv"
read "/sample/Lokad_Orders.tsv" as Orders
read "/sample/Lokad_PurchaseOrders.tsv" as PO

Orders.TaxRate = Orders.TaxAmount / Orders.NetAmount
Here Orders refers to the table that contains the entire sales history, with every transaction represented with as many lines as there are items within the given transaction. The variable Orders.TaxAmount refers to the vector associated with the column named TaxAmount within the Orders table. You can take a mental note of the syntax which consists of using a dot (.) between the table name and the vector name, as this pattern is frequently used in Envision.

The operation that involves the equal sign = is called an assignment: the calculation takes place on the right side of the = sign, and the result gets assigned to left side of the statement. In the example above, we have a division operation taking place on the right side. Since neither Orders.TaxAmount nor Orders.NetAmount are actually defined anywhere in the script, Envision tries to upload this data directly from the input dataset. Since the Orders table of the table dataset contains the two columns NetAmount and TaxAmount, the script is executed successfully. Then, on the left side, we have Orders.TaxRate which gets assigned the newly computed tax rate. An assignment is the logical equivalent of creating a new column in Excel named according to the assignment variable, that is, TaxRate in this instance.

In the following script snippets, for the sake of concision, we omit the read “/sample/Lokad_XYZ.tsv” lines from all the examples, as they are expected to always be included at the top of every script.

The syntax for calculations in Envision is similar to the one used for Excel formulas. In the script below, we perform a series of (rather arbitrary) calculations to illustrate the syntax.
Orders.A = 42
Orders.B = 5 * (1 + Orders.A)
Orders.C = (Orders.A + Orders.B) * (1 + Orders.A)
Here, the script defines three vectors named respectively A, B and C, and all the three vectors are attached to the Orders table. The first line is a simple assignment where the value 42 is assigned to Orders.A. However, as Orders.A is a vector, it is not just one value of 42, but one value for every line of the original table. Envision puts an emphasis on vectors and most of the operations on vectors take place on all their values at once.

Then, Orders is not the only table available in the sample dataset. For example, the sample dataset also contains a PO table, and very similar operations can be performed on this table as well.
PO.A = 42
PO.B = 5 * (1 + PO.A)
PO.C = PO.A + PO.B
Since we have just introduced a second table, it raises the following question: can Envision carry out operations between tables? The answer is yes, but it requires a tiny bit more effort. Let’s consider the following script:
Orders.A = 1
PO.A = Orders.A + 1 // WRONG!
Since the two tables Orders and PO have no reason to be aligned in any way - the two tables do not even contain the same number of lines – the semantic associated to such an operation would be very unclear. Thus, such an operation is not valid with Envision, and if an attempt is made to execute such a script, the execution fails and an error message is displayed.

Nevertheless, Envision provides abundant ways to combine data from distinct tables, and this aspect is covered in the following section.

Special “Items” table

The tables used by Envision are named, however there is an important exception to this convention, namely the items table. Indeed, in commerce, we observe that for the vast majority of calculations, there is one table that dominates all the others: the list of products / variants / SKUs / … depending on the actual business context being considered. Unlike, say, a relational database where all tables are treated equally, Envision gives special treatment to the items table, making it much easier to deal with most scenarios found in commerce.

At some point in their existence, every single retailer, small or large, comes up with a spreadsheet where all their products are listed, one product per line, and the sheet includes many additional columns providing extra information, either static information such as product categories, or dynamic information such as the total sales over the last 5 weeks. Depending on the situation, lines will be associated with products or SKUs, or any similar fine-grained representation of the items being sold. Building such a consolidated table is practical in many situations: when identifying dead inventory, updating prices, pinpointing top sellers, etc. As a retail practitioner, you have probably handled such spreadsheets on many occasions.

At Lokad, we realized that such spreadsheets are ubiquitous in retail and decided to reflect this in our technology. We designed Envision to deeply capture this pattern in order to stay as aligned as possible to this practice that is so prevalent in commerce.

Let’s go back to the sample dataset. The dataset contains a list of items. Let’s assume that we would like to compute the inventory value for every item. If the “items” table was named Items, then this could be done with:
Items.Stock = Items.StockOnHand + Items.StockOnOrder
Items.StockValue = Items.TotalStock * BuyPrice
However, for the “items” table, and for this table only, the name of the table can be omitted. Thus, the script should actually be written as:
Stock = StockOnHand + StockOnOrder
StockValue = Stock * BuyPrice
The prefix Items. has been removed, and by convention, any variable whose name does not include a dot (.) implicitly refers to a vector associated with the “items” table. Since calculations involving “items” are so frequent in commerce, this convention makes Envision much more readable in practice.

One of the key reasons why the “items” spreadsheets are so widely used is that most of the historical data encountered in commerce can be conveniently represented as a list of events where each event is attached to one item. For example, the sales history is – at the very minimum – a table with three columns: the item identifier, the date and the quantity purchased. Similarly, the purchase history can be represented with the same three columns, plus ideally an arrival date to account for the lead time between ordering and delivery. Customer returns can also be represented as a table containing the columns including item identifier, date, quantity, and ideally the client identifier, if one is interested in analyzing the behavior of recurrent customers.

And the list goes on. Commerce is all about flows: flows of goods, from suppliers to customers; and flows of money, from customers to suppliers. All such flows can be broken down into elementary lines where every line is attached to one item in particular. Thus, in commerce, we are not looking at just any kinds of tables, we are looking at all tables that are predominantly geared around items, and this is exactly the pattern that Envision aims to capture.

The “Items” table has only one column that is mandatory: the item identifier column which should be named “Id” as far as Envision conventions go. Nearly all of the other tables uploaded into Envision are also expected to have an “Id” column of their own. We have seen above that it was not possible to combine the Orders table and the PO table because the two tables were not aligned. Well, the Id column found both in the “items” table and in any other table is precisely the “bridge” that is required in order to allow this combination to take place.

Let’s illustrate this with a cash calculation. Let’s assume that for every item, one wants to compute the cash in, associated with the full sales history; and the cash out, also associated with the full purchase history. This can be done with the script below.
CashIn = sum(Orders.NetAmount)
CashOut = sum(PO.NetAmount) 
CashFlow = CashIn - CashOut
Here, we see that we have “items” on the left side of the assignments, and other tables on the right side, at least for lines 1 and 2. Using different tables simultaneously is legitimate in this case because of the use of the sum() aggregator. We will not review the aggregator in detail in this document, but it suffices to say that, as the name suggests, the sum() aggregator is computing the sum of every NetAmount for each item. The matching between items and orders is done transparently because the Orders table comes with an Id column too.

We have broken down the calculation over three lines for the sake of clarity. However, the script can be rewritten in a more compact manner, without naming the intermediate variables, as follows:
CashFlow = sum(Orders.NetAmount) - sum(PO.NetAmount)
In addition to the sum, Envision supports all the classical aggregators: average, min, max, median … and these aggregators offer many possibilities to complement the items table with descriptive attributes that can be highly useful when it comes to addressing many of the challenges found in commerce. However, while it is possible to create a new vector within the “items” by leveraging another table and an aggregator, it is also possible to do the reverse and to use a vector from the “items” table when performing a calculation associated with another table.

Let’s consider a situation where one would like to re-compute the VAT (value-added tax) associated with every sales order line. For the sake of simplicity, let’s assume that the VAT is computed at a flat rate of 20% across all the items and the entire history. This could be written as:
VatRate = 0.2 // hypothesis
F = VatRate / (1 - VatRate)
Orders.Vat = Orders.NetAmount * F
The VatRate vector expands naturally into a vector aligned with the Orders table because every order line is linked to a single original item. This expansion can be made more explicit by writing the actual code:
VatRate = 0.2 // hypothesis
Orders.VatRate = VatRate
Orders.F = Orders.VatRate / (1 - Orders.VatRate)
Orders.Vat = Orders.NetAmount * Orders.F
In the examples above, the F vectors are introduced in order to merely illustrate the breakdown of the calculation.

On a side note, please note that it is indeed possible to load arbitrary tables into Envision, even if such tables do not include an Id column. However, such advanced scenarios are beyond the scope of the present discussion.

Special status of dates

The items table is a special case in Envision because, as we have seen, many fundamental operations in commerce revolve around the notion of “items”. However, such operations are also typically linked to a specific date. Each line of the sales history comes with an applicable date, idem for the purchase order history, and this is also true for nearly all data that would qualify as historical data. Because of the importance of historical data in commerce, where practically all business operations can be described as a list of dated entries accounting for stock movements or payments, Envision makes a special case for the dates in a very commerce-driven way.

Any table can have a Date column in addition to its canonical Id column. When such a Date column is present, not only is the table indexed by the item identifiers, but also by dates. The indexation by dates is often practical because when one seeks to apply a time-window of some kind to a calculation, the entire history, no matter what type of entries are involved, is expected to be filtered similarly.

In order to illustrate this, let’s get back to our cash flow calculation. Let’s assume that instead of computing values for the entire history, we decide to compute the cash flow per item when only taking the last year of data history. This can be done with:
end := max(date)
when date > end - 365
  CashFlow = sum(Orders.NetAmount) - sum(PO.NetAmount)
The end variable is defined as the most recent date observed across the entire input dataset. Given that end is a date, this demonstrates that Envision offers the possibility to carry out date arithmetics, as shown in the script above. Adding the +1 convention to any date in Envision, results in adding one day to that particular date. Thus, by subtracting 365 days, we are moving back roughly one year in the past.

The script starts with a when filter which represents a condition that is imposed as being true for all the lines processed within a given block of script. As far as items are concerned, they are not indexed by date. Thus, the date filter has no effect on them, and all items continue to be present within the when block. However, both orders and purchase orders do have a Date column of their own, and, as a result, all the lines that do not satisfy the condition of the when filter get filtered from the block of script.

As a consequence, the sum() aggregation within the when block only processes the non-filtered lines, that is, the lines that are less than one year old. The same calculation could also have been carried out with intermediate variables just like it was done initially with the full-history example.
end := max(date)
when date > end - 365
  CashIn = sum(Orders.NetAmount)
  CashOut = sum(PO.NetAmount)
  CashFlow = CashIn - CashOut
With the example just above, it probably becomes a little clearer as to why we use the when statement at the start of the filtering block: indeed, all the lines within the block, noticeable by the two extra spaces at the beginning of lines 2, 3 and 4, undergo the same ambient filter for the dates.

This script also illustrates Envision's capacity to re-align complex data coming from the other tables with the “items” table. From an Excel perspective, it is as if one was capable of transforming the other sheets (for example, the order history) into columns using the master sheet which contains the list of products. Envision makes this process a lot less tedious.