Home »
Resources » Here
Data aggregation is the process of combining multiple table lines through special functions called
aggregators. Whenever sums, averages, counts or medians are involved, the calculations are carried out through aggregators. Aggregation also provides a way of combining data originating from different tables. Envision provides a rich syntax to support all such scenarios. In this section, we detail and illustrate how aggregators work with Envision.
An illustrating script
In this section we use, once again, the
sample dataset which should be accessible from the
/sample
path in your Lokad account. The script below is moderately complex and illustrates some of the data aggregation patterns available in Envision. Data aggregation can take place both outside the tiles and inside the tiles. We advise you to read
Making calculations with Envision first, as it should greatly help you with understanding the contents of this section.
read "/sample/Lokad_Items.tsv" as Items
read "/sample/Lokad_Orders.tsv" as O
read "/sample/Lokad_PurchaseOrders.tsv" as PO
expect O.Quantity : number
show label "Aggregating data" a1f1 tomato
oend := max(O.Date)
obegin := oend - 365
totalPurchased := sum(PO.NetAmount)
totalSold := sum(O.NetAmount) when date >= obegin
show table "Total purchased" a2c2 with
totalPurchased unit:"$"
show table "Sold over 1 year" d2f2 with
totalSold unit:"$"
VolumeSold = sum(O.NetAmount)
UnitSold = median(O.Quantity)
show table "Top sellers" a3f4 tomato with
Name
VolumeSold as "Sold" unit:"$"
UnitSold as "Median"
order by VolumeSold desc
avgRet := avg(distinct(O.Date) by O.Client)
avgQty := avg(sum(O.Quantity) by [O.Client, O.Date])
show table "Average client returns" a5c5 with
round(avgRet)
show table "Average backet quantity" d5f5 with
avgQty
O.Brand = Brand
show table "Top Suppliers" a6f7 tomato with
Supplier
distinct(Category) as "Categories"
sum(O.NetAmount) as "Sold" unit:"$"
mode(O.Brand) if (O.Brand != "Fellowes") as "Most frequent brand sold"
group by Supplier
order by sum(O.NetAmount) desc
We suggest to begin by copy-pasting the above script into your Lokad account, and running it once in order to observe the resulting dashboard. If everything works, you should see the dashboard illustrated below.

Scalars for single-value aggregations
A variable that has no “item” affinity in Envision is referred to as a scalar variable (the variable is not attached to any particular line of the
Items
table). With the Excel analogy in mind, Envision variables are akin to Excel columns by default: these variables are vectors and they hold many values at once (one value per item actually). However, it is also possible to have variables that behave like a single Excel cell, and that only hold a single value: these are the scalar variables. The script above illustrates how two scalar variables can be computed; the relevant lines are copied below for the sake of clarity.
totalPurchased := sum(PO.NetAmount)
totalSold := sum(O.NetAmount) when date >= obegin
show table "Total purchased" a2c2 with
totalPurchased unit:"$"
show table "Sold over 1 year" d2f2 with
totalSold unit:"$"
Several aspects can be highlighted in these few lines of code. Firstly, the two assignments on lines 1 and 2 are carried out using
:=
, the scalar assignment operator. When the operator
:=
is used instead of the plain assignment operator
=
, this means that the aggregation should result in a single value that is independent of the items.
Secondly, the variables
totalPurchased
and
tableSold
begin with lowercased names. While variable names are not case-sensitive in Envision, the spelling of these two variables is not an accident in this case. As a coding practice, we recommend to lowercase the first letter of the variables that contain scalar values. In contrast, the names of the tables and the names of the columns should start with an uppercase letter. Following these guidelines will make your script more readable and easier to debug.
Thirdly, the scalar aggregation takes place by default within the
show table
statement if the table has only a single column. For example, lines 1 and 3-4 could be combined into a single
show table
statement illustrated just below.
show table "Total purchased" a2c2 with
sum(PO.NetAmount) unit:"$"
Numeric scalar variables can also be used to do arbitrary calculations just like any numeric variable in Envision. For example - while it may not make much sense from a business perspective - the following calculation could be added at the end of the Envision script, once the variables
totalPurchased
and
totalSold
have been defined:
mySum := (totalSold - totalPurchased) / 2
show table "My Sum" with mySum
Multi-value aggregations
Envision also supports the possibility to perform multi-value aggregations, typically aggregating the data of a given table and projecting the result on the lines of another table. Thus, aggregation is the most frequent approach to combine data coming from different tables. In particular, one of the most frequent use cases consists of taking an historical table, that is, a table also indexed with a
Date
column, and aggregating this table into a vector aligned with the
Items
table.
The script shown previously illustrates this pattern with a table that displays the top sellers, that is, the items ordered according to their respective sales volume, putting the highest values at the top. The relevant lines of script are copied below for the sake of clarity.
VolumeSold = sum(O.NetAmount)
UnitSold = median(O.Quantity)
show table "Top sellers" a3f4 tomato with
Name
VolumeSold as "Sold" unit:"$"
UnitSold as "Median"
order by VolumeSold desc
Lines 1 and 2 are performing aggregations in a way that is very similar to the scalar aggregations that we have reviewed in the previous section. However, the assignments are carried out with the equals sign
=
instead of the
:=
operator used for the scalar assignments. As a result, both
VolumeSold
and
UnitSold
are vectors which are part of the
Items
table. Thus, these variables are similar to columns in Excel, and they contain one value per item.
Both
sum
and
median
are special functions named aggregators in Envision. There are many other aggregators in Envision such as
avg
,
min
,
max
,
first
,
last
, etc. For the sake of concision, we are not reviewing all the aggregators available in Envision in this section, however for more details, you can check out the
full list of aggregators.
The result of these two aggregations is displayed in the table defined with the lines 3 to 7. The two vectors
VolumeSold
and
UnitSold
are listed as arguments after the keyword
with
, and their values are displayed in the table. Finally on line 7, the
order by
statement indicates that the table should be sorted starting with the items that have the highest
VolumeSold
values.
Readers who may be familiar with the
GROUP BY
syntax in SQL might be wondering how Envision knows which grouping should be used when computing the
sum
at line 1. By default, Envision performs a grouping using the columns which act as “keys” on the right side of the assignment. In the case of a variable that belongs to the
Items
table – the table whose name is implicit – the column acting as the (primary) key is the
Id
column. This explains why a per-item aggregation results from using the
=
sign.
Explicit aggregation groups with by
So far, the aggregations we have performed relied on the implicit aggregation patterns of Envision. However, the behavior of all the aggregators can be modified with an optional keyword
by
that is used to explicitly specify the applicable grouping. Let’s illustrate how this keyword
by
is used:
VolumeSold = sum(O.NetAmount)
SameVolumeSold = sum(O.NetAmount) by Id // Same!
Line 2 is assigned to a second vector named
SameVolumeSold
but the values of this vector are strictly identical to the one of the vector
VolumeSold
at line 1. In fact, the option
by Id
is also used, albeit implicitly, at line 1. Intuitively, when the option
by
is used, it is as if groups were created first according to the grouping target, and second, as if the aggregator were separately computed for every group.
The
by
option offers the possibility to compose fairly complex aggregations, as illustrated by the script at the beginning of this section. Let’s review the two lines of script where the aggregations are performed through
by
options.
avgRet := avg(distinct(O.Date) by O.Client)
avgQty := avg(sum(O.Quantity) by [O.Client, O.Date])
The aggregator
distinct
counts the number of distinct values observed within each group. At line 1, the lines of the
O
(for Orders) table are first grouped according to their respective
Client
values; and then for every client, the number of distinct ordering dates is counted. Intuitively, this aggregation can be interpreted as counting the number of times each client has returned. Then, this result is re-aggregated into a single scalar value with the
avg
aggregator that contains the inner
distinct
aggregation.
The scalar
avgQty
can be interpreted as the number of units purchased per basket. The calculation begins with a
sum() by
statement, however after the
by
option, we have not one but two variables separated by commas and listed within brackets:
[O.Client, O.Date]
. This syntax should be understood as follows: create a group for every pair of
Client
and
Date
. From a business perspective, we are treating all units purchased the same day as the same basket, which is a reasonable approximation for most situations. Finally, the external call to
avg
produces the final average over all the
sum
aggregates computed for all the pairs.
More generally, the
by
option supports an arbitrary number of variables following the syntax
sum(foo) by [arg1, arg2, …, argN]
. In practice, however, it is rather rare to encounter situations where it makes sense to group by more than 4 variables at once. Also, the order of the arguments has no impact on the resulting groups used to compute the aggregates.
Explicit aggregation within a table with group by
Sometimes, through aggregation, we can produce a new table which is more relevant to be used within a dashboard than the original non-aggregated table. Thus, Envision also supports the possibility of aggregating data directly from the tile declaration statement. The most direct way of visualizing this Envision capability is to aggregate the data which is to be displayed in a table. This is precisely what is being done in the script at the top of this page. Let’s have a second look below at the relevant lines copied below.
O.Brand = Brand
show table "Top Suppliers" a6f7 tomato with
Supplier
distinct(Category) as "Categories"
sum(O.NetAmount) as "Sold" unit:"$"
mode(O.Brand) if (O.Brand != "Fellowes") as "Most frequent brand sold"
group by Supplier
order by sum(O.NetAmount) desc
The
show table
statement goes from line 2 to line 8, and more specifically, the aggregation is specified on line 7 with the statement
group by
, which is exactly the same semantic as the
by
option that we have reviewed in the previous section. At this point, you might be wondering why Envision is not using the keyword
by
instead of
group by
if the semantic is the same. The
if
at line 6 is an aggregator filter, discussed in the next section.
The answer is simple: it is possible to use the
by
option within the list of expressions used after the
with
for a given tile (such a situation is not illustrated here however). Thus
group by
allows to differentiate between a
by
statement that is part of an expression passed to the tile, and a
group by
statement that applies to the tile as a whole. In other words, the
group by
applies to all the expressions listed after the
with
keyword, in contrast to the
by
statement that only has a local impact.
When
group by
is used, all expressions passed to the tile after the
with
keyword should offer the possibility of being aggregated. For example, the 1-line script below is incorrect because
Name
does not offer an aggregation pattern while
group by Supplier
is specified.
show table "WRONG!" with
Name
group by Supplier
However, if we modify this script by introducing an aggregator,
distinct
for example, then our Envision script becomes valid.
show table "CORRECT!" with
distinct(Name)
group by Supplier
The only exception to this rule is the aggregation of the target itself. In the script at the beginning of this section, we have a
group by Supplier
statement on line 7. On line 3, the variable
Supplier
is listed without any aggregator, yet the script remains valid, precisely because the grouping takes place according to the
Supplier
variable.
The need to expose an aggregator also applies to the
order by
statement on line 8. Indeed, the table is first aggregated by supplier, and only then is it sorted according to the
Id
column – unless another sort order is specified. As a result, Envision needs to compute one value per group in order to sort all these groups. This is exactly what we see happen with the statement
group by sum(O.NetAmount)
.
While we illustrated
group by
with the
table
tile, this syntax is not specific to the
table
tile, and the same pattern can be used with most other tiles. For example, it would be possible to extend the script listed at the very top of this page with a
barchart
tile aggregated by
Brand
:
show barchart "Sales by brand" with
sum(O.NetAmount)
group by Brand
The
group by
also supports multi-grouping, that is grouping by multiple vectors or multiple expressions, with a syntax that is identical to the one of the
by
option that we have covered previously.
Aggregation filters
The Envision aggregators also support
filters through the
if
keyword. The script below illustrates the use of such a filter at line 6:
O.Brand = Brand
show table "Top Suppliers" a6f7 tomato with
Supplier
distinct(Category) as "Categories"
sum(O.NetAmount) as "Sold" unit:"$"
mode(O.Brand) if (O.Brand != "Fellowes") as "Most frequent brand sold"
group by Supplier
order by sum(O.NetAmount) desc
Here the brand
Fellowes
is explicitly excluded from the report through the
if
filter. Filters are useful within tile statement, because they offer the possibility to filter each column separately. In contrast, a
where
filter positioned outside the tile statement would have filtered all the selected lines of the
O
table indifferently for all columns.
Filters can also be used inline:
TwoAndMore = sum(O.1) if (O.Quantity >= 2)