Joining tables with the By-At aggregation in Envision

Joining tables with the By-At












Home » Resources » Here

Envision provides an implicit mechanism to join tables whenever those tables contain an Id (item index) field; as it is frequently the case to represent sales history or purchase history. However, this implicit mechanism - called natural joins is not suitable for more complex situations. Thus, Envision offers a more general join mechanism referred to as the by-at. The by-at is a special kind of aggregation that performs arbitrary joins over tables.

An illustrating script

On this page, we use the sample data set which should be available from the /sample path in your Lokad account. If it's not already done, we suggest first to read the entry reading files with Envision, with a closer look at the file type expectations as this section is very relevant to better understand the following script.
read "/sample/Lokad_Items.tsv"
read "/sample/Lokad_Orders.tsv" as Orders
read "/sample/Lokad_Suppliers.tsv" as Suppliers[*]

Moq = same(Suppliers.Moq) by Suppliers.Supplier at Supplier
show table "Item list" a1d4 tomato with Id, Name, Supplier, Moq
The first three lines are our usual statements for reading files. On line 3, the type expectation Suppliers[*] specifies that the suppliers table isn't getting any type expectations - it's an independent table with no primary key and no foreign key. Indeed the file Lokad_Suppliers.tsv - unlike all the other .tsv files of the folder - does not contain an Id column. As this column is missing, Envision cannot implicitly join this table with the Items table. As a matter of fact, the supplier's table is aggregated per supplier with one supplier per line, while all the other files are detailed per item.

On line 5, we leverage the by-at aggregation, called this way because it uses the by and at keywords, to perform a join between the Items table and the Suppliers table. We could have similarly written:
Items.Moq = same(Suppliers.Moq) by Suppliers.Supplier at Items.Supplier
However, per Envision's usual conventions, the Items table name is omitted.

In this specific example, we have populated a vector Items.Moq with the data obtained from the table Suppliers. A join has been made between the two fields Items.Supplier and Suppliers.Supplier.

In the case that a supplier entry found in the Items table does not have a counterpart in the Suppliers table, a zero value will be used. However, through the optional or keyword, following the usual behavior of the Envision aggregators, other default values could be used. We could have written:
Items.Moq = same(Suppliers.Moq) by Suppliers.Supplier at Items.Supplier or 0
On line 6, the resulting Items table is displayed in order to illustrate the calculation performed by the by-at the line above. As expected, each item is associated with the MOQ value of its respective supplier.

General syntax for the by-at

The general syntax for the by-at is:
T1.A = agg(T2.B) by [T2.X, T2.Y, T2.Z] if E at [T1.X, T1.Y, T1.Z] or T1.C
Much like the usual aggregators in Envision, the by-at offers the possibility to perform a tuple match, that is, matching n fields at a time. Then, all the usual Envision aggregators can be used: sum, but also min, median, same, etc. The or block can be omitted just like it is done with regular aggregation.

The semantic of the by-at is as follows:

  • Build all distinct groups of tuples for the by [T2.X, T2.Y, T2.Z], filtering with the condition E if it is present. For each one of those groups, if at least one matching group exists on the left, compute the aggregate agg(T2.B).
  • Build all distinct groups of tuples for the at [T1.X, T1.Y, T1.Z]. For each one of those, assign the matching aggregate from the right.
  • For the groups on the at that don't have a matching group on the right, then fall back on the default T1.C value.

Just like regular aggregators, the or statement is optional, and provides the value to be used as a result of the aggregation if the group is empty.

Tips for the by-at

The by-at is a powerful construct that can be leveraged in many situations, not just to join tables from the usual SQL perspective.

A table can be joined to itself: For example,
Orders.DaySum = sum(Orders.Quantity) by Orders.Date at Orders.Date
illustrates how to compute daily totals over the Orders table without resorting to the Day table.

Calendar tables Day and Week can be joined to: For example,
Day.Shift = sum(Orders.Quantity) by [Orders.Id, Orders.Date - 1] at [Day.Id, Day.Date]
illustrates how to shift the quantities from one day through a by-at.

Translating the by-at into SQL

For the readers who would already be familiar with SQL, then it can be noted that the Envision expression:
Moq = same(Suppliers.Moq) by Suppliers.Supplier at Supplier or defaultMoq
has the following equivalent in SQL, which uses an outer left join: UPDATE Items LEFT OUTER JOIN Suppliers ON Items.Supplier = Suppliers.Supplier SET Moq = COALESCE(Suppliers.Moq, defaultMoq) The Envision syntax emphasizes Excel-like calculations more than the relational algebra itself.