Joining tables with the left-by in Envision

Joining tables with the Left-By












Home » Resources » Here

Envision provides an implicit mechanism to join tables whenever those tables contain an Id (item index) field; as it is frequently the case to represent sales history or purchase history. However, this implicit mechanism - called natural joins is not suitable for more complex situations. Thus, Envision offers a more general join mechanism referred to as the left-by. Through the left-by, it's possible to perform arbitrary joins over tables.

An illustrating script

On this page, we use the sample data set which should be available from the /sample path in your Lokad account. If it's not already done, we suggest first to read the entry reading files with Envision, with a closer look at the file type expectations as this section is very relevant to better understand the following script.
read "/sample/Lokad_Items.tsv"
read "/sample/Lokad_Orders.tsv" as Orders
read "/sample/Lokad_Suppliers.tsv" as Suppliers[*]

Moq by Supplier = same(Suppliers.Moq) by Suppliers.Supplier
show table "Item list" a1d4 tomato with Id, Name, Supplier, Moq
The first three lines are our usual statements for reading files. On line 3, the type expectation Suppliers[*] specifies that the suppliers table isn't getting any type expectations - it's an independent table with no primary key and no foreign key. Indeed the file Lokad_Suppliers.tsv - unlike all the other .tsv files of the folder - does not contain an Id column. As this column is missing, Envision cannot implicitly join this table with the Items table. As a matter of fact, the supplier's table is aggregated per supplier with one supplier per line, while all the other files are detailed per item.

On line 5, we leverage the left-by construct to perform a join between the Items table and the Suppliers table. We could have similarly written:
Items.Moq by Items.Supplier = same(Suppliers.Moq) by Suppliers.Supplier
However, per Envision's usual conventions, the Items table name is omitted. This statement is referred to as a left-by because the by keyword appears on the left side of the assignment, which is somewhat peculiar, as the by keyword usually appears on the right side of the assignment along with an aggregator.

In this specific example, we have populated a vector Items.Moq with the data obtained from the table Suppliers. A join has been made between the two fields Items.Supplier and Suppliers.Supplier.

In the case that a supplier entry found in the Items table does not have a counterpart in the Suppliers table, a zero value will be used. However, through the optional or keyword, following the usual behavior of the Envision aggregators, other default values could be used. We could have written:
Items.Moq by Items.Supplier = same(Suppliers.Moq) or 0 by Suppliers.Supplier
On line 6, the resulting Items table is displayed in order to illustrate the calculation performed by the left-by the line above. As expected, each item is associated with the MOQ value of its respective supplier.

General syntax for the left-by

The general syntax for the left-by is:
T1.A by [T1.X, T1.Y, T1.Z] = agg(T2.B) or T1.C by [T2.X, T2.Y, T2.Z]
Much like the usual aggregators in Envision, the left-by offers the possibility to perform a tuple match, that is, matching n fields at a time. Then, all the usual Envision aggregators can be used: sum, but also min, median, same, etc. The or block can be omitted just like it is done with regular aggregation.

The semantic of the left-by is as follows:

  • Build all distinct groups of tuples on the right [T2.X, T2.Y, T2.Z]. For each one of those groups, if at least one matching group exists on the left, compute the aggregate agg(T2.B).
  • Build all distinct groups of tuples on the left [T1.X, T1.Y, T1.Z]. For each one of those, assign the matching aggregate from the right.
  • For the groups on the left that don't have a matching group on the right, then fall back on the default T1.C value.

Just like regular aggregators, the or statement is optional, and provides the value to be used as a result of the aggregation if the group is empty.

Tips for the left-by

The left-by is a powerful construct that can be leveraged in many situations, not just to join tables from the usual SQL perspective.

A table can be left-joined to itself: For example,
Orders.DaySum by Orders.Date = sum(Orders.Quantity) by Orders.Date
illustrates how to compute daily totals over the Orders table without resorting to the Day table.

Calendar tables Day and Week can be left-joined: For example,
Day.Shift by [Day.Id, Day.Date] = sum(Orders.Quantity) by [Orders.Id, Orders.Date - 1]
illustrates how to shift the quantities from one day through a left-by.

Translating the left-by into SQL

For the readers who would already be familiar with SQL, then it can be noted that the Envision expression:
Moq by Supplier = same(Suppliers.Moq) or defaultMoq by Suppliers.Supplier
has the following equivalent in SQL: UPDATE Items LEFT OUTER JOIN Suppliers ON Items.Supplier = Suppliers.Supplier SET Moq = COALESCE(Suppliers.Moq, defaultMoq) The Envision syntax emphasizes Excel-like calculations more than the relational algebra itself.