List of aggregators in Envision

List of aggregators

Home » Resources » Here

The aggregators are similar to functions, but they benefit from a richer syntax with options. Below, we list the aggregators supported by Envision.

Aggregator usage

There are two main usages for aggregators, whether a join between tables is performed or not.

Simple aggregation

When there is no join, the general usage for an aggregator agg is:
u = agg(v) by [w1, w2, w3] sort [w4, w5] if c or x over [a .. b]
  • agg is the name of the aggregator.
  • by is an option indicating the groups to be used when aggregating. It is possible to aggregate against one or more values.
  • sort is an option indicating the sorting order to be used when aggregating. It is possible to sort against one or more values.
  • if is an aggregator filter. When present, all lines that are false are omitted from the aggregation.
  • or is an option indicating the value to return if the group is empty.
  • over is an option used to define an aggregation time-span with boundaries expressed in days and relative to the date of the target line.

When by is not specified, the grouping happens according to the context, typically using the left-side of the statement in the case of an assignment. It is possible to list multiple variables after by; if only one variable is used, then the parentheses are not required.

The sort option has typically no impact on most aggregators, e.g. max, because those operators do not depend on any sort order in the first place. However, we suggest to leverage this option with first and last aggregators.

When if is not specified, all lines are included which is equivalent to if true.

When or is not specified, default values are returned when the grouping is empty: 0 (zero) for number, false for booleans, the empty string for text and the zero distribution for distributions.

Aggregation with table joins

The general syntax for this second case is similar but different to the one described above:

u = agg(v) by [w1, w2, w3] sort [w4, w5] if c at [z1, z2, z3] or x

The at option indicates the groups used when joining the tables. Note that there must be a one-to-one correspondence between these groups and the groups used in the by option. As a consequence, the by option is mandatory and the number of entries specified in the by and in the at options must be identical.

Basic aggregators

The most commonly used aggregators are:

  • avg: numeric argument only; returns the average line value.
  • count: counts the number of lines that are either true (booleans) or non-zero (numbers) or non-empty (text).
  • distinct: counts the number of distinct values.
  • first: returns the value associated to the earliest line, as defined per the sort option.
  • last: returns the value associated to the latest line, as defined per the sort option.
  • max: returns the highest value.
  • median: behaves like the MEDIAN function is Excel.
  • min: returns the lowest value.
  • product: numeric argument only; returns the product of line values.
  • ratio: returns the percentage (between 0 and 1) of Booleans that are true, or of numbers that are non-zero. The ratio of an empty group is 1.
  • same: expects all the grouped values to be identical (fails otherwise) and returns this value.
  • sum: numeric argument only; returns the sum of line values.

Logical aggregators

The aggregators below returns a boolean value:

  • all: returns true if all the input lines are true and return false otherwise.
  • any: returns false if all the input lines are false and return true otherwise.
  • exists: returns true if there is at least one line passed to the aggregator.

Temporal aggregators

Those aggregators apply only to tables that benefit from a Date column:

  • latest (cross-table, special case): returns the most recent value observed. latest is not compatible with any of the available options.

Statistical aggregators

The aggregators below provides statistical calculation:

  • entropy: returns the Shannon Entropy of the group. The value is returned expressed in shannons.
  • mode: similar to the MODE function in Excel.
  • percentile (two arguments, special case): similar to the PERCENTILE function in Excel. The second argument is percentile target between 0 and 1.
  • stdev: similar to the STDEV function in Excel.
  • stdevp: similar to the STDEV.P function in Excel.

Random variables

Distribution can be generated through aggregation (see also algebra of distributions):

  • ranvar: returns the empirical random variable associated to the numeric observations.
  • sumr: returns the sum of additive convolutions of random variables. Just like sum() is the aggregator associated to the + operator, the aggregator sumr() is associated to the +* operator.