Forecasting with categories and a hierarchy

Lokad's intention is to deliver forecasts that are as accurate as possible. However, for any single product, there is frequently not enough historical data to compute an accurate statistical forecast for that product alone. Lokad addresses this concern by extensively establishing correlations between items in order to refine its forecasts. Yet correlations built from the demand history are limited as well, for the same reason: the history is often too short or too sparse. Thus, Lokad also leverages item hierarchies and item categories to further enhance its forecasts. On this page, we explain how such data can be passed to Lokad's forecasting engine.

General syntax

The categories and the hierarchy can be communicated to the forecasting engine as named arguments, as illustrated by the following syntax:
Leadtime = call forecast.leadtime(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  // other arguments snipped
  )
Demand = call forecast.demand(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  // other arguments snipped
  )  

Both arguments are optional: category, hierarchy, or both can be omitted. When an argument is provided, it accepts up to 4 vectors; providing 1, 2 or 3 vectors is valid as well.
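The argument rules above can be mirrored in a small Python sketch (not Envision), for instance to sanity-check attribute columns before writing the script. The helper `check_attribute_vectors` and the representation of each vector as a Python list aligned on items are hypothetical; this is not part of Envision or of Lokad's API.

```python
from typing import Optional, Sequence

MAX_VECTORS = 4  # up to 4 vectors per argument, as documented above


def check_attribute_vectors(
    name: str,
    vectors: Optional[Sequence[Sequence[str]]],
    item_count: int,
) -> int:
    """Hypothetical pre-flight check for a `category` or `hierarchy` argument.

    Returns the number of vectors supplied (0 when the argument is omitted).
    Each vector must hold exactly one value per item.
    """
    if vectors is None:
        return 0  # the argument is optional and may be omitted entirely
    if not 1 <= len(vectors) <= MAX_VECTORS:
        raise ValueError(f"{name}: expected 1 to {MAX_VECTORS} vectors, got {len(vectors)}")
    for i, vector in enumerate(vectors, start=1):
        if len(vector) != item_count:
            raise ValueError(
                f"{name}: vector {i} has {len(vector)} values for {item_count} items"
            )
    return len(vectors)


# Usage: three items, two hierarchy levels, no categories.
hierarchy = [["Apparel", "Apparel", "Shoes"], ["Shirts", "Pants", "Boots"]]
print(check_attribute_vectors("hierarchy", hierarchy, 3))  # 2
print(check_attribute_vectors("category", None, 3))  # 0
```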

Forecasting perspective

Categories and hierarchy play a very similar role from the forecasting engine's perspective: they help the engine cope with sparse historical data. Indeed, for any given item (1), the number of past observations might be very limited, possibly only a handful per year. In such situations, a forecast based solely on the item's own history may be quite inaccurate, because the estimate for this item would be very "noisy".

(1) The term "item" may refer to a product, a barcode, a SKU, a part, etc., depending on the specific business situation being considered. The item defines the granularity of interest from a forecasting perspective.

Lokad's forecasting engine addresses this concern by extensively leveraging correlations between items. However, since historical data is sparse, it is quite difficult to correlate items based on their past values alone. As a result, the forecasting engine also tries to leverage the relationships that exist between items based on their attributes or properties.
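To build intuition for why shared attributes help with sparse data, consider a simple partial-pooling estimate: an item's own average demand is blended with the average demand of the items sharing its attribute value, and the item's own history gets more weight as it grows. This is textbook shrinkage, shown in Python purely for illustration; it is not Lokad's forecasting method, and the function name and the `prior_strength` parameter are hypothetical.

```python
from statistics import mean


def pooled_estimate(item_obs, group_obs, prior_strength=5.0):
    """Shrink an item's mean demand toward its group's mean demand.

    item_obs: the item's own observations (possibly very few).
    group_obs: observations of items sharing the same attribute value
               (e.g. the same hierarchy node or the same category).
    prior_strength: hypothetical knob, expressed as a number of "virtual"
                    observations granted to the group mean.
    """
    group_mean = mean(group_obs)
    n = len(item_obs)
    if n == 0:
        return group_mean  # no history at all: rely entirely on the group
    weight = n / (n + prior_strength)  # more history -> trust the item more
    return weight * mean(item_obs) + (1 - weight) * group_mean


# An item seen only twice (mean 5) within a group whose mean is 2.
print(pooled_estimate([10, 0], [1, 3, 2, 2, 4, 0], prior_strength=2.0))  # 3.5
```

With only two observations, the noisy item mean of 5 is pulled halfway toward the group mean of 2; an item with a long history would stay close to its own average.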

In retail, these properties are typically collected through PIM (Product Information Management) systems. Lokad, as a webapp, does not provide any PIM features, but the data processed by Lokad is frequently extracted from PIMs. These properties are interesting from a forecasting perspective because they implicitly embed a lot of information about the market itself: they point to the relevant market segments and highlight the factors that differentiate items from one another.

The hierarchy

The hierarchy is intended as a hierarchical (tree-like) organization of all the items. The forecasting engine supports up to 4 levels of hierarchy. When multiple hierarchy levels are present, they should be passed to the forecasting engine in decreasing order of importance, i.e. the first argument represents the top-most hierarchical level. The forecasting engine does not support multiple hierarchies.
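Because the hierarchy is meant to be a tree, each value at a given level should sit under exactly one value of the level above. The Python sketch below checks that property on hierarchy vectors listed from the top-most level down. Representing the levels as parallel lists of strings is an assumption made for illustration, and `find_hierarchy_conflicts` is a hypothetical helper, not part of Envision.

```python
from collections import defaultdict


def find_hierarchy_conflicts(levels):
    """Return the values that break the tree structure.

    levels: vectors ordered from the top-most level down (H1, H2, ...),
            each holding one value per item.
    Returns (level_index, child_value, sorted_parents) for every value
    that appears under more than one parent in the level above.
    """
    conflicts = []
    for depth in range(1, len(levels)):
        parents = defaultdict(set)
        for parent, child in zip(levels[depth - 1], levels[depth]):
            parents[child].add(parent)
        for child, seen in parents.items():
            if len(seen) > 1:
                conflicts.append((depth, child, sorted(seen)))
    return conflicts


h1 = ["Apparel", "Apparel", "Shoes", "Shoes"]
h2 = ["Shirts", "Accessories", "Boots", "Accessories"]
print(find_hierarchy_conflicts([h1, h2]))
# [(1, 'Accessories', ['Apparel', 'Shoes'])]
```

Checking each pair of adjacent levels is enough: if every value has a single parent, the levels form a tree. When a label such as "Accessories" is reused under several parents, one option is to make it unique by prefixing it with its parent.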

Most businesses already have a hierarchy in place for organizing their items. Hierarchical levels may be named market, segment, family, sub-family, etc., depending on the terminology used within the company. For example, in e-commerce businesses, the product hierarchy is typically visible on the company's website through the navigation menus.

The categories

The categories are intended for the various categorizations of items that are deemed relevant from a forecasting perspective. They are meant to complement the hierarchy with the more transverse properties of the goods being sold (e.g. the author, in the case of books). Unlike hierarchy levels, categories are not expected to share any specific relationship with one another; hence the first category is not expected to sit above the second one.

In practice, categories can be used to reflect relatively diverse attributes of items such as the brand, material, or color. Categories can also be used as an alternative way to represent a secondary hierarchy. The forecasting engine supports up to 4 distinct category attributes.
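Since category attributes are independent of one another, each of them defines its own grouping of items, and groups coming from different attributes can overlap freely. The Python sketch below builds, for each category attribute, the groups of items sharing the same value; the helper name, attribute names (brand, color) and item codes are made up for illustration.

```python
from collections import defaultdict


def category_groups(items, categories):
    """Group items by value, separately for each category attribute.

    items: item identifiers.
    categories: mapping from attribute name to a vector holding one value
                per item (aligned with `items`).
    Returns {attribute: {value: [items, ...]}}. No nesting is assumed
    between attributes, so groups from different attributes may overlap.
    """
    result = {}
    for name, values in categories.items():
        groups = defaultdict(list)
        for item, value in zip(items, values):
            groups[value].append(item)
        result[name] = dict(groups)
    return result


items = ["A1", "A2", "B1", "B2"]
groups = category_groups(
    items,
    {"brand": ["Acme", "Acme", "Bolt", "Bolt"],
     "color": ["red", "blue", "red", "blue"]},
)
print(groups["color"]["red"])  # ['A1', 'B1']
```

Here the red items span both brands, so color cuts across the brand grouping rather than nesting under it, which is the kind of transverse attribute categories are meant for.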

Prior knowledge about the market

Through the hierarchy and the categories, the forecasting engine expects to gain prior knowledge about the market itself. In our experience, any attributes or characteristics that are perceived to be insightful by a domain expert are likely to be relevant for the forecasting engine as well. For any sizeable dataset including over a hundred items, we strongly recommend including at least two attributes (possibly two hierarchy levels, or one hierarchy level plus one category). Providing such information yields substantial accuracy gains.

On the other hand, we strongly discourage generating synthetic categories (or hierarchy levels) that attempt to inform the forecasting engine about, say, seasonality, sales volumes or sales erraticity. If an attribute can be computed from the historical data itself, the forecasting engine can compute it on its own, with no additional assistance required.