Forecasting with attributes

Lokad aims to deliver forecasts that are as accurate as possible. However, for any single product, there is frequently not enough historical data to compute an accurate statistical forecast. Lokad addresses this concern by extensively establishing correlations between items in order to refine its forecasts. Yet correlations built on the demand history alone run into the same limitation, because that history is often too short or too sparse. Thus, Lokad also leverages the concepts of item hierarchy and item categories to further enhance its forecasts. On this page, we explain how such data can be passed on to Lokad's forecasting engine.

General syntax

The categories and the hierarchy can be communicated to the forecasting engine as named arguments, as illustrated by the following syntax:
Leadtime = forecast.leadtime(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  // other arguments snipped
  )
Demand = forecast.demand(
  category: C1, C2, C3, C4
  hierarchy: H1, H2, H3, H4
  label: PlainText
  // other arguments snipped
  )  

These arguments are optional, so category, hierarchy and label can be omitted. When category or hierarchy is provided, up to 4 vectors can be passed, but providing 1, 2 or 3 vectors is valid as well.

Forecasting perspective

Categories and hierarchy play a very similar role from the forecasting engine's perspective: they help the engine cope with sparse historical data. Indeed, for any given item (1), the number of past observations might be very limited, with possibly only a handful of observations available per year. In such situations, a forecast based solely on the item's own history may be quite inaccurate, because the estimate for this item would be very "noisy".

(1) The term item may refer to a product, a barcode, a SKU, a part ... depending on the specific business situation being considered. The item defines the granularity of interest from a forecasting perspective.

Lokad's forecasting engine addresses this concern by extensively leveraging correlations between items. However, since historical data is sparse, it is quite difficult to correlate items based on their past values alone. As a result, the forecasting engine also tries to leverage the relationships that exist between items based on their attributes or properties.

In retail, these properties are typically collected through PIM (Product Information Management) systems. Lokad, as a webapp, does not provide any PIM features, but the data processed by Lokad is frequently extracted from PIMs. These properties are interesting from a forecasting perspective because they implicitly embed a lot of information about the market itself - i.e. they point to the relevant market segments and highlight the differentiation factors between the items.

The hierarchy

The hierarchy is intended as a hierarchical (tree-like) organization of all the items. The forecasting engine supports up to 4 levels of hierarchy. When multiple hierarchy levels are present, they should be conveyed to the forecasting engine in decreasing order of importance - i.e. the first argument represents the top-most hierarchical level. The forecasting engine does not support multiple hierarchies.
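As an illustration of the "tree-like" requirement, the following is a minimal sketch in Python of a check that could be run on item data before it is uploaded. It is not part of Lokad or Envision: the function name `find_hierarchy_conflicts`, the item identifiers and the sample values are all hypothetical. The sketch flags any lower-level value that appears under more than one parent path. This page does not state how the forecasting engine treats such values, so the result should be read as a list of entries worth reviewing, not as errors.

```python
from collections import defaultdict

def find_hierarchy_conflicts(items):
    """Return lower-level values that appear under more than one parent path.

    `items` maps an item id to its hierarchy levels, top-most level first,
    e.g. ("Apparel", "Shoes"). The result maps (depth, value) to the sorted
    list of distinct parent paths under which that value was seen.
    """
    parents = defaultdict(set)  # (depth, value) -> parent paths seen so far
    for levels in items.values():
        for depth in range(1, len(levels)):
            parents[(depth, levels[depth])].add(tuple(levels[:depth]))
    return {
        key: sorted(paths)
        for key, paths in parents.items()
        if len(paths) > 1
    }

# Hypothetical sample data: item id -> (top level, second level)
items = {
    "sku-1": ("Apparel", "Shoes"),
    "sku-2": ("Apparel", "Shirts"),
    "sku-3": ("Sports", "Shoes"),  # "Shoes" sits under two different parents
}
conflicts = find_hierarchy_conflicts(items)
print(conflicts)
```

With the sample above, "Shoes" is reported because it appears under both "Apparel" and "Sports". Renaming one of the two (for example to a sports-specific label) would make the second level unambiguous.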

Most businesses already have a hierarchy in place for organizing their items. Hierarchical levels may be named market, segment, family, sub-family ... depending on the terminology used within the company. For example, for e-commerce businesses, the product hierarchy is typically visible on the company's website through the navigation menus.

The categories

The categories are intended for the various categorizations of items that are deemed relevant from a forecasting perspective. The categories are meant to complement the hierarchy by covering the more transverse properties of the goods being sold (e.g. the author in the case of books). Unlike hierarchy levels, categories are not expected to share any specific relationship with one another; hence the first category is not expected to sit above the second category.

In practice, categories can be used to reflect relatively diverse attributes of items such as the brand, material, or color. Categories can also be used as an alternative way to represent a secondary hierarchy. The forecasting engine supports up to 4 distinct category attributes.

The plain-text labels

The labels are intended as a loose categorization of the items through their plain-text descriptions. The forecasting engine has text-mining capabilities intended to support datasets where categories and hierarchies are lacking. Indeed, while hand-crafted categories and hierarchies are desirable as far as forecasting accuracy is concerned, such information is not always available, and manually re-entering it for thousands of items is impractical.

Fortunately, Lokad’s forecasting engine is capable of exploiting the plain-text description of the items themselves. While there is no strict limit on the size of the plain-text description, the forecasting engine is optimized for descriptions that contain fewer than 20 words. As a rule of thumb, the best results are obtained with one-line descriptions of the items. In practice, we observe that high-quality plain-text descriptions bring nearly as much accuracy improvement as well-defined categories and hierarchies.
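The two rules of thumb above (fewer than 20 words, one line per item) can be checked on the descriptions before they are passed as labels. The following Python sketch is one way to do so under stated assumptions: it runs outside Lokad, the function name `flag_labels` and the sample descriptions are hypothetical, and words are counted by simple whitespace splitting, which may differ from how the forecasting engine tokenizes text.

```python
def flag_labels(labels, max_words=20):
    """Return {item_id: word_count} for descriptions worth reviewing.

    A description is flagged when it has `max_words` words or more,
    or when it spans more than one line.
    """
    flagged = {}
    for item_id, text in labels.items():
        word_count = len(text.split())  # whitespace-separated words
        multi_line = len(text.strip().splitlines()) > 1
        if word_count >= max_words or multi_line:
            flagged[item_id] = word_count
    return flagged

# Hypothetical sample data: item id -> plain-text description
labels = {
    "sku-1": "Men's running shoe, blue, size 42",
    "sku-2": "word " * 25,                       # 25 words: too long
    "sku-3": "Cotton shirt\nMachine washable",   # two lines
}
flagged = flag_labels(labels)
print(flagged)
```

Since the page states there is no strict limit on description size, flagged descriptions are candidates for shortening, not invalid inputs.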

Prior knowledge about the market

Through the hierarchy and the categories, the forecasting engine expects to gain prior knowledge about the market itself. In our experience, any attributes or characteristics that are perceived to be insightful by a domain expert are likely to be relevant for the forecasting engine as well. For any sizeable dataset including over a hundred items, we strongly recommend including at least two attributes (possibly two hierarchy levels, or one hierarchy level plus one category). Providing such information yields substantial accuracy gains.

On the other hand, we strongly discourage generating synthetic categories - or hierarchy levels - that try to inform the forecasting engine about, say, seasonality, sales volumes or sales erraticity. If an attribute can be computed from the historical data itself, the forecasting engine can derive it on its own, with no additional assistance required.