Forecasting is almost always a difficult exercise, but there is one area in general merchandise retail that is considered an order of magnitude more complicated than the rest: promotion planning. At Lokad, promotion planning is one of the frequent challenges we tackle for our largest clients, typically through ad-hoc Big Data missions.

This post is the first of a series on promotion planning. We are going to cover the various challenges that retailers face when forecasting promotional demand, and give some insight into the solutions we propose.

The first challenge faced by retailers when tackling promotions is the quality of the data. This problem is usually vastly underestimated, by mid-size and large retailers alike. Yet, without high-quality data about past promotions, the whole planning initiative faces a Garbage In, Garbage Out problem.

Data quality problems among promotional records

The quality of promotion data is typically poor, or at least much worse than the quality of the regular sales data. A promotional record, at the most disaggregated level, consists of an item identifier, a store identifier, a start date and an end date, plus all the dimensions describing the promotion itself.
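To make the shape of such a record concrete, here is a minimal sketch in Python. The class name, the field names and the keys used in the dimensions dictionary are our own illustrative assumptions, not a format used by any particular retailer.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


@dataclass(frozen=True)
class PromotionRecord:
    """One promotion for one item in one store (most disaggregated level)."""
    item_id: str       # item identifier (hypothetical placeholder below)
    store_id: str      # store identifier
    start_date: date   # first day of the promotion
    end_date: date     # last day of the promotion, inclusive
    # Dimensions describing the promotion itself; the keys are assumptions.
    dimensions: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject records whose dates are obviously inconsistent.
        if self.end_date < self.start_date:
            raise ValueError("end_date precedes start_date")

    def duration_days(self) -> int:
        """Number of days covered, counting both start and end dates."""
        return (self.end_date - self.start_date).days + 1


record = PromotionRecord(
    item_id="ITEM-001",
    store_id="STORE-042",
    start_date=date(2013, 3, 4),
    end_date=date(2013, 3, 10),
    dimensions={"mechanism": "price_cut", "facing": "end_aisle"},
)
```

Keeping the descriptive dimensions in an open dictionary is only one possible design; a retailer with a stable set of promotion mechanisms might prefer explicit fields.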

Those promotional records have numerous problems:

  • Records exist, but the store did not fully implement the promotion plan, especially with regard to the facing.
  • Records exist, but the promotion never happened anywhere in the network. Indeed, promotion deals are typically negotiated 3 to 6 months in advance with suppliers. Sometimes a deal gets canceled with only a few weeks’ notice, but the corresponding promotional data is never cleaned up.
  • Off-the-record initiatives from stores, such as moving an overstocked item to an end-aisle shelf, are not recorded. Facing is one of the strongest factors driving the promotional uplift, and should not be underestimated.
  • Details of the promotion mechanisms are not accurately recorded. For example, the presence of custom packaging and the structured description of that packaging are rarely preserved.

After having observed similar issues on many retailers’ datasets, we believe that the explanation is simple: there is little or no operational imperative to correct promotional records. Indeed, if the sales data are off, it creates so many operational and accounting problems that fixing them becomes the number one priority very quickly.

In contrast, promotional records can remain wildly inaccurate for years. As long as nobody attempts to produce some kind of forecasting model based on those records, inaccurate records have a negligible negative impact on retailer operations.

The primary solution to those data quality problems is to put data quality processes in place, and to empirically validate how resilient those processes are when facing live store conditions.

However, even the best process cannot fix broken past data. As two years of good promotional data are typically required to get decent results, it’s important to invest early and aggressively in the historization of promotional records.

Structural data problems

Beyond issues with promotional records, the accurate planning of promotions also suffers from broader and more insidious problems related to the way the information is collected in retail.

Truncating the history: Most retailers do not indefinitely preserve their sales history. Usually “old” data get deleted following two rules:

  • if the record is older than 3 years, then delete the record.
  • if the item has not been sold for 1 year, then delete the item, and delete all the associated sales records.
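To illustrate what these two rules discard, here is a minimal sketch that applies them to a toy sales history. The function name, the tuple layout and the sample data are hypothetical; real purge jobs differ from one retailer to the next.

```python
from datetime import date
from typing import Dict, List, Tuple

Sale = Tuple[str, date, int]  # (item_id, sale_date, quantity)


def truncate_history(sales: List[Sale], today: date,
                     max_age_days: int = 3 * 365,
                     max_idle_days: int = 365) -> List[Sale]:
    """Apply both deletion rules and return the records that survive."""
    # Last sale date per item, needed for the "not sold for 1 year" rule.
    last_sold: Dict[str, date] = {}
    for item_id, sale_date, _ in sales:
        if item_id not in last_sold or sale_date > last_sold[item_id]:
            last_sold[item_id] = sale_date

    kept = []
    for item_id, sale_date, qty in sales:
        too_old = (today - sale_date).days > max_age_days
        item_idle = (today - last_sold[item_id]).days > max_idle_days
        if not too_old and not item_idle:
            kept.append((item_id, sale_date, qty))
    return kept


today = date(2014, 1, 1)
sales = [
    ("A", date(2010, 6, 1), 5),   # older than 3 years: deleted
    ("A", date(2013, 12, 1), 3),  # recent sale of an active item: kept
    ("B", date(2012, 6, 1), 7),   # item idle for over a year: deleted
]
kept = truncate_history(sales, today)
```

Note that the record for item B is less than 3 years old, yet it vanishes with the item. If B was a seasonal or promotional item, that is exactly the kind of history a promotion forecast would have needed.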

Obviously, depending on the retailer, thresholds might differ, but while most large retailers have been around for decades, it’s exceptional to find a non-truncated 5-year sales history. Those truncations are typically based on two false assumptions:

  • storing old data is expensive: Storing the entire 10-year sales history of Walmart, down to the receipt level, can be done for less than 1000 USD of storage per month, and your company is certainly smaller than Walmart. Data storage is not just ridiculously cheap now, it was already ridiculously cheap 10 years ago, as far as retail networks are concerned.
  • old data serve no purpose: While 10-year-old data certainly serve no operational purpose, from a statistical viewpoint, even 10-year-old data can be useful to refine the analysis of many problems. Simply put, a long history gives a much broader range of possibilities to validate the performance of forecasting models and to avoid overfitting problems.
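The storage argument is simple arithmetic, sketched below. Every input is an illustrative assumption (record size, yearly volume, price per gigabyte), not a measured figure for Walmart or any other retailer; plug in your own numbers.

```python
def monthly_storage_cost_usd(lines_per_year: float, years: int,
                             bytes_per_line: int,
                             usd_per_gb_month: float) -> float:
    """Rough monthly cost of keeping the full receipt-level history."""
    total_gb = lines_per_year * years * bytes_per_line / 1e9
    return total_gb * usd_per_gb_month


# Hypothetical retailer: 1 billion receipt lines per year, 20 bytes per
# line once compactly encoded, 10 years kept, 0.02 USD per GB per month.
cost = monthly_storage_cost_usd(
    lines_per_year=1e9,
    years=10,
    bytes_per_line=20,
    usd_per_gb_month=0.02,
)
```

Under these assumed inputs the history weighs 200 GB. Even if the real record size or price were ten times higher, the cost would stay small next to the value of the history being deleted.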

Replacing GTINs with in-house product codes: Many retailers preserve their sales history encoded with alternative item identifiers instead of the native GTINs (aka UPC or EAN13, depending on whether you are in North America or Europe). The usual argument is that replacing GTINs with ad-hoc identification codes makes it easier to track GTIN substitutions and helps to avoid a segmented history.

Yet, GTIN substitutions are not always accurate, and incorrect entries become near-impossible to track down. Worse, once two GTINs have been merged, the original data are lost: it’s no longer possible to reconstruct the two original sets of sales records.

Instead, it’s a much better practice to preserve GTIN entries, because GTINs represent the physical reality of the information being collected by the POS (point of sale). Then, the hints for GTIN substitutions should be persisted separately, making it possible to revise associations later on if the need arises.
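As a minimal sketch of this separation, the raw sales below stay keyed by the scanned GTIN, while substitutions live in their own mapping that is applied only at read time. The function names and the placeholder codes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RawSale = Tuple[str, str, int]  # (scanned GTIN, ISO date, quantity)


def resolve(gtin: str, substitutions: Dict[str, str]) -> str:
    """Follow substitution hints until a GTIN with no successor is found."""
    seen = set()
    while gtin in substitutions and gtin not in seen:
        seen.add(gtin)  # guards against accidental cycles in the hints
        gtin = substitutions[gtin]
    return gtin


def merged_history(raw_sales: List[RawSale],
                   substitutions: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Aggregate quantities per (resolved GTIN, date); raw data is untouched."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for gtin, day, qty in raw_sales:
        totals[(resolve(gtin, substitutions), day)] += qty
    return dict(totals)


raw_sales = [("GTIN-OLD", "2013-05-01", 4), ("GTIN-NEW", "2013-05-01", 6)]

# Substitution hint stored apart from the sales: GTIN-OLD became GTIN-NEW.
merged = merged_history(raw_sales, {"GTIN-OLD": "GTIN-NEW"})

# If the hint turns out to be wrong, drop it and the split is recovered.
unmerged = merged_history(raw_sales, {})
```

Because the merge is computed on demand, a wrong substitution costs nothing to undo, which is precisely what is lost when the history is stored under a merged in-house code.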

Not preserving the packaging information: In food retail, many products come in a variety of distinct formats: from individual portions to family portions, from single bottles to packs, from regular format to +25% promotional formats, etc.

Preserving the information about those formats is important because, for many customers, an alternative format of the same product is frequently a good substitute when the usual format is missing.

Yet again, while it might be tempting to merge the sales into some kind of meta-GTIN where all size variants have been merged, there might be exceptions, and not all sizes are equal substitutes (e.g. 18g Nutella vs 5kg Nutella). Thus, the packaging information should be preserved, but kept apart from the raw sales.
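One way to picture this is a packaging table stored next to the sales rather than inside them, from which substitution candidates are derived on demand. The sketch below is only an illustration: the size-ratio rule and its threshold are our own assumption, and the codes and weights are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Packaging:
    product: str   # product shared by all formats
    grams: float   # net content of this particular format


def plausible_substitutes(gtin: str, packaging: Dict[str, Packaging],
                          max_ratio: float = 4.0) -> List[str]:
    """Other formats of the same product whose size is within max_ratio."""
    ref = packaging[gtin]
    result = []
    for other, pack in packaging.items():
        if other == gtin or pack.product != ref.product:
            continue
        ratio = max(pack.grams, ref.grams) / min(pack.grams, ref.grams)
        if ratio <= max_ratio:  # assumed rule, to be tuned on real data
            result.append(other)
    return sorted(result)


# Packaging table kept apart from the raw sales (hypothetical codes).
packaging = {
    "G-18": Packaging("spread", 18),
    "G-400": Packaging("spread", 400),
    "G-750": Packaging("spread", 750),
    "G-5000": Packaging("spread", 5000),
}
candidates = plausible_substitutes("G-400", packaging)
```

With this table, the 400g jar is paired with the 750g jar but not with the 18g portion or the 5kg tub, something a single meta-GTIN could not express.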

Data quality, a vastly profitable investment

Data quality is one of the few areas where investments are typically rewarded tenfold in retail. Better data improve all downstream results, from the most naïve to the most advanced methods. In theory, data quality should be subject to diminishing returns. However, our own observations indicate that, except for a few rising stars of online commerce, most retailers are very far from the point where investing more in data quality would not be vastly profitable.

Also, unlike building advanced predictive models, data quality does not require complicated technologies, but rather a lot of common sense and a strong sense of simplicity.

Stay tuned: next time, we will discuss the process challenges of promotion planning.