Lokad has a rather unique approach to forecasting: we leverage all the data we have to produce every single forecast. While talking with a customer, I was asked whether Lokad would mix Chinese food data with sports bar data. The customer was worried that we might mix data that exhibit very different sales patterns, even though both businesses belong to the same food and drink retail industry.

In fact, the more abstract question was: how refined is the notion of industry segments within Lokad? Well, the real answer is that we don’t have any notion of industry segments in Lokad. And, in my humble opinion, it would be a really poor idea to even try to improve statistical forecasts based on such information:

  • No matter how refined your classification of industry segments is, it remains a very poor approximation of reality. Industry segments change all the time, and who knows whether sales of Thai food exhibit the same patterns as sales of Vietnamese food? In a way, this is why dmoz.org is massively less popular and useful than search engines.
  • Even a small point of sale usually generates a dozen time-series (one for each product being sold) over at least 200 working days per year. Thus, one year of history already represents more than 1,000 numbers to be exploited. The information contained in those numbers dwarfs the amount of information contained in the classification, which would typically be represented as just a few numbers.
  • Creating a classification that serves forecasting purposes is probably as hard as, if not harder than, the forecasting task itself. Indeed, an efficient classification would have to tell whether business segments will exhibit the same patterns in the future.
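
As a back-of-the-envelope check of the second point, the arithmetic can be written out. The series count and the number of working days come from the figures above; the size of the classification (three numbers) is only an assumption made for the sake of illustration.

```python
# Figures quoted in the text: a dozen time-series, at least 200 working days per year.
series_per_point_of_sale = 12
working_days_per_year = 200

# One daily observation per series and per working day.
observations_per_year = series_per_point_of_sale * working_days_per_year

# Assumption (illustration only): the segment label is encoded as three numbers.
classification_numbers = 3

ratio = observations_per_year / classification_numbers
print(observations_per_year, ratio)
```

Under these assumptions, a single year of history from one small point of sale carries several hundred times more numbers than the segment label does.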

Instead of relying on such a manual classification, Lokad relies directly on statistical correlations: if some data can be used to improve the forecast under consideration, then use it; if it cannot, then just ignore it. With proper statistical tools, more data does not hurt, and storing data has never been cheaper.
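
To make the "use it or ignore it" idea more concrete, here is a minimal sketch. It is not Lokad's actual method: it assumes a plain Pearson correlation and an arbitrary threshold of 0.8, and the function names and sales figures are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def select_related_series(target, candidates, threshold=0.8):
    """Keep the candidate series whose absolute correlation with the target
    reaches the threshold; the threshold value is an arbitrary assumption."""
    return [name for name, series in candidates.items()
            if abs(pearson(target, series)) >= threshold]

# Made-up weekly sales, for illustration only.
target = [10, 12, 15, 20, 18, 14]
candidates = {
    "similar_shape": [20, 25, 29, 41, 37, 27],
    "unrelated": [7, 3, 7, 3, 7, 3],
}
kept = select_related_series(target, candidates)
print(kept)
```

Note that no industry label appears anywhere in this sketch: a series is kept or discarded purely on how it behaves relative to the series being forecast.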

Back to the initial Chinese food vs. sports bar example, the reality is more complex than it seems. Some products sold in both places, let’s say ice cream, might exhibit similar sales patterns because they depend heavily on the weather, while others, let’s say beer, might behave very differently. Lokad relies on automated processes to validate the correlations for every single forecast, as opposed to doing it once for a whole industry segment.
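
A per-product check of this kind could look like the sketch below. Again, this is only an illustration under strong assumptions: the forecast is a naive historical average, the validation is a single holdout comparison, and every name and number is hypothetical.

```python
def mean_abs_error(forecast, actuals):
    """Mean absolute error of a single-value forecast against held-out actuals."""
    return sum(abs(forecast - a) for a in actuals) / len(actuals)

def pooling_helps(own_history, other_history, holdout):
    """True if adding the other venue's history lowers the holdout error
    of a naive average forecast (an assumption made for this sketch)."""
    own_forecast = sum(own_history) / len(own_history)
    pooled = own_history + other_history
    pooled_forecast = sum(pooled) / len(pooled)
    return (mean_abs_error(pooled_forecast, holdout)
            < mean_abs_error(own_forecast, holdout))

# Made-up daily sales: (own history, other venue's history, holdout).
products = {
    "ice cream": ([8, 14], [10, 11, 10, 11], [10, 11, 10]),
    "beer": ([30, 32], [5, 6, 5, 6], [31, 30, 32]),
}

# The decision is taken product by product, not once for the whole segment.
decisions = {name: pooling_helps(own, other, holdout)
             for name, (own, other, holdout) in products.items()}
print(decisions)
```

With these made-up numbers, the other venue's data is kept for ice cream and ignored for beer, even though both products come from the same pair of businesses.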

Introducing industry segments in Lokad would be like reverting from full text search to a hierarchical directory: time-consuming, and, in the end, much less efficient.