Statistical forecasting is deeply tricky and counterintuitive. I have already discussed why Lokad “must” fail against cos(x) and sin(x), and also why you should definitely not sum your forecasts. The key question of statistical forecasting is: how accurate are your forecasts? Although the question might appear simple, it hides many subtleties.

Indeed, how do you define accuracy? The best definition would be the difference between the forecasted values and the “real future” values. Yet there is a big issue: future values are unknown; otherwise, what would be the point of making a forecast? You might reply that not knowing the future values is not an issue, and propose the following scheme:

• Every week I am making a forecast about next week.
• Then, I wait one week. Now the “future” value is known and I can compare the two values.
• Repeat the process for 6 months and compute the average forecasting accuracy.
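
The three steps above can be sketched in a few lines of Python. This is only an illustration under assumptions of mine: the `naive_forecast` model (repeat last week's value), the made-up `weekly_sales` figures and the choice of mean absolute error as the accuracy measure are all hypothetical, not something prescribed by the scheme itself.

```python
from statistics import mean

def naive_forecast(history):
    """Hypothetical model: forecast next week as the last observed value."""
    return history[-1]

def rolling_mae(series, model, start):
    """Forecast each week from `start` onward using only earlier data,
    then compare with the value that became known one week later."""
    errors = []
    for t in range(start, len(series)):
        forecast = model(series[:t])   # the model only sees the past
        errors.append(abs(forecast - series[t]))
    return mean(errors)                # average error over the period

# Made-up weekly figures, for illustration only.
weekly_sales = [100, 104, 98, 110, 107, 111, 105, 115, 112, 118]
mae = rolling_mae(weekly_sales, naive_forecast, start=4)
print(f"mean absolute error: {mae:.2f}")
```

The important detail is `series[:t]`: at each step the model is given the past only, so every comparison is made against a value that was truly unknown when the forecast was produced.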

This little scheme looks nice, but unfortunately it is not an operational scheme. Indeed, when you are trying to do forecasting, the problem is not to “benchmark” a single forecasting model; it is to choose the best forecasting model among a large space of possible forecasting models. The forecasting model is not something that is known a priori: it is a particular mathematical function chosen among many other similar functions by a “statistical learning” algorithm.

The 6-month scheme presented above works to evaluate a single model. But what happens if you compare the accuracy of 1 million models over the same 6-month period? If you start trying a lot of models, then one of them is going to be a near-perfect fit for your historical data, i.e. its forecasts will closely match the data. Yet, since you have tried so many models, you can’t be sure it is a good model; it might be just pure luck. 1 million might look very large to you, but consider that you are not going to do it by hand. A computer is going to do it, and nowadays computers perform billions of operations per second.

You can think of this phenomenon as lottery forecasting: each model represents a lottery ticket. Trying models is like buying lottery tickets. If you start buying millions of tickets, then the probability of winning the lottery becomes very high. Yet, it has nothing to do with being able to forecast the winning ticket; it is just because you bought so many tickets.
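
To make the lottery effect concrete, here is a small simulation, a sketch under assumptions of mine rather than a description of any real forecasting engine. The series to predict is a pure coin flip (up or down each week), and each hypothetical “model” is just a fixed stream of random guesses identified by a seed. The constants `N_MODELS`, `IN_SAMPLE_WEEKS` and `FUTURE_WEEKS` are arbitrary illustration values.

```python
import random

IN_SAMPLE_WEEKS = 26   # roughly 6 months of weekly observations
FUTURE_WEEKS = 1000    # long horizon, used only to measure true skill
N_MODELS = 10_000      # hypothetical number of candidate models

def guesses(seed, n):
    """A hypothetical 'model': a fixed stream of up/down guesses."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n)]

def accuracy(predicted, actual):
    """Fraction of weeks where the guess matches the actual direction."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# The series itself is a coin flip: nothing here is forecastable.
truth = guesses(seed=1_000_000, n=IN_SAMPLE_WEEKS + FUTURE_WEEKS)
past, future = truth[:IN_SAMPLE_WEEKS], truth[IN_SAMPLE_WEEKS:]

# "Buy" many tickets and keep the one that best fits the past.
best_seed = max(range(N_MODELS),
                key=lambda s: accuracy(guesses(s, IN_SAMPLE_WEEKS), past))

# Same seed, longer stream: the first 26 guesses are unchanged.
winner = guesses(best_seed, IN_SAMPLE_WEEKS + FUTURE_WEEKS)
best_in_sample = accuracy(winner[:IN_SAMPLE_WEEKS], past)
out_of_sample = accuracy(winner[IN_SAMPLE_WEEKS:], future)
print(f"in-sample: {best_in_sample:.2f}, out-of-sample: {out_of_sample:.2f}")
```

The winning ticket looks impressive on the 26 weeks used to select it, yet on the weeks it was not selected on it does no better than a coin toss, because it never had any forecasting ability to begin with.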

If you ever have to choose forecasting software, make sure you don’t fall into that trap (shameless plug: I suggest going for Lokad; since we completely handle the design of the forecasting models, we handle this burden as well).

In the end, we are pretty much stuck with our initial problem: the accuracy of a forecasting algorithm is defined against data you don’t have, no matter how you look at the problem. OK, this is not a very helpful conclusion, since it looks like a dead end. Fortunately, modern statistics does offer solutions to this problem. Stay tuned…