Forecast species: classification vs. regression
The word forecasting covers a very wide spectrum of processes, technologies, and even markets. In the past, we introduced the worlds of forecasting software, distinguishing between:
- Deterministic simulation software
- Expert aggregation software
- Statistical forecasting software
Lokad falls into the last category, as our technology is purely statistical. Yet Lokad is far from covering the entire statistical spectrum on its own. Two broad categories of forecasts exist in statistical forecasting (*):
- Classification forecasts
- Regression forecasts
(*) We are oversimplifying here for the sake of clarity, as statistical learning subtleties are well beyond the scope of this modest blog post.
Classification attempts to separate (or classify) objects according to their properties. The illustration below, from Tomasz Malisiewicz, shows a classification task that tries to separate images picturing a chair from images picturing a table.
Illustration from tombone’s blog
The output of a classification is binary (or rather discrete): objects get assigned to classes with more or less confidence, i.e. higher or lower probabilities.
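To make the idea of a discrete output with a confidence concrete, here is a minimal sketch in Python. It is not how Lokad or the illustrated image classifier works: the two features (`height_cm`, `has_backrest`), the weights and the bias are hypothetical values picked by hand for the chair-vs-table example, whereas a real classifier would learn them from labeled data.

```python
import math

# Hypothetical, hand-picked parameters (assumption for illustration only).
# A real classifier would learn these from labeled examples.
WEIGHTS = {"height_cm": 0.15, "has_backrest": -4.0}
BIAS = -8.0


def probability_table(features):
    """Probability that the object is a table rather than a chair."""
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    # The logistic function maps any real-valued score into (0, 1).
    return 1.0 / (1.0 + math.exp(-score))


def classify(features, threshold=0.5):
    """Return the discrete class together with the confidence in that class."""
    p_table = probability_table(features)
    if p_table >= threshold:
        return "table", p_table
    return "chair", 1.0 - p_table


if __name__ == "__main__":
    print(classify({"height_cm": 45, "has_backrest": 1}))
    print(classify({"height_cm": 75, "has_backrest": 0}))
```

The output is a class label plus a probability, which is exactly the "more or less confidence" mentioned above: the label is discrete, while the probability tells how much to trust it.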
On the other hand, regressions typically output curves. The illustration below takes a time series representing historical sales and displays the corresponding forecast.
The regression forecast is a curve rather than a binary setting (or a combination of binary settings): the input series is extended into the future.
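As a rough illustration of a forecast that extends a series into the future, the sketch below fits a straight line to past sales by ordinary least squares and extrapolates it. This is a deliberately naive stand-in and not Lokad's forecasting technology; the function names and the sample figures are made up for the example.

```python
def fit_linear_trend(history):
    """Least-squares line through the points (0, y0), (1, y1), ..."""
    n = len(history)
    if n < 2:
        raise ValueError("at least two observations are needed to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(history, horizon):
    """Extend the fitted line over the next `horizon` periods."""
    slope, intercept = fit_linear_trend(history)
    n = len(history)
    return [intercept + slope * (n + step) for step in range(horizon)]


if __name__ == "__main__":
    weekly_sales = [10, 12, 14, 16]  # made-up historical sales
    print(forecast(weekly_sales, horizon=3))
```

Unlike the classification case, the result is a sequence of numeric values, one per future period, rather than a class label with a probability.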
How does this distinction impact the business?
Well, it turns out that Lokad, as it stands in early 2010, only delivers regression forecasts. Thus, there are many interesting problems that Lokad cannot tackle because they are classification problems:
- Customer segmentation: for each customer, we would like to evaluate the probability of achieving a successful up-sell through a direct marketing action. Following the same idea, we could try to predict churn as well.
- Fraud detection: for each transaction, we would like to evaluate, based on the transaction pattern, the probability that the operation is a fraud attempt.
- Deal prioritization: based on the properties of the prospect (availability of budget, industry, contact rank in the company, expressed level of interest, …), we would like to evaluate the likelihood of getting a profitable deal out of each prospect, in order to prioritize the sales team's efforts.
We are frequently asked whether Lokad could deliver classification forecasts as well. Unfortunately, the answer is negative for the time being. Although they are rooted in the same mathematical theory, classification and regression entail very different technologies, and Lokad is putting all its efforts into regression problems.
That said, we are not dismissive of classification problems; they truly deserve attention and effort. For 2010, we are sticking to our roadmap, but further ahead, classification could be a natural extension of our forecasting services.