Machine learning is the subfield of artificial intelligence that is concerned with the design and development of algorithms that allow computers to improve their performance over time based on data, such as from sensor data or databases. (Wikipedia)

Ten years ago, machine learning companies were virtually non-existent, or marginal at best. The main reason was simply that few algorithms actually worked and delivered business value at the time. Automated translation, for example, still barely works and remains far from usable for most businesses.

Lokad belongs to the broad machine learning field, with a specific interest in statistical learning. Personally, I have been working in machine learning for almost a decade now, and it still surprises me how deeply this field differs from the typical shrinkwrap software world. Machine learning is a software world of its own.

Scientific progress in areas resembling artificial intelligence has been slow, very slow compared to most other software areas. What is too little known, however, is that this progress has also been steady, and today quite a few successful machine learning products are around:

  • Smart spam filtering: damn, akismet has caught more than 71 000 spam comments on my blog, with virtually zero false positives as far as I can tell.
  • Voice recognition: Dragon Dictate now does quite an impressive job after just a few minutes of user tuning.
  • Handwriting recognition and even equation recognition are built into Windows 7.

Machine learning has become mainstream.

1. Product changes but user interface stays

For most software businesses, putting something new before the customer’s eyes is THE way to generate recurring revenue. SaaS is slowly changing this financial dynamic, but for most SaaS products, evolution still comes with very tangible changes to the user interface.

In machine learning, by contrast, development usually doesn’t mean adding new features. Most of the evolution happens deep inside, with few or no visible changes. Google Search, probably the most successful of all machine learning products, is notoriously simple and has stayed that way for a decade. Ranking customization based on user preferences was added recently, but this change came almost 10 years after launch and, I would guess, still goes unnoticed by most users.

Yet this doesn’t mean that the Google folks have been idle for the last 10 years. Quite the opposite: Google teams have been furiously improving their technology, winning battle after battle against web spammers who now use very clever tricks.

2. Ten orders of magnitude in performance

When it comes to software performance, typical shrinkwrap operations complete within 100ms. For example, I suspect that the server-side computation time needed to generate a web application page ranges from 5ms for the most optimized apps to 500ms for the slowest ones. Be slower than that, and your users will give up on your website. Although it’s hardly verifiable, I suspect this performance range holds for 99% of web applications.

But when it comes to machine learning, typical computational costs vary by nearly 10 orders of magnitude, from milliseconds to weeks: a single week is already about 6 × 10^8 milliseconds.

Today, the price of one month of CPU time at 2GHz has dropped to $10, and I expect it to drop under $1 within the next 5 years. Moreover, one month of CPU can be compressed into a few hours of wall time through large scale parallelization. For most machine learning algorithms, accuracy can be improved by dedicating more CPU to the task at hand.

Thus, gaining 1% of accuracy through a one-month CPU investment ($10) can be massively profitable, yet that sort of reasoning is plain insanity for most, if not all, software areas outside machine learning.
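The arithmetic above can be made explicit. This is a minimal sketch, assuming a 30-day month, the $10 per CPU-month price quoted above, and perfect parallel speedup across a hypothetical 100-core cluster; real jobs pay coordination overhead, so the wall time below is a best case.

```python
# Back-of-the-envelope CPU economics for the figures quoted above.
# Assumptions: a 30-day month and perfectly parallel work (no overhead).

CPU_MONTH_PRICE_USD = 10.0   # price of one month of a 2GHz CPU, from the text
HOURS_PER_MONTH = 30 * 24    # assumption: 30-day month, i.e. 720 CPU-hours


def price_per_cpu_hour(month_price: float = CPU_MONTH_PRICE_USD) -> float:
    """Convert a per-month CPU price into a per-hour price."""
    return month_price / HOURS_PER_MONTH


def ideal_wall_time_hours(cpu_hours: float, cores: int) -> float:
    """Wall time if the work splits perfectly across `cores` (best case)."""
    if cores < 1:
        raise ValueError("need at least one core")
    return cpu_hours / cores


if __name__ == "__main__":
    print(f"price per CPU-hour: ${price_per_cpu_hour():.4f}")
    # One CPU-month spread over a hypothetical 100-core cluster.
    hours = ideal_wall_time_hours(HOURS_PER_MONTH, cores=100)
    print(f"one CPU-month on 100 cores: {hours:.1f} hours of wall time")
```

Under these assumptions, one CPU-hour costs about 1.4 cents and a whole CPU-month fits in roughly 7 hours of wall time, which is why buying accuracy with raw computation is so cheap.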

3. Hard-core scalability challenges

Scaling up a Web 2.0 application such as Twitter is indeed a challenge, but in the end, 90% of the solution lies in a single technique: in-memory caching of the most frequently viewed items.
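Here is a minimal sketch of such a cache. It assumes a least-recently-used (LRU) eviction policy, a common practical stand-in for “keep the most frequently viewed items”. The `loader` parameter is a hypothetical function standing for the slow backend fetch (database query, page rendering).

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRUCache:
    """Keep the `capacity` most recently used items in memory."""

    def __init__(self, capacity: int, loader: Callable[[Hashable], Any]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.loader = loader  # hypothetical slow backend fetch
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self.loader(key)  # slow path: go to the backend
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


if __name__ == "__main__":
    cache = LRUCache(capacity=2, loader=lambda k: f"<rendered page {k}>")
    for key in ["a", "b", "a", "c", "a", "b"]:
        cache.get(key)
    print(cache.hits, cache.misses)  # prints "2 4"
```

In Python, `functools.lru_cache` offers the same policy as a decorator; the explicit class only makes the eviction step visible. Either way, the whole trick is one dictionary lookup in front of the expensive path.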

By contrast, scaling up machine learning algorithms is usually a terrifyingly complicated task. It took Google several years to master large scale eigenvector computation on a sparse matrix (PageRank), and linear algebra is clearly not the most challenging area of mathematics when it comes to machine learning problems.
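PageRank only needs the dominant eigenvector of the web’s link matrix, and that vector is commonly computed by power iteration: a rank vector is repeatedly pushed along the links until it stops changing, without ever materializing the full matrix. This is a minimal single-machine sketch, assuming the textbook damping factor of 0.85 and uniform redistribution of rank from pages without outgoing links. It shows the algorithm only, not the distributed engineering that made it hard at web scale.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Power iteration for PageRank on a sparse link graph.

    `links` maps each page to the pages it links to. Pages with no
    outgoing links ("dangling") spread their rank uniformly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform vector
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Teleportation term plus the redistributed dangling rank.
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # One sparse matrix-vector product: push rank along each link.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # converged: the vector no longer changes
            break
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b"], "b": ["a"], "c": ["b"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 3))
```

Each iteration touches every link once, so the cost scales with the number of links rather than with the square of the number of pages; distributing that single loop over billions of links is where the years went.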

The core problem of machine learning is that the most efficient way to improve accuracy is to add more input data. For example, to improve the accuracy of your spam filter, you can try to improve your algorithm, but you can also use a larger input database in which emails are already flagged as spam or not spam. Actually, as long as you have enough processing power, it’s frequently much easier to improve accuracy through larger input data than through smarter algorithms.

Yet handling large amounts of data in machine learning is complicated because you can’t naively partition your data. Naïve partitioning is equivalent to discarding input data: each node performs local computations that do not leverage all the available data. Bottom line: machine learning needs very clever ways of distributing its algorithms.
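One way to see the problem: some quantities, such as the word counts a naive Bayes spam filter is trained on, are plain sums, so each partition can compute them locally and the results merge exactly. Others, such as a median, do not merge: combining local answers gives a different result than looking at all the data. This is a minimal sketch with made-up toy inputs; real distributed learners need far more elaborate schemes than either function.

```python
from collections import Counter
from statistics import median


def merged_counts(partitions: list[list[str]]) -> Counter:
    """Token counts are sums: per-partition counts merge without any loss."""
    total: Counter = Counter()
    for part in partitions:
        total.update(Counter(part))  # Counter.update adds counts together
    return total


def median_of_medians(partitions: list[list[float]]) -> float:
    """Naive merge: each partition only reports its local median."""
    return median([median(part) for part in partitions])


if __name__ == "__main__":
    # Made-up toy inputs, for illustration only.
    emails = [["cheap", "pills", "now"], ["cheap", "meeting", "now"]]
    print(merged_counts(emails))  # same as counting all emails at once

    values = [[1, 2, 3], [4, 5, 6, 7, 8, 9, 10]]
    everything = [v for part in values for v in part]
    print(median(everything), median_of_medians(values))  # prints "5.5 4.5"
```

Much of the cleverness in distributing a learning algorithm goes into recasting it in terms of quantities that merge exactly, like the counts above, and most interesting algorithms do not reduce to such sums easily.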

4. User feedback is usually plain wrong

Smart people advise doing hallway usability testing. This also applies to whatever user interface you put on your machine learning product, but when it comes to improving the core of your technology, user feedback is virtually useless, when not outright harmful if actually implemented.

The main issue is that, in machine learning, most good / correct / expected behaviors are unfortunately counterintuitive. For example, at Lokad, a frequent customer complaint is that we deliver flat forecasts, which are perceived as incorrect. Yet those flat forecasts are in the best interest of those customers, because they happen to be more accurate.
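Why can a flat forecast win? When the ups and downs of a series are mostly noise, a forecast that chases them pays for every wiggle. The following is a toy sketch, not Lokad’s forecasting method: it scores a flat forecast (the mean of past observations) against a reactive one (repeat the last observed value) by mean absolute error, on a made-up series hovering around 12 whose wiggles stand in for noise.

```python
def mean_absolute_error(actual: list[float], forecast: list[float]) -> float:
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def backtest(series: list[float], warmup: int = 2) -> tuple[float, float]:
    """One-step-ahead backtest of two toy forecasters.

    flat:     forecast the mean of all past observations
    reactive: forecast the last observed value (chases every wiggle)
    Returns (flat MAE, reactive MAE).
    """
    actual, flat, reactive = [], [], []
    for t in range(warmup, len(series)):
        history = series[:t]  # only data available before time t
        flat.append(sum(history) / len(history))
        reactive.append(history[-1])
        actual.append(series[t])
    return mean_absolute_error(actual, flat), mean_absolute_error(actual, reactive)


if __name__ == "__main__":
    # Made-up demand around a stable level of 12.
    demand = [10.0, 14.0] * 6
    flat_mae, reactive_mae = backtest(demand)
    print(f"flat: {flat_mae:.2f}  reactive: {reactive_mae:.2f}")
```

On this toy series, the reactive forecast “looks” more alive but is always 4 units off, while the boring flat line ends up with a clearly lower error.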

Although I am knowledgeable about spam filtering, I am pretty sure that 99% of the suggestions I could come up with and send to the akismet folks would be junk to them, simply because the challenge in spam filtering is not “how do I filter spam?” but “how do I filter spam without filtering legitimate emails?”. And yes, the folks at Pfizer have the right to discuss sildenafil citrate compounds by email without having all their emails filtered.
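The tension between catching spam and sparing legitimate mail appears as soon as a filter turns a score into a decision. This is a toy sketch, assuming a hypothetical classifier that outputs a spam score between 0 and 1, with made-up scores for eight messages; it is not how akismet works, only the generic threshold trade-off.

```python
def filter_outcome(scored: list[tuple[float, bool]],
                   threshold: float) -> tuple[float, float]:
    """Flag a message as spam when its score reaches `threshold`.

    Returns (share of spam caught, share of legitimate mail wrongly flagged).
    """
    spam = [score for score, is_spam in scored if is_spam]
    legit = [score for score, is_spam in scored if not is_spam]
    if not spam or not legit:
        raise ValueError("need both spam and legitimate examples")
    caught = sum(1 for s in spam if s >= threshold) / len(spam)
    false_positives = sum(1 for s in legit if s >= threshold) / len(legit)
    return caught, false_positives


if __name__ == "__main__":
    # Made-up scores from a hypothetical classifier (1.0 = surely spam).
    messages = [
        (0.95, True), (0.90, True), (0.80, True), (0.60, True),
        (0.70, False),  # e.g. a legitimate email about sildenafil citrate
        (0.30, False), (0.10, False), (0.05, False),
    ]
    for threshold in (0.5, 0.75):
        caught, fp = filter_outcome(messages, threshold)
        print(f"threshold {threshold}: caught {caught:.0%} of spam, "
              f"flagged {fp:.0%} of legit mail")
```

A threshold of 0.5 catches all the spam but also flags the legitimate sildenafil email; raising it to 0.75 spares that email but lets a quarter of the spam through. Most naive suggestions only move the threshold in one direction.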

5. But user data holds the truth

Mock data and scenarios mostly make no sense in machine learning. Real data is surprising in many unexpected ways: I have been working in this field for about 10 years now, and every new dataset I have investigated has surprised me. It’s completely useless to work on your own made-up data. Without real customer data at hand, you can’t do anything in machine learning.

This frequently leads to a chicken-and-egg problem in machine learning: if you want to start optimizing contextual ad display, you need loads of advertisers and publishers. Yet without loads of advertisers and publishers, you can’t refine your technology and, consequently, you can’t convince loads of advertisers and publishers to join.

6. Tuning vs. Mathematics, Evolution vs. Revolution

Smart people warn that rewriting from scratch is the kind of strategic mistake that frequently kills software companies. Yet in machine learning, rewriting from scratch is frequently the only way to save your company.

Around the end of the nineties, AltaVista, the leading search engine, did not take the time to rewrite its ranking technology around the crazy mathematical ideas of large scale eigenvector computation. As a result, it was overwhelmed by a small company (Google) led by a bunch of inexperienced people.

Tuning and incremental improvement are the heart of classical software engineering, and this also holds true for machine learning, most of the time. The next percent of accuracy is frequently gained by finely tuning and refining an existing algorithm, designing tons of ad-hoc reporting mechanisms along the way to gain deeper insight into the algorithm’s behavior.

Yet each new percent of accuracy gained that way costs you ten times as much effort as the previous one, and after a couple of months or years your technology is stuck in a dead end.

That’s where hard-core mathematics comes into play. Mathematics is critical to jump to the next stage of performance, the kind of jump where you make a 10% improvement that did not even seem possible with the previous approach. However, trying new theories is like playing roulette: most of the time you lose, and the new theory brings no additional improvement.

In the end, making progress in machine learning very frequently means trying approaches that will fail with high probability. But once in a while, something actually works, and the technology leaps forward.