00:00:00 Volatility discussion and introduction of Peter Cotton.
00:01:22 Peter’s performance at the M6 competition and its structure.
00:03:21 The theme of the M6 competition and investigation of the efficient markets hypothesis.
00:06:01 Peter’s model approach to predicting volatility using the options market.
00:08:10 Comparing finance to supply chain in terms of volatility understanding.
00:09:20 Probabilistic forecasts in supply chain and finance.
00:10:01 The difficulty of getting people to think stochastically.
00:12:26 AI as a buzzword and its impact on forecasting.
00:14:55 Simplicity and robustness in the face of complexity.
00:17:01 Benchmarking time series prediction algorithms and their performance.
00:18:58 Discussing how a distorted view of model performance can lead to overfitting and P hacking.
00:20:14 Purpose of forecasting competitions in preventing overfitting and data manipulation.
00:21:27 Critiquing the academic incentive structure and advocating for real-world, continuous algorithm testing.
00:22:55 Comparing finance to supply chain management and the need for rationality and efficiency.
00:27:15 The potential of prediction markets for obtaining accurate forecasts and overcoming biases.
00:28:14 Discussion on future probabilities and discovery mechanisms.
00:29:34 Comparing tested mechanisms to compensated weighted opinions.
00:31:40 Discrepancy in numbers during the M6 and 2006 financial crisis.
00:32:25 Distortion in expectations and effect of promotions in retail.
00:36:31 Quantitative traders breaking barriers and automating processes in supply chain.
00:38:09 Importance of discipline in prediction markets.
00:39:58 The impact of regulations on prediction markets.
00:40:44 The problem with statistical models and the Trump election example.
00:42:57 The necessity of feedback loops and real-world consequences.
00:46:10 Philip model’s success in the M6 competition by finding more data.
00:47:20 Lightweight mechanisms for predictions in data science pipelines.
00:48:41 MicroPrediction.org and its unique microstructure for predictions.
00:50:47 The evolution of supply chain and logistics concepts.
00:52:35 Cultural challenge in embracing uncertainty in supply chain management.
00:54:46 History of data science in finance and its relation to probabilities.
00:56:41 Beating the stock market and comparing to Warren Buffett.
00:58:36 M6 contest, individual efforts, and collective activity.
01:00:08 The moral takeaway from the M6 and using market power in other domains.


In an interview, Peter Cotton, Chief Data Scientist at Intech, and Joannes Vermorel, founder of Lokad, discuss probabilistic forecasting, the M6 forecasting competition, and the differences between finance and supply chain perspectives on volatility and uncertainty. They emphasize that perfect forecasts are impossible and probabilistic forecasting can help make better decisions amid volatility. Both agree on the value of simplicity and robustness in handling complex systems, whether financial markets or supply chains. They also discuss issues like P-hacking, transparency in prediction model errors, and market mechanisms for improving predictions. Vermorel highlights the cultural challenges in supply chain management, while Cotton emphasizes the importance of markets in improving overall forecasting.

Extended summary

In this interview, Peter Cotton, Chief Data Scientist at Intech and a quantitative trader specializing in forecasting, is invited by Conor Doherty, the host, and speaks with Joannes Vermorel, the founder of Lokad, a software company specializing in supply chain optimization. The discussion revolves around probabilistic forecasting, the M6 forecasting competition, and the differences between finance and supply chain perspectives on volatility and uncertainty.

Peter Cotton, who ranked in the top 10 in the M6 forecasting competition, shares that the competition aimed to investigate the efficient markets hypothesis and whether good predictors could create sensible diversified portfolios that perform well. He explains that his approach to the competition was different from others, as he used data from the options market to predict volatility instead of forecasting it himself. He viewed the M6 competition as a battle between data scientists, forecasters, and quantitative finance professionals against the options market. Despite his high ranking, Peter was surprised by how well he performed in comparison to other participants.

Joannes Vermorel adds that finance has been far ahead of supply chain in terms of acknowledging and dealing with volatility and uncertainty. He notes that supply chain professionals still often strive for perfect forecasts, which is unrealistic. The first step in addressing this issue is acknowledging that perfect forecasts are impossible, and the second is understanding that uncertainty does not mean that things are unknowable. Probabilistic forecasting can help quantify the structure of uncertainty and make better decisions in the face of volatility.

Both Peter and Joannes agree that there is still much work to be done in encouraging the world to think in more stochastic terms and incorporate this understanding into decision-making processes. While finance has had a long history of dealing with uncertainty and risk, it has taken much longer for these concepts to be widely acknowledged and utilized in the supply chain industry.

Vermorel observes that AI has become a buzzword that often masks incompetence. He believes that when professionals are competent, they refer to their techniques by their technical names, such as hyper-parametric models or gradient-boosted trees.

Vermorel and Cotton discuss the complexity and chaotic nature of supply chains and the best approach to handle such systems. Both agree that rather than doubling down on complexity, a more reasonable path forward is to find something simple and robust. Cotton shares his experience with micro-prediction, which focuses on maintaining open-source packages for time series prediction. He emphasizes that the most successful models are often the simplest, such as precision-weighted averages of recent performance.

The interviewees also touch on the issue of P-hacking, where researchers manipulate data to support their desired outcome. They argue that forecasting competitions, such as the M5, can mitigate this problem by only releasing data after participants submit their results, preventing them from tweaking their models to engineer fake outcomes.

Cotton criticizes the academic literature for often having a closed contest run by the same person who enters and judges the competition. He suggests that instead of publishing papers, researchers should run their algorithms forever and let them autonomously determine their effectiveness across different business problems. Cotton advocates for a more data-driven approach, such as turning everything into an M6 competition or options market, to increase rationality and efficiency.

Vermorel also compares the unforgiving environment of finance with the inertia present in supply chains, where companies can remain inefficient for long periods without facing severe consequences. He questions the practice of sales and operations planning (S&OP), which involves gathering people to discuss and vote on forecasts, suggesting that this method is not the most effective way to make predictions.

Vermorel shares his experiences working with large retailers on forecasting the impact of promotions. He notes that expectations are often inflated, and a simple averaging model looking at historical data can produce more accurate predictions. However, presenting these more conservative estimates can sometimes be met with resistance, as it might be seen as undermining enthusiasm or diminishing human intelligence.

Cotton highlights the importance of discipline in making accurate predictions, which can be fostered through market-based approaches. He suggests encouraging people to be more transparent about their prediction model errors and to consider using lightweight market mechanisms within their data science pipelines. Prediction markets, while interesting, have been hampered by regulation and concerns about gambling.

Cotton recounts a disagreement with the team behind The Economist’s election model prior to the 2016 US Presidential election, which had assigned a much lower probability to a Trump victory compared to betting markets. The exchange underscores the need for better methods of evaluating model accuracy and the limitations of relying solely on expert opinion.

The participants agree that market mechanisms have proven to be more reliable than alternative methods for making predictions, but emphasize the importance of finding ways to introduce market discipline into other areas, such as supply chain optimization and retail forecasting.

Vermorel identifies a problem with traditional forecasting exercises, which often involve separate teams that are disconnected from the rest of the company. This leads to practices like sandbagging, where salespeople underestimate their forecasts to exceed their quotas and receive bonuses. Production, on the other hand, tends to overestimate forecasts to secure higher budgets for ramping up production. Vermorel suggests that creating feedback loops with real-world consequences can help ground predictive models and make them more effective.

Cotton discusses the role of prediction markets in improving forecasting models. While traditional prediction markets can be cumbersome, lightweight alternatives can be more effective in a data science pipeline. Cotton also mentions his book on microprediction mechanisms capable of receiving or soliciting predictions and serving upstream purposes for business applications.

The interviewees acknowledge the cultural challenges in supply chain management, particularly as supply chain emerged from the logistics field in the 1990s. Logistics focuses on operational certainty, whereas supply chain management involves long-term planning and working with uncertainty. Vermorel wonders how long it took finance to embrace probabilistic models of the future, while Cotton notes that data science has been in earnest for at least 40 years.

Cotton also touches on the difference between beating the market and providing accurate probability estimates. He explains that while individuals like Warren Buffett have beaten the market consistently, they cannot create standalone models that provide better probabilistic estimates than the market itself. He emphasizes the importance of markets as a combination of individual efforts to create probabilities and improve overall forecasting.

Full Transcript

Conor Doherty: Welcome back to Lokad TV, I’m your host Conor, and as always, I’m joined by Lokad founder Joannes Vermorel. Today’s guest is Peter Cotton, he’s a Senior VP and Chief Data Scientist at InTech Investment. Today, he’s going to talk to us about probabilistic forecasting and possibly how to beat the stock market. Peter, welcome to Lokad.

Peter Cotton: Thank you for having me.

Conor Doherty: At Lokad, we like to know who we’re talking to. So, Peter, could you tell us a little bit about your background and what you do at InTech Investment?

Peter Cotton: Oh, sure. I would describe myself as a career quant. I’ve worked on the buy side and the sell side, and I had a brief stint as an entrepreneur building a data company. I currently spend my time trying to predict things, which won’t surprise you, and also pushing the frontiers of portfolio theory.

Conor Doherty: We should say right at the start, congratulations on your recent performance in the M6 competition. I believe you placed in the top 10, is that correct?

Peter Cotton: I did. I’m not sure if it’s my credit or just the credits of all those option traders and the quants who support them. In some respects, it wasn’t my work at all; I was just a mere conduit from one source of predictive power to another.

Joannes Vermorel: For the audience, the M6 was actually the sixth iteration of a well-known series of forecasting competitions, where the goal is literally to make predictions. The competition works as follows: there is a dataset that is made public, there is a certain set of rules, and people have to make predictions, typically in the form of time series forecasts. In this case, there was a probabilistic aspect to the last two competition iterations, the M5 and the M6. It was an iterated game with 12 iterations, where people had to submit their results and the competition would move forward. There were plenty of rules to establish who performed best and actually outperformed the market. That's a very demanding exercise, and a very brutal one, because there is very little room to fake your results.

Conor Doherty: My understanding is that each iteration of the M competition is different. So, Peter, what was the theme of the M6? I mean, what was the express goal?

Peter Cotton: The goal of the organizers, in a broad sense, was to investigate the Efficient Markets Hypothesis, which states, in its various forms, that it's hard to beat the market. The reason it's hard to beat the market is that there's a lot of financial incentive for doing so, and there are plenty of smart people who've spent the last 40 years of their careers trying to do that, building up teams to do it, and hoovering up all the data they can find to do it. It's undoubtedly true that the best-predicted thing on planet Earth is probably the price of Google stock or something. Everything else is a rung below that in terms of prediction, so that was one stated goal of the organizers. Another was to investigate whether people who could predict well would also be able to turn that into sensible diversified portfolios that performed according to some metric that we can quibble with. So, I think those were the main two goals of the organizers, at least as I understood them.

Conor Doherty: And what exactly is it that your model did that other participants failed to do well?

Peter Cotton: What was different about my entry is that, from a philosophical perspective, I viewed the problem as one of finding whatever data was relevant. Of course, other people would view it that way too, but I think people sometimes overlook the fact that data can take the form of implied numbers, numbers which are implicit in the existing markets.

Now, if you look at the M6 competition, what we were asked to do was predict the probability that a given stock or ETF would have returns in, let's say, the second quintile compared to its peers, out of 100 assets, after one month. So, you ask yourself, what really goes into determining whether a stock is going to finish in the second quintile of its peers? Well, if you have a view on the direction of the stock, that's going to push up the probability of finishing in the top two quintiles, obviously. But if you don't have an opinion on the stock, which I personally didn't, then the main thing that's going to influence whether you end up in the first quintile or the third is the volatility of the stock.

So, I would argue that this was really a contest in predicting volatility, not the direction of the stock, perhaps somewhat contrary to the stated hypothesis of the organizers, but that's fine; it's a laboratory experiment. So, what I did was, I said, "Well, look, there's already a source of incredibly good information about the volatility of stocks. It's called the options market." So, I simply looked at the options market, and instead of forecasting volatility myself, I just used those numbers. That's pretty much all I did.

So, you could think of my entry as really just a market benchmark, perhaps not the same market benchmark that people would anticipate. The organizers had put in a different, weaker market benchmark. But that was mine, and I said, "Look, it's very difficult to come up with better forward-looking estimates of how far a stock is going to move than those implied by the options market, because if you could do that, you could make money by buying and selling options." Of course, there are some people who make money buying and selling options, as I do, but that activity drives the market to a very efficient state, and so that's what I thought was fun about this contest.

It was a way of taking a community of data scientists, forecasters, and some quants, and saying, "Look, here's this kind of battle," and I thought that was really fun to do. So that's what I did, and I was actually a little surprised at how high I finished on the leaderboard. I think I was within 0.002 Brier score of being in the money, actually winning some money, so agonizingly close. But the main point was just to see, you know, would I beat 70% of participants, would it be 80%? It turned out to be 96% of participants. I was a little surprised by that, to be honest.
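The approach Cotton describes, taking options-implied volatilities as given and holding no directional view, can be sketched as a small simulation: generate zero-drift one-month returns scaled by each asset's implied volatility, then count how often each asset lands in each cross-sectional quintile. This is only a minimal illustration of the idea, not his actual entry, and the implied volatilities below are hypothetical numbers.

```python
import numpy as np

def quintile_probabilities(implied_vols, horizon_days=21, n_sims=100_000, seed=0):
    """Estimate P(asset finishes in each cross-sectional quintile) by
    simulating zero-drift returns whose scale comes from implied vols.
    implied_vols: annualized implied volatility per asset (hypothetical inputs).
    Returns an (n_assets, 5) matrix of quintile probabilities."""
    rng = np.random.default_rng(seed)
    vols = np.asarray(implied_vols, dtype=float)
    n = len(vols)
    # One-month return simulations: sigma scaled to horizon, no directional view.
    sims = rng.standard_normal((n_sims, n)) * vols * np.sqrt(horizon_days / 252)
    # Rank each asset within each simulated cross-section (0 = worst return),
    # then map ranks to quintile buckets 0..4.
    ranks = sims.argsort(axis=1).argsort(axis=1)
    quintiles = (ranks * 5) // n
    probs = np.stack([(quintiles == q).mean(axis=0) for q in range(5)], axis=1)
    return probs

# Hypothetical implied vols for five assets: the highest-vol asset is the
# likeliest to land in the extreme quintiles, the lowest-vol in the middle.
probs = quintile_probabilities([0.15, 0.20, 0.25, 0.45, 0.60])
```

With no view on direction, the only input that moves these probabilities is volatility, which is exactly why the contest reduces to volatility prediction.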

Joannes Vermorel: The interesting thing for me, coming from a supply chain background, is that I’m always so incredibly impressed by how finance is literally decades ahead of supply chain as far as all sorts of things are concerned.

My main battle at Lokad is simply to establish that volatility exists. We are still fighting over whether it exists at all, because in supply chain circles there are plenty of people who say, "Let's forecast down to four decimals how much we are going to sell next year." If you had a perfect sales forecast, everything would become a matter of orchestration. You could decide exactly how much you're going to produce, how much you need to buy, and how much you need to allocate in terms of inventory. So, if you had perfect forecasts, all the execution to deliver the goods and services becomes just a mundane matter of orchestration.

When Lokad started to push for probabilistic forecasts in supply chain a decade ago, it was not new, as finance has been doing that with value at risk for at least three or four decades. The key idea is, first, we have to give up on the idea that we will have a perfect forecast. The first step is acknowledging that you don’t know all there is to be known about the future. It seems obvious to people coming from finance, but in supply chain, it’s still not widely acknowledged that you can’t get to a perfect forecast.

Once you accept that you have uncertainty, it doesn’t mean that you don’t know anything. You can have both uncertainty and quantify the structure of this uncertainty with volatility. It’s not because it’s uncertain that it’s unknowable. There are things to be known about the structure of uncertainty, and that’s when we say probabilistic forecasts. From a supply chain perspective, we use it to say that you don’t take the same decisions facing immense spread or very concentrated uncertainty. When you’re facing enormous volatility, you’re not approaching the risk quantitatively the same way as when it’s almost a sure thing comparatively.
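The point that the spread of the uncertainty, not just its mean, changes the decision can be made concrete with a toy replenishment example. This is a sketch only: the normal demand assumption and the numbers are purely illustrative, not Lokad's actual method.

```python
from statistics import NormalDist

def order_up_to(mean_demand, sd_demand, service_level=0.95):
    """Order up to the demand quantile matching the target service level.
    Normal demand is assumed purely for illustration."""
    return NormalDist(mean_demand, sd_demand).inv_cdf(service_level)

# Same expected demand, different uncertainty: the wider distribution
# calls for a much larger buffer, i.e. a genuinely different decision.
tight = order_up_to(100, 5)    # concentrated uncertainty -> ~108 units
wide = order_up_to(100, 40)    # immense spread -> ~166 units
```

A point forecast of 100 units would drive the same order in both cases; the probabilistic forecast is what separates them.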

Peter Cotton: It’s true that it’s still taking decades to try to get that message through. There are people in managerial science who’ve tried to popularize this notion, like Sam Savage with the Flaw of Averages, and encourage people to understand that taking one path or an average value will lead you into trouble. In finance, you’ve had all these incredibly fine-grained notions of convexity risk for years. It’s amazing how different that is.

I’ve noticed it too because some competitors have to provide distributional predictions, and if you’re coming from Kaggle or somewhere else, you might not be familiar with the motivation for it. So, what’s the solution? How do we encourage the world to think in more stochastic terms and get that working in people’s decision-making or spreadsheets? It’s not so easy.

Joannes Vermorel: Absolutely. And I believe one of the ingredients that is muddying the picture further is that, at least from my background in enterprise software in supply chain, the buzzword of the decade has been AI. It’s interesting because, as soon as you have AI, you’re supposedly having a superior grasp of the future.

From my personal take, AI is just a buzzword to mask your own incompetence. Once you're very competent, you tend to call it something else, like a hyper-parametric model or gradient-boosted trees. When you're saying AI, it's just mumbo-jumbo for something you don't understand.

The interesting thing is that, very frequently, when you’re facing something that is incredibly chaotic and complex, my experience and our results with the M5 show that Lokad did very well with a method that was orders of magnitude simpler than AI-driven methods. What I found interesting with your micro prediction approach is that I believe you did something very similar in its simplicity. So when facing something incredibly complex, is it better to have a system that reflects all that complexity or, on the contrary, to have something very simple to steer you through the storm?

Peter Cotton: I undertook a couple of experiments in this regard. I was keen to have as many good algorithms as I could find from the open-source world for time series prediction. I try to maintain these open-source packages that make it fairly easy to benchmark things or figure out what's a good time series model for your purpose. Some of those have an autonomous life of their own and they try to see if they're good at predicting something. Micro prediction is sort of like the M6 for algorithms, but typically at higher frequencies.

Of course, we started to develop views over time about what actually works and what's robust across different situations. I did some offline benchmarking of univariate time series, and there are probably 20 or 50 Python packages out there for time series prediction. Most of them wrap other packages, like the time series modules of statsmodels. But when you benchmark them against classic stuff, you find that simple precision-weighted averages of the recent performance of a bunch of simple models end up on top. Simple models include things like Auto ARIMA and its variants, or even simpler things.
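The combination Cotton describes can be sketched as a precision-weighted average: each simple model's current forecast is weighted by the inverse of its recent mean squared error. A minimal sketch, in which the error histories and forecasts are made-up illustrative numbers:

```python
import numpy as np

def precision_weighted_forecast(recent_errors, model_forecasts, eps=1e-9):
    """Combine point forecasts from several simple models, weighting each
    by the inverse of its recent mean squared error ("precision").
    recent_errors: list of 1-D arrays of each model's recent forecast errors.
    model_forecasts: the models' current forecasts for the next step."""
    mse = np.array([np.mean(np.square(e)) for e in recent_errors])
    weights = 1.0 / (mse + eps)      # more precise model -> larger weight
    weights /= weights.sum()
    return float(np.dot(weights, model_forecasts)), weights

# Hypothetical example: model A has been roughly twice as accurate as
# model B lately, so A's forecast dominates the combination.
errs_a = np.array([0.5, -0.4, 0.6])
errs_b = np.array([1.0, -0.8, 1.2])
combined, w = precision_weighted_forecast([errs_a, errs_b], [10.0, 12.0])
```

The appeal of this scheme is exactly its simplicity: no fitting step, just a running tally of who has been accurate recently.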

Joannes Vermorel: For the audience, I think what you’re pointing out is that P-hacking is a very real problem. When you venture into the realm of fancy models, you can nearly always find a model that accidentally performs well. This can lead to overfitting and P-hacking, where you cherry-pick dimensions and hypotheses to pass some statistical test of confidence. Forecasting competitions prevent this by only releasing the data after people submit their results so they can’t tweak their models to engineer fake results.

Peter Cotton: That’s right. Most academic literature comprises a tiny, closed contest run by the same person as the entrant. They decide who else is allowed to enter, run the race 10 times, and then publish the result. It’s ridiculous. The point of forecasting competitions is to prevent this from happening.

I agree. It's absolutely ridiculous. Why is there even an empirical literature? I don't know. I spend my time trying to mock the very notion of an empirical literature. Why publish a paper on the efficacy of a model on something real-time if the paper isn't going to update itself, right? I don't know what we can do to get away from this, unfortunately. As we all know, and as The Economist puts it, the joke about incentives is that the problem isn't that they don't work, it's that they work too well. So if the only incentive is publishing papers, that's what you'll get. If the only incentive is a slightly weird metric for the investment side of the M6 contest, you're going to find, you know, three out of 200 people who work out that that's the way to game it, right? That's the way it goes.

So, yeah, I mean, I advocate that instead of publishing papers, people should take their algorithm and run it forever. And we should encourage an infrastructure, one that companies could share, that would enable these algorithms to travel from one business problem to the next and find out if they actually do well. If the methods coming along these days, some of them very ingenious machine learning methods, are capable of truly performing well out of sample, and if there's enough data for them to really do that, then there's going to be enough data to autonomously determine whether they're good or not. And so, we don't really need humans with their strong opinions and strong priors and self-interest and gatekeeping and all the rest of it to determine which algorithm should work for a given business problem. Often, at least in my domain, and your domain is a little more challenging because you have longer-term forecasts, but in my domain, if you're asking what's going to work for you, predicting how many customers will turn up in the next five minutes or how many cars pass an intersection in the next two minutes, that's a large-data problem. There's no reason for people with their PDFs and all the rest of it to get in the way. In my opinion, let's just turn everything into an M6 but speed it up, or better, turn everything into an options market.

Joannes Vermorel: Yes, and the interesting thing is that, again, for me, finance is just this sort of practice, and I say that in the good sense, because there is this general perception among the public that, you know, if you have a villain in a movie, it's going to be the finance guy and the options. But my take is that those markets are an exercise in rationality. If you're profoundly irrational, you just go bankrupt. It's only people that can maintain a very, very high level of rationality in what they do over a long period of time that do not go bankrupt. It's a very unforgiving environment. Even small inefficiencies are very quickly exploited. If you have some competitors that are, year after year, a few percent more efficient than you, then people reallocate their funds toward those competitors and you literally go bankrupt. So it's literally fast-paced Darwinism in action, in a way that is fairly brutal.

That's also the sort of thing people don't realize about supply chains: there are many companies that can survive for decades not because they are very, very good, but because there is such incredible inertia in setting up the infrastructure, updating the practices and whatnot, that you can stay super inefficient for a decade or more before it makes even a dent. For example, a lot of retailers went to the internet to set up their web stores two decades after Amazon, and they suffered a lot instead of just disappearing. In finance, if you're two decades late to the party, that's just unbearable. So, from a supply chain background, when it comes to thinking about the future, one of the most popular methods is still S&OP, which stands for Sales and Operations Planning. It involves having all the people together in a room and discussing, so that through the discussion the proper forecast will emerge. From your quantitative trader perspective, would that sound like a reasonable option? Like, we want to perform well, so let's bring 20 people into the room, let's have a look at those charts, and then take a vote to decide the forecast, with bonus points if you have a higher rank in the organization.

Peter Cotton: Oh goodness, to be perfectly honest, I don’t envy people who are in the position of making one or two-year-ahead forecasts. Obviously, it’s a tricky game. The question of collective intelligence amongst humans in that sort of prediction task and how you accomplish that certainly has an interesting literature. But I do feel that sometimes there’s a simple fact that the US puritanical bias is simply getting in the way of a pretty obvious solution. I mean, I grew up in Australia, and if you want to know if two flies are crawling up a wall which one’s going to get there first, you let people bet on it. It’s really that simple. Let’s not overcomplicate this.

The best prediction device, the first great prediction device, was the size of a building; it was built at Ellerslie Racecourse and opened, I think, in 1913. It was the world's first mechanical totalizator machine. People could place their bets on horses, and these giant pistons would slowly rise up in the air to let people know how much was bet on each horse. Through this amazing mechanical apparatus, probability arose: the first example of risk-neutral probability defined in a real-time information processing system. In 100 years, that is still the only really reasonable way to arrive at future probabilities of events, as far as I'm aware. I don't think there's been a better invention.
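The totalizator's arithmetic is simple enough to state in full: the crowd's implied win probability for each horse is that horse's share of the total pool. A sketch with a hypothetical pool, ignoring the track's takeout for simplicity:

```python
def parimutuel_probabilities(stakes):
    """Totalizator sketch: the pool of bets itself defines the implied
    (risk-neutral) win probabilities, namely each horse's share of the
    total money staked. Track takeout is ignored for simplicity."""
    total = sum(stakes)
    return [s / total for s in stakes]

# Hypothetical pool: half the money sits on the favorite, so the crowd's
# implied win probability for the favorite is 0.5.
probs = parimutuel_probabilities([500.0, 300.0, 200.0])
```

This is the same mechanism, whether implemented in giant pistons in 1913 or in a few lines of code: the stakes are the forecast.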

Joannes Vermorel: Yeah, and one point that is very interesting to me is that you're pointing out that there is a discovery mechanism at play. That's what we are talking about, and there is ingenuity in that. What is really worth it is not necessarily the model that goes with it, or some flash of human insight, but having an approach where you ask, "What is my discovery mechanism for gaining more reliable information about this future? Is there even something that acts as a discovery mechanism, or am I just making things up and implicitly declaring the statements I make about the future to be good and valid, without even considering that there might be something of a journey to get there?" Something that has been engineered with this discovery in mind: that's a great way of putting it.

Peter Cotton: Here you have one mechanism that's been tested in and out in a thousand different places for a century, and it just keeps working. People will constantly come along and say, "Well, wait, there's something else we can do," like, say, compensated weighted opinions in a room looking at a spreadsheet. Well, maybe that's the right mechanism for prediction, who knows? Look at the history. I started my career in 2001 in credit and lived through the 2006 experience. You had a market that was providing an implied correlation number that told you what the market's view was on the relative codependence of one company's fortunes and another's. Let's say that number was 30. The rating agencies took an actuarial approach, just like the M6 participants. They ignored the market information and came up with their own model, sometimes even in ignorance of the mathematics required to recognize that information. They told the institutional investors that the number was not 30, not even 20, but 5 percent. That's a huge discrepancy in a number. So, how has this panned out, other than a global financial crisis and other disasters in supply chain? How long is it going to take us to realize that the market is the only way? How many examples do we need?

Joannes Vermorel: The funny thing is that there is some sort of partial insanity going on. Just to give an example, in retail, Lokad is working with many large retailers. Typically, when it comes to forecasting the impact of promotions, like a 30% off chocolate bar, people are enthusiastic about the effect. They want to move the needle and acquire market share. But when we look at the promotion forecasts, the numbers are almost always inflated. People think sales will be three or four times the normal amount. However, when you apply a super basic averaging model and look at the past promotions, the reality is more conservative. It’s interesting because when you show them a more conservative model, they feel like their enthusiasm and human intelligence are being diminished.
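The "super basic averaging model" Vermorel mentions can be sketched in a few lines: average the uplift ratios observed in comparable past promotions and apply that average to the current baseline, rather than trusting inflated expectations. All numbers below are hypothetical.

```python
import numpy as np

def promotion_uplift_forecast(baseline_sales, past_promos):
    """Very simple promotion forecast: average the uplift ratios seen
    in past comparable promotions, then scale the current baseline.
    past_promos: list of (promo_sales, baseline_at_the_time) pairs."""
    ratios = [promo / base for promo, base in past_promos]
    avg_uplift = float(np.mean(ratios))
    return baseline_sales * avg_uplift, avg_uplift

# Hypothetical history: expectations ran at 3-4x the normal volume, but
# past promotions only delivered uplifts of 1.4x, 1.7x, and 1.5x.
forecast, uplift = promotion_uplift_forecast(
    baseline_sales=100.0,
    past_promos=[(140.0, 100.0), (170.0, 100.0), (150.0, 100.0)],
)
```

The model's only "intelligence" is refusing to ignore its own history, which is precisely what makes its conservative numbers hard to present.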

Peter Cotton: In computer science, there’s a maxim: write the test first. But nobody writes the test first when it comes to forecasting or making predictions of the future, right? And only about five percent of the time do they write the test afterwards if they ever even go back in a rigorous way and look at what they’ve actually done.

Yes, it’s true. Markets have, for all their flaws, an incredible way of supplying discipline. There’s a reason that some top hedge funds, for example, incorporate things like poker camps. I grew up trying to understand gambling markets of various kinds, and if you don’t have that discipline, you’re not going to get better at predicting things. So how do we create that discipline?

We don’t want the EU to mandate that everybody puts their model residuals on the blockchain, as that would be inefficient for various reasons. However, we can maybe encourage people to think about how they could wield things that are like markets, but more lightweight, and start thinking about how they could fit into their existing data science pipelines.

We could start to encourage people to say, “Hey, look, what are you doing with the errors in your prediction models? Where do they go? Do they get thrown in the trash? Make them public; are they really that proprietary?” Most people don’t even know what your model is, what you’re modeling, or how you do it, and you’re producing something that you claim is noise anyway.

Well, what are you afraid of? That may be one approach. The prediction markets area is certainly interesting, and at least in the US, it’s been pretty much hobbled over the years by regulation. All sorts of people have tried to use this discipline, but then they pull back when it comes up against the gambling label. For things to work well, you need staking sometimes, so there can be a cost. We don’t want to turn the world into poker machines, but without some kind of market discipline, I don’t see it getting any better ever. I just see a repetition of things.

Joannes Vermorel: I think you’re touching on something very important, and also something that I’ve been advocating for decades: if you do not have a feedback loop from the real world when you operate in your mathematical space with statistical models and algorithms, you don’t know whether you’re doing things that are insane or not.

Mathematics only tells you about consistency: whether what you’re doing inside this mathematical space is consistent with itself, not with the world. If you don’t have a feedback loop, you don’t know. At best, being statistically and mathematically correct just means that you’re consistent with yourself, which is good, but it doesn’t say anything about the world at large.

When you were saying, “Would you be willing to bet a few Euros or dollars on the case?”, it’s literally the feedback loop. That’s the punishment, the reward, and the skin in the game. In supply chain, one of the problems with those forecasting exercises is that they are typically entirely disconnected from what people are doing.

The problem I’ve identified is that most companies have one team doing the forecasting, producing time series forecasts, while the rest of the company deals with the consequences. You end up with very weird practices. For example, salespeople, when they have to contribute to the sales forecast, will vastly under-forecast in a process called sandbagging. Why? Because if their forecast sets their quota at 100 but they’re confident they’ll sell 200, they’ll exceed their quota and get their bonus.

On the other hand, in production, forecasting high demand gets you more budget to ramp up your production apparatus. If you have a factory that can produce twice as much as what you need, production is smooth because your capacity is way beyond what you actually need. The problem is not that people play these games; the challenge is to engineer the feedback loop so that people suffer the consequences. You want predictive models to be grounded, and financial attachments like betting can be an incredibly straightforward and grounded way to achieve that. Operationally, it’s also relatively simple to execute.

Peter Cotton: There was a good entrant in the M6 contest, which I’ll call the “Philip model.” An important part of his approach was to find more data. He wasn’t content with the stocks and ETFs provided by the organizers, so he sought more data, built models, and saw how they performed against a broader universe. This made him less inclined to overfit to a particular history. While prediction markets can be cumbersome, lightweight alternatives without staking can still be effective. Microprediction.org, for example, allows the cream to rise to the top without staking.

In my book, I talk about “micromanagers,” which are autonomous mechanisms that receive or solicit predictions and serve an upstream purpose for a business application. There are lots of different mechanisms for doing this. For instance, microprediction.org uses a continuous lottery system with a collective distribution of the future value of a variable. You can be rewarded for driving the collective distribution towards the true one. There’s a lot of literature on scoring and on characterizations of point estimates and distributional ones. The challenge is more about culture: do businesses want the discipline that finance has had for the last 40 years?
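The scoring literature Peter alludes to includes proper scoring rules such as the continuous ranked probability score (CRPS), which rewards forecasters for pushing a distributional forecast towards the realized value. Here is a minimal sample-based sketch; the two forecasts and the seed are invented for illustration, and this is not the exact mechanism microprediction.org runs:

```python
import random

def crps(samples, y):
    """Sample-based CRPS estimate: E|X - y| - 0.5 * E|X - X'|.
    Lower is better; sharp, well-centered distributions score best."""
    n = len(samples)
    term1 = sum(abs(x - y) for x in samples) / n
    term2 = sum(abs(a - b) for a in samples for b in samples) / (2 * n * n)
    return term1 - term2

random.seed(0)
truth = 1.0
sharp = [random.gauss(1.0, 0.5) for _ in range(400)]    # centered, confident
diffuse = [random.gauss(0.0, 2.0) for _ in range(400)]  # off-center, vague

# The forecaster whose distribution sits closer to the realized value wins.
print(crps(sharp, truth) < crps(diffuse, truth))  # prints True
```

Ranking entrants by a proper score like this is exactly the kind of lightweight, stake-free market discipline the conversation is pointing at.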

Joannes Vermorel: It’s indeed a fun problem to solve, and culture plays a significant role. Supply chain is a recent concept that emerged in the 90s, with logistics being the dominant field before. Excellence in logistics means having no accidents, eliminating hazards, and ensuring safety in the workplace. A lot of progress has been made in this area, with dangerous professions becoming much safer. Supply chain, however, takes the long view rather than making things happen on the ground, which is a different challenge.

And the thing is, when you start thinking about that, all these concepts, for example the Kullback-Leibler divergence, are conceptual tools where you accept that the future is uncertain and can therefore work with uncertainty; you even have the mathematical instruments to work with it.
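For readers who have not met it, the Kullback-Leibler divergence measures how much probability mass a forecast distribution misplaces relative to the true one; it is zero only for a perfect match. A small discrete sketch, with made-up demand-scenario probabilities:

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions over the same outcomes.
    Zero iff p == q; grows as q misplaces probability mass."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical demand scenarios: low / medium / high.
truth = [0.2, 0.5, 0.3]
honest = [0.25, 0.45, 0.30]          # admits uncertainty, close to truth
overconfident = [0.01, 0.98, 0.01]   # near-certain about "medium"

print(round(kl_divergence(truth, honest), 4))         # small
print(round(kl_divergence(truth, overconfident), 4))  # much larger
```

Note how the overconfident forecast is punished far more heavily than the honest one: this is the mathematical instrument for working with uncertainty rather than pretending it away.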

That’s the interesting thing. The cultural challenge for supply chain is that it’s incredibly difficult. Logistics, from which supply chain emerged, was about removing uncertainty. You don’t want any probability of somebody dying on your watch; you want that probability to be zero, or so vanishingly small that when it happens, it was honestly almost impossible to prevent. So people, and that’s good, want certainty in their processes. But when you evolve into this supply chain mindset where you’re thinking years ahead, you suddenly cannot obtain those certainties for things that will happen years from now. This culture needs to be re-engineered, because complete certainty is very good on the ground for your operations, but it’s a completely different game when you start thinking about the future, especially not the immediate future but a bit beyond.

Conor Doherty: Would you estimate how much time it took for finance to embrace, during the course of the 20th century, this more elaborate probabilistic vision of the future? I believe that Value at Risk instruments were introduced in the ’80s, but I’m not 100% sure about my timing.

Peter Cotton: That’s a good question. Options markets existed well before then, and a lot of people had a pretty good grasp on what was going on. There have always been smart people out there, and they published a lot. Data science is not 10 years old; it’s at least 40 years old, if you read the Jim Simons biography. Probability being dollars is a very old idea, and the notion that probabilities are unreliable if they’re not dollars is also very old.

Conor Doherty: Just one last question to tie it together. Did the M6 demonstrate that it is possible to beat the market, and to do so better than people have historically managed for six-plus decades?

Peter Cotton: The problem with that, and it’s a very important distinction to make, is that Warren Buffett would not have finished in the top 10. Warren Buffett would have had horribly calibrated estimates of probability. There’s a difference between being able to beat the market and producing probabilistic estimates as good as or better than the market’s. Neither Warren Buffett nor Jim Simons nor any single hedge fund can do that. M6 is a contest, a collection of individual efforts to create probabilities, but a market is much more than that. It’s a collective activity, and you can’t beat that collective activity. From the M6, I expected to find some smart people, and all credit to Philip, who beat me fair and square. But if you look at it as a numerical simulation, it’s impossible to say that Philip was actually better than me or vice versa.

The overall performance of the options market in the M6 is kind of overwhelming. There was a pilot stage, and then quarter one, quarter two, quarter three, and quarter four. In every single one, five out of five times, my entry was in the top quartile. That’s not luck. I hope the M6 teaches people that the discipline of the market is way up here compared to the discipline they’re used to in their machine learning papers or conferences or whatever else.

I hope the moral is not that people should stay away from the markets because they’re too hard to beat. I hope the moral is a different one, that people start thinking about how they can use the power of the markets or things like it, or these feedback loops, in their own pipelines and their own companies. That’s what I hope people take away from it. I’m not sure if they will, but one can only hope.

Conor Doherty: I think that’s probably the end. I’ll draw things to a close. I want to thank you for your time, Peter, and thank you very much, Joannes, for your expertise and congratulations again on the M6. Thank you, everyone, at home for watching. We’ll see you all next time.