Last time I tried to predict a race, I trained up a neural network on past race results, ahead of the World Championships in Harrogate. The model backed Sam Bennett, but it did not take account of the weather conditions, which turned out to be terrible. Fortunately the forecast looks good for tomorrow’s Milan Sanremo.
This time I have tried using a Random Forest, based on the results of the UCI races that took place in 2020 and so far in 2021. The model took account of each rider’s past results, team, height and weight, together with key statistics about each race, including date, distance, average speed and type of parcours.
One of the nice things about this type of model is that it is possible to see how the factors contribute to the overall predictions. The following waterfall chart explains why the model uncontroversially has Wout van Aert as the favourite.
The largest positive contribution comes from being Wout van Aert. This is because he has a lot of good results. His height and weight favour Milan Sanremo. He also has a strong positive coming from his team. The distance and race type make further positive contributions.
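For readers curious how a tree-based model can yield per-factor contributions like these, here is a minimal sketch of one common approach: follow a rider's row down each tree and credit every change in the node average to the feature that was split on. The post does not say which tool produced the chart, so this is an assumption rather than the method actually used. The two tiny hand-built trees, the feature names (distance_km, rider_form, team_strength) and all the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    value: float                    # average target of the training rows reaching this node
    feature: Optional[str] = None   # feature split on here; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # branch taken when row[feature] <= threshold
    right: Optional["Node"] = None  # branch taken otherwise


def explain_tree(tree: Node, row: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Walk one tree and credit each change in node value to the split feature."""
    contributions: Dict[str, float] = {}
    node = tree
    while node.feature is not None:
        child = node.left if row[node.feature] <= node.threshold else node.right
        step = child.value - node.value
        contributions[node.feature] = contributions.get(node.feature, 0.0) + step
        node = child
    return tree.value, contributions


def explain_forest(trees: List[Node], row: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Average the baselines and the per-feature contributions over all trees."""
    baseline = sum(t.value for t in trees) / len(trees)
    totals: Dict[str, float] = {}
    for t in trees:
        _, contribs = explain_tree(t, row)
        for name, amount in contribs.items():
            totals[name] = totals.get(name, 0.0) + amount / len(trees)
    return baseline, totals


# Two hypothetical trees standing in for a trained forest (values are made up).
tree_a = Node(0.10, "distance_km", 250.0,
              left=Node(0.05),
              right=Node(0.20, "rider_form", 0.5,
                         left=Node(0.10), right=Node(0.40)))
tree_b = Node(0.12, "rider_form", 0.6,
              left=Node(0.06),
              right=Node(0.30, "team_strength", 0.5,
                         left=Node(0.20), right=Node(0.50)))
forest = [tree_a, tree_b]

# A hypothetical rider/race row, not real data.
row = {"distance_km": 299.0, "rider_form": 0.9, "team_strength": 0.8}

baseline, contribs = explain_forest(forest, row)

# Print the waterfall: start at the baseline and add one bar per feature.
running = baseline
print(f"{'baseline':>14}: {running:.3f}")
for name, amount in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    running += amount
    print(f"{name:>14}: {amount:+.3f} -> {running:.3f}")
```

The baseline plus the contributions always adds up to the forest's prediction for that row, which is what lets the bars of a waterfall chart stack from the average outcome to the individual prediction.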
We can contrast this with the model’s prediction for Mathieu van der Poel, who is ranked 9th.
We see a positive personal contribution from being van der Poel, but having raced fewer UCI events, he has a less extensive set of strong results than van Aert. According to the model, the Alpecin Fenix team contribution is not as strong as that of Jumbo Visma, but the long distance of the race works in favour of the Dutchman. The day of year gives a small negative contribution, suggesting that his road results have been stronger later in the year, but this could be due to last year’s unusual timing of races.
Each of the other riders in the model’s top 10 is in with a shout.
It’s taken me all afternoon to set up this model, so this is just a short post.
Post-race comment
Where was Jasper Stuyven?
Like Mads Pedersen in Harrogate back in 2019, Jasper Stuyven was this year’s surprise winner in Sanremo. So what had the model expected for him? Scrolling down the list of predictions, Stuyven was ranked 39th.
His individual rider prediction was negative, perhaps because he has not had many good results so far this year, though he did win Omloop Het Nieuwsblad last year and had several top 10 finishes. The model assessed that his greatest advantage came from the length of the race, suggesting that he tends to do well over greater distances.
The nice thing about this approach is that it identifies factors that are relevant to particular riders, in a quantitative fashion. This helps to overcome personal biases and the human tendency to overweight and project forward what has happened most recently.