A couple of years ago I built a model to evaluate how Froome and Dumoulin would have matched up, if they had not avoided racing against each other over the 2017 season. As we approach the 2019 World Championships Road Race in Yorkshire, I have adopted a more sophisticated approach to try to predict the winner of the men’s race. The smart money could be going on Sam Bennett.
With only two races outstanding, most of this year’s UCI WorldTour results are available. I decided to broaden the data set with 2.HC classification Europe Tour races, such as the OVO Energy Tour of Britain. In order to help with prediction, I included each rider’s weight and height, as well as some meta-data about each race, such as date, distance, average speed, parcours and type (stage, one-day, GC, etc.).
The key question was what exactly to predict. The UCI allocates points for race results, using a non-linear scale. For example, Mathieu Van Der Poel was awarded 500 points for winning Amstel Gold, while Simon Clarke won 400 for coming second and Jakob Fuglsang picked up 325 for third place, continuing down to 3 points for coming 60th. I created a target variable called PosX, defined as a negative exponential of the rider’s position in any race, equating to 1.000 for a win, 0.834 for second, 0.695 for third, decaying down to 0.032 for 20th. This has a similar profile to the points scheme, emphasising the top positions, and it can be applied to races with different numbers of riders, because the value depends only on finishing position and not on the size of the field.
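To make the shape of PosX concrete, here is a minimal sketch in Python. The name `pos_x` and the decay constant of 5.5 places are my assumptions for illustration: that constant is simply a value that reproduces the four figures quoted above, and the exact formula may have differed.

```python
import math

# Assumed decay constant, in finishing places. It was picked only because
# it reproduces the values quoted in the text (0.834, 0.695, 0.032).
DECAY_PLACES = 5.5


def pos_x(position: int) -> float:
    """Map a finishing position (1 = winner) to a score in (0, 1]."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    return math.exp(-(position - 1) / DECAY_PLACES)


# Print the score for the positions mentioned in the text.
for place in (1, 2, 3, 20):
    print(place, round(pos_x(place), 3))
```

Because the score depends only on the finishing position, a win is worth the same whether 25 or 200 riders started.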
A random forest would be a typical choice of model for this kind of data set, which included a mixture of continuous and categorical variables. However, I opted for a neural network, using embeddings to encode the categorical variables, with two hidden layers of 200 and 100 activations. This was very straightforward using the fast.ai library. Training was completed in a handful of seconds on my MacBook Pro, without needing a GPU.
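For readers unfamiliar with embeddings, the sketch below shows the forward pass of a network of this shape in plain Python, with no training. It is not the fast.ai code: the variable names, the category counts, the embedding widths, the sigmoid output and the omission of bias terms are all assumptions made to keep the illustration short. Only the hidden layer sizes of 200 and 100 come from the description above.

```python
import math
import random

random.seed(0)


def make_matrix(rows, cols):
    """Small random weights; a real model would learn these in training."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def linear(x, weights):
    """Multiply vector x (length n_in) by an n_in x n_out weight matrix."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*weights)]


def relu(x):
    return [max(0.0, v) for v in x]


# Hypothetical categorical variables: name -> (number of levels, embedding width).
CATEGORICALS = {"parcours": (5, 3), "race_type": (4, 3)}
# Hypothetical continuous variables, assumed to be normalised beforehand.
CONTINUOUS = ["weight", "height", "distance"]

# One embedding table per categorical variable: one row per level.
embeddings = {
    name: make_matrix(levels, width)
    for name, (levels, width) in CATEGORICALS.items()
}
n_inputs = sum(width for _, width in CATEGORICALS.values()) + len(CONTINUOUS)

w1 = make_matrix(n_inputs, 200)  # first hidden layer, 200 activations
w2 = make_matrix(200, 100)       # second hidden layer, 100 activations
w_out = make_matrix(100, 1)      # single output, the predicted score


def forward(cat_codes, cont_values):
    """Return a score in (0, 1) for one rider in one race."""
    x = []
    # Each integer category code selects one row of its embedding table.
    for name in CATEGORICALS:
        x.extend(embeddings[name][cat_codes[name]])
    # Continuous variables are appended to the embedding vectors as-is.
    x.extend(cont_values[name] for name in CONTINUOUS)
    h1 = relu(linear(x, w1))
    h2 = relu(linear(h1, w2))
    z = linear(h2, w_out)[0]
    # Assumed sigmoid output, so the prediction lies in the same range as PosX.
    return 1.0 / (1.0 + math.exp(-z))


# Made-up inputs, purely to exercise the function.
score = forward(
    {"parcours": 2, "race_type": 1},
    {"weight": -0.3, "height": 0.5, "distance": 1.2},
)
print(round(score, 3))
```

The point of the embedding tables is that each category, such as a type of parcours, is represented by a short vector of learned numbers, so the network can discover that some categories behave alike.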
After some experimentation on a subset of the data, it was clear that the model was coming up with good predictions on the validation set and the out-of-sample test set. With a bit more coding, I set up a procedure to load a start list and the meta-data for a future race, in order to predict the result.
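The prediction procedure can be sketched roughly as follows. The function `predict_top_n`, the field names and the toy scoring rule are hypothetical stand-ins: in practice `score_fn` would be the trained model's prediction, and the riders below are made up.

```python
def predict_top_n(start_list, race_meta, score_fn, n=10):
    """Score every rider on a start list for one race and return the top n.

    start_list: list of dicts, one per rider, with a "rider" name key.
    race_meta:  dict of race-level fields shared by every rider.
    score_fn:   callable that maps one combined row to a predicted score.
    """
    # Attach the same race meta-data to each rider's own attributes.
    rows = [{**rider, **race_meta} for rider in start_list]
    scored = [(row["rider"], score_fn(row)) for row in rows]
    # A higher predicted score means a better expected finishing position.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Made-up riders and race details, purely to exercise the function.
start_list = [
    {"rider": "Rider A", "weight_kg": 70.0, "height_m": 1.80},
    {"rider": "Rider B", "weight_kg": 72.5, "height_m": 1.84},
    {"rider": "Rider C", "weight_kg": 80.0, "height_m": 1.90},
]
race_meta = {"distance_km": 250.0, "parcours": "rolling", "race_type": "one-day"}


def toy_score(row):
    """Placeholder for a trained model; favours riders near 72 kg."""
    return 1.0 / (1.0 + abs(row["weight_kg"] - 72.0))


for name, score in predict_top_n(start_list, race_meta, toy_score, n=2):
    print(name, round(score, 3))
```

Keeping the scoring function as a parameter means the same procedure works for any future race: only the start list and the race meta-data change.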
With the final start list for the World Championships Road Race looking reasonably complete, I was able to generate the predicted top 10. The parcours obviously has an important bearing on who wins a race. With around 3600m of climbing, the course was clearly hilly, though not mountainous. Although the finish was slightly uphill, it was not ridiculously steep, so I decided to classify the parcours as rolling with a flat finish.
|1||Mathieu Van Der Poel||0.602|
|5||Edvald Boasson Hagen||0.507|
|6||Greg Van Avermaet||0.500|
It was encouraging to see that the model produced a highly credible list of potential top 10 riders, agreeing with the bookies in rating Mathieu Van Der Poel as the most likely winner. Sagan was ranked slightly below Kristoff and Bennett, who are seen as outsiders by the pundits. The popular choice of Philippe Gilbert did not appear in my top 10 and Alaphilippe was only 9th, in spite of their recent strong performances in the Vuelta and the Tour, respectively. Riders in positions 5 to 10 would all be expected to perform well in the cycling classics, which tend to be long and arduous, like the Yorkshire course.
For me, 25/1 odds on Sam Bennett are attractive. He has a strong group of teammates, in Dan Martin, Eddie Dunbar, Conor Dunne, Ryan Mullen and Rory Townsend, who will work hard to keep him with the lead group in the hillier early part of the race. He will then face an extremely strong Belgian team that is likely to play the same game that Deceuninck-QuickStep successfully pulled off in stage 17 of the Vuelta, won by Gilbert. But Bennett was born in Belgium and he was clearly the best sprinter out in Spain. He should be able to handle the rises near the finish.
A similar case can be made for Kristoff, while Matthews and Van Avermaet both had recent wins in Canada. Nevertheless, it is hard to look past the three-time winner Peter Sagan, though if Van Der Poel launches one of his explosive finishes, there is no one to stop him pulling on the rainbow jersey.
After the race, I checked the predicted position of the eventual winner, Mads Pedersen. He was expected to come 74th. The bad weather evidently played a role in the result, favouring the larger riders, who were able to keep warmer. The Dane proved to be the strongest rider on the day.