Predicting the World Champion

A couple of years ago I built a model to evaluate how Froome and Dumoulin would have matched up, if they had not avoided racing against each other over the 2017 season. As we approach the 2019 World Championships Road Race in Yorkshire, I have adopted a more sophisticated approach to try to predict the winner of the men’s race. The smart money could be going on Sam Bennett.

Deep learning

With only two races outstanding, most of this year’s UCI world tour results are available. I decided to broaden the data set with 2.HC classification European Tour races, such as the OVO Energy Tour of Britain. In order to help with prediction, I included each rider’s weight and height, as well as some meta-data about each race, such as date, distance, average speed, parcours and type (stage, one-day, GC, etc.).

The key question was what exactly are you trying to predict? The UCI allocates points for race results, using a non-linear scale. For example, Mathieu Van Der Poel was awarded 500 points for winning Amstel Gold, while Simon Clarke won 400 for coming second and Jakob Fuglsang picked up 325 for third place, continuing down to 3 points for coming 60th. I created a target variable called PosX, defined as a negative exponential of the rider’s position in any race, equating to 1.000 for a win, 0.834 for second, 0.695 for third, decaying down to 0.032 for 20th. This has a similar profile to the points scheme, emphasising the top positions, and handles races with different numbers of riders.

A random forest would be a typical choice of model for this kind of data set, which included a mixture of continuous and categorical variables. However, I opted for a neural network, using embeddings to encode the categorical variables, with two hidden layers of 200 and 100 activations. This was very straightforward using the fast.ai library. Training was completed in a handful of seconds on my MacBook Pro, without needing a GPU.

After some experimentation on a subset of the data, it was clear that the model was coming up with good predictions on the validation set and the out-of-sample test set. With a bit more coding, I set up a procedure to load a start list and the meta-data for a future race, in order to predict the result.

Predictions

With the final start list for the World Championships Road Race looking reasonably complete, I was able to generate the predicted top 10. The parcours obviously has an important bearing on who wins a race. With around 3600m of climbing, the course was clearly hilly, though not mountainous. Although the finish was slightly uphill, it was not ridiculously steep, so I decided to classify the parcours as rolling with a flat finish

PositionRiderPrediction
1Mathieu Van Der Poel0.602
2Alexander Kristoff0.566
3Sam Bennett0.553
4Peter Sagan0.540
5Edvald Boasson Hagen0.507
6Greg Van Avermaet0.500
7Matteo Trentin0.434
8Michael Matthews0.423
9Julian Alaphilippe0.369
10Mike Teunissen0.362

It was encouraging to see that the model produced a highly credible list of potential top 10 riders, agreeing with the bookies in rating Mathieu Van Der Poel as the most likely winner. Sagan was ranked slightly below Kristoff and Bennett, who are seen as outsiders by the pundits. The popular choice of Philippe Gilbert did not appear in my top 10 and Alaphilippe was only 9th, in spite of their recent strong performances in the Vuelta and the Tour, respectively. Riders in positions 5 to 10 would all be expected to perform well in the cycling classics, which tend to be long and arduous, like the Yorkshire course.

For me, 25/1 odds on Sam Bennett are attractive. He has a strong group of teammates, in Dan Martin, Eddie Dunbar, Connor Dunne, Ryan Mullen and Rory Townsend, who will work hard to keep him with the lead group in the hillier early part of the race. Then he will then face an extremely strong Belgian team that is likely to play the same game that Deceuninck-QuickStep successfully pulled off in stage 17 of the Vuelta, won by Gilbert. But Bennett was born in Belgium and he was clearly the best sprinter out in Spain. He should be able to handle the rises near the finish.

A similar case can be made for Kristoff, while Matthews and Van Avermaet both had recent wins in Canada. Nevertheless it is hard to look past the three-times winner Peter Sagan, though if Van Der Poel launches one of his explosive finishes, there is no one to stop him pulling on the rainbow jersey.

Appendix

After the race, I checked the predicted position of the eventual winner, Mads Pedersen. He was expected to come 74th. Clearly the bad weather played a role in the result, favouring the larger riders, who were able to keep warmer. The Dane clearly proved to be the strongest rider on the day.

References

Code used for this project

Sunflowers

Image in the style of @grandtourart

Last year, I experimented with using style transfer to automatically generate images in the style of @grandtourart. More recently I developed a more ambitious version of my rather simple bike identifier. The connection between these two projects is sunflowers. This blog describes how I built a flower identification app.

In the brilliant fast.ai Practical Deep Learning for Coders course, Jeremy Howard recommends downloading a publicly available dataset to improve one’s image categorisation skills. I decided to experiment with the 102 Category Flower Dataset, kindly made available by the Visual Geometry Group at Oxford University. In the original 2008 paper, the researchers used a combination of techniques to segment each image and characterise its features. Taking these as inputs to a Support Vector Machine classifier, their best model achieved an accuracy of 72.8%.

Annoyingly, I could not find a list linking the category numbers to the names of the flowers, so I scraped the page showing sample images and found the images in the labelled data.

Using exactly the same training, validation and test sets, my ResNet34 model quickly achieved an accuracy of 80.0%. I created a new branch of the GitHub repository established for the Bike Image model and linked this to a new web service on my Render account. The huge outperformance of the paper was satisfying, but I was sure that a better result was possible.

The Oxford researchers had divided their set of 8,189 labelled images into a training set and a validation set, each containing 10 examples of the 102 flowers. The remaining 6,149 images were reserved for testing. Why allocate less that a quarter of the data to training/validation? Perhaps this was due to limits on computational resources available at the time. In fact, the training and validation sets were so small that I was able to train the ResNet34 on my MacBook Pro’s CPU, within an acceptable time.

My plan to improve accuracy was to merge the test set into the training set, keeping aside the original validation set of 1,020 images for testing. This expanded training set of 7,261 images immediately failed on my MacBook, so I uploaded my existing model onto my PaperSpace GPU, with amazing results. Within 45 minutes, I had a model with 97.0% accuracy on the held-out test set. I quickly exported the learner and switched the link in the flowers branch of my GitHub repository. The committed changes automatically fed straight through to the web service on Render.

I discovered, when visiting the app on my phone, that selecting an image offers the option to take a photo and upload it directly for identification. Having exhausted the flowers in my garden, I have risked being spotted by neighbours as I furtively lean over their front walls to photograph the plants in their gardens.

Takeaways

It is very efficient to use smaller datasets and low resolution images for initial training. Save the model and then increase resolution. Often you can do this on a local CPU without even paying for access to a GPU. When you have a half decent model, upload it onto a GPU and continue training with the full dataset. Deploying the model as a web service on Render makes the model available to any device, including a mobile phone.

My final model is amazing… and it works for sunflowers.

References

Automated flower classification over a large number of classes, Maria-Elena Nilsback and Andrew Zisserman, Visual Geometry Group, Department of Engineering Science, University of Oxford, United Kingdom, men,az@robots.ox.ac.uk

102 Flowers Jupyter notebook

Bike Identification as a web app

One of the first skills acquired in the latest version of the fast.ai course on deep learning is how to create a production version of an image classifier that runs as a web application. I decided to test this out on a set of images of road bikes, TT bikes and mountain bikes. To try it out, click on the image above or go to this website https://bike-identifier.onrender.com/ and select an image from your device. If you are using a phone, you can try taking photos of different bikes, then click on Analyse to see if they are correctly identified. Side-on images work best.

How does it work?

The first task was to collect some sample images for the three classes of bicycles I had chosen: road, TT and MTB. It turns out that there is a neat way to obtain the list of urls for a Google image search, by running some javascript in the console. I downloaded 200 images for each type of bike and removed any that could not be opened. This relatively small data set allowed me to do all the machine learning using the CPU on my MacBook Pro in less than an hour.

The fast.ai library provides a range of convenient ways to access images for the purpose of training a neural network. In this instance, I used the default option of applying transfer learning to a pre-trained ResNet34 model, scaling the images to 224 pixel squares, with data augmentation. After doing some initial training, it was useful to look at the images that had been misclassified, as many of these were incorrect images of motorbikes or cartoons or bike frames without wheels or TT bars. Taking advantage of a useful fast.ai widget, I removed unhelpful training images and trained the model further.

The confusion matrix showed that final version of my model was running at about 90% accuracy on the validation set, which was hardly world-beating, but not too bad. The main problem was a tendency to mistake certain road bikes for TT bikes. This was understandable, given the tendency for road bikes to become more aero, though it was disappointing when drop handlebars were clearly visible.

The next step was to make my trained network available as a web application. First I exported the models parameter settings to Dropbox. Then I forked a fast.ai repository into my GitHub account and edited the files to link to my Dropbox, switching the documentation appropriately for bicycle identification. In the final step, I set up a free account on Render to host a web service linked to my GitHub repository. This automatically updates for any changes pushed to the repository.

Amazingly, it all works!

References

fast.ai lesson 2

My GitHub repository, include Jupyter notebook

Cycling Through Artistic Styles

HR

My earlier post on cycling art provided an engaging way to consider the creative potentials of deep learning. I have found myself frequently gravitating back to the idea, using the latest code available over at fast.ai. The method uses a neural network to combine the content of a photograph with the style of an artist, but I have found that it takes a few trials to find the right combination of content versus style. This led to the idea of generating a range of images and then running them together as a movie that gradually shifts between the base image to a raw interpretation of the artist’s style.

Artistic styles

Using a range of artistic styles from impressionist to abstract, the weights that produced the most interesting images varied according to the photograph and artistic style.

My selected best images are shown below, next to snippets of the corresponding artworks. It turned out that the impressionist artists (Monet, Van Gogh, Cézanne and Braque) maintained the content of the image, in spite of being more heavily weighted to artistic style. In contrast, the more monochromatic styles (O’Keeffe, Polygons, Abstract as well as Dali) needed to be more strongly weighted towards content, in order to preserve the cyclist in the image. The selections for Picasso and Pollock were evenly balanced.

Every image is unique and sometimes some real surprises pop up. For example, using Picasso’s style, the mountains are interpreted as rooftops, complete with windows and doors. Strange eyes peer out the background of finger-shapes in the Dali image and the mountains have become Monet’s water lilies. The Pollock image came out very nicely.

Deep learning

The approach was based on the method described in the paper referenced below. Running the code on a cloud-based GPU, it took about 30 seconds for a neural network to learn to generate in image with the desired characteristics. The learning process was achieved by minimising a loss function, using gradient descent. The clever part lay in defining an appropriate loss function. In this instance, the sample image was passed through a separate pre-trained neural network (VGG16), where the activations, at various layers in the network, were compared to those generated by the photograph and the artwork. The loss function combined the difference in photographic content with the difference in artistic style, where the critical parameter was the content weighting factor.

I decided to vary the content weighting factor logarithmically between around 0.1 and 100, to obtain a full range of content to style combinations. A movie was be produced simply by packing together the images one after the other.

References

A Neural Algorithm of Artistic Style, Leon A. Gatys, Alexander S. Ecker, Matthias Bethge

 

 

What are you looking at?

Screen Shot 2018-05-06 at 18.48.40.png

In a recent blog, I described an experiment to train a deep neural network to distinguish between photographs of Vincenzo Nibali and Alejandro Valverde, using a very small data set of images. In the conclusion, I suggested that the network was probably basing its decisions more on the colours of the riders’ kit rather than on facial recognition. This article investigates what the network was actually “looking at”, in order to understand better how it was making decisions.

The issues of accountability and bias were among the topics discussed at the last NIPS conference. As machine learning algorithms are adopted across industry, it is important for companies to be able to explain how conclusions are reached. In many instances, it is not acceptable simply to rely on an impenetrable black box. AI researchers and developers need to be able to explain what is going on inside their models, in order to justify decisions taken. In doing so, some worrying instances of bias have been revealed in the selection of data used to train the algorithms.

I went back to my rider recognition model and used an approach called “Class Activation Maps” to identify which parts of the images accounted for the network’s choice of rider. Making use of the code provided in lesson 7 of the course offered by fast.ai, I took advantage of my existing small set of training, validation and test images of the two famous cyclists. Starting with a pre-trained version of ResNet34, the idea was to replace the last two layers with four new ones, the crucial one being a convolutional layer with two outputs, matching the number of cyclists in the classification task. The two outputs of this layer were 7×7 matrix representations of the relevant image.

The final predictions of the model came from a softmax of a flattened average pooling of these 7×7 representations. The softmax output gave the probabilities of Nibali and Valverde respectively. Since there was no learning beyond the final convolution, the activations of the two 7×7 matrices represented the “Nibali-ness” and “Valverde-ness” of the image. This could be displayed as a heat map on top of the image.

Examples are shown below for the validation set of 10 images of Nibali followed by 10 of Valverde. The yellow patch of the heat map highlights the part of the image that led to the prediction displayed above each image. Nine out of ten were correct for Nibali and six for Valverde.

Screen Shot 2018-05-06 at 18.10.00.png
Class Activation Maps applied to the validation set

The heat maps were very helpful in understanding the model’s decision making process. It seemed that for Nibali, his face and helmet were important, with some attention paid to the upper part of his blue Astana kit. In contrast, the network did a very good job at identifying the M on Valverde’s Moviestar kit. It was interesting to note that the network succeeded in spotting that Nibali was wearing a Specialized helmet whereas Valverde had a Catlike design. Three errors arose in the photos of his face, which was mistaken for Nibali’s. In fact, any picture of a face led to a prediction of Nibali, as demonstrated by the cropped image below that was used for training.

Screen Shot 2018-05-06 at 18.21.58

Why should that be? Looking back at the training set, it turned out that, by chance, there were far more mugshots of Nibali, while there were more photos of Valverde riding his bike, with his face obscured by sunglasses. This was an example of unintentional bias in the training data, providing a very useful lesson.

The final set of pictures shows the predictions made on the out-of-sample test set. All the predictions are correct, except the first one, where the model failed to spot the green M on Valverde’s chest and mistook the blurred background for Nibali. Otherwise the results confirmed that the network looked at Nibali’s face, the rider’s helmet or Valverde’s kit. It also remembered seeing an image of Nibali holding the Giro trophy in the training set.

Screen Shot 2018-05-06 at 18.34.38.png
Class Activation Maps applied to the test set

In conclusion, Class Activation Maps provide a useful way of visualising the activations of hidden laters in a deep neural network. This can go some way to accounting for the decisions that appear in the output. The approach can also help identify unintentional bias in the training set.

Which team is that?

Screen Shot 2018-04-11 at 11.18.09

My last blog explored the effectiveness of deep learning in spotting the difference between Vincenzo Nibali and Alejandro Valverde. Since the faces of the riders were obscured in many of the photos, it is likely that the neural network was basing its evaluations largely on the colours of their team kit. A natural next challenge is to identify a rider’s team from a photograph. This task parallels the approach to the kaggle dog breed competition used in lesson 2 of the fast.ai course on deep learning.

Eighteen World Tour teams are competing this year. So the first step was to trawl the Internet for images, ideally of riders in this year’s kit. As before, I used an automated downloader, but this posed a number of problems. For example, searching for “Astana” brings up photographs of the capital of Kazakhstan. So I narrowed things down by searching for  “Astana 2018 cycling team”. After eliminating very small images, I ended up with a total of about 9,700 images, but these still included a certain amount of junk that I did have the time to weed out, such as photos of footballers or motorcycles in the “Sky Racing Team”,.

The following small sample of training images is generally OK, though it includes images of Scott bikes rather than Mitchelton-Scott riders and  a picture of  Sunweb’s Wilco Kelderman labelled as FDJ. However, with around 500-700 images of each team, I pressed on, noting that, for some reason, there were only 166 of Moviestar and these included the old style kit.

Screen Shot 2018-04-11 at 10.18.54.png
Small sample of training images

For training on this multiple classification problem, I adopted a slightly more sophisticated approach than before. Taking a pre-trained Resnet50 model, I performed some initial fine-tuning, on images rescaled to 224×224. I settled on an optimal learning rate of 1e-3 for the final layer, while allowing some training of lower layers at much lower rates. With a view to improving generalisation, I opted to augment the training set with random changes, such as small shifts in four directions, zooming in up to 10%, adjusting lighting and left-right flips. After initial training, accuracy was 52.6% on the validation set. This was encouraging, given that random guesses would have achieved a rate of 1 in 18 or 5.6%.

Taking a pro tip from fast.ai, training proceeded with the images at a higher resolution of 299×299. The idea is to prevent overfitting during the early stages, but to improve the model later on by providing more data for each image. This raised the accuracy to 58.3% on the validation set. This figure was obtained using a trick called “test time augmentation”, where each final prediction is based on the average prediction of five different “augmented” versions of the image in question.

Given the noisy nature of some of the images used for training, I was pleased with this result, but the acid test was to evaluate performance on unseen images. So I created a test set of two images of a lead rider from each squad and asked the model to identify the team. These are the results.

75 percent right.png
75% accuracy on the test set

The trained Resnet50 correctly identified the teams of 27 out of 36 images. Interestingly, there were no predictions of MovieStar or Sky. This could be partly due to the underrepresentation of MovieStar in the training set. Froome was mistaken for AG2R and Astana, in column 7, rows 2 and 3. In the first image, his 2018 Sky kit was quite similar to Bardet’s to the left and in the second image the sky did appear to be Astana blue! It is not entirely obvious why Nibali was mistaken for Sunweb and Astana, in the top and bottom rows. However, the huge majority of predictions were correct. An overall  success rate of 75% based on an afternoon’s work was pretty amazing.

The results could certainly be improved by cleaning up the training data, but this raises an intriguing question about the efficacy of artificial intelligence. Taking a step back, I used Bing’s algorithms to find images of cycling teams in order to train an algorithm to identify cycling teams. In effect, I was training my network to reverse-engineer Bing’s search algorithm, rather than my actual objective of identifying cycling teams. If an Internet search for FDJ pulls up an image of Wilco Kelderman, my network would be inclined to suggest that he rides for the French team.

In conclusion, for this particular approach to reach or exceed human performance, expert human input is required to provide a reliable training set. This is why this experiment achieved 75%, whereas the top submissions on the dog breeds leaderboard show near perfect performance.

Valverde or Nibali?

Alejandro Valverde has kicked off the 2018 season with an impressive series of wins. Meanwhile Vincenzo Nibali delighted the tifosi with his victory in Milan San Remo. It is pretty easy to tell these two riders apart in the pictures above, but could computer distinguish between them?

Following up on my earlier blogs about neural networks, I have been taking a look at the updated version of fast.ai’s course on deep learning. With the field advancing at a rapid pace, this provides a good way to staying up to date with the state of the art. For example, there are now a couple of cheaper alternatives to AWS for accessing high powered GPUs, offered by Paperspace and Crestle. The latest fast.ai libraries include many new tools that work extremely well in practice.

There’s a view that deep learning requires hours of training on high-powered supercomputers, using thousands (or millions) of labelled examples, in order to learn to perform computer vision tasks. However, newer architectures, such as ResNet, are able to run on much smaller data sets. In order to test this, I used an image downloader to grab photos of Nibali and Valverde and manually selected about 55 decent pictures of each one.

I divided the images into a training set with about 40 images of each rider, a validation set with 10 of each and a test set containing the rest. Nibali appears in a range of different coloured jerseys, though the Astana blue is often present. Valverde is mainly wearing the old dark blue Movistar kit with a green M. There were more close-up shots of Nibali’s face than Valverde.

Screen Shot 2018-04-03 at 18.30.08.png

I was able to fine-tune a pre-trained ResNet neural network to this task, using some of the techniques from the fast.ai tool box, each designed to improve generalisation. The first trick was to augment the training set by performing minor transformations of the images at random, such as taking a mirror image, shifting left or right and zooming in a bit. The second set of tricks varied the rate of learning as the algorithm iterated repeatedly through the training set. A final useful technique created a set of variants of each test image and took the average of the predictions. Everything ran at lightning speed on a Paperspace GPU. After a run time of just a few minutes, the ResNet was able to  score 17 out of 20 on the following validation set.

Screen Shot 2018-04-03 at 18.49.27.png

The confusion matrix shows that the model correctly identified all the Nibali images, but it was wrong on three pictures of Valverde. The first incorrect image (below) shows Valverde in the red leader’s jersey of the Tour of Murcia, which is not dissimilar to Nibali’s new Bahrain Merida kit, though he was wearing red in two of his training images. In the second instance, the network was fooled by the change in colour of Moviestar’s kit, which had become rather similar to Astana’s light blue. The figure of 0.41 above the close-up image indicates that the model assigned only a 41% probability that the image was Valverde. It probably fell below the critical 50% level, in spite of the blue/green colours, because there were were far more close-up shots of Nibali than Valverde in the training set.

Overall of 17 out of 20 on the validation set is impressive. However, the network had access to the validation set during training, so this result is “in sample”. A proper  “out of sample” evaluation of the model’s ability made use the following ten images, comprising the test set that was kept aside.

Screen Shot 2018-04-03 at 21.21.59

Amazingly, the model correctly identified 9 out of the 10 pictures it had not seen before. The only error was the Valverde selfie shown in the final image. In order to work better in practice, the training set would need to include more examples of the riders’ 2018 kit. A variant of the problem would be to identify the team rather than the rider. The same network can be trained for multiple classes rather than just two.

This experiment shows that it is pretty straightforward to run state of the art image recognition tools remotely on a GPU somewhere in the cloud and come up with pretty impressive results, even with a small data set.

The next blog describes how to identify a rider’s team.