Sunflowers

Image in the style of @grandtourart

Last year, I experimented with using style transfer to automatically generate images in the style of @grandtourart. More recently I developed a more ambitious version of my rather simple bike identifier. The connection between these two projects is sunflowers. This blog describes how I built a flower identification app.

In the brilliant fast.ai Practical Deep Learning for Coders course, Jeremy Howard recommends downloading a publicly available dataset to improve one’s image categorisation skills. I decided to experiment with the 102 Category Flower Dataset, kindly made available by the Visual Geometry Group at Oxford University. In the original 2008 paper, the researchers used a combination of techniques to segment each image and characterise its features. Taking these as inputs to a Support Vector Machine classifier, their best model achieved an accuracy of 72.8%.

Annoyingly, I could not find a list linking the category numbers to the names of the flowers, so I scraped the page of sample images and matched them against the labelled data to link each category number to a flower name.

Using exactly the same training, validation and test sets, my ResNet34 model quickly achieved an accuracy of 80.0%. I created a new branch of the GitHub repository established for the Bike Image model and linked this to a new web service on my Render account. The huge outperformance of the paper was satisfying, but I was sure that a better result was possible.

The Oxford researchers had divided their set of 8,189 labelled images into a training set and a validation set, each containing 10 examples of the 102 flowers. The remaining 6,149 images were reserved for testing. Why allocate less than a quarter of the data to training/validation? Perhaps this was due to limits on the computational resources available at the time. In fact, the training and validation sets were so small that I was able to train the ResNet34 on my MacBook Pro's CPU within an acceptable time.

My plan to improve accuracy was to merge the test set into the training set, keeping aside the original validation set of 1,020 images for testing. Training on this expanded set of 7,261 images immediately proved too much for my MacBook, so I uploaded my existing model onto my Paperspace GPU, with amazing results. Within 45 minutes, I had a model with 97.0% accuracy on the held-out test set. I quickly exported the learner and switched the link in the flowers branch of my GitHub repository. The committed changes automatically fed straight through to the web service on Render.
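
For anyone reproducing the split, the labels and the original train/validation/test indices ship with the dataset as MATLAB files. A minimal sketch of rebuilding the merged split, assuming the standard imagelabels.mat, setid.mat and image_XXXXX.jpg naming (the exact notebook code may differ), looks like this:

```python
# Sketch of rebuilding the merged split, assuming the dataset's standard files:
# jpg/image_00001.jpg ... image_08189.jpg, plus imagelabels.mat and setid.mat.
from scipy.io import loadmat

labels = loadmat('imagelabels.mat')['labels'][0]   # category 1-102 for each image
setid = loadmat('setid.mat')

# Merge the original (large) test split into training; keep the old validation
# split of 1,020 images aside as the new test set.
train_ids = list(setid['trnid'][0]) + list(setid['tstid'][0])
test_ids = list(setid['valid'][0])

def filename(i):
    return f'jpg/image_{i:05d}.jpg'

train_items = [(filename(i), int(labels[i - 1])) for i in train_ids]
test_items = [(filename(i), int(labels[i - 1])) for i in test_ids]
print(len(train_items), len(test_items))
```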

I discovered, when visiting the app on my phone, that selecting an image offers the option to take a photo and upload it directly for identification. Having exhausted the flowers in my garden, I have risked being spotted by neighbours as I furtively lean over their front walls to photograph the plants in their gardens.

Takeaways

It is very efficient to use smaller datasets and low-resolution images for initial training, saving the model before increasing the resolution. Often this can be done on a local CPU without even paying for access to a GPU. Once you have a half-decent model, upload it onto a GPU and continue training with the full dataset. Deploying the model as a web service on Render makes it available to any device, including a mobile phone.

My final model is amazing… and it works for sunflowers.

References

Automated flower classification over a large number of classes, Maria-Elena Nilsback and Andrew Zisserman, Visual Geometry Group, University of Oxford, 2008

102 Flowers Jupyter notebook

Bike Identification as a web app

One of the first skills acquired in the latest version of the fast.ai course on deep learning is how to create a production version of an image classifier that runs as a web application. I decided to test this out on a set of images of road bikes, TT bikes and mountain bikes. To try it, click on the image above or go to https://bike-identifier.onrender.com/ and select an image from your device. If you are using a phone, you can take photos of different bikes, then click on Analyse to see whether they are correctly identified. Side-on images work best.

How does it work?

The first task was to collect some sample images for the three classes of bicycles I had chosen: road, TT and MTB. It turns out that there is a neat way to obtain the list of URLs from a Google image search, by running some JavaScript in the browser console. I downloaded 200 images for each type of bike and removed any that could not be opened. This relatively small data set allowed me to do all the machine learning using the CPU on my MacBook Pro in less than an hour.
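
As a rough guide, the download and clean-up step looks something like this with the fastai v1 helpers; the folder layout and the urls_*.csv file names are placeholders rather than my actual files:

```python
# Sketch of the download/clean-up step using fastai v1 helpers.
# The folder layout and urls_*.csv file names are illustrative placeholders.
from fastai.vision import *
from pathlib import Path

path = Path('data/bikes')
classes = ['road', 'tt', 'mtb']

for c in classes:
    dest = path/c
    dest.mkdir(parents=True, exist_ok=True)
    # one text file of image URLs per class, saved from the browser console
    download_images(path/f'urls_{c}.csv', dest, max_pics=200)
    # delete anything that cannot be opened as an image
    verify_images(dest, delete=True, max_size=500)
```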

The fast.ai library provides a range of convenient ways to access images for the purpose of training a neural network. In this instance, I used the default option of applying transfer learning to a pre-trained ResNet34 model, scaling the images to 224 pixel squares, with data augmentation. After doing some initial training, it was useful to look at the images that had been misclassified, as many of these were unsuitable images of motorbikes, cartoons or bike frames without wheels or TT bars. Taking advantage of a useful fast.ai widget, I removed unhelpful training images and trained the model further.
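
The training loop was broadly along these lines, written here with the fastai v1 API (the notebook in the repository is the definitive version):

```python
# Rough outline of the transfer-learning step in fastai v1; the exact calls
# in the notebook may differ slightly.
from fastai.vision import *

path = Path('data/bikes')   # as above

data = (ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                   ds_tfms=get_transforms(), size=224)
        .normalize(imagenet_stats))

learn = cnn_learner(data, models.resnet34, metrics=accuracy)  # pre-trained backbone
learn.fit_one_cycle(4)                                        # train the new head

# Inspect the worst mistakes before cleaning up the training images
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)
interp.plot_confusion_matrix()
```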

The confusion matrix showed that the final version of my model was running at about 90% accuracy on the validation set, which was hardly world-beating, but not too bad. The main problem was a tendency to mistake certain road bikes for TT bikes. This was understandable, given the tendency for road bikes to become more aero, though it was disappointing when drop handlebars were clearly visible.

The next step was to make my trained network available as a web application. First I exported the model's parameter settings to Dropbox. Then I forked a fast.ai repository into my GitHub account and edited the files to link to my Dropbox, switching the documentation appropriately for bicycle identification. In the final step, I set up a free account on Render to host a web service linked to my GitHub repository. This automatically updates for any changes pushed to the repository.
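
Stripped of the web framework plumbing, the service essentially does the following at start-up and for each uploaded image; the Dropbox link is a placeholder and load_learner assumes a fastai v1 export.pkl:

```python
# Bare-bones sketch of what the web service does; the Dropbox URL is a placeholder
# and the HTTP plumbing of the real template is omitted.
import io
import urllib.request
from pathlib import Path
from fastai.vision import load_learner, open_image

EXPORT_URL = 'https://www.dropbox.com/s/.../export.pkl?dl=1'   # placeholder link
path = Path('app')
path.mkdir(exist_ok=True)

urllib.request.urlretrieve(EXPORT_URL, path/'export.pkl')      # fetch the trained model
learn = load_learner(path)                                     # loads path/'export.pkl'

def analyse(image_bytes):
    """Return the predicted class for the raw bytes of an uploaded image."""
    img = open_image(io.BytesIO(image_bytes))
    pred_class, pred_idx, probs = learn.predict(img)
    return str(pred_class)
```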

Amazingly, it all works!

References

fast.ai lesson 2

My GitHub repository, including the Jupyter notebook

Cycling Through Artistic Styles


My earlier post on cycling art provided an engaging way to consider the creative potential of deep learning. I have found myself frequently gravitating back to the idea, using the latest code available over at fast.ai. The method uses a neural network to combine the content of a photograph with the style of an artist, but I have found that it takes a few trials to find the right balance between content and style. This led to the idea of generating a range of images and then running them together as a movie that gradually shifts from the base image to a raw interpretation of the artist’s style.

Artistic styles

Working with a range of artistic styles, from impressionist to abstract, I found that the weights producing the most interesting images varied according to both the photograph and the artistic style.

My selected best images are shown below, next to snippets of the corresponding artworks. It turned out that the impressionist artists (Monet, Van Gogh, Cézanne and Braque) maintained the content of the image, in spite of being more heavily weighted to artistic style. In contrast, the more monochromatic styles (O’Keeffe, Polygons, Abstract as well as Dali) needed to be more strongly weighted towards content, in order to preserve the cyclist in the image. The selections for Picasso and Pollock were evenly balanced.

Every image is unique, and some real surprises pop up. For example, using Picasso’s style, the mountains are interpreted as rooftops, complete with windows and doors. Strange eyes peer out of the background of finger-shapes in the Dali image, and the mountains have become Monet’s water lilies. The Pollock image came out very nicely.

Deep learning

The approach was based on the method described in the paper referenced below. Running the code on a cloud-based GPU, it took about 30 seconds for a neural network to learn to generate an image with the desired characteristics. The learning process was achieved by minimising a loss function, using gradient descent. The clever part lay in defining an appropriate loss function. In this instance, the generated image was passed through a separate pre-trained neural network (VGG16), and its activations, at various layers in the network, were compared to those produced by the photograph and the artwork. The loss function combined the difference in photographic content with the difference in artistic style, where the critical parameter was the content weighting factor.
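
A minimal PyTorch sketch of that loss, assuming a frozen VGG16 from torchvision and illustrative layer choices (not the exact notebook code), looks like this:

```python
# Sketch of the content + style loss described above; the VGG16 layer indices
# are illustrative choices, not necessarily those used in the original notebook.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

vgg = vgg16(pretrained=True).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = [3, 8, 15, 22]   # activations after early/middle conv blocks
CONTENT_LAYER = 15

def activations(x):
    """Collect the feature maps at every layer of the VGG16 feature extractor."""
    feats, out = {}, x
    for i, layer in enumerate(vgg):
        out = layer(out)
        feats[i] = out
    return feats

def gram(f):
    """Style is captured by the correlations between feature maps (batch of one)."""
    b, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

def loss_fn(generated, photo, artwork, content_weight):
    g, p, a = activations(generated), activations(photo), activations(artwork)
    content_loss = F.mse_loss(g[CONTENT_LAYER], p[CONTENT_LAYER])
    style_loss = sum(F.mse_loss(gram(g[i]), gram(a[i])) for i in STYLE_LAYERS)
    return content_weight * content_loss + style_loss
```

Gradient descent is then run on the pixels of the generated image itself, with the content weighting factor controlling the balance between the two terms.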

I decided to vary the content weighting factor logarithmically between around 0.1 and 100, to obtain the full range of content-to-style combinations. A movie was then produced simply by packing the images together, one after the other.
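
In outline, the sweep and the movie assembly amount to something like the following; generate_image stands in for the style-transfer step above and is hypothetical, as is the choice of imageio for writing the animation:

```python
# Hypothetical sketch of the weight sweep and movie assembly.
# generate_image() stands in for the style-transfer optimisation above and is
# assumed to return a uint8 RGB array; imageio is one convenient way to pack frames.
import numpy as np
import imageio

content_weights = np.logspace(-1, 2, num=24)   # ~0.1 to 100, spaced logarithmically

frames = [generate_image(photo, artwork, w) for w in content_weights]
imageio.mimsave('style_sweep.gif', frames, duration=0.2)   # 0.2 s per frame
```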

References

A Neural Algorithm of Artistic Style, Leon A. Gatys, Alexander S. Ecker and Matthias Bethge, arXiv:1508.06576, 2015


What are you looking at?


In a recent blog, I described an experiment to train a deep neural network to distinguish between photographs of Vincenzo Nibali and Alejandro Valverde, using a very small data set of images. In the conclusion, I suggested that the network was probably basing its decisions more on the colours of the riders’ kit rather than on facial recognition. This article investigates what the network was actually “looking at”, in order to understand better how it was making decisions.

The issues of accountability and bias were among the topics discussed at the last NIPS conference. As machine learning algorithms are adopted across industry, it is important for companies to be able to explain how conclusions are reached. In many instances, it is not acceptable simply to rely on an impenetrable black box. AI researchers and developers need to be able to explain what is going on inside their models, in order to justify decisions taken. In doing so, some worrying instances of bias have been revealed in the selection of data used to train the algorithms.

I went back to my rider recognition model and used an approach called “Class Activation Maps” to identify which parts of the images accounted for the network’s choice of rider. Making use of the code provided in lesson 7 of the course offered by fast.ai, I took advantage of my existing small set of training, validation and test images of the two famous cyclists. Starting with a pre-trained version of ResNet34, the idea was to replace the last two layers with four new ones, the crucial one being a convolutional layer with two outputs, matching the number of cyclists in the classification task. The two outputs of this layer were 7×7 matrix representations of the relevant image.

The final predictions of the model came from a softmax of a flattened average pooling of these 7×7 representations. The softmax output gave the probabilities of Nibali and Valverde respectively. Since there was no learning beyond the final convolution, the activations of the two 7×7 matrices represented the “Nibali-ness” and “Valverde-ness” of the image. This could be displayed as a heat map on top of the image.
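
In PyTorch terms, the modified network looks roughly like this (a sketch of the architecture described above, not the notebook code itself); for a 224-pixel input, the ResNet34 body produces 512 feature maps of size 7×7:

```python
# Sketch of the CAM architecture described above, not the exact notebook code.
import torch.nn as nn
from torchvision.models import resnet34

# Keep the convolutional body of ResNet34, dropping its pooling and fc layers
body = nn.Sequential(*list(resnet34(pretrained=True).children())[:-2])

head = nn.Sequential(
    nn.Conv2d(512, 2, kernel_size=3, padding=1),   # one 7x7 "class map" per rider
    nn.AdaptiveAvgPool2d(1),                       # average each map to a single score
    nn.Flatten(),
    nn.LogSoftmax(dim=1),                          # Nibali vs Valverde probabilities
)
model = nn.Sequential(body, head)

# For the heat map, run an image through the body and the new conv layer only,
# then upsample the relevant 7x7 activation map over the original photograph.
```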

Examples are shown below for the validation set of 10 images of Nibali followed by 10 of Valverde. The yellow patch of the heat map highlights the part of the image that led to the prediction displayed above each image. Nine out of ten were correct for Nibali and six for Valverde.

Class Activation Maps applied to the validation set

The heat maps were very helpful in understanding the model’s decision-making process. It seemed that for Nibali, his face and helmet were important, with some attention paid to the upper part of his blue Astana kit. In contrast, the network did a very good job at identifying the M on Valverde’s Movistar kit. It was interesting to note that the network succeeded in spotting that Nibali was wearing a Specialized helmet whereas Valverde had a Catlike design. Three of the errors arose in photos of Valverde’s face, which was mistaken for Nibali’s. In fact, any picture of a face led to a prediction of Nibali, as demonstrated by the cropped image below that was used for training.


Why should that be? Looking back at the training set, it turned out that, by chance, there were far more mugshots of Nibali, while there were more photos of Valverde riding his bike, with his face obscured by sunglasses. This was an example of unintentional bias in the training data, providing a very useful lesson.

The final set of pictures shows the predictions made on the out-of-sample test set. All the predictions are correct, except the first one, where the model failed to spot the green M on Valverde’s chest and mistook the blurred background for Nibali. Otherwise the results confirmed that the network looked at Nibali’s face, the rider’s helmet or Valverde’s kit. It also remembered seeing an image of Nibali holding the Giro trophy in the training set.

Class Activation Maps applied to the test set

In conclusion, Class Activation Maps provide a useful way of visualising the activations of hidden layers in a deep neural network. This can go some way towards accounting for the decisions that appear in the output. The approach can also help identify unintentional bias in the training set.

Which team is that?


My last blog explored the effectiveness of deep learning in spotting the difference between Vincenzo Nibali and Alejandro Valverde. Since the faces of the riders were obscured in many of the photos, it is likely that the neural network was basing its evaluations largely on the colours of their team kit. A natural next challenge is to identify a rider’s team from a photograph. This task parallels the approach to the kaggle dog breed competition used in lesson 2 of the fast.ai course on deep learning.

Eighteen World Tour teams are competing this year. So the first step was to trawl the Internet for images, ideally of riders in this year’s kit. As before, I used an automated downloader, but this posed a number of problems. For example, searching for “Astana” brings up photographs of the capital of Kazakhstan, so I narrowed things down by searching for “Astana 2018 cycling team”. After eliminating very small images, I ended up with a total of about 9,700 images, but these still included a certain amount of junk that I did not have time to weed out, such as photos of footballers or motorcycles in the “Sky Racing Team”.

The following small sample of training images is generally OK, though it includes images of Scott bikes rather than Mitchelton-Scott riders and a picture of Sunweb’s Wilco Kelderman labelled as FDJ. However, with around 500-700 images of each team, I pressed on, noting that, for some reason, there were only 166 of Movistar and these included the old-style kit.

Small sample of training images

For training on this multiple classification problem, I adopted a slightly more sophisticated approach than before. Taking a pre-trained Resnet50 model, I performed some initial fine-tuning, on images rescaled to 224×224. I settled on an optimal learning rate of 1e-3 for the final layer, while allowing some training of lower layers at much lower rates. With a view to improving generalisation, I opted to augment the training set with random changes, such as small shifts in four directions, zooming in up to 10%, adjusting lighting and left-right flips. After initial training, accuracy was 52.6% on the validation set. This was encouraging, given that random guesses would have achieved a rate of 1 in 18 or 5.6%.
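
In fastai v1 terms, this stage looks roughly as follows; the original notebook used an earlier version of the library, so the exact calls differed, and the folder path is a placeholder:

```python
# Rough fastai v1 outline of the initial fine-tuning stage; the original notebook
# used an earlier version of the library, so the exact calls differed.
from fastai.vision import *

path = Path('data/teams')   # placeholder folder of team-labelled images

tfms = get_transforms(do_flip=True, max_zoom=1.1, max_lighting=0.2)
data = (ImageDataBunch.from_folder(path, valid_pct=0.2, ds_tfms=tfms, size=224)
        .normalize(imagenet_stats))

learn = cnn_learner(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(5)                               # train the new head first
learn.unfreeze()
learn.fit_one_cycle(5, max_lr=slice(1e-5, 1e-3))     # lower layers at much lower rates
```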

Taking a pro tip from fast.ai, training proceeded with the images at a higher resolution of 299×299. The idea is to prevent overfitting during the early stages, but to improve the model later on by providing more data for each image. This raised the accuracy to 58.3% on the validation set. This figure was obtained using a trick called “test time augmentation”, where each final prediction is based on the average prediction of five different “augmented” versions of the image in question.
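
The resolution bump and test-time augmentation then amount to something like this, continuing from the previous sketch (again in fastai v1 terms; in practice the same validation split should be reused):

```python
# Sketch of progressive resizing and test-time augmentation in fastai v1.
data_299 = (ImageDataBunch.from_folder(path, valid_pct=0.2, ds_tfms=tfms, size=299)
            .normalize(imagenet_stats))
learn.data = data_299            # same weights, higher-resolution images
learn.freeze()
learn.fit_one_cycle(3)
learn.unfreeze()
learn.fit_one_cycle(3, max_lr=slice(1e-6, 1e-4))

preds, y = learn.TTA()           # average predictions over augmented copies of each image
print(accuracy(preds, y))
```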

Given the noisy nature of some of the images used for training, I was pleased with this result, but the acid test was to evaluate performance on unseen images. So I created a test set of two images of a lead rider from each squad and asked the model to identify the team. These are the results.

75% accuracy on the test set

The trained Resnet50 correctly identified the teams of 27 out of 36 images. Interestingly, there were no predictions of Movistar or Sky. This could be partly due to the underrepresentation of Movistar in the training set. Froome was mistaken for AG2R and Astana, in column 7, rows 2 and 3. In the first image, his 2018 Sky kit was quite similar to Bardet’s to the left, and in the second image the sky did appear to be Astana blue! It is not entirely obvious why Nibali was mistaken for Sunweb and Astana, in the top and bottom rows. However, the huge majority of predictions were correct. An overall success rate of 75% based on an afternoon’s work was pretty amazing.

The results could certainly be improved by cleaning up the training data, but this raises an intriguing question about the efficacy of artificial intelligence. Taking a step back, I used Bing’s algorithms to find images of cycling teams in order to train an algorithm to identify cycling teams. In effect, I was training my network to reverse-engineer Bing’s search algorithm, rather than my actual objective of identifying cycling teams. If an Internet search for FDJ pulls up an image of Wilco Kelderman, my network would be inclined to suggest that he rides for the French team.

In conclusion, for this particular approach to reach or exceed human performance, expert human input is required to provide a reliable training set. This is why this experiment achieved 75%, whereas the top submissions on the dog breeds leaderboard show near perfect performance.

Valverde or Nibali?

Alejandro Valverde has kicked off the 2018 season with an impressive series of wins. Meanwhile Vincenzo Nibali delighted the tifosi with his victory in Milan San Remo. It is pretty easy to tell these two riders apart in the pictures above, but could a computer distinguish between them?

Following up on my earlier blogs about neural networks, I have been taking a look at the updated version of fast.ai’s course on deep learning. With the field advancing at a rapid pace, this provides a good way of staying up to date with the state of the art. For example, there are now a couple of cheaper alternatives to AWS for accessing high-powered GPUs, offered by Paperspace and Crestle. The latest fast.ai libraries include many new tools that work extremely well in practice.

There’s a view that deep learning requires hours of training on high-powered supercomputers, using thousands (or millions) of labelled examples, in order to learn to perform computer vision tasks. However, newer architectures, such as ResNet, are able to run on much smaller data sets. In order to test this, I used an image downloader to grab photos of Nibali and Valverde and manually selected about 55 decent pictures of each one.

I divided the images into a training set with about 40 images of each rider, a validation set with 10 of each and a test set containing the rest. Nibali appears in a range of different coloured jerseys, though the Astana blue is often present. Valverde is mainly wearing the old dark blue Movistar kit with a green M. There were more close-up shots of Nibali’s face than of Valverde’s.


I was able to fine-tune a pre-trained ResNet neural network to this task, using some of the techniques from the fast.ai toolbox, each designed to improve generalisation. The first trick was to augment the training set by performing minor transformations of the images at random, such as taking a mirror image, shifting left or right and zooming in a bit. The second set of tricks varied the rate of learning as the algorithm iterated repeatedly through the training set. A final useful technique created a set of variants of each test image and took the average of the predictions. Everything ran at lightning speed on a Paperspace GPU. After a run time of just a few minutes, the ResNet was able to score 17 out of 20 on the following validation set.


The confusion matrix shows that the model correctly identified all the Nibali images, but it was wrong on three pictures of Valverde. The first incorrect image (below) shows Valverde in the red leader’s jersey of the Tour of Murcia, which is not dissimilar to Nibali’s new Bahrain Merida kit, though he was wearing red in two of his training images. In the second instance, the network was fooled by the change in colour of Movistar’s kit, which had become rather similar to Astana’s light blue. The figure of 0.41 above the close-up image indicates that the model assigned only a 41% probability that the image was Valverde. It probably fell below the critical 50% level, in spite of the blue/green colours, because there were far more close-up shots of Nibali than Valverde in the training set.

An overall score of 17 out of 20 on the validation set is impressive. However, the network had access to the validation set during training, so this result is “in sample”. A proper “out of sample” evaluation of the model’s ability made use of the following ten images, comprising the test set that was kept aside.


Amazingly, the model correctly identified 9 out of the 10 pictures it had not seen before. The only error was the Valverde selfie shown in the final image. In order to work better in practice, the training set would need to include more examples of the riders’ 2018 kit. A variant of the problem would be to identify the team rather than the rider. The same network can be trained for multiple classes rather than just two.

This experiment shows that it is pretty straightforward to run state of the art image recognition tools remotely on a GPU somewhere in the cloud and come up with pretty impressive results, even with a small data set.

The next blog describes how to identify a rider’s team.


Froome versus Dumoulin

Many commentators have been licking their lips at the prospect of head-to-head combat between Chris Froome and Tom Dumoulin at next year’s Tour de France. It is hard to make a comparison based on their results in 2017, because they managed to avoid racing each other over the entire season of UCI World Tour races, meeting only in the World Championship Individual Time Trial, where the Dutchman was victorious. But it is intriguing to ask how Dumoulin might have done in the Tour de France and the Vuelta or, indeed, how Froome might have fared in the Giro.

Inspiration for addressing these hypothetical questions comes from an unexpected source. In 2009 Netflix awarded a $1 million prize to a team that improved the company’s technique for making film recommendations to its users, based on the star ratings assigned by viewers. The successful algorithm exploited the fact that viewers may enjoy the films that are highly rated by other users who have generally agreed on the ratings of the films they have seen in common. Initial approaches sought to classify films into genres or those starring particular actors, in the hope of grouping together viewers into similar categories. However, it turned out to be very difficult to identify which features of a film are important. An alternative is simply to let the computer crunch the data and identify the key features for itself. A method called Collaborative Filtering became one of the most popular approaches employed in recommender systems.

Our cycling problem shares certain characteristics with the Netflix challenge: instead of users, films and ratings, we have riders, races and results. Riders enter a selection of races over the season, preferring those where they hope to do well. Similar riders, for example sprinters, tend to finish high in the results of races where other sprinters also do well. Collaborative filtering should be able to exploit the fact that climbers, sprinters or TTers tend to finish close to each other, across a range of races.

This year’s UCI World Tour concluded with the Tour of Guangxi, completing the data set of results for 2017. After excluding team time trials, 883 riders entered 174 races, resulting in 26,966 finishers. Most races have up to 200 participants, so if you imagine a huge table with all the racers down the rows and all the races across the columns, the resulting matrix is “sparse” in the sense that there are lots of missing values for the riders who were not in a particular race. Collaborative Filtering aims to fill in the spaces, i.e. to estimate the position of a rider who did not enter a specific race. This is exactly what we would like to do for the Grand Tours.

It took a couple of minutes to fit a matrix factorisation Collaborative Filtering model, using keras, on my MacBook Pro. Some experimenting suggested that I needed about 50 hidden factors plus a bias to come up with a reasonable fit for this data set (a sketch of the model appears after the table below). Taking Milan San Remo at random, the model did a fairly good job of predicting the top ten riders for this long, hilly one-day race with a flat finish.

Model fit (prediction)   Rider                  Actual result
1                        Peter_Sagan            2
2                        Alexander_Kristoff     4
3                        Michael_Matthews       12
4                        Edvald_Boasson_Hagen   19
5                        Sonny_Colbrelli        13
6                        Michal_Kwiatkowski     1
7                        John_Degenkolb         7
8                        Nacer_Bouhanni         8
9                        Julian_Alaphilippe     3
10                       Diego_Ulissi           40
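
For reference, the matrix factorisation model mentioned above can be sketched in keras roughly as follows; the integer-encoded rider and race ids and the scaled finishing positions are assumptions about the data preparation rather than the actual notebook code:

```python
# Minimal keras sketch of the matrix-factorisation model: each rider and each race
# gets a 50-dimensional embedding plus a bias, and their dot product predicts the
# (scaled) finishing position. The input arrays are assumed to be integer-encoded.
from tensorflow import keras
from tensorflow.keras import layers

n_riders, n_races, n_factors = 883, 174, 50

rider_in = keras.Input(shape=(1,))
race_in = keras.Input(shape=(1,))

def embed(inp, n_items, dim):
    return layers.Flatten()(layers.Embedding(n_items, dim)(inp))

rider_vec, race_vec = embed(rider_in, n_riders, n_factors), embed(race_in, n_races, n_factors)
rider_bias, race_bias = embed(rider_in, n_riders, 1), embed(race_in, n_races, 1)

dot = layers.Dot(axes=1)([rider_vec, race_vec])
output = layers.Add()([dot, rider_bias, race_bias])

model = keras.Model([rider_in, race_in], output)
model.compile(optimizer='adam', loss='mse')
# model.fit([rider_ids, race_ids], scaled_positions, epochs=10, validation_split=0.1)
```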

The following figure visualises the primary factors the model derived for classifying the best riders. Sprinters are in the lower part of the chart, with climbers towards the top and all-rounders in the middle. Those with a lot of wins are towards the left.


Now we come to the interesting part: how would Tom Dumoulin and Chris Froome have compared in the other’s Grand Tours? Note that this model takes account of the results of all the riders in all the races, so it should be capable of detecting the benefit of being part of a strong team.

Tour de France

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 1 (TT), 2, 5, 6, 10 and 21, but the yellow jersey winner would have been stronger in the mountains and won overall.

Giro d’Italia

The model suggested that Chris Froome would have been ahead in the majority of stages, leaving stages 4, 5, 6, 9, 10 (TT), 14 and 21 (TT) to Dumoulin. The Brit would have most likely claimed the pink jersey.

Vuelta a España

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 2, 4, 12, 18, 19 and 21. In spite of a surge by the Dutchman towards the end of the race, the red jersey would have remained with Froome.

Conclusions

Based on a Collaborative Filtering approach, the results of 2017 suggest that Chris Froome would have beaten Tom Dumoulin in any of the Grand Tours.