Pro Cycling – Science4Performance

Cyclo-Social Networks

When Mexico’s Isaac del Toro and the Brit, Finlay Pickering, joined World Tour teams in 2024, they became part of an elite group. The structure of the professional cycling community differs from other types of social network, because the sport is organised around close-knit teams. You may have heard of the idea that everyone in the world is connected by six degrees of separation. Many social networks include key individuals who act as hubs linking disparate groups. How closely connected are professional cyclists? Which cyclists are the most connected?

Forming a link

An obvious way for cyclists to become acquainted is by being part of the same team. They spend long periods travelling, training, eating and relaxing together, often sharing rooms. Working as a team through the trials and tribulations of elite competition develops high degrees of camaraderie.

Each rider’s page on ProCyclingStats includes a team history. So I checked the past affiliations of all the current UCI world team riders. The idea was to build a graph where each node represented a rider, with edges connecting riders who had been in the same team. Each edge was weighted by the number of years a pair of riders had been in the same team, reflecting the strength of their relationship.
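A minimal sketch of that construction is shown here. It uses plain Python dictionaries rather than NetworkX, so that it runs without any dependencies, and the team histories are made up for illustration (the rider and team names are hypothetical, not taken from ProCyclingStats).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical team histories: rider -> {year: team}. Illustrative data only.
histories = {
    "Rider A": {2021: "Team X", 2022: "Team X", 2023: "Team Y"},
    "Rider B": {2021: "Team X", 2022: "Team X", 2023: "Team X"},
    "Rider C": {2022: "Team Y", 2023: "Team Y"},
    "Rider D": {2023: "Team X"},
}


def build_edges(histories):
    """Return {(rider1, rider2): years_together} for every pair that shared a team."""
    rosters = defaultdict(set)  # (team, year) -> riders on that roster
    for rider, seasons in histories.items():
        for year, team in seasons.items():
            rosters[(team, year)].add(rider)

    edges = defaultdict(int)
    for riders in rosters.values():
        for pair in combinations(sorted(riders), 2):
            edges[pair] += 1  # one more season spent in the same team
    return dict(edges)


edges = build_edges(histories)
for (a, b), years in sorted(edges.items()):
    print(f"{a} - {b}: {years} year(s)")
```

With NetworkX, each pair could then be added to a graph with G.add_edge(a, b, weight=years).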

Cyclo-social network

The resulting graphical network, displayed at the top of this blog, has some interesting properties that reveal the dynamics of professional cycling. The 18 world tour teams are displayed in different colours. The size of each rider node is scaled by length of career. An experienced rider, like Geraint Thomas, has a larger node and more connections, which are shown as light grey lines, where the thickness represents the number of years the pair of riders were in the same team. Newer riders, with fewer connections, tend to be on the periphery. For example, Isaac del Toro is the orange UAE Team Emirates rider at the top.

There are no fixed rules about how to represent the network, but the idea is that more closely related riders ought to congregate together. The network shows INEOS near UAE as shades of orange in the top right. Next we have Team Visma Lease a Bike in cyan, around two o’clock, close to Lidl – Trek in light green, Bahrain – Victorious in darker blue and Cofidis in lighter blue. Yellow Groupama – FDJ lies above the dark brown Jayco AlUla riders in the lower right. Red Movistar, light green Team dsm-firmenich PostNL and light blue BORA – Hansgrohe are near 6 o’clock, though Primož Roglič lies much closer to his old team. Decathlon AG2R La Mondiale is light blue in the lower left, near the darker blue of Alpecin – Deceuninck and pale green EF Education – EasyPost. Astana is orange, around nine o’clock. Dark blue Soudal Quick-Step, lighter blue Intermarché – Wanty and red Arkéa – B&B Hotels all hover around the top left. Teams that are more dispersed would be indicative of higher annual turnover.

3 degrees of separation

No rider is more than three steps from any other. In fact the average distance between riders is two steps, because the chances are that two riders have ridden with someone in common over their careers. Even the neo-pros, who are obviously more distantly connected, are linked within three steps: Isaac del Toro rides with Adam Yates, who spent six years alongside Jack Haig at Mitchelton-Scott, and Haig now rides for Bahrain Victorious as a teammate of Finlay Pickering.
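The number of steps between riders can be found with a breadth-first search. The sketch below is a dependency-free illustration on a tiny invented network (the rider labels are placeholders), not the code used for the real analysis. It assumes the network is connected, so that every rider can be reached from every other.

```python
from collections import deque

# Hypothetical teammate links (undirected); the names are placeholders.
links = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}


def steps_from(start, links):
    """Breadth-first search: teammate steps from `start` to every reachable rider."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        rider = queue.popleft()
        for mate in links[rider]:
            if mate not in dist:
                dist[mate] = dist[rider] + 1
                queue.append(mate)
    return dist


all_dist = {rider: steps_from(rider, links) for rider in links}

# Largest separation between any two riders (the diameter of the network).
diameter = max(max(d.values()) for d in all_dist.values())

# Average separation over all ordered pairs of distinct riders.
pairs = [(a, b) for a in links for b in links if a != b]
average = sum(all_dist[a][b] for a, b in pairs) / len(pairs)

print(diameter, round(average, 2))
```

NetworkX provides the same quantities through functions such as diameter and average_shortest_path_length.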

Bob Jungels is the most connected rider, having been a teammate of 100 riders in the current peloton. Since 2012, he has ridden for five different teams. He and Florian Senechal are only two steps from all riders. The Polish rider Łukasz Wiśniowski has been with six teams and has 94 links. Then we have Mark Cavendish with 91 and Rui Costa with 89.

In contrast, taking account of spending multiple years as teammates, Geraint Thomas tops the list with 218 teammate years. Interestingly, he is followed by Jonathan Castroviejo, Salvatore Puccio, Michal Kwiatkowski, Luke Rowe, Ben Swift, all long-time colleagues at INEOS Grenadiers. This suggests they are very happy (or well paid) staying where they are.

If we estimate “long-term team loyalty” by number of teammate years divided by number of teammates, Geraint is top, followed by Salvatore Puccio, Michael Hepburn, Luke Durbridge, Luke Rowe, Simon Yates and Jasper Stuyven.
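The three measures used here (number of teammates, teammate years and their ratio) can be sketched in a few lines. The weighted edges below are invented for illustration and the rider labels are placeholders.

```python
from collections import defaultdict

# Hypothetical weighted edges: (rider, rider) -> years spent as teammates.
edges = {
    ("A", "B"): 5,
    ("A", "C"): 1,
    ("B", "C"): 4,
    ("C", "D"): 1,
}

teammates = defaultdict(int)       # number of distinct teammates per rider
teammate_years = defaultdict(int)  # sum of edge weights per rider
for (a, b), years in edges.items():
    for rider in (a, b):
        teammates[rider] += 1
        teammate_years[rider] += years

# "Loyalty": average number of years spent alongside each teammate.
loyalty = {r: teammate_years[r] / teammates[r] for r in teammates}

for r in sorted(loyalty, key=loyalty.get, reverse=True):
    print(r, teammates[r], teammate_years[r], round(loyalty[r], 2))
```

In this toy example C has the most teammates, B has the most teammate years and also the highest loyalty score, which shows how the three rankings can differ.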

Best buddies

The riders who have been teammates for the longest are Luke Durbridge/Michael Hepburn and Robert Gesink/Steven Kruijswijk, both for 15 years. Geraint Thomas has ridden with Ben Swift and Salvatore Puccio for 14 years and Luke Rowe with Salvatore Puccio for 13.

Outliers

In a broader analysis that includes all the Pro Continental teams alongside the World Tour, the graph below shows a notable team of outliers. This is Team Novo Nordisk, for athletes who compete with type 1 diabetes. Their Hungarian rider, Peter Kusztor, was the teammate of several current riders, such as Jan Tratnik, prior to joining Novo Nordisk, in a career that stretches back to 2006. The team is an inspiration to everyone affected by diabetes.

Analysis

This analysis was performed in Python, using the NetworkX library.

Code can be found here.

Dreaming of the Giro

fast.ai’s latest version of Practical Deep Learning for Coders Part 2 kicks off with a review of Stable Diffusion. This is a deep neural network architecture developed by Stability AI that is able to convert text into images. With a bit of tweaking it can do all sorts of other things. Inspired by the amazing videos created by Softology, I set out to generate a dreamlike video based on the idea of riding my bicycle around a stage of the Giro d’Italia.

Text to image

As mentioned in a previous post, Hugging Face is a fantastic resource for open source models. I worked with one of fast.ai’s notebooks using a free GPU on Google Colab. In the first step I set up a text-to-image pipeline using a pre-trained version of stable-diffusion-v1-4. The prompt “a treelined avenue, poplars, summer day, france” generated the following images, where the model was more strongly guided by the prompt in each row. I liked the first image in the second row, so I decided to make this the first frame in an initial test video.

Stable diffusion is trained in a multimodal fashion, by aligning text embeddings with the encoded versions of the corresponding images. Starting with random noise, the pixels are progressively modified in order to move the encoding of the noisy image closer to something that matches the embedding of the text prompt.

Zooming in

The next step was to simulate the idea of moving forward along the road. I did this by writing a simple two-line function, using fast.ai tools, that cropped a small border off the edge of the image and then scaled it back up to the original size. In order to generate my movie, rather than starting with random noise, I wanted to use my zoomed-in image as the starting point for generating the next image. For this I needed to load up an image-to-image pipeline.
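The crop-and-rescale idea can be illustrated without any imaging library. The function below is a dependency-free stand-in, not the fast.ai version described above: it treats an image as a list of rows of pixel values and uses nearest-neighbour resampling, which is an assumption made to keep the sketch short.

```python
def zoom_in(image, border):
    """Crop `border` pixels from every edge, then scale back to the original size.

    `image` is a list of rows of pixel values; resampling is nearest-neighbour.
    """
    height, width = len(image), len(image[0])
    cropped = [row[border:width - border] for row in image[border:height - border]]
    ch, cw = len(cropped), len(cropped[0])
    return [
        [cropped[y * ch // height][x * cw // width] for x in range(width)]
        for y in range(height)
    ]


# A 6x6 test image whose pixel value encodes its (row, column) position.
image = [[10 * r + c for c in range(6)] for r in range(6)]
zoomed = zoom_in(image, border=1)

print(len(zoomed), len(zoomed[0]))  # the output has the same size as the input
for row in zoomed:
    print(row)
```

On a real photograph an imaging library with smoother interpolation would give a better-looking result than nearest-neighbour.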

I spent about an hour experimenting with four parameters. Zooming in by trimming only a couple of pixels around the edge created smoother transitions. Reducing the strength of additional noise enhanced the sense of continuity by ensuring that subsequent images did not change too dramatically. A guidance scale of 7 forced the model to keep following the prompt and not simply zoom into the middle of the image. The number of inference steps provided a trade-off between image quality and run time.

When I was happy, I generated a sequence of 256 images, which took about 20 minutes, and saved them as a GIF. This produced a pleasing, constantly changing effect with an impressionist style.

Back to where you started

In order to make the GIF loop smoothly, it was desirable to find a way to return to the starting image as part of the continuous zooming in process. At first it seemed that this might be possible by reversing the existing sequence of images and then generating a new sequence of images using each image in the reversed list as the next starting point. However, this did not work, because it gave the impression of moving backwards, rather than progressing forward along the road.

After thinking about the way stable diffusion works, it became apparent that I could return to the initial image by mixing it with the current image before taking the next step. By progressively increasing the mixing weight of the initial image, the generated images became closer to the target over a desired number of steps as shown below.
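The mixing step amounts to a weighted average of two images. The sketch below assumes a linear schedule for the mixing weight, which is an illustrative choice; the function names and the three-pixel "images" are hypothetical.

```python
def mix_weights(n_steps):
    """Weights for the initial image, rising linearly to 1 over n_steps."""
    return [(i + 1) / n_steps for i in range(n_steps)]


def blend(current, initial, weight):
    """Pixel-wise weighted average of two equally sized flat lists of pixels."""
    return [(1 - weight) * c + weight * t for c, t in zip(current, initial)]


initial = [0.0, 0.5, 1.0]  # stand-in for the first frame
current = [1.0, 1.0, 0.0]  # stand-in for the latest generated frame

for w in mix_weights(4):
    # In the full loop the blended image would be zoomed and passed through
    # the image-to-image pipeline before the next blend; here it is only
    # blended, to show the convergence towards the initial frame.
    current = blend(current, initial, w)
    print(round(w, 2), [round(p, 3) for p in current])
```

Once the weight reaches 1, the blended image is identical to the initial frame, which is what allows the loop to close.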

Putting it all together produced the following video, which successfully loops back to its starting point. It is not a perfect animation, because it zooms into the centre, whereas the vanishing point is below the centre of the image. This means we end up looking up at the trees at some points. But overall it had the effect I was after.

A stage of the Giro

Once all this was working, it was relatively straightforward to create a video that tells a story. I made a list of prompts describing the changing countryside of an imaginary stage of the Giro d’Italia, specifying the number of frames for each sequence. I chose the following.

[‘a wide street in a rural town in Tuscany, springtime’, 25],

[‘a road in the countryside, in Tuscany, springtime’,25],

[“a road by the sea, trees on the right, sunny day, Italy”,50],

[‘a road going up a mountain, Dolomites, sunny day’,50],

[‘a road descending a mountain, Dolomites, Italy’,25],

[‘a road in the countryside, cypress trees, Tuscany’,50],

[‘a narrow road through a medieval town in Tuscany, sunny day’,50]
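One simple way to use such a list is to expand it into one prompt per frame. This is a sketch of the idea rather than the code from the notebook, and the function name frame_prompts is hypothetical.

```python
# Prompt schedule copied from the list above: [prompt, number of frames].
schedule = [
    ["a wide street in a rural town in Tuscany, springtime", 25],
    ["a road in the countryside, in Tuscany, springtime", 25],
    ["a road by the sea, trees on the right, sunny day, Italy", 50],
    ["a road going up a mountain, Dolomites, sunny day", 50],
    ["a road descending a mountain, Dolomites, Italy", 25],
    ["a road in the countryside, cypress trees, Tuscany", 50],
    ["a narrow road through a medieval town in Tuscany, sunny day", 50],
]


def frame_prompts(schedule):
    """Repeat each prompt for its number of frames, giving one prompt per frame."""
    return [prompt for prompt, n_frames in schedule for _ in range(n_frames)]


prompts = frame_prompts(schedule)
print(len(prompts))  # total number of frames in the video
print(prompts[0])
print(prompts[-1])
```

Each frame would then be generated by zooming into the previous image and running the image-to-image step with that frame's prompt.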

These prompts produced the video shown at the top of this post. The springtime blossom in the starting town was very effective and the endless climb up into the sunlit Dolomites looked great. For some reason the seaside prompt did not work, so the sequence became temporarily stuck with red blobs. Running it again would make something different. Changing the prompts offered endless possibilities.

The code to run this appears on my GitHub page. If you have a Google account, you can open it directly in Colab and set the RunTime to GPU. You also need a free Hugging Face account to load the stable diffusion pipelines.

Eddy goes to Hollywood

Should Eddy Merckx win an Oscar? Could the boyish looks of Tadej Pogačar or Remco Evenepoel make it in the movies? Would Mathieu van der Poel’s chiselled chin or Wout van Aert’s strong features help them lead the cast in the next blockbuster? I built a FilmStars app to find out.

https://sci4-filmstars.hf.space

Building a deep learning model

Taking advantage of the fantastic deep learning library provided by fast.ai, I downloaded and cleaned up 100 photos of IMDb’s top 100 male and female stars. Then I used a free GPU on Kaggle to fine-tune a pre-trained ResNet50 neural net architecture to identify movie stars from their photos. It took about 2 hours to obtain an accuracy of about 60%. There is no doubt that this model could be greatly improved, but I stopped at that point in the interest of time. After all, 60% is a lot better than the 0.5% obtained from random guessing. Then I used HuggingFace to host my app. The project was completed in two days with zero outlay for resources.

It is quite hard to identify movie stars, because adopting a different persona is part of the job. This means that actors can look very different from one photo to the next. They also get older: sadly, the sex bombs of the ’60s inevitably become ageing actresses in their sixties. So the neural network really had its work cut out to distinguish between 200 different actors, using a relatively small sample of data and only a short amount of training.

Breaking away

Creating the perfect film star identifier was never really the point of the app. The idea was to allow people to upload images to see which film stars were suggested. If you have a friend who looks like Ralph Fiennes, you can upload a photo and see whether the neural net agrees.

I tried it out with professional cyclists. These were the top choices.

Eddy Merckx	James Dean
Tadej Pogačar	Matt Damon
Remco Evenepoel	Mel Gibson
Mathieu van der Poel	Leonardo DiCaprio
Wout van Aert	Brad Pitt
Marianne Vos	Jodie Foster
Ashleigh Moolman	Marion Cotillard
Katarzyna Niewiadoma	Faye Dunaway
Anna van der Breggen	Brigitte Bardot

Cycling Stars

In each case I found an image of the top choice of film star for comparison.

The model was more confident with the male cyclists, though it really depends on the photo and even the degree of cropping applied to the image. The nice thing about the app is that people like to be compared to attractive film stars, though there are a few shockers in the underlying database. The model does not deal very well with beards and men with long hair. It is best to use a “movie star” type of image, rather than someone wearing cycling kit.

Closing the gap

One of the big differences between amateur and professional cycle racing is the role of the breakaway. Amateur racing usually consists of a succession of attacks until a group of strong riders breaks away and disappears into the distance to contest the win. There is rarely enough firepower left in the peloton to close down the gap.

This contrasts with professional racing, where a group of weaker riders typically contests for the breakaway, hoping that the pursuing teams will miscalculate their efforts to bring their leaders to the head of the race in the final kilometres. Occasionally a solo rider launches a last minute attack, forcing other riders to chase.

One minute for every ten kilometres

Much of the excitement for cycling fans is generated by the tension between the breakaway and the peloton, especially when the result of the race hangs in the balance until the final metres. Commentators often say that the break needs a lead of at least one minute for every ten kilometres before the finish line. Where does this rule of thumb come from?

It’s time for some back of the envelope calculations. On flat terrain, the breakaway may ride the final 10km of a professional race at about 50kph. A lead of one minute equates to a gap of 833m, which the peloton must close within the 12 minutes that it will take the breakaway riders to reach the finish line. This means the peloton must ride at 54.2kph, which is just over 8.3% faster than the riders ahead.

On a flat road power would be almost exclusively devoted to overcoming aerodynamic drag. The effort rises with the cube of velocity, so the power output of the chasing riders needs to be 27% higher. If the breakaway riders are pushing out 400W, the riders leading the chasing group need to be doing over 500W.
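The back of the envelope calculation can be written as a short function. It follows the same simplifying assumptions as the text: both groups ride at constant speed and power is proportional to the cube of speed. The function name chase and its arguments are illustrative.

```python
def chase(break_kph, lead_seconds, distance_km=10.0):
    """Speed and relative power the peloton needs to catch the break on the line.

    Assumes constant speeds and power proportional to the cube of speed
    (aerodynamic drag only).
    """
    gap_km = break_kph * lead_seconds / 3600       # the lead expressed as a distance
    time_h = distance_km / break_kph               # time the break needs to finish
    peloton_kph = (distance_km + gap_km) / time_h  # speed needed to close the gap
    power_ratio = (peloton_kph / break_kph) ** 3
    return gap_km, peloton_kph, power_ratio


for label, speed in [("flat", 50), ("climb", 25), ("descent", 60)]:
    gap, kph, ratio = chase(speed, lead_seconds=60)
    print(f"{label}: gap {gap * 1000:.0f} m, peloton {kph:.1f} kph, power x{ratio:.2f}")
```

The same function covers slower and faster terrain simply by changing the speed of the break, which is how the climbing and descending cases can be compared with the flat one.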

The peloton has several advantages. Riding in the bunch saves a lot of energy, especially relative to the efforts of a small number of riders who have been in a breakaway all day. This means that many riders have energy reserves available to lift the pace at the end of the race. Teams are drilled to deploy these reserves efficiently by drafting behind the riders who are emptying themselves at the front of the chasing pack. Having the breakaway in sight provides a psychological boost as the gap narrows. The one minute rule suggests these benefits equate to a power advantage of around 25%.

Not for an uphill finish

If the race finishes on a long climb, a one minute lead is very unlikely to be enough for the break to stay away. Ascending at 25kph equates to a gap of only 417m and now the peloton has 24 minutes to make up the difference. This can be achieved by riding at 26kph. This is just 4% faster, requiring 13% higher power to overcome the additional aerodynamic drag. This would be about 450W, if the break is holding 400W.

Although the chasing peloton still has fresher riders, who may be able to see the break up the road, they do not have the same drafting advantages when climbing at 26kph. The other big factor is gravity. The specialist climbers are able to put in strong accelerations on steep sections, quickly gaining on those ahead. They can climb faster than heavier riders at equivalent power.

If we take the same figure for the power advantage over the break as before, of around 25%, the break would need to have a lead of 1 minute 55 seconds as it passes the 10km banner. However, experience suggests that unless there is a very strong climber in the break, a much bigger time gap would be required for the break to stay away.
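The calculation can also be run in reverse: given a power advantage, how much lead does the break need with 10km to go? The sketch below keeps the cube-law assumption and ignores gravity and drafting; the function name required_lead is illustrative.

```python
def required_lead(break_kph, power_advantage, distance_km=10.0):
    """Lead in seconds that the peloton just erases over distance_km.

    `power_advantage` is fractional (0.25 means 25% more power than the break).
    Assumes power is proportional to the cube of speed.
    """
    speed_ratio = (1 + power_advantage) ** (1 / 3)
    gap_km = distance_km * (speed_ratio - 1)  # extra distance the peloton covers
    return gap_km / break_kph * 3600          # that gap as time at the break's speed


for label, speed in [("flat at 50kph", 50), ("climb at 25kph", 25)]:
    print(f"{label}: {required_lead(speed, 0.25):.0f} s")
```

With unrounded arithmetic this gives a little under one minute on the flat and a little under two minutes on the climb, close to the figures quoted above; the few seconds of difference are probably due to rounding the speed ratio.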

Chasing downhill

This analysis also explains why it is very difficult to narrow a gap on a fast descent. Consider a 10km sweeping road coming down from an alpine pass to the valley. Riding at 60kph, a one minute gap equates to 1km. The peloton would have to average 66kph over the whole 10km in order to make the catch in the ten-minute descent. In spite of the assistance of gravity, the 10% higher speed converts into a 33% increase in the effect of drag, where riders begin to approach terminal velocity.

Amateur breaks

Amateurs do not have the luxury of a directeur sportif running a spreadsheet in a following team car. In fact you are lucky if anyone gives you an idea of a time gap. The best strategy is firstly to follow the attacks of the strongest riders in order to get into a successful break and then encourage your fellow breakaway riders, verbally and by example, to ride through and off, in order to establish a gap. As you get closer to the finish, you should assess the other riders in order to work out how you are going to beat them over the line.

Tadej Pogacar

Click on the image to hear some music I composed. It was featured on The Cycling Podcast for Stage 14 of the Tour de France 2021. The tune is free to download.

These are the lyrics.

The boy prince

Who’s the guy in yellow up the road?
He’s a kind of mellow looking dude
Who’s the guy in yellow up the road?
Motivated fellow looking good.

Alaphilippe
Mathieu van der Poel
You were digging deep
But guess who’s on a roll
Tadej Pogacar
We know who you are
Tadej Pogacar
You’re a superstar!

Faster in the TT than a train
Riding up the Giant of Provence
Winner of a grand tour once again?
Boy prince of the tour: princeling of the Tour de France.

Alaphilippe
Mathieu van der Poel
You were digging deep
But guess who’s on a roll
Tadej Pogacar
We know who you are
Tadej Pogacar
You’re a superstar!

Who’s the guy in yellow riding up the road?
He’s a kind of mellow gentle-looking dude
Who’s the guy in yellow rolling up the road?
Motivated fellow: he’s sure looking good
Faster in the TT faster than a train
Riding up Mont Ventoux, Giant of Provence
Winner of a grand tour, in Paris once again?
Boy prince of the tour: princeling of the Tour de France.

Milan Sanremo in a Random Forest

Predicted top ten for Milan San Remo 2021

Last time I tried to predict a race, I trained up a neural network on past race results, ahead of the World Championships in Harrogate. The model backed Sam Bennett, but it did not take account of the weather conditions, which turned out to be terrible. Fortunately the forecast looks good for tomorrow’s Milan Sanremo.

This time I have tried using a Random Forest, based on the results of the UCI races that took place in 2020 and so far in 2021. The model took account of each rider’s past results, team, height and weight, together with key statistics about each race, including date, distance, average speed and type of parcours.

One of the nice things about this type of model is that it is possible to see how the factors contribute to the overall predictions. The following waterfall chart explains why the model uncontroversially has Wout van Aert as the favourite.

Breakdown of prediction for Wout van Aert

The largest positive contribution comes from being Wout van Aert. This is because he has a lot of good results. His height and weight favour Milan Sanremo. He also has a strong positive coming from his team. The distance and race type make further positive contributions.

We can contrast this with the model’s prediction for Mathieu van der Poel, who is ranked 9th.

Breakdown of prediction for Mathieu van der Poel

We see a positive personal contribution from being van der Poel, but having raced fewer UCI events, he has less of a strong set of results than van Aert. According to the model the Alpecin Fenix team contribution is not as strong as Jumbo Visma, but the long distance of the race works in favour of the Dutchman. The day of year gives a small negative contribution, suggesting that his road results have been stronger later in the year, but this could be due to last year’s unusual timing of races.

Each of the other riders in the model’s top 10 is in with a shout.

It’s taken me all afternoon to set up this model, so this is just a short post.

Post race comment

Where was Jasper Stuyven?

Like Mads Pedersen in Harrogate back in 2019, Jasper Stuyven was this year’s surprise winner in Sanremo. So what had the model expected for him? Scrolling down the list of predictions, Stuyven was ranked 39th.

Breakdown of prediction for Jasper Stuyven

His individual rider prediction was negative, perhaps because he has not had many good results so far this year, though he did win Omloop Het Nieuwsblad last year and had several top 10 finishes. The model assessed that his greatest advantage came from the length of the race, suggesting that he tends to do well over greater distances.

The nice thing about this approach is that it identifies factors that are relevant to particular riders, in a quantitative fashion. This helps to overcome personal biases and the human tendency to overweight and project forward what has happened most recently.

Hexagons in the Arctic Circle

An attractive aspect of hexagonal patterns is that they can repeat in interesting ways across a cycling jersey. This is partly due to the fact that a hexagon can be divided up into three equal lozenge shapes, as seen near the neck of the top right jersey. These shapes can be combined in imaginative ways, as displayed in the lower two examples.

This three-way division of a hexagon can create a 3D optical illusion called a “Necker cube”, which can appear to flip from convex to concave and back again. The orange patch can appear to be the top of a cube viewed from above or the ceiling in a corner, viewed from below. See if this happens if you stare at the image below.

Looking down on a cube or up into the corner of a room?

Spoiler alert: from here things get a bit mathematical

Tessellations

A tessellation, or tiling, is a way of covering a plane with polygons of various types. Tessellations have many interesting mathematical properties relating to their symmetries. It turns out that there are exactly 17 types of periodic patterns. Roger Penrose, who was awarded the 2020 Nobel Prize in Physics for his work on the formation of black holes, discovered many interesting aperiodic tilings, such as the Penrose tiling.

While some people were munching on mince pies before Christmas, I watched a thought-provoking video on a related topic, released by the Mathologer, Burkard Polster. He begins by discussing ways of tiling various shapes with dominoes and goes on to describe something called the Arctic Circle Theorem. Around the middle of the video, he shifts to tiling hexagon shapes with lozenges, resulting in images with the weird 3D flipping effect described above. This prompted me to spend rather a lot of time writing Python code to explore this topic.

After much experimentation, I created some code that would generate random tilings by stochastically flipping hexagons. Colouring the lozenges according to their orientation resulted in some really interesting 3D effects.

Algorithm flips a random hexagon to create a new tiling.
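One common way to implement this kind of random flipping uses the fact that a lozenge tiling of a hexagon of side n corresponds to a stack of cubes in the corner of an n x n x n box, where flipping a hexagon adds or removes a single cube. The sketch below is an illustration of that idea, not the code from the notebook: it stores only the tower heights and does no plotting, and the function names are hypothetical.

```python
import random


def random_tiling(n, iterations, seed=0):
    """Random stack of cubes in an n x n x n corner, equivalent to a lozenge tiling.

    heights[i][j] is the tower height at (i, j); heights never increase as i or j
    grows. Each accepted move adds or removes one cube, i.e. flips one hexagon.
    """
    rng = random.Random(seed)
    heights = [[0] * n for _ in range(n)]  # start from the empty box
    for _ in range(iterations):
        i, j = rng.randrange(n), rng.randrange(n)
        new = heights[i][j] + rng.choice((-1, 1))
        # Towers behind must stay at least as tall; towers in front no taller.
        behind = min(heights[i - 1][j] if i > 0 else n,
                     heights[i][j - 1] if j > 0 else n)
        in_front = max(heights[i + 1][j] if i < n - 1 else 0,
                       heights[i][j + 1] if j < n - 1 else 0)
        if in_front <= new <= behind:
            heights[i][j] = new  # the flip is allowed, so accept it
    return heights


def is_valid(heights):
    """Check that every height is in range and no tower exceeds those behind it."""
    n = len(heights)
    return all(
        0 <= heights[i][j] <= n
        and (i == 0 or heights[i - 1][j] >= heights[i][j])
        and (j == 0 or heights[i][j - 1] >= heights[i][j])
        for i in range(n) for j in range(n)
    )


tiling = random_tiling(6, 20000)
print(is_valid(tiling))
for row in tiling:
    print(row)
```

Moves that would break the ordering of the towers are simply rejected, so every intermediate state is a valid tiling.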

Neckered

The video shows random tilings of a hexagonal area. These end up looking like a collection of 3D towers with orange tops. But if you focus on a particular cube and tilt your screen backwards, the whole image can flip, Necker-style into an inverted version where the floor becomes the ceiling and the orange segments push downwards.

I used my code to create random tilings of much bigger hexagons. It turned out that plotting the image on every iteration was taking a ridiculous amount of time. Suspending plotting until the end resulted in the code running 10,000 times faster! This allowed me to run 50 million iterations for a hexagon with 32 lozenges on each side, resulting in the fabled Arctic Circle promised by the eponymous theorem. The central area is chaotic, but the colours freeze into opposite solid patches of orange, blue and grey outside the circumference of a large inscribed circle.

Arctic Circle emerged on a hexagon of side 32 after 50 million iterations

Why does the Arctic Circle emerge?

There are two intuitive ways to understand why this happens. Firstly, if you consider the pattern as representing towers with orange tops, then every tower must be at least as tall as the three towers in front of it. So if you try to add or remove a brick randomly, the towers at the back are more likely to become taller, while those near the front tend to become shorter.

Two examples of paths from left to right

The second way to think about it is that, if you look carefully, there is a unique path from each of the lozenges on the left hand vertical side to the corresponding lozenge on the right hand vertical side. At every step, each path either goes up (blue) or down (grey). The gaps between the various paths are orange. Each step of the algorithm flips between up-down and down-up steps on a particular path. On the large hexagon, the only way to prevent the topmost cell from being orange is for the highest path to go up (and remain blue) 32 times in a row. This is very unlikely when flips are random, though it can happen more often on a smaller size-6 hexagon like the one shown in the example.

Resources

A Jupyter notebook demonstrating the approach and Python code for running longer simulations are available on this GitHub page.

Back to cycling jerseys

The Dutch company DSM is proudly sponsoring a professional cycling team in 2021. And a hexagon lies at the heart of the DSM logo, that will appear on the team jerseys.

Pro cycling team networks

The COVID-19 pandemic has further exposed the weakness of the professional cycling business model. The competition between the teams for funding from a limited number of sponsors undermines the stability of the profession. With marketing budgets under strain, more teams are likely to face difficulties, in spite of the great advertising and publicity that the sport provides. Douglas Ryder is fighting an uphill struggle trying to keep his team alive after the withdrawal of NTT as a lead sponsor. One aspect of stability is financial, but another measure is the level of transfers between teams.

The composition of some teams is more stable than others. This is illustrated by analysing the history of riders’ careers, which is available on ProCyclingStats. The following chart is a network of the transfers between teams in the last year, where the yellow nodes are 2020 teams and the purple ones are 2019. The width of the edges indicates how many riders transferred between the teams, with the thick green lines representing the bulk of the riders who stuck with the same team. The blue labels give the initials of the official name of each team, such as M-S (Mitchelton-Scott), MT (Movistar Team), T-S (Trek-Segafredo) and TS (Team Sunweb). Riders who switched teams are labelled in red.
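The edge widths in such a chart are just counts of riders moving between pairs of teams. A minimal sketch of that counting step is shown below, using invented records (the rider and team names are placeholders) and a hypothetical retention helper.

```python
from collections import Counter

# Hypothetical rider records: (rider, 2019 team, 2020 team). Illustrative only.
riders = [
    ("Rider 1", "Team A", "Team A"),
    ("Rider 2", "Team A", "Team A"),
    ("Rider 3", "Team A", "Team B"),
    ("Rider 4", "Team B", "Team B"),
    ("Rider 5", "Team B", "Team C"),
    ("Rider 6", "Team C", "Team C"),
]

# Edge weight = number of riders going from one 2019 team to one 2020 team.
flows = Counter((old, new) for _, old, new in riders)


def retention(team):
    """Share of a team's 2019 riders who stayed with it in 2020."""
    total = sum(n for (old, _), n in flows.items() if old == team)
    return flows[(team, team)] / total


for (old, new), n in sorted(flows.items()):
    kind = "stayed" if old == new else "moved"
    print(f"{old} -> {new}: {n} {kind}")
print(round(retention("Team A"), 2))
```

Teams with a retention close to 1 would sit on the outside of the chart, connected to the rest by only a few thin edges.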

Although there is a Dutch/German grouping on the lower right, the main structure is from the outside towards the centre of the network.

The spikes around the end of the chart show riders like Geoffrey Soupe or Rubén Fernández, who stepped down to smaller non-World Tour teams like Team Total Direct Energie (TTDE), Nippo Delko One Provence (NNDP), Euskaltel-Euskadi (E-E), Androni Giocattoli-Sidermec (AG-S) or Uno-X Pro Cycling Team (U-XPCT).

The two World Tour outliers were Mitchelton-Scott (M-S) and Groupama FDJ (GF), who retained virtually all their riders from 2019. Moving closer in, a group of teams lies around the edge of the central mass, where a few transfers occurred. Moving anti-clockwise we see CCC Team (CT), Astana Pro Team (APT), Trek-Segafredo (T-S), AG2R La Mondiale (ALM), Circus-Wanty Gobert (C-WG), Team Jumbo Visma (TJV), Bora-Hansgrohe (B-H) and EF Pro Cycling (EPC).

Deeper in the mêlée, Ineos (TI_19/IG_20), Deceuninck – Quick Step (D-QS), UAE-Team Emirates (U-TE), Lotto Soudal (LS), Bahrain – McLaren (B-M) and Movistar Team (MT) exchanged a number of riders.

Right in the centre Israel Start-Up Nation (IS-UN) grabbed a whole lot of riders, including 7 from Team Arkéa Samsic (TAS). Meanwhile the likes of Victor Campenaerts and Domenico Pozzovivo are probably regretting joining NTT Pro Cycling (TDD_19/NPC_20).

Looking forward

A few of the top riders have contracts for next year showing up on ProCyclingStats. So far 2020/2021 looks like the network below. Many riders are renewing with their existing teams, indicated by the broad green lines. But some big names are changing teams, including Chris Froome, Richie Porte, Laurens De Plus, Sam Oomen, Romain Bardet, Wilco Kelderman, Bob Jungels and Lilian Calmejane.

What about networks of riders?

My original thought when starting this analysis was that over their careers, certain riders must have been team mates with most of the riders in today’s peloton, so who is the most connected? Unfortunately this turned out to be ridiculously complicated, as shown in the image below, where nodes are riders with links if they were ever teammates and the colours represent the current teams. The highest ranked rider in each team is shown in red.

It is hard to make much sense of this, other than to note that those with shorter careers in the same team are near the edge and that Philippe Gilbert is close to the centre. Out of interest, the rider around 9 o’clock linking Bora and Jumbo Visma is Christoph Pfingsten, who moved this year. At least we can conclude that professional cyclists are well-connected.

Lord of the (cycling) rings

Which Lord of the Rings characters do they look like? Ask an AI.

After building an app that uses deep learning to recognise Lord of the Rings characters, I had a bit of fun feeding in pictures of professional cyclists. This blog explains how the app works. If you just want to try it out yourself, you can find it here, but note that you may need to be fairly patient, because it can take up to 5 minutes to fire up for the first time… it does start eventually.

Identifying wizards, hobbits and elves

The code that performs this task was based on the latest version of the excellent fast.ai course Practical Deep Learning for Coders. If you have done a bit of programming in Python, you can build something like this yourself after just a few lessons.

The course sets out to defy some myths about deep learning. You don’t need to have a PhD in computer science – the fastai library is brilliantly designed and easy to use. Python is the language of choice for much of data science and the course runs in Jupyter notebooks.

You don’t need petabytes of data – I used fewer than 150 sample images of each character, downloaded using the Bing Image Search API. It is also straightforward to download publicly available neural networks within the fastai framework. These have been pre-trained to recognise a broad range of objects. Then it is relatively quick to fine-tune the parameters to achieve a specific task, such as recognising about 20 different Tolkien characters.

You don’t need expensive resources to build your models – I trained my neural network in just a few minutes, using a free GPU available on Google’s Colaboratory platform. After transferring the essential files to a github repository, I deployed the app at no cost, using Binder.

Thanks to the guidance provided by fastai, the whole process was quick and straightforward to do. In fact, by far the most time consuming task was cleaning up the data set of downloaded images. But there was a trick for doing this. First you train your network on whatever images come up in an initial search, until it achieves a reasonable degree of accuracy. Then take a look at the images that the model finds the most difficult to classify. I found that these tended to be pictures of lego figures or cartoon images. With the help of a fastai tool, it was simple to remove irrelevant images from the training and validation sets.

After a couple of iterations, I had a clean dataset and a great model, giving about 70% accuracy, which was good enough for my purposes. Some examples are shown in the left column at the top of this blog.

The model’s performance was remarkably similar to my own. While Gollum is easy to identify, the wizard Saruman can be mistaken for Gandalf, Boromir looks a bit like Faramir and the hobbits Pippin and Merry can be confused.

Applications outside Middle Earth

One of the important limits of these types of image recognition models is that even if they work well in the domain in which they have been trained, they cannot be expected to do a good job on totally different images. Nevertheless, I thought it would be amusing to supply pictures of professional cyclists, particularly given the current vogue for growing facial hair.

My model was 87% sure that Peter Sagan was Boromir, but only 81.5% confident in the picture of Sean Bean. It was even more certain that Daniel Oss played the role of Faramir. Geraint Thomas was predicted to be Frodo Baggins, but with much lower confidence. I wondered for a while why Tadej Pogačar should be Legolas, but perhaps the model interpreted his outstretched arms as those of an archer.

I hoped that a heavily bearded Bradley Wiggins might come out as Gimli, but that did not seem to work. Nevertheless it was entertaining to upload photographs of friends and family. With apologies for any waiting time to get it running, you can try it here.

In earlier blogs, I have described similar models to identify common flowers or different types of bike.

Tour de France and COVID-19

A report in VeloNews on the eve of the Tour de France stated that the French government had insisted that the “two strikes and you are out” policy must be enforced by the ASO. This means that if two positive COVID-19 tests arise within a team or its support staff, the team will be removed from the race. This raises the possibility of the yellow jersey rider being ejected from the race if, for example, two mechanics record positive tests. This would be particularly unjust if it turns out that a test result was a false positive. So what are the chances that this might happen?

False positives

One of the great frustrations of the reporting on COVID testing has been the lack of clarity about what type of testing is being discussed. Tests fall into two categories. Antigen tests use a sample from a nasal or pharyngeal swab to detect patients who currently have the disease, whereas antibody tests use a blood sample to identify patients who have developed antibodies as a result of exposure to the disease in the past – more than 28 days earlier.

There are two general types of antigen test. Real time polymerase chain reaction (RT-PCR) tests look for specific viral fragments and need to be conducted in a laboratory, typically requiring at least 24 hours for a result. Less reliable rapid tests look for proteins associated with the COVID-19 virus, producing results in as little as 15 minutes.

The UCI requires riders and staff to be tested using RT-PCR, which is a very reliable method, having both high sensitivity (ability to detect those with the disease) and high specificity (ability to clear those without the disease). The relevant question for the Tour de France is the probability of a false positive RT-PCR test. Indeed Larry Warbasse recently said he thought his result was a false positive, as he had experienced no symptoms and had maintained strict self isolation during training.
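For readers who prefer to see the definitions as arithmetic, here is a minimal sketch of how sensitivity and specificity are calculated from test outcomes. The function names and the counts are invented purely for illustration; they are not real RT-PCR figures.

```python
def sensitivity(true_positives, false_negatives):
    """Share of infected people that the test correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives, false_positives):
    """Share of uninfected people that the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Invented counts, for illustration only (not real test data).
example_sensitivity = sensitivity(true_positives=95, false_negatives=5)
example_specificity = specificity(true_negatives=990, false_positives=10)

# The false positive rate is whatever specificity leaves over.
false_positive_rate = 1 - example_specificity

print(example_sensitivity, example_specificity, false_positive_rate)
```

The quantity that matters for an unjust ejection from the race is the last one: the false positive rate, which is one minus the specificity.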

The evidence indicates that the machines performing the RT-PCR test are extremely unlikely to generate a false positive, because the test needs to find significant levels of three different targets to confirm the presence of COVID-19. In FDA experiments, 100% of negatives were correctly identified – there were no false positives. However, it remains possible that, in the moving circus of the Tour de France, a sample could become contaminated before it is tested or that samples might somehow be mislabelled. A high level of responsibility falls on the shoulders of team doctors to minimise these risks, but we can never be sure that they are zero.

One in a thousand

As a thought experiment, suppose that a negative RT-PCR test is 99.9% reliable, i.e. that one COVID-free person in a thousand somehow produces a false positive result. What is the chance that a team is unjustly sent home from the Tour?

Each team has eight riders plus support staff. Although teams might want to reduce the number of staff in the team bubble, it may be necessary to have extra catering staff in order to remain self sufficient. Let us assume an average of 17 staff on each of the 21 teams and that everybody has passed the required two negative tests prior to the start of the race. Assume further that nobody contracts COVID-19 throughout the race.

It has been indicated that everyone will be tested on the two rest days. Reassuringly, the probability of two or more false positives arising in a single team bubble of 25 people in one round of testing would be just 0.03% (1 - 0.999^25 - 25*0.001*0.999^24). However, the probability that every one of the 168 riders receives a negative result in a round of testing would be only 85% (0.999^168), meaning that there would be a 15% chance that at least one rider is unjustly ejected from the race. In fact, since a total of 1,050 tests would be taken across all the team bubbles over the two rest days, the chance of at least one person receiving a false positive would be surprisingly high: 65% (1 - 0.999^1050).
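These figures can be checked with a few lines of Python. This is a sketch under the assumptions of the thought experiment: every test is independent, the false positive rate is one in a thousand, and nobody is actually infected. The helper name and constants are my own labels for the numbers in the text.

```python
from math import comb

def prob_at_least(k, n, p):
    """Probability of k or more false positives in n independent tests."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

P_FALSE_POSITIVE = 0.001  # assumed rate from the thought experiment
BUBBLE = 25               # 8 riders + 17 staff per team
TEAMS = 21
RIDERS = 8 * TEAMS        # 168 riders
REST_DAYS = 2

# Two or more false positives in one team bubble, in one round of testing.
two_in_bubble_one_round = prob_at_least(2, BUBBLE, P_FALSE_POSITIVE)

# The same, but counting both rest days together.
two_in_bubble_both_rounds = prob_at_least(2, BUBBLE * REST_DAYS, P_FALSE_POSITIVE)

# Every rider tests negative in one round of testing.
all_riders_negative = (1 - P_FALSE_POSITIVE) ** RIDERS

# At least one false positive among all 1,050 tests.
any_false_positive = prob_at_least(1, BUBBLE * TEAMS * REST_DAYS, P_FALSE_POSITIVE)

print(f"{two_in_bubble_one_round:.2%}, {two_in_bubble_both_rounds:.2%}, "
      f"{all_riders_negative:.0%}, {any_false_positive:.0%}")
```

Under these assumptions the values come out at roughly 0.03% for one round in a single bubble (0.12% if both rest days are pooled), 85% for all riders testing negative and 65% for at least one false positive somewhere in the race.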

Perhaps the assumption of 1 in a thousand false positives was a bit alarmist. Reducing it to 1 in ten thousand still produces a probability of 10% that somebody would be sent home during the Tour.
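To see how sensitive the conclusion is to the assumed error rate, the same calculation can be repeated for a range of rates. Again this is only a sketch: it assumes 1,050 independent tests and no real infections, and the function name is my own.

```python
def prob_any_false_positive(rate, tests=1050):
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - rate) ** tests

# Keys are the "1 in N" false positive rates being compared.
results = {n: prob_any_false_positive(1 / n) for n in (1_000, 10_000, 100_000)}

for n, prob in results.items():
    print(f"1 in {n:,}: {prob:.1%}")
```

A rate of one in a thousand gives about 65%, one in ten thousand about 10%, and even one in a hundred thousand still leaves a chance of roughly 1% that someone is wrongly sent home.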

Blind eyes

In some situations, draconian sanctions might deter team members or staff from reporting symptoms. One could imagine a soigneur or mechanic having to go home quietly after mysteriously spraining a wrist. However, this could create very negative press coverage if word got out that this person was infected.

Furthermore, the UCI rules place responsibility on the teams and specifically the team doctors to apply strict daily monitoring and controls to detect suspected COVID-19 cases.

Champs Elysées

While in the above scenarios no one actually contracted COVID-19, there is, of course, a not inconsiderable chance that one of the 525 people in the team bubbles does actually become infected. If the virus spreads to more than one team, the whole race could become a fiasco.

But let’s keep our fingers crossed and hope the Tour makes it to the Champs Elysées.