Kings and Queens of the Mountains


I guess that most male cyclists don’t pay much attention to the women’s leaderboards on Strava. And if they do it might just be to make some puerile remark about boys being better than girls. From a scientific perspective the comparison of male and female times leads to some interesting analysis.

Assuming both men and women have read my previous blogs on choosing the best time, weather conditions and wind directions for the segment that suits their particular strengths, we come back to basic physics.

KOM or QOM time = Work done / Power = (Work against gravity + Drag x Distance + Rolling resistance x Distance) / (Mass x Watt/kg)

Of the three components of work done, rolling resistance tends to be relatively insignificant. On a very steep hill, most of the work is done against gravity, whereas on a flat course, aerodynamic drag dominates.

The two key factors that vary between men and women are mass and power to weight ratio (watts per kilo). A survey published by the ONS in 2010 reported, rather shockingly, that the average British man weighed 83.6kg, with women coming in at 70.2kg. This gives a male/female ratio of 1.19. KOM/QOM cyclists would tend to be lighter than this, but if we take 72kg and 60kg, the ratio is still 1.20.

Males generate more watts per kilogram due to having a higher proportion of lean muscle mass. Although power depends on many factors, including lungs, heart and efficiency of circulation, we can estimate the relative power to weight ratio by comparing the typical body composition of males and females. Feeding the ONS statistics into the Boer formula gives a lean body mass of 74% for men and 65% for women, resulting in a ratio of 1.13. This can be compared against the useful table on Training Peaks showing maximal power output in Watts/kg, for men and women, over different time periods and a range of athletic abilities. The following table is based on the rows showing world record performances and average untrained efforts. For world champion five minute efforts and functional threshold powers, the ratios are consistent with the lean mass ratio. It makes sense that the ratio should be higher for shorter efforts, where the male champions are likely to be highly muscular. Apparently the relative performance is precisely 1.21 for all durations in untrained people.
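As a rough check on the lean body mass figures, here is a minimal sketch of the Boer formula. The weights are the ONS figures quoted above; the heights (175.3cm and 161.6cm) are my own assumption, since they are not stated here, and the function name is purely illustrative.

```python
def boer_lean_body_mass(weight_kg, height_cm, sex):
    """Estimate lean body mass (kg) using the Boer formula."""
    if sex == "male":
        return 0.407 * weight_kg + 0.267 * height_cm - 19.2
    if sex == "female":
        return 0.252 * weight_kg + 0.473 * height_cm - 48.3
    raise ValueError("sex must be 'male' or 'female'")

# Weights from the ONS survey quoted in the text; heights are assumed values.
male_weight, male_height = 83.6, 175.3
female_weight, female_height = 70.2, 161.6

male_fraction = boer_lean_body_mass(male_weight, male_height, "male") / male_weight
female_fraction = boer_lean_body_mass(female_weight, female_height, "female") / female_weight
lean_ratio = male_fraction / female_fraction

print(f"Lean mass: men {male_fraction:.0%}, women {female_fraction:.0%}, ratio {lean_ratio:.2f}")
```

With these assumed heights, the fractions come out close to the 74% and 65% quoted above.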

Screen Shot 2017-11-08 at 10.23.33

On a steep climb, where the work done against gravity dominates, the benefit of additional male muscle mass is cancelled by the fact that this mass must be lifted, so the difference in time between the KOM and the QOM is primarily due to relative power to weight ratio. However, being smaller, women suffer from the disadvantage that the inert mass of bike represents a larger proportion of the total mass that must be raised against gravity. This effect increases with gradient, accounting for a time difference of up to 16% on the steepest of hills.

In contrast, on a flat segment, it comes down to raw power output, so men benefit from advantages in both mass and power to weight ratio. But power relates to the cube of the velocity, so the elapsed time scales inversely with the cube root of power. Furthermore, with smaller frames, women present a lower frontal area, providing a small additional advantage. So men can be expected to have a smaller time advantage of around 9%. In theory the advantage should continue to narrow as the gradient shifts downhill.
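The two regimes can be illustrated with a simple steady-state power balance. This is only a sketch under assumed values, not the exact model behind the chart: the 8kg bike, the CdA figures, the rolling resistance coefficient, the air density and the 5 W/kg male power are all assumptions, and the solver only handles flat or uphill gradients.

```python
import math

G = 9.81        # gravitational acceleration, m/s^2
RHO = 1.2       # air density, kg/m^3 (assumed)
CRR = 0.005     # rolling resistance coefficient (assumed)
BIKE_KG = 8.0   # bike mass, kg (assumed)

def speed_for_power(power_w, rider_kg, gradient, cda):
    """Steady speed (m/s) at which power demand equals power_w (gradient >= 0)."""
    theta = math.atan(gradient)
    resist = (rider_kg + BIKE_KG) * G * (math.sin(theta) + CRR * math.cos(theta))
    lo, hi = 0.0, 50.0
    for _ in range(60):  # bisection: demand rises monotonically with speed
        mid = 0.5 * (lo + hi)
        demand = mid * (resist + 0.5 * RHO * cda * mid ** 2)
        if demand < power_w:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def qom_kom_ratio(gradient):
    """QOM time / KOM time over the same distance, i.e. male speed / female speed."""
    men = speed_for_power(72 * 5.0, 72, gradient, cda=0.32)
    women = speed_for_power(60 * 5.0 / 1.13, 60, gradient, cda=0.30)
    return men / women

ratio_flat = qom_kom_ratio(0.0)
ratio_steep = qom_kom_ratio(0.20)
print(f"flat: {ratio_flat:.2f}, 20% climb: {ratio_steep:.2f}")
```

Under these assumptions the ratio is smaller on the flat than on a steep climb, in line with the argument above.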

Theory versus practice

Strava publishes the KOM and QOM leaderboards for all segments, so it was relatively straightforward to check the basic model against a random selection of 1,000 segments across the UK. All leaderboards included at least 1,666 riders, with an overall average of 637 women and 5,030 men. One of the problems with the leaderboards is that they can be contaminated by spurious data, including unrealistic speeds or times set by groups riding together. To combat this, the average was taken of the top five times set on different dates, rather than simply the top KOM or QOM time.

The average segment length was just under 2km, up a gradient of 3%. The following chart plots the ratio of the QOM time to the KOM time versus gradient compared with the model described above. The red line is based on the lean body mass/world record holders estimate of 1.13, whereas the average QOM/KOM ratio was 1.32. Although there is a perceptible upward slope in the data for positive gradients, the model clearly does not fit the data.

Screen Shot 2017-11-09 at 17.54.43

Firstly, the points on the left hand side indicate that men go downhill much more fearlessly than women, suggesting a psychological explanation for the observations deviating from the model. To make the model fit better for positive gradients, there is no obvious reason to expect the weight ratio of male to female Strava riders to deviate from the general population, so this leaves only the relative power to weight ratio. According to the model the QOM/KOM ratio should level off to the power to weight ratio for steep gradients. This seems to occur for a value of around 1.40, which is much higher than the previous estimates of 1.13 or the 1.21 for untrained people. How can we explain this?

A notable feature of the data set was that the sample of 1,000 Strava segments was completed by nearly eight times as many men as women. This, in turn, reflects the facts that there are more male than female cyclists in the UK and that men are more likely to upload, analyse, publicise and gloat over their performances than women.

Having more men than women inevitably means that the sample includes more high level male cyclists than equivalent female cyclists. So we are not comparing like with like. Referring back to the Training Peaks table of expected power to weight ratios, a figure of 1.40 suggests we are comparing women of a certain level against men of a higher category, for example, “very good” women against “excellent” men.

A further consequence of having far more men than women is that it is much more likely that the fastest male times were recorded in the ideal conditions described in my previous blogs listed earlier.

Conclusions

There is room for more women to enjoy cycling and this will push up the standard of performance of the average amateur rider. This would enhance the sport in the same way that the industry has benefited as more women have joined the workforce.

Froome versus Dumoulin

Many commentators have been licking their lips at the prospect of head-to-head combat between Chris Froome and Tom Dumoulin at next year’s Tour de France. It is hard to make a comparison based on their results in 2017, because they managed to avoid racing each other over the entire season of UCI World Tour races, meeting only in the World Championship Individual Time Trial, where the Dutchman was victorious. But it is intriguing to ask how Dumoulin might have done in the Tour de France and the Vuelta or, indeed, how Froome might have fared in the Giro.

Inspiration for addressing these hypothetical questions comes from an unexpected source. In 2009 Netflix awarded a $1 million prize to a team that improved the company’s technique for making film recommendations to its users, based on the star ratings assigned by viewers. The successful algorithm exploited the fact that viewers may enjoy the films that are highly rated by other users who have generally agreed on the ratings of the films they have seen in common. Initial approaches sought to classify films into genres or those starring particular actors, in the hope of grouping together viewers into similar categories. However, it turned out to be very difficult to identify which features of a film are important. An alternative is simply to let the computer crunch the data and identify the key features for itself. A method called Collaborative Filtering became one of the most popular approaches employed for recommender systems.

Our cycling problem shares certain characteristics with the Netflix challenge: instead of users, films and ratings, we have riders, races and results. Riders enter a selection of races over the season, preferring those where they hope to do well. Similar riders, for example sprinters, tend to finish high in the results of races where other sprinters also do well. Collaborative filtering should be able to exploit the fact that climbers, sprinters or TTers tend to finish close to each other, across a range of races.

This year’s UCI World Tour concluded with the Tour of Guangxi, completing the data set of results for 2017. After excluding team time trials, 883 riders entered 174 races, resulting in 26,966 finishers. Most races have up to 200 participants, so if you imagine a huge table with all the racers down the rows and all the races across the columns, the resulting matrix is “sparse” in the sense that there are lots of missing values for the riders who were not in a particular race. Collaborative Filtering aims to fill in the spaces, i.e. to estimate the position of a rider who did not enter a specific race. This is exactly what we would like to do for the Grand Tours.

It took a couple of minutes to fit a matrix factorisation Collaborative Filtering model, using keras, on my MacBook Pro. Some experimenting suggested that I needed about 50 hidden factors plus a bias to come up with a reasonable fit for this data set. Taking the one-day Milan San Remo race at random, the model did a fairly good job of predicting the top ten riders for this long, hilly race with a flat finish.
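To show the idea of matrix factorisation with biases, here is a minimal sketch that uses plain NumPy and stochastic gradient descent rather than keras. The data is synthetic (hypothetical riders and races with two hidden traits), and the learning rate, regularisation and number of epochs are arbitrary choices of mine, so this illustrates the technique rather than reproducing the model described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_riders, n_races, n_factors = 30, 20, 2

# Synthetic "results" driven by two hidden traits (hypothetical data).
true_rider = rng.normal(size=(n_riders, n_factors))
true_race = rng.normal(size=(n_races, n_factors))
scores = true_rider @ true_race.T + 0.05 * rng.normal(size=(n_riders, n_races))

# Sparse observations: each rider only starts about 40% of the races.
observed = rng.random((n_riders, n_races)) < 0.4
rows, cols = np.nonzero(observed)
values = scores[rows, cols]

def fit(rows, cols, values, shape, n_factors, epochs=200, lr=0.02, reg=0.01):
    """Fit score ~ mean + rider bias + race bias + rider factors . race factors."""
    mu = values.mean()
    P = 0.1 * rng.normal(size=(shape[0], n_factors))  # rider factors
    Q = 0.1 * rng.normal(size=(shape[1], n_factors))  # race factors
    b_rider, b_race = np.zeros(shape[0]), np.zeros(shape[1])
    for _ in range(epochs):
        for idx in rng.permutation(len(values)):
            i, j = rows[idx], cols[idx]
            err = values[idx] - (mu + b_rider[i] + b_race[j] + P[i] @ Q[j])
            b_rider[i] += lr * (err - reg * b_rider[i])
            b_race[j] += lr * (err - reg * b_race[j])
            P[i], Q[j] = (P[i] + lr * (err * Q[j] - reg * P[i]),
                          Q[j] + lr * (err * P[i] - reg * Q[j]))
    return mu + b_rider[:, None] + b_race[None, :] + P @ Q.T

predicted = fit(rows, cols, values, scores.shape, n_factors)

baseline_rmse = np.sqrt(np.mean((values - values.mean()) ** 2))
train_rmse = np.sqrt(np.mean((values - predicted[rows, cols]) ** 2))
print(f"baseline RMSE {baseline_rmse:.2f}, fitted RMSE {train_rmse:.2f}")
```

The `predicted` matrix has a value for every rider in every race, including the races a rider never started, which is the part of the table we want to fill in.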

Model fit (prediction)   Rider                   Actual result
1                        Peter_Sagan             2
2                        Alexander_Kristoff      4
3                        Michael_Matthews        12
4                        Edvald_Boasson_Hagen    19
5                        Sonny_Colbrelli         13
6                        Michal_Kwiatkowski      1
7                        John_Degenkolb          7
8                        Nacer_Bouhanni          8
9                        Julian_Alaphilippe      3
10                       Diego_Ulissi            40

The following figure visualises the primary factors the model derived for classifying the best riders. Sprinters are in the lower part of the chart, with climbers towards the top and allrounders in the middle. Those with a lot of wins are towards the left.

Screen Shot 2017-10-27 at 19.26.17

Now we come to the interesting part: how would Tom Dumoulin and Chris Froome have compared in the other’s Grand Tours? Note that this model takes account of the results of all the riders in all the races, so it should be capable of detecting the benefit of being part of a strong team.

Tour de France

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 1(TT), 2, 5, 6, 10 and 21, but the yellow jersey winner would have been stronger in the mountains and won overall.

Giro d’Italia

The model suggested that Chris Froome would have been ahead in the majority of stages, leaving stages 4, 5, 6, 9, 10(TT), 14 and 21(TT) to Dumoulin. The Brit would have most likely claimed the pink jersey.

Vuelta a España

The model suggested that Tom Dumoulin would have beaten Chris Froome in stages 2, 4, 12, 18, 19 and 21. In spite of a surge by the Dutchman towards the end of the race, the red jersey would have remained with Froome.

Conclusions

Based on a Collaborative Filtering approach, the results of 2017 suggest that Chris Froome would have beaten Tom Dumoulin in any of the Grand Tours.

Ranking Top Pro Cyclists for 2017


Following Il Lombardia last weekend, the World Tour has only two more events this year. It is time to ask who were the best sprinters of 2017? Who was the best climber or puncheur? The simplest approach is to count up the number of wins, but this ignores the achievement of finishing consistently among the top riders on different types of parcours. This article explores ways of creating rankings for different types of riders.

The current UCI points system, introduced in 2016, is fiendishly complicated, with points awarded for winning races and bonuses given to those wearing certain jerseys in stage races. The approach applies different scales according to the type of event, but each of these scales puts a premium on winning the race, with points awarded for first place being just over double the reward of the fifth-placed rider. In fact, taking the top 20 places in the four main world tour categories of event, the curve of best fit is exponential with a coefficient of approximately -1/6. In other words, there’s a linear relationship between a rider’s finishing position and the logarithm of the UCI points awarded.

UCI Points

This observation is really useful, because it provides a straightforward way of assessing the performance in different types of races, based on their finishing positions. The PCS web site is a great source of professional cycling statistics. One nice feature is that most of the races/stages have an associated profile indicated by a little logo, see Tour de France. These classify races into the following categories:

  • Flat e.g. TdF stage 2 from Düsseldorf to Liège
  • Hills with a flat finish e.g. Milan San Remo
  • Hills with an uphill finish e.g. Fleche Wallonne
  • Mountains with a flat finish e.g. TdF stage 8 Station des Rousses
  • Mountains with an uphill finish e.g. TdF stage 5 La Planche des Belles Filles
  • It is also reasonable to assume that any stage of less than 80km was a TT

We would expect outright sprinters to top the rankings in flat races, whereas the puncheurs come to the fore when it becomes hilly, with certain riders doing particularly well on steep uphill finishes. The climbers come into their own in the mountains, with some being especially strong on summit finishes.

Taking the results of all the World Tour races in 2017 completed up to Il Lombardia and applying the simple -1/6 exponential formula equally to all categories of event, we obtain the following “derived ranking”, arranged by the profile of event.
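The scoring behind a ranking of this kind can be sketched as follows. The riders and results are hypothetical, and two details are my own assumptions rather than anything stated above: the score is normalised so that a win is worth 1, and a rider's scores are simply summed within each profile.

```python
import math
from collections import defaultdict

def position_score(position, decay=1 / 6):
    """Exponentially decaying score: 1.0 for a win, falling with each place."""
    return math.exp(-decay * (position - 1))

# Hypothetical results: (rider, race profile, finishing position).
results = [
    ("Rider A", "flat", 1),
    ("Rider B", "flat", 2),
    ("Rider A", "flat", 3),
    ("Rider B", "mountains with an uphill finish", 1),
    ("Rider C", "mountains with an uphill finish", 2),
    ("Rider A", "mountains with an uphill finish", 40),
]

totals = defaultdict(lambda: defaultdict(float))
for rider, profile, position in results:
    totals[profile][rider] += position_score(position)

ranking = {
    profile: sorted(scores, key=scores.get, reverse=True)
    for profile, scores in totals.items()
}

first_to_fifth = position_score(1) / position_score(5)
print(f"1st place is worth {first_to_fifth:.2f} times 5th place")
for profile, riders in ranking.items():
    print(profile, riders)
```

With a decay of 1/6, first place is worth roughly twice fifth place, which mirrors the shape of the UCI scales described above.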

Derived ranking for 2017 World Tour events, according to parcours

Screen Shot 2017-10-10 at 20.02.24

Marcel Kittel rightly tops the sprinters on flat courses (while Cavendish was 11th), but the Katusha Alpecin rider and several others have tended to be dropped on hilly courses, where Sagan, Ewan and Kristoff were joined by Trentin, Gaviria and some classic puncheurs. Sagan managed to win some notable uphill finishes, such as Tirreno-Adriatico and Grand Prix Cycliste de Quebec, alongside riders noted for being strong in the hills. The aggression of Valverde and Contador put them ahead of Froome on mountain stages that finished on the flat, but the TdF winner, Zakarin and Bardet topped the rankings of pure climbers for consistency on summit finishes. Finally we see the usual suspects topping the TT rankings.

It should be noted that ranking performances based simply on positions, without some form of scaling, gave very unintuitive results. While simpler than the UCI points system, this analysis supports the idea of awarding points in a way that scales exponentially with the finishing position of a rider.

 

Deep Learning – Faking It

Thumbnails of real bikes (Bianchi, Giant, Cube…)
Fake thumbnails generated randomly by Wasserstein Generative Adversarial Network

My last blog showed the results of using a deep convolutional neural network to apply different artistic styles to a photograph of a cyclist. This article looks at the trendy topic of Generative Adversarial Networks (GANs). Specifically, I investigate the application of a Wasserstein GAN to generate thumbnail images of bicycles.

In the field of machine learning, a generative model is a model designed to produce examples from a particular target distribution. In statistics, the output might be samples from a Gaussian distribution, but we can extend the idea to create a model that produces examples of sonnets in the style of Shakespeare or pictures of cats… or bicycles.

The adversarial framework introduces an attractive idea from game theory: to create a competitive form of learning. While a generator learns from a corpus of real examples how to create realistic “fakes”, a discriminator (or critic) learns to distinguish between fakes and authentic examples. In fact, the generator is given the objective of trying to fool the discriminator. As the discriminator improves, the generator is driven to enhance the authenticity of its output. This creates a virtuous cycle.

When originally proposed in 2014, Generative Adversarial Networks stimulated much interest, but it proved hard to make them work reliably in practice. One problem was “mode collapse”, where the generator becomes stuck, producing the same output all the time. However, this changed with the publication of a recent paper, explaining how earlier problems could be overcome by using a so-called Wasserstein loss function.
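To give a feel for the Wasserstein loss, here is a toy sketch in NumPy rather than PyTorch. The "images" are just numbers drawn from two Gaussians, the critic is a single clipped weight, and the clipping value and learning rate are arbitrary assumptions, so this is a long way from the networks used for the bike thumbnails.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(loc=2.0, scale=1.0, size=5000)  # stand-in for real examples
fake = rng.normal(loc=0.0, scale=1.0, size=5000)  # stand-in for generator output

def critic(x, w):
    """A deliberately tiny critic: a linear score with one weight."""
    return w * x

def critic_loss(real, fake, w):
    """The critic tries to score real examples high and fakes low."""
    return critic(fake, w).mean() - critic(real, w).mean()

def generator_loss(fake, w):
    """The generator tries to make the critic score its fakes highly."""
    return -critic(fake, w).mean()

w, clip, lr = 0.0, 1.0, 0.05
for _ in range(100):
    grad = fake.mean() - real.mean()                 # d(critic_loss)/dw for a linear critic
    w = float(np.clip(w - lr * grad, -clip, clip))   # weight clipping bounds the critic's slope

wasserstein_estimate = -critic_loss(real, fake, w)
print(f"critic weight {w:.2f}, estimated distance {wasserstein_estimate:.2f}")
```

With the weight clipped to 1, the estimate is close to the gap of 2 between the two means. In a full WGAN the generator would then be updated to reduce `generator_loss`, pulling the fakes towards the real data.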

As an experiment, I downloaded a batch of images of bicycles from the Internet. After manually removing pictures with riders and close-ups of components, there were about 1,200 side views of road bikes (mostly with handlebars to the right, so you can see the chainset). After a few experiments, I reduced the dataset to 862 images, by automatically selecting bikes against a white background.

Sample of real bike images

As a participant in part 2 of the excellent fast.ai deep learning course, I made use of WGAN code that runs using PyTorch. I loaded the bike images at thumbnail size of 64×64 (training with larger images exceeded the memory constraints of the p2.xlarge GPU instance I’m running on AWS). It was initially disappointing to experience the mode collapse problem, especially because the authors of the WGAN paper claimed never to have encountered it. However, increasing the learning rate of the generator seemed to solve the problem.

Although each fake was created from a completely random starting point, the generator learned to produce images against a white background, with two circles joined by lines. After a couple of hundred iterations the WGAN began to generate some recognisably bicycle-like images. Notice the huge variety. Some of the best ones are shown at the top of this post.

Sample of images generated by WGAN

I tried to improve the WGAN’s images, using another deep learning tool: super resolution. This amazing technique is used to solve the seemingly impossible task of converting images from low resolution to high resolution. It is achieved by taking downgraded versions of a large dataset of high resolution images, then training a neural network to reproduce a high-res version from the corresponding low-res input. A super resolution network is able to learn about certain properties of the world, for example, it converts jagged curves into smooth ones – a feature I’d hoped might be useful for making wheels look rounder.

Example of a super resolution network on real photographs

Unfortunately, my super resolution experiments did not lead to the improvement I’d hoped for. Two possible explanations are that a) the fake images were not low-res photos and b) the network had been trained on many types of images other than bicycles with white backgrounds.

Example of super resolution network on a fake bicycle image

In the end I was pretty happy with the best of the 64×64 images shown above. They are at least as good as something I could draw by hand. This is an impressive example of unsupervised learning. The trained network is able to use some learned notion of what a bicycle looks like in order to produce new images that possess similar properties. With more time and training, I’m sure the WGAN could be improved, perhaps to the point where the images might provide creative inspiration for new bike designs.

References

Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative Adversarial Networks. 

Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein GAN. 

Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual Losses for Real-Time Style Transfer and Super-Resolution. 

 

Deep Learning – Cycling Art

I’ve always been fascinated by the field of artificial intelligence, but it is only recently that significant and rapid advances have been made, particularly in the area of deep learning, where artificial neural networks are able to learn complex relationships. Back in the early 1990s, I experimented with forecasting share prices using neural networks. Performance was not much better than the linear models we were using at the time, so we never managed money this way, though I did publish a paper on the topic.

I am currently following an amazing course offered by fast.ai that explains how to programme and implement state of the art techniques in deep learning. Image recognition is one of the most interesting applications. Convolutional neural networks are able to recognise the content and style of images. It is possible to explore what the network has “learnt” by examining the content of the intermediate layers, between the input and the output.

Over the last week I have been playing around with some Python code, provided for the course, that uses a package called keras to build and run networks on a GPU using Google’s TensorFlow infrastructure. Starting with a modified version of the publicly available network called VGG16, which has been trained to recognise images, the idea is to combine the content of a photograph with the style of an artist.

An image is presented to the network as an array of pixel values. These are passed through successive layers, where a series of transformations is performed. These allow the network to recognise increasingly complex features of the original image. The content of the image is captured by refining an initially random set of pixels, until it generates similar higher level features.

The style of an artist is represented in a slightly different way. This time an initially random set of pixels is modified until it matches the overall mixture of colours and textures, in the absence of positional information.
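In the Gatys et al. paper listed in the references, style is summarised using Gram matrices of a layer's activations. The sketch below uses random numbers in place of real VGG16 features, and the normalisation is my own choice, but it shows why this representation discards positional information: shuffling the spatial positions leaves the style loss at zero while the content loss changes.

```python
import numpy as np

def gram_matrix(features):
    """Channel-by-channel correlations of a (height, width, channels) activation map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat / (h * w)

def style_loss(a, b):
    return float(np.mean((gram_matrix(a) - gram_matrix(b)) ** 2))

def content_loss(a, b):
    return float(np.mean((a - b) ** 2))

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 8, 4))  # stand-in for one layer's activations

# Scramble the spatial positions while keeping each position's channel values together.
shuffled = rng.permutation(features.reshape(-1, 4)).reshape(8, 8, 4)

print("style loss after shuffling:  ", style_loss(features, shuffled))
print("content loss after shuffling:", content_loss(features, shuffled))
```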

Finally, a new image is created, again initially from random, but this time matching both the content of the photograph and the style of the artist. The whole process takes about half an hour on my MacBook Pro, though I also have access to a high-spec GPU on Amazon Web Services to run things faster.

Here are some examples of a cyclist in the styles of Cézanne, Braque, Monet and Dali. The Cézanne image worked pretty well. I scaled up the content versus style for Braque. The Monet picture confuses the sky and trees. And the Dali result is just weird.

 

References

Trained to Forecast – Risk Magazine, January 1993

Deep Learning for Coders

A Neural Algorithm of Artistic Style, Leon A. Gatys, Alexander S. Ecker, Matthias Bethge

 

 

 

Chain reactions

At this year’s Royal Society Summer Exhibition, scientists and engineers from Bristol University presented some interesting work on improvements to the drive chains used by Team GB in the Rio Olympics. They reached clear conclusions about the design of the chain and sprockets, taken up by Renold. Current research is exploring the problem of chain resonance.

Bicycle chains and sprockets tend to receive less attention than aerodynamics, for several reasons. As noted in previous blogs, the power required to overcome aerodynamic drag scales with the cube of velocity, whereas frictional effects scale simply in proportion to velocity. Furthermore, a good well-lubricated drive chain typically has an efficiency of around 95% or more, so it is hard to make further improvements. Note that a dirty chain has significantly lower efficiency, so you should certainly keep your bike clean.

The loss of power comes from the friction between links as they bend around the chainring and the rear sprocket. Using a high precision rig, the researchers demonstrated that larger sprockets are more efficient than smaller ones. For example, with a gear ratio of 4:1, it is more efficient to use a 64/16 than a more conventional 52/13.

In fact, one of the experts told me that the efficiency of the drive chain falls off sharply as the sprocket size is reduced from 13 to 12 to 11. This is because the chain has to bend around a much sharper angle for a smaller sprocket. If you think about it, the straight chain has to bend to a certain angle that depends on the number of teeth on the sprocket. Recalling some school maths about the interior angles of polygons, for 16 teeth, the angle is 157.5°, whereas for 11 teeth, the angle is 147.3°. For the larger sprocket, each pair of links overcomes less friction bending through 22.5° and back, compared with a more dramatic 32.7° and back for the smaller one.
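The polygon arithmetic is easy to reproduce. In this short sketch the sprocket is treated as a regular polygon with one side per tooth, which is the simplification used above; the function names are just illustrative.

```python
def interior_angle(teeth):
    """Interior angle (degrees) of a regular polygon with one side per tooth."""
    return 180.0 - 360.0 / teeth

def articulation_angle(teeth):
    """Angle (degrees) each link bends through as it wraps onto the sprocket."""
    return 360.0 / teeth

for teeth in (11, 12, 13, 16):
    print(f"{teeth} teeth: interior {interior_angle(teeth):.1f}, "
          f"bend {articulation_angle(teeth):.1f}")
```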

Note that this analysis of the rear sprocket applies to single speed track bikes. On a road bike the chain has to pass the two derailleur cogs, which typically have 13 teeth, whatever gear you choose. However, the argument still applies to the chainring at the front, where the gains of going larger were shown to exceed the additional aerodynamic drag.

The Bristol team also explored the effect of a number of other factors on performance. Using different length links obviously requires customised sprockets and chainrings. This would be a major upheaval for the industry, but it is possible for purpose-built track bikes. Certain molybdenum-based lubricating powders used in the space industry may be better than traditional oils. Other materials could replace traditional steel.

A different kind of power loss can occur when the chain resonates vertically. A specially designed test rig showed that this can occur at frequencies that could be triggered at certain pedalling cadences. Current research is investigating how the tension of the chain and its design can help mitigate this problem (which is also an issue for motorcycles).

In conclusion, when we see Tony Martin pushing a 58+ chainring, it may not be simply an act of machismo – he may actually be benefitting from efficiency gains.

 

Update on cycling aerodynamics

A recently published paper provides a useful review of competition cycling aerodynamics. It looks at the results of a wide range of academic studies, highlighting the significant advances made in the last 5 to 10 years.

The power required to overcome aerodynamic drag rises with the cube of velocity, so riding at 50km/h takes almost twice as much power as riding at 40km/h. At racing speed, around 80% of a cyclist’s power goes into overcoming aerodynamic drag. This is largely because a bike and rider are not very streamlined, resulting in a turbulent wake.
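The cube law is quick to verify. In this sketch the drag area and air density are assumed values; they cancel out of the ratio, so only the speeds matter for the comparison.

```python
def aero_power(speed_kmh, cda=0.30, air_density=1.2):
    """Power (W) needed to overcome aerodynamic drag at a steady speed in still air."""
    v = speed_kmh / 3.6  # convert km/h to m/s
    return 0.5 * air_density * cda * v ** 3

power_ratio = aero_power(50) / aero_power(40)
print(f"50km/h needs {power_ratio:.2f} times the aerodynamic power of 40km/h")
```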

The authors quote drag coefficients, Cd, of 0.8 for upright and 0.6 for TT positions. These compare with 0.07 for a recumbent bike with fairing, indicating that there is huge room for improvement.

Wind tunnels, originally used in the aerospace and automotive industries, are now being designed specifically for cycling, though no specific standards have been adopted. These provide a simplification of environmental conditions, but they can be used to study air flow for different body positions and equipment. Mannequins are often used in research, as one of the difficulties for riders is the ability to repeat and maintain exactly the same position. Some tunnels employ cameras to track movements. Usually a drag area measurement, CdA, is reported, rather than Cd, thereby avoiding uncertainty due to measurement of frontal area, though this can be estimated by counting pixels in an image.

One thing that makes cycling particularly complex is the action of pedalling. This creates asymmetric high drag forces as one leg goes up and the other goes down, resulting in variations of up to 20% relative to a horizontal crank position.

Cycling has been studied using computational fluid dynamics, helping to save on wind tunnel costs. These use fine mesh models to calculate details of flow separation and pressure variations across the cyclist’s body. The better models are in good agreement with wind tunnel experiments.

Practical advice

Cycling speed is a maximum optimisation problem between aerodynamic and biomechanical efficiency

Ultimately, scientists need to do field tests. The extensive use of power meters allows cyclists to experiment for themselves. The authors provide two practical ways to separate the coefficient of rolling resistance, Crr, from CdA: one based on rolling to a halt and the other using a series of short rides at constant speed.
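One way to picture the constant speed approach is as a straight line fit. On a level road with no wind, power divided by speed equals Crr × mass × g plus 0.5 × air density × CdA × speed squared, so regressing P/v on v² gives CdA from the slope and Crr from the intercept. The sketch below uses simulated rides with assumed values for mass, air density and measurement noise; it is my reading of the general idea, not the exact protocol in the paper.

```python
import numpy as np

G, RHO, MASS = 9.81, 1.2, 80.0    # gravity, air density, rider plus bike (assumed)
TRUE_CRR, TRUE_CDA = 0.005, 0.30  # values used to simulate the test rides

rng = np.random.default_rng(0)
speeds = np.array([6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])  # m/s, one per ride
power = (TRUE_CRR * MASS * G * speeds
         + 0.5 * RHO * TRUE_CDA * speeds ** 3
         + rng.normal(scale=1.0, size=speeds.size))          # simulated meter noise, W

# P / v = Crr * m * g + 0.5 * rho * CdA * v^2, a straight line in v^2.
slope, intercept = np.polyfit(speeds ** 2, power / speeds, 1)
cda_est = slope / (0.5 * RHO)
crr_est = intercept / (MASS * G)

print(f"estimated CdA {cda_est:.3f}, estimated Crr {crr_est:.4f}")
```

In practice wind, gradient and drivetrain losses would all blur the fit, so several repeat runs in both directions would be sensible.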

Minimising aerodynamic resistance through rider position is one of the most effective ways to improve performance among well-trained athletes

Compared with riding upright on the hoods, moving to the drops saves 15% to 20% while adopting a TT position saves 30% to 35%. Studies show quite a lot of variance in these figures, as the results depend on whether the rider is pedalling, as well as body size. The following quote suggests that when freewheeling downhill in an aero tuck, your crank should be horizontal (unless you are cornering).

Current research suggests that the drag coefficient of a pedalling cyclist is ≈6% higher than that of a static cyclist holding a horizontal crank position

The authors quote the figures for CdA of 0.30-0.50 for an upright position, 0.25 to 0.30 on the drops and 0.20-0.25 for a TT position. Variation is largely, but not only, due to changes in frontal area, A. Unfortunately, relatively minor changes in position can have large effects on drag, but the following effects were noted.

Broker and Kyle note that rider positions that result in a flat back, a low tucked head and forearms positioned parallel to the bicycle frame generally have low aerodynamic drag. Wind tunnel investigations into a wide range of modifications to standard road cycling positions by Barry et al. showed that lowering the head and torso and bringing the arms inside the silhouette of the hips reduced the aerodynamic drag.

Bike frames, wheels, helmets and skin suits are all designed with aerodynamics in mind, while remaining compliant with UCI rules. Skin suits are important, due to their large surface areas. By delaying airflow separation, textured fabrics reduce wake turbulence, resulting in as much as a 4% reduction in drag.

In race situations, drafting skills are beneficial, particularly behind a larger rider. While following riders gain a significant benefit, it has been shown that the lead rider also accrues a small advantage of around 3%. It is best to overtake very closely in order to take maximal advantage of lateral drafting effects.

For a trailing cyclist positioned immediately behind the leader, drag reduction has been reported in the range of 15–50 % and reduces to 10–30 % as the gap extends to approximately a bike length… The drafting effect is greater for the third rider than the second rider in a pace-line, but often remains nearly constant for subsequent riders

For those interested in greater detail, it is well worth looking at the full text of the paper, which is freely available.

Reference

Riding against the wind: a review of competition cycling aerodynamics, Timothy N. Crouch, David Burton, Zach A. LaBry, Kim B. Blair, Sports Engineering, June 2017, Volume 20, Issue 2, pp 81–110