Predicting the World Champion

A couple of years ago I built a model to evaluate how Froome and Dumoulin would have matched up, if they had not avoided racing against each other over the 2017 season. As we approach the 2019 World Championships Road Race in Yorkshire, I have adopted a more sophisticated approach to try to predict the winner of the men’s race. The smart money could be going on Sam Bennett.

Deep learning

With only two races outstanding, most of this year’s UCI world tour results are available. I decided to broaden the data set with 2.HC classification European Tour races, such as the OVO Energy Tour of Britain. In order to help with prediction, I included each rider’s weight and height, as well as some meta-data about each race, such as date, distance, average speed, parcours and type (stage, one-day, GC, etc.).

The key question was what exactly are you trying to predict? The UCI allocates points for race results, using a non-linear scale. For example, Mathieu Van Der Poel was awarded 500 points for winning Amstel Gold, while Simon Clarke won 400 for coming second and Jakob Fuglsang picked up 325 for third place, continuing down to 3 points for coming 60th. I created a target variable called PosX, defined as a negative exponential of the rider’s position in any race, equating to 1.000 for a win, 0.834 for second, 0.695 for third, decaying down to 0.032 for 20th. This has a similar profile to the points scheme, emphasising the top positions, and handles races with different numbers of riders.

A random forest would be a typical choice of model for this kind of data set, which included a mixture of continuous and categorical variables. However, I opted for a neural network, using embeddings to encode the categorical variables, with two hidden layers of 200 and 100 activations. This was very straightforward using the fast.ai library. Training was completed in a handful of seconds on my MacBook Pro, without needing a GPU.

After some experimentation on a subset of the data, it was clear that the model was coming up with good predictions on the validation set and the out-of-sample test set. With a bit more coding, I set up a procedure to load a start list and the meta-data for a future race, in order to predict the result.

Predictions

With the final start list for the World Championships Road Race looking reasonably complete, I was able to generate the predicted top 10. The parcours obviously has an important bearing on who wins a race. With around 3600m of climbing, the course was clearly hilly, though not mountainous. Although the finish was slightly uphill, it was not ridiculously steep, so I decided to classify the parcours as rolling with a flat finish

PositionRiderPrediction
1Mathieu Van Der Poel0.602
2Alexander Kristoff0.566
3Sam Bennett0.553
4Peter Sagan0.540
5Edvald Boasson Hagen0.507
6Greg Van Avermaet0.500
7Matteo Trentin0.434
8Michael Matthews0.423
9Julian Alaphilippe0.369
10Mike Teunissen0.362

It was encouraging to see that the model produced a highly credible list of potential top 10 riders, agreeing with the bookies in rating Mathieu Van Der Poel as the most likely winner. Sagan was ranked slightly below Kristoff and Bennett, who are seen as outsiders by the pundits. The popular choice of Philippe Gilbert did not appear in my top 10 and Alaphilippe was only 9th, in spite of their recent strong performances in the Vuelta and the Tour, respectively. Riders in positions 5 to 10 would all be expected to perform well in the cycling classics, which tend to be long and arduous, like the Yorkshire course.

For me, 25/1 odds on Sam Bennett are attractive. He has a strong group of teammates, in Dan Martin, Eddie Dunbar, Connor Dunne, Ryan Mullen and Rory Townsend, who will work hard to keep him with the lead group in the hillier early part of the race. Then he will then face an extremely strong Belgian team that is likely to play the same game that Deceuninck-QuickStep successfully pulled off in stage 17 of the Vuelta, won by Gilbert. But Bennett was born in Belgium and he was clearly the best sprinter out in Spain. He should be able to handle the rises near the finish.

A similar case can be made for Kristoff, while Matthews and Van Avermaet both had recent wins in Canada. Nevertheless it is hard to look past the three-times winner Peter Sagan, though if Van Der Poel launches one of his explosive finishes, there is no one to stop him pulling on the rainbow jersey.

Appendix

After the race, I checked the predicted position of the eventual winner, Mads Pedersen. He was expected to come 74th. Clearly the bad weather played a role in the result, favouring the larger riders, who were able to keep warmer. The Dane clearly proved to be the strongest rider on the day.

References

Code used for this project

Sunflowers

Image in the style of @grandtourart

Last year, I experimented with using style transfer to automatically generate images in the style of @grandtourart. More recently I developed a more ambitious version of my rather simple bike identifier. The connection between these two projects is sunflowers. This blog describes how I built a flower identification app.

In the brilliant fast.ai Practical Deep Learning for Coders course, Jeremy Howard recommends downloading a publicly available dataset to improve one’s image categorisation skills. I decided to experiment with the 102 Category Flower Dataset, kindly made available by the Visual Geometry Group at Oxford University. In the original 2008 paper, the researchers used a combination of techniques to segment each image and characterise its features. Taking these as inputs to a Support Vector Machine classifier, their best model achieved an accuracy of 72.8%.

Annoyingly, I could not find a list linking the category numbers to the names of the flowers, so I scraped the page showing sample images and found the images in the labelled data.

Using exactly the same training, validation and test sets, my ResNet34 model quickly achieved an accuracy of 80.0%. I created a new branch of the GitHub repository established for the Bike Image model and linked this to a new web service on my Render account. The huge outperformance of the paper was satisfying, but I was sure that a better result was possible.

The Oxford researchers had divided their set of 8,189 labelled images into a training set and a validation set, each containing 10 examples of the 102 flowers. The remaining 6,149 images were reserved for testing. Why allocate less that a quarter of the data to training/validation? Perhaps this was due to limits on computational resources available at the time. In fact, the training and validation sets were so small that I was able to train the ResNet34 on my MacBook Pro’s CPU, within an acceptable time.

My plan to improve accuracy was to merge the test set into the training set, keeping aside the original validation set of 1,020 images for testing. This expanded training set of 7,261 images immediately failed on my MacBook, so I uploaded my existing model onto my PaperSpace GPU, with amazing results. Within 45 minutes, I had a model with 97.0% accuracy on the held-out test set. I quickly exported the learner and switched the link in the flowers branch of my GitHub repository. The committed changes automatically fed straight through to the web service on Render.

I discovered, when visiting the app on my phone, that selecting an image offers the option to take a photo and upload it directly for identification. Having exhausted the flowers in my garden, I have risked being spotted by neighbours as I furtively lean over their front walls to photograph the plants in their gardens.

Takeaways

It is very efficient to use smaller datasets and low resolution images for initial training. Save the model and then increase resolution. Often you can do this on a local CPU without even paying for access to a GPU. When you have a half decent model, upload it onto a GPU and continue training with the full dataset. Deploying the model as a web service on Render makes the model available to any device, including a mobile phone.

My final model is amazing… and it works for sunflowers.

References

Automated flower classification over a large number of classes, Maria-Elena Nilsback and Andrew Zisserman, Visual Geometry Group, Department of Engineering Science, University of Oxford, United Kingdom, men,az@robots.ox.ac.uk

102 Flowers Jupyter notebook

Bike Identification as a web app

One of the first skills acquired in the latest version of the fast.ai course on deep learning is how to create a production version of an image classifier that runs as a web application. I decided to test this out on a set of images of road bikes, TT bikes and mountain bikes. To try it out, click on the image above or go to this website https://bike-identifier.onrender.com/ and select an image from your device. If you are using a phone, you can try taking photos of different bikes, then click on Analyse to see if they are correctly identified. Side-on images work best.

How does it work?

The first task was to collect some sample images for the three classes of bicycles I had chosen: road, TT and MTB. It turns out that there is a neat way to obtain the list of urls for a Google image search, by running some javascript in the console. I downloaded 200 images for each type of bike and removed any that could not be opened. This relatively small data set allowed me to do all the machine learning using the CPU on my MacBook Pro in less than an hour.

The fast.ai library provides a range of convenient ways to access images for the purpose of training a neural network. In this instance, I used the default option of applying transfer learning to a pre-trained ResNet34 model, scaling the images to 224 pixel squares, with data augmentation. After doing some initial training, it was useful to look at the images that had been misclassified, as many of these were incorrect images of motorbikes or cartoons or bike frames without wheels or TT bars. Taking advantage of a useful fast.ai widget, I removed unhelpful training images and trained the model further.

The confusion matrix showed that final version of my model was running at about 90% accuracy on the validation set, which was hardly world-beating, but not too bad. The main problem was a tendency to mistake certain road bikes for TT bikes. This was understandable, given the tendency for road bikes to become more aero, though it was disappointing when drop handlebars were clearly visible.

The next step was to make my trained network available as a web application. First I exported the models parameter settings to Dropbox. Then I forked a fast.ai repository into my GitHub account and edited the files to link to my Dropbox, switching the documentation appropriately for bicycle identification. In the final step, I set up a free account on Render to host a web service linked to my GitHub repository. This automatically updates for any changes pushed to the repository.

Amazingly, it all works!

References

fast.ai lesson 2

My GitHub repository, include Jupyter notebook

Can self-driving cars detect cyclists?

Screenshot 2019-05-10 at 14.05.59

Self-driving cars employ sophisticated software to interpret the world around them. How do these systems work? And how good are they at detecting cyclists? Can cyclists feel safe sharing roads with an increasing number of vehicles that make use of these systems?

How hard is it to spot a cyclist?

Vehicles can use a range of detection systems, including cameras, radar and lidar.  Deep learning techniques have become very good at identifying objects in photographic images. So one important question is how hard is it to spot a cyclist in a photo taken from a moving vehicle?

Researchers at Tsinghua University, working in collaboration with Daimler, created a publicly available collection of dashboard camera photos, where humans have painstakingly drawn boxes around other road users. The data set is used by academics to benchmark the performance of their image recognition algorithms. The images are rather grey and murky, reflecting the cloudy and polluted atmosphere of the Chinese city location. It is striking that, in the majority of cases, the cyclists are very small, representing around 900 pixels out of the 2048 x 1024 images, i.e. less than 0.05% of the total area. For example, the cyclist in the middle of the image above is pretty hard to make out, even for a human.

Object-detecting neural networks are typically trained to identify the subject of a photo, which normally takes up are significant portion of the image. Finding a tall, thin segment containing a cyclist is significantly more difficult.

If you think about it, the cyclist taking up the largest percentage of a dash cam image will be riding across the direction of travel, directly in front of the vehicle, at which point it may be too late to take action. So a crucial aspect of any successful algorithm is to find more distant cyclists, before they are too close.

Setting up the problem

Taking advantage of skills acquired on the fast.ai course on deep learning, I decided to have a go at training a neural network to detect cyclists. Many of the images in the Tsinghua Daimler data set include multiple cyclists. In order to make the problem more manageable, I set out to find the single largest cyclist in each image.

If you are not interested in the technical bit, just scroll down to the results.

The technical bit

In order to save space on my drive, I downloaded about a third of the training set. The 3209 images were split 80:20 to create a training and validation sets. I also downloaded 641 unseen images that were excluded from training and used only for testing the final model.

I used transfer learning to fine-tune a neural network using a pre-trained ResNet34 backbone, with a customised head designed to generate four numbers representing the coordinates of a bounding box around the largest object in each image. All images were scaled down to 224 pixel squares, without cropping. Data augmentation added variation to the training images, including small rotations, horizontal flips and adjustments to lighting.

It took a couple of hours to train the network on my MacBook Pro, without needing to resort to a cloud-based GPU, to produce bounding boxes with an average error of just 12 pixels on each coordinate. The network had learned to do a pretty good job at detecting cyclists in the training set.

Results

The key step was to test my neural network on the set of 641 unseen images. The results were impressive: the average error on the bounding box coordinates was just 14 pixels. The network was surprisingly good at detecting cyclists.

oosImages

The 16 photos above were taken at random from the test set. The cyan box shows the predicted position of the largest cyclist in the image, while the white box shows the human annotation. There is a high degree of overlap for eleven cyclists 2, 3, 4, 5, 6, 8, 11, 12, 14, 15 and 16. Box 9 was close, falling between two similar sized riders, but 7 was a miss. The algorithm failed on the very distant cyclists in 1, 10 and 13. If you rank the photos, based on the size of the cyclist, we can see that the network had a high success rate for all but the smallest of cyclists.

In conclusion, as long as the cyclists were not too far away, it was surprisingly easy to detect riders pretty reliably, using a neural network trained over an afternoon.  With all the resources available to Google, Uber and the big car manufacturers, we can be sure that much more sophisticated systems have been developed. I did not consider, for example, using a sequence of images to detect motion or combining them with data about the motion of the camera vehicle. Nor did I attempt to distinguish cyclists from other road users, such as pedestrians or motorbikes.

After completing this project, I feel reassured that cyclists of the future will be spotted by self-driving cars. The riders in the data set generally did not wear reflective clothing and did not have rear lights. These basic safety measures make cyclists, particularly commuters, more obvious to all road users, whether human or AI.

Car manufacturers could potentially develop significant goodwill and credibility in their commitment to road safety by offering cyclists lightweight and efficient beacons that would make them more obvious to automated driving systems.

References

“A new benchmark for vision-based cyclist detection”, X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila, in proceedings of IEEE Intelligent Vehicles Symposium (IV), pages 1028-1033, June 2016

Link to Jupyter notebook

Learning the language of the Giro

Fausto_Coppi

Computers are becoming ever better at natural language processing (NLP). I set up a neural network as a language model and loaded a set of pre-trained weights on my MacBook Pro, to see what it could do. After priming it with some text about the Giro d’Italia, this is what the model produced as a continuation.

Priming text: The Giro d’ Italia (Italian pronunciation: [ˈdʒiːro diˈtaːlja]; English: Tour of Italy; also known as the Giro) is an annual multiple-stage bicycle race primarily held in Italy, while also occasionally passing through nearby countries. The first race was organized in 1909 to increase sales of the newspaper La Gazzetta dello Sport; however it is currently run by RCS Sport. The race has been held annually since its first edition in 1909, except when it was stopped for the two world wars. As the Giro gained prominence and popularity the race was lengthened, and the peloton expanded from primarily Italian participation to riders from all over the world.

Computer generated continuation:  the race was won by the italian rider , giovanni di u_n , who won the race in the first leg of the race . the race was won by italian rider giovanni u_n , who won the race by a margin of two lengths .
= = = world tour = = =
the tour de france was the first of the tour de france . the tour de france was won by the reigning world champion , the reigning world champion , who had won the tour de france in the previous year ‘s race …

The output may not make a lot of sense, but the point is that it looks like English (in lower case). The grammar is reasonable, with commas, fullstops and a header inserted in  a logical way. Furthermore, the model has demonstrated some understanding of the context by suggesting that the Giro could be won by an Italian ride called Giovanni. The word “u_n” stands for unknown, which is consistent with the idea that an Italian surname may not be a familiar English word. It turns out that a certain Giovanni Di Santi raced against Fausto Coppi (pictured above) in the 1940 Giro, though he did not win the first stage. In addition to this, the model somehow knew that the Giro, in common with the Tour the France, is a World Tour event that could be won by the reigning world champion.

I found this totally amazing. And it was not a one off: further examples on random topics are included below. This neural network is just an architecture, defining a collection of matrix multiplications and transformations, along with a set of connection weights. Admittedly there are a lot of connection weights: 115.6 million of them, but they are just numbers. It was not explicitly provided with any rules about English grammar or any domain knowledge.

How could this possibly work?

In machine learning, language models are assessed on a simple metric: accuracy in predicting the next word of a sentence. The neural network approach has proved to be remarkably successful. Given enough data and a suitable architecture, deep learning now far outstrips traditional methods that relied on linguistic expertise to parse sentences and apply grammatical rules that differ across languages.

I was experimenting with an AWD-LSTM model originally created by Stephen Merity. This is a recurrent neural network (RNN) with three LSTM layers that include dropout. The pre-trained weights for the wt103 model were generated by Jeremy Howard of fast.ai, using a large corpus of text from Wikipedia.

Jeremy Howard converted the Wikipedia text into tokens. A tokeniser, such as spaCy,  breaks text into words and punctuation, resulting in a vocabulary of tokens that are indexed as integers. This allows blocks of text to be fed into the neural network as lists of numbers. The outputs are numbers that can be converted back into the predicted words.

The wt103 model includes a linear encoder that creates embeddings of word tokens. These are passed through three LSTM layers whose states are able to retain a memory of previous words or context. The result is passed through a decoder, employing the same weights as the encoder, to produce a softmax output that can be treated as a set of probabilities, across the vocabulary, to predict the next word token. Special forms of dropout were employed, as described in the paper, to limit overfitting and make the model more robust.

The network was trained by minimising cross-entropy loss using stochastic gradient descent. A technique called back propagation through time (BPTT) introduced some random variation into the length of the blocks of text employed in each training batch. The connection weights were iteratively improved towards optimal values.

OK, but so what?

Impressive as it may be for a neural network to spout highly plausible but nonsensical English, what possible use could this have? Well, suppose you want to automate the separation of texts into different categories, for example, distinguish different authors or determine the sentiment of tweets or predict the reaction of the stock market to news items or identify the sentiment of movie reviews. Fine-tuning a neural network that already has a good knowledge of English provides a huge advantage over starting from scratch. In lesson 10 of the fast.ai course, Jeremy Howard demonstrates how this approach can be used to achieve state of the art results on the widely used IMDB movie review data set.

Further examples

In the first example, the network finishes the sentence about a hurricane in perfect English.  Next, it comes up with the name of a critic (actually a rock music critic rather than a book reviewer, but heh!) and it makes use of its LSTM to open and close quotation marks. In the bean example, it amazingly starts describing flowers and correctly converts their dimensions from centimetres to inches!

My code can be found here.

Priming text: Hurricane Ivan was a large long Cape Verde hurricane that caused widespread damage in the Caribbean and United States The cyclone was

Computer generated continuation:  the first hurricane to strike the united states since hurricane katrina in 2005 .

Priming text: The book was widely acclaimed by the critics.

Computer generated continuation: the book was published in the united states by the american library association on june 1 , 2006 .

= = = critical response = = =

the book received mixed reviews from critics . the new york times reviewer robert christgau called it ” a great book ” and ” a great book ” . he praised the book ‘s ” strong , strong , and well – written ” prose , and the ” excellent ” prose .

Priming text: Most varieties of bean grow either as an erect bush or as a climbing plant, but a few important kinds are of intermediate form. Dwarf and semiclimbers are grown extensively. When the climbing type is grown for its immature pods, artificial supports are necessary to facilitate harvesting. Varieties differ greatly in size, shape, colour, and fibrousness or tenderness of the immature pods. In general, varieties grown for dry mature seeds produce pods that are too fibrous to be eaten at any state of development. Most edible-podded beans produce relatively low yields of mature seeds, or seeds that are of low eating quality. Seed colours range from white through green, yellow, tan, pink, red, brown, and purple to black in solid colours and countless contrasting patterns. Seed shapes range from nearly spherical to flattened, elongated, and kidney-shaped. Pods are of various shades of green, yellow, red, and purple and splashed with red or purple; pod shapes range from flat to round, smooth to irregular, and straight to sharply curved; length ranges from 75 to 200 millimetres (3 to 8 inches) or more.

Computer generated continuation:  the most common form of the flower is the flower spike , which is a characteristic of the flower spike . the flower is made up of a series of small , cylindrical , cylindrical , cylindrical , cylindrical stems that are up to 10 cm ( 3.9 in ) long and 2.5 cm ( 0.98 in ) wide . the flower spikes are arranged in a series of three or four pairs of flowers 

 

 

Creating artistic images from Strava rides

firstimage
Four laps of Richmond Park

When you upload a ride, Strava draws a map using the longitude and latitude coordinates recorded by your GPS device. This article explores ways in which these numbers, along with other metrics, can be used to create interesting images that might have some artistic merit.

The idea was motivated by the huge advances made in the field of Deep Learning, particularly applications for image recognition. However, since datasets come in all shapes and forms, researchers have explored ways of converting different types of data into images.  In a paper published in 2015, the authors achieved success in identifying standard time series by converting them into images.

GPS bike computers typically record snapshots of information every second. What kind of images could these time series generate? It turns out that there are several ways to convert a time series into an image.

Spectrogram

Creating a spectrogram is a standard approach from signal processing that is particularly useful for analysing acoustic files. The spectrogram is a heat map that shows how the underlying frequencies contributing to the signal change over time. Technically, it is derived by calculating the discrete Fourier transform of a window that slides across the time series. I applied this to my regular Saturday morning club ride of four laps around Richmond Park. The image changes a bit once the ride gets going after about 1200 seconds (20 minutes), but, frankly, the result was not particularly illuminating. There is no obvious reason to consider cycling power data as a superposition of frequencies.

spectrogram

Ah! Now we are getting somewhere

The authors of the referenced paper took a different approach to produce things called Gramian Angular Summation Field (GASF), Gramian Angular Difference Field (GADF), and Markov Transition Field (MTF). Read the paper if want to know the details. I created these and something call a Recurrence Plot. All of these methods generate a matrix, by combining every element in the time series with every other element. The underling observations occurring at times t_{1} and t_{2} determine the colour of the pixel at position (t_{1}, t_{2}). Images are symmetric along the lower-left to upper-right diagonal, apart from GADF, which is antisymmetric.

Let’s see how do they look for on four laps of Richmond Park. We have six time series, with corresponding sets of images below. The segmentation of the images is due to periodicity of the data. This is particularly clear in the geographic data (longitude, latitude and altitude). The higher intensity of the main part of the ride is most obvious in the heart rate data. The MTF plots are quite interesting. Scroll down through the images to the next section

data1
Raw time series of power, heart rate, cadence, longitude, latitude and altitude

gasf
Gramian Angular Sum Field

gadf
Gramian Angular Difference Field

mtf
Markov Transition Field

rp
Recurrence Plot

From cycle ride to art

It is one thing to create an image of each item, but how can we combine these to summarise a ride in a single image. I considered two methods of combining time series into a single image: a) create a new image where the vertical and horizontal axes represent different series and b) create a new image by simply adding the corresponding values from two underlying images.

One problem is that some cyclists don’t have gadgets like heart rate monitors and power meters, so I initially restricted myself to just the longitude, latitude and altitude data. Nevertheless, as noted in an earlier blog, it is possible to work out speed, because the time interval is one second between each reading. Furthermore, one can estimate power, from the speed and changes in elevation.

Another problem is that rides differ in length. For this I split the ride into, say, 128 intervals and took the last observation in each interval. So for a 3 hour ride, I’d be sampling about once every 84 seconds.

The chart at the top of this blog was created by first normalising each series to a standard range (-1, +1). Method a) was used to create two images: longitude was added to latitude and altitude was multiplied by speed. These were added using method b). Using these measures will produce pretty much the same chart each time the ride is done. In contrast, an image that is totally unique to the ride can be produced using data relating to the individual rider. The image below uses the same recipe to combine speed, heart rate, power and cadence. If this had been a particularly special ride, the image would be a nice personal memento.

lastimage
A different take on four laps of Richmond Park

For anyone interested in the underlying code, I have posted a Jupyter notebook here.

References

Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks, Wang Z Oates T, https://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/viewFile/10179/10251

 

Machine learning for a medical study of cyclists

Screen Shot 2018-10-11 at 15.28.46

This blog provides a technical explanation of the analysis underlying the medical paper about male cyclists described previously. Part of the skill of a data scientist is to choose from the arsenal of machine learning techniques the tools that are appropriate for the problem at hand. In the study of male cyclists, I was asked to identify significant features of a medical data set. This article describes how the problem was tackled.

Data

Fifty road racing cyclists, riding at the equivalent of British Cycling 2nd category or above, were asked to complete a questionnaire, provide a blood sample and undergo a DXA scan – a low intensity X-ray used to measure bone density and body composition. I used Python to load and clean up the data, so that all the information could be represented in Pandas DataFrames. As expected this time-consuming, but essential step required careful attention and cross-checking, combined with the perseverance that is always necessary to be sure of working with a clean data set.

The questionnaire included numerical data and text relating to cycling performance, training, nutrition and medical history. As a result of interviewing each cyclist, a specialist sports endocrinologist identified a number of individuals who were at risk of low energy availability (EA), due to a mismatch between nutrition and training load.

Bone density was measured throughout the body, but the key site of interest was the lumbar spine (L1-L4). Since bone density varies with age and between males and females, it was logical to use the male, age-adjusted Z-score, expressing values in standard deviations above or below the comparable population mean.

The measured blood markers were provided in the relevant units, alongside the normal range. Since the normal range is defined to cover 95% of the population, I assumed that the population could be modelled by a gaussian distribution in order to convert each blood result into a Z-score. This aligned the scale of the blood results with the bone density measures.

Analysis

I decided to use the Orange machine learning and data visualisation toolkit for this project. It was straightforward to load the data set of 46 features for each of the 50 cyclists. The two target variables were lumbar spine Z-score (bone health) and 60 minute FTP watts per kilo (performance). The statistics confirmed the researchers’ suspicion that the lumbar spine bone density of the cyclists would be below average, partly due to the non-weight-bearing nature of the sport. Some of the readings were extremely low (verging on osteoporosis) and the question was why.

Given the relatively small size of the data set (a sample of 50), the most straightforward approach for identifying the key explanatory variables was to search for an optimal Decision Tree. Interestingly, low EA turned out to be the most important variable in explaining lumbar spine bone density, followed by prior participation in a weight-bearing sport and levels of vitamin D (which was, in most cases, below the ideal level of athletes). Since I had used all the data to generate the tree, I made use of Orange’s data sampler to confirm that these results were highly robust. This had some similarities with the Random Forest approach. Although Orange produces some simple graphical tools like the following, I use Python to generate my own versions for the final publication.

 

Finding a robust decision tree is one thing, but it was essential to verify whether the decision variables were statistically significant. For this, Orange provides box plots for discrete variables. For my own peace of mind, I recalculated all of the Student’s T-statistics to confirm that they were correct and significant. The charts below show an example of an Orange box plot and the final graphic used in the publication.

The Orange toolkit includes other nice data visualisation tools. I particularly liked the flexibility available to make scatter plots. This inspired the third figure in the publication, which showed the most important variable explaining performance. This chart highlights a cluster of three cyclists with low EA, whose FTP watts/kg were lower than expected, based on their high training load. I independently checked the T-statistics of the regression coefficients to identify relationships that were significant, like training load, or insignificant, like percentage body fat.

Conclusions

The Orange toolkit turned out to be extremely helpful in identifying relationships that fed directly into the conclusions of an important medical paper highlighting potential health risks and performance drivers for high level cyclists. Restricting nutrition through diet or fasted rides can lead to low energy availability, that can cause endocrine responses in the body that reduce lumbar spine bone density, resulting in vulnerability to fracture and slow recovery. This is know as Relative Energy Deficiency in Sport (RED-S). Despite the obsession of many cyclists to reduce body fat, the key variable explaining functional threshold power watts/kg was weekly training load.

References

Low energy availability assessed by a sport-specific questionnaire and clinical interview indicative of bone health, endocrine profile and cycling performance in competitive male cyclists, BMJ Open Sport & Exercise Medicine, https://doi.org/10.1136/bmjsem-2018-000424

Relative Energy Deficiency in Sport, British Association of Sports and Exercise Medicine

Synergistic interactions of steroid hormones, British Journal of Sports Medicine

Cyclists: Make No Bones About It, British Journal of Sports Medicine

Male Cyclists: bones, body composition, nutrition, performance, British Journal of Sports Medicine