Machine learning for a medical study of cyclists

This blog provides a technical explanation of the analysis underlying the medical paper about male cyclists described previously. Part of the skill of a data scientist is to choose from the arsenal of machine learning techniques the tools that are appropriate for the problem at hand. In the study of male cyclists, I was asked to identify significant features of a medical data set. This article describes how the problem was tackled.

Data

Fifty road racing cyclists, riding at the equivalent of British Cycling 2nd category or above, were asked to complete a questionnaire, provide a blood sample and undergo a DXA scan, a low-intensity X-ray used to measure bone density and body composition. I used Python to load and clean up the data, so that all the information could be represented in Pandas DataFrames. As expected, this time-consuming but essential step required careful attention and cross-checking, along with the perseverance that is always needed to be sure of working with a clean data set.

The questionnaire included numerical data and text relating to cycling performance, training, nutrition and medical history. As a result of interviewing each cyclist, a specialist sports endocrinologist identified a number of individuals who were at risk of low energy availability (EA), due to a mismatch between nutrition and training load.

Bone density was measured throughout the body, but the key site of interest was the lumbar spine (L1-L4). Since bone density varies with age and between males and females, it was logical to use the male, age-adjusted Z-score, expressing values in standard deviations above or below the comparable population mean.

The measured blood markers were provided in the relevant units, alongside the normal range. Since the normal range is defined to cover 95% of the population, I assumed that the population could be modelled by a Gaussian distribution in order to convert each blood result into a Z-score. This put the blood results on the same scale as the bone density measures.
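The conversion from a normal range to a Z-score can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function name and the example range are hypothetical, and it assumes the normal range is the central 95% interval of a Gaussian, so that the range spans roughly 1.96 standard deviations either side of its midpoint.

```python
from statistics import NormalDist


def range_to_z(value, low, high, coverage=0.95):
    """Convert a lab result to a Z-score, assuming the normal range
    [low, high] is the central `coverage` interval of a Gaussian."""
    # Two-sided critical value: about 1.96 for 95% coverage.
    z_crit = NormalDist().inv_cdf(0.5 + coverage / 2)
    mean = (low + high) / 2            # midpoint of the normal range
    sd = (high - low) / (2 * z_crit)   # range spans +/- z_crit SDs
    return (value - mean) / sd


# Hypothetical marker with a normal range of 50 to 150 (units arbitrary).
print(round(range_to_z(100, 50, 150), 2))  # midpoint of the range -> 0.0
print(round(range_to_z(150, 50, 150), 2))  # upper limit -> 1.96
```

The Gaussian assumption is a simplification: markers with skewed distributions would need a transformation before this conversion is meaningful.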

Analysis

I decided to use the Orange machine learning and data visualisation toolkit for this project. It was straightforward to load the data set of 46 features for each of the 50 cyclists. The two target variables were lumbar spine Z-score (bone health) and 60 minute FTP watts per kilo (performance). The statistics confirmed the researchers’ suspicion that the lumbar spine bone density of the cyclists would be below average, partly due to the non-weight-bearing nature of the sport. Some of the readings were extremely low (verging on osteoporosis) and the question was why.

Given the relatively small size of the data set (a sample of 50), the most straightforward approach for identifying the key explanatory variables was to search for an optimal Decision Tree. Interestingly, low EA turned out to be the most important variable in explaining lumbar spine bone density, followed by prior participation in a weight-bearing sport and levels of vitamin D (which were, in most cases, below the ideal level for athletes). Since I had used all the data to generate the tree, I made use of Orange’s data sampler to confirm that these results were highly robust when the tree was refitted on random subsets of the data. This resampling had some similarities with the Random Forest approach. Although Orange produces some simple graphical tools like the following, I used Python to generate my own versions for the final publication.
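To illustrate the idea, here is a rough sketch of how the root split of a regression tree can be found, and how resampling can test whether the same variable keeps being chosen. It is not Orange's implementation and not the analysis code from the study: the split criterion (reduction in squared error), the 70% subsample size, the feature names and all of the data are assumptions made up for this example.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (feature, threshold, sse) for the single split that leaves
    the smallest total squared error in `target`."""
    best = None
    for f in features:
        levels = sorted({r[f] for r in rows})
        # Candidate thresholds lie midway between adjacent observed values.
        for a, b in zip(levels, levels[1:]):
            t = (a + b) / 2
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


# Synthetic, illustrative data only: NOT the study's measurements.
rng = random.Random(0)
rows = []
for _ in range(50):
    low_ea = int(rng.random() < 0.4)   # 1 = low energy availability
    vit_d = rng.uniform(40, 120)       # hypothetical vitamin D, nmol/L
    spine_z = -0.5 - 1.0 * low_ea + rng.gauss(0, 0.3)
    rows.append({"low_ea": low_ea, "vit_d": vit_d, "spine_z": spine_z})

features = ["low_ea", "vit_d"]
root = best_split(rows, "spine_z", features)
print("root split on:", root[0])

# Robustness check: refit on random 70% subsamples and count how often
# each feature is chosen at the root.
counts = {f: 0 for f in features}
for _ in range(200):
    sample = rng.sample(rows, 35)
    counts[best_split(sample, "spine_z", features)[0]] += 1
print(counts)
```

If one feature wins the root split in nearly every subsample, the tree's top-level structure is unlikely to be an accident of a few individual cyclists.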

 

Finding a robust decision tree is one thing, but it was essential to verify whether the decision variables were statistically significant. For this, Orange provides box plots for discrete variables. For my own peace of mind, I recalculated all of the Student’s t-statistics to confirm that they were correct and significant. The charts below show an example of an Orange box plot and the final graphic used in the publication.
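A recalculation of this kind might look like the sketch below, which computes the classic two-sample Student's t-statistic with pooled variance. The group values are made up for illustration and are not the study's data; whether Orange uses the pooled or the unequal-variance form is not something I have confirmed here.

```python
from math import sqrt
from statistics import mean, variance


def student_t(a, b):
    """Two-sample Student's t-statistic with pooled variance.
    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Made-up lumbar spine Z-scores for illustration, not the study's data.
low_ea = [-2.1, -1.8, -2.4, -1.5, -2.0]
adequate_ea = [-0.6, -0.9, -0.2, -1.1, -0.5, -0.7]
t, df = student_t(low_ea, adequate_ea)
print(f"t = {t:.2f} with {df} degrees of freedom")
```

The absolute value of t is then compared with the critical value for the relevant degrees of freedom; for 9 degrees of freedom, the two-sided 5% critical value is about 2.26.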

The Orange toolkit includes other nice data visualisation tools. I particularly liked the flexibility available to make scatter plots. This inspired the third figure in the publication, which showed the most important variable explaining performance. This chart highlights a cluster of three cyclists with low EA, whose FTP watts/kg were lower than expected, based on their high training load. I independently checked the t-statistics of the regression coefficients to identify relationships that were significant, like training load, or not significant, like percentage body fat.
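For a single explanatory variable, the t-statistic of a regression coefficient can be checked by hand along the following lines. This is a sketch for simple (one-variable) least squares only, with invented numbers; the study's actual regressions and data are not reproduced here.

```python
from math import sqrt
from statistics import mean


def slope_t_stat(x, y):
    """Fit y = a + b*x by least squares; return (b, t) where t = b / SE(b)."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(rss / (n - 2) / sxx)
    return b, b / se_b


# Made-up values for illustration: weekly training hours and FTP in W/kg.
hours = [6, 8, 10, 12, 14, 16]
ftp_per_kg = [3.4, 3.7, 3.9, 4.3, 4.4, 4.8]
b, t = slope_t_stat(hours, ftp_per_kg)
print(f"slope = {b:.3f} W/kg per hour, t = {t:.1f}")
```

A coefficient whose t-statistic is close to zero, as reported for percentage body fat in the study, gives no evidence of a linear relationship with the target.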

Conclusions

The Orange toolkit turned out to be extremely helpful in identifying relationships that fed directly into the conclusions of an important medical paper highlighting potential health risks and performance drivers for high level cyclists. Restricting nutrition through diet or fasted rides can lead to low energy availability, which can cause endocrine responses in the body that reduce lumbar spine bone density, resulting in vulnerability to fracture and slow recovery. This is known as Relative Energy Deficiency in Sport (RED-S). Despite the obsession of many cyclists with reducing body fat, the key variable explaining functional threshold power watts/kg was weekly training load.

References

Low energy availability assessed by a sport-specific questionnaire and clinical interview indicative of bone health, endocrine profile and cycling performance in competitive male cyclists, BMJ Open Sport & Exercise Medicine, https://doi.org/10.1136/bmjsem-2018-000424

Relative Energy Deficiency in Sport, British Association of Sports and Exercise Medicine

Synergistic interactions of steroid hormones, British Journal of Sports Medicine

Cyclists: Make No Bones About It, British Journal of Sports Medicine

Male Cyclists: bones, body composition, nutrition, performance, British Journal of Sports Medicine

 

Fuelling for Cycling Performance

Chris Froome (LaPresse)

Some commentators were skeptical of Team Sky’s explanation for Chris Froome’s 80km tour-winning attack on stage 19 of the Giro. His success was put down to the detailed planning of nutrition throughout the ride, with staff positioned at strategic refuelling points along the entire route. If you consider how skeletal the riders look after two and a half weeks of relentless competition, along with the limits on what can be physically absorbed between stages, the nutrition story makes a lot of sense. Did Yates, Pinot and Aru dramatically fall by the wayside simply because they ran out of energy?

The best performing cyclists have excellent balancing skills. This includes the ability to match energy intake with energy demand. The pros benefit from teams of support staff monitoring every aspect of their nutrition and performance. However, many serious club-level cyclists pick up fads and snippets of information from social media or the cycling press that lead them to try out all kinds of ideas, in an unscientific manner, in the hope of achieving an improvement in performance. Some of these activities have potentially harmful effects on the body.

Competitive riders can become obsessed with losing weight and sticking to extremely tough training schedules, leading to both short-term and long-term energy deficits that are detrimental to both health and performance. One of the physiological consequences can be a reduction in bone density, which is particularly significant for cyclists, who do not benefit from gravitational stress on bones, due to the non-weight-bearing nature of the sport. In a recent paper, colleagues at Durham University and I describe an approach for identifying male cyclists at risk of Relative Energy Deficiency in Sport (RED-S).

You need a certain amount of energy simply to maintain normal life processes, but an athlete can force the body into a deficit in two ways: by intentionally or unintentionally restricting energy intake below the level required to meet demand or by increasing training load without a corresponding increase in fuelling.

Our bodies have a range of ways to deal with an energy deficit. For the average, slightly overweight casual cyclist, burning some fat is not a bad thing. However, most competitive cyclists are already very lean, making the physiological consequences of an energy deficit more serious. Changes arise in the endocrine system that controls the body’s hormones. Certain processes can shut down, such as female menstruation, and males can experience a reduction in testosterone. Sex steroids are important for maintaining healthy bones. In our study of 50 male competitive cyclists, the average bone density in the lumbar spine, measured by DXA scan, was significantly below normal. Some relatively young cyclists had the bones of a 70-year-old man!

The key variable associated with poor bone health was low energy availability, i.e. male cyclists exhibiting RED-S. These riders were identified using a questionnaire followed by an interview with a Sports Endocrinologist. The purpose of the interview was to go through the responses in more detail, as most people have a tendency to put a positive spin on their answers. There were two important warning signs.

  • Long-term energy deficit: a prolonged significant weight reduction to achieve “race weight”
  • Short-term energy deficit: one or more fasted rides per week

Among riders with low energy availability, bone density was not so bad for those who had previously engaged in a weight-bearing sport, such as running. For cyclists with adequate energy availability, those with very low levels of vitamin D had weaker bones. Across the 50 cyclists, most had vitamin D levels below the level of 90 nmol/L recommended for athletes, including some who were taking vitamin D supplements, but clearly not enough. Studies have shown that the advantages of athletes taking vitamin D supplements include better bone health, improved immunity and stronger muscles, so why wouldn’t you?

In terms of performance, British Cycling race category was positively related to a rider’s power to weight ratio, evaluated by 60 minute FTP per kg (FTP60/kg). Out of all the measured variables, including questionnaire responses, blood tests, bone density and body composition, the strongest association with FTP60/kg was the number of weekly training hours. There was no significant relationship between percentage body fat and FTP60/kg. So if you want to improve performance, rather than starving yourself in the hope of losing body fat, you are better off getting on your bike and training with adequate fuelling.

Cyclists using power meters have the advantage of knowing exactly how many calories they have used on every ride. In addition to taking on fuel during the ride, especially when racing, the greatest benefits accrue from having a recovery drink and some food immediately after completing rides of more than one hour.

For those wishing to know more about RED-S, the British Association of Sports and Exercise Medicine has provided a web resource.

A related blog will explore the machine learning and statistical techniques used to analyse the data for this study.

References

Low energy availability assessed by a sport-specific questionnaire and clinical interview indicative of bone health, endocrine profile and cycling performance in competitive male cyclists, BMJ Open Sport & Exercise Medicine, https://doi.org/10.1136/bmjsem-2018-000424

Relative Energy Deficiency in Sport, British Association of Sports and Exercise Medicine

Synergistic interactions of steroid hormones, British Journal of Sports Medicine

Cyclists: Make No Bones About It, British Journal of Sports Medicine

Male Cyclists: bones, body composition, nutrition, performance, British Journal of Sports Medicine