Creating artistic images from Strava rides

firstimage
Four laps of Richmond Park

When you upload a ride, Strava draws a map using the longitude and latitude coordinates recorded by your GPS device. This article explores ways in which these numbers, along with other metrics, can be used to create interesting images that might have some artistic merit.

The idea was motivated by the huge advances made in the field of Deep Learning, particularly applications for image recognition. However, since datasets come in all shapes and forms, researchers have explored ways of converting different types of data into images.  In a paper published in 2015, the authors achieved success in identifying standard time series by converting them into images.

GPS bike computers typically record snapshots of information every second. What kind of images could these time series generate? It turns out that there are several ways to convert a time series into an image.

Spectrogram

Creating a spectrogram is a standard approach from signal processing that is particularly useful for analysing acoustic files. The spectrogram is a heat map that shows how the underlying frequencies contributing to the signal change over time. Technically, it is derived by calculating the discrete Fourier transform of a window that slides across the time series. I applied this to my regular Saturday morning club ride of four laps around Richmond Park. The image changes a bit once the ride gets going after about 1200 seconds (20 minutes), but, frankly, the result was not particularly illuminating. There is no obvious reason to consider cycling power data as a superposition of frequencies.

spectrogram
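
For readers who want to try this, a sliding-window Fourier transform can be sketched in a few lines of NumPy. This is only an illustration on a synthetic power trace: the window length, step size, Hann taper and the test signal are all assumptions, not the settings used for the image above.

```python
import numpy as np

def sliding_spectrogram(series, window=256, step=64):
    """Return a (frequencies x time) array of spectral magnitudes."""
    x = np.asarray(series, dtype=float)
    taper = np.hanning(window)  # softens the edges of each window
    columns = []
    for start in range(0, len(x) - window + 1, step):
        segment = x[start:start + window]
        segment = (segment - segment.mean()) * taper
        columns.append(np.abs(np.fft.rfft(segment)))
    return np.array(columns).T

# Synthetic one-hour trace sampled once per second (assumed data):
# gentle surges at first, then sharper efforts after 1200 seconds
t = np.arange(3600)
power = 150 + 30 * np.sin(2 * np.pi * t / 60.0)
power[1200:] += 80 * np.sin(2 * np.pi * t[1200:] / 20.0)
spec = sliding_spectrogram(power)
```

Plotting `spec` as a heat map, with time along the horizontal axis, gives the spectrogram.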

Ah! Now we are getting somewhere

The authors of the referenced paper took a different approach, producing images called the Gramian Angular Summation Field (GASF), Gramian Angular Difference Field (GADF) and Markov Transition Field (MTF). Read the paper if you want to know the details. I created these and something called a Recurrence Plot. All of these methods generate a matrix by combining every element in the time series with every other element. The underlying observations occurring at times t_{1} and t_{2} determine the colour of the pixel at position (t_{1}, t_{2}). Images are symmetric along the lower-left to upper-right diagonal, apart from GADF, which is antisymmetric.
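
As a rough sketch of the idea, the two Gramian fields follow from rescaling the series to [-1, 1], taking the arccosine to get an angle for each observation, and then combining every pair of angles. The recurrence plot here is a simple thresholded version with an assumed threshold of 0.1 on the rescaled values; the MTF is left out for brevity. The function names are illustrative and a constant series is not handled.

```python
import numpy as np

def rescale(series):
    """Linearly rescale a series to the range [-1, 1]."""
    x = np.asarray(series, dtype=float)
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1

def gramian_angular_fields(series):
    """Return (GASF, GADF) for a one-dimensional series."""
    x = np.clip(rescale(series), -1.0, 1.0)   # guard against rounding error
    phi = np.arccos(x)                         # angle for each observation
    gasf = np.cos(phi[:, None] + phi[None, :])
    gadf = np.sin(phi[:, None] - phi[None, :])
    return gasf, gadf

def recurrence_plot(series, threshold=0.1):
    """1 where two observations are closer than the threshold, else 0."""
    x = rescale(series)
    distance = np.abs(x[:, None] - x[None, :])
    return (distance < threshold).astype(int)

# Small synthetic example: two cycles of a sine wave
x = np.sin(np.linspace(0, 4 * np.pi, 64))
gasf, gadf = gramian_angular_fields(x)
rp = recurrence_plot(x)
```

The symmetry described above can be checked directly: `gasf` equals its transpose, while `gadf` equals minus its transpose.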

Let’s see how they look for four laps of Richmond Park. We have six time series, with corresponding sets of images below. The segmentation of the images is due to the periodicity of the data. This is particularly clear in the geographic data (longitude, latitude and altitude). The higher intensity of the main part of the ride is most obvious in the heart rate data. The MTF plots are quite interesting. Scroll down through the images to the next section.

data1
Raw time series of power, heart rate, cadence, longitude, latitude and altitude
gasf
Gramian Angular Sum Field
gadf
Gramian Angular Difference Field
mtf
Markov Transition Field
rp
Recurrence Plot

From cycle ride to art

It is one thing to create an image of each item, but how can we combine these to summarise a ride in a single image? I considered two methods of combining time series into a single image: a) create a new image where the vertical and horizontal axes represent different series and b) create a new image by simply adding the corresponding values from two underlying images.

One problem is that some cyclists don’t have gadgets like heart rate monitors and power meters, so I initially restricted myself to just the longitude, latitude and altitude data. Nevertheless, as noted in an earlier blog, it is possible to work out speed, because the time interval is one second between each reading. Furthermore, one can estimate power, from the speed and changes in elevation.

Another problem is that rides differ in length. For this I split the ride into, say, 128 intervals and took the last observation in each interval. So for a 3 hour ride, I’d be sampling about once every 84 seconds.
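
The resampling step can be sketched as follows, using only the standard library. The function name is illustrative, and spreading any remainder evenly across the intervals is an assumption about how uneven splits are handled.

```python
def downsample_last(series, n_intervals=128):
    """Split a series into n_intervals contiguous chunks and keep the last value of each."""
    n = len(series)
    if n < n_intervals:
        raise ValueError("series is shorter than the number of intervals")
    # Integer boundaries spread any remainder evenly across the chunks
    ends = [((k + 1) * n) // n_intervals for k in range(n_intervals)]
    return [series[end - 1] for end in ends]

# A 3-hour ride recorded once per second; each value is its own timestamp
ride = list(range(3 * 3600))
samples = downsample_last(ride)
```

Here the first sample kept is the reading at index 83, confirming a spacing of about 84 seconds.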

The chart at the top of this blog was created by first normalising each series to a standard range (-1, +1). Method a) was used to create two images: longitude was added to latitude and altitude was multiplied by speed. These were added using method b). Using these measures will produce pretty much the same chart each time the ride is done. In contrast, an image that is totally unique to the ride can be produced using data relating to the individual rider. The image below uses the same recipe to combine speed, heart rate, power and cadence. If this had been a particularly special ride, the image would be a nice personal memento.

lastimage
A different take on four laps of Richmond Park
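
One possible reading of the recipe is sketched below. It assumes that method a) places one series along the rows and the other along the columns, combining them with a chosen operation, and that method b) is a plain element-wise sum. The random-walk inputs are stand-ins for real ride data, so the result will not resemble the images in this article.

```python
import numpy as np

def normalise(series):
    """Rescale a series to the range [-1, +1]."""
    x = np.asarray(series, dtype=float)
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1

def cross_image(a, b, op):
    """Method a): pixel (i, j) combines a[i] with b[j] using op."""
    return op(normalise(a)[:, None], normalise(b)[None, :])

# Stand-in data: four random walks of 128 samples each (assumed, not real rides)
rng = np.random.default_rng(0)
n = 128
longitude, latitude, altitude, speed = (rng.normal(size=n).cumsum() for _ in range(4))

position = cross_image(longitude, latitude, np.add)     # longitude added to latitude
effort = cross_image(altitude, speed, np.multiply)      # altitude multiplied by speed
image = position + effort                               # method b)
```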

For anyone interested in the underlying code, I have posted a Jupyter notebook here.

References

Encoding Time Series as Images for Visual Inspection and Classification Using Tiled Convolutional Neural Networks, Wang Z, Oates T, https://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/viewFile/10179/10251

 

Machine learning for a medical study of cyclists

Screen Shot 2018-10-11 at 15.28.46

This blog provides a technical explanation of the analysis underlying the medical paper about male cyclists described previously. Part of the skill of a data scientist is to choose from the arsenal of machine learning techniques the tools that are appropriate for the problem at hand. In the study of male cyclists, I was asked to identify significant features of a medical data set. This article describes how the problem was tackled.

Data

Fifty road racing cyclists, riding at the equivalent of British Cycling 2nd category or above, were asked to complete a questionnaire, provide a blood sample and undergo a DXA scan, a low intensity X-ray used to measure bone density and body composition. I used Python to load and clean up the data, so that all the information could be represented in Pandas DataFrames. As expected, this time-consuming but essential step required careful attention and cross-checking, combined with the perseverance that is always necessary to be sure of working with a clean data set.

The questionnaire included numerical data and text relating to cycling performance, training, nutrition and medical history. As a result of interviewing each cyclist, a specialist sports endocrinologist identified a number of individuals who were at risk of low energy availability (EA), due to a mismatch between nutrition and training load.

Bone density was measured throughout the body, but the key site of interest was the lumbar spine (L1-L4). Since bone density varies with age and between males and females, it was logical to use the male, age-adjusted Z-score, expressing values in standard deviations above or below the comparable population mean.

The measured blood markers were provided in the relevant units, alongside the normal range. Since the normal range is defined to cover 95% of the population, I assumed that the population could be modelled by a Gaussian distribution, with the normal range spanning roughly 1.96 standard deviations either side of the mean, in order to convert each blood result into a Z-score. This aligned the scale of the blood results with the bone density measures.
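
A minimal sketch of this conversion, assuming the quoted normal range is the central 95% of a Gaussian population centred on the middle of the range. The function name and the example marker values are hypothetical.

```python
from statistics import NormalDist

def z_from_normal_range(value, low, high, coverage=0.95):
    """Z-score of a result, treating [low, high] as the central `coverage` of a Gaussian."""
    half_width_in_sd = NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    mean = (low + high) / 2
    sd = (high - low) / (2 * half_width_in_sd)
    return (value - mean) / sd

# Hypothetical marker with a quoted normal range of 50 to 150 units
z = z_from_normal_range(40, 50, 150)
```

A result sitting exactly on the upper limit of the range maps to a Z-score of about +1.96, and one at the midpoint maps to zero. Markers with skewed distributions would need a different treatment.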

Analysis

I decided to use the Orange machine learning and data visualisation toolkit for this project. It was straightforward to load the data set of 46 features for each of the 50 cyclists. The two target variables were lumbar spine Z-score (bone health) and 60 minute FTP watts per kilo (performance). The statistics confirmed the researchers’ suspicion that the lumbar spine bone density of the cyclists would be below average, partly due to the non-weight-bearing nature of the sport. Some of the readings were extremely low (verging on osteoporosis) and the question was why.

Given the relatively small size of the data set (a sample of 50), the most straightforward approach for identifying the key explanatory variables was to search for an optimal Decision Tree. Interestingly, low EA turned out to be the most important variable in explaining lumbar spine bone density, followed by prior participation in a weight-bearing sport and levels of vitamin D (which were, in most cases, below the ideal level for athletes). Since I had used all the data to generate the tree, I made use of Orange’s data sampler to confirm that these results were highly robust. This had some similarities with the Random Forest approach. Although Orange produces some simple graphical tools like the following, I used Python to generate my own versions for the final publication.

 

Finding a robust decision tree is one thing, but it was essential to verify whether the decision variables were statistically significant. For this, Orange provides box plots for discrete variables. For my own peace of mind, I recalculated all of the Student’s T-statistics to confirm that they were correct and significant. The charts below show an example of an Orange box plot and the final graphic used in the publication.
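
This kind of cross-check can be sketched with the standard library alone. The example uses the pooled-variance form of the two-sample Student's t-statistic, which is an assumption about the variant used, and the numbers are made up for illustration; they are not data from the study.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Two-sample Student's t-statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    pooled = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / (na + nb - 2)
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Made-up lumbar spine Z-scores for two groups (illustrative only)
low_ea = [-2.1, -1.8, -2.5, -1.9, -2.2]
adequate_ea = [-0.6, -1.0, -0.3, -0.9, -0.7]
t_stat, dof = student_t(low_ea, adequate_ea)
```

The t-statistic is then compared against the critical value for the given degrees of freedom to judge significance.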

The Orange toolkit includes other nice data visualisation tools. I particularly liked the flexibility available to make scatter plots. This inspired the third figure in the publication, which showed the most important variable explaining performance. This chart highlights a cluster of three cyclists with low EA, whose FTP watts/kg were lower than expected, based on their high training load. I independently checked the T-statistics of the regression coefficients to identify relationships that were significant, like training load, or insignificant, like percentage body fat.

Conclusions

The Orange toolkit turned out to be extremely helpful in identifying relationships that fed directly into the conclusions of an important medical paper highlighting potential health risks and performance drivers for high level cyclists. Restricting nutrition through diet or fasted rides can lead to low energy availability, which can cause endocrine responses in the body that reduce lumbar spine bone density, resulting in vulnerability to fracture and slow recovery. This is known as Relative Energy Deficiency in Sport (RED-S). Despite the obsession of many cyclists with reducing body fat, the key variable explaining functional threshold power watts/kg was weekly training load.

References

Low energy availability assessed by a sport-specific questionnaire and clinical interview indicative of bone health, endocrine profile and cycling performance in competitive male cyclists, BMJ Open Sport & Exercise Medicine, https://doi.org/10.1136/bmjsem-2018-000424

Relative Energy Deficiency in Sport, British Association of Sports and Exercise Medicine

Synergistic interactions of steroid hormones, British Journal of Sports Medicine

Cyclists: Make No Bones About It, British Journal of Sports Medicine

Male Cyclists: bones, body composition, nutrition, performance, British Journal of Sports Medicine

 

Strava – Automatic Lap Detection

Screen Shot 2018-08-04 at 16.30.58
Opening Laps of Hillingdon Race

As you upload your data, you accumulate a growing history of rides. It is helpful to find ways of classifying different types of activities. Races and training sessions often include laps that are repeated during the ride. Many GPS units can automatically record laps as you pass the point where you began your ride or last pressed the lap button. However, if the laps were not recorded on the device, it is tricky to recover them. This article investigates how to detect laps automatically.

First consider the simple example of a 24 lap race around the Hillingdon cycle circuit. Plotting the GPS longitude and latitude against time displays repeating patterns. It is even possible to see the “omega curve” in the longitude trace. So it should be possible to design an algorithm that uses this periodicity to calculate the number of laps.

Screen Shot 2018-08-03 at 19.07.16

This is a common problem in signal processing, where the Fourier Transform offers a neat solution. This effectively compares the signal against all possible frequencies and returns values with the best fit in the form of a power spectrum. In this case, the frequencies correspond to the number of laps completed during the race. In the bar chart below, the power spectrum for latitude shows a peak around 24. The high value at 25 probably shows up because I stopped my Garmin slightly after the finish line. A “harmonic” also shows up at 49 “half laps”. Focussing on the peak value, it is possible to reconstruct the signal using a frequency of 24, with all others filtered out.

Screen Shot 2018-08-03 at 19.20.38
Screen Shot 2018-08-03 at 19.24.53
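
The lap-counting step can be sketched with NumPy's FFT. The example below runs on a synthetic latitude trace (an assumption, standing in for the Hillingdon data) with exactly 24 laps in an hour, finds the peak of the power spectrum and rebuilds the signal from that single frequency.

```python
import numpy as np

def dominant_lap_count(signal, max_laps=100):
    """Return the cycle count with the most spectral power, plus the power spectrum."""
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x - x.mean())) ** 2
    # Index k of the spectrum corresponds to k full cycles over the whole recording
    k_max = min(max_laps, len(spectrum) - 1)
    return int(np.argmax(spectrum[1:k_max + 1])) + 1, spectrum

# Synthetic latitude trace: 24 laps in one hour, plus a weaker second harmonic
t = np.arange(3600)
latitude = (51.5
            + 0.002 * np.sin(2 * np.pi * 24 * t / 3600)
            + 0.0005 * np.sin(2 * np.pi * 48 * t / 3600))
laps, spectrum = dominant_lap_count(latitude)

# Rebuild the signal from the peak frequency only, with all others filtered out
coeffs = np.fft.rfft(latitude - latitude.mean())
filtered = np.zeros_like(coeffs)
filtered[laps] = coeffs[laps]
reconstructed = np.fft.irfft(filtered, n=len(latitude)) + latitude.mean()
```

On real data the peak is rarely this clean, because the recording seldom contains a whole number of laps.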

So we’re done: we can use a Fourier Transform to count the laps! Well, not quite. The problem is that races and training sessions do not necessarily start and end at exactly the starting point of a lap. As a second example, consider my regular Saturday morning club run, where I ride from home to the meeting point at the centre of Richmond Park, then complete four laps before returning home. As shown in the chart below, a simple Fourier Transform approach suggests that the ride covered 5 laps, because, by chance, the combined time for me to ride south to the park and north back home almost exactly matches the time to complete a lap of the park. Visually it is clear that the repeating pattern only holds for four laps.

Screen Shot 2018-08-03 at 19.35.07

Although it seems obvious where the repeating pattern begins and ends, the challenge is to improve the algorithm to find this automatically. A brute force method would compare every GPS location with every other location on the ride, which would involve about 17 million comparisons for this ride. You would then need to exclude the points recorded shortly before or after each reading, depending on the speed of the rider. Furthermore, calculating the distance between two GPS points involves the haversine formula, which accounts for the curvature of the Earth.

Fortunately, two tricks can make the calculation more tractable. Firstly, the peak in the power spectrum indicates roughly how far ahead of the current time point to look for a location potentially close to the current position. Given a generous margin of, say, 15% variation in lap times, this reduces the number of comparisons by a whole order of magnitude. Secondly, since we are looking for points that are very close together, we only need to multiply the longitudes by the cosine of the latitude (because lines of longitude meet at the poles) and then a simple Euclidean sum of the squares of the differences locates points within a desired proximity of, say, 10 metres. This provides a quicker way to determine the points where the rider was “lapping”. These are shaded in yellow in the upper chart and shown in red on a longitude/latitude plot below. The orange line on the upper chart shows, on the right hand scale, the rolling lap time, i.e. the number of seconds to return to each point on the lap, from which the average speed can be derived.

Screen Shot 2018-08-03 at 20.26.18
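
The two tricks can be sketched as follows. This is an illustration rather than the code behind the charts: it assumes one reading per second, uses the cosine of the mean latitude for the whole ride, and tests it on a synthetic circular lap of 500 metres radius. The function names and the test circuit are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0

def to_local_metres(lat_deg, lon_deg):
    """Flat projection in metres; longitudes are scaled by the cosine of the mean latitude."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    x = EARTH_RADIUS_M * lon * np.cos(lat.mean())
    y = EARTH_RADIUS_M * lat
    return x, y

def rolling_lap_times(lat_deg, lon_deg, lap_seconds, tolerance=0.15, radius_m=10.0):
    """Seconds until the rider is next within radius_m of each point, or NaN if never."""
    x, y = to_local_metres(lat_deg, lon_deg)
    n = len(x)
    lo = int(lap_seconds * (1 - tolerance))
    hi = int(lap_seconds * (1 + tolerance))
    result = np.full(n, np.nan)
    for i in range(n):
        # Only look roughly one lap ahead, as suggested by the power spectrum peak
        j = np.arange(i + lo, min(i + hi, n - 1) + 1)
        if j.size == 0:
            continue
        d2 = (x[j] - x[i]) ** 2 + (y[j] - y[i]) ** 2   # squared Euclidean distance
        best = np.argmin(d2)
        if d2[best] <= radius_m ** 2:
            result[i] = j[best] - i
    return result

# Synthetic ride: four identical 600-second laps of a circle with a 500 m radius
t = np.arange(2400)
angle = 2 * np.pi * t / 600
lat = 51.44 + np.degrees(500 * np.sin(angle) / EARTH_RADIUS_M)
lon = -0.27 + np.degrees(500 * np.cos(angle) / (EARTH_RADIUS_M * np.cos(np.radians(51.44))))
times = rolling_lap_times(lat, lon, lap_seconds=600)
```

Points with a finite value in `times` are those where the rider was lapping, and the values themselves are the rolling lap times.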

Two further refinements were required to make the algorithm more robust. One might ask whether it makes a difference using latitude or longitude. If the lap involved riding back and forth along a road that runs due East-West, the laps would show up on longitude but not latitude. This can be solved by using a 2-dimensional Fourier Transform and checking both dimensions. This, in turn, leads to the second refinement, exemplified by the final example of doing 12 ascents of the Nightingale Lane climb. The longitude plot includes the ride out to the West, 12 reps and the Easterly ride back home.

Screen Shot 2018-08-03 at 20.34.02

The problem here was that the variation in longitude/latitude on the climb was tiny compared with the overall ride. Once again, the repeating section is obvious to the human eye, but more difficult to unpick from its relatively low peak in the power spectrum. A final trick was required: to consider the amplitude of each frequency in decreasing order of power and look out for any higher frequency peaks that appear early on the list. This successfully identified the relevant part of the ride, while avoiding spurious observations for rides that did not include laps.

The ability of an algorithm to tag rides that include laps is helpful for classifying different types of sessions. Automatically marking the laps would allow riders and coaches to compare laps against each other over a training session or a race. A potential AI-powered robo-coach could say “Ah, I see you did 12 repeats in your session today… and apart from laps 9 and 10, you were getting progressively slower….”

 

Strava Power Curve

Screen Shot 2018-05-11 at 16.34.08
Comparing Historic Power Curves

If you use a power meter on Strava premium, your Power Curve provides an extremely useful way to analyse your rides. In the past, it was necessary to perform all-out efforts, in laboratory conditions, to obtain one or two data points and then try to estimate a curve. But now your power meter records every second of every ride. If you have sustained a number of all-out efforts over different time intervals, your Power Curve can tell you a lot about what kind of rider you are and how your strengths and weaknesses are changing over time.

Strava provides two ways to view your Power Curve: a historical comparison or an analysis of a particular ride. Using the Training drop-down menu, as shown above, you can compare two historic periods. The curves display the maximum power sustained over time intervals from 1 second to the length of your longest ride. The times are plotted on a log scale, so that you can see more detail for the steeper part of the curve. You can select desired time periods and choose between watts or watts/kg.
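
To make the definition concrete, a power curve can be computed from a second-by-second power file as the best rolling average for each duration. The sketch below is not Strava's implementation; it assumes one reading per second with no gaps, and the ride is synthetic.

```python
import numpy as np

def power_curve(power, durations):
    """Best average power (watts) sustained for each duration in seconds."""
    p = np.asarray(power, dtype=float)
    cumulative = np.concatenate(([0.0], np.cumsum(p)))
    curve = {}
    for d in durations:
        if d > len(p):
            continue  # the ride is too short for this duration
        window_sums = cumulative[d:] - cumulative[:-d]
        curve[d] = window_sums.max() / d
    return curve

# Synthetic ride: a steady 200 W for an hour with one 30-second sprint at 600 W
ride = np.full(3600, 200.0)
ride[1000:1030] = 600.0
curve = power_curve(ride, [1, 5, 30, 60, 300, 1200, 3600])
```

In this example the curve is flat at 600 W out to 30 seconds, drops to 400 W at one minute and approaches the ride average for the full hour. A historical curve takes the maximum of these values across all rides in the period.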

The example above compares the last six weeks against the year to date. It is satisfying to see that the six week curve is at, or very close to, the year to date high, indicating that I have been hitting new power PBs (personal bests) as the racing season picks up. The deficit in the 20-30 minute range indicates where I should be focussing my training, as this would be typical of a breakaway effort. The steps on the right hand side result from having relatively few very long rides in the sample.

Note how the Power Curve levels off over longer time periods: there was a relatively small drop from my best hour effort of 262 watts to 243 watts for more than two hours. This is consistent with the concept of a Critical Power that can be sustained over a long period. You can make a rough estimate of your Functional Threshold Power by taking 95% of your best 20 minute effort or by using your best 60 minute effort, though the latter is likely to be lower, because your power would tend to vary quite a bit due to hills, wind, drafting etc., unless you did a flat time trial. Your 60 minute normalised power would be better, but Strava does not provide a weighted average/normalised power curve. An accurate current FTP is essential for a correct assessment of your Fitness and Freshness.

Switching the chart to watts/kg gives a profile of what kind of rider you are, as explained in this Training Peaks article. Sprinters can sustain very high power for short intervals, whereas time trial specialists can pump out the watts for long periods. Comparing myself against the performance table, my strengths lie in the 5 minutes to one hour range, with a lousy sprint.

Screen Shot 2018-05-11 at 17.19.45.png
Single Ride Power Curve versus Historic

The other way to view your Power Curve comes under the analysis of a particular ride. This can be helpful in understanding the character of the ride or for checking that training objectives have been met. The target for the session above was to do 12 reps on a short steep hill. The flat part of the curve out to about 50 seconds represents my best efforts. Ideally, each repetition would have been close to this. Strava has the nice feature of highlighting the part of the course where the performance was achieved, as well as the power and date of the historic best. The hump on the 6-week curve at 1:20 occurred when I raced some club mates up a slightly longer steep hill.

If you want to analyse your Power Curve in more detail, you should try Golden Cheetah. See other blogs on Strava Fitness and Freshness, Strava Ride Statistics or going for a Strava KOM.

 

Cycling Data Science – building models

 

Screen Shot 2017-12-24 at 21.19.31.png

In the previous blog, I explored the structure of a data set of summary statistics from over 800 rides recorded on my Garmin device. The K-means algorithm was an example of unsupervised learning that identified clusters of similar observations without using any identifying labels. The Orange software, used previously, makes it extremely easy to compare a number of simple models that map a ride’s statistics to its type: race, turbo trainer or just a training ride. Here we consider Decision Trees, Random Forests and Support Vector Machines.

Decision Trees

Perhaps the most basic approach is to build a Decision Tree. The algorithm finds an efficient way to make a series of binary splits of the data set, in order to arrive at a set of criteria that separates the classes, as illustrated below.

Tree
Decision Tree

The first split separates the majority of training rides from races and turbo trainer sessions, based on an average speed of 35.8km/h. Then Average Power Variance helped identify races, as observed in the previous blog. After this, turbo trainer sessions seemed to have a high level of TISS Aerobicity, which relates to the percentage of effort done aerobically. Pedalling balance, fastest 500m and duration separated the remaining rides. An attractive way to display these decisions is to create a Pythagorean Tree, where the sides of each triangle relate to the number of observations split by each decision.

Screen Shot 2017-12-24 at 16.32.02
Pythagorean Tree

Random Forests

Many alternative sets of decisions could separate the data, where any particular tree can be quite sensitive to specific observations. A Random Forest addresses this issue by creating a collection of different decision trees and choosing the class by majority vote. This is the Pythagorean Forest representation of 16 trees, each with six branches.

Pythagorean1
Pythagorean Forest

Support Vector Machines

A Support Vector Machine (SVM) is a widely used model for solving this kind of categorisation problem. The training algorithm finds an efficient way to slice the data, that largely separates the categories, while allowing for some overlap. The points that are closest to the slices are called support vectors. It is tricky to display the results in such a high dimensional space, but the following scatter plot displays Average Power Variance versus Average Speed, where the support vectors are shown as filled circles.

SVM
Support Vectors shown as filled circles

Comparison of results

A Confusion Matrix provides a convenient way to compare the accuracy of the models. This tabulates the predictions against the actual category labels. Out of the 809 rides, only 684 were labelled. The Decision Tree incorrectly labelled 20 races and 7 turbos as training rides. The Random Forest did the best job, with only six misclassifications, while the SVM made 11 errors.
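
For anyone unfamiliar with the idea, a confusion matrix simply counts each combination of actual and predicted label. The sketch below uses seven made-up rides, not the real results, with rows for the actual class and columns for the predicted class.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]

labels = ["race", "training", "turbo"]
# Made-up labels and predictions for seven rides (illustrative only)
actual = ["race", "race", "training", "turbo", "training", "race", "turbo"]
predicted = ["race", "training", "training", "turbo", "training", "race", "training"]

matrix = confusion_matrix(actual, predicted, labels)
# Every off-diagonal entry is a misclassification
errors = sum(matrix[i][j] for i in range(3) for j in range(3) if i != j)
```

In this toy example, one race and one turbo session are wrongly predicted as training rides, giving two errors.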

Looking at the classification errors can be very informative. It turns out that the two training rides classified as races by the SVM had been accidentally mislabelled: they were in fact races! Furthermore, looking at the five races that the SVM classified as training rides, I punctured in one, I crashed in another and in a third race, I was dropped from the lead group, but eventually rolled in a long way behind with a grupetto. The Random Forest also found an alpine race where my Garmin battery failed and classified it as a training ride. So the misclassifications were largely understandable.

After correcting the data set for mislabelled rides, the Random Forest improved to just two errors and the SVM dropped to just eight errors. The Decision Tree deteriorated to 37 errors, though it did recognise that the climbing rate tends to be zero on a turbo training session.

Prediction

Having trained three models, we can take a look at the sample of 125 unlabelled rides. The following chart shows the predictions of the Random Forest model. It correctly identified one race and suggested several turbo trainer sessions. The SVM also found another race.

asapv
Random Forest predictions of unlabelled rides

Conclusions

Several lessons can be learned from these experiments. Firstly, it is very helpful to start with a clean data set. But if this is not the case, looking at the misclassified results of a decent model can be useful in catching mislabelled data. The SVM seemed to be good for this task, as it had more flexibility to fit the data than the Decision Tree, but it was less prone to overfit the data than the Random Forest.

The Decision Tree was helpful in quickly identifying average speed and power variance (chart below) as the two key variables. The SVM and Random Forest were both pretty good, but less transparent. One might improve on the results by combining these two models.

apv
Distribution of APV (large peak at zero is where no power was recorded for ride)

The next blog will explore this topic further.

 

Cycling Data Science – clusters

Screen Shot 2017-12-11 at 13.38.30

Data Science is a hot topic that is impacting a range of diverse areas from business to sport. With so many cyclists collecting and uploading their data, there is plenty of raw material from which to draw interesting insights. This is the first in a series of articles exploring applications of data science in the field of cycling, beginning with the concept of clustering.

As a data set, I took all my Garmin files covering 2014-2017. Having previously uploaded them onto Golden Cheetah (GC), I took advantage of the API that allows external programmes, such as Python, to retrieve data. I also used a Python library to download the same rides from Strava, where I had recorded additional information about the rides.

After a certain amount of (rather time-consuming) tidying up, I ended up with over 800 rides. Each ride had over 200 summary statistics calculated by GC, as well as other meta-data, such as whether the ride was a race or turbo session. The metrics included all the standard items, such as time, distance, speed, heart rate, power, elevation gain, TSS, normalised power, as well as more esoteric metrics like “Time expended when Power is above CP and W’ bal is between 50% and 75% of W'”. When each ride is represented by a point in 200-dimensional space, it is easy to be overwhelmed. As a coach or an informed rider, which metrics are the most meaningful? This is precisely where data science steps in.

I decided to use some open source machine learning and data visualisation software called Orange. This makes it very straightforward to set up simple pipelines using a toolbox of standard approaches, as illustrated above.

One of the first things to do was to ask the computer to look for clusters of rides with similar characteristics. Orange has a useful feature that finds informative projections of the data that can be displayed on a scatter plot. As a first cut, the K-means algorithm categorised the data into four clusters that were largely explained by the time of day and the duration of the ride.

Screen Shot 2017-12-11 at 16.34.22
Duration of ride (in seconds) versus Time of day (seconds since midnight)

Although this makes a pretty graph, it simply tells us that I start a lot of rides in the morning, but do quite a few in the afternoon and evening. The green cluster includes my longer rides that rather obviously have to start earlier in the day. The scale is annoyingly shown in seconds, so a duration of 18,000 would be a five hour ride. The blue band runs from about 1:30pm to about 6:30pm.

Grouping rides by time of day was not very helpful, so I filtered out that variable and searched again for rides that were similar in terms of effort. This made the results much more interesting. Distance and Average Power Variance (APV) were among the most informative metrics. The following scatter plot does a very good job of separating out races (shown in green), from normal rides and turbo trainer sessions (red). The points I did not have time to label are shown in grey.
Screen Shot 2017-12-12 at 19.33.40

Average Power Variance measures the mean power deviation with respect to its 30 second moving average. This will be high when power output is continually changing sharply, as it does on very short town centre courses or the Crystal Palace loop, where you are repeatedly sprinting out of corners. When racing on the Hillingdon and Dunsfold circuits or longer Surrey League routes, power is still much more variable than on a club ride. The band of Saturday club rides is very obvious at 53km: four laps of Richmond Park, with varying levels of APV depending on how aggressively the group was riding. You can also see that I quite often do only one or two laps, at about 19km and 30km. Short TTs and hill climb races tend to have less power variability. This was also the case on the endlessly long climbs encountered on the Haute Route. Lastly, turbo sessions have much lower APV because, even if target power levels vary, they tend to be sustained at the same level for each segment.

It is worth noting that APV is not correlated with the Variability Index, which is the ratio of normalised power to average power. APV is affected by continual changes in power output, whereas the Variability Index is strongly affected by power peaks, even if they are relatively few. The two power files below illustrate the difference.

Screen Shot 2017-12-11 at 17.39.55
Crit race: High APV Low VI

Screen Shot 2017-12-11 at 17.39.02
Three sprints: Low APV High VI
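
The contrast between the two metrics can be reproduced on synthetic data. In the sketch below, APV is taken to be the mean absolute deviation of power from its 30 second trailing moving average, which is one reading of the definition above and may differ from Golden Cheetah's exact calculation. Normalised power uses the usual recipe of a 30 second rolling average raised to the fourth power, averaged, then the fourth root taken, simplified here by allowing shorter windows at the start of the ride.

```python
import numpy as np

def moving_average(x, window=30):
    """Trailing moving average; early samples average over what is available."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    idx = np.arange(1, len(x) + 1)
    start = np.maximum(idx - window, 0)
    return (csum[idx] - csum[start]) / (idx - start)

def average_power_variance(power, window=30):
    """Mean absolute deviation of power from its moving average (assumed definition)."""
    p = np.asarray(power, dtype=float)
    return float(np.mean(np.abs(p - moving_average(p, window))))

def variability_index(power, window=30):
    """Normalised power divided by average power."""
    p = np.asarray(power, dtype=float)
    normalised = np.mean(moving_average(p, window) ** 4) ** 0.25
    return float(normalised / p.mean())

t = np.arange(3600)
# "Crit": alternating 10 seconds at 400 W and 10 seconds at 100 W
crit = np.where((t // 10) % 2 == 0, 400.0, 100.0)
# "Three sprints": steady 150 W with three one-minute efforts at 900 W
sprints = np.full(3600, 150.0)
for begin in (600, 1800, 3000):
    sprints[begin:begin + 60] = 900.0

apv_crit, vi_crit = average_power_variance(crit), variability_index(crit)
apv_sprints, vi_sprints = average_power_variance(sprints), variability_index(sprints)
```

With these made-up files, the crit has the higher APV but the lower Variability Index, while the three sprints show the opposite pattern.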

Conclusions

This analysis draws attention to Average Power Variance as a useful metric that is high for circuit and road races, but lower for TTs and long hilly races. The key observation for me is that relatively little of my training has a high APV.

The next part in this series zooms in on the races, to identify metrics associated with good and bad results.

Kings and Queens of the Mountains

Screen Shot 2017-11-09 at 18.40.09.png

I guess that most male cyclists don’t pay much attention to the women’s leaderboards on Strava. And if they do, it might just be to make some puerile remark about boys being better than girls. From a scientific perspective, the comparison of male and female times leads to some interesting analysis.

Assuming both men and women have read my previous blogs on choosing the best time, weather conditions and wind directions for the segment that suits their particular strengths, we come back to basic physics.

KOM or QOM time = Work done / Power = (Work against gravity + Drag x Distance + Rolling resistance x Distance) / (Mass x Watt/kg)

Of the three components of work done, rolling resistance tends to be relatively insignificant. On a very steep hill, most of the work is done against gravity, whereas on a flat course, aerodynamic drag dominates.

The two key factors that vary between men and women are mass and power to weight ratio (watts per kilo). A survey published by the ONS in 2010 rather shockingly reported that the average British man weighed 83.6kg, with women coming in at 70.2kg. This gives a male/female ratio of 1.19. KOM/QOM cyclists would tend to be lighter than this, but if we take 72kg and 60kg, the ratio is still 1.20.

Males generate more watts per kilogram due to having a higher proportion of lean muscle mass. Although power depends on many factors, including lungs, heart and efficiency of circulation, we can estimate the relative power to weight ratio by comparing the typical body composition of males and females. Feeding the ONS statistics into the Boer formula gives a lean body mass of 74% for men and 65% for women, resulting in a ratio of 1.13. This can be compared against the useful table on Training Peaks showing maximal power output in Watts/kg, for men and women, over different time periods and a range of athletic abilities. The table below is based on the rows showing world record performances and average untrained efforts. For world champion five minute efforts and functional threshold powers, the ratios are consistent with the lean mass ratio. It makes sense that the ratio should be higher for shorter efforts, where the male champions are likely to be highly muscular. Apparently the relative performance is precisely 1.21 for all durations in untrained people.

Screen Shot 2017-11-08 at 10.23.33

On a steep climb, where the work done against gravity dominates, the benefit of additional male muscle mass is cancelled by the fact that this mass must be lifted, so the difference in time between the KOM and the QOM is primarily due to relative power to weight ratio. However, being smaller, women suffer from the disadvantage that the inert mass of the bike represents a larger proportion of the total mass that must be raised against gravity. This effect increases with gradient, accounting for a time difference of up to 16% on the steepest of hills.

In contrast, on a flat segment, it comes down to raw power output, so men benefit from advantages in both mass and power to weight ratio. But power relates to the cube of the velocity, so the elapsed time scales inversely with the cube root of power. Furthermore, with smaller frames, women present a lower frontal area, providing a small additional advantage. So men can be expected to have a smaller time advantage of around 9%. In theory the advantage should continue to narrow as the gradient shifts downhill.
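The two limiting cases can be checked with a few lines of Python. This is only a rough sketch of the reasoning above, not the model used to draw the charts: the rider masses and the 1.13 power to weight ratio come from the text, whereas the 10kg bike and the 5% difference in frontal area are assumptions picked purely for illustration.

```python
# Masses and the 1.13 power to weight ratio are taken from the text.
MALE_MASS_KG = 72.0
FEMALE_MASS_KG = 60.0
POWER_TO_WEIGHT_RATIO = 1.13   # male W/kg divided by female W/kg

# Assumptions made for this sketch only (not stated in the text).
BIKE_MASS_KG = 10.0
FRONTAL_AREA_RATIO = 1.05      # male frontal area divided by female


def steep_climb_time_ratio():
    """QOM/KOM time ratio when all the work is done against gravity.

    time = (rider + bike) * g * height / (rider * W/kg),
    so g and the height of the climb cancel in the ratio.
    """
    female = (FEMALE_MASS_KG + BIKE_MASS_KG) / FEMALE_MASS_KG
    male = (MALE_MASS_KG + BIKE_MASS_KG) / MALE_MASS_KG
    return POWER_TO_WEIGHT_RATIO * female / male


def flat_time_ratio():
    """QOM/KOM time ratio when all the work is done against drag.

    Power scales with frontal area * speed cubed, so elapsed time
    scales with the cube root of (frontal area / power).
    """
    power_ratio = (MALE_MASS_KG / FEMALE_MASS_KG) * POWER_TO_WEIGHT_RATIO
    return (power_ratio / FRONTAL_AREA_RATIO) ** (1.0 / 3.0)


if __name__ == "__main__":
    print(f"steep climb: {steep_climb_time_ratio():.3f}")
    print(f"flat:        {flat_time_ratio():.3f}")
```

With these assumed values the ratios come out at roughly 1.16 on a steep climb and 1.09 on the flat, in line with the 16% and 9% figures quoted above. A heavier bike or a larger difference in frontal area would shift the results slightly.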

Theory versus practice

Strava publishes the KOM and QOM leaderboards for all segments, so it was relatively straightforward to check the basic model against a random selection of 1,000 segments across the UK. All leaderboards included at least 1,666 riders, with an overall average of 637 women and 5,030 men. One of the problems with the leaderboards is that they can be contaminated by spurious data, including unrealistic speeds or times set by groups riding together. To combat this, the average was taken of the top five times set on different dates, rather than simply the top KOM or QOM time.

The average segment length was just under 2km, up a gradient of 3%. The following chart plots the ratio of the QOM time to the KOM time versus gradient compared with the model described above. The red line is based on the lean body mass/world record holders estimate of 1.13, whereas the average QOM/KOM ratio was 1.32. Although there is a perceivable upward slope in the data for positive gradients, the model clearly does not fit the data.

Screen Shot 2017-11-09 at 17.54.43

Firstly, the points on the left hand side indicate that men go downhill much more fearlessly than women, suggesting a psychological explanation for the observations deviating from the model. Secondly, to make the model fit better for positive gradients, one of its two inputs has to change. There is no obvious reason to expect the weight ratio of male to female Strava riders to deviate from that of the general population, so this leaves only the relative power to weight ratio. According to the model, the QOM/KOM ratio should level off to the power to weight ratio for steep gradients. This seems to occur for a value of around 1.40, which is much higher than the previous estimates of 1.13 or the 1.21 for untrained people. How can we explain this?

A notable feature of the data set was that the sample of 1,000 Strava segments was completed by nearly eight times as many men as women. This, in turn, reflects the facts that there are more male than female cyclists in the UK and that men are more likely to upload, analyse, publicise and gloat over their performances than women.

Having more men than women inevitably means that the sample includes more high level male cyclists than equivalent female cyclists. So we are not comparing like with like. Referring back to the Training Peaks table of expected power to weight ratios, a figure of 1.40 suggests we are comparing women of a certain level against men of a higher category, for example, “very good” women against “excellent” men.

A further consequence of having far more men than women is that it is much more likely that the fastest male times were recorded in the ideal conditions described in my previous blogs listed earlier.

Conclusions

There is room for more women to enjoy cycling and this will push up the standard of performance of the average amateur rider. This would enhance the sport in the same way that the industry has benefited as more women have joined the workforce.

The fractal nature of GPS routes

The mathematician Benoît Mandelbrot once asked, “How long is the coast of Britain?” Paradoxically, the answer depends on the length of your measuring stick. Using a shorter ruler results in a longer total distance, because you take account of more minor details of the shape of the coastline. Extrapolating this idea, reducing the measurement scale down to take account of every grain of sand, the total length of the coast increases without limit.

This has an unexpected connection with the data recorded on a GPS unit. Cycle computers typically record position every second. When riding at 36km/h, a record is stored every 10 metres, but at a speed of 18km/h, a recording is made every 5 metres. So riding at a lower speed equates to measuring distances with a shorter ruler. When distance is calculated by triangulating between GPS locations, your riding speed affects the result, particularly when you are going around a sharp corner.

Consider two cyclists riding round a sharp 90-degree bend with a radius of 13m. The arc has a length of 20m, so the GPS has time to make four recordings for the rider doing 18km/h, but only two recordings for the rider doing 36km/h. The diagram below shows that the faster rider will have a record of position at each red dot, while the slower rider also has a reading for each green dot. Although the red and green distances match on the straight section, when it comes to the corner the total length of the red line segments is less than the total of the green segments. You can see this jagged effect if you zoom into a corner on the Strava map of your course. Both triangulated distances are shorter than the actual arc ridden.

Cornering.png

It is relatively straightforward to show that the triangulation method will underestimate both distance and speed by a factor of (2r/s) x sin(s/(2r)), where r is the radius of the corner in metres, s is speed in m/s and the angle is in radians. So the estimated length of the 20m arc for the fast rider is 19.4m ridden at a speed of 35.1km/h (2.5% underestimate), while the corresponding figures for the slower rider would be 19.8m at 17.9km/h (0.6% underestimate).
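The underestimation factor is easy to evaluate. The sketch below assumes one GPS fix per second, so the arc covered between fixes has length s and the recorded straight-line chord has length 2r x sin(s/(2r)). The function and parameter names are my own and are used only for illustration.

```python
import math


def triangulation_factor(radius_m, speed_ms, interval_s=1.0):
    """Ratio of the straight-line chord between GPS fixes to the arc ridden."""
    arc = speed_ms * interval_s                    # distance ridden between fixes
    chord = 2.0 * radius_m * math.sin(arc / (2.0 * radius_m))
    return chord / arc


if __name__ == "__main__":
    for kmh in (36.0, 18.0):
        factor = triangulation_factor(13.0, kmh / 3.6)
        print(f"{kmh:.0f}km/h: factor {factor:.4f}, "
              f"recorded speed {kmh * factor:.1f}km/h")
```

For the 13m corner this gives a factor of about 0.976 at 36km/h and 0.994 at 18km/h, which corresponds to the 2.5% and 0.6% underestimates quoted above.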

We might ask whether these underestimates are significant, given the error in locating real-time positions using GPS. Over the length of a ride, we should expect GPS errors to average out to approximately zero in all directions. However, triangulation underestimates distance on every corner, so these negative errors accumulate over the ride. Note that when the bike is stationary, any noise in the GPS position adds to the total distance calculated by triangulation. But guess what? This can only happen when you are not moving fast. The case remains that slower riders will show a longer total distance than faster riders.

The simple triangulation method described above does not take account of changes of elevation. This has a relatively small effect, except on the steepest gradients: even a 10% climb increases the distance by only 0.5%. In fact, the only reliable way to measure distance that accounts for corners and changes in altitude is to use a correctly-calibrated wheel-based device. Garmin’s GSC-10 speed and cadence monitor tracks the passage of magnets on the wheel and cranks, transmitting to the head unit via ANT+. This gives an accurate measure of ground speed, as long as the correct wheel size is used (and, of course, that changes with the type of tyre, air pressure, rider weight etc.).
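The size of the elevation effect follows from Pythagoras: for every metre travelled horizontally on a gradient g, the distance along the road is the square root of 1 + g squared. A minimal check, with a function name chosen just for this example:

```python
import math


def slope_distance_factor(gradient):
    """Distance along the road per unit of horizontal distance.

    gradient is rise over run, so a 10% climb is 0.10.
    """
    return math.sqrt(1.0 + gradient ** 2)


if __name__ == "__main__":
    for gradient in (0.05, 0.10, 0.20):
        extra = (slope_distance_factor(gradient) - 1.0) * 100.0
        print(f"{gradient:.0%} climb adds {extra:.2f}% to the distance")
```

A 10% climb gives a factor of about 1.005, the 0.5% mentioned above. Even a 20% ramp adds only around 2%.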

According to Strava Support, Garmin uses a hierarchy for determining distance. If you have a PowerTap hub, its distance calculation takes precedence. Next, if you have a GSC-10, its figure is used. Otherwise the GPS positions are used for triangulation. This means that, if you don’t have a PowerTap or a GSC-10 speed/cadence meter, your distance (and speed) measurements will be subject to the distortions described above.

But does this really matter? Well it depends on how “wiggly” a route you are riding. This can be estimated using Richardson’s method. The idea is that you measure the route using different sized rulers and see how much the total distance changes. The rate of change determines the fractal dimension, which we can take as the “wiggliness” of the route.

One way of approximating this method from your GPS data is, firstly, to add up all the distances between consecutive GPS positions, triangulating latitude and longitude. Then do the same using every other position. Then every fourth position, doubling the gap each time. If you happened to be riding at a constant 36km/h, this equates to measuring distance using a 10m ruler, then a 20m ruler, then a 40m ruler etc.
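The procedure can be sketched in Python as follows. This is not the code linked at the end of this blog, just a minimal illustration under a few assumptions of my own: positions are a list of (latitude, longitude) tuples in degrees, distances use the haversine formula, the “ruler” is taken as the average gap between the sampled positions, and the dimension D is estimated from the slope of log(length) against log(ruler), which Richardson’s relation gives as 1 - D.

```python
import math

EARTH_RADIUS_M = 6371000.0


def haversine_m(p, q):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def route_length(points, step):
    """Total length and mean ruler length using every step-th position."""
    sampled = points[::step]
    if (len(points) - 1) % step != 0:
        sampled = sampled + [points[-1]]   # always finish at the last position
    total = sum(haversine_m(a, b) for a, b in zip(sampled, sampled[1:]))
    return total, total / (len(sampled) - 1)


def fractal_dimension(points, doublings=6):
    """Estimate the dimension from gaps of 1, 2, 4, ... positions.

    Needs more than 2 ** (doublings - 1) positions.
    """
    xs, ys = [], []
    for k in range(doublings):
        total, ruler = route_length(points, 2 ** k)
        xs.append(math.log(ruler))
        ys.append(math.log(total))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Least-squares slope of log(length) against log(ruler)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return 1.0 - slope
```

A perfectly straight route returns a dimension of 1, because the measured length does not change with the ruler. The more the length shrinks as the ruler grows, the higher the estimate.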

Using this approach, the fractal dimension of a simple loop around the Surrey countryside is about 1.01, which is not much higher than a straight line of dimension 1. So, with just a few corners, the GPS triangulation error will be low. The Sella Ronda has a fractal dimension of 1.11, reflecting the fact that alpine roads have to follow the naturally fractal-like mountain landscape. Totally contrived routes can be higher, such as this one, with a fractal dimension of 1.34, making GPS triangulation likely to be pretty inaccurate – if you zoom in, lots of corners are cut.

In conclusion, if you ride fast around a wiggly course, your Garmin will experience non-relativistic length contraction. Having GPS does not make your wheel-based speed/cadence monitor redundant.

If you are interested in the code used for this blog, you can find it here.

Strava Fitness and Freshness

The last blog explored the statistics that Strava calculates for each ride. These feed through into the Fitness & Freshness chart provided for premium users. The aim is to show the accumulated effect of training through time, based on the Training-Impulse model originally proposed by Eric Banister and others in a rather technical paper published in 1976.

Strava gives a pretty good explanation of Fitness and Freshness. A similar approach is used on Training Peaks in its Performance Management Chart. On Strava, each ride is evaluated in terms of its Training Load, if you have a power meter, or a figure derived from your Suffer Score, if you just used a heart rate monitor. A training session has a positive impact on your long-term fitness, but it also has a more immediate negative effect in terms of fatigue. The positive impact decays slowly over time, so if you don’t keep up your training, you lose fitness. But your body is able to recover from fatigue more quickly.

The best time to race is when your fitness is high, but you are also sufficiently recovered from fatigue. Fitness minus fatigue provides an estimate of your form. The 1976 paper demonstrated a correlation between form and an elite swimmer’s times over 100m.
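The mechanics can be illustrated with a simple sketch. Strava does not publish its exact calculation here, so the following is an assumption-laden toy version: fitness and fatigue are both exponentially weighted averages of daily training load, with time constants of 42 and 7 days. Those two numbers are commonly quoted for this type of chart, but they are my assumption rather than something stated above.

```python
import math


def fitness_fatigue(daily_loads, fitness_days=42.0, fatigue_days=7.0):
    """Return a list of (fitness, fatigue, form) tuples, one per day.

    fitness_days and fatigue_days are assumed decay time constants.
    """
    keep_fitness = math.exp(-1.0 / fitness_days)
    keep_fatigue = math.exp(-1.0 / fatigue_days)
    fitness = fatigue = 0.0
    history = []
    for load in daily_loads:
        # Yesterday's value decays; today's load tops it up
        fitness = fitness * keep_fitness + load * (1.0 - keep_fitness)
        fatigue = fatigue * keep_fatigue + load * (1.0 - keep_fatigue)
        history.append((fitness, fatigue, fitness - fatigue))
    return history


if __name__ == "__main__":
    # Three weeks of steady training followed by a week of complete rest
    loads = [100.0] * 21 + [0.0] * 7
    history = fitness_fatigue(loads)
    for day in (20, 27):
        fitness, fatigue, form = history[day]
        print(f"day {day + 1}: fitness {fitness:.1f}, "
              f"fatigue {fatigue:.1f}, form {form:.1f}")
```

In this toy example, the week of rest costs only a small part of the accumulated fitness, while fatigue falls by well over half, so form improves sharply.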

The Fitness and Freshness chart is particularly useful if you are following a periodised training schedule. This approach is recommended by many coaches, such as Joe Friel. Training follows a series of cycles, building up fitness towards the season’s goals. A typical block of training includes a three week build-up, followed by a recovery week. This is reflected in a wave-like pattern in your Fitness and Freshness chart. Fitness rises over the three weeks of training impulses, but fatigue accumulates faster, resulting in a deterioration of form. However, fatigue drops quickly, while fitness is largely maintained during the recovery week, allowing form to peak.

In order to make the most of the Fitness and Freshness charts, it is important that you use an accurate current figure for your Functional Threshold Power. The best way to do this is to go and do a power test. It is preferable to follow a formal protocol that you can repeat, such as that suggested by British Cycling. Alternatively, Strava premium users can refer to the Strava Power Curve. You can either take your best effort over 1 hour or 95% of your best effort over 20 minutes. Or you can click on the “Show estimated FTP” button  and take the lower figure. In order for this to flow through into your Fitness and Freshness chart, you need to enter your 1 hour FTP into your personal settings, under “My Performance”.

Screen Shot 2018-05-08 at 15.14.00

The example chart at the top of this blog shows how my season has panned out so far. After taking a two week break before Christmas, I started a solid block of training in January. My recovery week was actually spent skiing (pretty hard), though this did not register on Strava because I did not use a heart rate monitor. So the sharp drop in fatigue at the end of January is exaggerated. Nevertheless, my form was positive for my first race on 4 February. Unfortunately, I was knocked off and smashed a few ribs, forcing me to take an unplanned two week break. By the time I was able to start riding tentatively, rather than starting from an elevated level, my fitness had deteriorated to December’s trough.

After a solid, but still painful, block of low intensity training in March, I took another “recovery week” on the slopes of St Anton. I subsequently picked up a cold that delayed the start of the next block of training, but I have incorporated some crit races into my plan, for higher intensity sessions. If you edit the activity and make the “ride type” a “race”, it shows up as a red dot on the chart. Barring accident and illness, the hope is to stick more closely to a planned four-week cycle going forward.

This demonstrates how Strava’s tools reveal the real-life difficulties of putting the theoretical benefits of periodisation into practice.

Related posts

Modelling Strava Fitness and Freshness

Supercompensating with Strava

See other blogs on Strava Power Curve, Strava Ride Statistics or going for a Strava KOM.

Strava Ride Statistics

If you ride with a power meter and a heart rate monitor, Strava’s premium subscription will display a number of summary statistics about your ride. These differ from the numbers provided by other software, such as Training Peaks. How do all these numbers relate to each other?

A tale of two scales

Over the years, coaches and academics have developed statistics to summarise the amount of physiological stress induced by different types of endurance exercise. Two similar approaches have gained prominence. Dr Andrew Coggan has registered the names of several measures used by Training Peaks. Dr Phil Skiba has developed a set of metrics used in the literature and by PhysFarm Training Systems. These and other calculations are available on Golden Cheetah’s excellent free software.

Although it is possible to line up metrics that roughly correspond to each other, the calculations are different and the proponents of each scale emphasise particular nuances that distinguish them. This makes it hard to match up the figures.

Here is an example for a recent hill session. The power trace is highly variable, because the ride involved 12 short sharp climbs.

Metric | Coggan (TrainingPeaks) | Skiba (Literature) | Strava
Power equivalent physiological cost of ride | Normalised Power 282 | xPower 252 | Weighted Avg Power 252
Power variability of ride | Variability Index 1.57 | Variability Index 1.41 | -
Rider’s sustainable power | Functional Threshold Power 312 | Critical Power 300 | FTP 300
Power cost / sustainable power | Intensity Factor 0.9 | Relative Intensity 0.84 | Intensity 0.84
Assessment of intensity and duration of ride | Training Stress Score 117 | BikeScore 101 | Training Load 100
Training Impulse based on heart rate | - | - | Suffer Score 56

Weighted Average Power

According to Strava, Weighted Average Power takes account of the variability of your power reading during a ride. “It is our best guess at your average power if you rode at the exact same wattage the entire ride.” That sounds an awful lot like Normalized Power, which is described on Training Peaks as “an estimate of the power that you could have maintained for the same physiological “cost” if your power output had been perfectly constant (e.g., as on a stationary cycle ergometer), rather than variable”. But it is apparent from the table above that Strava is calculating Skiba’s xPower.

The calculations of Normalized Power and xPower both smooth the raw power data, raise these observations to the fourth power, take the average over the whole ride and obtain the fourth root to give the answer.

Normalized Power or xPower = (Average(Psmoothed^4))^(1/4)

The only difference between the calculations is the way that smoothing accounts for the body’s physiological delay in reacting to rapid changes in pedalling power. Normalized Power uses a 30 second moving average, whereas xPower uses a “25 second exponential average”. According to Skiba, exponential decay is better than Coggan’s linear decay in representing the way the body reacts to changes in effort.
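Both calculations can be sketched in a few lines of Python, assuming a list of power readings taken once per second. This is an illustration rather than the official implementation of either metric: the handling of the first 30 seconds (averaging over whatever data is available) and the conversion of the 25 second time constant into a smoothing weight are my own simplifying assumptions.

```python
import math


def moving_average(power, window=30):
    """Rolling mean over the last `window` seconds (shorter at the start)."""
    smoothed, total = [], 0.0
    for i, watts in enumerate(power):
        total += watts
        if i >= window:
            total -= power[i - window]
        smoothed.append(total / min(i + 1, window))
    return smoothed


def exponential_average(power, time_constant=25.0):
    """Exponentially weighted mean with the given time constant in seconds."""
    alpha = 1.0 - math.exp(-1.0 / time_constant)   # assumed weighting
    smoothed, level = [], power[0]
    for watts in power:
        level += alpha * (watts - level)
        smoothed.append(level)
    return smoothed


def fourth_power_mean(smoothed):
    """Fourth root of the average of the fourth powers."""
    return (sum(p ** 4 for p in smoothed) / len(smoothed)) ** 0.25


def normalized_power(power):
    return fourth_power_mean(moving_average(power, 30))


def xpower(power):
    return fourth_power_mean(exponential_average(power, 25.0))


if __name__ == "__main__":
    # Synthetic intervals: one minute easy, one minute hard, ten times over
    ride = ([100.0] * 60 + [400.0] * 60) * 10
    print(f"average {sum(ride) / len(ride):.0f}W, "
          f"NP {normalized_power(ride):.0f}W, xPower {xpower(ride):.0f}W")
```

For a perfectly steady effort both functions simply return the average power. For the synthetic interval session, both come out well above the 250W average, with xPower the lower of the two.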

The following chart zooms into part of the hill reps session, showing the raw power output (in blue), moving average smoothing for Normalised Power (in green), exponential smoothing for xPower (in red), with heart rate shown in the background (in grey). Two important observations can be made. Firstly, xPower’s exponential smoothing is more highly correlated with heart rate, so it could be argued that it does indeed correspond more closely with the underlying physiological processes. Secondly, the smoothing used for xPower is less volatile, therefore xPower will always be lower than Normalized Power (because the fourth-power scaling is dominated by the highest observations).

Power

Why do both metrics take the watts and raise them to the fourth power? Coggan states that many of the body’s responses are “curvilinear”. The following chart is a good example, showing the rapid accumulation of blood lactate concentration at high levels of effort.

Screen Shot 2017-04-20 at 15.08.31

Plotting the actual data from a recent test on a log-log scale, I obtained a coefficient of between 3.5 and 4.7, for the relation between lactate level and watts. This suggests that taking the average of smoothed watts raised to the power 4 gives an indication of the average level of lactate in circulation during the ride.

The hill reps ride included multiple bouts of high power, causing repeated accumulation of lactate and other stress related factors. Both the Normalised Power of 282W and xPower of 252W were significantly higher than the straight average power of 179W. The variability index compares each adjusted power against average power, resulting in variability indices of 1.57 and 1.41 respectively. These are very high figures, due to the hilly nature of the session. For a well-paced time trial, the variability index should be close to 1.00.

Sustainable Power

It is important for a serious cyclist to have a good idea of the power that he or she can sustain for a prolonged period. Functional Threshold Power and Critical Power measure slightly different things. The emphasis of FTP is on the maximum power sustainable for one hour, whereas CP is the power theoretically sustainable indefinitely. So CP should be lower than FTP.

Strava allows you to set your Functional Threshold Power under your personal performance settings. The problem is that if Strava’s Weighted Average Power is based on Skiba’s xPower, it would be more consistent to use Critical Power, as I did in the table above. This is important because this figure is used to calculate Intensity and Training Load. If you follow Strava’s suggestion of using FTP, subsequent calculations will underestimate your Training Load,  which, in turn, impacts your Fitness & Freshness curves.

Intensity

The idea of intensity is to measure severity of a ride, taking account of the rider’s individual capabilities.  Intensity is defined as the ratio of the power equivalent physiological cost of the ride relative to your sustainable power. For Coggan, the Intensity Factor is NP/FTP; for Skiba the Relative Intensity is xPower/CP; and for Strava the Intensity is Weighted Average Power/FTP.

Training Load

An overall assessment of a ride needs to take account of the intensity and the duration of a ride. It is helpful to standardise this for an individual rider, by comparing it against a benchmark, such as an all-out one hour effort.

Coggan proposes the Training Stress Score, which takes the ratio of the work done at Normalised Power, scaled by the Intensity Factor, relative to one hour’s work at FTP, multiplied by 100. This is equivalent to the duration of the ride in hours, multiplied by the Intensity Factor squared, times 100. Skiba defines the BikeScore in the same way, as the ratio of the work done at xPower, scaled by the Relative Intensity, relative to one hour’s work at CP. And finally, Strava’s Training Load takes the ratio of the work done at Weighted Average Power, scaled by Intensity, relative to one hour’s work at FTP.

Note that for my hill reps ride, the BikeScore of 101 was considerably lower than the TSS of 117. Although my estimated CP is 12W lower than my FTP, xPower was 30W lower than NP. Using my CP as my Strava FTP, Strava’s Training Load is the same as Skiba’s BikeScore (otherwise I’d get 93).
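These scores all share the same arithmetic: the duration in hours, multiplied by the square of the intensity, times 100. The sketch below applies it to the figures from the table. The ride duration is not stated in this blog, so the 1.43 hours used here is an assumption, chosen because it is consistent with the published scores.

```python
def training_score(weighted_power, threshold_power, duration_hours):
    """Hours ridden x intensity squared x 100.

    An all-out hour at threshold power therefore scores 100.
    """
    intensity = weighted_power / threshold_power
    return 100.0 * duration_hours * intensity ** 2


if __name__ == "__main__":
    DURATION_HOURS = 1.43   # assumption: not given in the text

    tss = training_score(282, 312, DURATION_HOURS)          # NP against FTP
    bike_score = training_score(252, 300, DURATION_HOURS)   # xPower against CP
    load_ftp = training_score(252, 312, DURATION_HOURS)     # xPower against FTP
    print(f"TSS {tss:.0f}, BikeScore {bike_score:.0f}, "
          f"Training Load using FTP {load_ftp:.0f}")
```

Under that assumed duration, the three results round to 117, 101 and 93, matching the figures quoted above and showing how entering FTP rather than CP lowers Strava’s Training Load.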

Suffer Score

Strava’s Suffer Score was inspired by Eric Banister’s training-impulse (TRIMP) concept. It is derived from the amount of time spent in each heart rate zone, so it can be calculated for multiple sports. You can set your Strava heart rate zones in your personal settings, or just leave them on default, based on your maximum heart rate.

A non-linear relationship is assumed between effort and heart rate zone. Each minute in Zone 1, Endurance, is worth 12 seconds; Moderate Zone 2 minutes are worth 24 seconds; Zone 3 Tempo minutes are worth 45 seconds; Zone 4 Threshold minutes are worth 100 seconds; and Anaerobic Zone 5 minutes are worth 120 seconds. The Suffer Score is the weighted sum of minutes in each zone.
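As a rough illustration, the weighting can be written as a lookup table. The zone weights are those listed above; expressing the result in weighted minutes (dividing the seconds by 60) is my assumption, and Strava’s actual calculation may differ in its details.

```python
# Seconds credited for each minute spent in a heart rate zone (from the text)
ZONE_WEIGHT_SECONDS = {1: 12, 2: 24, 3: 45, 4: 100, 5: 120}


def suffer_score(minutes_in_zone):
    """Weighted sum of minutes in each zone, in weighted minutes.

    minutes_in_zone maps zone number (1 to 5) to minutes spent in that zone.
    """
    return sum(minutes * ZONE_WEIGHT_SECONDS[zone] / 60.0
               for zone, minutes in minutes_in_zone.items())


if __name__ == "__main__":
    # Hypothetical ride: mostly endurance with a few hard efforts
    ride = {1: 30, 2: 20, 3: 10, 4: 5, 5: 2}
    print(f"Suffer Score: {suffer_score(ride):.1f}")
```

The non-linearity is clear from the weights: a minute in Zone 5 counts for ten times as much as a minute in Zone 1.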

The next blog will comment on the Fitness & Freshness charts available on Strava Premium.