Kings and Queens of the Mountains

Screen Shot 2017-11-09 at 18.40.09.png

I guess that most male cyclists don’t pay much attention to the women’s leaderboards on Strava. And if they do it might just be to make some puerile remark about boys being better than girls. From a scientific perspective the comparison of male and female times leads to some interesting analysis.

Assuming both men and women have read my previous blogs on choosing the best time, weather conditions and wind directions for the segment that suits their particular strengths, we come back to basic physics.

KOM or QOM time = Work done / Power = (Work against gravity + Drag x Distance + Rolling resistance x Distance) / (Mass x Watt/kg)

Of the three components of work done, rolling resistance tends to be relatively insignificant. On a very steep hill, most of the work is done against gravity, whereas on a flat course, aerodynamic drag dominates.

The two key factors that vary between men and women are mass and power to weight ratio (watts per kilo).  A survey published by the ONS in 2010, rather shockingly reported that the average British man weighed 83.6kg, with women coming in at 70.2kg. This gives a male/female ratio of 1.19. KOM/QOM cyclists would tend to be lighter than this, but if we take 72kg and 60kg, the ratio is still 1.20.

Males generate more watts per kilogram due to having a higher proportion of lean muscle mass. Although power depends on many factors, including lungs, heart and efficiency of circulation, we can estimate the relative power to weight ratio by comparing the typical body composition of males and females. Feeding the ONS statistics into the Boer formula gives a lean body mass of 74% for men and 65% for women, resulting in a ratio of 1.13. This can be compared against the the useful table on Training Peaks showing maximal power output in Watts/kg, for men and women, over different time periods and a range of athletic abilities. The table is based on the rows showing world record performances and average untrained efforts.  For world champion five minute efforts and functional threshold powers, the ratios are consistent with the lean mass ratio. It makes sense that the ratio should be higher for shorter efforts, where the male champions are likely to be highly muscular. Apparently the relative performance is precisely 1.21 for all durations in untrained people.

Screen Shot 2017-11-08 at 10.23.33

On a steep climb, where the work done against gravity dominates, the benefit of additional male muscle mass is cancelled by the fact that this mass must be lifted, so the difference in time between the KOM and the QOM is primarily due to relative power to weight ratio. However, being smaller, women suffer from the disadvantage that the inert mass of bike represents a larger proportion of the total mass that must be raised against gravity. This effect increases with gradient. Accounting for a time difference of up to 16% on the steepest of hills.

In contrast, on a flat segment, it comes down to raw power output, so men benefit from advantages in both mass and power to weight ratio. But power relates to the cube of the velocity, so the elapsed time scales inversely with the cube root of power. Furthermore, with smaller frames, women present a lower frontal area, providing a small additional advantage. So men can be expected to have a smaller time advantage of around 9%. In theory the advantage should continue to narrow as the gradient shifts downhill.

Theory versus practice

Strava publishes the KOM and QOM leaderboards for all segments, so it was relatively straightforward to check the basic model against a random selection of 1,000 segments across the UK. All  leaderboards included at least 1,666 riders, with an overall average of 637 women and 5,030 men. One of the problems with the leaderboards is that they can be contaminated by spurious data, including unrealistic speeds or times set by groups riding together. To combat this, the average was taken of the top five times set on different dates, rather than simply to top KOM or QOM time.

The average segment length was just under 2km, up a gradient of 3%. The following chart plots the ratio of the QOM time to the KOM time versus gradient compared with the model described above. The red line is based on the lean body mass/world record holders estimate of 1.13, whereas the average QOM/KOM ratio was 1.32. Although there is a perceivable upward slope in the data for positive gradients, clearly this does not fit the data.

Screen Shot 2017-11-09 at 17.54.43

Firstly, the points on the left hand side indicate that men go downhill much more fearlessly than women, suggesting a psychological explanation for the observations deviating from the model. To make the model fit better for positive gradients, there is no obvious reason to expect the weight ratio of male to female Strava riders to deviate from the general population, so this leaves only the relative power to weight ratio. According to the model the QOM/KOM ratio should level off to the power to weight ratio for steep gradients. This seems to occur for a value of around 1.40, which is much higher than the previous estimates of 1.13 or the 1.21 for untrained people. How can we explain this?

A notable feature of the data set was that sample of 1,000 Strava segments was completed by nearly eight times as many men as women. This, in turn reflects the facts that there are more male than female cyclists in the UK and that men are more likely to upload, analyse, publicise and gloat over their performances than women.

Having more men than women, inevitably means that the sample includes more high level male cyclists than equivalent female cyclists. So we are not comparing like with like. Referring back to the Training Peaks table of expected power to weight ratios, a figure of 1.40 suggests we are comparing women of a certain level against men of a higher category, for example, “very good” women against “excellent” men.

A further consequence of having far more men than women is that is much more likely that the fastest times were recorded in the ideal conditions described in my previous blogs listed earlier.


There is room for more women to enjoy cycling and this will push up the standard of performance of the average amateur rider. This would enhance the sport in the same way that the industry has benefited as more women have joined the workforce.

The fractal nature of GPS routes

The mathematician, Benoît Mandelbrot, once asked “How long is the coast of Britain?“. Paradoxically, the answer depends on the length of your measuring stick. Using a shorter ruler results in a longer total distance, because you take account of more minor details of the shape of the coastline. Extrapolating this idea, reducing the measurement scale down to take account of every grain of sand, the total length of the coast increases without limit.

This has an unexpected connection with the data recorded on a GPS unit. Cycle computers typically record position every second. When riding at 36km/h, a record is stored every 10 metres, but at a speed of 18k/h, a recording is made every 5 metres. So riding as a lower speed equates to measuring distances with a shorter ruler. When distance is calculated by triangulating between GPS locations, your riding speed affects the result, particularly when you are going around a sharp corner.

Consider two cyclists riding round a sharp 90-degree bend with a radius of 13m. The arc has a length of 20m, so the GPS has time to make four recordings for the a rider doing 18km/h, but only two recordings for the rider doing 36km/h. The diagram below shows that the faster rider will have a record of position at each red dot, while the slower rider also has a reading for each green dot.  Although the red and green distances match on the straight section, when it comes to the corner the total length of the red line segments is less than the total of the green segments. You can see this jagged effect if you zoom into a corner on the Strava map of your course. Both triangulated distances are shorter than the actual arc ridden.

Cornering.pngIt is relatively straightforward to show that the triangulation method will underestimate both distance and speed by a factor of 2r/s*sin(s/2r), where r is the radius of the corner in metres and s is speed in m/s. So the estimated length of the 20m arc for the fast rider is 19.4m ridden at a speed of 35.1km/h (2.5% underestimate), while the corresponding figures for the slower rider would be 19.8m at 17.9km/h (0.6% underestimate).

We might ask whether these underestimates are significant, given the error in locating real-time positions using GPS. Over the length of a ride, we should expect GPS errors to average out to approximately zero in all directions. However, triangulation underestimates distance on every corner, so these negative errors accumulate over the ride. Note that when the bike is stationary, any noise in the GPS position adds to the total distance calculated by triangulation. But guess what? This can only happen when you are not moving fast. The case remains that slower riders will show a longer total distance than faster riders.

The simple triangulation method described above does not take account of changes of elevation. This has a relatively small effect, except on the steepest gradients, thus a 10% climb increases in distance by only 0.5%.  In fact, the only reliable way to measure distance that accounts for corners and changes in altitude is to use a correctly-calibrated wheel-based device. Garmin’s GSC-10 speed and cadence monitor tracks the passage of magnets on the wheel and cranks, transmitting to the head unit via ANT+. This gives an accurate measure of ground speed, as long as the correct wheel size is used (and, of course, that changes with the type of tyre, air pressure, rider weight etc.).

According to Strava Support, Garmin uses a hierarchy for determining distance. If you have a PowerTap hub, its distance calculation takes precedence. Next, if you have a GSC-10, its figure is used. Otherwise the GPS positions are used for triangulation. This means that, if you don’t have a PowerTap or a GSC-10 speed/cadence meter, your distance (and speed) measurements will be subject to the distortions described above.

But does this really matter? Well it depends on how “wiggly” a route you are riding. This can be estimated using Richardson’s method. The idea is that you measure the route using different sized rulers and see how much the total distance changes. The rate of change determines the fractal dimension, which we can take as the “wiggliness” of the route.

One way of approximating this method from your GPS data is, firstly, to add up all the distances between consecutive GPS positions,  triangulating latitude and longitude. Then do the same using every other position. Then every fourth position, doubling the gap each time. If you happened to be riding at a constant 36km/h, this equates to measuring distance using a 10m ruler, then a 20m ruler, then a 40m ruler etc..

Using this approach, the fractal dimension of a simple loop around the Surrey countryside is about 1.01, which is not much higher than a straight line of dimension 1. So, with just a few corners, the GPS triangulation error will be low. The Sella Ronda has a fractal dimension of 1.11, reflecting the fact that alpine roads have to follow the naturally fractal-like mountain landscape. Totally contrived routes can be higher, such as this one, with a fractal dimension of 1.34, making GPS triangulation likely to be pretty inaccurate – if you zoom in, lots of corners are cut.

In conclusion, if you ride fast around a wiggly course, your Garmin will experience non-relativistic length contraction. Having GPS does not make your wheel-based speed/cadence monitor redundant.

If you are interested in the code used for this blog, you can find it here.

Strava Fitness and Freshness

The last blog explored the statistics that Strava calculates for each ride. These feed through into the Fitness & Freshness chart provided for premium users. The aim is to show the accumulated effect of training through time, based on the Training-Impulse model originally proposed by Eric Banister and others in a rather technical paper published in 1976.

Strava gives a pretty good explanation of Fitness and Freshness. A similar approach is used on Training Peaks in its Performance Management Chart. On Strava, each ride is evaluated in terms of its Training Load, if you have a power meter, or a figure derived from your Suffer Score, if you just used a heart rate monitor. A training session has a positive impact on your long-term fitness, but it also has a more immediate negative effect in terms of fatigue. The positive impact decays slowly over time, so if you don’t keep up your training, you lose fitness. But your body is able to recover from fatigue more quickly.

The best time to race is when your fitness is high, but you are also sufficiently recovered from fatigue. Fitness minus fatigue provides an estimate of your form. The 1976 paper demonstrated a correlation between form and the performance of an elite swimmer’s times over 100m.

The Fitness and Freshness chart is particularly useful if you are following a periodised training schedule. This approach is recommended by many coaches, such as Joe Friel. Training follows a series of cycles, building up fitness towards the season’s goals. A typical block of training includes a three week build-up, followed by a recovery week. This is reflected in a wave-like pattern in Fitness and Freshness chart. Fitness rises over the three weeks of training impulses, but fatigue accumulates faster, resulting in a deterioration of form. However, fatigue drops quickly, while fitness is largely maintained during the recovery week, allowing form to peak.

The example chart above shows how my season has panned out so far. After taking a two week break before Christmas, I started a solid block of training in January. My recovery week was actually spent skiing (pretty hard), though this did not register on Strava because I did not use a heart rate monitor. So the sharp drop in fatigue at the end of January is exaggerated. Nevertheless, my form was positive for my first race on 4 February. Unfortunately, I was knocked off and smashed a few ribs, forcing me to take an unplanned two week break. By the time I was able to start riding tentatively, rather than starting from an elevated level, my fitness had deteriorated to December’s trough.

After solid, but still painful, block of low intensity training in March, I took another “recovery week” on the slopes of St Anton. I subsequently picked up a cold that delayed the start of the next block of training, but I have incorporated some crit races into my plan, for higher intensity sessions. If you edit the activity and make the “ride type” a “race”, it shows up as a red dot on the chart. Barring accident and illness, the hope is to stick more closely to a planned four-week cycle going forward.

This demonstrates how Strava’s tools reveal the real-life difficulties of putting the theoretical benefits of periodisation into practice.

Strava Ride Statistics

If you ride with a power meter and a heart rate monitor, Strava’s premium subscription will display a number of summary statistics about your ride. These differ from the numbers provided by other software, such as Training Peaks. How do all these numbers relate to each other?

A tale of two scales

Over the years, coaches and academics have developed statistics to summarise the amount of physiological stress induced by different types of endurance exercise. Two similar approaches have gained prominence. Dr Andrew Coggan has registered the names of several measures used by Training Peaks. Dr Phil Skiba has developed as set of metrics used in the literature and by PhysFarm Training Systems. These and other calculations are available on Golden Cheetah‘s excellent free software.

Although it is possible to line up metrics that roughly correspond to each other, the calculations are different and the proponents of each scale emphasise particular nuances that distinguish them. This makes it hard to match up the figures.

Here is an example for a recent hill session. The power trace is highly variable, because the ride involved 12 short sharp climbs.

Metric Coggan TrainingPeaks Skiba Literature Strava
Power equivalent physiological cost of ride Normalised Power 282 xPower 252 Weighted Avg Power 252
Power variability of ride Variability Index 1.57 Variability Index 1.41
Rider’s sustainable power Functional Threshold Power 312 Critical Power 300 FTP 300
Power cost / sustainable power Intensity Factor 0.9 Relative Intensity 0.84 Intensity 0.84
Assessment of intensity and duration of ride Training Stress Score 117 BikeScore 101 Training Load 100
Training Impulse based on heart rate Suffer Score 56

Weighted Average Power

According to Strava, Weighted Average Power takes account of the variability of your power reading during a ride. “It is our best guess at your average power if you rode at the exact same wattage the entire ride.” That sounds an awful lot like Normalized Power, which is described on Training Peaks as “an estimate of the power that you could have maintained for the same physiological “cost” if your power output had been perfectly constant (e.g., as on a stationary cycle ergometer), rather than variable”. But it is apparent from the table above that Strava is calculating Skiba’s xPower.

The calculations of Normalized Power and xPower both smooth the raw power data, raise these observations to the fourth power, take the average over the whole ride and obtain the fourth root to give the answer.

Normalized Power or xPower = (Average(Psmoothed4))1/4

The only difference between the calculations is the way that smoothing accounts for the body’s physiological delay in reacting to rapid changes in pedalling power. Normalized Power uses a 30 second moving average, whereas xPower uses a “25 second exponential average”. According to Skiba, exponential decay is better than Coggan’s linear decay in representing the way the body reacts to changes in effort.

The following chart zooms into part of the hill reps session, showing the raw power output (in blue), moving average smoothing for Normalised Power (in green), exponential smoothing for xPower (in red), with heart rate shown in the background (in grey). Two important observations can be made. Firstly, xPower’s exponential smoothing is more highly correlated with heart rate, so it could be argued that it does indeed correspond more closely with the underlying physiological processes. Secondly, the smoothing used for xPower is less volatile, therefore xPower will always be lower than Normalized Power (because the fourth-power scaling is dominated by the highest observations).


Why do both metrics take the watts and raise them to the fourth power? Coggan states that many of the body’s responses are “curvilinear”. The following chart is a good example, showing the rapid accumulation of blood lactate concentration at high levels of effort.

Screen Shot 2017-04-20 at 15.08.31

Plotting the actual data from a recent test on a log-log scale, I obtained a coefficient of between 3.5 and 4.7, for the relation between lactate level and watts. This suggests that taking the average of smoothed watts raised to the power 4 gives an indication of the average level of lactate in circulation during the ride.

The hill reps ride included multiple bouts of high power, causing repeated accumulation of lactate and other stress related factors. Both the Normalised Power of 282W and xPower of 252W were significantly higher than the straight average power of 179W. The variability index compares each adjusted power against average power, resulting in variability indices of 1.57 and 1.41 respectively. These are very high figures, due to the hilly nature of the session. For a well-paced time trial, the variability index should be close to 1.00.

Sustainable Power

It is important for a serious cyclist to have a good idea of the power that he or she can sustain for a prolonged period. Functional Threshold Power and Critical Power measure slightly different things. The emphasis of FTP is on the maximum power sustainable for one hour, whereas CP is the power theoretically sustainable indefinitely. So CP should be lower than FTP.

Strava allows you to set your Functional Threshold Power under your personal performance settings. The problem is that if Strava’s Weighted Average Power is based on Skiba’s xPower, it would be more consistent to use Critical Power, as I did in the table above. This is important because this figure is used to calculate Intensity and Training Load. If you follow Strava’s suggestion of using FTP, subsequent calculations will underestimate your Training Load,  which, in turn, impacts your Fitness & Freshness curves.


The idea of intensity is to measure severity of a ride, taking account of the rider’s individual capabilities.  Intensity is defined as the ratio of the power equivalent physiological cost of the ride relative to your sustainable power. For Coggan, the Intensity Factor is NP/FTP; for Skiba the Relative Intensity is xPower/CP; and for Strava the Intensity is Weighted Average Power/FTP.

Training Load

An overall assessment of a ride needs to take account of the intensity and the duration of a ride. It is helpful to standardise this for an individual rider, by comparing it against a benchmark, such as an all-out one hour effort.

Coggan proposes the Training Stress Score that takes the ratio the work done at Normalised Power, scaled by the Intensity Factor, relative to one hour’s work at FTP. Skiba defines the BikeScore as the ratio the work done at xPower, scaled by the Relative Intensity, relative to one hour’s work at CP. And finally, Strava’s Training Load takes the ratio the work done at Weighted Average Power, scaled by Intensity, relative to one hour’s work at FTP.

Note that for my hill reps ride, the BikeScore of 101, was considerably lower than the TSS of 117. Although my estimated CP is 12W lower than my FTP, xPower was 30W lower than NP. Using my CP as my Strava FTP, Strava’s Training Load is the same as Skiba’s Bike Score (otherwise I’d get 93).

Suffer Score

Strava’s Suffer Score was inspired by Eric Banister’s training-impulse (TRIMP) concept. It is derived from the amount of time spent in each heart rate zone, so it can be calculated for multiple sports. You can set your Strava heart rate zones in your personal settings, or just leave then on default, based on your maximum heart rate.

A non-linear relationship is assumed between effort and heart rate zone. Each minute in Zone 1, Endurance, is worth 12 seconds; Moderate Zone 2 minutes are worth 24 seconds; Zone 3 Tempo minutes are worth 45 seconds; Zone 4 Threshold minutes are worth 100 seconds; and Anaerobic Zone 5 minutes are worth 120 seconds. The Suffer Score is the weighted sum of minutes in each zone.

The next blog will comment on the Fitness & Freshness charts available on Strava Premium.