Hugging Face

I have been blown away exploring Hugging Face. It’s a community on a mission “to democratize good machine learning”. It provides access to a huge library of state-of-the-art models. So far I have only scratched the surface of what is available, but this blog gives a sample of things I have tried.

At the time of writing, there were 128,463 pre-trained models covering a huge range of capabilities, including computer vision, natural language processing, audio, tabular, multimodal and reinforcement models. The site is set up to make it incredibly easy to experiment with a demo, download a model, run it in a Jupyter notebook, fine-tune it for a specific task and then add it to the space of machine learning apps created by the community. For example, an earlier blog describes my FilmStars app.

Computer vision with text

This is an example from an app that uses the facebook/detr-resnet-50 model to identify objects in an image. It successfully located eight objects with high confidence (indicated by the numbers), but it was fooled into thinking part of the curved lamppost in front of the brickwork pattern was a tennis racket (you can see why).

Image-to-text models go further by creating captions describing what is in the image. I used an interactive demo to obtain suggested captions from a range of state-of-the-art models. The best result was produced by the GIT-large model, whereas a couple of models perceived a clocktower.

These models can also answer questions about images. Although all of the answers were reasonable, GIT-large produced the best response when I asked “Where is the cyclist?”

The next image is an example of text-based inpainting with CLIPSeg x Stable Diffusion, where I requested that the wall should be replaced with an apartment block. The model successfully generated a new image while preserving the cyclist, flowers, arch, background and even the birds on the roof. I had great fun with this app, imagining what my friend’s house will look like when it eventually emerges from a building site.

Continuing with the theme of image generation, I reversed the image-to-caption problem by asking a stable-diffusion-v1-5 model to generate an image from the caption “a cyclist rides away through an old brick archway in a city”. It came up with an image remarkably similar to what we started with, even including a female cyclist.

Do it yourself

Hugging Face provides various ways for you to download any of the models from its library. The easiest way to do this is to set up a free account on Kaggle, which offers a Jupyter notebook environment with access to a GPU.

Using a Hugging Face pipeline, you can run a model with three lines of Python code! Pipelines can be set up for the image models above, but this is an example of the code required to run a text-based natural language processing task. It creates and runs a pipeline that summarises text, using a model specifically trained to generate output in the style of SparkNotes.

from transformers import pipeline
summarizer = pipeline("summarization",model="pszemraj/long-t5-tglobal-base-16384-book-summary")
summarizer("""Sample text from a book...""")

This rather morbid sample text produced the output from Python that follows.

The fact that Henry Armstrong was buried did not seem to him to prove that he was dead: he had always been a hard man to convince. That he really was buried, the testimony of his senses compelled him to admit. His posture — flat upon his back, with his hands crossed upon his stomach and tied with something that he easily broke without profitably altering the situation — the strict confinement of his entire person, the black darkness and profound silence, made a body of evidence impossible to controvert and he accepted it without cavil.

But dead — no; he was only very, very ill. He had, withal, the invalid’s apathy and did not greatly concern himself about the uncommon fate that had been allotted to him. No philosopher was he — just a plain, commonplace person gifted, for the time being, with a pathological indifference: the organ that he feared consequences with was torpid. So, with no particular apprehension for his immediate future, he fell asleep and all was peace with Henry Armstrong.

But something was going on overhead. It was a dark summer night, shot through with infrequent shimmers of lightning silently firing a cloud lying low in the west and portending a storm. These brief, stammering illuminations brought out with ghastly distinctness the monuments and headstones of the cemetery and seemed to set them dancing. It was not a night in which any credible witness was likely to be straying about a cemetery, so the three men who were there, digging into the grave of Henry Armstrong, felt reasonably secure.

From One Summer Night by Ambrose Bierce
[{'summary_text': "Henry's body is buried in the cemetery, but it does not seem to make him any more certain that he is dead. Instead, he seems to be completely ill."}]
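The same pipeline pattern carries over to the image models mentioned earlier. The following is a minimal sketch rather than the code behind the demo app: it assumes the transformers library is installed (DETR also needs the timm and Pillow packages), and the helper name confident_detections, the 0.9 threshold and the file name street.jpg are illustrative choices that do not come from the original app.

```python
def confident_detections(results, threshold=0.9):
    """Keep detections scoring at least `threshold`, highest score first."""
    kept = [r for r in results if r["score"] >= threshold]
    return sorted(kept, key=lambda r: r["score"], reverse=True)


def detect_objects(image, threshold=0.9):
    """Run facebook/detr-resnet-50 on an image path or URL."""
    # Imported lazily so the filtering helper works without transformers.
    from transformers import pipeline

    detector = pipeline("object-detection", model="facebook/detr-resnet-50")
    # Each result is a dict with "score", "label" and "box" keys.
    return confident_detections(detector(image), threshold)


# Usage (downloads the model on first run; "street.jpg" is hypothetical):
# for d in detect_objects("street.jpg"):
#     print(f'{d["label"]}: {d["score"]:.2f} {d["box"]}')
```

Filtering on the score is one simple way to hide low-confidence guesses, although a mistaken label with a high score, such as the tennis racket above, would still get through.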

Having come this far, it takes only a few steps to fine-tune the model to match your desired task, put it into a GitHub repository and launch your own app as a fully fledged member of the Hugging Face community. A nice explanation is available at lesson 4.

Active Inference

Active Inference is a fascinating and ambitious book. It describes a very general normative approach to understanding the mind, brain and behaviour, hinting at potential applications in machine learning and the social sciences. The authors argue that the ways in which living beings interact with the environment can be modelled in terms of something called the free energy principle.

Active Inference builds on the concept of a Bayesian Brain. This is the idea that our brains continually refine an internal model of the external world, acting as probabilistic inference machines. The internal generative model continually predicts the state of the environment and compares its predictions with the inputs of sensory organs. When a discrepancy occurs, the brain updates its model. This is called perception.

But Active Inference goes further by recognising that living things can interact with their environments. Therefore an alternative way to deal with a discrepancy versus expectations is to do something that modifies the world. This is called action.

Variational Free Energy

Active Inference, Parr, Pezzulo, Friston

Either you change your beliefs to match the world or you change the world to match your beliefs. Active Inference makes this trade-off by minimising variational free energy, which improves the match between an organism’s internal model and the external world.

The theory is expressed in elegant mathematical terms that lend themselves to systematic analysis. Minimising variational free energy can be considered in terms of finding a maximum entropy distribution, minimising complexity or reducing the divergence between the internal model and the actual posterior distribution.
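For readers who want to see the three readings side by side, the standard decompositions of variational free energy can be written as follows. The notation is an assumption made for this sketch, not a quotation from the book: x denotes hidden states, y denotes observations, Q(x) is the organism's approximate posterior belief and P is its generative model.

```latex
\begin{aligned}
F[Q, y] &= \underbrace{-\mathbb{E}_{Q(x)}[\ln P(y, x)]}_{\text{energy}}
          - \underbrace{H[Q(x)]}_{\text{entropy}} \\
        &= \underbrace{D_{KL}[Q(x)\,\|\,P(x)]}_{\text{complexity}}
          - \underbrace{\mathbb{E}_{Q(x)}[\ln P(y \mid x)]}_{\text{accuracy}} \\
        &= \underbrace{D_{KL}[Q(x)\,\|\,P(x \mid y)]}_{\text{divergence}}
          - \underbrace{\ln P(y)}_{\text{log evidence}}
\end{aligned}
```

Because the divergence term can never be negative, lowering F both pulls Q towards the true posterior and tightens a bound on the log evidence.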

Expected free energy

Longer term planning is handled in terms of expected free energy. This is where the consequences of future sequences of actions (policies) are evaluated by predicting the outcomes at each stage. The expected free energy of each policy is converted into a score, with the highest score determining the policy the organism expects to pursue. The process of selecting policies that improve the match with the priors pertaining to favoured states is called learning.

Planning is cast in terms of Bayesian inference. Once again the algebraic framework lends itself to a range of interpretations. For example, it automatically trades off information gain (exploration) against pragmatic value (exploitation). This contrasts with reinforcement learning, which handles the issue more heuristically, by trial and error, combined with the notion of a reward.


The book describes applications in neurobiology, learning and perception. Although readers are encouraged to apply the ideas to new areas, a full understanding of the subject demands the dedication to battle through some heavy duty mathematical appendices, covering Bayesian inference, partially observed Markov Decision Processes and variational calculus.

Nevertheless the book is filled with thought provoking ideas about how living things thrive in the face of the second law of thermodynamics.

Eddy goes to Hollywood

Should Eddy Merckx win an Oscar? Could the boyish looks of Tadej Pogačar or Remco Evenepoel make it in the movies? Would Mathieu van der Poel’s chiselled chin or Wout van Aert’s strong features help them lead the cast in the next blockbuster? I built a FilmStars app to find out.

Building a deep learning model

Taking advantage of the fantastic deep learning library provided by, I downloaded and cleaned up 100 photos of IMDb’s top 100 male and female stars. Then I used a free GPU on Kaggle to fine-tune a pre-trained ResNet50 neural net architecture to identify movie stars from their photos. It took about 2 hours to obtain an accuracy of about 60%. There is no doubt that this model could be greatly improved, but I stopped at that point in the interest of time. After all, 60% is a lot better than the 0.5% obtained from random guessing. Then I used Hugging Face to host my app. The project was completed in two days with zero outlay for resources.

It is quite hard to identify movie stars, because adopting a different persona is part of the job. This means that actors can look very different from one photo to the next. They also get older: sadly, the sex bombs of the ’60s inevitably become ageing actresses in their sixties. So the neural network really had its work cut out to distinguish between 200 different actors, using a relatively small sample of data and only a short amount of training.

Breaking away

Creating the perfect film star identifier was never really the point of the app. The idea was to allow people to upload images to see which film stars were suggested. If you have a friend who looks like Ralph Fiennes, you can upload a photo and see whether the neural net agrees.

I tried it out with professional cyclists. These were the top choices.

Cyclist | Top film star match
Eddy Merckx | James Dean
Tadej Pogačar | Matt Damon
Remco Evenepoel | Mel Gibson
Mathieu van der Poel | Leonardo DiCaprio
Wout van Aert | Brad Pitt
Marianne Vos | Jodie Foster
Ashleigh Moolman | Marion Cotillard
Katarzyna Niewiadoma | Faye Dunaway
Anna van der Breggen | Brigitte Bardot
Cycling Stars

In each case I found an image of the top choice of film star for comparison.

The model was more confident with the male cyclists, though it really depends on the photo and even the degree of cropping applied to the image. The nice thing about the app is that people like to be compared to attractive film stars, though there are a few shockers in the underlying database. The model does not deal very well with beards and men with long hair. It is best to use a “movie star” type of image, rather than someone wearing cycling kit.

Closing the gap

Artwork: David Mitchell

One of the big differences between amateur and professional cycle racing is the role of the breakaway. Amateur racing usually consists of a succession of attacks until a group of strong riders breaks away and disappears into the distance to contest the win. There is rarely enough firepower left in the peloton to close down the gap.

This contrasts with professional racing, where a group of weaker riders typically contests for the breakaway, hoping that the pursuing teams will miscalculate their efforts to bring their leaders to the head of the race in the final kilometres. Occasionally a solo rider launches a last minute attack, forcing other riders to chase.

One minute for every ten kilometres

Much of the excitement for cycling fans is generated by the tension between the breakaway and the peloton, especially when the result of the race hangs in the balance until the final metres. Commentators often say that the break needs a lead of at least one minute for every ten kilometres before the finish line. Where does this rule of thumb come from?

It’s time for some back of the envelope calculations. On flat terrain, the breakaway may ride the final 10km of a professional race at about 50kph. A lead of one minute equates to a gap of 833m, which the peloton must close within the 12 minutes that it will take the breakaway riders to reach the finish line. This means the peloton must ride at 54.2kph, which is just over 8.3% faster than the riders ahead.

On a flat road power would be almost exclusively devoted to overcoming aerodynamic drag. The power required rises with the cube of velocity, so the power output of the chasing riders needs to be 27% higher. If the breakaway riders are pushing out 400W, the riders leading the chasing group need to be doing over 500W.
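The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch under the same assumptions as the text: constant speeds, a catch made exactly on the finish line, and power scaling with the cube of speed. The function name is illustrative.

```python
def chase_requirements(break_speed_kph, distance_km, lead_min):
    """Return (peloton speed in kph, speed ratio, power ratio) needed to
    catch the break exactly on the finish line."""
    time_left_h = distance_km / break_speed_kph    # time until the break finishes
    gap_km = break_speed_kph * lead_min / 60.0     # the lead expressed as a distance
    peloton_speed = (distance_km + gap_km) / time_left_h
    speed_ratio = peloton_speed / break_speed_kph
    power_ratio = speed_ratio ** 3                 # assumes power scales with speed cubed
    return peloton_speed, speed_ratio, power_ratio


speed, speed_ratio, power_ratio = chase_requirements(50, 10, 1)
print(f"Peloton speed: {speed:.1f} kph")
print(f"Extra speed:   {100 * (speed_ratio - 1):.1f}%")
print(f"Extra power:   {100 * (power_ratio - 1):.0f}%")
print(f"Chase power if the break holds 400W: {400 * power_ratio:.0f}W")
```

With a 50kph break, 10km to go and a one minute lead, this reproduces the 54.2kph chase speed, the 27% power premium and a chase power a little over 500W.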

The peloton has several advantages. Riding in the bunch saves a lot of energy, especially relative to the efforts of a small number of riders who have been in a breakaway all day. This means that many riders have energy reserves available to lift the pace at the end of the race. Teams are drilled to deploy these reserves efficiently by drafting behind the riders who are emptying themselves at the front of the chasing pack. Having the breakaway in sight provides a psychological boost as the gap narrows. The one minute rule suggests these benefits equate to a power advantage of around 25%.

Not for an uphill finish

If the race finishes on a long climb, a one minute lead is very unlikely to be enough for the break to stay away. Ascending at 25kph equates to a gap of only 417m and now the peloton has 24 minutes to make up the difference. This can be achieved by riding at 26kph. This is just 4% faster, requiring 13% higher power to overcome the additional aerodynamic drag. This would be about 450W, if the break is holding 400W.

The chasing peloton still has fresher riders, who may be able to see the break up the road, but they do not have the same drafting advantages when climbing at 26kph. The other big factor is gravity. The specialist climbers are able to put in strong accelerations on steep sections, quickly gaining on those ahead. They can climb faster than heavier riders at equivalent power.

If we take the same figure for the power advantage over the break as before, of around 25%, the break would need to have a lead of 1 minute 55 seconds as it passes the 10km banner. However, experience suggests that unless there is a very strong climber in the break, a much bigger time gap would be required for the break to stay away.
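The calculation can also be run in reverse, asking what lead the break needs for a given power advantage in the peloton. The sketch below keeps the same simplifications (constant speeds and the cube law, which is a rough approximation on a climb where gravity dominates); the function name is illustrative.

```python
def required_lead_min(break_speed_kph, distance_km, power_advantage):
    """Lead (in minutes) the break needs so that a peloton producing
    `power_advantage` times the break's power only catches it on the line."""
    speed_ratio = power_advantage ** (1.0 / 3.0)   # cube-law assumption, inverted
    time_left_min = 60.0 * distance_km / break_speed_kph
    # Extra distance covered by the peloton, expressed as time at the break's speed.
    return (speed_ratio - 1.0) * time_left_min


for label, speed in [("flat at 50kph", 50), ("climb at 25kph", 25)]:
    lead = required_lead_min(speed, 10, 1.25)
    print(f"{label}: {lead:.2f} minutes")
```

With a 25% power advantage this gives a little under one minute on the flat and a little under two minutes on the climb, broadly in line with the figures above. The exact result is sensitive to the power advantage assumed.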

Chasing downhill

This analysis also explains why it is very difficult to narrow a gap on a fast descent. Consider a 10km sweeping road coming down from an alpine pass to the valley. Riding at 60kph, a one minute gap equates to 1km. The peloton would have to average 66kph over the whole 10km in order to make the catch in the ten-minute descent. In spite of the assistance of gravity, the 10% higher speed converts into a 33% increase in the effect of drag, where riders begin to approach terminal velocity.

Amateur breaks

Amateurs do not have the luxury of a directeur sportif running a spreadsheet in a following team car. In fact you are lucky if anyone gives you an idea of a time gap. The best strategy is firstly to follow the attacks of the strongest riders in order to get into a successful break and then encourage your fellow breakaway riders, verbally and by example, to ride through and off, in order to establish a gap. As you get closer to the finish, you should assess the other riders in order to work out how you are going to beat them over the line.

Critical Power Model – energy and waste

The critical power model is one of the most useful tools for optimising race performance, but why does it work? The answer lies in the connection between the depletion of energy reserves and the accumulation of waste products.

Variation of W’ Balance over a race

A useful overview of the critical power model can be found in a paper by Clarke and Skiba. It applies well to cycling, where power can be measured directly, and to other sports where velocity can play the role of power. Critical power (CP) is the maximum power that an athlete can sustain for a long time without suffering fatigue. This measure of performance is closely related to other threshold values, including lactate threshold, gas exchange threshold, V̇O2max and functional threshold power (FTP). An advantage of CP is that it is directly related to performance and can be measured outside a laboratory.

The model is based on empirical observations of how long athletes can sustain levels of power, P, in excess of their personal CP. The time to exhaustion tends to be inversely proportional to the extent that P exceeds CP. This can be described by a simple formula, where excess power multiplied by time to exhaustion, t, is a constant, known as W’ (read as “W-prime”) or anaerobic work capacity.

(P − CP) × t = W’


Physics tells us that power multiplied by time is work (or energy). So the model suggests that there is a fixed reserve of energy that is available for use when we exceed our CP. For a typical athlete, this reserve is in the order of 20 to 30 kilojoules.

Knowing your personal CP and W’ is incredibly useful

Suppose you have a CP of 250W and a W’ of 21.6kJ. You are hoping to complete a 10 mile TT in 24 minutes. This means you can afford to deplete your W’ by 0.9kJ per minute, which equates to 900J in 60 seconds or a rate of 15W. Therefore your target power should be 15W above CP, i.e. 265W. By holding that power your W’ balance would slowly fall to zero over 24 minutes.
Theoretically, you could burn through your entire W’ by sprinting at 1250W for 21.6 seconds.
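These two calculations follow directly from the relationship between excess power, time and W’. A minimal sketch, with illustrative function names and all quantities in watts, joules and seconds:

```python
def target_power(cp_watts, w_prime_joules, duration_s):
    """Constant power that drains W' to zero exactly at `duration_s`."""
    return cp_watts + w_prime_joules / duration_s


def time_to_exhaustion(cp_watts, w_prime_joules, power_watts):
    """Seconds until W' is exhausted at a constant power above CP."""
    if power_watts <= cp_watts:
        raise ValueError("power must exceed CP for W' to be depleted")
    return w_prime_joules / (power_watts - cp_watts)


print(target_power(250, 21_600, 24 * 60))     # 265.0
print(time_to_exhaustion(250, 21_600, 1250))  # 21.6
```

The model only describes efforts above CP, which is why the second function rejects any power at or below it.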

Replenishing W’

While it may be possible to maintain constant power on a flat TT or on a steady climb, most race situations involve continual changes of speed. A second aspect of the critical power model is that W’ is slowly replenished as soon as your power drops below CP. The rate of replenishment varies between individuals, but it has a half-time of the order of 3.5 minutes, on gentle recovery.

This means that in a race situation, W’ can recover after an initial drop. By hiding in the peloton and drafting behind other riders, your W’ can accumulate sufficiently to mount a blistering attack, of precisely known power and duration. The chart above, generated in Golden Cheetah, shows the variation of my W’ balance during a criterium race, where I aimed to hit zero in the final sprint. You can even download an app onto your Garmin head unit that measures W’ in real time. It is great for criterium racing, but becomes less accurate in longer races if you fail to take on fuel at the recommended rate.
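The depletion and replenishment of W’ can be sketched in a few lines. This is a deliberately simplified model and not the algorithm used by Golden Cheetah: it assumes one power sample per second and a fixed recovery half-time of 3.5 minutes whenever power is at or below CP, whereas published implementations typically make the recovery rate depend on how far power falls below CP.

```python
import math


def w_prime_balance(powers, cp, w_prime, half_time_s=210.0, dt=1.0):
    """Track W' balance (joules) over a series of power samples (watts).

    Above CP the balance falls by (P - CP) * dt. At or below CP the depleted
    part recovers exponentially with the given half-time (an assumed,
    fixed-rate simplification).
    """
    decay = math.exp(-math.log(2) * dt / half_time_s)
    balance = w_prime
    history = []
    for p in powers:
        if p > cp:
            balance -= (p - cp) * dt
        else:
            balance = w_prime - (w_prime - balance) * decay
        history.append(balance)
    return history


# One minute at 450W followed by 3.5 minutes of easy riding at 150W.
ride = [450] * 60 + [150] * 210
history = w_prime_balance(ride, cp=250, w_prime=21_600)
print(f"After the effort:   {history[59]:.0f} J")
print(f"After the recovery: {history[-1]:.0f} J")
```

In this example the effort uses 12kJ of the 21.6kJ reserve, and half of that deficit is restored after one half-time of recovery.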


Although I am completely convinced that the critical power model works very well in race situations, I have always had a problem with the idea that W’ is some kind of magical energy reserve that only becomes available when my power exceeds CP. Is there a special biological label that says this glycogen is reserved only for use when power exceeds CP?

One possible answer is that energy is produced largely by the aerobic system up to CP, but above that level, the anaerobic system has to kick in to produce additional power, hence the name anaerobic work capacity. That sounds reasonable, but the aerobic system burns a mix of two fuels, fat and glucose, while the anaerobic system burns only glucose. The glucose is derived from carbohydrates, stored in the liver and muscles in the form of glycogen. But it is all the same glucose, whether it is used aerobically or anaerobically. The critical power model seems to imply that there is a special reserve of glucose that is held back for anaerobic use. How can this be?

The really significant difference between the two energy systems is that the byproducts of aerobic metabolism are water and exhaled CO2, whereas anaerobic glycolysis produces lactic acid, which dissociates into H+ ions and lactate. Note that two H+ ions are produced from every glucose molecule. The lactate can be used as a fuel, but the accumulation of H+ ions presents a problem, by reducing the pH in the cells and making the blood more acidic. It is the H+ ions rather than the lactate that cause the burning sensation in the muscles.

The body is well equipped to deal with a drop in pH in the blood, in order to prevent the acidity from causing essential proteins to denature. Homeostasis is maintained by buffering agents, such as zwitterions, that mop up the H+ ions. However, if you keep producing more H+ ions by furiously burning glucose anaerobically, the cell environment becomes increasingly hostile, with decreasing levels of intramuscular phosphocreatine and rising inorganic phosphate. The muscles eventually shut down because they simply can’t absorb the oxygen required to maintain the flux of ATP. There is also a theory that a “central governor” in the brain forces you to stop before too much damage ensues.

You don’t “run out of energy”; your muscles drown in their own waste products

It is acknowledged that the magnitude of the W′ might also be attributed to the accumulation of fatigue-related metabolites, such as H+ and Pi and extracellular K+.

Jones et al

If you reach the point of exhaustion due to an accumulation of deleterious waste products in the muscles, why do we talk about running out of energy? And what does this have to do with W’?

Firstly note that CP represents the maximum rate of aerobic exertion, at which the body is able to maintain steady state. Oxygen, inhaled by the lungs, is transported to the muscles and the CO2 byproduct is exhaled. Note that the CO2 causes some acidity in the blood, but this is comfortably managed by the buffering agents.

The connection between H+ ions and energy is evident in the following simple chemical formula for anaerobic glycolysis. Each glucose molecule produces two lactate ions and two H+ ions, plus energy.

C6H12O6 → 2 CH3CH(OH)CO2− + 2 H+ + Energy

This means that the number of H+ ions is directly proportional to energy. A W’ of 21.6kJ equates to a precise number of excess H+ ions being produced anaerobically. If you maintain power above CP, the H+ ions accumulate, until the muscles stop working.

If you reduce power below CP, you do not accumulate a magic store of additional energy. What really happens is that your buffering systems slowly reduce the accumulated H+ ions and other waste products. This means you are able to accommodate additional H+ ions next time you exceed CP and the number of H+ ions equates to the generation of a specific amount of energy that can be conveniently labeled W’.


W’ or anaerobic work capacity acts as a convenient, physically meaningful and measurable proxy for the total accumulated H+ ions and other waste products that your muscles can accommodate before exhaustion is reached. When racing, as in life, it is always a good idea to save energy and reduce waste.


Overview: Rationale and resources for teaching the mathematical modeling of athletic training and performance, David C. Clarke and Philip F. Skiba

Detailed analysis: Critical Power: Implications for Determination of V̇O2max and Exercise Tolerance, Andrew Jones et al

Implementation: W’bal its implementation and optimisation, Mark Liversedge

Wasteland – my message to COP26

Click on image to play


The sun went down
On a barren old town
Just a grave of degradation.
Once was green
Now a desolate scene
Not a blade of vegetation.
Cutting down trees anywhere you please
Is it really human nature?

The heat is on
As the desert moves on
In world of mass migration.
Floods and storms
Setting up new norms,
Butterfly wing causation.
Hurricanes blow and the night skies glow
In a primal scream of nature.

You never walked in the wasteland
You never came face to face, man.
But I knew.
We all knew…
…You knew too.

Electric cars
Never getting too far
In a green revolution.
SpaceX guy
Putting rockets in the sky
Spilling out more pollution.
Will getting on a plane, ever feel the same
When you stop and think about the future?

What a good day
For the prophets to say
The levels of the seas are rising.
Does anyone care
When they’re blowing hot air?
Is anyone compromising?
When it’s hotter each day our children pay
With a volatile mixed-up future.

You never walked in the wasteland
You never came face to face, man.
But I knew.
We all knew…
…You knew too.

Tadej Pogacar

Click on the image to hear some music I composed. It was featured on The Cycling Podcast for Stage 14 of the Tour de France 2021. The tune is free to download.

These are the lyrics.

The boy prince

Who’s the guy in yellow up the road?
He’s a kind of mellow looking dude
Who’s the guy in yellow up the road?
Motivated fellow looking good.

Mathieu van der Poel
You were digging deep
But guess who’s on a roll
Tadej Pogacar
We know who you are
Tadej Pogacar
You’re a superstar!

Faster in the TT than a train
Riding up the Giant of Provence
Winner of a grand tour once again?
Boy prince of the tour: princeling of the Tour de France.

Mathieu van der Poel
You were digging deep
But guess who’s on a roll
Tadej Pogacar
We know who you are
Tadej Pogacar
You’re a superstar!

Who’s the guy in yellow riding up the road?
He’s a kind of mellow gentle-looking dude
Who’s the guy in yellow rolling up the road?
Motivated fellow: he’s sure looking good
Faster in the TT faster than a train
Riding up Mont Ventoux, Giant of Provence
Winner of a grand tour, in Paris once again?
Boy prince of the tour: princeling of the Tour de France.

Supercompensating with Strava

Supercompensation sounds like a reference to an investment banker’s salary, but in fact it describes the body’s ability to adapt positively to a training stimulus. The idea is to attain a higher level of fitness, following a training session, than you had before. In fact, that is generally the point of training. This concept is closely linked to Strava’s Fitness and Freshness charts.

The development of athletic performance requires a delicate balance between an adequate stimulus that drives adaptation and the provision of sufficient recovery time to allow these adaptations to take place

Endocrinology of Physical Activity in Sport, Third Edition

Much has been written about supercompensation, but, as the quotation above highlights, improving your own personal performance depends on
– applying the optimal amount of training stimulus and
– allowing the correct amount of recovery time.

How does supercompensation work?

A hard training session puts your body under stress. An athlete who is perspiring profusely and complaining of aching limbs experiences similar symptoms to a patient with a severe fever. The stress induced by both of these situations is picked up in the brain by the hypothalamus, which triggers a range of hormonal responses, putting the body into recovery mode.

Physical exercise challenges the musculoskeletal, cardiovascular and neurological systems. The hormonal response elicits a range of actions around the body, including muscle repair, replenishment of glycogen stores, increase in mitochondria and reinforcement of neural pathways. These processes do not begin until activity has ceased, so, in fact, you become fitter during the rest and recovery phase, rather than while you are actually exercising.

The recovery processes take time and energy. In addition to fuelling before and during exercise, it is important to refuel after a hard training session, particularly during the first 20 minutes.

Optimal training stimulus

Training stimulus is a function of duration and intensity. Strava measures this as Training Load, which shows up as Training Impulse on your Fitness & Freshness chart. This is similar to other commonly used measures. You should also have in mind what aspect of fitness you need to develop for your target events (endurance, power, sprint etc.).

I recently rode over 200km from London to Brighton and back, which Strava calculated as a Training Load of 400. Unfortunately this probably did not make me much fitter, because it left me greatly fatigued. During the next two days that I spent recovering, my body probably just about reattained its previous baseline level of fitness and failed to achieve supercompensation. It was a great ride, but it was also an example of excessive training stimulus.

On the other hand, going for a gentle ride without any strong effort is unlikely to put the body under enough stress to give rise to the desired hormonal response. Any supercompensation is likely to be minimal. Some people might call this “junk training”, because higher duration or intensity is needed, in order to become fitter.

So what is the optimal training stimulus you should aim for? A simple answer is to check your Strava Fitness & Freshness page and set a target Training Load equal to about 1.3 to 1.5 times your current Fitness (quite a hard session). This all links back to how to ramp up your fitness.
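As a quick illustration of that rule of thumb, the target range can be computed from your current Fitness score. The function name and the Fitness value of 60 are hypothetical, chosen only to show the arithmetic.

```python
def target_training_load(fitness, low=1.3, high=1.5):
    """Return the (lower, upper) Training Load range for a hard session."""
    return low * fitness, high * fitness


low, high = target_training_load(60)  # hypothetical Fitness score of 60
print(f"Aim for a Training Load between {low:.0f} and {high:.0f}")
```

On this basis, a Training Load of 400 would be far above the suggested range for anyone whose Fitness score is well below 270.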

The right recovery time

As mentioned above, you get fitter while you are recovering. Ideally your next training session should be timed to match the peak of supercompensation. The colour coding of the chart provides a traffic light system. If you train again too early, your body will not have time to recover. But if you leave it too long, you miss the opportunity. As a general rule, it is sensible to follow a hard training day with an easier day. It is also very important to take one full rest day per week, where activity is limited to nothing more than a short walk or some stretching. When it comes to recovery, remember that sleep is “Chief nourisher in life’s feast”.

Functional overreaching (FOR)

Good periodisation of training stimulus and recovery results in beneficial performance adaptation, known as functional overreaching. This stimulates anabolic (muscle building) hormones, such as IGF1 and testosterone, while stress hormones, like cortisol, remain low. The athlete sees a steady improvement in performance.

Nonfunctional overreaching (NFOR)

Nonfunctional overreaching occurs when an athlete is too eager to train again. Without sufficient recovery, the body is only just back to baseline when it is hit with another bout of exercise. No time is allowed for the anabolic response. This throws away the potential benefits of supercompensation and leads to a stagnation of performance.

Overtraining syndrome (OTS)

Overtraining syndrome occurs when the next training session begins before the body has fully recovered from the last one. This can be a problem for athletes juggling a high number of training hours with a full-time job. When the endocrine system is put under this level of stress, cortisol, prolactin and creatine kinase tend to rise, while sex steroids become depressed. This results in an accumulation of fatigue and a progressive deterioration of performance.

When were you last in a fully recovered state?

You can tell which of these situations applies to you, by asking how long has it been since you were in a fully recovered state? If it is days, you should be able to get fitter. If it is weeks, you may be in a state of nonfunctional overreaching. If you have not been in a fully recovered state for months, you have overtraining syndrome. The period taken to recover to a healthy state often has the same timescale.

How do I know if I am in a fully recovered state?

Various apps use heart rate variability (HRV) as an indicator of recovery. Alternatively, you can activate the sliders for Fatigue and Form on your Strava Fitness & Freshness page and look for positive Form. This is when Fitness is greater than Fatigue. My chart below shows a sustained period of high Fatigue and negative Form in April, suggesting that some of the training in that heavy block may have been somewhat counterproductive, but at least I took a rest week in early May.
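The relationship between Fitness, Fatigue and Form can be sketched as follows. This is not Strava's actual calculation: it assumes that Fitness and Fatigue are exponentially weighted averages of daily Training Load with time constants of 42 and 7 days, values commonly used in this kind of model, and it takes Form to be Fitness minus Fatigue as described above. The training block in the example is hypothetical.

```python
import math


def fitness_fatigue_form(daily_loads, fitness_days=42.0, fatigue_days=7.0):
    """Return a list of (fitness, fatigue, form) tuples, one per day.

    Fitness and Fatigue are modelled as exponentially weighted averages of
    daily Training Load; the 42- and 7-day time constants are assumptions.
    """
    k_fit = 1.0 - math.exp(-1.0 / fitness_days)
    k_fat = 1.0 - math.exp(-1.0 / fatigue_days)
    fitness = fatigue = 0.0
    out = []
    for load in daily_loads:
        fitness += k_fit * (load - fitness)
        fatigue += k_fat * (load - fatigue)
        out.append((fitness, fatigue, fitness - fatigue))
    return out


# Hypothetical block: four weeks at a daily load of 80, then a rest week.
days = fitness_fatigue_form([80] * 28 + [0] * 7)
print(f"Form at the end of the block: {days[27][2]:.1f}")
print(f"Form after the rest week:     {days[-1][2]:.1f}")
```

Fatigue responds much faster than Fitness, so Form is negative throughout the heavy block and only turns positive after several days of rest.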

Super compensation

Supercompensation is the underlying mechanism of periodised training. It works on a number of timescales from the days in a weekly plan, to the weeks in a monthly plan and up to the months in the season’s plan. I hope that this read has provided you with super compensation.

Related posts

Milan Sanremo in a Random Forest

Last time I tried to predict a race, I trained up a neural network on past race results, ahead of the World Championships in Harrogate. The model backed Sam Bennett, but it did not take account of the weather conditions, which turned out to be terrible. Fortunately the forecast looks good for tomorrow’s Milan Sanremo.

This time I have tried using a Random Forest, based on the results of the UCI races that took place in 2020 and so far in 2021. The model took account of each rider’s past results, team, height and weight, together with key statistics about each race, including date, distance, average speed and type of parcours.

One of the nice things about this type of model is that it is possible to see how the factors contribute to the overall predictions. The following waterfall chart explains why the model uncontroversially has Wout van Aert as the favourite.

Breakdown of prediction for Wout van Aert

The largest positive contribution comes from being Wout van Aert. This is because he has a lot of good results. His height and weight favour Milan Sanremo. He also has a strong positive coming from his team. The distance and race type make further positive contributions.
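For readers wondering how a tree-based model can be broken down like this, here is a minimal Python sketch of one common approach. Each time a tree splits on a feature, the change in the node's average value is credited to that feature, and the credits are averaged across the trees in the forest. This is not necessarily the method used to produce the waterfall charts in this post, and the trees, feature names and numbers below are entirely hypothetical.

```python
def explain_tree(tree, features):
    """Walk one decision tree, crediting each change in node value
    to the feature that was split on. Returns (baseline, contributions)."""
    contributions = {}
    node = tree
    baseline = node["value"]  # average outcome at the root of the tree
    while "feature" in node:
        name = node["feature"]
        go_left = features[name] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        step = child["value"] - node["value"]
        contributions[name] = contributions.get(name, 0.0) + step
        node = child
    return baseline, contributions


def explain_forest(trees, features):
    """Average the baselines and contributions over all trees."""
    n = len(trees)
    baseline = 0.0
    totals = {}
    for tree in trees:
        base, contribs = explain_tree(tree, features)
        baseline += base / n
        for name, value in contribs.items():
            totals[name] = totals.get(name, 0.0) + value / n
    return baseline, totals


# Two tiny hand-built trees standing in for a trained forest (hypothetical).
tree_a = {
    "value": 0.10, "feature": "past_results", "threshold": 50,
    "left": {"value": 0.04},
    "right": {
        "value": 0.30, "feature": "distance_km", "threshold": 250,
        "left": {"value": 0.22},
        "right": {"value": 0.42},
    },
}
tree_b = {
    "value": 0.12, "feature": "team_strength", "threshold": 0.5,
    "left": {"value": 0.06},
    "right": {"value": 0.28},
}

rider = {"past_results": 80, "distance_km": 299, "team_strength": 0.9}
baseline, contributions = explain_forest([tree_a, tree_b], rider)
prediction = baseline + sum(contributions.values())

print(f"baseline: {baseline:+.2f}")
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f}")
print(f"prediction: {prediction:.2f}")
```

The useful property is that the baseline plus the contributions adds up exactly to the forest's prediction, which is what allows the bars of a waterfall chart to be stacked end to end.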

We can contrast this with the model’s prediction for Mathieu van der Poel, who is ranked 9th.

Breakdown of prediction for Mathieu van der Poel

We see a positive personal contribution from being van der Poel, but having raced fewer UCI events, he has a less strong set of results than van Aert. According to the model the Alpecin Fenix team contribution is not as strong as Jumbo Visma, but the long distance of the race works in favour of the Dutchman. The day of year gives a small negative contribution, suggesting that his road results have been stronger later in the year, but this could be due to last year’s unusual timing of races.

Each of the other riders in the model’s top 10 is in with a shout.

It’s taken me all afternoon to set up this model, so this is just a short post.

Post race comment

Where was Jasper Stuyven?

Like Mads Pedersen in Harrogate back in 2019, Jasper Stuyven was this year’s surprise winner in Sanremo. So what had the model expected for him? Scrolling down the list of predictions, Stuyven was ranked 39th.

Breakdown of prediction for Jasper Stuyven

His individual rider prediction was negative, perhaps because he has not had many good results so far this year, though he did win Omloop Het Nieuwsblad last year and had several top 10 finishes. The model assessed that his greatest advantage came from the length of the race, suggesting that he tends to do well over greater distances.

The nice thing about this approach is that it identifies factors that are relevant to particular riders, in a quantitative fashion. This helps to overcome personal biases and the human tendency to overweight and project forward what has happened most recently.

Hexagons in the Arctic Circle

An attractive aspect of hexagonal patterns is that they can repeat in interesting ways across a cycling jersey. This is partly due to the fact that a hexagon can be divided up into three equal lozenge shapes, as seen near the neck of the top right jersey. These shapes can be combined in imaginative ways, as displayed in the lower two examples.

This three-way division of a hexagon can create a 3D optical illusion called a “Necker cube”, which can appear to flip from convex to concave and back again. The orange patch can appear to be the top of a cube viewed from above or the ceiling in a corner, viewed from below. See if this happens if you stare at the image below.

Looking down on a cube or up into the corner of a room?

Spoiler alert: from here things get a bit mathematical


A tessellation, or tiling, is a way of covering a plane with polygons of various types. Tessellations have many interesting mathematical properties relating to their symmetries. It turns out that there are exactly 17 distinct symmetry types of periodic pattern in the plane, known as the wallpaper groups. Roger Penrose, who was awarded the 2020 Nobel Prize in Physics for his work on the formation of black holes, discovered many interesting aperiodic tilings, such as the Penrose tiling.

While some people were munching on mince pies before Christmas, I watched a thought-provoking video on a related topic, released by the Mathologer, Burkard Polster. He begins by discussing ways of tiling various shapes with dominoes and goes on to describe something called the Arctic Circle Theorem. Around the middle of the video, he shifts to tiling hexagon shapes with lozenges, resulting in images with the weird 3D flipping effect described above. This prompted me to spend rather a lot of time writing Python code to explore this topic.

After much experimentation, I created some code that would generate random tilings by stochastically flipping hexagons. Colouring the lozenges according to their orientation resulted in some really interesting 3D effects.

Algorithm flips a random hexagon to create a new tiling.
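As a rough illustration of the flipping idea, here is a minimal Python sketch. It is not the code behind the video. It assumes the tiling is stored as a grid of tower heights, where the 3D reading of a lozenge tiling is a pile of cubes in the corner of a box, so that flipping one hexagon is the same as adding or removing a single cube. The function names and the choice of starting from an empty box are illustrative assumptions, and drawing the coloured lozenges is left out.

```python
import random


def random_tiling(n, iterations, seed=0):
    """Return an n x n grid of tower heights after random hexagon flips.

    heights[i][j] is the number of cubes in the tower at row i, column j,
    with (0, 0) in the back corner. A valid tiling has heights between
    0 and n that never increase when moving away from the back corner.
    """
    rng = random.Random(seed)
    heights = [[0] * n for _ in range(n)]  # empty box: one valid tiling
    for _ in range(iterations):
        i, j = rng.randrange(n), rng.randrange(n)
        new = heights[i][j] + rng.choice((-1, 1))  # add or remove a cube
        # Towers behind must stay at least as tall as this one (walls are n).
        behind = min(heights[i - 1][j] if i > 0 else n,
                     heights[i][j - 1] if j > 0 else n)
        # Towers in front must stay no taller than this one (floor is 0).
        front = max(heights[i + 1][j] if i < n - 1 else 0,
                    heights[i][j + 1] if j < n - 1 else 0)
        if front <= new <= behind:
            heights[i][j] = new  # the flip is allowed, so apply it
    return heights


def is_valid(heights):
    """Check that the heights describe a legal pile of cubes."""
    n = len(heights)
    for i in range(n):
        for j in range(n):
            h = heights[i][j]
            if not 0 <= h <= n:
                return False
            if i > 0 and heights[i - 1][j] < h:
                return False
            if j > 0 and heights[i][j - 1] < h:
                return False
    return True


tiling = random_tiling(6, 20000)
for row in tiling:
    print(row)
```

Attempted flips that would break the stacking rule are simply skipped, so every intermediate state is a valid tiling. Plotting only once at the end, rather than inside the loop, keeps long runs fast.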


The video shows random tilings of a hexagonal area. These end up looking like a collection of 3D towers with orange tops. But if you focus on a particular cube and tilt your screen backwards, the whole image can flip, Necker-style, into an inverted version where the floor becomes the ceiling and the orange segments push downwards.

I used my code to create random tilings of much bigger hexagons. It turned out that plotting the image on every iteration was taking a ridiculous amount of time. Suspending plotting until the end resulted in the code running 10,000 times faster! This allowed me to run 50 million iterations for a hexagon with 32 lozenges on each side, resulting in the fabled Arctic Circle promised by the eponymous theorem. The central area is chaotic, but the colours freeze into opposite solid patches of orange, blue and grey outside the circumference of a large inscribed circle.

Arctic Circle emerged on a hexagon of side 32 after 50 million iterations

Why does the Arctic Circle emerge?

There are two intuitive ways to understand why this happens. Firstly, if you consider the pattern as representing towers with orange tops, then every tower must be at least as tall as the three towers in front of it. So if you try to add or remove a brick randomly, the towers at the back are more likely to become taller, while those near the front tend to become shorter.

Two examples of paths from left to right

The second way to think about it is that, if you look carefully, there is a unique path from each of the lozenges on the left hand vertical side to the corresponding lozenge on the right hand vertical side. At every step, each path either goes up (blue) or down (grey). The gaps between the various paths are orange. Each step of the algorithm flips between up-down and down-up steps on a particular path. On the large hexagon, the only way to prevent the topmost cell from being orange is for the highest path to go up (and remain blue) 32 times in a row. This is very unlikely when flips are random, though it can happen more often on a smaller size-6 hexagon like the one shown in the example.


A Jupyter notebook demonstrating the approach and Python code for running longer simulations are available on this GitHub page.

Back to cycling jerseys

The Dutch company DSM is proudly sponsoring a professional cycling team in 2021. A hexagon lies at the heart of the DSM logo, which will appear on the team jerseys.