My last blog showed the results of using a deep convolutional neural network to apply different artistic styles to a photograph of cyclist. This article looks at the trendy topic of Generative Adversarial Networks (GANs). Specifically, I investigate the application of a Wasserstein GAN to generate thumbnail images of bicycles.
In the field of machine learning, a generative model is a model designed to produce examples from a particular target distribution. In statistics, the output might be samples from a Gaussian distribution, but we can extend the idea to create a model that produces examples of sonnets in the style of Shakespeare or pictures of cats… or bicycles.
The adversarial framework introduces an attractive idea from game theory: to create a competitive form of learning. While a generator learns from a corpus of real examples how to create realistic “fakes”, a discriminator (or critic) learns to distinguish been fakes and authentic examples. In fact, the generator is given the objective of trying to fool the discriminator. As the discriminator improves, the generator is driven to enhance the authenticity of its output. This creates in a virtuous cycle.
When originally proposed in 2014, Generative Adversarial Networks stimulated much interest, but it proved hard to make them work reliably in practice. One problem was “mode collapse”, where the generator becomes stuck, producing the same output all the time. However, this changed with the publication of a recent paper, explaining how earlier problems could be overcome by using a so-called Wasserstein loss function.
As an experiment, I downloaded a batch of images of bicycles from the Internet. After manually removing pictures with riders and close-ups of components, there were about 1,200 side views of road bikes (mostly with handlebars to the right, so you can see the chainset). After a few experiments, I reduced the dataset to the 862 images, by automatically selecting bikes against a white background.
As a participant of part 2 of the excellent fast.ai deep learning course, I made use of WGAN code that runs using Pytorch. I loaded the bike images at thumbnail size of 64×64 (training with larger images exceeded the memory constraints of the p2.large GPU I’m running on AWS). It was initially disappointing to experience the mode collapse problem, especially because the authors of the WGAN paper claimed never to have encountered it. However, speeding up the learning rate of the generator seemed to solve the problem.
Although each fake was created from a completely random starting point, the generator learned to produce images against a white background, with two circles joined by lines. After a couple of hundred iterations the WGAN began to generate some recognisably bicycle-like images. Notice the huge variety. Some of the best ones are shown at the top of this post.
I tried to improve the WGAN’s images, using another deep learning tool: super resolution. This amazing technique is used to solve the seemingly impossible task of converting images from low resolution to high resolution. It is achieved by taking downgraded versions of a large dataset of high resolution images, then training a neural network to reproduce a high-res version from the corresponding low-res input. A super resolution network is able to learn about certain properties of the world, for example, it converts jagged curves into smooth ones – a feature I’d hoped might be useful for making wheels look rounder.
Example of a super resolution network on real photographs
Unfortunately, my super resolution experiments did not lead to the improvement I’d hoped for. Two possible explanations are that a) the fake images were not low-res photos and b) the network had been trained on many types of images other than bicycles with white backgrounds.
Example of super resolution network on a fake bicycle image
In the end I was pretty happy with the best of the 64×64 images shown above. They are at least as good as something I could draw by hand. This is an impressive example of unsupervised learning. The trained network is able to use some learned notion of what a bicycle looks like in order to produce new images that possess similar properties. With more time and training, I’m sure the WGAN could be improved, perhaps to the point where the images might provide creative inspiration for new bike designs.