My earlier post on cycling art provided an engaging way to consider the creative potentials of deep learning. I have found myself frequently gravitating back to the idea, using the latest code available over at fast.ai. The method uses a neural network to combine the content of a photograph with the style of an artist, but I have found that it takes a few trials to find the right combination of content versus style. This led to the idea of generating a range of images and then running them together as a movie that gradually shifts between the base image to a raw interpretation of the artist’s style.
Using a range of artistic styles from impressionist to abstract, the weights that produced the most interesting images varied according to the photograph and artistic style.
My selected best images are shown below, next to snippets of the corresponding artworks. It turned out that the impressionist artists (Monet, Van Gogh, Cézanne and Braque) maintained the content of the image, in spite of being more heavily weighted to artistic style. In contrast, the more monochromatic styles (O’Keeffe, Polygons, Abstract as well as Dali) needed to be more strongly weighted towards content, in order to preserve the cyclist in the image. The selections for Picasso and Pollock were evenly balanced.
Every image is unique and sometimes some real surprises pop up. For example, using Picasso’s style, the mountains are interpreted as rooftops, complete with windows and doors. Strange eyes peer out the background of finger-shapes in the Dali image and the mountains have become Monet’s water lilies. The Pollock image came out very nicely.
The approach was based on the method described in the paper referenced below. Running the code on a cloud-based GPU, it took about 30 seconds for a neural network to learn to generate in image with the desired characteristics. The learning process was achieved by minimising a loss function, using gradient descent. The clever part lay in defining an appropriate loss function. In this instance, the sample image was passed through a separate pre-trained neural network (VGG16), where the activations, at various layers in the network, were compared to those generated by the photograph and the artwork. The loss function combined the difference in photographic content with the difference in artistic style, where the critical parameter was the content weighting factor.
I decided to vary the content weighting factor logarithmically between around 0.1 and 100, to obtain a full range of content to style combinations. A movie was be produced simply by packing together the images one after the other.