Self-driving cars employ sophisticated software to interpret the world around them. How do these systems work? And how good are they at detecting cyclists? Can cyclists feel safe sharing roads with an increasing number of vehicles that make use of these systems?
How hard is it to spot a cyclist?
Vehicles can use a range of detection systems, including cameras, radar and lidar. Deep learning techniques have become very good at identifying objects in photographic images. So one important question is: how hard is it to spot a cyclist in a photo taken from a moving vehicle?
Researchers at Tsinghua University, working in collaboration with Daimler, created a publicly available collection of dashboard camera photos, where humans have painstakingly drawn boxes around other road users. The data set is used by academics to benchmark the performance of their image recognition algorithms. The images are rather grey and murky, reflecting the cloudy and polluted atmosphere of the Chinese city location. It is striking that, in the majority of cases, the cyclists are very small, representing around 900 pixels out of the 2048 x 1024 images, i.e. less than 0.05% of the total area. For example, the cyclist in the middle of the image above is pretty hard to make out, even for a human.
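To put that percentage in context, here is the arithmetic as a short Python sketch. The 900-pixel figure is the rough typical size quoted above, not a measured average, and the function name is just for illustration.

```python
# Image dimensions quoted in the text; 900 pixels is the rough typical cyclist size.
IMAGE_WIDTH, IMAGE_HEIGHT = 2048, 1024
CYCLIST_PIXELS = 900


def area_fraction(object_pixels, width, height):
    """Return the share of the image covered by an object, as a fraction."""
    return object_pixels / (width * height)


fraction = area_fraction(CYCLIST_PIXELS, IMAGE_WIDTH, IMAGE_HEIGHT)
print(f"{fraction:.4%} of the image")  # prints: 0.0429% of the image
```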
Object-detecting neural networks are typically trained to identify the subject of a photo, which normally takes up a significant portion of the image. Finding a tall, thin segment containing a cyclist is significantly more difficult.
If you think about it, the cyclist taking up the largest percentage of a dash cam image will be riding across the direction of travel, directly in front of the vehicle, at which point it may be too late to take action. So a crucial aspect of any successful algorithm is to find more distant cyclists, before they are too close.
Setting up the problem
Taking advantage of skills acquired on the fast.ai course on deep learning, I decided to have a go at training a neural network to detect cyclists. Many of the images in the Tsinghua Daimler data set include multiple cyclists. In order to make the problem more manageable, I set out to find the single largest cyclist in each image.
If you are not interested in the technical bit, just scroll down to the results.
The technical bit
In order to save space on my drive, I downloaded about a third of the training set. The 3209 images were split 80:20 to create training and validation sets. I also downloaded 641 unseen images that were excluded from training and used only for testing the final model.
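An 80:20 split of this kind can be sketched in a few lines of plain Python. This is only an illustration: the file names, the fixed random seed and the helper function are assumptions of mine, not the code actually used for the project.

```python
import random


def split_train_valid(items, valid_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, valid) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_valid = round(len(shuffled) * valid_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical file names standing in for the 3209 downloaded images.
image_names = [f"image_{i:04d}.png" for i in range(3209)]
train, valid = split_train_valid(image_names)
print(len(train), len(valid))  # prints: 2567 642
```

The separate set of 641 test images plays no part in this split, so the final evaluation is made on photos the network has never seen.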
I used transfer learning to fine-tune a neural network using a pre-trained ResNet34 backbone, with a customised head designed to generate four numbers representing the coordinates of a bounding box around the largest object in each image. All images were scaled down to 224 pixel squares, without cropping. Data augmentation added variation to the training images, including small rotations, horizontal flips and adjustments to lighting.
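The data preparation implied here has two steps: pick the largest annotated cyclist in each image, then express its box in the coordinates of the 224 pixel square. The sketch below assumes boxes are stored as (x_min, y_min, x_max, y_max) in pixels and that "without cropping" means the 2048 x 1024 image is squashed to fit, so the two axes get different scale factors. The sample annotations are made up.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def box_area(box: Box) -> float:
    """Area of a box in square pixels (zero for degenerate boxes)."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def largest_box(boxes: List[Box]) -> Box:
    """Return the annotation covering the largest area."""
    return max(boxes, key=box_area)


def rescale_box(box: Box, src_size=(2048, 1024), dst_size=(224, 224)) -> Box:
    """Map a box from the original image onto the squashed, uncropped image."""
    scale_x = dst_size[0] / src_size[0]
    scale_y = dst_size[1] / src_size[1]
    x_min, y_min, x_max, y_max = box
    return (x_min * scale_x, y_min * scale_y, x_max * scale_x, y_max * scale_y)


# Made-up annotations for one image: a distant cyclist and a closer one.
annotations = [(1000, 480, 1020, 525), (400, 400, 528, 656)]
target = rescale_box(largest_box(annotations))
print(target)  # prints: (43.75, 87.5, 57.75, 143.5)
```

The four numbers in the result are the kind of target the customised head is trained to predict for each image.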
It took a couple of hours to train the network on my MacBook Pro, without needing to resort to a cloud-based GPU, to produce bounding boxes with an average error of just 12 pixels on each coordinate. The network had learned to do a pretty good job at detecting cyclists in the training set.
The key step was to test my neural network on the set of 641 unseen images. The results were impressive: the average error on the bounding box coordinates was just 14 pixels. The network was surprisingly good at detecting cyclists.
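For readers wondering what "average error on the bounding box coordinates" means in practice, here is one way to compute it. I am assuming a mean absolute error taken over all four coordinates of every box; the predictions and annotations below are invented purely to show the calculation.

```python
def mean_abs_coordinate_error(predicted, actual):
    """Average absolute difference over every coordinate of every box, in pixels."""
    total = 0.0
    count = 0
    for pred_box, true_box in zip(predicted, actual):
        for pred_value, true_value in zip(pred_box, true_box):
            total += abs(pred_value - true_value)
            count += 1
    return total / count


# Invented boxes as (x_min, y_min, x_max, y_max); not results from the project.
predicted = [(40, 90, 60, 140), (100, 80, 130, 150)]
actual = [(44, 88, 58, 144), (110, 70, 120, 160)]
print(mean_abs_coordinate_error(predicted, actual))  # prints: 6.5
```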
The 16 photos above were taken at random from the test set. The cyan box shows the predicted position of the largest cyclist in the image, while the white box shows the human annotation. There is a high degree of overlap for eleven cyclists: 2, 3, 4, 5, 6, 8, 11, 12, 14, 15 and 16. Box 9 was close, falling between two similar-sized riders, but 7 was a miss. The algorithm failed on the very distant cyclists in 1, 10 and 13. Ranking the photos by the size of the cyclist shows that the network had a high success rate for all but the smallest cyclists.
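I judged the overlap by eye, but a common way to put a number on it is intersection over union (IoU): the area shared by the two boxes divided by the area covered by either. The sketch below is not something I measured for these photos, and the two boxes are made up.

```python
def intersection_over_union(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlapping region, clamped at zero if disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted_box = (40, 90, 60, 140)  # made-up prediction (cyan box)
annotated_box = (44, 88, 58, 144)  # made-up human annotation (white box)
print(round(intersection_over_union(predicted_box, annotated_box), 3))  # prints: 0.646
```

A value of 1 means the boxes coincide exactly and 0 means they do not touch; 0.5 is a commonly used threshold for counting a detection as a hit.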
In conclusion, as long as the cyclists were not too far away, it was surprisingly easy to detect riders pretty reliably, using a neural network trained over an afternoon. With all the resources available to Google, Uber and the big car manufacturers, we can be sure that much more sophisticated systems have been developed. I did not consider, for example, using a sequence of images to detect motion or combining them with data about the motion of the camera vehicle. Nor did I attempt to distinguish cyclists from other road users, such as pedestrians or motorbikes.
After completing this project, I feel reassured that cyclists of the future will be spotted by self-driving cars. The riders in the data set generally did not wear reflective clothing and did not have rear lights. These basic safety measures make cyclists, particularly commuters, more obvious to all road users, whether human or AI.
Car manufacturers could potentially develop significant goodwill and credibility in their commitment to road safety by offering cyclists lightweight and efficient beacons that would make them more obvious to automated driving systems.
“A new benchmark for vision-based cyclist detection”, X. Li, F. Flohr, Y. Yang, H. Xiong, M. Braun, S. Pan, K. Li and D. M. Gavrila, in Proceedings of the IEEE Intelligent Vehicles Symposium (IV), pages 1028-1033, June 2016.