This post is by Emanuel Ott, a Solutions Architect at iMerit and an expert in machine learning and computer vision. It summarizes a talk given at the Machine Intelligence in Autonomous Vehicles Summit in Amsterdam.
To create an algorithm that learns to ‘see’ a typical road the way humans do, data experts first need to classify and then label the different components of the road: for example, “this is a tree, this is another car, this is the curb of the road”. A process that is natural to the human eye and brain must be entirely dissected in order to build the data that feeds the algorithm powering image recognition for a self-driving car. This is not without challenges. The chief one for data experts is: how do I ‘tell what I see’ in a particular image of a road, using words that are predefined and common to all the data experts working on one dataset? And what happens when two autonomous vehicles trained on different datasets meet? How do they agree on whose rules to use, the way humans automatically agree on following standard driving rules? Taxonomy is a challenge common to all fields in machine learning, yet the issue is especially critical in the emerging reality of autonomous driving because of its obvious implications for public safety.
To complicate matters further, taxonomy is only one part of an equation that includes other variables, such as the time available and the level of accuracy required on a project. For example, can we adopt a wider ‘flat’ taxonomy that allows for greater speed in execution but does not capture crucial variations within categories? In the talk, I took the example of roadside ‘vegetation’ as a class of data that the algorithm would be trained to recognize. However, it could be important for the car to distinguish between ‘grass’ (which it can drive on) and ‘agricultural land’ (which is hazardous to navigate). To be efficient, however, most companies choose a flat taxonomy that does not allow for subclasses within a category of data. The complexity of the scene itself is another element that influences taxonomy: labeling the components of an urban scene is not the same as labeling a rural road, nor is a road at night the same as a road during the day. The challenge becomes broader still when you include non-passenger vehicles, like autonomous farm vehicles or trucks.
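The flat-versus-hierarchical trade-off can be made concrete with a small sketch. The class names below (‘vegetation’, ‘grass’, ‘agricultural_land’) follow the example in the talk, but the structure itself is a hypothetical illustration, not any company’s actual labeling schema:

```python
# Hypothetical taxonomy: top-level classes map to finer subclasses.
TAXONOMY = {
    "vegetation": ["grass", "agricultural_land"],
    "vehicle": ["car", "truck", "bicycle"],
    "road": ["lane", "curb"],
}

def flat_labels(taxonomy):
    """A flat taxonomy keeps only top-level classes: faster to annotate,
    but the grass vs. agricultural_land distinction is lost."""
    return sorted(taxonomy)

def hierarchical_labels(taxonomy):
    """A hierarchical taxonomy keeps parent/child labels, so downstream
    logic can treat drivable grass differently from hazardous farmland."""
    return sorted(f"{parent}/{child}"
                  for parent, children in taxonomy.items()
                  for child in children)

print(flat_labels(TAXONOMY))          # ['road', 'vegetation', 'vehicle']
print(hierarchical_labels(TAXONOMY))  # includes 'vegetation/grass', ...
```

The cost of the hierarchical version is exactly the one described above: more classes means slower, more error-prone annotation, which is why many teams flatten.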
Lastly, the taxonomy problem is compounded by the succession of development cycles: one labeling effort may focus on creating bounding boxes around “Bicycles” while the next round needs “Bicyclists” labeled instead. Such conflicting taxonomies often introduce biases and errors into the labeled data.
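One lightweight mitigation, sketched below under hypothetical class names taken from the example above, is to make the rename explicit as a migration map between rounds, so the change is auditable rather than left to each annotator’s interpretation:

```python
# Hypothetical migration map from the round-1 taxonomy to round 2,
# e.g. "Bicycle" (the object) becomes "Bicyclist" (rider plus bike).
MIGRATION = {
    "Bicycle": "Bicyclist",
}

def migrate(labels, mapping):
    """Apply the map; classes without a rule keep their old name.
    Returns the migrated labels plus the list of changes for review."""
    migrated = [mapping.get(label, label) for label in labels]
    changed = [(old, new) for old, new in zip(labels, migrated)
               if old != new]
    return migrated, changed

labels, changes = migrate(["Bicycle", "Car", "Bicycle"], MIGRATION)
# labels  == ["Bicyclist", "Car", "Bicyclist"]
# changes == [("Bicycle", "Bicyclist"), ("Bicycle", "Bicyclist")]
```

Reviewing the `changes` list before the next round starts is a cheap way to catch the biases that silent renames introduce.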
One possible solution to the problem of semantic classification is to take an empirical approach to category naming. Many of the issues linked to data classification arise from the fact that categories are ‘abstract’ (e.g., ‘vegetation’ is a high-level concept; in real life, people are more likely to use words like ‘grass’ or ‘nature’). Before you start your data labeling project, survey the people responsible for annotating the data and agree to use the word that most of them have intuitively selected to describe the category. If two words are in common use within your group, you can already forecast issues with the taxonomy of data on your project.
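This survey can be run as a simple tally. The sketch below is an assumed implementation of the idea, not a tool from the talk: it picks the majority term and flags near-ties as a taxonomy risk before labeling starts (the `tie_margin` parameter is a hypothetical knob):

```python
from collections import Counter

def pick_category_name(survey_answers, tie_margin=1):
    """Return (chosen_name, contested): the most common answer, and
    whether a second word is close enough behind to predict trouble."""
    counts = Counter(survey_answers)
    (top, top_count), *rest = counts.most_common()
    if rest and top_count - rest[0][1] <= tie_margin:
        # Two words are in common use: forecast a taxonomy problem.
        return top, True
    return top, False

name, contested = pick_category_name(
    ["grass", "nature", "grass", "vegetation", "grass", "nature"])
# name == "grass"; contested is True (grass: 3, nature: 2)
```

A contested result is exactly the warning sign described above: if annotators split between ‘grass’ and ‘nature’, the category needs to be renamed or redefined before any boxes are drawn.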
Several companies are working on self-driving cars at the moment, but there is no unified standard for how to teach these cars to ‘see’. I believe now is the right time to question the assumptions around the data that powers these vehicles, and to work towards unified standards for labeling this training data. One way to put it is: “self-driving cars are safer when they talk to each other”. What’s more, a unified standard for road data annotation would free up resources to focus on other challenges, such as improving the tools used to annotate the data. My final prediction is that a movement towards unification will indeed take shape in the near future, either organically through companies opening their datasets, or through regulators enforcing industry-wide rules on data labeling.
You can watch Emanuel’s full talk and learn more about the topic at this link.