The most effective data annotation teams succeed through highly scientific, systematic processes – but present throughout is the steady hand of experienced managers who bring a very human perspective to image- and pattern-recognition.
Machine learning is very much a scientific process, but with a bit of artistry thrown in: consider teaching an algorithm to operate an autonomous crop-spraying machine through training material selected by human data analysts, who must educate by example. But what of the entire annotation process? How are successful annotation projects architected, and what are the essential skills for the people in charge?
While many geospatial data experts come from computer science backgrounds, the best annotation vendors tap a diverse group of experts who together bring a rich set of technical, cultural, and contextual experience. It’s not just their expertise that matters, but also their ability to transfer essential elements of their subject matter expertise to agile teams of data annotators.
Elizabeth Pratt, for example, an iMerit solutions manager for the company’s geospatial team focused on agriculture, earned a PhD in Linguistics from the City University of New York, and that background underscores the inherent relationship between image annotation and language structure and analysis. Linguistics is, after all, the study of language form, meaning, and context – essential elements for the annotation of images – and a blend of quantitative and qualitative analysis.
That analytical approach begins…at the beginning, when the annotation team and the client commence their collaboration on machine learning. “You have to start by understanding the end goal and the path that gets you there,” Pratt says. “What is the question you’re trying to answer?”
The process requires an empirical mindset that consistently examines the ongoing work, both from the standpoint of the pattern recognition that must be transferred to an artificial intelligence algorithm and from that of the annotation itself. “Are we analyzing correctly?” Pratt asks of the bottom-line deliverable.
The solutions manager can work with a team of dozens of annotators, sometimes hundreds over the life of a project, and the initial review of the data to be annotated, the definition of relevant data, the determination of accuracy metrics – all the components of a successful engagement – have to be communicated from the initial collaboration between the client and iMerit’s solutions managers and architects down to the individual data analysts who will perform the annotations.
Weekly meetings involving iMerit data analysts, client data scientists, and iMerit solutions managers function as feedback and training loops that review accuracy, identify systemic flaws, and develop corrective strategies. The process – particularly in the crucial early stages of a project – often has an outsized impact on accuracy. Where crowd-sourced annotation can be hard pressed to approach even 40-50 percent accuracy, a well-designed training loop can double that within 30-45 days of the initial work. When can the project be deemed a success, and the work complete? When can the AI algorithm graduate to the field? It’s a simple, direct – and yet eternally eye-of-the-beholder – answer. “When,” says Pratt, “a human can’t do it any better.”
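The accuracy tracking at the heart of such a feedback loop can be sketched as a simple comparison against a gold-standard subset of labeled items. This is a minimal illustration only – the function name, labels, and data below are hypothetical, not iMerit’s actual tooling or metrics:

```python
def accuracy(annotations, gold):
    """Fraction of items whose label matches the gold-standard label."""
    matched = sum(1 for item, label in annotations.items()
                  if gold.get(item) == label)
    return matched / len(gold)

# Hypothetical gold-standard labels for four agricultural images.
gold = {"img1": "crop", "img2": "weed", "img3": "crop", "img4": "soil"}

# Week-over-week review: an early baseline vs. results after feedback loops.
week1 = {"img1": "crop", "img2": "crop", "img3": "soil", "img4": "soil"}
week6 = {"img1": "crop", "img2": "weed", "img3": "crop", "img4": "soil"}

print(f"week 1 accuracy: {accuracy(week1, gold):.0%}")  # week 1 accuracy: 50%
print(f"week 6 accuracy: {accuracy(week6, gold):.0%}")  # week 6 accuracy: 100%
```

A weekly review meeting would look at exactly this kind of delta – which items disagree with the gold standard, and whether the disagreements are random or systemic.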