The Importance of Data Labeling Quality to Your AI Effort

December 15, 2020

I have long believed that technical success and financial success go hand in hand. In my earlier blogs, I talked about the importance of a solution approach to being successful with AI. The idea is to focus your limited resources on what will make your AI efforts technically and financially successful.  Getting a labeling services provider that can “make it happen” for you across a big ecosystem of many specialized labeling tools will help you keep a solution focus.

Given that the data labels or metadata are key inputs to the training and operation of your AI, it’s apparent that the quality of the data labels is critical to the AI’s success.  This means that your data labeling service provider has to be up to the task of integrating the right combination of its tools and specialized tools from business partners in a way that efficiently delivers quality.

The labeling services provider is like a general contractor who has his own people and resources – and selectively brings in specialized subcontractors to do specific parts of each labeling effort.  The general contractor is responsible to the client for overall delivery.  In this illustration, the general contractor has to have great working relationships with many specialized subcontractors – while having the people and processes in place to assure overall delivery with quality.

The extent to which the labeling services provider’s people and processes “bake in” quality from start to finish across the client, the provider’s people, and the business partners selected for each customized labeling solution drives both the technical and the financial performance of your AI engagement.

Without question, quality labels link directly to both the effectiveness of AI training and the quality of AI outputs.

Quality labeling drives the financial performance of your AI program in both a direct and an indirect way. The direct way is that the technical performance of the AI improves due to better training and model outputs. The indirect way is tied to the efficiency of working with the labeling service provider. A provider with the people and processes to efficiently work with your people to specify and manage the creation of quality labels allows your people to spend more time building models and analyzing output – which is the part of the solution-first approach that delivers the insights that create the financial benefit.

Think about it! There is a big financial incentive for your AI team to spend less time preparing data. When, as a data/analytics executive, I had to describe the contributions of my team to senior organizational leadership, I typically used a simple comparison of the total budget for my department (including people, travel, software and outsourcing costs) to the department’s operating profit contributions across all analytic and AI projects.

If the department budget was $10 million per year and we increased operating profits by $50 million, we were operating at a 5:1 contribution-to-cost ratio.

Now suppose that the data scientists spent 60% of their time preparing data and only 40% of their time building models and analyzing the outputs. If there were ways (like outsourcing data labeling) to reduce data prep from 60% to 40%, the time available for modeling and analysis would rise from 40% to 60%, a 1.5x increase. In theory, the department’s operating profit contributions could then increase because the data scientists would spend more time on the activities that drive financial success. In our example, if contributions scale in proportion to that time, the expected operating profit contribution could increase from $50 million to $75 million per year.
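For readers who want to plug in their own numbers, the arithmetic in this example can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this post, and the sketch assumes that operating profit contribution scales in direct proportion to the share of time spent on modeling and analysis, which a real department may or may not experience.

```python
def contribution_ratio(profit_contribution, budget):
    """Operating profit contribution per dollar of department budget."""
    return profit_contribution / budget


def projected_contribution(baseline_profit, baseline_prep_pct, new_prep_pct):
    """Scale profit contribution by the change in modeling time.

    Assumption (illustrative only): profit contribution is directly
    proportional to the percentage of time NOT spent on data prep.
    """
    baseline_model_pct = 100 - baseline_prep_pct
    new_model_pct = 100 - new_prep_pct
    return baseline_profit * new_model_pct / baseline_model_pct


# Figures from the example above
budget = 10_000_000           # annual department budget
baseline_profit = 50_000_000  # annual operating profit contribution

projected = projected_contribution(baseline_profit,
                                   baseline_prep_pct=60,
                                   new_prep_pct=40)

print(f"Baseline ratio:  {contribution_ratio(baseline_profit, budget):.1f} to 1")
print(f"Projected profit contribution: ${projected:,.0f}")
print(f"Projected ratio: {contribution_ratio(projected, budget):.1f} to 1")
```

Under that proportionality assumption, cutting data prep from 60% to 40% moves the department from a 5-to-1 to a 7.5-to-1 contribution-to-cost ratio on the same budget.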

Data labeling is one aspect of data preparation.  The above scenario illustrates the potential magnitude of the indirect boost to your AI program’s financial performance that can be achieved when you choose a labeling services provider that provides the needed high quality labels/metadata – and does so without squandering your data scientists’ time with inefficient processes, mistakes/omissions when choosing specialized tools from business partners, and/or failures to get the job done right the first time.

We can now see that quality in data labeling services goes well beyond the accuracy of the labels. Other quality dimensions include…

  • The quality of the project management processes established by your labeling services provider.  How much time does it take to specify the data labeling work? …to monitor and manage the work performed? How much time can you expect to spend on correcting work that was not specified right? …not executed right?
  • How much time does it take to identify problems with AI training, re-do training, or otherwise correct for quality shortfalls in your AI’s training data?
  • How much time is spent trying to identify and correct avoidable problems with your AI’s operational outputs that are caused by shortfalls in training or operating data?

From a practical perspective, it is far better to do things right regarding data than to go down a rabbit hole of identifying and troubleshooting avoidable problems… and the easiest way to do that is to avoid problems by working with a data labeling provider that “bakes in” quality at every step.

From a data executive’s perspective, what should you look for in a labeling services provider to assure that they are up to the quality challenge?

A starting point is that they have well trained and experienced people who know how to do the work, and efficient and effective processes to manage the work across the client’s people, their people, and the specialized business partners selected for each customized solution.  The service provider’s people, from the solution architects to the people doing the labeling, should be fluent in both the provider’s and business partners’ tools. Training and retention of the experienced people is important.

The service provider also has to be able to put together customized labeling solutions for each AI project.  The solutions often supplement the provider’s tools with carefully chosen specialized tools from across a big business partner ecosystem.  The challenge is to manage each customized solution in a way that efficiently “bakes in” quality at each step.

The big incentive in terms of AI program financial performance derives from both efficiently using the client’s AI team’s time and providing quality labeling on the first try.

In this blog, we looked at the importance of quality in terms of doing things right the first time – with both completeness and high accuracy. 

We’ve shown that quality has a big impact on the financial performance of your AI program with better AI model outputs and by increasing the percentage of time that your AI team spends creating value rather than preparing data – or trying to identify and correct problems caused by shortfalls in data quality. 

iMerit has understood the dynamics of ‘quality’ in labeling services from the start. It understands the importance of developing and retaining full-time data labeling specialists and creating efficient processes that “bake in” quality. iMerit’s people are fluent with iMerit’s tools and with many specialized business partner tools.

iMerit has also systematically cultivated and developed an extensive set of business partners.

In simple terms, you’ll benefit from the fact that iMerit has built its capabilities to work with you to efficiently specify customized labeling efforts for each of your AI initiatives, assemble the right combination of iMerit and business partner tools for each effort, and transparently manage everything to deliver high-quality data labels that make your AI program more successful – both technically and financially. – Anthony Palella

(Anthony Palella is an iMerit contributor. He created the CDO Offering at Accenture and led the deployment of the offering across commercial and government clients. Prior to his work at Accenture, Anthony was the CDO at Angie’s List, Head of Big Data Analytics at Kimberly-Clark, and Head of Analytics at Fox Interactive Media.)