Today, most agree that AI is a key enabler in the expanding pursuit of business value. We have started the journey to ubiquitous AI: more companies are adopting AI and using it in more solution areas.
The Wall Street Journal’s CIO Blog says “the necessary ingredients have finally come together to propel AI beyond early adopters to a broader marketplace”. The CIO Blog further says that “AI advances have the potential to increase global GDP by 14% between now and 2030”.
Modern AI relies on both traditional structured data and unstructured data such as video, photos, and free-form text. The good news is that the data AI needs to support an expanding array of solutions is available. The capture and storage of unstructured data has been helped along by the widespread adoption of unstructured data stores such as Hadoop.
The bad news is that raw unstructured data often has to be labeled with metadata before an AI application can use it. This metadata is what makes the data usable by AI.
The availability of large amounts of quality metadata is a key enabler for AI. In fact, quality metadata is key to unlocking the value of vast amounts of unstructured data. Some metadata can be created automatically, for example through Optical Character Recognition (OCR). Just as importantly, other metadata can be created as a service by people trained to create it.
Interestingly, because automated metadata creation is often itself based on AI/Machine Learning (ML), one AI/ML capability can generate the metadata that the solution AI needs.
However, the AI/ML-based automated capability must itself be trained on training data sets, and people are often needed to create the metadata for those data sets. In other cases, it is not practical for AI/ML capabilities to create all the metadata needed to train and operate the solution AI, so people must create this metadata directly.
Given that human involvement is needed to create metadata for the solution AI in both of the scenarios described above, it is not surprising that cloud service providers such as AWS have included iMerit as a provider of metadata creation and data labeling services in their cloud ecosystems. (Please see imerit.net/partners/aws.)
There are best practices for creating value with a data labeling service provider. Perhaps the most important is to “keep your eyes on the prize” by staying focused on the solution you are building. Technology options are a means, not an end; the solution is the end. Not every technology choice performs well in the ways that are critical to your success.
Keep your solution focus by choosing a labeling services vendor whose effective, efficient process for working with you and experienced data labeling staff together ensure high quality, fast throughput, and a practical price. In practice, quality has to be “baked in” to the processes that define the needed metadata and guide the manual labeling.
Fast throughput is achieved through a combination of avoiding missteps, strong processes, and people who are both experienced and dedicated to the work.
Keep your solution focus by choosing a labeling services vendor that can either sort through the technology options with you or implement the solution using the technology choices you have already made, including your proprietary technologies. This requires the service provider’s people to be fluent in many technologies. It is a positive sign when the service provider has strong relationships with many relevant technology partners.
From hard experience, I encourage you to head off potentially serious problems by choosing a labeling services vendor with strong enterprise security awareness. It is a good sign when the service provider raises security reviews as an integral part of its service.
Consider choosing a labeling service provider that offers a Proof of Concept (POC). With a POC:
- You get a sense of the nuances of the metadata that will be provided. These nuances may be important to the success of your AI initiative.
- You get a sense of how well the service provider will take care of you.
- You get a sense of everything you are getting for your money: not just output, but also processes, baked-in quality, and much more.
In summary, whether you call it creating metadata or labeling data, it is vital to getting value from your AI solutions, both for training the AI and for keeping it operational.
Although data labeling tools can often do much of the work, a large part of it often has to be done by well-trained, dedicated people. This human effort is critical to getting high-quality metadata at scale.
As part of keeping a solution focus, consider choosing a data labeling and metadata creation provider that shares that focus.
Anthony Palella
(Anthony Palella is an iMerit contributor. He created the CDO Offering at Accenture and led the deployment of the offering across commercial and government clients. Prior to his work at Accenture, Anthony was the CDO at Angie’s List, Head of Big Data Analytics at Kimberly-Clark, and Head of Analytics at Fox Interactive Media.)