Crowdsourced Data Labeling: When to Use It, and When Not To

September 17, 2021

It’s a fact of life: machine learning and deep learning, while revolutionary, require tremendous volumes of data. Even with algorithmic processes like web scraping that automate data collection, companies still require annotators to label the data before they can use it to train an AI or ML model. Often in a hurry to develop an algorithm, companies turn to crowdsourced workforces for easy annotation.

But is that always the best option? With crowdsourcing platforms like Amazon Mechanical Turk, your data can be annotated by essentially anyone. In this article we’ll investigate why this may not be the best approach to data annotation and how subject-matter experts can make or break an AI project.

Crowdsourcing: Good, Bad, and Ugly

  • The Good
    • Affordable
    • 24/7 worldwide workforce
    • Annotation accuracy sometimes reaches 97%

Price will always be crowdsourcing’s ultimate appeal. Companies that choose crowdsourcing as their annotation path stand to save thousands while potentially gaining access to annotated data that’s ready for an AI or ML algorithm. This annotation method is also easily accessible and rapidly deployable, making it a go-to option for busy data scientists who don’t have time to research other approaches to annotation.

While quality control is often cited as a weakness of crowdsourcing, simple tasks can sometimes yield good results. A recent study found that the expected accuracy of crowdsourcing, depending on the number of annotation tasks, the number of annotators, and annotator experience, is estimated to be between 74.36% and 97.4%.

The key differentiator between poorly annotated data and accurately annotated data is the annotator’s experience and subject-matter expertise.
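A common way to raise accuracy on simple tasks is to collect several labels per item and keep the majority answer. The sketch below is illustrative only and is not the method of the study mentioned above. It assumes a binary labeling task and annotators who make mistakes independently, each with the same accuracy p. The function name majority_vote_accuracy is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n annotators picks the correct label.

    Assumptions (not from the article): binary task, independent annotators,
    each correct with the same probability p, and an odd n so ties cannot occur.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    smallest_majority = n // 2 + 1
    # Sum the binomial probabilities of every outcome in which
    # at least a majority of annotators is correct.
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(smallest_majority, n + 1)
    )


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, round(majority_vote_accuracy(0.75, n), 4))
```

Under these assumptions, three annotators who are each right 75% of the time reach about 84% by majority vote, and five reach about 90%. Real annotators are rarely independent, and on hard tasks they tend to make the same mistakes, which is where expertise matters more than headcount.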

  • The Bad
    • Poor quality control
    • Deficit in subject-matter expertise
    • Security

Fundamentally, Amazon Mechanical Turk and similar crowdsourcing services act as marketplaces. Requesters post jobs called Human Intelligence Tasks (HITs), and workers select the ones they’re interested in and submit their work. The challenge with this crowdsourced approach is that you, the Requester, have limited control over the qualifications of whoever fulfills your request.

This means you’ll spend time poring over the outputs your workers submit and deciding which to accept or reject, which isn’t exactly time well spent. Amazon MTurk acknowledges this open-door policy on its FAQ page:

“Virtually anyone can complete tasks on the Amazon Mechanical Turk (MTurk) web site using the skills they already have and according to their own schedule. The only requirement to complete tasks and collect payment from Requesters is a computing device connected to the Internet and to be at least 18-years-old.”
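One way to cut down the manual accept-or-reject review described above is to seed each batch with a few items whose correct labels you already know, then score every submission against them. The following is a minimal sketch under that assumption. The function name, the 80% threshold, and the data layout are all hypothetical, and the code does not call any MTurk API.

```python
from typing import Mapping


def accept_submission(
    answers: Mapping[str, str],
    gold: Mapping[str, str],
    threshold: float = 0.8,
) -> bool:
    """Return True if the worker's accuracy on known-answer items meets the threshold.

    answers maps item id -> label submitted by the worker (hypothetical layout).
    gold maps item id -> trusted label for the seeded check items.
    """
    # Only score the seeded items the worker actually answered.
    scored = [item_id for item_id in gold if item_id in answers]
    if not scored:
        # No overlap with the check items: nothing to judge, so reject.
        return False
    correct = sum(answers[item_id] == gold[item_id] for item_id in scored)
    return correct / len(scored) >= threshold


if __name__ == "__main__":
    gold = {"img1": "dog", "img2": "cat", "img3": "dog"}
    submission = {"img1": "dog", "img2": "cat", "img3": "cat", "img9": "dog"}
    print(accept_submission(submission, gold))
```

A check like this only measures agreement with answers you already have. It says nothing about whether the worker is qualified for the items you could not verify yourself.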

While entry-level annotators may suffice for simple tasks like labeling dogs in a still image, other tasks require vetted experts to handle the annotation. If your company uses machine learning to perform cancer diagnoses from X-ray or MRI imagery, you’ll want a credentialed radiologist handling the labeling.

If you’re working with legal documents, you’ll almost certainly prefer to have a trained law clerk perform the annotation. Generalized services such as Amazon Mechanical Turk do not give you the flexibility to choose expert-level annotators for your task.

Since crowdsourced workers aren’t typically held to any security or privacy standards, there are no real safeguards to prevent them from sharing your data. This is especially important if you’re working in a domain where compliance with a standard such as HIPAA is legally required. In such cases, you’re putting your company in legal jeopardy by outsourcing data annotation to a non-secure service.

  • The Ugly
    • Ethically questionable
    • Poor annotation has consequences
    • Unreliable annotation accuracy

Companies today are increasingly prioritizing ethical labor sourcing as part of their social impact missions. Crowdsourcing’s greatest strength, its low price, is also its greatest weakness. Annotators around the world are often paid wages that are barely enough to live on, and a growing number of studies suggest that crowdsourced workers are becoming an exploited source of labor. One reason is that crowdsourced workers are legally considered contractors, so minimum wage laws do not apply to them.

As a result, companies are questioning crowdsourcing as a means of labeling their data. To address the problem, some companies that use a crowdsourcing platform estimate the time it takes to complete an annotation task and then pay at least the U.S. hourly minimum wage for that time. While this is the ethical approach, for many it defeats the purpose of crowdsourcing, since paying more doesn’t guarantee better work.
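The per-task calculation described above is simple arithmetic: multiply the estimated time per task by an hourly wage. The sketch below assumes the U.S. federal minimum wage of $7.25 per hour as its default. State and local minimums are often higher, so treat that figure as a floor to verify rather than a recommendation. The function name and the choice to round up to the next cent are illustrative assumptions.

```python
import math

# Assumption: U.S. federal minimum wage in dollars per hour.
# Many states and cities set a higher minimum.
FEDERAL_MIN_WAGE_USD = 7.25


def min_reward_per_task(
    seconds_per_task: float,
    hourly_wage: float = FEDERAL_MIN_WAGE_USD,
) -> float:
    """Smallest per-task payment, in dollars, that meets the hourly wage.

    Rounds up to the next cent so the worker is never paid below the target.
    """
    if seconds_per_task <= 0:
        raise ValueError("seconds_per_task must be positive")
    cents = hourly_wage * seconds_per_task / 3600 * 100
    return math.ceil(cents) / 100


if __name__ == "__main__":
    # A task estimated at 30 seconds, paid at the default wage.
    print(min_reward_per_task(30))
    # The same task at a hypothetical $15.00 hourly target.
    print(min_reward_per_task(30, hourly_wage=15.00))
```

The result is only as good as the time estimate. Timing a handful of tasks yourself before setting the reward is a reasonable way to check it.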

Because machine learning algorithms are only as good as the data they’re trained on, poorly annotated data can have tremendous consequences. Feeding a model incorrectly labeled data can be worse than leaving the data unlabeled, because the algorithm learns to predict the incorrect labels. The model’s future outputs will then be corrupted by bad data, which can have catastrophic consequences.

For example, a person’s cancer could be missed on an X-ray, or an autonomous car could crash into a pedestrian. In every case, accurately annotated data is crucial to getting the best performance out of your model.

Why Subject-Matter Expertise Counts

Data solutions vendors offer access to teams of hundreds of vetted and verified experts across highly specialized domains to accurately annotate your data. These teams work quickly to label your data with an average accuracy exceeding 95%, depending on the industry and the vendor.

For example, to segment lesions in an image for surgery and identify relevant anatomical structures, the annotator should have some understanding of the surgery being performed, what to look for, and how to label it. Ideally, the annotator would have performed the surgery themselves and would understand its nuances, or would at least be guided by someone who has.

Similarly, labeling medical documents requires an understanding of the medical lexicon, which terms are most relevant, and how to redact personally identifiable patient information. In all cases, having a subject-matter expert do the annotation is paramount to getting the job done right.

Subject-matter experts are invaluable to autonomous technology as well. They are needed to accurately label signage and road hazards such as stop signs, lights, pedestrians, construction, lampposts, sidewalks, and more. Because the average driving scenario is so complex, a highly qualified team must vet the annotation work to ensure that no hazards or relevant objects are missed. Missing relevant items can leave an autonomous vehicle poorly trained, leading to accidents on the road and lawsuits in the courtroom.


The complexity of your project will ultimately determine how much subject-matter expertise your annotation workforce needs. While crowdsourcing may be ideal for smaller tasks, AI and ML’s best use cases demand subject-matter expertise to perform adequately. If you firmly believe that your project is simple, then crowdsourcing may be the best approach for you. But if you’re looking for optimal AI and ML outcomes, then consider speaking with an iMerit expert today.