Edge Cases: How AI Pipelines Solve Hidden Data Challenges

One thing is certain when it comes to edge cases: most occur in the last mile. As autonomous technology is applied more widely, the operational complexities that hinder its adoption compound, because as operational complexity increases, so does the number of edge cases that autonomous models haven’t yet been trained to interpret.

After annotating over a billion images for use in autonomous vehicles, healthcare AI, and geospatial AI, iMerit has encountered its fair share of edge cases. Typical examples include an autonomous vehicle struggling to interpret a trailer attached to a truck, or a model that must be taught to recognize a pedestrian wearing a costume.

But edge cases aren’t limited to the last mile, and they aren’t always about an object on the road. Sometimes they crop up in the assumptions made about how images should be annotated or how image batches are compiled. In this piece, we outline how iMerit has discovered and adapted to edge cases encountered while annotating data for our many cutting-edge clients.

Edge Case #1: The Shoulderless Road

iMerit teams were working with an autonomous vehicle manufacturer on its self-driving car. Previously, the company had all of its road data annotated using the shoulder of the road as the boundary the car needed to recognize and avoid crossing or colliding with. This annotation assumption posed certain challenges, such as the vehicle hugging the road shoulder uncomfortably closely. The greatest challenge, however, surfaced when a road had no shoulder whatsoever.

iMerit created an annotation workflow that taught the model to identify the edges of the road itself rather than look for a shoulder. The model then learned to always leave a margin between itself and those road boundaries, which solved this particular edge case.
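To make the idea concrete, here is a minimal, purely illustrative sketch of what "leave a margin from the annotated road edge" can look like downstream: a clearance check against a road-edge polyline. The polyline format, coordinate units, and margin value are assumptions made for this example, not iMerit's or the client's actual schema.

```python
# Hypothetical sketch: checking a vehicle's clearance from an annotated road edge.
# The road_edge polyline, units (meters), and MIN_MARGIN_M are illustrative assumptions.
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (all 2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to the segment's ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy)

def clearance_from_edge(vehicle_pos, road_edge):
    """Smallest distance from the vehicle to any segment of the road-edge polyline."""
    return min(
        point_segment_distance(vehicle_pos, a, b)
        for a, b in zip(road_edge, road_edge[1:])
    )

MIN_MARGIN_M = 0.5  # illustrative safety margin
road_edge = [(0.0, 0.0), (10.0, 0.2), (20.0, 0.1)]  # annotated edge of the asphalt
print(clearance_from_edge((5.0, 1.0), road_edge) >= MIN_MARGIN_M)  # True
```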

Edge Case #2: Clear or Cloudy?

This autonomous vehicle client asked iMerit to label a series of scenes according to whether the sky was clear or cloudy. Upon digging into the files, iMerit teams found that many of the images fell clearly into neither category.

In many cases, the sky was blue but dotted with clouds; at other times, it was overcast yet bright enough that it could not confidently be labeled cloudy.

iMerit spent time assessing the images for distinguishing features, regardless of the conditions overhead. iMerit’s annotation experts discovered that the most reliable cue in the scenery was whether or not objects were casting shadows. If shadows were present, the conditions were labeled as clear; if there were no shadows, the conditions were labeled as cloudy. After testing this annotation approach, iMerit delivered the annotated data, and the model performed as expected.
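The rule itself is simple enough to express in a few lines. The sketch below is illustrative only: the field names and the upstream "shadow present" flag are assumptions made for this example, not the client's actual annotation schema.

```python
# Minimal sketch of the shadow-based labeling rule described above.
# Field names and the has_cast_shadows flag are illustrative assumptions.
from typing import TypedDict

class SceneAnnotation(TypedDict):
    image_id: str
    has_cast_shadows: bool  # set when objects in the scene cast visible shadows

def sky_condition(scene: SceneAnnotation) -> str:
    """Label the scene 'clear' when shadows are present, otherwise 'cloudy'."""
    return "clear" if scene["has_cast_shadows"] else "cloudy"

print(sky_condition({"image_id": "frame_0042", "has_cast_shadows": True}))   # clear
print(sky_condition({"image_id": "frame_0043", "has_cast_shadows": False}))  # cloudy
```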

Edge Case #3: The Bad Batch

This client came to iMerit needing terabytes of image data annotated. The images had already been batched, and the client claimed that the only thing left to do was annotate the images. Upon examination of the batches, iMerit teams discovered that they had been inconsistently collected.

For example, one batch’s first image was taken at the 50-minute mark of its source video, the second at the 15-minute mark, and the third at the 20-minute mark. The client had created a new batch every five images, each compiled just as inconsistently. iMerit teams then looked for consistency across the batches and noticed that the second image in each batch followed a similar naming pattern, which was the first sign of order in the data.

They began by annotating the second image in each batch and continued this across all batches. Once those images were annotated, iMerit used its annotation tool to list the first and third images of each batch side by side, so annotators could cross-reference them and work out which images actually belonged together.

After the batches were pieced back together, annotation followed and consistency was achieved.
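As a rough illustration of the filename-driven regrouping described above, the sketch below parses a capture time out of each filename, groups frames by their source video, and sorts them back into order. The filename pattern, and the pattern-matching approach itself, are assumptions made for this example and not the client's real naming convention.

```python
# Illustrative only: the "<video>_<minute>.jpg" filename pattern is an assumption.
import re
from collections import defaultdict

FRAME_RE = re.compile(r"^(?P<video>.+)_(?P<minute>\d+)\.jpg$")

def regroup_by_source(filenames):
    """Group frames by source video and sort each group by capture time,
    recovering a consistent ordering from inconsistently compiled batches."""
    groups = defaultdict(list)
    for name in filenames:
        match = FRAME_RE.match(name)
        if match is None:
            continue  # leave unparseable names for manual review
        groups[match.group("video")].append((int(match.group("minute")), name))
    return {video: [n for _, n in sorted(frames)] for video, frames in groups.items()}

batch = ["drive03_50.jpg", "drive03_15.jpg", "drive03_20.jpg", "drive07_05.jpg"]
print(regroup_by_source(batch))
# {'drive03': ['drive03_15.jpg', 'drive03_20.jpg', 'drive03_50.jpg'],
#  'drive07': ['drive07_05.jpg']}
```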

Edge Case #4: The Rungs

This client asked iMerit to annotate the rungs on electrical poles across a series of drone images. However, each pole was very tall, and depending on the time of day the pole cast different shadows that affected how an image should be annotated. Many of the images were also captured from varying angles, which further influenced how they needed to be labeled.

The client had assumed that asking members of the iMerit team to annotate these poles independently would still produce a consistent interpretation of the rungs. In practice, that would have resulted in an inconsistent labeling process, so iMerit solved it by leveraging a feature in the annotation tool that allowed a set of images to be assigned to a single operator.

This way, one operator annotated all of the images in a series, which eliminated the inconsistencies. This level of professional input into the annotation workflow is another example of why an experienced, centralized annotation workforce outperforms a crowdsourced one.
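In spirit, the assignment logic is just a stable mapping from image series to annotators. The sketch below shows one simple (round-robin) way to express it; the series IDs, operator names, and assignment policy are illustrative assumptions, not Ango Hub's actual API.

```python
# Hedged sketch: assign every image series to exactly one operator so a single
# annotator handles the whole series. Names and the round-robin policy are assumptions.
from itertools import cycle

def assign_series_to_operators(series_ids, operators):
    """Map each image series to one operator, rotating through the operator pool."""
    rotation = cycle(operators)
    return {series_id: next(rotation) for series_id in series_ids}

assignments = assign_series_to_operators(
    ["pole_series_A", "pole_series_B", "pole_series_C"],
    ["annotator_1", "annotator_2"],
)
print(assignments)
# {'pole_series_A': 'annotator_1', 'pole_series_B': 'annotator_2', 'pole_series_C': 'annotator_1'}
```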

Crowdsourced vs Service Provider: Who is Better with Edge Cases?

Without a centralized annotation process, there will be inconsistencies in the labeling. In each of the scenarios above, iMerit’s experience was the difference between solving the edge case and suffering from it. After years of annotating more than a billion images, iMerit teams have developed workflows that deliver the outside-the-box thinking cutting-edge autonomous technology demands.

Working with clients, iMerit can direct operators to annotate in unconventional ways to solve a particular edge case. Understanding a client’s needs is paramount to the success of their AI/ML project, and edge cases are often a sign of a bigger problem on the client’s side. In the “Clear or Cloudy” example above, the client’s assumption that the appearance of the sky alone was the best way for the model to interpret the conditions turned out to be suboptimal.

Had that client been working with a crowdsourced workforce, that workforce most likely would not have challenged the assumption and would have spent the client’s time and resources annotating the images in a way that would not have performed as intended once fed into the model. The same applies across all of the edge cases above: in each one, it was the centralized annotation workforce that ultimately solved the underlying issue.

Conclusion

Edge cases will continue to be one of the biggest challenges for autonomous technology and AI pipelines. The difference between solving them and suffering from them often comes down to how data is managed, annotated, and integrated into your models. The examples above are only the tip of the iceberg of what iMerit has solved in the past and will continue to solve in the coming years.

With iMerit’s Ango Hub, organizations gain a centralized platform to detect, track, and resolve edge cases efficiently. By combining human-in-the-loop expertise with automation-first workflows, Ango Hub ensures that anomalies and hidden data challenges are consistently addressed before being fed into AI models. iMerit Scholars, our specially trained annotation experts, bring the domain expertise and judgment needed to tackle even the most complex or unusual scenarios.

Together, Ango Hub and iMerit Scholars combine automation, human-in-the-loop expertise, and specialized judgment to solve edge cases reliably, giving AI projects the precision, confidence, and performance they need to succeed.