Why Human Feedback in AI Model Tuning Still Matters: Comparing Meta’s SAM to Purpose-Built Models for Specialized Use Cases

The speed at which AI is advancing is undeniable. Every week, advances in technologies such as computer vision, generative language AI, and robotics make headlines, promising more, better, and faster to the people and businesses that use them. In turn, the world of data annotation is advancing at a similar pace. As data-hungry AI and ML models require higher-quality data to achieve better precision, annotation tools and the people who use them must be better equipped to meet the demand.

To meet this demand, data annotation technologies require automation to deliver better scale and speed throughout the MLOps lifecycle. A number of tools have been developed to address the need to deliver better accuracy while annotating increasingly large volumes of data – all while keeping costs down.

Enter SAM

In April 2023, Meta introduced its Segment Anything Model (SAM). SAM is intended to help AI teams automate their image and video segmentation processes by providing a powerful model trained on a massive dataset. Its promise is that, having been trained on the world’s largest segmentation dataset of more than a billion masks across 11 million images, it can generalize to new types of images and video.

Some pundits claimed that with models like SAM, human-assisted data annotation would quickly become a thing of the past. Although SAM can automate many image annotation tasks with confidence, the model can struggle to annotate images accurately as complexity grows. This is especially true for specialized use cases such as medical imaging or autonomous vehicles, where precision is paramount and can be a life-or-death matter.

Testing the Possibilities

iMerit researchers devised a test to better understand SAM’s capabilities, especially when examining data for highly specialized use cases such as radiology imaging for medical AI. The team built a custom A/B test project annotating radiology images from a public database of the National Institutes of Health (NIH). The test compared SAM’s ability to identify anatomic structures and pathology in medical images against a human expert-trained model custom built by iMerit engineers.

The task was to identify tumors in images of diseased lung tissue. To establish a ground truth for comparison, the images and results were vetted by a board-certified radiologist. The images contained tumors and anomalies of varying sizes, and the models were used to identify and annotate occurrences of tumors. SAM was able to accurately identify lung tissue in scans taken in the axial plane when the lungs were easily distinguishable. However, SAM failed to identify lung tissue when the lungs were obstructed and was unable to identify tumors of any size within the images.

iMerit found that a customized ML model was necessary to get the accuracy needed for radiology image annotation. To test the accuracy of SAM’s out-of-the-box capabilities against a more specialized approach, iMerit developed a custom-trained model to analyze the medical data and identify diseased tissue in the lungs, including different types and sizes of tumors. Using expertly trained human annotators with a background in radiology imaging, iMerit built a high-quality training set and trained a custom model with a continuous feedback loop. The iMerit model developed the ability not only to identify organ types more accurately, even with obstructions, but also to identify tumors. Using this approach, the model was able to operate with greater precision and overcome edge cases that came up in the source data.

The following images show examples of the iMerit model vs. SAM in correctly identifying tumors. The images on the left show iMerit’s model, which accurately identified tumors by measuring a number of factors specific to medical imaging data, including a combination of Dice coefficient and cross-entropy on axial images. The tumor identification by the iMerit model was independently verified by a radiology expert. By comparison, it is clear that the generalized SAM model was unable to identify tumor tissue with any degree of accuracy.
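For readers unfamiliar with the two measures named above, here is a minimal sketch of how a Dice coefficient and a cross-entropy term are commonly combined for a binary segmentation mask. It is not iMerit’s implementation: the function names, the equal weighting (`dice_weight=0.5`), and the toy 4x4 masks are all assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask, eps=1e-7):
    """Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred = np.asarray(pred_mask, dtype=np.float64)
    true = np.asarray(true_mask, dtype=np.float64)
    intersection = np.sum(pred * true)
    # eps keeps the ratio defined when both masks are empty.
    return float((2.0 * intersection + eps) / (np.sum(pred) + np.sum(true) + eps))


def binary_cross_entropy(pred_prob, true_mask, eps=1e-7):
    """Mean per-pixel binary cross-entropy between probabilities and labels."""
    p = np.clip(np.asarray(pred_prob, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(true_mask, dtype=np.float64)
    return float(np.mean(-(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))))


def dice_ce_loss(pred_prob, true_mask, dice_weight=0.5):
    """Weighted sum of Dice loss (1 - Dice) and cross-entropy.

    The 50/50 weighting is a hypothetical default, not a value from the article.
    """
    dice_loss = 1.0 - dice_coefficient(pred_prob, true_mask)
    ce_loss = binary_cross_entropy(pred_prob, true_mask)
    return dice_weight * dice_loss + (1.0 - dice_weight) * ce_loss


# Toy masks (made up): the prediction misses one of four "tumor" pixels.
truth = np.array([[0, 0, 0, 0],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [0, 0, 0, 0]])
predicted = np.array([[0, 0, 0, 0],
                      [0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [0, 0, 0, 0]])

toy_dice = dice_coefficient(predicted, truth)  # 2*3 / (3 + 4), about 0.857
toy_loss = dice_ce_loss(predicted, truth)
```

Dice rewards overlap relative to the size of the regions, which is why it is a common choice when the target, such as a small tumor, occupies only a few pixels; cross-entropy scores every pixel independently.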

The Need for the Expert in the Loop

The results of the test were clear. In its original iteration, SAM was unable to accurately identify tumors in the images. Whether it was delivering false positives or failing to identify tumors altogether, the original SAM model struggled to precisely annotate the more sophisticated data sets found in medical imaging. By contrast, iMerit’s custom-built model, trained by experienced experts-in-the-loop, achieved more than 75% accuracy for tumor identification overall, and more than 95% accuracy for large tumors with diameters of 3 cm or greater. For context, studies have shown that in some areas of South Asia, radiologists are overtaxed by the sheer volume of images they must examine. As a result, the error rate for identifying or diagnosing tumors in scans in some countries can reach as much as 70%. A tuned expert-in-the-loop model such as the one developed by iMerit can yield dramatically improved results and help radiologists prioritize patients to deliver more life-saving treatments.
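The article does not describe how these accuracy figures were computed. As one plausible reading, the sketch below reports a detection rate overall and separately for tumors at or above the 3 cm threshold, measured against radiologist-vetted ground truth. The `TumorCase` type, the function names, and the five toy cases are hypothetical and are not iMerit’s data or results.

```python
from dataclasses import dataclass

LARGE_TUMOR_CM = 3.0  # threshold from the text: "3 cm or greater"


@dataclass
class TumorCase:
    diameter_cm: float  # size recorded in the ground truth
    detected: bool      # whether the model found this tumor


def detection_rate(cases):
    """Fraction of ground-truth tumors the model detected, or None if empty."""
    cases = list(cases)
    if not cases:
        return None
    return sum(c.detected for c in cases) / len(cases)


def rates_by_size(cases, threshold_cm=LARGE_TUMOR_CM):
    """Detection rate overall and split at the size threshold."""
    cases = list(cases)
    large = [c for c in cases if c.diameter_cm >= threshold_cm]
    small = [c for c in cases if c.diameter_cm < threshold_cm]
    return {
        "overall": detection_rate(cases),
        "large": detection_rate(large),
        "small": detection_rate(small),
    }


# Made-up cases for illustration only.
toy_cases = [
    TumorCase(4.2, True),
    TumorCase(3.0, True),
    TumorCase(1.9, False),
    TumorCase(1.1, False),
    TumorCase(0.8, True),
]
toy_rates = rates_by_size(toy_cases)
```

Splitting the rate by size makes it visible when an overall figure is carried mostly by the larger, easier cases.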

iMerit’s test showed that the combination of technology, talent, and technique, driven by human experts-in-the-loop, can augment even today’s hottest and newest technologies. Precision results require far more than the model and the annotation. The real power of driving results comes from subject expertise. Human experts-in-the-loop who understand the problem and deliver the right insights and solution can have a dramatic impact on outcomes.

Taking it to Scale: Life Saving Advancements

Expertly trained models such as iMerit’s demonstrate a small step toward using machine learning to drive better healthcare outcomes for patients. It should be noted that few would expect a generalized model such as SAM to immediately be able to identify complex cases such as tumors in radiology scans. However, many in the industry claim that generalized models such as SAM eliminate the need for human-driven annotation. It’s clear that this is not yet the case.

Moreover, by using human experts-in-the-loop, iMerit was able to easily surpass the limitations of SAM’s annotation technology in delivering precision results and identifying edge cases. Despite advances in automated annotation technologies and tools, most experienced AI and ML practitioners agree there is still significant need for a human in the loop. In a recent iMerit/VentureBeat study, 86% of respondents indicated that subjectivity and inconsistency are the primary challenges for data annotation in any ML model. Additionally, 65% of respondents stated that a dedicated workforce with domain expertise was required for successful AI-ready data.

This expert-driven process represents a promising future combining automated labeling and human expertise to deliver better precision. According to the State of MLOps report, 82% of AI experts reported that scaling wouldn’t be possible without investing in both automated annotation technology and human data labeling expertise.

There is still a long way to go before automated AI can accomplish what a radiologist can do. However, there is immediate value in results such as those from this project. Tools such as the model built for this project can help doctors identify cancers more quickly and prioritize which patients need treatment. An expert-trained model such as the one developed by iMerit can help reduce the volume of scans that radiologists must examine, helping them to reach more patients and deliver life-saving treatments.

iMerit has deep expertise and experience managing data for many of the world’s top innovators in medical AI, working with leading pharmaceutical companies, device manufacturers, health plans, and provider networks to deliver quality, secure, HIPAA-compliant data solutions both locally and off-shore. iMerit’s experts-in-the-loop are trained, supervised, and quality checked by board-certified radiologists to deliver high-quality data that helps improve model precision.

For more information on iMerit’s data management and annotation solutions for medical AI datasets, read here.

Are you looking for data annotation to advance your Medical AI project? Contact us today.
