With Machine Learning, It’s All About The Ground Truth Data

June 19, 2019

AI in healthcare and medicine is a growing area of investment, reaching almost $1.3 billion across 107 deals in 2017. The availability of the right data at scale is a critical factor in this space.

In the Global AI issue of AIMed magazine, iMerit Solutions Architect Dr Sina Bari writes about the data bottleneck and how a skilled workforce of data contributors can tackle it.

Some excerpts from the article:

The art and science of diagnostics is perhaps the most difficult challenge for AI, as each clinician’s decision is often informed by a lifetime of experience and study. It is also the part of the medical pipeline that offers the greatest opportunity for machine learning to produce a significant impact. An accurate image analysis algorithm, for instance, can relieve the bottleneck caused by the low specialist-to-patient ratio in most geographies around the world. But for AI to be successful, three facets of the data challenge – scale, accuracy, and cost – have to be resolved so that models can be trained to reach their full potential.

The sheer volume of medical data presents a challenge to scaling. A single MRI, CT, PET, or ultrasound scan generates thousands of images, and up to 80% of a radiologist’s time is spent going through each image one at a time and organizing the findings. However, if experts like radiologists, cardiologists, or pathologists have to segment and label datasets with hundreds of thousands of images for the purpose of teaching algorithms, the same bottleneck is aggravated further. Using medical experts to label the data also quickly escalates the expense of the operation and the cost of building a viable product.

There are several advantages to partnering with a dedicated in-house workforce. The secure environment helps customers manage their data pipeline with confidence. Once the project is underway, a continuous and iterative review process takes place, particularly when dealing with tricky edge cases. Large-volume data exposure combined with expert supervision then provides insights into long-tail problems and obstacles. Lastly, a multipass workflow ensures the degree of precision and accuracy necessary for medical applications. Labelers gain further expertise by expanding their understanding to new modalities and pathologies over time.

Download the entire article here.