Overcoming Data Challenges in AI-Assisted Pathology

May 22, 2023

Technological advances and a growing focus on precision medicine have paved the way for digital pathology approaches to quantitative pathologic assessment, namely whole slide imaging and artificial intelligence (AI)-based solutions, which allow us to explore and extract information beyond human visual perception. The field is past the point of early adoption, and an inflection point lies ahead at which AI becomes a standard part of routine diagnostic practice.

By analyzing large volumes of data and identifying patterns that may be difficult for human experts to detect, AI can provide new insights for treatment selection and prognosis. Furthermore, it can help bring equity to diagnostics worldwide, enabling patients to receive accurate diagnoses and treatments regardless of location.

Developing an effective AI-powered digital pathology solution is challenging and requires both medical and technical expertise. The complex nature of whole slide images (WSIs) makes it difficult to build a reliable AI model for diagnosis.

This post outlines the main data challenges of AI-enabled digital pathology and ways to overcome them.

Common Data Challenges in Pathology AI

Training deep learning algorithms typically requires pathologists or trained experts to label standard H&E and immunohistochemistry glass slides. The labels can include patient outcomes, clinical classifications, and image annotations. Sometimes, AI models are instead trained on large sets of whole slide images without detailed manual annotation, which reduces the labeling burden on pathologists. In this approach, the model extracts and identifies histopathological features automatically.

In pathology, obtaining high-quality data to improve AI models is not as simple as it seems. Tissue fixation, cutting, and staining procedures vary between laboratories and introduce differences in tissue appearance and morphology. This heterogeneity of input data is a challenge for AI methods.
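One common way to reduce lab-to-lab color differences is to normalize each image against a reference before training. The sketch below is a deliberately simplified illustration, not a description of any particular product's method: it matches the per-channel mean and standard deviation of an incoming RGB tile to those of a reference tile. The function names and the synthetic "lab A" and "lab B" tiles are assumptions made for the example, and production pipelines generally use dedicated stain normalization methods rather than plain RGB statistics.

```python
import numpy as np

def match_channel_stats(tile, target_mean, target_std, eps=1e-6):
    """Shift and scale each RGB channel of `tile` to the target mean and std."""
    pixels = tile.astype(np.float64)
    mean = pixels.mean(axis=(0, 1))   # per-channel mean of this tile
    std = pixels.std(axis=(0, 1))     # per-channel spread of this tile
    matched = (pixels - mean) / (std + eps) * target_std + target_mean
    return np.clip(np.rint(matched), 0, 255).astype(np.uint8)

def synthetic_tile(rng, mean, std, size=64):
    """Stand-in for a real tile: random pixels with a given color cast."""
    pixels = rng.normal(loc=mean, scale=std, size=(size, size, 3))
    return np.clip(pixels, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
# Hypothetical tiles from two labs with different staining intensity.
reference = synthetic_tile(rng, mean=(190, 130, 180), std=(15, 20, 12))  # "lab A"
incoming = synthetic_tile(rng, mean=(150, 90, 160), std=(30, 20, 25))    # "lab B"

ref_mean = reference.mean(axis=(0, 1))
ref_std = reference.std(axis=(0, 1))
normalized = match_channel_stats(incoming, ref_mean, ref_std)

print("before:", incoming.mean(axis=(0, 1)).round(1))
print("after: ", normalized.mean(axis=(0, 1)).round(1))
print("target:", ref_mean.round(1))
```

After normalization, the color statistics of the incoming tile should sit close to those of the reference, so a model sees more consistent inputs across laboratories.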

Let us look at the data challenges companies face while developing AI models for digital pathology.

Lack of Training Data

AI algorithms typically require a vast collection of high-quality annotated training data. Annotation means that a pathologist delineates the region of interest in all images, indicating anomalies or malignancy. Annotation is typically best done by experts, but this is time-consuming and can also create financial obstacles for model development. Crowdsourcing may be a cheaper and faster alternative, but it risks introducing noise. 
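A simple way to limit noise from multiple annotators is to aggregate their labels and send low-agreement cases to an expert. The following is a minimal sketch under assumed conditions: each tile has a handful of independent votes, the label names and tile IDs are made up, and the 0.6 agreement threshold is an arbitrary illustrative value.

```python
from collections import Counter

def aggregate_labels(votes, min_agreement=0.6):
    """Majority vote over annotator labels.

    Returns (label, agreement). The label is None when the share of
    annotators backing the top label falls below `min_agreement`.
    """
    counts = Counter(votes)
    label, top_count = counts.most_common(1)[0]
    agreement = top_count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement

# Hypothetical votes from three annotators per tile.
tile_votes = {
    "tile_001": ["tumor", "tumor", "tumor"],
    "tile_002": ["tumor", "benign", "tumor"],
    "tile_003": ["tumor", "benign", "stroma"],
}

consensus = {}
needs_expert_review = []
for tile_id, votes in tile_votes.items():
    label, agreement = aggregate_labels(votes)
    if label is None:
        needs_expert_review.append(tile_id)  # too much disagreement
    else:
        consensus[tile_id] = label

print("consensus:", consensus)
print("needs expert review:", needs_expert_review)
```

With this split, expert time is spent only on the tiles where cheaper annotation disagrees, rather than on every image.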

For pathologists, annotating large numbers of images in detail can be tedious and difficult, especially when images are low-resolution or blurry, networks are slow, or features are ambiguous. On top of this, few public datasets of labeled images exist for this purpose.


There are four basic tissue types: epithelium, connective tissue, nervous tissue, and muscle. In histopathology images, however, the number of patterns derived from these tissues is almost infinite. Tissue types combine to form organs with new textural variations, which makes it difficult for image analysis algorithms to identify tissues. This is a challenge for deep learning, which requires many training cases for each tissue variation; such cases may not be readily available, especially as labeled data.


In digital pathology, research papers often focus on classification problems with only two possible outcomes, benign or malignant, which oversimplifies the complexity of pathology diagnosis. Pathologists consider clinical context, perception, and experience in their diagnostic process. For complex and rare cases, they sometimes use cautious or descriptive language, and that language has ramifications for monitoring and treatment.

Scalability, Security & Cost

In addition to data challenges, other significant factors complicate training models for pathology. Scalability is an issue, as large datasets of high-resolution images are computationally intensive and time-consuming to process. Data security is also crucial, as medical images and patient information must be kept secure in compliance with regulatory requirements. Finally, cost is a significant factor: acquiring and storing gigapixel histopathological scans can be expensive, and the cost of advanced hardware, such as GPUs, can limit the adoption of AI-assisted digital pathology.
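To get a feel for the scale involved, the back-of-the-envelope sketch below splits one slide into fixed-size tiles and estimates its uncompressed size. All numbers are assumptions chosen for illustration: a hypothetical slide of 100,000 x 80,000 pixels, 512-pixel tiles, and 3 bytes per RGB pixel. Real slides vary widely and are stored compressed.

```python
import math

def tile_grid(width, height, tile_size):
    """Number of tile columns and rows needed to cover the slide."""
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return cols, rows

def iter_tiles(width, height, tile_size):
    """Yield (x, y, w, h) boxes covering the slide; edge tiles are cropped."""
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            yield x, y, min(tile_size, width - x), min(tile_size, height - y)

# Assumed, illustrative values (not taken from any specific scanner).
SLIDE_WIDTH, SLIDE_HEIGHT = 100_000, 80_000
TILE_SIZE = 512
BYTES_PER_PIXEL = 3  # 8-bit RGB

cols, rows = tile_grid(SLIDE_WIDTH, SLIDE_HEIGHT, TILE_SIZE)
num_tiles = cols * rows
uncompressed_gb = SLIDE_WIDTH * SLIDE_HEIGHT * BYTES_PER_PIXEL / 1e9

print(f"tile grid: {cols} x {rows} = {num_tiles} tiles")
print(f"uncompressed size: {uncompressed_gb:.1f} GB")
```

Under these assumptions a single slide yields tens of thousands of tiles, which is why training on thousands of slides quickly becomes a storage and compute problem.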

Overcoming Data Challenges in AI-Assisted Pathology with iMerit

Adherence to good machine learning practices during AI development helps to mitigate some of these risks. At iMerit, we recognize the intricacies of medical AI data and have developed a resilient solution to overcome the challenges described above.

End-to-End Data Solution

From training data to regulatory validation, iMerit’s trained pathology experts design custom data annotation workflows to achieve high-quality ground truth data and provide cost-efficient scaling. 

Specialized Workforce

To tackle the challenges of data complexity, variability, and scalability, iMerit has developed a specialized workforce, trained through a curriculum-driven program, that works hand in hand with pathology experts to deliver quality at scale. Our team includes US board-certified physicians for benchmarking and validation.

Tool-Agnostic Approach

iMerit’s tooling ecosystem, which includes more than six partners, prioritizes the needs of healthcare clients and ensures compatibility with project requirements.

HIPAA Compliance

iMerit is committed to maintaining the security and privacy of sensitive healthcare data. As such, we have implemented HIPAA-compliant measures to safeguard the data we handle. Additionally, we utilize regulatory-oriented tools designed to protect the data and ensure product success.


AI solutions for pathology have enormous potential to enhance efficiency, reduce costs, and speed up diagnosis. However, this potential can be realized only if these AI models are trained on high-quality datasets using a combination of domain experts, trained specialists, and best practices.

iMerit is an industry leader in labeling, annotation, segmentation, transcription, and analysis of diverse data sets – images, text, video, audio, and more. We have extensive experience in providing annotation services across 20 million data points for the healthcare sector.

Are you looking for data specialists to advance your Pathology AI project? Here is how iMerit can help.