Clinical Grade Annotations
Open-source 3D mammogram dataset built for AI breast cancer detection.
iMerit, a leader in software-delivered AI data, in collaboration with Segmed and Advocate Health, announced the release of an open-source, annotated mammogram dataset designed to accelerate the development of artificial intelligence (AI) applications for breast cancer detection. The dataset has been reviewed and vetted by U.S. board-certified radiologists to ensure high standards of accuracy and clinical relevance.
1 in 8 women diagnosed
Early detection: 90% 5-year survival rate
Detailed scientific methodology and clinical validation for this cohort can be found in the DBT-2026 preprint: “Technical Framework for De-identified 3D Mammography with Ground Truth Biopsies”. This study was co-authored by clinical experts from Segmed and iMerit.
Wu J, Perandini L, Batra T, Igoshin S, Bari S, de Araujo AL, Willemink MJ. DBT-2026, a de-identified publicly available dataset of digital breast tomosynthesis exams with ground truth biopsies. medRxiv 2026.03.03.25337924; doi: 10.64898/2026.03.03.25337924.
This initiative reflects a shared commitment by iMerit, Segmed, and Advocate Health to advance women’s health by fostering open research and collaboration. By making the dataset open-source, the partners aim to lower barriers for academic researchers, startups, and established institutions alike.
CHOOSE iMERIT
iMerit is a leader in providing AI data solutions for the leading brands and technology innovators advancing healthcare AI today. With over 10 years of data annotation experience, a full-time workforce of 5,500 employees, and world-class technology, iMerit provides the highest quality data in the industry.
Our data is labeled and reviewed by board-certified radiologists and domain-trained annotators experienced in DICOM/NRRD workflows.
From mammography and tomosynthesis to ultrasound and MRI, we support advanced views for annotation and labeling: multi-resolution zoom, side-by-side comparisons, symmetry alignment, pre-labeling, and more.
What is included in the breast cancer imaging dataset?
The breast cancer imaging dataset includes de-identified medical images (e.g., mammography, ultrasound, or MRI depending on modality), curated for AI development and evaluation. It is enriched with clinically validated ground-truth labels and structured metadata to support detection, classification, and segmentation use cases in medical imaging AI.
How was the breast cancer dataset annotated and validated?
The dataset was annotated using a multi-reader clinical workflow, where qualified experts labeled findings with defined consensus processes. Human-in-the-loop quality assurance and adjudication were applied to ensure labeling consistency, reduce inter-observer variability, and produce regulatory-grade ground truth for model training and evaluation.
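As a rough illustration of how a multi-reader consensus step with adjudication can work in general, the sketch below applies a simple majority vote to the labels several readers assign to one finding. It is not the workflow used for this dataset: the function name `consensus_label`, the `min_votes` threshold, and the label strings are all hypothetical choices made for the example.

```python
from collections import Counter


def consensus_label(reader_labels, min_votes=2):
    """Return (label, needs_adjudication) for a single finding.

    reader_labels: list of label strings, one per reader (hypothetical format).
    min_votes: assumed minimum number of agreeing readers for consensus.
    """
    if not reader_labels:
        raise ValueError("at least one reader label is required")

    # Count how many readers chose each label, most common first.
    ranked = Counter(reader_labels).most_common()
    top_label, top_votes = ranked[0]

    # A tie for first place means there is no clear majority.
    tied = len(ranked) > 1 and ranked[1][1] == top_votes

    if top_votes >= min_votes and not tied:
        return top_label, False

    # No consensus: leave the label empty and flag the case for an adjudicator.
    return None, True


if __name__ == "__main__":
    print(consensus_label(["malignant", "malignant", "benign"]))
    print(consensus_label(["malignant", "benign", "normal"]))
```

In this sketch, cases without a clear majority are not resolved automatically; they are flagged so that a human adjudicator makes the final call, which is one common way to reduce inter-observer variability.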
Is the dataset suitable for training regulatory-ready medical AI models?
Yes. The dataset is designed to support regulatory-aligned AI workflows, with emphasis on traceability, annotation documentation, and validation rigor. Structured labeling protocols and quality controls help support evidence generation needed for FDA, CE, or SaMD pathways.
Who should use this breast cancer imaging dataset?
This dataset is ideal for medical imaging AI startups, healthcare AI research teams, diagnostic platform developers, and life sciences organizations developing breast cancer detection and analysis models. It supports both research and commercial AI development stages.