3D Mammogram Dataset

Clinical Grade Annotations

Open-source 3D mammogram dataset built for AI breast cancer detection.

Open-Source 3D Mammogram Dataset for Breast Cancer AI

iMerit, a leader in software-delivered AI data, in collaboration with Segmed and Advocate Health, announced the release of an open-source, annotated mammogram dataset designed to accelerate the development of artificial intelligence (AI) applications for breast cancer detection. The dataset has been reviewed and vetted by U.S. board-certified radiologists to ensure accuracy and clinical relevance.

1 in 8 WOMEN DIAGNOSED

Early Detection SAVES LIVES

90% 5-YEAR SURVIVAL RATE

Dataset Summary

  • 558 female patients
  • Digital Breast Tomosynthesis (3D Mammography)
  • 271 malignant (48.5%) / 287 benign (51.5%) cases
  • Average lesion size: 1.34 cm
  • Approximately 85% of lesions <2 cm
  • Expert segmentation annotations
  • DICOM, NRRD, and structured JSON outputs
  • Fully de-identified in compliance with HIPAA and GDPR
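As a rough illustration of how the structured JSON outputs might be used, the sketch below tallies cases per label and the share of lesions under 2 cm. The field names (`case_id`, `label`, `lesion_size_cm`) and the three sample records are hypothetical placeholders, not the dataset's actual schema or contents; check the released files for the real structure.

```python
import json

# Hypothetical annotation records. The field names and values are
# illustrative placeholders, not the dataset's actual schema or data.
SAMPLE_JSON = """
[
  {"case_id": "case-001", "label": "malignant", "lesion_size_cm": 1.1},
  {"case_id": "case-002", "label": "benign", "lesion_size_cm": 2.4},
  {"case_id": "case-003", "label": "benign", "lesion_size_cm": 0.9}
]
"""


def summarize(records, size_threshold_cm=2.0):
    """Count cases per label and the fraction of lesions below a size threshold."""
    label_counts = {}
    small = 0
    for rec in records:
        label_counts[rec["label"]] = label_counts.get(rec["label"], 0) + 1
        if rec["lesion_size_cm"] < size_threshold_cm:
            small += 1
    total = len(records)
    return {
        "total": total,
        "label_counts": label_counts,
        "fraction_below_threshold": small / total if total else 0.0,
    }


summary = summarize(json.loads(SAMPLE_JSON))
print(summary)
```

The same function could be pointed at records loaded from the released JSON once the field names are mapped to the real ones.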

Detailed scientific methodology and clinical validation for this cohort can be found in the DBT-2026 preprint, “Technical Framework for De-identified 3D Mammography with Ground Truth Biopsies.” The study was co-authored by clinical experts from Segmed and iMerit.

Wu J, Perandini L, Batra T, Igoshin S, Bari S, de Araujo AL, Willemink MJ. DBT-2026, a de-identified publicly available dataset of digital breast tomosynthesis exams with ground truth biopsies. medRxiv 2026.03.03.25337924; doi: 10.64898/2026.03.03.25337924.

Collaboration for Women’s Health

This initiative reflects a shared commitment by iMerit, Segmed, and Advocate Health to advance women’s health by fostering open research and collaboration. By making the dataset open-source, the partners aim to lower barriers for academic researchers, startups, and established institutions alike.

About iMerit
iMerit is a leading AI data company that powers advanced machine learning and artificial intelligence models through its software, Ango Hub.
About Segmed
Segmed, Inc. streamlines access to diverse, high-quality medical imaging studies for biopharmaceutical R&D and AI development.
About Advocate Health
Advocate Health is one of the largest not-for-profit health systems in the U.S., advancing clinical excellence and innovative research.

TOP HEALTH COMPANIES CHOOSE iMERIT

iMerit provides AI data solutions for leading brands and technology innovators advancing healthcare AI. With more than 10 years of data annotation experience, a full-time workforce of 5,500 employees, and world-class technology, iMerit provides the highest-quality data in the industry.

CERTIFIED EXPERTS

Our data is labeled and reviewed by board-certified radiologists and domain-trained annotators experienced in DICOM/NRRD workflows.

ADVANCED TOOLING

From mammography and tomosynthesis to ultrasound and MRI, we support advanced views for annotation and labeling: multi-resolution zoom, side-by-side comparisons, symmetry alignment, pre-labeling, and more.

REGULATORY-GRADE DATA

Workflows designed for FDA 510(k), MQSA, and HIPAA compliance with audit trails, metadata tagging, and rigorous QA processes for annotated and labeled clinical datasets.

Frequently Asked Questions

The breast cancer imaging dataset includes de-identified medical images (e.g., mammography, ultrasound, or MRI depending on modality), curated for AI development and evaluation. It is enriched with clinically validated ground-truth labels and structured metadata to support detection, classification, and segmentation use cases in medical imaging AI.

The dataset was annotated using a multi-reader clinical workflow, where qualified experts labeled findings with defined consensus processes. Human-in-the-loop quality assurance and adjudication were applied to ensure labeling consistency, reduce inter-observer variability, and produce regulatory-grade ground truth for model training and evaluation.
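To make the idea of multi-reader agreement concrete, here is a minimal sketch of two common techniques: a Dice overlap score between two readers' segmentation masks, and a majority-vote consensus mask. This is an assumption for illustration only; the page does not state which agreement metric or consensus rule was actually used. Masks are represented as sets of voxel coordinates, and the three reader masks are made-up toy data.

```python
def dice(mask_a, mask_b):
    """Dice overlap between two binary masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        return 1.0  # two empty masks agree perfectly
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def majority_vote(masks):
    """Keep only the voxels marked by more than half of the readers."""
    votes = {}
    for mask in masks:
        for voxel in mask:
            votes[voxel] = votes.get(voxel, 0) + 1
    return {voxel for voxel, n in votes.items() if n > len(masks) / 2}


# Toy masks from three hypothetical readers (z, y, x voxel indices).
reader_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reader_2 = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
reader_3 = {(0, 0, 1), (0, 1, 1)}

agreement_1_2 = dice(reader_1, reader_2)
consensus = majority_vote([reader_1, reader_2, reader_3])
print(round(agreement_1_2, 3), sorted(consensus))
```

A low Dice score between readers would typically flag a case for adjudication, while the consensus mask is one possible way to derive a single ground-truth segmentation.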

Yes. The dataset is designed to support regulatory-aligned AI workflows, with emphasis on traceability, annotation documentation, and validation rigor. Structured labeling protocols and quality controls help support evidence generation needed for FDA, CE, or SaMD pathways.

This dataset is ideal for medical imaging AI startups, healthcare AI research teams, diagnostic platform developers, and life sciences organizations developing breast cancer detection and analysis models. It supports both research and commercial AI development stages.