Annotation Protocol

Overview and Purpose

This annotation protocol was designed to create a high-quality, radiologist-validated dataset for breast imaging analysis using mammography and digital breast tomosynthesis (DBT). The annotation process was report-guided and performed by trained radiologists to ensure accurate lesion localization, segmentation, and structured characterization.

The primary objective of the protocol was to extract detailed lesion-level and study-level information from radiology reports and accurately map these findings onto corresponding imaging data through structured classification and pixel-level segmentation.

Data Sources and Imaging Modalities

Annotations were performed using the following imaging inputs:

  • Digital mammography
  • Digital breast tomosynthesis (DBT)
  • Synthetic mammography images (when available)
  • Corresponding radiology reports, which served as the primary reference standard
  • Histopathological reports (when available)

All standard mammographic and tomosynthesis views were reviewed. Lesions were annotated across all views and slices where clearly visible.

Annotation Workflow

The annotation process was performed in a structured, multi-step manner:

Step 1: Report-Guided Interpretation

Radiologists carefully reviewed the associated radiology report before initiating image annotation. The report served as the primary source of reference and provided critical study-level and lesion-level information, including:

Study-level parameters:

  • Breast density (BI-RADS A–D)
  • Overall BI-RADS assessment category (0–6)
  • Breast composition
  • Final impression and diagnostic summary
  • Histopathological reports (when available)

Lesion-level parameters:

  • Number of lesions
  • Laterality (left or right breast)
  • Lesion type (mass, asymmetry, architectural distortion, suspicious calcification, etc.)
  • Lesion size
  • Lesion-specific BI-RADS category (if provided)
  • Histopathological information (if available)

Radiologists used this report-based information to guide subsequent annotation and classification.

Step 2: Lesion Identification and Segmentation

All lesions described in the radiology report were identified on mammography and tomosynthesis images.
For each lesion:

  • Precise segmentation annotations were performed on all tomosynthesis slices and mammographic views where the lesion was clearly visible.
  • Segmentation was performed using region-based delineation to accurately capture lesion boundaries.
  • Each lesion was assigned a unique Lesion ID, which was consistently maintained across all views and imaging modalities.
  • Lesion annotations were performed based on clear correlation between the report description and imaging findings.

This ensured accurate lesion localization and spatial consistency across imaging datasets.

Step 3: Structured Classification and Metadata Annotation

In addition to segmentation, detailed structured classification was performed at both the study and lesion levels.

Study-Level Classification

Radiologists recorded the following report-derived study-level parameters:

  • Breast density (BI-RADS A, B, C, or D)
  • Overall BI-RADS category
  • Study-level assessment

These values were directly extracted from the report to preserve the original diagnostic interpretation.

An “Annotator disagreement” comment field was incorporated for each classification parameter to document instances where the annotating radiologist’s interpretation differed from the report. This captured additional expert-derived insight while preserving the original report-based diagnostic classification.
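As an illustration only, the study-level fields above could be held in a small validated record. The sketch below assumes a Python representation; the class name `StudyRecord`, the `study_id` field, and the field names are hypothetical and not part of the protocol. The allowed values (density A to D, BI-RADS 0 to 6) come from the parameter lists in Step 1, and the record is frozen to mirror the rule that report-derived values are preserved as extracted.

```python
from dataclasses import dataclass

# Allowed values taken from the protocol's study-level parameter list.
VALID_DENSITIES = {"A", "B", "C", "D"}   # BI-RADS breast density
VALID_BIRADS = set(range(0, 7))          # overall BI-RADS category 0-6


@dataclass(frozen=True)  # frozen: report-derived values are not edited later
class StudyRecord:
    study_id: str        # hypothetical identifier, not defined by the protocol
    breast_density: str
    overall_birads: int
    assessment: str      # free-text study-level assessment from the report

    def __post_init__(self):
        # Reject values outside the ranges named in the protocol.
        if self.breast_density not in VALID_DENSITIES:
            raise ValueError(f"unknown breast density: {self.breast_density!r}")
        if self.overall_birads not in VALID_BIRADS:
            raise ValueError(f"BI-RADS out of range: {self.overall_birads!r}")
```

A record such as `StudyRecord("study-001", "C", 4, "suspicious finding")` would be accepted, while a density of "E" would raise a `ValueError`.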

Lesion-Level Classification

Each annotated lesion was characterized using structured metadata fields, including:

  • Lesion ID
  • Laterality (left/right breast)
  • Lesion type and appearance
  • Lesion size
  • Lesion density characteristics (if described)
  • Lesion-specific BI-RADS category (if available)
  • Histopathology classification (if biopsy information was available)
  • Lesion origin classification:
    • Lesion described in report
    • Additional lesion identified during radiologist image review

Lesion IDs were maintained consistently across mammography and tomosynthesis images to ensure cross-modality consistency.
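One possible way to represent the lesion-level metadata fields is sketched below. This is an assumption about storage, not the schema used in the project: the names `LesionRecord` and `LesionOrigin` and the field names are hypothetical, and lesion size is kept as free text because the protocol does not state a unit. Fields described as "if available" are optional.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LesionOrigin(Enum):
    # The two origin classes listed in the protocol.
    REPORT = "Lesion described in report"
    IMAGE_REVIEW = "Additional lesion identified during radiologist image review"


@dataclass
class LesionRecord:
    lesion_id: str                        # shared across views and modalities
    laterality: str                       # "left" or "right"
    lesion_type: str                      # e.g. "mass", "asymmetry"
    origin: LesionOrigin
    size: Optional[str] = None            # free text; unit is not specified
    density: Optional[str] = None         # only if described
    birads: Optional[str] = None          # lesion-specific, only if available
    histopathology: Optional[str] = None  # only if biopsy information exists

    def __post_init__(self):
        if self.laterality not in ("left", "right"):
            raise ValueError(f"laterality must be left or right: {self.laterality!r}")
```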

Step 4: Annotation of Additional Radiologist-Detected Lesions

The protocol included provisions for capturing additional findings identified during image review that were not explicitly described in the report.

If a radiologist identified an additional lesion:

  • The lesion was assigned a new Lesion ID.
  • It was marked as “not mentioned in a report” in the “lesion info” field of the lesion descriptors.
  • The lesion was segmented across all relevant mammography and tomosynthesis views.
  • Structured classification was completed in the same way as for report-described lesions.

This allowed capture of additional radiologist-interpreted findings while preserving the distinction between report-derived and radiologist-detected lesions.
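The handling of an additional lesion can be sketched as follows. The ID format `L<number>`, the dictionary keys, and both function names are assumptions made for illustration; only the “not mentioned in a report” label and the requirement of a new, unused Lesion ID come from the protocol.

```python
def next_lesion_id(existing_ids):
    """Return an unused ID of the assumed form 'L<number>'."""
    highest = max((int(i[1:]) for i in existing_ids), default=0)
    return f"L{highest + 1}"


def add_unreported_lesion(lesions, lesion_type, laterality):
    """Append a radiologist-detected lesion and return its new ID.

    `lesions` is a list of dicts, one per lesion already in the study.
    """
    new_id = next_lesion_id(lesion["lesion_id"] for lesion in lesions)
    lesions.append({
        "lesion_id": new_id,
        "lesion_type": lesion_type,
        "laterality": laterality,
        # Label from the protocol's "lesion info" classification.
        "lesion_info": "not mentioned in a report",
    })
    return new_id
```

Using the highest existing number plus one, rather than the first free number, avoids reusing an ID that might have been retired earlier in the annotation process.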

Step 5: Annotator Disagreement and Commentary

To preserve interpretive transparency and enable secondary analysis, the protocol incorporated structured disagreement documentation.

For each classification field, annotators were provided with a disagreement comment option to record situations where imaging interpretation differed from the report.

Examples include:

  • Differences in BI-RADS category assessment
  • Differences in lesion characterization
  • Differences in lesion identification or appearance interpretation

In such cases:

  • The original report classification was preserved in the structured fields.
  • Annotators recorded their alternative interpretation in the “Annotator disagreement” comment field.

This approach allowed retention of report fidelity while capturing expert radiologist insight.
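A minimal sketch of this pairing, assuming each classification field is stored together with its own comment slot, is shown below. The class name `ClassifiedField` and its attributes are hypothetical. The point illustrated is that recording a disagreement never overwrites the report value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassifiedField:
    report_value: str                           # value extracted from the report
    disagreement_comment: Optional[str] = None  # annotator's alternative view

    def record_disagreement(self, comment: str) -> None:
        """Store the annotator's interpretation; report_value is untouched."""
        self.disagreement_comment = comment

    @property
    def has_disagreement(self) -> bool:
        return bool(self.disagreement_comment)
```

For example, a BI-RADS field created as `ClassifiedField("4")` keeps `report_value == "4"` after `record_disagreement(...)` is called, so secondary analyses can compare the two interpretations.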

Cross-View and Cross-Modality Consistency

To ensure annotation reliability:

  • Each lesion maintained a consistent Lesion ID across:
    • All mammographic views
    • All tomosynthesis slices
    • Synthetic images (if available)
  • Segmentation was performed on all views where the lesion was clearly visible.

This ensured spatial and anatomical consistency of lesion annotations across the dataset.
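A consistency rule of this kind lends itself to an automated check. The sketch below is one hypothetical form of such a check, not a tool described in the protocol: it assumes each annotation is a dict with `lesion_id` and `laterality` keys and flags any Lesion ID that appears with more than one laterality across views or slices.

```python
from collections import defaultdict


def find_inconsistent_lesions(annotations):
    """Return sorted lesion IDs whose laterality differs between annotations."""
    sides_by_lesion = defaultdict(set)
    for ann in annotations:
        sides_by_lesion[ann["lesion_id"]].add(ann["laterality"])
    # A lesion is a single physical finding, so it should have one laterality.
    return sorted(
        lesion_id
        for lesion_id, sides in sides_by_lesion.items()
        if len(sides) > 1
    )
```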

Annotation Scope and Exclusions

The annotation protocol focused on clinically relevant lesions and excluded benign findings unlikely to contribute meaningful diagnostic value.
The following were excluded:

  • Benign skin calcifications
  • Benign vascular calcifications
  • Clearly benign calcifications without suspicious features

However, suspicious calcifications described in the report were included and annotated.
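The scope rule can be expressed as a simple filter. In the sketch below, the label strings and function names are assumptions; in practice the decision was made by radiologists reading the report and images, not by string matching.

```python
# Hypothetical labels for the three excluded finding types.
EXCLUDED_FINDINGS = {
    "benign skin calcification",
    "benign vascular calcification",
    "clearly benign calcification",
}


def is_in_scope(finding_type):
    """True if a finding of this type should be annotated."""
    return finding_type.strip().lower() not in EXCLUDED_FINDINGS


def filter_findings(finding_types):
    """Keep only the finding types that fall inside the annotation scope."""
    return [f for f in finding_types if is_in_scope(f)]
```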
