Do radiologists trust mammography AI software?

In a large-scale prospective trial conducted at a breast screening program in Europe, women whose digital mammograms were flagged by AI were recalled less often after consensus discussion than women whose mammograms were flagged by radiologists.

As a result, women with mammograms flagged by AI computer-aided detection (CAD) software may be under-recalled, potentially leading to a lower breast cancer detection rate, according to a team led by Karin Dembrower, MD, PhD, of Karolinska Institute in Sweden.

“[This study] suggests a differential reliance on decision support related to whether that originated from AI CAD or from a fellow radiologist,” the authors wrote in an article published March 18 in Radiology. “The observed behavior may attenuate, and underestimate, the potential benefits of AI CAD in screening programs.”

Conducted between April 2021 and June 2022, the ScreenTrustCAD trial prospectively evaluated the performance of Insight MMG AI CAD software (Lunit) on nearly 55,000 women in Sweden receiving breast cancer screening with full-field digital mammography.

The initial ScreenTrustCAD trial results showed that replacing one radiologist in a double-reading regimen with AI software yielded a 4% higher, noninferior cancer detection rate compared with consensus reading by two radiologists. However, the findings also raised concerns that radiologists might agree with the AI software too readily on cases it had erroneously flagged, or too rarely on cases it had correctly flagged, according to the authors.

In the current research, the authors sought to assess the differences in recall proportion and positive predictive value (PPV) in the ScreenTrustCAD trial based on whether the AI CAD software and/or radiologists flagged the mammogram. Of the 54,991 women in the study, 5,480 were flagged for consensus discussion by the radiologists and 1,348 were recalled.

Recall results by reader on screening mammograms
| Mammogram flagged by | Proportion of recalls | Positive predictive value for breast cancer |
| --- | --- | --- |
| One radiologist | 14.2% | 3.4% |
| Both radiologists | 57.2% | 2.5% |
| AI CAD | 4.6% | 22% |
| AI CAD and one radiologist | 38.6% | 25% |
| Both radiologists and AI CAD | 82.6% | 34.2% |
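
For readers who want to check the arithmetic, the following is a minimal Python sketch that uses only the figures quoted in this article. The function and variable names are illustrative and do not come from the study. It assumes that "proportion of recalls" means recalled women divided by flagged women, and it compares the table's percentages for mammograms flagged by AI CAD with those flagged by one radiologist.

```python
# Counts quoted in the article.
SCREENED = 54_991
FLAGGED_FOR_CONSENSUS = 5_480
RECALLED = 1_348

# Percentages from the table above, keyed by who flagged the mammogram.
TABLE = {
    "one radiologist": {"recall": 14.2, "ppv": 3.4},
    "both radiologists": {"recall": 57.2, "ppv": 2.5},
    "AI CAD": {"recall": 4.6, "ppv": 22.0},
    "AI CAD and one radiologist": {"recall": 38.6, "ppv": 25.0},
    "both radiologists and AI CAD": {"recall": 82.6, "ppv": 34.2},
}


def proportion(part: float, whole: float) -> float:
    """Return part / whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return part / whole


def ratio(group_a: str, group_b: str, metric: str) -> float:
    """Ratio of one table metric between two flagging groups."""
    return proportion(TABLE[group_a][metric], TABLE[group_b][metric])


if __name__ == "__main__":
    # Share of flagged women who were recalled after consensus discussion.
    print(f"{proportion(RECALLED, FLAGGED_FOR_CONSENSUS):.1%}")
    # Share of all screened women who were recalled.
    print(f"{proportion(RECALLED, SCREENED):.1%}")
    # How much more often a radiologist flag led to recall than an AI flag.
    print(f"{ratio('one radiologist', 'AI CAD', 'recall'):.1f}x")
    # How much higher the PPV was for an AI flag than for a radiologist flag.
    print(f"{ratio('AI CAD', 'one radiologist', 'ppv'):.1f}x")
```

On these numbers, about a quarter of the women flagged for consensus discussion were recalled. A flag from one radiologist led to recall roughly three times as often as a flag from AI CAD, even though the positive predictive value for AI-flagged mammograms was roughly six times higher.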

The investigators noted that an interval cancer analysis of the ScreenTrustCAD trial will be performed once the follow-up period has passed, to determine whether false-negative consensus decisions actually led to more missed cancers among AI-flagged mammograms than among radiologist-flagged ones. They also pointed out that their results are not directly applicable to a single-reader setting.

“However, it is not unreasonable to speculate that radiologists in any setting would show a similar tendency of underestimating the accuracy of AI CAD when interpreting screening mammograms,” they wrote. “It should be acknowledged that a lower proportion recalled might not be caused by radiologist mistrust in AI CAD but rather might indicate that those mammograms contain image signs that are less obvious for a radiologist.”

Full-field digital screening mammograms in participants recalled by the consensus discussion after being flagged by artificial intelligence (AI) computer-aided detection (CAD) but not by either of the two radiologists. (A) Mammogram with AI CAD score of 73 for a 55-year-old woman diagnosed with grade 3 in situ cancer and T1 (9-mm) invasive cancer in the left breast and lymph node metastasis. (B) Mammogram with AI CAD score of 54 for a 49-year-old woman diagnosed with grade 2 in situ cancer and T1 (6-mm) invasive cancer in the right breast and without lymph node metastasis. All images and caption courtesy of the RSNA.

In an accompanying editorial, Lars Grimm, MD, of Duke University Medical Center in Durham, NC, noted that the study provides important nuance to the initial conclusions of the ScreenTrustCAD trial.

“Longer-term follow-up data from the trial will assess false-negative rates, and a more complete picture of the influence of radiologists and AI on consensus panel decision-making can be completed,” Grimm wrote. “More work is ultimately needed in different settings to understand the real-world implications of AI tools on radiologist decision-making.”

The full article and the accompanying editorial were published in Radiology.
