Artificial Intelligence Detection of Missed Cancers at Digital Mammography That Were Detected at Digital Breast Tomosynthesis

Published Online: https://doi.org/10.1148/ryai.2021200299

Abstract

Purpose

To investigate how an artificial intelligence (AI) system performs at digital mammography (DM) from a screening population with ground truth defined by digital breast tomosynthesis (DBT), and whether AI could detect breast cancers at DM that had originally only been detected at DBT.

Materials and Methods

In this secondary analysis of data from a prospective study, DM examinations from 14 768 women (mean age, 57 years), examined with both DM and DBT with independent double reading in the Malmö Breast Tomosynthesis Screening Trial (MBTST) (ClinicalTrials.gov: NCT01091545; data collection, 2010–2015), were analyzed with an AI system. Of 136 screening-detected cancers, 95 cancers were detected at DM and 41 cancers were detected only at DBT. The system identifies suspicious areas in the image, scored 1–100, and provides a risk score of 1 to 10 for the whole examination. A cancer was defined as AI detected if the cancer lesion was correctly localized and scored at least 62 (threshold determined by the AI system developers), therefore resulting in the highest examination risk score of 10. Data were analyzed with descriptive statistics, and detection performance was analyzed with receiver operating characteristic (ROC) analysis.

Results

The highest examination risk score was assigned to 10% (1493 of 14 768) of the examinations. With 90.8% specificity, the AI system detected 75% (71 of 95) of the DM-detected cancers and 44% (18 of 41) of cancers at DM that had originally been detected only at DBT. The majority were invasive cancers (17 of 18).

Conclusion

Almost half of the additional DBT-only screening-detected cancers in the MBTST were detected at DM with AI. AI did not reach double reading performance; however, if combined with double reading, AI has the potential to achieve a substantial portion of the benefit of DBT screening.

Keywords: Computer-aided Diagnosis, Mammography, Breast, Diagnosis, Classification, Application Domain

Clinical trial registration no. NCT01091545

© RSNA, 2021

Summary

A digital mammography (DM) artificial intelligence (AI) system was evaluated as a stand-alone reader, using digital breast tomosynthesis with double reading as ground truth; additional cancers were detected at DM using the AI system.

Key Points

  • Screening-detected cancers in general had high examination artificial intelligence (AI) risk scores on a scale of 1 to 10, with 10 being highest, with a median of 9.85 (interquartile range [IQR], 8.96–9.98).

  • A total of 44% (18 of 41) of cancers that had been detected only at digital breast tomosynthesis by radiologists were detected at digital mammography by the AI system.

Introduction

Breast cancer screening programs with digital mammography (DM) have been established in many countries to reduce breast cancer mortality through earlier detection and treatment (1–3). Systems for computer-aided detection (CAD) at DM have been developed to provide decision support, and such systems are relatively widely used (mostly in the United States) to reduce the need for multiple reviewers or to improve reliability (4). Use of a CAD system has previously been linked to increased recall rates (5,6). There has been a rapid development of artificial intelligence (AI) tools in recent years, and the use of AI in mammography CAD systems has shown promising results (7–11). It has been proposed that similar systems could be used as stand-alone readers to allow more thorough review of the cases that the system classifies as high risk (12,13). Conversely, an AI system might be used to identify low-risk cases, which could be read by a single radiologist, or not read by a human at all (13–16).

The ground truth has often been defined by using interval cancers or cancers diagnosed at the next screening round (16,17). Use of these data in model evaluation allows the AI system to potentially detect more cancers than radiologists detect; however, some cancers may have developed after the last screening examination. An alternative is to use a method with higher sensitivity to cancer than mammography, such as digital breast tomosynthesis (DBT) (18–20), to provide the ground truth. Because DBT is not yet available in most screening programs, it would be valuable if some of the increased sensitivity gained with DBT could be achieved with DM aided by an AI system.

Several studies have found the cancer detection performance of an AI system to be similar to that of a breast radiologist in a retrospective setting using a cancer-enriched dataset (15,16,21,22). To the best of our knowledge, the cancer detection performance of an AI system analyzing DM images has not been compared with DBT screening, nor has it been studied whether the AI system can identify the cancers at the lesion level.

In a previous study, we investigated whether AI can identify normal mammograms (14). The aim of this study was to retrospectively evaluate a commercially available AI system and investigate whether it could detect additional cancers at DM screening that would otherwise be identified only at DBT or remain undetected and later appear as interval cancers.

Materials and Methods

ScreenPoint Medical provided the AI system and technical support as part of a research agreement, but no one from the company took part in performing the study.

Study Population

Data from the prospective population-based study Malmö Breast Tomosynthesis Screening Trial (MBTST) (ClinicalTrials.gov: NCT01091545) were used in this study (18). In total, 14 848 women were screened in the trial from 2010 to 2015. The entire study population underwent both DM (with craniocaudal and mediolateral oblique views) and one-view wide-angle DBT (mediolateral oblique) screening examinations, with separate double reading and separate decisions to recall after a consensus meeting. All examinations were performed with a Mammomat Inspiration system (Siemens Healthcare). CAD was not used in the trial reading setting. The trial setup and primary results have been reported in detail elsewhere (18,23–28). The present study is covered by the original study approval for the MBTST by the local ethics committee at Lund University in Lund, Sweden (official records number: 2009/770). Written informed consent was obtained from all participants.

The present study included information from 14 768 women (mean age, 57 years) of the 14 848 women in the MBTST. Only one screening examination, including both DM and DBT, was included per woman. A small number of women were excluded, as explained in Figure 1. The number of women recalled and cancers diagnosed at DM and DBT, respectively, are also shown in Figure 1. In total, 136 cancers were detected, including 41 DBT only–detected cancers and eight DM only–detected cancers. One woman had bilateral cancers, but only the largest was included in the study. In the follow-up period between the initial screening and the subsequent screening (18–24 months, depending on patient age), 22 interval cancers were diagnosed.


Figure 1: Overview of the study population, including exclusions, recalls, and cancers. Digital mammography (DM) and digital breast tomosynthesis (DBT) cancer detection is based on double reading with consensus. AI = artificial intelligence, MBTST = Malmö Breast Tomosynthesis Screening Trial.

AI System

A prerelease version of the commercially available mammography AI system Transpara (version 1.7.0; ScreenPoint Medical) was used to analyze the DM images (21,22). The system uses image processing algorithms combined with deep convolutional neural network–based learning to identify and classify suspicious areas on mammograms. Data from the studied population had not been used for training of the AI system.

Each suspicious area is assigned a score between 0 and 100. Areas with a finding score of at least 25 are defined as findings and are recorded by the system. Findings scored over 39 for calcifications, or 59 for soft-tissue lesions, are presented as CAD marks. The thresholds were predefined by the company developing the AI system and are optimized to present a suitable number of CAD marks when the system is used as a decision support tool for the reading radiologist. The system also combines all finding scores in an examination to calculate a composite cancer risk score between 0 and 9.99 for the whole examination, which is rounded up to an integer between 1 and 10. The algorithm for calculating the examination score is calibrated by the company developing the AI system to yield approximately one-tenth of total cases in each integer category in a reference screening sample dataset. Thus, with this calibration, an examination is given the maximum risk score of 10 if any individual finding is scored 62 or more.
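The mapping from finding scores to the 1–10 examination risk score can be sketched as a decile binning of an examination's highest finding score. This is an illustrative reconstruction, not the vendor's algorithm: only the top cutoff (62, the score-10 threshold) is stated in the text, and the remaining values in `decile_cutoffs` are hypothetical placeholders for thresholds that would be calibrated on a reference screening sample.

```python
def examination_risk_score(finding_scores, decile_cutoffs):
    """Illustrative sketch of a decile-calibrated examination risk score.

    decile_cutoffs: nine ascending thresholds that split a reference
    screening sample into ten equal-sized categories; only the last
    cutoff (62, the score-10 threshold) is given in the text.
    """
    top = max(finding_scores, default=0)  # the highest finding drives the score
    score = 1
    for cutoff in decile_cutoffs:
        if top >= cutoff:
            score += 1
    return score

# Hypothetical cutoffs; only the final value (62) comes from the paper.
cutoffs = [5, 10, 16, 22, 29, 37, 45, 53, 62]
print(examination_risk_score([12, 70], cutoffs))  # any finding >= 62 -> 10
print(examination_risk_score([30], cutoffs))
```

With this calibration, an examination with no finding at or above 62 can never reach the maximum risk score, which is why the study used 62 as the lesion-level detection threshold.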

Examination AI Risk Scores

The AI system was used to analyze all DM images from the included women, and the examination AI risk scores of the noncancer cases and cancer cases diagnosed at DM and DBT, respectively, were compared. The examination AI risk scores were also used to study the performance of the system as a stand-alone reader.

Detection of Cancer Lesions

For all cancer lesions, the location and score of the finding given by the AI system were matched with the location of the cancer detected and marked by the examining radiologist. If the circle provided by the AI system to indicate the location of a finding included the whole or a part of the lesion identified by the examining radiologist, the finding was considered to correspond to the cancer (ie, correctly localized by the AI system). This matching was performed by a radiologist in training (V.D. [>5 years of general radiology experience]) and verified by an experienced breast radiologist (S.Z. [>10 years of breast radiology experience] or I.A. [>45 years of breast radiology experience]) for the subset of cancers detected only at DBT screening, or appearing as interval cancers but detected on DM images by AI.
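The localization-match rule (the AI circle must cover all or part of the radiologist-marked lesion) amounts to a simple geometric test. The coordinate representation below, with the lesion approximated as a set of marked points, is an assumption for illustration; the study performed this matching visually on the images.

```python
from math import hypot

def circle_marks_lesion(cx, cy, r, lesion_points):
    """Return True if the AI finding circle (center (cx, cy), radius r)
    covers at least part of the lesion, approximated here as containing
    at least one of the radiologist-marked lesion points."""
    return any(hypot(px - cx, py - cy) <= r for px, py in lesion_points)

# A lesion point exactly on the circle boundary still counts as covered:
print(circle_marks_lesion(0, 0, 5, [(3, 4)]))   # True
print(circle_marks_lesion(0, 0, 5, [(10, 0)]))  # False
```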

A cancer was defined as AI detected if the cancer lesion was correctly localized and the corresponding finding had a finding score of 62 or more, the threshold for a maximum examination risk score of 10. If the area was identified by the AI system but was assigned a lower score, the cancer was defined as potentially AI detected. All thresholds were predefined by the manufacturer developing the AI system and thus, were prespecified before the analyses were performed in our study.
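The three-way lesion-level labeling defined above (AI detected, potentially AI detected, not detected) reduces to the following rule. The function name is ours; the 62-point threshold is the prespecified one from the text.

```python
def classify_lesion(correctly_localized, finding_score, threshold=62):
    """Apply the study's lesion-level detection labels."""
    if not correctly_localized:
        return "not detected"
    if finding_score >= threshold:
        return "AI detected"          # reaches the score-10 threshold
    return "potentially AI detected"  # localized, but scored below 62

print(classify_lesion(True, 75))   # AI detected
print(classify_lesion(True, 40))   # potentially AI detected
print(classify_lesion(False, 90))  # not detected
```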

Cancer Characteristics

The examination AI risk score and the number of AI-detected cancers were calculated and stratified according to different cancer characteristics, including breast density classification (according to the Breast Imaging Reporting and Data System Atlas, 4th ed), histologic cancer type, histologic grade (for invasive cancers), nuclear grade (for in situ cancers), tumor size, presence of lymph node metastases, and radiologic appearance as reported in the trial (18).

Statistical Analysis

The examination AI risk scores were analyzed using descriptive statistics with the interquartile range (IQR) as a distribution measure. Receiver operating characteristic (ROC) analyses were performed for AI cancer detection with ground truth defined by DM screening-detected cancers, DM plus DBT screening-detected cancers, and cancers diagnosed with DM plus DBT screening or as interval cancers. The area under the ROC curve (AUC) and 95% CI were calculated using bootstrapping with 1000 replicas. The cancer detection rates of DM with AI were compared with DM screening and DBT screening using an exact McNemar test.
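The ROC analysis can be sketched as follows, with the AUC computed via the Mann-Whitney formulation and a percentile bootstrap CI over 1000 replicas. This is a generic sketch of the stated methods (the study used MATLAB), not the authors' code.

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie) (Mann-Whitney)."""
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    diff = pos[:, None] - neg[None, :]          # all case-control pairs
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def bootstrap_auc_ci(pos_scores, neg_scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling cases and
    controls independently with replacement."""
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos_scores, float)
    neg = np.asarray(neg_scores, float)
    replicas = [auc(rng.choice(pos, pos.size), rng.choice(neg, neg.size))
                for _ in range(n_boot)]
    lo, hi = np.quantile(replicas, [alpha / 2, 1 - alpha / 2])
    return auc(pos, neg), (lo, hi)
```

In the study, the positive scores would be the examination AI risk scores of the cancer cases under a given ground truth definition and the negative scores those of the noncancer cases.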

The number and proportion of examinations scored 10 and AI-detected cancers data were analyzed with descriptive statistics, and 95% CIs were calculated using the Clopper-Pearson method. AI-detected cancers in subgroups with different cancer characteristics were analyzed with the same methods. Additionally, the median examination AI risk score and IQR were calculated, and Kruskal-Wallis analysis of variance was performed for each subgroup. Differences were considered statistically significant if they reached the 95% confidence level (P < .05). All statistical analyses were performed in MATLAB (R2019b, MathWorks).
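The Clopper-Pearson intervals can be reproduced exactly from binomial tail probabilities. The sketch below uses bisection rather than beta quantiles so that it needs only the standard library, but it yields the same exact interval, illustrated here for the 18 of 41 DBT only–detected cancers that were AI detected.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided binomial CI via bisection."""
    def root(f):  # f is increasing in p; find p where f crosses zero
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return (lo + hi) / 2
    lower = 0.0 if k == 0 else root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    upper = 1.0 if k == n else root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

lo, hi = clopper_pearson(18, 41)  # CI around the 44% (18 of 41) proportion
```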

Results

AI Risk Score Distributions

The results concerning the distribution of examination AI risk scores for the study population and the cancers are given in Figure 2 and Table 1. The median examination AI risk score for the whole study population was 3.78 (IQR, 1.30–7.08). Screening-detected cancers were heavily skewed toward high examination AI risk scores with a median of 9.85 (IQR, 8.96–9.98). For examinations scored 1, there was a 0.065% (two of 3061) risk of being diagnosed with cancer at combined DM and DBT screening, while the risk in the group scored 10 was 6.8% (101 of 1493). These values can be compared with the entire study population, where 0.92% (136 of 14 768) had a cancer diagnosed at screening.


Figure 2: Distribution of examination artificial intelligence (AI) risk score for the whole population and the cancers. AI scores are based on analysis of digital mammography (DM) examinations. DM and digital breast tomosynthesis (DBT) screening cancer detection is based on double reading with consensus. Interval cancers were those diagnosed during the 18- to 24-month follow-up period.

Table 1: Distribution between Examination AI Risk Scores for all Examinations and Screening-detected Cancers


AI System Performance for Cancer Detection

The performance of the AI system on the detection of cancers at DM was analyzed. ROC curves and the AUC for the AI cancer detection depending on ground truth are presented in Figure 3, together with the operating points of the radiologists’ DM double reading with corresponding ground truths shown for comparison. The AUC for cancer detection by the AI system, taking both screening-detected (DM and DBT) and interval cancers into account, was 0.88 (95% CI: 0.84, 0.90). It was not possible to calculate an ROC curve for the radiologists in the MBTST because the study design did not include a continuous risk rating of individual cases by human readers (18,29). However, the radiologists’ operating point lay above the AI ROC curve (Fig 3). The radiologist DM reading arm in the original study had a false-positive rate of 1.85% (271 of 14 610 [14 768 less the 136 screening-detected cancers and 22 interval cancers]) and detected 69.9% (95 of 136) of the cancers. This can be compared with the ROC curve with ground truth defined by DM plus DBT screening-detected cancers (Fig 3), where a threshold accepting a false-positive rate of 1.85% would lead to the identification of 54% of the cancers. If instead the AI system were used as a single stand-alone reader to increase the cancer detection rate (eg, recalling all women with a score of 10), the false-positive rate would increase to 9.5% (1386 of 14 610; specificity, 90.5%).


Figure 3: Receiver operating characteristic (ROC) curves for artificial intelligence (AI) cancer detection at digital mammography (DM) with ground truth (GT) defined by DM double-reading screening-detected cancers, DM plus digital breast tomosynthesis (DBT) double-reading screening-detected cancers, and cancers detected at DM plus DBT double-reading screening or diagnosed as interval cancers (ICs) during 18- to 24-month follow-up. Area under the ROC curve (AUC) with 95% CIs. Operating points of radiologist DM double reading with consensus with corresponding ground truths are shown for comparison.

Of all the screening-detected cancers, 74% (101 of 136) were scored 10 and 65% (89 of 136) were AI detected (Table 2). When taking potentially AI-detected cancers into account, 91% (124 of 136) of the cancers were identified by the system. Taking only the 95 DM-detected cancers into account, 75% (71 of 95) were AI detected. Of the 41 DBT only–detected cancers, 44% (18 of 41) were AI detected. For the interval cancers, the proportion of AI-detected cancers was 9% (two of 22).

Table 2: AI-detected and Potentially AI-detected Cancers on DM Images Compared with Original Findings


Cancer Detection Rates

In an optimal scenario, in which AI in DM screening would lead to the detection of all the cancers that had been detected solely by DBT, the total cancer detection rate per 1000 women would be 7.7 (95% CI: 6.3, 9.2; 1000 · [113/14 768]), or 7.8 (95% CI: 6.4, 9.3; 1000 · [115/14 768]) if the AI-detected interval cancers were also taken into account. These are both higher than the value for DM screening without AI, which is 6.4 (95% CI: 5.2, 7.9; 1000 · [95/14 768]; P < .001 for both). The detection rate with DBT screening was 8.7 (95% CI: 7.2, 10.3; 1000 · [128/14 768]) and was higher than DM with AI (P < .001).
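The per-1000 detection rates quoted above follow directly from the case counts; a quick check of the point estimates (CIs omitted):

```python
n_women = 14768
cancers = {
    "DM double reading": 95,
    "DM with AI, optimal scenario": 113,
    "DM with AI, incl. interval cancers": 115,
    "DBT double reading": 128,
}
# Detection rate per 1000 screened women, rounded to one decimal place
rates = {label: round(1000 * k / n_women, 1) for label, k in cancers.items()}
print(rates)
```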

AI Findings and Scores

In total, there were 43 941 findings (across 11 452 examinations), and 7162 of these were CAD marks (across 3609 examinations). The scores of all findings are presented in Figure 4, together with the finding scores for the cancer lesions. Most cancers had higher finding scores, but there was also a cluster of cancers with lower finding scores. Among all findings with a score of 90 or more, 35% (77 of 217) were cancer. The total number of findings with a score of 62 or more, which is the threshold for an AI-detected cancer, was 2172, of which 136 corresponded to cancer lesions. This gives a positive predictive value at the findings level (ie, the proportion of findings with a score of ≥62 corresponding to cancers) of 6.26% (136 of 2172 findings in 1457 women). Had the AI system been used as a stand-alone reader, recalling all women in whom there was at least one finding with a score of 62 or more, 9.9% (1457 of 14 768) of the women would have been recalled. Thus, the specificity of AI as a stand-alone reader was 90.8% (13 259 of 14 610). The mean number of false-positive findings per woman, that is, findings with a score of 62 or more not corresponding to cancers, was 0.137 (2036 in 14 768).
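The findings-level operating figures in this paragraph are internally consistent and can be verified directly from the counts given (14 610 is the study population minus the 158 women with screening-detected or interval cancers):

```python
ppv_findings = 136 / 2172     # findings scored >= 62 that were cancers
recall_rate = 1457 / 14768    # women with at least one finding >= 62
specificity = 13259 / 14610   # noncancer women who would not be recalled
fp_findings_per_woman = 2036 / 14768

print(f"{ppv_findings:.2%}")   # 6.26%
print(f"{recall_rate:.1%}")    # 9.9%
print(f"{specificity:.1%}")    # 90.8%
print(round(fp_findings_per_woman, 3))  # 0.138 (reported as 0.137)
```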


Figure 4: Artificial intelligence (AI) scores (left) for all findings of the AI system analysis of digital mammography examinations and (right) for the findings corresponding to cancer lesions.

Of the DBT only–detected cancers that the AI system also detected at DM, the majority were invasive cancers (Table 3). There was a small difference in examination AI scores depending on axillary lymph node status (positive [9.98; IQR, 0.15] vs negative [9.82; IQR, 1.27]; P = .01). We found no evidence of any differences between the proportions of detected cancers or examination AI risk scores within the other different subgroups. An example of a cancer detected by AI at DM, but detected only at DBT with radiologist double reading, is shown in Figure 5.

Table 3: Examination AI Risk Score and Detection by AI System for Different Cancer Types and Characteristics


Figure 5: Example of an invasive ductal carcinoma (circles) detected with artificial intelligence (AI) on digital mammogram (DM), but otherwise detected only at digital breast tomosynthesis (DBT). (A–D) Mediolateral oblique and craniocaudal DM images with AI system lesion scores and (E, F) DBT images for comparison.

Discussion

The objective of this study was to determine if an AI system could detect cancers at DM that had originally only been detected at DBT. We used a dataset consisting of DBT and DM images from 14 768 women. The cancer detection performance of an AI system used as a stand-alone reader with DM was tested in a screening population with ground truth defined by DBT screening and interval cancers, and in cases of cancer, the findings of the AI system were matched to the verified cancers on a lesion level. Of the cancers that radiologists detected only at DBT, 44% (18 of 41) were detected by the AI system.

Using the AI system as a stand-alone single reader in screening would be inferior to double reading by radiologists, although it should be noted that a combined radiologist and AI reading setup was not evaluated. Some examinations were assigned a high AI risk score owing to a finding other than the cancer lesion. However, the AI system also identified relevant findings missed by the radiologists: a substantial number of the DBT only–detected cancers and some interval cancers were correctly identified by AI on the basis of DM images. As of this writing and to our knowledge, it has not yet been established whether DBT contributes to better long-term outcomes in breast cancer screening (30); if it does, however, recovering part of that benefit at DM would have important clinical value. The increase in sensitivity with AI found in this study could be expected owing to the study design, wherein AI is studied at a low specificity. The AI-detected cancers provide a measure of the greatest potential gain in using AI in conjunction with radiologists’ readings, while most potentially AI-detected cancers do not have scores high enough to be displayed as CAD marks and are unlikely to be identified in a true screening situation.

The results of this study suggest that, in an optimal scenario, when all AI-detected cancers are found, the cancer detection rate at DM per 1000 women could be increased from 6.4 to 7.7 using AI. This would reduce the gap to the higher DBT cancer detection rate of 8.7 and could be more feasible than DBT screening. In a clinical workflow, the large number of CAD marks necessitates a limited selection for further assessment; thus, the AI results would have to be moderated by radiologists. To attain a reasonable number of false-positive recalls, it would be necessary to discard many of the AI findings. However, doing so would likely result in not recalling some AI-detected cancers. As this retrospective study is based on radiologist readings without AI, it was not possible to study how the sensitivity and number of false-positive recalls would be affected by integrated AI and radiologists’ readings in a real-world screening situation. The results here thus establish a current maximum additional cancer detection potential, but further studies are needed to explore the clinical potential of AI.

The examination AI score was slightly—but significantly—higher for cancers with axillary lymph node metastasis than for those without, but this needs to be assessed with multivariable analysis in larger datasets. No significant differences were observed in the performance of the AI system depending on other cancer characteristics or breast density, but it cannot be excluded that such differences might be revealed were a larger dataset to be studied.

The relatively high proportion of women with scores of 1–3 differs from a previous study using the same system (31). This may be owing to differences between the current screening dataset and the datasets used for training and calibrating the system. The comparison of ROC curves between studies is complicated owing to differences in study design, but the AUC values in this study are roughly in the middle among the AUC values reported in previous studies (15–17,21,32,33). It is also difficult to compare the performance of the AI system in relation to radiologists in different studies because the performance and operating points of the radiologists may vary (21). The use of DBT in the ground truth might include some cancers that would remain undetectable at DM even during a follow-up interval and that would therefore not be included in the ground truth of many other studies. Also, the number and characteristics of interval cancers are affected by the use of DBT.

This study had several limitations. A single-center, single-vendor screening dataset was analyzed with a single AI system, which limits the generalizability of the results. It is possible that the results might be slightly different from the commercially available version of the AI system, since a prerelease version was used. Cancers diagnosed at later screening rounds were not included in the ground truth in this study. The MBTST design might cause a slightly decreased number of recalls at DM (18). The correlation of AI findings on the craniocaudal projection with the area where the malignancy was detected by radiologists introduces an uncertainty in location, in particular for DBT only–detected cancers, since only the mediolateral oblique view was available in the DBT examination. The number of cancers was relatively small, and a larger study would be necessary for more comprehensive results. Finally, as noted above, in a retrospective study, it is not possible to investigate how AI would affect the number of false-positive recalls, and further experimental and prospective trials are needed to assess the true effect of using AI in conjunction with human readers.

This study shows that additional cancers detected by radiologists at DBT but not at DM were detected and correctly localized at DM by a stand-alone AI system. The AI system did not reach the performance of radiologist double reading; however, supplementing DM with AI has the potential to achieve a substantial part of the benefits of DBT screening. Prospective trials are needed to study how AI can be implemented in a screening workflow without introducing an unacceptable increase in false-positive rates.

Disclosures of Conflicts of Interest: V.D. institution received governmental funding for clinical research, AIDA/VINNOVA, and The Swedish Cancer Society; ScreenPoint Medical provided the software and technical support under a research agreement, no financial support was received. I.A. disclosed no relevant relationships. K.L. disclosed no relevant relationships. A.T. disclosed no relevant relationships. S.Z. institution received speaker's fees from Siemens Healthcare; issued US patent no PCT/EP2014/057372. M.D. issued US patent no PCT/EP2014/057372.

Author Contributions

Author contributions: Guarantor of integrity of entire study, M.D.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, V.D., I.A., S.Z., M.D.; clinical studies, V.D., K.L., S.Z.; experimental studies, V.D., M.D.; statistical analysis, V.D., M.D.; and manuscript editing, V.D., I.A., A.T., S.Z., M.D.

This study was funded by Governmental Funding for Clinical Research, Analytic Imaging Diagnostics Arena (AIDA)/VINNOVA and The Swedish Cancer Society.

Data sharing. Data generated or analyzed during the study are available from the corresponding author by request. Parts of the dataset are available by request at https://doi.org/10.23698/aida/mbtst-dm.

References

  1. Marmot MG, Altman DG, Cameron DA, Dewar JA, Thompson SG, Wilcox M. The benefits and harms of breast cancer screening: an independent review. Br J Cancer 2013;108(11):2205–2240.
  2. Siu AL; U.S. Preventive Services Task Force. Screening for Breast Cancer: U.S. Preventive Services Task Force Recommendation Statement. Ann Intern Med 2016;164(4):279–296 [Published correction appears in Ann Intern Med 2016;164(6):448.].
  3. Perry N, Broeders M, de Wolf C, Törnberg S, Holland R, von Karsa L. European guidelines for quality assurance in breast cancer screening and diagnosis. Fourth edition--summary document. Ann Oncol 2008;19(4):614–622.
  4. Rao VM, Levin DC, Parker L, Cavanaugh B, Frangos AJ, Sunshine JH. How widely is computer-aided detection used in screening and diagnostic mammography? J Am Coll Radiol 2010;7(10):802–805.
  5. Henriksen EL, Carlsen JF, Vejborg IM, Nielsen MB, Lauridsen CA. The efficacy of using computer-aided detection (CAD) for detection of breast cancer in mammography screening: a systematic review. Acta Radiol 2019;60(1):13–18.
  6. Katzen J, Dodelzon K. A review of computer aided detection in mammography. Clin Imaging 2018;52:305–309.
  7. Burt JR, Torosdagli N, Khosravan N, et al. Deep learning beyond cats and dogs: recent advances in diagnosing breast cancer with deep neural networks. Br J Radiol 2018;91(1089):20170545.
  8. Ribli D, Horváth A, Unger Z, Pollner P, Csabai I. Detecting and classifying lesions in mammograms with Deep Learning. Sci Rep 2018;8(1):4165.
  9. Becker AS, Marcon M, Ghafoor S, Wurnig MC, Frauenfelder T, Boss A. Deep Learning in Mammography: Diagnostic Accuracy of a Multipurpose Image Analysis Software in the Detection of Breast Cancer. Invest Radiol 2017;52(7):434–440.
  10. Rodríguez-Ruiz A, Krupinski E, Mordang JJ, et al. Detection of Breast Cancer with Mammography: Effect of an Artificial Intelligence Support System. Radiology 2019;290(2):305–314.
  11. Pacilè S, Lopez J, Chone P, Bertinotti T, Grouin JM, Fillard P. Improving Breast Cancer Detection Accuracy of Mammography with the Concurrent Use of an Artificial Intelligence Tool. Radiol Artif Intell 2020;2(6):e190208.
  12. Hupse R, Samulski M, Lobbes M, et al. Standalone computer-aided detection compared to radiologists’ performance for the detection of mammographic masses. Eur Radiol 2013;23(1):93–100.
  13. Dembrower K, Wåhlin E, Liu Y, et al. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit Health 2020;2(9):e468–e474.
  14. Lång K, Dustler M, Dahlblom V, Åkesson A, Andersson I, Zackrisson S. Identifying normal mammograms in a large screening population using artificial intelligence. Eur Radiol 2021;31(3):1687–1692.
  15. Salim M, Wåhlin E, Dembrower K, et al. External Evaluation of 3 Commercial Artificial Intelligence Algorithms for Independent Assessment of Screening Mammograms. JAMA Oncol 2020;6(10):1581–1588.
  16. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577(7788):89–94.
  17. Schaffter T, Buist DSM, Lee CI, et al. Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms. JAMA Netw Open 2020;3(3):e200265.
  18. Zackrisson S, Lång K, Rosso A, et al. One-view breast tomosynthesis versus two-view mammography in the Malmö Breast Tomosynthesis Screening Trial (MBTST): a prospective, population-based, diagnostic accuracy study. Lancet Oncol 2018;19(11):1493–1503 [Published correction appears in Lancet Oncol. 2019 Jan;20(1):e9.].
  19. Skaane P, Sebuødegård S, Bandos AI, et al. Performance of breast cancer screening using digital breast tomosynthesis: results from the prospective population-based Oslo Tomosynthesis Screening Trial. Breast Cancer Res Treat 2018;169(3):489–496.
  20. Bernardi D, Macaskill P, Pellegrini M, et al. Breast cancer screening with tomosynthesis (3D mammography) with acquired or synthetic 2D mammography compared with 2D mammography alone (STORM-2): a population-based prospective study. Lancet Oncol 2016;17(8):1105–1113.
  21. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-Alone Artificial Intelligence for Breast Cancer Detection in Mammography: Comparison With 101 Radiologists. J Natl Cancer Inst 2019;111(9):916–922.
  22. Kooi T, Litjens G, van Ginneken B, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal 2017;35:303–312.
  23. Rosso A, Lång K, Petersson IF, Zackrisson S. Factors affecting recall rate and false positive fraction in breast cancer screening with breast tomosynthesis - A statistical approach. Breast 2015;24(5):680–686.
  24. Lång K, Andersson I, Rosso A, Tingberg A, Timberg P, Zackrisson S. Performance of one-view breast tomosynthesis as a stand-alone breast cancer screening modality: results from the Malmö Breast Tomosynthesis Screening Trial, a population-based study. Eur Radiol 2016;26(1):184–190.
  25. Lång K, Nergården M, Andersson I, Rosso A, Zackrisson S. False positives in breast cancer screening with one-view breast tomosynthesis: An analysis of findings leading to recall, work-up and biopsy rates in the Malmö Breast Tomosynthesis Screening Trial. Eur Radiol 2016;26(11):3899–3907.
  • 26. Sartor H, Lång K, Rosso A, Borgquist S, Zackrisson S, Timberg P. Measuring mammographic density: comparing a fully automated volumetric assessment versus European radiologists’ qualitative classification. Eur Radiol 2016;26(12):4354–4360.
  • 27. Förnvik D, Förnvik H, Fieselmann A, Lång K, Sartor H. Comparison between software volumetric breast density estimates in breast tomosynthesis and digital mammography images in a large public screening cohort. Eur Radiol 2019;29(1):330–336.
  • 28. Johnson K, Zackrisson S, Rosso A, et al. Tumor Characteristics and Molecular Subtypes in Breast Cancer Screening with Digital Breast Tomosynthesis: The Malmö Breast Tomosynthesis Screening Trial. Radiology 2019;293(2):273–281.
  • 29. Zackrisson S, Lång K, Rosso A, et al. One-view breast tomosynthesis vs two-view mammography: a methodological issue - Authors’ reply. Lancet Oncol 2019;20(1):e7.
  • 30. Houssami N, Lång K, Hofvind S, et al. Effectiveness of digital breast tomosynthesis (3D-mammography) in population breast cancer screening: a protocol for a collaborative individual participant data (IPD) meta-analysis. Transl Cancer Res 2017;6(4):869–877.
  • 31. Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study. Eur Radiol 2019;29(9):4825–4832.
  • 32. Kim HE, Kim HH, Han BK, et al. Changes in cancer detection and false-positive recall in mammography using artificial intelligence: a retrospective, multireader study. Lancet Digit Health 2020;2(3):e138–e148.
  • 33. Sasaki M, Tozaki M, Rodríguez-Ruiz A, et al. Artificial intelligence for breast cancer detection in mammography: experience of use of the ScreenPoint Medical Transpara system in 310 Japanese women. Breast Cancer 2020;27(4):642–651.

Article History

Received: December 22, 2020
Revision requested: March 9, 2021
Revision received: July 12, 2021
Accepted: August 9, 2021
Published online: September 1, 2021