Computer-aided Detection of Masses at Mammography: Interactive Decision Support versus Prompts
Abstract
Purpose
To compare the effectiveness of an interactive computer-aided detection (CAD) system, in which CAD marks and their associated suspiciousness scores remain hidden unless their location is queried by the reader, with that of the traditional CAD prompts used in current clinical practice for the detection of malignant masses on full-field digital mammograms.
Materials and Methods
The requirement for institutional review board approval was waived for this retrospective observer study. Nine certified screening radiologists and three residents who were trained in breast imaging read 200 studies (63 studies containing at least one screen-detected mass, 17 false-negative studies, 20 false-positive studies, and 100 normal studies) twice, once with CAD prompts and once with interactive CAD. Localized findings were reported and scored by the readers. In the prompted mode, findings were recorded before and after activation of CAD. The partial area under the location receiver operating characteristic (ROC) curve for an interval of low false-positive fractions typical for screening, from 0 to 0.2, was computed for each reader and each mode. Differences in reader performance were analyzed by using a jackknife method that treats both readers and cases as random samples.
Results
The average partial area under the location ROC curve with unaided reading was 0.57, and it increased to 0.62 with interactive CAD, while it remained unaffected by prompts. The difference in reader performance for unaided reading versus interactive CAD was statistically significant (P = .009).
Conclusion
When used as decision support, interactive use of CAD for malignant masses on mammograms may be more effective than the current use of CAD, which is aimed at the prevention of perceptual oversights.
© RSNA, 2012
Introduction
In breast cancer screening, computer-aided detection (CAD) systems are used to prevent perceptual oversight of abnormalities on mammograms. The positive effects of CAD were shown in certain studies (
CAD may lead to disappointing results because masses are often missed because of incorrect interpretation (
Other interactive CAD systems provide additional information to justify the CAD marks so that they are not ignored. Additional interactive CAD systems aid characterization of lesions in a clinical setting by using reference libraries of similar cases (
Materials and Methods
Study Population
This retrospective study was carried out in accordance with the rules in the Netherlands that are relevant to the review of research ethics committees and informed consent. Study material was made anonymous and institutional review board approval was waived. Mammograms used in this study were acquired in a 2003–2008 digital screening pilot project that was conducted in Utrecht, the Netherlands (
Case Selection
We collected material from a prospective study on the effect of digital screening (
CAD and Reading Environment
This CAD system (
The reader study was performed by using an experimental reading environment for screening mammography (
CAD results were viewed in two modes: the prompting mode and the interactive mode. When CAD was activated in the prompting mode, CAD regions were displayed as contours without suspiciousness scores. Prompts were shown only for CAD regions with suspiciousness scores above a fixed threshold. This threshold was adjusted so that a reference set of normal mammograms was given an average of two prompts per case. This prompting mode, with the threshold we used, was similar to the way in which CAD is used in current clinical practice.
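The threshold calibration described above can be illustrated with a minimal sketch. It assumes that each reference case is represented by a plain list of CAD suspiciousness scores and that a prompt is shown when a score is strictly greater than the threshold; the function names and the toy scores are hypothetical and are not taken from the study software.

```python
def calibrate_threshold(scores_per_case, target_mean_marks=2.0):
    """Return a threshold such that, on the given normal reference cases,
    the mean number of regions with score > threshold does not exceed
    target_mean_marks. Illustrative only; not the authors' procedure."""
    n_cases = len(scores_per_case)
    # Pool all region scores from all cases, most suspicious first.
    all_scores = sorted(
        (s for case in scores_per_case for s in case), reverse=True
    )
    max_marks = int(target_mean_marks * n_cases)
    if len(all_scores) <= max_marks:
        return float("-inf")  # every region may be shown
    # Regions ranked after the first max_marks must not be shown,
    # so the threshold is the score of the first excluded region.
    return all_scores[max_marks]


def mean_marks_per_case(scores_per_case, threshold):
    """Average number of displayed marks (score > threshold) per case."""
    shown = sum(1 for case in scores_per_case for s in case if s > threshold)
    return shown / len(scores_per_case)


if __name__ == "__main__":
    # Made-up scores for three normal reference cases.
    reference = [[90, 40, 10], [80, 30], [70, 60, 20, 5]]
    t = calibrate_threshold(reference, target_mean_marks=2.0)
    print(t, mean_marks_per_case(reference, t))
```

The same routine with a target of eight regions per case would correspond to the less strict threshold used for the interactive mode, under the same assumptions.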
In the interactive mode, regions that were defined by CAD remained hidden until they were activated by the reader, who could query for CAD results with a mouse-click on a mammographic region. If a CAD result was available at the queried location, the contour of this region was presented to the reader with its suspiciousness score. A CAD result was available if the queried location was inside of the contour of the CAD region or if the distance between the queried location and the center point of the CAD region was less than 0.5 cm. View correspondence was also used; if an activated CAD region was linked to a region in the other view, this other region’s contour and score were also shown.
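The query rule in this paragraph (a CAD result is returned if the click lies inside the region contour or within 0.5 cm of the region center) can be sketched as follows. This is an illustration under stated assumptions: coordinates are in centimeters, contours are simple polygons, and the `CadRegion` class and `query_cad` function are hypothetical names. View correspondence is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class CadRegion:
    contour: list   # polygon vertices [(x, y), ...] in cm (assumed units)
    center: tuple   # region center (x, y) in cm
    score: float    # suspiciousness score


def point_in_polygon(pt, polygon):
    """Ray-casting test: True if pt lies inside the polygon."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def query_cad(click, regions, max_center_dist_cm=0.5):
    """Return the CAD regions that respond to a click at the given location."""
    return [
        r for r in regions
        if point_in_polygon(click, r.contour)
        or math.dist(click, r.center) < max_center_dist_cm
    ]


if __name__ == "__main__":
    large = CadRegion([(0, 0), (2, 0), (2, 2), (0, 2)], (1, 1), 70)
    small = CadRegion([(5, 5), (5.2, 5), (5.1, 5.2)], (5.1, 5.07), 35)
    for click in [(1.5, 0.5), (5.4, 5.07), (3.5, 3.5)]:
        print(click, [r.score for r in query_cad(click, [large, small])])
```

The distance rule matters mainly for small regions, where clicking exactly inside the contour would be difficult.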
In the interactive mode, CAD results were available only for regions with suspiciousness scores above a separate threshold. This threshold was chosen so that an average of eight CAD regions were available on a normal four-view mammogram. Because more CAD results were accessible in the interactive mode than in the prompting mode, increased accessibility was an inherent advantage of the interactive mode. We did not provide all marks that were accessible with interactive CAD as prompts in the traditional reading mode, because this would have produced too many false-positive prompts and might have undermined our intention to compare the interactive approach with the use of CAD in current clinical practice. In the interactive mode, contours of queried regions were displayed in color by using a continuous scale from yellow (less suspicious) to red (highly suspicious). A numeric value that represented suspiciousness was also shown next to the contour, ranging from 0 (not suspicious) to 100 (highly suspicious).
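As an illustration of the color coding, the sketch below maps a suspiciousness score to a contour color. The text states only that the scale runs continuously from yellow to red; the linear interpolation in RGB space and the function name `score_to_rgb` are assumptions made for this example.

```python
def score_to_rgb(score):
    """Map a suspiciousness score (0-100) to an RGB color between
    yellow (255, 255, 0) and red (255, 0, 0).
    Linear RGB interpolation is an assumption, not the documented mapping."""
    s = min(max(score, 0.0), 100.0)        # clamp to the valid score range
    green = round(255 * (1.0 - s / 100.0))  # green fades out as suspicion rises
    return (255, green, 0)


if __name__ == "__main__":
    for score in (0, 25, 75, 100):
        print(score, score_to_rgb(score))
```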
Observer Study Design
Twelve readers (nine radiologists with 1–24 years of experience in mammography and three residents trained in breast imaging) participated in the study. Before the study sessions, readers were offered a short training session so that they could become familiar with the experimental setup. Most readers had previously used conventional CAD in screening practice, and five of the radiologists participated in earlier studies with a commercial CAD system where they used traditional prompts (
In the observer study, each reader interpreted all images in both modes during two sessions. In the first session, the first 100 mammograms were interpreted by using either CAD with prompts or interactive CAD, while the second series of 100 mammograms were subsequently interpreted in the alternate mode. In the second session, conducted at least 4 weeks after the first session was completed, the same mammograms were interpreted again with the reading modes switched.
To obtain sufficient data for analysis, radiologists were asked to report more of their findings than they would usually do in screening practice. Readers marked the location of their findings (not the contour) on both views, and assigned a suspiciousness score from 0 to 100 to each of their findings. In the prompted mode, readers first assigned a suspiciousness score to each image without using CAD. CAD was then made available, and readers could adjust their scores and add new findings. In this way, results were obtained for three modes: unaided reading, reading with regular (prompted) CAD, and reading with interactive CAD.
Data Analysis
The standalone performance of CAD was computed by using free-response receiver operating characteristic analysis, which provided the case-based true-positive fraction as a function of the false-positive rate on images that did not depict cancer. The location of a finding was considered correct if its distance to the center of the reference standard was less than 2 cm. On one image, two malignant masses were present, and the correctly localized finding with the highest score was used. For comparison, we determined the sensitivity and specificity of the digital CAD system (R2 ImageChecker, V1.4; Hologic) on the study set. This system was adjusted to a specific setting.
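The localization criterion above can be sketched in a few lines. The example assumes that finding locations and lesion centers are given in centimeters in the same image coordinate system; the function name and the toy values are hypothetical.

```python
import math


def best_correct_score(findings, lesion_centers, max_dist_cm=2.0):
    """Return the highest score among findings that lie within max_dist_cm
    of any reference lesion center, or None if no finding is correctly
    localized.

    findings: list of ((x, y), score) tuples for one image
    lesion_centers: list of (x, y) reference standard centers
    """
    correct = [
        score
        for location, score in findings
        if any(math.dist(location, c) < max_dist_cm for c in lesion_centers)
    ]
    return max(correct) if correct else None


if __name__ == "__main__":
    # Made-up image with two lesions and three reported findings.
    lesions = [(0.0, 0.0), (10.0, 10.0)]
    findings = [((1.0, 1.0), 40), ((5.0, 5.0), 90), ((10.5, 10.0), 60)]
    print(best_correct_score(findings, lesions))
```

Here the finding with the overall highest score is more than 2 cm from both lesions and is therefore ignored when the image is scored as a true-positive.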
Reader performance was computed by using location receiver operating characteristic (LROC) analysis, which relates the true-positive fraction (cancers that were correctly localized) to the false-positive fraction; the false-positive fraction was computed from the finding with the highest score on each of the images that did not depict cancer. For each reader and reading mode, the partial area under the LROC curve (pAUC) for false-positive fractions that were less than 0.2 was computed by using linear interpolation between the operating points. The low false-positive range was chosen to match the operating points used by radiologists in screening. In the dataset we used, 16.7% (20 of 120) of the normal cases were recalled by the original screening radiologists. The raw LROC curves for each mode were averaged by computing true-positive fraction values for a standard set of false-positive fraction values with linear interpolation. This was also calculated for false-positive fraction values that were obtained by LROC curves from all readers. Statistical analysis on the performance differences for the three modes was performed by using a method that treats readers and data as random samples and uses the jackknife method of analysis (
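The pAUC computation can be illustrated with the following sketch. It makes several assumptions that the text does not state: each cancer case is summarized by the highest score of a correctly localized finding (`None` if missed), each non-cancer case by its highest finding score (`None` if nothing was reported), the curve is extended horizontally if it ends before a false-positive fraction of 0.2, and the area is divided by the interval width so that values range from 0 to 1. All names are hypothetical.

```python
def lroc_points(cancer_scores, normal_scores):
    """Build LROC operating points (FPF, TPF) by sweeping a score threshold."""
    observed = [s for s in cancer_scores + normal_scores if s is not None]
    points = [(0.0, 0.0)]
    for t in sorted(set(observed), reverse=True):
        tpf = sum(s is not None and s >= t for s in cancer_scores) / len(cancer_scores)
        fpf = sum(s is not None and s >= t for s in normal_scores) / len(normal_scores)
        points.append((fpf, tpf))
    return points


def partial_area(points, max_fpf=0.2, normalize=True):
    """Area under the curve for FPF in [0, max_fpf], with linear
    interpolation between operating points (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpf:
            break
        if x1 > max_fpf:
            # Interpolate the TPF at the cutoff and clip the segment there.
            y1 = y0 + (y1 - y0) * (max_fpf - x0) / (x1 - x0)
            x1 = max_fpf
        area += (x1 - x0) * (y0 + y1) / 2.0
    last_x, last_y = points[-1]
    if last_x < max_fpf:
        # Assumption: extend the curve horizontally up to the cutoff.
        area += (max_fpf - last_x) * last_y
    return area / max_fpf if normalize else area


if __name__ == "__main__":
    # Made-up scores: 5 cancer cases and 10 non-cancer cases.
    cancer = [90, 80, None, 60, 30]
    normal = [70, 20, None, None, None, None, None, None, None, 10]
    print(partial_area(lroc_points(cancer, normal)))
```

Normalizing by the interval width is consistent with the reported values exceeding 0.2, but the exact convention used in the study is not given in the text.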
In the interactive mode, the average number of queries with and without a CAD response was computed in normal cases. Because most images were normal at screening, this number reflects the number of clicks that would be expected during screening practice. We also computed the median reading time for the normal cases for each reader. Instead of average reading times, median reading times were computed because the median is less affected by excessively long reading times caused by interruptions during the sessions. To investigate a potential effect of the experience of the readers on the benefit of CAD, the Pearson correlation coefficient was computed between the number of years readers practiced as qualified breast imagers and the increase in pAUC with interactive CAD (R version 2.11.1; Institute of Science and Medicine, Vienna, Austria).
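The two summary statistics in this paragraph can be sketched with the Python standard library. The study reports using R for the correlation; the code below is only an illustration, and all numeric values in it are made up.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical reading times (seconds); one case was interrupted.
    times = [30, 35, 40, 600]
    print("mean:", statistics.fmean(times), "median:", statistics.median(times))

    # Hypothetical years of experience and change in pAUC per reader.
    years = [1, 5, 10, 24]
    delta_pauc = [0.10, 0.06, 0.05, -0.01]
    print("r =", pearson_r(years, delta_pauc))
```

A single interrupted case shifts the mean far more than the median, which is the reason given in the text for reporting medians.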
Results
The performance of CAD on the 200 study cases is shown in

Figure 1: Free-response receiver operating characteristic curve for CAD regions used in the study, computed over the 200 cases. Operating points for traditional and the interactive modes are shown. FP = false-positive.
In total, 10 031 findings were reported by 12 readers in the three modes. When findings that were more than 2 cm apart were considered unique, an average of 958 unique findings was reported for each mode. All abnormalities were correctly localized by at least one reader in the unaided mode.

Figure 2: LROC curves for unaided reading, reading with prompted CAD, and reading with interactive CAD. The curves are averaged for all 12 readers.
The correlation between the radiologists’ number of years of mammography experience and the performance increase with interactive CAD is shown in

Figure 3: Correlation between radiologists’ years of mammography experience and the change in their performance with use of interactive CAD. Pearson correlation coefficient was −0.53.
Discussion
We found that reader performance increased when CAD results were interactively displayed, and that prompts had no major effect on reader performance. Unlike in the conventional system, the reading process with the proposed interactive system was not disrupted by the appearance of false-positive prompts at unexpected locations, which may explain these results. In the interactive mode, marks remained hidden unless the corresponding regions were probed. Because most radiologists probed only a limited number of regions, and only those that they were interested in, fewer false-positive marks were displayed. CAD suspiciousness scores generally corresponded well with the observers’ interpretations. When this was not the case, readers were alerted and paid more attention, which may have led to better decisions.
There was a large variance in the effect of interactive CAD for the 12 readers. Results indicated that readers with more experience had a higher unaided performance and had less or no benefit with CAD. The decrease in performance of the two readers who probed regions extremely often may have been related to this deviation. That two readers were involved in supervising the abnormality annotation may have biased the results. However, annotation was performed more than 1 year before the observer study was conducted, and a potential for recall existed with both prompted and interactive modes.
To make a fair and relevant comparison, the threshold for displaying prompts in the conventional system was set to a level that corresponded to the level used in current clinical practice. In the study series, we measured 3.2 marks per normal case that were false-positive for mass. This rate was in the range of settings used in commercial systems but was slightly higher than we expected, because the thresholds on the classifier output were based on a reference set of digitized film mammograms. It appeared that average CAD scores for digital images were slightly higher than those for digitized film images. To validate a comparison with clinical practice, the sensitivity of CAD should be high. Compared with the commercial R2 ImageChecker system, the sensitivity of our system was slightly higher at the same false-positive rate.
The threshold used to determine which CAD results were available in the interactive system was different from the threshold used to display prompts in the conventional system for reasons described in the Materials and Methods section. We do not know if other results would have been obtained by using another threshold. A further study would be necessary to determine the optimal threshold.
The effect of prompts may be underestimated in this study. Our results were different from the results that were obtained in prospective studies, where prompts had a positive effect on performance (
We found that the use of CAD (interactive or with prompts) lengthened the time that radiologists took to read the images by approximately 10 seconds per image. Owing to the sequential scoring of each case, prompts could only increase the time it would take to read the images compared with the unaided mode. Some readers reported that, out of curiosity, they took more time to explore the CAD results in the interactive system, which would have increased the time it took to read the images. We expect that, with increased experience, the reading time for interactive CAD will be reduced. An earlier study showed no increase in reading times with interactive CAD (
To ensure that we obtained enough data for LROC analysis, radiologists were asked to report more findings than they ordinarily would in screening practice. This might have changed the behavior of the readers compared with their routine practice. However, for our analysis, we used pAUC for low false-positive fractions. Therefore, findings with very low suspiciousness scores did not influence the results.
A limitation of this study was that the effect size of interactive CAD does not easily translate to screening practice. We selected a challenging set of cases for this study, in which the proportions of normal and abnormal cases were different from those in screening practice. Microcalcification cases, in which no mass or architectural distortion was visible, were excluded. Reader performance depends on the subtlety of the cases in the study set. However, we used the same study set for each mode; therefore, we believe that the relative differences among unaided reading, reading with prompts, and reading with interactive CAD are valid.
In conclusion, the interactive use of the results from CAD as decision support for detection of malignant masses on mammograms may be more effective than the current use of CAD, which is aimed at prevention of perceptual oversights.
• Radiologists detected malignant masses significantly more accurately when computer-aided detection (CAD) results were displayed in an interactive way than with traditional prompts or unaided reading.
• Breast cancers might be detected earlier in screening without an increase in false-positive recalls if CAD results for masses and architectural distortions can be queried interactively and are accompanied by a suspiciousness score.
Disclosures of Conflicts of Interest: R.H. No relevant conflicts of interest to disclose. M.S. No relevant conflicts of interest to disclose. M.B.L. No relevant conflicts of interest to disclose. R.M. Mann Financial activities related to the present article: none to disclose. Financial activities not related to the present article: payment for lectures from Siemens Healthcare. Other relationships: none to disclose. R. Mus No relevant conflicts of interest to disclose. G.J.d.H. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: consultancy for Philips ICT Research Eindhoven as an expert in breast radiology; payment for lectures from Philips Healthcare Netherlands; stock/stock options in Sigmascreening. Other relationships: none to disclose. D.B. No relevant conflicts of interest to disclose. R.M.P. No relevant conflicts of interest to disclose. C.B. No relevant conflicts of interest to disclose. N.K. Financial activities related to the present article: none to disclose. Financial activities not related to the present article: grants from Matakina Technology, Qview Medical, Riverain Medical; patent from MeVis Medical Solutions; stock in Matakina Technology, Qview Medical. Other relationships: none to disclose.
The authors gratefully acknowledge the participation of C.N.A. Frotscher, MD, E. Ghazi, MD, S. Gommers, MD, M.W. Imhof-Tas, MD, and U.C. Lalji, MD, in the observer performance study.
Author Contributions
Author contributions: Guarantors of integrity of entire study, R.H., N.K.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; literature research, R.H., M.S., G.J.d.H., C.B., N.K.; clinical studies, M.B.L., R.M. Mann, R. Mus, G.J.d.H., R.M.P.; experimental studies, R.H., M.S., R.M. Mann, R. Mus, D.B., C.B., N.K.; statistical analysis, R.H., N.K.; and manuscript editing, R.H., M.B.L., R.M. Mann, R. Mus, G.J.d.H., D.B., R.M.P., N.K.
Supported by the Dutch Cancer Society (grant KUN 2006-3655).
References
- 1. Single reading with computer-aided detection for screening mammography. N Engl J Med 2008;359(16):1675–1684.
- 2. Screening mammograms: interpretation with computer-aided detection—prospective evaluation. Radiology 2006;239(2):375–383.
- 3. Improved cancer detection using computer-aided detection with diagnostic and screening mammography: prospective study of 104 cancers. AJR Am J Roentgenol 2006;187(1):20–28.
- 4. Current status and future directions of computer-aided diagnosis in mammography. Comput Med Imaging Graph 2007;31(4-5):224–235.
- 5. Effectiveness of computer-aided detection in community mammography practice. J Natl Cancer Inst 2011;103(15):1152–1161.
- 6. Computer aids and human second reading as interventions in screening mammography: two systematic reviews to compare effects on cancer detection and recall rate. Eur J Cancer 2008;44(6):798–807.
- 7. Influence of computer-aided detection on performance of screening mammography. N Engl J Med 2007;356(14):1399–1409.
- 8. Computer-aided detection performance in mammographic examination of masses: assessment. Radiology 2004;233(2):418–423.
- 9. Early detection of breast cancer: overview of the evidence on computer-aided detection in mammography screening. J Med Imaging Radiat Oncol 2009;53(2):171–176.
- 10. CAD in mammography: lesion-level versus case-level analysis of the effects of prompts on human decisions. Int J CARS 2008;3(1-2):115–122.
- 11. Observer variability in cancer detection during routine repeat (incident) mammographic screening in a study of two versus one view mammography. J Med Screen 1999;6(3):152–158.
- 12. Perception of breast cancer: eye-position analysis of mammogram interpretation. Acad Radiol 2003;10(1):4–12.
- 13. Computer-aided detection versus independent double reading of masses on mammograms. Radiology 2003;227(1):192–200.
- 14. Computer aided detection of masses in mammograms as decision support. Br J Radiol 2006;79(Spec No 2):S123–S126.
- 15. Classification of breast lesions with multimodality computer-aided diagnosis: observer study results on an independent clinical data set. Radiology 2006;240(2):357–368.
- 16. Evaluation of information-theoretic similarity measures for content-based retrieval and detection of masses in mammograms. Med Phys 2007;34(1):140–150.
- 17. Interactive computer-aided diagnosis of breast masses: computerized selection of visually similar image sets from a reference library. Acad Radiol 2007;14(8):917–927.
- 18. Using computer-aided detection in mammography as a decision support. Eur Radiol 2010;20(10):2323–2330.
- 19. Breast cancer screening results 5 years after introduction of digital mammography in a population-based screening program. Radiology 2009;253(2):353–358.
- 20. Computer-aided detection as a decision assistant in chest radiography. In: Editor A, Editor B, eds. Proceedings of SPIE: medical imaging 2011—title. Vol 7966. Bellingham, Wash: SPIE–The International Society for Optical Engineering, 2011; 796614.
- 21. The use of contextual information for computer aided detection of masses in mammograms. In: Editor A, Editor B, eds. Proceedings of SPIE: medical imaging 2009—title. Vol 7260. Bellingham, Wash: SPIE–The International Society for Optical Engineering, 2009; 72600Q.
- 22. Computer-aided detection of masses in full-field digital mammography using screen-film mammograms for training. Phys Med Biol 2008;53(23):6879–6891.
- 23. Matching breast masses depicted on different views: a comparison of three methods. Acad Radiol 2009;16(11):1338–1347.
- 24. Combining two mammographic projections in a computer aided mass detection method. Med Phys 2007;34(3):898–905.
- 25. Computerized localization of breast lesions from two views. An experimental comparison of two methods. Invest Radiol 1999;34(9):585–588.
- 26. Effects of computer-aided diagnosis on radiologists’ detection of breast masses. IWDM: Proceedings of the 7th International Workshop on Digital Mammography, 2004; 219–224.
- 27. Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992;27(9):723–731.
- 28. Independent versus sequential reading in ROC studies of computer-assist modalities: analysis of components of variance. Acad Radiol 2002;9(9):1036–1043.
Article History
Received February 7, 2012; revision requested March 26; revision received May 28; accepted June 6; final version accepted June 18.
Published online: Jan 2013
Published in print: Jan 2013










