Potential of Computer-aided Diagnosis to Reduce Variability in Radiologists’ Interpretations of Mammograms Depicting Microcalcifications

Published online: https://doi.org/10.1148/radiol.220001257

PURPOSE: To evaluate whether computer-aided diagnosis can reduce interobserver variability in the interpretation of mammograms.

MATERIALS AND METHODS: Ten radiologists interpreted mammograms showing clustered microcalcifications in 104 patients. Decisions for biopsy or follow-up were made with and without a computer aid, and these decisions were compared. The computer was used to estimate the likelihood that a microcalcification cluster was due to a malignancy. Variability in the radiologists’ recommendations for biopsy versus follow-up was then analyzed.
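As a minimal illustration of the accuracy measure used in this study (not the authors' software), the area under the ROC curve can be computed from per-case likelihood-of-malignancy estimates via the Mann-Whitney statistic: the probability that a randomly chosen malignant cluster receives a higher likelihood than a randomly chosen benign one. The scores below are hypothetical.

```python
def roc_auc(benign_scores, malignant_scores):
    """AUC = P(malignant score > benign score); ties count as 1/2."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))

# Hypothetical computer-estimated likelihoods of malignancy (0 to 1):
benign = [0.10, 0.25, 0.30, 0.45, 0.55]
malignant = [0.40, 0.60, 0.70, 0.85, 0.95]
print(roc_auc(benign, malignant))  # 0.92
```

The same rank-based calculation underlies nonparametric AUC estimates in standard ROC software; here it is written out directly to keep the sketch self-contained.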

RESULTS: Variation in the radiologists' accuracy, as measured with the SD of the area under the receiver operating characteristic curve, was reduced by 46% with the computer aid. Access to the computer aid increased the agreement among all observers from 13% to 32% of the total cases (P < .001), while the κ value increased from 0.19 to 0.41 (P < .05). Use of the computer aid eliminated two-thirds of the substantial disagreements in which, for the same patient, one radiologist recommended biopsy and another recommended routine screening (P < .05).
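The agreement statistic reported above, Cohen's kappa, corrects the observed agreement between two readers for the agreement expected by chance alone. A brief sketch of the calculation, using hypothetical biopsy/follow-up decisions rather than study data:

```python
def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers' categorical decisions on the same cases."""
    assert len(reader_a) == len(reader_b)
    n = len(reader_a)
    labels = sorted(set(reader_a) | set(reader_b))
    # Observed agreement: fraction of cases with identical decisions.
    p_o = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of the readers' marginal rates, summed over labels.
    p_e = sum((reader_a.count(c) / n) * (reader_b.count(c) / n) for c in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical decisions for six patients:
a = ["biopsy", "biopsy", "follow-up", "biopsy", "follow-up", "follow-up"]
b = ["biopsy", "follow-up", "follow-up", "biopsy", "biopsy", "follow-up"]
print(round(cohens_kappa(a, b), 2))  # 0.33
```

On the Landis-Koch scale cited in such studies, the reported shift from κ = 0.19 to κ = 0.41 corresponds to moving from slight to moderate agreement.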

CONCLUSION: In addition to its demonstrated potential to improve diagnostic accuracy, computer-aided diagnosis has the potential to reduce the variability among radiologists in the interpretation of mammograms.


  • 1 Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists’ interpretations of mammograms. N Engl J Med 1994; 331: 1493-1499. Crossref, MedlineGoogle Scholar
  • 2 Beam CA, Layde PM, Sullivan DC. Variability in the interpretation of screening mammograms by US radiologists: findings from a national sample. Arch Intern Med 1996; 156: 209-213. Crossref, MedlineGoogle Scholar
  • 3 Schmidt RA, Newstead GM, Linver MN, et al. Mammographic screening sensitivity of general radiologists. In: Karssemeijer N, Thijssen M, Hendriks J, van Erning L, eds. Digital mammography. Dordrecht, the Netherlands: Kluwer Academic, 1998; 383-388. Google Scholar
  • 4 Kerlikowske K, Grady D, Barclay J, et al. Variability and accuracy in mammographic interpretation using the American College of Radiology Breast Imaging Reporting and Data System. J Natl Cancer Inst 1998; 90: 1801-1809. Crossref, MedlineGoogle Scholar
  • 5 Getty DJ, Pickett RM, D’Orsi CJ, Swets JA. Enhanced interpretation of diagnostic images. Invest Radiol 1988; 23: 240-252. Crossref, MedlineGoogle Scholar
  • 6 D’Orsi CJ, Swets JA. Variability in the interpretation of mammograms (letter). N Engl J Med 1995; 332: 1172. MedlineGoogle Scholar
  • 7 Doi K, MacMahon H, Katsuragawa S, Nishikawa RM, Jiang Y. Computer-aided diagnosis in radiology: potential and pitfalls. Eur J Radiol 1999; 31: 97-109. Crossref, MedlineGoogle Scholar
  • 8 Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Giger ML, Doi K. Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6: 22-33. Crossref, MedlineGoogle Scholar
  • 9 Huo Z, Giger ML, Vyborny CJ, Wolverton DE, Schmidt RA, Doi K. Automated computerized classification of malignant and benign masses on digitized mammograms. Acad Radiol 1998; 5: 155-168. Crossref, MedlineGoogle Scholar
  • 10 Sickles EA. Breast calcifications: mammographic evaluation. Radiology 1986; 160: 289-293. LinkGoogle Scholar
  • 11 Metz CE. Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24: 234-245. Crossref, MedlineGoogle Scholar
  • 12 Swets JA, Pickett RM. Evaluation of diagnostic systems: methods from signal detection theory New York, NY: Academic Press, 1982. Google Scholar
  • 13 Jiang Y, Nishikawa RM, Wolverton DE, et al. Malignant and benign clustered microcalcifications: automated feature analysis and classification. Radiology 1996; 198: 671-678. LinkGoogle Scholar
  • 14 Metz CE. ROC methodology in radiologic imaging. Invest Radiol 1986; 21: 720-733. Crossref, MedlineGoogle Scholar
  • 15 Swets JA. Measuring the accuracy of diagnostic systems. Science 1988; 240: 1285-1293. Crossref, MedlineGoogle Scholar
  • 16 Fleiss JL. Statistical methods for rates and proportions 2nd ed. New York, NY: Wiley, 1981. Google Scholar
  • 17 Cohen J. A coefficient of agreement for nominal scales. Educ Psychol Meas 1960; 20: 37-46. CrossrefGoogle Scholar
  • 18 Fleiss JL. Measuring nominal scale agreement among many raters. Psych Bull 1971; 76: 378-382. CrossrefGoogle Scholar
  • 19 Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33: 159-174. Crossref, MedlineGoogle Scholar
  • 20 Maclure M, Willett WC. Misinterpretation and misuse of the kappa statistic. Am J Epidemiol 1987; 126: 161-169. Crossref, MedlineGoogle Scholar
  • 21 Chan HP, Doi K, Vyborny CJ, et al. Improvement in radiologists’ detection of clustered microcalcifications on mammograms: the potential of computer-aided diagnosis. Invest Radiol 1990; 25: 1102-1110. Crossref, MedlineGoogle Scholar
  • 22 Kegelmeyer WP, Jr, Pruneda JM, Bourland PD, Hillis A, Riggs MW, Nipper ML. Computer-aided mammographic screening for spiculated lesions. Radiology 1994; 191: 331-337. LinkGoogle Scholar
  • 23 Chan HP, Sahiner B, Helvie MA, et al. Improvement of radiologists’ characterization of mammographic masses by using computer-aided diagnosis: an ROC study. Radiology 1999; 212: 817-827. LinkGoogle Scholar
  • 24 Kobayashi T, Xu XW, MacMahon H, Metz CE, Doi K. Effect of a computer-aided diagnosis scheme on radiologists’ performance in detection of lung nodules on radiographs. Radiology 1996; 199: 843-848. LinkGoogle Scholar
  • 25 Difazio MC, MacMahon H, Xu XW, et al. Digital chest radiography: effect of temporal subtraction images on detection accuracy. Radiology 1997; 202: 447-452. LinkGoogle Scholar
  • 26 Monnier-Cholley L, MacMahon H, Katsuragawa S, Morishita J, Ishida T, Doi K. Computer-aided diagnosis for detection of interstitial opacities on chest radiographs. AJR Am J Roentgenol 1998; 171: 1651-1656. Crossref, MedlineGoogle Scholar
  • 27 Ashizawa K, MacMahon H, Ishida T, et al. Effect of an artificial neural network on radiologists’ performance in the differential diagnosis of interstitial lung disease using chest radiographs. AJR Am J Roentgenol 1999; 172: 1311-1315. Crossref, MedlineGoogle Scholar
  • 28 Thurfjell EL, Lernevall KA, Taube AA. Benefit of independent double reading in a population-based mammography screening program. Radiology 1994; 191: 241-244. LinkGoogle Scholar
  • 29 Metz CE, Shen JH. Gains in accuracy from replicated readings of diagnostic images: prediction and assessment in terms of ROC analysis. Med Decis Making 1992; 12: 60-75. Crossref, MedlineGoogle Scholar
  • 30 Beam CA, Sullivan DC, Layde PM. Effect of human variability on independent double reading in screening mammography. Acad Radiol 1996; 3: 891-897. Crossref, MedlineGoogle Scholar
  • 31 Jiang Y, Nishikawa RM, Schmidt RA, Metz CE, Doi K. Comparison of independent double reading and computer-aided diagnosis (CAD) for the diagnosis of breast lesions (abstr). Radiology 1999; 213(P): 323. Google Scholar

Article History

Published in print: Sept 2001