Abstract
When deciding whether to recall a woman for additional diagnostic examinations, experienced radiologists performed significantly better on average and, as important, more consistently in the clinic than in the laboratory when interpreting the same examinations.
Purpose
To compare radiologists' performance during interpretation of screening mammograms in the clinic with their performance when reading the same mammograms in a retrospective laboratory study.
Materials and Methods
This study was conducted under an institutional review board–approved, HIPAA-compliant protocol; the need for informed consent was waived. Using a screening Breast Imaging Reporting and Data System (BI-RADS) rating scale, nine experienced radiologists rated an enriched set of mammograms that they had personally read in the clinic (the "reader-specific" set), mixed with an enriched "common" set of mammograms that none of the participants had previously read in the clinic. For both the reader-specific and common sets, the original clinical recommendations to recall the women for a diagnostic work-up were compared with the radiologists' recommendations during the retrospective experiment. The results are presented in terms of reader-specific and group-averaged sensitivity and specificity levels and the dispersion (spread) of the reader-specific performance estimates.
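The reader-specific measures described above can be illustrated with a minimal sketch. This is not the study's actual analysis code, and all decisions and case labels below are hypothetical: sensitivity and specificity are computed per reader from binary recall decisions against verified outcomes, and interreader dispersion is taken as the spread of those reader-specific estimates.

```python
# Illustrative sketch (hypothetical data, not the study's analysis):
# reader-specific sensitivity/specificity from binary recall decisions,
# and the interreader dispersion (spread) of those estimates.
from statistics import stdev

def sensitivity_specificity(recalls, truths):
    """recalls/truths: parallel 0/1 lists (1 = recalled / 1 = cancer)."""
    tp = sum(r and t for r, t in zip(recalls, truths))
    tn = sum((not r) and (not t) for r, t in zip(recalls, truths))
    pos = sum(truths)
    neg = len(truths) - pos
    return tp / pos, tn / neg

# Hypothetical decisions by three readers on the same five cases.
truths = [1, 0, 1, 0, 0]
readers = {
    "A": [1, 0, 1, 1, 0],
    "B": [1, 0, 0, 0, 0],
    "C": [1, 1, 1, 0, 0],
}

sens, spec = {}, {}
for name, recalls in readers.items():
    sens[name], spec[name] = sensitivity_specificity(recalls, truths)

# Interreader dispersion: spread of the reader-specific estimates.
print(stdev(sens.values()), stdev(spec.values()))
```

In the actual study the same computation would be repeated for each reading environment (clinic vs laboratory), giving one dispersion estimate per environment to compare.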
Results
On average, the radiologists' performance was significantly better in the clinic than in the laboratory (P = .035). Interreader dispersion of the computed performance levels was significantly lower during the clinical interpretations (P < .01).
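A comparison of interreader dispersion between the two settings, as reported above, can be sketched with a simple paired permutation test for scale differences, in the spirit of robust scale tests. The numbers below are hypothetical, and this is an illustrative stand-in, not the study's actual statistical procedure.

```python
# Illustrative sketch (hypothetical numbers, not the study's data): a paired
# permutation test of whether reader-specific estimates are more spread out
# in the laboratory than in the clinic.
import itertools
import statistics

clinic = [0.82, 0.85, 0.84, 0.83, 0.86]  # hypothetical per-reader sensitivities
lab    = [0.70, 0.90, 0.60, 0.88, 0.75]

def spread(xs):
    """Mean absolute deviation from the median (a robust scale measure)."""
    m = statistics.median(xs)
    return statistics.mean(abs(x - m) for x in xs)

observed = spread(lab) - spread(clinic)

# Under the null of equal spread, each reader's clinic/lab pair is
# exchangeable: enumerate all 2^n swaps and count equally extreme outcomes.
count = total = 0
for flips in itertools.product([0, 1], repeat=len(clinic)):
    a = [c if f == 0 else l for f, c, l in zip(flips, clinic, lab)]
    b = [l if f == 0 else c for f, c, l in zip(flips, clinic, lab)]
    total += 1
    if spread(b) - spread(a) >= observed:
        count += 1

p_value = count / total
```

With only a handful of readers, full enumeration of the 2^n swaps is cheap; for larger panels one would sample permutations instead.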
Conclusion
Retrospective laboratory experiments may not accurately represent either the performance levels or the interreader variability to be expected when the same set of mammograms is interpreted in the clinical environment.
© RSNA, 2008
Article History
Received November 19, 2007; revision requested December 21, 2007; final revision received January 17, 2008; accepted March 28, 2008. Published in print: 2008.