Evaluation of Reader Variability in the Interpretation of Follow-up CT Scans at Lung Cancer Screening
Abstract
In lung cancer screening, the presence or absence of a change in the size of noncalcified lung nodules appears to be the most important consideration in detecting change and making follow-up recommendations; reader agreement for those determinations seems acceptable but could be improved.
Purpose
To measure reader agreement in determining whether lung nodules detected at baseline screening computed tomography (CT) had changed at subsequent screening examinations and to evaluate the variability in recommendations for further follow-up.
Materials and Methods
All subjects were enrolled in the National Lung Screening Trial (NLST), and each participant consented to the use of their de-identified images for research purposes. The authors randomly selected 100 cases of nodules measuring at least 4.0 mm at 1-year screening CT that were considered by the original screening CT reader to be present on baseline CT scans; nodules considered by the original reader to have changed were oversampled. Selected images from each case showing the entire nodule at both examinations were preloaded on a picture archiving and communication system workstation. Nine radiologists served as readers. For each case, they evaluated whether the nodule was present at baseline and recorded the bidimensional measurements and nodule characteristics at each examination, the presence or absence of change, the screening CT result, and a follow-up recommendation (high-level follow-up, low-level follow-up, or no follow-up).
Results
On the basis of reviews during case selection, five nodules seen at follow-up were judged not to have been present at baseline; for 19 of the remaining 95 cases, at least one reader judged the nodule not to have been present at baseline. For the 76 nodules that were unanimously considered to have been present at baseline, 21%–47% (mean ± standard deviation, 30% ± 9) were judged to have grown. The κ values were similar for growth (κ = 0.55) and a positive screening result (κ = 0.51) and were lower for a change in margins and attenuation (κ = 0.27–0.31). The κ value in the recommendation of high- versus low-level follow-up was high (κ = 0.66).
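As an illustration of how a multireader κ of the kind reported above can be computed, the following is a minimal sketch in Python. It assumes Fleiss' κ for a fixed number of readers per case; the abstract does not state which κ variant was used. The function names, the toy ratings, and the use of the Landis and Koch descriptive bands are assumptions made for illustration, not the authors' analysis or data.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal ratings.

    ratings: list of cases, each a list holding one label per reader.
    categories: all labels a reader may assign.
    Assumes every case was rated by the same number of readers.
    """
    n_cases = len(ratings)
    n_readers = len(ratings[0])
    if n_readers < 2 or any(len(case) != n_readers for case in ratings):
        raise ValueError("each case needs the same number (>= 2) of ratings")

    counts = [Counter(case) for case in ratings]

    # Observed agreement: proportion of agreeing reader pairs, averaged over cases.
    pairs = n_readers * (n_readers - 1)
    p_obs = sum(
        (sum(c[k] ** 2 for k in categories) - n_readers) / pairs for c in counts
    ) / n_cases

    # Chance agreement from the overall share of ratings in each category.
    total = n_cases * n_readers
    p_exp = sum((sum(c[k] for c in counts) / total) ** 2 for k in categories)

    if p_exp == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_obs - p_exp) / (1.0 - p_exp)


def agreement_label(kappa):
    """Descriptive band after Landis and Koch (a convention, assumed here)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical toy data: 4 nodules, 3 readers, labels "grown" or "stable".
toy = [
    ["grown", "grown", "grown"],
    ["grown", "grown", "stable"],
    ["stable", "stable", "stable"],
    ["stable", "stable", "grown"],
]
kappa = fleiss_kappa(toy, ["grown", "stable"])
print(round(kappa, 3), agreement_label(kappa))
```

Under this assumed convention, κ = 0.55 and κ = 0.51 fall in the moderate band, κ = 0.66 in the substantial band, and κ = 0.27–0.31 in the fair band.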
Conclusion
Reader agreement on nodule growth and screening result was moderate to substantial. Agreement on follow-up recommendations was lower.
© RSNA, 2011
Supplemental material: http://radiology.rsna.org/lookup/suppl/doi:10.1148/radiol.10101254/-/DC1
Article History
Received June 28, 2010; revision requested July 30; revision received September 8; accepted October 14; final version accepted November 9.
Published online: Apr 2011
Published in print: Apr 2011