Comparison of Commercial AI Software Performance for Radiograph Lung Nodule Detection and Bone Age Prediction

Published Online: https://doi.org/10.1148/radiol.230981

In an independent validation, nine artificial intelligence products for detecting lung nodules on chest radiographs or predicting bone age on hand radiographs performed comparably to or better than human readers.

Background

Multiple commercial artificial intelligence (AI) products exist for assessing radiographs; however, comparable performance data for these algorithms are limited.

Purpose

To perform an independent, stand-alone validation of commercially available AI products for bone age prediction based on hand radiographs and lung nodule detection on chest radiographs.

Materials and Methods

This retrospective study was carried out as part of Project AIR. Nine of 17 eligible AI products were validated on data from seven Dutch hospitals. For bone age prediction, the root mean square error (RMSE) and Pearson correlation coefficient were computed. The reference standard was set by three to five expert readers. For lung nodule detection, the area under the receiver operating characteristic curve (AUC) was computed. The reference standard was set by a chest radiologist based on CT. Randomized subsets of hand (n = 95) and chest (n = 140) radiographs were read by 14 and 17 human readers, respectively, with varying experience.
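The three metrics named above can be illustrated with a minimal sketch. It is not the study's analysis code: the function names (`rmse`, `pearson_r`, `auc`) and all data values are hypothetical, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the estimator the authors used.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predictions and the reference standard."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def auc(scores, labels):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count as 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical bone ages in years: reference standard vs. algorithm output.
reference_age = [5.0, 8.0, 10.0, 12.0]
predicted_age = [5.5, 7.5, 10.5, 11.5]
print("RMSE (years):", rmse(predicted_age, reference_age))
print("Pearson r:", pearson_r(predicted_age, reference_age))

# Hypothetical nodule scores; label 1 means a nodule is present per the reference.
nodule_scores = [0.9, 0.8, 0.3, 0.1]
nodule_labels = [1, 0, 1, 0]
print("AUC:", auc(nodule_scores, nodule_labels))
```

The pairwise AUC form is used here because it makes the interpretation explicit: the probability that a randomly chosen radiograph with a nodule receives a higher score than a randomly chosen radiograph without one.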

Results

Two bone age prediction algorithms were tested on hand radiographs (from January 2017 to January 2022) in 326 patients (mean age, 10 years ± 4 [SD]; 173 female patients) and correlated strongly with the reference standard (r = 0.99; P < .001 for both). No difference in RMSE was observed between algorithms (0.63 years [95% CI: 0.58, 0.69] and 0.57 years [95% CI: 0.52, 0.61]) and readers (0.68 years [95% CI: 0.64, 0.73]). Seven lung nodule detection algorithms were validated on chest radiographs (from January 2012 to May 2022) in 386 patients (mean age, 64 years ± 11; 223 male patients). Compared with readers (mean AUC, 0.81 [95% CI: 0.77, 0.85]), four algorithms performed better (AUC range, 0.86–0.93; P value range, <.001 to .04).

Conclusions

Compared with human readers, four AI algorithms for detecting lung nodules on chest radiographs showed improved performance, whereas the remaining algorithms tested showed no evidence of a difference in performance.

© RSNA, 2024

Supplemental material is available for this article.

See also the editorial by Omoumi and Richiardi in this issue.

Article History

Received: Apr 26, 2023
Revision requested: Jul 6, 2023
Revision received: Nov 22, 2023
Accepted: Nov 27, 2023
Published online: Jan 9, 2024