Comparison of Commercial AI Software Performance for Radiograph Lung Nodule Detection and Bone Age Prediction
Abstract
In independent validation, nine artificial intelligence products for detecting lung nodules on chest radiographs or predicting bone age on hand radiographs performed comparably with or better than human readers.
Background
Multiple commercial artificial intelligence (AI) products exist for assessing radiographs; however, comparable performance data for these algorithms are limited.
Purpose
To perform an independent, stand-alone validation of commercially available AI products for bone age prediction based on hand radiographs and lung nodule detection on chest radiographs.
Materials and Methods
This retrospective study was carried out as part of Project AIR. Nine of 17 eligible AI products were validated on data from seven Dutch hospitals. For bone age prediction, the root mean square error (RMSE) and Pearson correlation coefficient were computed. The reference standard was set by three to five expert readers. For lung nodule detection, the area under the receiver operating characteristic curve (AUC) was computed. The reference standard was set by a chest radiologist based on CT. Randomized subsets of hand (n = 95) and chest (n = 140) radiographs were read by 14 and 17 human readers, respectively, with varying experience.
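As a minimal sketch of the three metrics named above, the following Python functions compute the RMSE, the Pearson correlation coefficient, and the AUC from plain lists. The function names and the pairwise (Mann-Whitney) formulation of the AUC are illustrative assumptions; the software the study actually used is not described in this abstract, and this sketch does not reproduce its confidence intervals or multireader analysis.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case scores higher; tied pairs count as one half.

    labels: 1 = nodule present, 0 = nodule absent (hypothetical encoding).
    """
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For bone age, `predicted` would hold an algorithm's or reader's estimates in years and `reference` the expert reference standard; for nodule detection, `scores` would hold per-radiograph suspicion scores and `labels` the CT-based reference standard. The pairwise AUC loop is quadratic in the number of cases, which is acceptable for a few hundred radiographs.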
Results
Two bone age prediction algorithms were tested on hand radiographs (from January 2017 to January 2022) in 326 patients (mean age, 10 years ± 4 [SD]; 173 female patients) and correlated strongly with the reference standard (r = 0.99; P < .001 for both). No difference in RMSE was observed between algorithms (0.63 years [95% CI: 0.58, 0.69] and 0.57 years [95% CI: 0.52, 0.61]) and readers (0.68 years [95% CI: 0.64, 0.73]). Seven lung nodule detection algorithms were validated on chest radiographs (from January 2012 to May 2022) in 386 patients (mean age, 64 years ± 11; 223 male patients). Compared with readers (mean AUC, 0.81 [95% CI: 0.77, 0.85]), four algorithms performed better (AUC range, 0.86–0.93; P value range, <.001 to .04).
Conclusions
Compared with human readers, four AI algorithms for detecting lung nodules on chest radiographs showed improved performance, whereas the remaining algorithms tested showed no evidence of a difference in performance.
© RSNA, 2024
Supplemental material is available for this article.
See also the editorial by Omoumi and Richiardi in this issue.
Article History
Received: Apr 26 2023
Revision requested: July 6 2023
Revision received: Nov 22 2023
Accepted: Nov 27 2023
Published online: Jan 09 2024