
Breaking the Bias: A Fair Assessment of AI and Radiologists for Workflow Optimization and Collaborative Reporting

Published Online:https://doi.org/10.1148/radiol.232872

Editor:

In their study, published in the September 2023 issue of Radiology, Dr Lind Plesner and colleagues (1) found that radiologists outperformed four commercially available artificial intelligence (AI) tools in accurately diagnosing common pulmonary pathology on chest radiographs. We commend their inclusion of anteroposterior projections with multiple concurrent findings in a consecutively sampled cohort. Evidently, without access to contextual clues, AI cannot perform at the level expected for independent reporting. However, granting radiologists such an “unfair advantage” in this study design, without accounting for AI strengths such as near-instant analysis, precludes a fair evaluation of AI capabilities, especially its role in workflow optimization.

As the authors noted, it is unsurprising that radiologists with access to clinical data and the ability to correlate radiologic findings outperform AI, which infers diagnoses from single images without context. Evaluating previous imaging is crucial for accurate reporting because interval changes can impact interpretation and diagnosis (2). It is also essential to highlight that the evaluated AI tools were not trained on data from the included hospitals. For optimal performance, these algorithms must be trained to account for variations in radiologic appearance due to technical heterogeneity in equipment and exposure settings (3). We kindly request that the authors provide information regarding the data set on which the AI applications were trained and whether it included the target findings.

It is encouraging that the evaluated AI tools achieved a negative predictive value of at least 92% and had lower rates of false-negative findings for airspace disease than did radiologists. This underscores the utility of such algorithms in identifying “normal” radiographs, which can be deprioritized in reporting radiologists’ worklists; this point, however, was not directly addressed in the discussion section of the study. Several centers have adopted AI algorithms not as autonomous reporting mechanisms but as tools to optimize workflow through triage (4,5), as sketched below. By triaging, most pathology can be identified earlier, potentially enabling timely intervention and optimizing patient care.
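To make this triage use case concrete, the following minimal sketch (illustrative only: the per-study ai_abnormal flag and ai_confidence score are hypothetical stand-ins, not the output of any tool evaluated in the study) reorders a reporting worklist so that AI-flagged radiographs are read first; AI-negative radiographs are deprioritized rather than removed, because even a negative predictive value of 92% leaves false negatives for the radiologist to catch:

    from dataclasses import dataclass

    @dataclass
    class Study:
        accession: str        # study identifier
        ai_abnormal: bool     # hypothetical flag: AI detected any target finding
        ai_confidence: float  # hypothetical vendor confidence score in [0, 1]

    def triage(worklist: list[Study]) -> list[Study]:
        # AI-flagged studies first (highest confidence leading); AI-negative
        # studies sink to the end of the queue but are still reported.
        return sorted(worklist, key=lambda s: (not s.ai_abnormal, -s.ai_confidence))

    queue = [
        Study("A1", ai_abnormal=False, ai_confidence=0.10),
        Study("A2", ai_abnormal=True, ai_confidence=0.95),
        Study("A3", ai_abnormal=True, ai_confidence=0.60),
    ]
    print([s.accession for s in triage(queue)])  # ['A2', 'A3', 'A1']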

We recognize the importance of evaluating AI performance in everyday practice. However, the comparison in this study is confounded and fails to identify the true utility of these innovations in their current form: patient triage and workflow optimization. Efforts should be redirected toward understanding how such tools can aid reporting radiologists rather than replace them.

Disclosures of conflicts of interest: K.H.B. Other financial or nonfinancial interests from Annalise AI, which is currently under evaluation for use in a clinical workflow environment in radiology across the health board and being managed by an innovation team that is separate from the authors with funding from the Scottish government. Y.Z. Other financial or nonfinancial interests from Annalise AI, which is currently under evaluation for use in a clinical workflow environment in radiology across the health board and being managed by an innovation team that is separate from the authors with funding from the Scottish government. K.P. Other financial or nonfinancial interests from Annalise AI, which is currently under evaluation for use in a clinical workflow environment in radiology across the health board and being managed by an innovation team that is separate from the authors with funding from the Scottish government. A.R. Member of the UK Royal College of Radiologists Scottish Standing Committee. S.W. Annalise AI product is used currently in the hospital Radiology and Emergency Medicine Department; medical director for Axon Diagnostics.

References

1. Lind Plesner L, Müller FC, Brejnebøl MW, et al. Commercially available chest radiograph AI tools for detecting airspace disease, pneumothorax, and pleural effusion. Radiology 2023;308(3):e231236.
2. Berlin L. Comparing new radiographs with those obtained previously. AJR Am J Roentgenol 1999;172(1):3–6.
3. Wu JT, Wong KCL, Gur Y, et al. Comparison of chest radiograph interpretations by artificial intelligence algorithm vs radiology residents. JAMA Netw Open 2020;3(10):e2022779.
4. Annarumma M, Withey SJ, Bakewell RJ, Pesce E, Goh V, Montana G. Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology 2019;291(1):196–202.
5. Tang YX, Tang YB, Peng Y, et al. Automated abnormality classification of chest radiographs using deep convolutional neural networks. NPJ Digit Med 2020;3(1):70.


Response

We thank Dr Bennett and colleagues for taking an interest in our study (1). Our primary objective was not to determine whether AI or humans are better at reporting chest radiographs; the comparison of AI with the radiologist report was a secondary outcome. Rather, we wanted to establish the expected diagnostic accuracy of AI tools in a hospital radiology practice. Nor was it our aim to test which specific deployment use cases would be optimal.

The measured performance of a diagnostic test depends on the spectrum of cases and control patients in the patient sample (2). For example, a sensitivity value will inherently vary with the “difficulty” of the cases (3). Therefore, a measurement of the current standard (ie, the radiologist) can be used to put the values into context. At this time, commercial AI tools and radiologists do not share the same capabilities: a radiologist can clinically correlate radiographic findings, whereas the AI algorithms can process only image pixels. Hence, there is no ideal way to compare them, and the choice must follow from the research question. If evaluating which reader is better at analyzing pixels, it is better to “blind” the radiologist to previous imaging and clinical context; if comparing clinical performance, we believe the radiologist should have access to all information. Otherwise, there is a risk of relatively inflated AI performance and a type I error. This should also be strongly considered when designing studies of collaborative reporting. A worked example of this spectrum effect follows.
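As a worked illustration of the spectrum effect (the numbers below are purely hypothetical and are not taken from the study), suppose a tool detects subtle cases with sensitivity 0.60 and obvious cases with sensitivity 0.95. The measured overall sensitivity is the case-mix-weighted average:

\[
\mathrm{Se} = \frac{TP}{TP + FN}, \qquad
\mathrm{Se}_{\mathrm{overall}} = p_{\mathrm{subtle}}\,\mathrm{Se}_{\mathrm{subtle}} + \left(1 - p_{\mathrm{subtle}}\right)\mathrm{Se}_{\mathrm{obvious}}
\]

With a difficult spectrum (p_subtle = 0.4), Se_overall = 0.4(0.60) + 0.6(0.95) = 0.81; with an easy spectrum (p_subtle = 0.1), Se_overall = 0.1(0.60) + 0.9(0.95) = 0.915. The same tool thus appears markedly more accurate on the easier case mix, which is why a concurrent radiologist benchmark is needed to put its values in context.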

We are not aware of any commercially available AI tools that can be fine-tuned on local data (beyond adjusting operating thresholds), and such retraining would also conflict with regulatory approval. We did not participate in training any of these tools, and we do not have access to the training data, but all vendors were aware of our disease definitions and found them compatible with their products. The sketch below illustrates what adjusting an operating threshold amounts to.
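For readers unfamiliar with the distinction, the following sketch (simulated scores only; real vendor interfaces for threshold adjustment are product specific) shows this local adaptation: sweeping a decision threshold over a fixed model’s output scores to meet a target sensitivity, which moves the operating point along the ROC curve without retraining the model:

    import numpy as np

    def pick_threshold(scores, labels, target_sens=0.95):
        # Return the highest threshold whose sensitivity still meets the
        # target; only the operating point moves, never the model weights.
        for t in np.sort(scores[labels == 1])[::-1]:
            if np.mean(scores[labels == 1] >= t) >= target_sens:
                return float(t)
        return float(scores.min())

    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.uniform(0.4, 1.0, 50),    # simulated diseased
                             rng.uniform(0.0, 0.6, 200)])  # simulated normal
    labels = np.concatenate([np.ones(50, dtype=int), np.zeros(200, dtype=int)])
    t = pick_threshold(scores, labels)
    print(f"threshold={t:.2f}, specificity={np.mean(scores[labels == 0] < t):.2f}")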

Our study did not evaluate collaborative reporting, and our findings cannot answer whether radiologists improve with AI assistance. In our opinion, this question remains insufficiently answered, which was also the conclusion of recent National Institute for Health and Care Excellence guidance on AI for lung cancer detection (4).

Disclosures of conflicts of interest: L.L.P. No relevant relationships. F.C.M. Institutional research grant from Siemens Healthineers. M.B. Danish government digitalization grant SMARTCHEST project. M.B.A. Teaching honoraria from Philips Healthcare, Siemens Healthineers, Boehringer Ingelheim, Roche.

References

1. Lind Plesner L, Müller FC, Brejnebøl MW, et al. Commercially available chest radiograph AI tools for detecting airspace disease, pneumothorax, and pleural effusion. Radiology 2023;308(3):e231236.
2. Rutjes AWS, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PMM. Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem 2005;51(8):1335–1341.
3. You S, Park JH, Park B, et al. The diagnostic performance and clinical value of deep learning-based nodule detection system concerning influence of location of pulmonary nodule. Insights Imaging 2023;14(1):149.
4. National Institute for Health and Care Excellence. Artificial intelligence-derived software to analyse chest X-rays for suspected lung cancer in primary care referrals: early value assessment. https://www.nice.org.uk/guidance/hte12. Published September 28, 2023. Accessed October 20, 2023.

Article History

Published online: Feb 27 2024