Reviews and Commentary

Change or No Change: Using AI to Compare Follow-up Chest Radiographs

Published Online: https://doi.org/10.1148/radiol.232481

See also the article by Yun and Ahn et al in this issue.

Julianna Czum is an associate professor in the Russell H. Morgan Department of Radiology and Radiological Science at Johns Hopkins University in Baltimore, Maryland. She is a cardiothoracic radiologist with fellowship training in cardiovascular MRI and subspecialty board certification in cardiovascular CT, and she was the division director of cardiothoracic imaging for more than 10 years at Dartmouth-Hitchcock Medical Center and the Geisel School of Medicine at Dartmouth College.

If you have ever spent a workday reading a large volume of inpatient chest radiographs, whether you are a subspecialty chest radiologist in an academic medical center or a private practice radiologist working at a community hospital, then you know that some hospitalized patients who undergo chest radiography will have no recent prior images for comparison. However, after an initial chest radiograph, many inpatients will have follow-up chest radiographs. For example, chest radiography may be performed upon patient admission for initial placement of support and monitoring equipment, as well as afterward for equipment adjustment or placement of additional equipment. Such chest radiographs may even be obtained in rapid succession, with mere minutes between initial imaging that shows malpositioned equipment and follow-up imaging to confirm satisfactory repositioning. Additionally, chest radiographs are often performed when a patient’s clinical condition deteriorates, such as sudden onset of hypoxia in a patient with altered mental status and an unprotected airway that may place them at risk for aspiration. If postobstructive atelectasis occurs because of aspirated material or mucus plugging, additional same-day radiographs may be obtained to look for lung re-expansion. All these scenarios can result in large hospitals having hundreds of follow-up chest radiographs needing interpretation every day.

Obviously, a radiologist will need to interpret these hundreds of follow-up chest radiographs. Medical image perception research performed by Kundel and Nodine (1) showed that radiologists with as little as 2 years of experience can detect abnormalities on a chest radiograph with 70% accuracy after viewing it for only 200 msec, more than would be expected by chance alone. With unlimited viewing time, that accuracy increased to nearly perfect, with 97% of true-positive findings detected (1). An independent study reported that expert radiologists develop more efficient eye-movement scan paths than radiologists with less experience, which may enable them to achieve even higher accuracy when viewing images at high speed (2). Importantly, these efficient eye movements are not the result of deliberate training but rather develop over time without the radiologist’s awareness.

If you are a radiologist, experience helps you develop the perceptual expertise to quickly extract valuable information from a complex medical image (3). However, while a quick assessment of a chest radiograph may work for lesion search and detection, more is required to fully interpret the image. For instance, image quality, positioning, and artifacts must be assessed. Next, the reader must synthesize perceptual data, experience-based and evidence-based medical knowledge, and patient-specific information to provide a diagnosis and, potentially, a recommendation. So, imagine that, when you dive into a worklist of chest radiographs at the start of your day, an artificial intelligence (AI) algorithm is simultaneously working in the background to assess follow-up inpatient radiographs for change versus stability.

In this issue of Radiology, a deep learning AI algorithm developed by Yun and Ahn et al (4) was used to assess stability (ie, no change) between paired baseline and follow-up chest radiographs. The study included 3 304 996 anteroposterior and posteroanterior chest radiographs obtained in 329 036 adult patients at a single institution over a 7-year period. Of these, 550 779 pairs of chest radiographs (356 367 with no change, 194 412 with change) were used to develop the AI algorithm and 1620 pairs of chest radiographs were used as an internal validation data set. Of 63 416 chest radiographs obtained in 18 454 patients in the emergency department (ED) and 354 312 chest radiographs obtained in 38 496 patients in the intensive care unit (ICU), the authors used 533 randomly selected ED pairs of chest radiographs (265 with no change, 268 with change) and 600 ICU pairs (310 with no change, 290 with change) for the test set. The authors used thoracic cage image registration to account for anatomic position differences between radiographs. Performance of the AI algorithm was assessed by using the area under the receiver operating characteristic curve (AUC), with ground truth based on the determinations of two radiologists.

The median time interval between acquisition of the initial radiograph and the follow-up radiograph in the training set, validation set, ED test set, and ICU test set ranged from 10 to 71 days. The absolute range of days between acquisition of the paired radiographs was narrowest for the ICU test set (0–880 days) and widest for the training set (0–2899 days), which included outpatients. The large time intervals between paired images in the ED and ICU data sets, especially those taken years apart, were surprising given the frequency of chest imaging in these high-acuity locations, although this is understandable because the study did not exclude outpatient chest radiographs.

Overall, the AI algorithm developed by Yun and Ahn et al (4) achieved an AUC of 0.77 for discriminating change from no change in paired chest radiographs in the validation data set, and a cutoff value of 0.588 for the probability of no change, identified with the Youden index, was used for triage. The algorithm achieved an AUC of 0.80 for discriminating change versus no change in both the ED and ICU test sets. AUCs of 0.70–0.80 are considered to indicate good performance in machine learning, but excellent performance, which is what is expected of radiologists, and perfect performance, which is what radiologists strive for, were not achieved by the model (4). Using a threshold set for a higher triage rate (40%, based on a cutoff value of 0.529), the specificity for discriminating change from no change was 88.4% (237 of 268 image pairs) in the ED data set and 90.0% (261 of 290 image pairs) in the ICU data set.
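For readers who want to see the arithmetic behind these metrics, the minimal sketch below (not the authors' code; the labels and scores are synthetic placeholders) shows how an AUC, a Youden-index cutoff, and the specificity at a chosen cutoff could be computed with scikit-learn, treating "no change" as the positive class, consistent with a cutoff on the probability of no change.

```python
# Minimal sketch with synthetic data: y_true = 1 for "no change" pairs,
# 0 for "change" pairs; y_score = model-estimated probability of no change.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=600)                          # placeholder labels
y_score = np.clip(0.3 * y_true + 0.7 * rng.random(600), 0, 1)  # placeholder scores

auc = roc_auc_score(y_true, y_score)

# Youden index J = sensitivity + specificity - 1; the optimal cutoff maximizes J,
# which on the ROC curve is the threshold maximizing (tpr - fpr).
fpr, tpr, thresholds = roc_curve(y_true, y_score)
youden_cutoff = thresholds[np.argmax(tpr - fpr)]

# Specificity at a chosen cutoff: the fraction of "change" pairs correctly NOT
# triaged as "no change" (eg, 237 of 268 = 88.4% in the ED test set).
cutoff = 0.529                                   # cutoff reported for the 40% triage rate
triaged_no_change = y_score >= cutoff
specificity = np.mean(~triaged_no_change[y_true == 0])

print(f"AUC={auc:.2f}, Youden cutoff={youden_cutoff:.3f}, specificity={specificity:.1%}")
```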

This study had three key limitations. First, device changes were not deemed real changes for the purpose of classifying change versus no change on chest radiographs. The authors acknowledge that this includes malpositioned equipment, which is critically important for radiologists to identify and communicate quickly because of the potential for morbidity and mortality. Second, it is unclear how well the algorithm would handle real-life serial imaging comparison, for which it would need to compare several radiographs in succession rather than only one pair. Third, the authors acknowledge that the data used in this study came from a single institution. Given new research into bias in AI foundation models, the paucity of information about how well any AI algorithm generalizes to a target patient population is an important gap in knowledge (5).

Radiologists today are experiencing growing caseloads due to increased demand for medical imaging and an increasing number of images per cross-sectional examination (6). One study calculated that the number of images needing interpretation per minute per radiologist increased by more than 550% between 1999 and 2010 (7). In a survey of chest radiologists, 66.8% reported burnout related to workload (8). Although the workload of chest radiologists varies, 80% of respondents to a follow-up survey reported that reading at an average sectional case volume was at or above their capacity, and this correlated with measures of burnout (9). The same study also showed that use of AI tools was associated with significant decreases in burnout.

Would an AI tool capable of grouping radiographs into those with and without interval change be helpful to your workflow? In this scenario, the order of chest radiographs is simply rearranged, which does not actually reduce the number of chest radiographs requiring interpretation by a radiologist. Thus, even if AI can accurately dichotomize chest radiographs in this way, it will not reduce the radiologist caseload. Yun and Ahn et al (4) mention the potential for autonomous reporting of no change in image pairs as a way to reduce radiologists’ workload. Although an interesting idea, it is also speculative because the authors did not study autonomous reporting or its impact on workload, including the time required for a radiologist to review and edit autonomously generated reports.
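As a purely hypothetical illustration of such triage, the sketch below dichotomizes a worklist of follow-up pairs by a no-change probability cutoff; the worklist structure, field names, and probabilities are invented for illustration and are not part of the authors' implementation. Note that both resulting queues still require a radiologist's interpretation.

```python
# Hypothetical worklist triage by a "no change" probability cutoff.
# The data structure and values are invented; only the cutoff (0.588) is taken
# from the study's validation-set Youden index.
from dataclasses import dataclass

@dataclass
class FollowUpPair:
    accession: str
    prob_no_change: float  # hypothetical AI output for a baseline/follow-up pair

CUTOFF = 0.588

worklist = [
    FollowUpPair("CXR-001", 0.91),
    FollowUpPair("CXR-002", 0.34),
    FollowUpPair("CXR-003", 0.72),
]

likely_stable = [p for p in worklist if p.prob_no_change >= CUTOFF]
likely_changed = [p for p in worklist if p.prob_no_change < CUTOFF]

# The worklist is only reordered: every pair in both queues still needs a
# radiologist's interpretation, so the total caseload is unchanged.
print(len(likely_stable) + len(likely_changed) == len(worklist))  # True
```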

Although defining threshold work volumes may be important for burnout prevention, no national standards for minimum interpretation speed or per-radiologist workload exist, and whether regulating workloads is desirable remains controversial (7). So, do we just keep pressing forward with the status quo, trying to interpret ever more images with the tools we have, or do we turn our hopes to AI, which has controversies of its own? That is, do we want change or no change?

Disclosures of conflicts of interest: J.C. Editorial board member for Radiology: Cardiothoracic Imaging.

References

  • 1. Kundel HL, Nodine CF. Interpreting chest radiographs without visual search. Radiology 1975;116(3):527–532.
  • 2. Kundel HL, La Follette PS Jr. Visual search patterns and experience with radiological images. Radiology 1972;103(3):523–528.
  • 3. Drew T, Evans K, Võ ML, Jacobson FL, Wolfe JM. Informatics in radiology: what can you see in a single glance and how might this guide visual search in medical images? RadioGraphics 2013;33(1):263–274.
  • 4. Yun J, Ahn Y, Cho K, et al. Deep learning for automated triaging of stable chest radiographs in a follow-up setting. Radiology 2023;309(1):e230606.
  • 5. Glocker B, Jones C, Roschewitz M, Winzeck S. Risk of Bias in Chest Radiography Deep Learning Foundation Models. Radiol Artif Intell 2023. Published online September 27, 2023. https://doi.org/10.1148/ryai.230060.
  • 6. Alexander R, Waite S, Bruno MA, et al. Mandating Limits on Workload, Duty, and Speed in Radiology. Radiology 2022;304(2):274–282.
  • 7. McDonald RJ, Schwartz KM, Eckel LJ, et al. The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload. Acad Radiol 2015;22(9):1191–1198.
  • 8. Eisenberg RL, Sotman TE, Czum JM, Montner SM, Meyer CA. Prevalence of Burnout Among Cardiothoracic Radiologists: Stress Factors, Career Satisfaction, and Modality-specific Imaging Volumes. J Thorac Imaging 2022;37(3):194–200.
  • 9. Meyer CA, Klein JS, Liubauskas R, Bhalla S, Eisenberg RL. Cardiothoracic Radiologist Workload, Work Capacity, and Burnout Post-COVID: Results of a Survey From the Society of Thoracic Radiology. J Thorac Imaging 2023;38(5):261–269.

Article History

Received: Sept 15 2023
Revision requested: Sept 20 2023
Revision received: Sept 21 2023
Accepted: Sept 25 2023
Published online: Oct 24 2023