Endotracheal Tube Position Assessment on Chest Radiographs Using Deep Learning

Published Online: https://doi.org/10.1148/ryai.2020200026

Abstract

Purpose

To determine the efficacy of deep learning in assessing endotracheal tube (ETT) position on radiographs.

Materials and Methods

In this retrospective study, 22 960 de-identified frontal chest radiographs containing an ETT, from 11 153 patients (average age, 60.2 years ± 19.9 [standard deviation]; 55.6% men) imaged between 2010 and 2018, were placed into 12 categories, including bronchial insertion, distance from the carina at 1.0-cm intervals (0.0–0.9 cm, 1.0–1.9 cm, and so on), and 10 cm or greater. Images were split into training (80%, 18 368 images), validation (10%, 2296 images), and internal test (10%, 2296 images) sets, all derived from the same institution. One hundred external test radiographs were also obtained from a different hospital. The Inception V3 deep convolutional neural network was used to predict ETT-carina distance. ETT-carina distances and intraclass correlation coefficients (ICCs) for the radiologists and the artificial intelligence (AI) system were calculated on a subset of 100 random internal and 100 external test images. Sensitivity and specificity were calculated for low and high ETT position thresholds.

Results

On the internal and external test images, respectively, the ICCs of AI and radiologists were 0.84 (95% CI: 0.78, 0.92) and 0.89 (95% CI: 0.77, 0.94); the ICCs of the radiologists were 0.93 (95% CI: 0.90, 0.95) and 0.84 (95% CI: 0.71, 0.90). The AI model was 93.9% sensitive (95% CI: 90.0, 96.7) and 97.7% specific (95% CI: 96.9, 98.3) for detecting ETT-carina distance less than 1 cm.

Conclusion

Deep learning predicted ETT-carina distance within 1 cm in most cases and showed excellent interrater agreement compared with radiologists. The model was sensitive and specific in detecting low ETT positions.

Keywords: Catheters, Conventional Radiography, Convolutional Neural Network (CNN), Diagnosis, Instrumentation, Supervised learning, Thorax

© RSNA, 2020

Summary

Deep learning predicts endotracheal tube position on chest radiographs with accuracy similar to that of radiologists.

Key Points

  ■ Deep learning predicts the endotracheal tube (ETT)-carina distance within 1 cm in most cases.

  ■ The artificial intelligence model showed excellent interrater agreement in assessing ETT-carina distance, compared with radiologists, on internal and external test sets.

  ■ The artificial intelligence model demonstrated sensitivity and specificity of greater than 90% in detecting critically low ETT positions.

Introduction

Radiologists routinely assess the position of an endotracheal tube (ETT) with chest radiography (1,2) because there are consequences of both low and high ETT placements. A low position, such as within or near a bronchus, can cause respiratory compromise including collapse and hypoxemia of the nonventilated lung, hyperinflation of the other lung, pneumothorax, and sometimes pneumonia (3,4). A high position can result in air leak, accidental extubation, or injury to the vocal cords (5).

Interest has been expressed in using automated methods to detect the ETT and facilitate identification of improper placements. For example, traditional computer-aided detection methods using feature extraction and support vector machines have yielded areas under the receiver operating characteristic curve (AUCs) of 0.88 and 0.94 for detection of ETTs (6,7). Because deep learning has been widely successful in many image processing tasks, including image classification and object detection in the ImageNet Large Scale Visual Recognition Challenge (8,9), there has been recent interest in applying it to medical imaging. For example, deep convolutional neural networks have been shown to be accurate in detecting tuberculosis (10), skin cancer (11), diabetic retinopathy (12), breast cancer at screening (13,14), and lung cancer at screening (15).

Regarding ETT assessment, two studies using deep learning achieved an AUC of 0.99 for detecting the presence of an ETT on chest radiographs (16,17). However, only one study assessed ETT position (16), achieving a lower AUC of 0.81 for differentiating low from satisfactory position, which is the more important clinical question. As such, the goal of this study was to determine whether a different deep learning approach could improve on prior work in determining the precise location of the ETT tip with respect to the carina. An automated solution may facilitate earlier identification of malpositioned tubes, flag cases on reading worklists, expedite radiology reporting, and ensure timely notification of referring clinicians.

Materials and Methods

Study Design

This retrospective study was institutional review board–exempt and used de-identified Health Insurance Portability and Accountability Act–compliant radiographs and corresponding reports obtained from the picture archiving and communication system at Thomas Jefferson University Hospital, Philadelphia, Pa. A total of 22 960 de-identified frontal chest radiographs from 11 153 patients (average age, 60.2 years ± 19.9 [standard deviation]; 55.6% men) were obtained consecutively by means of natural language queries on a repository of chest radiology reports from Thomas Jefferson University Hospital between 2010 and 2018. The inclusion criteria were reports and images from that period in which a specific ETT-carina distance was mentioned in the report, for patients 18 years or older. At our institution, radiologists commonly indicate a specific measurement (eg, “ETT is 4.6 cm above the carina”) in the report itself. Queries were developed to extract such reports, and the corresponding images, in which an ETT was mentioned along with a specific distance measurement from the carina. There were 12 categories in total: bronchial insertion, distance from the carina at 1.0-cm intervals up to 10 cm (0.0–0.9 cm, 1.0–1.9 cm, …, 9.0–9.9 cm), and 10 cm or greater. Images were randomly split into training (80%, 18 368 images), validation (10%, 2296 images), and test (10%, 2296 images) sets (Table 1, Fig 1). The split was stratified so that the relative percentage of images per interval category was the same across the training, validation, and test sets (Table 1). The 2296-image test set is also referred to as the internal test set because it was derived from the same institution as the training data (Thomas Jefferson University Hospital, Philadelphia, Pa).
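The report-query step can be sketched as follows. This is a hypothetical illustration only; the paper does not describe the actual query logic, and the regular expression, function name, and category strings here are our own.

```python
import re

# Hypothetical sketch: extract a stated ETT-carina distance from a report
# (eg, "ETT is 4.6 cm above the carina") and bin it into a 1.0-cm interval
# category. The study's actual natural language queries are not described.
DISTANCE = re.compile(r"(\d+(?:\.\d+)?)\s*cm\s+above\s+the\s+carina", re.IGNORECASE)

def distance_category(report: str):
    match = DISTANCE.search(report)
    if match is None:
        return None  # no explicit distance stated; report excluded
    d = float(match.group(1))
    if d >= 10:
        return "10 cm or greater"
    lo = int(d)  # floor to the interval's lower bound
    return f"{lo}.0-{lo}.9 cm"

print(distance_category("ETT is 4.6 cm above the carina."))  # 4.0-4.9 cm
```

Reports with no explicit measurement return None and would be excluded, mirroring the inclusion criteria above; bronchial insertions would need separate handling.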

Table 1: Examinations and Patients per Category


Figure 1: The internal dataset was split into 80% for training (18 368 of 22 960), 10% for validation (2296 of 22 960), and 10% for test (2296 of 22 960). For interrater assessment, 100 random images were taken from the internal test dataset (blue curved arrow), and 100 images were taken from an external dataset.

On the internal test dataset, ground truth ETT position was determined by the ETT-carina distance as indicated in the original radiology report. A second board-certified radiologist practicing in the cardiothoracic division (P.L., 13 years of experience) also visually inspected all test images for quality assurance, to confirm the presence of the ETT and that the stated ETT-carina distance in centimeters was reasonable.

Radiograph Acquisition

Images were acquired from multiple different computed radiography (indicated as CR below) and digital radiography (indicated as DR below) manufacturers and models. This included the following systems: Agfa CR ADC Compact, Solo, 51xx series, Agfa CR 35, 75, and 85 series, and Agfa DR DX-G and DX-M series (Agfa Health Care, Carlstadt, NJ); Kodak CR 825, 850, 950, and 975 series (Carestream Health Care, Rochester, NY); Siemens DR Axiom-Multix (Siemens Healthcare, Erlangen, Germany), Canon DR CXDI (Canon Medical Systems USA, Tustin, Calif), Philips DR Digital Diagnost (Philips Healthcare, Andover, Mass), Carestream DR DRX-1, DRX-Evolution, DRX-Revolution, and DRX-G series (Carestream Health Care, Rochester, NY).

Interrater Agreement

On 100 random images from the internal test dataset (Fig 1), the ETT-carina distance was remeasured in centimeters by a board-certified radiologist (reader 1) practicing in the cardiothoracic division (P.L.) using a picture archiving and communication system workstation (Philips Intellispace 4.4; Koninklijke Philips, Amsterdam, the Netherlands) to assess interrater agreement. The measurement by reader 1 was compared with that stated in the original radiology report itself (original reader), which was interpreted by any one of the board-certified radiologists practicing in our department (six radiologists, 6–40 years of experience).

This same process was repeated for an external test dataset (Fig 1), which consisted of 100 consecutive chest radiographs obtained from a different hospital (Kennedy University Hospital, Cherry Hill, NJ) from March 30, 2020, to April 30, 2020, in which an ETT was present and the ETT-carina distance was specified in the report.

In both cases, reader 1 was blinded to the data in the original radiology report (original reader).

Image Preprocessing

Contrast-limited adaptive histogram equalization was performed on the original Digital Imaging and Communications in Medicine (DICOM) images. The images were then converted to 8-bit grayscale portable network graphics format. A top-center crop of the original DICOM image was performed (halving the original matrix size in both the x and y directions, then adding 200 pixels in the x dimension and 400 pixels in the y dimension). This reduced the dimensionality of the original image while preserving the carina and ETT for all images inspected on the test set. The cropped images were subsequently resized to 512 × 512 pixels.
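The crop step can be sketched as follows; this is a minimal NumPy illustration, and the halving-plus-padding reading of the crop window is our interpretation of the description (CLAHE would be applied beforehand, eg, with OpenCV's `createCLAHE`).

```python
import numpy as np

def top_center_crop(img: np.ndarray) -> np.ndarray:
    """Top-center crop: half the original matrix size in x and y,
    plus 200 extra pixels in x and 400 extra pixels in y."""
    h, w = img.shape
    crop_h = min(h, h // 2 + 400)
    crop_w = min(w, w // 2 + 200)
    x0 = (w - crop_w) // 2               # centered horizontally
    return img[:crop_h, x0:x0 + crop_w]  # anchored at the top edge

chest = np.zeros((3000, 2500), dtype=np.uint8)  # dummy radiograph matrix
print(top_center_crop(chest).shape)  # (1900, 1450)
```

The cropped array would then be resized to 512 × 512 pixels before being fed to the network.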

Deep Learning Architecture and Training Details

The TensorFlow framework (TensorFlow 1.4; Google, Mountain View, Calif) and the Keras library (Keras version 2.12, https://keras.io) were used for network training. The Inception V3 deep convolutional neural network architecture, which consists of 48 layers, was used for this study (18). The last fully connected layers of the network were randomly initialized; the remaining layers were initialized with pretrained weights from ImageNet (9). The final layer was a single node with a linear activation function that predicted ETT-carina distance, ranging from −0.5 cm (bronchial insertion) to 10.5 cm above the carina. The loss function was mean absolute error. Real-time augmentation during training consisted of rotation, shear, horizontal flipping, and image translation. Training was performed for 200 epochs with the Adam optimizer and a fixed learning rate of 0.0001, which achieved the lowest loss on the validation dataset after multiple experiments. The model with the lowest loss on the validation dataset was chosen for inference on the test datasets.
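A minimal sketch of such a regression head on Inception V3 in Keras follows. It uses the modern `tf.keras` API rather than the versions cited above, sets `weights=None` so the snippet runs offline (the study initialized from ImageNet weights), and assumes the single-channel radiographs are replicated to three channels.

```python
import tensorflow as tf

# Inception V3 backbone; the study initialized all but the last layers
# from ImageNet (weights="imagenet"); None keeps this runnable offline.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights=None, input_shape=(512, 512, 3), pooling="avg"
)

# Single output node with a linear activation predicting ETT-carina
# distance in centimeters; mean absolute error as the loss.
output = tf.keras.layers.Dense(1, activation="linear")(backbone.output)
model = tf.keras.Model(backbone.input, output)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4), loss="mae")
print(model.output_shape)  # (None, 1)
```

Framing the task as regression on distance, rather than 12-way classification, is what lets a single linear node cover bronchial insertion through 10.5 cm.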

Model Inference on Test Data

The deep neural network was used to predict ETT distance from the carina in centimeters. Test-time augmentation (random rotation, shear, horizontal flipping, and image translation) was applied during inference on the test datasets, with 10 augmented copies per image; the average of these 10 predictions was used as the final prediction.
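Test-time augmentation averaging can be sketched as follows. This is a NumPy-only illustration with a stand-in augmentation and a stand-in model; the study used random rotation, shear, flipping, and translation on the trained network.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    # Stand-in augmentation: random horizontal flip plus a small
    # translation (the study also used random rotation and shear).
    out = img[:, ::-1] if rng.random() < 0.5 else img
    return np.roll(out, rng.integers(-5, 6), axis=1)

def predict_with_tta(predict, img: np.ndarray, n: int = 10) -> float:
    """Average the model's distance predictions over n augmented copies."""
    return float(np.mean([predict(augment(img)) for _ in range(n)]))

# With a constant stand-in model, the averaged prediction is unchanged:
print(predict_with_tta(lambda x: 4.0, np.zeros((512, 512))))  # 4.0
```

Averaging over augmented copies smooths out the model's sensitivity to small geometric variations in tube and carina position.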

Statistical Analysis

For assessing the interrater agreement for both internal and external datasets, the mean absolute differences and standard deviations were calculated for the ETT-carina distances for reader 1 compared with the original reader and for the AI-generated result compared with each reader. Intraclass correlation coefficients (ICCs) and corresponding 95% CIs were calculated per the methods outlined by Koch (19). For the qualitative interpretation of ICC interrater agreement measures, we considered less than 0.40 as poor, 0.40–0.59 as fair, 0.60–0.74 as good, and more than 0.75 as excellent, per guidelines used by Cicchetti (20).

A Bland-Altman plot was constructed for AI versus reader 1 and the original reader, and between reader 1 and the original reader (Fig 2), for both internal and external test datasets (18).
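The limits of agreement behind such a plot reduce to a short computation: the mean inter-rater difference plus or minus 1.96 standard deviations. A sketch with made-up distances (not the study's data):

```python
import numpy as np

def bland_altman_limits(a, b):
    """Mean difference and 95% limits of agreement (mean ± 1.96 SD)."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    md = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation
    return md, md - 1.96 * sd, md + 1.96 * sd

# Hypothetical ETT-carina distances (cm) from two raters:
ai      = [3.1, 4.8, 2.0, 6.5]
readers = [3.5, 4.6, 2.4, 6.9]
md, lower, upper = bland_altman_limits(ai, readers)
print(round(md, 2))  # -0.25
```

If roughly 95% of paired differences fall between the lower and upper limits, the two raters can be treated as interchangeable within that margin.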


Figure 2: Bland-Altman plot of AI compared with the mean of the radiologists (reader 1 and original reader), and between the radiologists (reader 1 and original reader), on the 200 test images used for interrater assessment. The top green horizontal line represents the upper boundary (+ 1.96 SD) and the bottom red line the lower boundary (−1.96 SD) for the limits of agreement. The mean difference is −0.22 denoted by the middle orange horizontal line. AI = artificial intelligence, SD = standard deviation.

Two thresholds were defined as low ETT positions: ETT-carina distance of less than 1 cm and of less than 2 cm. An ETT-carina distance of 7 cm or greater was defined as a high ETT position. Model performance in assessing low and high ETT positions is described using confusion matrices (Figs 3, 4), sensitivity, and specificity on the entire internal test dataset of 2296 images (Table 2). The statistical analysis was performed using R (version 3.6.2; R Foundation, Vienna, Austria). Sensitivity and specificity were not calculated at these thresholds on the external test set because of the small number of malpositioned tubes (11 low and six high positions among the 100 images).
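Sensitivity and specificity at these cutoffs follow directly from the binarized confusion matrix. A sketch with hypothetical counts (not the study's):

```python
def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for a "< 1 cm" low-ETT threshold on a test set:
sensitivity, specificity = sens_spec(tp=93, fn=7, tn=880, fp=20)
print(round(sensitivity, 3), round(specificity, 3))  # 0.93 0.978
```

Each threshold collapses the 12-category confusion matrix to a 2 × 2 table, so the same regression model yields a different sensitivity/specificity pair per cutoff.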


Figure 3: Full confusion matrix of AI prediction compared with ground truth on the entire internal test set (2296 images). The < 0 denotes bronchial insertion and the remainder of values reflects centimeter distance above carina. The degree of blue shading (0%–100%) corresponds to the percentage of ground truth cases correctly predicted by AI per category as denoted by the scale. AI = artificial intelligence.


Figure 4a: Confusion matrices on the entire internal test set (2296 images) for using (a) a less than 1-cm cutoff for low endotracheal tube (ETT) placement, (b) less than 2-cm cutoff for low ETT placement, and (c) a greater than or equal to 7-cm cutoff for high ETT placement. In c (denoted by *), 42 of 74 images were misclassified by one category, in which AI predicted 6–7 cm above carina, but ground truth was 7–8 cm. AI = artificial intelligence.


Figure 4b: Confusion matrices on the entire internal test set (2296 images) for using (a) a less than 1-cm cutoff for low endotracheal tube (ETT) placement, (b) less than 2-cm cutoff for low ETT placement, and (c) a greater than or equal to 7-cm cutoff for high ETT placement. In c (denoted by *), 42 of 74 images were misclassified by one category, in which AI predicted 6–7 cm above carina, but ground truth was 7–8 cm. AI = artificial intelligence.


Figure 4c: Confusion matrices on the entire internal test set (2296 images) for using (a) a less than 1-cm cutoff for low endotracheal tube (ETT) placement, (b) less than 2-cm cutoff for low ETT placement, and (c) a greater than or equal to 7-cm cutoff for high ETT placement. In c (denoted by *), 42 of 74 images were misclassified by one category, in which AI predicted 6–7 cm above carina, but ground truth was 7–8 cm. AI = artificial intelligence.

Table 2: Model Performance for Assessing Low and High ETT on Internal Test Set


Additional Data Analysis and Visualization

Class activation maps were created using the methods by Zhou et al (21) to inspect the areas of the image that were activated by the network (Fig 5).
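Under the Zhou et al formulation, a class activation map is the final convolutional feature maps weighted by the output node's weights and then upsampled to image size. A minimal NumPy sketch (nearest-neighbor upsampling for simplicity; function and variable names are our own):

```python
import numpy as np

def class_activation_map(features: np.ndarray, head_weights: np.ndarray,
                         out_hw: tuple) -> np.ndarray:
    """features: (h, w, c) final conv feature maps; head_weights: (c,)
    weights of the single regression output node."""
    cam = np.tensordot(features, head_weights, axes=([2], [0]))  # (h, w)
    cam -= cam.min()                 # normalize to [0, 1] for display
    if cam.max() > 0:
        cam /= cam.max()
    ry, rx = out_hw[0] // cam.shape[0], out_hw[1] // cam.shape[1]
    return np.kron(cam, np.ones((ry, rx)))  # nearest-neighbor upsample

rng = np.random.default_rng(0)
cam = class_activation_map(rng.standard_normal((14, 14, 32)),
                           rng.standard_normal(32), (448, 448))
print(cam.shape)  # (448, 448)
```

Overlaying the normalized map on the radiograph shows which regions, ideally the ETT tip and carina, drive the distance prediction.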


Figure 5a: Class activation maps of the endotracheal tube (ETT) model. On the images, the class activation maps show the area of the image (colormap from red to light blue) that is most activated by the network, which is the region between the ETT and carina, indicating that the appropriate part of the image is being assessed. (a) The predicted ETT-carina distance was 2.9 cm (ground truth was 2.5 cm). (b) A low ETT position where the predicted ETT-carina distance was −0.2 cm (ground truth was 0 cm at level of carina). (c) A high ETT position, where the predicted ETT-carina distance was 6.8 cm (ground truth was 8.6 cm above carina); this illustrates an image where the AI system was off by more than one category. The black arrow denotes the true carina location; the white arrow denotes potentially what the AI system is looking at based off the heatmap, which is the junction of the azygous vein and superior vena cava that resembles the appearance of a carina. AI = artificial intelligence.


Figure 5b: Class activation maps of the endotracheal tube (ETT) model. On the images, the class activation maps show the area of the image (colormap from red to light blue) that is most activated by the network, which is the region between the ETT and carina, indicating that the appropriate part of the image is being assessed. (a) The predicted ETT-carina distance was 2.9 cm (ground truth was 2.5 cm). (b) A low ETT position where the predicted ETT-carina distance was −0.2 cm (ground truth was 0 cm at level of carina). (c) A high ETT position, where the predicted ETT-carina distance was 6.8 cm (ground truth was 8.6 cm above carina); this illustrates an image where the AI system was off by more than one category. The black arrow denotes the true carina location; the white arrow denotes potentially what the AI system is looking at based off the heatmap, which is the junction of the azygous vein and superior vena cava that resembles the appearance of a carina. AI = artificial intelligence.


Figure 5c: Class activation maps of the endotracheal tube (ETT) model. On the images, the class activation maps show the area of the image (colormap from red to light blue) that is most activated by the network, which is the region between the ETT and carina, indicating that the appropriate part of the image is being assessed. (a) The predicted ETT-carina distance was 2.9 cm (ground truth was 2.5 cm). (b) A low ETT position where the predicted ETT-carina distance was −0.2 cm (ground truth was 0 cm at level of carina). (c) A high ETT position, where the predicted ETT-carina distance was 6.8 cm (ground truth was 8.6 cm above carina); this illustrates an image where the AI system was off by more than one category. The black arrow denotes the true carina location; the white arrow denotes potentially what the AI system is looking at based off the heatmap, which is the junction of the azygous vein and superior vena cava that resembles the appearance of a carina. AI = artificial intelligence.

Results

ETT to Carina Distance Comparisons

The predicted ETT-carina distance for the AI system had mean absolute differences of 0.69 cm ± 0.70 on the internal test set and 0.63 cm ± 0.55 on the external test set compared with the average radiologist (Table 3). The radiologists themselves had mean absolute differences of 0.44 cm ± 0.44 and 0.63 cm ± 0.67 from each other on the internal and external datasets, respectively (Table 3).

Table 3: ETT-Carina Distance Mean Absolute Differences: Radiologists and AI


On the internal and external datasets, respectively, the ICCs of AI and radiologists were 0.84 (95% CI: 0.78, 0.92) and 0.89 (95% CI: 0.77, 0.94); the ICCs of the radiologists were 0.93 (95% CI: 0.90, 0.95) and 0.84 (95% CI: 0.71, 0.90) (Table 4).

Table 4: Interrater Assessment Between Radiologists and AI


The Bland-Altman plot (Fig 2) shows that the mean difference between the radiologists (average of reader 1 and original reader) and AI was −0.22 cm. The upper limit of agreement (+1.96 standard deviation) was +1.55 cm, and the lower limit of agreement (−1.96 standard deviation) was −1.98 cm.

Confusion Matrices and Class Activation Maps

The confusion matrix comparing the ground truth ETT-carina distance to that of the AI system on the full internal test data is provided in Figure 3. Additional confusion matrices at different cutoffs for low ETT (< 1 cm and < 2 cm) and high ETT (≥ 7 cm) positions are shown in Figure 4. Class activation maps for sample test images are shown in Figure 5.

Sensitivity and Specificity of AI Determination of ETT to Carina Distance

On the internal test data, the sensitivity and specificity for the two different low (ETT-carina distances of < 1 cm and < 2 cm) and high (≥ 7 cm) thresholds are shown in Table 2. For differentiating ETT-carina distance of less than 1 cm from all others, the sensitivity was 93.9% (95% CI: 90.0, 96.7) and specificity 97.7% (95% CI: 96.9, 98.3); for differentiating ETT-carina distance of less than 2 cm from all others, the sensitivity was 90.1% (95% CI: 86.6, 92.9) and specificity 92.4% (95% CI: 91.1, 93.5); for differentiating ETT-carina distance of greater than or equal to 7 cm from all others, the sensitivity was 66.5% (95% CI: 59.9, 72.2) and specificity 99.2% (95% CI: 98.6, 99.6).

Discussion

The goal of this study was to determine the efficacy of deep learning in assessing ETT position on chest radiographs compared with radiologists. It is common practice for intubated patients in the hospital to have radiographs to assess ETT position because low and high positions can affect patient morbidity. An automated method that can reliably identify ETT position may lead to earlier identification of malpositions and expedite notification to clinicians for ETT repositioning.

In this study, we demonstrated that AI can predict ETT-carina distance within 1 cm in most cases with excellent interrater agreement compared with radiologists (ICC > 0.8 for AI and radiologists). Moreover, the AI model had a sensitivity and specificity of greater than 90% in detecting low ETT positions and bronchial insertions.

Most prior studies used computerized methods to detect the presence of an ETT (6,7,17) but did not assess the position of the ETT, which is the more important clinical question. In one prior study, deep learning with a GoogLeNet convolutional neural network was used to differentiate low from satisfactory ETT position, with an AUC of 0.81, using 300 labeled images (16). In the present study, by contrast, we used more images, more precise annotations, and a different deep learning architecture, Inception V3 (22). As a result, we were able to assess ETT position more reliably and reasonably close to human measurement.

In the current study, the algorithm was designed to predict a specific ETT distance from the carina in centimeters. We designed the distance categories for two reasons. First, the tool may prove useful in expediting radiology reporting because our radiologists frequently state the actual ETT-carina distance in centimeters in the radiology report. Second, institutions may have different practice preferences for satisfactory position: proper placement has been described in the literature as 3–7 cm above the carina in some articles (23) and 2–5 cm above the carina in others (24). At our institution, 2–7 cm above the carina is the accepted practice. As such, we chose ETT-carina distances of less than 2 cm as low and of 7 cm or greater as high ETT positions (Table 2). We also evaluated an ETT-carina distance of less than 1 cm (Table 2) because we were interested in assessing model performance in detecting very low positions and bronchial insertions, for the purpose of flagging critical results on a reading worklist.

The class activation maps showed that the model assessed the appropriate part of the image, which is the region between the ETT and the carina (Fig 5). In images in which the model was off by more than one category, it may have misinterpreted the true carina location, as shown in Figure 5c. Because the categorization was performed on an image basis (weak labeling) rather than a pixel basis (strong labeling), the model may also use parts of the image beyond the ETT tip and carina. For example, in Figure 5b, the T3 and T4 vertebral bodies are also highlighted; these have been described as surrogate landmarks for proper ETT tip placement when the carina is not seen.

Most predictions (2188 of 2296, 95.3%) were either in the appropriate category or off by one. The model performed slightly worse for higher ETT positions. In particular, the sensitivity of the model for detecting high placements (ETT-carina distance ≥ 7 cm) was lower at 66.5% compared with that of low placements, which had sensitivities greater than 90% (Table 2). However, most of these “misses” were threshold cases, whereby the model was off by one category. For example, there were cases in which the AI system predicted the ETT-carina distance as 6.8 cm above the carina but the ground truth position was 7.3 cm. This constitutes a miss and impacts model sensitivity, but in reality, many of these differences are similar to the expected interrater variability of radiologists (Tables 3, 4). It is likely that incorporating more training images for high positions, or weighting these categories more in the loss function, would improve model performance for such higher positions.

The AI solution was trained using “weak” labels, meaning that entire images were assigned category labels, such as an ETT-carina distance of 4.0–4.9 cm or 5.0–5.9 cm. Weak labels require substantially more training data. In the future, it would be interesting to compare this solution with one using “strong” labels, such as bounding boxes denoting the locations of the ETT and carina or pixel-level segmentations of both structures. AI solutions using pixel-level labeling should require far fewer training images. However, it should be noted that in some cases the carina was poorly visible or not visible at all, in which case radiologists use secondary anatomic landmarks to infer its location. AI solutions trained on many weakly labeled images may have an advantage in these cases because they can learn from such landmarks as well. Still, strong-label solutions have the potential to generate a more accurate prediction of ETT-carina distance overall, which would be an interesting future research question to explore, and they may also perform better for high ETT placements, for which we had less training data.
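For illustration, a strong-label (landmark-detection) approach would convert detected ETT-tip and carina coordinates directly into a distance. The function below is a hypothetical sketch assuming isotropic pixel spacing from the DICOM header; the function and parameter names are ours, not from the study:

```python
import math

def ett_carina_distance_cm(ett_tip, carina, pixel_spacing_mm):
    """Distance from ETT tip to carina given two detected landmarks.

    ett_tip, carina  : (row, col) pixel coordinates from a keypoint or
                       object-detection model
    pixel_spacing_mm : detector pixel spacing (assumed isotropic)

    Returns the Euclidean distance in centimeters.
    """
    dr = ett_tip[0] - carina[0]
    dc = ett_tip[1] - carina[1]
    return math.hypot(dr, dc) * pixel_spacing_mm / 10.0  # mm -> cm
```

For example, landmarks 300 pixels apart at 0.139-mm spacing correspond to roughly 4.2 cm; such a continuous output could then be binned with the same category scheme used for reporting.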

Our study had limitations. There were more examinations than patients because intubated patients are commonly imaged more than once during a hospital stay, and some patients overlapped among the training, validation, and internal test datasets. However, this overlap is unlikely to result in overfitting (in which the neural network would memorize the ETT-carina distance for each patient) because the ETT typically changes position from day to day. Also, for patients with bronchial insertions, the ETT-carina distance changes after the tube is repositioned by the clinical team. For these reasons, we thought it was important to divide the training, validation, and test sets by unique examination rather than by patient. Another potential limitation was that the training data came from a single tertiary institution, so the solution may not generalize as well to other sites. However, the data were drawn from three different inpatient hospitals and from multiple manufacturers, including both computed and digital radiography systems (see Radiograph Acquisition in the Materials and Methods section), which may aid generalization. Finally, restricting the dataset to images whose reports state the actual ETT-carina distance may bias it toward cases in which the carina is visible, and therefore make it easier for the AI system to determine the ETT-carina distance.
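The examination-level 80%/10%/10% split described above can be sketched as follows; the seed and function name are illustrative assumptions, not the authors' implementation:

```python
import random

def split_by_examination(exam_ids, seed=42):
    """80/10/10 train/validation/test split at the examination level.

    Mirrors the study's choice to split by unique examination rather
    than by patient, so one patient's serial radiographs may land in
    different subsets.
    """
    ids = list(exam_ids)
    random.Random(seed).shuffle(ids)       # reproducible shuffle
    n = len(ids)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (ids[:n_train],                 # training set
            ids[n_train:n_train + n_val],  # validation set
            ids[n_train + n_val:])         # internal test set
```

A patient-level alternative would first group examination IDs by patient and shuffle the patients instead, trading some class balance for strict subject separation.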

In the future, it would be important to assess this algorithm prospectively and on larger external datasets to ensure accuracy and generalization. It would also be interesting to compare the efficacy of this solution with that of models built using other AI approaches, including object detection and semantic segmentation.

Disclosures of Conflicts of Interest: P.L. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author received honorarium from Infervision for lecture unrelated to this work. Other relationships: patent planned for AI assessment of support devices on radiography. Activities related to the present article: editorial board member of Radiology: Artificial Intelligence. A.F. disclosed no relevant relationships. R.G. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author is consultant for Bioclinica and Medtronic for clinical trial reads. Other relationships: disclosed no relevant relationships.

Author Contributions

Author contributions: Guarantor of integrity of entire study, P.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, P.L.; clinical studies, all authors; experimental studies, P.L., R.G.; statistical analysis, P.L.; and manuscript editing, P.L., R.G.

The authors declared no funding for this work.

References

  • 1. Goodman LR, Conrardy PA, Laing F, Singer MM. Radiographic evaluation of endotracheal tube position. AJR Am J Roentgenol 1976;127(3):433–434.
  • 2. Koshy T, Misra S, Chatterjee N, Dharan BS. Accuracy of a chest x-ray–based method for predicting the depth of insertion of endotracheal tubes in pediatric patients undergoing cardiac surgery. J Cardiothorac Vasc Anesth 2016;30(4):947–953.
  • 3. Brunel W, Coleman DL, Schwartz DE, Peper E, Cohen NH. Assessment of routine chest roentgenograms and the physical examination to confirm endotracheal tube position. Chest 1989;96(5):1043–1045.
  • 4. Zwillich CW, Pierson DJ, Creagh CE, Sutton FD, Schatz E, Petty TL. Complications of assisted ventilation. A prospective study of 354 consecutive episodes. Am J Med 1974;57(2):161–170.
  • 5. Varshney M, Sharma K, Kumar R, Varshney PG. Appropriate depth of placement of oral endotracheal tube and its possible determinants in Indian adult patients. Indian J Anaesth 2011;55(5):488–493.
  • 6. Kao EF, Jaw TS, Li CW, Chou MC, Liu GC. Automated detection of endotracheal tubes in paediatric chest radiographs. Comput Methods Programs Biomed 2015;118(1):1–10.
  • 7. Chen S, Zhang M, Yao L, Xu W. Endotracheal tubes positioning detection in adult portable chest radiography for intensive care unit. Int J CARS 2016;11(11):2049–2057.
  • 8. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25 (NIPS 2012). 2012; 1097–1105. https://dl.acm.org/doi/10.5555/2999134.2999257. Accessed February 2019.
  • 9. Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis 2015;115(3):211–252.
  • 10. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284(2):574–582.
  • 11. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542(7639):115–118 [Published correction appears in Nature 2017;546(7660):686.].
  • 12. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016;316(22):2402–2410.
  • 13. Wu N, Phang J, Park J, et al. Deep neural networks improve radiologists’ performance in breast cancer screening. IEEE Trans Med Imaging 2020;39(4):1184–1194.
  • 14. McKinney SM, Sieniek M, Godbole V, et al. International evaluation of an AI system for breast cancer screening. Nature 2020;577(7788):89–94.
  • 15. Ardila D, Kiraly AP, Bharadwaj S, et al. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25(6):954–961 [Published correction appears in Nat Med 2019;25(8):1319.].
  • 16. Lakhani P. Deep convolutional neural networks for endotracheal tube position and X-ray image classification: challenges and opportunities. J Digit Imaging 2017;30(4):460–468.
  • 17. Frid-Adar M, Amer R, Greenspan H. Endotracheal tube detection and segmentation in chest radiographs using synthetic data. In: Shen D, Liu T, Peters TM, et al, eds. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019. Lecture Notes in Computer Science, vol 11769. Cham, Switzerland: Springer, 2019; 784–792.
  • 18. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–310.
  • 19. Koch GG. Intraclass correlation coefficient. In: Kotz S, Johnson NL, eds. Encyclopedia of Statistical Sciences. New York, NY: Wiley, 1982; 213–217.
  • 20. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994;6(4):284–290.
  • 21. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016. Piscataway, NJ: IEEE, 2016.
  • 22. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z. Rethinking the Inception architecture for computer vision. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016. Piscataway, NJ: IEEE, 2016.
  • 23. Allan E, Giggens R, Ali T, Bhuva S. The ICU chest radiograph: line and tube essentials for radiologists and ICU physicians. European Congress of Radiology 2019. https://doi.org/10.26044/ecr2019/C-3024. Published March 3, 2019. Accessed January 15, 2020.
  • 24. Li Y, Wang J, Wei X, Song H, Zuo Y. Individually confirm the depth of endotracheal tube by ultrasound. The Anesthesiology Annual Meeting 2016. http://www.asaabstracts.com/strands/asaabstracts/abstract.htm?year=2016&index=8&absnum=4204. Published October 22, 2016. Accessed January 15, 2020.

Article History

Received: Mar 5 2020
Revision requested: Apr 23 2020
Revision received: Sept 3 2020
Accepted: Sept 25 2020
Published online: Nov 18 2020