Prior CT Improves Deep Learning for Malignancy Risk Estimation of Screening-detected Pulmonary Nodules
Abstract
Background
Prior chest CT provides valuable temporal information (eg, changes in nodule size or appearance) to accurately estimate malignancy risk.
Purpose
To develop a deep learning (DL) algorithm that uses a current and prior low-dose CT examination to estimate 3-year malignancy risk of pulmonary nodules.
Materials and Methods
In this retrospective study, the algorithm was trained using National Lung Screening Trial data (collected from 2002 to 2004), wherein patients were imaged at most 2 years apart, and evaluated with two external test sets from the Danish Lung Cancer Screening Trial (DLCST) and the Multicentric Italian Lung Detection Trial (MILD), collected in 2004–2010 and 2005–2014, respectively. Performance was evaluated using area under the receiver operating characteristic curve (AUC) on cancer-enriched subsets with size-matched benign nodules imaged 1 and 2 years apart from DLCST and MILD, respectively. The algorithm was compared with a validated DL algorithm that only processed a single CT examination and the Pan-Canadian Early Lung Cancer Detection Study (PanCan) model.
Results
The training set included 10 508 nodules (422 malignant) in 4902 trial participants (mean age, 64 years ± 5 [SD]; 2778 men). The size-matched external test sets included 129 nodules (43 malignant) from DLCST and 126 nodules (42 malignant) from MILD. The algorithm achieved AUCs of 0.91 (95% CI: 0.85, 0.97) and 0.94 (95% CI: 0.89, 0.98), respectively. It significantly outperformed the DL algorithm that only processed a single CT examination (AUC, 0.85 [95% CI: 0.78, 0.92; P = .002]; and AUC, 0.89 [95% CI: 0.84, 0.95; P = .01]) and the PanCan model (AUC, 0.64 [95% CI: 0.53, 0.74; P < .001]; and AUC, 0.63 [95% CI: 0.52, 0.74; P < .001]).
Conclusion
A DL algorithm using current and prior low-dose CT examinations was more effective at estimating 3-year malignancy risk of pulmonary nodules than established models that only use a single CT examination.
Clinical trial registration nos. NCT00047385, NCT00496977, NCT02837809
© RSNA, 2023
Supplemental material is available for this article.
See also the editorial by Horst and Nishino in this issue.
Summary
A deep learning algorithm trained to estimate 3-year malignancy risk of screening-detected pulmonary nodules using current and prior low-dose CT examinations outperformed validated models that used a single CT examination.
Key Results
■ In this retrospective study of 6982 trial participants, a deep learning (DL) algorithm was trained to estimate 3-year malignancy risk of screening-detected pulmonary nodules using current and prior CT examinations of 10 508 nodules.
■ External testing with two cancer-enriched data sets showed areas under the receiver operating characteristic curve (AUCs) of 0.91 and 0.94.
■ The algorithm outperformed a validated DL algorithm using a single CT (AUCs, 0.85 and 0.89; P = .002 and P = .01).
Introduction
Lung cancer is the deadliest cancer worldwide. Early detection through screening is critical to reducing lung cancer mortality (1,2). The National Lung Screening Trial (NLST) and the Dutch-Belgian Lung Cancer Screening Trial (NELSON) showed that screening a high-risk population with low-dose chest CT examinations reduces lung cancer mortality by up to 26% (3,4). Early-stage lung cancer often manifests as small pulmonary nodules. CT examinations are highly effective at depicting these nodules. However, most pulmonary nodules are benign. This is demonstrated by the false-positive rate of 24% in the NLST (3). Thus, it is challenging for radiologists to identify and monitor potentially malignant nodules. Despite the presence of nodule management guidelines (5,6), accurate characterization remains tedious and is subject to inter- and intrareader variability (7).
Artificial intelligence using deep learning (DL) has demonstrated promising results for accurately estimating the malignancy risk of pulmonary nodules, especially compared with histopathologic analysis–based reference standards. For example, previous studies (8–11) describe DL algorithms that outperform established nodule risk calculators while performing similarly to expert thoracic radiologists. However, these algorithms do not include imaging information from prior CT examinations when available. For nodules observed at the follow-up screening rounds, temporal changes, such as growth and appearance compared with prior CT examinations, provide valuable additional information for accurately estimating malignancy risk (12–14). Huang et al (15) published a DL algorithm that estimated the 3-year malignancy risk for nodules with follow-up imaging. However, their algorithm does not work directly with imaging data and requires manual inputs.
In this study, a DL algorithm was trained to estimate the 3-year malignancy risk of pulmonary nodules by combining imaging data from a current and prior low-dose CT examination performed 1 or 2 years earlier. The algorithm was trained with nodules from the NLST (3) and evaluated with two external test sets from the Danish Lung Cancer Screening Trial (DLCST) (16) and the Multicentric Italian Lung Detection Trial (MILD) (17). The algorithm was compared with three validated models: a previously validated DL algorithm that only processes a single CT examination (9), the Pan-Canadian Early Detection of Lung Cancer Study (PanCan) model (18), and the updated NELSON management protocol that combines volume doubling time and volume cutoffs (19,20).
Materials and Methods
Data Sets
In this retrospective study, the training set included a previously reported data set of pulmonary nodules (9) curated from anonymized low-dose CT examinations in 5282 individuals who participated in the NLST (ClinicalTrials.gov, NCT00047385) between 2002 and 2004 (3). Whereas the previous study used the data set for training a DL algorithm to estimate pulmonary nodule malignancy risk using a single CT examination, the data set was updated in this study to train the algorithm to use two CT examinations. The NLST was approved by the institutional review board at each participating medical institution. Permission for this study was obtained from the National Cancer Institute (Cancer Data Access System project number NLST-846).
To rigorously validate the algorithm, external test sets were curated from anonymized low-dose CT examinations in individuals who participated in the DLCST (ClinicalTrials.gov, NCT00496977) (21) between 2004 and 2010, and in the MILD (ClinicalTrials.gov, NCT02837809) (17) between 2005 and 2014. Nodule annotation protocols for DLCST and MILD have been published previously by Winkler Wille et al (22) (who studied the accuracy of the PanCan model in 718 participants from DLCST) and by Silva et al (23) (who determined the long-term outcomes in a subset of 389 participants in the MILD with unresected subsolid nodules). Approval was obtained from the ethics committee of Copenhagen County and the institutional review board of Fondazione Istituto di Ricovero e Cura a Carattere Scientifico Istituto Nazionale Tumori di Milano, respectively. Participants in the three trials provided written informed consent.
Expert thoracic radiologists selected malignant nodules in participants with a histopathologic analysis–confirmed lung cancer diagnosis after reviewing their morphologic and temporal behavior across multiple CT examinations. Benign nodules were selected in participants who did not receive a lung cancer diagnosis.
A specific protocol was followed for selecting CT examinations. In the case of a malignant nodule, the last two annual or biennial low-dose screening CT examinations obtained before the lung cancer diagnosis were included. A similar approach was used for benign nodules, which were included only from participants who were not diagnosed with lung cancer. In addition, by following the approach of Huang et al (15), CT examinations obtained more than 3 years before the lung cancer diagnosis were excluded. This period was chosen to increase the likelihood of identifying cancers retrospectively, including those that demonstrate temporal changes at CT.
Training Set
In a previous study (9), an experienced thoracic radiologist (E.T.S., with >5 years of experience in reading lung screening CT) and two medical students, supervised by the radiologist, retrospectively identified malignant and benign nodules from participants with and without a lung cancer diagnosis, respectively. In this work, the data set was updated by not imposing a minimum size requirement on the nodules and by using an elastic lung registration algorithm (24) to accurately track and retrieve nodule locations across prior and follow-up CT examinations.
The algorithm was trained and internally validated with this data set using 10-fold cross-validation. During training, all combinations of prior and current CT examinations for each nodule were included. For instance, if a nodule was imaged at three annual CT examinations (baseline CT, referred to as t0; first annual follow-up CT, referred to as t1; and second annual follow-up CT, referred to as t2), the annual pairs (t0–t1 and t1–t2) and the biennial pair (t0–t2) were included.
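The pairing scheme described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function name `scan_pairs` and the integer encoding of screening rounds (0 for t0, 1 for t1, 2 for t2) are assumptions made for the example.

```python
from itertools import combinations


def scan_pairs(timepoints):
    """Return every (prior, current, interval_years) pair for one nodule.

    `timepoints` lists the screening rounds at which the nodule was
    annotated (hypothetical encoding: 0 = t0, 1 = t1, 2 = t2).
    """
    pairs = []
    for prior, current in combinations(sorted(timepoints), 2):
        interval = current - prior
        if interval in (1, 2):  # keep annual and biennial intervals only
            pairs.append((f"t{prior}", f"t{current}", interval))
    return pairs


# A nodule seen at all three rounds yields two annual pairs and one biennial pair.
print(scan_pairs([0, 1, 2]))
```

A nodule annotated at only one round produces no pair, which is consistent with nodules without follow-up CT being excluded from the data set.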
External Test Sets
Two experienced thoracic radiologists who performed the initial assessments at screening recorded all nodules in the DLCST (22). A pulmonologist (Z.S., with 6 years of experience) later annotated and temporally linked the nodules with a dedicated lung screening workstation that included a semiautomatic nodule segmentation tool (25) (details available in Appendix S1). Similarly, two experienced thoracic radiologists (M.S. and N.S., with 8 and 11 years of experience in lung screening CT, respectively) retrospectively annotated all nodules in the MILD (23), but unlike in the DLCST, only the biennial CT examinations were available. The screening radiologists located all malignant nodules in the DLCST, and another experienced thoracic radiologist (E.T.S.) retrospectively identified all malignant nodules in the MILD that showed malignant morphologic and temporal features.
Analysis of cancer-enriched subsets with large benign nodules often provides insights into the robustness of malignancy risk estimation algorithms. Following the approach of van Riel et al (26), cancer-enriched subsets were curated with size-matched benign nodules from DLCST and MILD. For every cancer, the two benign nodules (≤16 mm in diameter) whose diameters were closest to that of the cancer were selected.
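As a rough illustration of this size-matching step, the following Python sketch selects the two nearest-diameter benign nodules for each cancer. Several details are assumptions not stated in the text: that the 16-mm limit applies to the benign candidates, that each benign nodule is matched at most once, and that ties are broken by nodule identifier. The function name and the identifiers in the example are hypothetical.

```python
def size_match(malignant, benign, per_cancer=2, max_diameter_mm=16.0):
    """Match each cancer to the benign nodules closest in diameter.

    `malignant` and `benign` map a nodule id to its diameter in mm.
    Assumption: a benign nodule is removed from the pool once matched.
    """
    # Candidate pool: benign nodules within the diameter limit.
    pool = {nid: d for nid, d in benign.items() if d <= max_diameter_mm}
    matches = {}
    for cancer_id, cancer_d in malignant.items():
        ranked = sorted(pool, key=lambda nid: (abs(pool[nid] - cancer_d), nid))
        chosen = ranked[:per_cancer]
        matches[cancer_id] = chosen
        for nid in chosen:
            del pool[nid]
    return matches


malignant = {"m1": 12.0, "m2": 8.0}
benign = {"b1": 11.5, "b2": 13.0, "b3": 7.0, "b4": 9.5, "b5": 20.0}
print(size_match(malignant, benign))
```

With two benign nodules per cancer, 43 and 42 malignant nodules give subsets of 129 and 126 nodules, matching the sizes reported for the DLCST and MILD subsets.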
Algorithm Development and Validation
The approach of a previously validated DL-based nodule malignancy classifier (9) was followed. The classifier, an ensemble of two- and three-dimensional convolutional neural networks, processed a single block of CT (50 mm in size) centered around a nodule to estimate its malignancy risk. However, in the current work, to integrate imaging data from prior CT examinations, the input channels of the neural networks were modified to accept nodule blocks from two CT examinations: the current examination and a prior examination from 1 or 2 years earlier. The corresponding volumetric segmentations, generated from a three-dimensional nnU-Net (27) (version 1.7.0; https://github.com/MIC-DKFZ/nnUNet), and the time difference between the CT examinations were included as additional inputs (Fig 1). These modifications allowed the classifier to consider the volumetric extents of the nodule and its growth rate when computing its malignancy risk (Appendix S1, Table S1). The algorithm is publicly available for research use (https://grand-challenge.org/algorithms/temporal-nodule-analysis/).

Figure 1: Schematic of the deep learning algorithm that uses current and prior low-dose CT examinations to estimate pulmonary nodule malignancy risk. First, three-dimensional (3D) nnU-Net (27) (version 1.7.0; https://github.com/MIC-DKFZ/nnUNet) performs volumetric segmentation of the nodules on both CT scans (boxes). Next, a 3D block around the nodule, 50-mm size in all directions, and the corresponding volumetric segmentations from both the current and prior CT are stacked and fed to a nodule classifier based on two-dimensional (2D) and 3D convolutional neural networks. The classifier also uses the time difference between the CT examinations as input (one for annual CT examinations and two for biennial CT examinations). The outputs from the classifier, which combines current and prior CT examinations, are averaged with the outputs of the classifier, which only examines the current CT, to estimate pulmonary nodule malignancy risk. Finally, these malignancy risk scores are calibrated with Platt scaling. Softmax = tensor of probabilities.
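The final two steps in Figure 1, output averaging and Platt scaling, might look roughly like the Python sketch below. It is not the published implementation. The parameters `a` and `b` are hypothetical placeholders for coefficients that would be fitted on held-out data, and applying the sigmoid to the logit of the averaged probability is an assumption made for this example.

```python
import math


def platt_scale(score, a, b):
    """Map a raw score to a calibrated probability with a fitted sigmoid."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))


def combined_risk(p_temporal, p_single, a=1.0, b=0.0):
    """Average the two classifier outputs, then calibrate the result.

    p_temporal: output of the classifier using current and prior CT.
    p_single:   output of the classifier using the current CT only.
    a, b:       hypothetical Platt parameters (a=1, b=0 leaves the mean unchanged).
    """
    eps = 1e-7  # keep the logit finite at probabilities of 0 or 1
    mean_p = min(max((p_temporal + p_single) / 2.0, eps), 1.0 - eps)
    logit = math.log(mean_p / (1.0 - mean_p))
    return platt_scale(logit, a, b)


print(combined_risk(0.8, 0.6))                  # identity calibration
print(combined_risk(0.8, 0.6, a=1.5, b=-0.5))   # illustrative fitted parameters
```

Because the sigmoid is monotonic when `a` is positive, calibration of this kind changes the risk values but not the ranking of nodules, so it would leave the AUC unchanged.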
The algorithm was compared with two clinically established models, the PanCan model (18) and the updated NELSON management protocol (19,20). The PanCan model is a multivariable logistic regression model that considers various clinical and nodule parameters to assess nodule malignancy risk. The NELSON protocol combines volume doubling time and volume cutoffs for nodule treatment. The protocol designates small nodules (<100 mm³) as benign and large nodules (≥300 mm³) as potentially malignant. For indeterminate nodules (100–300 mm³) that show a minimum 25% increase in volume, a risk evaluation based on volume doubling time is recommended, whereas stable nodules are considered benign.
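The volume-based rules summarized above can be written as a small decision function. The Python sketch below encodes only the cutoffs stated in this section, and it assumes the volume cutoffs are applied to the current examination. The volume doubling time uses the standard exponential-growth formula, VDT = Δt · ln 2 / ln(V_current / V_prior). Function names and category labels are hypothetical, and the protocol's own doubling-time thresholds are not reproduced here.

```python
import math


def volume_doubling_time(v_prior, v_current, days_between):
    """Volume doubling time in days, assuming exponential growth."""
    if v_current <= v_prior:
        return math.inf  # no growth, so the volume never doubles
    return days_between * math.log(2) / math.log(v_current / v_prior)


def nelson_category(v_prior, v_current, days_between):
    """Return (category, vdt_days or None) from volumes in cubic millimeters.

    Assumes v_prior > 0 and that the cutoffs apply to the current volume.
    """
    if v_current < 100:
        return ("benign", None)
    if v_current >= 300:
        return ("potentially malignant", None)
    growth = (v_current - v_prior) / v_prior
    if growth >= 0.25:
        # Indeterminate nodule with growth: defer to a doubling-time assessment.
        vdt = volume_doubling_time(v_prior, v_current, days_between)
        return ("assess by volume doubling time", vdt)
    return ("benign", None)  # indeterminate but stable


print(nelson_category(120, 180, 365))
```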
Statistical Analysis
The pROC package (28) (R version 4.2.2; https://www.r-project.org/) was used to compute the area under the receiver operating characteristic curve (AUC) of the algorithms for differentiating benign and malignant nodules. The package used the DeLong method (29), a nonparametric approach that uses U statistics to compare receiver operating characteristic curves, to establish statistical significance between the performances of the algorithms, and to compute the 95% CIs for the AUC values. Statistical significance was indicated at P < .05.
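For readers unfamiliar with the U-statistic view of the AUC that underlies the DeLong method, the following Python sketch computes the AUC as the fraction of malignant-benign pairs in which the malignant nodule receives the higher score, with ties counted as one half. It illustrates the definition only. The analysis in this study used pROC in R, and the DeLong variance estimate and CIs are not implemented here. The function name and example scores are hypothetical.

```python
def auc_mann_whitney(malignant_scores, benign_scores):
    """AUC as the normalized Mann-Whitney U statistic."""
    concordant = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                concordant += 1.0   # malignant nodule ranked above benign
            elif m == b:
                concordant += 0.5   # ties count as half
    return concordant / (len(malignant_scores) * len(benign_scores))


# Made-up risk scores for three malignant and four benign nodules.
print(auc_mann_whitney([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.1]))
```

In this toy example, 11 of the 12 malignant-benign pairs are ranked correctly.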
Results
Data Set Characteristics
NLST.—The NLST training set included nodules from 4902 participants (mean age, 64 years ± 5 [SD]; 2778 men; median smoking history, 50 pack-years [IQR, 40–69]) who underwent at most three annual CT screening examinations (4243 with three examinations). Among participants with a lung cancer diagnosis, we identified 1404 nodule locations across all screening examinations for 720 screening-detected malignant nodules from 1343 CT examinations in 686 participants. Among participants without a lung cancer diagnosis, we identified 29 862 nodule locations for 10 607 benign nodules from 13 646 CT examinations in 4760 participants. After excluding 46 locations pertaining to cancers diagnosed after 3 years and 811 nodules without any follow-up CT examinations, the NLST data set was composed of 10 508 nodules (422 malignant) annotated across 14 397 CT examinations (Fig 2).

Figure 2: Flowchart describes the inclusion and exclusion criteria of pulmonary nodules from the National Lung Screening Trial (NLST). In NLST, there were 4243 participants who underwent three annual low-dose CT examinations, whereas in our data set 659 participants underwent two annual low-dose CT examinations. The pulmonary nodules from these participants were used for training the deep learning algorithm, which combines imaging information from prior and current CT examinations to estimate malignancy risk.
DLCST.—The DLCST test set included nodules from 679 participants (mean age, 62 years ± 5 [SD]; 372 men; median smoking history, 35 pack-years [IQR, 28–43]). Among participants diagnosed with lung cancer, 66 malignant nodules from 59 participants were screening-detected; these nodules had 232 annotations across 217 CT examinations. Among participants without a lung cancer diagnosis, 5513 annotations were recorded for 1297 nodules across 3276 CT examinations in 747 participants. Together, the data set was composed of 1363 nodules (66 malignant). After excluding 246 nodules without annual follow-up CT examinations, the DLCST data set was composed of 1117 nodules (43 malignant) annotated across 1405 annual CT examinations (Fig 3A).

Figure 3: Flowcharts describe the inclusion and exclusion criteria of pulmonary nodules from (A) the Danish Lung Cancer Screening Trial (DLCST) and (B) the Multicentric Italian Lung Detection Trial (MILD). These pulmonary nodules were used to validate the performance of malignancy risk estimation algorithms. DICOM = Digital Imaging and Communications in Medicine.
MILD.—The MILD test set included data from 1401 participants (mean age, 62 years ± 6 [SD]; 952 men; median smoking history, 39 pack-years [IQR, 30–49.5]). Among participants diagnosed with lung cancer, 117 malignant nodules were retrospectively identified in 91 participants; these nodules had 239 annotations across 181 CT examinations. Among participants without a lung cancer diagnosis, 14 540 annotations were recorded for 5877 benign nodules across 4397 CT examinations in 1589 participants. After excluding 23 cancers diagnosed more than 3 years after the last CT and 1332 nodules without biennial follow-up CT, the MILD data set was composed of 4639 nodules (42 malignant nodules) across 2924 biennial CT examinations (Fig 3B).
Follow-up information concerning histopathologic confirmation of malignancies was available for 6.5, 9, and 9.3 years for the NLST, DLCST, and MILD data sets, respectively. Table 1 shows demographic data in all participants studied in this work. Table 2 shows malignancy risk estimation and management algorithms that are compared in this work for pulmonary nodules detected at screening CT examinations. The distribution of size (diameter and volume) and morphologic properties (nodule type) for the data sets are summarized in Tables 3 and 4, respectively.
Algorithm Performance
Internal validation.—The DL algorithm that integrates imaging data from a current and previous CT examination achieved an AUC of 0.98 (95% CI: 0.97, 0.98) when evaluated with 10-fold cross-validation with the latest two annual CT examinations for each nodule (Fig S1). It outperformed the DL algorithm that only processed the latest CT examination (AUC, 0.95; 95% CI: 0.94, 0.96; P < .001).
External test sets.—In the DLCST, the DL algorithm achieved an AUC of 0.97 (95% CI: 0.95, 1.00) in the full group of 1117 nodules (43 malignant). It outperformed the DL algorithm that only processed a single CT examination (AUC, 0.96; 95% CI: 0.93, 0.99; P < .001), the updated NELSON management protocol (AUC, 0.94; 95% CI: 0.92, 0.95; P = .005), and the PanCan model (AUC, 0.94; 95% CI: 0.92, 0.96; P = .02) (Fig 4A). In the MILD, the algorithm achieved an AUC of 0.99 (95% CI: 0.98, 1.00) in the full group of 4639 nodules (42 malignant). This performance was also better than the DL algorithm that only processed a single CT examination (AUC, 0.98; 95% CI: 0.96, 0.99; P < .001), the updated NELSON management protocol (AUC, 0.93; 95% CI: 0.88, 0.97; P < .001), and the PanCan model (AUC, 0.96; 95% CI: 0.94, 0.98; P = .002) (Fig 4B).

Figure 4: Receiver operating characteristic (ROC) curves for differentiating benign nodules from malignant nodules according to the deep learning (DL) algorithms, the Pan-Canadian Early Lung Cancer Detection Study (PanCan) model, and the updated Dutch-Belgian Lung Cancer Screening Trial (NELSON) management protocol in (A) the full external test set from the Danish Lung Cancer Screening Trial (DLCST), (B) the full external test set from the Multicentric Italian Lung Detection Trial (MILD). The ROC curves are shown for the DL algorithms and the PanCan model in (C) the DLCST subset with size-matched benign nodules and (D) the MILD subset with size-matched benign nodules. AUC = area under the ROC curve, VDT = volume doubling time.
Subsets with size-matched benign nodules.—In the DLCST size-matched group, the algorithm achieved an AUC of 0.91 (95% CI: 0.85, 0.97). It outperformed the algorithm that only processed a single CT examination (AUC, 0.85; 95% CI: 0.78, 0.92; P = .002) and the PanCan model (AUC, 0.64; 95% CI: 0.53, 0.74; P < .001) (Fig 4C). In the MILD size-matched group, the algorithm achieved an AUC of 0.94 (95% CI: 0.89, 0.98), which was also better than the algorithm that only processed a single CT examination (AUC, 0.89; 95% CI: 0.84, 0.95; P = .01) and the PanCan model (AUC, 0.63; 95% CI: 0.52, 0.74; P < .001) (Fig 4D). Because all nodules in the size-matched groups were large (median volume at the prior CT examination for all data sets, ≥300 mm³), the performance of the updated NELSON management protocol was not calculated. The protocol would have recommended aggressive follow-up had it been available during the trials. Figure 5 shows examples of nodules where the algorithm produced accurate malignancy risk estimates, whereas Figure 6 shows examples of nodules with incorrect risk estimates.

Figure 5: Examples of screening-detected pulmonary nodules from the Danish Lung Cancer Screening Trial (DLCST) and the Multicentric Italian Lung Detection Trial (MILD), wherein malignancy risks were estimated accurately by the deep learning (DL) algorithm that combines a current and prior CT examination. The lines correspond to the malignancy risk estimation algorithms (solid blue, DL algorithm with prior CT; dotted blue, DL algorithm; dotted green, PanCan model). The percentages correspond to the risk scores from 0% to 100%. (A) Annual low-dose axial chest CT images in a 55-year-old woman with a lung cancer diagnosis in the DLCST show a growing spiculated malignant nodule, with a volume doubling time (VDT) of 481 days. All algorithms produced high malignancy risk scores. (B) Biennial low-dose axial chest CT images in a 67-year-old man with a lung cancer diagnosis in the MILD show a growing malignant nodule (VDT, 232 days). The DL algorithms produced high malignancy risk scores. (C) Annual low-dose axial chest CT images in a 66-year-old male participant without a lung cancer diagnosis in the DLCST show a stable benign nodule in which all algorithms produced low malignancy risk scores. (D) Biennial low-dose axial chest CT images in a 78-year-old male participant without a lung cancer diagnosis in the MILD show a stable part-solid benign nodule, in which the DL algorithm, which combines current and prior CT, produced a low malignancy risk score. However, the algorithm that only processed a single CT produced a high malignancy risk score. PanCan = Pan-Canadian Early Lung Cancer Detection Study.

Figure 6: Examples of screening-detected pulmonary nodules from the Danish Lung Cancer Screening Trial (DLCST) and the Multicentric Italian Lung Detection Trial (MILD), wherein malignancy risks were estimated incorrectly by the deep learning (DL) algorithm that combines a current and prior CT examination. The lines correspond to the malignancy risk estimation algorithms (solid blue, DL algorithm with prior CT; dotted blue, DL algorithm; dotted green, PanCan model). The percentages correspond to the risk scores from 0% to 100%. (A) Annual low-dose axial chest CT images in a 63-year-old man with a lung cancer diagnosis in the DLCST show a growing juxtapleural nodule that was found to be malignant. All algorithms produced low malignancy risk scores. (B) Biennial low-dose axial chest CT images in a 78-year-old man with a lung cancer diagnosis in the MILD show a slow-growing nonsolid malignancy with a volume doubling time (VDT) of 926 days and a mass doubling time of 755 days. The algorithm produced a low malignancy risk score, whereas the DL algorithm produced a high malignancy risk score by looking at a single CT examination. (C) Annual low-dose axial chest CT images in a 70-year-old woman without a lung cancer diagnosis in the DLCST show a false-positive finding: a part-solid nodule suspicious for cancer with a growing solid component. All algorithms produced high malignancy risk scores. (D) Biennial low-dose axial chest CT images in a 69-year-old man without a lung cancer diagnosis in the MILD show another false-positive finding: a growing solid nodule (VDT of 574 days). The algorithm produced a high malignancy risk score, whereas the DL algorithm produced a low risk score by looking at a single CT image. PanCan = Pan-Canadian Early Lung Cancer Detection Study.
Discussion
Prior chest CT examinations provide valuable temporal information (eg, changes in nodule size or appearance) to accurately estimate malignancy risk (14). In our study, we trained a deep learning (DL) algorithm to integrate imaging data from current and prior low-dose CT examinations to estimate the 3-year malignancy risk of pulmonary nodules. We trained the algorithm with 10 508 nodules (422 malignant) from the National Lung Screening Trial that were imaged at most 2 years apart (3). When evaluated with two external cancer-enriched subsets (composed of size-matched benign nodules) from the Danish Lung Cancer Screening Trial (16) and the Multicentric Italian Lung Detection Trial (17), the algorithm achieved areas under the receiver operating characteristic curve (AUCs) of 0.91 and 0.94, respectively. It outperformed a previously validated DL algorithm that only processed a single CT examination (9) (AUC, 0.85 [P = .002] and 0.89 [P = .01], respectively) and the Pan-Canadian Early Lung Cancer Detection Study (PanCan) model (18) (AUC, 0.64 [P < .001] and 0.63 [P < .001], respectively). These results suggest that our DL algorithm effectively incorporates prior CT imaging to estimate the malignancy risk of pulmonary nodules.
Although the difference in performance was the highest in the size-matched subsets, the DL algorithm also achieved excellent AUCs of 0.97 and 0.99 in the full test sets, composed of 1117 nodules (43 malignant) and 4639 nodules (42 malignant) from DLCST and MILD, respectively. It outperformed the DL algorithm that processed a single CT examination (AUCs, 0.96 [P < .001] and 0.98 [P < .001]) and outperformed the updated nodule management protocol from the NELSON (19,20), which combines volume doubling time and volume cutoffs (AUC, 0.94 [P = .005] and 0.93 [P < .001]).
Previous studies (8–11,30) demonstrated the potential of DL algorithms for estimating pulmonary nodule malignancy risk. However, these algorithms were designed to process only a single CT examination. Ardila et al (31) proposed a DL algorithm that estimates participant-level cancer risk using a prior and current CT examination. However, this approach does not provide malignancy risk scores for individual nodules, which is a necessary condition for clinical integration (32). To our knowledge, there are few, if any, studies that provide comprehensive evidence across multiple independent data sets of performance gains achieved by DL algorithms for estimating pulmonary nodule malignancy risk with the help of prior CT examinations. Our algorithm is also publicly available for research use (https://grand-challenge.org/algorithms/temporal-nodule-analysis/).
Our study had limitations. First, we restricted our analysis to annual and biennial follow-up CT examinations. Because the NLST data set is only composed of annual or biennial follow-up examinations, we did not include nodules imaged with shorter follow-up intervals (3- or 6-month follow-up) similar to those found in, for example, the NELSON trial (4). Second, the algorithm only looked at one prior CT examination. We did not design the algorithm to include information from multiple prior CT examinations, which is necessary for long-term surveillance in screening. Future research with well-curated nodule data sets, including heterogeneous follow-up intervals and multiple prior CT images, may stimulate the evolution of these DL algorithms. Third, some participants had multiple nodules that were included, which may not be independent of each other. Nevertheless, it is important to highlight that previous research regarding prediction of nodule malignancy also included multiple nodules per patient (18,31). The crucial aspect is to meticulously select both malignant nodules in participants with a lung cancer diagnosis, confirmed at pathology, and benign nodules from participants who were not diagnosed with lung cancer. This selection process was carefully performed in our study. Fourth, we only included cancers diagnosed within 3 years of the latest CT examination. It is likely that cancers diagnosed after this period are either small indolent tumors or do not show aggressive temporal characteristics earlier in screening (15) and could be safely monitored with follow-up imaging. Therefore, we recommend this algorithm only for computing a participant’s 3-year lung cancer risk. Finally, the performance of our algorithm on incidentally detected pulmonary nodules is currently unknown.
As previously demonstrated by Massion et al (8), DL algorithms trained on nodules from NLST, like ours, have exhibited excellent performance in predicting malignancy risk of incidental pulmonary nodules, including those found in people who never smoked. However, it is important to note that chest CT examinations performed in nonsmokers at routine clinical practice may be diagnostic in nature, with varying acquisition protocols compared with those observed at screening. These scans may also include additional lung abnormalities such as pleural effusion, consolidation, and other infectious lesions. Therefore, further research is needed to assess and improve the performance of our algorithm in pulmonary nodules found in a population with a different risk profile.
In conclusion, we trained a deep learning algorithm that integrates imaging data from current and prior low-dose screening CT examinations to estimate the 3-year malignancy risk of pulmonary nodules. The algorithm exhibited excellent performance and outperformed other validated models across multiple independent cohorts, suggesting its potential in reducing unnecessary follow-up CT or diagnostic interventions in annual and biennial screening (33). Integrating these validated algorithms in dedicated screening CT viewers may also present opportunities to optimize CT reading workflows in cost-effective lung cancer screening programs (34,35).
Author Contributions
Author contributions: Guarantors of integrity of entire study, K.V.V., U.P., C.J.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, K.V.V., M.S., U.P., B.v.G., C.J.; clinical studies, Z.S., U.P., C.J.; experimental studies, K.V.V., T.A.A., E.T.S., M.S., U.P., B.v.G., C.J.; statistical analysis, K.V.V., T.A.A., U.P., C.J.; and manuscript editing, all authors.
Study supported in part by a research grant from MeVis Medical Solutions.
Data sharing: Data generated or analyzed during the study are available from the corresponding author by request.
References
- 1. Cancer statistics, 2020. CA Cancer J Clin 2020;70(1):7–30.
- 2. MILD trial, strong confirmation of lung cancer screening efficacy. Nat Rev Clin Oncol 2019;16(9):529–530.
- 3. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365(5):395–409.
- 4. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial. N Engl J Med 2020;382(6):503–513.
- 5. Lung-RADS Assessment Categories version 1.1. https://www.acr.org/-/media/ACR/Files/RADS/Lung-RADS/LungRADSAssessmentCategoriesv1-1.pdf. Accessed October 1, 2022.
- 6. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015;70(Suppl 2):ii1–ii54. [Published correction appears in Thorax 2015;70(12):1188.]
- 7. Artificial intelligence for detection and characterization of pulmonary nodules in lung cancer CT screening: ready for practice? Transl Lung Cancer Res 2021;10(5):2378–2388.
- 8. Assessing the Accuracy of a Deep Learning Method to Risk Stratify Indeterminate Pulmonary Nodules. Am J Respir Crit Care Med 2020;202(2):241–249.
- 9. Deep Learning for Malignancy Risk Estimation of Pulmonary Nodules Detected at Low-Dose Screening CT. Radiology 2021;300(2):438–447.
- 10. Lung cancer prediction by Deep Learning to identify benign lung nodules. Lung Cancer 2021;154:1–4.
- 11. Artificial Intelligence Tool for Assessment of Indeterminate Pulmonary Nodules Detected with CT. Radiology 2022;304(3):683–691.
- 12. Lung Cancer Screening with CT: A Few Steps on a Long Journey. Radiology 2021;300(2):448–449.
- 13. The Need for Medical Artificial Intelligence That Incorporates Prior Images. Radiology 2022;304(2):283–288.
- 14. Deep learning to stratify lung nodules on annual follow-up CT. Lancet Digit Health 2019;1(7):e324–e325.
- 15. Prediction of lung cancer risk at follow-up screening with low-dose CT: a training and validation study of a deep learning method. Lancet Digit Health 2019;1(7):e353–e362.
- 16. The Danish randomized lung cancer CT screening trial--overall design and results of the prevalence round. J Thorac Oncol 2009;4(5):608–614.
- 17. Prolonged lung cancer screening reduced 10-year mortality in the MILD trial: new confirmation of lung cancer screening efficacy. Ann Oncol 2019;30(7):1162–1169.
- 18. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med 2013;369(10):910–919.
- 19. Lung cancer probability in patients with CT-detected pulmonary nodules: a prespecified analysis of data from the NELSON trial of low-dose CT screening. Lancet Oncol 2014;15(12):1332–1341.
- 20. European position statement on lung cancer screening. Lancet Oncol 2017;18(12):e754–e766.
- 21. Results of the Randomized Danish Lung Cancer Screening Trial with Focus on High-Risk Profiling. Am J Respir Crit Care Med 2016;193(5):542–551.
- 22. Predictive Accuracy of the PanCan Lung Cancer Risk Prediction Model - External Validation based on CT from the Danish Lung Cancer Screening Trial. Eur Radiol 2015;25(10):3093–3099.
- 23. Long-Term Active Surveillance of Screening Detected Subsolid Nodules is a Safe Strategy to Reduce Overtreatment. J Thorac Oncol 2018;13(10):1454–1463.
- 24. Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE Trans Med Imaging 2011;30(11):1901–1920.
- 25. Morphological segmentation and partial volume analysis for volumetry of solid pulmonary lesions in thoracic CT scans. IEEE Trans Med Imaging 2006;25(4):417–434.
- 26. Malignancy risk estimation of pulmonary nodules in screening CTs: Comparison between a computer model and human observers. PLoS One 2017;12(11):e0185032.
- 27. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18(2):203–211.
- 28. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12(1):77.
- 29. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44(3):837–845.
- 30. External validation of a convolutional neural network artificial intelligence tool to predict malignancy in pulmonary nodules. Thorax 2020;75(4):306–312.
- 31. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25(6):954–961. [Published correction appears in Nat Med 2019;25(8):1319.]
- 32. Google’s lung cancer AI: a promising tool that needs further validation. Nat Rev Clin Oncol 2019;16(9):532–533.
- 33. Protocol and Rationale for the International Lung Screening Trial. Ann Am Thorac Soc 2020;17(4):503–512.
- 34. Lung cancer screening. Lancet 2023;401(10374):390–408.
- 35. Assisted versus Manual Interpretation of Low-Dose CT Scans for Lung Cancer Screening: Impact on Lung-RADS Agreement. Radiol Imaging Cancer 2021;3(5):e200160.
- 36. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011;38(2):915–931.
- 37. iW-Net: an automatic and minimalistic interactive lung nodule segmentation deep network. Sci Rep 2019;9(1):11591.
- 38. Deep Residual Learning for Image Recognition. 2016 IEEE Conf Comput Vis Pattern Recognit CVPR. IEEE, 2016; 770–778.
- 39. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 1999;10(3):61–74. https://scholar.google.com/citations?view_op=view_citation&hl=en&user=rtWKzFwAAAAJ&citation_for_view=rtWKzFwAAAAJ:u-x6o8ySG0sC.
Article History
Received: Jan 9 2023
Revision requested: Mar 1 2023
Revision received: May 8 2023
Accepted: June 9 2023
Published online: Aug 01 2023