Deep Learning for Malignancy Risk Estimation of Pulmonary Nodules Detected at Low-Dose Screening CT
Accurate estimation of the malignancy risk of pulmonary nodules at chest CT is crucial for optimizing management in lung cancer screening.
To develop and validate a deep learning (DL) algorithm for malignancy risk estimation of pulmonary nodules detected at screening CT.
Materials and Methods
In this retrospective study, the DL algorithm was developed with 16 077 nodules (1249 malignant) collected between 2002 and 2004 from the National Lung Screening Trial. External validation was performed in the following three cohorts collected between 2004 and 2010 from the Danish Lung Cancer Screening Trial: a full cohort containing all 883 nodules (65 malignant) and two cancer-enriched cohorts, one with benign nodules selected at random (175 nodules, 59 malignant) and one with size-matched benign nodules (177 nodules, 59 malignant). Algorithm performance was measured by using the area under the receiver operating characteristic curve (AUC) and compared with that of the Pan-Canadian Early Detection of Lung Cancer (PanCan) model in the full cohort and a group of 11 clinicians composed of four thoracic radiologists, five radiology residents, and two pulmonologists in the cancer-enriched cohorts.
The DL algorithm significantly outperformed the PanCan model in the full cohort (AUC, 0.93 [95% CI: 0.89, 0.96] vs 0.90 [95% CI: 0.86, 0.93]; P = .046). The algorithm performed comparably to thoracic radiologists in cancer-enriched cohorts with both random benign nodules (AUC, 0.96 [95% CI: 0.93, 0.99] vs 0.90 [95% CI: 0.81, 0.98]; P = .11) and size-matched benign nodules (AUC, 0.86 [95% CI: 0.80, 0.91] vs 0.82 [95% CI: 0.74, 0.89]; P = .26).
The deep learning algorithm showed excellent performance, comparable to thoracic radiologists, for malignancy risk estimation of pulmonary nodules detected at screening CT. This algorithm has the potential to provide reliable and reproducible malignancy risk scores for clinicians, which may help optimize management in lung cancer screening.
© RSNA, 2021
See also the editorial by Tammemägi in this issue.
A deep learning algorithm, developed for malignancy risk estimation of pulmonary nodules detected at screening CT, demonstrated good discriminative performance in a large independent validation set.
■ A deep learning algorithm developed for malignancy risk estimation of nodules detected at a single low-dose screening chest CT examination outperformed the clinically established Pan-Canadian Early Detection of Lung Cancer model in an external validation cohort (area under the receiver operating characteristic curve [AUC], 0.93 vs 0.90, respectively; P = .046).
■ In cancer-enriched subsets with random benign and size-matched benign nodules, the algorithm showed excellent performance (AUC, 0.96 and 0.86, respectively) and performed comparably to four thoracic radiologists (AUC, 0.90 and 0.82; P = .11 and .26, respectively).
The National Lung Screening Trial (NLST) and the Dutch-Belgian Lung Cancer Screening trial showed that screening high-risk individuals with low-dose chest CT reduced lung cancer mortality by 20% and 24%, respectively (1,2). This is linked to a beneficial stage shift, with stage I and II lung cancer having a much better prognosis than stage III or IV lung cancer (3). Lung cancer typically manifests as pulmonary nodules at CT. However, most nodules are benign and do not require further clinical workup. Nodule management guidelines and data-driven models have been developed to reduce the rate of false-positive findings and avoid overtreatment (4–8), but it remains a challenge to accurately distinguish between benign and malignant nodules (9).
Deep learning (DL) with convolutional neural networks (CNNs) has recently become a method of choice for analyzing medical images (10). Several studies (11–13) showcased the potential of CNNs in predicting the malignancy risk of a pulmonary nodule by using the publicly available Lung Image Database Consortium image collection data set (14). However, these studies used the subjective labels provided by radiologists and lacked a solid reference standard set by histopathologic examination for malignant nodules and at least 2 years of imaging follow-up for benign nodules. Ardila et al (15) developed a DL algorithm that processes a whole CT image to predict patient-level malignancy risk. However, without risk scores for individual nodules, these algorithms are difficult to integrate as a second opinion in conjunction with current clinical guidelines like the Lung CT Screening Reporting and Data System (Lung-RADS) by the American College of Radiology (4,16). Another study evaluated a DL algorithm on two clinical data sets with a proven reference standard, but external validation was not performed in screening cohorts and no comparisons to clinicians were provided (17).
In our study, we developed and externally validated a DL algorithm on the basis of CNNs for malignancy risk estimation of pulmonary nodules detected at low-dose screening CT, with a reference standard set by histopathologic examination or at least 2 years of follow-up. Our validation included a performance comparison to 11 clinicians and the clinically established Pan-Canadian Early Detection of Lung Cancer (PanCan) model (7) in external screening cohorts.
Materials and Methods
In this retrospective study, the development data set included anonymized low-dose CT examinations in individuals who participated in the NLST (1) between 2002 and 2004. Permission was obtained from the NLST (1) through the National Cancer Institute Cancer Data Access System (approved Project IDs: NLST-74, NLST-111, NLST-164, and NLST-267; Appendix E1 [online]). Institutional review board approval was obtained at each of the 33 centers involved in the NLST, and all participants provided informed consent.
The external validation data set included anonymized low-dose CT examinations in individuals who participated in the Danish Lung Cancer Screening Trial (DLCST) (18) between 2004 and 2010. The Ethics Committee of Copenhagen County approved the study, and informed consent was obtained from all participants.
The nodules from the NLST were annotated by an experienced radiologist (E.T.S., with > 5 years of experience in reading lung screening CT) and two medical students (in their 4th and 5th years of undergraduate studies) who were trained by the radiologist (E.T.S.). All nodules from the DLCST were recorded by two thoracic radiologists who were responsible for the initial readings during the course of screening (18).
NLST cohort.— The NLST database lists the nodules found during screening with lobar locations and CT section numbers, but exact nodule coordinates are not available. The experienced radiologist (E.T.S.) inspected all images in participants diagnosed with lung cancer within the study period (median follow-up, 6.5 years) to retrospectively locate the malignant nodules. With the ability to retrospectively examine the images in participants across screening rounds, all nodules with malignant morphologic features located in the tumor-bearing lobe were registered as lung cancer with high certainty. Other nodules were excluded from our study. CT images in participants who were not diagnosed with lung cancer were read by the two medical students. They had access to the lobe location and CT section number and were instructed to locate all nodules recorded for that image in the NLST database. Nodules with a mean diameter (on the basis of volumetric measurements; Appendix E1 [online]) smaller than 4 mm were excluded. The process is shown in Figure 1.
Full DLCST cohort.— For participants who developed lung cancer, the first image on which the lung cancer was annotated was included. For participants without lung cancer, the nodule annotations from the baseline images were used. Follow-up information regarding the presence of histologic analysis–proven malignancies was available for 9 years. Together, these nodules formed the full DLCST cohort. These nodules were used in a previous study (19) to compare the PanCan model, Lung-RADS, and the National Comprehensive Cancer Network guidelines.
Cancer-enriched DLCST subsets.— For comparisons with clinicians, two cancer-enriched subsets were selected from the DLCST cohort. Both subsets included all lung cancer nodules and twice as many benign nodules. For subset A, the benign nodules were sampled at random; for subset B, the benign nodules were size matched to the cancers before sampling at random to remove the effect of nodule size. If a participant had more than one malignant lesion, only one was included at random in the observer study. Details of these subsets were published previously in an observer study that compared the PanCan model with 11 clinicians (20).
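The two sampling schemes can be illustrated with a short Python sketch. It draws twice as many benign nodules as cancers, once uniformly at random and once after size matching. The matching rule used here (nearest diameter, without replacement, ties broken at random), the field name `diameter_mm`, the function names, and the toy data are all assumptions made for illustration; the procedure actually used is described in the earlier observer study (20).

```python
import random

def sample_random_benign(benign, n_malignant, rng):
    """Subset A style: draw twice as many benign nodules as cancers, uniformly at random."""
    return rng.sample(benign, 2 * n_malignant)

def sample_size_matched_benign(benign, malignant, rng, per_cancer=2):
    """Subset B style (assumed rule): for each cancer, take the benign nodules
    closest in diameter, without replacement, breaking ties at random."""
    pool = list(benign)
    rng.shuffle(pool)  # random order, so equally close candidates are picked at random
    chosen = []
    for cancer in malignant:
        # sorted() is stable, so the shuffled order is preserved among ties
        pool = sorted(pool, key=lambda b: abs(b["diameter_mm"] - cancer["diameter_mm"]))
        chosen.extend(pool[:per_cancer])
        pool = pool[per_cancer:]
    return chosen

# Hypothetical toy data: 20 benign nodules (4 to 23 mm) and 3 cancers (20 to 22 mm).
benign = [{"id": i, "diameter_mm": 4.0 + i} for i in range(20)]
malignant = [{"id": 100 + i, "diameter_mm": 20.0 + i} for i in range(3)]

rng = random.Random(0)
subset_a = sample_random_benign(benign, len(malignant), rng)
subset_b = sample_size_matched_benign(benign, malignant, rng)
```

With size matching, the benign nodules in the toy subset B are all large, so size alone no longer separates them from the cancers; this is the effect the size-matched subset is designed to have.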
Algorithm Development and Validation
We trained a DL algorithm to predict nodule malignancy risk from only raw voxels by using an ensemble of two-dimensional and three-dimensional CNNs (Fig 2). CT images and nodule coordinates were used as inputs for the algorithm. The two-dimensional CNN was based on the system developed by Setio et al (21) with ResNet50 (22) backbone, and the three-dimensional CNN was based on three-dimensional–inflated Inception-v1 architecture (open source; https://github.com/hassony2/kinetics_i3d_pytorch) (15,23). They were trained independently on the NLST cohort and internally validated by using 10-fold cross validation, resulting in 10 CNNs for each architecture (Appendix E1 [online]). To predict the malignancy risk of a new nodule, the scores from the 20 CNNs were averaged to produce a risk score from 0 to 1. The algorithm is freely accessible online for research purposes (https://grand-challenge.org/algorithms/pulmonary-nodule-malignancy-prediction).
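The final averaging step can be sketched in a few lines of Python. This is only an illustration of equal-weight averaging over 20 network outputs; the function name and the example scores are hypothetical, and the released algorithm may handle preprocessing and score aggregation differently (Appendix E1 [online]).

```python
from statistics import mean

def ensemble_risk(scores_2d, scores_3d):
    """Average the per-network scores (each in [0, 1]) into one risk score."""
    scores = list(scores_2d) + list(scores_3d)
    if not all(0.0 <= s <= 1.0 for s in scores):
        raise ValueError("each network score is expected to lie in [0, 1]")
    return mean(scores)

# Hypothetical outputs of 10 two-dimensional and 10 three-dimensional CNNs for one nodule.
scores_2d = [0.80, 0.82, 0.78, 0.81, 0.79, 0.83, 0.77, 0.80, 0.82, 0.78]
scores_3d = [0.90, 0.88, 0.92, 0.89, 0.91, 0.87, 0.93, 0.90, 0.88, 0.92]

risk = ensemble_risk(scores_2d, scores_3d)
```

Because both architectures contribute the same number of networks, this equal-weight mean is identical to averaging the two per-architecture means.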
The algorithm was externally validated in the DLCST cohorts. To compare with a clinically established nodule risk calculator, the performance of PanCan model 2b (7) was assessed (Appendix E1 [online]).
A group of 11 clinicians independently assessed the malignancy risk of all the nodules in the cancer-enriched cohorts. This was reported in a previous observer study (20). The group included four thoracic radiologists (with > 10 years of experience in reading chest CT images and/or intense research training with respect to interpreting nodules detected at screening CT), five radiology residents (4th- or 5th-year residents including M.M.W.W., who was a resident), and two pulmonologists (with > 25 years and > 10 years of experience as pulmonologists) from eight institutions. The nodules were presented to the clinicians in random order and each clinician was instructed to predict the malignancy risk on a scale from 0 to 100. All clinicians assessed all nodules in both cancer-enriched cohorts.
Discrimination performance was assessed by using multireader multicase receiver operating characteristic analysis with the publicly available iMRMC software (version 4.0.3, 2019; U.S. Food and Drug Administration) (24). The area under the receiver operating characteristic curve (AUC) for human readers was obtained through a diagonal average. Diagonal averaging was performed by averaging the individual curves in the direction of x = −y at every point in the x = y diagonal line, where x = 1 − specificity and y = sensitivity (24).
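Diagonal averaging can be illustrated with the following Python sketch, which is not the iMRMC implementation. Each curve is rotated into a frame in which u runs along the x = y line and v is the offset from that line in the x = −y direction; the offsets are then averaged at fixed u and rotated back. The function names, the number of sampling steps, and the two example curves are assumptions made for this example.

```python
import math

SQRT2 = math.sqrt(2.0)

def to_diagonal_frame(curve):
    """Rotate ROC points (x, y) so that u runs along the x = y line and v is the offset from it."""
    return [((x + y) / SQRT2, (y - x) / SQRT2) for x, y in curve]

def interpolate(points, u):
    """Linearly interpolate v at position u; points must be ordered by increasing u."""
    if u <= points[0][0]:
        return points[0][1]
    if u >= points[-1][0]:
        return points[-1][1]
    for (u0, v0), (u1, v1) in zip(points, points[1:]):
        if u0 <= u <= u1:
            if u1 == u0:
                return v0
            return v0 + (v1 - v0) * (u - u0) / (u1 - u0)
    raise ValueError("u lies outside the curve")

def diagonal_average(curves, n_steps=200):
    """Average several ROC curves along the x = -y direction at points on the x = y line."""
    rotated = [to_diagonal_frame(c) for c in curves]
    averaged = []
    for i in range(n_steps + 1):
        u = SQRT2 * i / n_steps
        v = sum(interpolate(r, u) for r in rotated) / len(rotated)
        # Rotate back to (1 - specificity, sensitivity).
        averaged.append(((u - v) / SQRT2, (u + v) / SQRT2))
    return averaged

def trapezoid_auc(curve):
    """Area under a piecewise-linear ROC curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(curve, curve[1:]))

# Two hypothetical reader curves, each with a single operating point.
reader_a = [(0.0, 0.0), (0.1, 0.8), (1.0, 1.0)]
reader_b = [(0.0, 0.0), (0.3, 0.6), (1.0, 1.0)]
average_curve = diagonal_average([reader_a, reader_b])
average_auc = trapezoid_auc(average_curve)
```

In the rotated frame the AUC is 0.5 plus the integral of v over u, so averaging the offsets also averages the areas: the AUC of the averaged curve matches the mean of the individual reader AUCs.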
Multireader multicase data were computed following U statistics to provide unbiased estimates of the variance components. For variance analysis, the algorithm was set as the first modality and the clinicians or PanCan model were set as the second modality. The total variance was decomposed into eight moments from first principles (proposed by Gallas), or BDG moments; the decomposition treated benign nodules separately from malignant nodules so that the total variance was generalizable to new readers, new benign nodules, and new malignant nodules. P values were computed from a t test with eight degrees of freedom (the number of BDG moments). P values less than .05 indicated a statistically significant difference.
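The connection between the AUC and U statistics can be made concrete with a small Python sketch. For a single reader, the AUC equals the fraction of (malignant, benign) nodule pairs in which the malignant nodule receives the higher score, with ties counted as one half. The function name and scores below are hypothetical; the variance decomposition performed by iMRMC is not reproduced here.

```python
def auc_u_statistic(malignant_scores, benign_scores):
    """AUC as a two-sample U statistic: fraction of correctly ordered pairs, ties count 0.5."""
    wins = 0.0
    for m in malignant_scores:
        for b in benign_scores:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant_scores) * len(benign_scores))

# Hypothetical risk scores for three malignant and four benign nodules.
malignant_scores = [0.9, 0.8, 0.4]
benign_scores = [0.7, 0.3, 0.2, 0.1]

auc = auc_u_statistic(malignant_scores, benign_scores)
```

In this toy example, 11 of the 12 pairs are ordered correctly; the only misordered pair is the malignant nodule scored 0.4 against the benign nodule scored 0.7.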
Summary of Data Sets
NLST cohort.— From the 26 722 participants who underwent three rounds of annual CT screening in the NLST, chest CT data in 9421 participants were made available (Appendix E1 [online]). This included 1043 participants diagnosed with 1099 lung cancers during the study period; 390 of those 1099 cancers (35.5%) were excluded because they could not be retrospectively located with high certainty by the radiologist (E.T.S.). In total, 720 malignant nodules for 709 lung cancers in 686 participants remained. For these 720 malignant nodules, there were 1249 annotations available on a total of 1199 CT images across all screening rounds. In the 8378 participants who were not diagnosed with lung cancer, 4596 participants had 9304 nodules with 14 828 nodule annotations on 8984 CT images across all screening rounds. Thus, the NLST cohort that was used for the development of the DL algorithm had a total of 1249 malignant nodules and 14 828 benign nodules (Fig 1, Table 1).
Full DLCST cohort.— From the 2052 participants in the CT screening arm, 96 participants were diagnosed with 100 lung cancers within the study period. We excluded 32 cancers that were diagnosed after the screening period had ended. Additionally, five cancers could not be retrospectively identified with high certainty by the screening radiologists. From the remaining 59 participants with 63 lung cancers, the first occurrences of the malignant nodules were selected, resulting in 65 malignant nodules from 62 CT examinations. In the 1956 participants without a lung cancer diagnosis, 540 participants had 818 nodules in their baseline CT images; these nodules were included. Thus, the full DLCST cohort consisted of 65 malignant and 818 benign nodules.
Cancer-enriched DLCST subsets.— For the observer study, the first cancer-enriched cohort (subset A) included 59 malignant and 118 random benign nodules. After the clinicians scored the nodules, two benign nodules were found to be from participants diagnosed with lung cancer after the screening period and were excluded, leaving 116 random benign nodules for analysis. The second cancer-enriched cohort (subset B) included the same 59 malignant nodules and 118 size-matched benign nodules (Fig 1, Table 2).
By using 10-fold cross validation, the DL algorithm achieved an AUC of 0.91 (95% CI: 0.90, 0.92) in the NLST cohort. The AUC of the PanCan model in the NLST cohort was 0.84 (95% CI: 0.83, 0.85; P < .001). At a specificity of 90%, the sensitivities were 891 of 1249 (71%) and 626 of 1249 (50%) for the DL algorithm and the PanCan model, respectively.
External validation of the algorithm in the full DLCST cohort resulted in an AUC of 0.93 (95% CI: 0.89, 0.96), significantly outperforming the PanCan model with an AUC of 0.90 (95% CI: 0.86, 0.93; P = .046) (Fig 3). At a specificity of 90%, the sensitivities were 54 of 65 (84%) and 41 of 65 (63%) for the DL algorithm and the PanCan model, respectively.
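Reading off sensitivity at a fixed specificity can be sketched as follows. The sketch places the threshold at the lowest benign score that keeps at least 90% of benign nodules at or below it and then counts the malignant nodules above that threshold. How the operating point was selected in the study is not stated here, so this thresholding rule, the function name, and the toy scores are assumptions for illustration only.

```python
import math

def sensitivity_at_specificity(malignant_scores, benign_scores, target_specificity=0.90):
    """Return (threshold, specificity, sensitivity) for the lowest threshold reaching the target."""
    ranked = sorted(benign_scores)
    # Smallest number of benign nodules that must score at or below the threshold.
    n_below = max(1, math.ceil(target_specificity * len(ranked) - 1e-9))
    threshold = ranked[n_below - 1]
    # A nodule is called positive when its score exceeds the threshold.
    specificity = sum(s <= threshold for s in ranked) / len(ranked)
    sensitivity = sum(s > threshold for s in malignant_scores) / len(malignant_scores)
    return threshold, specificity, sensitivity

# Hypothetical risk scores for ten benign and four malignant nodules.
benign_scores = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
malignant_scores = [0.05, 0.095, 0.50, 0.80]

threshold, specificity, sensitivity = sensitivity_at_specificity(malignant_scores, benign_scores)
```

Applied to the full DLCST cohort, the same kind of calculation yields the reported counts, for example 54 of 65 malignant nodules above the threshold for the DL algorithm.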
The DL algorithm had an AUC of 0.96 (95% CI: 0.93, 0.99) in subset A, which was significantly better than the average AUC of the clinicians (AUC, 0.90; 95% CI: 0.87, 0.94; P = .01) and comparable to that of the PanCan model (AUC, 0.94; 95% CI: 0.91, 0.97; P = .32). The discriminatory performance of the DL algorithm was not significantly different from that of the four thoracic radiologists (AUC, 0.90; 95% CI: 0.81, 0.98; P = .11), the five radiology residents (AUC, 0.92; 95% CI: 0.89, 0.95; P = .051), and the two pulmonologists (AUC, 0.88; 95% CI: 0.79, 0.97; P = .07). The algorithm had the highest AUC and the highest sensitivity (54 of 59; 91%) at a specificity of 90% among the PanCan model and the 11 clinicians (Table 3).
In the size-matched subset B, the DL algorithm had an AUC of 0.86 (95% CI: 0.80, 0.91) (Fig 4). The performance of the DL algorithm was significantly better than that of the PanCan model (AUC, 0.75; 95% CI: 0.67, 0.82; P < .001) but was comparable to that of the clinicians (average AUC, 0.82; 95% CI: 0.77, 0.86; P = .12). This performance did not differ from that of the thoracic radiologists (AUC, 0.82; 95% CI: 0.74, 0.89; P = .26), radiology residents (AUC, 0.81; 95% CI: 0.76, 0.87; P = .13), or pulmonologists (AUC, 0.82; 95% CI: 0.77, 0.88; P = .26). The algorithm had a higher AUC than 10 of 11 clinicians and a higher sensitivity (32 of 59 [54%] at a specificity of 90%) than eight of the 11 clinicians (Table 3). Only one thoracic radiologist had a higher AUC (0.89; 95% CI: 0.84, 0.94; P = .32), with a sensitivity of 39 of 59 (67%) at a specificity of 90%.
Accurate estimation of the malignancy risk of pulmonary nodules detected at screening CT is crucial for optimizing management in lung cancer screening and remains a challenging task for radiologists (9). In this study, we developed a deep learning algorithm for malignancy risk estimation of pulmonary nodules by using low-dose CT examinations from the National Lung Screening Trial (1) and externally validated it in the Danish Lung Cancer Screening Trial (DLCST) (18). The reference standard was based on histopathologic confirmation or follow-up with CT for at least 2 years. The algorithm was based on two-dimensional and three-dimensional convolutional neural networks with information from a single CT examination. In the full DLCST cohort, the algorithm had an area under the receiver operating characteristic curve (AUC) of 0.93 and significantly outperformed the Pan-Canadian Early Detection of Lung Cancer (PanCan) model (AUC, 0.90; P = .046), a renowned multivariable risk model recommended by several nodule management guidelines (4,5,7). In cancer-enriched subsets with random benign and size-matched benign nodules, the algorithm had AUCs of 0.96 and 0.86 and performed comparably to thoracic radiologists (AUC, 0.90 and 0.82, respectively; P = .11 and .26, respectively). Compared with the PanCan model, the algorithm performed significantly better only in the size-matched cancer-enriched subset (PanCan AUC, 0.75; P < .001) and not in the subset with random benign nodules (PanCan AUC, 0.94; P = .32). This suggests that although nodule size remains a strong predictor for malignancy, the algorithm relies more on imaging characteristics for its discriminative power than does the PanCan model (25).
We made the algorithm freely accessible to the public for research purposes (https://grand-challenge.org/algorithms/pulmonary-nodule-malignancy-prediction) along with a Model Facts label (26) containing the most important information for end users (Appendix E1 [online]).
Ardila et al (15) developed a DL algorithm for patient-level lung cancer prediction that performed favorably compared with six radiologists (specializations not reported) who were tasked with providing Lung-RADS scores in an internal cohort from the NLST. The comparison with Lung-RADS has its limitations because Lung-RADS provides not a direct malignancy risk estimation but a follow-up recommendation instead. In addition, the algorithm does not provide risk scores for individual nodules, making it challenging for integration with well-established clinical guidelines that rely on the characteristics and biologic behavior of nodules (16). Massion et al (17) validated a DL algorithm for nodule malignancy prediction, which was developed by using NLST nodules, in two external cohorts of incidental pulmonary nodules from clinical practice. However, no comparisons with clinical experts were performed; therefore, the performance compared with thoracic radiologists is, to our knowledge, currently unknown.
Our study had several limitations. First, the developed algorithm used only one CT image and did not consider a previous CT image, if available. This algorithm is therefore highly suitable for nodules first observed at screening, similar to the PanCan model; however, for nodules detected at incidence screening, their growth and appearance on the previous CT image are important. Second, we did not calibrate the risk scores for a certain target population so that, for example, a 10% risk score implied that there was a 10% probability of lung cancer diagnosis within the next 5 years (27,28). Third, we did not assess how the algorithm would affect the radiologists’ assessment (29). Radiologists can upgrade highly suspicious nodules to the Lung-RADS 4X category, recommending clinical workup of the nodule (30). The developed algorithm may help to make this decision more rational and thereby reduce the substantial interobserver variability (31). Finally, 35.5% of lung cancers from NLST were excluded because they could not be retrospectively located with high certainty on the CT images. The lack of exact nodule coordinates from NLST made it challenging to retrospectively identify these malignant nodules and resulted in a selection bias in our development cohort. However, the exact nodule coordinates for all the lung cancers from DLCST were available to us, and therefore this selection bias was not present in our external validation cohorts.
We foresee a demand for trained human observers, aided by reliable artificial intelligence systems, that act as first readers of chest CT when lung cancer screening programs are introduced worldwide (29,32). Future work could incorporate clinical parameters such as age, sex, smoking history, follow-up CT data, and imaging features related to chronic obstructive pulmonary diseases (33,34) and cardiovascular disease (35). Prospective validation with newer multicenter data sets is also needed.
In conclusion, we successfully developed a deep learning algorithm for malignancy risk estimation of pulmonary nodules detected at low-dose screening CT that was generalizable across screening populations and protocols. The algorithm had a discriminative performance comparable with that of clinical experts. This deep learning algorithm may aid radiologists in optimizing follow-up recommendations for participants undergoing lung cancer screening and may lead to fewer unnecessary diagnostic interventions. In addition, this could serve to lower radiologists’ workload and reduce the costs of lung cancer screening.
Disclosures of Conflicts of Interest: K.V.V. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed money to author for employment from Predible Health; money to author for stock/stock options from Predible Health. Other relationships: disclosed no relevant relationships. A.A.A.S. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed money paid to author for employment from Siemens Healthineers; disclosed money to author’s institution for patents from Siemens Healthineers; disclosed money to author for stock/stock options from Siemens Healthineers; disclosed money to author’s institution for pending patents from Siemens Healthineers. Other relationships: disclosed no relevant relationships. A.S. disclosed no relevant relationships. E.T.S. disclosed no relevant relationships. K.C. disclosed no relevant relationships. M.M.W.W. disclosed no relevant relationships. Z.S. disclosed no relevant relationships. B.v.G. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed money to author for grants/grants pending from the Botnar Foundation; disclosed money to author’s institution for royalties from MeVis Medical Solutions, Thirona, and Delft Imaging; disclosed money to author for stock/stock options from Thirona. Other relationships: disclosed no relevant relationships. M.P. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed money to author’s institution for grants/grants pending from Canon Medical Systems, Siemens Healthineers; disclosed payment for speakers bureau from Canon Medical Systems, Siemens Healthineers; disclosed patents from Canon Medical Systems; disclosed royalties from Mevis Medical Solutions; disclosed travel cost paid to author from Canon Medical Systems, Siemens Healthineers. Other relationships: disclosed no relevant relationships. C.J. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: disclosed money to author’s institution for royalties from MeVis Medical Solutions. Other relationships: disclosed no relevant relationships.
Author contributions: Guarantors of integrity of entire study, K.V.V., A.A.A.S., C.J.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, K.V.V., A.A.A.S., A.S., Z.S., B.v.G., C.J.; clinical studies, K.C., M.M.W.W., Z.S., B.v.G.; experimental studies, K.V.V., A.A.A.S., B.v.G., M.P., C.J.; statistical analysis, K.V.V., Z.S., C.J.; and manuscript editing, all authors.
Study supported by a research grant from MeVis Medical Solutions; A.A.A.S. supported by the Netherlands Organization for Scientific Research (639.023.207).
- 1. National Lung Screening Trial Research Team. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365(5):395–409.
- 2. Reduced Lung-Cancer Mortality with Volume CT Screening in a Randomized Trial. N Engl J Med 2020;382(6):503–513.
- 3. Cancer statistics, 2020. CA Cancer J Clin 2020;70(1):7–30.
- 4. American College of Radiology Committee on Lung-RADS. Lung-RADS Assessment Categories version 1.1. https://www.acr.org/-/media/ACR/Files/RADS/Lung-RADS/LungRADSAssessmentCategoriesv1-1.pdf. Accessed October 30, 2020.
- 5. British Thoracic Society guidelines for the investigation and management of pulmonary nodules. Thorax 2015;70(Suppl 2):ii1–ii5. [Published correction appears in Thorax 2015;70(12):1188.]
- 6. Guidelines for Management of Incidental Pulmonary Nodules Detected on CT Images: From the Fleischner Society 2017. Radiology 2017;284(1):228–243.
- 7. Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med 2013;369(10):910–919.
- 8. Evaluation of prediction models for identifying malignancy in pulmonary nodules detected via low-dose computed tomography. JAMA Netw Open 2020;3(2):e1921221.
- 9. Protocol and Rationale for the International Lung Screening Trial. Ann Am Thorac Soc 2020;17(4):503–512.
- 10. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88.
- 11. Multi-crop convolutional neural networks for lung nodule malignancy suspiciousness classification. Pattern Recognit 2017;61:663–673.
- 12. Highly accurate model for prediction of lung nodule malignancy with CT scans. Sci Rep 2018;8(1):9286.
- 13. An interpretable deep hierarchical semantic convolutional neural network for lung nodule malignancy classification. Expert Syst Appl 2019;128:84–95.
- 14. The Lung Image Database Consortium (LIDC) and Image Database Resource Initiative (IDRI): a completed reference database of lung nodules on CT scans. Med Phys 2011;38(2):915–931.
- 15. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med 2019;25(6):954–961. [Published correction appears in Nat Med 2019;25(8):1319.]
- 16. Google’s lung cancer AI: a promising tool that needs further validation. Nat Rev Clin Oncol 2019;16(9):532–533.
- 17. Assessing the Accuracy of a Deep Learning Method to Risk Stratify Indeterminate Pulmonary Nodules. Am J Respir Crit Care Med 2020;202(2):241–249.
- 18. Results of the Randomized Danish Lung Cancer Screening Trial with Focus on High-Risk Profiling. Am J Respir Crit Care Med 2016;193(5):542–551.
- 19. Malignancy risk estimation of screen-detected nodules at baseline CT: comparison of the PanCan model, Lung-RADS and NCCN guidelines. Eur Radiol 2017;27(10):4019–4029.
- 20. Malignancy risk estimation of pulmonary nodules in screening CTs: Comparison between a computer model and human observers. PLoS One 2017;12(11):e0185032.
- 21. Pulmonary nodule detection in CT images: false positive reduction using multi-view convolutional networks. IEEE Trans Med Imaging 2016;35(5):1160–1169. doi:10.1109/TMI.2016.2536809.
- 22. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016. Piscataway, NJ: IEEE, 2016.
- 23. Automated Assessment of COVID-19 Reporting and Data System and Chest CT Severity Scores in Patients Suspected of Having COVID-19 Using Artificial Intelligence. Radiology 2021;298(1):E18–E28.
- 24. iMRMC-java v4.0.3: Application for Analyzing and Sizing MRMC Reader Studies. Silver Spring, MD; 2017. https://github.com/DIDSR/iMRMC/releases. Accessed April 1, 2020.
- 25. Brock malignancy risk calculator for pulmonary nodules: validation outside a lung cancer screening population. Thorax 2018;73(9):857–863.
- 26. Presenting machine learning model information to clinical end users with model facts labels. NPJ Digit Med 2020;3(1):41.
- 27. Lung cancer risk to personalise annual and biennial follow-up computed tomography screening. Thorax 2018;73(7):626–633.
- 28. Lung cancer incidence and mortality in National Lung Screening Trial participants who underwent low-dose CT prevalence screening: a retrospective cohort analysis of a randomised, multicentre, diagnostic screening trial. Lancet Oncol 2016;17(5):590–599.
- 29. Artificial intelligence for detection and characterization of pulmonary nodules in lung cancer CT screening: ready for practice? Translational Lung Cancer Research. https://tlcr.amegroups.com/article/view/41564. Published 2020. Accessed April 22, 2021.
- 30. Lung-RADS Category 4X: Does It Improve Prediction of Malignancy in Subsolid Nodules? Radiology 2017;284(1):264–271.
- 31. National lung screening trial: variability in nodule detection rates in chest CT studies. Radiology 2013;268(3):865–873.
- 32. Computer Vision Tool and Technician as First Reader of Lung Cancer Screening CT Scans. J Thorac Oncol 2016;11(5):709–717.
- 33. Predicting all-cause and lung cancer mortality using emphysema score progression rate between baseline and follow-up chest CT images: A comparison of risk model performances. PLoS One 2019;14(2):e0212756.
- 34. Normalized emphysema scores on low dose CT: Validation as an imaging biomarker for mortality. PLoS One 2017;12(12):e0188902.
- 35. Automatic calcium scoring in low-dose chest CT using deep neural networks with dilated convolutions. IEEE Trans Med Imaging 2018;37(2):615–625.
- 36. Adam: A Method for Stochastic Optimization. arXiv:1412.6980. https://arxiv.org/abs/1412.6980. Published December 22, 2014. Accessed April 22, 2021.
- 37. ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis 2015;115:211–252. doi:10.1007/s11263-015-0816-y.
- 38. Going Deeper with Convolutions. arXiv:1409.4842v1. https://arxiv.org/abs/1409.4842v1. Published September 17, 2014. Accessed April 22, 2021.
- 39. Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, July 21–26, 2017. Piscataway, NJ: IEEE, 2017.
- 40. The Kinetics Human Action Video Dataset. arXiv:1705.06950. https://arxiv.org/abs/1705.06950. Published May 19, 2017. Accessed November 27, 2020.
- 41. Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA 2016;316(22):2402–2410.
- 42. Morphological segmentation and partial volume analysis for volumetry of solid pulmonary lesions in thoracic CT scans. IEEE Trans Med Imaging 2006;25(4):417–434.
Article History
Received: Dec 3 2020
Revision requested: Dec 29 2020
Revision received: Mar 8 2021
Accepted: Mar 26 2021
Published online: May 18 2021
Published in print: Aug 2021