Differentiation of Benign from Malignant Pulmonary Nodules by Using a Convolutional Neural Network to Determine Volume Change at Chest CT
Abstract
Background
Deep learning may help to improve computer-aided detection of volume (CADv) measurement of pulmonary nodules at chest CT.
Purpose
To determine the efficacy of a deep learning method for improving CADv for measuring the solid and ground-glass opacity (GGO) volumes of a nodule, doubling time (DT), and the change in volume at chest CT.
Materials and Methods
From January 2014 to December 2016, patients with pulmonary nodules at CT were retrospectively reviewed. CADv without and with a convolutional neural network (CNN) automatically determined total nodule volume change per day and DT. Areas under the curve (AUCs) on a per-nodule basis and diagnostic accuracy on a per-patient basis were compared among all indexes from CADv with and without CNN for differentiating benign from malignant nodules.
Results
The CNN training set was 294 nodules in 217 patients, the validation set was 41 nodules in 32 patients, and the test set was 290 nodules in 170 patients (188 patients enrolled, 18 excluded). The 290 test nodules (mean size ± standard deviation, 11 mm ± 5; range, 4–29 mm) comprised 132 malignant nodules and 158 benign nodules. There were 132 solid nodules (46%), 106 part-solid nodules (36%), and 52 ground-glass nodules (18%). The test set results showed that the diagnostic performance of CADv with CNN for total nodule volume change per day was higher than that of DT of CADv with CNN (AUC, 0.94 [95% confidence interval {CI}: 0.90, 0.96] vs 0.67 [95% CI: 0.60, 0.74]; P < .001) and that of CADv without CNN (total nodule volume change per day: AUC, 0.69 [95% CI: 0.62, 0.75]; P < .001; DT: AUC, 0.58 [95% CI: 0.51, 0.65]; P < .001). The accuracy of total nodule volume change per day of CADv with CNN was significantly higher than that of CADv without CNN (P < .001) and DT of both methods (P < .001).
Conclusion
Convolutional neural network is useful for improving accuracy of computer-aided detection of volume measurement and nodule differentiation capability at CT for patients with pulmonary nodules.
© RSNA, 2020
Summary
A convolutional neural network combined with computer-aided detection of pulmonary nodule volume at CT showed that nodule volume change per day was a better predictor of malignancy than was nodule volume doubling time.
Key Results
■ The diagnostic performance of a convolutional neural network (CNN) with computer-aided detection of volume (CADv) for determining the presence of nodule malignancy was higher for total nodule volume change per day than for doubling time (area under the curve, 0.94 vs 0.67, respectively; P < .001).
■ On a per-patient basis, total nodule volume change per day incorporating CNN analysis more accurately differentiated malignant from benign nodules than did nodule volume doubling time (P < .001).
Introduction
Low-dose CT for lung cancer screening was shown to be a cost-effective method for reducing overall mortality by the National Lung Screening Trial (1). Several trials performed since the publication of the National Lung Screening Trial results have confirmed the utility of this method for early diagnosis, with the potential for cure of early-stage lung cancer in an at-risk population (2–5). In addition, the recommendations by the Fleischner Society for incidentally detected pulmonary nodules and the introduction of the American College of Radiology’s Lung Imaging Reporting and Data System reporting rubric both use total nodule size for nodule management (6–8). On the other hand, other investigators including the Dutch-Belgian Randomized Lung Cancer Screening Trial have suggested that total nodule volume measurement and/or total nodule doubling time (DT) assessed with computer-aided detection of volume (CADv) software are also useful for nodule management (9–12). In addition, several studies have shown that the maximal diameter of the solid component within a ground-glass nodule correlates more accurately with tumor invasiveness and prognosis than does total size (the maximal diameter of the solid component size and the ground-glass opacity [GGO] component size) (13–15). Therefore, measurement of the solid component size within a pulmonary nodule at CT has been adopted as the clinical T factor in the eighth version of the Union for International Cancer Control TNM classification for subsolid lung cancer (16,17).
Because of the importance of determining total nodule size, we developed a number of CADv systems that were evaluated in the Quantitative Imaging Biomarkers Alliance, or QIBA, and Japan QIBA studies (18,19). This software is now clinically available; it automatically differentiates solid from the nonsolid components of a pulmonary nodule and calculates its volume as well as maximum and minimum in-plane dimensions. It has further been suggested that artificial intelligence including convolutional neural network (CNN) is useful for nodule detection, characterization, and differentiation at CT (20–25). However, to our knowledge, no studies have been reported on the utility of CNN for automatic measurement of nodule components by using CADv or have demonstrated its utility for nodule volumetric assessment in the treatment of patients with pulmonary nodules.
The purpose of this study was to determine the utility of CNN for improving the accuracy of CADv versus CADv without CNN for measuring total nodule volume, total nodule volume DT, and change in total nodule volume per day at chest CT for the prediction of the likelihood of lung malignancy.
Materials and Methods
Protocol, Support, and Funding
The training cases in this study were retrospectively obtained and all test cases were gathered as a prospective study, which was approved by the institutional review board of Kobe University Hospital (Kobe, Japan). Written informed consent was obtained from each patient. This study was financially and technically supported by Canon Medical Systems, Smoking Research Foundation, and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (no. 18K07675). Two of the authors are employees of Canon Medical Systems (K.A.) and Toshiba (A.Y.) but did not have control over any of the data used in this study.
Patients
Training set.—A total of 270 cases were retrospectively collected from our institution between January 2007 and March 2013 (Fig 1). The inclusion criteria for test cases were as follows: nodule equal to or greater than 4 mm and less than 30 mm, no visualization of calcification at standard-dose CT with 1-mm section thickness, and nodule confirmed with pathologic examination or with follow-up CT examinations over more than 2 years. Fifty-three cases were excluded due to insufficient follow-up CT studies within the 2-year follow-up period (n = 41) or because the patient did not visit our hospital after the initial CT examination (n = 12). Thus, 217 training cases were finally included in this study.
Validation set.—Fifty-three cases were retrospectively collected from our institution between April 2013 and December 2013 (Fig 1). The same inclusion criteria that were applied for selecting test cases were applied for selecting the validation set. Twenty-one cases were excluded due to insufficient follow-up CT studies within the 2-year follow-up period (n = 14) or because the patient did not visit our hospital after the initial CT examination (n = 7). Finally, 32 cases were included as the validation set in this study.
Test set.—From January 2014 to December 2016, a total of 188 patients suspected of having pulmonary nodules who met the same inclusion and exclusion criteria, and who prospectively underwent unenhanced initial and follow-up CT examinations with 320–detector row CT and 64–detector row CT systems at our institution, were originally included in this study. Eighteen were excluded because CT revealed no evidence of pulmonary nodule or masses (n = 9) and for multiple lung metastases due to colon cancer (n = 3), breast cancer (n = 2), laryngeal cancer (n = 2), renal cell carcinoma (n = 1), and rectal cancer (n = 1) (Fig 1). Finally, 170 patients consisting of 95 men (mean age ± standard deviation, 68 years ± 9; age range, 47–83 years) and 75 women (mean age, 68 years ± 10; age range, 43–87 years) were selected as test cases.
Development of CADv by Using CNN
Figure 2 shows a flowchart of the proposed method (CADv with CNN). For input, our method requires a chest CT volume with a lung mask to identify lung regions in the volume, and a seed point indicating the center of a target nodule. The lung mask is automatically generated in advance through a segmentation method. The seed point can be automatically provided by CAD systems, but it was manually entered in our study. First, a volume of interest centered at the seed point is clipped with a cube, and its spacing is converted to isotropic at 0.6 mm. Because we intended to evaluate nodules with a diameter of 30 mm or less, we set one side of the volume of interest as 64 voxels, resulting in a volume of interest of 38.4 mm on each side, which was sufficiently large to cover each nodule. Next, preprocessing, a three-dimensional fully convolutional network, and a conditional random field were sequentially applied to the volume of interest, and the resulting nodule mask was then applied to the original image. The details of these procedures are described in Appendix E1 (online).
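The volume-of-interest step described above can be illustrated with a minimal sketch. It assumes a CT volume stored as a NumPy array indexed (z, y, x) and uses nearest-neighbor sampling with edge clamping; the interpolation scheme, the function name `extract_voi`, and the synthetic input are assumptions for illustration and are not taken from Appendix E1. Only the 64-voxel side length and the 0.6-mm isotropic spacing come from the text.

```python
import numpy as np

VOI_SIDE_VOXELS = 64   # cube side length in voxels (from the text)
ISO_SPACING_MM = 0.6   # isotropic target spacing in mm (from the text)

def extract_voi(volume, spacing_mm, seed_index):
    """Resample a cube centered at seed_index to 0.6-mm isotropic voxels.

    volume: 3D array indexed (z, y, x)
    spacing_mm: source voxel size along each axis, in mm
    seed_index: (z, y, x) voxel index of the nodule center
    Nearest-neighbor sampling and edge clamping are illustrative assumptions.
    """
    # Physical offsets (mm) of the 64 output voxel centers relative to the seed.
    offsets_mm = (np.arange(VOI_SIDE_VOXELS) - (VOI_SIDE_VOXELS - 1) / 2) * ISO_SPACING_MM
    axes = []
    for axis in range(3):
        # Convert physical offsets to source voxel indices along this axis.
        idx = np.rint(seed_index[axis] + offsets_mm / spacing_mm[axis]).astype(int)
        # Clamp to the volume bounds so nodules near the edge still yield a full cube.
        axes.append(np.clip(idx, 0, volume.shape[axis] - 1))
    return volume[np.ix_(axes[0], axes[1], axes[2])]

# Synthetic stand-in for a chest CT volume (not patient data).
ct = np.random.default_rng(0).normal(size=(40, 128, 128))
voi = extract_voi(ct, spacing_mm=(1.0, 0.7, 0.7), seed_index=(20, 64, 64))
side_mm = VOI_SIDE_VOXELS * ISO_SPACING_MM  # physical side length of the cube
```

With 64 voxels at 0.6 mm, the cube spans 38.4 mm per side, which leaves a margin around a nodule of up to 30 mm in diameter.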
Image Analysis
All measurements by means of CADv with and without CNN were performed by a radiologist (Y.O., with 23 years of experience) by using a commercially available workstation (Vitrea; Vital Images, Minnetonka, Minn). In this study, CADv without CNN was assessed by means of a workstation (Vitrea; Vital Images) and evaluated with automatic three-dimensional volume software (CT Lung Nodule Analysis; Vital Images). Details of this software have been described elsewhere (19). In addition, CADv with CNN was also evaluated on the same workstation. However, the CADv with CNN used proprietary software (Proto-CADv with CNN, version 3.2) provided by Canon Medical Systems and was installed on the same workstation provided by Vitrea.
Evaluation of accuracy of nodule component volume measurement.—To evaluate the accuracy of volume measurements of the various individual components (ie, solid, cavitary, and GGO) within each pulmonary nodule, we performed a subset analysis in 69 pulmonary nodules consisting of 38 part-solid nodules, 17 solid nodules, and 14 ground-glass nodules, which were randomly selected from 290 nodules.
For these 69 nodules, the solid, GGO, and cavity component volumes were manually segmented by three board-certified radiologists (S.S., Y.U., and Y.K., with 11 years, 12 years, and 9 years of experience, respectively). Then, the standard references for solid, GGO, and total nodule volumes of each nodule were determined with the simultaneous truth and performance level estimation, or STAPLE, method (28–31). The STAPLE method generates a probabilistic estimate of the true segmentation from multiple segmentations by computing an optimal combination of the segmentations according to the estimated performance level of each. Note that the STAPLE method is independent of the process of our CADv. Representative cases for generating the reference standard by using STAPLE are shown in Figure 3 and Figures E1 and E2 (online). Finally, we computed the volumes of the solid, GGO, and cavity components and of the total nodule as the product of the per-voxel volume and the number of voxels for each component in the true segmentation. In this study, total nodule volume was determined from the solid and GGO component volumes. In addition, each volume was assessed twice by using CADv with and without CNN. The final value for each CADv was determined as the mean of the first and second measurements.
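The volume computation described above (voxel count multiplied by per-voxel volume, with total nodule volume taken as solid plus GGO) can be sketched as follows. The integer label codes and the function names are hypothetical conventions chosen for this example; they are not the encoding used by the study software.

```python
import numpy as np

# Hypothetical label convention for a reference segmentation (assumption).
BACKGROUND, SOLID, GGO, CAVITY = 0, 1, 2, 3

def component_volumes_mm3(labels, spacing_mm):
    """Volume of each component = number of voxels x per-voxel volume."""
    voxel_mm3 = float(np.prod(spacing_mm))
    volumes = {
        name: float(np.count_nonzero(labels == code)) * voxel_mm3
        for name, code in (("solid", SOLID), ("ggo", GGO), ("cavity", CAVITY))
    }
    # Total nodule volume is determined from the solid and GGO components.
    volumes["total"] = volumes["solid"] + volumes["ggo"]
    return volumes

def mean_of_two(first, second):
    """Final CADv value: mean of the first and second measurements."""
    return (first + second) / 2.0

# Toy segmentation: 8 solid voxels and 8 GGO voxels at 0.6-mm isotropic spacing.
labels = np.zeros((10, 10, 10), dtype=np.uint8)
labels[2:4, 2:4, 2:4] = SOLID
labels[4:6, 2:4, 2:4] = GGO
vols = component_volumes_mm3(labels, (0.6, 0.6, 0.6))
```

At 0.6-mm isotropic spacing each voxel is 0.216 mm³, so the toy example yields 1.728 mm³ per component and 3.456 mm³ in total.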
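Assessment of nodule differentiation capability based on parameters derived from CADv.—To assess nodule differentiation capability based on parameters obtained with CADv with and without CNN, total nodule volume change per day and DT were calculated for each nodule from the initial and follow-up CT examinations by using the following equations, where V is the total nodule volume at the indicated examination and Δt is the time in days between the two examinations:

$$\text{Total nodule volume change per day} = \frac{V_{\text{follow-up CT}} - V_{\text{initial CT}}}{\Delta t} \tag{1}$$

$$\text{DT} = \frac{\Delta t \cdot \log 2}{\log\left(V_{\text{follow-up CT}} / V_{\text{initial CT}}\right)} \tag{2}$$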
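As a worked illustration, the two indexes can be computed from the three inputs named in this article (total nodule volume at initial CT, total nodule volume at follow-up CT, and the interval Δt). The sketch below assumes the conventional definitions, namely volume change per day = (V_follow-up − V_initial)/Δt and DT = Δt · ln 2 / ln(V_follow-up / V_initial); the function names and the handling of unchanged volumes are assumptions of this example.

```python
import math

def volume_change_per_day(v_initial_mm3, v_followup_mm3, dt_days):
    """Absolute total nodule volume change per day, in mm^3 per day."""
    if dt_days <= 0:
        raise ValueError("dt_days must be positive")
    return (v_followup_mm3 - v_initial_mm3) / dt_days

def doubling_time_days(v_initial_mm3, v_followup_mm3, dt_days):
    """Volume doubling time in days under an exponential growth assumption.

    Negative values indicate a shrinking nodule; an unchanged volume gives
    an infinite doubling time (a choice made for this sketch).
    """
    if v_initial_mm3 <= 0 or v_followup_mm3 <= 0 or dt_days <= 0:
        raise ValueError("volumes and dt_days must be positive")
    if v_followup_mm3 == v_initial_mm3:
        return math.inf
    return dt_days * math.log(2) / math.log(v_followup_mm3 / v_initial_mm3)

# Growing nodule: 500 -> 1000 mm^3 over 100 days.
growth_rate = volume_change_per_day(500.0, 1000.0, 100.0)
growth_dt = doubling_time_days(500.0, 1000.0, 100.0)

# Shrinking nodule (eg, resolving inflammation): 1000 -> 500 mm^3 over 100 days.
shrink_rate = volume_change_per_day(1000.0, 500.0, 100.0)
shrink_dt = doubling_time_days(1000.0, 500.0, 100.0)
```

The shrinking example shows why both indexes become negative for nodules that decrease in size, and why DT is unstable when the two volumes are nearly equal (the logarithm in the denominator approaches zero), whereas volume change per day remains well behaved.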
Statistical Analysis
Evaluation of volume measurement accuracy for each component within pulmonary nodules.—To assess the variability of each CADv in test cases, the Lin concordance correlation coefficient between the first and second measurements of GGO, solid, and total nodule volumes was assessed. In addition, the reproducibility coefficient of each volume measurement between the first and second measurements by using each CADv was also determined (32).
In 69 of 290 test nodules, Lin concordance correlation coefficients and multiple regression analyses were used to compare the standard reference volumes acquired with the STAPLE method against the solid component, GGO component, and total nodule volumes measured by means of CADv with and without CNN.
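For readers unfamiliar with the Lin concordance correlation coefficient, a minimal sketch follows. It uses the standard definition ρc = 2·s_xy / (s_x² + s_y² + (x̄ − ȳ)²) with population (1/n) moments; the function name and the toy data are assumptions of this example, and confidence intervals are not computed here.

```python
import numpy as np

def lin_ccc(x, y):
    """Lin concordance correlation coefficient between paired measurements."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_x, mean_y = x.mean(), y.mean()
    # Population covariance and variances (1/n normalization).
    cov_xy = np.mean((x - mean_x) * (y - mean_y))
    return float(2.0 * cov_xy / (x.var() + y.var() + (mean_x - mean_y) ** 2))

reference = [1.0, 2.0, 3.0, 4.0]
ccc_identical = lin_ccc(reference, reference)                  # perfect agreement
ccc_shifted = lin_ccc(reference, [v + 1.0 for v in reference])  # constant bias of +1
```

Unlike the Pearson coefficient, which equals 1 for both toy comparisons, ρc drops below 1 when one measurement is systematically offset from the other, which is why it suits comparisons against a reference volume.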
Assessment of differentiation capability of malignant from benign nodules based on parameters derived from CADv in test cases.—Wilcoxon signed-rank test was used for a comparison between each parameter of malignant and benign nodules in test cases.
To assess the capability of parameters acquired by means of CADv with and without CNN to differentiate malignant from benign nodules, receiver operating characteristic curve analyses with calculation of the area under the curve (AUC) were performed on a per-nodule basis for the test cases. The threshold value of each parameter obtained from CADv with and without CNN was then determined based on the Youden index (33). Finally, sensitivity, specificity, and accuracy of each parameter acquired with the two CADv systems were compared on a per-nodule basis as well as on a per-patient basis by means of the McNemar test in test cases. In the per-patient analysis, the largest nodule in each patient was assessed for calculation of sensitivity, specificity, and accuracy of each parameter obtained from CADv with and without CNN.
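The three statistical steps above (AUC, Youden-index threshold, and McNemar comparison) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the AUC is computed by the rank-based (Mann-Whitney) formulation, the decision rule is assumed to be "index ≥ threshold indicates malignancy", the McNemar test is the exact binomial variant (the article does not state which variant was used), and all data are toy values.

```python
import math
import numpy as np

def roc_auc(scores, labels):
    """AUC as P(malignant score > benign score); ties count one half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size))

def youden_threshold(scores, labels):
    """Cutoff t maximizing J = sensitivity + specificity - 1 for score >= t."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_t, best_j = None, -math.inf
    for t in np.unique(scores):
        predicted = scores >= t
        sensitivity = np.mean(predicted[labels])
        specificity = np.mean(~predicted[~labels])
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j

def mcnemar_exact_p(correct_a, correct_b):
    """Two-sided exact McNemar P value from paired correct/incorrect calls."""
    a = np.asarray(correct_a, dtype=bool)
    b = np.asarray(correct_b, dtype=bool)
    n01 = int(np.sum(a & ~b))   # method A correct, method B wrong
    n10 = int(np.sum(~a & b))   # method A wrong, method B correct
    n = n01 + n10
    if n == 0:
        return 1.0
    tail = sum(math.comb(n, i) for i in range(min(n01, n10) + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)

# Toy per-nodule index values (eg, mm^3 per day) and ground truth (1 = malignant).
scores = [0.1, 0.4, 0.35, 0.8, 1.5, 2.0]
truth = [0, 0, 1, 0, 1, 1]
auc = roc_auc(scores, truth)
cutoff, youden_j = youden_threshold(scores, truth)
p_value = mcnemar_exact_p([1, 1, 1, 1, 1, 1, 0, 1], [0, 0, 0, 0, 0, 1, 0, 1])
```

Because the McNemar test uses only discordant pairs, it is appropriate when the same nodules (or the same patients) are classified by both CADv systems.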
Results
Training Set
The 217 training cases consisted of 115 men (mean age, 67 years ± 10; age range, 48–85 years) and 102 women (mean age, 68 years ± 11; age range, 43–87 years) with 294 nodules (mean size, 11 mm ± 5; range, 4–29 mm): 137 solid nodules (mean, 9 mm ± 6; range, 4–29 mm), 103 part-solid nodules (mean, 11 mm ± 7; range, 4–29 mm), and 54 ground-glass nodules (mean, 7 mm ± 7; range, 4–29 mm). Of these, 168 nodules were malignant (164 lung cancers and four metastatic lung tumors) and 126 were benign.
Validation Set
The 32 validation cases consisted of 18 men (mean age, 67 years ± 10; age range, 51–82 years) and 14 women (mean age, 67 years ± 10; age range, 49–86 years) with 41 nodules (mean, 11 mm ± 9; range, 4–29 mm): 20 solid nodules (9 mm ± 112; range, 4–29 mm), 11 part-solid nodules (11 mm ± 12; range, 4–29 mm), and 10 ground-glass nodules (8 mm ± 13; range, 4–29 mm). Of these, 37 nodules were malignant (34 lung cancers and three metastatic lung tumors) and four were benign.
Test Set
A total of 188 patients suspected of having pulmonary nodules (101 men and 87 women; mean age, 69 years ± 10; age range, 43–87 years) who prospectively underwent unenhanced initial and follow-up CT examinations at our institution were originally included in this study. Then, 18 patients were excluded and 170 patients consisting of 95 men (mean age, 68 years ± 9; age range, 47–83 years) and 75 women (mean age, 68 years ± 10; age range, 43–87 years) were selected as test cases. The 290 nodules (mean size, 11 mm ± 5; range, 4–29 mm) from the 170 test cases consisted of 132 solid nodules (mean, 9 mm ± 6; range, 4–29 mm), 106 part-solid nodules (mean, 12 mm ± 6; range, 4–29 mm), and 52 ground-glass nodules (mean, 7 mm ± 6; range, 4–29 mm). Of these, 132 nodules were malignant (126 lung cancers and six metastatic lung tumors) and 158 were benign. In the test cases, 228 nodules were pathologically or microbiologically diagnosed, and 62 nodules were diagnosed as benign based on follow-up examinations over more than 2 years. The time between initial and follow-up CT examinations ranged from 28 days to 881 days (mean, 138 days ± 154).
Details of training, validation, and test cases are shown in Table 1. Representative cases of nodule measurements obtained with automated volume for all nodule types are shown in Figures 4 and 5 and in Figure E3 (online).
Lin concordance correlation coefficients (or ρc) for all nodule volumes between the first and second measurements on each CADv and all reproducibility coefficients are shown in Table E1 (online). There were significant correlations between the first and second measurements on each CADv (P < .001), and the concordance correlation coefficients of each volume were high both on CADv with CNN (solid component: ρc = 0.99, P < .001; GGO component: ρc = 0.99, P < .001; total nodule volume: ρc = 0.99, P < .001) and on CADv without CNN (solid component: ρc = 0.99, P < .001; GGO component: ρc = 0.99, P < .001; total nodule volume: ρc = 0.99, P < .001). In addition, the reproducibility coefficient, or RC, of each volume on CADv with CNN (solid component: −929.3 mm³ ≤ RC ≤ 929.3 mm³; GGO component: −201.8 mm³ ≤ RC ≤ 201.8 mm³; total nodule: −939.2 mm³ ≤ RC ≤ 939.2 mm³) was smaller than that without CNN (solid component: −3039.1 mm³ ≤ RC ≤ 3039.1 mm³; GGO component: −1955.5 mm³ ≤ RC ≤ 1955.5 mm³; total nodule: −3714.5 mm³ ≤ RC ≤ 3714.5 mm³).
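To make the symmetric RC limits above easier to interpret, the following sketch computes a coefficient from two repeated measurements per nodule. It assumes the commonly used definition RC = 1.96 · √2 · wSD, where wSD is the within-nodule standard deviation estimated from paired differences; whether the study used exactly this estimator is not stated in the text, and the function name and data are illustrative.

```python
import math
import numpy as np

def reproducibility_coefficient(first, second):
    """RC = 1.96 * sqrt(2) * within-subject SD for two repeated measurements.

    With two replicates per nodule, the within-subject variance is
    estimated as mean(d^2) / 2, where d is the paired difference.
    """
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    within_sd = math.sqrt(float(np.mean(d ** 2)) / 2.0)
    return 1.96 * math.sqrt(2.0) * within_sd

# Toy repeated volume measurements in mm^3 (not study data).
rc = reproducibility_coefficient([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

Under this definition, about 95% of differences between two repeated measurements of the same nodule are expected to fall within ±RC, so a smaller RC indicates a more repeatable measurement.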
Accuracy of Volume Measurements by Using CADv Software
A total of 69 of 290 pulmonary nodules were evaluated as test cases to determine accuracy of volume measurements compared with manual measurement (as the reference standard). Thirty-eight part-solid nodules, 17 solid nodules, and 14 ground-glass nodules were evaluated for this purpose.
Lin concordance correlation coefficients between each measured volume and the standard reference are shown in Figure E4 (online). Lin concordance correlation coefficients of CADv with CNN (solid component volume: ρc = 0.87, 95% confidence interval [95% CI]: 0.82, 0.91; P < .001; GGO component volume: ρc = 0.78, 95% CI: 0.68, 0.85; P < .001; total nodule volume: ρc = 0.82, 95% CI: 0.75, 0.87; P < .001) were higher than were those of CADv without CNN (solid component volume: ρc = 0.60, 95% CI: 0.48, 0.69; P < .001; GGO component volume: ρc = 0.57, 95% CI: 0.39, 0.71; P < .001; total nodule volume: ρc = 0.56, 95% CI: 0.42, 0.67; P < .001).
Multiple regression analysis demonstrated that the standard reference of solid component (r2 = 0.87; P < .001) was significantly affected by CADv with CNN (P < .001) and CADv without CNN (P < .001). In addition, multiple regression analysis demonstrated that the standard reference of GGO component (r2 = 0.72; P < .001) was significantly affected by CADv with CNN (P < .001) and CADv without CNN (P = .01). Moreover, multiple regression analysis demonstrated that the standard reference of total nodule (r2 = 0.81; P < .001) was significantly and only affected by CADv with CNN (P < .001).
Concordance correlation coefficients of each volume between each radiologist and the standard reference are shown in Table E2 (online). Concordance correlation coefficients of all investigators were significant and excellent for each component (0.96 ≤ r ≤ 0.99; P < .001).
Assessment of Differentiation Capability of Malignant from Benign Nodules Based on Parameters Derived from CADv
Results of receiver operating characteristic curve analysis for differentiating malignant from benign nodules by using each parameter derived from CADv with and without CNN in the test set are shown in Figure 6 and Table 2. The AUC of total nodule volume change per day for CADv with CNN (AUC, 0.94; 95% CI: 0.90, 0.96) was higher than those of the other parameters obtained from CADv with CNN (DT: AUC, 0.67; 95% CI: 0.60, 0.74; P < .001) and without CNN (total nodule volume change per day: AUC, 0.69; 95% CI: 0.62, 0.74; P < .001; DT: AUC, 0.58; 95% CI: 0.51, 0.64; P < .001). The AUC of DT for CADv with CNN was larger than that of DT for CADv without CNN (P = .03). Moreover, for CADv without CNN, the AUC of total nodule volume change per day was significantly larger than that of DT (P = .004). The threshold values determined with CADv with CNN were greater than or equal to 1.29 mm³ per day for total nodule volume change per day and greater than or equal to 24 days for DT. On the other hand, the threshold values determined with CADv without CNN were greater than or equal to 1.19 mm³ per day for total nodule volume change per day and greater than or equal to 20 days for DT.
Results of comparisons of diagnostic utility of each parameter obtained with CADv with and without CNN on a per-nodule basis are also shown in Table 3. The results show that sensitivity, specificity, and accuracy of each parameter obtained with CADv with CNN were significantly higher than those derived from CADv without CNN (total nodule volume change per day: P < .001; DT: P < .001). In addition, a comparison of the two parameters obtained with either type of CADv (with or without CNN) showed that specificity and accuracy of total nodule volume change per day were significantly higher than those of DT (P < .001). However, sensitivity of DT was significantly higher than that of total nodule volume change per day (P < .001) observed on CADv with and without CNN.
Results of the comparison of diagnostic performance for each parameter obtained by CADv with and without CNN on a per-patient basis are shown in Table 4. The sensitivities of total nodule volume change per day and DT determined by using CADv with CNN were significantly higher than those determined by using CADv without CNN (total nodule volume change per day from CADv with CNN vs total nodule volume change per day from CADv without CNN: P < .001; total nodule volume change per day from CADv with CNN vs DT from CADv without CNN: P < .001; DT from CADv with CNN vs total nodule volume change per day from CADv without CNN: P < .001; DT from CADv with CNN vs DT from CADv without CNN: P < .001). In addition, the sensitivity of DT from CADv without CNN was significantly higher than that of total nodule volume change per day from CADv without CNN (P < .001). The specificity of total nodule volume change per day from CADv with CNN was significantly higher than that of the other indexes (P < .001). In addition, the specificity of DT by using CADv with CNN was significantly higher than that of DT by using CADv without CNN (P < .001). Moreover, the specificity of total nodule volume change per day from CADv without CNN was significantly higher than that of DT from CADv without CNN (P < .001). The accuracy of total nodule volume change per day of CADv with CNN was significantly higher than that of the other indexes (DT of CADv with CNN: P < .001; total nodule volume change per day of CADv without CNN: P < .001; DT of CADv without CNN: P < .001). Accuracy of DT assessed by using CADv with CNN was significantly higher than that of others (P < .001).
Discussion
We show that use of a convolutional neural network (CNN) improved the efficacy of computer-aided detection of volume (CADv) measurement of pulmonary nodules on chest CT images compared with CADv without CNN. In addition, CADv with CNN had better potential for differentiating malignant from benign nodules than did CADv without CNN on a per-nodule and a per-patient basis.
We found that the area under the curve of total nodule volume change per day for CADv with CNN (AUC, 0.94; 95% CI: 0.90, 0.96) was significantly larger than was that of total nodule DT for CADv with CNN (AUC, 0.67; 95% CI: 0.60, 0.74; P < .0001) for separating benign from malignant pulmonary nodules. CADv without CNN was less accurate than was CADv with CNN for determining malignancy of pulmonary nodules (total nodule volume change per day: AUC, 0.69 [95% CI: 0.62, 0.75]; P < .0001; DT: AUC, 0.58; P < .0001). In addition, when each threshold value was applied in test cases, the sensitivity, specificity, or accuracy of each index assessed by using CADv with CNN was significantly higher than that assessed by using CADv without CNN on a per-nodule basis and on a per-patient basis (P < .001). These findings suggest that CNN has the potential to improve differentiation of malignant from benign nodules through more accurate volume evaluation within the nodule. Therefore, applying CNN in this setting appears preferable.
Many groups have reported that total nodule volume DT showed a significant difference between malignant and benign nodules (11,34,35). In this single-site retrospective study, we show that total nodule volume change per day, obtained with CNN, had better efficacy than did DT for differentiating malignant from benign nodules. A limitation of these previous studies was that they used data from CT screening (11,34,35) and DT was calculated for nodules that were enlarging or stable. In this series, data were obtained from routine clinical practice, where the time between initial and follow-up CT examinations ranged from 28 days to 881 days (mean ± standard deviation, 138 days ± 154). As a result, many organizing pneumonias and acute inflammatory nodules were included. These benign nodules decreased in size, with total nodule volume change per day and DT having values of less than zero. These potentially different samples and clinical targets for CADv may have affected the study results. Although these differences between this and previously reported studies should be taken into consideration, our results suggest that the enhanced measurement accuracy of total nodule volume change per day, which rests on the improved accuracy of total nodule volume measurement, can be used to help differentiate malignant from benign nodules and is therefore more useful than DT if applied to follow-up CT examination. In addition, total nodule volume change per day and DT were calculated from three input parameters: total nodule volume at follow-up CT, total nodule volume at initial CT, and time duration (Δt). Considering Equations (1) and (2), total nodule volume change per day is simpler to calculate, and this could be the reason for the better accuracy of this method for distinguishing between malignant and benign nodules.
In addition, CADv with CNN was better than CADv without CNN for all metrics evaluated in this study (eg, total nodule volume, total nodule DT, total nodule volume change per day). Thus, CNN deserves to be added to the CADv postprocessing of pulmonary nodules to help improve the accuracy for measuring volume and volume changes at CT. Moreover, CADv with CNN had better potential for differentiating malignant from benign nodules than did CADv without CNN on a per-nodule and on a per-patient basis.
Our study had several limitations. This was a retrospective study and patients were included only if a follow-up CT examination was available after a baseline study showing at least one nodule. These results may not be directly applicable to the screening population, although this could be a promising method for helping to separate benign from malignant nodules. The training and validation data were obtained from a single institution, and only a few CT systems from a single vendor were used; CT data from other vendors were not tested in this study. The number of training and validation cases was limited. There was no external test set used for confirmation of these results. Therefore, further investigations that apply other vendors’ data are necessary to validate and standardize this software for future clinical practice. Comparisons of each nodule component and total nodule volumes among CADv with and without CNN and the reference standard made by the simultaneous truth and performance level estimation, or STAPLE, method were performed on a per-nodule basis. Moreover, the nodule sizes and types were limited and subject to selection bias. Also, the standard reference obtained with the STAPLE method in this study was made in only 69 of 290 nodules (24%) and not determined in all nodules because of the limited human resources for annotation of nodules and the limited time available for developing the software. The role of CT acquisition parameters (eg, radiation dose–reduction techniques including automatic exposure control, higher beam pitch, different kernels) and of state-of-the-art CT techniques was not evaluated. We determined each cutoff value in test cases and did not determine these values in a separate and independent population (ie, no separate cohort was used to derive the cutoffs).
Therefore, the diagnostic performance in this study might be overestimated, and this should be regarded as one of the limitations of this study. Furthermore, the diagnostic performance of each index determined from CADv with CNN was not directly compared with that of radiologists with different levels of experience or specialties, or with pathologic results, although each index determined from CADv with CNN was found to be more useful than that from CADv without CNN. In addition, this study did not compare other features determined from radiomics analysis, which has been suggested as useful for assessing lung nodules (36,37). Moreover, this software is currently proprietary and not widely available. Furthermore, this study compared diagnostic performance between CADv with and without CNN on a per-nodule and per-patient basis. However, we did not account for clustering of multiple nodules within a patient, which might have affected the assessment of the diagnostic performance of both systems. Therefore, further investigations are warranted to establish the clinical setting for this software and to demonstrate its clinical relevance in the near future.
In conclusion, a convolutional neural network is useful for improving computer-aided detection of volume (CADv) measurement accuracy and nodule differentiation at thin-section CT. For the assessment of nodules detected with CADv, assessment of total nodule volume change per day was more accurate than was nodule doubling time in the determination of malignancy.
Author Contributions
Author contributions: Guarantor of integrity of entire study, Y.O.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, Y.O., K.A., A.Y., D.T.; clinical studies, Y.O., S.S., Y.U., Y.K., D.T., T.Y.; statistical analysis, Y.O., D.T.; and manuscript editing, Y.O., K.A., A.Y., D.T.
Y.O., S.S., D.T., and T.Y. supported by grants from Canon Medical Systems, Smoking Research Foundation, and Grants-in-Aid for Scientific Research from the Japanese Ministry of Education, Culture, Sports, Science and Technology (18K07675).
References
- 1. Reduced lung-cancer mortality with low-dose computed tomographic screening. N Engl J Med 2011;365(5):395–409.
- 2. CT screening for lung cancer brings forward early disease. The randomised Danish Lung Cancer Screening Trial: status after five annual screening rounds with low-dose CT. Thorax 2012;67(4):296–301.
- 3. A decrease in lung cancer mortality following the introduction of low-dose chest CT screening in Hitachi, Japan. Lung Cancer 2012;78(3):225–228.
- 4. Detection of lung cancer through low-dose CT screening (NELSON): a prespecified analysis of screening test performance and interval cancers. Lancet Oncol 2014;15(12):1342–1350.
- 5. Long-Term Follow-up Results of the DANTE Trial, a Randomized Study of Lung Cancer Screening with Spiral Computed Tomography. Am J Respir Crit Care Med 2015;191(10):1166–1175.
- 6. Guidelines for management of small pulmonary nodules detected on CT scans: a statement from the Fleischner Society. Radiology 2005;237(2):395–400.
- 7. Recommendations for the management of subsolid pulmonary nodules detected at CT: a statement from the Fleischner Society. Radiology 2013;266(1):304–317.
- 8. ACR CT accreditation program and the lung cancer screening program designation. J Am Coll Radiol 2015;12(1):38–42.
- 9. Nodule management protocol of the NELSON randomised lung cancer screening trial. Lung Cancer 2006;54(2):177–184.
- 10. NELSON lung cancer screening study. Cancer Imaging 2011;11(Spec No A):S79–S84.
- 11. Lung cancers diagnosed at annual CT screening: volume doubling times. Radiology 2012;263(2):578–583.
- 12. Prognostic importance of volumetric measurements in stage I lung adenocarcinoma. Radiology 2014;272(2):557–567.
- 13. Prognostic impact of tumor size eliminating the ground glass opacity component: modified clinical T descriptors of the tumor, node, metastasis classification of lung cancer. J Thorac Oncol 2013;8(12):1551–1557.
- 14. Correlation between whole tumor size and solid component size on high-resolution computed tomography in the prediction of the degree of pathologic malignancy and the prognostic outcome in primary lung adenocarcinoma. Acta Radiol 2015;56(10):1187–1195.
- 15. Predictive CT features of visceral pleural invasion by T1-sized peripheral pulmonary adenocarcinomas manifesting as subsolid nodules. AJR Am J Roentgenol 2017;209(3):561–566.
- 16. The IASLC lung cancer staging project: proposals for coding T categories for subsolid nodules and assessment of tumor size in part-solid tumors in the forthcoming eighth edition of the TNM classification of lung cancer. J Thorac Oncol 2016;11(8):1204–1223.
- 17. Clinical and Pathological Staging Validation in the Eighth Edition of the TNM Classification for Lung Cancer: Correlation between Solid Size on Thin-Section Computed Tomography and Invasive Size in Pathological Findings in the New T Classification. J Thorac Oncol 2017;12(9):1403–1412.
- 18. Algorithm Variability in the Estimation of Lung Nodule Volume From Phantom CT Scans: Results of the QIBA 3A Public Challenge. Acad Radiol 2016;23(8):940–952.
- 19. Comparative evaluation of newly developed model-based and commercially available hybrid-type iterative reconstruction methods and filter back projection method in terms of accuracy of computer-aided volumetry (CADv) for low-dose CT protocols in phantom study. Eur J Radiol 2016;85(8):1375–1382. https://doi.org/10.1016/j.ejrad.2016.05.001.
- 20. Multilevel Contextual 3-D CNNs for False Positive Reduction in Pulmonary Nodule Detection. IEEE Trans Biomed Eng 2017;64(7):1558–1567.
- 21. Automatic lung nodule detection using a 3D deep convolutional neural network combined with a multi-scale prediction strategy in chest CTs. Comput Biol Med 2018;103:220–231.
- 22. 3D convolutional neural network for differentiating pre-invasive lesions from invasive adenocarcinomas appearing as ground-glass nodules with diameters ≤3 cm using HRCT. Quant Imaging Med Surg 2018;8(5):491–499.
- 23. Fast and fully-automated detection and segmentation of pulmonary nodules in thoracic CT scans using deep convolutional neural networks. Comput Med Imaging Graph 2019;74:25–36.
- 24. Machine learning approach for distinguishing malignant and benign lung nodules utilizing standardized perinodular parenchymal features from CT. Med Phys 2019;46(7):3207–3216.
- 25. Toward an Expert Level of Lung Cancer Detection and Classification Using a Deep Convolutional Neural Network. Oncologist 2019;24(9):1159–1165.
- 26. Understanding the difficulty of training deep feedforward neural networks. Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 2010; 249–256.
- 27. Online learning rate adaptation with hypergradient descent. Proceedings of the International Conference on Learning Representations (ICLR), 2018. https://www.groundai.com/project/learning-an-adaptive-learning-rate-schedule/1. Submitted September 10, 2019. Accessed October 25, 2019.
- 28. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging 2004;23(7):903–921.
- 29. Simultaneous truth and performance level estimation with incomplete, over-complete, and ancillary data. In: Dawant BM, Haynor DR, eds. Proceedings of SPIE: medical imaging 2010—image processing. Vol 7623. Bellingham, Wash: International Society for Optics and Photonics, 2010; 76231N.
- 30. Simultaneous truth and performance level estimation through fusion of probabilistic segmentations. IEEE Trans Med Imaging 2013;32(10):1840–1852.
- 31. Deterioration of R-Wave Detection in Pathology and Noise: A Comprehensive Analysis Using Simultaneous Truth and Performance Level Estimation. IEEE Trans Biomed Eng 2017;64(9):2163–2175.
- 32. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1(8476):307–310.
- 33. Youden Index and optimal cut-point estimated from observations affected by a lower limit of detection. Biom J 2008;50(3):419–430.
- 34. Software volumetric evaluation of doubling times for differentiating benign versus malignant pulmonary nodules. AJR Am J Roentgenol 2006;187(1):135–142.
- 35. Doubling times and CT screen–detected lung cancers in the Pittsburgh Lung Screening Study. Am J Respir Crit Care Med 2012;185(1):85–89.
- 36. Radiomic features analysis in computed tomography images of lung nodule classification. PLoS One 2018;13(2):e0192002.
- 37. Radiomics analysis of pulmonary nodules in low-dose CT for early detection of lung cancer. Med Phys 2018;45(4):1537–1549.
Article History
Received: Aug 4 2019
Revision requested: Sept 24 2019
Revision received: Mar 15 2020
Accepted: Mar 25 2020
Published online: May 26 2020
Published in print: Aug 2020