Will AI Improve Tumor Delineation Accuracy for Radiation Therapy?
See also the article by Lin and Dou et al in this issue.
Introduction
For head and neck nasopharyngeal carcinoma (NPC), radiation therapy has been the primary treatment modality because of the anatomic characteristics and radiosensitivity of NPC. In radiation therapy, the adequacy of radiation dose and field coverage is critical: studies have shown that dose undercoverage of the tumor target can compromise treatment outcomes. A recent intensity-modulated radiation therapy study of NPC reported that the 5-year local failure-free rate dropped from 90% to 54% if the volume of the primary nasopharyngeal tumor receiving less than 66.5 Gy (considered insufficient) exceeded 3.4 cm3 (P < .001) (1). Delivering precise radiation to the tumor while sparing the surrounding organs at risk requires accurate delineation and segmentation of tumor targets. This critical task is generally performed manually, a process that is tedious and time consuming. More concerning is its subjective nature: even with years of experience and skill, intra- and interobserver variation in gross target volumes and organs at risk remains substantial.
Various automated delineation and segmentation methods have been proposed to assist with manual delineation. These methods generally fall into three categories: (a) atlas-based methods (2), (b) statistical appearance model–based methods (3,4), and (c) deep learning–based methods (5–7).
In atlas-based methods, a reference library of delineated structures is first established from a reference image data set. Once a new image data set is fused and registered with the reference image data set, the reference delineated structures can be projected onto the new image data set, generating a new set of delineated structures. On the basis of published reports, atlas-based methods can improve the efficiency of registration and the accuracy of delineation (2). Although atlas-based methods have been used successfully in various applications, they are inherently sensitive to image registration and can be subject to considerable discrepancies. Unlike atlas-based methods, statistical appearance model–based methods rest on the principle that a delineated structure should be allowed to take only shapes consistent with the statistical characteristics of that structure learned from training examples (3). To take advantage of the strengths of both the atlas and the statistical model, a previous study combined the two methods to improve automated segmentation of head and neck CT images for radiation therapy (4). Deep learning–based methods form the third category. A subfield of artificial intelligence (AI), deep learning has demonstrated promising results in image recognition, object classification, language translation, disease detection, and other complicated tasks. The convolutional neural network (CNN) is one of the most commonly used deep learning networks. A CNN consists of an input layer, an output layer, and multiple convolutional and other mathematical processing layers between them. The key idea of a deep learning CNN is that the parameters of these processing layers are not defined by users but instead are derived, or learned, from the training data fed to the network (5,6).
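The convolution at the heart of such a layer can be sketched in a few lines of NumPy. This is a minimal illustrative example, not code from the study; in a trained CNN, the kernel weights below would be learned by gradient descent rather than supplied by the user.

```python
import numpy as np

def conv2d(image, kernel):
    """Single-channel 'valid' convolution, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output value is a weighted sum over a local image patch;
            # in a CNN, these weights (the kernel) are learned from training data.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy 5 x 5 "image" convolved with a 3 x 3 averaging kernel.
image = np.arange(25, dtype=float).reshape(5, 5)
feature_map = conv2d(image, np.full((3, 3), 1.0 / 9.0))
```

Stacking many such layers, each followed by a nonlinearity, is what lets the network build increasingly abstract image features.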
With recent developments in hardware, especially graphics processing units, Krizhevsky et al proposed a CNN model, AlexNet, in 2012 to reduce the error rate by nearly half for object recognition; their work was considered a breakthrough in the field of AI and paved the way for wide adoption of CNN (5,7). A recent report shows successful automatic multiorgan segmentation using CNN in patients with head and neck cancer for radiation therapy planning (8).
In this issue of Radiology, Lin and colleagues investigate the potential value of deep machine learning for tumor target delineation in radiation therapy for NPC (9). The authors use a three-dimensional CNN to develop a contouring tool that automates primary gross tumor delineation of NPC on the basis of multimodal MRI data sets from 1021 patients. Specifically, the authors train the network with 818 patients with diagnosed NPC and test it with a separate cohort of 203 patients with NPC. The authors include four MRI data sets for each patient: (a) a standard unenhanced T1-weighted data set, (b) a standard unenhanced T2-weighted data set, (c) a contrast material–enhanced T1-weighted data set, and (d) a fat-suppressed T1-weighted data set, all acquired with 3.0-T MRI scanners with consistent imaging parameters, including 3-mm section thickness and no intersection gap. This approach minimizes the impact of differing imaging parameters on the performance of the proposed AI contouring tool. The performance evaluation of the AI contouring tool used two quantitative measures: the Dice similarity coefficient (DSC) and the average surface distance. The DSC quantifies the overlap between the automated contour and the reference contour, while the average surface distance measures the mean deviation between the surfaces of the two contours.
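Both measures are straightforward to compute from binary tumor masks. The sketch below is illustrative only (function names are ours, not the study's), and distances are in voxel units, whereas the study reports millimeters.

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|); 1.0 = perfect overlap."""
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def boundary_points(mask):
    """Coordinates of mask voxels that touch at least one background neighbor."""
    m = mask.astype(bool)
    interior = m.copy()
    for axis in range(m.ndim):
        for shift in (1, -1):
            # A voxel is interior only if every axis-aligned neighbor is in the mask.
            interior &= np.roll(m, shift, axis=axis)
    return np.argwhere(m & ~interior)

def average_surface_distance(a, b):
    """Symmetric mean distance between the surfaces of two contours."""
    pa, pb = boundary_points(a), boundary_points(b)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

# Two partly overlapping 4 x 4 "tumors" on a 10 x 10 grid.
auto, ref = np.zeros((10, 10)), np.zeros((10, 10))
auto[2:6, 2:6] = 1
ref[3:7, 3:7] = 1
```

For these toy masks the DSC is 2 × 9 / (16 + 16) = 0.5625, and the average surface distance is zero only when the two contours coincide exactly.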
In their retrospective multiobserver study, Lin and colleagues compare the automated primary gross tumor contours of the testing cohort of 203 patients with NPC against those drawn by expert radiation oncologists, which were considered the reference, or ground truth, contours (9). The AI contouring tool achieved relatively high accuracy for primary gross tumor contours (median DSC, 0.79; average surface distance, 2.0 mm) compared with the reference contours. By the authors' grading criteria for contour accuracy, the majority (180 of 203 [88.7%]) of the AI-generated contours received a satisfactory score, requiring slight or no revision by the experts. Furthermore, the study compares the AI tool against eight qualified radiation oncologists from seven high-volume academic institutions, each of which cares for more than 200 patients with NPC each year. For this evaluation, the authors randomly selected 20 patients from the testing cohort of 203 patients with NPC.
One interesting finding of the study is that the AI contouring tool outperformed four of the eight radiation oncologists (median DSC, 0.79 vs 0.71–0.74; P < .05) and performed comparably to the other four (median DSC, 0.78–0.80). When the eight radiation oncologists used AI assistance for primary gross tumor contouring, five of them improved their contour accuracy, as measured by average median DSC, from 0.74 to 0.79 (P < .001). With AI assistance, intraobserver variation among the radiation oncologists, quantified by the median interquartile deviation of the DSC, decreased from 0.11 (interquartile range, 0.09–0.14) to 0.07 (interquartile range, 0.05–0.08; P = .02), an overall reduction of 36.0%. Consistent with this reduction in intraobserver variation, interobserver variation among the eight radiation oncologists was also reduced. Although many of the results reported in the study are significant, it remains unproven whether improvements with the AI tool lead to improved therapeutic outcomes in radiation therapy of patients with NPC.
The report by Lin and colleagues also suggests that the AI contouring tool can improve delineation efficiency, given that conventional manual delineation is widely acknowledged to be time consuming. Their study compares the average time spent on delineation of the primary gross tumor by the same group of radiation oncologists with and without the aid of the AI contouring tool. With the AI contouring tool, Lin et al observed an average time reduction from 30.2 minutes to 18.3 minutes (P < .001), a nearly 40% improvement in efficiency.
Lin and colleagues point out some limitations. The performance of the CNN-based AI contouring tool depends on anatomic location. More specifically, they observed inferior accuracy of primary gross tumor contouring at the cavernous sinus and uvula (median section-based DSC, 0.75) compared with that at the skull base and Eustachian cushion (median section-based DSC, 0.82–0.83). These results imply that the contouring accuracy of the AI tool could be compromised for tumors that infiltrate the cavernous sinus and hypopharyngeal regions, particularly in patients with highly advanced–stage NPC. Lin and colleagues suggest alleviating this limitation by including more NPC cases with intracranial and hypopharyngeal extension in the training data sets.
Given their results, Lin and colleagues suggest that the AI contouring tool could have a positive impact on tumor control and patient survival if used in clinical practice. However, before the AI contouring tool is fully adopted into clinical use as a part of standard practice, it needs validation in more independent multicenter studies with larger patient cohorts. Although the AI contouring tool shows promising results for NPC primary tumor delineation in this study, section-by-section verification of tumor contour by radiation oncologists should never be omitted.
Medical imaging data contain far more information than can be appreciated by human observers, including experienced specialists. AI neural networks, with their strong learning and fast processing capabilities, could reveal information in medical imaging data that escapes human eyes, enabling more accurate and robust automated contouring tools. With rapid technical developments, more advanced AI algorithms are likely to make AI-based contouring tools even more powerful and essential for radiation therapy.
Disclosures of Conflicts of Interest: Z.C. disclosed no relevant relationships.
References
- 1. The impact of dosimetric inadequacy on treatment outcome of nasopharyngeal carcinoma with IMRT. Oral Oncol 2014;50(5):506–512.
- 2. Atlas-based auto-segmentation of head and neck CT images. Med Image Comput Comput Assist Interv 2008;11(Pt 2):434–441.
- 3. Active shape models: their training and application. Comput Vis Image Underst 1995;61(1):38–59.
- 4. Automatic segmentation of head and neck CT images for radiotherapy treatment planning using multiple atlases, statistical appearance models, and geodesic active contours. Med Phys 2014;41(5):051910.
- 5. Deep learning. Nature 2015;521(7553):436–444.
- 6. Convolutional neural networks for radiologic images: a radiologist's guide. Radiology 2019;290(3):590–606.
- 7. ImageNet classification with deep convolutional neural networks. In: Proc Advances in Neural Information Processing Systems 2012;25:1090–1098.
- 8. Fully automatic multi-organ segmentation for head and neck cancer radiotherapy using shape representation model constrained fully convolutional neural networks. Med Phys 2018;45(10):4558–4567.
- 9. Deep learning for automated contouring of primary tumor volumes by MRI for nasopharyngeal carcinoma. Radiology 2019;291:677–686.
Article History
Received: Feb 19 2019
Revision requested: Feb 25 2019
Revision received: Feb 27 2019
Accepted: Feb 27 2019
Published online: Mar 26 2019
Published in print: June 2019