Reviews and Commentary

Deep Learning to Detect Pancreatic Cancer at CT: Artificial Intelligence Living Up to Its Hype

Published Online: https://doi.org/10.1148/radiol.222126

See also the article by Chen and Wu et al in this issue.

Alex M. Aisen, MD, is a retired academic radiologist who specialized in gastrointestinal and body imaging. He began his career at the University of Michigan and moved midcareer to Indiana University where he is currently a professor emeritus of Radiology and Imaging Sciences. Following his retirement from clinical practice, Dr Aisen began working in the commercial sector. He is presently employed as a clinical scientist at Philips Healthcare.

Pedro Rodrigues, PhD, has a doctorate in physics engineering from Instituto Superior Técnico, Lisbon, Portugal. He is a clinical scientist at Philips Healthcare.

Artificial intelligence (AI) is one of the most exciting topics in modern radiology, and new algorithms are introduced seemingly every day. Some promise to have a major impact on image interpretation, while others are less compelling.

Prior to roughly 2015, most AI algorithms were rules-based (1). In effect, they tried to imitate conscious human visual analysis; for example, they would look for local mass-like regions of altered radiodensity. A radiologist could understand what these tools were doing. Today, the approach is different. Most current algorithms are substantially more accurate but also less transparent in the way they work. Indeed, many AI algorithms are described as "black boxes": we appreciate what they do, but often do not know how they do it. It is frequently not clear which features at which anatomic locations contribute to the output of the algorithm. The approach used today is called "deep learning." Deep learning is based on a complex mathematical model, usually a software-implemented construct called a neural network, often of the type known as a convolutional neural network. Neural networks derive their name from their similarity to the human brain. Internally, numerical data flow between "nodes" that are computationally arranged in layers; the same sort of organization is present in portions of the brain (eg, the visual cortex). The data (numbers) that pass between the nodes in these layers are modified by "weights," and these weights determine the structures a neural network will recognize (2). Because an algorithm is developed for a specific interpretive task, the weights are determined by a process called training. The nascent neural network model processes many images, which in radiologic applications are radiographs or scans usually labeled with known diagnoses and/or segmentations (delineations of organs and/or lesions). The training images determine and modify the weights, enabling the model to analyze unknown images such that its outputs (diagnoses or segmentations) are consistent with the training data. The training data thus provide the foundation: the model learns to recognize in new images the patterns it encountered during training.
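
For readers who prefer to see these ideas in code, the following is a minimal, purely illustrative sketch in Python using the PyTorch library. The architecture, shapes, and synthetic data are our own assumptions, not the model of Chen and Wu et al; the sketch shows only how labeled examples drive weight updates during training.

```python
# Illustrative sketch: a tiny convolutional neural network and one training step.
# All shapes, layer sizes, and data are hypothetical stand-ins.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional layers arranged in sequence; each layer holds learnable weights.
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, 2)  # two output classes: tumor / no tumor

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Synthetic stand-ins for labeled training images (a batch of 64 x 64 slices).
images = torch.randn(8, 1, 64, 64)
labels = torch.randint(0, 2, (8,))

# One training step: the mismatch between predictions and labels drives the
# weight updates -- this is what "training" means in the paragraph above.
optimizer.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```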

In this issue of Radiology, Chen and Wu and colleagues (3) describe an advanced and powerful AI-based tool, an end-to-end software architecture composed of several algorithms for the detection of pancreatic cancer. This deep learning–based tool had very good accuracy, with sensitivity approaching that of subspecialist radiologists, including reasonable sensitivity for small (<2 cm) lesions. For a test set of images from a tertiary medical center, the algorithm had 90% sensitivity and 96% specificity; the sensitivity of the original interpreting radiologists was 96%. Tools such as this promise to benefit routine workflow and patient care.

The authors implemented a two-stage algorithm, which is an approach well suited to many AI applications involving advanced imaging studies (4). In the initial stage, an AI-based module analyzed an abdominal CT study and segmented (identified and outlined the boundaries of) the pancreas, including a pancreatic tumor, if present. The likely location of the tumor was identified. The segmentation module produced image masks, such that only images of pancreatic parenchyma were presented at the second stage.
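
A schematic of this two-stage idea might look as follows. The helper functions segment_pancreas and classify_masked_volume are illustrative assumptions standing in for the authors' actual models, not the published implementation.

```python
# Illustrative two-stage pipeline: segmentation mask first, classification second.
import numpy as np

def segment_pancreas(ct_volume: np.ndarray) -> np.ndarray:
    """Hypothetical stage-1 model: returns a boolean mask of pancreas voxels.
    Faked here with a simple intensity threshold for illustration only."""
    return ct_volume > ct_volume.mean()

def classify_masked_volume(masked_volume: np.ndarray) -> float:
    """Hypothetical stage-2 model: returns a tumor probability.
    Faked here as a sigmoid of the mean intensity inside the mask."""
    return float(1.0 / (1.0 + np.exp(-masked_volume.mean())))

ct_volume = np.random.randn(64, 128, 128)           # synthetic CT stand-in (z, y, x)
mask = segment_pancreas(ct_volume)                   # stage 1: segmentation
masked = np.where(mask, ct_volume, 0.0)              # non-pancreatic tissue is zeroed out
tumor_probability = classify_masked_volume(masked)   # stage 2: detection/classification
print(f"tumor probability: {tumor_probability:.2f}")
```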

The second stage of the tool functioned to determine (classify) whether pancreatic cancer was present. This stage comprised five similar, but distinct, algorithms, each of which performed the same function of lesion detection. A neoplasm was considered to be present if four or five of the algorithms indicated a positive result. If not, the study was categorized as negative.
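
The voting rule itself is simple and can be sketched in a few lines; the five stand-in predictions below are hypothetical and serve only to illustrate the 4-of-5 threshold.

```python
# Illustrative ensemble vote: positive only if at least 4 of 5 detectors agree.
def ensemble_vote(predictions: list[bool], threshold: int = 4) -> str:
    positives = sum(predictions)
    return "cancer suspected" if positives >= threshold else "negative"

print(ensemble_vote([True, True, True, True, False]))   # 4 of 5 positive -> "cancer suspected"
print(ensemble_vote([True, True, False, False, True]))  # 3 of 5 positive -> "negative"
```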

The performance of an AI tool is determined by many factors. Two of the most important are the design of the neural network model and the patient data used to train it (5,6). The easier task is selecting and tuning the model. Today, "raw" untrained models well suited to medical imaging tasks are available from many reputable sources on the internet, including Google, NVIDIA, and the American College of Radiology. The more difficult task is acquiring and labeling the necessary training data. Labeling refers to determining, and preferably marking, the location of the abnormalities for which the algorithm is being developed. For a segmentation algorithm, the organ in question (in this case, pancreas) and the lesion (in this case, tumor) are outlined by qualified individuals (optimally, radiologists). Much training data are needed, usually hundreds of patient cases or more, and labeling can be an expensive and time-consuming task. The training data should be diverse, with cases fully representative of the patient populations for whom the algorithm will be deployed. A well-trained algorithm is called "generalizable," and an algorithm that is inadequately trained is usually "brittle," meaning inaccurate in clinical use (7).
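
As a rough illustration of what a labeled segmentation training case contains, the sketch below pairs a synthetic CT volume with a voxel-wise label map; the shapes and label scheme (0 = background, 1 = pancreas, 2 = tumor) are assumptions for illustration only.

```python
# Illustrative labeled training example: image volume plus radiologist-drawn label map.
import numpy as np

ct_volume = np.random.randn(64, 128, 128).astype(np.float32)  # synthetic CT stand-in
label_map = np.zeros_like(ct_volume, dtype=np.int64)          # 0 = background
label_map[20:40, 40:80, 50:90] = 1                            # 1 = pancreas outline
label_map[28:33, 55:65, 60:70] = 2                            # 2 = tumor outline

training_example = {"image": ct_volume, "label": label_map, "diagnosis": "cancer"}
print({k: getattr(v, "shape", v) for k, v in training_example.items()})
```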

After a trained algorithm is developed, it must be tested to ensure accuracy. The cases used for testing must be different from those used to train the algorithm. If possible, the test data should be from an external source (eg, from institutions other than those that contributed the training data) (8).
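
The following sketch shows, under illustrative assumptions, how sensitivity and specificity would be computed on held-out (ideally external) test cases that the model never saw during training; the labels and predictions are hypothetical.

```python
# Illustrative evaluation on held-out test cases.
def sensitivity_specificity(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # ground truth (1 = cancer) on external test cases
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # frozen model's outputs on those cases
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```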

A major virtue of the algorithm in the Chen and Wu et al study (3) is the quantity and quality of the training and test data. Multiple sources were used for the various tasks, including training and testing of the segmentation and detection processes. The sources included hundreds of studies (scans) from a tertiary care hospital in Taiwan, more than 20,000 cases from a nationwide Taiwanese database, and several public databases. CT studies in individuals with normal pancreases (controls) and in those with proven cancer were included, as is appropriate. One limitation, as the authors note, is that the training data for classification came from a single tertiary Taiwan hospital. The standard of ground truth was excellent.

Another notable feature is that this algorithm evaluated three-dimensional data; all images containing pancreatic tissue were segmented and presented to the tumor detection process. This approach is likely preferable to two-dimensional algorithms that analyze single tomographic images in isolation.
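
The sketch below illustrates the distinction with illustrative channel counts and kernel sizes: a three-dimensional convolution integrates information across contiguous slices, whereas a two-dimensional convolution sees each slice on its own.

```python
# Illustrative contrast between 3D and 2D convolution over a CT volume.
import torch
import torch.nn as nn

volume = torch.randn(1, 1, 64, 128, 128)   # (batch, channel, depth, height, width)
slice_2d = volume[:, :, 32]                 # a single axial slice: (batch, channel, H, W)

conv3d = nn.Conv3d(1, 8, kernel_size=3, padding=1)
conv2d = nn.Conv2d(1, 8, kernel_size=3, padding=1)

print(conv3d(volume).shape)    # torch.Size([1, 8, 64, 128, 128]) -- context across slices
print(conv2d(slice_2d).shape)  # torch.Size([1, 8, 128, 128])     -- one slice in isolation
```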

Furthermore, the procedure used to measure the composite algorithm’s accuracy was robust and demonstrated that the algorithm was nearly as sensitive as the specialist radiologists who interpreted the test cases clinically.

So where does this leave us? Many AI algorithms have been developed to assist in image interpretation across multiple pathologic conditions. Some are as good as the one described in this article. A minority of these have received clearance or approval from the U.S. Food and Drug Administration and/or European authorities for commercial availability. Many algorithms that have passed regulatory muster are triage tools, which move imaging studies likely to contain urgent findings to the top of reading worklists for rapid review by radiologists. Some tools are designed to save the radiologist time; for example, those that detect, measure, and compare pulmonary nodules. A small number of cleared algorithms provide diagnostic information, such as the probability that a breast lesion is malignant. A handful of detection algorithms are designed to reduce the number of inadvertent misses. One of the first such commercial tools was ImageChecker (Hologic) for overreading mammograms (9).

The algorithm described herein falls into this last category. It is designed to detect pancreatic cancer, not to render a diagnosis. It was not, for example, trained to differentiate pancreatic cancer from similar-appearing benign diseases, such as focal pancreatitis or benign neoplasms. Radiologists would recognize most of the tumors detectable by this tool, but we live in an imperfect world, and there will be occasional misses of incidental tumors. If routine overreading by an algorithm such as this is performed, we may be able to further reduce the already small number of misses. This may be clinically important to the involved patients; early diagnosis is key to the possibility of a surgical cure.

To be clinically useful, this algorithm must obtain regulatory clearance. Once approved, the algorithm must achieve clinical deployment with as much reach as possible. It also must be deployed in a manner that integrates seamlessly into routine radiology workflow. Software products that facilitate workflow integration are becoming available and are often called “AI marketplaces.”

As AI is deployed in the real world, mechanisms to confirm that an algorithm remains accurate at the sites where it is used, and to ensure that its performance does not degrade over time, would support quality, protect safety, and engender trust. This is especially true as protocols, imaging modalities, and patient populations evolve. Such surveillance may become a regulatory requirement (5,7,10).
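
One possible shape for such surveillance, sketched under our own assumptions about the monitoring rule and threshold, is to periodically re-score locally confirmed cases and raise an alert when sensitivity drifts below an agreed floor.

```python
# Illustrative post-deployment monitoring; the 85% floor is an assumed example.
def monitor(confirmed: list[tuple[int, int]], floor: float = 0.85) -> str:
    """confirmed: (ground_truth, model_output) pairs accumulated at the local site."""
    positives = [(t, p) for t, p in confirmed if t == 1]
    if not positives:
        return "insufficient data"
    sensitivity = sum(p for _, p in positives) / len(positives)
    return "ok" if sensitivity >= floor else f"alert: sensitivity {sensitivity:.0%} below floor"

recent_cases = [(1, 1), (1, 1), (1, 0), (0, 0), (1, 1), (0, 0)]
print(monitor(recent_cases))  # 3 of 4 confirmed cancers detected (75%) -> alert
```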

Many believe we are approaching an era in which AI will begin to live up to its hype and gain widespread adoption in augmenting, but not replacing, radiologists. AI tools may triage most patient studies and prioritize reading worklists. Time-saving AI that reliably performs mundane tasks, such as detecting, enumerating, measuring, and assessing pulmonary nodules for change since prior studies, may become ubiquitous in the reading room. The same may hold true for oncology-based algorithms that measure and track tumors over time. AI may routinely measure organ volumes to quantitatively and reproducibly detect pathologic organomegaly or atrophy. AI may provide diagnostic assessments, including determining the probability that a lesion is malignant. Finally, a spectrum of AI algorithms, similar to the tool described in the study by Chen and Wu et al (3), designed to detect a large variety of abnormalities and that run concurrently may be implemented to overread cases and, thereby, reduce interpretation misses in a medically beneficial way.

We are not there yet, but we are heading in the right direction. Algorithms such as the one described here are helping to lead the way.

Disclosures of conflicts of interest: A.M.A. Employee and stockholder, Philips Healthcare. P.S.R. Employee, Philips Healthcare; patents planned, issued or pending.

References

1. Burns JE, Yao J, Summers RM. Artificial Intelligence in Musculoskeletal Imaging: A Paradigm Shift. J Bone Miner Res 2020;35(1):28–35.
2. Erickson BJ. Basic Artificial Intelligence Techniques: Machine Learning and Deep Learning. Radiol Clin North Am 2021;59(6):933–940.
3. Chen P, Wu T, Wang P, et al. Pancreatic cancer detection on CT scans with deep learning: a nationwide population-based study. Radiology 2023;306(1):172–182.
4. Lee S, Summers RM. Clinical Artificial Intelligence Applications in Radiology: Chest and Abdomen. Radiol Clin North Am 2021;59(6):987–1002.
5. Kohli M, Prevedello LM, Filice RW, Geis JR. Implementing Machine Learning in Radiology Practice and Research. AJR Am J Roentgenol 2017;208(4):754–760.
6. Willemink MJ, Koszek WA, Hardell C, et al. Preparing Medical Imaging Data for Machine Learning. Radiology 2020;295(1):4–15.
7. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019;17(1):195.
8. Park SH, Han K. Methodologic Guide for Evaluating Clinical Performance and Effect of Artificial Intelligence Technology for Medical Diagnosis and Prediction. Radiology 2018;286(3):800–809.
9. Rao VM, Levin DC, Parker L, Cavanaugh B, Frangos AJ, Sunshine JH. How widely is computer-aided detection used in screening and diagnostic mammography? J Am Coll Radiol 2010;7(10):802–805.
10. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18(8):500–510.

Article History

Received: Aug 22 2022
Revision requested: Aug 24 2022
Revision received: Aug 25 2022
Accepted: Aug 29 2022
Published online: Sept 13 2022
Published in print: Jan 2023