Deep Learning to Classify Radiology Free-Text Reports

A deep learning convolutional neural network model can accurately classify free-text radiology reports when compared with a state-of-the-art method on reports from two academic institutions.


Purpose

To evaluate the performance of a deep learning convolutional neural network (CNN) model compared with that of a traditional natural language processing (NLP) model in extracting pulmonary embolism (PE) findings from thoracic computed tomography (CT) reports from two institutions.

Materials and Methods

Contrast material–enhanced CT examinations of the chest performed between January 1, 1998, and January 1, 2016, were selected. Two human radiologists annotated the reports for three categories: the presence, chronicity, and location of PE. The classification performance of a CNN model, which used an unsupervised learning algorithm to obtain vector representations of words, was compared with that of the open-source application PeFinder. Sensitivity, specificity, accuracy, and F1 scores were determined for both the CNN model and PeFinder in the internal and external validation sets.
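The word-embedding plus convolution approach described above can be sketched as a minimal forward pass in the style of a sentence-classification CNN. This is an illustrative toy only: the vocabulary, weights (randomly initialized, untrained), and dimensions below are invented for demonstration and are not the study's actual model or data.

```python
import numpy as np

# Toy vocabulary (illustrative, not from the study)
vocab = {"<pad>": 0, "no": 1, "acute": 2, "pulmonary": 3, "embolism": 4, "chronic": 5}

rng = np.random.default_rng(0)
emb_dim, n_filters, width = 8, 4, 3
E = rng.normal(size=(len(vocab), emb_dim))        # word-embedding table
W = rng.normal(size=(n_filters, width, emb_dim))  # convolution filters (width-3 n-grams)
b = np.zeros(n_filters)
w_out = rng.normal(size=n_filters)                # logistic output weights
b_out = 0.0

def classify(tokens):
    """Embed tokens, convolve over word windows, max-pool, apply sigmoid."""
    ids = [vocab.get(t, 0) for t in tokens]
    x = E[ids]                                    # (seq_len, emb_dim)
    n = len(ids) - width + 1
    feats = np.array([[max(0.0, float(np.sum(W[f] * x[i:i + width])) + b[f])
                       for i in range(n)]         # ReLU over each window
                      for f in range(n_filters)])
    pooled = feats.max(axis=1)                    # max over time, one value per filter
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))           # pseudo-probability of "PE positive"

p = classify(["no", "acute", "pulmonary", "embolism"])
```

With untrained weights the output is meaningless; training would fit `E`, `W`, and the output layer on annotated reports, with the embedding table optionally initialized from unsupervised word vectors as in the study.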


Results

The CNN model demonstrated an accuracy of 99% and an area under the curve value of 0.97. For internal validation report data, the CNN model had a significantly larger F1 score (0.938) than did PeFinder (0.867) when classifying findings as either PE positive or PE negative, but no significant difference in sensitivity, specificity, or accuracy was found. For external validation report data, no statistically significant difference between the performance of the CNN model and PeFinder was found.
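The metrics compared above all reduce to simple ratios over a 2 × 2 confusion matrix. A minimal sketch (the counts in the example call are invented for illustration and are not the study's data):

```python
def binary_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, accuracy, and F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall: true positives found
    specificity = tn / (tn + fp)            # true negatives found
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, accuracy, f1

# Hypothetical counts: 8 true positives, 2 false positives,
# 88 true negatives, 2 false negatives
sens, spec, acc, f1 = binary_metrics(8, 2, 88, 2)
```

F1 balances precision against sensitivity, which is why two classifiers can differ significantly in F1 (as the CNN model and PeFinder did internally) while showing no significant difference in accuracy alone.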


Conclusion

A deep learning CNN model can classify radiology free-text reports with accuracy equivalent to or greater than that of an existing traditional NLP model.

© RSNA, 2017

Online supplemental material is available for this article.



Article History

Received May 18, 2017; revision requested July 24; revision received July 25; accepted August 11; final version accepted September 14.
Published online: November 13, 2017
Published in print: March 2018