Deep Learning Reconstruction for Accelerated Spine MRI: Prospective Analysis of Interchangeability
Abstract
Background
Deep learning (DL)–based MRI reconstructions can reduce examination times for turbo spin-echo (TSE) acquisitions. Studies that prospectively employ DL-based reconstructions of rapidly acquired, undersampled spine MRI are needed.
Purpose
To investigate the diagnostic interchangeability of an unrolled DL-reconstructed TSE (hereafter, TSEDL) T1- and T2-weighted acquisition method with standard TSE and to test their impact on acquisition time, image quality, and diagnostic confidence.
Materials and Methods
This prospective single-center study included participants with various spinal abnormalities who were enrolled from November 2020 to July 2021 after giving written informed consent. Each participant underwent two MRI examinations: standard fully sampled T1- and T2-weighted TSE acquisitions (reference standard) and prospectively undersampled TSEDL acquisitions with threefold and fourfold acceleration. Image evaluation was performed by five readers. Interchangeability analysis and an image quality–based analysis were used to compare the TSE and TSEDL images. Acquisition time and diagnostic confidence were also compared. Interchangeability was tested with the individual equivalence index for various degenerative and nondegenerative entities, each analyzed at every vertebral level; the protocols were considered interchangeable if the excess of discordant clinical judgments was less than 5%. Interreader and intrareader agreement and concordance (κ, Kendall τ, and Kendall W statistics) were computed, and Wilcoxon signed-rank and McNemar tests were used.
Results
Overall, 50 participants were evaluated (mean age, 46 years ± 18 [SD]; 26 men). The TSEDL method enabled a median 70% reduction in total acquisition time (median, 100 seconds for TSEDL vs 328 seconds for TSE; P < .001). All individual equivalence indexes were less than 4%. All readers rated TSEDL images as having less image noise (P < .001). No evidence of a difference was found between standard TSE and TSEDL regarding frequency of major findings, overall image quality, or diagnostic confidence.
Conclusion
The deep learning (DL)–reconstructed turbo spin-echo (TSE) method was found to be interchangeable with standard TSE for detecting various abnormalities of the spine at MRI. DL-reconstructed TSE acquisition provided excellent image quality, with a 70% reduction in examination time.
German Clinical Trials Register no. DRKS00023278
© RSNA, 2022
Online supplemental material is available for this article.
See also the editorial by Hallinan in this issue.
Summary
Deep learning–reconstructed turbo spin-echo (TSE) acquisition was interchangeable with standard TSE for detecting various spinal abnormalities at MRI, providing excellent quality and a 70% reduction in examination time.
Key Results
■ In this prospective analysis of 50 participants with various spinal abnormalities, deep learning (DL)–reconstructed turbo spin-echo (TSE) images were interchangeable with standard TSE images; absolute individual equivalence indexes were less than 4%.
■ No evidence of a difference was found between DL-reconstructed TSE and standard TSE imaging regarding frequency of major findings, overall image quality, or diagnostic confidence.
Introduction
Fast spine MRI of diagnostic quality is essential for value-based, effective, and cost-efficient workflows and is a cornerstone of higher MRI throughput, especially given rising demand (1).
Deep learning (DL) can be used to enhance accelerated MRI acquisition (2,3). The main purpose of DL-based MRI reconstruction is to enable faster scanning by using deep neural networks to reconstruct high-quality images from fewer, or "undersampled," k-space data. This can be achieved with so-called "physics-based DL MRI techniques," which unroll regularized iterative algorithms that alternate between data consistency and a neural network–based regularizer for a finite number of iterations (2). The unrolled DL reconstruction used in this work resembles compressed sensing in that the images are generated by alternating between data consistency and image regularization. The key difference is that the image regularization in compressed sensing is based on sparsity assumptions, which are typically enforced ad hoc in a wavelet domain. Technically, these regularizations amount to wavelet thresholding, which is simplistic compared with modern neural networks. Neural networks are designed to learn more refined image enhancement at the cost of a multitude of model parameters. In DL reconstruction, these model parameters are optimized (trained) on representative exemplary data sets in a separate and often computationally intensive process before deployment at the scanner. Once fixed, the parameters can be deployed, and the reconstruction can be applied to prospective data. Training on representative data allows for generalization and removes the need for ad hoc regularization to enforce sparsity. However, DL reconstruction might introduce "instabilities" in image reconstruction, such as "masking" of certain small pathologic findings or distortion of structural features (4).
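To make the alternation between data consistency and regularization concrete, the following Python/NumPy sketch implements a toy single-coil unrolled reconstruction. It is not the vendor's prototype network: the function names are hypothetical, the regularizer is passed in as a plain function, and complex soft-thresholding in the image domain stands in for the wavelet thresholding of compressed sensing. In unrolled DL reconstruction, a trained network would take the place of that function.

```python
import numpy as np


def fft2c(image):
    """Centered, orthonormal 2D FFT (image -> k-space)."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image), norm="ortho"))


def ifft2c(kspace):
    """Centered, orthonormal 2D inverse FFT (k-space -> image)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho"))


def soft_threshold(image, lam):
    """Complex soft-thresholding: shrink magnitudes by lam, keep the phase."""
    magnitude = np.abs(image)
    scale = np.maximum(magnitude - lam, 0.0) / np.maximum(magnitude, 1e-12)
    return image * scale


def unrolled_recon(kspace, mask, regularizer, n_iter=10, step=1.0):
    """Alternate data consistency and regularization for a fixed number of iterations.

    kspace      : measured k-space, zero where mask is False
    mask        : boolean sampling mask (True = acquired sample)
    regularizer : callable image -> image (hand-crafted prior in compressed
                  sensing, trained network in unrolled DL reconstruction)
    """
    image = ifft2c(kspace)  # zero-filled starting point
    for _ in range(n_iter):
        # Data consistency: gradient step on 0.5 * ||M F x - y||^2
        residual = mask * fft2c(image) - kspace
        image = image - step * ifft2c(residual)
        # Regularization step
        image = regularizer(image)
    return image


# Toy example: rectangular "phantom", every 4th phase-encoding line kept
phantom = np.zeros((64, 64))
phantom[20:44, 28:36] = 1.0
mask = np.zeros((64, 64), dtype=bool)
mask[:, ::4] = True     # regular fourfold undersampling
mask[:, 28:36] = True   # fully sampled k-space center
kspace = mask * fft2c(phantom)

zero_filled = ifft2c(kspace)
recon = unrolled_recon(kspace, mask, lambda x: soft_threshold(x, 0.01), n_iter=20)
```

With an identity regularizer the loop simply returns the zero-filled image, because that image already matches the acquired samples exactly; all of the "reconstruction" therefore comes from the regularizer, which is precisely the part that DL replaces with learned parameters.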
To the best of our knowledge, studies that prospectively employ DL reconstruction of rapidly acquired, undersampled spine MRI of patients in actual clinical scenarios are lacking in the literature.
We hypothesized that a substantial reduction in examination time could be achieved without compromising image quality or diagnostic confidence. Furthermore, we hypothesized that diagnostic interchangeability between the undersampled DL-reconstructed images and the fully sampled images could be attained.
The purpose of our study was to investigate the diagnostic interchangeability of standard turbo spin-echo (TSE) acquisition and an unrolled DL-reconstructed TSE (hereafter, TSEDL) T1- and T2-weighted acquisition method, and to test their impact on acquisition time, image quality, and diagnostic confidence.
Materials and Methods
Study Design and Participants
This prospective single-center study is registered in the German Clinical Trials Register (DRKS) with identification number DRKS00023278 and approved by the institutional review board. Study participation was voluntary and written informed consent was acquired from all participants. Our study was conducted in accordance with the Declaration of Helsinki and its amendments. The prototype DL reconstruction was provided by Siemens Healthcare. Full control of participant data was maintained by the authors who are not employees of Siemens. From November 2020 to July 2021, consecutive participants aged 18 years and older with a clinical indication for spine MRI at a university teaching hospital, who gave written permission to undergo two MRI examinations during the same session, were included. Patients who were younger than 18 years, who refused to undergo an additional MRI examination, or who had incomplete MRI data sets were excluded.
Imaging Protocol
Standard and accelerated noncontrast-enhanced sagittal T1-weighted and T2-weighted MRI scans were obtained in each participant. MRI data were prospectively undersampled using threefold acceleration for T1-weighted and fourfold acceleration for T2-weighted sequences, whereas our standard clinical protocol is acquired with no acceleration for T1-weighted and twofold acceleration for T2-weighted sequences. The undersampled data were reconstructed online using the DL algorithm, which is integrated in the proprietary image reconstruction on the scanner. Detailed information on the network architecture and on the training, validation, and test sets is provided in Appendix S1 and Figure S1. Detailed acquisition parameters for standard TSE and TSEDL at 1.5 T and 3 T are given in Tables S1 and S2. Participants were imaged with a spine coil on 1.5-T and 3-T MAGNETOM scanners (Aera, Avanto Fit, Prisma Fit, and Vida; Siemens Healthineers).
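As a rough check on the acceleration factors above, the nominal time saving from undersampling alone can be computed under one simplifying assumption: scan time scales inversely with the acceleration factor R when all other parameters are held fixed. This is only an approximation, because TR, echo train length, and the other settings in Tables S1 and S2 also differ between protocols and field strengths. The function name is hypothetical.

```python
def nominal_time_reduction(r_standard: float, r_accelerated: float) -> float:
    """Fractional scan-time reduction when the acceleration factor changes
    from r_standard to r_accelerated, all other parameters held fixed.

    Scan time is assumed proportional to 1 / R (number of acquired
    phase-encoding lines), so the reduction is 1 - r_standard / r_accelerated.
    """
    if r_standard <= 0 or r_accelerated <= 0:
        raise ValueError("acceleration factors must be positive")
    return 1.0 - r_standard / r_accelerated


# Acceleration factors described above
t1 = nominal_time_reduction(r_standard=1, r_accelerated=3)  # T1-weighted
t2 = nominal_time_reduction(r_standard=2, r_accelerated=4)  # T2-weighted
print(f"T1-weighted: {t1:.0%} nominal reduction")  # T1-weighted: 67% nominal reduction
print(f"T2-weighted: {t2:.0%} nominal reduction")  # T2-weighted: 50% nominal reduction
```

The measured median reductions reported in the Results (62% for T1-weighted and 78% for T2-weighted imaging) differ from these nominal values, so the sketch only isolates the contribution of the acceleration factor itself.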
Image Analysis
Five readers (two board-certified radiologists and three fourth-year radiology trainees) performed independent image analyses. A.E.O. is a consultant radiologist with 10 years of experience in interpreting spine MRI, S.A. is a board-certified radiologist with 5 years of experience, and H.A., S.G., and J.H. each have 2 years of experience. The readers were blinded to reconstruction type, clinical and radiologic reports, and each other's assessments. All patient- and sequence-identifying markers were removed, and the images were displayed without annotations. To reduce recall bias, all readers analyzed the TSE and TSEDL data sets in random, mixed order in separate sessions, with a minimum 4-week washout period between sessions.
Interchangeability Study for Pathologic Detection and Structural Assessment
Interchangeability was assessed for the reader judgments by using previously reported definitions (5,6). Interchangeability refers to the capacity of a new diagnostic technique (TSEDL in our study) to replace a standard clinical sequence (ie, TSE). It is demonstrated when the agreement between judgments based on the standard technique and judgments based on the new technique is not lower, beyond a prespecified margin, than the agreement between different readers who all evaluate the standard clinical sequence (5,7–9).
Interchangeability was examined for eight outcome parameters separately. The following outcome parameters were graded as present or absent on a vertebral level: focal bone marrow signal abnormalities, Modic changes (10), intervertebral disk degeneration, Schmorl nodes, facet arthropathy, endplate fractures, ligamentum flavum lesions, and foraminal stenosis.
The code for the full interchangeability analysis is available at an open repository (https://github.com/Open-source-code-for-radiology/R-Code-for-the-interchangeability-Analysis).
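A simplified point estimate of the individual equivalence index can be sketched as follows. The sketch assumes that each finding is recorded as present (1) or absent (0) per vertebra, reader, and protocol, and it follows the general idea of (5): the disagreement rate between different readers when one reads TSEDL and the other reads TSE, minus the disagreement rate between different readers who both read TSE. The function name and array layout are hypothetical. The sketch omits the generalized estimating equations and CIs that the study used to account for clustering, and it is not the authors' R code from the repository above.

```python
import itertools

import numpy as np


def individual_equivalence_index(standard, new):
    """Point estimate of the individual equivalence index for a binary finding.

    standard, new : arrays of shape (n_vertebrae, n_readers) with 0/1 ratings
                    from the standard (TSE) and new (TSEDL) protocol.
    Returns disagreement(new vs standard) - disagreement(standard vs standard),
    both averaged over ordered pairs of *different* readers.
    """
    standard = np.asarray(standard)
    new = np.asarray(new)
    if standard.shape != new.shape:
        raise ValueError("standard and new must have the same shape")
    n_readers = standard.shape[1]
    within_standard, across_protocols = [], []
    for j, k in itertools.permutations(range(n_readers), 2):
        within_standard.append(np.mean(standard[:, j] != standard[:, k]))
        across_protocols.append(np.mean(new[:, j] != standard[:, k]))
    return float(np.mean(across_protocols) - np.mean(within_standard))


# Hypothetical toy data: 10 vertebrae, 5 readers, all reading "absent" on TSE;
# on TSEDL every reader calls the finding "present" at the first vertebra.
tse = np.zeros((10, 5), dtype=int)
tse_dl = tse.copy()
tse_dl[0, :] = 1
iei = individual_equivalence_index(tse, tse_dl)
print(f"individual equivalence index = {iei:.1%}")  # individual equivalence index = 10.0%
```

Under the 5% margin used in this study, these toy protocols would not be judged interchangeable. In the actual analysis, the decision rests on the confidence interval from the generalized estimating equations rather than on the point estimate alone.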
Image Quality–based Analysis
The outcome measures were sharpness of anatomic structures (intervertebral disk, spinal cord, cerebrospinal fluid, nerve roots, facet joint, and neuroforamina), artifacts, noise, overall image quality, and diagnostic confidence (Appendix S1).
Sharpness, overall image quality, and diagnostic confidence were rated on an ordinal four-point scale whereby 1 indicates poor, 2 indicates moderate, 3 indicates good, and 4 indicates excellent. Artifacts and noise were assessed as follows: abundant artifacts and/or noise, severe impediment of image quality by artifacts and/or noise, slight impediment of image quality by artifacts and/or noise, and no visible artifacts and/or noise. A dedicated workstation (GE Centricity PACS RA1000; GE Healthcare) was used for image analysis in certified reading room conditions.
Statistical Analysis
A full-scale power analysis is not feasible for the interchangeability analysis we report; however, we estimated a lower bound of the sensitivity of the study design to detect a difference of 5% between agreements within the reference method TSE, on the one hand, and agreements between TSE and TSEDL, on the other. This calculation was performed after participant inclusion and data collection by using the McNemar test, which accounts only for the correlation of paired observations. Details on the power calculation, interchangeability, and code used are given in Appendix S1. Medians and IQRs are reported for ordered categorical variables, and means and SDs are reported for continuous variables. For the assessment of interchangeability, a generalized linear model was fit. To account for clustered data (each participant's images were interpreted multiple times by five readers using two techniques), generalized estimating equations (binomial distribution, logit link function) were used (5–7). For the two protocols to be considered interchangeable, the excess disagreement between them had to be less than 5% (5) (ie, the individual equivalence index should not exceed 5%). For the image quality–based analysis, intergroup comparisons were performed using the Wilcoxon signed-rank test. Kendall tau (τ) and the Kendall coefficient of concordance (W) were used to assess concordance. Weighted Fleiss κ values and intraclass correlation coefficients were used to evaluate agreement. The McNemar test was used to compare the frequencies of major pathologic findings. P < .05 was considered indicative of a statistically significant difference. Statistical analyses were performed using SPSS Statistics version 26.0 (IBM) and R version 4.1.1 (The R Foundation). The power analysis and interchangeability analysis (encompassing intraprotocol interreader, interprotocol interreader, and interprotocol intrareader agreement and concordance analyses) were conducted by a biostatistician (J.J.) at our university.
The image quality–based analysis was conducted by the first author (H.A.).
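Because the four-point quality ratings are heavily tied (many medians of 4 with an IQR of 0), a tie-corrected form of Kendall W is usually preferred. The following NumPy sketch is a generic textbook implementation with hypothetical function names, not the SPSS or R routine used in the study; ratings are arranged as raters × items.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall coefficient of concordance W with the usual tie correction.

    ratings : array of shape (n_raters, n_items), e.g. ordinal quality scores.
    """
    ratings = np.asarray(ratings, dtype=float)
    m, n = ratings.shape
    ranks = np.vstack([average_ranks(row) for row in ratings])
    rank_sums = ranks.sum(axis=0)
    s = np.sum((rank_sums - rank_sums.mean()) ** 2)
    # Tie correction: for each rater, sum t^3 - t over groups of tied scores
    ties = 0.0
    for row in ratings:
        _, counts = np.unique(row, return_counts=True)
        ties += np.sum(counts.astype(float) ** 3 - counts)
    denominator = m ** 2 * (n ** 3 - n) - m * ties
    if denominator <= 0:
        raise ValueError("W is undefined when every rater gives all items the same score")
    return float(12.0 * s / denominator)


# Hypothetical scores from three readers for four examinations
print(kendalls_w([[4, 3, 4, 2], [4, 3, 3, 2], [4, 4, 3, 2]]))
```

The explicit error for uniformly rated items is a deliberate choice: with strong ceiling effects (every reader scoring every image 4), concordance statistics become undefined, which is one reason agreement and concordance values can appear low even when readers rarely disagree.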
Results
Participant Characteristics
All imaging studies were successfully performed. A total of 50 participants with various spinal abnormalities and with complete MRI data sets were included. Of 204 patients assessed for eligibility, 20 patients younger than 18 years, 133 patients who refused to participate in the study, and one patient with an incomplete MRI data set were excluded (Fig 1). The mean participant age was 46 years ± 18 (SD) (range, 18–87 years), including 26 men and 24 women. Table 1 shows demographics of the study participants and indications for spine MRI. Figures 2–4, S2, and S3 show examples of TSE and TSEDL images in participants with intervertebral disk protrusions (Fig 2), Modic type II changes (Fig 3), vertebral impression fracture and bone marrow edema (Fig 4), acute vertebral compression fractures and hemangiomas (Fig S2), and vertebral metastases (Fig S3).
Acquisition Time
The median total acquisition time of TSEDL (T1- and T2-weighted sequences) was 100 seconds (IQR, 27.25 seconds; range, 60–172 seconds) and the median total acquisition time of TSE was 328 seconds (IQR, 113.25 seconds; range, 264–710 seconds). There was a median total acquisition time reduction of 227 seconds (IQR, 101.25 seconds; P < .001), enabling a median acquisition time reduction of 70% (IQR, 11%).
In terms of T2-weighted imaging, the median acquisition time of TSEDL was 35.50 seconds (IQR, 10 seconds; range, 25–87 seconds) versus a median acquisition time for TSE of 160.50 seconds (IQR, 48 seconds; range, 113–323 seconds). There was a median acquisition time reduction of 123.50 seconds (IQR, 40 seconds; P < .001), enabling a median acquisition time reduction of 78% (IQR, 2.6%).
In terms of T1-weighted imaging, the median acquisition time of TSEDL was 61 seconds (IQR, 23 seconds; range, 28–132 seconds) versus a median acquisition time for TSE of 157.50 seconds (IQR, 84 seconds; range, 128–408 seconds). There was a median acquisition time reduction of 97 seconds (IQR, 42 seconds; P < .001), enabling a median acquisition time reduction of 62% (IQR, 12.7%).
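The reported median reductions are, presumably, medians of the per-participant paired differences, which need not equal the difference between the two group medians (for the total acquisition time, 328 - 100 = 228 seconds vs the reported 227 seconds). A minimal illustration with hypothetical acquisition times for five participants (not study data):

```python
from statistics import median

# Hypothetical total acquisition times in seconds (not study data)
tse = [270, 300, 330, 400, 700]
tse_dl = [100, 70, 90, 160, 120]

paired_diff = [a - b for a, b in zip(tse, tse_dl)]
relative_reduction = [1 - b / a for a, b in zip(tse, tse_dl)]

print(median(tse) - median(tse_dl))         # 230 (difference of medians)
print(median(paired_diff))                  # 240 (median of paired differences)
print(f"{median(relative_reduction):.0%}")  # 73% (median relative reduction)
```

Reporting the median of paired differences matches the paired design and the Wilcoxon signed-rank test used for the P values.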
Interchangeability Analysis
Table 2 details the individual equivalence indexes with CIs used to assess interchangeability. Interreader agreement of the accelerated TSEDL and standard TSE was similar to that of standard TSE alone, with a maximum disagreement of 3.2%. In summary, all absolute individual equivalence indexes were less than 4%. The CIs for all evaluated parameters were within the critical limit [–5%, +5%], indicating interchangeability between standard TSE and TSEDL.
Agreement and Concordance (κ and Kendall Statistics)
Interreader agreement of the accelerated TSEDL and standard TSE was similar to the interreader agreement of standard TSE alone. In summary, the interprotocol (TSE vs TSEDL) interreader agreement was moderate to substantial, with κ values ranging from 0.48 to 0.78. The intraprotocol (TSE vs TSE) interreader agreement was also moderate to substantial, with κ values ranging from 0.44 to 0.80. Finally, the interprotocol (TSE vs TSEDL) intrareader agreement was substantial to almost perfect, with κ values ranging from 0.75 to 0.98. Similarly, the interprotocol (TSE vs TSEDL) interreader concordance was moderate to substantial, with Kendall W values ranging from 0.57 to 0.80. The intraprotocol (TSE vs TSE) interreader concordance was moderate to almost perfect, with Kendall W values ranging from 0.59 to 0.84. Finally, the interprotocol (TSE vs TSEDL) intrareader concordance was substantial to almost perfect, with Kendall τ values ranging from 0.77 to 0.98. Table 3 delineates the agreement and concordance results.
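The verbal labels above (moderate, substantial, almost perfect) follow the commonly used Landis and Koch bands. The sketch below computes an unweighted Fleiss κ from a table of rating counts and maps it to those bands. For binary present/absent findings, weighting makes no difference; for ordinal scales, the weighted κ used in the study would differ. Function names are hypothetical.

```python
import numpy as np


def fleiss_kappa(counts):
    """Unweighted Fleiss kappa.

    counts : array (n_items, n_categories); counts[i, j] is the number of
             readers who assigned item i to category j. Every row must sum
             to the same number of readers (at least 2).
    """
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    if n_raters < 2 or not np.allclose(counts.sum(axis=1), n_raters):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    p_category = counts.sum(axis=0) / (n_items * n_raters)
    p_item = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_observed = p_item.mean()
    p_expected = np.sum(p_category ** 2)
    if np.isclose(p_expected, 1.0):
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return float((p_observed - p_expected) / (1.0 - p_expected))


def landis_koch(kappa):
    """Verbal label for a kappa value according to the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical example: 4 vertebrae rated absent/present by 5 readers
k = fleiss_kappa([[5, 0], [0, 5], [4, 1], [5, 0]])
print(round(k, 3), landis_koch(k))
```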
Frequency of Major Findings
Both protocols (TSE and TSEDL) had similar rates for detecting major abnormalities (Table 4). There was no evidence that readers were more likely to report major findings with standard TSE compared with TSEDL acquisition (P ≥ .18 for all analyzed parameters).
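The McNemar test compares paired present/absent calls and uses only the discordant pairs. A minimal standard-library sketch of the exact (binomial) variant follows; the article does not state whether the exact or the χ² form was used, and the counts in the example are hypothetical.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value.

    b : pairs with the finding reported on TSE but not on TSEDL
    c : pairs with the finding reported on TSEDL but not on TSE
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(mcnemar_exact(b=6, c=2))  # 0.2890625
```

Concordant pairs (finding present or absent on both protocols) do not enter the statistic, so a high overall detection rate does not by itself make the test more sensitive.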
Image Quality–based Analysis
Detailed results for the image quality–based assessments of all readers are given in Table S3. In summary, for all readers, we found no evidence of a difference between TSE and TSEDL images regarding sharpness of anatomic structures or artifacts (P ≥ .06). The most common artifacts were motion artifacts (mostly seen on TSE images), residual aliasing artifacts (seen on TSE and TSEDL images), and banding artifacts (seen on TSEDL images). Figure 5 illustrates examples of the introduced artifacts in the DL images.
In terms of image quality and diagnostic confidence, no evidence of a difference was found between conventional TSE and TSEDL images. The mean T2-weighted image quality was 3.76 ± 0.58 for TSE and 3.83 ± 0.48 for TSEDL (median, 4 [IQR, 0] for both; P = .07). The mean T1-weighted image quality was 3.73 ± 0.57 for TSE and 3.78 ± 0.47 for TSEDL (median, 4 [IQR, 0] for both; P = .06). The mean T2-weighted diagnostic confidence was 3.83 ± 0.54 for TSE and 3.86 ± 0.45 for TSEDL (median, 4 [IQR, 0] for both; P = .37). The mean T1-weighted diagnostic confidence was 3.82 ± 0.53 for TSE and 3.87 ± 0.42 for TSEDL (median, 4 [IQR, 0] for both; P = .15) (Table S3).
Furthermore, TSEDL images were rated as having less noise (P < .001). Interreader interprotocol (TSE vs TSEDL) agreement and concordance ranged from fair (for noise, intraclass correlation coefficient = 0.23 and Kendall τ = 0.29) to almost perfect (for overall image quality, intraclass correlation coefficient = 0.85 and Kendall τ = 0.83).
Discussion
Deep learning (DL)–based MRI reconstructions can reduce examination times for turbo spin-echo (TSE) acquisitions. Studies that prospectively employ DL-based image reconstructions in rapidly acquired, undersampled spine MRI data sets in actual clinical scenarios are lacking in the literature. Our study investigated the diagnostic interchangeability of an unrolled DL-reconstructed TSE T1- and T2-weighted acquisition method with standard TSE and tested their impact on acquisition time, image quality, and diagnostic confidence.
Our findings show that the TSEDL acquisition method enabled an approximately 70% reduction in examination time (P < .001) and that TSEDL was interchangeable with standard TSE (the individual equivalence indexes for all analyzed findings were less than 4%). Moreover, there was no evidence of a difference between the two protocols regarding detection of major findings (P ≥ .18 for all analyzed parameters). This is important because DL-based reconstruction might introduce "instabilities" in image reconstruction, such as "masking" of certain small pathologic findings or introduction of artifacts (4). Some artifacts were introduced, such as banding artifacts, which are characteristic of Cartesian DL reconstruction, particularly in regions of the reconstructed image with low signal-to-noise ratio. These artifacts have a streaking pattern aligned with the phase-encoding direction (11) (Fig 5). However, in the study sample there was no evidence of a difference between the two protocols regarding artifacts, image quality, or diagnostic confidence (P ≥ .06).
Unlike conventional acceleration techniques, including compressed sensing, DL-based acceleration does not compromise resolution or apparent image quality (12). In part, this is because the physical model of parallel imaging is integrated, through the coil sensitivities, into the variational neural network architecture (13,14). Liu et al (15) showed that DL reconstruction networks can directly reconstruct pixel-wise T2 maps from eightfold-accelerated k-space data without compromising accuracy. DL-accelerated T2-weighted TSE sequences have recently been implemented in prostate MRI, where improved image quality (reduced noise, enhanced overall image quality, and reduced artifacts; P < .01 for both readers) and a scan time reduction of more than 60% were described (16,17). In a study of healthy volunteers (18), the feasibility of DL reconstruction was shown in various musculoskeletal applications (knee, shoulder, and spine), with improved image quality (enhanced edge sharpness and reduced noise; P < .001).
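The role of the coil sensitivities can be made concrete with the multicoil (SENSE-type) forward operator that variational networks typically use in their data-consistency term: the object is weighted by each coil's sensitivity map, Fourier transformed, and masked. The sketch below uses random sensitivity maps and a hypothetical fourfold Cartesian mask purely for illustration, and it checks that the adjoint operator is implemented consistently (the data-consistency gradient depends on it). It is not the vendor implementation.

```python
import numpy as np


def forward(image, sens, mask):
    """A x = M F (S_c x) for every coil c.

    image : (ny, nx) complex image
    sens  : (n_coils, ny, nx) coil sensitivity maps
    mask  : (ny, nx) boolean sampling mask
    """
    coil_images = sens * image[None, :, :]
    kspace = np.fft.fft2(coil_images, norm="ortho")  # FFT over the last two axes
    return kspace * mask[None, :, :]


def adjoint(kspace, sens, mask):
    """A^H y = sum over coils of conj(S_c) F^H (M y_c)."""
    coil_images = np.fft.ifft2(kspace * mask[None, :, :], norm="ortho")
    return np.sum(np.conj(sens) * coil_images, axis=0)


rng = np.random.default_rng(0)
ny, nx, n_coils = 32, 32, 4
sens = rng.standard_normal((n_coils, ny, nx)) + 1j * rng.standard_normal((n_coils, ny, nx))
mask = np.zeros((ny, nx), dtype=bool)
mask[:, ::4] = True  # hypothetical fourfold Cartesian undersampling
x = rng.standard_normal((ny, nx)) + 1j * rng.standard_normal((ny, nx))
y = rng.standard_normal((n_coils, ny, nx)) + 1j * rng.standard_normal((n_coils, ny, nx))

# Adjoint test: <A x, y> should equal <x, A^H y>
lhs = np.vdot(forward(x, sens, mask), y)
rhs = np.vdot(x, adjoint(y, sens, mask))
print(np.isclose(lhs, rhs))  # True
```

Because the coil sensitivities enter the operator explicitly, the network does not have to learn how multiple receiver coils encode spatial information; it only has to learn the regularization.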
Using DL-based algorithms in knee MRI, Recht et al (8) demonstrated that superior image quality was achieved despite a fourfold acceleration of image acquisition (P < .001). Furthermore, the authors showed that interchanging the DL-based reconstructions and standard knee MRI sequences produced discordant clinical diagnoses of less than 4% (the individual equivalence index for all analyzed features was less than 4%). However, the authors used retrospective undersampling to simulate the acceleration that could be achieved in routine clinical care (8). There is a need to translate DL technology to clinical scenarios and bridge the gap between development and deployment (12). In our study, prospective undersampling was used and led to an actual scan time reduction in clinical routine (median reduction of total examination time, 70%; P < .001) for participants with various degenerative (eg, intervertebral disk degeneration and Modic changes) and nondegenerative (eg, fractures and bone marrow signal abnormalities) spinal conditions, enabling a true comparison with standard TSE images.
The findings of our study should be interpreted in the context of its limitations. First, the small sample size, heterogeneity of the included participants, and single-center design might limit the generalizability of our findings. Although five readers performed the analyses in an attempt to at least partially compensate for this limitation, our findings must be verified in a larger participant sample with sufficient power for the conducted analyses. Second, the acquisition parameters for TSE and TSEDL differed slightly between the 1.5-T and 3-T scanners. Third, of the six participants who underwent full-spine TSE MRI, TSEDL acquisition covered the full spine in one participant and only the lumbar spine in five participants. This difference in the field of view might have slightly biased the assessment of these participants. Fourth, the analyzed parameters had little variance in the included sample because the analyzed pathologic findings were absent in many of the examined vertebrae, which might affect the interchangeability results. Finally, our study focused on noncontrast-enhanced sagittal T1- and T2-weighted imaging; the DL technique should also be examined in the axial plane and in contrast-enhanced examinations to widen its potential clinical application. Nonetheless, our study constitutes a preliminary clinical investigation with prospective deployment of the DL-based acceleration technique in musculoskeletal MRI and spine MRI, and it provides encouraging results.
In conclusion, the data-driven deep learning (DL)–reconstructed turbo spin-echo (TSE) method was shown to be clinically feasible and interchangeable with standard T1- and T2-weighted TSE acquisition for detecting various abnormalities of the spine. DL-reconstructed TSE provided excellent image quality and diagnostic confidence, with a 70% reduction in examination time. Consequently, the DL technique might set the stage for ultrafast spine MRI. Future work should apply the DL technique to other sequences and to three-dimensional spine MRI.
Author Contributions
Author contributions: Guarantors of integrity of entire study, J.H., S.A., A.E.O.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, H.A., J.H., S.G., D.N., M.M., M.N.; clinical studies, H.A., J.H., S.G., S.A., A.E.O.; statistical analysis, H.A., J.J., A.E.O.; and manuscript editing, H.A., J.H., S.G., S.A., G.K., D.N., M.M., M.N., A.E.O.
Data sharing: Data generated or analyzed during the study are available from the corresponding author by request.
References
- 1. Rapid Musculoskeletal MRI in 2021: Value and Optimized Use of Widely Accessible Techniques. AJR Am J Roentgenol 2021;216(3):704–717.
- 2. Self-Supervised Physics-Based Deep Learning MRI Reconstruction Without Fully-Sampled Data. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, April 3–7, 2020. Piscataway, NJ: IEEE, 2020; 921–925.
- 3. Deep learning. Nature 2015;521(7553):436–444.
- 4. On instabilities of deep learning in image reconstruction and the potential costs of AI. Proc Natl Acad Sci U S A 2020;117(48):30088–30095.
- 5. Testing for interchangeability of imaging tests. Acad Radiol 2014;21(11):1483–1489.
- 6. MRI of non-specific low back pain and/or lumbar radiculopathy: do we need T1 when using a sagittal T2-weighted Dixon sequence? Eur Radiol 2020;30(5):2583–2593.
- 7. Comparison of a fast 5-min knee MRI protocol with a standard knee MRI protocol: a multi-institutional multi-reader study. Skeletal Radiol 2018;47(1):107–116.
- 8. Using Deep Learning to Accelerate Knee MRI at 3 T: Results of an Interchangeability Study. AJR Am J Roentgenol 2020;215(6):1421–1429.
- 9. Comparison of a fast 5-minute shoulder MRI protocol with a standard shoulder MRI protocol: a multiinstitutional multireader study. AJR Am J Roentgenol 2017;208(4):W146–W154.
- 10. Degenerative disk disease: assessment of changes in vertebral body marrow with MR imaging. Radiology 1988;166(1 Pt 1):193–199.
- 11. MRI banding removal via adversarial training. Adv Neural Inf Process Syst 2020;33:7660–7670. https://dl.acm.org/doi/10.5555/3495724.3496366.
- 12. Prospective Deployment of Deep Learning in MRI: A Framework for Important Considerations, Challenges, and Recommendations for Best Practices. J Magn Reson Imaging 2021;54(2):357–371.
- 13. Artificial intelligence for MR image reconstruction: an overview for clinicians. J Magn Reson Imaging 2021;53(4):1015–1028.
- 14. Learning a variational network for reconstruction of accelerated MRI data. Magn Reson Med 2018;79(6):3055–3071.
- 15. MANTIS: Model-Augmented Neural neTwork with Incoherent k-space Sampling for efficient MR parameter mapping. Magn Reson Med 2019;82(1):174–188.
- 16. Deep learning-accelerated T2-weighted imaging of the prostate: Reduction of acquisition time and improvement of image quality. Eur J Radiol 2021;137:109600.
- 17. Accelerated T2-Weighted TSE Imaging of the Prostate Using Deep Learning Image Reconstruction: A Prospective Comparison with Standard T2-Weighted TSE Imaging. Cancers (Basel) 2021;13(14):3593.
- 18. Feasibility and Implementation of a Deep Learning MR Reconstruction for TSE Sequences in Musculoskeletal Imaging. Diagnostics (Basel) 2021;11(8):1484.
Article History
Received: Nov 16 2021
Revision requested: Feb 7 2022
Revision received: July 28 2022
Accepted: Sept 8 2022
Published online: Nov 01 2022