Fully Automated and Standardized Segmentation of Adipose Tissue Compartments via Deep Learning in 3D Whole-Body MRI of Epidemiologic Cohort Studies

Published Online: https://doi.org/10.1148/ryai.2020200010

Abstract

Purpose

To enable fast and reliable assessment of subcutaneous and visceral adipose tissue compartments derived from whole-body MRI.

Materials and Methods

Quantification and localization of different adipose tissue compartments derived from whole-body MR images is of high interest in research concerning metabolic conditions. For correct identification and phenotyping of individuals at increased risk for metabolic diseases, a reliable automated segmentation of adipose tissue into subcutaneous and visceral adipose tissue is required. In this work, a three-dimensional (3D) densely connected convolutional neural network (DCNet) is proposed to provide robust and objective segmentation. In this retrospective study, 1000 cases (average age, 66 years ± 13 [standard deviation]; 523 women) from the Tuebingen Family Study database and the German Center for Diabetes research database and 300 cases (average age, 53 years ± 11; 152 women) from the German National Cohort (NAKO) database were collected for model training, validation, and testing, with transfer learning between the cohorts. These datasets included variable imaging sequences, imaging contrasts, receiver coil arrangements, scanners, and imaging field strengths. The proposed DCNet was compared to a similar 3D U-Net segmentation in terms of sensitivity, specificity, precision, accuracy, and Dice overlap.

Results

Fast (range, 5–7 seconds) and reliable adipose tissue segmentation can be performed with high Dice overlap (0.94), sensitivity (96.6%), specificity (95.1%), precision (92.1%), and accuracy (98.4%) from 3D whole-body MRI datasets (field of view coverage, 450 × 450 × 2000 mm). Segmentation masks and adipose tissue profiles are automatically reported back to the referring physician.

Conclusion

Automated adipose tissue segmentation is feasible in 3D whole-body MRI datasets and is generalizable to different epidemiologic cohort studies with the proposed DCNet.

Supplemental material is available for this article.

Keywords: Adipose Tissue (Obesity Studies), Adults, Convolutional Neural Network (CNN), Epidemiology, MR-Imaging, Neural Networks, Obesity, Segmentation, Supervised learning, Tissue Characterization, Transfer learning, Volume Analysis

© RSNA, 2020

Summary

Fully automated and fast assessment of visceral and subcutaneous adipose tissue compartments using whole-body MRI is feasible with a deep learning network; a robust and generalizable architecture was investigated that enables objective segmentation and quick phenotypic profiling.

Key Points

  • Objective, fast (<7 seconds), and reliable assessment of adipose tissue compartments (subcutaneous and visceral fat) is feasible in MRI datasets for noninvasive and phenotypic adipose tissue profiling.

  • A robust architecture generalizable to several imaging sequences, image contrasts, nine imaging sites, coil arrangements, patient positioning, three scanners, and two field strengths demonstrates high Dice overlap (0.94), as well as high classification sensitivity (96.6%), specificity (95.1%), precision (92.1%), and accuracy (98.4%).

  • Integration into the clinical workflow can support end users, such as physicians and physiologists, by reducing the time and effort required for image analysis.

Introduction

MRI is a widely used imaging modality that enables highly resolved anatomic and functional depiction of organs, tissues, and disease processes. The acquisition of imaging data has become an integral aspect of reliable and fast disease assessment, staging, therapy, and treatment monitoring. In addition to conducting individual assessment of disease using MRI, large-scale cohort studies provide insight into factors influencing pathogenesis of various diseases (1,2).

The increasing prevalence of type 2 diabetes mellitus demands additional research in large-cohort MRI studies and fast automatic phenotyping to determine suitable biomarkers for risk assessment, as well as to deliver personalized lifestyle intervention treatment for prevention. The adipose tissue distribution in the body has been shown to be a key indicator of the pathogenesis of insulin resistance and type 2 diabetes mellitus (3–5). The use of MRI for whole-body assessment of adipose tissue distribution has proved to be a promising noninvasive method for screening in large-scale cohort studies (6–8). One crucial step in the evaluation of medical image content is the recognition and segmentation of specific organs or tissues, that is, performing a voxelwise classification known as semantic segmentation. Several methods for automated segmentation of MR images have been proposed, including those relying on explicit models (9), general correspondence (10), and random forest models (11), as well as deep learning (12) approaches, including convolutional neural networks (CNNs) (13–15). Most CNN-based segmentation architectures are derived from the U-Net (15), owing to its good generalizability and performance (16,17). Landmark detection (18,19) or anatomic object localization (20) has already been demonstrated and can be an important preprocessing step for segmentation. Deep learning–based approaches for use with MRI data have been proven effective for segmentation of brain matter (21,22), brain tumors (23,24), the liver (25), and cardiac ventricles (26).

Approaches for automatic adipose tissue segmentation have been proposed mainly by employing machine learning techniques (27). Suitable MRI contrast for differentiating adipose tissue from all other tissue types is provided by strongly T1-weighted imaging, which yields hyperintense adipose tissue signal with respect to lean tissue (LT). An alternative approach for fat-water imaging is phase-sensitive multiecho gradient-echo imaging (Dixon technique), which exploits the chemical shift difference between the water signal and the methylene and methyl signals of triglycerides, a principle first described by Dixon (28). The multiecho acquisition enables extraction of fat-water separated images as well as maps of confounding factors, such as fat fraction and R2* (29–31). This information can be used to preprocess the data (eg, bias field correction [32]), or it can be incorporated into the segmentation. Adipose tissue segmentations include multiatlas-based machine learning for fat images of regional muscles (33), contour-based segmentations (34), and machine learning clustering methods (35). More recently, two-dimensional (2D) deep learning networks for abdominal adipose tissue segmentation using the Dixon MRI technique (36,37) were proposed. Adipose tissue segmentation on whole-body MRI enabled the use of adipose tissue profiling (38) to investigate metabolic risks for cardiac disease (39), type 2 diabetes mellitus (40), and cancer (41).

The aforementioned studies, focused on either a single organ or a body region, were performed in smaller-scale or single cohorts, or provided only 2D processing. In this work, we propose a semantic segmentation network for whole-body adipose tissue segmentation that operates on single-parameter and multiparametric three-dimensional (3D) whole-body MRI from different multicenter epidemiologic patient cohort studies. The aim of this work is to provide a segmentation network that is robust to datasets acquired in different imaging sites, with varying scanner types, field strengths, and receiver coil set-ups, and for changing imaging sequences with varying imaging resolutions and different patient positioning.

This study had four main objectives: (a) to develop a 3D deep learning architecture as a combination of a densely connected network with merge-and-run mappings for attention-based multiresolution focusing in an encoder-decoder segmentation, together with a relative positional encoding of input patches; (b) to determine the performance of this architecture in terms of robustness, sensitivity, specificity, precision, accuracy, and Dice overlap in comparison with a 3D U-Net segmentation; (c) to assess the possibility of transfer learning between different epidemiologic cohorts and the effect of training database composition on performance; and (d) to integrate the model into the clinical workflow with automated reporting of adipose tissue head-feet profiles to enable phenotypic profiling in an epidemiologic setting.

Materials and Methods

A CNN is proposed for 3D semantic segmentation of whole-body adipose tissue into subcutaneous adipose tissue (SAT), abdominal visceral adipose tissue (VAT), LT, and background (BG). To cope with varying tissue appearances and distribution, the network uses a combination of architectural designs (14,15,42,43) and proposes the inclusion of positional encoding for attention-based (44) multiresolution learning (45) with an enhanced receptive field of view (14,15).

The network is trained and tested on whole-body MRI data from different multicenter epidemiologic patient databases with varying image contrast, imaging dimensionality, scanners, field strengths, patient positioning, and coil arrangement.

Epidemiologic Patient Databases

Database compositions, as well as scanner and acquisition parameters, are summarized in Table 1. Studies are approved by the ethics committees, and individuals gave written consent. Inclusion and exclusion criteria and further study information are stated in Bamberg et al (2) and Machann et al (6).

Table 1: TUEF/DZD and NAKO Database Information

Tuebingen Family Study and German Center for Diabetes Research databases.—The Tuebingen Family Study (TUEF) and the studies performed in the framework of the German Center for Diabetes Research (DZD) aim to measure fat distribution prior to and after lifestyle intervention in White individuals at increased risk of type 2 diabetes owing to overweight (body mass index >27 kg/m²), a first-degree relative with type 2 diabetes, impaired glucose tolerance, and/or gestational diabetes (6). Data were acquired in a single-center study (TUEF) and multicenter studies (DZD) using 1.5- and 3.0-T scanners with whole-body T1-weighted multiple breath-hold and multislice 2D fast spin-echo sequences in the axial direction. Individuals were placed in the prone position (extended arms) with one rearrangement (the first step was head first, the second was feet first). Data were recorded using the whole-body coil of the MRI unit. From the approximately 2000 scanned individuals in four sites, 500 1.5-T cases and 500 3.0-T cases were labeled and included in this study.

German National Cohort database.—The German National Cohort (NAKO Gesundheitsstudie) aims to understand the natural history of a broad set of diseases and to potentially identify novel imaging biomarkers (2). The study randomly recruits individuals for MRI screening at five imaging centers distributed throughout Germany. From the imaging protocol, we evaluate the whole-body multiple breath-hold 3D dual gradient-echo chemical shift (Dixon) sequence, which was acquired with a 3.0-T scanner in the axial orientation. Individuals were placed in the head-first supine position. In this set-up, data recording was performed with local array receiver coils on the front and back of the individual. The scanner was equipped with an 18-channel body coil and a 32-channel spine coil from the manufacturer. Approximately 15 000 individuals had been examined at the time of this study, and MR images from 300 of them were labeled and included in this study.

Ground Truth Labeling

Training data were obtained with a semi-supervised labeling technique (34). Automatic fuzzy c-means clustering was used to presegment the whole-body MRI data, based on intensity histograms after normalization and partial volume correction, into BG, LT, and adipose tissue. A subsequent snake algorithm divided the adipose tissue into an SAT region and a VAT region. The obtained masks (BG, SAT, VAT, LT) were manually inspected and corrected, if necessary, by two trained experts with 9 (S.G.) and 17 (J.M.) years of experience in whole-body MRI on an in-house–developed graphical user interface to ensure proper data curation.
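To make the presegmentation step concrete, the following is a minimal NumPy sketch of fuzzy c-means clustering on voxel intensities, assuming three clusters (BG, LT, adipose tissue). It is illustrative only: the function and parameter choices are not the authors' implementation, and the normalization, partial volume correction, and snake-based SAT/VAT separation described above are omitted.

```python
import numpy as np

def fuzzy_cmeans(x, n_clusters=3, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means on a 1D intensity feature per voxel.

    x: flattened, normalized intensities, shape (N,).
    Returns cluster centers and the fuzzy membership matrix of shape (n_clusters, N).
    """
    rng = np.random.default_rng(seed)
    u = rng.random((n_clusters, x.size))
    u /= u.sum(axis=0, keepdims=True)            # memberships sum to 1 per voxel
    for _ in range(n_iter):
        um = u ** m
        centers = um @ x / um.sum(axis=1)        # fuzzily weighted cluster centers
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-12
        u_new = dist ** (-2.0 / (m - 1.0))       # standard FCM membership update
        u_new /= u_new.sum(axis=0, keepdims=True)
        if np.abs(u_new - u).max() < tol:
            u = u_new
            break
        u = u_new
    return centers, u

# Presegment a normalized volume into three intensity clusters (hard assignment by argmax).
volume = np.random.rand(32, 32, 32)              # stand-in for a normalized whole-body MRI volume
centers, u = fuzzy_cmeans(volume.ravel(), n_clusters=3)
labels = u.argmax(axis=0).reshape(volume.shape)
```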

Proposed Architecture

The CNN-based segmentation is inspired by the concepts developed in U-Net (15) for pixelwise localization, V-Net (14) for volumetric medical image segmentation, ResNet (43) to cope with vanishing gradients and the degradation problem, DenseNet (42) to enable deep supervision, and merge-and-run mapping (46) to provide attention-based multiresolution focusing. The proposed network architecture is depicted in Figure 1. It combines the aforementioned schemes, which result in a densely connected CNN (ie, DCNet). Specific details on the DCNet architecture are described in Appendix E1 (supplement).

Figure 1: Proposed three-dimensional (3D) densely connected convolutional neural network (DCNet) segmentation network. (a) General encoder-decoder structure that consists of merge-and-run (MRGE) blocks and intermittent transition layers for downsampling (encoder path) and upsampling (decoder path) of feature maps to provide multiresolution segmentation. MRGE blocks are built up with dense connections for deep supervision and L layers of dense convolution (Dense Conv) nodes. (b) Composition of MRGE block with dense merge-and-run connections and transition layers in encoder (max pooling) and decoder (transposed convolution [Transposed Conv]) path. (c) Each dense convolution node is a series of batch normalization, rectified linear unit (ReLU) activation, 1 × 1 × 1 convolution (conv), batch normalization, ReLU activation, and 3 × 3 × 3 convolution.

In total, the proposed DCNet consists of 154 layers, which results in approximately 12 million trainable parameters for a single- or dual-channel 3D input. Each 3D volume is first normalized into a range of 0–1 and then cropped with a sliding window into overlapping 3D input patches with a size of 32 × 32 × 32 × C, where C ∈ {1, 2} is given by the multiparametric input. For the T1-weighted fast spin-echo data, C equals 1, whereas for the multiecho chemical shift (Dixon technique) data, C equals 2 (fat and water image). RMSProp with multiclass focal loss (47) and multimetric loss (true-positive rate and Jaccard distance) is applied during training for a batch size of 48. Training was conducted over 100 epochs with early stopping. The network was implemented in TensorFlow 1.14 (https://www.tensorflow.org/) and trained on a Tesla V100 GPU (Nvidia, Santa Clara, Calif).
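The dense convolution node of Figure 1c and a densely connected stack can be sketched in a few lines of tf.keras, the framework named above. The growth rate, bottleneck width, and layer count below are illustrative assumptions and do not reproduce the exact 154-layer DCNet configuration described in Appendix E1 (supplement).

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_conv_node(x, growth_rate=16):
    """Dense convolution node (Fig 1c): BN -> ReLU -> 1x1x1 conv -> BN -> ReLU -> 3x3x3 conv."""
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv3D(4 * growth_rate, kernel_size=1, padding="same")(y)  # bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv3D(growth_rate, kernel_size=3, padding="same")(y)
    return y

def dense_block(x, n_layers=4, growth_rate=16):
    """Densely connected stack: each node sees the concatenation of all previous feature maps."""
    features = [x]
    for _ in range(n_layers):
        inp = layers.Concatenate()(features) if len(features) > 1 else features[0]
        features.append(dense_conv_node(inp, growth_rate))
    return layers.Concatenate()(features)

# Toy encoder stage on a 32 x 32 x 32 patch with C = 2 channels (Dixon fat/water input).
inputs = tf.keras.Input(shape=(32, 32, 32, 2))
x = dense_block(inputs)
x = layers.MaxPool3D(pool_size=2)(x)             # transition layer (encoder downsampling)
encoder_stage = tf.keras.Model(inputs, x)
```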

Experiments

The proposed method aims to reliably work in all cohorts and for changing input data (imaging sequences, contrasts, sites, scanners, field strengths, coil, and patient positioning). Therefore, robustness and generalizability of the proposed architecture toward changing training and test data are investigated. Robustness and reliability were investigated for all experiments by means of fourfold cross validation for the proposed DCNet and compared against a 3D U-Net segmentation (Appendix E2 [supplement]).

Differing training, validation, and test set compositions among and within the epidemiologic cohorts were created to determine whether and which information can be exchanged. Furthermore, a transfer learning scheme was applied to the proposed DCNet to conduct the database domain change and was compared against multidatabase training. The proposed network was trained on database A and tested on database B or C, denoted in the following description by A → B|C. The following four scenarios were investigated: (a) A → A: intradatabase training and testing; (b) A → B: interdatabase testing to infer generalizability; (c) A+B → A|B: transductive transfer learning with pretraining on database A (first 40 epochs), followed by fine-tuning of training on database B (remaining epochs) and testing on A or B; and (d) A&B → A|B: multidatabase learning for randomly shuffled samples from databases A and B with testing on A or B. The latter case also serves as a comparison to determine whether a guided learning approach (ie, transfer learning) is superior to multidatabase learning. Data compositions between the different epidemiologic cohorts (TUEF/DZD and NAKO) as well as between the 1.5- and 3.0-T scans of the TUEF/DZD database are investigated to infer the influence on prediction of changing imaging sequences and contrast (T1-weighted fast spin-echo technique vs multiecho chemical shift Dixon technique), imaging sites and scanners, coil positioning, patient positioning (prone vs supine), and field strengths (1.5 T vs 3.0 T).
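A minimal sketch of the transductive transfer learning schedule in scenario (c) is shown below, assuming tf.data pipelines that yield (patch, one-hot mask) pairs. The optimizer settings and the plain cross-entropy loss are placeholders for the RMSProp configuration and the focal and multimetric losses used in the study.

```python
import tensorflow as tf

def transfer_learning(model, dataset_a, dataset_b, val_b, total_epochs=100, pretrain_epochs=40):
    """Scenario (c): pretrain on database A, then fine-tune the same weights on database B."""
    # Pretraining on database A (first 40 epochs).
    model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-3),
                  loss="categorical_crossentropy")
    model.fit(dataset_a, epochs=pretrain_epochs)

    # Fine-tuning on database B (remaining epochs) with early stopping on its validation split.
    model.compile(optimizer=tf.keras.optimizers.RMSprop(1e-4),
                  loss="categorical_crossentropy")
    model.fit(dataset_b,
              epochs=total_epochs - pretrain_epochs,
              validation_data=val_b,
              callbacks=[tf.keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)])
    return model
```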

Training datasets for the training folds and smaller subsets are randomly selected from the labeled cohort to provide uniform distribution over phenotypic factors (sex, age, and body mass index). For all experiments, datasets were split into 70% for training, 10% for validation, and 20% for the test set. For each of the 1.5- and 3.0-T measurements (500 individuals for each) from TUEF/DZD, there were 350 training, 50 validation, and 100 test sets. For the NAKO set of 300 individuals, cases were split into 210 training, 30 validation, and 60 test sets. For each cross-validation run, a different fixed training, validation, and test set was created.

Statistical Analysis

Unless stated otherwise, all reported results represent the mean and two-sided standard deviation on all four cross-validation runs and the test case. Performance was evaluated qualitatively and quantitatively in terms of specificity, sensitivity, precision, and accuracy (equations are found in Appendix E3 [supplement]), which are derived from the confusion matrix between the predicted segmentation, S, and the labeled ground truth, G, of the four classes (SAT, VAT, LT, BG) on a voxel-by-voxel comparison in a one-against-rest approach (ie, the target class is compared against the sum of all remaining classes). Segmentation overlaps are evaluated with the Dice coefficient (equation in Appendix E3 [supplement]) between S and G. Statistical significance for the performance metrics was determined by using a paired Welch t test (significance level of α = .05) under the null hypothesis of equal means for unequal variances performed in Python (version 3.6; https://www.python.org/) (open source) with SciPy (version 1.4.0; https://www.scipy.org/) (open source). The source code is available at https://github.com/lab-midas/med_segmentation.
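All of these metrics follow directly from the voxelwise confusion matrix, and the Welch t test is available in SciPy. The sketch below illustrates the one-against-rest computation with placeholder arrays; it is not the study's evaluation code.

```python
import numpy as np
from scipy import stats

def one_vs_rest_metrics(pred, gt, cls):
    """Voxelwise metrics for one class (cls) against the sum of all remaining classes."""
    p, g = (pred == cls), (gt == cls)
    tp = np.sum(p & g)
    tn = np.sum(~p & ~g)
    fp = np.sum(p & ~g)
    fn = np.sum(~p & g)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, precision, accuracy, dice

# Welch t test (unequal variances) between per-case Dice scores of two models;
# the arrays below are placeholders, not study results.
dice_dcnet = np.array([0.94, 0.93, 0.95, 0.94])
dice_unet = np.array([0.88, 0.90, 0.87, 0.89])
t_stat, p_value = stats.ttest_ind(dice_dcnet, dice_unet, equal_var=False)
```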

Results

Axial slices of the TUEF/DZD databases for individuals examined using 1.5- and 3.0-T scanners are shown in Figure 2. The T1-weighted fast spin-echo MR images can be affected by spatial inhomogeneities of the B1 field and by inhomogeneous sensitivity characteristics of the receiver coil, as indicated by the arrows in Figure 2. The segmentation is robust toward these artifacts and is in close agreement with the ground truth.

Figure 2: Adipose tissue (AT) segmented into subcutaneous AT (red), visceral AT (yellow), lean tissue (green), and background (black) in three individuals scanned with a 1.5-T scanner and three individuals scanned with a 3.0-T scanner from the Tuebingen Family Study/German Center for Diabetes Research database. Axial slices of the T1-weighted fast spin-echo (T1w FSE), labeled ground truth, and segmentation output of the proposed densely connected convolutional neural network (DCNet) are shown. Arrows indicate areas of magnetic field inhomogeneities that were correctly estimated with the DCNet. Quantitative scores over the whole cohort are stated in Table 2.

The same network architecture is also generalizable to the NAKO database with individuals scanned with 3.0-T scanners at multiple imaging sites, as qualitatively depicted in Figure 3. Moreover, extension of the network input dimension from single-channel (T1-weighted fast spin-echo) to multichannel (multiecho chemical shift; Dixon technique) images preserves a stable segmentation, indicating a robust architecture. Quantitative evaluation of these experiments is shown in Table 2. Low standard deviations for the proposed DCNet indicate good robustness for changing training data folds. On average, for all segmented tissues and databases, sensitivity of 96.6%, specificity of 95.1%, precision of 92.1%, accuracy of 98.4%, and Dice overlap of 0.94 were achieved. The proposed DCNet outperforms a standard 3D U-Net by 30.2% for sensitivity, 40.1% (P < .001) for specificity, 151.6% (P < .001) for precision, and 46.9% (P < .001) for Dice score; only the improvement in accuracy (16.0%) was not statistically significant (P = .14) (Figure E1 [supplement]). Training of the proposed DCNet required approximately 25 hours, whereas only 5–7 seconds are needed for prediction after dataset loading.

Figure 3: Adipose tissue (AT) segmented into subcutaneous AT (red), visceral AT (yellow), lean tissue (green), and background (black) in two individuals scanned with a 3.0-T scanner from the German National Cohort (NAKO) database. Coronal and two exemplary axial slices (at the femoral head and abdominal) of fat images from the multiecho chemical shift (Dixon) sequence, labeled ground truth, and segmentation output of the proposed densely connected convolutional neural network (DCNet) are shown. Quantitative scores over the whole cohort are shown in Table 2.

Table 2: Quantitative Evaluation Metrics for TUEF/DZD and NAKO Test Cases Over All Cross-Validation Runs

The influence of changing input data can be examined for changing scanners and imaging field strengths (1.5 and 3.0 T) in Figure 4, and for different cohorts (TUEF/DZD vs NAKO, fast spin-echo vs Dixon, 2D vs 3D, prone vs supine) and different scanners and receiver coil arrangements in Figure 5. Intradatabase training and testing (A → A) performs better than interdatabase testing (A → B). Transfer (A+B → A|B) and multidatabase (A&B → A|B) learning overcome segmentation limitations of the interdatabase experiments, such as magnetic field inhomogeneities (Fig 4) and different through-plane resolutions (Fig 5; TUEF/DZD [10 mm] vs NAKO [3 mm]), which did not exist in the respective training database. Quantitative scores for changing scanners and imaging field strengths are shown in Table 3 and for changing cohorts in Table 4, which substantiate these findings. The best performing experiment, and hence the best data use, was obtained via transfer learning (A+B), which was, however, not significantly better than multidatabase learning (A&B) for sensitivity (P = .94 and P = .91), specificity (P = .98 and P = .90), precision (P = .66 and P = .95), accuracy (P = .82 and P = .67), and Dice overlap (P = .86 and P = .84), as shown in Tables 3 and 4, respectively.

Figure 4: Adipose tissue (AT) segmented into subcutaneous AT (red), visceral AT (yellow), lean tissue (green), and background (black) in one individual scanned with a 1.5-T scanner and in one individual scanned with a 3.0-T scanner from the Tuebingen Family Study/German Center for Diabetes Research database. Axial slices of the T1-weighted fast spin-echo (T1w FSE), labeled ground truth, and segmentation output of the proposed densely connected convolutional neural network (DCNet) are shown. Different training and testing scenarios were investigated to examine intradatabase (A → A), interdatabase (A → B), transfer learning (A+B → A|B), or multidatabase learning (A&B → A|B) for changing imaging scanners and field strengths. The notation A→B|C denotes training on A and testing on B or C. Arrows on the images indicate areas that were falsely classified in some experiments. Quantitative scores over the whole cohort are shown in Table 3.

Figure 5: Adipose tissue (AT) segmented into subcutaneous AT (red), visceral AT (yellow), lean tissue (green), and background (black) in one Tuebingen Family Study/German Center for Diabetes Research (TUEF/DZD) individual scanned with a 3.0-T scanner and in one German National Cohort database (NAKO) individual scanned with a 3.0-T scanner. Axial slices of the T1-weighted fast spin-echo (T1w FSE) sequence and multiecho chemical shift (Dixon) fat images, labeled ground truth, and segmentation output of the proposed densely connected convolutional neural network (DCNet) are shown. Different training and testing scenarios were investigated to examine intradatabase (A → A), interdatabase (A → B), transfer learning (A+B → A|B), or multidatabase learning (A&B → A|B) for changing epidemiologic cohorts (imaging sequence, resolution, coil arrangements, patient positioning, scanner). The notation A→B|C denotes training on A and testing on B or C. Arrows on the images themselves indicate areas that were falsely classified in some experiments. Quantitative scores over the whole cohort are stated in Table 4.

Table 3: Quantitative Evaluation Metrics of Proposed DCNet for Different Training and Testing Pairings between 1.5 T and 3.0 T Test Cases in TUEF/DZD Database

Table 4: Quantitative Evaluation Metrics of Proposed DCNet for Different Training and Testing Pairings between TUEF/DZD (T/D) and NAKO (N) Databases

The segmented images enable derivation of a head-feet adipose tissue profile, which helps to quickly visualize the adipose tissue distribution, as shown exemplarily in Figure E2 (supplement). These profiles, together with the segmented masks, are automatically reported back to the referring physician. The adipose tissue profiles also enable phenotypic characterization, as shown for different test cases from the TUEF/DZD database grouped by age and sex in Figure 6 or grouped by body mass index and sex in Figure E3 (supplement). The respective adipose tissue profiles of the NAKO database are shown in Figures E4 and E5 (supplement). In general, LT was higher in men than in women. LT and SAT show similar percentage distributions in women, whereas in men the percentage distribution of LT is mostly higher than that of SAT.
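Such a head-feet profile can be derived by counting labeled voxels per axial slice. The sketch below assumes hypothetical integer class codes for SAT, VAT, and LT and is not the automated reporting pipeline used in the study.

```python
import numpy as np

def adipose_profile(mask, voxel_volume_ml, classes=(1, 2, 3)):
    """Head-feet profile from a labeled volume with axes (head-feet slices, y, x).

    classes: assumed label codes for SAT, VAT, and LT; must match the actual encoding.
    Returns per-slice absolute volume in liters and per-slice percentage per class.
    """
    counts = np.stack([(mask == c).sum(axis=(1, 2)) for c in classes], axis=1)  # (slices, classes)
    volume_l = counts * voxel_volume_ml / 1000.0
    body = counts.sum(axis=1, keepdims=True)
    percent = np.divide(counts, body, out=np.zeros(counts.shape), where=body > 0) * 100
    return volume_l, percent
```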

Figure 6: Adipose tissue profiles along head-feet direction over all test cases in the Tuebingen Family Study/German Center for Diabetes Research database grouped according to sex and age. Lean tissue (LT, green), visceral adipose tissue (VAT, red), and subcutaneous adipose tissue (SAT, blue) are shown as absolute volume in liters (solid line) and percentage per slice (dotted line). Mean (line) and 1 standard deviation around mean (colored shaded area) are depicted.

Discussion

In this work, we present a 3D adipose tissue segmentation network for whole-body MRI. The proposed network combines different architectural concepts and receives the positional input of the 3D patches to learn multiresolution attention focusing. Reliable adipose tissue segmentation can be obtained with high Dice overlap (0.94), sensitivity (96.6%), specificity (95.1%), precision (92.1%), and accuracy (98.4%), which is also significantly better than a comparable U-Net segmentation (except for accuracy). A fast whole-body adipose tissue segmentation can be conducted within 5–7 seconds, which enables integration into the clinical workflow. The returned segmentation masks and adipose tissue profiles enable a quick phenotypic assessment. We investigated the performance of the network for varying training databases (TUEF/DZD and NAKO) with changing scanners and imaging field strengths (1.5 and 3.0 T), receiver coil set-ups, imaging sequences (multislice 2D T1-weighted fast spin-echo and 3D multiecho chemical shift Dixon methods), and patient positioning (prone and supine). The effect of changing training and test data, as well as of information sharing between and within epidemiologic cohort studies by means of transfer learning, was investigated. The aim was to provide a robust architecture that generalizes well.

The proposed network deals well with changing input data, such as single- and multicontrast images, as well as varying resolutions, scanners, and imaging field strengths. The network performance can vary in testing if the image content differs strongly from the trained cases (eg, TUEF/DZD → NAKO) because the visual appearance of adipose tissue in fast spin-echo and Dixon images is diverse. This obstacle can be mitigated by transfer or joint learning on the data. We observed a minor improvement of transfer learning (A+B) over multidatabase learning (A&B) (ie, guiding the network training with a priori knowledge improves its performance). Moreover, use of complementary information can help mitigate segmentation errors (eg, magnetic field inhomogeneities in 3.0-T cases, as observed in Fig 4). If the network is trained on artifact-affected images, segmentation of artifact-free cases becomes more challenging (eg, 3.0 T → 1.5 T). These results suggest that careful selection and composition of the training database are important for reliable and robust segmentation.

Segmentation of BG performs equally well in all experiments, with high sensitivity, specificity, and precision. Because of its patient-dependent appearance and irregular structure, VAT is more challenging to segment than SAT; nevertheless, high metric values were achieved for VAT as well. Performance in 1.5-T cases of TUEF/DZD was slightly better because of the decreased likelihood of disturbance by magnetic field inhomogeneities.

The proposed architecture outperforms a standard U-Net, owing to its multiresolution sharing and attention focusing. For the U-Net, we observed repetitive patterns of misclassification that might indicate insufficient network depth for the given input dimensionality. The applied 3D patching in DCNet provides invariance to epidemiologic parameters, such as height and weight, and no additional preprocessing (eg, image alignment) is required. Furthermore, invariance to patient positioning (prone vs supine) was observed. Overall good generalization to different scan conditions (imaging sequences, scanners, coil arrangement, and patient positioning) and anthropometric features was obtained. Results from cross-validation runs with different training and test cases showed small standard deviations, indicating reliable and robust training. On the basis of these experiences, we anticipate good reproducibility for repeated predictions and measurements.

Our study had limitations. The obtained results are specific to the imaging set-up and MR sequence design of the study at hand. The segmented tissues in our study are rather large structures and thus may be easier to segment than more complicated anatomic structures. Thus, generalizability to other segmentation tasks and methods will be investigated in future studies. Currently, the network is not trained to distinguish bone marrow, which leads to misclassifications. However, for the underlying task of adipose tissue profiling, these misclassifications account for only a small and almost constant amount of tissue across individuals. In future studies, we plan to extend the labeling and classification with a bone marrow class.

In conclusion, automatic 3D adipose tissue segmentation and standardized topography mapping based on whole-body MRI data with the proposed DCNet is feasible. The proposed architecture uses merge-and-run mapping blocks, dense connections, and patch-input encoding to provide multiresolution attention focusing. The architecture is robust and generalizable to different imaging sequences, contrasts, sites, scanners, field strengths, and examination set-ups (coil arrangements and patient positioning). Segmented adipose tissue compartments (subcutaneous and visceral fat) and head-feet profiles are automatically generated and provide feedback to the referring physician.

Disclosures of Conflicts of Interest: T.K. disclosed no relevant relationships. T.H. disclosed no relevant relationships. M.F. disclosed no relevant relationships. M.S. disclosed no relevant relationships. A.F. disclosed no relevant relationships. H.U.H. disclosed no relevant relationships. K.N. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is on the speakers bureau of Siemens, Bayer, and Bracco. Other relationships: disclosed no relevant relationships. F.B. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: institution received a research grant from Siemens Healthineers; is on the speakers bureau of Siemens Healthineers. Other relationships: disclosed no relevant relationships. B.Y. disclosed no relevant relationships. F.S. disclosed no relevant relationships. S.G. disclosed no relevant relationships. J.M. disclosed no relevant relationships.

Acknowledgments

This project was conducted with data from the German National Cohort (GNC) (www.nako.de). The GNC is funded by the Federal Ministry of Education and Research (BMBF) (project funding reference no. 01ER1301A/B/C and 01ER1511D), federal states, and the Helmholtz Association, with additional financial support from the participating universities and institutes of the Leibniz Association. We thank all participants who took part in the GNC study and the staff in this research program.

Author Contributions

Author contributions: Guarantor of integrity of entire study, J.M.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, T.K., T.H., F.B., S.G., J.M.; clinical studies, T.K., A.F., H.U., F.B., F.S.; statistical analysis, T.K., T.H.; and manuscript editing, T.K., T.H., M.F., M.S., A.F., K.N., F.B., B.Y., F.S., S.G., J.M.

Supported in part by a grant (01GI0925) from the German Federal Ministry of Education and Research (BMBF) to the German Center for Diabetes Research (DZD e.V.) and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation).

References

  • 1. Machann J, Thamer C, Stefan N, et al. Follow-up whole-body assessment of adipose tissue compartments during a lifestyle intervention in a large cohort at increased risk for type 2 diabetes. Radiology 2010;257(2):353–363.
  • 2. Bamberg F, Kauczor HU, Weckbach S, et al. Whole-Body MR Imaging in the German National Cohort: Rationale, Design, and Technical Background. Radiology 2015;277(1):206–220.
  • 3. Kissebah AH, Vydelingum N, Murray R, et al. Relation of body fat distribution to metabolic complications of obesity. J Clin Endocrinol Metab 1982;54(2):254–260.
  • 4. Krotkiewski M, Björntorp P, Sjöström L, Smith U. Impact of obesity on metabolism in men and women. Importance of regional adipose tissue distribution. J Clin Invest 1983;72(3):1150–1162.
  • 5. Ohlson LO, Larsson B, Svärdsudd K, et al. The influence of body fat distribution on the incidence of diabetes mellitus. 13.5 years of follow-up of the participants in the study of men born in 1913. Diabetes 1985;34(10):1055–1058.
  • 6. Machann J, Thamer C, Schnoedt B, et al. Standardized assessment of whole body adipose tissue topography by MRI. J Magn Reson Imaging 2005;21(4):455–462.
  • 7. Linge J, Borga M, West J, et al. Body Composition Profiling in the UK Biobank Imaging Study. Obesity (Silver Spring) 2018;26(11):1785–1795.
  • 8. Linge J, Whitcher B, Borga M, Dahlqvist Leinhard O. Sub-phenotyping metabolic disorders using body composition: an individualized, nonparametric approach utilizing large data sets. Obesity (Silver Spring) 2019;27(7):1190–1199.
  • 9. Heimann T, Meinzer HP. Statistical shape models for 3D medical image segmentation: a review. Med Image Anal 2009;13(4):543–563.
  • 10. Iglesias JE, Sabuncu MR. Multi-atlas segmentation of biomedical images: A survey. Med Image Anal 2015;24(1):205–219.
  • 11. Criminisi A, Shotton J, Konukoglu E, et al. Decision Forests: A Unified Framework for Classification, Regression, Density Estimation, Manifold Learning and Semi-Supervised Learning. Found Trends Comput Graph Vis 2012;7(2–3):81–227.
  • 12. Shen D, Wu G, Suk HI. Deep learning in medical image analysis. Annu Rev Biomed Eng 2017;19(1):221–248.
  • 13. Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal 2017;42:60–88.
  • 14. Milletari F, Navab N, Ahmadi SA. V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. ArXiv e-prints [preprint] https://arxiv.org/abs/1606.04797. Posted June 15, 2016. Accessed October 2016.
  • 15. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In: Navab N, Hornegger J, Wells W, Frangi A, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Cham, Switzerland: Springer, 2015; 234–241.
  • 16. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Cham, Switzerland: Springer, 2016; 424–432.
  • 17. Ibtehaz N, Rahman MS. MultiResUNet: Rethinking the U-Net Architecture for Multimodal Biomedical Image Segmentation. ArXiv:1902.04049 [preprint] https://arxiv.org/abs/1902.04049. Posted February 11, 2019. Accessed March 2019.
  • 18. Yang D, Zhang S, Yan Z, Tan C, Li K, Metaxas D. Automated anatomical landmark detection on distal femur surface using convolutional neural network. In: 2015 IEEE 12th International Symposium on Biomedical Imaging (ISBI), New York, NY, April 16–19, 2015. Piscataway, NJ: IEEE, 2015; 17–21.
  • 19. Payer C, Štern D, Bischof H, Urschler M. Regressing heatmaps for multiple landmark localization using CNNs. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W, eds. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Cham, Switzerland: Springer, 2016; 230–238.
  • 20. de Vos BD, Wolterink JM, de Jong PA, Viergever MA, Išgum I. 2D image classification for 3D anatomy localization: employing deep convolutional neural networks. In: Styner MA, Angelini ED, eds. Proceedings of SPIE: medical imaging 2016—image processing. Vol 9784. Bellingham, Wash: International Society for Optics and Photonics, 2016; 97841Y.
  • 21. Zhang W, Li R, Deng H, et al. Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. Neuroimage 2015;108:214–224.
  • 22. Moeskops P, Viergever MA, Mendrik AM, de Vries LS, Benders MJ, Išgum I. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Trans Med Imaging 2016;35(5):1252–1261.
  • 23. Pereira S, Pinto A, Alves V, Silva CA. Brain tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging 2016;35(5):1240–1251.
  • 24. Havaei M, Davy A, Warde-Farley D, et al. Brain tumor segmentation with deep neural networks. Med Image Anal 2017;35:18–31.
  • 25. Qin W, Wu J, Han F, et al. Superpixel-based and boundary-sensitive convolutional neural network for automated liver segmentation. Phys Med Biol 2018;63(9):095017.
  • 26. Bai W, Sinclair M, Tarroni G, et al. Automated cardiovascular magnetic resonance image analysis with fully convolutional networks. J Cardiovasc Magn Reson 2018;20(1):65.
  • 27. Borga M. MRI adipose tissue and muscle composition analysis-a review of automation techniques. Br J Radiol 2018;91(1089):20180252.
  • 28. Dixon WT. Simple proton spectroscopic imaging. Radiology 1984;153(1):189–194.
  • 29. Yu H, Shimakawa A, McKenzie CA, Brodsky E, Brittain JH, Reeder SB. Multiecho water-fat separation and simultaneous R2* estimation with multifrequency fat spectrum modeling. Magn Reson Med 2008;60(5):1122–1134.
  • 30. Reeder SB, Hu HH, Sirlin CB. Proton density fat-fraction: a standardized MR-based biomarker of tissue fat concentration. J Magn Reson Imaging 2012;36(5):1011–1014.
  • 31. Hu HH, Chen J, Shen W. Segmentation and quantification of adipose tissue by magnetic resonance imaging. MAGMA 2016;29(2):259–276.
  • 32. Vovk U, Pernus F, Likar B. A review of methods for correction of intensity inhomogeneity in MRI. IEEE Trans Med Imaging 2007;26(3):405–421.
  • 33. Karlsson A, Rosander J, Romu T, et al. Automatic and quantitative assessment of regional muscle volume by multi-atlas segmentation using whole-body water-fat MRI. J Magn Reson Imaging 2015;41(6):1558–1569.
  • 34. Würslin C, Machann J, Rempp H, Claussen C, Yang B, Schick F. Topography mapping of whole body adipose tissue using a fully automated and standardized procedure. J Magn Reson Imaging 2010;31(2):430–439.
  • 35. Addeman BT, Kutty S, Perkins TG, et al. Validation of volumetric and single-slice MRI adipose analysis using a novel fully automated segmentation method. J Magn Reson Imaging 2015;41(1):233–241.
  • 36. Estrada S, Lu R, Conjeti S, et al. FatSegNet: A fully automated deep learning pipeline for adipose tissue segmentation on abdominal Dixon MRI. Magn Reson Med 2020;83(4):1471–1483.
  • 37. Langner T, Hedström A, Mörwald K, et al. Fully convolutional networks for automated segmentation of abdominal adipose tissue depots in multicenter water-fat MRI. Magn Reson Med 2019;81(4):2736–2745.
  • 38. Borga M, West J, Bell JD, et al. Advanced body composition assessment: from body mass index to body composition profiling. J Investig Med 2018;66(5):1–9.
  • 39. Artham SM, Lavie CJ, Patel HM, Ventura HO. Impact of obesity on the risk of heart failure and its prognosis. J Cardiometab Syndr 2008;3(3):155–161.
  • 40. Kurioka S, Murakami Y, Nishiki M, Sohmiya M, Koshimura K, Kato Y. Relationship between visceral fat accumulation and anti-lipolytic action of insulin in patients with type 2 diabetes mellitus. Endocr J 2002;49(4):459–464.
  • 41. Doyle SL, Donohoe CL, Lysaght J, Reynolds JV. Visceral obesity, metabolic syndrome, insulin resistance and cancer. Proc Nutr Soc 2012;71(1):181–189.
  • 42. Huang G, Liu Z, Weinberger KQ, van der Maaten L. Densely connected convolutional networks. ArXiv:1608.06993 [preprint] https://arxiv.org/abs/1608.06993. Posted August 25, 2016. Accessed February 2018.
  • 43. He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, June 27–30, 2016. Piscataway, NJ: IEEE, 2016; 770–778.
  • 44. Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need. Advances in Neural Information Processing Systems, 2017; 5998–6008. https://papers.nips.cc/paper/7181-attention-is-all-you-need. Accessed January 2018.
  • 45. Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. ArXiv:1511.07122 [preprint] https://arxiv.org/abs/1511.07122. Posted 2015. Accessed May 2016.
  • 46. Zhao L, Wang J, Li X, Tu Z, Zeng W. Deep convolutional neural networks with merge-and-run mappings. ArXiv:1611.07718 [preprint] https://arxiv.org/abs/1611.07718. Published 2016. Accessed September 2017.
  • 47. Lin T, Goyal P, Girshick R, He K, Dollár P. Focal Loss for Dense Object Detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22–29, 2017. Piscataway, NJ: IEEE, 2017; 2999–3007.

Article History

Received: Jan 30 2020
Revision requested: Mar 4 2020
Revision received: June 2 2020
Accepted: June 26 2020
Published online: Oct 28 2020