Multicenter Reproducibility of Liver Iron Quantification with 1.5-T and 3.0-T MRI
Abstract
Background
MRI is a standard-of-care tool for measuring liver iron concentration (LIC). Compared with regulatory-approved R2 MRI, R2* MRI is faster and is available on most MRI scanners; however, the cross-vendor reproducibility of R2*-based LIC estimation remains unknown.
Purpose
To evaluate the reproducibility of LIC via single-breath-hold R2* MRI at both 1.5 T and 3.0 T with use of a multicenter, multivendor study.
Materials and Methods
Four academic medical centers using MRI scanners from three different vendors (three 1.5-T scanners, one 2.89-T scanner, and two 3.0-T scanners) participated in this prospective cross-sectional study. Participants with known or suspected liver iron overload were recruited to undergo multiecho gradient-echo MRI for R2* mapping at 1.5 T and 3.0 T (2.89 T or 3.0 T) on the same day. R2* maps were reconstructed from the multiecho images and analyzed at a single center. Reference LIC measurements were obtained with a commercial R2 MRI method performed using standardized 1.5-T spin-echo imaging. R2*-versus-LIC calibrations were generated across centers and field strengths using linear regression and compared using F tests. Receiver operating characteristic (ROC) curve analysis was used to determine the diagnostic performance of R2* MRI in the detection of clinically relevant LIC thresholds.
Results
A total of 207 participants (mean age, 38 years ± 20 [SD]; 117 male participants) were evaluated between March 2015 and September 2019. A linear relationship was confirmed between R2* and LIC. All calibrations within the same field strength were highly reproducible, showing no evidence of statistically significant center-specific differences (P > .43 across all comparisons). Calibrations for 1.5 T and 3.0 T were generated, as follows: for 1.5 T, LIC (in milligrams per gram [dry weight]) = −0.16 + 2.603 × 10⁻² R2* (in seconds⁻¹); for 2.89 T, LIC (in milligrams per gram) = −0.03 + 1.400 × 10⁻² R2* (in seconds⁻¹); for 3.0 T, LIC (in milligrams per gram) = −0.03 + 1.349 × 10⁻² R2* (in seconds⁻¹). Liver R2* had high diagnostic performance in the detection of clinically relevant LIC thresholds (area under the ROC curve, >0.98).
Conclusion
R2* MRI enabled accurate and reproducible quantification of liver iron overload over clinically relevant ranges of liver iron concentration (LIC). The data generated in this study provide the necessary calibrations for broad clinical dissemination of R2*-based LIC quantification.
ClinicalTrials.gov registration no.: NCT02025543
© RSNA, 2022
Summary
MRI-based R2* mapping enabled reproducible quantification of liver iron overload across four centers, three vendors, and three field strengths (1.5 T, 2.89 T, and 3.0 T).
Key Results
■ In a prospective study of 207 participants with known or suspected liver iron overload and liver iron concentration (LIC) between 0.3 and 42.5 mg/g, who underwent same-day multiecho gradient-echo MRI for R2* mapping at 1.5 T, 2.89 T, and 3.0 T, R2* mapping enabled reproducible quantification of iron overload across multiple centers, vendors, and field strengths.
■ Calibrations of liver R2* (in seconds⁻¹) into biopsy-equivalent R2-based LIC (in milligrams per gram) were calculated at 1.5 T, 2.89 T, and 3.0 T and were reproducible across centers and vendors.
■ R2* had high diagnostic performance in the detection of clinically relevant LIC thresholds (area under the receiver operating characteristic curve, >0.98).
Introduction
Liver iron concentration (LIC) is directly and linearly related to total body iron stores (1). As such, LIC is widely recognized as a useful surrogate marker for the diagnosis, grading, and treatment monitoring of patients with known or suspected iron overload (2). Although the serum ferritin level is the simplest means to assess body iron, it is also an acute phase reactant and therefore is not a reliable marker of body iron (3,4). Liver biopsy is limited by its invasive nature, bleeding risk, and sampling variability (5).
MRI is highly sensitive to the abnormal deposition of iron in tissue, which accelerates the rate of signal decay for both spoiled gradient-recalled echo and spin-echo MRI methods (ie, R2* [in seconds⁻¹] and R2 [in seconds⁻¹], respectively). Correlation studies between these MRI metrics and biochemically determined LIC obtained with liver biopsy have demonstrated that both R2* and R2 have monotonic relationships with LIC; the R2*-LIC relationship is linear (6–10), whereas the R2-LIC relationship is curvilinear (7,11). R2-based LIC quantification is well established as a noninvasive MRI-based reference standard with extensive use in clinical practice and clinical trials (12), good reproducibility (13), and regulatory approval in many countries (14,15). The main limitations of R2-based relaxometry are the long acquisition time (10–20 minutes), limited spatial coverage of the liver, motion artifacts in free-breathing acquisitions, and availability at only 1.5-T field strength. For these reasons, R2-based LIC quantification is not widely used in many clinical practices.
R2* relaxometry using three-dimensional multiecho spoiled gradient-recalled echo acquisitions is emerging as a clinically viable alternative to R2 relaxometry. Facilitated by parallel imaging, high-performance gradient hardware, and phased-array receiver coil technology, complete coverage of the liver in a single short breath hold is feasible. Unfortunately, liver R2* mapping can be confounded by several important effects, leading to errors and variability in liver iron quantification. One important confounder is hepatic steatosis, or deposition of excess triglycerides in the liver, which can contribute to the apparent MRI signal decay, leading to bias in R2* (16,17) if left unaccounted for. Furthermore, fitting of R2* from magnitude spoiled gradient-recalled echo signal can also lead to noise-related bias if the noise floor effects (18) are not properly included in the fitting procedure.
Recently, complex “confounder-corrected” R2* reconstruction has been introduced (17). By modeling for the presence of fat and with use of complex-valued images that preserve the phase and magnitude of the signal, accurate estimates of R2*, corrected for the effects of fat and noise, can be made. Although R2*-LIC calibration curves have been derived for other R2*-based LIC quantification methods, primarily at 1.5 T, only a single-center R2*-LIC calibration study has been published for complex confounder-corrected R2* (9). Therefore, the purpose of this study was to evaluate the reproducibility of complex confounder-corrected R2*-based LIC calibration at both 1.5 T and 3.0 T through a prospective multicenter, multivendor study.
Materials and Methods
Herein, we describe a prospective, multicenter, cross-sectional study (ClinicalTrials.gov identifier: NCT02025543) performed to evaluate the multicenter reproducibility and calibration of MRI-based R2* mapping for the quantification of LIC in patients with iron overload. No employees of GE Healthcare, Siemens Healthineers, or Philips Healthcare had any control over the data or analysis. Two preliminary single-site technical reports that include R2 mapping data from 84 of the patients recruited in this study were recently published (19,20).
Centers and MRI Vendors
Four centers (University of Wisconsin–Madison [hereafter, UW-Madison], University of Texas Southwestern, Johns Hopkins University [JHU], and Stanford University) with 1.5-T and 3.0-T clinical MRI systems from three vendors (GE Healthcare, Philips Healthcare, and Siemens Healthineers) participated in this study (Table 1). Note that 3.0-T systems in this work include two slightly different, clinically available field strengths: 2.89 T at JHU versus 3.0 T at the remaining three centers. This must be accounted for in data analysis.
Participants
This was a prospective, Health Insurance Portability and Accountability Act–compliant study performed after obtaining approval from the local institutional review board at each of the four academic medical centers. After obtaining informed consent, participants (minimum age: 10 years at UW-Madison and JHU, 18 years at University of Texas Southwestern, and 6 years at Stanford University) with known or suspected iron overload were recruited. Determination of iron overload to guide recruitment was based on known elevated serum ferritin level, prior clinical imaging indicating iron overload, or an established diagnosis of a genetic condition that predisposed patients to iron overload, such as hereditary hemochromatosis. Exclusion criteria included contraindications to MRI.
Sample Size
The primary goal of this study was to calibrate R2* with LIC at both 1.5 T and 3.0 T and to evaluate the reproducibility of this calibration across sites. Our sample size of approximately 50 patients per site was largely dictated by financial and feasibility constraints. With use of this sample size, the expected performance of calibration and reproducibility was analyzed using the preliminary clinical data acquired previously at UW-Madison (9). In an interim evaluation of a subset (25 patients) of these previous data using a preliminary R2* mapping algorithm, the observed slope between R2* and LIC measured with a commercially available R2-based spin-echo technique (FerriScan; Resonance Health) was 0.035 with a correlation of 0.95 at 1.5 T and 0.018 with a correlation of 0.95 at 3.0 T. Assuming the observed SDs (SD of R2* at 1.5 T = 132 seconds⁻¹; SD of R2* at 3.0 T = 251 seconds⁻¹; SD of FerriScan-based LIC = 5 mg/g [dry weight]), the actual distance from the slope to the 95% limits of a two-sided CI will be 2.21 × 10⁻³ mg·sec/g for a sample size of n = 50 and 1.06 × 10⁻³ mg·sec/g for a sample size of n = 200 at 1.5 T (residual SD = 1.01) and 2.04 × 10⁻³ mg·sec/g for a sample size of n = 50 and 0.99 × 10⁻³ mg·sec/g for a sample size of n = 200 at 3.0 T (residual SD = 1.77). This affords increased precision relative to our pilot data, in which the standard errors were 2.16 × 10⁻³ mg·sec/g and 1.35 × 10⁻³ mg·sec/g for our 1.5-T and 3.0-T regression slopes, respectively, based on 25 observations. Furthermore, based on the study design with approximately 50 subjects per center, this study would have 80% power to detect an R2*-LIC calibration slope difference between centers of 2.4 × 10⁻⁴ mg·sec/g and 1.5 × 10⁻⁴ mg·sec/g for 1.5 T and 3.0 T, respectively.
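The CI half-widths quoted above can be approximated with a standard textbook expression for the precision of a regression slope, t × (residual SD) / (SD of R2* × √(n − 1)). The sketch below is an assumption about how such numbers can be obtained, not necessarily the calculation the authors performed; the function name and the hard-coded t quantiles (0.975 quantile with n − 2 df) are illustrative.

```python
import math

# 0.975 quantiles of the t distribution with n - 2 degrees of freedom
# (hard-coded here to keep the sketch dependency-free).
T_CRIT = {50: 2.0106, 200: 1.9720}

def slope_ci_half_width(residual_sd, sd_r2star, n):
    """Approximate half-width of the two-sided 95% CI for a regression slope."""
    return T_CRIT[n] * residual_sd / (sd_r2star * math.sqrt(n - 1))

# Inputs taken from the text: residual SDs and SDs of R2* at each field strength.
for label, residual_sd, sd_r2star in [("1.5 T", 1.01, 132.0), ("3.0 T", 1.77, 251.0)]:
    for n in (50, 200):
        hw = slope_ci_half_width(residual_sd, sd_r2star, n)
        print(f"{label}, n = {n}: {hw:.2e} mg·sec/g")
```

Under these assumptions the computed half-widths agree with the values stated in the text to within about 1%; the small residual differences suggest that the original calculation used a slightly different formula or rounding.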
Study Visit
Study visits occurred between March 2015 and September 2019. Each participant underwent a single visit, which included MRI at both 1.5 T and 3.0 T, performed within approximately 1 hour. In addition, all participants underwent a blood draw for serum ferritin concentration, either during the study visit or during a contemporaneous clinical visit.
MRI Protocol
The reference LIC measurement was obtained at 1.5 T using a commercial U.S. Food and Drug Administration–approved R2-based method for quantification of LIC (FerriScan) (11). This reference LIC measurement method has been shown to be reproducible across centers and MRI vendors (12). The reference R2-based LIC quantification acquisition, which was standardized across centers, consisted of free-breathing two-dimensional spin-echo images acquired at multiple echo times over approximately 17 minutes (see Table 1 for details). R2* mapping data were acquired at 1.5 T and 3.0 T in a single breath hold using a multiecho spoiled gradient-echo acquisition (21–23), with field strength–specific parameters standardized across centers and vendors (Table 1). R2* mapping data were acquired using prototype pulse sequences by each vendor to enable standardized acquisition parameters with optimized R2* dynamic range.
Reference LIC Measurement
Images obtained at 1.5 T were uploaded from each of the four centers to an independent core laboratory (Resonance Health). The core laboratory returned a summary LIC value (in milligrams of iron per gram of dry tissue), which was used as the reference LIC value for this study.
R2* Mapping
Complex-valued (real and imaginary or magnitude and phase) multiecho spoiled gradient-echo images were transferred from each of the centers in this study to the data processing center (UW-Madison) for R2* mapping, measurement, and analysis. R2* maps were generated using a centralized, confounder-corrected algorithm (17). Briefly, this algorithm performed R2* mapping from the complex-valued data, including correction for multipeak fat signals and avoidance of noise bias through the use of nonlinear least-squares fitting to the complex-valued data. To avoid instabilities at high iron levels, fat-uncorrected R2* measurements were used when the (fat-uncorrected) liver R2* was higher than a threshold (500 seconds⁻¹ at 1.5 T and 1000 seconds⁻¹ at 3.0 T) (9).
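The switch between fat-corrected and fat-uncorrected R2* described above amounts to a simple threshold rule. The following is a minimal sketch of that rule only (not of the reconstruction itself); the function name is hypothetical, and applying the 3.0-T threshold to the 2.89-T system is an assumption of this sketch.

```python
# Thresholds (in 1/s) stated in the text, above which fat-uncorrected R2* is used.
UNCORRECTED_THRESHOLD = {1.5: 500.0, 3.0: 1000.0}

def select_r2star(fat_corrected, fat_uncorrected, field_strength_t):
    """Return the liver R2* value (1/s) to report for one measurement."""
    # Assumption: any system above 2 T (2.89 T or 3.0 T) uses the 3.0-T threshold.
    key = 1.5 if field_strength_t < 2.0 else 3.0
    if fat_uncorrected > UNCORRECTED_THRESHOLD[key]:
        return fat_uncorrected
    return fat_corrected

# Hypothetical measurements for illustration.
print(select_r2star(300.0, 310.0, 1.5))   # moderate iron: fat-corrected value is kept
print(select_r2star(640.0, 620.0, 1.5))   # high iron at 1.5 T: fat-uncorrected value is used
```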
Image Analysis
Measurements on the R2* maps were performed by an independent core laboratory within UW-Madison, where three observers performed the R2* measurements (each examination was evaluated by one observer) while blinded to the reference LIC values. R2* region of interest measurements (elliptical area approximately 3–4 cm²) were performed in each of the four Couinaud segments within the right lobe of the liver. The right lobe was selected because of the higher image quality with minimal motion artifacts compared with the left lobe and the relative homogeneity of liver iron distribution expected in most participants. Right lobe liver R2* measurements were obtained by averaging the segment-by-segment region of interest measurements.
To evaluate the intrareader reproducibility of R2* measurements, a total of 20 participants (five per site) were analyzed twice by the same observer (observer 1) at both 1.5 T and 3.0 T, with a delay of more than 1 month between measurements. To evaluate the interreader reproducibility, a different set of 20 participants (five per site) originally analyzed by observer 2 were analyzed by observer 1 at both 1.5 T and 3.0 T.
Evaluation of Liver Fat
Using a chemical shift–encoded acquisition similar to that used for R2* mapping at 1.5 T, except with use of low flip angles (5°) to avoid T1 bias, proton-density fat fraction (PDFF) was also measured in the same four regions of interest used for fat-corrected R2* measurements and averaged to estimate the liver fat content of the participants. Importantly, PDFF values may not be valid for patients with markedly elevated liver iron levels, as described by Colgan et al (24). Specifically, PDFF estimates are likely not valid for R2* values higher than 520 seconds⁻¹ at 1.5 T for the choice of echo times used in this study (echo time 1 of 0.8 msec, echo spacing of 0.8 msec, and effective number of signal averages of 0.2) as predicted by Colgan et al. Therefore, based on the R2*-LIC calibration measured in this study, PDFF values were not tabulated for those participants with LIC higher than 13.4 mg/g.
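The 13.4 mg/g cutoff follows from applying the pooled 1.5-T calibration reported in this study (LIC = −0.16 + 2.603 × 10⁻² R2*) to the 520 seconds⁻¹ validity limit. A short check of that arithmetic, with a hypothetical function name:

```python
def lic_from_r2star_1p5t(r2star):
    """Pooled 1.5-T calibration from this study: LIC (mg/g dry weight) from R2* (1/s)."""
    return -0.16 + 2.603e-2 * r2star

pdff_validity_limit = 520.0  # 1/s, R2* limit for valid PDFF at 1.5 T (Colgan et al)
lic_cutoff = lic_from_r2star_1p5t(pdff_validity_limit)
print(f"{lic_cutoff:.1f} mg/g")  # 13.4 mg/g
```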
Statistical Analysis
All statistical analyses were performed in R (version 4.1.0; R Foundation for Statistical Computing). P < .05 was considered indicative of a statistically significant difference.
For reproducibility and calibration analysis at 1.5 T, calibration curves were generated by using linear regression with center-specific intercepts a_center and slopes b_center, as follows: LIC = a_center + b_center × R2*.
The intercepts and slopes were compared for each pair of centers by using F tests on their interactions with the corresponding center identification. An overall calibration was determined for R2* to predict LIC using the pooled data without the center identification.
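The pairwise comparison can be illustrated with a nested-model F statistic: a pooled line (one intercept, one slope) is compared against center-specific lines for two centers. The sketch below is a simplified joint test of intercept and slope under that assumption; the study itself was analyzed in R and may have tested the interaction terms separately. All data values and function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def pairwise_f_statistic(r2s_a, lic_a, r2s_b, lic_b):
    """F statistic for pooled vs center-specific calibration lines (2 extra parameters)."""
    rss_full = fit_line(r2s_a, lic_a)[2] + fit_line(r2s_b, lic_b)[2]
    rss_pooled = fit_line(r2s_a + r2s_b, lic_a + lic_b)[2]
    n = len(r2s_a) + len(r2s_b)
    df_num, df_den = 2, n - 4
    f_stat = ((rss_pooled - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Hypothetical R2* (1/s) and reference LIC (mg/g) values for two centers.
r2s_a = [50.0, 120.0, 200.0, 350.0, 500.0, 700.0]
lic_a = [1.2, 2.9, 5.1, 8.8, 13.0, 18.0]
r2s_b = [60.0, 150.0, 260.0, 400.0, 620.0, 800.0]
lic_b = [1.3, 3.8, 6.7, 10.1, 16.1, 20.5]

f_stat, df_num, df_den = pairwise_f_statistic(r2s_a, lic_a, r2s_b, lic_b)
print(f"F({df_num}, {df_den}) = {f_stat:.3f}")
```

The resulting statistic would then be referred to an F distribution with the printed degrees of freedom to obtain a P value.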
To account for the slightly different “3.0-T” field strength at one of the centers (2.89 T at JHU vs 3.0 T at the remaining three centers), an adjusted approach was used to determine reproducibility and calibration. For this purpose, the calibration at 3.0 T included a dependence of measured R2* on field strength over this range, as follows: LIC = a_center + b_center × (R2* / field strength).
This approach forces the slope of the calibration (b_center / field strength) to be inversely related to the field strength over the narrow range between 2.89 T and 3.0 T.
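As a numerical illustration of this parameterization, the sketch below back-computes the field strength–independent coefficient from the pooled 3.0-T slope reported in this study (1.349 × 10⁻²) and shows that dividing by 2.89 T recovers the reported 2.89-T slope (1.400 × 10⁻²). The function and variable names are hypothetical, and the back-computation is for illustration only.

```python
def lic_from_r2star(r2star, field_strength_t, a, b):
    """Field strength-dependent calibration: LIC = a + b * (R2* / field strength)."""
    return a + b * r2star / field_strength_t

A_POOLED = -0.03           # intercept (mg/g) shared by the 2.89-T and 3.0-T calibrations
B_POOLED = 1.349e-2 * 3.0  # coefficient back-computed from the reported 3.0-T slope

slope_289 = B_POOLED / 2.89  # effective slope at 2.89 T
slope_300 = B_POOLED / 3.0   # effective slope at 3.0 T
print(f"2.89 T: {slope_289:.3e}, 3.0 T: {slope_300:.3e}")

# The same R2* value maps to a slightly higher LIC at 2.89 T than at 3.0 T.
print(lic_from_r2star(500.0, 2.89, A_POOLED, B_POOLED))
print(lic_from_r2star(500.0, 3.0, A_POOLED, B_POOLED))
```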
In addition, a unified calibration across the three field strengths was also calculated using the same field strength–dependent model described earlier. To account for the multiple measurements per participant in this unified calibration including 1.5-T and 3.0-T data, generalized estimating equations (25) were applied to estimate the intercept and slope.
To evaluate the effect of age on calibration, we performed a likelihood ratio test (a χ² test with 2 df) on the difference between models grouped according to age (<18 years vs ≥18 years) and ungrouped (including all ages).
The ability of R2*-based LIC measurements to enable detection of clinically relevant LIC thresholds was evaluated for specific thresholds (11): 1.8 mg/g (upper limit for normal LIC) (26), 3.2 mg/g (lower range of optimal for chelation therapy) (11,27), 7.0 mg/g (upper limit of optimal for chelation therapy, and threshold for increased risk of hepatic fibrosis, diabetes mellitus, and other complications) (11,27), and 15.0 mg/g (threshold for severe iron overload with increased risk of early death) (11,27). This evaluation was performed using receiver operating characteristic (ROC) analysis, with diagnostic performance measured with the area under the ROC curve.
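For readers less familiar with ROC analysis, the area under the ROC curve for a given LIC threshold equals the probability that a participant above the threshold has a higher R2* than a participant at or below it. A minimal sketch of that computation follows; the data are hypothetical and the function name is illustrative (the study itself used R).

```python
def roc_auc(scores, positives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical reference LIC (mg/g) and 1.5-T R2* (1/s) values for eight participants.
reference_lic = [0.9, 1.5, 2.4, 3.0, 4.1, 6.5, 8.2, 12.0]
r2star_1p5t = [42.0, 70.0, 95.0, 170.0, 160.0, 262.0, 318.0, 470.0]

threshold = 3.2  # mg/g, one of the clinically relevant thresholds listed above
labels = [lic > threshold for lic in reference_lic]
auc = roc_auc(r2star_1p5t, labels)
print(f"AUC for LIC > {threshold} mg/g: {auc:.4f}")
```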
To evaluate the relationship between R2* measured at the two clinically relevant MRI field strengths (ie, 1.5 T and 3.0 T), a regression analysis was performed for measurements at each center, as well as pooled across centers. To assess the effects of potential R2* bias at 3.0 T for participants with very high LIC, separate regression analyses on subsets of the data were performed for LIC less than 3.2 mg/g (28), for LIC less than 7.0 mg/g (29), and for LIC less than 15.0 mg/g (30). A separate 3.0-T pooled analysis including only the centers with 3.0-T systems was also performed to assess the effect of the slightly different field strength (2.89 T) at JHU.
R2* measurement intra- and interreader reproducibility was evaluated using Bland-Altman analysis and through the calculation of the mean difference and 95% limits of agreement.
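The Bland-Altman quantities referred to here are the mean of the paired differences and the 95% limits of agreement, conventionally taken as the mean difference ± 1.96 times the SD of the differences. A minimal sketch with hypothetical repeated R2* measurements (function name illustrative):

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Return (mean difference, lower limit, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - half_width, bias + half_width

# Hypothetical R2* values (1/s) from two reading sessions of the same examinations.
reading_1 = [48.0, 95.0, 160.0, 240.0, 410.0, 655.0]
reading_2 = [50.0, 93.0, 163.0, 236.0, 415.0, 650.0]

bias, lower, upper = bland_altman(reading_1, reading_2)
print(f"mean difference {bias:.2f}, limits of agreement {lower:.2f} to {upper:.2f} (1/s)")
```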
The relationship of serum ferritin concentration with LIC was analyzed using linear regression analysis. In addition, the diagnostic performance of serum ferritin in the detection of relevant LIC thresholds was evaluated using ROC analysis.
Results
Participant Characteristics
A total of 207 participants were recruited for this study (UW-Madison: 60 participants, 40 male and 20 female participants, mean age of 44 years ± 19 [SD], and mean body mass index [BMI] of 26.7 kg/m² ± 6.4; University of Texas Southwestern: 49 participants, 26 male and 23 female participants, mean age of 44 years ± 15, and mean BMI of 27.6 kg/m² ± 5.1; JHU: 53 participants, 24 male and 29 female participants, mean age of 37 years ± 20, and mean BMI of 24.3 kg/m² ± 4.9; Stanford University: 45 participants, 27 male and 18 female participants, mean age of 23 years ± 17, and mean BMI of 21.8 kg/m² ± 5.4). See Figure 1 for an overview of the study design, data flow, and recruitment. One participant’s reference LIC was out of range (>43.0 mg/g) and was therefore excluded from calibration; however, the R2* data were used for comparison between 1.5 T and 3.0 T (which does not require an LIC reference). For a small number of participants per center (see Table 2), specific LIC or R2* mapping data were not available due to acquisition or data storage errors. R2* mapping data at both field strengths for a single participant (ie, two of 392 acquisitions or 0.5%) were not usable due to excessive motion artifacts. An additional four participants enrolled but withdrew before imaging was performed.

Figure 1: Flowchart and summary of recruitment. JHU = Johns Hopkins University, LIC = liver iron concentration, UTSW = University of Texas Southwestern, UW = University of Wisconsin–Madison.
The participants had a mean LIC of 6.3 mg/g ± 7.7 (range, 0.3–42.5 mg/g; median, 3.1 mg/g). The mean LIC was 5.7 mg/g ± 6.2 (range, 0.5–38 mg/g; median, 3.8 mg/g) at UW-Madison, 5.2 mg/g ± 8.6 (range, 0.3–38 mg/g; median, 1.5 mg/g) at University of Texas Southwestern, 8.9 mg/g ± 9.6 (range, 0.4–42.5 mg/g; median, 6.1 mg/g) at JHU, and 5.4 mg/g ± 5.3 (range, 0.8–25.0 mg/g; median, 3.0 mg/g) at Stanford University. See Table 2 for a summary of the recruited participants and Table S1 (online) for details on each individual participant, including their disease state.
R2* Mapping
Figure 2 shows representative R2* maps from three different participants, each acquired with an MRI system from a different vendor at both 1.5 T and 3.0 T. Figure 2 also shows LIC maps obtained by using the calibration derived in this work (see below). As can be observed, R2* increases with field strength, whereas LIC measurements are independent of field strength.

Figure 2: Representative R2* and liver iron concentration (LIC) maps. (A) Representative R2* maps from three participants with different levels of iron overload, obtained with MRI systems from three different vendors. For each participant, R2* maps were obtained at both clinical field strengths, 1.5 T and 3.0 T. (B) Corresponding LIC maps obtained by applying the calibration reported herein. Although R2* increases approximately linearly with field strength, R2*-based LIC quantification is independent of field strength. JHU = Johns Hopkins University, UTSW = University of Texas Southwestern, UW = University of Wisconsin–Madison.
R2* Reproducibility and Calibration
Figure 3 shows the calibrations between R2* and LIC at both 1.5 T (four centers) and 3.0 T (three centers). No evidence of a difference was observed between any pair of centers at either field strength (P > .43 across all comparisons). The supplemental data include center-by-center calibration (Table S2 [online]) as well as statistical results for the comparisons across calibrations (Table S3 [online]). The calibrations measured in this work at 1.5 T, 2.89 T, and 3.0 T, including 95% CIs, are shown in Table 3. Note that the calibrations at 2.89 T and 3.0 T are constrained to have the same intercept and slope inversely related to the field strength. Results from a single unified calibration across the three field strengths are shown in Table S4 (online).

Figure 3: Graphs show R2* liver iron concentration (LIC) calibration for (A) each of the four centers using 1.5 T, (B) each of the three centers using 3.0 T, and (C) the center using 2.89 T. The dashed lines show the 95% CIs for the pooled calibration at 1.5 T and 3.0 T. JHU = Johns Hopkins University, UTSW = University of Texas Southwestern, UW = University of Wisconsin–Madison.
In addition, the relationship between R2* (at both 1.5 T and 3.0 T) and R2 measurements (acquired at 1.5 T to calculate the reference LIC in this study) is depicted in Figure S1 (online).
Effect of Age on Calibration
We observed no significant effect of age on calibration (P = .72 and P = .08 for 1.5 T and 3.0 T, respectively, when comparing the models grouped vs ungrouped according to age).
R2*-based Detection of LIC Thresholds
The diagnostic performance of R2* in the detection of LIC thresholds is depicted in the ROC curves in Figure 4. At either field strength and for any of the clinically relevant thresholds (1.8, 3.2, 7.0, 15.0 mg/g), estimated areas under the ROC curve of 0.98 or higher were observed. Center-specific ROC results are included in Table S5 (online).

Figure 4: Receiver operating characteristic curves for R2*-based liver iron concentration (LIC) in the detection of several LIC thresholds. Results are shown pooled across all four centers for each of the two clinical field strengths, (A) 1.5 T and (B) 3.0 T (including both 2.89 T and 3.0 T). AUC = area under the receiver operating characteristic curve.
R2* Correlation across Field Strengths
The relationship between 1.5-T and 3.0-T R2* measurements is shown in Figure 5. Overall, a regression slope of 1.94 (95% CI: 1.91, 1.97) between 1.5-T and 3.0-T measurements was observed, with R² of 0.99. When considering only participants with LIC less than 15 mg/g, the slope was 1.99 (95% CI: 1.96, 2.02), with R² of 0.99. Detailed results for various LIC ranges and for each individual center are included in Table S6 (online).

Figure 5: Scatterplots of liver R2* measurements across the two field strengths, 1.5 T and 3.0 T, for each of the centers. (A) Plot shows 1.5-T versus 3.0-T R2* measurements for three centers (University of Wisconsin–Madison, University of Texas Southwestern, Stanford). (B) Plot shows 1.5-T versus 2.89-T R2* measurements for one center (Johns Hopkins University).
Intra- and Interreader Reproducibility
The intra- and interreader reproducibility of R2* measurements demonstrate small mean differences between measurements with narrow limits of agreement, as shown in Figure S2 (online).
Evaluation of Liver Fat
Of the 207 participants, 171 (83%) had valid PDFF measurements. Among these 171 participants, the average PDFF was 6.8%, with an SD of 8.6% (range, −3.8% to 39.5%).
Relationship between Serum Ferritin and LIC
The relationship between serum ferritin concentration and LIC is shown in Figure 6, including regression analysis (Fig 6A) and ROC analysis (Fig 6B). Serum ferritin demonstrated an R² of 0.50 with LIC and an area under the ROC curve of 0.86, 0.86, 0.84, and 0.94 in the detection of LIC thresholds of 1.8, 3.2, 7.0, and 15.0 mg/g, respectively.

Figure 6: Performance of serum ferritin (SF) concentration in the evaluation of liver iron concentration (LIC). (A) Scatterplot with linear regression between serum ferritin concentration and LIC. (B) Receiver operating characteristic curves for several LIC thresholds. The corresponding serum ferritin concentration thresholds determined using the Youden criterion are shown. AUC = area under the receiver operating characteristic curve, thres = threshold.
Discussion
Compared with R2 MRI, R2* MRI methods are widely available and have superior speed. However, the cross-vendor reproducibility of R2*-based liver iron concentration (LIC) estimation remains unknown. Thus, our objective was to report the results from a multicenter, multivendor study aimed at determining the R2*-LIC calibration for complex confounder-corrected R2* MRI at both 1.5 T and 3.0 T. Our results demonstrate high cross-vendor reproducibility (P > .43 across all comparisons), providing a common calibration relationship across vendors. The reported calibrations are not significantly different between children and adults. This work also provides R2* LIC calibration at both 1.5 T and 3.0 T, the two most widely relevant clinical field strengths, facilitating the widespread dissemination of R2*-based liver iron quantification with high diagnostic performance.
Importantly, the calibration data from this work overcome a major hurdle for widespread dissemination of R2*-based MRI for rapid, accurate, whole-liver LIC quantification. Indeed, R2*-based LIC quantification can be performed as part of a focused MRI protocol in approximately 5 minutes of MRI examination time (31). Focused protocols such as this are well suited for patients as young as 6 years, reducing the need for sedation and anesthesia, and may enable MRI-based liver iron quantification with reduced cost—similar to that of US (31).
Two separate approaches to calibration are reported herein: a field strength–specific calibration where 1.5-T and 3.0-T (including 2.89-T and 3.0-T) data are separately analyzed, and a unified calibration where data are analyzed jointly across the three field strengths. In this unified approach, the slope of the R2*-LIC calibration is inversely related to field strength. Both approaches lead to similar calibrations. For the unified calibration, any residual bias at one field strength (eg, at 3.0 T where R2* mapping is challenging with high iron concentration) may affect the calibration at the other field strengths (eg, at 1.5 T). Thus, we recommend the use of the separate calibrations (one at 1.5 T and one obtained from 2.89-T or 3.0-T data), but the unified calibration is available for those readers who prefer to use this calibration.
Our results also confirm a linear relationship between R2* and LIC, previously reported as part of single-center studies at 1.5 T (6–10,32) and 3.0 T (9). Previously reported calibrations using biopsy (6–8) or R2-based LIC (9,10,32) as the reference standard are very similar to our results. However, the study by Anderson et al (6), who reported results in 13 participants, deviates from all other curves for unclear reasons, possibly related to the small sample size or differences in the acquisition, reconstruction, or reference standard. Overall, however, the relationship between R2* and LIC in our current study is very similar to that found in most past work (7–10,33). This may be related to the observation that R2* quantification in the liver is generally reproducible, even with large differences in acquisition or reconstruction method (9,34,35). In our current study, there was a slight difference in field strength for one system (2.89 T, JHU). The observed difference in R2* measurements between 2.89 T and 3.0 T was consistent with expected changes predicted by the underlying physics, particularly when comparing R2* measurements in the same participant across field strengths. These differences, as predicted, are small and may not be clinically relevant relative to measurement variability, depending on the specific clinical or research use of this calibration. Regardless, different calibrations were calculated for consistency with the difference in field strengths, and we recommend the use of the R2* versus LIC calibration for the specific field strength in question. Finally, the high correlation coefficient between R2* measurements at 1.5 T and 3.0 T suggests extremely high precision, although this was not directly evaluated in our study.
We note that the chemical shift–encoded MRI method used to measure fat-corrected R2* in this work can also be used to measure iron-corrected PDFF. The specific acquisition used for R2* mapping in this study had a relatively high flip angle to maximize the signal-to-noise ratio and the dynamic range of R2*, which would lead to T1 bias in the quantification of PDFF. For this reason, a separate chemical shift–encoded acquisition with a low flip angle (5°) was used to evaluate PDFF. Although a full evaluation of liver fat is beyond the scope of this study, we expect that simultaneous fat and iron quantification is feasible with the proposed approach, either by using low flip angles or by correcting for residual T1 bias in PDFF.
Our study had limitations. First, the reference standard used for the calibration was not biopsy but rather a commercial U.S. Food and Drug Administration–approved R2-based MRI method previously calibrated against biopsy. Given the increasing use of MRI to detect and quantify liver iron overload, biopsy to quantify LIC is no longer the standard of care at most institutions; therefore, performing biopsy for the purposes of our study was not considered ethical. Second, we used Cartesian k-space sampling, which limits the minimum achievable echo time and, in turn, the upper bound of reliable R2* measurements. Cartesian methods have many favorable properties, including insensitivity to MRI hardware imperfections, and, as shown in our study, enable reliable R2* measurements for LIC values well above 15 mg/g, the accepted threshold for severe iron overload (27). Non-Cartesian ultrashort echo time R2* mapping methods show great promise for extending the dynamic range of R2* measurements (36). Ultrashort echo time methods, however, are not necessary for detection of iron overload and probably play a role only in accurate treatment monitoring at extreme LIC values. Third, we used a single, centralized reconstruction algorithm rather than each vendor’s native reconstruction algorithm, which may be evaluated in future work. Importantly, by establishing the reproducibility across R2* mapping acquisitions for different vendors, our study addresses the central challenge of reproducibility (ie, the data acquisition), because a confounder-corrected R2* reconstruction algorithm can be implemented by the various vendors with relative ease. In addition, the R2* measurements used in this work were based on multiple regions of interest located throughout the right lobe. Improved precision might be achievable through whole-liver (or whole-lobe) segmentation. Finally, our study was not powered to distinguish R2*-LIC calibrations among the various causes of iron overload or to assess the effects of chelation therapy or phlebotomy.
In conclusion, we report reproducible R2* mapping and R2*-based calibration for the quantification of liver iron concentration (LIC) across four institutions, three MRI vendors, seven MRI models, and three field strengths (1.5 T, 2.89 T, and 3.0 T). The calibrations developed from these data were highly reproducible and accurate for the detection of clinically meaningful LIC thresholds. These data therefore provide the calibrations necessary for broad dissemination of R2*-based LIC quantification on most clinically available MRI systems used in modern imaging centers.
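As an illustration of how the reported calibrations might be applied, the following minimal sketch converts a liver R2* value to LIC by using the linear calibrations given in the Abstract. The function names and the dictionary layout are hypothetical and are not part of the study software; only the intercepts and slopes are taken from the reported results.

```python
# Linear calibrations reported in the Abstract:
# LIC (mg/g dry weight) = intercept + slope * R2* (1/s).
CALIBRATIONS = {
    "1.5T": (-0.16, 2.603e-2),
    "2.89T": (-0.03, 1.400e-2),
    "3.0T": (-0.03, 1.349e-2),
}


def lic_from_r2star(r2star_per_s, field):
    """Estimate LIC (mg/g dry weight) from liver R2* (1/s) at a given field strength."""
    intercept, slope = CALIBRATIONS[field]
    return intercept + slope * r2star_per_s


def r2star_for_lic(lic_mg_per_g, field):
    """Invert the calibration: the R2* (1/s) that corresponds to a given LIC."""
    intercept, slope = CALIBRATIONS[field]
    return (lic_mg_per_g - intercept) / slope


# R2* corresponding to 15 mg/g, the threshold for severe iron overload
# cited in the text, at each field strength.
for field in CALIBRATIONS:
    print(f"{field}: R2* at 15 mg/g = {r2star_for_lic(15.0, field):.0f} 1/s")
```

Because the slope at 3.0 T is roughly half that at 1.5 T, a given LIC corresponds to approximately twice the R2* at the higher field strength, which is why field strength-specific calibrations are needed.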
Acknowledgments
The authors acknowledge Nikolaos Panagiotopoulos, MD, for assistance with data analysis. We also acknowledge Emily Ferris, Ben Johnson, and Martha Garcia for assistance with region-of-interest measurements.
Author Contributions
Author contributions: Guarantors of integrity of entire study, D.H., S.B.R.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, D.H., R.Z., S.R., L.M., M.R.J., I.R.K., T.Y., S.B.R.; clinical studies, D.H., Q.Y., M.A.G., R.J.M., M.R.J., I.R.K., S.V., T.Y., S.B.R.; experimental studies, D.H., R.Z., Q.Y., S.R., D.C.K., I.P., I.R.K.; statistical analysis, D.H., R.Z., X.M., L.M., I.R.K., T.Y., S.B.R.; and manuscript editing, D.H., R.Z., Q.Y., S.R., X.M., D.C.K., L.M., D.T.H., M.R.J., I.P., I.R.K., S.V., T.Y., S.B.R.
Supported by the National Institutes of Health (grants R01 DK100651, R01 DK117354, R01 DK083380, and K24 DK102595); GE Healthcare, which provides research support to the University of Wisconsin–Madison and Stanford University; Philips Healthcare, which provides research support to the University of Texas Southwestern and the Technical University of Munich; and Siemens Healthineers, which provides research support to Johns Hopkins University. S.B.R. is a Romnes Faculty Fellow and has received an award provided by the University of Wisconsin–Madison Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation.
Data sharing: Data generated or analyzed during the study are available from the corresponding author by request.
References
- 1. Hepatic iron concentration and total body iron stores in thalassemia major. N Engl J Med 2000;343(5):327–331.
- 2. Severity of iron overload in patients with sickle cell disease receiving chronic red blood cell transfusion therapy. Blood 2000;96(1):76–79.
- 3. Serum ferritin iron in iron overload and liver damage: correlation to body iron stores and diagnostic relevance. J Lab Clin Med 2000;135(5):413–418.
- 4. Liver biopsy results in patients with sickle cell disease on chronic transfusions: poor correlation with ferritin levels. Pediatr Blood Cancer 2008;50(1):62–65.
- 5. Variability in hepatic iron concentration measurement from needle-biopsy specimens. J Hepatol 1996;25(2):172–177.
- 6. Cardiovascular T2-star (T2*) magnetic resonance for the early diagnosis of myocardial iron overload. Eur Heart J 2001;22(23):2171–2179.
- 7. MRI R2 and R2* mapping accurately estimates hepatic iron concentration in transfusion-dependent thalassemia and sickle cell disease patients. Blood 2005;106(4):1460–1465.
- 8. R2* magnetic resonance imaging of the liver in patients with iron overload. Blood 2009;113(20):4853–4855.
- 9. Complex confounder-corrected R2* mapping for liver iron quantification with MRI. Eur Radiol 2021;31(1):264–275.
- 10. Prospective evaluation of an R2* method for assessing liver iron concentration (LIC) against FerriScan: derivation of the calibration curve and characterization of the nature and source of uncertainty in the relationship. J Magn Reson Imaging 2019;49(5):1467–1474.
- 11. Noninvasive measurement and imaging of liver iron concentrations using proton magnetic resonance. Blood 2005;105(2):855–861.
- 12. Multicenter validation of spin-density projection-assisted R2-MRI for the noninvasive measurement of liver iron concentration. Magn Reson Med 2014;71(6):2215–2223.
- 13. The effect of reducing repetition time TR on the measurement of liver R2 for the purpose of measuring liver iron concentration. Magn Reson Med 2011;65(5):1346–1351.
- 14. Classification Request for Ferriscan R2-MRI Analysis System Decision Summary. https://www.accessdata.fda.gov/cdrh_docs/reviews/K124065.pdf. Accessed September 23, 2021.
- 15. Resonance Health. https://ferriscan.com/ferriscan/. Accessed September 23, 2021.
- 16. Effect of multipeak spectral modeling of fat for liver iron and fat quantification: correlation of biopsy with MR imaging results. Radiology 2012;265(1):133–142.
- 17. Multipeak fat-corrected complex R2* relaxometry: theory, optimization, and clinical validation. Magn Reson Med 2013;70(5):1319–1331.
- 18. Improved MRI R2* relaxometry of iron-loaded liver with noise correction. Magn Reson Med 2013;70(6):1765–1774.
- 19. Inter-method reproducibility of biexponential R2 MR relaxometry for estimation of liver iron concentration. Magn Reson Med 2018;80(6):2691–2701.
- 20. Spectroscopy-based multi-parametric quantification in subjects with liver iron overload at 1.5T and 3T. Magn Reson Med 2022;87(2):597–613.
- 21. Quantification of hepatic steatosis with T1-independent, T2-corrected MR imaging with spectral modeling of fat: blinded comparison with MR spectroscopy. Radiology 2011;258(3):767–775.
- 22. Liver fat quantification using a multi-step adaptive fitting approach with multi-echo GRE imaging. Magn Reson Med 2014;72(5):1353–1365.
- 23. Correction of phase errors in quantitative water-fat imaging using a monopolar time-interleaved multi-echo gradient echo sequence. Magn Reson Med 2017;78(3):984–996.
- 24. Limits of fat quantification in the presence of iron overload. J Magn Reson Imaging 2021;54(4):1166–1174. [Published correction appears in J Magn Reson Imaging 2022;55(6):1910.]
- 25. Statistical analysis of correlated data using generalized estimating equations: an orientation. Am J Epidemiol 2003;157(4):364–375.
- 26. Reference limits for copper and iron in liver biopsies. Ann Clin Lab Sci 2003;33(4):443–450.
- 27. Iron-chelating therapy and the treatment of thalassemia. Blood 1997;89(3):739–761.
- 28. Hereditary hemochromatosis. Phenotypic expression of the disease. N Engl J Med 1979;301(4):175–179.
- 29. Survival and causes of death in cirrhotic and in noncirrhotic patients with primary hemochromatosis. N Engl J Med 1985;313(20):1256–1262.
- 30. Late cardiac complications of chronic, severe, refractory anemia with hemochromatosis. Circulation 1964;30(5):698–705.
- 31. Clinical implementation of a focused MRI protocol for hepatic fat and iron quantification. AJR Am J Roentgenol 2019;213(1):90–95.
- 32. Biopsy-based calibration of T2* magnetic resonance for estimation of liver iron concentration and comparison with R2 Ferriscan. J Cardiovasc Magn Reson 2014;16(1):40.
- 33. R2* relaxometry for the quantification of hepatic iron overload: biopsy-based calibration and comparison with the literature. Rofo 2015;187(6):472–479.
- 34. Measuring liver T2* and cardiac T2* in a single acquisition. Abdom Radiol (NY) 2018;43(9):2303–2308.
- 35. Reproducibility of liver R2* quantification for liver iron quantification from cardiac R2* acquisitions. Abdom Radiol (NY) 2021;46(9):4200–4209.
- 36. Quantitative ultrashort echo time imaging for assessment of massive iron overload at 1.5 and 3 Tesla. Magn Reson Med 2017;78(5):1839–1851.
Article History
Received: Dec 29 2021
Revision requested: Feb 25 2022
Revision received: July 22 2022
Accepted: Aug 8 2022
Published online: Oct 04 2022