Implementation of the Radiological Society of North America Expert Consensus Guidelines on Reporting Chest CT Findings Related to COVID-19: A Multireader Performance Study

Published Online: https://doi.org/10.1148/ryct.2020200276

Abstract

Purpose

To assess the performance of the Radiological Society of North America (RSNA) guidelines and quantify interobserver variability in application of the guidelines in patients undergoing chest CT for suspected coronavirus disease 2019 (COVID-19) pneumonia.

Materials and Methods

A retrospective search from January 15, 2020 to March 30, 2020 identified 89 consecutive CT scans whose radiologic report mentioned COVID-19. One positive or two negative reverse-transcription polymerase chain reaction tests for COVID-19 were considered the reference standard for diagnosis. Each chest CT scan was evaluated using RSNA guidelines by nine readers (six fellowship-trained thoracic radiologists and three radiology resident trainees). Clinical information was obtained from the electronic medical record.

Results

There was strong concordance of findings between radiology training levels with agreement ranging from 60% to 86% among attending physicians and trainees (κ, 0.43 to 0.86). Sensitivity and specificity of typical CT findings for COVID-19 per the RSNA guidelines were on average 86% (range, 72%–94%) and 80.2% (range, 75%–93%), respectively. Combined typical and indeterminate findings had a sensitivity of 97.5% (range, 94%–100%) and specificity of 54.7% (range, 37%–62%). A total of 163 disagreements were seen out of 801 observations (79.6% total agreement). Uncertainty in classification primarily derived from difficulty in ascertaining peripheral distribution, multiple dominant disease processes, or minimal disease.

Conclusion

The typical appearance category for COVID-19 CT reporting has an average sensitivity of 86% and specificity rate of 80%. There is reasonable interreader agreement and good reproducibility across various levels of experience.

Supplemental material is available for this article.

Keywords: Adults, CT, Infection

© RSNA, 2020

Summary

In a large multireader study involving nine attending physicians and resident trainees, assignment of the typical RSNA category had strong concordance of findings across levels of training, with agreement ranging from 60% to 86%. The average sensitivity was found to be 86% (range, 72%–94%), and average specificity was 80.2% (range, 75%–93%) for diagnosis of COVID-19 pneumonia, and assignment of typical or indeterminate categories had an average sensitivity of 97.5% (range, 94%–100%) and specificity of 54.7% (range, 37%–62%); commonly reported sources of uncertainty in assignment of categories were difficulty in assessing axial distribution and the presence of two or more patterns of disease.

Key Points

  • Sensitivity and specificity of typical appearance for COVID-19 pneumonia at chest CT per RSNA guidelines were 86% (range, 72%–94%) and 80.2% (range, 75%–93%), respectively.

  • There was strong concordance of findings between training levels, with agreement ranging from 60% to 86% among attending physicians and trainees (κ, 0.43 to 0.86).

  • Future guideline revisions should consider addressing reader uncertainty regarding assessment of axial distribution, the presence of multiple perceived patterns, and other potential sources of reader disagreements.

Introduction

Coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has become a global health emergency. Chest CT has played a variety of roles in the course of the pandemic, including primary diagnosis, clinical problem solving, and assessment of potential complications. Commonly reported CT features of COVID-19 pneumonia include peripheral ground-glass opacities with or without consolidation, sometimes with an organizing lung injury appearance (1–3). These findings are nonspecific and can be seen in a variety of infectious and noninfectious etiologies (4–7). Early reports on the diagnostic performance of CT for detection of COVID-19 pneumonia vary substantially, with reported sensitivity ranging from 60% to 98% and specificity ranging from 25% to 53% (8–10). The role of chest CT in screening or primary diagnosis of COVID-19 pneumonia in locales in which polymerase chain reaction (PCR) testing is readily available is still evolving; however, chest CT plays an important role in assessing for potential complications and guides management in difficult COVID-19 cases (11,12).

Practice patterns vary across institutions and reporting of COVID-19 has not yet been universally standardized. The Radiological Society of North America (RSNA) has recently released reporting consensus guidelines for CT findings related to COVID-19, with the goal of decreasing reporting variability, reducing uncertainty in reporting, and assisting referring providers to better understand the radiologic findings (13). The guidelines contain four major categories based on the presence or absence of commonly described imaging features of COVID-19 pneumonia. Although an alternative option such as the COVID-19 Reporting and Data System (or CO-RADS) for categorizing CT scans has been more recently published, the RSNA consensus guidelines have been the most widely disseminated and would benefit from multireader validation during the implementation (2).

The utility of implementing these reporting guidelines in radiology practices remains unclear, and the sensitivity, specificity, and interreader variability of the four categories have not been studied in detail. There are limited data on the level of interreader agreement, sensitivity, and specificity, with early data suggesting moderate disagreement (14). Without these empirical data, it remains difficult for radiologists to accurately convey the level of suspicion raised by the CT findings and for referring clinicians to understand the clinical relevance of this information.

Indeed, understanding the diagnostic yield of a specific category may help referring clinicians understand the degree of radiologist concern for COVID-19 and the radiologist’s confidence in the findings. This may influence pursuit of further diagnostic tests for COVID-19 or diagnostic workup or management for possible alternative causes of symptoms.

Imaging features of COVID-19 pneumonia are not uniform and can vary considerably, making implementation of the RSNA guidelines potentially challenging. Commonly described patterns of disease in COVID-19 are not specific to the disease and can be seen in other infections and inflammatory diseases. Patients with COVID-19 can present with negative, minimal, or atypical CT findings, or with CT findings of more than one disease. The diagnosis of COVID-19 at CT may be made by radiologists at multiple levels of training, such as the radiology resident in the emergency department, or by radiologists with varying degrees of thoracic radiology specialization. Thus, the interreader reproducibility of findings related to COVID-19 across training levels and specialization is uncertain. The purpose of this study, therefore, was to investigate the sensitivity and specificity of the RSNA/Society of Thoracic Radiology/American College of Radiology reporting categories for COVID-19 pneumonia and to assess interreader agreement.

Materials and Methods

Study Design and Setting

This retrospective study was performed at a large, quaternary academic medical center and associated health care system. This study was approved by the institutional review board with a waiver of informed consent, and patient privacy was ensured in compliance with the Health Insurance Portability and Accountability Act. All procedures and practices were in accordance with the Declaration of Helsinki.

Study Population

We queried our electronic imaging database for chest CT examinations performed between January 15, 2020 and March 30, 2020 and included those in which COVID-19 pneumonia was suspected, based either on clinical indication or on radiologist suspicion as indicated in the radiology report. The reference standard for a positive diagnosis of COVID-19 was at least one positive reverse-transcription PCR (RT-PCR) test for COVID-19 via nasopharyngeal swab, and the reference standard for a negative diagnosis was two consecutive negative RT-PCR results. In our health care system, CT scans have been used as a clinical problem-solving tool rather than for screening or primary diagnosis of COVID-19. Studies were originally ordered because of suspicion for COVID-19 despite a single negative RT-PCR result (while waiting for a second test to result), concern for alternative diagnoses such as pulmonary embolism or bacterial pneumonia, or different indications such as malignancy staging with incidental findings suggestive of COVID-19 infection. Any nondiagnostic studies were excluded. A total of 123 patients were identified by a CT report or report indication including the word “COVID.” Of these, 89 patients had a chest CT and either a positive PCR test at any time prior to analysis or at least two negative PCR tests, and were included for analysis. In the same time period, there were 711 PCR-positive cases of COVID-19. In keeping with national and international guidelines, CT was used for clinical problem solving and not as a primary diagnostic modality for COVID-19 at our center, which introduces selection bias into the sensitivity and specificity results. Ten patients were excluded for having a chest CT and only one negative PCR test, and four patients were excluded for having a chest CT and no COVID-19 PCR test. Nine patients were identified in the report search but were excluded because only an abdominal/pelvic CT was available, and 11 patients were excluded because data were missing from the medical record or unavailable at chart review. One of the patients with missing data was found on repeat chart review to be COVID-19 positive but was excluded from analysis because the patient was not included in the original reader study. During this time period at our institution, criteria for PCR testing included symptoms related to COVID-19, such as cough, shortness of breath, or fever, or being hospitalized. At the beginning of this period, due to testing restrictions, recent travel history from an endemic area or exposure to a person with known COVID-19 was required to receive a COVID-19 test. Two patients therefore underwent a CT scan for concern for COVID-19 but were excluded because no test was performed. One patient had a test sent to an outside laboratory that never resulted and was excluded. One patient had a finding concerning for possible COVID-19 during staging for cholangiocarcinoma that did not fit the patient’s symptoms; the patient was to undergo a follow-up scan in 6–8 weeks but died in the interim. Clinical characteristics such as demographic data, symptoms, and comorbidities were obtained from the electronic medical record. Inability to obtain clinical data led to exclusion from the study.
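The cohort accounting described above can be checked with a few lines of arithmetic. The following is a minimal sketch; the dictionary keys are illustrative labels chosen here, not terms from the study, and the counts are the four exclusion groups stated in the text.

```python
# Cohort accounting from the Study Population text.
# Key names are hypothetical labels; counts are taken from the paragraph above.
identified = 123

exclusions = {
    "chest_ct_single_negative_pcr": 10,
    "chest_ct_no_pcr": 4,
    "abdominal_pelvic_ct_only": 9,
    "missing_clinical_data": 11,
}

excluded = sum(exclusions.values())
included = identified - excluded

print(f"excluded={excluded}, included={included}")
```

The four exclusion groups sum to 34 patients, which leaves the 89 patients analyzed.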

CT Imaging Technical Parameters

Protocols of CT scan varied per patient and included noncontrast chest CT, contrast-enhanced chest CT, or CT pulmonary angiography studies. CT pulmonary angiography studies had, for a subset, dual-energy scans as well. All images were obtained with the patient in supine position using one of the following CT systems: Optima CT660 (GE, United States), SOMATOM Drive (Siemens Healthineers, Germany), Revolution Frontier (GE, United States), Lightspeed VCT (GE, United States), Biograph 64 (Siemens Healthineers, Germany), SOMATOM Definition Edge (Siemens Healthineers, Germany), Discovery CT750 HD (GE, United States), SOMATOM Definition Flash (Siemens Healthineers, Germany), SOMATOM Definition AS (Siemens Healthineers, Germany), SOMATOM Force (Siemens Healthineers, Germany), and Aquilion PRIME (Toshiba, Japan). The main scanning parameters were as follows: tube voltage = 120 kVp for chest CT and 140 kVp for chest CT pulmonary angiography (plus 80 kVP for dual energy), matrix = 512 × 512, section thickness = 1.25 mm, and field of view = 440 mm × 440 mm.

Radiology Readers and Preparation

Six thoracic fellowship-trained radiology attending physicians (B.P.L., S.M., S.G., A. Sharma, D.P.M., E.J.F.) with 1–15 years of independent clinical practice experience, subdivided into senior (> 5 years of experience; B.P.L., S.M., A. Sharma, and E.J.F.) and junior (< 5 years of experience; D.P.M. and S.G.) attending physicians, and three radiology residents (postgraduate year 2 to 4; A. Som, M.L., M.D.L.) independently reviewed all CT studies using standard picture archiving and communication system stations and software with standard window settings. Radiology residents were included as readers because they often provide independent preliminary reports while on call, and therefore understanding the consistency of their reads with attending physicians’ reports is important. Readers were not allowed to use prior CT or follow-up CT scans to make their assessment. Each reader assigned one category from the RSNA consensus document to each study. In addition, readers reported a 0–5 score for certainty for classification of a scan into the selected RSNA category, where 5 was most certain and 0 was least certain. Reasons for uncertainty or for selection of indeterminate or atypical patterns were reported using a free-text response.

Prior to CT review, radiologists studied the RSNA consensus guideline document, reviewed sample images, and had prior experience in reviewing and reporting CT examinations of patients with COVID-19 in our health care system. The radiology trainees were given a 1-hour tutorial of the RSNA consensus guidelines with sample images from the consensus document. All radiologists were blinded to the original CT reports and to clinical diagnoses, including the PCR results for SARS-CoV-2.

RSNA Criteria

Consistent with the consensus guidelines, each examination was labeled as having typical appearance, indeterminate appearance, atypical appearance or no evidence of pneumonia. Briefly, as described in more detail in the consensus guideline publication, peripheral bilateral ground-glass opacities with or without consolidation or intralobular lines, multifocal ground-glass opacity with rounded morphology with or without consolidation, or reverse halo sign were assigned the category of typical appearance. An indeterminate appearance was defined as absence of typical features and presence of ground-glass opacities with or without consolidation in a nonrounded, nonperipheral, perihilar, or diffuse distribution, or as few small ground-glass opacities with a nonrounded and nonperipheral distribution. An atypical appearance was defined as absence of typical or indeterminate features with presence of lobar/segmental consolidation without ground-glass opacities, discrete centrilobular nodules, lung cavitation, or smooth interlobular septal thickening with pleural effusion. Finally, if there were no CT findings to suggest pneumonia, it was assigned the category of negative for pneumonia. Examples of unanimous reader agreement for typical, indeterminate, atypical, and negative for COVID-19 pneumonia RSNA categories are shown in Figure 1.
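The order in which the four categories are considered can be sketched in code. This is a deliberately simplified illustration, not the guideline itself and not the readers' method: the `Findings` fields and the `rsna_category` function are hypothetical names introduced here, each finding is reduced to a yes/no flag, and a scan with none of the listed findings is treated as negative for pneumonia. Real reads involve judgment about distribution and morphology that boolean flags cannot capture.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical yes/no summary of one chest CT (illustrative only)."""
    ggo: bool = False                      # any ground-glass opacity
    peripheral_bilateral: bool = False     # GGO is peripheral and bilateral
    multifocal_rounded: bool = False       # GGO is multifocal with rounded morphology
    reverse_halo: bool = False
    lobar_consolidation_without_ggo: bool = False
    centrilobular_nodules: bool = False
    cavitation: bool = False
    septal_thickening_with_effusion: bool = False


def rsna_category(f: Findings) -> str:
    """Apply the category definitions in order: typical, indeterminate, atypical, negative."""
    if f.reverse_halo or (f.ggo and (f.peripheral_bilateral or f.multifocal_rounded)):
        return "typical"
    # GGO without typical features (nonrounded, nonperipheral, perihilar, diffuse, or few small)
    if f.ggo:
        return "indeterminate"
    if (f.lobar_consolidation_without_ggo or f.centrilobular_nodules
            or f.cavitation or f.septal_thickening_with_effusion):
        return "atypical"
    return "negative for pneumonia"


print(rsna_category(Findings(ggo=True, peripheral_bilateral=True)))
print(rsna_category(Findings(ggo=True)))
print(rsna_category(Findings(centrilobular_nodules=True)))
print(rsna_category(Findings()))
```

Checking typical features first mirrors the definitions above, in which indeterminate requires the absence of typical features and atypical requires the absence of both. The sketch says nothing about scans with two coexisting patterns, which is one of the situations in which readers in this study disagreed.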

Figure 1: Examples of cases assigned the same RSNA consensus COVID-19 category by all readers. A, Typical category assigned to CT study in a 53-year-old man with COVID-19 pneumonia who presented after 2 weeks of cough, congestion, and fevers. Axial CT image shows multiple ground-glass opacities with a peripheral predominance bilaterally, many with a round morphology. B, Indeterminate category assigned to CT study in an 82-year-old woman who presented with fever, exertional dyspnea, palpitations, and chest pain, with two polymerase chain reaction (PCR) tests negative for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Axial CT image shows a small amount of ground-glass opacity with a central predominance in the perihilar regions bilaterally. C, Atypical category assigned to CT study in a 79-year-old woman who presented with fever, productive cough, dyspnea, and hypoxemia; two PCR tests were negative for SARS-CoV-2. Axial CT image shows tree-in-bud nodules and consolidation in the lower lobes bilaterally, a pattern suggesting aspiration/pneumonia. D, Negative for pneumonia category assigned to CT study in a 30-year-old woman who presented with 1 week of dry cough, sore throat, and severe fatigue; two PCR tests were negative for SARS-CoV-2. Axial CT image shows a normal appearance of the lungs. Final diagnosis of symptoms was attributed to recurrent rheumatic myopericarditis within the context of her history of juvenile rheumatoid arthritis.

Grading Uncertainty and Atypical or Indeterminate Findings

Reader free-text responses for reasons for uncertainty and for atypical or indeterminate findings were collated and blinded and were subsequently reviewed by two thoracic fellowship-trained radiologists, who by consensus discussion developed a coding scheme to capture the main themes mentioned by the readers. The same two radiologists then sorted each response into these thematic categories while blinded to reader information, clinical data, and the CT images.

Statistical Analysis

The data were analyzed with descriptive statistics, including mean and standard deviation, with categorical variables reported as frequencies. The κ score was used for analysis of interrater agreement (≤ 0 indicates no agreement, 0.01–0.20 none to slight agreement, 0.21–0.40 fair agreement, 0.41–0.60 moderate agreement, 0.61–0.80 substantial agreement, and 0.81–1.00 almost perfect agreement). κ interrater agreement was defined in comparison to the mode (majority consensus) of attending responses; all trainee and attending responses were compared with this mode. Sensitivity and specificity analyses were done for each individual reader, with averages calculated for attending physicians and trainees. Consensus reads were calculated from the mode of attending observations. Positive predictive value, negative predictive value, accuracy, and diagnostic yield of the consensus reads were calculated with RT-PCR results as the reference standard. Diagnostic yield was defined as the number of SARS-CoV-2 PCR-positive patients with a CT designated as positive (typical or indeterminate RSNA categories) divided by the total number of patients in the study population. Averages and 95% CIs were calculated for the individual sensitivities and specificities. Statistical significance was set at P < .05. An exploratory logistic regression was performed to examine a composite clinical outcome of intensive care unit (ICU) admission or intubation as a function of age, sex, disagreement in RSNA classification, and RSNA grade. Statistical analysis was performed using Stata software (Stata, College Station, Tex).
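The agreement analysis can be illustrated with a short sketch. The study used Stata; the Python below is only an illustration under stated assumptions: κ is taken to be unweighted Cohen κ (the text does not say whether a weighted variant was used), ties in the attending mode are broken by first occurrence (also not specified), and the reads are made-up toy data, not study data. The function names are hypothetical.

```python
from collections import Counter


def attending_mode(grades):
    """Most frequent category among attending reads for one scan.
    Ties go to the category seen first (an assumption, not from the paper)."""
    return Counter(grades).most_common(1)[0][0]


def cohen_kappa(reader, reference):
    """Unweighted Cohen kappa between one reader and a reference rating."""
    n = len(reader)
    observed = sum(a == b for a, b in zip(reader, reference)) / n
    reader_counts, ref_counts = Counter(reader), Counter(reference)
    categories = set(reader) | set(reference)
    expected = sum(reader_counts[c] * ref_counts[c] for c in categories) / n ** 2
    if expected == 1.0:  # both raters used a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)


def interpret_kappa(k):
    """Agreement bands as listed in the Statistical Analysis text."""
    if k <= 0:
        return "no agreement"
    if k <= 0.20:
        return "none to slight agreement"
    if k <= 0.40:
        return "fair agreement"
    if k <= 0.60:
        return "moderate agreement"
    if k <= 0.80:
        return "substantial agreement"
    return "almost perfect agreement"


# Toy data: six attending reads for each of five scans (not study data).
attending_reads = [
    ["typical"] * 5 + ["indeterminate"],
    ["indeterminate"] * 4 + ["atypical"] * 2,
    ["atypical"] * 6,
    ["negative"] * 5 + ["atypical"],
    ["typical"] * 4 + ["indeterminate"] * 2,
]
consensus = [attending_mode(reads) for reads in attending_reads]
trainee = ["typical", "typical", "atypical", "negative", "typical"]

kappa = cohen_kappa(trainee, consensus)
print(round(kappa, 3), interpret_kappa(kappa))
```

In this toy example the trainee matches the attending mode on four of five scans, and chance-corrected agreement falls in the substantial band.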

Results

A total of 89 patients with CT scans meeting inclusion criteria were included in this study. The population had a mean age of 60.8 years ± 16.1 (standard deviation) and 41 (46%) were women (Table 1). Sixty-four (72%) patients reported race as White, and the majority had presenting symptoms of fever (57%), cough (60%), and shortness of breath (58%). The most common comorbidity was hypertension (53%). Of the patients included, 36 (40.4%) tested positive for COVID-19 by RT-PCR, and 53 (59.6%) were negative for COVID-19 infection. On average, CT scans were performed 6.9 days (95% CI: 5.3, 8.4) from start of reported symptoms. Eighty-four (94.4%) patients were admitted, with 15 (17%) requiring admission to the ICU; six of these ICU patients had a positive COVID-19 PCR test. At the time of analysis, 64 (72%) had been discharged, 22 of whom had a positive COVID-19 PCR test, and eight (9%) were deceased, five of whom had a positive COVID-19 PCR test (Table 1). For patients who tested negative for COVID-19 at PCR, final diagnoses, where available, included bacterial pneumonia, viral (respiratory syncytial virus or parainfluenza virus) pneumonia, sickle cell crisis, diffuse large B-cell lymphoma, heart failure exacerbation or myocardial infarction, and asthma exacerbation, among others. Of the 10 patients with chest CT and only one negative PCR result who were excluded from the study, four (40%) were men, the average age was 45 years (range, 21–90 years), none were intubated, none went to the ICU, and the majority were diagnosed with unspecified pneumonia or viral upper respiratory infection. For all 34 excluded patients (with comparison to the inclusion group), the average age at time of scan was 56 years ± 21 (standard deviation) (P = .18, two-tailed t test). Twenty-six of the group had information on sex, of whom 13 of 26 (50%) were women (P = .72, χ2 test). Twenty-three had ethnicity and clinical outcome results: 21 of 23 (91%) were White, one of 23 (4%) Hispanic, and one of 23 (4%) Asian American (P = .37, χ2 test). One (4%) patient was admitted to the ICU (P = .18, Fisher exact test), and none were deceased (P = .2, Fisher exact test). There was no significant difference between these characteristics of the included and excluded patients.

Table 1: Summary of Cohort Characteristics in 89 Patients with CT Scans Meeting Inclusion Criteria

According to the majority (mode) of attending grades, 37 (41.5%) of patient CT scans were graded as typical, 24 (27%) indeterminate, 20 (22.5%) atypical, and eight (9%) negative for pneumonia (Table 2). Of patients in the typical group, 30 (81.1%) had a positive RT-PCR result and seven (18.9%) were negative for COVID-19 infection; in the indeterminate group, six (25%) were positive by RT-PCR, and 18 (75%) were negative; in the atypical group, 0 (0%) were positive by RT-PCR, and 20 (100%) were negative; in the negative for pneumonia group, 0 (0%) were positive by RT-PCR, and eight (100%) were negative (Table 2).

Table 2: Findings on Chest CT Studies in 89 Patients

The average sensitivity and specificity of attending readers for a typical finding were 86% (95% CI: 79.8, 92.2) and 80.2% (95% CI: 70.2, 90.1), respectively. Sensitivity and specificity of typical and indeterminate grouped categorization, on average, were 97.5% (95% CI: 95.1, 99.9) and 54.7% (95% CI: 47.3, 62), respectively. Sensitivity and specificity of indeterminate categorization were, on average, 14.2% (95% CI: 7.9, 20.5) and 69.8% (95% CI: 64.7, 74.9), respectively. Sensitivity and specificity of atypical categorization were 2% (95% CI: 0.04, 4) and 57% (95% CI: 47.3, 67.4), respectively (Table 3). No cases classified as negative for pneumonia were associated with positive RT-PCR results.

Table 3: Sensitivity/Specificity by Group: Sensitivity and Specificity by Attending Physicians and Trainees

When the mode of attending responses was considered a consensus diagnosis, sensitivity and specificity were similar to those of the average of different attending readers. Typical finding for COVID-19 at CT was 83.3% (range, 72%–94%) sensitive and 86.8% (range, 58%–93%) specific for a diagnosis of COVID-19 by RT-PCR. Grouping typical and indeterminate classifications resulted in 100% (range, 94%–100%) sensitivity and 53% (range, 37%–69%) specificity. The distribution of sensitivity and specificity was roughly the same between attending physicians, senior and junior, and trainees, despite large differences in training (Table 3).

Among attending physicians, as a consensus, the positive predictive value of typical findings in this population was 81.1% with a negative predictive value of 88.5%. The positive predictive value was 59% for typical and indeterminate findings, 25% for indeterminate findings, and 0% for atypical findings. The negative predictive value was 100% for typical and indeterminate findings, 53.8% for indeterminate findings, and 41% for atypical findings. The diagnostic yield in this retrospective study for a positive PCR among attending physicians as a consensus was 33.7% for typical findings and 40.4% for typical or indeterminate findings. The diagnostic accuracy in this retrospective study for a positive PCR among attending physicians as a consensus was 85.4% for typical findings, 71.9% for typical and indeterminate findings, 46% for indeterminate findings, and 30.9% for atypical findings.
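The consensus figures for the typical category, and for typical and indeterminate combined, can be recomputed from the category counts given with Table 2. The sketch below is an illustration rather than the authors' analysis code; the `counts` layout and the `metrics` function are hypothetical names, and a category set is treated as a positive CT result in the sense of the diagnostic yield definition in the Statistical Analysis section.

```python
# (PCR-positive, PCR-negative) patients per attending-consensus category,
# taken from the text accompanying Table 2.
counts = {
    "typical": (30, 7),
    "indeterminate": (6, 18),
    "atypical": (0, 20),
    "negative": (0, 8),
}


def metrics(positive_categories):
    """Diagnostic metrics when the given categories count as a positive CT."""
    tp = sum(pos for cat, (pos, neg) in counts.items() if cat in positive_categories)
    fp = sum(neg for cat, (pos, neg) in counts.items() if cat in positive_categories)
    fn = sum(pos for cat, (pos, neg) in counts.items() if cat not in positive_categories)
    tn = sum(neg for cat, (pos, neg) in counts.items() if cat not in positive_categories)
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "yield": tp / total,  # PCR-positive patients with a positive CT / all patients
    }


typical = metrics({"typical"})
combined = metrics({"typical", "indeterminate"})

for name, result in (("typical", typical), ("typical or indeterminate", combined)):
    print(name, {k: round(100 * v, 1) for k, v in result.items()})
```

With typical alone counted as positive, the counts are 30 true positives, 7 false positives, 6 false negatives, and 46 true negatives, which gives the 83.3% sensitivity, 86.8% specificity, 81.1% positive predictive value, 88.5% negative predictive value, 85.4% accuracy, and 33.7% yield reported for the consensus read. Adding the indeterminate category raises sensitivity to 100% and lowers specificity to about 53%.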

Classification of patients by reader was roughly similar between different groups (Fig 2, A). A total of 163 disagreements were seen out of 801 observations (79.6% total agreement). Using the mode (majority) of classifications from attending readers as a consensus comparison, interrater agreement among attending physicians was moderate to high, with a κ ranging from 0.43 to 0.86 and a range of agreement from 61% to 89% (Table 3).
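The overall agreement figure follows directly from the number of reads. The sketch below reproduces the arithmetic and shows one way disagreements could be tallied per scan; counting a disagreement as a read that differs from the attending mode is an assumption made here, and `count_disagreements` is a hypothetical helper, not a function from the study.

```python
def count_disagreements(reads_by_scan, consensus):
    """Number of reads per scan that differ from that scan's consensus category."""
    return [sum(read != ref for read in reads)
            for reads, ref in zip(reads_by_scan, consensus)]


# Figures reported in the text.
scans, readers, disagreements = 89, 9, 163

observations = scans * readers                    # one read per reader per scan
overall_agreement = 1 - disagreements / observations
disagreements_per_scan = disagreements / scans

print(observations, round(overall_agreement, 4), round(disagreements_per_scan, 2))

# Toy usage of the helper (made-up reads, not study data).
toy = count_disagreements(
    [["typical", "typical", "indeterminate"], ["atypical", "atypical", "atypical"]],
    ["typical", "atypical"],
)
print(toy)
```

Nine readers and 89 scans give 801 reads, of which 638 agreed (about 0.7965, reported as 79.6%), and 163 disagreements over 89 scans is about 1.8 per scan, consistent with the per-case average reported later in the Results.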

Figure 2: Reasoning for atypical/indeterminate RSNA score among attending physicians and trainees. A, Distribution of scores among different readers. B, Percentage of cases with particular reasons for being assigned a category of indeterminate or atypical. GGO = ground-glass opacity.

There were 21 cases with at least three disagreements among attending physicians. The most common disagreement was between indeterminate and atypical categories. Eleven patients (52%) had a consensus grade among attending physicians of indeterminate. The second most common RSNA grade for these patients was typical for eight patients, and atypical for three patients. Three (14%) had an RSNA consensus grade of no pneumonia; the second most common RSNA grade for all of these patients was atypical. Three (14%) had a consensus grade of atypical, of which two had the second most common classification of indeterminate, and one had the classification of no pneumonia. Four (19%) had the consensus classification of typical, for which the second most common classification was atypical (two patients) or indeterminate (two patients). Based on uncertainty comments, most of the indecision in these cases stemmed from having too few findings on which to base a decision, which particularly made it difficult to choose between typical and indeterminate categories.

Trainees also tended to agree moderately well with the attending modes for assigned category, with a κ of 0.62 to 0.77 and an agreement of 74%–84%. The primary reasons for a patient to be classified as atypical or indeterminate were a diffuse or unclear distribution, a finding of tree-in-bud or pure centrilobular nodules, focal consolidation, and pleural effusions. Less common reasons included few ground-glass opacities, unilateral or central opacities, atelectasis, septal thickening, and cavitary or infarctlike lesions (Fig 2, B).

An exploratory multivariable logistic regression analysis was done to assess whether a study having multiple reader disagreements was associated with a composite outcome of intubation or ICU admission. Although the results did not reach statistical significance, the odds ratios for having three disagreements (odds ratio, 0.39; P = .317) or two disagreements (odds ratio, 0.18; P = .13) trended toward a protective effect (Table E1 [supplement]). This may be because patients with disagreements tended to have less extensive pulmonary findings and were thus less likely to have a negative outcome. Further study with a larger sample is necessary to evaluate this hypothesis.

Uncertainty among trainees and attending physicians tended to be associated with two or more dominant findings suggestive of multiple processes, minimal disease, and an ambiguous distribution or morphology of findings. Example sections from scans with significant interrater disagreement can be seen in Figure 3. Other sources of uncertainty included atelectasis, nodule morphology, limitations of technique, presence of pre-existing disease, or peribronchiolar pattern suggestive of organizing pneumonia (Fig 4, A). The self-reported certainty score on a scale of 1 (most unsure) to 5 (most confident) tended to be between 4 and 5 on average, without a significant difference between attending physicians and trainees (Fig 4, B). Although the absolute delta in average uncertainty is small, on average, senior attending physicians had less uncertainty in their classifications than trainees (P = .0011, two-tailed t test) or junior attending physicians (P = .0001, two-tailed t test) (Fig 4, B). The average number of disagreements per case was 1.8 ± 0.17 (standard error). The plurality of cases did not have any disagreements, though the majority of cases had at least one reader disagree on characterization of cases (Fig 4, C). Of note, the uncertainty score for indeterminate cases was significantly reduced compared with scores for typical cases (Fig 4, D).

Figure 3: Examples of cases for which there was significant disagreement in assignment of RSNA consensus COVID-19 category. A, Image in a 67-year-old man with clinical signs of pneumonia and four negative polymerase chain reaction (PCR) tests for COVID-19 with sputum samples positive for Streptococcus pneumoniae. Axial CT image shows a combination of tree-in-bud centrilobular nodules in the lower lobes and peripheral ground-glass opacity and consolidation in the left lower lobe. Categories 3, 2, and 1 were assigned by four, three, and two readers, respectively. B, Image in a 23-year-old man with two negative PCR results for severe acute respiratory syndrome coronavirus 2 and presumed aspiration or non–COVID-19 infection. Axial CT image shows minimal patchy ground-glass opacities in the left lower lobe; there was a question of atelectasis or subtle peripheral ground-glass opacity in the posterior right lower lobe. Categories 3, 2, 1, and 0 were assigned by one, six, one, and one reader, respectively. C, Image in a 64-year-old woman with PCR-proven COVID-19 pneumonia who presented with fever, productive cough, fatigue, and anosmia. Axial CT image shows patchy ground-glass opacities in the lingula and a small amount of peripheral ground-glass opacity and atelectasis in the posterior lower lobes. Categories 3, 2, 1, and 0 were assigned by four, three, one, and one reader, respectively. Reasons given by readers for uncertainty included doubts about peripheral distribution and difficulty in classification in the setting of minimal disease and posterior atelectasis. D, Image in a 65-year-old woman with PCR-proven COVID-19 pneumonia who presented with palpitations, back pain, and low-grade fevers. Axial CT image shows patchy ground-glass opacities bilaterally. Categories 3 and 2 were assigned by five and four readers, respectively. Reasons given by readers for uncertainty included difficulty in classifying as peripheral or diffuse and questionable morphology of the ground-glass opacities.

Figure 4: Uncertainty among attending physicians and trainees. A, Reasons for uncertainty among cases as a percentage of all cases reviewed. OP = organizing pneumonia. B, Average certainty scores between attending physicians and trainees. C, Histogram of number of readers with scores discrepant from attending consensus. D, Average certainty score by RSNA categorization. * indicates statistical significance, P < .05 (two-tailed t test).

Discussion

The RSNA consensus guidelines have standardized the reporting of CT findings for COVID-19 pneumonia and provided a framework for consistently communicating results to referring clinicians (13). The guidelines account for features of COVID-19 pneumonia commonly reported in the existing literature, but it is unclear how the guidelines have been interpreted and implemented among radiologists of different training levels. In this study, we assessed the diagnostic yield and diagnostic accuracy of the RSNA guidelines for CT reporting of suspected COVID-19 pneumonia, assessed interreader agreement among radiologists of different training levels for RSNA category assignment, and analyzed the distribution of RSNA consensus category scores.

Our findings concur with the literature showing that the sensitivity of CT for COVID-19 pneumonia is high (10,15): the combination of typical and indeterminate categorizations had an average sensitivity of 97.5% (range, 94%–100%) among both attending physicians and radiology residents. Considered together, assignment of the typical or indeterminate category had a specificity of 54.7% (range, 37%–60%), which matches previously reported findings (10,15), while typical findings alone had a higher specificity of 80.2%. Selection criteria for this study attempted to replicate a tertiary center where CT is used as a problem-solving tool and not as a primary screening tool, consistent with the RSNA and American College of Radiology guidelines. The sensitivity and specificity of the guidelines reported here must be considered in that context. While all studies categorized as negative for pneumonia were RT-PCR negative in our cohort, an absence of CT findings does not exclude the possibility of COVID-19 infection. Prior studies have shown that chest CT may appear normal during early stages of infection or in patients who are asymptomatic (16). However, in these prior studies, CT was used as a screening and diagnostic tool, whereas the use of CT in our study was primarily for assessment of complications or guiding management in difficult cases. Thus, there may be selection bias in our study, as all patients were symptomatic at the time of imaging. Future studies with larger cohorts are needed to better detail the prevalence of normal CT findings in patients with COVID-19 infection.
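The trade-off described above, where counting indeterminate reads as positive raises sensitivity and lowers specificity, can be illustrated with a minimal sketch. It assumes a numeric coding of 3 = typical, 2 = indeterminate, 1 = atypical, and 0 = negative, inferred from the category numbers in Figure 3 and not confirmed here. The function name and the ten toy reads are hypothetical and are not study data.

```python
def sensitivity_specificity(categories, pcr_positive, threshold):
    """Treat category >= threshold as a positive CT read; compare with RT-PCR."""
    tp = fp = tn = fn = 0
    for category, positive in zip(categories, pcr_positive):
        ct_positive = category >= threshold
        if ct_positive and positive:
            tp += 1
        elif ct_positive and not positive:
            fp += 1
        elif not ct_positive and positive:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical reads from one reader: five RT-PCR-positive, five negative.
categories = [3, 3, 3, 2, 1, 3, 2, 2, 1, 0]
pcr_positive = [True] * 5 + [False] * 5

# Assumed coding: 3 = typical, 2 = indeterminate.
typical_only = sensitivity_specificity(categories, pcr_positive, threshold=3)
typical_or_indeterminate = sensitivity_specificity(categories, pcr_positive, threshold=2)

print("typical only:", typical_only)
print("typical or indeterminate:", typical_or_indeterminate)
```

Lowering the threshold can only move reads from negative to positive, so sensitivity cannot fall and specificity cannot rise. This is the direction of the difference between the two operating points reported in the study.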

Our results indicate a relatively high concordance among radiologists of varying levels of training and experience, ranging from 1st-year radiology residents to fellowship-trained academic radiologists with more than 10 years of experience, which suggests that the RSNA guidelines are clear and feasible to implement. Radiology trainees may be on the front lines of the emergency department response to COVID-19 during the day and overnight in the reading room and, lately, on the wards (17). A clear guideline amenable to early trainees is more likely to result in timely care during the pandemic. The rate of concordance was similar to or slightly better than that recently reported for CO-RADS (COVID-19 Reporting and Data System) (2). However, for a majority of cases (65 [73%]), at least one reader selected a category different from the remaining readers, with 14 (15.7%) cases having up to half the readers reporting discrepant categories. Trainees and attending physicians had difficulty classifying certain CT scans. Drivers of uncertainty included multiple processes, minimal disease, and an ambiguous distribution or morphology of findings. Consistent with Hickam’s dictum, patients with COVID-19 may present with concurrent non–COVID-19 pathologic conditions, with early reports suggesting 20% of patients may have additional coinfections (7). Future guidelines should consider providing clarification for cases with CT findings from multiple categories and should address the degree of certainty with which a category is assigned. We also found that the terms peripheral, rounded opacities, and signs of organizing pneumonia were not well defined, causing varied interpretations for patients with peribronchovascular disease and ground-glass opacities that extended centrally or diffusely. Finally, patients with limited disease remained a source of uncertainty, consistent with prior reports that minimal disease can present as atypical or confusing patterns (18,19).

This study is limited by the use of the RT-PCR test as a reference standard, as it has a false-negative rate of up to 63% for nasopharyngeal swabs (13,20). The biases and imperfect accuracy associated with the RT-PCR test need to be recognized, and methods to improve diagnostic accuracy are urgently needed. Methodologies such as a composite reference standard and latent class models may be viable strategies to improve accurate detection of true COVID-19 cases (11). This may be accomplished by combining RT-PCR results with additional test results such as chest CT and potentially identifying latent classes that are better markers for COVID-19 infection (21). The reported sensitivity and specificity of RT-PCR testing vary across studies, with lower end estimates of 70% for sensitivity and 95% for specificity (22,23). While there are no data on the specificity of two consecutive negative RT-PCR tests, we used two negative RT-PCR tests to increase the likelihood that the patient did not have COVID-19, but we could not exclude the possibility entirely. Until more accurate testing methods are widespread, RT-PCR remains the best validation tool available. Another limitation was that CT imaging was not correlated with timing of symptoms; different stages of COVID-19 infection may have a higher predilection for certain RSNA categories. Not all patients with a positive COVID-19 test undergo a CT scan at our institution, which adds an element of selection bias, and this is not a study of all COVID-19 patients at our institution. The study was also limited by its single-health-care-system, retrospective design. A prospective sequential inclusion of CT scans would have been ideal for studying this question, but this was not practical at our center at the start of the pandemic.

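The rationale for requiring two negative RT-PCR tests can be illustrated with a short Bayes calculation. This is a sketch under stated assumptions, not an analysis from the study: it uses the lower-end estimates quoted above (70% sensitivity, 95% specificity), a hypothetical pretest probability of 30%, and it treats repeat tests as conditionally independent. That last assumption is likely optimistic, since repeat swabs from the same patient tend to share sources of error. The function name is hypothetical.

```python
def posterior_after_negatives(pretest, sensitivity, specificity, n_negative):
    """Probability of disease after n consecutive negative tests.

    Assumption: tests are conditionally independent given disease status.
    """
    p_negatives_if_diseased = (1 - sensitivity) ** n_negative
    p_negatives_if_healthy = specificity ** n_negative
    numerator = pretest * p_negatives_if_diseased
    return numerator / (numerator + (1 - pretest) * p_negatives_if_healthy)


PRETEST = 0.30      # hypothetical pretest probability
SENSITIVITY = 0.70  # lower-end estimate quoted in the text
SPECIFICITY = 0.95  # lower-end estimate quoted in the text

after_one = posterior_after_negatives(PRETEST, SENSITIVITY, SPECIFICITY, 1)
after_two = posterior_after_negatives(PRETEST, SENSITIVITY, SPECIFICITY, 2)

print(f"after one negative test: {after_one:.3f}")
print(f"after two negative tests: {after_two:.3f}")
```

Under these assumptions a second negative test lowers the residual probability of infection further but never to zero, which matches the caveat that the possibility could not be excluded entirely.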
In addition, because of difficulties in accessing a master list of patients at our hospital who tested positive for SARS-CoV-2 over the inclusion time period, we cannot assess the exact proportions of patients who underwent CT scans for suspicion of COVID-19, underwent CT for suspicion of pulmonary embolism, or had COVID-19 diagnosed at CT as an unsuspected condition. Selection bias may have resulted from our institutional use of CT as a clinical problem-solving tool rather than a method of primary COVID-19 diagnosis; in addition, application of the RSNA categories will likely be influenced by the local prevalence of disease, as well as the prevalence of other infectious or noninfectious etiologies, which will impact the positive and negative predictive values. A strength of our study was the relatively large number of scans from patients with negative RT-PCR results, which enabled a more robust test of the RSNA guidelines. Among the 53 patients who had two negative RT-PCR tests, diagnoses included bacterial pneumonia (15 patients), atypical or viral pneumonia (six patients), cardiac-related conditions (seven patients), and cancer-related conditions (seven patients); 10 patients were admitted for other unrelated reasons, including trauma, cholecystitis, alcohol intoxication, sickle cell crisis, bacterial colitis, liver transplant, and venous thrombosis; eight patients did not have a definitive diagnosis. Finally, previous experience with findings of COVID-19 at chest CT can vary substantially even among radiologists with similar years of subspecialty experience, and the performance of our group of readers may not reflect that of readers at other institutions.
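The dependence of predictive values on local prevalence noted above follows directly from Bayes' rule. The sketch below uses the average sensitivity (86%) and specificity (80.2%) reported for the typical category, and evaluates them at prevalence values that are hypothetical and chosen only for illustration. The function names are also hypothetical.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive read) from Bayes' rule."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


def negative_predictive_value(sensitivity, specificity, prevalence):
    """P(no disease | negative read) from Bayes' rule."""
    true_negative = specificity * (1 - prevalence)
    false_negative = (1 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


SENSITIVITY = 0.86   # average for typical findings, as reported
SPECIFICITY = 0.802  # average for typical findings, as reported

# Hypothetical prevalence values, for illustration only.
for prevalence in (0.05, 0.10, 0.30, 0.50):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises and the negative predictive value falls as prevalence increases, so the same typical read carries different weight in a low-prevalence setting than in a high-prevalence one.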

In the setting of a pandemic, the rapid implementation of standardized CT reporting has been very helpful for communicating clearly and effectively to providers about the potential of COVID-19 infection. The RSNA consensus statement serves as an important guideline for both detection of features typical for COVID-19 pneumonia and identification of features that might be seen in other infections or that might suggest alternative diagnoses (11). In regions in which PCR testing is not severely limited, CT is useful as a tool to follow COVID-19 lung disease and to rule out additional disease such as pulmonary embolism or non-COVID pneumonia (11). The simplicity of the RSNA consensus guidelines allows implementation by radiologists with varying levels of training. Future iterations of the guidelines should consider addressing the uncertainties found in this study to improve radiologist confidence in raising the possibility of COVID-19 pneumonia.

Disclosures of Conflicts of Interest: A. Som disclosed no relevant relationships. M.L. disclosed no relevant relationships. T.Y. disclosed no relevant relationships. D.C. disclosed no relevant relationships. S.G. disclosed no relevant relationships. D.P.M. disclosed no relevant relationships. E.J.F. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: grant funding from the American College of Radiology Innovation Fund and the National Cancer Institute Research Diversity Supplement for work not related to this manuscript. Other relationships: disclosed no relevant relationships. M.D.L. disclosed no relevant relationships. A. Sharma Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: institution has grant from Hummingbird Diagnostics; author receives royalties from Elsevier (co-editor for Thoracic Imaging: The Requisites; no monies received to date). Other relationships: disclosed no relevant relationships. S.M. disclosed no relevant relationships. J.O.S. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: author receives book royalties from Elsevier; editorial board member of Radiology: Cardiothoracic Imaging. Other relationships: disclosed no relevant relationships. B.P.L. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: textbook author and editor for Elsevier and receives royalties for his prior work. Other relationships: disclosed no relevant relationships.

Acknowledgments

We would like to acknowledge the efforts of Nick Reid, Nicholos Joseph, and JC Panagides of the Harvard Medical School Radiology COVID research team for their aid in data collection.

Author Contributions

Author contributions: Guarantors of integrity of entire study, A. Som, T.Y., B.P.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, A. Som, M.L., T.Y., D.C., S.G., D.P.M., E.J.F., A. Sharma, B.P.L.; clinical studies, A. Som, M.L., T.Y., D.C., S.G., M.D.L., A. Sharma; experimental studies, E.J.F.; statistical analysis, A. Som, M.L., T.Y., D.C., E.J.F.; and manuscript editing, A. Som, M.L., T.Y., D.C., S.G., D.P.M., E.J.F., M.D.L., A. Sharma, S.M., J.O.S.

* A.S. and M.L. contributed equally to this work.

Authors declared no funding for this work.

References

  • 1. Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A. Coronavirus Disease 2019 (COVID-19): A Systematic Review of Imaging Findings in 919 Patients. AJR Am J Roentgenol 2020;215(1):87–93.
  • 2. Wang Y, Dong C, Hu Y, et al. Temporal Changes of CT Findings in 90 Patients with COVID-19 Pneumonia: A Longitudinal Study. Radiology 2020;296(2):E55–E64.
  • 3. Shi H, Han X, Jiang N, et al. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. Lancet Infect Dis 2020;20(4):425–434.
  • 4. Bai HX, Hsieh B, Xiong Z, et al. Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT. Radiology 2020;296(2):E46–E54.
  • 5. Chung M, Bernheim A, Mei X, et al. CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV). Radiology 2020;295(1):202–207.
  • 6. Kong W, Agarwal PP. Chest Imaging Appearance of COVID-19 Infection. Radiol Cardiothorac Imaging 2020;2(1):e200028.
  • 7. Bernheim A, Mei X, Huang M, et al. Chest CT Findings in Coronavirus Disease-19 (COVID-19): Relationship to Duration of Infection. Radiology 2020;295(3):200463.
  • 8. Inui S, Fujikawa A, Jitsu M, et al. Chest CT Findings in Cases from the Cruise Ship “Diamond Princess” with Coronavirus Disease 2019 (COVID-19). Radiol Cardiothorac Imaging 2020;2(2):e200110.
  • 9. Fang Y, Zhang H, Xie J, et al. Sensitivity of Chest CT for COVID-19: Comparison to RT-PCR. Radiology 2020;296(2):E115–E117.
  • 10. Ai T, Yang Z, Hou H, et al. Correlation of Chest CT and RT-PCR Testing for Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology 2020;296(2):E32–E40.
  • 11. American College of Radiology. ACR Recommendations for the use of Chest Radiography and Computed Tomography (CT) for Suspected COVID-19 Infection. https://www.acr.org/Advocacy-and-Economics/ACR-Position-Statements/Recommendations-for-Chest-Radiography-and-CT-for-Suspected-COVID19-Infection. Updated March 22, 2020. Accessed April 15, 2020.
  • 12. Rubin GD, Ryerson CJ, Haramati LB, et al. The Role of Chest Imaging in Patient Management during the COVID-19 Pandemic: A Multinational Consensus Statement from the Fleischner Society. Radiology 2020;296(1):172–180.
  • 13. Simpson S, Kay FU, Abbara S, et al. Radiological Society of North America Expert Consensus Statement on Reporting Chest CT Findings Related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA - Secondary Publication. J Thorac Imaging 2020;35(4):219–227.
  • 14. de Jaegere TMH, Krdzalic J, Fasen BACM, Kwee RM; COVID-19 CT Investigators South-East Netherlands (CISEN) study group. Radiological Society of North America Chest CT Classification System for Reporting COVID-19 Pneumonia: Interobserver Variability and Correlation with RT-PCR. Radiol Cardiothorac Imaging 2020;2(3):e200213.
  • 15. Wong HYF, Lam HYS, Fong AHT, et al. Frequency and Distribution of Chest Radiographic Findings in Patients Positive for COVID-19. Radiology 2020;296(2):E72–E78.
  • 16. Kim D, Quinn J, Pinsky B, Shah NH, Brown I. Rates of Co-infection Between SARS-CoV-2 and Other Respiratory Pathogens. JAMA 2020;323(20):2085–2086.
  • 17. Jones J. Case Study: Answering the Call. https://www.acr.org/Practice-Management-Quality-Informatics/Imaging-3/Case-Studies/Quality-and-Safety/Answering-the-Call. Published March 30, 2020. Accessed April 15, 2020.
  • 18. Wang W, Xu Y, Gao R, et al. Detection of SARS-CoV-2 in Different Types of Clinical Specimens. JAMA 2020;323(18):1843–1844.
  • 19. Corman VM, Landt O, Kaiser M, et al. Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Euro Surveill 2020;25(3):2000045.
  • 20. Zitek T. The Appropriate Use of Testing for COVID-19. West J Emerg Med 2020;21(3):470–472.
  • 21. Fang X, Li X, Bian Y, Ji X, Lu J. Relationship between clinical types and radiological subgroups defined by latent class analysis in 2019 novel coronavirus pneumonia caused by SARS-CoV-2. Eur Radiol 2020. 10.1007/s00330-020-06973-9. Published online May 30, 2020.
  • 22. Arevalo-Rodriguez I, Buitrago-Garcia D, Simancas-Racines D, et al. False-negative results of initial RT-PCR assays for COVID-19: A Systematic Review. [preprint] medRxiv. Posted August 13, 2020. Accessed August 15, 2020.
  • 23. Watson J, Whiting PF, Brush JE. Interpreting a covid-19 test result. BMJ 2020;369:m1808.

Article History

Received: May 2 2020
Revision requested: June 11 2020
Revision received: Aug 27 2020
Accepted: Sept 1 2020
Published online: Sept 10 2020