Code and Data Sharing Practices in the Radiology Artificial Intelligence Literature: A Meta-Research Study




Purpose

To evaluate code and data sharing practices in original artificial intelligence (AI) scientific manuscripts published in the Radiological Society of North America (RSNA) journals suite from 2017 through 2021.

Materials and Methods

A retrospective meta-research study was conducted of articles published in the RSNA journals suite from January 1, 2017, through December 31, 2021. A total of 218 articles were included and evaluated for code sharing practices, reproducibility of shared code, and data sharing practices. Categorical comparisons were conducted using Fisher exact tests with respect to year and journal of publication, author affiliation(s), and type of algorithm used.


Results

Of the 218 included articles, 73 (34%) shared code, with 24 (33% of code sharing articles and 11% of all articles) sharing reproducible code. Radiology and Radiology: Artificial Intelligence published the most code sharing articles (48 [66%] and 21 [29%], respectively). Twenty-nine articles (13%) shared data, and 12 of these articles (41% of data sharing articles) shared complete experimental data by using only public domain datasets. Four of the 218 articles (2%) shared both code and complete experimental data. Code sharing rates were significantly higher in 2020 and 2021 compared with earlier years (P < .01) and were higher in Radiology and Radiology: Artificial Intelligence compared with other journals (P < .01).


Conclusion

Original AI scientific articles in the RSNA journals suite had low rates of code and data sharing, emphasizing the need for open-source code and data to achieve transparent and reproducible science.

Keywords: Meta-Analysis, AI in Education, Machine Learning

Supplemental material is available for this article.

© RSNA, 2022


Summary

Among the few code sharing artificial intelligence articles published in the Radiological Society of North America journals, most cannot be reproduced owing to insufficient documentation and data sharing.

Key Points

  • A minority (34% [73 of 218]) of original artificial intelligence (AI) scientific articles published in the Radiological Society of North America suite of journals between January 1, 2017, and December 31, 2021, shared code, and only 33% (24 of 73) of those studies had adequate documentation for code implementation, which limits reproducibility.

  • Of the included articles, 2% (four of 218) shared both code and complete experimental data, which is a small number of feasibly reproducible AI studies.

  • This report highlights areas for improvement in methodological reporting of radiology AI literature and suggests avenues for journals and conferences to encourage the use of open-source code and data.


Introduction

Artificial intelligence (AI) and deep learning (DL) research in radiology has surged in popularity over the past few years, with exponential increases in the number of AI original studies in the radiology literature since 2010 (1). In response to this growing interest, initiatives to train radiologists about AI and DL have been introduced, such as tutorials about how to code DL algorithms (2). On a more formal level, radiology residency programs have adopted training pathways in data science and AI for radiology trainees (3,4).

Along with growing interest in AI and DL among radiologists, transparency of AI algorithm development in scientific manuscripts by sharing open-source code and/or data has also gained importance (5–7). These practices serve to both facilitate reproducible science and benefit the scientific community by accelerating development of these potentially game-changing technologies. Accordingly, best practice guidelines for AI scientific reporting, such as the Checklist for Artificial Intelligence in Medical Imaging (CLAIM) (8), recommend sharing code and/or data as a standard practice. Our own anecdotal findings indicate that AI publications in the radiology literature have seldom shared code. Furthermore, the extent to which shared code has been properly documented to allow reimplementation and reproduction is unclear.

The purpose of this study was to evaluate code and data sharing practices in AI scientific manuscripts published in leading radiology journals from January 1, 2017, through December 31, 2021.

Materials and Methods

Study Search Criteria and Screening

This was a retrospective study performed with public data and, therefore, was not subject to institutional review board approval. All articles published in the Radiological Society of North America (RSNA) suite of journals (Radiology, Radiology: Artificial Intelligence, Radiology: Cardiothoracic Imaging, Radiology: Imaging Cancer) between January 1, 2017, and December 31, 2021, were reviewed. A flow diagram of the search process is presented in Figure 1.


Figure 1: Flow diagram of the literature search and article selection process. AI = artificial intelligence, RSNA = Radiological Society of North America.

First, we excluded nonoriginal articles (eg, review articles) on the basis of the stated article type. Second, we conducted a full-text screening of each article. Articles included were original scientific articles describing the development and evaluation of an AI method. Studies that did not both develop and evaluate an AI method (eg, studies that tested a commercially available AI algorithm) were excluded. An AI method was defined as implementation of a machine learning algorithm toward a radiologic task. Both DL and non-DL approaches were included under this definition. All studies were screened by consensus between two reviewers (one research assistant [K.V., 3 years’ experience in deep learning in radiology] and one board-certified radiologist [P.H.Y., 5 years’ experience in deep learning for radiology]). A list of included articles is provided in Appendix E1 (supplement).

Article Characteristics

For each included article, we recorded the following baseline characteristics: year and month of publication, journal of publication, radiology subspecialty or topic (abdominal, breast, cardiac, chest, medical physics, musculoskeletal, neuroradiology, natural language processing [NLP], nuclear medicine, pediatric, and thyroid), first author affiliation (industry or academic), and whether any author was affiliated with a for-profit company. Regarding AI study characteristics, we recorded whether the AI method employed DL and the type of data analyzed.

Code Sharing Practices and Code Reproducibility

For each article, we evaluated code sharing practices by recording the following factors: whether the article stated that code was available, whether the code was accessible, the method of code sharing (eg, GitHub repository, laboratory or personal website), and any additional contents provided with the code (eg, trained models, model weights, Docker containers).

Articles that shared code were evaluated for reproducibility, defined as adequate documentation to allow independent reproduction of the study experiments. Adequacy of code documentation was based on satisfying two criteria: sufficient in-line commenting explaining each code snippet, and thorough instructions explaining how to use the code in the experimental design. In-line commenting was assessed at a functional level with the expectation that most, if not all, subroutines in the code should be reasonably documented with a description of their associated function. Brief explanations were allowed under this definition because the manuscript is available to supplement the code implementations. Additionally, code repositories were expected to include instructions on how to execute the experiments out of the box. The decision to require these standards was motivated by the potential needs of independent research teams seeking to advance new methodologies; having only the first condition (sufficient in-line commenting) makes running the study experiments difficult, and having only the second condition (instructions on how to use the code) inhibits iterating and improving on implemented techniques. For each article that shared code, a binary "all-or-none" decision of reproducibility and a qualitative assessment of each condition was rendered.
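The all-or-none decision described above can be illustrated with a minimal Python sketch. This is not the procedure used in this study, in which adequacy was judged by manual review. The sketch assumes, hypothetically, that a docstring on a function is a usable proxy for functional-level documentation and that "most" subroutines means more than half; the names `documented_fraction` and `is_reproducible` and the `threshold` parameter are illustrative only.

```python
import ast


def documented_fraction(source: str) -> float:
    """Return the fraction of functions in `source` that carry a docstring.

    Assumption: docstrings stand in for in-line commenting. Ordinary `#`
    comments are not visible in the syntax tree and are ignored here.
    """
    tree = ast.parse(source)
    functions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not functions:
        return 0.0
    documented = sum(1 for fn in functions if ast.get_docstring(fn))
    return documented / len(functions)


def is_reproducible(source: str, has_usage_instructions: bool,
                    threshold: float = 0.5) -> bool:
    """All-or-none decision: both criteria must hold.

    `threshold` is a hypothetical cutoff for "most subroutines documented".
    """
    commented = documented_fraction(source) > threshold
    return bool(commented and has_usage_instructions)


# Hypothetical repository contents: one documented and one undocumented function.
EXAMPLE_SOURCE = '''
def load_image(path):
    """Read one image from disk."""
    return path

def train(model):
    return model
'''

if __name__ == "__main__":
    print(documented_fraction(EXAMPLE_SOURCE))
    print(is_reproducible(EXAMPLE_SOURCE, has_usage_instructions=True))
```

An automated proxy of this kind could at most flag repositories for closer inspection; it cannot judge whether the documentation is actually sufficient to rerun the experiments.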

All evaluations for code sharing and code reproducibility were made by a study author proficient in Python and DL frameworks (K.V., 3 years’ experience). Any code deemed borderline for reproducibility was adjudicated by a study author with a PhD in computer science (J.S., >10 years’ experience).

Data Sharing Practices

We recorded whether articles shared data used to develop and test their algorithm(s), which are necessary for complete reproduction of study results. Studies that used publicly available data were considered to have shared data. We also recorded the format of provided data. We noted if studies shared only part of the data used for their experiments.
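The categories used here (no data shared, partial data shared, complete experimental data shared) can be expressed as a short Python sketch. The function name `classify_data_sharing` and the representation of a study as a list of per-dataset flags are assumptions made for illustration, not part of the study protocol.

```python
from enum import Enum
from typing import Sequence


class DataSharing(Enum):
    NONE = "none"
    PARTIAL = "partial"
    COMPLETE = "complete"


def classify_data_sharing(dataset_is_available: Sequence[bool]) -> DataSharing:
    """Classify one article from per-dataset availability flags.

    Each flag is True if that dataset was publicly available or released
    by the authors, and False if it was kept private.
    """
    if not dataset_is_available or not any(dataset_is_available):
        return DataSharing.NONE
    if all(dataset_is_available):
        return DataSharing.COMPLETE
    return DataSharing.PARTIAL


if __name__ == "__main__":
    # Hypothetical article combining one public and one proprietary dataset.
    print(classify_data_sharing([True, False]).value)
```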

Statistical Analysis

Descriptive statistics were used to summarize article characteristics as well as code and data sharing practices. Categorical comparisons were performed using Fisher exact tests in R software (version 4.1.1; The R Project for Statistical Computing) for the following characteristics: if code was shared, if shared code was reproducible, and if data were shared (at all). These outcomes were compared between the following categories: year of publication (2021 vs other years studied [to evaluate for differences in the year after the CLAIM checklist (8) was published]), journal of publication (all four RSNA journals, each journal in pairwise comparison, and Radiology: Artificial Intelligence vs all other titles), affiliation of first author (academic vs industry), company affiliation (yes or no), whether DL was used (yes or no), and data type (imaging vs text vs other). Owing to the ubiquity of chest radiograph datasets in radiology AI (9–13), we also assessed if studies evaluating chest radiographs had higher rates of these outcomes compared with studies evaluating other data types. We did not compare these outcomes between radiology subspecialties owing to the many categories and small sample sizes in each. P < .05 was considered indicative of a statistically significant difference.
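For readers unfamiliar with the Fisher exact test, the following is a minimal Python sketch of a two-sided test on a 2 × 2 table. The analysis in this study was run in R, so this is an illustration only. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided P value is defined here as the sum of all table probabilities no larger than that of the observed table, which is one common convention. The example counts are the company-affiliation comparison reported in the Results.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    With the row and column totals fixed, the top-left cell follows a
    hypergeometric distribution. The P value sums the probabilities of
    all tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def probability(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Rows: company affiliation (yes, no); columns: code shared (yes, no).
    # Counts from the Results: 3 of 28 vs 70 of 190 articles shared code.
    p = fisher_exact_two_sided(3, 25, 70, 120)
    print(f"P = {p:.4f}")
```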


Results

Article Screening and Characteristics

Of 2371 screened articles, 218 (9%) were included and 2153 (91%) were excluded (Fig 1). Most of the included articles were published in Radiology (104 of 218 [48%]), followed by Radiology: Artificial Intelligence (95 of 218 [44%]), Radiology: Cardiothoracic Imaging (13 of 218 [6%]), and Radiology: Imaging Cancer (six of 218 [3%]). The number of articles increased over time, with five in 2017, 22 in 2018, 54 in 2019, 71 in 2020, and 66 in 2021. Table 1 shows the distribution of included articles over the queried years and journals.

Table 1: Distribution of Code Sharing Articles by Year and Publication


Although all articles had at least one author from an academic institution, 28 of 218 (13%) had at least one author affiliated with a for-profit company. The majority had a first author with only academic affiliation(s) (210 of 218 [96%]). Few first authors had only industry affiliation(s) (seven of 218 [3%]), and one had both academic and industry affiliations (<1%). The most common specialties were neuroradiology (47 of 218 [22%]) and chest imaging (42 of 218 [19%]) (Table 2). Most studies analyzed a single data type, most commonly imaging (212 of 218 [97%]), followed by text (three of 218 [1%]) and tabular data (two of 218 [1%]). One article used both imaging and text data (<1%). Few articles analyzed chest radiograph data (17 of 218 [8%]; 8% of articles analyzing imaging data). Most studies used DL (167 of 218 [77%]).

Table 2: Distribution of Included Articles, Code Sharing Articles, and Reproducible Code Sharing Articles across Radiographic Subspecialties


Code Sharing Practices and Reproducibility

Although 79 of 218 articles (36%) stated that code was shared, one had an empty GitHub repository, two had broken GitHub links, one only shared data, and two did not provide a repository link. In total, 73 of 218 articles (34%) shared accessible code. Of these 73 articles, 65 (89%) shared a GitHub repository, six (8%) shared a laboratory or personal website link, one (1%) shared a Kaggle repository, and one (1%) shared printed code in the supplementary materials. In addition to sharing code, 16 (22%) articles that shared accessible code shared trained models and/or model weights, four (6%) shared Docker containers for the model, and one (1%) shared pseudocode.
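The count of articles with accessible code follows from the exclusions listed above, and the arithmetic can be checked with a short Python sketch. The variable names are illustrative; all counts are taken from the preceding paragraph.

```python
# Articles that stated code was shared, and the reasons code was not accessible.
stated_code_shared = 79
not_accessible = {
    "empty GitHub repository": 1,
    "broken GitHub link": 2,
    "shared only data": 1,
    "no repository link": 2,
}
accessible = stated_code_shared - sum(not_accessible.values())

# How the accessible code was shared.
sharing_method = {
    "GitHub repository": 65,
    "laboratory or personal website": 6,
    "Kaggle repository": 1,
    "printed code in supplement": 1,
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


if __name__ == "__main__":
    print(f"accessible code: {accessible} articles")
    for method, count in sharing_method.items():
        print(f"{method}: {count} of {accessible} ({percent(count, accessible)}%)")
```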

Of 73 code sharing articles, 24 (33%; 11% of all articles [24 of 218]) had sufficient documentation to be considered reproducible. An example of code with insufficient documentation from a Radiology article is provided in Figure 2. Of the reproducible code sharing articles, 19 of 24 (79%) shared a GitHub repository, four (17%) shared a laboratory or personal website link, one (4%) shared a Kaggle repository, and none shared printed out code. Nine of the 24 articles (38%) shared trained models and/or model weights, three (13%) shared Docker containers for the model, and none shared pseudocode.


Figure 2: Sample repository (identity blinded), consisting of a single source file without adequate in-line annotation or separate documentation, that was provided by one of the articles reviewed. This shared code was considered not reproducible.

The number of code sharing articles increased over time, with most published in 2020 and 2021 (54 of 73 articles [74%]) (Fig 3). Most code sharing articles were published in Radiology (48 of 73 [66%]) and Radiology: Artificial Intelligence (21 of 73 [29%]). Table 1 and Table 3 show the distribution of code sharing and reproducible code sharing articles, respectively, over both the queried years and journals. Few code sharing articles had a company affiliation (three of 73 [4%]). The subspecialties with the most code sharing articles were neuroradiology (23 of 73 [32%]) and chest imaging (11 of 73 [15%]). Table 2 shows the distribution of articles and code sharing practices by subspecialty. Most code sharing articles analyzed images (71 of 73 [97%]), with two outliers analyzing tabular data (one of 73 [1%]) and text data (one of 73 [1%]). Few code sharing articles used chest radiograph imaging datasets (two of 73 [3%]). The majority of code sharing articles used DL methods (58 of 73 [80%]).


Figure 3: Proportion of included code sharing artificial intelligence articles published per year.

Table 3: Distribution of Reproducible Code Sharing Articles by Year and Publication


Data Sharing Practices

In total, 29 of 218 articles (13%) shared data, either by using a publicly available dataset or by releasing a particular dataset used in the study. Of these, 12 articles shared complete experimental data (6% of all articles [12 of 218], 41% of data sharing articles [12 of 29]); all of these used only public domain datasets in their studies. Only three of 12 studies (25%) used chest radiograph datasets. The remaining 17 of 29 articles (8% of all articles) partially released their data. This typically resulted from the combined use of different datasets, some of which were proprietary and others that were publicly available. Articles shared data in a variety of different formats: reference to an open-source dataset (19 of 29 [66%]), link to or direct upload of the dataset (nine of 29 [31%]), and combined use of referencing and direct upload of the dataset (one of 29 [3%]). Four articles (2% of all articles [four of 218]) shared both code and complete experimental data.

Comparisons between Subgroups

Recent years showed higher rates of code sharing articles compared with earlier years, with 0% (0 of five) in 2017, 5% (one of 22) in 2018, 33% (18 of 54) in 2019, 42% (30 of 71) in 2020, and 36% (24 of 66) in 2021 (P < .01). Code sharing rates were higher in Radiology compared with the other journals, with 46% (48 of 104) in Radiology, 22% (21 of 95) in Radiology: Artificial Intelligence, 23% (three of 13) in Radiology: Cardiothoracic Imaging, and 17% (one of six) in Radiology: Imaging Cancer (P < .01). There was a higher rate of code sharing among articles without a company affiliation (70 of 190 [37%]) than among those with a company affiliation (three of 28 [11%]) (P < .01). Radiology exhibited higher rates of reproducible code sharing (16 of 104 [15%]) than Radiology: Artificial Intelligence (seven of 95 [7%]) (P < .01). Articles without a company affiliation showed a higher rate of reproducible code sharing (24 of 190 [13%]) than those with a company affiliation (0 of 28 [0%]) (P = .02).
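The per-year and per-journal code sharing rates quoted above can be recomputed from the reported counts. The following Python sketch does so; the dictionary names are illustrative, and each entry is a pair (code sharing articles, included articles).

```python
# (code sharing articles, included articles) as reported in the text.
by_year = {
    2017: (0, 5),
    2018: (1, 22),
    2019: (18, 54),
    2020: (30, 71),
    2021: (24, 66),
}
by_journal = {
    "Radiology": (48, 104),
    "Radiology: Artificial Intelligence": (21, 95),
    "Radiology: Cardiothoracic Imaging": (3, 13),
    "Radiology: Imaging Cancer": (1, 6),
}


def rates(groups: dict) -> dict:
    """Map each group to its code sharing rate as a whole-number percentage."""
    return {
        name: round(100 * shared / total)
        for name, (shared, total) in groups.items()
    }


if __name__ == "__main__":
    for name, rate in {**rates(by_year), **rates(by_journal)}.items():
        print(f"{name}: {rate}%")
```

Both groupings partition the same 218 included articles and the same 73 code sharing articles, which gives a quick consistency check on the counts.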

Regarding data sharing practices, articles published in 2021 had higher rates of data sharing (14 of 66 [21%]) compared with other years (15 of 152 [10%]) (P = .03). Articles in Radiology: Artificial Intelligence had a higher data sharing rate than those in Radiology (21 of 95 [22%] and eight of 104 [8%], respectively; P < .01). Studies using chest radiograph data had a higher rate of data sharing than those that did not use chest radiograph data (six of 17 [35%] and 23 of 201 [11%], respectively; P = .01). Finally, we found a higher rate of data sharing among DL articles (27 of 167 [16%]) compared with non-DL articles (two of 51 [4%]) (P = .03).

We found no evidence of a difference for all other comparisons. Results for all comparisons are noted in Appendix E2 (supplement).


Discussion

The growing use of AI in radiology research (1) has emphasized the importance of code sharing (5,8). Therefore, we evaluated code and data sharing practices in AI articles in the RSNA suite of journals. We found that most articles did not share code, despite recent recommendations to do so (8,14). Among studies that did share code, few provided sufficient documentation or data, which may defeat the purpose.

Despite recommendations by editorial board members of RSNA journals to share code (8,14), only about one-third of articles in our study did so. These findings are consistent with a previous study that reported a 21% code sharing rate in health care AI papers published from 2017 to 2019 compared with 39% and 48% in general computer vision and NLP research papers, respectively (7). A different study found that 6% of general AI research papers published between 2013 and 2016 shared code (15). Altogether, these findings are worrisome, because sharing code is a key component to facilitating transparent and reproducible science in AI research (5).

In addition to sharing code, it is important that this code be adequately documented. We found that less than one-third of studies that shared code had sufficient documentation to allow for experiment reproduction. For example, some studies shared a single source code file without adequate in-line annotation or separate documentation (Fig 2). One study demonstrated that none of 400 research papers from two general AI conferences documented all variables necessary to reproduce the results (15). The importance of sharing code and ensuring adequate documentation will only increase over time; this difficulty has been highlighted by the Ten Years Reproducibility Challenge (16,17), which was launched in 2019 to challenge researchers to re-execute code from papers published 10 or more years prior (17). Similar efforts have been introduced by Papers with Code (18), an online database of AI research papers with code implementations that created a Machine Learning Code Completeness Checklist to promote reproducible and adequately documented code repositories (19). Leading AI research conferences have adopted such guidelines as requirements for official submission in addition to hosting annual paper reproducibility challenges (6).

Although these low rates of code sharing and code documentation may be discouraging, our study showed upward trends of both over time, demonstrating growing recognition in the field of the importance of these practices. Nonetheless, there is room for improvement, which can be facilitated by journals and the peer-review process. For example, reproducible code sharing can be improved by radiology journals through mandatory code and documentation availability upon article submission, reproducibility checks during the peer-review process, and standardized publication of accompanying code repositories and model demos. Ultimately, increased code sharing could lead to faster and more collaborative scientific innovation.

In addition to code sharing, data availability is another key component of reproducibility of AI research studies, because DL models may have variable performance on different medical imaging datasets (20). Less than one-sixth of studies in our review shared data. Low rates of data sharing are understandable given the unique challenges of medical data, such as patient privacy concerns. Furthermore, research groups or AI companies might be reluctant to release proprietary data because of a desire to maintain a competitive edge. We note that the majority of studies that provided data used data from open-source datasets, such as the Stanford CheXpert dataset (9); this finding highlights the importance of these publicly released datasets to research.

Our study had limitations. First, we evaluated four journals from the RSNA journals suite, and our results may not apply to other journals (eg, more engineering-oriented journals, such as IEEE Transactions on Medical Imaging [21] or Medical Physics [22]). However, we chose these journals because of their wide readership among radiologists, as well as for their leadership in calling for open code sharing by journal editorial board members (8,14). Second, although we evaluated adequacy of code documentation via manual code review, we did not attempt code implementation owing to practical limitations, including time and low rates of data sharing. Nevertheless, we believe that study reproducibility is a worthwhile topic for future investigation or a society-led challenge, as done in the general AI communities (16,17). Finally, our findings may not reflect the current state of the art. Of note, we reviewed articles published through the end of 2021, so our findings are reasonably up-to-date. Additionally, we chose to start reviewing articles from 2017 to include the inception of the RSNA journals Radiology: Artificial Intelligence, Radiology: Cardiothoracic Imaging, and Radiology: Imaging Cancer and to align with the publication of seminal articles in AI for radiology (10,23) in 2017.

In summary, the majority of AI articles in the RSNA journals suite in the period studied did not share code, and most of those that did lacked the documentation and data needed for reproducibility. These practices have recently improved, however. We summarize these gaps to constructively highlight areas for improvement and echo prior recommendations for open-sourcing of code and data.

Disclosures of conflicts of interest: K.V. No relevant relationships. S.M.S. No relevant relationships. J.S. No relevant relationships. P.H.Y. Consulting fees to author from FHOrtho and Bunkerhill Health; former trainee editorial board member of Radiology: Artificial Intelligence; associate editor of Radiology: Artificial Intelligence.

Author Contributions

Author contributions: Guarantors of integrity of entire study, K.V., J.S., P.H.Y.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, all authors; experimental studies, K.V., P.H.Y.; statistical analysis, all authors; and manuscript editing, all authors

Authors declared no funding for this work.


References

1. West E, Mutasa S, Zhu Z, Ha R. Global Trend in Artificial Intelligence-Based Publications in Radiology From 2000 to 2018. AJR Am J Roentgenol 2019;213(6):1204–1206.
2. Erickson BJ. Magician’s Corner: How to Start Learning about Deep Learning. Radiol Artif Intell 2019;1(4):e190072.
3. Wiggins WF, Caton MT, Magudia K, et al. Preparing Radiologists to Lead in the Era of Artificial Intelligence: Designing and Implementing a Focused Data Science Pathway for Senior Radiology Residents. Radiol Artif Intell 2020;2(6):e200057.
4. Lindqwister AL, Hassanpour S, Lewis PJ, Sin JM. AI-RADS: An Artificial Intelligence Curriculum for Residents. Acad Radiol 2021;28(12):1810–1816.
5. Kitamura FC, Pan I, Kline TL. Reproducible Artificial Intelligence Research Requires Open Communication of Complete Source Code. Radiol Artif Intell 2020;2(4):e200060.
6. Pineau J, Vincent-Lamarre P, Sinha K, et al. Improving Reproducibility in Machine Learning Research. arXiv 2003.12206 [preprint]. Posted March 27, 2020. Accessed June 27, 2022.
7. McDermott MBA, Wang S, Marinsek N, Ranganath R, Foschini L, Ghassemi M. Reproducibility in machine learning for health research: Still a ways to go. Sci Transl Med 2021;13(586):eabb1655.
8. Mongan J, Moy L, Kahn CE Jr. Checklist for Artificial Intelligence in Medical Imaging (CLAIM): A Guide for Authors and Reviewers. Radiol Artif Intell 2020;2(2):e200029.
9. Irvin J, Rajpurkar P, Ko M, et al. CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. arXiv 1901.07031 [preprint]. Posted January 21, 2019. Accessed June 27, 2022.
10. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, July 21–26, 2017. Piscataway, NJ: IEEE, 2017; 3462–3471.
11. Phillips NA, Rajpurkar P, Sabini M, et al. CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness. arXiv 2007.06199 [preprint]. Posted July 13, 2020. Accessed June 27, 2022.
12. Johnson AEW, Pollard TJ, Berkowitz SJ, et al. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Sci Data 2019;6(1):317.
13. Bustos A, Pertusa A, Salinas JM, de la Iglesia-Vayá M. PadChest: A large chest x-ray image dataset with multi-label annotated reports. Med Image Anal 2020;66:101797.
14. Bluemke DA, Moy L, Bredella MA, et al. Assessing Radiology Research on Artificial Intelligence: A Brief Guide for Authors, Reviewers, and Readers-From the Radiology Editorial Board. Radiology 2020;294(3):487–489.
15. Gundersen OE, Kjensmo S. State of the Art: Reproducibility in Artificial Intelligence. Proc Conf AAAI Artif Intell 2018;32(1).
16. ReScience C website. Updated October 11, 2019. Accessed June 25, 2022.
17. Perkel JM. Challenge to scientists: does your ten-year-old code still run? Nature 2020;584(7822):656–658.
18. Papers with Code website. Accessed June 25, 2022.
19. Tips for Publishing Research Code. GitHub Repository. Published March 5, 2020. Updated March 19, 2021. Accessed June 26, 2022.
20. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med 2018;15(11):e1002683.
21. IEEE Xplore. IEEE Transactions on Medical Imaging website. Accessed June 25, 2022.
22. Medical Physics. American Association of Physicists in Medicine website. Accessed June 26, 2022.
23. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv 1711.05225 [preprint]. Revised December 25, 2017. Accessed June 26, 2022.

Article History

Received: Apr 25 2022
Revision requested: May 26 2022
Revision received: July 25 2022
Accepted: Aug 2 2022
Published online: Aug 17 2022