STARD 2015: An Updated List of Essential Items for Reporting Diagnostic Accuracy Studies
Incomplete reporting has been identified as a major source of avoidable waste in biomedical research. Essential information is often not provided in study reports, impeding the identification, critical appraisal, and replication of studies. To improve the quality of reporting of diagnostic accuracy studies, the Standards for Reporting of Diagnostic Accuracy Studies (STARD) statement was developed. Here we present STARD 2015, an updated list of 30 essential items that should be included in every report of a diagnostic accuracy study. This update incorporates recent evidence about sources of bias and variability in diagnostic accuracy and is intended to facilitate the use of STARD. As such, STARD 2015 may help to improve completeness and transparency in reporting of diagnostic accuracy studies.
As researchers, we talk and write about our studies, not just because we are happy—or disappointed—with the findings, but also to allow others to appreciate the validity of our methods, to enable our colleagues to replicate what we did, and to disclose our findings to clinicians, other health care professionals, and decision-makers, all of whom rely on the results of strong research to guide their actions.
Unfortunately, deficiencies in the reporting of research have been highlighted in several areas of clinical medicine (1). Essential elements of study methods are often poorly described and sometimes completely omitted, making both critical appraisal and replication difficult, if not impossible. Sometimes study results are selectively reported, and other times researchers cannot resist unwarranted optimism in interpretation of their findings (2–4). These practices limit the value of the research and any downstream products or activities, such as systematic reviews and clinical practice guidelines.
Reports of studies of medical tests are no exception. A growing number of evaluations have identified deficiencies in the reporting of test accuracy studies (5). These are studies in which a test is evaluated against a clinical reference standard, or gold standard; the results are typically reported as estimates of the test’s sensitivity and specificity, which express how well the test correctly identifies patients with and without the target condition, respectively. Other accuracy statistics can be used as well, such as the area under the receiver operating characteristic (ROC) curve or positive and negative predictive values.
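To make these statistics concrete, the sketch below computes sensitivity, specificity, and positive and negative predictive values from a 2 × 2 table that cross-classifies index test results against the reference standard. The counts and the class name `TwoByTwo` are hypothetical and invented for illustration; they do not come from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical cross-classification of index test results
    against the reference standard."""
    tp: int  # test positive, target condition present
    fp: int  # test positive, target condition absent
    fn: int  # test negative, target condition present
    tn: int  # test negative, target condition absent

    def sensitivity(self) -> float:
        # Proportion of participants WITH the target condition who test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of participants WITHOUT the target condition who test negative.
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Proportion of positive test results that are true positives.
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        # Proportion of negative test results that are true negatives.
        return self.tn / (self.tn + self.fn)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    table = TwoByTwo(tp=90, fp=30, fn=10, tn=270)
    print(f"sensitivity = {table.sensitivity():.2f}")  # 90/100 -> 0.90
    print(f"specificity = {table.specificity():.2f}")  # 270/300 -> 0.90
    print(f"PPV = {table.ppv():.2f}")                   # 90/120 -> 0.75
    print(f"NPV = {table.npv():.3f}")                   # 270/280 -> 0.964
```

Unlike sensitivity and specificity, the predictive values also depend directly on the proportion of participants in the sample who have the target condition.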
Despite their apparent simplicity, such studies are at risk of bias (6, 7). If not all patients undergoing testing are included in the final analysis, for example, or if only healthy controls are included, the estimates of test accuracy may not reflect the performance of the test in clinical applications. Yet such crucial information is often missing from study reports.
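A small hypothetical illustration of the healthy-control problem: with the same positivity threshold, specificity estimated in healthy volunteers can look far better than specificity in symptomatic patients in whom the target condition has been ruled out. All marker values, the threshold, and the function name `specificity_at` are invented for this sketch and do not describe any real test.

```python
def specificity_at(threshold: float, non_diseased_scores: list[float]) -> float:
    """Fraction of participants without the target condition whose marker
    value falls below the positivity threshold (i.e., who test negative)."""
    negatives = sum(1 for s in non_diseased_scores if s < threshold)
    return negatives / len(non_diseased_scores)


# Hypothetical marker values, invented for illustration only.
healthy_controls = [1.0, 1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.5, 2.7, 2.9]
# Hypothetical symptomatic patients in whom the target condition was ruled
# out; in this invented example their values overlap more with the positives.
symptomatic_non_diseased = [1.8, 2.4, 2.9, 3.1, 3.3, 3.6, 3.8, 4.2, 4.5, 5.0]

THRESHOLD = 3.0  # hypothetical positivity cut-off

print(specificity_at(THRESHOLD, healthy_controls))          # 1.0
print(specificity_at(THRESHOLD, symptomatic_non_diseased))  # 0.3
```

In this invented example, a study that enrolled only healthy controls would report perfect specificity, whereas the clinically relevant group would yield 0.3.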
It is now well established that sensitivity and specificity are not fixed test properties. The relative numbers of false-positive and false-negative test results vary across settings, depending on how patients present and which tests they have already undergone. Unfortunately, many authors also fail to fully report the clinical context, as well as when, where, and how they identified and recruited eligible study participants (8). In addition, sensitivity and specificity estimates can differ owing to variable definitions of the reference standard against which the test is being compared. Thus, this information should be available in the study report.
The 2003 STARD Statement
To assist in the completeness and transparency of reporting diagnostic accuracy studies, a group of researchers, editors, and other stakeholders developed a minimum list of essential items that should be included in every study report. The guiding principle for developing the list was to select items that, if described, would help readers to judge the potential for bias in the study and appraise the applicability of the study findings and the validity of the authors’ conclusions and recommendations.
The resulting Standards for Reporting Diagnostic Accuracy Studies (STARD) statement appeared in 2003 in two dozen journals (9). It was accompanied by editorials and commentaries in several other publications and endorsed by many more.
Since the publication of STARD, several evaluations have pointed to small but statistically significant improvements in the reporting of accuracy studies (mean gain 1.4 items; 95% CI 0.7 to 2.2) (5, 10). Gradually, more of the essential items are being reported, but the situation remains far from optimal.
Methods for Developing STARD 2015
The STARD steering committee periodically reviews the literature for potentially relevant studies to inform a possible update. In 2013, the steering committee decided that the time was right to update the checklist.
Updating had 2 major goals: first, to incorporate recent evidence about sources of bias, applicability concerns, and factors that facilitate overly generous interpretation in test accuracy research, and second, to make the list easier to use. In making modifications, we also considered harmonization with other reporting guidelines, such as Consolidated Standards of Reporting Trials (CONSORT) 2010 (11).
A complete description of the updating process and the justification for the changes are available on the Enhancing the Quality and Transparency of Health Research (EQUATOR) website at http://www.equator-network.org/reporting-guidelines/stard. In short, we invited the 2003 STARD group members to participate in the updating process, nominate new members, and comment on the general scope of the update. Suggested new members were contacted. As a result, the STARD group has now grown to 85 members that include researchers, editors, journalists, evidence synthesis professionals, funders, and other stakeholders.
STARD group members were then asked to suggest, and later to endorse, proposed changes in a 2-round web-based survey. This served to prepare a draft list of essential items, which was discussed by the steering committee at a 2-day meeting in Amsterdam in September 2014. The list was then piloted with different groups: starting and advanced researchers, peer reviewers, and editors.
The general structure of STARD 2015 is similar to that of STARD 2003. A 1-page document presents 30 items, grouped under sections that follow the Introduction, Methods, Results, and Discussion (IMRAD) structure of a scientific article (see Table 1). Several of the STARD 2015 items are identical to the ones in the 2003 version. Others have been reworded, combined, or (if complex) split. A few have been added (see Table 2 for a summary of new items and Table 3 for key terms). A diagram to describe the flow of participants through the study is now expected in all reports (Figure).
STARD 2015 replaces the original version published in 2003; those who would like to refer to STARD are invited to cite this article. The list of essential items can be seen as a minimum set, and an informative study report will typically present more information. Yet we hope to find all applicable items in a well-prepared report of a diagnostic accuracy study.
Authors are invited to use STARD when preparing their study reports. Reviewers can use the list to verify that all essential information is available in a submitted manuscript and suggest changes if key items are missing.
We trust that journals that endorsed STARD in 2003 or later will recommend the use of this updated version and encourage compliance in submitted manuscripts. We hope that even more journals, and journal organizations, will promote the use of this and comparable reporting guidelines. Funders and research institutions may promote or mandate adherence to STARD as a way to maximize the value of research and downstream products or activities.
STARD may also be beneficial for reporting other studies that evaluate the performance of tests. This includes prognostic studies, which can classify patients on the basis of whether a future event happens; monitoring studies, in which tests are supposed to detect or predict an adverse event or lack of response; studies evaluating treatment selection markers; and more. We and others have found most of the STARD items useful when reporting and examining such studies, although STARD primarily targets diagnostic accuracy studies.
Diagnostic accuracy is not the only expression of test performance, nor is it always the most meaningful (12). For example, the incremental accuracy gained by combining tests, relative to a single test, can be more informative (13). For continuous tests, dichotomization into test positives and negatives may not always be indicated. In such cases, the desirable computational and graphical methods for expressing test performance are different, although many of the methodological precautions would be the same, and STARD can help in reporting the study in an informative way. Other reporting guidelines target more specific forms of tests, such as Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) for multivariable prediction models (14).
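For a continuous test, one common summary that avoids choosing a single cut-off is the empirical area under the ROC curve, which can be computed by comparing every participant with the target condition against every participant without it. The sketch below, using hypothetical data and the invented function names `auc` and `sens_spec`, contrasts that summary with the sensitivity and specificity obtained by dichotomizing at a few candidate thresholds. It is an illustration only, not a method prescribed by STARD.

```python
from itertools import product


def auc(diseased: list[float], non_diseased: list[float]) -> float:
    """Empirical ROC area: the probability that a randomly chosen participant
    with the target condition has a higher marker value than a randomly
    chosen participant without it, counting ties as one half.
    This equals the Mann-Whitney U statistic divided by n1 * n0."""
    score = 0.0
    for d, n in product(diseased, non_diseased):
        if d > n:
            score += 1.0
        elif d == n:
            score += 0.5
    return score / (len(diseased) * len(non_diseased))


def sens_spec(threshold: float, diseased: list[float],
              non_diseased: list[float]) -> tuple[float, float]:
    """Dichotomize at `threshold`: values >= threshold count as test positive."""
    sens = sum(d >= threshold for d in diseased) / len(diseased)
    spec = sum(n < threshold for n in non_diseased) / len(non_diseased)
    return sens, spec


# Hypothetical marker values, invented for illustration only.
diseased = [2.5, 3.4, 3.9, 4.4, 5.1]
non_diseased = [1.2, 2.0, 2.5, 2.8, 3.6]

print(auc(diseased, non_diseased))  # 0.86
for t in (2.5, 3.0, 4.0):
    # 2.5 -> (1.0, 0.4); 3.0 -> (0.8, 0.8); 4.0 -> (0.4, 1.0)
    print(t, sens_spec(t, diseased, non_diseased))
```

Each threshold trades sensitivity against specificity, whereas the area summarizes performance across all thresholds; which presentation is more informative depends on the clinical question the study addresses.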
Although STARD focuses on full study reports of test accuracy studies, the items can also be helpful when writing conference abstracts, including information in trial registries, and developing protocols for such studies. Additional initiatives are underway to provide more specific guidance for each of these applications.
STARD Extensions and Applications
The STARD statement was designed to apply to all types of medical tests. The STARD group believed that a single checklist, for all diagnostic accuracy studies, would be more widely disseminated and more easily accepted by authors, peer reviewers, and journal editors than separate lists for different types of tests such as imaging, biochemistry, or histopathology.
Having a general list may necessitate additional instructions for informative reporting, with more information for specific types of tests, specific applications, or specific forms of analysis. Such guidance could describe the preferred methods for studying and reporting measurement uncertainty, for example, without changing any of the other STARD items. The STARD group welcomes the development of such STARD extensions and invites interested groups to contact the STARD executive committee before developing them.
Other groups may want to develop additional guidance to facilitate the use of STARD for specific applications. An example of such a STARD application was prepared for history-taking and physical examination (15). Another type of application is the use of STARD for specific target conditions such as dementia (16).
The new STARD 2015 list and all related documents can be found on the STARD pages of the EQUATOR website. EQUATOR is an international initiative that seeks to improve the value of published health research literature by promoting transparent and accurate reporting and wider use of robust reporting guidelines (17,18). The STARD group believes that working more closely with EQUATOR and other reporting guideline developers will help us to better reach shared objectives. We have updated the 2003 explanation and elaboration document, which can also be found at the EQUATOR website. This document explains the rationale for each item and gives examples.
The STARD list is released under a Creative Commons license. This allows everyone to use and distribute the work if they acknowledge the source. The STARD statement was originally reported in English, but several groups have worked on translations in other languages. We welcome such translations, which are preferably developed by groups of researchers, by use of a cyclical development process, with back-translation to the original language and user testing (19). We have also applied for a trademark for STARD to ensure that the steering committee has the exclusive right to use the word “STARD” to identify goods or services.
Increasing Value, Reducing Waste
The STARD steering committee is aware that building a list of essential items is not sufficient to achieve substantial improvements in reporting completeness, as the modest improvement after introduction of the 2003 list has shown. We see this list not as the final product, but as the starting point for building more specific instruments to stimulate complete and transparent reporting, such as a checklist and a writing aid for authors, tools for reviewers and editors, instruction videos, and teaching materials, all based on this STARD list of essential items.
Incomplete reporting has been identified as one of the sources of avoidable waste in biomedical research (1). Since STARD was initiated, several other initiatives have been undertaken to enhance the reproducibility of research and promote greater transparency (20). Multiple factors are at stake, but incomplete reporting is one of them. We hope that this update of STARD, together with additional implementation initiatives, will help authors, editors, reviewers, readers, and decision-makers to collect, appraise, and apply the evidence needed to strengthen decisions and recommendations about medical tests. In the end, we all stand to benefit from more informative and transparent reporting: as researchers, as health care professionals, as payers, and as patients.
Authors’ Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:
Employment or Leadership: N. Rifai, Clinical Chemistry, AACC.
Consultant or Advisory Role: C.A. Gatsonis, member RSNA Research Development Committee.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: There was no explicit funding for the development of STARD 2015. The Academic Medical Center of the University of Amsterdam, the Netherlands, partly funded the meeting of the STARD steering group but had no influence on the development or dissemination of the list of essential items. STARD steering group members and STARD group members covered additional personal costs individually.
Expert Testimony: None declared.
Patents: None declared.
STARD Group collaborators: Todd Alonzo, Douglas G. Altman, Augusto Azuara-Blanco, Lucas Bachmann, Jeffrey Blume, Patrick M. Bossuyt, Isabelle Boutron, David Bruns, Harry Büller, Frank Buntinx, Sarah Byron, Stephanie Chang, Jérémie F. Cohen, Richelle Cooper, Joris de Groot, Henrica C.W. de Vet, Jon Deeks, Nandini Dendukuri, Jac Dinnes, Kenneth Fleming, Constantine A. Gatsonis, Paul P. Glasziou, Robert M. Golub, Gordon Guyatt, Carl Heneghan, Jørgen Hilden, Lotty Hooft, Rita Horvath, Myriam Hunink, Chris Hyde, John Ioannidis, Les Irwig, Holly Janes, Jos Kleijnen, André Knottnerus, Daniël A. Korevaar, Herbert Y. Kressel, Stefan Lange, Mariska Leeflang, Jeroen G. Lijmer, Sally Lord, Blanca Lumbreras, Petra Macaskill, Erik Magid, Susan Mallett, Matthew McInnes, Barbara McNeil, Matthew McQueen, David Moher, Karel Moons, Katie Morris, Reem Mustafa, Nancy Obuchowski, Eleanor Ochodo, Andrew Onderdonk, John Overbeke, Nitika Pai, Rosanna Peeling, Margaret Pepe, Steffen Petersen, Christopher Price, Philippe Ravaud, Johannes B. Reitsma, Drummond Rennie, Nader Rifai, Anne Rutjes, Holger Schunemann, David Simel, Iveta Simera, Nynke Smidt, Ewout Steyerberg, Sharon Straus, William Summerskill, Yemisi Takwoingi, Matthew Thompson, Ann van de Bruel, Hans van Maanen, Andrew Vickers, Gianni Virgili, Stephen Walter, Wim Weber, Marie Westwood, Penny Whiting, Nancy Wilczynski, Andreas Ziegler.
Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.
- 1. Reducing waste from incomplete or unusable reports of biomedical research. Lancet 2014;383:267–276.
- 2. Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 2010;303:2058–2064.
- 3. Overinterpretation and misreporting of diagnostic accuracy studies: evidence of “spin.” Radiology 2013;267:581–588.
- 4. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA 2009;302:977–984.
- 5. Reporting diagnostic accuracy studies: some improvements after 10 years of STARD. Radiology 2015;274:781–789.
- 6. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–1066.
- 7. A systematic review classifies sources of bias and variation in diagnostic test accuracy studies. J Clin Epidemiol 2013;66:1093–1104.
- 8. Designing studies to ensure that estimates of test accuracy are transferable. BMJ 2002;324:669–671.
- 9. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Radiology 2003;226:24–28.
- 10. Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Evid Based Med 2014;19:47–54.
- 11. CONSORT 2010 Statement: updated guidelines for reporting parallel group randomised trials. J Clin Epidemiol 2010;63:834–840.
- 12. Beyond diagnostic accuracy: the clinical utility of diagnostic tests. Clin Chem 2012;58:1636–1643.
- 13. Quantifying the added value of a diagnostic test or marker. Clin Chem 2012;58:1408–1417.
- 14. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
- 15. The STARD statement for reporting diagnostic accuracy studies: application to the history and physical examination. J Gen Intern Med 2008;23:768–774.
- 16. Reporting standards for studies of diagnostic test accuracy in dementia: the STARDdem Initiative. Neurology 2014;83:364–373.
- 17. EQUATOR: reporting guidelines for health research. Lancet 2008;371:1149–1150.
- 18. Transparent and accurate reporting increases reliability, utility, and impact of your research: reporting guidelines and the EQUATOR Network. BMC Med 2010;8:24.
- 19. Guidelines for the process of cross-cultural adaptation of self-report measures. Spine 2000;25:3186–3191.
- 20. Policy: NIH plans to enhance reproducibility. Nature 2014;505:612–613.
Article History: Received July 9, 2015; accepted September 18.
Published online: Oct 28, 2015
Published in print: Dec 2015