Ethics of Using and Sharing Clinical Imaging Data for Artificial Intelligence: A Proposed Framework
In this article, the authors propose an ethical framework for using and sharing clinical data for the development of artificial intelligence (AI) applications. The philosophical premise is as follows: when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, clinical data should be treated as a form of public good, to be used for the benefit of future patients. In their 2013 article, Faden et al argued that all who participate in the health care system, including patients, have a moral obligation to contribute to improving that system. The authors extend that framework to questions surrounding the secondary use of clinical data for AI applications. Specifically, the authors propose that all individuals and entities with access to clinical data become data stewards, with fiduciary (or trust) responsibilities to patients to carefully safeguard patient privacy, and to the public to ensure that the data are made widely available for the development of knowledge and tools to benefit future patients. According to this framework, the authors maintain that it is unethical for providers to “sell” clinical data to other parties by granting access to clinical data, especially under exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. The authors also propose that patient consent is not required before the data are used for secondary purposes when obtaining such consent is prohibitively costly or burdensome, as long as mechanisms are in place to ensure that ethical standards are strictly followed. Rather than debate whether patients or provider organizations “own” the data, the authors propose that clinical data are not owned at all in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and society.
© RSNA, 2020
See also the editorial by Krupinski in this issue.
■ When clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. Secondary use of clinical data should be treated as a form of public good, to be used for the benefit of future patients, and not to be sold for profit or under exclusive arrangements.
■ Sharing clinical data with outside entities is consistent with this ethical framework as long as the entities safeguard patient privacy and act as ethical data stewards.
■ Clinical data may be widely released for research and development as long as those receiving the data identify themselves and act as ethical data stewards.
■ According to this framework, specific patient consent is not required to use clinical data for research and development as long as data are used appropriately and patients are made aware of how their data may be used.
Consider the following recent events:
In September 2018, journalists from the New York Times and ProPublica published a series of articles (1,2) calling into question a relationship between Memorial Sloan-Kettering Cancer Center and Paige.AI, a data science company developing artificial intelligence (AI) algorithms to assist in cancer diagnosis from pathologic slides. The articles described an arrangement in which the cancer center would provide the company with exclusive access to its archive of 25 million patient tissue slides and accompanying pathologist notes in exchange for a 9% equity stake in the company. The articles questioned the appropriateness of this relationship, as well as the fact that patients had not consented for images of their tissue to be used in this manner.
In November 2019, the Wall Street Journal published an article (3) revealing what was termed a secret deal for St. Louis, Missouri–based Ascension, a Catholic chain of 2600 hospitals, physicians’ offices, and other facilities, to share patient data with Google in a project with the code name “Project Nightingale.” The shared data included radiology scans, hospitalization records, laboratory tests, medications, and medical conditions and contained personal information such as names, birth dates, addresses, and family members. The article called into question patient privacy considerations, the appropriateness of the health system and Google's profiting from patient data, and the fact that neither patients nor physicians were notified of the arrangement. Discovery of the project triggered a federal inquiry and criticism from patients and lawmakers (4).
A subsequent article in Computerworld (5) highlighted that Google is by no means alone in this practice. As an example, it reported that in 2016, IBM purchased Truven Health Analytics for $2.6 billion, a purchase that included tens of millions of medical records and years of health insurance claims data that could be monetized by selling analyses of, and access to, the claims data (5). It also described efforts by Amazon, Apple, Microsoft, and other technology companies to enter the health care sector and gain access to health care data, either through applications that enable access to electronic health records or through their own in-house health care programs.
These examples represent common scenarios that are playing out worldwide as recent advances in machine learning and other AI technologies are being applied to medical imaging (6). Although a legal framework exists to address sharing of clinical data, advances in AI raise questions about whether the ethical basis of these laws is sufficient to address issues presented by this new technology. Although the protection of sensitive patient information has been a priority for decades, the advent of AI technology has given greater urgency to the question of who should control and profit from deidentified clinical data. Experience grappling with these and other related questions at our institution has led us to develop an ethical framework to guide our use and sharing of clinical data for the development of AI applications.
In this article, we outline that ethical framework by proposing principles that should guide the development and use of AI algorithms that can learn from and analyze data acquired in the routine provision of medical care. We do not address regulatory or legal considerations in depth; rather, we contend that the legal precedents that cover the use of clinical data were developed before the advent of modern AI technology and may not have fully considered all relevant ethical issues. We also do not consider broader ethical considerations in the development and use of AI for health care applications, except that we assume that tools derived from such data will be used to benefit rather than intentionally harm individuals. Considerations regarding the integrity of AI-related tools, including bias, quality control, and model transparency, are also out of scope. We limit our discussion to secondary uses of data acquired in the routine provision of medical care, recognizing that a well-developed construct governing data acquired primarily for research in humans already exists (7).
As illustrated in the examples above, the issue is generally framed as a debate regarding who should control and profit from secondary uses of the data—individual patients versus provider organizations. We offer a third alternative: regarding secondary use of information acquired in the course of routine medical care, neither patients nor provider organizations have full rights to control and profit from the data. Rather, we maintain that when clinical data are used to provide care, the primary purpose for acquiring the data is fulfilled. At that point, in terms of the potential for secondary use, clinical data should be treated as a form of public good, to be used for the benefit of future patients. This premise can serve as a foundation to answer questions of access, control, permission, profit, and exclusivity. Although we focus on clinical imaging data, we believe this framework can be applied to the electronic medical record, pathologic data, and other clinical data.
Except for New Hampshire, all U.S. states either recognize the provider as the owner of medical information or have no law conferring specific ownership or property rights to the medical record (8). Although U.S. law does not specifically address ownership of medical information, the U.S. Supreme Court has upheld the rights of companies to use, and to sell, the records of physicians’ prescribing information (9). In fact, health data are currently exchanged in global marketplaces, with a total value of approximately $11.45 billion in 2016, with expected double-digit growth in the forecast period of 2017–2025 (10). Although state and federal laws are in place protecting patient privacy and access to data (11), from a legal standpoint, clinical data are generally considered to be the property of the provider organization and can be used and sold by the provider organization as long as patient access and privacy protection requirements are met.
The field of ethics (or moral philosophy) involves systematizing, defending, and recommending principles, values, and standards of behavior (12). Ethical behavior is the foundation for the medical professions because it establishes the fiduciary duties of medical professionals and provider organizations (13). By establishing concrete guidelines for using clinical data for research and development, we believe that ethics should drive legal and regulatory frameworks.
The Belmont Report, issued in 1979 by the U.S. Office of Human Subjects Research (now called the Office for Human Research Protections), is one of the most widely recognized standards for biomedical ethics in the past 40 years (14). This seminal report articulates three general principles that guide the protection of human subjects in biomedical and behavioral research. First, respect for persons refers to the notion that individuals have the right to make their own choices about actions that affect them. Second, beneficence refers to the notion that researchers must do what is in the subject's best interest, including avoiding harming the subject. Third, justice refers to the notion that the benefits and costs of research and medical care should be distributed and borne in a fair manner.
Although the ethical framework of the Belmont Report is highly relevant to the human subjects research for which it was intended, the framework is not directly applicable to secondary use of deidentified clinical data, which the Report specifically states is beyond its scope (14). According to the Report, once identifying information is removed from the data, the use of that data is no longer considered human subjects research. Therefore, we use the Belmont Report only as a general reference to ethical principles.
A 2013 Hastings Center report by Faden et al (15) articulates an ethics framework that more directly applies to the question at hand. The authors argue that traditional ethical frameworks tend to create too sharp a distinction between research and clinical practice. The authors explicitly reject “the assumption that clinical research and clinical practice are, from an ethics standpoint, fundamentally different enterprises.” Rather, this framework sets a moral priority on health system learning and improvement and maintains that all individuals who participate in the health care system, including patients, have a moral obligation to contribute to improving that system.
We endorse the seven fundamental ethical obligations proposed by Faden et al (15), summarized in Table 1. Six of these obligations are directed to researchers, clinicians, administrators, payers, and purchasers. Some of these entities are not-for-profit entities, whereas others are for-profit entities, such as hospital networks, managed care organizations, and purchasers. The seventh obligation, the obligation to “contribute to the common purpose of improving the quality and value of clinical care and health care systems,” is directed to patients.
Faden and colleagues' article only lightly touches on the ethics of clinical data sharing; the report was published before the advent of modern AI technology, which has substantially increased the potential value of clinical data. Nevertheless, the ethical framework of the report is directly relevant to the secondary use of clinical data.
The Value of Data
We define “data” in the present context as recorded observations of physical properties, phenomena, or behaviors. The immediate value of data to the individual lies in their contribution to the individual’s clinical care. However, when deidentified and aggregated, clinical data can also serve as the raw materials for creating generalizable knowledge through research and development, yielding benefits that extend beyond the individual. The resulting population-level inferences can be used to understand disease processes, to generate new diagnostic and treatment algorithms, and to otherwise benefit populations. Because clinical data have value to society as a whole, the public has an interest in safeguarding and promoting their use for beneficial purposes. Because of this public interest, once the primary purpose of clinical care has been fulfilled, we believe it is the moral obligation of those who participate in the health care system to treat clinical data as a form of public good, to be used to improve the care of future patients.
The potential societal value of these data is further enhanced because obtaining such data through other means is often impractical or unethical, given the associated costs and risks, such as exposing patients to additional ionizing radiation. Therefore, from the perspective of beneficence at the population level, we believe it would be unethical to refrain from using clinical data to develop tools that have the potential to benefit others (16).
When used in this manner, clinical data are simply a conduit to viewing fundamental aspects of the human condition. It is not the data that are of primary interest, but rather the underlying physical properties, phenomena, and behaviors that they represent. When providing medical care, providers look “at” the patient to effectively diagnose and treat that individual. But when those data are deidentified, aggregated, and used to develop insights into elemental aspects of the human condition, researchers who do so are looking “through” deidentified and aggregated observations of groups of patients to understand underlying anatomy, physiology, and disease processes common to populations. The ability to use these data for societal benefit is a fortuitous side effect of the advent of electronic health records, but the underlying elements that they represent have, for the most part, existed for millennia and will presumably continue to exist for millennia to come.
Because this valuable resource is created incidentally, a construct is needed to determine what ethical principles should govern it. Whereas those on one side advocate for individual control of data, ethical principles do not support a patient’s right to block or profit from insight gained from the data when they are deidentified, obtained from large populations, and aggregated, especially when the activity requires no effort, imposes no cost, and poses essentially no risk to the patient. Conversely, whereas those on the other side advocate for the ability to exclusively license the data, it is not ethical for an entity to obtain access to and profit from activities derived from these insights and then prevent others from gaining access to the same source of insights.
The ethical obligations for data stewardship described by Faden et al apply not only to patients and clinicians. According to the framework of the report (15), all individuals and entities with access to clinical data inherently take on the same fiduciary obligations as those of medical professionals, including for-profit entities. For example, those who are granted access to the data must accept responsibility for safeguarding protected health information. They must not prevent others from learning from the data, such as through exclusive licenses. All parties must ensure that all knowledge derived from the data will be used for beneficial purposes. It is in the public’s interest to ensure that all who have access to this resource adhere to these ethical obligations.
The Role of Third Parties, Including Industry
In an ideal world, no financial transactions would be involved in the use of clinical data for clinical research or for the development and distribution of AI models. However, resources are required to develop, implement, and maintain knowledge and models derived from the clinical data. The expectation of a financial return drives these investments in algorithm development and implementation.
The necessity of financing AI algorithm development sets up a potential conflict between ethical principles and market forces, with associated questions. First, who should be allowed to profit from the data? Second, how can we ensure that data are used appropriately?
Question of Profit
The answer to the question of who is entitled to profit from secondary uses of clinical data rests on the concept of fairness, which dictates that individuals and entities should profit from the data roughly in proportion to the value of their respective contributions. Therefore, to address the concept of fairness in distribution of profits, we must first consider how to assign value to data.
In this context, we distinguish data, which refers to recorded observations, from knowledge, which refers to higher-level inferences that represent understanding of the physical properties, phenomena, or behavior under observation. Such inferences may be abstract, in the form of principles, or concrete, in the form of formulaic scientific laws or, as in this case, diagnostic algorithms.
Data serve as raw materials that help create knowledge through discovery and development. Thus, both the data and the resulting knowledge-generating activities are critical elements of value creation.
Because patients contribute to the development of knowledge and tools by subjecting their bodies to observations by health care providers, two important questions arise: First, are patients entitled to a portion of profits generated from data about their bodies? Second, are provider organizations that have acquired and maintain the patient data entitled to a portion of profits derived from the data?
We contend that the reason a patient receives health care, including diagnostic imaging, is for personal benefit. When that care is provided, the primary purpose for which the data are acquired is fulfilled; patients are not inherently entitled to any additional benefit. Likewise, providers and provider organizations are financially reimbursed for care they provide and also are not entitled to any additional benefit. Because algorithm developers have not contributed anything up to this point, they also are not entitled to financially benefit from the clinical data. In other words, because the data were not primarily acquired for secondary use, from an ethical standpoint, no single entity is inherently entitled to profit from their secondary use. Therefore, it is appropriate for clinical data to be treated as a public good with respect to their potential use for secondary purposes.
We are not the first to propose this approach. In 2008, a workshop was convened by the Institute of Medicine (now the National Academy of Medicine) specifically to explore what would need to be done to establish health care data as a public good (17). This workshop took place before recent technological advances in data science—the report did not even mention the terms artificial intelligence or machine learning. Nevertheless, the participants were prescient in their recognition of the value of clinical data.
Treating secondary use of clinical data as a public good implies at least two key ethical imperatives: (a) no single entity is entitled to profit directly from the data and (b) dissemination and use of the data for development of beneficial knowledge should be encouraged and facilitated. The dissemination of deidentified clinical data to all qualifying stakeholders diminishes the value of the data to any single stakeholder because it prevents any single entity from holding a monopoly on the resource. Thus, sharing data through exclusive contracts or licenses is unacceptable because the use of clinical data by one entity would preclude others from using them. Furthermore, when access to the data is nonexclusive, the data become a commodity, helping to ensure that proceeds related to licensing of algorithms derived from the data are based on the innovative efforts of the researchers and developers rather than on the data themselves.
Treating secondary use of patient data as a public good also argues against paying patients for the secondary use of clinical data. Such a practice could have dramatic unanticipated effects on the behavior of both patients and providers, altering the incentive structure of the fiduciary relationship between patients and care providers and thus potentially affecting patient care.
We define “selling” of clinical data to be when those who control clinical data grant access to the data to other entities, especially under favorable or exclusive arrangements, in exchange for monetary or in-kind payments that exceed costs. A public good can be defined as “a commodity or service that is provided without profit to all members of a society, either by the government or by a private individual or organization” (18). By considering clinical data as a public good, it becomes unethical for individuals or organizations to sell or resell clinical data for profit. However, it is reasonable for organizations to be reimbursed for the costs associated with aggregating, curating, deidentifying, and maintaining the data and making them available for secondary use in a nonexclusive manner, as long as the reimbursement does not exceed those costs.
We view this framework as a universally applicable, rather than a context-dependent, ethical construct. For example, one might argue that clinical data could serve as a helpful source of revenue to individuals and organizations in less fortunate financial circumstances, either within or outside the United States. We are skeptical of this argument because the potential for exploitation is just as great, if not greater, in such circumstances. Although it may be appropriate for for-profit corporations to reimburse providers for costs associated with the data, paying a premium for exclusive access to data sets up a host of potential ethical conflicts that are as relevant to developing nations as to developed nations.
When developers use the clinical data as raw materials to label the images and create an algorithm that can benefit future patients, they create value that did not previously exist. For example, a developer may label a set of deidentified clinical knee MR images and use the labeled images to train an algorithm. The MRI data are treated as a public good, but the labels are created by the developer using additional resources, which constitutes a value-added activity. It is reasonable for entities to profit from the value that they add through these activities, although it is not ethical for them to profit from the clinical data themselves, such as through the reselling of the data.
The question arises regarding whether it is fair for developers to extract profits derived from clinical data while patients and provider organizations are not allowed to sell the data. Under the proposed framework, developers are not allowed to extract profits from the clinical data themselves; profits should be derived only from the knowledge and tools that are developed from the data, much like for other medical research. The value is then returned to society through the distribution of beneficial knowledge and tools. It is incumbent on government to ensure that such knowledge and tools are distributed fairly; that entities are not allowed to extract excessive profits, such as through monopolistic behavior; and that companies do not engage in unethical behavior, such as using the data to identify individuals, market products, or create malicious applications. These issues are not unique to health care; governments regularly deal with them through a variety of methods.
Obligation to Use Data Appropriately
The concept that all who participate in the delivery of care take upon themselves the same fiduciary obligations as those of medical professionals is a key principle of the Learning Health Care Ethics Framework (15). According to this framework, provider organizations that acquire and maintain clinical data do not own them per se, but rather serve as stewards of the data to maintain, protect, and use the data for appropriate purposes. In other words, no one “owns” the data in the traditional sense—not even the patients themselves. Rather, all who interact with or control the data, including AI developers, have an obligation to help ensure that the data are used for the benefit of future patients and of society.
According to Rosenbaum (19), “the concept of a data steward is intended to convey a fiduciary (or trust) relationship with data that turns on a data manager whose loyalty is to the interests of individuals and entities whose data are stored in and managed by the system.” In this case, the fiduciary responsibility is to patients, in protecting privacy, as well as to society, which has an interest in improving health care. Everyone who is granted access to clinical data similarly becomes a data steward, including for-profit AI algorithm developers.
This framework applies to aggregated data obtained from a large number of individuals. When an extensive amount of knowledge is derived from data or tissue samples from one or a few individuals, we contend that a different set of obligations likely applies, such as in the case of Henrietta Lacks (20) or John Moore (21). When knowledge is derived from an aggregate of many cases, not only is the marginal value of each case diminished, but also a greater amount of the overall value is derived from the act of aggregating the data (22). Considerations of individual privacy also differ between knowledge derived from a single individual versus that derived from a large number of individuals.
Provider organizations may need to share clinical data with outside entities to develop generalizable models, for two reasons. First, many provider organizations lack the capability to create AI models themselves. Second, the aggregation of clinical data from multiple institutions may markedly enhance the value of the data (23).
Sharing clinical data with outside entities is consistent with this ethical framework, as long as the following conditions are met:
1. Individual privacy is carefully safeguarded at all times.
2. The receiving organization willingly accepts the same fiduciary responsibilities of data stewardship as the original provider organization. This includes agreeing that no attempt will be made to re-identify any individual from the data.
3. The sharing and receiving organizations both strictly adhere to an agreement specifying the purposes for which the data will be used.
4. The receiving organization agrees to not share the data further without the consent of the original provider organization.
When data are widely released, those making use of the data assume the four obligations listed above. In addition, the parties should (a) identify themselves, including disclosing the personal identity and contact information of at least one guarantor of the agreement; (b) agree to all terms and conditions set forth by the releasing entity; and (c) agree to respond to communications from the originating institution as long as a copy of the data remains in their possession. All recipients of shared clinical data have an obligation to report to the original institution problems with the data, such as when identifiers are unintentionally disclosed, and to destroy data in their possession when requested to do so. They must specifically agree to refrain from attempts to re-identify individuals in any manner, including reconstruction of data (eg, facial recognition) or connection to other databases or data sources (eg, demographic information). Data use agreements should document the terms that specify the preceding obligations for all clinical data-sharing transactions, including for wide releases of deidentified data. Initiatives such as the National Institutes of Health “All of Us” program could serve as an example of how to allow participation from any qualifying research and development organization following an established vetting process (25).
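The sharing conditions and wide-release obligations above could, for example, be encoded as a simple pre-transfer checklist in a data use agreement system. The following sketch is purely illustrative; the class and field names are ours, not part of any standard schema:

```python
# Hypothetical checklist for the data-sharing conditions described above.
# All names are invented for illustration; real data use agreements are
# legal documents, not code.
from dataclasses import dataclass

@dataclass
class DataUseAgreement:
    privacy_safeguarded: bool = False             # condition 1
    stewardship_accepted: bool = False            # condition 2 (incl. no re-identification)
    purposes_specified: bool = False              # condition 3
    no_onward_sharing_without_consent: bool = False  # condition 4
    guarantor_identified: bool = False            # extra obligation for wide releases

    def permits_transfer(self, wide_release: bool = False) -> bool:
        """All four baseline conditions must hold; wide releases
        additionally require an identified guarantor."""
        required = [
            self.privacy_safeguarded,
            self.stewardship_accepted,
            self.purposes_specified,
            self.no_onward_sharing_without_consent,
        ]
        if wide_release:
            required.append(self.guarantor_identified)
        return all(required)

dua = DataUseAgreement(True, True, True, True, guarantor_identified=False)
assert dua.permits_transfer() is True                  # institutional sharing
assert dua.permits_transfer(wide_release=True) is False  # wide release needs a guarantor
```

The point of the sketch is that the conditions are conjunctive: failure of any one of them blocks the transfer.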
To overcome the problem of providing access to data while maintaining patient privacy, a concept known as federated learning is emerging, in which a third party places machine learning models within the respective firewalls of multiple institutions and the models’ learned parameters are transmitted back to the third party, without clinical data ever leaving the institutions (26,27). In this way, rather than bringing the data to the algorithm, the algorithm is brought to the data. We maintain that the same ethical principles for the secondary use of clinical data apply to federated learning as to data sharing for AI, including considerations of exclusivity. Although federated learning might decrease risks associated with deidentification, it does not necessarily fulfill the ethical imperative of facilitating the wide dissemination of the data as a public good. In fact, without an attempt to ensure otherwise, it may lock in incumbents who have the resources to install such systems, having the same effect as exclusive contracts.
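The federated learning pattern just described can be illustrated with a deliberately simplified sketch: three hypothetical institutions each compute a local update to a one-parameter model on data that never leaves their site, and only the updated parameters are averaged centrally. All names, data, and the single-parameter model are invented for illustration; real federated systems add secure aggregation and much more.

```python
# Toy federated averaging: only model parameters cross institutional
# boundaries; the (x, y) observations stay behind each firewall.

def local_update(weights, local_data, lr=0.1):
    """One gradient-descent step on a model y = w * x,
    computed entirely inside the institution."""
    w = weights
    grad = sum(2 * (w * x - y) * x for x, y in local_data) / len(local_data)
    return w - lr * grad

def federated_average(updates):
    """The coordinating third party sees parameters only, never data."""
    return sum(updates) / len(updates)

# Three hypothetical institutions, each with private observations.
institutions = [
    [(1.0, 2.1), (2.0, 3.9)],
    [(1.5, 3.0), (3.0, 6.2)],
    [(0.5, 1.0), (2.5, 5.1)],
]

global_w = 0.0
for _ in range(50):  # communication rounds
    updates = [local_update(global_w, data) for data in institutions]
    global_w = federated_average(updates)

print(round(global_w, 2))  # converges near the shared slope (~2)
```

Note that the ethical concerns raised above are untouched by this mechanism: the coordinating party still controls who may "bring an algorithm to the data," so exclusivity can re-enter through that channel.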
Although a detailed discussion of legal and regulatory frameworks is beyond the scope of this document, we maintain that government and other regulatory bodies have a role in both safeguarding and promoting the secondary use of clinical data for beneficial purposes. This includes clearly establishing guidelines and standards for what constitutes appropriate and inappropriate uses, monitoring the behavior of those who handle clinical data, and enforcing compliance with these standards.
Protection of Privacy
The protection of individual privacy is of paramount importance in the secondary use of clinical data (3). The principle of beneficence mandates that caregivers act in patients’ best interest and refrain from harming them (4). Because unwanted disclosure of sensitive clinical data can cause harm, it is imperative that such data be carefully protected. This includes the application of the “minimum necessary” standard, meaning that an entity that shares medical information “must make reasonable efforts to limit protected health information to the minimum necessary to accomplish the intended purpose of the use, disclosure, or request” (28).
Unfortunately, identifying information cannot be removed with 100% reliability, for two reasons. First, both human and automated methods fail to detect identifying information in a small percentage of cases in any deidentification task. Second, advancing technology may enable the deduction of individual identity from data sets in ways that were not possible in the past, such as applying facial recognition software to three-dimensional reconstructions of facial CT scans (29,30).
Because of these limitations, it is impossible to require, as an ethical standard, that all potential identifying information be removed from all data sets with 100% certainty. Incidents of inadvertent disclosure should be rare and minor, but given human and system imperfections, they are likely to occur occasionally. Therefore, rather than assuming that full deidentification is possible, one should consider deidentification in terms of levels of reliability (31). Some deidentification methods, such as multiple reviews using different deidentification strategies, are likely to detect and remove identifying information with greater reliability than others. In general, such higher-reliability methods tend to be more expensive than lower-reliability methods. The level of reliability of deidentification efforts should correspond to the trustworthiness and integrity of the entities with which the data are shared. For example, the most reliable deidentification methods should be used for widely released data sets. The additional ethical responsibility of those receiving the data to protect patient privacy and to inform the originating institution of any inclusion of identifying information serves as a further check for securing patient privacy.
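The idea of layered, higher-reliability deidentification can be sketched as two independent passes: a rule-based pass that drops known identifier fields, followed by a pattern-based pass that flags residual identifiers the first pass missed, for human review. The field names and patterns below are purely illustrative assumptions, not a standard such as the DICOM confidentiality profiles:

```python
# Hypothetical two-pass deidentification sketch. Real pipelines follow
# established profiles (e.g., HIPAA Safe Harbor, DICOM PS3.15); the
# fields and regex here are invented for illustration only.
import re

DIRECT_IDENTIFIERS = {"name", "birth_date", "address", "mrn"}

def rule_based_pass(record):
    """First pass: drop fields known to hold protected health information."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

def pattern_based_pass(record):
    """Second, independent pass: flag free-text values that still look
    like dates or phone numbers, so a human reviewer can inspect them."""
    suspicious = re.compile(r"\d{4}-\d{2}-\d{2}|\(\d{3}\) \d{3}-\d{4}")
    return {k for k, v in record.items()
            if isinstance(v, str) and suspicious.search(v)}

record = {
    "name": "Jane Doe",                 # removed by the first pass
    "mrn": "12345",                     # removed by the first pass
    "modality": "MR",
    "report": "Knee MR performed; call (555) 010-1234 for follow-up.",
}

deidentified = rule_based_pass(record)
flagged = pattern_based_pass(deidentified)  # catches the residual phone number
```

Each added pass raises reliability at added cost, which is exactly the trade-off the text describes: widely released data sets merit the most (and most expensive) layers.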
Consent for Secondary Use of Data
A common ethical question is whether patients have the right to control secondary uses of their clinical data. In Faden and colleagues' framework (15), patients have an obligation to contribute to the improvement of care in the future. Although the principle of respect for persons indicates that individuals have the right to make their own choices about actions that affect them (14), this principle does not grant individuals the right to prevent others from learning from aggregated deidentified observations about them when doing so poses no significant risk to them as individuals (16).
Furthermore, the requirement of consent may impede or even preclude the aggregation of clinical data to benefit populations, adding substantial costs and essentially granting veto power to each individual patient (32). Even relatively small individual costs in this setting can add up to large aggregated costs that stifle innovation and preclude the generation of knowledge and tools that could be of substantial societal benefit.
According to the ethical framework of the learning health care system, we propose that additional patient consent is not required to use clinical data for research or algorithm development, under the following conditions:
1. Individual privacy is carefully safeguarded.
2. Data are aggregated when used for research and development.
3. Institutional oversight mechanisms are in place to ensure that clinical data are used appropriately and only for purposes that will be beneficial to future patients.
4. Patients are made aware of how their data may be used, whether at the time they consent for care, through a public website, or by other means convenient to patients.
We do not intend to reject the general concept of patient consent for secondary use of clinical data per se. If the consent process does not inhibit research and development of beneficial tools, then we support the use of consent to better inform patients, demonstrate respect, and increase trust. But if the requirement for consent would make research impracticable, create additional risk to patients, or cause additional confusion or burden to patients, we believe that informing patients of how clinical data may be used secondarily, without obtaining express patient consent, is ethically justifiable, as long as mechanisms are in place to ensure that ethical standards are strictly followed. This implies the need for formal institutional oversight. Because the purview of institutional review boards (IRBs) is limited to research (7), this may necessitate expanding the purview of IRBs or establishing a system of separate local oversight bodies, possibly with external oversight, analogous to the IRB program.
Common Specific Questions
We have specifically addressed seven questions regarding the use of clinical data that we have frequently encountered, as shown in Table 2. Our answers are based on the framework we have outlined; specific rationale for the answers to each question is included in Appendix E1 (online).
We have proposed a framework to guide the sharing and use of clinical data based on foundational ethical principles. In evaluating possible frameworks to guide the appropriate secondary uses of clinical data, three general philosophical approaches were considered. One approach treats the patient as the owner of the data, implying that patient data should be shared only with the express consent of that patient. A second approach treats the care provider as the owner of the data, implying that the data can be bought and sold as any other commodity. We propose an alternative to both of these approaches, which treats clinical data used for secondary purposes as a public good, implying that the data are not “owned” in the traditional sense, but rather that all who interact with or control the data have an obligation to ensure that the data are used for the benefit of future patients and of society.
The latter approach builds on the fiduciary model of health care delivery, in which providers and provider organizations are intentionally placed in a position of trust, with safeguards to ensure that trust is maintained and that penalties are applied when the trust is violated. Rather than assuming that for-profit entities cannot be trusted, our approach expands the role of fiduciary, with all of the accompanying responsibilities, to all who use clinical data to generate knowledge and develop tools for the benefit of patients, including for-profit entities. This builds on the work of others who have highlighted the importance of embedding trustworthiness into data sharing efforts (33–35).
For this approach to be effective, these ethical obligations must be regarded as symmetrical: All who participate in this environment and benefit from it, including patients, providers, health care organizations, and industry, have an ethical obligation to preserve and improve the health care delivery system, and not to profit unfairly at the expense of the common good. In other words, the notion that patients have an ethical obligation to contribute to the common purpose of improving health care is necessarily accompanied by the concept that all others also share this responsibility.
This approach may be unfamiliar to technology companies, which are not always accustomed to assuming the obligations associated with patient care.
By describing how all entities who participate in the health care system can adhere to their moral obligations to continuously improve it, we believe this framework maximizes benefit for society without compromising individual patient protections.
Author contributions: Guarantor of integrity of entire study, D.B.L.; study concepts/study design or data acquisition or data analysis/interpretation, all authors; manuscript drafting or manuscript revision for important intellectual content, all authors; approval of final version of submitted manuscript, all authors; agrees to ensure any questions related to the work are appropriately resolved, all authors; literature research, all authors; manuscript editing, all authors.
- 1. Top cancer researcher fails to disclose corporate financial ties in major research journals. New York Times. https://www.nytimes.com/2018/09/08/health/jose-baselga-cancer-memorial-sloan-kettering.html. Published September 8, 2018. Accessed December 20, 2019.
- 2. Sloan Kettering’s cozy deal with start-up ignites a new uproar. New York Times. https://www.nytimes.com/2018/09/20/health/memorial-sloan-kettering-cancer-paige-ai.html. Published September 20, 2018. Accessed December 20, 2019.
- 3. Google’s ‘Project Nightingale’ gathers personal health data on millions of Americans. Wall Street Journal. https://www.wsj.com/articles/google-s-secret-project-nightingale-gathers-personal-health-data-on-millions-of-americans-11573496790?mod=article_inline. Published November 11, 2019. Accessed December 20, 2019.
- 4. Google’s ‘Project Nightingale’ triggers federal inquiry. Wall Street Journal. https://www.wsj.com/articles/behind-googles-project-nightingale-a-health-data-gold-mine-of-50-million-patients-11573571867. Published November 12, 2019. Accessed December 20, 2019.
- 5. Yes, Google’s using your healthcare data–and it’s not alone. Computerworld. https://www.computerworld.com/article/3453818/yes-googles-using-your-healthcare-data-and-its-not-alone.html. Published November 15, 2019. Accessed December 20, 2019.
- 6. Ethics of artificial intelligence in radiology: summary of the Joint European and North American Multisociety Statement. Radiology 2019;293(2):436–440.
- 7. Protection of human subjects. 56 (117) Federal Register 28013 (1991) (codified at 45 CFR §46).
- 8. Who owns medical records: 50 state comparison. Hirsh Health Law and Policy Program (George Washington University) and Robert Wood Johnson Foundation. http://www.healthinfolaw.org/comparative-analysis/who-owns-medical-records-50-state-comparison. Updated August 20, 2015. Accessed December 20, 2019.
- 9. IMS Health Inc. (No. 10-779). Legal Information Institute, Cornell University. https://www.law.cornell.edu/supct/html/10-779.ZS.html. Accessed December 20, 2019.
- 10. Global big data in healthcare market analysis and forecast, 2017-2025 - focus on components and services, applications, & competitive landscape. ResearchAndMarkets.com. https://bisresearch.com/industry-report/global-big-data-in-healthcare-market-2025.html. Published April 3, 2018. Accessed December 20, 2019.
- 11. Individual access to medical records: 50 state comparison. Hirsh Health Law and Policy Program (George Washington University) and Robert Wood Johnson Foundation. http://www.healthinfolaw.org/comparative-analysis/individual-access-medical-records-50-state-comparison. Updated September 24, 2013. Accessed December 20, 2019.
- 12. Ethics. Internet Encyclopedia of Philosophy. https://www.iep.utm.edu/ethics. Accessed October 1, 2019.
- 13. AMA Code of Ethics. American Medical Association. https://www.ama-assn.org/topics/ama-code-medical-ethics. Accessed October 1, 2019.
- 14. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Bethesda, Md: U.S. Government Printing Office, 1978.
- 15. An ethics framework for a learning health care system: a departure from traditional research ethics and clinical ethics. Hastings Cent Rep 2013;43(Spec No):S16–S27.
- 16. Ethics and big data in health. Curr Opin Syst Biol 2017;4:53–57.
- 17. Clinical Data as the Basic Staple of Health Learning: Creating and Protecting a Public Good: Workshop Summary. Washington, DC: The National Academies Press, 2010.
- 18. “Public good” definition. Lexico. www.lexico.com/en/definition/public_good. Accessed October 1, 2019.
- 19. Data governance and stewardship: designing data stewardship entities and advancing data access. Health Serv Res 2010;45(5 Pt 2):1442–1455.
- 20. The Immortal Life of Henrietta Lacks. New York, NY: Penguin Random House, 2010.
- 21. Patient’s right to tissue is limited. New York Times. https://www.nytimes.com/1990/07/10/science/patient-s-right-to-tissue-is-limited.html. Published July 10, 1990. Accessed December 20, 2019.
- 22. Who should profit from the sale of patient data? Brookings Institution. https://www.brookings.edu/blog/techtank/2018/11/19/who-should-profit-from-the-sale-of-patient-data. Published November 19, 2018. Accessed December 20, 2019.
- 23. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 2018;15(11):e1002683.
- 24. Medical image data and datasets in the era of machine learning: whitepaper from the 2016 C-MIMI meeting dataset session. J Digit Imaging 2017;30(4):392–399.
- 25. Privacy and security protocols. All of Us. https://www.researchallofus.org/about/privacy-security-protocols. Accessed October 1, 2019.
- 26. Distributed deep learning networks among institutions for medical imaging. J Am Med Inform Assoc 2018;25(8):945–954.
- 27. A roadmap for foundational research on artificial intelligence in medical imaging: from the 2018 NIH/RSNA/ACR/The Academy Workshop. Radiology 2019;291(3):781–791.
- 28. Standards for privacy of individually identifiable health information: final rule, 45 CFR §164.502(b).
- 29. Weaving technology and policy together to maintain confidentiality. J Law Med Ethics 1997;25(2-3):98–110, 82.
- 30. Use and understanding of anonymization and de-identification in the biomedical literature: scoping review. J Med Internet Res 2019;21(5):e13484.
- 31. Beyond the DICOM header: additional issues in deidentification. AJR Am J Roentgenol 2014;203(6):W658–W664.
- 32. Biobank research: who benefits from individual consent? BMJ 2011;343:d5647.
- 33. Creating a data resource: what will it take to build a medical information commons? Genome Med 2017;9(1):84.
- 34. Importance of participant-centricity and trust for a sustainable medical information commons. J Law Med Ethics 2019;47(1):12–20.
- 35. Trust in genomic data sharing among members of the general public in the UK, USA, Canada and Australia. Hum Genet 2019;138(11-12):1237–1246.
Article History
Received: Nov 14 2019
Revision requested: Dec 17 2019
Revision received: Jan 19 2020
Accepted: Jan 23 2020
Published online: Mar 24 2020
Published in print: June 2020