FAIR data sharing: The roles of common data elements and harmonization
Published in: | Journal of biomedical informatics Vol. 107; p. 103421 |
---|---|
Format: | Journal Article |
Language: | English |
Published: | United States, Elsevier Inc, 01-07-2020 |
Summary: | The value of robust and responsible data sharing in clinical research and healthcare is recognized by patients, patient advocacy groups, researchers, journal editors, and the healthcare industry globally. Privacy and security concerns acknowledged, the act of exchanging data (interoperability) along with its meaning (semantic interoperability) across studies and between partners has been difficult, if not elusive. For shared data to retain its value, a recommendation has been made to follow the Findable, Accessible, Interoperable, Reusable (FAIR) principles. Without applying appropriate data exchange standards with domain-relevant content standards and accessible rich metadata that uses applicable terminologies, interoperability is burdened by the need for transformation and/or mapping. These obstacles to interoperability limit the findability, accessibility and reusability of data, thus diminishing its value and making it impossible to adhere to FAIR principles.
One effort to standardize data collection has been through common data elements (CDEs). CDEs are data collection units comprising one or more questions together with a set of valid values. Some CDEs contain standardized terminology concepts that define the meaning of the data, and others include links to unique terminology concept identifiers and unique identifiers for each CDE; however, usually CDEs are defined for specific projects or collaborations and lack traceable or machine readable semantics. While the name implies that these are ‘common’, this has not necessarily been a requirement, and many CDEs have not been commonly used. The National Institutes of Health (NIH) CDEs are, in fact, a conglomerate of CDEs developed in silos by various NIH institutes. Therefore, CDEs have not brought the anticipated benefit to the industry through widescale interoperability, nor is there widespread reuse of CDEs. Certain institutes in the NIH recommend, albeit do not enforce, institute-specific preferred CDEs; however, at the NIH level a preponderance of choice and a lack of any overarching harmonization of CDEs or consistency in linking them to controlled terminology or common identifiers create confusion for researchers in their efforts to identify the best CDEs for their protocol. The problem of comparing data among studies is exacerbated when researchers select different CDEs for the same variable or data collection field. This manuscript explores reasons for the disappointingly low adoption of CDEs and the inability of CDEs or other clinical research standards to broadly solve the interoperability and data sharing problems. Recommendations are offered for rectifying this situation to enable responsible data sharing that will help in adherence to FAIR principles and the realization of Learning Health Systems for the sake of all of us as patients. |
---|---|
ISSN: | 1532-0464, 1532-0480 |
DOI: | 10.1016/j.jbi.2020.103421 |