TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse
Published in: | Methods of information in medicine Vol. 61; no. S 02; p. e89 |
---|---|
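Main Authors: | Pedrera-Jiménez, Miguel; García-Barrio, Noelia; Rubio-Mayo, Paula; Tato-Gómez, Alberto; Cruz-Bermúdez, Juan Luis; Bernal-Sobrino, José Luis; Muñoz-Carrero, Adolfo; Serrano-Balazote, Pablo |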
Format: | Journal Article |
Language: | English |
Published: | Germany, 01-12-2022 |
Subjects: | COVID-19 - epidemiology; Electronic Health Records; Humans; Pandemics |
Abstract | During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, leaving clinical researchers unaware of how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable.
This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization.
The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, following the paradigm of Detailed Clinical Models; (3) agnostic development of data operations, with SQL and R selected as programming languages; and (4) automation of the ETL instantiation, by building a formal configuration file in XML.
First, four international projects were analyzed to identify the 17 operations necessary to obtain, from the EHR, datasets that meet the specifications of these projects. Each data operation was then formalized using the ISO 13606 reference model, specifying the valid data types for its arguments, inputs, and outputs, and their cardinality. Next, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
This study provides a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any previous EHR reuse methodology can incorporate these results. |
---|---|
Author | Muñoz-Carrero, Adolfo; Bernal-Sobrino, José Luis; García-Barrio, Noelia; Pedrera-Jiménez, Miguel; Serrano-Balazote, Pablo; Cruz-Bermúdez, Juan Luis; Rubio-Mayo, Paula; Tato-Gómez, Alberto |
Author_xml | 1. Pedrera-Jiménez, Miguel (ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain); 2. García-Barrio, Noelia (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain); 3. Rubio-Mayo, Paula (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain); 4. Tato-Gómez, Alberto (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain); 5. Cruz-Bermúdez, Juan Luis (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain); 6. Bernal-Sobrino, José Luis (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain); 7. Muñoz-Carrero, Adolfo (Digital Health Research Unit, Instituto de Salud Carlos III, Madrid, Spain); 8. Serrano-Balazote, Pablo (Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain) |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36220109$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_2196_48702 crossref_primary_10_2196_44547 |
ContentType | Journal Article |
Copyright | The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
DBID | CGR CUY CVF ECM EIF NPM |
DOI | 10.1055/s-0042-1757763 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) |
DatabaseTitleList | MEDLINE |
Database_xml | – sequence: 1 dbid: ECM name: MEDLINE url: https://search.ebscohost.com/login.aspx?direct=true&db=cmedm&site=ehost-live sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Medicine |
EISSN | 2511-705X |
ExternalDocumentID | 36220109 |
Genre | Research Support, Non-U.S. Gov't; Journal Article |
GrantInformation_xml | – fundername: Instituto de Salud Carlos III grantid: PI18/01047 – fundername: Instituto de Salud Carlos III grantid: PI18CIII/00019 – fundername: Ministerio de Economía y Competitividad grantid: PI18/00981 |
GroupedDBID | --- .GJ 0R~ 123 4.4 53G 5RE AAWTL ABCQX ABJNI ABOCM ACGFS AENEX AFFNX AHRSK ALMA_UNASSIGNED_HOLDINGS C45 CGR CS3 CUY CVF DIK DU5 EBS ECM EIF EJD F5P H13 L7B NPM OK1 OVD QTC RTC RTE TEORI ZGI ZXP |
ID | FETCH-LOGICAL-c428t-adad032800025e922ee085311f82f3e1307690111fb7daa4e5b2b2e39aca0bdb2 |
IngestDate | Sat Sep 28 08:20:48 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | S 02 |
Language | English |
LinkModel | OpenURL |
MergedId | FETCHMERGED-LOGICAL-c428t-adad032800025e922ee085311f82f3e1307690111fb7daa4e5b2b2e39aca0bdb2 |
OpenAccessLink | https://pubmed.ncbi.nlm.nih.gov/PMC9788916 |
PMID | 36220109 |
ParticipantIDs | pubmed_primary_36220109 |
PublicationCentury | 2000 |
PublicationDate | 2022-12-01 |
PublicationDateYYYYMMDD | 2022-12-01 |
PublicationDate_xml | – month: 12 year: 2022 text: 2022-12-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | Germany |
PublicationPlace_xml | – name: Germany |
PublicationTitle | Methods of information in medicine |
PublicationTitleAlternate | Methods Inf Med |
PublicationYear | 2022 |
SSID | ssj0008804 |
Score | 2.3836813 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | e89 |
SubjectTerms | COVID-19 - epidemiology; Electronic Health Records; Humans; Pandemics |
Title | TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse |
URI | https://www.ncbi.nlm.nih.gov/pubmed/36220109 |
Volume | 61 |
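
The fourth stage described in the abstract, automating ETL instantiation from a formal XML configuration file that refers to a catalog of data operations, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `<etl>` and `<operation>` element names, the attribute names, the two toy operations, and the sample records are hypothetical and are not the schema, the 17 operations, or the SQL/R implementation reported by the study.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration file content. Element and attribute names are
# invented for this sketch; they are not the schema used in the paper.
CONFIG_XML = """
<etl>
  <operation name="filter_equals" field="diagnosis" value="COVID-19"/>
  <operation name="select_fields" fields="patient_id,age"/>
</etl>
"""


def filter_equals(rows, field, value):
    """Keep only the rows whose `field` equals `value` (compared as text)."""
    return [row for row in rows if str(row.get(field)) == value]


def select_fields(rows, fields):
    """Project each row onto a comma-separated list of field names."""
    names = [name.strip() for name in fields.split(",")]
    return [{name: row.get(name) for name in names} for row in rows]


# Catalog mapping operation names to implementations. These two toy
# operations stand in for a real catalog of data operations.
CATALOG = {"filter_equals": filter_equals, "select_fields": select_fields}


def run_etl(config_xml, rows):
    """Apply the operations listed in the XML config, in document order."""
    root = ET.fromstring(config_xml)
    for op in root.findall("operation"):
        params = dict(op.attrib)
        name = params.pop("name")
        if name not in CATALOG:
            raise ValueError(f"unknown operation: {name}")
        rows = CATALOG[name](rows, **params)
    return rows


# Made-up sample records, for demonstration only.
records = [
    {"patient_id": 1, "age": 64, "diagnosis": "COVID-19"},
    {"patient_id": 2, "age": 51, "diagnosis": "Influenza"},
]
result = run_etl(CONFIG_XML, records)
print(result)
```

Keeping the pipeline definition in a declarative file, separate from the operation implementations, is one way such a process could stay auditable: a reviewer can read the configuration to see which operations ran, in which order, and with which arguments, without reading the code.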