TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse

Bibliographic Details
Published in: Methods of information in medicine Vol. 61; no. S 02; p. e89
Main Authors: Pedrera-Jiménez, Miguel; García-Barrio, Noelia; Rubio-Mayo, Paula; Tato-Gómez, Alberto; Cruz-Bermúdez, Juan Luis; Bernal-Sobrino, José Luis; Muñoz-Carrero, Adolfo; Serrano-Balazote, Pablo
Format: Journal Article
Language: English
Published: Germany 01-12-2022
Abstract During the COVID-19 pandemic, several methodologies were designed for obtaining electronic health record (EHR)-derived datasets for research. These processes often operate as black boxes, leaving clinical researchers unaware of how the data were recorded, extracted, and transformed. To solve this, extract, transform, and load (ETL) processes must be based on transparent, homogeneous, and formal methodologies that make them understandable, reproducible, and auditable. This study aims to design and implement a methodology, in accordance with the FAIR Principles, for building ETL processes (focused on data extraction, selection, and transformation) for EHR reuse in a transparent and flexible manner, applicable to any clinical condition and health care organization. The proposed methodology comprises four stages: (1) analysis of secondary use models and identification of data operations, based on internationally used clinical repositories, case report forms, and aggregated datasets; (2) modeling and formalization of data operations, through the paradigm of the Detailed Clinical Models; (3) agnostic development of data operations, selecting SQL and R as programming languages; and (4) automation of the ETL instantiation, building a formal configuration file in XML. First, four international projects were analyzed to identify 17 operations necessary to obtain, from the EHR, datasets that meet the specifications of these projects. Next, each of the data operations was formalized using the ISO 13606 reference model, specifying the valid data types as arguments, inputs, and outputs, and their cardinality. Then, an agnostic catalog of data operations was developed in the previously selected data-oriented programming languages. Finally, an automated ETL instantiation process was built from a formally defined ETL configuration file.
This study has provided a transparent and flexible solution to the difficulty of making the processes for obtaining EHR-derived data for secondary use understandable, auditable, and reproducible. Moreover, the abstraction carried out in this study means that any existing EHR reuse methodology can incorporate these results.
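
Illustrative sketch (not from the article): the abstract describes a catalog of formally specified data operations and an ETL process instantiated from an XML configuration file, but this record does not reproduce the article's operation definitions or XML schema. The Python example below is therefore only a minimal sketch of that general idea under stated assumptions. The element names (etl, step, arg), the operation names (select, project), the argument-count check standing in for cardinality, and the sample rows are all hypothetical, and Python is used here for brevity even though the article reports SQL and R.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Operation:
    """Hypothetical formal description of one data operation."""
    name: str
    min_args: int  # lower bound on the number of arguments (cardinality)
    max_args: int  # upper bound on the number of arguments
    run: Callable[[List[Row], List[str]], List[Row]]


def select_rows(rows: List[Row], args: List[str]) -> List[Row]:
    """Keep rows whose field args[0] equals the value args[1]."""
    field, value = args
    return [r for r in rows if str(r.get(field)) == value]


def project_fields(rows: List[Row], args: List[str]) -> List[Row]:
    """Keep only the listed fields in each row."""
    return [{k: r[k] for k in args if k in r} for r in rows]


# Hypothetical catalog: operation name -> formal description + implementation.
CATALOG = {
    "select": Operation("select", 2, 2, select_rows),
    "project": Operation("project", 1, 50, project_fields),
}

# Hypothetical configuration file; the article's real XML schema is not shown.
CONFIG_XML = """
<etl>
  <step operation="select"><arg>diagnosis</arg><arg>COVID-19</arg></step>
  <step operation="project"><arg>patient_id</arg><arg>age</arg></step>
</etl>
"""


def instantiate(config_xml: str):
    """Parse the configuration and validate every step against the catalog."""
    plan = []
    for step in ET.fromstring(config_xml).findall("step"):
        op = CATALOG[step.get("operation")]
        args = [a.text or "" for a in step.findall("arg")]
        if not op.min_args <= len(args) <= op.max_args:
            raise ValueError(
                f"{op.name}: expected {op.min_args}-{op.max_args} "
                f"arguments, got {len(args)}"
            )
        plan.append((op, args))
    return plan


def run_plan(plan, rows: List[Row]) -> List[Row]:
    """Apply the validated steps in order, feeding each output forward."""
    for op, args in plan:
        rows = op.run(rows, args)
    return rows


# Made-up sample data, for illustration only.
ROWS = [
    {"patient_id": 1, "age": 70, "diagnosis": "COVID-19"},
    {"patient_id": 2, "age": 54, "diagnosis": "influenza"},
]
RESULT = run_plan(instantiate(CONFIG_XML), ROWS)
print(RESULT)  # [{'patient_id': 1, 'age': 70}]
```

Because every step is declared in the configuration and checked against the catalog before anything runs, the sequence of operations applied to the data can be read and audited without inspecting the implementation, which is the kind of transparency the abstract argues for.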
Author Muñoz-Carrero, Adolfo
Bernal-Sobrino, José Luis
García-Barrio, Noelia
Pedrera-Jiménez, Miguel
Serrano-Balazote, Pablo
Cruz-Bermúdez, Juan Luis
Rubio-Mayo, Paula
Tato-Gómez, Alberto
Author_xml – sequence: 1
  givenname: Miguel
  surname: Pedrera-Jiménez
  fullname: Pedrera-Jiménez, Miguel
  organization: ETSI Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain
– sequence: 2
  givenname: Noelia
  surname: García-Barrio
  fullname: García-Barrio, Noelia
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
– sequence: 3
  givenname: Paula
  surname: Rubio-Mayo
  fullname: Rubio-Mayo, Paula
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
– sequence: 4
  givenname: Alberto
  surname: Tato-Gómez
  fullname: Tato-Gómez, Alberto
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
– sequence: 5
  givenname: Juan Luis
  surname: Cruz-Bermúdez
  fullname: Cruz-Bermúdez, Juan Luis
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
– sequence: 6
  givenname: José Luis
  surname: Bernal-Sobrino
  fullname: Bernal-Sobrino, José Luis
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
– sequence: 7
  givenname: Adolfo
  surname: Muñoz-Carrero
  fullname: Muñoz-Carrero, Adolfo
  organization: Digital Health Research Unit, Instituto de Salud Carlos III, Madrid, Spain
– sequence: 8
  givenname: Pablo
  surname: Serrano-Balazote
  fullname: Serrano-Balazote, Pablo
  organization: Data Science Unit, Instituto de Investigación Sanitaria Hospital Universitario 12 de Octubre, Madrid, Spain
BackLink https://www.ncbi.nlm.nih.gov/pubmed/36220109 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_2196_48702
crossref_primary_10_2196_44547
ContentType Journal Article
Copyright The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonDerivative-NonCommercial License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/).
DOI 10.1055/s-0042-1757763
DatabaseName Medline
MEDLINE
MEDLINE (Ovid)
PubMed
DatabaseTitle MEDLINE
Medline Complete
MEDLINE with Full Text
PubMed
MEDLINE (Ovid)
DatabaseTitleList MEDLINE
Database_xml – sequence: 1
  dbid: ECM
  name: MEDLINE
  url: https://search.ebscohost.com/login.aspx?direct=true&db=cmedm&site=ehost-live
  sourceTypes: Index Database
DeliveryMethod no_fulltext_linktorsrc
Discipline Medicine
EISSN 2511-705X
ExternalDocumentID 36220109
Genre Research Support, Non-U.S. Gov't
Journal Article
GrantInformation_xml – fundername: Instituto de Salud Carlos III
  grantid: PI18/01047
– fundername: Instituto de Salud Carlos III
  grantid: PI18CIII/00019
– fundername: Ministerio de Economía y Competitividad
  grantid: PI18/00981
ID FETCH-LOGICAL-c428t-adad032800025e922ee085311f82f3e1307690111fb7daa4e5b2b2e39aca0bdb2
IngestDate Sat Sep 28 08:20:48 EDT 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue S 02
Language English
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c428t-adad032800025e922ee085311f82f3e1307690111fb7daa4e5b2b2e39aca0bdb2
OpenAccessLink https://pubmed.ncbi.nlm.nih.gov/PMC9788916
PMID 36220109
ParticipantIDs pubmed_primary_36220109
PublicationCentury 2000
PublicationDate 2022-12-01
PublicationDateYYYYMMDD 2022-12-01
PublicationDate_xml – month: 12
  year: 2022
  text: 2022-12-01
  day: 01
PublicationDecade 2020
PublicationPlace Germany
PublicationPlace_xml – name: Germany
PublicationTitle Methods of information in medicine
PublicationTitleAlternate Methods Inf Med
PublicationYear 2022
SSID ssj0008804
Score 2.3836813
SourceID pubmed
SourceType Index Database
StartPage e89
SubjectTerms COVID-19 - epidemiology
Electronic Health Records
Humans
Pandemics
Title TransformEHRs: a flexible methodology for building transparent ETL processes for EHR reuse
URI https://www.ncbi.nlm.nih.gov/pubmed/36220109
Volume 61