How do we share data in COVID-19 research? A systematic review of COVID-19 datasets in PubMed Central Articles
Published in: | Briefings in bioinformatics Vol. 22; no. 2; pp. 800 - 811 |
---|---|
Format: | Journal Article |
Language: | English |
Published: | England, Oxford University Press, 22-03-2021; Oxford Publishing Limited (England) |
Summary: | Abstract
Objective
This study reviews novel coronavirus disease (COVID-19) datasets extracted from PubMed Central articles and provides a quantitative analysis of dataset contents, accessibility and citations.
Methods
We downloaded COVID-19-related full-text articles published up to 31 May 2020 from PubMed Central. Dataset URL links mentioned in the full-text articles were extracted, and each dataset was manually reviewed to provide information on 10 variables: (1) type of the dataset, (2) geographic region where the data were collected, (3) whether the dataset was immediately downloadable, (4) format of the dataset files, (5) where the dataset was hosted, (6) whether the dataset was updated regularly, (7) the type of license used, (8) whether the metadata were explicitly provided, (9) whether there was a PubMed Central paper describing the dataset and (10) the number of times the dataset was cited by PubMed Central articles. Descriptive statistics for these variables were reported for all extracted datasets.
Results
We found that 28.5% of 12 324 COVID-19 full-text articles in PubMed Central provided at least one dataset link. In total, these articles mentioned 128 unique dataset links. Further analysis showed that epidemiological datasets accounted for the largest share (53.9%) of the dataset collection, and most datasets (84.4%) were available for immediate download. GitHub was the most popular repository for hosting COVID-19 datasets, and CSV, XLSX and JSON were the most popular data formats. Citation patterns varied from one dataset to another.
Conclusion
PubMed Central articles are an important source of COVID-19 datasets, but there is significant heterogeneity in the way these datasets are mentioned, shared, updated and cited. |
---|---|
ISSN: | 1467-5463 1477-4054 |
DOI: | 10.1093/bib/bbaa331 |
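
The link-extraction step described under Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the regular expression, the function names and the toy articles (including their example URLs) are assumptions made for illustration. It only finds URLs and counts how many articles mention each one; deciding which links actually point to datasets was a manual review step in the study.

```python
import re
from collections import Counter

# Assumed pattern: http(s) URLs up to whitespace or a common closing delimiter.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text):
    """Return the URLs found in text, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def summarize(articles):
    """Summarize link usage across articles.

    articles: dict mapping an article identifier to its full text.
    Returns (share of articles with at least one link,
             Counter mapping each link to the number of articles mentioning it).
    """
    link_counts = Counter()
    articles_with_link = 0
    for text in articles.values():
        links = set(extract_links(text))  # count each link once per article
        if links:
            articles_with_link += 1
        link_counts.update(links)
    share = articles_with_link / len(articles) if articles else 0.0
    return share, link_counts


# Toy input with hypothetical identifiers and URLs.
articles = {
    "A1": "Data from https://github.com/example/covid-data. "
          "See also https://example.org/cases.csv",
    "A2": "We reused https://github.com/example/covid-data (accessed May 2020).",
    "A3": "No dataset was used.",
}

share, link_counts = summarize(articles)
print(f"{share:.1%} of articles mention at least one link")
for link, n_articles in link_counts.most_common():
    print(n_articles, link)
```

Counting each link once per article keeps the per-link totals comparable to a "number of citing articles" figure rather than a raw mention count.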