DATS, the data tag suite to enable discoverability of datasets

Bibliographic Details
Published in:	Scientific data Vol. 4; no. 1; p. 170059
Main Authors:	Sansone, Susanna-Assunta, Gonzalez-Beltran, Alejandra, Rocca-Serra, Philippe, Alter, George, Grethe, Jeffrey S., Xu, Hua, Fore, Ian M., Lyle, Jared, Gururaj, Anupama E., Chen, Xiaoling, Kim, Hyeon-eui, Zong, Nansu, Li, Yueling, Liu, Ruiling, Ozyurt, I. Burak, Ohno-Machado, Lucila
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK, 06-06-2017
Subjects:	631/114; 631/114/1314; Humanities and Social Sciences; multidisciplinary; Science

Description
Summary:	Today’s science increasingly requires effective ways to find and access existing datasets that are distributed across a range of repositories. For researchers in the life sciences, discoverability of datasets may soon become as essential as identifying the latest publications via PubMed. Through an international collaborative effort funded by the Big Data to Knowledge (BD2K) initiative of the National Institutes of Health (NIH), we have designed and implemented the DAta Tag Suite (DATS) model to support the DataMed data discovery index. DataMed’s goal is to be for data what PubMed has been for the scientific literature. Akin to the Journal Article Tag Suite (JATS) used in PubMed, the DATS model enables submission of metadata on datasets to DataMed. DATS has a core set of elements, which are generic and applicable to any type of dataset, and an extended set that can accommodate more specialized data types. DATS is a platform-independent model also available as an annotated serialization in schema.org, which in turn is widely used by major search engines like Google, Microsoft, Yahoo and Yandex.
Bibliography:	These authors contributed equally to this work. S.-A.S. led the Descriptive Metadata WG3 and contributed to the model and its specification and documentation. A.G.-B. and P.R.-S. co-led the model development, specification and documentation; A.G.-B. developed the DATS validation code and P.R.-S. focused on competency questions. G.A. led the Accessibility Metadata WG7 and chaired the use cases workshop, and with J.L. contributed to the model and its specification and documentation. J.G. and H.X. led the implementation of the model, and their feedback, along with that from H.K., R.L., Y.L., B.O., X.C., I.F. and A.E.G., contributed to its refinement and releases. S.-A.S., A.G.-B., P.R.-S., J.G. and I.F. worked with the Schema.org and BioSchemas collaborators. L.O.-M. led the bioCADDIE consortium and ensured that DATS remained central to the other activities of the NIH Commons ecosystem. She wrote the discussion portion of this manuscript and provided critical edits to other portions of the text. All authors gave iterative feedback on the model and the community engagement, as well as on the manuscript. S.-A.S., A.G.-B. and P.R.-S. drafted the first version of the manuscript, with contributions from L.O.-M., G.A. and J.G. All co-authors have contributed to its final version.
ISSN:	2052-4463
DOI:	10.1038/sdata.2017.59