A data science roadmap for open science organizations engaged in early-stage drug discovery

Bibliographic Details
Published in:	Nature communications Vol. 15; no. 1; pp. 5640 - 10
Main Authors:	Edfeldt, Kristina; Edwards, Aled M.; Engkvist, Ola; Günther, Judith; Hartley, Matthew; Hulcoop, David G.; Leach, Andrew R.; Marsden, Brian D.; Menge, Amelie; Misquitta, Leonie; Müller, Susanne; Owen, Dafydd R.; Schütt, Kristof T.; Skelton, Nicholas; Steffen, Andreas; Tropsha, Alexander; Vernet, Erik; Wang, Yanli; Wellnitz, James; Willson, Timothy M.; Clevert, Djork-Arné; Haibe-Kains, Benjamin; Schiavone, Lovisa Holmberg; Schapira, Matthieu
Format:	Journal Article
Language:	English
Published:	London: Nature Publishing Group UK; Nature Publishing Group; Nature Portfolio, 05-07-2024
Subjects:	639/638/309/2144; 706/648/697; Artificial Intelligence; Automation; Cloud Computing; Data integration; Data management; Data mining; Data Mining - methods; Data processing; Data science; Data Science - methods; Databases, Factual; Design of experiments; Drug development; Drug discovery; Drug Discovery - methods; Experimental design; Humanities and Social Sciences; Humans; Information Dissemination - methods; Learning algorithms; Machine Learning; Modelling; multidisciplinary; Open access; Perspective; R&D; Real time; Research & development; Robustness; Science; Science (multidisciplinary); Time integration

Description
Summary:	The Structural Genomics Consortium is an international open science research organization with a focus on accelerating early-stage drug discovery, namely hit discovery and optimization. We, like many others, believe that artificial intelligence (AI) is poised to be a main accelerator in the field. The question is then how to best benefit from recent advances in AI and how to generate, format and disseminate data to enable future breakthroughs in AI-guided drug discovery. We present here the recommendations of a working group composed of experts from both the public and private sectors. Robust data management requires precise ontologies and standardized vocabulary, while a centralized database architecture across laboratories facilitates data integration into high-value datasets. Lab automation and opening electronic lab notebooks to data mining push the boundaries of data sharing and data modeling. Important considerations for building robust machine-learning models include transparent and reproducible data processing, choosing the most relevant data representation, defining the right training and test sets, and estimating prediction uncertainty. Beyond data-sharing, cloud-based computing can be harnessed to build and disseminate machine-learning models. Important vectors of acceleration for hit and chemical probe discovery will be (1) the real-time integration of experimental data generation and modeling workflows within design-make-test-analyze (DMTA) cycles, openly and at scale, and (2) the adoption of a mindset where data scientists and experimentalists work as a unified team, and where data science is incorporated into the experimental design. Artificial intelligence is greatly accelerating research in drug discovery, but its development is still hindered by the lack of available data. Here the authors present data management and data science recommendations to help reach AI’s potential in the field.
ISSN:	2041-1723
DOI:	10.1038/s41467-024-49777-x