ESAMP: event-sourced architecture for materials provenance management and application to accelerated materials discovery

Bibliographic Details
Published in: Digital Discovery, Vol. 2, no. 4, pp. 178-188
Main Authors: Statt, Michael J, Rohr, Brian A, Brown, Kris, Guevarra, Dan, Hummelshøj, Jens, Hung, Linda, Anapolsky, Abraham, Gregoire, John M, Suram, Santosh K
Format: Journal Article
Language: English
Published: United Kingdom: Royal Society of Chemistry (RSC), 08-08-2023
Description
Summary: While the vision of accelerating materials discovery using data-driven methods is well-founded, practical realization has been throttled by challenges in data generation, ingestion, and materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing the challenge of data generation, and capitalizing on these emerging data resources requires ingesting the data into an architecture that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence of, and interrelationships among, events occurring in a simulation or experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments performed using various modalities, such as serial, parallel, and multi-modal experimentation. Our data architecture tracks the evolution of a material's state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, by using state-equivalency rules and parameters associated with state-changing processes in addition to the typically used composition data, we demonstrated a marked reduction of uncertainty in the prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of the ESAMP architecture in enabling several aspects of accelerated materials discovery, such as dynamic workflow design, generation of knowledge graphs, and efficient integration of simulation and experiment. We present ESAMP, a generalizable database architecture that captures the complete provenance associated with a material, and demonstrate this architecture and provenance-based machine learning on one of the largest experimental materials databases.
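As a rough illustration of the event-sourcing idea summarized above (a material's state rebuilt by replaying an ordered log of events, and a state-equivalency rule used to decide which samples can be pooled), the following Python sketch may help. Everything in it is hypothetical: the `Event` record, the `replay_state` and `equivalent` functions, the event kinds, and the example compositions are illustrative assumptions, not the ESAMP schema, which the ESI describes as a relational database implementation.

```python
from dataclasses import dataclass


# Hypothetical immutable event: something that happened to one sample.
@dataclass(frozen=True)
class Event:
    sample_id: str
    kind: str                    # e.g. "synthesis", "anneal", "xrf" (illustrative only)
    params: tuple = ()           # (name, value) pairs; a tuple keeps the event hashable
    changes_state: bool = False  # True for processes that alter the material's state


def replay_state(events, sample_id):
    """Rebuild one sample's state by replaying the event log in order."""
    state = {"composition": None, "history": []}
    for ev in events:
        if ev.sample_id != sample_id:
            continue
        p = dict(ev.params)
        if ev.kind == "synthesis":
            state["composition"] = p.get("composition")
        if ev.changes_state:
            # Only state-changing processes enter the history used for equivalency;
            # characterization events are recorded in the log but do not alter state.
            state["history"].append((ev.kind, ev.params))
    return state


def equivalent(state_a, state_b):
    """Hypothetical rule: same composition and same sequence of state-changing processes."""
    return (state_a["composition"] == state_b["composition"]
            and state_a["history"] == state_b["history"])


# Append-only event log for three hypothetical samples.
log = []
log.append(Event("A", "synthesis", (("composition", "Ni0.5Fe0.5Ox"),), True))
log.append(Event("A", "anneal", (("T_C", 400),), True))
log.append(Event("A", "xrf"))  # characterization only
log.append(Event("B", "synthesis", (("composition", "Ni0.5Fe0.5Ox"),), True))
log.append(Event("B", "anneal", (("T_C", 400),), True))
log.append(Event("C", "synthesis", (("composition", "Ni0.5Fe0.5Ox"),), True))
log.append(Event("C", "anneal", (("T_C", 500),), True))

print(equivalent(replay_state(log, "A"), replay_state(log, "B")))  # True
print(equivalent(replay_state(log, "A"), replay_state(log, "C")))  # False
```

In this sketch, samples A and B share a composition and an identical state-changing history, so a composition-plus-process rule treats them as the same material state, while C's different anneal temperature keeps it separate even though its composition matches. A composition-only rule would have merged all three.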
Bibliography: Electronic supplementary information (ESI) available: Detailed schema discussion for relational database implementation of ESAMP. See DOI: https://doi.org/10.1039/d3dd00054k
Funding: USDOE, SC0004993; DESC0020383
ISSN: 2635-098X
DOI: 10.1039/d3dd00054k