Perturbation-Theory and Machine Learning (PTML) Model for High-Throughput Screening of Parham Reactions: Experimental and Theoretical Studies
Published in: | Journal of chemical information and modeling Vol. 58; no. 7; pp. 1384 - 1396 |
---|---|
Main Authors: | , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | United States: American Chemical Society, 23-07-2018 |
Subjects: | |
Summary: | Machine learning (ML) algorithms are gaining importance in the processing of chemical information and the modeling of chemical reactivity problems. In this work, we have developed a perturbation-theory and machine learning (PTML) model that combines perturbation theory (PT) and ML algorithms to predict the yield of a given reaction. For this purpose, we have selected the Parham cyclization, which is a general and powerful tool for the synthesis of heterocyclic and carbocyclic compounds. This reaction has both structural variables (substitution pattern on the substrate, internal electrophile, ring size, etc.) and operational variables (organolithium reagent, solvent, temperature, time, etc.), so predicting the effect on the yield of changes in substrate design (internal electrophile, halide, etc.) or in reaction conditions is an important task that could help to optimize the reaction design. The PTML model uses PT operators to account for perturbations in the experimental conditions and/or structural variables of all the molecules involved in a query reaction, compared to a reaction of reference. To this end, a dataset of >100 reactions has been collected for different substrates and internal electrophiles, under different reaction conditions, with a wide range of yields (0–98%). The best PTML model found using General Linear Regression (GLR) has R = 0.88 in the training series and R = 0.83 in the external validation series for 10 000 pairs of query and reference reactions. The PTML model has a final R = 0.95 for all reactions when multiple reactions of reference are used. We also report a comparative study of linear versus nonlinear PTML models based on artificial neural network (ANN) algorithms. The PTML-ANN models (LNN, MLP, RBF), with R ≈ 0.1–0.8, do not outperform the linear PTML model, which supports the choice of a linear model. Next, we carried out an experimental and theoretical study of previously unreported Parham reactions to illustrate the practical use of the PTML model. A 500 000-point simulation and a Hammett analysis of the reactivity space of Parham reactions are also reported. |
---|---|
ISSN: | 1549-9596, 1549-960X |
DOI: | 10.1021/acs.jcim.8b00286 |
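
The Summary describes a linear model fitted on pairs of query and reference reactions, with PT operators encoding how the query differs from the reference. The record does not give the operators or descriptors used, so the following is only a minimal sketch under stated assumptions: each pair is described by the yield of the reference reaction plus the difference in each descriptor between query and reference, and ordinary least squares stands in for the GLR step. The data are synthetic, and all names (`build_pairs`, `fit_linear`, `pearson_r`, the four descriptors) are hypothetical and not taken from the article.

```python
import numpy as np


def build_pairs(X, y, idx, n_pairs, rng):
    """Sample (query, reference) pairs from the reactions listed in idx.

    Assumed feature layout per pair (illustrative, not the article's):
    [1, yield of reference, descriptor(query) - descriptor(reference), ...]
    """
    q = rng.choice(idx, size=n_pairs)  # query reaction indices
    r = rng.choice(idx, size=n_pairs)  # reference reaction indices
    feats = np.column_stack([np.ones(n_pairs), y[r], X[q] - X[r]])
    return feats, y[q]


def fit_linear(feats, target):
    """Ordinary least squares, used here as a stand-in for GLR."""
    coef, *_ = np.linalg.lstsq(feats, target, rcond=None)
    return coef


def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed yields."""
    return float(np.corrcoef(pred, obs)[0, 1])


rng = np.random.default_rng(0)

# Synthetic stand-in for a reaction dataset: 120 reactions, 4 descriptors.
n_reactions, n_descriptors = 120, 4
X = rng.normal(size=(n_reactions, n_descriptors))
true_w = np.array([12.0, -8.0, 5.0, 3.0])  # arbitrary, for illustration only
noise = rng.normal(scale=5.0, size=n_reactions)
y = np.clip(50.0 + X @ true_w + noise, 0.0, 98.0)  # yields in percent

# Split the reactions first, then build pairs inside each subset, so that
# validation pairs never reuse a training reaction.
perm = rng.permutation(n_reactions)
train_idx, test_idx = perm[:80], perm[80:]

train_feats, train_y = build_pairs(X, y, train_idx, 2000, rng)
test_feats, test_y = build_pairs(X, y, test_idx, 500, rng)

coef = fit_linear(train_feats, train_y)
r_train = pearson_r(train_feats @ coef, train_y)
r_test = pearson_r(test_feats @ coef, test_y)

print(f"R (training pairs):   {r_train:.2f}")
print(f"R (validation pairs): {r_test:.2f}")
```

Pairing reactions turns a dataset of roughly a hundred reactions into thousands of training cases, which is consistent with the 10 000 pairs mentioned in the Summary. The R values printed here come from synthetic data and say nothing about the published model.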