ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability
Published in: | Journal of computer-aided molecular design Vol. 28; no. 9; pp. 941 - 950 |
---|---|
Main Authors: | , , , , , , , , |
Format: | Journal Article |
Language: | English |
Published: | Cham Springer International Publishing 01-09-2014 Springer Nature B.V |
Subjects: | |
Summary: | Predicting compound chemical stability is important because unstable compounds can lead to either false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the data set, each ACF is assigned an expected stable probability (p_s) and an unstable probability (p_uns). A total of 13,340 ACFs, together with their p_s and p_uns data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p_s and p_uns values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated with Bayes’ theorem from the p_s and p_uns values of the compound ACFs. We achieved an AUC value of 84 % and a tenfold cross-validation accuracy of 76.5 %. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the stabilities of organic compounds. |
---|---|
ISSN: | 0920-654X 1573-4951 |
DOI: | 10.1007/s10822-014-9778-3 |
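
The Summary describes combining per-fragment p_s and p_uns values with Bayes' theorem to score a compound. The following is a minimal sketch of that idea and not the ChemStable implementation. It assumes that p_s and p_uns can be read as the likelihood of seeing a fragment in stable and unstable compounds respectively, that fragments are treated as independent (the naïve Bayes assumption), and that fragments missing from the knowledge base are simply skipped. The function name, the dictionary layout, the prior_unstable parameter, and the floor value are all hypothetical.

```python
import math


def instability_probability(fragments, knowledge_base,
                            prior_unstable=0.5, floor=1e-6):
    """Return a naive Bayes posterior probability that a compound is unstable.

    fragments      -- iterable of ACF identifiers found in the compound
    knowledge_base -- dict mapping ACF identifier -> (p_s, p_uns)
    prior_unstable -- assumed prior probability of instability (hypothetical)
    floor          -- lower bound that keeps log() away from zero (hypothetical)
    """
    # Work in log space so that many small factors do not underflow.
    log_stable = math.log(1.0 - prior_unstable)
    log_unstable = math.log(prior_unstable)

    for fragment in fragments:
        if fragment not in knowledge_base:
            # Assumption: an unseen fragment contributes no evidence.
            continue
        p_s, p_uns = knowledge_base[fragment]
        log_stable += math.log(max(p_s, floor))
        log_unstable += math.log(max(p_uns, floor))

    # Normalise the two class scores into a posterior (Bayes' theorem).
    shift = max(log_stable, log_unstable)
    weight_stable = math.exp(log_stable - shift)
    weight_unstable = math.exp(log_unstable - shift)
    return weight_unstable / (weight_stable + weight_unstable)


if __name__ == "__main__":
    # Toy knowledge base with made-up fragment names and probabilities.
    toy_kb = {"frag_a": (0.8, 0.2), "frag_b": (0.3, 0.7)}
    print(instability_probability(["frag_a", "frag_b"], toy_kb))
```

A rule-based check of the kind the Summary mentions could wrap this function and override the score when a known instability pattern matches; that part is not sketched here because the record does not describe the rules.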