ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability

Bibliographic Details
Published in: Journal of Computer-Aided Molecular Design, Vol. 28, no. 9, pp. 941-950
Main Authors: Liu, Zhihong, Zheng, Minghao, Yan, Xin, Gu, Qiong, Gasteiger, Johann, Tijhuis, Johan, Maas, Peter, Li, Jiabo, Xu, Jun
Format: Journal Article
Language: English
Published: Springer International Publishing, Cham (Springer Nature B.V.), 1 September 2014
Description
Summary: Predicting the chemical stability of compounds is important because unstable compounds can lead to false-positive or false-negative conclusions in bioassays. Experimental data (COMDECOM), measured from DMSO/H2O solutions stored at 50 °C for 105 days, were used to predict stability by applying rule-embedded naïve Bayesian learning based on atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning to the dataset, each ACF is assigned an expected stable probability (p_s) and an unstable probability (p_uns). A total of 13,340 ACFs, together with their p_s and p_uns values, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs are derived from its structure connection table using the same protocol used to derive ACFs from the training data. The Bayesian classifier then assigns p_s and p_uns values to the compound's ACFs using a structural pattern recognition algorithm implemented in-house, and compound instability is calculated from these values with Bayes’ theorem. The classifier achieved an AUC of 84% and a tenfold cross-validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, further reducing the possibility of false negatives. To our knowledge, this is the first in silico service for predicting the stability of organic compounds.
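The summary does not give the exact scoring formula, so the following is only a minimal sketch of how per-fragment p_s and p_uns values could be combined with Bayes’ theorem, with a rule layer on top. It assumes that p_s and p_uns act as class-conditional likelihoods P(ACF | stable) and P(ACF | unstable), that ACFs are conditionally independent given the class (the naïve Bayes assumption), and that ACFs have already been derived from the structure as string identifiers. The knowledge-base contents, the ACF identifiers, the `alert_acfs` rule set, the skipping of unknown fragments and the 0.5 threshold are all hypothetical illustrations, not ChemStable's actual data or interface.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical knowledge base: ACF identifier -> (p_s, p_uns).
# Assumption: p_s = P(ACF | stable), p_uns = P(ACF | unstable).
KnowledgeBase = Dict[str, Tuple[float, float]]


def posterior_unstable(acfs: Iterable[str], kb: KnowledgeBase,
                       prior_uns: float = 0.5) -> float:
    """Naive Bayes posterior P(unstable | ACFs) computed in log-odds space.

    ACFs that are not in the knowledge base are skipped, an illustrative
    choice rather than the published protocol.
    """
    # Start from the prior odds of instability.
    log_odds = math.log(prior_uns / (1.0 - prior_uns))
    for acf in acfs:
        if acf in kb:
            p_s, p_uns = kb[acf]
            # Each fragment contributes its log-likelihood ratio.
            log_odds += math.log(p_uns) - math.log(p_s)
    # Convert log-odds back to a probability (logistic function).
    return 1.0 / (1.0 + math.exp(-log_odds))


def predict(acfs: List[str], kb: KnowledgeBase, alert_acfs: Set[str],
            threshold: float = 0.5) -> str:
    """Rule-embedded prediction.

    A match against any (hypothetical) instability alert forces an
    'unstable' call. The rules can only turn 'stable' into 'unstable',
    which is how such a layer reduces false negatives.
    """
    if any(acf in alert_acfs for acf in acfs):
        return "unstable"
    p_uns = posterior_unstable(acfs, kb)
    return "unstable" if p_uns >= threshold else "stable"


if __name__ == "__main__":
    kb = {"ACF_A": (0.2, 0.6), "ACF_B": (0.8, 0.2)}
    print(posterior_unstable(["ACF_A"], kb))        # likelihood ratio 3 -> 0.75
    print(predict(["ACF_B"], kb, set()))            # low posterior -> stable
    print(predict(["ACF_B"], kb, {"ACF_B"}))        # rule override -> unstable
```

Working in log-odds keeps the product of many small per-fragment probabilities numerically stable, and placing the rule check before the Bayesian score lets an expanding rule set catch known instability patterns that the statistical model might under-score.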
ISSN: 0920-654X, 1573-4951
DOI: 10.1007/s10822-014-9778-3