Deep2Full: Evaluating strategies for selecting the minimal mutational experiments for optimal computational predictions of deep mutational scan outcomes

Bibliographic Details
Published in:	PLoS ONE Vol. 15; no. 1; p. e0227621
Main Authors:	Sruthi, C K; Prakash, Meher
Format:	Journal Article
Language:	English
Published:	United States: Public Library of Science (PLoS), 01-01-2020
Subjects:	Alanine; Amino Acid Sequence; Amino acids; Animals; Asparagine; Assaying; Biology and Life Sciences; Computational Biology - methods; Computer and Information Sciences; Computer applications; Computer Simulation; DNA Mutational Analysis - methods; Drug resistance; Earth Sciences; Error analysis; Evaluation; Experiments; Histidine; Humans; Kinases; Machine learning; Mathematical models; Methods; Models, Theoretical; Mutagenesis; Mutagenesis, Site-Directed - methods; Mutants; Mutation; Nehru, Jawaharlal (1889-1964); Neural networks; Physical Sciences; Point Mutation - genetics; Predictions; Proteins; Research and Analysis Methods; Social Sciences; Training; India

Description
Summary:	Performing a complete deep mutational scan with all single point mutations may not be practical, and may not even be required, especially if predictive computational models can be developed. Computational models, however, are naive to the cellular response under the myriad assay conditions. Within a realistic paradigm of assay-context-aware hybrid predictive models, which combine minimal experimental data from deep mutational scans with structure and sequence information and computational models, we define and evaluate different strategies for choosing this minimal set. We evaluated the trivial strategy of systematically reducing the number of mutational studies from 85% to 15%, along with several other strategies concerning the choice of mutation types, such as random versus site-directed, at the same 15% data completeness. Interestingly, the predictive capabilities obtained by training on a random set of mutations and on a systematic substitution of all amino acids to alanine, asparagine and histidine (ANH) were comparable. Another strategy we explored, augmenting the training data with measurements of the same mutants under multiple assay conditions, did not improve prediction quality. For the six proteins we analyzed, the bin-wise prediction error is optimal when 50-100 mutations per bin are used to train the computational model, suggesting that good prediction quality may be achieved with a library of 500-1000 mutations.
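To make the two training-set strategies in the summary concrete, the following is a minimal sketch, not the authors' code. It assumes the 20 canonical amino acids, so each site has 19 possible single substitutions. The ANH strategy keeps the substitutions to alanine, asparagine and histidine (3 of 19 per site, about 15.8%, close to the 15% completeness mentioned), and the random strategy draws the same number of mutants uniformly at random. The function names and the toy sequence are hypothetical.

```python
import random

# Assumption: the 20 canonical amino acids (one-letter codes)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ANH = set("ANH")  # alanine, asparagine, histidine


def all_single_mutants(sequence):
    """Enumerate every single point substitution as (position, wild type, mutant)."""
    return [(i, wt, mut)
            for i, wt in enumerate(sequence)
            for mut in AMINO_ACIDS if mut != wt]


def anh_subset(sequence):
    """Hypothetical site-directed strategy: mutate every position to A, N and H.

    Sites whose wild type is already A, N or H contribute only two mutants.
    """
    return [m for m in all_single_mutants(sequence) if m[2] in ANH]


def random_subset(sequence, size, seed=0):
    """Hypothetical random strategy: sample `size` distinct single mutants."""
    rng = random.Random(seed)
    return rng.sample(all_single_mutants(sequence), size)


if __name__ == "__main__":
    seq = "MKTAYIAKQR"  # hypothetical toy sequence, not from the study
    full = all_single_mutants(seq)
    anh = anh_subset(seq)
    rnd = random_subset(seq, len(anh))  # same size as the ANH set
    print(len(full), len(anh), round(len(anh) / len(full), 3))  # 190 28 0.147
```

Matching the random set's size to the ANH set keeps the comparison at equal data completeness. The measured fitness values for either subset would then serve as training data for whatever predictive model is used.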
Competing Interests:	The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0227621