Clustering of multi‐domain protein sequences

Bibliographic Details
Published in:	Proteins, structure, function, and bioinformatics Vol. 86; no. 7; pp. 759 - 776
Main Authors:	Mehrotra, Prachi; Ami, Vimla Kany G.; Srinivasan, Narayanaswamy
Format:	Journal Article
Language:	English
Published:	United States: Wiley Subscription Services, Inc, 01-07-2018
Subjects:	Alignment; alignment‐free method; Amino acid sequence; Classification; Classification schemes; Clustering; Datasets; Information processing; multi‐domain protein; Nucleotide sequence; protein classification; Protein-tyrosine-phosphatase; Proteins; sequence classification; Sequences; Structure-function relationships; Tyrosine

Description
Summary:	The overall function of a multi‐domain protein is determined by the functional and structural interplay of its constituent domains. Traditional sequence alignment‐based methods commonly use domain‐level information and provide classification only at the level of domains. Such methods cannot take into account the contributions of the other domains in a protein, or of the domain‐linker regions, when classifying multi‐domain proteins. An alignment‐free protein sequence comparison tool, CLAP (CLAssification of Proteins), was previously developed in our laboratory specifically to handle multi‐domain protein sequences without requiring domain boundaries or the sequential order of domains to be defined. Through this method we aim to achieve a biologically meaningful classification scheme for multi‐domain protein sequences. In this article, CLAP‐based classification has been explored on five datasets of multi‐domain proteins, and we present a detailed analysis for proteins containing (1) the tyrosine phosphatase domain and (2) the SH3 domain. At the domain level, the CLAP‐based classification scheme resulted in a clustering similar to that obtained from an alignment‐based method. CLAP‐based clusters obtained for the full‐length datasets were shown to comprise proteins with similar functions and domain architectures. Our study demonstrates that multi‐domain proteins can be classified effectively by considering full‐length sequences, without requiring identification of the domains in the sequence.
ISSN:	0887-3585 1097-0134
DOI:	10.1002/prot.25510
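
Illustrative sketch:	The summary describes comparing full-length sequences without alignment and without defining domain boundaries or domain order. The Python sketch below illustrates that general idea only. It is not CLAP's scoring scheme, which this record does not describe: a k-mer Jaccard distance and single-linkage thresholding are swapped in as stand-ins. The function names, the values of k and threshold, and the toy sequences are all hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a protein sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a: str, b: str, k: int = 3) -> float:
    """Alignment-free distance: 1 - (shared k-mers / all k-mers).

    Assumption: a stand-in measure, not the CLAP similarity score.
    """
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    union = ka | kb
    if not union:
        return 0.0
    return 1.0 - len(ka & kb) / len(union)


def cluster(seqs: dict[str, str], threshold: float = 0.5, k: int = 3) -> list[set[str]]:
    """Single-linkage clustering via union-find.

    Two sequences are linked when their distance is <= threshold;
    clusters are the connected components of those links.
    """
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if jaccard_distance(seqs[a], seqs[b], k) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical toy "domains" joined by a short linker.
    dom_a, dom_b, linker = "MKTAYIAKQRQISFVK", "HFSRQLEERLGLIEVQ", "GGSGG"
    toy = {
        "p1": dom_a + linker + dom_b,   # domain order A-B
        "p2": dom_b + linker + dom_a,   # same domains, order B-A
        "p3": "PPWDCNNPPWDCNNPPWDCNN",  # unrelated sequence
    }
    for group in cluster(toy):
        print(sorted(group))
```

Because the distance is computed on sets of k-mers, the two toy proteins that carry the same domains in swapped order stay close to each other, while the unrelated sequence falls into its own cluster. No domain boundaries are supplied at any point.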