Constrained sequence analysis algorithms in computational biology

Bibliographic Details
Published in: Information Sciences, Vol. 295, pp. 247-257
Main Authors: Farhana, Effat; Rahman, M. Sohel
Format: Journal Article
Language: English
Published: Elsevier Inc, 20-02-2015
Description
Summary: Knowledge of the similarity of two or more sequences is crucial in computational molecular biology. The longest common subsequence (LCS) is a well-known and widely used measure of sequence similarity. Constrained variants of the LCS problem have been studied in the literature, in which knowledge of the functionalities or structures of the input sequences is provided in the form of inclusion/exclusion constraint patterns. In this paper we focus on different variants of the LCS problem involving multiple input sequences and constraint patterns. Given L input sequences and ℓ constraint patterns, the goal is to construct an LCS of the given sequences such that each constraint pattern occurs, or does not occur, in the LCS as a subsequence or substring. We devise efficient finite-automata-based algorithms for all variants of the problem that run in O(|Σ|(R+L)+nL+|Σ|Rnℓ) time, where R is the size of the resulting subsequence automaton, n is the length of each input sequence and Σ is the underlying alphabet. We also conduct an extensive experimental study to evaluate the practical performance of our algorithms. The experimental results suggest the superiority of our finite-automata-based algorithms. We therefore believe that our automata-based algorithms will be useful in practical sequence analysis in computational biology and will replace the existing algorithms, which are mostly based on memory-intensive dynamic programming.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2014.10.019
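
Illustrative note: to make the problem in the summary concrete, the following is a minimal Python sketch of the simplest variant, assuming only two input sequences and a single inclusion constraint that must occur as a subsequence. It uses the classical three-dimensional dynamic-programming formulation, that is, the kind of memory-intensive baseline the summary contrasts with, and not the authors' finite-automata-based algorithm. The function name `constrained_lcs` and its parameters are hypothetical and do not come from the paper.

```python
from typing import List, Optional


def constrained_lcs(x: str, y: str, p: str) -> Optional[str]:
    """Return a longest common subsequence of x and y that contains p
    as a subsequence, or None if no such common subsequence exists.

    Hypothetical helper for illustration; a DP baseline, not the
    automata-based method described in the summary.
    """
    n, m, r = len(x), len(y), len(p)
    # best[i][j][k]: one longest common subsequence of x[:i] and y[:j]
    # that contains p[:k] as a subsequence (None if infeasible).
    best: List[List[List[Optional[str]]]] = [
        [[None] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)
    ]
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(r + 1):
                if i == 0 or j == 0:
                    # An empty prefix can only satisfy an empty constraint.
                    best[i][j][k] = "" if k == 0 else None
                    continue
                # Option 1: drop the last character of x or of y.
                candidates = [best[i - 1][j][k], best[i][j - 1][k]]
                if x[i - 1] == y[j - 1]:
                    c = x[i - 1]
                    # Option 2: match c without advancing the constraint.
                    prev = best[i - 1][j - 1][k]
                    if prev is not None:
                        candidates.append(prev + c)
                    # Option 3: match c and use it as p[k-1].
                    if k > 0 and p[k - 1] == c:
                        prev = best[i - 1][j - 1][k - 1]
                        if prev is not None:
                            candidates.append(prev + c)
                feasible = [s for s in candidates if s is not None]
                best[i][j][k] = max(feasible, key=len) if feasible else None
    return best[n][m][r]
```

The table has (n+1)(m+1)(r+1) entries for sequences of lengths n and m and a pattern of length r, and extending it to more input sequences adds a dimension per sequence, which is why this family of methods becomes memory-hungry. With an empty pattern the function reduces to ordinary LCS.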