TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases

Bibliographic Details
Published in:	15th International Conference on Scientific and Statistical Database Management, 2003, pp. 171-180
Main Authors:	Wang, J.T.L.; Shan, H.; Shasha, D.; Piel, W.H.
Format:	Conference Proceeding
Language:	English
Published:	IEEE 2003
Subjects:	Biology, Computer science, Data analysis, Educational institutions, Information retrieval, Nearest neighbor searches, Phylogeny, Search engines, Web sites

Description
Summary:	Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M^2 + N) time, where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D, where "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.
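The summary defines the TreeRank score only informally. The sketch below illustrates the general idea of a directional, topology-based score and a nearest-neighbor ranking, under loudly stated assumptions: trees are hypothetical child-to-parent dicts, only leaf labels are compared, and each ordered pair of shared leaves is described by its "up" and "down" edge counts to their lowest common ancestor. The per-pair similarity 1 / (1 + |Δup| + |Δdown|), the normalization by the number of ordered leaf pairs in P, and all function names are illustrative choices, not the paper's formula, and the naive LCA lookup makes no attempt at the time bound given in the summary.

```python
from itertools import permutations
from typing import Dict, List, Optional, Tuple

# Hypothetical representation (not from the paper): node -> parent, root -> None.
# Leaves carry the taxon labels; internal node names are arbitrary ids.
Tree = Dict[str, Optional[str]]


def path_to_root(tree: Tree, node: str) -> List[str]:
    """Nodes from `node` up to the root, starting with `node` itself."""
    path = [node]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


def up_down(tree: Tree, a: str, b: str) -> Tuple[int, int]:
    """Edges from a up to LCA(a, b), and edges from that LCA down to b."""
    path_a = path_to_root(tree, a)
    path_b = path_to_root(tree, b)
    ancestors_b = set(path_b)
    # The first node on a's root path that is also on b's root path is the LCA.
    lca = next(n for n in path_a if n in ancestors_b)
    return path_a.index(lca), path_b.index(lca)


def leaves(tree: Tree) -> List[str]:
    """Nodes that are nobody's parent."""
    parents = {p for p in tree.values() if p is not None}
    return [n for n in tree if n not in parents]


def tree_rank(p: Tree, d: Tree) -> float:
    """Hypothetical TreeRank-style score from pattern p to data tree d, in [0, 1].

    Pairs of P-leaves missing from d contribute 0, so the score is directional.
    """
    p_leaves = leaves(p)
    shared = set(p_leaves) & set(leaves(d))
    total_pairs = len(p_leaves) * (len(p_leaves) - 1)
    if total_pairs == 0:
        return 0.0
    score = 0.0
    for a, b in permutations(sorted(shared), 2):
        up_p, down_p = up_down(p, a, b)
        up_d, down_d = up_down(d, a, b)
        # Identical relationship scores 1.0; the credit decays as counts diverge.
        score += 1.0 / (1 + abs(up_p - up_d) + abs(down_p - down_d))
    return score / total_pairs


def nearest_neighbors(p: Tree, db: Dict[str, Tree], k: int) -> List[Tuple[str, float]]:
    """Rank database trees by the hypothetical score from p, best first."""
    scored = [(name, tree_rank(p, t)) for name, t in db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


P = {"r": None, "x": "r", "A": "x", "B": "x", "C": "r"}        # ((A,B),C)
D1 = {"root": None, "u": "root", "A": "u", "B": "u", "C": "root"}  # ((A,B),C)
D2 = {"r": None, "y": "r", "A": "y", "C": "y", "B": "r"}        # ((A,C),B)
D3 = {"r": None, "A": "r", "B": "r"}                            # (A,B)
print(nearest_neighbors(P, {"d1": D1, "d2": D2, "d3": D3}, 2))
```

With pattern ((A,B),C), an identically shaped data tree scores 1.0 and ((A,C),B) scores 4/9. Dropping C from the data tree gives 1/3 from P to D but 1.0 in the reverse direction, which is the kind of asymmetry implied by a score measured "from P to D".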
ISBN:	0769519644, 9780769519647
ISSN:	1099-3371
DOI:	10.1109/SSDM.2003.1214978