TreeRank: a similarity measure for nearest neighbor searching in phylogenetic databases

Bibliographic Details
Published in: 15th International Conference on Scientific and Statistical Database Management, 2003, pp. 171-180
Main Authors: Wang, J.T.L.; Huiyuan Shan; Shasha, D.; Piel, W.H.
Format: Conference Proceeding
Language: English
Published: IEEE 2003
Description
Summary: Phylogenetic trees are unordered labeled trees in which each leaf node has a label and the order among siblings is unimportant. In this paper we propose a new similarity measure, called TreeRank, for phylogenetic trees and present an algorithm for computing TreeRank scores. Given a query or pattern tree P and a data tree D, the TreeRank score from P to D is a measure of the topological relationships in P that are found to be the same or similar in D. The proposed algorithm calculates the TreeRank score in O(M^2 + N) time, where M is the number of nodes appearing in both P and D, and N is the number of nodes in D. We then develop a search engine that, given a query or pattern tree P and a database of trees D, finds and ranks the nearest neighbors of P in D, where the "nearness" is measured by the proposed similarity function. This structure-based search engine is fully operational and is available on the World Wide Web.
ISBN: 0769519644, 9780769519647
ISSN: 1099-3371
DOI: 10.1109/SSDM.2003.1214978