Crochemore’s Partitioning on Weighted Strings and Applications

Bibliographic Details
Published in: Algorithmica, Vol. 80, no. 2, pp. 496-514
Main Authors: Barton, Carl; Pissis, Solon P.
Format: Journal Article
Language: English
Published: New York Springer US 01-02-2018
Springer Nature B.V
Description
Summary: Given a string on alphabet Σ, the partitioning problem is to compute classes of equivalences on the set of positions of the input string. These classes implicitly memorise identical factors of the string and, hence, their efficient computation is essential for a wide range of string processing applications. We study this problem for a weighted string: for every position of the weighted string and every letter of the alphabet, a probability of occurrence of this letter at this position is given. Thus, a weighted string may represent many different strings, each with probability of occurrence equal to the product of probabilities of its letters at subsequent positions. In this article, we present a non-trivial generalisation of Crochemore’s partitioning algorithm (IPL, 1981) that works on weighted strings requiring time O(υn log υn), where n is the length of the string, υ = min{z², zn, σn}, σ is the size of Σ, and 1/z is a cumulative weight threshold, defined as the minimal probability of occurrence of factors in the string. Our contributions can be summarised as follows: (a) we design the first algorithm to solve the partitioning problem on weighted strings for arbitrary z and σ in time O(υn log υn) and space O(υn), improving the state of the art for z = O(1); (b) we improve the state of the art for numerous other string processing problems; and (c) we show further combinatorial insight into the relation between weighted and indeterminate strings, that is, sequences of alphabet subsets without associated occurrence probabilities.
ISSN: 0178-4617, 1432-0541
DOI: 10.1007/s00453-016-0266-0
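
The summary defines the occurrence probability of a string in a weighted string as the product of its letters' probabilities at consecutive positions, and uses 1/z as the minimal probability a factor must reach. The following is a minimal sketch of those two definitions only, not of the article's partitioning algorithm. The list-of-dicts representation, the sample data, and the names `occurrence_probability` and `is_valid` are illustrative assumptions that do not come from the article.

```python
from math import prod

# Hypothetical weighted string: one dict per position, mapping each
# letter to its probability of occurrence at that position.
# The values are made up for illustration.
weighted = [
    {"a": 1.0},
    {"a": 0.5, "b": 0.5},
    {"b": 1.0},
    {"a": 0.25, "c": 0.75},
]


def occurrence_probability(ws, start, factor):
    """Product of the per-position probabilities of `factor` placed at `start`."""
    if start < 0 or start + len(factor) > len(ws):
        return 0.0  # the factor does not fit inside the weighted string
    # A letter missing from a position's dict has probability 0 there.
    return prod(ws[start + k].get(ch, 0.0) for k, ch in enumerate(factor))


def is_valid(ws, start, factor, z):
    """True if the factor occurs at `start` with probability at least 1/z."""
    return occurrence_probability(ws, start, factor) >= 1.0 / z
```

With this sample data, `occurrence_probability(weighted, 0, "abbc")` multiplies 1.0, 0.5, 1.0 and 0.75, so the factor meets the threshold for z = 4 but not for z = 2.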