Efficient algorithms for mining frequent high utility sequences with constraints

Bibliographic Details
Published in: Information Sciences, Vol. 568, pp. 239-264
Main Authors: Truong, Tin; Duong, Hai; Le, Bac; Fournier-Viger, Philippe; Yun, Unil; Fujita, Hamido
Format: Journal Article
Language: English
Published: Elsevier Inc., 01-08-2021
Description
Summary:
• A search space partitioning approach for frequent high utility sequence mining with constraints.
• The C-FHUSM algorithm to mine these patterns from generators and closed sequences.
• The FGenHUSM algorithm to mine all frequent generator high utility sequences.

An important data mining task is to discover all high utility sequences in a quantitative sequence database. Although such sequences are useful, the number discovered is often very large. To find patterns that are more tailored to a user’s needs, this paper studies the problem of mining frequent high utility sequences that satisfy item constraints. It proposes a novel algorithm named C-FHUSM to quickly obtain these sequences from two concise representations discovered from a quantitative sequence database, namely frequent generator high utility sequences and frequent closed high utility sequences. The first set is extracted using a novel algorithm named FGenHUSM, while an existing algorithm is applied to extract the second set. C-FHUSM integrates novel pruning techniques that discard sequences violating the item constraints early, by checking only a small number of representative sequences at the beginning of the mining process. Experimental results show that C-FHUSM can be more than ten times faster and has better scalability than a modified version of the state-of-the-art EHUSM algorithm for mining sequences with item constraints. Moreover, using C-FHUSM is found to be beneficial when a user frequently changes constraints, as results can be updated without rescanning the database.
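
To make the problem statement in the summary concrete, the following is a minimal brute-force sketch of what "frequent high utility sequences satisfying item constraints" can mean on a toy database. It is not C-FHUSM or FGenHUSM and uses none of their pruning or concise representations. Everything here is an assumption for illustration: the toy data, the names `PROFIT`, `DATABASE`, `support_and_utility`, `satisfies` and `mine`, the simplification to one item per event, the "must contain / must exclude" form of the item constraints, and the choice of taking the maximum utility over occurrences in each sequence (the paper may define utility differently).

```python
from itertools import product

# Hypothetical toy data: unit profit per item, and q-sequences made of
# (item, quantity) events. Real databases allow itemsets per event.
PROFIT = {"a": 2, "b": 5, "c": 1}
DATABASE = [
    [("a", 1), ("b", 2), ("c", 3)],
    [("a", 2), ("c", 1), ("b", 1)],
    [("b", 1), ("a", 3), ("c", 2)],
]


def occurrence_utilities(pattern, qseq):
    """Utilities of all occurrences of `pattern` as a subsequence of `qseq`."""
    results = []

    def extend(p_idx, start, total):
        if p_idx == len(pattern):
            results.append(total)
            return
        for pos in range(start, len(qseq)):
            item, qty = qseq[pos]
            if item == pattern[p_idx]:
                extend(p_idx + 1, pos + 1, total + qty * PROFIT[item])

    extend(0, 0, 0)
    return results


def support_and_utility(pattern, database):
    """Support = number of q-sequences containing the pattern.
    Utility = sum over those q-sequences of the best occurrence
    (assumption: maximum over occurrences)."""
    support, utility = 0, 0
    for qseq in database:
        utils = occurrence_utilities(pattern, qseq)
        if utils:
            support += 1
            utility += max(utils)
    return support, utility


def satisfies(pattern, must_contain=frozenset(), must_exclude=frozenset()):
    """Assumed form of item constraints: required and forbidden items."""
    items = set(pattern)
    return set(must_contain) <= items and not (set(must_exclude) & items)


def mine(database, min_sup, min_util, max_len=3, **constraints):
    """Enumerate every candidate up to max_len and keep those that pass
    the constraint, support and utility checks (exponential; toy use only)."""
    found = {}
    for length in range(1, max_len + 1):
        for pattern in product(sorted(PROFIT), repeat=length):
            if not satisfies(pattern, **constraints):
                continue
            sup, util = support_and_utility(pattern, database)
            if sup >= min_sup and util >= min_util:
                found[pattern] = (sup, util)
    return found
```

For example, `mine(DATABASE, min_sup=2, min_util=10, max_len=2, must_contain={"b"})` keeps only patterns that contain item `b`, occur in at least two sequences, and reach a utility of at least 10. The exhaustive enumeration is exactly the cost that the algorithms described in the summary are designed to avoid.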
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2021.01.060