Privately vertically mining of sequential patterns based on differential privacy with high efficiency and utility

Bibliographic Details
Published in: Scientific Reports, Vol. 13, no. 1, p. 17866
Main Authors: Liang, Wenjuan; Zhang, Wenke; Liang, Songtao; Yuan, Caihong
Format: Journal Article
Language:English
Published: London Nature Publishing Group UK 19-10-2023
Nature Publishing Group
Nature Portfolio
Description
Summary: Sequential pattern mining is one of the fundamental tools for many important data analysis tasks, such as web browsing behavior analysis. Based on frequent patterns, decision-makers can obtain both economic gains and social value. Sequential data, however, frequently contain sensitive information, and analyzing these data directly raises privacy concerns for users. Differential privacy (DP), the most popular privacy model, has been employed to address this concern. Most existing DP solutions combine horizontal sequential pattern mining algorithms with differential privacy. Because horizontal algorithms are inefficient, these DP solutions cannot ensure high efficiency and accuracy while offering a strong privacy guarantee. Therefore, we propose privVertical, a new private sequential pattern mining scheme that combines a vertical mining algorithm with differential privacy to achieve this objective. Unlike DP solutions based on horizontal algorithms, privVertical improves efficiency by avoiding costly database scans and costly projected database constructions. Moreover, to improve accuracy, a differentially private hash MapList (called privHashMap) is designed to record frequent concurrency items and their noisy support based on the Sparse Vector Technique. privHashMap is used to prune the many infrequent candidate sequences in advance during private mining, and the Sparse Vector Technique is used to improve the accuracy of privHashMap. After these invalid candidate sequences are pruned, less noise is required to achieve the same level of privacy, which increases the accuracy of private mining. Theoretical privacy analysis proves that privVertical satisfies ε-differential privacy. Experiments show that privVertical achieves higher accuracy and efficiency at the same privacy level.
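Illustration: the summary names the Sparse Vector Technique without describing it. The following is a minimal Python sketch of the generic, textbook form of that technique applied to support counts. It is not the privHashMap construction from the article. The function names, the even split of the privacy budget between threshold and queries, and the default sensitivity of 1 (each user contributing at most one to any count) are all assumptions made for this example.

```python
import random


def laplace(rng, scale):
    # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1); rescale it.
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def sparse_vector(counts, threshold, epsilon, max_hits, sensitivity=1.0, seed=None):
    """Return up to max_hits keys whose noisy count reaches a noisy threshold.

    Hypothetical helper: half of epsilon perturbs the threshold once, the
    other half is shared by the (at most max_hits) positive answers.
    """
    rng = random.Random(seed)
    # The threshold is perturbed once and reused for every comparison.
    noisy_threshold = threshold + laplace(rng, 2.0 * sensitivity / epsilon)
    # Each count gets fresh noise, scaled up by the number of allowed hits.
    query_scale = 4.0 * max_hits * sensitivity / epsilon
    hits = []
    for key, count in counts.items():
        if count + laplace(rng, query_scale) >= noisy_threshold:
            hits.append(key)
            if len(hits) >= max_hits:
                break  # budget for positive answers is used up
    return hits


if __name__ == "__main__":
    # Toy co-occurrence counts (made-up numbers, for illustration only).
    toy_counts = {("a", "b"): 40, ("a", "c"): 3, ("b", "c"): 35}
    print(sparse_vector(toy_counts, threshold=20, epsilon=1.0, max_hits=2, seed=0))
```

Only the keys that pass the test are returned here. Releasing the noisy supports themselves, as the summary describes for privHashMap, would consume additional privacy budget that this sketch does not account for.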
ISSN:2045-2322
DOI:10.1038/s41598-023-43030-z