Large-Scale High-Utility Sequential Pattern Analytics in Internet of Things


Bibliographic Details
Published in: IEEE Internet of Things Journal, Vol. 8, no. 16, pp. 12669-12678
Main Authors: Srivastava, Gautam; Lin, Jerry Chun-Wei; Zhang, Xuyun; Li, Yuanfa
Format: Journal Article
Language: English
Published: Piscataway: IEEE, 15 August 2021
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Description
Summary: The concepts of sequential pattern mining have become a growing topic in data mining, finding a home most recently in the Internet of Things (IoT), where large volumes of data arrive every second for analysis and knowledge extraction. One key topic within sequential pattern mining is high-utility sequential pattern mining (HUSPM). HUSPM combines utility and sequence factors to discover sequential patterns of high utility from databases and data sources. However, almost all existing work focuses on improving mining performance on a single machine. In this work, we present a four-stage MapReduce framework for HUSPM built solely on the well-known Spark platform. The framework is shown to deliver faster and more efficient mining on large data sets. It consists of four phases (initialization, mining, updating, and generation) that handle big data sets using the MapReduce model running on Spark. Experiments indicated that the designed model can handle very large data sets, while state-of-the-art algorithms achieve good performance only on small data sets.
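To make the notion of "sequential patterns of high utility" concrete, the following is a minimal sketch assuming the common HUSPM model from the wider literature: each sequence is a list of itemsets with purchase quantities, each item has a fixed external utility (unit profit), a pattern's utility in one sequence is the maximum over all of its occurrences, and its total utility is the sum over all sequences. It is not the authors' four-phase Spark framework; the `map`/`reduce` calls are plain Python stand-ins for the per-sequence/aggregate split a MapReduce job would perform, and all names (`pattern_utility_in_sequence`, `total_utility`, `high_utility_patterns`, and the toy `PROFIT` and `DATABASE`) are hypothetical.

```python
from functools import reduce
from typing import Dict, FrozenSet, List, Optional

QItemset = Dict[str, int]        # item -> purchased quantity (internal utility)
QSequence = List[QItemset]       # ordered itemsets of one sequence
Pattern = List[FrozenSet[str]]   # candidate sequential pattern of itemsets


def element_utility(element: FrozenSet[str], itemset: QItemset,
                    profit: Dict[str, int]) -> Optional[int]:
    """Utility of one pattern element in one itemset, or None if not contained."""
    if not element.issubset(itemset.keys()):
        return None
    return sum(itemset[item] * profit[item] for item in element)


def pattern_utility_in_sequence(pattern: Pattern, seq: QSequence,
                                profit: Dict[str, int]) -> float:
    """Maximum utility over all occurrences of `pattern` in `seq` (0 if absent)."""
    neg = float("-inf")
    # best[j]: max utility of the prefix matched so far, with its last element at j
    best = [neg] * len(seq)
    for i, element in enumerate(pattern):
        new_best = [neg] * len(seq)
        # Max utility of the already-matched prefix ending strictly before j;
        # the first element has an empty prefix worth 0.
        running = 0 if i == 0 else neg
        for j, itemset in enumerate(seq):
            u = element_utility(element, itemset, profit)
            if u is not None and running != neg:
                new_best[j] = running + u
            if i > 0:
                running = max(running, best[j])
        best = new_best
    result = max(best, default=neg)
    return 0 if result == neg else result


def total_utility(pattern: Pattern, database: List[QSequence],
                  profit: Dict[str, int]) -> float:
    """Map: utility in each sequence; reduce: sum across the database."""
    per_sequence = map(lambda s: pattern_utility_in_sequence(pattern, s, profit),
                       database)
    return reduce(lambda a, b: a + b, per_sequence, 0)


def high_utility_patterns(candidates: List[Pattern], database: List[QSequence],
                          profit: Dict[str, int], min_util: float) -> List[Pattern]:
    """Keep candidates whose total utility reaches the minimum utility threshold."""
    return [p for p in candidates
            if total_utility(p, database, profit) >= min_util]


# Hypothetical toy data: unit profits and a three-sequence q-sequence database.
PROFIT = {"a": 2, "b": 5, "c": 1}
DATABASE = [
    [{"a": 1, "b": 2}, {"c": 3}, {"a": 2, "c": 1}],
    [{"b": 1}, {"a": 3}],
    [{"c": 2}],
]

if __name__ == "__main__":
    ac = [frozenset({"a"}), frozenset({"c"})]
    ba = [frozenset({"b"}), frozenset({"a"})]
    print(total_utility(ac, DATABASE, PROFIT))  # 5
    print(total_utility(ba, DATABASE, PROFIT))  # 25
    print(high_utility_patterns([ac, ba], DATABASE, PROFIT, min_util=10))
```

Taking the maximum over occurrences, rather than summing them, is the usual convention in HUSPM. Because this utility is not anti-monotone (a longer pattern can have higher utility than its prefix), practical miners rely on utility upper bounds for pruning, which this sketch omits.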
ISSN: 2327-4662
DOI: 10.1109/JIOT.2020.3026826