Large-Scale High-Utility Sequential Pattern Analytics in Internet of Things
Published in: | IEEE Internet of Things Journal, Vol. 8, no. 16, pp. 12669-12678 |
---|---|
Main Authors: | , , , |
Format: | Journal Article |
Language: | English |
Published: | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 15-08-2021 |
Summary: | Sequential pattern mining is a growing topic in data mining and has most recently found a home in the Internet of Things (IoT), where large volumes of data arrive every second for analysis and knowledge extraction. One key topic within sequential pattern mining is high-utility sequential pattern mining (HUSPM). HUSPM combines utility and sequence factors to identify sequential patterns of high utility in databases and other data sources. However, almost all existing literature focuses on improving mining performance on a single machine. In this work, we present a four-stage MapReduce framework for HUSPM that is based solely on the well-known Spark platform. The framework is shown to mine large data sets faster and more efficiently. It consists of four phases (initialization, mining, updating, and generation) that handle big data sets using MapReduce on Spark. Experiments indicated that the designed model can handle very large data sets, whereas state-of-the-art algorithms achieve good performance only on small data sets. |
---|---|
ISSN: | 2327-4662 |
DOI: | 10.1109/JIOT.2020.3026826 |
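
The summary describes HUSPM as combining utility and sequence factors and running as map and reduce steps. The sketch below is a toy illustration of that general idea in plain Python, not the paper's four-phase algorithm: the item profits, the sample sequences, and the function names (`map_sequence`, `reduce_utilities`, `high_utility_patterns`) are all hypothetical. It assumes the common definition in which a pattern's utility in one sequence is the best utility over all of its occurrences, and its utility in the database is the sum over sequences. On Spark, the two steps would correspond roughly to `flatMap` and `reduceByKey` on an RDD.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical unit profits (external utilities); not taken from the paper.
PROFIT = {"a": 2, "b": 5, "c": 1}

# Hypothetical quantitative sequences: ordered (item, quantity) events.
DATABASE = [
    [("a", 1), ("b", 2), ("a", 3), ("c", 1)],
    [("b", 1), ("a", 2), ("c", 4)],
    [("a", 2), ("c", 1), ("b", 1)],
]


def map_sequence(sequence, max_len=2):
    """Map step: emit (pattern, best utility of that pattern in this sequence)."""
    best = {}
    for length in range(1, max_len + 1):
        # Increasing index tuples enumerate every order-preserving occurrence.
        for positions in combinations(range(len(sequence)), length):
            pattern = tuple(sequence[i][0] for i in positions)
            utility = sum(PROFIT[sequence[i][0]] * sequence[i][1] for i in positions)
            best[pattern] = max(best.get(pattern, 0), utility)
    return list(best.items())


def reduce_utilities(pairs):
    """Reduce step: sum the per-sequence utilities of each pattern."""
    totals = defaultdict(int)
    for pattern, utility in pairs:
        totals[pattern] += utility
    return dict(totals)


def high_utility_patterns(database, min_utility, max_len=2):
    """Keep the patterns whose total utility reaches the threshold."""
    pairs = [pair for seq in database for pair in map_sequence(seq, max_len)]
    totals = reduce_utilities(pairs)
    return {p: u for p, u in totals.items() if u >= min_utility}
```

With these toy values, `high_utility_patterns(DATABASE, 20)` keeps five patterns, the highest being `("b", "a")` with a total utility of 25. The brute-force enumeration in `map_sequence` is exponential in the pattern length and is only meant to make the definitions concrete.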