Streaming changepoint detection for transition matrices
Sequentially detecting multiple changepoints in a data stream is a challenging task. Difficulties relate to both computational and statistical aspects, and in the latter, specifying control parameters is a particular problem. Choosing control parameters typically relies on unrealistic assumptions, s...
Saved in:
Published in: | Data mining and knowledge discovery Vol. 35; no. 4; pp. 1287 - 1316 |
---|---|
Main Authors: | , , |
Format: | Journal Article |
Language: | English |
Published: |
New York
Springer US
01-07-2021
Springer Nature B.V |
Subjects: | |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Sequentially detecting multiple changepoints in a data stream is a challenging task. Difficulties relate to both computational and statistical aspects, and in the latter, specifying control parameters is a particular problem. Choosing control parameters typically relies on unrealistic assumptions, such as the distributions generating the data, and their parameters, being known. This is implausible in the streaming paradigm, where several changepoints will exist. Further, current literature is mostly concerned with streams of continuous-valued observations, and focuses on detecting a single changepoint. There is a dearth of literature dedicated to detecting multiple changepoints in
transition matrices
, which arise from a sequence of discrete states. This paper makes the following contributions: a complete framework is developed for
adaptively
and
sequentially
estimating a Markov transition matrix in the streaming data setting. A change detection method is then developed, using a novel moment matching technique, which can effectively monitor for multiple changepoints in a transition matrix. This adaptive detection and estimation procedure for transition matrices, referred to as
ADEPT-M
, is compared to several change detectors on synthetic data streams, and is implemented on two real-world data streams – one consisting of over nine million HTTP web requests, and the other being a well-studied electricity market data set. |
---|---|
ISSN: | 1384-5810 1573-756X |
DOI: | 10.1007/s10618-021-00747-7 |