Greedy Gaussian segmentation of multivariate time series

Bibliographic Details
Published in:	Advances in Data Analysis and Classification, Vol. 13, no. 3, pp. 727-751
Main Authors:	Hallac, David; Nystrup, Peter; Boyd, Stephen
Format:	Journal Article
Language:	English
Published:	Berlin/Heidelberg: Springer Berlin Heidelberg, 01-09-2019; Springer Nature B.V.
Subjects:	Chemistry and Earth Sciences; Combinatorial analysis; Computer Science; Covariance; Data Mining and Knowledge Discovery; Dynamic programming; Economics; Finance; Gaussian distribution; Health Sciences; Heuristic methods; Humanities; Insurance; Law; Management; Mathematics and Statistics; Medicine; Normal distribution; Physics; Regular Article; Segmentation; Statistical Theory and Methods; Statistics; Statistics for Business; Statistics for Engineering; Statistics for Life Sciences; Statistics for Social Sciences; Time series; Change-point detection; Greedy algorithms; Covariance regularization; Time series analysis; Financial regimes; 37M10: Time series analysis; Text segmentation

Description
Summary:	We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.
ISSN:	1862-5347, 1862-5355
DOI:	10.1007/s11634-018-0335-0
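
The summary above describes a greedy search over breakpoints that stops at a local optimum where no single breakpoint move improves the objective. The record does not state the objective or the algorithm's details, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' GGS implementation. It assumes that each segment of n points with empirical covariance S is scored by -(n/2) log det(S + (lam/n) I), which is the covariance-regularized Gaussian log-likelihood maximized over mean and covariance up to constants. It also assumes that the two hyperparameters are a regularization weight `lam` and a breakpoint count `num_bps`. All names (`SegmentScorer`, `best_split`, `greedy_segmentation`, `max_sweeps`) are hypothetical. The routine adds one breakpoint at a time where the gain is largest, then moves each breakpoint to its best position between its neighbours until no single move helps.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def log_det_spd(a: List[List[float]]) -> float:
    """log det of a symmetric positive definite matrix via Cholesky factorization."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(s)
                total += math.log(low[i][i])
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * total


class SegmentScorer:
    """ASSUMED segment score: -(n/2) * log det(S + (lam/n) I), where S is the
    empirical covariance of the n points (regularized Gaussian log-likelihood
    maximized over mean and covariance, up to additive constants)."""

    def __init__(self, data: List[Vector], lam: float) -> None:
        self.d, self.lam = len(data[0]), lam
        # Prefix sums of x and x x^T, so any segment's covariance costs O(d^2).
        self.s1 = [[0.0] * self.d]
        self.s2 = [[[0.0] * self.d for _ in range(self.d)]]
        for x in data:
            p1, p2 = self.s1[-1], self.s2[-1]
            self.s1.append([p1[i] + x[i] for i in range(self.d)])
            self.s2.append([[p2[i][j] + x[i] * x[j] for j in range(self.d)]
                            for i in range(self.d)])

    def score(self, start: int, end: int) -> float:
        """Score of data[start:end] (end exclusive, end > start)."""
        n, d = end - start, self.d
        mu = [(self.s1[end][i] - self.s1[start][i]) / n for i in range(d)]
        cov = [[(self.s2[end][i][j] - self.s2[start][i][j]) / n - mu[i] * mu[j]
                + (self.lam / n if i == j else 0.0) for j in range(d)]
               for i in range(d)]
        return -0.5 * n * log_det_spd(cov)


def best_split(sc: SegmentScorer, a: int, b: int) -> Tuple[Optional[int], float]:
    """Best single new breakpoint c with a < c < b, and its gain in score."""
    base, best_c, best_gain = sc.score(a, b), None, -math.inf
    for c in range(a + 1, b):
        gain = sc.score(a, c) + sc.score(c, b) - base
        if gain > best_gain:
            best_c, best_gain = c, gain
    return best_c, best_gain


def greedy_segmentation(data: List[Vector], num_bps: int, lam: float,
                        max_sweeps: int = 100) -> List[int]:
    t, sc = len(data), SegmentScorer(data, lam)
    bps: List[int] = []
    for _ in range(num_bps):
        edges = [0] + bps + [t]
        # Add the breakpoint with the largest gain over all current segments.
        c, _ = max((best_split(sc, a, b) for a, b in zip(edges, edges[1:])),
                   key=lambda cg: cg[1])
        if c is None:  # every segment has length 1; nothing left to split
            break
        bps = sorted(bps + [c])
        # Adjust: move each breakpoint to its best spot between its neighbours
        # until no single move strictly improves the objective.
        for _ in range(max_sweeps):
            changed = False
            for i in range(len(bps)):
                lo = bps[i - 1] if i > 0 else 0
                hi = bps[i + 1] if i + 1 < len(bps) else t
                new_c, _ = best_split(sc, lo, hi)
                cur = sc.score(lo, bps[i]) + sc.score(bps[i], hi)
                if sc.score(lo, new_c) + sc.score(new_c, hi) > cur + 1e-9:
                    bps[i], changed = new_c, True
            if not changed:
                break
    return bps


if __name__ == "__main__":
    random.seed(0)
    # Synthetic 2-D series: three well-separated Gaussian regimes of 50 points each.
    centers = [(0.0, 0.0), (20.0, 20.0), (-20.0, -20.0)]
    data = [[cx + random.gauss(0, 1), cy + random.gauss(0, 1)]
            for cx, cy in centers for _ in range(50)]
    print(greedy_segmentation(data, num_bps=2, lam=0.1))  # [50, 100]
```

With the prefix sums, scoring a candidate segment costs O(d^3) regardless of its length. Each add or adjust pass therefore touches each time index a constant number of times, in the spirit of the linear-time behaviour the summary describes, though this sketch does not claim to reproduce the authors' exact procedure. A dynamic program over all breakpoint placements, whose cost the summary notes grows with the square of the series length, would be a natural exact cross-check on small inputs.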