Multiple changepoint detection in categorical data streams
File(s)
Author(s)
Plasse, Joshua
Adams, Niall
Type
Journal Article
Abstract
The need for efficient tools is pressing in the era of big data, particularly in streaming data applications. As data streams are ubiquitous, the ability to accurately detect multiple changepoints, without affecting the continuous flow of data, is an important issue. Change detection for categorical data streams is understudied, and existing work commonly introduces fixed control parameters while providing little insight into how they may be chosen. This is ill-suited to the streaming paradigm, motivating the need for an approach that introduces few parameters which may be set without requiring any prior knowledge of the stream. This paper introduces such a method, which can accurately detect changepoints in categorical data streams with fixed storage and computational requirements. The detector relies on the ability to adaptively monitor the category probabilities of a multinomial distribution, where temporal adaptivity is introduced using forgetting factors. A novel adaptive threshold is also developed which can be computed given a desired false positive rate. This method is then compared to sequential and nonsequential change detectors in a large simulation study which verifies the usefulness of our approach. A real data set consisting of nearly 40 million events from a computer network is also investigated.
Date Issued
2019-09-11
Date Acceptance
2019-02-02
Citation
Statistics and Computing, 2019, 29 (5), pp.1109-1125
ISSN
0960-3174
Publisher
Springer (part of Springer Nature)
Start Page
1109
End Page
1125
Journal / Book Title
Statistics and Computing
Volume
29
Issue
5
Copyright Statement
© 2019 The Author(s). Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
License URL
Identifier
https://link.springer.com/article/10.1007%2Fs11222-019-09858-0
Subjects
Science & Technology
Technology
Physical Sciences
Computer Science, Theory & Methods
Statistics & Probability
Computer Science
Mathematics
Adaptive threshold
Categorical data stream
Changepoint detection
Forgetting factor
CHANGE-POINT
CHARTS
0104 Statistics
0802 Computation Theory and Mathematics
Statistics & Probability
Publication Status
Published
Date Publish Online
2019-02-15