The Annals of Statistics

Optimal sequential detection in multi-stream data

Hock Peng Chan

Abstract

Consider a large number of detectors, each generating a data stream. The task is to detect, online, distribution changes in a small fraction of the data streams. Previous approaches to this problem include the use of mixture likelihood ratios and the sum of CUSUMs. We provide here extensions and modifications of these approaches that are optimal in detecting normal mean shifts. We show how the (optimal) detection delay depends on the fraction of data streams undergoing distribution changes as the number of detectors goes to infinity. There are three detection domains. In the first domain, for moderately large fractions, immediate detection is possible. In the second domain, for smaller fractions, the detection delay grows logarithmically with the number of detectors, with an asymptotic constant extending those in sparse normal mixture detection. In the third domain, for even smaller fractions, the detection delay lies in the framework of the classical detection delay formula of Lorden. We show that the optimal detection delay is achieved by the sum of detectability score transformations of either the partial scores or CUSUM scores of the data streams.
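For orientation, the "sum of CUSUMs" baseline mentioned above can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes unit-variance normal observations with a known post-change mean `mu`, and it sums the raw CUSUM scores rather than applying the detectability score transformation that the abstract refers to. The function names, the `threshold` parameter, and the `simulate` helper are hypothetical and chosen only for this example.

```python
import random


def cusum_update(score, x, mu):
    """One CUSUM recursion step for a N(0,1) -> N(mu,1) mean shift.

    The increment mu*x - mu^2/2 is the log-likelihood ratio of one observation.
    """
    return max(0.0, score + mu * x - 0.5 * mu * mu)


def sum_of_cusums_stop(streams, mu, threshold):
    """Return the first time (1-indexed) at which the sum of the per-stream
    CUSUM scores reaches `threshold`, or None if it never does.

    `streams` is a list of rows; row t holds one observation per detector.
    """
    scores = [0.0] * len(streams[0])
    for t, obs in enumerate(streams, start=1):
        scores = [cusum_update(w, x, mu) for w, x in zip(scores, obs)]
        if sum(scores) >= threshold:
            return t
    return None


def simulate(num_streams, num_affected, change_time, horizon, mu, seed=0):
    """Hypothetical data generator: the first `num_affected` streams shift
    from mean 0 to mean `mu` after time `change_time`; the rest stay N(0,1)."""
    rng = random.Random(seed)
    data = []
    for t in range(1, horizon + 1):
        row = []
        for n in range(num_streams):
            shift = mu if (n < num_affected and t > change_time) else 0.0
            row.append(rng.gauss(shift, 1.0))
        data.append(row)
    return data
```

A typical use would be `sum_of_cusums_stop(simulate(100, 5, 50, 200, 1.0), 1.0, threshold)`, with `threshold` calibrated (for example by simulation under no change) to a target false-alarm rate; that calibration is left out of this sketch.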

Article information

Source
Ann. Statist., Volume 45, Number 6 (2017), 2736–2763.

Dates
Revised: July 2016
First available in Project Euclid: 15 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1513328589

Digital Object Identifier
doi:10.1214/17-AOS1546

Mathematical Reviews number (MathSciNet)
MR3737908

Zentralblatt MATH identifier
06838149

Subjects
Primary: 62G10: Hypothesis testing; 62L10: Sequential analysis

Citation

Chan, Hock Peng. Optimal sequential detection in multi-stream data. Ann. Statist. 45 (2017), no. 6, 2736–2763. doi:10.1214/17-AOS1546. https://projecteuclid.org/euclid.aos/1513328589

References

• [1] Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
• [2] Arias-Castro, E., Donoho, D. L. and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Statist. 34 326–349.
• [3] Chan, H. P. (2009). Detection of spatial clustering with average likelihood ratio test statistics. Ann. Statist. 37 3985–4010.
• [4] Chan, H. P. and Walther, G. (2015). Optimal detection of multi-sample aligned sparse signals. Ann. Statist. 43 1865–1895.
• [5] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
• [6] Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
• [7] Ingster, Y. I. (1998). Minimax detection of a signal for $l^{n}$-balls. Math. Methods Statist. 7 401–428.
• [8] Jeng, X. J., Cai, T. T. and Li, H. (2013). Simultaneous discovery of rare and common segment variants. Biometrika 100 157–172.
• [9] Lai, T. L. (1995). Sequential changepoint detection in quality control and dynamical systems. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 613–658.
• [10] Lorden, G. (1971). Procedures for reacting to a change in distribution. Ann. Math. Stat. 42 1897–1908.
• [11] Mei, Y. (2006). Sequential change-point detection when unknown parameters are present in the pre-change distribution. Ann. Statist. 34 92–122.
• [12] Mei, Y. (2010). Efficient scalable schemes for monitoring a large number of data streams. Biometrika 97 419–433.
• [13] Moustakides, G. V. (1986). Optimal stopping times for detecting changes in distributions. Ann. Statist. 14 1379–1387.
• [14] Pollak, M. (1985). Optimal detection of a change in distribution. Ann. Statist. 13 206–227.
• [15] Pollak, M. (1987). Average run lengths of an optimal method of detecting a change in distribution. Ann. Statist. 15 749–779.
• [16] Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.
• [17] Siegmund, D., Yakir, B. and Zhang, N. R. (2011). Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Stat. 5 645–668.
• [18] Tartakovsky, A. G. and Veeravalli, V. V. (2008). Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Anal. 27 441–475.
• [19] Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.