The Annals of Statistics

Sequential change-point detection based on nearest neighbors

Hao Chen

Abstract

We propose a new framework for the detection of change-points in online, sequential data analysis. The approach utilizes nearest neighbor information and can be applied to sequences of multivariate observations or non-Euclidean data objects, such as network data. Different stopping rules are explored, and one specific rule is recommended due to its desirable properties. An accurate analytic approximation of the average run length is derived for the recommended rule, making it an easy off-the-shelf approach for real multivariate/object sequential data monitoring applications. Simulations reveal that the new approach performs better than likelihood-based approaches for high-dimensional data. The new approach is illustrated on a real dataset by detecting global structural changes in social networks.
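As a rough illustration of the idea in the abstract, the following Python sketch monitors a stream using nearest-neighbor information only. It is not the paper's statistic or its recommended stopping rule: the function names, the window length, the margin, and the threshold are all hypothetical, and the statistic is merely centered by its expectation under random relabeling rather than standardized. In particular, the threshold is an arbitrary illustrative value; the average-run-length approximation derived in the paper is not reproduced here. Because the code consumes only a pairwise distance matrix, the same sketch applies to non-Euclidean objects for which a dissimilarity is available.

```python
import numpy as np

def knn_indices(dist, k):
    """Indices of the k nearest neighbors of each point (self excluded)."""
    d = np.array(dist, dtype=float)      # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

def centered_same_side(nn, t):
    """Fraction of (point, neighbor) pairs on the same side of split t,
    minus its expectation when the before/after labels are assigned at random.
    Points 0..t-1 count as 'before', points t..n-1 as 'after'."""
    n = nn.shape[0]
    side = np.arange(n) >= t
    observed = np.mean(side[:, None] == side[nn])
    expected = (t * (t - 1) + (n - t) * (n - t - 1)) / (n * (n - 1))
    return observed - expected

def scan_window(dist, k=3, margin=5):
    """Best split inside one window and its centered statistic."""
    nn = knn_indices(dist, k)
    n = dist.shape[0]
    stats = {t: centered_same_side(nn, t) for t in range(margin, n - margin + 1)}
    best = max(stats, key=stats.get)
    return best, stats[best]

def first_alarm(dist, window=40, k=3, margin=5, threshold=0.3):
    """Slide a window over the stream; stop the first time the scan
    statistic exceeds the (illustrative, uncalibrated) threshold.
    Returns (stopping time, estimated change location) or None."""
    total = dist.shape[0]
    for n in range(window, total + 1):
        recent = slice(n - window, n)
        split, stat = scan_window(dist[recent, recent], k, margin)
        if stat > threshold:
            return n, n - window + split
    return None

# Toy stream: 80 observations around 0, then 40 shifted observations (dimension 10).
rng = np.random.default_rng(0)
stream = np.vstack([rng.normal(0.0, 1.0, (80, 10)),
                    rng.normal(5.0, 1.0, (40, 10))])
dist = np.linalg.norm(stream[:, None, :] - stream[None, :, :], axis=2)
alarm = first_alarm(dist)
print(alarm)
```

Centering matters in this sketch: without it, the same-side fraction is largest for splits near the window edges even when nothing has changed, so the scan would be biased toward the boundaries.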

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1381-1407.

Dates
Received: February 2017
Revised: April 2018
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026842

Digital Object Identifier
doi:10.1214/18-AOS1718

Mathematical Reviews number (MathSciNet)
MR3911116

Zentralblatt MATH identifier
07053512

Subjects
Primary: 62G32: Statistics of extreme values; tail inference
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords
Change-point; sequential detection; graph-based tests; nonparametrics; scan statistic; tail probability; high-dimensional data; network data; non-Euclidean data

Citation

Chen, Hao. Sequential change-point detection based on nearest neighbors. Ann. Statist. 47 (2019), no. 3, 1381–1407. doi:10.1214/18-AOS1718. https://projecteuclid.org/euclid.aos/1550026842


Supplemental materials

  • Proofs for theorems. This supplement contains proofs for Theorem 4.2 and Theorem 4.4.