Electronic Journal of Statistics

Consistent change-point detection with kernels

Damien Garreau and Sylvain Arlot

Full-text: Open access


In this paper we study the kernel change-point algorithm (KCP) proposed by Arlot, Celisse and Harchaoui [5], which aims at locating an unknown number of change-points in the distribution of a sequence of independent data taking values in an arbitrary set. The change-points are selected by model selection with a penalized kernel empirical criterion. We provide a non-asymptotic result showing that, with high probability, the KCP procedure retrieves the correct number of change-points, provided that the constant in the penalty is well chosen; in addition, KCP estimates the change-point locations at the optimal rate. As a consequence, when using a characteristic kernel, KCP detects all kinds of changes in the distribution (not only changes in the mean or the variance), and it is able to do so for complex structured data (not necessarily in $\mathbb{R}^{d}$). Most of the analysis is conducted assuming that the kernel is bounded; part of the results can be extended when we only assume a finite second-order moment. We also demonstrate KCP on both synthetic and real data.
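To make the idea of a penalized kernel empirical criterion concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a kernel least-squares cost per segment, exact minimization by dynamic programming for each number of segments D, and a penalty of the form C·D/n. The penalty shape, the Gaussian kernel, and the names `gaussian_kernel`, `kcp_sketch`, `penalty_constant` and `max_segments` are all illustrative assumptions that the abstract does not specify.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Bounded kernel on the real line (assumed for illustration)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def kcp_sketch(data, kernel=gaussian_kernel, penalty_constant=1.0, max_segments=5):
    """Return estimated change-points as sorted 0-based indices.

    An index t in the result means data[t] starts a new segment.
    Assumption: the penalty is penalty_constant * D / n.
    """
    n = len(data)
    if n == 0:
        return []
    gram = [[kernel(a, b) for b in data] for a in data]

    # prefix[i][j] = sum of gram[a][b] over a < i and b < j.
    prefix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            prefix[i + 1][j + 1] = (gram[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    # diag[i] = sum of gram[a][a] over a < i.
    diag = [0.0] * (n + 1)
    for i in range(n):
        diag[i + 1] = diag[i] + gram[i][i]

    def cost(s, e):
        """Kernel least-squares cost of the segment data[s:e]."""
        block = prefix[e][e] - prefix[s][e] - prefix[e][s] + prefix[s][s]
        return (diag[e] - diag[s]) - block / (e - s)

    max_segments = min(max_segments, n)
    inf = float("inf")
    # best[d][e] = minimal total cost of cutting data[:e] into d segments.
    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    arg = [[0] * (n + 1) for _ in range(max_segments + 1)]
    for e in range(1, n + 1):
        best[1][e] = cost(0, e)
    for d in range(2, max_segments + 1):
        for e in range(d, n + 1):
            for s in range(d - 1, e):
                candidate = best[d - 1][s] + cost(s, e)
                if candidate < best[d][e]:
                    best[d][e] = candidate
                    arg[d][e] = s

    # Model selection: minimize empirical criterion plus penalty over D.
    chosen = min(range(1, max_segments + 1),
                 key=lambda d: (best[d][n] + penalty_constant * d) / n)

    # Backtrack through the dynamic program to recover the segment starts.
    change_points = []
    e = n
    for d in range(chosen, 1, -1):
        e = arg[d][e]
        change_points.append(e)
    return sorted(change_points)
```

For example, `kcp_sketch([0.0] * 20 + [5.0] * 20)` segments a sequence whose distribution shifts at index 20. In this sketch, a larger `penalty_constant` favors fewer segments, which mirrors the role of the penalty constant mentioned in the abstract.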

Article information

Electron. J. Statist., Volume 12, Number 2 (2018), 4440-4486.

Received: June 2017
First available in Project Euclid: 18 December 2018

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1545123630

Digital Object Identifier: doi:10.1214/18-EJS1513

Primary: 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
Secondary: 62G20: Asymptotic properties

Keywords: change-point detection, kernel methods, penalized least-squares

Creative Commons Attribution 4.0 International License.


Garreau, Damien; Arlot, Sylvain. Consistent change-point detection with kernels. Electron. J. Statist. 12 (2018), no. 2, 4440--4486. doi:10.1214/18-EJS1513. https://projecteuclid.org/euclid.ejs/1545123630



  • [1] Abou-Elailah, A., Gouet-Brunet, V. and Bloch, I. (2015). Detection of Abrupt Changes in Spatial Relationships in Video Sequences. In International Conference on Pattern Recognition Applications and Methods 89–106. Springer.
  • [2] Abramovich, F., Benjamini, Y., Donoho, D. L. and Johnstone, I. M. (2006). Adapting to unknown sparsity by controlling the false discovery rate. Annals of Statistics 34 584–653.
  • [3] Anguita, D., Ghio, A., Oneto, L., Parra, X. and Reyes-Ortiz, J. L. (2013). A public domain dataset for human activity recognition using smartphones. In ESANN.
  • [4] Arlot, S. and Celisse, A. (2011). Segmentation of the mean of heteroscedastic data via cross-validation. Statistics and Computing 21 613–632.
  • [5] Arlot, S., Celisse, A. and Harchaoui, Z. (2012). A kernel multiple change-point algorithm via model selection. Preprint. Available at https://arxiv.org/abs/1202.3878v2.
  • [6] Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society 337–404.
  • [7] Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica 47–78.
  • [8] Basseville, M. and Nikiforov, I. V. (1993). Detection of abrupt changes: theory and application. Prentice Hall, Englewood Cliffs.
  • [9] Baudry, J.-P., Maugis, C. and Michel, B. (2012). Slope heuristics: overview and implementation. Statistics and Computing 22 455–470.
  • [10] Bellman, R. (1961). On the approximation of curves by line segments using dynamic programming. Communications of the ACM 4 284.
  • [11] Birgé, L. and Massart, P. (2001). Gaussian model selection. Journal of the European Mathematical Society 3 203–268.
  • [12] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probability Theory and Related Fields 138 33–73.
  • [13] Boysen, L., Kempe, A., Liebscher, V., Munk, A. and Wittich, O. (2009). Consistencies and rates of convergence of jump-penalized least squares estimators. Annals of Statistics 37 157–183.
  • [14] Brodsky, B. E. and Darkhovsky, B. S. (2013). Nonparametric methods in change point problems 243. Springer Science & Business Media.
  • [15] Brunel, V.-E. (2014). Convex set detection. Preprint. Available at https://arxiv.org/abs/1404.6224.
  • [16] Carlstein, E. (1988). Nonparametric change-point estimation. Annals of Statistics 188–197.
  • [17] Celisse, A., Marot, G., Pierre-Jean, M. and Rigaill, G. (2016). New efficient algorithms for multiple change-point detection with kernels. Preprint. Available at https://hal.inria.fr/hal-01413230.
  • [18] Comte, F. and Rozenholc, Y. (2004). A new algorithm for fixed design regression and denoising. Annals of the Institute of Statistical Mathematics 56 449–473.
  • [19] Desobry, F., Davy, M. and Doncarli, C. (2005). An online kernel change detection algorithm. IEEE Transactions on Signal Processing 53 2961–2974.
  • [20] Diestel, J. and Uhl, J. J. (1977). Vector measures 15. American Mathematical Society.
  • [21] Dieuleveut, A. and Bach, F. (2016). Nonparametric stochastic approximation with large step-sizes. Annals of Statistics 44 1363–1399.
  • [22] Fisher, W. D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association 53 789–798.
  • [23] Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Annals of Statistics 42 2243–2281.
  • [24] Fukumizu, K., Gretton, A., Sun, X. and Schölkopf, B. (2008). Kernel Measures of Conditional Dependence. In Advances in Neural Information Processing Systems 20 489–496. Curran Associates, Inc.
  • [25] Garreau, D., Jitkrittum, W. and Kanagawa, M. (2018). Large sample analysis of the median heuristic. Preprint. Available at https://arxiv.org/abs/1707.07269.
  • [26] Gold, B., Morgan, N. and Ellis, D. (2011). Speech and audio signal processing: processing and perception of speech and music, second ed. John Wiley & Sons.
  • [27] Gretton, A., Borgwardt, K. M., Rasch, M., Schölkopf, B. and Smola, A. J. (2006). A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems 513–520.
  • [28] Gretton, A., Sejdinovic, D., Strathmann, H., Balakrishnan, S., Pontil, M., Fukumizu, K. and Sriperumbudur, B. K. (2012). Optimal kernel choice for large-scale two-sample tests. In Advances in Neural Information Processing Systems 1205–1213.
  • [29] Hájek, J. and Rényi, A. (1955). Generalization of an inequality of Kolmogorov. Acta Mathematica Hungarica 6 281–283.
  • [30] Harchaoui, Z. and Cappé, O. (2007). Retrospective multiple change-point estimation with kernels. In IEEE Workshop on Statistical Signal Processing 768–772.
  • [31] Harchaoui, Z., Moulines, E. and Bach, F. R. (2009). Kernel change-point analysis. In Advances in Neural Information Processing Systems 609–616.
  • [32] Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification 2 193–218.
  • [33] Kim, A. Y., Marzban, C., Percival, D. B. and Stuetzle, W. (2009). Using labeled data to evaluate change detectors in a multivariate streaming environment. Signal Processing 89 2529–2536.
  • [34] Kolmogorov, A. N. (1928). Über die Summen durch den Zufall bestimmten unabhängigen Größen. Mathematische Annalen 99 484–488.
  • [35] Korostelev, A. P. (1988). On minimax estimation of a discontinuous signal. Theory of Probability & Its Applications 32 727–730.
  • [36] Korostelev, A. P. and Tsybakov, A. B. (2012). Minimax theory of image reconstruction 82. Springer Science & Business Media.
  • [37] Lai, W. R., Johnson, M. D., Kucherlapati, R. and Park, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21 3763–3770.
  • [38] Lajugie, R., Arlot, S. and Bach, F. (2014). Large-margin metric learning for constrained partitioning problems. In Proceedings of The 31st International Conference on Machine Learning 297–305.
  • [39] Lavielle, M. (2005). Using penalized contrasts for the change-point problem. Signal Processing 85 1501–1510.
  • [40] Lavielle, M. and Moulines, E. (2000). Least-squares Estimation of an Unknown Number of Shifts in a Time Series. Journal of Time Series Analysis 21 33–59.
  • [41] Lavielle, M. and Teyssiere, G. (2006). Detection of multiple change-points in multivariate time series. Lithuanian Mathematical Journal 46 287–306.
  • [42] Lebarbier, É. (2005). Detecting multiple change-points in the mean of a Gaussian process by model selection. Signal Processing 85 717–736.
  • [43] Ledoux, M. and Talagrand, M. (2013). Probability in Banach Spaces: isoperimetry and processes 23. Springer Science & Business Media.
  • [44] Li, S., Xie, Y., Dai, H. and Song, L. (2015). $M$-Statistic for Kernel Change-Point Detection. Advances in Neural Information Processing Systems 3366–3374.
  • [45] Liu, J., Wu, S. and Zidek, J. V. (1997). On segmented multivariate regression. Statistica Sinica 7 497–525.
  • [46] Liu, S., Suzuki, T., Relator, R., Sese, J., Sugiyama, M. and Fukumizu, K. (2017). Support consistency of direct sparse-change learning in Markov networks. Annals of Statistics.
  • [47] Page, E. S. (1955). A test for a change in a parameter occurring at an unknown point. Biometrika 523–527.
  • [48] Ritov, Y., Raz, A. and Bergman, H. (2002). Detection of onset of neuronal activity by allowing for heterogeneity in the change points. Journal of Neuroscience Methods 122 25–42.
  • [49] Schölkopf, B. and Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. MIT Press.
  • [50] Shao, J. (1997). An asymptotic theory for linear model selection. Statistica Sinica 7 221–242.
  • [51] Sharipov, O., Tewes, J. and Wendler, M. (2016). Sequential block bootstrap in a Hilbert space with application to change point analysis. The Canadian Journal of Statistics. La Revue Canadienne de Statistique 44 300–322.
  • [52] Shixin, G. (1997). The Hájek-Rényi inequality for Banach space valued martingales and the $p$-smoothness of Banach spaces. Statistics & Probability Letters 32 245–248.
  • [53] Spokoiny, V. (2009). Multiscale local change point detection with applications to value-at-risk. Annals of Statistics 1405–1436.
  • [54] Sriperumbudur, B. K., Fukumizu, K., Gretton, A., Lanckriet, G. R. G. and Schölkopf, B. (2009). Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions. In Advances in Neural Information Processing Systems 21. NIPS Foundation.
  • [55] Tartakovsky, A., Nikiforov, I. V. and Basseville, M. (2014). Sequential Analysis: Hypothesis Testing and Changepoint Detection. Monographs on Statistics and Applied Probability 136. Chapman and Hall/CRC, Boca Raton, FL.
  • [56] Truong, C., Oudre, L. and Vayatis, N. (2018). A review of change point detection methods. Preprint. Available at https://arxiv.org/abs/1801.00718.
  • [57] Vogt, M. and Dette, H. (2015). Detecting gradual changes in locally stationary processes. Annals of Statistics 43 713–740.
  • [58] Wang, T. and Samworth, R. J. (2016). High-dimensional changepoint estimation via sparse projection. Preprint. Available at https://arxiv.org/abs/1606.06246.
  • [59] Winkler, G., Kempe, A., Liebscher, V. and Wittich, O. (2005). Parsimonious segmentation of time series by Potts models. In Innovations in Classification, Data Science, and Information Systems 295–302. Springer.
  • [60] Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion. Statistics & Probability Letters 6 181–189.
  • [61] Yao, Y.-C. and Au, S.-T. (1989). Least-squares estimation of a step function. Sankhyā: The Indian Journal of Statistics, Series A 370–381.
  • [62] Zou, C., Yin, G., Feng, L. and Wang, Z. (2014). Nonparametric maximum likelihood approach to multiple change-point problems. Annals of Statistics 42 970–1002.