## Electronic Journal of Statistics

### A quasi-Bayesian perspective to online clustering

#### Abstract

When faced with high-frequency data streams, clustering raises theoretical and algorithmic challenges. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
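To illustrate the flavor of the approach (not the paper's PACBO algorithm itself), here is a minimal, hypothetical sketch: a discrete set of "experts", one per candidate number of clusters `k`, each running a naive online k-means, with a quasi-posterior over `k` maintained by exponential weighting of the instantaneous distortion. All names, parameters, and the initialization scheme below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def online_quasi_bayes_clustering(stream, ks=(1, 2, 3, 4), lam=1.0):
    """Toy exponentially-weighted aggregation over candidate numbers of
    clusters k (a sketch, not the PACBO algorithm). Each 'expert' runs a
    naive online k-means; log quasi-posterior weights are decreased by
    lam times the instantaneous distortion."""
    experts = {k: None for k in ks}   # centroids per expert
    counts = {k: None for k in ks}    # points assigned per centroid
    log_w = {k: 0.0 for k in ks}      # log quasi-posterior weights
    for x in stream:
        for k in ks:
            if experts[k] is None:    # lazy init near the first point
                experts[k] = np.tile(x, (k, 1)) \
                    + 0.01 * rng.standard_normal((k, len(x)))
                counts[k] = np.ones(k)
            c = experts[k]
            j = np.argmin(((c - x) ** 2).sum(axis=1))
            loss = ((c[j] - x) ** 2).sum()   # distortion at nearest centroid
            log_w[k] -= lam * loss           # exponential-weights update
            counts[k][j] += 1                # online k-means centroid step
            c[j] += (x - c[j]) / counts[k][j]
    # normalize the weights in log space for numerical stability
    m = max(log_w.values())
    w = {k: np.exp(v - m) for k, v in log_w.items()}
    z = sum(w.values())
    return {k: w[k] / z for k in ks}

# two well-separated Gaussian blobs, shuffled into a single stream
data = np.vstack([rng.normal(0, 0.1, (100, 2)),
                  rng.normal(5, 0.1, (100, 2))])
rng.shuffle(data)
post = online_quasi_bayes_clustering(data)  # quasi-posterior over k
```

The returned dictionary is a probability distribution over the candidate values of `k`; the paper's actual procedure instead explores a trans-dimensional parameter space with an RJMCMC-style sampler and comes with minimax regret guarantees, which this toy does not attempt to reproduce.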

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 3071-3113.

Dates
First available in Project Euclid: 20 September 2018

https://projecteuclid.org/euclid.ejs/1537430425

Digital Object Identifier
doi:10.1214/18-EJS1479

Mathematical Reviews number (MathSciNet)
MR3856169

Zentralblatt MATH identifier
06942966

#### Citation

Li, Le; Guedj, Benjamin; Loustau, Sébastien. A quasi-Bayesian perspective to online clustering. Electron. J. Statist. 12 (2018), no. 2, 3071--3113. doi:10.1214/18-EJS1479. https://projecteuclid.org/euclid.ejs/1537430425
