Bernoulli


Sparse oracle inequalities for variable selection via regularized quantization

Clément Levrard

Abstract

We give oracle inequalities for procedures that combine quantization and variable selection via a weighted-Lasso $k$-means-type algorithm. The results are derived for a general family of weights, which can be tuned to modulate the influence of the variables in different ways. Moreover, these theoretical guarantees are proved to adapt to the sparsity of the optimal codebooks, suggesting that these procedures might be of particular interest in high-dimensional settings. Even when no sparsity assumption is made on the optimal codebooks, our procedure is proved to be close to a sparse approximation of them, as has been done for generalized linear models in regression. If the optimal codebooks have a sparse support, we also show that this support can be asymptotically recovered, and we provide an asymptotic consistency rate. These results are illustrated with Gaussian mixture models in arbitrary dimension under sparsity assumptions on the means, a standard family of distributions in model-based clustering.
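To fix ideas, a weighted-Lasso $k$-means-type procedure can be sketched as a Lloyd-type iteration in which the cluster-update step is replaced by coordinate-wise soft-thresholding of the cluster means, so that codebook coordinates with little signal are set exactly to zero. The sketch below is illustrative only, with hypothetical function names and a plain $\ell_1$ penalty $\lambda \sum_{k,j} w_j |c_{k,j}|$ on the codebook; it is not the paper's exact estimator or its tuning of the weights.

```python
import numpy as np

def soft_threshold(x, t):
    """Coordinate-wise soft-thresholding: sign(x) * max(|x| - t, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def regularized_kmeans(X, k, lam, weights=None, init=None, n_iter=50, seed=0):
    """Illustrative Lloyd-type iterations for an l1-penalized k-means objective:
    sum_i min_k ||X_i - c_k||^2 + lam * sum_{k,j} w_j |c_{k,j}|.
    (A sketch of the general idea, not the estimator studied in the paper.)"""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.ones(d) if weights is None else np.asarray(weights, dtype=float)
    centers = (X[rng.choice(n, size=k, replace=False)] if init is None
               else np.asarray(init, dtype=float)).copy()
    for _ in range(n_iter):
        # Assignment step: nearest codepoint in Euclidean distance.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: the penalized mean has a closed form, namely the
        # cluster mean soft-thresholded at lam * w_j / (2 * cluster size).
        for j in range(k):
            mask = labels == j
            if not mask.any():
                continue  # leave an empty cluster's center unchanged
            m = X[mask].mean(axis=0)
            centers[j] = soft_threshold(m, lam * w / (2.0 * mask.sum()))
    return centers, labels
```

With a sufficiently large penalty level, coordinates of the codebook corresponding to noise variables are driven exactly to zero, which is the variable-selection effect the oracle inequalities quantify.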

Article information

Source
Bernoulli, Volume 24, Number 1 (2018), 271-296.

Dates
Revised: May 2016
First available in Project Euclid: 27 July 2017

https://projecteuclid.org/euclid.bj/1501142443

Digital Object Identifier
doi:10.3150/16-BEJ876

Mathematical Reviews number (MathSciNet)
MR3706757

Zentralblatt MATH identifier
06778328

Citation

Levrard, Clément. Sparse oracle inequalities for variable selection via regularized quantization. Bernoulli 24 (2018), no. 1, 271--296. doi:10.3150/16-BEJ876. https://projecteuclid.org/euclid.bj/1501142443

References

• [1] Antoniadis, A., Brossat, X., Cugliari, J. and Poggi, J.-M. (2013). Clustering functional data using wavelets. Int. J. Wavelets Multiresolut. Inf. Process. 11 1350003, 30.
• [2] Bach, F.R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179–1225.
• [3] Biau, G., Devroye, L. and Lugosi, G. (2008). On the performance of clustering in Hilbert spaces. IEEE Trans. Inform. Theory 54 781–790.
• [4] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer Series in Statistics. Heidelberg: Springer.
• [5] Chang, X., Wang, Y., Li, R. and Xu, Z. (2014). Sparse $k$-means with $\ell _{\infty}/\ell _{0}$ penalty for high-dimensional data clustering. Available at arXiv:1403.7890.
• [6] De Soete, G. and Carroll, J.D. (1994). $k$-means clustering in a low-dimensional Euclidean space. In New Approaches in Classification and Data Analysis (E. Diday, Y. Lechevallier, M. Schader, P. Bertrand and B. Burtschy, eds.). Studies in Classification, Data Analysis, and Knowledge Organization. 212–219. Heidelberg: Springer.
• [7] Fischer, A. (2010). Quantization and clustering with Bregman divergences. J. Multivariate Anal. 101 2207–2221.
• [8] Gersho, A. and Gray, R.M. (1991). Vector Quantization and Signal Compression. Norwell, MA: Kluwer Academic.
• [9] Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Lecture Notes in Math. 1730. Berlin: Springer.
• [10] Graf, S., Luschgy, H. and Pagès, G. (2007). Optimal quantizers for Radon random vectors in a Banach space. J. Approx. Theory 144 27–53.
• [11] Jin, J. and Wang, W. (2014). Important feature PCA for high dimensional clustering. Available at arXiv:1407.5241.
• [12] Levrard, C. (2015). Nonasymptotic bounds for vector quantization in Hilbert spaces. Ann. Statist. 43 592–619.
• [13] Levrard, C. (2016). Supplement to “Sparse oracle inequalities for variable selection via regularized quantization.” DOI:10.3150/16-BEJ876SUPP.
• [14] Lloyd, S.P. (1982). Least squares quantization in PCM. IEEE Trans. Inform. Theory 28 129–137.
• [15] Massart, P. and Meynet, C. (2011). The Lasso as an $\ell_{1}$-ball model selection procedure. Electron. J. Stat. 5 669–687.
• [16] Maugis-Rabusseau, C. and Michel, B. (2013). Adaptive density estimation for clustering with Gaussian mixtures. ESAIM Probab. Stat. 17 698–724.
• [17] Meynet, C. (2013). An $\ell_{1}$-oracle inequality for the Lasso in finite mixture Gaussian regression models. ESAIM Probab. Stat. 17 650–671.
• [18] Pollard, D. (1981). Strong consistency of $k$-means clustering. Ann. Statist. 9 135–140.
• [19] Pollard, D. (1982). A central limit theorem for $k$-means clustering. Ann. Probab. 10 919–926.
• [20] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
• [21] Steinley, D. and Brusco, M.J. (2008). Selection of variables in cluster analysis: An empirical comparison of eight procedures. Psychometrika 73 125–144.
• [22] Sun, W., Wang, J. and Fang, Y. (2012). Regularized $k$-means clustering of high-dimensional data and its asymptotic consistency. Electron. J. Stat. 6 148–167.
• [23] Terada, Y. (2014). Strong consistency of reduced $k$-means clustering. Scand. J. Stat. 41 913–931.
• [24] Terada, Y. (2015). Strong consistency of factorial $k$-means clustering. Ann. Inst. Statist. Math. 67 335–357.
• [25] Timmerman, M.E., Ceulemans, E., Kiers, H.A.L. and Vichi, M. (2010). Factorial and reduced $k$-means reconsidered. Comput. Statist. Data Anal. 54 1858–1871.
• [26] van de Geer, S. (2013). Generic chaining and the $\ell_{1}$-penalty. J. Statist. Plann. Inference 143 1001–1012.
• [27] van de Geer, S.A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• [28] Vichi, M. and Kiers, H.A.L. (2001). Factorial $k$-means analysis for two-way data. Comput. Statist. Data Anal. 37 49–64.
• [29] Witten, D.M. and Tibshirani, R. (2010). A framework for feature selection in clustering. J. Amer. Statist. Assoc. 105 713–726.

Supplemental materials

• Appendix: Remaining proofs. Due to space constraints, we relegate technical details of the remaining proofs to the supplement [13].