Open Access
Sparse oracle inequalities for variable selection via regularized quantization
Clément Levrard
Bernoulli 24(1): 271-296 (February 2018). DOI: 10.3150/16-BEJ876

Abstract

We give oracle inequalities for procedures that combine quantization and variable selection via a weighted Lasso $k$-means type algorithm. The results are derived for a general family of weights, which can be tuned to adjust the influence of the variables in different ways. Moreover, these theoretical guarantees are proved to adapt to the sparsity of the optimal codebooks, suggesting that these procedures might be of particular interest in high-dimensional settings. Even when no sparsity assumption is made on the optimal codebooks, our procedure is proved to be close to a sparse approximation of them, as has been done for Generalized Linear Models in regression. If the optimal codebooks have a sparse support, we also show that this support can be asymptotically recovered, and we provide a consistency rate. These results are illustrated with Gaussian mixture models in arbitrary dimension with sparsity assumptions on the means, which are standard distributions in model-based clustering.
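The abstract describes a weighted Lasso $k$-means type procedure. The sketch below is an illustrative implementation only, assuming a criterion of the form empirical distortion plus a weighted $\ell_1$ penalty on the codebook coordinates; the function names, the weighting scheme, and the Lloyd-type alternating minimization are assumptions for exposition and do not reproduce the paper's exact estimator or its choice of weights.

```python
import numpy as np

def soft_threshold(x, t):
    """Coordinatewise soft-thresholding, the proximal map of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def weighted_lasso_kmeans(X, k, lam, weights=None, n_iter=50, seed=0):
    """Lloyd-type iterations for a weighted-Lasso k-means criterion (illustrative):

        (1/n) * sum_i min_j ||X_i - c_j||^2  +  lam * sum_{j,d} w_d |c_{j,d}|

    The assignment step is the usual nearest-centroid rule; for fixed assignments,
    the penalized centroid update is a coordinatewise soft-thresholding of the
    cluster means, which is the exact minimizer of the criterion above.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.ones(p) if weights is None else np.asarray(weights, dtype=float)
    # initialize the codebook with k points drawn from the sample
    C = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assignment step: nearest centroid in Euclidean distance
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # update step: soft-thresholded cluster means
        for j in range(k):
            mask = labels == j
            nj = mask.sum()
            if nj == 0:
                continue  # leave centroids of empty clusters unchanged
            mean_j = X[mask].mean(axis=0)
            # per-coordinate threshold level: n * lam * w_d / (2 * n_j)
            C[j] = soft_threshold(mean_j, n * lam * w / (2.0 * nj))
    return C, labels

# toy usage: two Gaussian components with sparse means in dimension 20
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mu = np.zeros((2, 20)); mu[0, 0] = 3.0; mu[1, 1] = -3.0
    X = np.vstack([rng.normal(mu[z], 1.0) for z in rng.integers(0, 2, 200)])
    C, labels = weighted_lasso_kmeans(X, k=2, lam=0.05)
    print("non-zero centroid coordinates:\n", np.abs(C) > 1e-8)
```

Increasing `lam` (or the coordinate weights `w_d`) drives more centroid coordinates exactly to zero, which is the mechanism behind the variable-selection behaviour discussed in the abstract.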

Citation


Clément Levrard. "Sparse oracle inequalities for variable selection via regularized quantization." Bernoulli 24(1): 271-296, February 2018. https://doi.org/10.3150/16-BEJ876

Information

Received: 1 April 2015; Revised: 1 May 2016; Published: February 2018
First available in Project Euclid: 27 July 2017

zbMATH: 06778328
MathSciNet: MR3706757
Digital Object Identifier: 10.3150/16-BEJ876

Keywords: $k$-means, clustering, high dimension, Lasso, oracle inequalities, sparsity, variable selection

Rights: Copyright © 2018 Bernoulli Society for Mathematical Statistics and Probability
