Electronic Journal of Statistics

PAC-Bayesian bounds for sparse regression estimation with exponential weights

Pierre Alquier and Karim Lounici

Abstract

We consider the sparse regression model where the number of parameters p is larger than the sample size n. The difficulty when considering high-dimensional problems is to propose estimators achieving a good compromise between statistical and computational performance. The Lasso is the solution of a convex minimization problem, hence computable even for large values of p. However, stringent conditions on the design are required to establish fast rates of convergence for this estimator. Dalalyan and Tsybakov [17–19] proposed an exponential weights procedure achieving a good compromise between the statistical and computational aspects. This estimator can be computed for reasonably large p and satisfies a sparsity oracle inequality in expectation for the empirical excess risk under only mild assumptions on the design. In this paper, we propose an exponential weights estimator similar to that of [17] but with improved statistical performance. Our main result is a sparsity oracle inequality in probability for the true excess risk.
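For readers unfamiliar with exponential weighting, the following minimal Python sketch aggregates a finite dictionary of predictors by weighting each one according to its empirical risk. It is only an illustration of the generic idea: the estimator analysed in the paper works over a continuous parameter set with a sparsity-favouring prior (computed in [19] via Langevin Monte-Carlo). The function name exponential_weights, the temperature parameter beta, and the toy data below are illustrative choices, not the authors' notation.

    import numpy as np

    def exponential_weights(preds, y, beta):
        """Aggregate a finite dictionary of predictors by exponential weighting.

        preds : (M, n) array; preds[j] holds the predictions of the j-th candidate
        y     : (n,) array of observed responses
        beta  : inverse-temperature parameter (> 0); larger values concentrate
                the weights on the candidates with the smallest empirical risk
        Returns the aggregated predictions and the weight vector.
        """
        # empirical risk (mean squared error) of each candidate
        risks = np.mean((preds - y) ** 2, axis=1)
        # exponential weights, shifted by the minimal risk for numerical stability
        w = np.exp(-beta * (risks - risks.min()))
        w /= w.sum()
        # the aggregate is the weighted average of the candidates' predictions
        return w @ preds, w

    # toy usage: n = 20 observations, M = 5 candidate predictors of varying quality
    rng = np.random.default_rng(0)
    y = rng.normal(size=20)
    noise_levels = np.array([0.1, 0.5, 1.0, 2.0, 5.0])
    preds = y + rng.normal(size=(5, 20)) * noise_levels[:, None]
    agg, weights = exponential_weights(preds, y, beta=4.0)
    print(weights)  # most of the mass sits on the low-risk candidates

In the high-dimensional setting of the paper the candidates are indexed by a continuous parameter, and the uniform weighting above is replaced by a prior that favours sparse parameter vectors; that prior is what drives the sparsity oracle inequality stated in the abstract.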

Article information

Source
Electron. J. Statist. Volume 5 (2011), 127-145.

Dates
First available in Project Euclid: 14 March 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1300108317

Digital Object Identifier
doi:10.1214/11-EJS601

Mathematical Reviews number (MathSciNet)
MR2786484

Zentralblatt MATH identifier
1274.62463

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62J05: Linear regression; 62G08: Nonparametric regression; 62F15: Bayesian inference; 62B10: Information-theoretic topics [See also 94A17]; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords
Sparsity oracle inequality; high-dimensional regression; exponential weights; PAC-Bayesian inequalities

Citation

Alquier, Pierre; Lounici, Karim. PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electron. J. Statist. 5 (2011), 127--145. doi:10.1214/11-EJS601. https://projecteuclid.org/euclid.ejs/1300108317

References

  • [1] Akaike, H. (1973). Information Theory and an Extension of the Maximum Likelihood Principle. 2nd International Symposium on Information Theory (B. N. Petrov and F. Csaki, editors), Akademia Kiado, Budapest, 267–281.
  • [2] Alquier, P. (2006). Transductive and Inductive Adaptative Inference for Regression and Density Estimation. PhD Thesis, Université Paris 6.
  • [3] Alquier, P. (2008). PAC-Bayesian bounds for randomized empirical risk minimizers. Mathematical Methods of Statistics 17 (4) 279–304.
  • [4] Audibert, J.-Y. (2004). Aggregated Estimators and Empirical Complexity for Least Square Regression. Annales de l’Institut Henri Poincaré B: Probability and Statistics 40 (6) 685–736.
  • [5] Audibert, J.-Y. (2004). PAC-Bayesian Statistical Learning Theory. PhD Thesis, Université Paris 6.
  • [6] Bach, F. (2008). Bolasso: model consistent Lasso estimation through the bootstrap. Proceedings of the 25th International Conference on Machine Learning, ACM, New York.
  • [7] Bickel, P. J., Ritov, Y. and Tsybakov, A. (2009). Simultaneous Analysis of Lasso and Dantzig Selector. The Annals of Statistics 37 (4) 1705–1732.
  • [8] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 169–194.
  • [9] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Aggregation for Gaussian regression. The Annals of Statistics 35 1674–1697.
  • [10] Cai, T., Xu, G. and Zhang, J. (2009). On Recovery of Sparse Signals via ℓ1 Minimization. IEEE Transactions on Information Theory 55 3388–3397.
  • [11] Candès, E. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics 35.
  • [12] Catoni, O. (2003). A PAC-Bayesian approach to adaptive classification. Preprint, Laboratoire de Probabilités et Modèles Aléatoires PMA-840.
  • [13] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Saint-Flour Summer School on Probability Theory 2001 (Jean Picard, ed.), Lecture Notes in Mathematics, Springer.
  • [14] Catoni, O. (2007). PAC-Bayesian Supervised Classification (The Thermodynamics of Statistical Learning). IMS Lecture Notes - Monograph Series 56.
  • [15] Chen, S. S., Donoho, D. L. and Saunders, M. A. (2001). Atomic Decomposition by Basis Pursuit. SIAM Review 43 (1) 129–159.
  • [16] Dalalyan, A. and Tsybakov, A. (2007). Aggregation by exponential weighting and sharp oracle inequalities. Proceedings of the 20th Annual Conference on Learning Theory, Springer-Verlag, Berlin, Heidelberg.
  • [17] Dalalyan, A. and Tsybakov, A. (2008). Aggregation by exponential weighting, sharp oracle inequalities and sparsity. Machine Learning 72 (1-2) 39–61.
  • [18] Dalalyan, A. and Tsybakov, A. (2010). Mirror averaging with sparsity priors. In minor revision for Bernoulli, arXiv:1003.1189.
  • [19] Dalalyan, A. and Tsybakov, A. (2010). Sparse Regression Learning by Aggregation and Langevin Monte-Carlo. Preprint, arXiv:0903.1223.
  • [20] Dembo, A. and Zeitouni, O. (1998). Large Deviation Techniques and Applications. Springer.
  • [21] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least Angle Regression. The Annals of Statistics 32 (2) 407–499.
  • [22] Frank, L. and Friedman, J. (1993). A Statistical View on Some Chemometrics Regression Tools. Technometrics 16 199–511.
  • [23] Ghosh, S. (2007). Adaptive Elastic Net: an Improvement of Elastic Net to achieve Oracle Properties. Preprint.
  • [24] Juditsky, A., Rigollet, P. and Tsybakov, A. (2008). Learning by mirror averaging. The Annals of Statistics 36 (5) 2183–2206.
  • [25] Koltchinskii, V. (2010). Sparsity in Empirical Risk Minimization. Annales de l’Institut Henri Poincaré B: Probability and Statistics, to appear.
  • [26] Koltchinskii, V. (2009). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 (3) 799–828.
  • [27] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer.
  • [28] Leung, G. and Barron, A. R. (2006). Information theory and mixing least-squares regressions. IEEE Transactions on Information Theory 52 (8) 3396–3410.
  • [29] Lounici, K. (2007). Generalized mirror averaging and D-convex aggregation. Mathematical Methods of Statistics 16 (3).
  • [30] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electronic Journal of Statistics 2 90–102.
  • [31] Mallows, C. L. (1973). Some comments on Cp. Technometrics 15 661–676.
  • [32] Marin, J.-M. and Robert, C. P. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Analysis. Springer.
  • [33] Massart, P. (2007). Concentration Inequalities and Model Selection. Saint-Flour Summer School on Probability Theory 2003 (Jean Picard, ed.), Lecture Notes in Mathematics, Springer.
  • [34] McAllester, D. A. (1998). Some PAC-Bayesian theorems. Proceedings of the Eleventh Annual Conference on Computational Learning Theory, ACM, 230–234.
  • [35] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society B, to appear.
  • [36] Rigollet, P. and Tsybakov, A. (2010). Exponential Screening and optimal rates of sparse estimation. The Annals of Statistics, to appear.
  • [37] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics 6 461–464.
  • [38] Shawe-Taylor, J. and Williamson, R. (1997). A PAC analysis of a Bayes estimator. Proceedings of the Tenth Annual Conference on Computational Learning Theory, ACM, 2–9.
  • [39] Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B 58 (1) 267–288.
  • [40] Tsybakov, A. (2003). Optimal rates of aggregation. Computational Learning Theory and Kernel Machines, Lecture Notes in Artificial Intelligence 2777, Springer, Heidelberg, 303–313.
  • [41] Tsybakov, A. (2009). Introduction to Nonparametric Estimation. Springer, New York.
  • [42] Van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the LASSO. Electronic Journal of Statistics 3 1360–1392.
  • [43] Yang, Y. (2004). Aggregating regression procedures to improve performance. Bernoulli 10 (1) 25–47.
  • [44] Zhang, T. (2008). From epsilon-entropy to KL-entropy: analysis of minimum information complexity density estimation. The Annals of Statistics 34 2180–2210.
  • [45] Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101 1418–1429.
  • [46] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B 67 (2) 301–320.