## Electronic Journal of Statistics

### PAC-Bayesian estimation and prediction in sparse additive models

#### Abstract

This paper studies estimation and prediction in high-dimensional additive models under a sparsity assumption (the $p\gg n$ paradigm). A PAC-Bayesian strategy is investigated, delivering oracle inequalities in probability. The implementation builds on recent advances in high-dimensional MCMC algorithms, and the performance of the method is assessed on simulated data.
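The paper's own implementation is the `pacbpred` R package cited below; as a rough illustration of the general PAC-Bayesian recipe the abstract alludes to, the sketch below computes the mean of a Gibbs pseudo-posterior $\hat\rho(\theta)\propto\exp(-\lambda\, r_n(\theta))\,\pi(\theta)$ by random-walk Metropolis. The squared-error risk, the Gaussian prior (the paper uses a sparsity-inducing prior), the choice $\lambda\sim n$, and the function name `gibbs_posterior_mean` are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def gibbs_posterior_mean(X, y, lam=None, tau=1.0, n_iter=4000, step=0.05, seed=0):
    """Posterior-mean estimate under the Gibbs pseudo-posterior
    rho(theta) ∝ exp(-lam * r_n(theta)) * pi(theta), with r_n the empirical
    mean squared error and pi a N(0, tau^2 I) prior, sampled by
    random-walk Metropolis. Illustrative sketch, not the paper's algorithm."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    if lam is None:
        lam = float(n)  # inverse temperature; lambda of order n is a common choice

    def log_target(theta):
        risk = np.mean((y - X @ theta) ** 2)          # empirical risk r_n(theta)
        return -lam * risk - 0.5 * np.sum(theta**2) / tau**2  # + log prior

    # Warm start at the least-squares solution so the chain mixes quickly.
    theta = np.linalg.lstsq(X, y, rcond=None)[0]
    lt = log_target(theta)
    draws = []
    for t in range(n_iter):
        proposal = theta + step * rng.standard_normal(p)
        lt_prop = log_target(proposal)
        if np.log(rng.random()) < lt_prop - lt:       # Metropolis accept/reject
            theta, lt = proposal, lt_prop
        if t >= n_iter // 2:                          # discard first half as burn-in
            draws.append(theta.copy())
    return np.mean(draws, axis=0)
```

On a sparse linear toy problem (e.g. $n=200$, $p=5$, two nonzero coefficients), the returned posterior mean lands close to the true coefficient vector; the paper's actual estimator handles the genuinely high-dimensional $p\gg n$ additive setting.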

#### Article information

**Source**
Electron. J. Statist. Volume 7 (2013), 264–291.

**Dates**
First available in Project Euclid: 24 January 2013

**Permanent link**
https://projecteuclid.org/euclid.ejs/1359041592

**Digital Object Identifier**
doi:10.1214/13-EJS771

**Mathematical Reviews number (MathSciNet)**
MR3020421

**Zentralblatt MATH identifier**
1337.62075

#### Citation

Guedj, Benjamin; Alquier, Pierre. PAC-Bayesian estimation and prediction in sparse additive models. Electron. J. Statist. 7 (2013), 264–291. doi:10.1214/13-EJS771. https://projecteuclid.org/euclid.ejs/1359041592

#### References

• Alquier, P. (2006). Transductive and Inductive Adaptive Inference for Regression and Density Estimation. PhD thesis, Université Paris 6 – UPMC.
• Alquier, P. (2008). PAC-Bayesian Bounds for Randomized Empirical Risk Minimizers. Mathematical Methods of Statistics 17 279–304. arXiv:0712.1698v3
• Alquier, P. and Biau, G. (2011). Sparse Single-Index Model. To appear in Journal of Machine Learning Research. arXiv:1101.3229v2
• Alquier, P. and Lounici, K. (2011). PAC-Bayesian Theorems for Sparse Regression Estimation with Exponential Weights. Electronic Journal of Statistics 5 127–145.
• Audibert, J.-Y. (2004a). Aggregated estimators and empirical complexity for least square regression. Annales de l'Institut Henri Poincaré: Probabilités et Statistiques 40 685–736.
• Audibert, J.-Y. (2004b). Théorie statistique de l'apprentissage: une approche PAC-Bayésienne. PhD thesis, Université Paris 6 – UPMC.
• Audibert, J.-Y. (2009). Fast learning rates in statistical inference through aggregation. The Annals of Statistics 37 1591–1646.
• Audibert, J.-Y. and Catoni, O. (2010). Robust linear regression through PAC-Bayesian truncation. Submitted. arXiv:1010.0072v2
• Audibert, J.-Y. and Catoni, O. (2011). Robust linear least squares regression. The Annals of Statistics 39 2766–2794.
• Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37 1705–1732. arXiv:0801.1095v3
• Bunea, F., Tsybakov, A. B. and Wegkamp, M. (2006). Aggregation and sparsity via $\ell_1$-penalized least squares. In Proceedings of the 19th Annual Conference on Computational Learning Theory 379–391. Springer-Verlag.
• Bühlmann, P. and van de Geer, S. A. (2011). Statistics for High-Dimensional Data. Springer.
• Carlin, B. P. and Chib, S. (1995). Bayesian Model Choice via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Series B 57 473–484.
• Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. École d'Été de Probabilités de Saint-Flour XXXI – 2001. Springer.
• Catoni, O. (2007). PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning. Lecture Notes – Monograph Series 56. Institute of Mathematical Statistics.
• Dalalyan, A. S. and Tsybakov, A. B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning 72 39–61. arXiv:0803.2839v1
• Dalalyan, A. S. and Tsybakov, A. B. (2012). Sparse Regression Learning by Aggregation and Langevin Monte-Carlo. Journal of Computer and System Sciences 78 1423–1443. arXiv:0903.1223v3
• Giraud, C., Huet, S. and Verzelen, N. (2012). High-dimensional regression with unknown variance. To appear in Statistical Science. arXiv:1109.5587v2
• Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711–732.
• Guedj, B. (2012). pacbpred: PAC-Bayesian Estimation and Prediction in Sparse Additive Models. R package version 0.92. http://cran.r-project.org/web/packages/pacbpred/index.html
• Hans, C., Dobra, A. and West, M. (2007). Shotgun Stochastic Search for "Large p" Regression. Journal of the American Statistical Association 102 507–516.
• Hastie, T. and Tibshirani, R. (1986). Generalized Additive Models. Statistical Science 1 297–318.
• Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43. Chapman & Hall/CRC.
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning – Data Mining, Inference, and Prediction, Second ed. Springer.
• Härdle, W. K. (1990). Applied Nonparametric Regression. Cambridge University Press.
• Koltchinskii, V. and Yuan, M. (2010). Sparsity in multiple kernel learning. The Annals of Statistics 38 3660–3695.
• Marin, J.-M. and Robert, C. P. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer.
• Massart, P. (2007). Concentration Inequalities and Model Selection. École d'Été de Probabilités de Saint-Flour XXXIII – 2003. Springer.
• McAllester, D. A. (1999). Some PAC-Bayesian Theorems. Machine Learning 37 355–363.
• Meier, L., van de Geer, S. A. and Bühlmann, P. (2009). High-dimensional additive modeling. The Annals of Statistics 37 3779–3821. arXiv:0806.4115
• Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. The Annals of Statistics 37 246–270. arXiv:0806.0145v2
• Meyn, S. and Tweedie, R. L. (2009). Markov Chains and Stochastic Stability, 2nd ed. Cambridge University Press.
• Petralias, A. (2010). Bayesian model determination and nonlinear threshold volatility models. PhD thesis, Athens University of Economics and Business.
• Petralias, A. and Dellaportas, P. (2012). An MCMC model search algorithm for regression problems. Journal of Statistical Computation and Simulation 1–19.
• R Core Team (2012). R: A Language and Environment for Statistical Computing. Vienna, Austria. ISBN 3-900051-07-0. http://www.R-project.org/
• Raskutti, G., Wainwright, M. J. and Yu, B. (2012). Minimax-optimal rates for sparse additive models over kernel classes via convex programming. Journal of Machine Learning Research 13 389–427.
• Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society, Series B 71 1009–1030. arXiv:0711.4555v2
• Rigollet, P. (2006). Inégalités d'oracle, agrégation et adaptation. PhD thesis, Université Paris 6 – UPMC.
• Rigollet, P. and Tsybakov, A. B. (2012). Sparse estimation by exponential weighting. Statistical Science 27 558–575.
• Shawe-Taylor, J. and Williamson, R. C. (1997). A PAC analysis of a Bayes estimator. In Proceedings of the 10th Annual Conference on Computational Learning Theory 2–9. ACM.
• Stone, C. J. (1985). Additive regression and other nonparametric models. The Annals of Statistics 13 689–705.
• Suzuki, T. (2012). PAC-Bayesian Bound for Gaussian Process Regression and Multiple Kernel Additive Model. In Proceedings of the 25th Annual Conference on Computational Learning Theory.
• Suzuki, T. and Sugiyama, M. (2012). Fast learning rates of Multiple Kernel Learning: trade-off between sparsity and smoothness. Submitted. arXiv:1203.0565v1
• Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B 58 267–288.
• Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer.
• van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. The Annals of Statistics 36 614–645.