The Annals of Statistics

Bayesian linear regression with sparse priors

Ismaël Castillo, Johannes Schmidt-Hieber, and Aad van der Vaart



We study full Bayesian procedures for high-dimensional linear regression under sparsity constraints. The prior is a mixture of point masses at zero and continuous distributions. Under compatibility conditions on the design matrix, the posterior distribution is shown to contract at the optimal rate for recovery of the unknown sparse vector, and to give optimal prediction of the response vector. It is also shown to select the correct sparse model, or at least the coefficients that are significantly different from zero. The asymptotic shape of the posterior distribution is characterized and employed in the construction and study of credible sets for uncertainty quantification.
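The abstract does not spell out the prior, so the following is only a minimal sketch under stated assumptions of what a "mixture of point masses at zero and continuous distributions" looks like in practice, and of how posterior model selection can be read off. It assumes a hierarchical spike-and-slab prior: a model size s drawn from a hypothetical complexity-type weight π(s) ∝ c^(-s) p^(-a s), a support S drawn uniformly among subsets of size s, Laplace slab values on S and exact zeros elsewhere. The hyperparameters `a`, `c`, `lam`, `tau2` and all function names are illustrative, not taken from the paper. For the posterior, the sketch swaps in a Gaussian N(0, τ²) slab with known noise variance σ² = 1, so that each model's marginal likelihood has a closed form, and then enumerates all 2^p supports to compute posterior inclusion probabilities.

```python
import itertools
from math import comb

import numpy as np


def log_size_prior(s, p, a=1.0, c=1.0):
    # Hypothetical complexity-type prior on the model size: pi(s) ∝ c^{-s} p^{-a s}.
    return -s * (np.log(c) + a * np.log(p))


def sample_spike_and_slab(p, rng, a=1.0, c=1.0, lam=1.0):
    """Draw beta in R^p: size s, then a uniform support of size s, then Laplace slab values."""
    sizes = np.arange(p + 1)
    logw = log_size_prior(sizes, p, a, c)
    probs = np.exp(logw - logw.max())
    s = rng.choice(p + 1, p=probs / probs.sum())
    support = rng.choice(p, size=s, replace=False)
    beta = np.zeros(p)  # spike: exact zeros off the support
    beta[support] = rng.laplace(loc=0.0, scale=1.0 / lam, size=s)  # slab on the support
    return beta


def log_marginal(y, X, S, tau2=1.0, sigma2=1.0):
    """log N(y; 0, sigma2 I + tau2 X_S X_S^T): y | S with the Gaussian slab integrated out."""
    n = len(y)
    cov = sigma2 * np.eye(n)
    if S:
        XS = X[:, list(S)]
        cov = cov + tau2 * XS @ XS.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)


def posterior_inclusion(y, X, a=1.0, c=1.0, tau2=1.0):
    """Exact posterior over all 2^p supports (small p only); returns inclusion probs and MAP support."""
    _, p = X.shape
    models, logpost = [], []
    for s in range(p + 1):
        # pi(S) = pi(s) / C(p, s): uniform over supports of a given size.
        log_prior = log_size_prior(s, p, a, c) - np.log(comb(p, s))
        for S in itertools.combinations(range(p), s):
            models.append(S)
            logpost.append(log_prior + log_marginal(y, X, S, tau2))
    logpost = np.array(logpost)
    post = np.exp(logpost - logpost.max())  # stabilize before normalizing
    post /= post.sum()
    incl = np.zeros(p)
    for S, w in zip(models, post):
        if S:
            incl[list(S)] += w  # P(j in S | y) summed over supports containing j
    return incl, models[int(np.argmax(post))]


rng = np.random.default_rng(0)
n, p = 50, 8
X = rng.standard_normal((n, p))
beta0 = np.zeros(p)
beta0[[1, 4]] = [3.0, -2.0]  # true sparse vector with support {1, 4}
y = X @ beta0 + rng.standard_normal(n)
incl, map_model = posterior_inclusion(y, X)
print("inclusion probabilities:", np.round(incl, 3))
print("MAP support:", map_model)
print("prior draw:", sample_spike_and_slab(p, rng))
```

The Gaussian slab is used here only because it makes `log_marginal` available in closed form; with a Laplace slab each model's marginal likelihood would need numerical integration or sampling. Exhaustive enumeration costs 2^p marginal-likelihood evaluations, so for realistic dimensions one would turn to MCMC or other stochastic search methods instead.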

Article information

Ann. Statist., Volume 43, Number 5 (2015), 1986-2018.

Received: March 2014
Revised: March 2015
First available in Project Euclid: 3 August 2015

Digital Object Identifier: doi:10.1214/15-AOS1334


Primary: 62G20: Asymptotic properties
Secondary: 62G05: Estimation

Keywords: Bayesian inference; sparsity


Castillo, Ismaël; Schmidt-Hieber, Johannes; van der Vaart, Aad. Bayesian linear regression with sparse priors. Ann. Statist. 43 (2015), no. 5, 1986–2018. doi:10.1214/15-AOS1334.




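Supplemental materials: Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Supplement to “Bayesian linear regression with sparse priors.” DOI:10.1214/15-AOS1334SUPP.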