The Annals of Statistics

Bayesian variable selection with shrinking and diffusing priors

Naveen Naidu Narisetty and Xuming He

Full-text: Open access


We consider a Bayesian approach to variable selection in the presence of high dimensional covariates based on a hierarchical model that places prior distributions on the regression coefficients as well as on the model space. We adopt the well-known spike and slab Gaussian priors with a distinct feature, that is, the prior variances depend on the sample size through which appropriate shrinkage can be achieved. We show the strong selection consistency of the proposed method in the sense that the posterior probability of the true model converges to one even when the number of covariates grows nearly exponentially with the sample size. This is arguably the strongest selection consistency result that has been available in the Bayesian variable selection literature; yet the proposed method can be carried out through posterior sampling with a simple Gibbs sampler. Furthermore, we argue that the proposed method is asymptotically similar to model selection with the $L_{0}$ penalty. We also demonstrate through empirical work the fine performance of the proposed approach relative to some state of the art alternatives.

Article information

Ann. Statist., Volume 42, Number 2 (2014), 789-817.

First available in Project Euclid: 20 May 2014

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62J05: Linear regression 62F12: Asymptotic properties of estimators 62F15: Bayesian inference

Bayes factor hierarchical model high dimensional data shrinkage variable selection


Narisetty, Naveen Naidu; He, Xuming. Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 (2014), no. 2, 789--817. doi:10.1214/14-AOS1207.

Export citation


  • Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. Statist. 32 870–897.
  • Bondell, H. D. and Reich, B. J. (2008). Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics 64 115–123, 322–323.
  • Bondell, H. D. and Reich, B. J. (2012). Consistent high-dimensional Bayesian variable selection via penalized credible regions. J. Amer. Statist. Assoc. 107 1610–1624.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • Dey, T., Ishwaran, H. and Rao, J. S. (2008). An in-depth look at highest posterior model selection. Econometric Theory 24 377–403.
  • Dicker, L., Huang, B. and Lin, X. (2013). Variable selection and estimation with the seamless-$L_0$ penalty. Statist. Sinica 23 929–962.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 849–911.
  • Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731–747.
  • George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
  • Hsu, D., Kakade, S. M. and Zhang, T. (2012). A tail inequality for quadratic forms of subgaussian random vectors. Electron. Commun. Probab. 17 6.
  • Huang, J. and Xie, H. (2007). Asymptotic oracle properties of SCAD-penalized least squares estimators. In Asymptotics: Particles, Processes and Inverse Problems. Institute of Mathematical Statistics Lecture Notes—Monograph Series 55 149–166. IMS, Beachwood, OH.
  • Ishwaran, H., Kogalur, U. B. and Rao, J. S. (2010). spikeslab: Prediction and variable selection using spike and slab regression. The R Journal 2 68–73.
  • Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730–773.
  • Ishwaran, H. and Rao, J. S. (2011). Consistency of spike and slab regression. Statist. Probab. Lett. 81 1920–1928.
  • James, G. M., Radchenko, P. and Lv, J. (2009). DASSO: Connections between the Dantzig selector and lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 127–142.
  • Jiang, W. (2007). Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 1487–1511.
  • Johnson, V. E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings. J. Amer. Statist. Assoc. 107 649–660.
  • Kim, Y., Kwon, S. and Choi, H. (2012). Consistent model selection criteria on high dimensions. J. Mach. Learn. Res. 13 1037–1057.
  • Lan, H., Chen, M., Flowers, J. B., Yandell, B. S., Stapleton, D. S., Mata, C. M., Mui, E. T., Flowers, M. T., Schueler, K. L., Manly, K. F., Williams, R. W., Kendziorski, K. and Attie, A. D. (2006). Combined expression trait correlations and expression quantitative trait locus mapping. PLoS Genetics 2 e6.
  • Liang, F., Song, Q. and Yu, K. (2013). Bayesian subset modeling for high-dimensional generalized linear models. J. Amer. Statist. Assoc. 108 589–606.
  • Liu, Y. and Wu, Y. (2007). Variable selection via a combination of the $L_0$ and $L_1$ penalties. J. Comput. Graph. Statist. 16 782–798.
  • Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83 1023–1036.
  • Moreno, E., Girón, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Ann. Statist. 38 1937–1952.
  • Narisetty, N. N. and He, X. (2014). Supplement to “Bayesian variable selection with shrinking and diffusing priors.” DOI:10.1214/14-AOS1207SUPP.
  • Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • Shen, X., Pan, W. and Zhu, Y. (2012). Likelihood-based selection and sharp parameter estimation. J. Amer. Statist. Assoc. 107 223–232.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
  • Yang, Y. and He, X. (2012). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40 1102–1131.
  • Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100 1215–1225.
  • Zhang, D., Lin, Y. and Zhang, M. (2009). Penalized orthogonal-components regression for large $p$ small $n$ data. Electron. J. Stat. 3 781–796.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.

Supplemental materials