Bayesian Analysis

Variable Selection via Penalized Credible Regions with Dirichlet–Laplace Global-Local Shrinkage Priors

Yan Zhang and Howard D. Bondell

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

The method of Bayesian variable selection via penalized credible regions separates model fitting from variable selection. The idea is to search for the sparsest solution within the joint posterior credible regions. Although the approach has been successful, it depended on the use of conjugate normal priors. More recently, global-local shrinkage priors have brought improvements to high-dimensional Bayesian variable selection. In this paper, we incorporate global-local priors into the credible region selection framework. The Dirichlet–Laplace (DL) prior is adapted to the linear regression setting. Posterior consistency is shown for both the normal and DL priors, along with variable selection consistency. We further introduce a new method for tuning the hyperparameters in the prior distributions for linear regression. We propose to choose the hyperparameters to minimize a discrepancy between the induced distribution on R-squared and a prespecified target distribution. Prior elicitation on the R-squared scale is more natural, particularly when there is a large number of predictor variables, for which direct elicitation on the coefficient scale is not feasible. For a normal prior, the hyperparameters that minimize the Kullback–Leibler divergence between the two distributions are available in closed form.
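
To make the two ideas above concrete, here is a minimal sketch of the credible-region selection step, assuming an approximately normal posterior beta | y ~ N(beta_hat, Sigma). It relaxes the search for the sparsest point in the ellipsoidal credible region to a weighted-L1 problem solved as a lasso path; the adaptive weights 1/|beta_hat_j| and the scikit-learn solver are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy import stats, optimize
    from sklearn.linear_model import lasso_path

    def credible_region_select(beta_hat, Sigma, n_grid=100):
        # Sparsest solution within joint credible regions (sketch):
        #   min ||beta||_0  s.t.  (beta - beta_hat)' Sigma^{-1} (beta - beta_hat) <= C,
        # relaxed to weighted L1 and traced over C as a single lasso path.
        A = np.linalg.cholesky(np.linalg.inv(Sigma)).T   # Sigma^{-1} = A'A
        w = 1.0 / np.abs(beta_hat)                       # adaptive weights (assumption)
        X = A / w                                        # scale column j by 1/w_j
        y = A @ beta_hat
        alphas, gammas, _ = lasso_path(X, y, n_alphas=n_grid)
        return alphas, gammas / w[:, None]               # columns are candidate sparse models

The hyperparameter-tuning idea can be sketched in the same spirit: draw coefficients from the prior, map each draw to an R-squared value, and adjust the hyperparameter until the induced distribution matches a prespecified Beta target. The Monte Carlo histogram KL estimate below is an assumption for illustration only; for the normal prior the paper obtains the minimizer in closed form.

    def induced_r2(tau, X, sigma2=1.0, n_draws=2000, rng=None):
        # R-squared induced by a normal prior beta ~ N(0, tau^2 I):
        # R^2 = Var(X beta) / (Var(X beta) + sigma^2), one value per prior draw.
        rng = np.random.default_rng(0) if rng is None else rng
        beta = rng.normal(scale=tau, size=(n_draws, X.shape[1]))
        v = (X @ beta.T).var(axis=0)
        return v / (v + sigma2)

    def tune_tau(X, target=stats.beta(2, 2), sigma2=1.0):
        # Pick tau to minimize a histogram-based KL estimate between the
        # induced R-squared distribution and the Beta target.
        bins = np.linspace(0.0, 1.0, 21)
        q = np.diff(target.cdf(bins)) + 1e-9             # target bin masses
        def kl(log_tau):
            r2 = induced_r2(np.exp(log_tau), X, sigma2)
            p = np.histogram(r2, bins=bins)[0] / len(r2) + 1e-9
            return float(np.sum(p * np.log(p / q)))
        res = optimize.minimize_scalar(kl, bounds=(-5.0, 5.0), method='bounded')
        return float(np.exp(res.x))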

Article information

Source
Bayesian Anal. (2017), 22 pages.

Dates
First available in Project Euclid: 21 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1508551721

Digital Object Identifier
doi:10.1214/17-BA1076

Keywords
variable selection, posterior credible region, global-local shrinkage prior, Dirichlet–Laplace, posterior consistency, hyperparameter tuning

Rights
Creative Commons Attribution 4.0 International License.

Citation

Zhang, Yan; Bondell, Howard D. Variable Selection via Penalized Credible Regions with Dirichlet–Laplace Global-Local Shrinkage Priors. Bayesian Anal., advance publication, 21 October 2017. doi:10.1214/17-BA1076. https://projecteuclid.org/euclid.ba/1508551721


References

  • Akaike, H. (1973). “Information theory and an extension of the maximum likelihood principle.” In Selected Papers of Hirotugu Akaike, 199–213. Springer.
  • Arias-Castro, E., Lounici, K., et al. (2014). “Estimation and variable selection with exponential weights.” Electronic Journal of Statistics, 8(1): 328–354.
  • Armagan, A., Dunson, D. B., and Lee, J. (2013a). “Generalized double Pareto shrinkage.” Statistica Sinica, 23(1): 119–143.
  • Armagan, A., Dunson, D. B., Lee, J., Bajwa, W. U., and Strawn, N. (2013b). “Posterior consistency in linear models under shrinkage priors.” Biometrika, 100(4): 1011–1018.
  • Bhadra, A., Datta, J., Polson, N. G., Willard, B., et al. (2016). “The horseshoe+ estimator of ultra-sparse signals.” Bayesian Analysis.
  • Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). “Dirichlet–Laplace priors for optimal shrinkage.” Journal of the American Statistical Association, 110(512): 1479–1490.
  • Bondell, H. D. and Reich, B. J. (2008). “Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR.” Biometrics, 64(1): 115–123.
  • Bondell, H. D. and Reich, B. J. (2012). “Consistent high-dimensional Bayesian variable selection via penalized credible regions.” Journal of the American Statistical Association, 107(500): 1610–1624.
  • Candes, E. and Tao, T. (2007). “The Dantzig selector: Statistical estimation when p is much larger than n.” The Annals of Statistics, 35(6): 2313–2351.
  • Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009). “Handling sparsity via the horseshoe.” In International Conference on Artificial Intelligence and Statistics, 73–80.
  • Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97(2): 465–480.
  • Castillo, I., Schmidt-Hieber, J., Van der Vaart, A., et al. (2015). “Bayesian linear regression with sparse priors.” The Annals of Statistics, 43(5): 1986–2018.
  • Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004). “Least angle regression.” The Annals of Statistics, 32(2): 407–499.
  • Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle properties.” Journal of the American Statistical Association, 96(456): 1348–1360.
  • George, E. and Foster, D. P. (2000). “Calibration and empirical Bayes variable selection.” Biometrika, 87(4): 731–747.
  • George, E. I. and McCulloch, R. E. (1993). “Variable selection via Gibbs sampling.” Journal of the American Statistical Association, 88(423): 881–889.
  • Griffin, J. E., Brown, P. J., et al. (2010). “Inference with normal-gamma prior distributions in regression problems.” Bayesian Analysis, 5(1): 171–188.
  • Hans, C. (2010). “Model uncertainty and variable selection in Bayesian lasso regression.” Statistics and Computing, 20(2): 221–229.
  • Ishwaran, H. and Rao, J. S. (2005). “Spike and slab variable selection: frequentist and Bayesian strategies.” The Annals of Statistics, 33(2): 730–773.
  • Lan, H., Chen, M., Flowers, J. B., Yandell, B. S., Stapleton, D. S., Mata, C. M., Mui, E., Flowers, M. T., Schueler, K. L., Manly, K. F., et al. (2006). “Combined expression trait correlations and expression quantitative trait locus mapping.” PLoS Genetics, 2(1): e6.
  • Leng, C., Tran, M.-N., and Nott, D. (2014). “Bayesian adaptive lasso.” Annals of the Institute of Statistical Mathematics, 66(2): 221–244.
  • Li, Q., Lin, N., et al. (2010). “The Bayesian elastic net.” Bayesian Analysis, 5(1): 151–170.
  • Lv, J. and Fan, Y. (2009). “A unified approach to model selection and sparse recovery using regularized least squares.” The Annals of Statistics, 37(6A): 3498–3528.
  • Martin, R., Mess, R., Walker, S. G., et al. (2017). “Empirical Bayes posterior concentration in sparse high-dimensional linear models.” Bernoulli, 23(3): 1822–1847.
  • Park, T. and Casella, G. (2008). “The Bayesian lasso.” Journal of the American Statistical Association, 103(482): 681–686.
  • Polson, N. G. and Scott, J. G. (2010). “Shrink globally, act locally: Sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
  • Polson, N. G., Scott, J. G., and Windle, J. (2013). “The Bayesian Bridge.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
  • Schwarz, G. (1978). “Estimating the dimension of a model.” The Annals of Statistics, 6(2): 461–464.
  • Stroeker, R. (1983). “Approximations of the eigenvalues of the covariance matrix of a first-order autoregressive process.” Journal of Econometrics, 22(3): 269–279.
  • Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 58(1): 267–288.
  • Zhang, Y. and Bondell, H. D. (2017). “Supplementary material for ‘Variable selection via penalized credible regions with Dirichlet–Laplace global-local shrinkage priors’.” Bayesian Analysis.
  • Zou, H. (2006). “The adaptive lasso and its oracle properties.” Journal of the American Statistical Association, 101(476): 1418–1429.
  • Zou, H. and Hastie, T. (2005). “Regularization and variable selection via the elastic net.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2): 301–320.