## Bayesian Analysis

### Variance Prior Forms for High-Dimensional Bayesian Variable Selection

This article is in its final form and can be cited using the date of online publication and the DOI.

#### Abstract

Consider the problem of high-dimensional variable selection for the Gaussian linear model when the unknown error variance is also of interest. In this paper, we show that the use of conjugate shrinkage priors for Bayesian variable selection can have detrimental consequences for such variance estimation. Such priors are often motivated by the invariance argument of Jeffreys (1961). Revisiting this work, however, we highlight a caveat that Jeffreys himself noticed; namely that biased estimators can result from inducing dependence between parameters a priori. In a similar way, we show that conjugate priors for linear regression, which induce prior dependence, can lead to underestimation of the error variance in the Bayesian high-dimensional regression setting. Following Jeffreys, we recommend as a remedy to treat regression coefficients and the error variance as independent a priori. Using such an independence prior framework, we extend the Spike-and-Slab Lasso of Ročková and George (2018) to the unknown variance case. This extended procedure outperforms both the fixed variance approach and alternative penalized likelihood methods on simulated data. On the protein activity dataset of Clyde and Parmigiani (1998), the Spike-and-Slab Lasso with unknown variance achieves lower cross-validation error than alternative penalized likelihood methods, demonstrating the gains in predictive accuracy afforded by simultaneous error variance estimation. The unknown variance implementation of the Spike-and-Slab Lasso is provided in the publicly available R package SSLASSO (Ročková and Moran, 2017).
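The prior dependence at issue in the abstract can be illustrated with a small simulation (a sketch for intuition, not taken from the paper; the hyperparameter values `tau2`, `a`, and `b` are illustrative choices). Under the standard conjugate prior, a coefficient is drawn as β | σ² ~ N(0, σ²τ²) with σ² ~ Inverse-Gamma(a, b), so the prior scale of β grows with σ² and the two are correlated a priori. Under an independence prior, β ~ N(0, τ²) is drawn separately from σ², and that correlation vanishes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau2 = 100_000, 1.0
a, b = 5.0, 4.0  # illustrative inverse-gamma shape/scale, chosen so moments are finite

# sigma^2 ~ Inverse-Gamma(a, b), simulated as b / Gamma(a, 1)
sigma2 = b / rng.gamma(a, 1.0, size=n)

# Conjugate prior: beta | sigma^2 ~ N(0, sigma^2 * tau2) -- scale tied to sigma^2
beta_conj = rng.normal(0.0, np.sqrt(sigma2 * tau2))

# Independence prior: beta ~ N(0, tau2), drawn independently of sigma^2
beta_indep = rng.normal(0.0, np.sqrt(tau2), size=n)

corr_conj = np.corrcoef(np.abs(beta_conj), sigma2)[0, 1]
corr_indep = np.corrcoef(np.abs(beta_indep), sigma2)[0, 1]
print(f"corr(|beta|, sigma^2), conjugate prior:    {corr_conj:.3f}")
print(f"corr(|beta|, sigma^2), independence prior: {corr_indep:.3f}")
```

The conjugate draws show a clearly positive prior correlation between |β| and σ², while the independence draws show essentially none; this coupling is the mechanism through which, as the abstract argues, conjugate shrinkage priors can distort variance estimation.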

#### Article information

Source
Bayesian Anal., Advance publication (2018), 29 pages.

Dates
First available in Project Euclid: 7 March 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1551927862

Digital Object Identifier
doi:10.1214/19-BA1149

#### Citation

Moran, Gemma E.; Ročková, Veronika; George, Edward I. Variance Prior Forms for High-Dimensional Bayesian Variable Selection. Bayesian Anal., advance publication, 7 March 2019. doi:10.1214/19-BA1149. https://projecteuclid.org/euclid.ba/1551927862

#### References

• Bayarri, M. J., Berger, J. O., Forte, A., García-Donato, G., et al. (2012). “Criteria for Bayesian model choice with application to variable selection.” The Annals of Statistics, 40(3): 1550–1577.
• Belloni, A., Chernozhukov, V., Wang, L., et al. (2014). “Pivotal estimation via square-root lasso in nonparametric regression.” The Annals of Statistics, 42(2): 757–788.
• Berger, J. O., Pericchi, L. R., and Varshavsky, J. A. (1998). “Bayes factors and marginal distributions in invariant situations.” Sankhya Ser. A, 60: 307–321.
• Bhadra, A., Datta, J., Polson, N. G., and Willard, B. (2016). “Default Bayesian analysis with global-local shrinkage priors.” Biometrika, 103(4): 955–969.
• Bhadra, A., Datta, J., Polson, N. G., and Willard, B. (2017). “Horseshoe Regularization for Feature Subset Selection.” ArXiv e-prints.
• Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97(2): 465–480.
• Chipman, H. A., George, E. I., McCulloch, R. E., et al. (2010). “BART: Bayesian additive regression trees.” The Annals of Applied Statistics, 4(1): 266–298.
• Clyde, M. A., Ghosh, J., and Littman, M. L. (2011). “Bayesian Adaptive Sampling for Variable Selection and Model Averaging.” Journal of Computational and Graphical Statistics, 20(1): 80–101.
• Clyde, M. A. and Parmigiani, G. (1998). “Protein construct storage: Bayesian variable selection and prediction with mixtures.” Journal of Biopharmaceutical Statistics, 8(3): 431–443.
• Fan, J. and Li, R. (2001). “Variable selection via nonconcave penalized likelihood and its oracle properties.” Journal of the American Statistical Association, 96(456): 1348–1360.
• Friedman, J., Hastie, T., and Tibshirani, R. (2010). “Regularization paths for generalized linear models via coordinate descent.” Journal of Statistical Software, 33(1): 1.
• Gelman, A. (2004). “Prior distributions for variance parameters in hierarchical models.” Bayesian Analysis.
• Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2014). Bayesian Data Analysis, volume 2. CRC Press, Boca Raton, FL.
• George, E. I. and McCulloch, R. E. (1993). “Variable selection via Gibbs sampling.” Journal of the American Statistical Association, 88(423): 881–889.
• George, E. I. and McCulloch, R. E. (1997). “Approaches for Bayesian Variable Selection.” Statistica Sinica, 7: 339–373.
• Hans, C. (2009). “Bayesian lasso regression.” Biometrika, 96(4): 835–845.
• Jeffreys, H. (1961). The Theory of Probability. Oxford University Press, 3rd edition.
• Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. (2008). “Mixtures of g priors for Bayesian variable selection.” Journal of the American Statistical Association, 103(481): 410–423.
• Loh, P.-L. (2017). “Statistical consistency and asymptotic normality for high-dimensional robust $M$-estimators.” The Annals of Statistics, 45(2): 866–896.
• Mazumder, R., Friedman, J. H., and Hastie, T. (2011). “Sparsenet: Coordinate descent with nonconvex penalties.” Journal of the American Statistical Association, 106(495): 1125–1138.
• Mitchell, T. J. and Beauchamp, J. J. (1988). “Bayesian variable selection in linear regression.” Journal of the American Statistical Association, 83(404): 1023–1032.
• Moran, G. E., Ročková, V., George, E. I. (2019). “Supplementary Material for “Variance Prior Forms for High-Dimensional Bayesian Variable Selection”.” Bayesian Analysis.
• Park, T. and Casella, G. (2008). “The Bayesian Lasso.” Journal of the American Statistical Association, 103(482): 681–686.
• Piironen, J. and Vehtari, A. (2017). “On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior.” In Artificial Intelligence and Statistics, 905–913.
• Polson, N. G. and Scott, J. G. (2010). “Shrink globally, act locally: sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
• Robert, C. P., Chopin, N., and Rousseau, J. (2009). “Harold Jeffreys’s Theory of Probability Revisited.” Statistical Science, 141–172.
• Ročková, V. (2018). “Bayesian estimation of sparse signals with a continuous spike-and-slab prior.” The Annals of Statistics, 46(1): 401–437.
• Ročková, V. and George, E. I. (2014). “EMVS: The EM approach to Bayesian variable selection.” Journal of the American Statistical Association, 109(506): 828–846.
• Ročková, V. and George, E. I. (2018). “The Spike-and-Slab LASSO.” Journal of the American Statistical Association, 113(521): 431–444.
• Ročková, V. and Moran, G. (2017). SSLASSO: The Spike-and-Slab LASSO. https://cran.r-project.org/package=SSLASSO
• Ročková, V. and Moran, G. (2018). EMVS: The Expectation-Maximization Approach to Bayesian Variable Selection. https://cran.r-project.org/package=EMVS
• Städler, N., Bühlmann, P., and van de Geer, S. (2010). “ℓ1-penalization for mixture regression models.” Test, 19(2): 209–256.
• Sun, T. and Zhang, C.-H. (2010). “Comments on: ℓ1-penalization for mixture regression models.” Test, 19(2): 270–275.
• Sun, T. and Zhang, C.-H. (2012). “Scaled sparse linear regression.” Biometrika.
• van der Pas, S., Salomond, J.-B., Schmidt-Hieber, J., et al. (2016). “Conditions for posterior contraction in the sparse normal means problem.” Electronic Journal of Statistics, 10(1): 976–1000.
• Zhang, C.-H. (2010). “Nearly unbiased variable selection under minimax concave penalty.” The Annals of Statistics, 38(2): 894–942.
• Zhang, C.-H. and Zhang, T. (2012). “A general theory of concave regularization for high-dimensional sparse estimation problems.” Statistical Science, 576–593.
• Zou, H. (2006). “The adaptive lasso and its oracle properties.” Journal of the American Statistical Association, 101(476): 1418–1429.