Electronic Journal of Statistics

Sparsity information and regularization in the horseshoe and other shrinkage priors

Juho Piironen and Aki Vehtari

Full-text: Open access

Abstract

The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on the prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior has the undesired property that there is no possibility of specifying separately information about sparsity and the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce a concept of effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious based on their tendency to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization to the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization to the largest values. We show that the new prior can be considered as the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real world data illustrate the benefit of both of these theoretical advances.
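The abstract states these ideas without formulas, so the following is only a minimal sketch under an assumed parameterization: each coefficient has prior standard deviation tau * lambda_tilde_j, with lambda_tilde_j^2 = c^2 * lambda_j^2 / (c^2 + tau^2 * lambda_j^2), a half-Cauchy local scale lambda_j, and a global scale guess tau0 = p0 / (D - p0) * sigma / sqrt(n) for p0 expected nonzero coefficients out of D. The function names (`tau0`, `half_cauchy`, `regularized_scale`) and the example numbers are hypothetical and chosen for illustration. The sketch shows that the prior scale stays below the slab width c however large the local scale becomes, while small coefficients are shrunk essentially as under the original horseshoe.

```python
import math
import random


def tau0(p0, D, n, sigma=1.0):
    """Global scale guess implied by expecting p0 nonzero coefficients out of D
    (assumed formula: p0 / (D - p0) * sigma / sqrt(n))."""
    return p0 / (D - p0) * sigma / math.sqrt(n)


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) distribution via the inverse CDF."""
    return scale * math.tan(math.pi * rng.random() / 2.0)


def regularized_scale(lam, tau, c):
    """Prior standard deviation tau * lam_tilde of one coefficient, where
    lam_tilde^2 = c^2 * lam^2 / (c^2 + tau^2 * lam^2) (assumed form)."""
    lam_tilde_sq = (c * lam) ** 2 / (c ** 2 + (tau * lam) ** 2)
    return tau * math.sqrt(lam_tilde_sq)


if __name__ == "__main__":
    rng = random.Random(0)
    tau = tau0(p0=5, D=100, n=100)  # illustrative sparsity guess
    c = 2.0                         # illustrative slab width
    scales = [regularized_scale(half_cauchy(rng), tau, c) for _ in range(10_000)]
    # Every prior scale is bounded by the slab width c.
    print(max(scales) < c)
```

With c taken to infinity, `regularized_scale` reduces to tau * lam, which corresponds to the infinitely wide slab that the abstract attributes to the original horseshoe.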

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 5018-5051.

Dates
Received: June 2017
First available in Project Euclid: 15 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1513306866

Digital Object Identifier
doi:10.1214/17-EJS1337SI

Subjects
Primary: 62F15: Bayesian inference

Keywords
Bayesian inference; sparse estimation; shrinkage priors; horseshoe prior

Rights
Creative Commons Attribution 4.0 International License.

Citation

Piironen, Juho; Vehtari, Aki. Sparsity information and regularization in the horseshoe and other shrinkage priors. Electron. J. Statist. 11 (2017), no. 2, 5018–5051. doi:10.1214/17-EJS1337SI. https://projecteuclid.org/euclid.ejs/1513306866


