Electronic Journal of Statistics

Sparsity information and regularization in the horseshoe and other shrinkage priors

Juho Piironen and Aki Vehtari

Full-text: Open access


The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but it has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior offers no way to specify information about sparsity separately from the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce the concept of the effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious because they tend to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization of the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization for the largest values. We show that the new prior can be considered the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real-world data illustrate the benefit of both of these theoretical advances.
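The abstract gives no formulas, so the following is only a minimal sketch of the quantities it names, under the commonly cited formulation of this paper: a shrinkage factor kappa_j = 1 / (1 + n sigma^-2 tau^2 lambda_j^2) (which assumes a Gaussian linear model with uncorrelated, standardized predictors), an effective number of nonzero parameters m_eff = sum_j (1 - kappa_j), a global scale guess tau0 = p0 / (D - p0) * sigma / sqrt(n), and a regularized local scale lambda_tilde_j^2 = c^2 lambda_j^2 / (c^2 + tau^2 lambda_j^2). The function names, and the use of Python's standard library rather than a probabilistic programming language, are illustrative choices and not taken from the source.

```python
import math
import random


def tau0_from_sparsity(p0, D, sigma, n):
    """Global scale guess tau0 = p0 / (D - p0) * sigma / sqrt(n).

    p0 is the prior guess for the number of nonzero coefficients out of D.
    """
    if not 0 < p0 < D:
        raise ValueError("p0 must satisfy 0 < p0 < D")
    return p0 / (D - p0) * sigma / math.sqrt(n)


def shrinkage_factor(lam, tau, sigma, n):
    """kappa = 1 / (1 + n * tau^2 * lam^2 / sigma^2); 1 means fully shrunk."""
    return 1.0 / (1.0 + n * tau**2 * lam**2 / sigma**2)


def effective_nonzero(lams, tau, sigma, n):
    """m_eff = sum over coefficients of (1 - kappa_j)."""
    return sum(1.0 - shrinkage_factor(lam, tau, sigma, n) for lam in lams)


def regularized_scale(lam, tau, c):
    """Prior sd tau * lam_tilde of one coefficient under the regularized
    horseshoe, with lam_tilde^2 = c^2 lam^2 / (c^2 + tau^2 lam^2).

    For small tau * lam this is close to tau * lam (plain horseshoe);
    for large tau * lam it approaches the slab width c.
    """
    lam_tilde = math.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))
    return tau * lam_tilde


def half_cauchy(rng):
    """Draw from a standard half-Cauchy via the inverse-CDF (tangent) trick."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


if __name__ == "__main__":
    rng = random.Random(1)
    D, n, sigma, p0, c = 100, 100, 1.0, 5, 2.0  # hypothetical settings
    tau0 = tau0_from_sparsity(p0, D, sigma, n)
    lams = [half_cauchy(rng) for _ in range(D)]
    print("tau0:", tau0)
    print("m_eff for one prior draw:", effective_nonzero(lams, tau0, sigma, n))
    print("largest prior sd:", max(regularized_scale(l, tau0, c) for l in lams))
```

Under these assumptions, averaging m_eff over many half-Cauchy draws of the local scales at tau = tau0 gives roughly p0, which is the sense in which the global scale encodes the prior guess about sparsity. The prior standard deviation returned by regularized_scale never exceeds c, mirroring the finite slab width mentioned in the abstract.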

Article information

Electron. J. Statist. Volume 11, Number 2 (2017), 5018-5051.

Received: June 2017
First available in Project Euclid: 15 December 2017

Primary: 62F15: Bayesian inference

Bayesian inference; sparse estimation; shrinkage priors; horseshoe prior

Creative Commons Attribution 4.0 International License.


Piironen, Juho; Vehtari, Aki. Sparsity information and regularization in the horseshoe and other shrinkage priors. Electron. J. Statist. 11 (2017), no. 2, 5018--5051. doi:10.1214/17-EJS1337SI. https://projecteuclid.org/euclid.ejs/1513306866

