Electronic Journal of Statistics

Adaptive posterior contraction rates for the horseshoe

Stéphanie van der Pas, Botond Szabó, and Aad van der Vaart

Full-text: Open access

Abstract

We investigate the frequentist properties of Bayesian procedures for estimation based on the horseshoe prior in the sparse multivariate normal means model. Previous theoretical results assumed that the sparsity level, that is, the number of signals, was known. We drop this assumption and characterize the behavior of the maximum marginal likelihood estimator (MMLE) of a key parameter of the horseshoe prior. We prove that the MMLE is an effective estimator of the sparsity level, in the sense that it leads to (near) minimax optimal estimation of the underlying mean vector generating the data. Besides this empirical Bayes procedure, we also consider the hierarchical Bayes method of putting a prior on the unknown sparsity level. We show that both Bayesian techniques lead to rate-adaptive optimal posterior contraction, which implies that the horseshoe posterior is a good candidate for generating rate-adaptive credible sets.
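As a rough illustration of the empirical Bayes procedure described in the abstract, the following Python sketch computes an MMLE of the global shrinkage parameter τ by numerically maximizing the horseshoe marginal likelihood in the normal means model. It is not the authors' code: it assumes the standard horseshoe parametrization (θ_i | λ_i, τ ~ N(0, λ_i²τ²) with λ_i half-Cauchy), restricts τ to the interval [1/n, 1] as is common in the horseshoe literature, and replaces any analytic computations with generic quadrature; all function names are illustrative.

```python
import numpy as np
from scipy import integrate, optimize, stats

# Horseshoe prior: theta_i | lambda_i, tau ~ N(0, lambda_i^2 tau^2),
# with lambda_i ~ half-Cauchy C+(0, 1).

def marginal_density(y, tau):
    """Marginal density m_tau(y) of one observation y ~ N(theta, 1),
    integrating out both theta and the local scale lambda."""
    def integrand(lam):
        var = 1.0 + (tau * lam) ** 2               # total variance of y given lambda
        half_cauchy = (2.0 / np.pi) / (1.0 + lam ** 2)
        return stats.norm.pdf(y, scale=np.sqrt(var)) * half_cauchy
    val, _ = integrate.quad(integrand, 0.0, np.inf)
    return val

def neg_log_marginal_likelihood(tau, ys):
    return -np.sum([np.log(marginal_density(y, tau)) for y in ys])

def mmle_tau(ys):
    """Maximize the marginal likelihood over tau in [1/n, 1]."""
    n = len(ys)
    res = optimize.minimize_scalar(
        neg_log_marginal_likelihood, bounds=(1.0 / n, 1.0),
        args=(ys,), method="bounded")
    return res.x

# Sparse normal means data: 5 signals of size 5 among n = 100 means.
rng = np.random.default_rng(0)
n, p_n = 100, 5
theta = np.zeros(n)
theta[:p_n] = 5.0
y = theta + rng.standard_normal(n)

tau_hat = mmle_tau(y)
```

For sparse data such as this, the estimated τ should come out small, reflecting heavy global shrinkage toward zero with the few large observations escaping via the heavy-tailed local scales.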

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 3196-3225.

Dates
Received: February 2017
First available in Project Euclid: 22 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1506067214

Digital Object Identifier
doi:10.1214/17-EJS1316

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62F15: Bayesian inference

Keywords
Horseshoe; sparsity; nearly black vectors; normal means problem; adaptive inference; frequentist Bayes

Rights
Creative Commons Attribution 4.0 International License.

Citation

van der Pas, Stéphanie; Szabó, Botond; van der Vaart, Aad. Adaptive posterior contraction rates for the horseshoe. Electron. J. Statist. 11 (2017), no. 2, 3196--3225. doi:10.1214/17-EJS1316. https://projecteuclid.org/euclid.ejs/1506067214

