Electronic Journal of Statistics

Adaptive posterior contraction rates for the horseshoe

Stéphanie van der Pas, Botond Szabó, and Aad van der Vaart

Full-text: Open access

Abstract

We investigate the frequentist properties of Bayesian procedures for estimation based on the horseshoe prior in the sparse multivariate normal means model. Previous theoretical results assumed that the sparsity level, that is, the number of signals, was known. We drop this assumption and characterize the behavior of the maximum marginal likelihood estimator (MMLE) of a key parameter of the horseshoe prior. We prove that the MMLE is an effective estimator of the sparsity level, in the sense that it leads to (near) minimax optimal estimation of the underlying mean vector generating the data. Besides this empirical Bayes procedure, we also consider the hierarchical Bayes method of putting a prior on the unknown sparsity level. We show that both Bayesian techniques lead to rate-adaptive optimal posterior contraction, which implies that the horseshoe posterior is a good candidate for generating rate-adaptive credible sets.
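As a rough illustration of the setting, the sketch below implements the standard horseshoe model for normal means (observations y_i = θ_i + N(0,1) noise, θ_i | λ_i, τ ~ N(0, τ²λ_i²), λ_i half-Cauchy) and a grid-based maximum marginal likelihood estimate of the global parameter τ. This is a minimal numerical sketch written for this page, not the authors' code; all function names are illustrative.

```python
import numpy as np
from scipy import integrate

def marginal_density(y, tau):
    """Marginal density m_tau(y) of one observation, integrating out the
    local scale lam ~ half-Cauchy(0, 1)."""
    def integrand(lam):
        var = 1.0 + (tau * lam) ** 2          # Var(y | lam, tau)
        normal = np.exp(-y ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        half_cauchy = 2.0 / (np.pi * (1.0 + lam ** 2))
        return normal * half_cauchy
    value, _ = integrate.quad(integrand, 0.0, np.inf)
    return value

def mmle_tau(y, grid):
    """MMLE of tau: maximize the sum of log marginal densities over a grid."""
    log_lik = [sum(np.log(marginal_density(yi, tau)) for yi in y)
               for tau in grid]
    return grid[int(np.argmax(log_lik))]

def posterior_mean(y, tau):
    """E[theta | y, tau] = y * E[tau^2 lam^2 / (1 + tau^2 lam^2) | y, tau],
    computed by numerical integration over lam."""
    def integrand(lam):
        var = 1.0 + (tau * lam) ** 2
        normal = np.exp(-y ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
        half_cauchy = 2.0 / (np.pi * (1.0 + lam ** 2))
        shrinkage = (tau * lam) ** 2 / var    # 1 - posterior shrinkage factor
        return shrinkage * normal * half_cauchy
    numerator, _ = integrate.quad(integrand, 0.0, np.inf)
    return y * numerator / marginal_density(y, tau)
```

Plugging `mmle_tau` into `posterior_mean` gives an empirical Bayes procedure of the kind studied in the paper: small observations are shrunk heavily toward zero, while large observations are left nearly untouched thanks to the heavy tails of the horseshoe prior.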

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 3196-3225.

Dates
Received: February 2017
First available in Project Euclid: 22 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1506067214

Digital Object Identifier
doi:10.1214/17-EJS1316

Zentralblatt MATH identifier
1373.62140

Subjects
Primary: 62G15: Tolerance and confidence regions
Secondary: 62F15: Bayesian inference

Keywords
Horseshoe; sparsity; nearly black vectors; normal means problem; adaptive inference; frequentist Bayes

Rights
Creative Commons Attribution 4.0 International License.

Citation

van der Pas, Stéphanie; Szabó, Botond; van der Vaart, Aad. Adaptive posterior contraction rates for the horseshoe. Electron. J. Statist. 11 (2017), no. 2, 3196--3225. doi:10.1214/17-EJS1316. https://projecteuclid.org/euclid.ejs/1506067214



References

  • [1] Armagan, A., Dunson, D. B., and Lee, J. Generalized double Pareto shrinkage. Statistica Sinica 23 (2013), 119–143.
  • [2] Bhadra, A., Datta, J., Polson, N. G., and Willard, B. The horseshoe+ estimator of ultra-sparse signals. To appear in Bayesian Analysis, 2015.
  • [3] Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. Dirichlet–Laplace priors for optimal shrinkage. Journal of the American Statistical Association 110, 512 (2015), 1479–1490. PMID: 27019543.
  • [4] Bickel, P. J., Ritov, Y., and Tsybakov, A. B. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics 37, 4 (2009), 1705–1732.
  • [5] Bogdan, M., Chakrabarti, A., Frommlet, F., and Ghosh, J. K. Asymptotic Bayes-optimality under sparsity of some multiple testing procedures. Ann. Statist. 39, 3 (2011), 1551–1579.
  • [6] Caron, F., and Doucet, A. Sparse Bayesian nonparametric regression. In Proceedings of the 25th International Conference on Machine Learning (New York, NY, USA, 2008), ICML '08, ACM, pp. 88–95.
  • [7] Carvalho, C. M., Polson, N. G., and Scott, J. G. Handling sparsity via the horseshoe. Journal of Machine Learning Research, W&CP 5 (2009), 73–80.
  • [8] Carvalho, C. M., Polson, N. G., and Scott, J. G. The horseshoe estimator for sparse signals. Biometrika 97, 2 (2010), 465–480.
  • [9] Castillo, I., Schmidt-Hieber, J., and van der Vaart, A. Bayesian linear regression with sparse priors. Ann. Statist. 43, 5 (2015), 1986–2018.
  • [10] Castillo, I., and van der Vaart, A. W. Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40, 4 (2012), 2069–2101.
  • [11] Datta, J., and Ghosh, J. K. Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Analysis 8, 1 (2013), 111–132.
  • [12] Donoho, D. L., Johnstone, I. M., Hoch, J. C., and Stern, A. S. Maximum entropy and the nearly black object (with discussion). Journal of the Royal Statistical Society. Series B (Methodological) 54, 1 (1992), 41–81.
  • [13] Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. Convergence rates of posterior distributions. The Annals of Statistics 28, 2 (2000), 500–531.
  • [14] Ghosal, S., Lember, J., and van der Vaart, A. Nonparametric Bayesian model selection and averaging. Electron. J. Stat. 2 (2008), 63–89.
  • [15] Ghosh, P., and Chakrabarti, A. Posterior concentration properties of a general class of shrinkage estimators around nearly black vectors. arXiv:1412.8161v2, 2015.
  • [16] Gramacy, R. B. monomvn: Estimation for multivariate normal and Student-t data with monotone missingness, 2014. R package version 1.9-5.
  • [17] Griffin, J. E., and Brown, P. J. Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5, 1 (2010), 171–188.
  • [18] Hahn, R. P., He, J., and Lopes, H. fastHorseshoe: The Elliptical Slice Sampler for Bayesian Horseshoe Regression, 2016. R package version 0.1.0.
  • [19] Jiang, W., and Zhang, C.-H. General maximum likelihood empirical Bayes estimation of normal means. Ann. Statist. 37, 4 (2009), 1647–1684.
  • [20] Johnson, V. E., and Rossell, D. On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Stat. Soc. Ser. B Stat. Methodol. 72, 2 (2010), 143–170.
  • [21] Johnstone, I. M., and Silverman, B. W. Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32, 4 (2004), 1594–1649.
  • [22] Makalic, E., and Schmidt, D. F. A simple sampler for the horseshoe estimator. IEEE Signal Processing Letters 23, 1 (2016), 179–182.
  • [23] Piironen, J., and Vehtari, A. On the hyperprior choice for the global shrinkage parameter in the horseshoe prior. ArXiv e-prints (Oct. 2016).
  • [24] Polson, N. G., and Scott, J. G. Shrink globally, act locally: Sparse Bayesian regularization and prediction. In Bayesian Statistics 9, J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, Eds. Oxford University Press, 2010.
  • [25] Polson, N. G., and Scott, J. G. Good, great or lucky? Screening for firms with sustained superior performance using heavy-tailed priors. The Annals of Applied Statistics 6, 1 (2012), 161–185.
  • [26] Polson, N. G., and Scott, J. G. On the half-Cauchy prior for a global scale parameter. Bayesian Analysis 7, 4 (2012), 887–902.
  • [27] Ročková, V. Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Submitted manuscript, available at http://stat.wharton.upenn.edu/~vrockova/rockova2015.pdf, 2015.
  • [28] Rousseau, J., and Szabó, B. Asymptotic behaviour of the empirical Bayes posteriors associated to maximum marginal likelihood estimator. Ann. Statist. 45, 2 (2017), 833–865.
  • [29] Scott, J. G. Parameter expansion in local-shrinkage models. arXiv:1010.5265, 2010.
  • [30] Scott, J. G. Bayesian estimation of intensity surfaces on the sphere via needlet shrinkage and selection. Bayesian Analysis 6, 2 (2011), 307–328.
  • [31] Szabó, B. T., van der Vaart, A. W., and van Zanten, J. Empirical Bayes scaling of Gaussian priors in the white noise model. Electron. J. Statist. 7 (2013), 991–1018.
  • [32] Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58, 1 (1996), 267–288.
  • [33] van der Pas, S., Scott, J., Chakraborty, A., and Bhattacharya, A. horseshoe: Implementation of the Horseshoe Prior, 2016. R package version 0.1.0.
  • [34] van der Pas, S., Szabó, B., and van der Vaart, A. Uncertainty quantification for the horseshoe. ArXiv e-prints (July 2016).
  • [35] van der Pas, S. L., Kleijn, B. J. K., and van der Vaart, A. W. The horseshoe estimator: Posterior concentration around nearly black vectors. Electron. J. Statist. 8, 2 (2014), 2585–2618.
  • [36] van der Vaart, A., and van Zanten, H. Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist. 37, 5B (2009), 2655–2675.
  • [37] van der Vaart, A. W., and Wellner, J. A. Weak Convergence and Empirical Processes. Springer Series in Statistics. Springer-Verlag, New York, 1996. With applications to statistics.