The Annals of Statistics

Bayesian estimation of sparse signals with a continuous spike-and-slab prior

Veronika Ročková

Abstract

We introduce a new framework for estimation of sparse normal means, bridging the gap between popular frequentist strategies (LASSO) and popular Bayesian strategies (spike-and-slab). The main thrust of this paper is to introduce the family of Spike-and-Slab LASSO (SS-LASSO) priors, which form a continuum between the Laplace prior and the point-mass spike-and-slab prior. We establish several appealing frequentist properties of SS-LASSO priors, contrasting them with these two limiting cases. First, we adopt the penalized-likelihood perspective on Bayesian modal estimation and introduce the framework of Bayesian penalty mixing with spike-and-slab priors. We show that the SS-LASSO global posterior mode is (near) minimax rate-optimal under squared error loss, just as the LASSO is. Going further, we introduce an adaptive two-step estimator that achieves provably sharper performance than the LASSO. Second, we show that the whole posterior keeps pace with the global mode and concentrates at the (near) minimax rate, a property known not to hold for the single Laplace prior. Minimax-rate optimality is obtained with a suitable class of independent product priors (for known levels of sparsity) as well as with dependent mixing priors (adapting to unknown levels of sparsity). Until now, rate-optimal posterior concentration had been established only for spike-and-slab priors with a point mass at zero. Thus, the SS-LASSO priors, despite being continuous, possess optimality properties similar to those of the “theoretically ideal” point-mass mixtures. These results provide valuable theoretical justification for the proposed class of priors, underpinning their intuitive appeal and practical potential.
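
To make the abstract's setup concrete, here is a minimal LaTeX sketch of the prior family it describes, in the standard normal-means notation (observations y_i = beta_i + noise); the symbols lambda_0, lambda_1 and theta below are illustrative notation chosen for this sketch, not quoted from the paper. Each coordinate receives a mixture of two Laplace densities, a sharply peaked spike (large lambda_0) and a diffuse slab (small lambda_1):

% A sketch of the SS-LASSO prior, assuming standard notation (not quoted verbatim).
% psi(. | lambda) is the Laplace density; lambda_0 >> lambda_1 separates spike from slab.
\[
  \pi(\beta_i \mid \theta)
    = \theta\, \psi(\beta_i \mid \lambda_1)
    + (1 - \theta)\, \psi(\beta_i \mid \lambda_0),
  \qquad
  \psi(\beta \mid \lambda) = \frac{\lambda}{2}\, e^{-\lambda |\beta|}.
\]
% The two limiting cases named in the abstract:
%   lambda_0 = lambda_1      -> a single Laplace prior (the LASSO penalty);
%   lambda_0 -> infinity     -> the point-mass spike-and-slab prior.
% The global posterior mode is then a penalized-likelihood estimator,
\[
  \widehat{\beta}
    = \operatorname*{arg\,max}_{\beta \in \mathbb{R}^n}
      \Bigl\{ -\tfrac{1}{2} \lVert y - \beta \rVert_2^2
        + \sum_{i=1}^{n} \log \pi(\beta_i \mid \theta) \Bigr\},
\]
% so the negative log-prior acts as a nonconvex penalty interpolating
% between the LASSO and the point-mass spike-and-slab mixtures.

Under this reading, fixing theta corresponds to the independent product priors mentioned in the abstract, while placing a prior on theta yields the dependent mixing priors that adapt to unknown levels of sparsity.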

Article information

Source
Ann. Statist., Volume 46, Number 1 (2018), 401-437.

Dates
Received: May 2015
Revised: February 2017
First available in Project Euclid: 22 February 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1519268435

Digital Object Identifier
doi:10.1214/17-AOS1554

Mathematical Reviews number (MathSciNet)
MR3766957

Zentralblatt MATH identifier
06865116

Subjects
Primary: 62J99: None of the above, but in this section
Secondary: 62F15: Bayesian inference

Keywords
Asymptotic minimaxity; LASSO; posterior concentration; spike-and-slab

Citation

Ročková, Veronika. Bayesian estimation of sparse signals with a continuous spike-and-slab prior. Ann. Statist. 46 (2018), no. 1, 401–437. doi:10.1214/17-AOS1554. https://projecteuclid.org/euclid.aos/1519268435


References

  • [1] Abramovich, F., Grinshtein, V. and Pensky, M. (2007). On optimality of Bayesian testimation in the normal means problem. Ann. Statist. 35 2261–2286.
  • [2] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939–967.
  • [3] Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2015). Dirichlet–Laplace priors for optimal shrinkage. J. Amer. Statist. Assoc. 110 1479–1490.
  • [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [5] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
  • [6] Carlin, B. P. and Chib, S. (1995). Bayesian model choice via Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 57 473–484.
  • [7] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • [8] Castillo, I. (2014). Bayesian nonparametrics, convergence and limiting shape of posterior distributions. Univ. Paris Diderot Paris 7.
  • [9] Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. W. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43 1986–2018.
  • [10] Castillo, I. and van der Vaart, A. W. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
  • [11] Chipman, H., George, E. I. and McCulloch, R. E. (2001). The practical implementation of Bayesian model selection. In Model Selection. Institute of Mathematical Statistics Lecture Notes—Monograph Series 38 65–134. IMS, Beachwood, OH.
  • [12] Clyde, M., DeSimone, H. and Parmigiani, G. (1996). Prediction via orthogonalized model mixing. J. Amer. Statist. Assoc. 91 1197–1208.
  • [13] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.
  • [14] Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 41–81.
  • [15] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [16] George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731–747.
  • [17] George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
  • [18] Ishwaran, H. and Rao, J. S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Statist. Assoc. 98 438–455.
  • [19] Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730–773.
  • [20] Ishwaran, H. and Rao, J. S. (2011). Consistency of spike and slab regression. Statist. Probab. Lett. 81 1920–1928.
  • [21] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32 1594–1649.
  • [22] Johnstone, I. M. and Silverman, B. W. (2005). Empirical Bayes selection of wavelet thresholds. Ann. Statist. 33 1700–1752.
  • [23] Lempers, F. B. (1971). Posterior Probabilities of Alternative Linear Models: Some Theoretical Considerations and Empirical Experiments. Rotterdam Univ. Press, Rotterdam.
  • [24] Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 3498–3528.
  • [25] Martin, R. and Walker, S. G. (2014). Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector. Electron. J. Stat. 8 2188–2206.
  • [26] Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83 1023–1032.
  • [27] Narisetty, N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 789–817.
  • [28] Pati, D., Bhattacharya, A., Pillai, N. and Dunson, D. (2014). Posterior contraction in sparse Bayesian factor models for massive covariance matrices. Ann. Statist. 42 1102–1130.
  • [29] Ročková, V. (2018). Supplement to “Bayesian estimation of sparse signals with a continuous spike-and-slab prior.” DOI:10.1214/17-AOS1554SUPP.
  • [30] Ročková, V. and George, E. I. (2014). EMVS: The EM approach to Bayesian variable selection. J. Amer. Statist. Assoc. 109 828–846.
  • [31] Ročková, V. and George, E. I. (2016). Fast Bayesian factor analysis via automatic rotations to sparsity. J. Amer. Statist. Assoc. 111 1608–1622.
  • [32] Ročková, V. and George, E. I. (2017). The Spike-and-Slab LASSO. J. Amer. Statist. Assoc. To appear.
  • [33] Su, W. and Candès, E. (2016). SLOPE is adaptive to unknown sparsity and asymptotically minimax. Ann. Statist. 44 1038–1068.
  • [34] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
  • [35] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused LASSO. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 91–108.
  • [36] van der Pas, S. L., Kleijn, B. J. K. and van der Vaart, A. W. (2014). The horseshoe estimator: Posterior concentration around nearly black vectors. Electron. J. Stat. 8 2585–2618.
  • [37] Zhang, C. H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional regression. Ann. Statist. 36 1567–1594.
  • [38] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • [39] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.

Supplemental materials

  • Supplement to “Bayesian estimation of sparse signals with a continuous spike-and-slab prior”. The supplement contains the proofs for Section 4.