Empirical Bayes posterior concentration in sparse high-dimensional linear models

Ryan Martin, Raymond Mess, and Stephen G. Walker

Abstract

We propose a new empirical Bayes approach for inference in the $p\gg n$ normal linear model. The novelty is the use of data in the prior in two ways, for centering and regularization. Under suitable sparsity assumptions, we establish a variety of concentration rate results for the empirical Bayes posterior distribution, relevant for both estimation and model selection. Computation is straightforward and fast, and simulation results demonstrate the strong finite-sample performance of the empirical Bayes model selection procedure.
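
A minimal sketch of the construction the abstract describes, with notation and the specific prior forms assumed here for illustration rather than taken from the paper: the empirical Bayes posterior tempers the Gaussian likelihood by a fraction $\alpha\in(0,1)$ and pairs it with a prior centered at a data-driven estimate,
\[
\pi^{n}(d\beta)\ \propto\ L_{n}(\beta)^{\alpha}\,\Pi(d\beta),
\qquad
L_{n}(\beta)=\mathrm{N}_{n}\bigl(y \mid X\beta,\ \sigma^{2}I_{n}\bigr),
\]
where, writing $\beta=(S,\beta_{S})$ for a model $S\subseteq\{1,\dots,p\}$ of size $s=|S|$, one may take
\[
\beta_{S}\mid S \sim \mathrm{N}_{s}\bigl(\hat\beta_{S},\ \sigma^{2}\gamma^{-1}(X_{S}^{\top}X_{S})^{-1}\bigr),
\qquad
\pi(S)\ \propto\ \binom{p}{s}^{-1} f_{n}(s),
\]
with $\hat\beta_{S}$ the least-squares estimate under $S$ (the data-driven centering), the $(X_{S}^{\top}X_{S})^{-1}$ covariance supplying the regularization, and $f_{n}$ a complexity prior penalizing large models. Here $\gamma>0$ and the exact form of $f_{n}$ are illustrative choices, not the paper's; see the article for the precise construction and the conditions under which the stated concentration rates hold.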

Article information

Source
Bernoulli, Volume 23, Number 3 (2017), 1822-1847.

Dates
Received: September 2015
Revised: December 2015
First available in Project Euclid: 17 March 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1489737626

Digital Object Identifier
doi:10.3150/15-BEJ797

Mathematical Reviews number (MathSciNet)
MR3624879

Zentralblatt MATH identifier
06714320

Keywords
data-dependent prior; fractional likelihood; minimax; regression; variable selection

Citation

Martin, Ryan; Mess, Raymond; Walker, Stephen G. Empirical Bayes posterior concentration in sparse high-dimensional linear models. Bernoulli 23 (2017), no. 3, 1822–1847. doi:10.3150/15-BEJ797. https://projecteuclid.org/euclid.bj/1489737626


References

  • [1] Abramovich, F. and Grinshtein, V. (2010). MAP model selection in Gaussian regression. Electron. J. Stat. 4 932–949.
  • [2] Arias-Castro, E. and Lounici, K. (2014). Estimation and variable selection with exponential weights. Electron. J. Stat. 8 328–354.
  • [3] Barbieri, M.M. and Berger, J.O. (2004). Optimal predictive model selection. Ann. Statist. 32 870–897.
  • [4] Barron, A.R. and Cover, T.M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 1034–1054.
  • [5] Bondell, H.D. and Reich, B.J. (2012). Consistent high-dimensional Bayesian variable selection via penalized credible regions. J. Amer. Statist. Assoc. 107 1610–1624.
  • [6] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer Series in Statistics. Heidelberg: Springer.
  • [7] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • [8] Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43 1986–2018.
  • [9] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
  • [10] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 759–771.
  • [11] Clyde, M. and George, E.I. (2004). Model uncertainty. Statist. Sci. 19 81–94.
  • [12] Dalalyan, A.S. and Tsybakov, A.B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds, and sparsity. Mach. Learn. 72 39–61.
  • [13] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [14] Fan, J. and Lv, J. (2010). A selective overview of variable selection in high dimensional feature space. Statist. Sinica 20 101–148.
  • [15] Gao, C., van der Vaart, A.W. and Zhou, H.H. (2015). A general framework for Bayes structured linear models. Unpublished manuscript. Available at arXiv:1506.02174.
  • [16] George, E.I. and McCulloch, R.E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
  • [17] Ghosal, S., Ghosh, J.K. and van der Vaart, A.W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • [18] Grünwald, P. and van Ommen, T. (2014). Inconsistency of Bayesian inference for misspecified linear models, and a proposal for repairing it. Unpublished manuscript. Available at arXiv:1412.3730.
  • [19] Heaton, M.J. and Scott, J.G. (2010). Bayesian computation and the linear model. In Frontiers of Statistical Decision Making and Bayesian Analysis (M.-H. Chen, D. Dey, P. Müller, D. Sun and K. Ye, eds.) 527–545. New York: Springer.
  • [20] Ishwaran, H. and Rao, J.S. (2005). Spike and slab gene selection for multigroup microarray data. J. Amer. Statist. Assoc. 100 764–780.
  • [21] Ishwaran, H. and Rao, J.S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730–773.
  • [22] James, G.M. and Radchenko, P. (2009). A generalized Dantzig selector with shrinkage tuning. Biometrika 96 323–337.
  • [23] James, G.M., Radchenko, P. and Lv, J. (2009). DASSO: Connections between the Dantzig selector and lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 127–142.
  • [24] Jiang, W. (2007). Bayesian variable selection for high dimensional generalized linear models: Convergence rates of the fitted densities. Ann. Statist. 35 1487–1511.
  • [25] Jiang, W. and Tanner, M.A. (2008). Gibbs posterior for variable selection in high-dimensional classification and data mining. Ann. Statist. 36 2207–2231.
  • [26] Johnson, V.E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings. J. Amer. Statist. Assoc. 107 649–660.
  • [27] Martin, R. and Walker, S.G. (2014). Asymptotically minimax empirical Bayes estimation of a sparse normal mean vector. Electron. J. Stat. 8 2188–2206.
  • [28] Narisetty, N.N. and He, X. (2014). Bayesian variable selection with shrinking and diffusing priors. Ann. Statist. 42 789–817.
  • [29] Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681–686.
  • [30] Reid, S., Tibshirani, R. and Friedman, J. (2014). A study of error variance estimation in lasso regression. Unpublished manuscript. Available at arXiv:1311.5274.
  • [31] Rigollet, P. and Tsybakov, A. (2011). Exponential screening and optimal rates of sparse estimation. Ann. Statist. 39 731–771.
  • [32] Rigollet, P. and Tsybakov, A.B. (2012). Sparse estimation by exponential weighting. Statist. Sci. 27 558–575.
  • [33] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist. 29 687–714.
  • [34] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
  • [35] Walker, S. and Hjort, N.L. (2001). On Bayesian consistency. J. R. Stat. Soc. Ser. B. Stat. Methodol. 63 811–821.
  • [36] Walker, S.G., Lijoi, A. and Prünster, I. (2007). On rates of convergence for posterior distributions in infinite-dimensional models. Ann. Statist. 35 738–746.
  • [37] Wellcome Trust Case Control Consortium (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3000 shared controls. Nature 447 661–678.
  • [38] Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with $g$-prior distributions. In Bayesian Inference and Decision Techniques. Stud. Bayesian Econometrics Statist. 6 233–243. North-Holland, Amsterdam.
  • [39] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • [40] Zhang, T. (2006). From $\varepsilon $-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.
  • [41] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • [42] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.