Electronic Journal of Statistics

MAP model selection in Gaussian regression

Felix Abramovich and Vadim Grinshtein

Abstract

We consider a Bayesian approach to model selection in Gaussian linear regression, where the number of predictors may be much larger than the number of observations. From a frequentist viewpoint, the proposed procedure amounts to penalized least squares estimation with a complexity penalty associated with a prior on the model size. We investigate the optimality properties of the resulting model selector. We establish an oracle inequality and specify conditions on the prior that imply its asymptotic minimaxity within a wide range of sparse and dense settings for “nearly-orthogonal” and “multicollinear” designs.
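To fix ideas, the frequentist reformulation described above can be written schematically as a penalized least squares criterion. The notation below (response y, design submatrix X_M for a candidate model M, and prior π(k) on the model size k) is ours, and the exact form of the penalty is derived in the paper:

\[
\hat{M} \;=\; \arg\min_{M}\Big\{\,\|y - X_M\hat{\beta}_M\|^2 \;+\; \mathrm{Pen}(|M|)\,\Big\},
\qquad
\hat{\beta}_M \;=\; (X_M^{\top}X_M)^{-1}X_M^{\top}y,
\]

where \(\mathrm{Pen}(k)\) is a complexity penalty increasing in the model size \(k\) and determined by the prior \(\pi(k)\).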

Article information

Source
Electron. J. Statist., Volume 4 (2010), 932-949.

Dates
First available in Project Euclid: 24 September 2010

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1285333752

Digital Object Identifier
doi:10.1214/10-EJS573

Mathematical Reviews number (MathSciNet)
MR2721039

Zentralblatt MATH identifier
1329.62051

Subjects
Primary: 62C99: None of the above, but in this section
Secondary: 62C10, 62C20, 62G05

Keywords
Adaptivity; complexity penalty; Gaussian linear regression; maximum a posteriori rule; minimax estimation; model selection; oracle inequality; sparsity

Citation

Abramovich, Felix; Grinshtein, Vadim. MAP model selection in Gaussian regression. Electron. J. Statist. 4 (2010), 932--949. doi:10.1214/10-EJS573. https://projecteuclid.org/euclid.ejs/1285333752

