Source: Bayesian Anal. Volume 7, Number 2 (2012), 477-502.
Using a collection of simulated and real benchmark datasets, we compare Bayesian and frequentist regularization approaches in a poorly informative setting in which the number of variables is almost equal to the number of observations. This comparison includes new global noninformative approaches for Bayesian variable selection built on Zellner’s $g$-priors that are similar to those of Liang et al. (2008). We discuss the interest of these calibration-free proposals. The numerical experiments we present highlight the appeal of Bayesian regularization methods compared with non-Bayesian alternatives: they dominate frequentist methods in the sense that they provide smaller prediction errors while selecting the most relevant variables in a parsimonious way.
References
Bartlett, M. (1957). A comment on D.V. Lindley’s statistical paradox. Biometrika, 44:533–534.
Berger, J., Pericchi, L., and Varshavsky, I. (1998). Bayes factors and marginal distributions in invariant situations. Sankhya A, 60:307–321.
Bottolo, L. and Richardson, S. (2010). Evolutionary stochastic search for Bayesian model exploration. Bayesian Analysis, 5(3):583–618.
Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80(391):580–598.
Brown, J. and Vannucci, M. (1998). Multivariate Bayesian variable selection and prediction. Journal of the Royal Statistical Society Series B, 60(3):627–641.
Butler, R. and Wood, A. (2002). Laplace approximations for hypergeometric functions with matrix arguments. Annals of Statistics, 30:1155–1177.
Candes, E. and Tao, T. (2007). The Dantzig Selector: statistical estimation when $p$ is much larger than $n$. Annals of Statistics, 35(6):2313–2351.
Casella, G. and Moreno, E. (2006). Objective Bayesian variable selection. Journal of the American Statistical Association, 101(473):157–167.
Celeux, G., Marin, J.-M., and Robert, C. (2006). Sélection bayésienne de variables en régression linéaire. Journal de la Société Française de Statistique, 147(1):59–79.
Chipman, H. (1996). Bayesian variable selection with related predictors. Canadian Journal of Statistics, 1:17–36.
Cui, W. and George, E. (2008). Empirical Bayes vs. fully Bayes variable selection. Journal of Statistical Planning and Inference, 138:888–900.
de Finetti, B. (1972). Probability, Induction and Statistics. John Wiley, New York.
DeGroot, M. (1973). Doing what comes naturally: Interpreting a tail area as a posterior probability or as a likelihood ratio. Journal of the American Statistical Association, 68:966–969.
Dupuis, J. and Robert, C. (2003). Bayesian variable selection in qualitative models by Kullback-Leibler projections. Journal of Statistical Planning and Inference, 111:77–94.
Fernandez, C., Ley, E., and Steel, M. (2001). Benchmark priors for Bayesian model averaging. Journal of Econometrics, 100:381–427.
Foster, D. and George, E. (1994). The risk inflation criterion for multiple regression. Annals of Statistics, 22:1947–1975.
George, E. (2000). The variable selection problem. Journal of the American Statistical Association, 95:1304–1308.
George, E. and Foster, D. (2000). Calibration and empirical Bayes variable selection. Biometrika, 87(4):731–747.
George, E. and McCulloch, R. (1993). Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88:881–889.
George, E. and McCulloch, R. (1997). Approaches to Bayesian variable selection. Statistica Sinica, 7:339–373.
Hoerl, A. and Kennard, R. (1970). Ridge regression: biased estimation for non orthogonal problems. Technometrics, 12:55–67.
Kass, R. and Raftery, A. (1995). Bayes factor and model uncertainty. Journal of the American Statistical Association, 90:773–795.
Kass, R. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90:928–934.
Kohn, R., Smith, M., and Chan, D. (2001). Nonparametric regression using linear combinations of basis functions. Statistics and Computing, 11:313–322.
Liang, F., Paulo, R., Molina, G., Clyde, M., and Berger, J. (2008). Mixtures of $g$-priors for Bayesian variable selection. Journal of the American Statistical Association, 103(481):410–423.
Lindley, D. (1957). A statistical paradox. Biometrika, 44:187–192.
Marin, J.-M. and Robert, C. (2007). Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer-Verlag, New York.
Mitchell, T. and Beauchamp, J. (1988). Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83:1023–1032.
Nott, D. J. and Green, P. J. (2004). Bayesian variable selection and the Swendsen-Wang algorithm. Journal of Computational and Graphical Statistics, 13:1–17.
Park, T. and Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical Association, 103(473):681–686.
Penrose, K., Nelson, A., and Fisher, A. (1985). Generalized body composition prediction equation for men using simple measurement techniques. Medicine and Science in Sports and Exercise, 17(2):189.
Philips, R. and Guttman, I. (1998). A new criterion for variable selection. Statistics and Probability Letters, 38:11–19.
Rao, C. (1973). Linear Statistical Inference and its Applications. John Wiley, New York.
Robert, C. (1993). A note on the Jeffreys-Lindley paradox. Statistica Sinica, 3:601–608.
Robert, C. (2001). The Bayesian Choice. Springer-Verlag, New York, second edition.
Schneider, U. and Corcoran, J. (2004). Perfect sampling for Bayesian variable selection in a linear regression model. Journal of Statistical Planning and Inference, 126:153–171.
Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. Journal of Econometrics, 75:317–343.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society Series B, 58(1):267–288.
Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society Series B, 68(1):49–67.
Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with $g$-prior distributions. In Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages 233–243. North-Holland / Elsevier.
Zellner, A. and Siow, A. (1980). Posterior odds ratios for selected regression hypotheses. In Bayesian Statistics, pages 585–603. Valencia: University Press. (Proceedings of the first Valencia meeting).
Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association, 101:1418–1429.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B, 67(2):301–320.