Source: Ann. Statist. Volume 38, Number 5 (2010), 2587–2619.
This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
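As a rough illustration of what "automatic" multiplicity correction can look like, the sketch below compares two priors over models with m candidate variables. The abstract does not state which priors the paper uses, so the setup is an assumption: each variable is included independently with probability p, and p is either fixed at 1/2 or given a uniform prior on [0, 1] and integrated out. The function names are hypothetical. Under the uniform prior, a specific model with k variables has prior probability 1 / ((m + 1) C(m, k)), which follows from the Beta integral of p^k (1 - p)^(m - k).

```python
from math import comb


def model_prior_fixed_p(m: int, k: int, p: float = 0.5) -> float:
    """Prior probability of one specific model with k of m variables,
    when each variable is included independently with a fixed p.
    (Assumed setup for illustration; not taken from the abstract.)"""
    return p ** k * (1 - p) ** (m - k)


def model_prior_uniform_p(m: int, k: int) -> float:
    """Prior probability of one specific model with k of m variables,
    when p has a uniform prior on [0, 1] and is integrated out:
    integral of p^k (1-p)^(m-k) dp = 1 / ((m + 1) * C(m, k))."""
    return 1.0 / ((m + 1) * comb(m, k))


def prior_odds_of_adding_variable(prior, m: int, k: int) -> float:
    """Prior odds of a specific model with k + 1 variables against a
    specific model with k variables, under the given model prior."""
    return prior(m, k + 1) / prior(m, k)


if __name__ == "__main__":
    k = 1
    for m in (10, 100, 1000):
        fixed = prior_odds_of_adding_variable(model_prior_fixed_p, m, k)
        uniform = prior_odds_of_adding_variable(model_prior_uniform_p, m, k)
        print(f"m={m:5d}  fixed p=1/2 odds={fixed:.4f}  uniform p odds={uniform:.6f}")
```

With p fixed at 1/2, the prior odds of adding a variable stay at 1 however many candidates are screened. With p integrated out, the odds equal (k + 1) / (m - k), so the prior penalty for adding a variable grows as m grows. That growing penalty is one way to read the multiplicity correction the abstract refers to, under the assumptions stated above.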