The Annals of Statistics

Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

James G. Scott and James O. Berger
Source: Ann. Statist. Volume 38, Number 5 (2010), 2587-2619.

Abstract

This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
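As a rough illustration of what "automatic" multiplicity correction can mean, the sketch below compares two priors over models. It assumes a setup the abstract does not spell out: each of m candidate variables enters the model independently with a common inclusion probability p, which is either fixed at 1/2 or given a Uniform(0, 1) prior and integrated out. The function names and the uniform prior are illustrative choices, not the paper's own notation.

```python
from math import comb

def prior_fixed(k, m, p=0.5):
    """Prior probability of one specific model containing k of m variables
    when every variable is included independently with fixed probability p."""
    return p**k * (1 - p)**(m - k)

def prior_uniform_p(k, m):
    """Same quantity when p ~ Uniform(0, 1) is integrated out (an assumption
    made for illustration): Beta(k + 1, m - k + 1) = 1 / ((m + 1) * C(m, k))."""
    return 1.0 / ((m + 1) * comb(m, k))

def odds_penalty(k, m, prior):
    """Prior odds of a specific (k-1)-variable model against a specific
    k-variable model, i.e. the prior penalty for adding one more variable."""
    return prior(k - 1, m) / prior(k, m)

# The penalty for a second variable as the number of candidates m grows.
for m in (10, 100, 1000):
    print(m, odds_penalty(2, m, prior_fixed), odds_penalty(2, m, prior_uniform_p))
```

Under the fixed p = 1/2 prior the penalty is 1 for every m, so testing more candidate variables costs nothing a priori. Under the integrated prior the penalty equals (m - k + 1) / k, which grows with m. This growth with the number of candidates is one way to read the multiplicity correction the abstract refers to.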

Primary Subjects: 62J05, 62J15
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1278861454
Digital Object Identifier: doi:10.1214/10-AOS792
Zentralblatt MATH identifier: 1200.62020
Mathematical Reviews number (MathSciNet): MR2722450


2013 © Institute of Mathematical Statistics
