The Annals of Statistics

Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem

James G. Scott and James O. Berger

Abstract

This paper studies the multiplicity-correction effect of standard Bayesian variable-selection priors in linear regression. Our first goal is to clarify when, and how, multiplicity correction happens automatically in Bayesian analysis, and to distinguish this correction from the Bayesian Ockham’s-razor effect. Our second goal is to contrast empirical-Bayes and fully Bayesian approaches to variable selection through examples, theoretical results and simulations. Considerable differences between the two approaches are found. In particular, we prove a theorem that characterizes a surprising asymptotic discrepancy between fully Bayes and empirical Bayes. This discrepancy arises from a different source than the failure to account for hyperparameter uncertainty in the empirical-Bayes estimate. Indeed, even at the extreme, when the empirical-Bayes estimate converges asymptotically to the true variable-inclusion probability, the potential for a serious difference remains.
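
As an illustration of what an automatic multiplicity correction can look like, the following minimal sketch compares two priors over models. It assumes m candidate variables whose inclusion indicators are independent Bernoulli(p), with p either fixed at 1/2 or given a Uniform(0, 1) prior and integrated out. The function names are hypothetical and the code is not taken from the paper; it only evaluates the standard Beta-integral identity for the resulting prior model probabilities.

```python
from math import comb


def prior_fixed_p(m, k, p=0.5):
    """Prior probability of one specific model with k of m variables
    when every variable is included independently with fixed probability p."""
    return p**k * (1 - p) ** (m - k)


def prior_uniform_p(m, k):
    """Prior probability of the same model after integrating p against
    Uniform(0, 1): the Beta integral B(k+1, m-k+1) = 1 / ((m+1) * C(m, k))."""
    return 1.0 / ((m + 1) * comb(m, k))


def prior_odds_smaller_vs_larger(m, k, prior):
    """Prior odds of a given k-variable model against a given
    (k+1)-variable model under the supplied prior function."""
    return prior(m, k) / prior(m, k + 1)


if __name__ == "__main__":
    for m in (10, 100, 1000):
        fixed = prior_odds_smaller_vs_larger(m, 1, prior_fixed_p)
        mixed = prior_odds_smaller_vs_larger(m, 1, prior_uniform_p)
        print(f"m={m:5d}  fixed p=1/2 odds: {fixed:.1f}  uniform p odds: {mixed:.1f}")
```

Under these assumptions the fixed-p prior gives odds of 1 no matter how many candidate variables are considered, while the integrated prior gives odds of (m - k)/(k + 1), so the penalty for adding a variable grows with m.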

Article information

Source
Ann. Statist. Volume 38, Number 5 (2010), 2587-2619.

Dates
First available in Project Euclid: 11 July 2010

Permanent link to this document
http://projecteuclid.org/euclid.aos/1278861454

Digital Object Identifier
doi:10.1214/10-AOS792

Zentralblatt MATH identifier
1200.62020

Mathematical Reviews number (MathSciNet)
MR2722450

Subjects
Primary: 62J05: Linear regression; 62J15: Paired and multiple comparisons

Keywords
Bayesian model selection; empirical Bayes; multiple testing; variable selection

Citation

Scott, James G.; Berger, James O. Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann. Statist. 38 (2010), no. 5, 2587–2619. doi:10.1214/10-AOS792. http://projecteuclid.org/euclid.aos/1278861454.

