Annals of Statistics

Optimal predictive model selection

Maria Maddalena Barbieri and James O. Berger

Often the goal of model selection is to choose a model for future prediction, and it is natural to measure the accuracy of a future prediction by squared error loss. Under the Bayesian approach, it is commonly perceived that the optimal predictive model is the model with highest posterior probability, but this is not necessarily the case. In this paper we show that, for selection among normal linear models, the optimal predictive model is often the median probability model, defined as the model consisting of those variables whose overall posterior probability of being in a model is at least 1/2. The median probability model often differs from the highest probability model.
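The distinction between the two models described in the abstract can be made concrete with a small numerical sketch. The model space and posterior probabilities below are hypothetical (not taken from the paper); they merely illustrate how posterior inclusion probabilities are accumulated across models and how the median probability model can differ from the highest probability model.

```python
# Hypothetical posterior model probabilities over subsets of three
# candidate variables; the numbers are invented for illustration only.
variables = ["x1", "x2", "x3"]

post = {
    frozenset({"x1"}): 0.10,
    frozenset({"x1", "x2"}): 0.35,
    frozenset({"x1", "x3"}): 0.25,
    frozenset({"x1", "x2", "x3"}): 0.30,
}

# Posterior inclusion probability of each variable: the sum of the
# posterior probabilities of all models containing that variable.
incl = {v: sum(p for m, p in post.items() if v in m) for v in variables}

# Median probability model: variables with inclusion probability >= 1/2.
median_model = {v for v, p in incl.items() if p >= 0.5}

# Highest probability model, for comparison.
hpm = set(max(post, key=post.get))

print(incl)          # inclusion probabilities per variable
print(median_model)  # {'x1', 'x2', 'x3'}
print(hpm)           # {'x1', 'x2'}
```

In this made-up example every variable has inclusion probability of at least 1/2, so the median probability model is the full model, even though the single most probable model is {x1, x2}.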

Article information

Ann. Statist., Volume 32, Number 3 (2004), 870–897.

First available in Project Euclid: 24 May 2004

Primary: 62F15: Bayesian inference
Secondary: 62C10: Bayesian problems; characterization of Bayes procedures

Keywords: Bayesian linear models; predictive distribution; squared error loss; variable selection


Barbieri, Maria Maddalena; Berger, James O. Optimal predictive model selection. Ann. Statist. 32 (2004), no. 3, 870–897. doi:10.1214/009053604000000238.
