## The Annals of Statistics

### Training samples in objective Bayesian model selection

#### Abstract

Central to several objective approaches to Bayesian model selection is the use of training samples (subsets of the data), so as to allow utilization of improper objective priors. The most common prescription for choosing training samples is to choose them to be as small as possible, subject to yielding proper posteriors; these are called minimal training samples.

When data can vary widely in terms of either information content or impact on the improper priors, use of minimal training samples can be inadequate. Important examples include certain cases of discrete data, the presence of censored observations, and certain situations involving linear models and explanatory variables. Such situations require more sophisticated methods of choosing training samples. A variety of such methods are developed in this paper, and successfully applied in challenging situations.
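To illustrate why a training sample is needed at all, the following sketch uses the standard partial Bayes factor construction. The notation ($M_i$, $\pi_i^N$, $c_i$, $m_i^N$, $x(l)$) is an assumption made for this illustration and is not taken from the abstract. Suppose models $M_0$ and $M_1$ have improper priors $\pi_i^N(\theta_i)$, each defined only up to an arbitrary constant $c_i$, with marginals $m_i^N(x) = \int f_i(x \mid \theta_i)\, \pi_i^N(\theta_i)\, d\theta_i$. The Bayes factor computed directly from these priors, $B_{10}^N(x) = m_1^N(x)/m_0^N(x)$, then carries the arbitrary ratio $c_1/c_0$. If a subset $x(l)$ of the data is first used to turn each prior into the posterior $\pi_i^N(\theta_i \mid x(l))$, and the rest of the data is used to compare the models, the result is

$$
B_{10}(l) \;=\; \frac{m_1^N(x)\,/\,m_1^N(x(l))}{m_0^N(x)\,/\,m_0^N(x(l))} \;=\; B_{10}^N(x)\, B_{01}^N(x(l)),
\qquad
B_{01}^N(x(l)) = \frac{m_0^N(x(l))}{m_1^N(x(l))},
$$

so the ratio $c_1/c_0$ cancels. Under this construction, $x(l)$ yields proper posteriors when $0 < m_i^N(x(l)) < \infty$ for each model, and it is minimal when no proper subset of $x(l)$ has this property.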

#### Article information

Source
Ann. Statist. Volume 32, Number 3 (2004), 841-869.

Dates
First available in Project Euclid: 24 May 2004

Permanent link to this document
https://projecteuclid.org/euclid.aos/1085408488

Digital Object Identifier
doi:10.1214/009053604000000229

Mathematical Reviews number (MathSciNet)
MR2065191

Zentralblatt MATH identifier
1092.62034

#### Citation

Berger, James O.; Pericchi, Luis R. Training samples in objective Bayesian model selection. Ann. Statist. 32 (2004), no. 3, 841--869. doi:10.1214/009053604000000229. https://projecteuclid.org/euclid.aos/1085408488.
