The Annals of Applied Statistics

A weakly informative default prior distribution for logistic and other regression models

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su


Abstract

We propose a new prior distribution for classical (nonhierarchical) logistic regression models, constructed by first scaling all nonbinary variables to have mean 0 and standard deviation 0.5, and then placing independent Student-t prior distributions on the coefficients. As a default choice, we recommend the Cauchy distribution with center 0 and scale 2.5, which in the simplest setting is a longer-tailed version of the distribution attained by assuming one-half additional success and one-half additional failure in a logistic regression. Cross-validation on a corpus of datasets shows the Cauchy class of prior distributions to outperform existing implementations of Gaussian and Laplace priors.
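The construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the example data, and the use of the sample standard deviation are all assumptions made here.

```python
import math
from statistics import mean, stdev


def rescale_nonbinary(values):
    """Shift a nonbinary input to mean 0 and scale it to standard deviation 0.5."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption of this sketch)
    return [(v - m) / (2.0 * s) for v in values]


def cauchy_log_density(beta, center=0.0, scale=2.5):
    """Log density of a Cauchy prior, i.e. a Student-t with 1 degree of freedom."""
    z = (beta - center) / scale
    return -math.log(math.pi * scale) - math.log1p(z * z)


# Hypothetical nonbinary predictor, for illustration only.
ages = [23.0, 35.0, 41.0, 58.0, 62.0]
scaled_ages = rescale_nonbinary(ages)

# Independent priors: the joint log prior is the sum over coefficients.
coefficients = [0.3, -1.2]
log_prior = sum(cauchy_log_density(b) for b in coefficients)
```

Dividing by two standard deviations puts the rescaled input on a scale where a single default prior scale such as 2.5 can reasonably be applied to every coefficient.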

We recommend this prior distribution as a default choice for routine applied use. It has the advantage of always giving answers, even when there is complete separation in logistic regression (a common problem, even when the sample size is large and the number of predictors is small), and also automatically applying more shrinkage to higher-order interactions. This can be useful in routine data analysis as well as in automated procedures such as chained equations for missing-data imputation.
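To illustrate the point about complete separation, the following sketch uses a tiny made-up data set in which the sign of a single predictor perfectly predicts the outcome. The data, the one-coefficient model without an intercept, and the bisection search are assumptions chosen for brevity, not part of the paper. The log-likelihood keeps increasing as the coefficient grows, so no finite maximum likelihood estimate exists, while adding the Cauchy(0, 2.5) log prior yields a finite posterior mode.

```python
import math

# Hypothetical, completely separated data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]


def log_likelihood(beta):
    """Logistic log-likelihood for the model logit(p_i) = beta * x_i."""
    total = 0.0
    for xi, yi in zip(x, y):
        eta = beta * xi
        # y*eta - log(1 + exp(eta)), with the log term computed stably.
        total += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return total


def log_posterior_gradient(beta, scale=2.5):
    """Derivative of log-likelihood plus Cauchy(0, scale) log prior."""
    score = sum(xi * (yi - 1.0 / (1.0 + math.exp(-beta * xi)))
                for xi, yi in zip(x, y))
    prior_term = -2.0 * beta / (scale ** 2 + beta ** 2)
    return score + prior_term


# Bisection on the gradient: positive at 0, negative at 50 for these data.
lo, hi = 0.0, 50.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if log_posterior_gradient(mid) > 0.0:
        lo = mid
    else:
        hi = mid
beta_map = 0.5 * (lo + hi)
```

For these particular data the mode lies between 1.5 and 2, whereas the unpenalized log-likelihood is still rising at any coefficient value one cares to try.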

We implement a procedure to fit generalized linear models in R with the Student-t prior distribution by incorporating an approximate EM algorithm into the usual iteratively weighted least squares. We illustrate with several applications, including a series of logistic regressions predicting voting preferences, a small bioassay experiment, and an imputation model for a public health data set.
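The idea of folding an approximate EM step into iteratively weighted least squares can be sketched for the simplest case of a single coefficient and no intercept. This is not the authors' R code. It assumes the standard representation of the Student-t prior as a normal distribution whose variance has a scaled inverse-chi-squared distribution, and the variance update below is one reading of that representation that should be checked against the paper. The function name, arguments, and data are hypothetical.

```python
import math


def fit_logistic_t_prior(x, y, df=1.0, scale=2.5, max_iter=500, tol=1e-10):
    """One-coefficient logistic regression with a Student-t(df, 0, scale) prior.

    Each pass does one weighted least squares update that treats the prior as
    a normal with variance sigma2, then refreshes sigma2 from the current
    estimate and its approximate variance (the approximate E-step).
    """
    beta = 0.0
    sigma2 = scale ** 2  # starting value for the latent prior variance
    for _ in range(max_iter):
        info = 0.0  # sum of w_i * x_i^2
        rhs = 0.0   # sum of w_i * x_i * z_i
        for xi, yi in zip(x, y):
            eta = beta * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)          # IWLS weight
            z = eta + (yi - p) / w     # working response
            info += w * xi * xi
            rhs += w * xi * z
        # Weighted least squares with the prior acting as one extra
        # pseudo-observation of value 0 and variance sigma2.
        variance = 1.0 / (info + 1.0 / sigma2)
        new_beta = variance * rhs
        # Approximate E-step: E[beta^2] is replaced by estimate^2 + variance.
        sigma2 = (new_beta ** 2 + variance + df * scale ** 2) / (1.0 + df)
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, sigma2


# Hypothetical, completely separated data; df=1 gives the Cauchy prior.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]
beta_hat, sigma2_hat = fit_logistic_t_prior(x, y)
```

Because the variance term enters the update, the fixed point of this iteration is close to, but not exactly, the posterior mode, which is why the procedure is described as approximate.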

Article information

Source
Ann. Appl. Stat. Volume 2, Number 4 (2008), 1360-1383.

Dates
First available in Project Euclid: 8 January 2009

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1231424214

Digital Object Identifier
doi:10.1214/08-AOAS191

Mathematical Reviews number (MathSciNet)
MR2655663

Zentralblatt MATH identifier
1156.62017

Keywords
Bayesian inference; generalized linear model; least squares; hierarchical model; linear regression; logistic regression; multilevel model; noninformative prior distribution; weakly informative prior distribution

Citation

Gelman, Andrew; Jakulin, Aleks; Pittau, Maria Grazia; Su, Yu-Sung. A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat. 2 (2008), no. 4, 1360–1383. doi:10.1214/08-AOAS191. http://projecteuclid.org/euclid.aoas/1231424214.


