Bayesian Analysis

Simulation-based Regularized Logistic Regression

Robert B. Gramacy and Nicholas G. Polson

Full-text: Open access


In this paper, we develop a simulation-based framework for regularized logistic regression, exploiting two novel results for scale mixtures of normals. By carefully choosing a hierarchical model for the likelihood using one type of mixture, and implementing regularization with another, we obtain new MCMC schemes whose efficiency varies with the data type (binary vs. binomial, say) and the desired estimator (maximum likelihood, maximum a posteriori, posterior mean). Advantages of our omnibus approach include flexibility, computational efficiency, applicability in $p\gg n$ settings, uncertainty estimates, variable selection, and assessment of the optimal degree of regularization. We compare our methodology to modern alternatives on both synthetic and real data. An R package called reglogit is available on CRAN.
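To make the phrase "scale mixtures of normals" concrete, the sketch below illustrates one classical instance: a Laplace (lasso-type) prior on a coefficient can be written as a normal whose variance is itself exponentially distributed. This is a textbook identity chosen for illustration and is not claimed to be the specific representation derived in the paper; the function names `sample_laplace_via_mixture` and `summarize` and the parameter `lam` are hypothetical. Python's standard library is used so the example is self-contained, and it is unrelated to the interface of the reglogit package.

```python
import math
import random


def sample_laplace_via_mixture(lam, n, seed=0):
    """Draw n values from a Laplace(rate=lam) law via a normal scale mixture.

    Assumed hierarchy (illustrative only):
        tau2        ~ Exponential(rate = lam**2 / 2)
        beta | tau2 ~ Normal(0, tau2)
    Marginally, beta has density (lam / 2) * exp(-lam * |beta|).
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Latent variance: exponential with rate lam^2 / 2 (mean 2 / lam^2).
        tau2 = rng.expovariate(lam ** 2 / 2.0)
        # Conditionally normal coefficient with that variance.
        draws.append(rng.gauss(0.0, math.sqrt(tau2)))
    return draws


def summarize(draws):
    """Return the Monte Carlo estimates of E|beta| and E[beta^2]."""
    n = len(draws)
    mean_abs = sum(abs(b) for b in draws) / n
    second_moment = sum(b * b for b in draws) / n
    return mean_abs, second_moment


if __name__ == "__main__":
    lam = 2.0
    mean_abs, second_moment = summarize(sample_laplace_via_mixture(lam, 100_000))
    # For a Laplace(rate=lam) law, E|beta| = 1 / lam and E[beta^2] = 2 / lam^2.
    print(mean_abs, 1.0 / lam)
    print(second_moment, 2.0 / lam ** 2)
```

The appeal of such a representation in a Gibbs sampler is that, conditional on the latent variances, the prior on the coefficients is Gaussian, so standard conditional updates become available.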

Article information

Bayesian Anal. Volume 7, Number 3 (2012), 567-590.

First available in Project Euclid: 28 August 2012


Keywords: logistic regression, regularization, z-distributions, data augmentation, classification, Gibbs sampling, lasso, variance-mean mixtures, Bayesian shrinkage


Gramacy, Robert B.; Polson, Nicholas G. Simulation-based Regularized Logistic Regression. Bayesian Anal. 7 (2012), no. 3, 567-590. doi:10.1214/12-BA719.


