Bayesian Analysis

A General Method for Robust Bayesian Modeling

Chong Wang and David M. Blei

Robust Bayesian models are appealing alternatives to standard models, providing protection from data that contains outliers or other departures from the model assumptions. Historically, robust models were mostly developed on a case-by-case basis; examples include robust linear regression, robust mixture models, and bursty topic models. In this paper we develop a general approach to robust Bayesian modeling. We show how to turn an existing Bayesian model into a robust model, and then develop a generic computational strategy for it. We use our method to study robust variants of several models, including linear regression, Poisson regression, logistic regression, and probabilistic topic models. We discuss the connections between our methods and existing approaches, especially empirical Bayes and James–Stein estimation.
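
The abstract does not spell out the general construction, so the following is a hedged illustration only: a minimal sketch of the kind of case-by-case robust model the paper generalizes, namely linear regression with Student-t errors written as a per-observation scale mixture of normals and fit by expectation-maximization (the approach of Lange, Little, and Taylor, 1989, in the reference list below). The function name robust_lm_em, the fixed degrees of freedom nu, and the toy data are all assumptions for illustration, not code from the paper.

```python
import numpy as np

def robust_lm_em(X, y, nu=4.0, n_iter=50):
    """EM for linear regression with Student-t errors of fixed dof `nu`.

    Writing the t density as a scale mixture of normals gives each
    observation i a latent precision multiplier w_i ~ Gamma(nu/2, nu/2),
    so observations with large residuals are down-weighted automatically.
    """
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # ordinary least-squares start
    sigma2 = np.mean((y - X @ beta) ** 2)
    for _ in range(n_iter):
        # E-step: posterior mean of each latent weight w_i given residuals.
        r = y - X @ beta
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        # M-step: weighted least squares, then the weighted residual variance.
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)
        sigma2 = np.mean(w * (y - X @ beta) ** 2)
    return beta, sigma2

# Toy check: 5% gross outliers barely move the t-based fit.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = X @ np.array([1.0, 2.0]) + 0.5 * rng.normal(size=200)
y[:10] += 25.0                                    # contaminate ten points
beta_hat, _ = robust_lm_em(X, y)
print(beta_hat)                                   # close to [1.0, 2.0]
```

The E-step weight w_i = (nu + 1) / (nu + r_i^2 / sigma^2) shrinks toward zero as the residual r_i grows, which is the automatic down-weighting of outliers that robust models provide; the paper's contribution is a generic recipe and computational strategy rather than this one hand-built variant.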

Article information

Bayesian Anal., Volume 13, Number 4 (2018), 1163–1191.

First available in Project Euclid: 3 January 2018

Digital Object Identifier: doi:10.1214/17-BA1090

Keywords: robust statistics, empirical Bayes, probabilistic models, variational inference, expectation-maximization, generalized linear models, topic models

License: Creative Commons Attribution 4.0 International License.


Wang, Chong; Blei, David M. A General Method for Robust Bayesian Modeling. Bayesian Anal. 13 (2018), no. 4, 1163–1191. doi:10.1214/17-BA1090.

References

  • Ahn, S., Korattikara, A., and Welling, M. (2012). “Bayesian posterior sampling via stochastic gradient Fisher scoring.” arXiv preprint arXiv:1206.6380.
  • Airoldi, E. (2007). “Bayesian Mixed-Membership Models of Complex and Evolving Networks.” Ph.D. thesis, Carnegie Mellon University.
  • Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2007). “Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis.” In Statistical Network Analysis: Models, Issues and New Directions, Lecture Notes in Computer Science, 57–74. Springer-Verlag.
  • Airoldi, E., Blei, D., Fienberg, S., and Xing, E. (2009). “Mixed Membership Stochastic Blockmodels.” In Neural Information Processing Systems.
  • Antoniak, C. (1974). “Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems.” The Annals of Statistics, 2(6): 1152–1174.
  • Asuncion, A., Welling, M., Smyth, P., and Teh, Y. (2009). “On Smoothing and Inference for Topic Models.” In Uncertainty in Artificial Intelligence.
  • Attias, H. (2000). “A variational Bayesian framework for graphical models.” In Advances in Neural Information Processing Systems.
  • Berger, J. O., Moreno, E., Pericchi, L. R., Bayarri, M. J., Bernardo, J. M., Cano, J. A., De la Horra, J., Martín, J., Ríos-Insúa, D., Betrò, B., et al. (1994). “An overview of robust Bayesian analysis.” Test, 3(1): 5–124.
  • Bernardo, J. and Smith, A. (1994). Bayesian theory. Chichester: John Wiley & Sons Ltd.
  • Bickel, P. and Doksum, K. (2007). Mathematical Statistics: Basic Ideas and Selected Topics, volume 1. Upper Saddle River, NJ: Pearson Prentice Hall, 2nd edition.
  • Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer New York.
  • Blei, D. (2012). “Probabilistic Topic Models.” Communications of the ACM, 55(4): 77–84.
  • Blei, D. and Lafferty, J. (2007). “A Correlated Topic Model of Science.” Annals of Applied Statistics, 1(1): 17–35.
  • Blei, D., Ng, A., and Jordan, M. (2003). “Latent Dirichlet Allocation.” Journal of Machine Learning Research, 3: 993–1022.
  • Blei, D. M., Kucukelbir, A., and McAuliffe, J. D. (2017). “Variational Inference: A Review for Statisticians.” Journal of the American Statistical Association, 112(518): 859–877.
  • Box, G. (1976). “Science and Statistics.” Journal of the American Statistical Association, 71(356): 791–799.
  • Box, G. (1980). “Sampling and Bayes’ Inference in Scientific Modeling and Robustness.” Journal of the Royal Statistical Society, Series A, 143(4): 383–430.
  • Cameron, A. C. and Trivedi, P. K. (2013). Regression analysis of count data, volume 53. Cambridge University Press.
  • Carlin, B. and Louis, T. (2000a). Bayes and Empirical Bayes Methods for Data Analysis, 2nd Edition. Chapman & Hall/CRC.
  • Carlin, B. and Louis, T. (2000b). “Empirical Bayes: Past, present and future.” Journal of the American Statistical Association, 95(452): 1286–1289.
  • Copas, J. B. (1969). “Compound Decisions and Empirical Bayes.” Journal of the Royal Statistical Society, Series B (Methodological), 31(3): 397–425.
  • Corduneanu, A. and Bishop, C. (2001). “Variational Bayesian Model Selection for Mixture Distributions.” In International Conference on Artificial Intelligence and Statistics.
  • Dempster, A., Laird, N., and Rubin, D. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society, Series B, 39: 1–38.
  • Diaconis, P. and Ylvisaker, D. (1979). “Conjugate Priors for Exponential Families.” The Annals of Statistics, 7(2): 269–281.
  • Doyle, G. and Elkan, C. (2009). “Accounting for burstiness in topic models.” In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, 281–288. New York, NY, USA: ACM.
  • Efron, B. (1996). “Empirical Bayes methods for Combining Likelihoods.” Journal of the American Statistical Association, 91(434): 538–550.
  • Efron, B. (2010). Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press.
  • Efron, B. and Morris, C. (1973). “Combining Possibly Related Estimation Problems.” Journal of the Royal Statistical Society, Series B, 35(3): 379–421.
  • Efron, B. and Morris, C. (1975). “Data analysis using Stein’s estimator and its generalizations.” Journal of the American Statistical Association, 70(350): 311–319.
  • Erosheva, E., Fienberg, S., and Joutard, C. (2007). “Describing Disability Through Individual-Level Mixture Models for Multivariate Binary Data.” Annals of Applied Statistics.
  • Fei-Fei, L. and Perona, P. (2005). “A Bayesian Hierarchical Model for Learning Natural Scene Categories.” IEEE Computer Vision and Pattern Recognition, 524–531.
  • Feng, J., Xu, H., Mannor, S., and Yan, S. (2014). “Robust Logistic Regression and Classification.” In Advances in Neural Information Processing Systems, 253–261.
  • Fernández, C. and Steel, M. F. (1999). “Multivariate Student-t regression models: Pitfalls and inference.” Biometrika, 86(1): 153–167.
  • Fine, S., Singer, Y., and Tishby, N. (1998). “The Hierarchical Hidden Markov Model: Analysis and Applications.” Machine Learning, 32: 41–62.
  • Fox, E., Sudderth, E., Jordan, M., and Willsky, A. (2011). “A Sticky HDP-HMM with Application to Speaker Diarization.” Annals of Applied Statistics, 5(2A): 1020–1056.
  • Geisser, S. and Eddy, W. F. (1979). “A predictive approach to model selection.” Journal of the American Statistical Association, 74(365): 153–160.
  • Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014). Bayesian data analysis, volume 2. Chapman & Hall/CRC Boca Raton, FL, USA.
  • Gelman, A., Meng, X., and Stern, H. (1996). “Posterior Predictive Assessment of Model Fitness Via Realized Discrepancies.” Statistica Sinica, 6: 733–807.
  • Ghahramani, Z. and Beal, M. J. (2000). “Variational Inference for Bayesian Mixtures of Factor Analysers.” In Neural Information Processing Systems.
  • Grimmer, J. (2009). “A Bayesian Hierarchical Topic Model for Political Texts: Measuring Expressed Agendas in Senate Press Releases.”
  • Hoffman, M., Blei, D., Wang, C., and Paisley, J. (2013). “Stochastic Variational Inference.” Journal of Machine Learning Research, 14: 1303–1347.
  • Hoffman, M. D. and Gelman, A. (2014). “The No-U-Turn Sampler: Adaptively Setting Path Lengths in Hamiltonian Monte Carlo.” Journal of Machine Learning Research, 15(Apr): 1593–1623.
  • Huber, P. and Ronchetti, E. (2009). Robust Statistics. Wiley, 2nd edition.
  • Huber, P. J. (1964). “Robust Estimation of a Location Parameter.” The Annals of Mathematical Statistics, 35(1): 73–101.
  • Jordan, M., Ghahramani, Z., Jaakkola, T., and Saul, L. (1999). “Introduction to Variational Methods for Graphical Models.” Machine Learning, 37: 183–233.
  • Jorgensen, B. (1987). “Exponential dispersion models.” Journal of the Royal Statistical Society, Series B (Methodological), 49(2): 127–162.
  • Kalman, R. (1960). “A New Approach to Linear Filtering and Prediction Problems.” Transactions of the ASME: Journal of Basic Engineering, 82: 35–45.
  • Kass, R. and Steffey, D. (1989). “Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models).” Journal of the American Statistical Association, 84(407): 717–726.
  • Lange, K., Little, R., and Taylor, J. (1989). “Robust Statistical Modeling Using the t Distribution.” Journal of the American Statistical Association, 84(408): 881–896.
  • Madsen, R. E., Kauchak, D., and Elkan, C. (2005). “Modeling word burstiness using the Dirichlet distribution.” In Proceedings of the 22nd international conference on Machine learning, 545–552. ACM.
  • Maritz, J. and Lwin, T. (1989). Empirical Bayes methods. Monographs on Statistics and Applied Probability. London: Chapman & Hall.
  • McCullagh, P. and Nelder, J. (1989). Generalized Linear Models. London: Chapman and Hall.
  • McCulloch, C. E. and Neuhaus, J. M. (2001). Generalized linear mixed models. Wiley Online Library.
  • McLachlan, G. and Peel, D. (2000). Finite mixture models. Wiley-Interscience.
  • Morris, C. (1983). “Parametric empirical Bayes inference: Theory and applications.” Journal of the American Statistical Association, 78(381): 47–65.
  • Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. MIT Press.
  • Paisley, J. and Carin, L. (2009). “Nonparametric Factor Analysis with Beta Process Priors.” In International Conference on Machine Learning.
  • Peel, D. and McLachlan, G. J. (2000). “Robust mixture modelling using the t distribution.” Statistics and Computing, 10(4): 339–348.
  • Polson, N. G. and Scott, J. G. (2010). “Shrink globally, act locally: Sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
  • Pregibon, D. (1982). “Resistant fits for some commonly used logistic models with medical applications.” Biometrics, 485–498.
  • Pritchard, J. K., Stephens, M., and Donnelly, P. (2000). “Inference of population structure using multilocus genotype data.” Genetics, 155(2): 945–959.
  • Rabe-Hesketh, S. and Skrondal, A. (2008). “Generalized linear mixed-effects models.” Longitudinal Data Analysis, 79–106.
  • Rabiner, L. R. (1989). “A tutorial on hidden Markov models and selected applications in speech recognition.” Proceedings of the IEEE, 77: 257–286.
  • Ranganath, R., Gerrish, S., and Blei, D. (2014). “Black box variational inference.” In Artificial Intelligence and Statistics.
  • Robbins, H. (1964). “The empirical Bayes approach to statistical decision problems.” The Annals of Mathematical Statistics, 35(1): 1–20.
  • Robbins, H. (1980). “An empirical Bayes estimation problem.” Proceedings of the National Academy of Sciences, 77(12): 6988.
  • Rubin, D. (1984). “Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician.” The Annals of Statistics, 12(4): 1151–1172.
  • Salakhutdinov, R. and Mnih, A. (2008). “Probabilistic matrix factorization.” In Neural Information Processing Systems.
  • She, Y. and Owen, A. (2011). “Outlier detection using nonconvex penalized regression.” Journal of the American Statistical Association, 106(494).
  • Stefanski, L. A., Carroll, R. J., and Ruppert, D. (1986). “Optimally bounded score functions for generalized linear models with applications to logistic regression.” Biometrika, 73(2): 413–424.
  • Svensén, M. and Bishop, C. M. (2005). “Robust Bayesian mixture modelling.” Neurocomputing, 64: 235–252.
  • Teh, Y., Jordan, M., Beal, M., and Blei, D. (2006). “Hierarchical Dirichlet processes.” Journal of the American Statistical Association, 101(476): 1566–1581.
  • Teh, Y. W. (2006). “A Hierarchical Bayesian Language Model based on Pitman-Yor Processes.” In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 985–992.
  • Tibshirani, J. and Manning, C. D. (2013). “Robust Logistic Regression using Shift Parameters.” CoRR, abs/1305.4987.
  • Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S. New York: Springer, fourth edition. ISBN 0-387-95457-0.
  • Wainwright, M. and Jordan, M. (2008). “Graphical models, exponential families, and variational inference.” Foundations and Trends in Machine Learning, 1(1–2): 1–305.
  • Wang, C. and Blei, D. M. (2013). “Variational inference in nonconjugate models.” The Journal of Machine Learning Research, 14(1): 1005–1031.
  • Wang, C., Paisley, J., and Blei, D. (2011). “Online Variational Inference for the Hierarchical Dirichlet Process.” In International Conference on Artificial Intelligence and Statistics.
  • Welling, M. and Teh, Y. W. (2011). “Bayesian learning via stochastic gradient Langevin dynamics.” In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 681–688.
  • Wood, F., van de Meent, J. W., and Mansinghka, V. (2014). “A New Approach to Probabilistic Programming Inference.” In Artificial Intelligence and Statistics, 1024–1032.
  • Xing, E. P., Ho, Q., Dai, W., Kim, J. K., Wei, J., Lee, S., Zheng, X., Xie, P., Kumar, A., and Yu, Y. (2013). “Petuum: A New Platform for Distributed Machine Learning on Big Data.” arXiv preprint arXiv:1312.7651.