## Bayesian Analysis

### Mean Field Variational Bayes for Elaborate Distributions

#### Abstract

We develop strategies for mean field variational Bayes approximate inference for Bayesian hierarchical models containing elaborate distributions. We loosely define elaborate distributions to be those having more complicated forms compared with common distributions such as those in the Normal and Gamma families. Examples are Asymmetric Laplace, Skew Normal and Generalized Extreme Value distributions. Such models suffer from the difficulty that the parameter updates do not admit closed form solutions. We circumvent this problem through a combination of (a) specially tailored auxiliary variables, (b) univariate quadrature schemes and (c) finite mixture approximations of troublesome density functions. An accuracy assessment is conducted and the new methodology is illustrated in an application.
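Strategy (b) above, replacing an intractable expectation in a mean field update with univariate quadrature, can be sketched as follows. This is an illustrative example, not code from the paper: the helper name and the skew-normal-style term `E[log Φ(X)]` under a Gaussian approximating density are assumptions chosen to show the idea.

```python
import numpy as np
from math import erf, sqrt, pi, log

def gauss_hermite_expectation(f, mu, sigma, n=30):
    """Approximate E[f(X)] for X ~ N(mu, sigma^2) via n-point Gauss-Hermite quadrature.

    Uses the physicists' convention (weight exp(-x^2)), so nodes are
    rescaled by sqrt(2)*sigma and the result divided by sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n)
    x = mu + sqrt(2.0) * sigma * nodes
    return sum(w * f(xi) for w, xi in zip(weights, x)) / sqrt(pi)

# Sanity checks against closed forms for X ~ N(1, 0.5^2):
m = gauss_hermite_expectation(lambda x: x, 1.0, 0.5)             # mean, exactly 1.0
v = gauss_hermite_expectation(lambda x: (x - 1.0)**2, 1.0, 0.5)  # variance, exactly 0.25

# A typical non-closed-form term arising in a skew-normal model: E[log Phi(X)],
# with Phi the standard normal cdf, evaluated under the Gaussian q-density.
log_Phi = lambda x: log(0.5 * (1.0 + erf(x / sqrt(2.0))))
e = gauss_hermite_expectation(log_Phi, 1.0, 0.5)
```

In a mean field coordinate ascent scheme, a term such as `e` would be recomputed at each iteration as the parameters `mu` and `sigma` of the approximating density are updated; because the quadrature is univariate, a modest number of nodes suffices.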

#### Article information

• **Source:** Bayesian Anal., Volume 6, Number 4 (2011), 847–900.
• **First available in Project Euclid:** 13 June 2012
• **Permanent link:** https://projecteuclid.org/euclid.ba/1339616546
• **Digital Object Identifier:** doi:10.1214/11-BA631
• **Mathematical Reviews number (MathSciNet):** MR2869967
• **Zentralblatt MATH identifier:** 1330.62158

#### Citation

Wand, Matthew P.; Ormerod, John T.; Padoan, Simone A.; Frühwirth, Rudolf. Mean Field Variational Bayes for Elaborate Distributions. Bayesian Anal. 6 (2011), no. 4, 847--900. doi:10.1214/11-BA631. https://projecteuclid.org/euclid.ba/1339616546

#### References

• Aigner, D. J., Lovell, C. A. K., and Schmidt, P. (1977). "Formulation and estimation of stochastic frontier production function models." Journal of Econometrics, 12: 21–37.
• Antoniadis, A. and Fan, J. (2001). "Regularization of wavelet approximations (with discussion)." Journal of the American Statistical Association, 96: 939–967.
• Archambeau, C. and Bach, F. (2008). "Sparse probabilistic projections." In 21st Annual Conference on Neural Information Processing Systems, 73–80. Vancouver, Canada.
• Armagan, A. (2009). "Variational bridge regression." Journal of Machine Learning Research, Workshop and Conference Proceedings, 5: 17–24.
• Attias, H. (1999). "Inferring parameters and structure of latent variable models by variational Bayes." In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 21–30.
• Azzalini, A. and Capitanio, A. (2003). "Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution." Journal of the Royal Statistical Society, Series B, 65: 367–389.
• Azzalini, A. and Dalla Valle, A. (1996). "The multivariate skew-normal distribution." Biometrika, 83: 715–726.
• Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
• Braun, M. and McAuliffe, J. (2010). "Variational inference for large-scale models of discrete choice." Journal of the American Statistical Association, 105: 324–335.
• Chib, S., Nardari, F., and Shephard, N. (2002). "Markov chain Monte Carlo methods for stochastic volatility models." Journal of Econometrics, 108: 281–316.
• Consonni, G. and Marin, J.-M. (2007). "Mean-field variational approximate Bayesian inference for latent variable models." Computational Statistics and Data Analysis, 52: 790–798.
• Cottet, R., Kohn, R. J., and Nott, D. J. (2008). "Variable selection and model averaging in semiparametric overdispersed generalized linear models." Journal of the American Statistical Association, 103: 661–671.
• Devroye, L. and Györfi, L. (1985). Density Estimation: The $L_1$ View. New York: Wiley.
• Frühwirth-Schnatter, S. and Frühwirth, R. (2010). "Data augmentation and MCMC for binary and multinomial logit models." In Kneib, T. and Tutz, G. (eds.), Statistical Modelling and Regression Structures – Festschrift in Honour of Ludwig Fahrmeir, 111–132. Heidelberg, Germany: Physica-Verlag.
• Frühwirth-Schnatter, S., Frühwirth, R., Held, L., and Rue, H. (2009). "Improved auxiliary mixture sampling for hierarchical models of non-Gaussian data." Statistics and Computing, 19: 479–492.
• Frühwirth-Schnatter, S. and Wagner, H. (2006). "Auxiliary mixture sampling for parameter driven models of time series counts with applications to state space modelling." Biometrika, 93: 827–841.
• Gelman, A. (2006). "Prior distributions for variance parameters in hierarchical models." Bayesian Analysis, 1: 515–533.
• Girolami, M. and Rogers, S. (2006). "Variational Bayesian multinomial probit regression." Neural Computation, 18: 1790–1817.
• Gradshteyn, I. S. and Ryzhik, I. M. (1994). Tables of Integrals, Series, and Products. San Diego, California: Academic Press, 5th edition.
• Jaakkola, T. S. (2001). "Tutorial on variational approximation methods." In Opper, M. and Saad, D. (eds.), Advanced Mean Field Methods: Theory and Practice, 129–160. Cambridge, Massachusetts: MIT Press.
• Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999). "An introduction to variational methods for graphical models." Machine Learning, 37: 183–233.
• Kim, S., Shephard, N., and Chib, S. (1998). "Stochastic volatility: Likelihood inference and comparison with ARCH models." Review of Economic Studies, 65: 361–393.
• Kotz, S., Kozubowski, T. J., and Podgórski, K. (2001). The Laplace Distribution and Generalizations. Boston: Birkhäuser.
• Kschischang, F. R., Frey, B. J., and Loeliger, H.-A. (2001). "Factor graphs and the sum-product algorithm." IEEE Transactions on Information Theory, 47: 498–519.
• Lange, K. L., Little, R. J. A., and Taylor, J. M. G. (1989). "Robust statistical modeling using the $t$ distribution." Journal of the American Statistical Association, 84: 881–896.
• Lasserre, J. B. (2001). "Global optimization with polynomials and the problem of moments." SIAM Journal on Optimization, 12: 756–769.
• Ligges, U., Thomas, A., Spiegelhalter, D., Best, N., Lunn, D., Rice, K., and Sturtz, S. (2011). BRugs 0.5: OpenBUGS and its R/S-PLUS interface BRugs. http://www.stats.ox.ac.uk/pub/RWin/src/contrib/
• Liu, Q. and Pierce, D. A. (1994). "A note on Gauss-Hermite quadrature." Biometrika, 81: 624–629.
• Luenberger, D. G. and Ye, Y. (2008). Linear and Nonlinear Programming. New York: Springer, 3rd edition.
• Lunn, D., Thomas, A., Best, N. G., and Spiegelhalter, D. J. (2000). "WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility." Statistics and Computing, 10: 325–337.
• Marron, J. S. and Wand, M. P. (1992). "Exact mean integrated squared error." The Annals of Statistics, 20: 712–736.
• McGrory, C. A. and Titterington, D. M. (2007). "Variational approximations in Bayesian model selection for finite mixture distributions." Computational Statistics and Data Analysis, 51: 5352–5367.
• Minka, T. (2001). "Expectation propagation and approximate Bayesian inference." In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, 362–369. San Francisco: Morgan Kaufmann.
• Minka, T., Winn, J., Guiver, J., and Knowles, D. (2010). Infer.NET 2.4. Microsoft Research Cambridge. http://research.microsoft.com/infernet
• Nelder, J. and Mead, R. (1965). "A simplex method for function minimization." Computer Journal, 7: 308–313.
• Omori, Y., Chib, S., Shephard, N., and Nakajima, J. (2007). "Stochastic volatility with leverage: Fast and efficient likelihood inference." Journal of Econometrics, 140: 425–449.
• Ormerod, J. T. and Wand, M. P. (2010). "Explaining variational approximations." The American Statistician, 64: 140–153.
• Parisi, G. (1988). Statistical Field Theory. Redwood City, California: Addison-Wesley.
• Park, T. and Casella, G. (2008). "The Bayesian lasso." Journal of the American Statistical Association, 103: 681–686.
• Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, California: Morgan Kaufmann.
• Saul, L. K. and Jordan, M. I. (1996). "Exploiting tractable substructures in intractable networks." In Advances in Neural Information Processing Systems, 435–442. Cambridge, Massachusetts: MIT Press.
• Shephard, N. (1994). "Partial non-Gaussian state space." Biometrika, 81: 115–131.
• Staudenmayer, J., Lake, E. E., and Wand, M. P. (2009). "Robustness for general design mixed models using the $t$-distribution." Statistical Modelling, 9: 235–255.
• Teschendorff, A. E., Wang, Y., Barbosa-Morais, N. L., Brenton, J. D., and Caldas, C. (2005). "A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data." Bioinformatics, 21: 3025–3033.
• Tipping, M. E. and Lawrence, N. D. (2003). "A variational approach to robust Bayesian interpolation." IEEE Workshop on Neural Networks for Signal Processing, 229–238.
• Wainwright, M. J. and Jordan, M. I. (2008). "Graphical models, exponential families, and variational inference." Foundations and Trends in Machine Learning, 1: 1–305.
• Wand, M. P. and Ormerod, J. T. (2008). "On semiparametric regression with O'Sullivan penalized splines." Australian and New Zealand Journal of Statistics, 50: 179–198.
• Wand, M. P. and Ripley, B. D. (2010). KernSmooth 2.23: Functions for kernel smoothing corresponding to the book: Wand, M.P. and Jones, M.C. (1995) “Kernel Smoothing”. http://cran.r-project.org
• Wang, B. and Titterington, D. M. (2005). "Inadequacy of interval estimates corresponding to variational Bayesian approximations." In Cowell, R. and Ghahramani, Z. (eds.), Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, 373–380.
• Winn, J. and Bishop, C. M. (2005). "Variational message passing." Journal of Machine Learning Research, 6: 661–694.
• Yu, K. and Moyeed, R. A. (2001). "Bayesian quantile regression." Statistics and Probability Letters, 54: 437–447.
• Zhang, S. and Jin, J.-M. (1996). Computation of Special Functions. New York: Wiley.