Electronic Journal of Statistics

Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies

Sarah E. Neville, John T. Ormerod, and M. P. Wand

Abstract

We investigate mean field variational approximate Bayesian inference for models that use continuous distributions (Horseshoe, Normal-Exponential-Gamma and Generalized Double Pareto) for sparse signal shrinkage. Our principal finding is that the most natural, and simplest, mean field variational Bayes algorithm can perform quite poorly due to posterior dependence among auxiliary variables. More sophisticated algorithms, based on special functions, are shown to be superior. Continued fraction approximations via Lentz's Algorithm are developed to make the algorithms practical.
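The continued fraction machinery mentioned in the abstract centres on Lentz's Algorithm [19]. As a rough illustrative sketch only, and not the paper's implementation, the following Python code evaluates a generic continued fraction b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) with the modified Lentz recursion (the variant with near-zero-denominator rescue described in Numerical Recipes [27]); the tan(x) expansion used as a check is a textbook example, not one of the special functions treated in the paper.

    import math

    def lentz(a, b, eps=1e-12, tiny=1e-30, max_iter=200):
        # Evaluate b(0) + a(1)/(b(1) + a(2)/(b(2) + ...)) by the
        # modified Lentz scheme: forward evaluation, no downward recursion.
        f = b(0)
        if f == 0.0:
            f = tiny              # guard against a zero leading term
        C, D = f, 0.0
        for j in range(1, max_iter + 1):
            D = b(j) + a(j) * D
            if D == 0.0:
                D = tiny          # rescue a vanishing partial denominator
            C = b(j) + a(j) / C
            if C == 0.0:
                C = tiny
            D = 1.0 / D
            delta = C * D
            f *= delta
            if abs(delta - 1.0) < eps:
                return f
        raise RuntimeError("continued fraction did not converge")

    # Illustrative check: tan(x) = x / (1 - x^2/(3 - x^2/(5 - ...)))
    x = 1.2
    a = lambda j: x if j == 1 else -x * x
    b = lambda j: 0.0 if j == 0 else 2.0 * j - 1.0
    print(lentz(a, b), math.tan(x))   # the two printed values agree closely

The tiny rescue constant is what keeps the recursion stable when a partial denominator vanishes; convergence is declared once the multiplicative update delta is within eps of one.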

Article information

Source
Electron. J. Statist. Volume 8, Number 1 (2014), 1113–1151.

Dates
First available in Project Euclid: 7 August 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1407415580

Digital Object Identifier
doi:10.1214/14-EJS910

Mathematical Reviews number (MathSciNet)
MR3263115

Zentralblatt MATH identifier
1298.62050

Subjects
Primary: 62F15: Bayesian inference
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Approximate Bayesian inference; continued fraction; Generalized Double Pareto distribution; Horseshoe distribution; Lentz's Algorithm; Normal-Exponential-Gamma distribution; special function

Citation

Neville, Sarah E.; Ormerod, John T.; Wand, M. P. Mean field variational Bayes for continuous sparse signal shrinkage: Pitfalls and remedies. Electron. J. Statist. 8 (2014), no. 1, 1113–1151. doi:10.1214/14-EJS910. https://projecteuclid.org/euclid.ejs/1407415580


References

  • [1] Abramowitz, M. and Stegun, I.A. (Eds.) (1972). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. New York: Dover Publications.
  • [2] Archambeau, C. and Bach, F. (2008). Sparse probabilistic projections. 21st Annual Conference on Neural Information Processing Systems, Vancouver, Canada, December 8–11.
  • [3] Armagan, A. (2009). Variational bridge regression. Journal of Machine Learning Research, Workshop and Conference Proceedings, 5, 17–24.
  • [4] Armagan, A., Dunson, D.B. and Clyde, M. (2011). Generalized beta mixtures of Gaussians. In Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R.S. Zemel, P. Bartlett, F. Pereira and K.Q. Weinberger (Eds.), 523–531.
  • [5] Armagan, A., Dunson, D.B. and Lee, J. (2013). Generalized double Pareto shrinkage. Statistica Sinica, 23, 119–143.
  • [6] Attias, H. (1999). Inferring parameters and structure of latent variable models by variational Bayes. Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, 21–30.
  • [7] Bishop, C.M. (2006). Pattern Recognition and Machine Learning. New York: Springer.
  • [8] Carbonetto, P. and Stephens, M. (2011). Scalable variational inference for Bayesian variable selection in regression, and its accuracy in genetic association studies. Bayesian Analysis, 6(4), 1–42.
  • [9] Carvalho, C.M., Polson, N.G. and Scott, J.G. (2010). The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
  • [10] Consonni, G. and Marin, J.M. (2007). Mean-field variational approximate Bayesian inference for latent variable models. Computational Statistics and Data Analysis, 52, 790–798.
  • [11] Cuyt, A., Petersen, V.B., Verdonk, B., Waadeland, H. and Jones, W.B. (2008). Handbook of Continued Fractions for Special Functions. New York: Springer.
  • [12] Flandin, G. and Penny, W.D. (2007). Bayesian fMRI data analysis with sparse spatial basis function priors. NeuroImage, 34, 1108–1125.
  • [13] Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Alken, P., Booth, M. and Rossi, F. (2009). GNU Scientific Library Reference Manual, 3rd Edition, Version 1.12, Bristol UK: Network Theory.
  • [14] Gradshteyn, I.S. and Ryzhik, I.M. (1994). Table of Integrals, Series, and Products, 5th Edition, San Diego, California: Academic Press.
  • [15] Griffin, J.E. and Brown, P.J. (2011). Bayesian hyper-lassos with non-convex penalization. Australian and New Zealand Journal of Statistics, 53, 423–442.
  • [16] Hankin, R.K.S. (2007). gsl 1.9. Wrapper for the GNU Scientific Library. R package. http://cran.r-project.org.
  • [17] Johnstone, I.M. and Silverman, B.W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. The Annals of Statistics, 32, 1594–1649.
  • [18] Johnstone, I.M. and Silverman, B.W. (2005). Bayes selection of wavelet thresholds. The Annals of Statistics, 33, 1700–1752.
  • [19] Lentz, W.J. (1976). Generating Bessel functions in Mie scattering calculations using continued fractions. Applied Optics, 15, 668–671.
  • [20] Ligges, U., Thomas, A., Spiegelhalter, D., Best, N., Lunn, D., Rice, K. and Sturtz, S. (2011). BRugs 0.5: OpenBUGS and its R/S-PLUS interface BRugs. R package. http://www.stats.ox.ac.uk/pub/RWin/bin/windows/contrib/2.13.
  • [21] Logsdon, B.A., Hoffman, G.E. and Mezey, J.G. (2010). A variational Bayes algorithm for fast and accurate multiple locus genome-wide association analysis. BMC Bioinformatics, 11:58, 1–13.
  • [22] Lunn, D.J., Thomas, A., Best, N. and Spiegelhalter, D. (2000). WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.
  • [23] McGrory, C.A. and Titterington, D.M. (2007). Variational approximations in Bayesian model selection for finite mixture distributions. Computational Statistics and Data Analysis, 51, 5352–5367.
  • [24] Neville, S.E. (2013). Elaborate Distribution Semiparametric Regression via Mean Field Variational Bayes. PhD Thesis, University of Wollongong.
  • [25] Ormerod, J.T. and Wand, M.P. (2010). Explaining variational approximations. The American Statistician, 64, 140–153.
  • [26] Polson, N.G. and Scott, J.G. (2010). Shrink globally, act locally: sparse Bayesian regularization and prediction. In Bayesian Statistics 9, J.M. Bernardo, M.J. Bayarri, J.O. Berger, A.P. Dawid, D. Heckerman, A.F.M. Smith and M. West (Eds.). Oxford: Oxford University Press.
  • [27] Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (1992). Numerical Recipes: The Art of Scientific Computing, 2nd Edition. New York: Cambridge University Press.
  • [28] R Development Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org
  • [29] Teschendorff, A.E., Wang, Y., Barbosa-Morais, N.L., Brenton, J.D. and Caldas, C. (2005). A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data. Bioinformatics, 21, 3025–3033.
  • [30] Tipping, M.E. and Lawrence, N.D. (2003). A variational approach to robust Bayesian interpolation. IEEE Workshop on Neural Networks for Signal Processing, 229–238.
  • [31] Wainwright, M.J. and Jordan, M.I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1, 1–305.
  • [32] Wand, M.P. and Ormerod, J.T. (2011). Penalized wavelets: embedding wavelets into semiparametric regression. Electronic Journal of Statistics, 5, 1654–1717.
  • [33] Wand, M.P. and Ormerod, J.T. (2012). Continued fraction enhancement of Bayesian computing. Stat., 1, 31–41.
  • [34] Wand, M.P., Ormerod, J.T., Padoan, S.A. and Frühwirth, R. (2011). Mean field variational Bayes for elaborate distributions. Bayesian Analysis, 6, 847–900.
  • [35] Wand, M.P. and Ripley, B.D. (2010). KernSmooth 2.23. Functions for kernel smoothing corresponding to the book: Wand, M.P. and Jones, M.C. (1995) “Kernel Smoothing”. R package. http://cran.r-project.org
  • [36] Whittaker, E.T. and Watson, G.N. (1990). A Course in Modern Analysis, 4th Edition, Cambridge UK: Cambridge University Press.
  • [37] Wuertz, D. and others. (2009). fAsianOptions 2100.76. Exponential Brownian motion and Asian option evaluation. R package. http://cran.r-project.org