## The Annals of Applied Statistics

### On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example

#### Abstract

Variational inference is a general approach for approximating complex density functions, such as those arising in latent variable models, popular in machine learning. It has been applied to approximate the maximum likelihood estimator and to carry out Bayesian inference; however, quantification of uncertainty with variational inference remains challenging from both theoretical and practical perspectives. This paper is concerned with developing uncertainty measures for variational inference by using bootstrap procedures. We first develop two general bootstrap approaches for assessing the uncertainty of a variational estimate and then study the underlying bootstrap theory in both fixed- and increasing-dimension settings. We then use the bootstrap approach and our theoretical results in the context of mixed membership modeling with multivariate binary data on functional disability from the National Long Term Care Survey. We carry out a two-sample approach to test for changes in the repeated measures of functional disability for the subset of individuals present in both the 1989 and 1994 waves.
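The abstract does not spell out the two bootstrap procedures, so the following is only a generic sketch of the idea it describes: resample individuals, refit, and read uncertainty off the bootstrap draws. Everything named here is an assumption for illustration. `fit_estimate` is a hypothetical stand-in (a column-wise mean of binary responses) for a variational fitting routine, `paired_bootstrap_diff` is a hypothetical helper, and the data are simulated rather than taken from the National Long Term Care Survey. Individuals are resampled as pairs because the same people are assumed to appear in both waves.

```python
import numpy as np


def fit_estimate(data):
    """Hypothetical stand-in for a variational fitting routine.

    Returns the column-wise mean of a binary (n_individuals, n_items) array.
    A real analysis would fit the model here and return its parameter estimate.
    """
    return data.mean(axis=0)


def paired_bootstrap_diff(wave_a, wave_b, estimator, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for estimator(wave_b) - estimator(wave_a).

    Rows of wave_a and wave_b are assumed to refer to the same individuals,
    so each bootstrap draw resamples row indices once and applies them to both.
    """
    rng = np.random.default_rng(seed)
    n = wave_a.shape[0]
    point = estimator(wave_b) - estimator(wave_a)
    draws = np.empty((n_boot,) + point.shape)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample individuals with replacement
        draws[b] = estimator(wave_b[idx]) - estimator(wave_a[idx])
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1 - alpha / 2, axis=0)
    return point, lower, upper


# Simulated example: 400 individuals, 4 binary items, with a shift in item 0 only.
sim_rng = np.random.default_rng(1)
wave_a = sim_rng.binomial(1, 0.3, size=(400, 4))
wave_b = sim_rng.binomial(1, np.array([0.6, 0.3, 0.3, 0.3]), size=(400, 4))

point, lower, upper = paired_bootstrap_diff(wave_a, wave_b, fit_estimate)
# An item whose interval excludes zero suggests a change between the two waves.
changed = (lower > 0) | (upper < 0)
```

In this sketch, an interval that excludes zero for an item is read as evidence of change between waves. Whether such percentile intervals are valid for a variational estimate is exactly the kind of question the paper's theory addresses, so the sketch should not be taken as the authors' procedure.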

#### Article information

Source
Ann. Appl. Stat., Volume 12, Number 2 (2018), 846-876.

Dates
Revised: April 2018
First available in Project Euclid: 28 July 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1532743479

Digital Object Identifier
doi:10.1214/18-AOAS1169

Mathematical Reviews number (MathSciNet)
MR3834288

#### Citation

Chen, Yen-Chi; Wang, Y. Samuel; Erosheva, Elena A. On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example. Ann. Appl. Stat. 12 (2018), no. 2, 846–876. doi:10.1214/18-AOAS1169. https://projecteuclid.org/euclid.aoas/1532743479
