## Electronic Journal of Statistics

### Convergence rates of latent topic models under relaxed identifiability conditions

Yining Wang

#### Abstract

In this paper we study the frequentist convergence rate for the Latent Dirichlet Allocation (Blei, Ng and Jordan, 2003) topic model. We show that the maximum likelihood estimator converges to one of the finitely many equivalent parameters in the Wasserstein distance at a rate of $n^{-1/4}$, without assuming separability or non-degeneracy of the underlying topics or the existence of more than three words per document. This generalizes the previous works of Anandkumar et al. (2012, 2014) from an information-theoretic perspective. We also show that the $n^{-1/4}$ convergence rate is optimal in the worst case.
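To give a rough sense of what an $n^{-1/4}$ rate implies for sample sizes, the sketch below assumes the estimation error scales exactly as $C n^{-a}$ for some constant $C$. This is an idealization for illustration only: the abstract states a rate, not constants, and the function name `sample_multiplier` is hypothetical rather than anything from the paper.

```python
def sample_multiplier(rate_exponent: float, error_reduction: float) -> float:
    """Factor by which n must grow to shrink the error by `error_reduction`.

    Assumes error(n) = C * n ** (-rate_exponent), an idealized scaling
    used only for illustration. Solving
        C * (m * n) ** (-a) = C * n ** (-a) / r
    for m gives m = r ** (1 / a).
    """
    if rate_exponent <= 0 or error_reduction <= 0:
        raise ValueError("rate_exponent and error_reduction must be positive")
    return error_reduction ** (1.0 / rate_exponent)


# Halving the error under the n^{-1/4} rate from the abstract,
# compared with the parametric n^{-1/2} rate.
slow = sample_multiplier(0.25, 2.0)
fast = sample_multiplier(0.5, 2.0)
print(f"n^(-1/4): need {slow:g}x more documents to halve the error")
print(f"n^(-1/2): need {fast:g}x more documents to halve the error")
```

Under this idealized scaling, halving the error takes 16 times as many documents at the $n^{-1/4}$ rate, against 4 times as many at the $n^{-1/2}$ rate.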

#### Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 37-66.

Dates
First available in Project Euclid: 4 January 2019

https://projecteuclid.org/euclid.ejs/1546570941

Digital Object Identifier
doi:10.1214/18-EJS1516

#### Citation

Wang, Yining. Convergence rates of latent topic models under relaxed identifiability conditions. Electron. J. Statist. 13 (2019), no. 1, 37–66. doi:10.1214/18-EJS1516. https://projecteuclid.org/euclid.ejs/1546570941

#### References

• Anandkumar, A., Ge, R. and Janzamin, M. (2017). Analyzing tensor power method dynamics in overcomplete regime. Journal of Machine Learning Research 18 1–40.
• Anandkumar, A., Foster, D. P., Hsu, D. J., Kakade, S. M. and Liu, Y.-K. (2012). A spectral algorithm for latent Dirichlet allocation. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
• Anandkumar, A., Ge, R., Hsu, D. J., Kakade, S. M. and Telgarsky, M. (2014). Tensor decompositions for learning latent variable models. Journal of Machine Learning Research 15 2773–2832.
• Arora, S., Ge, R. and Moitra, A. (2012). Learning topic models–going beyond SVD. In Proceedings of the IEEE Annual Symposium on Foundations of Computer Science (FOCS).
• Arora, S., Ge, R., Halpern, Y., Mimno, D., Moitra, A., Sontag, D., Wu, Y. and Zhu, M. (2013). A practical algorithm for topic modeling with provable guarantees. In Proceedings of the International Conference on Machine Learning (ICML).
• Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM 55 77–84.
• Blei, D. M. and Lafferty, J. D. (2006). Dynamic topic models. In Proceedings of the International Conference on Machine Learning (ICML).
• Blei, D. M., Ng, A. Y. and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research 3 993–1022.
• Chen, J. (1995). Optimal rate of convergence for finite mixture models. The Annals of Statistics 23 221–233.
• Cheng, D., He, X. and Liu, Y. (2015). Model selection for topic models via spectral decomposition. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS).
• Drton, M. (2016). Algebraic problems in structural equation modeling. arXiv preprint arXiv:1612.05994.
• Drton, M., Foygel, R. and Sullivant, S. (2011). Global identifiability of linear structural equation models. The Annals of Statistics 39 865–886.
• Fei-Fei, L. and Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
• Ge, R., Huang, Q. and Kakade, S. M. (2015). Learning mixtures of Gaussians in high dimensions. In Proceedings of the Annual ACM Symposium on Theory of Computing (STOC).
• Griffiths, T. L. and Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences 101 5228–5235.
• Hager, W. W. (1984). Condition estimates. SIAM Journal on Scientific and Statistical Computing 5 311–316.
• Ho, N. and Nguyen, X. (2016). Singularity structures and impacts on parameter estimation in finite mixtures of distributions. arXiv preprint arXiv:1609.02655.
• Hsu, D. and Kakade, S. M. (2013). Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In Proceedings of the Conference on Innovations in Theoretical Computer Science (ITCS).
• Huang, F., Niranjan, U., Hakeem, M. U. and Anandkumar, A. (2015). Online tensor methods for learning latent variable models. Journal of Machine Learning Research 16 2797–2835.
• Leung, D., Drton, M. and Hara, H. (2016). Identifiability of directed Gaussian graphical models with one latent source. Electronic Journal of Statistics 10 394–422.
• Ma, T., Shi, J. and Steurer, D. (2016). Polynomial-time tensor decompositions with sum-of-squares. In Proceedings of the IEEE Annual Symposium on Foundations of Computer Science (FOCS). IEEE.
• Nguyen, X. (2013). Convergence of latent mixing measures in finite and infinite mixture models. The Annals of Statistics 41 370–400.
• Nguyen, X. (2015). Posterior contraction of the population polytope in infinite admixture models. Bernoulli 21 618–646.
• Nguyen, X. (2016). Borrowing strength in hierarchical Bayes: Posterior concentration of the Dirichlet base measure. Bernoulli 22 1535–1571.
• Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. The Annals of Probability 22 28–76.
• Tang, J., Meng, Z., Nguyen, X., Mei, Q. and Zhang, M. (2014). Understanding the limiting factors of topic modeling via posterior contraction analysis. In Proceedings of the International Conference on Machine Learning (ICML).
• Van der Vaart, A. W. (1998). Asymptotic Statistics 3. Cambridge University Press.
• Wang, Y. and Anandkumar, A. (2016). Online and differentially-private tensor decomposition. In Proceedings of the Advances in Neural Information Processing Systems (NIPS).
• Wang, Y. and Zhu, J. (2014). Spectral methods for supervised topic models. In Proceedings of the Advances in Neural Information Processing Systems (NIPS).
• Wang, Y., Tung, H.-Y., Smola, A. J. and Anandkumar, A. (2015). Fast and guaranteed tensor decomposition via sketching. In Proceedings of Advances in Neural Information Processing Systems (NIPS).
• Watanabe, S. (2009). Algebraic Geometry and Statistical Learning Theory. Cambridge University Press.
• Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research 14 867–897.