## The Annals of Statistics

### An operator theoretic approach to nonparametric mixture models

#### Abstract

When estimating finite mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this work, we make no distributional assumptions on the mixture components and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same mixture component. We precisely characterize the number of observations $n$ per group needed for the mixture model to be identifiable, as a function of the number $m$ of mixture components. In addition to our assumption-free analysis, we also study the settings where the mixture components are either linearly independent or jointly irreducible. Furthermore, our analysis considers two kinds of identifiability, where the mixture model is the simplest one explaining the data, and where it is the only one. As an application of these results, we precisely characterize identifiability of multinomial mixture models. Our analysis relies on an operator-theoretic framework that associates mixture models in the grouped-sample setting with certain infinite-dimensional tensors. Based on this framework, we introduce a general spectral algorithm for recovering the mixture components.
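To make the grouped-sample setting concrete, the following is a minimal sketch for discrete (multinomial) components. It is not the authors' spectral algorithm. It assumes that a group of $n$ observations drawn from a single component has joint law $\sum_i w_i \mu_i^{\otimes n}$; the function name `group_distribution` and the two toy mixtures are hypothetical, chosen only for illustration. The sketch shows two different mixtures that cannot be told apart from one observation per group but can be from two.

```python
from itertools import product
from math import prod


def group_distribution(weights, components, n):
    """Joint law of one group of n observations: sum_i w_i * mu_i^{(x) n}.

    weights    -- mixing weights w_i (assumed to sum to 1)
    components -- list of probability vectors mu_i over k discrete outcomes
    Returns a dict mapping each n-tuple of outcomes to its probability.
    """
    k = len(components[0])  # number of discrete outcomes
    joint = {}
    for outcome in product(range(k), repeat=n):
        # Within a group, all n draws come from the same component,
        # so the probability factorizes per component before mixing.
        joint[outcome] = sum(
            w * prod(mu[x] for x in outcome)
            for w, mu in zip(weights, components)
        )
    return joint


# Hypothetical mixture A: two point-mass components with equal weights.
mix_a = ([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
# Hypothetical mixture B: a single uniform component.
mix_b = ([1.0], [[0.5, 0.5]])

one_a = group_distribution(*mix_a, n=1)
one_b = group_distribution(*mix_b, n=1)
two_a = group_distribution(*mix_a, n=2)
two_b = group_distribution(*mix_b, n=2)

print(one_a == one_b)  # same law with one observation per group
print(two_a == two_b)  # different laws with two observations per group
```

In this toy case the two mixtures induce the same distribution for $n = 1$ and different distributions for $n = 2$, which is the kind of dependence on group size that the identifiability results quantify in general.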

#### Article information

Source
Ann. Statist., Volume 47, Number 5 (2019), 2704-2733.

Dates
Revised: March 2018
First available in Project Euclid: 3 August 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1564797861

Digital Object Identifier
doi:10.1214/18-AOS1762

Mathematical Reviews number (MathSciNet)
MR3988770

Subjects
Primary: 62E10: Characterization and structure theory
Secondary: 62G05: Estimation

#### Citation

Vandermeulen, Robert A.; Scott, Clayton D. An operator theoretic approach to nonparametric mixture models. Ann. Statist. 47 (2019), no. 5, 2704–2733. doi:10.1214/18-AOS1762. https://projecteuclid.org/euclid.aos/1564797861

#### Supplemental materials

• Supplement to “An operator theoretic approach to nonparametric mixture models”. Technical results and additional algorithmic details.