The Annals of Statistics

Fixed points of the EM algorithm and nonnegative rank boundaries

Kaie Kubjas, Elina Robeva, and Bernd Sturmfels

Full-text: Open access

Abstract

Mixtures of $r$ independent distributions for two discrete random variables can be represented by matrices of nonnegative rank $r$. Likelihood inference for the model of such joint distributions leads to problems in real algebraic geometry that are addressed here for the first time. We characterize the set of fixed points of the Expectation–Maximization algorithm, and we study the boundary of the space of matrices with nonnegative rank at most $3$. Both of these sets correspond to algebraic varieties with many irreducible components.

Article information

Source
Ann. Statist., Volume 43, Number 1 (2015), 422-461.

Dates
First available in Project Euclid: 6 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1423230085

Digital Object Identifier
doi:10.1214/14-AOS1282

Mathematical Reviews number (MathSciNet)
MR3311865

Zentralblatt MATH identifier
1308.62035

Subjects
Primary: 62F10: Point estimation 13P25: Applications of commutative algebra (e.g., to statistics, control theory, optimization, etc.)

Keywords
Maximum likelihood EM algorithm mixture model nonnegative rank

Citation

Kubjas, Kaie; Robeva, Elina; Sturmfels, Bernd. Fixed points of the EM algorithm and nonnegative rank boundaries. Ann. Statist. 43 (2015), no. 1, 422--461. doi:10.1214/14-AOS1282. https://projecteuclid.org/euclid.aos/1423230085


Export citation

References

  • [1] Aggarwal, A., Booth, H., O’Rourke, J., Suri, S. and Yap, C.-K. (1989). Finding minimal convex nested polygons. Inform. and Comput. 83 98–110.
  • [2] Allman, E. S., Matias, C. and Rhodes, J. A. (2009). Identifiability of parameters in latent structure models with many observed variables. Ann. Statist. 37 3099–3132.
  • [3] Allman, E. S., Rhodes, J., Sturmfels, B. and Zwiernik, P. (2014). Tensors of nonnegative rank two. Linear Algebra Appl. Special Issue on Statistics. To appear. DOI:10.1016/j.laa.2013.10.046.
  • [4] Allman, E. S., Rhodes, J. A. and Taylor, A. (2014). A semialgebraic description of the general Markov model on phylogenetic trees. SIAM J. Discrete Math. 28 736–755.
  • [5] Basu, S., Pollack, R. and Roy, M.-F. (2003). Algorithms in Real Algebraic Geometry. Algorithms and Computation in Mathematics 10. Springer, Berlin.
  • [6] Beale, E. (1977). Discussion of “Maximum likelihood from incomplete data via the EM algorithm” by A. Dempster, N. Laird and D. Rubin. J. Roy. Statist. Soc. Ser. B 39 22–23.
  • [7] Bocci, C., Carlini, E. and Rapallo, F. (2011). Perturbation of matrices and nonnegative rank with a view toward statistical models. SIAM J. Matrix Anal. Appl. 32 1500–1512.
  • [8] Bosma, W., Cannon, J. and Playoust, C. (1997). The Magma algebra system. I. The user language. J. Symbolic Comput. 24 235–265.
  • [9] Brion, M. and Kumar, S. (2005). Frobenius Splitting Methods in Geometry and Representation Theory. Progress in Mathematics 231. Birkhäuser, Boston.
  • [10] Cohen, J. E. and Rothblum, U. G. (1993). Nonnegative ranks, decompositions, and factorizations of nonnegative matrices. Linear Algebra Appl. 190 149–168.
  • [11] Cox, D., Little, J. and O’Shea, D. (1992). Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. Springer, New York.
  • [12] Critch, A. (2013). Binary hidden Markov models and varieties. J. Algebr. Stat. 4 1–30.
  • [13] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–22.
  • [14] Drton, M., Sturmfels, B. and Sullivant, S. (2009). Lectures on Algebraic Statistics. Oberwolfach Seminars 39. Birkhäuser, Basel.
  • [15] Eggermont, R., Horobeţ, E. and Kubjas, K. Algebraic boundary of matrices of nonnegative rank at most three. Available at arXiv:1412.1654.
  • [16] Eisenbud, D. and Sturmfels, B. (1996). Binomial ideals. Duke Math. J. 84 1–45.
  • [17] Fawzi, H. and Parrilo, P. Self-scaled bounds for atomic cone ranks: Applications to nonnegative rank and cp-rank. Available at arXiv:1404.3240.
  • [18] Fienberg, S. (1977). Discussion of “Maximum likelihood from incomplete data via the EM algorithm” by A. Dempster, N. Laird and D. Rubin. J. Roy. Statist. Soc. Ser. B 39 29–30.
  • [19] Fienberg, S. E., Hersh, P., Rinaldo, A. and Zhou, Y. (2010). Maximum likelihood estimation in latent class models for contingency table data. In Algebraic and Geometric Methods in Statistics 27–62. Cambridge Univ. Press, Cambridge.
  • [20] Garcia, L. D., Stillman, M. and Sturmfels, B. (2005). Algebraic geometry of Bayesian networks. J. Symbolic Comput. 39 331–355.
  • [21] Georgi, B. and Schliep, A. (2006). Context-specific independence mixture modeling for positional weight matrices. Bioinformatics 22 166–173.
  • [22] Grayson, D. and Stillman, M. Macaulay2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
  • [23] Gross, E. and Rodriguez, J. Maximum likelihood geometry in the presence of data zeros. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation 232–239. ACM, New York.
  • [24] Hauenstein, J., Rodriguez, J. and Sturmfels, B. (2014). Maximum likelihood for matrices with rank constraints. J. Algebr. Stat. 5 18–38.
  • [25] Huh, J. and Sturmfels, B. (2014). Likelihood geometry. In Combinatorial Algebraic Geometry. Lecture Notes in Math. 2108 63–117. Springer, Cham.
  • [26] Lee, D. and Seung, H. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature 401 788–791.
  • [27] Miller, E. and Sturmfels, B. (2005). Combinatorial Commutative Algebra. Graduate Texts in Mathematics 227. Springer, New York.
  • [28] Moitra, A. (2012). An almost optimal algorithm for computing nonnegative rank. In Proceedings of the Twenty-Fourth Annual ACM–SIAM Symposium on Discrete Algorithms 1454–1464. SIAM, Philadelphia.
  • [29] Mond, D., Smith, J. and van Straten, D. (2003). Stochastic factorizations, sandwiched simplices and the topology of the space of explanations. R. Soc. Lond. Proc. Ser. A Math. Phys. Eng. Sci. 459 2821–2845.
  • [30] Murray, G. (1977). Discussion of “Maximum likelihood from incomplete data via the EM algorithm” by A. Dempster, N. Laird and D. Rubin. J. Roy. Statist. Soc. Ser. B 39 27–28.
  • [31] Pachter, L. and Sturmfels, B. (2005). Algebraic Statistics for Computational Biology. Cambridge Univ. Press, New York.
  • [32] Sturmfels, B. (2002). Solving Systems of Polynomial Equations. CBMS Regional Conference Series in Mathematics 97. AMS, Washington, DC.
  • [33] Vavasis, S. A. (2009). On the complexity of nonnegative matrix factorization. SIAM J. Optim. 20 1364–1377.
  • [34] White, N. L. (1997). Geometric applications of the Grassmann–Cayley algebra. In Handbook of Discrete and Computational Geometry 881–892. CRC, Boca Raton.
  • [35] Zhu, M., Jiang, G. and Gao, S. (2011). Solving the 100 Swiss francs problem. Math. Comput. Sci. 5 195–207.
  • [36] Zwiernik, P. and Smith, J. Q. (2011). Implicit inequality constraints in a binary tree model. Electron. J. Stat. 5 1276–1312.