## The Annals of Statistics

### Computation of maximum likelihood estimates in cyclic structural equation models

#### Abstract

Software for computation of maximum likelihood estimates in linear structural equation models typically employs general techniques from nonlinear optimization, such as quasi-Newton methods. In practice, careful tuning of initial values is often required to avoid convergence issues. As an alternative approach, we propose a block-coordinate descent method that cycles through the considered variables, updating only the parameters related to a given variable in each step. We show that the resulting block update problems can be solved in closed form even when the structural equation model comprises feedback cycles. Furthermore, we give a characterization of the models for which the block-coordinate descent algorithm is well defined, meaning that for generic data and starting values all block optimization problems admit a unique solution. For the characterization, we represent each model by its mixed graph (also known as path diagram), which leads to criteria that can be checked in time that is polynomial in the number of considered variables.
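To make the optimization target concrete, the following is a minimal sketch of the Gaussian objective that such software minimizes. It is not the block-coordinate descent update proposed in the paper. It assumes the convention $X = B^T X + \varepsilon$ with $\mathrm{Cov}(\varepsilon) = \Omega$, so that $\Sigma = (I-B)^{-T}\,\Omega\,(I-B)^{-1}$. That convention and the function names `sem_covariance` and `neg_log_likelihood` are illustrative assumptions and are not taken from this page.

```python
import numpy as np


def sem_covariance(B, Omega):
    """Covariance implied by X = B^T X + eps, Cov(eps) = Omega (assumed convention)."""
    p = B.shape[0]
    inv = np.linalg.inv(np.eye(p) - B)  # (I - B)^{-1}; requires I - B invertible
    return inv.T @ Omega @ inv          # (I - B)^{-T} Omega (I - B)^{-1}


def neg_log_likelihood(B, Omega, S, n):
    """Negative Gaussian log-likelihood, dropping the additive constant n*p*log(2*pi)/2.

    S is the sample covariance matrix and n the sample size.
    Feedback cycles are allowed as long as I - B is invertible.
    """
    p = B.shape[0]
    I_minus_B = np.eye(p) - B
    sign_b, logabsdet_b = np.linalg.slogdet(I_minus_B)
    sign_o, logdet_o = np.linalg.slogdet(Omega)
    if sign_b == 0 or sign_o <= 0:
        raise ValueError("I - B must be invertible and Omega positive definite")
    # log det Sigma = log det Omega - 2 * log |det(I - B)|
    logdet_sigma = logdet_o - 2.0 * logabsdet_b
    # Sigma^{-1} = (I - B) Omega^{-1} (I - B)^T
    K = I_minus_B @ np.linalg.solve(Omega, I_minus_B.T)
    return 0.5 * n * (logdet_sigma + np.trace(K @ S))


# Usage: a two-variable feedback cycle 1 -> 2 -> 1 with uncorrelated errors.
B = np.array([[0.0, 0.5],
              [-0.3, 0.0]])
Omega = np.diag([1.0, 2.0])
Sigma = sem_covariance(B, Omega)
value = neg_log_likelihood(B, Omega, S=Sigma, n=100)
```

In the acyclic case $\det(I-B) = 1$, so the determinant term depends on $\Omega$ alone. With feedback cycles it also depends on $B$, which is why that term is computed explicitly here.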

#### Article information

Source
Ann. Statist., Volume 47, Number 2 (2019), 663-690.

Dates
Received: October 2016
Revised: May 2017
First available in Project Euclid: 11 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1547197234

Digital Object Identifier
doi:10.1214/17-AOS1602

Mathematical Reviews number (MathSciNet)
MR3909946

Zentralblatt MATH identifier
07033147

Subjects
Primary: 62H12: Estimation; 62F10: Point estimation

#### Citation

Drton, Mathias; Fox, Christopher; Wang, Y. Samuel. Computation of maximum likelihood estimates in cyclic structural equation models. Ann. Statist. 47 (2019), no. 2, 663–690. doi:10.1214/17-AOS1602. https://projecteuclid.org/euclid.aos/1547197234

#### Supplemental materials

• Proofs of claims. The supplement provides proofs for claims made in Sections 2, 3 and 4. Specifically, we verify the form of $\det(I-B)$ claimed in Lemma 1 and derive the likelihood equations with respect to $\Omega$ and $B$. We also verify the claims in Lemmas 4 and 5, which are required for the BCD algorithm described in the constructive proof of Theorem 1. Finally, we verify the claims in Section 4 that characterize the graphs for which the BCD algorithm is well defined when initialized generically. In particular, we give a graphical condition and show that it can be checked in time that is polynomial in the number of considered variables.