The Annals of Statistics

Causal inference in partially linear structural equation models

Dominik Rothenhäusler, Jan Ernest, and Peter Bühlmann



We consider identifiability of partially linear additive structural equation models with Gaussian noise (PLSEMs) and estimation of models that are distributionally equivalent to a given PLSEM. We also include robustness results for errors in the neighborhood of Gaussian distributions. Existing identifiability results in the framework of additive SEMs with Gaussian noise are limited to linear and nonlinear SEMs, which can be considered as special cases of PLSEMs with vanishing nonparametric or parametric part, respectively. We close the wide gap between these two special cases by providing a comprehensive theory of the identifiability of PLSEMs by means of (A) a graphical, (B) a transformational, (C) a functional and (D) a causal ordering characterization of PLSEMs that generate a given distribution $\mathbb{P}$. In particular, the characterizations (C) and (D) answer the fundamental question of to what extent nonlinear functions in additive SEMs with Gaussian noise restrict the set of potential causal models and hence influence the identifiability.

On the basis of the transformational characterization (B), we provide a score-based estimation procedure that outputs the graphical representation (A) of the distribution equivalence class of a given PLSEM. We derive its (high-dimensional) consistency and demonstrate its performance on simulated datasets.
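To make the model class concrete, the following is a minimal sketch of sampling from a toy PLSEM: an additive SEM with Gaussian noise in which some edge functions are linear and others nonlinear. The three-variable graph, the coefficients, the noise scales and the function name `simulate_plsem` are all hypothetical choices for illustration; they are not taken from the paper and this is not the authors' simulation setup.

```python
import math
import random


def simulate_plsem(n, seed=0):
    """Draw n samples from a toy 3-variable PLSEM (illustrative assumption).

    Assumed DAG: X1 -> X2, X2 -> X3, X1 -> X3.
    X1 -> X2 and X1 -> X3 are linear edges; X2 -> X3 is a nonlinear edge.
    All noise terms are independent Gaussians.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)                      # source node
        x2 = 0.8 * x1 + rng.gauss(0.0, 0.5)           # linear in X1
        x3 = (math.sin(2.0 * x2)                      # nonlinear in X2
              - 0.5 * x1                              # linear in X1
              + rng.gauss(0.0, 0.5))
        samples.append((x1, x2, x3))
    return samples


samples = simulate_plsem(1000)
```

Replacing `math.sin(2.0 * x2)` with a linear term would give a purely linear Gaussian SEM, and making every edge function nonlinear would give a fully nonlinear additive SEM; these are the two special cases between which the abstract places PLSEMs.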

Article information

Ann. Statist., Volume 46, Number 6A (2018), 2904-2938.

Received: July 2016
Revised: October 2017
First available in Project Euclid: 7 September 2018

Primary: 62G99: None of the above, but in this section; 62H99: None of the above, but in this section
Secondary: 68T99: None of the above, but in this section

Keywords: Causal inference; distribution equivalence class; graphical model; high-dimensional consistency; partially linear structural equation model


Rothenhäusler, Dominik; Ernest, Jan; Bühlmann, Peter. Causal inference in partially linear structural equation models. Ann. Statist. 46 (2018), no. 6A, 2904–2938. doi:10.1214/17-AOS1643.



