The Annals of Applied Statistics

Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data

Mireille E. Schnitzer, Mark J. van der Laan, Erica E. M. Moodie, and Robert W. Platt

Full-text: Open access


The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized a program encouraging breastfeeding to new mothers in hospital centers. The original studies indicated that this intervention successfully increased duration of breastfeeding and lowered rates of gastrointestinal tract infections in newborns. Additional scientific and popular interest lies in determining the causal effect of longer breastfeeding on gastrointestinal infection. In this study, we estimate the expected infection count under various lengths of breastfeeding in order to estimate the effect of breastfeeding duration on infection. Due to the presence of baseline and time-dependent confounding, specialized “causal” estimation methods are required. We demonstrate the double-robust method of Targeted Maximum Likelihood Estimation (TMLE) in the context of this application and review some related methods and the adjustments required to account for clustering. We compare TMLE (implemented both parametrically and using a data-adaptive algorithm) to other causal methods for this example. In addition, we conduct a simulation study to determine (1) the effectiveness of controlling for clustering indicators when cluster-specific confounders are unmeasured and (2) the importance of using data-adaptive TMLE.

Article information

Ann. Appl. Stat., Volume 8, Number 2 (2014), 703-725.

First available in Project Euclid: 1 July 2014


Keywords

Causal inference, G-computation, inverse probability weighting, marginal effects, missing data, pediatrics


Schnitzer, Mireille E.; van der Laan, Mark J.; Moodie, Erica E. M.; Platt, Robert W. Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data. Ann. Appl. Stat. 8 (2014), no. 2, 703–725. doi:10.1214/14-AOAS727.




Supplemental materials

  • Supplementary material: The efficient influence curve for clustered data and data generation for the simulation study. Derivation of the efficient influence curve used in the TMLE analysis. Full description (with R code) of the data generation used in the simulation study.