The Annals of Applied Statistics

Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data

Mireille E. Schnitzer, Mark J. van der Laan, Erica E. M. Moodie, and Robert W. Platt

Full-text: Open access


The PROmotion of Breastfeeding Intervention Trial (PROBIT) cluster-randomized a program encouraging breastfeeding to new mothers in hospital centers. The original studies indicated that this intervention successfully increased duration of breastfeeding and lowered rates of gastrointestinal tract infections in newborns. Additional scientific and popular interest lies in determining the causal effect of longer breastfeeding on gastrointestinal infection. In this study, we estimate the expected infection count under various lengths of breastfeeding in order to estimate the effect of breastfeeding duration on infection. Due to the presence of baseline and time-dependent confounding, specialized “causal” estimation methods are required. We demonstrate the double-robust method of Targeted Maximum Likelihood Estimation (TMLE) in the context of this application and review some related methods and the adjustments required to account for clustering. We compare TMLE (implemented both parametrically and using a data-adaptive algorithm) to other causal methods for this example. In addition, we conduct a simulation study to determine (1) the effectiveness of controlling for clustering indicators when cluster-specific confounders are unmeasured and (2) the importance of using data-adaptive TMLE.

Article information

Ann. Appl. Stat., Volume 8, Number 2 (2014), 703-725.

First available in Project Euclid: 1 July 2014

Keywords: Causal inference; G-computation; inverse probability weighting; marginal effects; missing data; pediatrics


Schnitzer, Mireille E.; van der Laan, Mark J.; Moodie, Erica E. M.; Platt, Robert W. Effect of breastfeeding on gastrointestinal infection in infants: A targeted maximum likelihood approach for clustered longitudinal data. Ann. Appl. Stat. 8 (2014), no. 2, 703-725. doi:10.1214/14-AOAS727.

Supplemental materials

  • Supplementary material: The efficient influence curve for clustered data and data generation for the simulation study. Derivation of the efficient influence curve used in the TMLE analysis. Full description (with R code) of the data generation used in the simulation study.