Annals of Applied Statistics

Marginal analysis of longitudinal count data in long sequences: Methods and applications to a driving study

Zhiwei Zhang, Paul S. Albert, and Bruce Simons-Morton


Most of the available methods for longitudinal data analysis are designed and validated for the situation where the number of subjects is large and the number of observations per subject is relatively small. Motivated by the Naturalistic Teenage Driving Study (NTDS), which represents the exact opposite situation, we examine standard and propose new methodology for marginal analysis of longitudinal count data in a small number of very long sequences. We consider standard methods based on generalized estimating equations, under working independence or an appropriate correlation structure, and find them unsatisfactory for dealing with time-dependent covariates when the counts are low. For this situation, we explore a within-cluster resampling (WCR) approach that involves repeated analyses of random subsamples with a final analysis that synthesizes results across subsamples. This leads to a novel WCR method which operates on separated blocks within subjects and which performs better than all of the previously considered methods. The methods are applied to the NTDS data and evaluated in simulation experiments mimicking the NTDS.
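
As a rough illustration of the "repeated analyses of random subsamples with a final analysis that synthesizes results" idea, the sketch below implements classical within-cluster resampling (one randomly drawn observation per subject) for an intercept-only marginal mean count. It is not the separated-blocks method proposed in the article, whose details are not given in the abstract. The function name `wcr_mean`, the dictionary data layout, and the toy counts are all assumptions made for illustration.

```python
import random
from statistics import mean, variance


def wcr_mean(sequences, n_resamples=500, seed=0):
    """Within-cluster resampling estimate of a marginal mean count.

    sequences: dict mapping a subject id to that subject's list of counts
               (hypothetical layout, chosen for this sketch).
    Returns (estimate, variance_estimate).
    """
    rng = random.Random(seed)
    subjects = sorted(sequences)
    if len(subjects) < 2 or n_resamples < 2:
        raise ValueError("need at least two subjects and two resamples")

    estimates, variances = [], []
    for _ in range(n_resamples):
        # Draw one observation per subject. The draws are independent across
        # subjects, so an ordinary independent-data analysis applies.
        draw = [rng.choice(sequences[s]) for s in subjects]
        estimates.append(mean(draw))
        variances.append(variance(draw) / len(draw))

    # Synthesis step: average the per-resample estimates, then subtract the
    # between-resample variance from the average within-resample variance.
    estimate = mean(estimates)
    var_estimate = mean(variances) - variance(estimates)
    return estimate, var_estimate


# Toy data (made up): three subjects with sequences of different lengths.
toy = {"a": [0, 1, 0, 2], "b": [3, 0, 1], "c": [0, 0, 5, 1, 1]}
est, var_est = wcr_mean(toy, n_resamples=200, seed=1)
print(est, var_est)
```

With few subjects the variance estimate from this subtraction can come out negative, so a practical analysis would need a safeguard; that choice is left open here.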

Article information

Ann. Appl. Stat., Volume 6, Number 1 (2012), 27-54.

First available in Project Euclid: 6 March 2012

Keywords: Correlation; generalized estimating equation; multiple outputation; overdispersion; random effect; separated blocks; within-cluster resampling


Zhang, Zhiwei; Albert, Paul S.; Simons-Morton, Bruce. Marginal analysis of longitudinal count data in long sequences: Methods and applications to a driving study. Ann. Appl. Stat. 6 (2012), no. 1, 27–54. doi:10.1214/11-AOAS507.

