Source: Ann. Appl. Stat. Volume 6, Number 1 (2012), 27–54.
Most of the available methods for longitudinal data analysis are
designed and validated for the situation where the number of
subjects is large and the number of observations per subject is
relatively small. Motivated by the Naturalistic Teenage Driving
Study (NTDS), which represents the exact opposite situation, we
examine standard methods and propose new methodology for marginal
analysis of longitudinal count data in a small number of very
long sequences. We consider standard methods based on
generalized estimating equations, under working independence or
an appropriate correlation structure, and find them
unsatisfactory for dealing with time-dependent covariates when
the counts are low. For this situation, we explore a
within-cluster resampling (WCR) approach that involves repeated
analyses of random subsamples with a final analysis that
synthesizes results across subsamples. This leads to a novel WCR
method that operates on separated blocks within subjects and
performs better than all of the previously considered
methods. The methods are applied to the NTDS data and evaluated
in simulation experiments mimicking the NTDS.
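
As a rough illustration of the resample-then-synthesize idea behind WCR, the sketch below estimates a marginal mean count by repeatedly drawing one observation per subject, analyzing each subsample, and pooling the results. It is not the block-based method proposed in the paper: the function name wcr_mean, the toy Poisson data, and the choice of a simple mean as the per-subsample analysis are all assumptions made for illustration. The pooled variance (average within-subsample variance minus the between-subsample variance of the estimates) follows the usual WCR form, with the divisor chosen here for convenience.

```python
import numpy as np


def wcr_mean(counts_by_subject, n_resamples=500, seed=0):
    """Within-cluster resampling sketch for a marginal mean count.

    counts_by_subject: list of 1-D arrays, one long sequence per subject.
    Returns (pooled estimate, pooled variance estimate).
    """
    rng = np.random.default_rng(seed)
    n_subjects = len(counts_by_subject)
    estimates = np.empty(n_resamples)
    variances = np.empty(n_resamples)

    for q in range(n_resamples):
        # Draw one observation at random from each subject, so the
        # subsample consists of independent observations.
        draw = np.array([rng.choice(y) for y in counts_by_subject], dtype=float)
        # Per-subsample analysis: here just a mean and its variance.
        estimates[q] = draw.mean()
        variances[q] = draw.var(ddof=1) / n_subjects

    # Synthesis step: average the estimates; the variance is the average
    # within-subsample variance minus the between-subsample variance.
    pooled = estimates.mean()
    pooled_var = variances.mean() - estimates.var(ddof=1)
    return pooled, pooled_var


# Toy data (assumed): 5 subjects, 200 low counts each, subject-specific rates.
data_rng = np.random.default_rng(1)
subjects = [data_rng.poisson(lam, size=200) for lam in (0.2, 0.5, 0.3, 0.8, 0.4)]

est, var = wcr_mean(subjects)
print(f"pooled mean = {est:.3f}, pooled variance = {var:.5f}")
```

With very few subjects the pooled variance from this subtraction can be unstable or even negative, which is one reason a sketch like this should be read as a description of the mechanics rather than a recommended analysis.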