The Annals of Applied Statistics

Marginal analysis of longitudinal count data in long sequences: Methods and applications to a driving study

Zhiwei Zhang, Paul S. Albert, and Bruce Simons-Morton
Source: Ann. Appl. Stat. Volume 6, Number 1 (2012), 27-54.

Abstract

Most of the available methods for longitudinal data analysis are designed and validated for the situation where the number of subjects is large and the number of observations per subject is relatively small. Motivated by the Naturalistic Teenage Driving Study (NTDS), which represents the exact opposite situation, we examine standard and propose new methodology for marginal analysis of longitudinal count data in a small number of very long sequences. We consider standard methods based on generalized estimating equations, under working independence or an appropriate correlation structure, and find them unsatisfactory for dealing with time-dependent covariates when the counts are low. For this situation, we explore a within-cluster resampling (WCR) approach that involves repeated analyses of random subsamples with a final analysis that synthesizes results across subsamples. This leads to a novel WCR method which operates on separated blocks within subjects and which performs better than all of the previously considered methods. The methods are applied to the NTDS data and evaluated in simulation experiments mimicking the NTDS.
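The resampling idea mentioned above can be illustrated with a minimal sketch of the classical one-observation-per-cluster form of WCR. This is not the block-based method the article proposes, whose details are not given in the abstract. Everything named here is an assumption made for illustration: the simulated data, the Poisson log-linear marginal model, the helper names `fit_poisson` and `wcr`, the number of resamples, and the choice of a sandwich variance within each resample. The synthesis step averages the per-resample estimates and estimates their covariance as the mean of the per-resample covariance estimates minus the sample covariance of the estimates across resamples.

```python
import numpy as np

def fit_poisson(X, y, n_iter=50, tol=1e-10):
    """Poisson log-linear fit by Newton-Raphson; returns (beta, sandwich covariance).

    Assumes column 0 of X is an intercept (an assumption of this sketch).
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))  # start at the overall log mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        info = X.T @ (X * mu[:, None])           # Fisher information
        step = np.linalg.solve(info, X.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])
    return beta, bread @ meat @ bread

def wcr(X_list, y_list, n_resamples=200, seed=0):
    """Within-cluster resampling with one randomly drawn observation per subject."""
    rng = np.random.default_rng(seed)
    betas, covs = [], []
    for _ in range(n_resamples):
        idx = [rng.integers(len(y)) for y in y_list]
        Xq = np.stack([X[i] for X, i in zip(X_list, idx)])
        yq = np.array([y[i] for y, i in zip(y_list, idx)], dtype=float)
        b, c = fit_poisson(Xq, yq)
        betas.append(b)
        covs.append(c)
    betas = np.array(betas)
    beta_hat = betas.mean(axis=0)
    # Mean within-resample covariance minus between-resample covariance.
    cov_hat = np.mean(covs, axis=0) - np.cov(betas, rowvar=False)
    return beta_hat, cov_hat

# Hypothetical data: 60 subjects, 200 counts each, one binary time-dependent
# covariate, and a mean-one gamma subject effect that induces correlation.
rng = np.random.default_rng(1)
true_beta = np.array([0.5, 0.4])
X_list, y_list = [], []
for _ in range(60):
    x = rng.integers(0, 2, size=200)
    X = np.column_stack([np.ones(200), x])
    frailty = rng.gamma(shape=4.0, scale=0.25)
    X_list.append(X)
    y_list.append(rng.poisson(frailty * np.exp(X @ true_beta)))

beta_hat, cov_hat = wcr(X_list, y_list)
print("WCR estimate:", beta_hat)
```

The difference of two covariance estimates is not guaranteed to be positive definite in finite samples, so in practice the result should be checked before standard errors are taken from its diagonal.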

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1331043387
Digital Object Identifier: doi:10.1214/11-AOAS507
Zentralblatt MATH identifier: 1235.62037
Mathematical Reviews number (MathSciNet): MR2951528

2013 © Institute of Mathematical Statistics
