Commonly used methods for analyzing incomplete longitudinal clinical trial data include complete case analysis (CC) and last observation carried forward (LOCF). However, such methods rest on strong assumptions: missing completely at random (MCAR) for CC, and an unchanging profile after dropout for LOCF. These assumptions are too strong to hold in general. Over the last decades, a number of full longitudinal data analysis methods have become available, such as the linear mixed model for Gaussian outcomes, that are valid under the much weaker missing at random (MAR) assumption. Such a method is useful even if the scientific question concerns a single time point, for example, the last planned measurement occasion, and it is generally consistent with the intention-to-treat principle. Its validity rests on the use of maximum likelihood, under which the missing data mechanism is ignorable as soon as it is MAR. In this paper, we focus on non-Gaussian outcomes, such as binary, categorical or count data. This setting is less straightforward since there is no unambiguous counterpart to the linear mixed model. We first provide an overview of the various modeling frameworks for non-Gaussian longitudinal data, and subsequently focus on two of them: generalized linear mixed-effects models, whose parameters can be estimated using full likelihood, and generalized estimating equations, a nonlikelihood method that hence requires a modification to be valid under MAR. We briefly comment on the position of models that assume missingness not at random (MNAR) and argue that they are most useful for sensitivity analysis. Our developments are illustrated using data from two studies. While the case studies feature binary outcomes, the methodology applies equally well to other discrete-data settings, hence the qualifier “discrete” in the title.
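The ignorability statement can be made concrete with the standard selection-model factorization. The sketch below is not taken from the abstract: the notation is assumed for illustration, with $y_i^o$ and $y_i^m$ the observed and missing parts of subject $i$'s outcome vector, $r_i$ the vector of missingness indicators, $\theta$ the parameters of the measurement model and $\psi$ those of the missingness mechanism. For discrete outcomes the integral is read as a sum.

```latex
% Observed-data likelihood contribution of subject i (selection-model form)
\[
  f(y_i^o, r_i \mid \theta, \psi)
    = \int f(y_i^o, y_i^m \mid \theta)\,
           f(r_i \mid y_i^o, y_i^m, \psi)\, dy_i^m .
\]
% MAR: missingness may depend on observed, but not on unobserved, outcomes
\[
  f(r_i \mid y_i^o, y_i^m, \psi) = f(r_i \mid y_i^o, \psi)
\]
% Under MAR the missingness factor leaves the integral
\[
  f(y_i^o, r_i \mid \theta, \psi)
    = f(y_i^o \mid \theta)\, f(r_i \mid y_i^o, \psi) .
\]
```

Provided $\theta$ and $\psi$ are also functionally distinct, likelihood inference about $\theta$ can be based on $f(y_i^o \mid \theta)$ alone, which is the sense in which the mechanism is ignorable. Estimating equations that are not likelihood based do not inherit this factorization, which is consistent with the remark that they need a modification to be valid under MAR.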