Statistical Science

A Family of Generalized Linear Models for Repeated Measures with Normal and Conjugate Random Effects

Geert Molenberghs, Geert Verbeke, Clarice G. B. Demétrio, and Afrânio M. C. Vieira

Full-text: Open access

Abstract

Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean–variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering, which in turn may result from repeatedly measuring the outcome on the same subject, from measuring various members of the same family, and so on. The first issue is dealt with through a variety of overdispersion models, such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Such random effects are conventionally, though not always, assumed to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper proposes a broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary, count and time-to-event cases are given particular emphasis. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic–numerical integration. Implications for the derivation of marginal correlation functions are discussed. The methodology is applied to data from a study of epileptic seizures, a clinical trial on the toenail infection onychomycosis, and survival data in children with asthma.
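
As an illustration of the two sets of random effects described in the abstract, the following minimal sketch considers one count-data case under stated assumptions: a Poisson outcome, a gamma-distributed multiplicative effect at the level of the mean (the gamma distribution being conjugate to the Poisson), and a normal random intercept inside a log-linear predictor. The symbols eta, d, alpha and beta, and all function names, are chosen here for illustration and are not taken from the paper. The variance formula follows from the laws of total expectation and total variance under these assumptions.

```python
import math
import random


def marginal_moments(eta, d, alpha, beta):
    """Marginal mean and variance of Y in the sketch model (illustrative names).

    Assumed model:
        Y | theta, b ~ Poisson(theta * exp(eta + b))
        theta ~ Gamma(shape=alpha, scale=beta)   # overdispersion effect
        b ~ Normal(0, d)                         # cluster-level effect
    """
    phi = alpha * beta            # E[theta]
    sigma2 = alpha * beta ** 2    # Var[theta]
    mean = phi * math.exp(eta + d / 2.0)
    # Var(Y) = mean + mean^2 * ((sigma2 + phi^2) / phi^2 * exp(d) - 1)
    ratio = (sigma2 + phi ** 2) / phi ** 2
    var = mean + mean ** 2 * (ratio * math.exp(d) - 1.0)
    return mean, var


def draw_poisson(lam, rng):
    """Knuth's multiplication method; adequate for small to moderate lam."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(eta, d, alpha, beta, n, seed=0):
    """Draw n independent counts, one per cluster, from the assumed model."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        b = rng.gauss(0.0, math.sqrt(d))        # normal random intercept
        theta = rng.gammavariate(alpha, beta)   # gamma effect on the mean
        counts.append(draw_poisson(theta * math.exp(eta + b), rng))
    return counts
```

In this sketch the marginal variance exceeds the marginal mean whenever either random effect has positive variance, which is the overdispersion the abstract refers to. Setting d = 0 with alpha = beta = 1 gives a variance of mean + mean^2, the negative-binomial form for that shape parameter. Because each simulated count comes from its own cluster, the simulation checks only the marginal moments, not the within-cluster correlation.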

Article information

Source
Statist. Sci. Volume 25, Number 3 (2010), 325-347.

Dates
First available in Project Euclid: 4 January 2011

Permanent link to this document
https://projecteuclid.org/euclid.ss/1294167963

Digital Object Identifier
doi:10.1214/10-STS328

Mathematical Reviews number (MathSciNet)
MR2791671

Zentralblatt MATH identifier
1329.62342

Keywords
Bernoulli model; Beta–binomial model; Cauchy distribution; conjugacy; maximum likelihood; frailty model; negative-binomial model; Poisson model; strong conjugacy; Weibull model

Citation

Molenberghs, Geert; Verbeke, Geert; Demétrio, Clarice G. B.; Vieira, Afrânio M. C. A Family of Generalized Linear Models for Repeated Measures with Normal and Conjugate Random Effects. Statist. Sci. 25 (2010), no. 3, 325–347. doi:10.1214/10-STS328. https://projecteuclid.org/euclid.ss/1294167963




Supplemental materials

  • Supplementary material: A family of generalized linear models for repeated measures with normal and conjugate random effects: Calculation details. In Section A, generic approximate calculations are provided. Closed-form calculations for various cases are offered as well: for the Poisson case (Section B), for the binary case with logit link (Section C), for the binary case with probit link (Section D), and for the time-to-event case (Section E). Finally, Section F is dedicated to the derivation of marginal correlation functions.