Statistical Science

Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter

Charles E. McCulloch and John M. Neuhaus


Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.

Article information

Statist. Sci. Volume 26, Number 3 (2011), 388–402.

First available in Project Euclid: 31 October 2011

Keywords: maximum likelihood, mixed models, parametric modeling


McCulloch, Charles E.; Neuhaus, John M. Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter. Statist. Sci. 26 (2011), no. 3, 388–402. doi:10.1214/11-STS361.
