Statistical Science
- Statist. Sci.
- Volume 26, Number 3 (2011), 388-402.
Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter
Charles E. McCulloch and John M. Neuhaus
Abstract
Statistical models that include random effects are commonly used to analyze longitudinal and correlated data, often with strong and parametric assumptions about the random effects distribution. There is marked disagreement in the literature as to whether such parametric assumptions are important or innocuous. In the context of generalized linear mixed models used to analyze clustered or longitudinal data, we examine the impact of random effects distribution misspecification on a variety of inferences, including prediction, inference about covariate effects, prediction of random effects and estimation of random effects variances. We describe examples, theoretical calculations and simulations to elucidate situations in which the specification is and is not important. A key conclusion is the large degree of robustness of maximum likelihood for a wide variety of commonly encountered situations.
Article information
Source
Statist. Sci., Volume 26, Number 3 (2011), 388-402.
Dates
First available in Project Euclid: 31 October 2011
Permanent link to this document
https://projecteuclid.org/euclid.ss/1320066927
Digital Object Identifier
doi:10.1214/11-STS361
Mathematical Reviews number (MathSciNet)
MR2917962
Zentralblatt MATH identifier
1246.62169
Keywords
Maximum likelihood; mixed models; parametric modeling
Citation
McCulloch, Charles E.; Neuhaus, John M. Misspecifying the Shape of a Random Effects Distribution: Why Getting It Wrong May Not Matter. Statist. Sci. 26 (2011), no. 3, 388--402. doi:10.1214/11-STS361. https://projecteuclid.org/euclid.ss/1320066927
