The Annals of Applied Statistics

How Gaussian mixture models might miss detecting factors that impact growth patterns

Brianna C. Heggeseth and Nicholas P. Jewell



Longitudinal studies play a prominent role in biological, social, and behavioral sciences. Repeated measurements over time facilitate the study of an outcome level, how individuals change over time, and the factors that may impact either or both. A standard approach to modeling childhood growth over time is to use multilevel or mixed effects models to study factors that might play a role in the level and growth over time. However, there has been increased interest in using mixture models, which have inherent grouping structure to more flexibly explain heterogeneity in the longitudinal outcomes, to study growth patterns. While several possible model specifications can be used, these methods generally fail to explicitly group individuals by the shape of their growth pattern separate from level, and thus fail to shed light on the relationships between growth pattern and potential explanatory factors. We illustrate the weaknesses of these methods as they are currently being used. We also propose a pre-processing step that removes the outcome level to focus explicitly on shape, discuss its impact on estimation, and demonstrate its usefulness through a simulation study and with real longitudinal data.
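To make the idea of removing level concrete, here is a minimal Python sketch. It assumes that an individual's "level" can be summarized by the mean of that individual's own measurements; the abstract does not specify the exact pre-processing step, so this is an illustration rather than the authors' procedure. The data and the names `remove_level` and `squared_distance` are hypothetical.

```python
from statistics import mean


def remove_level(trajectory):
    """Subtract the individual's own mean so that only the shape remains.

    Assumption: "level" is taken to be the within-individual mean.
    """
    level = mean(trajectory)
    return [y - level for y in trajectory]


def squared_distance(u, v):
    """Squared Euclidean distance between two equally long trajectories."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical toy data: measurements at four common time points.
trajectories = {
    "child_a": [10.0, 12.0, 14.0, 16.0],  # low level, increasing
    "child_b": [20.0, 22.0, 24.0, 26.0],  # high level, same increasing shape
    "child_c": [21.0, 20.0, 19.0, 18.0],  # high level, decreasing
}

# Level-removed versions of each trajectory.
shifted = {name: remove_level(y) for name, y in trajectories.items()}

# On the raw scale, child_b is closer to child_c (similar level) than to
# child_a (same shape). After removing level, child_a and child_b coincide.
raw_ab = squared_distance(trajectories["child_a"], trajectories["child_b"])
raw_bc = squared_distance(trajectories["child_b"], trajectories["child_c"])
shape_ab = squared_distance(shifted["child_a"], shifted["child_b"])
shape_bc = squared_distance(shifted["child_b"], shifted["child_c"])
```

In this toy example, a distance-based grouping of the raw trajectories would tend to pair the two high-level children, whereas grouping the level-removed trajectories pairs the two children whose growth has the same shape. A mixture model could then be fit to the shifted values rather than the raw ones.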

Article information

Ann. Appl. Stat. Volume 12, Number 1 (2018), 222-245.

Received: July 2016
Revised: May 2017
First available in Project Euclid: 9 March 2018

Digital Object Identifier

doi:10.1214/17-AOAS1066

Keywords: Finite mixture model; longitudinal data analysis; latent variables; growth curves


Heggeseth, Brianna C.; Jewell, Nicholas P. How Gaussian mixture models might miss detecting factors that impact growth patterns. Ann. Appl. Stat. 12 (2018), no. 1, 222–245. doi:10.1214/17-AOAS1066.




Supplemental materials

  • Supplement A: Growth simulation. The supplement includes a description and the results of an additional simulation study that mimics real childhood growth data.
  • Supplement B: Additional CHAMACOS results. The supplement includes the relative risk ratio estimates from the CHAMACOS data example.