The Annals of Applied Statistics

Regression trees for longitudinal and multiresponse data

Wei-Yin Loh and Wei Zheng


Abstract

Previous algorithms for constructing regression tree models for longitudinal and multiresponse data have mostly followed the CART approach. Consequently, they inherit the same selection biases and computational difficulties as CART. We propose an alternative, based on the GUIDE approach, that treats each longitudinal data series as a curve and uses chi-squared tests of the residual curve patterns to select a variable to split each node of the tree. Besides being unbiased, the method is applicable to data with fixed and random time points and with missing values in the response or predictor variables. Simulation results comparing its mean squared prediction error with that of MVPART are given, as well as examples comparing it with standard linear mixed effects and generalized estimating equation models. Conditions for asymptotic consistency of regression tree function estimates are also given.
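
The split-selection idea summarized in the abstract can be sketched in a few lines of code. The Python snippet below is a minimal illustration, not the authors' GUIDE implementation: it summarizes each subject's residual curve about the node mean curve by a sign pattern and cross-tabulates that pattern against each candidate predictor with a chi-squared test. The function name select_split_variable, the quartile binning of numeric predictors, and the two-class sign pattern are assumptions made for illustration; the tests and residual patterns actually used are defined in the paper and in Loh (2002).

import numpy as np
from scipy.stats import chi2_contingency

def select_split_variable(Y, X, numeric, n_bins=4):
    """Pick a split variable by chi-squared tests of residual sign patterns.

    Y       : (n_subjects, n_times) responses observed on a common time grid.
    X       : dict mapping predictor name -> length n_subjects array.
    numeric : set of predictor names to be binned at sample quartiles.
    Returns (name of predictor with smallest p-value, that p-value)."""
    mean_curve = np.nanmean(Y, axis=0)            # node-level mean response curve
    residuals = Y - mean_curve                    # each subject's residual curve
    # Coarse residual "pattern": is the subject's curve, on average,
    # above (1) or below (0) the node mean curve?
    pattern = (np.nanmean(residuals, axis=1) > 0).astype(int)

    best_name, best_p = None, np.inf
    for name, x in X.items():
        if name in numeric:
            # bin numeric predictors at their sample quartiles
            x = np.asarray(x, dtype=float)
            edges = np.nanquantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
            groups = np.digitize(x, edges)
        else:
            # categorical predictors: one group per observed level
            _, groups = np.unique(np.asarray(x), return_inverse=True)
        # contingency table: residual pattern (rows) vs. predictor group (columns)
        table = np.zeros((2, groups.max() + 1))
        for p, g in zip(pattern, groups):
            table[p, g] += 1
        table = table[:, table.sum(axis=0) > 0]   # drop empty predictor groups
        table = table[table.sum(axis=1) > 0, :]   # drop empty pattern classes
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue
        _, pval, _, _ = chi2_contingency(table)   # same test for every predictor
        if pval < best_p:
            best_name, best_p = name, pval
    return best_name, best_p

Because every candidate predictor, numeric or categorical, is assessed through the same kind of contingency-table test rather than by an exhaustive search over its possible splits, variables offering many potential split points gain no advantage; this is the sense in which GUIDE-style selection avoids the bias the abstract attributes to CART-based methods.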

Article information

Source
Ann. Appl. Stat. Volume 7, Number 1 (2013), 495-522.

Dates
First available in Project Euclid: 9 April 2013

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1365527208

Digital Object Identifier
doi:10.1214/12-AOAS596

Mathematical Reviews number (MathSciNet)
MR3086428

Zentralblatt MATH identifier
06171281

Keywords
CART; decision tree; generalized estimating equation; linear mixed effects model; lowess; missing values; recursive partitioning; selection bias

Citation

Loh, Wei-Yin; Zheng, Wei. Regression trees for longitudinal and multiresponse data. Ann. Appl. Stat. 7 (2013), no. 1, 495–522. doi:10.1214/12-AOAS596. https://projecteuclid.org/euclid.aoas/1365527208

References

  • Abdolell, M., LeBlanc, M., Stephens, D. and Harrison, R. V. (2002). Binary partitioning for continuous longitudinal data: Categorizing a prognostic variable. Stat. Med. 21 3395–3409.
  • Alexander, C. S. and Markowitz, R. (1986). Maternal employment and use of pediatric clinic services. Med. Care 24 134–147.
  • Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.
  • Bates, D. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42.
  • Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
  • Chaudhuri, P. and Loh, W.-Y. (2002). Nonparametric estimation of conditional quantiles using quantile regression trees. Bernoulli 8 561–576.
  • Chaudhuri, P., Huang, M. C., Loh, W.-Y. and Yao, R. (1994). Piecewise-polynomial regression trees. Statist. Sinica 4 143–167.
  • Chaudhuri, P., Lo, W. D., Loh, W.-Y. and Yang, C. C. (1995). Generalized regression trees. Statist. Sinica 5 641–666.
  • Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74 829–836.
  • De’ath, G. (2002). Multivariate regression trees: A new technique for modeling species-environment relationships. Ecology 83 1105–1117.
  • De’ath, G. (2012). MVPART: Multivariate partitioning. R package version 1.6-0.
  • Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford Statistical Science Series 25. Oxford Univ. Press, Oxford.
  • Fitzmaurice, G. M., Laird, N. M. and Ware, J. H. (2004). Applied Longitudinal Analysis. Wiley, Hoboken, NJ.
  • Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monographs 19. Cambridge Univ. Press, Cambridge.
  • Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Statist. 15 651–674.
  • Hsiao, W.-C. and Shih, Y.-S. (2007). Splitting variable selection for multivariate regression trees. Statist. Probab. Lett. 77 265–271.
  • Kim, H., Loh, W.-Y., Shih, Y.-S. and Chaudhuri, P. (2007). Visualizable and interpretable regression models with good prediction power. IIE Transactions 39 565–579.
  • Larsen, D. R. and Speckman, P. L. (2004). Multivariate regression trees for analysis of abundance data. Biometrics 60 543–549.
  • Lee, S. K. (2005). On generalized multivariate decision tree by using GEE. Comput. Statist. Data Anal. 49 1105–1119.
  • Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statist. Sinica 12 361–386.
  • Loh, W.-Y. (2009). Improving the precision of classification trees. Ann. Appl. Stat. 3 1710–1737.
  • Loh, W.-Y. and Shih, Y.-S. (1997). Split selection methods for classification trees. Statist. Sinica 7 815–840.
  • Segal, M. R. (1992). Tree structured methods for longitudinal data. J. Amer. Statist. Assoc. 87 407–418.
  • Shih, Y.-S. (2004). A note on split selection bias in classification trees. Comput. Statist. Data Anal. 45 457–466.
  • Singer, J. D. and Willett, J. B. (2003). Applied Longitudinal Data Analysis. Oxford Univ. Press, New York.
  • Strobl, C., Boulesteix, A.-L. and Augustin, T. (2007). Unbiased split selection for classification trees based on the Gini index. Comput. Statist. Data Anal. 52 483–501.
  • Yan, J., Højsgaard, S. and Halekoh, U. (2012). geepack: Generalized estimating equation solver. R package version 1.1-6.
  • Yeh, I. C. (2007). Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 474–480.
  • Yu, Y. and Lambert, D. (1999). Fitting trees to functional data, with an application to time-of-day patterns. J. Comput. Graph. Statist. 8 749–762.
  • Zhang, H. (1998). Classification trees for multiple binary responses. J. Amer. Statist. Assoc. 93 180–193.
  • Zhang, H. and Ye, Y. (2008). A tree-based method for modeling a multivariate ordinal response. Stat. Interface 1 169–178.