The Annals of Applied Statistics

Joint modeling of longitudinal drug using pattern and time to first relapse in cocaine dependence treatment data

Jun Ye, Yehua Li, and Yongtao Guan



An important endpoint variable in a cocaine rehabilitation study is a patient's time to first relapse after treatment. We propose a joint modeling approach based on functional data analysis to study the relationship between the baseline longitudinal cocaine-use pattern and the interval-censored time to first relapse. For the baseline cocaine-use pattern, we consider both self-reported cocaine-use amount trajectories and dichotomized use trajectories. Variations within the generalized longitudinal trajectories are modeled through a latent Gaussian process, which is characterized by a few leading functional principal components. The association between the baseline longitudinal trajectories and the time to first relapse is built upon the latent principal component scores. The mean and the eigenfunctions of the latent Gaussian process, as well as the hazard function of time to first relapse, are modeled nonparametrically using penalized splines, and the parameters in the joint model are estimated by a Monte Carlo EM algorithm based on Metropolis–Hastings steps. An Akaike information criterion (AIC) based on effective degrees of freedom is proposed to choose the tuning parameters, and a modified empirical information is proposed to estimate the variance–covariance matrix of the estimators.
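As a rough illustration of the Metropolis–Hastings step inside a Monte Carlo EM algorithm, the sketch below samples the latent principal component scores of a single subject given dichotomized use records and an interval-censored relapse time. It is not the authors' implementation. Every name in it (`log_posterior`, `sample_scores`, `mu_t`, `psi_t`, `lam`, `beta`, `rate`) is hypothetical, and several modeling choices are assumptions made only to keep the example short: a logit link for the binary trajectory, a proportional-hazards link between the scores and the hazard, and a constant baseline hazard standing in for the penalized-spline hazard described above.

```python
import numpy as np


def log_posterior(xi, y, mu_t, psi_t, lam, left, right, beta, rate):
    """Unnormalized log posterior of the latent scores xi for one subject."""
    # Latent trajectory at the observation times:
    # X(t_j) = mu(t_j) + sum_k xi_k * psi_k(t_j)
    eta = mu_t + psi_t @ xi
    # Bernoulli log-likelihood with a logit link (assumed) for dichotomized use
    loglik_long = np.sum(y * eta - np.logaddexp(0.0, eta))
    # Relapse time known only to lie in (left, right]: contribution S(left) - S(right),
    # with S(t) = exp(-rate * t * exp(beta' xi)); constant baseline hazard is an assumption
    risk = rate * np.exp(beta @ xi)
    surv_left = np.exp(-risk * left)
    surv_right = 0.0 if np.isinf(right) else np.exp(-risk * right)
    loglik_surv = np.log(surv_left - surv_right)
    # Independent Gaussian prior on the scores, Var(xi_k) = lam_k
    log_prior = -0.5 * np.sum(xi ** 2 / lam)
    return loglik_long + loglik_surv + log_prior


def sample_scores(y, mu_t, psi_t, lam, left, right, beta, rate,
                  n_iter=2000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings draws of xi; returns (draws, acceptance rate)."""
    rng = np.random.default_rng(seed)
    args = (y, mu_t, psi_t, lam, left, right, beta, rate)
    xi = np.zeros(len(lam))
    current = log_posterior(xi, *args)
    draws = np.empty((n_iter, len(lam)))
    accepted = 0
    for i in range(n_iter):
        # Symmetric Gaussian proposal, so the acceptance ratio is a posterior ratio
        proposal = xi + step * rng.standard_normal(len(lam))
        candidate = log_posterior(proposal, *args)
        if np.log(rng.uniform()) < candidate - current:
            xi, current = proposal, candidate
            accepted += 1
        draws[i] = xi
    return draws, accepted / n_iter


# Toy inputs (all made up for illustration): 30 baseline days on [0, 1],
# two orthonormal eigenfunctions, relapse observed between follow-ups at 0.4 and 0.9.
t = np.linspace(0.0, 1.0, 30)
mu_t = np.zeros_like(t)
psi_t = np.column_stack([np.ones_like(t), np.sqrt(3.0) * (2.0 * t - 1.0)])
lam = np.array([1.0, 0.5])
y = (t > 0.5).astype(float)  # use recorded only in the second half of baseline
draws, acc_rate = sample_scores(y, mu_t, psi_t, lam, left=0.4, right=0.9,
                                beta=np.array([0.8, -0.3]), rate=1.0)
# Monte Carlo approximation of E[xi | data], the quantity an E-step would use
score_mean = draws[500:].mean(axis=0)
```

In a full Monte Carlo EM fit, draws like these would be produced for every subject at each iteration, and the M-step would then update the mean function, eigenfunctions, and hazard using the sampled scores.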

Article information

Ann. Appl. Stat., Volume 9, Number 3 (2015), 1621-1642.

Received: September 2014
Revised: May 2015
First available in Project Euclid: 2 November 2015

Digital Object Identifier: doi:10.1214/15-AOAS852

Keywords: Akaike information criterion; EM algorithm; functional principal components; generalized longitudinal data; interval censoring; Metropolis–Hastings algorithm; penalized splines


Ye, Jun; Li, Yehua; Guan, Yongtao. Joint modeling of longitudinal drug using pattern and time to first relapse in cocaine dependence treatment data. Ann. Appl. Stat. 9 (2015), no. 3, 1621–1642. doi:10.1214/15-AOAS852.




Supplemental materials

  • Supplement A. The online supplementary material for this paper contains the technical details of the MCEM algorithm to fit the model, estimation of the covariance matrix of the estimator, additional simulation results and sensitivity analysis in the real data analysis.