The Annals of Applied Statistics

Prediction-based classification for longitudinal biomarkers

Andrea S. Foulkes, Livio Azzoni, Xiaohong Li, Margaret A. Johnson, Colette Smith, Karam Mounzer, and Luis J. Montaner

Full-text: Open access


Assessment of circulating CD4 count change over time in HIV-infected subjects on antiretroviral therapy (ART) is a central component of disease monitoring. The increasing number of HIV-infected subjects starting therapy and the limited capacity to support CD4 count testing within resource-limited settings have fueled interest in identifying correlates of CD4 count change, such as total lymphocyte count. Modeling techniques will be essential to this endeavor because the CD4 trajectory over time is typically nonlinear and multiple input variables are needed to capture CD4 variability. We propose a prediction-based classification approach that involves first-stage modeling followed by classification based on clinically meaningful thresholds. This approach draws on existing analytical methods described in the receiver operating characteristic curve literature while presenting an extension for handling a continuous outcome. Application of this method to an independent test sample results in greater than 98% positive predictive value for CD4 count change. The prediction algorithm is derived from a cohort of n = 270 HIV-1 infected individuals from the Royal Free Hospital, London, who were followed for up to three years from initiation of ART. A test sample of n = 72 individuals from Philadelphia, followed for a similar length of time, is used for validation. Results suggest that this approach may be a useful tool for prioritizing limited laboratory resources for CD4 testing after subjects start antiretroviral therapy.
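
The two-stage idea in the abstract (predict a continuous outcome, then classify against a threshold and score the classification) can be illustrated with a minimal sketch. This is not the authors' algorithm: the first-stage model is replaced by hard-coded hypothetical predictions, the threshold of 50 is an assumed value chosen only for illustration, and the function names are invented here.

```python
from typing import List, Optional


def classify(values: List[float], threshold: float) -> List[bool]:
    """Label each value positive when it meets or exceeds the threshold."""
    return [v >= threshold for v in values]


def positive_predictive_value(
    predicted_pos: List[bool], observed_pos: List[bool]
) -> Optional[float]:
    """PPV = true positives / all predicted positives.

    Returns None when no subject is predicted positive.
    """
    true_pos = sum(p and o for p, o in zip(predicted_pos, observed_pos))
    all_pos = sum(predicted_pos)
    return true_pos / all_pos if all_pos else None


# Hypothetical values for illustration only (not data from the study).
# predicted_change stands in for the output of a first-stage model.
predicted_change = [120.0, 35.0, 80.0, 10.0, 60.0]
observed_change = [110.0, 55.0, 40.0, 5.0, 75.0]
THRESHOLD = 50.0  # assumed threshold, not taken from the paper

pred_pos = classify(predicted_change, THRESHOLD)
obs_pos = classify(observed_change, THRESHOLD)
ppv = positive_predictive_value(pred_pos, obs_pos)
print(f"PPV: {ppv:.3f}")
```

In a real analysis the predicted values would come from a fitted longitudinal model applied to a held-out test sample, and the threshold would be chosen on clinical grounds.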

Article information

Ann. Appl. Stat., Volume 4, Number 3 (2010), 1476-1497.

First available in Project Euclid: 18 October 2010

Prediction; classification; receiver operator characteristic (ROC) curve; generalized linear mixed effects modeling (GLMM); CD4; HIV/AIDS


Foulkes, Andrea S.; Azzoni, Livio; Li, Xiaohong; Johnson, Margaret A.; Smith, Colette; Mounzer, Karam; Montaner, Luis J. Prediction-based classification for longitudinal biomarkers. Ann. Appl. Stat. 4 (2010), no. 3, 1476--1497. doi:10.1214/10-AOAS326.


