The Annals of Applied Statistics

Estimating and comparing cancer progression risks under varying surveillance protocols

Jane M. Lange, Roman Gulati, Amy S. Leonardson, Daniel W. Lin, Lisa F. Newcomb, Bruce J. Trock, H. Ballentine Carter, Peter R. Carroll, Matthew R. Cooperberg, Janet E. Cowan, Lawrence H. Klotz, and Ruth Etzioni


Outcomes after cancer diagnosis and treatment are often observed at discrete times via doctor-patient encounters or specialized diagnostic examinations. Despite their ubiquity as endpoints in cancer studies, such outcomes pose challenges for analysis. In particular, comparisons between studies or patient populations with different surveillance schema may be confounded by differences in visit frequencies. We present a statistical framework based on multistate and hidden Markov models that represents events on a continuous time scale given data with discrete observation times. To demonstrate this framework, we consider the problem of comparing risks of prostate cancer progression across multiple active surveillance cohorts with different surveillance frequencies. We show that the different surveillance schedules partially explain observed differences in the progression risks between cohorts. Our application permits the conclusion that differences in underlying cancer progression risks across cohorts persist after accounting for different surveillance frequencies.
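The idea in the abstract, a latent continuous-time progression process seen only at discrete and imperfect examinations, can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a hypothetical two-state progressive chain (0 = not progressed, 1 = progressed, absorbing) with a constant rate, biopsies with a given sensitivity and 100% specificity, and made-up values for the rate and the schedules. All function names are illustrative. The forward algorithm gives the likelihood of a sequence of biopsy results, and comparing two schedules under the same underlying rate shows how visit frequency alone can change the observed progression risk.

```python
import math

def transition_matrix(rate, dt):
    """P(dt) for the assumed two-state chain; state 1 is absorbing."""
    stay = math.exp(-rate * dt)
    return [[stay, 1.0 - stay], [0.0, 1.0]]

def emission(result, sensitivity, specificity=1.0):
    """P(biopsy result | true state) for true states 0 and 1."""
    if result == 1:
        return [1.0 - specificity, sensitivity]
    return [specificity, 1.0 - sensitivity]

def likelihood(times, results, rate, sensitivity, specificity=1.0):
    """Forward algorithm; the patient is assumed to be in state 0 at time 0."""
    alpha = [1.0, 0.0]
    previous = 0.0
    for t, y in zip(times, results):
        p = transition_matrix(rate, t - previous)
        e = emission(y, sensitivity, specificity)
        # Propagate the latent state over the gap, then weight by the observation.
        alpha = [sum(alpha[i] * p[i][j] for i in range(2)) * e[j] for j in range(2)]
        previous = t
    return sum(alpha)

def detection_probability(times, rate, sensitivity):
    """P(at least one positive biopsy) = 1 - P(every biopsy is negative)."""
    all_negative = likelihood(times, [0] * len(times), rate, sensitivity)
    return 1.0 - all_negative

# Illustrative values only: same underlying rate, two surveillance schedules.
rate, sensitivity = 0.10, 0.75
annual = detection_probability([1.0, 2.0, 3.0, 4.0], rate, sensitivity)
biennial = detection_probability([2.0, 4.0], rate, sensitivity)
print(f"annual biopsies:   {annual:.3f}")
print(f"biennial biopsies: {biennial:.3f}")
```

With perfect sensitivity the two schedules give the same detection probability by year 4; with imperfect sensitivity the more frequent schedule detects more progression even though the underlying rate is identical, which is the kind of confounding the framework is meant to separate from true differences in risk.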

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1773-1795.

Received: May 2017
Revised: September 2017
First available in Project Euclid: 11 September 2018

Keywords: Hidden Markov model; multistate model; panel data; prostate cancer active surveillance


Lange, Jane M.; Gulati, Roman; Leonardson, Amy S.; Lin, Daniel W.; Newcomb, Lisa F.; Trock, Bruce J.; Carter, H. Ballentine; Carroll, Peter R.; Cooperberg, Matthew R.; Cowan, Janet E.; Klotz, Lawrence H.; Etzioni, Ruth. Estimating and comparing cancer progression risks under varying surveillance protocols. Ann. Appl. Stat. 12 (2018), no. 3, 1773–1795. doi:10.1214/17-AOAS1130.




Supplemental materials

  • Supplement A: Maximum likelihood estimation. Description of the expectation-maximization algorithm used to estimate model parameters.
  • Supplement B: Sample description. Description of the four active surveillance cohorts after exclusions.
  • Supplement C: Parameter estimates. Model parameter estimates assuming 100% biopsy sensitivity and specificity.
  • Supplement D: Model selection. BIC model selection for the four active surveillance cohorts, assuming 60%, 75%, and 90% sensitivity and 100% specificity.
  • Supplement E: Additional figures. Sensitivity analysis for estimates of upgrading probabilities.