The Annals of Applied Statistics

Semiparametric time to event models in the presence of error-prone, self-reported outcomes—With application to the women’s health initiative

Xiangdong Gu, Yunsheng Ma, and Raji Balasubramanian

Full-text: Open access


The onset of several silent, chronic diseases such as diabetes can be detected only through diagnostic tests. Because of cost considerations, self-reported outcomes are routinely collected in lieu of expensive diagnostic tests in large-scale prospective investigations such as the Women’s Health Initiative. However, self-reported outcomes are subject to imperfect sensitivity and specificity. Using a semiparametric likelihood-based approach, we present time to event models to estimate the association of one or more covariates with an error-prone, self-reported outcome. We present simulation studies to assess the extent to which error in self-reported outcomes biases estimation of the regression parameter of interest. We apply the proposed methods to prospective data from 152,830 women enrolled in the Women’s Health Initiative to evaluate the effect of statin use on the risk of incident diabetes mellitus among postmenopausal women. The current analysis is based on follow-up through 2010, with a median duration of follow-up of 12.1 years. The proposed methods are readily implemented using icensmis, our freely available R software package, which is available from the Comprehensive R Archive Network (CRAN).
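As a rough illustration of the likelihood-based idea described above, the sketch below computes one woman's likelihood contribution when her self-reports may be wrong. It is a minimal sketch under stated assumptions, not the icensmis code and not necessarily the paper's exact formulation. It assumes: self-reports are collected on a common grid of visit times t_1 < t_2 < ... < t_J; sensitivity (`sens`) and specificity (`spec`) are known constants; reports are conditionally independent given the true onset time T; and covariates act through a proportional hazards model, S(t | z) = S_0(t)^exp(βz), where the baseline survival S_0 is left unspecified apart from its values at the visit times. The likelihood sums, over every interval (t_{j-1}, t_j] in which onset could have occurred (plus "no onset by t_J"), the probability of that interval times the probability of the observed reports given it. All function names (`interval_probs`, `subject_likelihood`, `log_likelihood`) and the example numbers are hypothetical.

```python
import math
from typing import Optional, Sequence


def interval_probs(s0: Sequence[float], beta: float, z: float) -> list[float]:
    """Probabilities that the true onset T falls in each interval.

    s0[j] is the baseline survival S0(t_{j+1}) at visit j+1 (decreasing, in (0, 1]).
    Element j of the result is P(t_j < T <= t_{j+1} | z) for j = 0..J-1 (t_0 = 0);
    the final element is P(T > t_J | z), i.e. still disease-free at the last visit.
    Assumed proportional hazards model: S(t | z) = S0(t) ** exp(beta * z).
    """
    hr = math.exp(beta * z)
    surv = [1.0] + [s ** hr for s in s0]  # S(t_0) = 1, S(t_1), ..., S(t_J)
    probs = [surv[j] - surv[j + 1] for j in range(len(s0))]
    probs.append(surv[-1])
    return probs


def subject_likelihood(
    reports: Sequence[Optional[int]],
    s0: Sequence[float],
    beta: float,
    z: float,
    sens: float,
    spec: float,
) -> float:
    """Likelihood of one subject's self-reports (1 = yes, 0 = no, None = missed visit)."""
    if len(reports) != len(s0):
        raise ValueError("need one report slot per visit time")
    total = 0.0
    for j, p in enumerate(interval_probs(s0, beta, z)):
        lik = p
        for k, r in enumerate(reports):
            if r is None:  # a missed visit contributes no information
                continue
            # Onset in interval j (0-based) has happened by visit k+1 iff j <= k;
            # the final "no onset by t_J" term (j = J) is never diseased.
            diseased = j <= k
            p_yes = sens if diseased else 1.0 - spec
            lik *= p_yes if r == 1 else 1.0 - p_yes
        total += lik
    return total


def log_likelihood(data, s0, beta, sens, spec):
    """Sum of log-likelihoods over (reports, z) pairs, one pair per subject.

    math.log raises ValueError if a contribution is 0, which can happen when
    sens or spec equals 1 and a subject's reports contradict each other.
    """
    return sum(
        math.log(subject_likelihood(r, s0, beta, z, sens, spec)) for r, z in data
    )


if __name__ == "__main__":
    s0 = [0.95, 0.90, 0.85]  # hypothetical baseline survival at three visits
    data = [([0, 0, 1], 1.0), ([0, None, 0], 0.0), ([1, 0, 1], 1.0)]  # hypothetical
    print(log_likelihood(data, s0, beta=0.2, sens=0.6, spec=0.95))
```

With `sens = spec = 1` and internally consistent reports, the sum collapses to the probability of the interval between the last negative and the first positive report, i.e. an ordinary interval-censored proportional hazards likelihood. Estimation would then maximize the summed log-likelihood over β and the baseline survival values, keeping the latter decreasing; that optimization step is not shown here.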

Article information

Ann. Appl. Stat., Volume 9, Number 2 (2015), 714-730.

Received: September 2014
Revised: January 2015
First available in Project Euclid: 20 July 2015


Keywords: Measurement error, panel data, interval censoring, time to event outcomes


Gu, Xiangdong; Ma, Yunsheng; Balasubramanian, Raji. Semiparametric time to event models in the presence of error-prone, self-reported outcomes—With application to the women’s health initiative. Ann. Appl. Stat. 9 (2015), no. 2, 714–730. doi:10.1214/15-AOAS810.



  • Anderson, G., Cummings, S., Freedman, L. S., Furberg, C., Henderson, M., Johnson, S. R., Kuller, L., Manson, J., Oberman, A., Prentice, R. L., Rossouw, J. E. and the Women’s Health Initiative Study Group (1998). Design of the women’s health initiative clinical trial and observational study. Control. Clin. Trials 19 61–109.
  • Balasubramanian, R. and Lagakos, S. W. (2001). Estimation of the timing of perinatal transmission of HIV. Biometrics 57 1048–1058.
  • Balasubramanian, R. and Lagakos, S. W. (2003). Estimation of a failure time distribution based on imperfect diagnostic tests. Biometrika 90 171–182.
  • Chen, H. H., Duffy, S. W. and Tabar, L. (1996). A Markov chain method to estimate the tumour progression rate from preclinical to clinical phase, sensitivity and positive predictive value for mammography in breast cancer screening. The Statistician 45 307–317.
  • Cook, T. D. (2000). Adjusting survival analysis for the presence of unadjudicated study events. Control. Clin. Trials 21 208–222.
  • Cook, T. D. and Kosorok, M. R. (2004). Analysis of time-to-event data with incomplete event adjudication. J. Amer. Statist. Assoc. 99 1140–1152.
  • Cox, D. R. and Hinkley, D. V. (1979). Theoretical Statistics. Chapman & Hall, London.
  • Culver, A. L., Ockene, I. S., Balasubramanian, R., Olendzki, B. C., Sepavich, D. M., Wactawski-Wende, J., Manson, J. E., Qiao, Y. X., Liu, S. M., Merriam, P. A., Rahilly-Tierny, C., Thomas, F., Berger, J. S., Ockene, J. K., Curb, J. D. and Ma, Y. S. (2012). Statin use and risk of diabetes mellitus in postmenopausal women in the women’s health initiative. Arch. Intern. Med. 172 144–152.
  • Finkelstein, D. M. (1986). A proportional hazards model for interval-censored failure time data. Biometrics 42 845–854.
  • García-Zattera, M. J., Jara, A., Lesaffre, E. and Marshall, G. (2012). Modeling of multivariate monotone disease processes in the presence of misclassification. J. Amer. Statist. Assoc. 107 976–989.
  • Gu, X. and Balasubramanian, R. (2013). icensmis: Study Design and Data Analysis in the presence of error-prone diagnostic tests and self-reported outcomes. R package version 1.1.
  • Gu, X., Ma, Y. and Balasubramanian, R. (2015). Supplement to “Semiparametric time to event models in the presence of error-prone, self-reported outcomes—With application to the women’s health initiative.” DOI:10.1214/15-AOAS810SUPP.
  • Guihenneuc-Jouyaux, C., Richardson, S. and Longini, I. M. Jr. (2000). Modeling markers of disease progression by a hidden Markov process: Application to characterizing CD4 cell decline. Biometrics 56 733–741.
  • He, C. Y., Zhang, C. L., Hunter, D. J., Hankinson, S. E., Louis, G. M. B., Hediger, M. L. and Hu, F. B. (2010). Age at menarche and risk of type 2 diabetes: Results from 2 large prospective cohort studies. Am. J. Epidemiol. 171 334–344.
  • Hu, F. B., Manson, J. E., Stampfer, M. J., Colditz, G., Liu, S., Solomon, C. G. and Willett, W. C. (2001). Diet, lifestyle, and the risk of type 2 diabetes mellitus in women. N. Engl. J. Med. 345 790–797.
  • Jackson, C. H. and Sharples, L. D. (2002). Hidden Markov models for the onset and progression of bronchiolitis obliterans syndrome in lung transplant recipients. Stat. Med. 21 113–128.
  • Jackson, C. H., Sharples, L. D., Thompson, S. G., Duffy, S. W. and Couto, E. (2003). Multistate Markov models for disease progression with classification error. The Statistician 52 193–209.
  • Jackson, J. M., Defor, T. A., Crain, A. L., Kerby, T. J., Strayer, L. S., Lewis, C. E., Whitlock, E. P., Williams, S. B., Vitolins, M. Z., Rodabough, R. J., Larson, J. C., Habermann, E. B. and Margolis, K. L. (2014). Validity of diabetes self-reports in the women’s health initiative. Menopause 8 861–868.
  • Kirby, A. J. and Spiegelhalter, D. J. (1994). Statistical Modelling for the Precursors of Cervical Cancer. Wiley, New York.
  • Lyles, R. H., Tang, L., Superak, H. M., King, C. C., Celentano, D. D., Lo, Y. and Sobel, J. D. (2011). Validation data-based adjustments for outcome misclassification in logistic regression: An illustration. Epidemiology 22 589–597.
  • Margolis, K. L., Qi, L. H., Brzyski, R., Bonds, D. E., Howard, B. V., Kempoinen, S., Liu, S. M., Robinson, J. G., Safford, M. M., Tinker, L. T., Phillips, L. S. and the Women’s Health Initiative (2008). Validity of diabetes self-reports in the women’s health initiative: Comparison with medication inventories and fasting glucose measurements. Clinical Trials 5 240–247.
  • McKeown, K. and Jewell, N. P. (2010). Misclassification of current status data. Lifetime Data Anal. 16 215–230.
  • Meier, A. S., Richardson, B. A. and Hughes, J. P. (2003). Discrete proportional hazards models for mismeasured outcomes. Biometrics 59 947–954.
  • Neuhaus, J. M. (1999). Bias and efficiency loss due to misclassified responses in binary regression. Biometrika 86 843–855.
  • Oksanen, T., Kivimäki, M., Pentti, J., Virtanen, M., Klaukka, T. and Vahtera, J. (2010). Self-report as an indicator of incident disease. Ann. Epidemiol. 20 547–554.
  • Satten, G. A. and Longini, I. M. (1996). Markov chains with measurement error: Estimating the ‘true’ course of a marker of the progression of human immunodeficiency virus disease. J. R. Stat. Soc. Ser. C. Appl. Stat. 45 275–295.
  • Shaw, P. A. and Prentice, R. L. (2012). Hazard ratio estimation for biomarker-calibrated dietary exposures. Biometrics 68 397–407.
  • Snapinn, S. M. (1998). Survival analysis with uncertain endpoints. Biometrics 54 209–218.
  • Spiegelman, D., Rosner, B. and Logan, R. (2000). Estimation and inference for logistic regression with covariate misclassification and measurement error in main study/validation study designs. J. Amer. Statist. Assoc. 95 51–61.
  • Turnbull, B. W. (1976). The empirical distribution function with arbitrarily grouped, censored and truncated data. J. Roy. Statist. Soc. Ser. B 38 290–295.

Supplemental materials