Open Access
Flexible risk prediction models for left or interval-censored data from electronic health records
June 2017
Noorie Hyun, Li C. Cheung, Qing Pan, Mark Schiffman, Hormuzd A. Katki
Ann. Appl. Stat. 11(2): 1063-1084 (June 2017). DOI: 10.1214/17-AOAS1036

Abstract

Electronic health records are a large and cost-effective data source for developing risk-prediction models. However, for screen-detected diseases, standard risk models (such as Kaplan–Meier or Cox models) do not account for key issues encountered with electronic health record data: left-censoring of pre-existing (prevalent) disease, interval-censoring of incident disease, and ambiguity about whether disease is prevalent or incident when definitive disease ascertainment is not conducted at baseline. Furthermore, researchers might conduct novel screening tests only on a complex two-phase subsample. We propose a family of weighted mixture models that account for left/interval-censoring and, via inverse-probability weighting, for complex sampling, in order to estimate current and future absolute risk: a weakly-parametric model for general use and a semiparametric model for checking the goodness of fit of the weakly-parametric model. We demonstrate asymptotic properties analytically and by simulation. We used electronic health records to assemble a cohort of 33,295 human papillomavirus (HPV)-positive women undergoing cervical cancer screening at Kaiser Permanente Northern California (KPNC), whose data underlie current screening guidelines. The next guidelines would focus on HPV typing tests, but reporting 14 HPV types is too complex for clinical use. The National Cancer Institute, together with KPNC, conducted an HPV typing test on a complex subsample of 9258 women in the cohort. We used our model to estimate the risk due to each type and grouped the 14 types (3-year risks ranging from 21.9% down to 1.5%) into 4 risk-bands to simplify reporting to clinicians and guidelines. These risk-bands could be adopted by future HPV typing tests and future screening guidelines.
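The prevalence/incidence mixture with inverse-probability weighting described in the abstract can be illustrated with a small sketch. This is not the authors' weakly-parametric or semiparametric model: it assumes a single constant prevalence probability p (no covariates), an exponential (constant-hazard) incidence distribution with rate lam, and a coarse grid search in place of a proper optimizer. Each subject contributes an interval (left, right]: left=None marks a subject whose disease-free status at baseline was never ascertained, so the case may be prevalent; right=math.inf marks right-censoring. The names Obs, contribution, weighted_loglik, and fit_grid, the toy cohort, and its weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Obs:
    # Time (years) of the last negative screen. None means disease-free status
    # at baseline was never ascertained, so the case may be prevalent.
    left: Optional[float]
    # Time disease was first detected; math.inf if never detected (right-censored).
    # Assumed: right > left whenever left is not None.
    right: float


def survival(lam: float, t: float) -> float:
    """Exponential incidence survival S(t) = exp(-lam * t); S(inf) = 0."""
    return math.exp(-lam * t)


def risk(p: float, lam: float, t: float) -> float:
    """Absolute risk by time t: prevalent disease plus incident disease."""
    return p + (1.0 - p) * (1.0 - survival(lam, t))


def contribution(p: float, lam: float, o: Obs) -> float:
    """Likelihood contribution of one subject under the mixture model."""
    if o.left is None:
        # Left-censored/ambiguous: prevalent, or incident before o.right.
        return risk(p, lam, o.right)
    # Prevalence ruled out: incident in (left, right]; right=inf is right-censoring.
    return (1.0 - p) * (survival(lam, o.left) - survival(lam, o.right))


def weighted_loglik(p: float, lam: float, data: List[Obs], weights: List[float]) -> float:
    """Inverse-probability-weighted log-likelihood (weight = 1 / P(sampled))."""
    total = 0.0
    for o, w in zip(data, weights):
        c = contribution(p, lam, o)
        if c <= 0.0:
            return -math.inf  # data impossible under these parameters
        total += w * math.log(c)
    return total


def fit_grid(data: List[Obs], weights: List[float]) -> Tuple[float, float]:
    """Crude maximum-likelihood fit by grid search over (p, lam)."""
    best = (-math.inf, 0.0, 0.0)
    for i in range(1, 100):          # p in 0.01, ..., 0.99
        p = i / 100
        for j in range(-60, 21):     # lam in exp(-6), ..., exp(2) per year
            lam = math.exp(j / 10)
            ll = weighted_loglik(p, lam, data, weights)
            if ll > best[0]:
                best = (ll, p, lam)
    return best[1], best[2]


# Hypothetical toy cohort; weights are 1 / (probability of being sampled).
cohort = [
    Obs(None, 0.0),       # disease found at the baseline screen (prevalent)
    Obs(None, 3.0),       # no baseline ascertainment; found by year 3 (ambiguous)
    Obs(0.0, 3.0),        # negative at baseline, found at year 3 (incident)
    Obs(3.0, 6.0),        # negative at year 3, found at year 6 (incident)
    Obs(0.0, math.inf),   # negative at baseline, never found
    Obs(3.0, math.inf),
    Obs(6.0, math.inf),
    Obs(6.0, math.inf),
]
ipw = [1.0, 1.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0]
p_hat, lam_hat = fit_grid(cohort, ipw)
print(f"prevalence={p_hat:.2f}, 3-year risk={risk(p_hat, lam_hat, 3.0):.3f}")
```

Weighting each log-contribution by the inverse sampling probability lets a two-phase subsample stand in for the full cohort: doubling the weights of disease-free subjects, for instance, pulls the prevalence estimate down, because each of them now represents two cohort members. A fuller version would add covariates (for example, a logistic model for prevalence and a proportional-hazards model for incidence), use a gradient-based optimizer, and compute variances that reflect the sampling design.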

Citation


Noorie Hyun. Li C. Cheung. Qing Pan. Mark Schiffman. Hormuzd A. Katki. "Flexible risk prediction models for left or interval-censored data from electronic health records." Ann. Appl. Stat. 11 (2) 1063 - 1084, June 2017. https://doi.org/10.1214/17-AOAS1036

Information

Received: 1 July 2016; Revised: 1 February 2017; Published: June 2017
First available in Project Euclid: 20 July 2017

zbMATH: 06775904
MathSciNet: MR3693558
Digital Object Identifier: 10.1214/17-AOAS1036

Keywords: B-splines, HPV, interval censoring, mixture model, two-phase sampling, weighted likelihood

Rights: Copyright © 2017 Institute of Mathematical Statistics
