June 2020
Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples and multiple imputation
Mark J. Giganti, Pamela A. Shaw, Guanhua Chen, Sally S. Bebawy, Megan M. Turner, Timothy R. Sterling, Bryan E. Shepherd
Ann. Appl. Stat. 14(2): 1045-1061 (June 2020). DOI: 10.1214/20-AOAS1343


Data from electronic health records (EHR) are prone to errors, which are often correlated across multiple variables. The error structure is further complicated when analysis variables are derived as functions of two or more error-prone variables. Such errors can substantially impact estimates, yet we are unaware of methods that simultaneously account for errors in covariates and time-to-event outcomes. Using EHR data from 4217 patients, the hazard ratio for an AIDS-defining event associated with a 100 cell/mm$^{3}$ increase in CD4 count at ART initiation was 0.74 (95% CI: 0.68–0.80) using unvalidated data and 0.60 (95% CI: 0.53–0.68) using fully validated data. Our goal is to obtain unbiased and efficient estimates after validating a random subset of records. We propose fitting discrete failure time models to the validated subsample and then multiply imputing values for unvalidated records. We demonstrate how this approach simultaneously addresses dependent errors in predictors, time-to-event outcomes, and inclusion criteria. Using the fully validated dataset as a gold standard, we compare the mean squared error of our estimates with those from the unvalidated dataset and the corresponding subsample-only dataset for various subsample sizes. With reasonably sized validated subsamples and appropriate imputation models, our approach yielded better estimates than both the naive analysis and the analysis using only the validation subsample.
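
The abstract does not say how estimates from the multiply imputed datasets are combined. As a minimal sketch, assuming the standard Rubin's rules are used for pooling, the Python code below combines per-imputation log hazard ratios and their variances into one estimate with a normal-approximation confidence interval. The function names `pool_rubin` and `hazard_ratio_ci` and all numbers in the usage section are hypothetical and are not taken from the article.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates with Rubin's rules.

    estimates: point estimates (e.g. log hazard ratios), one per imputed dataset
    variances: squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


def hazard_ratio_ci(log_hr, total_var, z=1.96):
    """Back-transform a pooled log hazard ratio to the hazard-ratio scale.

    Assumption: a normal approximation (z = 1.96) is used here for simplicity;
    Rubin's rules are more often paired with a t reference distribution.
    """
    se = math.sqrt(total_var)
    return (math.exp(log_hr),
            math.exp(log_hr - z * se),
            math.exp(log_hr + z * se))


if __name__ == "__main__":
    # Hypothetical log hazard ratios and standard errors from five imputations.
    log_hrs = [-0.50, -0.52, -0.48, -0.51, -0.49]
    ses = [0.06, 0.06, 0.07, 0.06, 0.06]
    q, t = pool_rubin(log_hrs, [s ** 2 for s in ses])
    hr, lower, upper = hazard_ratio_ci(q, t)
    print(f"pooled HR {hr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

The between-imputation term is what separates this from simply averaging the standard errors: it inflates the total variance to reflect uncertainty about the imputed values for unvalidated records.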



Mark J. Giganti, Pamela A. Shaw, Guanhua Chen, Sally S. Bebawy, Megan M. Turner, Timothy R. Sterling, Bryan E. Shepherd. "Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples and multiple imputation." Ann. Appl. Stat. 14(2): 1045-1061, June 2020.


Received: 1 August 2018; Revised: 1 March 2020; Published: June 2020
First available in Project Euclid: 29 June 2020

zbMATH: 07239895
MathSciNet: MR4117840
Digital Object Identifier: 10.1214/20-AOAS1343

Rights: Copyright © 2020 Institute of Mathematical Statistics


