Jaeyoung Park, Muxuan Liang, Ying-Qi Zhao, Xiang Zhong
Electron. J. Statist. 19 (1), 1-53, (2025) DOI: 10.1214/24-EJS2335
KEYWORDS: missing data, dimension reduction, semiparametric inference, semi-supervised learning, double machine learning, 62-08, 62D10
Patient-reported outcome (PRO) measures are increasingly collected to measure healthcare quality and value. The ability to predict such measures enables patient-provider shared decision-making and the delivery of patient-centered care. However, PRO measures often suffer from high missing rates, and the missingness may depend on many patient factors. Under such a complex missing mechanism, developing a predictive model for PRO measures with valid inference procedures is challenging, especially when flexible imputation models such as machine learning or nonparametric methods are used. Specifically, the slow convergence rate of a flexible imputation model may lead to non-negligible bias, and the traditional missing propensity, which could remove such bias, is hard to estimate because of the complex missing mechanism. To efficiently infer the parameters of interest, we propose using an informative surrogate that allows the flexible imputation model to lie in a low-dimensional subspace. To remove the bias due to the flexible imputation model, we identify a class of weighting functions as alternatives to the traditional propensity score and estimate a low-dimensional weighting function within this class. Based on the estimated low-dimensional weighting function, we construct a one-step debiased estimator that does not use any information on the true missing propensity. We establish the asymptotic normality of the one-step debiased estimator. Simulations and an application to real-world data demonstrate the superiority of the proposed method.
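To make the idea of a one-step debiased estimator concrete, the following is a minimal sketch of the generic form (a plug-in imputation average plus a weighted residual correction on the observed outcomes), applied to the simple problem of estimating a mean. It is not the estimator proposed in the paper: the weight used here is the true inverse missing propensity, known only because the data are simulated, whereas the paper estimates a low-dimensional weighting function without that information. The names `simulate` and `one_step_mean`, the data-generating model, and the deliberately crude constant imputation are all assumptions made for illustration.

```python
import math
import random


def simulate(n, seed=0):
    """Toy data (assumed for illustration): Y = 1 + 2X + noise, so E[Y] = 1.

    Y is observed (r = True) with probability sigmoid(0.5 + X), so the
    observed cases over-represent large X and the complete-case mean is biased.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    r = [rng.random() < 1.0 / (1.0 + math.exp(-(0.5 + xi))) for xi in x]
    return x, y, r


def one_step_mean(x, y, r, impute, weight):
    """Plug-in average of impute(x) plus a weighted residual correction.

    The correction only uses rows where the outcome is observed.
    """
    n = len(x)
    plug_in = sum(impute(xi) for xi in x) / n
    correction = sum(
        weight(xi) * (yi - impute(xi))
        for xi, yi, ri in zip(x, y, r)
        if ri
    ) / n
    return plug_in + correction


x, y, r = simulate(20000, seed=0)

# Crude imputation: the complete-case mean, which is biased upward here.
observed = [yi for yi, ri in zip(y, r) if ri]
naive = sum(observed) / len(observed)

# Weight = 1 / sigmoid(0.5 + x), the true inverse propensity of the simulation.
debiased = one_step_mean(
    x, y, r,
    impute=lambda xi: naive,
    weight=lambda xi: 1.0 + math.exp(-(0.5 + xi)),
)

print(f"complete-case mean: {naive:.3f}")
print(f"one-step estimate:  {debiased:.3f}  (target E[Y] = 1)")
```

In this sketch the correction term compensates for the poor imputation model, which is the role the abstract assigns to the weighting function; how to obtain a usable weight when the true propensity is unavailable is the problem the paper addresses.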