Abstract
Missing values among variables pose a challenge for variable selection in generalized linear models. Common strategies that delete observations with missing information can cause serious information loss. Multiple imputation has been widely used in recent years because it yields unbiased statistical results, given a correctly specified imputation model, and accounts for the uncertainty due to missing data. However, variable selection methods for generalized linear models with multiply imputed data have not yet been widely studied. In this study, we introduce penalized estimating equations for generalized linear models with multiple imputation (PEE–MI), which incorporate the correlation among multiply imputed observations into the objective function. The theoretical properties of PEE–MI depend on the penalty function adopted; we use the adaptive least absolute shrinkage and selection operator (adaptive LASSO) as an illustrative example. Simulations show that PEE–MI outperforms the alternative approaches considered. When applied to a database of laboratory-diagnosed A/H7N9 patients in Zhejiang province, China, the proposed method selects variables with clinical relevance.
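To make the adaptive LASSO ingredient concrete, here is a minimal sketch of a simpler baseline, not PEE–MI itself. It fits an adaptive LASSO to the M completed datasets stacked together, with each row weighted 1/M, and it treats the imputed copies as independent, whereas PEE–MI additionally models their correlation. The sketch assumes an identity link (Gaussian GLM), centered variables with no intercept, an unpenalized initial fit for the adaptive weights a_j = 1/|beta_init_j|^gamma, and coordinate descent as the solver. All function names (`soft_threshold`, `weighted_cd`, `adaptive_lasso_stacked`) are hypothetical and for illustration only.

```python
# Illustrative sketch only: adaptive-LASSO least squares on stacked imputed data.
# Assumptions (not from the paper): identity link, no intercept (centered data),
# rows weighted 1/M, working independence across imputations. NOT the PEE-MI estimator.


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_cd(X, y, w, lam, penalty_weights, max_iter=1000, tol=1e-10):
    """Coordinate descent for
    (1 / (2W)) * sum_k w_k (y_k - x_k' beta)^2 + lam * sum_j a_j |beta_j|,
    where W = sum_k w_k and a_j = penalty_weights[j]."""
    n, p = len(X), len(X[0])
    W = sum(w)
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta, with beta = 0 initially
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            v = sum(w[k] * X[k][j] ** 2 for k in range(n)) / W
            # Correlation of x_j with the partial residual (coordinate j added back).
            z = sum(w[k] * X[k][j] * (resid[k] + X[k][j] * beta[j]) for k in range(n)) / W
            new = soft_threshold(z, lam * penalty_weights[j]) / v
            delta = new - beta[j]
            if delta != 0.0:
                for k in range(n):
                    resid[k] -= X[k][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def adaptive_lasso_stacked(imputations, lam, gamma=1.0, eps=1e-8):
    """imputations: list of (X, y) pairs, one per completed dataset."""
    M = len(imputations)
    X = [row for Xm, _ in imputations for row in Xm]
    y = [val for _, ym in imputations for val in ym]
    w = [1.0 / M] * len(y)  # each imputed copy of a subject gets weight 1/M
    p = len(X[0])
    # Step 1: unpenalized initial estimate (lam = 0).
    beta_init = weighted_cd(X, y, w, 0.0, [0.0] * p)
    # Step 2: adaptive weights a_j = 1 / |beta_init_j|^gamma (eps avoids division by zero).
    a = [1.0 / (abs(b) + eps) ** gamma for b in beta_init]
    return weighted_cd(X, y, w, lam, a)


if __name__ == "__main__":
    X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
    imp1 = (X, [2.1, -1.9, 1.9, -2.1])
    imp2 = (X, [2.1, -1.9, 1.9, -2.3])  # differs only in one imputed response
    beta = adaptive_lasso_stacked([imp1, imp2], lam=0.05)
    print([round(b, 4) for b in beta])  # → [2.0003, 0.0]
```

In this toy example, the weak second coefficient (initial estimate 0.125) receives a large adaptive weight and is shrunk exactly to zero, while the strong first coefficient is only slightly shrunk. That data-dependent weighting is what distinguishes the adaptive LASSO from the ordinary LASSO.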
Funding Statement
Dr. Li’s work is supported by the Natural Science Foundation of China (72271237) and the Platform of Public Health and Disease Control and Prevention, Major Innovation and Planning Interdisciplinary Platform for the “Double-First Class” Initiative, Renmin University of China.
Acknowledgments
The authors thank Ms. Lin Li for her productive discussion.
Citation
Yang Li. Haoyu Yang. Haochen Yu. Hanwen Huang. Ye Shen. "Penalized estimating equations for generalized linear models with multiple imputation." Ann. Appl. Stat. 17 (3) 2345 - 2363, September 2023. https://doi.org/10.1214/22-AOAS1721