Abstract
High-dimensional linear models have been widely studied, but developments in high-dimensional generalized linear models (GLMs) have come more slowly. In this paper, we propose an empirical, or data-driven, prior leading to an empirical Bayes posterior distribution that can be used for estimation of and inference on the coefficient vector in a high-dimensional GLM, as well as for variable selection. We prove that the proposed posterior concentrates around the true sparse coefficient vector at the optimal rate, give conditions under which the posterior achieves variable selection consistency, and prove a Bernstein–von Mises theorem implying asymptotically valid uncertainty quantification. Computation of the proposed empirical Bayes posterior is simple and efficient, and in simulations the method performs well relative to existing Bayesian and non-Bayesian methods in terms of both estimation and variable selection.
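The core idea of an empirical, data-driven prior can be illustrated with a minimal sketch. This is not the authors' exact construction (their prior involves sparsity-inducing model weights and a power likelihood over models); it only shows the prior-centering trick in the simplest case: a Gaussian prior centered at a data-driven point (here the logistic MLE) combined with a slightly discounted likelihood, summarized by a Laplace-type Gaussian approximation. All names (`alpha`, `tau2`, etc.) are illustrative choices, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data from a sparse logistic regression (a canonical GLM)
n, p = 400, 5
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

def logistic_mle(X, y, iters=25):
    """Newton-Raphson iterations for the logistic regression MLE."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))         # fitted probabilities
        W = mu * (1 - mu)                         # IRLS weights
        H = X.T @ (X * W[:, None])                # observed information
        beta = beta + np.linalg.solve(H, X.T @ (y - mu))
    return beta

beta_hat = logistic_mle(X, y)

# Empirical prior: Gaussian centered at the data-driven point beta_hat,
# paired with a slightly discounted (power) likelihood, alpha < 1, to
# offset the prior's "double use" of the data.
alpha, tau2 = 0.95, 1.0
mu_fit = 1 / (1 + np.exp(-X @ beta_hat))
W = mu_fit * (1 - mu_fit)
info = alpha * X.T @ (X * W[:, None])             # discounted Fisher information
post_cov = np.linalg.inv(info + np.eye(p) / tau2)  # Laplace-type posterior covariance
post_mean = beta_hat  # prior is centered at the MLE, so the mode stays there

print(np.round(post_mean, 2))
```

Because the prior center is itself estimated from the data, the prior contributes no bias at the mode; the likelihood discount `alpha` and prior scale `tau2` only widen the posterior slightly, which is the mechanism that makes the data-driven centering theoretically safe.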
Funding Statement
This work was partially supported by the U.S. National Science Foundation, under grants DMS–1737933, DMS–1811802, and SES–205122.
Acknowledgments
The authors thank the Editor and anonymous Associate Editor and reviewers for their helpful feedback. Special thanks to Jeyong Lee and Minwoo Chae for spotting some technical issues in a previous version of the manuscript, and to Naveen Narisetty for sharing his skinny Gibbs codes.
Citation
Yiqi Tang, Ryan Martin. "Empirical Bayes inference in sparse high-dimensional generalized linear models." Electron. J. Statist. 18(2): 3212–3246, 2024. https://doi.org/10.1214/24-EJS2274