Research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). Intensive research over the past three decades on modern model selection methods for high-dimensional data has revived interest in statistical inference after model selection. In recent years, there has been a surge of articles on the topic, and a rather vast literature now exists. Our manuscript aims to present a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example that motivates the need for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example, through a literature survey on the post-selection sampling distribution of regression parameter estimators and the properties of coverage probabilities of naïve confidence intervals. Next, we review recent uncertainty assessment methods, categorized according to two types of estimation targets, namely the population- and projection-based regression coefficients. We also discuss the possible pros and cons of the confidence intervals constructed by the different methods.
Abbas Khalili is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC RGPIN-2020-05011), and Masoud Asgharian is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC RGPIN-2018-05618).
The authors would like to thank the co-editor Professor Richard Lockhart, an associate editor, and three referees for their thoughtful and constructive comments. This work is based on the master's thesis of Dongliang Zhang, written in the Department of Mathematics and Statistics at McGill University. Dongliang Zhang also thanks Professors Martin Lindquist and Mei-Cheng Wang, his PhD advisors at Johns Hopkins University, for their support during the completion of this work.
"Post-model-selection inference in linear regression models: An integrated review." Statist. Surv. 16 86 - 136, 2022. https://doi.org/10.1214/22-SS135