Post-model-selection inference in linear regression models: An integrated review

The research on statistical inference after data-driven model selection can be traced as far back as Koopmans (1949). The intensive research on modern model selection methods for high-dimensional data over the past three decades revived the interest in statistical inference after model selection. In recent years, there has been a surge of articles on statistical inference after model selection and now a rather vast literature exists on this topic. Our manuscript aims at presenting a holistic review of post-model-selection inference in linear regression models, while also incorporating perspectives from high-dimensional inference in these models. We first give a simulated example motivating the necessity for valid statistical inference after model selection. We then provide theoretical insights explaining the phenomena observed in the example. This is done through a literature survey on the post-selection sampling distribution of regression parameter estimators and properties of coverage probabilities of näıve confidence intervals. Categorized according to two types of estimation targets, namely the populationand projection-based regression coefficients, we present a review of recent uncertainty assessment methods. We also discuss possible pros and cons for the confidence intervals constructed by different methods. MSC2020 subject classifications: Primary 62F25; secondary 62J07.


Introduction
In one of his earliest work, praised for being "arguably the most influential article" on statistical inference, Fisher describes three fundamental issues in data reduction (Stigler 2005, Fisher 1922), which are the problems of specification, estimation and distribution. According to his illustration, these three problems respectively refer to (1) a "choice of the mathematical form of the population", where we know "what parameters are required to specify the population from which the sample is drawn"; (2) estimation of "values of parameters of the hypothetical population"; (3) determination of "distribution of statistics derived in the sample" for precision assessment.
To solve the second and third problems, respectively of parameter estimation and statistical distribution assessment, Fisher elucidates the renowned maximum likelihood method which is the principal ingredient of most classical and modern model selection and estimation procedures. Fisher's likelihood theory, however, relies on one important assumption pertinent to the first issue on model specification, that is a correct model which accurately characterizes the true data generating mechanism is known, except for certain parameter values, before valid parameter estimation and uncertainty assessment is executed. The implication is that the empirical data shall not be involved in the selection of a final model from a set of candidate models, regardless of whether the selected model is correct or wrong. Instead, the data shall be solely used to perform point estimation and construct relevant uncertainty measures. Moreover, under this assumption, to provide an explicit answer to the third problem, Fisher advocates the use of the well-known p-value, a concept introduced by Pearson (1900), as an assessment benchmark for evaluating statistical significance of result(s) of a scientific experiment. In addition, Fisher (1935) implies that the criterion that p-value less than a pre-specified value may indicate that "a phenomenon is experimentally demonstrable". In other words, p-value could be considered as an indicator for replicability of results, which is commonly regarded as a golden standard in science (Shapin & Schaffer 1985).
The foundation on which Fisher's maximum likelihood method and the subsequent adoption of p-values for precision assessment rests, however, is seldom satisfied in practice, which is "a quiet scandal in the statistical community" as phrased by Breiman (1992). In fact, the exact mathematical form of population is often unidentified beforehand. Usually, upon assuming a specific class of true data generating mechanism such as a linear or a generalized linear regression model, and identifying a set of potential predictor variables, a model for statistical inference is usually selected from a set of candidate models through either a well-defined, ill-defined or opaque data-dependent fashion . A data-driven selection of tuning parameters in obtaining penalized least-squares estimators further invalidates that assumption. The final model so selected, on which parameter estimation and uncertainty assessment is based, is not fixed as required by Fisher's view, but indeed random. In other words, even with the same model selection procedure, different realizations of the true data generating mechanism may well lead to different selected models which may deviate from the true model with non-negligible probability. Therefore, by integrating additional uncertainty from model selection into inferential process, the validity of statistical inference guaranteed by Fisher's likelihood theory becomes suspicious.
In addition to the aforementioned violation of Fisher's fundamental assumption, negligence of the multiple testing problems in the era of mass production of scientific publications further deepen the concern with respect to the validity of statistical inference in ensuring the replicability of scientific experiment, particularly on the legitimacy of p-values. This consideration has been highlighted across several scientific communities, for example behavioral genetics (Mann 1994) and genomics research (Lander & Kruglyak 1995), which culminated in the thought-provoking article by Ioannidis (2005), who boldly assert that "most claimed research findings are false". As such, to ensure the replicability of scientific discoveries, several initiatives have been launched within the last decade involving appeal for more cautious use or even abandonment of p-values. Dis-missing these movements as "misguided attacks", Benjamini (2020) argue that (i) failure to ascertain the correct level of variability, and (ii) ignorance of the effect of selection on statistical inference are two crucial yet often neglected causes for the irreplicability of scientific results. He concludes that we do not need to refrain from counting on classic measures, such as p-values and confidence intervals, for assessing statistical significance. Rather, these assessment benchmarks have to be adjusted for valid statistical inference.
In view of the foregoing reasoning and perspectives from Ioannidis (2005) and Benjamini (2020) on replicability of results, given that the presumed condition of Fisher's likelihood method is infringed since one data set is simultaneously used for model selection and statistical inference, one would naturally ask what detrimental effects this violation might provoke on parameter estimation and precision assessment so that the resulting measures (without adjustments), such as p-values or confidence intervals, represent sources of concern across scientific communities. To answer this question, several important contributions have been accomplished over the past two decades, Pötscher (1991Pötscher ( , 1995; Pötscher & NovÁk (1998); Leeb & Pötscher (2003; Kabaila (1995Kabaila ( , 1998Kabaila ( , 2005Kabaila ( , 2009); Kabaila & Leeb (2006); Kabaila & Giri (2009) and Berk et al. (2009), among others. The aforementioned research recognize and explain the undesirable outcomes of distorted sampling distribution of regression parameter estimators obtained after model selection and reduced coverage probability of confidence regions that are constructed in a naïve fashion where the data-driven nature of a selected model is ignored. Furthermore, via the lens of replicability of results, the confidence regions so constructed may contain zero after necessary adjustment by accounting for model selection randomness, whereas they may not include zero without such adjustments. Therefore, classic uncertainty quantification measures formulated without considering additional layer of randomness from selection indeed contribute to the irreplicability of scientific results. As such, it is imperative to devise reliable uncertainty quantification measures to account for randomness inherited from data-dependent model selection procedures.
Considering randomness in estimation of regression coefficient(s) after model selection, Berk et al. (2013) argue that the population-based regression coefficient(s) (referred to as target(s) from now on) may have interpretation drawbacks in some applications. They therefore present an alternative estimation target(s), known as the projection-based regression coefficient(s). Focusing on such targets, Berk et al. (2013) and Lee et al. (2016) propose two methods for constructing valid post-selection confidence intervals. They are respectively known as the post-selection inference (PoSI) and the exact post-selection inference (EPoSI) methods.
To the best of our knowledge, Leeb et al. (2015) is the only paper that provides a comparative study on methods of constructing post-selection confidence intervals for the projection-based targets. Specifically, these authors compared the naïve confidence intervals with those constructed using the PoSI approach proposed by Berk et al. (2013), where AIC, BIC and LASSO (Tibshirani 1996) are used for model selection. Their simulations indicate that the PoSI confidence intervals do deliver at least the desired minimal coverage probability for the projected coefficients, while naïve confidence intervals in general fail to do so. Interestingly, dramatic under-coverage is observed for both types of confidence intervals if the projection-based targets are replaced by the population-based regression coefficients. Such under-coverage could be explained by the fact that the PoSI confidence intervals are not designed for the population-based target.
Yet, the existing literature on post-model-selection (PMS) inference stops short on three different fronts. First, there is no integrated review of existing literature in post-selection inference. It is therefore an arduous task to grasp the essence of post-selection inference in a succinct and unified fashion. Second, the empirical studies by Leeb et al. (2015) fall short in: (1) comparing confidence intervals constructed using the Scheffé's (Scheffé 1959) and the EPoSI (Lee et al. 2016) approaches; (2) considering other important indicators such as the length of confidence intervals. Third, a collection of recent methodological advancement has not been presented in a unified fashion.
This manuscript aims at filling the aforementioned gaps and presents an integrated guide for the contemporary PMS inference. Particularly, using simulations, we compare naïve confidence intervals against those constructed by the Scheffé's method, the PoSI and the EPoSI approaches. In terms of the average coverage probabilities for the projection-based regression coefficients, we find that when conditioning on the selected submodel only, a mix of under-and satisfactory coverage is observed for the EPoSI confidence intervals while the PoSI confidence intervals consistently achieve over-coverage. In terms of the average length of confidence intervals, those constructed by the EPoSI method are in general shorter than the ones by the PoSI approach, reinforcing the conservative nature underpinning the PoSI method. On the other hand, when conditioning on both the selected submodel and corresponding sign vector of point estimator(s), similar phenomenon with respect to the average coverage probabilities is seen for the EPoSI and PoSI confidence intervals. However, in this case, the EPoSI confidence intervals can sometimes be much wider. We have observed through simulations that the EPoSI confidence intervals, obtained via conditioning on the selected submodel or both the selected submodel and corresponding sign vector of parameter estimator(s), sometimes have infinite lengths. This has also been analytically discussed in the recent manuscript Kivaranovic & Leeb (2021).
The rest of the manuscript is organized as follows: in Section 2, we introduce notation and terminology. In Section 3, we provide a motivating example based on the linear regression model to give some tangible hints to the pitfalls of ignoring data-dependent model selection on statistical inference. In Section 4, we present a summary of key results in finite-and large-sample distributional properties of the PMS estimators as well as properties of coverage probabilities of naïve confidence intervals, serving as theoretical insights into the phenomena observed in Section 3. In Section 5, we introduce and discuss some merits and demerits of both population-and projection-based estimation targets. Focusing on the projection-based targets, we then provide a selective review of existing post-selection methods for uncertainty assessment in Section 6. Particular emphasis is placed on the construction of valid post-selection confidence intervals and p-values. This is followed by a concise presentation of various recent methodological development in post-model-selection inference. In Section 7, we provide an overview of statistical inference methods designed specifically for the population-based targets, even though they may not be considered as postselection methods in the most strict sense. In Section 8, focusing on methods designed for projection-based regression coefficients, we present comparative simulations to evaluate different constructions of post-selection confidence intervals in linear regression models. Conclusions are given in Section 9.
Let the index set M = {j 1 , ..., j m } ⊆ {1, ..., p} denote a linear submodel which contains explanatory variables with indices in M , where |M | = m ≤ p denotes the size of submodel M . Then we define X M ∈ R n×m to be the n × m submatrix of the design matrix X with columns indexed by M . The linear regression model corresponding to a submodel M is given by If we set M = {1, ..., p} in (2), that is the index set of all the predictors, we obtain the linear regression model (1) with the true population-based regression coefficient β 0 . Indeed, this observation consists of one rudimentary aspect of the interpretation for β 0 , which is referred to as the full model interpretation of parameters by Berk et al. (2013). More specifically, the linear regression model in (1) is considered as the true data generating mechanism for the response vector Y . As such, a causal connection between the response and the predictors is implicitly embedded in this interpretation. Under this setup, coefficient estimates for de-selected predictors are assigned to be zero, and the resulting submodel comprising only the selected predictors is merely the "computational compression"(ibid., Appendix, B.1) of the data. While this interpretation has merits in explaining natural and physical scientific phenomena, Berk et al. (2013) point out that: (i) the idea of having a full model exhibiting a causal interpretation is controversial, and (ii) inferential complications may arise from this interpretation as well (Leeb & Pötscher 2005).
To avoid these contentious elements, as an alternative to the full model interpretation of parameters, Berk et al. (2013) propose the submodel interpretation of parameters. Instead of assuming a linear structure for the true model, one would give up the proposition of having a full model all together and only assume the existence of the true mean response vector E(Y ) = μ in model (1). Then, the so-called projection-based target corresponding to a given submodel M is defined as We will provide more discussion about the two estimation targets β 0 and b M in Sections 5.1 and 5.2, respectively.
Assuming the design matrix X is fixed, we define M n to be a data-dependent model selection procedure, a measurable function of Y , given by M n : R n → M all , where M all = {M |M ⊆ {1, ..., p}, rank(X M ) = |M |}, is the space of all possible full-rank submodels. Let M true ∈ M all be the index set of the true data generating mechanism upon assuming such linear model exists. From now on, we use M to represent the output of the map M n (Y ), i.e. M = M n (Y ).
If a submodel M is given a priori, then the ordinary least-squares (OLS) estimator of β M in (2) is given by The same estimator is also used for b M in (3), and to avoid introducing more notation, we use β M as its estimator. Components of β M are denoted by β j·M , for all j ∈ M . Moreover, related to our motivating example in Section 3 based on (1), we denote β Method M as the OLS estimator corresponding to a submodel M selected by a "Method" considered below. Define where sd(·) stands for standard deviation. The quantity T Method In addition, a penalized least-squares estimator of β 0 in (1) is given by where 2 , λ is the tuning parameter (possibly multi-dimensional) and J n (β; λ) is a penalty function. In the sequel, we consider these penalties: 1. LASSO: J n (β; λ) = λ||β|| 1 , where λ = λ; 2. Elastic net (Zou & Hastie 2005) Here, λ = (λ, a), where a ≥ 2. 4. Adaptive LASSO (AdaLASSO) (Zou 2006): J n (β; λ) = λ||w * β|| 1 , λ = λ and the data-dependent weights w = (w 1 , ..., w p ) is recommended to be chosen as w j = 1/| β j | for some consistent estimator β j of β 0 j . Moreover, where * represents elementwise multiplication of two vectors.
The books by Fan et al. (2020), Wainwright (2019), and Hastie et al. (2015) provide a comprehensive review on the above modern regularization techniques and related theories in the context of different statistical models, including linear and generalized linear regression models.

Motivating example
We now give some tangible insights into the unfavorable impacts of ignoring the data-dependent model selection on statistical inference. Through a simulated example based on the linear regression model (1), we achieve this by: against their respective counterparts, T j·Mtrue and T j·Mtrue,0 , where j ∈ M true ; (c) tabulating the empirical coverage probabilities of the naïve confidence intervals for the true nonzero β 0 j , where j ∈ M true in TABLE 1. The naïve confidence intervals are constructed by using T Method j· M in (5), and are given by where t(n−| M |; 1−α/2) is the 100(1−α/2) th percentile of a t-distribution with n − | M | degrees of freedom.
In this example, for parts (a) and (b) we only report the results for j = 5 as the plots for j = 1, 3, 4, 7 show similar behaviors. For part (c), we report empirical coverage probabilities of 95% (α = 0.05) naïve confidence intervals    in (7) for parameters β 0 j , j = 1, 3, 4, 5, 7, in TABLE 1 below. Details of model setting and R implementation, and additional plots for j = 1, 3, 4, 7 are given in Section S1 of the Supplementary Material (Zhang, Khalili & Asgharian 2022). , and their counterparts, β j·Mtrue , T j·Mtrue and T j·Mtrue,0 . More specifically, bimodality, longer tail and positive or negative skewness appear in the sampling distribution of these post-selection quantities in contrast to their counterparts. Thus, we conclude that conducting model selection and inference on the same data can distort the sampling distribution of parameter estimators and the quantities in (5). In particular, the sampling distributions of these quantities after model selection no longer follow t-distribution as seen in . This contributes to the under-coverage of naïve confidence intervals in (7), a phenomenon noted in TABLE 1. As it can be seen from the table, the under-coverage can be severe.
In addition, our simulations indicate that there is no tractable pattern of the post-selection sampling distributions of parameter estimators and quantities in (5), and the coverage probability of naïve confidence intervals in (7). We provide explanations for these phenomena in Sections 4.

Post-selection sampling distribution of parameter estimators and properties of coverage probabilities of naïve confidence intervals
We now review existing results about finite-and large-sample distributional properties of post-selection parameter estimators and properties of coverage probabilities of naïve confidence intervals in (7). The research in entangling the distributional properties of post-selection parameter estimators is mainly pioneered by Benedikt M. Pötscher and Hannes Leeb. Through a series of works (Pötscher 1991, Pötscher & NovÁk 1998, Leeb & Pötscher 2003, the following points have been highlighted: (i) Deviation from Gaussianity. The finite-(in both known and unknown variance) and large-sample conditional (on the order of selected models) and unconditional distributions of post-model-selection parameter estimators, in general, deviate from Gaussian. Indeed, these distributions are mixtures of distributions, and each mixture component corresponds to a model in the model space. As such, these distributions have complicated forms and are difficult to work with when performing statistical inference. (ii) Non-uniform convergence. The aforementioned mixture distributions depend on the unknown true parameter β 0 . More specifically, regardless of the use of consistent or inconsistent model selectors, the convergence of the mixture distributions to their limiting distributions depends on how frequently different models are selected. However, these frequencies depend on the true parameter β 0 , and consequently, the convergence of postmodel-selection estimators to their limiting distributions is non-uniform with respect to β 0 . Thus, irrespective of the sample size, inferential validity of using the large-sample distribution to estimate the corresponding finite-sample counterpart is not guaranteed. (iii) Nonexistence of uniformly consistent estimator. A uniformly consistent estimator of conditional (on the selected submodel) or unconditional distribution of post-model-selection parameter estimator does not exist. Therefore, it is impossible to estimate their exact distributions in a uniform and consistent fashion.
These results provide theoretical explanations to the discrepancies between the sampling distributions of the OLS estimator β j·Mtrue and post-selection OLS estimators β Method Next, we review results concerning finite-and large-sample properties of naïve confidence intervals, serving as an explanation for the under-coverage observed in TABLE 1. The relevant work was initiated by fixing the true regression coefficient. In various settings, Hurvich & Tsai (1990), Regal & Hook (1991) and Zhang (1992) showed, both theoretically and via simulations, that finite-and large-sample conditional (on selected models) and unconditional coverage probabilities of naïve confidence intervals are lower than nominal coverage probabilities.
In large-sample theory, Kabaila (1995) explored the pitfall of assigning the criterion that asymptotic nominal coverage rate is achieved for a fixed regression coefficient as a benchmark to assess the asymptotic validity of a confidence region. In particular, using Hodge's example of a "superefficient" estimator, the author showed that even if the above criterion is met by some confidence interval, for any given n, there exists a regression coefficient such that this confidence interval fails to deliver the desired coverage probability. In other words, achieving proper asymptotic coverage probability for a fixed regression coefficient does not guarantee the validity of the associated confidence interval. As such, Kabaila (1995) argued that asymptotic coverage rate of a valid confidence region should be achieved by its minimal coverage probability over the space of all possible values of β 0 . Based on this criterion, a novel Monte Carlo simulation algorithm was proposed by Kabaila (2005) to compute the coverage probability of naïve confidence intervals at any given parameter value, which is also shown to be lower than the nominal coverage probability based on simulated data. Further attempts were made by Kabaila & Leeb (2006) and Kabaila & Giri (2009), where upper bounds of the large-and finite-sample minimal coverage probability of naïve confidence interval are respectively derived. The authors showed that this upper bound could be far from the nominal coverage probability.
More detailed explanations of the above mentioned contents are respectively given in Sections S2 and S3 of our Supplementary Material.
Having recognized the challenges in performing statistical inference after data-driven model selection, researchers have since been devising post-selection inferential methodologies that take into account model selection uncertainty. In the next section, we discuss two types of statistical estimation targets.

Statistical estimation targets
One primary challenge of post-selection inference is the randomness in the estimation target. For instance, under the setting of the linear regression model (1), the OLS estimator corresponding to a submodel M ∈ M all is given by However, upon introducing the uncertainty inherited by a data-drive model selector M n , it is not clear what the estimation target of the least-squares estimator (X M X M ) −1 X M Y is. This makes the interpretation of an estimation target so formed ambiguous. Thus, it is imperative to first formulate a meaningful estimation target. With such motivation, we now discuss two classes of estimation targets that have been the focus of contemporary high-dimensional and post-selection inference literature, namely the population-based target in (1), and the projection-based target in (3).

Population-based target
According to the full model interpretation of parameters ), the linear model (1) is regarded as the true data generating mechanism for the response Y . As such, this model has the special status of embracing a complete set of predictors which are causal for the response vector Y . In other words, the variation in Y can be explained by that set of predictors. Under this framework, although a submodel may be selected and the coefficient estimates of the de-selected predictors are given a numerical value of zero, these predictors do exist when conducting inference, and they have tangible and meaningful interpretations. Therefore, the regression coefficient β 0 j represents the expected change in the response Y per unit change in x j , holding all the remaining covariates fixed, regardless of being selected or de-selected. Thus, the coefficients β 0 j 's are referred to as the population-based targets and they indeed contribute to the interpretation of linear model (1) according to the aforementioned full model interpretation of parameters.
It is apparent that the above view is legitimate only within the presumed framework (1). In fact, it hinges entirely on the first-order-correctness condition E(Y ) = Xβ 0 . On the other hand, another line of thought argues that some limitations may associate with the full model interpretation of parameters, which motivates the introduction of the projection-based target as discussed in the next section.

Projection-based target
There are several limitations of the interpretation of β 0 according to the full model view of parameters. First, the concept of the so-called full model to entangle all the variability in Y is controversial. Given the complexity of the mother nature, it may neither be accurate nor feasible to measure or include all the pertinent predictors. Moreover, Berk et al. (2013) point out that due to the issue of predictor redundancy, which is common in social and biological sciences, the full model may not even contain the parameter of interest. Second, the full model view may give rise to complications when conducting statistical inference. In fact, as shown by Leeb & Pötscher (2005), the sampling distribution of coefficient estimator for the parameter of interest depends on all the other true regression coefficients. Since these coefficients are unknown, it is uncertain whether there exists discrepancy between the targeted sampling distribution and its counterpart when we have a fixed model. As a consequence, this may lead to unreliable statistical inference based on a wrong sampling distribution of the estimator for the parameter of interest. Indeed, its true sampling distribution cannot be estimated in a uniform and consistent manner; see Section 4,point (iii). Third, it is contentious to presume a linear structure of the true model. A plausible perspective toward statistical modeling of real life problems is that none of the arbitrarily constructed candidate models and observed data adequately represent the full reality (Bancroft & Han 1977, Box 1976. In fact, the true data generating mechanism is perhaps much more complex. Thus, simply assuming a specific structure for the true model may lack solid scientific verification. Considering the above limitations with respect to the full model view of parameters, Berk et al. (2013) proposed three principles pertaining to the interpretation of regression parameters of a submodel: (i) "the full model has no special status other than being the repository of available predictors"; (ii) "the coefficients of excluded predictors are not zero; they are not defined and therefore do not exist"; (iii) "the meaning of a predictor's coefficient depends on which other predictors are included in the selected model".
Governed by the above three guidelines, Berk et al. (2013) subsequently proposed the projection-based target b M as defined in (3). It is seen that b M is the orthogonal projection of μ onto the column space of X M . In general, b M and β M in (2) are different. In fact, if the first-order-correctness condition holds so that μ = X M β M for the submodel M , then the two targets are equal.
It has been discussed that the formulation of projection-based target b M is advantageous over population-based target β 0 in two aspects: (I) the first-ordercorrectness condition is not required; (II) simplicity in interpretation. Abandoning the assumption of the first-order-correctness is more appealing in many real life statistical modeling problems. Indeed, the central concept underpinning the definition of b M is that although it is unrealistic to assume a specific structure of the true model, it is always legitimate to approximate the true model by a linear structure containing a selected submodel M . As such, with a selected submodel M , each component of projected regression coefficient, denoted by b j·M , j ∈ M , is interpreted as the approximated expected change by submodel M in the response Y per unit change in x j , holding all other covariates in M fixed. This interpretation is in fact the second advantage of b M as it is interpreted only within the framework of a submodel M thereby avoiding the issue caused by covariates that are not included in M . We next review the inferential methods for the two estimation targets discussed above respectively in Sections 6 and 7, by focusing on confidence interval constructions.

Post-selection inferential methodologies
Focusing on the projection-based targets, we now provide a review of three existing post-selection statistical inferential methods. Particular emphasis is on the construction of valid post-selection confidence intervals. A collection of recent work in post-model-selection that complements the foregoing discussions is presented in Section 6.4. Inferential methods designed for the population-based targets have been surveyed and compared through simulations by Dezeure et al. (2015) which we discuss and further elaborate on in Section 7.

Universally valid post-selection inference (PoSI)
Analogous to Scheffé's S-method (Scheffé 1959) on valid simultaneous inference by virtue of Scheffé's constant, Berk et al. (2013) proposed the so-called PoSI method, which is capable of producing universally valid post-selection confidence intervals for the projection-based regression coefficients b j· M , j ∈ M , regardless of model selection procedures and selected submodel M . Here "valid" means the confidence intervals achieve at least the nominal coverage probability 1 − α, for any α ∈ (0, 1), by taking into account the model selection stage. The merit of the PoSI method is that even if a selected submodel deviates from the true model, it still guarantees valid inference for that selected submodel. We now describe the method.
Given a submodel M ∈ M all selected by a generic model selection procedure M n , we consider the following confidence interval for b j· M , where β j· M is the OLS estimator corresponding to a submodel M , K is a constant to be specified below, and where SSE is the sum of squares of errors obtained from fitting a linear regression model including all the covariates (full model).
If in (8) K is replaced by t(n − | M |; 1 − α/2), we obtain the naïve confidence interval in (7). We have demonstrated in our motivating example, TABLE 1 in Section 3, that such interval fails to attain the nominal coverage probabilities for the population-based targets. We refer to Remark 6.1.2 for a recent result regarding the coverage probability of the naïve confidence intervals when the population-based target is replaced by the projection-based target.
The goal of the PoSI approach is to have a potentially larger value of the constant K to capture the extra randomness brought about by data-dependent model selector M n . In this way, the resulting confidence interval becomes wider and thus capable of achieving the nominal coverage probability, regardless of model selection procedures and selected submodels. This objective can be formulated as follows. The constant K in (8) More specifically, Berk et al. (2013) where Recall that e j ∈ R |M | is the j th standard basis. Now, since for any randomly selected submodel M , Thus, by choosing K = K PoSI , (9) is satisfied. We notice that the constant K PoSI is the 100(1 − α/2) th percentile of the random variable T = max M ∈M all max j∈M |t j·M | in (10), whose distribution is needed for the computation of K PoSI . On the other hand, for each M ∈ M all , the random variable t j·M as shown in (11) mainly depends on the design matrix X M and the error term = ( 1 , ..., n ) , where i 's are independent and i ∼ N (0, σ 2 ). Thus, the distribution of T can be approximated using Monte Carlo simulations. Specifically, we first obtain several copies of the random variable T based on independent draws of , where i ∼ N (0, σ 2 ) are generated independently. Then, given α, the constant K PoSI satisfying (10) is approximated by the 100(1 − α/2) th percentile of the empirical distribution based on the obtained copies of T . Recall that σ 2 is computed based on the full model. In practice, this computation has been implemented in the R package PoSI .
Another choice for the constant K in (8) is the Scheffé's constant (Scheffé 1959 In comparison, the Scheffé's constant is more conservative than the PoSI-constant, as Berk et al. (2013) showed that K PoSI ≤ K Sch , for all design matrices X and any model universe M all .
It is worthwhile to note that the calculation of the PoSI-constant, K PoSI , is independent of any quantities derived from selected submodels. In other words, the formulation of the PoSI-constant is pre-experimental as it does not depend on any experiment outcomes. Although the construction of the PoSI confidence interval unavoidably involves some post-experimental quantities which are dependent on the experiment outcomes (such as the post-selection point estimators), the inclusion of the PoSI-constant indeed contributes to the universal validity in covering a random projected target with nominal coverage probability.
However, the strong universal protection resulted from the pre-experimental nature of the PoSI approach implies that this method is necessarily conservative, since its inferential validity is safeguarded against all possible selected submodels. Indeed, our simulations in Section 8 indicate that conditioning on a selected submodel, the PoSI confidence intervals are in general wider than the EPoSI confidence intervals (introduced in the next section). Yet, we demonstrate that this conservativeness of the PoSI approach is moderate, since when conditioning on both the selected submodel and corresponding sign vector of point estimator(s), the PoSI confidence intervals are not necessarily wider.
Apart from the above characteristics, one limitation of the PoSI approach is its relatively high computational cost for computing the PoSI constant K PoSI . In fact, the authors recommended to use this approach for designs with p ≈ 20 only. Therefore, this method is not feasible in high-or ultrahigh-dimensional settings unless some pre-screening procedure (Fan & Lv 2008) is performed.
Remark 6.1.1. The order of the magnitude of the PoSI-constant is studied by Berk et al. (2013) and Bachoc et al. (2018). Berk et al. (2013) prove that for a specific type of exchangeable design matrix, K PoSI = O( √ log p) and this rate is also achieved for orthogonal design matrix. In full generality, however, it turns  (2021) show that when we apply the LASSO for variable selection and then refit a linear regression model (1) corresponding to the selected variables, the resulting ordinary least-squares estimator is asymptotically normally distributed. As a consequence, the naïve confidence interval, CI j· M (z 1−α/2 ) in (8), where z 1−α/2 denotes the 100(1 − α/2) th percentile of a standard normal distribution, asymptotically attains the nominal coverage probability of 1 − α for the projected target. These results, however, depend on several assumptions regarding the design matrix X, its dimensions n and p, the tuning parameter λ in running the LASSO for variable selection, and the true signal strength.

Exact post-selection inference (EPoSI)
In contrast to the pre-experimental proposal by Berk et al. (2013), Lee et al. (2016) and Lee & Taylor (2014) present a post-experimental approach, which depends on the model selection procedure and the selected submodel. Specifically, they construct confidence intervals with the exact (1 − α) coverage probability via conditioning on a selected submodel. Therefore, their method is known as the conditionally exact post-selection inference and the resulting confidence interval for b j·M , denoted by CI j·M , satisfies where the event M = M means Y ∈ R n : M n (Y ) = M . Since this method involves a number of delicate steps, we first provide a compendium of this approach in Algorithm 1, before supplying details below. The crux of the EPoSI approach involves two stages: first, determination of the conditional distribution of β j·M M = M , which corresponds to Steps 1-3 below, and it turns out to be a truncated normal distribution; second, construction of the confidence intervals as shown in (12), which corresponds 2: The above formulation of selection event can be further decomposed as Y ∈ R n : gives the ordinary least-squares estimator for b j·M , which is β j·M . Thus, the con- follows a truncated normal distribution with cumulative distribution function given by F where ξ j and τ j are the mean and variance of this truncated normal, respectively.

5: The above result in
Step 4 enables us to construct the confidence interval CI j·M in (12).
In practice, the upper and lower bounds of CI j·M can be computed using the R package selectiveInference (Tibshirani, Tibshirani, Taylor, Loftus & Reid 2016).
can be accomplished via first considering the distribution of β j·M M = M , s = s . Next, we provide details for the EPoSI framework.
Step 1 We first recall that the ordinary least-squares estimator for b j·M is β j·M . To obtain CI j·M as in (12), we need to find the conditional distribution of β j·M , given M = M . To achieve that, we first investigate the conditional distribution of β j·M given the so-called selection event (13) Here, β j·M can be written as η j Y , where η j = e j (X M X M ) −1 X M and e j ∈ R |M | is the j th standard basis. Moreover, s n is the selection of sign vector of the non-zero coefficient estimate(s) corresponding to the model selection procedure M n , s is the selected sign vector which is random and s is the selected sign vector corresponding to submodel M .
Then, to ascertain the exact conditional distribution of where A(M, s) and B(M, s) respectively denote some affine matrix and vector depending on the model selection procedure, selected submodel, and sign vector of the corresponding non-zero coefficient estimate(s). Here, for two vectors u and v, we say u ≤ v if and only if every component of u is less than or equal to every component of v. Geometrically, (14) is equivalent to characterizing the response vector falling into a single polytope. Thus, this requirement is known as the polyhedral selection property. Examples of model selection procedures having this property include, but not limited to, LASSO and elastic net with fixed tuning parameters, marginal screening, nonnegative least squares and orthogonal matching pursuit (Tropp & Gilbert 2007). The specific forms of the affine matrix and vector A(M, s) and B(M, s) corresponding to LASSO are given in Section S4 of our Supplementary Material.
Step 2 With the polyhedral selection property (14), Lee et al. (2016) and Lee & Taylor (2014) show that the single polytope {Y ∈ R n : A(M, s) Y ≤ B(M, s)} can be further decomposed as where the exact forms of quantities V − (Y ), V + (Y ) and V 0 (Y ) are given in Section S4 of our Supplementary Material. These quantities are shown to be independent of η j Y . In summary, with the polyhedral selection property, the Step 3 Based on the characterizations (14) and (15), and the assumption that where v − , v + ∈ R, ξ j = η j μ, τ j = σ 2 ||η j || 2 , and TN(μ, σ 2 ; a, b) denotes a Gaussian distribution with mean μ and variance σ 2 , truncated to the interval [a, b].
Step 5 The finite-sample pivot (17) enables us to construct valid confidence intervals for the b j·M as follows. Let L s j and U s j be two quantities satisfying F 106 D. Zhang et al. where E n (M , s) satisfies the requirement in (14).
To construct the confidence intervals in (12), we then consider a union of polytopes: where the union is taken over 2 |M | sign vectors. By the same argument as conditioning on a single polytope in (17), we obtain the uniformly distributed pivotal quantity as from which the lower and upper bound L j and U j can be constructed in the same fashion as the single polytope-based interval in (18). As such, the EPoSI confidence interval CI j·M in (12) is given by [L j , U j ]. One advantage of the EPoSI is its capability of constructing valid postselection confidence intervals in both low-and high-dimensional configurations. Taylor & Tibshirani (2018) explore generalization of the EPoSI approach to a large class of 1 -penalized regression models, including generalized linear models, Cox's proportional hazards model, and the graphical LASSO (Friedman et al. 2008). On the other hand, one limitation of the EPoSI approach is with respect to the lengths of the resulting confidence intervals in both single and union of polytopes cases. More specifically, by conditioning on a single polytope (14) or a union of polytopes (19), the resulting confidence intervals are sometimes wider than the competing confidence intervals when the signal strengths are weak (Lee et al. 2016), or may even have infinite lengths (Kivaranovic & Leeb 2021). These phenomena have also been demonstrated by our simulations in TABLE 6-8, Section 8.
To gain insights and explore possible remedial procedures to this limitation, Fithian et al. (2017) explain that including redundant information in the selection event in (15), also known as the conditioning event, gives rise to reduced power of associated hypothesis tests, contributing to unacceptably wide confidence intervals. The authors then propose a generic framework of uniformly most powerful unbiased selective level-α test for exponential family models, and conclude that hypothesis tests based on data splitting procedures are always inadmissible. In other words, there always exists a testing procedure exhibiting higher power than data splitting.  propose to incorporate a noise ω in the original response Y and perform subsequent inference procedures outlined in this section on the randomized response Y * = Y + ω. It is theoretically shown that the resulting estimator derived from the randomized response Y * is uniformly consistent. Moreover, via simulations,  demonstrate that assigning ω to be either Gaussian or logistic noise leads to shortened EPoSI confidence intervals. This framework of randomized noise augmentation is then applied by Hyun et al. (2021) to post-selection inference for change point detection, which demonstrates enhanced power. As discussed by Fithian et al. (2017) and , conditioning on unnecessary information implies less information for statistical inference, and hence resulting in wider confidence intervals. Motivated by this, Liu et al. (2018) construct inferential framework with so-called minimal conditioning, focusing on a different estimation target than the projection-based one in (3). Specifically, they divide the selected model M n (Y ) = M into two groups of variables, namely the high and low value targets. Only the high value targets, denoted by H n (Y ) = H, are included in the new conditioning event and instead of (12), the authors aim to achieve where the new target for inference, b j·H , is the j th coordinate of the vector (X H X H ) −1 X H E(Y ), given the high value target H, and CI j·H is the corresponding EPoSI confidence interval. To select the high value targets, two solutions are proposed, namely stable-1 and stable-t approach. Based on this set up, similar derivation of truncation region and truncated Gaussian result as those in (15) and (16)  (2021) also have proposed EPoSI-based frameworks that condition on less information, with specific application to inference for change point detection and graph fused LASSO, respectively. Tibshirani, Taylor, Lockhart & Tibshirani (2016) show that the EPoSI method can also be applied at each step of certain sequential regression procedures. Their goal is to assess the significance of the latest selected covariate(s) through projecting E(Y ) = μ onto the space of current active covariates. The implication is that, at each step of a sequential regression procedure and given the set of current active predictors, conditionally exact post-selection confidence intervals for the projected regression coefficient of a newly-entered covariate can be constructed. We should be aware that the EPoSI approach is only applicable to those sequential regression procedures that at each step satisfy the polyhedral selection property described in Section 6.2. A non-exhaustive list of such sequential regression procedures includes a modified version of forward stepwise regression (Tibshirani, Taylor, Lockhart & Tibshirani 2016), the least angle regression (LARS) (Efron et al. 2004) and the LASSO solution path which is a modified LARS path.

EPoSI for sequential regression
To apply the EPoSI method on a sequential regression procedure, similar to (13), consider the selection event where M k and s k are respectively the set of current active covariate(s) and sign vector of their corresponding estimated regression coefficient(s) after k steps. If the polyhedral selection property is satisfied, then by (14), we have Given A k and B k , valid conditional confidence intervals for the projected coefficient(s) of the newly-entered covariate(s) after k steps, conditioning on E n,k (M k , s k ), can be constructed as prescribed in Section 6.2. Marginalizing over 2 |M k | possible sign patterns yields a confidence interval upon conditioning In case of a modified forward stepwise regression, Tibshirani, Taylor, Lockhart & Tibshirani (2016) explicitly construct A k and B k through induction. They show that B k = 0, for all k, and both A k and B k have exactly 2pk − k(k + 1) rows after k steps, where p is the total number of available covariates in the data. Moreover, A k and B k are also formulated when applying this scheme to the LARS and LASSO solution path. In this case, the authors demonstrate that both the affine matrix A k and vector B k roughly reach 3pk rows after k steps, resulting in relatively heavier computation load. To remedy the computational burden in case of the LARS path, the authors propose a compact version of A k and B k , denoted by A k and B k . The compact versions only have k+1 rows after k steps, which immensely enhance the computational feasibility and efficiency of the EPoSI method. The authors show that applying A k and B k for uncertainty quantification about w k μ requires the contrast vector at k th step, w k , to be in the column span of current active variables selected by LARS. As a consequence, they propose the so-called spacing test H 0,k : w k μ = 0, to assess the significance of the latest selected variable at the k th step when regressing μ on the column space of X M k . The primary strength of the spacing test lies on the simplicity of its corresponding one-and two-sided spacing statistics (Tibshirani, Taylor, Lockhart & Tibshirani 2016). Recently, Azais et al. (2018) have examined the distribution of the spacing test statistic under the alternative hypothesis. The authors have studied the power of this test and proven that it is unbiased for LARS.
Under the framework of the linear regression model (1), Lockhart et al. (2014) present a closely related method to test the significance of all the k−1 predictors entered before the k th knot λ k along the LASSO solution path. Their goal is to test H 0 : supp(β 0 ) ⊆ M k−1 against its alternative, where supp(β 0 ) = {j : β 0 j = 0, 1 ≤ j ≤ p}. Their proposed statistic for testing the null hypothesis H 0 is called covariance test statistic and is given by where ·, · denotes the dot product, β LASSO (λ k+1 ) is the estimated coefficient(s) along the LASSO solution path with tuning parameter λ = λ k+1 using predictors in M k , and β LASSO (λ k+1 ) is the estimated coefficient(s) obtained by re-applying LASSO with tuning parameter λ = λ k+1 to the active variables in M k−1 . In other words, C k in (21) measures how much change in the covariance between the response vector Y and the fitted values can be attributed to the newly-entered predictor(s) at the tuning parameter value λ = λ k+1 . Lockhart et al. (2014) show that under H 0 : supp(β 0 ) ⊆ M k−1 , and certain conditions on the design matrix X, as (n, p) − → ∞, where Exp(1) is the exponential random variable with scale parameter 1. Thus, the approximated exponential distribution is used to test H 0 at a designated significance level. Tibshirani, Taylor, Lockhart & Tibshirani (2016) establish asymptotic equivalence between the covariance test statistic C k and a modified version of their spacing test statistic, R k , as exp (C k ) = R k (1 + o p (1)).

Recent work
PoSI and related work: Following the idea of PoSI method ) delineated in Section 6.1, Bachoc et al. (2020) develop a general framework to construct asymptotically (p fixed and n → ∞) valid marginal confidence interval for the projection-based target b M defined in (3), irrespective of model selectors and selected submodel M . Remarkably, the large-sample coverage rate attained by such interval for b M is also uniformly achieved over an arbitrary sequence of underlying data generating mechanisms. One salient feature of this method is that it does not require the homoscedasticity and normality assumptions, Y ∼ N(μ, σ 2 I), as in Berk et al. (2013). Moreover, this approach is applicable to a wide range of model settings, including homoscedastic or heteroscedastic linear regression models and binary regression models. Specifically, when fitting homoscedastic linear regression models to non-Gaussian data, the proposed confidence interval has the same form as (8) with K = K UPoSI , known as the uniform PoSI-constant. In comparison with the PoSI method, K UPoSI is computed as the 100(1 − α) th percentile of the maximum of Gaussian random variables, whereas the PoSI-constant, K PoSI , is that of the maximum of the ttype random variables as in (10). In addition, in constructing CI j· M (K UPoSI ), the estimator of the error variance, σ, is obtained based on the selected submodel. In contrast, this estimator is calculated based on the full model in formulating the PoSI confidence interval CI j· M (K PoSI ). In practice, the algorithm for computing K UPoSI is given in the supplementary material of Bachoc et al. (2019).
So far, the aforementioned methodologies have been proposed upon conditioning on all the regressors contained in the design matrix X. The implication is that the distribution of X is independent of any model parameter of interest, say θ, to be estimated. Such ancillarity assumption of the distribution of X with respect to θ can be mathematically expressed as where f X,Y , f Y |X and f X respectively denote the joint distribution of (X, Y ), conditional distribution of Y given X and marginal distribution of X.  (Section 4) give a nice account of the dependence of model coefficient on the distribution of X, especially when the true mean response vector μ is nonlinear in X and X is heteroscedastic. As such, the ancillarity assumption in (22) where the expectation is with respect to the joint distribution of (X, Y ). Kuchibhotla et al. (2018a) obtain the estimation error bounds, in terms of both 1 -and 2 -norms, for ( β M − ζ M ) and the linear representation uniformly over the set of all submodels in M all with sizes less than a prespecified value k. Moreover, the corresponding rates of convergence are also derived. Kuchibhotla et al. (2018b) construct both finite-and large-sample valid simultaneous confidence regions for ζ M . In practice, these confidence regions are computed based on high-dimensional central limit theorem (Chernozhukov et al. 2017) and multiplier bootstrap (Zhang & Cheng 2014). On the other hand, for a selected submodel M (a random quantity), Rinaldo et al. (2019) study the inference for ζ M , referred to as the linear projection parameter, which is the best linear predictor of Y using X M . The authors propose two constructions of simultaneous confidence regions for ζ M . Respectively, they are based on a combination of sample splitting and bootstrap, and sampling splitting and normal approximation. The asymptotic coverage rates of these two constructions are also investigated. Lei et al. (2018) extend the aforementioned assumption-lean framework to predictive inference and construct marginally valid prediction intervals for the predicted responses.
From the perspective of algorithmic stability, Zrnic & Jordan (2020) propose a construction of valid post-selection confidence intervals for ζ M using the generic form specified in (8) with an appropriate choice of the quantity K (other than the PoSI constant K PoSI (10)). Heuristically, a model selection procedure M n is stable if given two realizations of data, the probability that two submodels selected by M n are different is small. Their method is based on the logic that for a stable model selection procedure, the submodel selected is almost independent of the data. As a consequence, it is reasonable to conduct both model selection and valid statistical inference on the same data. In other words, a more stable model selection procedure leads to confidence intervals that are shorter, less variable in terms of lengths, and closer to the confidence intervals obtained as if the model was fixed a priori. Guided by this, given a (η, τ, ν)-stable model selection procedure M n (ibid., Definition 2) and to achieve (9), a valid 100(δ + τ + ν)% post-selection confidence interval (8) is obtained as follows: (i) when σ is known, . Moreover, they also provide stabilized modifications of the LASSO, marginal screening and forward stepwise selection procedures. In comparison with the PoSI approach discussed in Section 6.1, their approach bypasses the computation of the PoSI constant. As such, it is more computationally efficient.
EPoSI theories: Besides the aforementioned advance in post-model-selection inference, the exact post-selection inference (EPoSI) and EPoSI for sequential regression (sequential EPoSI), respectively discussed in Sections 6.2 and 6.3, have also been further studied. Two main assumptions of the existing frameworks are that (i) the variance of responses, σ 2 , is known, and (ii) the response Y follows a Gaussian distribution. To address the first assumption,  apply the EPoSI approach to square-root LASSO (Belloni et al. 2011) for inference on selected submodel after model selection, where an estimate for σ 2 is derived based on the square-root LASSO. Tibshirani et al. (2018) propose an alternative solution via constructing a computationally efficient bootstrapped version of the truncated normal statistics in (16) which does not depend on σ 2 . To remove the second assumption on Gaussianity of Y , Tian & Taylor (2017) and Tibshirani et al. (2018) respectively examine large-sample conditional framework of the EPoSI and sequential EPoSI. Specifically, both work show that under certain conditions on the distribution of Y , selection procedures and unknown regression coefficients, the pivotal quantity in (17) and its counterpart for sequential EPoSI converge (n → ∞, p constant) to the uniform distribution so that subsequent construction of post-selection confidence interval can be conducted in the same fashion. Taylor & Tibshirani (2018) further explore the asymptotic aspects of EPoSI (assuming p is fixed) and show that this framework can be generalized for statistical inference of a large class of 1 -penalized regression models, including generalized linear models, Cox's proportional hazards model, and the graphical LASSO (Friedman et al. 2008). Zhao, Small & Ertefaie (2021) employ the EPoSI approach in a two-stage proposal for solving effect modification problem and demonstrate theoretically and via simulations that this method is asymptotically valid. Hyun et al. (2018) extend the framework of EPoSI to the so-called generalized LASSO estimators with specific examples of fused LASSO, trend filtering and graph fused LASSO and useful application to changepoint detection problem.
EPoSI and its extension: Apart from the above theoretical development of the EPoSI framework, progress has been made to extend this idea to several important problems in statistics, including change point detection, graphs, clustering, and regression trees, some of which we review below. Hyun et al. (2021) study post-detection inference of change point with the following setup, where the mean vector θ = (θ 1 , ..., θ n ) follows a piecewise constant structure. Specifically, for location indices where t ∈ {0} ∪ Z + is the number of change points. The goal is to use a data-dependent method to test whether there is a change in the mean at these locations. Upon characterizing the changepoint detection (using methods such as binary segmentation, wild binary segmentation, circular segmentation and fused LASSO) as polyhedral selection events in (14), Hyun et al. (2021) apply the EPoSI framework and obtain p-values for the hypothesis testing of interest. Based on (24) and using changepoint detection methods such as binary segmentation, 0 segmentation and fused LASSO, Jewell et al. (2021) propose an EPoSI-based construction that conditions on less information, resulting in tests with improved powers. On the other hand, Mehrizi & Chenouri (2021) study a variant of the change point detection problem where the underlying mean vector is a piecewise polynomial function. Under this setting, change points identified by a filtering algorithm forms a polyhedral set. Thus, the EPoSI scheme can be used to obtain p-values and confidence intervals. In addition, two extensions with less conditioning requirements, respectively conditioning on only the target change point, and only the target change point and its adjacent neighbors being in the selected change points set, are proposed to produce shorter confidence intervals. Chen et al. (2021) present an EPoSI-based framework for testing the significance of a difference in the means of two connected components obtained from the graph fused LASSO. Their method conditions on less information and is shown to be more powerful than the method of Hyun et al. (2018). In clustering, Gao et al. (2021) consider testing for a difference in means of a pair of clusters identified via a data-dependent procedure. Upon showing that sample splitting is not a valid procedure in this case, the authors propose a method that conditions on the selected clusters and as opposed to (16), a truncated chisquared counterpart is obtained in the computation of p-values. Neufeld et al. (2021) apply EPoSI on inference associated with the Classification and Regression Tree algorithm. Specifically, the authors obtain EPoSI-based p-values for testing a difference in the mean response between a pair of terminal nodes and confidence interval for the mean response within a single terminal.

Statistical inference for population-based target: bias correction and sampling techniques
In this section, our goal is to discuss another line of work on inference about the population-based target β 0 , even though these are not considered as postmodel-selection methods. We review three methods, bias-correction, sampling techniques, and an optimization perspective for constructing p-values and confidence regions for β 0 under the framework of linear regression model (1) (unless otherwise specified).

Bias-correction
The general idea of the methods in this category is to (asymptotically) remove or quantify the bias of a regularized estimator, such as the LASSO or ridge estimator, in estimating β 0 . Consequently, the distribution of the bias-corrected estimator can be approximated by a tractable distribution, which then enables us to construct p-values for testing the hypotheses H j : β 0 j = 0, for j = 1, ..., p, and confidence intervals for β 0 j 's.

De-biasing LASSO:
A well-known bias-correction technique is the de-biasing LASSO approach pioneered by Javanmard & Montanari (2014b) and , where the initial scaled LASSO estimator undergoes a biascorrection stage, leading to a so-called low-dimensional projection estimator (LDPE). The resulting corrected estimator thus becomes de-sparsified but asymptotically unbiased with a limiting Gaussian distribution. This idea is further studied by van der Geer et al. (2014) and Javanmard & Montanari (2014a). The general form of a de-biasing LASSO estimator is given by where β LASSO (λ) is the LASSO estimator with tuning parameter λ in (6), and Θ is a p × p matrix which has different constructions according to the aforementioned works. Assuming that the inverse of population variance-covariance matrix is sparse, one example of such construction is Θ = Σ −1 , where Σ = X X/n is the sample variance-covariance matrix, assuming the existence of its inverse. In high-dimensional settings, Θ is obtained using the graphical LASSO (Friedman et al. 2008). By relaxing the sparsity assumption on inverse of the covariance, another example of Θ is provided by Javanmard & Montanari (2014a), which is obtained by minimizing the error term Δ n and the variance of the Gaussian term Z n in (25) below. Under different assumptions and constructions of Θ, and appropriate choice of a sequence of tuning parameters λ = λ n , a uniform asymptotic normality result is established as follows, where Z n ∼ N (0, σ 2 Θ Σ Θ ), and sup with s 0 = |{1 ≤ j ≤ p : β 0 j = 0}| being the size of the active set, which is commonly referred to the sparsity constant. One possible estimator for σ 2 is the scaled LASSO estimator . Therefore, similar to the standard maximum likelihood theory, (25) is used to construct asymptotically valid confidence intervals for β 0 and to obtain p-values for testing the hypotheses H j : β 0 j = 0, for j = 1, ..., p. To compare these aforementioned four proposals , Javanmard & Montanari 2014a,b, van der Geer et al. 2014, in addition to different construction of Θ, we note that both Javanmard & Montanari (2014a) and Javanmard & Montanari (2014b) also consider a random design matrix case, whereas the other two works focus on deterministic designs. Moreover, Javanmard & Montanari (2014b) require n ≥ s 0 log (p/s 0 ). In contrast, Javanmard & Montanari (2014a) and van der Geer et al. (2014) require n (s 0 log (p)) 2 . (2013) proposes a bias-correction method by focusing on correcting the bias of the ridge estimator. Define θ 0 by θ 0 = P X β 0 , where P X = X (XX ) − X, and " − " denotes the pseudo-inverse of a matrix. Consider the ridge estimator β ridge (λ) = arg min

Bias-corrected ridge estimator: Bühlmann
where λ is the tuning parameter, Σ = X X/n and I p is the p × p identity matrix. Here, p = p(n) → ∞ as n → ∞. Bühlmann argues that the ridge estimator (26) is a reasonable estimator for θ 0 since the bias of β ridge (λ) in estimating θ 0 can be controlled in a fashion as discussed below in (28). Therefore, the bias of β ridge (λ) in estimating the population-based regression coefficient β 0 now can be decomposed componentwise as Based on (27), Bühlmann (2013) then shows that |E( β ridge,j (λ)) − θ 0 j | is a minor term which can be controlled, while |θ 0 j −β 0 j |, is a major term, which can be estimated by a newly-proposed post-corrected estimator given in (29). Specifically, for the minor bias term, we have where σ min =0 ( Σ) is the minimal non-zero eigenvalue of the matrix Σ. In light of the bias-and-variance trade-off, λ is chosen to achieve a relatively small upper bound for |E( β ridge,j (λ)) − θ 0 j | while attaining a moderate variance of β ridge,j (λ). Now, to address the major bias term |θ 0 j − β 0 j |, it is worthwhile to note that Next, θ 0 j is estimated by β ridge,j (λ) and one possible initial estimator for β 0 k in the second summand is the LASSO estimator, β LASSO,k (λ). This leads to the post-correction ridge estimator Under a sparsity condition that s 0 = |{1 ≤ j ≤ p : β 0 j = 0}| = o((n/ log p) ξ ) for some 0 < ξ < 1/2, and compatibility condition for Σ (van der Geer 2007), Bühlmann (2013) shows where Z ∼ N (0, I), U = (U 1 , ..., U p ) and |U j | ≤ U j := max Here, V is the variance-covariance matrix of the ridge estimator and κ is the parameter controlling sparsity. It is crucial to note that differing from the debiasing LASSO estimator approach, the bias term U in (30) does not vanish asymptotically. In fact, the upper bound of U is carried over to the construction of confidence intervals for β 0 and p-values for testing the hypotheses H j : β 0 j = 0, j = 1, ..., p. To be more specific, for example, the 100 and Φ(·) is the cumulative distribution function of the standard normal. On the other hand, the p-values for testing the hypotheses H j : β 0 j = 0 are given by Other bias correction techniques: Apart from the aforementioned research in the linear regression context, there has been further developments on this topic under different settings, some of which are presented below. Neykov et al. (2018) have proposed post-corrected parameter estimators and asymptotically valid confidence regions for parameters of interest in the context of estimating equations. The bias-correction step of this method is based on projecting an estimating equation under consideration onto a certain sparse direction, which is obtained via solving a tractable linear optimization problem. A Z-estimator (van der Vaart & Wellner 1996) for the parameter of interest is then obtained as the root of the projected estimating equation. Neykov et al. (2018) show that the resulting parameter estimator is consistent and asymptotically normal, hence facilitating the construction of a valid confidence region. This method is applicable to instrumental variable regression, graphical models, linear discriminant analysis and autoregressive models.
In the context of program evaluation and various causal inference applications, Belloni et al. (2015) have developed bias-correction methods targeting inference about causal effects. Specifically, using the absolute deviations (LAD) method for regression model with homoscedastic error, they construct uniformly valid confidence regions for the parameters of interest in the presence of a highdimensional nuisance parameter. In contrast with the foregoing discussions, the bias-correction step of their method is based on an orthogonal moment equation for the parameters of interest. It has been shown that the resulting LAD regression parameter estimator is uniformly asymptotically normal, hence can be used to construct an asymptotically valid confidence region for the parameters of interest (Belloni et al. 2014, Belloni, Chernozhukov & Wei 2013.

Re-sampling
The re-sampling techniques such as data-splitting and bootstrapping are also used to perform valid inference for the population-based regression coefficient β 0 , which we review in this section.

Data-splitting:
The main idea behind this method is aligned with Fisher's view discussed in the Introduction that the same data cannot be used for exploration and validation for statistical modeling. As such, a portion of the data is only used for model selection while the remaining portion is used to perform inference based on that selected submodel. Wasserman & Roeder (2009) propose a data splitting procedure, also referred to as screen and clean, in which the data are randomly divided into three parts, D 1 , D 2 and D 3 , of approximately equal sizes. For a given tuning parameter λ, the LASSO is used to obtain an active set based on D 1 . Then cross validation based on D 2 is used to select an optimal tuning parameter, yielding an estimated active set. Hypothesis testing for the regression coefficients corresponding to the estimated active set is then performed using the ordinary least-squares based on D 3 . This procedure produces one p-value for testing the effect of every selected covariate.
The resulting p-values in Wasserman & Roeder (2009) are sensitive to the randomness due to the data splitting, which can be a cause of concern when the sample size is small. To address the issue, Meinshausen et al. (2009) propose a so-called multi sample-splitting method, in which the data are randomly split into two parts B times, obtaining (D b 1 , D b 2 ), b = 1, ..., B. This procedure results in B p-values for testing H 0 : β 0 j = 0, j = 1, ..., p. They then adopt a quantile approach to aggregate these p-values and obtain a single p-value corresponding to each covariate, as described in Algorithm 2. For a selected submodel M , having assumed the sure screening property, that is lim n→∞ P(M true ⊆ M ) = 1, and the sparsity property, that is | M | < n/2, it is shown that the aggregated p-values are capable of (asymptotically) controlling the family-wise error rate and false discovery rate. Model selection procedures satisfying these two requirements include LASSO, L 2 boosting (Friedman 2001, Bühlmann 2006, orthogonal matching pursuit (Tropp & Gilbert 2007) and sure independent screening (Fan & Lv 2008). One key difference between the single and multi samplesplitting described above is that the former assigns p-values to only the selected covariates while the later assigns p-values to all the covariates. where γ ∈ (0, 1) and γ min is a lower bound for γ, usually 0.05. Here, Q j (γ) is defined by Q j (γ) = min 1, qγ p b adj,j /γ, b = 1, ..., B , j = 1, ..., p, where qγ (·) denotes the empirical γ-quantile function.
Khalili & Vidyashankar (2018) propose a multi sample-splitting approach which assigns p-values only to the selected covariates. Their framework is applicable to linear, generalized linear and finite mixture models, and is shown to attain asymptotic control of family-wise error rate at a given significance level.
To perform statistical inference on a low-dimensional parameter of interest in the presence of a high-dimensional nuisance parameter, Chernozhukov et al. (2018) propose an asymptotically and uniformly valid inference method that is based on cross-fitting. Specifically, in this method, the original data is first randomly partitioned into K folds with approximately equal sizes, indexed by I j , j = 1, ..., K, and let I c j = {1, ..., n} \ I j . For each j = 1, ..., K, an estimator for the nuisance parameter is obtained using an existing machine learning method, such as the LASSO or boosting, based on the subset of data indexed by I c j . Then, for each j = 1, ..., K, an estimator for the parameter of interest is obtained via solving a Neyman orthogonal score equation upon plugging in the estimator for the nuisance parameter. A final estimator for the parameter of interest, referred to as the debiased machine learning (DML) estimator, is obtained by averaging over all the estimators across all j = 1, ..., K. Chernozhukov et al. (2018) theoretically prove that the DML estimator is asymptotically normal, which can then be used to construct asymptotically and uniformly valid confidence intervals for a parameter of interest. Though not being a paradigm of re-sampling procedures, as a closely related work, Belloni, Chernozhukov & Hansen (2013) consider a similar problem by focusing on statistical inference for a treatment effect in the presence of high-dimensional nuisance covariates x j , j ∈ {1, ..., p}, in a partially linear regression model. Their development involves two variable selection steps using a version of the LASSO (called feasible LASSO) defined in Belloni et al. (2012): (i) selecting nuisance variables indexed by M 1 ⊆ {1, ..., p} via regressing the treatment effect on all the x j , and (ii) selecting nuisance variables indexed by M 2 ⊆ {1, ..., p} via regressing the response on all the x j . Then, an estimator for the treatment effect is obtained by regressing the response on the treatment effect and nuisance covariates indexed by M 1 ∪ M 2 . As such, this estimator is referred to as the post-double-selection (PDS) estimator. Similar to Chernozhukov et al. (2018), they show that the PDS estimator is asymptotically normal, leading to the construction of asymptotically and uniformly valid confidence interval for the parameter of interest, that is the treatment effect.
Bootstrapping: The idea of this approach is that the distribution of a penalized regression estimator, such as the LASSO, AdaLASSO estimator in (6) or de-biasing LASSO estimator , can be well approximated by their bootstrap counterparts (see (33) below). Confidence intervals can then be constructed based on the distribution of the bootstrap estimators. More specifically, let where β LASSO (λ n ) is the LASSO estimator in (6). The goal is to approximate the distribution of T n . To accomplish this, Chatterjee & Lahiri (2011) propose a residual-based bootstrap LASSO estimator, described in Algorithm 3 below. The bootstrap-based confidence region for β 0 is given by where t n (α) is the α th quantile of the empirical distribution of ||T b Boot ||, b = 1, ..., B . Here, T b Boot is defined by where β b Boot (λ n ) is the modified residual-based bootstrap LASSO estimator and β LASSO (λ n ) is the thresholded LASSO estimator, defined in Algorithm 3 1 .
Under certain conditions on the design matrix X, the tuning parameter λ n and error term in (1), it is shown that as n → ∞, where T Boot is the generic random variable representing ||T b Boot ||, b = 1, ..., B , B(·) is the Borel σ-filed and ρ(·, ·) is the Lévy-Prohorov metric over the set of all probability measures on the space (R p , B(R p )). Using (33), it is then shown that the bootstrap-based confidence region R Boot (α) in (31) is a uniformly asymptotically valid confidence region for β 0 , that is
Chatterjee & Lahiri (2013) apply the bootstrap technique based on the AdaLASSO estimator to approximate the distribution of a studentized version of T n , denoted by R n = T n / σ n , where σ 2 n is the sample variance of the AdaLASSO-based residuals. They show analytically and via simulations that the bootstrap-based confidence intervals so obtained has better coverage compared to those obtained from the oracle-based normal approximation to the distribution of T n , even when the regression parameter dimension is unbounded. Liu & Yu (2013) propose a residual-based bootstrap approach, where residuals are computed in a similar fashion as in Step 2 of Algorithm 3, where β LASSO (λ n ) is replaced by a modified least-squares or ridge coefficient estimator, based on a submodel M selected by LASSO. The modified least-squares estimator is based on thresholding the singular values of the matrix X M X M /n. The reason for this modification is that the matrix inversion required in the OLS estimator (4) may not be stable when the smallest nonzero eigenvalue of X M X M /n is close to 0. Similar to Chatterjee & Lahiri (2011), the asymptotically valid confidence region R Boot (α) in (31) can then be constructed.
On the other hand, focusing on valid statistical inference for an individual or a group of entries of β 0 , Dezeure et al. (2017b) study the application of bootstrap procedures to the de-biasing LASSO estimator. They demonstrate that bootstrapping the de-biasing LASSO estimator is capable of producing asymptotically valid: p-values for testing H 0 : β 0 j = 0, j ∈ G ⊆ {1, ..., p}; and simultaneous confidence region for β 0 j , j ∈ G ⊆ {1, ..., p}. Specifically, three versions of bootstrap methods are presented, namely the residual bootstrap, multiplier wild bootstrap, and a so-called paired bootstrap method. For each of these procedures, they show the consistency of the resulting bootstrapped estimator in the sense of (33).

Optimization Perspective
Apart from the bias-correction and re-sampling techniques, methods from optimization theory, in particular techniques associated with variational inequalities (Hartman & Stampacchia 1966) and normal maps (Robinson 1992(Robinson , 1993, have been used to provide valid asymptotic confidence intervals for β 0 . Generally speaking, a local minimizer of an objective function over a compact and convex set S in R n is also a solution to a so-called variational inequality involving the gradient of the objective function. Moreover, the variational inequality can be equivalently written as a function, called a normal map, which is induced by the objective function and the set S. These two mathematical objects can then be connected by the fact that a solution in S to the variational inequality is a root to the associated normal map. Assuming that the design matrix X is random and the dimension p is fixed, Lu et al. (2017) consider solving the so-called random-design population version of the LASSO problem where the solution to (34), denoted by (β 0 ,β), is called the population penalized (LASSO) parameter. By transforming (34) to its corresponding variational inequality and normal map formulations, the authors propose a four-step method to construct asymptotically valid confidence interval for (β 0 ,β) based on the LASSO estimator β LASSO (λ): Step 1: Transform the minimization problem in (34) to its normal maps; Step 2: Obtain the asymptotic distributions for the solutions to the normal maps; Step 3: Construct individual and simultaneous confidence intervals for the solutions obtained in Step 2; Step 4: Assuming the linear regression model in (1) is the true model, convert the confidence intervals obtained in Step 3 to the ones for the population-based target β 0 .

Simulation study
In this section, we compare various post-selection confidence intervals through simulated data. In light of the simulation results by Dezeure et al. (2015) for the population-based targets, here our focal point is a comparison between the confidence intervals for the projection-based targets using the PoSI and EPoSI approaches, though we also include the naïve and Scheffé's confidence intervals. Details of data generation are presented in Section 8.1. Assessing metrics are defined in Section 8.2. Specifics pertinent to method, code and implementation are described in Section 8.3, followed by an analysis of simulation results in Section 8.4. Since the PoSI approach is computationally expensive for p > 20, we design simulations for p ≤ 20 so that the PoSI and EPoSI confidence intervals can be compared.

Data generation
Design matrices X are simulated with n = 40 and 60, and p = p(n) = n 2/3 + 2, where · denotes the ceiling function. Thus, the number of covariates are respectively 14 and 18 for n = 40 and 60. With these dimensions, five types of design matrix with the following variance-covariance structures are considered: low-dimensional setting and is formed by putting together p orthonormal n-tuples obtained by first draw a set of i.i.d. standard Gaussian n-tuples and then applying the Gram-Schmidt procedure. 5. Design matrix from a real data set: we include a design matrix (denoted by X Rl ) from the standardized diabetes data from Efron et al. (2004). X Rl has n = 442 observations and p = 10 covariates. The 10 covariates are the first 10 columns of the diabetes data, namely age (age), sex (sex), body mass index (BMI), average blood pressure (map) and six blood serum measurements (tc, ldl, hdl, tch, ltg, glu).
Given a design matrix X according to any of the scenarios 1-5, it remains fixed throughout the simulations. For the response vector Y , we noticed in Section 5.2 that the construction of the PoSI and EPoSI confidence intervals do not require the first order correctness condition E(Y ) = μ = Xβ 0 . In our simulations, yet, we assume this condition for simplicity. In other words, the n × 1 dimensional response vectors Y are generated from the linear regression model (1). Given a design matrix X, we generate 10000 copies of Y , randomly from four linear regression models with error variance σ 2 = 2 and the regression coefficients specified in TABLE 2.

Assessing metrics
We set the nominal coverage probability for the confidence intervals to be 1 − α = 0.95. To compare various post-selection confidence intervals, we consider two assessing metrics: the empirical average coverage probability (CP j·M ), and the empirical average length of confidence intervals (L j·M ), conditioning on a submodel M selected by M n . In particular, where 1{·} is the indicator function, CI i j·M is the confidence interval for the j th projection-based regression coefficient b j·M within the submodel M corresponding to the i th random sample, and Length{·} is the length of the confidence interval. In case of the EPoSI confidence intervals, the above quantities are also computed by conditioning on both a selected submodel M and the sign vector s of its associated regression parameter estimate(s).

Method, code and implementation
To satisfy the polyhedral selection property required by the EPoSI method, in our simulations we use the LASSO with a fixed tuning parameter λ as the model selector. The λ is chosen according to the strategy outlined in Section 4 of Negahban et al. (2012), who propose λ = 2E(||X || ∞ ) so that convergence rate for errors of the LASSO estimate(s) can be controlled. Given a design matrix X, we compute an empirical version of λ by randomly generating 1000 samples of from the multivariate normal N (0, σ 2 I), where σ 2 is the residual mean square obtained from the full model.
We compare the following five post-selection confidence intervals: Naïve, Scheffé, PoSI, EPoSI1 and EPoSI2, where EPOSI1 is the EPoSI confidence interval constructed by conditioning on a selected submodel M , and EPoSI2 is the one derived by conditioning on both a selected submodel M and the sign vector s of its associated regression parameter estimate(s). We use the functions PoSI and summary.PoSI from the R package PoSI  to compute the PoSI-constant and corresponding confidence intervals. For the EPoSI confidence intervals, we use the functions selection.int, grid.search and myptruncnorm 2 .

Analysis of simulation results
The simulation results for Model I and the design matrices X Eq , X Ec and X Rl , with dimension n = 40 and p = 14, are respectively given in TABLE 3, 4 and 5. Each table contains the empirical average coverage probability (CP j· M ), and the empirical average length of confidence intervals (L j· M ), corresponding to the projection-based regression coefficients b j· M , j = 1, 2 and 3, for the three most frequently submodels selected by the LASSO that contain x j (sorted by the empirical model selection frequency in descending order). Simulation results corresponding to (1) From TABLE 3 to 5, we have observed a mix of under-and satisfactory coverage of naïve confidence intervals for projected regression coefficients. Such Table 3 CP j· M and L j· M , j = 1, 2 and 3, of post-selection confidence intervals based on Model I and X Eq . phenomena depend on the types of design matrices under consideration. For example, in the case of X Rl , the coverage probabilities for all the three projected regression coefficients b j· M , j = 1, 2 and 3, fall below the desired nominal cov-  Leeb et al. (2015), indicating that although naïve confidence intervals in general fail to achieve desired nominal coverage probabilities for the population-based regression coefficients (also reported in TABLE 1), both under-and satisfactory coverage can be achieved for the projected regression coefficients. In addition, the extent of the under-coverage is moderate, depending on the specific structures of the design matrices. In contrast, the Scheffé and PoSI confidence intervals consistently achieve over-coverage for all the projected regression coefficients, under all the design matrices considered here. Focusing on the EPoSI confidence intervals, empirical results demonstrate that similar to the naïve confidence intervals, a mix of underand satisfactory coverage exists for both EPoSI1 and EPoSI2. For example, although over-coverage is attained by EPoSI2 for all the projected coefficients in the case of X Rl , under-coverage is seen under the other two design matrices (CP 2·{1,2,3,11} in the case of X Eq for instance). This phenomenon also holds for EPoSI1. Furthermore, we can see that the coverage probabilities of EPoSI1 are aligned with those attained by the naïve confidence intervals.
In terms of the empirical average length (L j· M ), from TABLE 3 to 5 we observe that the Scheffé confidence intervals are always wider than the PoSI confidence intervals. In fact, the ratio of the empirical average interval length of these two confidence intervals is between 1.25 and 1.32, under the three design matrices X Eq , X Ec and X Rl . This observation agrees with the fact that the Scheffé's confidence intervals are more conservative (wider) than the PoSI confidence intervals, as discussed in Section 6.1.
Moreover, it is worthwhile to notice that conditioning on a selected submodel only, the EPoSI confidence intervals (EPoSI1) are always shorter than those obtained by conditioning on both a selected model and sign vector of corresponding point estimator(s) (EPoSI2). Furthermore, EPoSI1 are close to (or even shorter than) the naïve confidence intervals. For example, in the case of X Eq as reported in TABLE 3, the average length L 1·{1,2,3} of EPoSI1, being 1.51, is shorter than that of naïve confidence interval, being 1.56. But in the case of X Ec as reported in TABLE 4, L 1·{1,2,3,5} of EPoSI1 is wider. On the other hand, EPoSI2 are always wider than the naïve confidence intervals.
Comparing the EPoSI confidence intervals with those based on the PoSI and Scheffé, we have observed that under all the models and design matrices considered here, EPoSI1 are always shorter. In contrast, EPoSI2 can sometimes be surprisingly wider. As an example, for the projected regression coefficient b 1·M , in the case of X Rl (TABLE 5), when conditioning on the submodel {1, 3} and sign vector of its associated parameter point estimator(s), the ratio of empirical average interval length of EPoSI2 over the PoSI and Scheffé confidence intervals are approximately 2.07 and 1.63, respectively. This phenomenon is consistent with the explanations in Lee et al. (2016): when the signal is relatively week (as in Model I), the truncated Gaussian variable, η j Y in (16), is near the boundary points of truncated intervals, leading to wider confidence intervals.
Yet, we wish to emphasize that the competition between the PoSI confidence intervals and EPoSI2 does depend on other factors as well, such as the design matrix and a particular selected submodel. For instance, in the case of X Eq and as seen in TABLE 3, the average length L 2·{1,2,3} of the PoSI confidence interval, being 3.70, is larger than that of the EPoSI2, being 1.76.
In our simulation study, we have also observed that the EPoSI confidence intervals, EPoSI1 and EPoSI2, sometimes have infinite lengths. Therefore, the results corresponding to these intervals reported in TABLE 3 to 5 are only associated with the finite EPoSI confidence intervals. To be more specific, in TABLE 6 to 8, under Model I and the design matrices X Eq , X Ec and X Rl , we report the percentages of the time that the EPoSI confidence intervals have finite lengths, out of the total number of such intervals constructed for each selected submodel. As we can see from these three tables, the percentages of finite EPoSI1 and EPoSI2 differ dramatically according to the types of design matrix. In our simulations, both EPoSI1 and EPoSI2 in the case of X Ec have the lowest such percentages. For example, for the projected regression coefficient b 1·{1,2} , only 75.3% of the EPoSI1 are finite, while almost all the EPoSI2 are infinite. Moreover, we also see that the percentages of finite EPoSI1 are always higher than those of EPoSI2. In a recent paper, Kivaranovic & Leeb (2021) show that the expected length of EPoSI2 interval is infinite, and they have also derived necessary and sufficient conditions under which the expected length of EPoSI1 interval is infinite.

Conclusion
In this section, we summarize our observations and conclusions on various constructions of the confidence intervals for the projection-and the populationbased regression coefficients.

Projection-based regression coefficients
1. Both EPoSI1 and EPoSI2 can have infinite lengths depending on the type of the design matrix. Moreover, EPoSI1 is more likely to be of finite length than EPoSI2. This observation has also been discussed analytically by Kivaranovic & Leeb (2021). In view of this observation, we divide our discussions below into two groups, which are (i) either EPoSI1 or EPoSI2 is of finite length; (ii) neither is of finite length. The point 2 below pertains to the former case, while point 3 is about the latter. 2. p is relatively small (recommended cut-off point by Berk et al. (2013) is 20): In this scenario, it is practical to apply both the PoSI and the EPoSI methods. Considering coverage probabilities, under-coverage for the projected regression coefficients exists for both EPoSI1 and EPoSI2. In comparison, the PoSI confidence intervals consistently achieve the desired nominal coverage probability. In terms of the length of confidence intervals, the EPoSI1 are the shortest among all the competing confidence intervals and they can be even shorter than the naïve confidence intervals. On the other hand, the competition between the EPoSI2 and PoSI depends on several factors, one of which is the type of the design matrix. In some cases, the EPoSI2 can be wider than the PoSI confidence intervals. For p relatively large: In this case, the computational cost of the PoSI approach can be high. Similarly, since 2 | M | calculations are required to obtain the EPoSI1, this interval is not recommended either due to heavy computational loads if p is too large. Therefore, in high-dimensional data where p is large, the EPoSI2 is more computationally feasible among the competing post-selection confidence intervals. 3. When neither EPoSI1 nor EPoSI2 is of finite length: In view of the computational cost associated with the PoSI confidence intervals in this case and having EPoSI1 and EPoSI2 of infinite lengths, we conclude that more research and new ideas are required in this case.

Population-based regression coefficients
Here, we summarize the conclusions made by Dezeure et al. (2015), who did a thorough study on the post-selection inference for the population-based targets. The authors compare the empirical coverage probabilities and lengths of confidence intervals constructed using the methods proposed by , Javanmard & Montanari (2014a), Bühlmann (2013), Meinshausen et al. (2009), Liu & Yu (2013 and Chatterjee & Lahiri (2013) through an empirical study, and make the following observations: (1) the empirical coverage probability of confidence intervals for the true zero regression coefficients is satisfactory for all the methods. Moreover, the empirical coverage probability for the true non-zero coefficients is in accord with the p-values for the corresponding hypothesis tests; (2) the empirical coverage probability for the smallest true non-zero coefficient (i.e. the weakest signal) is very low for all the methods; (3) confidence intervals constructed using the multi sample-splitting (Meinshausen et al. 2009) and ridge-based bias correction (Bühlmann 2013) approaches are in general wider than those derived from the other methods.