Valid post-selection inference

Richard Berk; Lawrence Brown; Andreas Buja; Kai Zhang; Linda Zhao

doi:10.1214/12-AOS1077

Ann. Statist. 41(2): 802-837 (April 2013). DOI: 10.1214/12-AOS1077

Abstract

It is common practice in statistical data analysis to perform data-driven variable selection and derive statistical inference from the resulting model. Such inference enjoys none of the guarantees that classical statistical theory provides for tests and confidence intervals when the model has been chosen a priori. We propose to produce valid “post-selection inference” by reducing the problem to one of simultaneous inference and hence suitably widening conventional confidence and retention intervals. Simultaneity is required for all linear functions that arise as coefficient estimates in all submodels. By purchasing “simultaneity insurance” for all possible submodels, the resulting post-selection inference is rendered universally valid under all possible model selection procedures. This inference is therefore generally conservative for particular selection procedures, but it is always less conservative than full Scheffé protection. Importantly, it does not depend on the truth of the selected submodel, and hence it produces valid inference even in wrong models. We describe the structure of the simultaneous inference problem and give some asymptotic results.
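The widening described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' implementation, and it rests on simplifying assumptions that are not in the abstract: the error variance is treated as known (so the statistics are z-type rather than t-type), every nonempty submodel of a small design matrix is enumerated by brute force, and the function names `posi_directions` and `posi_constant` are hypothetical. For each submodel M and each predictor j in M, the coefficient estimate is a linear function of the response. The sketch collects the corresponding unit vectors, simulates the maximum absolute standardized statistic over all of them, and compares its upper quantile with the single-coefficient quantile and with the Scheffé-type bound over the whole column space.

```python
import itertools
import numpy as np


def posi_directions(X):
    """Unit vectors l_{j.M} for every nonempty submodel M and every j in M.

    Row j of pinv(X_M) is proportional to x_j residualized on the other
    columns of M, so normalizing the rows gives the directions whose inner
    product with the response is the standardized coefficient estimate.
    """
    n, p = X.shape
    blocks = []
    for size in range(1, p + 1):
        for M in itertools.combinations(range(p), size):
            rows = np.linalg.pinv(X[:, list(M)])        # shape (|M|, n)
            rows = rows / np.linalg.norm(rows, axis=1, keepdims=True)
            blocks.append(rows)
    return np.vstack(blocks)                            # shape (p * 2**(p-1), n)


def posi_constant(X, alpha=0.05, n_draws=20000, seed=0):
    """Simulated (1 - alpha) quantiles, assuming known error variance.

    Returns three constants computed from the same draws:
      naive   - a single fixed coefficient (no selection adjustment)
      posi    - simultaneous over all coefficients in all submodels
      scheffe - simultaneous over all directions in the column space of X
    """
    rng = np.random.default_rng(seed)
    L = posi_directions(X)
    Q, _ = np.linalg.qr(X)            # orthonormal basis of col(X), shape (n, p)
    Lq = L @ Q                        # directions in that basis, still unit norm
    Z = rng.standard_normal((n_draws, X.shape[1]))      # noise projected on col(X)
    abs_stats = np.abs(Z @ Lq.T)      # |l' Z| for every direction and draw
    level = 1.0 - alpha
    return {
        "naive": float(np.quantile(abs_stats[:, 0], level)),
        "posi": float(np.quantile(abs_stats.max(axis=1), level)),
        "scheffe": float(np.quantile(np.linalg.norm(Z, axis=1), level)),
    }


if __name__ == "__main__":
    X_demo = np.random.default_rng(1).standard_normal((30, 4))
    print(posi_constant(X_demo))
```

With these assumptions the simulated constant always falls between the unadjusted quantile and the Scheffé-type bound, which mirrors the abstract's statement that the approach is conservative yet less conservative than full Scheffé protection. The brute-force enumeration grows as p·2^(p−1) directions, so the sketch is only practical for a handful of predictors.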

Richard Berk, Lawrence Brown, Andreas Buja, Kai Zhang, and Linda Zhao "Valid post-selection inference," The Annals of Statistics 41(2), 802-837, (April 2013). https://doi.org/10.1214/12-AOS1077
