Annals of Statistics
- Ann. Statist.
- Volume 44, Number 3 (2016), 907-927.
Exact post-selection inference, with application to the lasso
Jason D. Lee, Dennis L. Sun, Yuekai Sun, and Jonathan E. Taylor
Full-text: Open access
Abstract
We develop a general approach to valid inference after model selection. At the core of our framework is a result that characterizes the distribution of a post-selection estimator conditioned on the selection event. We specialize the approach to model selection by the lasso to form valid confidence intervals for the selected coefficients and test whether all relevant variables have been included in the model.
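To make the idea of conditioning on the selection event concrete, here is a minimal sketch. It assumes a setting the abstract does not spell out: a Gaussian response y with known covariance sigma^2 I, and a selection event that can be written as a set of affine constraints {A y <= b}. Under those assumptions, a linear contrast eta'y, conditioned on the selection event and on the part of y orthogonal to eta, follows a Gaussian truncated to an interval, and its CDF evaluated at the observed value gives a pivot. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def truncation_limits(A, b, y, eta):
    """Interval that {A y <= b} imposes on eta'y, holding fixed the part of y
    orthogonal to eta. Assumes Cov(y) = sigma^2 * I, so c = eta / ||eta||^2."""
    eta_sq = dot(eta, eta)
    c = [e / eta_sq for e in eta]
    eta_y = dot(eta, y)
    z = [yi - ci * eta_y for yi, ci in zip(y, c)]  # residual, fixed by conditioning
    v_lo, v_hi = -math.inf, math.inf
    for row, bj in zip(A, b):
        ac, az = dot(row, c), dot(row, z)
        if ac < 0:
            v_lo = max(v_lo, (bj - az) / ac)   # constraint gives a lower bound
        elif ac > 0:
            v_hi = min(v_hi, (bj - az) / ac)   # constraint gives an upper bound
        # ac == 0: this constraint does not restrict eta'y
    return v_lo, v_hi


def truncated_gaussian_pivot(x, mean, sd, lo, hi):
    """CDF of N(mean, sd^2) truncated to [lo, hi], evaluated at x."""
    num = norm_cdf((x - mean) / sd) - norm_cdf((lo - mean) / sd)
    den = norm_cdf((hi - mean) / sd) - norm_cdf((lo - mean) / sd)
    return num / den


# Toy selection rule (hypothetical): coordinate 1 was "selected" because
# y1 >= y2 and y1 >= 1, written as A y <= b.
y = [2.0, 0.5]
A = [[-1.0, 1.0], [-1.0, 0.0]]
b = [0.0, -1.0]
eta = [1.0, 0.0]   # contrast of interest: the selected coordinate
sigma = 1.0        # assumed known noise level

lo, hi = truncation_limits(A, b, y, eta)
sd = sigma * math.sqrt(dot(eta, eta))
pivot = truncated_gaussian_pivot(dot(eta, y), 0.0, sd, lo, hi)
p_value = 1.0 - pivot   # one-sided test of mean 0 against a positive mean
print(lo, hi, round(p_value, 4))
```

Inverting the pivot in its `mean` argument, that is, finding the means at which it equals alpha/2 and 1 - alpha/2, would give a selection-adjusted confidence interval for the contrast. The plain CDF differences used here lose precision far in the tails, so a practical implementation would need a more careful numerical treatment.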
Article information
Source
Ann. Statist., Volume 44, Number 3 (2016), 907-927.
Dates
Received: January 2015
Revised: September 2015
First available in Project Euclid: 11 April 2016
Permanent link to this document
https://projecteuclid.org/euclid.aos/1460381681
Digital Object Identifier
doi:10.1214/15-AOS1371
Mathematical Reviews number (MathSciNet)
MR3485948
Zentralblatt MATH identifier
1341.62061
Subjects
Primary: 62F03 (Hypothesis testing), 62J07 (Ridge regression; shrinkage estimators)
Secondary: 62E15 (Exact distribution theory)
Keywords
Lasso; confidence interval; hypothesis test; model selection
Citation
Lee, Jason D.; Sun, Dennis L.; Sun, Yuekai; Taylor, Jonathan E. Exact post-selection inference, with application to the lasso. Ann. Statist. 44 (2016), no. 3, 907-927. doi:10.1214/15-AOS1371. https://projecteuclid.org/euclid.aos/1460381681

