Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 7 (2013), 3124–3169.
Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression
Hanzhong Liu and Bin Yu
Full-text: Open access
Abstract
We study the asymptotic properties of Lasso+mLS and Lasso+Ridge under the sparse high-dimensional linear regression model: Lasso selects predictors, and then modified Least Squares (mLS) or Ridge estimates their coefficients. First, we propose a valid inference procedure for parameter estimation based on the parametric residual bootstrap after Lasso+mLS and Lasso+Ridge. Second, we derive the asymptotic unbiasedness of Lasso+mLS and Lasso+Ridge. More specifically, we show that their biases decay at an exponential rate and that they can achieve the oracle convergence rate of $s/n$ (where $s$ is the number of nonzero regression coefficients and $n$ is the sample size) in mean squared error (MSE). Third, we show that Lasso+mLS and Lasso+Ridge are asymptotically normal. They have an oracle property in the sense that they can select the true predictors with probability converging to $1$, and the estimates of the nonzero parameters have the same asymptotic normal distribution that they would have if the zero parameters were known in advance. In fact, our analysis is not limited to using Lasso in the selection stage: it applies to any model selection criterion whose probability of selecting a wrong model decays at an exponential rate.
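The two-stage procedure described in the abstract can be sketched in Python using scikit-learn. This is a minimal illustration, not the paper's implementation: the plain least-squares refit below stands in for the paper's modified Least Squares (mLS, which adjusts LS for near-singular selected designs), and all function names, tuning parameters, and the bootstrap size are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

def lasso_then_refit(X, y, lasso_lam, ridge_lam=None):
    """Stage 1: Lasso selects a support. Stage 2: refit the selected
    coefficients by ordinary least squares (a stand-in for mLS) or,
    if ridge_lam is given, by Ridge (Lasso+Ridge)."""
    n, p = X.shape
    sel = Lasso(alpha=lasso_lam, fit_intercept=False).fit(X, y)
    support = np.flatnonzero(sel.coef_)   # indices of selected predictors
    beta = np.zeros(p)
    if support.size == 0:
        return beta, support
    if ridge_lam is None:
        refit = LinearRegression(fit_intercept=False)   # Lasso + LS refit
    else:
        refit = Ridge(alpha=ridge_lam, fit_intercept=False)  # Lasso + Ridge
    refit.fit(X[:, support], y)
    beta[support] = refit.coef_           # de-biased (unshrunk) estimates
    return beta, support

def residual_bootstrap(X, y, lasso_lam, B=200, seed=0):
    """Residual bootstrap after the two-stage fit: resample centered
    residuals, regenerate y*, and re-run the full two-stage procedure."""
    rng = np.random.default_rng(seed)
    beta_hat, _ = lasso_then_refit(X, y, lasso_lam)
    resid = y - X @ beta_hat
    resid -= resid.mean()                 # center the residuals
    draws = np.empty((B, X.shape[1]))
    for b in range(B):
        y_star = X @ beta_hat + rng.choice(resid, size=len(y), replace=True)
        draws[b], _ = lasso_then_refit(X, y_star, lasso_lam)
    return draws                          # bootstrap draws of the estimator
```

Percentile intervals computed column-wise from `draws` then give bootstrap confidence intervals for the individual coefficients, which is the kind of inference the paper validates.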
Article information
Source
Electron. J. Statist. Volume 7 (2013), 3124–3169.
Dates
First available in Project Euclid: 15 January 2014
Permanent link to this document
http://projecteuclid.org/euclid.ejs/1389795619
Digital Object Identifier
doi:10.1214/14-EJS875
Mathematical Reviews number (MathSciNet)
MR3151764
Zentralblatt MATH identifier
1281.62158
Subjects
Primary: 62F12: Asymptotic properties of estimators; 62F40: Bootstrap, jackknife and other resampling methods
Secondary: 62J07: Ridge regression; shrinkage estimators
Keywords
Lasso; irrepresentable condition; Lasso+mLS and Lasso+Ridge; sparsity; asymptotic unbiasedness; asymptotic normality; residual bootstrap
Citation
Liu, Hanzhong; Yu, Bin. Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electron. J. Statist. 7 (2013), 3124–3169. doi:10.1214/14-EJS875. http://projecteuclid.org/euclid.ejs/1389795619.
References
- [1] Javanmard, A. and Montanari, A. (2013). Model selection for high-dimensional regression under the generalized irrepresentability condition. http://arxiv.org/abs/1305.0355.
- [2] Bach, F. (2008). Bolasso: Model consistent Lasso estimation through the bootstrap. In Proc. 25th Int. Conf. Machine Learning, 33–40.
- [3] Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547. MR3037163. doi:10.3150/11-BEJ410.
- [4] Belloni, A., Chernozhukov, V. and Hansen, C. (2011). Inference for high-dimensional sparse econometric models. http://arxiv.org/abs/1201.0220.
- [5] Belloni, A., Chernozhukov, V. and Hansen, C. (2011). Inference on treatment effects after selection amongst high-dimensional controls. http://arxiv.org/abs/1201.0224.
- [6] Bickel, P. J. and van Zwet, W. R. (1978). Asymptotic expansions for the power of distribution free tests in the two-sample problem. Annals of Statistics 6, 937–1004. MR499567. doi:10.1214/aos/1176344305.
- [7] Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Annals of Statistics 9, 1196–1217. MR630103. doi:10.1214/aos/1176345637.
- [8] Bickel, P. J. and Freedman, D. A. (1983). Bootstrapping regression models with many parameters. In Festschrift for Erich L. Lehmann (P. Bickel, K. Doksum, and J. Hodges, Jr., eds.) 28–48. Wadsworth, Belmont, Calif. MR689736.
- [9] Bickel, P. J., Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37, 1705–1732. MR2533469. doi:10.1214/08-AOS620.
- [10] Bunea, F. (2008). Honest variable selection in linear and logistic regression models via $l_1$ and $l_1+l_2$ penalization. Electronic Journal of Statistics 2, 1153–1194. MR2461898. doi:10.1214/08-EJS287.
- [11] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1, 169–194. MR2312149. doi:10.1214/07-EJS008.
- [12] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Annals of Statistics 35, 2312–2351. MR2382644. doi:10.1214/009053606000001523.
- [13] Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping Lasso estimators. Journal of the American Statistical Association 106, 608–625. MR2847974. doi:10.1198/jasa.2011.tm10159.
- [14] Chatterjee, A. and Lahiri, S. N. (2013). Rates of convergence of the adaptive Lasso estimators to the oracle distribution and higher order refinements by the bootstrap. Annals of Statistics (to appear). MR3113809. doi:10.1214/13-AOS1106.
- [15] Davison, A. C. and Hinkley, D. V. (1997). Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press. MR1478673.
- [16] del Barrio, E., Cuesta-Albertos, J. and Matran, C. (2000). Contributions of empirical and quantile processes to the asymptotic theory of goodness-of-fit tests. Test 9, 1–96.
- [17] Donoho, D., Elad, M. and Temlyakov, V. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Transactions on Information Theory 52, 6–18.
- [18] Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics 7, 1–26. MR515681. doi:10.1214/aos/1176344552.
- [19] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Annals of Statistics 32, 407–499. MR2060166. doi:10.1214/009053604000000067.
- [20] Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC, Boca Raton, FL. MR1270903.
- [21] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360. MR1946581. doi:10.1198/016214501753382273.
- [22] Fan, J. and Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B 70, 849–911. MR2530322. doi:10.1111/j.1467-9868.2008.00674.x.
- [23] Freedman, D. A. (1981). Bootstrapping regression models. Annals of Statistics 9, 1218–1228. MR630104. doi:10.1214/aos/1176345638.
- [24] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. The Annals of Applied Statistics 1, 302–332. MR2415737. doi:10.1214/07-AOAS131.
- [25] Fuchs, J. J. (2005). Recovery of exact sparse representations in the presence of noise. IEEE Transactions on Information Theory 51, 3601–3608.
- [26] Gai, Y., Zhu, L. and Lin, L. (2013). Model selection consistency of Dantzig selector. Statistica Sinica 23, 615–634. MR3086649.
- [27] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10, 971–988. MR2108039. doi:10.3150/bj/1106314846.
- [28] Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55–67.
- [29] Huang, J., Horowitz, J. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Annals of Statistics 36, 587–613. MR2396808. doi:10.1214/009053607000000875.
- [30] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statistica Sinica 18, 1603–1618. MR2469326.
- [31] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electronic Journal of Statistics 2, 90–102. MR2386087. doi:10.1214/08-EJS177.
- [32] Knight, K. and Fu, W. J. (2000). Asymptotics for Lasso-type estimators. Annals of Statistics 28, 1356–1378. MR1805787. doi:10.1214/aos/1015957397.
- [33] Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: Facts and fiction. Econometric Theory 21, 21–59. MR2153856. doi:10.1017/S0266466605050036.
- [34] Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Annals of Statistics 37, 3498–3528. MR2549567. doi:10.1214/09-AOS683.
- [35] Massy, W. F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association 60, 234–256.
- [36] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Annals of Statistics 34, 1436–1462. MR2278363. doi:10.1214/009053606000000281.
- [37] Meinshausen, N. (2007). Relaxed Lasso. Computational Statistics and Data Analysis 52, 374–393. MR2409990.
- [38] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B 72, 417–473. MR2758523. doi:10.1111/j.1467-9868.2010.00740.x.
- [39] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics 37, 246–270. MR2488351. doi:10.1214/07-AOS582.
- [40] Minnier, J., Tian, L. and Cai, T. (2011). A perturbation method for inference on regularized regression estimates. Journal of the American Statistical Association 106(496), 1371–1382. MR2896842. doi:10.1198/jasa.2011.tm10382.
- [41] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2009). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems 22, 1348–1356.
- [42] Negahban, S., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statistical Science 28, 538–557. MR3025133. doi:10.1214/12-STS400.
- [43] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). On the Lasso and its dual. Journal of Computational and Graphical Statistics 9, 319–337. MR1822089.
- [44] Pötscher, B. M. and Schneider, U. (2009). On the distribution of the adaptive LASSO estimator. Journal of Statistical Planning and Inference 139, 2775–2790. MR2523666. doi:10.1016/j.jspi.2009.01.003.
- [45] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $l_q$-balls. IEEE Transactions on Information Theory 57, 6976–6994. MR2882274. doi:10.1109/TIT.2011.2165799.
- [46] Sartori, S. (2011). Penalized Regression: Bootstrap Confidence Intervals and Variable Selection for High-Dimensional Data Sets. PhD thesis, Università degli Studi di Milano. Available online: http://air.unimi.it/bitstream/2434/153099/6/phd_unimi_R07738.pdf.
- [47] Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B 58, 267–288. MR1379242.
- [48] Tropp, J. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory 50, 2231–2242.
- [49] van de Geer, S. (2007). The deterministic Lasso. In Proc. of the Joint Statistical Meeting.
- [50] van de Geer, S. (2008). High-dimensional generalized linear models and the Lasso. Annals of Statistics 36, 614–645. MR2396809. doi:10.1214/009053607000000929.
- [51] Wainwright, M. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $l_1$-constrained quadratic programming (Lasso). IEEE Transactions on Information Theory 55, 2183–2202. MR2729873. doi:10.1109/TIT.2009.2016018.
- [52] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Annals of Statistics 36, 1567–1594. MR2435448. doi:10.1214/07-AOS520.
- [53] Zhang, C.-H. and Zhang, S. S. (2011). Confidence intervals for low-dimensional parameters in high-dimensional linear models. http://arxiv.org/abs/1110.2563.
- [54] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. The Journal of Machine Learning Research 7, 2541–2563. MR2274449.
- [55] Zou, H. (2006). The adaptive Lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429. MR2279469. doi:10.1198/016214506000000735.
- [56] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67, 301–320. MR2137327. doi:10.1111/j.1467-9868.2005.00503.x.
The Institute of Mathematical Statistics and the Bernoulli Society

More like this
- Sign-constrained least squares estimation for high-dimensional regression. Meinshausen, Nicolai, Electronic Journal of Statistics, 2013
- Asymptotic oracle properties of SCAD-penalized least squares estimators. Huang, Jian and Xie, Huiliang, Asymptotics: Particles, Processes and Inverse Problems, 2007
- Estimation in high-dimensional linear models with deterministic design matrices. Shao, Jun and Deng, Xinwei, The Annals of Statistics, 2012
- Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Huang, Jian, Horowitz, Joel L., and Ma, Shuangge, The Annals of Statistics, 2008
- A lava attack on the recovery of sums of dense and sparse signals. Chernozhukov, Victor, Hansen, Christian, and Liao, Yuan, The Annals of Statistics, 2017
- High-dimensional graphs and variable selection with the Lasso. Meinshausen, Nicolai and Bühlmann, Peter, The Annals of Statistics, 2006
- Joint variable and rank selection for parsimonious estimation of high-dimensional matrices. Bunea, Florentina, She, Yiyuan, and Wegkamp, Marten H., The Annals of Statistics, 2012
- SCAD-penalized regression in high-dimensional partially linear models. Xie, Huiliang and Huang, Jian, The Annals of Statistics, 2009
- Adaptive robust variable selection. Fan, Jianqing, Fan, Yingying, and Barut, Emre, The Annals of Statistics, 2014
- High-dimensional generalized linear models and the lasso. van de Geer, Sara A., The Annals of Statistics, 2008
