Electronic Journal of Statistics

Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression

Hanzhong Liu and Bin Yu


Abstract

We study the asymptotic properties of Lasso+mLS and Lasso+Ridge under the sparse high-dimensional linear regression model: the Lasso selects predictors and then modified Least Squares (mLS) or Ridge estimates their coefficients. First, we propose a valid inference procedure for parameter estimation based on a parametric residual bootstrap after Lasso+mLS and Lasso+Ridge. Second, we establish the asymptotic unbiasedness of Lasso+mLS and Lasso+Ridge. More specifically, we show that their biases decay at an exponential rate and that they achieve the oracle convergence rate of $s/n$ (where $s$ is the number of nonzero regression coefficients and $n$ is the sample size) for mean squared error (MSE). Third, we show that Lasso+mLS and Lasso+Ridge are asymptotically normal. They have an oracle property in the sense that they select the true predictors with probability converging to $1$, and the estimates of the nonzero parameters have the same asymptotic normal distribution that they would have if the zero parameters were known in advance. In fact, our analysis is not limited to using the Lasso in the selection stage; it applies to any model selection criterion for which the probability of selecting a wrong model decays at an exponential rate.
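As a concrete illustration of the two-stage estimator and the residual bootstrap inference described above, the sketch below simulates a sparse design, runs Lasso selection followed by a refit on the selected predictors, and bootstraps the refitted coefficients. It is a minimal sketch, not the authors' implementation: plain OLS stands in for the paper's modified Least Squares (mLS), and the sample sizes, penalty levels, and use of scikit-learn estimators are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, LinearRegression, Ridge

rng = np.random.default_rng(0)

# Sparse high-dimensional linear model: n samples, p predictors, s nonzero coefficients.
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = X @ beta + rng.standard_normal(n)

# Stage 1: Lasso selects the support (predictors with nonzero estimated coefficients).
lasso = LassoCV(cv=5).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Stage 2: refit on the selected predictors only.
# Plain OLS stands in for the paper's modified Least Squares (mLS);
# a Ridge refit on the same support gives the Lasso+Ridge analogue.
ols_refit = LinearRegression().fit(X[:, support], y)
ridge_refit = Ridge(alpha=1.0).fit(X[:, support], y)

# Residual bootstrap: resample the centered residuals of the refitted model,
# regenerate responses, rerun both stages, and collect the refitted coefficients.
fitted = ols_refit.predict(X[:, support])
resid = y - fitted
resid -= resid.mean()

B = 200
boot_coefs = np.zeros((B, p))
for b in range(B):
    y_star = fitted + rng.choice(resid, size=n, replace=True)
    supp_b = np.flatnonzero(Lasso(alpha=lasso.alpha_).fit(X, y_star).coef_)
    if supp_b.size > 0:
        refit_b = LinearRegression().fit(X[:, supp_b], y_star)
        boot_coefs[b, supp_b] = refit_b.coef_

# Percentile confidence interval for the first (truly nonzero) coefficient.
lo, hi = np.percentile(boot_coefs[:, 0], [2.5, 97.5])
print(f"95% bootstrap interval for beta_1 (true value 2.0): [{lo:.2f}, {hi:.2f}]")
```

Keeping the Lasso penalty fixed at its cross-validated value inside the bootstrap loop is a simplification for speed; the paper's procedure reruns the full selection stage, and replacing the OLS refit with `Ridge` in the loop would give the corresponding intervals for Lasso+Ridge.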

Article information

Source
Electron. J. Statist. Volume 7 (2013), 3124-3169.

Dates
First available in Project Euclid: 15 January 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1389795619

Digital Object Identifier
doi:10.1214/14-EJS875

Mathematical Reviews number (MathSciNet)
MR3151764

Zentralblatt MATH identifier
1281.62158

Subjects
Primary: 62F12: Asymptotic properties of estimators; 62F40: Bootstrap, jackknife and other resampling methods
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Lasso; irrepresentable condition; Lasso+mLS and Lasso+Ridge; sparsity; asymptotic unbiasedness; asymptotic normality; residual bootstrap

Citation

Liu, Hanzhong; Yu, Bin. Asymptotic properties of Lasso+mLS and Lasso+Ridge in sparse high-dimensional linear regression. Electron. J. Statist. 7 (2013), 3124-3169. doi:10.1214/14-EJS875. https://projecteuclid.org/euclid.ejs/1389795619.


