Electronic Journal of Statistics

Recovery of weak signal in high dimensional linear regression by data perturbation

Yongli Zhang



Recovering weak signals (i.e., small nonzero regression coefficients) is a difficult task in high dimensional feature selection. Both convex and nonconvex regularization methods fail to fully recover the true model whenever the design matrix exhibits strong columnwise correlations or some nonzero coefficients fall below a threshold. To address these two challenges, we propose a procedure, the Perturbed LASSO (PLA), which weakens correlations and strengthens signals by adding random perturbations to the design matrix. Moreover, we derive a quantitative relationship between the selection accuracy and the computing cost of PLA. We prove theoretically, and demonstrate via simulations, that PLA substantially improves the chance of recovering weak signals and outperforms comparable methods at a limited computational cost.
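To make the mechanism concrete, here is a minimal sketch of the data-perturbation idea described above; it is not the paper's exact PLA procedure. The function perturbed_lasso and the parameters tau (perturbation scale), n_perturb (number of perturbed fits), alpha (LASSO penalty), and vote_frac (voting threshold) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def perturbed_lasso(X, y, tau=0.5, n_perturb=20, alpha=0.1,
                    vote_frac=0.5, seed=0):
    """Fit the LASSO on randomly perturbed copies of the design matrix
    and keep variables selected in at least a vote_frac fraction of fits.
    Illustrative sketch only; the tuning rules here are not the paper's."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    votes = np.zeros(p)
    for _ in range(n_perturb):
        # Adding independent noise to X weakens columnwise correlations.
        Z = X + tau * rng.standard_normal((n, p))
        coef = Lasso(alpha=alpha, max_iter=10000).fit(Z, y).coef_
        votes += (coef != 0)
    # Vote across perturbed fits to stabilize the selected set.
    return np.flatnonzero(votes >= vote_frac * n_perturb)
```

Repeating the fit over more perturbed copies of the design matrix raises the computing cost but stabilizes the selected variable set, which is the accuracy-versus-cost trade-off the paper quantifies.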

Article information

Electron. J. Statist. Volume 11, Number 2 (2017), 3226-3250.

Received: November 2016
First available in Project Euclid: 25 September 2017

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1506326416

Digital Object Identifier: doi:10.1214/17-EJS1320

Subjects: Primary: 62J07: Ridge regression; shrinkage estimators

Keywords: Beta-min condition; data perturbation; high dimensional data; irrepresentable condition; LASSO; weak signal

Rights: Creative Commons Attribution 4.0 International License.


Zhang, Yongli. Recovery of weak signal in high dimensional linear regression by data perturbation. Electron. J. Statist. 11 (2017), no. 2, 3226--3250. doi:10.1214/17-EJS1320. https://projecteuclid.org/euclid.ejs/1506326416


