Electronic Journal of Statistics

Recovery of weak signal in high dimensional linear regression by data perturbation

Yongli Zhang

Abstract

Recovering weak signals (i.e., small nonzero regression coefficients) is a difficult task in high dimensional feature selection problems. Both convex and nonconvex regularization methods fail to fully recover the true model whenever the design matrix exhibits strong columnwise correlations or some nonzero coefficients fall below a threshold. To address these two challenges, we propose a procedure, Perturbed LASSO (PLA), that adds random perturbations to the design matrix, thereby weakening columnwise correlations and strengthening signals. Moreover, we derive a quantitative relationship between the selection accuracy and the computing cost of PLA. We prove theoretically, and demonstrate through simulations, that PLA substantially improves the chance of recovering weak signals and outperforms comparable methods at a limited computational cost.

Article information

Source
Electron. J. Statist. Volume 11, Number 2 (2017), 3226–3250.

Dates
Received: November 2016
First available in Project Euclid: 25 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1506326416

Digital Object Identifier
doi:10.1214/17-EJS1320

Zentralblatt MATH identifier
1373.62373

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators

Keywords
Beta-min condition; data perturbation; high dimensional data; irrepresentable condition; LASSO; weak signal

Rights
Creative Commons Attribution 4.0 International License.

Citation

Zhang, Yongli. Recovery of weak signal in high dimensional linear regression by data perturbation. Electron. J. Statist. 11 (2017), no. 2, 3226–3250. doi:10.1214/17-EJS1320. https://projecteuclid.org/euclid.ejs/1506326416

