Electronic Journal of Statistics

LASSO, Iterative Feature Selection and the Correlation Selector: Oracle inequalities and numerical performances

Pierre Alquier


Abstract

We propose a general family of algorithms for regression estimation with quadratic loss, based on geometrical considerations. These algorithms are able to select relevant functions from a large dictionary. We prove that many methods already studied for this task (the LASSO, the Dantzig selector, and Iterative Feature Selection, among others) belong to this family, and we exhibit another particular member of the family that we call the Correlation Selector. Using general properties of this family of algorithms, we prove oracle inequalities for IFS, the LASSO, and the Correlation Selector, and compare the numerical performance of these estimators on a toy example.
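For orientation, the LASSO mentioned above is the ℓ1-penalized least-squares estimator over a dictionary of functions f_1, …, f_p; in standard notation (which may differ from the notation used in the body of the paper),

\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \Big\{ \frac{1}{n} \sum_{i=1}^{n} \Big( Y_i - \sum_{j=1}^{p} \beta_j f_j(X_i) \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \Big\},

where \lambda > 0 is a tuning parameter, while the Dantzig selector instead minimizes \sum_{j} |\beta_j| subject to a sup-norm constraint on the empirical correlations between the residuals and the dictionary functions.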

Article information

Source
Electron. J. Statist. Volume 2 (2008), 1129-1152.

Dates
First available in Project Euclid: 21 November 2008

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1227287695

Digital Object Identifier
doi:10.1214/08-EJS288

Zentralblatt MATH identifier
1320.62084

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62J07: Ridge regression; shrinkage estimators. 62G15: Tolerance and confidence regions. 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords
Regression estimation; statistical learning; confidence regions; shrinkage and thresholding methods; LASSO

Citation

Alquier, Pierre. LASSO, Iterative Feature Selection and the Correlation Selector: Oracle inequalities and numerical performances. Electron. J. Statist. 2 (2008), 1129–1152. doi:10.1214/08-EJS288. https://projecteuclid.org/euclid.ejs/1227287695



References

  • [1] Alquier, P. Density estimation with quadratic loss: a confidence intervals method. ESAIM: Probability and Statistics 12 (2008), 438–463.
  • [2] Alquier, P. Iterative feature selection in regression estimation. Annales de l'Institut Henri Poincaré, Probability and Statistics 44, 1 (2008), 47–88.
  • [3] Bakin, S. Adaptive Regression and Model Selection in Data Mining Problems. PhD thesis, Australian National University, 1999.
  • [4] Barron, A., Cohen, A., Dahmen, W., and DeVore, R. Adaptive approximation and learning by greedy algorithms. The Annals of Statistics 36, 1 (2008), 64–94.
  • [5] Bickel, P. J., Ritov, Y., and Tsybakov, A. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics (to appear).
  • [6] Bunea, F. Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1+ℓ2 penalization. Preprint, arXiv:0808.4051, 2008.
  • [7] Bunea, F., Tsybakov, A., and Wegkamp, M. Sparse density estimation with ℓ1 penalties. In Proceedings of the 20th Annual Conference on Learning Theory (COLT 2007), Springer-Verlag, 2007, pp. 530–543.
  • [8] Bunea, F., Tsybakov, A., and Wegkamp, M. Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1 (2007), 169–194.
  • [9] Candes, E., and Tao, T. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics 35 (2007).
  • [10] Catoni, O. A PAC-Bayesian approach to adaptive classification. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, 2003.
  • [11] Chesneau, C., and Hebiri, M. Some theoretical results on the grouped variables Lasso. Preprint, Laboratoire de Probabilités et Modèles Aléatoires (submitted), 2007.
  • [12] Cohen, A. Handbook of Numerical Analysis, vol. 7. North-Holland, Amsterdam, 2000.
  • [13] Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. Least angle regression. The Annals of Statistics 32, 2 (2004), 407–499.
  • [14] Frank, I., and Friedman, J. A statistical view of some chemometrics regression tools. Technometrics 16 (1993), 499–511.
  • [15] Huang, C., Cheang, G. L. H., and Barron, A. Risk of penalized least squares, greedy selection and ℓ1 penalization for flexible function libraries. Preprint, 2008.
  • [16] Osborne, M., Presnell, B., and Turlach, B. On the Lasso and its dual. Journal of Computational and Graphical Statistics 9 (2000), 319–337.
  • [17] Panchenko, D. Symmetrization approach to concentration inequalities for empirical processes. The Annals of Probability 31, 4 (2003), 2068–2081.
  • [18] R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria, 2004. URL: http://www.R-project.org/.
  • [19] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58, 1 (1996), 267–288.
  • [20] Vapnik, V. The Nature of Statistical Learning Theory. Springer, 1998.
  • [21] Yuan, M., and Lin, Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society B 68, 1 (2006), 49–67.