Open Access
On the prediction loss of the lasso in the partially labeled setting
Pierre C. Bellec, Arnak S. Dalalyan, Edwin Grappin, Quentin Paris
Electron. J. Statist. 12(2): 3443-3472 (2018). DOI: 10.1214/18-EJS1457

Abstract

In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning. In other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the off-sample prediction risk. To this end, we consider the simple setting of a bounded response variable and bounded (high-dimensional) covariates. We propose some new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the mis-specification of the linear model, the bias due to approximate sparsity, and the variance. They also demonstrate that the presence of a large number of unlabeled features can have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.

Citation

Pierre C. Bellec, Arnak S. Dalalyan, Edwin Grappin, Quentin Paris. "On the prediction loss of the lasso in the partially labeled setting." Electron. J. Statist. 12(2): 3443-3472, 2018. https://doi.org/10.1214/18-EJS1457

Information

Received: 1 January 2018; Published: 2018
First available in Project Euclid: 16 October 2018

zbMATH: 06970009
MathSciNet: MR3864589
Digital Object Identifier: 10.1214/18-EJS1457

Subjects:
Primary: 62H30
Secondary: 62G08

Keywords: high-dimensional regression, lasso, oracle inequality, semi-supervised learning, sparsity, transductive learning
