Electronic Journal of Statistics

On the prediction loss of the lasso in the partially labeled setting

Pierre C. Bellec, Arnak S. Dalalyan, Edwin Grappin, and Quentin Paris

Full-text: Open access


In this paper we revisit the risk bounds of the lasso estimator in the context of transductive and semi-supervised learning; in other words, the setting under consideration is that of regression with random design under partial labeling. The main goal is to obtain user-friendly bounds on the out-of-sample prediction risk. To this end, we consider the simple setting of a bounded response variable and bounded (high-dimensional) covariates. We propose new adaptations of the lasso to these settings and establish oracle inequalities both in expectation and in deviation. These results provide non-asymptotic upper bounds on the risk that highlight the interplay between the bias due to the misspecification of the linear model, the bias due to approximate sparsity, and the variance. They also demonstrate that a large number of unlabeled features can have a significant positive impact in situations where the restricted eigenvalue of the design matrix vanishes or is very small.
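To make the setting concrete, the following is a minimal sketch of one natural way to exploit unlabeled covariates when fitting a lasso: estimate per-feature scales from the full, partially labeled design before running the lasso on the labeled pairs alone. This is an illustration of the general idea only, not the estimator analyzed in the paper; all names, dimensions, and the regularization level are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_lab, n_unlab, p, s = 50, 500, 100, 5  # illustrative sizes

# Random-design covariates; only the labeled rows come with responses.
X_lab = rng.standard_normal((n_lab, p))
X_unlab = rng.standard_normal((n_unlab, p))

# Sparse linear model: only the first s coefficients are nonzero.
beta = np.zeros(p)
beta[:s] = 1.0
y_lab = X_lab @ beta + 0.1 * rng.standard_normal(n_lab)

# Use ALL covariates (labeled + unlabeled) to estimate feature scales,
# then fit the lasso on the labeled pairs only.
scale = np.concatenate([X_lab, X_unlab]).std(axis=0)
model = Lasso(alpha=0.05)
model.fit(X_lab / scale, y_lab)

# Map the coefficients back to the original (unscaled) features.
beta_hat = model.coef_ / scale
support_size = int(np.count_nonzero(np.abs(beta_hat) > 1e-8))
```

The unlabeled rows cost nothing in terms of responses, yet they sharpen the estimate of the covariate distribution; the paper's point is that this extra design information matters most precisely when the labeled design alone is ill-conditioned, i.e., when its restricted eigenvalue is small.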

Article information

Electron. J. Statist., Volume 12, Number 2 (2018), 3443-3472.

Received: January 2018
First available in Project Euclid: 16 October 2018

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1539676834

Digital Object Identifier: doi:10.1214/18-EJS1457

Subjects:
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62G08: Nonparametric regression

Keywords: semi-supervised learning; sparsity; lasso; oracle inequality; transductive learning; high-dimensional regression

Creative Commons Attribution 4.0 International License.


Bellec, Pierre C.; Dalalyan, Arnak S.; Grappin, Edwin; Paris, Quentin. On the prediction loss of the lasso in the partially labeled setting. Electron. J. Statist. 12 (2018), no. 2, 3443--3472. doi:10.1214/18-EJS1457. https://projecteuclid.org/euclid.ejs/1539676834

