Abstract
The Lasso is a popular regression method for high-dimensional problems in which the number N of parameters is larger than the number n of samples: N > n. A useful heuristic relates the statistical properties of the Lasso estimator to those of a simple soft-thresholding denoiser in a denoising problem in which the parameters are observed in Gaussian noise, with a carefully tuned variance. Earlier work confirmed this picture in the limit n, N → ∞, pointwise in the parameters θ and in the value of the regularization parameter.
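The soft-thresholding denoiser mentioned above is the map η(y; τ) = sign(y)(|y| − τ)₊, applied coordinatewise to a noisy observation of θ. The following is a minimal illustrative sketch (not the paper's construction); the sparse signal, noise level σ, and threshold τ are arbitrary choices for demonstration:

```python
import numpy as np

def soft_threshold(y, tau):
    """Soft-thresholding denoiser: shrink each coordinate toward zero by tau."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

# Toy denoising problem: observe a sparse theta in Gaussian noise.
rng = np.random.default_rng(0)
N = 1000
theta = np.zeros(N)
theta[:50] = 3.0                      # sparse signal (assumed for illustration)
sigma = 0.5                           # noise standard deviation (assumed)
y = theta + sigma * rng.normal(size=N)

theta_hat = soft_threshold(y, tau=2 * sigma)   # threshold choice is illustrative
mse = np.mean((theta_hat - theta) ** 2)
```

Coordinates whose observations fall below the threshold are set exactly to zero, which is why this denoiser mirrors the sparsity-inducing behavior of the Lasso.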
Here, we consider a standard random design model and prove exponential concentration of its empirical distribution around the prediction provided by the Gaussian denoising model. Crucially, our results are uniform with respect to θ belonging to ℓ_q balls, q ∈ [0, 1], and with respect to the regularization parameter. This allows us to derive sharp results for the performance of various data-driven procedures to tune the regularization.
Our proofs make use of Gaussian comparison inequalities, and in particular of a version of Gordon’s minimax theorem developed by Thrampoulidis, Oymak and Hassibi, which controls the optimum value of the Lasso optimization problem. Crucially, we prove a stability property of the minimizer in Wasserstein distance that allows one to characterize properties of the minimizer itself.
Funding Statement
This work was partially supported by grants NSF DMS-1613091, NSF CCF-1714305 and NSF IIS-1741162 and ONR N00014-18-1-2729.
Citation
Léo Miolane, Andrea Montanari. "The distribution of the Lasso: Uniform control over sparse balls and adaptive parameter tuning." Ann. Statist. 49 (4), 2313–2335, August 2021. https://doi.org/10.1214/20-AOS2038