Abstract
The Lasso is a method for high-dimensional regression, which is now commonly used when the number of covariates p is of the same order as, or larger than, the number of observations n. Classical asymptotic normality theory does not apply to this model for two fundamental reasons: (1) the regularized risk is nonsmooth; (2) the distance between the estimator and the true parameter vector cannot be neglected. As a consequence, standard perturbative arguments that are the traditional basis for asymptotic normality fail.
On the other hand, the Lasso estimator can be precisely characterized in the regime in which both n and p are large and n/p is of order one. This characterization was first obtained in the case of Gaussian designs with i.i.d. covariates: here we generalize it to Gaussian correlated designs with non-singular covariance structure. The characterization is expressed in terms of a simpler “fixed-design” model. We establish nonasymptotic bounds on the distance between the distribution of various quantities in the two models, which hold uniformly over signals in a suitable sparsity class and over values of the regularization parameter.
As an application, we study the distribution of the debiased Lasso and show that a degrees-of-freedom correction is necessary for computing valid confidence intervals.
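To make the degrees-of-freedom correction concrete, the following is a minimal sketch, not the authors' implementation. It assumes the design covariance Sigma is known, uses the objective normalization (1/(2n))||y - X theta||^2 + lam ||theta||_1 (an assumption; the paper's scaling of the regularization parameter may differ), and takes the number of nonzero Lasso coefficients as the degrees of freedom. The function names `lasso_cd` and `debiased_lasso` are hypothetical, and the plain coordinate-descent solver is a stand-in for any Lasso solver.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for
    (1/(2n)) * ||y - X theta||^2 + lam * ||theta||_1  (assumed normalization)."""
    n, p = X.shape
    theta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column curvature
    resid = y.astype(float)                # running residual y - X theta (a copy)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = X[:, j] @ resid / n + col_sq[j] * theta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != theta[j]:
                resid -= X[:, j] * (new - theta[j])
                theta[j] = new
    return theta


def debiased_lasso(X, y, theta_hat, Sigma, dof_correction=True):
    """theta_hat + Sigma^{-1} X^T (y - X theta_hat) / denom, where
    denom = n - ||theta_hat||_0 with the correction and denom = n without it."""
    n = X.shape[0]
    df = np.count_nonzero(theta_hat)
    denom = n - df if dof_correction else n
    if denom <= 0:
        raise ValueError("Lasso support is too large for the correction")
    return theta_hat + np.linalg.solve(Sigma, X.T @ (y - X @ theta_hat)) / denom


# Illustrative data (all values are assumptions): AR(1)-correlated Gaussian design.
rng = np.random.default_rng(0)
n, p = 100, 150
idx = np.arange(p)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T  # rows ~ N(0, Sigma)
theta_star = np.zeros(p)
theta_star[:5] = 2.0
y = X @ theta_star + rng.standard_normal(n)

theta_hat = lasso_cd(X, y, lam=0.25)
df = np.count_nonzero(theta_hat)
theta_debiased = debiased_lasso(X, y, theta_hat, Sigma)
```

With the correction, the debiasing step is rescaled by n / (n - df) relative to the uncorrected version, which matters precisely when the Lasso support is a non-negligible fraction of n.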
Funding Statement
The first author was partially supported by NSF Grants CCF-1714305, IIS-1741162 and ONR Grant N00014-18-1-2729. We thank the anonymous reviewers for their valuable reviews.
The second author was partially supported by the National Science Foundation Graduate Research Fellowship Grant DGE-1656518.
The third author was partially supported by NSF Grants DMS-2015447/2147546, CAREER award DMS-2143215 and the Google Research Scholar Award.
Citation
Michael Celentano, Andrea Montanari, Yuting Wei. "The Lasso with general Gaussian designs with applications to hypothesis testing." Ann. Statist. 51 (5) 2194-2220, October 2023. https://doi.org/10.1214/23-AOS2327
Information