Open Access
December 2018
Debiasing the lasso: Optimal sample size for Gaussian designs
Adel Javanmard, Andrea Montanari
Ann. Statist. 46(6A): 2593-2622 (December 2018). DOI: 10.1214/17-AOS1630

Abstract

Performing statistical inference in high-dimensional models is challenging because of the lack of precise information on the distribution of high-dimensional regularized estimators.

Here, we consider linear regression in the high-dimensional regime $p\gg n$ and the Lasso estimator: we would like to perform inference on the parameter vector $\theta^{*}\in\mathbb{R}^{p}$. Important progress has been achieved in computing confidence intervals and $p$-values for single coordinates $\theta^{*}_{i}$, $i\in\{1,\dots,p\}$. A key role in these new inferential methods is played by a certain debiased estimator $\widehat{\theta}^{\mathrm{d}}$. Earlier work establishes that, under suitable assumptions on the design matrix, the coordinates of $\widehat{\theta}^{\mathrm{d}}$ are asymptotically Gaussian provided the true parameter vector $\theta^{*}$ is $s_{0}$-sparse with $s_{0}=o(\sqrt{n}/\log p)$.
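For orientation, the debiasing construction referenced here (following the authors' earlier work) is a one-step correction of the Lasso estimate; the display below is a paraphrase of that construction rather than a quotation from this paper:
$$\widehat{\theta}^{\mathrm{d}} = \widehat{\theta}^{\mathrm{Lasso}} + \frac{1}{n}\,M X^{\top}\bigl(y - X\widehat{\theta}^{\mathrm{Lasso}}\bigr),$$
where $X\in\mathbb{R}^{n\times p}$ is the design matrix, $y\in\mathbb{R}^{n}$ the response, and $M$ an (approximate) inverse of the population covariance of the rows of $X$.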

The condition $s_{0}=o(\sqrt{n}/\log p)$ is considerably stronger than the one for consistent estimation, namely $s_{0}=o(n/\log p)$. In this paper, we consider Gaussian designs with known or unknown population covariance. When the covariance is known, we prove that the debiased estimator is asymptotically Gaussian under the nearly optimal condition $s_{0}=o(n/(\log p)^{2})$.
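As a concrete illustration of the known-covariance case, the Python sketch below computes the debiased estimator with $M=\Sigma^{-1}$ and the resulting per-coordinate 95% confidence intervals. The dimensions, regularization level, and the assumption that the noise level $\sigma$ is known are illustrative choices, not taken from the paper.

```python
# Minimal sketch of the debiased Lasso with known population covariance Sigma.
# Dimensions, lambda, and the assumption that the noise level sigma is known
# are illustrative; this is not the paper's own code.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, s0, sigma = 400, 1000, 10, 1.0

Sigma = np.eye(p)                                   # known covariance (identity here)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
theta_star = np.zeros(p)
theta_star[:s0] = 1.0                               # s0-sparse true parameter vector
y = X @ theta_star + sigma * rng.standard_normal(n)

# Lasso with the usual lambda ~ sigma * sqrt(2 log(p) / n) scaling
lam = sigma * np.sqrt(2 * np.log(p) / n)
theta_hat = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# One-step debiasing correction with M = Sigma^{-1}
theta_d = theta_hat + np.linalg.solve(Sigma, X.T @ (y - X @ theta_hat)) / n

# 95% confidence intervals from the asymptotic Gaussian limit:
# theta_d[i] is approximately N(theta_star[i], sigma^2 * Sigma^{-1}[i, i] / n)
se = sigma * np.sqrt(np.diag(np.linalg.inv(Sigma)) / n)
ci = np.stack([theta_d - 1.96 * se, theta_d + 1.96 * se], axis=1)
```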

The same conclusion holds if the population covariance is unknown but can be estimated sufficiently well. For intermediate regimes, we describe the trade-off between sparsity in the coefficients $\theta^{*}$ and sparsity in the inverse covariance of the design. We further discuss several applications of our results beyond high-dimensional inference. In particular, we propose a thresholded Lasso estimator that is minimax optimal up to a factor $1+o_{n}(1)$ for i.i.d. Gaussian designs; a rough sketch of such a construction follows below.
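The sketch below shows how a thresholded estimator can be built by hard-thresholding the debiased Lasso coordinates. The threshold level $c\,\sigma\sqrt{2\log p/n}$ and the constant $c$ are illustrative assumptions and may differ from the paper's exact construction.

```python
# Hedged sketch: hard-threshold the debiased Lasso coordinates. The threshold
# level c * sigma * sqrt(2 log(p) / n) is an illustrative choice, not a
# verbatim transcription of the paper's thresholded Lasso.
import numpy as np

def threshold_debiased(theta_d, sigma, n, p, c=1.0):
    """Zero out every coordinate of theta_d whose magnitude falls below the threshold."""
    tau = c * sigma * np.sqrt(2.0 * np.log(p) / n)
    return np.where(np.abs(theta_d) > tau, theta_d, 0.0)
```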

Citation


Adel Javanmard, Andrea Montanari. "Debiasing the lasso: Optimal sample size for Gaussian designs." Ann. Statist. 46(6A): 2593-2622, December 2018. https://doi.org/10.1214/17-AOS1630

Information

Received: 1 June 2016; Revised: 1 August 2017; Published: December 2018
First available in Project Euclid: 7 September 2018

zbMATH: 06968593
MathSciNet: MR3851749
Digital Object Identifier: 10.1214/17-AOS1630

Subjects:
Primary: 62J05, 62J07
Secondary: 62F12

Keywords: bias and variance, confidence intervals, high-dimensional regression, hypothesis testing, Lasso, sample size

Rights: Copyright © 2018 Institute of Mathematical Statistics
