April 2022
Sparse high-dimensional linear regression. Estimating squared error and a phase transition
David Gamarnik, Ilias Zadik
Ann. Statist. 50(2): 880-903 (April 2022). DOI: 10.1214/21-AOS2130

Abstract

We consider a sparse high-dimensional regression model where the goal is to recover a $k$-sparse unknown binary vector $\beta$ from $n$ noisy linear observations of the form $Y = X\beta + W \in \mathbb{R}^n$, where $X \in \mathbb{R}^{n \times p}$ has i.i.d. $N(0,1)$ entries and $W \in \mathbb{R}^n$ has i.i.d. $N(0,\sigma^2)$ entries. In the high signal-to-noise ratio regime and sublinear sparsity regime, while the order of the sample size needed to recover the unknown vector information-theoretically is known to be $n^* := 2k \log p / \log(k/\sigma^2 + 1)$, no polynomial-time algorithm is known to succeed unless $n > n_{\mathrm{alg}} := (2k + \sigma^2) \log p$.
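As a rough numerical illustration of the two sample-size thresholds defined above, the following minimal sketch evaluates $n^*$ and $n_{\mathrm{alg}}$ directly from their formulas. The function names and the example values of `k`, `p`, and `sigma2` are assumptions chosen for illustration; they do not come from the paper.

```python
import math


def info_theoretic_threshold(k, p, sigma2):
    """n* = 2k log p / log(k / sigma^2 + 1), as defined in the abstract."""
    return 2 * k * math.log(p) / math.log(k / sigma2 + 1)


def algorithmic_threshold(k, p, sigma2):
    """n_alg = (2k + sigma^2) log p, as defined in the abstract."""
    return (2 * k + sigma2) * math.log(p)


# Illustrative parameter values only (hypothetical, not taken from the paper).
k, p, sigma2 = 10, 10_000, 1.0
n_star = info_theoretic_threshold(k, p, sigma2)
n_alg = algorithmic_threshold(k, p, sigma2)
print(f"n* is about {n_star:.1f}, n_alg is about {n_alg:.1f}")
```

For these illustrative values $n^*$ is smaller than $n_{\mathrm{alg}}$, so the interval between them is nonempty; the formulas describe orders of magnitude, so the exact numbers should not be read as sharp sample-size requirements.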

In this work, we offer a series of results investigating multiple computational and statistical aspects of the recovery task in the regime $n \in [n^*, n_{\mathrm{alg}}]$. First, we establish a novel information-theoretic property of the MLE of the problem that occurs around $n = n^*$ samples, which we call an “all-or-nothing behavior”: when $n > n^*$ the MLE recovers the support of $\beta$ almost perfectly, while if $n < n^*$ it fails to recover any fraction of it correctly. Second, in an attempt to understand the computational hardness in the regime $n \in [n^*, n_{\mathrm{alg}}]$, we prove that at order $n_{\mathrm{alg}}$ samples there is an Overlap Gap Property (OGP) phase transition in the landscape of the MLE: for constants $c, C > 0$, when $n < c\,n_{\mathrm{alg}}$ the OGP appears in the landscape of the MLE, while if $n > C\,n_{\mathrm{alg}}$ the OGP disappears. OGP is a geometric “disconnectivity” property, which initially appeared in the theory of spin glasses and is known to suggest algorithmic hardness when it occurs. Finally, using certain technical results obtained to establish the OGP phase transition, we additionally establish various novel positive and negative algorithmic results for the recovery task of interest, including the failure of LASSO with access to $n < c\,n_{\mathrm{alg}}$ samples and the success of a simple local search method with access to $n > C\,n_{\mathrm{alg}}$ samples.
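The abstract does not spell out the local search method, so the following is only a hedged sketch of one natural variant: start from a random $k$-sparse binary vector and repeatedly swap one support coordinate for one non-support coordinate whenever the swap lowers the squared error $\|Y - X\beta\|_2^2$. All function names, the swap rule, and the instance sizes are assumptions made for illustration and should not be read as the authors' exact algorithm.

```python
import random


def sample_instance(n, p, k, sigma, rng):
    """Draw Y = X beta + W with a k-sparse binary beta (stored as its support)."""
    support = set(rng.sample(range(p), k))
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    Y = [sum(row[j] for j in support) + rng.gauss(0.0, sigma) for row in X]
    return X, Y, support


def squared_error(X, Y, support):
    """Squared error ||Y - X beta||^2 for the binary beta with this support."""
    return sum((y - sum(row[j] for j in support)) ** 2 for row, y in zip(X, Y))


def local_search(X, Y, k, rng, max_sweeps=100):
    """Swap-based local search over k-sparse binary vectors (assumed variant)."""
    p = len(X[0])
    support = set(rng.sample(range(p), k))
    best = squared_error(X, Y, support)
    for _ in range(max_sweeps):
        improved = False
        for i in sorted(support):  # snapshot of the support at sweep start
            for j in range(p):
                if j in support:
                    continue
                candidate = (support - {i}) | {j}
                err = squared_error(X, Y, candidate)
                if err < best:  # accept the first improving swap for i
                    support, best = candidate, err
                    improved = True
                    break
        if not improved:  # no single swap helps: local minimum reached
            break
    return support, best


# Small hypothetical instance; sizes are illustrative, not from the paper.
rng = random.Random(0)
X, Y, true_support = sample_instance(n=60, p=40, k=3, sigma=0.5, rng=rng)
found, err = local_search(X, Y, k=3, rng=rng)
print("overlap with true support:", len(found & true_support), "of 3")
```

The sketch recomputes the squared error from scratch for every candidate swap, which is simple but slow; a practical version would update residuals incrementally. Whether the search reaches the true support depends on the sample size, in line with the thresholds discussed in the abstract.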

Funding Statement

The first author’s research was supported by the NSF Grant CMMI-1335155.

Citation

David Gamarnik. Ilias Zadik. "Sparse high-dimensional linear regression. Estimating squared error and a phase transition." Ann. Statist. 50 (2) 880 - 903, April 2022. https://doi.org/10.1214/21-AOS2130

Information

Received: 1 May 2018; Revised: 1 August 2021; Published: April 2022
First available in Project Euclid: 7 April 2022

MathSciNet: MR4404922
zbMATH: 1486.62200
Digital Object Identifier: 10.1214/21-AOS2130

Subjects:
Primary: 62J05
Secondary: 62B10, 68Q87

Keywords: all-or-nothing phenomenon, high-dimensional linear regression, overlap gap property, second moment method, sparsity

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
24 PAGES
