Fundamental barriers to high-dimensional regression with convex penalties

Michael Celentano; Andrea Montanari

doi:10.1214/21-AOS2100

Abstract

In high-dimensional regression, we attempt to estimate a parameter vector ${\beta _{0}}\in {\mathbb{R}^{p}}$ from $n\lesssim p$ observations ${\{({y_{i}},{\boldsymbol{x}_{i}})\}_{i\le n}}$, where ${\boldsymbol{x}_{i}}\in {\mathbb{R}^{p}}$ is a vector of predictors and ${y_{i}}$ is a response variable. A well-established approach uses convex regularizers to promote specific structures (e.g., sparsity) of the estimate $\widehat{\beta }$ while allowing for practical algorithms. Theoretical analysis implies that convex penalization schemes have nearly optimal estimation properties in certain settings. However, in general the gaps between statistically optimal estimation (with unbounded computational resources) and convex methods are poorly understood.

We show that when the statistician has very simple structural information about the distribution of the entries of ${\beta _{0}}$, a large gap frequently exists between the best performance achieved by any convex regularizer satisfying a mild technical condition and either: (i) the optimal statistical error or (ii) the statistical error achieved by optimal approximate message passing algorithms. Remarkably, a gap occurs at high enough signal-to-noise ratio if and only if the distribution of the coordinates of ${\beta _{0}}$ is not log-concave. These conclusions follow from an analysis of standard Gaussian designs. Our lower bounds are expected to be generally tight, and we prove tightness under certain conditions.
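
As a rough illustration of the setting the abstract describes (a standard Gaussian design with a convex penalty), the sketch below fits an $\ell_1$-penalized least-squares estimate by proximal gradient descent (ISTA). It is not the authors' code or analysis: the function names, the two-point distribution used for the entries of ${\beta _{0}}$, the noise level and the penalty weight `lam` are all assumptions chosen for illustration only.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (entrywise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_objective(X, y, beta, lam):
    """(1 / 2n) * ||y - X beta||^2 + lam * ||beta||_1."""
    n = X.shape[0]
    return 0.5 * np.sum((y - X @ beta) ** 2) / n + lam * np.sum(np.abs(beta))


def lasso_ista(X, y, lam, n_iter=500):
    """Minimize the lasso objective by proximal gradient descent (ISTA)."""
    n, p = X.shape
    # Lipschitz constant of the smooth part: largest eigenvalue of X^T X / n.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - grad / lipschitz, lam / lipschitz)
    return beta


# Hypothetical problem instance: n < p, i.i.d. standard Gaussian predictors.
rng = np.random.default_rng(0)
n, p = 100, 200
X = rng.standard_normal((n, p))
# Assumed prior for illustration: entries of beta_0 are 0 or 1, mostly 0.
beta0 = rng.choice([0.0, 1.0], size=p, p=[0.9, 0.1])
y = X @ beta0 + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, y, lam=0.05)
print("per-coordinate squared error:", np.mean((beta_hat - beta0) ** 2))
```

The $\ell_1$ penalty here stands in for just one convex regularizer; the paper's lower bounds concern the best performance over a whole class of convex penalties, which this sketch does not attempt to reproduce.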

Funding Statement

The first author was supported in part by NSF Grants DGE-1656518, CCF-1714305, IIS-1741162, and ONR N00014-18-1-2729.

Citation

Michael Celentano, Andrea Montanari. "Fundamental barriers to high-dimensional regression with convex penalties." Ann. Statist. 50 (1) 170-196, February 2022. https://doi.org/10.1214/21-AOS2100

Information

Received: 1 April 2019; Revised: 1 June 2021; Published: February 2022

First available in Project Euclid: 16 February 2022

MathSciNet: MR4382013

zbMATH: 1486.62198

Digital Object Identifier: 10.1214/21-AOS2100

Subjects:

Primary: 62J05, 62J07

Secondary: 62F12

Keywords: approximate message passing, computational to statistical gaps, convex, high-dimensional regression, M-estimation, penalty
