## Abstract

In the standard Gaussian linear measurement model $Y = X\mu_0 + \xi \in \mathbb{R}^m$ with a fixed noise level $\sigma > 0$, we consider the problem of estimating the unknown signal $\mu_0$ under a convex constraint $\mu_0 \in K$, where $K$ is a closed convex set in $\mathbb{R}^n$. We show that the risk of the natural convex constrained least squares estimator (LSE) $\hat{\mu}(\sigma)$ can be characterized exactly in high-dimensional limits by that of the convex constrained LSE $\hat{\mu}_K^{\mathsf{seq}}$ in the corresponding Gaussian sequence model at a different noise level. Formally, we show that

$$\Vert \hat{\mu}(\sigma) - \mu_0 \Vert^2 / (n r_n^2) \to 1 \quad \text{in probability},$$

where $r_n^2 > 0$ solves the fixed-point equation

$$\mathbb{E}\,\Big\Vert \hat{\mu}_K^{\mathsf{seq}}\Big(\sqrt{(r_n^2 + \sigma^2)/(m/n)}\Big) - \mu_0 \Big\Vert^2 = n r_n^2.$$

This characterization holds (uniformly) for risks $r_n^2$ in the maximal regime that ranges from constant order all the way down to essentially the parametric rate, as long as a certain necessary nondegeneracy condition is satisfied for $\hat{\mu}(\sigma)$.
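As a rough numerical illustration of the fixed-point equation, the sketch below solves it for one simple constraint set. Everything specific here is an assumption made for illustration and is not taken from the paper: $K$ is taken to be the nonnegative orthant (so the sequence-model LSE is the coordinatewise positive part), the sequence model is read as $y = \mu_0 + \tau g$ with standard Gaussian $g$, the expectation is replaced by a Monte Carlo average, and the equation is solved by plain fixed-point iteration, which is not guaranteed to converge in general. The function names `seq_risk_orthant` and `solve_fixed_point` are hypothetical.

```python
import numpy as np

def seq_risk_orthant(mu0, tau, g):
    # Monte Carlo estimate of E||proj_K(mu0 + tau*g) - mu0||^2 for
    # K = nonnegative orthant (assumed example); the projection onto K
    # is the coordinatewise positive part.
    proj = np.maximum(mu0 + tau * g, 0.0)
    return float(np.mean(np.sum((proj - mu0) ** 2, axis=1)))

def solve_fixed_point(mu0, sigma, m, n_iter=200, n_mc=2000, seed=0):
    # Iterate r2 <- risk(tau(r2)) / n with tau^2 = (r2 + sigma^2) / (m / n).
    n = mu0.size
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n_mc, n))  # common random numbers across iterations
    r2 = sigma ** 2                     # arbitrary starting value
    for _ in range(n_iter):
        tau = np.sqrt((r2 + sigma ** 2) / (m / n))
        r2 = seq_risk_orthant(mu0, tau, g) / n
    return r2

n, m, sigma = 50, 100, 1.0
r2 = solve_fixed_point(np.zeros(n), sigma, m)
# For mu0 = 0 the sequence risk is n * tau^2 / 2, so the equation reduces to
# r^2 = (r^2 + sigma^2) * n / (2m), i.e. r^2 = sigma^2 * n / (2m - n) when 2m > n.
closed_form = sigma ** 2 * n / (2 * m - n)
print(r2, closed_form)
```

In this toy case the two printed numbers should agree up to Monte Carlo error, and the closed form only makes sense when $m > n/2$, which hints at how the solvability of the fixed-point equation depends on the sample size.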

The precise risk characterization reveals a fundamental difference between noiseless (or low noise limit) and noisy linear inverse problems in terms of the sample complexity for signal recovery. A concrete example is given by the isotonic regression problem: while exact recovery of a general monotone signal requires $m \gg n^{1/3}$ samples in the noiseless setting, consistent signal recovery in the noisy setting requires as few as $m \gg \log n$ samples. Such a discrepancy occurs when the low and high noise risk behaviors of $\hat{\mu}_K^{\mathsf{seq}}$ differ significantly. In statistical language, this occurs when $\hat{\mu}_K^{\mathsf{seq}}$ estimates 0 at a faster “adaptation rate” than the slower “worst-case rate” for general signals. Several other examples, including nonnegative least squares and generalized Lasso (in constrained forms), are also worked out to demonstrate the concrete applicability of the theory in problems of different types.
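The fast "adaptation rate" at 0 for isotonic regression can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's argument: it implements the isotonic sequence-model LSE with a hand-written pool-adjacent-violators routine and compares its Monte Carlo risk at $\mu_0 = 0$ with the harmonic number $H_n = \sum_{j=1}^{n} 1/j \approx \log n$, which is the known statistical dimension of the monotone cone. The names `isotonic_projection` and `risk_at_zero` are hypothetical.

```python
import numpy as np

def isotonic_projection(y):
    # Pool-adjacent-violators: Euclidean projection of y onto the
    # monotone cone {x_1 <= x_2 <= ... <= x_n}.
    means, weights = [], []
    for v in y:
        means.append(float(v))
        weights.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            w = weights[-2] + weights[-1]
            pooled = (weights[-2] * means[-2] + weights[-1] * means[-1]) / w
            means[-2:] = [pooled]
            weights[-2:] = [w]
    return np.repeat(means, weights)

def risk_at_zero(n, tau=1.0, n_mc=1000, seed=0):
    # Monte Carlo estimate of E||proj_K(tau * g)||^2, the sequence risk at mu0 = 0.
    rng = np.random.default_rng(seed)
    losses = [np.sum(isotonic_projection(tau * rng.standard_normal(n)) ** 2)
              for _ in range(n_mc)]
    return float(np.mean(losses))

n = 100
harmonic = float(np.sum(1.0 / np.arange(1, n + 1)))  # H_n, roughly log n
mc_risk = risk_at_zero(n)
print(mc_risk, harmonic)
```

Formally plugging a sequence risk of $\tau^2 H_n$ into the fixed-point equation with $\mu_0 = 0$ gives $r_n^2 = \sigma^2 H_n / (m - H_n)$, which is small once $m \gg \log n$. This is only a heuristic calculation for the zero signal, intended to make the $\log n$ sample requirement plausible.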

The proof relies on a collection of new analytic and probabilistic results concerning the estimation error, log likelihood ratio test statistics and degrees of freedom associated with $\hat{\mu}_K^{\mathsf{seq}}$, regarded as stochastic processes indexed by the noise level. These results are of independent interest.

## Funding Statement

The research of Q. Han is partially supported by NSF Grants DMS-1916221 and DMS-2143468.

## Acknowledgments

The author would like to thank three referees, an Associate Editor and the Editor for a large number of helpful comments and suggestions that significantly improved the quality of the paper.

## Citation

Qiyang Han. "Noisy linear inverse problems under convex constraints: Exact risk asymptotics in high dimensions." Ann. Statist. 51 (4) 1611 - 1638, August 2023. https://doi.org/10.1214/23-AOS2301

## Information