## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 4 (2017), 1403-1430.

### Computational and statistical boundaries for submatrix localization in a large noisy matrix

T. Tony Cai, Tengyuan Liang, and Alexander Rakhlin

#### Abstract

We study computational and statistical boundaries for submatrix *localization*. Given one observation of one or multiple nonoverlapping signal submatrices (each of magnitude $\lambda$ and size $k_{m}\times k_{n}$) embedded in a large noise matrix (of size $m\times n$), the goal is to identify the support of the signal submatrix optimally, both computationally and statistically.

Two transition thresholds for the signal-to-noise ratio $\lambda/\sigma$ are established in terms of $m$, $n$, $k_{m}$ and $k_{n}$. The first threshold, $\sf SNR_{c}$, corresponds to the computational boundary. We introduce a new linear time spectral algorithm that identifies the submatrix with high probability when the signal strength is above the threshold $\sf SNR_{c}$. Below this threshold, it is shown that no polynomial time algorithm can succeed in identifying the submatrix, under the *hidden clique hypothesis*. The second threshold, $\sf SNR_{s}$, captures the statistical boundary, below which no method can succeed in localization with probability going to one in the minimax sense. The exhaustive search method successfully finds the submatrix above this threshold. In marked contrast to submatrix detection and sparse PCA, the results show an interesting phenomenon that $\sf SNR_{c}$ is *always* significantly larger than $\sf SNR_{s}$ under the sub-Gaussian error model, which implies an essential gap between statistical optimality and computational efficiency for submatrix localization.
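To make the setting concrete, the following is a minimal sketch of the localization problem, not the algorithm introduced in the paper. It assumes a single planted submatrix, Gaussian noise, and known $k_{m}$ and $k_{n}$. It uses a generic spectral heuristic: approximate the leading singular vectors by power iteration, then keep the coordinates with the largest magnitudes. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def planted_matrix(m, n, rows, cols, lam, sigma, rng):
    """Return an m x n list-of-lists: N(0, sigma^2) noise plus lam on rows x cols."""
    row_set, col_set = set(rows), set(cols)
    return [
        [rng.gauss(0.0, sigma) + (lam if i in row_set and j in col_set else 0.0)
         for j in range(n)]
        for i in range(m)
    ]


def leading_singular_vectors(a, iters=50):
    """Approximate the top left and right singular vectors by power iteration."""
    m, n = len(a), len(a[0])
    v = [1.0 / math.sqrt(n)] * n  # uniform start overlaps with the planted block
    u = [0.0] * m
    for _ in range(iters):
        # u <- A v, normalized
        u = [sum(a[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        # v <- A^T u, normalized
        v = [sum(a[i][j] * u[i] for i in range(m)) for j in range(n)]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
    return u, v


def top_k_support(vec, k):
    """Indices of the k entries of largest magnitude, in increasing order."""
    order = sorted(range(len(vec)), key=lambda i: abs(vec[i]), reverse=True)
    return sorted(order[:k])


if __name__ == "__main__":
    rng = random.Random(0)
    true_rows = list(range(20))       # assumed support, for illustration
    true_cols = list(range(30, 50))
    a = planted_matrix(100, 100, true_rows, true_cols, lam=3.0, sigma=1.0, rng=rng)
    u, v = leading_singular_vectors(a)
    print("rows recovered:", top_k_support(u, 20) == true_rows)
    print("cols recovered:", top_k_support(v, 20) == true_cols)
```

The example uses a signal-to-noise ratio $\lambda/\sigma = 3$, chosen to be comfortably strong for this matrix size. It illustrates only the easy regime and says nothing about the exact location of $\sf SNR_{c}$ or $\sf SNR_{s}$.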

#### Article information

**Source**

Ann. Statist., Volume 45, Number 4 (2017), 1403-1430.

**Dates**

Received: October 2015

Revised: April 2016

First available in Project Euclid: 28 June 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1498636861

**Digital Object Identifier**

doi:10.1214/16-AOS1488

**Mathematical Reviews number (MathSciNet)**

MR3670183

**Zentralblatt MATH identifier**

06773278

**Subjects**

Primary: 62C20: Minimax procedures

Secondary: 90C27: Combinatorial optimization

**Keywords**

Computational boundary; computational complexity; detection; planted clique; lower bounds; minimax; signal-to-noise ratio; statistical boundary; submatrix localization

#### Citation

Cai, T. Tony; Liang, Tengyuan; Rakhlin, Alexander. Computational and statistical boundaries for submatrix localization in a large noisy matrix. Ann. Statist. 45 (2017), no. 4, 1403-1430. doi:10.1214/16-AOS1488. https://projecteuclid.org/euclid.aos/1498636861

#### Supplemental materials

- Supplement to “Computational and statistical boundaries for submatrix localization in a large noisy matrix”. Due to space constraints, we have relegated remaining proofs to the supplement. Digital Object Identifier: doi:10.1214/16-AOS1488SUPP