Abstract
This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of the cluster (recovery).
Specifically, we identify the sharp boundaries of signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible. We also develop the tight computational thresholds: when the signal-to-noise ratio is below these thresholds, we prove that polynomial-time algorithms cannot solve these problems under the computational hardness conjectures of hypergraphic planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS) recovery. We also propose polynomial-time tensor algorithms that achieve reliable detection and recovery when the signal-to-noise ratio is above these thresholds. Both sparsity and tensor structures yield the computational barriers in high-order tensor clustering. The interplay between them results in significant differences between high-order tensor clustering and matrix clustering in the literature in terms of statistical and computational phase transition diagrams, algorithmic approaches, hardness conjectures, and proof techniques. To the best of our knowledge, we are the first to give a thorough characterization of the statistical and computational trade-off for such a double computational-barrier problem. Finally, we provide evidence for the computational hardness conjectures of HPC detection (via low-degree polynomial and Metropolis methods) and HPDS recovery (via low-degree polynomial method).
Funding Statement
This work was supported in part by NSF Grant CAREER-1944904, NSF Grants DMS-1811868 and DMS-2023239, NIH Grant R01 GM131399, and Wisconsin Alumni Research Foundation (WARF).
Acknowledgment
We would like to thank Guy Bresler for the helpful discussions. We also thank the Editor, Associate Editor, and two anonymous referees for their helpful suggestions, which helped improve the presentation and quality of this paper.
Citation
Yuetian Luo, Anru R. Zhang. "Tensor clustering with planted structures: Statistical optimality and computational limits." Ann. Statist. 50 (1) 584-613, February 2022. https://doi.org/10.1214/21-AOS2123
Information