Testing equivalence of clustering
Chao Gao, Zongming Ma
Ann. Statist. 50(1): 407-429 (February 2022). DOI: 10.1214/21-AOS2113

Abstract

In this paper, we test whether two data sets measured on the same set of subjects share a common clustering structure. As a leading example, we focus on comparing clustering structures in two independent random samples from two deterministic two-component mixtures of multivariate Gaussian distributions. Mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ. Assuming knowledge of mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as ambient dimensions of the data sets grow at a sublinear rate with the sample size.
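The setting described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' procedure: it assumes symmetric mixtures with identity covariance (observations `X_i ~ N(z_i * theta, I)` and `Y_i ~ N(sigma_i * eta, I)`), known mean vectors, and a naive plug-in statistic that classifies each subject by the sign of its projection onto the known mean. The names `simulate` and `disagreement_fraction` are hypothetical and introduced here for illustration.

```python
import numpy as np


def simulate(n, theta, eta, n_flips, rng):
    """Draw two data sets on the same n subjects (illustrative model only).

    Assumption: labels z and sigma take values in {-1, +1}; sigma equals z
    except on n_flips subjects. The labels are drawn once here and then
    treated as fixed.
    """
    z = rng.choice([-1, 1], size=n)
    sigma = z.copy()
    flipped = rng.choice(n, size=n_flips, replace=False)
    sigma[flipped] *= -1
    # Each row is the signed mean vector plus standard Gaussian noise.
    X = z[:, None] * theta + rng.standard_normal((n, theta.size))
    Y = sigma[:, None] * eta + rng.standard_normal((n, eta.size))
    return X, Y, z, sigma


def disagreement_fraction(X, Y, theta, eta):
    """Naive statistic: share of subjects whose estimated labels differ.

    Labels are estimated by the sign of the projection onto the known mean.
    Taking the minimum over a global sign flip makes the statistic invariant
    to swapping the two cluster names in one data set.
    """
    z_hat = np.sign(X @ theta)
    sigma_hat = np.sign(Y @ eta)
    d = np.mean(z_hat != sigma_hat)
    return min(d, 1.0 - d)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = np.full(4, 5.0)  # strong signal, chosen only for illustration
    eta = np.full(3, 5.0)
    for flips in (0, 60):
        X, Y, _, _ = simulate(200, theta, eta, flips, rng)
        print(flips, disagreement_fraction(X, Y, theta, eta))
```

A value near zero is consistent with a shared clustering structure, while a value bounded away from zero suggests the two clusterings differ. How such a statistic should be thresholded, and what happens at low signal-to-noise ratios or with unknown means, is exactly what the paper analyzes and is not captured by this sketch.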

Funding Statement

The first author was supported in part by NSF CAREER award DMS-1847590 and NSF grant CCF-1934931. The second author was supported in part by NSF CAREER award DMS-1352060.

Citation

Chao Gao, Zongming Ma. "Testing equivalence of clustering." Ann. Statist. 50(1): 407-429, February 2022. https://doi.org/10.1214/21-AOS2113

Information

Received: 1 February 2020; Revised: 1 June 2021; Published: February 2022
First available in Project Euclid: 16 February 2022

MathSciNet: MR4382022
zbMATH: 1486.62184
Digital Object Identifier: 10.1214/21-AOS2113

Subjects:
Primary: 62C20, 62H15
Secondary: 62H30

Keywords: discrete structure inference, high-dimensional statistics, minimax testing error, phase transition, sparse mixture

Rights: Copyright © 2022 Institute of Mathematical Statistics
