Abstract
In this paper, we test whether two data sets measured on the same set of subjects share a common clustering structure. As a leading example, we focus on comparing the clustering structures of two independent random samples from two deterministic two-component mixtures of multivariate Gaussian distributions. The mean parameters of these Gaussian distributions are treated as potentially unknown nuisance parameters and are allowed to differ across the two data sets. Assuming knowledge of the mean parameters, we first determine the phase diagram of the testing problem over the entire range of signal-to-noise ratios by providing both lower bounds and tests that achieve them. When the nuisance parameters are unknown, we propose tests that achieve the detection boundary adaptively as long as the ambient dimensions of the data sets grow at a sublinear rate with the sample size.
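The setting can be made concrete with a small simulation. The following Python sketch rests on assumptions that the abstract does not state. Each data set is taken to follow a symmetric two-component model, X_i ~ N(z_i θ, I_p) and Y_i ~ N(σ_i η, I_q), with deterministic labels z, σ ∈ {−1, 1}^n. "Same clustering" is taken to mean σ = z or σ = −z, because cluster names are arbitrary. The function `projection_statistic` is a hypothetical illustration that uses the known means θ and η. It is not the test proposed in the paper, and it makes no claim to attain the detection boundary.

```python
import random


def sample_mixture(labels, mean, rng):
    """Draw one observation per subject from N(label * mean, I).

    Assumption (illustrative): symmetric two-component model in which
    subject i has mean labels[i] * mean and identity covariance.
    """
    return [[lab * m + rng.gauss(0.0, 1.0) for m in mean] for lab in labels]


def projection_statistic(xs, ys, theta, eta):
    """Hypothetical known-means statistic, NOT the paper's test.

    Projects each observation onto its known mean direction and averages
    the products over subjects. Since E[(theta'X_i)(eta'Y_i)] equals
    z_i * sigma_i * ||theta||^2 * ||eta||^2, matching labels (up to a
    global sign flip) push the value toward ||theta||^2 * ||eta||^2,
    while labels that disagree on half the subjects push it toward 0.
    """
    total = 0.0
    for x, y in zip(xs, ys):
        px = sum(a * b for a, b in zip(x, theta))
        py = sum(a * b for a, b in zip(y, eta))
        total += px * py
    return abs(total) / len(xs)


def demo(n=2000, seed=0):
    rng = random.Random(seed)
    theta = [2.0, 0.0, 0.0]  # hypothetical mean for data set X (p = 3)
    eta = [0.0, 2.0]         # hypothetical mean for data set Y (q = 2)
    z = [rng.choice([-1, 1]) for _ in range(n)]
    # Null: Y shares the clustering of X, with the cluster names swapped.
    sigma_null = [-lab for lab in z]
    # Alternative: every other subject is moved to the opposite cluster.
    sigma_alt = [-lab if i % 2 == 0 else lab for i, lab in enumerate(z)]
    xs = sample_mixture(z, theta, rng)
    stat_null = projection_statistic(
        xs, sample_mixture(sigma_null, eta, rng), theta, eta)
    stat_alt = projection_statistic(
        xs, sample_mixture(sigma_alt, eta, rng), theta, eta)
    return stat_null, stat_alt


if __name__ == "__main__":
    print(demo())
```

With these hypothetical means, ||θ||²·||η||² = 16. The null value should therefore land near 16 and the alternative value near 0. The sketch deliberately does not address the harder problems the paper studies, such as weak signal-to-noise ratios or unknown nuisance means.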
Funding Statement
The first author was supported in part by NSF CAREER award DMS-1847590 and NSF grant CCF-1934931. The second author was supported in part by NSF CAREER award DMS-1352060.
Citation
Chao Gao. Zongming Ma. "Testing equivalence of clustering." Ann. Statist. 50 (1) 407–429, February 2022. https://doi.org/10.1214/21-AOS2113