Abstract
When data analysts train a classifier and check if its accuracy is significantly different from chance, they are implicitly performing a two-sample test. We investigate the statistical properties of this flexible approach in the high-dimensional setting. We prove two results that hold for any classifier in any dimension: if its true error remains $\epsilon$-better than chance for some $\epsilon>0$ as $d,n\to\infty$, then (a) the permutation-based test is consistent (has power approaching one), and (b) a computationally efficient test based on a Gaussian approximation of the null distribution is also consistent. To obtain a finer understanding of the rates of consistency, we study a specialized setting of distinguishing Gaussians with mean-difference $\delta$ and common (known or unknown) covariance $\Sigma$, when $d/n\to c\in(0,\infty)$. We study variants of Fisher’s linear discriminant analysis (LDA), such as “naive Bayes”, in a nontrivial regime where $\epsilon\to 0$ (the Bayes classifier has true accuracy approaching 1/2), and contrast their power with that of corresponding variants of Hotelling’s test. Surprisingly, the expressions for their power match exactly in terms of $n$, $d$, $\delta$ and $\Sigma$, and the LDA approach is only worse by a constant factor, achieving an asymptotic relative efficiency (ARE) of $1/\sqrt{\pi}$ for balanced samples. We also extend our results to high-dimensional elliptical distributions with finite kurtosis. Other results of independent interest include minimax lower bounds and the optimality of Hotelling’s test when $d=o(n)$. Simulation results validate our theory, and we present practical takeaway messages along with natural open problems.
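To make the classifier two-sample test concrete, the sketch below pairs an LDA classifier with a label-permutation null, using NumPy and scikit-learn. It is a minimal illustration, not the paper's exact procedure: the sample sizes, mean shift, number of permutations, and the helper names `accuracy_statistic` and `classifier_two_sample_test` are illustrative choices.

```python
# Minimal sketch of a classification-accuracy two-sample test with a permutation null.
# Assumes NumPy and scikit-learn are available; all constants below are illustrative.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def accuracy_statistic(Z, labels, rng):
    """Random train/test split, fit LDA on the training half, return test accuracy."""
    n = len(labels)
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]
    clf = LinearDiscriminantAnalysis()
    clf.fit(Z[train], labels[train])
    return clf.score(Z[test], labels[test])

def classifier_two_sample_test(X, Y, n_permutations=200, seed=0):
    """Permutation p-value for H0: X and Y are drawn from the same distribution."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    labels = np.r_[np.zeros(len(X)), np.ones(len(Y))]
    observed = accuracy_statistic(Z, labels, rng)
    # Under H0 the labels are exchangeable, so permuting them and recomputing
    # the accuracy gives a draw from the null distribution of the statistic.
    null_stats = np.array([
        accuracy_statistic(Z, rng.permutation(labels), rng)
        for _ in range(n_permutations)
    ])
    pval = (1 + np.sum(null_stats >= observed)) / (1 + n_permutations)
    return observed, pval

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, n = 50, 200
    X = rng.normal(size=(n, d))            # P = N(0, I_d)
    Y = rng.normal(size=(n, d)) + 0.3      # Q = N(0.3 * 1, I_d), a mean shift
    acc, pval = classifier_two_sample_test(X, Y)
    print(f"test accuracy = {acc:.3f}, permutation p-value = {pval:.3f}")
```

A test accuracy well above 1/2, paired with a small permutation p-value, is evidence against the null that the two samples share a distribution.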
Citation
Ilmun Kim, Aaditya Ramdas, Aarti Singh, and Larry Wasserman. "Classification accuracy as a proxy for two-sample testing." Ann. Statist. 49(1): 411–434, February 2021. https://doi.org/10.1214/20-AOS1962