Abstract
In this paper, we study distance covariance, Hilbert–Schmidt covariance (also known as the Hilbert–Schmidt independence criterion [In Advances in Neural Information Processing Systems (2008) 585–592]) and related independence tests in the high-dimensional setting. We show that the sample distance/Hilbert–Schmidt covariance between two random vectors can be approximated by the sum of squared componentwise sample cross-covariances up to an asymptotically constant factor, which indicates that the standard distance/Hilbert–Schmidt covariance based test can only capture linear dependence in high dimension. Under the assumption that the components within each high-dimensional vector are weakly dependent, the distance correlation based $t$ test developed by Székely and Rizzo (J. Multivariate Anal. 117 (2013) 193–213) for independence is shown to have trivial limiting power when the two random vectors are nonlinearly dependent but componentwise uncorrelated. This new and surprising phenomenon, which appears to be identified and carefully studied here for the first time, is further confirmed in our simulation study. As a remedy, we propose tests based on an aggregation of marginal sample distance/Hilbert–Schmidt covariances and show that they have superior power compared with their joint counterparts in simulations. We further extend the distance correlation based $t$ test to tests based on Hilbert–Schmidt covariance and marginal distance/Hilbert–Schmidt covariance. A novel unified approach is developed to analyze the studentized sample distance/Hilbert–Schmidt covariance as well as the studentized sample marginal distance covariance under both the null and alternative hypotheses. Our theoretical and simulation results shed light on the limitation of distance/Hilbert–Schmidt covariance when used jointly in the high-dimensional setting and suggest the aggregation of marginal distance/Hilbert–Schmidt covariance as a useful alternative.
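To make the contrast between the joint and the aggregated marginal statistics concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the simple double-centered (V-statistic) form of the squared sample distance covariance, whereas the paper may rely on a different, for example bias-corrected, estimator. It also assumes that "aggregation of marginal covariances" means a plain sum over all pairs of coordinates. The function names `dcov_sq` and `marginal_dcov_sq` are hypothetical.

```python
import math
import random


def pairwise_distances(sample):
    """Euclidean distance matrix for a list of equal-length vectors."""
    return [[math.dist(u, v) for v in sample] for u in sample]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # d is symmetric, so column means coincide with row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def dcov_sq(x, y):
    """Squared sample distance covariance (V-statistic form, an assumption)."""
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2


def marginal_dcov_sq(x, y):
    """Sum of dcov_sq over every (coordinate of x, coordinate of y) pair.

    The plain sum is an illustrative choice of aggregation.
    """
    p, q = len(x[0]), len(y[0])
    total = 0.0
    for k in range(p):
        xk = [[row[k]] for row in x]
        for l in range(q):
            yl = [[row[l]] for row in y]
            total += dcov_sq(xk, yl)
    return total


if __name__ == "__main__":
    random.seed(0)
    n, p = 40, 20
    # Nonlinear dependence with uncorrelated components: y_k = x_k ** 2.
    x = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [[v * v for v in row] for row in x]
    print("joint dcov^2:   ", dcov_sq(x, y))
    print("marginal dcov^2:", marginal_dcov_sq(x, y))
```

The joint statistic uses distances between whole vectors, while the marginal version applies the same statistic one coordinate pair at a time and adds the results. Raw values of the two statistics are on different scales, so a meaningful comparison requires studentization or a permutation test, neither of which is shown here.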
Citation
Changbo Zhu, Xianyang Zhang, Shun Yao, Xiaofeng Shao. "Distance-based and RKHS-based dependence metrics in high dimension." Ann. Statist. 48 (6) 3366–3394, December 2020. https://doi.org/10.1214/19-AOS1934
Information