## The Annals of Statistics

### Equivalence of distance-based and RKHS-based statistics in hypothesis testing

#### Abstract

We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions into reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed a distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as an energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
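The equivalence described above can be checked numerically. The sketch below is a minimal NumPy illustration, not the authors' code, and every function name in it is a hypothetical helper. It assumes Euclidean data and the semimetric $\rho_q(z,z') = \|z-z'\|^q$ with $0 < q \le 2$ (an assumed example of a negative-type family; $q = 1$ gives the usual energy distance). It also assumes the energy distance $D_E(P,Q) = 2\mathbb{E}\rho(Z,W) - \mathbb{E}\rho(Z,Z') - \mathbb{E}\rho(W,W')$, the distance-induced kernel $k(z,z') = \tfrac{1}{2}[\rho(z,z_0) + \rho(z',z_0) - \rho(z,z')]$ for an arbitrary base point $z_0$, and biased V-statistic estimates throughout, with HSIC normalized as $\operatorname{tr}(KHLH)/n^2$. Under these conventions, expanding the kernel gives $D_E = 2\,\mathrm{MMD}^2$ and $\mathrm{dCov}^2 = 4\,\mathrm{HSIC}$, and both identities also hold exactly for the empirical estimates.

```python
import numpy as np


def semimetric(a, b, q=1.0):
    """Pairwise rho_q(z, z') = ||z - z'||^q between rows of a (n, d) and b (m, d).

    Assumption: Euclidean inputs and 0 < q <= 2, so rho_q is of negative type.
    """
    diff = a[:, None, :] - b[None, :, :]
    return np.linalg.norm(diff, axis=-1) ** q


def distance_kernel(a, b, z0, q=1.0):
    """Distance-induced kernel k(z, z') = (rho(z, z0) + rho(z', z0) - rho(z, z')) / 2."""
    ra = semimetric(a, z0[None, :], q)  # (n, 1): rho(z_i, z0)
    rb = semimetric(b, z0[None, :], q)  # (m, 1): rho(z'_j, z0)
    return 0.5 * (ra + rb.T - semimetric(a, b, q))


def energy_distance(x, y, q=1.0):
    """V-statistic of 2 E rho(Z, W) - E rho(Z, Z') - E rho(W, W')."""
    return (2.0 * semimetric(x, y, q).mean()
            - semimetric(x, x, q).mean()
            - semimetric(y, y, q).mean())


def mmd_squared(x, y, z0, q=1.0):
    """Biased (V-statistic) squared MMD with the distance-induced kernel."""
    return (distance_kernel(x, x, z0, q).mean()
            + distance_kernel(y, y, z0, q).mean()
            - 2.0 * distance_kernel(x, y, z0, q).mean())


def double_center(m):
    """Subtract column and row means and add back the grand mean, i.e. H m H."""
    return m - m.mean(axis=0, keepdims=True) - m.mean(axis=1, keepdims=True) + m.mean()


def dcov_squared(x, y, q=1.0):
    """V-statistic squared distance covariance of paired samples x (n, p), y (n, r)."""
    a = double_center(semimetric(x, x, q))
    b = double_center(semimetric(y, y, q))
    return (a * b).mean()


def hsic(x, y, x0, y0, q=1.0):
    """Biased HSIC, tr(K H L H) / n^2, with a distance-induced kernel on each space."""
    n = x.shape[0]
    h = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix H
    k = distance_kernel(x, x, x0, q)
    l = distance_kernel(y, y, y0, q)
    return np.trace(k @ h @ l @ h) / n**2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 2))
    y = rng.normal(loc=0.5, size=(60, 2))
    # Two-sample identity; z0 = 0 is an arbitrary base point.
    print(np.isclose(energy_distance(x, y), 2.0 * mmd_squared(x, y, np.zeros(2))))  # True
    # Independence identity on a nonlinearly dependent pair (x, y_dep).
    y_dep = x[:, :1] ** 2 + 0.1 * rng.normal(size=(50, 1))
    print(np.isclose(dcov_squared(x, y_dep),
                     4.0 * hsic(x, y_dep, np.zeros(2), np.zeros(1))))  # True
```

The base point $z_0$ changes the kernel matrices but cancels in both statistics, which is why any choice works here; the constants 2 and 4 depend on the normalization conventions assumed above.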

#### Article information

Source
Ann. Statist., Volume 41, Number 5 (2013), 2263–2291.

Dates
First available in Project Euclid: 5 November 2013

Permanent link
https://projecteuclid.org/euclid.aos/1383661264

Digital Object Identifier
doi:10.1214/13-AOS1140

Mathematical Reviews number (MathSciNet)
MR3127866

Zentralblatt MATH identifier
1281.62117

#### Citation

Sejdinovic, Dino; Sriperumbudur, Bharath; Gretton, Arthur; Fukumizu, Kenji. Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41 (2013), no. 5, 2263–2291. doi:10.1214/13-AOS1140. https://projecteuclid.org/euclid.aos/1383661264
