The Annals of Statistics

Partial distance correlation with methods for dissimilarities

Gábor J. Székely and Maria L. Rizzo

Full-text: Open access

Abstract

Distance covariance and distance correlation are scalar coefficients that characterize independence of random vectors in arbitrary dimension. Properties, extensions and applications of distance correlation have been discussed in the recent literature, but the problem of defining partial distance correlation has remained an open question of considerable interest. This problem is more complex than that of partial correlation, partly because the squared distance covariance is not an inner product in the usual linear space. To define partial distance correlation, we introduce a new Hilbert space in which the squared distance covariance is the inner product. We define the partial distance correlation statistics with the help of this Hilbert space, and develop and implement a test for zero partial distance correlation. Our intermediate results provide an unbiased estimator of squared distance covariance, and a neat solution to the problem of distance correlation for dissimilarities rather than distances.
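The constructions named in the abstract can be sketched in a few lines. This is a minimal, dependency-free Python sketch, assuming the definitions as we read them from the paper: a symmetric, zero-diagonal n×n matrix A (n > 3) is U-centered as Ã_ij = a_ij − a_i·/(n−2) − a_·j/(n−2) + a_··/((n−1)(n−2)) for i ≠ j, with Ã_ii = 0. The inner product (Ã·B̃) = (1/(n(n−3))) Σ_{i≠j} Ã_ij B̃_ij then serves as the unbiased estimate of squared distance covariance, and partial distance correlation is taken as the cosine between the projections of Ã and B̃ onto the orthogonal complement of C̃. All function names (distance_matrix, u_center, inner, cosine, project_out, pdcor) are hypothetical and are not the API of the energy or pdcor R packages [24, 25]. Because the functions accept any symmetric zero-diagonal matrix, a dissimilarity matrix can be passed in place of a distance matrix; the sketch does not check metric properties, and it does not implement the paper's test for zero partial distance correlation.

```python
import math
import random


def distance_matrix(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def u_center(a):
    """U-center a symmetric, zero-diagonal n x n matrix (requires n > 3).

    Off-diagonal: a_ij - row_i/(n-2) - col_j/(n-2) + total/((n-1)(n-2));
    the diagonal is set to 0, so every row of the result sums to 0.
    """
    n = len(a)
    if n <= 3:
        raise ValueError("U-centering needs n > 3")
    row = [sum(r) for r in a]
    col = [sum(a[i][j] for i in range(n)) for j in range(n)]
    total = sum(row)
    return [
        [
            0.0 if i == j else (
                a[i][j] - row[i] / (n - 2) - col[j] / (n - 2)
                + total / ((n - 1) * (n - 2))
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def inner(a_tilde, b_tilde):
    """(A~ . B~) = sum_{i != j} A~_ij * B~_ij / (n(n-3)).

    For U-centered distance matrices this is the unbiased estimate of
    the squared distance covariance (as we read the paper)."""
    n = len(a_tilde)
    s = sum(a_tilde[i][j] * b_tilde[i][j]
            for i in range(n) for j in range(n) if i != j)
    return s / (n * (n - 3))


def cosine(a_tilde, b_tilde):
    """Cosine of the angle between two U-centered matrices under inner();
    applied to U-centered distance matrices this gives the bias-corrected
    distance correlation. Returns 0 if either norm is 0."""
    denom = math.sqrt(inner(a_tilde, a_tilde) * inner(b_tilde, b_tilde))
    return 0.0 if denom == 0 else inner(a_tilde, b_tilde) / denom


def project_out(a_tilde, c_tilde):
    """Orthogonal projection of A~ onto the complement of C~ under inner()."""
    cc = inner(c_tilde, c_tilde)
    coef = 0.0 if cc == 0 else inner(a_tilde, c_tilde) / cc
    n = len(a_tilde)
    return [[a_tilde[i][j] - coef * c_tilde[i][j] for j in range(n)]
            for i in range(n)]


def pdcor(a, b, c):
    """Partial distance correlation of x and y given z, computed from their
    distance (or dissimilarity) matrices a, b, c."""
    at, bt, ct = u_center(a), u_center(b), u_center(c)
    return cosine(project_out(at, ct), project_out(bt, ct))


if __name__ == "__main__":
    # Hypothetical toy data: x and y both depend on z plus independent noise.
    random.seed(0)
    z = [(random.gauss(0, 1),) for _ in range(50)]
    x = [(v[0] + random.gauss(0, 0.5),) for v in z]
    y = [(v[0] + random.gauss(0, 0.5),) for v in z]
    A, B, C = distance_matrix(x), distance_matrix(y), distance_matrix(z)
    print("bias-corrected dCor(x, y):", cosine(u_center(A), u_center(B)))
    print("pdCor(x, y; z):", pdcor(A, B, C))
```

Because the projection is taken in a genuine inner product space, the projection-based value agrees algebraically with the familiar partial-correlation form (R_xy − R_xz R_yz) / √((1 − R_xz²)(1 − R_yz²)) built from the pairwise cosines. The nested lists keep the sketch self-contained; for large n a vectorized array implementation would be the practical choice.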

Article information

Source
Ann. Statist., Volume 42, Number 6 (2014), 2382-2412.

Dates
First available in Project Euclid: 20 October 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1413810731

Digital Object Identifier
doi:10.1214/14-AOS1255

Mathematical Reviews number (MathSciNet)
MR3269983

Zentralblatt MATH identifier
1309.62105

Subjects
Primary: 62Hxx: Multivariate analysis [See also 60Exx]; 62H20: Measures of association (correlation, canonical correlation, etc.); 62H15: Hypothesis testing
Secondary: 62Gxx: Nonparametric inference

Keywords
Independence, multivariate, partial distance correlation, dissimilarity, energy statistics

Citation

Székely, Gábor J.; Rizzo, Maria L. Partial distance correlation with methods for dissimilarities. Ann. Statist. 42 (2014), no. 6, 2382–2412. doi:10.1214/14-AOS1255. https://projecteuclid.org/euclid.aos/1413810731


References

  • [1] Baba, K., Shibata, R. and Sibuya, M. (2004). Partial correlation and conditional correlation as measures of conditional independence. Aust. N.Z. J. Stat. 46 657–664.
  • [2] Cailliez, F. (1983). The analytical solution of the additive constant problem. Psychometrika 48 305–308.
  • [3] Cox, T. F. and Cox, M. A. A. (2001). Multidimensional Scaling, 2nd ed. Chapman & Hall, London.
  • [4] Dueck, J., Edelmann, D., Gneiting, T. and Richards, D. (2012). The affinely invariant distance correlation. Preprint. Available at arXiv:1210.2482.
  • [5] Feuerverger, A. (1993). A consistent test for bivariate dependence. Int. Stat. Rev. 61 419–433.
  • [6] Goslee, S. C. and Urban, D. L. (2007). The ecodist package for dissimilarity-based analysis of ecological data. J. Stat. Softw. 22 1–19.
  • [7] Gower, J. C. (1966). Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53 325–338.
  • [8] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning, 2nd ed. Springer, New York.
  • [9] Huber, J. (1981). Partial and semipartial correlation—A vector approach. Two-Year Coll. Math. J. 12 151–153.
  • [10] Josse, J. and Holmes, S. (2013). Measures of dependence between random vectors and tests of independence. Literature review. Available at arXiv:1307.7383.
  • [11] Kim, S. (2012). ppcor: Partial and semi-partial (part) correlation. R package version 1.0. Available at http://CRAN.R-project.org/package=ppcor.
  • [12] Kong, J., Klein, B. E. K., Klein, R., Lee, K. and Wahba, G. (2012). Using distance correlation and SS-ANOVA to assess associations of familial relationships, lifestyle factors, diseases, and mortality. Proc. Natl. Acad. Sci. 109 20352–20357.
  • [13] Legendre, P. (2000). Comparison of permutation methods for the partial correlation and partial Mantel tests. J. Stat. Comput. Simul. 67 37–73.
  • [14] Legendre, P. and Legendre, L. (2012). Numerical Ecology, 3rd English ed. Elsevier, Amsterdam.
  • [15] Li, R., Zhong, W. and Zhu, L. (2012). Feature screening via distance correlation learning. J. Amer. Statist. Assoc. 107 1129–1139.
  • [16] Lyons, R. (2013). Distance covariance in metric spaces. Ann. Probab. 41 3284–3305.
  • [17] Mantel, N. (1967). The detection of disease clustering and a generalized regression approach. Cancer Res. 27 209–220.
  • [18] Mardia, K. V. (1978). Some properties of classical multi-dimensional scaling. Comm. Statist. Theory Methods 7 1233–1241.
  • [19] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.
  • [20] Oksanen, J., Guillaume Blanchet, F., Kindt, R., Legendre, P., Minchin, P. R., O’Hara, R. B., Simpson, G. L., Solymos, P., Stevens, M. H. H. and Wagner, H. (2013). vegan: Community ecology package. R package version 2.0-7. Available at http://CRAN.R-project.org/package=vegan.
  • [21] Piepho, H. P. (2005). Permutation tests for the correlation among genetic distances and measures of heterosis. Theor. Appl. Genet. 111 95–99.
  • [22] R Core Team (2013). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at http://www.R-project.org/.
  • [23] Reif, J. C., Melchinger, A. E., Xia, X. C., Warburton, M. L., Hoisington, D. A., Vasal, S. K., Srinivasan, G., Bohn, M. and Frisch, M. (2003). Genetic distance based on simple sequence repeats and heterosis in tropical maize populations. Crop Sci. 43 1275–1282.
  • [24] Rizzo, M. L. and Székely, G. J. (2013). pdcor: Partial distance correlation. R package version 1.0.0.
  • [25] Rizzo, M. L. and Székely, G. J. (2014). energy: E-statistics (energy statistics). R package version 1.6.1. Available at http://CRAN.R-project.org/package=energy.
  • [26] Schoenberg, I. J. (1935). Remarks to Maurice Fréchet’s article “Sur la définition axiomatique d’une classe d’espace distanciés vectoriellement applicable sur l’espace de Hilbert.” Ann. of Math. (2) 36 724–732.
  • [27] Sejdinovic, D., Sriperumbudur, B., Gretton, A. and Fukumizu, K. (2013). Equivalence of distance-based and RKHS-based statistics in hypothesis testing. Ann. Statist. 41 2263–2291.
  • [28] Smouse, P. E., Long, J. C. and Sokal, R. R. (1986). Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst. Zool. 35 627–632.
  • [29] Stamey, T. A., Kabalin, J. N., McNeal, J. E., Johnstone, I. M., Freiha, F., Redwine, E. A. and Yang, N. (1989). Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate. II. Radical prostatectomy treated patients. J. Urol. 141 1076–1083.
  • [30] Székely, G. J. and Rizzo, M. L. (2009). Brownian distance covariance. Ann. Appl. Stat. 3 1236–1265.
  • [31] Székely, G. J. and Rizzo, M. L. (2012). On the uniqueness of distance covariance. Statist. Probab. Lett. 82 2278–2282.
  • [32] Székely, G. J. and Rizzo, M. L. (2013). The distance correlation t-test of independence in high dimension. J. Multivariate Anal. 117 193–213.
  • [33] Székely, G. J. and Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist. Plann. Inference 143 1249–1272.
  • [34] Székely, G. J., Rizzo, M. L. and Bakirov, N. K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769–2794.
  • [35] Torgerson, W. S. (1958). Theory and Methods of Scaling. Wiley, New York.
  • [36] Wermuth, N. and Cox, D. R. (2013). Concepts and a case study for a flexible class of graphical Markov models. In Robustness and Complex Data Structures 331–350. Springer, Heidelberg.
  • [37] Young, G. and Householder, A. S. (1938). Discussion of a set of points in terms of their mutual distances. Psychometrika 3 19–22.