The Annals of Applied Statistics

Brownian distance covariance

Gábor J. Székely and Maria L. Rizzo

Full-text: Open access

Abstract

Distance correlation is a new class of multivariate dependence coefficients applicable to random vectors of arbitrary and not necessarily equal dimension. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but they generalize and extend these classical bivariate measures of dependence. Distance correlation characterizes independence: it is zero if and only if the random vectors are independent. The notion of covariance with respect to a stochastic process is introduced, and it is shown that population distance covariance coincides with the covariance with respect to Brownian motion; thus, both can be called Brownian distance covariance. In the bivariate case, Brownian covariance is the natural extension of product-moment covariance: Pearson product-moment covariance is obtained by replacing the Brownian motion in the definition with the identity function. The corresponding statistic has an elegantly simple computing formula. The advantages of Brownian covariance and correlation over the classical Pearson covariance and correlation are discussed and illustrated.
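The "elegantly simple computing formula" can be illustrated with a minimal sketch. It assumes the V-statistic form given by Székely, Rizzo and Bakirov (2007) and Euclidean norms. Let a_kl = |X_k - X_l| and b_kl = |Y_k - Y_l|. Double-center each distance matrix as A_kl = a_kl - ā_k. - ā_.l + ā_.., and form B_kl in the same way. Then V_n^2(X, Y) = (1/n^2) Σ_{k,l} A_kl B_kl, and R_n^2 = V_n^2(X, Y) / sqrt(V_n^2(X) V_n^2(Y)), taken as 0 when the denominator is 0. The function names dcov2 and dcor are hypothetical. This is a plain-Python illustration, not the implementation in the authors' energy package for R.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]  # one observation; wrap scalars as 1-tuples


def _double_centered(points: List[Vector]) -> List[List[float]]:
    """Distance matrix a_kl = |x_k - x_l| (Euclidean), double-centered:
    A_kl = a_kl - rowmean_k - colmean_l + grandmean."""
    n = len(points)
    a = [[math.dist(points[k], points[l]) for l in range(n)] for k in range(n)]
    row = [sum(r) / n for r in a]
    # The distance matrix is symmetric, so column means equal row means.
    grand = sum(row) / n
    return [[a[k][l] - row[k] - row[l] + grand for l in range(n)]
            for k in range(n)]


def dcov2(x: List[Vector], y: List[Vector]) -> float:
    """Squared sample distance covariance V_n^2(X, Y) = (1/n^2) sum A_kl B_kl."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of observations")
    n = len(x)
    A, B = _double_centered(x), _double_centered(y)
    return sum(A[k][l] * B[k][l] for k in range(n) for l in range(n)) / n**2


def dcor(x: List[Vector], y: List[Vector]) -> float:
    """Sample distance correlation R_n(X, Y), a value in [0, 1]."""
    vx, vy = dcov2(x, x), dcov2(y, y)
    if vx * vy <= 0.0:
        return 0.0  # convention when either distance variance is zero
    r2 = dcov2(x, y) / math.sqrt(vx * vy)
    return math.sqrt(max(r2, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ys = [v * v for v in xs]  # Y = X^2: dependent on X, yet uncorrelated
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    pearson_cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    X, Y = [(v,) for v in xs], [(v,) for v in ys]
    print(pearson_cov)  # 0.0: Pearson covariance vanishes on this sample
    print(dcor(X, Y))   # strictly positive distance correlation
```

On the symmetric sample above, Y = X^2 depends on X, but its Pearson covariance is exactly zero. The sketch still returns a strictly positive distance correlation. For any linear map Y = cX + d with c ≠ 0, it returns 1. Storing both full n × n distance matrices costs O(n^2) memory. That is fine for illustration but not intended for large samples.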

Article information

Source
Ann. Appl. Stat. Volume 3, Number 4 (2009), 1236–1265.

Dates
First available in Project Euclid: 1 March 2010

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1267453933

Digital Object Identifier
doi:10.1214/09-AOAS312

Mathematical Reviews number (MathSciNet)
MR2752127

Citation

Székely, Gábor J.; Rizzo, Maria L. Brownian distance covariance. The Annals of Applied Statistics 3 (2009), no. 4, 1236–1265. doi:10.1214/09-AOAS312. http://projecteuclid.org/euclid.aoas/1267453933.
