The Annals of Statistics

On the multivariate runs test

Norbert Henze and Mathew D. Penrose


For independent $d$-variate random variables $X_1,\dots,X_m$ with common density $f$ and $Y_1,\dots,Y_n$ with common density $g$, let $R_{m,n}$ be the number of edges in the minimal spanning tree with vertices $X_1,\dots,X_m$, $Y_1,\dots,Y_n$ that connect points from different samples. Friedman and Rafsky conjectured that a test of $H_0: f = g$ that rejects $H_0$ for small values of $R_{m,n}$ should have power against general alternatives. We prove that $R_{m,n}$ is asymptotically distribution-free under $H_0$, and that the multivariate two-sample test based on $R_{m,n}$ is universally consistent.
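
To make the definition of $R_{m,n}$ concrete, the following is a minimal sketch of how the statistic could be computed for two small samples. It is not taken from the paper: the function name `cross_edge_count`, the use of Euclidean distance, and the choice of Prim's algorithm to build the minimal spanning tree are all assumptions made for illustration.

```python
import math


def cross_edge_count(xs, ys):
    """Count minimal-spanning-tree edges that join a point of xs to a point of ys.

    Hypothetical helper: builds the Euclidean MST of the pooled sample with
    Prim's algorithm (O(N^2) time) and counts edges whose endpoints carry
    different sample labels.
    """
    points = list(xs) + list(ys)
    labels = [0] * len(xs) + [1] * len(ys)  # 0 = first sample, 1 = second sample
    n = len(points)
    if n < 2:
        return 0

    in_tree = [False] * n
    best_dist = [math.inf] * n  # distance from each vertex to the current tree
    best_from = [-1] * n        # tree vertex that realises that distance

    # Start the tree at vertex 0.
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0

    cross = 0
    for _ in range(n - 1):
        # Attach the vertex outside the tree that is closest to the tree.
        k = min((j for j in range(n) if not in_tree[j]),
                key=best_dist.__getitem__)
        in_tree[k] = True
        if labels[k] != labels[best_from[k]]:
            cross += 1
        # Update distances to the enlarged tree.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return cross


# Two well-separated clusters: only one tree edge bridges the samples.
separated = cross_edge_count([(0, 0), (0, 1), (1, 0)],
                             [(10, 10), (10, 11), (11, 10)])

# Perfectly interleaved points on a line: every tree edge bridges the samples.
interleaved = cross_edge_count([(0,), (2,), (4,)], [(1,), (3,), (5,)])
```

In the two toy inputs, the separated clusters give a small count and the interleaved points give the largest possible count, which matches the idea of rejecting $H_0$ for small values of $R_{m,n}$. Turning the count into a test requires a null distribution, which this sketch does not attempt to supply.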

Article information

Ann. Statist., Volume 27, Number 1 (1999), 290-298.

First available in Project Euclid: 5 April 2002

Primary: 62H15: Hypothesis testing
Secondary: 62G10: Hypothesis testing; 60F05: Central limit and other weak theorems; 60F15: Strong theorems

Keywords: Multivariate two-sample problem; minimal spanning tree; multivariate runs test; homogeneous Poisson process


Henze, Norbert; Penrose, Mathew D. On the multivariate runs test. Ann. Statist. 27 (1999), no. 1, 290--298. doi:10.1214/aos/1018031112.


  • [1] Aldous, D. (1990). A random tree model associated with random graphs. Random Structures Algorithms 1 383-401.
  • [2] Aldous, D. and Steele, J. M. (1992). Asymptotics for Euclidean minimal spanning trees on random points. Probab. Theory Related Fields 92 247-258.
  • [3] Anderson, N. H., Hall, P. and Titterington, D. M. (1994). Two-sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel-based density estimates. J. Multivariate Anal. 50 41-54.
  • [4] Bahr, R. (1996). A new test for the multivariate two-sample problem with general alternatives (in German). Doctoral thesis, Univ. Hannover.
  • [5] Bloemena, A. R. (1964). Sampling from a graph. Mathematical Centre Tracts 2. Math. Centrum, Amsterdam.
  • [6] Einmahl, J. H. J. and Khmaladze, E. V. (1998). The two-sample problem in $\mathbb{R}^m$ and measure-valued martingales. Report S98-2, Dept. Statistics, Univ. New South Wales, Sydney.
  • [7] Ferger, D. (1997). Optimal tests for the general two-sample problem. Dresdener Schriften zur Mathematischen Stochastik Technische Univ. Dresden.
  • [8] Friedman, J. H. and Rafsky, L. C. (1979). Multivariate generalizations of the Wolfowitz and Smirnov two-sample tests. Ann. Statist. 7 697-717.
  • [9] Györfi, L. and Nemetz, T. (1975). f-dissimilarity: A general class of separation measures of several probability measures. In Topics in Information Theory. Colloq. Math. Soc. János Bolyai 16 309-321.
  • [10] Györfi, L. and Nemetz, T. (1977). On the dissimilarity of probability measures. Problems Control Inform. Theory 6 263-267.
  • [11] Györfi, L. and Nemetz, T. (1978). f-dissimilarity: A generalization of affinity of several distributions. Ann. Inst. Statist. Math. 30 105-113.
  • [12] Henze, N. (1986). On the probability that a random point is the jth nearest neighbour to its own kth nearest neighbour. J. Appl. Probab. 23 221-226.
  • [13] Henze, N. (1988). A multivariate two-sample test based on the number of nearest-neighbor type coincidences. Ann. Statist. 16 772-783.
  • [14] Henze, N. and Voigt, B. (1992). Almost sure convergence of certain slowly changing symmetric one- and multi-sample statistics. Ann. Probab. 20 1086-1098.
  • [15] Kingman, J. F. C. (1993). Poisson Processes. Oxford Univ. Press.
  • [16] Lee, S. (1997). The central limit theorem for Euclidean minimal spanning trees I. Ann. Appl. Probab. 7 996-1020.
  • [17] Penrose, M. D. (1996). The random minimal spanning tree in high dimensions. Ann. Probab. 24 1903-1925.
  • [18] Pickard, D. K. (1982). Isolated nearest neighbours. J. Appl. Probab. 19 444-449.
  • [19] Resnick, S. I. (1987). Extreme Values, Regular Variation, and Point Processes. Springer, New York.
  • [20] Rudin, W. (1987). Real and Complex Analysis, 3rd ed. McGraw-Hill, New York.
  • [21] Schilling, M. F. (1986). Multivariate two-sample tests based on nearest neighbors. J. Amer. Statist. Assoc. 81 799-806.
  • [22] Steele, J. M., Shepp, L. A. and Eddy, W. F. (1987). On the number of leaves of a Euclidean minimal spanning tree. J. Appl. Probab. 24 809-826.