Source: Ann. Statist. Volume 36, Number 3 (2008), 1299–1323.
This paper introduces and studies multivariate spacings. The spacings are developed using the order statistics derived from data depth. Specifically, the spacing between two consecutive order statistics is the region that bridges them: it contains all the points whose depth values fall between the depth values of the two consecutive order statistics. These multivariate spacings can be viewed as a data-driven realization of the so-called “statistically equivalent blocks.” They take the form of center-outward layers of “shells” (“rings” in the two-dimensional case), whose shapes closely follow the underlying probabilistic geometry. The properties and applications of these spacings are studied. In particular, the spacings are used to construct tolerance regions. The construction is nonparametric and completely data-driven, and the resulting tolerance region reflects the true geometry of the underlying distribution. This differs from most existing approaches, which require that the shape of the tolerance region be specified in advance. The proposed tolerance regions are shown to meet the prescribed specifications in terms of β-content and β-expectation. They are also asymptotically minimal under elliptical distributions. Finally, a simulation and comparison study of the proposed tolerance regions is presented.
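The depth-based spacings and the tolerance region built from them can be illustrated with a minimal sketch. Several things here are assumptions for illustration and are not taken from the paper: the abstract does not name a depth function, so the sample Mahalanobis depth is used; the function names are hypothetical; and the rank r = ⌈β(n+1)⌉ is the classical choice suggested by statistically equivalent blocks, which may differ from the paper's exact rule.

```python
import numpy as np

def mahalanobis_depth(points, sample):
    """Sample Mahalanobis depth 1 / (1 + d^2) of each row of `points`.
    (Assumed depth; any other depth function could be swapped in.)"""
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = np.atleast_2d(points) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return 1.0 / (1.0 + d2)

def spacing_index(points, sample):
    """For each point, count the sample points that are strictly deeper.
    A count of k with 1 <= k <= n-1 places the point in the shell between
    the k-th and (k+1)-th depth order statistics; 0 is the innermost
    region, n is outside every sample point."""
    sample_depths = mahalanobis_depth(sample, sample)
    point_depths = mahalanobis_depth(points, sample)
    return (sample_depths[None, :] > point_depths[:, None]).sum(axis=1)

def tolerance_rank(n, beta):
    """Hypothetical rank rule: r = ceil(beta * (n + 1)), capped at n."""
    return min(n, int(np.ceil(beta * (n + 1))))

def in_tolerance_region(points, sample, beta):
    """True where a point is at least as deep as the r-th deepest sample
    point, i.e. it lies in the union of the r innermost spacings."""
    r = tolerance_rank(len(sample), beta)
    return spacing_index(points, sample) < r

# Usage: a correlated bivariate normal sample (synthetic data).
rng = np.random.default_rng(0)
sample = rng.multivariate_normal([0.0, 0.0], [[2.0, 1.0], [1.0, 1.0]], size=199)
inside = in_tolerance_region(sample, sample, beta=0.9)
```

Because the region is a depth level set, its shape is inherited from the depth contours rather than fixed in advance; with Mahalanobis depth those contours are ellipses, which is why this particular sketch suits elliptical data only.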