The Annals of Statistics

Multivariate spacings based on data depth: I. Construction of nonparametric multivariate tolerance regions

Jun Li and Regina Y. Liu

Source: Ann. Statist. Volume 36, Number 3 (2008), 1299-1323.

Abstract

This paper introduces and studies multivariate spacings. The spacings are built from the order statistics induced by data depth. Specifically, the spacing between two consecutive order statistics is the region that bridges them, in the sense that it contains all points whose depth values fall between the depth values of the two consecutive order statistics. These multivariate spacings can be viewed as a data-driven realization of the so-called “statistically equivalent blocks.” They take the form of center-outward layers of “shells” (“rings” in the two-dimensional case), whose shapes closely follow the underlying probabilistic geometry. The properties and applications of these spacings are studied. In particular, the spacings are used to construct tolerance regions. The construction is nonparametric and completely data driven, and the resulting tolerance region reflects the true geometry of the underlying distribution. This differs from most existing approaches, which require the shape of the tolerance region to be specified in advance. The proposed tolerance regions are shown to meet the prescribed specifications in terms of β-content and β-expectation. They are also asymptotically minimal under elliptical distributions. Finally, a simulation and comparison study of the proposed tolerance regions is presented.
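The construction in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it makes several assumptions. It assumes iid two-dimensional data and uses Mahalanobis depth [20] with the sample mean and covariance as the depth function D; the paper's own choice of depth and its finite-sample arguments are not reproduced. It also treats D as if it were fixed rather than estimated. The helper names `make_mahalanobis_depth`, `shell_index`, `beta_expectation_k` and `beta_content_k` are hypothetical. Sorting the sample depths gives D_(1) ≤ … ≤ D_(n), and these cut the space into n + 1 center-outward shells. Dropping the k outermost shells leaves the region {x : D(x) ≥ D_(k)}. Under the idealization, the values D(X_i) are iid and continuous, so the content of this region is distributed like 1 − U_(k), where U_(k) is the k-th order statistic of n uniforms. This gives an expected content of (n + 1 − k)/(n + 1), which is the classical block-removal result of Tukey [24]. It also gives P(content ≥ β) = P(Bin(n, 1 − β) ≥ k), in the spirit of Wilks [29].

```python
import bisect
import math
import random


def make_mahalanobis_depth(sample):
    """Return x -> 1 / (1 + (x - m)^T S^{-1} (x - m)) for 2-D points.

    m and S are the sample mean and covariance. This depth is an
    illustrative assumption; another affine-invariant depth (e.g. halfspace
    or simplicial depth) could be substituted.
    """
    n = len(sample)
    mx = sum(p[0] for p in sample) / n
    my = sum(p[1] for p in sample) / n
    sxx = sum((p[0] - mx) ** 2 for p in sample) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in sample) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in sample) / (n - 1)
    det = sxx * syy - sxy * sxy  # assumed nonzero (non-degenerate data)

    def depth(x):
        dx, dy = x[0] - mx, x[1] - my
        # Quadratic form using the explicit inverse of the 2x2 covariance.
        q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        return 1.0 / (1.0 + q)

    return depth


def shell_index(x, sorted_depths, depth):
    """Number of sample depths <= D(x).

    With depth order statistics D_(1) <= ... <= D_(n), index j in 0..n
    identifies the shell {y : D_(j) <= D(y) < D_(j+1)}: 0 is the outermost
    block and n the innermost, so there are n + 1 blocks in total.
    """
    return bisect.bisect_right(sorted_depths, depth(x))


def beta_expectation_k(n, beta):
    """Largest k with expected content (n + 1 - k) / (n + 1) >= beta."""
    # round() guards against float error, e.g. 100 * (1 - 0.9) = 9.999...
    return math.floor(round((n + 1) * (1 - beta), 9))


def beta_content_k(n, beta, gamma):
    """Largest k with P(content >= beta) = P(Bin(n, 1 - beta) >= k) >= gamma."""
    p = 1.0 - beta

    def upper_tail(k):
        # P(Bin(n, p) >= k); decreasing in k, and equal to 1 at k = 0.
        return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
                   for j in range(k, n + 1))

    k = 0
    while k < n and upper_tail(k + 1) >= gamma:
        k += 1
    return k  # k = 0 means the sample is too small: the region is everything


if __name__ == "__main__":
    rng = random.Random(0)

    def draw():
        # Hypothetical data: a correlated bivariate normal.
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        return (z1, 0.8 * z1 + 0.6 * z2)

    sample = [draw() for _ in range(199)]
    depth = make_mahalanobis_depth(sample)
    sorted_depths = sorted(depth(p) for p in sample)

    k = beta_expectation_k(len(sample), 0.90)  # 20 outer blocks removed
    new_points = [draw() for _ in range(5000)]
    inside = sum(shell_index(x, sorted_depths, depth) >= k for x in new_points)
    print(k, inside / len(new_points))
```

As a check on the β-content rule, consider n = 59 and β = γ = 0.95. In that case `beta_content_k` returns k = 1. For n = 58 it returns 0. This matches the classical result that 59 observations are needed before a single outer block can be removed at these levels. The shape of the resulting region is never fixed in advance. It is whatever contour of the chosen depth passes through D_(k).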

Primary Subjects: 62G15, 62G30
Secondary Subjects: 62G20, 62H05
Keywords: Data depth; depth order statistics; multivariate spacings; statistically equivalent blocks; tolerance region

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1211819565
Digital Object Identifier: doi:10.1214/07-AOS505
Mathematical Reviews number (MathSciNet): MR2418658
Zentralblatt MATH identifier: 05294974

References

[1] Barber, C. B., Dobkin, D. P. and Huhdanpaa, H. (1996). The Quickhull algorithm for convex hulls. ACM Trans. Math. Software 22 469–483.
Mathematical Reviews (MathSciNet): MR1428265
Digital Object Identifier: doi:10.1145/235815.235821
[2] Beirlant, J., Dierckx, G., Guillou, A. and Stărică, C. (2002). On exponential representations of log-spacings of extreme order statistics. Extremes 5 157–180.
Mathematical Reviews (MathSciNet): MR1965977
Digital Object Identifier: doi:10.1023/A:1022171205129
[3] Chatterjee, S. K. and Patra, N. K. (1980). Asymptotically minimal multivariate tolerance sets. Calcutta Statist. Assoc. Bull. 29 73–93.
Mathematical Reviews (MathSciNet): MR596720
[4] Cressie, N. (1979). An optimal statistic based on higher order gaps. Biometrika 66 619–627.
Mathematical Reviews (MathSciNet): MR556744
Zentralblatt MATH: 0455.62036
Digital Object Identifier: doi:10.1093/biomet/66.3.619
[5] Darling, D. (1953). On a class of problems related to the random division of an interval. Ann. Math. Statist. 24 239–253.
Mathematical Reviews (MathSciNet): MR58891
Digital Object Identifier: doi:10.1214/aoms/1177729030
Project Euclid: euclid.aoms/1177729030
[6] Di Bucchianico, A., Einmahl, J. H. J. and Mushkudiani, N. A. (2001). Smallest nonparametric tolerance regions. Ann. Statist. 29 1320–1343.
Mathematical Reviews (MathSciNet): MR1873333
Digital Object Identifier: doi:10.1214/aos/1013203456
Project Euclid: euclid.aos/1013203456
[7] Donoho, D. (1982). Breakdown properties of multivariate location estimators. Ph.D. qualifying paper, Harvard Univ.
[8] Donoho, D. and Gasko, M. (1992). Breakdown properties of location estimates based on half-space depth and projected outlyingness. Ann. Statist. 20 1803–1827.
Mathematical Reviews (MathSciNet): MR1193313
Digital Object Identifier: doi:10.1214/aos/1176348890
Project Euclid: euclid.aos/1176348890
[9] Einmahl, J. H. J. and van Zuijlen, M. (1988). Strong bounds for weighted empirical distribution functions based on uniform spacings. Ann. Probab. 16 108–125.
Mathematical Reviews (MathSciNet): MR920258
Digital Object Identifier: doi:10.1214/aop/1176991888
Project Euclid: euclid.aop/1176991888
[10] Fraser, D. (1951). Sequentially determined statistically equivalent blocks. Ann. Math. Statist. 22 372–381.
Mathematical Reviews (MathSciNet): MR43425
Digital Object Identifier: doi:10.1214/aoms/1177729583
Project Euclid: euclid.aoms/1177729583
[11] Guttman, I. (1970). Statistical Tolerance Regions: Classical and Bayesian. Charles Griffin, London.
Mathematical Reviews (MathSciNet): MR317473
Zentralblatt MATH: 0231.62052
[12] Hall, P. (1986). On powerful distributional tests based on sample spacings. J. Multivariate Anal. 19 201–224.
Mathematical Reviews (MathSciNet): MR853053
Digital Object Identifier: doi:10.1016/0047-259X(86)90027-8
[13] He, X. and Wang, G. (1997). Convergence of depth contours for multivariate datasets. Ann. Statist. 25 495–504.
Mathematical Reviews (MathSciNet): MR1439311
Digital Object Identifier: doi:10.1214/aos/1031833661
Project Euclid: euclid.aos/1031833661
[14] Hodges, J. (1955). A bivariate sign test. Ann. Math. Statist. 26 523–527.
Mathematical Reviews (MathSciNet): MR70921
Digital Object Identifier: doi:10.1214/aoms/1177728498
Project Euclid: euclid.aoms/1177728498
[15] Howe, W. G. (1969). Two-sided tolerance limits for normal populations—some improvements. J. Amer. Statist. Assoc. 64 610–620.
[16] Liu, R. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18 405–414.
Mathematical Reviews (MathSciNet): MR1041400
Digital Object Identifier: doi:10.1214/aos/1176347507
Project Euclid: euclid.aos/1176347507
[17] Liu, R., Parelius, J. and Singh, K. (1999). Multivariate analysis by data depth: Descriptive statistics, graphics and inference (with discussion). Ann. Statist. 27 783–858.
Mathematical Reviews (MathSciNet): MR1724033
Project Euclid: euclid.aos/1018031260
[18] Liu, R. and Singh, K. (1992). Ordering directional data: Concepts of data depth on circles and spheres. Ann. Statist. 20 1468–1484.
Mathematical Reviews (MathSciNet): MR1186260
Digital Object Identifier: doi:10.1214/aos/1176348779
Project Euclid: euclid.aos/1176348779
[19] Liu, R. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests. J. Amer. Statist. Assoc. 88 252–260.
Mathematical Reviews (MathSciNet): MR1212489
Digital Object Identifier: doi:10.2307/2290720
[20] Mahalanobis, P. C. (1936). On the generalized distance in statistics. Proc. Nat. Inst. Sci. India 2 49–55.
[21] Moran, P. (1947). A random division of an interval. J. Roy. Statist. Soc. Ser. B Stat. Methodol. 9 92–98.
Mathematical Reviews (MathSciNet): MR23002
Digital Object Identifier: doi:10.2307/2983572
[22] Pyke, R. (1965). Spacings. J. Roy. Statist. Soc. Ser. B Stat. Methodol. 27 395–449.
Mathematical Reviews (MathSciNet): MR216622
[23] Stahel, W. (1981). Robuste Schaetzungen: Infinitesimale Optimalitaet und Schaetzungen von Kovarianzmatrizen. Ph.D. thesis, ETH Zurich.
[24] Tukey, J. (1947). Nonparametric estimation. II. Statistically equivalent blocks and tolerance regions: the continuous case. Ann. Math. Statist. 18 529–539.
Mathematical Reviews (MathSciNet): MR23033
Digital Object Identifier: doi:10.1214/aoms/1177730343
Project Euclid: euclid.aoms/1177730343
[25] Tukey, J. (1975). Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians (Vancouver, 1974) 2 523–531.
Mathematical Reviews (MathSciNet): MR426989
Zentralblatt MATH: 0347.62002
[26] Wald, A. (1943). An extension of Wilks’ method for setting tolerance limits. Ann. Math. Statist. 14 45–55.
Mathematical Reviews (MathSciNet): MR7965
Digital Object Identifier: doi:10.1214/aoms/1177731491
Project Euclid: euclid.aoms/1177731491
[27] Weiss, L. (1957). Asymptotic power of certain tests of fit based on sample spacings. Ann. Math. Statist. 28 783–786.
Mathematical Reviews (MathSciNet): MR96327
Digital Object Identifier: doi:10.1214/aoms/1177706892
Project Euclid: euclid.aoms/1177706892
[28] Wells, M., Jammalamadaka, S. R. and Tiwari, R. (1993). Large sample theory of spacings statistics for tests of fit for the composite hypothesis. J. Roy. Statist. Soc. Ser. B Stat. Methodol. 55 189–203.
Mathematical Reviews (MathSciNet): MR1210431
[29] Wilks, S. S. (1941). Determination of sample sizes for setting tolerance limits. Ann. Math. Statist. 12 91–96.
Mathematical Reviews (MathSciNet): MR4451
Digital Object Identifier: doi:10.1214/aoms/1177731788
Project Euclid: euclid.aoms/1177731788
[30] Zuo, Y. (2003). Projection based depth functions and associated medians. Ann. Statist. 31 1460–1490.
Mathematical Reviews (MathSciNet): MR2012822
Digital Object Identifier: doi:10.1214/aos/1065705115
Project Euclid: euclid.aos/1065705115
[31] Zuo, Y. and Serfling, R. (2000). Structural properties and convergence results for contours of sample statistical depth functions. Ann. Statist. 28 483–499.
Mathematical Reviews (MathSciNet): MR1790006
Digital Object Identifier: doi:10.1214/aos/1016218227
Project Euclid: euclid.aos/1016218227

2009 © Institute of Mathematical Statistics