Open Access
Data-based decision rules about the convexity of the support of a distribution
Pedro Delicado, Adolfo Hernández, Gábor Lugosi
Electron. J. Statist. 8(1): 96-129 (2014). DOI: 10.1214/14-EJS877
Abstract

Given $n$ independent, identically distributed random vectors in $\mathbb{R}^{d}$, drawn from a common density $f$, one wishes to determine whether the support of $f$ is convex. In this paper we describe a decision rule that, with probability $1$, decides correctly for all sufficiently large $n$ whenever $f$ is bounded away from zero on its compact support. We also show that this assumption of boundedness away from zero is necessary. The rule is based on a second-order $U$-statistic with a random kernel. Moreover, we suggest a way of approximating the distribution of the statistic under the hypothesis that the support is convex. The performance of the proposed method is illustrated on simulated data sets. As an example of its potential statistical implications, the decision rule is used to choose automatically the tuning parameter of ISOMAP, a nonlinear dimensionality reduction method.
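The abstract does not spell out the kernel of the $U$-statistic. As a rough illustration only, the sketch below assumes a hypothetical kernel $h(X_i, X_j) = \mathbf{1}\{\min_k \lVert (X_i + X_j)/2 - X_k \rVert > r\}$: a pair "votes" against convexity when its midpoint lies farther than a radius $r$ from every sample point. Averaging $h$ over all pairs $i < j$ gives a second-order $U$-statistic whose kernel is random, because it depends on the whole sample through the minimum over $k$. The kernel, the function names `midpoint_gap_statistic` and `decide_convex`, the radius `r` and the `threshold` are all assumptions for illustration, not the authors' statistic or decision rule; in an asymptotic rule such quantities would have to depend on $n$.

```python
import numpy as np


def midpoint_gap_statistic(X, r):
    """Fraction of pairs i < j whose midpoint is farther than r from all sample points.

    Hypothetical kernel (an assumption, not the paper's):
        h(X_i, X_j) = 1{ min_k ||(X_i + X_j)/2 - X_k|| > r }.
    The kernel is "random" because the minimum runs over the whole sample.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two sample points")
    i, j = np.triu_indices(n, k=1)      # indices of all pairs i < j
    mids = 0.5 * (X[i] + X[j])          # pair midpoints, shape (n(n-1)/2, d)
    # Squared distances midpoint-to-sample via ||m||^2 - 2 m.x + ||x||^2.
    d2 = ((mids ** 2).sum(axis=1)[:, None]
          - 2.0 * mids @ X.T
          + (X ** 2).sum(axis=1)[None, :])
    nearest = np.sqrt(np.maximum(d2.min(axis=1), 0.0))  # clip rounding noise
    return float(np.mean(nearest > r))


def decide_convex(X, r, threshold):
    """Hypothetical rule: answer 'convex' when the statistic does not exceed threshold."""
    return midpoint_gap_statistic(X, r) <= threshold


# Usage: a convex support versus two disjoint squares (a non-convex support).
rng = np.random.default_rng(0)
n = 200
square = rng.uniform(0.0, 1.0, size=(n, 2))   # uniform on [0,1]^2
two = rng.uniform(0.0, 1.0, size=(n, 2))
two[n // 2:, 0] += 2.0                        # second half moved to [2,3] x [0,1]

stat_convex = midpoint_gap_statistic(square, r=0.1)
stat_nonconvex = midpoint_gap_statistic(two, r=0.1)
print(stat_convex, stat_nonconvex)
```

For the two squares, roughly half of the pairs straddle the gap, and most of their midpoints fall in the empty strip between the squares, so the statistic is large; for the single square almost every midpoint has a nearby sample point. Computing all pairwise midpoints costs $O(n^2)$ memory, which is acceptable for a sketch but not for large $n$.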

Copyright © 2014 The Institute of Mathematical Statistics and the Bernoulli Society
Pedro Delicado, Adolfo Hernández, and Gábor Lugosi "Data-based decision rules about the convexity of the support of a distribution," Electronic Journal of Statistics 8(1), 96-129, (2014). https://doi.org/10.1214/14-EJS877
Published: 2014
Vol. 8, No. 1, 2014