Given $n$ independent, identically distributed random vectors in $\mathbb{R}^{d}$, drawn from a common density $f$, one wishes to determine whether the support of $f$ is convex. In this paper we describe a decision rule that, with probability $1$, decides correctly for sufficiently large $n$ whenever $f$ is bounded away from zero on its compact support. We also show that this boundedness assumption is necessary. The rule is based on a second-order $U$-statistic with a random kernel. Moreover, we suggest a way of approximating the distribution of the statistic under the hypothesis that the support is convex. The performance of the proposed method is illustrated on simulated data sets. As an example of its potential statistical implications, the decision rule is used to automatically choose the tuning parameter of ISOMAP, a nonlinear dimensionality reduction method.
Electron. J. Statist. 8(1): 96-129 (2014). DOI: 10.1214/14-EJS877
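The abstract describes the decision rule as based on a second-order $U$-statistic with a random kernel, that is, an average over all pairs of sample points of a kernel that itself depends on the data. The Python sketch below illustrates only that general structure. The kernel `midpoint_gap_kernel`, the `radius` parameter, and the helper `grid` are hypothetical choices made for this illustration and are not taken from the paper, whose actual kernel and threshold are not given in the abstract.

```python
from itertools import combinations
from math import dist


def midpoint_gap_kernel(x, y, sample, radius):
    """Hypothetical data-dependent kernel (an assumption, NOT the paper's kernel).

    Returns 1.0 when the midpoint of x and y is farther than `radius`
    from every sample point, and 0.0 otherwise.
    """
    mid = tuple((a + b) / 2 for a, b in zip(x, y))
    return 0.0 if any(dist(mid, z) <= radius for z in sample) else 1.0


def u_statistic(sample, radius):
    """Average the kernel over all unordered pairs of sample points."""
    pairs = list(combinations(sample, 2))
    if not pairs:
        raise ValueError("need at least two sample points")
    total = sum(midpoint_gap_kernel(x, y, sample, radius) for x, y in pairs)
    return total / len(pairs)


def grid(x0, y0, steps=5, spacing=0.25):
    """Deterministic toy sample: a steps-by-steps grid with corner (x0, y0)."""
    return [(x0 + i * spacing, y0 + j * spacing)
            for i in range(steps) for j in range(steps)]
```

With this toy kernel, a sample that fills a convex region densely tends to give a value near zero, while a sample from two separated regions gives a clearly positive value, because midpoints of cross-region pairs fall in the empty gap. How to choose `radius` and a decision threshold is left open here.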