The Annals of Statistics

Adaptive Hausdorff estimation of density level sets

Aarti Singh, Clayton Scott, and Robert Nowak

Full-text: Open access

Abstract

Consider the problem of estimating the γ-level set G*_γ = {x : f(x) ≥ γ} of an unknown d-dimensional density function f based on n independent observations X_1, …, X_n from the density. This problem has been addressed under global error criteria related to the symmetric set difference. However, in certain applications a spatially uniform mode of convergence is desirable to ensure that the estimated set is close to the target set everywhere. The Hausdorff error criterion provides this degree of uniformity and, hence, is more appropriate in such situations. It is known that the minimax optimal rate of error convergence for the Hausdorff metric is (n/log n)^{−1/(d+2α)} for level sets with boundaries that have a Lipschitz functional form, where the parameter α characterizes the regularity of the density around the level of interest. However, the estimators proposed in previous work are nonadaptive to the density regularity and require knowledge of the parameter α. Furthermore, previously developed estimators achieve the minimax optimal rate for rather restricted classes of sets (e.g., the boundary fragment and star-shaped sets) that effectively reduce the set estimation problem to a function estimation problem. This characterization precludes level sets with multiple connected components, which are fundamental to many applications. This paper presents a fully data-driven procedure that is adaptive to unknown regularity conditions and achieves near minimax optimal Hausdorff error control for a class of density level sets with very general shapes and multiple connected components.
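
The contrast between the symmetric set difference and the Hausdorff criterion can be illustrated with a small numerical sketch. The code below is not the estimator of the paper; it only compares the two error measures on a toy example. The grid resolution `m`, the disc-shaped target set, and the single spurious cell are all assumptions chosen for illustration.

```python
import math

def hausdorff(a, b):
    """Hausdorff distance between two finite, nonempty point sets."""
    def directed(p_set, q_set):
        # Largest distance from a point of p_set to its nearest point in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(a, b), directed(b, a))

def sym_diff_fraction(a, b, n_cells):
    """Fraction of grid cells on which the two sets disagree."""
    return len(a ^ b) / n_cells

m = 20  # hypothetical grid resolution per axis on the unit square

def center(i, j):
    """Centre of grid cell (i, j)."""
    return ((i + 0.5) / m, (j + 0.5) / m)

# Toy target set: cells whose centre lies in a disc of radius 0.2 around (0.3, 0.3).
target = {center(i, j) for i in range(m) for j in range(m)
          if math.dist(center(i, j), (0.3, 0.3)) <= 0.2}

# Toy estimate: the target plus one spurious cell in the far corner.
stray = center(m - 1, m - 1)
estimate = target | {stray}

sd_error = sym_diff_fraction(target, estimate, m * m)
h_error = hausdorff(target, estimate)
print(f"symmetric-difference fraction: {sd_error:.4f}")
print(f"Hausdorff distance:            {h_error:.4f}")
```

In this sketch the two sets differ on a single cell, so the symmetric-difference error is small, while the Hausdorff distance is large because it is determined by the worst-located point. This is the sense in which the Hausdorff criterion enforces closeness everywhere.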

Article information

Source
Ann. Statist., Volume 37, Number 5B (2009), 2760-2782.

Dates
First available in Project Euclid: 17 July 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1247836668

Digital Object Identifier
doi:10.1214/08-AOS661

Mathematical Reviews number (MathSciNet)
MR2541446

Zentralblatt MATH identifier
1173.62019

Subjects
Primary: 62G05: Estimation; 62G20: Asymptotic properties

Keywords
Density level set; Hausdorff error; rates of convergence; adaptivity

Citation

Singh, Aarti; Scott, Clayton; Nowak, Robert. Adaptive Hausdorff estimation of density level sets. Ann. Statist. 37 (2009), no. 5B, 2760–2782. doi:10.1214/08-AOS661. https://projecteuclid.org/euclid.aos/1247836668


