The Annals of Statistics

Bump hunting with non-Gaussian kernels

Peter Hall, Michael C. Minnotte, and Chunming Zhang

Source: Ann. Statist. Volume 32, Number 5 (2004), 2124-2141.

Abstract

It is well known that the number of modes of a kernel density estimator is monotone nonincreasing in the bandwidth if the kernel is a Gaussian density. There is numerical evidence of nonmonotonicity in the case of some non-Gaussian kernels, but little additional information is available. The present paper provides theoretical and numerical descriptions of the extent to which the number of modes is a nonmonotone function of bandwidth in the case of general compactly supported densities. Our results address popular kernels used in practice, for example, the Epanechnikov, biweight and triweight kernels, and show that in such cases nonmonotonicity is present with strictly positive probability for all sample sizes n ≥ 3. In the Epanechnikov and biweight cases the probability of nonmonotonicity equals 1 for all n ≥ 2. Nevertheless, in spite of the prevalence of lack of monotonicity revealed by these results, it is shown that the notion of a critical bandwidth (the smallest bandwidth above which the number of modes is guaranteed to be monotone) is still well defined. Moreover, just as in the Gaussian case, the critical bandwidth is of the same size as the bandwidth that minimises mean squared error of the density estimator. These theoretical results, and new numerical evidence, show that the main effects of nonmonotonicity occur for relatively small bandwidths, and have negligible impact on many aspects of bump hunting.
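
As an informal illustration of the quantity discussed in the abstract, the following Python sketch counts the modes of a kernel density estimate over a range of bandwidths, once with the Gaussian kernel and once with the Epanechnikov kernel. It is not the authors' procedure. The function names (`kde`, `count_modes`, `mode_counts`, `is_nonincreasing`), the simulated sample, the bandwidth range and the grid resolution are all assumptions made for illustration. Modes are counted on a finite grid, so very narrow bumps may be missed when the bandwidth is small relative to the grid spacing.

```python
import numpy as np


def gaussian(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def kde(grid, data, h, kernel):
    """Kernel density estimate with bandwidth h, evaluated at the grid points."""
    u = (grid[:, None] - data[None, :]) / h
    return kernel(u).sum(axis=1) / (len(data) * h)


def count_modes(values, tol=1e-12):
    """Count local maxima of a sampled curve (a rise followed by a fall).

    Differences smaller than tol are treated as flat and ignored, so a
    plateau between a rise and a fall counts as a single mode.
    """
    d = np.diff(values)
    signs = np.sign(np.where(np.abs(d) < tol, 0.0, d))
    signs = signs[signs != 0]
    return int(np.sum((signs[:-1] > 0) & (signs[1:] < 0)))


def mode_counts(data, bandwidths, kernel, grid_size=4001):
    """Number of modes of the estimate for each bandwidth (grid-based, approximate)."""
    data = np.asarray(data, dtype=float)
    # Pad the grid so the estimate rises from, and falls back to, its tails.
    pad = 1.5 * float(np.max(bandwidths))
    grid = np.linspace(data.min() - pad, data.max() + pad, grid_size)
    return [count_modes(kde(grid, data, h, kernel)) for h in bandwidths]


def is_nonincreasing(counts):
    """True if the mode count never goes up as the bandwidth increases."""
    return bool(np.all(np.diff(counts) <= 0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)           # simulated sample, for illustration only
    data = rng.normal(size=20)
    bandwidths = np.linspace(0.05, 1.5, 30)  # increasing bandwidths (assumed range)

    for name, kernel in [("gaussian", gaussian), ("epanechnikov", epanechnikov)]:
        counts = mode_counts(data, bandwidths, kernel)
        print(name, counts, "nonincreasing:", is_nonincreasing(counts))
```

Because the count is taken on a coarse grid of bandwidths and evaluation points, a nonincreasing sequence from this sketch does not rule out nonmonotonicity at bandwidths between the grid values.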

Primary Subjects: 62G07
Secondary Subjects: 62G20
Keywords: Bandwidth choice; bootstrap; critical bandwidth; density estimation; kernel methods; modality; mode test; nonparametric curve estimation; unimodality

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1098883784
Digital Object Identifier: doi:10.1214/009053604000000715
Mathematical Reviews number (MathSciNet): MR2102505
Zentralblatt MATH identifier: 1056.62049

2010 © Institute of Mathematical Statistics