Electronic Journal of Statistics

Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting

José E. Chacón and Tarn Duong



Important information concerning a multivariate data set, such as clusters and modal regions, is contained in the derivatives of the probability density function. Despite this importance, nonparametric estimation of higher-order derivatives of density functions has received relatively scant attention. Kernel estimators of density functions are widely used because they exhibit excellent theoretical and practical properties, but their generalization to density derivatives has progressed more slowly, owing to the mathematical intractabilities encountered in the crucial problem of bandwidth (or smoothing parameter) selection. This paper presents the first fully automatic, data-based bandwidth selectors for multivariate kernel density derivative estimators. This is achieved by synthesizing recent advances in matrix analytic theory which allow mathematically and computationally tractable representations of higher-order derivatives of multivariate vector-valued functions. The theoretical asymptotic properties as well as the finite sample behaviour of the proposed selectors are studied. In addition, we explore in detail the applications of the new data-driven methods to two other statistical problems: clustering and bump hunting. The introduced techniques are combined with the mean shift algorithm to develop novel automatic, nonparametric clustering procedures, which are shown to outperform mixture-model cluster analysis and other recent nonparametric approaches in practice. Furthermore, a real data example illustrates the advantage of using smoothing parameters designed for density derivative estimation in feature significance analysis for bump hunting.

Article information

Electron. J. Statist. Volume 7 (2013), 499-532.

First available in Project Euclid: 6 March 2013


Primary: 62G05: Estimation
Secondary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Adjusted Rand index; cross validation; feature significance; nonparametric kernel method; mean integrated squared error; mean shift algorithm; plug-in choice


Chacón, José E.; Duong, Tarn. Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Statist. 7 (2013), 499--532. doi:10.1214/13-EJS781. https://projecteuclid.org/euclid.ejs/1362579368.


