The Annals of Statistics

Kernel density estimation via diffusion

Z. I. Botev, J. F. Grotowski, and D. P. Kroese
Source: Ann. Statist. Volume 38, Number 5 (2010), 2916-2957.

Abstract

We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
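As background for the diffusion viewpoint named in the title, the sketch below illustrates only the classical fixed-bandwidth special case: a Gaussian kernel density estimate with bandwidth sqrt(t) is the solution at time t of the heat equation df/dt = (1/2) d^2f/dx^2 started from the empirical distribution of the data. It is not the adaptive estimator or the plug-in bandwidth selector proposed in the paper. The function name gaussian_kde_heat, the sample, the grid, and the value t = 0.1 are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kde_heat(x, data, t):
    """Gaussian KDE with bandwidth sqrt(t).

    Equivalently, the heat-equation solution at time t whose initial
    condition is the empirical measure of `data` (illustrative helper,
    not taken from the paper).
    """
    x = np.asarray(x, dtype=float)[:, None]        # evaluation points as a column
    data = np.asarray(data, dtype=float)[None, :]  # observations as a row
    # Heat kernel: normal density with mean data[i] and variance t.
    kernel = np.exp(-(x - data) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return kernel.mean(axis=1)                     # average over observations

# Hypothetical data: 200 draws from a standard normal distribution.
rng = np.random.default_rng(0)
sample = rng.normal(size=200)

grid = np.linspace(-8.0, 8.0, 1601)
dx = grid[1] - grid[0]
density = gaussian_kde_heat(grid, sample, t=0.1)

# Riemann-sum approximation of the total probability mass on the grid.
total_mass = density.sum() * dx
```

In this picture the time t plays the role of the squared bandwidth, so choosing a bandwidth amounts to choosing when to stop the diffusion.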

Primary Subjects: 62G07, 62G20
Secondary Subjects: 35K05, 35K15, 60J60, 60J70
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1281964340
Digital Object Identifier: doi:10.1214/10-AOS799
Zentralblatt MATH identifier: 1200.62029
Mathematical Reviews number (MathSciNet): MR2722460

2012 © Institute of Mathematical Statistics
