Source: Ann. Statist. Volume 38, Number 5 (2010), 2916–2957.
We present a new adaptive kernel density estimator based on linear diffusion processes. The proposed estimator builds on existing ideas for adaptive smoothing by incorporating information from a pilot density estimate. In addition, we propose a new plug-in bandwidth selection method that is free from the arbitrary normal reference rules used by existing methods. We present simulation examples in which the proposed approach outperforms existing methods in terms of accuracy and reliability.
References
[1] Abramson, I. S. (1982). On bandwidth variation in kernel estimates—a square root law. Ann. Statist. 10 1217–1223.
Mathematical Reviews (MathSciNet):
MR673656
[2] Azencott, R. (1984). Density of diffusions in small time: Asymptotic expansions. In Seminar on Probability, XVIII. Lecture Notes in Math. 1059 402–498. Springer, Berlin.
Mathematical Reviews (MathSciNet):
MR770974
[3] Bellman, R. (1961). A Brief Introduction to Theta Functions. Holt, Rinehart and Winston, New York.
Mathematical Reviews (MathSciNet):
MR125252
[5] Botev, Z. I. (2007). Nonparametric density estimation via diffusion mixing. Technical report, Dept. Mathematics, Univ. Queensland. Available at http://espace.library.uq.edu.au.
[6] Chaudhuri, P. and Marron, J. S. (2000). Scale space view of curve estimation. Ann. Statist. 28 408–428.
[7] Choi, E. and Hall, P. (1999). Data sharpening as a prelude to density estimation. Biometrika 86 941–947.
[8] Cohen, J. K., Hagin, F. G. and Keller, J. B. (1972). Short time asymptotic expansions of solutions of parabolic equations. J. Math. Anal. Appl. 38 82–91.
Mathematical Reviews (MathSciNet):
MR303086
[9] Csiszár, I. (1972). A class of measures of informativity of observation channels. Period. Math. Hungar. 2 191–213.
[10] Devroye, L. (1997). Universal smoothing factor selection in density estimation: Theory and practice. Test 6 223–320.
[11] Doucet, A., de Freitas, N. and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer, New York.
[12] Ethier, S. N. and Kurtz, T. G. (2009). Markov Processes. Characterization and Convergence. Wiley, New York.
Mathematical Reviews (MathSciNet):
MR838085
[13] Feller, W. (1952). The parabolic differential equations and the associated semi-groups of transformations. Ann. of Math. (2) 55 468–519.
Mathematical Reviews (MathSciNet):
MR47886
[14] Friedman, A. (1964). Partial Differential Equations of Parabolic Type. Prentice Hall, Englewood Cliffs, NJ.
Mathematical Reviews (MathSciNet):
MR181836
[15] Hall, P. (1990). On the bias of variable bandwidth curve estimators. Biometrika 77 523–535.
[16] Hall, P., Hu, T. C. and Marron, J. S. (1995). Improved variable window kernel estimates of probability densities. Ann. Statist. 23 1–10.
[17] Hall, P. and Marron, J. S. (1987). Estimation of integrated squared density derivatives. Statist. Probab. Lett. 6 109–115.
Mathematical Reviews (MathSciNet):
MR907270
[18] Hall, P. and Minnotte, M. C. (2002). High order data sharpening for density estimation. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 141–157.
[19] Hall, P. and Park, B. U. (2002). New methods for bias correction at endpoints and boundaries. Ann. Statist. 30 1460–1479.
[21] Havrda, J. H. and Charvát, F. (1967). Quantification methods of classification processes: Concepts of structural α entropy. Kybernetika (Prague) 3 30–35.
Mathematical Reviews (MathSciNet):
MR208775
[22] Jones, M. C. and Foster, P. J. (1996). A simple nonnegative boundary correction method for kernel density estimation. Statist. Sinica 6 1005–1013.
[23] Jones, M. C., Marron, J. S. and Park, B. U. (1991). A simple root n bandwidth selector. Ann. Statist. 19 1919–1932.
[24] Jones, M. C., Marron, J. S. and Sheather, S. J. (1993). Simple boundary correction for kernel density estimation. Statist. Comput. 3 135–146.
[25] Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). A brief survey of bandwidth selection for density estimation. J. Amer. Statist. Assoc. 91 401–407.
[26] Jones, M. C., Marron, J. S. and Sheather, S. J. (1996). Progress in data-based bandwidth selection for kernel density estimation. Comput. Statist. 11 337–381.
[27] Jones, M. C., McKay, I. J. and Hu, T. C. (1994). Variable location and scale kernel density estimation. Ann. Inst. Statist. Math. 46 521–535.
[28] Jones, M. C. and Signorini, D. F. (1997). A comparison of higher-order bias kernel density estimators. J. Amer. Statist. Assoc. 92 1063–1073.
[29] Kannai, Y. (1977). Off diagonal short time asymptotics for fundamental solutions of diffusion equations. Comm. Partial Differential Equations 2 781–830.
Mathematical Reviews (MathSciNet):
MR603299
[30] Kapur, J. N. and Kesavan, H. K. (1987). Generalized Maximum Entropy Principle (With Applications). Sandford Educational Press, Waterloo, ON.
Mathematical Reviews (MathSciNet):
MR934205
[31] Karunamuni, R. J. and Alberts, T. (2005). A generalized reflection method of boundary correction in kernel density estimation. Canad. J. Statist. 33 497–509.
[32] Karunamuni, R. J. and Zhang, S. (2008). Some improvements on a boundary corrected kernel density estimator. Statist. Probab. Lett. 78 499–507.
[33] Kerm, P. V. (2003). Adaptive kernel density estimation. Statist. J. 3 148–156.
[34] Kloeden, P. E. and Platen, E. (1999). Numerical Solution of Stochastic Differential Equations. Springer, Berlin.
[35] Ladyženskaja, O. A., Solonnikov, V. A. and Ural’ceva, N. N. (1967). Linear and Quasilinear Equations of Parabolic Type. Translations of Mathematical Monographs 23 xi+648. Amer. Math. Soc., Providence, RI.
Mathematical Reviews (MathSciNet):
MR241822
[36] Larsson, S. and Thomée, V. (2003). Partial Differential Equations with Numerical Methods. Springer, Berlin.
[37] Lehmann, E. L. (1990). Model specification: The views of Fisher and Neyman, and later developments. Statist. Sci. 5 160–168.
[38] Loader, C. R. (1999). Bandwidth selection: Classical or plug-in. Ann. Statist. 27 415–438.
[39] Loftsgaarden, D. O. and Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. Ann. Math. Statist. 36 1049–1051.
Mathematical Reviews (MathSciNet):
MR176567
[40] Marron, J. S. (1985). An asymptotically efficient solution to the bandwidth problem of kernel density estimation. Ann. Statist. 13 1011–1023.
Mathematical Reviews (MathSciNet):
MR803755
[41] Marron, J. S. and Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. J. Roy. Statist. Soc. Ser. B 56 653–671.
[42] Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist. 20 712–736.
[43] Molchanov, S. A. (1975). Diffusion process and Riemannian geometry. Russian Math. Surveys 30 1–63.
[44] Park, B. U., Jeong, S. O. and Jones, M. C. (2003). Adaptive variable location kernel density estimators with good performance at boundaries. J. Nonparametr. Stat. 15 61–75.
[45] Park, B. U. and Marron, J. S. (1990). Comparison of data-driven bandwidth selectors. J. Amer. Statist. Assoc. 85 66–72.
[46] Samiuddin, M. and El-Sayyad, G. M. (1990). On nonparametric kernel density estimates. Biometrika 77 865.
[47] Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. Wiley, New York.
[48] Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B 53 683–690.
[49] Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, London.
Mathematical Reviews (MathSciNet):
MR848134
[50] Simonoff, J. S. (1996). Smoothing Methods in Statistics. Springer, New York.
[51] Terrell, G. R. and Scott, D. W. (1992). Variable kernel density estimation. Ann. Statist. 20 1236–1265.
[52] Wand, M. P. and Jones, M. C. (1994). Multivariate plug-in bandwidth selection. Comput. Statist. 9 97–117.
[53] Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Chapman and Hall, London.