Bernoulli, Volume 8, Number 4 (2002), 423-449.

Nonlinear kernel density estimation for binned data: convergence in entropy

Gordon Blower and Julia E. Kelsall

Abstract

A method is proposed for creating a smooth kernel density estimate from a sample of binned data. Simulations indicate that, for relatively finely binned data, this method produces an estimate close to the one that would be obtained from the original unbinned data. The kernel density estimate $\hat{f}$ is the stationary distribution of a Markov process resembling the Ornstein-Uhlenbeck process. This $\hat{f}$ may be found by an iteration scheme that converges at a geometric rate in the entropy pseudo-metric, and hence in the $L^1$ and transportation metrics. The proof uses a logarithmic Sobolev inequality comparing relative Shannon entropy and relative Fisher information with respect to $\hat{f}$.
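The sketch below illustrates the overall pipeline described in the abstract: binned counts go in, a smooth density comes out by fixed-point iteration, and the gap between successive iterates is measured in relative entropy and in $L^1$. Every detail is an assumption made for illustration and is not drawn from the paper. The smoothed-EM style update, the Gaussian kernel, the grid, the bandwidth `h`, and the helper names `binned_kde`, `kl_divergence` and `l1_distance` are all hypothetical. The code is not the estimator or the Markov process defined by the authors. It also checks the Csiszár-Kullback-Pinsker inequality $\|f - g\|_1 \le \sqrt{2 D(f \,\|\, g)}$ on each pair of successive iterates. That inequality is the standard way a bound in relative entropy yields a bound in $L^1$.

```python
import math


def gaussian_kernel(u, h):
    """Gaussian kernel K_h(u) = phi(u / h) / h with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def bin_index(x, edges):
    """Return j with edges[j] <= x < edges[j + 1], or None if x is in no bin."""
    for j in range(len(edges) - 1):
        if edges[j] <= x < edges[j + 1]:
            return j
    return None


def normalize(f, dx):
    """Rescale grid values so that sum(f) * dx == 1 (a Riemann-sum density)."""
    total = sum(f) * dx
    return [v / total for v in f]


def kl_divergence(f, g, dx):
    """Discretized relative entropy D(f || g); requires g > 0 wherever f > 0."""
    return sum(a * math.log(a / b) for a, b in zip(f, g) if a > 0) * dx


def l1_distance(f, g, dx):
    """Discretized L^1 distance between two grid densities."""
    return sum(abs(a - b) for a, b in zip(f, g)) * dx


def binned_kde(edges, counts, h, grid, n_iter=30):
    """Hypothetical smoothed-EM style fixed-point iteration (illustration only):

        f_{k+1}(y) = sum_j p_j * int_{B_j} K_h(y - x) f_k(x) dx / int_{B_j} f_k(x) dx

    with p_j = counts[j] / sum(counts). Returns every iterate, evaluated on `grid`
    (an equally spaced list of points that should extend well past the bins).
    """
    dx = grid[1] - grid[0]
    total = sum(counts)
    p = [c / total for c in counts]
    labels = [bin_index(x, edges) for x in grid]
    # Kernel matrix K[i][m] = K_h(grid[i] - grid[m]), computed once and reused.
    kmat = [[gaussian_kernel(y - x, h) for x in grid] for y in grid]
    # Start from the histogram: height p_j / width_j on bin j, zero outside the bins.
    f = [0.0 if j is None else p[j] / (edges[j + 1] - edges[j]) for j in labels]
    f = normalize(f, dx)
    iterates = [f]
    for _ in range(n_iter):
        # Mass the current iterate assigns to each bin.
        mass = [0.0] * len(counts)
        for v, j in zip(f, labels):
            if j is not None:
                mass[j] += v * dx
        # Reweight so bin j carries mass p_j while keeping f's shape inside the bin.
        w = [0.0 if j is None or mass[j] == 0.0 else p[j] * v / mass[j]
             for v, j in zip(f, labels)]
        # Smooth with the kernel (Riemann-sum convolution), then renormalize to
        # absorb the kernel mass that leaks past the ends of the finite grid.
        f = normalize([sum(k * wm for k, wm in zip(row, w)) * dx for row in kmat], dx)
        iterates.append(f)
    return iterates


if __name__ == "__main__":
    edges = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical bin edges
    counts = [10, 20, 25, 15]           # hypothetical bin counts
    grid = [-2.0 + 0.05 * i for i in range(161)]
    dx = grid[1] - grid[0]
    its = binned_kde(edges, counts, h=0.5, grid=grid)
    for k in range(1, 8):
        kl = kl_divergence(its[k + 1], its[k], dx)
        l1 = l1_distance(its[k + 1], its[k], dx)
        # Pinsker: ||f - g||_1 <= sqrt(2 D(f || g)); max() guards against rounding.
        print(k, f"L1={l1:.3e}", f"KL={kl:.3e}", l1 <= math.sqrt(2.0 * max(kl, 0.0)) + 1e-6)
```

The grid should extend well beyond the outermost bin edges so that little kernel mass is lost before renormalization. The comparison starts at the first smoothed iterate because the histogram start is zero outside the bins, which would make the relative entropy infinite. This sketch does not claim that this simplified scheme converges at a geometric rate, as the paper proves for its own estimator. The printed gaps only let the reader watch how the iterates behave.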

Article information

Source
Bernoulli Volume 8, Number 4 (2002), 423-449.

Dates
First available in Project Euclid: 7 March 2004

Permanent link to this document
http://projecteuclid.org/euclid.bj/1078681378

Mathematical Reviews number (MathSciNet)
MR2003d:62101

Zentralblatt MATH identifier
1006.62030

Keywords
binned data; density estimation; kernel estimation; logarithmic Sobolev inequality; transportation

Citation

Blower, Gordon; Kelsall, Julia E. Nonlinear kernel density estimation for binned data: convergence in entropy. Bernoulli 8 (2002), no. 4, 423-449. http://projecteuclid.org/euclid.bj/1078681378.

