The Annals of Statistics

Density estimation for biased data

Sam Efromovich

Abstract

The concept of biased data is well known and its practical applications range from social sciences and biology to economics and quality control. These observations arise when a sampling procedure chooses an observation with probability that depends on the value of the observation. This is an interesting sampling procedure because it favors some observations and neglects others. It is known that biasing does not change rates of nonparametric density estimation, but no results are available about sharp constants. This article presents asymptotic results on sharp minimax density estimation. In particular, a coefficient of difficulty is introduced that shows the relationship between sample sizes of direct and biased samples that imply the same accuracy of estimation. The notion of the restricted local minimax, where a low-frequency part of the estimated density is known, is introduced; it sheds new light on the phenomenon of nonparametric superefficiency. Results of a numerical study are presented.
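The sampling mechanism described in the abstract can be illustrated with a small simulation. In the usual formulation of biased data (the notation here is an assumption, not taken from the abstract), an observation with density f is retained with probability proportional to a biasing function w, so the observed data have density g(x) = w(x) f(x) / mu with mu = E_f[w(X)]. The sketch below is not the estimator analyzed in the article; it is a minimal fixed-cutoff cosine-series estimate on [0, 1] that reweights each observation by 1/w. The Beta(2, 2) target density, the biasing function `w`, the cutoff of 4 and all function names are hypothetical choices made only for illustration.

```python
import math
import random


def w(x):
    """Hypothetical biasing function; positive on [0, 1], favors large x."""
    return 0.2 + x


W_MAX = 1.2  # maximum of w on [0, 1], used for accept/reject sampling


def draw_biased_sample(n, rng):
    """Draw n biased observations: X ~ Beta(2, 2) is kept with prob. w(X)/W_MAX."""
    sample = []
    while len(sample) < n:
        x = rng.betavariate(2, 2)
        if rng.random() < w(x) / W_MAX:
            sample.append(x)
    return sample


def estimate_mu(sample):
    """Estimate mu = E_f[w(X)], using the identity E_g[1/w(Y)] = 1/mu."""
    return len(sample) / sum(1.0 / w(y) for y in sample)


def cosine_coefficient(sample, j, mu_hat):
    """Estimate theta_j = integral of f(x) * sqrt(2) * cos(pi*j*x) over [0, 1]."""
    total = sum(math.sqrt(2) * math.cos(math.pi * j * y) / w(y) for y in sample)
    return mu_hat * total / len(sample)


def density_estimate(sample, x, cutoff=4):
    """Projection estimate of f(x) with a fixed (non-adaptive) cutoff."""
    mu_hat = estimate_mu(sample)
    value = 1.0  # theta_0 = 1 for any density on [0, 1]
    for j in range(1, cutoff + 1):
        theta = cosine_coefficient(sample, j, mu_hat)
        value += theta * math.sqrt(2) * math.cos(math.pi * j * x)
    return value


rng = random.Random(0)
sample = draw_biased_sample(20000, rng)
mu_hat = estimate_mu(sample)

# The raw sample mean is pulled toward large values; reweighting undoes this.
naive_mean = sum(sample) / len(sample)
corrected_mean = mu_hat * sum(y / w(y) for y in sample) / len(sample)

print("mu_hat:", mu_hat)
print("naive mean:", naive_mean, "corrected mean:", corrected_mean)
print("density estimate at 0.5:", density_estimate(sample, 0.5))
```

In this setup the true values are mu = 0.7, E_f[X] = 0.5 and f(0.5) = 1.5, while the uncorrected sample mean concentrates near 0.57. The fixed cutoff stands in for the data-driven choice that an adaptive procedure would make.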

Article information

Source
Ann. Statist. Volume 32, Number 3 (2004), 1137-1161.

Dates
First available in Project Euclid: 24 May 2004

Permanent link to this document
http://projecteuclid.org/euclid.aos/1085408497

Digital Object Identifier
doi:10.1214/009053604000000300

Mathematical Reviews number (MathSciNet)
MR2065200

Zentralblatt MATH identifier
02100795

Subjects
Primary: 62G07: Density estimation
Secondary: 62C05: General considerations; 62E20: Asymptotic distribution theory

Keywords
Adaptation; average risk; coefficient of difficulty; nonparametric; restricted minimax; small sample

Citation

Efromovich, Sam. Density estimation for biased data. Ann. Statist. 32 (2004), no. 3, 1137--1161. doi:10.1214/009053604000000300. http://projecteuclid.org/euclid.aos/1085408497.

