The Annals of Statistics

Estimation of sums of random variables: Examples and information bounds

Cun-Hui Zhang

Abstract

This paper concerns the estimation of sums of functions of observable and unobservable variables. Lower bounds for the asymptotic variance and a convolution theorem are derived in general finite- and infinite-dimensional models. An explicit relationship is established between efficient influence functions for the estimation of sums of variables and the estimation of their means. Certain “plug-in” estimators are proved to be asymptotically efficient in finite-dimensional models, while “u,v” estimators of Robbins are proved to be efficient in infinite-dimensional mixture models. Examples include certain species, network and data confidentiality problems.
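As a concrete illustration of a "sum of functions of observable and unobservable variables" in a species problem, consider the total probability of the species that do not appear in a sample. A classical estimator of this quantity, due to Good (1953), is the proportion of singletons, n_1/n, where n_1 is the number of species observed exactly once and n is the sample size. The sketch below is a minimal illustration under that assumption; the function name `unseen_mass_estimate` and the toy sample are hypothetical, and the code is not claimed to reproduce the estimators analyzed in the paper.

```python
from collections import Counter


def unseen_mass_estimate(sample):
    """Estimate the total probability of species absent from `sample`.

    Uses the classical singleton proportion n_1 / n (Good, 1953), where
    n_1 is the number of species seen exactly once and n = len(sample).
    The function name is hypothetical, chosen for this illustration.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)  # species label -> observed frequency
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


# Toy sample (made up for illustration): species "b" and "d" are singletons.
toy_sample = ["a", "a", "b", "c", "c", "c", "d"]
estimate = unseen_mass_estimate(toy_sample)
```

The target here is itself random, since it depends on which species happen to be missed, which is what distinguishes this kind of problem from the estimation of a fixed parameter.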

Article information

Source
Ann. Statist., Volume 33, Number 5 (2005), 2022-2041.

Dates
First available in Project Euclid: 25 November 2005

Permanent link to this document
https://projecteuclid.org/euclid.aos/1132936555

Digital Object Identifier
doi:10.1214/009053605000000390

Mathematical Reviews number (MathSciNet)
MR2211078

Zentralblatt MATH identifier
1086.62035

Citation

Zhang, Cun-Hui. Estimation of sums of random variables: Examples and information bounds. Ann. Statist. 33 (2005), no. 5, 2022–2041. doi:10.1214/009053605000000390. https://projecteuclid.org/euclid.aos/1132936555

References

• Benedetti, R. and Franconi, L. (1998). Statistical and technological solutions for controlled data dissemination. In Pre-proceedings of New Techniques and Technologies for Statistics, Sorrento 1 225–232.
• Bethlehem, J., Keller, W. and Pannekoek, J. (1990). Disclosure control of microdata. J. Amer. Statist. Assoc. 85 38–45.
• Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore.
• Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: A review. J. Amer. Statist. Assoc. 88 364–373.
• Chao, A. (1984). Nonparametric estimation of the number of classes in a population. Scand. J. Statist. 11 265–270.
• Chao, A. and Bunge, J. (2002). Estimating the number of species in a stochastic abundance model. Biometrics 58 531–539.
• Clauset, A. and Moore, C. (2003). Traceroute sampling makes random graphs appear to have power law degree. Preprint.
• Coates, A., Hero, A., Nowak, R. and Yu, B. (2002). Internet tomography. IEEE Signal Processing Magazine 19(3) 47–65.
• Darroch, J. N. and Ratcliff, D. (1980). A note on capture–recapture estimation. Biometrics 36 149–153.
• Duncan, G. T. and Pearson, R. W. (1991). Enhancing access to microdata while protecting confidentiality: Prospects for the future (with discussion). Statist. Sci. 6 219–239.
• Engen, S. (1974). On species frequency models. Biometrika 61 263–270.
• Faloutsos, M., Faloutsos, P. and Faloutsos, C. (1999). On power-law relationships of the Internet topology. In Proc. ACM SIGCOMM 1999 251–262. ACM Press, New York.
• Fisher, R. A., Corbet, A. S. and Williams, C. B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. J. Animal Ecology 12 42–58.
• Good, I. J. (1953). The population frequencies of species and the estimation of population parameters. Biometrika 40 237–264.
• Govindan, R. and Tangmunarunkit, H. (2000). Heuristics for Internet map discovery. In Proc. IEEE INFOCOM 2000 3 1371–1380. IEEE Press, New York.
• Lakhina, A., Byers, J., Crovella, M. and Xie, P. (2003). Sampling biases in IP topology measurements. In Proc. IEEE INFOCOM 2003 1 332–341. IEEE Press, New York.
• Pfanzagl, J. (with the assistance of W. Wefelmeyer) (1982). Contributions to a General Asymptotic Statistical Theory. Lecture Notes in Statist. 13. Springer, New York.
• Polettini, S. and Seri, G. (2003). Guidelines for the protection of social micro-data using individual risk methodology. Application within μ-Argus version 3.2, CASC Project Deliverable No. 1.2-D3. Available at neon.vb.cbs.nl/casc/deliv/12D3_guidelines.pdf.
• Rao, C. R. (1971). Some comments on the logarithmic series distribution in the analysis of insect trap data. In Statistical Ecology (G. P. Patil, E. C. Pielou and W. E. Waters, eds.) 1 131–142. Pennsylvania State Univ. Press, University Park.
• Rieder, H. (2000). One-sided confidence about functionals over tangent cones. Available at www.uni-bayreuth.de/departments/math/org/mathe7/RIEDER/pubs/cc.pdf.
• Rinott, Y. (2003). On models for statistical disclosure risk estimation. Working paper no. 16, Joint ECE/Eurostat Work Session on Data Confidentiality, Luxemburg, 2003. Available at www.unece.org/stats/documents/2003/04/confidentiality/wp.16.e.pdf.
• Robbins, H. (1977). Prediction and estimation for the compound Poisson distribution. Proc. Natl. Acad. Sci. U.S.A. 74 2670–2671.
• Robbins, H. (1980). An empirical Bayes estimation problem. Proc. Natl. Acad. Sci. U.S.A. 77 6988–6989.
• Robbins, H. (1988). The u,v method of estimation. In Statistical Decision Theory and Related Topics IV (S. S. Gupta and J. O. Berger, eds.) 1 265–270. Springer, New York.
• Robbins, H. and Zhang, C.-H. (1988). Estimating a treatment effect under biased sampling. Proc. Natl. Acad. Sci. U.S.A. 85 3670–3672.
• Robbins, H. and Zhang, C.-H. (1989). Estimating the superiority of a drug to a placebo when all and only those patients at risk are treated with the drug. Proc. Natl. Acad. Sci. U.S.A. 86 3003–3005.
• Robbins, H. and Zhang, C.-H. (1991). Estimating a multiplicative treatment effect under biased allocation. Biometrika 78 349–354.
• Robbins, H. and Zhang, C.-H. (2000). Efficiency of the u,v method of estimation. Proc. Natl. Acad. Sci. U.S.A. 97 12,976–12,979.
• Sampford, M. R. (1955). The truncated negative binomial distribution. Biometrika 42 58–69.
• Spring, N., Mahajan, R. and Wetherall, D. (2002). Measuring ISP topologies with rocketfuel. In Proc. ACM SIGCOMM 2002 133–145. ACM Press, New York.
• van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press.
• Vardi, Y. (1996). Network tomography: Estimating source-destination traffic intensities from link data. J. Amer. Statist. Assoc. 91 365–377.