The Annals of Statistics

Nonparametric regression with homogeneous group testing data

Aurore Delaigle and Peter Hall

Full-text: Open access


We introduce new nonparametric predictors for homogeneous pooled data in the context of group testing for rare abnormalities and show that they achieve optimal rates of convergence. In particular, when the level of pooling is moderate, then despite the cost savings, the method enjoys the same convergence rate as in the case of no pooling. In the setting of “over-pooling” the convergence rate differs from that of an optimal estimator by no more than a logarithmic factor. Our approach improves on the random-pooling nonparametric predictor, which is currently the only nonparametric method available, unless there is no pooling, in which case the two approaches are identical.
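To make the idea of homogeneous pooling concrete, the following minimal Python sketch groups individuals with similar covariate values into pools, smooths the pooled test results, and back-transforms to an individual-level prevalence curve. It is an illustration under stated assumptions, not the authors' estimator: the function names, the Gaussian kernel, the local constant (Nadaraya-Watson) smoother, the fixed bandwidth and the simulated data are all hypothetical choices made here. The back-transform relies on the fact that if the ν members of a pool are independent and share roughly the same covariate value x, the pool tests positive with probability approximately 1 − (1 − p(x))^ν.

```python
import numpy as np

def homogeneous_pools(x, y, pool_size):
    """Sort individuals by covariate and pool consecutive ones.

    Returns the covariates in sorted order and, for each individual,
    the result of its pool (1 if any member of the pool is positive).
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n_pools = len(x) // pool_size           # a trailing incomplete pool is dropped
    n_used = n_pools * pool_size
    pooled = y_sorted[:n_used].reshape(n_pools, pool_size).max(axis=1)
    return x_sorted[:n_used], np.repeat(pooled, pool_size)

def kernel_smooth(x_grid, x, z, bandwidth):
    """Local constant (Nadaraya-Watson) estimate of E[z | x], Gaussian kernel."""
    u = (x_grid[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * u**2)
    return (w * z).sum(axis=1) / w.sum(axis=1)

def estimate_prevalence(x_grid, x, y, pool_size, bandwidth):
    """Smooth the pooled results, then invert mu = 1 - (1 - p)^pool_size."""
    x_used, z = homogeneous_pools(x, y, pool_size)
    mu = kernel_smooth(x_grid, x_used, z, bandwidth)
    return 1.0 - (1.0 - mu) ** (1.0 / pool_size)

# Simulated example (hypothetical data): a rare abnormality whose
# prevalence rises from 2% to 10% with the covariate.
rng = np.random.default_rng(0)
n = 4000
x = rng.uniform(0.0, 1.0, n)
y = rng.binomial(1, 0.02 + 0.08 * x)

grid = np.linspace(0.1, 0.9, 5)
p_hat = estimate_prevalence(grid, x, y, pool_size=5, bandwidth=0.1)
```

In this sketch only the pooled results enter the smoother, so with pools of size 5 the number of tests falls to a fifth of the sample size. With `pool_size=1` the procedure reduces to ordinary kernel regression on the individual results.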

Article information

Ann. Statist. Volume 40, Number 1 (2012), 131-158.

First available in Project Euclid: 15 March 2012

Primary: 62G08: Nonparametric regression

Keywords: Bandwidth, local polynomial estimator, pooling, prevalence, smoothing


Delaigle, Aurore; Hall, Peter. Nonparametric regression with homogeneous group testing data. Ann. Statist. 40 (2012), no. 1, 131–158. doi:10.1214/11-AOS952.


  • Bilder, C. R. and Tebbs, J. M. (2009). Bias, efficiency, and agreement for group-testing regression models. J. Stat. Comput. Simul. 79 67–80.
  • Chen, C. L. and Swallow, W. H. (1990). Using group testing to estimate a proportion, and to test the binomial model. Biometrics 46 1035–1046.
  • Chen, P., Tebbs, J. M. and Bilder, C. R. (2009). Group testing regression models with fixed and random effects. Biometrics 65 1270–1278.
  • Delaigle, A. and Hall, P. (2011). Supplement to “Nonparametric regression with homogeneous group testing data.” DOI:10.1214/11-AOS952SUPP.
  • Delaigle, A. and Meister, A. (2011). Nonparametric regression analysis for group testing data. J. Amer. Statist. Assoc. 106 640–650.
  • Dorfman, R. (1943). The detection of defective members of large populations. Ann. Math. Statist. 14 436–440.
  • Fahey, J. W., Ourisson, P. J. and Degnan, F. H. (2006). Pathogen detection, testing, and control in fresh broccoli sprouts. Nutrition J. 5 13.
  • Fan, J. (1993). Local linear regression smoothers and their minimax efficiencies. Ann. Statist. 21 196–216.
  • Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Monographs on Statistics and Applied Probability 66. Chapman and Hall, London.
  • Fan, J., Heckman, N. E. and Wand, M. P. (1995). Local polynomial kernel regression for generalized linear models and quasi-likelihood functions. J. Amer. Statist. Assoc. 90 141–150.
  • Gastwirth, J. L. and Hammick, P. A. (1989). Estimation of the prevalence of a rare disease, preserving the anonymity of the subjects by group testing: Applications to estimating the prevalence of AIDS antibodies in blood donors. J. Statist. Plann. Inference 22 15–27.
  • Gastwirth, J. L. and Johnson, W. O. (1994). Screening with cost-effective quality control: Potential applications to HIV and drug testing. J. Amer. Statist. Assoc. 89 972–981.
  • Hardwick, J., Page, C. and Stout, Q. F. (1998). Sequentially deciding between two experiments for estimating a common success probability. J. Amer. Statist. Assoc. 93 1502–1511.
  • Lennon, J. T. (2007). Diversity and metabolism of marine bacteria cultivated on dissolved DNA. Applied and Environmental Microbiology 73 2799–2805.
  • Nagi, M. S. and Raggi, L. G. (1972). Importance to “airsac” disease of water supplies contaminated with pathogenic Escherichia coli. Avian Diseases 16 718–723.
  • Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge Series in Statistical and Probabilistic Mathematics 12. Cambridge Univ. Press, Cambridge.
  • Thorburn, D., Dundas, D., McCruden, E., Cameron, S., Goldberg, D., Symington, I., Kirk, A. and Mills, P. (2001). A study of hepatitis C prevalence in healthcare workers in the west of Scotland. Gut 48 116–120.
  • Vansteelandt, S., Goetghebeur, E. and Verstraeten, T. (2000). Regression models for disease prevalence with diagnostic tests on pools of serum samples. Biometrics 56 1126–1133.
  • Wahed, M. A., Chowdhury, D., Nermell, B., Khan, S. I., Ilias, M., Rahman, M., Persson, L. A. and Vahter, M. (2006). A modified routine analysis of arsenic content in drinking-water in Bangladesh by hydride generation-atomic absorption spectrophotometry. J. Health, Population and Nutrition 24 36–41.
  • Xie, M. (2001). Regression analysis of group testing samples. Stat. Med. 20 1957–1969.

Supplemental materials

  • Supplementary material: Additional material. The supplementary article contains a description of Delaigle and Meister’s method, details of bandwidth choice, an alternative procedure for the multivariate setting and for unequal group sizes, and additional numerical results.