The Annals of Statistics

On combinatorial testing problems

Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi

Abstract

We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a set of components, belonging to a certain given class of sets, whose elements have been “contaminated,” that is, have a mean different from zero. We establish some general conditions under which testing is possible and others under which no test can achieve a small risk. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.
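To make the testing problem concrete, here is a minimal simulation sketch. The setup is ours, not the paper's: the class C is assumed to be all contiguous blocks of k coordinates, contamination is assumed to add the same mean mu to every coordinate of one set, and all function names are hypothetical. The sketch contrasts two natural tests. One exploits the class by taking the maximum over S in C of the normalized sums of X over S. The other ignores the class and uses only the normalized sum of all n coordinates. The scan threshold comes from a crude union bound, P(Z > t) ≤ exp(-t²/2), so its level is at most alpha but is conservative when the sets overlap.

```python
import math
import random
from statistics import NormalDist


def interval_class(n, k):
    """Hypothetical example class C: all contiguous blocks of k indices in 0..n-1."""
    return [range(s, s + k) for s in range(n - k + 1)]


def sample(n, rng, mu=0.0, contaminated=()):
    """Draw X ~ N(0, I_n), then add mu to each coordinate in `contaminated`."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for i in contaminated:
        x[i] += mu
    return x


def scan_statistic(x, sets):
    """max over S in C of sum_{i in S} x_i / sqrt(|S|); each term is N(0,1) under H0."""
    return max(sum(x[i] for i in s) / math.sqrt(len(s)) for s in sets)


def scan_threshold(num_sets, alpha):
    """Union bound: |C| * exp(-t^2/2) = alpha at t = sqrt(2 log(|C|/alpha))."""
    return math.sqrt(2.0 * math.log(num_sets / alpha))


def scan_test(x, sets, alpha=0.05):
    """Reject H0 (no contamination) when the scan statistic exceeds the threshold."""
    return scan_statistic(x, sets) > scan_threshold(len(sets), alpha)


def averaging_test(x, alpha=0.05):
    """Ignore the class: reject when sum(x)/sqrt(n) exceeds the N(0,1) (1-alpha)-quantile."""
    return sum(x) / math.sqrt(len(x)) > NormalDist().inv_cdf(1.0 - alpha)


def rejection_rate(test, n, trials, rng, mu=0.0, contaminated=()):
    """Monte Carlo estimate of P(reject) under H0 (mu = 0) or a fixed alternative."""
    hits = sum(test(sample(n, rng, mu, contaminated)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(0)
    n, k, mu = 500, 10, 2.0
    C = interval_class(n, k)
    S = C[100]  # contaminated set, chosen arbitrarily for the demo

    def scan(x):
        return scan_test(x, C)

    print("scan, H0:       ", rejection_rate(scan, n, 100, rng))
    print("scan, H1:       ", rejection_rate(scan, n, 100, rng, mu, S))
    print("averaging, H0:  ", rejection_rate(averaging_test, n, 100, rng))
    print("averaging, H1:  ", rejection_rate(averaging_test, n, 100, rng, mu, S))
```

With these illustrative values the contaminated set shifts its normalized sum by mu·√k ≈ 6.3, well above the scan threshold of about 4.3, while the shift seen by the averaging test is only mu·k/√n ≈ 0.9. This shows, in one toy case, why the structure of the class matters. Calibrating the scan threshold by simulation under H0 instead of the union bound would give a sharper test.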

Article information

Source
Ann. Statist., Volume 38, Number 5 (2010), 3063-3092.

Dates
First available in Project Euclid: 30 August 2010

Permanent link to this document
https://projecteuclid.org/euclid.aos/1283175989

Digital Object Identifier
doi:10.1214/10-AOS817

Mathematical Reviews number (MathSciNet)
MR2722464

Zentralblatt MATH identifier
1200.62059

Subjects
Primary: 62F03: Hypothesis testing
Secondary: 62F05: Asymptotic properties of tests

Keywords
Hypothesis testing; multiple hypotheses; Gaussian processes

Citation

Addario-Berry, Louigi; Broutin, Nicolas; Devroye, Luc; Lugosi, Gábor. On combinatorial testing problems. Ann. Statist. 38 (2010), no. 5, 3063–3092. doi:10.1214/10-AOS817. https://projecteuclid.org/euclid.aos/1283175989
