The Annals of Statistics

On combinatorial testing problems

Louigi Addario-Berry, Nicolas Broutin, Luc Devroye, and Gábor Lugosi
Source: Ann. Statist. Volume 38, Number 5 (2010), 3063-3092.

Abstract

We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components belonging to a certain given class of sets whose elements have been “contaminated,” that is, have a mean different from zero. We establish some general conditions under which testing with a small risk is possible and others under which it is hopeless. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.
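
To make the testing problem in the abstract concrete, here is a minimal Python sketch. The abstract does not name a test statistic, so the scan statistic used here (the largest sum of components over the candidate sets) is an assumption, chosen because it is a standard choice for problems of this kind. The class of contiguous windows, the function names, and the values of n, k, and mu are all hypothetical and are there only for illustration.

```python
import random


def scan_statistic(x, sets):
    """Largest sum of the observed components over the candidate index sets."""
    return max(sum(x[i] for i in s) for s in sets)


def sample(n, contaminated=(), mu=0.0, rng=random):
    """Draw n independent unit-variance Gaussians.

    Components whose index is in `contaminated` have mean mu;
    all other components have mean zero (the null behaviour).
    """
    shifted = set(contaminated)
    return [rng.gauss(mu if i in shifted else 0.0, 1.0) for i in range(n)]


# Hypothetical class of sets: all contiguous windows of length k in {0, ..., n-1}.
n, k, mu = 50, 5, 3.0
windows = [tuple(range(start, start + k)) for start in range(n - k + 1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
null_x = sample(n, rng=rng)                                    # no contamination
alt_x = sample(n, contaminated=windows[10], mu=mu, rng=rng)    # one window shifted

print("scan statistic under the null:       ", scan_statistic(null_x, windows))
print("scan statistic under the alternative:", scan_statistic(alt_x, windows))
```

A test built on this statistic would reject the null when the value exceeds a threshold calibrated under the standard normal distribution. How large the contaminated mean must be for such a test to succeed depends on the structure of the class of sets, which is the question the paper addresses.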

Primary Subjects: 62F03
Secondary Subjects: 62F05
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1283175989
Digital Object Identifier: doi:10.1214/10-AOS817
Zentralblatt MATH identifier: 1200.62059
Mathematical Reviews number (MathSciNet): MR2722464


2013 © Institute of Mathematical Statistics
