Source: Ann. Statist. Volume 38, Number 5 (2010), 3063–3092.
We study a class of hypothesis testing problems in which, upon observing the realization of an n-dimensional Gaussian vector, one has to decide whether the vector was drawn from a standard normal distribution or, alternatively, whether there is a subset of the components, belonging to a certain given class of sets, whose elements have been “contaminated,” that is, have a mean different from zero. We establish some general conditions under which testing with a small risk is possible and others under which it is hopeless. The combinatorial and geometric structure of the class of sets is shown to play a crucial role. The bounds are illustrated on various examples.