The Sparse Poisson Means Model

We consider the problem of detecting a sparse Poisson mixture. Our results parallel those for the detection of a sparse normal mixture, pioneered by Ingster (1997) and Donoho and Jin (2004), when the Poisson means are larger than logarithmic in the sample size. In particular, a form of higher criticism achieves the detection boundary in the whole sparse regime. When the Poisson means are smaller than logarithmic in the sample size, a different regime arises in which, for sparse mixtures, simple multiple testing with Bonferroni correction suffices. We present numerical experiments that confirm our theoretical findings.


Introduction
The Poisson distribution is well suited to model count data in a broad variety of scientific and engineering fields. In this paper, we consider a stylized detection problem where we observe n independent Poisson counts X_1, ..., X_n from a mixture of the form (1), where ε ∈ [0, 1] is the fraction of non-null effects. All parameters are allowed to change with n. We are interested in detecting whether there are any non-null effects in the sample. Specifically, we know the null means λ_1, ..., λ_n, and our goal is to test the global null hypothesis that every count has its null mean. Put differently, we want to address the corresponding multiple hypotheses problem, with H_{0,i} asserting that X_i ~ Pois(λ_i). We assume that ε is the same for all i, although this is done merely for ease of exposition. This model may arise in goodness-of-fit testing for homogeneity in a Poisson process. Suppose we record the arrival times of alpha particles over a time period and we are interested in testing for uniformity. One way to do so is to partition the time period into non-overlapping intervals and count how many particles arrived within each interval. These counts can be modeled by a Poisson distribution. For this problem, and any other discrete goodness-of-fit testing problem, one would typically use Pearson's chi-squared test, but we show that, under some mild conditions, this test is (grossly) suboptimal in the sparse regime where ε = ε_n = o(1/√n). In another situation, we might be interested in detecting genes that are differentially expressed. Marioni et al. (2008) find that the variation of count data across technical replicates can be captured using a Poisson model when the over- (or under-) dispersion is not significant. Suppose we know the Poisson mean count for each gene expressed under normal conditions and want to detect a difference in expression under some other (treatment) condition.
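As a concrete illustration of the binning step in the goodness-of-fit example, here is a minimal sketch (function names and parameters are our own, not from the paper): partition [0, T) into n equal intervals and count arrivals per bin; under a homogeneous Poisson process the bin counts are i.i.d. Poisson.

```python
import random

# A sketch of the binning construction: partition [0, total_time) into n_bins
# equal intervals; under a homogeneous Poisson process with rate mu, the
# resulting counts are i.i.d. Pois(mu * total_time / n_bins).
def bin_counts(arrival_times, total_time, n_bins):
    counts = [0] * n_bins
    for t in arrival_times:
        i = min(int(t / total_time * n_bins), n_bins - 1)
        counts[i] += 1
    return counts

# Simulate arrivals from a homogeneous process with rate 50 on [0, 1).
random.seed(0)
arrivals, t = [], 0.0
while True:
    t += random.expovariate(50.0)
    if t >= 1.0:
        break
    arrivals.append(t)

counts = bin_counts(arrivals, 1.0, 10)
```

Testing for uniformity of the arrival times then reduces to testing whether each bin count has its null Poisson mean.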
In the model (1) we consider here, the sparsity assumption is on the number of nonzero effects, which on average is nε. We assume that ε → 0, so the number of nonzero effects is negligible compared to the number n of bins or genes being tested. So that there are some nonzero effects under the alternative, we assume throughout the paper that nε → ∞ (4). We note that sparsity here has a different meaning from its use in the literature on sparse multinomials (Holst, 1972; Morris, 1975), where the number of bins is large so that some bins have small expected counts.
The sparse Poisson mixture model we consider here is analogous to the sparse normal mixture model pioneered by Ingster (1997) and Donoho and Jin (2004), where the normal location family N(λ, λ) plays the role of the Poisson family Pois(λ). (We note that in the normal model one can work with N(µ, 1), µ = √λ, without loss of generality, while such a reduction does not apply to the Poisson model.) Our results for the Poisson model are completely parallel to those for the normal model when the Poisson means are large enough that the normalized counts are uniformly well-approximated by the standard normal distribution under the null. Specifically, we show that this is the case when min_i λ_i ≫ log n. (For two sequences (a_n), (b_n) ⊂ R_+, a_n ≫ b_n means that a_n/b_n → ∞.) In particular, we show that multiple testing via the higher criticism, which Donoho and Jin (2004) developed based on an idea of J. Tukey, is asymptotically optimal to first order, just as in the normal model. To show this, we take care in approximating the tails of the Poisson distribution with the tails of the normal distribution. This is done by standard moderate deviations bounds.
When the Poisson means are smaller, by which we mean smaller than logarithmic in the sample size, we uncover a different regime where multiple testing via Bonferroni correction is optimal in the sparse setting. In this regime, the normal approximation to the Poisson distribution is not uniformly valid, and in fact not valid at all for those indices i for which λ_i remains fixed. We use large deviations bounds to control the tails of the Poisson distribution.
In any case, we assume that the expected counts are bounded from below by a positive constant, concretely as stipulated in (8). This is partly to make the paper self-contained, and also because in practice it is common to pool bins together to make the expected counts larger than some pre-specified minimum.
The remainder of the paper is organized as follows. In Section 2, we derive information lower bounds under various conditions on the Poisson means. In Section 3, we study Pearson's chi-squared goodness-of-fit test as well as the max test, which is closely related to multiple testing with Bonferroni correction, showing that neither is optimal in all sparsity regimes. We then study the higher criticism and show that it is optimal in all sparsity regimes, matching the information bound to first order. In Section 4, we present numerical simulations to accompany our theoretical findings. The proofs are gathered in Section 5, we briefly touch on the one-sided setting in Section 6, and Section 7 is a discussion section.

Information Bounds
We are particularly interested in regimes where the proportion of non-null effects tends to zero as the sample size grows to infinity, i.e., ε → 0 as n → ∞. We follow the literature on the sparse normal mixture model (Cai et al., 2011; Donoho and Jin, 2004; Ingster, 1997). We parameterize ε via (9) and consider two regimes where the detection problem behaves quite differently: the sparse regime where β ∈ (1/2, 1) and the dense regime where β ∈ (0, 1/2). We then parameterize the Poisson means in (1) differently in each regime. When the λ_i's are relatively large, we are guided by the correspondence between the normal model and the Poisson model via the normalized counts (5). Suppose we know the fraction ε and all null and non-null Poisson rates. By the Neyman-Pearson fundamental lemma, the most powerful test for this simple-versus-simple hypothesis testing problem is the likelihood ratio test (LRT). Hence the performance of the LRT gives an information bound for this detection problem. We investigate this information bound by finding conditions under which the risk (the sum of the probabilities of type I and type II errors) of the LRT goes to one as n → ∞. We say a test is asymptotically powerful when its risk tends to zero and asymptotically powerless when its risk tends to one. All limits are with respect to n → ∞.
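For reference, the sparsity parameterization standard in this literature, which we take (9) to follow, is:

```latex
% Standard sparsity parameterization (Ingster, 1997; Donoho and Jin, 2004);
% beta in (0, 1/2) is the dense regime, beta in (1/2, 1) the sparse regime.
\varepsilon = \varepsilon_n = n^{-\beta}, \qquad \beta \in (0, 1).
```

This is consistent with the expected number of effects nε = n^{1−β} quoted in the simulations (about 4 anomalies when n = 10^6 and β = 0.9).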

Dense Regime
Guided by the correspondence with the normal model, in the dense regime where β < 1/2, we parameterize the effects as in (10), where s ∈ R is fixed, and define the corresponding detection boundary. Proposition 1. Consider the testing problem (3) with parameterizations (9) with β < 1/2 and (10). All tests are asymptotically powerless below the boundary (11). The expert will recognize the perfect correspondence with the detection boundary for the dense regime in the two-sided detection problem in the normal model.

Sparse Regime
Guided by the correspondence with the normal model, in the sparse regime where β > 1/2, we start by parameterizing the effects as in (13), where r ∈ (0, 1) is fixed, and define the corresponding detection boundary. Proposition 2. Consider the testing problem (3) with parameterizations (9) with β > 1/2 and (13) with (6). All tests are asymptotically powerless below the boundary (14). Thus, Propositions 1 and 2 together show that, when (6) holds, meaning that min_i λ_i ≫ log n, the detection boundary for the Poisson model is in perfect correspondence with that for the normal model.
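For reference, the sparse-regime detection boundary from the normal model of Ingster (1997) and Donoho and Jin (2004), with which Proposition 2 establishes the correspondence when min_i λ_i ≫ log n, is:

```latex
% Detection boundary for the sparse normal mixture (detection is possible to
% first order if and only if r > rho^*(beta)).
\rho^*(\beta) =
\begin{cases}
\beta - \tfrac{1}{2}, & \tfrac{1}{2} < \beta \le \tfrac{3}{4},\\[2pt]
\bigl(1 - \sqrt{1 - \beta}\bigr)^2, & \tfrac{3}{4} < \beta < 1.
\end{cases}
```

The second branch is the quantity appearing in the proof of Proposition 5, and the very sparse regime β ∈ (3/4, 1) is where the max test is later shown to be optimal.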
When the null means (λ_i : i = 1, ..., n) are smaller, a different detection boundary emerges in the sparse regime. To better describe the detection boundary that follows, we adopt the parameterization (16). Indeed, this particular case corresponds to ∆_i = λ_i^{1−γ}(log n)^γ, and assuming the λ_i's are smaller than log n, as we do, this implies that λ̄_i = 0, as a Poisson mean cannot be negative. Proposition 3. Consider the testing problem (3) with parameterizations (9) with β > 1/2 and (16) with (7) and (8). All tests are asymptotically powerless if γ < β.

Tests
In this section we analyze some tests that are shown to achieve parts of the detection boundary. We find that the chi-squared test achieves the detection boundary in the dense regime, the test based on the maximum normalized count (which is closely related to multiple testing with Bonferroni correction) achieves the detection boundary in the very sparse regime, while multiple testing with the higher criticism achieves the detection boundary in all regimes.

The chi-squared test
We start by analyzing Pearson's chi-squared test, which rejects for large values of D = Σ_{i=1}^n (X_i − λ_i)²/λ_i. The rationale behind using this test is two-fold. On the one hand, D = Σ_i Z_i², where the Z_i's are defined in (5), is the analog of the chi-squared test that plays a role in detecting a normal mean in the dense regime. On the other hand, this is one of the most popular approaches to goodness-of-fit testing if one interprets X_1, ..., X_n as the counts in a sample of size N ~ Pois(Σ_i λ_i) with values in {1, ..., n}.
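A minimal sketch of this statistic (the variable names are ours), matching the expression D = Σ_i (X_i − λ_i)²/λ_i used in the proof of Proposition 4:

```python
# Pearson's chi-squared statistic against known null means; large values reject.
def chi_squared_stat(counts, null_means):
    return sum((x - lam) ** 2 / lam for x, lam in zip(counts, null_means))
```

For example, with null means (5, 5) and observed counts (6, 4), the statistic equals 1/5 + 1/5 = 0.4.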
Although we could state a more general result, we opt for simplicity and state a performance bound when the expected counts are not too small. Proposition 4. Consider the testing problem (3) with (8), and let a_i = ∆_i²/λ_i. Then the chi-squared test is asymptotically powerful if (18) holds and asymptotically powerless if (19) holds. From this, we immediately obtain the following result, which at once states that the chi-squared test achieves the detection boundary in the dense regime, and does not achieve the detection boundary in the sparse regime.
Other classical goodness-of-fit tests include the (generalized) likelihood ratio G² test and the Freeman-Tukey test, each of which, adapted to our context, rejects for large values of its own statistic. We did not investigate these tests in detail, but partial work suggests that they are (as expected) equivalent to the chi-squared test in the regimes we are most interested in.

The max test
In analogy with the normal model, we consider the max test, which rejects for large values of max_i |Z_i|, where the Z_i's are defined in (5).
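A minimal sketch of the max statistic (our own naming), with Z_i = (X_i − λ_i)/√λ_i as in (5):

```python
import math

# Max test statistic: the largest absolute normalized count; large values reject.
def max_test_stat(counts, null_means):
    return max(abs(x - lam) / math.sqrt(lam)
               for x, lam in zip(counts, null_means))
```

For example, with null means (4, 4) and counts (4, 9), the statistic is |9 − 4|/2 = 2.5.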
Hence, the max test achieves the detection boundary (14) in the very sparse regime where β ∈ (3/4, 1). We speculate that, just as in the normal model, the max test does not achieve the detection boundary when β < 3/4.

The higher criticism test
In the normal model, Donoho and Jin (2004) advocate a test based on the normalized empirical process of the Z_i's. In our case, these variables are not identically distributed. It would then make sense to convert them to P-values, and we comment on that in Section 3.4. For now, we opt for the definition of the statistic T given in (23). The higher criticism test rejects for large values of T. This definition extends the higher criticism of Donoho and Jin (2004), in particular the variant HC+, to the case where the test statistics are not identically distributed under the null and cannot be transformed to be so. The discretization of the supremum makes the control under the null particularly simple.
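The statistic (23) is tailored to non-identically distributed test statistics. As a point of reference only, here is a minimal sketch (our own implementation, not (23) itself) of the classical HC+ of Donoho and Jin (2004) computed on sorted P-values:

```python
import math

# Classical HC+ statistic on P-values p_(1) <= ... <= p_(n), restricted to
# 1/n < p_(i) <= 1/2 (the "plus" variant); large values reject.
def higher_criticism_plus(pvalues):
    p = sorted(pvalues)
    n = len(p)
    best = -math.inf
    for i, pi in enumerate(p, start=1):
        if pi <= 1.0 / n or pi > 0.5:
            continue
        best = max(best, math.sqrt(n) * (i / n - pi)
                   / math.sqrt(pi * (1 - pi)))
    return best if best != -math.inf else 0.0
```

The statistic compares the empirical fraction of small P-values, i/n, with its null expectation p_(i), normalized by the binomial standard deviation.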
We speculate that, just as in the normal model, the higher criticism is also able to achieve the detection boundary in the dense regime.

Multiple testing: Fisher, Bonferroni and Tukey
We now take a multiple testing perspective. In multiple testing jargon, our null hypothesis H_0 is the complete null, since all the individual null hypotheses hold under it. Several definitions of P-values are possible here. We define the P-value for the ith hypothesis testing problem as in (24). There does not seem to be a consensus on the definition of a P-value for asymmetric discrete null distributions (Dunne et al., 1996). We speculate that any reasonable definition leads to the same asymptotic results in our context. We note that the p_i's are independent, but they are discrete, and therefore not uniformly distributed in (0, 1) under the complete null. In fact, they are not even identically distributed unless the λ_i's are all equal. That said, for each i, the null distribution of p_i stochastically dominates the uniform distribution.
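As an illustration of such a two-sided P-value for a discrete, asymmetric null (one common convention, sketched under our own naming; the paper's own definition is its display (24)), one can double the smaller of the two tail probabilities and cap at one:

```python
import math

def poisson_cdf(k, lam):
    # P(Pois(lam) <= k) by direct summation of the pmf (fine for moderate lam).
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j
        total += term
    return min(total, 1.0)

def two_sided_pvalue(x, lam):
    # Double the smaller tail, capped at 1; discrete, hence stochastically
    # dominates the uniform distribution under the null.
    lower = poisson_cdf(x, lam)            # P(X <= x)
    upper = 1.0 - poisson_cdf(x - 1, lam)  # P(X >= x)
    return min(1.0, 2.0 * min(lower, upper))
```

For instance, observing x = 0 under λ = 1 gives P-value 2e^{-1} ≈ 0.74, while x = 5 under λ = 1 gives a P-value below 0.01.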
Lemma 1. (Lehmann and Romano, 2005, Lem 3.3.1) For any λ > 0 and any u ∈ [0, 1], P_λ(p_i ≤ u) ≤ u. With P-values now defined, we can draw from the literature on multiple comparisons and make correspondences with the tests that we studied in the previous sections.

Fisher's method
The chi-squared test is, in our context, intimately related to multiple testing with Fisher's method, which rejects the complete null for large values of the combination statistic (25), namely −2 Σ_i log p_i. We speculate that, like Pearson's chi-squared test, Fisher's method achieves the detection boundary in the dense regime. We were able to prove this in the simpler one-sided setting. Details are postponed to Section 6.
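A minimal sketch of Fisher's combination statistic (our own naming):

```python
import math

# Fisher's method: combine independent P-values via -2 * sum_i log p_i;
# large values reject the complete null.
def fisher_stat(pvalues):
    return -2.0 * sum(math.log(p) for p in pvalues)
```

Under the complete null with continuous uniform P-values the statistic would be chi-squared with 2n degrees of freedom; here the p_i's are discrete, so in the simulations the null distribution is calibrated by Monte Carlo instead.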

Bonferroni's method
The max test is, in turn, intimately related to multiple testing with Bonferroni's method, which rejects the (complete) null for small values of min i=1,...,n p i .
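A minimal sketch of Bonferroni's rejection rule at level α (our own naming):

```python
# Bonferroni's method: reject the complete null when the smallest P-value
# falls below alpha / n, which controls the family-wise error rate at alpha.
def bonferroni_reject(pvalues, alpha=0.05):
    n = len(pvalues)
    return min(pvalues) <= alpha / n
```

With four P-values and α = 0.05, the rule rejects only if some p_i ≤ 0.0125.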
In fact, the two procedures are identical when the λ_i's are all equal. One can show that Proposition 5 applies to the Bonferroni test as well. Instead of formally proving this, we focus on complementing the lower bound established in Proposition 3.
We note that the same is true if we merely focus on the large Z_i's, that is, if we replace the two-sided P-values p_i with their one-sided counterparts (26). In fact, one cannot exploit the assumption that λ̄_i = 0 for all i. Indeed, the test that rejects for large values of Y := #{i : X_i = 0} is asymptotically powerless. This follows from an application of Lemma 5. By a simple application of Lyapunov's central limit theorem and (8), Y is asymptotically normal both under the null and the alternative. Moreover, using (7) and (8), and after some simple calculations, we can easily check that the conditions of Lemma 5 are satisfied when β > 1/2.

Tukey's higher criticism
This brings us back to the higher criticism, which in some sense is an intermediate method between Fisher's and Bonferroni's methods. Donoho and Jin (2004) attribute to Tukey the idea of testing the complete null based on the maximum of the normalized empirical process of the P-values, which equivalently leads to rejecting for large values of the statistic (27).

Simulations
We present the results of some numerical experiments whose purpose is to examine the behavior of the various tests in finite samples. So that the asymptotic analysis is relevant, we chose to work with n = 10^4 and n = 10^6. In some bioinformatics/genetics applications, n could be in the millions. We compare the tests in terms of their power when the level is controlled at α = 0.05 by simulation. (We generate the test statistic 500 times under the null and take the (1 − α)-quantile as the critical value.) The power against a particular alternative is then obtained empirically from 200 repeats. We note that, for the higher criticism, we work with the P-values defined in (24) and their corresponding null distributions, using the statistic (28) with T := {t ∈ (0, 1) : 1/n ≤ F_i(t) ≤ 1/2, i = 1, ..., n}. We note that (28) is a generalized form of Tukey's higher criticism (27) for the case where the p_i's are not identically distributed. We thus find (28) more natural than (23), but the two are very closely related and the latter is more easily amenable to mathematical analysis. In practice, we estimate F_i by simulation.
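The calibration-by-simulation step can be sketched as follows (the toy statistic and samplers are ours, purely for illustration): draw the statistic repeatedly under the null, take the empirical (1 − α)-quantile as the critical value, then estimate power as the rejection rate under the alternative.

```python
import random

# Monte Carlo calibration: critical value = empirical (1 - alpha)-quantile
# of the test statistic under the null.
def empirical_critical_value(stat_fn, null_sampler, n_rep=500, alpha=0.05):
    draws = sorted(stat_fn(null_sampler()) for _ in range(n_rep))
    k = min(n_rep - 1, int((1 - alpha) * n_rep))
    return draws[k]

# Empirical power: fraction of alternative draws exceeding the critical value.
def empirical_power(stat_fn, alt_sampler, critical_value, n_rep=200):
    hits = sum(stat_fn(alt_sampler()) > critical_value for _ in range(n_rep))
    return hits / n_rep

# Toy example: the statistic is the max of 10 observations; the null is
# uniform on (0, 1), the alternative is uniform shifted up by 0.2.
random.seed(1)
cv = empirical_critical_value(max, lambda: [random.random() for _ in range(10)])
power = empirical_power(max, lambda: [random.random() + 0.2 for _ in range(10)], cv)
```

The same scheme applies verbatim to any of the statistics above once a null sampler for the Poisson counts is supplied.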
In the first set of experiments, we investigate how the performance of the tests matches the theoretical information boundary (11). We set n = 10^6, all λ_i's equal to λ_0 = 15 > log(n) ≈ 14, and vary β in the range (0, 0.5) with increments of 0.025 and s in the range [−0.5, 0] with increments of 0.025. When the λ_i's are all equal, Bonferroni's method is equivalent to the max test, and is therefore omitted. The results are summarized in Figure 1. The phase transition phenomenon is clearly visible. The performance of the chi-squared test and Fisher's method are similar, comparable with the higher criticism, and these tests achieve the asymptotic detection boundary. As expected, the max test has hardly any power in the dense regime. We note that very similar trends are observed in the normal means model.
In the second set of experiments, we generate settings where the λ_i's are different. We take n = 10^4 and fix β = 0.2, and the λ_i's are generated iid from λ_0 + Exp(λ_0), where Exp(λ) denotes the exponential distribution with mean λ, and we let λ_0 ∈ {1, 10, 100}. The results are summarized in Figure 2. The chi-squared test and Fisher's method perform similarly and are the best, closely followed by the higher criticism. The max test and Bonferroni's method perform similarly and poorly, as expected. The effect of λ_0 does not seem important.

In the sparse regime
In the sparse regime, we have (9) with β ∈ (1/2, 1) and the parameterization (2) with (13). The experiments are otherwise parallel to those performed in the dense regime.
In the first set of experiments, we set n = 10^6, all means equal to λ_0 = 15, and vary β in the range [0.5, 1] with increments of 0.025 and r in the range [0, 1] with increments of 0.05. The results are summarized in Figure 3. While the chi-squared test is not competitive, as expected, the higher criticism has more power in the moderately sparse regime where β ∈ (0.5, 0.75), while the max test is clearly the best in the very sparse regime where β ∈ (0.75, 1). The asymptotic detection boundary is seen to be fairly accurate, although less so as β approaches 1, where the asymptotics take longer to come into effect. (For example, when n = 10^6 and β = 0.9, there are only n^{1−0.9} ≈ 4 anomalies.) We note that very similar trends are observed in the normal means model.
In the second set of experiments, we set n = 10^4 and β = 0.6 (moderately sparse) or β = 0.8 (very sparse), and the λ_i's are generated iid from λ_0 + Exp(λ_0), where λ_0 ∈ {1, 10, 100}. The simulation results are reported in Figure 4 and Figure 5. The max test and Bonferroni's method perform similarly, and dominate in the very sparse regime. The chi-squared test is somewhat better than Fisher's method, and in some measure competitive in the moderately sparse regime, but essentially powerless in the very sparse regime. The higher criticism is the clear winner in the moderately sparse regime, as expected, and holds its own in the very sparse regime, although it is clearly inferior to the max test. Comparing the results for different λ_0, we may conclude that, in the sparse regime, smaller counts (i.e., small λ_0) make the problem more difficult, at least in this finite-sample setting.

When X and Y are random variables, X ∼ Y means they have the same distribution. For a random variable X and a distribution F, X ∼ F means that X has distribution F. For a sequence of random variables (X_n) and a distribution F, X_n ⇒ F means that X_n converges in distribution to F. Everywhere, we identify a distribution with its cumulative distribution function. For a distribution F, F̄(x) = 1 − F(x) denotes its survival function. We say that an event E_n holds with high probability (w.h.p.) if P(E_n) → 1 as n → ∞.
We let P_0, E_0, Var_0 (resp. P_{0,i}, E_{0,i}, Var_{0,i}) and P_1, E_1, Var_1 (resp. P_{1,i}, E_{1,i}, Var_{1,i}) denote the probability, expectation and variance under the null (resp. null at observation i) and alternative (resp. alternative at observation i), respectively. Recall that Υ_λ denotes a random variable with the Poisson distribution with mean λ, denoted P_λ, so that for a set A, P_λ(A) = P(Υ_λ ∈ A).

Preliminaries
We state here a few results that will be used later on in the proofs of the main results stated earlier in the paper. We start with a couple of facts about the Poisson distribution.
The following are moderate deviation bounds for the Poisson distribution Pois(λ) as λ → ∞.
Lemma 2. Let a : (0, ∞) → (0, ∞) be such that a(λ) → ∞ and a(λ)/λ → 0 as λ → ∞. Then the stated moderate deviation asymptotics hold. Proof. We focus on the first statement. Let m = [λ] and take Y_1, ..., Y_{m+1} iid Poisson with mean 1. Fixing ε ∈ (0, 1), we have the stated bound, where in the first inequality we used the fact that Υ_λ is stochastically bounded from above by Σ_{i=1}^{m+1} Y_i, and in the second inequality we used the union bound. By (Dembo and Zeitouni, 1998, Th 3.7), and using the fact that P(Υ_1 ≥ x)/P(Υ_1 = x) → 1 as x → ∞, we obtain the corresponding estimates. Since a(λ) = o(m), we have II = o(I), and conclude the claimed bound on the lim sup; because ε > 0 is arbitrary, we may take ε = 0 in this last display. The reverse inequality is proved similarly.
The following are concentration bounds for the Poisson distribution. For a real x, let ⌈x⌉ (resp. ⌊x⌋) denote the smallest (resp. largest) integer greater (resp. smaller) than or equal to x. Lemma 3. For x ≥ 0, define h(x) = x log(x) − x + 1, with h(0) = 1 by continuity. Then, for any λ > 0, the stated tail bounds hold. Proof. The upper bounds result from a straightforward application of Chernoff's bound. For the first lower bound, take x ≥ λ and let m = ⌈x⌉. Then the claimed inequality follows. The second lower bound is proved similarly.
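The Chernoff upper-tail bound of Lemma 3 can be checked numerically; the following sketch (our own, with h(0) := 1 taken as the continuity limit) verifies P(Pois(λ) ≥ x) ≤ exp(−λ h(x/λ)) for x ≥ λ:

```python
import math

# Rate function for the Poisson distribution: h(x) = x*log(x) - x + 1.
def h(x):
    return 1.0 if x == 0 else x * math.log(x) - x + 1.0

def poisson_sf(x, lam):
    # P(Pois(lam) >= x) = 1 - P(Pois(lam) <= x - 1), by direct summation.
    if x < 1:
        return 1.0
    term, total = math.exp(-lam), math.exp(-lam)
    for j in range(1, x):
        term *= lam / j
        total += term
    return max(0.0, 1.0 - total)

# Verify the upper-tail Chernoff bound for lam = 5 and x = 5, ..., 20.
lam = 5.0
checks = [(x, poisson_sf(x, lam) <= math.exp(-lam * h(x / lam)))
          for x in range(5, 21)]
```

For instance, at λ = 5 and x = 10, the exact tail is about 0.032 while the bound gives exp(−5 h(2)) ≈ 0.145.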
Lemma 4. There is a universal constant C > 0 such that the stated normal approximation bound holds. Proof. Let m = ⌈λ⌉ be the smallest integer greater than or equal to λ. It is enough to prove the result when λ ≥ 1, in which case 1/2 ≤ λ/m ≤ 1. Take Y_1, ..., Y_m iid Pois(λ/m), so that Υ_λ ∼ Σ_{i=1}^m Y_i. The result now follows from the Berry-Esseen theorem.
The following lemma is standard, and appears for example in (Arias-Castro and Wang, 2013).
Lemma 5. Consider a test that rejects for large values of a statistic T_n with finite second moment, both under the null and alternative hypotheses. Then the test that rejects when T_n is large enough is asymptotically powerful if the difference in means dominates the standard deviations. Assume in addition that T_n is asymptotically normal, both under the null and alternative hypotheses. Then the test is asymptotically powerless if the difference in means is negligible compared to the standard deviations. Finally, we state without proof the following simple result.

Proof of Proposition 1
Here we use the second moment method without truncation, which amounts to proving that Var_0(L) → 0, or equivalently, E_0(L²) ≤ 1 + o(1), where L is the likelihood ratio, a product of the individual likelihood ratios L_i defined in (31). We have E_0(L_i²) = 1 + a_n, where a_n := ε²(cosh(n^{2s}) − 1).
In the third line we used the fact that Σ_{x=0}^∞ λ^x/x! = e^λ for all λ ∈ R, and in the fourth line we used (10). Condition (12) and the fact that β < 1/2 imply that s < 0, and a Taylor expansion gives a_n ≤ n^{−2β+4s}, eventually. We deduce that E_0(L²) ≤ (1 + a_n)^n, and the RHS tends to 1 when na_n → 0, which is the case because of (12).

Proof of Proposition 2
We use the truncated second moment method of Ingster in the form put forth by Butucea et al. (2013). Define the truncation event as displayed, where η > 0 is chosen small enough that (33) and (34) hold simultaneously.
Define the truncated likelihood function, where L_i is defined in (31). As in Butucea et al. (2013), it suffices to prove the corresponding first and second moment bounds. First moment. Applying Lemma 2, using (13) and the fact that λ̄_i ∼ λ_i ≫ log n because of (6), we get the stated estimate uniformly over i = 1, ..., n. The claimed first moment bound then follows from the expression for ε.
Second moment. We have the displayed decomposition, where in the third line we used the fact that (a + b)² ≤ 2a² + 2b² for all a, b ∈ R.
When η = 0, the exponent equals the displayed quantity. Hence, when η > 0 is small enough, the displayed bound holds. We conclude that E_0(L̃_i²) ≤ 1 + o(n^{−1}), uniformly in i, which implies the desired second moment bound.

Proof of Proposition 3
The proof parallels that of Proposition 2. Here we define the truncation event with a small positive constant c that will be chosen later on, and consider the corresponding truncated likelihood. First moment. Taking into account the fact that λ̄_i = 0, it suffices to prove the stated bound. Using Lemma 3, we get the required logarithmic estimate as soon as ζ_min/log(ζ_min) is large enough. This implies that the maximum tends to infinity eventually, and using Lemma 3 again, we get the analogous estimate as soon as ζ_min^{1−γ}/log(ζ_min) is large enough. Since γ < β by assumption, the claimed bound involving ε times the maximum follows. Second moment. Taking into account the fact that λ̄_i = 0, it suffices to prove the stated bound uniformly over i = 1, ..., n. We quickly see how to control the first term. For the other term, we distinguish two cases.

Proof of Proposition 4
Using the displayed expressions, for the Poisson model (1), and after some simple but tedious calculations, the remainder is bounded as displayed for some universal constant C > 0, using (8). Because of (8), we have Σ_{i=1}^n 1/λ_i = O(n), and then, by (18), we have ε Σ_{i=1}^n a_i → ∞. With this and the second part of (18), it becomes straightforward to see that the first part of Lemma 5 applies, and we conclude.
We now prove that the chi-squared test is asymptotically powerless under (19). For one thing, this condition implies that Var_1(D) ∼ Var_0(D), based on (19) and the bound on R above, and also that E_1(D) − E_0(D) = o(√(Var_1(D) ∨ Var_0(D))). It therefore suffices to prove that D is asymptotically normal both under the null and under the alternative. We have D = Σ_i Z_i², where Z_i² := (X_i − λ_i)²/λ_i, and these being independent random variables, it suffices to verify Lyapunov's condition. Some straightforward calculations yield the displayed bound for some constant C > 0, and using (8), we get the required estimate under the null. With some more work, and using (8), we also obtain the analogous bound for some constant C > 0 under the alternative, so that Lyapunov's condition holds there as well, which is an immediate consequence of (19).

Proof of Proposition 5
When r > (1 − √(1 − β))², there exists a δ > 0 such that the inequality holds with some margin. Define the threshold c_n = 2(1 + δ) log(n). Under the null, by the union bound and Lemma 2, under (6), the test does not reject with probability tending to one. Under the alternative, let I = {i : X_i ∼ Pois(λ̄_i)}. Note that λ_i h(X_i/λ_i) ≥ log(n/ω_n) implies the displayed inequality, where the equality is due to the fact that, necessarily, X_i ≥ 3λ_i eventually, and the inequality comes from Lemma 3. Thus, defining q_i = P(λ_i h(Υ_{λ_i}/λ_i) ≥ log(n/ω_n)), we arrive at the displayed bound, where q_min := min_{i=1,...,n} q_i, and in the last line we used the fact that |I| ∼ Bin(n, ε/2), so that |I| ≥ nε/4 with probability tending to one. Note that, for t ≥ 0, h^{−1}(t) is defined as the unique x ≥ 1 such that h(x) = t, and that h^{−1}(t) ∼ t/log t as t → ∞. Let ζ_i = log n/λ_i, so that ζ_min := min_i ζ_i → ∞ when (7) holds. Applying the first lower bound in Lemma 3, we get q_min ≥ n^{γ−1+o(1)}, implying that n ε q_min ≥ n^{γ−β+o(1)} → ∞, because γ > β by assumption. We conclude that P_1(min_i p_i > ω_n/n) = o(1), as we needed to prove.
The One-Sided Setting

Up until now, we considered a two-sided setting, partly motivated by the important example of goodness-of-fit testing, where Pearson's chi-squared test is omnipresent. Simpler is a one-sided setting, where instead of (1) we have the model (36), together with λ̄_i = λ_i + ∆_i and ε ∈ [0, 1], and we address the problem (3) in this context. Such a model may be relevant in some image processing applications where the goal is to detect an anomaly in the form of pixels with higher intensity.

Dense Regime
In the dense regime where (9) holds with β < 1/2, we consider the same parameterization (10), and define the corresponding one-sided boundary ρ_one^dense. Proposition 8. Consider the testing problem (3) in the one-sided setting (36), with parameterizations (9) with β < 1/2 and (10). All tests are asymptotically powerless if s < ρ_one^dense(β). The proof is parallel to that of Proposition 1, in fact simpler, and is omitted. We note that this detection boundary is in direct correspondence with that in the normal model (Cai et al., 2011).
In the one-sided setting, the chi-squared test does not achieve the detection boundary. However, its one-sided version does. Indeed, consider the test that rejects for large values of the one-sided statistic (39). Proposition 9. Consider the testing problem (3) in the one-sided setting (36), with (8), and let a_i = ∆_i²/λ_i as before. The test based on (39) is asymptotically powerful if (18) holds. In particular, with parameterization (9) with β < 1/2 and (10), the test is asymptotically powerful when s > ρ_one^dense(β).
The proof is parallel to, and in fact much simpler than, that of Proposition 4, and is omitted.
All the arguments are simpler in the one-sided setting, so much so that we are able to analyze Fisher's method. In the one-sided setting, instead of (24), define the P-values as in (26). Note that Lemma 1 still applies.
To streamline the proof, which is somewhat long and technical, we implicitly focused on the most interesting case where the a_i's are bounded, but this is not intrinsic to the method. In fact, the test has increasing power with respect to each a_i. The technical proof is detailed in Section 6.3.

Sparse Regime
In the sparse regime, the same results apply. In particular, the detection boundary described in Propositions 2 and 3 applies. The max test, now based on max_i Z_i, and Bonferroni's method achieve the detection boundary in the very sparse regime (β > 3/4). The higher criticism is now based on the one-sided P-values, with definition (26), and it achieves the detection boundary over the whole sparse regime (β > 1/2). The technical arguments are parallel, and in fact simpler, and are omitted.

Proof of Proposition 10
Let V be the statistic (25). We seek to apply Lemma 5, which is based on the first two moments, under the null and under the alternative. In what follows, λ ≥ 1 and λ̄ = λ + a√λ with 0 < a ≤ 1, and C > 0 denotes a universal constant.

Conclusion.
Since the test has increasing power with respect to each a_i, we may assume that a_i ≤ 1 for all i. Let F_{λ_i} = −2 log Ḡ_{λ_i}(X_i) and notice that V = Σ_i F_{λ_i} is our test statistic. We have the displayed bounds on the means, while Var_0(V) as well as Var_1(V) are of the displayed order. By Lemma 5, we conclude that the test is asymptotically powerful in the claimed regime.

Discussion
We drew a strong parallel between the Poisson means model and the normal means model. The correspondence is in fact exact when all the λ_i's are at least logarithmic in n. When the λ_i's are smaller, we uncovered a new detection boundary in the sparse regime. We studied the chi-squared test, the max test and the higher criticism, which are shown here to have properties similar to those in the normal model. Motivated by the higher criticism, we also advocated a multiple testing approach to the Poisson means model, and studied emblematic approaches such as Fisher's and Bonferroni's methods, which are indeed shown to achieve the detection boundary in some regimes/models. An open direction is to adapt the method of Meinshausen and Rice (2006) for estimating the number of non-null effects to the Poisson means model.

Figure 2: Simulation results in the dense regime, with n = 10^4, β = 0.2, and the λ_i's generated iid from λ_0 + Exp(λ_0) (see text).
Figure 1: Simulation results in the dense regime, with n = 10^6 and all λ_i's equal to λ_0 = 15. The blue line is the information boundary (11).

Figure 4 and Figure 5: Simulation results in the sparse regime, with n = 10^4, β = 0.6 or β = 0.8, and the λ_i's generated iid from λ_0 + Exp(λ_0) (see text).
Figure 3: Simulation results in the sparse regime, with n = 10^6 and all λ_i's equal to λ_0 = 15. The blue line is the information boundary (14). The dashed blue curve for the max test is the boundary that it can achieve.