Optimal weighting for false discovery rate control

How should one weight the Benjamini-Hochberg procedure? In the context of multiple hypothesis testing, we propose a new step-wise procedure that controls the false discovery rate (FDR) and we prove that it is more powerful than any weighted Benjamini-Hochberg procedure. Both finite-sample and asymptotic results are presented. Moreover, we illustrate the good performance of our procedure in simulations and in a genomics application. This work is particularly useful in the case of heterogeneous $p$-value distributions.


Introduction
In many practical situations, e.g. functional Magnetic Resonance Imaging (fMRI) or microarray data, the problem of testing a large number $m$ of null hypotheses simultaneously arises. Since the Neyman-Pearson approach is the most commonly used strategy for single testing, much research has focused on generalizing this approach to the multiple testing case. First, one should choose a global type I error rate to be controlled, such as the probability of making at least one false discovery (family-wise error rate, FWER) or, more recently, the mean proportion of false discoveries among all the discoveries (false discovery rate, FDR, see Benjamini and Hochberg (1995)). Second, one should build a procedure that controls the chosen global type I error rate. For instance, Benjamini and Hochberg (1995) proved that the linear step-up procedure (LSU) controls the FDR when the underlying tests are independent. Third, one should show that the obtained procedure has good power, the power being generally defined as the expected number of true discoveries.
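For concreteness, the linear step-up procedure can be sketched in a few lines of code; this is an illustrative implementation (the function name and the example values are ours, not taken from the literature):

```python
import numpy as np

def bh_step_up(pvals, alpha=0.05):
    """Linear step-up (Benjamini-Hochberg) procedure.

    Let p_(1) <= ... <= p_(m) be the ordered p-values and k the largest
    index with p_(k) <= alpha * k / m; the procedure rejects the k
    hypotheses with the smallest p-values (none if no such k exists).
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

Under independence, this procedure has FDR at most $\pi_0 \alpha \le \alpha$.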
To our knowledge, while the first two points above are widely studied (e.g. building FDR controlling procedures, see e.g. Benjamini and Yekutieli (2001); Storey (2002); Sarkar (2002)), the last point is most of the time evaluated with simulations, without full theoretical support. Only a few works have rigorously studied the optimality of certain classes of multiple testing procedures (see Lehmann et al. (2005); Wasserman and Roeder (2006); Rubin et al. (2006); Storey (2007); Finner et al. (2009)).
Maximizing the power while controlling the FDR remains a difficult task, because the FDR involves a random denominator (the number of discoveries). The present paper contributes to this maximization problem, in the simple case where the null and alternative distributions are known. This framework is natural for the power maximization of tests, as it is also the one used in the Neyman-Pearson lemma for single testing. Although it leads to oracle procedures, it can be used in practice as soon as the null and alternative distributions are estimated or guessed reasonably accurately from independent data.
More formally, assume that each hypothesis is tested using a test statistic that can be transformed into a p-value $p_i$, and denote by $F_i$ the alternative c.d.f. of $p_i$. In general, the $F_i$'s can be very different (e.g. with heterogeneous underlying data) and the p-values cannot be considered interchangeable. Therefore, a p-value weighting approach seems appropriate to improve the performance of a multiple testing procedure. This technique, which can be traced back to Holm (1979), consists in replacing each original p-value $p_i$ in input by the weighted p-value $p'_i = p_i/w_i$, for some weight vector $(w_1, \dots, w_m)$ summing to $m$. Here, we focus on the weighted version of the LSU procedure that was proposed in Genovese et al. (2006) (see also Blanchard and Roquain (2008)). In the latter paper, it was demonstrated that the weighted LSU still controls the FDR for any weighting (under independence between the p-values), and that some of these procedures can improve the power of the LSU asymptotically. In the present paper, we aim to find the most powerful procedure among all the weighted LSU procedures, or more precisely, to find a procedure that mimics the best procedure among the weighted LSU procedures. Moreover, this procedure should be computable from the p-value distributions, i.e. the $F_i$'s.
For the weighted version of the FWER-controlling Bonferroni procedure, Wasserman and Roeder (2006) (see also Rubin et al. (2006)) derived the optimal weighting. In Storey (2007), an optimal procedure was also proposed, maximizing the expected number of true discoveries while controlling the expected number of false discoveries. All these procedures use deterministic thresholds, which makes the power maximization feasible. However, in the case of the FDR-controlling weighted LSU, the threshold depends on the final number of discoveries and a power maximization seems very difficult to carry out, even in the asymptotic framework where the number of p-values $m$ tends to infinity (see Genovese et al. (2006)).
The main idea of this paper is to find the optimal weights simultaneously for all the possible rejection proportions $u \in [0, 1]$. These multi-weights are then collected in optimal weight functions $u \mapsto W^\star_i(u)$, which in turn are sequentially integrated into a step-up procedure. While the LSU procedure uses the threshold function $u \mapsto \alpha u$, we find that the new procedure uses a threshold function $u \mapsto \alpha u W^\star_i(u)$ that is not necessarily linear (and depends on the $F_i$'s). The new procedure, called the "optimal multi-weighted step-up procedure", will be presented in detail in Section 3. In Section 4, we show that it enjoys the following properties: (i) FDR control for a finite number of hypotheses, up to slight modifications; (ii) power optimality for a finite number of hypotheses, up to error terms; (iii) power optimality without error term and FDR control without modification when the number of hypotheses $m$ tends to infinity (in a specific asymptotic setting).
These results are established in two different (classical) models for the p-values, both assuming independence between the p-values. The results (ii) and (iii) additionally use that the $F_i$'s are strictly concave functions and that the maximization of the power at any rejection proportion is feasible, which remain quite mild assumptions.
In Section 5, we present a simulation study which exhibits the behavior of the new procedure when the $F_i$'s are correctly specified or misspecified. Section 6 discusses some applications and our conclusions are given in Section 7. All our results are proved in Section 8, while some technical parts are gathered in the Appendix. Our proofs mainly use the "self-consistency condition" introduced in Blanchard and Roquain (2008) (see also Finner et al. (2009)) and Hoeffding's inequality (see Hoeffding (1963)).

Models for the p-values
Let us first define the two different models for the p-values that will be used throughout the paper.
We consider a finite set of $m$ null hypotheses on a probability space and we let $H_i := 0$ (resp. $1$) if the $i$-th null hypothesis is true (resp. false). Letting $H := (H_i)_{1 \le i \le m} \in \{0, 1\}^m$, we denote by $\mathcal{H}_0 := \{i \in \{1, \dots, m\} \mid H_i = 0\}$ the set corresponding to the true null hypotheses and by $m_0 := |\mathcal{H}_0|$ its cardinality. Analogously, we define $\mathcal{H}_1 := \{i \in \{1, \dots, m\} \mid H_i = 1\}$ and $m_1 := |\mathcal{H}_1|$ for the alternative hypotheses. Since $\mathcal{H}_1$ is the complement of $\mathcal{H}_0$ in $\{1, \dots, m\}$, we have $m_1 = m - m_0$. The proportion of true nulls (resp. false nulls) is denoted by $\pi_0 := m_0/m$ (resp. $\pi_1 := m_1/m$) as usual. We suppose that for the $i$-th null hypothesis we are given a p-value $p_i$, i.e. a measurable function from the observation space into $[0, 1]$ such that the distribution of $p_i$ is uniform on $[0, 1]$ when the $i$-th null hypothesis is true:
$$\forall i \in \mathcal{H}_0, \ \forall t \in [0, 1], \quad P(p_i \le t) = t. \qquad (1)$$
Under the alternative, we denote by $F_i$ the cumulative distribution function of $p_i$: $\forall i \in \mathcal{H}_1, \forall t \in [0, 1], F_i(t) := P(p_i \le t)$. In our setting, the $F_i$'s are allowed to be different and we denote by $F := (F_i)_{i \in \mathcal{H}_1}$ the family of alternative c.d.f.'s. The p-values are assumed mutually independent. The latter model has parameters $(H, F)$ and will be referred to throughout the paper as the conditional model (because it uses a fixed vector $H$).
Additionally, we will consider the so-called random effects model (see e.g. Efron et al. (2001); Storey (2003); Genovese and Wasserman (2004)). In this model, $H$ is generated independently of all other random variables, from $m$ i.i.d. Bernoulli priors. The probability for a null to be true (resp. false) is denoted by $\pi_0 := P(H_i = 0) \in (0, 1)$ (resp. $\pi_1 := 1 - \pi_0$). Then, the p-values are assumed to follow the conditional model conditionally on $H$: the p-values are mutually independent conditionally on $H$, and each $p_i$ is uniform conditionally on $H_i = 0$ (i.e. satisfies (1) conditionally on $H_i = 0$) and has alternative c.d.f. $F_i$ conditionally on $H_i = 1$. As a consequence, unconditionally, the p-values are independent and for $i = 1, \dots, m$, the c.d.f. of each p-value $p_i$ is $t \mapsto \pi_0 t + \pi_1 F_i(t)$. This model has parameters $(\pi_0, F)$, where $F = (F_i)_{1 \le i \le m}$ is the family of alternative c.d.f.'s. The latter model will be referred to throughout the paper as the unconditional model.
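As an illustration, both models are easy to sample from; the sketch below draws from the unconditional (random effects) model with one-sided Gaussian alternatives of common mean `mu` (this specific alternative family, and all numerical values, are our own choices for illustration):

```python
import numpy as np
from scipy.stats import norm

def sample_unconditional(m, pi0, mu, rng):
    """Draw (H, p) from the random effects model: H_i ~ Bernoulli(pi1)
    i.i.d., p_i uniform on [0, 1] given H_i = 0, and p_i with c.d.f.
    F_i given H_i = 1 (here F_i comes from a one-sided Gaussian test)."""
    H = (rng.random(m) >= pi0).astype(int)   # P(H_i = 1) = 1 - pi0
    X = rng.normal(loc=mu * H, scale=1.0)    # Gaussian test statistics
    p = norm.sf(X)                           # one-sided p-values
    return H, p

rng = np.random.default_rng(0)
H, p = sample_unconditional(100_000, pi0=0.7, mu=3.0, rng=rng)
```

Fixing $H$ instead of drawing it yields the conditional model; the unconditional c.d.f. of each $p_i$ is then the mixture $\pi_0 t + \pi_1 F_i(t)$.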

Assumptions and notation
We introduce the following possible regularity assumptions on the parameter $F$ of each model, the derivative of $F_i$ being denoted by $f_i$:
(A1) the $F_i$'s are continuous, strictly concave functions on $[0, 1]$;
(A2) the $F_i$'s are twice differentiable on $(0, 1)$.
As an illustration, the assumptions (A1)-(A4) are all satisfied in the one-sided Gaussian testing case where we test for any $i$ the null "$\mu_i = 0$" against "$\mu_i > 0$" from a Gaussian test statistic of mean $\mu_i$ and variance $1$. In that case, we have
$$F_i(t) = \Phi\big(\Phi^{-1}(t) - \mu_i\big), \quad t \in [0, 1], \qquad (2)$$
where we denoted $\Phi(z) := P[Z \ge z]$ for $Z \sim \mathcal{N}(0, 1)$. Finally, for any non-decreasing function $F : [0, 1] \to [0, 1]$ we denote $I(F) := \max\{u \in [0, 1] \mid F(u) \ge u\}$ and $J(F) := \max\{u \in [0, 1] \mid \forall v \in [0, u], F(v) \ge v\}$, and for $\lambda > 0$, $I^-_\lambda(F) := \max\{u \in [0, 1] \mid F(u) \ge u + \lambda\}$ and $I^+_\lambda(F) := \max\{u \in [0, 1] \mid F(u) \ge u - \lambda\}$. We easily check that $I(F)$ and $J(F)$ are maxima and that $F(I(F)) = I(F)$ and $F(J(F)) = J(F)$.
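In this one-sided Gaussian setting the alternative c.d.f. has the closed form $F_i(t) = \Phi(\Phi^{-1}(t) - \mu_i)$, with $\Phi$ the upper tail of the standard Gaussian, so its regularity is easy to check numerically; a small sketch (grid and mean value chosen for illustration):

```python
import numpy as np
from scipy.stats import norm

def F_gauss(t, mu):
    """Alternative p-value c.d.f. for the one-sided Gaussian test:
    F(t) = Phi_bar(Phi_bar^{-1}(t) - mu), where Phi_bar(z) = P(Z >= z)
    is norm.sf and its inverse is norm.isf."""
    return norm.sf(norm.isf(t) - mu)

t = np.linspace(0.01, 0.99, 99)
v = F_gauss(t, 2.0)
```

For $\mu_i > 0$ this $F_i$ is strictly concave with $F_i(t) > t$ on $(0, 1)$, since its density $f_i(t) = \exp(\mu_i \Phi^{-1}(t) - \mu_i^2/2)$ is decreasing in $t$.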

Multiple testing procedures, FDR and power
A multiple testing procedure R is defined as an algorithm which, from the data, aims to reject part of the null hypotheses. Below, we will consider, as is usually the case, multiple testing procedures which can be written as a function of the family of p-values p = (p i , i ∈ {1, . . . , m}). More formally, we define a multiple testing procedure as a measurable function R, which takes as input a realization of the p-value family p ∈ [0, 1] m and which returns a subset R(p) of {1, . . . , m}, corresponding to the rejected hypotheses (i.e. i ∈ R(p) means that the i-th hypothesis is rejected by R).
As introduced by Benjamini and Hochberg (1995), the false discovery rate (FDR) of a multiple testing procedure is defined as the mean proportion of true null hypotheses in the set of the rejected hypotheses:
$$\mathrm{FDR}(R) := \mathbb{E}\left[\frac{|R \cap \mathcal{H}_0|}{|R| \vee 1}\right], \qquad (3)$$
where $|\cdot|$ denotes the cardinality function. Of course, the FDR in (3) depends on the model chosen for the p-values. In particular, the FDR in the conditional model involves an expectation taken conditionally on $H$, whereas the FDR in the unconditional model additionally uses an averaging over $H$. It is worth noticing that if a procedure controls the FDR in the conditional model, that is, conditionally on any value of $H \in \{0, 1\}^m$, it also controls the FDR unconditionally. Finally, we use the standard power criterion equal to the mean proportion of correctly rejected hypotheses, that is,
$$\mathrm{Pow}(R) := \mathbb{E}\left[\frac{|R \cap \mathcal{H}_1|}{m_1 \vee 1}\right]. \qquad (4)$$
In the notation below, we will sometimes drop the explicit dependency in $p$ for short, writing e.g. $R$ instead of $R(p)$.
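The two criteria are straightforward to compute from a realized rejection set; a minimal sketch (the function names are ours):

```python
import numpy as np

def fdp(R, H):
    """False discovery proportion |R ∩ H0| / (|R| v 1); the FDR (3) is
    its expectation over the randomness of the p-values (and of H in
    the unconditional model)."""
    R = np.asarray(R, dtype=bool)
    H = np.asarray(H, dtype=int)
    return (R & (H == 0)).sum() / max(R.sum(), 1)

def tdp(R, H):
    """Proportion of correctly rejected hypotheses |R ∩ H1| / (m1 v 1);
    the power (4) is its expectation."""
    R = np.asarray(R, dtype=bool)
    H = np.asarray(H, dtype=int)
    return (R & (H == 1)).sum() / max((H == 1).sum(), 1)
```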

Weighted linear step-up procedures
Let us consider $w = (w_i)_i$ a vector of non-negative real numbers such that $\sum_{i=1}^m w_i = m$, called here a weight vector, and consider the weighted p-values $p'_i = p_i/w_i$, ordered as $p'_{(1)} \le \cdots \le p'_{(m)}$, with the convention $p'_{(0)} = 0$. As introduced by Genovese et al. (2006), the weighted linear step-up procedure associated to $w$, denoted here by LSU($w$), rejects the $i$-th hypothesis if
$$p'_i \le \alpha \hat{k}/m, \quad \text{where } \hat{k} := \max\{k \in \{0, 1, \dots, m\} \mid p'_{(k)} \le \alpha k/m\}. \qquad (5)$$
In particular, the procedure LSU($w$) using $w_i = 1$ for all $i$ corresponds to the standard linear step-up procedure of Benjamini and Hochberg (1995), denoted here by LSU. Letting $\widehat{G}_w(u) = m^{-1} \sum_{i=1}^m \mathbf{1}\{p_i \le \alpha w_i u\}$, the rejection proportion $\hat{u} = \hat{k}/m$ can equivalently be defined as
$$\hat{u} = I(\widehat{G}_w), \qquad (6)$$
using the notation of Section 2.2. Contrary to (5), expression (6) does not make any specific use of the reordered p-values $p'_{(1)}, \dots, p'_{(m)}$, so that it is generally more convenient from a mathematical point of view.
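The crossing-point formulation (6) translates directly into code: scan the grid $u \in \{0, 1/m, \dots, 1\}$ for the largest $u$ with $\widehat{G}_w(u) \ge u$. A sketch (the example p-values and weights are ours):

```python
import numpy as np

def weighted_lsu(pvals, weights, alpha=0.05):
    """Weighted linear step-up LSU(w): with
    G_w(u) = (1/m) * #{i : p_i <= alpha * w_i * u},
    take u_hat = max{u in {0, 1/m, ..., 1} : G_w(u) >= u} and reject
    the i-th hypothesis iff p_i <= alpha * w_i * u_hat."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = p.size
    assert np.isclose(w.sum(), m), "weights must sum to m"
    u = np.arange(m + 1) / m
    G = (p[None, :] <= alpha * w[None, :] * u[:, None]).mean(axis=1)
    u_hat = u[np.nonzero(G >= u)[0].max()]   # u = 0 always qualifies
    return p <= alpha * w * u_hat
```

With $w_i \equiv 1$ this reduces to the standard LSU, while a favorable weighting can turn non-rejections into rejections.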
For any choice of weight vector $w$, Genovese et al. (2006) proved that the weighted linear step-up procedure controls the FDR at level $\alpha m^{-1} \sum_{i=1}^m (1 - H_i) w_i \le \alpha$ in the conditional model and (thus) at level $\pi_0 \alpha \le \alpha$ in the unconditional model.

New approach
We present in this section a new family of multiple testing procedures, called multi-weighted procedures. We start by motivating their introduction from the power optimization problem among the family of weighted linear step-up procedures.

Weight functions
As noted in Genovese et al. (2006), the explicit computation of the power of LSU($w$) is a difficult task (even asymptotically): it depends on the final proportion of rejections of the procedure, $\hat{u} = I(\widehat{G}_w)$, which is a random variable itself depending on $w$. Therefore, we propose here to perform the optimization for each fixed rejection proportion $u$, which in turn leads to a family of optimal weight vectors depending on $u$, $0 < u \le 1$.
First, define the power of the procedure that thresholds each p-value $p_i$ at level $\alpha w_i u$:
$$\mathrm{Pow}_u(w) := \mathbb{E}\left[\frac{|\{i \mid p_i \le \alpha w_i u\} \cap \mathcal{H}_1|}{m_1 \vee 1}\right], \qquad (7)$$
corresponding intuitively to the "power of LSU($w$) at rejection proportion $u$". Second, define a weight (vector) function as a function $W : u \in (0, 1] \mapsto W(u) = (W_i(u))_i \in (\mathbb{R}^+)^m$ such that each $W(u)$ is a weight vector, that is, $\forall u \in (0, 1], \sum_{i=1}^m W_i(u) = m$, and such that the following property holds:
$$\forall i \in \{1, \dots, m\}, \quad u \mapsto u\,W_i(u) \text{ is nondecreasing on } (0, 1]. \qquad (8)$$
Additionally, a weight function is said to be continuous if for all $i$, $u \in (0, 1] \mapsto W_i(u)$ is a continuous function.
Definition 3.1. Any weight function $W^\star$ solving simultaneously the maximization problems
$$\forall u \in (0, 1], \quad \mathrm{Pow}_u(W^\star(u)) = \max_w \mathrm{Pow}_u(w), \qquad (9)$$
where the maximum is taken over all weight vectors $w$, is called the optimal weight function.
Note that $W^\star$ is called here, with a slight abuse, "the" optimal weight function even if it is not proved to be unique. Of course, the optimal weight function depends on the model chosen for the p-values. The following proposition gives (strong) sufficient conditions for the existence and uniqueness of the optimal weight function in the different models described in Section 2.1.
Proposition 3.2. Assume (A1)-(A2)-(A3) and denote the derivative of $F_i$ by $f_i$. Then the weight function $W^\star$ satisfying (9) exists and is unique in either of the following cases:
• In the conditional model, if $\alpha < \pi_1$, with for all $u \in (0, 1]$, $W^\star_i(u) = (\alpha u)^{-1} f_i^{-1}(y^\star(u))$ for $i \in \mathcal{H}_1$ and $W^\star_i(u) = 0$ for $i \in \mathcal{H}_0$;
• In the unconditional model, with for all $u \in (0, 1]$ and all $i$, $W^\star_i(u) = (\alpha u)^{-1} f_i^{-1}(y^\star(u))$.
In each case, $y^\star(u)$ is defined as the unique element providing $\sum_{i=1}^m W^\star_i(u) = m$. Moreover, in both models, the weight function $W^\star$ is continuous and, assuming in addition (A4), the limits $W^\star_i(0^+)$ exist for all $i$. The proof, which is based on arguments similar to those proposed in Rubin et al. (2006) and Wasserman and Roeder (2006), is given in Section 8.6. Of course, the optimal weight function depends on the parameters of the model: on $(H, F)$ in the conditional model and on $F$ (only) in the unconditional model.
For instance, when the p-values are generated from the Gaussian model (2), the optimal weight function in the unconditional model is given by
$$W^\star_i(u) = (\alpha u)^{-1}\, \Phi\big(\mu_i/2 + c(u)/\mu_i\big), \qquad (11)$$
where $c(u)$ is the unique element of $\mathbb{R}$ such that $\sum_{i=1}^m W^\star_i(u) = m$. It therefore only depends on the vector of alternative means $\mu = (\mu_i)_{1 \le i \le m}$. Figure 1 displays the optimal weight vectors $W^\star(u)$ for a particular choice of means and different values of $u$. We observe that $W^\star(u)$ strongly depends on $u$: for $u = 1$, the weight vector is larger for small means, whereas as $u$ decreases, the weight vector is maximum on larger means. In particular, for small $u$, the weighting is close to zero for the smallest means, because they produce p-values much larger than $\alpha u$ (with high probability).
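Numerically, $c(u)$ can be obtained by one-dimensional root finding on the normalization constraint. The sketch below assumes the Gaussian closed form $W^\star_i(u) \propto \Phi(\mu_i/2 + c(u)/\mu_i)$ coming from the Lagrange condition $f_i(\alpha u W_i(u)) = \text{const}$; the bracket $[-50, 50]$ and all numerical values are ad-hoc choices of ours:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def optimal_gauss_weights(mu, u, alpha=0.05):
    """Optimal weight vector at rejection proportion u for one-sided
    Gaussian alternatives with means mu > 0:
    W_i(u) = (alpha * u)^{-1} * Phi_bar(mu_i / 2 + c / mu_i),
    with c tuned so that the weights sum to m."""
    mu = np.asarray(mu, dtype=float)
    m = mu.size

    def total(c):
        # sum of weights minus m; decreasing in c, so one root in a wide bracket
        return norm.sf(mu / 2 + c / mu).sum() / (alpha * u) - m

    c = brentq(total, -50.0, 50.0)
    return norm.sf(mu / 2 + c / mu) / (alpha * u)

w = optimal_gauss_weights([1.0, 2.0, 3.0], u=0.3)
```

As expected, the resulting weights sum to $m$ and, at this $u$, yield a larger value of $\sum_i F_i(\alpha w_i u)$ than the uniform weighting.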
The Gaussian formula (11) can also be suitable for test statistics that are "close to Gaussian", namely for locally uniformly asymptotically normal test statistics (see Chapter 14 of van der Vaart (1998) and Sections 4.3 and 7 of Roquain and van de Wiel (2008)). This is the case, for instance, for the Mann-Whitney test statistic.

Multi-weighted procedures
From the previous section, we now have to integrate several weight vectors into a single multiple testing procedure, or more precisely to use a weight vector $w$ which may depend on $u$. For this, we extend the definition of weighted linear procedures to the case of multi-weighted procedures. First, we define the threshold collection $\Delta = (\Delta_i(u))_{i,u}$ associated to a given weight function $W$ by
$$\forall i \in \{1, \dots, m\}, \ \forall u \in (0, 1], \quad \Delta_i(u) := \alpha u W_i(u). \qquad (12)$$
Conversely, given any threshold collection $\Delta = (\Delta_i(u))_{i,u}$ such that each $\Delta_i$ is nonnegative, nondecreasing on $[0, 1]$ and such that $\forall u \in (0, 1], m^{-1} \sum_i \Delta_i(u) = \alpha u$, we define the weight function $W = (W_i(u))_{i,u}$ associated to $\Delta$ by $\forall i \in \{1, \dots, m\}, \forall u \in (0, 1], W_i(u) := \Delta_i(u)/(\alpha u)$. As a consequence, the threshold collection $\Delta$ and the weight function $W$ are in one-to-one correspondence.
Definition 3.3. Consider a weight function $W(\cdot) = (W_i(\cdot))_i$ and its associated threshold collection $\Delta$. The multi-weighted step-up procedure with weight function $W$, denoted by SU($W$), rejects the $i$-th null hypothesis if $p_i \le \Delta_i(\hat{u})$, where $\hat{u} := I(\widehat{G}_W)$ and where we denoted
$$\widehat{G}_W(u) := m^{-1} \sum_{i=1}^m \mathbf{1}\{p_i \le \Delta_i(u)\}. \qquad (13)$$
In particular, in the case where for all $u$, $W_i(u) = w_i$ is independent of $u$, the procedure SU($W$) reduces to LSU($w$). More generally, the above definition of SU($W$) allows one to choose thresholds $\Delta_i(u)$ that are not linear in $u$.
As for LSU($w$), the multi-weighted procedure SU($W$) can also be derived from a reordering-based algorithm. The main difference is that the original p-values are ordered in several ways, because several weightings are used. Namely, if for $r \ge 1$, $q_r$ denotes the $r$-th smallest $W(r/m)$-weighted p-value, i.e. the $r$-th smallest element of $\{p_i/W_i(r/m), 1 \le i \le m\}$, then SU($W$) corresponds to rejecting the $i$-th null hypothesis whenever $p_i \le \Delta_i(\hat{k}/m)$ with $\hat{k} := \max\{r \mid q_r \le \alpha r/m\}$ (and $\hat{k} := 0$ if this set is empty). Similarly to the step-up case, we can define the multi-weighted step-down procedure with weight function $W$ (and associated threshold collection $\Delta$), denoted by SD($W$), as rejecting the $i$-th null hypothesis if $p_i \le \Delta_i(J(\widehat{G}_W))$. Remark that the procedures SU($W$) and SD($W$) only use the values of $W(u)$ for $u \in \{1/m, 2/m, \dots, 1\}$, which makes them easily computable. We refer the reader to Appendix A for explicit algorithmic versions of the procedures SU($W$) and SD($W$).
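An explicit (if naive) implementation of SU($W$) only evaluates $\widehat{G}_W$ on the grid $\{1/m, \dots, 1\}$, as noted above; a sketch (the array layout for $W$ and the example values are our choices):

```python
import numpy as np

def multi_weighted_su(pvals, W, alpha=0.05):
    """Multi-weighted step-up SU(W).  W is an (m+1) x m array whose row
    k (k >= 1) holds the weight vector W(k/m), each row summing to m.
    The procedure takes the largest k with
    G_W(k/m) = (1/m) * #{i : p_i <= alpha * (k/m) * W_i(k/m)} >= k/m
    and rejects the i-th hypothesis iff p_i <= alpha * (k/m) * W_i(k/m)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    ks = np.arange(1, m + 1)
    G = np.array([(p <= alpha * (k / m) * W[k]).mean() for k in ks])
    ok = ks[G >= ks / m]
    if ok.size == 0:
        return np.zeros(m, dtype=bool)
    k_hat = ok.max()
    return p <= alpha * (k_hat / m) * W[k_hat]
```

When every row of `W` holds the same weight vector $w$, this coincides with LSU($w$).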
The particular multi-weighted step-up procedure SU($W^\star$) using the optimal weight function $W^\star$ is called the optimal multi-weighted procedure. From an intuitive point of view, since this weighting maximizes the power at any rejection proportion, the latter procedure should be more powerful than any standard weighted procedure LSU($w$). One of the goals of Section 4 is to state this optimality result formally.
Finally, let us remark that, in the unconditional model and under the assumptions and notation of Proposition 3.2, the optimal multi-weighted step-up procedure may be written under the following form: reject the $i$-th null hypothesis whenever $f_i(p_i) \ge \hat{y}$, where the data-dependent threshold $\hat{y} := y^\star(\hat{u})$ is adjusted from all the p-values, the $f_i$'s and the pre-specified level $\alpha$. As a consequence, this procedure is based on individual tests of Neyman-Pearson type (the observed variables being restricted to the p-values).

Main results
We present in this section the main properties of the multi-weighted procedures. First, the finite-sample FDR control of SU(W) for any weight function W, up to slight modifications. Second, the finite-sample power optimality of the procedure SU(W ⋆ ) (using the optimal weight function), up to some small error terms. Third, a consistency result, proving that the latter slight modifications are unnecessary and that error terms vanish, in a particular asymptotic setting where m tends to infinity.

Finite-sample FDR control
First, let us recall that for any choice of weight vector $w = (w_1, \dots, w_m)$, the weighted linear step-up procedure LSU($w$) controls the FDR at level $\alpha m^{-1} \sum_{i=1}^m (1 - H_i) w_i \le \alpha$ in the conditional model and at level $\pi_0 \alpha \le \alpha$ in the unconditional model (see Genovese et al. (2006); Blanchard and Roquain (2008)). These controls are non-asymptotic, in the sense that they are valid for any finite $m \ge 2$.
Unfortunately, the procedure SU(W) cannot be proved to control the FDR at level α for any choice of weight function W and for any m ≥ 2. In Appendix C, a (least favorable) choice of weight function is given when m = 2, for which FDR(SU(W)) slightly exceeds α. Therefore, in order to obtain rigorous FDR control for each m and any weight function, we need to slightly correct SU(W).
Theorem 4.1. Consider any weight function $W(\cdot) = (W_i(\cdot))_i$. Then for any finite $m \ge 2$, the two following procedures have their FDR less than or equal to $\alpha$: the step-up procedure SU($\widetilde{W}$) and the step-down procedure SD($\widetilde{W}$), where $\widetilde{W}$ denotes a suitably modified (slightly more conservative) version of $W$.
The proof of Theorem 4.1 is given in Section 8.3. Note that this result covers the earlier results of Genovese et al. (2006); Blanchard and Roquain (2008), by taking $W_i(u) = w_i$ constant in $u$.
Since from (8) we have $\alpha u W_i(u) \le \alpha W_i(1)$, both modifications of SU($W$) proposed above should not be too large when $\alpha W_i(1)$ is close to $0$ (e.g. when $\alpha$ is small). Furthermore, while the correction proposed in the weighting of the step-up procedure is more conservative than that of the step-down, a step-up procedure is always more powerful than a step-down procedure (for the same threshold collection). Therefore, in general, neither modified procedure dominates the other. Nevertheless, in the particular simulation setting of Section 5, we will see that the step-down modification appears to be better.
When using the optimal weight function W ⋆ , Theorem 4.1 provides two modifications of the optimal multi-weighted procedure SU(W ⋆ ) that control the FDR. More importantly, it shows that any misspecification in W ⋆ (e.g. in the model parameters) still leads to the correct FDR control. This is a crucial point in practice.
Explicit finite-sample bounds for the FDR of SU($W$) (the step-up procedure without modification) are given in Proposition 8.4 in the unconditional model (see Section 8.5). They show that FDR(SU($W$)) should be close to $\pi_0 \alpha$ when $m$ is large, so that the modifications of Theorem 4.1 are no longer needed in that case. We will develop the resulting FDR consistency result more formally in Section 4.3, under some asymptotic conditions and for the optimal weighting.

Finite-sample power optimality
For a given weight function $W$ with associated threshold collection $\Delta$, let us denote
$$G_W(u) := \pi_0 \alpha u + \pi_1 m^{-1} \sum_{i=1}^m F_i(\alpha u W_i(u)), \qquad (14)$$
being the mean proportion of rejections at levels $(\Delta_i(u))_i$, and define similarly $G_w(u)$ for a weight vector $w$. Then the following theorem holds:
Theorem 4.2. In the unconditional model, assume that $F$ satisfies (A1) and consider a weight function $W^\star$ which maximizes the power at every rejection proportion, i.e. satisfying (9). Consider $\lambda > 0$ with $\lambda < \pi_0(1 - \alpha)$. Then we have for any finite $m$,
$$\mathrm{Pow}(\mathrm{SU}(W^\star)) \ge \max_w \Big\{ \mathrm{Pow}(\mathrm{LSU}(w)) - \varepsilon\big(m, I^+_\lambda(G_w)\big) \Big\} - \varepsilon\big(m, I^-_\lambda(G_{W^\star})\big) - 2\lambda(1 - \alpha\pi_0), \qquad (15)$$
where $\forall x \in \mathbb{R}$, $\varepsilon(m, x) := 2\pi_1^{-1} m \exp\big(-2m\,(x - m^{-1})_+^2\big)$, and where the maximum is taken over all the weight vectors $w$.
Inequality (15) can be seen as a nonasymptotic "oracle inequality", stating that the power of the optimal multi-weighted procedure is close to the power of the best weighted linear step-up procedure. This finite-sample optimality result makes sense because SU($W^\star$), like all the weighted linear step-up procedures, controls the FDR non-asymptotically at level $\alpha$ (up to the slight modifications presented in Section 4.1).
In (15), $\lambda$ should be chosen such that the error terms $\varepsilon(m, \cdot)$ and $2\lambda(1 - \alpha\pi_0)$ are as small as possible. From an asymptotic point of view, assuming that the quantities $I^-_\lambda(G_{W^\star})$ and $I^+_\lambda(G_w)$ are bounded away from $0$ when $m$ tends to infinity (for any fixed $\lambda$), the error terms tend to zero by taking successively $m$ tending to infinity and $\lambda$ tending to zero. However, the best choice $\lambda = \lambda_m$ depends on the parameter $F$ and seems quite difficult to derive in an explicit form (and so are the corresponding convergence rates in (15)).
The next section presents sufficient asymptotic conditions making $I^-_\lambda(G_{W^\star})$ and $I^+_\lambda(G_w)$ bounded away from $0$ when $m$ tends to infinity, so that the error terms asymptotically vanish in the oracle inequality (15).

Consistency
We propose in this section an asymptotic framework in which the optimality of SU(W ⋆ ) and its FDR control hold when m tends to infinity, without modification or error term.
First, we define the asymptotic setting. For all $m \ge 2$, we consider the $m$-unconditional model, where the $m$ p-values are chosen as the first $m$ p-values of an infinite sequence of independent p-values $(p_i)_{i \ge 1}$, each p-value $p_i$ having the c.d.f. $\pi_0 t + \pi_1 F_i(t)$, for a given infinite sequence of c.d.f.'s $F = (F_i)_{i \ge 1}$. In this context, the weight functions depend on $m$, and we underline this dependence in the notation by writing $W^{(m)}$ instead of $W$ (and $w^{(m)}$ instead of $w$).
Second, we define a converging weight function sequence $(W^{(m)})_m$ as a sequence of weight functions such that the associated function sequence $(G_{W^{(m)}})_m$ (defined in (14)) converges point-wise on $[0, 1]$. For short, we will often use the notation $G^\infty$ for the limit function of $(G_{W^{(m)}})_m$.
Theorem 4.3. Consider the above asymptotic framework in which $F$ is assumed to satisfy (A1), and consider a class $\mathcal{W}$ of converging weight function sequences. Let $(W^{\star,(m)})_m$ be a sequence of weight functions such that for all $m$, $W^{\star,(m)}$ maximizes the power at every rejection proportion in the $m$-unconditional model (i.e. satisfies (9)). For the sequence $(W^{\star,(m)})_m$, assumed to lie in $\mathcal{W}$, and for any weight vector sequence $(w^{(m)})_m$ belonging to $\mathcal{W}$, we additionally assume that the associated limit function $G^\infty$ is continuous and satisfies $I^-_\lambda(G^\infty) > 0$ for all $\lambda \in (0, I(G^\infty))$. Then
$$\liminf_{m \to \infty} \Big\{ \mathrm{Pow}(\mathrm{SU}(W^{\star,(m)})) - \max \mathrm{Pow}(\mathrm{LSU}(w^{(m)})) \Big\} \ge 0, \qquad (16)$$
the maximum above being taken over any sequence of weight vectors $(w^{(m)})_m$ belonging to $\mathcal{W}$. Moreover, we have
$$\lim_{m \to \infty} \mathrm{FDR}(\mathrm{SU}(W^{\star,(m)})) = \pi_0 \alpha \le \alpha. \qquad (17)$$
Theorem 4.3 is proved in Section 8.5. Under the conditions of Theorem 4.3, statements (16) and (17) imply that SU($W^\star$) is asymptotically more powerful than any weighted linear step-up procedure (in a certain class of converging weight vector sequences) while keeping the asymptotic FDR control. Since the uniform weighting sequence $w^{(m)}_i = 1$ is always converging (with a continuous, strictly concave limit function, from (A1)), it can always be added to the class $\mathcal{W}$. As a consequence, the procedure SU($W^\star$) always improves upon the original LSU asymptotically. However, this should be balanced against the fact that SU($W^\star$) uses the true parameters of the model, whereas LSU does not.
To satisfy the assumptions of Theorem 4.3, we have to choose a convenient class of converging weighting sequences W, containing the optimal weighting sequence. We give below two examples of such choice when F is assumed to have a particular structure.
A first example is the case of clustered p-values: consider a parameter $F$ satisfying (A1) and such that $F_i$ is equal to $F_A$ (resp. $F_B$) for $i$ in a cluster $S_A$ (resp. $S_B$), where the cluster proportions $\pi_A := |S_A|/m$ and $\pi_B := |S_B|/m$ do not depend on $m$ (this holds up to taking a subsequence of $m$). In this context, we merely check that a weight vector maximizing the power (at a given rejection proportion) puts the same weight within a cluster. It is therefore natural to consider the class $\mathcal{W}$ of weight function sequences that are constant within each cluster, with common values $W_A(u)$ and $W_B(u)$. Since for any weight function sequence $(W^{(m)})_m$ of $\mathcal{W}$ the function $G_{W^{(m)}}(u) = \pi_0 \alpha u + \pi_1 \pi_A F_A(\alpha u W_A(u)) + \pi_1 \pi_B F_B(\alpha u W_B(u))$ does not depend on $m$, $\mathcal{W}$ is a class of converging weight function sequences. Moreover, $G^\infty(u) = G_{W^{(m)}}(u)$ is continuous and satisfies $I^-_\lambda(G^\infty) > 0$ for $\lambda < I(G^\infty)$, either for $W^{(m)}(u) = w^{(m)}$ a weight vector, or for $W^{(m)}(u) = W^{\star,(m)}(u)$ the optimal weight function (from one of the last statements of Theorem 4.2). Finally, we can apply Theorem 4.3 to obtain the oracle inequality (16), and the last assumption required for the FDR control (17) can be checked similarly.
A second example is the case of a continuum of alternative distributions: assume $F_i = F_{i/m}$ for a family of c.d.f.'s $(F_t)_t$, where the function $t \mapsto F_t$ can be extended to all $t$ in $[0, 1]$. In that setting, it is relevant to consider the class $\mathcal{W}$ of weightings of the form $W_i(u) = W_{i/m}(u)$ for a family of functions $(W_t(\cdot))_{t \in [0, 1]}$, and (11) provides the form of the optimal weight function, normalized so that $\sum_{i=1}^m W^\star_{i/m}(u) = m$. In Section 8.7, we prove that $(W^{\star,(m)})_m$ belongs to $\mathcal{W}$, with a "limit weighting" $(W^\star_t(\cdot))_{t \in [0, 1]} \ge 0$. Similarly, any weight vector sequence of the form $w^{(m)} = W^{\star,(m)}(u_0)$ (with $u_0$ fixed in $(0, 1]$) belongs to $\mathcal{W}$, with a limit function $G^\infty(u) = \pi_0 \alpha u + \pi_1 \int_0^1 F_t(\alpha u W^\star_t(u_0))\,dt$ continuous and strictly concave (implying $I^-_\lambda(G^\infty) > 0$ for $\lambda < I(G^\infty)$). Denoting by $G^{\star,\infty}$ the limit function of $G_{W^{\star,(m)}}$, we merely check that $G^{\star,\infty} \ge G^\infty$. Since this holds for any choice of $u_0$, we derive that $I^-_\lambda(G^{\star,\infty}) > 0$ for $\lambda < I(G^{\star,\infty})$ (using inequalities similar to (25)). As a consequence, we may apply Theorem 4.3 to obtain the oracle inequality (16).
In particular, the power of the multi-weighted procedure SU($W^{\star,(m)}$) is always asymptotically larger than the power of the weighted linear step-up LSU($w^{(m)}$) for any weight vector of the form $w^{(m)} = W^{\star,(m)}(u_0)$, with $u_0 \in (0, 1]$. Roughly speaking, this means that SU($W^{\star,(m)}$) automatically chooses the best weighting among $\{W^{\star,(m)}(u_0)\}_{u_0}$.

Simulation study
An important point is now to evaluate the improvement brought by the new multi-weighted procedure, both when the true parameters and when misspecified parameters are plugged into the optimal weighting. For this, we perform simulations in the (restricted but convenient) one-sided Gaussian testing framework under the conditional model.

Simulations framework
We consider the problem of testing, for each $i \in \{1, \dots, m\}$, the null "$\mu_i = 0$" against the alternative "$\mu_i > 0$" from the observation of $m$ independent variables $(X_i)_i$ with $X_i \sim \mathcal{N}(\mu_i, 1)$. The parameters $(H, F)$ of the (conditional) model are fully determined from the vector $\mu = (\mu_i)_i$, namely by $H_i = \mathbf{1}\{\mu_i > 0\}$ and (2), respectively. They represent information of a different nature: $H$ provides the location of the positive means while $F$ supplies their values.
For all our experiments, the number of tests is $m = 1000$. The vector $\mu$ is taken such that the first $m_0 = 700$ components of $\mu$ are equal to zero (the proportion of zeros in the mean vector is thus $\pi_0 = 0.7$). The $m_1 = 300$ remaining non-zero means are taken in two different ways:
• Case 1: the non-zero means increase linearly from $3\mu/m_1$ to $3\mu$.
• Case 2: the non-zero means are gathered in three groups of different values $\mu$, $2\mu$ and $3\mu$, of respective sizes 120, 120 and 60.
The following procedures are considered:
- [LSU] the linear step-up procedure LSU,
- [LSU⋆] the step-up procedure with threshold collection $\alpha u/\pi_0$,
- [SU-W-oracle] the multi-weighted step-up procedure SU($\widetilde{W}^\star$) of Theorem 4.1, using the (corrected) optimal weight matrix $W^\star$ (given by (11)),
- [SD-W-oracle] the multi-weighted step-down procedure SD($\widetilde{W}^\star$) of Theorem 4.1, using the (corrected) optimal weight matrix $W^\star$,
- [Unif-oracle] the weighted linear step-up procedure LSU($w^\star$) using a weight vector uniform on $\mathcal{H}_1$: $w^\star_i = 0$ for $\mu_i = 0$ and $w^\star_i = m/m_1$ for $\mu_i > 0$.
The procedures [SU-W-oracle], [SD-W-oracle] and [Unif-oracle] correspond to the case where the weighting uses the true mean vector $\mu$, hence the name "oracle". In situations where we replace $\mu$ by a "guess" $\widehat{\mu}$ in the weights, the procedures are called "guessed" and are denoted by [SU-W-guess], [SD-W-guess] and [Unif-guess], respectively. The procedure [Unif-oracle/guess] uses a uniform weighting over the (guessed) false nulls and is close in spirit to the approach of Genovese et al. (2006). It only takes into account the subset where the hypotheses are false ("location information"), but not the values of the non-zero means.
The procedure [LSU⋆] is included for comparison with quite recent developments on $\pi_0$-adaptive procedures (see e.g. Benjamini et al. (2006)). Since it uses a perfect estimation of $\pi_0$, it represents the best theoretical $\pi_0$-adaptive modification of the LSU that we can build. For clarity, we avoid the problem of choosing a particular estimator of $\pi_0$ and we only consider [LSU⋆].
All the latter procedures have provable FDR control (see Section 4.1), so that it is relevant to compare them in terms of power. In all experiments the targeted FDR level is either $\alpha = 0.01$ or $\alpha = 0.05$. The different procedures are compared in terms of relative power (RelPow) with respect to the LSU procedure, defined as the expected surplus proportion of correct rejections among the false nulls: for a multiple testing procedure $R$,
$$\mathrm{RelPow}(R) := \frac{\mathbb{E}|R \cap \mathcal{H}_1| - \mathbb{E}|\mathrm{LSU} \cap \mathcal{H}_1|}{m_1}. \qquad (18)$$
Roughly speaking, this relative power represents the surplus "probability" of a false null being rejected with respect to the LSU. This power is estimated using Monte-Carlo simulations. Additionally, we also evaluate the "power range" defined by the power of the weighted linear procedures LSU($W^\star(u_0)$) for $u_0 \in \{1/m, 2/m, \dots, 1\}$. It is represented by a gray area in the figures. Finally, the optimal multi-weighted step-up procedure SU($W^\star$) without correction (which controls the FDR when $m \to \infty$) is also considered, but it is not reported in our figures, because its (relative) power is almost indistinguishable from the top of the power range.
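A Monte-Carlo estimate of (18) is easy to set up; the sketch below compares a [Unif-oracle]-type weighting to the LSU in a scaled-down conditional Gaussian model ($m$, $m_1$, the mean value 2 and the number of replications are our own choices, not the paper's setting):

```python
import numpy as np
from scipy.stats import norm

def weighted_lsu(p, w, alpha):
    """Weighted linear step-up, crossing-point form."""
    m = p.size
    u = np.arange(m + 1) / m
    G = (p[None, :] <= alpha * w[None, :] * u[:, None]).mean(axis=1)
    u_hat = u[np.nonzero(G >= u)[0].max()]
    return p <= alpha * w * u_hat

rng = np.random.default_rng(1)
m, m1, alpha = 200, 60, 0.05
mu = np.concatenate([np.zeros(m - m1), np.full(m1, 2.0)])
w_lsu = np.ones(m)
w_unif = np.where(mu > 0, m / m1, 0.0)   # uniform weight on the false nulls

n_rep, rel = 200, 0.0
for _ in range(n_rep):
    p = norm.sf(rng.normal(mu, 1.0))
    diff = (weighted_lsu(p, w_unif, alpha)[-m1:].sum()
            - weighted_lsu(p, w_lsu, alpha)[-m1:].sum())
    rel += diff / m1
rel /= n_rep
```

Here `rel` estimates the relative power of the oracle uniform weighting and comes out clearly positive, consistent with the weighting being informative.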

Procedures using the true parameters
Figure 2 reports the relative power (18). The improvement of the multi-weighted procedures over [LSU] is satisfactory. Also, [SD-W-oracle] performs better here than [SU-W-oracle] (especially for α = 0.05), so the loss due to the correction within [SU-W-oracle] seems significantly larger than the loss due to the correction within [SD-W-oracle]. Furthermore, [SD-W-oracle] is more powerful than [LSU⋆] (this remains true with a smaller π_0, e.g. π_0 = 0.5), and [SD-W-oracle] is always better than [Unif-oracle], sometimes allowing for many more discoveries. This seems coherent because [SD-W-oracle] takes into account more (correct) prior information than [Unif-oracle]: namely, [SD-W-oracle] uses both the values and the locations of the non-zero means (we are in the conditional model), while [Unif-oracle] only uses the location information.
Finally, the procedure [SD-W-oracle] is close to the top of the power range (gray area), that is, has a power close to the power of the best procedure among LSU(W ⋆ (u 0 )), u 0 ∈ {1/m, 2/m, . . . , 1}. This corroborates the optimality results of Section 4.2 and Section 4.3 in this (conditional) setting.

Procedures using misspecified parameters
We consider here the same experiment as before, except that we take into account the "randomness" due to a prior guess μ̂_i of each μ_i. For this, we add a misspecification parameter σ and suppose that the guessed means are of the form μ̂_i = μ_i + ε_i for all i ∈ {1, . . . , m}, where the ε_i are i.i.d. with distribution N(0, σ²) (taken independent of the p_i's). The misspecification parameter σ is taken in the range {j/4, j = 0, . . . , 12}. Note that this way of guessing the means is quite crude, because it does not take into account the specific form of the parameters (the guessing could of course be improved here, for instance by taking local means). However, we keep this crude model because we do not want to make any assumption on the parameters. Figure 3 reports the relative power (18) of the "guessed" procedures with respect to σ. We performed 100 simulations to compute the relative power, which is moreover averaged over 10 generated values of the μ_i's (for each value of σ).
In this experiment, we see that both multi-weighted procedures are better than the other procedures when the guesses are good, i.e. over the range σ ∈ [0, 1.2], but may be worse than the simple [LSU] procedure when σ is large. Furthermore, note that [Unif-guess] quickly collapses as σ grows and therefore only offers a slight improvement over [LSU] (or [LSU⋆]) when the guesses are good. However, it is "less risky" than the multi-weighted procedures for large σ. Again, this conclusion is natural because the multi-weighted procedures take into account more prior information than [Unif-guess].
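The location-only weighting of [Unif-guess] and its collapse under misspecification can be sketched as follows. The hard threshold used to declare a guessed mean "non-null" is an illustrative choice of ours, not a quantity from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def unif_guess_weights(mu_guess, thresh=0.5):
    """Uniform weights on the guessed false nulls ([Unif-guess]-style):
    w_i = 0 where the guess looks null, m / m1_hat where it looks non-null.
    `thresh` is a hypothetical cut-off, chosen only for illustration."""
    m = mu_guess.size
    alt_hat = mu_guess > thresh
    m1_hat = max(int(alt_hat.sum()), 1)
    w = np.zeros(m)
    w[alt_hat] = m / m1_hat
    return w

mu = np.concatenate([np.zeros(80), np.full(20, 2.0)])
for sigma in (0.0, 1.0, 3.0):
    w = unif_guess_weights(mu + sigma * rng.standard_normal(mu.size))
    # as sigma grows, more weight leaks onto true nulls, which is
    # why the power of [Unif-guess] collapses for large sigma
```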
Finally, although admittedly of limited scope, these experiments show that, in principle, taking into account a correct guess of the parameters in the multi-weighted procedures should improve the power substantially. The magnitude of the loss/gain of these procedures depends on the amount of prior information used (location of the positive means, value of the positive means, or both).

Application to mRNA and DNA microarray experiments
In a typical microarray experiment, we want to find differentially expressed mRNA genes between two groups of individuals. For the i-th gene, the expression levels are of the form X_{i,1}, . . . , X_{i,k_i} for group 1 and Y_{i,1}, . . . , Y_{i,ℓ_i} for group 2, where k_i (resp. ℓ_i) is the number of individuals in group 1 (resp. group 2). In some microarray experiments, the sample sizes (k_i, ℓ_i) available to assess the differential mRNA expression of gene i may strongly depend on i, e.g. when the number of missing data points differs per gene. In this application, we consider a covariate, the DNA copy number status of the same gene, which determines the groups and the sample sizes. DNA copy number status is obtained from an independent array CGH experiment, after a few pre-processing steps (see e.g. Picard et al. (2007)). We focus on the covariate A_{i,j}, which equals 1 when gene i is gained for individual j (i.e. when sample j has an abnormally high DNA copy number of gene i), and 0 otherwise. The biological goal is to find the genes for which the mRNA expression is induced by the DNA copy number; this is particularly useful for studying cancer pathologies (see e.g. Hyman et al. (2002)). Sample-size-dependent weights are particularly attractive here, because many genes show a large imbalance between the number of gains (defining group 1) and non-gains (defining group 2).
Using the above framework, we analyze the microarray lymphoma cancer data of Muris et al. (2007), in which m = 11 169 genes and n = 42 individuals are studied. The p-value of each gene was computed using a Mann-Whitney test. We aim to use only the sample size information as prior, without any guess on which hypotheses are false or true. The asymptotic normality of the Mann-Whitney test statistic is used to define asymptotically optimal multi-weights W⋆ which depend only on (k_i, ℓ_i) and on an estimate of the global effect θ, a gene-independent parameter for the effect of copy number gain on mRNA gene expression. The expression of the multi-weights and the estimator θ̂_m of θ are detailed in Roquain and van de Wiel (2008). The estimator θ̂_m converges in probability as m grows to infinity, so we believe that the fluctuations of θ̂_m in the weights have only a marginal effect on the effective FDR of the resulting multi-weighted procedure when m becomes large (we have, however, not yet carried out the corresponding asymptotic study formally).
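The per-gene testing step can be sketched as follows. This is a minimal sketch on simulated data; the function and variable names are ours, and the one-sided alternative reflects that a copy number gain is expected to increase expression:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def gene_pvalues(expr, gain):
    """Mann-Whitney p-values per gene.

    expr : (m, n) expression matrix
    gain : (m, n) 0/1 gain status, A[i, j] = 1 if gene i is gained in sample j.
    The group sizes (k_i, l_i) vary per gene; genes whose samples all fall
    into one group get p = 1 (no test possible)."""
    m = expr.shape[0]
    p = np.ones(m)
    for i in range(m):
        g1 = expr[i, gain[i] == 1]   # gained samples (group 1)
        g2 = expr[i, gain[i] == 0]   # non-gained samples (group 2)
        if g1.size and g2.size:
            p[i] = mannwhitneyu(g1, g2, alternative="greater").pvalue
    return p

rng = np.random.default_rng(2)
expr = rng.standard_normal((5, 12))
gain = (rng.random((5, 12)) < 0.4).astype(int)
pvals = gene_pvalues(expr, gain)
```

The resulting varying group sizes per gene are exactly what makes sample-size-dependent weights attractive in this application.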
We applied the step-up multi-weighted procedure SU(W⋆), using the estimate θ̂_m ≃ 1.01 of the global effect size θ. Since m is large, we focus on the unmodified version of our procedure, which guarantees asymptotic FDR control. For several values of α, the numbers of discoveries of this procedure and of the LSU are given in Table 1.
We observe that our new step-up procedure discovers more differentially expressed genes when α ∈ {0.005, 0.01}. For α ∈ {0.05, 0.1}, the performance of the two step-up procedures is similar. So, the improvement of our procedure is here mostly noticeable when the proportion of rejections is small. This is in accordance with our intuition: the prior information (here the sample sizes) is particularly useful when the proportion of rejections is expected to be small. Finally, let us remark that these positive results on the sample size problem have been corroborated in a specific simulation study as well (not reported here).

Conclusion and discussions
When the parameters of the p-value model are known, we have addressed the problem of optimally weighting the LSU by finding a new procedure that provably outperforms all weighted LSU procedures (up to small error terms) and that can easily be computed from these parameters. Our simulations illustrated the magnitude of the improvement of our new approach in situations where it uses the true or misspecified parameters. In our results, the assumptions concerning the marginal distributions of the p-values were quite mild: the FDR control only required that each p-value be uniform under the null, while the optimality results only required the strict concavity of the c.d.f.'s of the p-values. Moreover, the existence of the optimal weight function only required the simultaneous maximization of the power at every rejection proportion, and we gave strong sufficient conditions for its existence and uniqueness.
Several extensions of this work are possible. First, we have assumed independence between the p-values throughout the paper, which is a standard but somewhat unrealistic assumption in applications. In Roquain and van de Wiel (2008), we proposed extensions of the present FDR control results to the cases of positive regression dependence and of unspecified dependence. However, the resulting procedures seemed too conservative for practical use. Therefore, there is room left for future investigations, joining the very active (but challenging) research field that studies the impact of p-value dependence on FDR control (see e.g. Kim and van de Wiel (2008); Romano et al. (2008)).
Second, our FDR controls hold at a level smaller than π_0 α (asymptotically, in the unconditional model). Therefore, when π_0 is small, our procedures are inevitably conservative, because their actual FDR is much lower than the fixed target. This is a classical problem for the LSU procedure, and several works have addressed it by integrating a π_0-estimate into the threshold, building so-called adaptive LSU procedures (see e.g. Benjamini et al. (2006); Blanchard and Roquain (2009)). An interesting extension of our work could therefore be to derive adaptive multi-weighted procedures, which would increase the power when the data contain a lot of signal.
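One standard π_0-estimate that such an adaptive extension could build on is the Storey-type estimator; a minimal sketch (the choice λ = 1/2 is conventional, and plugging the estimate into the LSU level is one of several adaptive schemes discussed in e.g. Benjamini et al. (2006)):

```python
import numpy as np

def storey_pi0(p, lam=0.5):
    """Storey-type estimate of the proportion of true nulls:
    p-values above `lam` are mostly nulls, so their rescaled
    frequency estimates pi0 (capped at 1)."""
    p = np.asarray(p, dtype=float)
    return min(1.0, (p > lam).mean() / (1.0 - lam))

# An adaptive LSU would then run the step-up procedure at level
# alpha / storey_pi0(p) instead of alpha, gaining power when pi0 is small.
```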
A third, and maybe more important, direction for future work is the investigation of data-driven weighting. A first idea would be to replace the function Pow_u(·), the power at rejection proportion u, by an empirical substitute and to perform the simultaneous maximization with this substitute. This would yield an empirical optimal weight function Ŵ⋆ that can in turn be plugged into a multi-weighted procedure. While this certainly requires a model with some replications, the theoretical FDR control and power optimality of such data-driven procedures do not follow straightforwardly from the present work, because all our proofs here use the fact that the weight functions are deterministic.

Useful notation and lemmas
Let us first introduce notation that will be useful throughout the proofs. If R is the step-up procedure associated to a given weight function W with threshold collection ∆, and u := |R|/m is its rejection proportion, that is u = I(Ĝ_W), we denote by:
1. R_{−i} the step-up procedure on the set of hypotheses corresponding to {1, . . . , m}\{i}, i.e. excluding the i-th null, associated to the threshold collection ∆_j((1 − m^{−1})u), for all j ≠ i and all u; its rejection proportion is denoted u_{−i} := |R_{−i}|/(m − 1);
2. R′_{−i} the step-up procedure on the set of hypotheses excluding the i-th null, associated to the threshold collection ∆_j((1 − m^{−1})u + m^{−1}), for all j ≠ i and all u; its rejection proportion is denoted u′_{−i} := |R′_{−i}|/(m − 1).
Similarly, when R is step-down, we define R_{−i} and R′_{−i} as the corresponding step-down procedures, with rejection proportions denoted in the same way. The two following lemmas link the rejection proportions of R, R_{−i} and R′_{−i} for different values of p_i. They are proved in Appendix B and are related to Lemma 10.20 of Roquain (2007).
Lemma 8.1. Let R be the step-up procedure associated to a given weight function with threshold collection ∆, and consider u, u_{−i} and u′_{−i} as above. Then we have, point-wise:

Lemma 8.2. Let R be a step-down procedure associated to a given weight function with threshold collection ∆, and consider u and u_{−i} as above. Then we have, point-wise, for any k ∈ {1, . . . , m}:

Proof of Theorem 4.1 -step-up part
The inequalities are established in the conditional model (the result in the unconditional model directly follows).
We use in all our FDR bounds the fact that a procedure R satisfying the "self-consistency condition" R = {i : p_i ≤ ∆_i(|R|/m)} has an FDR equal to (19). Now, consider the multi-weighted step-up procedure R = SU(W̃) of Theorem 4.1, and denote by ∆ the threshold collection associated to W̃: ∆_i(k/m) = αW̃_i(k/m)k/m = αW_i(k/m)(k/m)(1 + αW_i(1))^{−1} ≤ 1. Since any step-up procedure satisfies the self-consistency condition, we may use (19). Furthermore, using the notation of Section 8.1 and applying Lemma 8.1 (first statement), we may rewrite the FDR as follows. Then, since u′_{−i} only depends on the p-values (p_j, j ≠ i), it is independent of p_i and we obtain the desired bound, where we used that p_i has a uniform distribution on [0, 1] (from (1)). Next, consider the threshold collection ∆′_j(u) = ∆_j((1 − m^{−1})u + m^{−1}), for all j ∈ {1, . . . , m} and all u, and the associated step-up procedure, denoted R′, with rejection proportion u′ = |R′|/m. By the definitions of Section 8.1, the restriction of R′ to the hypothesis set corresponding to {1, . . . , m}\{i} is exactly the procedure R′_{−i}. Therefore, from Lemma 8.1 (second statement, applied to R′), the required condition follows.

Proof of Theorem 4.1 -step-down part
Again, it is sufficient to work in the conditional model. First, let us prove that inequality (21) holds for any i and for any step-down procedure R with threshold collection ∆ and rejection proportion u, where u_{−i} is the rejection proportion of the step-down procedure associated to ∆ and restricted to the hypotheses other than the i-th, as defined in Section 8.1. This result was implicitly proved in Gavrilov et al. (2009) (Section 3) for a specific non-weighted step-down procedure; here, we state (21) in a more general framework. Applying the first two points of Lemma 8.2, we obtain the required relations, which, combined with (19), establish (21). Now, consider the step-down procedure R of Theorem 4.1, that is, the one associated to the threshold collection ∆_i(k/m) = αW̃_i(k/m)k/m = αW_i(k/m)(k/m)(1 + αW_i(k/m)k/m)^{−1} ≤ 1. We use the independence between the p-values and (21) to obtain the desired bound, and the third point of Lemma 8.2 then implies the result.

Proof of Theorem 4.2
Let us assume that the following proposition holds (its proof is given at the end of this section).

Proposition 8.3. In the unconditional model, consider a weight function W with its associated threshold collection ∆, and put ū = I(G_W). Then the following holds: (ii) assuming ∆ ≤ 1, we have for all λ > 0 with λ < ū:

We now prove Theorem 4.2 by applying Proposition 8.3. First, remark that, in the unconditional model, maximizing in w the power at rejection proportion u is equivalent to maximizing G_w(u) in w. As a consequence, taking the optimal weight function W⋆, we deduce from (9) that G_w(u) ≤ G_{W⋆}(u) for any weight vector w and any u. Denoting u_w := I(G_w) and u⋆ := I(G_{W⋆}), this in turn implies (24). Second, remark that W⋆ has a threshold collection ∆⋆ satisfying ∆⋆ ≤ 1. This holds because the F_i's are increasing (being non-decreasing strictly concave functions) and because ∆⋆_i(u) ≤ αW⋆_i(1), with W⋆(1) maximizing the power at rejection proportion 1. Third, we check the assumption of point (i) of Proposition 8.3 for W(·) constantly equal to a weight vector w; this directly follows from the strict concavity of G_w (itself coming from the strict concavity of the F_i's). Fourth, let us prove that I⁺_λ(G_w) > 0 and I⁻_λ(G_{W⋆}) > 0. The first statement comes from the definition of I(G_w). To prove the second, consider u⋆ = I(G_{W⋆}) and the weight vector w = W⋆(u⋆), so that u⋆ equals u_w = I(G_w) (because u_w ≤ u⋆ from (24)). Using again that W⋆ is a maximum, we obtain the required inequality by the strict concavity of G_w; this implies I⁻_λ(G_{W⋆}) > 0. Finally, using (22) with W(·) constantly equal to any weight vector w, together with (23) applied with W = W⋆, we obtain the desired bound for all λ > 0 with λ < u⋆ and λ < π_0(1 − α), which proves (15).

Proof of Theorem 4.3
First remark that, for any sequence of weight functions W, the convergence of (G_{W^{(m)}})_m to G_∞ is uniform, because all these functions are non-decreasing and because G_∞ is assumed to be continuous on [0, 1]. Next, we prove that I(G_{W^{(m)}}) → I(G_∞); this directly implies that I⁺_λ(G_{W^{(m)}}) → I⁺_λ(G_∞) and I⁻_λ(G_{W^{(m)}}) → I⁻_λ(G_∞) for λ < I(G_∞). For this, take a subsequence m′ such that I(G_{W^{(m′)}}) converges, and let us prove that its limit ℓ equals I(G_∞). From the uniform convergence and the continuity of G_∞, ℓ satisfies G_∞(ℓ) = ℓ.
This yields (28) by letting λ → 0 and by noticing that lim_m I(G_{W⋆,(m)}) exists. Finally, we have to check that the use of W⋆,(m) in Proposition 8.3 was allowed, i.e. that for all m and all u′ > u > u⋆ := I(G_{W⋆,(m)}), the inequality u′ − G_{W⋆,(m)}(u′) > u − G_{W⋆,(m)}(u) holds. For this, let w := W⋆,(m)(u′) and u_w := I(G_w). Since u⋆ ≥ u_w and G_w(u_w) = u_w, we have u′ − G_w(u′) > u − G_w(u) (G_w being strictly concave). Therefore, the inequality follows for this particular weight vector w, where the last step holds because W⋆,(m) is a maximum.
Finally, to obtain the FDR statement (17), we use the same reasoning as above, combined with the following finite-sample FDR approximation result.

Proposition 8.4. In the unconditional model, consider a weight function W with its associated threshold collection ∆ and put ū = I(G_W). Assume that u′ − G_W(u′) > u − G_W(u) for all u′ > u > ū, and take λ > 0 with λ < 1 − ū. Then the following bound holds:

FDR(SU(W)) ≤ π_0 α + π_0 α m³ exp{−2m (I⁺_λ(G_W) − m^{−1})²}.

Assuming additionally ∆ ≤ 1, we have the companion bound (31).

To prove Proposition 8.4, we write the FDR as in Section 8.4 (using the same reasoning and notation). On the one hand, when ū > λ, we may bound the corresponding terms directly. On the other hand, when ū ≤ λ, we have Ω₂ᶜ = ∅ and the bound follows. This implies (30). The proof of (31) is similar, noticing that (32) is an equality when ∆ ≤ 1.

The first point is thus proved by additionally noticing that when both p_i ≤ ∆_i(û) and p_i ≤ ∆′_i(û′_{−i}), we have both û′_{−i} ≥ φ^{−1}(û) and φ(û′_{−i}) ≤ û, so that φ(û′_{−i}) = û. For the second point of the lemma, remark that m Ĝ_W(u) = (m − 1) Ĝ_{−i}(um/(m − 1)) + 1{p_i ≤ ∆_i(u)}.
Appendix C: Some FDR bounds for SU(W) and SD(W)

C.1. Step-up case

C.2. Step-down case
The next result states that SD(W) controls the FDR non-asymptotically, without correction, when m = 2 and when m_0 = m in the unconditional model. This is quite intriguing, and one may conjecture that SD(W) controls the FDR for any m and m_0.
Lemma C.2. For any weight function W, the procedure SD(W) controls the FDR at level α in either of the two following cases: (i) in the unconditional model when all the hypotheses are true, that is m 0 = m, (ii) in both conditional and unconditional model when m = 2.
To prove (i), we easily check that, when all the hypotheses are true, the FDR of SD(W) equals 1 − P(Ĝ_W(1/m) = 0), and is thus equal to the FDR of LSD(W(1/m)), which is α by results on weighted linear step-down procedures (see e.g. Blanchard and Roquain (2008)). To prove point (ii), by point (i) it only remains to check the case m_0 = 1. This trivially holds from (20) (which also holds in the step-down case), because all the weights are smaller than m = 2.