Asymptotic properties of false discovery rate controlling procedures under independence

We investigate the performance of a family of multiple comparison procedures for strong control of the False Discovery Rate ($\mathsf{FDR}$). The $\mathsf{FDR}$ is the expected False Discovery Proportion ($\mathsf{FDP}$), that is, the expected fraction of false rejections among all rejected hypotheses. A number of refinements to the original Benjamini-Hochberg procedure [1] have been proposed to increase power by estimating the proportion of true null hypotheses, either implicitly, leading to one-stage adaptive procedures [4, 7], or explicitly, leading to two-stage adaptive (or plug-in) procedures [2, 21]. We use a variant of the stochastic process approach proposed by Genovese and Wasserman [11] to study the fluctuations of the $\mathsf{FDP}$ achieved with each of these procedures around its expectation, for independent tested hypotheses. We introduce a framework for the derivation of generic Central Limit Theorems for the $\mathsf{FDP}$ of these procedures, characterizing the associated regularity conditions and comparing the asymptotic power of the various procedures. We interpret recently proposed one-stage adaptive procedures [4, 7] as fixed points in the iteration of well known two-stage adaptive procedures [2, 21].


Introduction
Multiple testing problems arise when many binary tests are performed simultaneously. The rejection of all hypotheses with individual p-values smaller than a fixed threshold results in an increasing number of false discoveries as the number of tests increases. There is therefore a need for a risk measure taking multiple testing into account. Multiple testing procedures use a collection of p-values as input and output a set of hypotheses to be rejected. Since the seminal article by Benjamini and Hochberg [1], the False Discovery Rate (FDR) has been accepted as a practical and convenient risk measure in multiple testing problems involving high-dimensional data analysis, including nonparametric estimation by wavelet methods in image analysis, functional magnetic resonance imaging (fMRI) in medicine, source detection in astronomy, and DNA microarray analysis in biology. The FDR is the expected proportion of erroneous rejections among all rejections.
The procedure originally proposed by Benjamini and Hochberg [1], which we term procedure BH95, controls FDR when the true null hypotheses are independent, or display certain forms of positive dependence [1,3]. Considerable efforts have been devoted to the development of procedures retaining the FDR-controlling capabilities of the BH95 procedure under more general conditions of dependence [3,19], and/or with higher power [2,4,7,21,23]. The second aim is driven by the observation that the original BH95 procedure actually controls FDR at level exactly $\pi_0\alpha$, where $\pi_0$ is the (unknown) proportion of true null hypotheses [3,8,19,23]. When $\pi_0 < 1$, applying the BH95 procedure at level $\alpha/\pi_0$ would increase the number of rejections while keeping $\mathrm{FDR} \le \alpha$. However, as $\pi_0$ is unknown, this is only an Oracle procedure. Many proposed procedures therefore try to imitate the Oracle by applying the BH95 procedure at level $\alpha/\widehat{\pi}_0$, where $\widehat{\pi}_0$ estimates (or at least provides an upper bound for) the true $\pi_0$ [2,21,23]. Such procedures are referred to as two-stage adaptive or plug-in procedures. A new class of procedures has recently been proposed, with the aim of providing tighter FDR control than the BH95 procedure, while avoiding explicit solution of the semi-parametric problem of $\pi_0$ estimation. Procedures of this second class are referred to as one-stage adaptive procedures [4,7].
The FDR-controlling properties of such procedures have been carefully studied for a finite number of hypotheses [1,2,4,8,9,19,21], or asymptotically [5,7,8,10,11,21,22,23]. As the proportion of erroneous rejections (the FDP) is a stochastic quantity, its fluctuations around its mean value are worth investigating. Several procedures have been proposed for controlling the upper quantiles of the FDP [11,12,13,15,16,17,18,26]. The asymptotic behavior of the process $(\mathrm{FDP}_m(t))_{0 < t \le 1}$, where $t$ is a deterministic threshold, has also been studied [11,21,23]. We focus here on the properties of the random threshold $\widehat{\tau}$ associated with a given multiple testing procedure, and particularly on the asymptotic distribution of $\mathrm{FDP}_m(\widehat{\tau})$, the FDP actually achieved by the procedure.
Organization of the paper. In section 2 we propose a general framework for asymptotic analysis of the FDP of multiple testing procedures. In section 3 we derive the asymptotic distribution of the FDP of a multiple testing procedure with generic threshold function T and characterize the asymptotic equivalence of multiple testing procedures. These results are explicitly connected to the regularity of the map T , which is then discussed. In section 4 we derive the asymptotic behavior of several existing procedures. In section 5 we point out interesting connections between one-stage adaptive and two-stage adaptive procedures. The main results are summarized and discussed in section 6, and proofs of the main results are gathered in section 7.

Background
Throughout the paper, we consider a sequence $(P_i)_{i\in\mathbb{N}}$ of p-values associated with a collection of binary tests of a null hypothesis $H_0$ against an alternative hypothesis $H_1$. Note that $T$ does not depend on $m$ in Definition 2.2. In the remainder of this paper, $T(G)$ will be denoted by $\tau^\star$.

Formalism
FDP as a stochastic process of a random threshold. As suggested in a previous study [11], the False Discovery Proportion can be viewed as a stochastic process. Let $\widehat{G}_{0,m}$ and $\widehat{G}_{1,m}$ denote the (unobservable) empirical distribution functions of the p-values under the null and alternative hypotheses:
$$\widehat{G}_{0,m}(t) = \frac{1}{m_0} \sum_{i\,:\,H_0 \text{ true}} \mathbf{1}_{P_i \le t}, \qquad \widehat{G}_{1,m}(t) = \frac{1}{m_1} \sum_{i\,:\,H_1 \text{ true}} \mathbf{1}_{P_i \le t},$$
where $m_0 = \pi_0(m)\,m$ is the number of true null hypotheses among the first $m$, and $m_1 = m - m_0$.
As $\pi_0(m)$ is deterministic in our setting and satisfies $\lim_{m\to+\infty} \pi_0(m) = \pi_0$, we will assume without loss of generality that it is constant, equal to $\pi_0$, in order to alleviate notation. Therefore, we have $\widehat{G}_m = \pi_0 \widehat{G}_{0,m} + (1-\pi_0) \widehat{G}_{1,m}$, and, for any $t \in [0,1]$, $R_m(t) = \frac{1}{m}\sum_{i=1}^m \mathbf{1}_{P_i \le t} = \widehat{G}_m(t)$ and $V_m(t) = \frac{1}{m}\sum_{\{i\,:\,H_0 \text{ true}\}} \mathbf{1}_{P_i \le t} = \pi_0 \widehat{G}_{0,m}(t)$, so that
$$\mathrm{FDP}_m(t) = \frac{V_m(t)}{R_m(t)} \quad (\text{with the convention } 0/0 = 0)$$
is the False Discovery Proportion achieved at the deterministic threshold $t$. The asymptotic properties of the stochastic process $(\mathrm{FDP}_m(t))_{0 \le t \le 1}$ were analyzed by Genovese and Wasserman [11]. They noticed that $\mathrm{FDR}_m(t) = \mathbb{E}[\mathrm{FDP}_m(t)]$, the FDR achieved at $t$, may be written in terms of $p(t) = \frac{\pi_0 t}{G(t)}$, the positive False Discovery Rate (pFDR) at $t$, as defined by [21]. They proved that the $\mathrm{FDP}_m$ process converges to the pFDR at rate $1/\sqrt{m}$, and built confidence envelopes for the FDP process using this result.
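As a concrete illustration (not taken from the paper), the following sketch simulates a two-group model with parameter values of our own choosing and checks that $\mathrm{FDP}_m(t)$ at a deterministic threshold $t$ fluctuates around the pFDR $p(t) = \pi_0 t / G(t)$:

```python
import random
import statistics

rng = random.Random(0)
N = statistics.NormalDist()

# Illustrative two-group model (parameter values are our own choices):
# pi0 = 0.8 of the m hypotheses are true nulls; alternatives are one-sided
# Gaussian shifts of size mu = 2.
m, pi0, mu, t = 20_000, 0.8, 2.0, 0.05
m0 = int(pi0 * m)

# p-values: U(0,1) under H0, P = 1 - Phi(X) with X ~ N(mu, 1) under H1
pvals = [rng.random() for _ in range(m0)] + \
        [1.0 - N.cdf(rng.gauss(mu, 1.0)) for _ in range(m - m0)]
is_null = [True] * m0 + [False] * (m - m0)

# FDP_m(t) = V_m(t) / R_m(t) at the deterministic threshold t
R = sum(p <= t for p in pvals)
V = sum(h and p <= t for p, h in zip(pvals, is_null))
fdp = V / max(R, 1)

# Population counterparts: G(t) = pi0*t + (1 - pi0)*G1(t), pFDR p(t) = pi0*t/G(t)
G1_t = 1.0 - N.cdf(N.inv_cdf(1.0 - t) - mu)
p_t = pi0 * t / (pi0 * t + (1.0 - pi0) * G1_t)

print(f"FDP_m({t}) = {fdp:.4f}, pFDR({t}) = {p_t:.4f}")
```

For $m$ this large, the two printed values agree to roughly $1/\sqrt{m}$ accuracy, in line with the convergence rate recalled above.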
We make use of this stochastic process approach here to study the behavior of the FDP actually achieved by a given multiple testing procedure $T$, that is, the random variable $\mathrm{FDP}_m(T(\widehat{G}_m))$. We investigate the asymptotic behavior of this variable and, in particular, its fluctuations around the asymptotic FDR achieved by procedure $T$, by writing $\mathrm{FDP}_m(T(\widehat{G}_m))$ as a function of the empirical distribution functions under the null and alternative hypotheses: the FDP achieved by procedure $T$ may be written as
$$\mathrm{FDP}_m(T(\widehat{G}_m)) = \frac{\pi_0\, \widehat{G}_{0,m}(T(\widehat{G}_m))}{\widehat{G}_m(T(\widehat{G}_m))}.$$
Using the functional Delta method [27], this formalism makes it possible to break down the analysis of $\mathrm{FDP}_m(T(\widehat{G}_m))$ into the regularity properties of the map $T$, which depend solely on the procedure, and the asymptotic behavior of the empirical distribution functions of the p-values, which can be derived from Donsker's invariance principle [6] because the p-values are assumed to be independent.
Remark 2.3. This paper focuses on the FDP, but the formalism we propose here could be used to derive the asymptotic distribution of any risk measure based on the number of true/false positives/negatives, under the same regularity conditions. In particular, the results obtained here can also be applied to the False Non-discovery Proportion (FNP) [10].

Multiple testing procedures studied. The threshold function of the BH95 procedure is defined by
$$T_{\mathrm{BH95}}(F) = \sup\{u \in [0,1],\ F(u) \ge u/\alpha\}.$$
As the BH95 procedure keeps the False Discovery Rate at level (exactly) $\pi_0\alpha$ when the p-values are independent [3,8,19,23], it is conservative by a factor $\pi_0$. Other multiple testing procedures have been proposed that estimate $\pi_0$, either implicitly or explicitly, to provide tighter (i.e. more powerful) FDR control under independence: one-stage adaptive procedures (BR08 [4], FDR08 [7]) use rejection curves other than Simes' line, without explicitly incorporating an estimate of $\pi_0$; two-stage adaptive procedures (BKY06 [2], STS04 [23], Sto02 [21]) apply the BH95 procedure at level $\alpha/\widehat{\pi}_0$, where $\widehat{\pi}_0$ is an estimator of $\pi_0$.
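For concreteness, the empirical BH95 threshold can be computed by the classical step-up rule on sorted p-values; a minimal sketch (the toy p-values are our own):

```python
def bh95_threshold(pvals, alpha):
    """Largest p-value p_(k) lying below Simes' line, i.e. p_(k) <= alpha*k/m.
    Rejecting every p-value up to this threshold yields the same rejection set
    as sup{u : G_m(u) >= u/alpha} applied to the empirical cdf G_m."""
    m = len(pvals)
    best = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * k / m:
            best = p
    return best

# Toy example (our own numbers)
pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74, 0.9, 1.0]
tau = bh95_threshold(pv, alpha=0.05)
print(tau, sum(p <= tau for p in pv))  # -> 0.008 2
```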
We therefore consider threshold functions of the form $T(F) = U(F, A(F))$, with
$$U(F, \alpha) = \sup\{u \in [0,1],\ F(u) \ge r_\alpha(u)\},$$
where $r_\alpha : [0,1] \to \mathbb{R}^+$ will be called a rejection curve (after [7]), and $A : D[0,1] \to [0,1]$ will be called a level function. $r_\alpha$ will be denoted by $r(\alpha, \cdot)$ whenever the dependence on $\alpha$ is of importance. $A$ and $r_\alpha$ are two degrees of freedom that can be used to describe generalizations of the BH95 procedure, which corresponds to the case in which the level function is constant (equal to $\alpha$) and the rejection curve is Simes' line. In this paper we consider increasing rejection curves satisfying $r_\alpha(0) = 0$, so that $U(F, \alpha) \ge 0$ for any $F \in D[0,1]$ and $\alpha \in [0,1]$.
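The pair $(r_\alpha, A)$ translates directly into code. The sketch below (our own illustration, not the paper's) evaluates the crossing condition on the empirical cdf and returns the largest p-value on or above the rejection curve, which induces the same rejection set as the exact sup:

```python
def threshold(pvals, r, A):
    """Empirical version of T(F) = U(F, A(F)), with
    U(F, a) = sup{u : F(u) >= r(a, u)}: returns the largest p-value p_(k)
    with G_m(p_(k)) = k/m >= r(a, p_(k)), or 0.0 if there is no crossing."""
    m = len(pvals)
    srt = sorted(pvals)
    a = A(srt)                                  # level function of the sample
    best = 0.0
    for k, p in enumerate(srt, start=1):
        if k / m >= r(a, p):
            best = p
    return best

simes = lambda a, u: u / a                      # Simes' line (BH95)
aorc = lambda a, u: u / (a + (1.0 - a) * u)     # curve f_alpha of [7]
const = lambda alpha: (lambda srt: alpha)       # constant level function

pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74, 0.9, 1.0]
print(threshold(pv, simes, const(0.05)))  # BH95 -> 0.008
print(threshold(pv, aorc, const(0.05)))   # untruncated f_alpha -> 1.0, since
                                          # f_alpha(1) = 1 (cf. section 4.2)
```

The degenerate second output foreshadows why the $f_\alpha$-based procedure must be truncated, as discussed in section 4.2.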

Overview of main results
Theorem 3.2 below shows that the FDP of a multiple testing procedure with threshold function $T$ converges in distribution at rate $1/\sqrt{m}$ to a conservative, procedure-specific FDR level. This theorem holds under a general regularity condition on the map $T$, which is implied by the existence and uniqueness of an interior right crossing point between the distribution function of the p-values and the rejection curve of the procedure; the existence condition for a given procedure may be interpreted as a natural generalization of the notion of criticality, which has recently been introduced for the BH95 procedure [5].
Although the BH95 procedure is known to control FDR at level exactly $\pi_0\alpha$ [3,8], other procedures have been proved to yield only an FDR not larger than $\alpha$, either for a finite number of hypotheses (procedures STS04, BKY06 and BR08) or asymptotically (Sto02 and FDR08). In section 4 we derive the asymptotic behavior of these procedures, and the associated regularity conditions. As all procedures converge at the same rate $1/\sqrt{m}$, their asymptotic power may be explicitly compared through their attained asymptotic FDR.
In section 5 we demonstrate the existence of interesting connections between the one-stage and two-stage adaptive procedures under investigation: with a striking symmetry, procedure BR08 may be interpreted as a fixed point of the iteration of procedure BKY06, and procedure FDR08 as a fixed point of the iteration of procedure Sto02.

Asymptotic properties of threshold procedures
This section provides general results about multiple testing procedures with threshold functions satisfying the following regularity condition. We refer to [27] for a formal definition of Hadamard differentiability. We begin by deriving the asymptotic distribution of the FDP of any multiple testing procedure satisfying Condition C.1 (section 3.1). We then define and characterize asymptotic equivalence between multiple testing procedures in terms of Condition C.1 (section 3.2). Finally we interpret this Condition in terms of crossing points between the distribution function G of the p-values and the rejection curve (section 3.3).

Asymptotic False Discovery Proportion
Condition C.1 makes it possible to use the functional Delta method [27] to derive the asymptotic distribution of the False Discovery Proportion $\mathrm{FDP}_m(T(\widehat{G}_m))$ actually achieved by procedure $T$ from the convergence in distribution of the centered empirical processes associated with $\widehat{G}_{0,m}$ and $\widehat{G}_{1,m}$, which is a consequence of Donsker's theorem [27]:

is a Gaussian process with continuous sample paths and covariance function given by
where $\gamma_0$ is the covariance function of $B$, that is, $\gamma_0 : (s,t) \mapsto s \wedge t - st$. According to (ii), the asymptotic FDR achieved by procedure $T$ is the pFDR at the asymptotic threshold $\tau^\star = T(G)$. This is true because $\tau^\star$ is positive (by Condition C.1). In particular, Theorem 3.2 provides a necessary and sufficient condition under which a multiple testing procedure with Hadamard differentiable threshold function asymptotically controls FDR.

Remark 3.4 (Form of $\dot{T}_G$). The expression of $\dot{T}_G$ for threshold functions is given by Corollary 7.12, which shows that for one-stage adaptive procedures (where the level function $A$ is constant), $\dot{T}_G$ is proportional to the inverse of the difference between the slopes of $r_\alpha$ and $G$ at $\tau^\star$. For two-stage plug-in procedures, which typically estimate $\pi_0$ using $G(u_0)$ for some $u_0$ (e.g. $u_0 = \lambda$ for procedure Sto02), $\dot{T}_G$ involves an additional term that depends on $G(u_0)$, and the asymptotic distribution of the FDP depends on the centered Gaussian random variable $Z(u_0)$, where $Z$ is defined in Theorem 3.1.

Asymptotically equivalent procedures
Some multiple testing procedures cannot be written in terms of threshold functions, because they do not depend exclusively on $\widehat{G}_m$, but also directly on the number $m$ of tested hypotheses. When such procedures are only slight perturbations of actual threshold procedures, they share the same asymptotic distribution, as explained below.
If Condition C.1 holds for $T$, and if $\varepsilon_m = o(1/\sqrt{m})$, then $T_m$ is asymptotically equivalent to $T$ as $m \to +\infty$.
Several applications of Proposition 3.6 are given in section 4. For example, the asymptotic behavior of procedure T m = STS04(λ) can be derived from that of procedure T = Sto02(λ), for which Theorem 3.2 may be used because Sto02(λ) is an actual threshold function.

Regularity conditions
For the threshold functions under investigation, $T(G)$ is defined as the last point at which $G \ge r(A(G), \cdot)$. Therefore, the existence of a unique interior right crossing point between $G$ and $r(A(G), \cdot)$ ensures that Theorem 3.2 and Proposition 3.6 are applicable, i.e. that $T(G) > 0$, and that $T$ is Hadamard differentiable at $G$ (Condition C.1). For two-stage adaptive (plug-in) procedures, for which the level function $A$ is not constant, additional technical assumptions concerning the regularity of $A$ must be checked (see Corollary 7.12) to ensure that Condition C.1 holds. The condition $g(t) < \frac{\partial r}{\partial u}(A(G), t)$ in Definition 3.7 ensures that $G$ and $r_{A(G)} = r(A(G), \cdot)$ actually cross at $t$, i.e. that $G \ge r_{A(G)}$ in a left-neighborhood of $t$, and $G \le r_{A(G)}$ in a right-neighborhood of $t$.
Studies of the asymptotic distribution of the abovementioned FDR controlling procedures require investigating, in each case, the conditions guaranteeing the existence of a unique interior right crossing point. To this end, we break this condition down as follows: Condition C.2 (Existence). $T$ has an interior right crossing point.

Condition C.3 (Uniqueness). T has at most one interior right crossing point.
Condition C.3 always holds for procedures based on Simes' line (BH95, Sto02, and BKY06), because their rejection curve is linear and $G$ is concave. Condition C.2 typically holds in situations in which the slope of $G$ at the origin is large enough. In the case of the BH95 procedure, Chi recently showed the existence of a critical value $\alpha^\star$, depending solely on the distribution function $G$ of the p-values, such that if $\alpha < \alpha^\star$, the number of discoveries made by the BH95 procedure is stochastically bounded as the number of tested hypotheses increases, whereas if $\alpha > \alpha^\star$, the proportion of discoveries converges in probability to a positive limit, $G(\tau^\star)$ with $\tau^\star = T(G)$ [5].
In section 4, we provide a detailed analysis of a number of FDR controlling procedures and present, for each, a critical value for the target FDR level characterizing the situations in which Condition C.2 is guaranteed to hold for the procedure.

Results for procedures of interest
We apply the results of the preceding section to a series of procedures with proven (asymptotic) FDR control. Starting from the original BH95 procedure and its Oracle version (section 4.1), we then turn to adaptive procedures, which implicitly or explicitly incorporate an estimate of the proportion π 0 of true null hypotheses: one-stage adaptive procedures are studied in section 4.2, and two-stage adaptive procedures (also called plug-in procedures) are studied in section 4.3.

BH95 procedure
We will first recall the definition of the BH95 procedure in our framework.
As the rejection curve of procedure BH95 is linear and $G$ is concave, the uniqueness Condition C.3 always holds, and the existence Condition C.2 reduces to $\alpha > \alpha^\star$, where $\alpha^\star = \lim_{u\to 0} u/G(u) = \lim_{u\to 0} 1/g(u)$ is the critical value of the BH95 procedure [5]: Condition C.4 (Condition C.2 for the BH95 procedure). The target FDR level $\alpha$ is greater than the critical value $\alpha^\star$ of the BH95 procedure.
The criticality phenomenon is illustrated in Figure 2 for Laplace (double exponential) test statistics. The Weak Law of Large Numbers phenomenon analyzed by [5], which occurs when α > α ⋆ , was noted by [10]. We now derive the corresponding central limit theorem under the same hypothesis, and the asymptotic distribution of the FDP actually achieved by the BH95 procedure.
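The Laplace case can be made numerically explicit. In one possible formalization of this setting (a one-sided Laplace location model with parameters of our own choosing), the p-value cdf under the alternative is exactly linear near $0$, so $u/G(u)$ is constant there and the critical value $\alpha^\star = \lim_{u\to 0} u/G(u)$ has the closed form $1/(\pi_0 + (1-\pi_0)e^\theta)$:

```python
import math

# One-sided Laplace location model (parameters are our own choices):
# test statistic X ~ Laplace(0) under H0, X ~ Laplace(theta) under H1;
# one-sided p-value p = P0(X > x) = exp(-x) / 2 for x >= 0.
pi0, theta = 0.8, 2.0

def G1(u):
    """cdf of the p-value under the alternative, for 0 < u <= 1/2."""
    x = -math.log(2.0 * u)                  # rejection cutoff with p-value u
    if x >= theta:
        return math.exp(theta) * u          # P_theta(X >= x) = e^theta * u
    return 1.0 - 0.5 * math.exp(x - theta)

def G(u):
    """Two-group p-value cdf G = pi0 * G0 + (1 - pi0) * G1, with G0 = id."""
    return pi0 * u + (1.0 - pi0) * G1(u)

u = 1e-6
alpha_star = 1.0 / (pi0 + (1.0 - pi0) * math.exp(theta))   # closed form
print(u / G(u), alpha_star)   # both ~ 0.439: criticality whenever alpha < 0.439
```

With these (arbitrary) parameters the critical value is large, so a conventional target such as $\alpha = 0.05$ falls in the subcritical regime.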
Applying the BH95 procedure at level $\alpha/\pi_0$ leads to an Oracle procedure (as $\pi_0$ is not known) that is more powerful, as it controls FDR at level exactly $\alpha$. This procedure, which we denote by BH95o, has threshold function $T_{\mathrm{BH95o}}(F) = \sup\{u \in [0,1],\ F(u) \ge \pi_0 u/\alpha\}$, and its critical value is therefore $\pi_0\alpha^\star$, which translates into the following regularity condition. The corresponding asymptotic properties can be derived from Theorem 4.2:

One-stage adaptive procedures
The first class of adaptive procedures studied here are called one-stage adaptive procedures because they estimate $\pi_0$ implicitly, rather than through a level function $A$.
The adaptive procedure associated with $r_\alpha$ is the multiple testing procedure defined by the threshold function $T(F) = U(F, \alpha) = \sup\{u \in [0,1],\ F(u) \ge r_\alpha(u)\}$. The rejection curve of adaptive procedures is not linear, so the conditions under which Condition C.1 is fulfilled are more subtle than for the BH95 procedure (section 4.1) or for two-stage adaptive procedures (section 4.3).
As $f_\alpha(1) = 1$, the corresponding threshold function is always equal to $1$. This procedure therefore systematically rejects all hypotheses, and does not control FDR either for finite sample size or asymptotically. Several ways of overcoming this problem have been proposed [7], including truncation of the rejection curve, yielding the following procedure. Definition 4.5 (Procedure FDR08(λ)). Let $\lambda \in [0,1)$. The rejection curve of the FDR08(λ) procedure is defined by $f_\alpha^\lambda(u) = f_\alpha(u)$ for $u \le \lambda$, and $+\infty$ otherwise. The threshold function of the FDR08(λ) procedure is therefore given by $T(F) = \sup\{u \in [0,1],\ F(u) \ge f_\alpha^\lambda(u)\}$. We introduce the following regularity condition: Condition C.6. $\lambda \ge \kappa$, where $\kappa = \frac{\alpha(1-\pi_0)}{(1-\alpha)\pi_0}$. Note that $(\kappa, \frac{1-\pi_0}{1-\alpha})$ is the crossing point between the rejection curve $f_\alpha$ and the distribution function $\mathrm{DU}(\pi_0) : u \mapsto (1-\pi_0) + \pi_0 u$ in the extremal Dirac-Uniform configuration, in which all p-values drawn from $H_1$ are equal to $0$. As $G = \pi_0 G_0 + (1-\pi_0) G_1 \le \mathrm{DU}(\pi_0)$, Condition C.6 ensures that any interior right crossing point between $G$ and $f_\alpha$ occurs before $\lambda$. In practice, $\kappa$ is unknown because it depends on $\pi_0$. However, an upper bound for $\kappa$ can be deduced from a lower bound for $\pi_0$; for example, in microarray data analysis, it can often be assumed that $\pi_0 > \frac{1}{2}$, in which case $\kappa$ is smaller than $\frac{\alpha}{1-\alpha}$.
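The effect of the truncation is easy to see on empirical data; a sketch (toy p-values are our own):

```python
def fdr08_threshold(pvals, alpha, lam):
    """Empirical FDR08(lambda) threshold: last p-value p_(k) at which the
    empirical cdf k/m lies on or above the curve f_alpha(u) = u/(alpha+(1-alpha)u),
    restricted to p <= lam (the curve is +infinity beyond lam)."""
    m = len(pvals)
    best = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= lam and k / m >= p / (alpha + (1.0 - alpha) * p):
            best = p
    return best

pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74, 0.9, 1.0]
print(fdr08_threshold(pv, alpha=0.05, lam=0.5))  # -> 0.06
print(fdr08_threshold(pv, alpha=0.05, lam=1.0))  # -> 1.0: without truncation,
                                                 # everything is rejected
```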
By Definition 4.5, the rejection curve $f_\alpha^\lambda$ of any procedure FDR08(λ) satisfying Condition C.6 is equal to $f_\alpha$ on $[0, \kappa]$, which corresponds to the admissible region for interior right crossing points. The following proposition is a straightforward consequence of this observation. Proposition 4.6. All FDR08(λ) procedures satisfying Condition C.6 are asymptotically equivalent in the sense of Definition 3.5.
As the corresponding asymptotic distribution does not depend on λ, we will refer to it simply as the "asymptotic distribution of the FDR08 procedure". In order to characterize this distribution we introduce a further technical condition to ensure that κ < 1. Combined with Condition C.4, it also ensures that existence Condition C.2 holds for procedure FDR08(λ), because the slope of f λ α at the origin is 1/α.
Condition C.7 is a mild assumption in practice, because $\pi_0$ is typically expected to be greater than $1/2$ (in microarray data analysis, for example). When $\alpha \ge \pi_0$, there is no need for sophisticated FDR controlling procedures, because rejecting all hypotheses yields $\mathrm{FDP} = \pi_0$ and thus $\mathrm{FDR} \le \alpha$.

Under uniqueness Condition C.3, and existence Conditions C.4 and C.7, we have
$$\tau^\star_{\mathrm{BH95}} \le \tau^\star_{\mathrm{FDR08}} \le \tau^\star_{\mathrm{BH95o}},$$
so that procedure FDR08 is asymptotically more powerful than procedure BH95, and less powerful than procedure BH95o.
As for the FDR08 procedure, the rejection curve of the BR08(λ) procedure is not linear, and we therefore need two assumptions to ensure that the existence Condition C.2 holds: Condition C.8 ensures that there is no criticality phenomenon, that is, that the slope of the distribution function $G$ is large enough at the origin, and Condition C.9 ensures that a right crossing point occurs before $\lambda$, because the BR08(λ) procedure is truncated at $\lambda$. Remark 4.9. Condition C.9 may be written as $G(\lambda) \le b_\alpha^\lambda(\lambda)$, or as $G(\lambda) \le f_\alpha(\lambda)$, because the rejection curves of procedures BR08(λ) and FDR08 intersect at $\lambda$. Theorem 4.10 implies that procedure BR08(λ) controls FDR asymptotically at level $\alpha$: as $\tau^\star \le \lambda$, we have $p^\star \le \alpha\pi_0$. However, as $b_\alpha^\lambda(u) \le u/\alpha$ if and only if $u \ge \lambda\alpha$, procedure BR08(λ) need not be more powerful than procedure BH95, and we have the following characterization:
$$\mathrm{BR08}(\lambda) \gg \mathrm{BH95} \iff \tau^\star_{\mathrm{BH95}} \ge \lambda\alpha,$$
where $\gg$ means "is more powerful than", and $\tau^\star_{\mathrm{BH95}}$ is the asymptotic threshold of procedure BH95. An explicit characterization of situations in which $\mathrm{BR08}(\lambda) \gg \mathrm{BH95}$ for Gaussian test statistics is given in [4].

Two-stage adaptive (plug-in) procedures
In this section we study two-stage adaptive, or plug-in, procedures, in which a conservative step-up procedure is applied at a data-dependent level. In particular, we consider the case of Simes' line-based plug-in procedures, in which procedure BH95 is applied at level $\alpha/\widehat{\pi}_0$, where $\widehat{\pi}_0$ is an estimate of $\pi_0$ computed from the data. Such procedures will simply be called plug-in procedures hereafter.
As $r_\alpha$ is linear and $G$ is concave, the uniqueness Condition C.3 always holds for plug-in procedures, and the existence Condition C.2 is the same as for procedure BH95, except that $\alpha$ is replaced by the value of the level function $A$ at $G$: Condition C.10 (Condition C.2 for plug-in procedures). The level $A(G)$ associated with the target FDR level $\alpha$ is greater than the critical value of the BH95 procedure.
Care is required when deriving the asymptotic distribution of the FDP for plug-in procedures, because the Hadamard derivative $\dot{T}_G(H)$ of $T$ typically involves the value of $H$ both at $\tau^\star$ and at a point $u(\lambda)$ used for the estimation of $\pi_0$: $u(\lambda) = \lambda$ for procedure Sto02, and $u(\lambda) = U(G, \lambda)$ for procedure BKY06(λ). The asymptotic variance of the False Discovery Proportion therefore involves the covariance between $Z(\tau^\star)$ and $Z(u(\lambda))$.
We consider the two types of plug-in procedures most widely used and theoretically justified: Sto02-like procedures (Sto02 [21], STS04 [23]), in which $\pi_0$ is estimated by $\frac{1-\widehat{G}_m(\lambda)}{1-\lambda}$ or a slight variant thereof, and the BKY06 procedure [2], in which an upper bound for $\pi_0$ is derived from a first application of the classical BH95 procedure.
Definition 4.13 (Procedure Sto02 [21]). Procedure Sto02(λ) is the multiple testing procedure with threshold function
$$T_{\mathrm{Sto02}(\lambda)}(F) = \sup\left\{u \in [0,1],\ F(u) \ge \frac{u\,(1-F(\lambda))}{\alpha(1-\lambda)}\right\}.$$
The level function of this procedure is therefore $A(F) = \frac{\alpha(1-\lambda)}{1-F(\lambda)}$. In the remainder, $\pi_0^G(\lambda) = \frac{1-G(\lambda)}{1-\lambda}$ will simply be denoted by $\pi_0(\lambda)$.
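A sketch of this plug-in scheme on empirical data (toy p-values and the tuning $\lambda = 1/2$ are our own choices):

```python
def storey_pi0(pvals, lam):
    """Storey's estimator pi0_hat(lambda) = (1 - G_m(lambda)) / (1 - lambda)."""
    m = len(pvals)
    return sum(p > lam for p in pvals) / (m * (1.0 - lam))

def sto02_threshold(pvals, alpha, lam):
    """Sto02(lambda): BH95 applied at the data-driven level alpha / pi0_hat,
    i.e. the largest p-value p_(k) with p_(k) <= alpha * k / (m * pi0_hat)."""
    m = len(pvals)
    pi0_hat = storey_pi0(pvals, lam)
    best = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p * pi0_hat <= alpha * k / m:
            best = p
    return best

pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74, 0.9, 1.0]
print(storey_pi0(pv, 0.5))             # -> 0.6
print(sto02_threshold(pv, 0.05, 0.5))  # -> 0.008
```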
This procedure is known to provide asymptotic control of FDR at level $\alpha$ [21], but does not necessarily control FDR at level $\alpha$ for finite sample size. This led to the definition of a modification of the Sto02 procedure that does control FDR even for finite sample size [23]. Definition 4.14 (Procedure STS04(λ) [23]). Procedure STS04(λ) rejects the p-values smaller than the Sto02(λ) threshold computed with the slightly more conservative estimator $\widehat{\pi}_0(\lambda) = \frac{1 - \widehat{G}_m(\lambda) + 1/m}{1-\lambda}$. According to Proposition 3.6, procedures STS04(λ) and Sto02(λ) are asymptotically equivalent provided that Conditions C.9 and C.11 hold (see Proposition 7.16, page 1104, for a formal proof).

The original BKY06 procedure [2] estimates $\pi_0$ by $\widehat{\pi}_0 = \frac{m - R_m(\beta)}{m}$, where $R_m(\beta)$ is the number of hypotheses rejected by a first application of the BH95 procedure at level $\beta$. We shall consider a recently proposed generalization of this procedure [4], in which procedure BH95 is applied at level $\lambda$ in the first stage, and at level $\alpha(1-\lambda)/\widehat{\pi}_0$ in the second. Definition 4.17 (Procedure BKY06(λ)). Let $\lambda \in [0,1)$. The threshold function of procedure BKY06(λ) is defined for any $F \in D[0,1]$ by $T_{\mathrm{BKY06}(\lambda)}(F) = U(F, A(F))$, with level function $A(F) = \frac{\alpha(1-\lambda)}{1 - F(U(F,\lambda))}$. The original two-stage definition of [2] differs slightly, in a way which permits proving that the procedure controls FDR for finite sample size [4]. According to Proposition 3.6, these two procedures are asymptotically equivalent, so we will use Definition 4.17.
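The two successive BH95 passes translate into a short sketch (a simplified rendering of the generalized two-stage scheme, with our own toy numbers; the guard against $\widehat{\pi}_0 = 0$ is our own addition):

```python
def bh95_threshold(pvals, alpha):
    """Step-up BH95 threshold: largest p_(k) with p_(k) <= alpha * k / m."""
    m = len(pvals)
    best = 0.0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * k / m:
            best = p
    return best

def bky06_threshold(pvals, alpha, lam):
    """Two-stage scheme in the spirit of the generalized BKY06(lambda):
    a first BH95 pass at level lam yields r1 rejections and the bound
    pi0_hat = (m - r1) / m; a second BH95 pass is then run at level
    alpha * (1 - lam) / pi0_hat."""
    m = len(pvals)
    t1 = bh95_threshold(pvals, lam)
    r1 = sum(p <= t1 for p in pvals)
    pi0_hat = max(m - r1, 1) / m          # guard against r1 == m
    return bh95_threshold(pvals, alpha * (1.0 - lam) / pi0_hat)

pv = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.3, 0.74, 0.9, 1.0]
alpha = 0.05
# lam = alpha / (1 + alpha) corresponds to the original BKY06 tuning [2]
print(bky06_threshold(pv, alpha, lam=alpha / (1 + alpha)))  # -> 0.008
```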
As procedure BKY06(λ) is based on two successive applications of procedure BH95, at levels $\lambda$ and $\alpha(1-\lambda)$, Condition C.2 holds if and only if Condition C.8 holds and $\lambda > \alpha^\star$.

Connection between one-stage and two-stage adaptive procedures
We have introduced two types of FDR controlling procedures generalizing the BH95 procedure: two-stage adaptive (plug-in) procedures explicitly incorporate an estimate of π 0 into the standard BH95 procedure, whereas one-stage adaptive procedures do not explicitly use such an estimate, but still provide tighter FDR control than the BH95 procedure.
We will now investigate connections between one-stage and two-stage adaptive procedures, which naturally appear when using the formalism of threshold functions: with a striking symmetry, the threshold of procedure BR08(λ) may be interpreted as a fixed point of an iterated BKY06(λ) procedure, whereas the threshold of procedure FDR08 may be interpreted as a fixed point of an iterated Sto02(λ) procedure. We provide heuristic reasons for these connections in section 5.1; in section 5.2 we present general results for the connection between one-stage and two-stage adaptive procedures, and derive consequences for the connection between procedures Sto02(λ) and FDR08 on the one hand, and between procedures BKY06(λ) and BR08(λ) on the other hand.

Heuristics
Procedures BKY06(λ) and BR08(λ). The BKY06 procedure was designed to derive an approximate upper bound for $\pi_0$ from a first application of procedure BH95, and to use this upper bound in a second application of the BH95 procedure, leading to less conservative FDR control. For $\lambda \in [0,1)$, the threshold function of the BKY06(λ) procedure is defined by
$$T_{\mathrm{BKY06}(\lambda)}(F) = \sup\left\{u \in [0,1],\ F(u) \ge \frac{u\,(1 - F(U(F,\lambda)))}{\alpha(1-\lambda)}\right\}.$$
It therefore seems natural to iterate this process, using the number of rejections at the second application to find a less conservative upper bound for $\pi_0$, using this new upper bound in a third application of the BH95 procedure, and so on. Based on this idea, Benjamini et al. suggested defining a multi-stage procedure for the particular situation in which $\lambda = \frac{\alpha}{1+\alpha}$ [2]. In our framework, this iterative process suggests the introduction of a fixed-point procedure defined for any $F \in D[0,1]$ by
$$T^\infty_{\mathrm{BKY06}(\lambda)}(F) = \sup\left\{u \in [0,1],\ F(u) \ge \frac{u\,(1-F(u))}{\alpha(1-\lambda)}\right\}.$$
The term fixed-point procedure refers to the following property of the corresponding asymptotic threshold $\tau^\star_\infty = T^\infty_{\mathrm{BKY06}(\lambda)}(G)$: if $\tau^\star_\infty$ is the threshold obtained at a given stage of the abovementioned iteration process, then it is also the asymptotic threshold at the next stage, and is thus a fixed point of the iteration process. It turns out that this fixed-point procedure is the BR08(λ) procedure investigated in section 4.2: $F(u) \ge \frac{u}{\alpha(1-\lambda)}(1 - F(u))$ may be written as $F(u) \ge \frac{u}{\alpha(1-\lambda)+u}$, and the right-hand side is the rejection curve $b^\lambda_\alpha$ of the BR08(λ) procedure.
Procedures Sto02(λ) and FDR08(λ). The same idea may be adapted to procedure Sto02(λ), which is defined for $0 \le \lambda < 1$ by the threshold function
$$T_{\mathrm{Sto02}(\lambda)}(F) = \sup\left\{u \in [0,1],\ F(u) \ge \frac{u\,(1-F(\lambda))}{\alpha(1-\lambda)}\right\}.$$
If $\widehat{\tau}_\lambda = T_{\mathrm{Sto02}(\lambda)}(\widehat{G}_m)$ denotes the empirical threshold of procedure Sto02(λ), one may use $\widehat{\tau}_\lambda$ to estimate $\pi_0$, that is, calculate the threshold given by procedure Sto02($\widehat{\tau}_\lambda$), and so on. This suggests that an associated fixed-point procedure could be defined as
$$T(F) = \sup\left\{u \in [0,1],\ F(u) \ge \frac{u\,(1-F(u))}{\alpha(1-u)}\right\}.$$
Again, the term fixed-point procedure refers to the fact that if $\tau^\star_\infty = T(G)$ is used as a new $\lambda$ to estimate $\pi_0$ in procedure Sto02(λ), then the asymptotic threshold of procedure Sto02(λ) is again $\tau^\star_\infty$, which is therefore a fixed point of the iteration process. It turns out that this fixed-point procedure is the FDR08 procedure investigated in section 4.2: $F(u) \ge \frac{u\,(1-F(u))}{\alpha(1-u)}$ may be written as $F(u) \ge \frac{u}{\alpha+(1-\alpha)u}$, and the right-hand side is the rejection curve $f_\alpha$ of the FDR08 procedure.
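These heuristics can be checked numerically at the population level. The sketch below (a two-group Gaussian model with parameters of our own choosing, and a grid approximation of the sup) iterates $\lambda \mapsto \tau(\lambda)$ for the asymptotic Sto02 threshold and compares the limit with the last crossing of $G$ and $f_\alpha$:

```python
import statistics

N = statistics.NormalDist()

# Illustrative two-group Gaussian model (our own parameter choices)
pi0, mu, alpha = 0.8, 2.0, 0.05

def G(u):
    """Population cdf of the p-values: pi0*u + (1-pi0)*G1(u)."""
    if u <= 0.0 or u >= 1.0:
        return max(0.0, min(1.0, u))
    return pi0 * u + (1.0 - pi0) * (1.0 - N.cdf(N.inv_cdf(1.0 - u) - mu))

GRID = [i / 20000 for i in range(1, 20000)]    # grid over (0, 1)
GVALS = [G(u) for u in GRID]

def sto02_tau(t):
    """Asymptotic Sto02(t) threshold: last u with G(u) >= u*(1-G(t))/(alpha*(1-t))."""
    slope = (1.0 - G(t)) / (alpha * (1.0 - t))
    return max((u for u, g in zip(GRID, GVALS) if g >= slope * u), default=0.0)

def fdr08_tau():
    """Last crossing of G with f_alpha(u) = u / (alpha + (1-alpha)*u) on (0,1)."""
    return max((u for u, g in zip(GRID, GVALS)
                if g >= u / (alpha + (1.0 - alpha) * u)), default=0.0)

t = 0.5
for _ in range(30):                  # iterate lambda <- tau(lambda)
    t = sto02_tau(t)
print(t, fdr08_tau())                # the iterates stabilize at the FDR08 threshold
```

As the map $t \mapsto \tau(t)$ is increasing here, the iterates decrease monotonically from $\lambda_0 = 0.5$ and stop at the last crossing point of $G$ with $f_\alpha$, in line with the fixed-point interpretation above.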

Formal connections
We now present a general result on the connection between one-stage and two-stage adaptive procedures, providing a formal justification for the connections mentioned in section 5.1 and accounting for their symmetry. This result is based on the following assumption concerning the threshold function of the one-stage adaptive procedure. Remark 5.1. In Condition C.13, $c_\alpha(F, \cdot)$ is not the rejection curve of procedure $T$, because it depends on $F$. For example, for procedure FDR08, we will use $c_\alpha(F, u) = \frac{u\,(1-F(u))}{\alpha(1-u)}$. Theorem 5.2 shows that with any one-stage adaptive procedure fulfilling Condition C.13, we can associate a family of two-stage adaptive procedures with linear rejection curves, whose level functions are obtained by freezing the $F$-dependence of $c_\alpha$ at a fixed $t \in (0,1)$. The asymptotic threshold of the one-stage procedure may then be interpreted as a fixed point of iterations of the two-stage procedure.
Theorem 5.2 (Connection between one-stage and two-stage adaptive procedures). Let $\lambda \in (0,1)$. Consider a multiple testing procedure with a threshold function $T$ that may be written as $T(F) = \sup\{u \in [0,1],\ F(u) \ge c_\alpha(F, u)\}$ for any $F \in D[0,1]$, and, for any $t \in (0,1)$, let $T_t$ be the threshold function obtained by freezing the $F$-dependence of $c_\alpha$ at $t$. Assume that existence Condition C.2 and uniqueness Condition C.3 hold for procedure $T$, and that, for any $t \in (0,1)$, existence Condition C.2 holds for procedure $T_t$. Let $\tau^\star = T(G)$ and $\tau(t) = T_t(G)$ be the asymptotic thresholds of procedures $T$ and $T_t$, respectively. If $c_\alpha$ satisfies Condition C.13, then $\tau^\star$ is a fixed point of the map $t \mapsto \tau(t)$. Corollary 5.3 (Asymptotic power comparison). With the same notation and under the same conditions, the following assertions are equivalent: In the remainder of this section, we use Theorem 5.2 to characterize the connections between the abovementioned procedures.
Procedures Sto02(λ) and FDR08(λ). Theorem 5.4 gives the convergence of the process consisting of the recursive use of the asymptotic threshold of procedure Sto02(λ) as a new λ. It holds under the same regularity conditions as those required to obtain the asymptotic distribution of procedure FDR08.
For $u \in (0,1)$, let $\tau(u) = T_{\mathrm{Sto02}(u)}(G)$ be the asymptotic threshold of procedure Sto02(u). For any $t \in (0,1)$, define the sequence $(t_n) \in [0,1]^{\mathbb{N}}$ by $t_0 = t$, and $t_{i+1} = \tau(t_i)$ for $i \in \mathbb{N}$. Assume that the uniqueness Condition C.3 holds for procedure FDR08, and that the target FDR level $\alpha$ satisfies the existence Conditions C.4 and C.7. Then $(t_n)$ converges to the asymptotic threshold of procedure FDR08. Corollary 5.5 (Asymptotic power comparison: Sto02(λ) vs FDR08). With the same notation and under the same conditions, procedure Sto02(λ) is asymptotically more powerful than procedure FDR08 if and only if $\lambda > \tau(\lambda)$.
When using procedure Sto02(λ) in practice, we would not want any of the rejected hypotheses to be incorporated into the estimation of $\pi_0$; the empirical rejection threshold $T_{\mathrm{Sto02}(\lambda)}(\widehat{G}_m)$ should thus be smaller than $\lambda$. In such situations, as $T_{\mathrm{Sto02}(\lambda)}(\widehat{G}_m)$ converges at rate $1/\sqrt{m}$ to $\tau(\lambda) = T_{\mathrm{Sto02}(\lambda)}(G)$, procedure Sto02(λ) is likely to be more powerful than procedure FDR08, by Corollary 5.5.
For example, setting $\lambda$ to a value smaller than $\frac{\alpha}{1+\alpha}$, the value corresponding to the original BKY06 procedure [2], ensures that the associated BR08(λ) procedure is asymptotically more powerful than the associated BKY06(λ) procedure.

Concluding remarks
This paper demonstrates the power and flexibility of the formalism of threshold functions, which makes it possible to derive the asymptotic properties of well known FDR controlling procedures together with their associated regularity conditions, and to identify and characterize novel connections between one-stage and two-stage adaptive procedures. These results are summarized in Table 1. Recall that the threshold function associated with the level function $A$ and rejection curve $r_\alpha = r(\alpha, \cdot)$ is defined by
$$T(F) = U(F, A(F)) = \sup\{u \in [0,1],\ F(u) \ge r(A(F), u)\}.$$
By definition, the level function $A$ equals $\alpha$ for one-stage procedures, and the rejection curve of Simes' line-based procedures is $r_\alpha : u \mapsto u/\alpha$.
Regularity conditions. For one-stage adaptive procedures FDR08 and BR08(λ), the uniqueness Condition C.3 has to be assumed (cf. Table 1): as the rejection curve is not linear, the interior right crossing point is not necessarily unique; in practice, the uniqueness condition holds except in pathological situations. For Simes' line-based procedures BH95, Sto02(λ) and BKY06(λ), existence Condition C.2 holds provided that the slope of the distribution function at the origin exceeds a certain threshold (that is, provided there is no criticality phenomenon). For one-stage adaptive procedures, it is also required that the rejection curve r_α ends below the distribution function G, which corresponds to Condition C.7 for procedure FDR08 and Condition C.9 for procedure BR08(λ).
The criticality phenomenon studied by [5] is intrinsic to the multiple testing problem, and not specific to a given procedure, as the minimum attainable pFDR level β⋆ = inf_{t>0} pFDR(t) depends solely on the parameters of the model [5]. When β⋆ = 0, say for the Gaussian location problem, there is no criticality phenomenon for any procedure: α⋆ = 0, and all existence conditions concerning the behavior of the distribution function G close to 0 are fulfilled for any procedure and any target FDR level α. When β⋆ > 0, say for the Laplace location problem (Figure 2, page 1076), there is a criticality phenomenon for every procedure; however, the critical value, that is, the minimum target FDR level for which existence Condition C.2 holds, may depend on the procedure, as illustrated by the existence conditions in Table 1.
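This contrast between the two location models can be checked numerically. In the sketch below (model parameters are ours, one-sided p-values assumed), pFDR(t) vanishes as t → 0 in the Gaussian case, while in the Laplace case it stays above π_0/(π_0 + (1 − π_0)e^θ) > 0:

```python
import numpy as np
from scipy.stats import norm, laplace

pi0, theta = 0.8, 2.0  # illustrative: 80% true nulls, location shift theta

def pfdr(t, null, shift):
    """Asymptotic pFDR(t) = pi0 * t / G(t) for one-sided p-values
    in a location model with the given null distribution."""
    G1 = null.sf(null.isf(t) - shift)          # alternative p-value cdf
    return pi0 * t / (pi0 * t + (1 - pi0) * G1)

t = np.logspace(-12, -1, 100)
pfdr_gauss = pfdr(t, norm, theta)      # -> 0 as t -> 0: beta* = 0
pfdr_laplace = pfdr(t, laplace, theta)  # bounded below by pi0/(pi0+(1-pi0)*e**theta)
```

In the Laplace model the p-value density ratio between alternative and null is bounded by e^θ, which is what pins pFDR away from 0; in the Gaussian model the ratio is unbounded near 0.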
Power comparisons. All procedures are asymptotically conservative, and therefore yield an asymptotic FDR below the target level. Procedures FDR08 and Sto02 (and thus STS04) are always more powerful than procedure BH95, but this is not always the case for procedures BR08(λ) (section 4.2) and BKY06(λ) (section 4.3).
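The comparison with BH95 can be seen directly in code: since Storey's estimator (capped at 1, a common implementation convenience we assume here) satisfies π̂_0(λ) ≤ 1, the plug-in level α/π̂_0(λ) is at least α, so Sto02(λ) rejects at least as much as BH95. A small simulation sketch (names and parameters ours):

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha, lam = 2000, 0.05, 0.5
# illustrative mixture: uniform null p-values plus small alternative p-values
p = np.concatenate([rng.uniform(size=1600), rng.beta(0.25, 5.0, size=400)])

def bh_rejections(p, level):
    """Number of rejections of the step-up BH95 procedure at the given level."""
    ps = np.sort(p)
    below = ps <= level * np.arange(1, len(p) + 1) / len(p)
    return int(np.max(np.nonzero(below)[0]) + 1) if below.any() else 0

# Sto02(lam): plug-in BH at level alpha / pi0_hat, with Storey's estimator
# pi0_hat(lam) = #{p_i > lam} / (m (1 - lam)), capped at 1.
pi0_hat = min(1.0, float(np.mean(p > lam)) / (1 - lam))
r_bh = bh_rejections(p, alpha)
r_sto = bh_rejections(p, alpha / pi0_hat)
```

Since the number of step-up rejections is non-decreasing in the level, r_sto ≥ r_bh holds deterministically, not just on average.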
For one-stage adaptive procedures, for any λ ∈ (0, 1) such that the regularity conditions for procedures FDR08 and BR08(λ) hold, FDR08 is asymptotically more powerful than BR08(λ). Indeed, Condition C.9 ensures that the asymptotic thresholds of both procedures are less than λ. As the rejection curve f_α of procedure FDR08 is smaller than the rejection curve b_α^λ of procedure BR08(λ) on [0, λ], the asymptotic threshold of procedure FDR08 is greater than that of procedure BR08(λ). However, it should be noted that procedure BR08(λ) does control FDR for a finite number of tested hypotheses, whereas procedure FDR08 does not.
For two-stage adaptive procedures, for any λ ∈ (0, 1) such that the regularity conditions for procedures Sto02(λ) and BKY06(λ) hold, Sto02(λ) (and thus STS04) is asymptotically more powerful than BKY06(λ), as demonstrated by the corresponding asymptotic FDR levels in Table 1: as u_λ ≤ λ implies G(u_λ) ≤ G(λ), the asymptotic FDR level of procedure BKY06(λ) is at most that of procedure Sto02(λ). This suggests that procedure STS04(λ) is preferable to procedure BKY06(λ) in practice. This recommendation should be balanced against the choice of λ and the desired robustness to dependence between null hypotheses. Based on a simulation study, procedure Sto02(α) was recently reported to be much more robust to positive dependence between null hypotheses than procedure Sto02(1/2) [4], which is still a standard choice in practical implementations, such as the SAM (Significance Analysis of Microarrays) software [24].
Towards optimality. This comparison raises the question of whether the formalism of threshold functions can be used to derive procedures more powerful than those studied here. One possible approach consists of improving the estimation of π_0 to build a procedure closer to the Oracle BH95 procedure, as discussed in [11]. However, consistent estimators of π_0 converge more slowly than 1/√m, resulting in convergence rates slower than 1/√m for the associated FDP. This may be illustrated by the influence of λ on procedure Sto02(λ): the larger λ, the smaller the bias E[π̂_0(λ)] − π_0, and the larger the variance of π̂_0(λ). The question of how to choose λ as a function of the number of hypotheses tested and the assumed regularity of G is discussed in another work [14].
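The bias–variance trade-off in λ can be illustrated by simulation. The sketch below (model and parameter choices ours) compares Storey's estimator π̂_0(λ) at a small and a large value of λ:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
pi0, mu, m, reps = 0.8, 2.0, 2000, 400
m0 = int(pi0 * m)  # number of true null hypotheses

est = {lam: [] for lam in (0.2, 0.8)}
for _ in range(reps):
    # one-sided p-values: uniform under the null, norm.sf(X) with X ~ N(mu, 1)
    # under the alternative
    p = np.concatenate([rng.uniform(size=m0),
                        norm.sf(rng.normal(mu, 1.0, size=m - m0))])
    for lam in est:
        est[lam].append(np.mean(p > lam) / (1 - lam))  # Storey's estimator

bias = {lam: float(np.mean(v)) - pi0 for lam, v in est.items()}
var = {lam: float(np.var(v)) for lam, v in est.items()}
```

Larger λ reduces the contamination of the tail (p > λ) by alternative p-values, hence the smaller bias, but it also leaves fewer p-values above λ, hence the larger variance.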
Another possibility would be to consider procedures more general than those studied in this paper: the Oracle BH95 procedure has been shown to yield the lowest false non-discovery rate (FNR) among threshold procedures controlling FDR at level α [10]. The question of optimality in a broader family of testing procedures has recently been raised [25]: Z-score-based threshold procedures may outperform p-value-based threshold procedures, as they make it possible to choose different significance cutoffs for positive and negative Z-scores. This suggests extending our framework to Z-score-based procedures.
Confidence intervals. An interesting practical application of this work concerns the derivation of asymptotic confidence intervals for the FDP of a given procedure. Our results give explicit asymptotic distributions for the attained FDP, but this issue is not straightforward, because these distributions depend on unknown quantities, including the proportion π_0, the asymptotic threshold τ⋆, and the distribution function G and its associated density g. These quantities should, in turn, be estimated. Bootstrapping techniques could be used for this purpose; we leave this question for further research.
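As a starting point, here is a percentile-bootstrap sketch for a simple plug-in FDP estimate π̂_0(λ)·t/Ĝ_m(t) at a fixed rejection threshold t (the estimator, the threshold, and all names are our illustrative choices; whether such a scheme is consistent is precisely the question left open above):

```python
import numpy as np

rng = np.random.default_rng(3)
m, lam, t0 = 2000, 0.5, 0.01
# illustrative mixture: uniform null p-values plus small alternative p-values
p = np.concatenate([rng.uniform(size=1600), rng.beta(0.25, 5.0, size=400)])

def fdp_hat(p, t=t0):
    """Plug-in estimate pi0_hat * t / G_m(t) of the FDP at threshold t."""
    pi0_hat = min(1.0, float(np.mean(p > lam)) / (1 - lam))
    g = max(float(np.mean(p <= t)), 1.0 / len(p))  # guard against G_m(t) = 0
    return min(1.0, pi0_hat * t / g)

point = fdp_hat(p)
# percentile bootstrap: resample p-values and recompute the estimate
boot = [fdp_hat(rng.choice(p, size=m, replace=True)) for _ in range(200)]
lo, hi = np.percentile(boot, [2.5, 97.5])
```

This naive scheme resamples the p-values as if they were i.i.d.; accounting for the two-component mixture structure, or for the randomness of a data-driven threshold T(Ĝ_m), would require a more careful resampling design.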
Extension to other dependence settings. We have derived the asymptotic properties of several multiple testing procedures, and the associated regularity conditions, in the situation in which the p-values are independent. However, our formalism makes it possible to deal with any dependence setting in which the vector (Ĝ_{0,m}, Ĝ_{1,m}) of empirical distribution functions of the p-values under the null and alternative hypotheses satisfies Donsker's invariance principle. For example, the form of the asymptotic distributions of the threshold T(Ĝ_m) and the associated FDP would remain the same in the conditional dependence model recently proposed by Wu [28].

Asymptotic FDP: general threshold functions
In this section, we provide proofs for the results of section 3.

Proof of Theorem 3.2
The following lemma will be used in several subsequent proofs.
Lemma 7.1. Let (H_t)_{t>0} be a family of elements of D[0, 1] that converges to a continuous function H as t → 0, and let (u_t)_{t>0} ⊂ [0, 1] be such that lim_{t→0} u_t = u. Then H_t(u_t) → H(u) as t → 0.
Proof of Lemma 7.1. For any t > 0, |H_t(u_t) − H(u)| ≤ ‖H_t − H‖_∞ + |H(u_t) − H(u)|. The first term goes to 0 as t → 0 by the convergence of H_t to H on D[0, 1], and the second term also tends to 0 by the continuity of H, because lim_{t→0} u_t = u.
where τ⋆ = T(G) and τ⋆_t denotes T(G + tH_t). By the Hadamard differentiability of T at G tangentially to C[0, 1], and since H = π_0 H_0 + (1 − π_0)H_1 is continuous at τ⋆, we obtain the desired expansion of τ⋆_t around τ⋆; combining it with Lemma 7.1 concludes the proof.

Proof of Proposition 3.6
Lemma 7.4 states the asymptotic equivalence between a multiple testing procedure defined as a threshold function and a slight modification of this procedure.
Proof of Lemma 7.4. The proof rests on the idea that, as ε_m = o(1/√m) and Ĝ_m converges to G at rate 1/√m, a modification of T of order ε_m does not change the asymptotic distribution of the associated FDP, because T is Hadamard-differentiable. For the sake of notational simplicity, we only prove the corresponding statement for the thresholds. Indeed, as the associated FDP is a Hadamard-differentiable function of the empirical distribution functions Ĝ_{0,m} and Ĝ_{1,m} under the null and alternative hypotheses, the arguments developed below can be transposed (with much more cumbersome notation) to the FDP itself.
According to Donsker's invariance principle (Theorem 3.1), Z_m converges in distribution on [0, 1] to a Gaussian process with continuous sample paths. According to Condition C.1, T is Hadamard-differentiable at G tangentially to C[0, 1]. Therefore, φ_m(Z_m) converges to 0 for any sequence (Z_m) of D[0, 1] that converges to Z ∈ C[0, 1], so that, according to the Extended Continuous Mapping Theorem [27, Theorem 18.11], φ_m(Z_m) converges in distribution (hence also in probability) to 0.
As Ĝ_{0,m} and Ĝ_m are non-decreasing functions, we have T^{ε_m} ≤ T_m ≤ T for any m ∈ ℕ. Lemma 7.4 thus ensures that √m(τ̂_m − τ̂) and √m(τ̂^{ε_m} − τ̂) converge to 0 in probability. Therefore, as Ĝ_m(τ̂) and Ĝ_m(τ̂^{ε_m}) converge in probability to G(T(G)) as m → +∞, the corresponding difference also converges in probability to 0 (according to Lemma 7.4).

Asymptotic FDP: specific threshold functions
We now apply the results of section 7.1 to threshold functions of the form T(F) = U(F, A(F)). In this section, we write r : (α, u) ↦ r_α(u) whenever the dependence of r_α on α matters. We begin by giving sufficient conditions on the regularity of U and A under which Condition C.1 holds (section 7.2.1, Proposition 7.5). We then provide sufficient conditions for U to be regular enough to satisfy hypotheses (i) to (iii) of Proposition 7.5 (section 7.2.2). Finally, we derive the asymptotic distribution of the corresponding False Discovery Proportion (section 7.2.3). Proposition 7.5 rests on the following hypotheses: (i) U is Hadamard-differentiable with respect to its first variable at (G, α) tangentially to C[0, 1], for any α in a neighborhood of A(G); its derivative will be denoted by ∇_F U_{G,α}; (ii) ∇_F U_{G,·} is continuous at A(G); (iii) U is differentiable with respect to its second variable; its derivative will be denoted by ∇_α U_{(G,A(G))}; (iv) A is Hadamard-differentiable at G tangentially to C[0, 1]; its derivative will be denoted by Ȧ_G.
As A is continuous at G (by (iv)), A(G + tH_t) lies in a neighborhood of A(G) for small t > 0, and U is Hadamard-differentiable with respect to its first variable at (G, A(G + tH_t)) by (i). The result then follows from the continuity of ∇_F U_{G,·} at A(G) (hypothesis (ii)), combined with (iii) and (iv), which concludes the proof.

Regularity of U
The crucial point for proving the desired regularity of U is its Hadamard differentiability with respect to its first variable at (G, α), tangentially to C[0, 1], for α in a neighborhood of A(G). Lemma 7.6 is a straightforward analytical translation of Conditions C.2 and C.3.
Lemma 7.6. Under Conditions C.2 and C.3, the unique interior right crossing point τ⋆ between r_α and G is positive.
We begin by proving the continuity of U at (G, α) for α in a neighborhood of A(G). We then (Proposition 7.11) provide sufficient conditions for hypotheses (i), (ii), and (iii) of Proposition 7.5 to hold. Lemma 7.7. For any F ∈ D[0, 1] and α ∈ [0, 1] such that r_α is continuous, one of the following two assertions holds: (i) F(U(F, α)) = r_α(U(F, α)); (ii) F(U(F, α)⁻) ≥ r_α(U(F, α)) > F(U(F, α)). Proof of Lemma 7.7. According to the definition of U(F, α), F(u) ≤ r_α(u) for any u > U(F, α). As F is right-continuous and r_α is continuous, we have F(U(F, α)) ≤ r_α(U(F, α)). Therefore, either (i) holds, or F(U(F, α)) < r_α(U(F, α)). In the second case, according to the definition of U(F, α), there is a non-decreasing sequence (u_n) converging to U(F, α) such that F(u_n) ≥ r_α(u_n). As r_α is continuous and F has left-hand limits, we obtain F(U(F, α)⁻) ≥ r_α(U(F, α)), which proves (ii).
We thus work with this truncated version for the remainder of the proof. By definition, the rejection curve of procedure STS04 is larger than that of procedure Sto02. Therefore, T_m^{STS04(λ)}(F) ≤ T^{Sto02(λ)}(F) for any F ∈ D[0, 1]. By the same argument, we also have T^{Sto02(λ)}(F − 1/m) ≤ T_m^{STS04(λ)}(F) for any F ∈ D[0, 1]. As Condition C.11 is assumed to hold, Condition C.1 holds for T according to Proposition 7.15, and the result follows from Proposition 3.6.