Detection boundary in sparse regression

We study the problem of detecting a p-dimensional sparse vector of parameters in the linear regression model with Gaussian noise. We establish the detection boundary, i.e., how small the signal can be while successful detection is still possible.


Introduction
We consider the linear regression model with random design:

Y_i = θ_1 X_i1 + … + θ_p X_ip + ξ_i,  i = 1, …, n,  (1.1)

where θ_j ∈ IR are unknown coefficients, ξ_i are i.i.d. N(0, σ²) random variables, and the X_ij are identically distributed random variables such that, for any fixed j, the variables (X_ij, 1 ≤ i ≤ n) are independent with E X_ij = 0 and E X_ij² = 1. We study separately the settings with known σ > 0 (then assuming σ = 1 without loss of generality) and with unknown σ > 0. We also assume that the X_ij, 1 ≤ j ≤ p, 1 ≤ i ≤ n, are independent of the ξ_i, 1 ≤ i ≤ n.
Based on the observations Z = (X, Y), where X = (X_ij, 1 ≤ j ≤ p, 1 ≤ i ≤ n) and Y = (Y_i, 1 ≤ i ≤ n), we consider the problem of detecting whether the signal θ = (θ_1, …, θ_p) is zero (i.e., we observe pure noise) or θ is some sparse signal that is sufficiently well separated from 0. Specifically, we state this as the problem of testing the hypothesis H_0 : θ = 0 against the alternative H_{k,r} : θ ∈ Θ_k(r) = {θ ∈ IR^p_k : ‖θ‖ ≥ r}, where IR^p_k denotes the ℓ_0 ball of radius k in IR^p, ‖·‖ is the Euclidean norm, and r > 0 is a separation constant.
The smaller r is, the harder it is to detect the signal. The question that we study here is: what is the detection boundary, i.e., what is the smallest separation constant r such that successful detection is still possible? The problem is formalized in an asymptotic minimax sense, cf. Section 2 below. This question is closely related to previous work by several authors on detection and classification boundaries for the Gaussian sequence model [4,8,9,10,11,12,13,15,16,17,18,19,20,21,22,23,24]. These papers considered model (1.1) with p = n and X_ij = δ_ij, where δ_ij is the Kronecker delta, or replications of such a model (in the classification setting). The main message of the present work is that, under some conditions, a detection boundary phenomenon similar to the one discussed in those papers extends to linear regression. Our results cover the high-dimensional setting p ≫ n.
We now give a brief summary of our findings under the simplifying assumption that all the regressors X_ij are i.i.d. standard Gaussian. We consider the asymptotic setting where p → ∞, n → ∞ and k = p^{1−β} for some β ∈ (0, 1). The results are different for moderately sparse alternatives (0 < β < 1/2) and highly sparse alternatives (1/2 < β < 1). We show that for moderately sparse alternatives the detection boundary is of the order of magnitude given in (1.2), whereas for highly sparse alternatives (1/2 < β < 1) it is of the order given in (1.3). This determines the optimal rate of the detection boundary for the whole range of values (p, n). Furthermore, for highly sparse alternatives, under the additional assumption (1.4), we obtain the sharp detection boundary, i.e., not only the rate but also the exact constant. This sharp boundary has the form (1.5), with the constant given in (1.6). The function ϕ(·) here is the same as in the above-mentioned detection and classification problems, as first introduced in [15]. We also provide optimal testing procedures. In particular, the sharp boundary (1.5)-(1.6) is attained by the Higher Criticism statistic. One application of this result is related to the transmission of signals under compressed sensing, cf. [7,5]. Assume that a sparse high-dimensional signal θ is coded using compressed sensing with i.i.d. Gaussian X_ij and then transmitted through a noisy channel. Observing the noisy outputs Y_i, we would like to detect whether a signal was indeed transmitted. For example, this is of interest if several signals appear in consecutive time slots but some slots contain no signal; the aim is then to detect the informative slots. Our detection boundary (1.5) specifies the minimal energy of the signal sufficient for detectability. We note that ϕ(·) < √2, so that successful detection is possible for rather weak signals whose energy is below the threshold 2k log(p)/n. This can be compared with the asymptotically optimal recovery of the sparsity pattern (RSP) by the Lasso in the same Gaussian model as ours [29,30]. Observe that the RSP property is stronger than detection (i.e., it implies correct detection), but [29] defines the alternative by {θ ∈ IR^p_k : |θ_j| ≥ c√(log(p)/n), ∀j} for some constant c > 2, which is better separated from the null than our alternative Θ_k(r). Thresholds that are larger in order of magnitude are required if one would like to perform detection based on estimation of the values of the coefficients in the ℓ_2 norm [3,5].
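For orientation, the sharp boundary in the highly sparse regime and the function ϕ can be recalled in the standard notation of the sparse detection literature; the following is a reminder of the usual form (stated here as an assumed form consistent with the remarks above, not a verbatim copy of the displays (1.5)-(1.6)):

```latex
% usual form of the sharp detection boundary in the highly sparse case,
% stated as a reminder (assumed form, consistent with phi(.) < sqrt(2)
% and the energy threshold 2 k log(p)/n mentioned in the text)
r_{n,p}^2 \sim \varphi^2(\beta)\,\frac{k\log p}{n},
\qquad
\varphi(\beta)=
\begin{cases}
\sqrt{2\beta-1}, & 1/2<\beta\le 3/4,\\
\sqrt{2}\,\bigl(1-\sqrt{1-\beta}\bigr), & 3/4<\beta<1 .
\end{cases}
```

Note that ϕ(β) increases to √2 as β → 1, in agreement with the remark that ϕ(·) < √2.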
In many applications, the variance of the noise ξ is unknown. Does the problem of detection become more difficult in this case? In order to answer this question, we investigate the detection boundaries in the unknown variance setting. The related papers [27,28] develop minimax bounds for detection in model (1.1) under assumptions different from ours and with unknown variance. However, [27] does not provide a sharp boundary. Here, we prove that for β ∈ (1/2, 1) and k log(p) ≪ √n, the detection boundaries are the same for known and unknown variance. In contrast, when k log(p) ≫ √n, the detection boundary is much larger in the case of unknown variance than for known variance. We also provide an optimal testing procedure for unknown variance.
After obtaining our results, we became aware of the interesting parallel unpublished work of Arias-Castro et al. [2]. There the authors derive the detection boundary in model (1.1) with known noise variance, for both fixed and random design. Their approach, based on the analysis of the Higher Criticism, shares some similarities with our work. When the variables X_ij are i.i.d. standard normal and the variance is known, we can directly compare our results with [2]. In [2], the detection boundaries analogous to (1.2) and (1.3) do not contain the minimum with the n^{−1/4} term, because they are proved in a smaller range of values (p, n) where this term disappears. In particular, the conditions in [2] exclude the high-dimensional case p ≫ n. We also note that, due to the constraints on the classes of matrices X, [2] obtains the sharp boundary (1.5)-(1.6) under the condition p^{1−β}(log(p))² = o(√n), which is more restrictive than our condition (1.4).
The other difference is that [2] does not treat the case of unknown variance of the noise.
Below we will use the following notation. We write Z = (X, Y), where X = (X_ij, 1 ≤ j ≤ p, 1 ≤ i ≤ n) and Y = (Y_i, 1 ≤ i ≤ n) are the observations satisfying (1.1). Let P_θ be the probability measure corresponding to the observations Z, and P_{θ,i} the one corresponding to the observations Z_i = (X_i1, …, X_ip, Y_i) for fixed i = 1, …, n; let P_X and P_{X,i} be the probability measures corresponding to the observations X and X^{(i)} = (X_ij, 1 ≤ j ≤ p), respectively. We denote by P^X_θ and P^X_{θ,i} the conditional distributions of Y given X and of Y_i given X^{(i)}, respectively. The corresponding expectations are denoted by E^X_θ and E^X_{θ,i}. Clearly, E_θ = E_X E^X_θ and E_{θ,i} = E_{X,i} E^X_{θ,i}, where E_X and E_{X,i} denote the expectations with respect to P_X and P_{X,i}. We denote by X_j ∈ IR^n the j-th column of the matrix X = (X_ij).

Detection problem
For θ ∈ IR^p, we denote by M(θ) = Σ_{j=1}^p 1I{θ_j ≠ 0} the number of non-zero components of θ, where 1I{A} is the indicator function. As above, let IR^p_k, 1 ≤ k ≤ p, denote the ℓ_0 ball of radius k in IR^p, i.e., the subset of IR^p consisting of vectors θ with M(θ) ≤ k; equivalently, θ ∈ IR^p_k has no more than k nonzero coordinates. In particular, IR^p_p = IR^p. Recall the notation Θ_k(r) = {θ ∈ IR^p_k : ‖θ‖ ≥ r}. We consider the problem of testing the hypothesis H_0 : θ = 0 against the alternative H_{k,r} : θ ∈ Θ_k(r). In this paper we study the asymptotic setting where p → ∞, n → ∞ and k = p^{1−β}. The coefficient β ∈ [0, 1] is called the sparsity index. We assume in this section that σ² is known. Modifications for the case of unknown variance are discussed in Section 4.2.1.
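The asymptotic minimax formalization referred to above can be summarized as follows (a sketch of the standard definitions in the paper's notation; the exact displays of Section 2 are not reproduced here):

```latex
% standard minimax testing criterion (assumed form): type I error, maximal
% type II error over the alternative, and minimax total error probability
\alpha(\psi)=\mathbf{E}_{0}\,\psi,\qquad
\beta\bigl(\psi,\Theta_k(r)\bigr)=\sup_{\theta\in\Theta_k(r)}\mathbf{E}_{\theta}(1-\psi),\qquad
\gamma_{n,p,k}(r)=\inf_{\psi}\bigl[\alpha(\psi)+\beta\bigl(\psi,\Theta_k(r)\bigr)\bigr].
```

Detection is then possible at separation r if γ_{n,p,k}(r) → 0 for some sequence of tests, and impossible if γ_{n,p,k}(r) → 1.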

Assumptions on X
We will use at different instances some of the following conditions on the random variables X_ij.

A1. The random variables

Let U_j, 1 ≤ j ≤ p, be random variables such that we have the equality in distribution L(U_j, U_l) = L(X_ij, X_il), 1 ≤ j < l ≤ p. We will need the following technical assumptions.
4 Main results

Detection boundary under known variance
For this case we suppose σ = 1 without loss of generality.
Remark 4.1 This theorem can be extended to a non-random design matrix X. Inspection of the proof shows that, instead of B1, we only need the following assumption: for some B_{n,p} tending to ∞ slowly enough, (4.4) holds. Indeed, B1 is used in the proofs only to ensure that (4.4) holds true with P_X-probability tending to 1 (this is deduced from Assumption B1 and (3.6)). Also, instead of B2 and B3, we can assume that there exists η_{n,p} → 0 such that (4.5) holds. Under B2 and B3, relations (4.5) hold with P_X-probability tending to 1, see Corollary 7.1.
The result of the theorem remains valid for non-random matrices X satisfying (4.4) and (4.5).

Upper bounds
In order to construct a test procedure that achieves the detection boundary, we combine several tests.
First, we study the widest, non-sparse case k = p, i.e., we consider Θ_p(r) = {θ ∈ IR^p : ‖θ‖ ≥ r}. Consider the statistic

t_0 = (Σ_{i=1}^n Y_i² − n) / √(2n),

which is the H_0-centered and normalized version of the classical χ²_n statistic Σ_{i=1}^n Y_i². The corresponding tests ψ^0_α and ψ^0 are of the form

ψ^0_α = 1I{t_0 > u_α},  ψ^0 = 1I{t_0 > T_np},

where α ∈ (0, 1), u_α is the (1 − α)-quantile of the standard Gaussian distribution, and T_np is any sequence satisfying T_np → ∞.
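As an illustration, here is a minimal Python sketch of this global test, assuming σ = 1 and taking t_0 in the centered and normalized form described above (the paper's displayed definition of t_0 and of the rejection rule is not reproduced verbatim):

```python
import numpy as np
from scipy.stats import norm

def chi2_global_test(Y, alpha=0.05):
    """Sketch of the non-sparse test psi^0_alpha (known variance, sigma = 1).

    t_0 centers and normalizes sum_i Y_i^2, which is chi^2_n under H_0
    (mean n, variance 2n); the test rejects when t_0 exceeds the
    (1 - alpha)-quantile u_alpha of the standard Gaussian distribution.
    """
    n = Y.shape[0]
    t0 = (np.sum(Y ** 2) - n) / np.sqrt(2.0 * n)
    u_alpha = norm.ppf(1.0 - alpha)
    return t0, bool(t0 > u_alpha)

# under H_0 (pure noise) the rejection probability is close to alpha for large n
rng = np.random.default_rng(1)
print(chi2_global_test(rng.standard_normal(500)))
```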
We now introduce a test ψ^1_α that achieves the second boundary (4.2). Consider a kernel K and the U-statistic t_1 based on this kernel. Note that the U-statistic t_1 can be viewed as the H_0-centered and normalized version of the statistic χ²_p = n Σ_{j=1}^p θ̂_j² based on the estimators θ̂_j = n^{−1} Σ_{i=1}^n Y_i X_ij: indeed, up to a normalization, the first sum is the U-statistic t_1, and removing the second sum corresponds to the centering. Given α ∈ (0, 1), we consider the test ψ^1_α = 1I{t_1 > u_α}. We can omit the condition p = o(n²), since the test ψ^0_{α/2} achieves the optimal rate for p ≥ n. Combining this bound with Theorem 4.1, we conclude that the test ψ*_α = max(ψ^0_{α/2}, ψ^1_{α/2}) simultaneously achieves the optimal detection rate for all β ∈ (0, 1/2]. We now turn to testing in the highly sparse case β ∈ (1/2, 1). Here we use a version of the "Higher Criticism" tests (HC-tests, cf. [8]). Set y_j = (X_j, Y)/‖Y‖, 1 ≤ j ≤ p. Let q_i = P(|N(0, 1)| > |y_i|) be the p-value of the i-th component and let q_(i) denote these quantities sorted in increasing order. The HC-statistic t_HC is the supremum over t ≤ 1/2 of the normalized uniform empirical process of the q_i (see Section 6.3), and the corresponding test ψ_HC simultaneously achieves the optimal detection rate for all β ∈ (1/2, 1).
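The following Python sketch computes an HC statistic of the Donoho-Jin type from (X, Y), using the statistics y_j = (X_j, Y)/‖Y‖ and their two-sided Gaussian p-values as above; the exact normalization and calibration of ψ_HC in (4.7) are not reproduced here, so the sketch only returns the statistic for comparison between the null and an alternative:

```python
import numpy as np
from scipy.stats import norm

def hc_statistic(X, Y):
    """Higher Criticism statistic (Donoho-Jin form) for the regression detection problem.

    y_j = <X_j, Y> / ||Y|| are approximately i.i.d. N(0,1) under H_0, so their
    two-sided p-values q_j are approximately uniform; the statistic is the
    supremum over q_(i) <= 1/2 of the normalized uniform empirical process.
    """
    n, p = X.shape
    y = X.T @ Y / np.linalg.norm(Y)
    q = 2.0 * norm.sf(np.abs(y))                 # two-sided p-values
    q_sorted = np.sort(q)
    i = np.arange(1, p + 1)
    keep = q_sorted <= 0.5                       # restriction t <= 1/2
    qk = np.clip(q_sorted[keep], 1e-300, 1.0)    # numerical guard
    hc = np.sqrt(p) * (i[keep] / p - qk) / np.sqrt(qk * (1.0 - qk))
    return hc.max() if hc.size else -np.inf

# toy comparison: pure noise vs a sparse alternative
rng = np.random.default_rng(0)
n, p, k = 200, 1000, 5
X = rng.standard_normal((n, p))
theta = np.zeros(p); theta[:k] = 0.4
print(hc_statistic(X, rng.standard_normal(n)))               # H_0: modest value
print(hc_statistic(X, X @ theta + rng.standard_normal(n)))   # alternative: much larger
```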
In conclusion, under Assumption A3, the test max(ψ^0_{α/2}, ψ^1_{α/2}, ψ_HC) simultaneously achieves the optimal detection rate for all β ∈ (0, 1), and the detection boundary is of the order of magnitude given in (4.8). Furthermore, we establish the sharp detection boundary (i.e., with the exact asymptotic constant) of the form announced in (1.5)-(1.6).

Detection boundary under unknown variance

4.2.1 Detection problem
Since the variance of the noise is now assumed to be unknown, the tests ψ under study should not require the knowledge of σ². The type I error probability is now taken uniformly over σ > 0, and the type II error probability over an alternative Θ ⊂ IR^p is defined accordingly. Similarly to the setting with known variance, we consider the sum of the two errors. Finally, the minimax total error probability in the hypothesis testing problem with unknown variance is denoted by γ^un_{n,p,k}(r).
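In the notation used here, these quantities take the following form (an assumed form, consistent with the description above and with the remark after Theorem 4.5 that the alternatives are of the form σθ):

```latex
% assumed form of the error criteria with unknown noise variance
\alpha_{un}(\psi)=\sup_{\sigma>0}\mathbf{E}_{0,\sigma}\,\psi,\qquad
\beta_{un}(\psi,\Theta)=\sup_{\sigma>0}\;\sup_{\theta\in\Theta}\mathbf{E}_{\sigma\theta,\sigma}(1-\psi),\qquad
\gamma^{un}_{n,p,k}(r)=\inf_{\psi}\bigl[\alpha_{un}(\psi)+\beta_{un}\bigl(\psi,\Theta_k(r)\bigr)\bigr].
```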

Lower bounds
The detection boundary stated in Theorem 4.5 does not depend on the unknown σ². This is due to the definition (4.9) of the type II error probability β_un(ψ, Θ_k(r)), which considers alternatives of the form σθ with θ ∈ Θ_k(r).

Upper bounds
The HC-test ψ_HC defined in (4.7) still achieves the optimal detection rate when the variance is unknown, as shown by the next proposition.
In conclusion, in the setting with unknown variance we prove a sharp detection boundary (i.e., with the exact asymptotic constant) for a larger zone of values (p, n) than in the case of known variance. However, this extension corresponds to values (p, n) for which the rate itself is strictly slower than under known variance. Indeed, if the variance σ² is known then, as shown in Section 4.1, the detection boundary is of the order (4.8). Thus, there is an asymptotic difference between the orders of magnitude of the two detection boundaries when k log(p) ≫ √n.
5 Proofs of the lower bounds

The prior
Let us consider a random vector θ = (θ_j) with coordinates θ_j = b ε_j, where the ε_j are i.i.d. random variables with values in {0, +1, −1} whose common law is specified below. This introduces a prior probability measure π_j on θ_j and the product prior measure π = ⊗_{j=1}^p π_j on θ. The corresponding expectation and variance operators will be denoted by E_π and Var_π.
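In the usual form of this construction (an assumed specification, with h ∈ (0, 1) of order k/p the probability of a nonzero coordinate and b > 0 the common amplitude used in Section 5.2), the law of ε_j reads:

```latex
% assumed form of the prior pi_j: h is of order k/p, b is the signal amplitude
\pi_j(\theta_j=b)=\pi_j(\theta_j=-b)=\frac{h}{2},
\qquad
\pi_j(\theta_j=0)=1-h .
```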
Proof. Observe that both ‖θ‖² = b² Σ_{j=1}^p ε_j² and M(θ) = Σ_{j=1}^p 1I{ε_j ≠ 0} are sums of i.i.d. bounded random variables under π. Applying the Chebyshev inequality, we get the required concentration and, similarly, π(M(θ) > k) → 0. □

Lemma 5.1 implies that, in order to obtain asymptotic lower bounds for the minimax problem, we only have to study the Bayesian problem corresponding to the prior π, see for instance [18], Proposition 2.9. Consider the mixture P_π = ∫ P_θ π(dθ) and the likelihood ratio L_π(Z) = dP_π/dP_0 (Z). In order to prove the lower bounds, we only need to check that

L_π(Z) → 1 in P_0-probability.  (5.1)

For β > 1/2, we take c ∈ (0, 1) such that x_c = x/c < ϕ(β), which is possible since x < ϕ(β). We will use the short notation x and a for x_c and a_c = b√n = a/c. We set the thresholds T = T_j so that h e^{−a_j²/2 + a_j T} = 1.
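For completeness, the standard reduction behind (5.1) can be sketched as follows (the usual argument, cf. [18], Proposition 2.9, stated here in outline): since E_0 L_π = 1 and L_π ≥ 0, convergence of L_π to 1 in P_0-probability implies E_0|L_π − 1| → 0, and then, for every test ψ,

```latex
% sketch: lower bound on the total error of an arbitrary test psi
\mathbf{E}_0\psi+\sup_{\theta\in\Theta_k(r)}\mathbf{E}_\theta(1-\psi)
\;\ge\;\mathbf{E}_0\psi+\int\mathbf{E}_\theta(1-\psi)\,\pi(d\theta)-\pi\bigl(\Theta_k(r)^{c}\bigr)
\;=\;1+\mathbf{E}_0\bigl[(L_\pi-1)(1-\psi)\bigr]-\pi\bigl(\Theta_k(r)^{c}\bigr)
\;\ge\;1-\mathbf{E}_0|L_\pi-1|-\pi\bigl(\Theta_k(r)^{c}\bigr),
```

and the last two terms vanish by (5.1) and Lemma 5.1, so the minimax total error tends to 1.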

Study of the likelihood ratio L π
First observe that, by (1.7), the conditional measure P^X_θ corresponds to the observation of the Gaussian vector N(v, I_n), where v = Σ_{j=1}^p θ_j X_j and I_n is the n × n identity matrix; the conditional likelihood ratio is then the corresponding ratio of Gaussian densities. We can write L_π in terms of the random probability measure π_Z = ⊗_{j=1}^p π_{Z,j} on IR^p, where each measure π_{Z,j} is supported on the points {0, b, −b}. The proof of Proposition 5.1 is given in Section 5.3.
Propositions 5.1 and 5.2 imply the required convergence for any δ > 0. Since E_0 L_π = 1 and L_π(Z) ≥ 0, this yields L_π → 1 in P_0-probability, which gives indistinguishability in the problem. Let us consider the random measure π̃_Z = ⊗_{j=1}^p π̃_{Z,j}, where π̃_{Z,j} is supported on the points {0, b, −b}, and observe that the event A^±_j implies q^±_{Z,j} ≤ 1/2, i.e., the measures π̃_{Z,j} are correctly defined. We define the corresponding event A.

Proof. Denote by A^c the complement of the event A. Since y'_j ∼ N(0, 1) under P_0, and since, by Corollary 7.1, a_j = b‖X_j‖ ∼ b√n uniformly in 1 ≤ j ≤ p in P_X-probability, relation (7.1) implies Σ_{j=1}^p Φ(−T_j) = o(1) in P_X-probability. □

We can replace the measure π_Z by π̃_Z in (5.3). This follows from the next lemma.

Lemma 5.3 In P_0-probability, (5.5) holds.

Proof. Applying the equality E_{π̃_Z}(dπ_Z/dπ̃_Z) = 1 and the inequality 1 + x ≤ e^x, we get the required bound. Consequently, we only have to prove that the remaining term vanishes in P_0-probability; since H(Z) ≥ 0, the last relation follows.
By Lemma 5.2, it is sufficient to study these terms under the event A which corresponds to max 1≤j≤p q ± Z,j ≤ 1/2.Under this event, we have h ± Z,j = q ± Z,j /λ j , λ j = 1 + q + Z,j + q − Z,j − h, and direct calculation gives , where By Lemma 5.2, the relation (5.3) follows from π(Σ) → 1, in P 0 -probability.Thus, we only need to check that in P 0 -probability, E πZ ∆ 2 → 0 for ∆ = ∆(X, θ) defined by (5.2).By Markov inequality, the last relation follows from )) → 0, in P X -probability.Let us introduce the events X n,p .Taking a positive family η = η n,p → 0, we set It follows from Corollary 7.1 that, under assumptions B2 or B3 we can take η = η n,p → 0 such that P X (X n,p ) → 1.We have where (5.10) Let us take the expectation E X 0 over Y of each of these expressions.We define the vector Here, E X V refers to the expectation of Y over the Gaussian measure N (V, I n ).We derive that η r η s (X jr , X js ) P j 1 ,...,jm (η), where Let us define Then, P j 1 ,...,jm (η) writes as We have ) η s η r (X js , X jr ) P j 1 ,j 2 ,j 3 (η), (5.13) η s η r (X js , X jr ) P j 1 ,j 2 ,j 3 ,j 4 (η).(5.14)

Evaluation of probabilities
By definition of (z 1 , . . ., z m ) we have where we set, for the Gaussian random vector (z 1 , z 2 ) with Ez k = 0, The control of P j 1 ,...,jm (η) then depends on the sequence x np .CASE 1: x = 0.Under the event X n,p , we have max j a j = o( log(p)) and Tj k / log(p) → ∞.Under the event X np , we have We conclude Let us define  Let us turn to the second term in (5.15).If Tj k ≥ log(p), then If Tj k ≤ log(p), we have Tj k r ks = o(1) under the event X np .By Lemma 7.3 and previous evaluations, we get Finally, we obtain (5.19)

Evaluation of A 2
We have b 2 max 1≤j 1 <j 2 ≤p |(X j 1 , X j 2 )| = o(1) under the event X n,p .Since P j 1 ,j 2 (η) = O(1), we get from (5.12) By Assumption B1, we have It then follows from (5.6) and (5.9) that A 2 is of the order in P X -probability.

It follows that
By Taylor expansion of the exponential function, the expectation over η is of the form, for c sr = b 2 (X js , X jr ), Under the event X n,p , we derive from (5.7) that where we derive from (3.6) Applying Markov's inequality yields Combining these bounds, we obtain , we get A 3 = o P X (1).

Evaluation of A 4
Let us evaluate the item A 4 .Similarly to A 3 , we can write η s η r (X js , X jr ) P j 1 ,j 2 ,j 3 ,j 4 (η) . (5.20) Under the event X n,p we have .
CASE 1: x > 0. By (5.18), we have Applying a Taylor expansion of the exponential term in (5.20) yields where CASE 2: x = 0.By (5.16), P j 1 ,j 2 ,j 3 ,j 4 (η) = 1 − o(p −2 ).Arguing as in Case 1, we get All in all, we obtain that under the event X n,p , where We combine the classical upper bounds, with (3.6) and obtain

Proof of Proposition 5.2
We will prove that there exists a family of events Z_{n,p} such that P_0(Z_{n,p}) → 1 and the required bound holds on Z_{n,p}. We take Z_{n,p} = {(X, Y) : |y'_j| ≤ T_j, 1 ≤ j ≤ p, X ∈ X_{n,p}}, where X_{n,p} was defined in Section 5.3.2. It follows from Lemma 5.2 and Section 5.3.2 that P_0(Z_{n,p}) → 1.
Under the events Z n,p we can replace the quantities (h/2)e d ± j /2 by q ± j = (h/2)e d ± j 1I ±y ′ j <T j , cf.Section 5.3.1.Let us consider Under the event A = A n,p = p j=1 (A + j + A − j ) defined in Section 5.3.1, we have uniformly in 1 ≤ j ≤ p, as h → 0. Consequently, we have Thus, we need to show that A 1 → 0 and that A 2 → 0 in P 0 -probability.It was stated in the proof of Lemma 5.3 that E X 0 A 2 = o P X (1).Markov's inequality then allows to derive that A 2 = o P X (1).In order to prove the first relation, we shall show that E X 0 A 1 → 0 and that Var X 0 A 1 → 0 in P 0 -probability.Observe that Φ(−T j + a j ).By (7.1) and ( 7.2) we have We have Var X 0 A 1 ≤ B + A 2 with B = 1≤j<l≤p ∆j ∆l and ∆j = ∆ j − E X 0 ∆ j .We need to check that, in P X probability, Note that E X 0 ( ∆j ∆l ) = B jl − C jl , where We consider independent random variables η 1 , η 2 taking values −1 and 1 with probabilities 1/2.We write (compare with (5.12)) Here we set We obtain the new decomposition where Let us recall some notations introduced in Section 5.3.4.r jl (η) = η 1 η 2 r jl , Moreover, z j and z l stand for standard Gaussian variables with Cov(z j , z l ) = r jl (η).Then, P j,l (η) is written as +P X 0 (z j < − Tj + bm jl (η), z l < − Tl + bm lj (η)) .
CASE 1: x = 0.The evaluations of the terms V jl in (5.21) are similar to the ones in Section 5.3.4.We get We derive that h 2 1≤j<l≤p V j,l = o(h 2 ).
CASE 2: x > 0. We have (compare with (5.17) and (5.19)) Taking the expectation over η, we get in P X -probability.Therefore Under X n,p we have r 2 jl ∼ n −2 (X j , X l ) 2 .Since E X [(X j , X l ) 2 ] = O(n) for j = l (Assumption B1), we get 1) and since x > 0, we derive that k = o( √ n).Consequently, we have Let us turn to the terms U jl .They are handled as in Section 5.3.2.We have Then, we get where Arguing as for H, we get . The proposition follows. 2

Proof of Theorem 4.5
As in the proof of Theorem 4.1, we consider x = lim sup x_{n,p} and we take c ∈ (0, 1) such that x_c = x/c < ϕ(β). We also define b = x_c √(log(p)/n). We first consider the case where k log(p)/n → 0. We use a different prior π than for Theorem 4.1. Let us denote by M(k, p) the collection of subsets of {1, …, p} of size k. We consider a random vector θ = (θ_j) with coordinates θ_j = b ǫ_j, where ǫ_j ∈ {0, 1}. The set of non-zero coefficients of ǫ is drawn uniformly from M(k, p). This introduces a prior probability measure π on θ.
Consider the mixture P_π = ∫ P_θ π(dθ) and the likelihood ratio L_π(Z) = dP_π/dP_0 (Z). As in the proof of Theorem 4.1, we shall prove that L_π(Z) converges to 1 in P_0-probability. This will imply that γ^un_{n,p,k}[x_c √(k log(p)/n) / √(1 − kb²)] → 1. Since kb² converges to 0, this will complete the proof.
Since E 0 [L π (Z)] = 1, we get the desired result by combining these two lemmas.
Let us turn to the case k log(p)/n → ∞. We consider b > 0 defined by

Lemma 5.6 We have

This lemma implies that, for r = (2β − 1) k log(p)/n → ∞, we have γ^un_{n,p,k}(r) → 1.

□
In the proof of the following lemmas, o(1) stands for a positive quantity which depends only on (k, p, n) and tends to 0 as (n, p) tend to infinity.

Proof of Lemma 5.4
In order to upper bound Let us take the expectation of When S = 0, we have Let us now consider the case S > 0. On the event Ω, we have n,p -restricted isometry of order 2k.Then, we can upper bound the expectation with respect to Y .
For any x < 0, we have Φ(x) ≤ e^{−x²/2}. Hence, Φ(x) ≤ e^{−x_−²/2} for any x ∈ IR, where x_− denotes the negative part of x. It follows that the expectation can be bounded as stated, where S follows a hypergeometric distribution with parameters p, k and k/p. We know from Aldous [1] (p. 173) that S has the same distribution as the random variable E(U|B_p), where U is a binomial random variable with parameters k and k/p and B_p is some suitable σ-algebra. By a convexity argument, we then obtain the required bound. By symmetry, it is sufficient to prove the stated inequality. The value of these expectations depends on i through the property "i ∈ m" or "i ∉ m". Let us assume, for instance, that 1 ∈ m and 2 ∉ m. First, we take the expectation of L_m(Z) conditionally on X_1 and Y; then we take the expectation with respect to Y. Moreover, on Ω_2, arguing as in the proof of Lemma 5.4 and taking the expectation with respect to W_3, we obtain the corresponding bound. As in the proof of Lemma 5.4, we upper bound the term E_0[L²_π(Z)] by Jensen's inequality.
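The convexity step mentioned above can be made explicit as follows (a sketch, assuming as stated that S is distributed as E(U | B_p) with U binomial with parameters k and k/p; the exponent c > 0 stands for the relevant constant in the proof):

```latex
% Jensen's inequality conditionally on B_p, for the convex map s -> exp(c s)
e^{cS}\overset{d}{=}e^{c\,\mathbf{E}(U\mid\mathcal{B}_p)}
\le\mathbf{E}\bigl(e^{cU}\mid\mathcal{B}_p\bigr),
\qquad\text{hence}\qquad
\mathbf{E}\,e^{cS}\le\mathbf{E}\,e^{cU}
=\Bigl(1-\frac{k}{p}+\frac{k}{p}\,e^{c}\Bigr)^{k}.
```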
Let us consider the type II errors. We need to show that, if nr⁴ → ∞, then sup_{θ∈Θ_p(r)} P_θ(t_0 ≤ u_α) → 0. We will prove that (6.1) holds uniformly over θ ∈ Θ_p(r). Indeed, if (6.1) is true, we derive by Chebyshev's inequality that the type II error vanishes for n, p large enough. In order to check (6.1), we use the corresponding identities. It follows that, since E_X(‖X_j‖²) = n and E_X((X_j, X_l)) = 0 for j ≠ l, we get the first convergence in (6.1). Let us turn to the variance term. By A2, the random variables X_ij are independent in (i, j), i = 1, …, n, j = 1, …, p.

Tests based on the statistic t 1
First observe that under H 0 , the statistic t 1 is a degenerate U-statistic of the second order, i.e., for Z s = (X (s) , Y s ), s = 1, 2, 3 one has E Z 1 K(Z 1 , Z 2 ) = 0, which yields where E Z 3 denotes the expectation over Z 3 under P 0 .In order to establish the asymptotic normality of t 1 we only need to check the two following conditions, see [14] Lemma 3.4, We have by Assumption A1, Since E 0 (K 2 (Z 1 , Z 2 )) = 1, we get the first convergence in (6.3).Next by A2, , where we set As a consequence, we get where b 4 ∆ = sup i E(X 4 i1 ).By B1, the second convergence in (6.3) holds true.Thus, Theorem 4.3 (i) follows.
Let us now evaluate the type II errors under P θ .Recall that by (1.1), Observe that E θ Y i X ij = θ j and set Consider the representation where Observe that the kernel K θ (Z 1 , Z 2 ) is symmetric and degenerate under P θ , i.e., The terms K θ (Z 1 , Z 2 ), δ(Z 1 ), and δ(Z 2 ) are centered and uncorrelated under P θ .As a consequence, we derive that Let us compute the variances.Let δ ij be the Kronecker function.Using the representation Note that where (we omit the first index i = 1, 2 in X ij ) Observe that We obtain We now compute Let us take the expectation with respect to X.By Assumption A2, we have (6.6) Similarly for i = 1, 2, we compute the variance of δ(Z i ).
It follows that y_j = (X_j, Y)/‖Y‖ ∼ N(0, 1) and that y_1, …, y_p are i.i.d. under P_0. As a consequence, the random variables q_i are independent and uniformly distributed on (0, 1) under P_0. We denote by F_p(t) the empirical distribution function of (q_i)_{1≤i≤p}; the normalized uniform empirical process W_p is then defined accordingly. Arguing as in Donoho and Jin [8], we observe that t_HC = sup_{t≤1/2} W_p(t). It is stated in [26], Chapter 16, that this supremum has the required limiting behavior, which proves the result. □
Proposition 6.1 Consider the set of parameters Θ^(4)_k defined below. Let us introduce the statistic t_max and the corresponding test ψ_max. If ψ'_max = 1, it follows that q_(1) ≤ 2Φ(−√(2.5 log(p))) ≤ 2p^{−5/4}. Hence, for p large enough, this implies that ψ_max = 1.
Lemma 6.1 For any T > 0 tending to infinity and such that T = o(√n), we have the bound below.

Taking T = 4 log(p), we obtain the corresponding estimate. We recall that ‖θ‖² ≤ 4k log(p)/n = o(1) since θ ∈ Θ^(3)_k(r_np). Hence, combining this bound with (6.15), we obtain that there exists an event Z_{np,4} of probability tending to one and a sequence δ = δ_np → 0 such that the stated inequality holds. Since H_np = o(p^η), this implies (6.17) and then (6.12). □

6.3.5 Proof of Lemma 6.1

Let us bound the deviations of S̃_j by computing the exponential moments of ∆_j. For any h such that h²‖θ‖² ≤ 1/4, we have the corresponding bound. Recall that we consider r_np = (ϕ(β) + δ_0)√(k log(p)/n) with arbitrarily small δ_0 > 0 (see (6.8)). Recalling that T_p = log(p), we apply the results of Section 7. In order to obtain (6.18), we have to check that there exists η > 0 such that the required inequality holds for n, p large enough. The relation (6.18) follows. □

6.4 Proof of Proposition 4.6

Under H_0, the distributions of the variables (y_i)_{i=1,…,p} do not depend on σ². As a consequence, E_{0,σ}(ψ_HC) = E_{0,1}(ψ_HC). This last quantity has been shown to converge to 0 in Theorem 4.4. Hence, we get α_un(ψ_HC) = o(1).

Thresholds
Take the thresholds T = T_j satisfying T_j = a_j/2 + log(h^{−1})/a_j.
7.2 Norms ‖X_j‖ and scalar products (X_j, X_l)

Clearly, E(‖X_j‖²) = n, E(X_j, X_l) = 0, and Var(X_j, X_l) = n.
By Assumption B1, there exists D > 0 such that sup_{j≠l} Var((X_j, X_l)) ≤ nD and sup_j Var(‖X_j‖²) ≤ nD.

5.3.3 Expectation over π̃_Z and over E^X_0

Let us define the variables η_k taking values in {1, −1}. The expectations over π̃_Z are of the form