Minimax testing of a composite null hypothesis defined via a quadratic functional in the model of regression

We consider the problem of testing a particular type of composite null hypothesis under a nonparametric multivariate regression model. For a given quadratic functional $Q$, the null hypothesis states that the regression function $f$ satisfies the constraint $Q[f]=0$, while the alternative corresponds to the functions for which $Q[f]$ is bounded away from zero. On the one hand, we provide minimax rates of testing and the exact separation constants, along with a sharp-optimal testing procedure, for diagonal and nonnegative quadratic functionals. We consider smoothness classes of ellipsoidal form and check that our conditions are fulfilled in the particular case of ellipsoids corresponding to anisotropic Sobolev classes. In this case, we present a closed form of the minimax rate and the separation constant. On the other hand, minimax rates for quadratic functionals which are neither positive nor negative give rise to two different regimes: "regular" and "irregular". In the "regular" case, the minimax rate is equal to $n^{-1/4}$, while in the "irregular" case the rate depends on the smoothness class and is slower than in the "regular" case. We apply this to the issue of testing the equality of norms of two functions observed in noisy environments.


Problem statement
Consider the nonparametric regression model with multi-dimensional random design: we observe $(x_i, t_i)_{i=1,\dots,n}$ obeying the relation
$$x_i = f(t_i) + \xi_i, \qquad i = 1, \dots, n, \tag{1}$$
where $t_i \in \Delta \subset \mathbb{R}^d$ are random design points, $1 \le d < \infty$, $f : \Delta \to \mathbb{R}$ is the unknown regression function and the $\xi_i$'s represent observation noise. Throughout this work, we assume that the vectors $t_i = (t_i^1, \dots, t_i^d)$, for $i = 1, \dots, n$, are independent and identically distributed with uniform distribution on $\Delta = [0,1]^d$, which is equivalent to $t_i^k \overset{\text{iid}}{\sim} U(0,1)$. Furthermore, conditionally on $T^n = \{t_1, \dots, t_n\}$, the variables $\xi_1, \dots, \xi_n$ are assumed i.i.d. with zero mean and variance $\tau^2$, for some known $\tau \in (0, \infty)$.
Let $L_2(\Delta)$ denote the Hilbert space of all square-integrable functions defined on $\Delta$. Assume that we are given two disjoint subsets $F_0$ and $F_1$ of $L_2(\Delta)$. We are interested in analyzing the problem of testing the hypotheses
$$H_0 : f \in F_0 \qquad \text{against} \qquad H_1 : f \in F_1.$$
To be more precise, let us set $z_i = (x_i, t_i)$ and denote by $P_f$ the probability distribution of the data vector $(z_1, \dots, z_n)$ given by (1). The expectation with respect to $P_f$ is denoted by $E_f$. The goal is to design a testing procedure $\phi_n : (\mathbb{R} \times \Delta)^n \to \{0, 1\}$ for which we are able to establish theoretical guarantees in terms of the cumulative error rate (the sum of the probabilities of type I and type II errors):
$$\gamma_n(F_0, F_1, \phi_n) = \sup_{f \in F_0} P_f(\phi_n = 1) + \sup_{f \in F_1} P_f(\phi_n = 0).$$
To measure the statistical complexity of this testing problem, it is relevant to analyze the minimax error rate
$$\gamma_n(F_0, F_1) = \inf_{\phi_n} \gamma_n(F_0, F_1, \phi_n),$$
where $\inf_{\phi_n}$ denotes the infimum over all testing procedures. The focus in this paper is on a particular type of null hypothesis $H_0$ that can be defined as the set of functions lying in the kernel of some quadratic functional $Q$. As described later in this section, this kind of null hypothesis naturally arises in several problems including variable selection, testing partial linearity of a regression function or the equality of norms of two signals. It is then appealing to define the alternative as the set of functions satisfying $|Q[f]| > \rho^2$ for some $\rho > 0$. However, without further assumptions on the nature of the function $f$, it is impossible to design consistent testing procedures for discriminating between $F_0$ and $F_1$. One approach to making the problem meaningful is to assume that the function $f$ belongs to a smoothness class. Typical examples of smoothness classes are Sobolev and Hölder classes, Besov bodies or balls in reproducing kernel Hilbert spaces.
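To make the observation scheme concrete, the following minimal sketch simulates data from model (1) with a uniform design on $[0,1]^d$; the particular regression function `f` is a hypothetical choice used only for illustration.

```python
import math
import random

random.seed(0)

n, d, tau = 500, 2, 1.0

# Hypothetical regression function, used only to make the sketch concrete.
def f(t):
    return math.sin(2 * math.pi * t[0]) * math.cos(2 * math.pi * t[1])

# Uniform design on [0,1]^d and responses x_i = f(t_i) + xi_i with noise level tau.
sample = []
for _ in range(n):
    t = [random.random() for _ in range(d)]
    sample.append((f(t) + random.gauss(0.0, tau), t))
```

Any testing procedure below is a measurable function of such a sample $(x_i, t_i)_{i=1,\dots,n}$.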
In the present work, we assume that the function $f$ belongs to a smoothness class $\Sigma$ that can be seen as an ellipsoid in the infinite-dimensional space $L_2(\Delta)$. Thus, the null and the alternative are defined by
$$F_0 = \{f \in \Sigma : Q[f] = 0\} \qquad \text{and} \qquad F_1(\rho) = \{f \in \Sigma : |Q[f]| > \rho^2\}. \tag{5}$$
One can take note that both hypotheses are composite and nonparametric.

Background on minimax rate- and sharp-optimality
Given the observations $(x_i, t_i)_{i=1,\dots,n}$, we consider the problem of testing the composite hypothesis $F_0$ against the nonparametric alternative $F_1(\rho)$ defined by (5). The goal here is to obtain, if possible, both rate and sharp asymptotics for the cumulative error rate in the minimax setup. These notions are defined as follows. For a fixed small number $\gamma \in (0, 1)$, the function $r_n^*$ is called the minimax rate of testing if:
• there exists $C' > 0$ such that for all $C < C'$, we have $\liminf_{n\to\infty} \inf_{\phi_n} \gamma_n(F_0, F_1(C r_n^*), \phi_n) \ge \gamma$;
• there exist $C'' > 0$ and a test $\phi_n$ such that for all $C > C''$, $\limsup_{n\to\infty} \gamma_n(F_0, F_1(C r_n^*), \phi_n) \le \gamma$.
A testing procedure $\phi_n$ is called minimax rate-optimal if $\limsup_{n\to\infty} \gamma_n(F_0, F_1(C r_n^*), \phi_n) \le \gamma$ for some $C > 0$. Note that the minimax rate and the rate-optimal test may depend on the prescribed significance level $\gamma$. However, in most situations this dependence cancels out from the rate and appears only in the constants. If the constants $C'$ and $C''$ coincide, then their common value is called the exact separation constant and any test satisfying the second condition is called minimax sharp-optimal. The minimax rate $r_n^*$ is actually not uniquely defined, but the product of the minimax rate with the exact separation constant is uniquely defined up to an asymptotic equivalence. For more details on minimax hypotheses testing we refer to (Ingster and Suslina, 2003). While minimax rate-optimality is a desirable feature for a testing procedure, it may still lead to overly conservative tests. A (partial) remedy for this issue is to consider sharp asymptotics of the error rate. In fact, one can often prove that, when $n \to \infty$,
$$\gamma_n(F_0, F_1(\rho)) = 2\Phi(-u_n(\rho)) + o(1), \tag{6}$$
where $\Phi$ is the c.d.f. of the standard Gaussian distribution, $u_n(\cdot)$ is some "simple" function from $\mathbb{R}_+$ to $\mathbb{R}$ and $o(1)$ is a term tending to zero uniformly in $\rho$ as $n \to \infty$. This relation implies that by determining $r_n^*$ as a solution with respect to $\rho$ of the equation $u_n(\rho) = z_{1-\gamma/2}$ (where $z_\alpha$ stands for the $\alpha$-quantile of the standard Gaussian distribution), we get not only the minimax rate, but also the exact separation constant. When relation (6) is satisfied, we say that Gaussian asymptotics hold.
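The role of the quantile $z_{1-\gamma/2}$ can be checked numerically; the sketch below assumes the Gaussian asymptotics take the form $\gamma_n \approx 2\Phi(-u_n(\rho))$, so that choosing $u_n(\rho) = z_{1-\gamma/2}$ drives the limiting error rate to exactly $\gamma$.

```python
from statistics import NormalDist

Phi = NormalDist().cdf
gamma = 0.05

# z_{1-gamma/2}: the (1 - gamma/2)-quantile of the standard Gaussian
z = NormalDist().inv_cdf(1 - gamma / 2)

# With u_n(rho) = z_{1-gamma/2}, the limiting error rate 2*Phi(-z) equals gamma.
err = 2 * Phi(-z)
```

For $\gamma = 0.05$ this recovers the familiar value $z_{0.975} \approx 1.96$.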

Overview of the main contributions
Our contributions focus on the case where the smoothness class $\Sigma$ is an ellipsoid in $L_2(\Delta)$ and the quadratic functional $Q$ admits a diagonal form in the orthonormal basis corresponding to the directions of the axes of the ellipsoid $\Sigma$. To be more precise, let $L$ be a countable set and $\{\varphi_l\}_{l \in L}$ be an orthonormal system in $L_2(\Delta)$. For a function $f \in L_2(\Delta)$, let $\theta[f] = \{\theta_l[f]\}_{l \in L}$ be the generalized Fourier coefficients with respect to this system, i.e., $\theta_l[f] = \langle f, \varphi_l \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product in $L_2(\Delta)$. The functional sets $\Sigma \subset L_2(\Delta)$ under consideration are subsets of ellipsoids with directions of axes $\{\varphi_l\}_{l \in L}$ and with coefficients $c = \{c_l\}_{l \in L} \in \mathbb{R}_+^L$:
$$\Sigma \subset \Big\{f \in L_2(\Delta) : \sum_{l \in L} c_l \theta_l[f]^2 \le 1\Big\}.$$
The diagonal quadratic functional is defined by a set of coefficients $q = \{q_l\}_{l \in L}$:
$$Q[f] = \sum_{l \in L} q_l \theta_l[f]^2.$$
Note that if $Q$ is positive definite, i.e., $q_l > 0$ for all $l \in L$, then the null hypothesis becomes $f = 0$ and the problem under consideration is known as the detection problem. However, the goal of the present work is to consider more general types of diagonal quadratic functionals. Namely, two situations are examined: (a) all the coefficients $q_l$ are nonnegative and (b) the two sets $L_+ = \{l \in L : q_l > 0\}$ and $L_- = \{l \in L : q_l < 0\}$ are nonempty. In the first situation, we establish Gaussian asymptotics of the cumulative error rate and propose a minimax sharp-optimal test. Under some conditions, we show that the sequence
$$r_{n,\gamma}^* = \min\Big\{\rho > 0 : \inf_{f \in F_1(\rho)} h_n[f, w_n] \ge 2 z_{1-\gamma/2}\Big\}$$
(with $h_n$ and $w_n$ defined in Section 2) provides the minimax rate of testing with constants $C' = C'' = 1$. This result is instantiated on some examples motivating our interest in testing the hypotheses (5). One example, closely related to the problem of variable selection (Comminges and Dalalyan, 2012), is testing the relevance of a particular covariate in high-dimensional regression. This problem is considered in a more general setup corresponding to testing that a partial derivative of order $\alpha = (\alpha_1, \dots, \alpha_d)$, denoted by $\partial^{\alpha_1 + \dots + \alpha_d} f / \partial t_1^{\alpha_1} \dots \partial t_d^{\alpha_d}$, is identically equal to zero against the hypothesis that this derivative is significantly different from 0. As a consequence of our main result, we show that if $f$ lies in the anisotropic Sobolev ball of smoothness $\sigma = (\sigma_1, \dots, \sigma_d)$ and we set $\delta = \sum_{j=1}^d \alpha_j / \sigma_j$, then the minimax rate of testing is $r_n^* = n^{-2\sigma(1-\delta)/(4\sigma+d)}$, provided that $\delta < 1$ and $\sigma > d/4$. Furthermore, we derive Gaussian asymptotics and exhibit the exact separation constant in this problem.
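The rate exponent $2\sigma(1-\delta)/(4\sigma+d)$ is easy to compute for concrete smoothness parameters. The sketch below uses assumed values of $\sigma$ and $\alpha$, takes $\delta = \sum_j \alpha_j/\sigma_j$, and (as an assumption of this sketch) interprets the scalar smoothness entering the rate as a harmonic-type mean of the anisotropic smoothness indices.

```python
from fractions import Fraction

# Illustrative (assumed) anisotropic smoothness and derivative order.
sigma = [Fraction(2), Fraction(2), Fraction(4)]   # sigma_1, ..., sigma_d
alpha = [Fraction(1), Fraction(0), Fraction(0)]   # alpha_1, ..., alpha_d
d = len(sigma)

# delta = sum_j alpha_j / sigma_j, required to satisfy delta < 1
delta = sum(a / s for a, s in zip(alpha, sigma))

# Effective scalar smoothness: a harmonic-mean definition, an assumption here.
sigma_bar = Fraction(d) / sum(1 / s for s in sigma)

# Exponent in the rate r_n* = n^{-2*sigma_bar*(1-delta)/(4*sigma_bar+d)}
exponent = 2 * sigma_bar * (1 - delta) / (4 * sigma_bar + d)
```

With these values, $\delta = 1/2 < 1$ and the rate is a genuine power of $n$ slower than the detection rate obtained for $\alpha = 0$.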
The second situation we examine in this paper concerns the case where the cardinalities of both $L_+$ and $L_-$ are nonzero. A typical application of this kind of problem is testing the equality of the norms of two signals observed in noisy environments. In this set-up, we provide minimax rates of testing and exhibit the presence of two regimes that we call the regular regime and the irregular regime. In the regular regime, the minimax rate is $r_n^* = n^{-1/4}$, while in the irregular case it may be of the form $n^{-a}$ with an $a < 1/4$ that depends on the degree of smoothness of the functional class. Note that all our results are non-adaptive: our testing procedures make explicit use of the smoothness characteristics of the function $f$. Adaptation to the unknown smoothness for the problem we consider is an open question for which the works (Spokoiny, 1996, Gayraud and Pouet, 2005) may be of valuable guidance.

Relation to previous work
Starting from the seminal papers by Ermakov (1990) and Ingster (1993a,b,c), minimax testing of nonparametric hypotheses has received a great deal of attention. A detailed review of the literature on this topic being out of the scope of this section, we only focus on discussing those previous results which are closely related to the present work. The goal here is to highlight the common points and the most striking differences with the existing literature. The major part of the statistical inference for nonparametric hypotheses testing was developed for the Gaussian white noise model (GWNM) and its equivalent formulation as the Gaussian sequence model (GSM). As recent references for the problem of testing a simple hypothesis in these models, we cite (Ermakov, 2011, Ingster et al., 2012), where the reader may find further pointers to previous work. In the present work, the null hypothesis defined by (5) is composite and nonparametric. Early references for minimax results for composite null hypotheses include (Horowitz and Spokoiny, 2001, Pouet, 2001, Gayraud and Pouet, 2001, 2005), where the case of a parametric null hypothesis is of main interest. These papers deal with the one-dimensional situation and provide only minimax rates of testing without attaining the exact separation constant. Furthermore, the alternative is defined as the set of functions that are at least at a Euclidean distance $\rho$ from the null hypothesis, which is very different from the alternatives considered in this work. More recently, the nonasymptotic approach to minimax testing has gained popularity (Baraud et al., 2003, 2005, Laurent et al., 2011, 2012). One of the advantages of the nonasymptotic approach is that it removes the frontier between the concepts of parametric and nonparametric hypotheses, while its limitation is that there is no result on sharp optimality (even the notion itself is not well defined). Note also that all these papers deal with the GSM, considering as main application the case of one-dimensional signals, as opposed to our set-up of regression with high-dimensional covariates. Let us review in more detail the papers (Ingster and Sapatinas, 2009) and (Laurent et al., 2011), which are very closely related to our work, either by the methodology used or by the problem of interest. Ingster and Sapatinas (2009) extended some results on goodness-of-fit testing for the $d$-dimensional GWNM to goodness-of-fit testing for the multivariate nonparametric regression model. More precisely, they tested the null hypothesis $H_0 : f = f_0$, where $f_0$ is a known function, against the alternative of functions $f$ separated from $f_0$ at a distance $\rho_n$, where $\Sigma$ is an ellipsoid in the Hilbert space $L_2(\Delta)$. They obtained both rate and sharp asymptotics for the error probabilities in the minimax setup. So the model they considered is the same as the one we are interested in here, but the hypotheses $H_0$ and $H_1$ are substantially different. As a consequence, the testing procedure we propose takes into account the general forms of $H_0$ and $H_1$ given by (5) and is different from the asymptotically minimax test of Ingster and Sapatinas (2009). Furthermore, we substantially relaxed the constraint on the noise distribution by replacing the Gaussianity assumption by the condition of a bounded 4th moment. Laurent et al.
(2011) considered the GWNM from the inverse problem point of view, i.e., when the signal of interest $g$ undergoes a linear transformation $T$ before being observed in a noisy environment. This corresponds to $f = T[g]$ with a compact injective operator $T$. The two assertions $g = 0$ and $T[g] = 0$ are then equivalent. Consequently, if the goal is to detect the signal $f$, one can consider two testing problems: the direct one, formulated in terms of $\|T[g]\|_2$, and the inverse one, formulated in terms of $\|g\|_2$. The authors discussed the advantages and limitations of each of these two formulations in terms of minimax rates. Depending on the complexity of the inverse problem and on the assumptions on the function to be detected (sparsity or smoothness), they proved that the specific treatment devoted to the inverse problem, which includes an underlying inversion of the operator, may worsen the detection accuracy. For each situation, they also highlighted the cases where the direct strategy fails while a specific test for the inverse formulation works well. The inverse formulation is closely related to our definition (5) of the hypotheses $H_0$ and $H_1$, since $g \mapsto \|g\|_2^2$ is a quadratic functional. However, our setting is more general in that we consider functionals with non-trivial kernels and with possibly negative diagonal entries.

Organization
The rest of the paper is organized as follows. The results concerning sharp asymptotics for positive semi-definite diagonal functionals are provided in Section 2. In particular, the rates of separation for a general class of tests called linear U-tests are explored in Subsection 2.2. The asymptotically optimal linear U-test is provided in Subsection 2.3 along with its rate of separation, which is shown to coincide with the minimax exact rate in Subsection 2.4. Section 3 is devoted to a discussion of the assumptions and to the consequences of the main result for some relevant examples. The results for diagonal quadratic functionals that are neither nonnegative nor nonpositive are stated in Section 4, along with an application to testing the equality of the norms of two signals. A summary and some perspectives are provided in Section 5. Finally, the proofs of the results are postponed to the Appendix.

Additional notation
In what follows, the notation $A_n = O(B_n)$ means that there exists a constant $c > 0$ such that $A_n \le c B_n$, and the notation $A_n = o(B_n)$ means that the ratio $A_n / B_n$ tends to zero. The relation $A_n \sim B_n$ means that $A_n / B_n$ tends to 1, while the relation $A_n \asymp B_n$ means that there exist constants $0 < c_1 < c_2 < \infty$ and $n_0$ large enough such that $c_1 \le A_n / B_n \le c_2$ for $n \ge n_0$. For a real number $c$, we denote by $c_+$ its positive part $\max(0, c)$ and by $\lfloor c \rfloor$ its integer part. For a set $A$, $1_A$ stands for its indicator function and $|A|$ denotes its cardinality. Given a $q > 0$ and a function $f$, $\|f\|_q = \big(\int_\Delta |f(t)|^q \, dt\big)^{1/q}$ is the conventional $\ell_q$-norm of $f$. Similarly, for a vector or an array $u$ indexed by a countable set $L$, $\|u\|_q = \big(\sum_{l \in L} |u_l|^q\big)^{1/q}$ is the $\ell_q$-norm of $u$. As usual, we also denote by $\|u\|_0$ and $\|u\|_\infty$, respectively, the number of nonzero entries and the magnitude of the largest entry of $u \in \mathbb{R}^L$.
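For a finitely supported array, the norms above are elementary to compute; a minimal sketch on a hypothetical array `u`:

```python
u = [3.0, -4.0, 0.0, 1.0]

def lq_norm(u, q):
    # ell_q norm of a finitely supported array: (sum |u_l|^q)^(1/q)
    return sum(abs(x) ** q for x in u) ** (1.0 / q)

l2 = lq_norm(u, 2)                    # Euclidean norm
l0 = sum(1 for x in u if x != 0)      # ||u||_0: number of nonzero entries
linf = max(abs(x) for x in u)         # ||u||_inf: largest magnitude
pos = [max(0.0, x) for x in u]        # positive part c_+ applied entrywise
```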
In the sequel, without loss of generality, we assume that the standard deviation of the noise is equal to one: $\tau = 1$. The case of a general but known $\tau$ can be deduced as a consequence of our results.
Recall that we consider quadratic functionals $Q$ of the form $Q[f] = \sum_{l \in L} q_l \theta_l[f]^2$, for some given array $q = \{q_l\}_{l \in L}$. The major difference between the functional $\sum_{l \in L} \theta_l[f]^2$ that appears in the problem of detection (Ingster and Sapatinas, 2009, Ingster et al., 2012) and this general functional actually lies in the fact that the support of $q$, defined by $S_F = \mathrm{supp}(q) = \{l \in L : q_l \neq 0\}$, is generally different from $L$. Furthermore, large coefficients $q_l$ amplify the error of estimating $Q[f]$ and, therefore, make it more difficult to distinguish $H_0$ from $H_1$. An interesting question, which we answer in the next sections, is what interplay between $c$ and $q$ makes it possible to distinguish between the null and the alternative. Let $S_F^c$ denote the complement of $S_F$ and, for a set $L' \subset L$, let $\mathrm{span}\{\varphi_l\}_{l \in L'}$ be the closed linear subspace of $L_2(\Delta)$ spanned by the set $\{\varphi_l\}_{l \in L'}$. Let $\Pi_{S_F} f$ and $\Pi_{S_F^c} f$ be the orthogonal projections of a function $f \in \Sigma$ on $\mathrm{span}\{\varphi_l\}_{l \in S_F}$ and $\mathrm{span}\{\varphi_l\}_{l \in S_F^c}$, respectively. To simplify notation, the subscript $S_F^c$ is omitted in the rest of the paper, i.e., $\Pi_{S_F^c} f$ is replaced by $\Pi f$. Finally, throughout this work we assume that $f$ is centered, i.e., $\int_\Delta f(t) \, dt = 0$, and that $\{\varphi_l\}$ is an orthonormal basis of the subspace of $L_2(\Delta)$ consisting of all centered functions. In other terms, all the functions $\varphi_l$ are orthogonal to the constant function.

Linear U-tests and their error rate
We start by introducing a family of testing procedures that we call linear U-tests. To this end, we split the sample into two parts: a small part of the sample is used to build a pilot estimator $\widehat{\Pi f}_n$ of $\Pi f$, whereas the remaining observations are used for distinguishing between $H_0$ and $H_1$. Let us set $m = n - \lfloor \sqrt{n} \rfloor$ and call the two parts of the sample $D_1 = \{(x_i, t_i) : i = 1, \dots, m\}$ and $D_2 = \{(x_i, t_i) : i = m + 1, \dots, n\}$. Using the pilot estimator $\widehat{\Pi f}_n$ of $\Pi f$, we define the adjusted observations $\tilde{x}_i = x_i - \widehat{\Pi f}_n(t_i)$ and $\tilde{z}_i = (\tilde{x}_i, t_i)$.
Definition 1. Let $w_n = \{w_{l,n}\}_{l \in S_F}$ be an array of real numbers containing a finite number of nonzero entries and such that $\|w_n\|_2 = 1$. Let $u$ be a real number. We call a linear U-test based on the array $w_n$ the procedure $\phi_n^w = 1\{U_n^w > u\}$, where $U_n^w$ is the U-statistic, linear in $w_n$, defined by
$$U_n^w = \Big(\frac{m(m-1)}{2}\Big)^{-1/2} \sum_{1 \le i < j \le m} \sum_{l \in S_F} w_{l,n} \, \tilde{x}_i \tilde{x}_j \, \varphi_l(t_i) \varphi_l(t_j).$$
We shall prove that an appropriate choice of $w_n$ and $u$ leads to a linear U-test that is asymptotically sharp-optimal. The rationale behind this property relies on the by now well-understood principle of smoothing out high frequencies of a noisy signal. In fact, if we call $\{\theta_l[f]\}_{l \in S_F}$ the (relevant part of the) representation of $f$ in the frequency domain, then $\{\frac{1}{m} \sum_{i=1}^m \tilde{x}_i \varphi_l(t_i)\}_{l \in S_F}$ is a nearly unbiased estimator of this representation. The array $w_n$ then acts as a low-pass filter that shrinks to zero the coefficients corresponding to high frequencies, in order to prevent over-fitting. The first step in establishing theoretical guarantees on the error rate of a linear U-test consists in exploring the behavior of the statistic $U_n^w$ under the null.
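The statistic can be computed by a direct double loop over pairs of the first subsample; the sketch below assumes a degenerate U-statistic of the pairwise form above, with the normalization $(m(m-1)/2)^{-1/2}$ chosen (as an assumption of this sketch) so that the variance is of order one under the null.

```python
import math

def u_statistic(adjusted, w, basis):
    """Sketch of a linear U-statistic for a weight array w.

    adjusted : list of (x_tilde_i, t_i) pairs
    w        : dict mapping l to w_{l,n} (finitely many entries, ||w||_2 = 1)
    basis    : dict mapping l to the callable phi_l
    """
    m = len(adjusted)
    norm = math.sqrt(m * (m - 1) / 2)
    total = 0.0
    for i in range(m):
        xi, ti = adjusted[i]
        for j in range(i + 1, m):
            xj, tj = adjusted[j]
            total += xi * xj * sum(wl * basis[l](ti) * basis[l](tj)
                                   for l, wl in w.items())
    return total / norm
```

Since only pairs $i < j$ within the same subsample enter the sum, the statistic is (nearly) unbiased for a rescaled version of $\sum_l w_{l,n}\theta_l^2[f]$.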
Proposition 1. Let $w_{l,n} \ge 0$ for all $n \in \mathbb{N}$ and $l \in L$. Assume that $E[\xi_1^4] < \infty$ and that the following conditions are fulfilled: Then, uniformly in $f \in F_0$, the U-statistic defined by (9) converges in distribution to the standard Gaussian law $\mathcal{N}(0, 1)$.
In other terms, this proposition claims that under appropriate conditions, for every $u \in \mathbb{R}$, the sequence $\sup_{f \in F_0} |P_f(U_n > u) - \Phi(u)|$ tends to zero as $n$ goes to infinity. This means that under the null, the distribution of the test statistic $U_n$ is asymptotically parameter free. This is frequently referred to as Wilks' phenomenon. To complete the investigation of the error rate of a linear U-test, we need to characterize the behavior of the test statistic $U_n$ under the alternative. As usual, this step is more involved. Roughly speaking, we will show that under the alternative the test statistic $U_n$ is close to a Gaussian random variable with mean $h_n[f, w_n] = \big(\frac{m(m-1)}{2}\big)^{1/2} \sum_{l \in L(w_n)} w_{l,n} \theta_l^2[f]$ and variance 1. The rigorous statement is provided in the next proposition.
Proposition 2. Let the assumptions of Proposition 1 be satisfied. Assume that, in addition: Then, for every $\rho > 0$, the type II error of the linear U-test based on $w_n$ satisfies: where the term $o(1)$ does not depend on $\rho$.
Let us provide an informal discussion of the assumptions introduced in the previous propositions. The first two assumptions in Proposition 1 mean that most nonzero entries of the array $w_n$ should be of the same order. Arrays that have a few spikes and many small entries are discarded by these assumptions. Furthermore, the number of samples in the frequency domain that are not annihilated by $w_n$ should be small as compared to the sample size $n$.
The third assumption of Proposition 1 is trivially satisfied for bases of bounded functions, such as the sine and cosine bases and their tensor products. For localized bases like wavelets, this assumption imposes a constraint on the size of the support of $w_n$: it should not be too small. The last assumption of Proposition 1 will be discussed in more detail later. One should also take note that the only reason for requiring the functions $f$ to be smooth under the null is the need to construct a uniformly consistent pilot estimator of $\Pi f$.
Concerning the assumptions imposed in Proposition 2, the first one means that only the coefficients $\theta_l$ corresponding to high frequencies are strongly shrunk by $w_n$. This is a kind of coherence assumption between the smoothing filter $w_n$ and the coefficients $c = \{c_l\}_{l \in L}$ encoding the prior information on the signal smoothness.
Combining Propositions 1 and 2, we obtain
$$\gamma_n(F_0, F_1(\rho), \phi_n^w) \le \Phi(-u) + \Phi\Big(u - \inf_{f \in F_1(\rho)} h_n[f, w_n]\Big) + o(1), \tag{11}$$
where the term $o(1)$ is uniform in $\rho > 0$. Using the symmetry of $\Phi$ and the monotonicity of $\Phi'$ on $\mathbb{R}_+$, one easily checks that the value of the threshold $u$ minimizing the main term in the right-hand side of the last display is $u = \frac{1}{2} \inf_{f \in F_1(\rho)} h_n[f, w_n]$. This result provides a constructive tool for determining the rate of separation of a given linear U-test. In fact, one only needs to set $u = z_{1-\gamma/2}$ and find a sequence $r_n$ such that $\inf_{f \in F_1(r_n)} h_n[f, w_n] \sim 2 z_{1-\gamma/2}$, where $z_\alpha$ is the $\alpha$-quantile of $\mathcal{N}(0, 1)$.
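The optimality of the threshold at half the separation can be confirmed numerically; the sketch below assumes the main term of the error rate has the form $\Phi(-u) + \Phi(u - h)$ for a hypothetical value of $h = \inf_f h_n[f, w_n]$, and checks by grid search that it is minimized at $u = h/2$.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

def main_term(u, h):
    # type I + type II main terms of the error rate of the test 1{U > u}
    return Phi(-u) + Phi(u - h)

h = 4.0  # hypothetical value of inf_{f in F_1(rho)} h_n[f, w_n]

# Grid search over u in [0, 8]: the sum is minimized at u = h/2.
best_u = min((main_term(k / 100, h), k / 100) for k in range(0, 801))[1]
```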
Remark 1. We explain here the use of $\tilde{x}_i$ instead of $x_i$ in our testing procedure. Actually, if we were only interested in rate-optimality, this precaution would not have been necessary. The problem only arises when dealing with sharp-optimality and it concerns the variance of $U_n$. Indeed, we need some terms that appear in the variance to tend to zero when $Q[f] = 0$ or $Q[f]$ is small (those terms only need to be bounded for rate-optimality). If we had used $x_i$ instead of $\tilde{x}_i$, we would have ended up with terms like $\|f\|_2$ in the variance. The information contained in the assertion "$Q[f]$ is small" concerns only the coefficients $\{\theta_l\}_{l \in S_F}$; it thus implies that $\|\Pi_{S_F} f\|_2$ is small, but it says nothing about $\|f\|_2$. We can also remark that this problem does not arise in the Gaussian sequence model, where one estimates $\theta_l^2$ by an unbiased estimator whose variance involves only $\theta_l$.
Remark 2. We chose to consider only the criterion $\gamma_n(F_0, F_1(\rho), \phi_n^w)$ so as to simplify the exposition of our results. We could, however, have dealt with the classical Neyman-Pearson criterion, in which, for a significance level $0 < \alpha < 1$ and a test $\psi$, one studies the minimal possible type II error among tests of level $\alpha$; this criterion is treated in Ingster and Sapatinas (2009) and, more generally, in Ingster and Suslina (2003). The transposition to our case is straightforward.

Minimax linear U-tests
The relation (11) being valid for a large variety of arrays $w_n$, it is natural to look for a $w_n$ minimizing the right-hand side of (11). This leads to the following saddle point problem: It turns out that this saddle point problem can be solved with respect to $w$ and leads to a one-parameter family of smoothing filters $w$.
Proposition 3. Assume that, for every $T > 0$, the set $N(T) = \{l \in S_F : c_l < T q_l\}$ is finite. For a given $\rho > 0$, assume that the corresponding equation has a solution and denote it by $T_\rho$. Then, the pair $(w^*, v^*)$ defined by (14) provides a solution to the saddle point problem (12). This result tells us that the "optimal" weights $w_n$ for the linear U-test $\phi_n^w$ should be of the form (14), which is particularly interesting because of its dependence on only one parameter $T > 0$. The next theorem provides a simple strategy for determining the minimax sharp-optimal test among linear U-tests satisfying some mild assumptions. We will show later in this section that this test is also minimax sharp-optimal among all possible tests.
Theorem 1. Assume that $E[\xi_1^4] < \infty$ and that, for every $T > 0$, the set $N(T) = \{l \in S_F : c_l < T q_l\}$ is finite. For a prescribed significance level $\gamma \in (0, 1)$, let $T_{n,\gamma}$ be a sequence of positive numbers such that the following relation holds true as $n \to \infty$: Let us define If the following conditions are fulfilled: The proof of this result, provided in the Appendix, is a direct consequence of Propositions 1, 2 and 3. As we shall see below, the rate $r_{n,\gamma}^*$ defined in Theorem 1 is the minimax sharp rate in the problem of testing the hypotheses (5), provided that the assumptions of the theorem are fulfilled. As expected, getting such a strong result requires non-trivial assumptions on the nature of the functional class, on that of the hypotheses to be tested, as well as on the interplay between them. Some short comments on these assumptions are provided in the remark below, with a further development left to subsequent sections.
Remark 3. The very first assumption is that the set $N(T)$ is finite. It is necessary for ensuring that the linear U-test we introduced is computable. This assumption is fulfilled when, roughly speaking, the coefficients $\{c_l\}_{l \in L}$ expressing the regularity grow at a faster rate than the coefficients $\{q_l\}_{l \in L}$ of the quadratic functional $Q$. The other assumptions are discussed in the subsequent sections.
Remark 4. The result stated in Theorem 1 is in the spirit of the previous work on sharp asymptotics in minimax testing, initiated by Ermakov (1990) for the problem of detection under Gaussian white noise. The explicit form of the weights $w_{l,n}^*$ is obtained by solving a quadratic optimization problem, called the extremal problem, in a series of recent works (Ingster and Suslina, 2003, Ingster and Sapatinas, 2009, Ingster and Stepanova, 2011, Ingster et al., 2012); see also Ermakov (2004) for a similar result in the heteroscedastic GWNM. In the case $q_l = 1$ for all $l \in L$, the aforementioned extremal problem is equivalent to the saddle point problem (12). In a nutshell, the main differences of Theorem 1 as compared to the existing results are the extension to the case of general coefficients $q_l$ and to non-Gaussian error distributions, as well as the use in the test statistic $U_n^w$ of the adjusted responses $\{\tilde{x}_i\}$ instead of the raw data $\{x_i\}$.

Lower bound
We shall state in this section the result showing that the rate $r_{n,\gamma}^*$ introduced in Theorem 1 is the minimax rate of testing and that the exact separation constant associated with this rate is equal to one. This also implies that the testing procedure proposed in the previous subsection is not only minimax rate-optimal but also minimax sharp-optimal among all possible testing procedures. In this subsection, we consider the functional classes $\Sigma = \Sigma_{p,L}$. Clearly, for $p > 4$, this functional class is smaller than those satisfying the conditions of Theorem 1. Therefore, any lower bound proven for these functional classes will also be a lower bound for the functional classes for which Theorem 1 is applicable.
Theorem 2. Assume that the $\xi_i$'s are standard Gaussian random variables and that the relevant conditions are fulfilled for every $T > 0$; then, for every $C < 1$, the minimax risk satisfies the corresponding lower bound. Although the main steps of the proof of this theorem, postponed to the Appendix, are close to those of (Ingster and Sapatinas, 2009), we have made several improvements which resulted in both a shorter and more transparent proof and relaxed assumptions. The most notable improvement is perhaps the fact that in condition [C3] it is not necessary to have $C_3 = 1$. We will further discuss this point and the other assumptions in the next section.
Remark 5. If we were only interested in minimax rate-optimality, we could have used a simpler prior in the proof of Theorem 2, which would also yield the desired lower bound under slightly weaker assumptions. One can also deduce from the proof that, for a concrete pair $(c, q)$, a simple way to obtain the minimax rate of separation consists in finding a suitable sequence $T_n$, where $M(T) = \sum_{l \in N(T)} q_l^2$.
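For a concrete pair $(c, q)$, the sets $N(T)$ and the quantity $M(T)$ are straightforward to compute. The sketch below uses hypothetical polynomial growths $c_l = l^4$ and $q_l = l^2$, chosen so that $c_l/q_l \to \infty$ and $N(T)$ is finite, as required.

```python
# Sketch with hypothetical coefficient growth: c_l = l^4 (smoothness)
# and q_l = l^2 (functional), so that N(T) = {l : c_l < T q_l} is finite.
def c(l):
    return l ** 4

def q(l):
    return l ** 2

def N(T):
    # here c_l < T q_l iff l^2 < T, and c_l/q_l is increasing in l
    out = []
    l = 1
    while c(l) < T * q(l):
        out.append(l)
        l += 1
    return out

def M(T):
    # M(T) = sum over N(T) of q_l^2
    return sum(q(l) ** 2 for l in N(T))
```

For instance, $N(10) = \{1, 2, 3\}$ in this toy configuration.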

Bases satisfying assumption [C3]
First we give examples of orthonormal bases satisfying assumption [C3], irrespective of the nature of the arrays $c$ and $q$ defining the smoothness class and the quadratic functional $Q$. One can take note that, despite the more general setting considered in the present work, our assumption [C3] is significantly weaker than the corresponding assumption in (Ingster and Sapatinas, 2009), which requires $C_3$ to be equal to one. In fact, in a remark, Ingster and Sapatinas (2009) suggest that their proof remains valid under our assumption [C3]. Thanks to a finer analysis, we succeeded in establishing sharp asymptotics under the weak version of [C3] without any additional price (except that a logarithmic factor now appears in the corresponding condition in Theorem 2).
Fourier basis Let us consider first the following Fourier basis in dimension $d$, for which $L = \mathbb{Z}^d$, where $(\mathbb{Z}^d)_+$ denotes the set of all $k \in \mathbb{Z}^d \setminus \{0\}$ such that the first nonzero element of $k$ is positive and $k \cdot t$ stands for the usual inner product in $\mathbb{R}^d$. Since all the basis functions are bounded by $\sqrt{2}$, assumption [C3] is satisfied.
Tensor product Fourier basis We can also consider the traditional tensor product Fourier basis, as in Ingster and Sapatinas (2009).
Haar basis Let $\varphi_{j,k}(\cdot)$, $j \in \mathbb{N}$, $k \in \{1, \dots, 2^j\}$, be the standard orthonormal Haar basis on $[0,1]$, where $j$ is the scale parameter and $k$ is the shift. The tensor product Haar basis is then composed of the corresponding product functions, where $j = (j_1, \dots, j_d)$ and $k = (k_1, \dots, k_d)$. As shown in (Ingster and Sapatinas, 2009), under the extra assumption that the coefficients $c_l = c_{j,k}$ and $q_l = q_{j,k}$ depend only on the scale parameter, i.e., $c_{j,k} = c_j$ and $q_{j,k} = q_j$, assumption [C3] is satisfied with $C_3 = 1$. Note that the same holds true for the multivariate Haar basis defined in the more commonly used way (see Cohen (2003), Chapter 2): $\varphi_l(t) = \prod_{i=1}^d \psi_{j,k_i}^{\omega_i}(t_i)$, where $l = (j, k, \omega)$ with $j \in \mathbb{N}$, $k \in \{1, \dots, 2^j\}^d$ and $\omega \in \{0,1\}^d \setminus \{0\}$, and with $\psi_{j,k}^0$ and $\psi_{j,k}^1$ being the scaled and shifted father wavelet and mother wavelet, respectively.
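The univariate building block is easy to write down explicitly; the sketch below implements the Haar wavelet with the shift convention $k \in \{1, \dots, 2^j\}$ used in the text and checks orthonormality numerically via a midpoint Riemann sum.

```python
def haar(j, k, t):
    """Haar wavelet psi_{j,k} on [0,1): scale j >= 0, shift k in {1, ..., 2^j}.
    (The shift convention k starting at 1 follows the text.)"""
    s = 2 ** j * t - (k - 1)   # rescale t to the support of psi_{j,k}
    if 0 <= s < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= s < 1:
        return -(2 ** (j / 2))
    return 0.0

def inner(f, g, n=100000):
    # midpoint Riemann sum of the L2([0,1]) inner product
    return sum(f((i + 0.5) / n) * g((i + 0.5) / n) for i in range(n)) / n

same = inner(lambda t: haar(1, 1, t), lambda t: haar(1, 1, t))  # ~ 1
diff = inner(lambda t: haar(1, 1, t), lambda t: haar(1, 2, t))  # ~ 0
```

The multivariate basis above is then obtained by taking tensor products of such univariate functions (together with the father wavelet).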
Compactly supported wavelet basis Since we are not limited to the case $C_3 = 1$, any orthonormal wavelet basis satisfies assumption [C3], as long as the wavelets are compactly supported and provided that the coefficients $c_l$ and $q_l$ depend only on the resolution level and not on the shift.

Examples of estimators satisfying [C6]
We present below pilot estimators that satisfy assumption [C6] in two different contexts.
Tensor-product Fourier basis For the first example, we assume that the orthonormal system $\{\varphi_l\}$ is the tensor product Fourier basis. Then we have $\sup_l \sup_{t \in \Delta} |\varphi_l(t)| \le 2^{d/2}$. The anisotropic Sobolev ball with radius $R$ and smoothness $\sigma = (\sigma_1, \dots, \sigma_d) \in (0, \infty)^d$ is defined accordingly. The estimator we suggest to use is constructed as follows. We first estimate the coefficients $\theta_l[f]$ empirically; then we choose a tuning parameter $T = T_n > 0$ and define the pilot estimator as the empirical projection onto the frequencies in $N_1(T)$. To ease notation, we set $N_1(T) = \{l \in S_F^c : c_l < T\}$ and $N_2(T) = S_F^c \setminus N_1(T)$.
Lemma 1. Assume that either one of the following conditions is satisfied:
Compactly supported orthonormal wavelet basis The same method can be applied in the case of an orthonormal basis of compactly supported wavelets of $L_2[0,1]^d$. We suppose that the coefficients $c_l = c_{j,k}$ correspond to those of a Besov ball $B_{2,2}^s$, i.e., $c_j = 2^{js}$, and that $\sigma = s - d/4 > 0$. Let us set, for $J \in \mathbb{N}$, the corresponding truncated wavelet projection.
In the following two subsections, we apply the previous results to two examples of quadratic functionals involving derivatives. The orthonormal system we use is the tensor product Fourier basis.
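A one-dimensional sketch of the thresholded projection idea behind such a pilot estimator: empirical Fourier coefficients are kept only for frequencies whose ellipsoid coefficients $c_l$ stay below the cutoff $T_n$. The signal, the coefficient growth $c_l = l^4$ and the cutoff are all hypothetical choices for illustration.

```python
import math
import random

random.seed(1)

n = 2000
def f(t):
    # hypothetical signal with a single active Fourier coefficient
    return math.sqrt(2) * math.cos(2 * math.pi * t)

data = [(f(t) + random.gauss(0, 1), t)
        for t in (random.random() for _ in range(n))]

def phi(l, t):
    return math.sqrt(2) * math.cos(2 * math.pi * l * t)

def c(l):
    return l ** 4  # assumed Sobolev-type ellipsoid coefficients

T_n = 100.0
kept = [l for l in range(1, 50) if c(l) < T_n]  # frequencies with c_l < T_n

# Empirical coefficients and the thresholded projection estimator.
theta_hat = {l: sum(x * phi(l, t) for x, t in data) / n for l in kept}

def pilot(t):
    return sum(th * phi(l, t) for l, th in theta_hat.items())
```

With this configuration only three low frequencies survive the cutoff, and the estimated leading coefficient is close to its true value 1.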

Testing partial derivatives
We assume here that $f$ belongs to a Sobolev class with anisotropic constraints, and the quadratic functional $Q$ corresponds, roughly speaking, to the squared $L_2$-norm of a partial derivative. More precisely, let $\alpha \in \mathbb{R}^d_+$ and $\sigma \in \mathbb{R}^d_+$ be two given vectors and define, for every $l \in \mathbb{Z}^d$, the weights $q_l = \prod_{j=1}^d (2\pi l_j)^{2\alpha_j}$ and $c_l = \sum_{j=1}^d (2\pi l_j)^{2\sigma_j}$. We will assume that $\sum_{j=1}^d (\alpha_j/\sigma_j) < 1$. For a function $f = \sum_{l\in\mathcal{L}} \theta_l\varphi_l \in L_2(\Delta)$, we set $\|f\|_{2,c}^2 = \sum_{l\in\mathcal{L}} c_l\theta_l^2$ and $\|f\|_{2,q}^2 = \sum_{l\in\mathcal{L}} q_l\theta_l^2$. Then, for a 1-periodic function which is differentiable enough, and if the $\alpha_j$ and $\sigma_j$ are integers, $\|f\|_{2,q}$ coincides with the $L_2$-norm of the corresponding mixed partial derivative of $f$.

Proposition 4. Define $\delta = \sum_{j=1}^d \alpha_j/\sigma_j$ and let $\sigma$ denote the harmonic mean of the $\sigma_j$'s, $\sigma = \big(\frac{1}{d}\sum_{j=1}^d \sigma_j^{-1}\big)^{-1}$, with the constants $(\kappa_j)$ and $\kappa$ defined accordingly. Then the minimax rate $r_n^*$ and the exact separation constant admit closed-form expressions, with the rate exponent governed by $4\sigma + d$.
Furthermore, the sequence of linear U-tests $\phi_n$ of Theorem 1 is asymptotically minimax with this choice of the tuning parameter.

Remark 6. The previous result can be used for performing dimensionality reduction through variable selection (Comminges and Dalalyan, 2012). Indeed, in a high-dimensional setup it is of central interest to eliminate the irrelevant covariates. The coordinate $t^i$ of $t$ is irrelevant if $f$ is constant on every line $\{t \in \Delta : t^j = a_j \text{ for all } j \neq i\}$, whatever the vector $a \in \Delta$ is. This implies that the $i$th partial derivative of $f$ is zero. Therefore, one can test the relevance of a variable, say $t^1$, by comparing $\|\partial f/\partial t^1\|_2$ with 0. In our notation, this amounts to testing hypotheses (5) with $Q[f] = \|f\|_{2,q}^2$, where $q_l = (2\pi l_1)^2$. Combining Proposition 4 and Theorem 1, one can easily deduce a minimax sharp-optimal test and the sharp minimax rates for this variable selection problem.
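The identity behind this choice of weights can be checked numerically in one dimension (an illustration of ours; the frequency and the discretization are arbitrary): for the Fourier basis, $\|\partial f/\partial t^1\|_2^2 = \sum_l (2\pi l_1)^2\theta_l^2$, so testing the nullity of the derivative amounts to testing $Q[f] = 0$ with $q_l = (2\pi l_1)^2$.

```python
import math

# 1-D check of ||f'||_2^2 = (2*pi*m)^2 * theta^2 for a single active
# frequency m = 3 with coefficient theta = 1 (toy example, ours).

def l2_norm_sq(g, n=20000):
    """Midpoint-rule approximation of the squared L2([0,1]) norm."""
    h = 1.0 / n
    return sum(g((i + 0.5) * h) ** 2 for i in range(n)) * h

# f(t) = sqrt(2) sin(2*pi*3*t), so f'(t) = sqrt(2) * 6*pi * cos(2*pi*3*t)
f_prime = lambda t: math.sqrt(2) * 2 * math.pi * 3 * math.cos(2 * math.pi * 3 * t)
lhs = l2_norm_sq(f_prime)       # numerical ||df/dt||_2^2
rhs = (2 * math.pi * 3) ** 2    # q_l * theta_l^2 with q_l = (2*pi*l)^2, theta_l = 1
```

The two sides agree up to quadrature error, which is negligible here because the midpoint rule is spectrally accurate for smooth periodic integrands over full periods.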
Remark 7. Another interesting particular case of the setting described in this subsection concerns the problem of component identification in partial linear models (Samarov et al., 2005). We say that $f$ obeys a partial linear model if, for some small subset $J$ of indices in $\{1, \ldots, d\}$ and some vector $\beta \in \mathbb{R}^{|J^c|}$, one can write $f(t) = g(t_J) + \beta^\top t_{J^c}$ for every $t \in \Delta$. The problem of component identification in this model is to determine, for an index $j$, whether $j \in J$ or not. One way of addressing this issue is to perform a test of the hypothesis $Q[f] = \|f\|_{2,q}^2 = 0$, where $q_l = (2\pi l_j)^4$. Roughly speaking, this corresponds to checking whether the second-order partial derivative of $f$ with respect to $t^j$ is zero or not (if the null is not rejected, then $j \in J^c$). Once again, Proposition 4 and Theorem 1 provide a minimax sharp-optimal test for this problem, along with the minimax rates and exact separation constants.
Remark 8. In the case where the covariates $t_i$ are not observable and only the $x_i$'s are available, our model coincides with the convolution model, for which the minimax rates of testing were obtained by Butucea (2007) in the one-dimensional case with a simple null hypothesis. It would be interesting to extend our results to such a model and to obtain minimax rates and, if possible, separation constants in the multidimensional convolution model.

Testing the relevance of a direction in a single-index model
Recall that a single-index model is a particular case of (1) corresponding to functions $f$ that can be written in the form $f(t) = g(\beta_0^\top t)$ for some univariate function $g : \mathbb{R} \to \mathbb{R}$ and some vector $\beta_0 \in \mathbb{R}^d$. Assume now that, for a candidate vector $\beta \in \mathbb{R}^d \setminus \{0\}$, we wish to test the goodness-of-fit of the single-index model (Dalalyan et al., 2008, Gaïffas and Lecué, 2007). This corresponds to testing the hypothesis that $f$ has the single-index form with index $\beta$. This condition implies a system of relations on the partial derivatives $\partial f/\partial t^i(t)$, for all $i \in \{1, \ldots, d\}$, which in turn can be written as a constraint of the form $Q[f] = 0$. Without loss of generality, we assume that $\|\beta\|_2 = 1$ and choose the coefficients accordingly, with $\sigma > d/4$. Then, when $\sigma$ is an integer, the required identity holds for any 1-periodic function which is smooth enough. To state the result providing the minimax rate and the exact constant in this problem, we introduce the constants $C_0$, $C_1$ and $C_2 = C_1 - C_0$.
Proposition 5. In the setting described above, the exact minimax rate $r^*_{n,\gamma}$ admits an explicit expression, and the sequence of tests $\phi_n$ of Theorem 1 is minimax sharp-optimal if $T = T_{n,\gamma}$ is chosen accordingly.

Remark 9. The testing procedures provided in Propositions 4 and 5 require the precise knowledge of the smoothness parameter $\sigma$, which may not be available in practice. Indeed, the parameter $\sigma$ enters explicitly in the definition of the tuning parameter $T_n$. The adaptation to the unknown smoothness $\sigma$ is an interesting problem for future research. We believe that rates of separation similar to those of Propositions 4 and 5 can be established for adaptive tests (up to logarithmic factors), using the Berry-Esseen type theorem for degenerate U-statistics of Butucea et al. (2009).

Nonpositive and nonnegative diagonal quadratic functionals
In this section we consider the more general setting obtained by abandoning the assumption that all the entries $q_l$ of the array $q$ have the same sign. That is, we still have $Q[f] = \sum_{l\in\mathcal{L}} q_l\theta_l[f]^2$, but the coefficients $q_l$ may now be of either sign. The sets $F_0$ and $F_1(r_n)$ are defined as before, cf. (5), and we use the same notation as in the positive case. Namely, for $T > 0$, we set $\mathcal{N}(T) = \{l \in S_F : c_l < T|q_l|\}$, $N(T) = |\mathcal{N}(T)|$ and $M(T) = \sum_{l\in\mathcal{N}(T)} q_l^2$. We point out that, in the case considered in this section, a phenomenon of phase transition occurs: there is a regular case, in which the rate is independent of the precise degree of smoothness, and an irregular case, where the rate is smoothness-dependent. To be more precise, let $|Q|$ denote the diagonal positive quadratic functional whose coefficients are $|q_l|$ for every $l \in \mathcal{L}$, and recall the minimax rate $r_n^*$ in testing the significance of $|Q|[f]$ obtained in the previous sections. In our context, this rate corresponds to the irregular case: if $\Sigma$ contains functions that are not smooth enough (compared to the difficulty of the problem, that is, if the $q_l$'s are "too large" compared to the $c_l$'s), the minimax rate corresponding to $Q$ is the same as the one for $|Q|$ obtained in the previous sections. By contrast, in the regular case, the minimax rate is smoothness-independent and equals $r_n^* = n^{-1/4}$.

Testing procedure and upper bound on the minimax rate
The testing procedure we use in the present context is of the same type as the one used for nonnegative quadratic functionals. More precisely, for a tuning parameter $T_n$ and a threshold $u$, we set $\phi_n(T) = \mathbf{1}_{\{|U_n(T)| > u\}}$, where the U-statistic $U_n(T)$ is defined with the kernel $G_T(t_1, t_2) = M(T)^{-1/2}\sum_{l\in\mathcal{N}(T)} q_l\varphi_l(t_1)\varphi_l(t_2)$.

Theorem 3. Let $\gamma \in (0,1)$ be a fixed significance level. If the threshold is chosen so that the type I error is bounded by $\gamma/2$, then the type II error is also bounded by $\gamma/2$; hence the cumulative error rate of the test $\phi_n(T)$ is bounded by $\gamma$ for every alternative in the prescribed separation class.

This theorem provides a nonasymptotic evaluation of the cumulative error rate of the linear U-test based on the array $w_l \propto q_l$ truncated at the level $T$. In the cases where the constants $B_1$ and $B_2$ can be reliably estimated and the function $M(T)$ admits a simple form, it is reasonable to choose the truncation level $T$ by minimizing the resulting bound. By choosing $T$ in this way, we enlarge the set of alternatives for which the cumulative error rate stays below the prescribed level $\gamma$. Therefore, the last theorem implies a nonasymptotic upper bound on the minimax rate of separation. This nonasymptotic bound clearly shows the presence of two asymptotic regimes. The first one corresponds to the case where $n$ is much larger than $M(T^*)$, whereas the second regime corresponds to $n = o(M(T^*))$. Here, $T^*$ is the minimizer of the bound on $\rho^2$ obtained in the theorem above. The next corollary exhibits the rates of separation in these two regimes.
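To fix ideas, here is a toy one-dimensional sketch (ours: the weights, the normalization as we read it from the decomposition of $U_n$, and the trigonometric basis are all illustrative assumptions) of the statistic $U_n(T)$ with a mixed-sign array $q$: under the null the statistic stays of order one, while under an alternative with $Q[f] > 0$ it diverges with $n$.

```python
import math, random

# Sketch (assumptions ours) of the linear U-test with kernel
# G_T(t1, t2) = M(T)^{-1/2} sum_{l in N(T)} q_l phi_l(t1) phi_l(t2).

def phi(l, t):
    """Trigonometric basis on [0,1]: l odd -> cos, l even -> sin."""
    m = (l + 1) // 2
    return math.sqrt(2) * (math.cos if l % 2 else math.sin)(2 * math.pi * m * t)

def u_statistic(xs, ts, q):
    """U_n(T) = binom(n,2)^{-1/2} * sum_{i<j} x_i x_j G_T(t_i, t_j)."""
    M = sum(ql ** 2 for ql in q.values())   # M(T) = sum of q_l^2 over N(T)
    n = len(xs)
    total = 0.0
    for l, ql in q.items():
        # sum_{i<j} a_i a_j = ((sum a_i)^2 - sum a_i^2)/2 with a_i = x_i phi_l(t_i)
        a = [x * phi(l, t) for x, t in zip(xs, ts)]
        s1, s2 = sum(a), sum(v * v for v in a)
        total += ql * (s1 * s1 - s2) / 2.0
    return total / (M ** 0.5 * math.comb(n, 2) ** 0.5)

random.seed(1)
n = 2000
q = {l: (1.0 if l <= 4 else -1.0) for l in range(1, 9)}   # mixed-sign array
ts = [random.random() for _ in range(n)]
xs_null = [random.gauss(0, 1) for _ in range(n)]          # f = 0 (null)
signal = lambda t: math.sqrt(2) * math.sin(2 * math.pi * t)  # theta_2 = 1, q_2 > 0
xs_alt = [signal(t) + random.gauss(0, 1) for t in ts]     # alternative
u_null = u_statistic(xs_null, ts, q)
u_alt = u_statistic(xs_alt, ts, q)
```

Under the null, `u_null` is of order one (its variance is close to 1 by orthonormality of the basis), whereas `u_alt` is large, so the test $\mathbf{1}_{\{|U_n(T)| > u\}}$ rejects.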
Corollary 1. Assume that the arrays $q$ and $c$ are such that $M(\alpha T) \asymp_{T\to\infty} M(T)$ for every $\alpha > 0$, and let $T_n^0$ be any sequence of positive numbers satisfying the stated growth conditions. If all the assumptions of Theorem 3 are satisfied, then, for some $C > 0$, the linear U-test $\phi_n(T)$ based on the threshold $T = T_n$ satisfies the claimed bound. Thus, the squared rate of separation satisfies $r_n^{*2} \asymp \sqrt{M(T_n^0)}/n$ in the irregular regime, and $r_n^* = n^{-1/4}$ otherwise.

Remark 10. Condition [D4] of Theorem 3 is more obscure than the other assumptions of the theorem. Clearly, it imposes additional smoothness constraints on the function $f$. Using the Cauchy-Schwarz inequality, one can easily check that either one of the assumptions [D4-1] and [D4-2] below is sufficient for [D4]:

Lower bound on the minimax rate
We will show in this subsection that the asymptotic rate of separation provided by Corollary 1 is unimprovable, in the sense that there is no testing procedure having a faster separation rate. To this end, for every $a \in \{-,+\}$ we introduce the set $\mathcal{L}^a$ of indices $l$ for which $q_l$ has sign $a$.

Theorem 4. Let us consider the problem of testing $F_0$ against $F_1$, where $F_0$ and $F_1$ are defined by (5). Assume that the sets $\mathcal{L}^+$ and $\mathcal{L}^-$ defined by (21) are both nonempty and that the $\xi_i$'s are Gaussian.
The following assertions are true.
2. Let $T_n^0$ be a sequence of reals such that the stated growth conditions are fulfilled; then there exists a constant for which the corresponding lower bound holds.

Corollary 2. Combining the two assertions of this theorem, we get that the minimax rate of separation $r_n^*$ is lower bounded accordingly. Thus, if the conditions of Theorems 3 and 4 are satisfied, the minimax rate of separation is given by the maximum of the bounds corresponding to the two regimes.

Testing equality of norms
As an application of the testing methodology developed in this section, we consider the problem of testing the equality of norms of two functions observed in noisy environments. More precisely, let us consider the following two-sample problem: for $i = 1, \ldots, n$, we observe $(x_{1,i}, t_{1,i})$ and $(x_{2,i}, t_{2,i})$ such that $x_{s,i} = g_s(t_{s,i}) + \xi_{s,i}$ for $s = 1, 2$, where the $t_{s,i}$'s are independent random vectors drawn from the uniform distribution on $[0,1]^d$. Furthermore, we assume that the $\xi_{s,i}$'s are i.i.d. such that $\mathbb{E}(\xi_{s,i}|\{t_{s,j}\}) = 0$, $\mathbb{E}(\xi_{s,i}^2|\{t_{s,j}\}) = 1$ and, for some $C_\xi < \infty$, $\mathbb{E}(\xi_{s,i}^4|\{t_{s,j}\}) \le C_\xi$ almost surely. Assuming that both $g_1$ and $g_2$ belong to a smoothness class $\Sigma$, we wish to test the hypothesis $\|g_1\|_{W_2^\alpha} = \|g_2\|_{W_2^\alpha}$, where for any function $g$ we denote by $\|g\|_{W_2^\alpha}$ the (anisotropic) Sobolev norm of order $\alpha \in \mathbb{R}^d_+$ (the precise definition is given below). It can be useful to perform such a test prior to using a shifted-curve model in the context of curve registration (Dalalyan and Collier, 2012, Collier, 2012). Indeed, if there exists $\tau \in [0,1]^d$ such that $g_1(t) = g_2(t-\tau)$ for every $t \in [0,1]^d$ and the function $g_1$ is one-periodic, then necessarily $\|g_1\|_{W_2^\alpha} = \|g_2\|_{W_2^\alpha}$ for any $\alpha$. Thus, the rejection of the null hypothesis implies the inadequacy of the shifted-curve model. In order to show how this type of test can be derived from the framework presented in the previous subsections, let us consider the case of a Sobolev ellipsoid $\Sigma$. Let $\{\psi_m\}_{m\in\mathcal{M}}$ be an orthonormal basis of the subspace $L_{2,c}([0,1]^d)$ of $L_2([0,1]^d)$ consisting of all the functions orthogonal to the constant function. We will assume that both $g_1$ and $g_2$ are centered (this implies that they are orthogonal to the constant function as well). The Fourier coefficients of a function $g$ w.r.t.
a basis $\{\psi_m\}$ will be denoted by $\theta_m^\psi[g]$. We assume that, for some array $c$ and some constant $L > 0$, it holds that $\sum_m c_m\theta_m^\psi[g_s]^2 \le L$ for $s = 1, 2$. Assume now that we wish to test the hypothesis $\sum_m q_m\big(\theta_m^\psi[g_1]^2 - \theta_m^\psi[g_2]^2\big) = 0$, where $q = \{q_m\}$ is a given array. In order to show that this problem can be solved within the framework of the previous subsections, we introduce the corresponding functional set $\Sigma_L$. Setting $\mathcal{L} = \mathcal{M}\times\{1,2\}$ and defining, for $l = (m,s) \in \mathcal{M}\times\{1,2\}$, the associated functions, we get an orthonormal basis of $\Sigma_L$. Clearly, for a function $f \in \Sigma_L$, the quadratic functional takes the diagonal form studied above. Therefore, for studying the rate of separation of a testing procedure we can assume that $f \in \Sigma_L^2$, whereas for establishing lower bounds on the minimax rate of separation we can use the relation $\Sigma_L^1 \subset \Sigma_L$. In both cases, this perfectly matches the framework of the previous subsections. We give a concrete example by setting $\mathcal{M} = \mathbb{Z}^d$ and choosing as $\{\psi_m\}$ the Fourier basis in dimension $d$. Similarly to the example in Subsection 3.3, we focus on anisotropic Sobolev smoothness classes defined via the corresponding coefficients. As previously, $\delta = \sum_{j=1}^d \alpha_j/\sigma_j$ and $\sigma$ stands for the harmonic mean of the $\sigma_j$'s: $\sigma = \big(\frac{1}{d}\sum_{j=1}^d \sigma_j^{-1}\big)^{-1}$. We still assume that $\delta < 1$ and $\sigma > d/4$. To test the equality of Sobolev norms, we introduce the coefficients $q_l$, $l = (m,s) \in \mathbb{Z}^d\times\{1,2\}$, of the quadratic functional $Q$, with $q_{(m,1)} = -q_{(m,2)}$. Theorems 3 and 4, as well as the computations done in the proof of Proposition 4, imply the minimax rate of separation in the problem described above. It is interesting to note that if $\delta \ge 1/2$, then we are in the irregular regime irrespective of the value of $\sigma$ and, therefore, the rate of separation is strictly slower than $n^{-1/4}$.
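The motivation from curve registration can be illustrated numerically (our own one-dimensional toy check; the functions, the shift and the truncation level are arbitrary): a periodic shift leaves every Sobolev norm unchanged, which is why rejecting $\|g_1\|_{W_2^\alpha} = \|g_2\|_{W_2^\alpha}$ rules out the shifted-curve model.

```python
import math

# 1-D illustration (ours): a periodic shift preserves the Sobolev norm,
# computed here through truncated Fourier coefficients.

def fourier_coef(g, m, trig, grid=2048):
    """Midpoint-rule estimate of the coefficient of sqrt(2)*cos/sin(2*pi*m*t)."""
    h = 1.0 / grid
    basis = math.cos if trig == "cos" else math.sin
    return sum(g((i + 0.5) * h) * math.sqrt(2) * basis(2 * math.pi * m * (i + 0.5) * h)
               for i in range(grid)) * h

def sobolev_norm_sq(g, alpha=1, max_m=6):
    """Truncated Sobolev norm: sum_m (2*pi*m)^{2*alpha} * (a_m^2 + b_m^2)."""
    s = 0.0
    for m in range(1, max_m + 1):
        a = fourier_coef(g, m, "cos")
        b = fourier_coef(g, m, "sin")
        s += (2 * math.pi * m) ** (2 * alpha) * (a * a + b * b)
    return s

g1 = lambda t: math.sin(2 * math.pi * t) + 0.5 * math.cos(4 * math.pi * t)
tau = 0.3
g2 = lambda t: g1((t - tau) % 1.0)   # g2 is g1 shifted periodically by tau
```

Both norms agree to numerical precision, while they would generically differ for two unrelated centered functions $g_1$ and $g_2$.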

Conclusion and outlook
We have presented a statistical analysis of the problem of testing the significance of the value $Q[f]$ for a quadratic functional $Q$ of a regression function $f$. While the overwhelming majority of previous research focused on the case of a function $f$ observed at every point in Gaussian white noise, we have considered here the more realistic setting in which the observations are noisy values of $f$ at a finite number of points drawn uniformly at random from $[0,1]^d$. Furthermore, we have explored not only the case of a positive semi-definite functional $Q$ but also the situation where $Q$ is neither positive nor negative semi-definite. In the first situation, we have established asymptotic results providing the minimax rates of separation along with the sharp constants. In the second case, the analysis we have carried out is nonasymptotic and leads to the asymptotically minimax rate of separation, which exhibits two different regimes: the regular and the irregular one. Another distinctive feature of our approach is the emphasis on the multidimensional setting $d > 1$, even if at this stage we have not tackled the problem of increasingly high dimensionality: $d = d_n \to \infty$ as the sample size $n$ tends to infinity. The results we have obtained are closely related to those on estimating quadratic functionals. While the presence of such a relation is not surprising in itself, its actual nature uncovers some interesting new phenomena. In fact, the test statistic used in our work is a properly normalized estimator of the quadratic functional $Q[f]$, constructed following the classical approach of weighted squared linear functional estimation (cf., for instance, Donoho and Nussbaum (1990)). Usually, the proper choice of the shrinkage weights and the resulting rates of convergence differ between the problem of hypothesis testing and the problem of estimation. This is why the well-known "elbow" effect (phase transition) in estimating quadratic functionals disappears
when the problem of hypothesis testing is considered for nonnegative functionals. Interestingly, the results of Section 4 show that this difference between the rates of convergence is erased when the quadratic functional $Q[f]$ is neither positive nor negative. In fact, the rates of separation we have obtained in this case coincide with the square root of the rates of estimation (Donoho and Nussbaum, 1990, Fan, 1991). Therefore, the "elbow" effect is present in this problem of hypothesis testing. More interestingly, the rates of separation we obtained in the case of positive semi-definite functionals $Q$ coincide with the rates of estimation of the functional, at least in the Gaussian white noise model (Lepski et al., 1999). An intriguing question worth exploring further is whether this analogy extends to the model of regression with random design and general positive semi-definite functionals $Q[f]$. Several relevant problems remained out of the scope of the present paper. The most important ones are the possibility of extending our results to the case of nondiagonal functionals $Q[f]$ and the attainability of the obtained rates of separation by adaptive tests. More specifically, in some applications, such as deconvolution, it may be more realistic to assume that the functional basis in which the smoothness of $f$ is expressed does not coincide with the basis of the singular vectors of (the bilinear operator underlying) $Q$. This means that $Q[f]$ will then be of a nondiagonal form.
Furthermore, it would be more reasonable to replace the assumption $\sum_{l\in\mathcal{L}} c_l\theta_l[f]^2 \le 1$, with some known array $c = \{c_l\}_{l\in\mathcal{L}}$, by the assumption $\sum_{l\in\mathcal{L}} c_l(\mu^*)\theta_l[f]^2 \le 1$, where $c(\mu) = \{c_l(\mu)\}_{l\in\mathcal{L}}$ is a collection of arrays such that the mapping $\mu \mapsto c(\mu)$ is known but the precise value $\mu^*$ for which the smoothness constraint is valid is unknown. In the light of the previous discussion, it seems natural to study these two extensions (nondiagonal $Q$ and adaptation to the smoothness class) by considering the problems of testing and of estimating functionals in a joint framework. In particular, any progress in establishing upper bounds for estimators of $Q[f]$ or $|Q[f]|$ will straightforwardly lead to upper bounds on the rates of separation. Quite surprisingly, these estimation problems have received little attention in the context of nonparametric regression. They constitute interesting avenues for future research.
Appendix A: Proofs of results stated in Section 2

A.1. Proof of Proposition 1
Throughout the proof, the terms $o(1)$, $O(1)$ and the equivalences are uniform over $\Sigma$. Let $\mathcal{L}(w_n)$ be the support of $w_n$, and let $\mathbb{E}_f^{D_2}$ denote the conditional expectation with respect to $D_2$. This allows us to rewrite the U-statistic $U_n$ in the form $U_n = U_{n,0} + U_{n,1} + U_{n,2}$, where $U_{n,0}$, $U_{n,1}$ and $U_{n,2}$ are U-statistics with the corresponding kernels. To prove Proposition 1 and the subsequent results, we need two auxiliary lemmas.
Lemma 3. Let $w_n = (w_{l,n})_{l\in\mathcal{L}}$ be a family of positive numbers containing only a finite number of nonzero entries and such that $\sum_{l\in\mathcal{L}} w_{l,n}^2 = 1$, and let $\mathcal{L}(w_n)$ be the support of $w_n$. Then the expectation of the U-statistic $U_n$ is given by (28), whereas the variances satisfy (29).

Proof. As $\Pi f_n \in \operatorname{span}\{\varphi_l\}_{l\in S_F^c}$, we have $\int \Pi f_n\,\varphi_l = 0$ for all $l \in S_F$; the expression for the expectation follows. Now, let us evaluate the variances. Since the $\xi_i$'s are uncorrelated zero-mean random variables with variance one, and the $\varphi_l$'s are orthonormal, the variance of $U_{n,0}$ can be computed directly. Using the definition of $G_n(t_1, t_2)$ together with the Pythagorean theorem, this completes the proof of (28). As for the variance of $U_{n,2}$, we decompose it into the terms $A_{n,1}$, $A_{n,2}$ and $A_{n,3}$. The first term $A_{n,1}$ is bounded in view of Bessel's inequality, the expression inside the last expectation being controlled by an elementary inequality. The term $A_{n,2}$ can be dealt with similarly, using the Cauchy-Schwarz inequality.

By virtue of Bessel's inequality, it holds that
The last expectation can be bounded in the same way as we did several lines above for the term $A_{n,1}$. The last term, $A_{n,3}$, is actually negative. Combining all these estimates, we get (29).
Lemma 4. Let $w_n = (w_{l,n})_{l\in\mathcal{L}}$ be a family of positive numbers containing only a finite number of nonzero entries and such that $\sum_{l\in\mathcal{L}} w_{l,n}^2 = 1$. Assume that the random variable $\xi_1$ has a finite fourth moment; then $U_{n,0}$ is asymptotically Gaussian $\mathcal{N}(0,1)$.
Proof. This result is an immediate consequence of (Hall, 1984, Theorem 1).
With these tools at hand, we are now in a position to establish the asymptotic normality of the U-statistic $U_n$, which leads to an evaluation of the type I error of the U-test. Let us recall that, for $f \in F_0$, it holds that $Q[f] = \sum_l q_l\theta_l[f]^2 = 0$ and, therefore, $\theta_l[f] = 0$ for all $l \in S_F = \{l : q_l \neq 0\}$. Hence, for every $f \in F_0$, $h_n[f, w_n] = 0$ and $\Pi_{S_F} f = 0$. So it follows from Lemma 3 that, under the assumptions of the proposition, the convergences $\operatorname{Var}_f[U_{n,1}] \to 0$ and $\operatorname{Var}_f[U_{n,2}] \to 0$ hold uniformly in $f \in F_0$. This implies that $U_{n,1}$ and $U_{n,2}$ tend to zero in $\mathbb{P}_f$-probability, uniformly in $f \in F_0$. On the other hand, according to Lemma 4, $U_{n,0} \to \mathcal{N}(0,1)$ in distribution. The claim of the proposition follows from Slutsky's lemma.
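The normal approximation invoked here is easy to visualize by simulation (a Monte Carlo illustration of ours; the basis function, the sample sizes and the seed are arbitrary): with $f = 0$, the degenerate part $U_{n,0}$ has mean close to 0 and variance close to 1.

```python
import math, random

# Monte Carlo sketch (ours): for f = 0, the degenerate U-statistic
#   U_{n,0} = binom(n,2)^{-1/2} * sum_{i<j} xi_i xi_j phi(t_i) phi(t_j),
# with the single basis function phi(t) = sqrt(2) sin(2*pi*t) (so the weight
# array has unit l2-norm), is approximately N(0, 1).

def u_n0(n, rng):
    a = [rng.gauss(0, 1) * math.sqrt(2) * math.sin(2 * math.pi * rng.random())
         for _ in range(n)]                  # a_i = xi_i * phi(t_i)
    s1, s2 = sum(a), sum(v * v for v in a)
    # sum_{i<j} a_i a_j = ((sum a_i)^2 - sum a_i^2)/2, then normalize
    return (s1 * s1 - s2) / 2.0 / math.comb(n, 2) ** 0.5

rng = random.Random(42)
draws = [u_n0(300, rng) for _ in range(2000)]
mean = sum(draws) / len(draws)
var = sum((u - mean) ** 2 for u in draws) / len(draws)
```

The empirical mean and variance of the simulated draws match the $\mathcal{N}(0,1)$ limit up to Monte Carlo error.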

A.2. Proof of Proposition 2
We first note that, for every $h > 0$, the supremum in question can be split into two terms, as in (31). The value of $h$ will be made precise later in the proof; assume for now merely that $h > 2(1+u)$. Then, using the conditions of the proposition and the inequalities of Lemma 3, we get that, for some constants $C, C'$ independent of $h$, the first supremum in (31) is suitably controlled. Let us switch to the second supremum in (31). Let $\delta_n > 0$ be a sequence tending to zero. One readily checks the corresponding bound, where $F_{U_{n,0}}(\cdot)$ is the c.d.f. of $U_{n,0}$. On the one hand, we know from Lemma 4 that $U_{n,0}$ converges in distribution to $\mathcal{N}(0,1)$; this entails that $F_{U_{n,0}}$ converges uniformly over $\mathbb{R}$ to $\Phi$. On the other hand, in view of Lemma 3 and Hölder's inequality, the remaining term is controlled as well. Choosing $h$ large enough and then letting $\delta_n$ tend to zero sufficiently slowly, we get the desired result.

A.3. Proof of Proposition 3
Using Kneser's minimax theorem for bilinear forms (Kneser, 1952), we can interchange the supremum and the infimum as in (34). Furthermore, the array $w^*$ attaining the supremum is given by $w^*_l = v_l/\|v\|_2$. Now, the minimization on the right-hand side of (34) involves the convex quadratic cost function $\|v\|_2^2$ and the linear constraints $v_l \ge 0$, $\langle v, c\rangle \le 1$ and $\langle v, q\rangle \ge \rho^2$. Therefore, according to the KKT conditions, if there exist $\mu, \lambda \ge 0$ and $\nu \in \mathbb{R}^{\mathcal{L}}_+$ satisfying, for some $v^* \in \mathbb{R}^{\mathcal{L}}_+$, the conditions $2v^* + \lambda c - \mu q - \nu = 0$, $\lambda(\langle v^*, c\rangle - 1) = 0$, $\mu(\langle v^*, q\rangle - \rho^2) = 0$ and $\nu_l v^*_l = 0$ for all $l$, then $v^*$ is a solution of the minimization problem (34). Under the conditions of the proposition, one easily checks that these KKT conditions are fulfilled with $\lambda = 2/\sum_l c_l(T_\rho q_l - c_l)_+$, $\mu = 2T_\rho/\sum_l c_l(T_\rho q_l - c_l)_+$ and $\nu_l = 2(c_l - T_\rho q_l)_+/\sum_l c_l(T_\rho q_l - c_l)_+$.
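The final verification is elementary arithmetic and can be replayed numerically; the arrays $c$, $q$ and the value $T_\rho$ below are arbitrary toy choices, while the candidate solution and multipliers are those displayed in the proof.

```python
# Numerical replay (toy data, ours) of the KKT verification: v*_l is
# proportional to (T_rho*q_l - c_l)_+ with the multipliers from the proof.

c = [1.0, 4.0, 9.0, 16.0, 25.0]
q = [1.0, 2.0, 3.0, 4.0, 5.0]
T_rho = 3.0

pos = lambda x: x if x > 0 else 0.0
D = sum(cl * pos(T_rho * ql - cl) for cl, ql in zip(c, q))  # normalizing sum

v   = [pos(T_rho * ql - cl) / D for cl, ql in zip(c, q)]        # candidate v*
lam = 2.0 / D                                                    # multiplier lambda
mu  = 2.0 * T_rho / D                                            # multiplier mu
nu  = [2.0 * pos(cl - T_rho * ql) / D for cl, ql in zip(c, q)]   # multipliers nu_l

# Stationarity: 2 v* + lambda c - mu q - nu = 0, componentwise
station = [2 * vl + lam * cl - mu * ql - nl
           for vl, cl, ql, nl in zip(v, c, q, nu)]
# Complementary slackness for the constraint <v, c> <= 1
slack = sum(vl * cl for vl, cl in zip(v, c)) - 1.0
```

Stationarity, the condition $\nu_l v^*_l = 0$ and the equality $\langle v^*, c\rangle = 1$ all hold exactly, for any choice of the arrays for which the normalizing sum is positive.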

A.4. Proof of Theorem 1
To ease notation, we set $N_{n,\gamma} = N(T_{n,\gamma})$. We first check that, under the assumptions of the theorem, all the conditions required in Propositions 1 and 2 are fulfilled. The assumptions of the theorem imply, respectively, the third and the second conditions of Proposition 1, and condition [C6] implies the fourth condition of Proposition 1. Thus, we have checked that, under the conditions of the theorem, the claim of Proposition 1 holds true. To check that the claim of Proposition 2 holds true as well, it suffices to verify the first assumption of that proposition (the second one being identical to [C7]). In fact, it is not difficult to check that the first assumption of Proposition 2 follows from [C2], [C4] and [C5] for the sequence $\zeta_n^2 = \min_{l\in N_{n,\gamma}} q_l^2\big/4\sum_{l\in N_{n,\gamma}} q_l^2$. Therefore, combining the results of Propositions 1 and 2, we get (35). In view of Proposition 3, the infimum over $f$ of $h_n[f, w_n^*]$ can be evaluated explicitly.
Inserting this expression into (35) and using (15), we obtain the claim of the theorem.

A.5. Proof of Theorem 2
The proof of the lower bound follows the steps of (Ingster and Sapatinas, 2009). However, we have considerably modified the way some of these steps are carried out, which allowed us to relax several assumptions and resulted in a shorter proof. Let us recall that $\theta$ is the array of Fourier coefficients of a function in $L_2(\Delta)$ w.r.t. the system $(\varphi_l)_{l\in\mathcal{L}}$. We introduce the sets $\Theta_1(\rho) = \{\theta \in \ell_2(\mathcal{L}) : \langle c, \theta^2\rangle \le 1,\ \langle q, \theta^2\rangle \ge \rho^2\}$ and $\Theta_0 = \{\theta \in \ell_2(\mathcal{L}) : \langle c, \theta^2\rangle \le 1,\ \langle q, \theta^2\rangle = 0\}$, where we use the notation $\langle a, \theta^2\rangle = \sum_l a_l\theta_l^2$. To get the lower bound, we define prior measures that are essentially concentrated on the sets $\Theta_0$ and $\Theta_1$. Let $\pi_n^1$ and $\pi_n^2$ be measures on the space $\ell_2(\mathcal{L})$ such that $\pi_n^1(\Theta_0) = 1 + o(1)$ and $\pi_n^2(\Theta_1(Cr^*_{n,\gamma})) = 1 + o(1)$. These priors lead to the corresponding mixture distributions $P_{\pi_n^1}$ and $P_{\pi_n^2}$. If $\gamma_n$ denotes the minimal total error probability for testing the simple null hypothesis $H_0 : P = P_{\pi_n^1}$ against the simple alternative $H_1 : P = P_{\pi_n^2}$, then we have the bound of Proposition 2.11 in Ingster and Suslina (2003). As the next result shows, to get the desired lower bound, it suffices to show that the Bayesian log-likelihood $\log(dP_{\pi_n^2}/dP_{\pi_n^1})$ is asymptotically equivalent to a Gaussian log-likelihood.

Lemma 5 (Section 4.3.1 in Ingster and Suslina (2003)). If there exist a deterministic sequence $u_n$ and a sequence of random variables $\eta_n$ such that, under $P_{\pi_n^1}$, $\eta_n$ converges in distribution to $\mathcal{N}(0,1)$ and the log-likelihood admits the corresponding Gaussian approximation, then the desired lower bound follows.
For our purposes, we choose $\pi_n^1$ to be the Dirac measure at 0 and denote the corresponding mixture probability $P_{\pi_n^1}$ by $P_0$. It is clear that with this choice $\pi_n^1(\Theta_0) = 1$. We now explain how $\pi_n^2$, which we will simply call $\pi_n$ from now on, is built. Let $a_n \in \mathbb{R}^{\mathcal{L}}_+$ be an array containing a finite number of nonzero elements, and let $\mathcal{L}(a_n)$ be the support of $a_n$, i.e., $a_l \neq 0$ if and only if $l \in \mathcal{L}(a_n)$. We assume that $\mathcal{L}(a_n) \subset S_F$ and define $\pi_n(d\theta)$ as the Gaussian product measure under which the entries $\theta_l$ are independent Gaussian with zero mean and variance $a_l$.

Proposition 6. Let $\delta \in (0,1)$ be such that $1 - \delta \ge C$. Assume that $a_n = (1-\delta)v_n$ and that, as $n \to \infty$, the following assumptions are fulfilled:

Proof. The proof of this proposition will be carried out with the help of several lemmas. The fact that $\pi_n(\Theta_1(Cr^*_{n,\gamma})) = 1 + o(1)$ is proved in the following lemma.

Lemma 6. Assume that $a_n = (1-\delta)v_n$ satisfies [L1] and [L2]. Then, for every $\delta \in (0,1)$, it holds that $\pi_n(\Theta_1(Cr^*_n)) = 1 + o(1)$.
Proof. Let us denote $H_1(\theta) = \sum_{l\in\mathcal{L}} q_l\theta_l^2$ and $H_2(\theta) = \sum_{l\in\mathcal{L}} c_l\theta_l^2$. In view of [L1], the expectations of $H_1$ and $H_2$ under $\pi_n$ are controlled. On the other hand, since the variance of a sum of independent random variables equals the sum of their variances, we get $\operatorname{Var}_{\pi_n}[H_1(\theta)] = 2\sum_{l\in\mathcal{L}} (q_l a_l)^2$.
By Tchebychev's inequality, we arrive at the desired bound, and the claim of the lemma follows from condition [L2].
Second, we show that, for every $p > 2$ and every $L > 0$, the probability $\pi_n\big(\theta : \|\sum_l \theta_l\varphi_l\|_p > L\big)$ tends to zero. Indeed, in view of Tchebychev's inequality and Fubini's theorem, this probability is bounded by a moment of the random function. Using the fact that, for every fixed $t$, the random variable $\sum_l \theta_l\varphi_l(t)$ is Gaussian with zero mean and variance $\sum_l a_l\varphi_l^2(t)$, this moment can be computed explicitly. The last expression tends to zero as $n \to \infty$, in view of condition [L3].
We focus now on the proof of (36). Set $m = |\mathcal{L}(a_n)|$ and let $\Phi_n$ be the $m \times n$ matrix with generic element $(\Phi_n)_{li} = \varphi_l(t_i)$. Let $A_n$ be the $m \times m$ diagonal matrix having the nonzero entries of $a_n$ on its main diagonal. It is clear that, under $P_{\pi_n}$, conditionally on $T_n$, $x = (x_1, \ldots, x_n)^\top$ is distributed according to a multivariate Gaussian distribution with zero mean and $n \times n$ covariance matrix $R_n = \Phi_n^\top A_n\Phi_n + I_n$. Therefore, the logarithm of its density w.r.t. $P_0$ is given by $\log\frac{dP_{\pi_n}}{dP_0}(x; t_1, \ldots, t_n) = -\frac{1}{2}\big(x^\top(R_n^{-1} - I_n)x + \log\det R_n\big)$. In what follows, we denote by $|||M||| = \sup_{\|x\|_2 = 1}\|Mx\|_2$ the spectral norm of a matrix $M$.
Using the determinant identity $\det(\Phi_n^\top A_n\Phi_n + I_n) = \det(I_m + A_n\Phi_n\Phi_n^\top)$ and the decomposition $\bar R_n + zB_n = I_m + zA_n\Phi_n\Phi_n^\top$, one can note that, for an appropriately chosen vector $\xi \sim \mathcal{N}_m(0, I_m)$, the quantity under study can be written as a function of $z$ whose derivative admits an explicit form. It is well known that $\|\xi\|_2^2$, being distributed according to the $\chi^2_m$ distribution, is $O_P(m)$ as $m \to \infty$. This completes the proof of the lemma.
According to (Vershynin, 2012, Cor. 5.52), under [C3], the deviation $|||\frac{1}{n}\Phi_n\Phi_n^\top - I_m|||$ admits a square-root bound with probability at least $1 - 1/n$. Furthermore, using the facts that $\bar R_n$ is a diagonal matrix with diagonal entries $\ge 1$ and that the variance of a sum of independent random variables equals the sum of their variances, one readily checks that the two conditions of the last lemma are fulfilled and, therefore, its claim holds true. Using the fact that $A_n$ is diagonal, we get the following.

Lemma 8. Let us denote by $\eta_n$ and $u_n$ the corresponding sequences.
If the conditions $mn^3\|a_n\|_\infty^3 \to 0$ and $\|a_n\|_3 = o(\|a_n\|_2)$ are fulfilled, then $\eta_n$ converges in distribution to $\mathcal{N}(0,1)$.

Proof. Since $n\|a_n\|_\infty \to 0$, we have $\frac{na_l}{na_l+1} = na_l - (na_l)^2 + O((na_l)^3)$ and $\log(na_l+1) = na_l - \frac{(na_l)^2}{2} + O((na_l)^3)$. This implies that $\sum_{l\in\mathcal{L}}\big(\frac{na_l}{na_l+1} - \log(na_l+1)\big) = -\frac{1}{2}\sum_{l\in\mathcal{L}}(na_l)^2 + O(mn^3\|a_n\|_\infty^3)$. On the other hand, using the central limit theorem for triangular arrays, we get the weak convergence of $\eta_n$ to $\mathcal{N}(0,1)$, provided that $u_n^{-3}\sum_l (na_l)^3/(na_l+1)^3$ tends to zero. Since under the conditions of the lemma this convergence trivially holds, we get the claim of the lemma.
To complete the proof of Theorem 2, we shall now show that if we choose $T_{n,\gamma}$ as in Theorem 1 and define $v_n$ by the corresponding expression, then all the conditions of Proposition 6 are fulfilled. We start by noting that [L1] is straightforward. To check the first relation in [L2], we use [C1] and $|N(T_{n,\gamma})| \to \infty$, along with the corresponding evaluations. For the second relation in [L2], in view of (15), for all $l \in N(T_{n,\gamma})$ the relevant term tends to zero due to [C9]. From the definition of $v_n$, equation (15) and condition [C1], one can deduce the remaining conditions. Thus, all the conditions of Proposition 6 are fulfilled and, therefore, its conclusion holds. Since this holds for every $\delta \in (0, 1-C)$, it also holds for $\delta = 0$, and the claim of Theorem 2 follows from (15).
On the other hand, using Fubini's theorem and Rosenthal's inequality, for some constant $C > 0$, we get the corresponding moment bound. By Hölder's inequality, we then conclude, where we used the fact that $\mathbb{E}[\xi^4] < \infty$ and that $\mathbb{E}[f(t_i)^4] \le 2^{2d}\big(\sum_l c_l^{-1}\big)^2 < \infty$ under the conditions of the lemma. Similar arguments yield the analogous bound for the remaining term. Combining the obtained evaluations, the required consistency follows from the assumptions of the lemma. Let us now consider the case $\Sigma \subset W_2^\sigma(R)$. Without loss of generality, we may assume that $\Sigma = W_2^\sigma(R)$ and $c_l = \sum_{i=1}^d (2\pi l_i)^{2\sigma_i}/R^2$. The computations remain the same as in the previous case, but the term $\|\Pi_2 f\|_4^4$ is bounded using the Sobolev inequality (Kolyada, 1993), choosing $\sigma'$ appropriately. This completes the proof, since the last term tends to zero as $T \to \infty$.

B.2. Proof of Lemma 2
Let us introduce $\Pi_J f = \sum_{k\in[1,2^J]^d} \alpha_{J,k}\varphi_{J,k}$. We first decompose the empirical coefficients as follows: then, using standard arguments, we obtain the corresponding bound. Furthermore, by well-known properties of wavelet bases (Cohen, 2003) and the Rosenthal inequality, the remaining term is controlled. Finally we obtain, uniformly over $\Sigma$, the stated bound, and the announced result follows.

Appendix C: Proof of Proposition 4
We are going to check that all the assumptions of Theorems 1 and 2 are satisfied. For [C7], we can use the Sobolev embedding theorem (Kolyada, 1993): if $\sigma > d/4$, then [C7] is satisfied. For the pilot estimator proposed in Subsection 3.2, [C6] holds as well. Since the Fourier basis is uniformly bounded, checking [C3] is straightforward. Recall that $r_n^*$ and $C_\gamma^*$ are defined in Proposition 4; we will show that the remaining conditions hold as well. To this end, we need an asymptotic analysis of the terms $I_0(T)$, $I_1(T)$ and $I_2(T) = I_1(T) - I_0(T)$. For every $i \in \{1, \ldots, d\}$, we set $\gamma_i$ accordingly; note that, as $\delta < 1$, we have $\gamma_i > 0$. As $m_i \to \infty$ for every $i$, we can replace the sums by integrals. After two successive changes of variables, we obtain an explicit form for this integral: $I_0(T) \sim \pi^{-d}\, T^{\frac{4\delta\sigma+d}{2(1-\delta)\sigma}}\, I$, where $I$ is the resulting constant. Now, the Liouville formula (see, for instance, Ingster and Stepanova (2011)), combined with a well-known identity, yields the value of $I$. Very similar computations give the asymptotic behavior of $I_1(T)$ as $T \to \infty$. Note now that (15) is equivalent to $n^2T^2I_0(T) \sim 8T^4\big(I_1(T) - I_0(T)\big)^2 z_{1-\gamma/2}^2$. Using the asymptotic equivalents for $I_0$ and $I_1$ derived above, one directly checks that the value of $T_{n,\gamma}$ proposed in Proposition 4 satisfies (15). Furthermore, since (16) is equivalent to [C8], conditions [C8] and [C9] are fulfilled; this is checked using the same method as the one used above to evaluate $I_0$. In order to check [C1] and [C9], we need an upper bound on $\max_{l\in N(T_{n,\gamma})} q_l$. In the following calculations, $C$ denotes a constant depending only on $d$, $\alpha$ and $\sigma$, which may vary from line to line. Let $l \in N(T)$; then $c_l \le Tq_l$, which implies, for every $i = 1, \ldots, d$, a bound on $l_i$, and by symmetry the same bound holds for all the coordinates. Next, using (42), (43) and the third inequality in (41), and iterating the previous process, we arrive at the inequality $\max_j l_j^{2\sigma_j} \le CT^{1/(1-\delta)}$. Therefore, $\max_{l\in N(T)} q_l \le C\prod_{j=1}^d l_j^{2\alpha_j} \le CT^{\delta/(1-\delta)}$.

The arguments are almost the same as in the proof of Theorem 1. We use the array $w_n$ with entries $w_l = q_l\mathbf{1}_{\{l\in N(T)\}}/M(T)^{1/2}$ and the kernel $G_n(t_1, t_2) = \sum_{l\in\mathcal{L}} w_l\varphi_l(t_1)\varphi_l(t_2)$ in order to define the linear U-test statistic $U_n = \binom{n}{2}^{-1/2}\sum_{i<j} x_ix_jG_n(t_i, t_j)$.
We write $U_n = U_{n,0} + U_{n,1} + U_{n,2}$, where $U_{n,0}$ and $U_{n,1}$ are defined analogously to Appendix A and $U_{n,2} = \binom{n}{2}^{-1/2}\sum_{i<j} f(t_i)f(t_j)G_n(t_i, t_j)$. The first and second moments of this U-statistic are described in the next result, in which we use the notation $T_w[f] = \sum_l w_l\theta_l[f]\varphi_l$.
Lemma 9. Let $w_n = (w_{l,n})_{l\in\mathcal{L}}$ be an array containing only a finite number of nonzero entries and such that $\sum_{l\in\mathcal{L}} w_{l,n}^2 = 1$, and let $\mathcal{L}(w_n)$ be the support of $w_n$. Then the expectation and the variance of the U-statistic $U_n$ admit the stated expressions.

Proof. This result can be proved along the lines of the proof of Lemma 3. The only difference is in the evaluation of the term $A_{n,2}$, for which the analogous bound holds. This yields the desired result.
Let us now study the type I and type II error probabilities of the test $\phi_n(T) = \mathbf{1}_{\{|U_n(T)| > u\}}$.
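As an illustration (not the authors' implementation), the linear U-test statistic and the threshold test $\phi_n$ can be sketched in a few lines; the cosine basis, the weights and the threshold value below are hypothetical choices made only to produce a runnable example:

```python
import math
import random

def u_statistic(x, t, weights, basis):
    # Linear U-test statistic: binom(n,2)^(-1/2) * sum_{i<j} x_i x_j G(t_i, t_j),
    # with kernel G(s, u) = sum_l w_l * phi_l(s) * phi_l(u).
    n = len(x)
    def G(s, u):
        return sum(w * phi(s) * phi(u) for w, phi in zip(weights, basis))
    total = sum(x[i] * x[j] * G(t[i], t[j])
                for i in range(n) for j in range(i + 1, n))
    return total / math.sqrt(n * (n - 1) / 2)

# Hypothetical setup: cosine basis on [0, 1] (d = 1), weights with unit sum of squares.
basis = [lambda s, k=k: math.sqrt(2) * math.cos(math.pi * k * s) for k in range(1, 5)]
weights = [0.5] * 4  # sum of squared weights equals 1

random.seed(0)
t = [random.random() for _ in range(50)]        # uniform design points
x = [random.gauss(0.0, 1.0) for _ in range(50)]  # pure-noise observations (f = 0)
u = u_statistic(x, t, weights, basis)
phi = 1 if abs(u) > 2.0 else 0  # test phi = 1{|U_n| > u_threshold}, threshold is illustrative
```

The threshold $u$ would in practice be calibrated from the variance bounds of the lemma above; here it is fixed arbitrarily.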
Evaluation of the type II error. Using similar arguments, we get This can also be written as: Clearly, the right-hand side of this inequality is smaller than $\gamma/2$ if This completes the proof of Theorem 3.

E.2. Proof of Corollary 1
It is enough to remark that (since $M(\cdot)$ is increasing) and $1/\sqrt{n} \le T_n^{-1}$. In view of these inequalities, the claim of the corollary immediately follows from Theorem 3.
To conclude, it suffices to use inequality (2.74) from Tsybakov (2009), which implies that $\gamma_n(\mathcal{F}_0, \mathcal{F}_1(r_n)) \ge 0.25\, e^{-z^2 (2\theta_{0,+})^{-2}} = \gamma$ for $z = 2\theta_{0,+} [\ln(4\gamma)^{-1}]^{1/2}$. It remains to prove the second assertion of the theorem. To ease notation, we write $T_n$ instead of $T_n^0$ and set Let us assume that $M_+(T_n) \ge M_-(T_n)$. We use the fact that testing n , with $f \in \mathcal{F}_-$. The rest of the proof follows the same steps as those of the proof of Theorem 2. As indicated in Remark 5, we use as $\pi_n$ the simplified prior for which the $\theta_l$'s are independent Gaussian random variables with zero mean and variance $a_l = \frac{q_l}{2 T_n M_+(T_n)} \mathbf{1}_{\{l \in \mathcal{L}_+ \cup \mathcal{N}(T_n)\}}$. It is an easy exercise to show that conditions [L1]-[L5] of Proposition 6 are fulfilled with $\delta = 1/2$. This completes the proof of the theorem.
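The choice of $z$ in the first part of the proof above can be checked by direct substitution: with $z = 2\theta_{0,+} [\ln(4\gamma)^{-1}]^{1/2}$,

```latex
0.25\, e^{-z^2 (2\theta_{0,+})^{-2}}
  = 0.25\, \exp\!\Big(-\ln\big((4\gamma)^{-1}\big)\Big)
  = 0.25 \cdot 4\gamma
  = \gamma ,
```

which is exactly the stated lower bound.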
Assumptions [C1], [C2], [C4] and [C5] are satisfied in most cases we are interested in. Two illustrative examples, concerning Sobolev ellipsoids with quadratic functionals related to partial derivatives, for which these hypotheses are satisfied are presented in Subsections 3.3 and 3.4. Assumption [C3] is essentially a constraint on the basis $\{\varphi_l\}$; we show in Subsection 3.1 that it is satisfied by many bases commonly used in the statistical literature. Assumptions [C6] and [C7] are related to additional technicalities brought by the regression model, which force us to impose more regularity than in the Gaussian sequence model.
The second assumption of Proposition 2 is rather weak and usual in the context of regression with random design. It is only needed to obtain uniform control of the error rate, and the actual value of the norm $\|\Pi_{S_F} f\|_p$ does not enter in any manner into the definition of the testing procedure.