A limited in bandwidth uniformity for the functional limit law of the increments of the empirical process

Consider the following local empirical process indexed by $K\in \mathcal{G}$, for fixed $h>0$ and $z\in \mathbb{R}^d$: $$G_n(K,h,z):=\sum_{i=1}^n \Bigl[K \Bigl(\frac{Z_i-z}{h^{1/d}}\Bigr) - \mathbb{E} \Bigl(K \Bigl(\frac{Z_i-z}{h^{1/d}}\Bigr)\Bigr)\Bigr],$$ where the $Z_i$ are i.i.d. on $\mathbb{R}^d$. We provide an extension of a result of Mason (2004). Namely, under mild conditions on $\mathcal{G}$ and on the law of $Z_1$, we establish a uniform functional limit law for the collections of processes $\bigl\{G_n(\cdot,h,z),\ z\in H,\ h\in [h_n,\mathfrak{h}_n]\bigr\}$, where $H\subset \mathbb{R}^d$ is a compact set with nonempty interior and where $h_n$ and $\mathfrak{h}_n$ satisfy the Cs\"{o}rg\H{o}-R\'{e}v\'{e}sz-Stute conditions.

1. Introduction and statement of main results

Introduction
Let $(Z_i)_{i\ge 1}$ be an independent, identically distributed sample taking values in $\mathbb{R}^d$.
Since the pioneering works of Stute (1982), several researchers have investigated the limit behaviour of the functional increments of the empirical process, which are defined as follows, for fixed $h>0$ and $z\in\mathbb{R}^d$:

Here we write $[a,b]:=[a_1,b_1]\times\cdots\times[a_d,b_d]$ for $a,b\in\mathbb{R}^d$. Deheuvels and Mason (1992) have provided a uniform functional limit law (UFLL) for the following collections of functional increments: $$\Theta_n:=\Bigl\{\frac{1}{(2h_n\log(1/h_n))^{1/2}}\,\Delta\alpha_n(\cdot,h_n,z),\ z\in[0,1-h_n]\Bigr\},$$ when $d=1$, $Z_1$ is uniform on $[0,1]$ and $h_n$ satisfies the Csörgő-Révész-Stute (CRS) conditions (see (HV1)-(HV3) below). Implicit in their result is the UFLL for $\Theta_n$ when $(Z_i)_{i\ge 1}$ take values in $\mathbb{R}$ and have a density $f$ that is continuous and strictly positive on an open set $O$, and when $z$ appearing in (1) ranges in a bounded interval $H\subset O$. However, the extension of this result to the multivariate case ($d>1$) remained an open problem for almost a decade.

Recently, Mason (2004) (see also Einmahl and Mason (2000)) solved this problem by combining the techniques of Deheuvels and Mason (1992) with recent tools in general empirical process theory. Namely, he obtained asymptotic results in a more general framework, considering the following type of stochastic processes indexed by $K$: $$G_n(K,h,z):=\sum_{i=1}^n \Bigl[K \Bigl(\frac{Z_i-z}{h^{1/d}}\Bigr) - \mathbb{E} \Bigl(K \Bigl(\frac{Z_i-z}{h^{1/d}}\Bigr)\Bigr)\Bigr].$$ Here, $K$ ranges through a class of functions $\mathcal{G}$ satisfying some conditions that will be stated later (see (HK1)-(HK5) in the sequel). More precisely, Mason established a UFLL for the following sets of processes, as $n\to\infty$,

where $H$ is a compact set of $\mathbb{R}^d$ with nonempty interior. To cite his result, we have to recall the basic assumptions made in Mason (2004). We say that a sequence of constants $(h_n)_{n\ge 1}$ satisfies the Csörgő-Révész-Stute (CRS) conditions whenever
(HV1) $h_n\downarrow 0$, $0<h_n<1$, $nh_n\uparrow\infty$,
(HV2) $\lim_{n\to\infty} nh_n/\log n=\infty$,

Let $\mathcal{G}$ be a class of real Borel functions on $\mathbb{R}^d$. Set $I^d:=[0,1]^d$ and

Let $\|\cdot\|_{\mathbb{R}^d}$ be the Euclidean norm on $\mathbb{R}^d$. We make the following assumptions on $\mathcal{G}$.
(HK1) lim

Here, $N(\epsilon,\mathcal{F})$ denotes the uniform covering number of $\mathcal{F}$ for $\epsilon$ and the class of norms $\{L_2(P)\}$, with $P$ varying in the set of all probability measures on $\mathbb{R}^d$, and taking $F\equiv 1$ as an envelope function for the class $\mathcal{F}$ (for more details, see Van der Vaart and Wellner (1996), p. 83-84, with $r=2$). To overcome any measurability problem, we make the following assumption.

Let $L_2^*(\mathcal{G})$ be the Hilbert subspace of $L_2(\mathbb{R}^d,\lambda)$ spanned by $\mathcal{G}$, where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}^d$. The (rate) function $J$ that rules the large deviation properties of the isonormal Gaussian process generated by $\bigl(L_2^*(\mathcal{G}),\lambda\bigr)$ can be defined, for a function $\Psi:\mathcal{G}\to\mathbb{R}$, by

Let $B(\mathcal{G})$ be the set of all real bounded functions on $\mathcal{G}$. Denote by $|\cdot|_d$ the usual max norm on $\mathbb{R}^d$, namely $|z|_d:=\max_{1\le i\le d}|z_i|$. We make a last assumption upon the law of the $Z_i$ (recall that $H$ is a compact set with nonempty interior).
(H f) There exists $\alpha$ such that $Z_1$ has a density $f$ that is continuous and strictly positive on the set

Under all the above assumptions, Mason established the following result.
Theorem (Mason, 2004).

The author proved this result by combining the ideas of Einmahl and Mason (2000) with some recent results in large deviation theory (see Arcones (2003, 200)), Gaussian approximation results for finite dimensional laws (Zaitsev (1987a,b)) and sharp bounds for empirical processes (see Talagrand (1994), or Bousquet (2002) and Klein (2002) for sharper bounds). In the present paper, we show that the arguments of Mason can be efficiently used to enrich his theorem with an additional uniformity in $h\in[h_n,\mathfrak{h}_n]$, under some mild conditions.

The remainder of this article is organised as follows. The main result is given by Theorem 1 in §1.2. The proof follows in §2 and is divided into two parts. In §2.1 we state a large deviation result and a concentration inequality. These two results follow fairly directly from the works of Arcones (2003) and Einmahl and Mason (2000). They will play a crucial role in our proof of Theorem 1, which is written in §2.2.

Statement of the result
In the present paper we provide an extension of the theorem of Mason (2004) just mentioned, showing that his UFLL still holds uniformly in $h_n\le h\le\mathfrak{h}_n$, provided that both $(h_n)_{n\ge 1}$ and $(\mathfrak{h}_n)_{n\ge 1}$ satisfy (HV1)-(HV3).
Theorem 1. Let $H$ be a compact subset of $\mathbb{R}^d$ with nonempty interior. Let $(Z_i)_{i\ge 1}$ be an i.i.d. sequence of random variables satisfying (H f). Let $(h_n)_{n\ge 1}$ and $(\mathfrak{h}_n)_{n\ge 1}$ be two sequences of positive numbers satisfying (HV1)-(HV3) as well as $\mathfrak{h}_n>2h_n$. Then we have almost surely:

Sketch of our proof: Roughly speaking, our proof is divided into the following steps:
• The proof of Mason can be very crudely summed up as follows: given properly chosen sequences of events $(E_n(\epsilon,h_n))_{n\ge 1}$, he proves that, for fixed $\epsilon>0$ we have, for all large $n$

for some $\delta>0$. Then he makes use of the fact that $(h_n)_{n\ge 1}$ satisfies conditions (HV1)-(HV3) to achieve his goals, by making use of usual blocking techniques along with the Borel-Cantelli lemma.
• Given $\rho>1$, we discretise $[h_n,\mathfrak{h}_n]$ into the following grid of size $R_n\approx\log(\mathfrak{h}_n/h_n)/\log(\rho)$: $$\{h_{n,0},h_{n,1},h_{n,2},\ldots,h_{n,R_n}\}=\{h_n,\rho h_n,\rho^2h_n,\ldots,\mathfrak{h}_n\}.$$
• For fixed $\epsilon$, we show that $\mathbb{P}(E_n(\epsilon,h_{n,l}))\le h_{n,l}^{\delta}$ uniformly in $l$, for some $\delta>0$. To do this, we make use of arguments that are very similar to those of Mason for proving (5), but taking additional care to get inequalities uniformly in $l$ (see our key argument in §2.2, Step 1). Indeed, we had to write a concentration inequality (see Proposition 2.2), which is a finite-sample (non-asymptotic) version of the inequality used by Mason.
• Then we write, $\Delta_n$ denoting a term of oscillation of our processes between two consecutive $h_{n,l}$,

Thus, we can make use of the fact that $(h_n)_{n\ge 1}$ satisfies (HV1)-(HV3) and continue our proof as in the proof of Mason. The oscillation term $\Delta_n$ is controlled by the concentration inequality of Proposition 2.2.

We now focus on some corollaries of Theorem 1. Denote by $g_{n,h,z}$ a nonstandard form of functional increments of the empirical process, namely:

with the notation $[s,1]$:

where $\|g\|_{[0,1]^d}:=\sup\{|g(s)|,\ s\in[0,1]^d\}$, and where $S$ stands for the following Strassen type set of functions mapping $[0,1]^d$ to $\mathbb{R}$:

Denote by $f_n(K,h,z)$ the Parzen-Rosenblatt density estimator, namely

The main interest of deriving (8) and (9) from Theorem 1 is that it enables us to straightforwardly derive asymptotic confidence bands for $f_n(K,h,z)$ that are uniform in $h\in[h_n,\mathfrak{h}_n]$, which is the subject of the following corollary.
Proof. By a change of scale, we can assume that $K$ has its support included in $[0,1]^d$. Define the following map $R$ from the space of all bounded real functions on $[0,1]^d$ to $\mathbb{R}$:

Obviously $R$ is continuous with respect to $\|\cdot\|_{[0,1]^d}$, since $K$ has bounded variation. Noticing that, by an integration by parts, we have

we readily infer the claimed result, by optimising $R$ on the limit set $S$. We omit details.
In order to state our next corollary, we need to introduce some more notation. Given a positive random variable $h_n^*$, we define

Corollary 1.2. Let $h_n^*$ be a sequence of positive random variables satisfying, with probability one:

Then we have almost surely

Proof. The proof is a direct consequence of Corollary 1.1, by manipulating the following countable collection of events

We omit details.
Remark 1. Corollaries 1.1 and 1.2 should be compared to Theorem 1 of Einmahl and Mason (2005). Our results rely on assumptions that are stronger than those stated in Einmahl and Mason (2005). However, we derive uniform rates of convergence that are exact and explicit.

Remark 2. Several data-driven bandwidths have a well-known asymptotic limit behavior. Typical examples are the bandwidth selectors of Park and Marron (see Park and Marron (1985)) and of Sheather and Jones (see Sheather and Jones (1991)). But the almost sure limit behavior of $h_n^*$ is seldom known. However, limits in probability are often provided in the literature. For example, it has been proved (see Park and Marron (1985)) that the bandwidth selector $h_{n,1}^*$ of Park and Marron satisfies, under mild conditions,

Here $O_P(1)$ means that the sequence is bounded in probability, and $h_{n,0}$ is the (deterministic) minimizer of

which in turn is equivalent to $Cn^{-1/5}$ for some constant $C$. Although asymptotic results of this type for $h_n^*$ do not meet the requirements of Corollary 1.2, it is possible to adapt the latter corollary to derive weaker versions of (14) and (15), which hold in probability instead of almost surely.
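To make the objects of Corollaries 1.1 and 1.2 concrete, the following Python sketch evaluates a kernel density estimate at a point over a range of bandwidths. It assumes the normalisation $f_n(K,h,z)=(nh)^{-1}\sum_{i=1}^n K\bigl((Z_i-z)/h^{1/d}\bigr)$, which is consistent with the scaling $h^{1/d}$ used in $G_n$ but is an assumption here. The function names, the indicator kernel and the toy sample are hypothetical and serve illustration only.

```python
def indicator_kernel(u):
    """Indicator of [0, 1)^d: a hypothetical kernel supported in the unit cube."""
    return 1.0 if all(0.0 <= c < 1.0 for c in u) else 0.0


def density_estimate(kernel, sample, h, z):
    """Assumed form: f_n(K, h, z) = (n h)^(-1) * sum_i K((Z_i - z) / h^(1/d)).

    `sample` is a list of d-tuples, `z` a d-tuple, and `h` plays the role
    of a volume, so the side length of the local window is h^(1/d).
    """
    d = len(z)
    scale = h ** (1.0 / d)
    total = 0.0
    for point in sample:
        u = tuple((p - c) / scale for p, c in zip(point, z))
        total += kernel(u)
    return total / (len(sample) * h)


def density_over_bandwidths(kernel, sample, bandwidths, z):
    """Evaluate the estimator at z for every bandwidth in `bandwidths`."""
    return [density_estimate(kernel, sample, h, z) for h in bandwidths]
```

A data-driven bandwidth $h_n^*$ lying in $[h_n,\mathfrak{h}_n]$ would simply be one of the values passed in `bandwidths`; the uniformity in $h$ is what allows the same bound to cover all of them at once.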

The main tools
Our proof of Theorem 1 relies on two crucial tools. First we shall make use of a criterion in large deviation theory for functional spaces. This criterion, which is mostly due to Arcones (2003), is stated in §2.1.1. We shall also make use of a concentration inequality (see Proposition 2.2) which is proved by borrowing the arguments of Einmahl and Mason (2000).

Uniform large deviation principles
In the proof of Theorem 1, we shall require large deviation results that are uniform in the rows for triangular arrays of processes. This required uniformity leads us to first state a result that can be straightforwardly derived from Theorem 3.1 of Arcones (2003). In the sequel, $(\epsilon_{n,i})_{n\ge 1,\,i\le p_n}$ will always denote a triangular array of positive numbers satisfying $\lim_{n\to\infty}\max_{i\le p_n}\epsilon_{n,i}=0$. Given a set $T$, $B(T)$ will denote the space of bounded real functions on $T$. We shall endow this space with the usual sup-norm $\|\cdot\|_T$. Let $(E,\vartheta)$ be a topological space. A real function $J:E\to\mathbb{R}^+$ is said to be a rate function (implicitly for $(E,\vartheta)$) when the sets of the kind $\{x\in E: J(x)\le a\}$, $a\ge 0$, are compact sets of $(E,\vartheta)$. Finally, let $(X_{n,i})_{n\ge 1,\,i\le p_n}$ be a triangular array of random elements (not necessarily Borel) taking values in $E$. We say that $(X_{n,i})_{n\ge 1,\,i\le p_n}$ satisfies the uniform large deviation principle (ULDP) for the triangular array $(\epsilon_{n,i})_{n\ge 1,\,i\le p_n}$ and the rate function $J$ whenever
• For any closed set $F\subset E$ we have $$\limsup_{n\to\infty}\ \max_{i\le p_n}\ \epsilon_{n,i}\log\mathbb{P}^*\bigl(X_{n,i}(\cdot)\in F\bigr)\le -J(F).$$
When referring to outer and inner probabilities $\mathbb{P}^*$ and $\mathbb{P}_*$, one should take care of the underlying probability space, which is taken to be the canonical probability space in our context.
By assumption (HK5) we shall only manipulate true probabilities in our proof of Theorem 1. The following result, which can be seen as a direct corollary of Theorem 2.1 of Arcones (2003), will be used when establishing Proposition 2.3 in the sequel.
Proposition 2.1. Let $(X_{n,i})_{n\ge 1,\,i\le p_n}$ be a triangular array of random elements of $B(T)$, and let $(\epsilon_{n,i})_{n\ge 1,\,i\le p_n}$ be a triangular array of positive numbers. Suppose that the following conditions are satisfied.
1. There exists a semi-distance $\rho$ on $T$ that makes $T$ totally bounded.
2. For any $p\ge 1$ and $(t_1,\ldots,t_p)\in T^p$, the triangular array of random vectors $\bigl(X_{n,i}(t_1),\ldots,X_{n,i}(t_p)\bigr)_{n\ge 1,\,i\le p_n}$ satisfies the ULDP for $(\epsilon_{n,i})_{n\ge 1,\,i\le p_n}$ and a rate function $J_{t_1,\ldots,t_p}$ on $\mathbb{R}^p$.
3. For any $\alpha>0$ and $M>0$, there exists $\eta>0$ such that

Then $(X_{n,i}(\cdot))_{n\ge 1,\,i\le p_n}$ satisfies the ULDP in $\bigl(B(T),\|\cdot\|_T\bigr)$ for $(\epsilon_{n,i})_{n\ge 1,\,i\le p_n}$ and the following rate function

Proof. The proof is a direct copy of the proof of Theorem 2.1 in Arcones (2003), replacing $\mathbb{P}^*(U_n\in F)\le\ldots$ by $\max_{i\le p_n}\mathbb{P}^*(X_{n,i}\in F)\le\ldots$ and $\mathbb{P}_*(U_n\in O)\ge\ldots$ by $\min_{i\le p_n}\mathbb{P}_*(X_{n,i}\in O)\ge\ldots$, and so on. For this reason we do not write out the proof.

A concentration inequality
For any real Borel function $g$ we set

The following concentration inequality for local empirical processes will play a crucial role in the sequel. It can be seen as a finite-sample (non-asymptotic) version of the arguments of Einmahl and Mason (2005).

Proof of part (i) of Theorem 1
We will repeatedly make use of the following obvious argument:

First select $\epsilon>0$ arbitrarily. We claim that, almost surely,

To prove this, we introduce some parameters that will be properly adjusted in the sequel. Recall that $\alpha>0$ appears in (4). Let $\gamma>0$, $\rho>1$ and $0<\delta<\alpha/4$ be real numbers. We shall invoke some usual blocking arguments along the subsequence $n_k:=[(1+\gamma)^k]$, $k\ge 1$. For fixed $k\ge 1$, consider the following discretisation of $[h_{n_k},\mathfrak{h}_{n_{k-1}}]$: $$h_{n_k,R_k}:=\mathfrak{h}_{n_{k-1}},\qquad h_{n_k,l}:=\rho^l h_{n_k},\quad l=0,\ldots,R_k-1,$$ where $R_k:=[\log(\mathfrak{h}_{n_{k-1}}/h_{n_k})/\log(\rho)]+1$, and $[u]$ denotes the only integer $q$ fulfilling $q\le u<q+1$. Since $(h_n)_{n\ge 1}$ and $(\mathfrak{h}_n)_{n\ge 1}$ satisfy (HV1)-(HV3), the triangular array $(h_{n_k,l})_{0\le l\le R_k}$ satisfies the two following properties that will play a crucial role in our arguments (see our key argument 1 below).

Recall that $I^d:=[0,1]^d$. For each $0\le l\le R_k$, we proceed as in Mason (2004), covering $H$ by pairwise disjoint hypercubes written as $$\Gamma_{k,l,j}:=z_{k,l,j}+\bigl[0,(\delta h_{n_k,l})^{1/d}\bigr)^d,\quad 1\le j\le J_l,$$ with $z_{k,l,j}\in H$. Since $0<\delta<\alpha/4$, we have, for all $k\ge 1$ and $0\le l\le R_k$,

Notice that by construction:

where $C:=C(\delta)$ depends on $\delta>0$ and on the volume of $H$ only. Set $N_k:=\{n_{k-1}+1,\ldots,n_k\}$ whenever $n_{k-1}<n_k$, and $N_k=\emptyset$ otherwise. For $A\subset B(\mathcal{G})$ and $\epsilon>0$ we write the $\epsilon$-neighbourhood of $A$ as

For all large $k$, we split the following probabilities in two.

Our aim is to prove that $P_{1,k}$ and $P_{2,k}$ are both summable in $k$, which would prove part (i) of Theorem 1 by an application of the Borel-Cantelli lemma to $P_k$.
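The blocking scheme can be made concrete with a short Python sketch. It builds the endpoints $n_k=[(1+\gamma)^k]$ and the blocks $N_k$ as defined above. The function names and the convention $n_0:=1$ (needed to define $N_1$) are assumptions made for illustration.

```python
import math


def block_endpoints(gamma, k_max):
    """Return [n_0, ..., n_{k_max}] with n_k = floor((1 + gamma)^k).

    The index k = 0 (n_0 = 1) is a convention assumed here so that N_1 is defined.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [math.floor((1.0 + gamma) ** k) for k in range(k_max + 1)]


def blocks(gamma, k_max):
    """Return {k: N_k} with N_k = {n_{k-1} + 1, ..., n_k}, empty if n_{k-1} = n_k."""
    n = block_endpoints(gamma, k_max)
    return {k: range(n[k - 1] + 1, n[k] + 1) for k in range(1, k_max + 1)}
```

For instance, with $\gamma=0.5$ the endpoints start as $1,1,2,3,5,7,11$, so that $N_1=\emptyset$ and $N_4=\{4,5\}$. For large $k$ the relative block length $(n_k-n_{k-1})/n_k$ is close to $\gamma/(1+\gamma)$, so a small $\gamma$ yields short blocks relative to $n_k$.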

Step 1: blocking and poissonisation
By a blocking argument that is similar to Ottaviani's inequality (see for example Deheuvels and Mason (1992), Lemma 3.4), we have, for fixed $k\ge 1$, $l\le R_k$, $j\le J_l$: $$\mathbb{P}\Bigl(\bigcup_{n\in N_k}\Bigl\{\frac{G_n(\cdot,h_{n_k,l},z_{k,l,j})}{\bigl(2f(z_{k,l,j})n_kh_{n_k,l}\log(1/h_{n_k,l})\bigr)^{1/2}}\notin\mathcal{K}^{2\epsilon}\Bigr\}\Bigr)\le\frac{1}{m_k}\,\mathbb{P}\Bigl(\frac{G_{n_k}(\cdot,h_{n_k,l},z_{k,l,j})}{\bigl(2f(z_{k,l,j})n_kh_{n_k,l}\log(1/h_{n_k,l})\bigr)^{1/2}}\notin\mathcal{K}^{\epsilon}\Bigr),$$ where $m_k:=\min$

To control $m_k$, we shall invoke an argument that will be repeatedly used in this article. Roughly speaking, we make use of the arguments of Mason (2004), replacing $h_{n_k}$ by $h_{n_k,l}$. This leads us to consider the following classes of functions, for $k\ge 1$, $l\le R_k$, $j\le J_l$:

To obtain an upper bound that holds uniformly in $h_{n_k,l}$, we shall show that all these classes satisfy the assumptions of Proposition 2.2 simultaneously in $h_{n_k,l}$ and $z_{k,l,j}$. To prove this, assertions (29) and (30) will play a crucial role.
Key argument 1: By Lemma 1 in Mason (2004) (Bochner's lemma), and by (29) we have $\sup$

where $v_k\to 0$ as $k\to\infty$. Notice that each $\mathcal{F}_{k,l,j}$ is uniformly bounded by 1, by virtue of (HK2) and (26). We now use the notations of Proposition 2.2 with $\tau:=2$, $\rho_0=2$, $\delta_0=1$, $p=4$, $n:=n_k-n_{k-1}$ and the constants $C$, $v$ appearing in assumption (HK4). Since $n_k-n_{k-1}\sim\frac{\gamma}{1+\gamma}n_k$ and by both (30) and (29), we have, for all large $k$, $h_{n_k,l}\le C_2$ and $\min_{0\le l\le R_k}$

This implies that, for all large $k$, all the classes $\mathcal{F}_{k,l,j}$ satisfy the assumptions of Proposition 2.2. Moreover, for $\gamma>0$ small enough and for $k$ large enough we have

and hence $$\mathbb{P}\Bigl(\max_{m\le n_k-n_{k-1}}\bigl\|G_m(\cdot,h_{n_k,l},z_{k,l,j})\bigr\|_{\mathcal{G}}\ge\epsilon\sqrt{2f(z_{k,l,j})n_k\log(1/h_{n_k,l})}\Bigr)$$

Therefore, for $\gamma>0$ small enough and for $k$ large enough we have $m_k\ge 1/2$. Now let

be the Poissonized version of $G_n$. Here, $\eta_n$ is a Poisson random variable with expectation $n$, independent of $(Z_i)_{i\ge 1}$. Recalling that, by construction, there exists $C=C(\delta)<\infty$ such that $J_l\le C(\delta)/h_{n_k,l}$, $l=1,\ldots,R_k$, it follows that, ultimately as $k\to\infty$,

The last inequality is a consequence of usual Poissonization inequalities (see, e.g., Giné et al. (2003), Lemma 2.1).

Step 2: A uniform large deviation result
In order to control the $P_{k,l,j}$ uniformly in $l$ and $j$, we shall establish a uniform large deviation principle that is stated in the next proposition. Recall that $J$ has been defined in (3). Some routine analysis shows that $J$ is a rate function on $B(\mathcal{G})$.
Proposition 2.3. Let $(n_k)_{k\ge 1}$ be a strictly increasing integer-valued sequence and $(z_{k,l,j})_{k\ge 1,\,l\le R_k,\,j\le J_l}$ a triangular array of points belonging to $H$. Under (HV1)-(HV3) and (HK1)-(HK5), the triangular array $$\Bigl(\frac{G_{n_k}(\cdot,h_{n_k,l},z_{k,l,j})}{\bigl(2f(z_{k,l,j})n_kh_{n_k,l}\log(1/h_{n_k,l})\bigr)^{1/2}}\Bigr)_{k\ge 1,\,l\le R_k,\,j\le J_l}$$ satisfies the ULDP for the rate function $J$ and the triangular array

Proof. To prove Proposition 2.3, we shall make use of Proposition 2.1, and we hence have to check conditions 1, 2 and 3 of that proposition. Compared to Proposition 1 of Mason (2004), the present proposition adds a uniformity in $h_{n_k,l}$. Checking condition 2 of Proposition 2.1 readily follows the lines of the proof of Mason, according to the following remarks. We can apply his Fact 2 with an additional uniformity in $h_{n_k,l}$, since $\mathfrak{h}_n\to 0$ and hence Bochner's lemma still holds uniformly in $h_{n_k,l}$. We can also apply his Fact 3, replacing $h_n$ by $h_{n_k,l}$. Hence, his assertion (4.16) still holds when $h_n$ is replaced by $h_{n_k,l}$, with an additional uniformity in $h_{n_k,l}$. Now define the following distance on $\mathcal{G}$.
It remains to show that for any $M>0$, $\alpha>0$, there exists $\delta>0$ fulfilling $\limsup$

so that conditions 1 and 3 of Proposition 2.1 are satisfied. Choose $M>0$ and $\alpha>0$ arbitrarily. For each $k\ge 1$, $0\le l\le R_k$, $1\le j\le J_l$ and $\delta>0$, consider the following class of functions:

Let $D_1(2v)$ be as in Proposition 2.2 (recall that $v>0$ appears in assumption (HK4)). We have, for any $k\ge 1$, $0\le l\le R_k$, $1\le j\le J_l$, $P_{k,l,j}:=\mathbb{P}\bigl(\sup$

By Chernoff's inequality we have

By (HK4) we have, by simple arguments,

An application of Lemma 1 in Mason (2004) in combination with (29) leads to the following inequality, for all large $k$:

Reasoning as in key argument 1, we conclude that, for all large $k$, each class $\mathcal{F}_{k,l,j,\delta}$, $0\le l\le R_k$, $1\le j\le J_l$, fulfils conditions (20), (18) and (19) of Proposition 2.2 with $\rho_0:=\alpha$, $\tau:=\sqrt{2\delta}$, $\delta_0:=2$, $p:=4$, $n:=2n_k$ and $C'$, $v'$ appearing in (43). Applying Proposition 2.2, we get, for all large $k$ and for each $0\le l\le R_k$, $1\le j\le J_l$,

Notice that (45) is true for all $\delta>0$ satisfying $2\delta\le\alpha$. By (42) in conjunction with (45), we have for $\delta>0$ small enough, ultimately as $k\to\infty$,

This shows that (40) holds, which completes the proof of Proposition 2.3 by an application of Proposition 2.1.

Step 4: an upper bound for P 2,k
Now for an arbitrary $\epsilon>0$ we shall adjust $\delta>0$ and $\rho>1$ such that, for $\gamma>0$ small enough, the sequence $P_{2,k}$ has a finite sum in $k$. We start with the following decomposition.

In order to control $P_{2,2,k}$, we make the following decomposition for all large $k$:

where $$P_{2,2,k,l,j}:=\mathbb{P}\Bigl(\max_{n\in N_k}\ \sup_{h_{n_k,l}\le h\le\rho h_{n_k,l},\ z\in\Gamma_{k,l,j}}\Bigl\|B_{k,n,h,z}\,\frac{G_n(\cdot,h,z)}{\sqrt{f(z)n_kh\log(1/h_{n_k,l})}}\Bigr\|_{\mathcal{G}}\ge\epsilon\Bigr).$$
Hence, for any choice of $\gamma>0$ small enough, we have (recall Proposition 2.2 and assumption (HV4)) $$\liminf_{k\to\infty}\ \inf_{n\in N_k,\ z\in H,\ h_{n_k}\le h\le\mathfrak{h}_{n_{k-1}}}$$

Now consider the following classes of functions:

By (29) and Lemma 1 in Mason (2004) we have, for all large $k$:

Recall (26). According to (HK4) we have

Proceeding as in key argument 1, we infer that all the classes $\mathcal{F}_{k,l,j}$, $0\le l\le R_k$, $1\le j\le J_l$, satisfy conditions (20), (18) and (19) in Proposition 2.2, with $\tau:=\sqrt{A_2}\wedge 1$, $\rho_0:=2$, $\delta_0=2$, $p=4$, $n:=n_k$, $h:=h_{n_k,l}$, $C'$ and $v$. Making use once again of Proposition 2.2, and assuming that $\gamma$ is small enough to fulfill (48), we get that, for all large $k$ and $0\le l\le R_k$, $1\le j\le J_l$,

Hence, proceeding as in key argument 2 we get, ultimately as $k\to\infty$,

whence $(P_{2,2,k})$ is summable by (HV3).
By making use of similar arguments, it can be proved that, for a suitable choice of $\rho>1$ and $\delta>0$ small enough, we have $$\sum_{k\ge 1}P_{2,1,k}<\infty.$$
This result is proved by considering the classes $$\mathcal{F}'_{k,j,l}:=\Bigl\{f(z_{k,l,j})^{-1/2}\Bigl[K\Bigl(\frac{\cdot-z_{k,l,j}}{h_{n_k,l}^{1/d}}\Bigr)-K\Bigl(\frac{\cdot-z}{h^{1/d}}\Bigr)\Bigr],\ z\in\Gamma_{k,l,j},\ h_{n_k,l}\le h\le\rho h_{n_k,l},\ K\in\mathcal{G}\Bigr\},$$ and showing by (HV1) that, given $\epsilon>0$, one can choose $\rho>1$ and $\delta>0$ small enough to fulfill $\sup\{\mathrm{Var}(g(Z)),\ g\in\mathcal{F}'_{k,j,l}\}\le\epsilon h_{n_k,l}$ uniformly in $j$ and $l$. We omit details for the sake of brevity.

Proof of part (ii) of Theorem 1
Since $\mathcal{K}$ is a compact subset of $\bigl(B(\mathcal{G}),\|\cdot\|_{\mathcal{G}}\bigr)$, it is sufficient to show that for fixed $\Psi\in\mathcal{K}$ and $\epsilon>0$ we have almost surely $\lim_{n\to\infty}\sup_{h\in[h_n,\mathfrak{h}_n]}\inf_{z\in H}G_n(\cdot,h,z)$

Choose an open hypercube $H'\subset H$ such that $\mathbb{P}(Z_1\in H')\le 1/2$. Such a choice is possible because $H$ has a nonempty interior by assumption. Let $\rho>1$ be a parameter that will be fixed later. Consider the net $$h_{n,l}:=\rho^lh_n,\quad l=0,\ldots,R_n-1,\qquad h_{n,R_n}:=\mathfrak{h}_n,\qquad R_n:=[\log(\mathfrak{h}_n/h_n)/\log(\rho)]+1.$$
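The net above can be illustrated by a minimal Python sketch, assuming that $[u]$ denotes the integer part of $u$ as in the proof of part (i). The function name `bandwidth_net` and the numerical values used to exercise it are hypothetical.

```python
import math


def bandwidth_net(h_low, h_high, rho):
    """Geometric net h_low, rho*h_low, ..., rho^(R-1)*h_low, h_high.

    R = floor(log(h_high / h_low) / log(rho)) + 1, so the returned list has
    R + 1 points and consecutive points differ by a factor of at most rho.
    """
    if not (0.0 < h_low < h_high):
        raise ValueError("need 0 < h_low < h_high")
    if rho <= 1.0:
        raise ValueError("need rho > 1")
    big_r = math.floor(math.log(h_high / h_low) / math.log(rho)) + 1
    net = [h_low * rho ** l for l in range(big_r)]
    net.append(h_high)  # the last point is the upper bandwidth itself
    return net
```

With `h_low = 0.01`, `h_high = 0.1` and `rho = 2.0`, the net has five points, and every $h$ in the interval lies within a factor $\rho$ of some net point. Taking $\rho$ close to 1 makes the net finer at the cost of a larger $R_n$, which grows only logarithmically in $\mathfrak{h}_n/h_n$.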
For fixed $l\le R_n$ we divide $H'$ into disjoint hypercubes $\Gamma_{n,l,j}=z_{n,l,j}+\bigl[0,h_{n,l}^{1/d}\bigr)^d$.
Note that we can construct $J_l:=C/h_{n,l}$ disjoint hypercubes, where $C$ depends only on the volume of $H'$.
But (HK3) entails that, for fixed $l\le R_n$, the events $E_{n,l,j}$, $j\le J_l$, are mutually independent (by classical properties of Poisson random measures), whence

From Proposition 2.3 and by lower semicontinuity of $J$ we deduce that, for some $\alpha>0$ and for all large $n$, we have $$\min_{0\le l\le R_n,\ 1\le j\le J_l}\mathbb{P}\bigl(E_{n,l,j}^{C}\bigr)\ge h_{n,l}^{1-\alpha}.$$
Hence, ultimately as $n\to\infty$,

Assumptions (HV2) and (HV3) readily imply that $P_n$ has a finite sum in $n$, which proves (53) by the Borel-Cantelli lemma.