Uniform limit laws of the logarithm for estimators of the additive regression function in the presence of right censored data

It has recently been shown that nonparametric estimators of the additive regression function can be obtained in the presence of right censoring by coupling the marginal integration method with initial kernel-type Inverse Probability of Censoring Weighted estimators of the multivariate regression function [10]. In this paper, we obtain the exact rate of strong uniform consistency for such estimators. Our uniform limit laws in particular lead to the construction of asymptotic simultaneous 100% confidence bands for the true regression function.


Introduction
Consider a triple (Y, C, X) of random variables defined in IR+ × IR+ × IR^d, d ≥ 2, where Y is the variable of interest (typically a lifetime variable), C a censoring variable and X = (X_1, . . ., X_d) a vector of concomitant variables. In most practical applications, such as epidemiology or reliability, the relationship between Y and X is of particular interest. Denoting by ψ a given measurable function, we will focus here on the conditional expectation of ψ(Y) given X = x, namely

m_ψ(x) = IE[ψ(Y) | X = x], for all x = (x_1, . . ., x_d) ∈ IR^d. (1.1)

The introduction of the function ψ will allow us to treat simultaneously the standard regression function and the conditional distribution function (see Remark 2.4 below).
In the right censorship model, the pair (Y, C) is not directly observed and the corresponding information is given by Z = min{Y, C} and δ = 1I_{Y≤C}, 1I_E standing for the indicator function of the set E. Therefore, we will assume that a sample D_n = {(Z_i, δ_i, X_i), i = 1, . . ., n} of independent and identically distributed replicas of the triple (Z, δ, X) is at our disposal. In this setting, transformations of the observed data D_n are usually needed to estimate functionals of the conditional law of Y (see, e.g., [4, 17, 28-30] and the recent work of [35]). Estimators based on these transformations are usually referred to as synthetic data estimators. In this paper, following the ideas initiated by [28], we use a nonparametric version of particular synthetic data estimators, commonly referred to as Inverse Probability of Censoring Weighted [I.P.C.W.] estimators (see [3, 5] and [27] for some results related to nonparametric I.P.C.W. estimates of the censored regression function). It is however noteworthy that the methodology we propose here for I.P.C.W.-type estimators applies, with minor modifications, to other synthetic data estimators (see Paragraph 3.1 below).
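To fix ideas, the I.P.C.W. weighting scheme can be sketched as follows, in the simplified situation where the censoring survival function G is known (in the paper, G is replaced by its Kaplan-Meier estimate). This is a minimal illustrative sketch; the function and variable names are ours.

```python
import numpy as np

def ipcw_transform(z, delta, G):
    """Synthetic data Y*_i = delta_i * Z_i / G(Z_i): censored observations get
    weight zero, uncensored ones are inflated by the factor 1/G(Z_i)."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    g = G(z)
    return np.where(delta == 1, z / g, 0.0)

# Monte Carlo check that IE[Y*] = IE[Y] under independent censoring:
rng = np.random.default_rng(0)
n = 200_000
y = rng.exponential(1.0, n)            # lifetimes of interest, IE[Y] = 1
c = rng.exponential(2.0, n)            # independent censoring times
z, delta = np.minimum(y, c), (y <= c).astype(int)
G = lambda t: np.exp(-t / 2.0)         # true survival function of C
y_star = ipcw_transform(z, delta, G)
# y_star.mean() is close to IE[Y] = 1
```

The unbiasedness displayed here is exactly the property that makes the transformed sample usable in place of the unobserved ψ(Y_i) in standard kernel smoothers.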
A well-known issue in nonparametric estimation is the so-called curse of dimensionality: the rate of convergence of nonparametric estimators generally decreases as the dimensionality d of the covariate increases. To get around this problem, one solution is to work, if possible, under the additive model assumption, which allows one to write the regression function as

m_ψ(x) = μ + m_{ψ,1}(x_1) + · · · + m_{ψ,d}(x_d). (1.2)

In (1.2), the real-valued functions m_{ψ,ℓ}, ℓ = 1, . . ., d, are defined up to an additive constant, and the assumption IE m_{ψ,ℓ}(X_ℓ) = 0, ℓ = 1, . . ., d, is usually made to ensure identifiability. This assumption implies μ = IE ψ(Y). In the uncensored case, several methods have been proposed to estimate the additive regression function. We shall evoke, among others, the methods based on B-splines [36], on the backfitting algorithm [21, 23, 32] and on marginal integration [31, 34, 40].
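As a toy illustration of the additive structure (1.2) and the identifiability constraint IE m_{ψ,ℓ}(X_ℓ) = 0, one may pick centered component functions explicitly; the specific components below are our own illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x1 = rng.uniform(-1.0, 1.0, n)
x2 = rng.uniform(-1.0, 1.0, n)

mu = 2.0
m1 = lambda t: t ** 3                 # odd function: IE m1(X1) = 0 on [-1, 1]
m2 = lambda t: np.sin(np.pi * t)      # odd function: IE m2(X2) = 0 on [-1, 1]

m = mu + m1(x1) + m2(x2)              # additive regression surface, as in (1.2)
# the empirical mean of m recovers mu = IE psi(Y) (here psi = identity, no noise)
```

The centering of each component is what makes the constant μ, and hence the decomposition, uniquely defined.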
In [17], Fan and Gijbels established the asymptotic normality of estimators obtained via the backfitting algorithm combined with various synthetic data estimators. In [3], Brunel and Comte considered additive models as special cases in their study of adaptive projection I.P.C.W. estimators. Here, following the ideas introduced in [10], we make use of the marginal integration method, coupled with initial kernel-type I.P.C.W. estimators, to provide an estimator of the additive censored regression function. This combination leads to estimators for which the theory is easier to derive, which was desirable here given the technicalities of the proof, even in this simplified setting (note however that, as already mentioned, extensions to other synthetic data estimators can be obtained; see Paragraph 3.1). In a previous work [10], the mean-square convergence rate was established for the integrated estimator defined in (2.7) below. In the present paper, we obtain the exact corresponding rate of strong uniform consistency (see Theorem 3.2 below). Our limit law extends Theorem 2 in [9] to the censored case. Moreover, following the ideas developed in [13], asymptotic simultaneous 100% confidence bands are derived for the true regression function. These bands may be complementary to the more classical (1 − α) × 100% pointwise confidence intervals derived from CLT-type results (see Section 4).

Hypotheses-Notations
Before presenting our estimator and stating our results, we shall introduce some notation as well as our working assumptions. First consider the hypotheses to be made on the random triple (Y, C, X). Introduce, for all t ∈ IR, the survival functions

F(t) = IP(Y > t), G(t) = IP(C > t) and H(t) = IP(Z > t).

Remark 2.1. Assumption (C.3) will allow us to control bias terms. Assumptions (C.1) and (C.2) are essentially needed when using most synthetic data estimators. (C.2) allows us to use convergence results for the Kaplan-Meier [25] estimator of G. In addition, (C.1) especially allows us to derive the result (2.1) below, which is a fundamental requirement for synthetic data. This assumption was also used by Stute [38] in another context. It is however noteworthy that Beran [2] (see also [8] and [12]) worked under the weaker assumption of conditional independence between Y and C given X to derive properties of a local version of the Kaplan-Meier estimator. On the other hand, to use Beran's local Kaplan-Meier estimator the censoring has to be locally fair, i.e., such that IP(C > t|X) > 0 whenever IP(Y > t|X) > 0. Here (see assumption (A)(ii) below), we essentially suppose that G(t) > 0 whenever F(t) > 0, which is in turn a weaker assumption. For a nice discussion of the differences between the assumptions to be made when using either Beran's estimator or I.P.C.W.-type estimators, we refer to [5].
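As an aside, the Kaplan-Meier estimator of G mentioned above is obtained by swapping the roles of Y and C, so that censoring indicators δ_i = 0 mark the "events". The product-limit sketch below is our own minimal implementation (assuming, for simplicity, no ties among the Z_i).

```python
import numpy as np

def km_censoring_survival(z, delta):
    """Product-limit (Kaplan-Meier) estimate of G(t) = IP(C > t), treating
    censored observations (delta_i = 0) as events. Assumes no ties in z."""
    order = np.argsort(z)
    z_sorted = z[order]
    is_event = (delta[order] == 0)
    n = len(z)
    at_risk = n - np.arange(n)                 # subjects at risk just before Z_(i)
    factors = np.where(is_event, 1.0 - 1.0 / at_risk, 1.0)
    surv = np.cumprod(factors)                 # G*_n evaluated at the Z_(i)

    def G_star(t):
        idx = np.searchsorted(z_sorted, t, side="right") - 1
        return np.where(idx < 0, 1.0, surv[np.clip(idx, 0, n - 1)])

    return G_star

# sanity check against a known G on simulated data
rng = np.random.default_rng(4)
n = 50_000
y = rng.exponential(1.0, n)
c = rng.exponential(2.0, n)
z, delta = np.minimum(y, c), (y <= c).astype(int)
G_star = km_censoring_survival(z, delta)
grid = np.linspace(0.0, 1.5, 50)
sup_err = np.max(np.abs(G_star(grid) - np.exp(-grid / 2.0)))
# sup_err is small: G*_n is uniformly consistent on [0, w] for w < T_H
```

The restriction to compacts [0, ω] with ω < T_H, used repeatedly in the proofs, is visible here: near the end of the observation range the risk sets empty out and the estimator degrades.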
Further let C_1, . . ., C_d be d compact intervals of IR with nonempty interior, and set C = C_1 × · · · × C_d. For every subset E of IR^q, q ≥ 1, and any α > 0, introduce the α-neighborhood of E,

E^α = {x ∈ IR^q : inf_{y∈E} |x − y|_{IR^q} ≤ α},

with |·|_{IR^q} standing for the usual Euclidean norm on IR^q. The functions f and f_ℓ, ℓ = 1, . . ., d, will be supposed continuous, and we will assume the existence of a constant α > 0 such that the following assumptions hold.
Remark 2.2. Assumption (C.4) is classical when dealing with kernel-type estimators of the regression function (see, e.g., [13, 15]). The fact that s′ > sd in (C.5), when combined with (C.3) above and (K.1-2) and (H.4) below, allows one to easily derive the results pertaining to the case where the density function f is unknown from those obtained in the simpler case where this function is known. Some refinements in our proofs might allow for relaxing (C.5) (see, e.g., [22]).
Recalling (1.1), we will let ψ vary in a pointwise measurable VC subgraph class F of measurable real-valued functions defined on IR (for the definitions of pointwise measurable classes of functions and VC subgraph classes of functions, we refer to p. 110 and Chapter 2 in [41]). We will also assume that F has a measurable envelope function Υ(y) ≥ sup_{ψ∈F} |ψ(y)|, y ∈ IR, such that

(C.6) Υ is uniformly bounded on IR.
Remark 2.3. In the uncensored setting, (C.6) can be replaced by a finiteness condition on the second moment of Υ(Y) (see [13] or [15]). In the censored setting, however, such refinements are useless due to assumption (A) below.
Remark 2.4. Choices of particular interest for the class F are F_reg = {I}, where I denotes the identity function on IR, and F_cdf = {1I_{(−∞,t]}, t ∈ IR}. Considering the class F_reg allows one to treat the case of the classical regression function. On the other hand, considering the class F_cdf allows one to derive the uniform consistency (especially over t ∈ IR) of estimates of the conditional distribution function. We refer to [15] for examples in the uncensored case.
We will further employ sequences of positive constants {h_n}_{n≥1} and {h_{ℓ,n}}_{n≥1}, 1 ≤ ℓ ≤ d, satisfying conditions (H.1)-(H.5) below.

Remark 2.5. Assumptions (H.1-2-5) are classical in empirical process theory, and are often referred to as the Csörgő-Révész-Stute [CRS] conditions [7, 37]. They especially allow one to control variance-type terms. On the other hand, assumption (H.3) allows one to control bias terms (see Lemma 5.8 below). As already mentioned, assumption (H.4) allows one to easily derive the results pertaining to the case where the density function f is unknown from those obtained in the simpler case where this function is known.
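Although the display listing (H.1)-(H.5) is not reproduced above, the CRS conditions referred to in Remark 2.5 are classically stated as follows (our rendering of the standard conditions, not a transcription of the paper's display); they are satisfied, for instance, by any bandwidth of the form h_n = c n^{−τ} with c > 0 and 0 < τ < 1:

```latex
h_n \searrow 0, \qquad n h_n \nearrow \infty, \qquad
\frac{n h_n}{\log n} \to \infty, \qquad
\frac{\log(1/h_n)}{\log\log n} \to \infty .
```

Indeed, for h_n = c n^{−τ} one has n h_n = c n^{1−τ} → ∞, n h_n / log n → ∞, and log(1/h_n) = τ log n − log c, so the last ratio diverges.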
As mentioned in [19], functionals of the (conditional) law can generally not be estimated on the complete support when the variable of interest is right-censored. Therefore, to state our results, we will work under assumption (A), which will be said to hold if either (A)(i) or (A)(ii) below holds. For any right-continuous survival function L defined on IR, set T_L = sup{t ∈ IR : L(t) > 0}.
as n → ∞, for every ℓ = 1, . . ., d. It is noteworthy that assumption (A)(ii) is needed in our proofs when considering the estimation of the "classical" regression function, which corresponds to the choice ψ(y) = y. On the other hand, rates of convergence for estimators of functionals such as the conditional distribution function IP(Y ≤ t|X) can be obtained under weaker conditions, when restricting ourselves to t ∈ [0, ω] with ω < T_H.
These preliminaries being given, we can recall the procedure we proposed in [10] to estimate the censored regression function under the additive model assumption. Let K be a bounded and compactly supported kernel on IR^d. By kernel, we mean as usual a measurable function integrating to one on its support. We define the kernel density estimator f_n of f by

f_n(x) = (n h_n^d)^{-1} Σ_{i=1}^{n} K((x − X_i)/h_n).

Now, as was observed notably by Koul et al. [28], we have under (C.1),

IE[δ ψ(Z)/G(Z) | X = x] = m_ψ(x). (2.1)

Then, denoting by G*_n the Kaplan-Meier [25] estimator of G, kernel-type estimators of the multivariate regression function m_ψ(x) defined in (1.1) can easily be constructed [27]. Here, because marginal integration will further be applied, the internal estimator idea of Jones [24] has to be used. This leads us to consider the following multivariate I.P.C.W. kernel-type estimator of the regression function,

m*_ψ,n(x) = n^{-1} Σ_{i=1}^{n} [δ_i ψ(Z_i) / (G*_n(Z_i) f_n(X_i))] Π_{ℓ=1}^{d} h_{ℓ,n}^{-1} K_ℓ((x_ℓ − X_{i,ℓ})/h_{ℓ,n}). (2.2)

Here the kernel functions K_ℓ, ℓ = 1, . . ., d, defined on IR, are supposed to be continuous, of bounded variation and compactly supported. Recall that a kernel function Γ defined on IR^d is said to be of order γ, for γ ≥ 1, whenever conditions (a) and (b) below hold jointly. In order to apply the marginal integration method (see [31, 34]), introduce q_1, . . ., q_d, d given density functions defined on IR. Further set, for all x = (x_1, . . ., x_d) ∈ IR^d, q(x) = Π_{ℓ=1}^{d} q_ℓ(x_ℓ) and, for every ℓ = 1, . . ., d, q_{−ℓ}(x_{−ℓ}) = Π_{j≠ℓ} q_j(x_j), with x_{−ℓ} = (x_1, . . ., x_{ℓ−1}, x_{ℓ+1}, . . ., x_d). Now, for every ℓ = 1, . . ., d, we can define

η_ψ,ℓ(x_ℓ) = ∫_{IR^{d−1}} m_ψ(x) q_{−ℓ}(x_{−ℓ}) dx_{−ℓ},

in such a way that, recalling (1.2), the two equalities (2.4) and (2.5) hold. In view of (2.4) and (2.5), for every ℓ = 1, . . ., d, η_ψ,ℓ and m_ψ,ℓ are equal up to an additive constant, so that the functions η_ψ,ℓ are actually additive components, which coincide with m_ψ,ℓ for the choice q_ℓ = f_ℓ (which is only achievable if f_ℓ is known). From (2.2) and (2.3), a natural estimator of the ℓ-th component η_ψ,ℓ is given, for ℓ = 1, . . ., d, by

η*_ψ,ℓ(x_ℓ) = ∫_{IR^{d−1}} m*_ψ,n(x) q_{−ℓ}(x_{−ℓ}) dx_{−ℓ}. (2.6)

From (2.5) and (2.6), the estimator m*_ψ,add of the additive regression function defined in (2.7) can be deduced. In the sequel, we will assume that the known integration density function q_ℓ has a compact support included in C_ℓ, ℓ = 1, . . ., d. Moreover, we will impose the following assumption on the functions q_{−ℓ}, ℓ = 1, . . ., d.
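The marginal integration step can also be sketched numerically: starting from any pilot estimate of the multivariate regression function, the first additive component (up to a constant) is recovered by averaging out the remaining coordinate against the integration density q_{-1}. All names below are ours, and the pilot used here is a closed-form additive surface standing in for the I.P.C.W. kernel estimator (2.2).

```python
import numpy as np

def marginal_integration_1(m_hat, x1_grid, q2_sampler, n_mc=50_000, seed=2):
    """eta_1(x1) = integral of m_hat(x1, x2) q2(x2) dx2, by Monte Carlo.

    m_hat      : pilot estimate of the bivariate regression function (d = 2)
    x1_grid    : points at which the first additive component is evaluated
    q2_sampler : draws from the integration density q2 (here q_{-1} = q2)
    """
    rng = np.random.default_rng(seed)
    x2 = q2_sampler(rng, n_mc)
    # average over x2 for each fixed x1 on the grid
    return np.array([m_hat(x1, x2).mean() for x1 in x1_grid])

# pilot: a known additive surface m(x) = mu + x1^3 + x2^3, standing in for (2.2)
mu = 1.0
m_hat = lambda x1, x2: mu + x1 ** 3 + x2 ** 3
q2_sampler = lambda rng, k: rng.uniform(-1.0, 1.0, k)  # q2 uniform on C_2 = [-1, 1]

x1_grid = np.array([-0.5, 0.0, 0.5])
eta1 = marginal_integration_1(m_hat, x1_grid, q2_sampler)
# eta1 is close to mu + x1^3, since x2^3 integrates to 0 against q2
```

This illustrates why the η_ψ,ℓ coincide with the additive components only up to a constant: the recovered curve is shifted by the q-integral of the remaining components.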

Main results
We now have all the ingredients needed to state our results. From now on, "a.s." stands for "almost surely".
From Theorem 3.1, we will deduce an analogous result for the additive regression function estimator m*_ψ,add defined in (2.7).

Theorem 3.2. Assume the hypotheses of Theorem 3.1 hold. If, in addition, h_{ℓ,n} = h_{1,n} for every ℓ = 1, . . ., d, then we have, where σ is as in (2.10).
Keep in mind that a similar result is readily obtained for the conditional distribution function by selecting the class F_cdf of Remark 2.4. The proofs of Theorems 3.1 and 3.2 are postponed to Section 5. A sketch of the proof of Theorem 3.1 is as follows. It will be split into two main parts. First, we will assume that both the survival function G of C and the density function f of X are known. Then, using appropriate approximation lemmas, we will show how to treat the general case (i.e., the case where neither f nor G is known). To establish the results in the case where both G and f are known, we will mostly borrow the arguments developed in [13] and [15] (see also [16]), which rest on recent developments in empirical process theory, and especially on an exponential bound due to Talagrand [39] (see also Inequality A.1 in the Appendix).
In Paragraph 3.1 below, we show how our results may be extended to the case of more general synthetic data. In Section 4, we present an application of our results, following the ideas developed in [13].

Extensions
Here, we will limit ourselves to the case F = {I}, where I stands for the identity function on IR. The corresponding estimator defined in (2.2), and hence the one defined under the additive assumption in (2.7), rests on the following transformation, due to Koul et al. [28]: for 1 ≤ i ≤ n,

Y*_i = δ_i Z_i / G*_n(Z_i),

which, in the case where G is known, reduces to

Y*_i = δ_i Z_i / G(Z_i). (3.4)

Note that (3.4) sets a censored observation to 0 and multiplies an uncensored observation by a factor [G(Z_i)]^{−1}, which can be very large if G(Z_i) is near 0. Alternative, and more general, synthetic data can be constructed in the following way. For any given ρ ∈ IR, define Θ_1, with ρ chosen such that Θ_1(Z) > 0 almost surely, and consider the corresponding transformation, for 1 ≤ i ≤ n. Observe that (3.4) corresponds to the particular choice ρ = −1. The choice ρ = 0 is also popular, and was first considered in [30]. Other choices (including some data-dependent choices) are discussed in [17].
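With G known, the two classical members of this family can be sketched side by side: ρ = −1 gives the transformation (3.4), while the ρ = 0 case is classically the transform Y*_i = ∫_0^{Z_i} dt / G(t) (our rendering of the standard [30]-type construction; the paper's display for the general ρ-family is not reproduced here). Both share the key unbiasedness property under independent censoring, and the ρ = 0 version avoids the very large weights 1/G(Z_i) mentioned above.

```python
import numpy as np

def ksv_transform(z, delta, G):
    """rho = -1: Y*_i = delta_i * Z_i / G(Z_i), i.e. transformation (3.4)."""
    return np.where(delta == 1, z / G(z), 0.0)

def leurgans_transform(z, G, grid_step=1e-3):
    """rho = 0 (Leurgans-type): Y*_i = integral_0^{Z_i} dt / G(t),
    approximated on a fixed grid; uses Z_i only, not delta_i."""
    t = np.arange(0.0, z.max() + grid_step, grid_step)
    inv_g = 1.0 / G(t)
    cum = np.concatenate([[0.0], np.cumsum(inv_g[:-1]) * grid_step])
    return np.interp(z, t, cum)

rng = np.random.default_rng(3)
n = 100_000
y = rng.exponential(1.0, n)            # IE[Y] = 1
c = rng.exponential(2.0, n)            # independent censoring
z, delta = np.minimum(y, c), (y <= c).astype(int)
G = lambda t: np.exp(-t / 2.0)

y_ksv = ksv_transform(z, delta, G)
y_leu = leurgans_transform(z, G)
# both sample means approximate IE[Y] = 1; the rho = 0 version typically
# has smaller variance, since it never divides by a small G(Z_i)
```

This variance contrast is precisely the motivation, recalled above, for considering choices of ρ other than −1.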
Consider the following independence assumption.
Observe that (C̃.1) naturally implies (C.1). Under assumption (C̃.1), we have (see, e.g., [26]) the analogue of (2.1). A close look at the proof presented in Section 5 below reveals that, in the case where F = {I}, Theorems 3.1 and 3.2 still hold when considering estimators built on the "general" synthetic data, under assumption (C̃.1). The only difference is that the term H_ψ(u) = H_I(u) (since ψ = I) defined in (2.8) must be replaced, in the general case, by the quantity defined below.
In view of Theorem 3.1, it is straightforward that, for each 0 < ε < 1, there exists almost surely an n_0 = n_0(ε) such that (4.1) holds for all n ≥ n_0. Therefore, under the assumptions of Theorem 3.1, the interval (4.2) provides asymptotic simultaneous confidence bands (at an asymptotic confidence level of 100%) for η_ψ,ℓ(x_ℓ) over x_ℓ ∈ C_ℓ (see [13] for more details). It is noteworthy that our bands do not provide confidence regions in the usual sense, since they are not based on a specified confidence level 1 − α. Instead, they hold with probability tending to 1 as n → ∞, and are thus more conservative (since they are simultaneous and of asymptotic level 100%). A comparison between pointwise (1 − α) × 100% confidence intervals and our simultaneous almost-certainty bands can be found in [11]. In most applications, we recommend constructing both types of confidence region to assess the form of the relationship between ψ(Y) and X.
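The shape of such a band can be illustrated schematically. The exact normalizing constants of (2.10) and (4.2) are not reproduced above, so the sketch below uses the rate sqrt(2 log(1/h_n)/(n h_n)) that is typical of uniform limit laws of this kind, with hypothetical plug-ins for the component estimate and the variance-type factor; none of the names are the paper's.

```python
import numpy as np

def almost_sure_band(eta_hat, sigma_hat, n, h, eps=0.1):
    """Simultaneous 'almost certainty' band of the form (4.2):
    eta_hat(x) +/- (1 + eps) * sigma_hat(x) * r_n. The rate r_n below,
    sqrt(2 log(1/h) / (n h)), is the typical normalization in limit laws
    of this kind; the paper's exact constant (2.10) is not reproduced."""
    r_n = np.sqrt(2.0 * np.log(1.0 / h) / (n * h))
    half_width = (1.0 + eps) * np.asarray(sigma_hat) * r_n
    return eta_hat - half_width, eta_hat + half_width

# toy usage on a grid of C_1 = [-0.9, 0.9]
x = np.linspace(-0.9, 0.9, 5)
eta_hat = x ** 3                     # hypothetical component estimate
sigma_hat = np.full_like(x, 0.5)     # hypothetical variance-type factor
lo, hi = almost_sure_band(eta_hat, sigma_hat, n=1_000, h=0.1)
# the band contains the estimate by construction and shrinks as n grows
```

Note that, in line with the discussion above, the band carries no 1 − α level: its coverage is only guaranteed asymptotically, with probability tending to 1.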
Remark 4.1. For finite sample sizes, Deheuvels and Mason [13] give some recommendations on how to ensure that the simultaneous (almost-certainty) confidence bands defined in (4.2) include the pointwise confidence intervals. See Remark 1.7 (pp. 233-235) in [13] for more details.
The confidence bands appear to be adequate, in the sense that they contain the true value of the additive component for "almost" every x ∈ [−1, 1]. The fact that the true function does not belong to our bands at some points was expected: it is due to the ε term in (4.1). Moreover, the boundary effect pertaining to kernel estimators is perceptible in the plots of Figure 1. In view of assumption (C.4), we recall however that our theorems do not allow one to build confidence bands on the entire interval [−1, 1], and the plots should only be considered on, typically, [−0.9, 0.9].

Proof of Theorem 3.1
Only the proof for the first component is provided. The proofs for the d − 1 remaining components follow from similar arguments and are therefore omitted. As already mentioned, we first consider the case where both the survival function G of C and the density function f of X are known.
In this paragraph, we intend to prove the following result, which is the version of Theorem 3.1 in the case where both f and G are known.
In a first step, we will establish Lemma 5.1 below.
Lemma 5.1. Under the assumptions of Theorem 3.1, we have, where σ_1 is as in (2.10).
Let ψ ∈ F be a fixed real-valued, measurable and uniformly bounded function defined on IR. Following the ideas developed in [15], we will first establish Lemma 5.2 below, which corresponds to Lemma 5.1 in the case where F is reduced to {ψ}. Then, we will show how to handle the uniformity over the whole class F (see Lemmas 5.6 and 5.7 in the sequel).
The proof of Lemma 5.2 is built on recent developments in empirical process theory (see, e.g., [13, 15, 16]). Denote by α_n the multivariate empirical process based upon (X_1, Z_1, δ_1), . . ., (X_n, Z_n, δ_n) and indexed by a class G of measurable functions defined on IR^{d+2}. More formally, for g ∈ G, α_n(g) is defined by (5.4), for X_i = (X_{i,1}, . . ., X_{i,d}), 1 ≤ i ≤ n, and with (5.5). From (5.1), (5.2), (5.4) and (5.5), we successively obtain two equalities, the second of which is (5.8). Lemma 5.3 below enables us to evaluate the respective order of each of the terms on the right-hand side of (5.8). Its proof is postponed until Section 5.2.
Lemma 5.3. Under the conditions of Theorem 3.1, we have, almost surely, (5.9). In view of (5.8) and (5.9), the asymptotic behavior of the left-hand side of (5.8) can be deduced from that of α_n(g^{x_1}_{ψ,n}). Then, following once again the ideas of [15] (see also [13]), the proof of Lemma 5.2 will be split into an upper-bound part (Lemma 5.4) and a lower-bound part (Lemma 5.5).

Upper bound part
Lemma 5.4. Recall the definitions (2.10), (5.4) and (5.5). Under the assumptions of Theorem 3.1, we have, for all ε > 0, with probability one, (5.10).

Proof. We will first examine the behavior of the process α_n(g^{x_1}_{ψ,n}) on an appropriately chosen grid of C_1 (partitioning). To do so, we will make use of Bernstein's maximal inequality. Then, we will evaluate the uniform oscillations of our process between the grid points (evaluation of the oscillations). Toward this aim, we will make use of an inequality due to Mason [33], recalled for convenience in Inequality A.1 (see the Appendix).
Partitioning. Let a_1 and c_1 be such that C_1 = [a_1, c_1], and fix 0 < δ < 1. From now on, set, for some λ > 1, n_k = [λ^k] for all k ≥ 1, and consider the partitioning of the compact interval C_1 given in (5.11), where [u] denotes the integer part of u, i.e., [u] ≤ u < [u] + 1.
Here, we claim that, for all ε > 0, with probability one, (5.12) holds. For any real-valued function ϕ defined on a set B, we use the notation ||ϕ||_B = sup_{x∈B} |ϕ(x)|, and in the particular case where B = IR^m, m ≥ 1, we will write ||ϕ|| = ||ϕ||_B. Recall that K_ℓ, ℓ = 1, . . ., d, is of bounded variation, and ψ is uniformly bounded. Thus, under hypothesis (A), there exists a constant 0 < κ < ∞ such that, for each 0 ≤ j ≤ J_k, the bound (5.13) holds. Moreover, by (C.1), and making use of a classical conditioning argument, it follows from (2.8), (5.5) and (5.6) that (5.14) holds. But, by setting h_{−1} = (h_{2,n_k}, . . ., h_{d,n_k})^T and making use of classical changes of variables, it can be derived that, under (K.1) and (Q.1), for a given 0 < θ < 1, the estimate (5.15) holds. Recalling the definition (2.10) of σ_{ψ,1}, it readily follows from (5.14) and (5.15) that, for all ε > 0 and for n large enough, (5.16) holds. In view of (5.4), (5.13) and (5.16), we can apply Bernstein's maximal inequality (see for instance Lemma 2.2 in [14]) to the sequence of random variables defined above. This yields, for n large enough, the bound (5.17). Keep in mind the definition (5.11) of J_k. Since, under (H.5), Σ_{k≥1} h^̺_{1,n_k} < ∞ for all ̺ > 0, the result (5.17) combined with the Borel-Cantelli Lemma naturally implies (5.12).
Evaluation of the oscillations. In the sequel, for any class G of measurable functions, we will denote ||α_n||_G = sup_{g∈G} |α_n(g)|, with α_n as in (5.4).
For future use, first consider a slightly wider class of functions than the one strictly needed in this paragraph, introduced below. Arguing exactly as in pages 17 and 18 of [15], it can be shown that this class of measurable functions has a uniform polynomial covering number, i.e., is such that, for some C_0 > 0 and µ > 0, and all 0 < ε < 1, N(ε, G′) ≤ C_0 ε^{−µ}, where N(ε, G′) = sup{N(ε, G′, L_2(IP)) : IP probability measure} denotes the uniform covering number of the class G′ for ε and the class of norms {L_2(IP)}, with IP varying in the set of all probability measures on IR^{d+2} (for more details, see, e.g., pp. 83-84 in [41]).
To study the behavior of the process α_n(g^{x_1}_{ψ,n}) between the grid points x_{1,j} and x_{1,j+1}, 0 ≤ j ≤ J_k − 1, we introduce the following class of functions. Now we claim that, for any ε > 0, there exist almost surely a δ_ε and a λ_ε such that (5.19) holds whenever (5.11) holds with 0 < δ < δ_ε and 1 < λ < λ_ε. To establish (5.19), we will make use of Inequality A.1 (see the Appendix). Toward this aim, first note that, since K_1 is of bounded variation, we can write K_1 = K_{1,1} − K_{1,2}, where K_{1,1} and K_{1,2} are two non-decreasing functions of bounded variation on IR. Clearly, K_{1,1} and K_{1,2} satisfy analogous bounds.
Making use of the same arguments as those used to derive (5.16), it is readily shown that (5.21) holds, where D_1 and A_2 are the constants involved in Inequality A.1. By selecting δ > 0 sufficiently small, and λ > 1 close enough to 1 to make max_{n_{k−1}<n≤n_k} |h_n − h_{n_k}|/h_{n_k} as small as desired for large k (using (H.1-2)), we get the required control. Now observe that, for all 0 ≤ j ≤ J_k − 1, we have ||g_ψ|| ≤ κ uniformly, where κ is as in (5.13). Therefore, applying Inequality A.1 with τ as in (5.21) and ρ = τ^2/A_2 yields (5.22). Arguing as before, (5.19) now follows under (H.5) from (5.22) and the definition (5.11) of J_k, in combination with the Borel-Cantelli Lemma.

Conclusion:
The proof of Lemma 5.4 is completed by combining (5.12) and (5.19).

Lower bound part
Lemma 5.5. Recall the definitions (5.4), (5.5) and (2.10). Under the assumptions of Theorem 3.1, we have, with probability one, (5.23).

Proof. Recall the definition (2.9), and note that, from Scheffé's Lemma, it follows under (A) and (C.2-3) that the function defined there is continuous on C_1 (see Section A.3 in [13] for a complete proof of such continuity results). Then, for any ǫ > 0, we can select a sub-interval J of C_1 and consider the partitioning of J given below. For each x_{1,i}, 1 ≤ i ≤ k_n, define the function built from T_n as in (5.6). Given these notations, the proof of Lemma 5.5 follows along the same lines as those used to establish Proposition 3 in [15]. For the sake of brevity, we omit the details of these book-keeping arguments.
From Lemmas 5.3, 5.4 and 5.5, the proof of Lemma 5.2 is complete.
Under the conditions of Theorem 3.1, we readily obtain from Lemma 5.2 that, with probability one, (5.24) holds for any finite subclass G ⊂ F. Therefore, to complete the proof of Lemma 5.1, we must show how to extend (5.24) to the entire class F. The following two lemmas are directed toward this aim.
Lemma 5.6. Assume that the hypotheses of Theorem 3.1 hold. For all ε > 0, we can find a finite subclass G_ε ⊂ F such that, for any ψ_1 ∈ F and for n large enough, (5.25) holds, where, for all ψ ∈ F, g^{x_1}_{ψ,n} is as in (5.5).
Proof. Under (A), it follows from (5.5) and (5.6) that the bound below holds for all ψ_1, ψ_2 ∈ F. Besides, since F is a VC subgraph class, it is totally bounded with respect to d_Q, where Q is the uniform distribution on (0, ω_0). Thus, for any ε > 0, we can find a finite class G_ε achieving the corresponding covering. Fix ε > 0 and select n_0 > 0 so large that (5.25) holds for all n ≥ n_0. Further define, for all ψ_1, ψ_2 ∈ F,

Now consider the class of functions
Lemma 5.7. Under the assumptions of Theorem 3.1, we have, with probability one, where A is an absolute constant.
Recalling (5.8), the proof of Lemma 5.1 is completed by combining (5.24) with the results of Lemmas 5.6 and 5.7. Now, to conclude the proof of Proposition 5.1, it is clearly enough to establish the following result.
Lemma 5.8. Recall the definition (5.2). Under the assumptions of Theorem 3.1, we have the bound below.

Proof. From (5.1), and arguing as in (2.1), the identity below holds. Then, making use of a Taylor expansion of order s (rendered possible by assumptions (K.1) and (C.3)), we obtain the stated bound. By (H.3), the result of Lemma 5.8 is now a direct consequence of (5.2).

Two useful approximation lemmas
Now, we shall show how to treat the general case (i.e., the case where neither f nor G is known).

Lemma 5.9. Consider the version of (2.2) [resp. (2.6)] obtained in the case where G is known and f is unknown. Namely, we have the quantities below.

Proof. Because q(x) = Π_{ℓ=1}^{d} q_ℓ(x_ℓ) and because the functions q_ℓ, ℓ = 1, . . ., d, are bounded, a classical change of variables yields, for i = 1, . . ., n, the bound below. Recall the definition (5.1) and set Ψ(y, c) = 1I_{y≤c} ψ(y ∧ c)/G(y ∧ c), for all y, c ∈ IR. Then, since, for ℓ = 1, . . ., d, q_ℓ has a compact support included in C_ℓ and K_ℓ is compactly supported, we have, under (H.1) and for n large enough, the representation below. Clearly, by (A), Ψ is uniformly bounded. Therefore, the following result (see, e.g., [1]) is enough to conclude under (C.4) that the first approximation holds almost surely as n → ∞. Similarly, it can be shown that the second approximation holds almost surely as n → ∞. From these two last statements and the definitions (5.2) and (5.28), the proof of Lemma 5.9 is completed under assumption (H.4).
Lemma 5.10. Recall the definitions (2.6) and (5.28). Under the assumptions of Theorem 3.1, we have, almost surely as n → ∞, (5.30).

Proof. First consider the case where (A)(i) holds. Set b = inf_{x∈C^α} f(x). Note that b > 0 by (C.4). Then, recalling (2.2) and (5.27) and arguing as we did along the proof of Lemma 5.9, we get the bound below. Since ω < T_H, the law of the iterated logarithm of [18] ensures that sup_{t≤ω} |G*_n(t) − G(t)| = O((log log n / n)^{1/2}) almost surely as n → ∞. Therefore, it follows under (C.2-3-4) that, almost surely as n → ∞, (5.31) holds. In the same spirit, it can be shown that a similar bound holds almost surely as n → ∞, which, by (H.5), completes the proof of Lemma 5.10 under (A)(i). In the case where (A)(ii) holds, the proof follows along the same lines as above, making use of either the law of the iterated logarithm of [20] (if (A)(ii) holds with p = 1/2) or Theorem 2.1 of [6] (if (A)(ii) holds with 0 < p < 1/2) instead of the law of the iterated logarithm of [18]. The details are omitted.
By combining Lemmas 5.9 and 5.10 with Proposition 5.1, we conclude the proof of Theorem 3.1.

Proof of Lemma 5.3
Set the quantities Γ_{1,n} and Γ_{2,n} as below. It follows that the corresponding decomposition holds. Observing that g is uniformly bounded under the assumptions of Theorem 3.1, and making use of conditioning arguments, it can be shown that (1/n) Γ²_{1,n}(x_1) → 0 as n → ∞.

Fig 1. Results of the simulation study: true additive components (solid line), their estimates (red dashed line), and the associated confidence bands (dotted line).

5.1.1. The case where both f and G are known

Recall the definitions (2.2) and (2.6) and let m_ψ,n [resp. η_ψ,1] be the version of m*_ψ,n(x) [resp. η*_ψ,1] in the case where both G and f are known. Namely, we have

m_ψ,n(x) = n^{-1} Σ_{i=1}^{n} . . .