Central limit theorems for the $L_p$-error of smooth isotonic estimators

We investigate the asymptotic behavior of the $L_p$-distance between a monotone function on a compact interval and a smooth estimator of this function. Our main result is a central limit theorem for the $L_p$-error of smooth isotonic estimators obtained by smoothing a Grenander-type estimator or isotonizing the ordinary kernel estimator. As a preliminary result we establish a similar result for ordinary kernel estimators. Our results are obtained in a general setting, which includes estimation of a monotone density, regression function and hazard rate. We also perform a simulation study for testing monotonicity on the basis of the $L_2$-distance between the kernel estimator and the smoothed Grenander-type estimator.


Introduction
The property of monotonicity plays an important role when dealing with survival data or regression relationships. For example, it is often natural to assume that increasing a factor X has a positive (negative) effect on a response Y or that the risk for an event to happen is increasing (decreasing) over time. In situations like these, incorporating monotonicity constraints in the estimation procedure leads to more accurate results. The first non-parametric monotone estimators were introduced in [19], [6], and [41], concerning the estimation of a monotone probability density, regression function, and failure rate. These estimators are all piecewise constant functions that exhibit a non-normal limit distribution at rate $n^{1/3}$.
On the other hand, under some additional regularity assumptions on the function of interest, smooth non-parametric estimators can be used to achieve a faster rate of convergence to a Gaussian distributional law. Typically, these estimators are constructed by combining an isotonization step with a smoothing step. Estimators constructed by smoothing followed by an isotonization step have been considered in [7], [47], [17], and [44] for the regression setting, in [46] for estimating a monotone density, and in [16], who consider maximum smoothed likelihood estimators for monotone densities. Methods that interchange the smoothing step and the isotonization step can be found in [42], [13], and [35], who study kernel smoothed isotonic estimators. Comparisons between isotonized smooth estimators and smoothed isotonic estimators are made in [40], [25] and [24].
A lot of attention has been given in the literature to the pointwise asymptotic behavior of smooth estimators and monotone estimators, separately. However, for example for goodness of fit tests, global errors of estimates are needed instead of pointwise results. For the Grenander estimator of a monotone density, a central limit theorem for the $L_1$-error was formulated in [20] and proven rigorously in [21]. A similar result was established in [11] for the regression context. Extensions to general $L_p$-errors can be found in [30] and in [12], where the latter provides a unified approach that applies to a variety of statistical models. On the other hand, central limit theorems for regular kernel density estimators have been obtained in [9] and [8].
In this paper we investigate the $L_p$-error of smooth isotonic estimators obtained by kernel smoothing the Grenander-type estimator or by isotonizing the ordinary kernel estimator. We consider the same general setup as in [12], which includes estimation of a probability density, a regression function, or a failure rate under monotonicity constraints (see Section 3 in [12] for more details on these models). An essential assumption in this setup is that the observed process of interest can be approximated by a Brownian motion or a Brownian bridge. Our main results are central limit theorems for the $L_p$-error of smooth isotonic estimators for a monotone function on a compact interval. However, since the behavior of these estimators is closely related to the behavior of ordinary kernel estimators, we first establish a central limit theorem for the $L_p$-error of ordinary kernel estimators for a monotone function on a compact interval. This extends the work by [9] on the $L_p$-error of densities that are smooth on the whole real line, but is also of interest in itself. The fact that we no longer have a smooth function on the whole real line leads to boundary effects. Surprisingly, and in contrast to [9], we find that the limit variance of the $L_p$-error changes depending on whether the approximating process is a Brownian motion or a Brownian bridge. Such a phenomenon has not been observed in other isotonic problems where a similar embedding assumption was made; usually, both approximations lead to the same asymptotic results (e.g., see [12] and [30]).
After establishing a central limit theorem for the $L_p$-error of ordinary kernel estimators, we transfer this result to the smoothed Grenander estimator. The key ingredient here is the behavior of the process obtained as the difference between a naive estimator and its least concave majorant. For this we use results from [37]. As an intermediate result, we show that the $L_p$-distance between the smoothed Grenander-type estimator and the ordinary kernel estimator converges at rate $n^{2/3}$ to some functional of a two-sided Brownian motion minus a parabolic drift.
The situation for the isotonized kernel estimator is much easier, because it can be shown that this estimator coincides with the ordinary kernel estimator on large intervals in the interior of the support, with probability tending to one. However, since the isotonization step is performed last, the estimator is inconsistent at the boundaries. For this reason, we can only obtain a central limit theorem for the $L_p$-error on a sub-interval that approaches the whole support as $n$ tends to infinity. Finally, the results on the $L_p$-error can be applied immediately to obtain a central limit theorem for the Hellinger loss.
The paper is organized as follows. In Section 2 we describe the model and the assumptions, and fix some notation that will be used throughout the paper. A central limit theorem for the $L_p$-error of the kernel estimator is obtained in Section 3. This result is used in Sections 4 and 5 to obtain the limit distribution of the $L_p$-error of the SG and GS estimators. Section 6 is dedicated to corresponding asymptotics for the Hellinger distance. In Section 7 we provide a possible application of our results by considering a test for monotonicity. Details of some of the proofs are deferred to Section 8, and additional technicalities have been put in the supplemental material in [38].

Assumptions and notations
Consider estimating a function $\lambda : [0, 1] \to \mathbb{R}$ subject to the constraint that it is nonincreasing. Suppose that on the basis of $n$ observations we have at hand a cadlag step estimator $\Lambda_n$ of $\Lambda$. A typical example is the estimation of a monotone density $\lambda$ on a compact interval by means of the empirical cumulative distribution function $\Lambda_n$. Hereafter, $M_n$ denotes the process $M_n = \Lambda_n - \Lambda$, $\mu$ is a measure on the Borel sets of $\mathbb{R}$, and $k$ is a twice differentiable symmetric probability density with support $[-1, 1]$ (1). The rescaled kernel is defined as $k_b(u) = b^{-1} k(u/b)$, where the bandwidth $b = b_n \to 0$ as $n \to \infty$. In the sequel we will make use of the following assumptions.
In particular, the approximation of the process $M_n$ by a Gaussian process, as in assumption (A2), is also required in [12]. It corresponds to a general setting which includes estimation of a probability density, a regression function, or a failure rate under monotonicity constraints (see Section 3 in [12] for more details on these models). First we introduce some notation; we partly adopt that of [9] and briefly explain its origin. Let $\hat\lambda_n^s$ be the standard kernel estimator of $\lambda$, i.e.
As usual, we decompose the difference into a random term and a bias term, where, when $nb^5 \to C_0 > 0$, the function $g^{(n)}(t)$ converges to a limit $g(t)$. After separating the bias term, the first term on the right hand side of (3) involves an integral of $k_b(t-u)$ with respect to the process $M_n$. Due to (A2), this integral will be approximated by an integral with respect to a Gaussian process. For this reason, the limiting moments of the $L_p$-error involve integrals with respect to Gaussian densities, and a Taylor expansion of $k_b(t-u)$ yields constants involving the kernel function. For example, the limiting means of the $L_p$-error and of a truncated version are given in (8), where $D$ and $g^{(n)}$ are defined in (7) and (4). Depending on the rate at which $b \to 0$, the limiting variance of the $L_p$-error has a different form. When $nb^5 \to 0$, the limiting variance turns out to be $\sigma^2(p)$, with $\sigma_1$ representing $p$-th moments of bivariate Gaussian vectors, where $D$, $\psi$, and $\phi$ are defined in (7) and (6). When $nb^5 \to C_0 > 0$ and $B_n$ in (A2) is a Brownian motion, the limiting variance of the $L_p$-error is $\theta^2(p)$, where $g$, $D$, $\psi$, and $\phi$ are defined in (5), (7) and (6), whereas, if $B_n$ in (A2) is a Brownian bridge, the limiting variance is slightly different. Finally, the following inequality (the reverse triangle inequality for $L_p$-norms) will be used throughout this paper: for $p \in [1, \infty)$, $-\infty \le A < B \le \infty$ and $q, h \in L^p(A, B)$, $\big|\, \|q\|_{L^p(A,B)} - \|h\|_{L^p(A,B)} \,\big| \le \|q - h\|_{L^p(A,B)}$ (14).
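As a concrete illustration of the standard kernel estimator $\hat\lambda_n^s(t) = \int k_b(t-u)\,d\Lambda_n(u)$, the following sketch estimates a toy decreasing density. The triweight kernel and the density $\lambda(t) = 2(1-t)$ are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Triweight kernel: symmetric probability density with support [-1, 1]
# (an illustrative admissible choice).
def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

# Toy decreasing density on [0, 1]: lambda(t) = 2(1 - t), with
# distribution function F(t) = 1 - (1 - t)^2, sampled by inversion.
n, b = 5000, 0.1
x = 1.0 - np.sqrt(1.0 - rng.uniform(size=n))

# Standard kernel estimator: integral of k_b(t - u) against the
# empirical distribution function, i.e. an average of kernel bumps.
def lam_hat(t):
    return float(np.mean(k((t - x) / b)) / b)

# Evaluate on [b, 1 - b], where no boundary correction is needed.
grid = np.linspace(b, 1 - b, 50)
est = np.array([lam_hat(t) for t in grid])
true = 2.0 * (1.0 - grid)
sup_err = float(np.max(np.abs(est - true)))
```

Since the toy density is linear, the smoothing bias essentially vanishes and the error on the interior is purely stochastic, of order $(nb)^{-1/2}$.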

Kernel estimator of a decreasing function
We extend the results of [9] and [8] to the case of a kernel estimator of a decreasing function with compact support. Note that, since the function of interest cannot be twice differentiable on $\mathbb{R}$ (not even continuous), the kernel estimator is inconsistent at zero and one. Moreover, we show that the contribution of the boundaries to the $L_p$-error is not negligible, so in order to prevent the $L_p$-distance from exploding, we have to restrict ourselves to the interval $[b, 1-b]$ or apply some boundary correction.

A modified $L_p$-distance of the standard kernel estimator
Let $\hat\lambda_n^s$ be the standard kernel estimator of $\lambda$ defined in (2). In order to avoid boundary problems, we start by finding the asymptotic distribution of a modification of the $L_p$-distance instead of the $L_p$-distance itself. Theorem 3.1. Assume that (A1)-(A3) hold. Let $k$ satisfy (1) and let $J_n^c$ be defined in (15). Suppose $p \ge 1$ and $nb \to \infty$.
The proof goes along the same lines as the one for $L_p$-norms of kernel density estimators on the whole real line (see [9] and [8]). The main idea is that, by means of assumption (A2), it is sufficient to prove the central limit theorem for the approximating process. When $B_n$ in (A2) is a Brownian motion, the latter can be obtained by a big-blocks-small-blocks procedure using the independence of the increments of the Brownian motion. When $B_n$ in (A2) is a Brownian bridge, we can still obtain a central limit theorem, but the limiting variance turns out to be different. The latter result differs from what is stated in [9]. In [9], the complete proof for both Brownian motion and Brownian bridge is only given for the case $nb^5 \to 0$, and it is shown that the random variables obtained by using the Brownian motion and the Brownian bridge as approximating processes are asymptotically equivalent (see their Lemma 6). In fact, when dealing with a Brownian bridge, the rescaled $L_p$-error is asymptotically equivalent to the $L_p$-error that corresponds to the Brownian motion process plus an additional term equal to $CW(L(1))$, for a constant $C$ proportional to $\theta_1(p)$ defined in (13). When the bandwidth is small, i.e., $nb^5 \to 0$, the bias term $g(t)$ in the definition of $\theta_1(p)$ disappears. Hence, by the symmetry of the standard normal density, $\theta_1(p) = 0$ and, as a consequence, $C = 0$. This means that the additional term resulting from the fact that we are dealing with a Brownian bridge converges to zero. For details, see the proof of Lemma 8.1. When $nb^5 \to C_0^2 > 0$, only a sketch of the proof is given in [9] for $B_n$ being a Brownian motion, and it is claimed that the limit distribution would again be the same for $B_n$ being a Brownian bridge. However, in our setting we find that the limit variances are different.
Let $(W_t)_{t \in \mathbb{R}}$ be a Wiener process and define the corresponding approximating statistic. Hence, if $B_n$ in assumption (A2) is a Brownian motion, then, according to (14), it suffices to control the $L_p$-distance between the two processes. According to assumption (A2), the right hand side of (18) is of the order $O_P(n^{-1/2+1/q})$, and as a result the statement follows from a central limit theorem for $(b\sigma^2(p))^{-1/2}$ times the centered approximating statistic, where $g^{(n)}$ and $m_n^c(p)$ are defined in (4) and (8), respectively. This result is a generalization of Lemmas 1-5 in [9] and the proof proceeds in the same way. However, for completeness we give all the details in the supplementary material; see Lemma A.1 in [38].
Finally, if $B_n$ is a Brownian bridge on $[0, L(1)]$, we use the representation $B_n(t) = W(t) - tW(L(1))/L(1)$. By replacing $\Gamma$ in the previous reasoning, the statement follows from Lemma 8.1.
When $nb^4 \to 0$, the centering constant $m_n(p)$ can be replaced by a quantity that does not depend on $n$.

Boundary problems of the standard kernel estimator
We show that we cannot, in fact, extend the results of Theorem 3.1 to the whole interval $[0, 1]$, because then the inconsistency at the boundaries dominates the $L_p$-error. A similar phenomenon was also observed in the case of the Grenander-type estimator (see [12] and [30]), but only for $p \ge 2.5$. In our case, the contribution of the boundaries to the $L_p$-error is not negligible for any $p \ge 1$. This mainly has to do with the fact that the functions $g^{(n)}$, defined in (4), converge to infinity. As a result, the previous theory breaks down: the first term within the brackets can be bounded, whereas for any $0 < c < 1$ and $t \in [0, cb]$ the corresponding term diverges, and because $nb \to \infty$, the $L_p$-error would explode. What would solve the problem is to assume that $\lambda$ is twice differentiable as a function defined on $\mathbb{R}$ (see [9] and [8]). This is not the case here, because we are considering a function which is positive and decreasing on $[0, 1]$ and usually zero outside this interval. This means that, as a function on $\mathbb{R}$, $\lambda$ is no longer monotone and has at least one discontinuity point.
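The boundary inconsistency can be seen numerically: at $t = 0$ the kernel window overlaps the support only on $[0, b]$, so roughly half the kernel mass is lost and the estimator targets about $\lambda(0)/2$. The sketch below uses the same illustrative toy choices as before (triweight kernel, $\lambda(t) = 2(1-t)$), which are assumptions and not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

# Toy decreasing density lambda(t) = 2(1 - t) on [0, 1]; lambda(0) = 2.
n, b = 20000, 0.05
x = 1.0 - np.sqrt(1.0 - rng.uniform(size=n))

# Kernel estimate at the left endpoint: the kernel window [-b, b] only
# overlaps the support on [0, b], so half the kernel mass is lost and
# the estimator targets roughly lambda(0)/2 = 1 instead of 2.
lam0_hat = float(np.mean(k((0.0 - x) / b)) / b)
ratio = lam0_hat / 2.0   # close to 1/2 rather than 1
```

Since the bias does not vanish as $b \to 0$, no amount of data repairs this: the estimator is inconsistent on the whole strip $[0, b)$.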
The following results indicate that the inconsistency at the boundaries dominates the $L_p$-error, i.e., the expectation and the variance of the integral close to the end points of the support converge to infinity. We cannot even approach the boundaries at a rate faster than $b$ (as in the case of the Grenander-type estimator), because the kernel estimator is inconsistent on the whole interval $[0, b)$ (and $(1-b, 1]$). Proposition 3.3. Assume that (A1)-(A3) hold and let $\hat\lambda_n^s$ be defined in (2). Let $k$ satisfy (1). Suppose that $p \ge 1$ and $nb \to \infty$.
The previous results also hold if we consider the integral near the other endpoint. The proof can be found in the supplemental material [38].
Remark 3.4. Note that, if $b \sim n^{-\alpha}$ for some $0 < \alpha < 1$, then for $\alpha < 1/3$, Proposition 3.3(i) shows that for all $p \ge 1$, the expectation of the boundary regions in the $L_p$-error tends to infinity. This holds in particular for the optimal choice $\alpha = 1/5$. For $p < 1/(1-\alpha)$, Proposition 3.3(ii) allows us to include the boundary regions in the central limit theorem for the $L_p$-error of the kernel estimator, with $J_n(p)$ defined in (16) and $\bar m_n(p) = \int_0^1 g^{(n)}(t)^p \, d\mu(t)$. However, the bias term $\bar m_n(p)$ is no longer bounded. On the other hand, if $p > 1/(1-\alpha)$, Proposition 3.3(iii) shows that the boundary regions in the $L_p$-error behave asymptotically as random variables whose variance tends to infinity.
Remark 3.5. The choice of the measure $\mu$ instead of the Lebesgue measure used in [9] and [8] is motivated by the fact that, for a particular choice $d\mu(t) = w(t)\,dt$, the normalizing constants $m(p)$ and $\sigma(p)$ in the CLT do not depend on the unknown function. In our case, a proper choice of $\mu$ can also be used to get rid of the boundary problems. This happens when $\mu$ puts less mass on the boundary regions, in order to compensate for the inconsistency of the kernel estimator. For such a choice, Theorem 3.1 also holds if we replace $J_n^c(p)$ with $J_n(p)$, defined in (16).

Kernel estimator with boundary correction
One way to overcome the inconsistency problems of the standard kernel estimator is to apply a boundary correction. Let now $\hat\lambda_n^s$ be the 'corrected' kernel estimator of $\lambda$, i.e.
where $k_b^{(t)}$ denotes the boundary corrected kernel. For $s \in [-1, 1]$, the coefficients $\psi_1(s), \psi_2(s)$ are determined by suitable moment conditions, and as a result the boundary corrected kernel satisfies these conditions by construction. Moreover, $\psi_1$ and $\psi_2$ are continuously differentiable (in particular, they are bounded). We aim at showing that in this case Theorem 3.1 holds for the $L_p$-error on the whole support, i.e., with $J_n(p)$ instead of $J_n^c(p)$. Note that the boundary corrected kernel estimator coincides with the standard kernel estimator on $[b, 1-b]$, so the behavior of the $L_p$-error on $[b, 1-b]$ will be the same; we only have to deal with the boundary regions $[0, b]$ and $[1-b, 1]$. Proposition 3.6. Assume that (A1)-(A3) hold and let $\hat\lambda_n^s$ be defined in (26). Let $k$ satisfy (1) and suppose $p \ge 1$ and $nb \to \infty$. Then the corresponding boundary contributions are asymptotically negligible. The previous result also holds if we consider the integral near the other endpoint. The proof can be found in the supplemental material [38].
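One standard way to realize such a boundary correction, consistent with the description above but assumed here rather than taken from the paper's (lost) display, is the linear boundary kernel $k^{(s)}(u) = (\psi_1(s) + \psi_2(s)u)\,k(u)$, with $\psi_1, \psi_2$ solving a $2\times 2$ system so that the corrected kernel has unit mass and zero first moment over $[-1, s]$:

```python
import numpy as np

# Triweight kernel on [-1, 1] (illustrative admissible choice).
def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

def trap(y, x):
    # Trapezoidal quadrature helper.
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def boundary_coeffs(s):
    # Kernel moments a_j(s) = int_{-1}^{s} u^j k(u) du by quadrature.
    u = np.linspace(-1.0, s, 2001)
    a0, a1, a2 = (trap(u**j * k(u), u) for j in range(3))
    # Unit mass and zero first moment over [-1, s]:
    #   a0*psi1 + a1*psi2 = 1,   a1*psi1 + a2*psi2 = 0.
    psi1, psi2 = np.linalg.solve([[a0, a1], [a1, a2]], [1.0, 0.0])
    return float(psi1), float(psi2)

def k_corrected(s, u):
    psi1, psi2 = boundary_coeffs(s)
    return (psi1 + psi2 * u) * k(u)

# The defining moment conditions hold at a boundary point s = 0.3 ...
s = 0.3
u = np.linspace(-1.0, s, 2001)
m0 = trap(k_corrected(s, u), u)       # total mass, ~1
m1 = trap(u * k_corrected(s, u), u)   # first moment, ~0

# ... and at s = 1 the correction is inactive (psi1 ~ 1, psi2 ~ 0),
# matching the fact that the corrected and standard estimators
# coincide away from the boundary.
p1, p2 = boundary_coeffs(1.0)
```

With this construction $\psi_1$ and $\psi_2$ are smooth functions of $s$, in line with the boundedness claims above.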
Proof. It follows from combining Theorem 3.1 and Proposition 3.6, together with a corresponding bound on the centering constants, in which $D$ and $g^{(n)}$ are defined in (7) and (4).

Smoothed Grenander-type estimator
The smoothed Grenander-type estimator $\hat\lambda_n^{SG}$ is defined by kernel smoothing $\hat\Lambda_n$, where $\hat\Lambda_n$ is the least concave majorant of $\Lambda_n$. We are interested in the asymptotic distribution of the $L_p$-error of this estimator. We will compare the behavior of the $L_p$-error of $\hat\lambda_n^{SG}$ with that of the regular kernel estimator $\hat\lambda_n^s$ from (26). To do so, we will make use of the behavior of $\hat\Lambda_n - \Lambda_n$, which has been investigated in [37], extending similar results from [15] and [32]. The idea is to represent $\hat\Lambda_n - \Lambda_n$ in terms of the mapping $\mathrm{CM}_I$ that maps a function $h: \mathbb{R} \to \mathbb{R}$ into the least concave majorant of $h$ on the interval $I \subset \mathbb{R}$, or equivalently by the mapping $D_I h = \mathrm{CM}_I h - h$. Let $B_n$ be as in assumption (A2) and $\xi_n$ an $N(0,1)$ distributed random variable independent of $B_n$. Define versions $W_n$ of Brownian motion accordingly, with $L$ as in Assumption (A2). We start with the following result on the $L_p$-distance between $\hat\lambda_n^{SG}$ and $\hat\lambda_n^s$. In order to use results from [37], we need $1 \le p < \min(q, 2q-7)$, where $q$ is from Assumption (A2). Moreover, in order to obtain suitable approximations in combination with results from [37], we require additional conditions on the rate at which $1/b$ tends to infinity; see also Remark 4.2. For the optimal rate $b \sim n^{-1/5}$, the result in Theorem 4.1 is valid as long as $p < 5$ and $q > 9$.
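A minimal numerical sketch of the smoothed Grenander-type estimator: the least concave majorant of the empirical distribution function is computed with an upper-hull scan, its left-hand slopes give the Grenander estimator, and the slopes are then kernel smoothed. The kernel, the toy density, and the helper names (`lcm_hull`, `trap`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def lcm_hull(pts):
    # Least concave majorant (upper convex hull) of sorted points.
    hull = []
    for x0, y0 in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (x0 - x2) <= (y0 - y2) * (x2 - x1):
                hull.pop()   # middle point lies below the chord: drop it
            else:
                break
        hull.append((x0, y0))
    return np.array(hull)

# Data from the toy decreasing density lambda(t) = 2(1 - t) on [0, 1].
n, b = 2000, 0.1
x = np.sort(1.0 - np.sqrt(1.0 - rng.uniform(size=n)))
pts = [(0.0, 0.0)] + list(zip(x, np.arange(1, n + 1) / n))
hull = lcm_hull(pts)

# Grenander estimator: left-hand slope of the LCM, on a fine grid.
u = np.linspace(0.0, 1.0, 4001)
Lam_hat = np.interp(u, hull[:, 0], hull[:, 1])
gren = np.gradient(Lam_hat, u)

# Smoothed Grenander-type estimator at an interior point t = 0.5
# (true value lambda(0.5) = 1).
t = 0.5
sg = trap(k((t - u) / b) * gren / b, u)
```

The hull slopes are non-increasing by construction, so the smoothed version is a genuinely monotone, smooth estimate.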
Theorem 4.1. Assume that (A1)-(A2) hold and let $\mu$ be a finite measure on $(0, 1)$. Let $k$ satisfy (1) and let $\hat\lambda_n^{SG}$ and $\hat\lambda_n^s$ be defined in (29) and (26), respectively. Suppose $1 \le p < \min(q, 2q-7)$ and $nb \to \infty$, such that the stated rate conditions hold; the limit is a functional of $Z(t) = W(t) - t^2$, with $W$ being a two-sided Brownian motion originating from zero. Proof. We write the difference in a suitable form. We first show that (35) holds, and then the result follows from the continuous mapping theorem. Note that integration by parts yields an expression for $Y_n(t)$. The proof consists of several succeeding approximations of $A_n^E$. For details, see Lemmas 8.2 to 8.6. First we replace $A_n^E$ in the previous integral by $A_n^W$, where $A_n^W$ is defined in (32); this approximation of $Y_n(t)$ is possible thanks to Assumption (A2), using (14). According to Lemma 8, we then obtain the required bound and (35) follows. In order to prove (38), we replace $A_n^W$ by $n^{2/3} D_{I_{nv}} \Lambda^W$, i.e., we approximate $Y_n$, where $I_{nv} = [0, 1] \cap [v - n^{-1/3}\log n, v + n^{-1/3}\log n]$ and $\Lambda^W$ is defined in (33). From Lemma 8, similar to the argument that leads to (39), together with (14), it follows that (38) is equivalent to (41). In order to prove (41), we introduce suitable localized processes. Again, similar to the argument that leads to (39), together with (14), it follows that (44) would prove (41). We proceed with proving (44). Let $W$ be a two-sided Brownian motion originating from zero. We then have a distributional identity as a process in $s$. Once more, similar to the argument that leads to (39), together with (14), it follows that (44) also holds.
As a final step, we prove (48). It remains to show (50), because then (48) follows.
Assumptions (A1) and (A2) imply that $t \mapsto c_1(t)$ is strictly positive and differentiable with bounded derivative, so by a Taylor expansion we obtain the required bound, which concludes the proof of (50) and finishes the proof of the theorem.
Remark 4.2. Note that the assumption $1/b = o\big(n^{1/6+1/(6p)} (\log n)^{-(1+1/p)}\big)$ of the previous theorem puts a restriction on $p$ when $b$ has the optimal rate $n^{-1/5}$. This is due to the approximation of $Y_n$ near the boundary. For example, for $t \in (0, b)$, it can be shown that there exists a universal constant $K > 0$ such that the corresponding bound is not bounded in probability for $p > 1$. For details see the supplemental material [38]. The same result also holds for $t \in (1-b, 1)$.
In the special case $p = 1$, for $t \in (0, b)$ we have a corresponding bound, and if (A3) holds, a similar argument applies; the case $t \in (1-b, 1)$ is dealt with analogously. We are now ready to formulate the CLT for the smoothed Grenander-type estimator. The result follows from combining Corollary 3.7 with Theorem 4.1. Because we now deal with the $L_p$-error between $\hat\lambda_n^{SG}$ and $\lambda$, the contribution of the integrals over the boundary regions $(0, 2b)$ and $(1-2b, 1)$ can be shown to be negligible. This means we no longer need the third requirement in Theorem 4.1 on the rate of $1/b$. Theorem 4.4. Assume that (A1)-(A3) hold and let $k$ satisfy (1). Let $I_n^{SG}$ be defined in (30).
Proof. Define the intermediate quantity. By Corollary 3.7, we already have the central limit theorem for $\hat\lambda_n^s$ defined in (26). Hence it is sufficient to show (52). Indeed, by (14), we obtain a corresponding bound. Moreover, by integration by parts and the Kiefer-Wolfowitz type of result in Corollary 3.1 in [14], it follows that the difference is negligible. Together with Proposition 3.6 this implies (52). Similarly, we also have the analogous bound. Thus, it remains to prove (56). Again, from (14), we obtain (57). Then, (56) follows immediately from (57) and the fact that, according to Theorem 3.1, the rescaled error converges. This proves the theorem.
Remark 4.5. Note that, if $b = cn^{-\alpha}$ for some $0 < \alpha < 1$, the proof is simple and short when $\alpha < p/(3(1+p))$, because the Kiefer-Wolfowitz type of result in Corollary 3.1 in [14] is sufficient to prove (58); indeed, this follows from (54). However, this assumption on $\alpha$ is quite restrictive: for example, if $\alpha = 1/5$ then the theorem holds only for $p > 3/2$ (not for the $L_1$-loss), and if $\alpha = 1/4$ then the theorem holds only for $p > 3$.

Isotonized kernel estimator
The isotonized kernel estimator is defined as follows. First, we smooth the piecewise constant estimator $\Lambda_n$ by means of a boundary corrected kernel function, i.e., let $\Lambda_n^s$ be the smoothed version, where $k_b^{(t)}(u)$ is defined as in (27). Next, we define a continuous monotone estimator $\hat\lambda_n^{GS}$ of $\lambda$ as the left-hand slope of the least concave majorant $\hat\Lambda_n^s$ of $\Lambda_n^s$ on $[0, 1]$. In this way we define a sort of Grenander estimator based on a smoothed naive estimator of $\Lambda$. For this reason we use the superscript GS.
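For the GS estimator the order of operations is reversed: smooth $\Lambda_n$ first, then isotonize. The sketch below uses plain (not boundary-corrected) smoothing and therefore only looks at the interior $[b, 1-b]$, where, as described in the Introduction, the estimator should essentially agree with the ordinary kernel estimator. All concrete choices (kernel, toy density, helper names) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def lcm_hull(pts):
    # Least concave majorant (upper convex hull) of sorted points.
    hull = []
    for x0, y0 in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (x0 - x2) <= (y0 - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((x0, y0))
    return np.array(hull)

n, b = 2000, 0.1
x = np.sort(1.0 - np.sqrt(1.0 - rng.uniform(size=n)))

# Step 1: smooth the empirical distribution function Lambda_n.
u = np.linspace(0.0, 1.0, 2001)
Lam_n = np.searchsorted(x, u, side="right") / n
tgrid = np.linspace(b, 1 - b, 201)
Lam_s = np.array([trap(k((t - u) / b) / b * Lam_n, u) for t in tgrid])

# Step 2: isotonize via the least concave majorant; the GS estimator
# is the left-hand slope of the hull.
hull = lcm_hull(list(zip(tgrid, Lam_s)))
slopes = np.diff(hull[:, 1]) / np.diff(hull[:, 0])
gs_mid = float(slopes[np.searchsorted(hull[1:, 0], 0.5)])

# Ordinary kernel estimator at t = 0.5 (true value 1) for comparison.
kern_mid = float(np.mean(k((0.5 - x) / b)) / b)
```

In the interior the hull slope is just the derivative of the smoothed $\Lambda_n$, so `gs_mid` and `kern_mid` are close, mirroring the coincidence of the two estimators away from the boundary.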
We are interested in the asymptotic distribution of the $L_p$-error of this estimator. It follows from Lemma 1 in [22] (in the case of a decreasing function) that $\hat\lambda_n^{GS}$ is continuous and is the unique minimizer of the corresponding least squares criterion over all nonincreasing functions $\lambda$, where $\hat\lambda_n^s(t) = d\Lambda_n^s(t)/dt$. This suggests $\hat\lambda_n^s(t)$ as a naive estimator for $\lambda_0(t)$. Note that, for $t \in [b, 1-b]$, integration by parts shows that $\hat\lambda_n^s$ coincides with the usual kernel estimator of $\lambda$ on the interval $[b, 1-b]$. Let $0 < \gamma < 1$. It can be shown that $\hat\lambda_n^{GS}$ and $\hat\lambda_n^s$ coincide on $(b^\gamma, 1-b^\gamma)$ with probability tending to one; see Corollary B.2 in the supplemental material [38]. Hence, the $L_p$-distance between $\hat\lambda_n^{GS}$ and $\hat\lambda_n^s$ will exhibit the same behavior in the limit. Note that this holds for every $\gamma < 1$, which means that the interval we are considering approaches $(b, 1-b)$. Consider a modified $L_p$-error of the isotonized kernel estimator defined by restricting to this interval. We then have the following result.

Hellinger error
In this section we investigate the global behavior of estimators by means of a weighted Hellinger distance, where $\hat\lambda_n$ is the estimator at hand. This metric is convenient in maximum likelihood problems, which goes back to [33], [34], and [3]. Consistency in Hellinger distance of shape constrained maximum likelihood estimators has been investigated in [43], [45], and [10], whereas rates on Hellinger risk measures have been obtained in [45], [28], and [27]. The first central limit theorem type of result for the Hellinger distance was presented in [39] for Grenander-type estimators of a monotone function. We deal with the smooth (isotonic) estimators following the same approach. Note that, for the Hellinger distance to be well defined, we need to assume that $\lambda$ takes only positive values. We follow the same line of argument as in [39]. We first establish that the squared Hellinger loss can be approximated by a weighted squared $L_2$-distance. For details, see Lemma C.1 in the supplemental material [38], which is the corresponding version of Lemma 2.1 in [39]. Hence, a central limit theorem for the squared Hellinger loss follows directly from the central limit theorem for the weighted $L_2$-distance (see Theorem C.2 in the supplemental material [38], which corresponds to Theorem 3.1 in [39]). An application of the delta method will then lead to the following result.
(iv) Under the conditions of Theorem 4.4, results (i)-(iii) also hold when replacing $\hat\lambda_n^s$ by the smoothed Grenander-type estimator $\hat\lambda_n^{SG}$, defined in (29). Proof. The proof consists of an application of the delta method in combination with Theorem C.2 in the supplemental material [38]. According to part (i) of Theorem C.2 in [38], the weighted squared $L_2$-distance satisfies a central limit theorem, where $Z$ is a mean zero normal random variable with variance $\tau^2(2)$. Therefore, in order to obtain part (i) of Theorem 6.1, we apply the delta method with the mapping $\phi(x) = 2^{-1/2} x^{1/2}$. Parts (ii)-(iv) are obtained in the same way.
To be complete, note that, by Corollary B.2, the previous central limit theorems also hold for the isotonized kernel estimator $\hat\lambda_n^{GS}$, defined in Section 5, when considering a Hellinger distance corresponding to the interval $(b^\gamma, 1-b^\gamma)$ instead of $(0, 1)$ in (63).
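The approximation of the squared Hellinger loss by a weighted squared $L_2$-distance can be checked numerically: for $f = g + \delta$ with $\delta$ small, $\tfrac12\int(\sqrt f - \sqrt g)^2 \approx \tfrac18\int (f-g)^2/g$. The functions below are toy choices, and the paper's specific weight in (63) is not reproduced.

```python
import numpy as np

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

t = np.linspace(0.0, 1.0, 100001)
lam = 2.0 - t                            # positive decreasing function
lam_hat = lam + 0.01 * np.cos(20 * t)    # small perturbation as stand-in estimator

# Squared Hellinger loss and its weighted-L2 approximation:
# (sqrt(f) - sqrt(g))^2 ~ (f - g)^2 / (4 g) when f is close to g.
h2 = 0.5 * trap((np.sqrt(lam_hat) - np.sqrt(lam)) ** 2, t)
l2w = 0.125 * trap((lam_hat - lam) ** 2 / lam, t)
ratio = h2 / l2w
```

The ratio tends to one as the perturbation shrinks, which is exactly why a CLT for the weighted $L_2$-distance transfers to the Hellinger loss via the delta method.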

Testing
In this section we investigate a possible application of the results obtained in Section 4 to testing monotonicity. For example, Theorem 4.4 could be used to construct a test for the simple null hypothesis $H_0: \lambda = \lambda_0$, for some known monotone function $\lambda_0$. Instead, we investigate a nonparametric test for monotonicity on the basis of the $L_p$-distance between the smoothed Grenander-type estimator and the kernel estimator; see Theorem 4.1.
The problem of testing a nonparametric null hypothesis of monotonicity has gained a lot of interest in the literature (see for example [29] for the density setting, [26], [23] for the hazard rate, [1], [4], [5], [18] for the regression function).
We consider a regression model with deterministic design points, where the $\epsilon_i$'s are independent normal random variables with mean zero and variance $\sigma^2$. Such a model satisfies Assumption (A2) with $q = +\infty$ and $\Lambda_n(t) = n^{-1} \sum_{i \le nt} Y_i$, for $t \in [0, 1]$ (see Theorem 5 in [12]).
Assume we have a sample of $n$ observations $Y_1, \ldots, Y_n$. Let $\mathcal{D}$ be the space of decreasing functions on $[0, 1]$. We want to test $H_0: \lambda \in \mathcal{D}$ against $H_1: \lambda \notin \mathcal{D}$. Under the null hypothesis we can estimate $\lambda$ by the smoothed Grenander-type estimator $\hat\lambda_n^{SG}$ defined as in (29). On the other hand, under the alternative hypothesis we can estimate $\lambda$ by the kernel estimator with boundary correction $\hat\lambda_n^s$ defined in (26). Then, as a test statistic $T_n$ we take the $L_2$-distance between the two, and at level $\alpha$ we reject the null hypothesis if $T_n > c_{n,\alpha}$ for some critical value $c_{n,\alpha} > 0$. In order to use the asymptotic quantiles of the limit distribution in Theorem 4.1, we would need to estimate the constant $C_0$, which depends on the derivatives of $\lambda$. To avoid this, we choose to determine the critical value by a bootstrap procedure. We generate $B = 1000$ samples of size $n$ from the model (64), with $\lambda$ replaced by its estimator $\hat\lambda_n^{SG}$ under the null hypothesis. For each of these samples we compute the estimators $\hat\lambda_n^{SG,*}$, $\hat\lambda_n^{s,*}$ and the corresponding test statistic. Then we take as critical value the $100\alpha$-th upper percentile of the values $T_{n,1}^*, \ldots, T_{n,B}^*$. We repeat this procedure $N = 1000$ times and count the percentage of rejections. This gives an approximation of the level (or the power) of the test if we start with a sample for which the true $\lambda$ is decreasing (or not decreasing).
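The testing procedure above can be sketched as follows, with a much smaller bootstrap size $B$ and with $\sigma$ treated as known (in practice it would be estimated). The regression function, kernel, and helper names are illustrative assumptions, and no boundary correction is applied (we work on $[b, 1-b]$).

```python
import numpy as np

rng = np.random.default_rng(4)

# Triweight kernel (illustrative choice with support [-1, 1]).
def k(u):
    return np.where(np.abs(u) <= 1, (35.0 / 32.0) * (1 - u**2) ** 3, 0.0)

def trap(y, x):
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def lcm_hull(pts):
    # Least concave majorant via an upper-hull scan.
    hull = []
    for x0, y0 in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (x0 - x2) <= (y0 - y2) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((x0, y0))
    return np.array(hull)

def estimators(y, n, b, grid, u):
    # Lambda_n(t) = n^{-1} sum_{i <= nt} Y_i; its LCM gives the Grenander part.
    xs = np.arange(1, n + 1) / n
    hull = lcm_hull([(0.0, 0.0)] + list(zip(xs, np.cumsum(y) / n)))
    gren = np.gradient(np.interp(u, hull[:, 0], hull[:, 1]), u)
    # Smoothed Grenander estimator and ordinary kernel estimator on the grid.
    sg = np.array([trap(k((t - u) / b) / b * gren, u) for t in grid])
    ks = np.array([np.sum(k((t - xs) / b) / b * y) / n for t in grid])
    return sg, ks

n, b, sigma = 100, 0.1, 0.1
u = np.linspace(0.0, 1.0, 1001)
grid = np.linspace(b, 1 - b, 81)
design = np.arange(1, n + 1) / n

y = np.exp(-design) + sigma * rng.normal(size=n)   # decreasing: H0 holds
sg, ks = estimators(y, n, b, grid, u)
Tn = trap((sg - ks) ** 2, grid)                    # L2 test statistic

# Bootstrap critical value from the H0 fit (B kept small for speed;
# interpolation extends the fit constantly outside [b, 1 - b]).
sg_fit = np.interp(design, grid, sg)
B, boot = 50, []
for _ in range(B):
    yb = sg_fit + sigma * rng.normal(size=n)
    sgb, ksb = estimators(yb, n, b, grid, u)
    boot.append(trap((sgb - ksb) ** 2, grid))
crit = float(np.quantile(boot, 0.95))
reject = bool(Tn > crit)
```

Under a decreasing regression function the two estimators nearly coincide, so $T_n$ stays small; under a non-monotone alternative the isotonized fit is pulled away from the kernel fit and $T_n$ grows.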
We investigate the performance of the test by comparing it to the tests proposed in [1], [2] and [18]. For a power comparison, [1] and [2] consider a number of test functions. We denote by $T_B$ the local mean test of [2] and by $S_n^{reg}$ the test proposed in [1] on the basis of the distance between the least concave majorant of $\Lambda_n$ and $\Lambda_n$. The results of the simulations for $n = 100$, $\alpha = 0.05$, $b = 0.1$ are given in Table 1. We see that, apart from the last case, all three tests perform very well and are comparable. However, our test behaves much better for the function $\lambda_7$, which is more difficult to detect than the others.
The second model that we consider is taken from [1] and [18]: a regression function $\lambda_a$ depending on a parameter $a$. The results of the simulation, again for $n = 100$, $\alpha = 0.05$, $b = 0.1$, and various values of $a$ and $\sigma^2$, are given in Table 2. We denote by $S_n^{reg}$ the test of [1] and by $T_{run}$ the test of [18]. Note that when $a = 0$, the regression function is decreasing, so $H_0$ is satisfied. We observe that our test rejects the null hypothesis more often than $T_{run}$ and $S_n^{reg}$, but still has rejection probability smaller than 0.05. As the value of $a$ increases, the monotonicity of $\lambda_a$ is perturbed. For $a = 0.25$ our test performs significantly better than the other two and, as expected, the power decreases as the variance of the errors increases. When $a = 0.45$ and $\sigma^2$ is not too large, the three tests show optimal power, but when $\sigma^2$ increases, $T_n$ outperforms $T_{run}$ and $S_n^{reg}$. We note that the test performs the same way if, instead of the $L_2$-distance between $\hat\lambda_n^{SG}$ and $\hat\lambda_n^s$, we use the $L_1$-distance on $(0, 1)$. Indeed, in Remark 4.3 we showed that, for $p = 1$, the limit theorem holds on the whole interval $(0, 1)$. Moreover, we did not investigate the choice of the bandwidth. We take $b = 0.1$, which seems a reasonable choice considering that the whole interval has length one.

Auxiliary results and proofs

1. If $nb^5 \to 0$, then $(b\sigma^2(p))^{-1/2}$ times the centered $L_p$-error converges in distribution, where $\sigma^2(p)$ is defined in (9). 2. If $nb^5 \to C_0^2$, then the analogous statement holds with $(b\theta^2(p))^{-1/2}$. Proof. From the properties of the kernel function and $L$ we have a uniform bound, where the $O_P$ term is uniform for $t \in [0, 1]$. Hence, inequality (14) implies that it is sufficient to prove a central limit theorem for the approximating quantity. We can then write the decomposition (68). The third term on the right hand side of (68) converges to zero in probability, so it suffices to deal with the first two terms. To establish a central limit theorem for the first term, one can mimic the approach in [9] using a big-blocks-small-blocks procedure; see Lemmas A.1 and A.2 in the supplemental material [38] for details. It can be shown that the first term is a sum of random variables $\zeta_i$, which are independent and satisfy moment bounds, where $\gamma^2(p)$ is defined in (51). Next, consider the second term on the right hand side of (68). We have an expression in terms of $D$ and $\sigma_n(t)$, defined in (7) and (67), respectively, where $\phi$ denotes the standard normal density. Note that

Hence, integration by parts gives an expression for this term, where $\theta_1$ is defined in (13). We conclude that the corresponding sum converges. Moreover, the double integral of $\mathrm{Cov}\big(|X_{n,t}|^{p-1}\operatorname{sgn}(X_{n,t}),\, |X_{n,s}|^{p-1}\operatorname{sgn}(X_{n,s})\big) L'(t)L'(s)w(t)w(s)$ over $|t - s| > 2b$ vanishes, because for $|t - s| > 2b$, $X_{n,t}$ is independent of $X_{n,s}$. As a result, using that $X_{n,t}$ has bounded moments, we obtain convergence in probability, with a contribution $-p\,W(L(1))/L(1)$ times a deterministic factor. Going back to (68), we draw the following conclusions. In the case $nb^5 \to 0$, we have $g(t) = 0$ in the definition of $\theta_1(p)$ in (13). Hence, by the symmetry of the standard normal distribution, it follows that $\theta_1(p) = 0$ and, as a result, $C = 0$. According to (70) and (72), this means that the rescaled $L_p$-error converges in distribution to a mean zero normal random variable with variance $\sigma^2(p)$. Then, consider the case $nb^5 \to C_0^2 > 0$. Note that $\zeta_i$ depends only on the Brownian motion on a corresponding interval. Hence, the left hand side of (72) can be written as a sum of independent random variables, and we apply the Lindeberg-Feller central limit theorem, after verifying that the Lyapunov condition is satisfied. Once we have (73), condition (74) is equivalent to a moment bound. In order to prove this, we use that $E[\zeta_i^4] = O(M_2^2)$ (see (S6) in the proof of Lemma A.2 in the supplemental material [38]). Then, we get

It can be shown that
where $\sigma_n^2(t)$ is defined in (67) and Because $\sigma_n^2(t) \to D^2 l(t)$, where $D$ is defined in (7), $g^{(n)}(t) \to g(t)$, as defined in (5), and $b^{-1} \int k\bigl(\tfrac{t-u}{b}\bigr) l(u)\, du \to l(t)$, we find that applying the definitions of $C$ and $\theta_1(p)$ in (71) and (13), respectively. It follows from the Lindeberg-Feller central limit theorem that

Proof. We follow the same reasoning as in the proof of Lemma 8 in [37]. Let $I_{nv} = [0, 1] \cap [v - n^{-1/3} \log n,\, v + n^{-1/3} \log n]$ and, for $J = E, W$, let Then, according to Lemma 3 in [37], there exists $C > 0$, independent of $n$, $v$, $d$, such that From the proof of Lemma 8 in [37], using (76) with $d = \log n$, we have According to the assumptions on the order of $b^{-1}$, the right hand side is of order $o_P(1)$.
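For reference, the version of the Lindeberg-Feller theorem invoked here can be stated as follows; the fourth-moment (Lyapunov) form matches the bound $E[\zeta_i^4] = O(M_2^2)$ used in the proof. This is a standard textbook statement, not a quotation from the paper:

```latex
% Lindeberg-Feller CLT (Lyapunov form, \delta = 2): let Z_{n,1}, ...,
% Z_{n,k_n} be independent and centered with
% s_n^2 = \sum_i \mathrm{Var}(Z_{n,i}) \to \sigma^2 > 0.
% If the Lyapunov condition
s_n^{-4} \sum_{i=1}^{k_n} \mathbb{E}\,|Z_{n,i}|^{4} \;\longrightarrow\; 0
% holds, then s_n^{-1} \sum_{i=1}^{k_n} Z_{n,i}
% converges in distribution to N(0,1).
```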

Supplementary Material
Supplement to "Central limit theorems for global errors of smooth isotonic estimators".
• Supplement A: Kernel estimator of a decreasing function.
$n$ be as in (17). Assume that (A1) and (A3) hold. Then $(b\gamma^2(p))^{-1/2}$ where $\gamma^2(p)$, $g^{(n)}$ and $m_n^c(p)$ are defined in (51), (4) and (8), respectively.

Proof. With a change of variable we can write where First, we show that $\eta$ has no effect on the asymptotic distribution, i.e., is negligible. Using Jensen's inequality, the bound $(a + b)^p \le 2^p (a^p + b^p)$, and the fact that $l$ and $w$ are bounded, we obtain for some positive constants $C_1$ and $C_2$. On the other hand, Hence, This means that $b\eta = o_P(1)$. The statement follows immediately from Lemma A.2.
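The elementary bound $(a+b)^p \le 2^p(a^p+b^p)$ for $a, b \ge 0$ and $p \ge 1$ used above can be sanity-checked numerically. A minimal sketch, purely illustrative and not part of the proof; the helper name `cp_bound_holds` is hypothetical:

```python
import itertools

def cp_bound_holds(a: float, b: float, p: float) -> bool:
    """Check the elementary inequality (a + b)^p <= 2^p (a^p + b^p) for a, b >= 0."""
    return (a + b) ** p <= 2 ** p * (a ** p + b ** p) + 1e-12  # small slack for rounding

# Scan a grid of nonnegative a, b and exponents p >= 1.
grid = [0.0, 0.1, 0.5, 1.0, 2.5, 10.0]
assert all(cp_bound_holds(a, b, p)
           for a, b in itertools.product(grid, grid)
           for p in (1.0, 1.5, 2.0, 3.0))
print("inequality verified on grid")
```

The bound follows from $(a+b)^p \le (2\max(a,b))^p \le 2^p(a^p + b^p)$, which is why no counterexample can appear on any grid.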
With this notation we can write and we aim at showing that the first term on the right hand side of the previous equation determines the asymptotic distribution of and. Furthermore, and, as we did for $\eta$, it can be seen that $E[\xi_i^2] = O(1)$ and $E[\gamma_i^2] = O(1)$. Since $\gamma_i$ depends only on the Brownian motion on the interval $[L(b(iM_2 + 2i - 2)),\, L(b(iM_2 + 2i + 2))]$, it follows that the $\gamma_i$ are independent (note that $M_2 > 2$). Moreover, $\gamma^*$ is independent of Next, since $\zeta_i$, $i = 1, \ldots, M_3$, are independent, we apply the central limit theorem to conclude that and that they satisfy Lyapunov's condition Note that, once we have (S5), Lyapunov's condition is equivalent to $b$ and that all the moments of the $\xi_i$'s are finite, we obtain that In particular, it also follows that

Now we prove (S5). From (S1), it follows that We have already shown in the proof of the previous lemma that $E[\eta^2] = O(1)$, so the first term on the right hand side of the previous equation converges to zero. Furthermore, Now, making use of (S4), (S7) and the fact that, by Cauchy-Schwarz, This means that

Moreover, from Lemma A.3, it follows that
where $\rho_n(t, u)$ and $\sigma_n(t)$ are defined in (S9) and (S8), respectively. First we consider the case $nb^5 \to 0$ and show that we can remove the $g^{(n)}$ functions from the previous integral. Indeed, since Note that, if $|t - u| \ge 2b$, then $\rho_n(t, u) = 0$ and the previous integrands are equal to zero. Hence, a sufficient condition for the left hand side of the previous inequality to converge to zero is to have This is indeed the case because $g^{(n)}(u) = O\bigl((nb)^{1/2} b^2\bigr)$ uniformly w.r.t. $u$ and $(nb)^{1/2} b^2 \to 0$.
In the same way we can also remove the other $g^{(n)}$ functions from the integrand, i.e.
With the change of variable $t = u + sb$, we get where $r(s)$ is defined in (7). The continuity of the functions $l$ and $w$ and the dominated convergence theorem yield Then, with the change of variable $z = y\, r(s) + \sqrt{1 - r^2(s)}\, x$, we can write equivalently where $\sigma_1$ is defined in (10).

Let us now consider the case $nb^5 \to c_0^2 > 0$. First we show that the $g^{(n)}(u)$ functions can be replaced by $g(u)$, defined in (5). Indeed, $g^{(n)}(u) = g(u) + o\bigl((nb)^{1/2} b^2\bigr)$, where the small-$o$ term is uniform w.r.t. $u$, and calculations similar to those of the previous case allow us to conclude that With the change of variable $t = u + sb$, we get

Proof. First, note that
$$(X_{n,t}, X_{n,u}) \sim N\!\left( \begin{pmatrix} g^{(n)}(t) \\ g^{(n)}(u) \end{pmatrix}, \begin{pmatrix} \sigma_n^2(t) & \sigma_n(t, u) \\ \sigma_n(t, u) & \sigma_n^2(u) \end{pmatrix} \right).$$
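The substitution $z = y\, r(s) + \sqrt{1 - r^2(s)}\, x$ is the standard device for representing a correlated standard Gaussian pair through independent ones. A minimal numerical sketch, with a hypothetical correlation value `r` standing in for $r(s)$:

```python
import numpy as np

r = 0.6  # hypothetical correlation, standing in for r(s)

# If X, Y are independent N(0,1) and Z = r*Y + sqrt(1 - r^2)*X, then
# (Y, Z) is a centered Gaussian pair with unit variances and correlation r.
# Equivalently, the lower-triangular matrix A below satisfies
# A @ A.T = [[1, r], [r, 1]] (it is the Cholesky factor of that matrix).
A = np.array([[1.0, 0.0],
              [r, np.sqrt(1.0 - r ** 2)]])
cov = A @ A.T
assert np.allclose(cov, [[1.0, r], [r, 1.0]])
print("transform reproduces the target covariance")
```

This is why the double integral over the bivariate normal density can be rewritten as an integral against two independent standard normal densities.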
Hence, we have Consequently, we obtain

Proof of Proposition 3.3. We first prove (i). For each $t \in [0, b)$, we have uniformly in $t \in [0, b]$, and that, according to (21), Moreover, for $t \le b/2$, Now, define the event Then, $P(A_n) \to 1$ and, on the event $A_n$, $|\lambda_n^s(t) - \lambda(t)| \ge C/2$. Consequently, we obtain for some $c > 0$. Hence In order to prove (ii), due to (14), we can bound According to (S10), uniformly in $t \in [0, b]$. Hence, we obtain because $n^{(p-1)/2} b^{p/2} = (b n^{1-1/p})^{p/2} \to 0$. Next we deal with (iii). Again by means of (14), we can bound and, as in (S10), Together with (S12), we obtain Because $n^{-1+1/q} b^{-1} = O(1)$, the term within the brackets is of order $O_P(1)$, and since $b^{p-1} n^{p-2+2/q} \to 0$, the right hand side tends to zero. This proves (25). Then, by Jensen's inequality, we get (S13). Note that $Y_n(t) \sim N(0, \sigma_n^2(t))$, where, if $B_n$ is a Brownian motion, and if $B_n$ is a Brownian bridge.

Now, choose $\epsilon > 0$. Then For $c > 0$, define the events and let Then, since $Y_n$ has continuous paths, we have Moreover, $Y_n(t) > 0$ on the event $A_{n1}$, and from (23) it follows that $Y_n(t) + g^{(n)}(t) < 0$ for $n$ sufficiently large. Therefore, for $n$ sufficiently large, we have on $A_{n1}$, Similarly, $Y_n(t) < 0$ on the event $A_{n2}$ and $Y_n(t) + g^{(n)}(t) < 0$ for large $n$, so that on $A_{n2}$, Next, write (S16). Consider the first term on the right hand side. Because, for $n$ large, $Y_n(t) + g^{(n)}(t) < 0$ on the event $A_{n1}$, we have $|Y_n(t) + g^{(n)}(t)| \le |g^{(n)}(t)|$. It follows that, on the event $A_{n1} \cap B_n$: This means that we can remove the absolute value signs in the first term on the right hand side of (S16). Similarly, $Y_n(t) + g^{(n)}(t) < 0$ for $n$ sufficiently large on the event $A_{n2}$, so that on the event $A_{n2} \cap B_n^c$:
$$\int_0^{cb} |Y_n(t) + g^{(n)}(t)|^p \, d\mu(t) \ge \int_0^{cb} |g^{(n)}(t)|^p \, d\mu(t) \ge \int_0^{cb} \mathbb{E}\,|Y_n(t) + g^{(n)}(t)|^p \, d\mu(t),$$
so that we can also remove the absolute value signs in the second term on the right hand side of (S16).
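Moments of the form $E|Y_n(t) + g^{(n)}(t)|^p$ with Gaussian $Y_n(t)$ recur throughout this argument; in the centered case the $p$-th absolute moment has the classical closed form $E|Z|^p = 2^{p/2}\,\Gamma((p+1)/2)/\sqrt{\pi}$ for $Z \sim N(0,1)$. A quick numerical confirmation, purely illustrative:

```python
import math
import numpy as np

def abs_moment_std_normal(p: float) -> float:
    """Closed form E|Z|^p for Z ~ N(0,1): 2^{p/2} Gamma((p+1)/2) / sqrt(pi)."""
    return 2.0 ** (p / 2) * math.gamma((p + 1) / 2) / math.sqrt(math.pi)

# Check the closed form against trapezoidal quadrature of |x|^p * phi(x);
# the integrand is negligible outside [-12, 12].
x = np.linspace(-12.0, 12.0, 200001)
phi = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
for p in (1.0, 2.0, 3.5):
    vals = np.abs(x) ** p * phi
    numeric = float(np.sum((vals[1:] + vals[:-1]) / 2) * (x[1] - x[0]))
    assert abs(numeric - abs_moment_std_normal(p)) < 1e-6
print("closed form matches quadrature")
```

For $p = 2$ the formula returns exactly $1$, the variance of a standard normal, which is a convenient sanity check.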
It follows that the right hand side of (S16) is equal to by using (21) and (22). Furthermore, for the first term on the right hand side where $\epsilon_n(t) = \epsilon/(2 g^{(n)}(t)) = O((nb)^{-1/2}) \to 0$, due to (20), (21) and (22), where the big-$O$ term is uniform in $t \in [0, b]$. This means that, for $n$ large, $1 + \epsilon_n(t) > 0$, and by a Taylor expansion, $|1 + \epsilon_n(t)|^p = 1 + p\,\epsilon_n(t) + O((nb)^{-1})$. It follows that Going back to (S13), since $P(A_{n1}) \to 1$ and $P(A_{n2}) \to 1$, we conclude that $b^{-1} \operatorname{Var}$ Furthermore, where $\varphi$ denotes the standard normal density. This proves (S20) for the case that $B_n$ is a Brownian motion. When $B_n$ in (A2) is a Brownian bridge, we use the representation $B_n(u) = W_n(u) - u\, W_n(L(1))/L(1)$, for some Brownian motion $W_n$. In this case, by means of (14), we can bound For $t \in I_3$, $\Lambda_n^*(t) = \Lambda_n^s(t)$, so (S24) is trivial. For $t \in I_2$, by the mean value theorem, for some $\xi_t \in (t, b^\gamma) \subset (b, b^\gamma)$. Thus, $P\bigl(\Lambda_n^*(t) \ge \Lambda_n^s(t) \text{ for all } t \in I_2\bigr) \ge P(A_n) \ge 1 - \delta/10$, for $n$ sufficiently large, according to Lemma B.1. The argument for $I_4$ is exactly the same.

Next, we consider $t \in I_1$. We have where $\Lambda^s$ is the deterministic version of $\Lambda_n^s$, For the first term on the right hand side of (S25), note that due to Assumption (A2). Moreover, for the third term on the right hand side of (S25), for $t \in (b, 1 - b)$, we have For the second term on the right hand side of (S25), for $t \in [0, b)$, we write where $\xi_t \in (t, b^\gamma)$. Furthermore, the first two integrals on the right hand side can be written as with $\xi_t \in (t, b^\gamma)$, $|\xi_{1,y} - b^\gamma| \le by$ and $|\xi_{2,y} - t| \le by$. This means that $P\bigl(\Lambda_n^*(t) - \Lambda_n^s(t) \ge 0 \text{ for all } t \in I_1\bigr) \ge P\bigl(Y_n \le \cdots\bigr)$. Hence, for $n$ large enough, this probability is greater than $1 - \delta/10$, because $\gamma < 1$.
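The first-order expansion $|1 + \epsilon_n(t)|^p = 1 + p\,\epsilon_n(t) + O(\epsilon_n(t)^2)$ used above can be checked numerically: halving the perturbation should roughly quarter the remainder. A small illustrative sketch; the helper name `taylor_gap` is hypothetical:

```python
def taylor_gap(p: float, eps: float) -> float:
    """Remainder of the first-order expansion |1 + eps|^p ~ 1 + p*eps."""
    return abs(abs(1 + eps) ** p - (1 + p * eps))

# For small eps the remainder behaves like p(p-1)/2 * eps^2, so halving
# eps should divide the gap by roughly 4.
p = 2.5
g1, g2 = taylor_gap(p, 1e-3), taylor_gap(p, 5e-4)
assert g1 < 1e-5 and g2 < 1e-5
assert 3.0 < g1 / g2 < 5.0  # ratio close to 4, consistent with quadratic order
print("second-order remainder confirmed")
```

With $\epsilon_n(t) = O((nb)^{-1/2})$, this quadratic remainder is exactly the $O((nb)^{-1})$ term in the text.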