Weak convergences of marked empirical processes in a Hilbert space and their applications

In this paper, weak convergences of marked empirical processes in $L^2(\mathbb{R},\nu)$ and their applications to statistical goodness-of-fit tests are provided, where $L^2(\mathbb{R},\nu)$ is the set of equivalence classes of the square integrable functions on $\mathbb{R}$ with respect to a finite Borel measure $\nu$. The results obtained in our framework of weak convergences are, in the topological sense, weaker than those in the Skorokhod topology on a space of c\`adl\`ag functions or the uniform topology on a space of bounded functions, which have been well studied in previous works. However, our results have the following merits: (1) avoiding conditions which do not suit our purpose; (2) treating a weight function which makes it possible to propose an Anderson--Darling type test statistic for goodness-of-fit tests. Indeed, the applications presented in this paper are novel.


Introduction and main results
This paper deals with the weak convergence of a certain sequence of marked empirical processes in an $L^2$ space. Let us begin by preparing a minimal set of notations to describe our main theorems and the scientific background around our results. Let $\nu$ be a finite Borel measure on $\mathbb{R}$, and $L^2(\mathbb{R},\nu)$ the set of equivalence classes of the square integrable functions on $\mathbb{R}$ with respect to $\nu$. The space $L^2(\mathbb{R},\nu)$ is equipped with the inner product $\langle f,g\rangle = \int_{\mathbb{R}} f(x)g(x)\,\nu(dx)$ for $f,g \in L^2(\mathbb{R},\nu)$ and the norm $\|f\| = \langle f,f\rangle^{1/2}$ for $f \in L^2(\mathbb{R},\nu)$. For an interval $A$, the function $1_A(\cdot)$ is defined by $1_A(x) = 1$ if $x \in A$ and $1_A(x) = 0$ if $x \notin A$.
For every positive integer $n$, let us introduce a filtered probability space $(\Omega^n, \mathcal{F}^n, \mathbb{F}^n = \{\mathcal{F}^n_i\}_{i\geq 0}, P^n)$. Let $\{X^n_i\}_{i\geq 0}$ be a real-valued $\mathbb{F}^n$-adapted sequence and $\{m^n_i\}_{i\geq 1}$ a real-valued $\mathbb{F}^n$-martingale difference sequence (thus for every $i$, $m^n_i$ is $\mathcal{F}^n_i$-measurable and $E^n[m^n_i \mid \mathcal{F}^n_{i-1}] = 0$ almost surely). In this paper, we show weak convergences in $L^2(\mathbb{R},\nu)$ of an empirical process marked by the martingale difference sequence $\{m^n_i\}_{i\geq 1}$,
$$x \mapsto Z_n(x) = \sum_{i=1}^n 1_{(-\infty,x]}(X^n_{i-1})\, m^n_i,$$
and its weighted process with weight function $x \mapsto w(x)\ (>0)$ to Gaussian processes $G$ and $G^w$, respectively. The limits are $x \mapsto G(x) = B(\Psi(x))$, where $x \mapsto B(x)$ is a standard Brownian motion and $\Psi$ is the limit (for the exact sense of the limit, see Assumptions 1.1 and 1.2 below) of
$$I_n(x) = \sum_{i=1}^n E^n[(m^n_i)^2 \mid \mathcal{F}^n_{i-1}]\, 1_{(-\infty,x]}(X^n_{i-1}).$$
First, we provide a sufficient condition to show the weak convergence of $Z_n$.
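To fix ideas, the process $Z_n$ is straightforward to compute from data. The following sketch (not part of the paper's formal development) evaluates $Z_n$ on a grid for a simulated sample, taking $m^n_i = \varepsilon_i/\sqrt{n}$; the AR(1)-type dynamics, the coefficient $0.5$, and the grid are all hypothetical, illustrative choices.

```python
import numpy as np

def marked_empirical_process(X_lag, m, grid):
    """Z_n(x) = sum_{i=1}^n 1{X_{i-1} <= x} * m_i, evaluated at each x in grid."""
    ind = (X_lag[None, :] <= grid[:, None]).astype(float)  # rows: grid points, cols: i
    return ind @ m

rng = np.random.default_rng(0)
n = 1000
eps = rng.standard_normal(n + 1)
X = np.zeros(n + 1)
for i in range(1, n + 1):
    X[i] = 0.5 * X[i - 1] + eps[i]  # hypothetical AR(1) dynamics

m = eps[1:] / np.sqrt(n)            # martingale differences (the 1/sqrt(n) scaling sits inside m)
X_lag = X[:-1]                      # the process is marked at the lagged values X_{i-1}
grid = np.append(np.linspace(-4.0, 4.0, 81), np.inf)
Z = marked_empirical_process(X_lag, m, grid)
```

At $x = +\infty$ the indicator is identically one, so $Z_n(\infty)$ reduces to the full martingale sum $\sum_{i=1}^n m^n_i$, which gives a quick sanity check on the implementation.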
(iii) There exists a measurable function $\phi$ on $\mathbb{R}$ such that for every $n \in \mathbb{N}$ and $i = 1, \ldots, n$ there exist some nonnegative constants $c^n_i$ such that $\sup_n \sum_{i=1}^n c^n_i < \infty$ and that $E^n[(m^n_i)^2 \mid \mathcal{F}^n_{i-1}] \leq c^n_i \phi(X^n_{i-1})$ almost surely.
(iv) All $X^n_i$'s have the same distribution as $\zeta$, which satisfies $E[\phi(\zeta)] < \infty$.
The first goal of this paper is to show the following theorem, which asserts the weak convergence of $Z_n$ under Assumption 1.1. Its proof will be presented in Section 2.
Theorem 1.1. Under Assumption 1.1, $Z_n$ converges weakly to $G$ in $L^2(\mathbb{R},\nu)$ as $n \to \infty$.
Remark 1.1. An important point of Theorem 1.1 is that we avoid the assumption (B) in Lemma 3.1 of Koul and Stute (1999), which places a restriction on the transition density of a discrete time Markovian process and does not suit the diffusion process model considered in Section 4. Although Escanciano (2007) gave a result for a non-Markovian process, he assumed a smoothness condition on the model (the condition (D) in his Theorem 1) which also does not fit our purpose. However, notice that our result does not cover theirs, because they considered the weak convergence under the uniform metric.
Next, we provide a sufficient condition to show the weak convergence of $Z^w_n$. Obviously, if we set $w(\cdot) = 1$, then $Z^w_n$ becomes $Z_n$. However, we stated Theorem 1.1 separately, because (1.2) is stronger than (1.1).
(ii) There exists a constant $\delta > 0$ such that as $n \to \infty$, and there exists a function $\Lambda$ such that for all sufficiently large $n$ and that
The second goal of this paper is to show the following theorem, which asserts the weak convergence of $Z^w_n$ under Assumption 1.2. Its proof will be presented in Section 3. From the practical viewpoint, the case where $w = \Psi^{-1/2}$ is important, since it corresponds to standardization.
Theorem 1.2. Under Assumption 1.2, $Z^w_n$ converges weakly to $G^w$ in $L^2(\mathbb{R},\nu)$ as $n \to \infty$.
Remark 1.2. As for (1.2), it follows from a well-known fact on uniform integrability that (1.2) holds if the corresponding condition holds for every $x \in \mathbb{R}$; it is also equivalent to $I_n(x) \to \Psi(x)$ for every $x \in \mathbb{R}$.
Remark 1.3. As for (1.3), if we assume Assumption 1.1 (iii) and (iv), then we can take $\Phi(x)$ to be the right-hand side of the above display (if the integrability condition holds).
Based on Theorems 1.1 and 1.2, which are of interest in their own right, we discuss goodness-of-fit tests for stationary ergodic processes. Specifically, we propose a Cramér-von Mises type statistic based on discrete time observations to test a simple hypothesis for a diffusion process, and an Anderson-Darling type statistic for a time series. Goodness-of-fit tests have been extensively studied in the literature because they are useful for judging whether a mathematical model is acceptable for describing sampled data. We refer to González-Manteiga and Crujeiras (2013) for a review of goodness-of-fit tests, whose Section 5 is devoted to tests when dependence is present. Among the abundant works treating goodness-of-fit tests for stochastic process models, we are interested in an approach based on empirical processes marked by residuals developed by Koul and Stute (1999) and Escanciano (2007). Our limit theorems (Theorems 1.1 and 1.2) do not include Theorem 2.1 of Koul and Stute (1999) or Theorem 1 of Escanciano (2007), but our results have the following merits, which are important in our applications. The assumptions of Theorem 2.1 of Koul and Stute (1999) and of Theorem 1 of Escanciano (2007) do not suit our diffusion process setting, whereas Theorem 1.1 can be applied. Moreover, although the weak convergence of an Anderson-Darling type test statistic cannot be directly derived from the weak convergence in the Skorokhod space or the $\ell^\infty$ space established in Koul and Stute (1999) and Escanciano (2007), Theorem 1.2 enables us to consider the Anderson-Darling type test statistic. As a result, the applications presented in this paper are novel. Remark 1.4. Based on smoothing, Nishiyama (2000, 2009) and Masuda et al. (2011) proposed Kolmogorov-Smirnov type goodness-of-fit tests for time series models and for diffusion processes (based on discrete time observations), respectively.
What they treated is not $Z_n(x)$ but its smoothed version using kernel density estimation. Remark 1.5. The goodness-of-fit test for diffusion processes based on continuous time observation, which is not studied in this paper, has been considered in several works; see, for example, Dachian and Kutoyants (2008), Kutoyants (2010), Negri and Nishiyama (2009) and references therein. Remark 1.6. In this paper, we only consider simple hypotheses. Koul and Stute (1999) considered not only simple hypotheses but also parametric composite hypotheses based on the idea of the martingale transformation (Khmaladze, 1981). Considering parametric composite hypotheses is a possible direction for future research.

Proof of Theorem 1.1
By Prohorov's tightness criterion for Hilbert space valued random sequences (see, e.g., Theorem 1.8.4 of van der Vaart and Wellner (1996)), it suffices to show the following two lemmas.
Proof of Lemma 2.1
We shall apply the martingale central limit theorem to the martingale difference sequence.
It is not difficult to prove that Assumption 1.1 (i) leads to this convergence. What is left is to show the Lyapunov-type condition: the left-hand side of (2.2) is bounded above by a quantity which converges to 0 in probability by Assumption 1.1 (ii). This completes the proof.

Proof of Lemma 2.2
For simplicity, we use the notation below. It then follows from Assumption 1.1 (iii) and (iv) that the claimed bound holds. This completes the proof.

Proof of Theorem 1.2
By Prohorov's criterion, it suffices to show the following two lemmas.
In the proofs of these lemmas, let n be sufficiently large such that (1.2) and (1.4) hold.

Proof of Lemma 3.1
Since the identity below holds, we shall apply the martingale central limit theorem to the martingale difference sequence. We use the dominated convergence theorem to see the convergence for every $x$ and $y$. Moreover, it holds for every $x$ and $y$ that the corresponding bound is valid, where we have used $\Psi(x \wedge y) \leq \Psi(x)\Psi(y)$ and $\Phi(x \wedge y) \leq \Phi(x)\Phi(y)$, which follow from the monotonicity of $\Psi$ and $\Phi$. Furthermore, it follows from the Schwarz inequality that the integrand is dominated by an integrable function. Therefore, the dominated convergence theorem implies (3.2).
Next we verify the Lyapunov-type condition, that is to say, that the nonnegative valued random variable below converges to 0 in probability. By the Schwarz inequality, it suffices to show that the displayed quantity converges to 0. Moreover, this quantity can be bounded from above, so the dominated convergence theorem yields that the right-hand side converges to 0. Indeed, as for the integrand, the pointwise convergence holds for every $x$. This completes the proof.

Proof of Lemma 3.2
In this subsection, we use the notation below for simplicity. It holds that the decomposition (3.4) is valid. As for the first term on the right-hand side of (3.4), the dominated convergence theorem yields the convergence, where $wB \circ \Psi$ means $w(\cdot)B(\Psi(\cdot))$. That is because for every $x \in \mathbb{R}$ we have $I_n(x)(w(x))^2 \to \Psi(x)(w(x))^2$ and $I_n(x)(w(x))^2 \leq \Phi(x)(w(x))^2$, whose right-hand side is $\nu$-integrable.
As for the second term on the right-hand side of (3.4), since $\{\langle \xi^n_i, e_j \rangle\}_{i=1}^n$ is a martingale difference sequence, we have the identity below. The dominated convergence theorem then yields the convergence; that is because, as for the integrand, the pointwise convergence holds for every $x$ and $y$, together with an integrable dominating function. From what has already been proven, the limit follows. Finally, the dominated convergence theorem yields that (3.6) equals the claimed limit. This completes the proof.

Application 1: Cramér-von Mises type goodness-of-fit test for drift parameters in diffusion processes
In this section, we show the application of Theorem 1.1 to the goodness-of-fit test for a diffusion process model.

Problem setting and test procedure
We consider a strictly stationary ergodic stochastic process $\{X_t\}_{t \geq 0}$ which is a solution to the one-dimensional stochastic differential equation (SDE)
$$dX_t = S(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad (4.1)$$
where the random variable $X_0$ is an almost surely finite initial value, $S(\cdot)$ is the measurable function of interest, $\sigma(\cdot)$ is a known measurable function, and $t \mapsto W_t$ is a standard Wiener process defined on a stochastic basis $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,\infty)}, P)$. Let us list some assumptions on the functions $S(\cdot)$ and $\sigma(\cdot)$.
In our problem, from the continuous stochastic process (4.1), $\{X_{t^n_i}\}_{i=1}^n$ is observed at discrete time points $0 = t^n_0 < t^n_1 < \cdots < t^n_n$ satisfying
$$t^n_n \to \infty, \qquad n\Delta_n^2 \to 0 \qquad (4.2)$$
as $n \to \infty$, where $\Delta_n = \max_{1 \leq i \leq n} |t^n_i - t^n_{i-1}|$.
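For intuition, the sampling scheme (4.2) is easy to realize with equidistant time points $t^n_i = i\Delta_n$ and $\Delta_n = n^{-2/3}$, since then $t^n_n = n^{1/3} \to \infty$ while $n\Delta_n^2 = n^{-1/3} \to 0$. The following sketch simulates such discrete observations of an SDE of the form (4.1) by the Euler-Maruyama scheme; the Ornstein-Uhlenbeck choice $S(x) = -x$, $\sigma \equiv 1$ is a hypothetical example, not a model used in the paper.

```python
import numpy as np

def euler_maruyama(S, sigma, x0, times, rng):
    """Approximate observations of dX_t = S(X_t) dt + sigma(X_t) dW_t at the given times."""
    X = np.empty(len(times))
    X[0] = x0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        dW = np.sqrt(dt) * rng.standard_normal()
        X[i] = X[i - 1] + S(X[i - 1]) * dt + sigma(X[i - 1]) * dW
    return X

rng = np.random.default_rng(1)
n = 10_000
Delta_n = n ** (-2.0 / 3.0)
times = Delta_n * np.arange(n + 1)  # equidistant scheme: t_n^n -> infinity, n * Delta_n^2 -> 0
X = euler_maruyama(lambda x: -x, lambda x: 1.0, 0.0, times, rng)  # hypothetical OU model
```

For $n = 10{,}000$ this gives a terminal time $t^n_n \approx 21.5$ and $n\Delta_n^2 \approx 0.046$, so both requirements of (4.2) are visibly on their way to their limits.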
Remark 4.2. We propose asymptotically distribution-free tests based on the sampling scheme (4.2), namely, high frequency data. We should mention that there is a huge literature on discrete time approximations of statistical estimators for diffusion processes; see, for example, the Introduction of Gobet et al. (2004) for a review including not only high frequency cases but also low frequency cases. In our context of goodness-of-fit testing, however, it seems difficult to obtain asymptotically distribution-free results based on low frequency data. Our result for this problem is related to the preceding work of Masuda et al. (2011), who considered some Kolmogorov-Smirnov type tests based on smoothing. The ideal assertion for the Kolmogorov-Smirnov type tests is still an open problem because it needs a weak convergence theorem in $\ell^\infty(\mathbb{R})$. Under the setting above, the problem is to conduct a goodness-of-fit test of (4.1), that is to say, we wish to test the null hypothesis $H_0: S = S_0$ versus $H_1: S \neq S_0$ for a given $S_0$, with $\sigma$ being a known function. Let us define the test statistic $D_n$. As shown in the next subsection, the asymptotic null distribution of $D_n$ is $\int_0^1 |B(u)|^2\,du$. (4.4)
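Critical values for the limit (4.4) can be obtained by Monte Carlo. The sketch below simulates Brownian paths on a grid and approximates $\int_0^1 |B(u)|^2\,du$ by a Riemann sum; the grid size and number of replications are arbitrary choices, and the mean of the limit, $\int_0^1 u\,du = 1/2$, serves as a check on the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
reps, grid = 5000, 500
# Brownian paths on [0, 1]: cumulative sums of independent N(0, 1/grid) increments
B = np.cumsum(rng.standard_normal((reps, grid)) / np.sqrt(grid), axis=1)
samples = (B ** 2).mean(axis=1)       # Riemann approximation of int_0^1 B(u)^2 du
mean_est = samples.mean()             # should be close to E int_0^1 B(u)^2 du = 1/2
crit_95 = np.quantile(samples, 0.95)  # approximate 95% critical value
```

A level-$5\%$ test would then reject $H_0$ when the observed $D_n$ exceeds `crit_95`.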

Justification of proposed procedure
Let us asymptotically justify our test procedure. Let us introduce the notation below. Suppose that $H_0$ is true. Then, as will be seen in the proof of Proposition 4.1, the sequence $\{\widetilde{m}^n_i\}_{i=1}^n$ is close to $\{m^n_i\}_{i=1}^n$, which is a martingale difference sequence with respect to the filtration $\{\mathcal{F}_{i-1}\}_{i=1}^{\infty}$, and Theorem 1.1 yields the weak convergence in $L^2(\mathbb{R}, \Psi_{S_0,\sigma})$ of the corresponding process, which will be denoted by $M^b_n(\cdot)$. Proposition 4.1. Let $\nu$ be any finite Borel measure on $\mathbb{R}$. Assume (A1) and (A2). Then, $U_n(\cdot; S)$ converges weakly in $L^2(\mathbb{R}, \nu)$ to $B \circ \Psi_{S,\sigma}(\cdot)$ as $n \to \infty$ with (4.2), where $B(\cdot)$ is a standard Brownian motion and $\Psi_{S,\sigma}$ is defined in (4.5). Proof. From (4.2), it is easy to see that $|U_n(\cdot; S) - M^a_n(\cdot)|$ converges in probability to zero under the uniform metric, and thus also under the $L^2(\mathbb{R}, \nu)$-metric.
Let us show that $M^a_n(\cdot) - M^b_n(\cdot)$ converges weakly in $L^2(\mathbb{R}, \nu)$ to zero (the degenerate random field) and that $M^b_n(\cdot)$ converges to $B \circ \Psi_{S,\sigma}(\cdot)$; then the assertion of the proposition follows from Slutsky's lemma. To show these two weak convergence claims, we shall apply Theorem 1.1 to
$$\xi^n_i(x) = 1_{(-\infty,x]}(X_{t^n_{i-1}}) \int_{t^n_{i-1}}^{t^n_i} (\sigma(X_s) - \sigma(X_{t^n_{i-1}}))\,dW_s \quad (i = 1, \ldots, n) \qquad (4.6)$$
and
$$\xi^n_i(x) = 1_{(-\infty,x]}(X_{t^n_{i-1}})\, m^n_i \quad (i = 1, \ldots, n), \qquad (4.7)$$
respectively. The condition (i) in Assumption 1.1 for (4.6), where the limit is zero, is clear, while that for (4.7) can be proven by using Lemma 4.1 (iii). The condition (ii) in Assumption 1.1 is indeed satisfied. The conditions (iii) and (iv) in Assumption 1.1 are immediate from the stationarity (as for (4.6), use also the latter inequality of Lemma 4.1 (i)). This completes the proof.
The limit random variable satisfies the distributional identity below, where the notation $=^d$ means that the distributions are the same. So, by using the continuous mapping theorem, we obtain the following corollary.
Corollary 4.1. Suppose that (A1) and (A2) are satisfied for a given, specific S 0 and a known σ. If H 0 is true, then D n converges in distribution to (4.4) as n → ∞ with (4.2).
To close this subsection, let us mention the consistency of the test. Let us write the alternative hypothesis of interest as (4.8). Hereafter, (4.8) is assumed to be true. Observe the decomposition below. By using Proposition 4.1 and the continuous mapping theorem, the second term on the right-hand side is $O_P(1)$. To prove that the probability that the first term is bounded by $M$ tends to zero as $n \to \infty$ for any $M > 0$, let us first see the convergence for every $x \in \mathbb{R}$, which follows from Lemma 4.1 (iii) presented in the next subsection. It is easy to show that this convergence holds uniformly in $x$. Therefore, it holds that $P(D_n > M) = P(\Psi_{S_0,\sigma}(\infty) D_n^{1/2} > \Psi_{S_0,\sigma}(\infty) M^{1/2}) \to 1$ for any constant $M > 0$.

A technical lemma
In this subsection, we show the following lemma which has already been used.
(i) There exists a constant $C_{p,S,\sigma} > 0$, depending only on $p$ and $(S, \sigma)$, such that the moment bound below holds. (ii) For given $p$ Lipschitz continuous functions $g = (g_1, \ldots, g_p)$, there exists a constant $C_{p,g,S,\sigma} > 0$, depending also on $(S, \sigma)$, such that if $|t^n_i - t^n_{i-1}| \leq 1$ then the corresponding bound holds. (iii) Assume that $X$ is ergodic with the absolutely continuous invariant distribution $\mu$. Let $x \in \mathbb{R}$ and $p-1$ Lipschitz continuous functions $g = (g_1, \ldots, g_{p-1})$ such that $\prod_{j=1}^{p-1} g_j$ is $\mu$-integrable be given. If $\Delta_n \to 0$, then the stated convergence holds. (This assertion is true also for $p = 1$ if we read $\prod_{j=1}^{0} g_j \equiv 1$.) Proof. The assertion (i) is well-known; see, for example, Kessler (1997). The assertion (ii) can be proven by using (i). Let us show (iii). We write $g(z) = \prod_{j=1}^{p-1} g_j(z)$. We may assume that all $g_j$'s are nonnegative without loss of generality. (For the general case, notice that $g$ is represented as a sum of terms of the form $a \prod_{j=1}^{p-1} \widetilde{g}_j$, where $\widetilde{g}_j = g_j \vee 0$ or $(-g_j) \vee 0$ and $a = 1$ or $-1$.) For any $\varepsilon > 0$, choose two Lipschitz continuous functions $l, u$ such that $l \leq 1_{(-\infty,x]} \leq u$ and that $\int_{\mathbb{R}} |u(z) - l(z)|\, g(z)\,\mu(dz) < \varepsilon$. Then the upper estimate holds. By doing the same argument with $u$ replaced by $l$, we finally obtain the two-sided estimate. Since the choice of $\varepsilon$ is arbitrary, we have proven the assertion of (iii). This completes the proof.

Application 2: Anderson-Darling type goodness-of-fit test for nonlinear time series
In this section, we show the application of Theorem 1.2 to the goodness-of-fit test for a Markovian nonlinear time series model.

Problem setting and test procedure
We consider a strictly stationary ergodic stochastic process $\{X_i\}_{i=-\infty}^{\infty}$ given by
$$X_i = S(X_{i-1}) + \sigma(X_{i-1})\,\varepsilon_i, \qquad (5.1)$$
where $S(\cdot)$ is a measurable function, $\sigma(\cdot)$ is a known measurable function satisfying $\inf_{x \in \mathbb{R}} \sigma(x) > 0$, and $\{\varepsilon_i\}_{i=-\infty}^{\infty}$ is an unobserved i.i.d. sequence of absolutely continuous random variables satisfying $P(\varepsilon_1 \leq 0) = 1/2$, with $\varepsilon_i$ independent of $X_{i-1}$ for all $i \in \mathbb{Z}$. In this section, no moment condition on $\varepsilon_1$ is assumed.
Let us introduce the following assumption on S(·) and σ(·).
(B) The process $\{X_i\}_{i=-\infty}^{\infty}$ is stationary and ergodic with the absolutely continuous invariant law $\mu_{S,\sigma}$, where the ergodicity is in the sense of the almost sure convergence, that is to say,
$$\frac{1}{n}\sum_{i=1}^n g(X_i) \to \int_{\mathbb{R}} g(x)\,\mu_{S,\sigma}(dx) \quad \text{almost surely}$$
for every $\mu_{S,\sigma}$-integrable function $g(\cdot)$. Moreover, the distribution function $\Psi_{S,\sigma}$ of $\mu_{S,\sigma}$ satisfies $\int_{\mathbb{R}} \frac{\mu_{S,\sigma}(dx)}{\Psi_{S,\sigma}(x)} < \infty$.
In our problem, from the stochastic process (5.1), a time series {X i } n i=0 is observed.
Under the setting above, the problem is to conduct a goodness-of-fit test of (5.1), that is to say, we wish to test the null hypothesis $H_0: S = S_0$ versus $H_1: S \neq S_0$ for a given $S_0$, with $\sigma$ being a known function. Let us define the test statistic $T_n$, where $\mathrm{sign}(x) = -1_{(-\infty,0)}(x) + 1_{(0,\infty)}(x)$. As shown in the next subsection, the asymptotic null distribution of $T_n$ is given in (5.3). Remark 5.1. Our statistic contains $\mathrm{sign}(\cdot)$ along the lines of Erlenmaier (1997) and Section 7.3 of Nishiyama (2000). Of course, if the corresponding required condition on $\{\varepsilon_i\}_{i=1}^{\infty}$ is satisfied, other functions mentioned in Koul and Stute (1999) can be used. Some examples are $f(x) = x$, $f(x) = 1_{(0,\infty)}(x) - (1 - \alpha)$, and other bounded functions. A merit of $f(\cdot) = \mathrm{sign}(\cdot)$ is robustness against outliers. Remark 5.2. Our procedure can be regarded as an Anderson-Darling type statistic in the sense of
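Since the displayed definition of $T_n$ is not reproduced above, the following sketch computes one plausible empirical version of an Anderson-Darling type statistic: the marked process built from $m^n_i = \mathrm{sign}(X_i - S_0(X_{i-1}))/\sqrt{n}$ is squared and averaged over the data, with the empirical distribution function standing in for $\Psi_{S_0,\sigma}$. The drift $S_0(x) = 0.5\tanh(x)$, the choice $\sigma \equiv 1$, and the Cauchy innovations (median zero, no moments, as this section permits) are all hypothetical; this is not claimed to be the paper's exact statistic.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
S0 = lambda x: 0.5 * np.tanh(x)       # hypothetical null drift
eps = rng.standard_cauchy(n + 1)      # absolutely continuous, P(eps <= 0) = 1/2, no moments
X = np.zeros(n + 1)
for i in range(1, n + 1):
    X[i] = S0(X[i - 1]) + eps[i]      # model (5.1) with sigma = 1, simulated under H_0

m = np.sign(X[1:] - S0(X[:-1])) / np.sqrt(n)   # m^n_i = sign(X_i - S_0(X_{i-1})) / sqrt(n)
X_lag = X[:-1]

# marked empirical process evaluated at the ordered data points
xs = np.sort(X_lag)
V = np.array([m[X_lag <= x].sum() for x in xs])
Psi_hat = np.arange(1, n + 1) / n     # empirical CDF as a proxy for Psi_{S_0, sigma}

# Anderson-Darling type weighting: square the process, divide by Psi, average over the data
T_n = np.mean(V ** 2 / Psi_hat)
```

The division by `Psi_hat` is the standardizing weight $w = \Psi^{-1/2}$ applied to the process before squaring, which is exactly the case singled out after Assumption 1.2.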

Justification of proposed procedure
Let us asymptotically justify our test procedure by using Theorem 1.2. Let
$$m^n_i = \frac{\mathrm{sign}(X_i - S_0(X_{i-1}))}{\sqrt{n}} \quad (i = 1, \ldots, n).$$
Suppose that $H_0$ is true. Then $\{m^n_i\}_{i=1}^n$ is a martingale difference sequence with respect to the filtration $\{\mathcal{F}_i\}_{i=0}^n$, where $\mathcal{F}_i = \sigma\{X_j : 0 \leq j \leq i\}$, and it holds that $(m^n_i)^2 = 1/n$ almost surely $(i = 1, \ldots, n)$, since $X_i \neq S_0(X_{i-1})$ almost surely.
Proposition 5.1. Suppose that (B) is satisfied for a given, specific S 0 and a known σ. If H 0 is true, then T n defined in (5.2) converges in distribution to (5.3) as n → ∞.

(5.4)
Then the null hypothesis is $\delta = 0$ and the alternative hypothesis is $0 < |\delta| < 1/2$. Hereafter, $0 < |\delta| < 1/2$ is assumed. From $\Psi_{S_0,\sigma}(x) \leq \Psi_{S_0,\sigma}(\infty) = 1$, it follows that the lower bound below holds. The right-hand side of the above display is bounded below by the sum of two terms. The first term tends to positive infinity in probability, which follows from the ergodicity, whereas the second term is $O_P(1)$, which is a consequence of Theorem 1.2, since $\{m^n_i\}_{i=1}^{\infty}$ is a martingale difference sequence with respect to the filtration $\{\mathcal{F}_i\}_{i=0}^{\infty}$. Therefore, it holds that $P(T_n > M) = P(T_n^{1/2} > M^{1/2}) \to 1$ for any constant $M > 0$.