Strong Invariance Principles for Ergodic Markov Processes

Strong invariance principles describe the error term of a Brownian approximation of the partial sums of a stochastic process. While these strong approximation results have many applications, the results for continuous-time settings have been limited. In this paper, we obtain strong invariance principles for a broad class of ergodic Markov processes. Strong invariance principles provide a unified framework for analysing commonly used estimators of the asymptotic variance in settings with a dependence structure. We demonstrate how this can be used to analyse the batch means method for simulation output of Piecewise Deterministic Monte Carlo samplers. We also derive a fluctuation result for additive functionals of ergodic diffusions using our strong approximation results.


Introduction
Let X = (X_k)_{k∈N} be a stochastic sequence defined on a common probability space and consider the partial sum process S_n, given by S_n = Σ_{k=1}^n X_k. We say that a strong invariance principle holds for X if there exists a probability space (Ω, F, P) on which we can construct a sequence of random variables X′ = (X′_k)_{k∈N} and a Brownian motion W = (W(t))_{t≥0}, such that X and X′ are equal in law and |S′_n − µn − σW(n)| = O(ψ_n) a.s., where S′_n denotes the partial sum process of X′, µ and σ are finite constants determined by the law of the process, O describes the a.s. asymptotic order, and ψ_n the corresponding approximation error. More specifically, if S = (S_t)_{t≥0} denotes a stochastic process and ψ = (ψ_t)_{t≥0} is some positive deterministic function, we write S_T = o(ψ_T) a.s. and S_T = O(ψ_T) a.s. to denote P(lim_{T→∞} S_T/ψ_T = 0) = 1 and P(lim sup_{T→∞} |S_T|/ψ_T < ∞) = 1, respectively. For notational convenience we will usually make no distinction between X and X′.
For a sequence of independent, identically distributed random variables with mean zero and unit variance, the Komlós-Major-Tusnády approximation [48; 49] asserts that if E|X_1|^p < ∞ for some p > 2, then on a suitably enriched probability space, we can construct a Brownian motion W = {W(t), t ≥ 0} such that

S_n = W(n) + o(n^{1/p}) a.s. (1)

If we can additionally assume that the moment-generating function exists in a neighbourhood of zero, i.e., E e^{t|X_1|} < ∞ for some t > 0, then

S_n = W(n) + O(log n) a.s. (2)

Furthermore, if only existence of the second moment is assumed, Major [56] showed that there exists a sequence t_n ∼ n such that

S_n = W(t_n) + o(n^{1/2}) a.s. (3)

The error terms appearing in the strong invariance principles (1), (2), and (3) are optimal. The approximation error appearing in the strong invariance principle also quantifies the convergence rate in the functional central limit theorem, as shown in Csörgő and Horváth [18; Theorem 1.16 and Theorem 1.17]. These strong approximation results are powerful tools used to obtain numerous results in both probability and statistics, as seen in, e.g., Csörgő and Hall [21], Csörgő and Révész [20], Parzen [67], and Shorack and Wellner [76].
Naturally, it is of great interest to extend these results beyond the i.i.d. setting. Berkes et al. [5] give an extensive overview of invariance principles for dependent sequences. In Markovian settings, strong approximation results were obtained by Cuny et al. [22], Csáki and Csörgő [17], Vats et al. [82], and Merlevède et al. [57], among others. The strong invariance principle of Merlevède et al. [57] attains the Komlós-Major-Tusnády bound given in (2). The results of Csáki and Csörgő [17] and Merlevède et al. [57] are established through an application of Nummelin splitting, introduced in the seminal papers of Athreya and Ney [2] and Nummelin [62]. Provided that the transition operator of the chain satisfies a one-step minorisation condition, a bivariate process can be constructed such that this process possesses a recurrent atom and the first coordinate of the constructed process is equal in law to the original Markov chain. Consequently, the chain inherits a regenerative structure and can thus be divided into independent identically distributed cycles. By applying the Komlós-Major-Tusnády approximation, strong invariance principles can then be obtained. Strong approximation results for Markov chains are useful tools for analysing estimators of the asymptotic variance of Markov Chain Monte Carlo (MCMC) sampling algorithms. Damerdji [23; 24], Flegal and Jones [36], and Vats et al. [82] show strong consistency of the batch means and spectral variance estimators for MCMC simulation output using the appropriate strong approximation results.
Recently, there has been growing interest in Monte Carlo algorithms based on Piecewise Deterministic Markov Processes (PDMPs). The main appeal of these processes is their non-reversible nature. It is well known that non-reversibility can significantly improve the performance of sampling methods, in terms of both convergence rate to equilibrium and asymptotic variance; see, for example, the results of Hwang et al. [45] and Lelievre et al. [51] regarding convergence to stationarity and Duncan et al. [31] and Rey-Bellet and Spiliopoulos [71] regarding the asymptotic variance. Furthermore, PDMPs have piecewise deterministic paths and can therefore be simulated without discretisation error, in contrast to, for example, Langevin and Hamiltonian dynamics. The primary sampling algorithms belonging to this class are the Zig-Zag sampler and the Bouncy Particle sampler, introduced by Bierkens and Roberts [8] and Bouchard-Côté et al. [11] respectively. Moreover, since these processes maintain the correct target distribution if subsampling is employed, they enjoy advantageous scaling properties for large datasets, as seen in Bierkens et al. [9].
In order for many useful results regarding estimation of the asymptotic variance of Markov chain simulation output to carry over to PDMP-based methods, it is required that a strong invariance principle holds for the underlying continuous-time process. In this paper, we obtain strong approximation results for a broad class of (continuous-time) ergodic Markov processes. Firstly, we show that the strong invariance principle given in Theorem 4.1 can be obtained directly through ergodicity and moment conditions. However, the resulting error rate is not explicit and therefore less convenient to work with.
A natural approach for obtaining a more refined strong invariance principle would be through regenerative properties of the process. However, it is in general not possible to show that the transition semigroup satisfies a minorisation condition from which a regenerative structure can be obtained. The resolvent chain, on the other hand, does satisfy a one-step minorisation condition. Utilising this result, Löcherbach and Loukianova [54] extend the concept of Nummelin splitting to Harris recurrent Markov processes. Hence we can redefine the process such that it is embedded in a richer process which is endowed with a recurrent atom. Although the resulting cycles are not independent and we therefore do not have regeneration in the classic sense, we do obtain short range dependence. Therefore we can utilise the approximation results of Berkes et al. [4] to obtain a strong invariance principle attaining a convergence rate of order O(T^{1/4} log T). This result is formulated in Theorem 4.5 and covers a wide range of Markov processes including ergodic diffusions. Although the nearly optimal bound of Berkes et al. [4] does not carry over, to the best of our knowledge there are currently no established approaches that lead to superior rates for the class of processes considered in Theorem 4.5.
For PDMPs we are able to give a strong invariance principle with an improved approximation error. We show that the univariate Zig-Zag process has regenerative cycles. This allows us to follow the approach of Merlevède et al. [57] such that the optimal strong approximation error of O(T^{1/p}) can be obtained. Moreover, if the target distribution factorises into a product of independent densities, the optimal approximation bound carries over to the multivariate setting. Furthermore, we also show that the results of Merlevède et al. [57] can be extended under less restrictive conditions such that the optimal approximation error (2) is still attained. Finally, we discuss some applications of our obtained strong invariance principles. We demonstrate how the obtained strong approximation results can be utilised for analysing the batch means estimator of the asymptotic variance of continuous-time Monte Carlo samplers. Theorem 5.2 weakens the existing regularity conditions guaranteeing strong convergence of the batch means estimator in an MCMC setting. This is a direct consequence of the fact that Theorems 4.6 and 4.7 attain the optimal approximation rate of O(T^{1/p}), whereas previous work on estimation of the MCMC standard error is based on strong invariance principles with limited accuracy, as we further explain in Remark 5.3. Furthermore, we demonstrate the applicability of our results to diffusion processes and show that the magnitude of increments can be described with our obtained approximation results.
This article is organised as follows. In Section 2, we give a brief introduction to Piecewise Deterministic Markov processes and state our motivational example. In Section 3, we review Nummelin splitting in continuous time as introduced by Löcherbach and Loukianova [54] and discuss other relevant results. In Section 4, the main results of the paper are given. In Section 5, we discuss the estimation of the asymptotic variance for PDMC simulation output. Section 6 illustrates the applicability of our results to diffusion processes. In Section 7, the proofs of the main results are given.

Motivating Example: Estimation of the asymptotic variance of Piecewise Deterministic Monte Carlo samplers
Suppose our goal is to sample from a probability distribution π(dx) on R^d which admits Lebesgue density π(x) ∝ exp(−U(x)), where U is referred to as the associated potential of the target π. We will assume that U is twice continuously differentiable and can be evaluated pointwise. Typically, the objective is to compute expectations with respect to this distribution; in other words, we are interested in π(f) = ∫ f(x)π(dx), for some appropriately integrable function f.
Piecewise Deterministic Monte Carlo (PDMC) samplers consist of a position and a velocity component. We will consider processes Z = (Z_t)_{t≥0} with Z_t = (X_t, V_t), where X_t and V_t denote the position and velocity component respectively. Our process takes values in E = X × V, where X denotes the state space of the position component and V denotes the space of attainable velocities. Piecewise Deterministic Markov processes are characterised by their deterministic dynamics between random event times, along with a Markov kernel that describes the transitions at events. More specifically, their deterministic dynamics are described by some ordinary differential equation. Both the Zig-Zag process and the Bouncy Particle sampler have piecewise linear trajectories characterised by

dX_t/dt = V_t, dV_t/dt = 0.

Thus the rate of change of the position is described by the velocity, whereas the velocity does not change along the deterministic dynamics. Changes in the velocity occur according to some inhomogeneous Poisson rate λ(Z_t). The Poisson events consist of changes in the velocity component of our process. The fundamental idea behind these sampling methods is to choose the event rate and the changes in velocity such that the position component explores the state space according to the target distribution π. The event rate should increase in an appropriate manner as the position is moving towards regions of lower probability mass.
For the Zig-Zag process the set of possible velocities is given by V = {−1, +1}^d. We can distinguish d types of events for the Zig-Zag sampler. For every dimension i of our position component, an event will consist of flipping component i of the velocity, while keeping the other (d − 1) components unchanged. More specifically, our transition at events can be described by F_i : V → V, which is the mapping that flips the i-th component of the velocity, i.e., for v ∈ V we have that the k-th entry of F_i(v) is given by

(F_i(v))_k = −v_k if k = i, and (F_i(v))_k = v_k otherwise.

A change in the i-th component of the velocity will be governed by the inhomogeneous Poisson rate λ_i. For the (canonical) Zig-Zag sampler these rates are given by

λ_i(x, v) = (v_i ∂_{x_i} U(x))_+,

where (x)_+ := max{x, 0}. Hence for the Zig-Zag process events occur with rate

λ(x, v) = Σ_{i=1}^d λ_i(x, v).

The simulation scheme for Zig-Zag is given in Algorithm 1 below.
Algorithm 1. Zig-Zag Sampler
1. Initialise (x_0, v_0) ∈ E and set T_0 := 0.
2. For each coordinate i, draw a proposed switching time τ_i from an inhomogeneous Poisson process with rate t → λ_i(x + tv, v).
3. Let i_0 := argmin_i τ_i and τ := τ_{i_0}.
4. The time of the k-th event is given by T_k := T_{k−1} + τ; set x ← x + τv, v ← F_{i_0}(v), and repeat from step 2.

In Bierkens et al. [9] it is shown that if we have

λ_i(x, v) − λ_i(x, F_i(v)) = v_i ∂_{x_i} U(x) for all i,

then the Zig-Zag process has the desired invariant distribution given by π(dx)ν(dv), where the target distribution π is the marginal distribution of the position component and ν is a uniform distribution over the set of velocities V. Consider the case when the target π is of product form, namely π(x) = ∏_{i=1}^d π_i(x_i), where each π_i is a one-dimensional probability density. Then the Zig-Zag process with stationary distribution π can be defined through d independent one-dimensional Zig-Zag processes. The potential of the product form target is given by U(x) = −Σ_{i=1}^d log π_i(x_i), and therefore the corresponding Poisson event rates are given by

λ_i(x, v) = (v_i U_i′(x_i))_+,

where U_i(x_i) = −log π_i(x_i). Because the switching intensity of every coordinate only depends on its own position and velocity, we see that the corresponding Poisson processes are independent. Therefore it follows that the d-dimensional Zig-Zag process Z_t with target distribution π can be decomposed into d independent one-dimensional Zig-Zag processes (Z^i_t)_{i=1}^d, where every coordinate i moves according to Z^i_t, which has target distribution π_i for i = 1, . . . , d. For the simulation scheme of the Bouncy Particle Sampler we refer to Bouchard-Côté et al. [11]. In the one-dimensional case the canonical BPS and ZZS are described by the same PDMP. For a more detailed introduction to PDMP-based samplers we refer to Fearnhead et al. [35].

It can be shown that under very mild regularity conditions both sampling processes admit a stationary distribution given by

µ(dx, dv) = π(dx)ν(dv),

where the target distribution π is the marginal distribution of the position component and ν is the marginal distribution of the velocity component. Moreover, an ergodic law of large numbers holds:

lim_{T→∞} (1/T) ∫_0^T f(Z_s) ds = µ(f) a.s.,

for all µ-integrable f. Given the independence of position and velocity at equilibrium, as seen in (8), the time average of the position component is a natural estimator for the space average π(f). In order to assess the accuracy of our sampling method, we require a central limit theorem to hold and need to estimate the corresponding asymptotic variance Σ_f. Moreover, the asymptotic variance is also useful for determining the efficiency of the sampling algorithm via measures such as the effective sample size, see for example Gong and Flegal [41] or Vats et al. [83]. The estimation of the asymptotic variance is also required for the implementation of stopping rules, which consist of justifiable criteria for termination of the simulation. In order to validate stopping rules that guarantee a desired level of precision, Glynn and Whitt [39] show that the estimator of the asymptotic covariance matrix must be strongly consistent. Strong invariance principles play a central role in the analysis of estimators of the asymptotic variance of Markov Chain Monte Carlo (MCMC) sampling algorithms, see for example Damerdji [23; 24], Flegal and Jones [36], and Vats et al. [82]. In this paper, we obtain strong approximation results for broad classes of ergodic Markov processes. We show that for PDMPs many results regarding estimation of the asymptotic variance immediately carry over.
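The one-dimensional dynamics described above can be made concrete in code. The following is a minimal sketch (our own illustration; the function name and the Gaussian example are not taken from the paper) of the canonical one-dimensional Zig-Zag process with standard Gaussian target, U(x) = x²/2. For this target the switching rate (vx)_+ admits exact event-time simulation by inverting the integrated rate, and the ergodic time averages of x and x² estimate the stationary mean and second moment.

```python
import math
import random

def zigzag_gaussian(T, x0=0.0, v0=1.0, seed=1):
    """Simulate the 1-d Zig-Zag process targeting N(0, 1) up to time T.

    For U(x) = x^2/2 the switching rate is lambda(x, v) = (v*x)_+, and the
    first event time solves an explicit quadratic:
    tau = -v*x + sqrt(max(v*x, 0)^2 + 2*E) with E ~ Exp(1).
    Returns the ergodic averages (1/T) int x dt and (1/T) int x^2 dt."""
    rng = random.Random(seed)
    t, x, v = 0.0, x0, v0
    int_x, int_x2 = 0.0, 0.0
    while t < T:
        e = rng.expovariate(1.0)
        a = v * x
        tau = -a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * e)
        tau = min(tau, T - t)  # truncate the final segment at the horizon
        # integrate x(s) = x + v*s and x(s)^2 over the linear segment
        int_x += x * tau + 0.5 * v * tau ** 2
        int_x2 += x ** 2 * tau + v * x * tau ** 2 + tau ** 3 / 3.0
        t += tau
        x += v * tau
        v = -v  # in dimension one an event simply flips the velocity
    return int_x / T, int_x2 / T
```

For a long horizon the two returned averages should be close to 0 and 1, the mean and variance of the standard Gaussian target.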

Nummelin splitting in continuous time
Let X = (X_t)_{t≥0} be a stochastic process defined on a filtered probability space (Ω, F, (F_t)_{t≥0}, P_x), with Polish state space (E, E) and initial value X_0 = x. We consider the case where X is a positive Harris recurrent strong Markov process with transition semigroup (P_t)_{t≥0} and finite invariant measure π. By definition of positive Harris recurrence, π can be normalised to be a probability measure and we have that, for every A ∈ E with π(A) > 0,

P_x( ∫_0^∞ ½_A(X_s) ds = ∞ ) = 1 for all x ∈ E.

Throughout this paper we will additionally require ergodicity of the considered processes. We say that a Markov process X is ergodic with convergence rate Ψ if

‖P_t(x, ·) − π‖_TV ≤ V(x)Ψ(t), (11)

where V is some positive π-integrable function and Ψ is some positive rate tending to zero. Furthermore, a process is called polynomially or exponentially ergodic if Ψ decays accordingly. For a more thorough discussion of these definitions we refer to Meyn and Tweedie [58].
The resolvent chain X̂ = (X̂_n)_{n≥0} is obtained by observing the process at independent exponential times, i.e., X̂_n := X_{T_n} for n ≥ 0. Here (T_n)_{n≥0} denote the sampling times at which we observe the process X, defined as T_0 := 0 and T_n := Σ_{k=1}^n σ_k, where (σ_k)_{k≥1} denotes a sequence of i.i.d. standard exponential random variables. The resolvent chain inherits positive Harris recurrence from the original process. The transition kernel of the chain X̂ = (X̂_n)_{n∈N} is given by

U^1(x, dy) = ∫_0^∞ e^{−t} P_t(x, dy) dt,

and satisfies the one-step minorisation condition, see for example Höpfner and Löcherbach [44] or Revuz [70],

U^1(x, A) ≥ h ⊗ ν(x, A),

where h ⊗ ν(x, A) = h(x)ν(A), with h(x) = α½_C(x) for some α ∈ (0, 1), a measurable set C with π(C) > 0, and ν(·) a probability measure equivalent to π(· ∩ C).
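To make the resolvent chain concrete, here is a small sketch (our own illustration; the Ornstein-Uhlenbeck example and all names are assumptions, not taken from the paper) in which a process with an explicitly known transition law is observed at i.i.d. Exp(1) sampling times.

```python
import math
import random

def ou_resolvent_chain(n_steps, x0=0.0, seed=2):
    """Sample the resolvent chain of an Ornstein-Uhlenbeck process
    dX_t = -X_t dt + sqrt(2) dW_t, whose stationary law is N(0, 1).

    The OU transition over a time step t is N(x * e^{-t}, 1 - e^{-2t}),
    so observing the process at i.i.d. Exp(1) sampling times T_n gives
    an exactly simulable Markov chain (the resolvent chain)."""
    rng = random.Random(seed)
    x, chain = x0, []
    for _ in range(n_steps):
        sigma = rng.expovariate(1.0)  # inter-sampling time sigma_k ~ Exp(1)
        mean = x * math.exp(-sigma)
        sd = math.sqrt(1.0 - math.exp(-2.0 * sigma))
        x = rng.gauss(mean, sd)       # value of X at the next sampling time
        chain.append(x)
    return chain
```

Since the resolvent chain shares the invariant measure of the underlying process, its empirical mean and second moment should be close to 0 and 1 for a long run.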
The minorisation condition of the resolvent chain motivates the introduction of the kernel K : (E × [0, 1]) × E → [0, 1] given by

K((x, u), dy) = ν(dy) if x ∈ C and u ≤ α, and K((x, u), dy) = W(x, dy) otherwise,

where the residual kernel W(x, dy) is defined as

W(x, dy) = (U^1(x, dy) − h(x)ν(dy)) / (1 − h(x)).

Since the resolvent chain is also positive Harris recurrent, it will hit C infinitely often. Given that the resolvent chain has hit C, with probability α the chain will move independently of its past according to the small measure ν, and with probability (1 − α) it will move according to the residual kernel W. By the Borel-Cantelli lemma the chain will regenerate according to ν infinitely often. Let R_k denote the k-th time that the resolvent chain moves according to ν. The randomised stopping times (R_k)_k serve as regeneration epochs for the resolvent; for every k, X̂_{R_k} has law ν and is independent of both its past and of R_k. The implied regenerative properties that the process X obtains through its resolvent are made explicit with the approach of Löcherbach and Loukianova [54]. Their framework requires the following regularity conditions on the transition semigroup of the process X:

Assumption 1. (i) The semigroup (P_t)_{t≥0} is Feller, i.e., for every bounded continuous function f, the mapping x → P_t f(x) is bounded and continuous.
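The splitting mechanism behind the kernel K can be illustrated on a toy example. The sketch below (our own; the transition matrix and all names are illustrative, not from the paper) uses a finite-state chain whose one-step kernel satisfies a minorisation over the whole state space (a Doeblin-type condition, so C is the full space), and performs one step of the split chain: regenerate from ν with probability α, otherwise move by the residual kernel W.

```python
import random

# Toy 3-state chain satisfying P(x, y) >= alpha * nu(y) for all x,
# so every step of the split chain regenerates with probability alpha.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
col_min = [min(P[x][y] for x in range(3)) for y in range(3)]
alpha = sum(col_min)                      # here alpha = 0.7
nu = [m / alpha for m in col_min]         # the small measure nu
W = [[(P[x][y] - alpha * nu[y]) / (1.0 - alpha) for y in range(3)]
     for x in range(3)]                   # the residual kernel W

def sample(weights, rng):
    r, acc = rng.random(), 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(weights) - 1

def split_step(x, rng):
    """One transition of the split chain: with probability alpha draw the
    next state from nu (a regeneration epoch), otherwise from W(x, .).
    Marginally, the next state has law P(x, .)."""
    if rng.random() < alpha:
        return sample(nu, rng), True
    return sample(W[x], rng), False
```

The defining property is that mixing ν and W with weights α and (1 − α) recovers the original kernel, which can be checked empirically from any fixed state.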
At the sampling times of the process X, we can apply the Nummelin splitting technique to the resolvent chain and then fill in the original process between the sampling times. Following this procedure, Löcherbach and Loukianova [54] construct on an extended probability space a process Z with state space E × [0, 1] × E that admits a recurrent atom. The first coordinate of Z has the same law as the original process X, the second coordinate denotes the auxiliary variables employed in order to generate draws from the resolvent chain via the splitting procedure, and the third coordinate corresponds to the subsequent values of the resolvent chain.
The process Z = (Z^1_t, Z^2_t, Z^3_t)_{t≥0} can be constructed according to the following procedure. Firstly, let Z^1_0 := x and draw Z^2_0 according to U[0, 1], where U[0, 1] denotes the uniform distribution on the unit interval. Given {Z^2_0 = u}, draw Z^3_0 according to K((x, u), dx′). Then inductively, given Z_{T_n} = (x, u, x′) for n ≥ 0:
I. Choose σ_{n+1} according to (16), the conditional distribution of the inter-sampling time given the starting point x and endpoint x′ of the sampled chain. The next sampling time T_{n+1} is given by T_n + σ_{n+1}.
II. On {σ_{n+1} = t}, put Z^2_{T_n+s} := u and Z^3_{T_n+s} := x′ for all 0 ≤ s < t.
III. Draw a bridge of Z^1 conditioned on its endpoints Z^1_{T_n} = x and Z^1_{T_{n+1}} = x′, so that for every 0 ≤ s < t the points of the bridge are drawn according to (17). Let Z^1_{T_n+s} := x_0 for some fixed x_0 ∈ E on {p_t(x, x′) = 0}. Moreover, given Z^1_{T_n+s} = y with s + u < t, subsequent points are drawn according to (18); again, on {p_{t−s}(y, x′) = 0}, put the value equal to x_0.
IV. Draw Z^2_{T_{n+1}}, independently of (Z_s, s < T_{n+1}), uniformly on the unit interval. Given {Z^2_{T_{n+1}} = u′}, generate Z^3_{T_{n+1}} according to K((x′, u′), dx″).
Note that in the construction of Z the inter-sampling times (σ_n)_{n≥1} are drawn according to (16), their conditional distribution given the starting point and endpoint of the sampled chain. Equations (17) and (18) describe the distributions of points in a bridge of the process X. The first coordinate of Z thus consists of bridges drawn according to the law of the original process X, between realisations of the resolvent chain. The results of Löcherbach and Loukianova [54; 55] that we work with are given in the following propositions. Firstly, the first coordinate of Z has the desired distribution.
Proposition 3.1. The constructed process Z is a Markov process with respect to its natural filtration F. Moreover, the first coordinate Z^1 is equal in law to our process X. Furthermore, (T_n − T_{n−1})_{n≥1} are i.i.d. exponential random variables independent of Z^1.

Moreover, the process X is embedded in a richer process Z, which admits a recurrent atom A := C × [0, α] × E in the sense of the following proposition.
Proposition 3.2. Let (S_n, R_n) be a sequence of stopping times with S_0 = R_0 := 0, where the stopping times {S_n}_n denote the successive hitting times of the recurrent atom A for the jump process (Z_{T_n})_n, and {R_n}_n denote the implied regeneration epochs of the process Z.

As a direct consequence, we obtain the following regenerative structure for the original process.
Proposition 3.3. Let f be a measurable π-integrable function. Then we can construct a sequence of increasing stopping times {R_n}_n with R_0 = 0 such that the blocks

ξ_n := ∫_{R_{n−1}}^{R_n} f(X_s) ds, n ≥ 1,

form a stationary sequence under P_ν.

The regenerative structure given in Proposition 3.3 was also noted by Sigman [77], who defines a process X to be one-dependent regenerative if there exists, on a possibly enlarged probability space, a sequence of randomised stopping times R_n with corresponding cycle lengths such that the cycles form a stationary one-dependent sequence. Note that according to this definition the initial cycle is allowed to have a different distribution. Löcherbach and Loukianova [54] give a constructive approach towards this result, in which they explicitly define the corresponding stopping times and the recurrent atom. By the implied regenerative structure of X, we obtain the following characterisation of the stationary measure.

Proposition 3.4 (Sigman [77; Theorem 2]). Let X be a positive recurrent one-dependent regenerative process. Then we can characterise its stationary measure as follows:

π(f) = (1/̺) E_ν [ ∫_0^{R_1} f(X_s) ds ],

where ̺ is defined as E_ν R_1. Moreover, we have the following ergodic law of large numbers:

lim_{T→∞} (1/T) ∫_0^T f(X_s) ds = π(f) a.s.,

for all f : R^d → R^p with π(|f|) < ∞. Note that the normalisation constant of π in Proposition 3.4 is finite and non-zero due to the positive Harris recurrence of the process.
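The ratio characterisation of the stationary measure in Proposition 3.4 underlies regenerative estimation. As a hedged illustration in the simpler classically regenerative setting (our own toy example, not from the paper), the sketch below estimates a stationary mean for a reflected random walk by accumulating cycle sums between visits to an atom.

```python
import random

def regenerative_mean(n_cycles, p=1.0 / 3.0, seed=3):
    """Ratio-form estimator pi(f) ~ (sum of f over cycles)/(sum of cycle
    lengths) for a reflected random walk on {0, 1, 2, ...} with
    up-probability p < 1/2; every visit to the atom {0} is a regeneration.

    For p = 1/3 the stationary law is geometric with ratio
    r = p/(1 - p) = 1/2, so with f(x) = x the stationary mean is
    r/(1 - r) = 1."""
    rng = random.Random(seed)
    x = 0
    total_f, total_len = 0.0, 0
    for _ in range(n_cycles):
        while True:
            total_f += x       # accumulate f over the current cycle
            total_len += 1
            x = max(x + (1 if rng.random() < p else -1), 0)
            if x == 0:         # back at the atom: the cycle is complete
                break
    return total_f / total_len
```

The estimator divides the accumulated integral of f over complete cycles by the accumulated cycle lengths, exactly the ratio form of Proposition 3.4 specialised to independent cycles.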
Remark 3.5. The framework of Löcherbach and Loukianova [54; 55] does not require ergodicity. Moreover, it is important to note that, contrary to the classically regenerative case, Proposition 3.4 does not imply convergence in total variation to the stationary measure. For a counterexample see Sigman [77; Remark 3.2]. △

For our applications we will require ergodicity and hence we must additionally impose this as stated in (11). These ergodicity requirements are usually established through Foster-Lyapunov drift conditions; see Down et al. [30] and Fort and Roberts [37] for exponential and polynomial ergodicity respectively. These results have been applied to several classes of diffusion processes, see for example Cattiaux et al. For PDMPs, Bierkens et al. [10] show exponential ergodicity of the Zig-Zag process for target distributions that have a non-degenerate local maximum and appropriately decaying tails. In Deligiannidis et al. [28] and Durmus et al. [32], conditions for exponential ergodicity of the Bouncy Particle Sampler are given. Utilising hypocoercivity techniques, Andrieu et al. [1] establish polynomial rates of convergence for PDMPs with heavy-tailed stationary distributions. When we are concerned with PDMPs we will require the following regularity conditions on the stationary density:

Assumption 2. Assume that π is twice continuously differentiable, strictly positive, has a non-degenerate local maximum, and satisfies π(x) → 0 as |x| → ∞.
These regularity conditions are often imposed in order to analyse the ergodic behaviour of PDMPs. Assumption 2, together with accompanying conditions on the decay of the tails of the target distribution, is used to show various rates of ergodicity.

Main Theorems
The most straightforward approach for obtaining a strong approximation result for Markov processes is through ergodicity requirements. Kuelbs and Philipp [50] show that a multivariate strong invariance principle holds for sums of random vectors satisfying a strong mixing condition; see also Theorem 7.1. This mixing condition can easily be satisfied by guaranteeing an appropriate rate of ergodicity of the process. All proofs are provided in Section 7.
Theorem 4.1. Let X = (X_t)_{t≥0} be an ergodic Markov process satisfying the mixing and moment conditions of Theorem 7.1. Then for every initial distribution and for all f : E → R^d with π(‖f‖^{2+δ}) < ∞, we can construct a process that is equal in law to X, together with a standard d-dimensional Brownian motion W = (W(t))_{t≥0}, on some probability space such that

‖ ∫_0^T f(X_s) ds − Tπ(f) − Σ_f^{1/2} W(T) ‖ = O(ψ_T) a.s., (21)

with positive semi-definite d × d covariance matrix Σ_f given by

Σ_f = ∫_0^∞ Cov_π(f(X_0), f(X_t)) dt + ∫_0^∞ Cov_π(f(X_t), f(X_0)) dt, (23)

with all entries converging absolutely and integration of matrices defined element-wise.
Remark 4.2. The asymptotic covariance matrix Σ_f given in Theorem 4.1 cannot in general be simplified. Only in the univariate case (d = 1) and for reversible processes do we obtain the simplification

Σ_f = 2 ∫_0^∞ Cov_π(f(X_0), f(X_t)) dt. (24)

As a result of reversibility, the cross-covariance matrices in (23) are symmetric and thus the asymptotic covariance can be expressed as (24). △

The rate ψ_T appearing in Theorem 4.1 depends on the dependence and moment structure of the considered process. For processes admitting higher order moments and faster decaying levels of dependence, the approximation bound ψ_T will tend to infinity at a slower rate. This can be interpreted as the magnitude of the difference between the centred additive functional of the process and the approximating Brownian motion being smaller. Although result (21) has useful applications for arbitrary λ ∈ (0, 1/2), many refined limit theorems require an explicit remainder term, giving more insight into the impact of the moment and dependence structure on the approximation error. In order to derive a more refined strong invariance principle we will make use of splitting arguments. Following the continuous-time Nummelin splitting technique introduced by Löcherbach and Loukianova [54], the process can be embedded in a richer process which admits a recurrent atom. Hence the process can be redefined such that it can be split into identically distributed blocks of strongly mixing random variables. Therefore we can utilise the approximation results for weakly m-dependent sequences of Berkes et al. [4] to obtain a strong invariance principle; see also Theorem 7.4.
Proposition 4.3. Let X = (X_t)_{t≥0} be an aperiodic, positive Harris recurrent Markov process for which Assumption 1 is satisfied. Let f : E → R be a given π-integrable function. Define the sequences of random times {R_n}_{n=1}^∞ and {ξ_n}_{n=1}^∞ as given in Propositions 3.2 and 3.3. Moreover, assume that the cycle moment conditions (25) and (26) hold. Then for every initial distribution we can construct a process, on an enriched probability space, that is equal in law to X, together with two standard Brownian motions W_1 and W_2, such that

| ∫_0^T f(X_s) ds − Tπ(f) − W_1(σ²_T) − W_2(τ²_T) | = O(ψ_T) a.s.,

where {σ²_T} and {τ²_T} are non-decreasing sequences with σ²_T of order T and τ²_T = o(T) as T → ∞, and ψ_T, π(f), ̺, and σ_ξ are defined in equations (29) to (32).
In Proposition 4.3 we obtain an explicit approximation error. In alignment with expectations, we see that faster convergence to the stationary measure and the existence of higher order moments result in an improved approximation error. However, the required moment conditions for Proposition 4.3 stated in (25) and (26) are impractical and would be burdensome, if not impossible, to verify directly for most applications. For classically regenerative Markov chains this problem also arises, see the analogous requirements of regenerative simulation given in Mykland et al. [61] and the strong approximation result of Csáki and Csörgő [17]. Hobert et al. [43] were the first to simplify moment conditions of this form and give practical sufficient conditions for regenerative simulation. More specifically, in their main result they show that polynomial or geometric ergodicity and moment conditions with respect to the stationary measure are sufficient to guarantee finiteness of the second moment of a cycle. This result was generalised to higher order cycle moments by Jones et al. [46] and Bednorz and Latuszyński [3], hence simplifying the required conditions of Csáki and Csörgő [17]. However, the aforementioned approaches are all for Markov chains satisfying a one-step minorisation condition, i.e., for the classically regenerative setting. Since our setting involves a more complicated reconstruction of the process of interest, the results do not immediately carry over. In Theorem 4.4, we show that the cycle moment conditions (25) and (26) required for Proposition 4.3 can also be guaranteed with more easily verifiable ergodicity and moment conditions.
Theorem 4.4. Let X = (X_t)_{t≥0} be an aperiodic, positive Harris recurrent Markov process for which Assumption 1 is satisfied. Moreover, let X be polynomially ergodic of order β > p/ε + 2 for some ε > 0. Then the cycle length moment condition (25) holds. Moreover, for all measurable f : E → R with π(|f|^{p+ε}) < ∞, the cycle moment condition (26) holds.

By combining Proposition 4.3 and Theorem 4.4 we obtain the desired strong invariance principle.
Theorem 4.5. Let X = (X_t)_{t≥0} be an aperiodic, positive Harris recurrent Markov process for which Assumption 1 is satisfied. Moreover, let X be polynomially ergodic of order β > p/ε + 3, for given p > 2 and some ε > 0. Then for every initial distribution and for all measurable f : E → R with π(|f|^{p+ε}) < ∞ we can, on an enriched probability space, define a process that is equal in law to X and two standard Brownian motions W_1 and W_2 such that

| ∫_0^T f(X_s) ds − Tπ(f) − W_1(σ²_T) − W_2(τ²_T) | = O(T^{1/4} log T) a.s., (28)

where {σ²_T} and {τ²_T} are non-decreasing sequences with σ²_T of order T and τ²_T = o(T) as T → ∞.

Proof. The assertion follows immediately from Proposition 4.3 and Theorem 4.4.
The appearance of the second Brownian motion in Theorem 4.5 is inherited from the strong invariance principle of Berkes et al. [4]. Although we obtain different time perturbations of the Brownian motions, all desired properties carry over. The second Brownian motion appearing in (28) is of a smaller magnitude and will therefore be asymptotically negligible in typical applications. Furthermore, even though the two Brownian motions are not independent, their correlation decays over time. Note that the nearly optimal convergence rate of Berkes et al. [4] does not carry over; instead we obtain an approximation error of order O(T^{1/4} log T). Obtaining an approximation error superior to O(T^{1/4} log T) remains an open problem for the class of processes considered in Theorem 4.5. A possible approach for attaining a better convergence rate would be to extend the results of Berkes et al. [4] to a multivariate setting and then follow the approach of Merlevède et al. [57].
The univariate Zig-Zag process passes every point in its state space, in particular the local optima of its target density, infinitely often. This allows us to define regenerative cycles of the process. Therefore we can adapt the approach of Merlevède et al. [57] and obtain the optimal bound of O(T^{1/p}) for the strong approximation of the one-dimensional Zig-Zag process.
Theorem 4.6. Let Z = (X_t, V_t)_{t≥0} be an aperiodic, positive Harris recurrent Zig-Zag process with invariant distribution π ⊗ υ, where π satisfies Assumption 2. Moreover, let Z be polynomially ergodic of order β > p/ε + 2, for given p > 2 and some ε ∈ (0, 1). Then for every initial distribution and for all measurable f with π(|f|^{p+ε}) < ∞ we can, on an enriched probability space, define a process that is equal in law to Z and a standard Brownian motion W such that

| ∫_0^T f(X_s) ds − Tπ(f) − σ_f W(T) | = O(T^{1/p}) a.s.,

where σ²_f can be characterised as (36).

Merlevède et al. [57] obtain a strong invariance principle for one-dimensional Markov chains satisfying a one-step minorisation condition by making use of the implied regenerative properties. Note that their approach carries over to any regenerative process. However, they assume that the chain is exponentially ergodic and that the test function f is bounded. The boundedness of f is very restrictive for applications in MCMC, since it excludes many interesting examples such as the posterior mean and variance. Theorem 4.6 extends their results by imposing only polynomial ergodicity and a mild moment condition on the test function.
Furthermore, we see that if the target distribution is of product form, i.e., satisfies the factorisation $\pi(x) = \prod_{i=1}^d \pi_i(x_i)$, then the optimal bound carries over to the multivariate setting. Theorem 4.7. Let $Z = (X_t, V_t)_{t \ge 0}$ be an aperiodic, positive Harris recurrent Zig-Zag process with invariant distribution $\pi \otimes \upsilon$, where $\pi$ is of product form and every $\pi_i$ satisfies Assumption 2. Moreover, let $Z$ be polynomially ergodic of order $\beta > p/\varepsilon + 2$, for given $p > 2$ and some $\varepsilon \in (0, 1)$. Then for every initial distribution and for all $f : E \to \mathbb{R}^d$ that admit a coordinate-wise decomposition, a strong invariance principle holds with the corresponding covariance matrix. Note that although the proof of Theorem 4.7 relies on the fact that the $d$-dimensional Zig-Zag process $Z$ can be decomposed into $d$ independent one-dimensional Zig-Zag processes, the multivariate invariance principle does not follow directly from an application of Theorem 4.6: even though the individual coordinates have regenerative cycles, the multivariate process $Z$ does not possess regeneration times. Moreover, it must be guaranteed that the approximating Brownian motions for the individual components are defined on the same probability space.
Remark 4.8. The aforementioned results require a certain degree of polynomial ergodicity, but can mutatis mutandis be seen to hold assuming exponential ergodicity. △

Analysis of batch means for Piecewise Deterministic Monte Carlo
In order to assess the accuracy of our PDMC sampler, we require that a central limit theorem holds and that we can estimate the corresponding asymptotic variance. In Bierkens and Duncan [7] several conditions are given under which a CLT holds for the univariate Zig-Zag process. Durmus et al. [32], Deligiannidis et al. [28], and Bierkens et al. [10] obtain CLTs for the Bouncy Particle Sampler and the Zig-Zag process respectively through geometric drift conditions, which in turn also imply exponential ergodicity. The strong invariance principles we obtained in Theorems 4.1, 4.5, 4.6, and 4.7 immediately imply the following central limit theorems for polynomially ergodic Markov processes.
For simplicity, we will mainly consider the one-dimensional case, i.e., our quantity of interest is given by $\pi(f)$, with $f : \mathbb{R}^p \to \mathbb{R}$ a given $\pi$-integrable function. Let the simulation output, which in our case consists of the position component of a PDMP, be given by $(X_t)_{t \in [0, T]}$. Note that from Corollary 5.1 a (functional) central limit theorem also follows for the position component of the process. We are interested in estimating the asymptotic variance (39), which we will denote by $\sigma_f^2$ when we are not considering the multivariate setting. The batch means method divides the obtained sample trajectory of our process into non-overlapping parts. The sample variance of the means of the obtained batches gives rise to a natural estimator for the asymptotic variance. More specifically, we divide our simulation output into $k_T$ batches of length $\ell_T$ such that $k_T = \lfloor T/\ell_T \rfloor$. We proceed by computing the sample average of each obtained batch,
$$\bar{Z}_i(\ell_T) = \frac{1}{\ell_T} \int_{(i-1)\ell_T}^{i \ell_T} f(X_s) \, ds, \qquad i = 1, \ldots, k_T.$$
If a functional central limit theorem holds for our process, it follows that the computed means $\bar{Z}_i(\ell_T)$ are asymptotically independent and identically distributed for any fixed number of batches. Hence, we can heuristically reason that the sample variance of $(\bar{Z}_i(\ell_T))_{i=1}^{k_T}$ will be close to $\operatorname{Var}(\bar{Z}_i(\ell_T))$. Moreover, since each $\bar{Z}_i(\ell_T)$ is itself an empirical mean, it is reasonable to expect their variance to be approximately $\sigma_f^2/\ell_T$. The batch means estimator of the asymptotic variance is defined by correcting the sample variance of the batch means $(\bar{Z}_i(\ell_T))_{i=1}^{k_T}$ by a factor $\ell_T$, namely
$$\hat{\sigma}_T^2 = \frac{\ell_T}{k_T - 1} \sum_{i=1}^{k_T} \big( \bar{Z}_i(\ell_T) - \bar{Z}(T) \big)^2,$$
where $\bar{Z}(T)$ denotes the overall mean. Following the framework of Damerdji [24], we impose the following conditions on the number of batches and their length.
Assumption 3. Let the number of batches $k_T$ and their length $\ell_T$ be such that
i. $k_T \to \infty$, $\ell_T \to \infty$, and $\ell_T/T \to 0$ as $T \to \infty$,
ii. $\ell_T$ and $T/\ell_T$ are both monotonically increasing,
iii. there exists a constant $c \ge 1$ such that $\sum_{n=1}^{\infty} (\ell_n/n)^c < \infty$.
The first requirement of Assumption 3 is a necessary condition for consistency, as seen from the results of Glynn and Whitt [38]. The second requirement is solely for technical reasons, and the third requirement ensures that the number of batches grows fast enough; if we choose $\ell_T = T^\alpha$, the requirement holds for all $\alpha \in (0, 1)$, since we can choose $c > 1/(1 - \alpha)$.
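To make the construction concrete, the batch means estimator can be sketched for discretised simulation output as follows (a minimal sketch; the i.i.d. test data and all numerical values are assumptions for illustration, chosen because i.i.d. $N(0, 1)$ output has asymptotic variance $\sigma_f^2 = 1$):

```python
import numpy as np

def batch_means(path, dt, ell):
    """Batch means estimate of the asymptotic variance sigma_f^2.

    path : values f(X_t) of the simulation output on a grid of spacing dt
    ell  : batch length ell_T in time units
    Splits the output into k_T = floor(T / ell_T) non-overlapping batches,
    computes the batch means, and rescales their sample variance by ell_T.
    """
    m = int(round(ell / dt))                 # grid points per batch
    k = len(path) // m                       # number of batches k_T
    batch_mean = np.asarray(path[:k * m]).reshape(k, m).mean(axis=1)
    return ell * batch_mean.var(ddof=1)      # ell_T * sample variance

# Sanity check: i.i.d. N(0, 1) output at unit spacing has sigma_f^2 = 1.
rng = np.random.default_rng(0)
est = batch_means(rng.normal(size=100_000), dt=1.0, ell=100.0)
```

In line with Assumption 3, the batch length `ell` would be taken to grow with the simulation horizon, e.g. $\ell_T = T^\alpha$, rather than being held fixed.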
Proof. Remark 5.3. Note that Theorem 5.2 weakens the currently available regularity conditions guaranteeing strong convergence of the batch means estimator. This is a direct consequence of the fact that Theorems 4.6 and 4.7 attain the optimal approximation rate of $O(T^{1/p})$, whereas the results of Jones et al. [46] are based upon the strong invariance principle of Csáki and Csörgő [17], which attains the rate $O(T^\gamma \log T)$ with $\gamma = \max(1/p, 1/4)$. More specifically, for $f$ with $\pi(|f|^p) < \infty$, Jones et al. [46] require $T^\gamma \log^3(T)/\ell_T \to 0$ as $T \to \infty$. In particular, for the case $p > 4$, Theorem 5.2 is able to significantly weaken the conditions on the required batch length $\ell_T$. As a direct result of the smaller batch lengths, we are able to use a larger number of batches $k_T$, which results in a smaller variance for the batch means estimator, as seen in Theorem 5.4. Note that a similar conclusion holds for the overlapping batch means and spectral variance estimators considered in Flegal and Jones [36]. △ We see from the required assumption (42) that a larger approximation error in the strong invariance principle, which corresponds to higher orders of dependence, results in a larger required batch size $\ell_T$. This is in agreement with the idea behind batching methods: every batch should give a proper representation of the dependence structure of the process; otherwise a structural bias is introduced in the estimation procedure. On the other hand, choosing the batch size larger than necessary results in a smaller number of batches $k_T$, leading to a higher variance for the estimator. Strong approximations can also be used to characterise the mean squared error and to obtain a central limit theorem for the batch means estimator.
Theorem 5.4. Let $Z$ be polynomially ergodic of order $\beta > p/\varepsilon + 2$, for given $p > 2$ and some $\varepsilon \in (0, 1)$, with stationary measure $\mu$ satisfying (8), and let $\pi(|f|^p) < \infty$. Let the initial distribution be given by $\mu$ and assume that Assumption 3 holds and $\mathbb{E}_\mu C^2 < \infty$, where $C$ is defined in (90). Then the bound (43) holds. Moreover, if $\ell_T^{-1} T^{1/p} (T \log T)^{1/2} \to 0$ as $T \to \infty$, then we obtain a CLT for the batch means estimator. The first and second terms in (43) describe the variance, whereas the third term represents the bias. Note that the second term does not depend on $\ell_T$ and tends to zero. The obtained bounds for the variance are sharp, whereas the bounds for the bias leave room for improvement.
In the multivariate setting, where our quantity of interest is given by $\pi(f)$, with $f : \mathbb{R}^p \to \mathbb{R}^d$ a given $\pi$-integrable function, the batch means estimator is given by
$$\hat{\Sigma}_T = \frac{\ell_T}{k_T - 1} \sum_{i=1}^{k_T} \big( \bar{Z}_i(\ell_T) - \bar{Z}(T) \big) \big( \bar{Z}_i(\ell_T) - \bar{Z}(T) \big)^\top,$$
where $\bar{Z}_i(\ell_T)$ is defined in (40). Given the strong invariance principle of Theorem 4.1, the results of Vats et al. [83] for the multivariate batch means estimator immediately carry over.
Theorem 5.5. Let $Z$ be polynomially ergodic of the required order. Assume that Assumption 3 holds and that condition (46), with $\psi_T$ defined in (22), is satisfied. Then for every initial distribution we have that $\hat{\Sigma}_T \to \Sigma_f$ as $T \to \infty$ with probability 1.
Proof. Furthermore, if the target distribution is of product form and we consider the Zig-Zag sampler, then Theorem 4.7 gives a strong invariance principle with an explicit approximation error. Therefore, we can replace condition (46) of Theorem 5.5 with (42) for every component of the Zig-Zag process. This results in a condition that can be verified more easily.

Batch size selection for PDMC
Glynn and Whitt [38] show that there exists no consistent estimator of $\sigma_f^2$ with a fixed number of batches. Hence the number of batches should explicitly depend on the length of the simulation $T$. For the standard choice $\ell_T = T^\alpha$ we see that for $\alpha > 1/(2p)$ we obtain both strong consistency and $L^2$-convergence of the batch means estimator. Theorem 5.4 suggests that $\alpha^* = (2 + p)/(2p)$ would be optimal in the mean squared error sense. The well-known results of Chien et al. [15], Goldsman et al. [40], and Song and Schmeiser [78] obtain a bound for the bias of order $O(\ell_T^{-1})$, which implies an optimal (in the MSE sense) batch size of $\ell_T^\diamond \asymp T^{1/3}$. However, the aforementioned results require the sampling process to be stationary and uniformly ergodic, and to satisfy the moment condition $\pi(f^{12}) < \infty$. Obtaining a bias term of order $O(\ell_T^{-1})$ for batch means under milder conditions remains an unaddressed problem. Theorems 5.2 and 5.4 only require a strong invariance principle, which we have shown holds under polynomial ergodicity, a very reasonable assumption for simulation output. Moreover, these results do not require stationarity and thus hold for every initial distribution. Theorem 5.4 imposes more demanding conditions on $\ell_T$ than the aforementioned frameworks; however, it is quite reasonable to let the batch size depend on the dependence structure of the process through $\psi_T$, instead of only through the constant $\int s \gamma(s) \, ds$, as is the case in the aforementioned results. Moreover, in practice the performance of batch means methods with batch size $\ell_T^\diamond$ is often found to be sub-optimal, whereas larger batch sizes show better finite-sample performance, as noted by, for example, Flegal and Jones [36].
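As a numerical illustration of these scalings (the moment order $p = 6$ and the horizon are assumptions chosen purely for illustration), compare the exponent $\alpha^* = (2 + p)/(2p)$ suggested by Theorem 5.4 with the classical $T^{1/3}$ rule:

```python
# Compare batch-size rules ell_T = T^alpha for an assumed moment order p.
p = 6.0
alpha_star = (2 + p) / (2 * p)      # = 2/3 for p = 6 (Theorem 5.4 scaling)
T = 1e6                              # simulation length (illustrative)
ell_star = T ** alpha_star           # larger batches, fewer of them
ell_classic = T ** (1 / 3)           # classical T^{1/3} rule
k_star, k_classic = T / ell_star, T / ell_classic
print(f"alpha* = {alpha_star:.3f}: ell = {ell_star:.0f}, k = {k_star:.0f}")
print(f"alpha  = 1/3:   ell = {ell_classic:.0f}, k = {k_classic:.0f}")
```

The comparison makes the trade-off visible: the larger exponent yields far longer batches (here $10^4$ versus $10^2$), hence fewer of them, consistent with the observation that larger batches protect against dependence-induced bias at the cost of estimator variance.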
An alternative approach to determining the optimal batch size was given by Chien [16], who obtains an optimal batch size of $\ell_T \asymp T^{1/2}$ by minimising the distance between the cumulants of the studentised ergodic average and a standard Gaussian, which suggests that the resulting confidence intervals enjoy improved finite-sample properties.

Asymptotic normality of the batch means estimator
We see that, given polynomial ergodicity, the central limit theorem for the batch means estimator also carries over to the PDMC setting. The results of Sherman and Goldsman [75] require uniform ergodicity and the moment condition $\pi(f^{12}) < \infty$ in order to obtain asymptotic normality of the batch means estimator. Since uniform ergodicity is not attainable for most practical problems, less stringent conditions on the rate of ergodicity are desired. Theorem 5.4 places more restrictive conditions on the batch size and excludes the choice $\ell_T^\diamond \asymp T^{1/3}$. Chakraborty et al. [14] obtain a CLT for the batch means estimator assuming reversibility, stationarity, geometric ergodicity, and the moment condition $\pi(f^8) < \infty$. Moreover, the required batch size must be such that $k_T = o(\ell_T^2)$. Hence their result is also unable to guarantee asymptotic normality for batch size $\ell_T^\diamond$. We see that Theorem 5.4 gives more practical conditions for guaranteeing asymptotic normality of the batch means estimator; in particular, the results are applicable to non-reversible processes.

Spectral variance and overlapping batch means estimators for the PDMC standard error
Analogous to the batch means method, given the strong invariance principle formulated in Theorem 4.1, many results for other estimators of the asymptotic variance also carry over. Flegal and Jones [36] give more convenient alternatives for some of the requirements of the framework given in Damerdji [23]. The results of Flegal and Jones [36] regarding spectral variance and overlapping batch means estimators for MCMC output are thus also applicable to PDMC, with minor adjustments to their assumptions. Note that the assumed minorisation condition and geometric ergodicity of the Markov chain in Flegal and Jones [36] are only imposed so that the strong invariance principle of Csáki and Csörgő [17] holds. Although the implementation of spectral variance estimators for continuous-time output might be impractical, these estimators are still of theoretical interest. Numerous estimation methods with feasible implementations for PDMC output, such as overlapping batch means and certain standardised time series methods, can be shown to be (asymptotically) equivalent to spectral estimators. Furthermore, we expect the results of Vats et al. [82] and Liu et al. [53] regarding spectral variance and generalised overlapping batch means estimators, respectively, to remain valid in the continuous-time setting. Hence the implications for the optimal values of the tuning parameters of these estimation methods also remain valid. Lastly, note that our results hold for all sampling algorithms that produce continuous-time output and are not restricted to the PDMP setting.

Increments of Additive Functionals of Ergodic Markov processes
Strong approximation results enable various asymptotic properties of Brownian motion to carry over to other stochastic processes. In this section, we show that the strong invariance principle given in Theorem 4.5 can be used to establish that the increments of additive functionals of Markov processes are of the same magnitude as Brownian increments, provided we have sufficient decay of the approximation error. The following theorem describes the magnitude of the fluctuations of Brownian increments over subintervals of length $a_T$.
Theorem 6.1 (Csörgö and Révész [19; Theorem 1]). Let $W = (W_t)_{t \ge 0}$ denote a Brownian motion, and let $a_T$ be a positive non-decreasing function of $T$ such that $0 < a_T \le T$ and $T/a_T$ is non-decreasing. Then
$$\limsup_{T \to \infty} \sup_{0 \le t \le T - a_T} \frac{|W(t + a_T) - W(t)|}{\beta_T} = 1 \quad \text{a.s.},$$
where $\beta_T = \big( 2 a_T \big( \log(T/a_T) + \log \log T \big) \big)^{1/2}$.
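A quick Monte Carlo check of this statement on a discrete grid (the window choice $a_T = T^{0.6}$ and the horizon are assumptions for illustration; the theorem itself is an almost sure limit as $T \to \infty$):

```python
import numpy as np

# Simulate a Brownian path on a unit grid and compare the largest
# increment over windows of length a_T with beta_T from Theorem 6.1.
rng = np.random.default_rng(0)
T = 200_000
W = np.concatenate([[0.0], np.cumsum(rng.standard_normal(T))])
a = int(T ** 0.6)                                   # window length a_T
sup_inc = np.abs(W[a:] - W[:-a]).max()              # sup_t |W(t+a_T) - W(t)|
beta = np.sqrt(2 * a * (np.log(T / a) + np.log(np.log(T))))
ratio = sup_inc / beta                              # should be close to 1
print(ratio)
```

At this finite horizon the ratio typically lands somewhat below 1, consistent with the lim sup nature of the statement.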
Taking $a_T = T$ gives the law of the iterated logarithm, and for $a_T = c \log T$ with $c > 0$, the Erdös-Rényi law of large numbers for Brownian motion is obtained. This fluctuation result has been extended to other processes such as integrated Brownian motion, fractional Brownian motion, and non-stationary Gaussian processes; see Li [52], El-Nouty [34], and Ortega [66] respectively. While these fluctuation results are of independent interest, they are also used as building blocks in applications, such as proving convergence properties of kernel density estimators; see for example Révész [69] and Deheuvels [27]. These fluctuation results are also used for proving almost sure convergence of various estimators of the asymptotic variance in simulation output settings; see the references given in Section 4. In order to describe the fluctuations of additive functionals over an interval of a specified length $a_T$, we require an explicit remainder term for the Brownian approximation, as given in Theorem 4.5. However, due to the appearance of the second Brownian motion in this invariance principle and the perturbed time sequences, it is not immediate that the Brownian fluctuation result carries over. Berkes et al. [4] show that the magnitude of the increments of partial sums of weakly $m$-dependent sequences is indeed given by Theorem 6.1, due to the smaller scaling of the second Brownian motion. However, in our case the perturbed time sequences are random, since they depend on the number of one-dependent regenerative cycles of the process; hence the desired result does not follow directly from Berkes et al. Suppose that $\beta_T^{-1} \psi_T = o(1)$, where $\beta_T$ is as in Theorem 6.1 and $\psi_T = \max\big\{ T^{1/2q} \log T,\; T^{1/p} \log^2(T) \big\}$.
Then the conclusion of Theorem 6.1 carries over to the additive functional. As noted by Berkes et al. [4], the split invariance principle also implies the distributional version of Theorem 6.2; with similar adaptations to their argument, this also holds in our case.
Since the approximation error $\psi_T$ of Theorem 4.5 cannot be guaranteed to be smaller than $O(T^{1/4} \log T)$, the fluctuation result given in Theorem 6.2 cannot describe the magnitude of increments over slowly growing time intervals $a_T$.

Application to diffusion processes
Diffusions are an important class of processes for which the strong approximation given in Theorem 4.5 and the related fluctuation result given in Theorem 6.2 are applicable. Let $X = (X_t)_{t \ge 0}$ denote a one-dimensional diffusion process defined as the solution of the time-homogeneous stochastic differential equation (SDE)
$$dX_t = b(X_t) \, dt + \sigma(X_t) \, dW_t, \qquad X_0 \sim \mu,$$
where $\mu$ is the initial distribution of the process, $\mathcal{X} \subseteq \mathbb{R}$ denotes the state space, $b : \mathcal{X} \to \mathbb{R}$ and $\sigma : \mathcal{X} \to \mathbb{R}$ denote the drift and volatility function respectively, and the process $W$ is a Brownian motion. We assume that all required regularity conditions hold such that the existence and uniqueness of a strong solution of the SDE are guaranteed. For example, we can impose Lipschitz conditions on the drift and volatility of the SDE. For a more detailed exposition, we refer to Rogers and Williams [74].
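A minimal sketch of how such a diffusion can be simulated; the Euler-Maruyama scheme and the Ornstein-Uhlenbeck example ($b(x) = -x$, $\sigma \equiv \sqrt{2}$, stationary law $N(0, 1)$) are standard illustrations, not taken from the paper:

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, dt, seed=0):
    """Euler-Maruyama discretisation of dX_t = b(X_t) dt + sigma(X_t) dW_t."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))            # Brownian increment
        x[k + 1] = x[k] + b(x[k]) * dt + sigma(x[k]) * dw
    return x

# Ergodic Ornstein-Uhlenbeck diffusion: dX_t = -X_t dt + sqrt(2) dW_t,
# positive Harris recurrent with stationary distribution N(0, 1).
path = euler_maruyama(lambda x: -x, lambda x: np.sqrt(2.0), 0.0, T=2000.0, dt=0.01)
```

For a long trajectory the time average of the position settles near the stationary mean 0, illustrating the ergodic behaviour assumed throughout this section.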
For diffusion processes to admit the desired ergodic properties, we must impose additional regularity conditions. The scale function of a one-dimensional diffusion process is given by
$$s(x) = \int_{x_0}^{x} \exp\left( -\int_{x_0}^{y} \frac{2 b(z)}{\sigma^2(z)} \, dz \right) dy$$
and must satisfy condition (50), namely that $s$ tends to $-\infty$ and $+\infty$ at the left and right boundaries of the state space respectively. If condition (50) holds, it follows that the diffusion is recurrent: the time for the process to return to any bounded subset of its state space is a.s. finite. The speed density of the diffusion process, $m(x) = \big( \sigma^2(x) s'(x) \big)^{-1}$, must be Lebesgue integrable for the diffusion to be positive Harris recurrent. For higher-dimensional diffusion processes, Bhattacharya [6] gives conditions that guarantee positive Harris recurrence. In order for the obtained strong invariance principles given in Theorem 4.1 and Theorem 4.5 to hold, we require polynomial or exponential convergence to stationarity. These assumptions are usually verified through drift conditions for the diffusion process; see for example Cattiaux et al. In order for the strong approximation result in Theorem 4.5 and the related fluctuation result of Theorem 6.2 to hold, the Nummelin splitting scheme of Löcherbach and Loukianova [54] must be applicable. Therefore we must impose regularity conditions such that Assumption 1 is satisfied, i.e., the transition semigroup of the diffusion must be Feller and admit densities with respect to some dominating measure. Under appropriate growth and continuity conditions on the drift and volatility, diffusion processes are Feller; see for example Williams [84; Theorem 2.2]. Moreover, if the volatility function $\sigma$ is strictly positive (positive definite in the multivariate case), the diffusion is elliptic and therefore admits transition densities; see Stroock and Varadhan [80; Theorem 3.2.1]. Hence, Assumption 1 is satisfied. Alternatively, for multivariate diffusions, we can impose the parabolic Hörmander condition, which ensures that the propagation of the noise through the different coordinates is sufficient for the transition density to exist; see for example Rogers and Williams [74; Theorem 38.16].
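As a worked example of these conditions (a standard computation, not taken from the paper), consider the Ornstein-Uhlenbeck process $dX_t = -\theta X_t \, dt + \sigma \, dW_t$ with $\theta, \sigma > 0$. The scale density and speed density are

```latex
s'(x) = \exp\!\left( \int_0^x \frac{2\theta y}{\sigma^2} \, dy \right)
      = e^{\theta x^2/\sigma^2},
\qquad
m(x) = \frac{1}{\sigma^2 \, s'(x)} = \frac{1}{\sigma^2}\, e^{-\theta x^2/\sigma^2}.
```

Since $s(x) \to \pm\infty$ as $x \to \pm\infty$ and $m$ is integrable (it is proportional to a Gaussian density), the process is positive Harris recurrent, with stationary distribution $N(0, \sigma^2/(2\theta))$ obtained by normalising $m$.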

Discussion and suggestions for further research
We see that Theorem 4.1 and Theorem 4.5 are applicable to a broad class of diffusions and extend the current results on strong approximations for diffusion processes. Heunis [42] and Mihalache [60] obtain strong invariance principles for diffusions, together with a complementary fluctuation result and a change-point test respectively. The results of Mihalache [60] yield an explicit approximation error comparable to that of Theorem 4.5, but are only applicable to stochastic integrals with respect to Brownian motion, i.e., diffusion processes with no drift. The results of Heunis [42] give an implicit approximation error and hold for singular diffusions. The strong invariance principle of Heunis [42] is not covered by our results, since singular diffusions generally do not satisfy the mixing properties required for our framework.
The obtained strong invariance principles offer numerous applications for diffusion processes; see for example Csörgö and Hall [21] and the references therein. Following the approach of Berkes et al. [4; Proposition 2], Theorem 4.5 can be used to obtain a change-point test for diffusions. If the diffusion process under consideration has a drift that enforces mean reversion, we can construct a test for the existence of a deterministic linear trend over specified time periods. This approach requires continuous-time output of a diffusion process and is therefore of more theoretical interest. However, it is plausible that the asymptotic behaviour of the change-point test should carry over to the high-frequency setting, where the diffusion is observed discretely and the inter-observation times are assumed to tend to zero.

Theorem 4.1
Kuelbs and Philipp [50] give a strong invariance principle for mixing random variables. In order to state their result, we first briefly introduce mixing coefficients. Let $\mathcal{A}$ and $\mathcal{B}$ denote two sub-$\sigma$-algebras of our probability space. The $\alpha$-mixing coefficient of two $\sigma$-algebras quantifies their dependence as follows:
$$\alpha(\mathcal{A}, \mathcal{B}) := \sup_{A \in \mathcal{A},\, B \in \mathcal{B}} \big| P(A \cap B) - P(A) P(B) \big|.$$
The mixing coefficients of a stochastic process $X$, endowed with its natural filtration, are defined as
$$\alpha_X(s) := \sup_{t \ge 0} \alpha\big( \sigma(X_u : u \le t),\; \sigma(X_u : u \ge t + s) \big).$$
The mixing coefficients of a process measure the dependence between events in terms of the units of time by which they are separated. For a stationary Markov process the mixing coefficients simplify to $\alpha_X(s) = \alpha(\sigma(X_0), \sigma(X_s))$, as shown in, for example, Bradley [12; page 118].
Then we can redefine $\xi$ on a new probability space on which we can also construct a $p$-dimensional Brownian motion $W$ with covariance matrix $\Sigma_\xi$ with absolutely convergent entries, such that the partial sums of $\xi$ are approximated by $W$ with error $O(n^{1/2 - \lambda_\xi})$ a.s., for some $\lambda_\xi \in (0, 1/2)$ depending only on $\varepsilon$, $\delta$ and $p$.
The following lemmata are useful in the proof of Theorem 4.1.
Lemma 7.2 (Douc et al. [29; Theorem F.3.3]). Let $X$ be a stationary ergodic Markov process with rate of convergence to stationarity given by $\psi$. Then $\alpha_X(s)$, the $\alpha$-mixing coefficients of the process $X$, decay according to $\psi$, i.e., for all $s \ge 0$ the stated bound holds, where $\Psi$ and $V$ are as stated in (11). Lemma 7.3 (Davydov [26] and Rio [72]). Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space, let $\mathcal{A}$ and $\mathcal{B}$ be two sub-$\sigma$-algebras, and consider random variables $X$ and $Y$ that are measurable with respect to these $\sigma$-algebras respectively. Moreover, assume that $X \in L^p(\Pr)$ and $Y \in L^q(\Pr)$ for some $p, q \ge 1$. Then we can bound their covariance in terms of the $\alpha$-mixing coefficients as follows:
$$|\operatorname{Cov}(X, Y)| \le 8\, \alpha(\mathcal{A}, \mathcal{B})^{1/r} \, \|X\|_p \|Y\|_q,$$
where $p, q, r \in [1, \infty]$ and $1/p + 1/q + 1/r = 1$. Following a traditional blocking argument, it is now straightforward to show that the result of Kuelbs and Philipp [50] also holds for continuous-time ergodic processes.

Proof of Theorem 4.1
Proof. For notational convenience introduce $Y = (Y_t)_{t \ge 0}$, where $Y_t := f(X_t) - \pi(f)$ for $t \ge 0$, and $\xi = (\xi_k)_{k=1}^n$, with $n := n_T := \lfloor T \rfloor$. Furthermore, by definition $n$ is a function of the sample size $T$; for notational convenience we suppress this. Since we are in the setting of Lemma 7.2, $X$ has polynomially decaying $\alpha$-mixing coefficients, which we will denote by $(\alpha_X(s))_{s \ge 0}$. Consequently, $Y$ and $\xi$ are both stationary and strongly mixing processes with polynomially decaying $\alpha$-mixing coefficients $(\alpha_Y(s))_{s \ge 0}$ and $(\alpha_\xi(h))_{h \in \mathbb{N}}$ respectively. This can easily be seen by observing that $\sigma(f(X_t)) \subseteq \sigma(X_t)$ and $\sigma(\xi_k) \subseteq \sigma(X_s : s \le k)$. In order to show that a strong invariance principle holds for $Y$, we will show that it holds for $\xi$ and determine the growth rate of the corresponding remainder terms. Moment conditions for $\xi$ are directly inherited from the assumed moment conditions for $X$: by an application of Jensen's inequality we see that this holds for $p = 2 + \delta$. Therefore, by Theorem 7.1, we can redefine $\xi$ on a new probability space on which we can also construct a $d$-dimensional Brownian motion $W$ with covariance matrix $\Sigma_\xi$ with absolutely convergent entries, for some $\lambda_\xi \in (0, 1/2)$ depending only on $\varepsilon$, $\delta$ and $d$. The claim follows if we show that (51) and (52) hold. In order to show that (51) holds, we argue by Borel-Cantelli. Indeed, let $\varepsilon > 0$ be given and introduce the event $A_{n, \varepsilon}$. By Markov's inequality the introduced sequence of events has summable probabilities, and the Borel-Cantelli lemma yields that $P(\limsup A_{n, \varepsilon}) = 0$. In order for (55) to hold, it is sufficient to note that from a certain point in time onwards the remainder term satisfies the asserted bound for almost every sample path, where the first inequality follows by monotonicity and the penultimate equality by the continuity of measures. A similar Borel-Cantelli argument also shows that (52) holds. Introduce a sequence of events for given $\varepsilon > 0$ and some $\kappa > p$. Since all moments of $W(T) - W(n)$ are finite, we have by Markov's inequality that the introduced sequence of events has summable probabilities. By completely analogous reasoning as in the previous Borel-Cantelli argument, we see that the term (52) is asymptotically negligible. Finally, by Lemma 7.3 it immediately follows that the asserted asymptotic variance $\Sigma_f$ is finite, i.e., all entries are finite; by definition, $\alpha$-mixing coefficients are monotonically decreasing and bounded by $1/4$. The second term of (56) is treated similarly. In order to show that $\Sigma_f = \Sigma_\xi$, we will show that all entries are equal. Firstly, we decompose the asymptotic covariance matrix into the four terms in (57). Let $\Sigma_{T_1}$, $\Sigma_{T_2}$, $\Sigma_{T_3}$ and $\Sigma_{T_4}$ denote these four terms. We will show that the entry-wise convergence gives the desired result. For $1 \le i, j \le d$ we obtain the corresponding expressions for the elements of the matrices in (57). We see that $(\Sigma_{T_1})_{ij}$ tends to the asymptotic variance $(\Sigma_f)_{ij}$. Finally, another application of Lemma 7.3 gives that $(\Sigma_{T_2})_{ij}$, $(\Sigma_{T_3})_{ij}$ and $(\Sigma_{T_4})_{ij}$ tend to zero.

Proposition 4.3
Berkes et al. [4] give a strong invariance principle for weakly $m$-dependent processes, which are defined as processes that can be approximated by $m$-dependent processes in the $L^p$-sense, with a sufficiently fast decaying approximation error (a rate function in the terminology of Berkes et al. [4]).
Their strong invariance principle, stated in Theorem 7.4, is obtained through a classical blocking argument for $m$-dependent random variables. By dividing an $m$-dependent sequence into non-overlapping long and short blocks, two sequences of independent random variables are obtained; these can both be approximated by a Brownian motion. Trivially, stationary $m$-dependent processes satisfying appropriate moment conditions fall into their framework. For more details we refer to Berkes et al. [4]. Then the series $\sum_k \mathbb{E} \xi_0 \xi_k$ is absolutely convergent, and we can redefine $\xi$ on a new probability space on which we can also construct two standard Brownian motions $W_1$ and $W_2$ such that the split approximation holds. Proof. By Proposition 3.1 we see that we can redefine our process such that it is embedded in a richer process $Z$. We will identify $X$ with the first coordinate of the process $Z$. Following Proposition 3.2, we introduce the sequence of stopping times $(S_n, R_n)$ defined as $S_0 = R_0 := 0$ and
$$S_{n+1} := \inf\{T_m > R_n : Z_{T_m} \in A\} \quad \text{and} \quad R_{n+1} := \inf\{T_m : T_m > S_{n+1}\}.$$
Then $Z_{R_n^+}$ is independent of $\mathcal{F}_{R_{n-1}}$ for all $n \ge 1$, and $(Z_{R_n})_{n \ge 1}$ is an i.i.d. sequence with $Z_{R_n} \sim \nu(dx) \lambda(du) K((x, u), dx')$ for all $n \ge 1$.
As a direct consequence, the sequence $(\xi_n)_n$ defined above is stationary under $P_\nu$. Moreover, by Proposition 3.3, $\xi_n$ is independent of $\mathcal{F}_{R_{n-2}}$ for $n \ge 2$. Let $N(T)$ denote the number of regenerations of the resolvent chain up to time $T$. It immediately follows that the partial sums can be decomposed accordingly. By an analogous Borel-Cantelli argument as given in the proof of Theorem 4.1 for the remainder term defined in (54), this remainder is asymptotically negligible. Furthermore, by Proposition 3.3, the sequence $(\xi_k)_{k=1}^\infty$ is a stationary $m$-dependent sequence. By the imposed moment conditions and stationarity, we have by the reasoning given in Berkes et al. [4; Section 3.1] that $(\xi_k)_{k=1}^\infty$ is also a weakly $m$-dependent process with rate function $\kappa(m)$ equal to zero for $m \ge 1$. Hence, by Theorem 7.4, we can redefine $(\xi_k)_k$ on a new probability space on which we can also construct two standard Brownian motions $W_1$ and $W_2$ such that the split approximation (64) holds, where $(s_n)$ and $(t_n)$ are increasing deterministic sequences with $s_n^2 = \sigma_\xi^2 n + O(n/\log n)$ and $t_n^2 = O(n/\log n)$. Note that by Proposition 3.4 we obtain the asymptotic behaviour of $N(T)$. Furthermore, by the definition of big $O$ in (64), there exists an almost surely finite random variable $C$ such that for almost all sample paths $\omega$ and all $n \ge N_0 \equiv N_0(\omega)$ the corresponding bound holds. Since $\mathbb{E}_\nu R_1^q < \infty$, by Csörgö and Horváth [18; Theorem 2.4] we can construct a Brownian motion $W$ with the corresponding approximation property. By the law of the iterated logarithm for Brownian motion we obtain the required bound. Since $N(T)$ is almost surely increasing and tends to infinity, for almost every sample path $\omega$ there exists a $T_0 \equiv T_0(\omega)$ such that $N(T)(\omega) \ge N_0$ for all $T \ge T_0$. Hence we obtain (68) from (65), where $s_{N(T)}^2$ and $t_{N(T)}^2$ are almost surely increasing sequences, which given $N(T)$ are deterministic. We see that (68) can be reformulated, where the second equality follows by (67). Furthermore, the asymptotic behaviour of $N(T)$ motivates the introduction of $\sigma_T^2$, $\tau_T^2$ and $\beta_T$. By Theorem 6.1 (see also Theorem 1.2.1 of Csörgö and Révész [20]) we obtain the corresponding increment bound. Combining results (63), (69), and (71) concludes the proof. By the same arguments as given in the proof of Theorem 4.1, the strong invariance principle holds for every initial distribution. For this proof, we rely on the following properties of the resolvent chain. Provided that the process $X$ is aperiodic and positive Harris recurrent, the resolvent chain inherits these properties, as seen in Meyn and Tweedie [59; Proposition 5.4.5] and Tweedie [81; Theorem 3.1] respectively. Moreover, by Down et al. [30; Theorem 5.3], exponential convergence to stationarity is equivalent for the process and its resolvent. The split chain of the resolvent in turn inherits aperiodicity and positive Harris recurrence, as seen in, for example, Nummelin [63]. Following a de-initialising argument of Roberts and Rosenthal [73], we see that the split chain inherits the rate of convergence of the resolvent chain. To conclude, the split chain inherits aperiodicity, positive Harris recurrence, and the rate of ergodicity from the process $X$.
Note that by Proposition 3.1, $(Z^1_{T_n}, Z^2_{T_n})_n$, the jump chain of the first two coordinates of $Z$, has the same distribution as the split chain of the resolvent. From (13) and (14) we see that $(Z^1_{T_n}, Z^2_{T_n})_n$ is a Markov chain taking values in $E' := E \times [0, 1]$ that moves according to the kernel $U'$, where $\lambda$ denotes the Lebesgue measure on the unit interval. Observe that the kernel of the split chain of the resolvent also satisfies a one-step minorisation condition $U' \ge s \otimes \nu \otimes \lambda$, i.e., $U'((x, u), (dx, du)) \ge s(x, u) \nu(dx) \lambda(du)$. Moreover, the split chain of the resolvent is aperiodic, positive Harris recurrent, and inherits the rate of convergence to stationarity from $X$.
Proof. Since the resolvent chain has the same stationary distribution as the process $X$, i.e., $\pi = \pi U$, the claim follows by the identical argument of Hobert et al. Proof. Firstly, by the construction of the randomised stopping times $(S_n)_n$ and $(R_n)_n$, we see that $R_n = S_n + \sigma_{n+1}$, where $\sigma_{n+1}$ has a standard exponential distribution. Hence, by the triangle inequality in $L^q(\pi)$, we only need to show that $\mathbb{E}_\pi[S_1^q] < \infty$. Let $\tilde{Z} = (\tilde{Z}_n)_n$ denote the jump chain of the process $Z$, i.e., $\tilde{Z}_n = Z_{T_n}$, where the $(T_n)_n$ denote the jump times. Let $(\bar{X}_n)_{n \ge 0}$ again denote the resolvent chain. Let $N_t$ denote the number of jumps up to time $t$. Let $\tilde{\tau}_A$ denote the hitting time of the recurrent atom for the jump chain $\tilde{Z}$, i.e., $\tilde{\tau}_A := \inf\{n \ge 0 : \tilde{Z}_n \in A\}$. For notational convenience we introduce $q := \beta - 1$; note that by the assumed ergodicity assumptions we have $q > (p + \varepsilon)/\varepsilon$. From the relation between the expectation of positive random variables and their tail probabilities we can express the expectation of interest accordingly. Note that $\Gamma(k + q)/\Gamma(k - 1)$ can be dominated by some polynomial $\psi(k)$ with a leading term of order $k^{q+1}$. By Nummelin and Tuominen [65; Proposition 1.6] the corresponding series converges. It follows that $\mathbb{E}_\pi S_1^q < \infty$. For the second statement of Theorem 4.4, we follow the argument of Bednorz and Łatuszyński [3; Theorem 2] with some minor adaptations. We give the proof for completeness.
Here the inequalities follow from Minkowski's integral inequality, Hölder's inequality, stationarity, and Markov's inequality. Note that the integral on the last line is finite due to the imposed condition on the rate of polynomial ergodicity, since $q = \beta - 1 > (p + \varepsilon)/\varepsilon$. An application of Lemma 7.5 concludes the proof.
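The tail-probability relation invoked above is, in its standard form for a nonnegative random variable $S_1$ and $q > 0$,

```latex
\mathbb{E}_\pi\!\left[ S_1^{\,q} \right]
  = q \int_0^\infty t^{\,q-1} \, \Pr\nolimits_\pi\!\left( S_1 > t \right) dt,
```

so tail probabilities decaying strictly faster than $t^{-q}$ yield finiteness of the $q$-th moment.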
Remark 7.6. For the exponentially ergodic case we would make use of Nummelin and Tuominen [64; Lemma 2.8], which states that for an exponentially ergodic Markov chain there exists an $r > 1$ such that the analogous geometric series converges. Proof. Let $x_0$ denote the smallest local optimum of the density $\pi$, i.e., $x_0 = \min\{x : \pi'(x) = 0\}$.
Since the tails of π are diminishing, x_0 must be a local maximum. Moreover, define the set A = [−M, x_0] × {+1}.
Note that for all (x, v) ∈ A the switching intensity satisfies λ(x, v) = 0, since the process is moving toward a higher-density region. Hence the process moves deterministically for time M until the point (x_0, +1) is reached and the probability of a velocity change becomes positive. This motivates the introduction of the stopping times R_n. By the Markov property, the sequence {ξ_n} is i.i.d. under P_ν, with ν a Dirac measure at the point (x_0, +1). Note that this argument holds for any local optimum by the smoothness assumptions on π. Note also that R_n = M + τ_A, with τ_A again denoting the hitting time of the set A. Since τ_A can be related to the hitting time τ of the resolvent chain, we can follow the argument of Theorem 4.4 to obtain the corresponding moment bound. Moreover, for all measurable f : E → R with π(|f|^{p+ε}) < ∞, where p ≥ 1 and β > (p + 2ε)/ε, the analogous bound holds. Define (τ_k)_{k∈N} as τ_k = R_k − R_{k−1} and let ̺ and σ_̺² denote its mean and variance respectively. The random vectors (ξ_k, τ_k) are independent and identically distributed. If we choose α = Cov_ν(ξ_1, τ_1)/Var_ν(τ_1), then it immediately follows that ξ_k − α(τ_k − ̺) and τ_k are uncorrelated.
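As a concrete illustration of the deterministic motion and state-dependent switching underlying this regenerative argument, the sketch below simulates a one-dimensional Zig-Zag process. The standard Gaussian target, the exact inversion formula for the event times, and all function names are our own illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def zigzag_gaussian_average(T=5000.0, seed=1):
    """Simulate a 1D Zig-Zag process targeting N(0, 1) and return the
    ergodic average (1/T) * int_0^T X_t^2 dt, which should be close to 1.

    The switching intensity is lambda(x, v) = max(0, v * x); switching
    times are sampled exactly by inverting the integrated intensity.
    """
    rng = np.random.default_rng(seed)
    x, v, t = 0.0, 1.0, 0.0
    integral = 0.0
    while t < T:
        a = v * x
        # solve int_0^tau max(0, a + s) ds = Exp(1) in closed form
        tau = -a + np.sqrt(max(a, 0.0) ** 2 + 2.0 * rng.exponential())
        dt = min(tau, T - t)
        # closed-form integral of (x + v * s)^2 over [0, dt]
        integral += x * x * dt + v * x * dt * dt + dt ** 3 / 3.0
        x += v * dt
        t += dt
        if dt == tau:
            v = -v  # velocity flip at the switching event
    return integral / T
```

Between events the process moves linearly with unit speed; the zero-intensity region in which the process drifts deterministically toward the mode is exactly the mechanism exploited by the set A above.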
From the assumptions on the rate of ergodicity we see that the approximation error simplifies to o(n^{1/p}). Here γ = σ_̺²/̺, and N is constructed increment-wise from B_2 in a deterministic way and is therefore also independent of B_1. From (76) and (78) we obtain the claim (80): indeed, by the ergodic law of large numbers the required limit holds, and therefore our claim (80) follows. Combining (75), (79), and (80) yields the desired approximation. Let (Γ_s)_{s≥0} be defined as Γ_0 := 0 and Γ_s := N^{−1}(s), the generalized inverse of the Poisson process. Taking n = Γ_{n′} in (83) and subsequently making the substitution n = γn′, the approximation follows, where we used the fact that Γ is a non-decreasing process that tends to infinity. Moreover, since Γ_n has a Gamma distribution, it follows from the Komlós-Major-Tusnády approximation [49; Theorem 1] that there exists a Brownian motion with the stated approximation property. Note that the Poisson process N, and therefore its corresponding event-time process Γ, is independent of B_1. Therefore, by an application of Lemma 7.7 with n = Γ_n, there exists a standard Brownian motion B_4 independent of N and Γ. Applying the approximations (85) and (86) to (84) gives the combined approximation. Note that since B_3 and B_4 are independent, their appropriately weighted combination is a standard Brownian motion. Furthermore, by the definition of big O, there exists an almost surely finite random variable C such that for almost all sample paths ω and all n ≥ N_0 ≡ N_0(ω) the stated bound holds. It immediately follows that (90) also holds for all T sufficiently large and hence carries over as T → ∞. By the same argument given in the proof of Theorem 4.1, the strong invariance principle holds for every initial distribution.

The theorem follows if we can show that a strong invariance principle holds for every component on the same probability space. Firstly, assume that the initial distribution of Z is π.
In order to obtain a Brownian approximation for every coordinate we use a regenerative argument along the lines of Theorem 4.6. For every component i = 1, …, d we define the following: x_0^i, the smallest local maximum of the density π_i, i.e., x_0^i = min{x : π_i′(x) = 0}; the corresponding set A_i = [−M, x_0^i] × {+1}; and the sequence of stopping times {R_n^i}_{n∈N} given by R_0^i = inf{t ≥ 0 : (X_t^i, V_t^i) = (x_0^i, 1)} and R_n^i = inf{t ≥ R_{n−1}^i : (X_t^i, V_t^i) = (x_0^i, 1)}. Furthermore, for every coordinate i we introduce the sequence {ξ_n^i}. Note that for every component, {ξ_n^i}_n is i.i.d. under P_{ν_i}, with ν_i a Dirac measure at the point (x_0^i, +1). We can follow the argument of Theorem 4.4 to obtain the corresponding moment bound. Moreover, for all measurable f : E → R with π(|f|^{p+ε}) < ∞, where p ≥ 1 and β > 2 + p/ε, the analogous bound holds. Note that for the right-hand side of (91), the bound holds for every coordinate i. By assuming that the process starts at its stationary distribution, it follows by the argument in the proof of Theorem 4.4 that there exists a 2d-dimensional Brownian motion with covariance Σ_z = diag(Var_ν(z_1), …, Var_ν(z_k)). All components of z_k are independent, and therefore the corresponding components of the Brownian motion are independent as well. Note that for every component z_k^i of z_k there exist two independent Brownian motions B_1 and B_2 such that the approximation (95) holds. Note that in (95) we have E_ν ξ_1^i = 0 by Theorem 3.4 and σ_i² = Var_ν(ξ_1^i − α_i(τ_1^i − ̺_i)). By following the argument of the proof of Theorem 4.6 for every component, we obtain (97). By combining (91), (93), and (97) the claim follows. By the argument given in the proof of Theorem 4.1, the strong invariance principle holds for every initial distribution.

By the properties of the sequences s_n² and t_n² described in Theorem 7.4, for any ε > 0 there exists some n_0 such that the stated bounds hold for all n ≥ n_0. Since a_T → ∞ as T → ∞ by assumption, and the number of regenerations of the resolvent chain in an interval of length a_T equals a_T/̺ + o(a_T), it follows that for all ε > 0 there exists some T_0 such that the bound holds for all T ≥ T_0. Similarly, it can be shown that lim sup A_2 = 0 almost surely, which completes the proof.
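The regeneration count used in the last step follows the elementary renewal asymptotics N(a_T) = a_T/̺ + o(a_T). A quick numerical check is sketched below; the exponential cycle lengths are chosen purely for illustration (regeneration cycles of the resolvent chain need not be exponential), and all names are our own.

```python
import numpy as np

# Renewal asymptotics: with i.i.d. cycle lengths of mean rho, the number
# of completed cycles in a window of length a_T is a_T / rho + o(a_T).
rng = np.random.default_rng(2)
rho = 2.0
cycles = rng.exponential(rho, size=500_000)  # illustrative cycle lengths
a_T = cycles.sum()                           # window covering these cycles
n_regen = cycles.size
ratio = n_regen / (a_T / rho)                # should be close to 1
```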

Proof. By the imposed conditions on the process, the strong invariance principle formulated in Theorem 4.1 holds. The first claim then follows by Damerdji [25; Theorem 1 and Lemma 3] and the second by Damerdji [25; Proposition 2].
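For intuition, the batch means estimator analysed via Damerdji's results can be sketched in a few lines. The AR(1) test chain, the choice of roughly √n batches, and all names below are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def batch_means_variance(samples, n_batches):
    """Batch means estimate of the asymptotic variance sigma^2 in the CLT
    sqrt(n) * (S_n / n - mu) => N(0, sigma^2) for correlated samples."""
    n = len(samples) - len(samples) % n_batches
    batches = np.asarray(samples[:n]).reshape(n_batches, -1)
    b = n // n_batches                 # batch length
    return b * batches.mean(axis=1).var(ddof=1)

# Illustration on an AR(1) chain X_k = phi * X_{k-1} + eps_k with unit-variance
# innovations, whose true asymptotic variance is 1 / (1 - phi)^2 = 4.
rng = np.random.default_rng(0)
phi, n = 0.5, 200_000
eps = rng.standard_normal(n)
x = np.empty(n)
x[0] = eps[0]
for k in range(1, n):
    x[k] = phi * x[k - 1] + eps[k]
est = batch_means_variance(x, n_batches=int(np.sqrt(n)))
```

The estimator scales the sample variance of the batch means by the batch length b; consistency requires both the number of batches and the batch length to grow with n, which is the regime covered by the cited results.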

By the Komlós-Major-Tusnády approximation the fluctuation result immediately carries over to i.i.d. sequences satisfying appropriate moment conditions, as seen in Csörgö and Révész [20; Theorem 3.1.1 and 3.2.1].

Theorem 6.2 ([4; Theorem 4]). Let X = (X_t)_{t≥0} be an aperiodic, positive Harris recurrent Markov process for which Assumption 1 is satisfied. Moreover, let X be polynomially ergodic of order β > 3 + p/ε, for given p > 2 and some ε > 0. Consider a function f : E → R with π(f) = 0 and π(|f|^{p+ε}) < ∞. Let a_T be a given positive non-decreasing function of T such that:
i. 0 < a_T ≤ T,
ii. T/a_T is non-decreasing,
iii. a_T is regularly varying at ∞ with index ζ ∈ (0, 1].

7.4.2. Proof of Theorem 4.7

Proof. From Bierkens et al. [9; Proposition 2.8] we see that the Zig-Zag process with a stationary distribution of product form π(x) = ∏_{i=1}^d π_i(x_i) can be decomposed into d independent Zig-Zag processes, each with stationary distribution π_i. Since we have that
$$\Big| \int_0^T f(X_t)\,dt - T\pi(f) - \sum_{i=1}^d \sigma_{f_i} W_i(T) \Big| \le \sum_{i=1}^d \Big| \int_0^T f_i(X_t^i)\,dt - T\pi_i(f_i) - \sigma_{f_i} W_i(T) \Big|, \quad (91)$$
where (W(t))_{t∈[0,1]} denotes a standard p-dimensional Brownian motion, and the weak convergence is with respect to the Skorohod topology on D[0, 1], the space of real-valued càdlàg functions with domain [0, 1].

Proof. By Philipp and Stout [68; Theorem 1.E], the FCLT immediately follows from the strong invariance principle formulated in Theorem 4.1. Similarly, by Damerdji [24; Proposition 2.1] the CLT follows.
Note that we have now shown our result assuming stationarity, i.e., with initial distribution π. However, by a standard argument this can be extended to every initial distribution. It is well known that for ergodic Markov processes every bounded harmonic function is constant; see for example Kallenberg [47; Theorem 20.10]. Hence, by following the argument of Meyn and Tweedie [59; Proposition 17.1.6], it follows that the strong invariance principle holds for every initial distribution.
o(n^{1/p} log² n) a.s., where {s_n} and {t_n} are non-decreasing deterministic sequences with s_n² = σ_ξ² n + O(n/log n) and t_n² = O(n/log n). As noted by Berkes et al. [4], the perturbed time sequences {s_n} and {t_n} are deterministic and can be explicitly calculated.
7.4. Theorem 4.6 and 4.7

Lemma 7.7 (Merlevède et al. [57; Lemma 2.4]). Let B be a standard Brownian motion and N a Poisson process with intensity λ, independent of B. Then there exists a standard Brownian motion W that is also independent of N such that the stated approximation holds.

The claim immediately follows from Merlevède et al. [57; Lemma 2.4] and a Borel-Cantelli argument.
By Komlós [49; Theorem 1(ii)], a Poisson process N with intensity λ = ̺²/σ_̺² can be constructed from the Brownian motion B_2 such that the resulting error term is almost surely finite and hence asymptotically negligible. Define (τ_k^i)_{k∈N} as τ_k^i = R_k^i − R_{k−1}^i and let ̺_i and σ_{̺_i}² denote the mean and variance respectively. The random vectors (ξ_k^i, τ_k^i) are independent and identically distributed. If we choose α_i = Cov_ν(ξ_1^i, τ_1^i)/Var_ν(τ_1^i), then it immediately follows that ξ_k^i − α_i(τ_k^i − ̺_i) and τ_k^i are uncorrelated. We then apply the multivariate Komlós-Major-Tusnády approximation given in Einmahl [33; Theorem 1] and Csörgö and Horváth [18; Theorem 2.1] to the sequence of random vectors.