Subgeometric hypocoercivity for piecewise-deterministic Markov process Monte Carlo methods

We extend the hypocoercivity framework for piecewise-deterministic Markov process (PDMP) Monte Carlo established in [2] to heavy-tailed target distributions, which exhibit subgeometric rates of convergence to equilibrium. We make use of weak Poincaré inequalities, as developed in the work of [15], the ideas of which we adapt to the PDMPs of interest. On the way we report largely potential-independent approaches to bounding explicitly solutions of the Poisson equation of the Langevin diffusion and its first and second derivatives, required here to control various terms arising in the application of the hypocoercivity result.


Introduction
In this work, we study piecewise-deterministic Markov processes (PDMPs) which are used in the context of Monte Carlo inference to draw samples from some given target density π on R d , for instance in Bayesian computation. Notable examples of such processes are the Zig-zag process of [5] and the Bouncy Particle Sampler of [9]. PDMPs have gained attention within the field of Markov chain Monte Carlo (MCMC) because such methods depart significantly from traditional reversible MCMC approaches based on Metropolis-Hastings. These processes are constructed to be nonreversible, intuitively enabling persistent exploration within the state space, rather than the diffusive exploration characteristic of reversible schemes.
However, this nonreversibility also introduces mathematical difficulties when analysing the theoretical properties of the resulting algorithms, such as rates of convergence. Traditional methods based on spectral theory for self-adjoint operators in Hilbert spaces can no longer be applied, and furthermore the underlying operators which define the process tend to be non-coercive: the symmetric component of the operator has a nontrivial kernel. This implies that one cannot expect straightforward geometric convergence of the semigroup $(P_t)$, in the sense that there exists some $\rho > 0$ such that, for appropriate functions $f$,
$$\|P_t f - \pi(f)\| \le C\,\xi(t)\,\|f - \pi(f)\|, \tag{1}$$
for some $C \le 1$, $\xi(t) = \exp(-\rho t)$ for $t \ge 0$ and an appropriate norm $\|\cdot\|$. In order to understand degenerate dynamics, the hypocoercivity framework has been developed, following the approach of [11]. This framework was first applied to PDMPs in [2], where exponential convergence of the semigroups was proven as in (1) when the target density $\pi$ satisfies a Poincaré inequality: for some constant $C_P > 0$,
$$\|f\|_2^2 \le C_P\,\|\nabla_x f\|_2^2, \tag{2}$$
for suitably differentiable functions $f \in L^2(\pi)$ with $\int f\,\mathrm{d}\pi = 0$, where $\|\cdot\|_2$ is the norm in $L^2(\pi)$. The authors were able to conclude that (1) holds for such targets, with an exponential rate function $\xi$, and for some constant $C > 1$.
The goal of this work is to extend the hypocoercivity results of [2] to targets $\pi$ which do not possess a Poincaré inequality (2), but instead possess a weak Poincaré inequality of the form
$$\|f - \pi(f)\|_2^2 \le \alpha(r)\,\|\nabla_x f\|_2^2 + r\,\Psi(f), \qquad r > 0,$$
where $\alpha : (0, \infty) \to [1, \infty)$ is a decreasing function, typically divergent as $r \downarrow 0$, and $\Psi$ is an appropriate functional. This encompasses target distributions which possess subgeometric tail decay, and are typically referred to as 'heavy-tailed'. To do this, we will utilize the approach of [15], where such inequalities were studied for degenerate diffusions. Our main abstract result will be a convergence result of the form (1) where the rate function $\xi$ is in fact subgeometric.
As a concrete application, our bounds on the semigroup of the form (1) will allow us to check conditions which ensure that a central limit theorem holds for (appropriately scaled) ergodic averages of the process.

Contribution
In this subsection, we carefully describe our contributions in relation to the literature, particularly references [2,15] and [14]. Readers interested in our actual results are encouraged to move on to the following subsection where we define our notation, or to Section 2 for our assumptions, Section 3 for our abstract result, Section 4 for our result for PDMPs, or to Section 5 where we compare our bounds theoretically and empirically on illustrative examples.
In the present manuscript, we work in and extend the general framework laid out in [2] for PDMPs. The analysis carried out in [2] crucially relied on the existence of a (strong) Poincaré inequality (2), which enabled the authors to establish geometric convergence of the semigroup (1). These results rely themselves on the framework proposed by [11] for which the first rigorous proof was established in [14] and whose results were adapted to take into account technical specificities of PDMPs in [2]. Our work aims to combine the framework recently proposed in [15] to tackle scenarios where application of the ideas of [11,14] is sought, but only a weak form of the Poincaré inequality is satisfied, and [2] which takes into account PDMP idiosyncrasies.
More specifically, while our abstract assumptions and the resulting theorem and its proof (see Sections 2, 3) may superficially appear very similar to those of [15], we were not able to straightforwardly apply their results and have adapted them following [2]. This disparity fundamentally arises from the differences in how the corresponding processes arise. For PDMPs, as in the present work and in [2], the initial point of departure is an explicit construction of the process, in terms of the deterministic dynamics and the switching mechanism, driven by an inhomogeneous Poisson process. The infinitesimal generator is a by-product, and its closure is not sufficiently tractable to work with. On the other hand, for diffusion processes as in [15], one can begin with an appropriate differential operator, the putative infinitesimal generator, take the closure, and then via the standard operator-theoretic machinery define the ensuing semigroup and stochastic process. The key technical differences arise at the level of checking the closure of certain operators, and we comment on this point in relevant places in the text.
In relation to the actual results of [15], by specialising to our particular PDMP setting we are also able to obtain slightly better constants in the decay of the semigroup. In relation to [2], by leveraging the powerful results of [16], we establish (quasi) potential-independent approaches to bounding the difficult cross terms arising in the application of the hypocoercivity result, which depend on smoothness estimates of the solution of the Poisson equation for the Langevin diffusion process.
The recent work [17] also studies geometric convergence of PDMP semigroups. Their work also crucially relies on a (strong) Poincaré inequality, and their Assumption 3 typically holds when the potential U (x) = − log π(x) grows at a superlinear rate in |x|. Thus the framework of [17] cannot currently be applied to heavy-tailed targets.
Finally, we briefly mention some other related work. For an accessible introduction to hypocoercivity, we recommend [1], which focusses on the finite-dimensional ODE setting. In [10], hypocoercivity techniques were used to study randomized Hamiltonian Monte Carlo (RHMC) and derive dimension-free exponential convergence rates. For connections between hypocoercivity and convergence proofs based on Lyapunov functions, see [18]. For a broad and recent review of the current literature on hypocoercivity, see [4].

Notation
• $|\cdot|$ denotes the Euclidean norm on $\mathbb{R}^d$ and for $v, w \in \mathbb{R}^d$, $\langle v, w\rangle = v^\top w$ is the associated inner product, where $v^\top$ is the transpose of $v$.
• I d denotes the d × d identity matrix.
• For a vector w ∈ R d we will write w i , i = 1, . . . , d, for its coordinates with respect to the standard basis.
• For $A$ a set, $\mathbb{1}_A$ is the associated indicator or characteristic function.
• For a smooth manifold $M$ and $k \in \mathbb{N} \cup \{\infty\}$, $C^k(M, \mathbb{R}^m)$ denotes the set of $k$-times continuously differentiable functions $f : M \to \mathbb{R}^m$, and $C^k_b(M, \mathbb{R}^m)$ the set of those which are in addition bounded and have bounded derivatives up to order $k$. A subscript $c$, as in $C^k_c(M)$, denotes functions in $C^k(M)$ which are compactly supported.
• For such a function $f$, $\partial_i f$ denotes the partial derivative of $f$ with respect to the $i$th coordinate, for $k \ge 1$, and analogously for $i, j \in \{1, 2, \ldots, d\}$, $\partial_{i,j} f$ denotes $\partial_i \partial_j f$, for $k \ge 2$.
• For any measurable space $(M, \mathcal{F})$ with probability measure $m$, we let $L^2(m)$ be the Hilbert space of real measurable functions $f$ with $\int_M |f|^2\,\mathrm{d}m < \infty$, with inner product $\langle f, g\rangle_2 = \int_M f g\,\mathrm{d}m$ and corresponding norm $\|\cdot\|_2$.
When there is ambiguity we may also write f, g L 2 (m) , · L 2 (m) or f, g m , · m . We use the same notation for F, Note that we use the notation normally associated with Sobolev spaces, but our derivatives are not weak derivatives.
• For a measurable function $f : M \to \mathbb{R}$, let $\|f\|_{\mathrm{osc}} := \operatorname{ess\,sup}_m f - \operatorname{ess\,inf}_m f$.
• L ∞ (m) will denote the Banach space of (equivalence classes of) measurable functions f : M → R with ess m sup |f | < ∞.

PDMP notation
We summarize here our PDMP notation; for the underlying assumptions, see Section 2. Given potential U : X → R, we will denote the target distribution of interest by π = e −U / X e −U (y) dy on X = R d equipped with its Borel σ-algebra. V ⊂ R d is a closed subset and we have a probability measure ν defined on V equipped with its Borel σ-algebra V. Then set E = X × V and define the augmented probability measure µ = π ⊗ ν. We will be working with PDMPs whose generators are of the form, for Here R v is the refreshment operator, given for any f ∈ L 2 (µ), by For a sequence of continuous vector fields F k : X → R d , k ∈ {1, 2, . . . , K}, such that ∇ x U = K k=1 F k , we now define the corresponding bounce operators B k . For each k ∈ {1, 2, . . . , K}, x ∈ X set then set for any f : The intensity λ k (x, v) has an explicit form, depending on the dynamics, which ensures invariance of µ. In particular, we require that , which is a necessary condition for µ to be an invariant measure (see our Assumption 3). Finally assumed to be finite.
Remark 1. The PDMPs considered in [2] are slightly more general as they include the possibility of non-linear drift, that is the first order derivative term v This includes important examples such as randomized HMC [8] and the Boomerang Sampler [6]; to simplify the expressions we have not included this term but the results hold in this more general setting also. More complex refreshment operators can also be considered; see [2].
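To make the action of the bounce operators concrete, the following Python sketch implements the velocity reflection used in the standard formulations of the Bouncy Particle Sampler and the Zig-Zag process, namely $v \mapsto Rv = v - 2\,(v^\top n)\,n$ with $n = F_k(x)/|F_k(x)|$, together with the canonical intensity $\max(0, v^\top F_k(x))$. Both formulas are assumptions taken from the usual descriptions of these samplers, and the function names are hypothetical. The checks illustrate that the reflection is an involution preserving $|v|$, and that the canonical intensity satisfies the balance identity $\lambda_k(x, v) - \lambda_k(x, Rv) = v^\top F_k(x)$.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def reflect(v, F):
    """Reflect v in the hyperplane orthogonal to F (identity if F = 0).

    Assumed standard bounce: v - 2 (v.n) n with n = F / |F|.
    """
    norm_sq = dot(F, F)
    if norm_sq == 0.0:
        return list(v)
    scale = 2.0 * dot(v, F) / norm_sq
    return [vi - scale * Fi for vi, Fi in zip(v, F)]


def canonical_rate(v, F):
    """Canonical event intensity max(0, v.F) (an assumption, see lead-in)."""
    return max(0.0, dot(v, F))


rng = random.Random(0)
v = [rng.gauss(0.0, 1.0) for _ in range(3)]
F = [rng.gauss(0.0, 1.0) for _ in range(3)]
Rv = reflect(v, F)

# Reflecting twice returns the original velocity.
involution_error = max(abs(a - b) for a, b in zip(reflect(Rv, F), v))
# The Euclidean norm of the velocity is unchanged by a bounce.
norm_error = abs(math.sqrt(dot(Rv, Rv)) - math.sqrt(dot(v, v)))
# Balance identity: rate(v) - rate(Rv) = v.F.
balance_error = abs(canonical_rate(v, F) - canonical_rate(Rv, F) - dot(v, F))

# Zig-Zag style field along one coordinate: the bounce flips that coordinate.
flipped = reflect([1.0, -1.0, 1.0], [0.0, 2.0, 0.0])
```

With a field $F_k$ supported on a single coordinate, as in the Zig-Zag process, the same reflection simply flips the sign of that velocity component, which is what `flipped` illustrates.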

PDMP assumptions
In what follows, 'Conditions' will refer to the conditions needed for the abstract hypocoercivity result to hold; these are inspired by [15]. 'Assumptions' will refer to the assumptions made on our PDMP process, which we will show imply that the Conditions hold. We first give a basic condition, following [2].
(b) µ is a stationary measure for (P t ) t≥0 .
(c) There exists a core C for L such that C is dense in L 2 (µ) and C ⊂ D(L) ∩ D(L * ), where (L * , D(L * )) is the adjoint of L on L 2 (µ).
We now give the assumptions on the potential U .
Assumption 1. The potential U is such that, (a) U ∈ C 2+α (X) for some α ∈ (0, 1); (c) either of the following holds: We remark that this is where our present work diverges substantially from [2]. Since we are interested in studying subgeometric rates of convergence, we do not assume a Poincaré inequality here as in [2]: instead, we will later assume a weak Poincaré inequality.
In both of these cases Assumption 1 holds since ∇ x U and ∇ 2 x U are both bounded. Both of these examples have subexponential decay but satisfy a weak Poincaré inequality as we show in Example 4 using the results of [21]. We assume the following assumptions on the vector fields, as in [2]. (a) for k ∈ {0, 1, . . . , K}, F k ∈ C 2 (X, R d ); (c) for all k ∈ {0, 1, . . . , K}, there exists a k ≥ 0 such that for all x ∈ X, Following [2], we make the following assumption on the event rate.
Example 2. Many standard PDMP algorithms satisfy these assumptions: the canonical basis, then we have the Zig-Zag process [5].
(b) The choice K = 1 and F 1 = ∇ x U gives the Bouncy Particle Sampler [9]. Now we give assumptions on V and ν: Assumption 4. We assume the following.
(c) For any bounded and measurable function g : (d) ν has finite fourth order marginal moment, (e) Assume that m 2 ≥ 1.
The last condition is purely technical and allows for simpler expressions in Theorem 1.
We note that this assumption precludes the use of heavy-tailed distributions for the velocity component v. By the discussion after [2, H4] if ν is rotation invariant then Assumption 4-(a)-(b)-(c) are satisfied. Assumption 4-(b) above implies also that Assumption 5. The refreshment mechanism is given by Assumption 6. The refreshment rate λ ref : X → R is bounded from below and above as follows: there exist λ > 0, c λ ≥ 0 such that for each x ∈ X,

Abstract result
We will decompose our operator $L$ into symmetric and antisymmetric parts,
$$S := \tfrac{1}{2}(L + L^*), \qquad T := \tfrac{1}{2}(L - L^*). \tag{6}$$
We remark that while our upcoming abstract result, Theorem 1, closely resembles Theorem 2.1 of [15], our definitions of the abstract operators $S$, $T$ are given by (6) above, which follows the approach of [2] instead. In our PDMP setting we require this approach in order to explicitly identify the operators $S$, $T$ in Section 4 and define intermediate quantities and their properties below. By contrast, in the diffusion setting of [15], the authors are able to employ Itô's formula to identify their corresponding symmetric and antisymmetric operators. Thus we cannot simply use the approach and Theorem 2.1 of [15] directly, but we combine the two approaches of [2,15].
As in [2] we note that under Conditions 1 and 2, $T\Pi_v$ is closable, with closure $(\overline{T\Pi_v}, D(\overline{T\Pi_v}))$; this follows from the fact that $T$ is antisymmetric and $C$ is dense, as shown in the proof of Lemma 2. This allows us to define the operator $A$, by [19, Theorem 5.1.9]. As detailed in [2, Lemma 3], $A$ is closable with bounded closure (on $L^2(\mu)$), and we denote its closure by $A$ hereafter. This is different from [14,15], where instead it is assumed that $T$ is closed, or closable, leading to a definition of $A$ involving either $T\Pi_v$ or $\overline{T}\Pi_v$, and for which we could not establish key intermediate results, given the level of current understanding of PDMPs of the type considered in this manuscript.
Condition 3. We have: Condition 4. There exists some R 0 ≥ 1 such that for any f ∈ C, We state the assumptions on the functional Ψ which will appear in our weak Poincaré inequalities.

Condition 5.
We have a functional Ψ : Further, setting G : we also assume that Ψ satisfies for each f ∈ L 2 (µ) and t ≥ 0, We now state the required weak Poincaré inequalities.
Corollary 1. Assume the same conditions as Theorem 1, except that (10) is replaced by a strong Poincaré inequality, and assume furthermore that for each Then we have that (11) holds, additionally, for any f ∈ L 2 (µ), with As we shall see in Example 3, our application of Theorem 1 to PDMPs, given in Theorem 2, greatly broadens the class of PDMP Monte Carlo processes for which a central limit theorem holds. The following corollary will be applied to our examples in Section 5.

Corollary 2. Whenever
then for any $f \in L^2(\mu)$ such that $\mu(f) = 0$ and $\Psi(f) < \infty$, the finite-dimensional distributions of the rescaled process converge as $N \to \infty$ to those of a standard one-dimensional Brownian motion, where $\sigma > 0$ is an appropriately chosen constant defined in [22, Theorem MW].

Proof of Corollary 1. If $\Psi(f) = \infty$, then (11) vacuously holds. So assume $\Psi(f) < \infty$, and choose a sequence $(f_n) \subset D(L)$ satisfying (14). We can apply (11) to each $f_n$, and by taking the lim sup we conclude that the inequality (11) is valid for $f$ also. The alternative expression for $\xi$ is immediate from the expression (12) since in this case $\alpha_2$ can be uniformly bounded from above. If $c_2$ denotes the constant in (12), then we can choose

Proof of Corollary 2. This is a direct application of [22, Theorem MW] which holds whenever, with $f \in L^2(\mu)$ such that $\int f\,\mathrm{d}\mu = 0$ and $v_t := \int_0^t P_s f\,\mathrm{d}s$ for each $t > 0$,

Remark 2. The proof of Theorem 1 follows the proof of Theorem 2.1 of [15]; however, we have adapted the proof to take into account our differing assumptions and have been careful to track the constants involved. Due to the structure of the generators of the PDMPs we consider, under Assumptions 4 and 5, the operator $S$ will satisfy (10) with $\alpha_2(r)$ constant and hence we may set $\Psi = 0$ in this inequality and obtain (13); see Corollary 1. This stems from the specific refreshment mechanism employed here; the measure $\nu$ may be required to satisfy a weak Poincaré inequality of the form of (23) when using a diffusion for this update; see Section 4.3 for a similar situation involving SDEs. Therefore we include the details of the proof of Theorem 1, making it straightforward to see how the constants simplify in Corollary 1.
Before we prove Theorem 1 we need the following two lemmas. The following is taken from [ holds for some decreasing α : Then, for any m 2 > 0, The following is a consequence of Lemma 1.
This together with Condition 5 allows us to apply Lemma 1 to conclude that Rearranging, setting $f = \Pi_v g$ for $g \in C \subset D(T\Pi_v)$ and using Condition 5 we obtain We now establish the following result which is based on [2, Lemma 5]. To prove the above theorem we need to define the closure $\bar{A}$ of $A$, which is defined on the whole space $L^2(\mu)$; this is possible since $A$ is a bounded operator, see [2, Lemma 3]. Define for any $g \in D(L)$

Lemma 3. Assume that Conditions 1, 2, 3, 4, 5, 6 hold. Then for any $g \in D(L)$ we have for any $r_1, r_2 > 0$,

Proof. Note that the proof of the inequality for $F_2$ in [2, Lemma 5] does not rely upon the Poincaré inequality, so we may use the same proof to obtain for all $g \in D(L)$, Fix $g \in C$; using that $T$ is antisymmetric we have that $\langle Lg, g\rangle_2 = \langle Sg, g\rangle_2$ and by (10) we have for $r > 0$, To extend this to $D(L)$ fix $f \in D(L)$ and let $\{f_n\}_n \subseteq C$ be as in Condition 5, then we have This inequality can be extended to $g \in D(L)$ since $C$ is dense in $D(L)$ and $\bar{A}T\Pi_v$ is bounded, and $\Pi_v$ and $\mathrm{Id} - \Pi_v$ are bounded. From Lemma 2, we have for $g \in C \subseteq D(T\Pi_v)$. Note that (18) can be extended to $f \in D(L)$: fix $f \in D(L)$ and let $\{f_n\}_n \subseteq C$ be as in Condition 5. Then (18) holds for each $f_n$ and can be extended to $f$, since $\bar{A}T\Pi_v$ is bounded and by (7). Finally therefore, putting the pieces together we can conclude that for each $g \in D(L)$,

Proof of Theorem 1. We combine the approach of [2, Theorem 4] with that of [15, Theorem 2.1]. Without loss of generality we may assume $\mu(f) = 0$. Let us define for any $\varepsilon > 0$ and $g \in L^2(\mu)$, As in [2], we have the equivalence, for any $0 < \varepsilon < (m_2/2)^{1/2}$ and $g \in L^2(\mu)$, For $f \in L^2(\mu)$ let us write for convenience $f_t := P_t f$ for each $t \ge 0$. Then from the Dynkin formula we know that $f_t \in D(L)$ and $\mathrm{d}f_t/\mathrm{d}t = Lf_t$ for each $t > 0$. Then, we can use Lemma 3 to obtain We now follow the calculations in the proof of [15, Theorem 2.1]. Our approach is very similar; however, we obtain slightly better bounds (from our Lemma 3) which lead to slightly better constants in the end, hence we include a full proof. We use Young's inequality to bound the cross term This gives us Now we take and using the fact that $f_t$. Now since $\varepsilon < 1/2$ we use (19) to see that for $g \in L^2(\mu)$. So by Gronwall's lemma, for $t \ge 0$, Now we choose $r_1 = r$, $r_2 = r/\alpha_1(r_1)^2$, and then using that $m_2 \ge 1$ (Assumption 4) and (19), Here we can take for $\varepsilon$ as defined in (20), Thus we can conclude that (11) holds with $\xi$ as in (12) for some $c_1, c_2 > 0$.
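Since the rate function is defined implicitly through an infimum, a small numerical sketch may help in reading it. The Python code below assumes the simplified form $\xi(t) = c_1 \inf\{ r > 0 : c_2 t \ge \alpha_1(r)^2 \log(1/r) \}$, which is the shape such rates are understood to take in the setting of [15] once $\alpha_2$ is bounded; this form, the default constants, the cut-off `r_min`, and the function name `xi` are all assumptions made for illustration. Because $\alpha_1$ is decreasing, the right-hand side is decreasing in $r$ on $(0, 1)$, so the infimum can be located by bisection.

```python
import math


def xi(t, alpha, c1=1.0, c2=1.0, r_min=1e-100, iters=200):
    """Approximate c1 * inf{ r > 0 : c2 * t >= alpha(r)**2 * log(1/r) }.

    Assumes alpha is decreasing on (0, 1], so the admissible set is an
    interval [r*, infinity) and bisection applies. The search is restricted
    to [r_min, 1]; r = 1 is always admissible because log(1/1) = 0.
    """
    def admissible(r):
        return c2 * t >= alpha(r) ** 2 * math.log(1.0 / r)

    lo, hi = r_min, 1.0
    if admissible(lo):
        return c1 * lo
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if admissible(mid):
            hi = mid
        else:
            lo = mid
    return c1 * hi


# Constant alpha (strong Poincare inequality): the rate is exponential.
exponential_rate = xi(2.0, lambda r: 1.0)

# Polynomially diverging alpha (hypothetical exponent 0.25): slower decay.
def poly_alpha(r):
    return r ** -0.25

slow_10 = xi(10.0, poly_alpha)
slow_100 = xi(100.0, poly_alpha)
```

With a constant $\alpha_1$ the sketch returns $e^{-c_2 t}$, in line with the exponential rate obtained under a strong Poincaré inequality, whereas a divergent $\alpha_1$ yields a visibly slower, subgeometric decay.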

Application to PDMPs
In this case the operator L acts on smooth functions in C = C 2 b (E) as follows, In which case we have that the operators S and T are given for functions f ∈ C by We take Ψ = · 2 osc . Our goal in the section is the following theorem, which will follow from Corollary 1 once we have checked that the abstract conditions hold.
In particular, c 2 may be taken to be Example 3 (Central limit theorems.). From our results in Example 5, we can apply Corollary 2 to our running examples to see that a central limit theorem holds for f ∈ L 2 (µ) such that f osc < ∞, (a) for U (x) = 1 2 (d + p) log 1 + |x| 2 whenever p is large enough so that τ < 1/2, where τ is defined later in (24), (b) for U (x) = σ|x| δ for any σ > 0, 0 < δ < 1.

Checking Condition 1
In [2] it is argued that the BPS and the ZZ processes are both well-defined Markov processes satisfying Condition 1 with C = C 2 b (E) as a core (see their remarks after their Corollary 2). In order to help the reader we provide here a brief overview of existing theoretical results which have been used to establish a similar property, and can be adapted to establish Condition 1. For the BPS, it is shown in [12], in full detail, that C 1 c (E) is core for its generator on C 0 (E), the set of continuous functions vanishing at infinity. This relies on a stability property of C 1 (E), the set of continuously differentiable functions, under the semigroup [12, Lemma 17], which by using [13, Proposition 3.3] implies the core property. As remarked in [12,Remark 18], [12,Lemma 17] can be extended to cover stability of C k c (E) for k ≥ 2 and an application of [13, Proposition 3.3] leads to the desired conclusion for the generator on C 0 (E). Crucially [12,Lemma 17] requires the intensity to belong to C 1 (E), a property not satisfied by the standard BPS or ZZ when using the "canonical" choice of intensity. This is however relaxed for BPS by utilising their [12,Theorem 21] in conjunction with [12, Propositions 9 and 23]. In [3, Theorem 5.11] these ideas are used in the context of the ZZ process to establish in detail that C 1 c (E) is a core for the generator on L 2 (µ), assuming that the intensities involved belong to C 1 (E)-this extends directly to the scenario where C 1 (E) is replaced with C 2 (E). We note that [3, Proposition 5.17] defines a family of smooth intensities, C 2 (E) under Condition 1, uniformly converging to the popular canonical choice, so we may apply [12,Proposition 11 and 27] to get that C 2 c (E) is a core for the generator in C 0 (E) of ZZ with the canonical intensity. 
Since the semigroup preserves the domain of the generator in C 0 (E), when we extend the semigroup to L 2 (µ) we have that the domain of the generator is a core for the generator in L 2 (µ) by using [13,Proposition 3.3]. Then since convergence in C 0 (E) implies convergence in L 2 (µ) we have that C 2 c (E) is a core for the generator of ZZ with the canonical intensity in L 2 (µ). This immediately implies that C 2 b (E) is a core in L 2 (µ) as desired. See also [7] for a direct approach in the one-dimensional ZZ case.

Checking Condition 6: weak Poincaré inequalities
To establish weak Poincaré inequalities, our starting point is [21], as in [15]. [21, Theorem 3.1] and the subsequent remark allow us to deduce that there exist decreasing functions $\alpha_1, \alpha_2 : (0, \infty) \to [1, \infty)$ such that the weak Poincaré inequalities hold We need to show that conditions (9) and (13) hold. First by [2, Lemma 9 (b)] we have for any $f \in C$, Multiplying by $\Pi_v f$ and integrating we obtain Therefore substituting the above expression into (23) we have Thus we have (9). Note that (13) follows immediately from [2, Proposition 10], since we have the same refreshment mechanism. Indeed, we can obtain for $f \in C$, where $c$ is a universal constant independent of dimension and $r$. Note that [21, Example 1.4(a)] considers a slightly different function, $V(x) = (d + p) \log(1 + |x|)$. However, the difference between these functions is bounded, so if (23) holds for $V$, then it also holds for $U$.
We detail convergence rates these potentials lead to in Section 5.

Checking Condition 4: finding R 0
The most difficult part of the proof is checking Condition 4 which will control the remainder terms. For light-tailed targets in [2] this is done by showing that solutions of the Poisson equation have polynomially growing derivatives. This implies in their setting that they are π-integrable, however for heavy-tailed measures π this is not sufficient. By using Schauder estimates we know that the solution u f of the Poisson equation is twice differentiable, however we do not know in general that the derivatives are π-integrable. By multiplying the solution by smooth cut-off functions it is shown in [16] that under Assumption 1 the first and second derivatives are in L 2 (π). We can write down the solution as since ∇ x is a densely-defined closed operator on L 2 (π), [2, Proposition 26] shows that (Id + ∇ * x ∇ x ) −1 is a positive self-adjoint bounded operator on L 2 (π), which furthermore is a bijection between L 2 (π) and D(∇ * x ∇ x ). In our case, we will utilize the powerful abstract result of [16].
We also remark here that our subsequent argument patches a minor omission in [15], in their proof of (H3) for degenerate diffusions. In [15], the authors reference [11] and [14, Section 5.1] for elliptic a priori estimates. However, the cited references assume the existence of a strong Poincaré inequality, which precisely falls outside the scope of the processes under consideration. (b) there exist some κ 1 , κ 2 > 0 such that: Here and if |∇ x U | is bounded then we may take κ 2 = sup x∈X |∇ x U (x)|, otherwise κ 2 = 4(4κ 1 + C U d 1+ω ), where ω, c U and C U are as in Assumption 1. Proof.
Note by rescaling f we may take m 2 = 1. The fact that u f ∈ L 2 (π) follows from the fact that (Id + ∇ * x ∇ x ) −1 is a positive self-adjoint bounded operator on L 2 (π), as detailed in [2,Proposition 26]. We now make use of [16,Theorem 3.3]. Since we are dealing with a simplified version of the Poisson equation (26), the Hypotheses 2.1(i)-(iii) of [16] are trivially satisfied. Hypothesis 2.1(iv) of [16] is equivalent in our setting to Assumption 1-(b). Hence we have satisfied the hypotheses of [16,Theorem 3.3]. The bounds (27) then follow immediately from [16,Theorem 3.3]. The finiteness and upper bound of ∇ 2 x u f 2 follows from revisiting the proof of [16,Theorem 3.3], see Appendix A.2. If there exists κ 2 such that ∇ x U ∞ ≤ κ 2 then (29) follows from (27). Now consider the case where (5) holds.
In the following, for functions R d → R d , we will use the bare notation · to denote the norm · L 2 π (R d ;R d ) . Following the proof of [2, Lemma 34] we have for any φ ∈ C ∞ c (X) and ε > 0 that Now using (5) we obtain Rearranging and setting ε = 1/4 gives As C ∞ c is a core for (∇ x , D(∇ x )) we have the above inequality for any φ ∈ D(∇ x ). In particular we shall set φ(x) = ∂ i u f which gives Now summing over i ∈ {1, . . . , d} we obtain Recall here that |∇ x u f (x)| denotes the Euclidean norm of ∇ x u f (x). Finally by using Cauchy-Schwarz we obtain the desired result with κ 2 = 4(4κ 1 + C U d 1+ω ).
A proof of the following result is given in [2, Lemma 9(c)]; however it relies on a density argument involving C 3 poly (X), therefore requiring the existence of moments under π and hence stronger assumption on U . The below establishes that this assumption is not required. Lemma 5. Under Assumption 1, then for any f ∈ L 2 (π) Proof. The proof is along the same lines as that of [2, Lemma 9(c)], replacing C 3 poly (X) with H 2 (X), thanks to the results of [16]. More precisely from [16, Lemma 3.1] we have that for any g ∈ H 2 (X) we can find {g n ∈ C 2 b (X)} n∈N such that ∇ x g n → ∇ x g and ∇ 2 x g n → ∇ 2 x g in L 2 (π), therefore implying Lemma 7) and the two operators are closed. Therefore the two operators coincide on Lemma 4), which is dense in L 2 (π) and we deduce (31) by boundedness of the two inverses.
In order to show that Condition 4 follows we use [2, Lemma 12 & 13] which states the following.
(a) For any f ∈ C 2 b (E), (c) For any f ∈ C 2 b (E), with κ 1 and κ 2 positive constants as defined in Lemma 4.

Corollary 3.
Assume that L is given by (22). Assume in addition that Assumptions 1 -6 hold. Then Condition 4 holds with Proof of Corollary 3 . By Lemma 6, we have

Now by Assumption 2-(c) and Lemma 4 we have
Similarly using Assumption 6 we have

Checking Condition 5
Recall that here Ψ(·) = · 2 osc . We combine the approaches of [15] and [2]. Note then that the conditions in (8) follow immediately from the definition of Ψ and the contractivity in L ∞ (µ) of the Markov semigroups P t and e −tG ; the latter corresponds to a diffusion semigroup. Now we check (7). Fix some f ∈ D(L) with Ψ(f ) = f 2 osc < ∞. Without loss of generality, by translating f , we can assume that µ(f ) = 0. Hence we have γ 1 := ess µ inf f ≤ 0 and is a core, we can choose a sequence {g n } ∞ n=1 ⊂ C such that g n → f and Lg n → Lf in L 2 (µ). We take, as in [15], for each n ∈ N a monotone increasing function h n ∈ C ∞ (R) which satisfies 0 ≤ h n ≤ 1 and n . Now similarly we set f n := h n (g n ) ∈ C and we have f n → f in L 2 (µ). By construction f n osc ≤ γ 2 − γ 1 + 1 n so we have The third equality follows from the fact that B k is symmetric on L 2 (µ), and B k λ e k (x, v) = λ e k (x, v), as in the proof of Proposition 7 of [3]. The inequality follows from the fact that h n is 1-Lipschitz; we have that (h n (x) − h n (y)) 2 ≤ (x − y) 2 for any x, y ∈ R. Thus we have verified Condition 5.
Since C is a core for L, and L is densely defined, C is also dense in L 2 (µ). So given some f ∈ L 2 (µ) with Ψ(f ) < ∞, this argument also allows us to conclude that there exists a sequence (f n ) ⊂ D(L) satisfying (14) as required in the assumptions of Corollary 1.

Examples
In this section we apply Theorem 2 to our running examples and obtain explicit bounds on the convergence rate. We further explore the tightness of such bounds on various examples, both theoretically and empirically. Our main finding is that although our bounds are useful (e.g. we establish the existence of a central limit theorem for a large class of problems; see Example 3) and widely applicable, they are not sharp and rather pessimistic. In particular we find that the bounds we obtain for PDMPs do not compare favourably with the corresponding bounds for (reversible) Langevin diffusions for a particular heavy-tailed target density. Informally, this should not be surprising since for PDMPs, condition (9) is precisely that required of a Langevin diffusion (there is equivalence in the reversible case) to achieve a particular subgeometric rate of convergence. This condition drives all subsequent developments, where the nonreversible nature of the initial process does not seem to play a rôle anymore.
Example 5 (Example 1 and 4 continued). In Example 4 we showed the weak Poincaré Inequality holds for the two examples considered and we now show what rate we obtain by applying Theorem 2. These obtained rates will immediately allow us to check condition (15) to ensure central limit theorems, as discussed in Example 3.
(a) For the case U (x) = 1 2 (d + p) log(1 + |x| 2 ) for some p > 0, we have from Example 4 (a) that α 1 is given by (24). Hence by Theorem 2, we have the bound, for some c > 0, (b) Consider the case U (x) = σ|x| δ , for x ∈ R d with |x| ≥ M , for some δ, σ, M > 0, by Example 4 (b) we have that α 1 is given by (25). By Theorem 2 we have that (11) holds with ξ(t) = inf r > 0 : to obtain the inequality in the last line we use that log(1 + x) − log(x) = log(1 + x −1 ) ≤ x −1 for x ≥ 1. Now for t ≤ 1 the required bound is immediate so we shall assume t ≥ 1 in which case 1 + exp(−kt δ 8−7δ ) ≤ 2 ≤ 2kt δ 8−7δ so there exists C(k, δ) > 0 such that Therefore we have, for some c > 0, Let us compare the rates we obtain with those found for the reversible and nonreversible Langevin diffusion. In [21] the authors consider the reversible (overdamped) Langevin diffusion, for heavy-tailed target distributions and prove convergence to equilibrium by using the weak Poincaré Inequality and standard techniques, whereas U in scenario: in [15] they use hypocoercivity to prove convergence to equilibria for the nonreversible (underdamped) Langevin diffusion, In this case the diffusion has a unique invariant measure with density which is proportional to e −V1(x)−V2(v) . In [ [15] for non-reversible Langevin dynamics which have Gaussian velocity. This is a demonstration of the limits of hypocoercivity theory for subgeometric target distributions, since the rate ξ(t) given by Theorem 1 depends only on α 1 , α 2 and c 2 but α 1 and α 2 are given by the target measure so are the same for each algorithm.
Rearranging, we find Substituting these expressions into the definition of $\alpha_1(r)$ we have Now we will run the Zig-Zag sampler with this potential, in which case one finds that For this choice of $U$ we find $c_U = 0.3$. Then To leading order $\xi(t) \sim W\big(\tfrac{3 c_2 \pi^4}{512}\, t\big)^{-1/6}$; here $W(x)$ is the Lambert function, defined as the inverse of $x e^x$. We can also compare this with the numerical performance of the Zig-Zag sampler with canonical switching rates: below is a plot of $\mathbb{E}[f(X_t)]^2$ started with initial condition $X_0 = -5$ and with the velocity $V_0$ drawn uniformly from $\{1, -1\}$. To run the simulation we generated $N = 10^7$ Zig-Zag samplers and then calculated $N^{-1} \sum_{n=1}^{N} f(X^n_t)$ to estimate $\mu(f)$. In the figure below we have used $f(x) = \mathbb{1}\{x \ge 5\}$. As we can see in the plot, the process appears to converge much faster than the theoretical bound $\xi(t)$, which is included as a reference. Note that in some of the plots the error converges to a constant value (around $10^{-4}$); this is due to the error incurred by using a finite number of particles to estimate the expectation. We have also included on the plot a simulation of the non-reversible Langevin SDE where $(B_t)_{t \ge 0}$ is a one-dimensional Brownian motion. To simulate the nonreversible Langevin process we used the Euler-Maruyama scheme with step size 0.01. We see for this example that the Zig-Zag process converges to zero faster than the non-reversible Langevin SDE.
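The experiment described above can be sketched in a few lines of Python. The code below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the one-dimensional running example $U(x) = \tfrac{1}{2}(1 + p)\log(1 + x^2)$ as the target (which need not be the potential of the experiment), the canonical switching rate $\max(0, v\,U'(x))$, and, for the Langevin comparison, the underdamped dynamics $\mathrm{d}X_t = V_t\,\mathrm{d}t$, $\mathrm{d}V_t = -(U'(X_t) + V_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}B_t$ with unit friction and Gaussian velocity. All function names are hypothetical. For this potential the integrated switching rate can be inverted in closed form, so the Zig-Zag path is simulated exactly.

```python
import math
import random


def switch_time(y, e, p):
    """Time until the next switch, for y = v * x and an Exp(1) draw e.

    Solves integrated_rate(y, t, p) = e in closed form (assumed potential
    U(x) = 0.5 * (1 + p) * log(1 + x^2), canonical rate max(0, v * U'(x))).
    """
    c = math.exp(2.0 * e / (1.0 + p))
    if y >= 0.0:
        return math.sqrt((1.0 + y * y) * c - 1.0) - y
    return -y + math.sqrt(c - 1.0)  # zero rate until the mode at 0 is crossed


def integrated_rate(y, t, p):
    """Integral over [0, t] of max(0, U'(y + s)) ds for the assumed U."""
    a, b = max(y, 0.0), max(y + t, 0.0)
    return 0.5 * (1.0 + p) * (math.log1p(b * b) - math.log1p(a * a))


def zigzag_at(T, x0, v0, p, rng):
    """Position at time T of a 1D Zig-Zag process started from (x0, v0)."""
    x, v, t = x0, v0, 0.0
    while True:
        tau = switch_time(v * x, rng.expovariate(1.0), p)
        if t + tau >= T:
            return x + v * (T - t)
        x, v, t = x + v * tau, -v, t + tau


def langevin_em_at(T, x0, v0, p, rng, h=0.01):
    """Euler-Maruyama for the assumed underdamped Langevin SDE, step size h."""
    x, v = x0, v0
    for _ in range(int(round(T / h))):
        grad = (1.0 + p) * x / (1.0 + x * x)
        noise = math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
        x, v = x + h * v, v - h * (grad + v) + noise
    return x


def estimate_tail(T, n, p, seed=0):
    """Monte Carlo estimate of E[1{X_T >= 5}] with X_0 = -5, V_0 uniform."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        v0 = rng.choice((-1.0, 1.0))
        hits += zigzag_at(T, -5.0, v0, p, rng) >= 5.0
    return hits / n
```

Calling `estimate_tail(T, n, p)` over a grid of times `T` gives the kind of curve described in the text; the number of particles `n` would need to be large for the Monte Carlo error to fall below the quantity being estimated.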
Letting n → ∞ and applying the dominated convergence theorem leads to the desired result; the required integrability follows from applying the Cauchy-Schwarz inequality several times, Assumption 1-(c) and that f (x)∂U (x) ∈ L 2 (π). Note if ∂U is bounded then it is immediate to see f (x)∂U (x) ∈ L 2 (π) on the other hand if ∂U is not bounded then Assumption 1-(c)ii holds and we use (30) with ϕ = f to show f (x)∂U (x) ∈ L 2 (π), indeed this gives When d > 1, simply notice that for f ∈ H 1 (X) and G = (g 1 , . . . , g d ) ∈ H 1 (X, R d ), ∇ x f, G π = d i=1 ∂ i f, g i π and apply the result above. The second statement is immediate.
Proof. From Lemma 7, for $F \in H^1(X, \mathbb{R}^d)$ and $g \in H^1(\pi)$ we have $\langle \nabla_x g, F\rangle_\pi = \langle g, -\operatorname{div}_x F + \nabla_x U^\top F\rangle_\pi$. We need to check that this is applicable in the case $x \mapsto F_v(x) := v \cdot f(x, v)$ for fixed $v \in V$, provided $F_v \in H^1(E, \mathbb{R}^d)$, which is clearly true here. This is true for $v = 0$. For $v \ne 0$, noting that $\nabla_x (v \cdot f) = v (\nabla_x f)^\top$ we deduce that $F_v \in H^1(X, \mathbb{R}^d)$. The rest then follows from a calculation identical to that in the proof of [2, Proposition 7].