Cores for Piecewise-Deterministic Markov Processes used in Markov Chain Monte Carlo

We show fundamental properties of the Markov semigroup of recently proposed MCMC algorithms based on Piecewise-deterministic Markov processes (PDMPs) such as the Bouncy Particle Sampler, the Zig-Zag process or the Randomized Hamiltonian Monte Carlo method. Under assumptions typically satisfied in MCMC settings, we prove that PDMPs are Feller and that their generator admits the space of infinitely differentiable functions with compact support as a core. As we illustrate via martingale problems and a simplified proof of the invariance of target distributions, these results provide a fundamental tool for the rigorous analysis of these algorithms and corresponding stochastic processes.

Functions $f : \mathbb{R}^d \to \mathbb{R}$ in the domain $\mathrm{dom}(L)$ of the generator $L$ often lack properties such as differentiability, boundedness or compact support. This makes the mathematical analysis hard and technical. Even giving a rigorous proof of the invariance of the target distribution under the Markov process - a fundamental property for the applicability of these algorithms - can be laborious [7, 14]. The goal of this work is to remedy this: we give sufficient conditions for a PDMP to be a Feller process and for its generator $L$ to admit the space $C_c^\infty(\mathbb{R}^d)$ of compactly supported, infinitely differentiable functions as a core. This simplifies the analysis significantly.
For example, it follows that infinitesimal invariance of a probability measure on $C_c^\infty(\mathbb{R}^d)$ immediately implies invariance, and that a PDMP can be characterized equivalently via martingale problems on $C_c^\infty(\mathbb{R}^d)$. With this work, we therefore hope to provide a useful tool for other researchers to make further progress in the rigorous analysis of these algorithms and the corresponding stochastic processes.
This work is structured as follows. In section 2, we define PDMPs on $\mathbb{R}^d$, the basic object of this work. In section 3, we prove a new inequality for the "Jacobian" of a PDMP, which provides the basis for the following sections. In section 4, we prove that PDMPs are Feller under assumptions typically satisfied in MCMC settings. In section 5, we combine these results into the main theorem of this work, identifying $C_c^\infty(\mathbb{R}^d)$ as a core of the generator under reasonable assumptions. In section 6, we verify the assumptions for popular MCMC schemes. Finally, we show in section 7 how our results can be applied in the analysis of MCMC algorithms.

Piecewise-deterministic Markov processes
We start by giving a definition of a Piecewise-deterministic Markov process (PDMP) on $\mathbb{R}^d$ [9]. The deterministic dynamics of a PDMP are given by the ordinary differential equation (ODE)
$$\frac{d}{dt} x(t) = g(x(t)), \qquad x(0) = z \in \mathbb{R}^d, \tag{2.1}$$
where $g \in C^1(\mathbb{R}^d, \mathbb{R}^d)$ is a vector field. We write $\varphi_t(z)$ for the flow of eq. (2.1), i.e. the solution at time $t$ started at $z$. We impose the following assumption on $g$:

Assumption 2.1 (Lipschitz continuity). The function $g \in C^1(\mathbb{R}^d, \mathbb{R}^d)$ is Lipschitz continuous, i.e. there exists an $L > 0$ such that
$$\sup_{z \in \mathbb{R}^d} \|Dg(z)\| \leq L,$$
where $\|\cdot\|$ denotes the operator norm with respect to the Euclidean norm.
In a PDMP, the deterministic dynamics are interrupted by random jumps at random times. To describe the random jumps, we are given a probability space $(S, \mathcal{S}, \Xi)$ and a function $R : S \times \mathbb{R}^d \to \mathbb{R}^d$. For a fixed $\xi \in S$, $R_\xi := R(\xi, \cdot) : \mathbb{R}^d \to \mathbb{R}^d$ describes a "jump map", and $\xi \sim \Xi$ describes the random sample of such a jump map.
The random jump times are given by an inhomogeneous Poisson process with a continuous intensity function $\lambda : \mathbb{R}^d \to \mathbb{R}_{\geq 0}$. More specifically, given that the process is at $z \in \mathbb{R}^d$ at time $t_0 = 0$, the distribution $\mu_z$ of the time until the next jump is determined by
$$\mu_z((t, \infty)) = \exp\left( -\int_0^t \lambda(\varphi_s(z)) \, ds \right).$$
Finally, define the position between jumps via the flow: $Z_{t_0 + s} = \varphi_s(Z_{t_0})$ for $s$ smaller than the time until the next jump, and at a jump time $T$ set $Z_T = R_\xi(Z_{T-})$ with $\xi \sim \Xi$ drawn independently.
A process $Z_t$ constructed in this way is called a Piecewise-deterministic Markov process (PDMP).
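As a concrete illustration (ours, not part of the original construction), the following minimal Python sketch simulates a PDMP path by Euler integration of eq. (2.1) and Poisson thinning of the jump times. It assumes the caller supplies a constant `lam_max` that upper-bounds the intensity $\lambda$ along the simulated trajectory; all function names are illustrative.

```python
import numpy as np

def simulate_pdmp(z0, g, lam, jump, lam_max, T, dt=1e-3, rng=None):
    """Simulate a PDMP path on [0, T]: deterministic flow dz/dt = g(z)
    (Euler steps) interrupted by jumps z -> R_xi(z) at the event times of
    an inhomogeneous Poisson process with intensity lam(z), realized by
    thinning a homogeneous Poisson(lam_max) clock."""
    rng = rng or np.random.default_rng()
    z, t = np.asarray(z0, dtype=float).copy(), 0.0
    path = [(t, z.copy())]
    while t < T:
        tau = rng.exponential(1.0 / lam_max)     # candidate event time
        n = max(1, int(tau / dt))                # follow the flow until then
        for _ in range(n):
            z += (tau / n) * g(z)
        t += tau
        if rng.uniform() < lam(z) / lam_max:     # accept as a true jump
            z = jump(z, rng)                     # z -> R_xi(z), xi ~ Xi
        path.append((t, z.copy()))
    return path
```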

Grönwall-Jacobi inequality
In this section, we prove a Grönwall-like inequality for PDMPs, which will provide the basis for the later sections. For all $t \in \mathbb{R}$, we write $D\varphi_t$ to denote the Jacobian of the flow map $\varphi_t : \mathbb{R}^d \to \mathbb{R}^d$.

Lemma 3.1 (Grönwall-Jacobi). Suppose that Assumption 2.1 holds. Then for all $t \geq 0$ and $z \in \mathbb{R}^d$:
$$\|D\varphi_t(z)\| \leq \exp(Lt).$$

Consider the example on $\mathbb{R}$ with $g(x) = x$ and $L = 1$. Then $\varphi_t(z) = z \exp(t)$ and $|D\varphi_t(z)| = |\exp(t)|$, so one can see that the bound is actually sharp in this case.

Next, we assume that the jump maps are subcontractive:

Assumption 3.2 (Subcontractive jumps). For all $\xi \in S$, the jump map $R_\xi$ is differentiable and for all $z \in \mathbb{R}^d$:
$$\|DR_\xi(z)\| \leq 1.$$

Assumption 3.2 states that the jump maps do not enlarge distances between points. Intuitively, this allows us to control the position of a PDMP after a jump. Using this assumption, Lemma 3.1 can be further extended to a process allowing for jumps:

Proposition 3.3 (Grönwall-Jacobi for PDMPs). Suppose that Assumptions 2.1 and 3.2 hold. Then for all $t \geq 0$, $k \in \mathbb{N}$, $t_0, t_1, \ldots, t_k \geq 0$ such that $t_0 + \cdots + t_k = t$, $\xi_1, \ldots, \xi_k \in S$ and $z \in \mathbb{R}^d$:
$$\|D(\varphi_{t_k} \circ R_{\xi_k} \circ \varphi_{t_{k-1}} \circ \cdots \circ R_{\xi_1} \circ \varphi_{t_0})(z)\| \leq \exp(Lt).$$

Proof. For $k = 0$ this is Lemma 3.1. Assume that the statement is true for $k - 1$ and write $\Phi := \varphi_{t_{k-1}} \circ R_{\xi_{k-1}} \circ \cdots \circ R_{\xi_1} \circ \varphi_{t_0}$. Then, by the chain rule and submultiplicativity of the operator norm,
$$\|D(\varphi_{t_k} \circ R_{\xi_k} \circ \Phi)(z)\| \leq \|D\varphi_{t_k}(R_{\xi_k}(\Phi(z)))\| \, \|DR_{\xi_k}(\Phi(z))\| \, \|D\Phi(z)\| \leq \exp(Lt_k) \cdot 1 \cdot \exp(L(t - t_k)) = \exp(Lt),$$
where in the last inequality we used Lemma 3.1, the induction hypothesis and Assumption 3.2.

PDMPs and the Feller property
Let $C_b(\mathbb{R}^d)$ be the space of bounded continuous functions, let $C_0(\mathbb{R}^d) \subset C_b(\mathbb{R}^d)$ be the subspace of all continuous functions that vanish at infinity and let $C_c^k(\mathbb{R}^d)$ be the subspace of $k$-times continuously differentiable functions with compact support. The PDMP $Z_t$ induces a semigroup of operators $P_t$ via $P_t f(z) := \mathbb{E}_z[f(Z_t)]$, where $\mathbb{E}_z$ denotes expectation for the process started at $Z_0 = z$.

The main goal of this section is to show fundamental properties of $P_t$, in particular that it is Feller. Define the space on which $P_t$ is strongly continuous,
$$B_P(\mathbb{R}^d) := \{ f \text{ bounded and measurable} : \|P_t f - f\|_\infty \to 0 \text{ as } t \downarrow 0 \}.$$
Furthermore, let $A$ denote the extended generator of the PDMP, i.e. the operator such that
$$f(Z_t) - f(Z_0) - \int_0^t Af(Z_s) \, ds \tag{4.1}$$
is a local martingale [9]. For the scope of this work, it is sufficient that every differentiable and bounded $f$ is in the domain of $A$ and that it holds that
$$Af(z) = \langle g(z), \nabla f(z) \rangle + \lambda(z) \int_S \big( f(R_\xi(z)) - f(z) \big) \, \Xi(d\xi) \tag{4.2}$$
(see [9, theorem 26.14]).
Assumption 4.1. One of the following two conditions is true:
1. $\lambda$ is bounded and the jump maps diverge together with the state, i.e. $\|R_\xi(z)\| \to \infty$ as $\|z\| \to \infty$ for every $\xi \in S$.
2. The jump maps preserve the norm, i.e. $\|R_\xi(z)\| = \|z\|$ for all $\xi \in S$ and $z \in \mathbb{R}^d$.

Proposition 4.2 (Feller). Let Assumptions 2.1 and 4.1 be true. Then $P_t$ is Feller, i.e. the semigroup $P_t$ satisfies the following two conditions: (i) $P_t$ is strongly continuous on $C_0(\mathbb{R}^d)$, and (ii) $P_t C_0(\mathbb{R}^d) \subset C_0(\mathbb{R}^d)$ for all $t \geq 0$.

Proof. We begin by proving strong continuity. Let $f \in C_c^1(\mathbb{R}^d)$. Then $Af$ as in eq. (4.2) is continuous, and under condition (2) it has compact support. If $\lambda_\infty := \|\lambda\|_\infty < \infty$, both summands in eq. (4.2) are uniformly bounded.

In both cases, the local martingale in eq. (4.1) is a true martingale due to the uniform bound. By taking expectations in eq. (4.1) and using Fubini, one has for all $z \in \mathbb{R}^d$:
$$|P_t f(z) - f(z)| \leq \int_0^t \|Af\|_\infty \, ds = t \|Af\|_\infty.$$
Since $C_c^1(\mathbb{R}^d)$ is dense in $C_0(\mathbb{R}^d)$ [10] and the operators $P_t$ are contractions, it follows that $P_t$ is also strongly continuous on all of $C_0(\mathbb{R}^d)$.

Feller property. Assume condition (1) of Assumption 4.1 is true. Let $f \in C_0(\mathbb{R}^d)$ and $\epsilon > 0$. Let $N_t$ be the number of jumps up to time $t$. Since $\lambda$ is bounded, we find a $k \in \mathbb{N}$ such that the probability of the event $\{N_t > k\}$ is less than $\epsilon / \|f\|_\infty$. But then
$$|P_t f(z)| \leq \mathbb{E}_z\big[ |f(Z_t)| \mathbb{1}_{\{N_t \leq k\}} \big] + \epsilon.$$
On the event $\{N_t \leq k\}$, we find that the norm of $Z_t$ goes to infinity as $\|z\| \to \infty$ (the jumping times and $N_t$ might change, but this does not matter since the dynamics go to infinity uniformly over $0 \leq s \leq t$ and $N_t$ is bounded). Therefore, $f(Z_t) \to 0$ as $\|z\| \to \infty$ on $\{N_t \leq k\}$. By dominated convergence, $\limsup_{\|z\| \to \infty} |P_t f(z)| \leq \epsilon$. Since $\epsilon > 0$ was arbitrary, we can conclude that $P_t f \in C_0(\mathbb{R}^d)$.

If condition (2) of Assumption 4.1 is true, it is clear that $\|Z_t\| \to \infty$ as $\|z\| \to \infty$ and therefore $f(Z_t) \to 0$ almost surely. By dominated convergence, we find that $P_t f(z) \to 0$ as $\|z\| \to \infty$ and hence $P_t f \in C_0(\mathbb{R}^d)$.
Consider now $P_t$ as a semigroup on $C_0(\mathbb{R}^d)$ and let $L$ be its (strong) generator on $C_0(\mathbb{R}^d)$. As the proof of the next theorem shows, the extended generator $A$ and the strong generator $L$ coincide on $C_c^1(\mathbb{R}^d)$ under suitable assumptions. These assumptions often bound the non-local part of $A$ (the second summand in eq. (4.2)) such that jumps can be neglected for $t \to 0$.

Theorem 4.3. Suppose that Assumptions 2.1 and 4.1 hold. Then $C_c^1(\mathbb{R}^d) \subset \mathrm{dom}(L)$ and $Lf = Af$ for all $f \in C_c^1(\mathbb{R}^d)$.

Proof. Let $f \in C_c^1(\mathbb{R}^d)$. By eq. (4.1) and Fubini, $P_t f - f = \int_0^t P_s Af \, ds$. If $Af \in B_P(\mathbb{R}^d)$, then $\frac{1}{t} \int_0^t P_s Af \, ds \to Af$ uniformly as $t \to 0$, which implies that $Lf = Af$ and $f \in \mathrm{dom}(L)$. So it remains to show that $Af \in B_P(\mathbb{R}^d)$.

If jumps are isometric (condition (2) of Assumption 4.1), then it holds that $Af$ has the same support as $f$, in particular compact support, and therefore lies in $C_0(\mathbb{R}^d)$. If condition (1) of Assumption 4.1 holds, we can show directly by dominated convergence that $Af(z) \to 0$ as $\|z\| \to \infty$. In both cases $Af \in C_0(\mathbb{R}^d) \subset B_P(\mathbb{R}^d)$ as shown above. This finishes the proof.

Cores for PDMPs
Knowing that $P_t$ is Feller is advantageous since $C_0(\mathbb{R}^d)$ consists of "nice" functions and, due to strong continuity, we can restrict our attention to regular dense subsets such as $C_c^1(\mathbb{R}^d)$. However, Markov processes are often studied via their strong generator $L$. For example, in many cases we have no analytical expression for $P_t$, while the generator of a PDMP is explicitly given (see eq. (4.2); note that $A = L$ on $\mathrm{dom}(L)$).
Naturally, the question arises whether such "sufficient, regular subsets" exist for $\mathrm{dom}(L)$ as they do for $P_t$. In contrast to the operators $P_t$, $L$ is not continuous in general, and therefore a mere dense subset is not "sufficient". That is why one searches for cores, a fundamental concept in the study of semigroups [15, 22]. A core of $L$ is a subspace $D \subset \mathrm{dom}(L)$ such that for all $f \in \mathrm{dom}(L)$ there exists a sequence $f_n \in D$ such that
$$\|f_n - f\|_\infty \to 0 \quad \text{and} \quad \|Lf_n - Lf\|_\infty \to 0 \quad \text{as } n \to \infty. \tag{5.1}$$
A dense subspace of $\mathrm{dom}(L)$ that is invariant under $P_t$ is automatically a core [15, 22]. This motivates the following lemma.

Lemma 5.1. Suppose that Assumptions 2.1, 3.2 and 4.1 hold and define
$$D := \{ f \in \mathrm{dom}(L) \cap C^1(\mathbb{R}^d) : \partial_j f \in C_0(\mathbb{R}^d) \text{ for } j = 1, \ldots, d \}.$$
Then $P_t D \subset D$ for all $t \geq 0$.

Proof. Let $f \in D$. Since $P_t$ is a contraction semigroup and $f \in \mathrm{dom}(L)$, one has $P_t f \in \mathrm{dom}(L)$ and $L P_t f = P_t L f \in C_0(\mathbb{R}^d)$. To show differentiability, rewrite
$$P_t f(z) = \mathbb{E}[f(Z_t^z)],$$
where $Z_t^z$ denotes the process started at $z$, realized with fixed jump times and jump maps. The function within the expectation is differentiable in $z$. By dominated convergence, it therefore suffices to show that its gradient is uniformly bounded: on the event of $k$ jumps with holding times $t_0, \ldots, t_k$ and marks $\xi_1, \ldots, \xi_k$,
$$\|\nabla_z f(Z_t^z)\| \leq \|\nabla f\|_\infty \, \|D(\varphi_{t_k} \circ R_{\xi_k} \circ \cdots \circ R_{\xi_1} \circ \varphi_{t_0})(z)\| \leq \|\nabla f\|_\infty \exp(Lt)$$
by the Grönwall-Jacobi bound of Proposition 3.3. One can conclude that $\partial_j P_t f(z) = \mathbb{E}[\partial_{z_j} f(Z_t^z)]$, and by dominated convergence that $\partial_j P_t f$ is continuous and bounded by $\|\nabla f\|_\infty \exp(Lt)$. In addition,
$$|\partial_j P_t f(z)| \leq \mathbb{E}_z\big[ \|\nabla f(Z_t)\| \big] \exp(Lt). \tag{5.3}$$
Since $\|\nabla f\| \in C_0(\mathbb{R}^d)$, by the Feller property $P_t(\|\nabla f\|)(z) = \mathbb{E}_z[\|\nabla f(Z_t)\|] \in C_0(\mathbb{R}^d)$. The bound in eq. (5.3) then implies that $\partial_j P_t f$ vanishes at infinity as well. Combined with the continuity, we can conclude that $\partial_j P_t f \in C_0(\mathbb{R}^d)$ and hence $P_t f \in D$.
Corollary 5.2. Suppose that Assumptions 2.1, 3.2 and 4.1 hold. Then the subspace $D$ is a core of the generator $L$ of the semigroup $P_t$, considered as a semigroup on $C_0(\mathbb{R}^d)$.
Proof. Since $C_c^1(\mathbb{R}^d) \subset D$, the subspace $D$ is dense in $C_0(\mathbb{R}^d)$, and by Lemma 5.1 it is invariant under $P_t$. Hence $D$ is a core [15, 22].

We can further reduce $D$ to a more regular subset:

Lemma 5.3. Suppose that Assumptions 2.1, 3.2 and 4.1 hold. Then the set $C_c^1(\mathbb{R}^d)$ is a core of $L$.
Proof. Since $C_c^1(\mathbb{R}^d) \subset D$ and $D$ is a core, it suffices to show that every $f \in D$ can be approximated in the sense of eq. (5.1) by a sequence in $C_c^1(\mathbb{R}^d)$. Choose radial cutoff functions $\eta_k \in C_c^\infty(\mathbb{R}^d)$ with $0 \leq \eta_k \leq 1$, $\eta_k = 1$ on $B(0, k)$, $\operatorname{supp} \eta_k \subset B(0, kc)$ for a fixed $c > 1$, and
$$\|\nabla \eta_k\|_\infty \leq \frac{C}{k} \quad \text{for a constant } C > 0, \tag{5.4}$$
and set $f_k := f \eta_k \in C_c^1(\mathbb{R}^d)$. Then $\|f_k - f\|_\infty \to 0$ since $f \in C_0(\mathbb{R}^d)$, and
$$\|Lf_k - Lf\|_\infty \leq \|f \langle g, \nabla \eta_k \rangle\|_\infty + \Big\| \lambda \int_S f(R_\xi(\cdot)) \big( \eta_k(R_\xi(\cdot)) - \eta_k(\cdot) \big) \, \Xi(d\xi) \Big\|_\infty + \|(1 - \eta_k) Lf\|_\infty. \tag{5.6}$$
The last term converges to zero since $Lf \in C_0(\mathbb{R}^d)$. Define $B_k := B(0, kc) \setminus B(0, k)$ and compute using Assumption 2.1 that $\|g(z)\| \leq \|g(0)\| + Lkc$ for $z \in B_k$; since $\nabla \eta_k$ is supported on $B_k$, the first term is bounded by $\sup_{z \in B_k} |f(z)| \, (\|g(0)\| + Lkc) \, C/k$, which goes to zero as $k \to \infty$ by eq. (5.4) and $f \in C_0(\mathbb{R}^d)$.

Next, we show that the second term in eq. (5.6) goes to zero if one of the conditions in Assumption 4.1 holds. Firstly, assume condition (2). Then, since $\|R_\xi(z)\| = \|z\|$ and the $\eta_k$ are radial, also $\eta_k(R_\xi(z)) = \eta_k(z)$ and the second term vanishes identically, so $\|Lf_k - Lf\|_\infty \to 0$ as $k \to \infty$ since $Lf \in C_0(\mathbb{R}^d)$.

Secondly, assume condition (1). Then one can show by dominated convergence that the second term in eq. (5.6) goes to zero as well; this uses that $f$ and $\nabla f$ vanish at infinity and eq. (5.4). This finishes the proof.
Theorem 5.4. Suppose that Assumptions 2.1, 3.2 and 4.1 hold. Then $C_c^\infty(\mathbb{R}^d)$ is a core of the generator $L$ of the semigroup $P_t$, considered as a semigroup on $C_0(\mathbb{R}^d)$.
Proof. Let $f \in C_c^1(\mathbb{R}^d)$ be arbitrary. Choose $f_n \in C_c^\infty(\mathbb{R}^d)$ such that $\|f_n - f\|_\infty$ and $\|\nabla f_n - \nabla f\|_\infty$ go to zero as $n \to \infty$ and such that $\operatorname{supp} f, \operatorname{supp} f_n \subset B(0, R)$ for an $R > 0$ (for example, choose $f_n := f * \eta_n$ for a mollifier $\eta_n$). With a similar computation as above, one sees that $Lf_n \to Lf$ uniformly:
$$\|Lf_n - Lf\|_\infty \leq \Big\| \lambda \int_S (f_n - f)(R_\xi(\cdot)) \, \Xi(d\xi) \Big\|_\infty + \|\langle g, \nabla(f_n - f) \rangle\|_\infty + \|\lambda (f_n - f)\|_\infty.$$
Clearly, the second and the third term converge to zero: $g$ and $\lambda$ are bounded on $B(0, R)$, and $\nabla(f_n - f)$ and $f_n - f$ are supported there. What about the first term? If the jumps are isometric (condition (2) of Assumption 4.1), then $(f_n - f)(R_\xi(z)) \neq 0$ only if $\|z\| = \|R_\xi(z)\| \leq R$, so the first term is bounded by $\sup_{\|z\| \leq R} \lambda(z) \, \|f_n - f\|_\infty \to 0$. If $\lambda$ is bounded (condition 1 of Assumption 4.1), one similarly gets that the first term is bounded by $\lambda_\infty \|f_n - f\|_\infty \to 0$. This finishes the proof.
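For concreteness, a standard choice of mollifier (the classical construction, recalled here for the reader's convenience and not specific to this paper) is
$$\eta(x) := c \exp\Big( \frac{-1}{1 - \|x\|^2} \Big) \mathbb{1}_{\{\|x\| < 1\}}, \qquad \eta_n(x) := n^d \eta(nx),$$
with $c$ chosen so that $\int \eta \, dx = 1$. Then $f * \eta_n \in C_c^\infty(\mathbb{R}^d)$ with $\operatorname{supp}(f * \eta_n) \subset \operatorname{supp} f + B(0, 1/n)$, and for $f \in C_c^1(\mathbb{R}^d)$ both $f * \eta_n \to f$ and $\nabla(f * \eta_n) = (\nabla f) * \eta_n \to \nabla f$ uniformly, as required above.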

MCMC algorithms fulfilling the assumptions
In this section, we give examples of PDMPs fulfilling the assumptions of this work, focusing on recently proposed MCMC schemes. All algorithms aim to provide samples from a probability distribution $\pi$ on $\mathbb{R}^d$ with
$$\pi(dq) = c_0^{-1} \exp(-U(q)) \, dq, \tag{6.1}$$
where $U : \mathbb{R}^d \to \mathbb{R}$ is interpreted as a potential and $c_0$ is the normalizing constant. Instead of sampling directly from $\pi$, all these MCMC schemes simulate Markov processes on the enlarged state space $\mathbb{R}^d \times \mathbb{R}^d$ with invariant distribution $\pi \otimes \mu$, where $\mu = N(0, I_d)$ (alternatively $\mu = \mathrm{Unif}(S^{d-1})$). For an element $z = (q, p) \in \mathbb{R}^{2d}$, we interpret $q \in \mathbb{R}^d$ as the position and $p$ as the corresponding velocity.
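For concreteness in the illustrative sketches below (ours, not the paper's), we fix the standard Gaussian example of eq. (6.1):

```python
import numpy as np

# Example target: U(q) = ||q||^2 / 2, so pi = N(0, I_d) and the force
# grad U(q) = q is globally Lipschitz with constant 1 (Assumption 2.1).
def U(q):
    return 0.5 * float(np.dot(q, q))

def grad_U(q):
    return np.asarray(q, dtype=float)
```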

Randomized Hamiltonian Monte Carlo
Firstly, we discuss the Randomized Hamiltonian Monte Carlo method (RHMC) [7]. While for the popular Hamiltonian Monte Carlo method the times between jumps are constant, here these times are $\mathrm{Exp}(\lambda_{\mathrm{ref}})$-distributed. Considered as a PDMP, this means that we have a constant intensity function $\lambda(z) = \lambda_{\mathrm{ref}}$. The vector field $g$ is given by the laws of physics, i.e. $g(q, p) = (p, -\nabla U(q))$, where $U : \mathbb{R}^d \to \mathbb{R}$ is interpreted as a potential and $\nabla U$ as a force. We assume throughout that $U \geq 0$. The physical law of conservation of energy then implies the invariance of the Boltzmann-Gibbs distribution $\pi \otimes N(0, I_d)$ under these dynamics [6]. Refreshments only act on the velocities: $R_\xi(q, p) = (q, \alpha p + \sqrt{1 - \alpha^2}\, \xi)$, where $\xi \sim N(0, I_d)$ and $0 \leq \alpha \leq 1$.

Theorem 6.1. Suppose that $\nabla U$ is Lipschitz continuous, that the Hamiltonian $H(q, p) := U(q) + \|p\|^2 / 2$ has compact level sets $H^{-1}([0, c])$ for every $c > 0$, and that $\alpha > 0$. Then it holds that the RHMC process is Feller and $C_c^\infty(\mathbb{R}^{2d})$ is a core of its generator.
Proof. If $\nabla U$ is Lipschitz, then the vector field $g$ is Lipschitz, so Assumption 2.1 is true. Since $\|DR_\xi(q, p)\| \leq 1$ for $0 \leq \alpha \leq 1$, Assumption 3.2 is satisfied. Finally, we verify Assumption 4.1: the intensity $\lambda = \lambda_{\mathrm{ref}}$ is constant and hence bounded, so it remains to show that $\|R_\xi(z)\| \to \infty$ as $\|z\| \to \infty$. The flow preserves the energy given by the Hamiltonian $H$ [6], i.e. for $z = (q, p) \in \mathbb{R}^{2d}$:
$$H(\varphi_t(z)) = H(z).$$
By assumption, $H^{-1}([0, c])$ is compact for every $c > 0$; by the Heine-Borel theorem, it is in particular bounded, so $H(z) \to \infty$ as $\|z\| \to \infty$. Since $\alpha > 0$, for fixed $\xi$,
$$H(R_\xi(q, p)) = U(q) + \tfrac{1}{2} \|\alpha p + \sqrt{1 - \alpha^2}\, \xi\|^2 \geq U(q) + \tfrac{1}{2} \big( \alpha \|p\| - \sqrt{1 - \alpha^2}\, \|\xi\| \big)_+^2 \to \infty$$
as $H(q, p) \to \infty$, and therefore $\|R_\xi(z)\| \to \infty$ as $\|z\| \to \infty$.

In [11], it was proven that RHMC is Feller under similar assumptions. But instead of using resolvents and semigroup theory, we can give a direct proof using the results of this work.
The assumption that $\alpha > 0$ is crucial for the proof. Intuitively, if $\alpha = 0$, the process "forgets" its velocity after each jump. So even if our process $Z_t$ starts at $z = (q, p)$ with $\|p\| \to \infty$ (and thereby $\|z\| \to \infty$), we cannot infer that $Z_t$ goes to infinity as well.
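As an illustration, here is a minimal RHMC sketch for the Gaussian example above. It is ours, not the paper's: since the exact Hamiltonian flow is generally unavailable, it uses a leapfrog discretization, which introduces a small bias not present in the idealized PDMP.

```python
import numpy as np

def rhmc(q0, grad_U, lam_ref=1.0, alpha=0.7, T=100.0, dt=0.01, rng=None):
    """Randomized HMC sketch: Hamiltonian dynamics between Exp(lam_ref)
    jump times, autoregressive refreshment p -> alpha p + sqrt(1-alpha^2) xi."""
    rng = rng or np.random.default_rng()
    d = len(q0)
    q, p = np.asarray(q0, dtype=float).copy(), rng.standard_normal(d)
    t, samples = 0.0, []
    while t < T:
        tau = rng.exponential(1.0 / lam_ref)      # time to next refreshment
        for _ in range(int(tau / dt)):            # leapfrog for dq=p, dp=-grad U
            p -= 0.5 * dt * grad_U(q)
            q += dt * p
            p -= 0.5 * dt * grad_U(q)
        t += tau
        # alpha > 0 keeps part of the old velocity (cf. the remark above)
        p = alpha * p + np.sqrt(1.0 - alpha**2) * rng.standard_normal(d)
        samples.append(q.copy())
    return np.array(samples)

samples = rhmc(np.zeros(2), grad_U=lambda q: q)   # standard Gaussian target
```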

Isometric refreshment process -Zig-Zag, Pure Reflection Process
Now, we consider PDMPs where velocities are constant between jumps, i.e. the flow is
$$\varphi_t(q, p) = (q + tp, p). \tag{6.3}$$
In particular, the vector field is $g(q, p) = (p, 0)$, which is clearly Lipschitz, so Assumption 2.1 holds trivially. Let $\lambda$ be an arbitrary continuous intensity function. Consider refreshments of the form $(q, p) \mapsto (q, O_{\xi, q}(p))$, where $O_{\xi, q} \in O(d)$ is a random orthogonal matrix. We will call this an Isometric Refreshment process. Examples include the Zig-Zag process, where $R_\xi(q, p) = (q, p_1, \ldots, p_{i(\xi)-1}, -p_{i(\xi)}, p_{i(\xi)+1}, \ldots, p_d)$, where $i(\xi)$ is a random index [5], the Pure Reflection process, where $R : (q, p) \mapsto (q, -p)$ [16], and, as we will see, also a version of the Bouncy Particle Sampler [8].

Theorem 6.2. Every isometric refreshment process is Feller and $C_c^\infty(\mathbb{R}^{2d})$ is a core of its generator.

Proof. Since the $O_{\xi, q}$ are orthogonal, the jump maps preserve the norm, so Assumption 3.2 and condition (2) of Assumption 4.1 hold; this is the only non-trivial thing to show. Explicitly, one can see directly that the process escapes to infinity: fix $t > 0$. If the process starts at $z = (q_0, p_0)$, we know that
$$Z_t = \Big( q_0 + \sum_{l=0}^{k} t_l p_l, \; p_k \Big)$$
for some holding times $t_0 + \cdots + t_k = t$ and velocities $p_1, \ldots, p_k$ with $\|p_l\| = \|p_0\|$ for all $l$.

In particular, $\|Z_t\| \geq \|p_k\| = \|p_0\|$ and $\|Z_t\| \geq \|q_0\| - \|\sum_{l=0}^{k} t_l p_l\| \geq \|q_0\| - t \|p_0\|$. Therefore,
$$\|Z_t\| \geq \max\big( \|p_0\|, \|q_0\| - t \|p_0\| \big).$$
As $\|z\| = \|(q_0, p_0)\| \to \infty$, the right-hand side goes to infinity. Hence, condition (2) of Assumption 4.1 is fulfilled and we can apply Proposition 4.2 and Theorem 5.4.
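For the Gaussian example, the Zig-Zag switching rates $\lambda_i(q, p) = (p_i \partial_i U(q))_+ = (p_i q_i)_+$ are linear along the free-transport flow, so event times can be sampled by exact inversion. The following sketch (ours, and specific to this example) exploits that.

```python
import numpy as np

def zigzag_gaussian(q0, T=100.0, rng=None):
    """Zig-Zag sketch for U(q) = ||q||^2/2. Along q(s) = q + s p with
    p_i in {-1, +1}, the i-th switching rate is (p_i q_i + s)_+; solving
    int_0^tau (a + s)_+ ds = E with E ~ Exp(1) gives the event time."""
    rng = rng or np.random.default_rng()
    d = len(q0)
    q = np.asarray(q0, dtype=float).copy()
    p = rng.choice([-1.0, 1.0], size=d)
    t, skeleton = 0.0, [(0.0, q.copy(), p.copy())]
    while t < T:
        a = p * q                              # rate_i(s) = (a_i + s)_+
        E = rng.exponential(size=d)
        tau = np.where(a >= 0.0,
                       -a + np.sqrt(a * a + 2.0 * E),   # rate positive from s = 0
                       -a + np.sqrt(2.0 * E))           # rate zero until s = -a_i
        i = int(np.argmin(tau))                # first coordinate to switch
        q += tau[i] * p
        t += tau[i]
        p[i] = -p[i]                           # flip the i-th velocity
        skeleton.append((t, q.copy(), p.copy()))
    return skeleton
```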

Bouncy Particle Sampler
The Bouncy Particle Sampler (BPS) was introduced in [8, 23]. Again, the state space is $\mathbb{R}^d \times \mathbb{R}^d$ and the vector field is $g(q, p) = (p, 0)$. But the Bouncy Particle Sampler admits two kinds of jump mechanisms: bounces and velocity refreshments. If a bounce occurs at state $z = (q, p)$, it is done in the same way as a particle would change its velocity in a collision with the hyperplane $\{\nabla U(q)\}^\perp$, i.e. the coordinates $q$ stay constant but the velocity changes by
$$p \mapsto R(q)p := p - 2 \frac{\langle \nabla U(q), p \rangle}{\|\nabla U(q)\|^2} \nabla U(q).$$
The intensity for bounces is given by $\lambda(q, p) = \max(\langle \nabla U(q), p \rangle, 0) =: \langle \nabla U(q), p \rangle_+$.
Velocity refreshments are similar to those of RHMC: they are governed by a constant intensity $\lambda_{\mathrm{ref}} \geq 0$ and the autoregressive refreshment map $R_\xi(q, p) = (q, \alpha p + \sqrt{1 - \alpha^2}\, \xi)$ with $\xi \sim N(0, I_d)$ and $0 \leq \alpha \leq 1$.

Lemma 6.3. The Bouncy Particle Sampler without velocity refreshments ($\alpha = 1$) is Feller and $C_c^\infty(\mathbb{R}^{2d})$ is a core of its generator.

Proof. By direct computation, one can see that $R(q)$ is an orthogonal map and the BPS with $\alpha = 1$ is an isometric refreshment process. The statement follows by Theorem 6.2.
In practice, the Bouncy Particle Sampler is used with refreshments:

Corollary 6.4. For autoregressive velocity refreshments, i.e. $\alpha > 0$, the Bouncy Particle Sampler is Feller and $C_c^\infty(\mathbb{R}^{2d})$ is a core of its generator.
Proof. The proof that the process is Feller is similar to the proof of Proposition 4.2: each jump mechanism fulfils one of the two conditions in Assumption 4.1.
To show that $C_c^\infty(\mathbb{R}^{2d})$ is a core, let, as before, $0 < \alpha \leq 1$. Let $L'$ be the generator of the BPS with $\alpha = 1$ (only bounces) and $L$ the generator of the BPS as assumed in the corollary. Then, by [9, theorem 26.14], $L = L' + B$, where
$$Bf(q, p) = \lambda_{\mathrm{ref}} \Big( \mathbb{E}\big[ f(q, \alpha p + \sqrt{1 - \alpha^2}\, \xi) \big] - f(q, p) \Big)$$
is the refreshment part. Since $B$ is a bounded operator, the core $C_c^\infty(\mathbb{R}^{2d})$ of $L'$ will also be a core of $L$. So Lemma 6.3 implies the statement. Alternatively, the proof of Theorem 5.4 can be carried out in the same way, with two summands for the two jump mechanisms.
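The following sketch (ours, again for the Gaussian example where $\nabla U(q) = q$) implements both jump mechanisms: exact bounce times by inversion of the linear rate $\langle q, p \rangle + s \|p\|^2$, and refreshments at rate $\lambda_{\mathrm{ref}}$.

```python
import numpy as np

def bps_gaussian(q0, lam_ref=1.0, alpha=0.5, T=100.0, rng=None):
    """Bouncy Particle Sampler sketch for U(q) = ||q||^2/2: bounce rate
    along q + s p is (<q,p> + s ||p||^2)_+, refreshments arrive at rate
    lam_ref and set p -> alpha p + sqrt(1 - alpha^2) xi."""
    rng = rng or np.random.default_rng()
    d = len(q0)
    q, p = np.asarray(q0, dtype=float).copy(), rng.standard_normal(d)
    t, skeleton = 0.0, [(0.0, q.copy(), p.copy())]
    while t < T:
        a, b = float(np.dot(q, p)), float(np.dot(p, p))
        E = rng.exponential()
        if a >= 0.0:                  # invert int_0^tau (a + b s)_+ ds = E
            tau_b = (-a + np.sqrt(a * a + 2.0 * b * E)) / b
        else:                         # rate is zero until s = -a/b
            tau_b = (-a + np.sqrt(2.0 * b * E)) / b
        tau_r = rng.exponential(1.0 / lam_ref)
        tau = min(tau_b, tau_r)
        q += tau * p
        t += tau
        if tau_b < tau_r:             # bounce: reflect p at {grad U(q)}^perp
            grad = q                  # grad U(q) = q for the Gaussian
            p = p - 2.0 * (np.dot(grad, p) / np.dot(grad, grad)) * grad
        else:                         # autoregressive refreshment
            p = alpha * p + np.sqrt(1.0 - alpha**2) * rng.standard_normal(d)
        skeleton.append((t, q.copy(), p.copy()))
    return skeleton
```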

Invariance of probability measures
Let $\mu$ be a probability measure on $\mathbb{R}^d$, $L$ the generator of a Markov process $Z_t$ on $\mathbb{R}^d$ and $D$ a core of $L$. By [15, Prop. 9.9.2], the condition
$$\int Lf \, d\mu = 0 \quad \text{for all } f \in D$$
implies that $\mu$ is an invariant distribution of $Z_t$, i.e. if $Z_0 \sim \mu$ then also $Z_t \sim \mu$ for all $t > 0$. By Theorem 5.4, we can provide very simple proofs of the invariance of the target distribution under the afore-mentioned MCMC schemes, since we can choose $D = C_c^\infty(\mathbb{R}^{2d})$ and use the regularity of functions in $C_c^\infty(\mathbb{R}^{2d})$. As an illustration, we outline this for RHMC and the invariant distribution $\mu = \pi \otimes N(0, I_d)$ with $\pi$ given as in eq. (6.1). We compute using eq. (4.2):
$$\int Lf \, d\mu = \int \langle \nabla f(z), (p, -\nabla U(q)) \rangle \, \mu(dz) + \lambda_{\mathrm{ref}} \int \Big( \mathbb{E}\big[ f(q, \alpha p + \sqrt{1 - \alpha^2}\, \xi) \big] - f(z) \Big) \, \mu(dz).$$
Due to the regularity of $f$, one can use integration by parts to show that the first term vanishes. The second one vanishes since for independent $p, \xi \sim N(0, I_d)$ also $\alpha p + \sqrt{1 - \alpha^2}\, \xi \sim N(0, I_d)$. In [7], where RHMC was introduced, the authors performed a similar computation - but here, we can stop at this point since we know that $C_c^\infty(\mathbb{R}^{2d})$ is a core.
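Spelled out (a routine computation, included here for completeness), the integration by parts reads
$$\int \langle \nabla f, (p, -\nabla U(q)) \rangle \, d\mu = c^{-1} \int \big( \langle \nabla_q f, p \rangle - \langle \nabla_p f, \nabla U(q) \rangle \big) \, e^{-U(q) - \|p\|^2/2} \, dq \, dp,$$
where $c$ is the normalizing constant of $\mu$. Integrating the first summand by parts in $q$ and the second in $p$ (no boundary terms arise, since $f$ has compact support) yields
$$c^{-1} \int f \big( \langle \nabla U(q), p \rangle - \langle p, \nabla U(q) \rangle \big) \, e^{-U(q) - \|p\|^2/2} \, dq \, dp = 0.$$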

Martingale problems
As another application of the previous results, we present an equivalent characterization of a PDMP by martingale problems [15, chapter 4]. For example, such a characterization is used in the analysis of scaling limits of PDMPs [11]. Define, for a process $Y_t$ and $f \in C_c^\infty(\mathbb{R}^d)$,
$$M_t^{Y, f} := f(Y_t) - f(Y_0) - \int_0^t Af(Y_s) \, ds.$$
A process $Y_t$ with initial distribution $\nu$ is called a solution of the martingale problem for $(A, \nu)$ if $M_t^{Y, f}$ is a martingale for every $f \in C_c^\infty(\mathbb{R}^d)$. Now, let $Y_t$ be a solution for $(A, \nu)$ and $f \in \mathrm{dom}(L)$ be arbitrary. Since $C_c^\infty(\mathbb{R}^d)$ is a core of $L$, we can find $f_n \in C_c^\infty(\mathbb{R}^d)$ such that $\|Lf_n - Lf\|_\infty, \|f_n - f\|_\infty \to 0$. It can be easily seen that then, for every $t \geq 0$, $M_t^{Y, f_n} \to M_t^{Y, f}$ in $L^1$, so that $M_t^{Y, f}$ is a martingale for all $f \in \mathrm{dom}(L)$ as well.